Greetings. There is a task:

Create a text processing program for a textbook on programming using the classes: Symbol, Word, Sentence, Punctuation, etc.

Here is what I'd like to know. I read the text from a file and break it into objects of the classes Symbol, Word, Sentence and Punctuation. How do I then put these objects into a list so that they keep the order they had in the source file? One idea that came to mind is a single list with a conditional identifier for each class, like w, s, c, se, etc., and then writing the elements out to a new file in that order. Please share your thoughts.

    4 answers

    My idea is this: you have a class Sentence, which contains a list of words; each word contains a list of symbols and a punctuation mark. I.e. logically everything is as in real life, one thing nested in another: the character is in the word, the punctuation mark goes with the word (after it), and the word is in the sentence. You end up with an ordered list of words in the sentence, which can then be written to the screen or to a file by, say, a 'print()' method. Further, if the book is fairly large, it is better to break it into paragraphs and listings and process them one at a time, i.e. introduce a Paragraph class and a Listing class. Or, if there is no need to make the task complicated, just the Paragraph class. That class will hold a list of sentences. That way the order is preserved.
    Another implementation idea:
    There is a reader class and a handler class. The main method (most often main(String[] args)) creates a reader object that looks something like this:

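     public class Reader {
         // The constructor takes the path to the book (it is called this way in main)
         public Reader(String path) { /* open the file here */ }

         public boolean hasNext() {
             /* Returns true if the book has one more paragraph. */
             throw new UnsupportedOperationException("not implemented yet");
         }

         public Paragraph getNextParagraph() {
             /* Reads the next paragraph, fills a Paragraph object
                with the data, and returns it. */
             throw new UnsupportedOperationException("not implemented yet");
         }
     }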

    The handler class contains everything that needs to be done with a paragraph (output it, format it, anything at all). The main method then does something like this:

     public static void main(String[] args) {
         Reader reader = new Reader("path_to_the_book_to_open");
         while (reader.hasNext()) {
             // Create the handler; its constructor takes a Paragraph object
             Worker worker = new Worker(reader.getNextParagraph());
             // Do everything that needs to be done
             worker.work();
         }
     }

    Well, that's it.
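    A minimal sketch of how the reader's hasNext() / getNextParagraph() pair could work. The names ParagraphReader and SimpleParagraph are hypothetical, and so are the assumptions that paragraphs are separated by blank lines and that input arrives as a java.io.Reader rather than a file path (this keeps the example self-contained); none of this comes from the answer itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.NoSuchElementException;

// Hypothetical stand-in for Paragraph: it only stores the raw paragraph text.
class SimpleParagraph {
    private final String text;

    SimpleParagraph(String text) { this.text = text; }

    String getText() { return text; }
}

// Sketch of the reader: hands out paragraphs one by one, in source order.
// Assumption: paragraphs are separated by one or more blank lines.
class ParagraphReader {
    private final BufferedReader in;
    private String next; // paragraph read ahead, or null at end of input

    ParagraphReader(Reader source) {
        this.in = new BufferedReader(source);
        this.next = readParagraph();
    }

    boolean hasNext() {
        return next != null;
    }

    SimpleParagraph getNextParagraph() {
        if (next == null) {
            throw new NoSuchElementException("no more paragraphs");
        }
        SimpleParagraph paragraph = new SimpleParagraph(next);
        next = readParagraph(); // read ahead so hasNext() stays cheap
        return paragraph;
    }

    // Joins consecutive non-blank lines with single spaces.
    private String readParagraph() {
        try {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    if (sb.length() > 0) {
                        break;    // a blank line ends the current paragraph
                    }
                    continue;     // skip blank lines before a paragraph
                }
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(line.trim());
            }
            return sb.length() > 0 ? sb.toString() : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    To read from a file, one could pass a java.io.FileReader (or Files.newBufferedReader) to the constructor; the while (reader.hasNext()) loop from main stays the same.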

    • I did not understand the part about listings. How will I know which object to output if there is a sentence like "Hi Bill Gates, i'm Steave Jobs."? I break it into words and symbols, and an object of the corresponding class is created for each element; how do I put them into a list properly? - Sergii

    All right, here is a more detailed answer.
    First, the Worker class:

     public class Worker {
         private Paragraph paragraph;

         // Constructor
         public Worker(Paragraph p) {
             this.paragraph = p;
         }

         // Called from main; put the text processing (or whatever else is needed) here
         public void work() {
             /* ... */
         }
     }

    As we recall, the paragraph is created by the Reader class. Next, you ask what goes into the Paragraph class. It should hold one paragraph from the source file. Working out the implementation logic is a technical matter (i.e. yours). In essence, it should take, for example, a string and break it into sentences; likewise the Sentence class should take a string, break the sentence into words and punctuation, and pass them on to the class it contains, Word. The point is that each class must do its own job. These are the basics of OOP. Here is the Paragraph class:

     import java.util.ArrayList;
     import java.util.List;

     public class Paragraph {
         // The list of sentences
         private List<Sentence> sentenses;

         public Paragraph(String text) {
             sentenses = new ArrayList<>();
             List<String> tSentenses = new ArrayList<>();
             /* Split the incoming text into sentences here; the result
                is a list of strings, one per sentence. */

             // Fill the class's list of sentences, in order
             for (String s : tSentenses)
                 sentenses.add(new Sentence(s));
         }

         /* Since this class is the main source of data about the text,
            getters and setters for individual parts of the paragraph can
            go here. Because all sentences are added sequentially, the
            order is of course preserved. */

         public String getParagraphText() {
             // Joins the whole paragraph back together, for example like this:
             StringBuilder res = new StringBuilder();
             // Pull the text out of each stored sentence
             for (Sentence sent : sentenses) {
                 res.append(sent.getSentenseText());
             }
             return res.toString();
         }
     }

    Write the Sentence class in exactly the same way. Clearly it must have a constructor that accepts a string, and at least one method that returns the sentence stored in it.
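    A minimal sketch of such a Sentence class, written by analogy with Paragraph. The splitting rule is an assumption made only for illustration: a word is a run of letters, digits or apostrophes, any punctuation directly after it is stored with the word (as the first idea in this answer suggests), and words are separated by single spaces. The fields of Word and the getWords() accessor are hypothetical; getSentenseText() keeps the spelling that Paragraph calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Word class: the letters of the word plus the punctuation
// that directly follows it (an empty string if there is none).
class Word {
    private final String letters;
    private final String punctuation;

    Word(String letters, String punctuation) {
        this.letters = letters;
        this.punctuation = punctuation;
    }

    String getWordText() {
        return letters + punctuation;
    }
}

class Sentence {
    // Assumed rule: letters/digits/apostrophes, then optional ASCII punctuation.
    private static final Pattern WORD =
            Pattern.compile("([\\p{L}\\p{N}']+)(\\p{Punct}*)");

    // Words are added in the order they are found, so order is preserved.
    private final List<Word> words = new ArrayList<>();

    Sentence(String text) {
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(new Word(m.group(1), m.group(2)));
        }
    }

    List<Word> getWords() {
        return Collections.unmodifiableList(words);
    }

    // Joins the words back together with single spaces.
    // Note: punctuation that does not follow a word is lost in this sketch.
    String getSentenseText() {
        StringBuilder res = new StringBuilder();
        for (Word w : words) {
            if (res.length() > 0) {
                res.append(' ');
            }
            res.append(w.getWordText());
        }
        return res.toString();
    }
}
```

    With the sentence from the comment above, new Sentence("Hi Bill Gates, i'm Steave Jobs.") would hold six Word objects in their original order, with the comma stored on "Gates" and the full stop on "Jobs".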

    And note that one class sits inside another, and that one contains yet another. I emphasize once again: everything is just like in real life, a sentence in a paragraph, words in a sentence. Here it is the same: the Sentence class inside the Paragraph class, the Word class inside the Sentence class, and so on.

      In addition to the previous answers:

      It would be more logical to make Sentence a list of objects of type Lexeme, and to derive Word and Punctuation from the Lexeme class.

      Then the punctuation marks are not tied to words, which is correct, because in reality they are not. It also becomes easier to keep statistics on characters and words: you will not need to inspect the content and separate it on every pass.
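      A minimal sketch of this design, assuming a word is a run of letters, digits or apostrophes and any other non-space character is a punctuation mark. The regular expression, the getLexemes() accessor and the countWords() helper are illustrative assumptions, not part of the answer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Common base type for everything a sentence consists of.
abstract class Lexeme {
    private final String text;

    protected Lexeme(String text) { this.text = text; }

    String getText() { return text; }
}

class Word extends Lexeme {
    Word(String text) { super(text); }
}

class Punctuation extends Lexeme {
    Punctuation(String text) { super(text); }
}

class Sentence {
    // Assumed rule: either a word, or a single other non-space character.
    private static final Pattern LEXEME = Pattern.compile(
            "(?<word>[\\p{L}\\p{N}']+)|(?<punct>[^\\s\\p{L}\\p{N}'])");

    // One list for words and punctuation alike: source order is kept.
    private final List<Lexeme> lexemes = new ArrayList<>();

    Sentence(String text) {
        Matcher m = LEXEME.matcher(text);
        while (m.find()) {
            if (m.group("word") != null) {
                lexemes.add(new Word(m.group("word")));
            } else {
                lexemes.add(new Punctuation(m.group("punct")));
            }
        }
    }

    List<Lexeme> getLexemes() {
        return Collections.unmodifiableList(lexemes);
    }

    // Statistics need only a type check, not another look at the content.
    int countWords() {
        int count = 0;
        for (Lexeme lexeme : lexemes) {
            if (lexeme instanceof Word) {
                count++;
            }
        }
        return count;
    }
}
```

      Because words and punctuation marks share one list, there is no need for per-class identifiers such as w or se to restore the order: iterating over getLexemes() yields the elements exactly as they appeared in the sentence.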

      • Truly true! - Anton Mukhin

      Everything you need is already written in the problem statement. So, in order:

      1) common interface

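       import java.util.LinkedList;
       import java.util.List;

       public class Parsable {
           private String value = null;

           public Parsable(String value) {
               setValue(value);
           }

           // By default there is nothing to parse: return an empty list
           public List<Parsable> parse() {
               return new LinkedList<>();
           }

           public String getValue() { return value; }

           public void setValue(String value) { this.value = value; }
       }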

      I think it is intuitive that every composite object gets parsed (the result is a LinkedList because it preserves order), and that by default we return an empty list (which suits characters and punctuation marks).

      2) the main class, the root of the tree

       class Text extends Parsable {...} 

      It stores a string, the text itself, and its parse method returns a list of objects: sentences and delimiters (punctuation marks such as full stops, spaces, tabs, line breaks, etc.).

      3) the second-level object, the sentence

       class Sentence extends Parsable {...} 

      Its parse method returns a list of words and delimiters (those valid inside a sentence: commas, colons, spaces, etc.).

      4) the third-level object, the word

       class Word extends Parsable {...} 

      The parse result for this class is a list of characters.

      5) the Visitor pattern is used for parsing and traversing the tree: you create the root Text object and call its parse method, which returns the list of objects it consists of, and then the parse method is called on each of those in turn. Look up the details yourself (search for: Visitor design pattern).
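      A compact sketch of steps 1 to 5 put together. The splitting rules are assumptions chosen only for illustration (a sentence ends at '.', '!' or '?'; a word is a run of letters, digits or apostrophes), and the split helper and the TreeWalker class are hypothetical names. The traversal here is a plain recursive walk over parse() results rather than a full Visitor with accept/visit methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Parsable {
    private final String value;

    Parsable(String value) { this.value = value; }

    String getValue() { return value; }

    // Leaves (single characters, delimiters) have no children.
    List<Parsable> parse() { return Collections.emptyList(); }

    // Hypothetical helper: every match of 'unit' becomes a node built by
    // 'make'; each character between matches becomes a leaf delimiter.
    protected static List<Parsable> split(String s, Pattern unit,
                                          Function<String, Parsable> make) {
        List<Parsable> out = new ArrayList<>();
        Matcher m = unit.matcher(s);
        int pos = 0;
        while (m.find()) {
            addLeaves(s, pos, m.start(), out);
            out.add(make.apply(m.group()));
            pos = m.end();
        }
        addLeaves(s, pos, s.length(), out);
        return out;
    }

    private static void addLeaves(String s, int from, int to, List<Parsable> out) {
        for (int i = from; i < to; i++) {
            out.add(new Parsable(String.valueOf(s.charAt(i))));
        }
    }
}

class Text extends Parsable {
    // Assumed rule: starts at a non-space, runs up to '.', '!' or '?'.
    private static final Pattern SENTENCE = Pattern.compile("[^\\s.!?][^.!?]*");

    Text(String value) { super(value); }

    @Override
    List<Parsable> parse() { return split(getValue(), SENTENCE, Sentence::new); }
}

class Sentence extends Parsable {
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}']+");

    Sentence(String value) { super(value); }

    @Override
    List<Parsable> parse() { return split(getValue(), WORD, Word::new); }
}

class Word extends Parsable {
    private static final Pattern ANY_CHAR = Pattern.compile(".", Pattern.DOTALL);

    Word(String value) { super(value); }

    @Override
    List<Parsable> parse() { return split(getValue(), ANY_CHAR, Parsable::new); }
}

class TreeWalker {
    // Depth-first walk: appends the leaf values in source order.
    static void collectLeaves(Parsable node, StringBuilder out) {
        List<Parsable> children = node.parse();
        if (children.isEmpty()) {
            out.append(node.getValue());
            return;
        }
        for (Parsable child : children) {
            collectLeaves(child, out);
        }
    }
}
```

      Since each level keeps its children in the order they were found, walking the tree depth-first and concatenating the leaves reproduces the original text character for character, which is a simple way to check that no ordering information is lost.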

      • This is probably not important for you, but since you will have many duplicate elements, you can add a Factory class that stores references to ready-made objects, e.g.: Map<String, Parsable> cache = new HashMap<String, Parsable>(); cache.put("a", new Parsable("a")); You can also complicate the code by adding a static initializer or constructor that preloads all the valid letters, punctuation marks, etc. by default. - jmu
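      A minimal sketch of the caching factory suggested in this comment. The class name ParsableFactory and its methods are hypothetical, and the Parsable here is a reduced, immutable stand-in; sharing instances like this is only safe if the cached objects are never modified after creation (so a public setValue would not fit).

```java
import java.util.HashMap;
import java.util.Map;

// Reduced, immutable stand-in for Parsable (an assumption for this sketch).
class Parsable {
    private final String value;

    Parsable(String value) { this.value = value; }

    String getValue() { return value; }
}

// Hypothetical factory: hands out one shared object per distinct value.
class ParsableFactory {
    private final Map<String, Parsable> cache = new HashMap<>();

    // Returns the cached object for this value, creating it on first use.
    Parsable get(String value) {
        return cache.computeIfAbsent(value, Parsable::new);
    }

    // Number of distinct objects created so far.
    int size() {
        return cache.size();
    }
}
```

      Asking the factory twice for "a" returns the same object, so a long text with many repeated letters and punctuation marks needs only one object per distinct character.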