There is a text: a set of sentences. Its words are reduced to their base form by a stemmer.
Input: a string, usually about 20 kB of text: an article made up of Russian sentences. The encoding (UTF-8) and similar parameters can be whatever is convenient; that is hardly essential.
Example: «У Олега появился друг Олег» ("Oleg has a new friend, Oleg").
After the stemmer has run, you get an object, and it is not yet clear in what form to store it. For example, it can be a list or a tuple: ['Y', 'Oleg', 'appear', 'Oleg']. By processing this list we find the most frequent word.
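The frequency step on its own can be sketched as follows. This is a minimal illustration, assuming the stemmer output is a plain list of strings like the one above; the helper name `most_frequent` is hypothetical and not part of any stemmer library.

```python
from collections import Counter

# Stemmer output, copied from the example above.
stems = ['Y', 'Oleg', 'appear', 'Oleg']


def most_frequent(stems):
    """Return the stem that occurs most often (hypothetical helper).

    On a tie, Counter.most_common keeps the stem that was seen first.
    """
    counts = Counter(stems)
    word, _count = counts.most_common(1)[0]
    return word


print(most_frequent(stems))  # Oleg
```

A list is enough here because only the order and the values matter; a tuple would work the same way.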
Output: the source text as a string in which the keywords are wrapped in HTML tags, for example bold tags: «У <b>Олега</b> появился друг <b>Олег</b>»
What is the best way to do this? The only thing that comes to mind is to put the original words into a list, run the stemmer, and then, knowing each word's index in the list, wrap the original word in tags. After that, the list is joined back into plain text.
Maybe there is a simpler way?
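For reference, the index-based idea described above might look roughly like this. It is only a sketch under stated assumptions: `toy_stem` is a hypothetical stand-in for the real stemmer (it just lower-cases the word and drops a trailing "а"), `highlight` is an illustrative name, and only the single most frequent stem is treated as a keyword.

```python
import re
from collections import Counter


def toy_stem(word):
    # Hypothetical stand-in for a real stemmer (assumption, not a real API):
    # lower-case the word and drop one trailing "а".
    word = word.lower()
    if len(word) > 1 and word.endswith("а"):
        return word[:-1]
    return word


def highlight(text, stem=toy_stem, tag="b"):
    # The capture group makes re.split keep the words: separators land on
    # even indices and words on odd indices, so "".join(parts) == text.
    parts = re.split(r"(\w+)", text)
    # Stem of each original word, keyed by the word's position in `parts`.
    stems = {i: stem(parts[i]) for i in range(1, len(parts), 2)}
    if not stems:
        return text
    keyword = Counter(stems.values()).most_common(1)[0][0]
    # Wrap the ORIGINAL word wherever its stem equals the keyword.
    for i, s in stems.items():
        if s == keyword:
            parts[i] = f"<{tag}>{parts[i]}</{tag}>"
    return "".join(parts)


print(highlight("У Олега появился друг Олег"))
# У <b>Олега</b> появился друг <b>Олег</b>
```

Splitting with a capture group keeps whitespace and punctuation in the list, so joining it back reproduces the source text exactly apart from the inserted tags.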