The task is to generate a number of short texts that are similar in content but not identical, for example, posts on a social network.

I imagine writing a template with slots that are filled either with a word form that depends on the gender of the “author” or with a randomly chosen synonym.

Pseudocode:

{gender:Решил|решила} {random:попробовать|испытать|потестить} {random:новый сервис|новую примочку} для моих задач. 

As this gets more complex, the options chosen in different parts of the sentence will need cross-references so that grammatical cases agree. Then come global and local variables. A whole programming language?

The question is: is there a standard approach to this kind of task (and what is the task called), and is there a markup language suited to describing such templates?


Upd. Ideally, the “right” solution would take just a single version of the text as input, and an engine would adequately grasp its meaning, decompose it into elements and propose replacements on its own within specified tolerances. For example: fix the gender with a parameter, leave the mentioned colors unchanged, and swap the names of the mentioned artists for others from the same artistic style and historical period (well, that would already be top-level mastery :)

  • An immodest question: isn't this an attempt to replicate this product: SeoGenerator? Its generator looks especially close to your task. - Ruslan
  • Well, I don't see what the problem is: just take it and implement it. If I needed it, that's what I would do. - Qwertiy
  • @Ruslan, the task is different, although the principle is somewhat similar. I'll look into it, thanks! - Sergiks
  • @Qwertiy, your comment neither clarifies the question nor adds value to it. - Sergiks

2 answers

I have never done this myself, but I know that Markov chains are commonly used for it. Some food for thought.
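
To make the idea concrete, here is a minimal sketch of a word-level Markov chain generator. The names `buildChain` and `generateText` and the tiny corpus are made up for illustration and do not come from any library. Note that a chain like this only reproduces local word statistics of the source, so it makes no attempt to preserve meaning.

```javascript
// Minimal word-level Markov chain text generator (illustrative sketch).
// buildChain and generateText are hypothetical names, not a library API.

// Map each word to the list of words that followed it in the corpus.
function buildChain(corpus) {
  const words = corpus.split(/\s+/).filter(Boolean);
  const chain = new Map();
  for (let i = 0; i < words.length - 1; i++) {
    if (!chain.has(words[i])) chain.set(words[i], []);
    chain.get(words[i]).push(words[i + 1]);
  }
  return chain;
}

// Walk the chain from a start word, picking a random successor at each step.
// Stops at maxWords or when the current word has no recorded successor.
function generateText(chain, start, maxWords, rng = Math.random) {
  const out = [start];
  let current = start;
  while (out.length < maxWords && chain.has(current)) {
    const next = chain.get(current);
    current = next[Math.floor(rng() * next.length)];
    out.push(current);
  }
  return out.join(' ');
}

// Assumed sample corpus, invented for this example.
const corpus = 'I decided to try a new service. ' +
  'I decided to test a new gadget. ' +
  'She decided to try a new gadget for her tasks.';
const chain = buildChain(corpus);
console.log(generateText(chain, 'I', 12));
```

Successors are stored with repeats, so more frequent continuations are picked more often without any explicit probability table.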

A second source of food for thought. It describes a translation system whose general principle boils down to a sequence of transformations: language 1 -> analysis -> semantic description -> generation -> language 2. In your case the input and output languages are the same, and you additionally need to substitute synonyms in various combinations within the semantic description. You could also build a simpler variant that consists only of the second stage, generation.
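
A generation-only stage could look roughly like the sketch below. Everything in it is an assumption made for illustration: the shape of the semantic description, the `LEXICON` table and the `realize` function are hypothetical and far simpler than what a real translation system uses.

```javascript
// Sketch of a "generation only" stage: semantic description in, text out.
// The description format, LEXICON and realize() are all hypothetical.

// Each concept maps to its possible surface forms. A plain string has a
// single form; an object holds forms keyed by the author's gender.
const LEXICON = {
  DECIDE: [{ m: 'Решил', f: 'Решила' }],
  TRY: ['попробовать', 'испытать', 'потестить'],
  NEW_TOOL: ['новый сервис', 'новую примочку'],
  FOR_MY_TASKS: ['для моих задач'],
};

function pick(list, rng) {
  return list[Math.floor(rng() * list.length)];
}

// Turn a description (gender + ordered concepts) into one sentence.
function realize(description, rng = Math.random) {
  const words = description.concepts.map((concept) => {
    const variant = pick(LEXICON[concept], rng);
    return typeof variant === 'string' ? variant : variant[description.gender];
  });
  return words.join(' ') + '.';
}

const description = {
  gender: 'f',
  concepts: ['DECIDE', 'TRY', 'NEW_TOOL', 'FOR_MY_TASKS'],
};
console.log(realize(description));
```

Keeping the synonyms in the lexicon rather than in the text means the same description can be realized many times with different wording.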

The point of all this is that it is worth looking for open-source translators that implement the principle above. Incidentally, Sokirko and his colleagues have released their work under the GPL. It is an analysis system rather than a translation system, but the important thing is that it includes tools for working with semantic descriptions.

  • Thanks for the tip. Reading it now. - Sergiks
  • Upd. So far I have only found examples that generate meaningless text reproducing the word frequency distribution of the source. That might do for SEO garbage, but my task is different: to preserve meaning and readability. - Sergiks
  • An interesting experiment in PHP, with source code, for generating "syntactically correct" text: the Geniot program - Sergiks

Well, something like this, for example (it works in the current version of Firefox):

    // Each processor receives the options object, then the "|"-separated variants.
    var processors = {
        gender(options, m, w) {
            return options.gender === 'm' ? m : w;
        },
        random(options, ...a) {
            return a[Math.random() * a.length | 0];
        }
    };

    function generate(template, options) {
        // Matches placeholders of the form {key:variant1|variant2|...}
        return template.replace(/\{(\w+):(.*?)\}/g, function (match, key, str) {
            return processors[key]
                ? processors[key].apply(null, [options].concat(str.match(/(\\[{}|\\]|[^|])+/g)))
                : match; // unknown key: leave the placeholder as is
        });
    }

    var template = "{gender:Решил|Решила} {random:попробовать|испытать|потестить} \n" +
        "{random:новый сервис|новую примочку} для моих задач.";

    for (var q = 0; q < 10; ++q)
        console.log(generate(template, { gender: 'mw'[q & 1] }));

In the same way, you can add whatever logic you need.

  • A dependency tree can branch deeper. For example, depending on one randomly selected option, the next random choice is made from a different array: {rand:дыню|укроп}, чей {rand:сладкий|спелый||ароматный|свежий} вкус... - Sergiks
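
One possible reading of that last example, as a rough sketch: treat `||` as a separator between groups, and let the index picked in the preceding plain slot select the group. This interpretation and the function name `generateDependent` are assumptions made for illustration, not an established template format.

```javascript
// Sketch: a later slot depends on what an earlier slot picked.
// Assumed syntax (one interpretation of the comment above):
// "||" splits a slot into groups, and the index chosen in the previous
// ungrouped slot selects which group to draw from.

function generateDependent(template, rng = Math.random) {
  let lastIndex = 0; // index picked by the most recent ungrouped slot
  return template.replace(/\{rand:(.*?)\}/g, (match, body) => {
    const groups = body.split('||');
    // A slot without "||" has a single group and ignores lastIndex.
    const group = groups.length > 1 ? groups[lastIndex % groups.length] : groups[0];
    const variants = group.split('|');
    const index = Math.floor(rng() * variants.length);
    // Only ungrouped slots set the index that later grouped slots follow.
    if (groups.length === 1) lastIndex = index;
    return variants[index];
  });
}

const template = '{rand:дыню|укроп}, чей {rand:сладкий|спелый||ароматный|свежий} вкус';
for (let i = 0; i < 5; i++) console.log(generateDependent(template));
```

With this scheme "укроп" is only ever paired with "ароматный" or "свежий". Deeper trees would need named slots instead of "the previous one", which is where the template syntax starts turning into a small language.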