The best option is to look at ready-made search engines. If you still want to reinvent the wheel yourself, start from the fact that the search runs not over the content itself but over a separate set of data (an index), built, for example, like this:
- words from the page are reduced to a single normal form (for nouns: nominative case, singular, and so on)
- they are stored in the database with additional information (for example, the position of the word on the page, the page itself, the original word form, which tags surround it: everything you might need)
- ...
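The steps above can be sketched roughly as follows. This is only an illustration, not a finished engine: `normalize` and `build_index` are hypothetical names, the "database" is an in-memory dict, and normalization is just lowercasing (a real engine would use a proper lemmatizer for the language of the pages).

```python
import re
from collections import defaultdict

def normalize(word):
    # Assumption: lowercasing stands in for real reduction to a normal form.
    return word.lower()

def build_index(pages):
    """pages: dict of page id -> page text.
    Returns: normal form -> list of (page id, word position, original form)."""
    index = defaultdict(list)
    for page_id, text in pages.items():
        for position, match in enumerate(re.finditer(r"\w+", text)):
            original = match.group()
            index[normalize(original)].append((page_id, position, original))
    return index

pages = {"p1": "Cats chase mice", "p2": "A cat sleeps"}
index = build_index(pages)
```

Note that with lowercasing alone "Cats" and "cat" end up under different keys, which is exactly the problem that reduction to a single form is meant to solve.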
Next comes handling the search query. The simplest case is a single word: reduce it to the same normal form as in the first step and look it up. If there is a match, don't forget to show the original fragment (this is why you keep the "original" word forms, their order ...). If you need to search for several words or support a query language, keep furrowing your brow, but by that point either the idea of writing your own engine will have died on its own, or you will already have answers to the questions that come up :)
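A minimal sketch of the single-word case, under the same kind of assumptions: `normalize`, `tokenize`, `build_index` and `search` are hypothetical names, normalization is only lowercasing, and the fragment is rebuilt from the page's words (so punctuation is lost).

```python
import re

def normalize(word):
    # Assumption: lowercasing stands in for real reduction to a normal form.
    return word.lower()

def tokenize(text):
    return re.findall(r"\w+", text)

def build_index(pages):
    # normal form -> list of (page id, word position)
    index = {}
    for page_id, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index.setdefault(normalize(word), []).append((page_id, position))
    return index

def search(word, index, pages, context=2):
    """Look up one word and return (page id, original fragment) pairs."""
    results = []
    for page_id, position in index.get(normalize(word), []):
        words = tokenize(pages[page_id])
        start = max(0, position - context)
        fragment = " ".join(words[start:position + context + 1])
        results.append((page_id, fragment))
    return results

pages = {"p1": "The quick brown Fox jumps over the lazy dog"}
index = build_index(pages)
hits = search("fox", index, pages)
```

Here the query "fox" matches the original "Fox", and the hit carries a few neighbouring words in their original form and order.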
Maintaining this store should be the job of a separate component. Either it is a fully standalone one that periodically crawls the pages, checks for changes and rebuilds the index (Yandex.Site etc). Or the search index is updated whenever a page is created or edited; this is the simplest option, and it is what the engines of many forums implement.
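The second option (updating on create / edit) might look roughly like this. It is a sketch only: the `SearchIndex` class and its methods are hypothetical, everything lives in memory, and lowercasing again stands in for real normalization. The idea is to drop the page's old entries and add the new ones every time the page is saved.

```python
import re

class SearchIndex:
    def __init__(self):
        self.postings = {}    # normal form -> set of page ids
        self.page_terms = {}  # page id -> set of normal forms on that page

    def remove_page(self, page_id):
        # Forget everything previously indexed for this page.
        for term in self.page_terms.pop(page_id, set()):
            page_ids = self.postings[term]
            page_ids.discard(page_id)
            if not page_ids:
                del self.postings[term]

    def update_page(self, page_id, text):
        # Call this from the "page created" / "page edited" handler.
        self.remove_page(page_id)
        terms = {word.lower() for word in re.findall(r"\w+", text)}
        self.page_terms[page_id] = terms
        for term in terms:
            self.postings.setdefault(term, set()).add(page_id)

idx = SearchIndex()
idx.update_page("post-1", "Hello world")
idx.update_page("post-1", "Hello forum")  # the post was edited
```

After the edit, "world" is no longer in the index, while "hello" and "forum" point to the post.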
That's roughly it, in the most general terms, without details :) So, see the first sentence of the first paragraph.