Hello! I run a search with Lucene like this:

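    // Assumes `use ZendSearch\Lucene;` at the top of the file.
    setlocale(LC_ALL, 'ru_RU.UTF-8');

    // Use the UTF-8, case-insensitive analyzer for indexing and searching.
    Lucene\Analysis\Analyzer\Analyzer::setDefault(
        new Lucene\Analysis\Analyzer\Common\Utf8\CaseInsensitive()
    );

    $index = Lucene\Lucene::open($searchIndexLocation);
    $searchResults = $index->find($validatedQuery['searchQuery']);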

Then the words that match the query should be highlighted:

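    // Assumes `use ZendSearch\Lucene\Document\Html;` at the top of the file.
    \ZendSearch\Lucene\Search\QueryParser::setDefaultEncoding('UTF-8');

    // Load the HTML as UTF-8 without storing its content.
    $doc = Html::loadHTML($high, false, 'UTF-8');

    // Highlight the query words with a light cyan background.
    $doc->highlight($validatedQuery['searchQuery'], '#66ffff');
    $highlightedHTML = $doc->getHTMLBody();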

With English text everything works fine, but Russian words are either not highlighted at all, or individual letters get highlighted inside other words that do not match the query. And if the query begins with a capital Russian letter, the search does not work at all. What could be wrong? Thanks.

    1 answer

    Use the following code in the class constructor where you are using Lucene.

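        // Use the custom analyzer defined below for indexing and searching.
        Lucene\Analysis\Analyzer\Analyzer::setDefault(new CaseInsensitive());

        // Allow wildcard queries without a minimum literal prefix.
        Lucene\Search\Query\Wildcard::setMinPrefixLength(0);

        // Parse queries as UTF-8 and combine terms with OR by default.
        Lucene\Search\QueryParser::setDefaultEncoding('UTF-8');
        Lucene\Search\QueryParser::setDefaultOperator(Lucene\Search\QueryParser::B_OR);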

    Define your own CaseInsensitive analyzer:

        // Declared first so the parent exists when CaseInsensitive is declared in the same file.
        class Utf8Rus extends Lucene\Analysis\Analyzer\Common\Utf8Num
        {
            public function __construct()
            {
                parent::__construct();
            }

            // Tokenize as UTF-8 by default.
            public function tokenize($data, $encoding = 'UTF-8')
            {
                return parent::tokenize($data, $encoding);
            }
        }

        class CaseInsensitive extends Utf8Rus
        {
            public function __construct()
            {
                parent::__construct();
                // Lowercase every token with a multibyte-aware filter.
                $this->addFilter(new Lucene\Analysis\TokenFilter\LowerCaseUtf8());
            }
        }

    And everything will work.
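
To see why a UTF-8 aware analyzer with a lowercase filter matters for Russian text, here is a minimal sketch of the normalization such an analyzer roughly performs. It is not ZendSearch code: `normalizeTokens` is a hypothetical helper, and it assumes the `mbstring` extension is loaded and PHP 7.4 or newer.

```php
<?php
// Hypothetical helper for illustration only; not part of ZendSearch.
// It splits text into letter/digit runs and lowercases each token.
function normalizeTokens(string $text): array
{
    // The /u modifier makes PCRE read the input as UTF-8, so \p{L}
    // matches whole Cyrillic letters instead of individual bytes.
    preg_match_all('/[\p{L}\p{N}]+/u', $text, $matches);

    // mb_strtolower is multibyte-aware, so "П" becomes "п".
    return array_map(
        static fn (string $token): string => mb_strtolower($token, 'UTF-8'),
        $matches[0]
    );
}

// A capitalized query and a lowercase one normalize to the same tokens.
var_dump(normalizeTokens('Привет, Мир 2024') === normalizeTokens('привет мир 2024')); // bool(true)
```

If the text and the query are normalized this way on both the indexing and the searching side, a query that starts with a capital Russian letter produces the same terms as the lowercase one, which is the effect the analyzer above is meant to achieve.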