Fuzzy search in Elasticsearch

Question

Hello, the query is taken from your previous answer ("Wrong assessment when searching for fuzziness in elasticsearch"). I am struggling with a similar problem: the results look mostly fine, but when I search for "women's dresses", I get "women's sports dresses" first, then "women's scarf", and only then "women's dresses".

Below are the index settings with the mapping, followed by the request itself:

$params = [
  'index' => 'keywords_index',
  'body' => [
    'settings' => [
      'number_of_shards' => '5',
      'number_of_replicas' => '1',
      'analysis' => [
        'filter' => [
          'autocomplete_filter' => [
            'type' => 'edge_ngram',
            'min_gram' => '1',
            'max_gram' => '15'
          ],
          'russian_stop' => [
            'type' => 'stop',
            'stopwords' => '_russian_'
          ],
          'russian_stemmer' => [
            'type' => 'stemmer',
            'language' => 'russian'
          ],
          'my_synonyms' => [
            'type' => 'synonym',
            'synonyms' => [ 'купить => одежда' ]
          ]
        ],
        'analyzer' => [
          'a_analyzer' => [
            'type' => 'custom',
            'tokenizer' => 'standard',
            'filter' => ['lowercase', 'autocomplete_filter']
          ],
          'f_analyzer' => [
            'type' => 'custom',
            'tokenizer' => 'standard',
            'filter' => ['lowercase', 'russian_morphology', 'russian_stemmer', 'russian_stop', 'english_morphology']
          ]
        ]
      ]
    ],
    'mappings' => [
      'my_type' => [
        '_source' => [ 'enabled' => true ],
        'properties' => [
          'id' => [ 'type' => 'float' ],
          'p1' => [ 'type' => 'float' ],
          'p2' => [ 'type' => 'float' ],
          'p3' => [ 'type' => 'float' ],
          'name' => [ 'type' => 'keyword', 'index' => 'not_analyzed' ],
          'keywords' => [ 'type' => 'text', 'analyzer' => 'a_analyzer', 'search_analyzer' => 'f_analyzer' ],
          'key_filter' => [ 'type' => 'text', 'index' => 'not_analyzed' ],
          'type_filtr' => [ 'type' => 'float' ]
        ]
      ]
    ]
  ]
];

$request = array(
  "from" => 0,
  "size" => 5,
  "query" => array(
    "function_score" => array(
      "query" => array(
        "match" => array(
          "keywords" => array(
            "query" => "женские платья",
            "fuzziness" => 2,
            "prefix_length" => 1
          )
        )
      ),
      "functions" => array(
        "filter" => [ "term" => [ "keywords" => "женские платья" ] ],
        "weight" => 500
      ),
      "boost_mode" => "multiply"
    )
  )
);

A.Ustinov · Answer 1 · 2017-12-27T16:22:14

There is no need to use function_score here: the extra weight is applied to the same set of results, so it does not change the order in which they are returned.

To move "women's dresses" to the top, you can use a match_phrase query with the slop parameter set to 2 or 3. "Women's scarf" will then disappear from the results, because it lacks the key term "dresses".

  // match_phrase does not accept fuzziness or prefix_length,
  // so only the query text and slop are passed.
  "query" => array(
    "match_phrase" => array(
      "keywords" => array(
        "query" => "женские платья",
        "slop" => 2
      )
    )
  )

Or, if you still need the scarves, only at the very bottom of the list, you can wrap everything in a bool query and add a match_phrase query with slop = 50 to its should clause as a ranking factor.
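The bool variant could look roughly like the sketch below. It assumes the fuzzy match from the question goes into must, so it still decides which documents are returned, while match_phrase in should only adds to the score of documents that contain the words close together. The from/size values and the field name are copied from the question; nothing here has been checked against the asker's index.

```php
<?php
// Sketch only: request body for the bool + should approach described above.
$request = [
    'from' => 0,
    'size' => 5,
    'query' => [
        'bool' => [
            // Recall: the fuzzy match from the question selects the documents.
            'must' => [
                [
                    'match' => [
                        'keywords' => [
                            'query' => 'женские платья',
                            'fuzziness' => 2,
                            'prefix_length' => 1,
                        ],
                    ],
                ],
            ],
            // Ranking: documents that also match the phrase score higher;
            // documents that do not match it are still returned.
            'should' => [
                [
                    'match_phrase' => [
                        'keywords' => [
                            'query' => 'женские платья',
                            'slop' => 50,
                        ],
                    ],
                ],
            ],
        ],
    ],
];

// Print the body as JSON to inspect what would be sent to Elasticsearch.
echo json_encode($request, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT), PHP_EOL;
```

If the phrase bonus turns out too weak or too strong relative to the fuzzy score, a boost parameter on the match_phrase clause is one knob to try.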
