I've run into a problem. I have this code:

ESClient.delete({ index: 'index', type: 'type', id: id }, function (err, res) {
    if (res) {
        get_list(input, function (data) {
            callback(data);
        });
    } else {
        callback(err);
    }
});

The code that fetches the list back is also standard:

ESClient.search({
    index: 'index',
    type: 'type',
    body: { query: { match_all: {} } }
}, function (err, res) {
    if (res) {
        callback(res);
    } else {
        console.dir(err);
        callback(err);
    }
});

The problem is that the response does not reflect the deletion: right after the delete I get back a list that still contains the value I just deleted, and it only disappears after a manual refresh. Is there a way to make the request so that the response already excludes the deleted value?

    1 Answer

    ElasticSearch is not a database in the usual sense. It is a near-realtime search engine that has every right to return inconsistent values for up to five minutes after a change. In fact, it is something of a miracle that it is mutable at all: most indexing engines require a complete rebuild of the index to update search results. The index itself is an immutable collection of data, and the creators of ES/Lucene had to work hard just to make search results update in anything close to real time.

    The first thing to consider in this situation is whether you need to fight this time lag at all. A five-minute delay in search results usually goes unnoticed by anyone, and the possibility of navigating to the page of an entity that has just been deleted exists in any case. Nevertheless, the process can be sped up.

    ElasticSearch's internal indices consist of segments - micro-indices, one per every N documents. When a new document is added, ES rebuilds exactly one segment, which avoids rebuilding and locking the entire index. This, however, is not a free procedure, and it happens asynchronously - which is (most likely, but not certainly) why this question came up. ElasticSearch first puts the document into a queue for updating the segment, and once every five minutes that segment is persisted to disk and rebuilt at the Lucene level. Because ElasticSearch's target workloads assume an enormous number of requests, constantly syncing index files is simply unacceptable, so this operation was moved into asynchronous execution. When looking up a document by id, ElasticSearch consults first the index segment and then the queue of not-yet-persisted documents (hence the "phantom" reads you can observe: the document is reachable by its identifier but missing from search output).

    The process can nevertheless be sped up a little: Lucene keeps its indices both on disk and (obviously) in memory, so you can force Lucene to merge all segments with their queues without triggering a disk write - that is what the refresh method is for. It is a relatively cheap operation (cheaper than writing to disk immediately), but the fact that the ElasticSearch developers exposed it as a separate API endpoint suggests it can still be resource-intensive and/or block the search index.
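    As a minimal sketch - assuming the official elasticsearch JS client that the question appears to use, with a placeholder index name - a refresh can be triggered like this:

        // Ask ES to merge the in-memory segment queues so that subsequent
        // searches see the latest writes, without forcing a full flush to disk.
        ESClient.indices.refresh({ index: 'index' }, function (err, res) {
            if (err) {
                console.dir(err);
                return;
            }
            // Once this callback fires, a search should no longer return
            // the document that was just deleted.
        });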

    Note that there is also a flush operation, which forces all data to disk. As mentioned above, this is an expensive operation that should only be called in exceptional cases (for example, before shutting down a node).
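    For completeness, a sketch of the corresponding call under the same assumption about the client (again with a placeholder index name):

        // Persist all buffered data to disk. Expensive - reserve it for
        // exceptional cases such as shutting the node down.
        ESClient.indices.flush({ index: 'index' }, function (err) {
            if (err) { console.dir(err); }
        });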

    Summarizing: try using the refresh operation for your index.
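    Applied to the code from the question, one possible shape is the sketch below. It is only an illustration under the same client assumption; get_list, input and callback come from the question, and the index/type names are placeholders.

        ESClient.delete({ index: 'index', type: 'type', id: id }, function (err, res) {
            if (err) { return callback(err); }
            // Refresh the index before re-reading the list, so the search
            // no longer returns the document that was just deleted.
            ESClient.indices.refresh({ index: 'index' }, function (refreshErr) {
                if (refreshErr) { return callback(refreshErr); }
                get_list(input, function (data) {
                    callback(data);
                });
            });
        });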

    A question on the English-language Stack Overflow through which I learned all this.

    • Yes, even if I refresh manually half a second later, everything already shows the item as deleted. Maybe there is a function that fires a callback once the object has actually been deleted, or a parameter that introduces such a delay? - pnp2000
    • @vnn198 refresh is exactly such a callback: it is a synchronous method and will not return until the whole index has been updated - etki
    • I am trying it through bulk, but maybe I am just not calling refresh the right way; I do Client.bulk({ body: [{ delete: { _index: 'myindex', _type: 'mytype', _id: id } }, { refresh: { _index: 'myindex' } }] }) - pnp2000
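    For reference, a hedged sketch of how the bulk call from the last comment could be combined with a refresh, assuming the official elasticsearch JS client: refresh there is a request-level option on the bulk call (or a separate indices.refresh call), not a bulk action. The index, type and id names are the placeholders from the comment.

        Client.bulk({
            refresh: true, // refresh the affected shards once the bulk request completes
            body: [
                { delete: { _index: 'myindex', _type: 'mytype', _id: id } }
            ]
        }, function (err, res) {
            if (err) { return console.dir(err); }
            // res.items describes the outcome of each bulk action.
        });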