Document structure (one edition):

{ Id: "Edition identifier", DocumentId: "Document identifier", Date: "Creation date", Text: "" }

How do I build a query that filters out editions with a duplicate DocumentId?

    1 answer

    Use aggregations.

    If you need to drop the duplicates and keep a single record per DocumentId:

     { "aggs": { "withoutDuplicate": { "terms": { "field": "DocumentId" }, "aggs": { "withoutDuplicateDocs": { "top_hits": { "size": 1 } } } } } }
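As a rough sketch, the same request body can be built in Python before sending it with whatever client you use. The helper name `build_dedup_query` and its parameters are hypothetical. The sketch makes three assumptions that are not in the query above: `"size": 0` is added to skip the regular hit list, the bucket count is raised from the terms default of 10, and each document is represented by its newest edition, sorted by `Date`.

```python
import json


def build_dedup_query(field="DocumentId", max_buckets=1000, sort_field="Date"):
    """Build a request body that returns one edition per distinct value of `field`."""
    return {
        "size": 0,  # assumption: only the aggregation result is needed
        "aggs": {
            "withoutDuplicate": {
                # "size" here is the number of buckets; the terms default is 10
                "terms": {"field": field, "size": max_buckets},
                "aggs": {
                    "withoutDuplicateDocs": {
                        "top_hits": {
                            "size": 1,
                            # assumption: the newest edition represents the document
                            "sort": [{sort_field: {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


body = build_dedup_query()
print(json.dumps(body, indent=2))
```

Without the `sort` clause, `top_hits` returns the highest-scoring hit of each bucket, so which edition is kept would depend on relevance rather than on the date.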

    Conversely, if you want to list the DocumentId values that are not unique:

     { "size": 0, "aggs": { "duplicateCount": { "terms": { "field": "DocumentId", "min_doc_count": 2 }, "aggs": { "duplicateDocuments": { "top_hits": { "_source": "DocumentId" } } } } } }
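To read the result of the `duplicateCount` aggregation, something along these lines should work. It is a minimal sketch: `duplicate_ids` is a hypothetical helper, and `sample_response` is a hand-written stand-in shaped like a search response, not real output.

```python
def duplicate_ids(response, agg_name="duplicateCount"):
    """Return {DocumentId: number of editions} for each bucket of the terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hand-written sample in the shape of a search response; not real data.
sample_response = {
    "aggregations": {
        "duplicateCount": {
            "buckets": [
                {"key": "doc-1", "doc_count": 3},
                {"key": "doc-7", "doc_count": 2},
            ]
        }
    }
}

print(duplicate_ids(sample_response))
```

Because the query sets `min_doc_count` to 2, every bucket that comes back corresponds to a DocumentId shared by at least two editions.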

    You can read more at these links:

    Terms aggregation

    Top hits aggregation