Friends, I have a question. There is a table of buyer/item rows:

| Vanya | Item 1 |
| Vanya | Item 2 |
| Petya | Item 1 |
| Petya | Item 2 |
| Petya | Item 3 |

I need a query that counts, for each pair of items, the number of buyers who bought both, i.e.:

 Item 1, Item 2 - 2
 Item 1, Item 3 - 1
 Item 2, Item 3 - 1

I can't figure out how to do this. Which direction should I dig in?

  • Please explain why the example gives 2 and 1. - pavel
  • I noticed an error in the example and have corrected it. - lonely_luckily

2 answers

Oh, it looks like I found the answer after all!

To speed up the query, you need an index:

 create index itemslog_userid_itemname on itemslog(userid, itemname); 

Then an ordinary self-join query does the job:

 SELECT t1.itemname, t2.itemname, count(*) FROM itemslog AS t1 JOIN itemslog AS t2 ON t1.userid = t2.userid AND t1.itemname < t2.itemname GROUP BY t1.itemname, t2.itemname; 
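To check the index-plus-self-join approach on the question's five sample rows, here is a minimal sketch using Python's built-in sqlite3 module. The choice of SQLite, the TEXT column types, the English item names, and the added ORDER BY (only there to make the output order predictable) are assumptions for illustration; the thread does not say which database is in use.

```python
import sqlite3

# Sample rows from the question: (buyer, item). Names are written in English here.
ROWS = [
    ("Vanya", "Item 1"),
    ("Vanya", "Item 2"),
    ("Petya", "Item 1"),
    ("Petya", "Item 2"),
    ("Petya", "Item 3"),
]


def count_item_pairs(rows):
    """Return (item_a, item_b, buyers) for every pair of items bought together."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed schema: the thread only names the table and its two columns.
        conn.execute("CREATE TABLE itemslog (userid TEXT, itemname TEXT)")
        conn.executemany("INSERT INTO itemslog VALUES (?, ?)", rows)
        # The composite index suggested in the answer.
        conn.execute(
            "CREATE INDEX itemslog_userid_itemname ON itemslog(userid, itemname)"
        )
        # Self-join on the buyer; "<" keeps each unordered pair exactly once.
        query = """
            SELECT t1.itemname, t2.itemname, COUNT(*)
            FROM itemslog AS t1
            JOIN itemslog AS t2
              ON t1.userid = t2.userid
             AND t1.itemname < t2.itemname
            GROUP BY t1.itemname, t2.itemname
            ORDER BY t1.itemname, t2.itemname
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


pairs = count_item_pairs(ROWS)
for item_a, item_b, buyers in pairs:
    print(f"{item_a}, {item_b} - {buyers}")
```

Note that COUNT(*) counts joined rows, so it matches the number of buyers only if each buyer/item combination appears once in the table.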

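    The simplest option is something along these lines (note that order and user are reserved words in many SQL dialects, so quote or rename them for a real table):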

     Select t1.name, t2.name, count(*) as count from order as t1, order as t2 where t1.name < t2.name and t1.user = t2.user group by t1.name, t2.name; 
    • Thanks, I got it working, and the approach does the job. But ... only on small samples. Any ideas on how to make it work on large data sets (more than 200k records)? - lonely_luckily
    • @lonely_luckily are there any other constraints? Done head-on, the size of the result is on the order of the square of the input size, and 200,000 squared is billions of rows in the result. - pavel
    • Apart from the volume, there are no other constraints. - lonely_luckily
    • Thank you, I figured it out. This option works fine if you use an index (I posted it as a separate answer). - lonely_luckily
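On the size concern raised in the comments: with one row per buyer/item combination, the self-join produces k(k-1)/2 rows for a buyer with k distinct items, so the work depends on how many items each buyer has rather than on the square of the total row count. The sketch below shows this with an in-memory count in Python. It is an illustration that is not taken from the thread, and the function name count_pairs is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Sample rows from the question: (buyer, item). Names are written in English here.
ROWS = [
    ("Vanya", "Item 1"),
    ("Vanya", "Item 2"),
    ("Petya", "Item 1"),
    ("Petya", "Item 2"),
    ("Petya", "Item 3"),
]


def count_pairs(rows):
    """Count, for each unordered item pair, the buyers who bought both items."""
    # Group items per buyer; a set drops repeated purchases of the same item,
    # which differs from COUNT(*) in the SQL when a table has duplicate rows.
    items_by_buyer = defaultdict(set)
    for buyer, item in rows:
        items_by_buyer[buyer].add(item)

    pair_counts = Counter()
    for items in items_by_buyer.values():
        # A buyer with k items contributes k*(k-1)/2 pairs.
        pair_counts.update(combinations(sorted(items), 2))
    return pair_counts


pair_counts = count_pairs(ROWS)
# Total pairs generated: 1 for Vanya (2 items) plus 3 for Petya (3 items).
total_pairs = sum(pair_counts.values())
for (item_a, item_b), buyers in sorted(pair_counts.items()):
    print(f"{item_a}, {item_b} - {buyers}")
```

The worst case described in the comment still applies when a few buyers have very many items each, since their k(k-1)/2 pairs dominate.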