Faced a serious problem in developing a dating website for coursework. Created the entire external shell, messaging, etc., but there was a difficult selection of people according to their interests.

I need users to keep their percentage matches in interest with other users, so that later I can make friends with them. But now I need to specifically find the percentage of matches.

Below I attach a scheme, action logic, which I cannot implement, and MySQL code tables.

I have already looked at all the similar topics but have not found anything as confusing as my task. I ask your help, the question of life and death! Base scheme This "cycle" will be separately for each Id_music, id_book and id_film, therefore in the description they go through /

  1. When adding the Id_music / id_book / id_film field to the Watched, Read, Listened tables, create a cycle equal to the number of users minus 1. Our user ID is entered into the user_one position in the Stats table each time, and the id of the one we compare with is user_two.

  2. Read the number of records by id in the Listened / Watched / Read table to get the total number of interests in this category from our user. Do the same for the user in the loop.

3. Alternately compare the ID_Music / ID_Book / ID_Film field in the Read / Listened / Watched table of our user to match the same fields from another user that is now in a loop. As a result, we get the number of matches.

4. We compare this number with the number of records of these users from paragraph 2. For example, if our user has only 4 records, and the second has 8, and matches 2, we put in the table stats 50 (%) in the field bks_prcnt / msc_prcnt / flm_prcnt and 25 (%) in the bks_prcnt_rev / msc_prcnt_rev / flm_prcnt_rev field, because the second user has more interests.

5.? Cycle ends. If the fields existed before, we update, not create them.

CREATE TABLE `users` ( `id` INT(11) NOT NULL AUTO_INCREMENT, `username` text NOT NULL, `password` text NOT NULL, 'about' text(500), PRIMARY KEY (`id`) ) ENGINE=InnoDB CHARACTER SET=UTF8; CREATE TABLE `stats` ( `id` INT(11) NOT NULL AUTO_INCREMENT, `user_one` INT(11) NOT NULL, `user_two` INT(11) NOT NULL, `bks_prcnt` INT(4) NOT NULL , `bks_prcnt_rev` INT(4) NOT NULL, `flm_prcnt` INT(4) NOT NULL, `flm_prcnt_rev` INT(4) NOT NULL, `msc_prcnt` INT(4) NOT NULL, `msc_prcnt_rev` INT(4) NOT NULL, PRIMARY KEY (`id`), FOREIGN KEY (user_one) REFERENCES users(id), FOREIGN KEY (user_two) REFERENCES users(id) ) ENGINE=InnoDB CHARACTER SET=UTF8; CREATE TABLE `genres` ( `genre_id` INT(11) NOT NULL AUTO_INCREMENT, `genre_name` text NOT NULL, PRIMARY KEY (`genre_id`) ) ENGINE=InnoDB CHARACTER SET=UTF8; CREATE TABLE `music` ( `music_id` INT(11) NOT NULL AUTO_INCREMENT, `title` text NOT NULL, `id_genre` INT(11) NOT NULL, `Compositor` text NOT NULL, PRIMARY KEY (`music_id`) , FOREIGN KEY (id_genre) REFERENCES genres(genre_id) ) ENGINE=InnoDB CHARACTER SET=UTF8; CREATE TABLE `book` ( `book_id` INT(11) NOT NULL AUTO_INCREMENT, `title` text NOT NULL, `id_genre` INT(11) NOT NULL, `Author` text NOT NULL, PRIMARY KEY (`book_id`) , FOREIGN KEY (id_genre) REFERENCES genres(genre_id) ) ENGINE=InnoDB CHARACTER SET=UTF8; CREATE TABLE `film` ( `film_id` INT(11) NOT NULL AUTO_INCREMENT, `title` text NOT NULL, `id_genre` INT(11) NOT NULL, `Producer` text NOT NULL, PRIMARY KEY (`film_id`), FOREIGN KEY (id_genre) REFERENCES genres(genre_id) ) ENGINE=InnoDB CHARACTER SET=UTF8; CREATE TABLE `Listened` ( `id` INT(11) NOT NULL AUTO_INCREMENT, `id_user` INT(11) NOT NULL, `id_music` INT(11) NOT NULL, PRIMARY KEY (`id`) , FOREIGN KEY (id_user) REFERENCES users(id), FOREIGN KEY (id_music) REFERENCES music(id) ) ENGINE=InnoDB CHARACTER SET=UTF8; CREATE TABLE `Watched` ( `id` INT(11) NOT NULL AUTO_INCREMENT, `id_user` INT(11) NOT NULL, `id_film` INT(11) NOT NULL, PRIMARY KEY (`id`) , FOREIGN KEY (id_user) REFERENCES users(id), FOREIGN KEY (id_film) REFERENCES film(id) ) ENGINE=InnoDB CHARACTER SET=UTF8; CREATE TABLE `Read` ( `id` INT(11) NOT NULL AUTO_INCREMENT, `id_user` INT(11) NOT NULL, `id_book` INT(11) NOT NULL, PRIMARY KEY (`id`) , FOREIGN KEY (id_user) REFERENCES users(id), FOREIGN KEY (id_book) REFERENCES book(id) ) ENGINE=InnoDB CHARACTER SET=UTF8; 

and here are a few Insert-s (do not pay attention to the similarity of genres of music, books and films)

  INSERT INTO `Users`(`id`, `username`, `password`, `about`) VALUES ('','User1','213','something') INSERT INTO `Users`(`id`, `username`, `password`, `about`) VALUES ('','User2','313','something') INSERT INTO `genres`(`genre_id`, `genre_name`) VALUES ('','Genre1') INSERT INTO `genres`(`genre_id`, `genre_name`) VALUES ('','Genre2') INSERT INTO `genres`(`genre_id`, `genre_name`) VALUES ('','Genre3') INSERT INTO `book`(`book_id`, `title`,'id_genre','Author') VALUES ('','Book1','1','Author1') INSERT INTO `book`(`book_id`, `title`,'id_genre','Author') VALUES ('','Book2','2','Author2') INSERT INTO `music`(`music_id`, `title`,'id_genre','Compositor') VALUES ('','Song1','1','Compositor1') INSERT INTO `music`(`music_id`, `title`,'id_genre','Compositor') VALUES ('','Song2','2','Compositor2') INSERT INTO `film`(`film_id`, `title`,'id_genre','Producer') VALUES ('','Film1','1','Producer1') INSERT INTO `film`(`film_id`, `title`,'id_genre','Producer') VALUES ('','Film2','2','Producer2') INSERT INTO `Read`(`id`, `id_user,'id_book') VALUES ('','1','1') INSERT INTO `Read`(`id`, `id_user,'id_book') VALUES ('','1','2') INSERT INTO `Read`(`id`, `id_user,'id_book') VALUES ('','2','1') INSERT INTO `Listened`(`id`, `id_user,'id_music') VALUES ('','1','1') INSERT INTO `Listened`(`id`, `id_user,'id_music') VALUES ('','1','2') INSERT INTO `Listened`(`id`, `id_user,'id_music') VALUES ('','2','1') INSERT INTO `Watched`(`id`, `id_user,'id_film') VALUES ('','1','1') INSERT INTO `Watched`(`id`, `id_user,'id_film') VALUES ('','1','2') INSERT INTO `Watched`(`id`, `id_user,'id_film') VALUES ('','2','1') 
  • Let there be users ID 1 and ID 2. In the stats table there will be two records 1-2 and 2-1 or only one. if there is one, which one. If there are two, then why do you need to keep the percentage in the record direct and the percentage inverse if you could only keep the percentage in one direction and from your example the record 1-2 will have 50% in 2-1 25% - Mike
  • Yes, and could you put some insert with test data in the table creation code - Mike
  • Initially I thought, as you said, to store 1-2 and 2-1, and each has a different percentage value, but then decided to enter forward and reverse percent. In this case, I would like to store in the first place the id of the user who makes changes to id_book, id_music, etc. but not sure if this is better than the original version. Now I will edit and add the Insert with users, books and others - Tigran Oganov

2 answers 2

You are working with a SQL database. SQL allows you to get any data in any desired form and in 99% of cases it is done in one query. If, when working with SQL, you have to use a "loop" and this cycle is not intended for outputting ready data from the database to the client, then most likely you are doing something wrong. The full contents for the stats table can be obtained by the following query:

 select u1, u2, max(prcnt*(source='Read')) as bks_prcnt, max(prcnt_rev*(source='Read')) as bks_prcnt_rev, max(prcnt*(source='List')) as flm_prcnt, max(prcnt_rev*(source='List')) as flm_prcnt_rev, max(prcnt*(source='Watch')) as msc_prcnt, max(prcnt_rev*(source='Watch')) as msc_prcnt_rev from ( select * from ( select least(u1,u2) u1, greatest(u1,u2) u2, 'Read' as source, max((u1<u2)*prcnt) prcnt, max((u1>u2)*prcnt) prcnt_rev from ( select R1.id_user u1, R2.id_user u2, round(count(1)*100/R.all_cnt) prcnt from `Read` R1, `Read` R2, ( select id_user u, count(1) all_cnt from `Read` group by id_user ) R where R1.id_book=R2.id_book and R1.id_user!=R2.id_user and Ru=R1.id_user group by R1.id_user, R2.id_user ) A group by least(u1,u2), greatest(u1,u2) ) A UNION ALL select * from ( select least(u1,u2) u1, greatest(u1,u2) u2, 'List' as source, max((u1<u2)*prcnt) prcnt, max((u1>u2)*prcnt) prcnt_rev from ( select R1.id_user u1, R2.id_user u2, round(count(1)*100/R.all_cnt) prcnt from `Listened` R1, `Listened` R2, ( select id_user u, count(1) all_cnt from `Listened` group by id_user ) R where R1.id_music=R2.id_music and R1.id_user!=R2.id_user and Ru=R1.id_user group by R1.id_user, R2.id_user ) A group by least(u1,u2), greatest(u1,u2) ) A UNION ALL select * from ( select least(u1,u2) u1, greatest(u1,u2) u2, 'Watch' as source, max((u1<u2)*prcnt) prcnt, max((u1>u2)*prcnt) prcnt_rev from ( select R1.id_user u1, R2.id_user u2, round(count(1)*100/R.all_cnt) prcnt from `Watched` R1, `Watched` R2, ( select id_user u, count(1) all_cnt from `Watched` group by id_user ) R where R1.id_film=R2.id_film and R1.id_user!=R2.id_user and Ru=R1.id_user group by R1.id_user, R2.id_user ) A group by least(u1,u2), greatest(u1,u2) ) B ) C 

If there are not many records in the database, you can always get the necessary data on the fly. If it slows down, then you can, of course, cache them in the stats table. The request is easily altered to get similar users relative to a particular one. To do this, add conditions with the ID of the desired user to all 3 subqueries for R1 aliases, as well as to the deepest subquery that receives all_cnt.

In general, the request could be much simpler and shorter if you did not want to see both the forward and reverse percentages in one record. Because of this, you have to first get these 2 percent in separate lines, and then tighten up on one line with the least(), greatest() group (which leaves only the lines where the first ID is less than the second).

Also, this query would be 3 times shorter (and at the same time work with the database would be easier almost everywhere) if you reduced the Book / Music / Film tables to one by simply adding the "content type" field to the record and reducing the equivalent fields to one. After that, the Read / Listed / Watched tables are also summarized in one table, and we can calculate both the total percentage and the percentage in terms of content types by managing the group by phrase in one short query.

As for the stats table, if you need it, you should remove the id field from it, make the PRIMARY KEY (user_one, user_two) primary key. After that, it will be possible to write / update the data in it as:

 insert into stats select ВОТ_ТОТ_БОЛЬШОЙ_ЗАПРОС on duplicate key update bks_prcnt=values(bks_prcnt), bks_prcnt_rev=values(bks_prcnt_rev), ... 

Total: Learn SQL, it is actually quite simple, you can put several basic elements into each other as much as you like and describe any data cut. You can study just by request at the top, take small pieces out of it, try them separately, see what they give, experiment. And remember, in SQL there is no question " can it be done," there is only the question " how to do it".

PS For sure, the "strange" record max((u1<u2)*prcnt) ; such a short record is only supported in MySQL. When calculating any comparison, it produces arithmetic 0 if the condition is false and 1 if true. Therefore, you can use comparison operations in arithmetic. This max() takes only the values ​​of the records where u1<u2 , multiplying the number by 0 or 1. In other DBMSs, the record max(case when u1<u2 then prcnt end) would be required max(case when u1<u2 then prcnt end) .

  • Thank you so much for such an accurate answer and explanation how to convert to other bases. I will check now! - Tigran Oganov

Keep the percentage wrong for a number of reasons:

  1. You will need to recount when adding a new user.
  2. You will need to recalculate when changing an existing user.

    And this is for all users.

It is better to dynamically calculate (cache for re-access speed). The algorithm is as follows:

  1. Take the ID of our user.
  2. Choose all his books
  3. For each book, we look for user IDs and enter the ID => counter into the dictionary (i.e. if there are IDs in the dictionary, then increment by 1 counter)
  4. We do the same with movies and music.
  5. We get something like this:

    005 => 20, // our user
    023 => 10,
    024 => 05,
    007 => 18,
    123 => 03;

  6. Now we can divide the counters of each user with our counter and get the percentage of matches.

I do not know how to speed. Surely you can improve the algorithm (for example, count the number of matches for all the books \ films \ music of our user).

  • I agree that it is not practical to recalculate each time id_interests being added / removed, but at the moment this is a task assigned to me, and I am trying to solve it. But within the framework of the diploma, I will be happy to change the method, although I do not fully understand what you offer in paragraph 3 under the dictionary ID => counter. You propose to compare not the interests of two users, but the interests of one user to coincide with others? (I don’t know how to say more precisely) - Tigran Oganov
  • @TigranOganov, quite right. After all, in the end, for the user, this is necessary. No one bothers you to modify this solution for all users, but I repeat: keeping the% in the database is a bad idea. - ArchDemon