
Analysis of trends in Russian YouTube for 2018

Old-timers probably don’t remember, but at the end of 2017 there was widespread discussion on the Internet that videos with artificially inflated likes or dislikes often showed up in YouTube trends.


So, on the eve of 2018, I wrote a utility that collects information about trending videos. For each video it requests the title, the list of tags, and the creation date, as well as the history of changes in likes / dislikes / views. It is developed in TypeScript for NodeJS, and the code is published on GitHub.


As a result, it is now possible to build nice charts:
Graph of changes in the number of likes and dislikes


It is also possible to chart how trends change for individual keywords. In total, information on 29,271 videos was collected during 2018. Statistics are still being collected.



General principle of operation


  1. Every 5 minutes, the current list of trends is fetched.
  2. For each new video, the basic information is saved (title, tag list, creation date).
  3. Based on the title and tags, each video is assigned a keyword cloud.
  4. For each video, statistics on likes / dislikes / views are requested on a schedule. Statistics are collected for two days: at first, requests go out at 2-minute intervals, then the interval increases. If cheating is suspected, the interval is reset to 2 minutes.
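
The polling schedule from step 4 could be sketched roughly as follows. Only the 2-minute starting interval comes from the description above. The doubling rule, the one-hour cap, and the function name are assumptions made for illustration, since the real growth rule is not described here.

```typescript
// Hypothetical polling schedule for one video.
// From the article: start at 2 minutes, grow over time, reset on suspicion.
// Assumed for illustration: doubling growth and a 60-minute cap.
const BASE_INTERVAL_MIN = 2;
const MAX_INTERVAL_MIN = 60; // assumed cap
const GROWTH_FACTOR = 2; // assumed growth rule

// Returns the delay (in minutes) before the next statistics request.
function nextIntervalMinutes(currentMin: number, suspicious: boolean): number {
  if (suspicious) {
    // Suspected cheating: go back to frequent polling.
    return BASE_INTERVAL_MIN;
  }
  return Math.min(currentMin * GROWTH_FACTOR, MAX_INTERVAL_MIN);
}
```

With these assumed constants, a quiet video would be polled after 2, 4, 8, 16, 32 and then every 60 minutes, while any suspicious measurement brings it back to 2 minutes.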

If the chart of likes / dislikes is a straight line over some segment, only the first and last values of that segment are saved. This is done to reduce the size of the database. The statistics table now holds only 6,908,449 records and takes up 458 MB on disk.
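
A minimal sketch of this kind of compression, assuming each measurement is a (time, value) pair. The `Sample` type, the function name, and the optional tolerance are hypothetical and are not taken from the utility's code.

```typescript
// One statistics measurement: time in seconds, value = like (or dislike) count.
interface Sample {
  time: number;
  value: number;
}

// Drops interior points that lie on a straight line, so that each straight
// segment is represented only by its first and last measurement.
function dropCollinear(samples: Sample[], tolerance: number = 0): Sample[] {
  if (samples.length <= 2) {
    return samples.slice();
  }
  const kept: Sample[] = [samples[0]];
  for (let i = 1; i < samples.length - 1; i++) {
    const a = kept[kept.length - 1]; // last point we decided to keep
    const b = samples[i]; // candidate for removal
    const c = samples[i + 1]; // next measurement
    // Cross product of (b - a) and (c - a): zero means a, b, c are collinear.
    const cross =
      (b.time - a.time) * (c.value - a.value) -
      (b.value - a.value) * (c.time - a.time);
    if (Math.abs(cross) > tolerance) {
      kept.push(b);
    }
  }
  kept.push(samples[samples.length - 1]);
  return kept;
}
```

A tolerance of zero keeps the compression lossless for exactly straight segments. A larger value would trade accuracy for a smaller table.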


The principle of automatic cheating detection


I formulated the task for myself as follows: mark videos whose like / dislike chart contains a “ladder”. The steps of this ladder are detected from three adjacent statistics measurements. Two things are taken into account: the angle between two straight lines (one drawn between the first and second measurements, the other between the second and third) and the lengths of these segments. Charts with many small irregularities are also marked.
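
The three-measurement check could look something like the sketch below. It is not the utility's actual code: the `Point` type, the function names, the thresholds, and the assumption that time and counts have already been scaled to comparable units are all introduced here for illustration.

```typescript
// A measurement after scaling: x = time, y = like (or dislike) count.
// How the two axes are scaled against each other is an assumption.
interface Point {
  x: number;
  y: number;
}

// Angle (in radians) between segment p1->p2 and segment p2->p3.
// Assumes x strictly increases, so both directions lie in (-PI/2, PI/2).
function turnAngle(p1: Point, p2: Point, p3: Point): number {
  const first = Math.atan2(p2.y - p1.y, p2.x - p1.x);
  const second = Math.atan2(p3.y - p2.y, p3.x - p2.x);
  return Math.abs(second - first);
}

function segmentLength(a: Point, b: Point): number {
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  return Math.sqrt(dx * dx + dy * dy);
}

// Counts "steps": triples of adjacent measurements where the chart turns
// sharply and both adjacent segments are long enough to matter.
function countSteps(points: Point[], minAngle: number, minLength: number): number {
  let steps = 0;
  for (let i = 0; i + 2 < points.length; i++) {
    const p1 = points[i];
    const p2 = points[i + 1];
    const p3 = points[i + 2];
    const sharp = turnAngle(p1, p2, p3) >= minAngle;
    const longEnough =
      segmentLength(p1, p2) >= minLength && segmentLength(p2, p3) >= minLength;
    if (sharp && longEnough) {
      steps++;
    }
  }
  return steps;
}
```

A video would then be marked as suspicious when `countSteps` exceeds some threshold. That threshold, like `minAngle` and `minLength`, would have to be tuned by hand on real data.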


An example of a suspicious chart:

Suspicious chart example


I chose all the parameters of the algorithm by hand and checked them against the videos collected by that time, and the algorithm was changed over the course of the year. So the result for any individual video should probably not be taken too seriously. In my defense, I can say that whenever the parameters changed, a recalculation was run over all the videos collected so far, so the same algorithm was applied to every video.


In general, it is impossible to say for certain whether a particular change (or several changes) in likes / dislikes was the result of cheating. Any suspicious jump can be explained by the way CQRS works or by solar flares. Yes, one chart is smooth and another is stepped, but perhaps every video shows similar behavior from time to time? That is why, to get the overall picture, information was collected on all videos that made it into the trends.


Cheating statistics


For 2018, the algorithm produced the following results:
Suspected like cheating: 180 videos (0.32% of the total number of videos)
Suspected dislike cheating: 1,303 videos (4.45% of the total number of videos)


Videos with suspicious charts are few, but this was not always the case: in the first month of 2018, 96 such videos were recorded (more than 50% of all videos with suspicious likes for the year). In February, however, there were far fewer such videos, only 8.


Here I should probably turn to the old-timers again, who may (or may not) remember the event of January 10, 2018, when YouTube blocked many channels. For my part, I can say that among the blocked channels were some that my utility had managed to collect information on. Chart of one of the deleted videos:

image


If we assume that there really was cheating, then YouTube seems to have done a lot of work: videos with suspicious likes no longer appear in the trends every day (and those that do often look like an accident or an error). On the other hand, the drop in cheating can also be explained by the fact that, unlike dislikes, there is no point in inflating likes on videos that are already trending.


And some more statistics. On average, a tracked video gains 21,479 likes and 2,863 dislikes. For videos marked as suspicious, the same averages (likes / dislikes) are:
Suspected like cheating: 15,502 / 4,250
Suspected dislike cheating: 16,868 / 22,087
So, judging by the result, cheating on likes brings no benefit, while it is quite possible to increase the percentage of dislikes.


Videos with suspicious dislike charts are distributed unevenly. For example, on the Yevgeny Roizman channel, of the 21 videos that made it into the trends, more than half were marked by the algorithm as having inflated dislikes.


About the chart from the header of this article. Suppose there is a pool of 5-10 thousand accounts that was first given the command to put dislikes and then, before that job had finished, the same pool was given the command to put likes. A similar chart could probably result.


The strangest chart I have come across:

image
I would be grateful if someone could explain what the devil is going on here. By the way, note that statistics for this chart were collected for almost a week rather than two days.


The principle of the keyword popularity algorithm


As already mentioned, the title and the set of tags are saved for each video. The title and each tag are then split into separate words, which are run through a stemmer and saved as the video's keyword cloud.
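
A rough sketch of building such a keyword cloud is shown below. The stemmer is passed in as a function because the article does not say which one is used. The `toyStem` function is only a placeholder that strips a trailing "s", and all names here are hypothetical.

```typescript
type Stemmer = (word: string) => string;

// Splits the title and every tag into words, stems each word,
// and returns the unique stems in order of first appearance.
function keywordCloud(title: string, tags: string[], stem: Stemmer): string[] {
  const seen: { [word: string]: boolean } = Object.create(null);
  const cloud: string[] = [];
  for (const text of [title].concat(tags)) {
    // Split on anything that is not a Latin letter, a digit, or a Cyrillic letter.
    for (const word of text.toLowerCase().split(/[^a-z0-9\u0400-\u04ff]+/)) {
      if (word.length === 0) {
        continue; // split() yields empty strings around leading/trailing separators
      }
      const stemmed = stem(word);
      if (!seen[stemmed]) {
        seen[stemmed] = true;
        cloud.push(stemmed);
      }
    }
  }
  return cloud;
}

// Placeholder stemmer for the example only; a real one would handle Russian morphology.
const toyStem: Stemmer = (word) =>
  word.length > 3 && word.charAt(word.length - 1) === "s" ? word.slice(0, -1) : word;

console.log(keywordCloud("Football Elections 2018", ["football news", "Elections"], toyStem));
```

Stemming matters here because it lets different forms of the same word from titles and tags fall into a single keyword.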


Knowing when a video entered the trends and when it left, along with the video's set of keywords, you can chart how the popularity of each keyword changes. At the moment, keyword trend charts are built per day. The measure is the total time (in hours) that all videos with a given keyword spent in the trends.


Example: only two videos matching the keyword were in the trends. One stayed in the trends for 5 hours, the other for 10 hours. The popularity of the keyword is then 10 + 5 = 15.
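
The same calculation as a small sketch. The `TrendStay` type and the function name are made up for illustration, and the per-day bucketing mentioned above is left out to keep the example short.

```typescript
// One video's stay in the trends, together with its keyword cloud.
interface TrendStay {
  keywords: string[];
  enteredAt: Date;
  leftAt: Date;
}

const MS_PER_HOUR = 60 * 60 * 1000;

// Popularity of a keyword = total hours that videos carrying it spent in the trends.
function keywordPopularity(stays: TrendStay[], keyword: string): number {
  let hours = 0;
  for (const stay of stays) {
    if (stay.keywords.indexOf(keyword) !== -1) {
      hours += (stay.leftAt.getTime() - stay.enteredAt.getTime()) / MS_PER_HOUR;
    }
  }
  return hours;
}

// The example from the text: one video trends for 5 hours, another for 10 hours.
const exampleStays: TrendStay[] = [
  { keywords: ["football"], enteredAt: new Date("2018-06-14T00:00:00Z"), leftAt: new Date("2018-06-14T05:00:00Z") },
  { keywords: ["football", "news"], enteredAt: new Date("2018-06-14T02:00:00Z"), leftAt: new Date("2018-06-14T12:00:00Z") },
  { keywords: ["elections"], enteredAt: new Date("2018-06-14T00:00:00Z"), leftAt: new Date("2018-06-14T03:00:00Z") },
];

console.log(keywordPopularity(exampleStays, "football")); // 15
```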


Sample keyword popularity charts


According to the algorithm described above, the most resonant and most noticeable event of 2018 was not the elections, and not even football, but the tragedy in Kemerovo:

trend chart: Kemerovo, elections, football


Unlike all the other events, the tragedy in Kemerovo affected everyone, and videos about it pushed everything else out of the trends.


Well, a bit of politics:

image


How to try it


You can view the charts and play with the trends here.


The system is now running on Amazon Cloud, using two instances:


Under load, the web server will probably go down first, while the second server keeps collecting statistics. In other words, do not be surprised if everything stops working.

The database itself, as of 01/23/2019, can be downloaded from the link.


I also once wrote two plugins, for Chrome and Firefox. Their only benefit now: the number of likes / dislikes for each video is shown right in the list of YouTube trends.

Source: https://habr.com/ru/post/436750/