Prize named after Ilya Segalovich. The story of computer science and publications on the occasion of the launch

Today we are launching the iseg Ilya Segalovich Scientific Prize . She will be awarded for achievements in computer science. Undergraduate and graduate students may submit their own application for a prize or nominate scientific leaders. The winners will be chosen by representatives of the academic community and Yandex. The main selection criteria: availability of publications and presentations at conferences, as well as contribution to community development.

The first award will take place in April. As part of the award, young scientists will receive 350 thousand rubles each, and in addition, they will be able to go to an international conference, work with a mentor and have an internship in the research department of Yandex. Supervisors will receive 700 thousand rubles.

On the occasion of the launch of the award, we decided to tell here, on Habré, the criteria for success in the world of computer science. A part of Habr's readers are already familiar with these criteria, and the rest might have a false impression about them. Today we will eliminate this gap - we will touch on all major topics, including articles, conferences, datasets, and the transfer of scientific ideas into services.

For scientists in the field of computer science, the main criterion of success is the publication of their scientific work at one of the top international conferences. This is the first "checkpoint" recognition of the work of the researcher. For example, in the field of machine learning, in general, International Conference on Machine Learning (ICML) and Conference on Neural Information Processing Systems (NeurIPS, formerly NIPS) are distinguished. There are numerous conferences on specific areas of ML, such as computer vision, information retrieval, speech technology, machine translation, etc.

Why post your ideas

People far from computer science may be confused that it is better to keep the most valuable ideas secret and seek to benefit from their uniqueness. However, the real situation in our sphere is exactly the opposite. The credibility of a scientist is judged by the significance of his work, by how often other scientists refer to his articles (citation index). This is an important characteristic of his career. The researcher moves up the professional ladder, becoming more respected in his environment, only if he constantly gives out strong works that are published, become famous and form the basis of the work of other scientists.

Many top articles (and perhaps most) are the result of collaborations of researchers at different universities and companies in different countries of the world. Important and very valuable in the career of a researcher is the moment when he gets the opportunity to find and filter out ideas on the basis of his own experience - but even after that, his colleagues continue to provide him with invaluable help. Scientists help each other to work on ideas, write articles in collaboration - and the more a scientist’s contribution to science, the easier it is for him to find like-minded people.

Finally, the density and availability of information is now so great that different researchers at the same time have very similar (and really valuable) scientific ideas. If the idea is not published, someone will almost certainly publish it for you. The "winner" is often provided not by the one who invented the innovation a little earlier, but the one who published it a little earlier. Or - the one who managed to open the idea as fully, clearly and convincingly as possible.

Articles and datasets

So, the scientific article is built around the main idea that the researcher suggests. This idea is his contribution to computer science. The article begins with a description of the idea formulated in several sentences. Then comes the introduction, which describes the range of problems solved by the proposed innovation. Description and introduction are usually written in simple language, understandable to a wide audience. After the introduction, it is necessary to formalize the problems presented, to introduce strict notation, in mathematical language. Then, using the introduced notation, it is necessary to make an intelligible and exhaustive statement of the essence of the proposed innovation, to identify differences from previous, similar methods. All theoretical calculations must either be supported by references to previously compiled evidence, or proven independently. This can be done with any assumptions. For example, one can give evidence for the case when the data in the training is infinitely much (obviously unattainable situation) or they are completely independent of each other. Toward the end of the article, the scientist talks about the experimental results that he was able to obtain.

In order for reviewers who are attracted by conference organizers to be more likely to approve an article, it must have one or more attributes. The key factor that increases the chances for approval is the scientific novelty of the proposed idea. Often, novelty is assessed in relation to already existing ideas - moreover, the work on its assessment is performed not by the reviewer, but by the author himself. In the ideal case, the author should tell in detail in the article about the existing methods and, if possible, present them as special cases of his method. Thus, the scientist shows that the adopted approaches do not always work, that he summarized them and proposed a broader, more flexible and therefore more effective theoretical formulation. If the novelty is indisputable, then the rest of the reviewers evaluate the article not so meticulously - for example, they can turn a blind eye to bad English.

To reinforce the novelty, it is useful to add to the article a comparison with existing methods on one or more data sets. Each of them must be open, accepted in the academic environment. For example, there is an ImageNet image repository and a database of institutions such as the Modified National Institute of Standards and Technology (MNIST) and CIFAR (Canadian Institute For Advanced Research). The difficulty is that such “academic” data often differs in content structure from the actual data that the industry deals with. Different data - different results of the proposed method. Scientists, partially working for the industry, try to take this into account and sometimes insert reservations of the form “on our data the result is such and such, and on the public dataset - such and such”.

It happens that the proposed method is completely “sharpened” under an open database and does not work on real data. You can fight this common problem by opening up new, more representative datasets, but often we are talking about private content that companies simply do not have the right to open. In some cases, they carry out (sometimes complex and painstaking) anonymization of data - delete any fragments that point to a particular person. For example, the faces and numbers in photos are erased or made illegible. In addition, in order for dataset not only to be accessible to everyone, but to become a standard among scientists, where it is convenient to compare ideas, it is necessary not only to publish it, but also to write a separate cited article about it and its benefits.

It is worse when there are no open datasets in the studied topic. Then the reviewer needs to take on faith the results given by the author. Theoretically, the author can even overstate them and remain unclassified, but in an academic environment this is unlikely, since it goes against the desire of the overwhelming majority of scientists to develop science.

In a number of areas of ML, including computer vision, it is also customary to attach links to the code to the articles (usually on GitHub). In the articles themselves, the code is either very small or it is pseudocode. And here, again, there are difficulties if the article is written by a researcher from a company, and not from a university. By default, code written in a corporation or startup is NDA. Researchers and their colleagues have to make a lot of effort to separate the code related to the described idea from the internal and certainly closed repositories.

The chance of publication depends on the relevance of the chosen topic. Relevance is largely dictated by products and services: if a corporation or a startup is interested in building a new service or improving an existing one based on the idea from the article, this is a plus.

As already mentioned, articles on computer science are extremely rarely written alone. But as a rule, one of the authors spends much more time and effort than the others. His contribution to the scientific novelty is the greatest. The list of authors of such a person is indicated first - and later, referring to the article, they can only mention him (for example, “Ivanov et al” - “Ivanov and others” in Latin). However, the contribution of the others is also extremely valuable - otherwise it is impossible to be on the list of authors.

Review process

Articles usually stop taking several months before the conference. After submitting an article, reviewers have 3-5 weeks to read, evaluate, and comment on it. This happens through the single blind system, when the authors do not see the names of the reviewers, or double blind, when the reviewers themselves do not see the names of the authors. The second option is considered more impartial: in several scientific papers it was shown that the popularity of the author influences the decision of the reviewer. For example, he may consider that a scientist with a large number of articles already published is a priori worthy of a higher grade.

At the same time, even in the case of double blind, the reviewer will probably guess the author if they work in the same field. In addition, the article at the time of the review can already be published in the database arXiv - the largest repository of scientific works. Conference organizers do not prohibit this, but recommend using a different name and a different annotation in the publication for arXiv. But if the article was placed there, it would not be difficult to find it anyway.

There are always a few reviewers who rate an article. One of them is assigned the role of a meta-reviewer, who should only view the verdicts of his colleagues and make the final decision. If the reviewers differ in the assessment of the article, the meta-reviewer can also read it for completeness.

Sometimes, after reviewing the assessment and comments, the author gets the opportunity to enter into a discussion with the reviewer; there is even a chance to convince him to change the decision (however, such a system works far from not all conferences, and it is possible to seriously affect the verdict even less often). In the discussion you can not refer to other scientific works, except for those that are already referenced in the article. One can only “help” the reviewer to better understand the content of the article.

Conferences and magazines

Articles on computer science are more often sent to conferences than to scientific journals. The reason is that there are requirements for publications in journals that are harder to meet, and the review process can take months or even years. Computer science is a rapidly growing industry, so authors are usually not ready to wait for publication for so long. However, the article already accepted for the conference can then be supplemented (for example, to give more detailed results) and publish it in a journal where the restrictions on the volume are not so strict.

Conference Events

The format of the presence of authors of approved articles at the conference is determined by reviewers. If the article is given a green light, then you most often allocate a stand for a poster. A poster is a static slide with a summary of the article and illustrations. Part of the conference halls filled with long rows of stands for posters. Much of the time the author spends around his poster, talking with scientists who are interested in the article.

A slightly more prestigious option of participation is a quick lightning talk. If reviewers considered an article worthy of a quick report, the author is given about three minutes to speak to a wide audience. On the one hand, lightning talk is a good opportunity to talk about your idea not only to those who, on their own initiative, became interested in the poster. On the other hand, the initiative visitors of the poster are more prepared, more immersed in your specific topic than the average listener in the hall. Therefore, in a quick report, one must still have time to get people in the know.

Usually, at the end of their lightning talk, the authors call the poster number so that listeners can find it and better understand the article.

The last, most prestigious option is a poster plus a full-fledged presentation of the idea, when you no longer need to rush into the story.

But of course, scientists - including the authors of approved articles - come to the next conference not only to show themselves. First, for obvious reasons, they seek to find posters related to their field. And secondly, it is important for them to fill up the list of contacts for the purpose of joint academic work in the future. This is not hunting - or, at least, its very first stage, followed by at least a mutually beneficial exchange of ideas, best practices and joint work on one or several articles.

At the same time, productive networking at the top conference is difficult because of the total lack of free time. If, after a whole day spent on reports and discussions with posters, the scientist has kept his strength and has already overcome the jet-lag, then he goes to one of the many parties. They are satisfied with the corporation - as a result, the parties are often more hunting down. However, many guests use them not to find a new job, but, again, for networking. In the evening, there are no reports and posters - it is easier to “catch” the specialist you are interested in.

From idea to production

Computer science is one of a few industries where the interests of corporations and start-ups are strongly connected with the academic environment. At NIPS, ICML and other similar conferences come a lot of experts from the industry, not just from universities. For computer science, this is typical, but for most other sciences it is the other way around.

On the other hand, far from all the ideas expressed in the articles immediately go to the creation or improvement of services. Even within a company, a researcher can offer a breakthrough idea by scientific standards to service colleagues and be refused implementation for a number of reasons. One of them has already been mentioned here - this is the difference between the “academic” data set for which the article was written and the real dataset. In addition, the introduction of the idea may be delayed, require a large amount of resources, or improve only one indicator at the cost of worsening other metrics.

The situation is saved by the fact that many developers and few researchers themselves. They attend conferences, speak with academicians in the same language, offer ideas, sometimes participate in the creation of articles (for example, in writing code) or even act as authors. If a developer is immersed in an academic process, monitors what is happening in the research department, in a word, if he demonstrates a counter movement to scientists, then the cycle of turning scientific ideas into new services capabilities is reduced.

We wish all young researchers good luck and major achievements in their work. If this post did not tell you anything new, then you may have already been published at the top conference. Register for the award yourself and nominate scientific leaders.

Source: https://habr.com/ru/post/438170/