
FunCorp recently took its first steps in machine learning: our backend engineer taught search engines to read memes. To mark the occasion, we decided to organize an ML meetup where we can share our work and, at the same time, learn from more experienced specialists at other companies where machine learning is already an important part of the business. No sooner said than done: the meetup takes place on February 9th. The program is below.
Program
“Launching Discover for 90 million users: five recommendations for ML developers”, Andrey Zakonov, vk.com
About the talk
- The model is not the only thing that matters: formulating the task correctly and choosing the metrics.
- Different ways to optimize your solutions under load.
- Evaluating experiments correctly: studying graphs and working with feedback.
“Production in ML”, Mark Andreev, Conundrum.ai
About the talk
The talk will cover:
- the types of predictions: real-time, offline, and real-time + offline
- how to get from a prototype in Jupyter Notebook to a container
- scaling solutions and quality control.
“How to teach search engines to read memes”, Grigory Kuzovnikov, FunCorp
About the talk
iFunny is an app with funny pictures and videos. The only textual content available is user comments, which is not enough to attract traffic from search engines, so we decided to extract text from the images and place it on the pages. We built a dedicated service for this that:
- finds the area of the picture containing the "main joke"
- extracts text from this area
- checks the quality of the recognized text.
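The announcement does not say how the quality of the recognized text is checked. As a purely illustrative sketch, one simple heuristic is to measure what share of the recognized tokens are real words. Everything below is an assumption made for illustration: the `KNOWN_WORDS` vocabulary, the `text_quality` and `is_publishable` helpers, and the thresholds are hypothetical and are not taken from the iFunny service.

```python
import re

# Hypothetical tiny vocabulary; a real service would load a full word list.
KNOWN_WORDS = {"when", "you", "the", "cat", "is", "monday", "me", "my", "boss", "on"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def text_quality(text, vocabulary=KNOWN_WORDS):
    """Return the share of tokens found in the vocabulary, from 0.0 to 1.0."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in vocabulary)
    return known / len(tokens)


def is_publishable(text, min_quality=0.6, min_tokens=3):
    """Accept text only if it is long enough and mostly made of known words.

    Both thresholds are arbitrary assumptions chosen for this example.
    """
    return len(tokenize(text)) >= min_tokens and text_quality(text) >= min_quality
```

With this heuristic, a caption such as "When you see the cat on Monday" passes, while OCR noise such as "xq7 zzk w1lp" is rejected because none of its tokens are in the vocabulary.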
The service is written in Python using TensorFlow. Nobody on the team had any experience developing ML services, so we went through every step:
- Defining the task.
- First experiments, when we tried to build something that works at least somehow while experimenting with neural network architectures.
- Assembling a training set.
- Training and selecting model coefficients.
- Building a service around our trained model and wrapping it in a Docker container.
- Deploying the service and hooking it up to our PHP monolith. A dry run.
- First results and the feedback on them.
- Using the recognition results in production.
- Analyzing the results.
- This is where we are now. We still have to rework and retrain the models to increase the number of correctly recognized memes.
“Machine learning in Yandex.Taxi”, Roman Khalkechev, Yandex.Taxi
About the talk
The talk will cover how Yandex.Taxi is built.
It will go into detail on:
- the tasks that we solve using data analysis and machine learning
- our pipeline for developing, testing, and launching machine learning models in production
- every stage along the way, from experiments in Jupyter Notebook to full-fledged ML production.
“Getting rid of the curse of Sklearn: writing XGBoost from scratch”, Artyom Khapkin, Mail.ru Group
About the talk
A talk about boosting: what you need to know to write it yourself, what the pitfalls are, and how you can improve its performance.
Today it is hard to imagine an area where boosting ensembles over decision trees are not used: search engines, recommendation ranking algorithms, Kaggle competitions, and much more.
There are many ready-made implementations of the algorithm: CatBoost, LightGBM, XGBoost, and others. However, there are cases where using an off-the-shelf solution is not a great fit: the understanding of how the algorithm works is lost, and for certain tasks such implementations are simply not well suited.
In this talk, we will go through the principles behind the algorithm and, moving from simple to complex, implement our own XGBoost-style algorithm, which can then be adapted to any machine learning task: classification, regression, ranking, and so on.
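To give a flavor of the "simple" end of that path, here is a minimal sketch of plain gradient boosting for regression with squared error, using one-split trees (stumps) on a single feature. It is not the speaker's implementation and it is not XGBoost itself: it leaves out second-order gradients, regularization, and multi-feature trees. The function names and default parameters are assumptions made for this example.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to 1-D inputs by least squares."""
    mean_all = sum(residuals) / len(residuals)
    # Fallback when no split is possible: predict the mean everywhere.
    best = (float("inf"), float("inf"), mean_all, mean_all)
    order = sorted(set(xs))
    for low, high in zip(order, order[1:]):
        threshold = (low + high) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit an additive model: each stump is trained on the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - f)^2 the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
        stumps.append(stump)
    return base, learning_rate, stumps


def predict_boosting(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)
```

Fitting this to a noiseless sine wave, for example `fit_boosting(xs, ys, n_rounds=100, learning_rate=0.3)`, brings the training error well below that of predicting the mean. Swapping the residual for the gradient of another loss is the step that extends the same loop to classification or ranking.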
More information is available on Telegram.
You can register on Timepad. The number of seats is limited.
For those who cannot come or do not manage to register in time, there will be a broadcast on our channel.