
How to create a Data Science department and not screw it up



Data Science is making its way not only into large companies but also into small ones, and even into startups. However, top managers very often do not understand what it takes to apply it successfully. Many think that a single data scientist will solve all of the company's problems within a month, and that artificial intelligence will start working perfectly in every department at the click of a button. Unfortunately, this is not the case. My name is Ivan Serov, and in this post I will explain where to start when creating a DS department and what difficulties you may run into.

Management expectations


One of the most important things when creating a department is to set expectations and KPIs right away. With DS, as with any other innovation, you have to go through the whole cycle, which begins with operating losses. At best, the costs of architecture and specialists are recouped in six months; more often it takes one to three years, depending on the size of the company. You must be prepared for this and not give up after a couple of failures. Top managers often close the department after a year because it has not had time to turn a profit, and trust in DS is lost as a result. Only by setting the right expectations and goals (preferably SMART ones) can you build a successful department.



Start small


The best way to start is with a so-called proof-of-concept project: one that is short and not very complicated, but that can bring real value to the business. For example, increasing revenue by 2% with a recommender system. You should not try to build an ensemble of five custom neural networks and work on it all year. Even for text classification projects, you can start with simple approaches (such as bag of words) and already get an improvement. This pilot project becomes a starting point for further development and shows management that the money is being spent on useful things and that DS is worth developing. That, in turn, buys time to work on more complex things. If you lack the competences in-house, it makes sense to hire an external team of DS consultants for the pilot project. They can implement your ideas with reasonably good quality, or help you understand which projects are feasible in your industry, where to start, and how to build your AI strategy going forward.
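To give a feel for how small a bag-of-words baseline for text classification can be, here is a minimal sketch in plain Python. It pairs bag-of-words counts with a multinomial naive Bayes classifier, which is my own choice for illustration and not something the pilot has to use. The class name `BagOfWordsNB`, the ticket texts and the labels are all made up.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class BagOfWordsNB:
    """Multinomial naive Bayes over bag-of-words counts (Laplace smoothing)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data: support tickets labelled by topic.
train_texts = [
    "my order has not arrived yet",
    "where is my package delivery",
    "I was charged twice for one order",
    "refund the payment to my card",
]
train_labels = ["delivery", "delivery", "billing", "billing"]

model = BagOfWordsNB().fit(train_texts, train_labels)
prediction = model.predict("the package is late")
print(prediction)
```

In a real pilot you would train on thousands of labelled texts and measure quality on a held-out set, but even a baseline of this size gives you something to compare more complex models against.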



Collect data


Everything here is simple and difficult at the same time: ideally, the company should use all the data it has. For example, if you are an online retailer, you have, at a minimum, data on sales of specific products, customer behavior on the site, and marketing newsletters. Many models can already be built on this, for example a personalized mailing system.

In practice, collecting all of the company's data in one database is often a big problem because of the variety of sources, the lack of clear interaction between departments, or even the absence of BI specialists in the company. Organizations that store all of their data in Excel should first move it into a database (SQL) and only then think about DS.
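As a rough illustration of moving spreadsheet data into SQL, the sketch below loads two CSV exports into SQLite using only the Python standard library. It assumes the Excel sheets have already been exported to CSV; the table names, column names and rows are hypothetical, and a real setup would most likely target a server database rather than an in-memory one.

```python
import csv
import io
import sqlite3

# Hypothetical CSV exports of two spreadsheets kept by different departments.
SALES_CSV = """order_id,customer_id,product,amount
1,10,kettle,25.00
2,11,toaster,40.00
3,10,mug,5.50
"""
CUSTOMERS_CSV = """customer_id,email
10,anna@example.com
11,boris@example.com
"""


def load_csv(conn, table, csv_text):
    """Create `table` from the CSV header and insert every row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
load_csv(conn, "sales", SALES_CSV)
load_csv(conn, "customers", CUSTOMERS_CSV)

# Once the data is in one place, a cross-department question is a single query.
revenue_by_customer = conn.execute(
    """
    SELECT c.email, SUM(CAST(s.amount AS REAL)) AS revenue
    FROM sales AS s
    JOIN customers AS c ON c.customer_id = s.customer_id
    GROUP BY c.email
    ORDER BY c.email
    """
).fetchall()
print(revenue_by_customer)
```

The columns are created without types here to keep the loader generic, which is why the query casts `amount` explicitly. In practice you would define a proper schema.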

All available data should be collected in a form that is convenient for analysts and data scientists to query (most often SQL). Agree in advance with the BI department on the form in which you want to receive the data, process it, and use it in production.
If you have little data of your own, you can buy more from third-party companies. For example, buy data from a telecom operator, link it to yours by phone number, and thus enrich your data set. In each such case, however, you need to calculate whether the benefit justifies the cost.
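Linking purchased data by phone number might look roughly like the sketch below. The field names (`phone`, `avg_monthly_spend`) and the records are invented for illustration. The match rate at the end is only one input into deciding whether the purchase pays off.

```python
import re


def normalize_phone(raw):
    """Keep digits only so that differently formatted numbers match."""
    return re.sub(r"\D", "", raw)


def enrich(customers, purchased):
    """Left-join purchased attributes onto our customers by phone number."""
    by_phone = {normalize_phone(row["phone"]): row for row in purchased}
    enriched = []
    for customer in customers:
        extra = by_phone.get(normalize_phone(customer["phone"]), {})
        merged = dict(customer)
        # Copy every purchased field except the join key itself.
        merged.update({k: v for k, v in extra.items() if k != "phone"})
        enriched.append(merged)
    return enriched


# Made-up records: our own customers and a purchased telecom data set.
customers = [
    {"customer_id": 1, "phone": "+7 (900) 111-22-33"},
    {"customer_id": 2, "phone": "+7 900 444 55 66"},
]
telecom = [
    {"phone": "79001112233", "avg_monthly_spend": 15.0},
]

enriched = enrich(customers, telecom)
# Share of our customers for whom the purchased data adds anything.
match_rate = sum("avg_monthly_spend" in row for row in enriched) / len(enriched)
print(enriched)
print(match_rate)
```

If only a small share of your customers can be matched, the purchased data is unlikely to improve your models much, whatever it costs.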



Find analysts


It is important that the company already has an analytics department by the time the DS department is created. These are the people who will help data scientists find the data, explain what it means, show how to build the necessary variables correctly, and much more. Analytics is the company's first step toward data-driven decision making (that is, when all decisions in the company are based on data rather than on the wishes of management). Analysts help extract value from the data even without models, and their reports help management make the right decisions. Later on, analysts will also monitor the state of all DS models and prepare reports on their results.

Pick a team


Many articles have already been written on this topic, so I will only try to summarize what has been said. A good DS team most often consists of:


All of these roles are flexible and can vary depending on your needs. For example, sometimes the team also includes a business analyst, sometimes there are several data scientists at once, and sometimes the data engineer and the developer are one person. There are many possible team setups, and you need to start from your own needs. Or try several options and choose the best one.

Besides the standard team, creating a department from scratch requires not only good specialists from the list above but also an evangelist who will explain to everyone what DS is and how it can benefit other departments: the Chief AI Officer / Chief Data Officer / Chief Digital Officer (choose the title yourself). It is important to mention that if you hire a single data scientist and load them with the tasks of an analyst, an architect, and a developer as well, you should not expect quick results. Moreover, this can drain that person's motivation and cost the company a department that could have been successful.

If the company is big and has many opportunities for Big Data, it also needs a Data Architect, who will set up the architecture and multi-stream data collection and deploy Hadoop or Spark (systems for processing large data sets) for the company's data scientists to work with.



Do not forget about internal communications and training


After the pilot project, you need to actively develop the team. Companies should organize at least two types of training:
For data scientists: workshops on various topics, weekly meetings, hackathons, and master classes. It is also worth buying online courses for the team (for example, on Coursera) and perhaps even including them in the KPIs. This helps keep the team up to date in a rapidly growing field and improves internal communication.
For project managers and top managers: workshops in the form of business case analyses or reviews of companies' AI strategies, or, for example, a basic introduction to machine learning and deep learning (what can and cannot be done, the fundamentals of the technology). This helps management form realistic expectations of DS.

Also, most likely there are already interested people in the company even before the DS department is created: developers who have taken some DS courses, or people from the business side who want to become DS project managers. They should be brought into the department and helped to develop. For example, by training a developer in machine learning techniques you can get a good, motivated specialist who knows the company from the inside and costs less than the average data scientist on the market, who would also need time to figure everything out.



External communication is important


This item is often forgotten, but it is no less important than the others. The machine learning job market has a serious shortage of specialists (the situation has started to improve in recent years, but still). Every good data scientist knows their worth and chooses the company they want to work for, so offering a large salary is no longer enough: you need to attract people with interesting projects. To do this, build your external communications carefully: work with the media, opinion leaders, and the community, talk about completed projects, write articles for specialized publications, speak at conferences, and possibly sponsor industry events such as hackathons. This is only a small part of what you can do to attract talent to the company.

That's all. In conclusion, I will just note that I deliberately did not cover the difficulties that arise in the day-to-day work of a Data Science department, only what is needed to create one. If you have something to add, you are welcome in the comments.

Source: https://habr.com/ru/post/436052/