
GitHub is not just a platform for hosting and collaboratively developing IT projects, but also a huge knowledge base compiled by hundreds of experts. Fortunately, the service offers not only tools for working with open source, but also high-quality learning materials. We selected some popular repositories and sorted them by number of stars in descending order.
This compilation will help you figure out which repositories deserve your attention if you are interested in working with data and in deep learning.
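The ordering used in this compilation is a plain descending sort on star count. The snippet below is a minimal sketch of that step, using three entries and the figures quoted in the Data science section; the `Repo` tuple and the variable names are hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class Repo(NamedTuple):
    """Hypothetical record holding the figures quoted in this article."""
    name: str
    stars: int
    forks: int


# Deliberately unordered input, with numbers taken from the article.
repos = [
    Repo("Data Science Blogs", 4510, 1178),
    Repo("The Open Source Data Science Masters", 11227, 4737),
    Repo("Awesome data science", 9240, 2761),
]

# Sort by star count, largest first.
ranked = sorted(repos, key=lambda r: r.stars, reverse=True)

for r in ranked:
    print(f"{r.name}: {r.stars:,} stars, {r.forks:,} forks")
```

Sorting on `forks` instead would only require changing the `key` function.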
Data science
The Open Source Data Science Masters
Stars: 11,227, Forks: 4,737
The official repository of the Data Science Masters curriculum, developed as an open source alternative to formal education in data science. The repository is a collection of educational materials gathered over several years.
Awesome data science
Stars: 9,240, Forks: 2,761
A powerful collection that answers the questions “What is data science?” and “What do you need to know to understand it well?” It is conveniently categorized: for example, there is a list of books on data science, a selection of infographics, and even thematic groups on Facebook.
Jupyter Interactive Notebook
Stars: 5,242, Forks: 2,313
The predecessor of this repository is Data Science IPython Notebooks, a platform for working with scripts in 40 programming languages, which gained more than 14,000 stars and 4,000 forks. Data processing and machine learning specialists actively used it for scientific computing.
Today, Jupyter Notebook is a handy collection of notebook files made up of cells in which queries are written and executed. With the built-in visualizers, a notebook with a set of queries turns into a full-fledged data dashboard.
Data Science Blogs
Stars: 4,510, Forks: 1,178
A simple but extensive list of educational materials, sorted alphabetically. Here you will find all popular blogs, as well as many small sites with useful information (a total of 251 resources are listed).
Data Science Specialization
Stars: 3,114, Forks: 27,184
The repository of the Johns Hopkins Data Science course, a very popular course prepared by Roger Peng, Jeff Leek, and Brian Caffo. To be more precise, the data science specialization on Coursera consists of several interrelated courses on various aspects of data analysis (for example, R Programming), and the repository in this compilation brings together the materials used across all of them.
Spark Notebook
Stars: 2,677, Forks: 587
Spark Notebook is an open source notebook with an interactive web editor that can combine Scala code, SQL queries, markup, and JavaScript for collaborative data analysis and exploration.
Learn data science
Stars: 2,129, Forks: 1,210
A collection of IPython notebooks focused on the fundamental concepts of machine learning for beginners.
Data Science at the Command Line
Stars: 2,057, Forks: 503
The repository contains the texts, data, scripts, and custom command-line tools used in the book Data Science at the Command Line. This practical guide demonstrates how to combine small but powerful command-line tools to quickly obtain, scrub, explore, and model data.
Data Science Specialization Community Site
Stars: 1,395, Forks: 2,661
Several students who took the course at Johns Hopkins University created such high-quality content that university staff shared it and also made a catalog for all the interesting content created by the community.
Data Visualization for the Web
D3
Stars: 81,837, Forks: 20,282
D3 is a JavaScript data visualization library for HTML and SVG. Its emphasis on web standards lets you use the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components with a data-driven approach to manipulating the Document Object Model (DOM). This is the most popular data visualization project on GitHub.
Chart.js
Stars: 41,393, Forks: 9,294
Chart.js is an HTML5 library that renders visualizations through the <canvas> element. It positions itself as a simple, flexible, and interactive tool that supports six chart types.
ECharts
Stars: 32,204, Forks: 9,369
ECharts is a browser-based charting and visualization library. It is easy to use, intuitive, and easy to configure.
Leaflet
Stars: 23,810, Forks: 3,937
A JavaScript library for creating mobile-friendly interactive maps. The library code is remarkably small, and it is designed for simple, fast, and convenient use. Leaflet's functionality can be extended through a set of plugins.
Sigma.js
Stars: 8,348, Forks: 1,305
A JavaScript library focused on drawing graphs. Sigma lets you build graph views on web pages and integrate them into web applications.
Vega
Stars: 6,559, Forks: 702
Vega is a declarative language for creating, saving, and sharing interactive visualization designs. With it, you can describe the appearance and interactive behavior of a visualization in JSON format and generate web views using Canvas or SVG. Vega provides basic building blocks for a wide range of visualization designs: data loading and transformation, scales, map projections, legends, graphical marks, etc.
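To make the "visualization described in JSON" idea concrete, here is a minimal sketch that assembles a Vega bar-chart specification as a Python dictionary and serializes it. The layout follows the usual top-level Vega sections (`data`, `scales`, `axes`, `marks`); the star counts come from this article, while the helper name `build_bar_chart_spec`, the chart dimensions, and the schema URL are assumptions chosen for illustration and may need adjusting for the Vega version in use.

```python
import json

# Star counts quoted in this article for three visualization libraries.
STARS = [
    {"repo": "D3", "stars": 81837},
    {"repo": "Chart.js", "stars": 41393},
    {"repo": "ECharts", "stars": 32204},
]


def build_bar_chart_spec(rows):
    """Return a Vega bar-chart specification as a plain dict (hypothetical helper)."""
    return {
        # Assumed schema URL; change it to match your Vega version.
        "$schema": "https://vega.github.io/schema/vega/v5.json",
        "width": 300,
        "height": 200,
        # Data loading: inline values under a named data set.
        "data": [{"name": "table", "values": rows}],
        # Scales map data values to pixel positions.
        "scales": [
            {
                "name": "x",
                "type": "band",
                "domain": {"data": "table", "field": "repo"},
                "range": "width",
                "padding": 0.1,
            },
            {
                "name": "y",
                "type": "linear",
                "domain": {"data": "table", "field": "stars"},
                "range": "height",
                "nice": True,
            },
        ],
        "axes": [
            {"orient": "bottom", "scale": "x"},
            {"orient": "left", "scale": "y"},
        ],
        # Graphical marks: one rectangle per data row.
        "marks": [
            {
                "type": "rect",
                "from": {"data": "table"},
                "encode": {
                    "enter": {
                        "x": {"scale": "x", "field": "repo"},
                        "width": {"scale": "x", "band": 1},
                        "y": {"scale": "y", "field": "stars"},
                        "y2": {"scale": "y", "value": 0},
                    }
                },
            }
        ],
    }


spec_json = json.dumps(build_bar_chart_spec(STARS), indent=2)
print(spec_json)
```

The printed JSON is the whole chart definition: rendering it to Canvas or SVG is left to the Vega runtime in the browser, which is why the description can be saved and shared as a plain file.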
DC.js
Stars: 6,458, Forks: 1,734
DC.js is a multidimensional charting library built on D3.js to work with crossfilter. It renders charts as CSS-friendly SVG and is designed for powerful data analysis in the browser and on mobile devices.
Epoch
Stars: 4,949, Forks: 290
A general-purpose real-time visualization library. It focuses on two different aspects: basic charts for creating historical reports and real-time charts for displaying frequently updated time series data.
Deep learning
Keras
Stars: 37,611, Forks: 14,344
Keras is a deep learning library written in Python that can run on top of the TensorFlow, Theano, and CNTK libraries. Keras is designed for rapid experimentation, since the key to good research is being able to go from idea to result with the least possible delay. Thanks to its thorough and accessible documentation, Keras rightfully earns a place in our selection.
Caffe
Stars: 26,892, Forks: 16,276
Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning library with Python and MATLAB bindings. In essence, it is a general-purpose library designed for deploying convolutional networks and for image, speech, or multimedia recognition.
There is also the Caffe2 project, which adds new features, in particular recurrent neural networks. In May 2018, the Caffe2 and PyTorch teams merged, and the Caffe2 code was moved to the PyTorch repository (Stars: 24,075, Forks: 5,707).
MXNet
Stars: 16,157, Forks: 5,824
A lightweight, compact, and flexible distributed deep learning framework for Python, R, Julia, Scala, Go, JavaScript, etc. For better performance, MXNet lets you mix imperative and symbolic programming. The project also contains guidelines for building other deep learning systems.
Data Science IPython Notebooks
Stars: 14,747, Forks: 4,410
A collection of IPython notebooks covering big data, Hadoop, scikit-learn, scientific computing libraries, etc. As for deep learning, TensorFlow, Theano, Caffe, and other tools are covered.
ConvNetJS
Stars: 9,510, Forks: 1,982
ConvNetJS is a JavaScript implementation of neural networks and their common modules. The project is no longer maintained, but it still deserves attention: it lets you train a convolutional (or regular) network directly in the browser.
Deeplearning4j
Stars: 10,227, Forks: 4,570
A deep learning library for Java and Scala. It integrates with Hadoop and Spark and supports GPU computation with CUDA. There are also tools for working with the library from Python. The repository contains all the necessary documentation and tutorials.
LISA Lab Deep Learning Tutorials
Stars: 3,673, Forks: 2,045
Tutorials from the University of Montreal. The materials introduce some of the most important deep learning algorithms and demonstrate how to work with Theano. Theano is a Python library that simplifies writing deep learning models and makes it possible to train them on the GPU.
This list by no means exhausts the interesting things on GitHub. Next time we’ll talk about machine learning projects and open datasets. If you have your own examples of interesting repositories, share them in the comments.