A Different GitHub: Repositories for Data Science, Data Visualization, and Deep Learning


GitHub is not just a platform for hosting and collaboratively developing IT projects but also a huge knowledge base compiled by hundreds of experts. Fortunately, the service offers not only tools for working with open source but also high-quality learning materials. We selected some popular repositories and, within each section, sorted them by number of stars in descending order.

This compilation should help you decide which repositories deserve your attention if you work with data or are interested in deep learning.
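As a small illustration of the ordering used in this list, here is a minimal Python sketch that sorts a handful of repositories by star count. The names and numbers are taken from the Data Science section; the helper functions `parse_count` and `sort_by_stars` are hypothetical and written only for this example. Counts are kept as strings because such figures are often written with either commas or spaces as thousands separators.

```python
import re

# A few star counts from this list, stored as strings.
# Thousands separators may be commas or spaces.
REPOS = {
    "The Open Source Data Science Masters": "11,227",
    "Awesome Data Science": "9 240",
    "Data Science Blogs": "4,510",
    "Jupyter Interactive Notebook": "5 242",
}


def parse_count(text: str) -> int:
    """Strip thousands separators (commas or whitespace) and return an int."""
    return int(re.sub(r"[,\s]", "", text))


def sort_by_stars(repos: dict[str, str]) -> list[tuple[str, int]]:
    """Return (name, stars) pairs ordered from most to fewest stars."""
    pairs = [(name, parse_count(stars)) for name, stars in repos.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, stars in sort_by_stars(REPOS):
        print(f"{stars:>6,}  {name}")
```

Normalizing the counts to integers before sorting matters: sorting the raw strings would compare them character by character and give the wrong order.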

Data Science


The Open Source Data Science Masters
Stars: 11,227, Forks: 4,737

The official repository of the Open Source Data Science Masters curriculum, developed as an open source alternative to formal education in data science. It is a collection of educational materials gathered over several years.

Awesome Data Science
Stars: 9,240, Forks: 2,761

A powerful collection that answers the questions "What is data science?" and "What do you need to know to understand it well?" It is conveniently categorized: for example, there is a list of books on data science, a selection of infographics, and even thematic groups on Facebook.

Jupyter Interactive Notebook
Stars: 5,242, Forks: 2,313

The forerunner of this repository is Data Science IPython Notebooks, a platform for working with scripts in 40 programming languages, which has gained more than 14,000 stars and 4,000 forks. Data processing and machine learning specialists have actively used it for scientific computing.

Today, Jupyter Notebook is a handy collection of notebook files made up of cells in which queries are written and executed. With the built-in visualizers, a notebook with a set of queries turns into a full-fledged data dashboard.

Data Science Blogs
Stars: 4,510, Forks: 1,178

A simple but extensive list of educational resources, sorted alphabetically. Here you will find the popular blogs as well as many small sites with useful information (251 resources in total).

Data Science Specialization
Stars: 3,114, Forks: 27,184

This is the repository of the Johns Hopkins Data Science course, a very popular program prepared by Roger Peng, Jeff Leek, and Brian Caffo. More precisely, the Data Science Specialization on Coursera consists of several interrelated courses (for example, R Programming) covering various aspects of data analysis, and this repository brings together the materials used across all of them.

Spark Notebook
Stars: 2,677, Forks: 587

Spark Notebook is an open source notebook with an interactive web editor that can combine Scala code, SQL queries, Markup, and JavaScript for collaborative data analysis and exploration.

Learn Data Science
Stars: 2,129, Forks: 1,210

A collection of IPython notebooks focused on fundamental machine learning concepts for beginners.

Data Science at the Command Line
Stars: 2,057, Forks: 503

The repository contains the text, data, scripts, and custom command-line tools used in the book Data Science at the Command Line. This practical guide demonstrates how to combine small but powerful command-line tools to quickly obtain, clean, explore, and model data.

Data Science Specialization Community Site
Stars: 1,395, Forks: 2,661

Several students who took the Johns Hopkins University course created such high-quality content that the university staff shared it and also compiled a catalog of all the interesting content created by the community.

Data Visualization for the Web


D3
Stars: 81,837, Forks: 20,282

D3 is a JavaScript library for visualizing data with HTML and SVG. D3 emphasizes web standards, so you get the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components with a data-driven approach to manipulating the Document Object Model (DOM). It is the most popular data visualization project on GitHub.

Chart.js
Stars: 41,393, Forks: 9,294

Chart.js is an HTML5 library that renders visualizations with the <canvas> element. It positions itself as a simple, flexible, and interactive tool that supports six chart types.

ECharts
Stars: 32,204, Forks: 9,369

ECharts is a browser-based charting and visualization library. It is easy to use, intuitive, and highly configurable.

Leaflet
Stars: 23,810, Forks: 3,937

A JavaScript library for mobile-friendly interactive maps. The library code is remarkably small and is designed for simple, fast, and convenient use. Leaflet's functionality can be extended with plug-ins.

Sigma.js
Stars: 8,348, Forks: 1,305

A JavaScript library dedicated to graph drawing. Sigma lets you build graph (network) views on web pages and integrate them into web applications.

Vega
Stars: 6,559, Forks: 702

Vega is a declarative language for creating, saving, and sharing interactive visualization designs. With it, you describe the appearance and interactive behavior of a visualization in JSON format and generate web-based views using Canvas or SVG. Vega provides basic building blocks for a wide range of visualization designs: data loading and transformation, scales, map projections, axes, legends, graphical marks, and so on.
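To give a feel for what describing a visualization in JSON looks like, here is a minimal Python sketch that assembles a small Vega bar chart specification as a dictionary and serializes it. The star counts come from this list; the field names `library` and `stars`, the chart dimensions, and the choice of the v5 schema are assumptions made for illustration. The sketch only produces the JSON text; rendering it would require the Vega runtime in a browser, which is not shown here.

```python
import json

# Star counts for three libraries from this list.
values = [
    {"library": "D3", "stars": 81837},
    {"library": "Chart.js", "stars": 41393},
    {"library": "ECharts", "stars": 32204},
]

# A minimal bar chart spec: data, scales, axes, and one group of rect marks.
# Dimensions and field names are illustrative assumptions.
spec = {
    "$schema": "https://vega.github.io/schema/vega/v5.json",
    "width": 300,
    "height": 200,
    "data": [{"name": "table", "values": values}],
    "scales": [
        {
            "name": "x",
            "type": "band",
            "domain": {"data": "table", "field": "library"},
            "range": "width",
            "padding": 0.1,
        },
        {
            "name": "y",
            "type": "linear",
            "domain": {"data": "table", "field": "stars"},
            "range": "height",
            "nice": True,
        },
    ],
    "axes": [
        {"orient": "bottom", "scale": "x"},
        {"orient": "left", "scale": "y"},
    ],
    "marks": [
        {
            "type": "rect",
            "from": {"data": "table"},
            "encode": {
                "enter": {
                    "x": {"scale": "x", "field": "library"},
                    "width": {"scale": "x", "band": 1},
                    "y": {"scale": "y", "field": "stars"},
                    "y2": {"scale": "y", "value": 0},
                }
            },
        }
    ],
}

# Serialize to the JSON text that a Vega runtime would consume.
spec_json = json.dumps(spec, indent=2)

if __name__ == "__main__":
    print(spec_json)
```

Note that the spec contains no drawing code: it only declares which data fields map to which visual properties, and the runtime decides how to draw the result.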

DC.js
Stars: 6,458, Forks: 1,734

DC.js is a multidimensional charting library built on D3.js and designed to work with Crossfilter. It renders charts in CSS-friendly SVG format and is intended for powerful data analysis in the browser and on mobile devices.

Epoch
Stars: 4,949, Forks: 290

A general-purpose real-time visualization library. It focuses on two different aspects: basic charts for historical reports and real-time charts for displaying frequently updated time series data.

Deep Learning


Keras
Stars: 37,611, Forks: 14,344

Keras is a deep learning library written in Python that runs on top of TensorFlow, Theano, or CNTK. It is designed for rapid experimentation, since the key to good research is being able to go from idea to result with the least possible delay. Thanks to its thorough and accessible documentation, Keras rightfully earns a place in our selection.

Caffe
Stars: 26,892, Forks: 16,276

Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning library with Python and MATLAB bindings. In essence, it is a general-purpose library for deploying convolutional networks and for image, speech, or multimedia recognition.

There is also the Caffe2 project, which adds new features, in particular recurrent neural networks. In May 2018 the Caffe2 and PyTorch teams merged, and the Caffe2 code was moved to the PyTorch repository (stars: 24,075, forks: 5,707).

MXNet
Stars: 16,157, Forks: 5,824

A lightweight, portable, and flexible framework for distributed deep learning with support for Python, R, Julia, Scala, Go, JavaScript, and other languages. For better performance, MXNet lets you mix imperative and symbolic programming. The project also contains guidelines for building other deep learning systems.

Data Science IPython Notebooks
Stars: 14,747, Forks: 4,410

A collection of IPython notebooks covering big data, Hadoop, scikit-learn, scientific computing libraries, and more. For deep learning, it covers TensorFlow, Theano, Caffe, and other tools.

ConvNetJS
Stars: 9,510, Forks: 1,982

ConvNetJS is a JavaScript implementation of neural networks and their common modules. The project is no longer maintained but still deserves attention. It lets you train a convolutional (or ordinary) network directly in the browser.

Deeplearning4j
Stars: 10,227, Forks: 4,570

A deep learning library for Java and Scala. It integrates with Hadoop and Spark and supports GPU computation through CUDA. There are also tools for working with the library from Python. The repository contains all the necessary documentation and tutorials.

LISA Lab Deep Learning Tutorials
Stars: 3,673, Forks: 2,045

Tutorials from the University of Montreal. The materials introduce some of the most important deep learning algorithms and show how to work with Theano, a Python library that simplifies writing deep learning models and makes it possible to train them on a GPU.

This list by no means exhausts the interesting things on GitHub. Next time we will talk about machine learning projects and open datasets. If you have your own examples of interesting repositories, share them in the comments.

Source: https://habr.com/ru/post/437940/