Hello.

There is a web project, and a flurry of requests is expected soon. There is also a strong desire to log every call, and more besides, for later analysis. We do not want to use third-party tools.

The data is not very bulky: a couple of string fields, a date/time, an identifier, and a multi-valued field. Therefore, we decided to use MongoDB for storage in conjunction with django. Nothing about it is complicated, but one question still worries me: how well will this representative of NoSQL solutions cope with the task?
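To make the record shape concrete, here is a minimal sketch of one log entry as a plain Python dict, which is the form a document store accepts. All field names (`session_id`, `url`, `referrer`, `created_at`, `tags`) are hypothetical; the post only says there are a couple of strings, a date/time, an identifier, and a multi-valued field.

```python
from datetime import datetime, timezone


def make_log_entry(session_id, url, referrer, tags):
    """Build one log document. Every field name here is an assumption."""
    return {
        "session_id": session_id,                  # the identifier
        "url": url,                                # first string field
        "referrer": referrer,                      # second string field
        "created_at": datetime.now(timezone.utc),  # the date/time
        "tags": list(tags),                        # multi-valued field, stored inline as an array
    }


entry = make_log_entry(42, "/catalog", "/home", ["click", "scroll"])
```

The multi-valued field lives inside the document itself, so no separate one-to-many table is needed for it.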

I would like to know whether anyone has already analyzed it under a barrage of requests. What are MongoDB's strengths and weaknesses? With further data processing in mind, how sensible is it to use this particular solution instead of, for example, PostgreSQL? Relational databases have one significant drawback: they cannot store multi-valued fields, which forces one-to-many and many-to-many relations, and that can hurt performance, which in turn is very critical here.

UPD

The essence of the task is to collect all of a user's movements on our pages and send these statistics to our server as they happen. This is implemented in jQuery. Each page will send from 3 to 7 requests, and the predicted number of users per day starts at 1,000,000. This most likely means that the database will be the bottleneck.
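A rough back-of-the-envelope estimate of the write rate these figures imply can be sketched as follows. The number of pages each user views is not given in the post, so `pages_per_user=1` is an assumption that yields a lower bound, and the result is a daily average, not a peak.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_writes_per_second(users_per_day, pages_per_user, requests_per_page):
    """Average insert rate if traffic were spread evenly over the day."""
    return users_per_day * pages_per_user * requests_per_page / SECONDS_PER_DAY


# Assumption: each user views a single page (the post does not say).
low = average_writes_per_second(1_000_000, 1, 3)
high = average_writes_per_second(1_000_000, 1, 7)
print(f"{low:.1f} to {high:.1f} writes per second on average")
```

With one page per user this comes to roughly 35 to 81 inserts per second on average; real traffic is uneven and users usually view several pages, so peak load would be a multiple of that.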

  • @AlexWindHope, thanks :) I hadn't noticed. - Dex
  • You have, in fact, partly answered your own question ... Elsewhere you will not gain much (unless you count joins and the like; if you do count them, the gain is quite tangible), whereas inserts get a very serious speedup, unless you use the SQL PK AI scheme. Besides, ALTER TABLE is a terror that flaps in the night :D So with any change you don't have to agonize over the database structure, you just do what you need. PS: as for the flurry of requests, everything is as before: caching and well-chosen indexes save the day - Zowie
  • @AlexWindHope, caching is not needed, because the task is to write everything, absolutely every request. The database will be read, say, once a month to build a report. So, for fast and trouble-free inserts, will MongoDB be a good fit? - Dex
  • If you are expecting me to swear to it, I won't :) The task is not well described. In general, in my opinion, MongoDB is convenient primarily because it removes many problems of data aggregation and of the rigid 2D (tabular) storage model and, as a result, of updating and inserting data. Plus, IMHO, it is much more convenient to work with the document concept in code, since it spares you some of the pain (in this commenter's view, an ever more terrible pain) of working with an SQL DBMS - Zowie
  • @AlexWindHope, added. - Dex

3 answers

Using MongoDB can give a big boost when you make full use of the document concept (getting rid of joins and data aggregation as far as possible). Ask yourself: what exactly do you want to get?

As for inserting data, MongoDB will be faster, and given the amount of information that matters quite a lot. If, on top of that, the document concept suits your task, then MongoDB is certainly an excellent choice.
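If insert throughput is the main concern, one common approach (not something described in this thread) is to buffer log entries and write them in batches. The sketch below is a hypothetical `BufferedLogWriter` that hands each full batch to any callable; with PyMongo that callable could be `collection.insert_many`, but here a plain list stands in for the database so the example runs on its own.

```python
class BufferedLogWriter:
    """Collect documents in memory and hand them to `sink` in batches.

    Hypothetical helper: not thread-safe, and anything still buffered
    is lost if the process dies before flush() is called.
    """

    def __init__(self, sink, batch_size=500):
        self._sink = sink              # callable that receives a list of documents
        self._batch_size = batch_size
        self._buffer = []

    def write(self, document):
        self._buffer.append(document)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        if self._buffer:
            # Swap the buffer out first so the sink receives a list we no longer touch.
            batch, self._buffer = self._buffer, []
            self._sink(batch)


batches = []  # stand-in for the database
writer = BufferedLogWriter(sink=batches.append, batch_size=3)
for i in range(7):
    writer.write({"event": i})
writer.flush()  # push the final, partial batch
```

The trade-off is durability: a larger batch means fewer round trips but more entries at risk if the process crashes.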

There is another factor: Mongo's extensibility, against which SQL DBMSs have nothing sensible to offer.

Also, in some cases it is convenient that you can write JavaScript code that is executed on the Mongo side. (I know this is sort of possible in SQL too, but I think no one will argue that such code looks terrible and is scary to work with.)

In general, if you haven't worked with MongoDB before (or have, but only at the Hello World level), it is, to put it plainly, a very interesting experience that, in my opinion, you shouldn't pass up if you get the chance.

I did not describe the merits of SQL, because I suppose you are already familiar with them.

PS: I tried to assess the situation as objectively as possible. In general, I personally like MongoDB very much, so my choice would be obvious.

In general, MongoDB is a niche tool; what's interesting is that simply switching from an SQL database to NoSQL completely changes your approach to development, your thinking, and so on.

Warning: MongoDB is not only a database, but also a drug ... Take at your own risk :D

  • Thanks. I'm resistant to drugs) - Dex
  • MongoDB really is a sweetheart, but don't forget that a sweetheart can suddenly turn treacherous and stab you in the back at the worst possible moment ... I'm talking about the holes (vulnerabilities), which I don't think will be fixed very soon ... - AseN

MongoDB is a fairly popular (and steadily gaining popularity) database management system among NoSQL solutions. MongoDB is many times faster than SQL DBMSs (MySQL or MSSQL, for example). It is often used in high-traffic projects. You don't have to look far for examples: Forbes, Disney, Yahoo ...

You write:

 ...Therefore, we decided to use MongoDB for storage in conjunction with django... 

A smart approach to building a high-performance site, but do not forget the standard measures, namely:

 Front-end server: NGINX
 Memcached server (you can also try caching with NGINX's standard tools)
 Also keep in mind that it is better not to cache everything indiscriminately, but only some (dynamic) areas of the page.

But! MongoDB has a number of flaws that I would like to talk about.

The first drawback (an inconvenience) is that each NoSQL DBMS supports its own unique query language, which has to be learned anew every time. The same applies to MongoDB.

NoSQL DBMSs (MongoDB in particular) are not very resistant to all sorts of attacks. Many people say of NoSQL: "No SQL, no SQL injection." That statement is true. But although one type of vulnerability is closed, several others open up:

 Regular expression injection
 JSON injection
 JavaScript injection
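As an illustration of how the first two classes are typically guarded against on the application side, here is a minimal Python sketch. The helper names are hypothetical and this is not the answerer's code: one function rejects non-scalar values so that a JSON payload such as `{"$ne": null}` cannot be passed through as a query operator, the other escapes regex metacharacters before user text is used in a pattern. JavaScript injection is usually avoided by never building server-side script (for example a `$where` clause) from user input.

```python
import re


def safe_equality_value(value):
    """Accept only plain scalars, so a dict like {"$ne": None} is rejected."""
    if isinstance(value, (str, int, float, bool)):
        return value
    raise ValueError("query values must be plain scalars")


def safe_prefix_pattern(user_text):
    """Escape regex metacharacters before building a prefix-match pattern."""
    return "^" + re.escape(user_text)


query = {"url": safe_equality_value("/catalog")}  # safe to pass on as a filter
pattern = safe_prefix_pattern("a.*b")             # matches the literal text "a.*b" only
```

This is a sketch under the assumption that only equality and prefix filters are needed; a real application would validate against the exact set of fields and types it expects.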

If you are interested in how these vulnerabilities are exploited, I can demonstrate.


Having gained access through one of the potential "holes", an attacker can do extremely undesirable things:

 Manipulate the REST interface and forge cross-site requests (CSRF)
 Use regular expressions in queries
 Execute scripts on the server (JS scripts, for example)
 And that is only part of it...

PS

Surely as much information about protecting against NoSQL injection will soon appear online as there is now about ordinary SQL injection attacks. So decide for yourself ....

  • As for the front end and Memcached (which you misspelled), I know and have seen enough, not to mention that 1. that part is not changing, and 2. I wrote above that nothing will be cached. There will be a separate module with its own logic and its own database for this task. As for the cons, thank you. - Dex
  • @Asen all that heresy you write applies only to arch-clumsy developers. And it all comes down to one banal problem: incoming data is not properly validated. Moreover, JavaScript injection is possible only when the developer is simply an idiot. I am waiting for an example showing any vulnerability not caused by the developer's own clumsiness. PS: although, as I understand it, your knowledge here is about the same as in the Node.JS question; I think you know what I mean. PPS: I hope you didn't take this info from a "Hindu" article on Habr - Zowie
  • - Any claim about performance must be backed by a reference. In this particular case the claim also turns out to be false, because in some scenarios [MySQL-like solutions are faster][1]. - The claim about the query language is also debatable, since industrial solutions usually implement nothing on raw queries, and the logic of working with the database is built on top of an ORM framework. Since MongoDB is already well supported by most ORMs, the argument becomes weak. [1]: labs.laulima.com/mongodb-vs-mysql-performance-benchmarks-cms - Costantino Rupert
  • @AlexWindHope, no, not from a "Hindu" article. I have simply read enough about it and worked directly with the DBMS itself, etc ... - AseN
  • @Asen, there will be no scripts on the server other than Python. Only the simplest requests, and write-only for now; the rest will be hidden from the user. CSRF is filtered perfectly well by django. All incoming GET data is filtered by the framework as well, and I also add a couple of my own checks. Off-topic: @Asen, take criticism more lightly, especially if you acknowledge these people's knowledge. - Dex

MongoDB is not as simple as it may seem after a first look. There are many subtleties and surprises in it, such as possible data loss ... Perhaps my advice is outdated (I see the question was written in 2012), but before using it in production you need to familiarize yourself with all the settings so that there are no surprises later. Speed does not simply appear out of nowhere. You always have to sacrifice something.