Good afternoon, the next dilemma has come up. There is some data stored in the database; later, calculations are performed on it for presentation to the user. However, the same data, just in a slightly different form, will be needed on other pages, simply for output and for plotting. So, which is the better way: take the data from the user, do the calculations based on it, and put everything into the database (so that later the ready results can simply be fetched from it), or, on each request (when a particular page is opened), take the raw data from the database, do all the calculations, and render them in the web view? Thanks.

  • Read up on OLAP. Even if it is not exactly what you need, it will at least show and justify an approach to processing data and storing the results of that processing. - Sergey
  • Both paradigms are actively used. In the SQL world, calculating at request time ( pull-on-demand ) is the more common approach; the second one is harder (you constantly have to guard against losing data integrity), but it lets you calculate the data only once and then just serve a ready-made answer ( push-on-change ), which, with the right architecture, means you can stop thinking about optimization at all. - etki
  • @Etki I'm not sure I've fully grasped the point of your answer. What bothers me about the first option I described is that I would end up keeping both the original data and the data calculated from it in the same database (essentially duplicates, you could say). How correct is such an approach? - snowhead

1 answer

As far as I understand, you doubt the “legitimacy” of each of the two approaches: storing only the “raw” data and processing it as needed, versus also storing the data right away in the derived forms that may be needed. Looking ahead: both are completely legitimate, both are used in a great many applications, and you can pick either one.

The simplest solution is to “store the raw data and compute what you need at request time”; this model is often called pull-on-demand. With SQL it is used almost by default: it is customary to store normalized, non-duplicated data and assemble the resulting picture with join constructs. This approach is a living classic, and in this case I would recommend sticking to it until you are confident you can implement the second one in your project.
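Roughly, a minimal pull-on-demand sketch might look like this (Python with the built-in sqlite3 module; the orders / order_items schema and all names are made up for illustration): only raw, normalized rows are stored, and the figure shown to the user is assembled with a join and an aggregation at the moment of the request.

```python
import sqlite3

# Hypothetical normalized schema: only raw data is stored.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (
        order_id INTEGER REFERENCES orders(id),
        price    REAL,
        qty      INTEGER
    );
""")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)",
                 [(1, 10.0, 2), (1, 5.0, 1), (2, 7.5, 4)])

def order_totals():
    # pull-on-demand: the derived value (order total) is assembled
    # with a join + aggregation every time the page is requested.
    return conn.execute("""
        SELECT o.customer, SUM(i.price * i.qty) AS total
        FROM orders o JOIN order_items i ON i.order_id = o.id
        GROUP BY o.id
    """).fetchall()

print(order_totals())   # [('alice', 25.0), ('bob', 30.0)]
```

Every call to order_totals() repeats the same work, which is exactly the trade-off listed in the cons below.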

Pros:

  • The data is not duplicated, and if additional work with it is needed, you know it all lives in one place.
  • The infrastructure is easy to maintain: all write, search, modify, and delete paths look exactly the same.
  • Data updates take effect synchronously everywhere.

Cons:

  • The final data has to be recalculated on every request.
  • If complex data has to be assembled, a single heavy query can easily bring the server down.

The second model, called push-on-change , is much harder to implement, but it has a number of undeniable advantages. It implies that the data is stored in a form already prepared for the queries it serves, so the queries themselves boil down to a plain SELECT ... FROM xxx_prepared_for_yyy_query WHERE ... LIMIT offset, size .
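For contrast, here is a sketch of the read side under push-on-change, with a hypothetical pre-built table order_totals_prepared: the page query degenerates into a flat SELECT with no joins, because the answer was computed earlier, when the raw data changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding data already prepared for one specific query.
conn.execute("""
    CREATE TABLE order_totals_prepared (
        customer TEXT,
        total    REAL
    )
""")
# In a real application these rows are written/refreshed whenever the
# underlying raw data changes, not at read time.
conn.executemany("INSERT INTO order_totals_prepared VALUES (?, ?)",
                 [("alice", 25.0), ("bob", 30.0)])

def page(offset=0, size=20):
    # The whole request is a plain SELECT ... LIMIT offset, size.
    return conn.execute(
        "SELECT customer, total FROM order_totals_prepared LIMIT ?, ?",
        (offset, size),
    ).fetchall()

print(page())   # [('alice', 25.0), ('bob', 30.0)]
```

The price for this trivial read path is paid at write time, as the cons further down show.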

Pros:

  • The data is calculated only once.
  • The database has far less work to do, and reads are virtually guaranteed to finish within 50 ms even on very large data sets (down to 10 ms on good storage).
  • Joins are eliminated by design, which makes it possible to use horizontally scalable storage.

All of the advantages listed above come as hard requirements in high-load applications, so, as a rule, that is where this approach is used.

Cons:

  • Much higher development cost: for every entity, the programmer has to build by hand the machinery that updates the associated tables of prepared data (see the write-path sketch after this list).
  • A much higher probability of bugs.
  • It is much harder to add new formats in which the data is served.
  • The data may end up spread across several tables.
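To make the first two cons concrete, here is a sketch (same hypothetical schema as above) of the write path that push-on-change imposes: every change to the raw data must also update the prepared table, preferably within one transaction, and this plumbing has to be written by hand for every entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (customer TEXT, price REAL, qty INTEGER);
    CREATE TABLE order_totals_prepared (customer TEXT PRIMARY KEY,
                                        total    REAL NOT NULL);
""")

def add_item(customer, price, qty):
    # Both the raw row and the prepared aggregate must be updated together;
    # forgetting either one silently corrupts the precomputed answer.
    with conn:  # single transaction
        conn.execute("INSERT INTO order_items VALUES (?, ?, ?)",
                     (customer, price, qty))
        conn.execute("INSERT OR IGNORE INTO order_totals_prepared VALUES (?, 0)",
                     (customer,))
        conn.execute("UPDATE order_totals_prepared SET total = total + ? "
                     "WHERE customer = ?",
                     (price * qty, customer))

add_item("alice", 10.0, 2)
add_item("alice", 5.0, 1)
print(conn.execute("SELECT * FROM order_totals_prepared").fetchall())
# [('alice', 25.0)]
```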

I repeat: neither approach was invented yesterday, and both are absolutely legitimate. In most cases you will need to find a compromise (you are not going to create a separate table for every search query, but you also would not count the number of comments for every news item on the fly, would you?), so neither is usually applied in its pure form; still, you should follow one approach or the other about 90% of the time, so that development does not turn into hell. For my part, I can recommend staying with the first approach (calculating data on request) for now, until you have time to experiment with the second; a sketch of such a compromise follows below.
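As a concrete illustration of that compromise, here is a sketch with hypothetical news and comments tables that mixes both models: comment bodies stay normalized and are pulled on demand, while the frequently displayed comment counter is pushed into a denormalized column on every write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT,
                       comment_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE comments (news_id INTEGER REFERENCES news(id), body TEXT);
""")
conn.execute("INSERT INTO news (id, title) VALUES (1, 'Hello world')")

def add_comment(news_id, body):
    # push-on-change for the counter only: one extra UPDATE per write.
    with conn:
        conn.execute("INSERT INTO comments VALUES (?, ?)", (news_id, body))
        conn.execute("UPDATE news SET comment_count = comment_count + 1 "
                     "WHERE id = ?", (news_id,))

def news_page():
    # pull-on-demand for everything else: the counter is read as-is,
    # comment bodies are joined in only when actually needed.
    return conn.execute("SELECT title, comment_count FROM news").fetchall()

add_comment(1, "first!")
add_comment(1, "nice post")
print(news_page())   # [('Hello world', 2)]
```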