Django / REST-Framework / PostgreSQL project.

The models are related mostly one-to-many, and there is an API that queries the database.

Right now some pages issue about 700-1500 database queries on load; the database keeps growing, so things will only get worse. Which direction should I look in to solve this?

Rework the existing code / use raw queries? As I understand it, this is a temporary fix and we will hit the ceiling again.
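By raw queries I mean something like Django's raw() escape hatch - a minimal sketch using the Element model from the code below (the table name is whatever Django generated for the app; it is shown here purely illustratively):

    # Sketch: raw() runs one SQL statement and maps rows onto Element instances;
    # the primary key must be included in the selected columns
    elements = Element.objects.raw('SELECT id, name FROM myapp_element')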

Key-value stores - Redis / memcached and the like? But here too, as I assume, there is a ceiling as the data grows, and we would have to change the structure of the project.
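To make the caching idea concrete, in Django it would look roughly like this - a sketch assuming a configured Redis/memcached backend; the key, the timeout, and expensive_query() are illustrative placeholders:

    from django.core.cache import cache

    def get_elements():
        data = cache.get('elements:v1')
        if data is None:
            data = expensive_query()  # placeholder for the real queryset work
            cache.set('elements:v1', data, timeout=300)  # cache for 5 minutes
        return data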

While searching for a solution I came across another approach - block (chunked) reading from the database. Can someone explain it? Can it be used when building an API - for example, to hand data over gradually, down to one record at a time, streaming it to the consumer? And can block reading be combined with key-value storage?
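As far as I understand, in Django terms this would be something like QuerySet.iterator(), which on PostgreSQL reads through a server-side cursor - a rough sketch of what I mean (chunk_size is illustrative; the parameter exists from Django 2.0 on):

    # Sketch: stream rows in chunks instead of materializing the whole
    # queryset in memory; each yielded object goes straight to the consumer
    def stream_elements():
        for element in Element.objects.all().iterator(chunk_size=500):
            yield element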

What other comments are there on these approaches, and what am I overlooking?


I've added a piece of code for context. There are two related models: the first has about 120 rows in its table, the second more than 10,000 (roughly 100 Property objects per Element).

    from django.db import models

    # TYPE is the choices tuple for Property.type, defined elsewhere in the project

    class Element(models.Model):
        name = models.CharField(verbose_name="Name", max_length=255, blank=True, null=True)

        class Meta:
            verbose_name = "Element"
            verbose_name_plural = "Elements"

    class Property(models.Model):
        # on_delete is required from Django 2.0 on
        element = models.ForeignKey(Element, related_name='properties', on_delete=models.CASCADE)
        title = models.CharField(max_length=255)
        type = models.CharField(max_length=50, choices=TYPE)
        value = models.CharField(max_length=255, blank=True, null=True)

        class Meta:
            verbose_name = "Property"
            verbose_name_plural = "Properties"
            ordering = ('title',)

From the API I need to return the following structure - that is, fetch particular rows from the property table:

 [ { "name": "Lithium", "properties": { "group": "1", "atomic_number": "3", "symbol": "Li", "period": "2", "atomic_weight": "6.941", "type": "alkali" } }, { "name": "Beryllium", "properties": { "group": "2", "atomic_number": "4", "symbol": "Be", "period": "2", "atomic_weight": "9.012182", "type": "alkaline" } } ... 

In one of the methods I use queries of this form (obj is an instance of the Element model):

    def get_properties(self, obj):
        properties = ['Symbol', 'Group', 'Period', 'Type', 'Atomic weight', 'Atomic number']
        func = lambda property: (obj.properties.get(title=property).value
                                 if obj.properties.filter(title=property).exists() else None)
        data = map(func, properties)
        return dict(zip(properties, data))

The exists() check in this function doubles the number of database queries; if we remove the check, the query count drops by half, but I don't see how to get around it:

  func = lambda property: obj.properties.get(title=property).value if obj.properties.filter(title=property).exists() else None 
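As far as I can tell, something like first() would at least avoid the double hit, since it returns the object or None in a single query - though it would still be one query per property, which doesn't solve the overall problem:

    # Sketch: first() replaces the exists() + get() pair with one query;
    # getattr() falls back to None when no matching row was found
    func = lambda title: getattr(obj.properties.filter(title=title).first(), 'value', None)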
  • You need to rewrite the application; k/v storage here is just a poultice for the dead - it won't help. - etki
  • @etki I added the piece of code that confuses me; I don't understand how to rewrite it - while1pass
  • When using PostgreSQL I never use the model abstraction - why would I? A database as a set of tables, relationships, and integrity rules is already a model, and SQL is a fairly high-level language for querying that model. Why do you need a model on top of it? A model layer is worth using when the data comes from who knows where - from files of some kind, or from creaky old versions of MySQL. - Eugene Bartosh
  • @EugeneBartosh I use the Django framework, where models are used to create and manage the tables - while1pass
  • And what stops you from not using models? I use Zend and Symfony, which also have models, but in 9 cases out of 10 I simply ignore them, and PgSQL doesn't let me down - Eugene Bartosh

2 answers

I think the problem here is incorrect query construction and architecture rather than any serious database performance limit.

700-1500 queries for a single page is off the scale. The number of queries can certainly be reduced significantly.

For example, if you take the method you gave in the example:

    def get_properties(self, obj):
        properties = ['Symbol', 'Group', 'Period', 'Type', 'Atomic weight', 'Atomic number']
        func = lambda property: (obj.properties.get(title=property).value
                                 if obj.properties.filter(title=property).exists() else None)
        data = map(func, properties)
        return dict(zip(properties, data))

In 10 minutes I managed to rewrite it so that it uses only one call to the database:

    def get_properties(self, obj):
        titles = ['Symbol', 'Group', 'Period', 'Type', 'Atomic weight', 'Atomic number']
        # One query: fetch only the needed rows and only the needed columns
        props = {p['title']: p['value']
                 for p in obj.properties.filter(title__in=titles).values('title', 'value')}
        # Keep None for titles that have no matching Property row
        data = {title: None for title in titles}
        data.update(props)
        return data

I am sure there are quite a few places like this in your code, and significant performance gains can be achieved simply by rewriting them so that they do not generate unnecessary queries, do not recompute what has already been computed, and so on.
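Going one step further - a sketch I have not run against your project: get_properties above still executes once per Element, so for a list endpoint you can prefetch all the properties in one extra query and filter in Python:

    # Two queries in total for the whole list, regardless of the element count
    queryset = Element.objects.prefetch_related('properties')

    def get_properties(self, obj):
        titles = ['Symbol', 'Group', 'Period', 'Type', 'Atomic weight', 'Atomic number']
        # obj.properties.all() is served from the prefetch cache with no extra
        # query; calling .filter() here would bypass the cache and hit the database
        found = {p.title: p.value for p in obj.properties.all() if p.title in titles}
        return {title: found.get(title) for title in titles}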

  • Thanks for the helpful advice! I had already optimized it, but your way is much more elegant, though I don't understand why the intermediate data dictionary is needed. The method can consist of a single line, not counting the list of titles: props = {p.title.lower().replace(' ', '_'): p.value for p in obj.properties.filter(title__in=titles)} - while1pass

In modern cloud systems, as in the good old mainframe systems, there are limits on the number of queries a user program may issue. If the program exceeds the limits, it is cut off without discussion. Typical limits are 150 SQL queries + 100 DML statements. There are also limits on the number of queries inside triggers, but that is a separate topic.

When loading a page, the number of queries should stay within a reasonable bound, usually <10. If you have 1500, that doesn't fit any criteria at all, and it's time to put some questions to the developers who are so diligently generating those queries :-)

Well, judge for yourself: you can select all people aged 20 to 30 with a single query, or you can chase after each of them with a separate query... why one by one when you can get them all at once?
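In Django terms the contrast looks roughly like this (a sketch; Person, age, and ids are illustrative names, not from the question):

    # One query for the whole range - what you want
    people = Person.objects.filter(age__range=(20, 30))

    # The N+1 anti-pattern - a separate query per record
    # people = [Person.objects.get(pk=pid) for pid in ids]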

  • I have never seen a limit on the number of queries in "modern cloud systems", only pay-per-request / pay-for-traffic models - etki
  • Well, have a look - it's there in all of them - Eugene Bartosh
  • I don't believe it; I just looked at compose.io and the Google Cloud SQL landing page - etki
  • I added the code that spawns the queries; I don't understand how to get around it - while1pass
  • @etki such limits are typical of business systems paid by subscription - for example, salesforce.com and SAP (in both cloud and non-cloud variants) - Eugene Bartosh