Google reports 14,600,000 results for just one section of the site. The MySQL database is groaning: queries take anywhere from 5 to 30 seconds.

I set up caching that saves cached pages as files in a folder, with one cache for search engines and another for users.


/cache/1.txt - for users
/cache/1r.txt - for search engines
/cache/2.txt
/cache/2r.txt
/cache/3.txt
/cache/3r.txt
/cache/4.txt
/cache/4r.txt
and so on.

Is it OK to do it this way? Over time, 14,600,000 * 2 = 29,200,000 files will accumulate in the folder. Filling it will take 2 to 3 months.

A file takes 17 KB on average, so the calculation goes as follows:

17 KB * 14,600,000 = 248,200,000 KB
248,200,000 KB / 1024 ≈ 242,383 MB
242,383 MB / 1024 ≈ 237 GB
237 GB * 2 = 474 GB

So the folder will end up holding about 474 GB spread over 29,200,000 files.
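The same estimate can be checked with a short script. This is only a sketch using the figures quoted above; the constant and function names are made up for illustration.

```python
# Back-of-the-envelope estimate of the cache size, using the figures from the question.
PAGES = 14_600_000   # pages in this section (Google's result count)
AVG_FILE_KB = 17     # average size of one cached file, in KB
VARIANTS = 2         # one file for users, one for search engines


def cache_size_gb(pages: int, avg_kb: float, variants: int) -> float:
    """Return the total cache size in GB (1 GB = 1024 * 1024 KB)."""
    return pages * avg_kb * variants / 1024 / 1024


total_files = PAGES * VARIANTS
total_gb = cache_size_gb(PAGES, AVG_FILE_KB, VARIANTS)
print(f"{total_files:,} files, roughly {total_gb:.0f} GB")
```

Without the intermediate rounding the total comes out a little under 474 GB, which does not change the picture.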

Won't the whole thing buckle under that load? What do you advise?

And don't forget: this is just one section of the site, and there are several.

With caching, the pages are loaded instantly.

    1 answer

    1. The file cache you describe can be spread across a (binary?) tree of subfolders. 14.6 million fits in 24 bits. For example, you could distribute the files into folders according to the highest N bits: cache/1/0/0/1/0/1/0/1/65535.txt
    2. Does the database groan when executing a single query, with no others running in parallel? Review the queries and the table indexes; that measure alone may be enough to make the queries fly. 14.6 million pages is not that many.
    3. Are identical (1:1) requests and responses frequent? Is the request distribution flat, or is it a "bell"? Caching makes sense only if there are many repeats.
    4. Maybe it is time to grow: set up a cluster, or at least one more MySQL server? Make it a slave that runs a copy of the database and handles exactly half of all requests. The servers can share a cache, for example memcached, available to all of them so that repeated requests do not load the database.
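
    The folder layout from point 1 could be sketched roughly as follows. This is only an illustration, assuming page ids are integers below 2^24 and that the top 8 bits become one folder level each; the function name `cache_path` and its parameters are hypothetical, and the `r` suffix follows the naming from the question.

```python
def cache_path(page_id: int, for_robot: bool = False, root: str = "cache",
               total_bits: int = 24, dir_bits: int = 8) -> str:
    """Map a page id to a path whose folders are the id's highest dir_bits bits."""
    if not 0 <= page_id < (1 << total_bits):
        raise ValueError("page_id does not fit in total_bits")
    low_bits = total_bits - dir_bits
    high = page_id >> low_bits               # top dir_bits bits pick the folders
    low = page_id & ((1 << low_bits) - 1)    # remaining bits name the file
    dirs = format(high, f"0{dir_bits}b")     # zero-padded binary, e.g. "10010101"
    suffix = "r" if for_robot else ""        # "r" marks the search-engine copy
    return "/".join([root, *dirs, f"{low}{suffix}.txt"])


print(cache_path(9_830_399))           # cache/1/0/0/1/0/1/0/1/65535.txt
print(cache_path(1, for_robot=True))   # cache/0/0/0/0/0/0/0/0/1r.txt
```

    With these assumed settings each leaf folder holds at most 65,536 ids (131,072 files counting both variants); raising `dir_bits` makes the leaves smaller at the cost of a deeper tree.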

    Upd. 5. You can hold robots back by configuring robots.txt, for example so that they visit no more than once a week.

    To understand how to optimize the site in this case, one needs to see in detail what is happening there: how the database is structured, what the queries look like, and whether the pages are personalized for visitors.

    Intuitively, I suspect the slow queries are caused by nothing more than missing indexes in the database. If indexes and query changes can no longer help and the database is choking on the sheer volume of data, you can split the overgrown tables into partitions. For example, rows with an index from 1 to N are stored on one server, and rows from N + 1 to M on another.
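
    Routing by index range, as in the last example, might look like the sketch below. The boundaries and server names are invented for illustration, and a real setup would also have to handle moving rows and adding servers.

```python
from bisect import bisect_left

# Hypothetical layout: (highest row id stored on the server, server name).
SHARDS = [
    (5_000_000, "db1.example.internal"),    # rows 1 .. 5,000,000
    (10_000_000, "db2.example.internal"),   # rows 5,000,001 .. 10,000,000
    (15_000_000, "db3.example.internal"),   # rows 10,000,001 .. 15,000,000
]
UPPER_BOUNDS = [upper for upper, _ in SHARDS]


def server_for(row_id: int) -> str:
    """Return the server whose id range contains row_id."""
    if row_id < 1 or row_id > UPPER_BOUNDS[-1]:
        raise KeyError(f"row id {row_id} is outside every configured range")
    # The first upper bound >= row_id identifies the owning range.
    return SHARDS[bisect_left(UPPER_BOUNDS, row_id)][1]


print(server_for(5_000_000))   # db1.example.internal
print(server_for(5_000_001))   # db2.example.internal
```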

    More options to think about:

    • Thanks for the detailed answer. Regarding the second point: it groans when many queries run at once (for example, search bots plus site visitors) - gjhgfddjhgjhg