Hello. The problem, in general, is this: a Java project uses a NoSQL database (MongoDB) and needs to handle about 1,000,000,000 records. Inserting 5,000,000 records into an empty collection takes 13-15 minutes, but after those first 5,000,000 the insertion process starts to slow down, almost exponentially, and RAM consumption grows enormously. The priority task for this database is search (the search speed over 10,000,000 records is satisfactory).

Question:

  1. Why does this happen?
  2. How can it be fixed?

Possible solutions:

  1. Create a new collection (table) every 5,000,000 records?

  2. Index optimization? (searches are by id; see the index sketch below, after the question)

  3. Optimization of the MongoDB config?

  4. System optimization?

  5. Replacing the database?

Thanks in advance for your reply!
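
Regarding point 2, a minimal sketch of what the id index could look like with the MongoDB Java 3.x driver (database, collection, and field names here are assumptions for illustration, not taken from the question):

    import com.mongodb.MongoClient;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.IndexOptions;
    import com.mongodb.client.model.Indexes;
    import org.bson.Document;

    public class IndexSetup {
        public static void main(String[] args) {
            MongoClient client = new MongoClient("localhost", 27017);
            // "testdb" / "records" / "id" are hypothetical names, used only for this sketch
            MongoCollection<Document> records =
                    client.getDatabase("testdb").getCollection("records");
            // One ascending index on the lookup field; unique(true) assumes ids never repeat
            records.createIndex(Indexes.ascending("id"), new IndexOptions().unique(true));
            client.close();
        }
    }

For a bulk load it is often cheaper to build this index once after the data is inserted rather than maintain it during the load, although that trade-off depends on the workload.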

  • If I'm not mistaken, all MongoDB indexes are kept in RAM. If id is a UUID, each entry takes 16 bytes plus some overhead for the structure. In general, I suspect that the whole thing stops fitting in memory, and either MongoDB evicts parts of the index or the system starts swapping, which is very slow and affects other applications as well (a rough estimate follows below). - Qwertiy
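
To put a rough, back-of-the-envelope number on that comment (the per-entry overhead here is an assumption, and WiredTiger's actual index format differs): 1,000,000,000 entries × (16 bytes of UUID key + roughly 10-16 bytes of per-entry overhead) is on the order of 25-30 GB for the id index alone, before the documents themselves are counted, so on a typical single node the index can simply stop fitting in RAM.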

1 answer

I had a similar case when I worked with SQLite. I had to switch to MySQL.

But for a billion records you will, in my opinion, have to move to a BigTable-style store already: for example Hadoop with HBase, or Cassandra.

About Cassandra, to be honest, I am not sure, because Facebook itself switched from Cassandra to HBase, even though they developed Cassandra in the first place.

  • Thanks. You are probably right that I will have to switch; at the moment I am already trying Cassandra, for example. HBase will probably need a lot of tuning, and Hadoop likely too. The trouble is that I have only one node at my disposal, and the customer is unlikely to pay for a few more. By the way, do you know anything about Aerospike? Just in case, my contact is avkcorporation@gmail.com. Thank you. Continued in the next comments. - Alex
  • MongoDB 3.4 (WiredTiger); Cassandra via the DataStax driver 3.9.0. Regarding inserts: Mongo with single inserts does 1 mln = 2 min 12 s, and up to 5 mln the speed stays roughly the same, after which it drops steeply, 5 mln = 12-15 min. With "batching", 1 mln = 33 s, but the speed of each subsequent million also decreases. In general, today I was down to 1 million per 2 minutes by the 5th million; on the default config I then waited another 10 minutes and killed the run (see the MongoDB bulk-insert sketch after these comments). - Alex
  • Cassandra: single inserts, 1 mln = 27 min, with the insertion speed constant (the test ran all night)). Batching: 1 mln = 48 s, but after the 2nd-3rd million a WriteTimeoutException is thrown, and so far I have not really figured out what happens after that. By tweaking the Cassandra config a little I stretched batching out to 4-5 million, but then I get WriteTimeoutException again. Also, profiling with JProfiler, I found that the insertion speed really does drop after every million, the CPU load grows, and the GC effectively stops working (see the Cassandra batch sketch after these comments). - Alex
  • Alas, I have not studied databases other than SQL-like ones in depth. - Vanguard
  • Triggering the GC manually does not help either, though. - Alex
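
A minimal sketch of the kind of batched MongoDB insert mentioned in the comments, using the Java 3.x driver's insertMany with ordered(false) and modestly sized batches (database, collection, and field names are assumptions for illustration):

    import com.mongodb.MongoClient;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.InsertManyOptions;
    import org.bson.Document;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.UUID;

    public class MongoBatchInsert {
        public static void main(String[] args) {
            MongoClient client = new MongoClient("localhost", 27017);
            // "testdb" / "records" / field names are hypothetical, for this sketch only
            MongoCollection<Document> records =
                    client.getDatabase("testdb").getCollection("records");

            final int batchSize = 10_000;          // keep batches modest, not millions
            List<Document> batch = new ArrayList<>(batchSize);

            for (long i = 0; i < 5_000_000L; i++) {
                batch.add(new Document("id", UUID.randomUUID().toString())
                        .append("payload", "row-" + i));
                if (batch.size() == batchSize) {
                    // unordered inserts avoid a per-document round trip and let the
                    // server continue past individual failures
                    records.insertMany(batch, new InsertManyOptions().ordered(false));
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                records.insertMany(batch, new InsertManyOptions().ordered(false));
            }
            client.close();
        }
    }

By itself this will not stop the slowdown if the index no longer fits in the WiredTiger cache; it only removes per-document overhead on the client side.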
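
For the Cassandra side, a sketch of batched writes with the DataStax 3.x driver, using a prepared statement and deliberately small UNLOGGED batches (keyspace, table, and column types are assumptions); very large batches are a common trigger for WriteTimeoutException, so splitting them up is usually the first thing to try:

    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    import java.util.UUID;

    public class CassandraBatchInsert {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("testks");   // hypothetical keyspace

            // hypothetical table: records(id text PRIMARY KEY, payload text)
            PreparedStatement ps = session.prepare(
                    "INSERT INTO records (id, payload) VALUES (?, ?)");

            BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
            final int batchSize = 100;   // keep batches small; huge batches tend to time out

            for (long i = 0; i < 1_000_000L; i++) {
                batch.add(ps.bind(UUID.randomUUID().toString(), "row-" + i));
                if (batch.size() == batchSize) {
                    session.execute(batch);
                    batch.clear();
                }
            }
            if (batch.size() > 0) {
                session.execute(batch);
            }
            session.close();
            cluster.close();
        }
    }

Note that in Cassandra a batch is not primarily a throughput tool: unless all rows share a partition, many small batches (or plain asynchronous single inserts) are generally recommended over a few huge ones.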