The program pulls users from the VK API (20-50 threads on average) and writes records to the database. Each user has on average 20 groups, 500 photos and 200 friends. I store friends, groups, users and photos in separate tables, so saving a single user takes about 721 INSERTs (1 user + 20 groups + 500 photos + 200 friends). At 200-300 users per minute that is roughly 216,000 INSERTs into the database, and because of this the call to context.SaveChanges() takes about 6-10 minutes.

I tried using a context pool and bulk insert; the average time is 4-6 minutes.

Setting AutoDetectChangesEnabled = false or using context.AddRange() gives roughly the same result.

The only fast solution I have found is to binary-serialize the user's data and store it as byte[], so that saving a user takes only 4 INSERTs. That brought the context.SaveChanges() call down to 1.2 seconds. But it creates an obvious problem: to change anything at all for a user, all of their data has to be deserialized and serialized again.

What approaches to storing large volumes of data are applicable in this case without resorting to serialization?

  • Is the program you have used available anywhere? - iskander1220 2:42 pm

1 answer

  1. You do not need to call context.SaveChanges() after every single record (a minimal batching sketch is included after this list).

    Inserting 1000 entries and then calling context.SaveChanges() once is faster than calling context.SaveChanges() for every entry.

  2. If you have some kind of complicated logic, it is better to move that logic into a stored procedure.

  3. If the database and the application are on different machines, then it is probably worth checking the network speed.

  4. A huge number of indexes can slow down batch inserts.

  5. Drop EF for inserts altogether and drive all of this through BulkCopy (see the SqlBulkCopy sketch at the end of the answer).

    For example, you bulk-load everything into a staging table on the server, and then a job polls that table every 5 minutes and distributes the data into the working tables.

  6. It also would not hurt to look with a profiler at the queries that EF generates (a logging sketch is included after this list).

    They may not be optimal, and it may be worth tinkering with the LINQ.
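
A minimal sketch of point 1, batching rows before a single SaveChanges(). It assumes EF Core; the context, entity and connection string (VkContext, Photo, and so on) are placeholders rather than the asker's actual model, and the batch size of 1000 is just the figure from the answer.

```csharp
using System.Collections.Generic;
using Microsoft.EntityFrameworkCore;

// Placeholder entity; the real model keeps users, groups, photos and
// friends in separate tables, but the batching idea is the same.
public class Photo
{
    public int Id { get; set; }
    public long UserId { get; set; }
    public string Url { get; set; }
}

public class VkContext : DbContext
{
    public DbSet<Photo> Photos { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlServer("Server=.;Database=Vk;Trusted_Connection=True"); // placeholder
}

public static class BatchSaver
{
    // Accumulate rows and call SaveChanges() once per batch instead of
    // once per row (or once per user).
    public static void SaveInBatches(IEnumerable<Photo> photos, int batchSize = 1000)
    {
        var buffer = new List<Photo>(batchSize);

        foreach (var photo in photos)
        {
            buffer.Add(photo);
            if (buffer.Count < batchSize) continue;

            Flush(buffer);
            buffer.Clear();
        }

        if (buffer.Count > 0)
            Flush(buffer);
    }

    private static void Flush(List<Photo> batch)
    {
        // A fresh context per batch keeps the change tracker small.
        using var context = new VkContext();
        context.ChangeTracker.AutoDetectChangesEnabled = false;
        context.Photos.AddRange(batch);
        context.SaveChanges(); // one SaveChanges per 1000 rows
    }
}
```

Using a fresh context per batch also keeps the change tracker from growing without bound, which is usually where per-row SaveChanges() loses most of its time.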
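
On point 6: besides an external profiler (SQL Server Profiler and the like), EF Core can log the SQL it generates itself. A small sketch assuming EF Core 5 or later, where DbContextOptionsBuilder.LogTo is available (older EF Core versions would use an ILoggerFactory, and classic EF6 has context.Database.Log); the context and connection string are placeholders, as above.

```csharp
using System;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;

// Placeholder context: logs every command EF Core sends to the server,
// including the generated INSERT batches and their parameters.
public class LoggingVkContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options
            .UseSqlServer("Server=.;Database=Vk;Trusted_Connection=True") // placeholder
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging(); // show parameter values too
}
```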

I would advise you to measure filling the database through BulkCopy (the fastest way to do mass inserts); that will give you the number you should stream for and above which you will not jump using EF.
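
A minimal sketch of the SqlBulkCopy route from point 5 and of the measurement suggested above: load rows into a staging table in one bulk operation and let a server-side job move them into the working tables later. The table and column names (dbo.PhotoStaging, UserId, Url) and the connection string are assumptions for illustration.

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

public static class PhotoBulkLoader
{
    // Bulk-load photo rows into a staging table, bypassing EF entirely.
    // A SQL Agent job (or similar) can then move the rows into the
    // normalized working tables every few minutes.
    public static void BulkInsert(string connectionString,
                                  (long UserId, string Url)[] photos)
    {
        // In-memory table whose columns mirror the staging table.
        var table = new DataTable();
        table.Columns.Add("UserId", typeof(long));
        table.Columns.Add("Url", typeof(string));

        foreach (var (userId, url) in photos)
            table.Rows.Add(userId, url);

        using var connection = new SqlConnection(connectionString);
        connection.Open();

        using var bulk = new SqlBulkCopy(connection)
        {
            DestinationTableName = "dbo.PhotoStaging",
            BatchSize = 10_000
        };
        bulk.ColumnMappings.Add("UserId", "UserId");
        bulk.ColumnMappings.Add("Url", "Url");

        bulk.WriteToServer(table); // one bulk operation instead of N INSERTs
    }
}
```

Timing a call like BulkInsert for the same volume of rows gives the baseline the answer recommends comparing EF against.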

  • The typo "stream" (instead of "strive") is not even Freudian, it is professional deformation in its purest form: the hand simply refuses to correct it. ))) - AK