There is a base SQLite, with tables - Messages and Users . Contains approximately 100,000 users.

It is necessary to add / update messages, and users who have made reposts. For every 100 messages, there are ~ 6000 repost users. Now adding 1 message (in which ~ 300 reposts), if all the repost users are not in the database, takes ~ 14 seconds (is this normal?).

I do this:

 using (AppDbContext db = new AppDbContext()) { // Получаем всех пользователей, чтобы сравнивать с репостнувшими пользователями usersFromDb = db.Users.ToList(); /***** Т.к. пользователей в базе много, получение всех пользователей не оптимально. Но лучше ли будет делать много запросов (для каждого пользователя), вместо этого одного? *****/ } using (AppDbContext db = new AppDbContext()) { foreach (var message in messages) { if (message.RepostUsers != null) { // Проходим по списку репостнувших пользователей /***** Эта часть занимает много времени Можно ли оптимизировать? *****/ foreach (var user in message.RepostUsers) { // Ищем по пользователям из базы var dbUser = usersFromDb.Find(u => u.Id == user.Id); if (dbUser == null) { // Если не нашли, помечаем новым db.Entry(user).State = EntityState.Added; } else if (dbUser.Name != user.Name) { // Если нашли и есть изменения, помечаем измененным db.Entry(user).State = EntityState.Modified; } else { // Если нашли и изменений нет, помечаем не измененным db.Entry(user).State = EntityState.Unchanged; } } } // Получаем Message из базы var dbMessage = db.Messages.AsNoTracking().FirstOrDefault(p => p.Id == message.Id); // Message всегда либо новое, либо измененное. Помечаем db.Entry(message).State = dbMessage != null ? EntityState.Modified : EntityState.Added; } db.Messages.AddRange(messages); db.SaveChanges(); } 

Question 1.
Because There are a lot of users in the database, getting all users is not optimal. But would it be better to make a lot of requests (separately for each repost user) instead of one?

Question 2.
When changing users' states through db.Entry(user).State , are objects placed in the cache to track changes? Those. AsNoTracking not counted? If so, should this be changed and how?

  • Try usersFromDb to make the type HashSet. It will be many times faster. - RusArt
  • Those. HashSet<User> usersFromDb; ? Then usersFromDb = db.Users.ToList(); does not work usersFromDb = db.Users.ToList(); - tretetex
  • usersFromDb = new HashSet(db.Users.ToList()) - RusArt
  • Not stop. You need a dictionary - RusArt

3 answers 3

I answer only the optimization question. Every time you go through usersFromDb in search of Id. For such purposes, you must use the dictionary. Here is the code:

 Dictionary<int,User> usersFromDb; ... usersFromDb = db.Users.ToDictionary(x=>x.Id, x=>x); ... // Ищем по пользователям из базы if (usersFromDb.ContainsKey(userId)) { var dbUser = usersFromDb[userId]; ... 

The dictionary allows you to search for a constant time O (1). While the search for the sheet occurs in linear time O (n).

  • Understood thanks. However, the main slowness is precisely when working with the database - inserting 300 records in 14 seconds, it seems too slow. - tretetex
  • @Skrim You can see how much the db.SaveChanges(); method executes db.SaveChanges(); ? In a modern studio, you can watch while debugging. - RusArt
  • SaveChanges itself does not take long. The change in the db.Entry(user).State state db.Entry(user).State . I performed an insert of 1 message with 881 reposts, of which 848 new users. Results (in milliseconds): all foreach - 16824, db.SaveChanges(); - 583 - tretetex
  • @RuslanArtamonov to pump out 100,000 users every time? - Vladislav Khapin
  • one
    @Skrim How long such a query will be executed: var r = new HashSet(message.RepostUsers); db.Users.Where(x=>r.Contains(x)).ToList(); var r = new HashSet(message.RepostUsers); db.Users.Where(x=>r.Contains(x)).ToList(); - RusArt

Consider the PostgreSQL option, because SQLite is a relational database. I do not go into details. On the one hand, it is simple, but the more records in the database, the heavier the requests. Double the number of records in your database and see if you should work with it further.

  • Thanks for the suggestion, but I still think SQLite is more suitable because in total there will be approximately 200,000 entries and will not continue to grow, only a partial update of existing ones. - tretetex
  • So PostgreSQL is also a relational database. - RusArt
  • This is a DBMS, and Lite does not fall under this definition. When several users start working with the database, then all the flaws come out. - Alexander Puzanov
  • @AlexanderPuzanov Got it. Use (dog) name, but did not see what you told me. - RusArt

So, what happened to do:

  • Instead of getting all users (1-2 seconds), only certain users are sampled (~ 0.01 seconds)
  • The speed of recording / updating 100 messages (approximately 5000-7000 repost users) is 5-7 seconds.

There were several problems:

  1. The AddRange method AddRange not work if there were already added records in the database, since AddRange sets the records state Added
  2. To check the users (whether to add or change) every time all the records from the database were pulled, with which the comparison took place. It is less expensive to get users by criteria using where . But this is not so clear - if you add 10 messages, it is faster to make a selection. If there are 100 messages each, there are many users and the selection is longer than getting all the records through ToList() .
  3. A lot of time was spent on changing the status of db.Entry(user).State = , since this led to an expensive, in terms of execution, call to the internal algorithm DetectChanges

Corrections:

  1. Remove AddRange
  2. We make a sample of users from the database, instead of getting all
  3. Disable the db.Configuration.AutoDetectChangesEnabled = false; mechanism db.Configuration.AutoDetectChangesEnabled = false;

Result:

 using (AppDbContext db = new AppDbContext()) { // Получаем всех пользователей, чтобы сравнивать с репостнувшими пользователями usersFromDb = db.Users.Where(x => allUsers.Keys.Contains(x.Id)).ToList(); } using (AppDbContext db = new AppDbContext()) { db.Configuration.AutoDetectChangesEnabled = false; foreach (var message in messages) { if (message.RepostUsers != null) { // Проходим по списку репостнувших пользователей foreach (var user in message.RepostUsers) { User dbUser = null; // Ищем по пользователям из базы if (usersFromDb.ContainsKey(user.Id)) dbUser = usersFromDb[user.Id]; if (dbUser == null) { // Если не нашли, помечаем новым db.Entry(user).State = EntityState.Added; } else if (dbUser.Name != user.Name) { // Если нашли и есть изменения, помечаем измененным db.Entry(user).State = EntityState.Modified; } else { // Если нашли и изменений нет, помечаем не измененным db.Entry(user).State = EntityState.Unchanged; } } } // Получаем Message из базы var dbMessage = db.Messages.AsNoTracking().FirstOrDefault(p => p.Id == message.Id); // Message всегда либо новое, либо измененное. Помечаем db.Entry(message).State = dbMessage != null ? EntityState.Modified : EntityState.Added; } db.Configuration.AutoDetectChangesEnabled = true; db.SaveChanges(); } 

Special thanks to @Ruslan Artamonov for hints and answers.