Suppose you need to insert some number of records into several tables within a transaction.

So the question arises: is it better to insert in small batches more often (for example, 100 records at a time) or in large batches less often (for example, 10,000 at a time)?

I have read that it is better for a transaction to finish quickly, but I have also read that a transaction is an expensive operation.

That is, two statements that contradict each other.

For example, some application software constantly reads data from these tables. With which approach will the inserts cause less discomfort for the end user? (I suspect large and infrequent.)

  • A large transaction has less overhead, while a small transaction means a shorter wait for other users. Therefore, for optimal overall service time, you should change the transaction size dynamically depending on the load on the database: increase it when the load is light and reduce it when the load is heavy. - pepsicoca1
  • If your system is OLTP ( ru.wikipedia.org/wiki/OLTP ), use short transactions, or move bulk operations to off-hours (if you have any). Yes, short transactions are noticeably more expensive and you insert much less data per unit of time, but this is the price of reducing response times for other users. - Mike

1 answer

A short and accurate answer to this question, unfortunately, is impossible. To understand the pros and cons of different transaction sizes, you need to dive deep into the database engine and consider various factors: the type of table (with or without a clustered index), the type of insert (appending at the end of the table or writing to random pages), and the transaction isolation level in effect at the moment.

I will try to explain briefly, to the best of my understanding. Please note that my answer is probably inaccurate in places and may not take certain details into account.

So, what happens in a transaction that inserts data:

  1. A lock against schema changes is taken on the table being inserted into (so that no one can change the table's definition while the insert is running).
  2. The rows being written are locked for update; under certain transaction isolation levels (for example, READ COMMITTED), such a lock prevents other transactions from reading those rows until your transaction completes. With a massive insert, lock escalation may occur, and instead of individual rows, whole pages or (in some cases) the entire table will be locked.
  3. The records added to the tables are also written to the transaction log; until the transaction completes, those log records cannot be removed from the log.
  4. When the transaction completes, its log records are marked as committed and can be discarded: immediately under the SIMPLE recovery model, or after a log backup under the FULL or BULK_LOGGED model.
  5. The locks taken by the transaction are released.

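To make the locking part more concrete, here is a minimal sketch (SQL Server assumed; dbo.Orders and its columns are hypothetical names) that opens an insert transaction and, before committing, looks at the locks the session holds via sys.dm_tran_locks:

    BEGIN TRANSACTION;

    -- dbo.Orders and its columns are hypothetical examples.
    INSERT INTO dbo.Orders (CustomerId, Amount)
    VALUES (1, 100.00),
           (2, 250.00);

    -- While the transaction is still open, inspect the locks held by this session:
    -- typically a schema stability lock on the object, intent locks on the table
    -- and its pages, and exclusive locks on the inserted rows.
    SELECT resource_type, request_mode, request_status
    FROM sys.dm_tran_locks
    WHERE request_session_id = @@SPID;

    COMMIT TRANSACTION;  -- locks are released, the log records are marked committed
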
As a result, the advantage of a short transaction is that table rows are locked only briefly, so your users will most likely not notice any delays; the drawback is the relatively high overhead of starting and committing each transaction.

The advantage of a long transaction is the relatively low overhead of starting and committing it; the drawbacks are longer waits for data access and growth of the transaction log, which can become a problem with very large transactions.
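
As a compromise between the two extremes, bulk loads are often split into medium-sized batches, each committed in its own transaction. Here is a hedged sketch (SQL Server assumed; dbo.Orders_Staging, dbo.Orders, the column names, and @BatchSize are all hypothetical; note also that OUTPUT ... INTO has restrictions on the target table, such as no triggers or foreign keys):

    DECLARE @BatchSize int = 1000;  -- larger = fewer transactions, longer-held locks
    DECLARE @Rows int = 1;

    WHILE @Rows > 0
    BEGIN
        BEGIN TRANSACTION;

        -- Move one batch of rows from the staging table into the target table.
        DELETE TOP (@BatchSize) src
        OUTPUT deleted.CustomerId, deleted.Amount
        INTO dbo.Orders (CustomerId, Amount)
        FROM dbo.Orders_Staging AS src;

        SET @Rows = @@ROWCOUNT;

        COMMIT TRANSACTION;  -- readers blocked by this batch can proceed here
    END;

Between batches, waiting readers get a chance to proceed, while the per-transaction overhead is amortized over @BatchSize rows.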

The longer waits for data access can be "cured" by choosing a different transaction isolation level for the readers: for example, READ UNCOMMITTED (which has its own negative consequences, such as dirty reads) or READ COMMITTED SNAPSHOT (which increases the load on tempdb).
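
A hedged sketch of both options (SQL Server assumed; MyDatabase and dbo.Orders are hypothetical names):

    -- Option 1: per session, let a reporting query read without waiting for
    -- writers, at the cost of possible dirty reads.
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    SELECT COUNT(*) FROM dbo.Orders;

    -- Option 2: database-wide, make READ COMMITTED use row versioning so that
    -- readers do not block behind the insert transaction (versions live in tempdb).
    ALTER DATABASE MyDatabase
        SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;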