
Using DBREPLICATION when trimming databases on Microsoft SQL Server

Corporate accounting systems are characterized by a gradual increase in database volume as historical information accumulates. Over time, the database can grow so large that it causes a number of problems with performance, maintenance, available disk space and so on. Today we will consider two approaches to this problem: increasing hardware resources and trimming away historical data.



Introduction


This article deals with the problem of trimming very large databases on the MS SQL Server platform and describes a solution based on the DBREPLICATION data exchange technology from Softpoint.


The problem


Each type of accounting system shows its own specific symptoms. For example, systems on the 1C platform run into problems with routine maintenance operations such as updating the configuration or updating the 1C platform itself. As the database grows, the situation gradually worsens, and sooner or later measures have to be taken.


Approach #1: hardware


The most obvious and technically transparent solution is to increase hardware resources. This can be either the purchase of more powerful servers, disk storage and so on, or the rental of more powerful equipment in a third-party data center or cloud.


If you go this way, a good option is to host the database in the Microsoft Azure cloud. Azure provides several deployment architectures: MS SQL Server on an Azure virtual machine, and three variants of Azure SQL Database in the cloud. You can therefore choose the placement option best suited to the characteristics of a particular database and the conditions of its operation.


Azure has a number of advantages over buying your own equipment. One of the main ones is the enormous hardware capacity Azure can provide, together with a flexible approach to using that capacity depending on the actual load. For example, you can buy additional capacity for the "high season" of your business, or for the close of a reporting period, in order to get through the peaks comfortably, and use a more modest configuration the rest of the time. Thus, on the one hand, you have access to Azure's huge (and constantly growing) resource potential at the right moment, and on the other hand, you do not overpay for excess capacity when you do not need it.
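
For example, with Azure SQL Database the compute tier can be changed with a single T-SQL statement. Below is a minimal sketch; the database name and the service objectives are placeholders chosen for illustration, not values from any project described here.

```sql
-- Scale the database up for the reporting-period peak, then back down.
-- [AccountingDB], 'P4' and 'S3' are illustrative placeholders.
ALTER DATABASE [AccountingDB] MODIFY (SERVICE_OBJECTIVE = 'P4');  -- before the peak

-- ...once the peak load has passed...
ALTER DATABASE [AccountingDB] MODIFY (SERVICE_OBJECTIVE = 'S3');  -- back to a budget tier
```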


However, despite its relative simplicity, increasing hardware resources is not a universal solution. Firstly, the positive effect is often out of proportion to the financial investment (much money spent, little effect gained). Secondly, the effect is temporary: the database keeps growing and demands ever more resources and ever more investment.


In any case, this approach is perfectly viable and widely used. But we will not dwell on it further, since the main subject of this article is not the "hardware" approach but the "software" approach outlined below.


Approach #2: trimming the database


A more radical solution is trimming the database, that is, removing historical data that is no longer relevant. In a trimmed database, data remain only for a relatively short operational period, usually no more than 1-2 years. Obviously, the degree of reduction differs from case to case, and it is hard to name specific numbers. Still, as a guideline, take a reduction of 50-70%, i.e. roughly a factor of 2-3: that is what most often comes out in practice, and noticeably smaller or larger reductions are rare.
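
To make the idea concrete, here is a minimal, purely illustrative T-SQL sketch of what the core of a trimming step usually boils down to: deleting historical rows older than a cutoff date in small batches, so that individual transactions and the transaction log stay manageable. The table name, column name and cutoff date are invented for this example; a real trim of a 1C database must also generate opening balances and handle related subsystems before anything is deleted.

```sql
-- Batched deletion of historical data older than the cutoff date.
-- dbo.DocumentJournal and PostingDate are hypothetical names.
DECLARE @cutoff date = '2023-01-01';
DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (50000) FROM dbo.DocumentJournal
    WHERE PostingDate < @cutoff;

    SET @rows = @@ROWCOUNT;   -- stop when nothing is left to delete
END;
```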


We will not dwell on the resulting gain. Obviously, if the database is reduced by a factor of 2-3 or more, the performance effect is substantial and long-lasting, and additional investment in hardware can be avoided. A trimming mechanism, once developed and run in, can be reused in the future. In short, this is a highly effective solution that reliably delivers results.


The difficulty of carrying out the trim


For all its effectiveness, trimming has one very big problem, and it is not the development of the trimming mechanism itself. Yes, that development is a difficult task too, but it can be solved one way or another. The issue is different: when a database is several hundred gigabytes in size, or has crossed the terabyte boundary, it turns out to be physically quite problematic to perform the trimming operation on it. Inevitably, a whole complex of interrelated difficulties arises; let us consider them.



Thus, trimming is a very nontrivial task. It is not enough to develop a trimming mechanism; it also has to be applied. Moreover, it is often the application that becomes the bottleneck.


Note: from here on we leave aside the development of the trimming mechanism itself (in other words, the algorithm for deleting historical data), whatever tools it is built with, and focus only on applying an already implemented mechanism in production.


DBREPLICATION Solution


It would seem to be a dead end, but there is a way out. There is an effective technique that unravels the whole tangle of difficulties, removes the risks and reliably achieves the goal. It is based on the DBREPLICATION data exchange technology. The solution is suitable both for traditional infrastructure and for the Microsoft Azure cloud. Briefly, the essence is as follows.



Fig.1. Schematic diagram of database trimming using DBREPLICATION technology.




Fig.2. Variant with the database being trimmed deployed in the MS Azure cloud.




Thus, the described technique of trimming in a clone database removes all the bottlenecks. It eliminates the dependency on a maintenance window: in the clone, the trimming disturbs no one, and no one disturbs it. You can safely use the clone's resources to the maximum and devote enough attention to the final verification.
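
For completeness, here is one possible way to create the clone that will be trimmed: restore a copy-only full backup of the production database under a new name. The article does not prescribe a particular method of creating the clone, and the database, file and logical names below are placeholders.

```sql
-- Create the clone by backup/restore (names and paths are illustrative).
BACKUP DATABASE [AccountingDB]
    TO DISK = N'D:\Backup\AccountingDB.bak'
    WITH COPY_ONLY, COMPRESSION;

RESTORE DATABASE [AccountingDB_Trim]
    FROM DISK = N'D:\Backup\AccountingDB.bak'
    WITH MOVE 'AccountingDB'     TO N'E:\Data\AccountingDB_Trim.mdf',    -- logical data file name
         MOVE 'AccountingDB_log' TO N'E:\Log\AccountingDB_Trim_log.ldf'; -- logical log file name
```

After the clone has been trimmed and verified, DBREPLICATION keeps it synchronized with the production database until the moment users are switched over.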


It remains to figure out which exchange technology can be used in this scheme. Why DBReplication?


Why DBReplication?


In the described method, the key element is the exchange technology that keeps the database being trimmed synchronized with the main one. In principle, any exchange technology can be used if it satisfies three mandatory key conditions:



Example: if we are trimming a 1C database, not every exchange technology is compatible with the structure of a 1C database; in particular, classic MS transactional replication is not compatible, since it makes changes to the table structure.



Explanation: in this article we primarily mean heavily loaded databases with a high rate of data changes. A trim will last dozens of hours, maybe several days, and there may be more than one iteration of it. During this time users will make an enormous volume of changes, which many exchange technologies simply cannot keep up with. And you need not just to keep up, but to keep up with a margin to spare.



Explanation: this item may look self-evident, but we will single it out nevertheless. Namely: do not forget that the data in the source database and in the clone are not identical! This fact automatically rules out a whole class of powerful and productive technologies based on transaction log synchronization: Always On, log shipping, mirroring and so on.


In addition, the exchange technology should be effective in terms of other indicators:



Without these qualities, the exchange technology threatens to turn from the saving key element of the methodology into its "weak link", introducing serious risks and endangering the entire project. The original idea would then lose its point.


DBReplication, however, satisfies all of these requirements and ensures the success of the trimming project.


Here are the main factors that allow DBReplication to succeed at this task:



Additionally, we note another important feature:



Practical application example


A large Russian company runs an applied accounting system based on 1C 8 + MS SQL Server. The main operational database had long since passed 2 terabytes and continued to grow, while the load, both transactional and analytical, kept increasing. In the end, every reason had accumulated to trim the database. It was decided to cut off historical data up to the beginning of 2017, using the technique described here based on DBReplication.


The trimming algorithm itself was implemented mainly in T-SQL, with small inclusions of standard 1C tools. The task was complicated by the fact that not all application objects (tables) could be trimmed down to the target date: business requirements demanded that several of the largest subsystems keep their historical data in full to a depth of 5-7 years. Therefore, in terms of database size, the effect of trimming was not as large as one would like. A preliminary analysis showed that, given these objective limitations, about 33% of the initial volume would be cut off. The Customer still considered this a good result, because the gain is not only in the database volume as such, but also in the speed of individual tables: the tables of the trimmed subsystems shrank by more than 33%, from 46% to 77% (i.e. by a factor of 2-3).
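
For reference, a preliminary size analysis of this kind can be done with a query over the standard SQL Server DMVs. The query below is an illustrative sketch, not the actual analysis script used in the project: it shows how much space each table occupies, which makes it possible to estimate how much a given cutoff date would reclaim.

```sql
-- Per-table reserved space and row counts, largest tables first.
SELECT  s.name + '.' + t.name                  AS table_name,
        SUM(ps.reserved_page_count) * 8 / 1024 AS reserved_mb,
        SUM(CASE WHEN ps.index_id IN (0, 1)
                 THEN ps.row_count ELSE 0 END) AS row_count
FROM    sys.dm_db_partition_stats AS ps
JOIN    sys.tables  AS t ON t.object_id = ps.object_id
JOIN    sys.schemas AS s ON s.schema_id = t.schema_id
GROUP BY s.name, t.name
ORDER BY reserved_mb DESC;
```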


Here, in broad strokes, are some indicators and facts from the actual application of the technique. The trimming of the data itself took about 12 hours. Synchronization of the accumulated changes via DBREPLICATION took about 1 hour. One of the key points of the project was the final verification of the trimmed database, performed by the Customer's specialists. Its duration is particularly worth noting: the process took about a week. It took this long because the verification was very deep and comprehensive, involving specialists of different profiles and including the construction of a data model in an external system. All this time, the trimmed database was automatically kept in sync with the live production database through DBREPLICATION. Verification was successful, and users were switched to the trimmed database. The old database was moved to archive status, with read-only access. There was no need to keep synchronizing it, so replication was turned off.
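
Retiring the old database to a read-only archive is a standard operation; a minimal sketch is shown below. The database name is a placeholder, not the one used in the project.

```sql
-- Move the old production copy to read-only archive status.
ALTER DATABASE [AccountingDB_Old] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; -- disconnect remaining sessions
ALTER DATABASE [AccountingDB_Old] SET READ_ONLY;
ALTER DATABASE [AccountingDB_Old] SET MULTI_USER;                          -- allow read-only connections again
```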


Results of the project:


Conclusion


DBReplication is a ready-made, reliable solution that has been on the market for many years and has been proven in many projects under a variety of conditions. In this trimming technique, DBReplication fully takes over one of the key subtasks, database synchronization. Very large databases impose very strict requirements on the exchange system, and DBReplication fully satisfies them, solving its task reliably and efficiently and thereby ensuring the success of the project as a whole.


What other tasks DBREPLICATION can be used for


Its set of features and competitive advantages allows DBReplication to be used for a variety of tasks. We list them below, for reference.


Source: https://habr.com/ru/post/438902/