
Using DBREPLICATION when trimming databases on Microsoft SQL Server

Corporate accounting systems are characterized by a gradual increase in database volume as historical information accumulates. Over time, the database can grow so large that it causes a number of problems with performance, maintenance, available disk space and so on. Today we will consider two approaches to this problem: increasing hardware resources and trimming away historical data.



Introduction


This article deals with the problem of trimming very large databases on the MS SQL Server platform and describes a solution based on the DBREPLICATION data exchange technology from Softpoint.


The problem


Each type of accounting system shows its own specific symptoms. For example, systems on the 1C platform run into problems with routine maintenance operations such as updating the configuration or updating the 1C platform itself. As the database grows, the situation gradually worsens, and sooner or later measures have to be taken.


Approach #1: hardware


The most obvious and technically transparent solution is to increase hardware resources. This can be either the purchase of more powerful servers, disk storage and so on, or the rental of more powerful equipment in a third-party data center or cloud.


If you go this way, a good option is to host the database in the Microsoft Azure cloud. Azure provides several deployment architectures: MS SQL Server on an Azure virtual machine, and three variants of Azure SQL Database in the cloud. You can therefore choose the placement option best suited to the characteristics of a particular database and the conditions of its operation.


Azure has a number of advantages over buying your own equipment. One of the main ones is the enormous hardware capacity Azure can provide, together with a flexible approach to using that capacity depending on the actual load. For example, you can buy additional capacity for the "high season" of your business, or for the close of a reporting period, in order to get through the peaks comfortably, and use a more modest configuration the rest of the time. Thus, on the one hand, you have access to Azure's huge (and constantly growing) resource potential at the right moment, and on the other hand, you do not overpay for excess capacity when you do not need it.
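
For example, with Azure SQL Database the compute tier can be changed with a single T-SQL statement. Below is a minimal sketch; the database name and the service objectives are placeholders chosen for illustration, not values from any project described here.

```sql
-- Scale the database up for the reporting-period peak, then back down.
-- [AccountingDB], 'P4' and 'S3' are illustrative placeholders.
ALTER DATABASE [AccountingDB] MODIFY (SERVICE_OBJECTIVE = 'P4');  -- before the peak

-- ...once the peak load has passed...
ALTER DATABASE [AccountingDB] MODIFY (SERVICE_OBJECTIVE = 'S3');  -- back to a budget tier
```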


However, despite its relative simplicity, increasing hardware resources is not a universal solution. Firstly, the positive effect is often out of proportion to the financial investment (much money spent, little effect gained). Secondly, the effect is temporary: the database keeps growing and demands ever more resources and ever more investment.


In any case, this approach is perfectly viable and widely used. But we will not dwell on it further, since the main subject of this article is not the "hardware" approach but the "software" approach outlined below.


Approach #2: trimming the database


A more radical solution is trimming the database, that is, removing historical data that is no longer relevant. In a trimmed database, data remain only for a relatively short operational period, usually no more than 1-2 years. Obviously, the degree of reduction differs from case to case, and it is hard to name specific numbers. Still, as a guideline, take a reduction of 50-70%, i.e. roughly a factor of 2-3: that is what most often comes out in practice, and noticeably smaller or larger reductions are rare.
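
To make the idea concrete, here is a minimal, purely illustrative T-SQL sketch of what the core of a trimming step usually boils down to: deleting historical rows older than a cutoff date in small batches, so that individual transactions and the transaction log stay manageable. The table name, column name and cutoff date are invented for this example; a real trim of a 1C database must also generate opening balances and handle related subsystems before anything is deleted.

```sql
-- Batched deletion of historical data older than the cutoff date.
-- dbo.DocumentJournal and PostingDate are hypothetical names.
DECLARE @cutoff date = '2023-01-01';
DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (50000) FROM dbo.DocumentJournal
    WHERE PostingDate < @cutoff;

    SET @rows = @@ROWCOUNT;   -- stop when nothing is left to delete
END;
```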


We will not dwell on the resulting gain. Obviously, if the database is reduced by a factor of 2-3 or more, the performance effect is substantial and long-lasting, and additional investment in hardware can be avoided. A trimming mechanism, once developed and run in, can be reused in the future. In short, this is a highly effective solution that reliably delivers results.


The difficulty of carrying out the trim


For all its effectiveness, trimming has one very big problem, and it is not the development of the trimming mechanism itself. Yes, that development is a difficult task too, but it can be solved one way or another. The issue is different: when a database is several hundred gigabytes in size, or has crossed the terabyte boundary, it turns out to be physically quite problematic to perform the trimming operation on it. Inevitably, a whole complex of interrelated difficulties arises; let us consider them.



Thus, trimming is a very nontrivial task. It is not enough to develop a trimming mechanism; it also has to be applied. Moreover, it is often the application that becomes the bottleneck.


Note: from here on we leave aside the development of the trimming mechanism itself (in other words, the algorithm for deleting historical data), whatever tools it is built with, and focus only on applying an already implemented mechanism in production.


DBREPLICATION Solution


It would seem to be a dead end, but there is a way out. There is an effective technique that unravels the whole tangle of difficulties, removes the risks and reliably achieves the goal. It is based on the DBREPLICATION data exchange technology. The solution is suitable both for traditional infrastructure and for the Microsoft Azure cloud. Briefly, the essence is as follows.



Fig.1. Schematic diagram of database trimming using DBREPLICATION technology.




Fig.2. Variant with the database being trimmed deployed in the MS Azure cloud.




Thus, the described technique of trimming in a clone database removes all the bottlenecks. It eliminates the dependency on a maintenance window: in the clone, the trimming disturbs no one, and no one disturbs it. You can safely use the clone's resources to the maximum and devote enough attention to the final verification.
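
For completeness, here is one possible way to create the clone that will be trimmed: restore a copy-only full backup of the production database under a new name. The article does not prescribe a particular method of creating the clone, and the database, file and logical names below are placeholders.

```sql
-- Create the clone by backup/restore (names and paths are illustrative).
BACKUP DATABASE [AccountingDB]
    TO DISK = N'D:\Backup\AccountingDB.bak'
    WITH COPY_ONLY, COMPRESSION;

RESTORE DATABASE [AccountingDB_Trim]
    FROM DISK = N'D:\Backup\AccountingDB.bak'
    WITH MOVE 'AccountingDB'     TO N'E:\Data\AccountingDB_Trim.mdf',    -- logical data file name
         MOVE 'AccountingDB_log' TO N'E:\Log\AccountingDB_Trim_log.ldf'; -- logical log file name
```

After the clone has been trimmed and verified, DBREPLICATION keeps it synchronized with the production database until the moment users are switched over.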


It remains to figure out which exchange technology can be used in this scheme. Why DBReplication?


Why DBReplication?


In the described method, the key element is the exchange technology that keeps the database being trimmed synchronized with the main one. In principle, any exchange technology can be used if it satisfies three mandatory key conditions:



Example: if we are trimming a 1C database, not every exchange technology is compatible with the structure of a 1C database; in particular, classic MS transactional replication is not compatible, since it makes changes to the table structure.



Explanation: in this article we primarily mean heavily loaded databases with a high rate of data changes. A trim will last dozens of hours, maybe several days, and there may be more than one iteration of it. During this time users will make an enormous volume of changes, which many exchange technologies simply cannot keep up with. And you need not just to keep up, but to keep up with a margin to spare.



Explanation: this item may look self-evident, but we will single it out nevertheless. Namely: do not forget that the data in the source database and in the clone are not identical! This fact automatically rules out a whole class of powerful and productive technologies based on transaction log synchronization: Always On, log shipping, mirroring and so on.


In addition, the exchange technology should be effective in terms of other indicators:



Without these qualities, the exchange technology threatens to turn from the saving key element of the methodology into its "weak link", introducing serious risks and endangering the entire project. The original idea would then lose its point.


DBReplication, however, satisfies all of these requirements and ensures the success of the trimming project.


Here are the main factors that allow DBReplication to succeed at this task:



Additionally, we note another important feature:



Practical application example


A large Russian company runs an applied accounting system based on 1C 8 + MS SQL Server. The main operational database had long since passed 2 terabytes and continued to grow, while the load, both transactional and analytical, kept increasing. In the end, every reason had accumulated to trim the database. It was decided to cut off historical data up to the beginning of 2017, using the technique described here based on DBReplication.


The trimming algorithm itself was implemented mainly in T-SQL, with small inclusions of standard 1C tools. The task was complicated by the fact that not all application objects (tables) could be trimmed down to the target date: business requirements demanded that several of the largest subsystems keep their historical data in full to a depth of 5-7 years. Therefore, in terms of database size, the effect of trimming was not as large as one would like. A preliminary analysis showed that, given these objective limitations, about 33% of the initial volume would be cut off. The Customer still considered this a good result, because the gain is not only in the database volume as such, but also in the speed of individual tables: the tables of the trimmed subsystems shrank by more than 33%, from 46% to 77% (i.e. by a factor of 2-3).
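
For reference, a preliminary size analysis of this kind can be done with a query over the standard SQL Server DMVs. The query below is an illustrative sketch, not the actual analysis script used in the project: it shows how much space each table occupies, which makes it possible to estimate how much a given cutoff date would reclaim.

```sql
-- Per-table reserved space and row counts, largest tables first.
SELECT  s.name + '.' + t.name                  AS table_name,
        SUM(ps.reserved_page_count) * 8 / 1024 AS reserved_mb,
        SUM(CASE WHEN ps.index_id IN (0, 1)
                 THEN ps.row_count ELSE 0 END) AS row_count
FROM    sys.dm_db_partition_stats AS ps
JOIN    sys.tables  AS t ON t.object_id = ps.object_id
JOIN    sys.schemas AS s ON s.schema_id = t.schema_id
GROUP BY s.name, t.name
ORDER BY reserved_mb DESC;
```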


Here, in broad strokes, are some indicators and facts from the actual application of the technique. The trimming of the data itself took about 12 hours. Synchronization of the accumulated changes via DBREPLICATION took about 1 hour. One of the key points of the project was the final verification of the trimmed database, performed by the Customer's specialists. Its duration is particularly worth noting: the process took about a week. It took this long because the verification was very deep and comprehensive, involving specialists of different profiles and including the construction of a data model in an external system. All this time, the trimmed database was automatically kept in sync with the live production database through DBREPLICATION. Verification was successful, and users were switched to the trimmed database. The old database was moved to archive status, with read-only access. There was no need to keep synchronizing it, so replication was turned off.
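
Retiring the old database to a read-only archive is a standard operation; a minimal sketch is shown below. The database name is a placeholder, not the one used in the project.

```sql
-- Move the old production copy to read-only archive status.
ALTER DATABASE [AccountingDB_Old] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; -- disconnect remaining sessions
ALTER DATABASE [AccountingDB_Old] SET READ_ONLY;
ALTER DATABASE [AccountingDB_Old] SET MULTI_USER;                          -- allow read-only connections again
```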


Results of the project:


Conclusion


DBReplication is a ready-made, reliable solution that has been on the market for many years and has been proven in many projects under a variety of conditions. In this trimming technique, DBReplication fully takes over one of the key subtasks, database synchronization. Very large databases impose very strict requirements on the exchange system, and DBReplication fully satisfies them, solving its task reliably and efficiently and thereby ensuring the success of the project as a whole.


What other tasks DBREPLICATION can be used for


Its set of features and competitive advantages allows DBReplication to be used for a variety of tasks. We list them below, for reference.


Source: https://habr.com/ru/post/438902/