There is a legacy project built on the SQL Server Integration Services platform. It loads data from a third-party source into the analytic layer of the database. The protocol is organized so that if a load fails, the same data is offered again on the next attempt. Due to an error in the connection string, loading had not worked for a year, and now there is not enough memory to hold all of the accumulated output data.

Now the question: can the data prepared inside the Script Component be sent out in several batches, without accumulating gigabytes of buffers in memory? Simplified, the component code currently looks like this:

public override void CreateNewOutputRows()
{
    // For illustration; in reality this data arrives over WCF
    var datasource = Enumerable.Repeat(new { a = 5, b = 42 }, 2000000);
    foreach (var obj in datasource)
    {
        SomeNamedOutputBuffer.AddRow(); // On one of these iterations we run out of memory
        SomeNamedOutputBuffer.A = obj.a;
        SomeNamedOutputBuffer.B = obj.b;
    }
    SomeNamedOutputBuffer.SetEndOfRowset();
}

    2 answers

    Buffers in SSIS are managed automatically: when memory runs short, they are spooled to disk or handed off to the downstream component. So the code in the question is correct as written.

    An out-of-memory message about buffers does not necessarily mean the script failed because of them; the real problem may be elsewhere.


    For example, in my case the problem turned out to be a timeout caused by a closed port.

      A Script Source first reads all the data and then spends a long time pushing it into the output buffer. To clarify: as Microsoft documents for a Source Component, rows are passed to the next Data Flow component whenever a buffer fills up; this is not tied to the SetEndOfRowset call. The main question for you is whether WCF will let you re-read the data if the first processing attempt fails. If yes, the following options are possible:
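      The buffer-filling behavior described above can be illustrated outside of SSIS. Below is a hedged sketch in plain C# (not the real SSIS PipelineBuffer API): a bounded list stands in for the output buffer, and a "flush" stands in for handing a full buffer downstream. The point is that rows leave the component whenever the buffer fills, not only at end-of-rowset, so nothing has to accumulate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BufferDemo
{
    // Simulates an output buffer of fixed capacity: rows are handed
    // downstream each time the buffer fills, independent of end-of-rowset.
    public static int CountFlushes(IEnumerable<int> rows, int bufferCapacity)
    {
        var buffer = new List<int>(bufferCapacity);
        int flushes = 0;
        foreach (var row in rows)
        {
            buffer.Add(row);
            if (buffer.Count == bufferCapacity)
            {
                flushes++;      // a full buffer goes downstream
                buffer.Clear(); // memory is reused, nothing accumulates
            }
        }
        if (buffer.Count > 0) flushes++; // final partial buffer at end-of-rowset
        return flushes;
    }

    static void Main()
    {
        // 10 lazily generated rows, buffer of 3 -> flushed 4 times (3+3+3+1)
        Console.WriteLine(BufferDemo.CountFlushes(Enumerable.Range(0, 10), 3));
    }
}
```

      Note that the enumeration itself is lazy: peak memory is bounded by the buffer capacity, not by the total number of rows.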

      1. Assume SSIS can still chew through those gigabytes. It has its own mechanism for spooling buffers to disk, at the cost of a significant drop in performance. I would estimate the data volume first: in my experience, an SSIS buffer takes about 1.2-1.5 times the raw data size. Compare that against available memory; if the spooled volume exceeds available memory by more than a factor of 10, it is better not to rely on spooling. Important: there must be disk space for the spool files; by default they go to the temp directory of the user the package runs under.
      2. Pull the data into the Script Source in parts. Perhaps your WCF service allows that; otherwise, dump the data to disk in some form, then write a new Script that reads it back in chunks.

      If WCF lets you read the data only once, then the modification of option 2 above is to save the data to disk first and then read it from there.
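      The dump-and-reread idea might look like the following sketch (plain C#; the file path, the semicolon-separated record format, and the method names are illustrative assumptions, not part of the original answer): one pass spools the single-use source to disk, a second pass reads it back lazily in fixed-size chunks, so only one chunk is in memory at a time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class SpoolDemo
{
    // Pass 1: spool the (single-use) source to disk, one record per line.
    public static void DumpToDisk(IEnumerable<(int a, int b)> source, string path)
    {
        using var writer = new StreamWriter(path);
        foreach (var (a, b) in source)
            writer.WriteLine($"{a};{b}");
    }

    // Pass 2: read the file back lazily in chunks of chunkSize rows.
    // File.ReadLines streams the file; Chunk materializes one batch at a time.
    public static IEnumerable<(int a, int b)[]> ReadChunks(string path, int chunkSize)
    {
        return File.ReadLines(path)
            .Select(line =>
            {
                var parts = line.Split(';');
                return (int.Parse(parts[0]), int.Parse(parts[1]));
            })
            .Chunk(chunkSize); // requires .NET 6+
    }

    static void Main()
    {
        var path = Path.GetTempFileName();
        DumpToDisk(Enumerable.Range(0, 10).Select(i => (i, i * 2)), path);
        int total = 0;
        foreach (var chunk in ReadChunks(path, 4))
            total += chunk.Length;  // each chunk holds at most 4 rows
        Console.WriteLine(total);   // all 10 rows processed
        File.Delete(path);
    }
}
```

      Inside a real Script Source, each chunk would be pushed into the output buffer with AddRow before the next chunk is read.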

      • If the protocol allowed requesting data in parts, I would not be asking here. - Pavel Mayorov
      • Is this spooling mechanism something you enable, or is it on by default? - Pavel Mayorov
      • Spooling is always available; it kicks in when there is not enough RAM for the buffers. - Ferdipux
      • The phrase "it is better not to rely on spooling" should be understood to mean that in that case the package will quite likely fail with the error "could not allocate memory for the buffer". If such a situation is likely, you can experiment with the buffer size: reducing it lowers performance but increases the chances that memory allocation succeeds. - Ferdipux