I have raw data on the usage history of objects, in the form of a list of triples (object id, start time, end time):
[(1, "2012-09-20 00:00:00+04", "2012-09-20 05:00:00+04"), (1, "2012-09-20 07:30:00+04", "2012-09-20 09:25:00+04"), (2, "2012-09-20 07:00:00+04", "2012-09-20 09:15:00+04")]
That is, in this example, object 1 was used twice, from 00:00 to 05:00 and from 07:30 to 09:25, and object 2 was used once, from 07:00 to 09:15. Here the list is sorted by object id and then by ascending time, but the data can be supplied in any form; in general, it is a table in an SQL RDBMS.
As a rule, records should not overlap, but in theory they can. For example, the data may contain [(1, "…", "… 12:00:37"), (1, "… 11:59:42", "…")], and in that case we can assume that the use around 12:00 was not interrupted.
From this list I want to obtain a time series, roughly of the following form:
[("2012-09-20 00:00:00+04", {1}), ("2012-09-20 05:00:00+04", {}), ("2012-09-20 07:00:00+04", {2}), ("2012-09-20 07:30:00+04", {1,2}), ("2012-09-20 09:15:00+04", {1}), ("2012-09-20 09:25:00+04", {})]
That is, from 00:00 (the earliest date-time in the history) object 1 was in use, then from 05:00 nothing, then from 07:00 object 2, then both objects, and so on, until 09:25, where the data ends.
Please suggest a good algorithm and data structures for performing this conversion quickly. Volumes: up to 10,000 objects, over periods of varying length (a day, a week, a month, 3 months, 6 months, a year; longer periods are not of interest), with up to 100,000 records per day.
CPU time is precious; memory can be used in any amount, within physically reasonable limits.
The obvious “greedy” algorithm that steps through the time interval is no good: the algorithm should not depend directly on the length of the interval in question, only on the number of records in the history.
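To make the requirement concrete, here is a minimal sketch of one approach that appears to fit it: turn every record into a +1 event at its start and a -1 event at its end, sort the distinct timestamps, and sweep over them while keeping a per-object counter of open records. The cost is roughly O(n log n) in the number of records and does not depend on the length of the period. The function name `usage_timeline` is hypothetical, and the sketch assumes that timestamps are comparable values whose ordering is chronological (for the strings above, that means one fixed format and one UTC offset).

```python
from collections import defaultdict


def usage_timeline(records):
    """Build [(time, set_of_active_object_ids), ...] from (id, start, end) triples.

    Assumption: start/end values compare in chronological order
    (e.g. datetime objects, or strings in one fixed format and UTC offset).
    """
    # Net change in the number of open records, per timestamp and per object.
    deltas = defaultdict(lambda: defaultdict(int))
    for obj, start, end in records:
        deltas[start][obj] += 1
        deltas[end][obj] -= 1

    open_count = defaultdict(int)  # object id -> number of currently open records
    active = set()                 # objects with at least one open record
    timeline = []

    for t in sorted(deltas):
        changed = False
        for obj, delta in deltas[t].items():
            if delta == 0:
                continue  # a record ended and another began at the same instant
            before = open_count[obj]
            open_count[obj] = before + delta
            if before == 0 and open_count[obj] > 0:
                active.add(obj)
                changed = True
            elif before > 0 and open_count[obj] == 0:
                active.discard(obj)
                changed = True
        # Emit a point only when the set of active objects actually changed,
        # so overlapping records of one object do not produce extra points.
        if changed:
            timeline.append((t, frozenset(active)))
    return timeline


example = [
    (1, "2012-09-20 00:00:00+04", "2012-09-20 05:00:00+04"),
    (1, "2012-09-20 07:30:00+04", "2012-09-20 09:25:00+04"),
    (2, "2012-09-20 07:00:00+04", "2012-09-20 09:15:00+04"),
]
timeline = usage_timeline(example)
```

Because the counter only reports a change when it moves between zero and non-zero, two overlapping records of the same object collapse into one uninterrupted period. Copying the full set at every point is the simplest output but costs time proportional to the number of active objects; with thousands of simultaneously active objects it may be cheaper to emit only the added and removed ids at each timestamp.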