Hello, could you tell me how you can implement working with the same Splay tree on different streams?

The tree itself is here ( code 200 lines of the classic tree, because I added a link to pastbin )

The problem is that I do not understand:

1) what flush does here (in the synch method).

comment in code:

/* * This function will be called as a "flush", with an intention * to remove inter-thread inconsistencies. */ 

2) I cannot imagine how to simultaneously insert and search for elements in the same Splay tree in parallel in different streams, it also always pulls the nodes into the root. Moreover, it is important that the parallel Splay tree works faster than the classic ...

UPD:

I clarified the limitations of the task. The project has main.cpp :

 #include "splayset.hpp" #include "stopwatch.hpp" #include "rc4prng.hpp" #include <iostream> #include <cstdint> #include <thread> const size_t concurrency = 4, item_count = 200000, group_size = 100, group_accesses = 2000, group_count = 1000; const char rand_seed[] = "test ukolu"; void test_group (rc4prng<>& rng, splayset<uint64_t>&st) { uint64_t begin = rng.random(), mult = rng.random(); for (size_t i = 0; i < group_accesses; ++i) st.find (begin + mult * rng.random (group_size)); } int main() { splayset<uint64_t> st; rc4prng<> thread_rng[concurrency]; for (size_t tid = 0; tid < concurrency; ++tid) thread_rng[tid].load_key (rand_seed, rand_seed + 10 - tid); // :] std::thread thread[concurrency]; bpp::Stopwatch stopwatch (true); for (size_t tid = 0; tid < concurrency; ++tid) thread[tid] = std::thread ([tid, &thread_rng, &st]() { for (size_t i = 0; i < item_count; ++i) st.insert (thread_rng[tid].random()); }); for (size_t tid = 0; tid < concurrency; ++tid) thread[tid].join(); stopwatch.stop(); std::cout << stopwatch.getMiliseconds() << std::endl; stopwatch.start(); for (size_t tid = 0; tid < concurrency; ++tid) thread[tid] = std::thread ([tid, &thread_rng, &st]() { for (size_t grp = 0; grp < group_count; ++grp) test_group (thread_rng[tid], st); }); for (size_t tid = 0; tid < concurrency; ++tid) thread[tid].join(); stopwatch.stop(); std::cout << stopwatch.getMiliseconds() << std::endl; return 0; } 

I can only change the code of the tree itself (not main.cpp ), and therefore I do not have the opportunity to make a worker who will deal with the dispatch of threads externally.

I liked the @avp advice using scheduler , but I don’t know how to implement it, with the possibility of modifying only my splayset.hpp .

If you take into account my addition, is it possible to somehow solve this task at least partially effectively?

I very much hope for advice, for I am at a dead end.

Thank you in advance.

    1 answer 1

    It seems that this particular code is not suitable for parallel work.

    I would think about the modification with deferred splay() , which will be carried out by a separate stream, launched when creating the tree (or at another time? The question is open).

    For find() it's pretty obvious. We really can, by making a read - only lock, look in parallel from several threads without interfering with each other. Data on inconsistencies found by each stream in balancing, for which splay() is now called, can be placed in a queue to a stream that would carry out balancing deferred, i.e. when there are no search requests.

    If you look at insert() in the same vein, you can see that if the node is not found, the operation is always performed with the current sheet, after which splay() is splay() (i.e. it can also be deferred and executed in the same stream that splay does when searching.)

    Running erase() (or rather, parts of it after find() ) can be done in the same way.

    Thus, an architecture emerges with a separate stream of tree modification, which takes requests from the queue and executes them by setting an exclusive lock on the entire tree. Obviously, insert and erase requests that started with read-only locking, placing their data in the queue, should remove this lock and wait for the operation to complete (probably, along with their request, they should send a condition variable, the signal on which (I say in in terms of pthreads) they will wait).

    You may need a separate algorithm for optimizing the queue of requests to the "splay-stream" (this is a separate issue that will have to be carefully considered).

    Another open question here is the dispatching algorithm, which should balance the load between the read-only streams and the "splay-stream". Those. if there is enough data in the queue and a lot of threads (insert / erase operations) are waiting, then you should probably suspend the flow of new read-only operations to the tree.

    But is it possible to increase productivity in the implementation of such a plan?
    To be honest, not sure. Perhaps, for a large tree with a predominance of requests for a successful search, the gain will be.

    Update

    The morning is wiser than the evening...
    In fact, we can not bother with the creation of a real "splay-stream." This work can be performed by any existing stream before exiting search-update functions.

    Those. by calling splay() you can either add your data to the delayed action queue, or, having found that the time has come to perform all these actions, execute them within this stream. Naturally, the thread should provide all the necessary locks and wait for the completion of work with the tree by other threads.

    Perhaps a limited approach to concurrency is more appropriate for practice. Modification operations are always performed with exclusive blocking, and the search implements deferred splay .

    Additionally, the same queue of deferred actions (in reality, it can be an array of a dozen elements) can also be used as a cache by first conducting a sequential search in it.

    • I added a supplement, for I learned about some of the restrictions on the task. I hope on the basis of the supplement, you can advise me the best way out - Demolver
    • @Demolver, suggested update. Do you have a practical task or research? The fact is that on modern architectures sequential arrays are much faster than structures scattered in memory. For example, both ArrayList in Java and lists in Python are implemented by vectors. - avp
    • Practical, and the goal is to teach the splay tree to work when calling it from different threads (the tree is common to all threads). I was thinking about mutex , but blocking the whole tree for splay ? Plus, if I understand correctly, I need to make a queue of links to nodes for splay ... but if splay for the same node occurs several times but in a different order, then you need to somehow determine the order of the queue and edit it ... - Demolver
    • @Demolver, of course, the order (especially for modification operations) must be considered. However, since we are talking about practice ... look at the last 2 paragraphs of the update. If you make a cache for find, then there will be the same links once. Again, about the practice ... The gain for multi-threaded implementations can be only if the streams share the data minimally. Let's say if this happens once every millisecond (well, maybe a little bit more often), and all the rest of the time they are threshing each of their data. Otherwise, the overhead of synchronization (and volatile with memory) will eat everything. - avp