There is a task, you need to create a weighted graph in the form of an adjacency matrix. We need an int-array, which will simultaneously mean both the connection and its "price". Well, the whole joke is that (in real life) not all elements will be connected with each other, which means some memory will be idle => it will be smarter to create an array of pointers, and those elements that are not used = NULL, t . memory will not occupy ... But again, I occupy a lot of memory for addresses. What is better? In general, do pointers have "weight"?

  • 3
    > In general, do pointers have "weight"? Of course it has: 4 bytes on the x86 architect and 8 bytes on the x64 architect. On the topic: write down for yourself what operations will be carried out with this graph, "approximately" describe the algorithms that you will use for this. After that, you can "approximately" weigh what speed you need for any particular algorithm. If the number of algorithms for which the speed will be more important will be more than for the others, you can turn a blind eye to some costs. No - it means you have to find a compromise, for example: pack / unpack a graph on demand, or in portions - mega
  • those. save memory fail? except packing / unpacking ... - Djonny
  • 2
    > i.e. save memory fail? except for packing / unpacking ... I just note the relationship between "memory saving" and "work speed". Because traversal of an array of indirect references in the general case is a rather slow operation, since requires reading more pages of memory. - And in terms of memory capacity - you can safely calculate how much more efficient it will be to store pointers: there will be no benefit to x86 without additional packaging algorithms. On x64 there will be a loss 2 times . - mega
  • @Djonny, and how many elements (vertices) do you have? And how many (on average) connections at one vertex? - avp
  • 2
    Why micro-optimization? Have you already reached the limit? Make the code as little as possible dependent on the exact memory allocation strategy, implement as it seems more convenient to you, when you come up against the need for improvements, then you will improve. - VladD

4 answers 4

Knut in the first volume has an example of how such matrices can be stored, quite economically and with the ability to quickly index.

A structure is used to store one item.

 struct Node { int row; // это и следующее поле можно и исключить int col; // ниже будет описана замена. Node * next; Node * bottom; Data data; // а это одно (или несколько полей), где собственно хранятся данные. } 

Therefore, it turns out that for each element you need two additional pointers (8-16 bytes) and, possibly, another 4-8-16 bytes to speed up the work (and possibly debugging).

Indices are also not required to be stored in int . It may well be that short may be sufficient.

Data is stored as follows. Each node stores a link to the next node in the row and the next node in the column. If there is no node in the given line / column anymore to the right / below, then the link to the topmost / left one is stored, that is, we loop.

If the data is 4 bytes, and the system is 32 bit, then already when filling less than a third, there will already be a gain.

How to work with such a matrix

We get one vector which will store pointers to lists of lines. To find an element of an index, you just need to take the necessary line in the vector, and then just go through the elements until you find the right one. If you need to calculate the amount per line - it is also not difficult - you just need to get a line and sum up the elements. Zero is not, but they do not change the picture.

But if the dimensions of the matrix are large (for example, 100,000 per 100,000), then it makes sense to simply make a list of all existing rows and columns.

Summing it all up, we get this

 struct Node { int row; int col; Node * next; Node * bottom; Data data; // а это одно (или несколько полей), где собственно хранятся данные. } std::list<Node*> rows; std::list<Node*> cols; // дальше алгоритмы словесные. Node * get(int i, int j) { // Пройтись циклом по rows до нужной строки. // Если такой строки нет, значит элемент нулевой // иначе проходим по строке (используя указатель next), ищем нужный элемент. } // Добавление элемента. void add(Node * n) { // Находим нужную строку. Если строки в списке rows нет, нужно вставить. // добавляем этот элемент в эту строку. // если строка есть, то проходим циклом, что бы найти место вставки, // поправляем два указателя // аналогичное проделываем с столбцами. } // удаление аналогично. 

Maybe I modified the idea of ​​Knut a little, so I strongly recommend that you take his book and search (I think it’s easy to find titles, the first volume).

Also in the book it is recommended to create your own pool of objects, if it is assumed that the data will be modified frequently.

Once again, this algorithm will be very effective if the number of elements is significantly (hundreds of times) less than the total capacity of the matrix.

  • IMHO for the specified bottom algorithms in Node is not needed. If there are few empty lines, you can add a hash to search the line. Or (if most of the lines are filled) generally use a vector of lines If some elements have abnormally many links, then you can also hash the list of links (i.e., string) for them. - Again, the question to the author: what data do you have (how many elements, average number of connections of one element)? - avp
  • I have led the general structure. And already on the task, you can trim-cut. - KoVadim
  • It is right. In general, I correctly understood that bottom is a list of (ring) all nodes with the same col ? - avp
  • one
    Yes, right. If you need to run along the columns, they are also needed. And in some cases, maybe double-linked lists are needed. But these are special tasks. Knut, as I recall, does not have lists of rows and columns; he solves the problem of indexation there in another way. But I do not remember immediately. - KoVadim
  • 2
    next is a pointer to the next (right) element in the row , and bottom is the next (below) element in the column . - KoVadim

Two standard representations of links in a graph are 1) a matrix, where the element (i, j) indicates whether there is a connection between the i-th and j-th vertices and 2) an array of lists, where the i-th element of the array contains a list of connected vertices. If there is no connection (i, j) in the column, then the i-th list will not contain the element j.

Both approaches have pros and cons. Plus a second approach in saving memory.

  • no question about that - Djonny
  • one
    > some memory is idle => it will be smarter to create an array of pointers, and those elements that are not used will be = NULL, i.e. memory will not occupy ... But again, I occupy a lot of memory for addresses. What is better? An answer was given to the question “what’s better” when “some amount of memory is idle”. - fogbit
  • If you read more carefully, you can understand that I meant 1 of 2 options. - Djonny
  • one
    @Djonny, if you definitely want a choice between a matrix of numbers and a matrix of pointers to numbers, then the answer is obvious: The matrix of numbers. Moreover, if all the weights fit into 0..255 (or -128..127), then we can take the matrix of bytes. An even smaller range of weights can be packaged into an even more compact matrix of several bits per element. (I hope you know how to work with bats?) - avp
  • And if you express yourself more precisely, you can get answers to precisely those questions that were meant. - fogbit

This is called highly sparse matrices, considered, for example, by the Whip. There is a long book Pisanetski strictly on this topic.

    Just create a pointer to a pointer;) **