Suppose there is a database file, for example test.db.

Inside test.db there are pages (as I understand it, a page is a chunk of bits, a partition used for quick access) of, say, 8 KB each. If the database file is huge, how does the database quickly jump to the desired page (to the desired memory area)?

For example, say there is a text file with 1 million lines. How do I quickly jump to the desired line without a loop? Right now I am on the 10th line, and then, hop, I'm already at the 200,000th, and so on. In other words, how is this functionality organized so that the database can quickly switch to the desired page (or to a line in a file, if that is even possible)?

I hope you understand what I mean!

  • If the pages are, for example, 8 KB, you can use the formula page * 8 KB to quickly jump to the desired page. - Vladimir Gamalyan
  • @VladimirGamalian Well, what I don't understand is how to move the cursor within the file. If you load the bytes into RAM, then yes, but how do you move around the file itself? If the file is 10 GB, there may not be enough RAM... how do you move within the file? - bsbak
  • @user3737786 Oh, that's just some kind of fseek function (depending on your language/platform); a sketch of this is shown right after these comments. As a result (simplified), the hard disk head quickly moves to the desired sector. - Vladimir Gamalyan
  • @VladimirGamalian As far as I know, the seek function reaches the desired position in a loop, or am I wrong? - bsbak
  • @user3737786 Most likely, seek goes to the operating system, which goes to the disk driver, and the driver knows how to quickly move the head to the desired chunk given an offset (exaggerated, but that is roughly the essence of it). - Vladimir Gamalyan
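A minimal sketch of the approach from these comments, in C: the byte offset is computed from the page number and the file is positioned there with fseek, so nothing before that point is read or looped over. The file name test.db and the 8 KB page size are taken from the question; the page number is a made-up example.

```c
/* Minimal sketch: jump straight to page N of an 8 kB-paged file with
 * fseek, without reading or looping over anything before it. */
#include <stdio.h>

#define PAGE_SIZE 8192L   /* 8 kB page, as in the question */

int main(void)
{
    long page_no = 200000;            /* hypothetical target page */
    unsigned char page[PAGE_SIZE];

    FILE *f = fopen("test.db", "rb");
    if (!f)
        return 1;

    /* The offset is plain arithmetic; the OS (and ultimately the disk
     * driver) positions the file there, and only this one page is read. */
    if (fseek(f, page_no * PAGE_SIZE, SEEK_SET) == 0) {
        if (fread(page, 1, PAGE_SIZE, f) == PAGE_SIZE) {
            /* use the page ... */
        }
    }

    fclose(f);
    return 0;
}
```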

2 Answers

I can describe in general terms how PostgreSQL stores data in its files.

Postgres stores tables and indexes in separate files. The maximum file size is 1 GB (such a file is also called a segment); if a table/index grows beyond that limit, it is split into several 1 GB files. Segments are divided into pages of 8 kB (this figure, like the 1 GB limit, can be changed at compile time). When working with data, these pages are loaded into memory, and repeated reads, if needed, are served from there.
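As a rough illustration of that arithmetic (my own sketch, not PostgreSQL source code): given a page number, the segment file and the byte offset inside it follow directly from the 8 kB page size and the 1 GB segment size, so no scanning is needed to locate a page.

```c
/* Sketch only: map a page (block) number to a 1 GB segment file and a
 * byte offset inside it, using the default 8 kB page / 1 GB segment sizes. */
#include <stdio.h>

#define PAGE_SIZE     8192UL
#define SEGMENT_SIZE  (1024UL * 1024UL * 1024UL)
#define PAGES_PER_SEG (SEGMENT_SIZE / PAGE_SIZE)   /* 131072 pages per segment */

int main(void)
{
    unsigned long page_no = 200000;                      /* hypothetical page */
    unsigned long segment = page_no / PAGES_PER_SEG;     /* which 1 GB file   */
    unsigned long offset  = (page_no % PAGES_PER_SEG) * PAGE_SIZE;

    printf("page %lu lives in segment file #%lu at byte offset %lu\n",
           page_no, segment, offset);
    return 0;
}
```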

The rows themselves (tuples, in Postgres terminology) are stored in these pages as follows:

At the beginning of each page there is a header. What interests us at the moment is that it stores a pointer to the beginning of the free space in the page and a pointer to its end. After the header come the identifiers of the rows stored in the page. Each identifier is a pointer to the beginning of a row plus its size. Postgres gives each row in the database a unique identifier (CTID), which simply consists of the page number plus the row identifier within that page. This makes it quick and easy to find a row on disk given its CTID.

The data itself fills the page from the end. This reduces page fragmentation and allows rows to be packed into pages more efficiently.
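A minimal sketch of that layout in C (simplified, made-up field names, not the actual PostgreSQL structures): the header sits at the start of the page, an array of line pointers follows it, and a CTID of (page number, item number) is resolved by reading the page at page number * 8 kB and indexing into that array.

```c
/* Simplified sketch of the page layout described above; the real
 * PostgreSQL structures differ in detail. */
#include <stdint.h>

#define PAGE_SIZE 8192

typedef struct {
    uint16_t free_start;   /* offset where free space begins in the page */
    uint16_t free_end;     /* offset where free space ends               */
    /* ... other header fields omitted ...                               */
} PageHeader;

typedef struct {
    uint16_t offset;       /* where the row starts within the page       */
    uint16_t length;       /* size of the row in bytes                   */
} LinePointer;

/* Resolve the second half of a CTID (the item number) to the row's bytes,
 * given an 8 kB page buffer already read from disk at page_no * PAGE_SIZE. */
static const char *row_from_ctid(const char *page, uint16_t item_no)
{
    const LinePointer *pointers =
        (const LinePointer *)(page + sizeof(PageHeader)); /* array after header */
    return page + pointers[item_no].offset;               /* start of the row   */
}
```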

Here is an approximate graphical representation:

[image: approximate page layout, with the header and line pointers at the beginning, free space in the middle, and row data filling in from the end]

Very briefly, and omitting numerous details: in MySQL you navigate with the help of address arithmetic, i.e. you can always figure out very quickly in which memory area a given record is located. This is achieved as follows.

All fixed-length data such as INT and FLOAT is placed directly in the record structure. In addition, the record has a dynamic section, 65536 bytes in length, which holds variable-length data such as VARCHAR; NULL values are also kept there. If you create several VARCHAR columns that completely use up this 65536-byte dynamic part, MySQL will not allow you to create any more columns (for the sake of interest, try creating a UTF-8 VARCHAR of length 65536: MySQL allocates either 3 or 4 bytes per UTF-8 character, depending on the encoding). Heavy artillery like BLOB and TEXT fields is kept separately altogether; the record only stores the addresses of the dynamic section and of the BLOB/TEXT values. Therefore you can always calculate the address of a record: you simply map the data into memory, you know where the memory area begins, and you can compute the address of any row as long as you know the size of the fixed section.
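A minimal sketch of that address arithmetic (an illustration of the idea, not MySQL's storage code), assuming records whose fixed part is laid out back to back in a memory-mapped file; the column names are made up:

```c
/* Illustration only: with a known fixed record size, the N-th record's
 * address is computed directly, with no loop over the preceding rows. */
#include <stdint.h>
#include <stddef.h>

/* Hypothetical fixed part of a record: fixed-length columns only. */
typedef struct {
    int32_t  id;
    int32_t  quantity;
    float    price;
    uint32_t dynamic_offset;  /* where this row's variable-length part lives */
} FixedRecord;

/* base points at the start of the mapped data (e.g. obtained via mmap). */
static const FixedRecord *record_at(const void *base, size_t row_index)
{
    const FixedRecord *rows = (const FixedRecord *)base;
    /* address = base + row_index * sizeof(FixedRecord) */
    return &rows[row_index];
}
```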

There is an excellent book, "High Performance MySQL" by Baron Schwartz, Peter Zaitsev, Vadim Tkachenko, Jeremy D. Zawodny, Derek J. Balling, and Arjen Lentz. It has been translated into Russian and is an excellent introduction to MySQL's architecture.

  • You should have mentioned which table format this was written about, because what you describe looks like the uncompressed MyISAM storage format and does not apply at all to the compressed one, let alone InnoDB. - Mike
  • @Mike Generally speaking, MySQL consists of a core and pluggable storage engines, which is not typical for databases, so I described the basic formats that are implemented in the core. InnoDB uses a tree, and I did not want to describe it: InnoDB is complicated, and I don't think the author should model himself on it if he decided to write a simple DBMS. I wouldn't. - cheops