📜 ⬆️ ⬇️

pudge embedded database in 500 lines on golang

pudge is an embeddable key / value database written in the standard Go library.

image

I will dwell on the fundamental differences from the existing solutions.

Stateless

pudge.Set("../test/test", "Hello", "World") 

Pooj will automatically create a test database, including subdirectories, or open it. There is no need to store the state of the table and it is safe to store values ​​in multi-threaded applications. Pooj is thread safe.

Typefree

Bytes, strings, numbers, or structures can be written to the pooj. Do not worry about converting data into their binary representation.

  type Point struct { X int Y int } for i := 100; i >= 0; i-- { p := &Point{X: i, Y: i} db.Set(i, p) } var point Point db.Get(8, &point) log.Println(point) 

QuerySystem

Pooj provides the ability to extract keys in a specific order, including a selection with the indication of a limit, an indent, a sort, and a selection by prefix.

  keys, _ := db.Keys(7, 2, 0, true) 

The above code is analogous to the SQL query:

 select keys from db where key>7 order by keys asc limit 2 offset 0 

It should be noted that the sorting of keys - "lazy." On the other hand, the keys are stored in memory and it runs pretty quickly.

Parallelism

Pooj, like most modern databases, uses a non-blocking read model, but writing to a file blocks all operations. But you can create / open files on the fly, minimizing the number of locks. There is no database already opened error in puja. An example of using an http router:

 func write(c *gin.Context) { var err error group := c.Param("group") counter := c.Param("counter") db, err := pudge.Open(group, cfg) if err != nil { renderError(c, err) return } _, err = db.Counter(counter, 1) if err != nil { renderError(c, err) return } c.String(http.StatusOK, "%s", "ok") } 

Engines

Despite its small size, the pooj supports two modes of data storage. In memory and on disk. By default, the pooj stores data (values) only on disk. But if you want, you can turn on data storage in memory. In this case, they will be dropped to disk on request, or when closing the database.

Status

Pooj is used both in home projects and in production, on the chart below - the number of requests to the http server based on the pooj, and the number of requests longer than 20 ms



In this case, the pooj is enabled in full synchronization mode, and, at the time of fsync, significant (more than 20 ms) delays occur. But fortunately, there are not so many of them as a percentage.

On the project page you can find more links with examples of integrating puja into various projects.

Speed

In the benchmark repository , you can compare the pooj with other databases:

Test 1


 Number of keys: 1000000 Minimum key size: 16, maximum key size: 64 Minimum value size: 128, maximum value size: 512 Concurrency: 2 
pogreb
goleveldb
bolt
badgerdb
pudge
slowpoke
pudge (mem)
1M (Put + Get), seconds
187
38
126
34
23
23
2
1M Put, ops / sec
5336
34743
8054
33539
47298
46789
439581
1M Get, ops / sec
1782423
98406
499871
220597
499172
445783
1652069
FileSize, Mb
568
357
552
487
358
358
358

The pooj is very well balanced in terms of the ratio between the writing speed and the reading speed. Those he is not a highly specialized database optimized for reading or writing. With a high reading speed, a rather high writing speed is maintained Which, incidentally, can be further increased by parallelizing the recording in different files (as is done in the LSM Tree engines).

Links to the database used in the test:


They asked to compare with memcache and redis, but since the lion's share of time is spent on network interfaces when interacting with data DB, this is not entirely fair. Although on the other hand, the pooj benefits from multithreading, even though it writes data to disk.

Further development

Source: https://habr.com/ru/post/439216/