
Postgres in retrospect

We present a translation of the article "Looking Back at Postgres" by Joseph Hellerstein, published under the international Creative Commons Attribution License version 4.0 (CC-BY 4.0). The authors reserve the right to distribute this work on personal and corporate websites with proper attribution to the source.

Translated by Elena Indrupskaya. For my part, I would add that the "programmer who desperately wanted to build a system with multi-versioning" is apparently Vadim Mikheev, and that the "volunteers from Russia" who rewrote GiST are people we all know well.

Abstract


This is a recollection of the Postgres project, run at UC Berkeley and led by Mike Stonebraker from the mid-1980s to the mid-1990s. As one of many personal and historical recollections, this article was solicited for a book [ Bro19 ] celebrating Stonebraker's Turing Award. Its focus is therefore on Stonebraker's leadership role and his thinking about design. But Stonebraker was never a coder and stayed out of the development trenches. The Postgres codebase was the work of a team of brilliant students and, occasionally, full-time university programmers who had a little more experience (and only a little more salary) than the students. I was lucky to join that team as a student during the final years of the project. I received helpful input for this article from some of the more senior students involved in the project, but any errors or omissions are mine. If you notice any, please contact me and I will try to fix them.

1. Introduction


Postgres was Michael Stonebraker's most ambitious project: his grand effort to build a universal database system. Over the course of a decade, the project spawned more papers, PhDs, professors, and companies than any other of Stonebraker's efforts. It also covered more technical ground than any other system he built. Despite the risks inherent in such scope, Postgres also became the most successful software artifact to come out of Stonebraker's research groups, and his main contribution to open source. It is an example of a "second system" [ Bro75 ] that succeeded. At the time of this writing, more than thirty years after the project began, the open source PostgreSQL is the world's most popular independent open source database system and the fourth most popular database system overall. Meanwhile, companies spun out of Postgres have generated a total of over $2.6 billion in acquisition value. By any measure, Stonebraker's Postgres vision has had a huge, lasting resonance.

1.1. Prehistory


Stonebraker had enormous success early in his career with the Ingres research project at Berkeley [ SHWK76 ] and the subsequent startup he founded with Larry Rowe and Eugene Wong: Relational Technology, Inc. (RTI).

As RTI developed in the early 1980s, Stonebraker began working on database support for data types beyond the traditional rows and columns of Codd's original relational model (Edgar Frank Codd). A motivating example at the time was the need to support databases for computer-aided design (CAD) tools in the microelectronics industry. In a 1983 paper, Stonebraker and students Brad Rubenstein and Antonin Guttman explained how that industry needed support for "new data types such as polygons, rectangles, text strings, etc.", "efficient spatial searching", "complex integrity constraints", and "design hierarchies and multiple representations" of the same physical constructions [ SRG83 ]. Motivated by these needs, the group worked on indexing (including Guttman's R-trees for spatial indexing [ Gut84 ]) and on adding abstract data types (ADTs) to a relational database system. At the time, ADTs were a popular new construct in programming language design, first introduced by later Turing Award winner Barbara Liskov and explored in database application programming by Stonebraker's new colleague Larry Rowe. In a 1983 SIGMOD Record paper [ OFS83 ], Stonebraker and students James Ong and Dennis Fogg described an exploration of this idea in an Ingres extension called ADT-Ingres, which included many of the representational concepts that were later studied more deeply, and with better system support, in Postgres.

2. Postgres: General Information


As the name implies, Postgres is "post-Ingres": a system designed to take what Ingres could do and go beyond it. The signature feature of Postgres was its introduction of what Stonebraker ultimately called the object-relational features of a database: support for object-oriented programming concepts in the data model and declarative query language of a database system. But Stonebraker also decided to tackle in Postgres a number of other technical problems orthogonal to object support, such as active database rules, versioned data, tertiary storage, and parallelism.

Two papers were written on the design of Postgres: an early design in SIGMOD 1986 [ SR86 ] and a mid-flight description in CACM 1991 [ SK91 ]. The Postgres research project ramped down in 1992 with the founding of Stonebraker's startup Illustra, which involved Stonebraker, lead graduate student Wei Hong, and, later, chief programmer Jeff Meredith. In the list below, features mentioned in the 1986 paper are marked with an asterisk (*), and features from the 1991 paper that did not appear in the 1986 paper are marked with a dagger (†). Other tasks listed below were tackled in the system and in the research literature but do not appear in either design document. Many of these topics were considered in Postgres well before they were studied or reinvented by others; in many cases Postgres was too far ahead of its time, and interest in the topics only picked up later, in a modern context.

  1. ADT support in the database system
    • Complex objects (i.e., nested or non-first-normal-form (NF²) data)*
    • User-defined abstract data types and functions*
    • Extensible access methods for new data types*
    • Optimizer handling of queries with expensive user-defined functions
  2. Active databases and rule systems (triggers, alerts)*
    • Rules implemented as query rewrites†
    • Rules implemented as record-level triggers†
  3. Log-centric storage and recovery
    • Reduced-complexity recovery code by treating the log as data*, with non-volatile memory for commit status†
    • No-overwrite storage and time-travel queries†
  4. Support for querying data on new deep storage technologies, notably optical discs*
  5. Support for multiprocessors and custom processors*
  6. Support for a variety of language models
    • Minimal changes to the relational model, with support for declarative queries*
    • Exposing "fast path" access to internal APIs, bypassing the query language†
    • Multi-lingual support†

Below we briefly discuss Postgres's contribution on each of these fronts, and its relationship to follow-on work in computing.

2.1. ADT support in the database system


An explicit goal of Postgres was to support new object-relational features: extending database technology to deliver the benefits of both relational query processing and object-oriented programming. Over time, the object-relational ideas pioneered in Postgres became standard functionality in most modern database systems.

2.1.1. Complex objects


Quite often, data is naturally represented as nested entities or "objects". A classic example is a purchase order, which contains a nested set of products, their quantities, and their prices. Relational modeling orthodoxy dictated that such data be restructured and stored without nesting, using several flat tables of entities (orders, products) and flat relationship tables connecting them (product_in_order). The typical reason for this flattening is that it reduces duplication of data (a product would otherwise be described redundantly in many purchase orders), which in turn avoids the complexity or errors of updating all the redundant copies. But in some cases you want to keep the nested representation, because it is natural for the application (for example, a circuit layout in CAD) and updates are rare. This data modeling debate is at least as old as the relational model itself.
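As a toy illustration of the trade-off (field names invented for this sketch), here is the same purchase order in nested form and in the flattened relational form, with a join-based function that re-nests the flat tables:

```python
# A nested "complex object" view of a purchase order, versus the flat
# relational encoding that the orthodoxy prescribes. Names are invented.
nested_order = {
    "order_id": 1,
    "customer": "ACME",
    "lines": [
        {"product": "bolt", "qty": 100, "price": 0.10},
        {"product": "nut", "qty": 100, "price": 0.05},
    ],
}

# Flattened: each product is described exactly once; a relationship
# table connects orders to products.
products = {"bolt": 0.10, "nut": 0.05}                  # product -> price
orders = {1: "ACME"}                                    # order_id -> customer
product_in_order = [(1, "bolt", 100), (1, "nut", 100)]  # (order_id, product, qty)

def reconstruct(order_id):
    """Re-nest the flat tables into the complex-object form via joins."""
    return {
        "order_id": order_id,
        "customer": orders[order_id],
        "lines": [
            {"product": p, "qty": q, "price": products[p]}
            for (oid, p, q) in product_in_order if oid == order_id
        ],
    }

assert reconstruct(1) == nested_order
```

Updating a price in the flat form touches one row of `products`; in the nested form it would touch every order that mentions the product, which is the duplication argument in miniature.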

Postgres's key approach to data modeling was to have it both ways: Postgres kept tables as the "outermost" data type but allowed columns to have "complex" types, including nested tuples or tables. One of its more exotic variants, first explored in the ADT-Ingres prototype, allowed a table-typed column to be specified declaratively as a query definition: "Quel as a data type" [ SAHR84 ]. (Quel was the Ingres query language. — Translator's note.)

The "post-relational" theme of supporting both declarative queries and nested data has resurfaced over the years, often sparked by arguments over which is better. During the Postgres era of the 1980s and 1990s, groups focused on object-oriented databases picked up the idea and developed it into the OQL standard, which has since fallen out of use.

At the turn of the millennium, declarative queries over nested objects became an obsession for a segment of the database community in the form of XML databases. The resulting XQuery language (led by Don Chamberlin of SQL fame) owes a debt to the complex-object support of Postgres's Postquel language. XQuery was widely implemented in industry but never became popular with users. These ideas are being re-examined today in query language projects for the JSON data model popular in browser-based applications. Like OQL, these languages often arise as a late addition in groups that initially rejected declarative queries in favor of developer-centric programming (the "NoSQL" movement), solely out of the desire to add queries back into their systems. Meanwhile, as Postgres grew over the years (and transitioned from the Postquel query language to versions of SQL that accommodate many of the goals in question), it incorporated support for nested data such as XML and JSON into a general-purpose DBMS without requiring any significant rearchitecting. The debate flares up from time to time, but Postgres's approach of extending the relational framework with extensions for nested data has repeatedly proven to be a natural end state for all parties once the arguments subside.

2.1.2. Custom abstract data types and functions


Beyond proposing nested types, Postgres pioneered the idea of opaque, extensible ADTs that are stored in the database but not interpreted by the core system. In principle this was always part of Codd's relational model: integers and strings were traditional, but the relational model really covers any atomic data types with predicates. The challenge was to provide that mathematical flexibility in software. For queries to interpret and manipulate such objects, an application programmer needs to be able to register user-defined functions (UDFs) over these types with the system and invoke those functions in queries. User-defined aggregate (UDA) functions for summarizing collections of these objects in queries are also desirable. Postgres was innovative in fully supporting these features.

Why put such functionality in the DBMS rather than in higher-level applications? The typical answer was the significant performance benefit of shipping code to data rather than pulling data out to the code. Postgres showed that this fits quite naturally within a relational framework: it required only modest changes to the relational metadata catalog and mechanisms for invoking third-party code, while the query syntax, semantics, and system architecture all worked out simply and elegantly.
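A minimal sketch of the "ship code to data" idea, with an invented catalog and predicate: the application registers a UDF over a type the engine treats as opaque, and the engine applies it as a selection during the scan rather than shipping every row out.

```python
# Toy model of UDF registration and invocation. All names are invented;
# this is a sketch of the concept, not the Postgres implementation.
udf_catalog = {}

def register_udf(name, fn):
    """Record a user-defined function in the system catalog."""
    udf_catalog[name] = fn

def scan_with_filter(table, udf_name, *args):
    """Apply a registered UDF as a selection predicate inside the 'engine'."""
    pred = udf_catalog[udf_name]
    return [row for row in table if pred(row, *args)]

# The application registers a predicate over an opaque "polygon" type
# (here just a list of points), which the engine never interprets itself.
register_udf("overlaps_box", lambda poly, box: any(
    box[0] <= x <= box[2] and box[1] <= y <= box[3] for (x, y) in poly))

polygons = [[(0, 0), (1, 1)], [(5, 5), (6, 6)]]
matches = scan_with_filter(polygons, "overlaps_box", (0, 0, 2, 2))
```

Only rows satisfying the user's predicate ever leave the "engine", which is the performance argument for evaluating UDFs server-side.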

Postgres was slightly ahead of its time in exploring this functionality. In particular, the database research community at the time was not particularly concerned about the security implications of uploading unsafe code to the server. That only came to be seen as a problem once the technology was noticed by industry. Stonebraker commercialized Postgres in his startup Illustra, which was acquired by Informix in large part for its ability to support DataBlade extension packages, including UDFs. With its Postgres-based technology and a strong parallel-database offering, Informix became a significant threat to Oracle. Oracle invested heavily in negative marketing about the risks of Informix running unprotected user-defined C code. Some attribute the demise of Informix to this campaign, though Informix's financial fraud (and the subsequent federal indictment of its then-CEO) certainly posed bigger problems. Today, decades later, all the major database vendors support the execution of user-defined functions in one or more languages, using newer technologies to protect against server crashes and data corruption.

Meanwhile, the big-data stacks of the 2000s, including the MapReduce phenomenon that gave Stonebraker and David DeWitt such heartburn [ DS08 ], are a re-implementation of the Postgres idea of user-defined code embedded in queries. MapReduce essentially combines the software engineering ideas of Postgres with parallelism ideas from systems such as Gamma and Teradata, plus some modest innovations around mid-query restart for workloads at extreme scale. Postgres-based startups Greenplum and Aster showed around 2007 that parallelizing Postgres could produce something far more functional and practical than MapReduce for most customers, but in 2008 the market was still not ready for that technology. By 2018, nearly every big-data stack primarily serves parallel SQL workloads with UDFs, much like the design Stonebraker and the team pioneered in Postgres.

2.1.3. Extensible access methods for new data types


Relational databases evolved around the same time as B-trees in the early 1970s, and B-trees helped fuel Codd's notion of "physical data independence": a B-tree index provides a level of indirection that adaptively reorganizes physical storage without requiring changes to applications. The main limitation of B-trees and related structures was that they only support equality lookups and one-dimensional range queries. What if you have two-dimensional range queries, as are typical in mapping and CAD applications? This problem was well known at the time of Postgres, and the R-tree [ Gut84 ], developed by Antonin Guttman in Stonebraker's group, was one of the most successful new indexes designed to solve it in practice. Still, inventing an index structure does not solve the systems problem of supporting multidimensional range queries in a DBMS. Many questions arise. Can you add an access method such as R-trees to your DBMS easily? Can you teach the optimizer that this access method will be useful for certain queries? Can you get concurrency control and recovery correct? This was a very ambitious agenda item for Postgres: a software architecture problem touching most of the database engine, from the optimizer to the storage layer to the logging and recovery system. Postgres's R-trees became a powerful driver and a prime example of the elegant extensibility of its access method layer and that layer's integration with the query optimizer. Postgres showed how to register an abstractly described access method (in this case the R-tree) and how a query optimizer could recognize an abstract selection predicate (in this case a range selection) and match it to that abstractly described access method. The original work paid less attention to concurrency control: the absence of a linear order on keys made the locking techniques of B-trees inapplicable here.
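The registration-and-matching idea can be sketched abstractly (all names invented): each access method declares which classes of predicates it supports, and the "optimizer" matches a query's predicate class against the registry, falling back to a sequential scan when nothing matches.

```python
# Toy model of extensible access methods: a registry mapping each access
# method to the abstract predicate classes it can serve. Invented names;
# a concept sketch, not the Postgres catalog format.
access_methods = {}  # name -> set of supported predicate classes

def register_access_method(name, predicate_classes):
    access_methods[name] = set(predicate_classes)

def choose_access_method(predicate_class):
    """The 'optimizer': match the predicate to a registered method."""
    for name, classes in access_methods.items():
        if predicate_class in classes:
            return name
    return "sequential_scan"

register_access_method("btree", {"equality", "1d_range"})
register_access_method("rtree", {"2d_range", "overlap"})

assert choose_access_method("1d_range") == "btree"
assert choose_access_method("2d_range") == "rtree"   # B-trees cannot serve this
assert choose_access_method("regex") == "sequential_scan"
```

The point of the abstraction is that adding a new index type is a registry entry plus the index code itself; the optimizer logic never changes.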

The promise of Postgres's extensible access methods inspired one of my first research projects at the end of graduate school: Generalized Search Trees (GiST) [ HNP95 ] and the subsequent notion of indexability theory [ HKM+02 ]. I implemented GiST in Postgres in the semester after finishing my doctorate, making it even easier to add new indexing logic to Postgres. In his Berkeley PhD thesis, Marcel Kornacker solved the difficult recovery and concurrency problems raised by the extensible "template" nature of the GiST index [ KMH97 ].

Today, PostgreSQL profitably combines the original extensible access method architecture (it has B-tree, GiST, SP-GiST, and GIN indexes) with the extensibility and high-concurrency support of the Generalized Search Tree interface (GiST). GiST indexes power PostGIS, PostgreSQL's popular geographic information system extension. GIN indexes provide PostgreSQL's built-in text indexing.

2.1.4. Optimizing queries with expensive UDFs


In traditional query optimization, the goal was to minimize the volume of tuples (and hence I/O) flowing through query processing. This meant that operators that filter tuples (selections) were good to place early in the query plan, while operators that can generate new tuples (joins) should run later. As a result, query optimizers would "push" selections below joins and order them arbitrarily, focusing instead on clever optimization of joins and disk accesses. UDFs changed the picture: if you have expensive UDFs in your selections, the order in which selections run can be critical to performance. Moreover, if a UDF in a selection is truly time-consuming, it may be that the selection should run after the joins (i.e., selections should be "pulled up"). Accounting for these factors complicated the optimizer's search space. I took this problem on as my first hard task in graduate school, and it ended up being the subject of my master's work with Stonebraker at Berkeley and my PhD at Wisconsin under the guidance of Jeff Naughton, with continued input from Stonebraker. Postgres was the first DBMS to record the cost and selectivity of user-defined functions in the database catalog. We attacked the optimization problem by finding the optimal ordering of selections, and then the optimal interleaving of selections along the branches of each join tree considered during plan search. This allowed the optimizer to keep the classic System R dynamic programming architecture, with a small additional sorting cost to order expensive selections correctly.
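The selection-ordering piece can be sketched as follows. Assuming independent predicates with known per-tuple cost and selectivity, a classic result from this line of work is that sorting selections by cost/(1 − selectivity), ascending, minimizes the expected work; the sketch below checks that metric against brute force on an invented example.

```python
from itertools import permutations

# Expected cost of applying selection predicates in sequence over n
# tuples: each predicate pays its per-tuple cost on the tuples that
# survived the earlier predicates.
def expected_cost(order, n=1000.0):
    total, surviving = 0.0, n
    for cost, sel in order:
        total += surviving * cost   # pay per-tuple cost on survivors
        surviving *= sel            # selectivity thins the stream
    return total

def rank_order(preds):
    """Sort predicates by cost/(1 - selectivity), ascending."""
    return sorted(preds, key=lambda p: p[0] / (1.0 - p[1]))

preds = [(10.0, 0.9), (1.0, 0.5), (100.0, 0.1)]  # (cost, selectivity), invented
best = min(permutations(preds), key=expected_cost)
assert expected_cost(rank_order(preds)) == expected_cost(best)
```

The metric falls out of a pairwise-swap argument: predicate p1 should precede p2 exactly when c1(1 − s2) < c2(1 − s1), which is the same ordering as c/(1 − s) ascending.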

When I started graduate school, this was one of three topics that Stonebraker wrote on the whiteboard in his office as options for my dissertation. As I recall, the second was function indexing, and I cannot remember the third.

The costing of expensive functions was removed from the PostgreSQL source tree early on, largely because there were no compelling use cases for expensive user-defined functions at the time. The examples we used revolved around image processing, which finally became a mainstream data processing task around 2018. Certainly today, in an era of big data and machine learning workloads, expensive functions have become quite common, and I expect this problem to return to the fore. Once again, Postgres was well ahead of its time.

In an ironic twist, the code I wrote in graduate school was eventually removed from the PostgreSQL source tree by a young programmer named Neil Conway, who some years later began his PhD under my supervision at UC Berkeley and is now one of Stonebraker's "PhD grandchildren".

2.2. Active databases and rule systems


The Postgres project began at the tail end of the artificial intelligence community's enthusiasm for rule-based programming as a way to represent knowledge in expert systems. That line of thinking did not succeed: many believe it brought on the much-discussed "AI winter" that lasted through the 1990s.

Rule-based programming nevertheless persisted in the database community in two forms. The first was theoretical work around declarative logic programming with Datalog. This was a bugbear for Stonebraker: he seemed to genuinely hate the topic and criticized it bitterly in several community reports over the years.

Datalog has survived as a mathematical foundation for declarative languages and has, over time, found application across computer science, including software-defined networking and compilers. Datalog is declarative querying "on steroids": a fully expressive programming model. I was eventually drawn to it myself as a natural design choice, and pursued it in a variety of applied settings outside traditional database systems.
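For a taste of Datalog's flavor, here is a sketch of transitive closure, the canonical Datalog example, computed as a naive fixpoint of its two rules:

```python
# Transitive closure in Datalog:
#   path(X, Y) :- edge(X, Y).
#   path(X, Z) :- path(X, Y), edge(Y, Z).
# Below, a naive fixpoint evaluation of those rules over an invented graph.
edge = {(1, 2), (2, 3), (3, 4)}

def transitive_closure(edges):
    path = set(edges)                 # first rule: every edge is a path
    while True:
        # second rule: extend known paths by one edge
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        if derived <= path:           # fixpoint: nothing new is derivable
            return path
        path |= derived

tc = transitive_closure(edge)
assert (1, 4) in tc and (4, 1) not in tc
```

The programmer states the rules; the engine finds the fixpoint. That separation of "what" from "how" is precisely the declarative-query idea pushed to a fully expressive model.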

The second set of questions around database rules was the practical work on what ultimately came to be called active databases and database triggers, which evolved into standard functionality in relational DBMSs. Stonebraker, in his characteristic style, charged ahead on the more practical option.

Stonebraker's work on database rules began with Eric Hanson's PhD work, which initially targeted Ingres but quickly moved to the new Postgres project. It was expanded in Spyros Potamianos's PhD work on PRS2: the Postgres Rules System 2. A theme of both implementations was that rules could be implemented in two different ways. One was to interpret rules as query rewrites, reminiscent of the query rewriting for views that Stonebraker had pioneered in Ingres. In this scenario, a rule's "on condition, do action" logic becomes "when a matching query arrives, rewrite it and execute the rewritten query instead of the original". For example, a request like "append a new row to Mike's list of awards" might be rewritten as "raise Mike's salary by 10%". The other way was to implement the more natural "on condition, do action" by checking conditions at the row level using locks inside the database. When such a lock was encountered, the result was not a wait (as in traditional concurrency control) but the execution of the corresponding action.
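A toy model of the rewrite approach (the rule and all names are invented): the system consults a rewrite table before execution and runs the substituted request instead of the original.

```python
# Query rewriting as a rule implementation, in miniature. The rule maps
# an incoming request onto a different request to run in its place.
rewrite_rules = {
    # "on this request, do that instead": an invented example rule,
    # echoing "append to Mike's awards" -> "raise Mike's salary".
    ("insert", "awards", "Mike"): ("raise_salary", "Mike", 10),
}

def execute(request, salaries):
    """Rewrite the request if a rule matches, then run the result."""
    request = rewrite_rules.get(request, request)
    if request[0] == "raise_salary":
        _, who, amount = request
        salaries[who] += amount
    return salaries

result = execute(("insert", "awards", "Mike"), {"Mike": 100})
```

The row-level alternative would instead mark Mike's row and fire the action when a query touches the mark, which is where the implementation complexity described below came from.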

The row-level rules code in PRS2 was, sadly, quite hairy. A bit of searching in the Berkeley Postgres archives turned up the following comment (almost certainly by Spyros Potamianos) in the source code of Postgres version 3.1, circa 1991:

 
  * D E S C R I P T I O N: 
  * Take a deep breath and read.  If you can avoid hacking the following 
  * code (i.e. if the boss has not "volunteered" you to do this 
  * dirty job), avoid it at all costs.  Try to do something less dangerous 
  * for your (mental) health.  Go home and watch horror movies on TV. 
  * Read some Lovecraft.  Join the army.  Go and spend a few nights 
  * in the national park.  Commit suicide... 
  * What, you keep reading?  Well, then you deserve what you get. 
  * Welcome to the dark labyrinth of the tuple-level rule system, my 
  * poor hacker... 

In the end, neither the query-rewrite implementation nor the row-level locking implementation of rules was declared a "winner" in Postgres: both shipped in the released system. The rules code was eventually scrapped and rewritten in PostgreSQL, but the current source still retains the notions of both statement-level and row-level triggers.

The Postgres rules systems were very influential in their day, running neck and neck with research on IBM's Starburst and MCC's HiPAC projects. Today, triggers are part of the SQL standard and are implemented in the engines of many major databases. They are used with caution, however. One problem is that this body of work never overcame the drawbacks that led to the AI winter: the interactions within a pile of rules can become unmanageably confusing even as the rule set grows only modestly. Furthermore, trigger execution is in practice relatively slow, so deployments that need to be fast tend to avoid triggers. Still, there remains a cottage industry of work in adjacent areas such as materialized view maintenance, complex event processing, and streaming queries, each of which is in some sense an extension of ideas explored in the Postgres rules systems.

2.3. Log-based storage and recovery


Stonebraker described his Postgres storage design this way:
When considering the Postgres storage system, we were guided by a missionary zeal to do something different. All the commercial systems of the day used a write-ahead log (WAL) storage manager, and we felt that this technology was well understood. Moreover, the original Ingres prototype from the 1970s had used a similar storage manager, and we had no desire to do another implementation. [ SK91 ]
Though this sounds like pure intellectual restlessness, there were technical grounds for the work. Over the years, Stonebraker repeatedly expressed a distaste for the complex write-ahead logging schemes pioneered by IBM and Tandem for database recovery. One of his core objections was based on a software engineer's intuition: no one should rely on something that complex, especially for functionality exercised only in rare, critical scenarios after a crash.

The Postgres storage engine unified the concepts of primary storage and historical logging into a single, simple disk representation. In essence, the idea was to keep every record in the database in a linked list of versions stamped with transaction identifiers — in a sense "the log as data" or "the data as a log", depending on your point of view. The only additional metadata needed is a list of committed transaction IDs and their commit times. This greatly simplifies recovery, since there is no "translation" from a log representation back to a primary representation. It also enables temporal queries: you can run queries "as of" some point in time and access the data versions that were committed at that time. The original Postgres storage design, which reads as if Stonebraker produced it in a single creative brainstorming session, considered a number of efficiency problems and optimizations of the basic scheme, along with a back-of-the-envelope analysis of how performance might behave [ Sto87 ]. The eventual implementation in Postgres was somewhat simpler.
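A sketch of the version-chain idea (all details invented): records are append-only lists of versions stamped with transaction IDs, commit times live in a side table, and a time-travel read picks the newest version committed at or before the requested time.

```python
import itertools

# "Log as data" in miniature: no overwrite, no separate WAL. A version
# becomes visible the moment its transaction appears in the commit table.
txn_commit_time = {}           # txn id -> commit timestamp
versions = {}                  # key -> list of (txn id, value), append-only
clock = itertools.count(1)     # toy commit-time clock

def write(txn, key, value):
    versions.setdefault(key, []).append((txn, value))

def commit(txn):
    txn_commit_time[txn] = next(clock)

def read_as_of(key, ts):
    """Return the value committed most recently at or before time ts."""
    best = None
    for txn, value in versions.get(key, []):
        t = txn_commit_time.get(txn)          # uncommitted -> invisible
        if t is not None and t <= ts:
            best = (t, value) if best is None or t > best[0] else best
    return best[1] if best else None

write("t1", "salary:mike", 100); commit("t1")   # commits at time 1
write("t2", "salary:mike", 110); commit("t2")   # commits at time 2
assert read_as_of("salary:mike", 1) == 100
assert read_as_of("salary:mike", 2) == 110
```

Note that "recovery" here is trivial: an uncommitted transaction's versions are simply never visible, so a crash needs no log replay, which is the simplicity Stonebraker was after.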

Stonebraker's "radical simplicity" stance on transactional storage ran deeply against the community consensus of the time, when database vendors differentiated themselves by investing heavily in high-performance transaction processing. Back then, benchmark winners achieved performance and recoverability through well-tuned write-ahead logging systems. And once they had well-tuned write-ahead logs, vendors also began next-generation innovations such as transactional replication based on log shipping, which was hard to do in Postgres. In the end, the Postgres storage system never distinguished itself on performance, and versioning and temporal queries were later removed from PostgreSQL, replaced by the use of a write-ahead log.

Unfortunately, PostgreSQL is still not especially fast at transaction processing: its use of a write-ahead log is somewhat half-hearted. Oddly, the PostgreSQL team kept much of the overhead information stored with tuples in Postgres in order to provide multi-versioning, which was never a goal of the Berkeley Postgres project. The result is a storage system that can emulate Oracle's snapshot isolation with a fair amount of extra I/O overhead, but that follows neither Stonebraker's original vision of temporal queries nor that of simple recovery.

Mike Olson notes that his original intent was to replace the Postgres B-tree implementation with his own B-tree implementation from the Berkeley DB project, which was developed at Berkeley during the Postgres era. But Olson never found the time. Years later, when Berkeley DB gained transaction support at Sleepycat Corp., Olson tried to convince the (by then) PostgreSQL community to use it for recovery in place of the no-overwrite storage. They declined: the project had a programmer who desperately wanted to build a multi-version concurrency control (MVCC) system, and since he was willing to do the work, he won the argument.

PostgreSQL's slow storage is not intrinsic to the system. As an interesting alternative, the Greenplum fork of PostgreSQL integrated a high-performance compressed storage engine. It was designed by Matt McCline, a veteran of Jim Gray's team at Tandem. It also did not support temporal queries.

Still, the temporal query capability was interesting and remained unique. Moreover, Stonebraker's credo of designing simple software for data recovery echoes today both in NoSQL systems (which opt for replication rather than a WAL) and in main-memory databases (MMDBs, which often use multi-versioning and compressed commit logs). The idea of versioned relational databases and temporal queries is still relegated to the esoteric today, surfacing in occasional research prototypes and small open source projects. It is an idea ripe for a comeback in our era of cheap storage and continuous data streams.

2.4. Querying data on new deep storage technologies


In the middle of the Postgres project, Stonebraker signed on as a co-lead on a large "digital earth" research grant called Project Sequoia. Part of the grant proposal was to handle unprecedented volumes of digital satellite imagery, requiring up to 100 terabytes of storage — far more data than it was then reasonable to keep on magnetic disk. The core of the proposed approach was to explore having a DBMS (namely Postgres) facilitate access to near-line "tertiary" storage provided by robotic jukeboxes managing libraries of optical discs or tapes.

Several different research efforts flowed from this. One was the Inversion file system — an attempt to provide a UNIX file system abstraction on top of a relational DBMS. In a survey article for Sequoia, Stonebraker described it, in his usual dismissive style, as a "simple exercise" [ Sto95 ]. In reality, Stonebraker's student Mike Olson (the subsequent founder of Cloudera) was busy with it for several years; the end result was not entirely conclusive [ Ols93 ] and did not survive in practice.

A few years after Inversion, Bill Gates tilted at the same windmills with WinFS — an attempt to rebuild the world's most widely used file system on top of a relational database back end. WinFS shipped in developer builds of Windows but never made it to market. Gates later called it his biggest disappointment at Microsoft.

The other main research thrust on this front was incorporating tertiary storage into a more conventional relational database stack, which was the subject of Sunita Sarawagi's PhD thesis. The main theme was changing the scale at which you think about managing space (i.e., data in the storage and memory hierarchy) and time (coordinating query scheduling and caching to minimize undesirable I/O). One key problem in this work was storing large multidimensional arrays in tertiary storage and retrieving them, which echoes work on multidimensional indexing. The main ideas included partitioning an array into chunks and storing together chunks that are fetched together, as well as replicating chunks so that a given piece of data could have several physical "neighbors". A second problem was treating the disk as a cache for tertiary storage. Finally, query optimization and scheduling had to take into account the long load times of data from tertiary storage and the importance of disk cache hits. This affects both the plan the query optimizer chooses and the time at which that plan is scheduled to run.
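The chunking idea can be illustrated with a toy sketch (sizes invented): tiling a 2-D array means a small rectangular read touches few chunks, while the same cells laid out row-major across a wide array would touch many.

```python
# Sketch of array chunking for slow (tertiary) storage: a large 2-D
# array is split into square tiles, and the unit of retrieval is the
# tile. Chunk size is invented for illustration.
CHUNK = 4  # chunk edge length, in cells

def chunks_touched(row_lo, row_hi, col_lo, col_hi):
    """Which chunk coordinates does a rectangular subarray read touch?"""
    return {(r // CHUNK, c // CHUNK)
            for r in range(row_lo, row_hi)
            for c in range(col_lo, col_hi)}

# A 4x4 read aligned with the tiling touches exactly one chunk; the same
# 16 cells read as a 1x16 strip span four chunks, i.e. four slow fetches.
assert len(chunks_touched(0, 4, 0, 4)) == 1
assert len(chunks_touched(0, 1, 0, 16)) == 4
```

Co-locating chunks that are fetched together, and replicating a chunk next to several likely "neighbors", then reduces how far the robot arm (or, today, the archival tier) has to travel per query.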

Tape and optical-disk robots are not in wide use today. But tertiary-storage problems are pervasive in the cloud, which in 2018 has a deep storage hierarchy: from attached solid-state disks, to reliable disk-like storage services (e.g., AWS EBS), to archival storage (e.g., AWS S3), to deep storage (e.g., AWS Glacier). Today these storage tiers remain relatively siloed, and reasoning about storage end-to-end across the tiers enjoys little support from the database. I would not be surprised if the questions explored on this front in Postgres were revisited in the near future.

2.5. Multiprocessor support: XPRS


Stonebraker never built a large parallel database system, but he led many stimulating discussions in the area. His paper "The Case for Shared Nothing" [ Sto86 ] documented the coarse-grained architectural choices in the field. It popularized the terminology used in industry and championed shared-nothing architectures such as Gamma and Teradata, which were rediscovered in the 2000s by the big data community.

Ironically, Stonebraker's most substantial contribution to the field of parallel databases was a shared-memory architecture called XPRS, which stood for "eXtended Postgres on RAID and Sprite". In the early 1990s XPRS was the "Justice League" of Berkeley systems: it combined, in abbreviated form, Stonebraker's Postgres system, John Ousterhout's Sprite distributed OS, and Dave Patterson's and Randy Katz's RAID storage architecture. As with many multi-faculty efforts, the execution of the XPRS project was actually determined by the graduate students who worked on it. It turned out that the main contribution came from Wei Hong, who wrote his PhD dissertation on parallel query optimization in XPRS. Thus the main contribution of XPRS to the literature and to industry was parallel query optimization, with no substantial consideration of issues involving RAID or Sprite.

Of the three projects, Postgres and RAID had enormous influence on what followed. Sprite is best remembered for Mendel Rosenblum's PhD dissertation on Log-Structured File Systems (LFS), which had nothing notable to do with distributed operating systems. All three projects contained new ideas for disk storage that went beyond updating individual copies in place. LFS and the Postgres storage manager are quite similar in their novel treatment of the log as primary storage and in their need for expensive background reorganization. I once gently probed Stonebraker about rivalry between LFS and Postgres, or academic gossip about their relationship, but I never got anything interesting out of him. Perhaps someone at Berkeley at the time was stirring the pot.

In principle, parallelism "blows up" a query optimizer's plan space, multiplying the traditional choices made during query optimization (data access, join algorithms, join order) by all possible ways of parallelizing each choice. The main idea of what Stonebraker called the "Wei Hong optimizer" was to split the problem in two: run a traditional single-node query optimizer in the spirit of System R, and then "parallelize" the resulting plan by scheduling the degree of parallelism and the placement of each operator, based on the data layout and the system configuration. This approach is heuristic, but it makes parallelism add to the cost of traditional query optimization additively rather than multiplicatively.
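
The two-phase idea can be sketched in a few lines. This is a toy model, not Wei Hong's actual algorithm — the plan representation, cost numbers and the uniform worker assignment are all illustrative assumptions — but it shows why the search cost is additive: phase one searches only serial plans, and phase two is a single pass over the chosen plan:

```python
# Hedged sketch of the two-phase ("Wei Hong optimizer") idea described
# above: pick the best serial plan first, then parallelize it.
# All operator names, costs and the scheduling rule are assumptions.

def best_serial_plan(plans):
    """Phase 1: classic System R-style choice among serial plans only."""
    return min(plans, key=lambda p: p["cost"])

def parallelize(plan, workers):
    """Phase 2: one pass assigning a degree of parallelism per operator
    (a real system would also consider data placement per operator)."""
    return [{"op": op, "workers": workers} for op in plan["ops"]]

plans = [
    {"ops": ["scan(A)", "hash_join", "scan(B)"], "cost": 120},
    {"ops": ["scan(A)", "nestloop", "scan(B)"], "cost": 300},
]
serial = best_serial_plan(plans)                 # searches |plans| options
parallel_plan = parallelize(serial, workers=4)   # + one linear pass
assert serial["cost"] == 120
assert all(step["workers"] == 4 for step in parallel_plan)
```

A fully combined search would instead cost roughly |serial plans| × |parallelizations per plan| — the multiplicative blow-up the heuristic avoids.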

Although the Wei Hong optimizer was developed in the context of Postgres, it became the standard approach for many parallel query optimizers in industry.

2.6. Support for a variety of language models


Among Stonebraker's recurring interests since the Ingres days was the application programming interface (API) of a database system. In his lectures in the Database Systems series, he often included Carlo Zaniolo's GEM language as a topic that database system aficionados should understand. This interest in language undoubtedly led him to partner with Larry Rowe on Postgres, which in turn deeply influenced the design of the Postgres data model and its object-relational approach. Their work focused mostly on data-intensive commercial applications, covering both business data processing and emerging applications such as CAD/CAM and GIS.

One of the challenges pressed on Stonebraker at the time was the idea of "hiding" the boundaries between programming-language constructs and database storage. Various competing research projects and companies exploring Object-Oriented Databases (OODBs) targeted the so-called "impedance mismatch" between imperative object-oriented programming languages such as Smalltalk, C++ and Java, and the declarative relational model. The OODB idea was to let programming-language objects be optionally marked as "persistent" and handled automatically by an embedded DBMS. Postgres supported storing nested objects and abstract data types, but its interface, based on declarative relational-style queries, meant database access that was unnatural for the programmer (requiring declarative queries) and expensive (requiring parsing and optimization). To compete with the OODB vendors, Postgres offered the so-called "Fast Path" interface: essentially a C/C++ API into the internals of the database storage layer. This let Postgres post mid-range performance on academic OODB benchmarks, but it never addressed the goal of letting programmers in different languages avoid the impedance mismatch. Instead, Stonebraker labeled the Postgres model "object-relational" and simply sidestepped object-oriented databases as a "zero-billion-dollar market". Today virtually all commercial relational database systems are "object-relational" database systems.
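
The cost contrast behind the impedance-mismatch debate can be caricatured in a toy model. Neither function here is real Postgres code — the tiny "parser", the record dictionary and both names are illustrative assumptions — but it shows the structural difference: a declarative call must interpret query text on every invocation, while a Fast Path-style call reaches the stored record directly:

```python
# Toy illustration (assumption throughout) of declarative access vs.
# direct Fast Path-style access, as discussed above.

records = {1: {"name": "Ingres"}, 2: {"name": "Postgres"}}

def declarative_lookup(query_text):
    """Stands in for parse -> plan -> execute on every call.
    Naive "parser" for one query shape: SELECT name WHERE id = <n>."""
    wanted = int(query_text.rsplit("=", 1)[1])
    return [r["name"] for k, r in records.items() if k == wanted]

def fast_path_lookup(record_id):
    """Stands in for a direct C-level call into the storage layer:
    no parsing, no optimization, just a lookup."""
    return records[record_id]["name"]

assert declarative_lookup("SELECT name WHERE id = 2") == ["Postgres"]
assert fast_path_lookup(2) == "Postgres"
```

The Fast Path call is cheaper per invocation, but it exposes storage internals to the application — which is exactly why it never resolved the mismatch for programmers, only worked around it.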

This turned out to be a sound decision. Today none of the OODB products exist in their envisioned form, and the idea of "persistent objects" in programming languages has largely been abandoned. In contrast, object-relational mapping (ORM) layers are in wide use (fueled by early work such as Java Hibernate and Ruby on Rails), allowing declarative databases to be fitted relatively smoothly, as libraries, to almost any imperative object-oriented programming language. This application-level approach differs from both OODBs and Stonebraker's object-relational databases. In addition, lightweight key-value stores have also succeeded, in both non-transactional and transactional forms. These were pioneered by Stonebraker's PhD student Margo Seltzer, who worked on Berkeley DB as part of her PhD dissertation at the same time as the Postgres group, anticipating the rise of distributed NoSQL key-value stores such as Dynamo, MongoDB and Cassandra.

3. Software impact


3.1. Open source


Postgres was always an open source project with steady releases, but for a long time it was aimed at use in research, not in production.

As the Postgres research project wound down, two of Stonebraker's students, Andrew Yu and Jolly Chen, modified the system's parser to replace the original Postquel language with an extensible version of SQL. The first Postgres release to support SQL was Postgres95; the next was named PostgreSQL.

A team of open source developers became interested in PostgreSQL and "adopted" it even as the interests of the rest of the Berkeley team moved on. The core group of PostgreSQL developers has remained relatively stable over time, and the open source project has become highly mature. Early efforts focused on code stability and user-visible functionality, but over time the open source community significantly changed and improved the core of the system, from the optimizer to the access methods and the underlying transaction and storage system. Since the mid-1990s, very little of PostgreSQL's internals has come from the academic group at Berkeley. Its last contribution was probably my GiST implementation in the second half of the 1990s, and even that was substantially rewritten and cleaned up by volunteers from the open source community (in this case, from Russia). The part of the open source community that works on PostgreSQL deserves the greatest credit for running an orderly process that has sustained a highly capable, long-lived project for decades.

Although much has changed in 25 years, the basic architecture of PostgreSQL remains very similar to that of the university Postgres releases of the early 1990s, and developers familiar with the current PostgreSQL source code would have little trouble reading the Postgres 3.1 source code (1991). Everything from the source directory structure to the process structure to the data structures remains remarkably similar. The code from the Berkeley Postgres team had excellent bones.

Today, PostgreSQL is without question the most fully featured open source database, supporting functionality that is often missing from commercial products. It is also (according to one influential ranking site) the most widely used independent open source database in the world, and its influence continues to grow: in 2017 and 2018 it was the fastest-growing database in popularity worldwide [ DE19c ]. PostgreSQL is used across a wide variety of industries and applications, which is perhaps unsurprising given its breadth of functionality.

According to DB-Engines, PostgreSQL is today the fourth most popular DBMS in the world, after Oracle, MySQL and MS SQL Server, all three of which are backed by specific companies (MySQL was acquired by Oracle many years ago) [ DE19a ]. The ranking rules are discussed in the description of the DB-Engines ranking methodology [ DE19b ].

Heroku is a cloud SaaS provider that is now part of Salesforce. Heroku adopted PostgreSQL in 2010 as the default database for its platform, choosing it for its reliability. With Heroku's support, larger application development frameworks such as Ruby on Rails and Django for Python began recommending PostgreSQL as their default database.

Today, PostgreSQL supports an extension infrastructure that makes it easy to add features to the system via user-defined functions and related modifications. There is now an ecosystem of PostgreSQL extensions, akin to the concept of Illustra's DataBlade extension packages, but in open source. Among the most interesting extensions are the Apache MADlib library for machine learning in SQL and the Citus library for parallel query execution.

One of the most interesting open source applications built on PostgreSQL is the PostGIS geographic information system, which exploits many of the Postgres features that originally inspired Stonebraker to launch the project.

3.2. Commercial adoption


PostgreSQL has long been an attractive starting point for building commercial database systems, given its permissive open source license, robust codebase, flexibility and breadth of functionality. Summing the acquisition prices listed below, we see that Postgres has generated over $2.6 billion in acquisition value.

Note that this is a measure of real financial transactions in dollars, and is far more meaningful than the valuations often quoted in high tech. Billion-dollar figures are frequently used to describe the assessed value of equity stakes, but these are often inflated by a factor of 10 or more over present value in the hope of future value. The dollars of an acquisition measure a company's actual market value at the time of purchase. So it is fair to say that Postgres has created over $2.6 billion in real commercial value.

Many of the commercial efforts around PostgreSQL have focused on what is probably its main limitation: the ability to scale out to a shared-nothing parallel architecture.

Parallelizing PostgreSQL requires a fair amount of work, but it is quite doable for an experienced team. Today, commercial forks of open source PostgreSQL, such as Greenplum and CitusDB, provide this capability. It is a pity that PostgreSQL was not properly parallelized in open source much earlier. If PostgreSQL had been extended in open source in the early 2000s to support a shared-nothing architecture, it is possible that the open source big data movement would have evolved quite differently and more effectively.

  1. Illustra was Stonebraker's second major startup, founded in 1992 to commercialize Postgres, after RTI had brought Ingres to market.

    Illustra was actually the third name proposed for the company. Continuing the painting theme set by the name Ingres, Illustra was originally called Miro. Due to trademark issues the name was changed to Montage, but that too ran into trademark problems.

    The founding team included some of the core Postgres team, including recent graduate Wei Hong and then-chief programmer Jeff Meredith, as well as Ingres alumni Paula Hawthorn and Michael Ubell. Postgres undergraduate Mike Olson joined shortly after the founding, and I worked at Illustra on optimizing expensive functions as part of my PhD work. There were three major efforts at Illustra: extending SQL92 to support user-defined types and functions in the manner of Postquel, making the Postgres codebase robust enough for commercial use, and seeding the market for extensible database servers with example DataBlade extensions — specialized plug-in packages of data types and functions. Illustra was acquired by Informix in 1997 for an estimated $400 million [ Mon96 ], and its DataBlade architecture was integrated into the more mature Informix query-processing code as Informix Universal Server.
  2. Netezza was a startup founded in 1999 that forked the PostgreSQL code to build a high-performance parallel query-processing engine on custom FPGA-based hardware. Netezza was a fairly successful independent company that held its IPO in 2007. It was eventually acquired by IBM for $1.7 billion [ IBM10 ].
  3. Greenplum was the first attempt to offer a shared-nothing, horizontally scalable parallel version of PostgreSQL. Founded in 2003, Greenplum forked from the public PostgreSQL distribution but largely preserved the PostgreSQL API, including the API for user-defined functions. In addition to parallelization, Greenplum extended PostgreSQL with an alternative high-performance compressed column store and a parallel query optimizer called Orca. Greenplum was acquired by EMC in 2010 for $300 million [ Mal10 ], and in 2012 EMC folded Greenplum into its subsidiary Pivotal. In 2015, Pivotal released Greenplum and Orca back into open source. One achievement built on Greenplum's Postgres APIs was the MADlib library for machine learning in SQL [ HRS + 12 ]. MADlib lives on today as an Apache project. Another interesting open source project based on Greenplum is Apache HAWQ, developed by Pivotal, which runs Greenplum's "upper half" (i.e., the parallel query processor and the PostgreSQL extensibility APIs) over big data storage such as the Hadoop file system.
  4. EnterpriseDB was founded in 2004 as an open source business selling PostgreSQL in both a basic and an enhanced edition, with related services for enterprise customers. A key feature of the enhanced EnterpriseDB Advanced Server is database compatibility with Oracle, enabling the migration of applications off Oracle.
  5. Aster Data was founded in 2005 by two Stanford students to build a parallel analytics engine. Its core single-node engine was based on PostgreSQL. Aster focused on graph queries and analytics packages based on user-defined functions that could be programmed via SQL or MapReduce interfaces. Aster Data was acquired by Teradata in 2011 for $263 million [ Sho11 ]. Although Teradata never integrated Aster into its core parallel database engine, it still supports Aster as a standalone product for use cases outside Teradata's core data-warehousing market.
  6. ParAccel was founded in 2006, selling a shared-nothing parallel version of PostgreSQL with column storage. ParAccel extended the Postgres optimizer with new heuristics for queries with many joins. In 2011, Amazon invested in ParAccel, and in 2012 it announced AWS Redshift, a public-cloud data-warehouse-as-a-service based on ParAccel technology. In 2013, ParAccel was acquired by Actian (which had also acquired Ingres) for an undisclosed sum, which suggests the price was not material for Actian. Meanwhile, AWS Redshift has been a huge success for Amazon — for many years it was Amazon's fastest-growing data-warehousing service, and many believe it is poised to put long-standing data-warehousing products such as Teradata and Oracle Exadata out of business. In this sense, Postgres may achieve its ultimate dominance in the cloud.
  7. CitusDB (CitusDB is the name of the DBMS; the company is called Citus Data — Transl.) was founded in 2010 to offer a shared-nothing parallel implementation of PostgreSQL. Although it began as a PostgreSQL fork, since 2016 CitusDB has been implemented via PostgreSQL's public extension APIs and can be installed into a stock PostgreSQL installation. Since 2016, the CitusDB extensions have been available in open source.

4. Lessons


There is a lot to learn from the success of Postgres, and some of the lessons challenge conventional wisdom.

The highest-order lesson I draw is that Postgres defied Fred Brooks' "Second System Effect" [ Bro75 ]. Brooks argued that after a successful first system, designers often build a second one that fails because it is overloaded with features and ideas. Postgres was Stonebraker's second system, and it was certainly chock-full of features and ideas. Yet the system successfully prototyped many of those ideas, while delivering a software infrastructure that carried a number of them through to a successful conclusion. This was no accident — at its core, Postgres was designed for extensibility, and that design was well thought out. With extensibility as an architectural core, it became possible to be creative and worry less about scope: you could try out different extensions and let the strongest win. Done well, the "second system" is not doomed; it benefits from the confidence, pet projects and ambitions developed during the first system. This is an early architectural lesson from the more "server-oriented" school of database research, one that challenges the well-established wisdom of the "component-oriented" school of operating systems research.

Another lesson is that an emphasis on generality — "one size fits all" — can be a winning approach for both research and practice. However, the MIT-era Stonebraker (in 2001 Stonebraker took a position as professor of computer science at the Massachusetts Institute of Technology (MIT) — Transl.) made a stir in the database world in the early 2000s with the thesis that "one size does not fit all". Under that banner he launched a flotilla of important projects and startups, but none of them has had impact on the scale of Postgres. It seems the Berkeley-era Stonebraker challenges the later experience of the MIT-era Stonebraker, and I see no problem with that.

As Ralph Waldo Emerson said, "a foolish consistency is the hobgoblin of little minds."

Of course, there is wisdom in the "one size does not fit all" motto (you can always find modest markets for non-standard solutions), but the success of Berkeley Stonebraker's signature system, far beyond its original goals, shows that the overwhelming majority of database problems can be solved well by a good general-purpose architecture. Moreover, the design of such an architecture is itself a technical challenge and achievement. In the end, as in most scientific and engineering debates, there is more than one good way to do things. Both Stonebrakers have something to teach us. But by nature I remain a fan of the broader agenda that the Berkeley-era Stonebraker pursued.

The last lesson I draw from Postgres is the unpredictable potential of releasing your research as open source. In his Turing lecture, Stonebraker speaks of the "serendipity" of PostgreSQL becoming a success in open source, largely thanks to people outside Stonebraker's own circle. Here is a quote that sounds remarkably modest:
A team of volunteers who picked up the project, none of whom have anything to do with me or Berkeley, have been shepherding this open source system ever since 1995. The Postgres system you get off the Internet is the work of that team. This is open source at its best, and I just want to mention that I have nothing to do with it, and that is a group of people to whom we all owe a huge debt. [ Sto14 ]
I am sure all of us who have written open source software would like such "serendipity" to come our way. But it is not only a matter of serendipity: its source is undoubtedly rooted in the ambition, breadth and foresight Stonebraker invested in the project, and in the team he oversaw in building the Postgres prototype. If there is a lesson here, it may be this: "do something important and set it free". It seems to me (learning from Stonebraker) that you cannot skip either part of that lesson.

5. Acknowledgments


I am grateful to my old Postgres buddies Wei Hong, Jeff Meredith and Mike Olson for their memories and input, and to Craig Kerstiens for input on modern PostgreSQL.

References


Source: https://habr.com/ru/post/438890/