The question is purely theoretical.

How can one store information about documents with different structures in a database (MySQL, PostgreSQL) in such a way that, once the system is in production, the field structure does not have to change when new document types appear?

For example, during development there is the following sample document:

ΠžΡ€Π³Π°Π½ΠΈΠ·Π°Ρ†ΠΈΡ: Π ΠΎΠ³Π° ΠΈ ΠšΠΎΠΏΡ‹Ρ‚Π° ООО Π”Π°Ρ‚Π°: 24.10.2011 Π’Π°Π»ΡŽΡ‚Π°: RUB Π‘ΡƒΠΌΠΌΠ°: 10000,00 

After the code went into production, a new type of document appeared that also needs to be stored in the database, even though it has almost no logical connection to the first:

 ΠžΡ€Π³Π°Π½ΠΈΠ·Π°Ρ†ΠΈΡ: Ромашка ООО Π”Π°Ρ‚Π°: 25.10.2011 НаимСнованиС Π•Π΄ ΠΈΠ·ΠΌ Π¦Π΅Π½Π° Π’ΠΎΠ²Π°Ρ€ 1 ΡˆΡ‚. 5000,00 Π’ΠΎΠ²Π°Ρ€ 2 Π³. 1000,00 

And a month later yet another document needs to be stored:

 Date:    31.12.2011
 Amount:  1000000.00
 Purpose: New Year bonus

In other words, we need some kind of pattern for universal document storage. Simply putting XML in a field is not suitable: the data will need to be analyzed later, that is, pulled into various reports with SQL queries.

    5 answers

    Perhaps not the best solution, but still:

    A table for describing the document types:

     Π’ΠΈΠΏΡ‹Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (Код, НаимСнованиС, ...)

    A table for specific document instances and their common attributes (number, date, comment, posted / not posted, etc.):

     ЭкзСмплярыДокумСнтов (Код, Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°, ...common fields...)

    The list of possible fields in the documents:

     ΠŸΠΎΠ»ΡΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (Код, НаимСнованиС, Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°, Π’ΠΈΠΏΠ”Π°Π½Π½Ρ‹Ρ…) 

    Binding fields to document types:

     ΠŸΠΎΠ»ΡΠŸΠΎΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°ΠΌ (Код, Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°, ΠŸΠΎΠ»Π΅Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°) 

    Field values in documents:

     ЗначСния (Код, ЭкзСмплярДокумСнта, ΠŸΠΎΠ»Π΅Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°, Π—Π½Π°Ρ‡Π΅Π½ΠΈΠ΅) 

    The result is something like a key-value store.
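    A rough DDL sketch of this schema (PostgreSQL syntax; the column types are assumptions, since only the column names are given above):

     create table Π’ΠΈΠΏΡ‹Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (
         Код          serial primary key,
         НаимСнованиС varchar(255) not null
     );

     create table ЭкзСмплярыДокумСнтов (
         Код           serial primary key,
         Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°  int not null references Π’ΠΈΠΏΡ‹Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (Код),
         Π”Π°Ρ‚Π°          date
         -- ...other common fields...
     );

     create table ΠŸΠΎΠ»ΡΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (
         Код           serial primary key,
         НаимСнованиС  varchar(255) not null,
         Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°  int references Π’ΠΈΠΏΡ‹Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (Код),
         Π’ΠΈΠΏΠ”Π°Π½Π½Ρ‹Ρ…    varchar(50)   -- e.g. 'string', 'number', 'date'
     );

     create table ΠŸΠΎΠ»ΡΠŸΠΎΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°ΠΌ (
         Код            serial primary key,
         Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°   int not null references Π’ΠΈΠΏΡ‹Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (Код),
         ΠŸΠΎΠ»Π΅Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°  int not null references ΠŸΠΎΠ»ΡΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (Код)
     );

     create table ЗначСния (
         Код                 serial primary key,
         ЭкзСмплярДокумСнта  int not null references ЭкзСмплярыДокумСнтов (Код),
         ΠŸΠΎΠ»Π΅Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°      int not null references ΠŸΠΎΠ»ΡΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (Код),
         Π—Π½Π°Ρ‡Π΅Π½ΠΈΠ΅           text    -- everything stored as text, cast on read
     );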

    You can get all the fields of a specific document like this:

     select ΠΏΠ΄.НаимСнованиС, Π·.Π—Π½Π°Ρ‡Π΅Π½ΠΈΠ΅
     from ЗначСния Π·
     left join ΠŸΠΎΠ»ΡΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² ΠΏΠ΄ on ΠΏΠ΄.Код = Π·.ΠŸΠΎΠ»Π΅Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°
     where Π·.ЭкзСмплярДокумСнта = <ΠΊΠΎΠ΄_Π½ΡƒΠΆΠ½ΠΎΠ³ΠΎ_Π΄ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°>

    Sketched hastily, but I think you get the idea.
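    A note on the "analyze with SQL later" requirement from the question: with this layout, reports that need fields as columns are usually built with conditional aggregation. This pivot trick is not part of the answer above, and the field names here are taken from the question's first sample document:

     select Π·.ЭкзСмплярДокумСнта as doc,
            max(case when ΠΏΠ΄.НаимСнованиС = 'Organization' then Π·.Π—Π½Π°Ρ‡Π΅Π½ΠΈΠ΅ end) as organization,
            max(case when ΠΏΠ΄.НаимСнованиС = 'Date'         then Π·.Π—Π½Π°Ρ‡Π΅Π½ΠΈΠ΅ end) as doc_date,
            max(case when ΠΏΠ΄.НаимСнованиС = 'Amount'       then Π·.Π—Π½Π°Ρ‡Π΅Π½ΠΈΠ΅ end) as amount
     from ЗначСния Π·
     join ΠŸΠΎΠ»ΡΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² ΠΏΠ΄ on ΠΏΠ΄.Код = Π·.ΠŸΠΎΠ»Π΅Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°
     group by Π·.ЭкзСмплярДокумСнта;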

    • It works slowly. - Modus
    • Do you have other options? - Nofate ♦
    • > It works slowly. It works for us, I don’t complain) Moreover, in the same table there is a history of the field values ​​and a sample request more cumbersome. Scope: more than a hundred types of documents, about 900 possible fields, 1-50 fields in the document, thousands of documents in the database. - Nofate ♦
    • Thanks for the example of their real practice - dsh

    There is also this approach:

    1. A table is created with all columns that may be needed, such as int1, int2, string1, string2, date1 ...
    2. A document type table and its associated column table are created, containing the document type identifier, the external column name, and the column name in the general column table.
    3. A generic table of documents is created, containing the document identifier and the document type identifier.

    After that, documents can be queried via a subquery (or view) assembled from the document-type table and its column-mapping table, as sketched below.
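    A rough sketch of this approach (all names and types here are illustrative, not prescribed by the answer):

     -- 1. One wide table with generic columns of every needed type
     create table doc_data (
         doc_id  int primary key,
         int1    int,
         int2    int,
         string1 varchar(255),
         string2 varchar(255),
         date1   date,
         num1    numeric(15,2)
     );

     -- 2. Document types and the mapping of logical fields to physical columns
     create table doc_types (
         type_id   serial primary key,
         type_name varchar(255)
     );

     create table doc_columns (
         type_id     int references doc_types (type_id),
         field_name  varchar(255),  -- external (logical) column name
         column_name varchar(63)    -- physical column in doc_data
     );

     -- 3. The generic documents table
     create table docs (
         doc_id  serial primary key,
         type_id int references doc_types (type_id)
     );

     -- A per-type view then exposes the logical names, e.g. for the first
     -- sample document from the question (type code 1 is hypothetical):
     create view payments as
     select d.doc_id,
            dd.string1 as organization,
            dd.date1   as doc_date,
            dd.string2 as currency,
            dd.num1    as amount
     from docs d
     join doc_data dd on dd.doc_id = d.doc_id
     where d.type_id = 1;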

    In principle, it is also worth studying the system views that expose the structure of database objects, so that the physical structure of the database can be managed programmatically.
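    For example, the current physical structure can be read from the standard information_schema views, which exist in both MySQL and PostgreSQL (the table name is taken from the sketch above):

     select column_name, data_type, is_nullable
     from information_schema.columns
     where table_name = 'doc_data'
     order by ordinal_position;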

    • This method has a weak point: the number of columns in the table is limited. - For PgSQL it is 250-1600, depending on the type. - For MyISAM - 65 KB per line or 4096 columns (which is the end, if you keep a lot of long lines, then we rest on the limit on the volume). - For InnoDB - 1000 columns. - Nofate ♦

    The task reduces to storing semi-structured data.

    In PostgreSQL, pay attention to the JSONB data type (http://www.postgresql.org/docs/9.4/static/datatype-json.html).

    Here is a good presentation on this topic: http://www.sai.msu.su/~megera/postgres/talks/RIT-Bartunov-Korotkov-2014.pdf
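    A minimal sketch of what that could look like (the table and key names are made up here; needs PostgreSQL 9.4+), using the second sample document from the question:

     create table documents (
         id  serial primary key,
         doc jsonb not null
     );

     insert into documents (doc) values (
       '{"type": "invoice", "organization": "Ромашка ООО", "date": "2011-10-25",
         "items": [{"name": "Item 1", "unit": "pcs.", "price": 5000.00},
                   {"name": "Item 2", "unit": "g",    "price": 1000.00}]}'
     );

     -- Ordinary SQL over the document fields:
     select doc ->> 'organization' as organization,
            (doc ->> 'date')::date as doc_date
     from documents
     where doc ->> 'type' = 'invoice';

     -- A GIN index makes containment queries (@>) fast:
     create index documents_doc_idx on documents using gin (doc);

     select * from documents
     where doc @> '{"organization": "Ромашка ООО"}';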

    • Try to write more detailed answers. Explain what is the basis of your statement? Add to the answer the minimum required solution example (information on the link can be deleted and the answer will lose value). - Nicolas Chabanovsky ♦

    If pure SQL, then a table with:

    1. record number
    2. value type
    3. value

    Example:

     (1, Organization, Ромашка ООО)
     (1, Date, 25.10.2011)
     (1, Item 1, 5000.00)
     (1, Item 2, 1000.00)
     (1, Item 1 unit, pcs.)
     (1, Item 2 unit, g)
     (2, Date, 31.12.2011)
     (2, Amount, 1000000.00)
     (2, Purpose, New Year bonus)

    Go for it)
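    A sketch of this minimal variant (the names are mine; everything lands in one text column, so values must be cast when analyzed), loaded with the bonus document from the example above:

     create table doc_values (
         record_no  int,           -- document (record) number
         value_type varchar(255),  -- attribute name
         value      text
     );

     insert into doc_values values
         (2, 'Date',    '31.12.2011'),
         (2, 'Amount',  '1000000.00'),
         (2, 'Purpose', 'New Year bonus');

     -- Reassemble one document:
     select value_type, value
     from doc_values
     where record_no = 2;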

    • In general, I agree with the conditional types of documents 1 and 3, but is there a question in the field VALUE will be text? And with documents of type 2 it is necessary to keep the table and not just Type = Value - dsh
    • You need something like EXCEL with an arbitrary number of rows and columns and not necessarily in the same table - dsh
    • the value field may be completely absent in this table, it can be stored depending on the type in another table with a link to it, or some data types can be stored with a link to a separate table. - Vladimir Klykov
    • but honestly I advise you to abandon such an idea, and to modify the structure of the table. if necessary, the crap synchronization is greater, but the increase in processing speed is more significant - Vladimir Klykov

    If we drop the condition that the tables in the database must not change (and who set that condition, anyway?), then a possible solution is to generate queries from metadata.
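    A sketch of what generating DDL from metadata could look like in PostgreSQL (the metadata table and its contents are hypothetical; note that field_type is spliced into the DDL as-is, so the metadata must be trusted):

     create table meta_fields (
         type_name  varchar(63),
         field_name varchar(63),
         field_type varchar(63)   -- an SQL type name, e.g. 'numeric(15,2)'
     );

     insert into meta_fields values
         ('bonus', 'doc_date', 'date'),
         ('bonus', 'amount',   'numeric(15,2)'),
         ('bonus', 'purpose',  'text');

     -- Build and execute CREATE TABLE from the metadata
     do $$
     declare
         ddl text;
     begin
         select 'create table doc_bonus ('
                || string_agg(format('%I %s', field_name, field_type), ', ')
                || ')'
         into ddl
         from meta_fields
         where type_name = 'bonus';

         execute ddl;
     end $$;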

    • This is due to the fact that the program has several geographically separated local databases (the transmission channels given are unstable) that exchange with each other while changing the structure of the tables in one of the databases, synchronization between the others is necessary. This stops the exchange. It turns out something like a distributed database in 1C when you need to update the configuration when changing metadata and there is simply no professional in remote branch offices - dsh
    • Such questions: what is the need to synchronize the structure? If the databases are fully synchronized, will the bad channel synchronize data even more? questions can be considered rhetorical - renegator
    • The method of storing data in the database is in no way connected with the need to exchange synchronization data over unreliable channels. If it is not assumed that the user can change the database structure, and its change occurs only at the design stage, then it is definitely necessary to study the metadata views and work with the changing physical structure of the database. - Modus
    • DSN, I do not know what you have developed a client application, but try to look in the direction of serialization. General parameters for all documents (No., date, author ...) you write in the database, and you implement specific parameters as a class object that you serialize and write to the database (blob - field). Those. table structure: id, date_s, author, doc_object. I met a similar implementation in one of the old journals β€œComputers + Programs” - Vyacheslav Kirichenko
    • This is good if you do not need to make requests for non-standard attributes of documents. - Modus