The question is purely theoretical.

How can one store information about documents with different structures in a database (MySQL, PostgreSQL) in such a way that, once the system is in production, the field structure does not have to change when new document types appear?

For example, during development there is the following sample document:

ΠžΡ€Π³Π°Π½ΠΈΠ·Π°Ρ†ΠΈΡ: Π ΠΎΠ³Π° ΠΈ ΠšΠΎΠΏΡ‹Ρ‚Π° ООО Π”Π°Ρ‚Π°: 24.10.2011 Π’Π°Π»ΡŽΡ‚Π°: RUB Π‘ΡƒΠΌΠΌΠ°: 10000,00 

After the code went into production, a new type of document appeared that also needs to be stored in the database, even though it has almost no logical connection to the first:

 ΠžΡ€Π³Π°Π½ΠΈΠ·Π°Ρ†ΠΈΡ: Ромашка ООО Π”Π°Ρ‚Π°: 25.10.2011 НаимСнованиС Π•Π΄ ΠΈΠ·ΠΌ Π¦Π΅Π½Π° Π’ΠΎΠ²Π°Ρ€ 1 ΡˆΡ‚. 5000,00 Π’ΠΎΠ²Π°Ρ€ 2 Π³. 1000,00 

And a month later yet another document needs to be stored:

 Date:    31.12.2011
 Amount:  1000000.00
 Purpose: New Year bonus

In other words, we need some kind of pattern for universal document storage. Simply putting XML in a field is not suitable: the data will need to be analyzed later, that is, pulled into various reports with SQL queries.

    5 answers

    Perhaps not the best solution, but still:

    A table for describing the document types:

     Π’ΠΈΠΏΡ‹Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (Код, НаимСнованиС, ...)

    A table for specific document instances and their common attributes (number, date, comment, posted / not posted, etc.):

     ЭкзСмплярыДокумСнтов (Код, Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°, ...common fields...)

    The list of possible fields in the documents:

     ΠŸΠΎΠ»ΡΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (Код, НаимСнованиС, Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°, Π’ΠΈΠΏΠ”Π°Π½Π½Ρ‹Ρ…) 

    Binding fields to document types:

     ΠŸΠΎΠ»ΡΠŸΠΎΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°ΠΌ (Код, Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°, ΠŸΠΎΠ»Π΅Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°) 

    Field values in documents:

     ЗначСния (Код, ЭкзСмплярДокумСнта, ΠŸΠΎΠ»Π΅Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°, Π—Π½Π°Ρ‡Π΅Π½ΠΈΠ΅) 

    The result is something like a key-value store.
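    A rough DDL sketch of this schema (PostgreSQL syntax; the column types are assumptions, since only the column names are given above):

     create table Π’ΠΈΠΏΡ‹Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (
         Код          serial primary key,
         НаимСнованиС varchar(255) not null
     );

     create table ЭкзСмплярыДокумСнтов (
         Код           serial primary key,
         Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°  int not null references Π’ΠΈΠΏΡ‹Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (Код),
         Π”Π°Ρ‚Π°          date
         -- ...other common fields...
     );

     create table ΠŸΠΎΠ»ΡΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (
         Код           serial primary key,
         НаимСнованиС  varchar(255) not null,
         Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°  int references Π’ΠΈΠΏΡ‹Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (Код),
         Π’ΠΈΠΏΠ”Π°Π½Π½Ρ‹Ρ…    varchar(50)   -- e.g. 'string', 'number', 'date'
     );

     create table ΠŸΠΎΠ»ΡΠŸΠΎΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°ΠΌ (
         Код            serial primary key,
         Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°   int not null references Π’ΠΈΠΏΡ‹Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (Код),
         ΠŸΠΎΠ»Π΅Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°  int not null references ΠŸΠΎΠ»ΡΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (Код)
     );

     create table ЗначСния (
         Код                 serial primary key,
         ЭкзСмплярДокумСнта  int not null references ЭкзСмплярыДокумСнтов (Код),
         ΠŸΠΎΠ»Π΅Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°      int not null references ΠŸΠΎΠ»ΡΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² (Код),
         Π—Π½Π°Ρ‡Π΅Π½ΠΈΠ΅           text    -- everything stored as text, cast on read
     );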

    You can get all the fields of a specific document like this:

     select ΠΏΠ΄.НаимСнованиС, Π·.Π—Π½Π°Ρ‡Π΅Π½ΠΈΠ΅
     from ЗначСния Π·
     left join ΠŸΠΎΠ»ΡΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² ΠΏΠ΄ on ΠΏΠ΄.Код = Π·.ΠŸΠΎΠ»Π΅Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°
     where Π·.ЭкзСмплярДокумСнта = <ΠΊΠΎΠ΄_Π½ΡƒΠΆΠ½ΠΎΠ³ΠΎ_Π΄ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°>

    Sketched hastily, but I think you get the idea.
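    A note on the "analyze with SQL later" requirement from the question: with this layout, reports that need fields as columns are usually built with conditional aggregation. This pivot trick is not part of the answer above, and the field names here are taken from the question's first sample document:

     select Π·.ЭкзСмплярДокумСнта as doc,
            max(case when ΠΏΠ΄.НаимСнованиС = 'Organization' then Π·.Π—Π½Π°Ρ‡Π΅Π½ΠΈΠ΅ end) as organization,
            max(case when ΠΏΠ΄.НаимСнованиС = 'Date'         then Π·.Π—Π½Π°Ρ‡Π΅Π½ΠΈΠ΅ end) as doc_date,
            max(case when ΠΏΠ΄.НаимСнованиС = 'Amount'       then Π·.Π—Π½Π°Ρ‡Π΅Π½ΠΈΠ΅ end) as amount
     from ЗначСния Π·
     join ΠŸΠΎΠ»ΡΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² ΠΏΠ΄ on ΠΏΠ΄.Код = Π·.ΠŸΠΎΠ»Π΅Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°
     group by Π·.ЭкзСмплярДокумСнта;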

    • It works slowly. - Modus
    • Do you have other options? - Nofate ♦
    • > It works slowly. It works for us, I don’t complain) Moreover, in the same table there is a history of the field values ​​and a sample request more cumbersome. Scope: more than a hundred types of documents, about 900 possible fields, 1-50 fields in the document, thousands of documents in the database. - Nofate ♦
    • Thanks for the example of their real practice - dsh

    There is also this approach:

    1. A table is created with all columns that may be needed, such as int1, int2, string1, string2, date1 ...
    2. A document type table and its associated column table are created, containing the document type identifier, the external column name, and the column name in the general column table.
    3. A generic table of documents is created, containing the document identifier and the document type identifier.

    After that, documents can be queried via a subquery (or view) assembled from the document-type table and its column-mapping table, as sketched below.
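    A rough sketch of this approach (all names and types here are illustrative, not prescribed by the answer):

     -- 1. One wide table with generic columns of every needed type
     create table doc_data (
         doc_id  int primary key,
         int1    int,
         int2    int,
         string1 varchar(255),
         string2 varchar(255),
         date1   date,
         num1    numeric(15,2)
     );

     -- 2. Document types and the mapping of logical fields to physical columns
     create table doc_types (
         type_id   serial primary key,
         type_name varchar(255)
     );

     create table doc_columns (
         type_id     int references doc_types (type_id),
         field_name  varchar(255),  -- external (logical) column name
         column_name varchar(63)    -- physical column in doc_data
     );

     -- 3. The generic documents table
     create table docs (
         doc_id  serial primary key,
         type_id int references doc_types (type_id)
     );

     -- A per-type view then exposes the logical names, e.g. for the first
     -- sample document from the question (type code 1 is hypothetical):
     create view payments as
     select d.doc_id,
            dd.string1 as organization,
            dd.date1   as doc_date,
            dd.string2 as currency,
            dd.num1    as amount
     from docs d
     join doc_data dd on dd.doc_id = d.doc_id
     where d.type_id = 1;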

    In principle, it is also worth studying the system views that expose the structure of database objects, so that the physical structure of the database can be managed programmatically.
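    For example, the current physical structure can be read from the standard information_schema views, which exist in both MySQL and PostgreSQL (the table name is taken from the sketch above):

     select column_name, data_type, is_nullable
     from information_schema.columns
     where table_name = 'doc_data'
     order by ordinal_position;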

    • This method has a weak point: the number of columns in the table is limited. - For PgSQL it is 250-1600, depending on the type. - For MyISAM - 65 KB per line or 4096 columns (which is the end, if you keep a lot of long lines, then we rest on the limit on the volume). - For InnoDB - 1000 columns. - Nofate ♦

    The task reduces to storing semi-structured data.

    In PostgreSQL, pay attention to the JSONB data type (http://www.postgresql.org/docs/9.4/static/datatype-json.html).

    Here is a good presentation on this topic: http://www.sai.msu.su/~megera/postgres/talks/RIT-Bartunov-Korotkov-2014.pdf
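    A minimal sketch of what that could look like (the table and key names are made up here; needs PostgreSQL 9.4+), using the second sample document from the question:

     create table documents (
         id  serial primary key,
         doc jsonb not null
     );

     insert into documents (doc) values (
       '{"type": "invoice", "organization": "Ромашка ООО", "date": "2011-10-25",
         "items": [{"name": "Item 1", "unit": "pcs.", "price": 5000.00},
                   {"name": "Item 2", "unit": "g",    "price": 1000.00}]}'
     );

     -- Ordinary SQL over the document fields:
     select doc ->> 'organization' as organization,
            (doc ->> 'date')::date as doc_date
     from documents
     where doc ->> 'type' = 'invoice';

     -- A GIN index makes containment queries (@>) fast:
     create index documents_doc_idx on documents using gin (doc);

     select * from documents
     where doc @> '{"organization": "Ромашка ООО"}';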

    • Try to write more detailed answers. Explain what is the basis of your statement? Add to the answer the minimum required solution example (information on the link can be deleted and the answer will lose value). - Nicolas Chabanovsky ♦

    If pure SQL, then a table with:

    1. record number
    2. value type
    3. value

    Example:

     (1, Organization, Ромашка ООО)
     (1, Date, 25.10.2011)
     (1, Item 1, 5000.00)
     (1, Item 2, 1000.00)
     (1, Item 1 unit, pcs.)
     (1, Item 2 unit, g)
     (2, Date, 31.12.2011)
     (2, Amount, 1000000.00)
     (2, Purpose, New Year bonus)

    Go for it)
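    A sketch of this minimal variant (the names are mine; everything lands in one text column, so values must be cast when analyzed), loaded with the bonus document from the example above:

     create table doc_values (
         record_no  int,           -- document (record) number
         value_type varchar(255),  -- attribute name
         value      text
     );

     insert into doc_values values
         (2, 'Date',    '31.12.2011'),
         (2, 'Amount',  '1000000.00'),
         (2, 'Purpose', 'New Year bonus');

     -- Reassemble one document:
     select value_type, value
     from doc_values
     where record_no = 2;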

    • In general, I agree with the conditional types of documents 1 and 3, but is there a question in the field VALUE will be text? And with documents of type 2 it is necessary to keep the table and not just Type = Value - dsh
    • You need something like EXCEL with an arbitrary number of rows and columns and not necessarily in the same table - dsh
    • the value field may be completely absent in this table, it can be stored depending on the type in another table with a link to it, or some data types can be stored with a link to a separate table. - Vladimir Klykov
    • but honestly I advise you to abandon such an idea, and to modify the structure of the table. if necessary, the crap synchronization is greater, but the increase in processing speed is more significant - Vladimir Klykov

    If we drop the condition that the tables in the database must not change (and who set that condition, anyway?), then a possible solution is to generate queries from metadata.
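    A sketch of what generating DDL from metadata could look like in PostgreSQL (the metadata table and its contents are hypothetical; note that field_type is spliced into the DDL as-is, so the metadata must be trusted):

     create table meta_fields (
         type_name  varchar(63),
         field_name varchar(63),
         field_type varchar(63)   -- an SQL type name, e.g. 'numeric(15,2)'
     );

     insert into meta_fields values
         ('bonus', 'doc_date', 'date'),
         ('bonus', 'amount',   'numeric(15,2)'),
         ('bonus', 'purpose',  'text');

     -- Build and execute CREATE TABLE from the metadata
     do $$
     declare
         ddl text;
     begin
         select 'create table doc_bonus ('
                || string_agg(format('%I %s', field_name, field_type), ', ')
                || ')'
         into ddl
         from meta_fields
         where type_name = 'bonus';

         execute ddl;
     end $$;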

    • This is due to the fact that the program has several geographically separated local databases (the transmission channels given are unstable) that exchange with each other while changing the structure of the tables in one of the databases, synchronization between the others is necessary. This stops the exchange. It turns out something like a distributed database in 1C when you need to update the configuration when changing metadata and there is simply no professional in remote branch offices - dsh
    • Such questions: what is the need to synchronize the structure? If the databases are fully synchronized, will the bad channel synchronize data even more? questions can be considered rhetorical - renegator
    • The method of storing data in the database is in no way connected with the need to exchange synchronization data over unreliable channels. If it is not assumed that the user can change the database structure, and its change occurs only at the design stage, then it is definitely necessary to study the metadata views and work with the changing physical structure of the database. - Modus
    • DSN, I do not know what you have developed a client application, but try to look in the direction of serialization. General parameters for all documents (No., date, author ...) you write in the database, and you implement specific parameters as a class object that you serialize and write to the database (blob - field). Those. table structure: id, date_s, author, doc_object. I met a similar implementation in one of the old journals β€œComputers + Programs” - Vyacheslav Kirichenko
    • This is good if you do not need to make requests for non-standard attributes of documents. - Modus