Happy upcoming May holidays, everyone! The topic is probably well-worn, but I could not find a clear answer in the basic DBMS manuals or in database books. Which is more correct: a numeric primary key (with the PK index on it), or a primary key on a text field that is also unique? I am asking about the practicality and performance of working with such a table. For some reason I still lean toward the first option: a numeric auto-increment seems more appropriate for a table of reference data, for example car license plates. I use Postgres. I would like to know the advantages and disadvantages of a numeric versus a text primary key, both for Postgres and for other DBMSs.

  • There is no serious difference. Still, I vote for a plain numeric id, simply because if the system later gets a table with arbitrary references to records (that is, to records of any table), it will be much easier to refer to an ordinary id, and because the insertion order is preserved in at least some form, though not necessarily a very accurate one. As for performance, an incremental identifier is most likely slightly faster than a string one, but the application's requirements should outweigh this minimal gain. - etki
  • Etki, that is what I wanted to hear. And as for indexing such a field, an index on a numeric field will also work much faster than an index on a text field, won't it? I emphasize that it is a text field, not a string one, and that is a big difference. Currently citext is used: a special data type for storing text that ignores case and encoding specifics. - IntegralAL
  • Ideally the index should be kept in RAM, so the difference, although it will exist, is likely to be imperceptible even on large data. I have not checked this in practice, but with enough motivation you can fill the database with a few million random records and measure it. - etki
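The experiment suggested in the last comment could be sketched roughly as follows. This is only an illustration, not a Postgres benchmark: it uses Python's built-in `sqlite3` module as a stand-in engine, and the table names `by_int` and `by_text`, the 12-character random keys, and the row counts are all assumptions made for the example. SQLite stores tables differently from Postgres, so the timings are indicative at best; the same pattern would need to be pointed at a real Postgres instance (with the actual citext column) to say anything about the original setup.

```python
import random
import sqlite3
import string
import time


def random_names(n_rows, rng, length=12):
    """Return n_rows distinct random strings (hypothetical reference values)."""
    alphabet = string.ascii_uppercase + string.digits
    names = set()
    while len(names) < n_rows:
        names.add("".join(rng.choices(alphabet, k=length)))
    # Sort before shuffling so the final order depends only on the seed.
    ordered = sorted(names)
    rng.shuffle(ordered)
    return ordered


def timed_lookups(conn, sql, keys):
    """Run one point query per key; return (rows found, elapsed seconds)."""
    start = time.perf_counter()
    found = sum(1 for key in keys if conn.execute(sql, (key,)).fetchone())
    return found, time.perf_counter() - start


def run_benchmark(n_rows=200_000, n_lookups=20_000, seed=0):
    """Compare point lookups by integer key and by text key on the same data."""
    rng = random.Random(seed)
    rows = list(enumerate(random_names(n_rows, rng), start=1))

    conn = sqlite3.connect(":memory:")
    # Same data twice: once keyed by an integer, once keyed by the text value.
    conn.execute("CREATE TABLE by_int (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("CREATE TABLE by_text (name TEXT PRIMARY KEY, id INTEGER NOT NULL)")
    conn.executemany("INSERT INTO by_int (id, name) VALUES (?, ?)", rows)
    conn.executemany("INSERT INTO by_text (id, name) VALUES (?, ?)", rows)
    conn.commit()

    sample = rng.sample(rows, min(n_lookups, n_rows))
    int_found, int_secs = timed_lookups(
        conn, "SELECT name FROM by_int WHERE id = ?", [i for i, _ in sample])
    text_found, text_secs = timed_lookups(
        conn, "SELECT id FROM by_text WHERE name = ?", [n for _, n in sample])
    conn.close()
    return {
        "lookups": len(sample),
        "int_found": int_found, "int_seconds": int_secs,
        "text_found": text_found, "text_seconds": text_secs,
    }
```

Calling `run_benchmark()` and printing the result gives the two timings side by side; repeating the run several times and with larger `n_rows` makes the comparison less noisy.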

2 answers

The primary key is, as a rule, the value used to refer to rows of the table. Therefore, if such references are expected at all, it is desirable to keep the primary key as small as possible.

In principle, that alone is enough to make the primary keys numeric in all reference tables. But if for some reason you do have to make the primary key a string, then under no circumstances make it a clustered index, unless it is the only index on the table.

Of course, I am talking about long strings. If a string takes up less space than a “regular” number, there should be no problems with it.

  • Can you give a reason why a string PK should not be clustered? - Petr Abdulin
  • Because a clustered index speeds up one kind of query (lookup by its key) while slowing down all the others. The presence of indexes other than the clustered one implies that the queries against this table will vary. Therefore, the speed of the clustered index is critical. And indexes on string fields have a rather small branching factor compared to indexes on numeric fields, which slows them down. - Pavel Mayorov
  • What kind of branching is meant, if a clustered index is by definition an ordered sequence (allowing binary search)? Also, I did not understand what you meant by "slows down all the others." - Petr Abdulin
  • A clustered index is the same B+ tree as any other index. And the slowdown is as follows. To find a record via some index in a table without a clustered index, you first find its row number in the index, and then find the record in the table by that number. If there is a clustered index, you first find the cluster key in the index, and then perform a search in the clustered index. The length of the cluster key increases the cost of both searches by increasing the height of the tree. - Pavel Mayorov
  • So you have answered your own question about why a string index should not be clustered: after all, you cannot auto-increment it! :) - Pavel Mayorov
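The branching-factor argument from the comments above can be made concrete with a back-of-the-envelope estimate. The sketch below is an assumption-laden model, not how any particular engine lays out its pages: it takes an 8 KB page (the Postgres default block size), assumes each index entry costs the key plus a hypothetical 8-byte pointer, and ignores page headers, fill factor, and prefix compression. The function name `btree_height` and all parameters are illustrative.

```python
def btree_height(n_rows, key_bytes, page_bytes=8192, pointer_bytes=8):
    """Estimate (fanout, height) of a B+ tree under a simplified page model.

    Assumptions: every entry costs key_bytes + pointer_bytes, pages are
    completely full, and per-page overhead is ignored.
    """
    fanout = page_bytes // (key_bytes + pointer_bytes)
    if fanout < 2:
        raise ValueError("key is too large for this page size")
    # Count how many levels are needed before the tree can address n_rows.
    height, capacity = 1, fanout
    while capacity < n_rows:
        capacity *= fanout
        height += 1
    return fanout, height


# 10 million rows, 8-byte integer key versus a 64-byte string key.
print(btree_height(10_000_000, key_bytes=8))   # (512, 3)
print(btree_height(10_000_000, key_bytes=64))  # (113, 4)
```

Under these assumptions the longer key costs one extra level, and with a clustered index that extra level is paid twice: once in the secondary index that stores the cluster key and once in the clustered index itself.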

In general, whichever is smaller in size will be faster, i.e. almost always the numeric one.

Text can have an advantage if “someone” runs many WHERE queries on this field but does not know the internal Id. Such situations are quite rare, because usually a lookup of the form {Id: Name} is provided in advance for selecting a value, i.e. the Id is available to the client.
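The {Id: Name} pattern described above might look like the following minimal sketch. It again uses Python's built-in `sqlite3` as a stand-in for Postgres, and the tables `car` and `trip`, their columns, and the sample plates are all hypothetical names invented for the illustration.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a reference table and a referencing table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    # Reference table: small numeric key, text value kept unique separately.
    conn.execute(
        "CREATE TABLE car (id INTEGER PRIMARY KEY, plate TEXT NOT NULL UNIQUE)")
    # Referencing table stores only the compact numeric key.
    conn.execute(
        "CREATE TABLE trip ("
        " id INTEGER PRIMARY KEY,"
        " car_id INTEGER NOT NULL REFERENCES car(id),"
        " km REAL NOT NULL)")
    conn.executemany("INSERT INTO car (id, plate) VALUES (?, ?)",
                     [(1, "A123BC"), (2, "X777YZ")])
    conn.executemany("INSERT INTO trip (car_id, km) VALUES (?, ?)",
                     [(1, 12.5), (2, 40.0), (1, 7.5)])
    conn.commit()
    return conn


def load_reference(conn):
    """Fetch the {Id: Name} lookup that the client shows to the user."""
    return dict(conn.execute("SELECT id, plate FROM car"))


def km_for_car(conn, car_id):
    """Filter by the numeric Id the client already holds, not by the text."""
    row = conn.execute(
        "SELECT COALESCE(SUM(km), 0) FROM trip WHERE car_id = ?",
        (car_id,)).fetchone()
    return row[0]
```

The client would call `load_reference(conn)` once, let the user pick a plate, and then pass the matching Id to queries such as `km_for_car`, so the text value never has to appear in a WHERE clause.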