First of all, a question for experienced colleagues: are tables without a PK (primary key) acceptable in a professional database? The situation is this: there is a client table with various fields, including ID (the table's PK), and there is also a contact data table (without a PK) whose columns are ID, phone, email, preferred method of communication, and other contact options. Should I establish a relationship between these tables, and how should it be set up? After all, a client may have many phone numbers, and more can be added later.

  • If there is no key, create an id for the record. Suppose you accidentally added the same phone twice for ID = 1, and now you want to delete one of those rows. Try writing a DELETE for that: delete where id=1 and phone='123-45-67' deletes both rows, because they are completely identical... - Mike
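The comment's scenario can be shown with a minimal sketch. It uses Python's built-in `sqlite3`, with SQLite standing in for SQL Server so the example runs without a server. The table and column names (`ContactsNoKey`, `ContactsWithKey`, `ContactID`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical contact table WITHOUT any key: the same phone stored twice for customer 1.
conn.execute("CREATE TABLE ContactsNoKey (CustomerID INTEGER, Phone TEXT)")
conn.executemany(
    "INSERT INTO ContactsNoKey (CustomerID, Phone) VALUES (?, ?)",
    [(1, "123-45-67"), (1, "123-45-67")],
)
# The WHERE clause cannot tell the two rows apart, so both are deleted.
deleted_without_key = conn.execute(
    "DELETE FROM ContactsNoKey WHERE CustomerID = 1 AND Phone = '123-45-67'"
).rowcount

# Same data, but every row gets its own surrogate id.
conn.execute(
    "CREATE TABLE ContactsWithKey ("
    " ContactID INTEGER PRIMARY KEY,"
    " CustomerID INTEGER, Phone TEXT)"
)
conn.executemany(
    "INSERT INTO ContactsWithKey (CustomerID, Phone) VALUES (?, ?)",
    [(1, "123-45-67"), (1, "123-45-67")],
)
# Pick one of the duplicates by its id and delete only that row.
first_id = conn.execute(
    "SELECT MIN(ContactID) FROM ContactsWithKey WHERE CustomerID = 1"
).fetchone()[0]
deleted_with_key = conn.execute(
    "DELETE FROM ContactsWithKey WHERE ContactID = ?", (first_id,)
).rowcount
remaining = conn.execute("SELECT COUNT(*) FROM ContactsWithKey").fetchone()[0]

print(deleted_without_key, deleted_with_key, remaining)  # 2 1 1
```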


tl;dr

Yes, add an FK. If each customer has exactly one row in the contact table, make Id (CustomerID) in the contact table its primary key + clustered index. If one customer can have more than one row, add a separate column CustomerDetailsID, make it the PK + CI, and add a nonclustered index on CustomerID.
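The two layouts from the tl;dr can be sketched in SQLite via Python's `sqlite3`, under a few assumptions:

- SQLite has no `CLUSTERED` keyword. An `INTEGER PRIMARY KEY` column is an alias for the rowid, so the table's B-tree is ordered by it, which is roughly the counterpart of a clustered PK. In SQL Server you would write `PRIMARY KEY CLUSTERED` instead.
- `ContactDetailsOne`, `IX_ContactDetails_CustomerID`, and the `Phone`/`Email` columns are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when this is on
conn.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute("INSERT INTO Customers (ID, Name) VALUES (1, 'Alice')")

# Variant 1: at most one contact row per customer.
# CustomerID is both the PK (rowid alias, so rows are stored ordered by it) and the FK.
conn.execute(
    "CREATE TABLE ContactDetailsOne ("
    " CustomerID INTEGER PRIMARY KEY REFERENCES Customers(ID),"
    " Phone TEXT, Email TEXT)"
)
conn.execute("INSERT INTO ContactDetailsOne (CustomerID, Phone) VALUES (1, '123-45-67')")
try:
    conn.execute("INSERT INTO ContactDetailsOne (CustomerID, Phone) VALUES (1, '765-43-21')")
    second_row_allowed = True
except sqlite3.IntegrityError:
    second_row_allowed = False  # the PK rejects a second row for the same customer

# Variant 2: many contact rows per customer.
# Own surrogate PK, FK on CustomerID, plus a non-unique index on CustomerID.
conn.execute(
    "CREATE TABLE ContactDetails ("
    " CustomerDetailsID INTEGER PRIMARY KEY,"
    " CustomerID INTEGER NOT NULL REFERENCES Customers(ID),"
    " Phone TEXT, Email TEXT)"
)
conn.execute("CREATE INDEX IX_ContactDetails_CustomerID ON ContactDetails(CustomerID)")
conn.executemany(
    "INSERT INTO ContactDetails (CustomerID, Phone) VALUES (?, ?)",
    [(1, "123-45-67"), (1, "765-43-21")],
)
phone_count = conn.execute(
    "SELECT COUNT(*) FROM ContactDetails WHERE CustomerID = 1"
).fetchone()[0]
print(second_row_allowed, phone_count)  # False 2
```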

If each customer has a single email and a single preferred contact method but many phone numbers, move the phone numbers into a separate table.
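A minimal sketch of that split, again in SQLite through Python's `sqlite3`. The single-valued attributes stay on the customer row and the phones get their own table. The names `CustomerPhones`, `PhoneID`, and `PreferredContactMethod` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Single-valued attributes stay on the customer row.
conn.execute(
    "CREATE TABLE Customers ("
    " ID INTEGER PRIMARY KEY,"
    " Email TEXT,"
    " PreferredContactMethod TEXT)"
)
# The multi-valued attribute (phones) gets its own table.
conn.execute(
    "CREATE TABLE CustomerPhones ("
    " PhoneID INTEGER PRIMARY KEY,"
    " CustomerID INTEGER NOT NULL REFERENCES Customers(ID),"
    " Phone TEXT NOT NULL)"
)
conn.execute("CREATE INDEX IX_CustomerPhones_CustomerID ON CustomerPhones(CustomerID)")

conn.execute("INSERT INTO Customers VALUES (1, 'alice@example.com', 'email')")
# Adding another phone later is just another INSERT, with no schema change.
conn.executemany(
    "INSERT INTO CustomerPhones (CustomerID, Phone) VALUES (?, ?)",
    [(1, "123-45-67"), (1, "765-43-21"), (1, "555-00-00")],
)
rows = conn.execute(
    "SELECT c.Email, p.Phone FROM Customers AS c"
    " JOIN CustomerPhones AS p ON p.CustomerID = c.ID"
    " WHERE c.ID = 1 ORDER BY p.PhoneID"
).fetchall()
print(rows)
```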

Long version

Ok, to understand whether tables without a Primary Key are acceptable, you first need to understand what a PK is and how it relates to indexes.

Primary Key and Foreign Key are, above all, logical concepts.

A PK is a column (or several columns) that uniquely identifies a record. That is, one PK value corresponds to exactly one record in the table. For example, the ID value in the customer table uniquely identifies a customer record in that table.

An FK is a column each value of which refers to exactly one record in another table. For example, for each CustomerID in the contact data table there is exactly one record in the Customers table.

PK and FK are properties of the data structure itself. Purely theoretically, it doesn't matter whether you mark Customer.ID as the PK or declare the FK ContactDetails.CustomerID -> Customer.ID: the columns identify the records either way. For example, the Team Foundation Server database has no FKs, and that does not stop it from working perfectly well :)
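The point that keys exist "logically" even when undeclared can be sketched as two checks you can always run by hand. The sketch uses SQLite via Python's `sqlite3` with hypothetical data.

- The PK check looks for duplicate IDs.
- The FK check looks for orphaned rows.

The catch is that without declared constraints nothing keeps those checks passing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Nothing declared: Customer.ID and ContactDetails.CustomerID are keys only "logically".
conn.execute("CREATE TABLE Customer (ID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE ContactDetails (CustomerID INTEGER, Phone TEXT)")
conn.executemany("INSERT INTO Customer VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany(
    "INSERT INTO ContactDetails VALUES (?, ?)", [(1, "123-45-67"), (2, "765-43-21")]
)

def duplicate_ids():
    """PK check: Customer.ID values that appear more than once."""
    return [r[0] for r in conn.execute(
        "SELECT ID FROM Customer GROUP BY ID HAVING COUNT(*) > 1")]

def orphan_customer_ids():
    """FK check: ContactDetails.CustomerID values with no matching Customer row."""
    return [r[0] for r in conn.execute(
        "SELECT d.CustomerID FROM ContactDetails AS d"
        " LEFT JOIN Customer AS c ON c.ID = d.CustomerID"
        " WHERE c.ID IS NULL")]

print(duplicate_ids(), orphan_customer_ids())  # [] []

# Without declared constraints, nothing stops the data from breaking the rules.
conn.execute("INSERT INTO Customer VALUES (1, 'Alice again')")
conn.execute("INSERT INTO ContactDetails VALUES (99, '000-00-00')")
print(duplicate_ids(), orphan_customer_ids())  # [1] [99]
```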


Why, then, declare PK and FK constraints when creating a database in SQL Server?

  • It lets SQL Server enforce uniqueness, protecting you from data errors. That is, it simply won't let you insert a second Customer record with the same ID, and it won't let you insert a row for a non-existent customer into the ContactDetails table.
  • It lets SQL Server build query plans more efficiently. For example, when searching for a customer by ID, it knows for certain that it will find at most one row, not, say, thousands of customers with ID = 1. So it picks an appropriate query plan, allocates an appropriate amount of memory for the query, and so on.
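The first bullet (the engine refusing bad data) can be sketched in SQLite through Python's `sqlite3`. One SQLite-specific assumption applies: foreign keys are enforced only after `PRAGMA foreign_keys = ON`, whereas SQL Server enforces a declared FK by default. The helper `try_insert` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE ContactDetails ("
    " CustomerDetailsID INTEGER PRIMARY KEY,"
    " CustomerID INTEGER NOT NULL REFERENCES Customer(ID),"
    " Phone TEXT)"
)
conn.execute("INSERT INTO Customer VALUES (1, 'Alice')")

def try_insert(sql, params=()):
    """Run an INSERT; return the constraint error message, or None on success."""
    try:
        conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

# A second customer with the same ID is rejected by the PK.
dup_pk_error = try_insert("INSERT INTO Customer VALUES (1, 'Another Alice')")
# A contact row for a customer that does not exist is rejected by the FK.
orphan_error = try_insert(
    "INSERT INTO ContactDetails (CustomerID, Phone) VALUES (?, ?)", (42, "123-45-67")
)
# A contact row for an existing customer goes through.
ok = try_insert(
    "INSERT INTO ContactDetails (CustomerID, Phone) VALUES (?, ?)", (1, "123-45-67")
)
print(dup_pk_error, orphan_error, ok, sep="\n")
```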

What does all this have to do with indexes? To enforce PK and FK integrity, SQL Server needs certain physical structures in the database.

There are two table storage formats in SQL Server.

  • Heap. The name speaks for itself: it is simply all the rows of the table lying on disk in no particular order. To find something in a heap, you have to go through the whole heap. This operation is called a Table Scan, and it is terribly inefficient with large amounts of data (it really does read all the data and take locks on it; in a real system that rarely ends well).
  • Clustered index. This is a tree built on some column (or several columns), ideally with unique values, whose leaves hold the table rows themselves. A clustered index lets you quickly find data by the value of that column.
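The difference between the two storage formats can be illustrated with a toy Python model. It is a sketch only, not how SQL Server lays out pages:

- A shuffled list stands in for a heap, so finding a row means a full scan.
- A list kept sorted by ID stands in for a clustered index, searched with binary search. A real B-tree is also logarithmic, though it branches far wider than two ways.

```python
import random

# Toy model only: a Python list stands in for pages on disk.
rows = [(i, f"customer-{i}") for i in range(1, 100_001)]  # (ID, Name)

heap = rows[:]
random.Random(42).shuffle(heap)   # heap: rows in no particular order
clustered = sorted(rows)          # "clustered index": rows kept in ID order

def heap_lookup(table, key):
    """Table Scan: look at rows one by one until the key is found."""
    examined = 0
    for row in table:
        examined += 1
        if row[0] == key:
            return row, examined
    return None, examined

def clustered_lookup(table, key):
    """Binary search over rows sorted by key: logarithmic, like a B-tree seek."""
    lo, hi, probes = 0, len(table), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if table[mid][0] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(table) and table[lo][0] == key:
        return table[lo], probes
    return None, probes

found_heap, heap_examined = heap_lookup(heap, 77_777)
found_clustered, clustered_probes = clustered_lookup(clustered, 77_777)
_, missing_examined = heap_lookup(heap, 0)  # a miss forces a scan of every row
print(found_heap == found_clustered, missing_examined, clustered_probes)
```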

In addition to clustered indexes, there are also nonclustered ones. These are the same kind of search trees, but their leaves hold the clustered index key (or the RID, if the table is a heap). That is, for a value of some other column (for example, the registration date) they let you find the clustered index key, and through it fetch the rows themselves. A nonclustered index may impose additional constraints, for example uniqueness. But nevertheless, it does not store the data itself (by default).
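The effect of adding a nonclustered index can be observed in SQLite via Python's `sqlite3` and `EXPLAIN QUERY PLAN`. The analogy rests on one fact about SQLite: its secondary indexes store the indexed column plus the rowid, which plays the role of the clustered key. SQLite's plan wording also differs from SQL Server's:

- SCAN corresponds roughly to a scan.
- SEARCH ... USING INDEX corresponds roughly to an index seek plus lookups.

The `RegistrationDate` column and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    " ID INTEGER PRIMARY KEY,"   # rowid alias: the table itself is ordered by ID
    " Name TEXT,"
    " RegistrationDate TEXT)"
)
conn.executemany(
    "INSERT INTO Customers (Name, RegistrationDate) VALUES (?, ?)",
    [(f"customer-{i}", f"2020-01-{i % 28 + 1:02d}") for i in range(1000)],
)

def plan(sql):
    """Return the 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM Customers WHERE RegistrationDate = '2020-01-15'"
before = plan(query)  # no index on the column: every row must be scanned
# Secondary index: leaves hold (RegistrationDate, rowid), not the full rows.
conn.execute("CREATE INDEX IX_Customers_RegDate ON Customers(RegistrationDate)")
after = plan(query)   # seek on the date, then fetch each row by its rowid
print(before)
print(after)
```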


Ok, so how do these physical structures map onto PK and FK?

A PK needs a way to quickly verify that a record exists and is unique. Therefore a PK is always backed either by a clustered index or by a unique nonclustered index. It can't just hang in the air.
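The same "a PK can't hang in the air" rule can be seen in SQLite through Python's `sqlite3`:

- A PK on a non-integer column gets an automatically created unique index.
- An `INTEGER PRIMARY KEY` needs no extra index, because the table's own B-tree is keyed by it. That is loosely the clustered case.

In SQL Server the corresponding index would show up in `sys.indexes`. The table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# PK on a TEXT column: SQLite backs it with an automatically created unique index.
conn.execute("CREATE TABLE Currencies (Code TEXT PRIMARY KEY, Name TEXT)")
# PRAGMA index_list columns: seq, name, unique, origin ('pk' = made for a PRIMARY KEY), partial
pk_indexes = [
    (row[1], row[2])
    for row in conn.execute("PRAGMA index_list('Currencies')")
    if row[3] == "pk"
]

# INTEGER PRIMARY KEY: the table's own B-tree is keyed by it, so no separate index.
conn.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY, Name TEXT)")
customer_indexes = conn.execute("PRAGMA index_list('Customers')").fetchall()

print(pk_indexes)        # [('sqlite_autoindex_Currencies_1', 1)]
print(customer_indexes)  # []
```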

A typical candidate for the clustered index is the primary key: both the clustered index key and the PK should be unique, should uniquely identify the row, and so on, and in real-world schemas it rarely happens that two different columns meet these requirements at once.

Management Studio even creates the PK and the clustered index with a single button. That is why the clustered index and the Primary Key are often treated as near-synonyms, although technically you can put the clustered index on one column and the PK on another.

An FK does not need a supporting structure in the table on which it is declared. It does need one in the table it references. It has to check existence and uniqueness there, just as a PK does in its own table, so the referenced table needs the same kind of supporting structure a PK requires.

For example, on an insert into ContactDetails, SQL Server must verify that the inserted CustomerID value has a matching (and exactly one!) record in Customers. Therefore, on the Customers side the FK needs either a clustered index on that column or at least a unique key. In the ContactDetails table, this FK requires no data structures.
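SQLite, driven from Python's `sqlite3`, enforces the same requirement on the referenced side, so it can serve as a small experiment:

- If the referenced column has no unique structure, SQLite refuses the child insert with a "foreign key mismatch" error.
- With a `UNIQUE` constraint, which is backed by an index, the FK check works.
- In both cases the child table itself needs no index.

The helper `try_fk` and the DDL strings are hypothetical. The parent row is inserted before the child table exists, so only the child-side check is exercised.

```python
import sqlite3

def try_fk(parent_ddl):
    """Build parent/child tables, insert one child row; return (error, child indexes)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(parent_ddl)
    conn.execute("INSERT INTO Customers (CustomerID) VALUES (1)")
    # Child table: the FK column gets no index of its own.
    conn.execute(
        "CREATE TABLE ContactDetails ("
        " CustomerID INTEGER REFERENCES Customers(CustomerID),"
        " Phone TEXT)"
    )
    child_indexes = conn.execute("PRAGMA index_list('ContactDetails')").fetchall()
    try:
        conn.execute("INSERT INTO ContactDetails VALUES (1, '123-45-67')")
        error = None
    except sqlite3.Error as exc:
        error = str(exc)
    return error, child_indexes

# Referenced column with no unique structure: SQLite cannot check the FK.
no_unique_error, _ = try_fk("CREATE TABLE Customers (CustomerID INTEGER)")
# Referenced column backed by a UNIQUE constraint (an index): the FK works.
unique_error, unique_child_indexes = try_fk(
    "CREATE TABLE Customers (CustomerID INTEGER UNIQUE)"
)
print(no_unique_error)
print(unique_error, unique_child_indexes)
```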

  • "But nevertheless, it does not store the data itself (by default)." It does, actually. A nonclustered index stores the columns explicitly included in it (in the key list and in INCLUDE) plus the table's clustered index key, or the RID if the table is stored as a heap. The clustered index, in turn, stores all the columns of the table and is, in fact, the main storage location for the table's data. - minamoto
  • In every other respect, excellent and as complete as it gets. - minamoto
  • @minamoto That is what I tried to convey with "(by default)". I'll try to work it into the answer in some form :) - PashaPash ♦

Are tables without a PK (primary key) allowed in a professional database?

A table needs a primary key so that you can uniquely identify a record: for example, when the table may contain rows that are complete duplicates in all other fields, or when other tables reference it through foreign keys (relationships).

If this particular table needs nothing of the kind, a primary key in it is optional. Not exactly kosher, of course, but nothing particularly bad either. Systems do tend to evolve, though, and such a need may arise later, so it sometimes makes sense to add a synthetic primary key to the structure just in case.

By the way, even if you do not create a primary key, that does not mean there isn't one. The DBMS may well (and most likely will) add a hidden field, inaccessible to you, that plays the role of a synthetic primary key identifying the record. For example, the DBMS needs some way to know which table row an index entry points to...
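SQLite, used here through Python's `sqlite3`, is an example of such a hidden identifier. Every ordinary table carries a 64-bit `rowid` even when no PK is declared. Unlike the "inaccessible" case described above, SQLite lets you query it. In SQL Server, a heap row is instead located by a RID, which is not an ordinary column you select. The table name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No primary key declared at all.
conn.execute("CREATE TABLE ContactDetails (CustomerID INTEGER, Phone TEXT)")
conn.executemany(
    "INSERT INTO ContactDetails VALUES (?, ?)",
    [(1, "123-45-67"), (1, "123-45-67")],
)
# SQLite still assigns each row a hidden rowid...
rows = conn.execute(
    "SELECT rowid, CustomerID, Phone FROM ContactDetails ORDER BY rowid"
).fetchall()
print(rows)  # [(1, 1, '123-45-67'), (2, 1, '123-45-67')]

# ...so one of two otherwise identical rows can still be targeted on its own.
conn.execute("DELETE FROM ContactDetails WHERE rowid = ?", (rows[0][0],))
remaining = conn.execute("SELECT COUNT(*) FROM ContactDetails").fetchone()[0]
print(remaining)  # 1
```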

Should I establish a relationship between these tables?

By creating such a relationship with the DBMS's own tools (that is, a foreign key), you hand the job of enforcing data integrity and consistency to a SQL Server mechanism designed specifically for it. This protects you from violations of logical integrity, which is generally useful. If you don't create the relationship and leave the task to client software, which is far less suited to it, you can run into unnecessary problems, especially in emergencies. So if the logic requires such a relationship, create it.
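One of those "emergency situations" can be sketched as someone deleting a customer directly in the database, bypassing whatever checks the client application does. The sketch uses SQLite via Python's `sqlite3`; the helper `delete_parent` is hypothetical.

- With a declared FK, the delete is refused.
- Without enforcement, the delete succeeds and leaves an orphaned contact row.

```python
import sqlite3

def delete_parent(enforce_fk):
    """Delete a customer who still has contacts; return (error, orphan count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_fk else 'OFF'}")
    conn.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE ContactDetails ("
        " CustomerDetailsID INTEGER PRIMARY KEY,"
        " CustomerID INTEGER NOT NULL REFERENCES Customers(ID),"
        " Phone TEXT)"
    )
    conn.execute("INSERT INTO Customers VALUES (1)")
    conn.execute("INSERT INTO ContactDetails (CustomerID, Phone) VALUES (1, '123-45-67')")
    try:
        # E.g. an ad-hoc fix run straight against the database, not through the app.
        conn.execute("DELETE FROM Customers WHERE ID = 1")
        error = None
    except sqlite3.IntegrityError as exc:
        error = str(exc)
    orphans = conn.execute(
        "SELECT COUNT(*) FROM ContactDetails AS d"
        " LEFT JOIN Customers AS c ON c.ID = d.CustomerID WHERE c.ID IS NULL"
    ).fetchone()[0]
    return error, orphans

error_on, orphans_on = delete_parent(enforce_fk=True)     # delete refused, no orphans
error_off, orphans_off = delete_parent(enforce_fk=False)  # delete allowed, 1 orphan
print(error_on, orphans_on, error_off, orphans_off)
```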

    If I understand correctly, the question is about a table without a clustered index. Such tables are usually called heaps. They are worth using, for example, in the following cases:

    1) The table will hold few records, and it will be scanned on access anyway.

    2) Nonclustered indexes are created, and the data is accessed through them.

    3) We know the table will receive massive data loads.
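The third heap scenario (massive loads) can be illustrated with a toy Python model. It is purely illustrative, assuming a Python list stands in for table storage:

- A heap can put each new row wherever there is room, modelled here as appending to the end.
- An order-keeping structure, modelled here as a sorted list, must place each row at its key position.

In SQL Server that cost shows up as ordering work and page splits rather than list shifting.

```python
import bisect
import random

rng = random.Random(0)
incoming = list(range(10_000))
rng.shuffle(incoming)              # bulk load arrives in arbitrary key order

heap = []                          # toy heap: no ordering to maintain
clustered = []                     # toy clustered storage: kept sorted by key
shifted = 0                        # entries pushed aside to keep the order
for key in incoming:
    heap.append(key)               # heap: the new row simply goes at the end
    pos = bisect.bisect_left(clustered, key)
    shifted += len(clustered) - pos
    clustered.insert(pos, key)     # clustered: the row must land at its sorted slot

print(len(heap), clustered == sorted(incoming), shifted)
```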

    There are other uses for heaps as well; it all depends on the project's architecture. In large projects this is quite common.

    More details can be found on MSDN.
