Hello everyone!

I have a task: optimize the structure of a MySQL database by moving some fields into separate tables.

The database contains data on products, their categories and several additional properties.

Everything is stored in one huge table made up (mostly) of VARCHAR fields.

I need to find the unique VARCHAR values in several fields and move them into separate tables, leaving only a key in the main table.

Is it possible to do this using only MySQL?

If not, could you at least describe the general algorithm? I'm very afraid of corrupting the data in this database myself.

1. The structure of the existing database (only the fields relevant here):

MAIN TABLE

`id` int(10) unsigned NOT NULL AUTO_INCREMENT - (unique product item ID)
`prod_name` varchar(255) NOT NULL - item name (not unique)
`prod_cat` varchar(64) NOT NULL - category name (not unique)
`prod_prop01` varchar(64) NOT NULL - property 1 name (not unique)
`prod_prop02` varchar(64) NOT NULL - property 2 name (not unique)

2. The structure I would like to end up with:

MAIN TABLE

`id` int(10) unsigned NOT NULL AUTO_INCREMENT - (unique product item ID)
`prod_name` int(10) - item name ID (unique)
`prod_cat` int(10) - category ID (unique)
`prod_prop01` int(10) - property 1 ID (unique)
`prod_prop02` int(10) - property 2 ID (unique)

TABLE OF NAMES

`id` int(10) unsigned NOT NULL AUTO_INCREMENT - (unique name ID)
`name` varchar(64) NOT NULL - item name (unique!)

And similar tables for categories, properties1 and properties2.
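Here is a minimal sketch of that target structure. It uses Python's built-in `sqlite3` as a stand-in for MySQL, so it runs without a server. The table names (`names`, `categories`, `properties1`, `properties2`, `products_new`) are hypothetical. SQLite's `INTEGER PRIMARY KEY AUTOINCREMENT` plays the role of MySQL's `AUTO_INCREMENT`. The `UNIQUE` constraint on `name` is what enforces the "unique!" requirement: inserting the same string twice fails.

```python
import sqlite3

# Hypothetical mapping: column in the main table -> lookup table for its values.
LOOKUPS = {
    "prod_name": "names",
    "prod_cat": "categories",
    "prod_prop01": "properties1",
    "prod_prop02": "properties2",
}

def create_target_schema(conn: sqlite3.Connection) -> None:
    """One lookup table per column, plus a main table that holds only integer keys."""
    for table in LOOKUPS.values():
        # UNIQUE guarantees each distinct string is stored exactly once.
        conn.execute(
            f"CREATE TABLE {table} ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "name TEXT NOT NULL UNIQUE)"
        )
    # Each former VARCHAR column becomes a key into its lookup table.
    key_columns = ", ".join(
        f"{col} INTEGER NOT NULL REFERENCES {table}(id)"
        for col, table in LOOKUPS.items()
    )
    conn.execute(
        f"CREATE TABLE products_new (id INTEGER PRIMARY KEY AUTOINCREMENT, {key_columns})"
    )

conn = sqlite3.connect(":memory:")
create_target_schema(conn)

# List user tables (sqlite_sequence is SQLite's internal AUTOINCREMENT bookkeeping).
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
)]

# The same category name cannot be stored twice.
conn.execute("INSERT INTO categories (name) VALUES ('Dairy')")
try:
    conn.execute("INSERT INTO categories (name) VALUES ('Dairy')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(tables)
print(duplicate_rejected)
```

One detail is worth checking in the real schema. `name varchar(64)` is narrower than the current `prod_name varchar(255)`, so longer product names could be truncated or rejected. Keeping the same width avoids that.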

I'm sure this is a fairly routine task: database decomposition.

But I'm not a specialist. I haven't finished reading the chapters on this yet, and I have only a superficial understanding of decomposition methods.

I'd be glad of any practical advice. Thanks.

  • In relational database theory, this is called normalization, and what you want to get is the second normal form - renegator

2 answers

The algorithm is quite simple: each table stores one kind of entity. In your case, these entities are:

  • prod_cat - categories
  • prod_name - products
  • prod_prop - properties

A category can contain several products, and a product can have several properties. The tables are linked by keys. Each table gets a primary key, and the child tables (products and properties) also get a foreign key that references their parent:

Category table:

 id int, prod_cat text 

Product table:

 id int, id_cat int, prod_name text 

Property table:

 id int, id_prod int, prod_prop text 

If I read your question correctly, property1 and property2 are the same kind of entity. However, if they are, say, a best-before date and a barcode, it makes sense to keep them as columns in the product table:

 id int, id_cat int, prod_name text, best_before date, bar_code text 
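Here is a sketch of the category → product → property schema described above. It uses Python's built-in `sqlite3` as a stand-in for MySQL. It follows the answer's column names. The `UNIQUE` on `prod_cat` and the sample rows (`Dairy`, `Milk`, and so on) are illustrative additions. It shows how the foreign keys tie a property to its product and a product to its category, and that a row pointing at a missing parent is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE categories (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    prod_cat TEXT NOT NULL UNIQUE
);
CREATE TABLE products (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    id_cat      INTEGER NOT NULL REFERENCES categories(id),
    prod_name   TEXT NOT NULL,
    best_before DATE,   -- per-product attribute kept as a plain column
    bar_code    TEXT
);
CREATE TABLE properties (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    id_prod   INTEGER NOT NULL REFERENCES products(id),
    prod_prop TEXT NOT NULL
);
""")

# Hypothetical sample data: one category, one product, two properties.
conn.execute("INSERT INTO categories (prod_cat) VALUES ('Dairy')")
conn.execute("INSERT INTO products (id_cat, prod_name, best_before) VALUES (1, 'Milk', '2024-01-31')")
conn.execute("INSERT INTO properties (id_prod, prod_prop) VALUES (1, 'Fat 3.2%')")
conn.execute("INSERT INTO properties (id_prod, prod_prop) VALUES (1, 'Volume 1 L')")

# A property that points at a non-existent product is rejected.
try:
    conn.execute("INSERT INTO properties (id_prod, prod_prop) VALUES (99, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Reassemble the flat view by following the keys.
rows = conn.execute("""
    SELECT c.prod_cat, p.prod_name, pr.prod_prop
    FROM products p
    JOIN categories c  ON c.id = p.id_cat
    JOIN properties pr ON pr.id_prod = p.id
    ORDER BY pr.id
""").fetchall()
print(rows)
print(orphan_rejected)
```

In MySQL the same links are declared with `FOREIGN KEY (id_cat) REFERENCES categories(id)`. They are enforced for InnoDB tables; MyISAM ignores them.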

Yes, you can do this in pure MySQL. Create the tables you need, make a copy of the source table (just in case), and add columns to it for the primary keys of the new tables. Then copy the data into the new tables. (The principle: you move data from the source table into a new table, the new table generates ids in its auto-increment field, and you copy those ids back into the original table. Repeat until all the data has been distributed across the target tables.)
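Here is a sketch of that procedure. It uses Python's built-in `sqlite3` as a stand-in for MySQL and a hypothetical two-column source table. One deliberate choice: it adds the key columns to a working copy (`products_work`) rather than to the original, so the source table is never modified. Each pass does three things:

- creates a lookup table;
- fills it with `SELECT DISTINCT`;
- writes the generated ids back with a correlated `UPDATE`.

The final query confirms that every key resolves to the string it replaced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows standing in for the original wide table.
conn.executescript("""
CREATE TABLE products (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    prod_name TEXT NOT NULL,
    prod_cat  TEXT NOT NULL
);
INSERT INTO products (prod_name, prod_cat) VALUES
    ('Milk', 'Dairy'), ('Kefir', 'Dairy'), ('Bread', 'Bakery'), ('Milk', 'Dairy');
""")

# Source column -> hypothetical lookup table that will hold its distinct values.
SPLITS = {"prod_cat": "categories", "prod_name": "names"}

# 1. Work on a copy so the original table is never modified.
conn.execute("CREATE TABLE products_work AS SELECT * FROM products")

for column, lookup in SPLITS.items():
    key = f"{column}_id"
    # 2. Lookup table: generated id plus the unique string.
    conn.execute(f"CREATE TABLE {lookup} ("
                 "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL UNIQUE)")
    # 3. Empty key column in the working copy.
    conn.execute(f"ALTER TABLE products_work ADD COLUMN {key} INTEGER")
    # 4. Copy each distinct value once; the lookup table assigns the ids.
    conn.execute(f"INSERT INTO {lookup} (name) SELECT DISTINCT {column} FROM products_work")
    # 5. Copy the generated ids back into the working copy.
    conn.execute(f"UPDATE products_work SET {key} = "
                 f"(SELECT l.id FROM {lookup} l WHERE l.name = products_work.{column})")
conn.commit()

# 6. Verify: every key resolves to exactly the string it replaces.
mismatches = conn.execute("""
    SELECT COUNT(*) FROM products_work w
    LEFT JOIN categories c ON c.id = w.prod_cat_id
    LEFT JOIN names n      ON n.id = w.prod_name_id
    WHERE c.name IS NULL OR n.name IS NULL
       OR c.name <> w.prod_cat OR n.name <> w.prod_name
""").fetchone()[0]
category_names = [r[0] for r in conn.execute("SELECT name FROM categories ORDER BY name")]
product_names = [r[0] for r in conn.execute("SELECT name FROM names ORDER BY name")]
work_rows = conn.execute("SELECT COUNT(*) FROM products_work").fetchone()[0]
print(mismatches, category_names, product_names, work_rows)
```

In MySQL, step 5 can also be written as a multi-table update, for example `UPDATE products_work w JOIN categories c ON c.name = w.prod_cat SET w.prod_cat_id = c.id;`. Drop the old text columns (`ALTER TABLE ... DROP COLUMN`) only after the check returns 0.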

Pay attention to these queries:

  • SELECT DISTINCT ... removes duplicates from the result set;
  • GROUP BY ... is another powerful tool for combining identical values;
  • INSERT INTO ... SELECT ... copies data from one table into another.
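Here is a small illustration of those three statements on hypothetical data. It uses Python's built-in `sqlite3` as a stand-in for MySQL; the three queries themselves work unchanged in MySQL. `DISTINCT` and `GROUP BY` both return each value once. `GROUP BY` can also report how many rows use each value, which helps when checking the data before the split.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (prod_cat TEXT NOT NULL)")
conn.executemany("INSERT INTO products (prod_cat) VALUES (?)",
                 [("Dairy",), ("Bakery",), ("Dairy",), ("Dairy",)])

# SELECT DISTINCT: each value once.
distinct = [r[0] for r in conn.execute(
    "SELECT DISTINCT prod_cat FROM products ORDER BY prod_cat")]

# GROUP BY: each value once, plus how many rows use it.
usage = conn.execute(
    "SELECT prod_cat, COUNT(*) FROM products GROUP BY prod_cat ORDER BY prod_cat"
).fetchall()

# INSERT INTO ... SELECT: copy the grouped values straight into another table.
conn.execute("CREATE TABLE categories ("
             "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL UNIQUE)")
conn.execute("INSERT INTO categories (name) SELECT prod_cat FROM products GROUP BY prod_cat")
copied = conn.execute("SELECT COUNT(*) FROM categories").fetchone()[0]

print(distinct)  # ['Bakery', 'Dairy']
print(usage)     # [('Bakery', 1), ('Dairy', 3)]
print(copied)    # 2
```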

Don't worry about the data: a transfer like this won't lose anything, because the whole process is just copying from table to table. Still, a backup never hurts.

    I disagree a little with the previous answer. We had a similar task: splitting a database into nested structures. It turned out that all the text values in the original table had been typed in by hand, each person with their own spelling errors, extra spaces and similar trifles. We handled it like this: we built a lookup table with GROUP BY on the column, manually merged the duplicates (in some cases it took real brain-racking), and then followed the proposed algorithm. Unfortunately, the system this was done for lived only a year after our processing (which took 1 day of preparation and 2 weeks of manually editing the lookup tables). The client went bankrupt in the 2009 crisis, so it's a bit of a pity :)

    • In my case things are not so bad: the data was imported semi-automatically from other systems, so all the names and property strings are free of errors. - coderus
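The GROUP BY step from the second answer can be pushed a bit further to find candidate duplicates automatically. The idea is to group on a normalized form of the value and keep only groups with more than one raw spelling. Here the normalization is `LOWER(TRIM(...))`, which is an assumption about which differences count as noise. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL, with hypothetical data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (prod_cat TEXT NOT NULL)")
# Hypothetical hand-typed values: three spellings of one category.
conn.executemany("INSERT INTO products (prod_cat) VALUES (?)",
                 [("Dairy",), ("dairy ",), (" Dairy",), ("Bakery",), ("Bakery",)])

# Group by a normalized form; groups with more than one raw spelling need manual review.
suspects = conn.execute("""
    SELECT LOWER(TRIM(prod_cat))    AS normalized,
           COUNT(DISTINCT prod_cat) AS spellings,
           COUNT(*)                 AS rows_affected
    FROM products
    GROUP BY LOWER(TRIM(prod_cat))
    HAVING COUNT(DISTINCT prod_cat) > 1
""").fetchall()
print(suspects)  # [('dairy', 3, 3)]
```

The query only surfaces candidates; deciding which spelling wins is still manual work. In MySQL, a case-insensitive collation may already treat `Dairy` and `dairy` as equal inside `COUNT(DISTINCT ...)`. Counting `CAST(prod_cat AS BINARY)` instead keeps such variants apart.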