I started a project in Django. I have the following data in JSON (I'm quoting part of the structure; the rest follows the same pattern):

    [{"name": "Hydrogen",
      "atomic_number": 1,
      "symbol": "H",
      "thermal": {
        "absolute_boiling_point": {"value": 234, "unit": "K"},
        "heat_of_vaporization": {"value": {"p": 4, "M": 0.452, "n": 10}, "unit": "kJ/mol"}
      }
    }]

For clarity, here is a diagram of the structure: [image not available]

As you can see, the structure is fairly complex, with several levels of nesting.

Each element has ungrouped properties (type, name, symbol, atomic_number; about ten in all, common to every element) and property groups (there are several dozen groups, each containing several dozen properties).

A property value can be a string, a decimal number, or a number in scientific notation (stored as three numbers: mantissa, base, and exponent).
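As a sketch of how these three kinds of value could be told apart when loading the JSON, the following plain-Python snippet splits an element into its ungrouped fields and (group, property, kind, value, unit) rows. It assumes, from the sample alone, that any dict-valued top-level key is a property group and that a value dict with the keys p, M and n means M × n^p (M = mantissa, n = base, p = exponent); that key mapping is a guess, and the function names are hypothetical.

```python
import json

SAMPLE = """
[{"name": "Hydrogen", "atomic_number": 1, "symbol": "H",
  "thermal": {"absolute_boiling_point": {"value": 234, "unit": "K"},
              "heat_of_vaporization": {"value": {"p": 4, "M": 0.452, "n": 10},
                                       "unit": "kJ/mol"}}}]
"""


def decode_value(raw):
    """Classify a property value as string, decimal or scientific notation.

    Assumption: {"p": ..., "M": ..., "n": ...} means M * n ** p
    (M = mantissa, n = base, p = exponent).
    """
    if isinstance(raw, str):
        return "string", raw
    if isinstance(raw, dict) and all(k in raw for k in ("p", "M", "n")):
        return "scientific", raw["M"] * raw["n"] ** raw["p"]
    if isinstance(raw, (int, float)):
        return "decimal", float(raw)
    raise ValueError(f"unrecognized property value: {raw!r}")


def flatten_element(element):
    """Split one element into ungrouped fields and per-property rows."""
    ungrouped, rows = {}, []
    for key, item in element.items():
        if isinstance(item, dict):  # a property group such as "thermal"
            for prop, spec in item.items():
                kind, value = decode_value(spec["value"])
                rows.append((key, prop, kind, value, spec.get("unit")))
        else:  # ungrouped property such as "name" or "symbol"
            ungrouped[key] = item
    return ungrouped, rows


elements = json.loads(SAMPLE)
ungrouped, rows = flatten_element(elements[0])
for row in rows:
    print(row)
```

Whatever storage you end up choosing, rows of this shape are a convenient intermediate form for loading the data.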

The problem: I can't model this structure in the project's relational database (PostgreSQL); I can't work out how to set up the relationships.

Is a relational database right for this, or should I look for a NoSQL solution? Am I choosing the right tools for the project?

I created a model for the element:

    from django.db import models


    class Element(models.Model):
        class Meta:
            verbose_name = 'Element'
            verbose_name_plural = 'Elements'

        name = models.CharField(verbose_name="Name", max_length=255, blank=True, null=True)
        symbol = models.CharField(verbose_name="Symbol", max_length=255, blank=True, null=True)

Models for the different notations:

    class ScientificNotation(models.Model):
        class Meta:
            verbose_name = 'Scientific Notation'

        significand = models.FloatField(verbose_name="Significand", blank=True, null=True)
        base = models.IntegerField(verbose_name="Base", blank=True, null=True)
        exponent = models.IntegerField(verbose_name="Exponent", blank=True, null=True)
        unit = models.CharField(verbose_name="Unit", max_length=255, blank=True, null=True)


    class DecimalNotation(models.Model):
        class Meta:
            verbose_name = 'Decimal Notation'

        value = models.FloatField(verbose_name="Value", blank=True, null=True)
        unit = models.CharField(verbose_name="Unit", max_length=255, blank=True, null=True)

Property groups. This model is where the problem appears:

    class ThermalProperties(models.Model):
        class Meta:
            verbose_name = 'Thermal properties'
            verbose_name_plural = 'Thermal properties'

        element = models.OneToOneField(Element, on_delete=models.CASCADE)
        melting_point = models.ForeignKey(ScientificNotation, on_delete=models.CASCADE)
        boiling_point = models.ForeignKey(ScientificNotation, on_delete=models.CASCADE)
        heat_of_vaporization = models.ForeignKey(DecimalNotation, on_delete=models.CASCADE)

I can't point two fields at the same model (here, ScientificNotation):

    ERRORS:
    table.ThermalProperties.boiling_point: (fields.E304) Reverse accessor for 'ThermalProperties.boiling_point' clashes with reverse accessor for 'ThermalProperties.melting_point'.
        HINT: Add or change a related_name argument to the definition for 'ThermalProperties.boiling_point' or 'ThermalProperties.melting_point'.

Repeating the question here for anyone who has read this far: is a relational database right for me? Am I choosing the right tools for the project?

  • Added a clarification that the source data is in JSON format. Each element has ungrouped properties (type, name, symbol, atomic_number; about ten, common to all elements) and property groups (several dozen groups, each containing several dozen properties). So in the end there will be hundreds of properties in total. - while1pass

2 answers

Good question; I'm glad to spend an hour writing an answer :)

It may not sound like a bold headline, but this structure is simpler than it looks.

Relational structure

Almost any structure can be described in a relational database, although sometimes it takes real effort.

Note that relational databases have design patterns too. Let's first try one of the proper ones, called Class Table Inheritance. It guarantees that each kind of data is placed in its own table:

  1. element - structure for elements

    • name
    • atomic_number
    • symbol
  2. element_thermal - structure for parameters

    • atomic_number
    • type
    • unit
  3. element_thermal_mol - structure for mol / kg parameters

    • atomic_number
    • p
    • M
    • n
  4. element_thermal_value - structure for ordinary parameters

    • atomic_number
    • value

Here, depending on the unit, you choose which table to JOIN. This approach splits the data across tables and fully complies with the relational model and its rules, but every new type of data will require a new table and will complicate saving and the business logic.
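The tables above can be sketched as follows. This is a minimal illustration rather than the answer's exact schema: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for PostgreSQL, and it adds a thermal_id column (an assumption, not in the outline above) so that one element can have more than one thermal parameter. Instead of choosing which table to JOIN from the unit, it LEFT JOINs both child tables.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the DDL is plain SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE element (
    atomic_number INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    symbol TEXT NOT NULL
);
-- Parent table: one row per thermal parameter of an element.
-- thermal_id is an assumption so an element can have several parameters.
CREATE TABLE element_thermal (
    thermal_id INTEGER PRIMARY KEY,
    atomic_number INTEGER NOT NULL REFERENCES element(atomic_number),
    type TEXT NOT NULL,
    unit TEXT NOT NULL
);
-- Child table for values written as p, M, n.
CREATE TABLE element_thermal_mol (
    thermal_id INTEGER PRIMARY KEY REFERENCES element_thermal(thermal_id),
    p INTEGER NOT NULL,
    M REAL NOT NULL,
    n INTEGER NOT NULL
);
-- Child table for ordinary single values.
CREATE TABLE element_thermal_value (
    thermal_id INTEGER PRIMARY KEY REFERENCES element_thermal(thermal_id),
    value REAL NOT NULL
);
""")

conn.execute("INSERT INTO element VALUES (1, 'Hydrogen', 'H')")
conn.executemany("INSERT INTO element_thermal VALUES (?, ?, ?, ?)", [
    (1, 1, "absolute_boiling_point", "K"),
    (2, 1, "heat_of_vaporization", "kJ/mol"),
])
conn.execute("INSERT INTO element_thermal_value VALUES (1, 234)")
conn.execute("INSERT INTO element_thermal_mol VALUES (2, 4, 0.452, 10)")

# LEFT JOIN every child table; for each parent row only one of them matches,
# so the other child table's columns come back as NULL (None in Python).
rows = conn.execute("""
    SELECT t.type, t.unit, va.value, mo.p, mo.M, mo.n
    FROM element AS e
    JOIN element_thermal AS t ON t.atomic_number = e.atomic_number
    LEFT JOIN element_thermal_value AS va ON va.thermal_id = t.thermal_id
    LEFT JOIN element_thermal_mol AS mo ON mo.thermal_id = t.thermal_id
    WHERE e.symbol = ?
    ORDER BY t.thermal_id
""", ("H",)).fetchall()

for row in rows:
    print(row)
```

Every new kind of value means another child table and another LEFT JOIN, which is the cost the answer describes.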

OK. Now let's try the Entity-Attribute-Value (EAV) approach. It is often called an anti-pattern because it lacks type control and data integrity checks, puts all the data in one table, and so on; in general, it turns the relational structure inside out. Still, on small amounts of data it works quite well.

  1. element - structure for elements

    • name
    • atomic_number
    • symbol
  2. element_thermal - structure for parameters

    • atomic_number
    • type
    • unit
  3. element_thermal_values - structure for all parameters

    • atomic_number
    • parameter_type
    • value

In effect, you keep all your parameter values in element_thermal_values, writing the parameter name (INT, P, M, N) in parameter_type. Then you JOIN this table, get all the parameters, and process them. There is always a single query for the data, but of course validation has to be done at the code level.
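A minimal sketch of the EAV variant, again with Python's sqlite3 standing in for PostgreSQL. The thermal_id column is an assumption (not in the outline above), and the parameter_type labels ("INT" for a plain value, "p", "M", "n" for the three-number form) are illustrative choices; thermal_properties() is a hypothetical helper that fetches everything in one query and regroups it in code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE element (
    atomic_number INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    symbol TEXT NOT NULL
);
CREATE TABLE element_thermal (
    thermal_id INTEGER PRIMARY KEY,  -- assumption, not in the answer's outline
    atomic_number INTEGER NOT NULL REFERENCES element(atomic_number),
    type TEXT NOT NULL,
    unit TEXT NOT NULL
);
-- EAV: each component of each value is one row (attribute name + value).
CREATE TABLE element_thermal_values (
    thermal_id INTEGER NOT NULL REFERENCES element_thermal(thermal_id),
    parameter_type TEXT NOT NULL,
    value REAL
);
""")
conn.execute("INSERT INTO element VALUES (1, 'Hydrogen', 'H')")
conn.executemany("INSERT INTO element_thermal VALUES (?, ?, ?, ?)", [
    (1, 1, "absolute_boiling_point", "K"),
    (2, 1, "heat_of_vaporization", "kJ/mol"),
])
conn.executemany("INSERT INTO element_thermal_values VALUES (?, ?, ?)", [
    (1, "INT", 234),
    (2, "p", 4), (2, "M", 0.452), (2, "n", 10),
])


def thermal_properties(conn, symbol):
    """Fetch all thermal parameters of one element in a single query, then regroup."""
    rows = conn.execute("""
        SELECT t.type, t.unit, v.parameter_type, v.value
        FROM element AS e
        JOIN element_thermal AS t ON t.atomic_number = e.atomic_number
        JOIN element_thermal_values AS v ON v.thermal_id = t.thermal_id
        WHERE e.symbol = ?
    """, (symbol,)).fetchall()
    result = {}
    for prop_type, unit, param, value in rows:
        entry = result.setdefault(prop_type, {"unit": unit, "params": {}})
        entry["params"][param] = value
    return result


print(thermal_properties(conn, "H"))
```

Because the single value column is REAL, the integers 234, 4 and 10 come back as floats; restoring the right types is exactly the kind of control that has to live in the code.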

On a small project it makes little difference which you choose: the second option is simpler, the first is more correct.

In fact, you could skip the separate table entirely and simply store the value in the value column as JSON. But then you can hardly work with the data or run queries on it, so this approach is not always right, although databases like PostgreSQL do support some operations at the JSON level.
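A minimal sketch of the JSON-column option, again using Python's sqlite3 as a stand-in for PostgreSQL. It assumes your SQLite build includes the JSON functions (built in since SQLite 3.38.0, and available as the JSON1 extension in many earlier builds); the column name data is hypothetical. In PostgreSQL the equivalent would be a jsonb column queried with the -> and ->> operators.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE element (
    atomic_number INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    symbol TEXT NOT NULL,
    data TEXT NOT NULL  -- all property groups stored as one JSON document
)
""")
hydrogen_groups = {
    "thermal": {
        "absolute_boiling_point": {"value": 234, "unit": "K"},
        "heat_of_vaporization": {"value": {"p": 4, "M": 0.452, "n": 10}, "unit": "kJ/mol"},
    }
}
conn.execute(
    "INSERT INTO element VALUES (?, ?, ?, ?)",
    (1, "Hydrogen", "H", json.dumps(hydrogen_groups)),
)

# Both the filter and the selected column reach inside the JSON document.
rows = conn.execute("""
    SELECT symbol,
           json_extract(data, '$.thermal.heat_of_vaporization.value.M')
    FROM element
    WHERE json_extract(data, '$.thermal.absolute_boiling_point.value') > 200
""").fetchall()
print(rows)
```

Queries like this work, but the database cannot enforce that every element actually has these keys, which is the trade-off the answer points out.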

NoSQL (Unstructured databases)

If you want to use NoSQL, go ahead. Nobody is stopping you, and your structure, which is somewhat dynamic, suits MongoDB, where you get all the benefits of a document-oriented approach. Each of your elements already looks like a document.

If the data has few or no relationships with other data, NoSQL (MongoDB) is generally an excellent choice for your task and can simplify working with the data somewhat.

Overall, the amount of code will be about the same, since you will still have to write handlers for the values when outputting or processing them.

Conclusion

I recommend trying several approaches: start with MongoDB, which is new to you, then try proper inheritance in relational tables, compare the pros and cons of the implementations, and see what works for you. The experience will not go to waste.

MongoDB will interest you because it stores data in essentially the form the application works with (JSON-like documents) and allows a variable data structure. That has its own pros and cons; among the pros is scalability.

The traditional SQL approach is ideal for systems that need data integrity, ACID compliance, and heavy queries joining many tables.

Both approaches have pros and cons. Speaking for myself: if I were writing a system around a periodic table and wanted to try new technologies, I would pick MongoDB and learn Map-Reduce, etc.


We often use Mongo for data whose structure is hard to predict, for example, logging a history of user actions with different kinds of parameters. Of course, it would be nice to put that into MySQL, but nobody wants to create a separate table for each kind of data, and the EAV approach on large amounts of data simply destroys performance. MongoDB offers good scalability and a variable data structure, so the pros outweigh the cons.

  • A relational database can also store non-relational data (PostgreSQL has the JSON and JSONB types). You can build indexes on them, search them, and extract individual values; I can't say anything about foreign keys, I haven't tried. That is, store all relational data relationally, and where the relational model fits poorly, add a JSON field. - NumminorihSF
  • Thank you for the detailed answer; you made me look at the project differently. - while1pass
  • I added a clarification to the question and corrected it a little without changing the basic idea. Each element will have hundreds of properties and dozens of groups in which these properties are nested. Therefore, the CTI approach will be too complicated: as I understood from your answer, I would have to create a new table for each property. The second approach will not work either, as the structure is too complicated. - while1pass
  • But with the MongoDB option you intrigued me and gave me a direction; I will try it. - while1pass

Will a relational database suit me?

Yes, it is fine.

Each element will have hundreds of properties and dozens of groups in which these properties are nested (some belong to the element itself, some to property groups, which in turn belong to the element).

Your task looks like a relational one, so I believe a relational database suits it better than a non-relational one.

Am I choosing the right tools for the project?

There are always several ways to solve the same problem.

A property value can be a string, a decimal number, or a number in scientific notation (stored as three numbers: mantissa, base, and exponent).

You could also solve the problem with Django's generic relations (the contenttypes framework).


I can't point two fields at the same model (here, ScientificNotation)

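Not as written. A ForeignKey defines a one-to-many relationship and, by default, also adds a reverse accessor to the target model (here, thermalproperties_set on ScientificNotation). You defined two ForeignKeys to the same table, so both try to create the same accessor. Giving each one its own related_name, as the HINT in the error says, removes the clash.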

Another option, for example, would have been an additional table.


There is more than one solution to your problem.

You have an element and different ways of writing its values. The different notations can be implemented as classes that inherit from Element:

    from django.db import models


    class Element(models.Model):
        class Meta:
            verbose_name = 'Element'
            verbose_name_plural = 'Elements'

        name = models.CharField(verbose_name="Name", max_length=255, blank=True, null=True)
        symbol = models.CharField(verbose_name="Symbol", max_length=255, blank=True, null=True)


    class DecimalNotationElement(Element):  # Inherit from Element!
        """
        When you inherit from Element, Django automatically creates
        a one-to-one relationship.
        """
        class Meta:
            verbose_name = 'Decimal Notation'

        value = models.FloatField(verbose_name="Value", blank=True, null=True)
        unit = models.CharField(verbose_name="Unit", max_length=255, blank=True, null=True)


    # It works like this:
    element = DecimalNotationElement.objects.create(
        name="lalala", symbol="S", value=1.0, unit="some unit"
    )
    # That is, you simply access the field you need; it no longer
    # matters that the fields come from different classes.
    >>> element = DecimalNotationElement.objects.get(name="lalala")
    >>> element.name
    'lalala'
    >>> element.unit
    'some unit'
  • I added a clarification to the question and corrected it a little without changing the basic idea. Each element will have hundreds of properties and dozens of groups in which these properties are nested. The structure is not limited to what the diagram shows. So, if I understood your solution correctly, it cannot describe it. For example, an element may have several values in decimal notation (some belong to the element itself, some to property groups, which in turn belong to the element). - while1pass
  • Well, then everything is simpler. Take an RDBMS and don't suffer. - Mr. Fix
  • Added this to the answer. - Mr. Fix
  • In the other answer, @Firepro and I concluded in the comments that each new property would require a new table. With hundreds of properties, the structure would become unmanageably complex. - while1pass