I am writing a parser for HTML sites. Since there is no universal algorithm for locating the data, I describe each site with an array of parameters that says where to look for content: where to find links, headings, descriptions and the body, which elements need to be deleted, and so on. At the moment I store these parameters as an associative array and assign it to a class property in the constructor, for example:

    public function __construct(EntityManager $em)
    {
        // One entry per site: CSS selectors that tell the parser where to look.
        $this->dom = [
            'lifehacker.ru' => [
                'posts_links' => 'div[class=content]',
                'title_dom' => 'body h1',
                'image_dom' => '.entry-content a img',
                'excerpt_dom' => '.the-excerpt p',
                'content_dom' => '.entry-content',
                // Elements stripped from the content before it is saved.
                'remove_elements' => [
                    'script',
                    'div[class=entry-details]',
                    'p[class=wp-caption-text], div[class=lh-post-source-view]',
                    'div[class=jp-relatedposts]',
                    'div[class=social-likes]',
                    'div[class=the-excerpt]',
                    'p[class=wp-thumbnail-caption]',
                ],
            ],
        ];
    }

Even with only 5 sites the constructor becomes huge, which is not good. I also tried storing the parameters in the database and reading them from its fields, but that works slower, and in my opinion it is not a good solution either.

Where is it better to define and store these parameters?

    1 answer

    They can be stored in config files.

    For example, in app/config/sites/lifehacker.ru.php or app/config/sites/lifehacker.ru.yml. That also makes changes easier to track in the version control system.
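
    A minimal sketch of the .php variant, assuming each file in the sites directory simply returns the selector array for one site (the same keys as in the question). The class name SiteConfigLoader and the method getConfig are hypothetical names chosen for illustration; they are not part of Symfony or Doctrine.

```php
<?php

// Hypothetical loader: one PHP config file per site, each file returns an array.
final class SiteConfigLoader
{
    /** @var string Directory that holds the per-site files */
    private $configDir;

    /** @var array Configs already loaded during this request, keyed by site name */
    private $cache = [];

    public function __construct(string $configDir)
    {
        $this->configDir = rtrim($configDir, '/');
    }

    public function getConfig(string $siteName): array
    {
        // Accept only host-like names, so the name cannot point outside the directory.
        if (!preg_match('/\A[a-z0-9.-]+\z/i', $siteName)) {
            throw new InvalidArgumentException('Invalid site name: ' . $siteName);
        }

        if (!isset($this->cache[$siteName])) {
            $path = $this->configDir . '/' . $siteName . '.php';
            if (!is_file($path)) {
                throw new RuntimeException('No config file for site: ' . $siteName);
            }
            // "require" evaluates the file and yields the value it returns.
            $this->cache[$siteName] = require $path;
        }

        return $this->cache[$siteName];
    }
}
```

    With a file lifehacker.ru.php whose content is `<?php return ['title_dom' => 'body h1'];`, calling `getConfig('lifehacker.ru')` on a loader pointed at that directory returns that array. Adding a site then means adding one file, and the constructor stays small.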

    That said, the database should be fine too, especially if you use caching - http://blog.alterphp.com/2014/05/doctrine2-optimization-with-apc-cache.html

    Downloading and parsing pages will still take more than 90% of the time.

    • When I use the Doctrine cache I run out of memory, and I do not know why. The config-file option would suit me. But how do I then get this data into the controller? - Valentine Murnik
    • In the controller: $this->get('app.my_config_loader')->getConfig('lifehacker.ru'); and inside that service: Yaml::parse(file_get_contents($path . '/' . $siteName . '.yml')). Pass $path in through DI as "%kernel.root_dir%/config". - luchaninov
    • Thank you, that is just what I need :) - Valentine Murnik