I need to build a database (company name, type of activity, phone, e-mail) from business directory sites. Do I have to write a separate parser for each site, or can I write one universal parser that handles any business directory site?
1 answer
It depends on the scale of the problem. If there are only one or two directories, individual parsers are simpler.
If there are many sites and new ones keep being added, take a look at Tomita Parser.
And, of course, don't even think about parsing HTML by hand. Search around, there are plenty of ready-made tools for it.
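For the "one or two directories" case, here is a rough sketch of what a per-site parser could look like using Python's standard-library `html.parser` instead of hand-rolled string matching. The CSS class names (`company`, `company-name`, and so on) and the `SITE_FIELDS` mapping are hypothetical; each real directory will have its own markup, so treat this as an illustration of the idea rather than a ready-made solution.

```python
from html.parser import HTMLParser

# Hypothetical per-site config: output field -> CSS class used by that directory.
# A second directory would get its own mapping; the parser class stays the same.
SITE_FIELDS = {
    "name": "company-name",
    "activity": "company-activity",
    "phone": "company-phone",
    "email": "company-email",
}


class DirectoryParser(HTMLParser):
    """Builds one dict per element carrying `record_class`."""

    def __init__(self, record_class, fields):
        super().__init__()
        self.record_class = record_class
        self.class_to_field = {cls: name for name, cls in fields.items()}
        self.records = []
        self._current_field = None  # field whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.record_class in classes:
            self.records.append({})  # a new company entry starts here
        for cls in classes:
            if cls in self.class_to_field and self.records:
                self._current_field = self.class_to_field[cls]

    def handle_data(self, data):
        text = data.strip()
        if self._current_field and text:
            self.records[-1][self._current_field] = text
            self._current_field = None

    def handle_endtag(self, tag):
        # An element closed without text: stop waiting for its value.
        self._current_field = None


def parse_directory(html, record_class="company", fields=SITE_FIELDS):
    parser = DirectoryParser(record_class, fields)
    parser.feed(html)
    return parser.records
```

Calling `parse_directory(page_html)` returns a list of dicts, one per company, containing only the fields that were actually present in that entry. Supporting another directory then means writing a new field mapping, not a new parser, which is about as "universal" as this approach gets. Once every site needs its own special cases, a dedicated scraping or extraction tool is probably the better route.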