Distinguish article links from others.

Question

I have a list of news sites:

[ 'ria.ru', 'www.rbc.ru', 'lenta.ru', 'news.rambler.ru', 'kp.ru', 'iz.ru', 'www.gazeta.ru', 'vesti.ru', 'www.mk.ru', 'news.ngs.ru', 'russian.rt.com', 'life.ru', 'ren.tv', 'smi2.ru', 'kommersant.ru', 'svpressa.ru', 'tass.ru', 'cosmo.ru', 'lentainform.com', 'ura.ru', 'echo.msk.ru', 'vz.ru', 'www.aif.ru', 'dni.ru', 'www.ridus.ru', 'E1.RU', 'ridus.ru', 'rg.ru', 'tsargrad.tv', 'eg.ru', ]

From these sites I need to collect only the links to articles. I call find_all("a") and iterate over the results in a loop. The problem is that I have to separate the actual article links from all the others. Any ideas on how to implement this?

What exactly counts as a "link to an article", and how does it differ from the other links?
I think you need to build a separate template for each site to tell article links apart.
It is unlikely that all these news sites agreed on a single link format just to make life easier for parsers and robots...
Everything that does not match the template is something else: a link to registration, to a "Similar" section, or even to another site.
