Hello. I have two Python 3 scripts. The first one works without problems (its goal is to collect the necessary links from the page):

    import requests
    from bs4 import BeautifulSoup as bs
    import re
    from itertools import groupby

    r = requests.get('http://www.mmtitalia.it/directory_edile/rivenditori_macchine/ammann/index.htm')
    soup = bs(r.text, 'lxml')
    print(soup)
    link = soup.find('div', class_='elenco').find_all('a', href = re.compile('azienda'))
    links = [i.get('href') for i in link]
    new_links = [el for el, _ in groupby(links)]
    for i in new_links:
        print('http://www.mmtitalia.it/directory_edile/rivenditori_macchine/ammann/' + i)

The second script extends the first. Its goal is to read addresses from an input file (file_1), visit each of them, and collect the necessary information (that is, more links). However, it fails with an error (see the question title). Question: what is the problem? For some reason the page source that ends up in the soup variable differs between the two scripts, even though the variable r is given the same address. The second (problem) script:

    import requests
    from bs4 import BeautifulSoup as bs
    import re
    from itertools import groupby

    file_1 = 'links.txt'
    file_2 = 'links2.txt'
    myfile_1 = open(file_1, mode = 'r', encoding = 'ascii')
    myfile_2 = open(file_2, mode = 'w', encoding = 'ascii')
    for link in myfile_1:
        r = requests.get(link)
        soup = bs(r.text, 'lxml')
        url = soup.find('div', class_='elenco').find_all('a', href = re.compile('azienda'))
        print(url)
        urls = [i.get('href') for i in url]
        new_urls = [el for el, _ in groupby(urls)]
        for i in new_urls:
            myfile_2.write('http://www.mmtitalia.it/directory_edile/rivenditori_macchine/ammann/' + i)

The first three lines in links.txt:

    http://www.mmtitalia.it/directory_edile/rivenditori_macchine/ammann/index.htm
    http://www.mmtitalia.it/directory_edile/rivenditori_macchine/astra/index.htm
    http: www.mmtitalia.it/directory_edile/rivenditori_macchine/atlas/index.htm
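
One likely culprit, offered as a sketch and not a confirmed diagnosis: iterating over a file object yields each line with its trailing newline, so `requests.get(link)` may be asking for a URL ending in `\n`. The server can then return a different page (for example an error page) that has no `div` with class `elenco`, so `soup.find(...)` returns `None` and the chained `.find_all` fails. The version below assumes that is what happens. The names `clean_urls` and `collect_links` are hypothetical helpers, and the use of `urljoin` in place of the hard-coded `ammann` prefix is my own substitution, assuming the `href` values are relative to the page they come from.

```python
import re
from itertools import groupby
from urllib.parse import urljoin


def clean_urls(lines):
    """Strip surrounding whitespace (including the trailing newline) and skip blank lines."""
    return [line.strip() for line in lines if line.strip()]


def collect_links(in_path, out_path):
    # Third-party imports are kept local so clean_urls() can be used without them.
    import requests
    from bs4 import BeautifulSoup

    with open(in_path, encoding='ascii') as src, \
            open(out_path, 'w', encoding='ascii') as dst:
        for page_url in clean_urls(src):
            r = requests.get(page_url)
            soup = BeautifulSoup(r.text, 'lxml')
            block = soup.find('div', class_='elenco')
            if block is None:
                # Expected container is missing: report the exact URL and move on.
                print('no div.elenco found at', repr(page_url))
                continue
            anchors = block.find_all('a', href=re.compile('azienda'))
            hrefs = [a.get('href') for a in anchors]
            # groupby collapses consecutive duplicates, as in the original script.
            for href, _ in groupby(hrefs):
                # Resolve the href against its own page and end each entry with a newline.
                dst.write(urljoin(page_url, href) + '\n')
```

Calling `collect_links('links.txt', 'links2.txt')` would then mirror the second script. Printing `repr(link)` inside the original loop is a quick way to check whether the newline is really there. A line in links.txt that is not a well-formed URL would still make `requests.get` raise, so the input file is worth checking too.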