I am parsing this HTML page: http://cogcc.state.co.us/COGIS/DrillingPermitsList.cfm. First I download it using requests:

    import requests
    from xmlWorker import HtmlParser

    url = "http://cogcc.state.co.us/COGIS/DrillingPermitsList.cfm"
    postDataForPending = {"listtype": "Pending", "country": "All", "B1": "Go!"}
    postDataForApproved = {"listtype": "Approved", "country": "All", "B1": "Go!"}

    # Request the list of pending permits and keep the raw HTML
    response = requests.post(url, data=postDataForPending)
    htmlText = response.text
    print(htmlText)

    if __name__ == '__main__':
        htmlParser = HtmlParser(htmlText)
        print(htmlParser.get_received())

Then I parse it with lxml:

    import lxml.html as lxmlHtmlParser  # assumed import; not shown in the original post

    # maxRecordsToParse is a module-level constant defined elsewhere (not shown)

    class HtmlParser:
        xpathRoot = '/tr[position()>1 and position()<{0}+2]/'
        xpathToReceivedfirst = xpathRoot + 'td[1]/font/text()'

        def __init__(self, htmlText):
            logf = open("download.log", "w")
            try:
                self.document = lxmlHtmlParser.fromstring(htmlText)
            except Exception as e:  # most generic exception you can catch
                logf.write(str(e))
            finally:
                # optional clean up code
                pass

        def get_received(self):
            xp = self.xpathToReceivedfirst.format(maxRecordsToParse)
            receivedElements = self.document.xpath(xp)
            return receivedElements

No errors are reported. The problem is that when debugging, all attributes of self.document are either unset or equal to something like '\n', and accordingly every XPath query returns an empty list. At the same time, BeautifulSoup parses the elements fine. The HTML file is valid. I still don't understand what the problem is.
UPDATED
I extracted a single table with BeautifulSoup and removed all line breaks and whitespace between tags. It still doesn't work, even though the browser recognizes and renders this table correctly when it is saved to a file.
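
To narrow this down, the symptom can be reproduced outside the scraper. The sketch below is only a minimal test under assumptions: the sample markup is made up for illustration and is not copied from the real page, and `max_records` is a hypothetical stand-in for `maxRecordsToParse`. It compares the absolute path `/tr` from `xpathRoot` with a descendant search `//tr` on the same tree. Whether this is what happens with the real page is not confirmed.

```python
from lxml import html

# Made-up sample markup for illustration; the real page's structure may differ.
SAMPLE = """
<html><body>
<table>
<tr><td><font>Received</font></td></tr>
<tr><td><font>01/02/2015</font></td></tr>
<tr><td><font>01/03/2015</font></td></tr>
</table>
</body></html>
"""

document = html.fromstring(SAMPLE)

max_records = 2  # hypothetical stand-in for maxRecordsToParse

row_filter = "tr[position()>1 and position()<{0}+2]/td[1]/font/text()".format(max_records)

# Absolute path: "/tr" asks for a root element named tr.
# The root of a parsed HTML document is <html>, so nothing can match.
absolute = document.xpath("/" + row_filter)

# Descendant search: "//tr" looks for tr elements anywhere in the tree.
descendant = document.xpath("//" + row_filter)

print("absolute  :", absolute)
print("descendant:", descendant)
```

If the two queries behave differently on the real response as well, the leading `/` in `xpathRoot` would be the first thing to look at.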