Parsing with BeautifulSoup

Question

Hello! Please help me parse the text out of this piece of HTML:

<dd itemprop="value"><span class="allItemsOfLanding"><a href="/analogovye-kamery/ctv/">все аналоговые камеры CTV</a></span>CTV</dd>

I need to extract only the value "CTV" (the dd tag's own text), but when I use dd.getText() I get:

 <span class="allItemsOfLanding"><a href="/analogovye-kamery/ctv/">все аналоговые камеры CTV</a></span>CTV

That is, I get extra content. I don't understand how to tell it to take only the dd tag's own value, or how to remove the span tags inside the dd that was found. Thanks!

------------------ ADDED ----------------

Here is an example of the code that is being used:

 try:
     page_tovar = requests.get(v, headers=headers, data=data, timeout=timeout)
     soup_p = BeautifulSoup(page_tovar.text.encode('utf-8', 'ignore'), 'lxml')
     rows_dt = soup_p.find("section", {"class": "catalog-detail-blocks"}).find("dl").findAll("dt")
     rows_dd = soup_p.find("section", {"class": "catalog-detail-blocks"}).find("dl").findAll("dd")
     result_parse_item = {}
     for i in range(0, len(rows_dt)):
         name = rows_dt[i].contents[0]   # getText()
         znach = rows_dd[i].contents[0]  # getText()
         result_parse_item[name] = znach
         result_parse[k] = result_parse_item
         print name, znach
 except Exception as e:
     print " Error: {0}".format(e)

You can continue the parsing chain, use a regular expression, or simply correct your code.
I gave 3 parsing options. I like the contents version more, but the one that searches for siblings is more universal.
Thanks, it worked. The first option fit:

 if len(rows_dd[i].contents) > 2:
     znach = rows_dd[i].contents[2].strip()
 else:
     znach = rows_dd[i].contents[0].strip()
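The length check above depends on how many child nodes the parser produces for each dd. One way to avoid indexing by position is to keep only the text nodes that are direct children of the tag. The following is a minimal sketch, not the asker's code: own_text is a hypothetical helper, and the two sample strings are assumptions (the second guesses that extra whitespace on the real page is what made contents longer).

```python
from bs4 import BeautifulSoup, NavigableString


def own_text(tag):
    """Join only the text nodes that are direct children of tag."""
    parts = [child for child in tag.children if isinstance(child, NavigableString)]
    return "".join(parts).strip()


# Hypothetical samples: the same dd with and without surrounding whitespace.
html_compact = '<dd><span><a href="/x/">all cameras</a></span>CTV</dd>'
html_spaced = '<dd>\n <span><a href="/x/">all cameras</a></span>CTV\n</dd>'

for html in (html_compact, html_spaced):
    dd = BeautifulSoup(html, "html.parser").dd
    print(own_text(dd))
```

Text inside nested tags such as the span is skipped because only direct children are inspected, so the same call should work whether the dd has two child nodes or more.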

Answer 1 · 2016-08-08T18:18:36

In this case, it is possible to use the contents property.

 soup.dd.contents[1] # the second element; the first one is the span

The second option: find the first span, then take the element that follows it.

 soup.dd.find('span').next_sibling

The third option uses get_text():

 soup.dd.get_text("|").split("|")[1]
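For anyone who wants to try the three options side by side, here is a small self-contained sketch run against the dd fragment from the question. The variable names and the choice of the built-in "html.parser" (instead of lxml) are assumptions made only to keep the example free of extra dependencies.

```python
from bs4 import BeautifulSoup

html = ('<dd itemprop="value"><span class="allItemsOfLanding">'
        '<a href="/analogovye-kamery/ctv/">все аналоговые камеры CTV</a>'
        '</span>CTV</dd>')
soup = BeautifulSoup(html, "html.parser")

# Option 1: index into the list of child nodes (span first, then the text).
via_contents = soup.dd.contents[1]

# Option 2: start from the span and take the node that follows it.
via_sibling = soup.dd.find("span").next_sibling

# Option 3: join all text pieces with a separator, then split on it.
via_get_text = soup.dd.get_text("|").split("|")[1]

print(via_contents, via_sibling, via_get_text)
```

Options 1 and 3 rely on the position of the text within the dd, so they may need a different index if the real page has whitespace or more tags around the value. Option 3 also assumes the separator character does not occur in the text itself.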
