Greetings. I use Simple HTML DOM to parse products from an online store. The product weight sits in this fragment:

<div id="AddnInfo"> <p> <p><label>&ldquo;R&rdquo;Web#:</label> <span class="value">399014</span></p> <p><label>SKU:</label> <span class="value">0279D033</span></p> <p><label>Manufacturer #:</label> B5529</p> <p><label>Product Weight:</label>1.5&nbsp;pounds</p> <p><label>Product Dimensions (in inches):</label>8.5 x 7.9 x 2.3</p> </p> </div> 

Moreover, any number of <p> ... </p> elements can appear before and after the one I need, i.e. its position in the list above is not fixed.
The question is: how can I reliably pull Product Weight out of this markup?

  • Head-on: go through every p and check whether it contains the right label. - teran
  • @teran check for the label, but how? That is exactly where I am stuck: the number is not inside <label>...</label> but outside it. - Alexandros
  • You can try a regular expression on the contents of AddnInfo: /(\d|\.)+(&nbsp;)+pounds/ - br3t
  • Or rather, it is easier to just find the labels, pick the one whose text you need, and take its parent p. - teran
  • @br3t with this regular expression my example outputs just 1 instead of 1.5. Unfortunately I am not good with regular expressions; what could be wrong? - Alexandros

1 answer

Find the parent #AddnInfo block and collect all label elements inside it. Then compare the text of each label with Product Weight: and, once the matching label is found, take its parent element and read its text content:

    define("PRODUCT_WEIGHT", 'Product Weight:');

    $html = str_get_html($txt);
    $weight = ''; // stays empty if the label is not found
    $labels = $html->find("#AddnInfo label");
    foreach ($labels as $l) {
        if ($l->innertext === PRODUCT_WEIGHT) {
            // second text node under the parent <p>, i.e. the text after the label
            $weight = (string)$l->parent()->find('text', 1);
            break;
        }
    }
    print_r([$weight, html_entity_decode($weight)]);
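For comparison, the same "find the label, then read the text next to it" idea can be sketched with PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM. This is a different technique from the one above, not a drop-in replacement. The $html string is an abbreviated copy of the fragment from the question, and the XPath expression is only one possible way to write the lookup.

```php
<?php
// Abbreviated copy of the markup from the question (assumption: real pages contain more rows).
$html = '<div id="AddnInfo">'
      . '<p><label>SKU:</label> <span class="value">0279D033</span></p>'
      . '<p><label>Product Weight:</label>1.5&nbsp;pounds</p>'
      . '<p><label>Product Dimensions (in inches):</label>8.5 x 7.9 x 2.3</p>'
      . '</div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// First text node that follows the label whose text is exactly "Product Weight:".
$query = '//div[@id="AddnInfo"]//label[normalize-space(.)="Product Weight:"]'
       . '/following-sibling::text()[1]';
$node = $xpath->query($query)->item(0);

// The DOM extension has already decoded &nbsp; into a no-break space here.
$weight = $node !== null ? $node->nodeValue : '';

echo $weight, PHP_EOL;
```

Because the lookup is keyed on the label text, it does not depend on where the row sits inside #AddnInfo.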

$weight will contain 1.5&nbsp;pounds . html_entity_decode will convert the &nbsp; entity, but not into the ordinary ASCII space ( 32 ): it becomes the no-break space 160 ( 0xA0 ), which trim() does not strip by default.
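A minimal sketch of one way to deal with that no-break space, assuming the text is handled as UTF-8 (in which case the character is the two bytes 0xC2 0xA0): replace it with an ordinary space before trimming. The variable names are only illustrative.

```php
<?php
// Value as returned by the parser, with the entity still encoded.
$raw = '1.5&nbsp;pounds';

// Decode entities explicitly as UTF-8; &nbsp; becomes U+00A0 (bytes 0xC2 0xA0).
$decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

// Swap the no-break space for a plain ASCII space, then trim as usual.
$clean = trim(str_replace("\xC2\xA0", ' ', $decoded));

echo $clean, PHP_EOL; // 1.5 pounds
```

Another option is to pass the extra bytes to trim() through its second argument, but that operates on single bytes, so replacing the whole sequence first is usually the safer choice for UTF-8 text.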


If you then need only the numeric value, you can either simply remove the substring &nbsp;pounds or, if the unit will not always be pounds , use a suitable regular expression:

 preg_match("/\d+(?:\.\d+)?/", $weight, $matches); 

After that, $matches[0] contains 1.5
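The regular expression step can be wrapped in a small function so that a missing number is handled explicitly. This is only a sketch: extractNumber is a hypothetical helper name, not part of Simple HTML DOM, and it returns the first number it finds, so it assumes the value comes first in the string.

```php
<?php
// Hypothetical helper: first integer or decimal number in the text as a float, or null if none.
function extractNumber(string $text): ?float
{
    if (preg_match('/\d+(?:\.\d+)?/', $text, $matches) === 1) {
        return (float)$matches[0];
    }
    return null;
}

var_dump(extractNumber('1.5&nbsp;pounds')); // float(1.5)
var_dump(extractNumber('no digits here'));  // NULL
```

The pattern works on the raw string as well as on the decoded one, since it only looks at digits and the decimal point.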