Hello. I have a string containing HTML table markup. The task is to split the string into an array of tags together with their contents. Example:

 <table>
   <tr>
     <td>
       <table>
         <tr>
           <td>Текст ячейки №1</td>
           <td>Текст ячейки №2</td>
         </tr>
       </table>
     </td>
     <td>
       <table>
         <tr>
           <td></td>
         </tr>
       </table>
     </td>
   </tr>
 </table>

I found a solution by following a link:

 function walk($output, \DOMNode $node, $depth = 0)
 {
     if ($node->hasChildNodes()) {
         $children = $node->childNodes;
         foreach ($children as $child) {
             if ($child->nodeType === XML_TEXT_NODE) {
                 continue;
             }
             $output[] = $child->nodeName;
             $item = walk(array(), $child, $depth + 1);
             if (!empty($item)) {
                 $output[] = $item;
             }
         }
     }
     return $output;
 }

 $dom = new DOMDocument;
 $dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
 $root = $dom->getElementsByTagName('body')->item(0);
 $output = walk(array(), $root, 0);

It works as intended, but it outputs only the tag names, in the following format:

 ["table",["tr",["td",["table",["tr",["td","td"] ... 

The question is how to also output the text content (not the attributes) of these tags. Something like:

 ["table":"",["tr":"",["td":"",["table":"",["tr":"",["td":"Текст ячейки №1","td":"Текст ячейки №2"] ... 

I tried:

 array_push($output, array( $child->nodeName => $child->textContent)); 

The output:

 ["table":"Текст ячейки №1Текст ячейки №2 ... 
  • If you need the text of the cells, add it only when the tag name is td, not for every node. - teran
  • And what is this mixture of JSON and PHP arrays in the desired result, anyway? - teran
  • I need this to replace the contents of other cells later: I will move the contents of the cells of the second table into the first one, so the structure has to be preserved. - ed danilov
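
teran's hint can be sketched as follows. `textContent` returns the text of a node together with the text of all its descendants, which is why calling it on `table` concatenates every cell. The sketch below records text for `td` elements only and reads it from the cell's direct text nodes. The names `ownText` and `walkWithText` and the `tag`/`text`/`children` keys are hypothetical, not from the original post, and `mb_encode_numericentity` is swapped in for the `HTML-ENTITIES` conversion, which is deprecated as of PHP 8.2. It assumes the dom and mbstring extensions are available.

```php
<?php
// Hypothetical helper (not from the original post): returns only the text
// that sits directly inside $node, ignoring text of nested elements.
function ownText(DOMNode $node): string
{
    $text = '';
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->textContent;
        }
    }
    return trim($text);
}

// Same depth-first traversal as walk() in the question, but every element
// becomes a tag/text/children record, and text is stored for <td> only.
function walkWithText(DOMNode $node): array
{
    $output = [];
    foreach ($node->childNodes as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue; // skip text nodes, comments, etc.
        }
        $output[] = [
            'tag'      => $child->nodeName,
            'text'     => $child->nodeName === 'td' ? ownText($child) : '',
            'children' => walkWithText($child),
        ];
    }
    return $output;
}

// Shortened version of the sample markup from the question.
$html = '<table><tr><td><table><tr>'
      . '<td>Текст ячейки №1</td><td>Текст ячейки №2</td>'
      . '</tr></table></td><td></td></tr></table>';

$dom = new DOMDocument;
// Encode non-ASCII characters as numeric entities so loadHTML() does not
// misread the UTF-8 bytes.
$dom->loadHTML(mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8'));
$root = $dom->getElementsByTagName('body')->item(0);

$tree = walkWithText($root);
echo json_encode($tree, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT), PHP_EOL;
```

With this layout the first inner cell is reached through named keys, for example `$tree[0]['children'][0]['children'][0]['children'][0]['children'][0]['children'][0]['text']`, and each record keeps the tag name next to its text, which may make moving contents between tables easier to follow than positional indexes.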

1 answer

I solved it like this:

 function walk($output, \DOMNode $node, $depth = 0)
 {
     if ($node->hasChildNodes()) {
         $children = $node->childNodes;
         foreach ($children as $child) {
             if ($child->nodeType === XML_TEXT_NODE) {
                 $output[] = $child->textContent; // replaced the body of the condition
             }
             // No "continue" above, so a text node also adds its name ("#text") here.
             $output[] = $child->nodeName;
             $item = walk(array(), $child, $depth + 1);
             if (!empty($item)) {
                 $output[] = $item;
             }
         }
     }
     return $output;
 }

 $dom = new DOMDocument;
 $dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
 $root = $dom->getElementsByTagName('body')->item(0);
 $output = walk(array(), $root, 0);

 // For decoding Cyrillic in JSON escape format, such as \u041d\u043e\u043c\u0435\u0440
 $name = html_entity_decode(str_replace('\u', '&#x', $output[1][1][1][1][1][1][0]), ENT_NOQUOTES, 'UTF-8');
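
A note on the `\u041d...` escapes mentioned in the last comment: assuming the array is printed with `json_encode` (the post does not say so explicitly), those escapes are produced by `json_encode` itself, which escapes non-ASCII characters by default. A minimal sketch of two ways to get readable Cyrillic without string replacement; the `$output` value here is an illustrative stand-in, not the real result of `walk()`.

```php
<?php
// Illustrative stand-in for one leaf of the walk() result.
$output = ['td', ['Номер']];

// Default behaviour: non-ASCII characters are written as \uXXXX escapes.
$escaped = json_encode($output);
echo $escaped, PHP_EOL;   // ["td",["\u041d\u043e\u043c\u0435\u0440"]]

// With JSON_UNESCAPED_UNICODE the text is emitted as plain UTF-8.
$readable = json_encode($output, JSON_UNESCAPED_UNICODE);
echo $readable, PHP_EOL;  // ["td",["Номер"]]

// If only the escaped JSON is available, json_decode() restores the text.
$decoded = json_decode($escaped, true);
echo $decoded[1][0], PHP_EOL; // Номер
```

The PHP array itself already holds UTF-8 strings after `loadHTML`, so the escapes only appear at the point where the array is serialized.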