Regular expressions are not well suited to this problem.
It is better to use DOMDocument, which can parse even invalid markup.
A working example:
    $string = "<td class=x11111111111111111 width=140 style='>1111111111111111111111111111<td>11111111111111border-top:none;border-left:none; width:107pt'>";

    $doc = new DOMDocument();
    $doc->loadHTML($string, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    // Change the selector to the one you need
    $elements = $doc->getElementsByTagName('td');

    // Iterate over all selected elements
    foreach ($elements as $element) {
        // The element's attribute list
        $attributes = $element->attributes;

        // The attribute list is reindexed after each removal, so item(0)
        // always exists until the list is empty. Once the last attribute
        // is removed, the condition becomes false and the loop exits.
        while ($attributes->length) {
            // Remove attributes one at a time until none remain
            $element->removeAttributeNode($attributes->item(0));
        }
    }

    echo $doc->saveHTML(); // <td></td>
Note the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags.
Without them, the output would be:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <html><body><td></td></body></html>
Starting with PHP 5.4 and Libxml 2.6, the loadHTML method accepts a second parameter, $options, which tells Libxml how to parse the HTML:
LIBXML_HTML_NOIMPLIED (integer)
Sets the HTML_PARSE_NOIMPLIED flag, which disables the automatic addition of missing html / body ... elements.
LIBXML_HTML_NODEFDTD (integer)
Sets the HTML_PARSE_NODEFDTD flag, which prevents a default doctype from being added when none is found.
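To see what each flag does on its own, the same fragment can be parsed three times with different options. This is only a small sketch: renderWith is a hypothetical helper introduced here for illustration, and the `<p>text</p>` input is an assumed sample, not the markup from the question.

```php
<?php
// Hypothetical helper for illustration: parse a fixed sample fragment with
// the given libxml option flags and return the serialized document.
function renderWith(int $options): string
{
    $doc = new DOMDocument();
    $doc->loadHTML('<p>text</p>', $options);
    return $doc->saveHTML();
}

// Both flags: neither a doctype nor html/body wrappers are added.
$both = renderWith(LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Only NODEFDTD: the doctype is suppressed, html/body are still implied.
$noDoctype = renderWith(LIBXML_HTML_NODEFDTD);

// Only NOIMPLIED: html/body are not implied.
$noWrappers = renderWith(LIBXML_HTML_NOIMPLIED);

echo $both, "---\n", $noDoctype, "---\n", $noWrappers;
```

Comparing the three results side by side shows that the flags are independent, which is why the example in this answer combines them with a bitwise OR.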
All predefined constants are listed in the documentation.
Attention
Although the documentation states that Libxml 2.6 is required, LIBXML_HTML_NODEFDTD is only available from Libxml 2.7.8, and LIBXML_HTML_NOIMPLIED from Libxml 2.7.7.
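If the code may run on an older Libxml, one possible approach is to check the version at runtime and fall back to serializing only the contents of the implied body element. The following is a rough sketch under that assumption: stripAttributes is a hypothetical helper name, and the warning suppression via libxml_use_internal_errors is an addition of this sketch rather than part of the example above.

```php
<?php
// Hypothetical helper (not from the original example): returns the fragment's
// markup with every attribute removed from elements named $tagName.
function stripAttributes(string $html, string $tagName): string
{
    // Assumption: parser warnings about invalid markup are not wanted here,
    // so they are collected internally and then discarded.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();

    // Both flags are usable only from Libxml 2.7.8 onwards.
    $flagsAvailable = version_compare(LIBXML_DOTTED_VERSION, '2.7.8', '>=');
    if ($flagsAvailable) {
        $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    } else {
        $doc->loadHTML($html);
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName($tagName) as $element) {
        // The attribute list shrinks after each removal, so item(0)
        // is always the next remaining attribute.
        while ($element->attributes->length) {
            $element->removeAttributeNode($element->attributes->item(0));
        }
    }

    if ($flagsAvailable) {
        return trim($doc->saveHTML());
    }

    // Fallback: serialize only the children of the implied <body>.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}

echo stripAttributes('<p class="a" id="b">text</p>', 'p');
```

The version is compared against 2.7.8 because that is the higher of the two minimum versions mentioned above, so a single check covers both flags.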
Based on the answers to the questions: