I need help parsing HTML pages. The program lets users enter comments, and each comment is stored as an HTML page. The HTML contains whatever the user pasted in: mostly text, tables, and lists. For example:
<BODY> <P>Комментарий</P> <P>1.</P> <P> <TABLE cellSpacing=0 cellPadding=0 width=249 border=1> <COLGROUP> <COL span=3 width=83> <TBODY> <TR height=20> <TD class=xl66 height=20 width=83> <FONT face=Calibri> 1:1</FONT> </TD> <TD class=xl64 width=83> <FONT face=Calibri> 1:2</FONT> </TD> <TD class=xl65 width=83> <FONT face=Calibri>1:3 </FONT> </TD> </TR> <TR height=20> <TD class=xl67 height=20> <FONT face=Calibri> 2:1</FONT> </TD> <TD class=xl63> <FONT face=Calibri> 2:2</FONT> </TD> <TD class=xl63> <FONT face=Calibri> 2:3</FONT> </TD> </TR> <TR height=20> <TD class=xl63 height=20> <FONT face=Calibri> </FONT> </TD> <TD class=xl63> <FONT face=Calibri> </FONT> </TD> <TD class=xl63> <FONT face=Calibri> </FONT> </TD> </TR> <TR height=20> <TD class=xl63 height=20> <FONT face=Calibri> </FONT> </TD> <TD class=xl63> <FONT face=Calibri> </FONT></TD> <TD class=xl63> <FONT face=Calibri> </FONT> </TD> </TR> </TBODY> </TABLE> </P> <P>2.</P> <P> <TABLE cellSpacing=0 cellPadding=0 width=192 border=1> <COLGROUP> <COL span=3 width=64> <TBODY> <TR height=20> <TD class=xl66 height=40 rowSpan=2 width=64> <FONT face=Calibri> 1:1</FONT> </TD> <TD class=xl64 width=128 colSpan=2> <FONT face=Calibri>1:2</FONT> </TD> </TR> <TR height=20> <TD class=xl63 height=20> <FONT face=Calibri>2:2</FONT> </TD> <TD class=xl63> <FONT face=Calibri> 2:3</FONT> </TD> </TR> <TR style="HEIGHT: 15pt" height=20> <TD class=xl63 height=20> <FONT face=Calibri> 3:1</FONT> </TD> <TD class=xl63> <FONT face=Calibri>3:2</FONT> </TD> <TD class=xl63> <FONT face=Calibri>3:3</FONT> </TD> </TR> <TR height=20> <TD class=xl63 height=20> <FONT face=Calibri> </FONT> </TD> <TD class=xl63> <FONT face=Calibri> </FONT> </TD> <TD class=xl63> <FONT face=Calibri> </FONT> </TD> </TR> </TBODY> </TABLE> </P> <P>3.</P> <P> <TABLE cellSpacing=0 cellPadding=0 width=256 border=1> <COLGROUP> <COL span=4 width=64> <TBODY> <TR height=20> <TD class=xl63 height=20 width=64 align=right><FONT face=Calibri>1</FONT></TD> <TD class=xl63 width=64 
align=right><FONT face=Calibri>2</FONT></TD> <TD class=xl63 width=64 align=right><FONT face=Calibri>3</FONT></TD> <TD class=xl63 width=64 align=right><FONT face=Calibri>4</FONT></TD></TR> <TR height=20> <TD class=xl63 height=20 align=right><FONT face=Calibri>1</FONT></TD> <TD class=xl63 align=right><FONT color=#ff8000 face=Calibri>2</FONT></TD> <TD class=xl63 align=right> <P align=center><FONT color=#ff8000 face=Calibri>3</FONT></P></TD> <TD class=xl63 align=right><FONT color=#ff8000 face=Calibri>4</FONT></TD></TR> <TR height=20> <TD class=xl63 height=20 align=right><FONT color=#ff8000 face=Calibri>1</FONT></TD> <TD class=xl63 align=right><FONT color=#ff8000 face=Calibri>2</FONT></TD> <TD class=xl63 align=right><FONT face=Calibri>3</FONT></TD> <TD class=xl63 align=right><FONT face=Calibri>4</FONT></TD></TR> <TR style="HEIGHT: 15pt" height=20> <TD class=xl63 height=20 align=right><FONT face=Calibri>1</FONT></TD> <TD class=xl63 align=right><FONT face=Calibri>2</FONT></TD> <TD class=xl63 align=right> <P align=center><FONT face=Calibri>3</FONT></P></TD> <TD class=xl63 align=right><FONT face=Calibri>4</FONT></TD></TR> <TR height=20> <TD class=xl63 height=20 align=right><FONT face=Calibri>1</FONT></TD> <TD class=xl63 align=right><FONT face=Calibri>2</FONT></TD> <TD class=xl63 align=right><FONT face=Calibri>3</FONT></TD> <TD class=xl63 align=right><FONT face=Calibri>4</FONT></TD></TR> <TR height=20> <TD class=xl63 height=20 align=right><FONT face=Calibri>1</FONT></TD> <TD class=xl63 align=right><FONT face=Calibri>2</FONT></TD> <TD class=xl63 align=right> <P align=center><FONT face=Calibri>3</FONT></P></TD> <TD class=xl63 align=right><FONT face=Calibri>4</FONT></TD></TR></TBODY></TABLE></P> <P> 4.</P> <P class=MsoListParagraphCxSpFirst style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -18pt; mso-list: l0 level1 lfo1"><SPAN lang=EN-US style="FONT-FAMILY: Symbol; mso-fareast-font-family: Symbol; mso-bidi-font-family: Symbol; mso-ansi-language: EN-US"><SPAN style="mso-list: 
Ignore">·<SPAN style='FONT: 7pt "Times New Roman"'> </SPAN></SPAN></SPAN><SPAN lang=EN-US style="mso-ansi-language: EN-US"><FONT face=Calibri>Ntcn 1<?xml:namespace prefix = "o" ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p></FONT></SPAN></P> <P class=MsoListParagraphCxSpMiddle style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -18pt; mso-list: l0 level1 lfo1"><SPAN lang=EN-US style="FONT-FAMILY: Symbol; mso-fareast-font-family: Symbol; mso-bidi-font-family: Symbol; mso-ansi-language: EN-US"><SPAN style="mso-list: Ignore">·<SPAN style='FONT: 7pt "Times New Roman"'> </SPAN></SPAN></SPAN><SPAN lang=EN-US style="mso-ansi-language: EN-US"><FONT face=Calibri>Ntcn 2<o:p></o:p></FONT></SPAN></P> <P class=MsoListParagraphCxSpLast style="MARGIN: 0cm 0cm 8pt 36pt; TEXT-INDENT: -18pt; mso-list: l0 level1 lfo1"><SPAN lang=EN-US style="FONT-FAMILY: Symbol; mso-fareast-font-family: Symbol; mso-bidi-font-family: Symbol; mso-ansi-language: EN-US"><SPAN style="mso-list: Ignore">·<SPAN style='FONT: 7pt "Times New Roman"'> </SPAN></SPAN></SPAN><SPAN lang=EN-US style="mso-ansi-language: EN-US"><FONT face=Calibri>Ntcn 3<o:p></o:p></FONT></SPAN></P> <P class=MsoNormal style="MARGIN: 0cm 0cm 8pt"><SPAN lang=EN-US style="mso-ansi-language: EN-US"><o:p><FONT face=Calibri> </FONT></o:p></SPAN></P> <P class=MsoListParagraphCxSpFirst style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -18pt; mso-list: l1 level1 lfo2"><SPAN lang=EN-US style="mso-bidi-font-family: Calibri; mso-ansi-language: EN-US; mso-bidi-theme-font: minor-latin"><SPAN style="mso-list: Ignore"><FONT face=Calibri>1.</FONT><SPAN style='FONT: 7pt "Times New Roman"'> </SPAN></SPAN></SPAN><SPAN lang=EN-US style="mso-ansi-language: EN-US"><FONT face=Calibri>Ntcn 1<o:p></o:p></FONT></SPAN></P> <P class=MsoListParagraphCxSpMiddle style="MARGIN: 0cm 0cm 0pt 36pt; TEXT-INDENT: -18pt; mso-list: l1 level1 lfo2"><SPAN lang=EN-US style="mso-bidi-font-family: Calibri; mso-ansi-language: EN-US; mso-bidi-theme-font: minor-latin"><SPAN 
style="mso-list: Ignore"><FONT face=Calibri>2.</FONT><SPAN style='FONT: 7pt "Times New Roman"'> </SPAN></SPAN></SPAN><SPAN lang=EN-US style="mso-ansi-language: EN-US"><FONT face=Calibri>Ntcn 2<o:p></o:p></FONT></SPAN></P> <P class=MsoListParagraphCxSpLast style="MARGIN: 0cm 0cm 8pt 36pt; TEXT-INDENT: -18pt; mso-list: l1 level1 lfo2"><SPAN lang=EN-US style="mso-bidi-font-family: Calibri; mso-ansi-language: EN-US; mso-bidi-theme-font: minor-latin"><SPAN style="mso-list: Ignore"><FONT face=Calibri>3.</FONT><SPAN style='FONT: 7pt "Times New Roman"'> </SPAN></SPAN></SPAN><SPAN lang=EN-US style="mso-ansi-language: EN-US"><FONT face=Calibri>Ntcn 3</FONT></SPAN></P> <P class=MsoListParagraphCxSpLast style="MARGIN: 0cm 0cm 8pt 36pt; TEXT-INDENT: -18pt; mso-list: l1 level1 lfo2"> </P> <P class=MsoListParagraphCxSpLast style="MARGIN: 0cm 0cm 8pt 36pt; TEXT-INDENT: -18pt; mso-list: l1 level1 lfo2">5.</P> <P class=MsoListParagraphCxSpLast style="MARGIN: 0cm 0cm 8pt 36pt; TEXT-INDENT: -18pt; mso-list: l1 level1 lfo2"> </P> </BODY>
Each such HTML comment is arbitrary and has no predefined structure. I need to walk the entire page in order and determine which blocks follow which, and then use that sequence to build my own representation for printing. Because the comments are created by users, the markup does not always follow the standard: in the example the first table sits inside a <p>, and in the third table there are <p> tags inside the <td> cells. The tables are created in Word / Excel and pasted into the comments, which is where this messy markup comes from; the lists are likewise made up of separate <p> tags rather than real list elements. I would like to get something like this:
- Text (Comment)
- Text (1.)
- Table
- Text (2.)
- Table
- Text (3.)
- Table
- Text (4.)
- List (1.)
- List (2.)
- etc.
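
One possible way to get that sequence, offered as a rough sketch rather than a finished solution: an event-based parser such as `HTMLParser` from Python's standard library does not build a tree, so a table inside a `<p>` or a `<p>` inside a `<td>` does not upset it. The class name `BlockScanner`, the function `scan_blocks`, and the rule "a `<p>` whose `style` contains `mso-list` is a list item" are assumptions made for this illustration; they are not part of the program described above.

```python
from html.parser import HTMLParser


class BlockScanner(HTMLParser):
    """Collects blocks in document order as (kind, payload) tuples."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []       # result: ("text" | "list", str) or ("table", rows)
        self.table_depth = 0   # > 0 while inside a <table>
        self.rows = None       # rows of the outermost table being read
        self.cell = None       # text pieces of the current cell
        self.para = None       # text pieces of the current paragraph
        self.is_list = False   # assumption: Word list items carry "mso-list"

    @staticmethod
    def _clean(pieces):
        # Join pieces and collapse all whitespace (including &nbsp;).
        return " ".join("".join(pieces).split())

    def _flush_para(self):
        if self.para is not None:
            text = self._clean(self.para)
            if text:
                self.blocks.append(("list" if self.is_list else "text", text))
        self.para = None

    def _close_cell(self):
        if self.cell is not None and self.rows:
            self.rows[-1].append(self._clean(self.cell))
        self.cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._flush_para()          # a table ends any open paragraph
            self.table_depth += 1
            if self.table_depth == 1:
                self.rows = []
        elif self.table_depth == 1 and tag == "tr":
            self._close_cell()
            self.rows.append([])
        elif self.table_depth == 1 and tag in ("td", "th"):
            self._close_cell()
            if not self.rows:           # <td> without a preceding <tr>
                self.rows.append([])
            self.cell = []
        elif not self.table_depth and tag == "p":
            self._flush_para()
            self.para = []
            style = dict(attrs).get("style") or ""
            self.is_list = "mso-list" in style.lower()

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth:
            self.table_depth -= 1
            if self.table_depth == 0:
                self._close_cell()
                self.blocks.append(("table", self.rows))
                self.rows = None
        elif self.table_depth == 1 and tag in ("td", "th"):
            self._close_cell()
        elif not self.table_depth and tag == "p":
            self._flush_para()

    def handle_data(self, data):
        if self.table_depth:
            if self.cell is not None:   # <p>/<font> inside a cell are ignored
                self.cell.append(data)
        elif self.para is not None:
            self.para.append(data)
        elif data.strip():              # loose text outside any <p>
            self.para, self.is_list = [data], False


def scan_blocks(html):
    """Return the blocks of an HTML comment in the order they appear."""
    scanner = BlockScanner()
    scanner.feed(html)
    scanner.close()
    scanner._flush_para()
    return scanner.blocks


sample = (
    "<BODY><P>Comment</P><P>1.</P>"
    "<P><TABLE border=1><TR><TD><FONT face=Calibri> 1:1</FONT></TD>"
    "<TD><P align=center>1:2</P></TD></TR></TABLE></P>"
    '<P style="mso-list: l0 level1 lfo1"><SPAN style="mso-list: Ignore">1.'
    "<SPAN> </SPAN></SPAN><SPAN>Item A</SPAN></P></BODY>"
)
for kind, payload in scan_blocks(sample):
    print(kind, payload)
```

The sketch has known gaps. It ignores `rowSpan` / `colSpan`, so merged cells only show up as rows with fewer entries, and text from a nested table is folded into the enclosing cell. The `mso-list` rule also classifies the trailing `5.` paragraph of the example as a list item, because Word left the list style on it; telling such paragraphs apart would need an extra check, for instance whether the paragraph contains an `mso-list: Ignore` marker span.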