I use the standard System.Windows.Forms.WebBrowser control to load and display a web page. The control is configured to emulate IE version 11. The page contains an iframe and is displayed correctly.

I need to get the full HTML of the page including the contents of the iframe, but in the HTML I retrieve these elements are empty. In the full IE browser the iframe's HTML is visible. I have tried to get the HTML text in the following two ways:

```csharp
public String HtmlDocumentText
{
    get { return webBrowser.DocumentText; }
}

public String DomDocumentText
{
    get
    {
        // REM. http://stackoverflow.com/questions/3640236/converting-htmldocument-domdocument-to-string
        var document = webBrowser.Document;
        var documentAsIHtmlDocument3 = (mshtml.IHTMLDocument3)document.DomDocument;
        var content = documentAsIHtmlDocument3.documentElement.outerHTML;
        return content;
    }
}
```

I suspect the problem lies in the security restrictions on this element, because the following code raises an access error:

```csharp
var frames = webBrowser.DocumentTestToDelete.Window.Frames;
var frame1 = webBrowser.DocumentTestToDelete.Window.Frames[0];
var document1 = frame1.Document;
```

Error: Access Denied (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED)).
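An E_ACCESSDENIED error on a frame's document is typically what the browser's same-origin policy produces when the frame was loaded from a different origin than the page. To check whether that is the cause, a minimal sketch (assuming the standard scheme/host/port definition of an origin) can compare the page URL with the frame URL using System.Uri. `OriginCheck` and `IsSameOrigin` are hypothetical names, and IE's own rules differ in details (historically it did not include the port, and `document.domain` can relax the check), so a `true` result is only a hint that the frame should be readable.

```csharp
using System;

public static class OriginCheck
{
    // Hypothetical helper: true when both URLs share scheme, host and port,
    // i.e. the same origin under the standard same-origin policy.
    // Not modelled: IE-specific rules (historically no port check) and
    // document.domain relaxation.
    public static bool IsSameOrigin(Uri page, Uri frame)
    {
        if (page == null) throw new ArgumentNullException(nameof(page));
        if (frame == null) throw new ArgumentNullException(nameof(frame));

        // Uri.Port already returns the scheme's default port (80, 443, ...)
        // when the URL does not specify one explicitly.
        return string.Equals(page.Scheme, frame.Scheme, StringComparison.OrdinalIgnoreCase)
            && string.Equals(page.Host, frame.Host, StringComparison.OrdinalIgnoreCase)
            && page.Port == frame.Port;
    }
}
```

The frame URL could be taken from the iframe element's `src` attribute (for an `HtmlElement`, `frame.GetAttribute("src")`) and resolved against the page URL with `new Uri(pageUri, src)`, where `pageUri` is the WebBrowser control's `Url`.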

Note that I did manage to get the iframe's HTML using JavaScript:

```csharp
string jCode = "var iframe = document.getElementById('frame-id'); "
    + "var innerDoc = iframe.contentDocument || iframe.contentWindow.document; "
    + "innerDoc.documentElement.innerHTML";
Object html = webBrowser.WebBrowser.Document.InvokeScript("eval", new object[] { jCode });
```
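The script works because JavaScript's `eval` returns the value of the last expression it evaluates, here the frame document's `innerHTML`. If the script is reused for several iframes, the id has to be embedded in a single-quoted JavaScript string literal. A minimal sketch of a builder that escapes the id first follows; `FrameScript`, `EscapeJsString` and `BuildFrameHtmlScript` are hypothetical names, and only the characters that can break a single-quoted literal are escaped.

```csharp
using System;
using System.Text;

public static class FrameScript
{
    // Escapes a value for use inside a single-quoted JavaScript string literal.
    public static string EscapeJsString(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            switch (c)
            {
                case '\\': sb.Append("\\\\"); break;
                case '\'': sb.Append("\\'"); break;
                case '\r': sb.Append("\\r"); break;
                case '\n': sb.Append("\\n"); break;
                case '\u2028': sb.Append("\\u2028"); break; // line separator
                case '\u2029': sb.Append("\\u2029"); break; // paragraph separator
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }

    // Builds the script from the question for an arbitrary iframe id.
    // eval() yields the value of the last expression: the frame's innerHTML.
    public static string BuildFrameHtmlScript(string frameId)
    {
        return "var iframe = document.getElementById('" + EscapeJsString(frameId) + "'); "
             + "var innerDoc = iframe.contentDocument || iframe.contentWindow.document; "
             + "innerDoc.documentElement.innerHTML";
    }
}
```

It would be passed the same way as in the question, e.g. `InvokeScript("eval", new object[] { FrameScript.BuildFrameHtmlScript(id) }) as string`.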

But this is not what I need: I could use this approach, but it would take a lot of work. Is there a way to make the WebBrowser control return the full HTML along with the contents of its iframes?

    2 answers

    The DocumentText property returns exactly that: the code of the document. The content of the iframes is not included in it, because an iframe is more than just a mechanism for inserting one piece of HTML into another (*). The only way to get the document together with the contents of its iframes is to go through all the iframes, read their contents (for example, through the MSHTML.IHTMLIFrameElement3 interface) and substitute them into (a copy of) the original document. Naturally, you cannot get the content if the iframe points to a different domain.

    Something like this:

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Text;
using System.Windows.Forms;

namespace WebBrowserTest
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        public String DomDocumentText
        {
            get
            {
                var document = webBrowser1.Document;
                string returnstr = "";
                MSHTML.IHTMLDocument3 doc3 = null;
                MSHTML.HTMLDocument new_doc = null;
                MSHTML.IHTMLDocument2 doc2 = null;
                MSHTML.IHTMLElementCollection elems = null;
                MSHTML.IHTMLDocument3 new_doc3 = null;
                MSHTML.IHTMLElementCollection elems_new = null;
                MSHTML.IHTMLDocument3 child_doc = null;
                MSHTML.IHTMLElement content = null;
                MSHTML.IHTMLElementCollection elem_col = null;
                var copies = new List<MSHTML.IHTMLElement>();
                try
                {
                    doc3 = (MSHTML.IHTMLDocument3)document.DomDocument; // original document
                    new_doc = new MSHTML.HTMLDocument();                  // copy of the document
                    doc2 = new_doc as MSHTML.IHTMLDocument2;
                    doc2.write(webBrowser1.DocumentText);                 // create the copy from the page source

                    // get all iframes in the document and in its copy...
                    elems = doc3.getElementsByTagName("iframe");
                    new_doc3 = new_doc as MSHTML.IHTMLDocument3;
                    elems_new = new_doc3.getElementsByTagName("iframe");

                    // DOM collections are normally live, so replacing an iframe could shift
                    // the indices of elems_new; snapshot it to keep pairs aligned with elems.
                    for (int k = 0; k < elems_new.length; k++)
                        copies.Add((MSHTML.IHTMLElement)elems_new.item(k));

                    int i = 0;
                    foreach (MSHTML.IHTMLIFrameElement3 elem in elems) // for each iframe...
                    {
                        try
                        {
                            child_doc = elem.contentDocument as MSHTML.IHTMLDocument3;
                            elem_col = child_doc.getElementsByTagName("body");
                            if (elem_col == null || elem_col.length == 0 || i >= copies.Count)
                                continue;                    // the finally block still advances i
                            content = (MSHTML.IHTMLElement)elem_col.item(0);
                            string str = content.innerHTML;  // get the iframe's content
                            copies[i].outerHTML = str;       // replace the iframe in the copy with its content
                        }
                        catch (Exception ex)
                        {
                            // iframes whose URL is in another domain throw
                            // HRESULT: 0x80070005 (E_ACCESSDENIED)
                            System.Diagnostics.Debug.WriteLine("Can't process iframe " + i.ToString());
                            System.Diagnostics.Debug.WriteLine(ex.Message);
                        }
                        finally
                        {
                            // advance on every path, so a failed iframe does not shift the pairing
                            i++;
                            // release the resources used in this iteration...
                            if (child_doc != null) { Marshal.ReleaseComObject(child_doc); child_doc = null; }
                            if (elem_col != null) { Marshal.ReleaseComObject(elem_col); elem_col = null; }
                            if (content != null) { Marshal.ReleaseComObject(content); content = null; }
                        }
                    } // end foreach

                    returnstr = new_doc.documentElement.innerHTML;
                    return returnstr;
                }
                finally
                {
                    // final cleanup of resources...
                    // doc3 is owned by the WebBrowser control, and doc2/new_doc3 are
                    // casts of the same RCW as new_doc, so they are not released separately.
                    foreach (var c in copies) Marshal.ReleaseComObject(c);
                    if (new_doc != null) Marshal.ReleaseComObject(new_doc);
                    if (elems != null) Marshal.ReleaseComObject(elems);
                    if (elems_new != null) Marshal.ReleaseComObject(elems_new);
                }
            }
        }

        private void button1_Click(object sender, EventArgs e)
        {
            MessageBox.Show(DomDocumentText);
        }
    }
}
```

    To use this, add a reference to the IE COM object library (in Visual Studio: Add Reference -> COM -> Microsoft HTML Object Library, or browse manually to the file mshtml.tlb).
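The essential part of the loop above is that the live iframes and the iframes in the copy are paired by position, and a frame that cannot be read (cross-domain) is skipped without breaking that pairing. A COM-free sketch of just this pairing logic follows, assuming each live iframe is represented by a hypothetical `Func<string>` that returns its body HTML (or throws `UnauthorizedAccessException`, which is how .NET surfaces E_ACCESSDENIED) and each copy iframe by a hypothetical `Action<string>` that replaces it. `FrameInliner.Inline` is an illustrative name, not part of MSHTML.

```csharp
using System;
using System.Collections.Generic;

public static class FrameInliner
{
    // readFrameBody[i]: reads the body HTML of the i-th live iframe (stand-in
    //   for elem.contentDocument -> body.innerHTML); may throw
    //   UnauthorizedAccessException for a cross-domain frame.
    // replaceCopy[i]: replaces the i-th iframe in the copied document
    //   (stand-in for setting outerHTML on the copy's element).
    // Returns the indices of frames that were left unchanged.
    public static List<int> Inline(IList<Func<string>> readFrameBody,
                                   IList<Action<string>> replaceCopy)
    {
        var skipped = new List<int>();
        int count = Math.Min(readFrameBody.Count, replaceCopy.Count);
        for (int i = 0; i < count; i++) // i advances on every path, so pairs never drift
        {
            string body;
            try
            {
                body = readFrameBody[i]();
            }
            catch (UnauthorizedAccessException)
            {
                skipped.Add(i); // cross-domain frame: keep the copy's iframe as it is
                continue;
            }
            if (body == null)
            {
                skipped.Add(i); // no body element: nothing to inline
                continue;
            }
            replaceCopy[i](body);
        }
        return skipped;
    }
}
```

In the answer's code, `readFrameBody[i]` would wrap the `contentDocument`/`body` lookup and `replaceCopy[i]` would assign `outerHTML` on the i-th snapshot element.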

    Notes

    (*) For an iframe, the browser creates a separate window inside the main window, which in general may display a document of another type rather than HTML. Therefore, in the DOM the innerHTML property of the iframe element does not correspond to the frame's content but to the fallback text for browsers without iframe support, which can be placed between the iframe tags (this is rarely done nowadays).

    • Thanks for the detailed answer, but I don't understand where IHTMLIFrameElement3 comes from. Which library do you use for it? I am using C:\Program Files (x86)\Microsoft Visual Studio 14.0\Visual Studio Tools for Office\PIA\Common\Microsoft.mshtml.dll, version 7.0.3300.0. - Sergej Loos
    • @SergejLoos "Add Reference" -> "COM" -> "Microsoft HTML Object Library" (if it is not in the list, look for the file mshtml.tlb on disk). This is the IE COM object library; the interop DLL will be generated from it automatically and placed in the project folder. - MSDN.WhiteKnight

    To solve my problem, I used the JavaScript approach shown in the question:

```csharp
// requires: using System.Text.RegularExpressions;
documentHtml = webBrowser.DomDocumentText;

// extract frames content (by frame id)
var framesAsHtmlElements = webBrowser.Document.GetElementsByTagName("iframe");
foreach (HtmlElement frame in framesAsHtmlElements)
{
    String id = frame.Id;
    if (String.IsNullOrEmpty(id))
        continue;

    string jCode = "var iframe = document.getElementById('" + id + "'); "
        + "var innerDoc = iframe.contentDocument || iframe.contentWindow.document; "
        + "innerDoc.documentElement.innerHTML";
    String frameHtml = webBrowser.WebBrowser.Document.InvokeScript("eval", new object[] { jCode }) as String;
    if (String.IsNullOrEmpty(frameHtml))
        continue;

    // Match the <iframe> with this id, either self-closing or up to its </iframe>;
    // the id is escaped so regex metacharacters in it are matched literally.
    Regex regEx = new Regex(
        @"<iframe\b(?<attrs>[^>]*?\bid\s*=\s*[""']" + Regex.Escape(id) + @"[""'][^>]*?)\s*(?:/>|>.*?</iframe>)",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);
    Match match = regEx.Match(documentHtml);
    if (match.Success)
    {
        // keep the tag's attributes and put the frame's HTML between <iframe> and </iframe>
        documentHtml = documentHtml.Substring(0, match.Index)          // before <iframe>
            + "<iframe" + match.Groups["attrs"].Value + ">" + frameHtml + "</iframe>"
            + documentHtml.Substring(match.Index + match.Length);       // after the old tag
    }
}
```
    • Regex-based HTML solutions can be unreliable and hard to maintain; see stackoverflow.com/q/420354/240512. It is better to use a parser: find all the iframe elements in the DOM and replace their OuterHTML, which is quite simple to do without complicated regular expressions. - MSDN.WhiteKnight
    • I understand that, which is why I tried the DOM first, but it didn't work out, and I urgently needed to close the gap. I will try manually adding the COM reference and generating the interop classes a little later. Thanks for the tip. - Sergej Loos