Simplify string content

Question

Suppose we have:

string a = <p style="text-align:center;"><strong>Primary Link: <a href="http://www.mediafire.com/download/90wqj6d0n86h7z1/YandereSimMay7th.rar">http://www.mediafire.com/download/90wqj6d0n86h7z1/YandereSimMay7th.rar</a></strong></p>

and we need to trim it on the left and on the right so that we end up with:

 string a = http://www.mediafire.com/download/90wqj6d0n86h7z1/YandereSimMay7th.rar

I need to delete the characters before "http" and after ".rar", and also remove the duplicate URL. How can this be done? Please answer with an example, since I am new to C++.

If you are new to C++, use a language you are not new to.
And for parsing HTML, people usually use a ready-made parser rather than reinventing the wheel.
If you get an exhaustive answer, mark it as accepted (the tick next to the chosen answer).

Harry · Accepted Answer · 2016-05-16T04:04:30

It depends on what exactly you need: whether you always want the first URL in the string (one approach), or the first URL inside an <a> tag (a slightly different one), and so on. You could, for example, use regular expressions.

Since you are a beginner, let's simplify the task: find href="URL" and extract the URL from it.

First, find the position of href=" in the string:

 size_t pos = a.find("href=\"");

If it is found, cut off everything on the left, also skipping the 6 characters of href=":

 if (pos != string::npos) a = a.substr(pos + 6);

Then find the closing quote and cut off everything on the right:

 pos = a.find('"');
 if (pos != string::npos) a = a.substr(0, pos);

That's it.

At least for this particular case :) You understand that for an <img> tag, for example, the search has to be a little different, and you have already been advised to use a parser. Also, from experience: HTML is not XML, and even its standard allows some liberties; if you regularly run into malformed XML, what can we say about HTML... I'm exaggerating, but 80% of the code will be handling HTML errors and only 20% doing the actual parsing :)
