There is a line:

xmlhttp.open("GET","show_city1.php?state_code="+str,true); 

I need to extract everything between ("GET"," and "+, that is, the piece show_city1.php?state_code= with no extra characters at the beginning or end. Roughly what pattern should I use for this? Please advise, I need a pattern for C# ...

  • So you are trying to parse JS code? o_O That is way over the top. What is your real task? - VladD
  • The real task is to extract show_city1.php?state_code= with a regex; I think I stated the essence clearly above. The code is not JS as such, it is present as plain text in the content of the web page, so what I need is the correct pattern ... - GeneratorSveta
  • Yeah, and tomorrow the developers will turn it into new Url("show_city" + city_num + ".aspx").addQuery("state_code", str). Your approach to the problem ("pull the text out of the HTML page") is fundamentally wrong. And yes, this cannot be the real task. It is part of your approach to solving a bigger problem. - VladD
  • @VladD I see nothing unusual in the task. It is typical for data collectors working with sources that have no proper open export. It comes up quite often (in some fields). And yes, this solution is doomed to constant updating, which does not make it any less necessary (if the update frequency is reasonable and/or the game is worth the candle). - Petr Abdulin
  • @PetrAbdulin: let's call things by their proper names: a data collector from a site whose creators do not want this data to be collected at all. (Otherwise they would open the API.) - VladD

3 answers

Here's your regex (the "/" characters at the beginning and end are not part of the regex):

 /xmlhttp\.open\("GET","(.+?)"/ 

C# code:

  var content = "xmlhttp.open(\"GET\",\"show_city1.php?state_code=\"+str,true);";
  var matches = Regex.Matches(content, "xmlhttp\\.open\\(\"GET\",\"(.+?)\"");
  var result = matches[0].Groups[1];
  Console.WriteLine(result);

Output:

show_city1.php?state_code=

  • And how do I escape it? var matches = Regex.Matches(content, @"xmlhttp\.open("GET","(.+?)""); written like this, I get errors ... - GeneratorSveta
  • Is there no way to get around the error, say through an additional variable ... - GeneratorSveta
  • @GeneratorSveta I added the code to the answer. - zenden2k
  • Yes, it almost parses everything, and so does the Regex.Replace code below, but there is one catch: the content contains several similar lines, for example xmlhttp.open("GET","show.php?country_code=" and xmlhttp.open("GET","show_city1.php?state_code=", and only the first one gets parsed: show.php?country_code=. How do I parse all the similar pieces from the content? - GeneratorSveta
  • Here is the full code: var matches = Regex.Matches(content, "xmlhttp\\.open\\(\"GET\",\"(.+?)\""); var result = matches[0].Groups[1]; lock (this.lockValid) this.valid++; StreamWriter writer = new StreamWriter("Out.txt", true); writer.WriteLine("http://" + str3 + "/" + result); writer.Close(); but, as I wrote above, only the first match gets saved, and I need all the matching pieces from the content to be saved - GeneratorSveta
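On the last question in the comments: Regex.Matches already returns every match, and the code above only reads matches[0]. A minimal sketch of collecting all of them follows, assuming the same pattern as in the answer. The class name UrlExtractor, the method ExtractAll, and the sample content are hypothetical and used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlExtractor
{
    // Same pattern as in the answer: capture the quoted URL after xmlhttp.open("GET",
    private static readonly Regex OpenCall =
        new Regex("xmlhttp\\.open\\(\"GET\",\"(.+?)\"");

    // Returns the first capture group of every match, not only of matches[0].
    public static List<string> ExtractAll(string content)
    {
        var urls = new List<string>();
        foreach (Match m in OpenCall.Matches(content))
        {
            urls.Add(m.Groups[1].Value);
        }
        return urls;
    }

    public static void Main()
    {
        // Hypothetical page content with two similar calls.
        var content =
            "xmlhttp.open(\"GET\",\"show.php?country_code=\"+str,true);\n" +
            "xmlhttp.open(\"GET\",\"show_city1.php?state_code=\"+str,true);";

        foreach (var url in ExtractAll(content))
        {
            Console.WriteLine(url);
        }
    }
}
```

Each returned string could then be written to the output file inside the loop instead of writing a single result.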

Isn't it easier to use Regex.Replace? Capture what you need, and delete everything you don't need via the replacement:

 var text = "xmlhttp.open(\"GET\",\"show_city1.php?state_code=\"+str,true);";
 var result2 = Regex.Replace(text, @"(?s).*""GET"",""([^""]+)"".*", "$1");


The inline modifier (?s) changes the behavior of the dot, which now also matches line breaks.

  • I honestly confess I haven't used Regex.Replace much before. I tried your version, but it keeps the entire content of the page. An interesting option overall, but complicated for my page ... - GeneratorSveta
  • I added (?s), now it should work even with multiline text. BUT! If you have several such substrings, this option will not work. - Wiktor Stribiżew
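The effect of (?s), and the limitation with several substrings, can be checked with a small sketch like the one below. The class name SinglelineDemo, the helper Extract, and the sample texts are hypothetical; the two patterns are the answer's pattern without and with the modifier.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // The answer's pattern without and with the inline (?s) modifier.
    public const string WithoutS = @".*""GET"",""([^""]+)"".*";
    public const string WithS = @"(?s).*""GET"",""([^""]+)"".*";

    // Replaces every match of the pattern with its first capture group.
    public static string Extract(string text, string pattern)
    {
        return Regex.Replace(text, pattern, "$1");
    }

    public static void Main()
    {
        // Hypothetical multiline page fragment.
        var text = "<script>\n" +
                   "xmlhttp.open(\"GET\",\"show_city1.php?state_code=\"+str,true);\n" +
                   "</script>";

        // Without (?s) the dot stops at line breaks, so the other lines survive.
        Console.WriteLine(Extract(text, WithoutS));

        // With (?s) the whole text collapses to the captured piece.
        Console.WriteLine(Extract(text, WithS));
    }
}
```

With two xmlhttp.open calls in the text, the greedy .* at the start runs up to the last "GET"," occurrence, so only the last URL is kept. That is why this approach does not suit content with several such substrings.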

Well, for example:

 \("GET","(.+)"\+ 

The first group will contain the string you need.

  • Same thing here, I can't get the escaping right: var matches = Regex.Matches(content, @"(\"GET\",\"(.+)\"\+"); Errors ... - GeneratorSveta
  • I overcomplicated it a bit; I've corrected the regex. Escape as usual, with a backslash. - Petr Abdulin
  • Yes, but it almost works, except that extra characters get captured too: var matches = Regex.Matches(content, @"(""GET"",""(.+)""\+"); that is, it parses this: ("GET","show_city1.php?state_code="+ whereas I need only this part: show_city1.php?state_code= - GeneratorSveta
  • Right, that is the full match, and you need the first group. - Petr Abdulin
  • Yes, I get the groups now, thanks ... - GeneratorSveta
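Most of the errors in the comments above come from mixing the escaping rules of the two C# string literal forms. A minimal sketch of the same pattern written both ways follows. The class name EscapingDemo and the helper FirstGroup are hypothetical, and the quantifier is changed from the answer's greedy .+ to the lazy .+? as an assumption, so that two calls on one line are not merged into a single match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapingDemo
{
    // Regular string literal: backslashes are doubled, quotes are written as \".
    public const string Regular = "\\(\"GET\",\"(.+?)\"\\+";

    // Verbatim string literal: backslashes stay single, quotes are doubled.
    public const string Verbatim = @"\(""GET"",""(.+?)""\+";

    // Returns the first capture group of the first match, or "" if nothing matched.
    public static string FirstGroup(string content, string pattern)
    {
        Match m = Regex.Match(content, pattern);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        var content = "xmlhttp.open(\"GET\",\"show_city1.php?state_code=\"+str,true);";

        // Both literals produce the same pattern text.
        Console.WriteLine(Regular == Verbatim); // prints "True"

        // Groups[0] would be the full match; Groups[1] is the piece between the quotes.
        Console.WriteLine(FirstGroup(content, Verbatim)); // prints "show_city1.php?state_code="
    }
}
```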