Good day!

I've run into a problem I can't solve: there is a .csv file with the following lines:

 http://fallertoo.com/
 http://xn----7sbjsadhgf0dd3k7b.we--p1qw/vse_seeei_podryad/mqwetiki
 http://www.nice.com/ru/
 http://exway.to/%D0%qwewqeD0%BA%D1%83%D0%BD%D0%B8%D0%BD
 http://name@while.com
 http://news.t-stilex.into/v-vashem-aktive-net-inostrannyx-yazykov-ne-kruchintes
 http://fib.mexeemat.com:80/books/90359

I need to extract only the domain names using a regular expression:

 fallertoo.com
 xn----7sbjsadhgf0dd3k7b.we--p1qw
 nice.com
 exway.to
 while.com
 t-stilex.into
 fib.mexeemat.com

After several attempts with different expressions, the closest I got was the pattern (//\w+.\w+.\w+), which returns the following:

 //fallertoo.com
 //www.nice.com
 //exway.to
 //name@while.com
 //news.t-stilex
 //fib.mexeemat.com
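For reference, the attempt above can be reproduced as follows (an illustrative sketch; the list `urls` simply repeats the sample lines from the file). It prints the same six matches shown above and makes the two weak points visible: the unescaped `.` matches any character, which is how it slips over the `@` in name@while.com and the `-` in news.t-stilex, while `\w` does not match `-`, so news.t-stilex.into is cut short and the xn----... host produces no match at all.

```python
import re

# The asker's pattern: "." is unescaped, so it matches ANY character,
# and \w+ stops at "-" because a hyphen is not a word character.
attempt = r"//\w+.\w+.\w+"

urls = [
    "http://fallertoo.com/",
    "http://xn----7sbjsadhgf0dd3k7b.we--p1qw/vse_seeei_podryad/mqwetiki",
    "http://www.nice.com/ru/",
    "http://exway.to/%D0%qwewqeD0%BA%D1%83%D0%BD%D0%B8%D0%BD",
    "http://name@while.com",
    "http://news.t-stilex.into/v-vashem-aktive-net-inostrannyx-yazykov-ne-kruchintes",
    "http://fib.mexeemat.com:80/books/90359",
]

# Keep the whole match (including the leading "//") for every URL that matches
found = []
for u in urls:
    m = re.search(attempt, u)
    if m:
        found.append(m.group(0))

print(found)
```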

Please help, thank you in advance!

  • re.findall(r'//([^/]*\.[^/:]+)', s) ? - Wiktor Stribiżew
  • Thanks! There were some rough edges, but in the end it worked! - savinmxl

1 answer

I suggest using

 re.findall(r'//([^/]*\.[^/:]+)', s) 

See the regular expression demo

Details

  • // - the literal substring //
  • ([^/]*\.[^/:]+) - capturing group 1:
    • [^/]* - 0 or more characters other than /
    • \. - a literal dot
    • [^/:]+ - 1 or more characters other than / and :
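The pattern works best on one URL at a time. If it is run with re.findall over the whole file contents instead, `[^/]*` can also consume the newline (or space) separating two URLs whenever a URL has no trailing path, as with http://name@while.com. A minimal sketch of one way around that, assuming the file is read into a single string: add `\s` to both negated character classes so a match cannot cross the separator. The `\s` variant is an editorial modification, not part of the original answer, and the file name in the comment is hypothetical.

```python
import re

# The answer's pattern with "\s" excluded from both character classes,
# so a match stays inside a single whitespace-separated URL.
rx = re.compile(r"//([^/\s]*\.[^/:\s]+)")

# In practice this would be the file contents, e.g. read from a hypothetical
# "urls.csv"; the sample lines are inlined so the sketch is self-contained.
text = """http://fallertoo.com/
http://xn----7sbjsadhgf0dd3k7b.we--p1qw/vse_seeei_podryad/mqwetiki
http://www.nice.com/ru/
http://exway.to/%D0%qwewqeD0%BA%D1%83%D0%BD%D0%B8%D0%BD
http://name@while.com
http://news.t-stilex.into/v-vashem-aktive-net-inostrannyx-yazykov-ne-kruchintes
http://fib.mexeemat.com:80/books/90359
"""

# With exactly one capturing group, findall returns only the group-1 text
domains = rx.findall(text)
print(domains)

# For comparison: the original pattern applied to the whole text lets the
# newline and the start of the next URL leak into the name@while.com match.
naive = re.findall(r"//([^/]*\.[^/:]+)", text)
print(repr(naive[4]))
```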

Python demo:

 import re
 # "//" followed by group 1: the host part, ending before the first "/" or ":"
 rx = r"//([^/]*\.[^/:]+)"
 strs = [
     "http://fallertoo.com/",
     "http://xn----7sbjsadhgf0dd3k7b.we--p1qw/vse_seeei_podryad/mqwetiki",
     "http://www.nice.com/ru/",
     "http://exway.to/%D0%qwewqeD0%BA%D1%83%D0%BD%D0%B8%D0%BD",
     "http://name@while.com",
     "http://news.t-stilex.into/v-vashem-aktive-net-inostrannyx-yazykov-ne-kruchintes",
     "http://fib.mexeemat.com:80/books/90359",
 ]
 for s in strs:
     m = re.search(rx, s)  # search each URL separately
     if m:
         print(m.group(1))  # print only the captured host

Result:

 fallertoo.com
 xn----7sbjsadhgf0dd3k7b.we--p1qw
 www.nice.com
 exway.to
 name@while.com
 news.t-stilex.into
 fib.mexeemat.com
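The regex keeps everything between // and the first path or port separator, so user info such as name@ stays in the result. If a regex is not strictly required, Python's standard urllib.parse.urlsplit already splits a URL into its parts, and its hostname attribute drops both the user info and the port (and lowercases the host). This is a sketch of that alternative, not part of the original answer; like the regex, it still keeps subdomains such as www. and news.

```python
from urllib.parse import urlsplit

urls = [
    "http://fallertoo.com/",
    "http://xn----7sbjsadhgf0dd3k7b.we--p1qw/vse_seeei_podryad/mqwetiki",
    "http://www.nice.com/ru/",
    "http://exway.to/%D0%qwewqeD0%BA%D1%83%D0%BD%D0%B8%D0%BD",
    "http://name@while.com",
    "http://news.t-stilex.into/v-vashem-aktive-net-inostrannyx-yazykov-ne-kruchintes",
    "http://fib.mexeemat.com:80/books/90359",
]

# .hostname strips "user@" and ":port" from the network location
hosts = [urlsplit(u).hostname for u in urls]
print(hosts)
```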

  • Thanks for the detailed answer! - savinmxl