Extracting domain names from URLs with a Python regular expression

Question

Good day!

I have a problem I can't solve: there is a .csv file with the following lines:

    http://fallertoo.com/
    http://xn----7sbjsadhgf0dd3k7b.we--p1qw/vse_seeei_podryad/mqwetiki
    http://www.nice.com/ru/
    http://exway.to/%D0%qwewqeD0%BA%D1%83%D0%BD%D0%B8%D0%BD
    http://name@while.com
    http://news.t-stilex.into/v-vashem-aktive-net-inostrannyx-yazykov-ne-kruchintes
    http://fib.mexeemat.com:80/books/90359

I need to return only the domain names using a regular expression:

    fallertoo.com
    xn----7sbjsadhgf0dd3k7b.we--p1qw
    nice.com
    exway.to
    while.com
    t-stilex.into
    fib.mexeemat.com

I tried several different expressions, but the best I could come up with is //\w+.\w+.\w+ , which returns the following result:

    //fallertoo.com
    //www.nice.com
    //exway.to
    //name@while.com
    //news.t-stilex
    //fib.mexeemat.com

Please help, thank you in advance!

Thanks! There were a few hiccups, but in the end it worked! - savinmxl

Wiktor Stribiżew · Accepted Answer · 2018-09-16T17:45:10

I suggest using

 re.findall(r'//([^/]*\.[^/:]+)', s)

See the regular expression demo
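
One caveat if the expression is applied with re.findall to the whole file contents rather than to one line at a time: [^/] also matches a newline, so a URL with no trailing slash can swallow the start of the next line. A minimal sketch, assuming the file contents are already in a string (the names text and hosts are illustrative, not from the question) and excluding whitespace from both character classes:

```python
import re

# Assumed sample: three of the lines from the question joined into one string.
text = (
    "http://fallertoo.com/\n"
    "http://name@while.com\n"
    "http://fib.mexeemat.com:80/books/90359\n"
)

# \s is added to both negated classes so a match cannot run across a line break.
hosts = re.findall(r"//([^/\s]*\.[^/:\s]+)", text)
print(hosts)
```

When each line is matched separately, as in the demo code of this answer, the original expression is enough.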

Details

// - the literal substring //
([^/]*\.[^/:]+) - Capturing group 1:
- [^/]* - 0 or more characters other than /
- \. - a literal dot
- [^/:]+ - 1 or more characters other than / and :
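
The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE. This is only a sketch of that style; the names HOST_RX, sample and host are illustrative and the behaviour should be the same as the one-line pattern:

```python
import re

# Same pattern as in the answer, spread out with re.VERBOSE so each part has a comment.
HOST_RX = re.compile(
    r"""
    //            # the literal substring //
    (             # start of capturing group 1
      [^/]*       # 0 or more characters other than /
      \.          # a literal dot
      [^/:]+      # 1 or more characters other than / and :
    )             # end of capturing group 1
    """,
    re.VERBOSE,
)

sample = "http://fib.mexeemat.com:80/books/90359"
match = HOST_RX.search(sample)
host = match.group(1) if match else None
print(host)
```

The greedy [^/]* first runs up to the slash, then backtracks to the last dot before it, which is why the port is left out of the captured group.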

Python demo:

    import re

    rx = r"//([^/]*\.[^/:]+)"
    strs = [
        "http://fallertoo.com/",
        "http://xn----7sbjsadhgf0dd3k7b.we--p1qw/vse_seeei_podryad/mqwetiki",
        "http://www.nice.com/ru/",
        "http://exway.to/%D0%qwewqeD0%BA%D1%83%D0%BD%D0%B8%D0%BD",
        "http://name@while.com",
        "http://news.t-stilex.into/v-vashem-aktive-net-inostrannyx-yazykov-ne-kruchintes",
        "http://fib.mexeemat.com:80/books/90359",
    ]

    # Print the captured host for every URL that matches.
    for s in strs:
        m = re.search(rx, s)
        if m:
            print(m.group(1))

Result:

    fallertoo.com
    xn----7sbjsadhgf0dd3k7b.we--p1qw
    www.nice.com
    exway.to
    name@while.com
    news.t-stilex.into
    fib.mexeemat.com
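
The result above still keeps the name@ user part and the www. prefix, which the question wanted removed. One possible adjustment, not part of the original answer, is to skip both with optional non-capturing groups placed before the captured host. The sketch below assumes that approach; the names rx, urls and hosts are illustrative.

```python
import re

# Hypothetical variant of the answer's pattern:
#   (?:[^/@]*@)?  optionally skips a "user@" part
#   (?:www\.)?    optionally skips a leading "www."
rx = re.compile(r"//(?:[^/@]*@)?(?:www\.)?([^/:]*\.[^/:]+)")

urls = [
    "http://fallertoo.com/",
    "http://xn----7sbjsadhgf0dd3k7b.we--p1qw/vse_seeei_podryad/mqwetiki",
    "http://www.nice.com/ru/",
    "http://exway.to/%D0%qwewqeD0%BA%D1%83%D0%BD%D0%B8%D0%BD",
    "http://name@while.com",
    "http://news.t-stilex.into/v-vashem-aktive-net-inostrannyx-yazykov-ne-kruchintes",
    "http://fib.mexeemat.com:80/books/90359",
]

# Collect the captured host of every URL that matches.
hosts = [m.group(1) for m in map(rx.search, urls) if m]
for host in hosts:
    print(host)
```

This does not turn news.t-stilex.into into t-stilex.into. Deciding where the registered domain starts generally needs a list of public suffixes, which a regular expression alone cannot provide.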
