How to extract image URLs from a file?

Question

There is a MySQL dump that contains links to images, for example <img style=\"float: left;\" src=\"http://www.vademec.ru/upload/iblock/1d7/1d77389882672a4d7952dc80ae229d3f.jpg\", i.e. with escaped double quotes, and I need to extract the image URLs from it.

 grep -e "/<img(?:\\s[^<>]*?)?\\bsrc\\s*=\\s*(?|"([^"]*)"|\'([^\']*)\'|([^<>\'"\\s]*))[^<>]*>/" file.sql

But bash reports: syntax error near unexpected token `('

In addition, in some cases there are extra characters after the extension, e.g. name.png&677dnfnwf.

This crocodile is not quite green, what is the name of the dog?
The grep error is caused by the double quotes inside the string, which you do not escape.
But how does this relate to "MySQL commands"? (MySQL has no commands at all, only queries.)
grep -e "/<img(?:\\s[^<>]*?)?\\bsrc\\s*=\\s*(?|/"([^/"]*)/"|\'([^\']*)\'|([^<>\'/"\\s]*))[^<>]*>/" This construction does not help.
grep -e "/<img(?:\\s[^<>]*?)?\\bsrc\\s*=\\s*(?|\042([^\042]*)\042|\'([^\']*)\'|([^<>\'\042\\s]*))[^<>]*>/" This does not produce anything.
Are you sure that an ampersand and other text can follow the image name?
By the standards, an ampersand is not a separator, so if it is present, it is part of the file name.
If there were a question mark after the name, that would be another matter: what follows would be the query parameters. You also stated that these characters may follow the name, but did not say what to do when they do.
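
The quoting problem raised in these comments can be sidestepped by putting the whole pattern in single quotes and writing the quote characters as hex escapes. The following is only a minimal sketch, assuming GNU grep built with -P (PCRE) support; the sample line is modelled on the one in the question, and the pattern is an illustration, not the asker's original expression.

```shell
# Sample line modelled on the question (the alt attribute is made up).
line='<img style=\"float: left;\" src=\"http://www.vademec.ru/upload/iblock/1d7/1d77389882672a4d7952dc80ae229d3f.jpg\" alt=\"\">'

# Single quotes stop bash from interpreting anything inside the pattern.
# In PCRE, \x22 is a double quote and \x27 is a single quote, so the
# pattern itself contains no literal quote characters.
# \\? allows for the backslash that escapes the quote in the dump.
printf '%s\n' "$line" |
  grep -oP '<img[^>]*\bsrc\s*=\s*\\?[\x22\x27]\K[^\x22\x27\\]+'
```

Note also that plain grep -e uses basic regular expressions, where constructs such as (?: and (?| are not available, and that the leading and trailing / are treated as literal characters rather than delimiters.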

Accepted Answer · 2017-03-03T08:13:58

It’s hard to tell from the question what exactly you want to achieve, but if you mean that grep should output only the image URLs, it would look like this:

 grep -ioP "<img[^>]+src=(\\\\?['\"])?\K.*?(?=([?].*)?(?(1)\1|[ >]))"

The grep options: i makes matching case-insensitive, o outputs only the match itself rather than the whole line containing it, and P selects the Perl dialect of regular expressions. The expression itself reads as follows:

 <img            # <img
 [^>]+           # any characters except the one closing the tag (one or more)
 src=            # src=
 (\\\\?['\"])    # group 1: an optional backslash (it has to be quadrupled because the expression is in double quotes)
                 #          followed by a single or double quote
 ?               # group 1 may be absent
 \K              # the reported part of the match starts at this point
 .*?             # any characters (non-greedy)
 (?=             # followed by ... (a lookahead, nothing is consumed)
   ([?].*)?      # optionally a question mark and the part to be discarded
   (?(1)         # condition: if group 1 was present
     \1          # then the same characters as in group 1, i.e. the same quote
     |[ >])      # otherwise (there were no quotes) a space or the end of the tag
 )
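
To see how the expression behaves, it can be run against a few sample lines. This is only a sketch, assuming GNU grep with -P support; the extract_urls wrapper and the example.com lines are made up for illustration, while the pattern is the one from the answer.

```shell
# Hypothetical wrapper around the command from the answer;
# it reads from the given files, or from stdin if none are given.
extract_urls() {
  grep -ioP "<img[^>]+src=(\\\\?['\"])?\K.*?(?=([?].*)?(?(1)\1|[ >]))" "$@"
}

# Made-up input: escaped quotes, plain quotes with a query string,
# and the "extra characters after the extension" case from the question.
extract_urls <<'EOF'
<img style=\"float: left;\" src=\"http://example.com/a.jpg\" />
<img src="http://example.com/b.png?v=2">
<img src=\"http://example.com/name.png&677dnfnwf\">
EOF
# Output:
# http://example.com/a.jpg
# http://example.com/b.png
# http://example.com/name.png&677dnfnwf
```

The query string after ? is dropped, while the text after & is kept, because the expression only treats ? as the end of the file name.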

It cuts off everything before the URL, but leaves what comes after the extension, although that needs to be removed.
@Magi I don’t see a word in your question about needing to remove anything after the extension, nor an adequate example of input and output data covering all possible situations (and don’t say that name.png&677dnfnwf is an adequate example, because by the standards everything after the ampersand in that string is part of the file name!).
@Magi And how is one to determine where the extension ends if the URL does not follow the standards? No web server would be able to resolve such a URL either, because it does not know where the extension ends. By the standards, the end of the file name is determined only by the presence of a ? or # character, but your example contains neither, so I maintain that the characters you mention belong to the file name and cannot be separated from it.
@Magi If you need to strip something standard from the URL, then give correct examples of the URLs that actually occur.
If something is non-standard, give those examples too and specify exactly how the string should be interpreted.
That said, you can make such refinements to the regex yourself to suit your needs.
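
One possible refinement of that kind is a second pass that cuts each extracted URL right after the first known image extension. This is only a sketch under loud assumptions: the trim_after_extension name and the extension list (jpg, jpeg, png, gif) are made up here, GNU grep with -P is assumed, and the thread never settles how the non-standard tail should really be interpreted.

```shell
# Hypothetical helper: keep each input line only up to the first
# occurrence of a known image extension (the list is an assumption).
# Lines that contain none of these extensions are dropped entirely.
trim_after_extension() {
  grep -oiP '^.*?\.(?:jpe?g|png|gif)'
}

printf '%s\n' \
  'http://example.com/name.png&677dnfnwf' \
  'http://example.com/a.JPG' |
  trim_after_extension
# Output:
# http://example.com/name.png
# http://example.com/a.JPG
```

Because the cut happens at the first matching extension, a host name such as www.png.example.com would be truncated too early, so the result should be checked against the real data.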
