Is it possible in Python 2.7 to use regex.findall() to match everything from one string up to another? I have data that I need to parse with a regex.

UA.E.3858-17,23.05.2017, "Subsidiary Company", "Ukrmetrteststandard" "", RENAULT CLIO, M1,, 2013, VF17R040H49534950, b / c, euro-5,, https: //drive.google.com/ open? id = 0B48-MrZcy1-zT0V0YlZINGRsSFU, UA.E.3859-17,23.05.2017, "DP", "Ukrmetrteststandart" "" ", NISSAN MICRA, M1,, 2012, MDHFBUK13U0522453, b / c, euro-5 https://drive.google.com/open?id=0B48-MrZcy1-zV3RwLXlmdzk0dkE, UA.E.3860-17,23.05.2017, "Ukrmetrteststandart" "UR", CITROEN BERLINGO, N1, 2011, VF77N9HPP , b / c, Euro-5,, https: //drive.google.com/open? id = 0B48-MrZcy1-zODZpSGI5REVza3c, UA.E.3861-17,23.05.2017, "Ukrmetrteststandart" "" , NISSAN LEAF, M1,, 2015,1N4AZ0CP9FC323172, b / c, EL, https: //drive.google.com/open? Id = 0B48-MrZcy1-zejkxNFN0WXhhSUk, UA.E.3862-17,23.05.2017, "ДП" "Укрметртестстандарт" "", FIAT DOBLO, M1,, 2010, ZFA26300009062099, b / c, euro-5,, https: //drive.google.com/open? Id = 0B48-MrZcy1-zVkNQMERFR21XOGc, UA. E.3863-17.23.05.2017, "DP "" Ukrmetrteststandard "" ", AUDI S5 SPORTBACK, M1, 2012, WAUZZZ8T2CA027101, b / c, Euro-5,, https: //drive.google.com/open? Id = 0B48-MrZcy1-zSzVRa3VMcHVVVVVVCVVVV3VMcHVVVVVVVVVVVVVVVVVVVVVV3VV3VV3VVVVVVVV3VMcVVVV? .3864-17,23.05.2017, "ДП", "Ukrmetrteststandart" "", OPEL VIVARO, N1,, 2013, W0LF7B1BEDV610630, b / c, euro-5,, https: //drive.google.com/open? Id = 0B48-MrZcy1-zRmR2amNBSDJkRWs, UA.E.3865-17,23.05.2017, "ДП", "Ukrmetrteststandart" "", SKODA FABIA, M1,, 2011, TMBEM25J6B3178665, b / c, Євро-5,, https: /drive.google.com/open?id=0B48-MrZcy1-zaW9VdHJ1MTZUTGc, UA.E.3870-17,24.05.2017, "Ukrmetrteststandart" "", "NISSAN QUASHQAI, M1,, 2013, SJNFEAJ10U27044, NISSAN QUASHQAI, M1,, 2013, SJNFEAJ10U27044, NISSAN QUASHQAI, M1, 2015 to, Euro-5,, https: //drive.google.com/open? id = 0B48-MrZcy1-zM1BGcFhJLTZ0T00,

I need to match from "UA" up to the newline character "\n". How can I do that?

  • What do you want to get out? - MaxU
  • I want the output to give me every single piece of text from UA to \n in separate variables. The first piece in the first variable, the second in the second ... - DanTheMan
  • re.findall(r'UA([^\n]+)\n', s, re.S & re.M) ? - MaxU
  • TypeError: expected string or buffer - DanTheMan
  • I can't download this link docs.google.com/spreadsheet/… - DanTheMan
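
For reference, here is a minimal sketch of the regex idea from the comments. It assumes the data is already a plain string; the `TypeError: expected string or buffer` above is what `re.findall` raises when it is given something else, such as a list or a response object. The sample text is made up for illustration. Note that flags are combined with `|` (`re.S & re.M` evaluates to 0, i.e. no flags), and that only `re.M` is needed here.

```python
import re

# Hypothetical sample text; in the question the data comes from a spreadsheet export.
s = (
    "some header line\n"
    "UA.E.3858-17,23.05.2017,RENAULT CLIO,M1\n"
    "UA.E.3859-17,23.05.2017,NISSAN MICRA,M1\n"
)

# With re.M, ^ matches at the start of every line.
# [^\n]* consumes everything up to, but not including, the newline.
records = re.findall(r'^UA[^\n]*', s, re.M)

for record in records:
    print(record)
```

Each element of `records` is one piece from "UA" to the end of its line, so `records[0]` is the first piece, `records[1]` the second, and so on.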

2 answers

Use the Pandas module:

    import pandas as pd

    url = r'https://docs.google.com/spreadsheet/ccc?key=11WR6rwQhL4wUDN8I77ju_5rzZl8IglsUjtUDI6pZsAQ&output=csv'
    df = pd.read_csv(url, skiprows=28, header=None)

Result:

    In [55]: df
    Out[55]:
                         0           1                            2                       3    4                 5     6  \
        0  UA.A(b).0023-16  29.01.2016  ДП "ДержавтотрансНДІпроект"                     MINI   M1               NaN   NaN
        1  UA.A(b).0024-16  29.01.2016  ДП "ДержавтотрансНДІпроект"                     MINI   M1               NaN   NaN
        2  UA.A(b).0025-16  29.01.2016  ДП "ДержавтотрансНДІпроект"                     MINI   M1               NaN   NaN
        3  UA.A(b).0028-16  29.01.2016  ДП "ДержавтотрансНДІпроект"   Mercedes-Benz AMG G 63  M1G               NaN   NaN
        4  UA.A(b).0032-16  29.01.2016  ДП "ДержавтотрансНДІпроект"                      NaN    -  CHANGFENG, HIFLY   NaN
        5  UA.A(b).0034-16  02.02.2016  ДП "ДержавтотрансНДІпроект"  FFB Feldbinder TSA 30.3   O4               NaN   NaN
        6  UA.A(b).0035-16  02.02.2016  ДП "ДержавтотрансНДІпроект"        DAF FT XF 105.460   N3               NaN   NaN
        7  UA.A(b).0036-16  02.02.2016  ДП "ДержавтотрансНДІпроект"             Krone SDP 27   O4               NaN   NaN
        8  UA.A(b).0038-16  02.02.2016  ДП "ДержавтотрансНДІпроект"              SKODA RAPID   M1               NaN   NaN
        9  UA.A(b).0027-16  03.02.2016  ДП "ДержавтотрансНДІпроект"    Mercedes-AMG AMG GT S   M1               NaN   NaN
      ...              ...         ...                          ...                      ...  ...               ...   ...
    83771     UA.E.3857-17  23.05.2017     ДП "Укрметртестстандарт"              DACIA LODGY   M1               NaN  2014
    83772     UA.E.3858-17  23.05.2017     ДП "Укрметртестстандарт"             RENAULT CLIO   M1               NaN  2013
    83773     UA.E.3859-17  23.05.2017     ДП "Укрметртестстандарт"             NISSAN MICRA   M1               NaN  2012
    83774     UA.E.3860-17  23.05.2017     ДП "Укрметртестстандарт"         CITROEN BERLINGO   N1               NaN  2011
    83775     UA.E.3861-17  23.05.2017     ДП "Укрметртестстандарт"              NISSAN LEAF   M1               NaN  2015
    83776     UA.E.3862-17  23.05.2017     ДП "Укрметртестстандарт"               FIAT DOBLO   M1               NaN  2010
    83777     UA.E.3863-17  23.05.2017     ДП "Укрметртестстандарт"        AUDI S5 SPORTBACK   M1               NaN  2012
    83778     UA.E.3864-17  23.05.2017     ДП "Укрметртестстандарт"              OPEL VIVARO   N1               NaN  2013
    83779     UA.E.3865-17  23.05.2017     ДП "Укрметртестстандарт"              SKODA FABIA   M1               NaN  2011
    83780     UA.E.3870-17  24.05.2017     ДП "Укрметртестстандарт"          NISSAN QUASHQAI   M1               NaN  2013

                                       7      8       9   10                             11   12
        0              WMWXS71020T843838  новий  Євро-6  NaN  https://drive.google.com/o...  NaN
        1              WMWXS510402D48540  новий  Євро-6  NaN  https://drive.google.com/o...  NaN
        2              WMWXS510X02D48509  новий  Євро-6  NaN  https://drive.google.com/o...  NaN
        3              WDB4632721X249338  новий  Євро-6  NaN  https://drive.google.com/o...  NaN
        4  270 од., інвойс №15HFTD010...    NaN       -  NaN  https://drive.google.com/o...  NaN
        5              WFB334S9BG0052729  новий       -  NaN  https://drive.google.com/o...  NaN
        6              XLRTE47MS0E808234    б/к  Євро-5  NaN  https://drive.google.com/o...  NaN
        7              WKESDP27061270299    б/к       -  NaN  https://drive.google.com/o...  NaN
        8              TMBAF6NH4G4020744  новий  Євро-6  NaN  https://drive.google.com/o...  NaN
        9              WMX1903781A008268  новий  Євро-6  NaN  https://drive.google.com/o...  NaN
      ...                            ...    ...     ...  ...                            ...  ...
    83771              UU1JSDB3551390677    б/к  Євро-5  NaN  https://drive.google.com/o...  NaN
    83772              VF17R040H49534950    б/к  Євро-5  NaN  https://drive.google.com/o...  NaN
    83773              MDHFBUK13U0522453    б/к  Євро-5  NaN  https://drive.google.com/o...  NaN
    83774              VF77N9HP0BJ654851    б/к  Євро-5  NaN  https://drive.google.com/o...  NaN
    83775              1N4AZ0CP9FC323172    б/к      ЕЛ  NaN  https://drive.google.com/o...  NaN
    83776              ZFA26300009062099    б/к  Євро-5  NaN  https://drive.google.com/o...  NaN
    83777              WAUZZZ8T2CA027101    б/к  Євро-5  NaN  https://drive.google.com/o...  NaN
    83778              W0LF7B1BEDV610630    б/к  Євро-5  NaN  https://drive.google.com/o...  NaN
    83779              TMBEM25J6B3178665    б/к  Євро-5  NaN  https://drive.google.com/o...  NaN
    83780              SJNFEAJ10U2704476    б/к  Євро-5  NaN  https://drive.google.com/o...  NaN

    [83781 rows x 13 columns]

  • I keep the data from the Google spreadsheet in a Python variable and do not download it to a file. I need to use a regex so that each piece from UA up to \n is assigned to its own variable - DanTheMan
  • Are there any other modules? - DanTheMan
  • @DanTheMan, for sure there are other modules that let you do the same. It is just that in Pandas this is done very simply and as efficiently as possible - MaxU

I don't think you will be able to come up with a good regex pattern here. It is better to try other modules.

If your data is in CSV format, you can use the csv module.

If you do not want to save the data to a file, I think this option will suit you better, because with it you can parse a plain string variable.
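
As a rough sketch of that idea: `csv.reader` accepts any iterable that yields lines, so a string already held in a variable can be passed through `splitlines()`. The sample text and the `parse_csv_text` helper below are hypothetical, and the sketch assumes no field contains an embedded newline.

```python
import csv

# Hypothetical sample: two records separated by "\n"; doubled quotes ("")
# inside a quoted field stand for one literal quote character.
data = (
    'UA.E.3858-17,23.05.2017,"DP ""Ukrmetrteststandart""",RENAULT CLIO,M1,,2013\n'
    'UA.E.3859-17,23.05.2017,"DP ""Ukrmetrteststandart""",NISSAN MICRA,M1,,2012\n'
)


def parse_csv_text(text):
    """Return a list of rows, each row being a list of field strings."""
    # csv.reader only needs an iterable of lines, not a file object.
    return list(csv.reader(text.splitlines()))


rows = parse_csv_text(data)
for row in rows:
    print(row)
```

Unlike a plain split on commas, the csv module keeps quoted fields together and unescapes the doubled quotes.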

But you can also split the data by a separator without the csv module. It is enough to call the split() method on a string variable.

    for line in lines:
        splitted = line.split(',')  # with no argument, split() splits on whitespace
        # further code
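
A minimal sketch of the same approach for text that is already in a variable, assuming one record per line. The sample text is made up. Keep in mind that a plain `split(',')` also cuts quoted fields that contain a comma (such as "CHANGFENG, HIFLY" in the real data), so it only fits simple rows.

```python
# Hypothetical sample text held in a variable, one record per line.
text = (
    "UA.E.3858-17,23.05.2017,RENAULT CLIO,M1,,2013\n"
    "UA.E.3859-17,23.05.2017,NISSAN MICRA,M1,,2012\n"
)

# splitlines() breaks the text at "\n"; each kept line is then split on commas.
records = [
    line.split(',')
    for line in text.splitlines()
    if line.startswith('UA')
]

# With a known number of records they can be unpacked into separate variables.
first, second = records
print(first)
print(second)
```

For an unknown number of records it is usually easier to keep the list and index into it (`records[0]`, `records[1]`, ...) than to create one variable per record.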

split can also be used when reading a file:

    with open('filename.csv', 'r') as f:
        for line in f:
            line = line.strip()  # or rstrip(), to remove special characters such as \n, \t, etc.
            splitted = line.split(',')
  • Thank you so much - DanTheMan