Parsing a CSV file with a schedule

Question

Good afternoon, dear friends!

Could you please advise? There is a CSV file in which each line gives the working schedule of some "enterprise".

In the format:

    Kushi Tsuru,"Mon-Sun 11:30 am - 9 pm"
    Osakaya Restaurant,"Mon-Thu, Sun 11:30 am - 9 pm / Fri-Sat 11:30 am - 9:30 pm"
    The Stinking Rose,"Mon-Thu, Sun 11:30 am - 10 pm / Fri-Sat 11:30 am - 11 pm"
    McCormick & Kuleto's,"Mon-Thu, Sun 11:30 am - 10 pm / Fri-Sat 11:30 am - 11 pm"
    Mifune Restaurant,"Mon-Sun 11 am - 10 pm"
    The Cheesecake Factory,"Mon-Thu 11 am - 11 pm / Fri-Sat 11 am - 12:30 am / Sun 10 am - 11 pm"
    New Delhi Indian Restaurant,"Mon-Sat 11:30 am - 10 pm / Sun 5:30 pm - 10 pm"
    Iroha Restaurant,"Mon-Thu, Sun 11:30 am - 9:30 pm / Fri-Sat 11:30 am - 10 pm"

I need to write a program that, given a requested date and time, for example Jan 01 2018 12:00AM or Feb 02 2019 11:50PM, reports which "enterprises" were open on that day at that time.

For example, the user enters a date (Feb 02 2019 11:50 PM). The program displays only those "enterprises" that, according to the schedule in the file, are open on that day and at that time.

I cannot imagine how this could be implemented.

Please share your ideas!

@gil9red The user enters a date and time, but the schedule gives only time intervals.
It is not clear how to take the entered date, get the day of the week from it, and check it against the schedule.
@AlexandrS, can you give some small examples of input and output data?
Hmm, there are ranges like 11 am - 12:30 am (11:00 - 00:30), which will be very difficult to process.
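On the point raised in the comments about getting the day of the week from the entered date: a minimal sketch using only the standard library. The function names `parse_user_datetime` and `weekday_and_minutes` are hypothetical, and the sketch assumes an English locale so that `%b` and `%p` recognise "Feb" and "PM".

```python
import re
from datetime import datetime


def parse_user_datetime(text):
    """Parse 'Jan 01 2018 12:00AM' or 'Feb 02 2019 11:50 PM' (hypothetical helper)."""
    # Drop the optional space before AM/PM so one format covers both spellings.
    compact = re.sub(r"\s+(?=[AaPp][Mm]$)", "", text.strip())
    return datetime.strptime(compact, "%b %d %Y %I:%M%p")


def weekday_and_minutes(moment):
    """Return (ISO weekday 1=Mon..7=Sun, minutes since midnight)."""
    return moment.isoweekday(), moment.hour * 60 + moment.minute


moment = parse_user_datetime("Feb 02 2019 11:50 PM")
print(weekday_and_minutes(moment))
```

The weekday number and the minute count can then be compared with whatever representation the schedule is parsed into.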

Answer 1 · 2019-03-07T16:54:29

This task seemed interesting enough to me to spend some time on it.

Solution:

    import re
    import time
    import pandas as pd

    # helper functions for parsing:

    def tm_to_min(t):
        try:
            t = time.strptime(t, '%I %p')
        except ValueError:
            t = time.strptime(t, '%I:%M %p')
        # return the number of minutes since midnight
        return t.tm_hour*60 + t.tm_min

    def parse_time_range(s):
        tm_from, tm_to = re.findall(r'(\d{1,2}\:?\d*?\s+[ap]m)', s)
        return tm_to_min(tm_from), tm_to_min(tm_to)

    def range_to_csv(s):
        if re.match(r'\d$', s):
            return s
        m = re.search(r'(\d)\s*-\s*(\d)', s)
        if m:
            a, b = map(int, m.groups())
            return ''.join(map(str, range(a, b+1)))
        else:
            return ''

    def range_to_list(s):
        if re.match(r'\d$', s):
            return [int(s)]
        m = re.search(r'(\d)\s*-\s*(\d)', s)
        if m:
            a, b = map(int, m.groups())
            return list(range(a, b+1))
        else:
            return []

    def get_weekdays(s):
        # cut off the time range
        s = re.sub(r'\s+\d.*$', '', s)
        s = (s.replace('Mon', '1')
              .replace('Tue', '2')
              .replace('Wed', '3')
              .replace('Thu', '4')
              .replace('Fri', '5')
              .replace('Sat', '6')
              .replace('Sun', '7')
        )
        ret = ''
        for x in re.split(r'\s*,\s*', s):
            #ret += range_to_list(x)
            ret += range_to_csv(x)
        return ret

    def parse_sched(s):
        weekdays = get_weekdays(s)
        m_from, m_to = parse_time_range(s)
        return pd.Series([weekdays, m_from, m_to])

    def dt_to_sched(s):
        d = pd.to_datetime(s)
        # ISO weekday (1 = Mon ... 7 = Sun), matching the codes used in get_weekdays()
        w = str(d.isoweekday())
        minutes = d.hour * 60 + d.minute
        return w, minutes

    def create_schedule(df):
        # the original used the "explode()" helper from here:
        # https://stackoverflow.com/a/40449726/5741205
        # DataFrame.explode (pandas >= 0.25) is used here instead
        t = (df.assign(sched=df.sched.str.split(r'\s*/\s*'))
               .explode('sched')
               .reset_index(drop=True))
        t[['weekdays','min_from','min_to']] = t.sched.apply(parse_sched)
        # replace time ranges like "11 am - 12:30 am" --> "11:00 - 23:59:59"
        t.loc[t.min_to < t.min_from, 'min_to'] = 24*60
        return t

    #############################################

    # parse the CSV
    df = pd.read_csv(r'C:\download\schedule.csv', header=None, names=['name', 'sched'])

    # build the schedule as a normalized DataFrame
    t = create_schedule(df)

    # simulate the user entering a date
    user_date = 'Feb 02 2019 9PM'
    w, mins = dt_to_sched(user_date)

    # check against the schedule
    res = t.loc[t.weekdays.str.contains(w) & (mins >= t.min_from) & (mins <= t.min_to),
                'name'].drop_duplicates()

Result:

    In [394]: res
    Out[394]:
    0                     Kushi Tsuru
    2              Osakaya Restaurant
    4               The Stinking Rose
    6            McCormick & Kuleto's
    7               Mifune Restaurant
    9          The Cheesecake Factory
    11    New Delhi Indian Restaurant
    14               Iroha Restaurant
    Name: name, dtype: object

What the DataFrame t with the schedule looks like:

    In [395]: t
    Out[395]:
                               name                             sched weekdays  min_from  min_to
    0                   Kushi Tsuru           Mon-Sun 11:30 am - 9 pm  1234567       690    1260
    1            Osakaya Restaurant      Mon-Thu, Sun 11:30 am - 9 pm    12347       690    1260
    2            Osakaya Restaurant        Fri-Sat 11:30 am - 9:30 pm       56       690    1290
    3             The Stinking Rose     Mon-Thu, Sun 11:30 am - 10 pm    12347       690    1320
    4             The Stinking Rose          Fri-Sat 11:30 am - 11 pm       56       690    1380
    5          McCormick & Kuleto's     Mon-Thu, Sun 11:30 am - 10 pm    12347       690    1320
    6          McCormick & Kuleto's          Fri-Sat 11:30 am - 11 pm       56       690    1380
    7             Mifune Restaurant             Mon-Sun 11 am - 10 pm  1234567       660    1320
    8        The Cheesecake Factory             Mon-Thu 11 am - 11 pm     1234       660    1380
    9        The Cheesecake Factory          Fri-Sat 11 am - 12:30 am       56       660    1440   # <-- time after midnight is ignored
    10       The Cheesecake Factory                Sun 10 am - 11 pm        7       600    1380
    11  New Delhi Indian Restaurant          Mon-Sat 11:30 am - 10 pm   123456       690    1320
    12  New Delhi Indian Restaurant              Sun 5:30 pm - 10 pm        7      1050    1320
    13             Iroha Restaurant   Mon-Thu, Sun 11:30 am - 9:30 pm    12347       690    1290
    14             Iroha Restaurant          Fri-Sat 11:30 am - 10 pm       56       690    1320
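In the table above, an interval that runs past midnight (row 9, "11 am - 12:30 am") is cut off at 1440 minutes, so the half hour after midnight is lost. One possible way to keep it, sketched here in plain Python rather than as part of the answer's code, is to split such an interval into a part that ends at midnight and a part that starts on the following day. The names `split_overnight` and `is_open` are hypothetical, and the end minute is treated as inclusive to mirror the `<=` comparison in the answer.

```python
def split_overnight(days, start, end):
    """Expand (days, start, end) into per-day segments (day, start, end).

    days are ISO weekday numbers (1 = Mon ... 7 = Sun); start and end are
    minutes since midnight. An interval whose end is not later than its
    start is assumed to close on the following day.
    """
    segments = []
    for day in days:
        if end > start:
            segments.append((day, start, end))
        else:
            next_day = day % 7 + 1          # Sunday (7) wraps around to Monday (1)
            segments.append((day, start, 24 * 60))
            segments.append((next_day, 0, end))
    return segments


def is_open(segments, weekday, minutes):
    """True if (weekday, minutes) falls inside any segment (bounds inclusive)."""
    return any(d == weekday and s <= minutes <= e for d, s, e in segments)


# Fri-Sat 11 am - 12:30 am
segments = split_overnight([5, 6], 11 * 60, 30)
print(is_open(segments, 7, 15))   # Sunday 00:15, still inside Saturday's shift
```

With this representation a Saturday shift that ends at 00:30 is found when the user asks about early Sunday morning.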

Xander · Answer 2 · 2019-03-07T12:25:48

1). Parse each line to extract four values: the day of the week on which the working week begins, the day of the week on which it ends, the opening time, and the closing time. This part is trivial. The days of the week are best converted to their numbers (1 for Monday through 7 for Sunday, to match .isoweekday()).

2). When the user enters the date / time:

2.1). Get the day of the week, for example with the .isoweekday() method

2.2). Check that it is not less than the day of the beginning of the working week

2.3). Check that it is not more than the end of the working week

2.4). Extract only the time and check that it is not earlier than the opening time and not later than the closing time.

If all checks pass, the enterprise is working at this point in time.
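Steps 2.1 to 2.4 can be sketched roughly as follows. This is a minimal illustration, not the answerer's code: the function name `is_working` and its parameters are hypothetical, and it assumes a single contiguous range of days and a closing time that does not run past midnight.

```python
from datetime import datetime, time


def is_working(moment, day_from, day_to, opens, closes):
    """Check one schedule rule against a datetime (hypothetical helper)."""
    weekday = moment.isoweekday()            # 2.1: 1 = Monday ... 7 = Sunday
    if weekday < day_from:                   # 2.2: before the working week starts
        return False
    if weekday > day_to:                     # 2.3: after the working week ends
        return False
    return opens <= moment.time() <= closes  # 2.4: inside the working hours


# Kushi Tsuru, "Mon-Sun 11:30 am - 9 pm"
print(is_working(datetime(2019, 2, 2, 20, 0), 1, 7, time(11, 30), time(21, 0)))
print(is_working(datetime(2019, 2, 2, 23, 50), 1, 7, time(11, 30), time(21, 0)))
```

An enterprise whose schedule has several rules would be open if any one of its rules passes this check.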

I apologize, but about point 1: I have never done any parsing before.
For example: Osakaya Restaurant,"Mon-Thu, Sun 11:30 am - 9 pm / Fri-Sat 11:30 am - 9:30 pm"
Here the structure is much more complicated than in the line that you gave in the question.
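For a line with this more complicated structure, one possible approach is to split on "/" first, then separate the day list from the time range in each piece, then expand the day list. The sketch below uses only the standard library; the names `DAY_NUMBERS`, `parse_days`, `parse_clock` and `parse_schedule` are hypothetical, it assumes an English locale for `%p`, and it does not handle day ranges that wrap around the week (such as "Sat-Mon").

```python
import re
from datetime import datetime

DAY_NUMBERS = {"Mon": 1, "Tue": 2, "Wed": 3, "Thu": 4, "Fri": 5, "Sat": 6, "Sun": 7}

# "<days> <time> - <time>", e.g. "Mon-Thu, Sun 11:30 am - 9 pm"
RULE_RE = re.compile(
    r"\s*(.*?)\s+(\d{1,2}(?::\d{2})?\s+[ap]m)\s*-\s*(\d{1,2}(?::\d{2})?\s+[ap]m)\s*$"
)


def parse_days(text):
    """'Mon-Thu, Sun' -> {1, 2, 3, 4, 7}"""
    days = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (DAY_NUMBERS[d.strip()] for d in part.split("-"))
            days.update(range(first, last + 1))
        else:
            days.add(DAY_NUMBERS[part])
    return days


def parse_clock(text):
    """'11:30 am' or '9 pm' -> minutes since midnight"""
    fmt = "%I:%M %p" if ":" in text else "%I %p"
    parsed = datetime.strptime(text.strip(), fmt)
    return parsed.hour * 60 + parsed.minute


def parse_schedule(text):
    """Return a list of (set of weekday numbers, start minute, end minute)."""
    rules = []
    for chunk in text.split("/"):
        match = RULE_RE.match(chunk)
        if match is None:
            raise ValueError(f"unrecognised schedule part: {chunk!r}")
        days, start, end = match.groups()
        rules.append((parse_days(days), parse_clock(start), parse_clock(end)))
    return rules


print(parse_schedule("Mon-Thu, Sun 11:30 am - 9 pm / Fri-Sat 11:30 am - 9:30 pm"))
```

Each tuple in the result is one rule: the set of weekday numbers it applies to, plus the opening and closing times in minutes since midnight.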
