Here is a variant using the Pandas module:
In [37]: import pandas as pd

In [38]: url = 'http://kinozal.tv/browse.php?s=%F7%E5%EB%EE%E2%E5%EA&g=0&c=0&v=0&d=0&w=0&t=0&f=0'
In the next line we:
- parse the third table at this URL (by default read_html() parses all the tables on the page and returns a list of DataFrames; we are interested in the third one, at index 2)
- skip the first column (Unnamed: 0)
- rename the column Unnamed: 1 -> Name
- save the resulting DataFrame as df
Code:
In [39]: df = pd.read_html(url, header=0)[2].iloc[:, 1:].rename(columns={'Unnamed: 1':'Name'})
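The column handling in that one-liner can be tried offline. The sketch below is only an illustration: the `tables` list is a hand-built, hypothetical stand-in for what read_html() returns, and its contents are made up.

```python
import pandas as pd

# Hypothetical stand-in for the list returned by pd.read_html(url, header=0):
# three small frames, the third one shaped like the table of interest.
tables = [
    pd.DataFrame({"a": [1]}),
    pd.DataFrame({"b": [2]}),
    pd.DataFrame({
        "Unnamed: 0": [None, None],
        "Unnamed: 1": ["first title", "second title"],
        "Size": ["1.5 GB", "700 MB"],
    }),
]

df = (
    tables[2]                                # third table, index 2
    .iloc[:, 1:]                             # drop the first column by position
    .rename(columns={"Unnamed: 1": "Name"})  # give the title column a readable name
)

print(df.columns.tolist())
```

Selecting by position with iloc avoids depending on the auto-generated name of the first column, while rename() targets the second one by its label.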
Show the first 10 rows of our frame:
In [40]: df.head(10)
Out[40]:
                            Name  Комм.   Размер  Сидов  Пиров               Залит     Раздает
0  Джордж С. Клейсон - Самый ...      2   342 МБ      2      3     сегодня в 19:17       fx365
1  Последний человек на Земле...      1   1.9 ГБ     15     61     сегодня в 18:19    BLACKTIR
2  Последний человек на Земле...      2   707 МБ      8     35     сегодня в 18:18    BLACKTIR
3  Земфира - Маленький челове...      2  3.25 ГБ     25      8     сегодня в 17:15   Olyanchik
4  Фрунзик Мкртчян. Человек с...      2   500 МБ     23      0       вчера в 22:33   Человек91
5  Человек-невидимка (9 сезон...      4  4.85 ГБ      6     14       вчера в 22:29  Gorgona007
6  Борис Литвак - Тренинг лич...      3   142 МБ      7      1       вчера в 16:57       sekes
7  Человек из Ларами / The Ma...      1   744 МБ     20      0       вчера в 14:39  dushevnaya
8  Человек - швейцарский нож ...      0  1.46 ГБ     19      1  15.10.2016 в 21:34     Amancio
9  Земфира - Маленький челове...      1   243 МБ     53      1  15.10.2016 в 21:15     Amancio
Print all the rows whose name (column Name) contains the substring 'емфира':
In [41]: df.loc[df.Name.str.contains('емфира')]
Out[41]:
                             Name  Комм.   Размер  Сидов  Пиров               Залит      Раздает
3   Земфира - Маленький челове...      2  3.25 ГБ     25      8     сегодня в 17:15    Olyanchik
9   Земфира - Маленький челове...      1   243 МБ     53      1  15.10.2016 в 21:15      Amancio
14  Земфира - Маленький челове...     11  7.25 ГБ    228     13  15.10.2016 в 02:28       daboen
15  Земфира - Маленький челове...      4  2.38 ГБ     53      2  14.10.2016 в 20:29     DaDalida
16  Земфира - Маленький челове...      7  1.58 ГБ    172      4  14.10.2016 в 19:50  jaaadina123
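One caveat with this kind of filter: if the scraped table contains rows with an empty title cell, str.contains() may return missing values for them, and indexing with such a mask can fail. A minimal sketch of a defensive variant, using made-up sample data (the `Seeds` column and all values are assumptions for illustration):

```python
import pandas as pd

# Made-up sample data; the None simulates a row whose title cell was empty.
df = pd.DataFrame({
    "Name": ["Земфира - Маленький человек", None, "Последний человек на Земле"],
    "Seeds": [25, 3, 15],
})

# na=False makes missing titles count as "no match",
# so the mask contains only True/False values.
mask = df["Name"].str.contains("емфира", na=False)

# Boolean indexing keeps only the rows where the mask is True.
matches = df.loc[mask]
print(matches)
```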
The names that satisfy the condition, as a regular Python list:
In [43]: df.loc[df.Name.str.contains('емфира'), 'Name'].tolist()
Out[43]:
['Земфира - Маленький человек / 2016 / РУ / HDTVRip (720p)',
 'Земфира - Маленький человек. Концерт в Олимпийском / Рок / 2016 / MP3',
 'Земфира - Маленький человек / 2016 / РУ / HDTV (1080i)',
 'Земфира - Маленький человек / 2016 / РУ / DVB',
 'Земфира - Маленький человек / 2016 / РУ / SATRip']
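Searching for 'емфира' without its first letter matches the name regardless of how that letter is capitalised. An alternative, sketched here on made-up titles, is to search for the whole word and pass case=False to str.contains():

```python
import pandas as pd

# Made-up titles for illustration only.
df = pd.DataFrame({"Name": [
    "Земфира - Маленький человек / 2016",
    "ЗЕМФИРА - концерт",
    "Человек-невидимка",
]})

# case=False ignores letter case; na=False treats missing titles as non-matches.
mask = df["Name"].str.contains("земфира", case=False, na=False)

# Select only the Name column for the matching rows and convert it to a list.
names = df.loc[mask, "Name"].tolist()
print(names)
```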
P.S. In general, Pandas lets you do a lot of interesting things (especially data processing) with very little code and with close to the best performance available in Python.