I am reading the book 'Python for complex problems of data science and machine learning', which contains this code example:

    import numpy as np
    import pandas as pd

    data = pd.read_csv('data/president_heights.csv')
    heights = np.array(data['height(cm)'])
    print(heights)

This code does not work for me. I tried running it in several ways (including copying the file into the anaconda3 folder), but none of them worked.

This is what I tried (in the editor):

    import numpy as np
    import pandas as pd

    data = pd.read_csv('E:\\my_folder\\Python\\president_heights.csv')
    heights = np.array(data['height(cm)'])
    print(heights)

or in the terminal:

 data = pd.read_csv('C:\\Anaconda3\\president_heights.csv') 

Here is the error; it is the same in the editor and in the terminal:

    Traceback (most recent call last):
      File "<ipython-input-21-14c66386591b>", line 1, in <module>
        data = pd.read_csv('C:\\Anaconda3\\president_heights.csv')
      File "C:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
        return _read(filepath_or_buffer, kwds)
      File "C:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 446, in _read
        data = parser.read(nrows)
      File "C:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1036, in read
        ret = self._engine.read(nrows)
      File "C:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1848, in read
        data = self._reader.read(nrows)
      File "pandas\_libs\parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
      File "pandas\_libs\parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
      File "pandas\_libs\parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
      File "pandas\_libs\parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
      File "pandas\_libs\parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
    ParserError: Error tokenizing data. C error: Expected 1 fields in line 72, saw 8

Here are lines 70 through 74 of the president_heights.csv file (lines 71 and 73 are empty):

    <meta name="js-proxy-site-detection-payload" content="YTdhMjllODE5Njc5NjBkYzAxMGUwOTJlYWFhYmQ5YjgzZTlmZTA4OTRlNGJjZGQ4NjMzNTBlM2M0Y2FkZTA0ZHx7InJlbW90ZV9hZGRyZXNzIjoiMTA5LjI1Mi43My4xMCIsInJlcXVlc3RfaWQiOiIwRTY4OjYxNjU6NTVDQkY5OkEyNTA0OTo1Qjc4M0E4RiIsInRpbWVzdGFtcCI6MTUzNDYwNTk3NSwiaG9zdCI6ImdpdGh1Yi5jb20ifQ==">

    <meta name="enabled-features" content="DASHBOARD_V2_LAYOUT_OPT_IN,EXPLORE_DISCOVER_REPOSITORIES,UNIVERSE_BANNER,FREE_TRIALS,MARKETPLACE_INSIGHTS,MARKETPLACE_PLAN_RESTRICTION_EDITOR,MARKETPLACE_SEARCH,MARKETPLACE_INSIGHTS_CONVERSION_PERCENTAGES">

    <meta name="html-safe-nonce" content="24cc27afd7691f29ad302e95fae059d2020b557d">

What am I doing wrong?

  • Please replace the screenshots with the error text. - Sergey Gornostaev
  • @SergeyGornostaev done. - timob256
  • Could you add the first few lines of the CSV file to the question? - Sergey Gornostaev
  • @MaxU I think this may be because my Anaconda was installed without being added to PATH - timob256
  • @timob256, can you quote lines 70 to 73 of your CSV in the question? - MaxU

1 answer

The error:

 ParserError: Error tokenizing data. C error: Expected 1 fields in line 72, saw 8 

means that the parser expected 1 column but found 8 columns in line 72.
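To find the line where a file stops matching the expected layout, the fields on each line can be counted with the standard csv module. This is only a minimal sketch: the helper name `find_inconsistent_lines` and the sample text are hypothetical, and the code imitates the check rather than reproducing how pandas determines the column count internally.

```python
import csv
import io


def find_inconsistent_lines(text, delimiter=","):
    """Return (line_number, field_count) for every row whose field count
    differs from that of the first non-empty row.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    expected = None
    mismatches = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for row in reader:
        if not row:  # blank line: nothing to count
            continue
        if expected is None:
            expected = len(row)  # the first row sets the expectation
        elif len(row) != expected:
            mismatches.append((reader.line_num, len(row)))
    return mismatches


# Made-up sample: line 1 has one field, line 2 is empty, line 3 has three.
sample = "<head>\n\n<meta content=a,b,c>\n"
print(find_inconsistent_lines(sample))  # [(3, 3)]
```

Running such a check on the downloaded file would point at the first line that contains more comma-separated pieces than the first line does.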

In your case, the error occurs because the file was downloaded from GitHub incorrectly: what you saved is an HTML page, not the CSV itself. Here is the correct download address: https://raw.githubusercontent.com/jakevdp/PythonDataScienceHandbook/master/notebooks/data/president_heights.csv

The first lines of the HTML file contain no commas (the default CSV delimiter), so pd.read_csv() assumed that this CSV consists of a single column.
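A quick way to catch this situation before parsing is to look at the first bytes of the file. The sketch below is a rough heuristic under stated assumptions: `looks_like_html` and `write_temp` are hypothetical helper names, and the check only recognizes files that start with an HTML doctype or an `<html` tag.

```python
import os
import tempfile


def looks_like_html(path, probe_bytes=2048):
    """Heuristic: does the beginning of the file look like an HTML page?"""
    with open(path, "rb") as f:
        head = f.read(probe_bytes).lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))


def write_temp(content):
    """Write bytes to a temporary file and return its path (demo helper)."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "wb") as f:
        f.write(content)
    return path


# Two made-up files: an HTML page saved with a .csv name, and a real CSV.
html_path = write_temp(b"\n<!DOCTYPE html>\n<html lang=\"en\">\n")
csv_path = write_temp(b"order,name,height(cm)\n1,George Washington,189\n")

print(looks_like_html(html_path))  # True
print(looks_like_html(csv_path))   # False

os.remove(html_path)
os.remove(csv_path)
```

If the check returns True for a file that should be a CSV, the download is the problem, not the parsing code.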

pd.read_csv() can read data directly from a URL:

    import pandas as pd

    url = 'https://raw.githubusercontent.com/jakevdp/PythonDataScienceHandbook/master/notebooks/data/president_heights.csv'
    df = pd.read_csv(url)

Result:

    In [2]: df
    Out[2]:
        order                    name  height(cm)
    0       1       George Washington         189
    1       2              John Adams         170
    2       3        Thomas Jefferson         189
    3       4           James Madison         163
    4       5            James Monroe         183
    5       6       John Quincy Adams         171
    6       7          Andrew Jackson         185
    7       8        Martin Van Buren         168
    8       9  William Henry Harrison         173
    9      10              John Tyler         183
    ..    ...                     ...         ...
    32     35         John F. Kennedy         183
    33     36       Lyndon B. Johnson         193
    34     37           Richard Nixon         182
    35     38             Gerald Ford         183
    36     39            Jimmy Carter         177
    37     40           Ronald Reagan         185
    38     41       George H. W. Bush         188
    39     42            Bill Clinton         188
    40     43          George W. Bush         182
    41     44            Barack Obama         185

    [42 rows x 3 columns]
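If all you have is the address of the GitHub page that displays a file, the raw address can usually be derived from it. The sketch below assumes the page address has the form github.com/<user>/<repo>/blob/<branch>/<path>; the function name `github_blob_to_raw` and the `page` address are illustrative assumptions, not something taken from the question.

```python
from urllib.parse import urlsplit


def github_blob_to_raw(url):
    """Convert a github.com '/blob/' page URL into its
    raw.githubusercontent.com form.

    Returns the URL unchanged if it does not have the assumed shape.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Assumed shape: <user>/<repo>/blob/<branch>/<path...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url
    user, repo, _blob, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, *rest])


# Assumed page address for the file discussed above.
page = ("https://github.com/jakevdp/PythonDataScienceHandbook/"
        "blob/master/notebooks/data/president_heights.csv")
print(github_blob_to_raw(page))
```

The printed address is the one that can be passed to pd.read_csv() or saved to disk as an actual CSV file.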

This error also often occurs with malformed ("broken") CSV files.

Example:

    ID,Text
    1,A normal line.
    2,ERROR: unescaped commas: 1,2,3

    pd.read_csv(filename)
    ... skipped ...
    ParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 4

If you fix the CSV, i.e. put the fields that contain non-delimiter commas in quotes, then everything works correctly:

    ID,Text
    1,"A normal line."
    2,"ERROR: unescaped commas: 1,2,3"

Result:

       ID                            Text
    0   1                  A normal line.
    1   2  ERROR: unescaped commas: 1,2,3
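When you generate such a file yourself, the quoting does not have to be done by hand: the standard csv module quotes fields that contain the delimiter. A minimal sketch with made-up sample rows:

```python
import csv
import io

# Illustrative rows; the last text field contains commas.
rows = [
    ["ID", "Text"],
    [1, "A normal line."],
    [2, "ERROR: unescaped commas: 1,2,3"],
]

# Write to an in-memory buffer; the default quoting mode (QUOTE_MINIMAL)
# adds quotes only where a field would otherwise be ambiguous.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back yields exactly two fields per row.
parsed = list(csv.reader(io.StringIO(csv_text)))
print(parsed[2])  # ['2', 'ERROR: unescaped commas: 1,2,3']
```

Only the field with embedded commas is quoted in the output, which is enough for a CSV parser to treat it as a single column.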