The program analyzes a sample of house sales data and saves plots of how the price depends on various parameters. For some reason, though, it raises an error on this line:

dataset.pivot_table('price', row[i]).plot(kind='bar', stacked=True)

Here is the code itself:

    # python data analysis library
    import csv
    from pandas import read_csv
    import matplotlib.pyplot as plt

    # read dataset.csv file
    dataset = read_csv('dataset.csv')
    reader = csv.reader(open('dataset.csv'), delimiter=',', quotechar='"')

    # Show Characteristics-Price Addiction
    corr = dataset.corr()._get_item_cache(item='price').plot(kind='bar', stacked=True)
    plt.savefig('Characteristics-Price Addiction.png', format='png')
    plt.title('House Price Addiction With All Characteristics')
    plt.grid()
    plt.show()

    # save all addiction
    for row in reader:
        i = 1
        while (i != 20):
            dataset.pivot_table('price', row[i]).plot(kind='bar', stacked=True)
            plt.savefig('price-' + row[i] + ' addiction', format='png')
            i += 1
        break

Error:

Grouper for 'price' not 1-dimensional

  • You are using Pandas in a very strange way... Why do you need csv.reader when you have already read the same data into the DataFrame dataset? Could you upload your CSV file to a file-sharing site (for example, dropmefiles.com) and briefly explain what you want to plot? - MaxU
  • Do you want to plot 20 dependencies of the price on the remaining columns of the CSV file? Then there is an error in how you use pivot_table(): the index must be a column name, not a value: row[i] - MaxU
  • I use the reader to read the first line, which holds the 20 apartment parameters that the dependence is plotted against. Then I loop over all 20 parameters, substituting each one into the variable in turn and plotting the dependency graphs. It just gives an error. - Andrey
  • Do you want to build 20 graphs in total, or 20 graphs for each apartment? - MaxU
  • Only 20 graphs. The overall dependence on each parameter across the whole sample. - Andrey

1 answer

UPDATE:

D:\temp\price-bedrooms-addiction.png:

(image)

Here is a working example:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib

    matplotlib.style.use('ggplot')

    # since you did not provide a sample CSV, we will use random data...
    df = pd.DataFrame(np.random.randint(0, 100, (5, 20)),
                      columns=['p{}'.format(i) for i in range(1, 21)])
    df['price'] = np.random.randint(10**5, 10**7, 5)

    for c in df.columns.difference(['price']).tolist():
        df.set_index(c)['price'].plot.bar(stacked=True)
        plt.savefig('d:/temp/price-{}-addiction.png'.format(c))
        plt.close()  # start the next chart on a fresh figure

Results:

D:\temp\price-p1-addiction.png:

(image)

D:\temp\price-p2-addiction.png:

(image)

D:\temp\price-p3-addiction.png:

(image)

etc.
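The same loop can be pointed at a real CSV instead of random data. The following is only a sketch under assumptions: the function name `save_price_charts` and its `out_dir` and `target` parameters are hypothetical, and it assumes the file has a numeric `price` column. It takes the column names from the DataFrame itself, so no separate `csv.reader` is needed, and it plots the mean price per distinct parameter value, which is what `pivot_table` computes with its default aggregation.

```python
import os

import matplotlib
matplotlib.use("Agg")  # render to files without opening a window
import matplotlib.pyplot as plt
import pandas as pd


def save_price_charts(df, out_dir, target="price"):
    """Save one bar chart of mean `target` per value of every other column.

    Hypothetical helper for illustration; returns the list of saved paths.
    """
    saved = []
    # Column names come from the DataFrame; the target column itself is skipped.
    for col in df.columns.difference([target]):
        fig, ax = plt.subplots()
        # Mean price for each distinct value of the parameter column.
        df.groupby(col)[target].mean().plot.bar(ax=ax)
        ax.set_xlabel(col)
        ax.set_ylabel(target)
        path = os.path.join(out_dir, "price-{}-addiction.png".format(col))
        fig.savefig(path)
        plt.close(fig)  # free the figure before drawing the next one
        saved.append(path)
    return saved
```

With the file from the question, a call would look like `save_price_charts(pd.read_csv('dataset.csv'), '.')`. Skipping the `price` column matters here, because grouping the price by itself is not a meaningful chart.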

If you want to see the dependence of the price on all parameters on one chart:

    In [98]: df.set_index('price').plot.bar(rot=0, stacked=True, figsize=(16, 12))
    Out[98]: <matplotlib.axes._subplots.AxesSubplot at 0x932eb70>

(image)
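If a single stacked chart is hard to read, one alternative (not part of the answer above) is to keep everything in one image but give each parameter its own titled subplot. This is a minimal sketch on the same kind of random stand-in data. The function name `plot_price_grid`, the `ncols` parameter and the output file name are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render to files without opening a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_price_grid(df, target="price", ncols=5):
    """Draw one titled bar chart per parameter column on a shared figure."""
    params = df.columns.difference([target]).tolist()
    nrows = -(-len(params) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols,
                             figsize=(4 * ncols, 3 * nrows), squeeze=False)
    flat_axes = axes.ravel()
    for ax, col in zip(flat_axes, params):
        df.set_index(col)[target].plot.bar(ax=ax, rot=0)
        ax.set_title("{} vs {}".format(target, col))
    # Hide grid cells that have no parameter to show.
    for ax in flat_axes[len(params):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig


# Random stand-in data, since no sample CSV was provided.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randint(0, 100, (5, 20)),
                  columns=['p{}'.format(i) for i in range(1, 21)])
df['price'] = rng.randint(10**5, 10**7, 5)

fig = plot_price_grid(df)
fig.savefig('price-all-parameters.png')  # hypothetical output name
```

Each subplot carries the parameter name both in its title and as the X-axis label, so it is clear which chart belongs to which column.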

  • And how do I tell now which chart is which? - Andrey
  • Well, by the file name and by the X-axis label. - MaxU
  • Could you do it using my sample as the example? Otherwise nothing is clear. It is already night here :( - Andrey
  • @Andrey, what should be on the X axis of the charts: the price or the parameter (one of the 20)? - MaxU