The problem is this: in the 1.csv file there is a column with unique values ​​of the type "OR-SDR-28HD-OLEC". Each cell of this column corresponds to the cell value of the adjacent column. There is a 2.csv file in which there are values ​​from the first column of the 1.csv file. It is necessary for the values ​​in the first column in the 2.csv file to pull in the corresponding values ​​from the second column.

1.csv looks like this:

Overwrite=No,,"* Changing the setting to ?Overwrite=Yes? will have the added effect of deactivating all of your items from the website except for those listed on this datafeed. If this is not intended, keep ?Overwrite=No?.",,,,,,,, Part #,Item #,Currency,MSRP,MAP,Checkout MAP,Selling Price,Inventory,Fulfillment Option,Shipping,Activation Mark PS2-BEC-5780,9SIAFDG6V86915,USD,,0.00,False,26.98,37,Seller,free,True 

2.csv must in the second column contain Item # from 1.csv

 Overwrite = Yes,,* Changing the setting to ?Overwrite=Yes? will have the added effect of deactivating all of your items from the website except for those listed on this datafeed. If this is not intended, keep ?Overwrite=No?. Part #,Item #,Currency,MSRP,MAP,Checkout MAP,Selling Price,Inventory,Fulfillment Option,Shipping,Activation Mark PS2-BEC-5780,,USD,,0.00,False,26.98,37,Seller,free,True 
  • What are your difficulties? Doku and examples have already read? - MaxU
  • @MaxU first difficulty: not strong enough in English. second: the number of cells in the above columns is approximately 11k and will continue to grow - Dmitry Vladimirovich
  • 11K lines for Pandas is "about nothing" ... ;-) With English, they will hardly help you here. Have you tried to do something yourself? Give the code (attempts) and small examples of input data and what you want to get as a result in question ... - MaxU
  • @MaxU added examples. The content of the first column will change and each Part # equates its Item # - Dmitry Vladimirovich

2 answers 2

If I correctly understood the condition of the problem:

 import pandas as pd d1 = pd.read_csv(filename1, skiprows=1, usecols=['Part #','Item #']) d2 = pd.read_csv(filename2, skiprows=1).drop('Item #',axis=1) res = d2.merge(d1, on='Part #', how='left') 

if you want to add only one column from another CSV file (DataFrame), it will be more efficient to use the .map() method:

 d1 = pd.read_csv(filename1, skiprows=1, usecols=['Part #','Item #']) d2 = pd.read_csv(filename2, skiprows=1) d2['Item #'] = d2['Part #'].map(d1.set_index('Part #')['Item #']) 

Example:

 In [106]: d2['Item #'] = d2['Part #'].map(d1.set_index('Part #')['Item #']) In [107]: d2 Out[107]: Part # Item # Currency MSRP MAP Checkout MAP Selling Price Inventory Fulfillment Option Shipping \ 0 PS2-BEC-5780 9SIAFDG6V86915 USD NaN 0.0 False 26.98 37 Seller free Activation Mark 0 True 

UPDATE: pandas.DataFrame is a tabular structure in memory. To save a DataFrame in CSV, you can use the DataFrame.to_csv () method

 d2.to_csv(r'/path/to/file.csv', index=False) 
  • Unfortunately, when executing the script, the "Item #" column remains empty. - Dmitry Vladimirovich
  • @ Dmitriy Vladimirovich, give the question a few lines (4-8) from each CSV file as text so that we can copy it ... - MaxU
  • added lines. We are talking about python2.7, right? I apologize for the untidy look of the text, while mastering the resource - Dmitry Vladimirovich
  • @ Dmitry Vladimirovich, in your examples there is not a single coincident 'Part #' :( The Python version in this case does not play a big role - the Pandas methods will work for Python 2 and for Python 3 in the same way ... - MaxU
  • Imitated the coincidence of columns, corrected the data in the question. the script still does not make changes to the second file - Dmitry Vladimirovich

Unique values ​​can be used as an index pandas.DataFrame , then you can get the necessary values, you can simply use df.loc[] :

 #!/usr/bin/env python3 import pandas as pd # use unique keys as index df = pd.read_csv('1.csv', index_col=0, names=['key', 'value'], header=None) print(df) # read as series keys = pd.read_csv('2.csv', squeeze=True, names=['key']) print('\nKeys:', *keys) print('\n', df.loc[keys]) 

Example output:

  value key a 1 b 2 c 3 d 4 Keys: bdb value key b 2 d 4 b 2 
  • I understand that the author wants to combine the data from the two CSV, and not to filter ... Although it is probably worth waiting for what the author of the question says ... - MaxU
  • @MaxU: judging by the example, the author wants to add a column with the found values. I will leave an answer for other people who, like me, understood the verbal description of the task (the author is one, Google is big). - jfs
  • Yes, maybe this answer will help someone in the future ... - MaxU