There are two files in CSV format with roughly the following content:

123; 456; 789; 

All values are numeric and the files are the same size. I need to compare the lines of the first file with those of the second and write every matching line to a third file.

  • comm -12 <(sort -u a.csv) <(sort -u b.csv) - jfs

4 answers

Example:

File 1.csv:

 a;b;c
 1;2;3
 4;5;6
 7;8;9

File 2.csv:

 a;b;c
 7;8;9
 1;1;1
 4;5;6
 2;2;2

 import pandas as pd

 d1 = pd.read_csv('1.csv', sep=';')
 d2 = pd.read_csv('2.csv', sep=';')

 # merge() defaults to an inner join on all columns the two frames share
 d1.merge(d2).to_csv(r'res.csv', sep=';', index=False)

result:

 a;b;c
 4;5;6
 7;8;9
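If pandas is not available, a similar result can be sketched with the standard csv module. This is not part of the answer above. It assumes both files start with the same header row and use ; as the delimiter, and the names read_rows and common_rows are made up for illustration. Cells are stripped of surrounding whitespace, so "1; 2; 3" and "1;2;3" compare equal.

```python
import csv
import io


def read_rows(stream, delimiter=";"):
    """Return (header, rows); each row is a tuple of stripped cell strings."""
    reader = csv.reader(stream, delimiter=delimiter)
    header = tuple(cell.strip() for cell in next(reader))
    # Skip completely empty lines; keep everything else as a hashable tuple.
    rows = [tuple(cell.strip() for cell in row) for row in reader if row]
    return header, rows


def common_rows(stream1, stream2, delimiter=";"):
    """Rows of the first input that also occur in the second, in first-input order."""
    header, rows1 = read_rows(stream1, delimiter)
    _, rows2 = read_rows(stream2, delimiter)
    seen = set(rows2)  # set lookup avoids rescanning the second input per row
    return header, [row for row in rows1 if row in seen]


# In-memory stand-ins for 1.csv and 2.csv from the example above.
FILE_1 = "a;b;c\n1;2;3\n4;5;6\n7;8;9\n"
FILE_2 = "a;b;c\n7;8;9\n1;1;1\n4;5;6\n2;2;2\n"

header, rows = common_rows(io.StringIO(FILE_1), io.StringIO(FILE_2))

out = io.StringIO()
writer = csv.writer(out, delimiter=";", lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
print(out.getvalue(), end="")
```

With real files, the io.StringIO objects would be replaced by files opened with open(name, newline='').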

    Here is a modified version of the author's answer that takes the comments into account:

     import argparse

     parser = argparse.ArgumentParser()
     parser.add_argument('csvfile')
     parser.add_argument('csvfile2')
     args = parser.parse_args()

     filename1 = args.csvfile
     filename2 = args.csvfile2

     with open('result.csv', 'w') as response_file:
         with open(filename1) as f:
             msisdn1_lines = f.readlines()

         with open(filename2) as f:
             msisdn2_lines = f.readlines()

         # For each line of msisdn1_lines
         for msisdn1 in msisdn1_lines:
             # Go through the lines of the other file
             for msisdn2 in msisdn2_lines:
                 if msisdn1 == msisdn2:
                     response_file.write(msisdn1)

    The comparison can be simplified with the intersection method of set, which returns the elements common to both sets:

     with open('result.csv', 'w') as response_file:
         with open(filename1) as f:
             msisdn1_lines = set(f.readlines())

         with open(filename2) as f:
             msisdn2_lines = set(f.readlines())

         # Get the common elements
         common_lines = msisdn1_lines.intersection(msisdn2_lines)
         for line in common_lines:
             response_file.write(line)

    P.S. The intersection method can be replaced with the & operator, so you can simply write:

     common_lines = msisdn1_lines & msisdn2_lines 

    P.P.S. Instead of the for line in common_lines loop, you can write to the file in one go by joining the lines into a single string.

    It was:

     for line in common_lines:
         response_file.write(line)

    will be:

     response_file.write(''.join(common_lines)) 
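    Two caveats apply to the set-based version: a set does not keep the original line order, and the last line of a file may lack a trailing '\n', in which case it will not compare equal to the same line in the other file. A minimal sketch that handles both is below. The function name common_lines_in_order is made up for illustration, and reporting each common line only once is an assumption about the desired output.

```python
def common_lines_in_order(first_lines, second_lines):
    """Yield lines of first_lines that also occur in second_lines, once each."""
    # Strip the newline so a final line without '\n' still matches.
    wanted = {line.rstrip("\n") for line in second_lines}
    emitted = set()
    for line in first_lines:
        key = line.rstrip("\n")
        if key in wanted and key not in emitted:
            emitted.add(key)
            yield key + "\n"


first = ["1;2;3\n", "4;5;6\n", "7;8;9"]  # the last line has no '\n'
second = ["7;8;9\n", "1;1;1\n", "4;5;6\n", "2;2;2\n"]

result = list(common_lines_in_order(first, second))
print("".join(result), end="")
```

    Open file objects iterate over their lines, so they can be passed in directly in place of the lists, and response_file.writelines(...) can consume the generator.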
    • You can also get rid of the for line in common_lines: loop: response_file.write('\n'.join(common_lines)) - MaxU
    • @MaxU the separator is not even needed there, because readlines returns lines that already end with '\n', so: response_file.write(''.join(common_lines)) - gil9red

    I wrote it myself and am answering my own question))

     import requests
     import argparse


     def main():
         parser = argparse.ArgumentParser()
         parser.add_argument('csvfile')
         parser.add_argument('csvfile2')
         args = parser.parse_args()
         filename1 = args.csvfile
         filename2 = args.csvfile2

         with open('result.csv', 'w') as response_file:
             with open(filename1) as msisdn1_file:
                 for line in msisdn1_file.readlines():
                     MSISDN1 = line.strip()
                     b=False
                     with open(filename2) as msisdn2_file:
                         for line in msisdn2_file.readlines():
                             MSISDN2 = line.strip()
                             if MSISDN1==MSISDN2:
                                 b=True
                     if b==True:
                         response_file.write(MSISDN1+'\n')


     if __name__ == '__main__':
         main()
    • How does your code relate to your question? - Serge Nazarenko
    • not copied, sorry - PyLam
    • Your code is very inefficient, because for every line from the first file you re-read the entire second file and compare it line by line. - Nikita Konin
    • import requests is not needed, so delete it. Also, in Python uppercase variable names are conventionally reserved for constants, and MSISDN1 and MSISDN2 are not constants. It is not critical, but better avoided - gil9red
    • @PyLam, I thought it would be useful to show a modified version of the code that takes the comments into account: ru.stackoverflow.com/a/826655/201445 - gil9red

    For small files, you can find the numbers common to the two files by loading the numbers from each file into a set() and printing their intersection (not tested):

     #!/usr/bin/env python3
     """Usage: common-numbers <file>..."""
     import re
     import sys
     from pathlib import Path


     def read_numbers(filename):
         # Collect every run of digits in the file as an int
         return set(map(int, re.findall(br'\d+', Path(filename).read_bytes())))


     print(*set.intersection(*map(read_numbers, sys.argv[1:])))

    Example run:

     $ common-numbers a.csv b.csv