There is a text file:

    автозапчасти лексус новосибирск
    автозапчасти лексус в туле
    запчасти для lexus ls 460
    разборка lexus rx
    запчасти на лексус rx 330 бу
    разборка lexus rx

Can I remove duplicate lines with Python 3?
Found a working solution:

    file = 'C:\\words.txt'
    uniqlines = set(open(file, 'r', encoding='utf-8').readlines())
    gotovo = open(file, 'w', encoding='utf-8').writelines(set(uniqlines))

It removes duplicates, but unfortunately it also changes the order of the lines, so the question still stands.
You can use the fileinput module to modify the file in place:

    #!/usr/bin/env python3
    import sys
    import fileinput

    with fileinput.FileInput(inplace=True, backup='.bak', mode='rb') as file:
        seen = set()
        for line in file:
            if line not in seen:  # first time
                seen.add(line)
                sys.stdout.buffer.write(line)  # redirected to the file

Example:
    T:\> python remove-duplicates-inplace.py C:\words.txt

Lines are compared literally: even if two lines differ only in whitespace, they are considered different. You can normalize whitespace if necessary:

    for line in file:
        words = tuple(line.split())
        if words not in seen:
            seen.add(words)
            sys.stdout.buffer.write(line)

Alternatively, you can open the file manually:
    #!/usr/bin/env python3
    from collections import OrderedDict

    filename = r'C:\words.txt'
    with open(filename, encoding='utf-8') as file:
        uniq = OrderedDict.fromkeys(file)
    with open(filename, 'w', encoding='utf-8') as file:
        file.writelines(uniq)

Both solutions require that all unique lines fit in memory. If they do not, you can use an external sort so that duplicate lines become adjacent in the file, and then remove them with an algorithm that does not hold the unique lines in memory. Keep in mind that sorting changes the original order of the lines.
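The second step of that idea, dropping duplicates once they are adjacent, could be sketched roughly as follows. This is only a minimal sketch, assuming the input has already been sorted by some external tool; the names `drop_adjacent_duplicates` and `dedupe_sorted_file` are hypothetical and not part of the answer above.

```python
from itertools import groupby


def drop_adjacent_duplicates(lines):
    """Yield each run of equal consecutive lines exactly once.

    Only the current line is held in memory, so this works on sorted
    input of any size. (Hypothetical helper for illustration.)
    """
    for line, _run in groupby(lines):
        yield line


def dedupe_sorted_file(src, dst, encoding='utf-8'):
    # Assumption: src is already sorted, so duplicates are adjacent.
    with open(src, encoding=encoding) as fin, \
            open(dst, 'w', encoding=encoding) as fout:
        fout.writelines(drop_adjacent_duplicates(fin))
```

Because `groupby` only compares neighbouring items, this removes all duplicates only when the input is sorted; on unsorted input it removes just the consecutive repeats.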
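Yes, you can. For example:

    def delete_string(filename='test.txt'):
        # Read the file, keeping only the first occurrence of each line.
        str_list = []
        with open(filename, 'r') as file:
            for line in file:
                if line not in str_list:
                    str_list.append(line)
        # Write the unique lines back to the same file.
        with open(filename, 'w') as file:
            for line in str_list:
                file.write(line)

The code isn't high quality, but it will do :)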
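A side note on the approach above: checking membership in a list scans the whole list for every line, which gets slow on large files. Here is a minimal sketch of the same idea with a set for the membership test, still keeping the original order; the name `unique_in_order` is hypothetical.

```python
def unique_in_order(lines):
    """Yield lines in their original order, skipping repeats."""
    seen = set()  # set membership tests take constant time on average
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


sample = ['b\n', 'a\n', 'b\n', 'c\n', 'a\n']
print(list(unique_in_order(sample)))  # ['b\n', 'a\n', 'c\n']
```

The function accepts any iterable of lines, so an open file object can be passed in directly.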
    from tempfile import mkstemp
    from os import close
    from shutil import move

    def write_lines(file='words.txt'):
        ft, temp = mkstemp()  # create a temp file
        lines = []  # "unique" lines from file
        # encoding='utf-8' matches the file in the question
        with open(temp, 'w', encoding='utf-8') as t, open(file, encoding='utf-8') as f:
            for line in f:  # read file line by line
                if line not in lines:  # for lines not yet in lines
                    lines.append(line)  # save line in lines
                    t.write(line)  # write line to the temp file
        close(ft)  # close the temp file
        move(temp, file)  # move/rename the temp file to file

Source: https://ru.stackoverflow.com/questions/631054/