How do I make my set_checkpoints() function process a text file correctly?

Question

I have a text file, data.txt:

 яблоко банан абрикос яблоко апельсин апельсин мандарин абрикос абрикос абрикос абрикос абрикос

I want set_checkpoints() to process data.txt so that the output file, save.txt, looks like this:

 //А//яблоко//А// //А//банан//А// //А//абрикос//А// //А//яблоко//А// //А//апельсин//А// //А//апельсин//А// //А//мандарин//А// //А//абрикос//А// //А//абрикос//А// //А//абрикос//А// //А//абрикос//А// //А//абрикос//А//

But what I actually get is this:

 //А//яблоко//А// //А//банан//А// //А//абрикос//А// яблоко //А//апельсин//А// апельсин //А//мандарин//А// абрикос абрикос абрикос абрикос абрикос

What I need: when the function finds a word from the array, it should keep searching for that word until the end of the file, instead of stopping at the first match and moving on to the next word. As it stands, the function finds the first match for a word from the fruits array in the text file, marks it with checkpoints, and then proceeds to the next word in the array. How can I make the function work correctly?

Here is my code:

    import re

    path = 'data.txt'  # input data
    save = 'save.txt'  # output data

    def open_read(path):  # opens and reads the file
        file = open(path, 'r')
        content = file.read()
        file.close()
        #print(content)
        return content

    Fruits = ['яблоко','банан','абрикос','апельсин','мандарин']  # array of keywords
    n = len(Fruits)  # size of the Fruits array

    def set_checkpoints(content):  # sets the checkpoints
        for i in range(n):
            find = re.compile(Fruits[i])
            res = find.search(content)
            #lenght = len(find.findall(content))  # total number of occurrences of the word
            #print (lenght)
            #for i in range(lenght):
            if res == None:
                continue  # the word was not found
            else:
                k1 = res.start()
                k2 = res.end()
                content = content[:k1]+"//А//"+content[k1:k2]+"//А//"+content[k2:]
                print(content)
        return content

    ###############################################

    content = open_read(path)  # open the file
    content = set_checkpoints(content)  # process the file with set_checkpoints()

    file = open(save, 'w')  # save the file
    file.write(content)
    file.close()

PS
I understand that the problem lies in the search() method: it scans the entire string but returns only the first match found. It seems I need findall() instead, since that method returns a list of all matches. But I do not know how to set the checkpoints using findall(): as far as I can tell, start() and end() are only available on the result of search().

Help me please.

Where did the space appear in //А// абрикос//А// ?

Accepted Answer · 2018-07-23T07:09:39

No regular expressions are needed here:

    fruits = ['яблоко', 'банан', 'абрикос', 'апельсин', 'мандарин']

    with open('input.txt', encoding='utf-8') as f_in:
        with open('output.txt', 'w', encoding='utf-8') as f_out:
            for line in f_in:
                # Strip trailing whitespace: ' ', '\n', '\r', etc.
                line = line.rstrip()

                # If the fruit is in the list
                if line in fruits:
                    f_out.write('//А//{}//А//\n'.format(line))

If you prefer to work with the text as a string and go through a function:

    FRUITS = ['яблоко', 'банан', 'абрикос', 'апельсин', 'мандарин']


    def set_checkpoints(text: str) -> str:
        # As a single expression
        return '\n'.join(
            '//А//{}//А//'.format(line) for line in text.splitlines() if line in FRUITS
        )

        # NOTE: the same code as above
        # new_lines = []
        #
        # for line in text.splitlines():
        #     if line in FRUITS:
        #         new_lines.append('//А//{}//А//'.format(line))
        #
        # return '\n'.join(new_lines)


    with open('input.txt', encoding='utf-8') as f:
        content = f.read()

    # Process the file with set_checkpoints()
    content = set_checkpoints(content)

    with open('output.txt', 'w', encoding='utf-8') as f:
        f.write(content)

Result (output.txt):

 //А//яблоко//А// //А//банан//А// //А//абрикос//А// //А//яблоко//А// //А//апельсин//А// //А//апельсин//А// //А//мандарин//А// //А//абрикос//А// //А//абрикос//А// //А//абрикос//А// //А//абрикос//А// //А//абрикос//А//
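The line-based code above compares whole lines against the list, so it relies on each word sitting on its own line, and it leaves out any line that is not a fruit. If the words are instead separated by spaces on one line, a word-by-word variant may be closer to what is needed. The following is only a minimal sketch under that assumption; the name `set_checkpoints_words` and the `MARK` constant are illustrative and do not come from the answer.

```python
FRUITS = {'яблоко', 'банан', 'абрикос', 'апельсин', 'мандарин'}
MARK = '//А//'  # checkpoint marker, same text as in the question


def set_checkpoints_words(text: str) -> str:
    """Wrap every whitespace-separated word that is in FRUITS with MARK.

    Words that are not in FRUITS are kept unchanged.
    """
    out_lines = []
    for line in text.splitlines():
        # str.split() with no arguments splits on any run of whitespace
        words = [MARK + w + MARK if w in FRUITS else w for w in line.split()]
        out_lines.append(' '.join(words))
    return '\n'.join(out_lines)
```

Note that this sketch collapses runs of spaces into a single space and does not recognise a word with punctuation attached (for example `яблоко,`); a regular expression with word boundaries handles those cases better.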

MaxU · Answer 2 · 2018-07-23T10:54:02

If you only plan to process small files (ones that comfortably fit in memory, say smaller than half of the free RAM), you can do the replacement without any loops:

    import re
    from pathlib import Path

    fruits = ['яблоко','банан','абрикос','апельсин','мандарин']  # array of keywords
    pref, suff = '//А//', '//А//'

    text = Path(r'D:\temp\1.txt').read_text(encoding='utf-8')

    # \b on both sides ensures that only whole words are matched
    pat = r'(\b)({})(\b)'.format('|'.join(fruits))
    text = re.sub(pat, r'\1{}\2{}\3'.format(pref, suff), text)

    Path(r'D:\temp\result.txt').write_text(text, encoding='utf-8')

    # check
    print(Path(r'D:\temp\result.txt').read_text(encoding='utf-8'))

Result:

 //А//яблоко//А// //А//банан//А// //А//абрикос//А// //А//яблоко//А// //А//апельсин//А// //А//апельсин//А// //А//мандарин//А// //А//абрикос//А// //А//абрикос//А// //А//абрикос//А// //А//абрикос//А// //А//абрикос//А//
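The question's PS asks how to get `start()` and `end()` for every match rather than only the first. Those methods belong to the match object, and `re.finditer()` yields one match object per occurrence, so the original slicing idea can be kept. The sketch below is one possible way to write it, not the answer author's code; the `MARK` and `PATTERN` names and the use of `re.escape` are additions made for illustration.

```python
import re

FRUITS = ['яблоко', 'банан', 'абрикос', 'апельсин', 'мандарин']
MARK = '//А//'  # checkpoint marker, same text as in the question

# One alternation for all keywords; re.escape guards against special characters,
# and \b restricts matches to whole words.
PATTERN = re.compile(r'\b(?:{})\b'.format('|'.join(map(re.escape, FRUITS))))


def set_checkpoints(content: str) -> str:
    """Wrap every occurrence of every keyword in content with MARK."""
    parts = []
    last = 0  # end position of the previous match in the original string
    for m in PATTERN.finditer(content):
        k1, k2 = m.start(), m.end()  # available on every match object
        parts.append(content[last:k1])                # text between matches
        parts.append(MARK + content[k1:k2] + MARK)    # the marked keyword
        last = k2
    parts.append(content[last:])                      # tail after the last match
    return ''.join(parts)
```

Collecting the pieces in a list and joining them once avoids shifting offsets: all positions refer to the original string, so inserting markers never invalidates the `start()` and `end()` values of later matches.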
