I have a text file. How can I use Python to remove only the lines that start with a specific sequence of characters?

Here is an example.

    ATOM 14
    ATOM 15
    ANISOU 15
    ATOM 16
    ANISOU 16

I need to delete every line that starts with ANISOU.

  • Do you have any ideas about how to do this? Have you tried anything, and did you run into a specific problem? - m9_psy
  • I apologize for unfairly closing this question on 04.03.2017. I hope there will be no more such mistakes. - Yuri

2 answers

How do I remove every other line from a file in Python?

Using fileinput, which transparently creates a temporary file, to modify the file in place:

    #!/usr/bin/env python3
    import fileinput
    import os

    with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
        for i, line in enumerate(file, start=1):
            if i & 1:  # odd
                print(line, end='')  # keep line (stdout is redirected to the file)
    os.unlink(filename + '.bak')  # remove the backup on success

This for loop can also be written using itertools.islice:

    import sys
    from itertools import islice

    sys.stdout.writelines(islice(file, 0, None, 2))  # keep lines[::2]

If the implementation of .writelines() does not write lines as they arrive but loads them all into memory first, you can use an explicit for loop to write one line at a time without loading the entire file into memory.
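A minimal sketch of that explicit loop, reusing the fileinput in-place setup shown above; the function name keep_every_other_line is hypothetical. islice() hands over one line at a time and each kept line is printed immediately, so the whole file is never held in memory.

```python
import fileinput
import os
from itertools import islice


def keep_every_other_line(filename):
    """Keep lines 1, 3, 5, ... of filename, rewriting the file in place."""
    with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
        # islice yields one line at a time; nothing is collected into a list
        for line in islice(file, 0, None, 2):
            print(line, end='')  # stdout is redirected to the file
    os.unlink(filename + '.bak')  # remove the backup on success
```

Usage: keep_every_other_line(filename).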

For a small file, you can call .readlines() to get a list of lines (loading the whole file into memory) and then overwrite the file completely, at the risk of losing data if an error occurs:

    with open(filename) as file:
        lines = file.readlines()[::2]  # lines to keep
    with open(filename, 'w') as file:
        file.writelines(lines)

For small files given on the command line, or for standard input (stdin), if you don't need error handling, you can simply write:

    #!/usr/bin/env python3
    import fileinput
    from itertools import islice

    print("".join(islice(fileinput.input(), 0, None, 2)), end='')

This is a complete script. Usage:

    $ every-other-line file1 file2 >output_file

More generally, to remove lines from a file in place without creating a temporary file and without loading the whole contents into memory, you can use seek()/tell(), although this is probably a less efficient solution:

    from itertools import islice

    with open(filename, 'r+') as file:
        write_offset = file.tell()  # where to write next
        for line in islice(iter(file.readline, ''), 0, None, 2):  # keep lines[::2]
            read_offset = file.tell()  # where to read next
            file.seek(write_offset)
            file.write(line)
            write_offset = file.tell()
            file.seek(read_offset)
        file.truncate(write_offset)

This more complex option also works for files that do not fit in RAM, or when there is no disk space to create a copy.
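The same seek()/tell() technique can be adapted to the ANISOU case by keeping the lines that satisfy a predicate instead of every other line. A sketch under these assumptions: the helper name remove_lines_in_place is hypothetical, and the file is opened in binary mode ('r+b') so that tell() returns plain byte offsets, which means the predicate receives bytes rather than str.

```python
def remove_lines_in_place(filename, keep_line):
    """Drop the lines for which keep_line(line) is false, without a temporary file.

    Kept lines are copied toward the start of the file. Because a line is
    written at or before the position it was read from, unread data is
    never overwritten.
    """
    with open(filename, 'r+b') as file:
        write_offset = file.tell()  # where to write next
        for line in iter(file.readline, b''):  # readline() keeps tell() usable
            if not keep_line(line):
                continue
            read_offset = file.tell()  # where to read next
            file.seek(write_offset)
            file.write(line)
            write_offset = file.tell()
            file.seek(read_offset)
        file.truncate(write_offset)  # cut off the leftover tail
```

Usage: remove_lines_in_place(filename, lambda line: not line.startswith(b'ANISOU')). If the process is interrupted midway, the file is left half-rewritten; that is the price of not keeping a copy.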

"I need to delete every line that starts with ANISOU"

You can adapt the above code examples:

    import fileinput
    import os

    with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
        for line in file:
            if not line.startswith('ANISOU'):
                print(line, end='')  # keep line (stdout is redirected to the file)
    os.unlink(filename + '.bak')  # remove the backup on success

You can also create the temporary file yourself. For example, if there is not enough space in the current directory for a copy of the file, you can explicitly specify another directory (on another disk) and use shutil.move() if necessary:

    #!/usr/bin/env python3
    from pathlib import Path
    from tempfile import NamedTemporaryFile

    path = Path(filename)
    with path.open() as file, \
         NamedTemporaryFile('w', dir=str(path.parent), delete=False) as output_file:
        for line in file:
            if not line.startswith('ANISOU'):
                print(line, end='', file=output_file)
    Path(output_file.name).replace(path)
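A sketch of the variant mentioned above, with the temporary file placed in a different directory (the hypothetical tmp_dir parameter) and moved into place with shutil.move(). Path.replace() is a rename and raises OSError across file systems, whereas shutil.move() falls back to copying in that case. The function name remove_anisou is also hypothetical.

```python
import shutil
from pathlib import Path
from tempfile import NamedTemporaryFile


def remove_anisou(filename, tmp_dir=None):
    """Write the kept lines to a temporary file in tmp_dir, then move it over filename.

    tmp_dir may be on another disk; tmp_dir=None uses the system default
    temporary directory.
    """
    path = Path(filename)
    with path.open() as file, \
         NamedTemporaryFile('w', dir=tmp_dir, delete=False) as output_file:
        for line in file:
            if not line.startswith('ANISOU'):
                output_file.write(line)
    # rename if possible, otherwise copy across file systems and delete the source
    shutil.move(output_file.name, str(path))
```

One side effect to keep in mind: on POSIX systems NamedTemporaryFile creates its file readable and writable only by the owner, and the result inherits those permissions rather than the original file's.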

Loading the lines into memory:

    with open(filename) as file:
        lines = [line for line in file if not line.startswith('ANISOU')]
    with open(filename, 'w') as file:
        file.writelines(lines)

This is easy to adapt to other conditions by defining a keep_line() predicate, for example:

    with open(filename) as file:
        lines = list(filter(keep_line, file))
    with open(filename, 'w') as file:
        file.writelines(lines)

where in this case:

    def keep_line(line):
        return not line.startswith('ANISOU')
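Since str.startswith() also accepts a tuple of prefixes, one predicate can drop several kinds of lines at once. A sketch; make_keep_line, filter_file and the 'REMARK' prefix in the usage line are illustrative assumptions, not part of the question.

```python
def make_keep_line(*prefixes):
    """Return a keep_line() predicate that drops lines starting with any of prefixes."""
    def keep_line(line):
        # str.startswith() accepts a tuple and checks every prefix in one call
        return not line.startswith(prefixes)
    return keep_line


def filter_file(filename, keep_line):
    """Small-file version: load the kept lines into memory, then overwrite the file."""
    with open(filename) as file:
        lines = list(filter(keep_line, file))
    with open(filename, 'w') as file:
        file.writelines(lines)
```

Usage: filter_file(filename, make_keep_line('ANISOU', 'REMARK')).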

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    f = open("./файл.txt", "r")    # open the file for reading
    f2 = open("./файл2.txt", "w")  # open the file for writing the result
    stroka = " "  # start with a non-empty string; it holds each line as the file is read
    # Read the file line by line; at the end of the file readline() returns
    # an empty string, so stroka becomes "" and the loop ends
    while stroka != "":
        stroka = f.readline()  # read the next line
        if stroka[:6] != "ANISOU":
            f2.write(stroka)  # the line does not start with ANISOU: write it to the result file
    f.close()   # close the file
    f2.close()  # close the file
    • ts / ts :) - Nick Volynkin
    • Can't the file contain empty lines? - vp_arth
    • @vp_arth: even an empty line contains "\n"; stroka == "" means EOF. It should be said that the code in the answer is not idiomatic. This algorithm can be written as: for line in file: if not line.startswith('ANISOU'): output_file.write(line) - jfs
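Expanded into a complete function, the loop from jfs's comment replaces the manual readline()/empty-string sentinel, and the with statement closes both files even if an error occurs. A sketch; the function name copy_without_anisou is hypothetical. Empty lines ("\n") are kept, since they do not start with ANISOU.

```python
def copy_without_anisou(input_path, output_path):
    """Copy input_path to output_path, skipping lines that start with 'ANISOU'."""
    with open(input_path) as file, open(output_path, 'w') as output_file:
        for line in file:  # iteration stops at EOF; no sentinel needed
            if not line.startswith('ANISOU'):
                output_file.write(line)
```

Usage: copy_without_anisou("./файл.txt", "./файл2.txt").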