How to remove every other line from a file in Python?
You can use `fileinput`, which transparently creates a temporary file, to modify the file in place:
```python
#!/usr/bin/env python3
import fileinput
import os

with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
    for i, line in enumerate(file, start=1):
        if i & 1:  # odd
            print(line, end='')  # keep line (stdout is redirected to the file)
os.unlink(filename + '.bak')  # remove the backup on success
```
This `for` loop can also be written using `itertools.islice`:
```python
import sys
from itertools import islice

# inside the `with fileinput.FileInput(...) as file:` block
sys.stdout.writelines(islice(file, 0, None, 2))  # keep lines[::2]
```
If your implementation of `.writelines()` collects all the lines in memory instead of writing each one as it arrives, use an explicit `for` loop to write one line at a time, so the entire file is never loaded into memory.
For a small file, you can read all the lines into memory with `.readlines()` and then overwrite the file completely, at the risk of losing data if an error occurs while writing:
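A minimal sketch of that explicit-loop variant, assuming the same `fileinput` setup as the first example; the function name `keep_every_other_line` is only for illustration. `islice()` still yields `lines[::2]` lazily, and each kept line is written as soon as it is read:

```python
import fileinput
import os
from itertools import islice


def keep_every_other_line(filename):
    """Keep lines 1, 3, 5, ... of `filename`, rewriting it in place."""
    with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
        # islice(file, 0, None, 2) yields lines[::2] one at a time
        for line in islice(file, 0, None, 2):
            print(line, end='')  # stdout is redirected to the file
    os.unlink(filename + '.bak')  # remove the backup on success
```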
```python
with open(filename) as file:
    lines = file.readlines()[::2]  # lines to keep
with open(filename, 'w') as file:
    file.writelines(lines)
```
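To make explicit which lines survive: the slice `[::2]` keeps indices 0, 2, 4, ..., that is, the 1st, 3rd, 5th lines, which is the same selection as the `i & 1` test with `enumerate(..., start=1)` in the first example. A quick check on an in-memory list (the sample lines are made up):

```python
lines = ['line 1\n', 'line 2\n', 'line 3\n', 'line 4\n', 'line 5\n']

# step 2 from index 0 keeps indices 0, 2, 4: the 1st, 3rd and 5th lines
kept_by_slice = lines[::2]

# the first example keeps odd 1-based line numbers, i.e. the same lines
kept_by_parity = [line for i, line in enumerate(lines, start=1) if i & 1]

print(kept_by_slice == kept_by_parity)  # True
print(kept_by_slice)  # ['line 1\n', 'line 3\n', 'line 5\n']

# to keep the even-numbered lines instead, start the slice at index 1
print(lines[1::2])  # ['line 2\n', 'line 4\n']
```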
For small files given on the command line (or read from standard input), ignoring possible errors, you can write:
```python
#!/usr/bin/env python3
import fileinput
from itertools import islice

print("".join(islice(fileinput.input(), 0, None, 2)), end='')
```
This is a complete script. Usage:
```
$ every-other-line file1 file2 > output_file
```
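Note that `fileinput.input()` chains all the files into one stream, so `islice()` continues the odd/even pattern across file boundaries: if `file1` has an odd number of lines, the selection in `file2` starts from its second line. If you want every other line of each file separately, one possible variant (a sketch; the generator name is hypothetical) uses `FileInput.filelineno()`, which restarts at 1 for every file:

```python
import fileinput


def every_other_line_per_file(files=None):
    """Yield lines 1, 3, 5, ... of each input file separately.

    files=None means the names in sys.argv[1:], or stdin if there are none
    (the same default as fileinput.input()).
    """
    with fileinput.FileInput(files) as stream:
        for line in stream:
            # filelineno() restarts at 1 for every new file, so the odd/even
            # pattern does not carry over from one file to the next
            if stream.filelineno() % 2 == 1:
                yield line
```

In a script, `sys.stdout.writelines(every_other_line_per_file())` reads the files named on the command line (or stdin), like the original.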
In the more general case, to remove lines from a file in place without creating a temporary file and without loading the whole file into memory, you can use `seek()`/`tell()`, although this is probably a less efficient solution:
```python
from itertools import islice

with open(filename, 'r+') as file:
    write_offset = file.tell()  # where to write next
    for line in islice(iter(file.readline, ''), 0, None, 2):  # keep lines[::2]
        read_offset = file.tell()  # where to read next
        file.seek(write_offset)
        file.write(line)
        write_offset = file.tell()
        file.seek(read_offset)
    file.truncate(write_offset)
```
This more complex option also works for files that do not fit in RAM, or when there is not enough disk space to create a copy.
> You need to delete every line that starts with ANISOU
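The `seek()`/`tell()` loop reads with `iter(file.readline, '')` rather than `for line in file` for a reason: a text file refuses `tell()` while it is being iterated with `next()`, and this solution needs `tell()` after every line. A small sketch that demonstrates the difference (the helper `tell_works_after` is hypothetical):

```python
import tempfile


def tell_works_after(advance):
    """Return True if tell() still works after calling advance(f) on a text file."""
    with tempfile.TemporaryFile('w+') as f:
        f.write('first\nsecond\n')
        f.seek(0)
        advance(f)  # move past the first line
        try:
            f.tell()
        except OSError:
            return False  # tell() is disabled while iterating with next()
        return True


print(tell_works_after(lambda f: f.readline()))  # True
print(tell_works_after(lambda f: next(f)))       # False
```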
You can adapt the above code examples:
```python
import fileinput
import os

with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
    for line in file:
        if not line.startswith('ANISOU'):
            print(line, end='')  # keep line (stdout is redirected to the file)
os.unlink(filename + '.bak')  # remove the backup on success
```
You can also create the temporary file yourself. For example, if there is not enough space in the current directory for a copy of the file, you can explicitly specify another directory (on another disk) and use `shutil.move()` if necessary:
```python
#!/usr/bin/env python3
from pathlib import Path
from tempfile import NamedTemporaryFile

path = Path(filename)
with path.open() as file, \
     NamedTemporaryFile('w', dir=str(path.parent), delete=False) as output_file:
    for line in file:
        if not line.startswith('ANISOU'):
            print(line, end='', file=output_file)
Path(output_file.name).replace(path)
```
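The snippet above keeps the temporary file next to the original so that `Path.replace()` can finish with a single rename. If the copy has to live on another disk, a rename across filesystems is not possible, and `shutil.move()` falls back to copying the file back, which is not atomic. A sketch under those assumptions (the function name and the `tmp_dir` parameter are illustrative):

```python
import shutil
from tempfile import NamedTemporaryFile


def remove_anisou_lines(filename, tmp_dir=None):
    """Drop lines starting with 'ANISOU', building the filtered copy in tmp_dir.

    tmp_dir=None means the system's default temporary directory. If tmp_dir
    is on another filesystem, shutil.move() copies the result back instead of
    renaming it, so the final step is not atomic in that case.
    """
    with open(filename) as file, \
            NamedTemporaryFile('w', dir=tmp_dir, delete=False) as output_file:
        for line in file:
            if not line.startswith('ANISOU'):
                output_file.write(line)
    # output_file is closed here; move it over the original file
    shutil.move(output_file.name, filename)
```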
Loading the lines into memory:
```python
with open(filename) as file:
    lines = [line for line in file if not line.startswith('ANISOU')]
with open(filename, 'w') as file:
    file.writelines(lines)
```
This is easy to adapt to other conditions by defining a `keep_line()` predicate, for example:
```python
with open(filename) as file:
    lines = list(filter(keep_line, file))
with open(filename, 'w') as file:
    file.writelines(lines)
```
where in this case:
```python
def keep_line(line):
    return not line.startswith('ANISOU')
```
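The same `keep_line()` predicate also plugs into the streaming `fileinput` version, which does not load the whole file into memory: `fileinput` reads from the renamed original (the backup) and writes the kept lines under the original name. A sketch, with `filter_file_inplace` as a hypothetical helper name:

```python
import fileinput
import os


def filter_file_inplace(filename, keep_line):
    """Keep only the lines of `filename` for which keep_line(line) is true."""
    with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
        for line in file:
            if keep_line(line):
                print(line, end='')  # stdout is redirected to the file
    os.unlink(filename + '.bak')  # remove the backup on success


def keep_line(line):
    return not line.startswith('ANISOU')
```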