Hello! It is necessary to search for a pattern in a file of any format. Any pattern of bytes can act as a pattern. How can this be done using Python?
1 answer
To find out whether a file contains a possible multi-line pattern of sizes from one byte to all bytes in the file:
#!/urs/bin/env python2 """Usage: find-pattern <pattern> <filename>""" import mmap import sys from contextlib import closing pattern, filename = sys.argv[1:] with open(filename, 'rb', 0) as file, \ closing(mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)) as s: sys.exit(s.find(pattern) == -1) The script returns zero if pattern found and 1 otherwise. mmap supports files and more available memory.
If the file size is larger than the address space (on a 32-bit system it may occur), then mmap in parts can be made and checked len(pattern) parts on the border of the pieces by hand. In this case, perhaps the option with file.read(chunksize) more convenient to use directly without mmap.
|
pattern in s, where s is mmap, will tell if there is a pattern in the file.s.find(pattern)returns the position of the pattern in the file (offset in bytes). Here is an example code with mmap (at the end of the answer) —selection of random words from a file . - jfs