Search in binary file

Question

Colleagues, I need to find all entries in a file that have the form "data\model\folder\folder\file.extension". There may be several. There is no time to work out the file structure (the file is not text), so I decided to search for the first occurrence of the prefix, then find the first occurrence of a file extension after it (tga or dds, in either case), and extract the slice (file[start:end]). The script returns too many matches, including ones from places where no path exists. Where did I go wrong?

# -*- coding: utf-8 -*-
import os
import glob

folder = r"C:\files\data\model"
final_folder = r"C:\files\data\model_patched"
files = glob.glob(folder + r'\**\*.fskin', recursive=True)
filecount = 0
oc_count = 0
for file in files:
    with open(file, 'r+b') as f:
        fcontent = f.read()
        fcontent = fcontent.decode("utf-16", errors='ignore')
        cur_pos = 0
        extensions = ['.TGA', '.DDS', '.tga', '.dds']
        occurences = []
        print("file {}:".format(file))
        while cur_pos != -1:
            string_begin = fcontent.find(r"data\model", cur_pos)
            cur_pos = string_begin  # start from 0 or from the end of the previous occurrence
            string_end = 0
            cur_pos_iter = int(cur_pos)  # so that cur_pos is not overwritten
            while string_begin != -1:
                ch = fcontent[cur_pos_iter:cur_pos_iter+4]  # take 4 characters from the current position
                if ch not in extensions:  # if not found, increase the counter by 1 and move on
                    cur_pos_iter += 1
                else:  # if found, take the whole occurrence, put the cursor at its end and stop the loop
                    string_end = cur_pos_iter+4
                    occurence = fcontent[string_begin:string_end]
                    cur_pos = string_end
                    print("Found {} [{}:{}]".format(occurence, string_begin, string_end))
                    oc_count += 1
        else:
            filecount += 1
            filename_relative = file[file.find('model')+5:]
            new_filename = final_folder+filename_relative
            # os.makedirs(os.path.dirname(new_filename), exist_ok=True)
            # with open(new_filename, 'wb') as fw:
            #     fw.write(bytearray(fcontent, 'utf-16'))
            #     fw.close()
        f.close()
print("Done. Replaced {} occurences in {} files.".format(filecount, oc_count))

It is not clear what the input is and what the output should be.
(Why not use a regular expression?) What do you want to get as output?
The input is a file whose contents look like some kind of Chinese characters; the path to the file also looks like Chinese.
As output, I want to replace these paths with generated ones and save the result to a separate file, preserving the folder structure (the commented-out part of the code at the end).
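As a rough sketch of the regular-expression approach suggested above: the pattern below is run directly on the raw bytes, without decoding. It assumes the paths are stored as single-byte, ASCII-compatible text and that a path contains only letters, digits, spaces, dots, underscores, hyphens and backslashes. Both are guesses about the format, and the name find_paths and the sample bytes are made up for illustration. Limiting the allowed characters and the length keeps a match from running on through unrelated binary data until some distant extension turns up, which is one plausible source of the bogus matches in the script.

```python
import re

# Assumption: paths are single-byte (ASCII-compatible) text. Any byte
# outside the character class, e.g. a NUL, ends the candidate path.
PATH_RE = re.compile(
    rb"data\\model\\[A-Za-z0-9_ .\\-]{1,200}?\.(?:tga|dds)",
    re.IGNORECASE,  # covers .tga/.TGA/.dds/.DDS
)

def find_paths(blob):
    """Return (start, end, path_bytes) for every path-like match in blob."""
    return [(m.start(), m.end(), m.group(0)) for m in PATH_RE.finditer(blob)]

# Hand-made test data standing in for a real .fskin file (hypothetical bytes).
sample = (
    b"\x00\x01HDR\x00"
    b"data\\model\\hero\\body\\skin.TGA\x00\x00"
    b"\xff\xfe junk data\\model\x00 not a path .dds"
    b"data\\model\\npc\\head\\face.dds\x00"
)

for start, end, path in find_paths(sample):
    print(start, end, path.decode("ascii"))
```

If the file actually stores the paths as UTF-16, each character occupies two bytes and the pattern would have to be built for that encoding instead.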
First, find out whether the binary format allows you to change the length of the paths you want to replace (for example, is the path string followed by zero bytes, or is its length stored somewhere that you could adjust).
If replacing the string is possible in principle, then temporarily forget about the binary file: manually create a simple test file containing examples of the possible paths, and learn how to find all the required paths in that text file.
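To illustrate the length question, here is a minimal sketch of a replacement that never changes the file size. It assumes the format treats a path as a zero-terminated string, so a shorter path can be padded with zero bytes up to the old length. That is an assumption to verify against the real files, and replace_in_place is a hypothetical helper name.

```python
def replace_in_place(blob, start, end, new_path, pad=b"\x00"):
    """Overwrite blob[start:end] with new_path, padded to the old length.

    Raises ValueError if new_path does not fit, because growing the data
    would shift every offset that follows it.
    """
    slot = end - start
    if len(new_path) > slot:
        raise ValueError("new path is longer than the original slot")
    padded = new_path + pad * (slot - len(new_path))
    return blob[:start] + padded + blob[end:]

# Hand-made test data (hypothetical bytes, not a real file).
old = b"data\\model\\a\\b\\long_name.tga"
blob = b"AA" + old + b"\x00ZZ"
start = blob.find(old)
patched = replace_in_place(blob, start, start + len(old), b"data\\model\\a\\b\\x.tga")
print(len(blob) == len(patched))
```

If the format stores an explicit length before the string instead, that length field would need to be updated as well, and padding alone would not be enough.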
Example of searching in a binary file: Search for a pattern (sequence of bytes) in a file (several GB) in Python 2
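For files too large to read into memory, one possible Python 3 approach is to memory-map the file and search the mapping. This is only a sketch and not taken from the linked question: find_all and the demo file are made up for illustration.

```python
import mmap
import os
import tempfile

def find_all(path, needle):
    """Return every offset at which the byte string needle occurs in the file."""
    offsets = []
    with open(path, "rb") as f:
        # Length 0 maps the whole file; mmap raises ValueError on an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            while pos != -1:
                offsets.append(pos)
                pos = mm.find(needle, pos + 1)
    return offsets

# Demo on a small temporary file (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as out:
        out.write(b"xx" + b"data\\model" + b"\x00" * 5 + b"data\\model" + b"yy")
    offsets = find_all(demo_path, b"data\\model")

print(offsets)
```

For small files, reading everything with f.read() and calling bytes.find is simpler and works the same way.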
Related questions: Search for lines in a file, How to replace a line in a .txt file through Python 3?
