I cannot read the entire file (500-600 MB) at once, since the data would be loaded into RAM, and that is too expensive for me.

I read the file with file_get_contents with a limit on the number of lines (say, 1000 lines). How do I then delete specific lines, without using $f = file(...)?

Details: I read a very large file 1000 lines at a time (the first 1000 lines), process them in my own way, and depending on conditions some lines need to be deleted and some kept.

I could write the result to a temporary file, but what if the script stops partway, or something else goes wrong? There is no possibility of rollback.
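For reference, the standard way to "delete" lines from a huge file without loading it into RAM is to stream it line by line into a temporary file and then atomically rename the temporary file over the original. If the script crashes before the rename, the original file is untouched, which addresses the rollback concern. A minimal Python sketch of this pattern (the function name and the keep() predicate are illustrative assumptions):

```python
import os
import tempfile

def filter_file(path, keep):
    """Stream `path` line by line, keeping only lines for which
    keep(line) is True, then atomically replace the original."""
    # Create the temp file in the same directory so the final
    # rename stays on the same file system (and thus is atomic).
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w") as out, open(path) as src:
            for line in src:              # reads one line at a time
                if keep(line):
                    out.write(line)
        os.replace(tmp_path, path)        # atomic on POSIX
    except BaseException:
        os.remove(tmp_path)               # crash: original stays intact
        raise

# Example: drop lines containing "DELETE"
# filter_file("big.log", lambda line: "DELETE" not in line)
```

The same approach works in PHP with fgets() in a loop plus rename(); the key point is that only one line is held in memory at a time.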

  • 3
    Won't sed save the father of Russian democracy? - alexlz
  • 1
    @ifrops, I don't understand your concerns about writing to a temporary file. Do it the way you described: read the old file line by line, write the lines you need into a temporary file, then rename (mv) it over the old one. Just keep both files on the same file system, and all is OK. The name of the temporary file can be stored in some file. If the script fails, nothing terrible has happened. Of course, someone (perhaps the script itself, before starting the main work) must delete the stale temporary files. - avp
  • 1
    @avp What does "in place" mean? Without creating a resulting file? Deleting a line at the beginning of a 600 MB file and then shifting all the remaining lines over is an apocalyptic spectacle. And if you mean that sed can be given the name of a resulting file, that's not the case. ( -i ) - alexlz
  • 1
    @ifrops: what's wrong if the script crashes? The temporary file will just contain incomplete data; delete it and run the script again. - VladD
  • 1
    @ifrops, if some approaches don't suit your task, it's because you haven't really described it (the meaning of the task). By the way, suppose you opened the file (entirely), sent the data, and the script crashed. When you run it again, it will resend the data anyway (the file hasn't changed). So in that regard nothing has improved. - avp

2 answers

The idea is this.

We read 100 (or 200, or 1000) lines from the file, filter them, and write them to the resulting file. Then we record in a special state file how many lines were read and from which position (or simply the block number). And so on in a loop.

If the script crashes and is restarted, it reads the resume marker from the state file and continues processing from there.

There are two downsides:

  • some blocks may be filtered two or more times (if the script is restarted).
  • you need to somehow mark in the resulting file that the entire block was written. For example, append a tag to the end of the file, and when writing the next block, delete it and append it to the end again.
  • Dear downvoters, at least write what it is you don't like. - KoVadim
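The checkpoint scheme described in this answer can be sketched as follows. This is a minimal Python illustration under assumptions of my own (the state-file name, block size, and keep() predicate are not from the answer); a PHP version would follow the same structure with fgets()/ftell()/fseek():

```python
import os

STATE_FILE = "filter.state"   # stores byte offset of the last completed block
BLOCK_LINES = 1000

def load_offset():
    """Return the byte offset of the first unprocessed line."""
    try:
        with open(STATE_FILE) as f:
            return int(f.read())
    except FileNotFoundError:
        return 0              # first run: start from the beginning

def process(src_path, dst_path, keep):
    offset = load_offset()
    with open(src_path) as src, open(dst_path, "a") as dst:
        src.seek(offset)
        while True:
            # Read one block of up to BLOCK_LINES lines.
            lines = [src.readline() for _ in range(BLOCK_LINES)]
            lines = [l for l in lines if l]          # drop EOF blanks
            if not lines:
                break
            for line in lines:
                if keep(line):
                    dst.write(line)
            dst.flush()
            os.fsync(dst.fileno())                   # block durably written...
            with open(STATE_FILE, "w") as f:         # ...then checkpoint it
                f.write(str(src.tell()))
```

Note that a crash between the fsync and the checkpoint write means that block is filtered again on restart, which is exactly the first downside the answer lists.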
 cat myfile.txt | grep -v text_of_line_to_delete > newfile.txt 

But really, describe things in more detail: a couple of lines of the source file, and what you delete, by what principle. The example above is crude and obviously not for your case, but there is little information to go on))

  • 5
    better to avoid the extra command: grep -v filter myfile.txt > newfile.txt - VladD