Several scripts listen on different ports. They all write their files to the same directory, say /1/. Every N minutes (N is at most 20), another script is launched that moves all these files to another directory, say /2/, and then processes them there. While the files are being moved from directory 1 to directory 2, new files may arrive in directory 1. Recently the sheer number of files has become a problem (several tens of thousands in 20 minutes). The processing itself does not take long; afterwards the files are stored, filed according to their parameters, for 2 weeks and then deleted. The interval between runs cannot be shortened. For historical reasons, this is how it has always worked.

Question:

Is there a way to swap the directories atomically, so that the empty directory 2 becomes 1 and directory 1 with its files becomes 2? Atomic, of course, not in the sense of a single processor instruction, but in the sense of not disrupting the system or the running scripts. Maybe there is some trick with inodes? I really don't want to rewrite all of this, so any advice is welcome. Another idea is to drop directory 2 and process everything directly in directory 1, but then the scripts that write files there would probably have to take an exclusive lock on each file.

  • you have left out an important point: do the processes keep the files open between individual (atomic) write operations? - aleksandr barakin
  • I think it should be possible to make /1/ a symbolic link and change where it points: first to /2/, then to /3/, and so on. The scripts will finish writing the files they already have open, and new files will land in the other directory, so nothing breaks. - Alexey Reytsman
  • @alexanderbarakin no: the data arrives, we open a file, write the data to disk in directory 1, and close the file. Then all files accumulated in 1 are moved to directory 2 and processed there (it can happen that the move to 2 runs while a file is being written to directory 1; in practice this has been fine). During processing a file is opened, read, processed and closed, and then, depending on the data in it, moved from directory 2 to directory /3/dir1/... - loginaz1
  • @AlexeyReytsman an interesting idea, I will try it in practice. - loginaz1
  • @AlexeyReytsman this can go wrong if directory 2 has not been fully processed by the time the symbolic link is switched, but in my case it works. - loginaz1

2 answers

  1. If the writing processes do not keep directory 1 as their current directory (the “current directory” is a per-process property changed by calling chdir() / fchdir()) between creating new files, the solution proposed by Alexey Reytsman in the comments will work. To "switch directories":

    1. create a new directory with an arbitrary name (for example, the current timestamp)
    2. create a symbolic link 1 pointing to this directory:

       $ ln -snf new-directory-name 1

      each time this command is executed, the link (thanks to the -n and -f options) is simply repointed to the new location.

      the first time, while 1 is still a real directory rather than a link, you will need to rename it (1 to 2) the “old” way.

    3. The same can be done for directory 2, turning it into a link (pointing to the “real” directory created in the previous iteration, twenty minutes ago).
    4. Do not forget to delete the "real" directories once they have been processed.
  2. If the writing processes do keep directory 1 as their current directory between creating new files, more “atomicity” can be achieved by manipulating the files rather than the directory, moving them from directory 1 to directory 2:

     $ find 1 -type f -exec mv {} 2 \; 

    Notes:

    1. for the move to go "smoothly", directories 1 and 2 should be on the same file system, so that moving a file is a rename rather than a copy.
    2. for greater “atomicity”, it makes sense to keep this file system in memory (tmpfs).
    3. if a writing process may take substantial time between opening and closing a file, then after moving the files to directory 2, but before processing them, wait longer than this maximum “thinking” time of the writing process.
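
The link-switching steps from item 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: `switch_intake` and the `batch.XXXXXX` naming are hypothetical, `1` is assumed to already be a symbolic link, and `mv -T` is assumed to be available (GNU coreutils). Instead of `ln -snf`, the sketch creates the new link under a temporary name and renames it over the old one, because some `ln` implementations may remove the old link before creating the new one, which leaves a short window in which `1` does not exist.

```shell
#!/bin/sh
# Sketch only. Assumptions: GNU coreutils (mv -T); BASE/1 is already a
# symbolic link to a real directory that also lives inside BASE.

# switch_intake BASE
# Points BASE/1 at a fresh empty directory and prints the name of the
# directory it pointed to before (relative to BASE), ready for processing.
switch_intake() (
    cd "$1" || exit 1
    old=$(readlink 1) || exit 1
    # Hypothetical naming scheme. mktemp creates the directory with mode 0700;
    # adjust the permissions if the writers run under other users.
    new=$(mktemp -d batch.XXXXXX) || exit 1
    ln -s "$new" 1.tmp || exit 1   # build the new link under a temporary name
    mv -T 1.tmp 1 || exit 1        # rename it over the old link in one step
    printf '%s\n' "$old"
)
```

A caller would process the directory whose name is printed and then delete it, for example `old=$(switch_intake /path/to/base)` followed by the processing step and `rm -rf "/path/to/base/$old"`. Writers that already hold an open file in the old directory are not affected by the switch, so the delay from note 3 still applies before processing starts.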
  • Wouldn't it be easier to run logrotate on them? And the rename call, as far as I remember, is atomic. - 0andriy
  • @0andriy, it seems to me that using logrotate would be more of a complication than a simplification. But if you post your solution using logrotate, I will be all for it. - aleksandr barakin

I will comment not on the question itself but on the overall design of the task :-)

Writing the received data to a bunch of files (probably with the date and time in the name?), then moving them to a processing directory, processing them there... This is not even Windows... It just brings back memories of DOS / 3.1 :-)

Not only does this approach create a lot of headaches for system administrators (scripts!!!), it is also very unreliable.

Maybe try a pipe or a system message queue instead? No problems with producer/consumer synchronization, no garbage in the file system, and most importantly - it is NECESSARY!
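
As a rough illustration of the pipe variant, here is a minimal sketch with a named pipe (FIFO). Everything in it is hypothetical: the function name `fifo_demo`, the FIFO path and the record format are made up for the example, and the producer is a stand-in for a real port listener.

```shell
#!/bin/sh
# Sketch only: one producer and one consumer connected by a named pipe.
# The function name, FIFO path and record format are hypothetical.
fifo_demo() {
    dir=$(mktemp -d) || return 1
    fifo="$dir/intake.fifo"
    mkfifo "$fifo" || return 1

    # Producer: stands in for a port listener; writes three records and exits.
    ( printf 'record %s\n' 1 2 3 > "$fifo" ) &

    # Consumer: reads line by line until every writer has closed the FIFO.
    while IFS= read -r line; do
        printf 'processed: %s\n' "$line"
    done < "$fifo"

    wait
    rm -rf "$dir"
}
```

Note that a FIFO does not store anything on disk: data exists only while both ends are open, so a consumer that is down means writers block instead of accumulating files. With several writers, POSIX guarantees that writes of at most PIPE_BUF bytes are not interleaved, so each record should stay under that size.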

  • Well, I really do not want to rewrite everything. But you are right: all this has been running since ancient times. It worked and nobody touched it, apart from system updates. But then the sources of information (the writers) were upgraded and everything ground to a halt. The approach we tested yesterday gave us the speed we needed, and for now it seems to suit us. But my gut tells me that sooner or later all of this will have to be redone. - loginaz1