I am generally a pedant by nature, and I don't like it when program crashes or errors leave rubbish behind in a directory or database. I have never written a serious project, so in order to learn something from your comments, I will ask a series of questions and write everything down in detail.

For example, suppose there is a script that lets two users communicate with each other and share photos, videos, music, etc. The script works as follows:

The user types a message, uploads a photo and a music file (the files are uploaded to a temporary folder via AJAX), then clicks the "Send" button, the data goes to the server, and the following happens there:

The text message is written to the database; next, the photo info is written to another table (path to the photo, dimensions, etc.); then the photo is moved from the temporary directory to a permanent one, and the same is done with the music file.
Everything seems fine and we get the expected result, but what if a failure happens after the text message and the file information have been recorded in the database? Failures do happen, after all. This is technology: with the server, the internet connection, and so on, anything can go wrong during execution.

As a result, we end up with a message without attachments; worse, half of the information about the attachments was recorded and half was not, and it is still unclear where the files themselves ended up.

The idea that came to my mind is to build something like a map (an array, say) at the very start of execution and write all the relevant information into it, namely:
- The full paths to the files in the temporary folder
- The full paths to which the files will be moved from the temporary folder
- Any other useful information

The very first thing the script does is write this map to a table (I won't describe where exactly; the main thing is to understand that this info gets recorded).

And at the end of the script, if everything went well, we delete the map. If the map is still in the table (for example, after reloading the page), the script did not complete correctly, so we just walk through the map and delete everything it references, everywhere.
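A minimal sketch of this "cleanup map" idea, using sqlite3 and hypothetical table and column names (the real schema and helper names are my assumptions, not from the question). A record of intended work is committed first; on success it is deleted, and any surviving record marks a run that died halfway and can be swept later:

```python
import json
import os
import sqlite3

# Hypothetical schema: one row per in-flight operation, with a JSON
# payload listing the temp paths and the target paths.
db = sqlite3.connect("app.db")
db.execute("""CREATE TABLE IF NOT EXISTS cleanup_map (
    id INTEGER PRIMARY KEY,
    payload TEXT NOT NULL
)""")

def send_message(temp_files, target_dir):
    targets = [os.path.join(target_dir, os.path.basename(f))
               for f in temp_files]
    # 1. Record the "map" first and commit it, so a crash leaves a trace.
    cur = db.execute("INSERT INTO cleanup_map (payload) VALUES (?)",
                     (json.dumps({"temp": temp_files, "targets": targets}),))
    db.commit()
    map_id = cur.lastrowid

    # 2. Do the real work: here, move the files into place.
    for src, dst in zip(temp_files, targets):
        os.replace(src, dst)   # atomic when both are on the same file system

    # 3. Everything succeeded: drop the map entry.
    db.execute("DELETE FROM cleanup_map WHERE id = ?", (map_id,))
    db.commit()

def sweep_failed_runs():
    # Any leftover map belongs to a run that crashed: delete everything
    # it references, then the map itself.
    rows = db.execute("SELECT id, payload FROM cleanup_map").fetchall()
    for map_id, payload in rows:
        info = json.loads(payload)
        for path in info["temp"] + info["targets"]:
            if os.path.exists(path):
                os.remove(path)
        db.execute("DELETE FROM cleanup_map WHERE id = ?", (map_id,))
    db.commit()
```

`sweep_failed_runs()` would run on startup or on a timer, which matches the "run through the map and delete everything" step described above.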

I'm not trying to invent or discover anything new; I'd just like to know whether this is how it should be done. Or is there a simpler way I've never heard of? Or is all of this complete nonsense?
I will be happy to hear from you and get some good advice.
Thanks!!!

  • You can save all the information about a post at once: just write it all to all the tables in a single transaction, and store the status of the operation in the post record. Something tells me a simple "completed / in progress" flag is enough, though of course you could record each stage. It is highly desirable that the temporary and working folders be on the same file system: then moving a file is just a rename of a directory entry instead of a read-write copy. - Mike
  • But no one will give a 100% guarantee. The DB ensures that its data physically reaches the disk, while the file system caches writes, so after an unexpected reboot it may turn out that the files were never moved or ended up with zero length (xfs is notorious for this). You can of course call sync, but frequent syncs will slow things down. So I would implement a garbage-collection procedure, and until a post is marked complete, show it only to its creator, so that after a failure they can re-upload the files and fix things. - Mike
  • Recording each stage implies frequent queries to the database. My head isn't working this evening; it was a busy day, so I'll drop by here tomorrow))) - Hit-or-miss
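The points raised in the comments (keep the temp and final folders on one file system so a move is just a rename, and force the data to disk so a reboot cannot leave a zero-length file) can be sketched as follows. This is a minimal illustration, not code from the question; the directory-fsync step is POSIX-specific:

```python
import os

def store_upload(data: bytes, final_path: str) -> None:
    # Write to a temp name in the SAME directory (hence same file system),
    # so the final os.replace() is an atomic rename of a directory entry.
    tmp_path = final_path + ".part"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())       # force the bytes through the FS cache to disk
    os.replace(tmp_path, final_path)
    # Also fsync the directory so the rename itself survives a crash
    # (works on POSIX systems; not portable to Windows).
    dir_fd = os.open(os.path.dirname(final_path) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

With this pattern the destination path either holds the complete old content or the complete new content, never a half-written file.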

1 Answer

If the data fails validation: raise an exception.
BEGIN a transaction in the DB and write the rows to the tables. If an error occurs (a bug in validation, or the DB broke, ran out of space, a malformed query): ROLLBACK and raise an exception.
If the rows were created successfully: write the file. If the file was not written or was corrupted: delete the file, ROLLBACK, raise an exception. Otherwise: COMMIT.
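A runnable sketch of this flow, using sqlite3 for the transaction; the table names and the corruption check are my assumptions, not part of the answer. The point is that the DB rows and the file either both take effect or neither does:

```python
import os
import sqlite3

def save_post(db, text, photo_bytes, photo_path):
    if not text and not photo_bytes:
        raise ValueError("validation failed")       # bail out before any work
    try:
        with db:                                    # BEGIN ... COMMIT / ROLLBACK
            db.execute("INSERT INTO messages (body) VALUES (?)", (text,))
            db.execute("INSERT INTO attachments (path) VALUES (?)", (photo_path,))
            with open(photo_path, "wb") as f:
                f.write(photo_bytes)                # file write inside the txn
            if os.path.getsize(photo_path) != len(photo_bytes):
                raise IOError("file corrupted")     # forces the rollback below
    except Exception:
        # `with db` already rolled back the rows; the file is ours to undo.
        if os.path.exists(photo_path):
            os.remove(photo_path)
        raise
```

Note that the database can only roll back its own rows; the `except` branch has to delete the file manually, which is exactly the "delete the file, rollback, exception" step above.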

You can also, for example, if the load is not very high, store the files themselves in the database; then a rollback will roll back the files as well.
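For illustration, storing the file bytes as a BLOB column (the `posts` schema here is hypothetical) makes the text and the file part of one transaction, so they commit or roll back together:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT, photo BLOB)")

def save_post_with_blob(text, photo_bytes):
    # One transaction: the message text and the file bytes are now a
    # single row, so a ROLLBACK removes both at once.
    with db:
        db.execute("INSERT INTO posts (body, photo) VALUES (?, ?)",
                   (text, photo_bytes))
```

The trade-off mentioned in the answer is real: under high load, serving large files out of the database is usually slower than serving them from the file system.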

Or, say, you can write a script that runs through the database and the list of files every day, checks that the files are valid, and deletes the bad ones.
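A sketch of that daily garbage-collection pass, again with a hypothetical `attachments` table: files with no matching row are orphans and get deleted, and rows whose file is missing are purged:

```python
import os
import sqlite3

def collect_garbage(db, upload_dir):
    known = {path for (path,) in db.execute("SELECT path FROM attachments")}
    on_disk = {os.path.join(upload_dir, name)
               for name in os.listdir(upload_dir)}

    for orphan in on_disk - known:     # a file with no DB row
        os.remove(orphan)
    for missing in known - on_disk:    # a DB row with no file
        db.execute("DELETE FROM attachments WHERE path = ?", (missing,))
    db.commit()
```

Run from cron (or any scheduler), this keeps the database and the upload directory eventually consistent even when individual requests crash halfway.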

The topic is in fact very big and the question very broad. For small projects without extreme workloads, the usual solution is simply to restore from frequent backups whenever something goes wrong.

In large projects everything is more complicated, and it can't all be covered in one post.

The remaining failure modes (such as a file getting corrupted after the fact, etc.) are solved with essentially cluster-level methods: run three servers, and if a file on one becomes corrupted, a disk dies, etc., the version that matches on the other two servers is considered correct. Roughly the same duplication exists at the level of computers/processors in spacecraft and mainframes.

Read about the Netflix and Google architectures; they solve such problems all the time. (Netflix fights machine errors in an amusing way: it randomly kills hosts on its network, every day, so that the recovery process stays perfectly debugged.)

P.S. Whether it is necessary to implement such a thing depends heavily on your service and the expected level of its quality. It is one thing if users are messaging on a social network and once in a million posts a photo fails to display: the user will simply upload it again. It is quite another if you are running an air-defense network management service and one of the launchers receives a broken terrain-map file: that is very bad, and during exercises a rocket could fly into your office.

  • I have the MyISAM engine, and unfortunately it does not support transactions at the database level. In any case, I now have something to read, thanks! - Hit-or-miss
  • Without a transactional database there is not much point in trying to make the program itself transactional (it is possible, but it will be slower, with more extra code and more bugs). - strangeqargo