How to force a running program to flush the contents of its I/O buffers to disk with external means?

I have a long-running C program which opens a file at the beginning, writes out "interesting" stuff during execution, and closes the file just before it finishes. The code, compiled with gcc -o test test.c (gcc version 5.3.1), looks as follows:

    // contents of test.c
    #include <stdio.h>

    FILE *filept;

    int main()
    {
        filept = fopen("test.txt", "w");
        unsigned long i;
        for (i = 0; i < 1152921504606846976; ++i) {
            if (i == 0) { // This case is interesting!
                fprintf(filept, "Hello world\n");
            }
        }
        fclose(filept);
        return 0;
    }

The problem is that, since this is a scientific computation (think of searching for primes, or whatever is your favourite hard-to-crack stuff), it could really run for a very long time. Since I have determined that I am not patient enough, I would like to abort the current computation, but I would like to do this in an intelligent way, by somehow forcing the program by external means to flush out all the data that is currently in the OS buffer/disk cache, wherever it is.

Here is what I have tried (for this bogus program above, and of course not for the real deal which is currently still running):

pressing Ctrl+C; or sending kill -6 <PID> (and also kill -3 <PID>), as suggested by @BartekBanachewicz,

but after either of these approaches the file test.txt, created at the very beginning of the program, remains empty. This means that the contents of fprintf() were left in some intermediate buffer during the computation, waiting for some OS/hardware/software flush signal, but since no such signal was received, the contents disappeared. This also means that the comment made by @EJP

Your question is based on a fallacy. 'Stuff that is in the OS buffer/disk cache' won't be lost.

does not seem to apply here. Experience shows that stuff does indeed get lost.

I am using Ubuntu 16.04 and I am willing to attach a debugger to this process if that is possible and if it is safe to retrieve the data this way. Since I have never done such a thing before, I would appreciate it if someone could give me a detailed answer on how to get the contents flushed to disk safely and surely. I am open to other methods as well. There is no room for error here, as I am not going to rerun the program.

Note: Sure, I could have opened and closed the file inside the if branch, but that is extremely inefficient once you have many things to write. Recompiling the program is not possible, as it is still in the middle of some computation.

Note 2: the original version of this question asked the same thing in a slightly more abstract way related to C++, and was tagged as such (that is why people in the comments suggested std::flush(), which wouldn't help even if this was a C++ question). Well, I guess I made a major edit then.


Somewhat related: Will data written via write() be flushed to disk if a process is killed?

Best answer

Can I just add some clarity? Obviously months have passed, and I imagine your program isn't running any more ... but there's some confusion here about buffering which still isn't clear.

As soon as you use the stdio library and FILE *, you will by default have a fairly small (implementation dependent, but typically some KB) buffer inside your program which accumulates what you write and flushes it to the OS when it is full (or when the file is closed). When you kill your process, it is this buffer that gets lost.
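To make that concrete, here is a minimal sketch (not part of the original program) that reproduces the effect on a POSIX system. The helper name size_after_kill and the path passed to it are made up for illustration. It forks a child that writes one line with fprintf(), optionally calls fflush(), and is then killed with SIGKILL; the parent reports how many bytes actually reached the file.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child writes one line to `path`, optionally
 * fflush()es it, and is then killed with SIGKILL. Returns the size of the
 * file in bytes afterwards, or -1 on error. */
long size_after_kill(const char *path, int do_flush)
{
    int ready[2];                       /* pipe: child signals "I have written" */
    if (pipe(ready) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }

    if (pid == 0) {                     /* child */
        close(ready[0]);
        FILE *fp = fopen(path, "w");
        if (fp == NULL)
            _exit(1);
        fprintf(fp, "Hello world\n");   /* lands in the stdio buffer */
        if (do_flush)
            fflush(fp);                 /* hands the bytes to the kernel */
        if (write(ready[1], "x", 1) != 1)
            _exit(1);
        for (;;)
            pause();                    /* wait here until killed */
    }

    /* parent: wait for the child's signal, then kill it without warning */
    close(ready[1]);
    char c;
    ssize_t got = read(ready[0], &c, 1);
    close(ready[0]);
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    if (got != 1)
        return -1;

    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_size;
}
```

Assuming a stdio implementation that fully buffers regular files by default (glibc does), calling size_after_kill(path, 0) should report an empty file, while size_after_kill(path, 1) should report the 12 bytes of "Hello world\n", even though the child is killed the same way in both cases.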

If the data has been flushed to the OS, then it is kept in a Unix file buffer until the OS decides to persist it to disk (usually fairly soon), or someone runs the sync command. If you kill the power on your computer, then this buffer gets lost as well. You probably don't care about this scenario, because you probably aren't planning to yank the power! But this is what @EJP was talking about (re 'Stuff that is in the OS buffer/disk cache' won't be lost): your problem is the stdio cache, not the OS.
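If you did care about the power-loss case too, the two layers can be flushed one after the other. The following is only a sketch for POSIX systems; flush_to_disk is a name I made up, while fflush(), fileno() and fsync() are the standard calls.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: push pending data for `fp` through both layers.
 * Returns 0 on success, -1 on failure. */
int flush_to_disk(FILE *fp)
{
    /* Layer 1: stdio buffer inside the process -> OS file buffer. */
    if (fflush(fp) != 0)
        return -1;

    /* Layer 2: OS file buffer -> storage device. */
    if (fsync(fileno(fp)) != 0)
        return -1;

    return 0;
}
```

For the situation in the question only the first step matters: once fflush() has returned, killing the process no longer loses the data.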

In an ideal world, you'd write your app so that it called fflush() (or std::flush() in C++) at key points. In your example, you'd say:

    if (i == 0) { // This case is interesting!
        fprintf(filept, "Hello world\n");
        fflush(filept);
    }

which would cause the stdio buffer to flush to the OS. I imagine your real writer is more complex, and in that situation I would try to make the fflush happen "often but not too often". Too rarely, and you lose data when you kill the process; too often, and you lose the performance benefits of buffering if you are writing a lot.
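One possible way to get "often but not too often" is to flush on a time budget rather than on every write. This is a sketch under my own assumptions, not something from the question: the throttled_writer type, the throttled_write function and the max_age_seconds threshold are all invented for illustration.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical wrapper around a FILE * that remembers when it last flushed. */
typedef struct {
    FILE  *fp;
    time_t last_flush;        /* time of the most recent fflush() */
    double max_age_seconds;   /* flush once buffered data may be this old */
} throttled_writer;

/* Write `line` and flush only if the last flush is at least
 * max_age_seconds old.
 * Returns 1 if a flush happened, 0 if not, -1 on error. */
int throttled_write(throttled_writer *w, const char *line)
{
    if (fputs(line, w->fp) == EOF)
        return -1;

    time_t now = time(NULL);
    if (difftime(now, w->last_flush) >= w->max_age_seconds) {
        if (fflush(w->fp) != 0)
            return -1;
        w->last_flush = now;
        return 1;
    }
    return 0;
}
```

With max_age_seconds set to, say, 60, killing the process loses at most roughly the last minute of output, while a burst of many writes still benefits from buffering.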

In the situation you describe, where the program is already running and can't be stopped and rewritten, your only hope, as you say, is to stop it in a debugger. The details of what you need to do depend on the implementation of the standard library, but you can usually look inside the FILE *filept object and start following pointers, though it is messy. @ivan_pozdeev's comment about executing std::flush() or fflush() within the debugger is helpful.
