How to measure sequential and parallel runtimes of a Haskell program

I am taking measurements of the Haskell program from this question to produce the following table summarising runtimes and speedups, so that I can plot them in a graph.

    #Cores   Runtimes   Speedups
                        Absolute   Relative
    Seq      ?          ..         ..
    1        3.712      ..         ..
    2        1.646      ..         ..

First question

While the runtimes on 1 and 2 cores are taken by compiling the program with the -threaded flag on ([3] and [4] below), I am not sure which time to take for the sequential one ([1] or [2] below):

- should it be the time obtained by compiling without the -threaded flag, or
- that obtained with the flag on but then NOT specifying any number of cores, i.e. with no -Nx?

Compiling without -threaded flag

    $ ghc --make -O2 test.hs

    [1] $ time ./test ## number of core = 1
    102334155
    real 0m4.194s
    user 0m0.015s
    sys 0m0.046s

Compiling with -threaded flag

    $ ghc --make -O2 test.hs -threaded -rtsopts

    [2] $ time ./test ## number of core = not sure?
    102334155
    real 0m3.547s
    user 0m0.000s
    sys 0m0.078s

    [3] $ time ./test +RTS -N1 ## number of core = 1
    102334155
    real 0m3.712s
    user 0m0.016s
    sys 0m0.046s

    [4] $ time ./test +RTS -N2 ## number of core = 2
    102334155
    real 0m1.646s
    user 0m0.016s
    sys 0m0.046s

Second question

As can be seen from above, I am using the time command to measure the runtimes. I am taking the 'real' time. But if I run the program with the -sstderr flag on, I get more detailed information:

    $ ghc --make -O2 test.hs -rtsopts
    $ ./test +RTS -sstderr
    102334155
           862,804 bytes allocated in the heap
             2,432 bytes copied during GC
            26,204 bytes maximum residency (1 sample(s))
            19,716 bytes maximum slop
                 1 MB total memory in use (0 MB lost due to fragmentation)

      Generation 0:     1 collections,     0 parallel,  0.00s,  0.00s elapsed
      Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

      INIT  time    0.00s  (  0.00s elapsed)
      MUT   time    3.57s  (  3.62s elapsed)
      GC    time    0.00s  (  0.00s elapsed)
      EXIT  time    0.00s  (  0.00s elapsed)
      Total time    3.57s  (  3.62s elapsed)

      %GC time       0.0%  (0.0% elapsed)

      Alloc rate    241,517 bytes per MUT second

      Productivity 100.0% of total user, 98.6% of total elapsed

I believe that the -sstderr provides a more accurate time which I should use instead of the time command. Am I correct? Also, which of the 'Total time' (3.57s or 3.62s) should I use?

And finally, any general advice/good practice while taking measurements like this? I am aware that there are some packages that allow us to benchmark our program, but I am mainly interested in taking the measurements manually (or using a script to do that for me).

Also: the runtimes are the median of running the program 3 times.

Best answer

I would use -N1 for the single-core time. I believe that also constrains the GC to use one core (which seems fitting for the benchmark, I think?), but others may know more.
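For the speedup columns of the table, the usual convention is that relative speedup on n cores is T(1)/T(n), with T(1) being the -N1 time, while absolute speedup is T(seq)/T(n). A minimal sketch of that arithmetic, using the timings quoted in the question: the `speedup` helper is a made-up name, and treating the non-threaded time [1] as the sequential baseline is an assumption for illustration, not a recommendation.

```haskell
module Main (main) where

import Text.Printf (printf)

-- | Speedup of a run taking @t@ seconds against a baseline of @base@ seconds.
-- (Hypothetical helper, not part of any library.)
speedup :: Double -> Double -> Double
speedup base t = base / t

main :: IO ()
main = do
  let tSeq = 4.194  -- [1] non-threaded build; ASSUMED sequential baseline
      t1   = 3.712  -- [3] threaded build, +RTS -N1
      t2   = 1.646  -- [4] threaded build, +RTS -N2
  printf "cores  runtime  absolute  relative\n"
  printf "1      %.3f    %.2f      %.2f\n" t1 (speedup tSeq t1) (speedup t1 t1)
  printf "2      %.3f    %.2f      %.2f\n" t2 (speedup tSeq t2) (speedup t1 t2)
```

Whichever baseline you pick for the absolute column, state it next to the graph, since the two candidates ([1] and [2]) differ noticeably.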

As for your second question, the answer to benchmarking in Haskell is nearly always to use criterion. Criterion will allow you to time one run of the program, and you can then wrap it in a script which runs the program with -N1, -N2, etc. Taking the median of 3 runs is okay as a very quick and rough indicator, but if you want to rely on the results then you'll need a lot more runs than that. Criterion runs your code enough and performs the appropriate statistics to give you a sensible average time, as well as confidence intervals and standard deviation (and it tries to correct for how busy your machine is). I know you asked about best practice for doing it yourself, but Criterion already embodies a lot of it: use clock time, benchmark a lot, and as you realised, don't just take a simple mean of the results.
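If you do want to script the measurements by hand, the loop might look roughly like the sketch below. It times whole-process wall-clock runs of the binary for each -N setting and reports the median. Assumptions: the binary is `./test` as in the question and was built with -threaded -rtsopts; `runs = 10` is an arbitrary choice; `median` and `timeRun` are illustrative names. It relies only on `GHC.Clock` from base (4.11 or later) and on the `process` package that ships with GHC.

```haskell
module Main (main) where

import Control.Monad (forM_, replicateM)
import Data.List (sort)
import GHC.Clock (getMonotonicTime)  -- monotonic wall-clock time in seconds
import System.Process (callProcess)  -- runs a command and waits for it to exit
import Text.Printf (printf)

-- | Median of a non-empty list (mean of the two middle values for even length).
median :: [Double] -> Double
median [] = error "median: empty list"
median xs
  | odd n     = sorted !! mid
  | otherwise = (sorted !! (mid - 1) + sorted !! mid) / 2
  where
    sorted = sort xs
    n      = length xs
    mid    = n `div` 2

-- | Wall-clock seconds taken by one run of the given command.
timeRun :: FilePath -> [String] -> IO Double
timeRun exe args = do
  start <- getMonotonicTime
  callProcess exe args
  end <- getMonotonicTime
  pure (end - start)

main :: IO ()
main = do
  let exe  = "./test"  -- binary name taken from the question
      runs = 10        -- arbitrary; more runs give a steadier median
  forM_ [1, 2 :: Int] $ \cores -> do
    times <- replicateM runs (timeRun exe ["+RTS", "-N" ++ show cores])
    printf "-N%d: median %.3fs over %d runs\n" cores (median times) runs
```

This measures elapsed time including process start-up, which is the same quantity as the 'real' line from time. It does none of the outlier analysis that Criterion performs, so treat the numbers as a rough guide.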

Criterion requires very little change to your program if you want to benchmark the whole thing. Add this:

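    import Criterion.Main

    main :: IO ()
    -- Recent criterion versions expect a Benchmarkable here,
    -- so the IO action is wrapped with whnfIO.
    main = defaultMain [bench "My program" (whnfIO oldMain)]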

where oldMain is whatever your main function used to be.
