

Message boards : Sierpinski/Riesel Base 5 Problem : 400K FFT gets less credit than 384K FFT

axn
Volunteer developer

Joined: 29 Dec 07
Posts: 285
ID: 16874
Credit: 28,027,106
RAC: 0

Message 74603 - Posted: 17 Mar 2014 | 5:10:19 UTC

See this work unit: http://www.primegrid.com/workunit.php?wuid=385715431

It uses a 400K FFT, has a slightly higher runtime compared to 384K FFT (as expected), but gets about 7% less credit.

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 13620
ID: 53948
Credit: 267,817,725
RAC: 323,446

Message 74605 - Posted: 17 Mar 2014 | 6:14:29 UTC - in response to Message 74603.

See this work unit: http://www.primegrid.com/workunit.php?wuid=385715431

It uses a 400K FFT, has a slightly higher runtime compared to 384K FFT (as expected), but gets about 7% less credit.

What number are you comparing it against? Make sure you're not comparing Riesel to Sierpinski numbers as they have different speeds with the same FFT size.

That being said, the benchmark speeds don't always follow a linear progression. In other words, when you adjust for the size of the number being tested, larger FFT sizes aren't always slower than smaller FFT sizes. There's certainly a general trend that the calculations slow down as the FFT gets larger, but that's not true for every FFT size. Furthermore, different computers process different numbers at different speeds (and, in fact, sometimes use different FFT sizes), so benchmarks that completely flatten the credit curve on one machine will certainly show some fluctuations on other machines.

That's what you're seeing. On that computer, those particular numbers are processed comparatively faster than they were on the machine that ran the benchmarks, so you're seeing it as being "credit-poor" on the 400K FFTs (or at least some of them), and credit rich on the 384K FFTs. That won't be true on all computers.

Unfortunately, there's not much we can do about it. It's not possible to make a system that has perfectly linear credit to time ratios for all numbers on all computers. This is the best we can do, and it's certainly much better than anything we've ever had before.

(That last part might not be entirely true. I've been wondering whether it's necessary to treat zero-padded and all-complex FFTs differently. Right now, they're both lumped together and if there's a significant difference in speed that will cause some irregularities.)

EDIT: At least for the Riesel SR5's, the variation in FFT speed on the benchmark machine can't be explained by a difference between zero-padded and all-complex FFTs since all of the benchmarks were zero-padded. On the Sierpinski side, we did use both types, but the pattern (400 being slower than both 384 and 448) is very similar to the speeds on the Riesel side.
____________
My lucky number is 75898*5^24288+1

axn
Volunteer developer

Joined: 29 Dec 07
Posts: 285
ID: 16874
Credit: 28,027,106
RAC: 0

Message 74617 - Posted: 17 Mar 2014 | 11:27:48 UTC

You can compare it against this workunit, for example: http://www.primegrid.com/workunit.php?wuid=385995908

That is another Riesel. Same-ish number of bits, with the difference being the FFT size (host id 422300 is mine, so I'm comparing against that). I cannot believe that _any_ machine would've benchmarked 400K as 7% faster than 384K. You need to recheck your benchmarks.

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 13620
ID: 53948
Credit: 267,817,725
RAC: 323,446

Message 74619 - Posted: 17 Mar 2014 | 12:31:11 UTC - in response to Message 74617.

You can compare it against this workunit, for example: http://www.primegrid.com/workunit.php?wuid=385995908

That is another Riesel. Same-ish number of bits, with the difference being the FFT size (host id 422300 is mine, so I'm comparing against that). I cannot believe that _any_ machine would've benchmarked 400K as 7% faster than 384K. You need to recheck your benchmarks.

Reality doesn't require belief. :)

All the newer benchmarks (which includes the SR5s) are done under reproducible conditions and all are at least doublechecked.

EDIT:

The test conditions for the benchmark, if you're interested, are:

Intel Core i5 4670K CPU
1600 MHz dual channel DDR3 memory
Windows 7 professional 64 bit running in safe mode
All fans running at higher than normal speed to minimize thermal effects on speed
1 core running LLR 3.8.9 and the other cores idle

Test runs against known benchmarks are first run to verify the system is performing as expected. LLR iteration times on the second set of numbers (at 20,000 iterations) are used for all timings. The current speed must match the known benchmark to within 0.001 ms (the minimum resolution that LLR reports).

All benchmarks are run at least twice, and the reference benchmark is also run after the other tests are completed to ensure nothing changed while the tests were running.

If you think I can improve upon that test regimen I am open to suggestions.

axn
Volunteer developer

Joined: 29 Dec 07
Posts: 285
ID: 16874
Credit: 28,027,106
RAC: 0

Message 74632 - Posted: 17 Mar 2014 | 16:49:55 UTC - in response to Message 74619.

Can you post the results of the benchmark? I would like to doublecheck against other hardware.

1 core running LLR 3.8.9 and the other cores idle

This is probably not real-world applicable. You should test it with all cores firing (to accurately model cache/bandwidth/turbo boost effects). Nonetheless, I'm surprised that 400K is faster than 384K, even under this condition!

PS:- You should check your own real life work units. They back me up ;-)

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 13620
ID: 53948
Credit: 267,817,725
RAC: 323,446

Message 74636 - Posted: 17 Mar 2014 | 18:20:30 UTC - in response to Message 74632.

Can you post the results of the benchmark? I would like to doublecheck against other hardware.

1 core running LLR 3.8.9 and the other cores idle

This is probably not real-world applicable. You should test it with all cores firing (to accurately model cache/bandwidth/turbo boost effects). Nonetheless, I'm surprised that 400K is faster than 384K, even under this condition!

PS:- You should check your own real life work units. They back me up ;-)

I'm not looking for "real life". Actual crunching conditions involve lots of variables that are far more unpredictable as you move from host to host. If I do benchmarks on dual core CPUs it won't hold up for quad cores. If I do benchmarks on quad core dual channel CPUs it won't hold up for quad core quad channel CPUs. If I do benchmarks for quad core quad channel CPUs it won't hold up for hex core CPUs.

Lowest common denominator is single core execution on an idle system.

What I'm looking for is a stable baseline. Nothing we do will work perfectly under all circumstances.

Here's an example with these FFTs on SR5:
llr64 -d -q"330268*5^1379919-1"
FFT 384K, 964,528 digits, 3,204,092 iterations, 1.929 ms/iteration, 6,181 seconds run time, 0.075260 uncorrected credit/sec, 1.364998 ccf (credit correction factor)

llr64 -d -q"325922*5^1572460-1"
FFT 400K, 1,099,108 digits, 3,651,158 iterations, 2.045 ms/iteration, 7,467 seconds run time, 0.080896 c/s, 1.269895 ccf

llr64 -d -q"325922*5^1637856-1"
FFT 448K, 1,144,818 digits, 3,803,003 iterations, 2.301 ms/iteration, 8,751 seconds run time, 0.074886 c/s, 1.371813 ccf
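The run times and credit/sec figures above follow directly from the raw iteration data. A minimal sketch in Python, assuming the (digits/10000)^2 / 20 uncorrected-credit formula Michael spells out later in the thread:

```python
# Reproduce the SR5 benchmark figures from the posted raw data:
# run time = iterations * ms/iter, uncorrected credit = (digits/10000)^2 / 20.
benchmarks = {
    "384K": {"digits": 964_528,   "iters": 3_204_092, "ms_per_iter": 1.929},
    "400K": {"digits": 1_099_108, "iters": 3_651_158, "ms_per_iter": 2.045},
    "448K": {"digits": 1_144_818, "iters": 3_803_003, "ms_per_iter": 2.301},
}

credit_per_sec = {}
for fft, b in benchmarks.items():
    runtime = b["iters"] * b["ms_per_iter"] / 1000   # seconds
    credit = (b["digits"] / 10_000) ** 2 / 20        # uncorrected credit
    credit_per_sec[fft] = credit / runtime
    print(f"{fft}: {runtime:7,.0f} s  {credit_per_sec[fft]:.6f} credit/sec")
```

Note that 400K comes out with the *highest* uncorrected credit/sec (~0.0809) on the benchmark machine, which is why it gets the smallest correction factor.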


Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 13620
ID: 53948
Credit: 267,817,725
RAC: 323,446

Message 74637 - Posted: 17 Mar 2014 | 18:26:17 UTC

If you are trying to convince me that your computer should be the baseline computer, feel free to put it in a box and mail it to me. I'll return it when I leave PrimeGrid. :)

If you can find a problem with my test methodology, I'm listening.

Otherwise, you'll just have to accept what I've known all along: there are going to be variations between how fast different computers run different tests, and therefore only one machine -- mine -- is going to have a perfectly flat credit graph. Every other computer is going to have variations. If you have a better way of making this work, again, I'm listening.

EDIT: Let's not forget how horrible the credit used to be when we were using the BOINC credit function. It's much, much better now. 2 years ago people would have been jumping for joy if the credit function had only 7% error.

axn
Volunteer developer

Joined: 29 Dec 07
Posts: 285
ID: 16874
Credit: 28,027,106
RAC: 0

Message 74671 - Posted: 18 Mar 2014 | 11:35:49 UTC

Mike, Can you post the actual benchmark numbers on which the credits are based (I assume SR5 has a separate set of numbers)?

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 13620
ID: 53948
Credit: 267,817,725
RAC: 323,446

Message 74675 - Posted: 18 Mar 2014 | 12:46:27 UTC - in response to Message 74671.

Mike, Can you post the actual benchmark numbers on which the credits are based (I assume SR5 has a separate set of numbers)?

That's what I posted. Those are the actual tests used for the benchmarks.

axn
Volunteer developer

Joined: 29 Dec 07
Posts: 285
ID: 16874
Credit: 28,027,106
RAC: 0

Message 74676 - Posted: 18 Mar 2014 | 13:51:28 UTC

But those show 400K as slower than 384K (2.045 ms/iteration vs 1.929 ms/iteration), hence it should be getting more credit, not less.

JimB
Honorary cruncher

Joined: 4 Aug 11
Posts: 916
ID: 107307
Credit: 974,532,191
RAC: 2

Message 74681 - Posted: 18 Mar 2014 | 14:44:06 UTC

It's not just the timing of a single iteration. It's that time multiplied by the number of iterations. That's why Mike gives the total time and the credit per second. The CCF is to bring all the credit/second into line with each other.

axn
Volunteer developer

Joined: 29 Dec 07
Posts: 285
ID: 16874
Credit: 28,027,106
RAC: 0

Message 74683 - Posted: 18 Mar 2014 | 15:41:19 UTC

How are c/s and ccf calculated?

JimB
Honorary cruncher

Joined: 4 Aug 11
Posts: 916
ID: 107307
Credit: 974,532,191
RAC: 2

Message 74697 - Posted: 18 Mar 2014 | 20:28:30 UTC

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 13620
ID: 53948
Credit: 267,817,725
RAC: 323,446

Message 74698 - Posted: 18 Mar 2014 | 20:31:21 UTC - in response to Message 74683.

How are c/s and ccf calculated?

Credit (in this case uncorrected credit) is defined as (digits/10000)^2 / 20. Seconds is shown in the data I posted, and is simply the number of iterations multiplied by the iteration time. You may recognize this as also being the point formula used for PRPNet tasks, which is divided by 20 when converted into BOINC credits.

You will notice that in all the cases shown, c/s * ccf equals the same number -- in other words, after applying the correction factor ccf all of the SR5 tasks produce the exact same credit per second (~0.1027). CCF (call it Cx for FFT of size x) is defined as Csgs/Cx, where Csgs is the credits/second of an SGS task and Cx is the uncorrected credits/second of FFT x. By definition, the CCF for SGS is 1.0000.

(That isn't the full credit formula, and there's a further scaling factor applied later. But this is the part that equalizes the credit between FFT sizes and subprojects.)
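As a sketch, the CCF definition can be checked against the benchmark numbers posted earlier in the thread. The exact SGS reference credit/sec isn't given (only "~0.1027"); the 0.10273 below is back-derived from the posted c/s × ccf products, so treat it as an assumption:

```python
# ccf_x = C_sgs / C_x: the correction factor that brings each FFT's
# uncorrected credit/sec in line with the reference SGS task.
C_SGS = 0.10273  # assumed; back-derived from the posted c/s * ccf products

posted_cs = {"384K": 0.075260, "400K": 0.080896, "448K": 0.074886}

ccfs = {fft: C_SGS / cs for fft, cs in posted_cs.items()}
for fft, cs in posted_cs.items():
    # after correction, every FFT size yields the same credit per second
    print(f"{fft}: ccf = {ccfs[fft]:.4f}, corrected c/s = {cs * ccfs[fft]:.5f}")
```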

axn
Volunteer developer

Joined: 29 Dec 07
Posts: 285
ID: 16874
Credit: 28,027,106
RAC: 0

Message 74735 - Posted: 19 Mar 2014 | 14:08:18 UTC - in response to Message 74698.

Credit (in this case uncorrected credit) is defined as (digits/10000)^2 / 20.

This is the crux of the problem. Calibrating with the c*d^2 formula is sensitive to the size of the number used for the benchmark: had you used a larger (or smaller) number when benchmarking a particular FFT, it would have yielded a different correction factor.

The correct (and simpler) method is to use a c(FFT)*d formula for calibrating.

Since we know the iteration time for each FFT, we can calculate how much credit each iteration of the FFT should get.
Eg:- 400K FFT has 2.045 ms/iter ==> 489 iter/sec. With a target c/s of 0.1027, each iteration should get 2.1*10^-4 credits. So c(400K) = 2.1*10^-4.
Credit awarded will be c(FFT) * #iterations (times bonus factors).

Similarly for other FFTs. Note that the c(FFT) themselves can be easily calculated as time(FFT) * scaling factor, where the scaling factor is constant (not dependent on FFTs).

PS:- I don't know how easy (or difficult) it will be to implement such a change.
PPS:- Since the 400K benchmark used a n that is closest in size to what is currently being crunched, I suspect that my proposal will keep the current 400K credit, but drop the credit for other FFTs :-( Oh well, at least things will be consistent.
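The proposed per-iteration calibration can be sketched in a few lines of Python. The 0.1027 target credit/sec and the 2.045 ms/iter figure are taken from the example above; everything else follows from them:

```python
TARGET_CS = 0.1027  # target credit per second, from the example above

def credit_per_iteration(ms_per_iter: float) -> float:
    """c(FFT): the credit one iteration at this FFT size should earn.
    Equivalent to time(FFT) * scaling_factor with a constant scaling factor."""
    return TARGET_CS * (ms_per_iter / 1000)

c_400k = credit_per_iteration(2.045)   # ms/iter for the 400K FFT benchmark
print(f"c(400K) = {c_400k:.2e} credits per iteration")

# Credit awarded is then linear in the work actually done:
iterations = 3_651_158                 # the 400K benchmark test
print(f"uncorrected task credit = {c_400k * iterations:.1f}")
```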

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 13620
ID: 53948
Credit: 267,817,725
RAC: 323,446

Message 74738 - Posted: 19 Mar 2014 | 15:46:00 UTC

Noted and understood.

I'll need to think about that and probably do some tests. I'm not convinced you're correct, but I'm also not convinced you're wrong.

It will be a while -- months -- until I can look at this. Right now there's three apps that are broken, major issues with the way the BOINC server is interacting with Mac BOINC clients, and we really, really, really want to get database replication running before the next challenge. Credit formula improvements will have to wait.

Thanks for the help.

axn
Volunteer developer

Joined: 29 Dec 07
Posts: 285
ID: 16874
Credit: 28,027,106
RAC: 0

Message 74743 - Posted: 19 Mar 2014 | 17:34:47 UTC - in response to Message 74738.

It will be a while -- months -- until I can look at this. Right now there's three apps that are broken, major issues with the way the BOINC server is interacting with Mac BOINC clients, and we really, really, really want to get database replication running before the next challenge. Credit formula improvements will have to wait.

Understood.

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 13620
ID: 53948
Credit: 267,817,725
RAC: 323,446

Message 75014 - Posted: 29 Mar 2014 | 16:02:24 UTC

Axn, I think you're right.

In the following, digits is the number of digits in the candidate, or, more precisely, log10(k)+log10(b)*n. FFT is the iteration time of an LLR test at the FFT being used, and X is a constant to make the number come out to the correct credit value for the reference SGS task.

Currently the formula is essentially digits^2 * FFT * X. You're saying it should be digits * FFT * X2. (X2 would be a different constant than X).

Since run time is exactly the number of iterations multiplied by the iteration time, and the number of iterations is proportional to the number of digits, the credit needs to be proportional to digits * FFT (i.e., iterations * iteration_time).

This will take some time to implement because I need to rerun a lot of the benchmarks on my new benchmark machine, as well as changing the formula on the server.

Thanks for pointing out the error.
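A quick sketch of the error: with credit proportional to digits^2 but run time proportional to digits (at a fixed FFT size), credit/second grows linearly with the candidate size, so a correction factor measured on one benchmark candidate is wrong for candidates of other sizes. The iterations-per-digit constant below is illustrative only; it cancels out of the comparison:

```python
def old_credit(digits: int) -> float:
    # the quadratic formula: (digits/10000)^2 / 20
    return (digits / 10_000) ** 2 / 20

def runtime(digits: int, ms_per_iter: float,
            iters_per_digit: float = 3.32) -> float:
    # iterations scale linearly with digits at a fixed FFT size;
    # 3.32 iters/digit is an illustrative constant that cancels below
    return digits * iters_per_digit * ms_per_iter / 1000

# Same FFT size (1.929 ms/iter), two different candidate sizes:
cs = {d: old_credit(d) / runtime(d, 1.929) for d in (950_000, 1_150_000)}
for d, v in cs.items():
    print(f"{d:>9} digits: old credit/sec = {v:.6f}")

# The credit/sec ratio equals the digit-count ratio exactly, showing the
# old formula's credit/sec drifts linearly with candidate size:
print(cs[1_150_000] / cs[950_000])  # == 1_150_000 / 950_000, up to float rounding
```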

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 13620
ID: 53948
Credit: 267,817,725
RAC: 323,446

Message 75017 - Posted: 29 Mar 2014 | 18:34:54 UTC - in response to Message 74603.

See this work unit: http://www.primegrid.com/workunit.php?wuid=385715431

It uses a 400K FFT, has a slightly higher runtime compared to 384K FFT (as expected), but gets about 7% less credit.

Going back to your original question, under the current formula, this WU got 449.99 credit. With the new formula, it gets 448.29 credit.

A change, yes, but not a significant change.

The fun starts when you look at your example of the 384K FFT test. It got 484.54 credit, but with the new formula it gets 423.227, which is more reasonable.

The error in the formula caused some of the FFT adjustment constants to be out of line, which was causing some of the credit variations you observed.

We'll fix it, but I can't promise when.


Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 13620
ID: 53948
Credit: 267,817,725
RAC: 323,446

Message 75297 - Posted: 8 Apr 2014 | 13:46:34 UTC

Jim and I just put this change into production.

axn
Volunteer developer

Joined: 29 Dec 07
Posts: 285
ID: 16874
Credit: 28,027,106
RAC: 0

Message 75333 - Posted: 9 Apr 2014 | 3:14:32 UTC

With this change, I'm getting a consistent 25.5-26.5 sec/credit across the different FFT sizes!

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 13620
ID: 53948
Credit: 267,817,725
RAC: 323,446

Message 75337 - Posted: 9 Apr 2014 | 3:51:22 UTC - in response to Message 75333.

With this change, I'm getting a consistent 25.5-26.5 sec/credit across the different FFT sizes!

A few minutes ago I loaded in new FFT adjustments based on benchmarks run with 4 cores running rather than 1. That will increase credit for longer tasks, which were being penalized more than shorter tasks for running multiple instances.

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 13620
ID: 53948
Credit: 267,817,725
RAC: 323,446

Message 75341 - Posted: 9 Apr 2014 | 4:26:10 UTC - in response to Message 75333.

With this change, I'm getting a consistent 25.5-26.5 sec/credit across the different FFT sizes!

That's good validation that the formula is actually working since the test cases used for the benchmarks are the same as before.
