Generalized Fermat Prime Search: GeneferCUDA Block Size Setting
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13879 ID: 53948 Credit: 383,481,681 RAC: 118,164
There is a setting on the PrimeGrid preferences page called "GeneferCUDA Block Size". This can be used to vary the kernel size within GeneferCUDA.
For any specific computer, there is a particular value of the block size that produces the fastest calculations at each value of n. This optimal value is likely to be different for the Short work units and the Long (World Record) work units.
Besides changing the speed of the calculation, making this number larger generally reduces the amount of CPU usage, but may increase screen lag.
Decreasing this number may help with screen lag, but may increase CPU usage.
The default value is zero, which lets Genefer choose the best value. Under Windows, this value is 7 for the short (N=524288) work units and 8 for the long (World Record, N=4194304) work units. The default values for both Linux and Mac are 7 for the short work units and 9 for the long work units.
Usually, there will be no need to change this value. However, as we start searching the larger N values we will be working in an environment that is new, and it may be necessary to adjust these values. Therefore, it may be beneficial for this setting to be easily adjustable.
EDIT: Updated with more up to date information about the effects of changing the block size.
____________
My lucky number is 75898^524288+1
http://www.primegrid.com/result.php?resultid=339565550
Here's the first WU I've run since I changed the block size to 10. From what I can see there's absolutely no difference, but then I'm not sure I changed the block size enough or what I need to be looking at on the unit. It's also possible the older WUs didn't take advantage of the block size. If 10 is the default size for the unit, then I see why it made no difference.
So, suggestions on block sizes to try?
Michael Goetz Volunteer moderator Project administrator
http://www.primegrid.com/result.php?resultid=339565550
Here's the first WU I've run since I changed the block size to 10. From what I can see there's absolutely no difference, but then I'm not sure I changed the block size enough or what I need to be looking at on the unit. It's also possible the older WUs didn't take advantage of the block size. If 10 is the default size for the unit, then I see why it made no difference.
Read the last sentence of the first post. ;-)
So, suggestions on block sizes to try?
Depends on what you want to accomplish. If you're not experiencing lag that makes you want to throw your computer out the window, leave it set to 0.
If you do have a bad lag problem, and you're running Windows, I'd suggest looking for an external cause for the lag (See Neometal*'s and Pyrus's posts for examples).
That being said, here's what I think you'll see if you start changing this setting:
For the current WUs, the actual setting used is 8. That's what is used if you leave the setting at 0. (Once 1.05 goes live you'll see a message in stderr telling you what value it's using.) I tried setting it to 7, and then 6, with no noticeable effect. GPU utilization stayed at 99%. I don't have a lag problem, so I can't say if 7 or 6 would have helped, but it didn't appear to slow anything down. At 5, however, GPU utilization dropped to about 66%, and CPU usage shot way up. WUs will therefore take 50% longer on my computer -- and use a full CPU core too. I have not tested anything below 5.
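The "50% longer" figure follows from the utilization numbers if we assume, as a rough approximation, that WU run time t scales inversely with GPU utilization u:

```latex
\frac{t_{\mathrm{SHIFT}=5}}{t_{\mathrm{SHIFT}=8}} \approx \frac{u_{\mathrm{SHIFT}=8}}{u_{\mathrm{SHIFT}=5}} = \frac{0.99}{0.66} = 1.5
```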
Raising the setting above 8 theoretically makes Genefer a little more efficient and uses less CPU, but may also incur more lag. Since the GPU runs at 99% at the default setting of 8, the only reason to try higher than 8 would be to lower CPU usage.
The software, so far, allows you to set a value between 2 and 15. I have not tested all of those values, so it's entirely possible that extreme settings may cause the program or computer to crash.
I'm not quite ready to put this out there for beta testing just yet, but it probably won't be more than another day or two unless something horrible is discovered. If anyone has a pressing reason that they need this ability, let me know.
Once you have the right version of Genefer, three things need to happen, in this order, for a change in this setting to take effect:
1) You must change the setting on the PrimeGrid preferences webpage,
2) Your Boinc client must do an update because of either an automatic scheduler request, or manually hitting the Update button on the project tab.
3) The GeneferCUDA program must start or restart. If a WU is already running, you can suspend and resume the WU to cause the new setting to take effect.
____________
First, I'm not concerned with lag, as the system running my GPU isn't used for anything except running PrimeGrid (lucky, I know). Second, I made the change to my preferences and then waited until I had received and finished new units before posting, so I assumed that once a new job started with the changed block size, the setting would affect that unit. My goal is to run the work units as fast as possible at the current level of my GPU, so based on your comments I was hoping that the block size would make a difference.
So apparently I'm not testing things like I need to.
Michael Goetz Volunteer moderator Project administrator
Raising the setting above 8 might show a reduction in CPU usage, but you're not going to get the app to run much faster than it already is since the utilization is pegged at 99% right now.
In any event, the setting doesn't do anything until 1.05 gets released.
The primary purpose for this is mitigating lag, if there turns out to be a problem at higher Ns.
Michael Goetz Volunteer moderator Project administrator
After some testing, it seems that, at least for N=262144, valid values for the block size are 3 through 10. Anything outside of that causes an error. I'll have to wait for the WU to complete to be sure those values are OK.
If you enter a value outside of that range, the setting will be ignored.
It appears as if a value of 10 causes a slowdown of the GPU.
I'd recommend not changing the default setting, unless you're willing to do your own timing tests to see what works best on your hardware.
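A minimal sketch of the selection rule described in this thread, assuming that a setting of 0 means "use Genefer's built-in default" and that any value outside 3..10 is ignored. The function and table names are hypothetical (not taken from the Genefer source), and the 3..10 range was only checked at N=262144:

```python
# Hypothetical sketch: resolve the "GeneferCUDA Block Size" preference into the
# SHIFT value actually used. Not the real Genefer code.

# Defaults quoted in the first post of this thread, keyed by (platform, N).
DOCUMENTED_DEFAULTS = {
    ("windows", 524288): 7,
    ("windows", 4194304): 8,
    ("linux", 524288): 7,
    ("linux", 4194304): 9,
    ("mac", 524288): 7,
    ("mac", 4194304): 9,
}

# Range reported as working at N=262144; assumed here to apply in general.
MIN_SHIFT, MAX_SHIFT = 3, 10


def effective_shift(setting: int, default: int) -> int:
    """0 or an out-of-range value falls back to the default; otherwise use the setting."""
    if setting == 0 or not (MIN_SHIFT <= setting <= MAX_SHIFT):
        return default
    return setting


if __name__ == "__main__":
    win_long = DOCUMENTED_DEFAULTS[("windows", 4194304)]
    print(effective_shift(0, win_long))   # 8: 0 lets Genefer choose
    print(effective_shift(7, win_long))   # 7: valid value is honored
    print(effective_shift(15, win_long))  # 8: out of range, ignored
```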
____________
After some testing, it seems that, at least for N=262144, valid values for the block size are 3 through 10. Anything outside of that causes an error. I'll have to wait for the WU to complete to be sure those values are OK.
If you enter a value outside of that range, the setting will be ignored.
It appears as if a value of 10 causes a slowdown of the GPU.
I'd recommend not changing the default setting, unless you're willing to do your own timing tests to see what works best on your hardware.
I have dropped my block size to 9 to test since 10 may not be the best size for now. It will be tomorrow before I know.
Michael Goetz Volunteer moderator Project administrator
You might want to wait until 1.05 is released. The setting doesn't do anything with 1.04.
Michael Goetz Volunteer moderator Project administrator
Rick, you inspired me to add a new command line option: "-b2 n", where n is the small n in a GFN, namely b^(2^n)+1.
GeneferCUDA-boinc 1.05 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
Command line: GeneferCUDA-boinc-windows.exe -b2 18
Generalized Fermat Number Bench 2
SHIFT=3 710492^262144+1 Time: 27.7 ms/mul. Err: 5.00e-001 1533952 digits
SHIFT=4 710492^262144+1 Time: 7.06 ms/mul. Err: 4.21e-001 1533952 digits
SHIFT=5 710492^262144+1 Time: 1.89 ms/mul. Err: 4.21e-001 1533952 digits
SHIFT=6 710492^262144+1 Time: 1.05 ms/mul. Err: 4.21e-001 1533952 digits
SHIFT=7 710492^262144+1 Time: 1.04 ms/mul. Err: 4.21e-001 1533952 digits
SHIFT=8 710492^262144+1 Time: 1.11 ms/mul. Err: 4.21e-001 1533952 digits
SHIFT=9 710492^262144+1 Time: 1.76 ms/mul. Err: 4.21e-001 1533952 digits
SHIFT=10 710492^262144+1 Time: 3.83 ms/mul. Err: 4.21e-001 1533952 digits
A few things to note.
The test hits a rounding error when the block size parameter ("SHIFT") is 3. This may be an artifact of the testing, or it may be happening because the b value being used here is higher than our current search WUs.
If real WUs react to variations in SHIFT the same way as this benchmark, then at N=262144 SHIFT=7 is actually a little faster than the default setting (8) on my computer.
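For reference, the -b2 argument is the small n, so the benchmark's exponent is N = 2^n, and the digit counts in the log follow from floor(N * log10(b)) + 1. A small sketch (the function names are illustrative only):

```python
import math


def gfn_exponent(small_n: int) -> int:
    """N = 2**n for '-b2 n' (e.g. n=18 gives N=262144)."""
    return 2 ** small_n


def gfn_digits(b: int, big_n: int) -> int:
    """Decimal digits of b**big_n + 1, computed as floor(big_n * log10(b)) + 1.

    Assumes b >= 2 and big_n a power of two >= 4 (as for GFNs), so b**big_n is a
    perfect square and b**big_n + 1 can never be an exact power of 10.
    Double precision is adequate unless big_n * log10(b) is extremely close to an integer.
    """
    return math.floor(big_n * math.log10(b)) + 1


big_n = gfn_exponent(18)
print(big_n, gfn_digits(710492, big_n))  # 262144 1533952, matching the -b2 18 log
```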
Michael Goetz Volunteer moderator Project administrator
So the next question is, do the timing results reflect real WUs? Let's do some real tests, but at N=8192 (because it's much faster). I got rid of SHIFT=3 because it's way too slow and takes too long to run.
GeneferCUDA-boinc 1.05 beta 1 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
Command line: GeneferCUDA-boinc-windows.exe -b2 13
Generalized Fermat Number Bench 2
SHIFT=4 2009574^8192+1 Time: 349 us/mul. Err: 3.82e-001 51636 digits
SHIFT=5 2009574^8192+1 Time: 188 us/mul. Err: 3.82e-001 51636 digits
SHIFT=6 2009574^8192+1 Time: 161 us/mul. Err: 3.82e-001 51636 digits
SHIFT=7 2009574^8192+1 Time: 207 us/mul. Err: 3.82e-001 51636 digits
SHIFT=8 2009574^8192+1 Time: 403 us/mul. Err: 3.82e-001 51636 digits
GeneferCUDA-boinc.cu(193) : cufftSafeCall() CUFFT error: 6.
Two things of note there: SHIFT=6 is the fastest; it's better than twice as fast as the default value of 8. Also, the test blows up above 8.
Now, how does this compare to real world crunching? With SHIFT=8:
GeneferCUDA-boinc 1.05 beta 1 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
Command line: GeneferCUDA-boinc-windows.exe -shift 8 -q 2009574^8192+1
Testing 2009574^8192+1...
maxErr during b^N initialization = 0.0000 (0.023 seconds).
2009574^8192+1 is a probable composite. (RES=ab752d28c1e60445) (51636 digits) (err = 0.2070) (time = 0:01:09) 23:31:48
And with SHIFT=6:
GeneferCUDA-boinc 1.05 beta 1 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
Command line: GeneferCUDA-boinc-windows.exe -shift 6 -q 2009574^8192+1
Testing 2009574^8192+1...
maxErr during b^N initialization = 0.0000 (0.030 seconds).
2009574^8192+1 is a probable composite. (RES=ab752d28c1e60445) (51636 digits) (err = 0.2070) (time = 0:01:10) 23:34:12
Looks like we can't really rely on the benchmarks. Back to the drawing board...
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
Mike, is this setting only active when you change the default value?
I see no switch in my client_state.xml-file.
____________
Best wishes. Knowledge is power. by jjwhalen
Michael Goetz Volunteer moderator Project administrator
Mike, is this setting only active when you change the default value?
I see no switch in my client_state.xml-file.
Maybe. I think I noticed the same thing at the beginning.
Don't bother writing code for this as I'll give you all the code verbatim. It will be plug and play for the other builds.
Actually, that code's already in the repository, so you don't have to wait.
Now that I know what was wrong with last night's benchmark tests, that shouldn't be too long now. I just need to delete one line of code. Less is more...
Michael Goetz Volunteer moderator Project administrator
It seems the benchmarks were fine. The -shift parameter wasn't working right, so all of the real runs were with the default shift value.
You might want to sit down for this one...
At the default value of 8:
Command line: GeneferCUDA-boinc-windows.exe -shift 8 -q 2009574^8192+1
Testing 2009574^8192+1...
maxErr during b^N initialization = 0.0000 (0.023 seconds).
2009574^8192+1 is a probable composite. (RES=ab752d28c1e60445) (51636 digits) (err = 0.2070) (time = 0:01:09) 23:31:48
and with shift=6, which the new benchmark says is a lot faster:
Command line: GeneferCUDA-boinc-windows.exe -shift 6 -q 2009574^8192+1
Testing 2009574^8192+1...
maxErr during b^N initialization = 0.0000 (0.022 seconds).
2009574^8192+1 is a probable composite. (RES=ab752d28c1e60445) (51636 digits) (err = 0.2070) (time = 0:00:28) 07:01:18
What's astounding is that with shift=6, the GPU is only running at 72% yet Genefer is running a lot faster. It is using a whole CPU core, however.
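As a rough cross-check (assuming a WU's run time is dominated by the multiplications the benchmark times), the SHIFT=8 vs SHIFT=6 ratio from the N=8192 benchmark can be compared with the ratio of the two real run times. The helper name is illustrative:

```python
def to_seconds(hms: str) -> int:
    """Convert Genefer's 'time = h:mm:ss' field to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


# Per-multiplication times from the N=8192 '-b2 13' benchmark, in microseconds.
bench_us = {6: 161, 8: 403}
# Wall-clock times of the real 2009574^8192+1 runs.
real_s = {8: to_seconds("0:01:09"), 6: to_seconds("0:00:28")}

bench_ratio = bench_us[8] / bench_us[6]
real_ratio = real_s[8] / real_s[6]
print(f"benchmark predicts {bench_ratio:.2f}x, real run shows {real_ratio:.2f}x")
# benchmark predicts 2.50x, real run shows 2.46x
```

The two ratios agree to within about 2%, which is consistent with the benchmark being a usable guide at this N.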
The benchmarks are saying that at N=262144, I can get about a 5% speed increase by going to a shift value of 7. I'd like to know if the same is true for other people, so I'm putting 1.05 Beta 1 on my website. You can download it, and use the following command line to run the benchmarks:
GeneferCUDA-boinc-windows.exe -b2 18
You can download beta 1 here. For the command line benchmark, you only need the .exe file in that archive. To use with boinc (and the new setting), you need the .exe file and the app_info.xml (or use your own app_info).
Let me know what kind of results you see. I'll be doing a full set of benchmarks, from n=18 to n=22 and posting the results.
Mike
Michael Goetz Volunteer moderator Project administrator
Here are my benchmarks on a GTX 460 and a C2Q Q6600. YMMV.
Command line: GeneferCUDA-boinc-windows.exe -b2 18
Generalized Fermat Number Bench 2
SHIFT=4 710492^262144+1 Time: 7.01 ms/mul. Err: 4.21e-001 1533952 digits
SHIFT=5 710492^262144+1 Time: 1.9 ms/mul. Err: 4.21e-001 1533952 digits
SHIFT=6 710492^262144+1 Time: 1.05 ms/mul. Err: 4.21e-001 1533952 digits
SHIFT=7 710492^262144+1 Time: 1.04 ms/mul. Err: 4.21e-001 1533952 digits
SHIFT=8 710492^262144+1 Time: 1.1 ms/mul. Err: 4.21e-001 1533952 digits
SHIFT=9 710492^262144+1 Time: 1.74 ms/mul. Err: 4.21e-001 1533952 digits
SHIFT=10 710492^262144+1 Time: 3.81 ms/mul. Err: 4.21e-001 1533952 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 19
Generalized Fermat Number Bench 2
SHIFT=4 577098^524288+1 Time: 13.9 ms/mul. Err: 2.01e-001 3020555 digits
SHIFT=5 577098^524288+1 Time: 3.62 ms/mul. Err: 2.01e-001 3020555 digits
SHIFT=6 577098^524288+1 Time: 2.08 ms/mul. Err: 2.01e-001 3020555 digits
SHIFT=7 577098^524288+1 Time: 1.96 ms/mul. Err: 2.03e-001 3020555 digits
SHIFT=8 577098^524288+1 Time: 2.04 ms/mul. Err: 2.01e-001 3020555 digits
SHIFT=9 577098^524288+1 Time: 2.48 ms/mul. Err: 2.01e-001 3020555 digits
SHIFT=10 577098^524288+1 Time: 4.54 ms/mul. Err: 2.01e-001 3020555 digits
(Yes, I see the anomaly in maxErr. That's not important.)
Command line: GeneferCUDA-boinc-windows.exe -b2 20
Generalized Fermat Number Bench 2
SHIFT=4 468750^1048576+1 Time: 27.7 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=5 468750^1048576+1 Time: 7.05 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=6 468750^1048576+1 Time: 4.25 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=7 468750^1048576+1 Time: 3.94 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=8 468750^1048576+1 Time: 3.88 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=9 468750^1048576+1 Time: 4.13 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=10 468750^1048576+1 Time: 5.93 ms/mul. Err: 1.64e-001 5946413 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 21
Generalized Fermat Number Bench 2
SHIFT=4 380742^2097152+1 Time: 55.3 ms/mul. Err: 3.63e-001 11703432 digits
SHIFT=5 380742^2097152+1 Time: 14.1 ms/mul. Err: 3.63e-001 11703432 digits
SHIFT=6 380742^2097152+1 Time: 8.88 ms/mul. Err: 3.63e-001 11703432 digits
SHIFT=7 380742^2097152+1 Time: 8.31 ms/mul. Err: 3.63e-001 11703432 digits
SHIFT=8 380742^2097152+1 Time: 8.09 ms/mul. Err: 3.63e-001 11703432 digits
SHIFT=9 380742^2097152+1 Time: 8.2 ms/mul. Err: 3.63e-001 11703432 digits
SHIFT=10 380742^2097152+1 Time: 9.66 ms/mul. Err: 3.63e-001 11703432 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 22
Generalized Fermat Number Bench 2
SHIFT=4 309258^4194304+1 Time: 110 ms/mul. Err: 1.56e-001 23028076 digits
SHIFT=5 309258^4194304+1 Time: 27.7 ms/mul. Err: 1.56e-001 23028076 digits
SHIFT=6 309258^4194304+1 Time: 17.8 ms/mul. Err: 1.56e-001 23028076 digits
SHIFT=7 309258^4194304+1 Time: 16.6 ms/mul. Err: 1.56e-001 23028076 digits
SHIFT=8 309258^4194304+1 Time: 16.2 ms/mul. Err: 1.56e-001 23028076 digits
SHIFT=9 309258^4194304+1 Time: 15.9 ms/mul. Err: 1.56e-001 23028076 digits
SHIFT=10 309258^4194304+1 Time: 16 ms/mul. Err: 1.56e-001 23028076 digits
It should be noted that although the benchmark code is very, very similar to the real code, it's not identical. There may be small differences between what the benchmark shows and the real processing.
____________
Holy crap :) those are some MAJOR differences in run times. Wish I hadn't gone to bed and read this first. I will have to test this out this evening if I can.
How soon will 1.05 be published?
It looks to me like part of the difference will depend on each GPU card and how it handles the blocks, because in your case 6 seems to be the magic number, not 8.
Michael Goetz Volunteer moderator Project administrator
Holy crap :) those are some MAJOR differences in run times. Wish I hadn't gone to bed and read this first. I will have to test this out this evening if I can.
How soon will 1.05 be published?
It looks to me like part of the difference will depend on each GPU card and how it handles the blocks, because in your case 6 seems to be the magic number, not 8.
You can get the beta 1 from my website. Link is above.
You might want to wait for beta 2 (working on it right now). I changed the benchmark routine to use the real code, so the timings will match actual WUs exactly.
Also, 6 was the best choice at N=8192. We're not running WUs at N=8192. For our current WUs, setting the block size to 7 is better. On my computer, anyway.
Michael Goetz Volunteer moderator Project administrator
You may now download 1.05 Beta 2. It uses the actual computation software for the benchmarks, so they should be a very accurate representation of how quickly the software will run real WUs.
Here's the benchmark results:
Command line: GeneferCUDA-boinc-windows.exe -b2 18
Generalized Fermat Number Bench 2
SHIFT=5 710492^262144+1 Time: 1.89 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=6 710492^262144+1 Time: 1.04 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=7 710492^262144+1 Time: 1.01 ms/mul. Err: 2.03e-001 1533952 digits
SHIFT=8 710492^262144+1 Time: 1.05 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=9 710492^262144+1 Time: 1.66 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=10 710492^262144+1 Time: 3.63 ms/mul. Err: 1.88e-001 1533952 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 19
Generalized Fermat Number Bench 2
SHIFT=5 577098^524288+1 Time: 3.62 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=6 577098^524288+1 Time: 2.06 ms/mul. Err: 1.72e-001 3020555 digits
SHIFT=7 577098^524288+1 Time: 1.92 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=8 577098^524288+1 Time: 1.96 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=9 577098^524288+1 Time: 2.36 ms/mul. Err: 1.72e-001 3020555 digits
SHIFT=10 577098^524288+1 Time: 4.33 ms/mul. Err: 1.72e-001 3020555 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 20
Generalized Fermat Number Bench 2
SHIFT=5 468750^1048576+1 Time: 7.03 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=6 468750^1048576+1 Time: 4.23 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=7 468750^1048576+1 Time: 3.91 ms/mul. Err: 1.56e-001 5946413 digits
SHIFT=8 468750^1048576+1 Time: 3.81 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=9 468750^1048576+1 Time: 4.01 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=10 468750^1048576+1 Time: 5.77 ms/mul. Err: 1.72e-001 5946413 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 21
Generalized Fermat Number Bench 2
SHIFT=5 380742^2097152+1 Time: 14 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=6 380742^2097152+1 Time: 8.85 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=7 380742^2097152+1 Time: 8.26 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=8 380742^2097152+1 Time: 7.99 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=9 380742^2097152+1 Time: 8.03 ms/mul. Err: 1.72e-001 11703432 digits
SHIFT=10 380742^2097152+1 Time: 9.42 ms/mul. Err: 1.76e-001 11703432 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 22
Generalized Fermat Number Bench 2
SHIFT=5 60000^4194304+1 Time: 28 ms/mul. Err: 6.35e-003 20041019 digits
SHIFT=6 60000^4194304+1 Time: 17.8 ms/mul. Err: 6.10e-003 20041019 digits
SHIFT=7 60000^4194304+1 Time: 16.6 ms/mul. Err: 5.86e-003 20041019 digits
SHIFT=8 60000^4194304+1 Time: 16.2 ms/mul. Err: 6.35e-003 20041019 digits
SHIFT=9 60000^4194304+1 Time: 16 ms/mul. Err: 5.86e-003 20041019 digits
SHIFT=10 60000^4194304+1 Time: 16.1 ms/mul. Err: 5.86e-003 20041019 digits
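To pick the fastest SHIFT from output like the above, a small parser is enough. This is a sketch that assumes the exact "SHIFT=... Time: ... ms/mul" (or "us/mul") line format printed by the 1.05 betas; it ranks purely by time per multiplication and ignores CPU usage and screen lag, which may matter more on a given machine. The function names are illustrative:

```python
import re

# Matches lines such as:
#   SHIFT=7 710492^262144+1 Time: 1.01 ms/mul. Err: 2.03e-001 1533952 digits
TIMING_LINE = re.compile(r"SHIFT=(\d+)\s+\S+\s+Time:\s+([\d.]+)\s+(ms|us)/mul")


def parse_timings(bench_output: str) -> dict[int, float]:
    """Map each SHIFT to its time per multiplication in milliseconds."""
    timings = {}
    for line in bench_output.splitlines():
        match = TIMING_LINE.search(line)
        if match:
            value = float(match.group(2))
            # Normalize microseconds to milliseconds so all values are comparable.
            timings[int(match.group(1))] = value / 1000.0 if match.group(3) == "us" else value
    return timings


def best_shift(bench_output: str) -> tuple[int, float]:
    """Return (SHIFT, ms per multiplication) for the fastest timing line."""
    timings = parse_timings(bench_output)
    if not timings:
        raise ValueError("no SHIFT timing lines found")
    shift = min(timings, key=timings.get)
    return shift, timings[shift]


sample = """\
SHIFT=5 710492^262144+1 Time: 1.89 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=6 710492^262144+1 Time: 1.04 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=7 710492^262144+1 Time: 1.01 ms/mul. Err: 2.03e-001 1533952 digits
SHIFT=8 710492^262144+1 Time: 1.05 ms/mul. Err: 1.88e-001 1533952 digits
"""
print(best_shift(sample))  # (7, 1.01)
```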
Michael Goetz Volunteer moderator Project administrator
Please note that there is a bug in beta 2: it does not update Boinc's progress counter.
I'll release a beta 3 build in an hour or so once it's completed a WU and there's no other obvious errors. In the meantime, you can use beta 2 if you wish; it just won't update the progress meter in the Boinc manager.
Michael Goetz Volunteer moderator Project administrator
Beta 3 can now be downloaded from here: http://www.asgoodasitgoetz.com/distribution/geneferCUDA-boinc.1.05beta3.7z
Same as beta 2 except that the boinc progress meter works.
____________
Michael Goetz,
http://www.primegrid.com/result.php?resultid=339575564
I don't know what you wanted to hide, but the last line of the log shows the whole number:
568204^262144+1 is a probable composite. (RES=e62a18ec02671c3e) (1508509 digits) (err = 0.1270) (time = 1:19:48) 23:38:46
Michael Goetz Volunteer moderator Project administrator
Michael Goetz,
http://www.primegrid.com/result.php?resultid=339575564
I don't know what you wanted to hide, but the last line of the log shows the whole number:
568204^262144+1 is a probable composite. (RES=e62a18ec02671c3e) (1508509 digits) (err = 0.1270) (time = 1:19:48) 23:38:46
Yes, it's only necessary to conceal the number if it's prime. Or if it's not yet finished crunching, and thus might be prime.
There didn't seem to be any point in concealing the number if it's composite.
rroonnaalldd Volunteer developer Volunteer tester
Command line: ./GeneferCUDA-boinc -b2 18
Generalized Fermat Number Bench 2
SHIFT=5 710492^262144+1 Time: 1.64 ms/mul. Err: 1.88e-01 1533952 digits CPU=75%
SHIFT=6 710492^262144+1 Time: 529 us/mul. Err: 1.88e-01 1533952 digits CPU=27%
SHIFT=7 710492^262144+1 Time: 255 us/mul. Err: 1.72e-01 1533952 digits CPU=14%
SHIFT=8 710492^262144+1 Time: 188 us/mul. Err: 1.88e-01 1533952 digits CPU=9%
SHIFT=9 710492^262144+1 Time: 185 us/mul. Err: 1.88e-01 1533952 digits CPU=6%
SHIFT=10 710492^262144+1 Time: 183 us/mul. Err: 1.72e-01 1533952 digits CPU=3%
Command line: ./GeneferCUDA-boinc -b2 19
Generalized Fermat Number Bench 2
SHIFT=5 577098^524288+1 Time: 3.46 ms/mul. Err: 1.80e-01 3020555 digits CPU=78%
SHIFT=6 577098^524288+1 Time: 886 us/mul. Err: 1.72e-01 3020555 digits CPU=23%
SHIFT=7 577098^524288+1 Time: 347 us/mul. Err: 1.72e-01 3020555 digits CPU=9%
SHIFT=8 577098^524288+1 Time: 200 us/mul. Err: 1.72e-01 3020555 digits CPU=6%
SHIFT=9 577098^524288+1 Time: 187 us/mul. Err: 1.88e-01 3020555 digits CPU=4%
SHIFT=10 577098^524288+1 Time: 184 us/mul. Err: 1.88e-01 3020555 digits CPU=2%
Command line: ./GeneferCUDA-boinc -b2 20
Generalized Fermat Number Bench 2
SHIFT=5 468750^1048576+1 Time: 8.11 ms/mul. Err: 1.72e-01 5946413 digits CPU=91%
SHIFT=6 468750^1048576+1 Time: 4.3 ms/mul. Err: 1.72e-01 5946413 digits CPU=53%
SHIFT=7 468750^1048576+1 Time: 531 us/mul. Err: 1.64e-01 5946413 digits CPU=7%
SHIFT=8 468750^1048576+1 Time: 259 us/mul. Err: 1.64e-01 5946413 digits CPU=3%
SHIFT=9 468750^1048576+1 Time: 184 us/mul. Err: 1.76e-01 5946413 digits CPU=2%
SHIFT=10 468750^1048576+1 Time: 187 us/mul. Err: 1.88e-01 5946413 digits CPU=2%
Command line: ./GeneferCUDA-boinc -b2 21
Generalized Fermat Number Bench 2
SHIFT=5 380742^2097152+1 Time: 17.8 ms/mul. Err: 1.64e-01 11703432 digits CPU=93%
SHIFT=6 380742^2097152+1 Time: 13 ms/mul. Err: 1.68e-01 11703432 digits CPU=78%
SHIFT=7 380742^2097152+1 Time: 1.46 ms/mul. Err: 1.64e-01 11703432 digits CPU=9%
SHIFT=8 380742^2097152+1 Time: 349 us/mul. Err: 1.66e-01 11703432 digits CPU=2%
SHIFT=9 380742^2097152+1 Time: 208 us/mul. Err: 1.56e-01 11703432 digits CPU=1%
SHIFT=10 380742^2097152+1 Time: 186 us/mul. Err: 1.58e-01 11703432 digits CPU=1%
Command line: ./GeneferCUDA-boinc -b2 22
Generalized Fermat Number Bench 2
SHIFT=5 60000^4194304+1 Time: 37.8 ms/mul. Err: 5.86e-03 20041019 digits CPU=100%
SHIFT=6 60000^4194304+1 Time: 31.8 ms/mul. Err: 5.00e-01 20041019 digits CPU=92%
SHIFT=7 60000^4194304+1 Time: 18 ms/mul. Err: 6.35e-03 20041019 digits CPU=52-54%
SHIFT=8 60000^4194304+1 Time: 574 us/mul. Err: 5.86e-03 20041019 digits CPU=2%
SHIFT=9 60000^4194304+1 Time: 292 us/mul. Err: 6.35e-03 20041019 digits CPU=1%
SHIFT=10 60000^4194304+1 Time: 218 us/mul. Err: 6.35e-03 20041019 digits CPU=1%
Michael Goetz Volunteer moderator Project administrator
Ronald,
Something that both Shoichiro and I have noticed is that when the test runs are too short (and 'too short' can be pretty long) the results can show really short execution times -- times that just don't make sense. I think your benchmarks are exhibiting that behavior.
In particular, there should absolutely be a significant increase in time per iteration as you increase N, and I don't see that in your results.
You might want to try increasing BENCHCOUNT and see if you get more rational results.
I have BENCHCOUNT set to 10000, and the benchmarks are quite annoyingly long -- but the results started getting wacky when I made it smaller. To be honest, I'm not really sure why the timing gets thrown off with shorter tests, because they should be more than long enough, but they're not.
rroonnaalldd Volunteer developer Volunteer tester
Perhaps I broke this in the source code, or the Linux patch for lowering the CPU usage caused it.
A diff should give you an idea what i had to change.
I hope my PM with app and source came through...
rroonnaalldd Volunteer developer Volunteer tester
First error in unit #339876464 after ~12 ksec.
Shift 9 was too much...
Michael Goetz Volunteer moderator Project administrator
Ronald,
Don't use your -b2 benchmark results to select a proper shift value. While it's possible that 9 is the best value for you, it's a certainty that the benchmark results you got are meaningless.
Until we figure out what's wrong with the benchmarks they should not be used to decide what shift value is best.
There are two easy ways to determine that the benchmarks are not working:
1) The iteration times are simply too fast. Unless your hardware is really fast, the times should all be in milliseconds (ms), not microseconds (us). For the n=18 benchmark (which corresponds to our WUs), I was getting times around 1 millisecond. Since the test is executed 10,000 times, that part of the test should take about 10 seconds. If the test takes 10 seconds, but the computer is saying each of the 10,000 iterations took 186 microseconds, the test should have finished in about 2 seconds.
2) As you increase n, the length of each iteration increases almost by a factor of 2. So your test results at n=19 should be significantly higher than at 18, at 20 they should be significantly higher than at 19, and so on. You were seeing identical numbers at all values of n.
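The two checks above can be written down directly. In this sketch the 1.5x growth threshold is an assumption standing in for "almost a factor of 2", and the function names are illustrative:

```python
def expected_wall_seconds(iterations: int, per_iter_seconds: float) -> float:
    """Check 1): how long a benchmark line should take if its per-iteration time is real."""
    return iterations * per_iter_seconds


def grows_with_n(times_by_n: dict[int, float], min_ratio: float = 1.5) -> bool:
    """Check 2): each step up in n should raise the per-iteration time by at least min_ratio."""
    ns = sorted(times_by_n)
    return all(times_by_n[hi] >= min_ratio * times_by_n[lo] for lo, hi in zip(ns, ns[1:]))


# 10,000 iterations at ~1 ms should take ~10 s; at 186 us they would take ~1.9 s.
print(f"{expected_wall_seconds(10_000, 1e-3):.2f} s")    # 10.00 s
print(f"{expected_wall_seconds(10_000, 186e-6):.2f} s")  # 1.86 s

# SHIFT=8 times (ms) from the beta 2 run above vs. SHIFT=10 times (ms) from the suspect run.
beta2_shift8 = {18: 1.05, 19: 1.96, 20: 3.81, 21: 7.99, 22: 16.2}
suspect_shift10 = {18: 0.183, 19: 0.184, 20: 0.187, 21: 0.186, 22: 0.218}
print(grows_with_n(beta2_shift8), grows_with_n(suspect_shift10))  # True False
```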
I have no clue what's wrong there. Shoichiro has observed the same behavior --- look at his posts on the Mersenne forum. I don't understand what caused it then, and I don't understand what caused it now. Maybe another set of eyeballs will help.
I guess the first place to start is to grab a stopwatch and see how long those tests actually take.
Your tests are saying it's taking a bit under 200 microseconds, so that should take 2 seconds to run. Is it taking 2 seconds? Or is it taking 10 to 20 seconds for each line? (My 460 takes 10 seconds per line, your 450 should be slower.) If it's taking 10 or more seconds, the clock readings are wrong. If it's taking 2 seconds, something is very wrong. Or at least very weird.
____________
My lucky number is 75898^524288+1
Hi Michael,
here are some numbers for a Gainward nVidia GTX 580 Phantom3:
1536MB
core 783 MHz
mem 1005 MHz
shader 1566 MHz
Command line: GeneferCUDA-boinc-windows.exe -b2 18
Generalized Fermat Number Bench 2
SHIFT=5 710492^262144+1 Time: 1.34 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=6 710492^262144+1 Time: 536 us/mul. Err: 1.88e-001 1533952 digits
SHIFT=7 710492^262144+1 Time: 513 us/mul. Err: 2.03e-001 1533952 digits
SHIFT=8 710492^262144+1 Time: 584 us/mul. Err: 1.88e-001 1533952 digits
SHIFT=9 710492^262144+1 Time: 920 us/mul. Err: 1.88e-001 1533952 digits
SHIFT=10 710492^262144+1 Time: 1.89 ms/mul. Err: 1.88e-001 1533952 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 19
Generalized Fermat Number Bench 2
SHIFT=5 577098^524288+1 Time: 2.55 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=6 577098^524288+1 Time: 1 ms/mul. Err: 1.72e-001 3020555 digits
SHIFT=7 577098^524288+1 Time: 913 us/mul. Err: 1.88e-001 3020555 digits
SHIFT=8 577098^524288+1 Time: 953 us/mul. Err: 1.88e-001 3020555 digits
SHIFT=9 577098^524288+1 Time: 1.22 ms/mul. Err: 1.72e-001 3020555 digits
SHIFT=10 577098^524288+1 Time: 2.2 ms/mul. Err: 1.72e-001 3020555 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 20
Generalized Fermat Number Bench 2
SHIFT=5 468750^1048576+1 Time: 4.98 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=6 468750^1048576+1 Time: 1.95 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=7 468750^1048576+1 Time: 1.69 ms/mul. Err: 1.56e-001 5946413 digits
SHIFT=8 468750^1048576+1 Time: 1.67 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=9 468750^1048576+1 Time: 1.81 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=10 468750^1048576+1 Time: 2.76 ms/mul. Err: 1.72e-001 5946413 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 21
Generalized Fermat Number Bench 2
SHIFT=5 380742^2097152+1 Time: 9.83 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=6 380742^2097152+1 Time: 3.85 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=7 380742^2097152+1 Time: 3.35 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=8 380742^2097152+1 Time: 3.21 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=9 380742^2097152+1 Time: 3.26 ms/mul. Err: 1.72e-001 11703432 digits
SHIFT=10 380742^2097152+1 Time: 3.97 ms/mul. Err: 1.76e-001 11703432 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 22
Generalized Fermat Number Bench 2
SHIFT=5 60000^4194304+1 Time: 19.5 ms/mul. Err: 6.35e-003 20041019 digits
SHIFT=6 60000^4194304+1 Time: 7.87 ms/mul. Err: 6.10e-003 20041019 digits
SHIFT=7 60000^4194304+1 Time: 6.99 ms/mul. Err: 5.86e-003 20041019 digits
SHIFT=8 60000^4194304+1 Time: 6.65 ms/mul. Err: 6.35e-003 20041019 digits
SHIFT=9 60000^4194304+1 Time: 6.56 ms/mul. Err: 5.86e-003 20041019 digits
SHIFT=10 60000^4194304+1 Time: 6.73 ms/mul. Err: 5.86e-003 20041019 digits
Michael Goetz Volunteer moderator Project administrator
Your tests show that, on your computer, block sizes of 7, 7, 8, 8, and 9 are the fastest at n values of 18 through 22, respectively. Those are the same as I got on my card.
Two data points are better than one!
If enough people show similar results, I may change the program to default to those values.
The default values for those n are currently 8, 8, 8, 9, and 9. (I previously said that the defaults were 8, 8, 9, 9, and 10, but that was a mistake.) If the 7, 7, 8, 8, 9 numbers hold up across a wide range of GPUs, then the current default values aren't optimal and should be changed. I'll need a lot more data, however.
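Picking the fastest block size out of a -b2 table, as done above, is an argmin over each row. A small C sketch with a hypothetical function name; note that -b2 mixes units (1.34 ms next to 536 us), so convert every entry to the same unit before comparing. Ties go to the smaller SHIFT.

```c
#include <stddef.h>

/* Returns the SHIFT value with the lowest time in one "-b2" row.
 * times[k] is the time per multiplication for SHIFT = first_shift + k;
 * all entries must be in the same unit (e.g. microseconds).
 * On a tie, the smaller SHIFT wins because only a strictly lower time
 * replaces the current best. */
int best_shift(const double *times, size_t count, int first_shift)
{
    size_t best = 0;
    for (size_t k = 1; k < count; k++) {
        if (times[k] < times[best])
            best = k;
    }
    return first_shift + (int)best;
}
```

For the GTX 580 results, the n=18 row {1340, 536, 513, 584, 920, 1890} (us, SHIFT 5 to 10) gives 7, and the n=22 row {19500, 7870, 6990, 6650, 6560, 6730} gives 9.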
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
At the moment I am calculating a unit with shift=6 and see 27% CPU usage via 'top'. For shift=7 I only have the benchmark value of 14% CPU load via 'top', but the computation time was only ~10 min shorter.
A 262144 WU with the default value of 8 causes 9% CPU load on one core and needed ~10 ksec (~2 h 50 min) on my slower GTS 450 green/eco version.
From my point of view, the default values should stay where they are. A maximum of 10% CPU usage while calculating a unit on the GPU should be tolerable for everyone.
____________
Best wishes. Knowledge is power. by jjwhalen
Zotac GTX 460 AMP! Edition 1024MB
Core 810 MHz
Memory 1000 MHz
Shader 1620 MHz
GeneferCUDA-boinc 1.05 beta 3 (CUDA3.2)
GeneferCUDA-boinc-windows.exe -b2 18
SHIFT=5 710492^262144+1 Time: 1.75 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=6 710492^262144+1 Time: 863 us/mul. Err: 1.88e-001 1533952 digits
SHIFT=7 710492^262144+1 Time: 839 us/mul. Err: 2.03e-001 1533952 digits
SHIFT=8 710492^262144+1 Time: 876 us/mul. Err: 1.88e-001 1533952 digits
SHIFT=9 710492^262144+1 Time: 1.41 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=10 710492^262144+1 Time: 3.13 ms/mul. Err: 1.88e-001 1533952 digits
GeneferCUDA-boinc-windows.exe -b2 19
SHIFT=5 577098^524288+1 Time: 3.36 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=6 577098^524288+1 Time: 1.72 ms/mul. Err: 1.72e-001 3020555 digits
SHIFT=7 577098^524288+1 Time: 1.61 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=8 577098^524288+1 Time: 1.65 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=9 577098^524288+1 Time: 2 ms/mul. Err: 1.72e-001 3020555 digits
SHIFT=10 577098^524288+1 Time: 3.71 ms/mul. Err: 1.72e-001 3020555 digits
GeneferCUDA-boinc-windows.exe -b2 20
SHIFT=5 468750^1048576+1 Time: 6.68 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=6 468750^1048576+1 Time: 3.53 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=7 468750^1048576+1 Time: 3.25 ms/mul. Err: 1.56e-001 5946413 digits
SHIFT=8 468750^1048576+1 Time: 3.19 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=9 468750^1048576+1 Time: 3.37 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=10 468750^1048576+1 Time: 4.91 ms/mul. Err: 1.72e-001 5946413 digits
GeneferCUDA-boinc-windows.exe -b2 21
SHIFT=5 380742^2097152+1 Time: 13.6 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=6 380742^2097152+1 Time: 7.48 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=7 380742^2097152+1 Time: 6.89 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=8 380742^2097152+1 Time: 6.69 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=9 380742^2097152+1 Time: 6.72 ms/mul. Err: 1.72e-001 11703432 digits
SHIFT=10 380742^2097152+1 Time: 7.94 ms/mul. Err: 1.76e-001 11703432 digits
GeneferCUDA-boinc-windows.exe -b2 22
SHIFT=5 60000^4194304+1 Time: 25.5 ms/mul. Err: 6.35e-003 20041019 digits
SHIFT=6 60000^4194304+1 Time: 15.2 ms/mul. Err: 6.10e-003 20041019 digits
SHIFT=7 60000^4194304+1 Time: 13.8 ms/mul. Err: 5.86e-003 20041019 digits
SHIFT=8 60000^4194304+1 Time: 13.5 ms/mul. Err: 6.35e-003 20041019 digits
SHIFT=9 60000^4194304+1 Time: 13.4 ms/mul. Err: 5.86e-003 20041019 digits
SHIFT=10 60000^4194304+1 Time: 13.5 ms/mul. Err: 5.86e-003 20041019 digits
Michael Goetz Volunteer moderator Project administrator
rroonnaalldd wrote:
> At the moment I am calculating a unit with shift=6 and see 27% CPU usage via 'top'. For shift=7 I only have the benchmark value of 14% CPU load via 'top', but the computation time was only ~10 min shorter.
> A 262144 WU with the default value of 8 causes 9% CPU load on one core and needed ~10 ksec (~2 h 50 min) on my slower GTS 450 green/eco version.
> From my point of view, the default values should stay where they are. A maximum of 10% CPU usage while calculating a unit on the GPU should be tolerable for everyone.
It seems to use more CPU on Linux. On Windows I see much lower usage. At shift=8, I see 0 CPU (that's the total across the whole CPU, so 1 would be 4% of a core). At shift=7, it fluctuates between 0 and 1 (0% and 4% of a core). At 6, I see a value of about 6, which is about 24%. That's in the same ballpark as what you're seeing on Linux.
That's comparing the output of the measurement tools: top on Linux, task manager on Windows. One or both might be inaccurate due to the high frequency at which a GPU app's CPU thread blocks and unblocks.
I'm actually very concerned about our ability to measure CPU utilization on this kind of app. I may need to write my own measurement tool for this.
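The Task Manager arithmetic above is a single multiplication: a whole-CPU percentage times the number of logical processors gives the share of one core. A C sketch; the factor of 4 used in the example is inferred from "1 would be 4% of a core" and is an assumption about that machine, and the conversion assumes the load sits on a single thread.

```c
/* Task Manager reports CPU use as a percentage of the whole CPU.
 * Converts that figure to a percentage of a single core, assuming the
 * load sits on one thread and `logical_cpus` is the number of logical
 * processors Task Manager is averaging over. */
double percent_of_one_core(double whole_cpu_percent, int logical_cpus)
{
    return whole_cpu_percent * (double)logical_cpus;
}
```

With 4 logical processors, a reading of 1 becomes 4% of a core and a reading of 6 becomes 24%, matching the figures quoted above.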
rroonnaalldd Volunteer developer Volunteer tester
Michael Goetz wrote:
> That's comparing the output of the measurement tools: top on Linux, task manager on Windows. One or both might be inaccurate due to the high frequency at which a GPU app's CPU thread blocks and unblocks.
> I'm actually very concerned about our ability to measure CPU utilization on this kind of app. I may need to write my own measurement tool for this.
Both tools are inaccurate because each CPU changes its clock rate more than 10,000 times per second.
Intel has released a tool to read out all Sandy Bridge (SB) performance data directly from hidden registers.
AMD's Bulldozer makes this much easier via LWP (Lightweight Profiling). In the past you needed a kernel driver and complicated configuration; with LWP you can read latency, cache misses, FPU operations, etc. directly in user mode via CPU instructions like 'lwpins', 'lwpval', 'bextr', etc.
But do you really need this accuracy?
Sysinternals Process Explorer has a minimum update interval of 0.5 s, and neither Windows nor Linux is known as a real-time OS (special libraries on Windows or a kernel with real-time extensions are another story). This limits what can be measured, and in most cases you only need an approximate or average value.
Michael Goetz Volunteer moderator Project administrator
rroonnaalldd wrote:
> Michael Goetz wrote:
>> That's comparing the output of the measurement tools: top on Linux, task manager on Windows. One or both might be inaccurate due to the high frequency at which a GPU app's CPU thread blocks and unblocks.
>> I'm actually very concerned about our ability to measure CPU utilization on this kind of app. I may need to write my own measurement tool for this.
> Both tools are inaccurate because each CPU changes its clock rate more than 10,000 times per second.
> Intel has released a tool to read out all Sandy Bridge (SB) performance data directly from hidden registers.
> AMD's Bulldozer makes this much easier via LWP (Lightweight Profiling). In the past you needed a kernel driver and complicated configuration; with LWP you can read latency, cache misses, FPU operations, etc. directly in user mode via CPU instructions like 'lwpins', 'lwpval', 'bextr', etc.
> But do you really need this accuracy?
> Sysinternals Process Explorer has a minimum update interval of 0.5 s, and neither Windows nor Linux is known as a real-time OS (special libraries on Windows or a kernel with real-time extensions are another story). This limits what can be measured, and in most cases you only need an approximate or average value.
The problem is that the typical method for measuring CPU usage, statistical sampling, fails miserably if the frequency at which your application turns on and off is near a harmonic of the frequency at which the samples are taken.
It's very much like taking a telephone poll before a real election. There are systematic errors that can make the poll wildly inaccurate. Top and task manager aren't actually measuring CPU usage; they're measuring CPU usage during a very small slice of time. That slice is expected to be representative of the entire period being measured, but there are reasons why that might not be true.
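To see why sampling can fail so badly, here is a toy simulation in C. It is a hypothetical model, not how top or task manager are actually implemented: a thread is busy for the first part of every on/off cycle, and a monitor takes instantaneous samples at a fixed interval. When the sampling interval equals the cycle period, every sample lands at the same point in the cycle, so the estimate is 0% or 100% depending only on where sampling started; with an interval that doesn't line up with the period, the estimate converges on the true duty cycle.

```c
/* Toy model of a sampling CPU monitor watching a thread that is busy for
 * the first `duty` fraction of every `period` seconds and idle for the
 * rest. The monitor takes `nsamples` instantaneous samples, `interval`
 * seconds apart, starting at time `start` (start >= 0), and reports the
 * fraction of samples that caught the thread busy. */
double sampled_busy_fraction(double period, double duty, double interval,
                             double start, long nsamples)
{
    long busy = 0;
    for (long k = 0; k < nsamples; k++) {
        double t = start + (double)k * interval;
        /* position of this sample within its on/off cycle (valid for t >= 0) */
        double phase = t - period * (double)(long long)(t / period);
        if (phase < duty * period)
            busy++;
    }
    return (double)busy / (double)nsamples;
}
```

With a true load of 25% (period 1.0, duty 0.25), sampling every 1.0 s reports 0% when started at t=0.5 and 100% when started at t=0.1, while sampling every 0.37 s reports about 25%.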
A far more accurate way of measuring CPU usage is to write a program that's essentially just an infinite loop, and have this program count the number of iterations it performs per second. For our CPUs, this would need to be multithreaded, with one thread per core, so that the whole CPU is being used.
Then you start up Genefer and record the reduction in speed of the test program. That gives you the real CPU usage, free from the systematic errors that might affect the statistical sampling done by programs such as top or task manager.
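The spin-loop method can be sketched in C as follows. This is a single-threaded illustration under stated assumptions: you would run one copy per core (with threads or separate processes, omitted here to stay within standard C), once on an otherwise idle machine for a baseline and once with Genefer running, then compare the counts. The function names are hypothetical.

```c
#include <time.h>

/* Current wall-clock time in seconds, using the C11 timespec_get() clock.
 * TIME_UTC can be stepped by NTP; a monotonic clock would be better for
 * long runs, but this keeps the sketch within standard C. */
static double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Busy-loops for `seconds` of wall-clock time and returns how many
 * iterations it completed. The less CPU this thread gets, the fewer
 * iterations it finishes in the same wall-clock time. */
unsigned long long spin_iterations(double seconds)
{
    unsigned long long count = 0;
    double end = wall_seconds() + seconds;
    while (wall_seconds() < end)
        count++;
    return count;
}

/* Fraction of the core taken by other work: 1 - loaded/baseline.
 * Example: baseline 1000, loaded 750 gives 0.25 (a quarter of the core). */
double stolen_fraction(unsigned long long baseline, unsigned long long loaded)
{
    if (baseline == 0)
        return 0.0;
    return 1.0 - (double)loaded / (double)baseline;
}
```

For example, if a core completes 1,000,000 iterations when idle and 960,000 with Genefer running, stolen_fraction() returns about 0.04, i.e. Genefer took roughly 4% of that core. Because the count covers a whole run rather than a few instants, it is not fooled by the on/off timing of Genefer's CPU thread.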
The fact that we see different CPU utilization at shift=8 between Linux and Windows means one of two things: either Nvidia's drivers or libraries are better under Windows, or the tools aren't measuring CPU consumption equally. If we're going to tune Genefer for best performance, it's important that we have confidence in the tools being used to measure the performance.
rroonnaalldd Volunteer developer Volunteer tester
Michael Goetz wrote:
> either Nvidia's drivers or libraries are better under Windows, or the tools aren't measuring CPU consumption equally
Possibly. At the moment I am using driver version 285.05.23 from the CUDA 4.1 RC2 SDK, but I plan to update to version 285.05.32 from the officially released CUDA 4.1 SDK.
The differences in computation time between CUDA SDKs 3.2, 4.0, and 4.1 RC2 seem to be around 5% on my GTS 450. The biggest difference (10-20%) comes from the choice of compiling a 32-bit or a 64-bit app.
On the other hand, newer SDKs might have a solution for the problems on the GTX 550, and I want to experiment with the new LLVM-based CUDA compiler...
rroonnaalldd Volunteer developer Volunteer tester
rroonnaalldd wrote:
> At the moment I am calculating a unit with shift=6 and see 27% CPU usage via 'top'. For shift=7 I only have the benchmark value of 14% CPU load via 'top', but the computation time was only ~10 min shorter.
> A 262144 WU with the default value of 8 causes 9% CPU load on one core and needed ~10 ksec (~2 h 50 min) on my slower GTS 450 green/eco version.
> From my point of view, the default values should stay where they are. A maximum of 10% CPU usage while calculating a unit on the GPU should be tolerable for everyone.
Update:
Using project preferences to override SHIFT; using 6 instead of 8
maxErr during b^N initialization = 0.0000 (0.200 seconds).
Testing b^262144+1...
maxErr exceeded for 571426^262144+1, 0.5000 > 0.4500
18:21:04 (623): called boinc_finish
I will go back to 8.
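The failure in this log is a round-off check. Below is a C sketch of that kind of check, assuming (this is not taken from the Genefer source) that maxErr is the largest distance of any value after the inverse FFT from the nearest integer; the 0.45 limit is the one printed in the log above. Since that distance can never exceed 0.5, the logged 0.5000 is the worst possible value: at least one result landed about halfway between two integers, so rounding it can't be trusted.

```c
#include <stddef.h>

/* Distance of one FFT output value from the nearest integer.
 * Written without <math.h> so it needs no extra link flags. */
static double roundoff_error(double x)
{
    long long nearest = (long long)(x >= 0.0 ? x + 0.5 : x - 0.5);
    double err = x - (double)nearest;
    return err < 0.0 ? -err : err;
}

/* Stores the largest round-off error over `values` in *max_err and
 * returns 1 if it exceeds `limit` (0.45 in the log above), 0 otherwise. */
int max_err_exceeded(const double *values, size_t count, double limit,
                     double *max_err)
{
    double worst = 0.0;
    for (size_t k = 0; k < count; k++) {
        double e = roundoff_error(values[k]);
        if (e > worst)
            worst = e;
    }
    *max_err = worst;
    return worst > limit;
}
```

A smaller SHIFT does not change the math, but this log shows the shift=6 run tripping the check, which is why going back to 8 is the safe choice here.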
Michael Goetz Volunteer moderator Project administrator
rroonnaalldd wrote:
> On the other hand, newer SDKs might have a solution for the problems on the GTX 550, and I want to experiment with the new LLVM-based CUDA compiler...
IIRC, trying the different toolkits and SDKs was the very first thing I tried with regard to the 550 problem. The behavior was unchanged with 3.2, 4.0, or 4.1.
rroonnaalldd Volunteer developer Volunteer tester
Michael Goetz wrote:
> IIRC, trying the different toolkits and SDKs was the very first thing I tried with regard to the 550 problem. The behavior was unchanged with 3.2, 4.0, or 4.1.
My hope was that the release version of 4.1 would solve the problem...
samuel7 Volunteer tester
Joined: 1 May 09 Posts: 89 ID: 39425 Credit: 257,425,010 RAC: 0
Switched to Windows to run the benchmarks before fully testing on Linux.
GTX 480, 1536 MB
Core 701 MHz
Memory 924 MHz
Shader 1401 MHz
Driver 280.26
System: i7 875K, 8 GB, Win 7 Ultimate 64-bit SP1
Edit: app v1.05beta3 :)
Command line: GeneferCUDA-boinc-windows.exe -b2 18
Generalized Fermat Number Bench 2
SHIFT=5 710492^262144+1 Time: 1 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=6 710492^262144+1 Time: 616 us/mul. Err: 1.88e-001 1533952 digits
SHIFT=7 710492^262144+1 Time: 568 us/mul. Err: 2.03e-001 1533952 digits
SHIFT=8 710492^262144+1 Time: 630 us/mul. Err: 1.88e-001 1533952 digits
SHIFT=9 710492^262144+1 Time: 981 us/mul. Err: 1.88e-001 1533952 digits
SHIFT=10 710492^262144+1 Time: 1.99 ms/mul. Err: 1.88e-001 1533952 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 19
Generalized Fermat Number Bench 2
SHIFT=5 577098^524288+1 Time: 1.88 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=6 577098^524288+1 Time: 1.17 ms/mul. Err: 1.72e-001 3020555 digits
SHIFT=7 577098^524288+1 Time: 1.05 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=8 577098^524288+1 Time: 1.05 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=9 577098^524288+1 Time: 1.33 ms/mul. Err: 1.72e-001 3020555 digits
SHIFT=10 577098^524288+1 Time: 2.34 ms/mul. Err: 1.72e-001 3020555 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 20
Generalized Fermat Number Bench 2
SHIFT=5 468750^1048576+1 Time: 3.7 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=6 468750^1048576+1 Time: 2.27 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=7 468750^1048576+1 Time: 1.98 ms/mul. Err: 1.56e-001 5946413 digits
SHIFT=8 468750^1048576+1 Time: 1.92 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=9 468750^1048576+1 Time: 1.98 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=10 468750^1048576+1 Time: 2.97 ms/mul. Err: 1.72e-001 5946413 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 21
Generalized Fermat Number Bench 2
SHIFT=5 380742^2097152+1 Time: 7.19 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=6 380742^2097152+1 Time: 4.56 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=7 380742^2097152+1 Time: 3.87 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=8 380742^2097152+1 Time: 3.7 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=9 380742^2097152+1 Time: 3.69 ms/mul. Err: 1.72e-001 11703432 digits
SHIFT=10 380742^2097152+1 Time: 4.31 ms/mul. Err: 1.76e-001 11703432 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 22
Generalized Fermat Number Bench 2
SHIFT=5 60000^4194304+1 Time: 14.6 ms/mul. Err: 6.35e-003 20041019 digits
SHIFT=6 60000^4194304+1 Time: 9.36 ms/mul. Err: 6.10e-003 20041019 digits
SHIFT=7 60000^4194304+1 Time: 8.22 ms/mul. Err: 5.86e-003 20041019 digits
SHIFT=8 60000^4194304+1 Time: 7.68 ms/mul. Err: 6.35e-003 20041019 digits
SHIFT=9 60000^4194304+1 Time: 7.53 ms/mul. Err: 5.86e-003 20041019 digits
SHIFT=10 60000^4194304+1 Time: 7.63 ms/mul. Err: 5.86e-003 20041019 digits
So for my system, 7, 7, 8, 8, 9 would be OK and near optimal.
The app used a full CPU core with shift=5 at all n, and roughly 30% of a core at shift=6, n=18. Otherwise, task manager didn't show any CPU usage except at the shift changes.
No screen lag noted.
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2351 ID: 1178 Credit: 17,540,347,481 RAC: 4,278,142
GT 440, 1536 MB, OEM version with 144 shaders
Core 719 MHz
Memory 800 MHz
Shader 1439 MHz
Driver 285.62
System: i7 920, 6 GB, Win Vista Home 64-bit SP2
Command line: GeneferCUDA-boinc-windows.exe -b2 18
Generalized Fermat Number Bench 2
SHIFT=5 710492^262144+1 Time: 2.31 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=6 710492^262144+1 Time: 2.11 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=7 710492^262144+1 Time: 2.04 ms/mul. Err: 2.03e-001 1533952 digits
SHIFT=8 710492^262144+1 Time: 2.18 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=9 710492^262144+1 Time: 3.13 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=10 710492^262144+1 Time: 6.71 ms/mul. Err: 1.88e-001 1533952 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 19
Generalized Fermat Number Bench 2
SHIFT=5 577098^524288+1 Time: 6.65 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=6 577098^524288+1 Time: 4.42 ms/mul. Err: 1.72e-001 3020555 digits
SHIFT=7 577098^524288+1 Time: 4.47 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=8 577098^524288+1 Time: 4.51 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=9 577098^524288+1 Time: 5.15 ms/mul. Err: 1.72e-001 3020555 digits
SHIFT=10 577098^524288+1 Time: 8.49 ms/mul. Err: 1.72e-001 3020555 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 20
Generalized Fermat Number Bench 2
SHIFT=5 468750^1048576+1 Time: 13.2 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=6 468750^1048576+1 Time: 8.91 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=7 468750^1048576+1 Time: 8.67 ms/mul. Err: 1.56e-001 5946413 digits
SHIFT=8 468750^1048576+1 Time: 8.78 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=9 468750^1048576+1 Time: 8.72 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=10 468750^1048576+1 Time: 12.3 ms/mul. Err: 1.72e-001 5946413 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 21
Generalized Fermat Number Bench 2
SHIFT=5 380742^2097152+1 Time: 24.2 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=6 380742^2097152+1 Time: 18.9 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=7 380742^2097152+1 Time: 18.4 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=8 380742^2097152+1 Time: 18.5 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=9 380742^2097152+1 Time: 18.4 ms/mul. Err: 1.72e-001 11703432 digits
SHIFT=10 380742^2097152+1 Time: 20.4 ms/mul. Err: 1.76e-001 11703432 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 22
Generalized Fermat Number Bench 2
SHIFT=5 60000^4194304+1 Time: 49.6 ms/mul. Err: 6.35e-003 20041019 digits
SHIFT=6 60000^4194304+1 Time: 37.7 ms/mul. Err: 6.10e-003 20041019 digits
SHIFT=7 60000^4194304+1 Time: 36.6 ms/mul. Err: 5.86e-003 20041019 digits
SHIFT=8 60000^4194304+1 Time: 36.7 ms/mul. Err: 6.35e-003 20041019 digits
SHIFT=9 60000^4194304+1 Time: 36.3 ms/mul. Err: 5.86e-003 20041019 digits
SHIFT=10 60000^4194304+1 Time: 36.6 ms/mul. Err: 5.86e-003 20041019 digits
A little bit different pattern overall for this slower OEM card.
____________
141941*2^4299438-1 is prime!
Ubuntu 10.10 64bit
AMD Phenom II X4 965 4GB
All 4 CPU cores were running BOINC based TRP_LLR's at the time.
EVGA GTX 560Ti (14 Processors, 448 CUDA cores, 1280MB, 320 bit memory interface)
Factory clocked
Core 797MHz
Memory 1950MHz
Shader 1594MHz
Nvidia Driver 290.10 (64bit)
Command line: ./primegrid_genefer_1.05_i686-pc-linux-gnu__cuda32_13 -b2 18
Generalized Fermat Number Bench 2
SHIFT=5 710492^262144+1 Time: 805 us/mul. Err: 1.88e-01 1533952 digits
SHIFT=6 710492^262144+1 Time: 269 us/mul. Err: 1.88e-01 1533952 digits
SHIFT=7 710492^262144+1 Time: 133 us/mul. Err: 1.88e-01 1533952 digits
SHIFT=8 710492^262144+1 Time: 100 us/mul. Err: 1.88e-01 1533952 digits
SHIFT=9 710492^262144+1 Time: 100 us/mul. Err: 1.88e-01 1533952 digits
SHIFT=10 710492^262144+1 Time: 100 us/mul. Err: 1.80e-01 1533952 digits
Command line: ./primegrid_genefer_1.05_i686-pc-linux-gnu__cuda32_13 -b2 19
Generalized Fermat Number Bench 2
SHIFT=5 577098^524288+1 Time: 1.51 ms/mul. Err: 1.88e-01 3020555 digits
SHIFT=6 577098^524288+1 Time: 444 us/mul. Err: 1.72e-01 3020555 digits
SHIFT=7 577098^524288+1 Time: 177 us/mul. Err: 1.72e-01 3020555 digits
SHIFT=8 577098^524288+1 Time: 112 us/mul. Err: 1.72e-01 3020555 digits
SHIFT=9 577098^524288+1 Time: 100 us/mul. Err: 1.72e-01 3020555 digits
SHIFT=10 577098^524288+1 Time: 100 us/mul. Err: 1.72e-01 3020555 digits
Command line: ./primegrid_genefer_1.05_i686-pc-linux-gnu__cuda32_13 -b2 20
Generalized Fermat Number Bench 2
SHIFT=5 468750^1048576+1 Time: 2.96 ms/mul. Err: 1.72e-01 5946413 digits
SHIFT=6 468750^1048576+1 Time: 922 us/mul. Err: 1.60e-01 5946413 digits
SHIFT=7 468750^1048576+1 Time: 270 us/mul. Err: 1.64e-01 5946413 digits
SHIFT=8 468750^1048576+1 Time: 136 us/mul. Err: 1.64e-01 5946413 digits
SHIFT=9 468750^1048576+1 Time: 101 us/mul. Err: 1.72e-01 5946413 digits
SHIFT=10 468750^1048576+1 Time: 101 us/mul. Err: 1.72e-01 5946413 digits
Command line: ./primegrid_genefer_1.05_i686-pc-linux-gnu__cuda32_13 -b2 21
Generalized Fermat Number Bench 2
SHIFT=5 380742^2097152+1 Time: 5.83 ms/mul. Err: 1.64e-01 11703432 digits
SHIFT=6 380742^2097152+1 Time: 3.04 ms/mul. Err: 1.68e-01 11703432 digits
SHIFT=7 380742^2097152+1 Time: 449 us/mul. Err: 1.68e-01 11703432 digits
SHIFT=8 380742^2097152+1 Time: 178 us/mul. Err: 1.66e-01 11703432 digits
SHIFT=9 380742^2097152+1 Time: 113 us/mul. Err: 1.72e-01 11703432 digits
SHIFT=10 380742^2097152+1 Time: 102 us/mul. Err: 1.57e-01 11703432 digits
Command line: ./primegrid_genefer_1.05_i686-pc-linux-gnu__cuda32_13 -b2 22
Generalized Fermat Number Bench 2
SHIFT=5 60000^4194304+1 Time: 11.9 ms/mul. Err: 6.35e-03 20041019 digits
SHIFT=6 60000^4194304+1 Time: 7.73 ms/mul. Err: 6.35e-03 20041019 digits
SHIFT=7 60000^4194304+1 Time: 4.04 ms/mul. Err: 6.10e-03 20041019 digits
SHIFT=8 60000^4194304+1 Time: 295 us/mul. Err: 5.86e-03 20041019 digits
SHIFT=9 60000^4194304+1 Time: 156 us/mul. Err: 6.10e-03 20041019 digits
SHIFT=10 60000^4194304+1 Time: 124 us/mul. Err: 6.47e-03 20041019 digits
We're not sure that TheDawgz understand any of this, but we figured that someone might want to see it. If we can provide any additional info, just ask.
We did learn that our Thermaltake TR2 RX750W PSU loves Genefer so much that it sings when running the benchmarks. Depending on the shift value and the number of digits, the PSU audibly squeals, squeaks, and screams at very repeatable pitches and at volumes ranging from a whisper to nearly a yell.
____________
There's someone in our head but it's not us.
samuel7 Volunteer tester
As Michael noted earlier, there's something wrong with benchmarking with the Linux app (1.05). I'm getting results similar to yours and Ronald's, and they're clearly faulty. The time for each shift should increase significantly with n, and it doesn't for the higher shift values. The times are also far too fast to be realistic.
If the app is running fine in BOINC, there shouldn't be any problem with actual WUs.
BOINC WUs are processing fine here, and I'll compile a test report and post it later today.
TheDawgz will also start running work units via BOINC later today and will post some info in a day or so. We need to finish some PPR sieving first, and, more importantly, we need to silence our musical power supply.
With the current BOINC work units, the sound is somewhere between a squeal and a howl, so bad that the one BOINC WU we were testing had to be aborted due to complaints about the noise!
I've tried 6, 7, 8, and 9, and it looks like 7 may be the best block size for my card. Question: will 32-bit vs. 64-bit Windows make a difference in runtime?
Michael Goetz Volunteer moderator Project administrator
Quote:
> I've tried 6, 7, 8, and 9, and it looks like 7 may be the best block size for my card. Question: will 32-bit vs. 64-bit Windows make a difference in runtime?
Yes, it will. But probably not the difference you were expecting. ;-)
I tested 32-bit vs. 64-bit when I first started. The tests showed that 32-bit was slightly faster than 64-bit. That's not so surprising, because all the real work is done on the GPU; the speed of the CPU is almost irrelevant. You would expect the two to run at about the same speed, considering you see 0% CPU usage. So why would the 32-bit version be even slightly faster? It's hard to say definitively, but my guess is that you're getting more memory bus saturation with the 64-bit version than with the 32-bit version. That would vary from CPU to CPU, so YMMV.
That's on Windows, anyway. I don't know if it's the drivers or something else, but the difference in behavior between Windows and Linux is more than I would expect, so the answer for Linux might be different than the answer for Windows.
Rytis Volunteer moderator Project administrator
Joined: 22 Jun 05 Posts: 2653 ID: 1 Credit: 89,203,733 RAC: 32,354
Does this still hold true for different shift values (e.g. ones that utilize more CPU)?
Michael Goetz Volunteer moderator Project administrator
Rytis wrote:
> Does this still hold true for different shift values (e.g. ones that utilize more CPU)?
Assuming you're asking about 32-bit vs. 64-bit, I don't know. But I doubt it's relevant on Windows, since under Windows the shift values that use a lot of CPU also tend to run a lot slower, so you would never want to use them anyway. I can't speak to Linux.
My guess, however, would be that no, it wouldn't make it faster, and it might make it slower. The CPU doesn't do any of the math, so using 64-bit math won't help.
The CPU work is not mathematical in nature; it's all control logic to feed the GPU. There's very little opportunity to use 64-bit math.
Except for the GPU kernels and the CUDA library internals, this is the complete computational loop:
cufftSafeCall(cufftExecZ2Z(plan, (cufftDoubleComplex *)g_z, (cufftDoubleComplex *)g_z, CUFFT_FORWARD));  // GPU: forward FFT
mul_kernel<<< n/128, 128 >>>((double2 *)g_z);  // GPU
cufftSafeCall(cufftExecZ2Z(plan, (cufftDoubleComplex *)g_z, (cufftDoubleComplex *)g_z, CUFFT_INVERSE));  // GPU: inverse FFT
bt = (Na[i / (8 * sizeof(UInt32))] >> (i % (8 * sizeof(UInt32)))) & 1;  // CPU: bit i of the exponent Na
for (j = 0; j < n1; j += (STRIDE * STRIDE * 2))  // CPU: loop control
    transpose<<< grid, threads >>>((double *)&g_zz[j], (double *)&g_z[j], (int)STRIDE * 2, (int)STRIDE * 2);  // GPU
FFTnextStepGFN_kernel<<< n1/STRIDE/ithreads, ithreads >>>((double *)g_zz, (double *)g_e1, g_f1, g_f2, base, invBase, t, n1, g_maxErr, tt, SHIFT);  // GPU
FFTnextStepGFN_kernel2<<< n1/STRIDE/ithreads, ithreads >>>((double *)g_zz, (double *)g_e1, g_f1, g_f2, base, invBase, t, n1, g_maxErr, SHIFT);  // GPU
for (j = 0; j < n1; j += (STRIDE * STRIDE * 2))  // CPU: loop control
    transpose<<< grid, threads >>>((double *)&g_z[j], (double *)&g_zz[j], (int)STRIDE * 2, (int)STRIDE * 2);  // GPU
GPU work (cuFFT calls and kernel launches) is marked "GPU" and host code is marked "CPU". As you can see, the CPU does almost nothing; everything is done on the GPU. There's one trivial calculation and two "for" loops. The ratio of GPU instructions to CPU instructions is probably around 1 million to 1.
At least, that's true of the code that is visible. What's happening inside the drivers and libraries is anyone's guess.
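The one CPU-side calculation, bt, reads bit i of the exponent stored in the 32-bit word array Na. Below is a C sketch of that extraction driving a textbook left-to-right binary exponentiation, applied here as a Fermat test to tiny generalized Fermat numbers b^N+1. It assumes bt decides whether a step also multiplies by the base; the function names are hypothetical, and Genefer does each squaring with FFTs on the GPU rather than with 64-bit integers as here.

```c
#include <stdint.h>

/* Bit i of a multi-word exponent stored as 32-bit words, least significant
 * word first; the same expression as the `bt = ...` line in the loop. */
static unsigned exponent_bit(const uint32_t *Na, unsigned i)
{
    return (Na[i / (8 * sizeof(uint32_t))] >> (i % (8 * sizeof(uint32_t)))) & 1u;
}

/* Left-to-right binary exponentiation: base^Na mod m, scanning `nbits`
 * bits from the top. Every step squares; a set bit adds a multiply by the
 * base (the assumed role of bt). Uses plain 64-bit arithmetic, so m must
 * be below 2^32 to keep r * r from overflowing. */
uint64_t modpow_bits(uint64_t base, const uint32_t *Na, unsigned nbits, uint64_t m)
{
    uint64_t r = 1 % m;
    base %= m;
    for (unsigned i = nbits; i-- > 0;) {
        r = (r * r) % m;            /* squaring step */
        if (exponent_bit(Na, i))
            r = (r * base) % m;     /* bit set: multiply by base */
    }
    return r;
}
```

For 257 = 4^4+1, modpow_bits(2, {256}, 9, 257) returns 1, as a Fermat test of a prime must; for the composite 145 = 12^2+1 = 5*29, 2^144 mod 145 comes out as 16.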
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1943 ID: 352 Credit: 5,926,042,460 RAC: 1,458,176
GTX 580 / i5-2500 / Win 7 x64, default setting.
Merged app_info with AVX, running 3 PPS LLR AVX tasks alongside the GPU GFN task, watching progress via VNC. GPU time 3085 s, CPU time 270 s.
The second WU's time estimate after 20% is almost 4 hours, DCF 4.5.
We'll see how it settles down alongside PPS LLR.
(I know I can set DCF to 1, but it will eventually adjust anyway.)
The next WU has SHIFT=7 set manually; I might try running 4 LLRs as well.
So far, good job!
GTX 550 Ti
Intel i5-2500K
Windows 7
EVGA GTX 550Ti
Core 999MHz
Memory 2100MHz
Shader 1998MHz
Nvidia Driver 290.53 (64bit)
Command line: GeneferCUDA-boinc-windows.exe -b2 18
Generalized Fermat Number Bench 2
SHIFT=5 710492^262144+1 Time: 1.28 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=6 710492^262144+1 Time: 1.13 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=7 710492^262144+1 Time: 1.07 ms/mul. Err: 2.03e-001 1533952 digits
SHIFT=8 710492^262144+1 Time: 1.09 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=9 710492^262144+1 Time: 1.59 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=10 710492^262144+1 Time: 3.27 ms/mul. Err: 1.88e-001 1533952 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 19
Generalized Fermat Number Bench 2
SHIFT=5 577098^524288+1 Time: 2.7 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=6 577098^524288+1 Time: 2.32 ms/mul. Err: 1.72e-001 3020555 digits
SHIFT=7 577098^524288+1 Time: 2.18 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=8 577098^524288+1 Time: 2.14 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=9 577098^524288+1 Time: 2.44 ms/mul. Err: 1.72e-001 3020555 digits
SHIFT=10 577098^524288+1 Time: 4.1 ms/mul. Err: 1.72e-001 3020555 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 20
Generalized Fermat Number Bench 2
SHIFT=5 468750^1048576+1 Time: 5.63 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=6 468750^1048576+1 Time: 4.82 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=7 468750^1048576+1 Time: 4.57 ms/mul. Err: 1.56e-001 5946413 digits
SHIFT=8 468750^1048576+1 Time: 4.39 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=9 468750^1048576+1 Time: 4.34 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=10 468750^1048576+1 Time: 5.86 ms/mul. Err: 1.72e-001 5946413 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 21
Generalized Fermat Number Bench 2
SHIFT=5 380742^2097152+1 Time: 11.6 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=6 380742^2097152+1 Time: 10.1 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=7 380742^2097152+1 Time: 9.69 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=8 380742^2097152+1 Time: 9.36 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=9 380742^2097152+1 Time: 9.02 ms/mul. Err: 1.72e-001 11703432 digits
SHIFT=10 380742^2097152+1 Time: 9.92 ms/mul. Err: 1.76e-001 11703432 digits
Command line: GeneferCUDA-boinc-windows.exe -b2 22
Generalized Fermat Number Bench 2
SHIFT=5 60000^4194304+1 Time: 23.1 ms/mul. Err: 6.35e-003 20041019 digits
SHIFT=6 60000^4194304+1 Time: 20.2 ms/mul. Err: 6.10e-003 20041019 digits
SHIFT=7 60000^4194304+1 Time: 19.5 ms/mul. Err: 5.86e-003 20041019 digits
SHIFT=8 60000^4194304+1 Time: 18.9 ms/mul. Err: 6.35e-003 20041019 digits
SHIFT=9 60000^4194304+1 Time: 18.3 ms/mul. Err: 5.86e-003 20041019 digits
SHIFT=10 60000^4194304+1 Time: 18.1 ms/mul. Err: 5.86e-003 20041019 digits
By the way what will be maxErr allowed at b22? | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13879 ID: 53948 Credit: 383,481,681 RAC: 118,164
|
By the way what will be maxErr allowed at b22?
I expect it to remain at 0.4500.
____________
My lucky number is 75898^524288+1 | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13879 ID: 53948 Credit: 383,481,681 RAC: 118,164
|
IMPORTANT UPDATE
As of GeneferCUDA for Windows V1.06, the default block size has been changed to the following:
For N <= 65536 the block size is 6
(This will use a lot of CPU, even a full core, but the speed increase is very significant. Neither Boinc nor PRPNet is currently using GeneferCUDA with N values this low, however.)
For 131072 <= N <= 1048576 the block size is 7
(This makes this version especially useful for PRPNet as well as Boinc.)
For N >= 2097152 the block size is 8
These changes should generally produce speed improvements on most systems. You are encouraged, however, to run the appropriate -b2 test to see which is best for your hardware.
This does not apply to the Linux distribution of GeneferCUDA. The Linux build may need different tuning than the Windows build, so Ronald is in a better position to address the best values for Linux.
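The V1.06 Windows defaults above amount to a simple lookup on N. Here is a minimal Python sketch of that mapping. The function name is hypothetical and not part of GeneferCUDA, and the sketch assumes N is a power of two (as every N in this thread is), since the announcement does not cover 65536 < N < 131072.

```python
def windows_v106_default_block_size(n: int) -> int:
    """Return the GeneferCUDA V1.06 (Windows) default block size (SHIFT) for N.

    Hypothetical helper mirroring the announced table; assumes N is a power of two.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("N is expected to be a power of two")
    if n <= 65536:
        return 6          # very CPU-hungry, but much faster at small N
    if n <= 1048576:
        return 7          # 131072 <= N <= 1048576
    return 8              # N >= 2097152


if __name__ == "__main__":
    for n in (262144, 524288, 4194304):
        print(n, windows_v106_default_block_size(n))
```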
____________
My lucky number is 75898^524288+1 | |
|
|
These are the benchmarks from my old, screwed-up OS. The lag is much better at SHIFT=7 and very acceptable. At SHIFT=6 it virtually disappears, apart from minor lags here and there. At SHIFT=5, though, the lag begins to return. CPU usage is also strangely low: 2-3% at SHIFT=7, 5-8% at SHIFT=6, and 20-24% at SHIFT=5. Oh, and even at the default SHIFT=8 there is now a slight improvement, but the lag is still far from acceptable.
Generalized Fermat Number Bench 2 -b2 18
SHIFT=5 710492^262144+1 Time: 1.04 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=6 710492^262144+1 Time: 804 us/mul. Err: 1.88e-001 1533952 digits
SHIFT=7 710492^262144+1 Time: 771 us/mul. Err: 2.03e-001 1533952 digits
SHIFT=8 710492^262144+1 Time: 798 us/mul. Err: 1.88e-001 1533952 digits
SHIFT=9 710492^262144+1 Time: 1.29 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=10 710492^262144+1 Time: 2.84 ms/mul. Err: 1.88e-001 1533952 digits
Generalized Fermat Number Bench 2 -b2 19
SHIFT=5 577098^524288+1 Time: 2.07 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=6 577098^524288+1 Time: 1.63 ms/mul. Err: 1.72e-001 3020555 digits
SHIFT=7 577098^524288+1 Time: 1.5 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=8 577098^524288+1 Time: 1.51 ms/mul. Err: 1.88e-001 3020555 digits
SHIFT=9 577098^524288+1 Time: 1.8 ms/mul. Err: 1.72e-001 3020555 digits
SHIFT=10 577098^524288+1 Time: 3.31 ms/mul. Err: 1.72e-001 3020555 digits
Generalized Fermat Number Bench 2 -b2 20
SHIFT=5 468750^1048576+1 Time: 4.18 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=6 468750^1048576+1 Time: 3.31 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=7 468750^1048576+1 Time: 3.06 ms/mul. Err: 1.56e-001 5946413 digits
SHIFT=8 468750^1048576+1 Time: 2.95 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=9 468750^1048576+1 Time: 3 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=10 468750^1048576+1 Time: 4.39 ms/mul. Err: 1.72e-001 5946413 digits
Generalized Fermat Number Bench 2 -b2 21
SHIFT=5 380742^2097152+1 Time: 8.56 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=6 380742^2097152+1 Time: 6.86 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=7 380742^2097152+1 Time: 6.51 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=8 380742^2097152+1 Time: 6.23 ms/mul. Err: 1.64e-001 11703432 digits
SHIFT=9 380742^2097152+1 Time: 6.01 ms/mul. Err: 1.72e-001 11703432 digits
SHIFT=10 380742^2097152+1 Time: 7.01 ms/mul. Err: 1.76e-001 11703432 digits
Generalized Fermat Number Bench 2 -b2 22
SHIFT=5 60000^4194304+1 Time: 17.3 ms/mul. Err: 6.35e-003 20041019 digits
SHIFT=6 60000^4194304+1 Time: 13.8 ms/mul. Err: 6.10e-003 20041019 digits
SHIFT=7 60000^4194304+1 Time: 13 ms/mul. Err: 5.86e-003 20041019 digits
SHIFT=8 60000^4194304+1 Time: 12.6 ms/mul. Err: 6.35e-003 20041019 digits
SHIFT=9 60000^4194304+1 Time: 12.1 ms/mul. Err: 5.86e-003 20041019 digits
Strangely, at -b2 22 it quit after SHIFT=9, so there is no SHIFT=10 result.
NeoMetal*
____________
Largest Primes to Date:
As Double Checker: SR5 109208*5^1816285+1 Dgts-1,269,534
As Initial Finder: SR5 243944*5^1258576-1 Dgts-879,713
| |
|
|
IMPORTANT UPDATE
As of GeneferCUDA for Windows V1.06, the default block size has been changed to the following:
For N <= 65536 the block size is 6
(This will use a lot of CPU, even a full core, but the speed increase is very significant. Neither Boinc nor PRPNet is currently using GeneferCUDA with N values this low, however.)
For 131072 <= N <= 1048576 the block size is 7
(This makes this version especially useful for PRPNet as well as Boinc.)
For N >= 2097152 the block size is 8
These changes should generally produce speed improvements on most systems. You are encouraged, however, to run the appropriate -b2 test to see which is best for your hardware.
This does not apply to the Linux distribution of GeneferCUDA. The Linux build may need different tuning than the Windows build, so Ronald is in a better position to address the best values for Linux.
What is the current N and when will it change? | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13879 ID: 53948 Credit: 383,481,681 RAC: 118,164
|
What is the current N and when will it change?
262144 is the current N and the schedule going forward is still under discussion.
____________
My lucky number is 75898^524288+1 | |
|
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
|
boinc@vmware2k-3:~/Cuda/test/Genefer$ time ./GeneferCUDA-boinc -b
GeneferCUDA-boinc 1.06 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
./GeneferCUDA-boinc -b
Generalized Fermat Number Bench
2009574^8192+1 Time: 155 us/mul. Err: 1.88e-01 51636 digits
1632282^16384+1 Time: 144 us/mul. Err: 1.72e-01 101791 digits
1325824^32768+1 Time: 172 us/mul. Err: 1.88e-01 200622 digits
1076904^65536+1 Time: 215 us/mul. Err: 1.72e-01 395325 digits
874718^131072+1 Time: 310 us/mul. Err: 1.88e-01 778813 digits
710492^262144+1 Time: 255 us/mul. Err: 1.72e-01 1533952 digits
577098^524288+1 Time: 346 us/mul. Err: 1.72e-01 3020555 digits
468750^1048576+1 Time: 525 us/mul. Err: 1.64e-01 5946413 digits
380742^2097152+1 Time: 1.46 ms/mul. Err: 1.64e-01 11703432 digits
309258^4194304+1 Time: 567 us/mul. Err: 1.56e-01 23028076 digits
100^8388608+1 Time: 8.39 ms/mul. Err: 3.81e-05 16777217 digits
real 23m3.409s
user 1m30.650s
sys 0m44.859s
boinc@vmware2k-3:~/Cuda/test/Genefer$ time ./GeneferCUDA-boinc -b2 18
GeneferCUDA-boinc 1.06 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
./GeneferCUDA-boinc -b2 18
Generalized Fermat Number Bench 2
SHIFT=5 710492^262144+1 Time: 1.64 ms/mul. Err: 1.88e-01 1533952 digits CPU=75
SHIFT=6 710492^262144+1 Time: 531 us/mul. Err: 1.88e-01 1533952 digits CPU=28
SHIFT=7 710492^262144+1 Time: 255 us/mul. Err: 1.72e-01 1533952 digits CPU=14
SHIFT=8 710492^262144+1 Time: 183 us/mul. Err: 1.88e-01 1533952 digits CPU=10
SHIFT=9 710492^262144+1 Time: 189 us/mul. Err: 1.88e-01 1533952 digits CPU=6
SHIFT=10 710492^262144+1 Time: 186 us/mul. Err: 1.72e-01 1533952 digits CPU=3
real 3m14.141s
user 0m30.826s
sys 0m0.512s
boinc@vmware2k-3:~/Cuda/test/Genefer$ time ./GeneferCUDA-boinc -b2 19
GeneferCUDA-boinc 1.06 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
./GeneferCUDA-boinc -b2 19
Generalized Fermat Number Bench 2
SHIFT=5 577098^524288+1 Time: 3.46 ms/mul. Err: 1.80e-01 3020555 digits CPU=77-79
SHIFT=6 577098^524288+1 Time: 895 us/mul. Err: 1.72e-01 3020555 digits CPU=22
SHIFT=7 577098^524288+1 Time: 349 us/mul. Err: 1.72e-01 3020555 digits CPU=9
SHIFT=8 577098^524288+1 Time: 210 us/mul. Err: 1.72e-01 3020555 digits CPU=6
SHIFT=9 577098^524288+1 Time: 194 us/mul. Err: 1.88e-01 3020555 digits CPU=4
SHIFT=10 577098^524288+1 Time: 191 us/mul. Err: 1.88e-01 3020555 digits CPU=2
real 5m9.052s
user 0m52.947s
sys 0m2.828s
boinc@vmware2k-3:~/Cuda/test/Genefer$ time ./GeneferCUDA-boinc -b2 20
GeneferCUDA-boinc 1.06 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
./GeneferCUDA-boinc -b2 20
Generalized Fermat Number Bench 2
SHIFT=5 468750^1048576+1 Time: 8.12 ms/mul. Err: 1.72e-01 5946413 digits CPU=91
SHIFT=6 468750^1048576+1 Time: 4.3 ms/mul. Err: 1.72e-01 5946413 digits CPU=52-54
SHIFT=7 468750^1048576+1 Time: 530 us/mul. Err: 1.64e-01 5946413 digits CPU=7
SHIFT=8 468750^1048576+1 Time: 257 us/mul. Err: 1.64e-01 5946413 digits CPU=3
SHIFT=9 468750^1048576+1 Time: 188 us/mul. Err: 1.76e-01 5946413 digits CPU=2
SHIFT=10 468750^1048576+1 Time: 187 us/mul. Err: 1.88e-01 5946413 digits CPU=2
real 8m57.471s
user 1m54.707s
sys 0m26.590s
boinc@vmware2k-3:~/Cuda/test/Genefer$ time ./GeneferCUDA-boinc -b2 21
GeneferCUDA-boinc 1.06 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
./GeneferCUDA-boinc -b2 21
Generalized Fermat Number Bench 2
SHIFT=5 380742^2097152+1 Time: 17.9 ms/mul. Err: 1.64e-01 11703432 digits CPU=98
SHIFT=6 380742^2097152+1 Time: 13.1 ms/mul. Err: 1.68e-01 11703432 digits CPU=78
SHIFT=7 380742^2097152+1 Time: 1.46 ms/mul. Err: 1.64e-01 11703432 digits CPU=9
SHIFT=8 380742^2097152+1 Time: 343 us/mul. Err: 1.66e-01 11703432 digits CPU=3
SHIFT=9 380742^2097152+1 Time: 205 us/mul. Err: 1.56e-01 11703432 digits CPU=2
SHIFT=10 380742^2097152+1 Time: 188 us/mul. Err: 1.58e-01 11703432 digits CPU=1
real 17m28.288s
user 4m10.508s
sys 1m31.246s
boinc@vmware2k-3:~/Cuda/test/Genefer$ time ./GeneferCUDA-boinc -b2 22
GeneferCUDA-boinc 1.06 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
./GeneferCUDA-boinc -b2 22
Generalized Fermat Number Bench 2
SHIFT=5 309258^4194304+1 Time: 37.8 ms/mul. Err: 1.72e-01 23028076 digits CPU=100
SHIFT=6 309258^4194304+1 Time: 31.8 ms/mul. Err: 1.72e-01 23028076 digits CPU=90-92
SHIFT=7 309258^4194304+1 Time: 18 ms/mul. Err: 1.56e-01 23028076 digits CPU=51-54
SHIFT=8 309258^4194304+1 Time: 574 us/mul. Err: 1.56e-01 23028076 digits CPU=9
SHIFT=9 309258^4194304+1 Time: 294 us/mul. Err: 1.72e-01 23028076 digits CPU=1
SHIFT=10 309258^4194304+1 Time: 220 us/mul. Err: 1.72e-01 23028076 digits CPU=1
real 35m10.920s
user 10m1.898s
sys 5m7.015s
Some of these timings are confusing, as with all the other Linux versions.
My choices for the default values in the Linux app were based on one rule:
If the next higher shift value is not at least twice as fast as the lower one, then we have reached the optimal value.
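One way to express that rule in code: a SHIFT step "pays off" when its time per multiplication is at most half that of the SHIFT just below it, and the chosen value is the highest SHIFT reached by such a step. This reading is an assumption. It is the reading that yields 8 for the -b2 20 data, as Rainer works out for that run. The fallback to the plain fastest SHIFT when no step doubles the speed is also an added assumption, and the function name is hypothetical.

```python
def pick_shift_halving_rule(times: dict) -> int:
    """Pick a SHIFT from {shift: time_per_mul} using the 'twice as fast' rule.

    Interpretation (an assumption): keep the highest SHIFT whose time is at most
    half the time of the SHIFT just below it. If no step doubles the speed,
    fall back to the fastest SHIFT. Any time unit works if it is consistent.
    """
    shifts = sorted(times)
    chosen = None
    for lower, higher in zip(shifts, shifts[1:]):
        if times[higher] <= times[lower] / 2:   # higher SHIFT is at least 2x faster
            chosen = higher
    if chosen is None:
        chosen = min(shifts, key=lambda s: times[s])
    return chosen


# -b2 20 timings from the Linux run above, in microseconds per multiplication.
b2_20 = {5: 8120, 6: 4300, 7: 530, 8: 257, 9: 188, 10: 187}
print(pick_shift_halving_rule(b2_20))  # 8
```

The rule is only as good as the timings fed into it. Later in the thread, the Linux benchmark numbers themselves are described as unreliable.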
____________
Best wishes. Knowledge is power. by jjwhalen
| |
|
|
boinc@vmware2k-3:~/Cuda/test/Genefer$ time ./GeneferCUDA-boinc -b2 20
GeneferCUDA-boinc 1.06 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
./GeneferCUDA-boinc -b2 20
Generalized Fermat Number Bench 2
SHIFT=5 468750^1048576+1 Time: 8.12 ms/mul. Err: 1.72e-01 5946413 digits CPU=91
SHIFT=6 468750^1048576+1 Time: 4.3 ms/mul. Err: 1.72e-01 5946413 digits CPU=52-54
SHIFT=7 468750^1048576+1 Time: 530 us/mul. Err: 1.64e-01 5946413 digits CPU=7
SHIFT=8 468750^1048576+1 Time: 257 us/mul. Err: 1.64e-01 5946413 digits CPU=3
SHIFT=9 468750^1048576+1 Time: 188 us/mul. Err: 1.76e-01 5946413 digits CPU=2
SHIFT=10 468750^1048576+1 Time: 187 us/mul. Err: 1.88e-01 5946413 digits CPU=2
real 8m57.471s
user 1m54.707s
sys 0m26.590s
If a higher shift value is not twice as fast as the lower value then we have reached an optimal value.
By this logic, the shift value for -b2 20 should be 8: 257 us is less than half of 530 us, so SHIFT=8 is more than twice as fast as SHIFT=7, while SHIFT=9 (188 us) is not twice as fast as SHIFT=8.
And this is consistent with the values Michael found for Windows.
Rainer
____________
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13879 ID: 53948 Credit: 383,481,681 RAC: 118,164
|
As I have been discussing with Ronald privately, you should ignore the benchmark results under Linux. We haven't worked the kinks out yet and the numbers you're seeing are useless. They are, plain and simple, completely wrong, unless, somehow, you've found a way to overclock your GPU to around 20 GHz or so without the chips melting into a molten pile of glass, your power supply exploding in flames, and the master circuit breaker in your house tripping. The benchmarks are reporting numbers that are approximately ten times faster than these GPUs are capable of.
The benchmark problem is not unique to the Linux version; Shoichiro and I have both seen this exact problem before. Until it's solved, however, the benchmarks in the Linux version are, unfortunately, meaningless. Any pattern you see in the results is merely a coincidence.
For the time being, if you want to know what shift values are best on your system, grab a stopwatch and run GeneferCUDA from the command line computing actual numbers. Time how long it takes to do a set number of iterations at each block setting.
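The stopwatch method comes down to dividing elapsed time by the number of iterations and comparing block sizes. The sketch below assumes you supply a `run` callable that launches GeneferCUDA for a fixed number of iterations at a given block size. How to do that (command line, flags) is not covered here, and the helper names are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


def stopwatch(run: Callable[[], None]) -> float:
    """Wall-clock seconds taken by run(); a stand-in for a manual stopwatch."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def ms_per_iteration(seconds: float, iterations: int) -> float:
    """Convert a stopwatch reading into milliseconds per iteration."""
    return seconds * 1000.0 / iterations


def fastest_block_size(measurements: Dict[int, Tuple[float, int]]) -> Tuple[int, Dict[int, float]]:
    """measurements maps block size -> (elapsed seconds, iterations completed)."""
    rates = {bs: ms_per_iteration(sec, it) for bs, (sec, it) in measurements.items()}
    best = min(rates, key=rates.get)
    return best, rates


# Hypothetical stopwatch readings for 10,000 iterations at each setting.
best, rates = fastest_block_size({7: (120.0, 10000), 8: (110.0, 10000), 9: (115.0, 10000)})
print(best, rates[best])  # 8 11.0
```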
Chances are the defaults in the program are the best settings.
____________
My lucky number is 75898^524288+1 | |
|
Neo Volunteer tester
Joined: 28 Oct 10 Posts: 710 ID: 71509 Credit: 91,178,992 RAC: 0
|
GT430 stock - 16,152.05 secs / 15.44 secs CPU
Block size 7 - GPU temp 62 degrees
Neo | |
|
|
I have run the world record wu's on my GTX590 with the block size set to zero.
Run times were about 84 hours....
So how important is the block size? Do I need to change it? | |
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1943 ID: 352 Credit: 5,926,042,460 RAC: 1,458,176
|
Generally, I don't think there is a need to change it.
In this thread you will find some real numbers from testing.
You can run some benchmarks on your configuration and then set it manually if it makes any difference.
____________
My stats | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13879 ID: 53948 Credit: 383,481,681 RAC: 118,164
|
I have run the world record wu's on my GTX590 with the block size set to zero.
Run times were about 84 hours....
So how important is the block size? Do I need to change it?
There's probably no reason for most people to ever change it.
You could run the "-b2 22" benchmarks which would tell you the actual speed for each block size setting on your hardware. If you're running under Windows, the default size is 8, so if that's the fastest on your computer, you can just leave the setting at zero. If another block size is faster, you can change the setting if you wish to get a little more speed.
Under Linux and Mac, with the new version (3.2.0-0) of Genefer that isn't released yet, the default setting for the World Record tasks is 9. So if you're running Linux, you might want to set the value to 9. I'm not sure off the top of my head what the current software uses for a default under Linux. We had trouble setting it correctly because the benchmarks weren't running correctly under Linux.
For the "short" WUs, the default setting is 7 for all platforms. You can test for the best setting for the short WUs on your hardware by running the "-b2 19" benchmark test.
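If you'd rather not read the -b2 tables by eye, a short Python sketch can pull out the per-SHIFT times and report the fastest. It assumes the output format shown in this thread (`SHIFT=n ... Time: x ms/mul.` or `us/mul.`). The function names are hypothetical, and the result is only meaningful if the benchmark itself reports correct timings.

```python
import re

# Matches lines such as "SHIFT=7 577098^524288+1 Time: 2.18 ms/mul. Err: ..."
BENCH2_LINE = re.compile(r"SHIFT=(\d+)\s+\S+\s+Time:\s+([\d.]+)\s+(ms|us)/mul\.")


def parse_bench2(text: str) -> dict:
    """Return {SHIFT: microseconds per multiplication} parsed from -b2 output."""
    times = {}
    for m in BENCH2_LINE.finditer(text):
        value = float(m.group(2))
        # Normalise everything to microseconds so ms and us rows compare correctly.
        times[int(m.group(1))] = value * 1000.0 if m.group(3) == "ms" else value
    return times


def fastest_shift(text: str) -> int:
    """Return the SHIFT with the lowest time per multiplication."""
    times = parse_bench2(text)
    if not times:
        raise ValueError("no SHIFT timing lines found")
    return min(times, key=times.get)


sample = """\
SHIFT=5 710492^262144+1 Time: 1.04 ms/mul. Err: 1.88e-001 1533952 digits
SHIFT=6 710492^262144+1 Time: 804 us/mul. Err: 1.88e-001 1533952 digits
SHIFT=7 710492^262144+1 Time: 771 us/mul. Err: 2.03e-001 1533952 digits
SHIFT=8 710492^262144+1 Time: 798 us/mul. Err: 1.88e-001 1533952 digits
"""
print(fastest_shift(sample))  # 7
```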
____________
My lucky number is 75898^524288+1 | |
|
|
Hello!
I'm preparing for the upcoming GPU challenge.
I've installed the new 301.xx Windows x86 NVIDIA drivers and I have terrible screen lag. Computation time has also increased sharply, to 3-4 times what it was with the 285.xx drivers.
I use default settings.
My GPUs are a pair of GTX 460 under W7 x86.
Need some advice!
____________
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13879 ID: 53948 Credit: 383,481,681 RAC: 118,164
|
Hello!
I'm preparing for the upcoming GPU challenge.
I've installed the new 301.xx Windows x86 NVIDIA drivers and I have terrible screen lag. Computation time has also increased sharply, to 3-4 times what it was with the 285.xx drivers.
I use default settings.
My GPUs are a pair of GTX 460 under W7 x86.
Need some advice!
I haven't personally tried the 301 drivers, but you're the first person that's reported a problem. I suspect that the drivers themselves aren't the problem -- or at least aren't the entire problem.
The first thing to try is to go back to the 285 drivers and see if the problem goes away. If the problem remains, then you know something else is the cause. If the problem goes away, then you know the driver has something to do with the problem.
Aside from that, the only other advice I have is that almost all other people experiencing lag problems eventually were able to trace the problem to some other software running on the computer. That software's use of the GPU (usually for normal graphics display, not CUDA computing) interacted poorly with Genefer's high usage of the GPU. I don't know, however, why you would see different behavior with the new driver.
You're running the 32-bit version of Win7? That's somewhat unusual, so if there's a problem related to only the x86 installation that could explain why you're the only one to experience this problem.
Another thing you could try -- pull one of the GTX 460s out and see if the problem goes away with only one GPU.
____________
My lucky number is 75898^524288+1 | |
|
|
A tip for x86 (32-bit) Windows users:
I've tested two block size values: 0 (the default) and 5.
With either value the GPU load is the same (99%), but :) I used a watt meter to see whether there was a difference.
My results:
-Block size 0: 244 watts
-Block size 5: 206 watts (even though the load stays at 99%, this shows a real difference between the two values)
Visually, the default value of 0 is terrible on 32-bit Windows.
A value of 5 is more acceptable (little lag), and you can keep using your computer for other things (web browsing, etc.).
Note: I use a GTX 260 v2 with the NVIDIA 310 beta driver. I think a value of 5 may work for other NVIDIA graphics cards too.
Sorry for the bad English :)
One other thing: with either 0 or 5, I get the same CPU load (2%).
____________
| |
|
|
NVIDIA GeForce GTX 660 Ti (2048MB) driver: 30697
Microsoft Windows 8
x64 Edition, (06.02.9200.00)
My graphics card has problems with the right setting.
It is factory overclocked.
Settings 0, 7, and 8 all give errors and crash.
What should I do?
____________
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13879 ID: 53948 Credit: 383,481,681 RAC: 118,164
|
NVIDIA GeForce GTX 660 Ti (2048MB) driver: 30697
Microsoft Windows 8
x64 Edition, (06.02.9200.00)
My graphics card has problems with the right setting.
It is factory overclocked.
Settings 0, 7, and 8 all give errors and crash.
What should I do?
You can unhide your computers if you want me, or anyone, to look at the details, but in general:
A) The block size has nothing to do with crashes, errors, or reliability, so you can leave it at 0 until you solve the real problem. The only purpose for this setting is to reduce screen lag.
B) You say your card is overclocked, and overclocking doesn't usually work with GeneferCUDA. You can try setting your GPU to run at stock speeds and see if that resolves the problem. The fact that it's factory overclocked doesn't matter; it's still overclocked and that usually won't work.
C) If you're having trouble running the long WR tasks, try running the short tasks instead. Some systems are experiencing difficulties running the long tasks for reasons that are not yet fully understood.
____________
My lucky number is 75898^524288+1 | |
|
|
Mike,
ok
____________
| |
|
|
New test: block size setting "0" with an NVIDIA GTX 660 Ti, 3.4-4.0 GHz
OS: Windows 8
Overclocked
It is running now.
CPU usage: 3% (high) / 1.0% (low)
____________
| |
|
|
genefercuda 3.1.2-2 (Windows 32-bit CUDA)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2013, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: genefercuda.exe -q 10406^4194304+1
Generalized Fermat Number Bench 2
SHIFT=5 10406^4194304+1 Time: 13.4 ms/mul. Err: 1.68e-004 16849710 digits
SHIFT=6 10406^4194304+1 Time: 9.72 ms/mul. Err: 1.83e-004 16849710 digits
SHIFT=7 10406^4194304+1 Time: 8.99 ms/mul. Err: 1.75e-004 16849710 digits
SHIFT=8 10406^4194304+1 Time: 8.58 ms/mul. Err: 1.68e-004 16849710 digits
SHIFT=9 10406^4194304+1 Time: 8.47 ms/mul. Err: 1.75e-004 16849710 digits
SHIFT=10 10406^4194304+1 Time: 8.73 ms/mul. Err: 1.72e-004 16849710 digits
Best SHIFT determined experimentally to be 9.
Testing 10406^4194304+1...
Using AUTO-SHIFT=9
Starting initialization...
maxErr during b^N initialization = 0.0000 (63.420 seconds).
Estimated total run time for 10406^4194304+1 is 131:22:56
CPU (i7-2600K 4500MHz) usage: initialization 100% (1.00), run ~0% (~0.01).
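The digit count and run-time estimate in that output can be roughly reproduced by hand. Here is a minimal sketch. It assumes the PRP test needs about N·log2(b) modular squarings and that each one costs the benchmarked ms/mul. This is an approximation, not Genefer's actual estimator, and the function names are hypothetical. With 8.47 ms/mul (SHIFT=9), it gives about 131.7 hours for 10406^4194304+1, close to the 131:22:56 reported above.

```python
import math


def gfn_digits(b: int, n: int) -> int:
    """Decimal digits of b^n + 1 (same as b^n for the large n used here)."""
    return math.floor(n * math.log10(b)) + 1


def estimated_hours(b: int, n: int, ms_per_mul: float) -> float:
    """Rough run time, assuming about n*log2(b) squarings at ms_per_mul each."""
    squarings = n * math.log2(b)
    return squarings * ms_per_mul / 1000.0 / 3600.0


print(gfn_digits(10406, 4194304))                       # 16849710
print(round(estimated_hours(10406, 4194304, 8.47), 1))  # 131.7
```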
| |
|
|
Done!
Look here: http://www.primegrid.com/results.php?hostid=374938&offset=0&show_names=0&state=0&appid=17
Guys, crunch what you can.
Look at these credits and crunch.
(182 hours)
363,638.98 credits for one task, the highest credit of all time in BOINC. This is called a "World Record"!!!
Mike,
your credits: you'll have to wait.
And now on to the next task, hehe...
____________
| |
|
|
You could run the "-b2 22" benchmarks which would tell you the actual speed for each block size setting on your hardware.
I hope I'm missing something obvious, but after poking around in a few threads it's not clear to me - where can I get the latest benchmark app? The beta links are dead (probably since we are past the beta).
OK, now that I just re-read a couple of posts it appears it's just a switch for the current Genefer apps - off to go do some experimenting. | |
|
|
I believe the current "stock" app self-benchmarks when it starts a task, so there's no need to ever change the preferences page block size setting from 0. The app will pick the best block size for your box. The amount of time it takes to run the benchmarks is trivial compared to the amount of time the work unit itself will take, so that's not a concern.
The above assumes you're not running a different version with app_info.xml.
--Gary | |
|
Message boards :
Generalized Fermat Prime Search :
GeneferCUDA Block Size Setting |