Message boards : Generalized Fermat Prime Search : GFN run times
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
For reference, I figured I'd post some benchmark run times. We search for GFNs of the form b^N+1. N, by far, has the biggest impact on run time, although b does play a smaller role.
The lowest value of N that's currently being searched using GeneferCUDA is 262144 (2^18), and the largest we can reasonably search right now is 4194304 (2^22). So these are the expected run times for those values of N. Times are based on my GTX 460, which is maybe half the speed of the fastest GeForce GPU and double the speed of the slowest GPU that can run GeneferCUDA; it's more or less in the middle of the pack of potentially usable GPUs. In practice, most people seem to have faster GPUs rather than slower, so your run times will likely be lower than these benchmarks. If you happen to have access to Tesla GPUs, I would expect substantially higher performance due to the much stronger DP throughput of those $2000 GPUs. (Amazon rents dual-Tesla servers for as little as $0.67 per hour at their spot rate.)
N=262144: 90 minutes (Available on PRPNet; currently running on Boinc)
N=524288: 5 hours (Available on PRPNet)
N=1048576: 17 hours
N=2097152: 2.5 days
N=4194304: 8 days (Eventually to be available on Boinc)
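A quick way to see the scaling in these numbers is to take the ratio between successive N values. A minimal Python check, using only the benchmark figures above (days converted to hours):

```python
# Run-time growth per doubling of N, from the GTX 460 benchmarks above
# (90 minutes = 1.5 h, 2.5 days = 60 h, 8 days = 192 h).
times_hours = {262144: 1.5, 524288: 5, 1048576: 17, 2097152: 60, 4194304: 192}
ns = sorted(times_hours)
for lo, hi in zip(ns, ns[1:]):
    print(f"N={lo} -> N={hi}: x{times_hours[hi] / times_hours[lo]:.1f}")
# Each doubling of N multiplies the run time by roughly 3.2x-3.5x.
```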
____________
My lucky number is 75898^524288+1
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2417 ID: 1178 Credit: 20,010,728,651 RAC: 19,979,343
I would argue that the GTX 460 is a bit above the middle, and I would caution that, unlike the sieve applications, the number of shaders does not correspond to performance in the same direct way. For example, on the 262k units, my GTX 460 does a bit better (it's probably higher clocked than Mike's) at 85 minutes, but my OEM GTX 560 Ti does the same work in about 65 minutes (shader clocks on the two cards are the same). That's a 336 shader card vs. a 352 shader card (about 5% more shaders), but the performance increase is about 23%. It is likely that the memory bus width (192-bit vs. 320-bit) accounts for some of this.
When considering a card for the Genefer project, one will need to take a more comprehensive look at features other than just shaders (e.g., DP power; Quadros and Teslas have much more than GeForce cards).
On a related note, common low end cards (e.g., the 96 shader models) are running for me at about 4 to 5 hours for the 262k units. The slowest DP-capable cards from NVidia have only 48 shaders, and these will run much slower (probably not advised for the N=4194304 work... though Admin will need to take these into consideration when setting work/report/purge deadlines).
____________
141941*2^4299438-1 is prime!
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
On a related note, common low end cards (e.g., the 96 shader models) are running for me at about 4 to 5 hours for the 262k units. The slowest DP-capable cards from NVidia have only 48 shaders, and these will run much slower (probably not advised for the N=4194304 work... though Admin will need to take these into consideration when setting work/report/purge deadlines).
Well, yeah. If someone wants to go searching for the world's largest known prime number on a laptop, it might take a while.
I think the smallest non-mobile DP GPU (GTX 260) has 192 shaders, although I might be mistaken. There are a lot of DP GPUs.
FWIW, it seems like the majority of the GPUs being used are 500-series cards. I'm usually the slowest of the wingmen in any WU. People sure do seem to like their teraflops. ;-)
____________
My lucky number is 75898^524288+1
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2417 ID: 1178 Credit: 20,010,728,651 RAC: 19,979,343
Well, yeah. If someone wants to go searching for the world's largest known prime number on a laptop, it might take a while.
On the low end ones, but there are actually some fairly hefty mobile options out there (e.g., the GTX 485M and 580M match up fairly well with the GTX 560 Ti desktop model).
I think the smallest non-mobile DP GPU (GTX 260) has 192 shaders, although I might be mistaken. There are a lot of DP GPUs.
FWIW, it seems like the majority of the GPUs being used are 500-series cards. I'm usually the slowest of the wingmen in any WU. People sure do seem to like their teraflops. ;-)
Slowest capable desktop cards are probably the GT 420 (OEM), GeForce 510 (OEM), and GT 520...all have 48 shaders and should run at about half what a GT430/GT530 will do (well, the 510 card has really slow stock clocks, so even less than half might be more accurate). I think most will avoid using these cards on such a big search, but since many of the low end cards are OEM, many new users will likely give it a try with what they've got.
I'd suggest that, much like the PRPnet setup, the Genefer task be divided as separate projects based on N when we move out of beta into production mode. I might not mind running the biggest ones on my 460 or 560, but even on those (or my GTS 450) running the 1048576 or 2097152 separately would be a nice option.
____________
141941*2^4299438-1 is prime!
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
Slowest capable desktop cards are probably the GT 420 (OEM), GeForce 510 (OEM), and GT 520...all have 48 shaders and should run at about half what a GT430/GT530 will do (well, the 510 card has really slow stock clocks, so even less than half might be more accurate). I think most will avoid using these cards on such a big search, but since many of the low end cards are OEM, many new users will likely give it a try with what they've got.
The 420 is listed as CC 1.0. That doesn't sound right. There's no mention of a 510 that I could find, but the 520 does indeed have 48 shaders.
So point to Scott. :)
I'd suggest that, much like the PRPnet setup, the Genefer task be divided as separate projects based on N when we move out of beta into production mode. I might not mind running the biggest ones on my 460 or 560, but even on those (or my GTS 450) running the 1048576 or 2097152 separately would be a nice option.
That was my suggestion also, but it seems that it's problematic. Boinc wasn't designed to run multiple projects like this and the number of projects PrimeGrid already has is stressing the server. There's probably going to only be one GFN project on the BOINC side. I suppose we can always direct people with slower cards over to the PSA.
If lots of people with laptops with minimal GPUs that only run 4 hours per day start grabbing WUs and causing lots of result timeouts, I imagine I could change the program to abort if some minimum spec isn't met. Perhaps it could be done on the server side, but I don't know if the plan_class mechanism can be that specific.
Oh, my... you know, I didn't realize that such slow DP cards existed. When I first started working on this, I thought the 260 was the smallest DP card.
I wonder if those slower cards have enough memory to run Genefer?
Let's see...
The 420 -- this can't be right! Not only does Nvidia list it as CC 1.0, but it also supposedly has 2GB of ram?
The 430 also has 2GB in the OEM configuration, 1GB otherwise.
The 520 has 1GB.
The minimum required by the plan_class is half a gig, so the cards would pass that test.
Ah, I found the 510. It's not on their master list for some reason.
I guess this falls into the same category as selecting SoB to run on your Netbook. Probably not a good combination.
____________
My lucky number is 75898^524288+1
I was just wondering what your shaders are clocked at, since my current jobs are taking 81 minutes on average on a 460 at 1600 MHz shaders. My 460 has 1 GB of memory and no jobs running on the CPUs.
On a side note, your listed time of 8 days for future work sounds rather daunting.
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
Mine runs at stock clocks.
____________
My lucky number is 75898^524288+1
I'm running two types of the low end DP cards on my hosts. Both were tested with the BOINC GeneferCUDA app at N=262144.
GT520:
http://www.primegrid.com/show_host_detail.php?hostid=193090
Runtime: 30,936.67 sec
GT430:
http://www.primegrid.com/show_host_detail.php?hostid=151963
Runtime: 27,022.58 sec
The 430 also has 2GB in the OEM configuration, 1GB otherwise.
I purchased one from Zotac with only 512MB. Maybe that's because it's a PCI version, not PCIe. Reference
Ah, I found the 510. It's not on their master list for some reason.
I think that's because it's a new model. Its DP capability is roughly 8 GFLOPS; my GT520 has about 13 GFLOPS DP.
The 420 -- this can't be right! Not only does Nvidia list it as CC 1.0, but it also supposedly has 2GB of ram?
The GT420 has half the DP capability of my GT430 (22 GFLOPS), so roughly 11 GFLOPS.
Regards, Odi
____________
I'd suggest that, much like the PRPnet setup, the Genefer task be divided as separate projects based on N when we move out of beta into production mode. I might not mind running the biggest ones on my 460 or 560, but even on those (or my GTS 450) running the 1048576 or 2097152 separately would be a nice option.
That was my suggestion also, but it seems that it's problematic. Boinc wasn't designed to run multiple projects like this and the number of projects PrimeGrid already has is stressing the server. There's probably going to only be one GFN project on the BOINC side. I suppose we can always direct people with slower cards over to the PSA.
I would suggest that the admins add two checkboxes in preferences: one for the lower Ns (262144, 524288) and a second for the higher Ns (1048576 and up), with a note recommending the higher Ns for GTX cards (and the GTS 450) and that those cards are preferred there, since smaller/slower cards may not be able to finish before time expires. That gives the faster cards a choice as well. It should be little problem for the server, as it's similar to adding just two more subprojects instead of one. I suggest that rather than dividing up every N for choice (although that would be nice), as that would strain the server much more.
By the way, my 460 is OC'd to 1900 and finishes the 262144s in 72 minutes.
NeoMetal*
____________
Largest Primes to Date:
As Double Checker: SR5 109208*5^1816285+1 Dgts-1,269,534
As Initial Finder: SR5 243944*5^1258576-1 Dgts-879,713
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3246 ID: 50683 Credit: 152,646,050 RAC: 18,212
Excuse me if I am wrong, but it looks like on the same range (262144) my card is faster on Boinc (64 minutes per WU) than on PRPNet (90 minutes per WU).
Is the new 1.05 app really faster, or are there other factors influencing the speed?
____________
92*10^1585996-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie!
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
Excuse me if I am wrong, but it looks like on the same range (262144) my card is faster on Boinc (64 minutes per WU) than on PRPNet (90 minutes per WU).
Is the new 1.05 app really faster, or are there other factors influencing the speed?
I'm just that good.
Just kidding, of course -- but the boinc version of the app IS faster than the PRPNet version, and it's fully compatible, so feel free to copy the Boinc version over to the PRPNet directories and use it instead. You'll need to rename it to GeneferCUDA.exe. That's what I do.
Details:
The Boinc version is compiled with an older CUDA toolkit, which is about 5% to 10% faster than the most recent toolkit. That's where most of the difference in speed comes from.
I also rewrote the initialization of GeneferCUDA, which was fine at lower Ns but not so good at bigger Ns. At N=262144 this saves about 30 seconds. At N=4194304 it saves two hours.
Lastly -- and this one won't work on PRPNet yet, sorry -- the boinc version has a tuning option that can add some additional speed. For example, changing the block size parameter from its default (8 or 0) to 7 reduces run time on my machine by 5 minutes. Boinc can take advantage of this, and you can use this from the command line if you're running GeneferCUDA by hand, but PRPNet currently has no way to take advantage of this.
By the way, it's not just 1.05 that's faster. V1.00 of the Boinc app had the same CUDA toolkit speed advantage. The initialization and block size speed improvements came in versions 1.03 and 1.05, respectively.
That being said, I can't imagine how the boinc version could possibly cut 26 minutes off the 90 minute run time for one of these WUs. All told, you might get 10 or 15 minutes, but not 26. I'm not sure what's going on there. Your computers are hidden so I can't really look into it. 90 minutes is roughly how long it takes mid-range GPUs to do these WUs (GTX 460), whereas the fastest GeForce cards (GTX 580) complete them in about 50 to 60 minutes.
____________
My lucky number is 75898^524288+1
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3246 ID: 50683 Credit: 152,646,050 RAC: 18,212
We talked about this before: I know that CUDA 3.2 is faster than CUDA 4.0, and I even reverted to older drivers. But those were my run times. Maybe I am wrong, but it is simple to test: I will use the new 1.05 app and the app from GeneferPRPNet and compare, on the same machine, same OS, with the same card, and so on. Maybe I was wrong about the times.
When I reverted to CUDA 3.2 and older drivers I gained something like 5 minutes per WU, but when the first GENEFERCUDA finished yesterday I was surprised with a result of 3996 seconds per WU.
____________
92*10^1585996-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie!
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
I even reverted to older drivers.
The drivers don't seem to matter that much. Early indications that one driver was faster than another were due to idiosyncrasies in the benchmarks; the drivers don't seem to have any effect on speed. At least not any of the drivers I've tested.
By the way, when I want to do timing tests, I don't use either Boinc or PRPNet. To make the test as accurate as possible, I run Genefer from the command line. That way every test uses the exact same number, so the results are directly comparable.
I don't get any credit, of course, but getting more accurate information is more important to me.
____________
My lucky number is 75898^524288+1
Maybe this info will be useful for you:
Several months ago, when I was testing llrcuda on 2x GTX 460, I found one very interesting thing:
When I started two tasks at the same time, one window was active and one inactive. Interestingly, the task in the active window ran faster.
So I guess GeneferCUDA on PRPNet may have taken more time because its task was running in an inactive window.
____________
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3246 ID: 50683 Credit: 152,646,050 RAC: 18,212
I found "old logs" from 27.1.2011 when I run GeneferCuda from PRPNet
Run time with Cuda 4.0 was around 4130 seconds per WU , but switched to Cuda 3.2 dropped to 3985 sec per WU
So you was right, I was wrong, computation times difference in only 30 sec at top...
____________
92*10^1585996-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie!
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2417 ID: 1178 Credit: 20,010,728,651 RAC: 19,979,343
I have been going through results and compiling a list of performance data (now that the 1.05 version reports cards, shaders, clocks, etc. in the output file -- thanks for that, Mike!). I'll post a table of times by card when I get a chance. In any case, here are some rules of thumb on the performance side of things (note: these are based on the performance of b≈580k, N=262k work):
1) The most important thing (as is also true with the sieve GPU apps) is the number of shaders. More is better. However, compared to the sieve apps, where this is overwhelmingly the big factor, it is not as dominant a performance indicator with the Genefer app (for example, I have a 192 shader card with shorter times than some 336 shader cards, and a 352 shader card that is better than several cards with 448 shaders -- even better than a 480 shader card in one case).
2) Next most important is the shader clock. Higher clocks = better performance, but be careful here, as overclocking can cause problems (for example, my superclocked EVGA GTX 550 Ti is completing work in about 1 minute less than a stock clocked GTX 460; i.e., the much higher shader clock makes up for having 144 fewer shaders).
3) Double-precision capability of the card is also tremendously important (unlike with the sieve apps, where single-precision is all that matters). Generally, on consumer cards this won't be relevant, since the DP capability is a function of the number of shaders x shader clock. Basically, on Fermi cards the DP capability is one-eighth of the SP capability (if you are using one of the GTX 2xx cards, or Quadro or Tesla cards based on those chips, the number to use is one-twelfth of the SP capability). However, Fermi-based Quadro and Tesla cards have DP capability equal to one-half of the SP capability. This results in slower clocked cards with fewer shaders performing as well as much more powerful (at least in SP capability) cards. For example, a 256 shader Quadro 4000 clocked at only 1110 shader clock performs almost exactly the same as a 384 shader GTX 560 Ti with shader clocks at 1645 (see the sketch after this list).
4) Picking the best "shift" value is also very important. In general, with the 262k units, I have observed improvements of as little as 250 fewer seconds to as much as 1000 fewer seconds per work unit, depending on the card involved, when changing this value from 8 to 7. Keep in mind that shift values will need to be adjusted at different Genefer N (generally, higher N does better with a higher shift value). For example, I have observed a GTX 570 (480 shaders, 1566 clock) outperform a GTX 580 (512 shaders, 1566 clock) by about 100 seconds per work unit by using shift=7 on the former vs. shift=8 on the latter. The key advice here: find the optimal shift value for your card at the given N value being tested.
5) Memory bus width may also matter (more is better again), but the results here are much less conclusive. It may be that some threshold process is involved here rather than a linear one. Or it may be that certain ranges group together in performance (e.g., not much difference between 192-bit and 256-bit, but a bit of a jump at 320-bit).
**Note: shader clocks are reported here as listed in the Genefer output. To get the actual shader clock, use the closest multiple of 54 that does not exceed the reported shader clock (i.e., a shader clock reported as 1645 is actually clocked at 1620).**
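For what it's worth, rules 2 and 3 plus the clock note can be folded into a rough back-of-the-envelope calculator. A minimal Python sketch, assuming the usual 2 FLOPs per shader per clock for SP throughput; the example card specs are illustrative, not measurements:

```python
# Rough DP throughput estimate from the rules of thumb above.
# Assumption: SP GFLOPS = shaders * clock_MHz * 2 / 1000, then scaled
# by the DP/SP ratio quoted in rule 3 for each card family.

def actual_shader_clock(reported_mhz):
    """Largest multiple of 54 MHz not exceeding the reported clock (see note)."""
    return (reported_mhz // 54) * 54

def dp_gflops(shaders, shader_clock_mhz, family):
    """family: 'fermi_geforce' (1/8), 'gtx2xx' (1/12), 'fermi_quadro_tesla' (1/2)."""
    sp = shaders * shader_clock_mhz * 2 / 1000.0
    ratio = {"fermi_geforce": 1 / 8, "gtx2xx": 1 / 12, "fermi_quadro_tesla": 1 / 2}[family]
    return sp * ratio

print(actual_shader_clock(1645))                   # 1620
print(dp_gflops(256, 1110, "fermi_quadro_tesla"))  # Quadro 4000-style card
print(dp_gflops(384, 1645, "fermi_geforce"))       # GTX 560 Ti-style card
```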
____________
141941*2^4299438-1 is prime!
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3246 ID: 50683 Credit: 152,646,050 RAC: 18,212
Lastly -- and this one won't work on PRPNet yet, sorry -- the boinc version has a tuning option that can add some additional speed. For example, changing the block size parameter from its default (8 or 0) to 7 reduces run time on my machine by 5 minutes. Boinc can take advantage of this, and you can use this from the command line if you're running GeneferCUDA by hand, but PRPNet currently has no way to take advantage of this.
WOW!
I got a nice improvement from 3996 seconds to 3515, a saving of 481 seconds or about 8 minutes per WU, when dropping the block size from 8 to 7.
I will do more testing to see if this is the optimal value for my GPU.
____________
92*10^1585996-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie!
**Note: shader clocks are reported here as listed in the Genefer output. To get the actual shader clock, use the closest multiple of 54 that does not exceed the reported shader clock (i.e., a shader clock reported as 1645 is actually clocked at 1620).**
If people want to see their actual shader clock, use nVidia Inspector. It will show the actual shader clock along with the user's set clock. It's a no-frills app with just the basic info, but it does have a performance graph if you click the second button in the upper left corner. You can download it at http://downloads.guru3d.com/NVIDIA-Inspector-1.94-download-2612.html
____________
Largest Primes to Date:
As Double Checker: SR5 109208*5^1816285+1 Dgts-1,269,534
As Initial Finder: SR5 243944*5^1258576-1 Dgts-879,713
http://www.primegrid.com/result.php?resultid=343013945
33,763 seconds on a Xeon CPU X5365 @ 3.00GHz (Core2 based)
____________
Reno, NV
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
For reference, here are my estimated run times for larger N values where I've done some test runs. This supersedes the first post in this thread as I'm now including several different estimates at various b's, since it does vary somewhat, especially at the lowest b values.
All times are on a stock GTX 460. The fastest GPUs right now are almost twice as fast. The slowest GPUs (anything below a 450) are a lot slower. The very fastest, most expensive CPUs (with HT off and all other cores idle for maximum turbo boost) are probably about four times slower. YMMV.
N=1048576
(max b will be about 700,000)
b=14 4.5 hours
b=75898 20 hours
b=468750 22 hours
N=2097152
(max b will be about 570,000)
b=14 18 hours
b=75898 80 hours
b=380742 89 hours
N=4194304
(max b will be about 480,000)
b=14 72 hours
b=1248(*) 192 hours (12,986,467 digits) (actual run time; not an estimate)
b=309258 351 hours
N=8388608 (No search is currently planned at this N)
(max b will be about 405,000; fast init code currently limits b to about 100)
b=14 288 hours
b=36(**) 396 hours (13,055,212 digits)
b=100 522 hours
(*) This would be the world's largest known prime, except for one detail: it's composite. This is where we're in world record territory.
(**) If prime, this would be a new world record.
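Incidentally, the digit counts quoted above can be reproduced with the standard formula digits = floor(N*log10(b)) + 1. A quick Python check (plain arithmetic, nothing Genefer-specific):

```python
import math

def gfn_digits(b, N):
    # Decimal digits of b^N + 1 (the "+1" never changes the count here).
    return math.floor(N * math.log10(b)) + 1

print(gfn_digits(1248, 4194304))  # 12986467 -- matches the N=4194304 entry
print(gfn_digits(36, 8388608))    # 13055212 -- matches the N=8388608 entry
```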
____________
My lucky number is 75898^524288+1
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1963 ID: 352 Credit: 6,402,379,339 RAC: 2,512,564
All times are on a stock GTX 460. The fastest GPUs right now are almost twice as fast.
A GTX 580 at stock speed is exactly twice as fast as a GTX 460.
It looks like N=1048576 is where the GPU really starts to shine and is very efficient. From there, each doubling of N makes the run time ~4X longer (doubling N roughly doubles both the transform size and the number of iterations).
____________
My stats
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2417 ID: 1178 Credit: 20,010,728,651 RAC: 19,979,343
All times are on a stock GTX 460. The fastest GPUs right now are almost twice as fast.
A GTX 580 at stock speed is exactly twice as fast as a GTX 460.
It looks like N=1048576 is where the GPU really starts to shine and is very efficient. From there, each doubling of N makes the run time ~4X longer.
I am doubtful that these ratios among cards will hold up as we increase N, because I suspect that bus width is going to come into play much more at the larger N (but I could very well be wrong). If so, the ratio of the 580 to the 460 will be more than twice as fast (and slower cards, including the 450, will be proportionally worse). No matter what, the fairly modest speed differences at our current 262k test are going to become much more evident at these big N (i.e., a 5 minute difference at 262k is going to be measured in hours when we are doing the 4M units).
____________
141941*2^4299438-1 is prime!
Does the app checkpoint? Just curious.
____________
Reno, NV
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
Yes, and it checkpoints immediately when you shut it down, so you don't lose any processing time because the last checkpoint was an hour ago.
____________
My lucky number is 75898^524288+1
axn Volunteer developer
Joined: 29 Dec 07 Posts: 285 ID: 16874 Credit: 28,027,106 RAC: 0
N=8388608 (No search is currently planned at this N)
(max b will be about 405,000; fast init code currently limits b to about 100)
b=14 288 hours
b=36(**) 396 hours (13,055,212 digits)
b=100 522 hours
Note that all b <= 692 at N=8388608 can be tested with B = b^2 and N=4194304.
PS: 692^2 = 478864 < 480000
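The identity behind this remark is simply (b^2)^(2^22) = b^(2^23). A small Python check (the exponents are deliberately scaled down so the check runs instantly):

```python
# b^(2^(k+1)) + 1 == (b*b)^(2^k) + 1, so every b <= 692 at N=8388608
# is already covered by testing B = b^2 at N=4194304.
for b in (2, 3, 14, 692):
    for k in range(1, 8):
        assert pow(b, 2 ** (k + 1)) + 1 == pow(b * b, 2 ** k) + 1
print(692 ** 2)  # 478864, just under the projected max b of ~480000
```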
Anyone else have this experience? I have been running just GFNs and no CPU jobs for some time now. Today I started up one core on some PPS LLRs but left the other core free. My run times on the GFNs jumped by 100+ seconds immediately, and I had a unit error out. I'm pretty sure now that CPU usage (or the lack of it) affects GFNs.
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
Anyone else have this experience? I have been running just GFNs and no CPU jobs for some time now. Today I started up one core on some PPS LLRs but left the other core free. My run times on the GFNs jumped by 100+ seconds immediately, and I had a unit error out. I'm pretty sure now that CPU usage (or the lack of it) affects GFNs.
Your GPU is running at 1900 MHz, and I'll assume that this is overclocked by a substantial amount.
Running one core will raise the CPU temperature, and if you have a normal cooling arrangement the CPU heat gets blown into the computer case.
Even though the CPU is usually above the GPU and the airflow should be blowing the hot air out the top via the power supply, it's not a stretch to believe that the GPU might be running slightly warmer with one CPU core running than with no cores running. The difference in temperature, combined with the clock rate, could be the cause of the problem.
Lowering the clock speed and/or raising the GPU fan speed may help.
I don't have a plausible explanation as to why the run time would increase, however. As long as you left a core free, it shouldn't matter what you're doing on the other cores. One thing about using production work to compare timings: regardless of what project you're running, not all WUs are created equal. Even though the numbers may be similar, there could be significant processing differences. The best way to measure the effects of changing environmental conditions is to use a controlled test: either run a complete WU manually from the command line, or run the -b or -b2 benchmark tests from the command line. That way you're comparing apples to apples, and you can be sure that the differences you're seeing aren't caused by differences in the WUs.
EDIT: Actually, I can think of a scenario where it's possible that one core could affect the GPU if the cache is being saturated by that one core, causing the GPU to experience more cache misses. This effect would vary greatly between different CPU models. Since we have experienced a similar scenario where running LLR on all cores causes LLR to slow down (which I'm guessing is also due to the extra cache misses), this might account for what you're seeing.
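For anyone who wants to script that kind of controlled comparison, here is a minimal timing harness in the same spirit. The binary path is a placeholder, and only the -b benchmark flag mentioned above is assumed; adjust for your own setup:

```python
import subprocess
import time

BINARY = "./genefercuda"  # placeholder path to a command-line build
RUNS = 3                  # repeat to average out system noise

for i in range(RUNS):
    start = time.monotonic()
    subprocess.run([BINARY, "-b"], check=True)  # -b benchmark mode, per the post
    elapsed = time.monotonic() - start
    print(f"run {i + 1}: {elapsed:.1f} s")
```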
____________
My lucky number is 75898^524288+1
It's been a while since I ran any GFNs, so I was wondering if there is an update on current run times for both the short and long units, specifically for the GTX 460 and GTX 570 (I assume the cards are set to stock or near stock).
____________
@AggieThePew
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
It's been a while since I ran any GFNs, so I was wondering if there is an update on current run times for both the short and long units, specifically for the GTX 460 and GTX 570 (I assume the cards are set to stock or near stock).
For the 460, roughly 200 hours for the long and 5 hours for the short. GTX 580 times are roughly half of that, so your 570 will be somewhere between the two.
Short WU times aren't going to change much in the future, at least until we've exhausted the n=19 search space and (presumably) move the short WUs to n=20.
Long WUs times will continue to get longer, but not nearly as rapidly as they've done in the past.
____________
My lucky number is 75898^524288+1
It's been a while since I ran any GFNs, so I was wondering if there is an update on current run times for both the short and long units, specifically for the GTX 460 and GTX 570 (I assume the cards are set to stock or near stock).
For the 460, roughly 200 hours for the long and 5 hours for the short. GTX 580 times are roughly half of that, so your 570 will be somewhere between the two.
Short WU times aren't going to change much in the future, at least until we've exhausted the n=19 search space and (presumably) move the short WUs to n=20.
Long WUs times will continue to get longer, but not nearly as rapidly as they've done in the past.
Thanks Mike :)
I will add that your observation about warm weather affecting the GPU is true, and not just for the GFNs, even if those are more prone to failure.
I ran some of the short units over the last few days, with both the default "shift" and with it explicitly set to 8 in preferences. Overall run time on the GPU (570) didn't seem to change; maybe the default was 1% faster. CPU usage was a different story. At the default (0, meaning 7), CPU usage seemed quite high, about 20% of 1 of 4 physical cores (HT off on a 2600K). With shift=8, CPU usage was roughly half of that. Is this expected or not?
The few World Record units that I've done, I've left at the default (meaning 9, I think). CPU usage is nice and low... maybe 2% of one core.
This is on Linux with 295.xx driver.
--Gary
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
There are a lot of variables, but yes, higher block sizes can cause lower CPU utilization.
____________
My lucky number is 75898^524288+1