|
Double precision on this looks great...
1/3 FP32 compared to 1/24 FP32 for the GTX 680 / 690
http://www.anandtech.com/show/6760/nvidias-geforce-gtx-titan-part-1
|
|
|
|
On the double-precision side: to unlock full performance you must open the NVIDIA Control Panel and navigate to “Manage 3D Settings”. In the Global Settings box you will find an option titled “CUDA – Double Precision”, but note that the GeForce GTX Titan runs at reduced clock speeds when full double precision is enabled. Still a great option if you are working on CUDA applications. |
|
|
|
1.5 TFLOPS DP performance, which is 7.5x the speed of a GTX 580.
If that translates directly to real-world performance, it will be able to run GFN short tasks in around an hour and world-record tasks in around 12 hours. Pretty amazing stuff. |
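(For what it's worth, the arithmetic behind an estimate like that can be sketched as below; the GTX 580 baseline task times are illustrative assumptions, not measured values.)

```python
# Back-of-envelope scaling: projected Titan task times from a GTX 580 baseline,
# assuming runtime scales inversely with theoretical DP throughput.
TITAN_DP_TFLOPS = 1.5
GTX580_DP_TFLOPS = 0.2  # ~198 GFLOPS theoretical DP, rounded

speedup = TITAN_DP_TFLOPS / GTX580_DP_TFLOPS  # ~7.5x

# Assumed GTX 580 baselines (hours), purely for illustration.
gtx580_hours = {"GFN short": 7.5, "GFN world record": 90.0}

for task, hours in gtx580_hours.items():
    print(f"{task}: ~{hours / speedup:.1f} h on Titan, if scaling were linear")
```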
|
|
|
1.5 TFLOPS DP performance, which is 7.5x the speed of a GTX 580.
If that translates directly to real-world performance, it will be able to run GFN short tasks in around an hour and world-record tasks in around 12 hours. Pretty amazing stuff.
You probably have not read this topic: Tesla K20.
Estimated total run time for 9000^4194304+1 is 81:59:59
The Tesla K20 is 1.17 TFLOPS DP, and the Titan has 1.3 TFLOPS. We'll see what it actually does. |
|
|
|
You probably have not read this topic: Tesla K20.
I have, that's why I said
If that translates directly to real world performance
with the emphasis being on *IF*
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13876 ID: 53948 Credit: 383,190,940 RAC: 115,740
|
The "turn on all the DP and slow down the clock" approach makes some sense.
Tesla cards have full DP -- and slower clocks.
Running Genefer on 4xx and 5xx cards shows they're rather unreliable at high clock speeds.
It seems very consistent.
Anyway, I'm looking forward to seeing some real world tests on this new GPU. I have to say I'm greatly relieved -- after the disappointment with the direction Nvidia took with the 680, I was afraid they were abandoning double precision altogether in the GeForce cards.
____________
My lucky number is 75898524288+1 |
|
|
Dave
Joined: 13 Feb 12 Posts: 3145 ID: 130544 Credit: 2,210,990,431 RAC: 159,618
|
I have to say I'm greatly relieved -- after the disappointment with the direction Nvidia took with the 680, I was afraid they were abandoning double precision altogether in the GeForce cards.
Me too. |
|
|
|
If that translates directly to real world performance
with the emphasis being on *IF*
Well, it does not. :-)
|
|
|
|
If that translates directly to real world performance
with the emphasis being on *IF*
Well, it does not. :-)
why do you say that?
in part 2 of the first linked anandtech review, the gtx titan is about 3.4x faster in double precision fft calculations than the gtx 680.
GFLOPS:
gtx titan 222
gtx580 89
gtx680 65
i would say this is indeed very promising.
http://www.anandtech.com/show/6774/nvidias-geforce-gtx-titan-part-2-titans-performance-unveiled/3
____________
|
|
|
|
Well, it does not. :-)
why do you say that?
in part 2 of the first linked anandtech review, the gtx titan is about 3.4x faster in double precision fft calculations than the gtx 680.
GFLOPS:
gtx titan 222
gtx580 89
gtx680 65
They are not running PrimeGrid code. Those numbers are specific to whichever test program you throw at the card.
I own a K20, and having run the benchmarked software, you can see the actual performance in the K20 thread.
It is faster than the GTX 580 but definitely does not scale linearly with Nvidia's theoretical numbers. Do not trust those numbers. |
|
|
|
I own a K20 and having run the benchmarked software you can see what the actual performance is in the k20 thread.
but you don't own a titan. As they're aimed at different markets, who's to say they are going to perform the same? Certainly there was a theory on here that it's the ECC memory that slows the K20 down relative to the consumer cards.
Has anyone managed to do a comparison between them yet?
|
|
|
|
I own a K20 and having run the benchmarked software you can see what the actual performance is in the k20 thread.
but you don't own a titan.
Please do buy one so we can compare. I'm just trying to get the message across that it will not perform as you hope it will.
Certainly there was a theory on here that it's the ECC memory that slows the K20 down relative to the consumer cards.
I tested that, you can switch it off on the K20. It makes no real world difference.
|
|
|
|
Please do buy one so we can compare.
I'm sure someone will but it won't be me.
I'm just trying to get the message across that it will not perform as you hope it will.
Even if it does perform at the claimed speed, at £825 it's more than I'm willing to spend on a GPU, so to be honest I'm not fussed either way. I just thought it was interesting to theorise how fast it might run GFN while ignoring the Tesla times that for some reason don't match the claims.
|
|
|
|
I just thought it was interesting to theorise how fast it may run GFN while ignoring the Tesla times that for some reason don't match the claims.
The theoretical performance is just that, a marketing tool from the manufacturer to entice us to buy their product over that of the competition.
The performance is very dependent on how well you can match the problem to the hardware -- if it can be matched at all. In the case of GFN, that match does not seem to take place. I have spent quite a few days trying to improve it in various prescribed and unorthodox ways. It does not scale. Please check the times reported in my work packages against all the other cards and you will see the real-world performance.
My conclusion is that this product does not deliver on this problem. |
|
|
|
Well they're due to arrive in the middle of next week so hopefully someone will be able to give us some real numbers. |
|
|
|
So, I wonder what the plan is for NVIDIA card releases in the immediate future? They haven't called it a 780, or used any other name/number system where they can release cards clearly identifiable as superior or inferior to each other. It's just a one-off to cater for the middle ground between someone who is solely a gamer and someone who wants a card solely for a professional workstation, then? Fair play to them, though; as j.sheridan says, the price is a little bit prohibitive for most (like me). I did find an ASUS-branded one available to pre-order on one of the big/usual/trusted sites* for $1000, which is £656 -- about £100 cheaper than an ASUS-branded GTX 690. A 690 has a TDP of 300W, too.
*I assume linking to a product on the forum isn't the done thing, but I'm sure you could hatch a fresh plan and find it for yourselves anyway. |
|
|
Michael Goetz Volunteer moderator Project administrator
|
*I assume linking to a product on the forum isn't the done thing, but I'm sure you could hatch a fresh plan and find it for yourselves anyway.
If what you're linking to is a computer product, go ahead. People do it all the time. There's nothing in the rules (conveniently right there on the left as I write this) that prohibits linking to a retailer's page for a GPU for the purpose of discussing new hardware, or to help people find better deals on computer hardware.
Commercial advertising is prohibited, but discussing sales of products of interest to our members is ok.
____________
My lucky number is 75898524288+1 |
|
|
|
There's nothing in the rules (conveniently right there on the left as I write this)
Ah, sorry. This was the link:
http://www.newegg.com/Product/Product.aspx?Item=N82E16814121724
If anyone is considering buying a card for both gaming and crunching Genefer (or anything else on PG that requires/will require DP), then that surely renders the 690 obsolete. |
|
|
|
Too bad the folks that benchmark cards don't run them on pg jobs. Be nice to see results from those. |
|
|
|
Exactly what I was thinking - I remember a while ago one hardware review site did do a benchmark using PG software. I don't think it was AnandTech (though it may have been), and I think it was only to demonstrate the disappointing DP performance when the 680 launched.
Anyway, there is a part 2 to that review now, if you've not already read it, with Page 3 and Page 4 being the most relevant to us. There is an FFT DP benchmark there, at least.
In short: Want one, can't afford one. |
|
|
axn Volunteer developer
Joined: 29 Dec 07 Posts: 285 ID: 16874 Credit: 28,027,106 RAC: 0
|
Anyway, there is a part 2 to that review now, if you've not already read it, with Page 3 and Page 4 being the most relevant to us. There is an FFT DP benchmark there, at least.
Based on those numbers there and applying some SWAG, I estimate that Titan would be 2x faster than 580 on GFN code. It would have been _a lot_ faster, except memory will be a bottleneck. |
|
|
|
I have one ordered and it should be available 28 Feb. I will test it on genefer and we will see. |
|
|
Dave
|
Wowsers. |
|
|
Dave
|
Serious DP annihilation:
http://www.tomshardware.com/reviews/geforce-gtx-titan-performance-review,3442-10.html |
|
|
|
Serious DP annihilation:
http://www.tomshardware.com/reviews/geforce-gtx-titan-performance-review,3442-10.html
I wouldn't touch this card until they fix the driver crash issues mentioned in the article. Oh, and shave about $700 off the price :-)
--Gary |
|
|
|
First test with genefer short.
W7 64-bit + Driver 314.09 (i7-3770 3.4GHz), Boinc 6.12.34
Double precision enabled. Shift=7
Standard clocks:
genefer errors out at about 1% with maxErr exceeded.
Memory clock lowered -100MHz (~2800MHz):
genefer is stable and still working. Estimated time 12 hours. Edit: errors out at 6%. |
|
|
|
First test with genefer short.
W7 64-bit + Driver 314.09 (i7-3770 3.4GHz), Boinc 6.12.34
Double precision enabled. Shift=7
Standard clocks:
genefer errors out at about 1% with maxErr exceeded.
Memory clock lowered -100MHz (~2800MHz):
genefer is stable and still working. Estimated time 12 hours. Edit: errors out at 6%.
hmmm not all that impressive unless I'm misreading something. |
|
|
Michael Goetz Volunteer moderator Project administrator
|
First test with genefer short.
W7 64-bit + Driver 314.09 (i7-3770 3.4GHz), Boinc 6.12.34
Double precision enabled. Shift=7
Standard clocks:
genefer errors out at about 1% with maxErr exceeded.
Memory clock lowered -100MHz (~2800MHz):
genefer is stable and still working. Estimated time 12 hours. Edit: errors out at 6%.
The most useful thing you could do is to run "genefercuda -b" from the command line. Change the name of the executable to whatever it's called on your system -- if you're running on boinc, it's some ridiculously long name.
Those benchmark times are easy to compare to benchmarks from other systems, and you're running the exact same tests (as opposed to measuring how long a boinc task runs, where each task is a different length).
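As a side note, if you want to line up `-b` output from two machines programmatically, the mixed us/ms units need normalizing first; a minimal sketch (the sample lines are copied from benchmarks posted in this thread):

```python
import re

# Parse GeneferCUDA "-b" benchmark lines into {N: microseconds per mul},
# normalizing "us/mul" and "ms/mul" to a common unit.
LINE = re.compile(r"\^(\d+)\+1\s+Time:\s+([\d.]+)\s+(us|ms)/mul")

def parse_bench(text):
    return {int(n): float(t) * (1000.0 if unit == "ms" else 1.0)
            for n, t, unit in LINE.findall(text)}

sample = """2009574^8192+1 Time: 140 us/mul. Err: 0.2031 51636 digits
468750^1048576+1 Time: 1.6 ms/mul. Err: 0.1641 5946413 digits"""

print(parse_bench(sample))  # {8192: 140.0, 1048576: 1600.0}
```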
____________
My lucky number is 75898524288+1 |
|
|
|
-b
Generalized Fermat Number Bench
2009574^8192+1 Time: 140 us/mul. Err: 0.2031 51636 digits
1632282^16384+1 Time: 170 us/mul. Err: 0.1592 101791 digits
1325824^32768+1 Time: 180 us/mul. Err: 0.1797 200622 digits
1076904^65536+1 Time: 210 us/mul. Err: 0.1875 395325 digits
874718^131072+1 Time: 330 us/mul. Err: 0.1602 778813 digits
710492^262144+1 Time: 480 us/mul. Err: 0.1797 1533952 digits
577098^524288+1 Time: 800 us/mul. Err: 0.1797 3020555 digits
468750^1048576+1 Time: 1.6 ms/mul. Err: 0.1641 5946413 digits
then CUFFT error 6
-b2 20
Generalized Fermat Number Bench 2
SHIFT=5 468750^1048576+1 Time: 4.33 ms/mul. Err: 1.60e-001 5946413 digits
SHIFT=6 468750^1048576+1 Time: 1.91 ms/mul. Err: 1.56e-001 5946413 digits
SHIFT=7 468750^1048576+1 Time: 1.6 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=8 468750^1048576+1 Time: 1.6 ms/mul. Err: 1.65e-001 5946413 digits
SHIFT=9 468750^1048576+1 Time: 1.85 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=10 468750^1048576+1 Time: 2.85 ms/mul. Err: 1.72e-001 5946413 digits
-b3
ends with CUFFT error 6 |
|
|
Michael Goetz Volunteer moderator Project administrator
|
then CUFFT error 6
Does the new Genefercuda executable (see this post) help with that error?
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
|
If that's with the double precision boost enabled, this is certainly disappointing. Those numbers don't look that much better than a 580.
____________
My lucky number is 75898524288+1 |
|
|
|
-b
Generalized Fermat Number Bench
2199064^8192+1 Time: 140 us/mul. Err: 0.2344 51956 digits
1798620^16384+1 Time: 164 us/mul. Err: 0.2188 102481 digits
1471094^32768+1 Time: 179 us/mul. Err: 0.2500 202102 digits
1203210^65536+1 Time: 212 us/mul. Err: 0.2352 398482 digits
984108^131072+1 Time: 322 us/mul. Err: 0.2188 785521 digits
804904^262144+1 Time: 479 us/mul. Err: 0.2227 1548156 digits
658332^524288+1 Time: 781 us/mul. Err: 0.2500 3050541 digits
538452^1048576+1 Time: 1.6 ms/mul. Err: 0.2031 6009544 digits
440400^2097152+1 Time: 2.97 ms/mul. Err: 0.2051 11836006 digits
360204^4194304+1 Time: 5.78 ms/mul. Err: 0.2167 23305854 digits
294612^8388608+1 Time: 12.8 ms/mul. Err: 0.1797 45879398 digits
-b2 20
Generalized Fermat Number Bench 2
SHIFT=5 468750^1048576+1 Time: 4.36 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=6 468750^1048576+1 Time: 1.91 ms/mul. Err: 1.56e-001 5946413 digits
SHIFT=7 468750^1048576+1 Time: 1.6 ms/mul. Err: 1.56e-001 5946413 digits
SHIFT=8 468750^1048576+1 Time: 1.61 ms/mul. Err: 1.70e-001 5946413 digits
SHIFT=9 468750^1048576+1 Time: 1.85 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=10 468750^1048576+1 Time: 2.85 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=11 468750^1048576+1 Time: 1.6 ms/mul. Err: 1.56e-001 5946413 digits
SHIFT=12 468750^1048576+1 Time: 1.6 ms/mul. Err: 1.56e-001 5946413 digits
-b3 looks buggy
genefercuda 3.1.0-0 (Windows 32-bit CUDA)
Copyright 2001-2013, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2012, Michael Goetz, Ronald Schneider
Copyright 2011-2013, Iain Bethune
Using default SHIFT value=7
14^262144+1 300451 digits 0 days 0.0 hours (0.00 ms/mul, 262145 iterations) 0 GFLOPS
Using default SHIFT value=7
75898^262144+1 1279324 digits 0 days 0.0 hours (0.00 ms/mul, 262158 iterations) 0 GFLOPS
Using default SHIFT value=7
468750^262144+1 1486604 digits 0 days 0.0 hours (0.00 ms/mul, 262160 iterations) 0 GFLOPS
Using default SHIFT value=7
815000^262144+1 1549575 digits 0 days 0.0 hours (0.00 ms/mul, 786447 iterations) 0 GFLOPS
Using default SHIFT value=7
14^524288+1 600902 digits 0 days 0.0 hours (0.00 ms/mul, 524289 iterations) 0 GFLOPS
Using default SHIFT value=7
75898^524288+1 2558647 digits 0 days 0.0 hours (0.00 ms/mul, 524302 iterations) 0 GFLOPS
Using default SHIFT value=7
468750^524288+1 2973207 digits 0 days 0.0 hours (0.00 ms/mul, 524304 iterations) 0 GFLOPS
Using default SHIFT value=7
710000^524288+1 3067745 digits 0 days 0.0 hours (0.00 ms/mul, 2097166 iterations) 0 GFLOPS
Using default SHIFT value=7
14^1048576+1 1201803 digits 0 days 0.0 hours (0.00 ms/mul, 1048577 iterations) 0 GFLOPS
Using default SHIFT value=7
75898^1048576+1 5117293 digits 0 days 0.0 hours (0.00 ms/mul, 1048590 iterations) 0 GFLOPS
Using default SHIFT value=7
468750^1048576+1 5946413 digits 0 days 0.0 hours (0.00 ms/mul, 1048592 iterations) 0 GFLOPS
Using default SHIFT value=7
700000^1048576+1 6129030 digits 0 days 0.0 hours (0.00 ms/mul, 5242893 iterations) 0 GFLOPS
Using default SHIFT value=8
14^2097152+1 2403605 digits 0 days 0.0 hours (0.00 ms/mul, 2097153 iterations) 0 GFLOPS
Using default SHIFT value=8
75898^2097152+1 10234585 digits 0 days 0.0 hours (0.00 ms/mul, 2097166 iterations) 0 GFLOPS
Using default SHIFT value=8
380742^2097152+1 11703432 digits 0 days 0.0 hours (0.00 ms/mul, 2097168 iterations) 0 GFLOPS
Using default SHIFT value=8
570000^2097152+1 12070945 digits 0 days 0.0 hours (0.00 ms/mul, 8388622 iterations) 0 GFLOPS
Using default SHIFT value=8
14^4194304+1 4807210 digits 0 days 0.0 hours (0.00 ms/mul, 4194305 iterations) 0 GFLOPS
Using default SHIFT value=8
1248^4194304+1 12986466 digits 0 days 0.0 hours (0.00 ms/mul, 20971524 iterations) 0 GFLOPS
Using default SHIFT value=8
10000^4194304+1 16777217 digits 0 days 0.0 hours (0.00 ms/mul, 16777224 iterations) 0 GFLOPS
Using default SHIFT value=8
50000^4194304+1 19708909 digits 0 days 0.0 hours (0.00 ms/mul, 16777226 iterations) 0 GFLOPS
Using default SHIFT value=8
150000^4194304+1 21710101 digits 0 days 0.0 hours (0.00 ms/mul, 16777228 iterations) 0 GFLOPS
Using default SHIFT value=8
309258^4194304+1 23028076 digits 0 days 0.0 hours (0.00 ms/mul, 4194320 iterations) 0 GFLOPS
Using default SHIFT value=8
480000^4194304+1 23828853 digits 0 days 0.0 hours (0.00 ms/mul, 33554441 iterations) 0 GFLOPS
Using default SHIFT value=8
14^8388608+1 9614419 digits 0 days 0.0 hours (0.00 ms/mul, 8388609 iterations) 0 GFLOPS
Using default SHIFT value=8
36^8388608+1 13055212 digits 0 days 0.0 hours (0.00 ms/mul, 16777218 iterations) 0 GFLOPS
Using default SHIFT value=8
100^8388608+1 16777217 digits 0 days 0.0 hours (0.00 ms/mul, 16777219 iterations) 0 GFLOPS
|
|
|
|
If that's with the double precision boost enabled, this is certainly disappointing. Those numbers don't look that much better than a 580.
Yes |
|
|
Michael Goetz Volunteer moderator Project administrator
|
-b3 looks buggy
That's with the new executable, right?
If so, that's good news as it means it solves at least some of the problems with the WR tasks.
____________
My lucky number is 75898524288+1 |
|
|
|
-b3 looks buggy
That's with the new executable, right?
If so, that's good news as it means it solves at least some of the problems with the WR tasks.
Yes new executable. |
|
|
Michael Goetz Volunteer moderator Project administrator
|
-b3 looks buggy
That's with the new executable, right?
If so, that's good news as it means it solves at least some of the problems with the WR tasks.
Yes new executable.
Thanks, that's really good news for the WR search.
Regarding performance, I wonder if the Titan would benefit from a higher shift value? Try running "genefercuda -b2 22" and see what it prints out.
____________
My lucky number is 75898524288+1 |
|
|
axn Volunteer developer
|
Regarding performance, I wonder if the Titan would benefit from a higher shift value? Try running "genefercuda -b2 22" and see what it prints out.
Perhaps what is needed is to recompile the app using CUDA 5.0 (w/ cuFFT 5.0).
PS: Since I don't have the numbers handy, are these numbers better or worse than a 580? And by how much? |
|
|
|
-b2 22 (new exe)
Generalized Fermat Number Bench 2
SHIFT=5 309258^4194304+1 Time: 16.9 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=6 309258^4194304+1 Time: 7.51 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=7 309258^4194304+1 Time: 6.06 ms/mul. Err: 1.89e-001 23028076 digits
SHIFT=8 309258^4194304+1 Time: 5.71 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=9 309258^4194304+1 Time: 5.73 ms/mul. Err: 1.64e-001 23028076 digits
SHIFT=10 309258^4194304+1 Time: 6.16 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=11 309258^4194304+1 Time: 5.72 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=12 309258^4194304+1 Time: 5.72 ms/mul. Err: 1.72e-001 23028076 digits
|
|
|
Michael Goetz Volunteer moderator Project administrator
|
-b2 22 (new exe)
Generalized Fermat Number Bench 2
SHIFT=5 309258^4194304+1 Time: 16.9 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=6 309258^4194304+1 Time: 7.51 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=7 309258^4194304+1 Time: 6.06 ms/mul. Err: 1.89e-001 23028076 digits
SHIFT=8 309258^4194304+1 Time: 5.71 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=9 309258^4194304+1 Time: 5.73 ms/mul. Err: 1.64e-001 23028076 digits
SHIFT=10 309258^4194304+1 Time: 6.16 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=11 309258^4194304+1 Time: 5.72 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=12 309258^4194304+1 Time: 5.72 ms/mul. Err: 1.72e-001 23028076 digits
Ok, no improvement going above 8.
I suspect it may be possible to re-write the code to better take advantage of the hardware, but that's pure speculation.
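(Picking the best SHIFT from a `-b2` run is just a matter of taking the minimum ms/mul; a tiny sketch using the Titan timings quoted above:)

```python
# Titan "-b2 22" timings from above: SHIFT -> ms/mul at N=4194304.
timings = {5: 16.9, 6: 7.51, 7: 6.06, 8: 5.71, 9: 5.73,
           10: 6.16, 11: 5.72, 12: 5.72}

# The best SHIFT is simply the one with the lowest per-mul time.
best = min(timings, key=timings.get)
print(best, timings[best])  # 8 5.71
```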
____________
My lucky number is 75898524288+1 |
|
|
|
With shift=8 I have managed to get genefer working stably (I hope).
The running task is now at about 38%. Expected total time looks very similar to my 580. |
|
|
|
Not this hardware in particular. I have made the code in general faster for every GTX card that can currently run genefercuda. Most of the improvements are not K20 specific.
I still have to figure out, though, why it crashes with cufft error 6 on your cards, Mike. |
|
|
Michael Goetz Volunteer moderator Project administrator
|
Not this hardware in particular. I have made the code in general faster for every GTX card that can currently run genefercuda. Most of the improvements are not K20 specific.
I still have to figure out though why it crashes with cufft error 6 on your cards Mike.
I forget if you're using, as the basis for your version, the current production version of Genefer, or the newest version from assembla that incorporates the new framework. I suspect it's the older production version.
The new framework has a build option to use CPU based fast init code instead of the GPU, which cuts the memory usage in half. That hopefully will eliminate the error 6 problem on GPUs that were experiencing that problem. In at least one case, it does.
____________
My lucky number is 75898524288+1 |
|
|
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
|
Perhaps, what is needed is to recompile the app using CUDA 5.0 (w/ cuFFT 5.0).
Nothing special: install CUDA 5.0 and recompile the app from assembla.com, but be warned -- CUDA 5 is much bigger than every predecessor. All downloads seem to need between 550MB (Fedora32) and 1050MB (Win64). The file size for MacOS is 635MB...
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
|
Perhaps, what is needed is to recompile the app using CUDA 5.0 (w/ cuFFT 5.0).
Nothing special. Install Cuda 5.0 and recompile the app from assembla.com but be warned.
Funny, I do not see the post where that quote came from?
I made the changes starting with the BOINC libraries, CUDA 5 and cuFFT 5.
|
|
|
Michael Goetz Volunteer moderator Project administrator
|
***** IMPORTANT!!!!!!! *****
I just discovered that the current "new" build of GeneferCUDA on assembla has broken maxErr code, so it can't detect when an error occurs. It will happily crunch a number way beyond its B limit, and it can't detect when an overclocking problem happens.
To avoid wasting countless hours crunching only to have the result fail validation, please stop using the new app immediately.
In addition, if you're using it with PRPNet (where no double checking is done), please contact me so that I can ensure that any tasks you've completed do, indeed, get double checked.
Thank you.
UPDATE
The maxErr bug has been corrected. Feel free to download the new Windows build of GeneferCUDA.
Please see this post for the link, as well as a warning about the testing state of this build.
Thank you for your patience.
____________
My lucky number is 75898524288+1 |
|
|
|
Mike, was this new version automatically distributed via PrimeGrid ?
If so, how can we revert ?
(and for the record, this was not my improved version)
edit: via primegrid |
|
|
Michael Goetz Volunteer moderator Project administrator
|
Mike, was this new version automatically distributed via PrimeGrid ?
If so, how can we revert ?
(and for the record, this was not my improved version)
edit: via primegrid
No, if you didn't download it manually and set up app_info, you're not running it on boinc. Likewise, if you didn't download it manually and copy it into your prpnet directories, you're not running it there.
You would know it if you were running it since anyone who is running it had to intentionally make an effort to do so.
____________
My lucky number is 75898524288+1 |
|
|
|
What a refreshing difference to working in the public service. Here at PG, if there is a problem it is admitted at once and instructions are given; in the PS it is hidden until a disaster happens, and then "I didn't do it" takes precedence over "what do we do, apart from covering up".
Good to see real support given.
____________
Member team AUSTRALIA
My lucky number is 9291*2^1085585+1 |
|
|
|
-b2 22 (new exe)
Generalized Fermat Number Bench 2
SHIFT=5 309258^4194304+1 Time: 16.9 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=6 309258^4194304+1 Time: 7.51 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=7 309258^4194304+1 Time: 6.06 ms/mul. Err: 1.89e-001 23028076 digits
SHIFT=8 309258^4194304+1 Time: 5.71 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=9 309258^4194304+1 Time: 5.73 ms/mul. Err: 1.64e-001 23028076 digits
SHIFT=10 309258^4194304+1 Time: 6.16 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=11 309258^4194304+1 Time: 5.72 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=12 309258^4194304+1 Time: 5.72 ms/mul. Err: 1.72e-001 23028076 digits
Ok, no improvement going above 8.
I suspect it may be possible to re-write the code to better take advantage of the hardware, but that's pure speculation.
this is a 680:
Command line: gen -b2 22 -d 1
Generalized Fermat Number Bench 2
SHIFT=5 309258^4194304+1 Time: 18.6 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=6 309258^4194304+1 Time: 11.2 ms/mul. Err: 1.56e-001 23028076 digits
SHIFT=7 309258^4194304+1 Time: 9.28 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=8 309258^4194304+1 Time: 8.81 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=9 309258^4194304+1 Time: 9.04 ms/mul. Err: 1.64e-001 23028076 digits
SHIFT=10 309258^4194304+1 Time: 9.36 ms/mul. Err: 1.56e-001 23028076 digits
so does that mean the titan is 40% faster or am I misreading it somehow?
(sorry, meant to post 580 times, will add those asap) |
|
|
Michael Goetz Volunteer moderator Project administrator
|
so does that mean the titan is 40% faster or am I misreading it somehow?
That's exactly what that means.
Personally, I'd run the -b test for comparing cards, since it shows you timings at all n values. The -b2 test is for seeing which shift setting is best at a particular N. But you can use those timings for comparing different GPUs also.
____________
My lucky number is 75898524288+1 |
|
|
|
what is the correct command line to run then? |
|
|
|
so does that mean the titan is 40% faster or am I misreading it somehow?
This is the expected situation, because the GTX 6xx series is not strong in double precision.
|
|
|
|
what is the correct command line to run then?
-b
|
|
|
|
580 results showing titan to only be around 15% faster:
Command line: gen2 -b2 22 -d 1
Generalized Fermat Number Bench 2
SHIFT=5 309258^4194304+1 Time: 11.8 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=6 309258^4194304+1 Time: 8.06 ms/mul. Err: 1.88e-001 23028076 digits
SHIFT=7 309258^4194304+1 Time: 7.06 ms/mul. Err: 1.80e-001 23028076 digits
SHIFT=8 309258^4194304+1 Time: 6.82 ms/mul. Err: 1.88e-001 23028076 digits
SHIFT=9 309258^4194304+1 Time: 6.7 ms/mul. Err: 1.72e-001 23028076 digits
SHIFT=10 309258^4194304+1 Time: 6.9 ms/mul. Err: 1.64e-001 23028076 digits
The -b command doesn't work in the same way on the latest version of genefer:
Command line: gen -b
Priority change succeeded.
Generalized Fermat Number Bench
5683936^256+1 Time: 0 us/mul. Err: 0.2500 1730 digits
here it is with an old version:
Command line: gen2 -b
Generalized Fermat Number Bench
2009574^8192+1 Time: 78 us/mul. Err: 1.88e-001 51636 digits
1632282^16384+1 Time: 84.2 us/mul. Err: 1.88e-001 101791 digits
1325824^32768+1 Time: 112 us/mul. Err: 2.03e-001 200622 digits
1076904^65536+1 Time: 169 us/mul. Err: 1.72e-001 395325 digits
874718^131072+1 Time: 279 us/mul. Err: 1.64e-001 778813 digits
710492^262144+1 Time: 532 us/mul. Err: 2.03e-001 1533952 digits
577098^524288+1 Time: 939 us/mul. Err: 1.88e-001 3020555 digits
468750^1048576+1 Time: 1.73 ms/mul. Err: 1.56e-001 5946413 digits
380742^2097152+1 Time: 3.4 ms/mul. Err: 1.64e-001 11703432 digits
309258^4194304+1 Time: 6.93 ms/mul. Err: 1.88e-001 23028076 digits
100^8388608+1 Time: 13.9 ms/mul. Err: 3.81e-005 16777217 digits
Pretty disappointing that a card that purports to be 3x faster and costs 2.5x as much is bottlenecked so badly somewhere that it only manages a 15% increase. I guess we'd better hope the new genefer can squeeze some more out of it. |
|
|
Michael Goetz Volunteer moderator Project administrator
|
what is the correct command line to run then?
For benchmarks, use "genefercuda -b". (Change the executable name as appropriate, of course.)
To see which SHIFT or block size works best at a specific N value, use "genefercuda -b2 #", where "#" is the small n (e.g., use 22 for N=4194304).
Pretty disappointing that a card that purports to be 3x faster and costs 2.5x as much is bottlenecked so badly somewhere that it only manages a 15% increase. I guess we'd better hope the new genefer can squeeze some more out of it.
At least initially, the new CPU versions of Genefer will be getting a speed increase, but the changes to Genefercuda will only be attempting to improve reliability.
____________
My lucky number is 75898524288+1 |
|
|
|
but the changes to Genefercuda will only be attempting to improve reliability.
I was referring to the one hydropower is working on. |
|
|
Michael Goetz Volunteer moderator Project administrator
|
UPDATE
The maxErr bug has been corrected. Feel free to download the new Windows build of GeneferCUDA.
Please see this post for the link, as well as a warning about the testing state of this build.
Thank you for your patience.
____________
My lucky number is 75898524288+1 |
|
|
|
Here are the tests with hydropower's cuda 5 genefer:
-b
Generalized Fermat Number Bench
2009574^8192+1 Time: 110 us/mul. Err: 0.0000 51636 digits
1632282^16384+1 Time: 120 us/mul. Err: 0.0000 101791 digits
1325824^32768+1 Time: 130 us/mul. Err: 0.0000 200622 digits
1076904^65536+1 Time: 180 us/mul. Err: 0.0000 395325 digits
874718^131072+1 Time: 250 us/mul. Err: 0.0000 778813 digits
710492^262144+1 Time: 370 us/mul. Err: 0.0000 1533952 digits
577098^524288+1 Time: 580 us/mul. Err: 0.0000 3020555 digits
468750^1048576+1 Time: 1.03 ms/mul. Err: 0.0000 5946413 digits
380742^2097152+1 Time: 1.79 ms/mul. Err: 0.0000 11703432 digits
309258^4194304+1 Time: 3.44 ms/mul. Err: 1.0000 23028076 digits
100^8388608+1 Time: 7.19 ms/mul. Err: 0.0000 16777217 digits
-b2 22
Generalized Fermat Number Bench 2
SHIFT=5 309258^4194304+1 Time: 11.9 ms/mul. Err: 6.10e-005 23028076 digits
SHIFT=6 309258^4194304+1 Time: 5.43 ms/mul. Err: 6.10e-005 23028076 digits
SHIFT=7 309258^4194304+1 Time: 3.84 ms/mul. Err: 6.10e-005 23028076 digits
SHIFT=8 309258^4194304+1 Time: 3.45 ms/mul. Err: 6.10e-005 23028076 digits
SHIFT=9 309258^4194304+1 Time: 3.39 ms/mul. Err: 6.10e-005 23028076 digits
SHIFT=10 309258^4194304+1 Time: 3.75 ms/mul. Err: 6.10e-005 23028076 digits
|
|
|
|
That's looking more like it. Twice as fast as an overclocked 580. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13876 ID: 53948 Credit: 383,190,940 RAC: 115,740
                              
|
SHIFT=8 309258^4194304+1 Time: 3.45 ms/mul. Err: 6.10e-005 23028076 digits
SHIFT=9 309258^4194304+1 Time: 3.39 ms/mul. Err: 6.10e-005 23028076 digits
Looks like you may want to use shift=9 as it seems to be slightly faster at n=22. (Unless screen lag is a problem.)
____________
My lucky number is 75898524288+1 |
|
|
|
Yes, those are good settings with hydropower's exe, but not with the default genefer on PrimeGrid. |
|
|
|
is hydropower's exe faster for all cards or just tesla/titan? |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13876 ID: 53948 Credit: 383,190,940 RAC: 115,740
                              
|
is hydropower's exe faster for all cards or just tesla/titan?
Supposedly, all cards, but it failed to run on mine.
____________
My lucky number is 75898524288+1 |
|
|
|
Is there a link to download it and try it or do I need to ask hp nicely? |
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2349 ID: 1178 Credit: 17,530,656,536 RAC: 4,290,337
                                           
|
@Hydropower
Please PM me if you have a link to download your new cuda app version. I have a variety of cards and can do some testing to see where it works/fails.
|
|
|
axnVolunteer developer Send message
Joined: 29 Dec 07 Posts: 285 ID: 16874 Credit: 28,027,106 RAC: 0
            
|
Here are the tests with hydropower's cuda 5 genefer:
Do we have the numbers on straight recompile of existing GeneferCUDA using CUDA 5? |
|
|
|
Do we have the numbers on straight recompile of existing GeneferCUDA using CUDA 5?
I think not, but you have to ask hydropower.
|
|
|
|
Do we have the numbers on straight recompile of existing GeneferCUDA using CUDA 5?
You might recompile it yourself and also verify that the code still runs on older cards with CUDA 5? That would save me time as well. I see from your last 2011 logon that you used a GT240?
|
|
|
|
Any crunching results of the Titan available here ? |
|
|
|
Any crunching results of the Titan available here ?
Not from me right now. I have suspended PrimeGrid for now and am waiting for the new app by Hydropower. |
|
|
|
Any crunching results of the Titan available here ?
There are benchmarks posted below by HA-Soft. Others are currently benchmarking various compiles of the HydroCuda version of GeneFerCuda on other hardware to make sure it improved for all cards and did not break anything.
I am very busy with work right now, so updates are delayed a bit. |
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2349 ID: 1178 Credit: 17,530,656,536 RAC: 4,290,337
                                           
|
Any crunching results of the Titan available here ?
There are benchmarks posted below by HA-Soft. Others are currently benchmarking various compiles of the HydroCuda version of GeneFerCuda on other hardware to make sure it improved for all cards and did not break anything.
I am very busy with work right now, so updates are delayed a bit.
I am one of the ones benchmarking the HydroCuda application across various hardware. Unfortunately, I get the CUFFT error 6 when running the -b option on both a dual GT-640 machine (Win7 64-bit, 310.90 driver) and a single GT-440 OEM machine (Vista 64-bit, 314.07 driver). I have a couple more cards to try it on over the next few days, so I'll report back then.
|
|
|
|
Titan gets very hot quickly, over 70C.
Even at stock speed, I had many round-off errors.
I need water cooling for the TITAN.
It will cost a lot of money and time.
GPU=GeForce GTX TITAN
Global memory=4294967295 Shared memory/block=49152 Registers/block=65536 Warp size=32
Max threads/block=1024
Max thread dim=1024 1024 64
Max grid=2147483647 65535 65535
CC=3.5
Clock=875 MHz
# of MP=14
Generalized Fermat Number Bench 2
SHIFT=5 468750^1048576+1 Time: 3.72 ms/mul. Err: 1.64e-001 5946413 digits
SHIFT=6 468750^1048576+1 Time: 1.91 ms/mul. Err: 1.56e-001 5946413 digits
SHIFT=7 468750^1048576+1 Time: 1.59 ms/mul. Err: 1.56e-001 5946413 digits
SHIFT=8 468750^1048576+1 Time: 1.6 ms/mul. Err: 1.70e-001 5946413 digits
SHIFT=9 468750^1048576+1 Time: 1.83 ms/mul. Err: 1.72e-001 5946413 digits
SHIFT=10 468750^1048576+1 Time: 2.8 ms/mul. Err: 1.64e-001 5946413 digits
Best SHIFT determined experimentally. Saving AUTOSHIFT|genefercuda|3.1.1-0|0|GeForce GTX TITAN|875|1048576=7 to genefer.cfg.
No project preference set; using AUTO-SHIFT=7
Starting initialization...
maxErr during b^N initialization = 0.0000 (9.780 seconds).
Testing 139244^1048576+1...
Estimated total run time for 139244^1048576+1 is 7:51:49
maxErr exceeded for 139244^1048576+1, 0.5000 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
Waiting 10 minutes before attempting to continue from last checkpoint...
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13876 ID: 53948 Credit: 383,190,940 RAC: 115,740
                              
|
70 doesn't seem very hot for a running GPU, although I don't know what would be normal for a Titan.
There are a few things you can try that have worked on other GPUs:
You can try lowering the memory clock below stock speeds. This reduces power consumption and heat generation and improves reliability, without seeming to affect Genefer's performance significantly.
You can increase the GPU fan speed, or take other measures to cool the GPU.
____________
My lucky number is 75898524288+1 |
|
|
boss  Send message
Joined: 27 Apr 11 Posts: 21 ID: 96592 Credit: 1,107,504,869 RAC: 702,393
                              
|
For this NV GPU the max temp is 95°C,
see here:
http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan/specifications |
|
|
|
Titan becomes very hot soon, over 70C.
Even if I remain a stock speed, I had many Round Errors.
I need a Water cooling for TITAN.
I will spend a lot of money and time.
There is definitely no problem with your card. It may be related to the algorithm calculating maxErr.
Playing with mem clocks and cooling doesn't have any effect. Your card is OK. |
|
|
|
Lowering the mem clock improves stability enough to complete short tasks.
But it is slower than a GTX 580.
TITAN mem = 2500MHz:
SHIFT=8 468750^1048576+1 Time: 1.86 ms/mul. Err: 1.70e-001 5946413 digits
GTX 580 stock:
SHIFT=8 468750^1048576+1 Time: 1.69 ms/mul. Err: 1.48e-001 5946413 digits
There is definitely no problem with your card. It may be related to the algorithm calculating maxErr.
A program bug?
If so, I don't understand why lowering the memory clock improves stability.
In my experience, round-off errors are mainly caused by an overheated GPU, and especially the RAM.
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
Does anybody have a Titan or 580 and CUDA 5.0 in a Linux box?
Then you can try GeneferCUDA_linux_cuda50. The app was compiled with cuda50 and the compute capabilities 35, 30, 20 and 13.
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
|
With SHIFT=8 my titan works ok (but slow) with default clock and temperatures about 80°C.
With SHIFT=7 my titan does not work, even with mem clocks lowered to minimum (in MSI Afterburner ~2500MHz).
I think this cannot be directly related to memory errors only. |
|
|
|
I think this cannot be directly related to memory errors only
I tend to agree with that. On the Tesla I have experienced the rounding error on some runs of the code while running the benchmarks. After checking the ECC status, all items report zero errors, so it is not a bit-flip error.
I suspect it has something to do with asynchronous memory reads or writes from the card to main memory taking place too soon when the card is running at high speeds. Something that would be worked around by lowering the memory clocks, which slows down the card but not the main PC.
Another explanation would be that cuda instructions are being scheduled out of order (if that exists on these cards) maybe affecting rounding behaviour, but this is speculation.
It is still on my todo list to check that. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13876 ID: 53948 Credit: 383,190,940 RAC: 115,740
                              
|
With SHIFT=8 my titan works ok (but slow) with default clock and temperatures about 80°C.
With SHIFT=7 my titan does not work, even with mem clocks lowered to minimum (in MSI Afterburner ~2500MHz).
I think this cannot be directly related to memory errors only.
That's something entirely new. I've never heard of anything like this before.
All SHIFT does is vary the amount of work done in each "kernel". SHIFT=7 does half as much work in one call to the GPU as does SHIFT=8. Usually, the only effect of changing SHIFT is the amount of screen lag you get, although if SHIFT is too low (or too high), it can lower performance because too much time is spent starting and stopping the GPU as opposed to actually running the computations on the GPU.
I can't even begin to theorize right now about how this could be affecting maxErr. It shouldn't have any effect at all.
____________
My lucky number is 75898524288+1 |
|
|
|
Maybe the rounding error depends on how the FFT is spread over the GPU, in combination with the tested number. |
|
|
|
Thank you. I also checked my results and found SHIFT=7 in my error cases, too.
I understand that there may be a concurrency problem somewhere.
|
|
|
|
Recently, most of my GFN long tasks on my old GTX 580 end in error 6.
>cuda_subs.cu(332) : cufftSafeCall() CUFFT error: 6.
For the Titan, I unfortunately changed hardware configurations (PSU, motherboard), drivers (314.09), and MSI Afterburner (3.0.0 beta 6). And genefer was updated (2.0.2) recently too.
I want to investigate the error.
I found in the event log that the error is a watchdog timeout (2 seconds) of the driver, and there is a driver dump:
C:\Windows\LiveKernelReports\WATCHDOG\WD-20130317-0154.dmp
and the dump says VIDEO_TDR_TIMEOUT_DETECTED (117).
http://msdn.microsoft.com/en-us/windows/hardware/gg487368.aspx
The long task runs with SHIFT=9 now.
>No project preference set; using AUTO-SHIFT=9
How long does a genefer kernel run?
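For reference, the 2-second limit is the Windows TDR (Timeout Detection and Recovery) delay described on the MSDN page linked above. It can be raised via the TdrDelay registry value; this is a sketch of the commonly documented tweak (a reboot is required, and raising the limit only masks long-running kernels rather than fixing them, so treat it as a diagnostic aid):

```reg
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
; TdrDelay: seconds the GPU may run before the watchdog fires (default 2).
"TdrDelay"=dword:0000000a
```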
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13876 ID: 53948 Credit: 383,190,940 RAC: 115,740
                              
|
Not anywhere close to that long. A whole iteration takes at most several tens of milliseconds, and each kernel is only a fraction of that.
____________
My lucky number is 75898524288+1 |
|
|
|
Thanks. I will investigate the other factors. |
|
|
|
So after reading through this entire thread is it safe to say that two 580's are still better than one titan? |
|
|
|
Hi,
I want to compute on Genefer short with my GTX Titan.
Please can you give me some advice:
Which block size do you advise choosing?
Turning dual ON or OFF: dual seems to fail :(
Or do I need to wait for the new app by hydro?
Thanks in advance
Julien |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13876 ID: 53948 Credit: 383,190,940 RAC: 115,740
                              
|
Unless you want to reduce screen lag, or have a really good reason why you want to use a specific block size (and if you're asking, then you don't), you should leave the block size at 0.
I don't know what you mean by "dual".
I see you're getting a lot of errors. I also see you're running the 320 driver. I may have read something about the 320 driver having problems, so perhaps you should try the 311 drivers.
____________
My lucky number is 75898524288+1 |
|
|
|
Thanks, I will try tomorrow with older drivers.
Yes, tasks are no longer running after 1%.
Sorry, I meant dual precision for computing, of course ;)
The drivers let you activate dual precision on the Titan (but the frequency drops a bit); the 780 will not have this capability. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13876 ID: 53948 Credit: 383,190,940 RAC: 115,740
                              
|
Thanks, I will try tomorrow with older drivers.
Yes, tasks are no longer running after 1%.
Sorry, I meant dual precision for computing, of course ;)
The drivers let you activate dual precision on the Titan (but the frequency drops a bit); the 780 will not have this capability.
Oh, right, you can increase DP performance but the clock rate drops.
For GFN, definitely turn on the DP high performance mode. For the sieves, you're probably better off with it turned off.
____________
My lucky number is 75898524288+1 |
|
|
|
TY |
|
|
Dave  Send message
Joined: 13 Feb 12 Posts: 3145 ID: 130544 Credit: 2,210,990,431 RAC: 159,618
                      
|
Try the 310.90 drivers, I believe, if they support the Titan. |
|
|
|
Unfortunately, they do not :( |
|
|
|
Double precision wise, to unlock full performance you must open the NVIDIA Control Panel, navigate to “Manage 3D Settings”. In the Global Settings box you will find an option titled “CUDA – Double Precision”,
Hmm - did they remove that in later drivers? I'm on 331.82 and the only thing I have starting with CUDA is "CUDA - GPUs" :( |
|
|
|
Double precision wise, to unlock full performance you must open the NVIDIA Control Panel, navigate to “Manage 3D Settings”. In the Global Settings box you will find an option titled “CUDA – Double Precision”,
Hmm - did they remove that in later drivers? I'm on 331.82 and the only thing I have starting with CUDA is "CUDA - GPUs" :(
The Double precision setting is right below the CUDA - GPUs setting for those models that support it. I am using 331.82, also. |
|
|
|
Using the opencl I can do a WR unit in a little over 56 hours on my titan in DP mode. Haven't tried the cuda version yet. I also haven't had any error outs either. Anyone have any times for the cuda version to compare? |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13876 ID: 53948 Credit: 383,190,940 RAC: 115,740
                              
|
Using the opencl I can do a WR unit in a little over 56 hours on my titan in DP mode. Haven't tried the cuda version yet. I also haven't had any error outs either. Anyone have any times for the cuda version to compare?
You don't need to run a full GFN-WR task to compare the speeds. Run the apps from the command line with the -b switch and the app will run benchmarks.
Since the benchmarks use the exact same code as what is used to process the GFN tasks, the benchmarks accurately predict real life performance. You can use that to see how the OpenCL app compares to the CUDA app on your hardware.
I suspect the OpenCL app will be significantly faster.
____________
My lucky number is 75898524288+1 |
|
|
tng Send message
Joined: 29 Aug 10 Posts: 459 ID: 66603 Credit: 44,556,327,598 RAC: 21,868,213
                                                
|
Using the opencl I can do a WR unit in a little over 56 hours on my titan in DP mode. Haven't tried the cuda version yet. I also haven't had any error outs either. Anyone have any times for the cuda version to compare?
Check this system's results.
____________
|
|
|
|
That's about the same as what I'm getting in times. OCLcudaGFNWR is the OCL app. At least, that's what gets downloaded with Nvidia GPU (OpenCL) set in preferences. |
|
|
tng Send message
Joined: 29 Aug 10 Posts: 459 ID: 66603 Credit: 44,556,327,598 RAC: 21,868,213
                                                
|
That's about the same as what I'm getting in times. OCLcudaGFNWR is the OCL app. At least, that's what gets downloaded with Nvidia GPU (OpenCL) set in preferences.
There's one CUDA task in there. Took almost twice as long.
____________
|
|
|
|
Ahhh. Good eyes. Much longer to run and significantly less cpu usage. Much appreciated. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13876 ID: 53948 Credit: 383,190,940 RAC: 115,740
                              
|
Ahhh. Good eyes. Much longer to run and significantly less cpu usage. Much appreciated.
That's not actually true, at least not the way you think.
Despite what task manager or other CPU utilization tools might say, GeneferOCL (or any other OpenCL program) doesn't actually hog the CPU.
Yes, it does register as using 100% of the CPU (and the CPU seconds in the BOINC task display will be high) if the computer is otherwise idle or if you're reserving a core for the GPU.
However, what it's actually doing is effectively using the CPU at a priority lower than the lowest possible priority. (That's not exactly what it's doing, but it's the best way to explain it.) If anything else needs the CPU, including other BOINC tasks that are running at low priority, those tasks get the CPU rather than GeneferOCL. GeneferOCL's CPU usage won't interfere with other tasks.
In other words, GeneferOCL only uses a lot of CPU when there's an idle CPU core. If all the cores are busy, it uses much less CPU.
If you want to see how much CPU it's really using, first start running LLR on all the cores and then run GeneferOCL. That will give you a better measurement of actual CPU usage.
This is the way the OpenCL driver works.
____________
My lucky number is 75898524288+1 |
|
|
tng Send message
Joined: 29 Aug 10 Posts: 459 ID: 66603 Credit: 44,556,327,598 RAC: 21,868,213
                                                
|
Ahhh. Good eyes. Much longer to run and significantly less cpu usage. Much appreciated.
Nothing to do with eyes -- that's my host. Tried the CUDA app once to see what it would do. I found out.
____________
|
|
|
|
I reran the test you were talking about and I get different results. It was particularly noticeable on these PSP LLR tasks. Whether it's started first or not, the OCLcudaGFNWR app appears to have a higher priority. The CPU tasks take the hit on usage (inferred from the increase in completion time). They returned to normal runtimes after giving the card its own CPU core again. Task manager shows it using a full core by itself. So maybe what you mentioned about it doing other things gives it a boost over other tasks? I guess I could have tried this using something other than another BOINC task, but they were simply handy. This is on a Windows 7 box.
Also, I just checked HWMonitor (by CPUID). The core the card is using is just as warm (low to mid 50s, no overclocking) as the others. This would suggest it's being used just as much. It could be running out of cache on the CPU with the additional task (like it would if running 8 tasks with hyperthreading), but its partner in crunching is nearly identical (4770K vs 4771) and is running 4 PSPs with no hit. It's also a Linux box, so less junk in the background.
Maybe the card itself is so fast that it actually uses more CPU vs other cards? That would explain everything except the priority. |
|
|