John Honorary cruncher
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
GOOD NEWS!!!
llrCUDA has finally reached a point in its development where it's almost a competitive complement to the CPU version. There are still some load issues when the CPU is under full load. However, with at least one CPU thread free, llrCUDA appears to run with high GPU load and very low CPU load. :)
In preparation for implementation, we'd like to start testing now to iron out any issues. A test suite is provided in the archive...please run it and post your results. Also, test how different CPU load configurations impact the GPU load.
A double precision GPU is required. You'll need driver 267.24 at minimum (download). llrCUDA v0.60 can be downloaded here (cudart and cufft dll's included). Windows 64-bit ONLY for now.
Like geneferCUDA, llrCUDA really shines at the higher n's.
I'm not sure if app_info will work right now in BOINC. One user has already experienced problems because none of the LLR projects have a GPU option, so BOINC gets confused by a GPU in the app_info.
However, PRPNet could be a good place to test it. It might be as easy as adding llrCUDA.exe to the folder and updating the master_prpclient.ini file.
I'll update this post as results/updates/modifications come in. Thanks for testing...and good luck!
UPDATE - llrCUDA will work on the following ports:
server=121:0:1:prpnet.primegrid.com:12001
server=27:0:1:prpnet.primegrid.com:12006
server=PPSEhigh:0:1:prpnet.primegrid.com:12007
server=ESP:0:1:pgllr.mine.nu:9000
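Putting the pieces above together, a minimal master_prpclient.ini fragment might look like the sketch below. This is an unofficial guess: the server= lines are copied from the update above, and the llrexe= field name comes from the stock prpclient.ini quoted later in this thread.

```ini
// Sketch only -- not an official config. Point the client at llrCUDA
// and list the ports known to work with it.
llrexe=llrCUDA.exe
server=121:0:1:prpnet.primegrid.com:12001
server=27:0:1:prpnet.primegrid.com:12006
server=PPSEhigh:0:1:prpnet.primegrid.com:12007
server=ESP:0:1:pgllr.mine.nu:9000
```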
____________
Too bad this is limited to 64 bit systems. Any chance a 32 bit version will come out to test?
____________
@AggieThePew
pschoefer Volunteer developer Volunteer tester
Joined: 20 Sep 05 Posts: 674 ID: 845 Credit: 2,561,991,810 RAC: 434,430
You'll also require cudart64_32_16.dll and cufft64_32_16.dll.
3*2^382449+1 and 3*2^414840-1 have exponents that are too small; 34354*5^141543-1 and 91848*33^91848+1 can't be tested right now because they're not base 2.
____________
Maybe it's time to enable GPU for PPS?
And release a small number of test WUs with already-known results, say 333 or so.
Just to verify?
____________
wbr, Me. Dead J. Dona
Not to mention another reason to go out and buy a decent GPU.
____________
@AggieThePew
And a reason to switch to a 64-bit OS )))
____________
wbr, Me. Dead J. Dona
John Honorary cruncher
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
Maybe it's time to enable GPU for PPS?
And release some test WUs with already known results in small number, say 333 or so.
Just to verify?
In time, but for now we simply want to test it outside of BOINC. As you can see in pschoefer's post, we first need to establish a proper test suite. :) AND include the proper dll's.
____________
All these dll's are already included in the latest prpclient-4.2.3beta-windows-gpu.7z
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2329 ID: 1178 Credit: 15,582,885,790 RAC: 15,073,530
As with geneferCUDA, does this test app work with only DP capable GPUs, or is it compatible with all CUDA capable GPUs?
____________
141941*2^4299438-1 is prime!
John Honorary cruncher
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
As with geneferCUDA, does this test app work with only DP capable GPUs, or is it compatible with all CUDA capable GPUs?
Only on DP capable GPUs
____________
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1931 ID: 352 Credit: 5,702,802,045 RAC: 1,040,148
You'll need driver 267.24 at minimum
I was running fine with driver version 266.58.
(Even low CPU usage - i5 2500 / GTX 580 / Win 7 SP1.)
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 9,807
You'll also require cudart64_32_16.dll and cufft64_32_16.dll.
Any suggestions on where I can get these? I haven't turned them up in a search of my system or on Google. They're supposedly supplied with an NVIDIA toolkit, but that requires registration as a dev.
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1931 ID: 352 Credit: 5,702,802,045 RAC: 1,040,148
Any suggestions on where I can get these? I haven't turned them up in a search of my system nor google. Supposedly supplied with an nvidia toolkit but it requires registration as a dev.
All these dll's are already included in the latest prpclient-4.2.3beta-windows-gpu.7z
http://pgllr.mine.nu/PRPNet/
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 9,807
Either I'm being blind or I got the wrong file, but I've checked and rechecked and I don't see those files there. Only the *32_32_16.dll versions, which don't work as-is or renamed.
|
|
|
Did PrimeGrid make an energy efficiency calculation for this llrcuda version compared to a quad or hex core machine running the CPU version of llr?
____________
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1931 ID: 352 Credit: 5,702,802,045 RAC: 1,040,148
Either I'm being blind or I got the wrong file, but I've checked and rechecked and I don't see those files there. Only *32_32_16.dll versions which don't work as is or renamed.
My mistake, they are not where I expected them to be...
Those DLLs are part of CUDA toolkit.
http://pgllr.mine.nu/PRPNet/cudart64_32_.7z
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1931 ID: 352 Credit: 5,702,802,045 RAC: 1,040,148
Did PrimeGrid make an energy efficiency calculation about this llrcuda version when compared to quad or hex cpu machine running llr cpu version?
I would say it's inefficient for low-digit numbers (like PPS <1M), especially since the GPU needs quite a bit of CPU. A quad core CPU is more productive than a GTX 580, as it does 4 tests in a similar time compared to the single GPU.
It becomes more interesting with large-digit numbers (like SoB >10M).
I admit I don't have exact numbers in watts.
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186
Sorry, can't seem to find the post, but can anyone point me to the list of nvidia double precision cards? Want to make sure I get one that will run all the newer apps we are getting.
____________
@AggieThePew
Did PrimeGrid make an energy efficiency calculation for this llrcuda version compared to a quad or hex core machine running the CPU version of llr?
I would say it's inefficient for low-digit numbers (like PPS <1M), especially since the GPU needs quite a bit of CPU. A quad core CPU is more productive than a GTX 580, as it does 4 tests in a similar time compared to the single GPU.
It becomes more interesting with large-digit numbers (like SoB >10M).
I admit I don't have exact numbers in watts.
When you get some numbers, post the results here, because I have already set up the calculations. I just need to know the timings for GPU and CPU, the number of CPU cores, the % of CPU used when running the GPU, the GPU model, the CPU model...
Carlos
____________
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 9,807
Thanks, looks like I also need to update my version of 7z, as it won't extract that archive! If nothing else this is making me check and update my software...
Edit: yup, running now. Tests 1, 2 and 5 won't start, but 3 & 4 went through ok... 6 is running now and might take a while... I think I'm looking at 20 hours on my GTS450.
Given that might take a while, here are the existing test results.
GTS 450 (factory OC)
Driver 267.59
Win7-64
Nothing significant running on CPU (stock Q6600 2.4 GHz) during these tests.
9999*2^458051+1 is prime! Time : 273.441 sec. Time per bit: 0.593 ms.
1000065*2^390927-1 is prime! Time : 373.803 sec.
While running the above, GPU-Z reports 90+% GPU load and GPU memory load of several tens of percent. CPU usage is typically 10-12% depending on the test, which is up to about half of a single core.
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1931 ID: 352 Credit: 5,702,802,045 RAC: 1,040,148
i5 2500 / GTX 580 / Win 7 / 266.58 driver version
9999*2^458051+1 is prime! Time : 170.255 sec. Time per bit: 0.347 ms.
1000065*2^390927-1 is prime! Time : 186.063 sec. Time per iteration : 0.478 ms. (GPU load ~97%, CPU 6%)
9999*2^458051+1 is prime! Time : 164.910 sec. (tested on CPU)
1000065*2^390927-1 is prime! Time : 142.975 sec. (CPU)
BUT
19249*2^13018586+1, bit: 20000 / 13018600 [0.15%]. Time per bit: 12.916 ms. (CPU, memory usage ~55MB)
19249*2^13018586+1, bit: 40000 / 13018600 [0.30%]. Time per bit: 4.385 ms.
(GPU ~98-99%, CPU ~3%, GPU memory usage ~338MB, CPU memory usage ~315MB)
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186 |
John Honorary cruncher
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
The testing suite was updated to the following:
llrCUDA.exe -q"9999*2^458051+1" -d
llrCUDA.exe -q"1000065*2^390927-1" -d
llrCUDA.exe -q"313*2^1012240+1" -d
llrCUDA.exe -q"192971*2^4998058-1" -d
llrCUDA.exe -q"3*2^7033641+1" -d
As expected, llrCUDA shines at the higher n's:
19249*2^13018586+1, bit: 20000 / 13018600 [0.15%]. Time per bit: 12.916 ms. (CPU, memory usage ~55MB)
19249*2^13018586+1, bit: 40000 / 13018600 [0.30%]. Time per bit: 4.385 ms.
(GPU ~98-99%, CPU ~3%, GPU memory usage ~338MB, CPU memory usage ~315MB)
____________
As expected, llrCUDA shines at the higher n's:
19249*2^13018586+1, bit: 20000 / 13018600 [0.15%]. Time per bit: 12.916 ms. (CPU, memory usage ~55MB)
19249*2^13018586+1, bit: 40000 / 13018600 [0.30%]. Time per bit: 4.385 ms.
(GPU ~98-99%, CPU ~3%, GPU memory usage ~338MB, CPU memory usage ~315MB)
You are totally wrong. Although it is 2.9x quicker (12.916/4.385), running 4 instances of llr on the i5 2500 (TDP 95 W) is still more energy efficient than running llr on a GTX 580 (TDP 244 W). The CPU still beats the GPU because it uses fewer watts to test the same number of candidates as the GPU; this is the real issue here. Please do the calculations. It gets worse when you have a 6 core machine, and so on...
I am not against llrcuda, msft is doing a great job; I just think the client isn't optimized yet to beat a CPU.
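For what it's worth, the arithmetic behind this argument can be sketched as follows, using the time-per-bit figures quoted above and TDP as a crude stand-in for real power draw (an assumption; actual consumption under load differs from TDP):

```python
# Energy-per-bit comparison from the numbers posted in this thread.
# TDP is used as a rough proxy for power draw -- an assumption.

CPU_MS_PER_BIT = 12.916   # one LLR instance on one i5 2500 core
GPU_MS_PER_BIT = 4.385    # llrCUDA on a GTX 580
CPU_TDP_W = 95.0          # whole quad-core package
GPU_TDP_W = 244.0

# Four independent LLR instances share the CPU package, so the aggregate
# time per bit is a quarter of a single instance's.
cpu_aggregate_ms = CPU_MS_PER_BIT / 4

cpu_joules_per_bit = CPU_TDP_W * cpu_aggregate_ms / 1000.0
gpu_joules_per_bit = GPU_TDP_W * GPU_MS_PER_BIT / 1000.0

print(f"CPU: {cpu_joules_per_bit:.3f} J/bit")  # ~0.307 J/bit
print(f"GPU: {gpu_joules_per_bit:.3f} J/bit")  # ~1.070 J/bit
```

On these figures the quad CPU spends roughly a third of the energy per bit that the GPU does, which is the point being made here, even though the GPU finishes a single test sooner.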
____________
Carlos, you are very categorical.
No one says that llrcuda is faster AND more energy efficient.
llrcuda on a GPU is faster than llr on one CPU core, and this is a fact.
Every profit has its price. The price of speed is power consumption.
If you don't want to use llrcuda, I will.
In any case, having a choice is better than not having one.
If your message is that we should stop and continue optimization, try to do it yourself. The sources are open for everyone.
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
Rick Reynolds wrote: Sorry can't seem to find the post but can anyone point me to where the list of nvidia dual precision cards are? Want to make sure I get one that will run all the newer apps we are getting.
All cards with at least CUDA compute capability 1.3:
only the GTX 200 series (no GTS or GT cards)
400 series
500 series
Tesla 1000 series
Tesla 2000 series
some Quadros
____________
Best wishes. Knowledge is power. by jjwhalen
You'll also require cudart64_32_16.dll and cufft64_32_16.dll.
Any suggestions on where I can get these? I haven't turned them up in a search of my system nor google. Supposedly supplied with an nvidia toolkit but it requires registration as a dev.
You only need to register as a dev to download the CUDA 4.0 RC SDK, not the current or older ones.
____________
As expected, llrCUDA shines at the higher n's:
19249*2^13018586+1, bit: 20000 / 13018600 [0.15%]. Time per bit: 12.916 ms. (CPU, memory usage ~55MB)
19249*2^13018586+1, bit: 40000 / 13018600 [0.30%]. Time per bit: 4.385 ms.
(GPU ~98-99%, CPU ~3%, GPU memory usage ~338MB, CPU memory usage ~315MB)
You are totally wrong. Although it is 2.9x quicker (12.916/4.385), running 4 instances of llr on the i5 2500 (TDP 95 W) is still more energy efficient than running llr on a GTX 580 (TDP 244 W). The CPU still beats the GPU because it uses fewer watts to test the same number of candidates as the GPU; this is the real issue here. Please do the calculations. It gets worse when you have a 6 core machine, and so on...
I am not against llrcuda, msft is doing a great job; I just think the client isn't optimized yet to beat a CPU.
It would be interesting to see the real world performance of a Fermi-based Tesla C2050 (515 GFLOPS theoretical peak DP performance, compared to 197.6 for a GTX 580) against a (high end) CPU.
____________
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 9,807
CPU: Q6600 (quad 2.4 GHz)
GPU: GTS 450 (factory OC)
Driver 267.59
Win7-64
CPU comparisons performed using cllr 3.8.5
Nothing significant running on CPU during these tests.
GPU: 9999*2^458051+1 is prime! Time : 272.392 sec. Time per bit: 0.593 ms.
CPU: 9999*2^458051+1 is prime! Time : 297.001 sec.
GPU required at peak 44% of one CPU core.
Peak GPU loading GPU 96% mem 36% 181 MB
GPU: 1000065*2^390927-1 is prime! Time : 371.251 sec.
CPU: 1000065*2^390927-1 is prime! Time : 258.543 sec.
GPU required at peak 56% of one CPU core.
Peak GPU loading GPU 95% mem 41% 185 MB
GPU: 313*2^1012240+1 is not prime. Proth RES64: 5FA128A9BECBCDD3 Time : 950.595 sec
CPU: 313*2^1012240+1 is not prime. Proth RES64: 5FA128A9BECBCDD3 Time : 979.650 sec
GPU required at peak 28% of one CPU core.
Peak GPU loading GPU 98% mem 47% 185 MB
192971*2^4998058-1
GPU best time per iteration: 6.597ms for an estimated run time of 9.2 hours.
CPU best time per iteration: 8.882ms for an estimated run time of 12.3 hours.
GPU required at peak 36% of one CPU core.
Peak GPU loading GPU 98% mem 55% 273 MB
3*2^7033641+1
GPU best time per bit: 6.737ms for an estimated run time of 13.2 hours.
CPU best time per bit: 7.046ms for an estimated run time of 13.8 hours.
GPU required at peak 20% of one CPU core.
Peak GPU loading GPU 99% mem 60% 274 MB
The peak GPU loading is the maximum observed while it was doing the main work. The initial starting period could tie up a CPU core for a while with little GPU activity.
I'm too impatient to finish running the last two...
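The estimated run times above follow directly from the sampled per-iteration cost; a minimal sketch of that arithmetic (the function name is mine, not from any tool mentioned here):

```python
def estimated_hours(ms_per_iteration: float, iterations: int) -> float:
    """Projected total run time in hours from a sampled per-iteration cost."""
    return ms_per_iteration * iterations / 1000.0 / 3600.0

# 192971*2^4998058-1 takes ~4,998,058 iterations.
print(round(estimated_hours(6.597, 4_998_058), 1))  # GPU: 9.2 hours
print(round(estimated_hours(8.882, 4_998_058), 1))  # CPU: 12.3 hours
```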
(edited to add more GPU running detail)
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1931 ID: 352 Credit: 5,702,802,045 RAC: 1,040,148
Just out of curiosity, 9999*2^458051+1 tested on the CPU using various apps:
LLR: 170 secs
PFGW32: 175 secs
PFGW64: 164 secs
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186
For your attention:
GPU affinity works!
But for llrCUDA GPU affinity you need to set cpuaffinity.
Example of prpclient.ini for prpclient-2:
// This sets the CPU affinity for LLR on multi-CPU machines. It defaults to
// -1, which means that LLR can run on any CPU.
cpuaffinity=1
// This sets the GPU affinity for CUDA apps on multi-GPU machines. It defaults to
// -1, which means that the CUDA app can run on any GPU.
gpuaffinity=1
The gpuaffinity variable still works only for GeneferCUDA.
I think it needs some changes in prpclient: if llrcuda is used, create a separate llrcuda.ini and set the affinity value from the gpuaffinity variable in prpclient.ini.
Another way is to compile llrcuda for PRPNet so that it reads the CPU_AFFINITY value from gpuaffinity in llr.ini (see llr.c from llrcuda_win64src).
The first way is more versatile, IMHO.
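A minimal sketch of that first suggestion, assuming the key=value ini format shown above (the helper function and its handling are hypothetical, not the real prpclient code):

```python
# Hypothetical helper: pull the gpuaffinity value out of prpclient.ini
# text so it could be written into a separate llrcuda.ini; -1 is the
# documented "any GPU" default.
def extract_gpu_affinity(ini_text: str) -> int:
    for line in ini_text.splitlines():
        line = line.strip()
        if line.startswith("gpuaffinity="):
            return int(line.split("=", 1)[1])
    return -1

sample = "cpuaffinity=1\ngpuaffinity=1\n"
print(extract_gpu_affinity(sample))                 # 1
print(extract_gpu_affinity("// no affinity set\n")) # -1 (default)
```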
|
|
|
Thrilled to see this is ready for testing!
I'd love to help but my GTX 460 is in a 32-bit XP P4 machine right now. Over the summer I hope to finally make a foray into the "I'm-serious-enough-to-build-my-own-actually-decent-machine" world, though.
At the moment, the CPU load issue apparently being seen leads me to believe that a Pentium 4 would probably be best left without anything to do whilst llrCUDA is running. Currently I can have 2 instances of PPSElow10 testing and the GPU on GFN262144 without issue, but I take it llrCUDA is different.
I'll definitely keep watching this thread for updates!
____________
OK, here are my 2 data points:
GTX460SE
i7-860 (2.8 GHz)
Vista-64
9999*2^458051+1 is prime! Time : 285.964 sec.
1000065*2^390927-1 is prime! Time : 371.171 sec.
313*2^1012240+1 is not prime. Proth RES64: 5FA128A9BECBCDD3 Time : 940.773 sec.
192971*2^4998058-1 is not prime. LLR Res64: 16DDFED7631A18AB Time : 28968.951 sec.
3*2^7033641+1 is prime! Time : 47063.063 sec. (Edit: Oops, ignore this one, Vista took a nap sometime overnight and I didn't notice for a while so the time is not comparable)
GTX260-192core
i7-860 (2.8 GHz)
Win7-64
9999*2^458051+1 is prime! Time : 405.491 sec.
1000065*2^390927-1 is prime! Time : 469.702 sec.
313*2^1012240+1 is not prime. Proth RES64: 5FA128A9BECBCDD3 Time : 1176.601 sec.
192971*2^4998058-1 is not prime. LLR Res64: 16DDFED7631A18AB Time : 32547.804 sec.
3*2^7033641+1 is prime! Time : 42232.472 sec.
I ran off 5 port-12007 WUs before I figured out I needed to run the tests first; they didn't seem particularly fast, at around 15 minutes each (IIRC) on the GTX460SE.
I'm wondering how/if llrCUDA will run GFN524288 wu?
____________
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
Larry, interesting to see the performance of both GPUs, but how long does your CPU need in comparison?
____________
Best wishes. Knowledge is power. by jjwhalen
CPU: Core2 Duo E8400 @ 3.0GHz
GPU: GTX 460 1024Mb (810/1000)
Driver 266.58
Win7-64
CPU comparisons performed using cllr 3.8.5
Nothing significant running on CPU during these tests.
GPU: 9999*2^458051+1 is prime! Time : 229.950 sec.
CPU: 9999*2^458051+1 is prime! Time : 233.526 sec.
GPU: 1000065*2^390927-1 is prime! Time : 276.270 sec
CPU: 1000065*2^390927-1 is prime! Time : 203.798 sec.
GPU: 313*2^1012240+1 is not prime. Proth RES64: 5FA128A9BECBCDD3 Time : 697.670 sec.
CPU: 313*2^1012240+1 is not prime. Proth RES64: 5FA128A9BECBCDD3 Time : 783.478 sec.
GPU: 192971*2^4998058-1, iteration : 30000 / 4998058 [0.60%]. Time per iteration : 4.228 ms.
CPU: 192971*2^4998058-1, iteration : 30000 / 4998058 [0.60%]. Time per iteration : 7.190 ms.
GPU: 3*2^7033641+1, bit: 30000 / 7033642 [0.42%]. Time per bit: 4.148 ms.
CPU: 3*2^7033641+1, bit: 30000 / 7033642 [0.42%]. Time per bit: 5.681 ms.
|
|
|
Per Rroonnaalldd's observation, I ran a CPU comparison against the earlier GPU tests.
CPU = i7 860 (2.8 GHz) on both machines, same motherboard.
GPU: 9999*2^458051+1 is prime! Time : 405.491 sec. GTX260-192core
GPU: 9999*2^458051+1 is prime! Time : 285.964 sec. GTX460SE
CPU: 9999*2^458051+1 is prime! Time : 228.610 sec.
GPU: 1000065*2^390927-1 is prime! Time : 469.702 sec. GTX260-192core
GPU: 1000065*2^390927-1 is prime! Time : 371.171 sec. GTX460SE
CPU: 1000065*2^390927-1 is prime! Time : 195.242 sec.
GPU: 313*2^1012240+1 is not prime. Proth RES64: 5FA128A9BECBCDD3 Time : 1176.601 sec. GTX260-192core
GPU: 313*2^1012240+1 is not prime. Proth RES64: 5FA128A9BECBCDD3 Time : 940.773 sec. GTX460SE
CPU: 313*2^1012240+1 is not prime. Proth RES64: 5FA128A9BECBCDD3 Time : 761.050 sec.
____________
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
Larry, was HT on or off?
____________
Best wishes. Knowledge is power. by jjwhalen
Can anyone test some SoB numbers? They're pretty high now.
____________
wbr, Me. Dead J. Dona
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
can anyone test some SoB numbers? they're pretty high now.
Please also have a look at this thread: http://www.mersenneforum.org/showthread.php?t=14608.
IIRC, one or two users posted some times, and they are in the timeframe of 3 to 5 days for one unit.
____________
Best wishes. Knowledge is power. by jjwhalen
Larry, was HT on or off?
I'm thinking it was on, since I've never made any attempt to figure out how to enable/disable it. I was also running BOINC on 3 cores, which should still have left room for HT to kick in. In any case it was a constant, since that's how I have been running both machines.
____________
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
HT on/off is a BIOS setting.
____________
Best wishes. Knowledge is power. by jjwhalen
Do you really have to shut off HT? Can't you just run boinc/prpnet to use the real cores?
So if you have an i7 with 4 cores and 8 threads, set boinc to use 50% of the processors. This is what I do on my rig and I see no issues with it.
____________
Do you really have to shut off HT? Can't you just run boinc/prpnet to use the real cores?
So if you have an i7 with 4 cores and 8 threads, set boinc to use 50% of the processors. This is what I do on my rig and I see no issues with it.
brinktastee:
I agree, since I have boinc set to 38% to get 3 cores, and was then running a single instance of the PRPclient.
Rroonnaalldd, even after looking in the BIOS setting screens at startup I can't really tell if HT is on or off. Gigabyte seems to use different language to describe the choices there (at least to me). They list turbo at 29xx MHz, while what I take to be normal is 2798 MHz (the xx is me not remembering exactly what it was). So even if it was kicking in, the difference from 2.8 GHz to 2.9 GHz seems minimal.
____________
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
@Larry
In the past, Gigabyte often had a hidden BIOS menu. Try CTRL+F1 after entering the BIOS...
@brinktastee
I believe that doesn't work as expected. There are many possible orders in which the CPU cores can be enumerated, and only the OS knows which core is real and which is HT.
Your OS can enumerate all cores as:
real,HT,real,HT,real,HT,real,HT.................Nehalem, Clarkdale on Windows
1real,2real,1HT,2HT................................seen with a Pentium 4 with HT on Windows
1real,2real,3HT,4HT,1HT,2HT,3real,4real...seen on Mac OS X with Nehalem
For Windows I found the tool MPDetect. On Linux you have "numactl" by Andi Kleen or the usual "cat /proc/cpuinfo".
With MPDetect you should see something like this:
Core2Duo E6600
the same Core2Duo E6600 but with an older BIOS version
In comparison, a Core2Quad Q6700
____________
Best wishes. Knowledge is power. by jjwhalen
@Rroonnaalldd
@Larry
Gigabyte had in the past ever a hidden Bios menu. Try CTRL+F1 after entering the Bios...
I went back into the BIOS and managed to go about 2 levels deeper than I was looking previously, and got to three items that seem to answer your earlier question.
item [setting]
Intel Turbo Boost Tech. [AUTO]
CPU cores enabled [ALL]
CPU Multi-thread [ENABLED]
===> with a note to the side indicating that CPU hyper-threading is enabled.
So I am interpreting that as HT being enabled in the original BIOS that comes with the motherboard, since I just put all the parts together, and if it runs when I push the button I'm happy.
Larry
____________
Sorry, but I have not really understood where I am supposed to put the llrcuda files etc. ...
// This is the name of LLR executable. On Windows, this needs to be
// the LLR console application, not the GUI application. The GUI
// application does not terminate when the PRP test is done.
// On some systems you will need to put a "./" in front of the executable
// name so that it looks in the current directory for it rather than
// in the system path.
// LLR can be downloaded from http://jpenne.free.fr/index2.html
llrexe=llrCUDA.exe
llrexe=llr.exe
____________
rogue Volunteer developer
Joined: 8 Sep 07 Posts: 1249 ID: 12001 Credit: 18,565,548 RAC: 0
sorry but I have not really understood where I was to put the files llrcuda etc. ....
// This is the name of LLR executable. On Windows, this needs to be
// the LLR console application, not the GUI application. The GUI
// application does not terminate when the PRP test is done.
// On some systems you will need to put a "./" in front of the executable
// name so that it looks in the current directory for it rather than
// in the system path.
// LLR can be downloaded from http://jpenne.free.fr/index2.html
llrexe=llrCUDA.exe
llrexe=llr.exe
The client only accepts one of these; the one used will be the last one specified.
|
|
|
Hello,
I would like to try the cuda app, but I have never used PRPNet before.
Could anyone please give me detailed instructions on how to run it on my gpu? (I'm running Win7.)
I tried installing it, but it quits after barely starting; the window disappears about 0.5 seconds after opening.
rogue Volunteer developer
Joined: 8 Sep 07 Posts: 1249 ID: 12001 Credit: 18,565,548 RAC: 0
Hello,
I would like to try the cuda app, but I have never used prp before,
could anyone please give me detailed instructions on how to run it on my gpu? (I'm running win7)
I tried installing it, but it quits after barely starting, the window disappears like 0.5 seconds after opening
Run the client from the command line (DOS prompt). It will output any errors that it finds in the prpclient.ini file.
|
|
|
Hello,
I would like to try the cuda app, but I have never used prp before,
could anyone please give me detailed instructions on how to run it on my gpu? (I'm running win7)
I tried installing it, but it quits after barely starting, the window disappears like 0.5 seconds after opening
Tom, I private messaged you some of my settings that I used to get it working. Hope it helps.
____________
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
I tried installing it, but it quits after barely starting, the window disappears like 0.5 seconds after opening
"readme_primegrid.txt" wrote: Sample settings:
server=PPSE7171:100:2:uwin.mine.nu:7171
server=PPSE10K:0:20:uwin.mine.nu:10000
server=SGS:0:1:uwin.mine.nu:8181
This tells the client to only get work from the Proth Prime Search Extended port 7171. However, you can select other combinations such as 10, 60, 30. This would provide 10% work from PPSE7171, 60% work from PPSE10K, and 30% work from SGS. You can make any combinations you like. Just make sure all numbers add up to 100. :)
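Since the percentages in those server= lines must add up to 100, a tiny sanity check is easy to write. This is just a sketch; the field layout (name:percentage:count:host:port) is inferred from the samples above:

```python
# Sum the work-percentage field of server= lines and check it hits 100.
def work_percentages(lines: list[str]) -> int:
    total = 0
    for line in lines:
        if line.startswith("server="):
            # server=name:percentage:count:host:port
            total += int(line.split(":")[1])
    return total

config = [
    "server=PPSE7171:10:2:uwin.mine.nu:7171",
    "server=PPSE10K:60:20:uwin.mine.nu:10000",
    "server=SGS:30:1:uwin.mine.nu:8181",
]
print(work_percentages(config))  # 100
```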
____________
Best wishes. Knowledge is power. by jjwhalen
llrCUDA 32-bit version... was just wondering (again) if a 32-bit version is in the works or even being considered.
____________
@AggieThePew
Windows 64 bit ONLY for now.
Any time frame for Linux? Updates?
Thanks,
BT
____________
BiBi Volunteer tester
Joined: 6 Mar 10 Posts: 151 ID: 56425 Credit: 34,290,031 RAC: 0
Any time frame for Linux builds?
|
|
|
Hey, so what's happening with this? Any chance a win32 build will be available soon (or ever)?
I wish I could use/test this right now, but I can't. :(
____________
BiBi Volunteer tester
Joined: 6 Mar 10 Posts: 151 ID: 56425 Credit: 34,290,031 RAC: 0
Can we use llrCUDA on PRPNet, and are the candidates double-checked?
|
|
|
I guess this development has come to a stop?
____________
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
I guess this development has come to a stop?
Did you read the thread llrCUDA at mersenneforum?
____________
Best wishes. Knowledge is power. by jjwhalen
I guess this development has come to a stop?
Did you read the thread llrCUDA at mersenneforum?
Not the complete thread, because:
* I'd expect the most up-to-date information on the PG forum.
* Even on mersenneforum, the newest post is from almost 3 months ago.
So the question more or less remains: is there still any development on this project, and can we expect official CUDA support in BOINC for PG?
____________
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 929 ID: 3110 Credit: 236,594,205 RAC: 10,765
llrCUDA - 32 bit version... was just wondering (again) if a 32 bit version was in the works or even being considered.
FYI, I tried and failed, so this is very unlikely. There's some problem with the code when it's compiled for 32 bits, but I have no clue what it is. I suspect only Jean Penne could find it.
____________
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
Ken, do you have a compiled version for BOINC and PRPNet?
____________
Best wishes. Knowledge is power. by jjwhalen
LlrCUDA is working in PRPNet.
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
LlrCUDA is working in PRPNet.
I found llrCUDA only for win64. I tried to compile the linux64 version from source and lost that game; the file "cutil_inline.h" could not be found...
____________
Best wishes. Knowledge is power. by jjwhalen
Yes, only for Win 64-bit for now.
|
|
|
I installed the app and fixed the ini, but somehow units on port 121 kept starting with the normal llr.exe.
Does this mean that those units must be done by llr.exe?
Anyway, I commented out llr.exe with // in the ini and started again.
Now llrCUDA finally kicks in, after a somewhat slow initial start.
I think that when everything runs smoothly, a port to BOINC would be awesome.
And I think this would make a huge impact, especially in Seventeen or Bust;
I saw people posting huge calculations getting done at awesome speeds.
As I watch it run I see a GPU load of about 94%.
But what is impressive is the load on the memory controller: almost constant at 55 to 58%.
Before anyone asks: you can see the GPU memory controller load in a tool like AIDA64, which added this feature in its recent releases.
Sadly the trial versions are limited, so I am not sure if the trial version will show this.
John Honorary cruncher
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
I had installed the app and fixed the ini but somehow on 121 units constant started with the normal llr.exe
Does this mean that those units must be done by the llr.exe ?
No, the PRPclient will use whatever llr you have listed in the llrexe= field. Also, the following PRPNet projects can definitely use llrCUDA:
- Proth Prime Search
- 27121 Prime Search
- Extended Sierpinski Problem
SGS might be possible, but we haven't tested the k limits of llrCUDA.
Currently, Rytis is attempting to implement a 64-bit Linux version of llrCUDA in BOINC. Additionally, he'll be adding a CUDA checkbox to the PPS (LLR) project. PPS (LLR) has been selected ONLY FOR TEST purposes. Once it's confirmed that llrCUDA and the checkbox work in BOINC, SoB and PSP will be the first two projects added.
____________
|
|
|
|
Well, I just noticed that server=121:25:1:prpnet.primegrid.com:12001 started only llr.exe, even though I had put the correct settings in the ini:
llrexe=llrCUDA.exe
//llrexe=llr.exe
After I disabled llr.exe in the ini, it completely disregarded previously done work and started over from scratch.
From what I can see, it seems to go faster than the previous work done with llr.exe, but time will tell :D
Anyway, when this unit is finished, I will let it run on the other ports mentioned.
I'm just wondering why it did not want to start with the CUDA version, and why previous work got deleted by the app.
Yes, it looks pretty promising so far, but maybe some more testing is needed before it gets widespread on BOINC. |
|
|
|
I also tested llrCUDA with primes k*2^n-1 on the PRPNet servers run by NPLB and TPS. Both run fine but need up to half a CPU core.
In 27121 Prime Search on ports 12001 and 12006, llrCUDA needed a full CPU core in my tests. Has anyone else tested this with a comparable result?
I'm just wondering why it did not want to start with the CUDA version
I noticed this too. If the normal llr.exe is set in the ini file, the PRPNet client prefers it; I suppose because llrCUDA is still in beta development.
and why previous work got deleted by the app.
Because it's another app.
Regards, Odi
____________
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
and why previous work got deleted by the app.
Because it's another app.
The same behaviour was also visible with different genefer entries. I posted about this in another thread.
It's pretty uncool if you restart the client and the calculation starts from scratch, because the first app reached its max range or a round-off error occurred and the job must be done by the next entry in the app list.
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
BiBi Volunteer tester Send message
Joined: 6 Mar 10 Posts: 151 ID: 56425 Credit: 34,290,031 RAC: 0
                   
|
I used llrCUDA on PRPnet; I think it consumes a lot of energy and it also uses one core to keep the GPU busy. Maybe the newer CUDA cards consume less energy.
It would be nice to use llrCUDA for SOB tasks. ;)
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
I used llrCUDA on PRPnet; I think it consumes a lot of energy and it also uses one core to keep the GPU busy. Maybe the newer CUDA cards consume less energy.
One core to keep the gpu busy? Pretty uncool.
I have the card from the thread "run times with a GTS450 'eco'".
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
BiBi Volunteer tester Send message
Joined: 6 Mar 10 Posts: 151 ID: 56425 Credit: 34,290,031 RAC: 0
                   
|
I have the card from the thread run times with a GTS450 "eco".
These are runtimes for CUDA sieves. You might try the CUDA llr on PRPnet. It's a cool way to heat a room ;)
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
You might try the CUDA llr on PRPnet. It's a cool way to heat a room ;)
BiBi, please read my postings from September 16th...
PS: The file http://pgllr.mine.nu/PRPNet/prpclient-4.3.5-linux_64-gpu.7z contains only genefercuda...
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
John Honorary cruncher
 Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
We are currently "alpha" testing a native BOINC llrCUDA Linux 64 bit app. We hope to have more news in a day or so.
____________
|
|
|
BiBi Volunteer tester Send message
Joined: 6 Mar 10 Posts: 151 ID: 56425 Credit: 34,290,031 RAC: 0
                   
|
llrCUDA is working in PRPNet.
I found llrCUDA only for win64. I tried to compile the Linux 64-bit version from source and failed: the file "cutil_inline.h" could not be found...
I think I have the same sources. I succeeded in building it; I cannot remember how. It might have taken a good portion of determination...
Did you succeed as well?
Edit: it is a different source. Compiling BOINC is not that easy... good luck. :-) |
|
|
Crun-chi Volunteer tester
 Send message
Joined: 25 Nov 09 Posts: 3114 ID: 50683 Credit: 76,797,694 RAC: 7,338
                       
|
Can you share the build you made?
____________
92*10^1439761-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
314187728^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! |
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
I found llrCUDA only for win64. I tried to compile the Linux 64-bit version from source and failed: the file "cutil_inline.h" could not be found...
I think I have the same sources. I succeeded in building it; I cannot remember how. It might have taken a good portion of determination...
Did you succeed as well?
Edit: it is a different source. Compiling BOINC is not that easy... good luck. :-)
For the BOINC stuff, it seems you need the BOINC sources too.
I tried it with different sources from the mersenneforum. Success? No... PEBKAC, error 40 or something else.
Maybe it has something to do with my outdated DotschUX 1.2 installation, because I also have a problem with the MilkyWay app for nVidia GPUs. That is an OpenCL app and needs at least glibc 2.11 or newer.
I am interested in a precompiled version too. In most cases I use http://ifile.it for short-term uploads; easy and without Flash.
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
John Honorary cruncher
 Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
We are currently "alpha" testing a native BOINC llrCUDA Linux 64 bit app. We hope to have more news in a day or so.
We are waiting on Rytis to make a few changes. Hopefully we'll be able to test soon.
____________
|
|
|
John Honorary cruncher
 Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
The initial testing has completed without success. However, a few bugs were worked out, so we're one step closer.
____________
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
Does anybody have a precompiled llrCUDA-app for linux64?
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
|
A few questions, as I've been following the development of llrCUDA, at least from a distance, since it was first announced:
- are we still limited exclusively to Win64? I see talk of Linux64 further in the thread, but I have neither.
- does 32-bit simply present too much of a bottleneck, given the rate at which a calculation-heavy app like LLR must send and receive instructions? That is, when communicating its app state, is it simply more efficient to use a 64-bit architecture instead of a 32-bit one? If that's the case then okay, it makes perfect sense!
- I seem to remember reading about the architecture of ATI cards and why the layout of their stream processors is somewhat more conducive to intensely linear, non-parallel algorithms (i.e. Collatz), but we've seen that CUDA is better at sieving by a long shot. Is there talk of getting the ATI crowd into PRPNet through a future port of LLR to CAL/Brook or OpenCL? (OpenCL could be tested on current-generation NVIDIA cards too, for bug-testing, as they are not very fast with OpenCL at all.)
____________
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
Does anybody have a precompiled llrCUDA-app for linux64?
Done...
- does 32-bit simply present too much of a bottleneck, given the rate at which a calculation-heavy app like LLR must send and receive instructions? That is, when communicating its app state, is it simply more efficient to use a 64-bit architecture instead of a 32-bit one? If that's the case then okay, it makes perfect sense!
No. The thread llrCUDA at mersenneforum.org describes the actual situation.
The 32-bit app produces various errors, while the same code runs flawlessly as a 64-bit app. There is some hope that the developer of Prime95, Jean Penné, has a solution...
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
|
That would be great, if llrCUDA ends up able to run on 32-bit systems. I'm limited to a farm of very slow (Pentium 4) CPUs, but I run a GTX 460 and plan on adding more GPUs (mainly NVIDIA). It's interesting, though, to think how fast the buffers would run dry on certain PRPNet ports if so many more people were able to run llrCUDA.
Thanks for the answer.
I suppose there's no imminent chance of a Mac CUDA app? Especially given that the last Macs to ship with NVIDIA chips were mid- to late-2010 models, I can understand why it might seem pointless. Still, a 330M would tackle a long 27121 or ESP WU in half or less than half the time of an i5, I guess.
____________
|
|
|
pschoefer Volunteer developer Volunteer tester
 Send message
Joined: 20 Sep 05 Posts: 674 ID: 845 Credit: 2,561,991,810 RAC: 434,430
                           
|
Still, a 330M would tackle a long 27121 or ESP WU in half or less than half the time of an i5, I guess.
A 330M won't be able to run llrCUDA at all, as its compute capability is only 1.2, while 1.3 is needed for double precision support.
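Put another way, the gate is a simple major.minor comparison. A minimal sketch of that version check (this is not from the llrCUDA source, just the rule stated above expressed in Python; CUDA reports compute capability as a major/minor pair):

```python
# llrCUDA needs double precision, which requires compute capability >= 1.3.
# Tuples compare element-wise, which matches how CC versions are ordered.
def supports_llrcuda(major: int, minor: int) -> bool:
    return (major, minor) >= (1, 3)

print(supports_llrcuda(1, 2))  # GeForce 330M (CC 1.2) -> False
print(supports_llrcuda(2, 1))  # GTX 460 (CC 2.1) -> True
```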
____________
|
|
|
|
Ah I somehow forgot llrCUDA is DP-only. So there really is no point in making a Mac application for CUDA, then.
____________
|
|
|
|
Ah I somehow forgot llrCUDA is DP-only. So there really is no point in making a Mac application for CUDA, then.
The Mac Pro comes with the ATI Radeon HD 5870 1GB. That's a DP card, right?
Edit: CUDA, not OpenCL. OK, never mind.
____________
|
|
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 929 ID: 3110 Credit: 236,594,205 RAC: 10,765
                           
|
The 32-bit app produces various errors, while the same code runs flawlessly as a 64-bit app. There is some hope that the developer of Prime95, Jean Penné, has a solution...
Close. Jean Penné is the developer of LLR, who might be able to find a solution. (But I doubt he's looking.) George Woltman is the developer of Prime95.
____________
|
|
|
John Honorary cruncher
 Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
The initial testing has completed without success. However, a few bugs were worked out, so we're one step closer.
There will be no further testing until after the conclusion of the Challenge (06 Oct 2011).
____________
|
|
|
|
As I remember one comment from a GPU guru: the problem with GPU computing was that normal integer calculation is not the GPU's strongest point; for that, a CPU is still much more effective.
ATI is of course way behind nVidia in GPU computing, simply because they had to start from scratch while nVidia had already created CUDA.
But once some clever person found a way to use their GPUs to do some work lightning fast, they stepped onto the moving train that we call GPU computing as well.
So their primary language, CAL, was nowhere near CUDA in terms of usability for the somewhat more complex tasks.
Then ATI decided that OpenCL was probably a good bet, which would give all software developers a chance to be on the same train.
Do not forget that OpenCL is a standard in which nVidia, as the biggest and richest video card maker, also has a lot of say.
But at the moment we see that even though it is getting better every year, it is still behind CUDA in development. In fact, I blame nVidia for not helping much with OpenCL development, because they would rather have people adopt the non-open CUDA standard.
Nevertheless, I still think that in the near future we will have nice goodies coming to us from the raw power GPUs have to offer all of us. |
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
According to a post by "crunch3r", OpenCL for nVidia and ATI/AMD is not fully compatible; sometimes you need to write different code paths.
OpenCL is not a native GPU language like CAL or CUDA, which means performance impacts.
Why should nVidia make OpenCL better? CUDA has been proven for years, works as expected, and you can find countless developer guides from nVidia and other sources. Another point: the nVidia guys listened to the wishes of scientific users. They wanted ECC memory and some other things, and they got it with the Tesla 2000 series.
...and what about ATI/AMD? Did you find any developer guides? ATI/AMD seems to have the most powerful hardware, but they invested only the smallest amount of time in drivers or an SDK, or they simply wait for a big player (Intel released a beta version of an OpenCL SDK for their HD2000/HD3000 GPUs) to speed up this "OpenCL thing".
Take a look at the current situation. CAL is a one-way street, and OpenCL in newer Catalyst versions is not available for all GPUs (only beta support for the HD4000 series, no support for the HD3000 series). IMO, this is a shame for the HD38xx, HD4770 and HD48xx GPUs; they are all DP-capable and could theoretically also run an LLR unit on the GPU...
[add]
added the HD series
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
|
lol, yes, on some points you're right.
But again, do you expect a relatively small company like AMD, which hardly makes a profit, to really spend as much money as Intel and nVidia do?
You're talking about companies that are much bigger and have huge bank accounts. Most forget that even though AMD has some impact on the server and personal PC market,
the big money is still in the corporate market, and if you check what is being sold there, you see it's almost always Intel/nVidia combinations. Why? Simple: they give huge bonuses to Dell/HP and other companies to sell their products.
Again, we are talking about two pretty rich companies; they simply can spend much more on advertising and R&D.
So no, AMD cannot spend billions on development; besides competing, they need to keep up with these giants.
Have you seen how much money AMD lost in the previous years? They only stayed alive because some very wealthy investors put a lot of money into AMD/ATI to keep it on the market.
We do not want a standard based on one brand; do you think that would be good for us? I know for sure AMD does not have the means to spend much on developing a CUDA-like language.
And of course nVidia does not want OpenCL to succeed. They simply want to be the only one selling THE SOLUTION...
That's also why OpenCL is not competitive; again, simply because nVidia has a huge finger in the pie here.
But I think we as consumers could do ourselves a favor by not using CUDA/CAL anymore and sticking to OpenCL ;).
The masses choose where things go; the more of us who settle on the same standard, the more effort goes into the one we want.
In a way I am fairly certain that the two cannot really compete on all projects, simply because both took a different approach to how things get done.
I think on some projects the nVidia cards will always be faster, and on other projects ATI will beat them; not because nVidia has the better language, but simply by design.
Yes, the HD3000 series and lower were not really designed for GPU computing; the fact that some can still be used by applications is more luck than intention.
That's the moment AMD stepped onto the GPU computing train, all because one person showed the world what was possible with an ATI GPU.
We should cherish that moment, or we would still have seen only nVidia on the GPU computing market for years.
Collatz text: Gipsel/Cluster Physik - ATI application guru without whom there would be no ATI app (or it would run at CPU speeds)
I shall explain why I keep pushing any and all that way: we consumers need the competition; if AMD falls out of the market, we have a huge problem.
Because there would be nothing left if that happens; you would be at the mercy of whatever those two giants want. We are where we are now only because AMD always made the giant Intel develop new products. If AMD had not done so, we would probably still be using a Pentium 4 or worse.
On the GPU side it was always ATI who kicked against the nVidia giant and made them develop faster products over and over; yes, there were some others, but mostly it was ATI who kicked their balls.
So without them the world would have looked different. |
|
|
rogueVolunteer developer
 Send message
Joined: 8 Sep 07 Posts: 1249 ID: 12001 Credit: 18,565,548 RAC: 0
 
|
Don't forget that Apple is a big supporter of OpenCL. |
|
|
mfbabb2 Volunteer tester
 Send message
Joined: 10 Oct 08 Posts: 510 ID: 30360 Credit: 17,180,154 RAC: 38,236
                     
|
Comparing OpenCL to CUDA is like comparing Java to C++ (or maybe even assembler): the latter will always outperform (blow away) the former. IMHO
____________
Murphy (AtP)
|
|
|
|
Maybe this is unrelated to CUDA exactly, but how can I disable CPU workunits completely?
If I leave only the GPU checkboxes checked, after saving, PPS still has a CPU checkbox selected.
____________
wbr, Me. Dead J. Dona
|
|
|
|
Maybe this is unrelated to CUDA exactly, but how can I disable CPU workunits completely?
If I leave only the GPU checkboxes checked, after saving, PPS still has a CPU checkbox selected.
You have to have at least one CPU project checked in the PG preferences. If you want to disable CPU tasks, uncheck the "allow CPU" box at the top of your preferences and make sure the correct GPU allow box(es) are checked.
The other option is to control it on the PC side by changing the BOINC preferences to not allow CPU jobs.
____________
@AggieThePew
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
Gives you only GPU-work...
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 4,797
                              
|
I haven't followed the progress of this in a while, so forgive me if this has been answered somewhere. I looked but couldn't find an answer.
I keep getting an error when trying to get LLR CUDA WUs. Is there no LLR CUDA work available, or is something not working correctly?
I'm trying to run the LLR CUDA app from the BOINC side. I have USE NVIDIA GPU checked, and PROTH PRIME SEARCH (LLR) CUDA checked.
If I also select PPS (Sieve) CUDA, I receive (and crunch) the PPS sieve CUDA WUs without trouble. But if only the PPS (LLR) CUDA box is checked, I get this error message in the logs:
27-Oct-2011 13:42:05 [PrimeGrid] Sending scheduler request: Requested by user.
27-Oct-2011 13:42:05 [PrimeGrid] Requesting new tasks for GPU
27-Oct-2011 13:42:08 [PrimeGrid] Scheduler request completed: got 0 new tasks
27-Oct-2011 13:42:08 [PrimeGrid] Message from server: No work sent
27-Oct-2011 13:42:08 [PrimeGrid] Message from server: No work available for the applications you have selected. Please check your project preferences on the web site.
The environment is as follows:
11/1/2011 10:49:09 AM | | Starting BOINC client version 6.12.34 for windows_x86_64
11/1/2011 10:49:09 AM | | Config: report completed tasks immediately
11/1/2011 10:49:09 AM | | Config: don't compute while Wow.exe is running
11/1/2011 10:49:09 AM | | Config: don't use GPUs while Wow.exe is running
11/1/2011 10:49:09 AM | | Config: don't use GPUs while wmplayer.exe is running
11/1/2011 10:49:09 AM | | Config: don't use GPUs while CivilizationV.exe is running
11/1/2011 10:49:09 AM | | Config: don't use GPUs while CivilizationV_DX11.exe is running
11/1/2011 10:49:09 AM | | Config: GUI RPC allowed from:
11/1/2011 10:49:09 AM | | Config: 192.168.1.2
11/1/2011 10:49:09 AM | | Config: 192.168.1.32
11/1/2011 10:49:09 AM | | Config: 192.168.1.8
11/1/2011 10:49:09 AM | | Config: 192.168.1.9
11/1/2011 10:49:09 AM | | Config: 192.168.1.12
11/1/2011 10:49:09 AM | | log flags: file_xfer, sched_ops, task, benchmark_debug, cpu_sched
11/1/2011 10:49:09 AM | | Libraries: libcurl/7.21.6 OpenSSL/1.0.0d zlib/1.2.5
11/1/2011 10:49:09 AM | | Data directory: C:\ProgramData\BOINC
11/1/2011 10:49:09 AM | | Running under account Mike
11/1/2011 10:49:09 AM | | Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU @ 2.40GHz [Family 6 Model 15 Stepping 7]
11/1/2011 10:49:09 AM | | Processor: 4.00 MB cache
11/1/2011 10:49:09 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 syscall nx lm vmx tm2 pbe
11/1/2011 10:49:09 AM | | OS: Microsoft Windows 7: Professional x64 Edition, Service Pack 1, (06.01.7601.00)
11/1/2011 10:49:09 AM | | Memory: 8.00 GB physical, 16.00 GB virtual
11/1/2011 10:49:09 AM | | Disk: 931.51 GB total, 399.55 GB free
11/1/2011 10:49:09 AM | | Local time is UTC -4 hours
11/1/2011 10:49:09 AM | | NVIDIA GPU 0: GeForce GTX 460 (driver version 27533, CUDA version 4000, compute capability 2.1, 962MB, 605 GFLOPS peak)
11/1/2011 10:49:09 AM | PrimeGrid | URL http://www.primegrid.com/; Computer ID 173068; resource share 100
11/1/2011 10:49:09 AM | | General prefs: from http://bam.boincstats.com/ (last modified 21-Jan-2010 13:01:12)
11/1/2011 10:49:09 AM | | Host location: none
11/1/2011 10:49:09 AM | | General prefs: using your defaults
11/1/2011 10:49:09 AM | | Reading preferences override file
11/1/2011 10:49:09 AM | | Preferences:
11/1/2011 10:49:09 AM | | max memory usage when active: 4095.23MB
11/1/2011 10:49:09 AM | | max memory usage when idle: 7371.41MB
11/1/2011 10:49:09 AM | | max disk usage: 10.00GB
11/1/2011 10:49:09 AM | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
11/1/2011 10:49:09 AM | | Not using a proxy
Note that the same results occurred with 6.10.58.
Also, I'm crunching TRP LLR on the CPU side, so this isn't a situation where BOINC is acting funny because no CPU projects are checked. I've got TRP enabled on the CPU.
Thanks in advance,
Mike
____________
My lucky number is 75898524288+1 |
|
|
|
AFAIK there is no LLR CUDA work on the BOINC side. That checkbox is for internal tests, at least for now. ;) |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 4,797
                              
|
AFAIK there is no LLR CUDA work on the BOINC side. That checkbox is for internal tests, at least for now. ;)
Thanks. In that case, I shall stop my attempts at getting it to work. :)
____________
My lucky number is 75898524288+1 |
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
The non-BOINC 64-bit version of llrCUDA has worked without problems since May or June of this year and can be used with the PRPclient.
We have only two bigger app problems. Firstly, getting the BOINC side to work too (the app causes a seg-fault or an ELF message in the log). Secondly, solving some weird problems (all sorts of errors, roundoff and such) on 32-bit systems (confirmed on linux32 and untested on win32, but this should not differ)...
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 4,797
                              
|
What kind of speeds are you seeing with LLR-CUDA?
____________
My lucky number is 75898524288+1 |
|
|
|
Sorry to keep pushing, but when can we expect a Win32 version of the app to be pushed out, at the very least for alpha testing? Because I'd totally want to help with that. I'm great at knowing when my computer's doing something wrong.
____________
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
Sorry to keep pushing, but when can we expect a Win32 version of the app to be pushed out, at the very least for alpha testing? Because I'd totally want to help with that. I'm great at knowing when my computer's doing something wrong.
The problem on linux32 hosts (untested on win32, but it should not differ much) was posted by Ken on "24 Jun 11, 02:12", confirmed by msft on "26 Jun 11, 08:27", and is still unsolved...
What kind of speeds are you seeing with LLR-CUDA?
I tested only "30*2^400000+1" and "1000065*2^390927-1" to keep the times as low as possible.
GTS 450 eco: 30*2^400000+1 is not prime. Proth RES64: 703F8ACAB9634020 Time : 318.324 sec.
Conroe E6600: 30*2^400000+1 is not prime. Proth RES64: 703F8ACAB9634020 Time : 143.041 sec.
GPU: 1000065*2^390927-1 is prime! Time : 522.220 sec.
CPU: 1000065*2^390927-1 is prime! Time : 257.142 sec.
I think the times are not as bad as they seem at first.
Firstly, all nVidia consumer cards (GeForce) are penalized in DP capability: DP is only 1/4 (GF1X0) to 1/12 (GF1X4) of the SP performance. Secondly, the clock rates of the GPU are much lower (especially the core; the shader clock is only double the core clock).
Other and faster timings were posted by "mdettweiler" for a GTX 460 and a Core 2 Duo E4500 @ 2.2 GHz.
gary@herford:~/Desktop/gpu-stuff/llrcuda$ ./llrCUDA -d -q173061*2^483047-1
Starting Lucas Lehmer Riesel prime test of 173061*2^483047-1
Using real irrational base DWT, FFT length = 131072
V1 = 3 ; Computing U0...done.
173061*2^483047-1 is not prime. LLR Res64: A0E9B4B336A69FDA Time : 419.431 sec
[2011-02-08 01:40:16 EST] Candidate: 173061*2^483047-1 Program: cllr.exe Residue: A0E9B4B336A69FDA Time: 515 sec
His conclusion was that llrCUDA seems to shine on large base-2 numbers. He wanted to test SoB and PSP, but he posted no newer runtimes...
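To put those penalty ratios in numbers, a small sketch: peak DP throughput is just the SP peak scaled by the penalty fraction. The SP figure in the example is a made-up placeholder, not a spec value for any particular card:

```python
# Peak DP throughput as a fraction of SP peak, per the penalty ratios above.
def dp_peak_gflops(sp_peak_gflops: float, dp_ratio: float) -> float:
    return sp_peak_gflops * dp_ratio

# e.g. a hypothetical 1200 GFLOPS (SP) GeForce at the 1/12 penalty:
print(dp_peak_gflops(1200, 1 / 12))  # ~100 GFLOPS DP
```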
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
|
(...)
I think the times are not as bad as they seem at first.
Firstly, all nVidia consumer cards (GeForce) are penalized in DP capability: DP is only 1/4 (GF1X0) to 1/12 (GF1X4) of the SP performance. Secondly, the clock rates of the GPU are much lower (especially the core; the shader clock is only double the core clock).
The first quote, about the DP capability, is correct (okay, I have read something about 1/8 for GF1x0 and 1/2 for Tesla M20x0); the second quote is wrong.
A GTX 480 has a core frequency of 607 MHz, a GTX 580 one of 772 MHz. The corresponding Tesla M2050/M2070 and M2090 run at 575 MHz and 650 MHz.
But my other question is: what happened to the source? http://pgllr.mine.nu/software/LLR is not available.
I only have some old code llrcuda.0.21, but that does not seem to work:
Nvidia GTX 260 216
# export LD_LIBRARY_PATH=/tmp/; ./llrCUDA -d -v
Primality Testing of k*2^n+/-1 Program - Portable C Version 3.7.1d
# export LD_LIBRARY_PATH=/tmp/; ./llrCUDA -d -q"347*2^1041223+1"
Starting Proth prime test of 347*2^1041223+1, FFTLEN = 131072 ; a = 3
347*2^1041223+1 is not prime. Proth RES64: 5065D78353EB6D7D Time : 1220.348 sec.
Intel Core i5-2400
# ./primegrid_sllr_3.8.6_i686-pc-linux-gnu -d -v
Primality Testing of k*b^n+/-1 Program - Version 3.8.6
# ./primegrid_sllr_3.8.6_i686-pc-linux-gnu -d -q"347*2^1041223+1"
Starting Proth prime test of 347*2^1041223+1
347*2^1041223+1 is prime! Time : 616.398 sec.
compiled with -arch=sm_13 and
# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Thu_May_12_11:09:45_PDT_2011
Cuda compilation tools, release 4.0, V0.2.1221 |
|
|
John Honorary cruncher
 Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
But my other question is: what happened to the source? http://pgllr.mine.nu/software/LLR is not available.
I only have some old code llrcuda.0.21, but that does not seem to work:
The latest source is here: llrcuda.0.60.tar.bz2
____________
|
|
|
|
grabbed it, works, thanks! |
|
|
|
Nvidia GTX 460 (768 MB)
# ./llrCUDA -d -q"347*2^1041223+1"
Starting Proth prime test of 347*2^1041223+1
347*2^1041223+1 is prime! Time : 861.061 sec.
Intel Xeon W3520 (essentially a Core i7-920)
# ./primegrid_sllr_3.8.6_i686-pc-linux-gnu -d -q"347*2^1041223+1"
Starting Proth prime test of 347*2^1041223+1
347*2^1041223+1 is prime! Time : 834.134 sec.
Not too fast, and it uses 50% of one core. HT was enabled in the BIOS, but nothing else ran on the CPU, so it should have behaved as if HT were off. That is, you should double the runtime on the CPU for a full load of 8 WUs.
As can be seen here, Sandy Bridge is faster: clock-equalized, the Xeon W3520 would have completed in around 727 s, whereas the Sandy Bridge was done in 616 s. |
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
nVidia GTS450 eco # ./llrCUDA64 -d -q"347*2^1041223+1"
Starting Proth prime test of 347*2^1041223+1
Using complex irrational base DWT, FFT length = 131072, a = 3
347*2^1041223+1, bit: 7000 / 1041231 [0.67%]. Time per bit: 1.769 ms.
347*2^1041223+1, bit: 14000 / 1041231 [1.34%]. Time per bit: 1.477 ms.
347*2^1041223+1 is prime! Time : 1533.712 sec.
Conroe Core2Duo E6600# ./primegrid_sllr_3.8.6_i686-pc-linux-gnu -d -q"347*2^1041223+1
Starting Proth prime test of 347*2^1041223+1
Using all-complex Core2 type-3 FFT length 64K, Pass1=256, Pass2=256, a = 3
347*2^1041223+1, bit: 7000 / 1041231 [0.67%]. Time per bit: 0.988 ms.
347*2^1041223+1, bit: 14000 / 1041231 [1.34%]. Time per bit: 0.975 ms.
347*2^1041223+1 is prime! Time : 1008.579 sec.
(...)
I think the times are not as bad as they are to seem in the first moment.
Firstly, all nVidia consumer cards (GeForce) are penalized in DP-capability. DP is only 1/4 (GF1X0) to 1/12 (GF1X4) of the SP-performance.
Secondly, the clockrates of the GPU (especially the core, the shader rate is only doubled the core) are much lower.
The first quote about the DP-capability is correct (okay, i have read something about 1/8 for GF1x0, 1/2 for Tesla M20x0), the second quote is wrong.
A GTX 480 has a core frequency of 607 MHz, a GTX 580 one of 772 MHz. The corresponding Tesla M2050/M2070 and M2090 run at 575 MHz and 650 MHz.
Why is my second quote wrong?
GTX 480 core = 607 MHz; the shader clock would then be 1214 MHz, which is only half the clock rate of my slow Core2Duo.
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
|
Ah, you were comparing to the CPU; that is apples and oranges. I thought, out of context, you were comparing the core frequency of the consumer-series NVIDIA cards to the Tesla/Quadro line. |
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
Yes, i compared apples with oranges...
OT:
Are you a member of the nVidia registered developer program?
I ask because i saw your posting about "New LLVM-based compiler delivers up to 10% faster performance for many applications in Cuda4.1".
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
John Honorary cruncher
 Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
llrCUDA is currently in limbo. We've been unsuccessful in building a native BOINC version, so running it through the wrapper is the only other option. Unfortunately, the wrapper needs to be modified to accept llrCUDA, and we have not been successful in accomplishing that either. :(
____________
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
My integration of version 0.83 into Ken's llrcudaboinc sources is ready for use.
All compiler warnings are also resolved...
llrcudaboinc.083
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
If anybody needs some computation times of llrCUDA64, here they are: WU 237449934 and WU 237450218
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
|
Wow, nice to see it's moving forward. |
|
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1931 ID: 352 Credit: 5,702,802,045 RAC: 1,040,148
                                   
|
If anybody needs some computation times of llrCUDA64, here they are: WU 237449934
Good progress, validated... still a long way to go.
GTS 450 ~450 GFLOPS (1730 sec GPU + 820 sec CPU), that's ~50 Wh + 7.5 Wh or ~55 Wh total per WU.
vs. my i5-2500 host, ~95 GFLOPS at stock speed (1020 sec CPU), that's 95 W / 4 × 1020 s / 3600 or ~7 Wh per WU.
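For anyone who wants to redo the arithmetic, the conversion is just power × time / 3600. A quick sketch (the 95 W / 4 per-core split and the ~105 W GTS 450 board power are assumptions on my part, not measurements):

```python
def energy_wh(power_watts, seconds):
    """Energy in watt-hours: power (W) times time (s), divided by 3600."""
    return power_watts * seconds / 3600.0

# i5-2500 running the whole test on one of its four cores (95 W TDP / 4):
cpu_wh = energy_wh(95.0 / 4, 1020)
print(round(cpu_wh, 1))  # 6.7 Wh, i.e. the "~7 Wh per WU" above

# GPU part of the GTS 450 run, assuming a ~105 W board draw for 1730 s:
gpu_wh = energy_wh(105.0, 1730)
print(round(gpu_wh, 1))  # 50.5 Wh for the GPU alone
```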
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186 |
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
If anybody needs some computation times of llrCUDA64, here they are: WU 237449934
Good progress, validated... still a long way to go.
GTS 450 ~450 GFLOPS (1730 sec GPU + 820 sec CPU), that's ~50 Wh + 7.5 Wh or ~55 Wh total per WU.
vs. my i5-2500 host, ~95 GFLOPS at stock speed (1020 sec CPU), that's 95 W / 4 × 1020 s / 3600 or ~7 Wh per WU.
Take the other workunit, 237450218; compared to the older Q6600 it looks a little bit better.
The biggest problem is the penalized DP capability of all nVidia consumer cards.
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1931 ID: 352 Credit: 5,702,802,045 RAC: 1,040,148
                                   
|
Take the other workunit, 237450218; compared to the older Q6600 it looks a little bit better.
The biggest problem is the penalized DP capability of all nVidia consumer cards.
Yeah, a bit :-)
(~11 Wh)
I wonder how ATI would perform in LLR...
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186 |
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
Take the other workunit, 237450218; compared to the older Q6600 it looks a little bit better.
The biggest problem is the penalized DP capability of all nVidia consumer cards.
Yeah, a bit :-)
(~11 Wh)
I wonder how ATI would perform in LLR...
...simply better. But first we need someone who is able to write a native app...
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
|
I wonder how SoB could benefit from llrCUDA.
They have really big n's now...
____________
wbr, Me. Dead J. Dona
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
I wonder how SoB could benefit from llrCUDA.
They have really big n's now...
llrCUDA is still a work in progress.
If you test this on a Tesla or Quadro card, you can see the GPU flying.
With nVidia's consumer cards and their crippled DP capabilities (DP is only 1/4 to 1/12 of SP), you need at least a GTX460 to see a speed improvement.
The GTX550 should produce the same error as GeneferCUDA on the same card.
This problem seems to affect all cards with a 192-bit memory interface and 1 GB VRAM, like the GTX460_V2...
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
|
The GTX550 should produce the same error as GeneferCUDA on the same card.
This problem seems to affect all cards with a 192-bit memory interface and 1 GB VRAM, like the GTX460_V2...
I tested it some months ago on PRPNet with my GTX550Ti, before I tested GeneferCUDA. Curiously, it ran without problems. I was running PPSE (before the port retired), 121, 27, and also the Riesel Prime and Twin Prime searches on ports from NPLB.
Only the LLR search of Operation Megabit Twin failed, but I did not analyse this further.
But the runtimes, and especially the CPU utilization, did not convince me; LLR actually runs much more efficiently on the CPU.
Regards, Odi
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 4,797
                              
|
The GTX550 should produce the same error as GeneferCUDA on the same card.
This problem seems to affect all cards with a 192-bit memory interface and 1 GB VRAM, like the GTX460_V2...
I tested it some months ago on PRPNet with my GTX550Ti, before I tested GeneferCUDA. Curiously, it ran without problems. I was running PPSE (before the port retired), 121, 27, and also the Riesel Prime and Twin Prime searches on ports from NPLB.
Only the LLR search of Operation Megabit Twin failed, but I did not analyse this further.
But the runtimes, and especially the CPU utilization, did not convince me; LLR actually runs much more efficiently on the CPU.
Regards, Odi
Your observations make sense. As far as I can determine, the problem with the 550 Ti isn't actually a malfunction. The GPU is just a little bit different, and in this particular instance (for reasons not fully understood) the code executes differently enough that slightly more rounding errors occur.
Rounding errors are a correct and normal consequence of how these programs work. As long as you can ensure that the rounding errors don't actually change the numbers you're working with, that's fine. It's only when the rounding errors accumulate and get large enough to actually affect the significant portion of the numbers that there's a problem.
With small numbers (e.g., PPSELow), this doesn't make a difference, but with larger numbers (GFN or Megabit) the rounding errors are getting close to affecting the actual data. For some reason, the rounding errors seem to be slightly larger on the 550 TI. That "slightly" is just enough to make a difference when the numbers you're working with are pushing the precision limitations of the software.
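For the curious, the check these programs perform can be sketched in a few lines: an FFT-based multiplier computes a convolution in floating point, and every element of the result should land very close to an integer; the largest distance from the nearest integer is the "round-off error" that must stay below a safety threshold. A toy NumPy version (the base-256 digits and the threshold idea are illustrative, not llrCUDA's actual parameters):

```python
import numpy as np

def fft_mul(a_digits, b_digits, base=256):
    """Multiply two numbers given as little-endian digit lists via a
    floating-point FFT convolution; return (product_digits, max_roundoff)."""
    n = 1
    while n < len(a_digits) + len(b_digits):
        n *= 2
    fa = np.fft.rfft(np.array(a_digits, dtype=np.float64), n)
    fb = np.fft.rfft(np.array(b_digits, dtype=np.float64), n)
    raw = np.fft.irfft(fa * fb, n)       # linear convolution of the digits
    rounded = np.round(raw)
    # The round-off check: how far did floating point drift from integers?
    max_roundoff = float(np.max(np.abs(raw - rounded)))
    digits, carry = [], 0                # carry propagation back to base-256
    for x in rounded:
        v = int(x) + carry
        digits.append(v % base)
        carry = v // base
    while carry:
        digits.append(carry % base)
        carry //= base
    return digits, max_roundoff
```

If max_roundoff creeps toward 0.5, a convolution element can round to the wrong integer and the whole test silently produces a wrong result, which is exactly the failure mode being described for the 550 Ti.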
____________
My lucky number is 75898524288+1 |
|
|
|
The GTX550 should produce the same error as GeneferCUDA on the same card.
This problem seems to affect all cards with a 192-bit memory interface and 1 GB VRAM, like the GTX460_V2...
I tested it some months ago on PRPNet with my GTX550Ti, before I tested GeneferCUDA. Curiously, it ran without problems. I was running PPSE (before the port retired), 121, 27, and also the Riesel Prime and Twin Prime searches on ports from NPLB.
Only the LLR search of Operation Megabit Twin failed, but I did not analyse this further.
But the runtimes, and especially the CPU utilization, did not convince me; LLR actually runs much more efficiently on the CPU.
Regards, Odi
Your observations make sense. As far as I can determine, the problem with the 550 Ti isn't actually a malfunction. The GPU is just a little bit different, and in this particular instance (for reasons not fully understood) the code executes differently enough that slightly more rounding errors occur.
Rounding errors are a correct and normal consequence of how these programs work. As long as you can ensure that the rounding errors don't actually change the numbers you're working with, that's fine. It's only when the rounding errors accumulate and get large enough to actually affect the significant portion of the numbers that there's a problem.
With small numbers (e.g., PPSELow), this doesn't make a difference, but with larger numbers (GFN or Megabit) the rounding errors are getting close to affecting the actual data. For some reason, the rounding errors seem to be slightly larger on the 550 TI. That "slightly" is just enough to make a difference when the numbers you're working with are pushing the precision limitations of the software.
As a side note, I think the root of the problem is actually a bit different in the case of Operation Megabit Twin candidates; I've tested llrCUDA myself before on numbers of all sorts of sizes (including others at n=1000000, which is where OMT searches), and they did fine. The tricky part with OMT is that it's a fixed-n search, which means k gets increasingly larger as the search progresses--much like PrimeGrid's Sophie Germain Search project. llrCUDA has a much lower maximum-k limit than CPU LLR, which is why it fails on present-level OMT candidates. (That is, it's an algorithmic limit of llrCUDA as presently written, rather than simply a point at which rounding errors become critically large as in GeneferCUDA.)
Regarding llrCUDA's speed, one thing to keep in mind is that GPU primality-testing programs tend to do best on large candidates. For GeneferCUDA, rounding errors keep it from working on small n's to begin with, so it's only used on large n's where the GPU boost is significant (typically, where the tests would take multiple days on a CPU and minutes or hours on a GPU depending on how fast it is). llrCUDA, however, *can* be used on large and small candidates alike, in that it will complete the test successfully and produce an accurate result; but the smaller the candidate, the less advantage it has over a CPU, to the point where sufficiently small candidates are actually slower on a GPU than a CPU. Basically, llrCUDA will see the biggest speed advantage on long tests like those done in the PSP, SoB, Woodall and Cullen subprojects (and the mega-prime search on PRPnet).
To put this into perspective with an example, in my earlier testing (on a slightly-factory-overclocked GTX 460 with 768 MB RAM), I found that the GPU was about as fast, throughput-wise, as two cores of a Q6600 combined in the vicinity of n=1M (1e6). By contrast, the same version (this was a few versions back) could complete SoB tests (n=25M or so) in less than two days! |
|
|
|
Thanks a lot for your info, Max. I suspected something along these lines with OMT, but never checked it.
Because of the low DP capabilities of most of my cards, I never tried the "big ones", but I have already read about the improvements there.
Regards, Odi
____________
|
|
|
|
By contrast, the same version (this was a few versions back) could complete SoB tests (n=25M or so) in less than two days!
ZOMG.
Let's convince John to run some test PRPNet server for SoB CUDA ))))
____________
wbr, Me. Dead J. Dona
|
|
|
|
Why is the minimum driver 267.24? I had run it with 260.99; has something changed? |
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
It could have something to do with the CUDA version that was used to compile the app.
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 4,797
                              
|
It could have something to do with the CUDA version that was used to compile the app.
Exactly. Each version of the CUDA Toolkit (2.3, 3.2, 4.0, 4.1, etc.) has a minimum driver that's required for code compiled with that toolkit. Anyone who wants to run a CUDA program must have the appropriate (or later) driver installed for the specific toolkit that was used to build the program.
There is no way around that restriction; the driver plays a critical role in CUDA programs. The driver is actually part of the CUDA compiler and is what creates the machine code that runs on the GPU when the program is run.
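As an aside, you can query which CUDA level the installed driver supports (and hence whether a given toolkit's binaries will run) through the driver API's cuDriverGetVersion. A sketch, assuming the usual library names on Windows/Linux; it simply reports None when no NVIDIA driver is present:

```python
import ctypes

def driver_cuda_version():
    """Return the (major, minor) CUDA version supported by the installed
    NVIDIA driver, or None if no driver library can be found or queried."""
    for name in ("nvcuda.dll", "libcuda.so.1", "libcuda.so"):
        try:
            lib = ctypes.CDLL(name)
            break
        except OSError:
            continue
    else:
        return None
    version = ctypes.c_int(0)
    # CUresult cuDriverGetVersion(int *driverVersion); 0 means CUDA_SUCCESS.
    if lib.cuDriverGetVersion(ctypes.byref(version)) != 0:
        return None
    # The value is encoded as 1000 * major + 10 * minor.
    return version.value // 1000, (version.value % 1000) // 10

print(driver_cuda_version())  # e.g. (4, 0) on a 270.x driver, None without one
```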
____________
My lucky number is 75898524288+1 |
|
|
|
Is it possible to check the primes from here http://www.mersenneforum.org/showpost.php?p=145066&postcount=1 with llrCUDA?
____________
wbr, Me. Dead J. Dona
|
|
|
|
Is it possible to check the primes from here http://www.mersenneforum.org/showpost.php?p=145066&postcount=1 with llrCUDA?
Yes, but llrCUDA will only be able to say that they're PRP (probably prime), which has already been verified in the case of these numbers. What's tricky about them is that they are not of the standard k*b^n+-1 form (the 1 at the end is something else), and thus the LLR test doesn't work on them; LLR can only perform a PRP test. As it turns out, the only way to prove full primality for these numbers is the ECPP method, which is much slower--to the point that a prime LLR could handle in 30 seconds would take millions of years to prove with ECPP on a modern computer. Hence the rather gradual progress in proving the remainder of the numbers listed there. |
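For what it's worth, the PRP test referred to boils down to a Fermat check: if N is prime, then a^(N-1) ≡ 1 (mod N) for any base a not divisible by N. A bare-bones sketch (base 3 is a common choice; real LLR uses fast FFT arithmetic rather than Python's pow, and a composite can occasionally slip through as a pseudoprime):

```python
def is_prp(n, base=3):
    """Fermat probable-prime test: passing means 'probably prime', not proven."""
    return n > 1 and pow(base, n - 1, n) == 1

print(is_prp(2**127 - 1))  # True: 2^127-1 is a known Mersenne prime
print(is_prp(2**127 + 1))  # False: it is divisible by 3
```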
|
|