Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
You read that right! I've managed to port PPSieve-CUDA to OpenCL and compile it with the ATI compiler!
The code base is very similar to PPSieve-CUDA, and will eventually be merged with it. But for now, if you have Linux and an ATI GPU, please download:
PPSieve-OpenCL (source, on the redcl branch)
And give it a shot with the usual test procedure:
32 bit: ./ppsieve-cl-boinc-x86-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
64 bit: ./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
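The -p and -P bounds use a compact e-notation: 42070e9 is 42070 × 10^9 = 42,070,000,000,000, which matches the range the app later echoes as "Sieve started: 42070000000000 <= p < 42070010000000". Here is a minimal Python sketch of that conversion. It assumes integer mantissas only, so it is an illustration, not ppsieve's actual parser:

```python
def parse_bound(text: str) -> int:
    """Convert a bound such as '42070e9' into an exact integer.

    Sketch only: assumes an integer mantissa and an optional
    non-negative integer exponent, so no floating point is involved.
    """
    mantissa, _, exponent = text.lower().partition("e")
    return int(mantissa) * 10 ** int(exponent or "0")


p_min = parse_bound("42070e9")
p_max = parse_bound("42070010e6")
print(p_min, p_max, p_max - p_min)  # prints: 42070000000000 42070010000000 10000000
```

So the test covers a range of 10 million candidate p values.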
It should output the following factors:
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
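Each line above reads "p | k*2^n+1", meaning the prime p divides k·2^n+1. You can check that claim independently and cheaply with modular exponentiation, which helps confirm that a new GPU build is producing real factors. A minimal Python sketch follows. The helper names are illustrative and not part of ppsieve:

```python
def divides_proth(p: int, k: int, n: int) -> bool:
    """Return True if p divides k*2^n + 1."""
    # pow(2, n, p) is modular exponentiation, so the million-bit
    # number k*2^n + 1 is never built in full.
    return (k * pow(2, n, p) + 1) % p == 0


def parse_factor_line(line: str) -> tuple[int, int, int]:
    """Split a line like '42070000070587 | 9475*2^197534+1' into (p, k, n)."""
    p_text, form = (part.strip() for part in line.split("|"))
    k_text, n_text = form.removesuffix("+1").split("*2^")
    return int(p_text), int(k_text), int(n_text)


# The ten factors the test run is expected to print.
EXPECTED = """\
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
"""

for line in EXPECTED.splitlines():
    p, k, n = parse_factor_line(line)
    status = "ok" if divides_proth(p, k, n) else "NOT A FACTOR"
    print(f"{p} | {k}*2^{n}+1  {status}")
```

You can feed the factor lines from your own run through the same loop.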
Please provide the output from stderr.txt, especially in case of error. Please also provide as much detail about your system as possible, including your ATI GPU model number, driver version, and Stream Processing Unit clock speed if possible.
Also note that this first version doesn't use vectorized arithmetic, so it may be possible to make it ~1.5-2x faster. The first goal is to prove that this can run on an ATI GPU. :)
____________
HAmsty Volunteer tester
Joined: 26 Dec 08 Posts: 132 ID: 33421 Credit: 12,510,712 RAC: 0
Sorry, my HD3200 isn't able to run an OpenCL kernel. :(
____________
I entered this into the command line:
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
And got this:
./ppsieve-cl-boinc-x86_64-linux: error while loading shared libraries: libOpenCL.so: cannot open shared object file: No such file or directory
No stderr.txt file was created.
I was using my ATI Mobility Radeon 4650. I think the driver version was 10.8.
This was running off my jump drive, using Ubuntu 10.04.
I must say that being unfamiliar with Linux and running from an 8GB jump drive is making this hard to test.
____________
May the Force be with you always.
Ken_g6 Volunteer developer
I'm not sure how much work is required to set up a machine to run an OpenCL app. That's part of what I'm testing here.
I downloaded the ATI Stream SDK from here and went through this installation procedure (PDF). I'm hoping the full installation isn't necessary, and that you only need to grab the lib/x86/libOpenCL.so or lib/x86_64/libOpenCL.so file and put it in the same directory with my app. But I really don't know.
Edit: Also, the Stream SDK version was 2.1 when I downloaded it, and that's what I used to compile this, but they're serving 2.2 now. Hopefully, that won't make a difference.
____________
tocx Volunteer tester
Joined: 23 Nov 09 Posts: 15 ID: 50535 Credit: 203,523,000 RAC: 0
OS: Debian GNU Linux Squeeze
GPU: ATI HD 5750 1024MB
Driver: 10.7
SDK: ati-stream-sdk-v2.2-lnx64
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.0.1-alpha (testing)
Compiled Aug 28 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
# more stderr.txt
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Error: Getting Platforms. (clGetPlatformsIDs)
called boinc_finish
EDIT: I don't know if this is important: at the beginning of the test I used the wrong terminal and tested the app on a host with an nVidia GPU (GTX 260). The app ran without problems there.
____________
O.K., I tested it again with the libOpenCL.so from SDK v2.2. I get the same message:
./ppsieve-cl-boinc-x86_64-linux: error while loading shared libraries: libOpenCL.so: cannot open shared object file: No such file or directory
Is it expecting the file to be in a location other than where it launches from?
____________
May the Force be with you always.
tocx Volunteer tester
O.K., I tested it again, with the libOpenCl.so from SDK V2.2. I get the same message:
./ppsieve-cl-boinc-x86_64-linux: error while loading shared libraries: libOpenCL.so: cannot open shared object file: No such file or directory
Is it expecting the file to be in a location other than where it launches from?
Linux searches the library path, not the directory the app is launched from.
Have you set the two variables from the installation instructions?
If not, copy or symlink the library to /usr/lib/, or set the two variables.
After that you can test with ldd:
ldd ppsieve-cl-boinc-x86_64-linux
linux-vdso.so.1 => (0x00007fff4e9ff000)
libOpenCL.so => /home/tocx/download/ati-stream-sdk-v2.2-lnx64/lib/x86_64/libOpenCL.so (0x00007f32533f9000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007f32531d1000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f3252ebc000)
libm.so.6 => /lib/libm.so.6 (0x00007f3252c3a000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f3252a24000)
libc.so.6 => /lib/libc.so.6 (0x00007f32526c2000)
/lib64/ld-linux-x86-64.so.2 (0x00007f3253600000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f32524be000)
librt.so.1 => /lib/librt.so.1 (0x00007f32522b6000)
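You can script the same check. Below is a rough Python sketch using ctypes. ctypes.CDLL goes through the system's dynamic loader, so it honours LD_LIBRARY_PATH (as set before Python starts) and /usr/lib the same way the app does. A failure here corresponds to the "cannot open shared object file" error.

If the library does load, the sketch also calls clGetPlatformIDs to count the registered platforms. That is the call behind the "Getting Platforms" error earlier in the thread. The library name comes from the error messages above, and the function names are illustrative:

```python
import ctypes


def load_opencl(libname: str = "libOpenCL.so"):
    """Load the OpenCL library via the dynamic loader, or return None."""
    try:
        return ctypes.CDLL(libname)
    except OSError:
        # Same situation as "cannot open shared object file".
        return None


def count_opencl_platforms(libname: str = "libOpenCL.so"):
    """Return the number of OpenCL platforms, or None if the query fails."""
    lib = load_opencl(libname)
    if lib is None:
        return None
    # cl_int clGetPlatformIDs(cl_uint, cl_platform_id *, cl_uint *)
    lib.clGetPlatformIDs.restype = ctypes.c_int32
    lib.clGetPlatformIDs.argtypes = [
        ctypes.c_uint32,
        ctypes.c_void_p,
        ctypes.POINTER(ctypes.c_uint32),
    ]
    num_platforms = ctypes.c_uint32(0)
    status = lib.clGetPlatformIDs(0, None, ctypes.byref(num_platforms))
    if status != 0:  # 0 is CL_SUCCESS
        return None
    return num_platforms.value
```

Calling print(count_opencl_platforms()) prints None if the library can't be found or the platform query fails. Otherwise it prints the number of platforms the loader sees.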
____________
Ken_g6 Volunteer developer
Looks like you'll (both/all) have to follow this entire installation procedure.
Don't forget to get this ICD registration file and extract it in the root directory, as root.
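On Linux, the OpenCL ICD loader (the libOpenCL.so the app links against) finds vendor implementations by reading *.icd files in /etc/OpenCL/vendors. Each file names a vendor library. Extracting the registration archive from / is presumably what puts those files in place. A quick Python sketch to list what is registered, assuming that standard location:

```python
from pathlib import Path

# Standard location scanned by the OpenCL ICD loader on Linux.
ICD_DIR = Path("/etc/OpenCL/vendors")


def list_icd_entries(icd_dir: Path = ICD_DIR) -> dict[str, str]:
    """Map each *.icd file name to the vendor library named inside it."""
    if not icd_dir.is_dir():
        return {}  # nothing registered, so platform lookup finds nothing
    return {
        icd.name: icd.read_text().strip()
        for icd in sorted(icd_dir.glob("*.icd"))
    }
```

An empty result is consistent with an "Error: Getting Platforms" message from the app.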
____________
tocx Volunteer tester
Don't forget to get this ICD registration file and extract it in the root directory, as root.
Thanks for the hint. I had extracted it, but not to /.
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.0.1-alpha (testing)
Compiled Aug 28 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
# more stderr.txt
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Resuming from checkpoint p=42070003407873 in ppcheck42070e9.txt
Thread 0 starting
Invalid MIT-MAGIC-COOKIE-1 keyDetected 4 multiprocessors (20 SPUs) on device 0.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 46.20 sec. (0.01 init + 46.18 sieve) at 147577 p/sec.
Processor time: 158.15 sec. (0.02 init + 158.13 sieve) at 43102 p/sec.
Average processor utilization: 1.10 (init), 3.42 (sieve)
called boinc_finish
____________
Ken_g6 Volunteer developer
Looks like it was using your CPU, not your GPU, probably because you're using an AMD processor. Try adding "--device 1" to the end of the command line.
____________
So, out of curiosity, why is Linux the first platform to test an application on?
I would like to help test this ATI app, but right now I simply do not know enough Linux to do so. I'm afraid I must wait for a Windows test app, or learn a lot about Linux.
By the way, thanks Ken (and anyone else involved) for starting ATI development!
____________
May the Force be with you always.
Ken_g6 Volunteer developer
So, out of curiosity, why is Linux the first platform to test an application on?
Because this is my main computer. No other reason.
____________
tocx Volunteer tester
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 1
ppsieve version cl-0.0.1-alpha (testing)
Compiled Aug 28 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
# more stderr.txt
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Invalid MIT-MAGIC-COOKIE-1 keySIGSEGV: segmentation violation
Stack trace (10 frames):
./ppsieve-cl-boinc-x86_64-linux[0x414d8d]
/lib/libpthread.so.0(+0xef60)[0x7f0558473f60]
/home/tocx/download/ati-stream-sdk-v2.2-lnx64/lib/x86_64/libatiocl64.so(+0xca3b0)[0x7f055654f3b0]
/home/tocx/download/ati-stream-sdk-v2.2-lnx64/lib/x86_64/libatiocl64.so(+0x10b640)[0x7f0556590640]
/home/tocx/download/ati-stream-sdk-v2.2-lnx64/lib/x86_64/libatiocl64.so(clCreateCommandQueue+0x69)[0x7f0556533879]
./ppsieve-cl-boinc-x86_64-linux[0x410e15]
./ppsieve-cl-boinc-x86_64-linux[0x40e83e]
./ppsieve-cl-boinc-x86_64-linux[0x40ad5e]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7f0557976c4d]
./ppsieve-cl-boinc-x86_64-linux[0x409fd9]
Exiting...
Suspending the WUs running on the GPU gives the same result.
---
Trying with other devices:
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Invalid MIT-MAGIC-COOKIE-1 keyCreating Command Queue. (clCreateCommandQueue)
called boinc_finish
____________
Ken_g6 Volunteer developer
Well, I'm stumped. A segfault is hard enough to debug when you do have access to the hardware it happens on. Since it doesn't happen on a CPU, I don't.
I was hoping that someone would come along with an Intel processor and AMD graphics card, run the test, and find that it worked fine there. Since that hasn't happened, I've uploaded more-or-less the same code compiled with debugging on. If that doesn't produce a useful error message, I'm afraid ATI/OpenCL is at a standstill for now.
____________
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
Would love to help out, but my ATI cards are all in WIN boxes...
____________
141941*2^4299438-1 is prime!
pschoefer Volunteer developer Volunteer tester
Joined: 20 Sep 05 Posts: 667 ID: 845 Credit: 2,374,701,989 RAC: 15,281
Would love to help out, but my ATI cards are all in WIN boxes...
Same here... i7 + HD5850, but Win7. :)
Is there a live CD that includes the required drivers, or an easy guide to installing them in a live environment?
____________
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1905 ID: 352 Credit: 4,056,750,098 RAC: 4,353,900
Would love to help out, but my ATI cards are all in WIN boxes...
Same here as I wrote via PM.
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186
Ken_g6 Volunteer developer
Yeah, OK, it'll probably be easier to build a Windows version than to teach all of you how to use Linux. I'm working on it.
____________
Ken_g6 Volunteer developer
Alright, folks, I stuck a Windows executable in that zipfile. It works on my CPU, but I kind of expect it will also segfault on a real GPU.
But test away; maybe we'll get lucky somewhere.
____________
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13633 ID: 53948 Credit: 280,904,358 RAC: 40,710
Alright, folks, I stuck a Windows executable in that zipfile. It works on my CPU, but I kind of expect it will also segfault on a real GPU.
But test away; maybe we'll get lucky somewhere.
Since it's OpenCL, is this supposed to work on NVIDIA also?
____________
My lucky number is 75898^524288+1
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
I get a "Null Platform found Exiting Application" message.
AMD Athlon x2 5600+
Win7 Enterprise 64-bit
HD 4650
Driver 10.7 (also tested with 10.5 with same error).
____________
141941*2^4299438-1 is prime!
Ken_g6 Volunteer developer
No; nVIDIA already has a version specifically made for it. You can try compiling it for nVIDIA if you want, but it's not likely to be faster than the CUDA version.
____________
Ken_g6 Volunteer developer
Scott, did you install the ATI Stream SDK? I think you have to do that. I haven't yet seen a way to run this app without the full installation.
____________
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Scott, did you install the ATI Stream SDK? I think you have to do that. I haven't yet seen a way to run this app without the full installation.
Figured that out after I posted :)
After the install, this is what I get:
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 2
ppsieve version cuda-0.1.5a (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Then the app crashes with no log file and no factors found. (Note: device 0 = AMD CPU; device 1 = onboard AMD GPU, not OpenCL compatible.) The application does run fine on the CPU.
____________
141941*2^4299438-1 is prime!
This is my experience:
Downloaded the zip file, extracted it, and ran ppsieve-cl-x86-windows.exe.
Problem 1: got a pop-up, "Unable to locate OpenCL.dll".
Answer: download and install the ATI Stream SDK.
Problem 2: running it gives "unable to determine platforms" or a similar message on the console.
Answer: reboot the machine (oops).
It then ran successfully (however, I don't have a GPU; it's an Intel GMA).
So here are my CPU OpenCL results:
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cuda-0.1.5a (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Device 0 looks like an 4-core CPU, not a GPU. Adjusting.
Detected 4 multiprocessors (20 SPUs) on device 0.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
p=42070001310721, 21.84K p/sec, 3.85 CPU cores, 13.1% done. ETA 08 Sep 10:30
p=42070002621441, 19.69K p/sec, 4.08 CPU cores, 26.2% done. ETA 08 Sep 10:30
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
p=42070003932161, 18.62K p/sec, 3.90 CPU cores, 39.3% done. ETA 08 Sep 10:30
p=42070005242881, 18.67K p/sec, 3.95 CPU cores, 52.4% done. ETA 08 Sep 10:31
42070006307657 | 1513*2^1771812+1
p=42070006553601, 18.76K p/sec, 3.97 CPU cores, 65.5% done. ETA 08 Sep 10:31
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
p=42070007864321, 18.82K p/sec, 3.95 CPU cores, 78.6% done. ETA 08 Sep 10:31
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
p=42070009175041, 18.54K p/sec, 3.91 CPU cores, 91.8% done. ETA 08 Sep 10:31
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 532.38 sec. (0.06 init + 532.33 sieve) at 19206 p/sec.
Processor time: 2097.70 sec. (0.06 init + 2097.64 sieve) at 4874 p/sec.
Average processor utilization: 1.07 (init), 3.94 (sieve)
Machine specs:
Windows XP SP3
Intel i5 @ 3.2 GHz
Intel GMA HD graphics (integrated)
So, not fast, but it works.
Oh, and if I add --d 2,
it crashes with a pop-up:
error code 0xc0000005
Ken_g6 Volunteer developer
HD 4650
FYI, I've been looking at the kernel with the Stream Kernel Analyzer, and it looks like the current configuration (which writes bytes as output) won't compile on 4000-series GPUs. I'll look at changing this soon, but it would be nice if someone with a 5000-series card produced valid results with this.
Edit: And I see tocx has a 5000 series, so that's not likely.
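Ken doesn't say which feature the byte-writing kernel needs. One plausible candidate, stated here as an assumption, is the cl_khr_byte_addressable_store extension. In OpenCL 1.0, writes to char-sized memory require it, and a device that doesn't list it in CL_DEVICE_EXTENSIONS can fail to build such a kernel.

Below is a Python sketch of a host-side check that falls back to a wider output type. The OUT_T macro name is hypothetical:

```python
BYTE_STORE_EXT = "cl_khr_byte_addressable_store"


def has_extension(extensions: str, name: str) -> bool:
    """True if `name` is a whole entry in a space-separated extension list."""
    # Split instead of using `in` on the raw string, so that a longer
    # extension name that merely starts with `name` does not match.
    return name in extensions.split()


def output_type_option(extensions: str) -> str:
    """Choose a -D build option for the kernel's output element type.

    OUT_T is a hypothetical macro name: bytes if the device can store
    them directly, otherwise 32-bit words.
    """
    if has_extension(extensions, BYTE_STORE_EXT):
        return "-D OUT_T=uchar"
    return "-D OUT_T=uint"
```

Writing 32-bit words costs four times the output memory. The trade-off is that the kernel builds on any device.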
____________
pschoefer Volunteer developer Volunteer tester
It's working for me... :)
i7 980X + HD5850 (GPU clock: 725 MHz, Memory clock: 1000 MHz) + GTX460
Win7 Prof. x64
Catalyst 10.8, SDK 2.2
First try:
>ppsieve-cl-x86-windows.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cuda-0.1.5a (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Device 0 looks like an 12-core CPU, not a GPU. Adjusting.
Detected 12 multiprocessors (60 SPUs) on device 0.
[...]
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 27 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 535.30 sec. (0.02 init + 535.27 sieve) at 56320 p/sec.
Processor time: 6247.65 sec. (0.03 init + 6247.62 sieve) at 4825 p/sec.
Average processor utilization: 1.30 (init), 11.67 (sieve)
OK, that's the CPU. However, results are correct.
Second try:
>ppsieve-cl-x86-windows.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 1
ppsieve version cuda-0.1.5a (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected 18 multiprocessors (90 SPUs) on device 1.
[...]
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 27 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 25.47 sec. (0.03 init + 25.44 sieve) at 1184985 p/sec.
Processor time: 1.62 sec. (0.03 init + 1.59 sieve) at 18945683 p/sec.
Average processor utilization: 1.16 (init), 0.06 (sieve)
About 96% load on the ATI GPU, correct results. :)
Can't get it to run on the GTX460, but we have the CUDA app for that one, so it's no problem.
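For scale, both runs above cover the same range (42070e9 to 42070030e6), so their ratios give the HD5850's speed-up over the CPU device on this machine directly:

$$
\frac{535.30\ \text{s (CPU, device 0)}}{25.47\ \text{s (HD5850, device 1)}} \approx 21.0,
\qquad
\frac{1184985\ p/\text{s}}{56320\ p/\text{s}} \approx 21.0
$$

Meanwhile, average processor utilization during the sieve drops from 11.67 to 0.06.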
____________
HD 4850, Catalyst 10.8, SDK 2.2
Windows 7 x64
Thread 0 starting
Device 1 looks like an 10-core CPU, not a GPU. Adjusting.
Detected 10 multiprocessors (50 SPUs) on device 1.
Error: Building Program (clBuildProgram)
Ken_g6 Volunteer developer
It's working for me... :)
...
Second try:
>ppsieve-cl-x86-windows.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 1
ppsieve version cuda-0.1.5a (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected 18 multiprocessors (90 SPUs) on device 1.
[...]
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 27 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 25.47 sec. (0.03 init + 25.44 sieve) at 1184985 p/sec.
Processor time: 1.62 sec. (0.03 init + 1.59 sieve) at 18945683 p/sec.
Average processor utilization: 1.16 (init), 0.06 (sieve)
About 96% load on the ATI GPU, correct results. :)
YES!!!!!!!!!! :D
I'm guessing tocx just needed me to compile with SDK 2.2 instead of 2.1. :)
HD 4850, Catalyst 10.8, SDK 2.2
Windows 7 x64
Thread 0 starting
Device 1 looks like an 10-core CPU, not a GPU. Adjusting.
Detected 10 multiprocessors (50 SPUs) on device 1.
Error: Building Program (clBuildProgram)
The build error on a 4000-series GPU is expected for now; I'll fix it soon. But this also points out another bug: the real multiprocessor counts are 16 times what's being detected. I'll have to overhaul the device detection code.
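The fix itself isn't shown here, but the v0.0.2 logs later in the thread fit a simple rule. Take the count the old code printed (10 for the HD 4850 above, 18 for the HD5850, presumably CL_DEVICE_MAX_COMPUTE_UNITS). Multiply by 16 to get "multiprocessors", then by 5 to get "SPUs". With that rule, the HD 4850 later reports 160 multiprocessors and 800 SPUs.

Below is a Python sketch of the rule. The factors 16 and 5 are inferred from the logs, not taken from ppsieve's source, and may not hold for every card:

```python
def estimate_shader_counts(compute_units: int,
                           lanes_per_unit: int = 16,
                           alus_per_lane: int = 5) -> tuple[int, int]:
    """Return (multiprocessors, SPUs) as labelled in ppsieve's log.

    compute_units is the device's reported compute-unit count. The
    default multipliers are assumptions inferred from this thread's logs.
    """
    multiprocessors = compute_units * lanes_per_unit
    return multiprocessors, multiprocessors * alus_per_lane


print(estimate_shader_counts(10))  # HD 4850: (160, 800)
print(estimate_shader_counts(18))  # HD5850: (288, 1440)
```

With lanes_per_unit=1, the same function reproduces the CPU lines, e.g. 4 multiprocessors (20 SPUs) for a 4-core CPU.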
____________
Honza Volunteer moderator Volunteer tester Project scientist
Catalyst 10.8, SDK 2.2, Windows 2008 R2 x64, HD5850.
>ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 2
The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail.
(I get the same error when running ppsieve-cl-x86-windows.exe without any parameters.)
Using sxstrace, I extracted the following:
=================
Begin Activation Context Generation.
Input Parameter:
Flags = 0
ProcessorArchitecture = Wow32
CultureFallBacks = en-US;en
ManifestPath = E:\ppsieve-cl-x86-windows.exe
AssemblyDirectory = E:\
Application Config File =
-----------------
INFO: Parsing Manifest File E:\ppsieve-cl-x86-windows.exe.
INFO: Manifest Definition Identity is (null).
INFO: Reference: Microsoft.VC90.CRT,processorArchitecture="x86",publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="9.0.21022.8"
INFO: Resolving reference Microsoft.VC90.CRT,processorArchitecture="x86",publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="9.0.21022.8".
INFO: Resolving reference for ProcessorArchitecture WOW64.
INFO: Resolving reference for culture Neutral.
INFO: Applying Binding Policy.
INFO: No publisher policy found.
INFO: No binding policy redirect found.
INFO: Begin assembly probing.
INFO: Did not find the assembly in WinSxS.
INFO: Attempt to probe manifest at C:\Windows\assembly\GAC_32\Microsoft.VC90.CRT\9.0.21022.8__1fc8b3b9a1e18e3b\Microsoft.VC90.CRT.DLL.
INFO: Did not find manifest for culture Neutral.
INFO: End assembly probing.
INFO: Resolving reference for ProcessorArchitecture x86.
INFO: Resolving reference for culture Neutral.
INFO: Applying Binding Policy.
INFO: No publisher policy found.
INFO: No binding policy redirect found.
INFO: Begin assembly probing.
INFO: Did not find the assembly in WinSxS.
INFO: Attempt to probe manifest at C:\Windows\assembly\GAC_32\Microsoft.VC90.CRT\9.0.21022.8__1fc8b3b9a1e18e3b\Microsoft.VC90.CRT.DLL.
INFO: Attempt to probe manifest at E:\Microsoft.VC90.CRT.DLL.
INFO: Attempt to probe manifest at E:\Microsoft.VC90.CRT.MANIFEST.
INFO: Attempt to probe manifest at E:\Microsoft.VC90.CRT\Microsoft.VC90.CRT.DLL.
INFO: Attempt to probe manifest at E:\Microsoft.VC90.CRT\Microsoft.VC90.CRT.MANIFEST.
INFO: Did not find manifest for culture Neutral.
INFO: End assembly probing.
ERROR: Cannot resolve reference Microsoft.VC90.CRT,processorArchitecture="x86",publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="9.0.21022.8".
ERROR: Activation Context generation failed.
End Activation Context Generation.
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186
Ken_g6 Volunteer developer
Try installing the 32-bit version of the Microsoft Visual C++ 2008 redistributable package.
P.S. Please also use code tags judiciously. When text inside them is too long, they make the forum overflow my screen to the right.
____________
Honza Volunteer moderator Volunteer tester Project scientist
Try installing the 32-bit version of the Microsoft Visual C++ 2008 redistributable package.
Thanks, it helped.
>ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 1
ppsieve version cuda-0.1.5a (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 18 multiprocessors (90 SPUs) on device 1.
...
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 9.27 sec. (0.04 init + 9.23 sieve) at 1108234 p/sec.
Processor time: 1.22 sec. (0.05 init + 1.17 sieve) at 8738081 p/sec.
Average processor utilization: 1.10 (init), 0.13 (sieve)
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186
Alas, my ATI 5770, bought in April, is defunct. I put it in yesterday after it had sat in its package for two months. I will arrange a replacement with the vendor on Monday; then I'll be free to look into this.
Ken_g6 Volunteer developer
Alright, I have a new test version ready for everyone, v0.0.2 alpha. This version is vectorized, so it could be up to twice as fast as the last version, though I suspect it might actually be slower. Should that be the case, I have a version with no vectorization ready to go as well.
Vectorization is controlled with the -v parameter, so please try -v 2, -v 4, and maybe -v 3 (though that's likely to be really slow!). The default is 4.
Also, this code should now run on 4xxx series cards, though I will need comparisons to previous runs on 5xxx cards as well.
Good luck!
P.S. Gerrit, I hope your card gets well soon.
____________
HAmsty Volunteer tester
I have neither Linux nor a decent Windows version *g*, nor an AMD graphics card, but I've tried the OpenCL version:
Z:\ppsieve-cl>ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 0
ppsieve version cl-0.0.2-alpha (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 192 multiprocessors (960 SPUs) on device 0.
clang: Unknown command line argument '-g'. Try: 'clang --help'
GPU is an nVidia 8800 GTS with the G80 chipset (Wikipedia says this card is OpenCL capable), driver 258.96, OS Windows XP.
I've searched for this output on Google and found this: http://stackoverflow.com/questions/3060201/compile-opencl-kernels-with-debug-information
____________
Ken_g6 Volunteer developer
Hm, yeah, I probably should take the -g out at this point... OK, done!
Edit: In any case, you'd probably need to compile with nVIDIA's compiler to get this to work on nVIDIA.
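nVIDIA's compiler rejected -g outright in the report above. A platform-neutral alternative is to emit such vendor-specific flags only when a debug build is requested and the platform vendor string matches. Below is a hedged Python sketch. The vendor-prefix test and the VECSIZE define are illustrative assumptions, not ppsieve's actual code:

```python
AMD_VENDOR_PREFIX = "Advanced Micro Devices"  # assumed CL_PLATFORM_VENDOR prefix


def kernel_build_options(debug: bool, platform_vendor: str,
                         defines: dict[str, str]) -> str:
    """Assemble an options string for clBuildProgram.

    Portable -D defines are always included. "-g" is added only for
    debug builds on a platform whose vendor string starts with
    AMD_VENDOR_PREFIX, because other compilers (e.g. nVIDIA's, in this
    thread) reject it.
    """
    options = [f"-D {name}={value}" for name, value in defines.items()]
    if debug and platform_vendor.startswith(AMD_VENDOR_PREFIX):
        options.append("-g")
    return " ".join(options)
```

For example, kernel_build_options(True, "NVIDIA Corporation", {}) returns an empty string, so a debug build still compiles there.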
____________
HAmsty Volunteer tester
Okay, got another error:
Z:\ppsieve-cl>ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.0.2-alpha (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 192 multiprocessors (960 SPUs) on device 0.
Error: Building Program (clBuildProgram)
Ken, please don't tweak this app for my old card. I don't want to take up your time with this one. ;)
____________
HD 4850, Catalyst 10.8, SDK 2.2
Windows 7 x64
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 1
ppsieve version cl-0.0.2-alpha (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 1.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 17.40 sec. (0.03 init + 17.36 sieve) at 588844 p/sec.
Processor time: 1.78 sec. (0.05 init + 1.73 sieve) at 5904107 p/sec.
Average processor utilization: 1.38 (init), 0.10 (sieve)
with -v 2
Elapsed time: 15.69 sec. (0.03 init + 15.65 sieve) at 653138 p/sec.
Processor time: 1.29 sec. (0.05 init + 1.25 sieve) at 8191947 p/sec.
with -v 3
Error: Building Program (clBuildProgram)
Ken_g6 Volunteer developer
hd 4850, catalyst 10.8, sdk 2.2
windows 7 x64
Glad it's working on a 4xxx card. :)
with -v 2
Elapsed time: 15.69 sec. (0.03 init + 15.65 sieve) at 653138 p/sec.
Processor time: 1.29 sec. (0.05 init + 1.25 sieve) at 8191947 p/sec.
Huh. OK, I guess I'll go with -v 2 unless someone who ran the earlier app finds this one slower.
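For reference, here are the HD 4850 timings quoted above compared directly (default -v 4 versus -v 2, same range):

$$
\frac{17.40\ \text{s}\ (\texttt{-v 4})}{15.69\ \text{s}\ (\texttt{-v 2})} \approx 1.109,
\qquad
\frac{653138\ p/\text{s}\ (\texttt{-v 2})}{588844\ p/\text{s}\ (\texttt{-v 4})} \approx 1.109
$$

So -v 2 was roughly 11% faster on that card.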
with -v 3
Error: Building Program (clBuildProgram)
Heh. I was right, though, -v 3 was much slower. :P
____________
Hardware configuration:
Intel Q6600 processor
4 GB RAM
2 Sapphire ATI Radeon HD5870 (CrossFire)
Software configuration:
Gentoo Linux 10.0 system (64 bits)
2.6.34-gentoo-r6 kernel
ati-drivers 10.8 (catalyst)
ati-stream-sdk-bin 2.2
ppsieve version cl-0.0.2-alpha (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
stderr.txt :
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 64 multiprocessors (320 SPUs) on device 0.
Thread 0 interrupted
Sieve incomplete: 42070000000000 <= p < 42070001310721
count=41690,sum=0x1857193c797374b0
Elapsed time: 16.99 sec. (0.01 init + 16.98 sieve) at 77199 p/sec.
Processor time: 59.81 sec. (0.01 init + 59.80 sieve) at 21919 p/sec.
Average processor utilization: 0.90 (init), 3.52 (sieve)
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Resuming from checkpoint p=42070001310721 in ppcheck42070e9.txt
Thread 0 starting
Detected 64 multiprocessors (320 SPUs) on device 0.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 97.11 sec. (0.01 init + 97.10 sieve) at 91791 p/sec.
Processor time: 384.34 sec. (0.01 init + 384.33 sieve) at 23191 p/sec.
Average processor utilization: 0.99 (init), 3.96 (sieve)
called boinc_finish
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
|
|
HD 4650, Catalyst 10.7, SDK 2.2
Win7 64-bit
with -v 2
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 1 -v2
ppsieve version cl-0.0.2-alpha (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 1.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 42.49 sec. (0.05 init + 42.44 sieve) at 240871 p/sec.
Processor time: 42.03 sec. (0.05 init + 41.98 sieve) at 243536 p/sec.
Average processor utilization: 1.04 (init), 0.99 (sieve)
with -v 3
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 1 -v3
ppsieve version cl-0.0.2-alpha (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 1.
Error: Building Program (clBuildProgram)
with -v 4
I get a driver crash, followed by the application running, but it is incredibly slow...while writing this message it has gotten just over 10% done.
Also, the SPUs are incorrect. A 4650 card has 320, not 640 as the app reports.
Edit: Also, I am wondering about the full core of CPU usage... maybe because this is an AMD CPU (Athlon X2 5600+)?
____________
141941*2^4299438-1 is prime!
|
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
Average processor utilization: 0.99 (init), 3.96 (sieve)
That's using your CPU. Try --device 1. Sorry, I can't edit the first post to suggest that.
Also note that two cards in SLI (CrossFire, on ATI) are still two cards, and will probably be referenced separately, as --device 1 and --device 2.
Scott, I'm wondering about that too, but I doubt the CPU is the problem. Try --device 2 and --device 3 just to make sure, though.
____________
|
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
|
Scott, I'm wondering about that too, but I doubt the CPU is the problem. Try --device 2 and --device 3 just to make sure, though.
--device 2 is the onboard graphics being used in tandem with the 4650 via the "surround view" feature...allows it to crunch over at Collatz, but is not OpenCL capable (app crashes as expected)
--device 0 runs on the CPU... 99% in Task Manager, and much slower than the 4650. It also shows up as 160 SPUs.
Edit: CPU results
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 0 -v2
ppsieve version cl-0.0.2-alpha (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 32 multiprocessors (160 SPUs) on device 0.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
p=42070001048577, 17.48K p/sec, 1.92 CPU cores, 10.5% done. ETA 09 Sep 14:59
p=42070002097153, 13.01K p/sec, 1.94 CPU cores, 21.0% done. ETA 09 Sep 15:01
p=42070003145729, 13.15K p/sec, 1.94 CPU cores, 31.5% done. ETA 09 Sep 15:01
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
p=42070004194305, 13.09K p/sec, 1.94 CPU cores, 41.9% done. ETA 09 Sep 15:02
p=42070005242881, 13.16K p/sec, 1.94 CPU cores, 52.4% done. ETA 09 Sep 15:02
p=42070006291457, 13.15K p/sec, 1.95 CPU cores, 62.9% done. ETA 09 Sep 15:02
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
p=42070007340033, 13.15K p/sec, 1.94 CPU cores, 73.4% done. ETA 09 Sep 15:02
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
p=42070008388609, 13.15K p/sec, 1.94 CPU cores, 83.9% done. ETA 09 Sep 15:02
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
p=42070009437185, 13.15K p/sec, 1.94 CPU cores, 94.4% done. ETA 09 Sep 15:02
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 761.70 sec. (0.05 init + 761.65 sieve) at 13423 p/sec.
Processor time: 1476.74 sec. (0.05 init + 1476.69 sieve) at 6923 p/sec.
Average processor utilization: 0.96 (init), 1.94 (sieve)
____________
141941*2^4299438-1 is prime!
|
|
|
pschoefer Volunteer developer Volunteer tester
Joined: 20 Sep 05 Posts: 667 ID: 845 Credit: 2,374,701,989 RAC: 15,281
|
|
HD5850 again:
-v 2:
Elapsed time: 22.72 sec. (0.04 init + 22.69 sieve) at 1328786 p/sec.
Processor time: 2.73 sec. (0.03 init + 2.70 sieve) at 11170287 p/sec.
-v 3:
Elapsed time: 26.23 sec. (0.04 init + 26.19 sieve) at 1151050 p/sec.
Processor time: 3.06 sec. (0.05 init + 3.01 sieve) at 10012744 p/sec.
-v 4:
Elapsed time: 25.09 sec. (0.04 init + 25.05 sieve) at 1203483 p/sec.
Processor time: 2.96 sec. (0.06 init + 2.90 sieve) at 10389565 p/sec.
____________
|
|
|
|
|
|
Sorry, I made a mistake in my first test.
GPU calculation (first or second GPU, I don't know which):
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 1
ppsieve version cl-0.0.2-alpha (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 64 multiprocessors (320 SPUs) on device 0.
Thread 0 interrupted
Sieve incomplete: 42070000000000 <= p < 42070001572865
count=50065,sum=0x1d3ad93327b6df1f
Elapsed time: 17.79 sec. (0.01 init + 17.77 sieve) at 88504 p/sec.
Processor time: 69.20 sec. (0.01 init + 69.19 sieve) at 22734 p/sec.
Average processor utilization: 0.98 (init), 3.89 (sieve)
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Resuming from checkpoint p=42070001572865 in ppcheck42070e9.txt
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 1.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 6.20 sec. (0.01 init + 6.19 sieve) at 1398453 p/sec.
Processor time: 6.20 sec. (0.01 init + 6.19 sieve) at 1398381 p/sec.
Average processor utilization: 0.98 (init), 1.00 (sieve)
called boinc_finish
|
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
|
OK, a new version, 0.1.0-beta, is out. Major changes you should notice:
- It only runs on GPUs. So you shouldn't need to pass --device anymore unless you have two or more.
- It defaults to -v 2. So you shouldn't need to pass that anymore either.
So this should make it possible to run the app on BOINC PPSE WUs, with an appropriate app_info.xml file. (No, I don't know how to create one.)
Other things you might find interesting:
- -m works now. Default is 7, but you can fiddle to your heart's content. Not that I expect any significant improvements, but I've been wrong before.
- Better error messages. Not that you should notice.
Based on the number of crashes I saw from the previous version, I'm not sure any OpenCL app can come out of beta yet. But hopefully this is as stable as it can be. If you see any major issues, post them here.
Oh, and Scott, I have no answers for you about the 100% CPU usage on the 4650. I can only guess that either some driver is too old or it's a quirky card. I will say that I'm glad I didn't buy one. (I was thinking about it!)
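A rough Python sketch of how the 0.1.0-beta tuning options described above fit together, for anyone scripting test runs. It is only an illustration: ppsieve-cl itself is not written in Python, the long names --vecsize and --mthreads are taken from commands posted in this thread rather than from the source, and the default device index of 0 is an assumption. The -v 2 and -m 7 defaults follow the release notes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical Python mirror of the ppsieve-cl 0.1.0-beta tuning options."""
    parser = argparse.ArgumentParser(prog="ppsieve-cl")
    # Vector width used by the kernel arithmetic; 0.1.0-beta defaults to 2.
    parser.add_argument("-v", "--vecsize", type=int, default=2)
    # Work-size multiplier; 0.1.0-beta defaults to 7 and lets users tune it.
    parser.add_argument("-m", "--mthreads", type=int, default=7)
    # GPU index; only needed with two or more GPUs (default assumed to be 0).
    parser.add_argument("--device", type=int, default=0)
    return parser


if __name__ == "__main__":
    # Same options as one of the Linux test commands posted in this thread.
    args = build_parser().parse_args(["--vecsize=4", "--mthreads=2"])
    print(f"vecsize={args.vecsize} mthreads={args.mthreads} device={args.device}")
    # prints: vecsize=4 mthreads=2 device=0
```

Anything not passed on the command line falls back to the documented default, which is why plain runs no longer need -v or --device.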
____________
|
|
|
|
|
|
Win7 x64
SDK 2.2
Catalyst 10.7
HD4870
-v2
00:53:22 (3460): Can't set up shared mem: -1. Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 1.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 14.10 sec. (0.05 init + 14.06 sieve) at 727317 p/sec.
Processor time: 1.64 sec. (0.06 init + 1.58 sieve) at 6488672 p/sec.
Average processor utilization: 1.33 (init), 0.11 (sieve)
00:53:36 (3460): called boinc_finish
-v3
00:53:57 (3572): Can't set up shared mem: -1. Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 1.
Error: Building Program (clBuildProgram)
00:53:58 (3572): called boinc_finish
-v4
00:54:20 (1656): Can't set up shared mem: -1. Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 1.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 15.62 sec. (0.05 init + 15.57 sieve) at 656588 p/sec.
Processor time: 2.14 sec. (0.06 init + 2.07 sieve) at 4927488 p/sec.
Average processor utilization: 1.33 (init), 0.13 (sieve)
00:54:36 (1656): called boinc_finish
|
|
|
|
|
OK, a new version, 0.1.0-beta, is out. Major changes you should notice:
- It only runs on GPUs.
Nice, I couldn't use my GPU in BOINC with the previous version.
This app_info works for me, but I'm not sure it will work for everyone:
<app_info>
<app>
<name>pps_sr2sieve</name>
<user_friendly_name>Proth Prime Search (Sieve)</user_friendly_name>
</app>
<file_info>
<name>pps_sr2sieve_20090322.sieveinput</name>
<nbytes>243615956.000000</nbytes>
<max_nbytes>0.000000</max_nbytes>
<status>1</status>
<sticky/>
</file_info>
<file_info>
<name>ppsieve-cl-boinc-x86-windows.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>pps_sr2sieve</app_name>
<version_num>124</version_num>
<plan_class>ati13ati</plan_class>
<avg_ncpus>0.05</avg_ncpus>
<max_ncpus>1</max_ncpus>
<flops>1.0e11</flops>
<coproc>
<type>ATI</type>
<count>1</count>
</coproc>
<cmdline></cmdline>
<file_ref>
<file_name>ppsieve-cl-boinc-x86-windows.exe</file_name>
<main_program/>
</file_ref>
</app_version>
</app_info>
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
|
Oh, and Scott, I have no answers for you about the 100% CPU usage on the 4650. I can only guess that either some driver is too old or it's a quirky card. I will say that I'm glad I didn't buy one. (I was thinking about it!)
I will try the newer driver when I am in the office tomorrow... it might be the difference between 10.7 and 10.8, since I do not see the CPU load with 10.8 on my home machine with a 4670 card (results below). Also, you can see from the results below that the -m setting of 7 is not optimal, at least for this card.
HD4670, catalyst 10.8, sdk 2.2
Vista 64-bit on i7-920
Default settings:
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.1.0-beta (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 0.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 38.44 sec. (0.03 init + 38.40 sieve) at 266233 p/sec.
Processor time: 1.31 sec. (0.05 init + 1.26 sieve) at 8090813 p/sec.
Average processor utilization: 1.38 (init), 0.03 (sieve)
with -m 1
Elapsed time: 57.24 sec. (0.03 init + 57.21 sieve) at 178710 p/sec.
Processor time: 1.42 sec. (0.03 init + 1.39 sieve) at 7363548 p/sec.
Average processor utilization: 0.92 (init), 0.02 (sieve)
with -m 2
Elapsed time: 29.49 sec. (0.04 init + 29.45 sieve) at 347152 p/sec.
Processor time: 1.54 sec. (0.05 init + 1.50 sieve) at 6826626 p/sec.
Average processor utilization: 1.26 (init), 0.05 (sieve)
with -m 3
Elapsed time: 41.41 sec. (0.03 init + 41.38 sieve) at 247073 p/sec.
Processor time: 1.56 sec. (0.05 init + 1.51 sieve) at 6756244 p/sec.
Average processor utilization: 1.38 (init), 0.04 (sieve)
with -m 4
Elapsed time: 31.42 sec. (0.03 init + 31.38 sieve) at 325780 p/sec.
Processor time: 1.51 sec. (0.05 init + 1.47 sieve) at 6971872 p/sec.
Average processor utilization: 1.38 (init), 0.05 (sieve)
with -m 5
Elapsed time: 38.29 sec. (0.03 init + 38.25 sieve) at 267270 p/sec.
Processor time: 1.31 sec. (0.05 init + 1.26 sieve) at 8090813 p/sec.
Average processor utilization: 1.38 (init), 0.03 (sieve)
with -m 6
Elapsed time: 31.84 sec. (0.03 init + 31.81 sieve) at 321376 p/sec.
Processor time: 1.33 sec. (0.03 init + 1.29 sieve) at 7895855 p/sec.
Average processor utilization: 0.95 (init), 0.04 (sieve)
with -m 8
Elapsed time: 31.99 sec. (0.03 init + 31.95 sieve) at 319958 p/sec.
Processor time: 1.28 sec. (0.03 init + 1.25 sieve) at 8191947 p/sec.
Average processor utilization: 0.92 (init), 0.04 (sieve)
with -m 9
Elapsed time: 35.72 sec. (0.04 init + 35.69 sieve) at 286464 p/sec.
Processor time: 1.40 sec. (0.03 init + 1.37 sieve) at 7447224 p/sec.
Average processor utilization: 0.89 (init), 0.04 (sieve)
with -m 10
Elapsed time: 32.77 sec. (0.03 init + 32.73 sieve) at 312334 p/sec.
Processor time: 1.40 sec. (0.05 init + 1.36 sieve) at 7532824 p/sec.
Average processor utilization: 1.38 (init), 0.04 (sieve)
So it looks like even values of -m are better, with -m 2 being best overall.
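Comparing many runs like these is easy to script. Below is a minimal Python sketch (not part of ppsieve) that pulls the p/sec figure out of each "Elapsed time:" line and picks the fastest -m. It ranks by the Elapsed (wall-clock) rate, because on these GPU runs the Processor time line only reflects the small amount of CPU work (sieve utilization around 0.03). The sample lines are copied from the HD4670 results above.

```python
import re

# Throughput at the end of an "Elapsed time:" line, e.g.
# "Elapsed time: 29.49 sec. (0.04 init + 29.45 sieve) at 347152 p/sec."
RATE_RE = re.compile(r"Elapsed time:.*\bat (\d+) p/sec")


def elapsed_rate(line: str) -> int:
    """Return the wall-clock p/sec figure from an 'Elapsed time:' line."""
    match = RATE_RE.search(line)
    if match is None:
        raise ValueError(f"not an 'Elapsed time:' line: {line!r}")
    return int(match.group(1))


def best_setting(results: dict[int, str]) -> tuple[int, int]:
    """Return (m, rate) for the -m value with the highest wall-clock rate."""
    rates = {m: elapsed_rate(line) for m, line in results.items()}
    best = max(rates, key=rates.__getitem__)
    return best, rates[best]


if __name__ == "__main__":
    hd4670 = {
        1: "Elapsed time: 57.24 sec. (0.03 init + 57.21 sieve) at 178710 p/sec.",
        2: "Elapsed time: 29.49 sec. (0.04 init + 29.45 sieve) at 347152 p/sec.",
        4: "Elapsed time: 31.42 sec. (0.03 init + 31.38 sieve) at 325780 p/sec.",
        7: "Elapsed time: 38.44 sec. (0.03 init + 38.40 sieve) at 266233 p/sec.",
        9: "Elapsed time: 35.72 sec. (0.04 init + 35.69 sieve) at 286464 p/sec.",
    }
    print(best_setting(hd4670))  # (2, 347152)
```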
____________
141941*2^4299438-1 is prime!
|
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
|
Thanks, Scott! I now see that makes sense. I now have no idea where I got the 7, but searching for it I ran across this PDF. My "BLOCKSIZE" (which doesn't mean much in OpenCL except that it's a constant) is 128. That happens to be the right number of threads to fill a pair of wavefronts. Apparently four wavefronts avoid latency issues, so it makes sense that even m's would be better.
Now, no offense, Scott, but I'm not going to optimize for your low-end GPU specifically. But if I can get some tests of higher-end GPUs verifying this (and I expect I will), I'll release a version where m defaults to 2 tomorrow.
P.S. Anyone want to try -v 4 with -m 2? I doubt it will help, but like I said, I've been wrong before.
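The arithmetic behind that reasoning, as a small Python sketch. It only counts how many wavefronts a work group of BLOCKSIZE = 128 work-items occupies, and how many such groups are needed to reach four wavefronts. Reading -m as the number of these groups is an assumption made here for illustration, not something confirmed from the code. Widths of 32 and 16 are included because not every ATI GPU uses 64-wide wavefronts.

```python
import math

BLOCKSIZE = 128  # work-items per work group, as stated in the post


def wavefronts_per_group(group_size: int, wavefront_size: int) -> int:
    """Wavefronts one work group occupies; a partial wavefront still costs a whole one."""
    return math.ceil(group_size / wavefront_size)


def groups_for_wavefronts(target: int, group_size: int, wavefront_size: int) -> int:
    """Fewest work groups that together supply at least `target` wavefronts."""
    return math.ceil(target / wavefronts_per_group(group_size, wavefront_size))


if __name__ == "__main__":
    for width in (64, 32, 16):
        print(width, wavefronts_per_group(BLOCKSIZE, width),
              groups_for_wavefronts(4, BLOCKSIZE, width))
    # 64 2 2
    # 32 4 1
    # 16 8 1
```

With 64-wide wavefronts, two 128-item groups are needed to reach four wavefronts, which is the pairing argument above. With narrower wavefronts a single group already supplies four or more.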
____________
|
|
|
pschoefer Volunteer developer Volunteer tester
Joined: 20 Sep 05 Posts: 667 ID: 845 Credit: 2,374,701,989 RAC: 15,281
|
|
Some test results for 0.1.0-beta on the HD5850:
default:
Elapsed time: 22.83 sec. (0.04 init + 22.80 sieve) at 1322316 p/sec.
Processor time: 2.90 sec. (0.05 init + 2.85 sieve) at 10559889 p/sec.
-m1:
Elapsed time: 68.73 sec. (0.04 init + 68.69 sieve) at 438898 p/sec.
Processor time: 3.20 sec. (0.06 init + 3.14 sieve) at 9614226 p/sec.
-m2:
Elapsed time: 40.08 sec. (0.03 init + 40.05 sieve) at 752812 p/sec.
Processor time: 2.90 sec. (0.05 init + 2.85 sieve) at 10559889 p/sec.
-m2, -v 4:
Elapsed time: 36.81 sec. (0.04 init + 36.78 sieve) at 819732 p/sec.
Processor time: 2.98 sec. (0.05 init + 2.93 sieve) at 10279039 p/sec.
-m3:
Elapsed time: 27.54 sec. (0.03 init + 27.50 sieve) at 1096096 p/sec.
Processor time: 2.68 sec. (0.05 init + 2.64 sieve) at 11434671 p/sec.
-m4:
Elapsed time: 21.34 sec. (0.04 init + 21.29 sieve) at 1415782 p/sec.
Processor time: 2.78 sec. (0.06 init + 2.71 sieve) at 11106090 p/sec.
-m8:
Elapsed time: 20.24 sec. (0.05 init + 20.19 sieve) at 1493280 p/sec.
Processor time: 2.59 sec. (0.05 init + 2.54 sieve) at 11855581 p/sec.
-m12:
Elapsed time: 20.22 sec. (0.04 init + 20.18 sieve) at 1493724 p/sec.
Processor time: 2.95 sec. (0.06 init + 2.89 sieve) at 10445728 p/sec.
-m16:
Elapsed time: 19.83 sec. (0.04 init + 19.80 sieve) at 1522851 p/sec.
Processor time: 2.64 sec. (0.03 init + 2.61 sieve) at 11571616 p/sec.
-m20:
Elapsed time: 19.94 sec. (0.05 init + 19.89 sieve) at 1515501 p/sec.
Processor time: 2.62 sec. (0.03 init + 2.59 sieve) at 11641324 p/sec.
-m24:
Elapsed time: 20.48 sec. (0.04 init + 20.44 sieve) at 1474724 p/sec.
Processor time: 3.04 sec. (0.05 init + 3.00 sieve) at 10064893 p/sec.
-m32:
Elapsed time: 19.71 sec. (0.04 init + 19.67 sieve) at 1532762 p/sec.
Processor time: 3.26 sec. (0.03 init + 3.23 sieve) at 9335555 p/sec.
Also tried -m 5, 6, 9, 10 and some numbers above 32, but they were all slower than the default.
@tepek: Your app_info works, but you don't need to include the file_info for pps_sr2sieve_20090322.sieveinput, as ppsieve doesn't need the sievefile.
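If you want to apply that cleanup to an existing app_info.xml without hand-editing, here is a small Python sketch using the standard library's xml.etree.ElementTree. It removes any file_info whose name ends in .sieveinput and leaves the other entries alone. ElementTree drops XML comments by default and may not keep the original formatting exactly, so keep a backup. The SAMPLE string is a cut-down stand-in for tepek's file, not the full thing.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<app_info>
<app><name>pps_sr2sieve</name></app>
<file_info><name>pps_sr2sieve_20090322.sieveinput</name><sticky/></file_info>
<file_info><name>ppsieve-cl-boinc-x86-windows.exe</name><executable/></file_info>
</app_info>"""


def drop_sieve_input(app_info_xml: str) -> str:
    """Return app_info XML without <file_info> entries for *.sieveinput files."""
    root = ET.fromstring(app_info_xml)
    # findall() returns a list, so removing children while looping is safe.
    for file_info in root.findall("file_info"):
        if file_info.findtext("name", default="").endswith(".sieveinput"):
            root.remove(file_info)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(drop_sieve_input(SAMPLE))
```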
____________
|
|
|
|
|
|
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.1.0-beta (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
stderr.txt:
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 0.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 6.03 sec. (0.01 init + 6.01 sieve) at 1700196 p/sec.
Processor time: 5.99 sec. (0.01 init + 5.98 sieve) at 1710700 p/sec.
Average processor utilization: 0.99 (init), 0.99 (sieve)
called boinc_finish
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --vecsize=4 --mthreads=2
ppsieve version cl-0.1.0-beta (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
stderr.txt:
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 0.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 10.15 sec. (0.01 init + 10.14 sieve) at 1008154 p/sec.
Processor time: 10.14 sec. (0.01 init + 10.13 sieve) at 1008975 p/sec.
Average processor utilization: 0.74 (init), 1.00 (sieve)
called boinc_finish
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --vecsize=4 --mthreads=7
ppsieve version cl-0.1.0-beta (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
stderr.txt:
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 0.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 6.87 sec. (0.01 init + 6.86 sieve) at 1490629 p/sec.
Processor time: 6.85 sec. (0.01 init + 6.83 sieve) at 1496237 p/sec.
Average processor utilization: 0.99 (init), 1.00 (sieve)
called boinc_finish
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --vecsize=4 --mthreads=10
ppsieve version cl-0.1.0-beta (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
stderr.txt:
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 0.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 7.95 sec. (0.01 init + 7.93 sieve) at 1288497 p/sec.
Processor time: 7.95 sec. (0.01 init + 7.94 sieve) at 1287153 p/sec.
Average processor utilization: 0.74 (init), 1.00 (sieve)
called boinc_finish
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --vecsize=4 --mthreads=20
ppsieve version cl-0.1.0-beta (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
stderr.txt:
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 0.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 7.79 sec. (0.01 init + 7.78 sieve) at 1314750 p/sec.
Processor time: 7.79 sec. (0.01 init + 7.78 sieve) at 1314739 p/sec.
Average processor utilization: 0.96 (init), 1.00 (sieve)
called boinc_finish
|
|
|
|
|
HD 4850:
default:
Elapsed time: 45.33 sec. (0.03 init + 45.29 sieve) at 665589 p/sec.
Processor time: 2.70 sec. (0.05 init + 2.65 sieve) at 11367408 p/sec.
-m 4:
Elapsed time: 42.39 sec. (0.03 init + 42.36 sieve) at 711744 p/sec.
Processor time: 2.48 sec. (0.05 init + 2.43 sieve) at 12387563 p/sec.
-m 4 -v 4:
Elapsed time: 39.20 sec. (0.03 init + 39.16 sieve) at 769739 p/sec.
Processor time: 3.28 sec. (0.05 init + 3.23 sieve) at 9335552 p/sec.
-m 8:
Elapsed time: 40.09 sec. (0.03 init + 40.06 sieve) at 752615 p/sec.
Processor time: 2.48 sec. (0.05 init + 2.43 sieve) at 12387563 p/sec.
-m 8 -v 4:
Elapsed time: 44.21 sec. (0.03 init + 44.18 sieve) at 682404 p/sec.
Processor time: 2.87 sec. (0.05 init + 2.82 sieve) at 10676572 p/sec.
-m 16:
Elapsed time: 39.07 sec. (0.03 init + 39.04 sieve) at 772144 p/sec.
Processor time: 2.32 sec. (0.03 init + 2.29 sieve) at 13145986 p/sec.
-m 16 -v 4:
Elapsed time: 46.20 sec. (0.03 init + 46.17 sieve) at 652917 p/sec.
Processor time: 2.84 sec. (0.03 init + 2.81 sieve) at 10735886 p/sec.
-m 32:
Elapsed time: 39.60 sec. (0.03 init + 39.57 sieve) at 761783 p/sec.
Processor time: 2.78 sec. (0.05 init + 2.73 sieve) at 11042627 p/sec.
-m 64:
Elapsed time: 39.12 sec. (0.03 init + 39.09 sieve) at 771216 p/sec.
Processor time: 2.37 sec. (0.03 init + 2.34 sieve) at 12883063 p/sec.
|
|
|
|
|
I have a lot of WU errors like this one: http://www.primegrid.com/result.php?resultid=188107021
Only a few WUs are OK.
What is the problem?
|
|
|
|
|
New app_info, faster than the previous one (from 1200 to 1030 seconds on a 4850):
<app_info>
<app>
<name>pps_sr2sieve</name>
<user_friendly_name>Proth Prime Search (Sieve)</user_friendly_name>
</app>
<file_info>
<name>ppsieve-cl-boinc-x86-windows.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>pps_sr2sieve</app_name>
<version_num>124</version_num>
<plan_class>ati13ati</plan_class>
<avg_ncpus>0.05</avg_ncpus>
<max_ncpus>1</max_ncpus>
<flops>1.0e11</flops>
<coproc>
<type>ATI</type>
<count>1</count>
</coproc>
<cmdline>-m 16</cmdline>
<file_ref>
<file_name>ppsieve-cl-boinc-x86-windows.exe</file_name>
<main_program/>
</file_ref>
</app_version>
</app_info>
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
|
Now, no offense, Scott, but I'm not going to optimize for your low-end GPU specifically. But if I can get some tests of higher-end GPUs verifying this (and I expect I will), I'll release a version where m defaults to 2 tomorrow.
None taken...the 4600 series is mid-range usually, but since OpenCL only runs on 4xxx and 5xxx cards, the low-end designation is appropriate.
Also, I figured out the issue with the 4650 using one full CPU core. It turns out it wasn't the driver or a quirky card, but a quirky setup. I had "Surround View" turned on with a dummy plug so the onboard graphics could crunch alongside the 4650 on Collatz (AMD 760 chipset, so not OpenCL capable). When I turn that off, the CPU load during the sieve is about the same as with the 4670 in my home machine, with GPU times just a few seconds slower (as expected given the slower core clocks). I have not been able to find any info about the "Surround View" feature with OpenCL, so maybe this is a new issue that should be reported to AMD/ATI?
P.S. Anyone want to try -v 4 with -m 2? I doubt it will help, but like I said, I've been wrong before.
The -m2 setting remains the fastest for the 4600 series (I also tested m16, m32, and m64, each larger m being faster, but not as fast as the m2 setting). As before, the -v4 setting crashes the driver on the 4600 series.
____________
141941*2^4299438-1 is prime!
|
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
|
Thanks, Scott! I now see that makes sense. I now have no idea where I got the 7, but searching for it I ran across this PDF. My "BLOCKSIZE" (which doesn't mean much in OpenCL except that it's a constant) is 128. That happens to be the right number of threads to fill a pair of wavefronts. Apparently four wavefronts avoid latency issues, so it makes sense that even m's would be better.
I wonder if the different -m performance across the cards so far is related to ATI employing different wavefront sizes in different cards? The 58xx/57xx and 48xx use 64, but a 5450 card has a wavefront size of 32 (I haven't found the size for the 4650 yet... some cards also have wavefronts of size 16, but I don't think any of these are OpenCL capable???).
____________
141941*2^4299438-1 is prime!
|
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
|
@Elgrande71: A computation error means just that: your GPU found a factor that your CPU couldn't verify. Usually, this means you've overclocked too much!
By the way, for PPSieve, there's not much reason to overclock the memory, if such things are separate in ATI like they are in nVIDIA. You can under-clock the memory, overclock the shaders, and get better performance (as long as you don't overclock too much).
@tepek
<cmdline>-m 16</cmdline> Nice! I didn't know about that. Now everyone can adjust their client manually as needed. :)
As for automatically finding a good setting, I'm not sure about that yet. :/ But it does look like -v 4 is best forgotten.
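For several machines or several app_version entries, the same edit can be scripted. A minimal Python sketch, again with xml.etree.ElementTree, that sets cmdline in every app_version and creates the element if it is missing. The value "-m 16" is just tepek's setting, so substitute whatever your card prefers. When the element has to be created, the sketch appends it at the end of app_version; whether BOINC cares about that position is not established in this thread.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<app_info>
<app_version>
<app_name>pps_sr2sieve</app_name>
<cmdline></cmdline>
</app_version>
</app_info>"""


def set_cmdline(app_info_xml: str, args: str) -> str:
    """Set the <cmdline> text of every <app_version>, adding the element if absent."""
    root = ET.fromstring(app_info_xml)
    for app_version in root.iter("app_version"):
        cmdline = app_version.find("cmdline")
        if cmdline is None:
            cmdline = ET.SubElement(app_version, "cmdline")
        cmdline.text = args
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_cmdline(SAMPLE, "-m 16"))
```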
____________
|
|
|
|
|
@Elgrande71: A computation error means just that: your GPU found a factor that your CPU couldn't verify. Usually, this means you've overclocked too much!
By the way, for PPSieve, there's not much reason to overclock the memory, if such things are separate in ATI like they are in nVIDIA. You can under-clock the memory, overclock the shaders, and get better performance (as long as you don't overclock too much).
My GPU cards are slightly overclocked (Sapphire HD5870 Vapor-X), but now there is no problem at all.
My first WUs ended in errors, but now everything is working fine.
Thank you for your advice and your work.
Keep it up.
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
I wonder if the different -m performance across the cards so far is related to ATI employing different wavefront sizes in different cards? The 58xx/57xx and 48xx use 64, but a 5450 card has a wavefront size of 32 (I haven't found the size for the 4650 yet... some cards also have wavefronts of size 16, but I don't think any of these are OpenCL capable???).
Yes, all three sizes are possible. And it seems to be hard to figure out which is which. That's what's stopping me from changing the default -m size right now.
About computation errors, note that most computation errors go undetected. Especially the important ones, where a factor is missed. So if you're getting detected computation errors, even if they stop, it would be a good idea to change something to prevent future errors.
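To make the distinction concrete: checking a reported factor on the CPU is cheap, because k*2^n+1 never has to be built in full; 2^n mod p comes from modular exponentiation. Below is a minimal Python sketch of such a check (not the app's own code), applied to factor lines from the test run. It can only reject a bogus factor; as noted above, a factor the GPU silently skipped never reaches this check.

```python
import re

# Matches factor lines such as "42070003101727 | 4207*2^1054290+1".
FACTOR_RE = re.compile(r"^\s*(\d+)\s*\|\s*(\d+)\*2\^(\d+)\+1\s*$")


def divides_proth(p: int, k: int, n: int) -> bool:
    """True if p divides k*2^n + 1, using pow(2, n, p) instead of the huge number."""
    return (k * pow(2, n, p) + 1) % p == 0


def check_factor_line(line: str) -> bool:
    """Parse a 'p | k*2^n+1' line and verify the claimed factor."""
    match = FACTOR_RE.match(line)
    if match is None:
        raise ValueError(f"not a factor line: {line!r}")
    p, k, n = (int(g) for g in match.groups())
    return divides_proth(p, k, n)


if __name__ == "__main__":
    for line in ("42070003101727 | 4207*2^1054290+1",
                 "42070008858187 | 2893*2^317690+1"):
        print(line, check_factor_line(line))
```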
____________
|
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
|
Alright, I've pushed a new version that just changes the default -m to match what CL_KERNEL_WORK_GROUP_SIZE returns. It's so similar that I didn't even bother changing the version number. (But the build date is today.) It also prevents using a -m higher than the default, as I think it returns a maximum.
Since the default is now the maximum possible size, I'd like to see what various cards set -m to on the test run. (It's printed to stdout.) It might make sense to make the default -m some fraction of what it is now. So let me know what your cards produce.
Thanks!
____________
|
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
|
|
HD 4650 results...Something is not quite right with this one...
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.1.0-beta (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Resuming from checkpoint p=42070002359297 in ppcheck42070e9.txt
Thread 0 starting
Detected 8 SIMDs (640 SPUs?) on device 0.
Using 128 threads (about -m 0.25).
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
p=42070004718593, 39.32K p/sec, 0.03 CPU cores, 47.2% done. ETA 10 Sep 17:08
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
p=42070007077889, 35.53K p/sec, 0.03 CPU cores, 70.8% done. ETA 10 Sep 17:08
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
p=42070009437185, 38.39K p/sec, 0.03 CPU cores, 94.4% done. ETA 10 Sep 17:08
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 204.62 sec. (0.05 init + 204.57 sieve) at 38443 p/sec.
Processor time: 6.43 sec. (0.06 init + 6.36 sieve) at 1235588 p/sec.
Average processor utilization: 1.25 (init), 0.03 (sieve)
Also, -m2 is not allowed and is reduced to something less than 1 when tried.
____________
141941*2^4299438-1 is prime!
|
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
|
OK, that didn't work. I've reverted to the previous build. I'll have to figure out something else later.
____________
|
|
|
Vato Volunteer tester
Joined: 2 Feb 08 Posts: 796 ID: 18447 Credit: 382,504,347 RAC: 225,569
|
|
Finally got time to try this...
HD4670, Ubuntu 10.04 x86_64 (Collatz works fine on this)
I have done the ICD registration and ldd looks good.
I use this wrapper:
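#!/bin/sh
# Use the local X display and the ATI Stream SDK 2.2 install under $HOME/ATI.
export DISPLAY=:0.0
export ATISTREAMSDKROOT=${HOME}/ATI/ati-stream-sdk-v2.2-lnx64
export ATISTREAMSDKSAMPLESROOT=${ATISTREAMSDKROOT}
# Append the old LD_LIBRARY_PATH only if it is non-empty (an empty entry means the current directory).
export LD_LIBRARY_PATH="${ATISTREAMSDKROOT}/lib/x86_64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
# Run the standard test range, passing through any extra options (e.g. --device) intact.
exec ./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 "$@"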
but it can't find the device (I have tried various --device options):
$ ./ppsieve.sh
ppsieve version cl-0.1.0-beta (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
$ cat stderr.txt
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Error: Creating Context. (clCreateContextFromType): Device not found.
called boinc_finish
____________
|
|
|
|
|
|
Tried to run on Ubuntu 10.04 64-bit with an HD4870 (not overclocked), driver 10.8 (with ATI Stream SDK 2.1 and 2.2; same result).
Command:
ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
Result:
ppsieve version cl-0.1.0-beta (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
stderr.txt:
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
Error: Building Program (clBuildProgram): Program build failure
called boinc_finish
Same error with a real WU: http://www.primegrid.com/result.php?resultid=188269557
What's wrong? |
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
|
Jip, did you complete all of the installation procedure (PDF), including setting up all necessary environment variables?
Vato: I'm afraid I don't have any personal experience with cards being found. (Since I don't have one. :P)
____________
|
|
|
|
|
|
With this command:
ldd ppsieve-cl-boinc-x86_64-linux
I get:
linux-vdso.so.1 => (0x00007fff425cd000)
libOpenCL.so => /usr/lib/libOpenCL.so (0x00007f2c50e5c000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007f2c50c3f000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f2c5092a000)
libm.so.6 => /lib/libm.so.6 (0x00007f2c506a7000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f2c50490000)
libc.so.6 => /lib/libc.so.6 (0x00007f2c5010c000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2c5107b000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f2c4ff08000)
librt.so.1 => /lib/librt.so.1 (0x00007f2c4fd00000)
I have "atiocl32.icd" and "atiocl64.icd" in /etc/OpenCL/vendors,
and "libatiocl64.so" and "libOpenCL.so" in /usr/lib64,
and as you can see, it does detect my GPU:
stderr.txt:
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
Error: Building Program (clBuildProgram): Program build failure
called boinc_finish
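On Linux the OpenCL ICD loader looks in /etc/OpenCL/vendors, and each .icd file there holds the name (or path) of a vendor library that the loader then tries to open, so that library must itself be findable by the dynamic linker. A small sketch to list what is registered, assuming the default directory (pass another directory as an argument to override); the function name is hypothetical:

```shell
#!/bin/sh
# List OpenCL ICD registrations: each *.icd file names a vendor library
# that the ICD loader (libOpenCL.so) will try to open.
list_opencl_icds() {
  local dir=${1:-/etc/OpenCL/vendors} f found=0
  if [ ! -d "$dir" ]; then
    echo "no ICD directory: $dir" >&2
    return 1
  fi
  for f in "$dir"/*.icd; do
    [ -e "$f" ] || continue            # the glob matched nothing
    found=1
    printf '%s -> %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
  done
  if [ "$found" -eq 0 ]; then
    echo "no .icd files in $dir" >&2
    return 1
  fi
}
```

Each library named on the right-hand side should then show up in `ldconfig -p` or sit in a directory on LD_LIBRARY_PATH.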
|
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
|
Check the output of echo $ATISTREAMSDKROOT.
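A slightly more thorough version of that check, as a sketch: it assumes the SDK 2.2 layout used in this thread (libOpenCL.so under $ATISTREAMSDKROOT/lib/x86_64) and quotes every path, since an install directory may contain a space. The function name `check_stream_sdk` is hypothetical.

```shell
#!/bin/sh
# Sketch: verify that ATISTREAMSDKROOT is set, that libOpenCL.so exists in
# $ATISTREAMSDKROOT/lib/<arch> (layout assumed from SDK 2.2), and that this
# directory is on LD_LIBRARY_PATH.
check_stream_sdk() {
  local arch=${1:-x86_64} libdir
  if [ -z "${ATISTREAMSDKROOT:-}" ]; then
    echo "ATISTREAMSDKROOT is not set" >&2
    return 1
  fi
  libdir="$ATISTREAMSDKROOT/lib/$arch"        # quoted: the path may contain spaces
  if [ ! -e "$libdir/libOpenCL.so" ]; then
    echo "libOpenCL.so not found in $libdir" >&2
    return 1
  fi
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$libdir:"*) echo "OK: $libdir is on LD_LIBRARY_PATH" ;;
    *) echo "$libdir is not on LD_LIBRARY_PATH" >&2; return 1 ;;
  esac
}
```

Run it in the same shell (or wrapper script) that launches ppsieve, since a variable set in one terminal is not seen by other processes.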
____________
|
|
|
|
|
|
I have no $ATISTREAMSDKROOT set, because "libatiocl64.so" and "libOpenCL.so" are in /usr/lib64 and are found there, as shown by the ldd command. Is another library used?
I'll try the complete SDK install. Perhaps it's better, but I don't think it's needed just to run.
For libOpenCL, ldd gives:
ldd /usr/lib64/libOpenCL.so
linux-vdso.so.1 => (0x00007fff53dff000)
libdl.so.2 => /lib/libdl.so.2 (0x00007feb09147000)
librt.so.1 => /lib/librt.so.1 (0x00007feb08f3f000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007feb08c2a000)
libm.so.6 => /lib/libm.so.6 (0x00007feb089a7000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007feb08790000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007feb08572000)
libc.so.6 => /lib/libc.so.6 (0x00007feb081ef000)
/lib64/ld-linux-x86-64.so.2 (0x00007feb0956a000)
ldd on libatiocl64.so:
ldd /usr/lib64/libatiocl64.so
linux-vdso.so.1 => (0x00007fff039ff000)
libdl.so.2 => /lib/libdl.so.2 (0x00007fd4c995d000)
libX11.so.6 => /usr/lib/libX11.so.6 (0x00007fd4c9627000)
libGL.so.1 => /usr/lib/libGL.so.1 (0x00007fd4c944e000)
libGLU.so.1 => /usr/lib/libGLU.so.1 (0x00007fd4c91dd000)
librt.so.1 => /lib/librt.so.1 (0x00007fd4c8fd5000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fd4c8cc0000)
libm.so.6 => /lib/libm.so.6 (0x00007fd4c8a3d000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007fd4c8826000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007fd4c8608000)
libc.so.6 => /lib/libc.so.6 (0x00007fd4c8285000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd4ca96a000)
libxcb.so.1 => /usr/lib/libxcb.so.1 (0x00007fd4c8069000)
libXext.so.6 => /usr/lib/libXext.so.6 (0x00007fd4c7e56000)
libatiuki.so.1 => /usr/lib/libatiuki.so.1 (0x00007fd4c7d4d000)
libXau.so.6 => /usr/lib/libXau.so.6 (0x00007fd4c7b48000)
libXdmcp.so.6 => /usr/lib/libXdmcp.so.6 (0x00007fd4c7942000)
All dependencies seem correct, no? |
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
|
I think you need the complete SDK install because the OpenCL code is distributed as source code. The OpenCL SDK has to compile it, every time, before it can run!
I am aware that the current SDK allows compiling in binary OpenCL code; however, the app is currently set up to "bake in" certain constants.
____________
|
|
|
|
|
|
Same error with the complete SDK install:
echo $ATISTREAMSDKROOT
/home/jip/ATI Stream/ati-stream-sdk-v2.2-lnx64
echo $ATISTREAMSDKSAMPLESROOT
/home/jip/ATI Stream/ati-stream-sdk-v2.2-lnx64
echo $LD_LIBRARY_PATH
/home/jip/ATI Stream/ati-stream-sdk-v2.2-lnx64/lib/x86_64:
ldd ppsieve-cl-boinc-x86_64-linux
linux-vdso.so.1 => (0x00007fff4531c000)
libOpenCL.so => /home/jip/ATI Stream/ati-stream-sdk-v2.2-lnx64/lib/x86_64/libOpenCL.so (0x00007f5d73951000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007f5d7371c000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f5d73407000)
libm.so.6 => /lib/libm.so.6 (0x00007f5d73184000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f5d72f6d000)
libc.so.6 => /lib/libc.so.6 (0x00007f5d72be9000)
/lib64/ld-linux-x86-64.so.2 (0x00007f5d73b58000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f5d729e5000)
librt.so.1 => /lib/librt.so.1 (0x00007f5d727dd000)
|
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
|
OK, I've pushed a new version that will report the compile error. Let's see what's wrong.
____________
|
|
|
|
|
|
After a "sudo ldconfig", when I run this command in my boinc/projects/www.primegrid.com directory:
ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.1.0-beta (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
stderr.txt:
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 21.26 sec. (0.02 init + 21.24 sieve) at 481269 p/sec.
Processor time: 20.93 sec. (0.02 init + 20.91 sieve) at 488934 p/sec.
Average processor utilization: 1.16 (init), 0.98 (sieve)
called boinc_finish
But with a real WU: http://www.primegrid.com/result.php?resultid=188323747
Stderr output
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 127 (0x7f, -129)
</message>
<stderr_txt>
../../projects/www.primegrid.com/ppsieve-cl-boinc-x86_64-linux: error while loading shared libraries: libOpenCL.so: cannot open shared object file: No such file or directory
</stderr_txt>
]]>
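Exit code 127 together with "cannot open shared object file" means the dynamic linker could not find libOpenCL.so for the process BOINC launched, even though the same binary works from a terminal. One possible explanation (an assumption, not confirmed in this thread) is that the BOINC client was started with a different environment, for example as a system service, so an LD_LIBRARY_PATH exported in your shell never reaches it. On Linux, /proc/<pid>/environ shows the environment a running process actually received; a small sketch (hypothetical helper name) to read one variable from it:

```shell
#!/bin/sh
# Print one variable from a running process's environment, as recorded in
# /proc/<pid>/environ (NUL-separated entries). Linux only.
proc_env_var() {
  local pid=$1 name=$2
  if [ ! -r "/proc/$pid/environ" ]; then
    echo "cannot read /proc/$pid/environ (wrong PID, or not root?)" >&2
    return 1
  fi
  if ! tr '\0' '\n' < "/proc/$pid/environ" | grep "^$name="; then
    echo "$name is not set for PID $pid"
  fi
}
```

Pass the client's PID (for example from `pgrep boinc`; the exact process name depends on how BOINC was installed) and LD_LIBRARY_PATH. Reading the environment of a process owned by another user, such as a dedicated boinc account, requires root.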
|
|
|
Vato Volunteer tester
Joined: 2 Feb 08 Posts: 796 ID: 18447 Credit: 382,504,347 RAC: 225,569
|
|
Have now put 32bit and 64bit SDK libs in /lib32 and /lib64 respectively, re-run ldconfig, downloaded latest ppsieve-cl.zip, tried boinc and non-boinc and 32bit and 64bit executables - and still haven't got very far. Running under strace just shows lots of mmap() calls succeeding prior to calling clone() and futex(). Still HD4670 Ubuntu 10.04 x86_64 - any further hints gratefully received.
$ ./doit.sh
ppsieve version cl-0.1.0a-beta (testing)
Compiled Sep 12 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Error: Creating Context. (clCreateContextFromType): Device not found.
____________
|
|
|
|
|
|
After several tests with my system (Q6600, 4 GB RAM, 2x ATI HD5870 in CrossFire), it seems that the m parameter must not exceed 24; otherwise one of my GPUs produces compute errors.
With m equal to 32, the GPU that doesn't produce compute errors finishes workunits in 6 min 30 s instead of 7 min 30 s. It's a shame that the second GPU can't compute workunits without errors.
I tried to find a fix for this (drivers, etc.), but no luck.
Furthermore, I upgraded BOINC to version 6.10.58, but nothing changed. |
|
|
|
|
Don't hesitate to put another version of your app for testing. |
|
|
|
|
You have to disable CrossFire to get the second GPU working in OpenCL. It's a long-standing bug, fortunately now solved for "old" Brook/CAL apps but still not for OpenCL/CAL apps.
It's a big problem for people with a 5970, because you cannot disable CrossFire on that card, so you're forced to use only one GPU.
It's not a problem with this app; it's a problem with both the drivers (10.7b, 10.8 & 10.9) and AMD SDK 2.2. |
|
|
|
|
To summarize my situation:
Without CrossFire, only one GPU is detected by BOINC.
With CrossFire, two GPUs are detected by BOINC.
All other configurations have been tested without success.
GPU workunit computation errors only occur if the m parameter is 32 or higher, and only one GPU is affected (not the other).
Is that clear?
ATI (AMD) developers have to fix their drivers and SDK. |
|
|
|
|
|
I told you that the "second" GPU often makes computational errors with CrossFire enabled. This is a problem with SDK 2.2 and the AMD drivers. They'll fix this eventually, with the upcoming 2.3 release.
You can do two things:
- disable CrossFire (you can; you don't have a 5970, where you can't) and make BOINC find the second GPU by using a dummy plug and extending the desktop to the secondary fake monitor (search Google for this procedure)
- leave CrossFire enabled and crunch only on one GPU, the first one.
Otherwise, you'll have to live with the second GPU making a lot of computational errors until the next SDK. |
|
|
|
|
|
OS: Windows Vista SP-2 - ATI CCC 10.9 - ATI Stream SDK 2.2 - OpenCL 1.1
CPU: Core 2 Quad 9550 @ 3.4 GHz
GPU: ATI HD 4770 @ stock clock (750/800 MHz)
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.1.0a-beta (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 0.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 15.55 sec. (0.03 init + 15.52 sieve) at 658781 p/sec.
Processor time: 1.34 sec. (0.05 init + 1.29 sieve) at 7895855 p/sec.
Average processor utilization: 1.51 (init), 0.08 (sieve)
A complete work unit on the HD 4770 at stock clock finishes 45 seconds faster than on my overclocked GTX 260-192 @ 667 MHz.
If I overclock the HD 4770 too, it is up to 160 seconds faster than the already overclocked GTX.
An additional note:
The Windows GUI is much more responsive if I crunch PPS sieve WUs on the ATI card instead of the CUDA-enabled GTX.
____________
|
|
|
|
|
|
Hi all, I would also like to contribute:
OS: Win 7 - ATI CCC 10.7 - ATI Stream SDK 2.2
CPU: i7
GPU: ATI HD5700 Series
ppsieve-cl-x86-windows -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.1.0a-beta (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 10.19 sec. (0.03 init + 10.15 sieve) at 1006898 p/sec.
Processor time: 1.00 sec. (0.03 init + 0.97 sieve) at 10570257 p/sec.
Average processor utilization: 0.92 (init), 0.10 (sieve)
Does that help?
What more can I do to help? |
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
GPU: ATI HD5700 Series
That's not much detail. I infer from the 800 SPUs that this is a 5770. Is it overclocked at all? (Edit: since it might be overclocked at the factory, can you determine its clock speed?)
Does that help?
What more can I do to help?
That's a little helpful, but not my main focus right now. Right now I'd like to get the new algorithms tested in the CUDA thread. Then I can port them here and you can help me test that.
____________
|
|
|
|
|
|
That's not much detail. I infer from the 800 SPUs that this is a 5770. Is it overclocked at all? (Edit: since it might be overclocked at the factory, can you determine its clock speed?)
Yes, it is indeed a 5770! I think it was running at 800 MHz.
That's a little helpful, but not my main focus right now. Right now I'd like to get the new algorithms tested in the CUDA thread. Then I can port them here and you can help me test that.
OK, I'll be glad to help any time; just let me know what to do. |
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
|
Alright, everybody, I finally have a new version of PPSieve-CL! V0.2.0-beta incorporates the algorithms of the CUDA version 0.2.1a. Get it at the usual place. Windows builds included too!
Now, to testing! Please test:
- The usual range
- -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60
That should produce:
20070000475957 | 4995*2^1822738+1
20070001146497 | 4977*2^626298+1
20070001163929 | 3765*2^461308+1
20070001302811 | 7669*2^725426+1
20070001425977 | 5821*2^1775248+1
20070002245151 | 1221*2^646983+1
20070002606341 | 4809*2^497683+1
20070004816819 | 6699*2^1215561+1
20070005914001 | 9847*2^1634140+1
20070006187837 | 9923*2^287853+1
20070006875981 | 1645*2^965954+1
20070007170259 | 3889*2^49730+1
20070008329039 | 9065*2^832569+1
Found 13 factors
- -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
The reasons for this should quickly become obvious if they aren't already. That should produce:
20070000541441 | 3243*2^1584966-1
20070000674041 | 8143*2^1397047-1
20070001823101 | 7647*2^1022532-1
20070001843627 | 4955*2^248864-1
20070002680493 | 6455*2^1778260-1
20070003067151 | 9259*2^1869285-1
20070004606567 | 5269*2^257879-1
20070005404357 | 7893*2^796719-1
20070006186677 | 2413*2^1924519-1
20070007049707 | 2149*2^153375-1
20070007529777 | 1367*2^1532230-1
20070008213771 | 4041*2^1187467-1
20070008917537 | 6671*2^1260830-1
20070008991223 | 5523*2^101606-1
Found 14 factors
And let me know how fast it goes too! (Speed is printed in stderr.txt when using the BOINC apps.)
Thanks!
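To compare a run against the expected lists above without eyeballing, one option is to save the expected factor lines and the program's output to two files and let comm report the differences. This is a sketch: the function name and file arguments are placeholders, and it only looks at lines containing " | ", so summary lines such as "Found 13 factors" are ignored. Carriage returns are stripped in case the output was captured on Windows.

```shell
#!/bin/sh
# Compare the "p | k*2^n+1" (or -1) factor lines of a run against an
# expected list; all other lines are ignored.
compare_factors() {
  local expected=$1 output=$2 tmp diffs
  tmp=$(mktemp -d) || return 2
  # Keep only factor lines, drop Windows carriage returns, sort for comm.
  grep -F ' | ' "$expected" | tr -d '\r' | sort > "$tmp/expected"
  grep -F ' | ' "$output"   | tr -d '\r' | sort > "$tmp/output"
  # comm -3 hides lines present in both files: column 1 = expected but
  # missing from the run, tab-indented column 2 = reported but not expected.
  diffs=$(comm -3 "$tmp/expected" "$tmp/output")
  rm -rf "$tmp"
  if [ -z "$diffs" ]; then
    echo "All expected factors found, no extra factors."
  else
    echo "Factor lists differ (left: missing from the run; indented: not expected):"
    printf '%s\n' "$diffs"
    return 1
  fi
}
```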
____________
|
|
|
Vato Volunteer tester
Joined: 2 Feb 08 Posts: 796 ID: 18447 Credit: 382,504,347 RAC: 225,569
|
|
Did anyone have any advice for me on getting this to work at all?
I still always get "Device not found".
It's just infuriating running collatz instead of sieving.
____________
|
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
|
Vato, I have a vague memory of seeing something about "Download the latest Catalyst drivers" somewhere. Other than that, I give up. Sorry.
____________
|
|
|
|
|
Did anyone have any advice for me on getting this to work at all?
I still always get "Device not found".
It's just infuriating running collatz instead of sieving.
Did you install the Stream SDK 2.2?
____________
|
|
|
|
|
|
HD4850, Win7 64-bit
-p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Elapsed time: 9.04 sec. (0.03 init + 9.01 sieve) at 1134494 p/sec.
Processor time: 1.14 sec. (0.05 init + 1.09 sieve) at 9362226 p/sec.
Average processor utilization: 1.42 (init), 0.12 (sieve)
-p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60
20070000475957 | 4995*2^1822738+1
20070001146497 | 4977*2^626298+1
20070001163929 | 3765*2^461308+1
20070001302811 | 7669*2^725426+1
20070001425977 | 5821*2^1775248+1
20070002245151 | 1221*2^646983+1
20070002606341 | 4809*2^497683+1
20070004816819 | 6699*2^1215561+1
20070005914001 | 9847*2^1634140+1
20070006187837 | 9923*2^287853+1
20070006875981 | 1645*2^965954+1
20070007170259 | 3889*2^49730+1
20070008329039 | 9065*2^832569+1
Found 13 factors
Elapsed time: 12.70 sec. (0.02 init + 12.67 sieve) at 806670 p/sec.
Processor time: 1.19 sec. (0.03 init + 1.15 sieve) at 8856163 p/sec.
Average processor utilization: 1.42 (init), 0.09 (sieve)
-p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
20070000541441 | 3243*2^1584966-1
20070000674041 | 8143*2^1397047-1
20070001823101 | 7647*2^1022532-1
20070001843627 | 4955*2^248864-1
20070002680493 | 6455*2^1778260-1
20070003067151 | 9259*2^1869285-1
20070004606567 | 5269*2^257879-1
20070005404357 | 7893*2^796719-1
20070006186677 | 2413*2^1924519-1
20070007049707 | 2149*2^153375-1
20070007529777 | 1367*2^1532230-1
20070008213771 | 4041*2^1187467-1
20070008917537 | 6671*2^1260830-1
20070008991223 | 5523*2^101606-1
Elapsed time: 12.61 sec. (0.02 init + 12.59 sieve) at 812052 p/sec.
Processor time: 1.64 sec. (0.05 init + 1.59 sieve) at 6425058 p/sec.
Average processor utilization: 1.95 (init), 0.13 (sieve)
|
|
|
Vato Volunteer tester
Joined: 2 Feb 08 Posts: 796 ID: 18447 Credit: 382,504,347 RAC: 225,569
|
Did you install the Stream SDK 2.2?
Yes - followed the install to the letter - see posts 26284 & 26357 earlier in this thread. I guess I have to chase my tail with many attempts to get the right combo or sequence, but that's scary with the recent fglrx issues. Why can't they just ship the SDK runtime with the cat drivers like normal folks do?
____________
|
|
|
|
|
Did you install the Stream SDK 2.2?
Yes - followed the install to the letter - see posts 26284 & 26357 earlier in this thread. I guess I have to chase my tail with many attempts to get the right combo or sequence, but that's scary with the recent fglrx issues. Why can't they just ship the SDK runtime with the cat drivers like normal folks do?
Which Catalyst driver version do you use? 10.8 or newer?
____________
|
|
|
Vato Volunteer tester
Joined: 2 Feb 08 Posts: 796 ID: 18447 Credit: 382,504,347 RAC: 225,569
|
|
Whatever ubuntu 10.04 + patches is.
I always found that chasing the ATI drivers eventually got me an unbootable system.
And when CVE-2010-3081 got fixed, I got one of those anyway.
This is the downside of ATI...
I can't find a minimum cat driver version in this thread, and the SDK didn't specify either that I can remember - so, what is the actual minimum?
____________
|
|
|
|
|
Whatever ubuntu 10.04 + patches is.
I always found that chasing the ATI drivers eventually got me an unbootable system.
And when CVE-2010-3081 got fixed, I got one of those anyway.
This is the downside of ATI...
I can't find a minimum cat driver version in this thread, and the SDK didn't specify either that I can remember - so, what is the actual minimum?
For the Stream SDK 2.2 / OpenCL 1.1 it's the latest 10.9 driver, according to the SDK download page:
ATI Radeon™ HD - ATI Catalyst™ 10.9 Driver Suite
You can get them from the x-updates ppa:
https://launchpad.net/~ubuntu-x-swat/+archive/x-updates
____________
|
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
|
I can't find a minimum cat driver version in this thread, and the SDK didn't specify either that I can remember - so, what is the actual minimum?
10.7 is the absolute minimum, though it is only partially supported (and did not work perfectly for me on the tests earlier). 10.8 is probably the true minimum for the SDK that was previously used in testing earlier in the thread, but ATI changes these relatively rapidly so you should update to the latest for best results.
____________
141941*2^4299438-1 is prime!
|
|
|
blah Volunteer tester
Joined: 27 Sep 08 Posts: 19 ID: 29724 Credit: 3,462,933 RAC: 5
|
|
OS: Windows 7 64 bit - ATI CCC 10.9 - ATI Stream SDK 2.2 - OpenCL 1.1
CPU: Core 2 Quad 6600 @ 2.4 GHz
GPU: ATI HD 4770 @ 820/840 MHz
ppsieve version cl-0.2.0-beta (testing)
1) -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
Elapsed time: 8.72 sec. (0.05 init + 8.67 sieve) at 1178703 p/sec.
Processor time: 1.50 sec. (0.06 init + 1.44 sieve) at 7123434 p/sec.
Average processor utilization: 1.33 (init), 0.17 (sieve)
2) -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60
Elapsed time: 12.76 sec. (0.03 init + 12.73 sieve) at 803136 p/sec.
Processor time: 2.04 sec. (0.05 init + 2.00 sieve) at 5119967 p/sec.
Average processor utilization: 1.50 (init), 0.16 (sieve)
3) -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
Elapsed time: 12.79 sec. (0.06 init + 12.73 sieve) at 803136 p/sec.
Processor time: 2.31 sec. (0.06 init + 2.25 sieve) at 4551083 p/sec.
Average processor utilization: 1.00 (init), 0.18 (sieve)
Factors match for all 3 ranges.
|
|
|
Vato Volunteer tester
Joined: 2 Feb 08 Posts: 796 ID: 18447 Credit: 382,504,347 RAC: 225,569
|
|
Thanks chaps!
I'll go with your advice and see how it works.
(Though I'll probably wait to see if I can get it "for free" with ubuntu 10.10 which is only a few days away)
____________
|
|
|
|
|
|
OS: Windows 7 64 bit - ATI CCC 10.7 - ATI Stream SDK 2.2 - OpenCL 1.1
CPU: i7 860 @ 2.8 GHz
GPU: ATI HD 5700 @ 770/1000 MHz
ppsieve version cl-0.2.0-beta (testing)
1) -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
Elapsed time: 7.32 sec. (0.05 init + 7.27 sieve) at 1406581 p/sec.
Processor time: 1.42 sec. (0.08 init + 1.34 sieve) at 7620414 p/sec.
Average processor utilization: 1.47 (init), 0.18 (sieve)
2) -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60
Elapsed time: 10.44 sec. (0.04 init + 10.40 sieve) at 982984 p/sec.
Processor time: 1.84 sec. (0.08 init + 1.76 sieve) at 5799610 p/sec.
Average processor utilization: 1.95 (init), 0.17 (sieve)
3) -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
Elapsed time: 10.47 sec. (0.04 init + 10.44 sieve) at 979687 p/sec.
Processor time: 2.04 sec. (0.05 init + 2.00 sieve) at 5119967 p/sec.
Average processor utilization: 1.20 (init), 0.19 (sieve)
Factors match for all 3 ranges. |
|
|
pschoefer Volunteer developer Volunteer tester
Joined: 20 Sep 05 Posts: 667 ID: 845 Credit: 2,374,701,989 RAC: 15,281
|
|
i7 980X @ 4 GHz + HD5850 @ 725/1000
Win7 Prof. x64, Catalyst 10.9, SDK 2.2
ppsieve version cl-0.2.0-beta (testing)
1) -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
Elapsed time: 4.20 sec. (0.02 init + 4.18 sieve) at 2444532 p/sec.
Processor time: 0.59 sec. (0.03 init + 0.56 sieve) at 18204347 p/sec.
Average processor utilization: 1.42 (init), 0.13 (sieve)
2) -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60
Elapsed time: 6.11 sec. (0.02 init + 6.09 sieve) at 1678383 p/sec.
Processor time: 0.62 sec. (0.05 init + 0.58 sieve) at 17712310 p/sec.
Average processor utilization: 2.13 (init), 0.09 (sieve)
3) -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
Elapsed time: 6.06 sec. (0.02 init + 6.04 sieve) at 1691995 p/sec.
Processor time: 0.69 sec. (0.03 init + 0.66 sieve) at 15603714 p/sec.
Average processor utilization: 1.49 (init), 0.11 (sieve)
All expected factors found.
____________
|
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
|
I'm getting reports from a PSA tester of errors with the following range on Windows:
-p6900015e6 -P6900016e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
It should report the following factors, and does on my Linux64 CPU emulation:
6900015037673 | 7137*2^1445798-1
6900015118459 | 1577*2^1108206-1
6900015199963 | 8839*2^81555-1
6900015510929 | 6177*2^1704558-1
But I get errors on Win32 CPU emulation, which are different from the errors reported by the other user.
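Whatever the platform, a reported factor can be checked independently: p divides k*2^n+1 exactly when k*2^n ≡ -1 (mod p), and divides k*2^n-1 exactly when k*2^n ≡ 1 (mod p). The sketch below does this in plain shell arithmetic with square-and-multiply exponentiation; because a 45-bit p squared would overflow 64 bits, products are formed by doubling, which keeps every intermediate below 2p. It is an illustration for double-checking output by hand, not how ppsieve computes, and the function names are hypothetical.

```shell
#!/bin/sh
# Check whether p divides k*2^n+1 or k*2^n-1 using only 64-bit shell
# arithmetic. Requires p < 2^62 so that intermediates (< 2p) never overflow.

# a*b mod m by double-and-add (avoids forming the full product a*b).
mulmod() {
  local a=$(( $1 % $3 )) b=$(( $2 % $3 )) m=$3 r=0
  while [ "$b" -gt 0 ]; do
    if [ $(( b & 1 )) -eq 1 ]; then r=$(( (r + a) % m )); fi
    a=$(( (a * 2) % m ))
    b=$(( b >> 1 ))
  done
  echo "$r"
}

# base^e mod m by square-and-multiply.
powmod() {
  local b=$(( $1 % $3 )) e=$2 m=$3 r=1
  while [ "$e" -gt 0 ]; do
    if [ $(( e & 1 )) -eq 1 ]; then r=$(mulmod "$r" "$b" "$m"); fi
    b=$(mulmod "$b" "$b" "$m")
    e=$(( e >> 1 ))
  done
  echo "$r"
}

# Usage: is_factor "6900015037673 | 7137*2^1445798-1"
is_factor() {
  local line=$1 p k n sign t rest
  p=${line%% |*}        # text before " |"
  rest=${line#*| }      # "k*2^n+1" or "k*2^n-1"
  k=${rest%%"*"*}       # text before "*"
  rest=${rest#*^}       # "n+1" or "n-1"
  n=${rest%??}          # exponent without the sign
  sign=${rest#"$n"}     # "+1" or "-1"
  t=$(mulmod "$k" "$(powmod 2 "$n" "$p")" "$p")   # k*2^n mod p
  if [ "$sign" = "+1" ]; then
    [ "$t" = "$(( p - 1 ))" ]   # k*2^n = -1 (mod p)
  else
    [ "$t" = "1" ]              # k*2^n = +1 (mod p)
  fi
}
```

For example, `is_factor "6900015037673 | 7137*2^1445798-1" && echo ok` should print ok for the first factor in the list above; a factor line that fails this check points to a computation error rather than a platform difference.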
Can someone with Windows who's run the standard tests run this one to compare? How about someone with Linux 32-bit? That would clear up a few things.
Thanks!
____________
|
|
|
pschoefer Volunteer developer Volunteer tester
Joined: 20 Sep 05 Posts: 667 ID: 845 Credit: 2,374,701,989 RAC: 15,281
|
|
Works for me.
ppsieve-cl-x86-windows.exe -p6900015e6 -P6900016e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
ppsieve version cl-0.2.0-beta (testing)
nstart=70, nstep=30
ppsieve initialized: 1201 <= k <= 9999, 70 <= n < 2000000
Sieve started: 6900015000000 <= p < 6900016000000
Thread 0 starting
Detected 288 multiprocessors (1440 SPUs) on device 0.
Didn't change nstep from 30
6900015037673 | 7137*2^1445798-1
6900015118459 | 1577*2^1108206-1
6900015199963 | 8839*2^81555-1
6900015510929 | 6177*2^1704558-1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 6900015000000 <= p < 6900016000000
Found 4 factors
count=33918,sum=0x033f7557d250845a
Elapsed time: 0.87 sec. (0.01 init + 0.86 sieve) at 1222117 p/sec.
Processor time: 0.25 sec. (0.02 init + 0.23 sieve) at 4481075 p/sec.
Average processor utilization: 1.42 (init), 0.27 (sieve)
____________
Works for me too.
ppsieve version cl-0.2.0-beta (testing)
nstart=70, nstep=30
ppsieve initialized: 1201 <= k <= 9999, 70 <= n < 2000000
Sieve started: 6900015000000 <= p < 6900016000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
Didn't change nstep from 30
6900015037673 | 7137*2^1445798-1
6900015118459 | 1577*2^1108206-1
6900015199963 | 8839*2^81555-1
6900015510929 | 6177*2^1704558-1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 6900015000000 <= p < 6900016000000
Found 4 factors
count=33918,sum=0x033f7557d250845a
Elapsed time: 1.60 sec. (0.01 init + 1.59 sieve) at 660683 p/sec.
Processor time: 0.53 sec. (0.03 init + 0.50 sieve) at 2100500 p/sec.
Average processor utilization: 2.23 (init), 0.31 (sieve)
____________
blah Volunteer tester
Joined: 27 Sep 08 Posts: 19 ID: 29724 Credit: 3,462,933 RAC: 5
Worked for me also.
ppsieve-cl-x86-windows.exe -p6900015e6 -P6900016e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
ppsieve version cl-0.2.0-beta (testing)
nstart=70, nstep=30
ppsieve initialized: 1201 <= k <= 9999, 70 <= n < 2000000
Sieve started: 6900015000000 <= p < 6900016000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 0.
Didn't change nstep from 30
6900015037673 | 7137*2^1445798-1
6900015118459 | 1577*2^1108206-1
6900015199963 | 8839*2^81555-1
6900015510929 | 6177*2^1704558-1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 6900015000000 <= p < 6900016000000
Found 4 factors
count=33918,sum=0x033f7557d250845a
Elapsed time: 1.67 sec. (0.03 init + 1.64 sieve) at 640155 p/sec.
Processor time: 0.62 sec. (0.02 init + 0.61 sieve) at 1723486 p/sec.
Average processor utilization: 0.50 (init), 0.37 (sieve)
____________
Ken_g6 Volunteer developer
It worked for the guy with the original error as well! I'm now checking to see if his original command line produces the error.
But that was a good range to test anyway.
____________
It works for me as well:
ppsieve version cl-0.2.0-beta (testing)
nstart=70, nstep=30
ppsieve initialized: 1201 <= k <= 9999, 70 <= n < 2000000
Sieve started: 6900015000000 <= p < 6900016000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
Didn't change nstep from 30
6900015037673 | 7137*2^1445798-1
6900015118459 | 1577*2^1108206-1
6900015199963 | 8839*2^81555-1
6900015510929 | 6177*2^1704558-1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 6900015000000 <= p < 6900016000000
Found 4 factors
count=33918,sum=0x033f7557d250845a
Elapsed time: 2.11 sec. (0.03 init + 2.08 sieve) at 504579 p/sec.
Processor time: 0.58 sec. (0.02 init + 0.56 sieve) at 1867113 p/sec.
Average processor utilization: 0.56 (init), 0.27 (sieve)
____________
Is it safe to run it on BOINC? I'm running it on BOINC, so tell me if I have to stop...
Thanks a lot!
____________
Ken_g6 Volunteer developer
It certainly appears safe to run on BOINC. The Computation Errors only seem to appear when running with a sieve file. It's very possible that my Git merge created a bug in the sieve file reading code.
If you got a Computation Error in BOINC, your WU would error out. If you're not seeing that, there shouldn't be any problem.
____________
Ken_g6 Volunteer developer
Alright, PPSieve-CL v0.2.1-beta is ready for testing. It should be faster than the last version, but I'm really unsure by how much. OpenCL seems to be a relatively slow language.
Also, the sieve file computation errors bug is still a mystery. It's not something I changed in reading the sieve file. So long as it finds the right factors without a sieve file, it should be OK to use this in BOINC.
____________
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
i7-920
Windows Vista 64-bit
HD4670
ppsieve-cl-x86-windows.exe -p6900015e6 -P 6900016e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
ppsieve version cl-0.2.1-beta (testing)
nstart=70, nstep=30
ppsieve initialized: 1201 <= k <= 9999, 70 <= n < 2000000
Sieve started: 6900015000000 <= p < 6900016000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 0.
Didn't change nstep from 30
6900015037673 | 7137*2^1445798-1
6900015118459 | 1577*2^1108206-1
6900015199963 | 8839*2^81555-1
6900015510929 | 6177*2^1704558-1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 6900015000000 <= p < 6900016000000
Found 4 factors
count=33918,sum=0x033f7557d250845a
Elapsed time: 3.48 sec. (0.02 init + 3.46 sieve) at 303320 p/sec.
Processor time: 0.67 sec. (0.03 init + 0.64 sieve) at 1639414 p/sec.
Average processor utilization: 1.64 (init), 0.19 (sieve)
It still lists the 4670 incorrectly with 640 SPUs (should be 320).
Do you need it tested on the other ranges in the thread?
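The "Detected ... multiprocessors (... SPUs)" lines posted in this thread all fit one simple pattern, which may explain the HD4670 count. This is inferred from the outputs, not read from the ppsieve source. A CLInfo dump of a Juniper (HD 5770) card reports 10 compute units, and the app shows that card as 160 multiprocessors (800 SPUs), which works out to 16 per compute unit and 5 SPUs each. By the same rule, 288 (1440) and 128 (640) correspond to 18 and 8 compute units. If the HD4670's 320 SPUs are spread over those 8 compute units, each unit has only 8 five-wide thread processors, so an assumed factor of 16 would double the count. The helper names below are hypothetical.

```c
#include <assert.h>

/* Hypothetical reconstruction of the "Detected N multiprocessors (M SPUs)"
   line, inferred from the outputs posted in this thread rather than from
   the ppsieve source. Assumed: 16 thread processors per compute unit,
   each a 5-wide VLIW unit. */
enum { ASSUMED_TP_PER_CU = 16, ALUS_PER_TP = 5 };

static unsigned reported_multiprocessors(unsigned compute_units)
{
    return compute_units * ASSUMED_TP_PER_CU;
}

static unsigned reported_spus(unsigned compute_units)
{
    return reported_multiprocessors(compute_units) * ALUS_PER_TP;
}

/* SPU count when the real number of thread processors per compute unit
   is known for the chip in question. */
static unsigned actual_spus(unsigned compute_units, unsigned tp_per_cu)
{
    assert(tp_per_cu > 0);
    return compute_units * tp_per_cu * ALUS_PER_TP;
}
```

Here `actual_spus(8, 8)` gives 320, which matches the figure expected above. A real fix would need the per-chip thread-processor count rather than a constant.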
____________
141941*2^4299438-1 is prime!
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Here are the other ranges:
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.2.1-beta (testing)
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 0.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
p=42070006815745, 113.6K p/sec, 0.02 CPU cores, 68.2% done. ETA 17 Oct 20:52
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 95.62 sec. (0.04 init + 95.58 sieve) at 106966 p/sec.
Processor time: 1.37 sec. (0.06 init + 1.31 sieve) at 7801857 p/sec.
Average processor utilization: 1.42 (init), 0.01 (sieve)
The screen was horribly sluggish with this one.
ppsieve-cl-x86-windows.exe -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.2.1-beta (testing)
nstart=74, nstep=31
ppsieve initialized: 1201 <= k <= 9999, 74 <= n < 2000000
Sieve started: 20070000000000 <= p < 20070010000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 0.
nstep changed to 22
Computation Error: no candidates found for p=20070000113153 between 1004042 and 1254026.
20070000475957 | 4995*2^1822738+1
Computation Error: no candidates found for p=20070000860671 between 1504010 and 1753994.
20070001146497 | 4977*2^626298+1
20070001163929 | 3765*2^461308+1
20070001302811 | 7669*2^725426+1
20070001425977 | 5821*2^1775248+1
20070002245151 | 1221*2^646983+1
Computation Error: no candidates found for p=20070002489651 between 754058 and 1004042.
20070002606341 | 4809*2^497683+1
Computation Error: no candidates found for p=20070004648247 between 504074 and 754058.
20070004816819 | 6699*2^1215561+1
20070005914001 | 9847*2^1634140+1
20070006187837 | 9923*2^287853+1
20070006875981 | 1645*2^965954+1
20070007170259 | 3889*2^49730+1
20070008329039 | 9065*2^832569+1
Computation Error: no candidates found for p=20070009297743 between 1504010 and 1753994.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 20070000000000 <= p < 20070010000000
Found 13 factors
count=326136,sum=0x5ad678173464405c
Elapsed time: 19.85 sec. (0.03 init + 19.82 sieve) at 515875 p/sec.
Processor time: 1.48 sec. (0.05 init + 1.44 sieve) at 7123434 p/sec.
Average processor utilization: 1.51 (init), 0.07 (sieve)
No screen problems here...
ppsieve-cl-x86-windows.exe -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
ppsieve version cl-0.2.1-beta (testing)
nstart=74, nstep=31
ppsieve initialized: 1201 <= k <= 9999, 74 <= n < 2000000
Sieve started: 20070000000000 <= p < 20070010000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 0.
nstep changed to 22
Computation Error: no candidates found for p=20070000292703 between 1753994 and 2000000.
Computation Error: no candidates found for p=20070000462433 between 1504010 and 1753994.
20070000541441 | 3243*2^1584966-1
20070000674041 | 8143*2^1397047-1
Computation Error: no candidates found for p=20070001722619 between 1004042 and 1254026.
20070001823101 | 7647*2^1022532-1
20070001843627 | 4955*2^248864-1
Computation Error: no candidates found for p=20070002155237 between 1004042 and 1254026.
20070002680493 | 6455*2^1778260-1
Computation Error: no candidates found for p=20070002886811 between 754058 and 1004042.
20070003067151 | 9259*2^1869285-1
20070004606567 | 5269*2^257879-1
20070005404357 | 7893*2^796719-1
20070006186677 | 2413*2^1924519-1
20070007049707 | 2149*2^153375-1
20070007529777 | 1367*2^1532230-1
20070008213771 | 4041*2^1187467-1
20070008917537 | 6671*2^1260830-1
20070008991223 | 5523*2^101606-1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 20070000000000 <= p < 20070010000000
Found 14 factors
count=326136,sum=0x5ad678173464405c
Elapsed time: 19.89 sec. (0.03 init + 19.85 sieve) at 514992 p/sec.
Processor time: 1.61 sec. (0.05 init + 1.56 sieve) at 6553558 p/sec.
Average processor utilization: 1.42 (init), 0.08 (sieve)
____________
Ken_g6 Volunteer developer
> Here are the other ranges:
> ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
> ...
> Elapsed time: 95.62 sec. (0.04 init + 95.58 sieve) at 106966 p/sec.
> Screen was horribly sluggish with this one.
That ain't right.
> nstep changed to 22
> Computation Error: no candidates found for p=20070000113153 between 1004042 and 1254026.
> 20070000475957 | 4995*2^1822738+1
> Computation Error: no candidates found for p=20070000860671 between 1504010 and 1753994.
That ain't right either.
Looks like this one's going to need some work. :(
If someone's looking for a good version to use with BOINC, use this instead.
____________
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
> If someone's looking for a good version to use with BOINC, use this instead.
I get the same times and errors with the apps at this link.
____________
Ken_g6 Volunteer developer
That link might not have been accessible for anyone but me. I've now reverted to v0.2.0-beta, and I'm sticking with that.
Upon further reflection, it may not be possible to implement the latest algorithm on OpenCL. It appears that converting from vector long to vector int actually takes some operations in OpenCL; the algorithm depends on casting long to int with no cost.
So I'm sticking with v0.2.0 beta and never buying an AMD GPU. Hey, at least I finally made a decision! :P
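Why should a narrowing conversion cost anything at all? In scalar C (and scalar OpenCL C), casting a 64-bit unsigned value to 32 bits keeps the low 32 bits, and compilers usually do this by simply reading the low half of a register. OpenCL C does not allow casts between vector types, so a `ulong4` has to be narrowed with an explicit `convert_uint4()` call, which converts lane by lane. The sketch below only emulates the result of that conversion, using hypothetical `ulong4_t`/`uint4_t` structs in plain C. It says nothing about how many instructions the ATI compiler actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for OpenCL C's ulong4 and uint4 vector types. */
typedef struct { uint64_t s[4]; } ulong4_t;
typedef struct { uint32_t s[4]; } uint4_t;

/* Scalar narrowing: conversion to an unsigned type is reduced modulo 2^32,
   so only the low 32 bits survive. */
static uint32_t low_word(uint64_t x)
{
    return (uint32_t)x;
}

/* Emulates the result of convert_uint4() on a ulong4: the same narrowing,
   applied independently to each of the four lanes. */
static uint4_t convert_uint4_emulated(ulong4_t v)
{
    uint4_t r;
    for (int i = 0; i < 4; i++)
        r.s[i] = low_word(v.s[i]);
    return r;
}
```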
____________
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
> That link might not have been accessible for anyone but me. I've now reverted to v0.2.0-beta, and I'm sticking with that.
> Upon further reflection, it may not be possible to implement the latest algorithm on OpenCL. It appears that converting from vector long to vector int actually takes some operations in OpenCL; the algorithm depends on casting long to int with no cost.
> So I'm sticking with v0.2.0 beta and never buying an AMD GPU. Hey, at least I finally made a decision! :P
OpenCL certainly seems limited compared to native algorithms as with CUDA on the NVidia cards. Have you thought much about trying to do the app in ATI's native Brook/CAL? The Collatz project was able to do their app that way, but I have no idea how difficult it is to work with the ATI cards this way...
____________
Ken_g6 Volunteer developer
> OpenCL certainly seems limited compared to native algorithms as with CUDA on the NVidia cards. Have you thought much about trying to do the app in ATI's native Brook/CAL?
Yes.
> The Collatz project was able to do their app that way, but I have no idea how difficult it is to work with the ATI cards this way...
Well, I couldn't do it without buying an ATI card, and I don't want to buy an ATI card if it's going to be slower than an nVIDIA card. Catch-22!
Also, the current fastest algorithm on nVIDIA is very linear. ATI needs instruction-level parallelism, and evidently that's not easy to come by. So I'm not sure CAL could do much either. Certainly not sure enough to buy an ATI card.
On the other hand, the vectorizing I did on ATI only gave about a 33% speedup. It might be worth un-vectorizing it and applying the newest algorithm. But ATI/OpenCL is so unpredictable that I'm not inclined to try this soon.
____________
Rumor has it I'll get my vendor-"repaired" ATI 5770 back in the course of this week.
I would take a look at it, but CAL seemed like a big pain in the back to me, and its documentation is (or was?) not as good as CUDA's.
____________
Ken_g6 Volunteer developer
I found this PDF about CAL; judging from the time stamp, it's about four days old. Knock yourself out! :)
____________
Very interesting work, Ken_g6!
I'll have to see if I can get the code to compile and run on my Macintosh. So far it seems to be going well with only minor modifications, at least the compilation part.
Some progress (the executable name lies):
$ ./ppsieve-cl-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.2.0-beta (testing)
Compiled Oct 18 2010 with GCC 4.2.1 (Apple Inc. build 5659)
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
Error: Building Program (clBuildProgram): Program build failure
I still need to compile the BOINC libraries to get the -boinc version... but even this failure is promising at this point.
____________
GAAA, I hate ATI!!!!
# yum localinstall --nogpgcheck fglrx64_6_9_0-8.712-1.x86_64.rpm
Loaded plugins: kernel-module, security
Setting up Local Package Process
Examining fglrx64_6_9_0-8.712-1.x86_64.rpm: fglrx64_6_9_0-8.712-1.x86_64
Marking fglrx64_6_9_0-8.712-1.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package fglrx64_6_9_0.x86_64 0:8.712-1 set to be updated
--> Finished Dependency Resolution
Beginning Kernel Module Plugin
Finished Kernel Module Plugin
Dependencies Resolved
========================================================================================================================
Package Arch Version Repository Size
========================================================================================================================
Installing:
fglrx64_6_9_0 x86_64 8.712-1 /fglrx64_6_9_0-8.712-1.x86_64 121 M
Transaction Summary
========================================================================================================================
Install 1 Package(s)
Upgrade 0 Package(s)
Total size: 121 M
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : fglrx64_6_9_0 1/1
Error! Bad return status for module build on kernel: 2.6.18-194.17.1.el5 (x86_64)
Consult the make.log in the build directory
/var/lib/dkms/fglrx/8.712/build/ for more information.
Error! Invalid number of parameters passed.
Usage: remove -m <module> -v <module-version> --all
or: remove -m <module> -v <module-version> -k <kernel-version>
DKMS part of installation failed. Please refer to /usr/share/ati/fglrx-install.log for details
Installed:
fglrx64_6_9_0.x86_64 0:8.712-1
Complete!
____________
ATI is soooo coooool
# ./make.sh
AMD kernel module generator version 2.1
doing Makefile based build for kernel 2.6.x and higher
rm -rf *.c *.h *.o *.ko *.GCC* .??* *.symvers
make -C /lib/modules/2.6.18-194.17.1.el5/build SUBDIRS=/var/lib/dkms/fglrx/8.712/build/2.6.x modules
make[1]: Entering directory `/usr/src/kernels/2.6.18-194.17.1.el5-x86_64'
CC [M] /var/lib/dkms/fglrx/8.712/build/2.6.x/firegl_public.o
/var/lib/dkms/fglrx/8.712/build/2.6.x/firegl_public.c:2415: warning: 'kcl_flush_tlb_one' defined but not used
CC [M] /var/lib/dkms/fglrx/8.712/build/2.6.x/kcl_acpi.o
CC [M] /var/lib/dkms/fglrx/8.712/build/2.6.x/kcl_agp.o
CC [M] /var/lib/dkms/fglrx/8.712/build/2.6.x/kcl_debug.o
CC [M] /var/lib/dkms/fglrx/8.712/build/2.6.x/kcl_ioctl.o
/var/lib/dkms/fglrx/8.712/build/2.6.x/kcl_ioctl.c: In function 'KCL_IOCTL_AllocUserSpace32':
/var/lib/dkms/fglrx/8.712/build/2.6.x/kcl_ioctl.c:196: error: implicit declaration of function 'compat_alloc_user_space'
/var/lib/dkms/fglrx/8.712/build/2.6.x/kcl_ioctl.c:196: warning: return makes pointer from integer without a cast
make[2]: *** [/var/lib/dkms/fglrx/8.712/build/2.6.x/kcl_ioctl.o] Error 1
make[1]: *** [_module_/var/lib/dkms/fglrx/8.712/build/2.6.x] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.18-194.17.1.el5-x86_64'
make: *** [kmod_build] Error 2
build failed with return value 2
https://access.redhat.com/kb/docs/DOC-40265 (RHEL errata and solution)
# diff -u /tmp/kcl_ioctl.c 2.6.x/kcl_ioctl.c
--- /tmp/kcl_ioctl.c 2010-10-19 21:30:27.000000000 +0200
+++ 2.6.x/kcl_ioctl.c 2010-10-19 21:23:52.000000000 +0200
@@ -193,7 +193,7 @@
*/
void* ATI_API_CALL KCL_IOCTL_AllocUserSpace32(long size)
{
- return compat_alloc_user_space(size);
+ return arch_compat_alloc_user_space(size);
}
#endif // __x86_64__
____________
No luck. I have stream-sdk 2.1 and the related ICD:
$ ./ppsieve-cl-x86_64-linux -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
./ppsieve-cl-x86_64-linux: /usr/lib64/libOpenCL.so: no version information available (required by ./ppsieve-cl-x86_64-linux)
ppsieve version cl-0.2.0-beta (testing)
Compiled Oct 6 2010 with GCC 4.3.3
nstart=74, nstep=31
ppsieve initialized: 1201 <= k <= 9999, 74 <= n < 2000000
Sieve started: 20070000000000 <= p < 20070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
nstep changed to 22
Error: Building Program (clBuildProgram): Program build failure
/tmp/OCLL5MUom.cl(76): error: mixed vector-scalar operation not allowed unless
up-convertable(scalar-type=>vector-element-type)
SHIFTMOD_REDCX(21)
^
/tmp/OCLL5MUom.cl(76): error: mixed vector-scalar operation not allowed unless
up-convertable(scalar-type=>vector-element-type)
SHIFTMOD_REDCX(21)
^
/tmp/OCLL5MUom.cl(77): error: mixed vector-scalar operation not allowed unless
up-convertable(scalar-type=>vector-element-type)
SHIFTMOD_REDCX(42)
^
/tmp/OCLL5MUom.cl(77): error: mixed vector-scalar operation not allowed unless
up-convertable(scalar-type=>vector-element-type)
SHIFTMOD_REDCX(42)
^
/tmp/OCLL5MUom.cl(99): warning: variable "n" was declared but never referenced
uint n = D_NMIN;
^
/tmp/OCLL5MUom.cl(174): error: expression must have integral type
my_factor_found <<= shift;
^
5 errors detected in the compilation of "/tmp/OCLL5MUom.cl".
The stream-sdk samples do work:
$ /opt/ati-stream-sdk/samples/opencl/bin/x86_64/CLInfo
/opt/ati-stream-sdk/samples/opencl/bin/x86_64/CLInfo: /usr/lib64/libOpenCL.so: no version information available (required by /opt/ati-stream-sdk/samples/opencl/bin/x86_64/CLInfo)
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.0 ATI-Stream-v2.1 (145)
Platform Name: ATI Stream
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd
Platform Name: ATI Stream
Number of devices: 2
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Max compute units: 8
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 2001Mhz
Address bits: 64
Max memory allocation: 1073741824
Image support: No
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: No
Round to +ve and infinity: No
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 0
Cache size: 0
Global memory size: 3221225472
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x2b8217c24228
Name: Genuine Intel(R) CPU @ 0000 @ 2.00GHz
Vendor: GenuineIntel
Driver version: 1.1
Profile: FULL_PROFILE
Version: OpenCL 1.0 ATI-Stream-v2.1 (145)
Extensions: cl_khr_icd cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_printf
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Max compute units: 10
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 960Mhz
Address bits: 32
Max memory allocation: 268435456
Image support: No
Max size of kernel argument: 1024
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: No
Round to +ve and infinity: No
IEEE754-2008 fused multiply-add: No
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 268435456
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x2b8217c24228
Name: Juniper
Vendor: Advanced Micro Devices, Inc.
Driver version: CAL 1.4.556
Profile: FULL_PROFILE
Version: OpenCL 1.0 ATI-Stream-v2.1 (145)
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_amd_device_attribute_query
Passed!
$ ls -l /usr/lib64/libOpenCL*
lrwxrwxrwx 1 root root 14 Oct 12 19:15 /usr/lib64/libOpenCL.so -> libOpenCL.so.1
lrwxrwxrwx 1 root root 16 Oct 12 19:15 /usr/lib64/libOpenCL.so.1 -> libOpenCL.so.1.0
lrwxrwxrwx 1 root root 18 Oct 12 19:15 /usr/lib64/libOpenCL.so.1.0 -> libOpenCL.so.1.0.0
-rwxr-xr-x 1 root root 20968 Oct 12 19:15 /usr/lib64/libOpenCL.so.1.0.0
$ ls -l /etc/OpenCL/vendors/
total 12
-r--r--r-- 1 root root 15 Mar 5 2010 atiocl32.icd
-r--r--r-- 1 root root 15 Mar 5 2010 atiocl64.icd
lrwxrwxrwx 1 root root 23 Jun 15 23:20 libatiocl32.so -> /usr/lib/libatiocl32.so
lrwxrwxrwx 1 root root 25 Jun 15 23:20 libatiocl64.so -> /usr/lib64/libatiocl64.so
-r--r--r-- 1 root root 11 Oct 12 19:15 nvidia.icd
$ ls -l /usr/lib64/libatiocl64.so
lrwxrwxrwx 1 root root 45 Apr 9 2010 /usr/lib64/libatiocl64.so -> /opt/ati-stream-sdk/lib/x86_64/libatiocl64.so
$ ls -l /opt/ati-stream-sdk/lib/x86_64/libatiocl64.so
-rwxr-xr-x 1 a0062995 a0062995 12477000 Apr 15 2010 /opt/ati-stream-sdk/lib/x86_64/libatiocl64.so
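Some background on the `/etc/OpenCL/vendors` listing above: on Linux, the ICD loader behind `libOpenCL.so` discovers OpenCL implementations by reading every `*.icd` file in that directory. Each file holds the name (or path) of a vendor library, which the loader then opens with `dlopen()`. The sketch below only mimics that discovery step by printing which library each `.icd` file names. `list_icd_files` is a hypothetical helper, it loads nothing, and real loaders may differ in details.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper that mimics the discovery step of an OpenCL ICD
   loader: print "<file>: <library>" for each *.icd file in dir.
   Returns the number of .icd files read, or -1 if dir cannot be opened. */
static int list_icd_files(const char *dir)
{
    assert(dir != NULL);
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        size_t len = strlen(e->d_name);
        /* Only names ending in .icd count; entries such as libatiocl64.so are skipped. */
        if (len <= 4 || strcmp(e->d_name + len - 4, ".icd") != 0)
            continue;

        char path[4096];
        snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;

        char lib[1024] = "";
        if (fgets(lib, sizeof lib, f) != NULL)
            lib[strcspn(lib, "\r\n")] = '\0';   /* drop the trailing newline */
        fclose(f);

        printf("%s: %s\n", e->d_name, lib);
        count++;
    }
    closedir(d);
    return count;
}
```

Pointed at `/etc/OpenCL/vendors` on the machine above, it would print one line each for atiocl32.icd, atiocl64.icd and nvidia.icd, showing whatever library name each file contains.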
____________
Ken_g6 Volunteer developer
Nothing worked for me until I upgraded to the 2.2 SDK.
____________
Now I need a new driver, I love it so. The 10.9 driver can't be installed; I'll try 10.7...
____________
10.8 works, but only if you trick the driver: build the module while building the distro-specific RPM, then copy it over to your kernel-module extra dir. What a mess...
$ ./ppsieve-cl-x86_64-linux -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
./ppsieve-cl-x86_64-linux: /usr/lib64/libOpenCL.so: no version information available (required by ./ppsieve-cl-x86_64-linux)
ppsieve version cl-0.2.0-beta (testing)
Compiled Oct 6 2010 with GCC 4.3.3
nstart=74, nstep=31
ppsieve initialized: 1201 <= k <= 9999, 74 <= n < 2000000
Sieve started: 20070000000000 <= p < 20070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
nstep changed to 22
20070000541441 | 3243*2^1584966-1
20070000674041 | 8143*2^1397047-1
20070001823101 | 7647*2^1022532-1
20070001843627 | 4955*2^248864-1
20070002680493 | 6455*2^1778260-1
20070003067151 | 9259*2^1869285-1
20070004606567 | 5269*2^257879-1
20070005404357 | 7893*2^796719-1
20070006186677 | 2413*2^1924519-1
20070007049707 | 2149*2^153375-1
20070007529777 | 1367*2^1532230-1
20070008213771 | 4041*2^1187467-1
20070008917537 | 6671*2^1260830-1
20070008991223 | 5523*2^101606-1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 20070000000000 <= p < 20070010000000
Found 14 factors
count=326136,sum=0x5ad678173464405c
Elapsed time: 10.56 sec. (0.02 init + 10.54 sieve) at 970182 p/sec.
Processor time: 9.20 sec. (0.02 init + 9.18 sieve) at 1113732 p/sec.
Average processor utilization: 1.15 (init), 0.87 (sieve)
Scientific Linux SL release 5.5 (Boron)
Genuine Intel(R) CPU @ 0000 @ 2.00GHz (Xeon E5504 ES with HyperThreading)
6 GB RAM
MSI R5770 PMDIG (800 SPU @ 850 MHz; 1024 MB DDR5)
ati-stream-sdk-2.2
10.8-driver
gpuload as reported by aticonfig --odgc is 95%
____________
Ken_g6 Volunteer developer
Congrats! :)
____________
# time ./ppsieve-cl-boinc-x86_64-linux -p1186491e9 -P1186492e9 -k 1201 -K 9999 -N 2000000 -c 60
./ppsieve-cl-boinc-x86_64-linux: /usr/lib64/libOpenCL.so: no version information available (required by ./ppsieve-cl-boinc-x86_64-linux)
ppsieve version cl-0.2.0-beta (testing)
Compiled Oct 6 2010 with GCC 4.3.3
nstart=86, nstep=37
ppsieve initialized: 1201 <= k <= 9999, 86 <= n < 2000000
nstep changed to 32
(...)
Found 27 factors
real 8m36.841s
user 1m21.796s
sys 7m7.255
Can't open init data file - running in standalone mode
Sieve started: 1186491000000000 <= p < 1186492000000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
Thread 0 completed
Sieve complete: 1186491000000000 <= p < 1186492000000000
count=28805195,sum=0xbece431c19d67201
Elapsed time: 514.84 sec. (0.13 init + 514.71 sieve) at 1943001 p/sec.
Processor time: 509.01 sec. (0.14 init + 508.88 sieve) at 1965273 p/sec.
Average processor utilization: 1.02 (init), 0.99 (sieve)
called boinc_finish
Around 50% faster than a GT 240.
The factors found match the ones from my 9400 GT, which ran the same range.
____________
> I couldn't do it without buying an ATI card, and I don't want to buy an ATI card if it's going to be slower than an nVIDIA card. Catch-22!
I don't want to be disrespectful, but I think that with a decent algorithm, ATI cards should be much faster than comparable nVIDIA cards in these calculations.
Anyway, thanks for your work. It's also open source, so when somebody is able to take it further (time is not free, unfortunately), they can start from there.
____________
> I don't want to be disrespectful, but I think that with a decent algorithm, ATI cards should be much faster than comparable nVIDIA cards in these calculations.
FWIW... the real 'problem' here is OpenCL and its JIT... It's crap.
Just a tip: http://sourceforge.net/projects/calpp/ should make it easier...
____________
Maybe I'm asking this too early, since you're still testing the application and BOINC will come later. In that case, I'll ask again later. ;)
After pschoefer told me that there is a way to crunch for PG on an ATI card, I tested it under BOINC. And cool, it works fine (as long as I manually deactivate my onboard ATI HD 2400 on device 0).
Before that, I tested the sieve from cmd.exe; it found the correct ATI card and ignored the HD 2400.
Under BOINC, however, it starts one WU on device 0, which computes correctly (remember, according to BOINC, device 0 should be the HD 2400?!?), then it starts a second WU on device 1 (normally the ATI 5750?!?), which immediately produces an error.
The result (ignoring the wrong device output):
The ATI HD 5750 (725/1150 MHz; device 1) crunches and crunches and crunches... ;)
The ATI HD 2400 (when active) produces error after error after error... :/
My question: is there a way to exclude the HD 2400 from PG? The card could crunch for Collatz (RV610 chip... not fast, but the card works). Is there a way to specify that in app_info.xml, or is that a problem I can't solve?
Thank you for your attention.
____________
I doubt that you crunch with the CUDA version on an ATI card :)
I think you tried to get the OpenCL version to work on your 5750, and since OpenCL (to my knowledge) doesn't support your HD 2400, running the application from the command line selects the right card by simply ignoring the non-OpenCL-capable HD 2400.*
If you use app_info.xml files for Collatz and PG, you could try adding a --device X switch to the command lines, with X denoting the device you want to use for the respective BOINC project (the HD 2400 for Collatz and the 5750 for PG). But I don't know what BOINC will do if it only has work for the PG project and tries to start two work units, which would then end up on the same card (via the added --device X switch).
*Even if BOINC sees your HD 2400 as device 0, for the ATI OpenCL app device 0 is most probably the first OpenCL-capable device, and that is your 5750.
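The footnote above describes two different numbering schemes. As a simplified model (an assumption for illustration, not taken from BOINC or the AMD runtime), suppose BOINC numbers every GPU it lists, while the OpenCL runtime only enumerates cards it supports. Then an index meant for one list can point at a different card in the other. The struct and function below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a GPU as a BOINC client might list it. */
struct gpu {
    const char *name;
    int opencl_capable;   /* nonzero if the OpenCL runtime exposes this card */
};

/* Translate a BOINC-style index (position among all listed GPUs) into the
   index the same card has among OpenCL devices, which skip unsupported
   cards. Returns -1 if the card is not visible to OpenCL at all. */
static int boinc_to_opencl_index(const struct gpu *gpus, size_t n, size_t boinc_idx)
{
    assert(boinc_idx < n);
    if (!gpus[boinc_idx].opencl_capable)
        return -1;

    int cl_idx = 0;
    for (size_t i = 0; i < boinc_idx; i++)
        if (gpus[i].opencl_capable)
            cl_idx++;
    return cl_idx;
}
```

If the HD 2400 is listed first, the 5750 is device 1 to BOINC but device 0 to OpenCL. That is consistent with the behaviour reported above, although the model ignores how BOINC and the AMD runtime actually order cards.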
____________
CUDA... was wrong, yep :D My mistake, and struck out in the text. And yes, I meant the OpenCL version. ;)
Okay, that explains why in BOINC it crunches on device 0 (even though BOINC tells me that device 0 should be the HD 2400). Thanks for your explanation.
I will try it via app_info.xml... and pschoefer will have to explain it; he knows more about that. :D
Thanks for that hint. :)
____________
Wherever I searched (BOINC's website and their "help wiki"), there's no answer to my question. What BOINC could use is an <ignore_gpu>1|2|3|n</ignore_gpu> tag for app_info.xml. :/
____________
there is an option for the cc_config.xml file which is <ignore_ati_dev>N</ignore_ati_dev>, but i don't know if it will work in an app_info. |
Thanks for answering, tepek.
It seems that the <ignore_ati_dev>N</ignore_ati_dev> option only works in BOINC's global settings file.
Haven't gotten it to work so far. I have SDK 2.2 installed and just installed fresh drivers. Running Win 7 64-bit on a Dell OptiPlex 755 with an ATI Radeon HD 2400:
>ppsieve-cl-x86-windows.exe -p6900015e6 -P6900016e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
ppsieve version cl-0.2.0-beta (testing)
nstart=70, nstep=30
ppsieve initialized: 1201 <= k <= 9999, 70 <= n < 2000000
Sieve started: 6900015000000 <= p < 6900016000000
Thread 0 starting
Error: Creating Context. (clCreateContextFromType): Device not found.
Spamguy, as you can read on http://developer.amd.com/gpu/atistreamsdk/pages/default.aspx, your card is not listed, so it won't work. ;)
But if you want the card to do something: Collatz@home (I can't think of a second project offhand) also works with the Radeon HD 2400.
D'oh, you're right. I didn't expect the 2400 to be omitted from any compatibility lists; it's not that old (~3 years) or rare.
Collatz? Meh. Primes (or lack thereof, if we're dealing with sieves) are cooler. :)
The 2400 for PG? That would solve my own device 1 problem. :D
ATI Radeon HD 5870, Catalyst 10.9
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.2.0-beta (testing)
Compiled Oct 6 2010 with GCC 4.3.3
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
stderr.txt
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 2.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 6.79 sec. (0.01 init + 6.77 sieve) at 1509057 p/sec.
Processor time: 6.79 sec. (0.01 init + 6.78 sieve) at 1508006 p/sec.
Average processor utilization: 0.99 (init), 1.00 (sieve)
called boinc_finish
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 0.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 4.81 sec. (0.04 init + 4.77 sieve) at 2142853 p/sec.
Processor time: 3.55 sec. (0.01 init + 3.54 sieve) at 2888216 p/sec.
Average processor utilization: 0.37 (init), 0.74 (sieve)
called boinc_finish
____________
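Each result line such as "42070000070587 | 9475*2^197534+1" states that the prime p divides k*2^n+1, so a reported factor can be spot-checked on the CPU with modular exponentiation alone, without ever building the million-digit number. A minimal Python sketch; the regular expression and check_factor_line are illustrative names, not part of ppsieve.

```python
import re

# Matches result lines of the form "p | k*2^n+1" or "p | k*2^n-1".
FACTOR_LINE = re.compile(r"^(\d+) \| (\d+)\*2\^(\d+)([+-])1$")


def check_factor_line(line: str) -> bool:
    """Return True if p really divides k*2^n+1 (or k*2^n-1)."""
    m = FACTOR_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a factor line: {line!r}")
    p, k, n = int(m.group(1)), int(m.group(2)), int(m.group(3))
    c = 1 if m.group(4) == "+" else -1
    # k*2^n + c is astronomically large, so only ever work modulo p.
    return (k * pow(2, n, p) + c) % p == 0


reported = """\
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070008858187 | 2893*2^317690+1"""
print(all(check_factor_line(line) for line in reported.splitlines()))
```

The same check applies unchanged to the k*2^n-1 lines printed when -R is used.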
Your 5870 is slower than my 5770?
$ time ./ppsieve-cl-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
./ppsieve-cl-x86_64-linux: /usr/lib64/libOpenCL.so: no version information available (required by ./ppsieve-cl-x86_64-linux)
ppsieve version cl-0.2.0-beta (testing)
Compiled Oct 6 2010 with GCC 4.3.3
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 6.39 sec. (0.03 init + 6.36 sieve) at 1607500 p/sec.
Processor time: 6.33 sec. (0.03 init + 6.30 sieve) at 1624074 p/sec.
Average processor utilization: 1.10 (init), 0.99 (sieve)
One step closer to Mac support. I got the BOINC libraries compiled, and the BOINC version of the app compiled, but I get the same error as before:
$ ./ppsieve-cl-boinc-x86_64-mac -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -60
ppsieve version cl-0.2.0-beta (testing)
Compiled Oct 22 2010 with GCC 4.2.1 (Apple Inc. build 5664)
ppsieve-cl-boinc-x86_64-mac: invalid option -- 6
01:14:16 (11821): called boinc_finish
$ cat stderr.txt
01:12:23 (11800): Can't open init data file - running in standalone mode
pmax not specified, using default pmax = pmin + 1e9
Please specify an input file or all of kmin, kmax, and nmax
01:12:23 (11800): called boinc_finish
01:12:48 (11805): Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
Error: Building Program (clBuildProgram): Program build failure
01:12:49 (11805): called boinc_finish
Now to debug the source of the program build failure. Up to this point there haven't been any code changes, just some redefining of files in headers and editing of the Makefile.
I seem to have missed the "c" in "-c 60" in the above example, but the error remains. Tackling that error is the next project.
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
Oh, yeah, this was the error:
ppsieve-cl-boinc-x86_64-mac: invalid option -- 6
-c does nothing on BOINC. Try this for starters:
$ ./ppsieve-cl-boinc-x86_64-mac -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000
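Why "-60" produced "invalid option -- 6": that message appears to come from getopt-style parsing, where the characters after a single dash are read as option letters, so "-60" is taken as an option named 6 rather than "-c 60". Python's standard getopt module follows the same conventions as C's getopt, so it makes a convenient illustration; the option string below is an assumption for the demo, not ppsieve's real option table.

```python
import getopt

# Assumed option table for this demo only: a letter followed by ':'
# takes an argument (as -p, -P, -k, -K, -N and -c do above).
SHORT_OPTS = "p:P:k:K:N:c:"


def parse(argv):
    """Parse argv the getopt way; report unknown options like the C library."""
    try:
        opts, _args = getopt.getopt(argv, SHORT_OPTS)
        return opts
    except getopt.GetoptError as err:
        return f"invalid option -- {err.opt}"


print(parse(["-N", "2000000", "-60"]))       # invalid option -- 6
print(parse(["-N", "2000000", "-c", "60"]))  # [('-N', '2000000'), ('-c', '60')]
```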
____________
So... As long as the problem with two GPUs (where only one produces correct results) exists in BOINC (and there's no other way to disable one GPU for a specific project), I'll disable the wrong one. :/
A better solution for all projects would be an '<ignore_GPU_dev>N</ignore_GPU_dev>' option for the project's app_info.xml (as I said). ;)
On a Radeon 5870 (900 MHz core, Catalyst 10.10), one WU in the PPS sieve took about 215 s. Is that good or not? :)
____________
Polish National Team
On a Radeon 5870 (900 MHz core, Catalyst 10.10), one WU in the PPS sieve took about 215 s. Is that good or not? :)
My HD 5850 takes around 330 s, but Fermi cards currently do it in around 100 s. :/
____________
Is there a chance that this application will speed up like the CUDA app did?
I don't think that's everything the 5870 can do.
____________
Polish National Team
Is there a chance that this application will speed up like the CUDA app did?
I don't think that's everything the 5870 can do.
It's in development by Ken. I'm also hoping for a speed increase soon.
____________
Ken_g6 Volunteer developer
Is there a chance that this application will speed up like the CUDA app did?
I don't think that's everything the 5870 can do.
It's in development by Ken. I'm also hoping for a speed increase soon.
Don't hold your breath. I tried the exact same speedup I used for CUDA, but it made the OpenCL version slower. :(
I think this might have to do with the vectorization I did for OpenCL. So I could try un-vectorizing and then trying that speedup. But AMD needs so much ILP (Instruction-Level Parallelism) that I wouldn't expect much of a boost.
What I'm working on now is bugfixes for larger nmin.
____________
Anything new about the -n parameter error?
I still can't build the app with the provided Makefile and am currently looking into AP26 with ATI Stream (why? no idea... ;) ).
I will build my own Makefile sometime this week.
Ken_g6 Volunteer developer
Alright, I'm back! A week ago I thought I had the -n fix almost done. Then I got my GTX 460 in the mail, and I just couldn't wait to install it. I scratched my motherboard badly and had to order a new one; but I'm finally back up and running, and I think I've finally fixed the -n problem!
So, download v0.2.3-beta and give it a try. I changed the version so you know it's nothing like v0.2.2 of the CUDA app. This version uses a lot more CPU, and won't be any faster than v0.2.0.
On the other hand, this could be a template for a much faster low-level Stream app, if anyone wants to give it a shot. At a low level, I don't know of any GPU that works directly with 64-bit numbers, so you'd have to handle the numbers as pairs of 32-bit numbers. But all that would be needed is the main loop, plus the optimized <=32-bit shiftmod that can be found in the CUDA app. As long as there's a umul32hi instruction, and add with carry, it should be possible.
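Handling 64-bit numbers as pairs of 32-bit numbers can be sketched in Python by emulating 32-bit registers with masks. Here umul32hi stands in for the high-half multiply instruction mentioned above and add_with_carry for an add-with-carry; both are hypothetical helper names, and the sketch only builds a full 64x64-bit product from 32-bit pieces. It is not the CUDA app's shiftmod or main loop.

```python
MASK32 = 0xFFFFFFFF


def umul32lo(a: int, b: int) -> int:
    """Low 32 bits of a 32x32-bit product."""
    return (a * b) & MASK32


def umul32hi(a: int, b: int) -> int:
    """High 32 bits of a 32x32-bit product (like a GPU mul-high instruction)."""
    return (a * b) >> 32


def add_with_carry(a: int, b: int, carry_in: int = 0):
    """32-bit add of two words plus a carry bit; returns (sum, carry_out)."""
    s = a + b + carry_in
    return s & MASK32, s >> 32


def mul64x64(x: int, y: int):
    """Multiply two 64-bit numbers using only 32-bit operations.

    Returns the 128-bit product as four 32-bit words, least significant first.
    """
    x_hi, x_lo = x >> 32, x & MASK32
    y_hi, y_lo = y >> 32, y & MASK32

    # Schoolbook multiplication: four 32x32 partial products, each split lo/hi.
    w0 = umul32lo(x_lo, y_lo)
    w1, c1 = add_with_carry(umul32hi(x_lo, y_lo), umul32lo(x_lo, y_hi))
    w1, c2 = add_with_carry(w1, umul32lo(x_hi, y_lo))

    w2, c3 = add_with_carry(umul32hi(x_lo, y_hi), umul32hi(x_hi, y_lo), c1)
    w2, c4 = add_with_carry(w2, umul32lo(x_hi, y_hi), c2)

    # The true product is < 2^128, so the top word cannot overflow.
    w3 = umul32hi(x_hi, y_hi) + c3 + c4
    return w0, w1, w2, w3


x, y = 42070000070587, 0xFFFFFFFFFFFFFFFF
w = mul64x64(x, y)
print(sum(word << (32 * i) for i, word in enumerate(w)) == x * y)  # True
```

A 64-bit modular multiply for the sieve would then reduce this 128-bit result modulo p, which is where the optimized code in the real app does its work.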
Edit: I'm going to try again to install that GTX 460 shortly. If I don't show up here again for a while, you can guess why. :P
____________
pschoefer Volunteer developer Volunteer tester
Joined: 20 Sep 05 Posts: 667 ID: 845 Credit: 2,374,701,989 RAC: 15,281
i7 980X @ 4 GHz + HD5850 @ 725/1000
Win7 Prof. x64, Catalyst 10.10, SDK 2.2
ppsieve version cl-0.2.3-beta (testing)
1) -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
Elapsed time: 3.94 sec. (0.04 init + 3.91 sieve) at 2616594 p/sec.
Processor time: 0.59 sec. (0.03 init + 0.56 sieve) at 18204347 p/sec.
Average processor utilization: 0.87 (init), 0.14 (sieve)
Slightly faster than 0.2.0-beta, same CPU usage.
2) -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60
Elapsed time: 5.69 sec. (0.03 init + 5.67 sieve) at 1803958 p/sec.
Processor time: 0.87 sec. (0.03 init + 0.84 sieve) at 12136224 p/sec.
Average processor utilization: 1.16 (init), 0.15 (sieve)
Almost 10% faster, 30% more CPU usage.
3) -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
CL setup complete.
cthread_count = 18432
Computation Error: no candidates found for p=20070000113153 between 1254026 and 1504010.
4) -p20070e9 -P20070010e6 -k 5 -K 9999 -n 2M -N 3M -c 60
Elapsed time: 5.69 sec. (0.03 init + 5.67 sieve) at 1803958 p/sec.
Processor time: 0.87 sec. (0.03 init + 0.84 sieve) at 12136224 p/sec.
Average processor utilization: 1.16 (init), 0.15 (sieve)
While -n is working now, -R is broken. :(
____________
My initial results are below. However, 8 cores were running other projects, and the ATI card was also running another project during this test. I will redo the test with all cores idle... please wait.
F:\PRP Client>ppsieve-cl-x86-windows.exe -p1186491e9 -P1186492e9 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.2.3-beta (testing)
nstart=86, nstep=37
ppsieve initialized: 1201 <= k <= 9999, 86 <= n < 2000000
Sieve started: 1186491000000000 <= p < 1186492000000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
nstep changed to 32
CL setup complete.
cthread_count = 10240
1186491022263817 | 6941*2^233525+1
1186491080146633 | 4395*2^1291390+1
p=1186491091488257, 1.525M p/sec, 0.08 CPU cores, 9.1% done. ETA 05 Nov 17:53
1186491095011007 | 6337*2^7988+1
p=1186491184811521, 1.538M p/sec, 0.07 CPU cores, 18.5% done. ETA 05 Nov 17:53
1186491191752583 | 2241*2^901315+1
1186491214124857 | 3423*2^1464822+1
p=1186491278134785, 1.539M p/sec, 0.07 CPU cores, 27.8% done. ETA 05 Nov 17:53
1186491289770991 | 1857*2^1984791+1
1186491356920087 | 9471*2^1760528+1
p=1186491371458049, 1.539M p/sec, 0.07 CPU cores, 37.1% done. ETA 05 Nov 17:53
1186491384738463 | 2525*2^1518847+1
1186491451160011 | 1887*2^794562+1
1186491453475903 | 3635*2^1262489+1
p=1186491464781313, 1.542M p/sec, 0.07 CPU cores, 46.5% done. ETA 05 Nov 17:53
1186491483260371 | 5675*2^560553+1
1186491508531129 | 9239*2^82853+1
1186491511971901 | 9795*2^1854582+1
1186491512556677 | 3021*2^1745789+1
1186491522406637 | 4739*2^561617+1
p=1186491558104577, 1.540M p/sec, 0.08 CPU cores, 55.8% done. ETA 05 Nov 17:53
1186491583230223 | 1481*2^23921+1
1186491594148577 | 7941*2^821073+1
p=1186491651427841, 1.542M p/sec, 0.07 CPU cores, 65.1% done. ETA 05 Nov 17:53
1186491695054123 | 7941*2^863421+1
1186491696289171 | 6759*2^1711227+1
1186491701397437 | 6485*2^911463+1
p=1186491744751105, 1.543M p/sec, 0.07 CPU cores, 74.5% done. ETA 05 Nov 17:53
1186491773175427 | 9039*2^1690489+1
1186491781207897 | 8925*2^339084+1
1186491784054111 | 3859*2^679438+1
p=1186491838336513, 1.547M p/sec, 0.07 CPU cores, 83.8% done. ETA 05 Nov 17:53
1186491879046813 | 6889*2^1550574+1
1186491888090709 | 3273*2^1456432+1
p=1186491931659777, 1.539M p/sec, 0.07 CPU cores, 93.2% done. ETA 05 Nov 17:53
1186491941241479 | 3339*2^326445+1
1186491961620941 | 9493*2^1403080+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 1186491000000000 <= p < 1186492000000000
Found 27 factors
count=28805195,sum=0xbece431c19d67201
Elapsed time: 650.43 sec. (0.26 init + 650.17 sieve) at 1538174 p/sec.
Processor time: 47.35 sec. (0.27 init + 47.08 sieve) at 21241630 p/sec.
Average processor utilization: 1.04 (init), 0.07 (sieve)
Ken_g6 Volunteer developer
While -n is working now, -R is broken. :(
ARGH! I hate ATI!
The good news is the GPU went in (almost) without a hitch, so I will be able to look into this.
____________
I agree, CUDA is way more convenient. I haven't looked into the app further and am still stuck porting AP26. I will download the new sieve app next week and give it a try.
Ken_g6 Volunteer developer
Now -R is fixed, in v0.2.3 non-beta. Plus, I added TPSieve-CL binaries to the zipfile.
I'm hoping this kills the last of the bugs, but let me know if you find more.
____________
pschoefer Volunteer developer Volunteer tester
I just did several tests (-R, -n, and with a sieve file), and everything is working now. TPSieve is also working. But I can't get more than 80% GPU load; any ideas?
____________
Ken_g6 Volunteer developer
It's possible that moving the initialization to the CPU has made the app either CPU-limited or GPU bandwidth limited.
As long as it's working, at this point I'm happy. And now that I have my GTX 460, I'd kind of like to wash my hands of ATI and OpenCL.
____________
There is a small speedup with the new app on my HD 5850:
old: 5:30 min
new: 4:50 min
But it is still slower than a GTX 460.
Thanks, Ken, for the optimization. I can't wait for a new, faster version!
____________
Does this app use double precision, i.e. will it not work on older/cheaper ATI cards?
Ken_g6 Volunteer developer
Does this app use double precision, i.e. will it not work on older/cheaper ATI cards?
No, but it's OpenCL, so it won't work on cards older than the 4000 series.
As for speedups, I might be able to produce a small one, but I'm waiting to see if pschoefer can produce a bigger one by switching to a lower-level Stream language.
____________
STE\/E Volunteer tester
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
I have the PPS WUs running on a dual HD 5870 box: 7 min for 2 WUs and 14 min for 4 WUs.
Ken_g6 Volunteer developer
A question: For those of you who installed the latest 10.10 drivers, were you able to get this app to run without installing the SDK? (I.e. Did you get it running by downloading nothing but the drivers from ATI/AMD?) If so, that could be a big step towards making this an official app.
____________
STE\/E Volunteer tester
I couldn't get the WUs to run at first by installing the SDK package. Finally I uninstalled everything except the v10.5 video drivers, installed the v10.10 drivers, and the WUs took off running okay.
Of course I had to put the app file and the exe file in the directory too. I also put cudart.dll in, not knowing whether it was needed.
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
A question: For those of you who installed the latest 10.10 drivers, were you able to get this app to run without installing the SDK? (I.e. Did you get it running by downloading nothing but the drivers from ATI/AMD?) If so, that could be a big step towards making this an official app.
I upgraded to 10.10 and can run the app without also downloading the separate SDK (I'm not sure whether the upgrade wipes out all of the older SDK install... it would be nice to have someone try it as a completely fresh install).
HD4670 on 64-bit Vista.
____________
141941*2^4299438-1 is prime!
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1905 ID: 352 Credit: 4,056,750,098 RAC: 4,353,900
I upgraded to 10.10 and can run the app without also downloading the separate SDK (I'm not sure whether the upgrade wipes out all of the older SDK install... it would be nice to have someone try it as a completely fresh install).
Which edition? The Accelerated Parallel Processing (APP) edition contains the OpenCL drivers...
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186
A question: For those of you who installed the latest 10.10 drivers, were you able to get this app to run without installing the SDK? (I.e. Did you get it running by downloading nothing but the drivers from ATI/AMD?) If so, that could be a big step towards making this an official app.
There are two versions of Catalyst (the devs said that maybe there will only be one in the future, when OpenCL is more stable):
- one is the classic version, which still needs the SDK
- the other has the OpenCL driver integrated (APP edition); if you install this one, you don't need any separate SDK to make your application work.
STE\/E Volunteer tester
This has probably been brought up before, but the two biggest things I notice running the PPS WUs on my ATI cards are, first, that they use a lot of CPU, 20%+ per core at times, so it's probably better not to run any CPU work?
Second, memory speed doesn't seem to play a role: they run just as fast with my memory clock set at 1000 as at 300...
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
I upgraded to 10.10 and can run the app without also downloading the separate SDK (I'm not sure whether the upgrade wipes out all of the older SDK install... it would be nice to have someone try it as a completely fresh install).
Which edition? The Accelerated Parallel Processing (APP) edition contains the OpenCL drivers...
APP version...
____________
141941*2^4299438-1 is prime!
I couldn't get the WUs to run at first by installing the SDK package. Finally I uninstalled everything except the v10.5 video drivers, installed the v10.10 drivers, and the WUs took off running okay.
Of course I had to put the app file and the exe file in the directory too. I also put cudart.dll in, not knowing whether it was needed.
I would like to get this working but lack the knowledge of what to do. What are the steps to get this working with either Win 7 or Linux?
(I would rather mess up my Win 7 host than my Linux hosts.)
STE\/E Volunteer tester
I couldn't get the WUs to run at first by installing the SDK package. Finally I uninstalled everything except the v10.5 video drivers, installed the v10.10 drivers, and the WUs took off running okay.
Of course I had to put the app file and the exe file in the directory too. I also put cudart.dll in, not knowing whether it was needed.
I would like to get this working but lack the knowledge of what to do. What are the steps to get this working with either Win 7 or Linux?
(I would rather mess up my Win 7 host than my Linux hosts.)
I don't know much about Win 7 or Linux, so it's best if someone else helps you. If no one does, I'll take a stab at Win 7 later on, but I'm too busy right now to give it a shot...
Here is what I have done so far to make it run on Windows 7 64-bit:
Stop BOINC.
Install ATI drivers 10.9 (but try 10.10 with the OpenCL SDK drivers).
Install SDK 2.2 (not needed with 10.10 with the OpenCL drivers):
http://developer.amd.com/gpu/ATIStreamSDK/Pages/default.aspx#two
Reboot so that everything is set up properly.
Download the PPSieve-OpenCL file from the beginning of this thread.
Inside are files to unzip into the PrimeGrid subdirectory,
C:\ProgramData\BOINC\projects\www.primegrid.com\
ppsieve-cl-boinc-x86-windows.exe
ppsieve-cl-x86-windows.exe
tpsieve-cl-boinc-x86-windows.exe (a future part of the project, I think).
Then you have to create an app_info.xml file (see message 26255 in this thread).
Use Notepad to copy and paste it, and save it in the same directory.
By the way, I have not deleted any other files, and I was crunching PPS on the CPU before with the last application (a big file, TRP_20100604.sieveinput, is still present).
I have selected only PPS work in my PrimeGrid account preferences.
Then start BOINC again and check whether it runs.
Ken_g6 Volunteer developer
I've just released v0.2.3a, which fixes a couple of bugs in TPSieve-CL, and allows the dummy -T option so TPSieve won't get confused with PPSieve.
____________
STE\/E Volunteer tester
Some of us are new to all of this, Ken. I just managed to get my boxes running the PPS WUs, then they shut that part down, and now I have to figure out how to get them to run the TPS WUs.
So when you make a new release, just what are we supposed to do with it??? Right now I'm still running the PPS WUs, and it's no simple thing for me to suddenly have to switch everything to something else. I spent a good 8-10 hours yesterday getting them to run the PPS WUs; now it looks like I have to spend another 8-10 switching to something else.
What driver do I need for the new 0.2.3?
It does not work with my 10.8 drivers and my 2.2 SDK; it only runs on the CPU (it says 640 SPUs detected...).
I think I will drop the ATI completely and swap it for the GTX 260 that I sold to a colleague...
Ken_g6 Volunteer developer
What driver do I need for the new 0.2.3?
10.10 APP drivers would work, as we've been discussing.
It does not work with my 10.8 drivers and my 2.2 SDK; it only runs on the CPU (it says 640 SPUs detected...).
That shouldn't happen, unless you built the app yourself and included -D_DEVICEEMU. If it's not using 90% of all your CPU cores, it's not only running on your CPU. Have you tried the 0.2.3a binaries?
I think I will drop the ATI completely and swap it for the GTX 260 that I sold to a colleague...
If no one's going to write a CAL version of this app, that may not be a bad idea.
____________
App 0.2.3 is running with 10.10 without any issues.
____________
(...)
That shouldn't happen, unless you built the app yourself and included -D_DEVICEEMU. If it's not using 90% of all your CPU cores, it's not only running on your CPU. Have you tried the 0.2.3a binaries?
It is the app from the download; it says cl-0.2.3 (testing), uses all 8 HT cores, and yields 156791 p/sec.
Runtime for -p42070e9 -P42070010e6:
cl-0.2.0-beta (testing) 8.272 s
cl-0.2.3 (testing) 67.57 s
I think I'll have to go the extra mile, build the 10.10 driver, and test again.
Okay, got it, but it is still not working:
# time ./ppsieve-cl-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 0
./ppsieve-cl-x86_64-linux: /usr/lib64/libOpenCL.so: no version information available (required by ./ppsieve-cl-x86_64-linux)
ppsieve version cl-0.2.3 (testing)
Compiled Nov 6 2010 with GCC 4.3.3
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Resuming from checkpoint p=42070002883585 in ppcheck42070e9.txt
Thread 0 starting
FATAL: Module nvidia not found.
Detected 128 multiprocessors (640 SPUs) on device 0.
CL setup complete.
cthread_count = 8192
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 47.33 sec. (0.02 init + 47.31 sieve) at 155146 p/sec.
Processor time: 362.66 sec. (0.02 init + 362.63 sieve) at 20241 p/sec.
Average processor utilization: 1.05 (init), 7.66 (sieve)
real 0m47.336s
user 6m2.421s
sys 0m0.292s
What's going on with the nvidia module?
The old version runs as well as before.
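The "Average processor utilization" figures in these logs appear to be processor time divided by elapsed time: for the run above, 362.63 / 47.31 is about 7.66 for the sieve, and earlier in the thread 3.54 / 4.77 is about 0.74. That reading is inferred from the posted numbers, not from the ppsieve source. Under it, a value close to the number of CPU threads means the sieve is running on the CPU, which fits Ken's earlier remark about CPU usage. A minimal Python sketch that recomputes the figure from the two log lines; TIME_LINE and sieve_utilization are illustrative names.

```python
import re

# Matches e.g. "Elapsed time: 47.33 sec. (0.02 init + 47.31 sieve) at ..."
TIME_LINE = re.compile(
    r"^(Elapsed|Processor) time: ([\d.]+) sec\. \(([\d.]+) init \+ ([\d.]+) sieve\)"
)


def sieve_utilization(log: str) -> float:
    """CPU cores busy during the sieve phase: processor time / elapsed time."""
    sieve = {}
    for line in log.splitlines():
        m = TIME_LINE.match(line.strip())
        if m:
            sieve[m.group(1)] = float(m.group(4))
    return sieve["Processor"] / sieve["Elapsed"]


log = """\
Elapsed time: 47.33 sec. (0.02 init + 47.31 sieve) at 155146 p/sec.
Processor time: 362.66 sec. (0.02 init + 362.63 sieve) at 20241 p/sec."""
cores = sieve_utilization(log)
print(f"{cores:.2f} CPU cores busy")  # 7.66 CPU cores busy
```

The printed times are already rounded, so recomputed values can differ slightly from the app's own figure in the last digit.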
Working for me:
Win7 64
SDK 2.2
Catalyst 10.7
>ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000
ppsieve version cl-0.2.3a (testing)
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
CL setup complete.
cthread_count = 10240
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 8.48 sec. (0.04 init + 8.44 sieve) at 1210686 p/sec.
Processor time: 1.09 sec. (0.08 init + 1.01 sieve) at 10082392 p/sec.
>ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -R
ppsieve version cl-0.2.3a (testing)
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
CL setup complete.
cthread_count = 10240
42070000154219 | 6023*2^934790-1
42070001803331 | 5237*2^486598-1
42070003062431 | 7465*2^1994555-1
42070005645821 | 3633*2^119620-1
42070007733361 | 7007*2^1691614-1
42070008458437 | 7095*2^1422761-1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 6 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 8.23 sec. (0.04 init + 8.19 sieve) at 1248081 p/sec.
Processor time: 1.12 sec. (0.05 init + 1.08 sieve) at 9497909 p/sec.
Average processor utilization: 1.14 (init), 0.13 (sieve)
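The -R run sieves Riesel numbers k*2^n-1 instead of Proth numbers k*2^n+1, which is why a different set of factors appears. For a single odd p, the idea behind this kind of sieve can be sketched by stepping n and solving for k directly: p divides k*2^n+1 exactly when k ≡ -2^(-n) (mod p), and p divides k*2^n-1 exactly when k ≡ 2^(-n) (mod p). The Python below is a simplified illustration assuming p > kmax, so each n yields at most one k; sieve_one_prime is a hypothetical name, and this is not ppsieve's actual GPU code.

```python
from typing import Iterator, Tuple


def sieve_one_prime(p: int, kmin: int, kmax: int, nmin: int, nmax: int,
                    riesel: bool = False) -> Iterator[Tuple[int, int]]:
    """Yield (k, n) with odd k in [kmin, kmax] and nmin <= n < nmax such that
    p divides k*2^n + 1 (or k*2^n - 1 when riesel=True).

    Assumes p is odd and p > kmax, so each n gives at most one k.
    """
    inv2 = (p + 1) // 2          # 2^-1 mod p, valid because p is odd
    t = pow(inv2, nmin, p)       # t = 2^-n mod p for the current n
    for n in range(nmin, nmax):
        k = t if riesel else (p - t) % p
        # This sketch skips even k: k*2^n equals (k/2)*2^(n+1).
        if kmin <= k <= kmax and k % 2 == 1:
            yield k, n
        t = t * inv2 % p         # advance to 2^-(n+1)


# Tiny example: 3*2^1+1 = 7 and 5*2^2+1 = 21 = 3*7.
print(list(sieve_one_prime(7, 1, 5, 1, 4)))  # [(3, 1), (5, 2)]
```

For example, list(sieve_one_prime(42070000154219, 1201, 9999, 934700, 934900, riesel=True)) should contain (6023, 934790) if the -R line 42070000154219 | 6023*2^934790-1 above is correct.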
Ken_g6 Volunteer developer
If the old version works fine, I probably just forgot to "make clean" this time. There will be another version, as TPSieve isn't working right yet.
Edit: That is not the app from the current download. The current app says:
ppsieve version cl-0.2.3a (testing)
Compiled Nov 9 2010 with GCC 4.3.3
But there will still be another version.
____________
Ken_g6 Volunteer developer
And I finally found the bug - mainly in TPSieve-CL, but it could affect PPSieve-CL at high ranges of P. So, get v0.2.3b, which may be slightly slower, but should work much better.
____________
>ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000
ppsieve version cl-0.2.3b (testing)
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
CL setup complete.
cthread_count = 10240
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 16.78 sec. (0.04 init + 16.74 sieve) at 610695 p/sec.
Processor time: 1.00 sec. (0.05 init + 0.95 sieve) at 10743539 p/sec.
Average processor utilization: 1.14 (init), 0.06 (sieve)
>ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -R
ppsieve version cl-0.2.3b (testing)
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
CL setup complete.
cthread_count = 10240
42070000154219 | 6023*2^934790-1
42070001803331 | 5237*2^486598-1
42070003062431 | 7465*2^1994555-1
42070005645821 | 3633*2^119620-1
42070007733361 | 7007*2^1691614-1
42070008458437 | 7095*2^1422761-1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 6 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 17.34 sec. (0.05 init + 17.29 sieve) at 591166 p/sec.
Processor time: 1.23 sec. (0.08 init + 1.15 sieve) at 8856163 p/sec.
Average processor utilization: 1.70 (init), 0.07 (sieve)
No problems at the moment.
Ken_g6 Volunteer developer
Technically, if nstep <=32, or 31 for TPSieve, you'll have no problem with the old version. Since I don't think nstep will go above 32 until P is around 1,000,000T or higher, the old PPSieve version should work for now. But soon we won't be using it anymore anyway.
____________
But ppsieve will be used over at PST for ppse4/5M and rsp4/5M. It's not much, but I'm glad you did it :)
Even if the OpenCL app is much slower than the CUDA one, my HD 4870 sieves at nearly the same speed as my i7 920 with -t8. For me it's like having a second i7 I can use for PrimeGrid.
(OK, NVIDIA card owners have 6 more PCs :) )
Ken_g6 Volunteer developer
And I just found a way to restore the previous speed without the previous bug. :) So let me know how v0.2.3c works.
____________
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
And I just found a way to restore the previous speed without the previous bug. :) So let me know how v0.2.3c works.
HD4670
i7-920 Vista HP 64-bit
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000
ppsieve version cl-0.2.3c (testing)
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 0.
CL setup complete.
cthread_count = 8192
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 18.06 sec. (0.03 init + 18.03 sieve) at 567159 p/sec.
Processor time: 0.87 sec. (0.03 init + 0.84 sieve) at 12136224 p/sec.
Average processor utilization: 0.92 (init), 0.05 (sieve)
____________
141941*2^4299438-1 is prime!
|
|
|
|
|
|
Just a note on the drivers: I had removed the ATI Stream SDK and updated to 10.10 using Steam. Then I had to install the 'OpenCL Driver' package (from the 'Individual Downloads'). The alternative is to use the 'Accelerated Parallel Processing (APP) Technology Edition' driver package. (Now I just have to find out how to get an app_info.xml to actually work with it.)
HD5850 Windows 7 64-bit
>ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000
ppsieve version cl-0.2.3c (testing)
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 288 multiprocessors (1440 SPUs) on device 0.
CL setup complete.
cthread_count = 18432
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 5.49 sec. (0.04 init + 5.44 sieve) at 1877853 p/sec.
Processor time: 1.03 sec. (0.03 init + 1.00 sieve) at 10239938 p/sec.
Average processor utilization: 0.73 (init), 0.18 (sieve)
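The setup lines in the reports posted in this thread follow a consistent pattern: SPUs = 5 x multiprocessors and cthread_count = 64 x multiprocessors (128/640/8192 on the HD4670, 160/800/10240 on the HD5770, 288/1440/18432 on the HD5850, 320/1600/20480 on the HD5870). This is inferred only from these logs, not from the ppsieve-cl source, so treat it as an observation. The sketch below (hypothetical helpers) extracts the numbers from a log so another card's output can be compared.

```python
import re


def setup_numbers(log: str) -> tuple[int, int, int]:
    """Return (multiprocessors, SPUs, cthread_count) from a ppsieve-cl log."""
    det = re.search(r"Detected (\d+) multiprocessors \((\d+) SPUs\)", log)
    thr = re.search(r"cthread_count = (\d+)", log)
    if det is None or thr is None:
        raise ValueError("log lacks the 'Detected ...' or 'cthread_count' line")
    return int(det.group(1)), int(det.group(2)), int(thr.group(1))


def matches_observed_pattern(log: str) -> bool:
    """Check the ratios seen in this thread's logs (an observation, not a spec)."""
    mp, spus, threads = setup_numbers(log)
    return spus == 5 * mp and threads == 64 * mp
```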
|
|
|
|
|
ATI 5770 // CAT 10.9 // Core clock 740 // Mem Clock 885
Win 7 64bit
i7 Intel 860
ppsieve version cl-0.2.3c (testing)
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
CL setup complete.
cthread_count = 10240
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 8.47 sec. (0.08 init + 8.39 sieve) at 1218188 p/sec.
Processor time: 1.31 sec. (0.06 init + 1.25 sieve) at 8191947 p/sec.
Average processor utilization: 0.82 (init), 0.15 (sieve)
Looks to work fine for me.
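Every successful run of the standard Proth test in this thread printed the same summary: "Found 10 factors" and count=318533,sum=0xb9f8cbeb13d00db3. The -R run further up printed the same count and sum but 6 factors, so count/sum alone apparently does not tell the two sides apart, and the factor count is worth checking too. A minimal sketch of that comparison (the reference tuple is copied from these logs; the helper names are hypothetical):

```python
import re

# Copied from the Proth-side test runs in this thread
# (-p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000).
REFERENCE = (318533, 0xB9F8CBEB13D00DB3, 10)


def summary(log: str) -> tuple[int, int, int]:
    """Extract (count, sum, factors found) from a ppsieve/tpsieve log."""
    cs = re.search(r"count=(\d+),sum=0x([0-9a-fA-F]+)", log)
    found = re.search(r"Found (\d+) factors", log)
    if cs is None or found is None:
        raise ValueError("log lacks the count/sum or 'Found N factors' line")
    return int(cs.group(1)), int(cs.group(2), 16), int(found.group(1))


def matches_reference(log: str, reference=REFERENCE) -> bool:
    """True if the run's count, sum and factor count all match the reference."""
    return summary(log) == reference
```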
Quote Msg 27890
(Now I just have to find out how to get an app_info.xml to actually work with it.)
I too would be interested to know how to set up the app to run automatically,
e.g. getting work from the server and sending results back.
I understand this could be done with the 'app_info.xml' file?
Any detailed help is appreciated... What I've found in the forums hasn't really helped so far.
Thank you
|
|
|
|
|
Under the first post is a line:
Only the first post and the last 75 posts (of the 188 posts in this thread) are displayed.
Click here to also display the remaining posts.
You can click it to see all posts including the post with the old version of the ATI/OpenCL app_info.xml.
Anyway, here is a new one for tpsieve:
<app_info>
<app>
<name>pps_sr2sieve</name>
<user_friendly_name>Proth Prime Search (Sieve)</user_friendly_name>
</app>
<file_info>
<name>tpsieve-cl-boinc-x86-windows.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>pps_sr2sieve</app_name>
<version_num>130</version_num>
<plan_class>ati13ati</plan_class>
<avg_ncpus>0.05</avg_ncpus>
<max_ncpus>1</max_ncpus>
<flops>1.0e11</flops>
<coproc>
<type>ATI</type>
<count>1</count>
</coproc>
<cmdline></cmdline>
<file_ref>
<file_name>tpsieve-cl-boinc-x86-windows.exe</file_name>
<main_program/>
</file_ref>
</app_version>
</app_info>
This app_info.xml is designed for the new ATI/OpenCL/GPU work only.
Make sure you have finished all PrimeGrid tasks and reported them back.
Windows Installation Instructions (32+64 bit):
1. Shutdown BOINC (Client, Manager and all running science apps).
2. Save the lines above into a file called app_info.xml.
3. Copy the file (app_info.xml) and the tpsieve-cl-boinc-x86-windows.exe into the project directory (which might be hidden).
On my Vista installation this is the directory:
C:\ProgramData\BOINC\projects\www.primegrid.com
4. Restart BOINC.
Be aware that at the moment there is no PPS Sieve work available, since PrimeGrid is in the middle of the transition from ppsieve (sieving either the Proth or the Riesel side) to tpsieve (sieving the Proth and Riesel sides simultaneously).
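The app_info.xml files posted in this thread differ only in a few fields: executable name, app name, version_num, plan_class, coproc type and count, and avg_ncpus. A small generator keeps those fields consistent, in particular the executable name, which has to match in both file_info and file_ref. This is a sketch; the defaults simply copy the file above, and the helper name is hypothetical.

```python
from xml.sax.saxutils import escape

# Same layout as the app_info.xml posted above; {placeholders} are filled in.
TEMPLATE = """<app_info>
<app>
<name>{app}</name>
<user_friendly_name>Proth Prime Search (Sieve)</user_friendly_name>
</app>
<file_info>
<name>{exe}</name>
<executable/>
</file_info>
<app_version>
<app_name>{app}</app_name>
<version_num>{version}</version_num>
<plan_class>{plan}</plan_class>
<avg_ncpus>{avg_ncpus}</avg_ncpus>
<max_ncpus>1</max_ncpus>
<flops>1.0e11</flops>
<coproc>
<type>{gpu_type}</type>
<count>{gpu_count}</count>
</coproc>
<cmdline></cmdline>
<file_ref>
<file_name>{exe}</file_name>
<main_program/>
</file_ref>
</app_version>
</app_info>
"""


def build_app_info(exe="tpsieve-cl-boinc-x86-windows.exe", app="pps_sr2sieve",
                   version=130, plan="ati13ati", avg_ncpus=0.05,
                   gpu_type="ATI", gpu_count=1) -> str:
    """Fill the template; the same exe string lands in file_info and file_ref."""
    fields = dict(exe=exe, app=app, version=version, plan=plan,
                  avg_ncpus=avg_ncpus, gpu_type=gpu_type, gpu_count=gpu_count)
    return TEMPLATE.format(**{k: escape(str(v)) for k, v in fields.items()})


if __name__ == "__main__":
    # Save the printed text as app_info.xml in the project directory
    # (with BOINC shut down, as in step 1 of the instructions).
    print(build_app_info())
```

Later posts in the thread use the official PrimeGrid executable with version_num 135 and, for two WUs per GPU, a coproc count of 0.5; those map to the exe, version and gpu_count arguments.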
____________
|
|
|
STE\/E Volunteer tester
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
|
|
Thanks Ralf ...
|
|
|
|
|
Thank you so much, Paladin and Ralf!
Followed your instructions and it seems to work now... GREAT!!
However, one more question:
a) Be aware of the fact that at the moment there is no PPS Sieve work available...
So when will new work be available?
Thanks
|
|
|
|
Thank you so much, Paladin and Ralf!
Followed your instructions and it seems to work now... GREAT!!
However, one more question:
a) Be aware of the fact that at the moment there is no PPS Sieve work available...
So when will new work be available?
Thanks
When all PPS sieve WUs in progress are completed, the switch will occur. Yesterday the number of WUs in progress dropped to slightly over 16,000.
To quote the admins (John):
We expect to be up and running again next week.
You can take a look at this thread for the current state of affairs.
____________
|
|
|
|
|
|
ATI HD5770
Intel Xeon E5504 ES
Scientific Linux 5.5 (2.6.18-194.17.4.el5) x86_64
time ./ppsieve-cl-x86_64-linux -p20070e9 -P20070030e6 -k 1201 -K 9999 -N 2000000 -c 60
./ppsieve-cl-x86_64-linux: /usr/lib64/libOpenCL.so: no version information available (required by ./ppsieve-cl-x86_64-linux)
ppsieve version cl-0.2.3c (testing)
Compiled Nov 11 2010 with GCC 4.3.3
nstart=74, nstep=31
ppsieve initialized: 1201 <= k <= 9999, 74 <= n < 2000000
Sieve started: 20070000000000 <= p < 20070030000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
nstep changed to 22
CL setup complete.
cthread_count = 10240
20070000475957 | 4995*2^1822738+1
20070001146497 | 4977*2^626298+1
20070001163929 | 3765*2^461308+1
20070001302811 | 7669*2^725426+1
20070001425977 | 5821*2^1775248+1
20070002245151 | 1221*2^646983+1
20070002606341 | 4809*2^497683+1
20070004816819 | 6699*2^1215561+1
20070005914001 | 9847*2^1634140+1
20070006187837 | 9923*2^287853+1
20070006875981 | 1645*2^965954+1
20070007170259 | 3889*2^49730+1
20070008329039 | 9065*2^832569+1
20070012507137 | 7061*2^1730371+1
20070014814419 | 6923*2^1257277+1
20070014902183 | 6981*2^1674047+1
20070014977063 | 4809*2^470009+1
20070015687191 | 5973*2^502229+1
20070016440869 | 9577*2^34776+1
20070016765411 | 1421*2^1766083+1
20070016992617 | 7001*2^1177373+1
20070017505563 | 7627*2^401848+1
20070017763227 | 6625*2^1332032+1
20070018281791 | 4685*2^1510525+1
20070019870763 | 8189*2^2803+1
20070020027809 | 5031*2^723009+1
20070020746903 | 8401*2^1665760+1
20070020822867 | 1285*2^1300724+1
20070021227887 | 5441*2^715393+1
20070021461873 | 5251*2^1961728+1
20070021656557 | 9639*2^1501295+1
20070025749457 | 4769*2^1235959+1
20070026004331 | 6885*2^1246616+1
20070026054417 | 6445*2^911818+1
20070027042589 | 9033*2^1767986+1
20070027157451 | 9965*2^1295127+1
20070027234133 | 2485*2^20872+1
20070028746013 | 5707*2^1770344+1
20070029659087 | 7945*2^230508+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 20070000000000 <= p < 20070030000000
Found 39 factors
count=979501,sum=0x10d16054a061b5d7
Elapsed time: 25.25 sec. (0.02 init + 25.23 sieve) at 1194742 p/sec.
Processor time: 21.78 sec. (0.02 init + 21.76 sieve) at 1385177 p/sec.
Average processor utilization: 1.10 (init), 0.86 (sieve)
real 0m25.271s
user 0m2.850s
sys 0m18.976s
|
|
|
|
|
xxx tpsieve-cl-x86-windows.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 200000 -c 60 -M2
tpsieve version cl-0.2.3c (testing)
nstart=76, nstep=32
tpsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
nstep changed to 22
CL setup complete.
cthread_count = 10240
42070000070587 | 9475*2^197534+1
42070000154219 | 6023*2^934790-1
42070000198537 | 3373*2^1046686+1
42070001803331 | 5237*2^486598-1
42070003062431 | 7465*2^1994555-1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070005645821 | 3633*2^119620-1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070007733361 | 7007*2^1691614-1
42070008458437 | 7095*2^1422761-1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
42070010190569 | 5625*2^1903125+1
42070011430123 | 3821*2^1406279+1
42070012209011 | 9405*2^360411-1
42070012301263 | 1957*2^1185814+1
42070013521999 | 1965*2^404493+1
42070013970587 | 7143*2^1462422+1
42070013989247 | 5037*2^838603+1
42070016416499 | 4571*2^466510-1
42070017332953 | 6237*2^1916994+1
42070018235321 | 1941*2^363948+1
42070019117111 | 2523*2^999263-1
42070019542387 | 8587*2^1703626+1
42070021901227 | 6589*2^1149693-1
42070023987581 | 9811*2^318944+1
42070024242289 | 8319*2^1792800-1
42070024339237 | 9257*2^1170495+1
42070024532551 | 4311*2^1690093+1
42070024936837 | 5679*2^1726142+1
42070024995961 | 9111*2^1707153+1
42070026021997 | 4039*2^1819590+1
42070026719239 | 9981*2^629165-1
42070027452199 | 1323*2^854008+1
42070028029061 | 8205*2^1394191-1
42070029006583 | 5943*2^663870+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 40 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 45.02 sec. (0.04 init + 44.98 sieve) at 670183 p/sec.
Processor time: 1.98 sec. (0.05 init + 1.93 sieve) at 15584353 p/sec.
Average processor utilization: 1.34 (init), 0.04 (sieve)
Does that mean the thread was executed mainly on the GPU?
Sorry... newbie
|
|
|
|
Elapsed time: 45.02 sec. (0.04 init + 44.98 sieve) at 670183 p/sec.
Processor time: 1.98 sec. (0.05 init + 1.93 sieve) at 15584353 p/sec.
Average processor utilization: 1.34 (init), 0.04 (sieve)
Does that mean the thread was executed mainly on the GPU?
Yes :)
The runtime was 45 seconds and the CPU was in use for only 2 seconds of that time.
The CPU utilization was around 4.4%.
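The same ratio can be computed from any run's two timing lines: processor (CPU) time divided by elapsed (wall-clock) time. A small parser for the log format shown in this thread (hypothetical helper; the regex is based only on these lines):

```python
import re

# e.g. "Elapsed time: 45.02 sec. (0.04 init + 44.98 sieve) at 670183 p/sec."
TIME_LINE = re.compile(
    r"(Elapsed|Processor) time: ([\d.]+) sec\. \(([\d.]+) init \+ ([\d.]+) sieve\)")


def cpu_share(log: str) -> dict[str, float]:
    """CPU time / wall-clock time, for the whole run and for the sieve phase."""
    t = {kind: tuple(map(float, nums)) for kind, *nums in TIME_LINE.findall(log)}
    wall, cpu = t["Elapsed"], t["Processor"]
    return {"overall": cpu[0] / wall[0], "sieve": cpu[2] / wall[2]}


log = """Elapsed time: 45.02 sec. (0.04 init + 44.98 sieve) at 670183 p/sec.
Processor time: 1.98 sec. (0.05 init + 1.93 sieve) at 15584353 p/sec."""
for phase, share in cpu_share(log).items():
    print(f"{phase}: {share:.1%}")
```

Run on the Linux log posted earlier in the thread, the sieve share comes out near 0.86, matching the "Average processor utilization" line printed there.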
____________
|
|
|
|
|
|
With my 5870 (Catalyst 10.10) on my Q6600 Linux system (Gentoo, 2.6.34 kernel), I get this:
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.2.3c (testing)
Compiled Nov 11 2010 with GCC 4.3.3
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
CL setup complete.
cthread_count = 20480
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
And also
./ppsieve-cl-x86_64-linux -p20070e9 -P20070030e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.2.3c (testing)
Compiled Nov 11 2010 with GCC 4.3.3
nstart=74, nstep=31
ppsieve initialized: 1201 <= k <= 9999, 74 <= n < 2000000
Sieve started: 20070000000000 <= p < 20070030000000
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 0.
nstep changed to 22
CL setup complete.
cthread_count = 20480
20070000475957 | 4995*2^1822738+1
20070001146497 | 4977*2^626298+1
20070001163929 | 3765*2^461308+1
20070001302811 | 7669*2^725426+1
20070001425977 | 5821*2^1775248+1
20070002245151 | 1221*2^646983+1
20070002606341 | 4809*2^497683+1
20070004816819 | 6699*2^1215561+1
20070005914001 | 9847*2^1634140+1
20070006187837 | 9923*2^287853+1
20070006875981 | 1645*2^965954+1
20070007170259 | 3889*2^49730+1
20070008329039 | 9065*2^832569+1
20070012507137 | 7061*2^1730371+1
20070014814419 | 6923*2^1257277+1
20070014902183 | 6981*2^1674047+1
20070014977063 | 4809*2^470009+1
20070015687191 | 5973*2^502229+1
20070016440869 | 9577*2^34776+1
20070016765411 | 1421*2^1766083+1
20070016992617 | 7001*2^1177373+1
20070017505563 | 7627*2^401848+1
20070017763227 | 6625*2^1332032+1
20070018281791 | 4685*2^1510525+1
20070019870763 | 8189*2^2803+1
20070020027809 | 5031*2^723009+1
20070020746903 | 8401*2^1665760+1
20070020822867 | 1285*2^1300724+1
20070021227887 | 5441*2^715393+1
20070021461873 | 5251*2^1961728+1
20070021656557 | 9639*2^1501295+1
20070025749457 | 4769*2^1235959+1
20070026004331 | 6885*2^1246616+1
20070026054417 | 6445*2^911818+1
20070027042589 | 9033*2^1767986+1
20070027157451 | 9965*2^1295127+1
20070027234133 | 2485*2^20872+1
20070028746013 | 5707*2^1770344+1
20070029659087 | 7945*2^230508+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 20070000000000 <= p < 20070030000000
Found 39 factors
count=979501,sum=0x10d16054a061b5d7
Elapsed time: 11.88 sec. (0.01 init + 11.87 sieve) at 2539866 p/sec.
Processor time: 11.85 sec. (0.01 init + 11.84 sieve) at 2545612 p/sec.
Average processor utilization: 0.72 (init), 1.00 (sieve)
____________
|
|
|
|
|
|
Hello!
Can anyone tell me when we will get a large number of PPS Sieve ATI tasks?
Thanks a lot.
Greetings from Germany.
|
|
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
|
|
Take a look at http://www.primegrid.com/stats_pps_sieve.php.
Enough work for all.
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
STE\/E Volunteer tester
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
|
|
I've been messing around for 3+ hours now but can't get the WUs to download, apparently because I don't have the proper app file. I keep getting various error messages when BOINC starts up. It sees the app file, but no matter how I set it up it doesn't accept it. I'm using the following app file...
<app_info>
<app>
<name>tps_sr2sieve</name>
<user_friendly_name>Proth Prime Search (Sieve)</user_friendly_name>
</app>
<file_info>
<name>primegrid_tpsieve_1.35_windows_intelx86__cuda23.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>tps_sr2sieve</app_name>
<version_num>135</version_num>
<plan_class>cuda23</plan_class>
<avg_ncpus>0.05</avg_ncpus>
<max_ncpus>1</max_ncpus>
<flops>1.0e11</flops>
<coproc>
<type>CUDA</type>
<count>0.5</count>
</coproc>
<cmdline></cmdline>
<file_ref>
<file_name>primegrid_tpsieve_1.35_windows_intelx86__cuda23.exe</file_name>
<main_program/>
</file_ref>
</app_version>
</app_info>
NOTE: I'm getting this message too. I manually put the .exe file in the PrimeGrid directory, but when BOINC starts up it deletes it from the directory and then tells me it's not there...
11/21/2010 7:31:34 AM | PrimeGrid | Found app_info.xml; using anonymous platform
11/21/2010 7:31:34 AM | PrimeGrid | File referenced in app_info.xml does not exist: primegrid_tpsieve_1.35_windows_intelx86__cuda23.exe
|
|
|
|
I've been messing around for 3+ hr's now but can't get the Wu's to download because apparently I don't have the Proper App File. I keep getting various error messages when BOINC starts up. It sees the App file but no matter how I set it up it doesn't accept it. I'm using the following App File ...
NOTE: I'm getting this Message too: I manually put the .exe file in the PrimeGrid Directory but when BOINC starts up it Deletes it out of the Directory and then tells me it's not there ...
11/21/2010 7:31:34 AM | PrimeGrid | Found app_info.xml; using anonymous platform
11/21/2010 7:31:34 AM | PrimeGrid | File referenced in app_info.xml does not exist: primegrid_tpsieve_1.35_windows_intelx86__cuda23.exe
You have changed too much. This is probably the source of the mistake:
Line 3: <name>tps_sr2sieve</name>
Line 11: <app_name>tps_sr2sieve</app_name>
Change this to:
<name>pps_sr2sieve</name>
and
<app_name>pps_sr2sieve</app_name>
____________
|
|
|
STE\/E Volunteer tester
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
|
|
Okay, that works, Ralf. I changed it and got work on one box anyway. Are the ATI WUs running yet? I see the application is posted.
EDIT: Looks like it takes under 9 minutes to run 4 on the GTX 580s I picked up a few days ago... :)
|
|
|
|
Okay that works Ralf, I changed it and got work on 1 Box anyway, are the ATI Wu running yet, I see the application is posted ???
The app is out, but people complain about getting no work for it. Here is a working app_info.xml for the ATI app from Ken's archive:
ATI/OpenCL app_info.xml:
<app_info>
<app>
<name>pps_sr2sieve</name>
<user_friendly_name>Proth Prime Search (Sieve)</user_friendly_name>
</app>
<file_info>
<name>tpsieve-cl-boinc-x86-windows.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>pps_sr2sieve</app_name>
<version_num>135</version_num>
<plan_class>ati13ati</plan_class>
<avg_ncpus>0.05</avg_ncpus>
<max_ncpus>1</max_ncpus>
<flops>1.0e11</flops>
<coproc>
<type>ATI</type>
<count>1</count>
</coproc>
<cmdline></cmdline>
<file_ref>
<file_name>tpsieve-cl-boinc-x86-windows.exe</file_name>
<main_program/>
</file_ref>
</app_version>
</app_info>
You can find the archive here:
sites.google.com/site/kenscode/prime-programs
in the ppsieve-cl archive. It contains tpsieve-cl-boinc-x86-windows.exe too.
The primegrid version of the app is here:
http://www.primegrid.com/download/primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe
If you use this one, you need to change the name of the executable in app_info.xml (both the file_info name and the file_ref file_name).
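Renaming by hand means changing the executable name in two places, file_info/name and file_ref/file_name; missing one leaves the two entries pointing at different files. A sketch of a text-level rename that refuses to proceed unless both entries are found. It assumes the tags are written without inner whitespace, as in the files posted here; the helper name is hypothetical.

```python
def rename_executable(xml_text: str, old: str, new: str) -> str:
    """Swap the executable name in both <file_info><name> and <file_ref><file_name>.

    Works on the raw text so the rest of the file keeps its exact layout.
    Raises ValueError if either entry is missing, instead of half-renaming.
    """
    pairs = [(f"<name>{old}</name>", f"<name>{new}</name>"),
             (f"<file_name>{old}</file_name>", f"<file_name>{new}</file_name>")]
    for before, _ in pairs:
        if before not in xml_text:
            raise ValueError(f"{before} not found in app_info.xml")
    for before, after in pairs:
        xml_text = xml_text.replace(before, after)
    return xml_text


if __name__ == "__main__":
    sample = ("<file_info>\n<name>tpsieve-cl-boinc-x86-windows.exe</name>\n</file_info>\n"
              "<file_ref>\n<file_name>tpsieve-cl-boinc-x86-windows.exe</file_name>\n</file_ref>\n")
    print(rename_executable(sample, "tpsieve-cl-boinc-x86-windows.exe",
                            "primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe"))
```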
____________
|
|
|
STE\/E Volunteer tester
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
|
|
Okay, thanks. I'll give it a try and see what happens...
|
|
|
|
|
Is running 2 WUs on 1 GPU more efficient than 1?
____________
Polish National Team
|
|
|
|
Is the 2 WU on 1 GPU more efficient than 1?
It was slightly more efficient, but that might have changed with the introduction of longer WUs. Since BOINC needs quite some time to start a new WU (4-5 seconds), running two was a way to reduce the GPU's idle time between WUs that required only 135-140 seconds to crunch.
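The reasoning is simple arithmetic. If each WU leaves the GPU idle for its start-up time s and then crunches for t seconds, the idle share is s / (t + s). With the figures quoted (4-5 s start-up, 135-140 s WUs) that is roughly 2.8% to 3.6% when WUs run strictly one at a time. With two per GPU (a coproc count of 0.5, as in the CUDA app_info.xml earlier in the thread), one WU can presumably keep the GPU busy while the other starts. A quick sketch of the one-at-a-time case:

```python
def idle_fraction(crunch_s: float, startup_s: float) -> float:
    """GPU idle share when WUs run strictly one after another.

    Each WU costs startup_s seconds of idle GPU before crunch_s seconds of work.
    """
    return startup_s / (crunch_s + startup_s)


for crunch in (135, 140):
    for startup in (4, 5):
        share = idle_fraction(crunch, startup)
        print(f"{crunch} s WU, {startup} s start-up: {share:.1%} idle")
```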
____________
|
|
|
|
|
Is the 2 WU on 1 GPU more efficient than 1?
It was slightly more efficient but it might have changed with the introduction of longer WUs. Since BOINC needs quite some time to start a new WU (4-5 seconds) it was a way to reduce the idle time of the GPU between 2 WUs that required only 135-140 seconds to crunch.
Here is my old post with some timings on my GTX 460.
____________
|
|
|
|
|
|
Thanks for the info.
It looks like 2 WUs is a good setting.
____________
Polish National Team
|
|
STE\/E Volunteer tester
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
|
Just did several tests, -R, -n and with sievefile, everything is working now. tpsieve is also working. But I can't get more than 80% GPU load, any ideas?
Ditto here on all my ATI 5850 & 5870 boxes: a maximum of 80% usage, down to as low as 60%. Is there any setting we can put in the app file to increase this?
A side benefit, though, is that they run 20°C cooler than they do at any other GPU project... :)
I'm only running the memory at 300, as a higher memory clock doesn't seem to speed up the WUs at all...
____________
|
|
|
|
|
A side benefit though is that they run 20c Cooler than they do at any other GPU Project ... :)
Not for long anymore ;) - PM for you...
____________
|
|
|
STE\/E Volunteer tester
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
|
|
What is the max cache for the PPS/TPS WUs? I was getting 200, but all day today I keep getting a "This computer has reached a limit on tasks in progress" message even though I've fallen well below the 200 I was getting...
____________
|
|
|
|
|
Ditto here on all my ATI 5850 & 5870 Box's, Maximum of 80% usage down to as low as 60% Usage. Any setting we can put in the App File to Increase this.
A side benefit though is that they run 20c Cooler than they do at any other GPU Project ... :)
I'm only running the Memory @ 300 as it doesn't seem to speed up the Wu's any by running them @ a Higher Memory Setting ...
After running 4 WUs on my ATI 5830 (core clock 800, mem clock 750 MHz, Catalyst 10.7, Vista 32-bit, Boinc 10.56) on this host http://www.primegrid.com/results.php?hostid=71825&offset=0&show_names=0&state=3&appid=9 I also see a maximum GPU load of ~70% and a max temperature of ~62 degrees Celsius.
Is it a settings problem, or will the app be optimized further?
|
|
|
|
After running 4 WU's on my ATI 5830 (Core clock 800, mem clock 750 Mhz, Catalyst 10.7, Vista 32-bits, Boinc 10.56) on this host http://www.primegrid.com/results.php?hostid=71825&offset=0&show_names=0&state=3&appid=9 I also see a maximum GPU-load of ~70% and max temp of ~62 degrees Celsius.
Is it a settings-problem or will the app be further optimized?
It seems from Ken_g6 that he does not want to work on this app anymore, unfortunately! So no new optimizations for the moment (except for surprises!) ;)
Anyway, I tried removing the app_info.xml file now that this seems to have become a standard app, but my host is not getting it. I've disabled CPU work in the project preferences (I don't want to give PrimeGrid my CPU, honestly), enabled ATI work, and requested work for every subproject (including PPS, of course!), but BOINC keeps saying:
- Requesting more work for ATI GPU
- No work sent
And no other messages (no red messages, for example).
With app_info.xml it works perfectly; I don't know why it doesn't work automagically without it... Anyone else? I'm using Catalyst APP 10.11, for the record. Maybe some debugging is needed on the server side?
|
|
STE\/E Volunteer tester
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
|
What is the MAX Cache for the pps/tps wu's ??? I was getting 200 but all day today I keep getting a "This computer has reached a limit on tasks in progress" Message even though I've fell well below the 200 I was getting ...
Whoever kicked the server, thanks! My caches went back up to 200 again... :)
____________
|
|
|
STE\/E Volunteer tester
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
|
|
Does anybody know how far back you can go with NVIDIA and ATI cards and still have them work on the GPU TPS WUs?
____________
|
|
|
|
|
|
.Steve, nice power you've got there!
____________
Polish National Team
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
|
Anybody know how far back you can go for both NVIDIA & ATI Cards for them to work on the GPU tps Wu's ???
Any CUDA cards, including the oldest G80 cards (8800 GTS, GTX, Ultra) should work.
Because it is an OpenCL app, only ATI HD4xxx and newer cards will work (though I have heard that ATI is doing some testing on OpenCL drivers for the 3xxx series... probably just a rumor, though).
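The ATI side of this rule of thumb (HD4xxx and newer, plus a driver that actually includes OpenCL) can be expressed as a quick check on a model string. Hypothetical helper: it only looks at the first digit of the four-digit model number and knows nothing about drivers.

```python
import re


def ati_series_ok(model: str) -> bool:
    """True for Radeon HD4xxx and newer, per the rule of thumb in the post.

    Accepts strings such as "HD3200", "HD 4300/4500", "ATI 5770", "HD4870x2".
    """
    m = re.search(r"(?:HD\s*)?(\d)\d{3}", model, re.IGNORECASE)
    if m is None:
        raise ValueError(f"no four-digit model number in {model!r}")
    return int(m.group(1)) >= 4
```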
____________
141941*2^4299438-1 is prime!
|
|
|
STE\/E Volunteer tester
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
|
|
Okay, and thanks, Scott...
____________
|
|
|
|
|
|
OK, I have been reading through the forums to get the BOINC Manager to run PPS Sieve on my ATI HD 4300/4500 series. I have copied the tpsieve app and made the app_info.xml file, which has resulted in the manager downloading workunits, but they do not seem to be processing. I have a GPU monitor and the utilization is basically idle. I ran the test command against the app:
tpsieve-cl-x86-windows.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 200000 -c 60 -M2
The results came out correctly, based on what others were showing, and the GPU went to 100% during the test as expected. Here is the content of my app_info.xml:
<app_info>
<app>
<name>pps_sr2sieve</name>
<user_friendly_name>Proth Prime Search (Sieve)</user_friendly_name>
</app>
<file_info>
<name>tpsieve-cl-boinc-x86-windows.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>pps_sr2sieve</app_name>
<version_num>130</version_num>
<plan_class>ati13ati</plan_class>
<avg_ncpus>0.05</avg_ncpus>
<max_ncpus>1</max_ncpus>
<flops>1.0e11</flops>
<coproc>
<type>ATI</type>
<count>1</count>
</coproc>
<cmdline></cmdline>
<file_ref>
<file_name>tpsieve-cl-boinc-x86-windows.exe</file_name>
<main_program/>
</file_ref>
</app_version>
</app_info>
I'm running Win7 x64 with Catalyst driver 10.10 and tpsieve version cl-0.2.3c (testing).
Any help on getting these workunits to run is greatly appreciated.
____________
|
|
|
STE\/E Volunteer tester
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
|
|
Try this app file and .exe, after downloading the .exe named in it (primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe). It's what I'm using...
Make sure you have the .NET Framework installed too; 3.5 SP1 is what I'm using...
<app_info>
<app>
<name>pps_sr2sieve</name>
<user_friendly_name>Proth Prime Search (Sieve)</user_friendly_name>
</app>
<file_info>
<name>primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>pps_sr2sieve</app_name>
<version_num>135</version_num>
<plan_class>ati13ati</plan_class>
<avg_ncpus>0.05</avg_ncpus>
<max_ncpus>1</max_ncpus>
<flops>1.0e11</flops>
<coproc>
<type>ATI</type>
<count>1</count>
</coproc>
<cmdline></cmdline>
<file_ref>
<file_name>primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe</file_name>
<main_program/>
</file_ref>
</app_version>
</app_info>
____________
|
|
|
samuel7 Volunteer tester
Joined: 1 May 09 Posts: 89 ID: 39425 Credit: 257,425,010 RAC: 0
|
Ok i have been reading thru the forums to get the boinc manager to run PPS sieve on my ATI HD 4300/4500 series, and i have copied the tpsieve app and made the app_info.xml file, which as resulted in the manager downloading workunits, but they do not seem to be processing. I have a GPU monitor and the utilization is basically idle.
...
Any help on getting these workunits to run is greatly appreciated.
If the tasks are just sitting there as 'Ready to start,' check under Activity that 'Use GPU always' is selected. Some preferences prevent GPU computing, so the 'based on preferences' setting may not work.
|
|
|
|
|
Your app_info.xml seems to be OK, but the OpenCL app needs some more CPU; for me 0.2 worked well in <avg_ncpus>.
Does BOINC Manager recognize your GPU and PrimeGrid's app_info?
What are the WUs doing? Is the status 'running', or something else?
|
|
|
|
|
samuel7 had it right: there is an option to use the GPU while the PC is in use. Thanks, man, that did the trick. Working well now.
____________
|
|
|
STE\/E Volunteer tester
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
|
but the opencl app will need some more cpu, for me 0.2 was working well in <avg_ncpus>
That doesn't seem to do anything for me running XP Pro 64-bit. I went as high as <avg_ncpus>1</avg_ncpus> and the GPU usage still stayed below 85%...
____________
|
|
|
|
|
|
The sad part is that the HD 4300/4500 apparently is not a very good card, so after all that it's not even worth risking the high heat on the card for the small output... Oh well, it's fun to tinker with this stuff.
____________
|
|
|
STE\/E Volunteer tester
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
|
The sad part is that the HD 4300/4500 apparently is not a very good card, so after all that it's not even worth risking the high heat on the card for the small output... Oh well, it's fun to tinker with this stuff.
Actually, my 5850s and 5870s run quite cool compared to other projects. I'm used to seeing 85°C+ on them, but running the tpsieve WUs I only see about 70°C, plus or minus a few degrees depending on the box, because the WUs won't push GPU load above 80% to 85%.
____________
|
|
|
|
|
|
I've tried installing this ATI app for a 5970, and can't get it to work. I'm able to download work, but it all errors out immediately. I'm using the exact app_info.xml file that .Steve posted, and downloaded that executable from the PG site.
I'm running BOINC 6.10.58, Win7x64, Catalyst 10.7. My .NET is up to date, and GPU crunching is enabled. Not sure what else to try at this moment...
|
|
|
|
|
ATI driver 10.7 does not have OpenCL support. Either install OpenCL separately or, more simply, install the full (large) 10.10 driver package with OpenCL included. (Careful: the latest 10.11 drivers seem not to have the OpenCL drivers embedded.)
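Whether a driver install actually provides OpenCL can be checked outside BOINC by loading the OpenCL ICD loader and asking how many platforms it sees. This is the same library whose absence produces errors like "libOpenCL.so: cannot open shared object file". A sketch using only Python's ctypes; clGetPlatformIDs is the standard OpenCL entry point, while reading a zero count as "driver without OpenCL" is an assumption.

```python
import ctypes
import ctypes.util
import sys
from typing import Optional


def opencl_platform_count() -> Optional[int]:
    """None if no OpenCL loader can be found or loaded, else the platform count."""
    name = ctypes.util.find_library("OpenCL")
    if name is None:
        return None  # loader missing, like "libOpenCL.so: cannot open shared object file"
    try:
        # OpenCL uses stdcall on 32-bit Windows, hence WinDLL there.
        lib = ctypes.WinDLL(name) if sys.platform == "win32" else ctypes.CDLL(name)
    except OSError:
        return None
    fn = lib.clGetPlatformIDs
    fn.restype = ctypes.c_int32
    fn.argtypes = [ctypes.c_uint32, ctypes.c_void_p, ctypes.POINTER(ctypes.c_uint32)]
    count = ctypes.c_uint32(0)
    status = fn(0, None, ctypes.byref(count))  # 0 == CL_SUCCESS
    return count.value if status == 0 else 0


if __name__ == "__main__":
    print("OpenCL platforms:", opencl_platform_count())
```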
|
|
|
|
ATI driver 10.7 does not have OpenCL support. Either install OpenCL separately or, more simply, install the full (large) 10.10 driver package with OpenCL included. (Careful: the latest 10.11 drivers seem not to have the OpenCL drivers embedded.)
THANK YOU. Done, and getting work done without errors (moved to 10.10). Much appreciated!
|
|
|
|
|
|
|
|
|
But...
like I asked some posts ago: is anybody able to get work using the official app (without app_info.xml)?
I am not able to get any work, and I would like to get rid of app_info.xml...
(Catalyst 10.11 APP; the GPU is fully working, including OpenCL, but NOT on this project...)
|
|
GurneyVolunteer tester Send message
Joined: 15 Oct 07 Posts: 24 ID: 13485 Credit: 7,590,126 RAC: 0
                  
|
But...
as I asked a few posts ago: is anybody able to get work using the official app (without app_info.xml)?
I am not able to get any work, and I would like to stop using app_info.xml...
(Catalyst 10.11 APP; the GPU works fine, including with OpenCL, but NOT on this project...)
I've got the same problem. I tried with app_info.xml and everything worked just fine. Without it, I only get the messages "No work sent" and "No work available for the applications you have selected. Please check your project preferences on the web site."
(Catalyst 10.11, SDK 2.2, OpenCL)
|
But...
as I asked a few posts ago: is anybody able to get work using the official app (without app_info.xml)?
I am not able to get any work, and I would like to stop using app_info.xml...
(Catalyst 10.11 APP; the GPU works fine, including with OpenCL, but NOT on this project...)
I've got the same problem. I tried with app_info.xml and everything worked just fine. Without it, I only get the messages "No work sent" and "No work available for the applications you have selected. Please check your project preferences on the web site."
(Catalyst 10.11, SDK 2.2, OpenCL)
Same here.
|
Great work, Ken, keep it up! :)
Win 7 64-bit.
5970 and 4870x2 on one mobo.
SDK 2.2 and Catalyst 10.5 installed.
I'm always getting device not found errors.
I can't upgrade the driver beyond 10.5; the system crashes.
I suspect that if I removed the 5970 or the 4870x2, it wouldn't crash when upgrading to a newer version.
|
But...
as I asked a few posts ago: is anybody able to get work using the official app (without app_info.xml)?
I am not able to get any work, and I would like to stop using app_info.xml...
(Catalyst 10.11 APP; the GPU works fine, including with OpenCL, but NOT on this project...)
I've got the same problem. I tried with app_info.xml and everything worked just fine. Without it, I only get the messages "No work sent" and "No work available for the applications you have selected. Please check your project preferences on the web site."
(Catalyst 10.11, SDK 2.2, OpenCL)
Same here.
So it's a known problem. Should I use app_info.xml, or is anybody trying to solve it?
Let me know; I'd like to help debug this problem, so I'll stay without app_info.xml if something is in the works and you need some reports!
|
A question: For those of you who installed the latest 10.10 drivers, were you able to get this app to run without installing the SDK? (I.e. Did you get it running by downloading nothing but the drivers from ATI/AMD?) If so, that could be a big step towards making this an official app.
I just loaded the Catalyst 10.11 drivers for my Mobility Radeon HD 4650 card. I had no SDK loaded, and I got this message when I tried to launch the program:
The program can't start because OpenCL.dll is missing from your computer.
After I loaded the SDK V2, this is what happened:
C:\Users\Kevin\Desktop\ppsieve-cl> tpsieve-cl-x86-windows.exe ./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000
tpsieve version cl-0.2.3c (testing)
nstart=76, nstep=32
tpsieve initialized: 1203 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Error: Creating Context. (clCreateContextFromType): Device not found.
What am I doing wrong?
____________
May the Force be with you always.
|
Try SDK 2.2 instead.
|
Sorry, there was an error in my previous post: I used SDK 2.2.
|
You have to use Catalyst 10.10 or 10.11 APP edition, not the regular one. If you use the APP edition you don't need any SDK installed.
Remove everything and try installing this other version!
|
OK, I uninstalled the SDK and the 10.11 drivers, and installed only the Catalyst 10.11 APP driver. The only change is that I no longer need to install the SDK. I get this result again:
C:\Users\Kevin\Desktop\ppsieve-cl> tpsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000
tpsieve version cl-0.2.3c (testing)
nstart=76, nstep=32
tpsieve initialized: 1203 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Error: Creating Context. (clCreateContextFromType): Device not found.
|
STE\/E Volunteer tester
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
It looks like the WU lengths are going to go up by about 25%; my GTX 580s are already getting some of them. The increase in length also brought an increase in credit to 2314, or at least that's what the GTX 580s are getting on the longer ones.
|
C:\Users\Kevin\Desktop\ppsieve-cl> tpsieve-cl-x86-windows.exe ./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000
I can't tell what you are launching. The command is
tpsieve-cl-x86-windows.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 200000 -c 60 -M2
You're mixing the Windows and Linux executables... which passes a bad parameter to the Windows exe.
Anyway, if you are in fact running it correctly, try reinstalling the SDK and then launching one of its sample apps. That will show you what's wrong with your driver installation and also verify whether it's working.
Did you try GPU-Z, for example, to see whether it recognizes any OpenCL device?
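GPU-Z's OpenCL checkbox can also be cross-checked from a script. The following is a minimal sketch, not part of tpsieve or GPU-Z, assuming Python 3 and its standard `ctypes` module. It assumes the OpenCL loader is named `OpenCL.dll` on Windows and `libOpenCL.so` on Linux (the names that appear in the error messages reported in this thread), then asks every OpenCL platform how many GPU devices it exposes via the standard `clGetPlatformIDs` and `clGetDeviceIDs` calls. A missing library corresponds to the "OpenCL.dll is missing" message; zero GPU devices corresponds to tpsieve's `clCreateContextFromType ... Device not found` error.

```python
import ctypes
import ctypes.util
import sys

CL_SUCCESS = 0
CL_DEVICE_TYPE_GPU = 1 << 2  # value from the OpenCL header cl.h


def load_opencl():
    """Return a handle to the OpenCL ICD loader, or None if it is not installed."""
    if sys.platform.startswith("win"):
        names = ["OpenCL.dll"]
        loader = ctypes.WinDLL  # OpenCL entry points are __stdcall on 32-bit Windows
    else:
        names = [ctypes.util.find_library("OpenCL"), "libOpenCL.so.1", "libOpenCL.so"]
        loader = ctypes.CDLL
    for name in names:
        if not name:
            continue
        try:
            return loader(name)
        except OSError:
            continue
    return None


def count_gpu_devices(lib):
    """Count GPU devices across all OpenCL platforms (0 means 'Device not found')."""
    cl_uint = ctypes.c_uint32
    lib.clGetPlatformIDs.argtypes = [cl_uint, ctypes.POINTER(ctypes.c_void_p),
                                     ctypes.POINTER(cl_uint)]
    lib.clGetPlatformIDs.restype = ctypes.c_int32
    lib.clGetDeviceIDs.argtypes = [ctypes.c_void_p, ctypes.c_uint64, cl_uint,
                                   ctypes.POINTER(ctypes.c_void_p), ctypes.POINTER(cl_uint)]
    lib.clGetDeviceIDs.restype = ctypes.c_int32

    # First call only asks how many platforms (installed driver ICDs) exist.
    n_platforms = cl_uint(0)
    if lib.clGetPlatformIDs(0, None, ctypes.byref(n_platforms)) != CL_SUCCESS:
        return 0  # e.g. the loader is present but no driver registered a platform
    if n_platforms.value == 0:
        return 0
    platforms = (ctypes.c_void_p * n_platforms.value)()
    if lib.clGetPlatformIDs(n_platforms.value, platforms, None) != CL_SUCCESS:
        return 0

    total = 0
    for platform in platforms:
        n_devices = cl_uint(0)
        err = lib.clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, None,
                                 ctypes.byref(n_devices))
        if err == CL_SUCCESS:  # CL_DEVICE_NOT_FOUND (-1) simply adds nothing
            total += n_devices.value
    return total


if __name__ == "__main__":
    lib = load_opencl()
    if lib is None:
        print("No OpenCL runtime found: install a driver package that includes OpenCL.")
    else:
        print(f"OpenCL GPU devices found: {count_gpu_devices(lib)}")
```

On 64-bit Windows the `WinDLL`/`CDLL` distinction does not matter; it is only there so the sketch also behaves on 32-bit Windows.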
|
Well, cenit, thanks for your advice. It was very helpful. I loaded GPU-Z, as you had suggested, and it confirms that my card is not compatible with OpenCL. It looks like I won't be testing this app after all.
Thanks to everyone for your help and for developing this application; I'm sure there are plenty of other people out there who will be able to use it when it is complete.
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
Well, cenit, thanks for your advice. It was very helpful. I loaded GPU-Z, as you had suggested, and it confirms that my card is not compatible with OpenCL. It looks like I won't be testing this app after all.
If the driver is not properly installed, the OpenCL box in GPU-Z will not be checked. HD4xxx cards are OpenCL capable.
____________
141941*2^4299438-1 is prime!
|
Alright. This may be good news. But... the installation appeared to go smoothly and there were no errors in the installation log. GPU-Z fails to recognize OpenCL with either the 10.11 APP or the regular 10.11 drivers. Furthermore, 10.11 is the only recommended driver for my card and OS.
I cannot find anything that says that my Mobility Radeon HD 4650 is OpenCL compatible... Is it possible that the HD4xxx series cards are compatible in general, but my card is not?
|
Got rid of the 4870x2 and kept the 5970 on the mobo.
Went to ATI and downloaded Accelerated Parallel Processing (APP) Technology Edition ver 10.11
Followed the instructions in this thread and modified app_info.xml.
WUs take 20-30 minutes to complete.
GPUs 0 and 1 max out at 60%, and CPU cores 0 and 1 at 20%.
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Alright. This may be good news. But... the installation appeared to go smoothly and there were no errors in the installation log. GPU-Z fails to recognize OpenCL with either the 10.11 APP or the regular 10.11 drivers. Furthermore, 10.11 is the only recommended driver for my card and OS.
Try the 10.10 drivers...some people are reporting some issues with the 10.11 series.
I cannot find anything that says that my Mobility Radeon HD 4650 is OpenCL compatible... Is it possible that the HD4xxx series cards are compatible in general, but my card is not?
See here. Your mobility card is listed near the bottom of the list.
|
Here are some odd behaviors!
Placed the 4870x2 back with the 5970 after upgrading to 10.11
The 4 GPUs max out at 80%, and each CPU core of the Q6600 maxes out at 25%.
The problem is that when any WU completes on either GPU of the 4870x2, the fan goes to full speed and the system restarts. After the reboot Windows only recognizes the 5970, so a hard shutdown is required for the 4870x2 to be recognized again.
I suspected power supply strain, so I lowered the core frequencies of the 4870x2, but the problem persisted even at the minimum frequency of 500 MHz.
This does not happen with other projects.
Could it be driver-, application-, or OpenCL-related?
|
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
Could be a heating or power supply problem.
____________
Best wishes. Knowledge is power. by jjwhalen
|
Could be a heating or power supply problem.
Definitely power supply. 850 watts is barely enough.
The temperature of each GPU never exceeds 80°C, and the fans are barely audible.
In any case, the system has now been stable for a few hours. I downclocked the 4870x2 GPU and memory to 500 MHz and kept the 5970 at 730 MHz core / 1010 MHz memory.
WUs take 30 minutes on the 5970 and 1 hour on the 4870s.
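A rough way to sanity-check the "850 watts is barely enough" diagnosis is to add up the nominal loads. This is a back-of-the-envelope sketch only: the wattages below are assumed nominal figures (published maximum board power for the two cards, a Q6600 TDP, and a hypothetical allowance for everything else), not measurements from this machine, and real draw varies with clocks and load.

```python
# Assumed nominal figures in watts; replace them with your own components' specs.
ASSUMED_LOAD_W = {
    "Radeon HD 5970 (max board power)": 294,
    "Radeon HD 4870 X2 (max board power)": 286,
    "Core 2 Quad Q6600 (TDP)": 105,
    "rest of system (hypothetical allowance)": 75,
}


def psu_load(psu_rating_w, loads_w):
    """Return (total draw in W, headroom in W, fraction of the PSU rating used)."""
    total = sum(loads_w.values())
    return total, psu_rating_w - total, total / psu_rating_w


if __name__ == "__main__":
    total, headroom, fraction = psu_load(850, ASSUMED_LOAD_W)
    print(f"{total} W of 850 W ({fraction:.0%}), {headroom} W headroom")
```

Under these assumptions the box sits at roughly 89% of the PSU's rating, which is at least consistent with the shutdowns described above and with downclocking the 4870x2 helping.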
|
STE\/E Volunteer tester
I'm running my dual GTX 580 i7 920 box with an 850 W PSU... It seems to be all right so far, but it's an Enermax 85+ PSU, so it's a pretty good one...
|
Alright. This may be good news. But... the installation appeared to go smoothly and there were no errors in the installation log. GPU-Z fails to recognize OpenCL with either the 10.11 APP or the regular 10.11 drivers. Furthermore, 10.11 is the only recommended driver for my card and OS.
I cannot find anything that says that my Mobility Radeon HD 4650 is OpenCL compatible... Is it possible that the HD4xxx series cards are compatible in general, but my card is not?
Mobility 4650 card is definitely OpenCL compatible. I don't know why this app isn't running.
Can you please post a screenshot of GPU-Z and one of Catalyst Hardware info?
Anyway, I've gone back to my app_info.xml.
It seems that the project admins are not interested in fixing this bug that prevents the standard app from working...
|
STE\/E Volunteer tester
It seems that the project admins are not interested in fixing this bug that prevents the standard app from working...
I'm sure they're interested in fixing the bug but just don't know how yet...
|
rroonnaalldd Volunteer developer Volunteer tester
Anyway, I've gone back to my app_info.xml.
It seems that the project admins are not interested in fixing this bug that prevents the standard app from working...
They are interested, but not all problems can be solved with one or two entries in the server config. Sometimes they get punk'd by the BOINC devs.
Take a look back at Aqua. They had massive login and forum failures in the past; the problem was only solved with the direct intervention of "Dreamer Ahead". Or take the current problems with BOINC client 6.12.x: every version has new features to hide all the old unsolved bugs.
|
Anyway, I've gone back to my app_info.xml.
It seems that the project admins are not interested in fixing this bug that prevents the standard app from working...
They are interested, but not all problems can be solved with one or two entries in the server config. Sometimes they get punk'd by the BOINC devs.
Take a look back at Aqua. They had massive login and forum failures in the past; the problem was only solved with the direct intervention of "Dreamer Ahead". Or take the current problems with BOINC client 6.12.x: every version has new features to hide all the old unsolved bugs.
Yeah, sorry, I was a little harsh with my words.
Anyway, WUs on my 4870 used to run for 10 minutes; now they run for 40 minutes. Is that right? The 10 minutes was with PPS a month ago, for example, so maybe 40 minutes is the correct time for these new TPS units...
|
rroonnaalldd Volunteer developer Volunteer tester
My GT240 (GT215) times on PPSieve were 10 min (1P WU) and 20 min (2P WU).
With TPSieve I'm now at ~40 min (only resends available) and ~50 min for 2P units with 2314 credits.
[add]
The OpenCL app is slower than the CUDA version.
|
My GT240 (GT215) times on PPSieve were 10 min (1P WU) and 20 min (2P WU).
With TPSieve I'm now at ~40 min (only resends available) and ~50 min for 2P units with 2314 credits.
[add]
The OpenCL app is slower than the CUDA version.
So for me, 40 min on a 4870 for 2314 credits seems OK, considering that the RV770 is better than the GT240 but is running a slower app... (and is not native OpenCL hardware like RV8xx/9xx)
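Run times are easier to compare once normalized to credit per hour. A minimal sketch using only the approximate figures quoted in the last two posts (2314 credits per WU; about 40 min on the HD 4870 and about 50 min for a 2P unit on the GT240):

```python
def credits_per_hour(credits, minutes):
    """Credit rate for a work unit that takes `minutes` to finish."""
    return credits * 60 / minutes


# Approximate figures reported in this thread.
reports = {
    "HD 4870 (OpenCL), TPS WU": (2314, 40),
    "GT240, 2P TPS WU": (2314, 50),
}

for label, (credits, minutes) in reports.items():
    print(f"{label}: {credits_per_hour(credits, minutes):.0f} credits/hour")
```

That works out to about 3471 credits/hour for the 4870 versus about 2777 for the GT240, so the 4870 comes out ahead despite running the slower OpenCL app.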
|
Some of you lucky crunchers can at least use your ATI cards in this project.
...but for me, whatever I change or install (yes, I went through ALL the posts here), I cannot get my ATI HD5770 working on this project.
. uninstalled/reinstalled BOINC (now version 6.12.6 (x86)), including registry cleaning and removing all BOINC leftovers from the hard disk
. running with and without the proposed cc_config.xml and app_info.xml
. GPU-Z shows OpenCL support
Interestingly:
. DNETC works properly on my ATI card
. running, e.g., "primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe" in a Command Prompt window works fine
Why is there a double underscore in the file name intelx86__ati13ati.exe?
I'd appreciate any help with this (personal?) issue, also by PM.
Thank you for supporting a novice!
|
Note that the executable you need to use here with BOINC (and therefore in your app_info.xml) is the tpsieve one with _boinc_ in its name! You can find a valid app_info.xml in this thread.
|
Rytis Volunteer moderator Project administrator
Joined: 22 Jun 05 Posts: 2651 ID: 1 Credit: 58,387,426 RAC: 116,228
Some of you lucky crunchers can at least use your ATI cards in this project.
...but for me, whatever I change or install (yes, I went through ALL the posts here), I cannot get my ATI HD5770 working on this project.
. uninstalled/reinstalled BOINC (now version 6.12.6 (x86)), including registry cleaning and removing all BOINC leftovers from the hard disk
. running with and without the proposed cc_config.xml and app_info.xml
. GPU-Z shows OpenCL support
Interestingly:
. DNETC works properly on my ATI card
. running, e.g., "primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe" in a Command Prompt window works fine
Why is there a double underscore in the file name intelx86__ati13ati.exe?
I'd appreciate any help with this (personal?) issue, also by PM.
Thank you for supporting a novice!
Please send me (admin@primegrid.com) sched_request_www.primegrid.com.xml from your client's data directory; I hope I'll be able to change the server settings so that you can receive work.
The double underscore is needed for BOINC to differentiate between application versions and to send the matching version to a host. I believe your host is asking for something other than ati13ati; that's what the file will help to sort out.
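To illustrate the explanation above, here is a small sketch of how such a file name breaks down. It assumes, based only on the example name in this thread, that the part after the double underscore (minus the extension) is the plan class and that the version is the dotted number in the name; it is an illustration, not BOINC's own parsing code.

```python
import re


def split_app_filename(filename):
    """Split e.g. 'primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe'
    into (base, version, plan_class); missing parts come back as None."""
    stem = filename[:-4] if filename.lower().endswith(".exe") else filename
    base, sep, plan_class = stem.partition("__")  # split at the double underscore
    match = re.search(r"_(\d+\.\d+)_", base)      # dotted version such as 1.35
    version = match.group(1) if match else None
    return base, version, (plan_class if sep else None)


print(split_app_filename("primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe"))
```

Here the plan class comes out as `ati13ati`, which is the part the scheduler has to match against what the host asks for.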
|
Some of you lucky crunchers can at least use your ATI cards in this project.
...but for me, whatever I change or install (yes, I went through ALL the posts here), I cannot get my ATI HD5770 working on this project.
. uninstalled/reinstalled BOINC (now version 6.12.6 (x86)), including registry cleaning and removing all BOINC leftovers from the hard disk
. running with and without the proposed cc_config.xml and app_info.xml
. GPU-Z shows OpenCL support
Interestingly:
. DNETC works properly on my ATI card
. running, e.g., "primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe" in a Command Prompt window works fine
Why is there a double underscore in the file name intelx86__ati13ati.exe?
I'd appreciate any help with this (personal?) issue, also by PM.
Thank you for supporting a novice!
Please send me (admin@primegrid.com) sched_request_www.primegrid.com.xml from your client's data directory; I hope I'll be able to change the server settings so that you can receive work.
The double underscore is needed for BOINC to differentiate between application versions and to send the matching version to a host. I believe your host is asking for something other than ati13ati; that's what the file will help to sort out.
Would you like mine too? I am not able to get work without an app_info! I could remove app_info (on Monday) and then send you my send_request...xml
It seems that the server does not send work to people without app_info.
|
I was able to get the app working using the 1.30 app_info.
My 5850 finished its first WU in 22:02. Very nice indeed!
|
Believe it or not...!!!
After messing around for days and weeks to get work for my ATI card....
First this....
30.11.2010 12:53:48 ATI GPU 0: ATI Radeon HD5700 series (Juniper) (CAL version 1.4.880, 1024MB, 1440 GFLOPS peak)
30.11.2010 12:53:48 PrimeGrid Found app_info.xml; using anonymous platform
30.11.2010 12:53:48 PrimeGrid [error] State file error: missing application llrTRP
30.11.2010 12:53:48 PrimeGrid [error] Can't handle workunit in state file
30.11.2010 12:53:48 PrimeGrid [error] State file error: missing application llrTPS
And after some fiddling around, this.....
30.11.2010 12:57:30 PrimeGrid work fetch resumed by user
30.11.2010 12:57:31 PrimeGrid Sending scheduler request: To fetch work.
30.11.2010 12:57:31 PrimeGrid Requesting new tasks for GPU
30.11.2010 12:57:32 PrimeGrid Scheduler request completed: got 25 new tasks
30.11.2010 12:57:33 PrimeGrid update requested by user
30.11.2010 12:57:34 PrimeGrid Starting pps_sr2sieve_4336572_0
30.11.2010 12:57:34 PrimeGrid Starting task pps_sr2sieve_4336572_0 using pps_sr2sieve version 135
30.11.2010 12:57:38 PrimeGrid Sending scheduler request: Requested by user.
30.11.2010 12:57:38 PrimeGrid Requesting new tasks for GPU
30.11.2010 12:57:39 PrimeGrid Scheduler request completed: got 26 new tasks
30.11.2010 12:57:49 PrimeGrid Sending scheduler request: To fetch work.
30.11.2010 12:57:49 PrimeGrid Requesting new tasks for GPU
30.11.2010 12:57:50 PrimeGrid Scheduler request completed: got 26 new tasks
Yippee!
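The two log excerpts above show the difference between a broken and a working setup. "State file error: missing application ..." appears to mean that the client's saved state refers to an application the app_info.xml does not declare, so those tasks cannot be handled, while "got N new tasks" shows the scheduler actually sending work. A small, hypothetical helper that pulls both out of a BOINC event log; the message formats are taken from the excerpt above, and other client versions may word them differently:

```python
import re

MISSING_APP = re.compile(r"State file error: missing application (\S+)")
GOT_TASKS = re.compile(r"Scheduler request completed: got (\d+) new tasks")


def summarize_log(lines):
    """Return (applications reported missing, total tasks received)."""
    missing, received = [], 0
    for line in lines:
        if m := MISSING_APP.search(line):
            missing.append(m.group(1))
        elif m := GOT_TASKS.search(line):
            received += int(m.group(1))
    return missing, received


log = """\
30.11.2010 12:53:48 PrimeGrid [error] State file error: missing application llrTRP
30.11.2010 12:53:48 PrimeGrid [error] State file error: missing application llrTPS
30.11.2010 12:57:32 PrimeGrid Scheduler request completed: got 25 new tasks
30.11.2010 12:57:39 PrimeGrid Scheduler request completed: got 26 new tasks
30.11.2010 12:57:50 PrimeGrid Scheduler request completed: got 26 new tasks
""".splitlines()

print(summarize_log(log))  # (['llrTRP', 'llrTPS'], 77)
```

The walrus operator needs Python 3.8 or later.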
|
Did you do something in particular to get the work flowing?
Or maybe they've fixed the server now...
|
Yes, I did (a lot of things, mostly by trial and error).
My first analysis:
a) some files were in the wrong place or missing
b) wrong account setup on PrimeGrid
c) maybe a fixed server???
Still working on finding out what I did to make it work...
Some of the points were:
- Hyperthreading = OFF (now running 4 cores only)
- Account setup: CPU = OFF, ATI = ON, only work for PPSieve
I'll now try to get the CPU back to work, etc.
If I find a stable setting, I'll report the details (if there's any interest).
Happy X-mas to all of you!
|
Yes, I did (a lot of things, mostly by trial and error).
My first analysis:
a) some files were in the wrong place or missing
b) wrong account setup on PrimeGrid
c) maybe a fixed server???
Still working on finding out what I did to make it work...
Some of the points were:
- Hyperthreading = OFF (now running 4 cores only)
- Account setup: CPU = OFF, ATI = ON, only work for PPSieve
I'll now try to get the CPU back to work, etc.
If I find a stable setting, I'll report the details (if there's any interest).
Happy X-mas to all of you!
Are you using app_info.xml? Because I cannot receive any TPS WUs without app_info.xml.
|
I haven't read the whole thread, so I'll ask: are there plans to make OpenCL apps for the other sieves?
|
Are you using app_info.xml? Because I cannot receive any TPS WUs without app_info.xml.
Same for me: no work without app_info.xml.
...but I found out that app_info.xml needs to be in a specific folder.
Best crunching regards
|
How can I add support for another subproject (for example SGS LLR) to this app_info.xml on 32-bit Windows? I still want to run that subproject on my CPU while running PPSieve on the GPU...
|
That's going to be tricky. If you change your settings to allow work for your CPU, you are liable to get PPS Sieve work as well as SGS.
It may seem complicated, but you'll need to change your PG project settings to accept CPU tasks and SGS only. When your cache fills, change the settings back to GPU and PPS Sieve.
|
Vato Volunteer tester
Joined: 2 Feb 08 Posts: 796 ID: 18447 Credit: 382,504,347 RAC: 225,569
Not the case.
Set the Primegrid preferences to allow CPU and GPU work.
Tick SGS LLR subproject as allowed (this will feed your CPU).
Tick 'Send work from any subproject if selected projects have no work' (this will feed your GPU)
|
Hmm, this is not working... I'll have to try something different...
Edit: Finally WORKING :-) with this app_info.xml:
<app_info>
<app>
<name>pps_sr2sieve</name>
<user_friendly_name>Proth Prime Search (Sieve)</user_friendly_name>
</app>
<file_info>
<name>tpsieve-cl-boinc-x86-windows.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>pps_sr2sieve</app_name>
<version_num>135</version_num>
<plan_class>ati13ati</plan_class>
<avg_ncpus>0.05</avg_ncpus>
<max_ncpus>1</max_ncpus>
<flops>1.0e11</flops>
<coproc>
<type>ATI</type>
<count>1</count>
</coproc>
<cmdline></cmdline>
<file_ref>
<file_name>tpsieve-cl-boinc-x86-windows.exe</file_name>
<main_program/>
</file_ref>
</app_version>
<app>
<name>llrTPS</name>
<user_friendly_name>Sophie Germain Prime Search (LLR)</user_friendly_name>
</app>
<file_info>
<name>primegrid_llr_wrapper_5.11_windows_intelx86.exe</name>
<executable/>
</file_info>
<file_info>
<name>llr.ini.5.09</name>
</file_info>
<file_info>
<name>primegrid_llr_5.09_windows_intelx86.exe.orig</name>
</file_info>
<app_version>
<app_name>llrTPS</app_name>
<version_num>511</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>1.000000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<flops>1788215593.929880</flops>
<api_version>6.3.0</api_version>
<file_ref>
<file_name>primegrid_llr_wrapper_5.11_windows_intelx86.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>llr.ini.5.09</file_name>
<open_name>llr.ini.5.09</open_name>
</file_ref>
<file_ref>
<file_name>primegrid_llr_5.09_windows_intelx86.exe.orig</file_name>
<open_name>primegrid_llr_5.09_windows_intelx86.exe.orig</open_name>
</file_ref>
</app_version>
</app_info>
First, just let PG download the files "primegrid_llr_wrapper_5.11_windows_intelx86.exe", "primegrid_llr_5.09_windows_intelx86.exe.orig" and "llr.ini.5.09" by selecting SGS LLR as your only subproject. You need to start with an empty PG folder!
Then shut down the client, copy the GPU app and this app_info.xml to your PG BOINC folder, restart the client, and enjoy! ;-)
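Many of the failures in this thread come down to an app_info.xml that refers to a file or application it never declares (the `tpsieve-cl-boinc-x86-windows.exe` versus `primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe` discussion in this thread is one example). The following is a minimal consistency check, sketched with Python's standard `xml.etree.ElementTree`. It only checks that every `<app_version>` names a declared `<app>` and that every `<file_ref>` has a matching `<file_info>`; it does not prove that BOINC will accept the file.

```python
import xml.etree.ElementTree as ET


def _names(root, tag, child):
    """Collect the stripped text of <child> under every top-level <tag>."""
    return {(e.findtext(child) or "").strip() for e in root.findall(tag)}


def check_app_info(xml_text):
    """Return a list of human-readable problems found in an app_info.xml."""
    root = ET.fromstring(xml_text)
    apps = _names(root, "app", "name")
    files = _names(root, "file_info", "name")
    problems = []
    for version in root.findall("app_version"):
        app_name = (version.findtext("app_name") or "").strip()
        if app_name not in apps:
            problems.append(f"<app_version> refers to undeclared app {app_name!r}")
        for ref in version.findall("file_ref"):
            file_name = (ref.findtext("file_name") or "").strip()
            if file_name not in files:
                problems.append(f"<file_ref> {file_name!r} has no matching <file_info>")
    return problems


# Cut-down example: the file_ref name does not match the declared file_info.
sample = """<app_info>
  <app><name>pps_sr2sieve</name></app>
  <file_info><name>tpsieve-cl-boinc-x86-windows.exe</name><executable/></file_info>
  <app_version>
    <app_name>pps_sr2sieve</app_name>
    <file_ref><file_name>primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe</file_name><main_program/></file_ref>
  </app_version>
</app_info>"""

print(check_app_info(sample))
```

Either executable name can work, as noted later in the thread; what matters is that `<file_info>`, `<file_ref>` and the file on disk all use the same one.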
|
Sorry, but this is only partial help...
I could also get my ATI card working with an app_info file...
...but then no more CPU work was available or would download, whatever I changed...
Someone needs to explain exactly how to set up the BOINC folder...
e.g. my system:
Main BOINC folder and subfolder:
xDrive: >New Boinc
subfolder:
>data
subfolder:
>>notices
>>projects
subfolder
>www.primegrid.com
>slots
subfolder
>0
>>locale
>>skins
During installation, another folder was also created:
xDrive: >projects
with subfolder
> www.primegrid.com
> dnetc.net
Q: Which file(s) should I copy where?
|
Inside your BOINC data folder there is a folder called "projects", and inside that there is a folder called www.primegrid.com. Copy this app_info.xml (or a similar one) and the ATI app (tpsieve-cl-boinc-x86-windows.exe) into that folder, start BOINC, and it should download some new work and compute it.
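A quick way to confirm the files ended up in the right place is to look in `projects/www.primegrid.com` under the BOINC data directory. This is a minimal sketch: the data directory in the example is a hypothetical placeholder (use the one your own BOINC installation uses), and the required file names are the ones from the app_info.xml posted in this thread.

```python
from pathlib import Path

# File names from the app_info.xml in this thread; adjust if yours differ.
REQUIRED = ["app_info.xml", "tpsieve-cl-boinc-x86-windows.exe"]


def missing_project_files(boinc_data_dir, required=REQUIRED):
    """List required files that are not in <data dir>/projects/www.primegrid.com."""
    project_dir = Path(boinc_data_dir) / "projects" / "www.primegrid.com"
    return [name for name in required if not (project_dir / name).is_file()]


if __name__ == "__main__":
    # Hypothetical data directory; replace it with your own BOINC data folder.
    print(missing_project_files(r"C:\ProgramData\BOINC"))
```

An empty list means both files are where the client will look for them.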
|
Hallelujah....
ATI crunching now...
and CPU crunching now...
Don't ask me how I managed...
but for sure
Ondra@SpaceFamiliy.C
...there is something wrong in your app_info
(at least in my case)
your reference:
<file_name>tpsieve-cl-boinc-x86-windows.exe</file_name>
should read (in my case)
<file_name>primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe</file_name>
with the associated ".exe" file in that folder.
I'll now let my setup run for a while and see what comes of it.
Hallelujah....
|
Is there any speed difference between the PG version and the TPS version 1.35, or are they the same app with different names?
|
Hallelujah....
ATI crunching now...
and CPU crunching now...
Don't ask me how I managed...
but for sure
Ondra@SpaceFamiliy.C
...there is something wrong in your app_info
(at least in my case)
your reference:
<file_name>tpsieve-cl-boinc-x86-windows.exe</file_name>
should read (in my case)
<file_name>primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe</file_name>
with the associated ".exe" file in that folder.
I'll now let my setup run for a while and see what comes of it.
Hallelujah....
Yeah, but there is only one file with two names for it :-)... it's up to you which one you use ;-)... at least it's running...
|
Cenit wrote:
Mobility 4650 card is definitely OpenCL compatible. I don't know why this app isn't running.
Can you please post a screenshot of GPU-Z and one of Catalyst Hardware info?
Ah... Here is a picture of GPU-Z with 10.10 drivers. The driver package I installed did not appear to load Catalyst Control Center.
I tried to run the app through BOINC, with the app_info.xml file and the .exe in PrimeGrid's data directory. It downloads an ATI WU, but when it tries to run, it instantly reports "computation error". I suppose it won't be able to run until I can get the test app to work on my computer as well.
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
It looks like it is not loading the SDK (I assume you used the APP version of the driver... if not, you need to). If you have changed drivers several times lately, you may need to do a completely clean install to get everything working properly. That means using some kind of driver sweeper/cleaner that completely removes all ATI driver components from the system, so that a new install can start from scratch.
|
Um... I hate to say it, but I just completely reinstalled Windows 7 this morning. 10.10 is the only driver I have loaded, and it is definitely the APP version. The installation log indicated that SDK 2.2 was loaded.
|
The strangest thing is that DirectCompute is not ticked either, as it should be. Is this Windows 7, as you say, or Windows Vista? If it's Vista, it needs a massive Windows update. But you're saying Windows 7... so strange!
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Um... I hate to say it, but I just completely reinstalled Windows 7 this morning. 10.10 is the only driver I have loaded, and it is definitely the APP version. The installation log indicated that SDK 2.2 was loaded.
Well, that's just weird...
One other possibility would be to do it manually. That is, wipe the driver and install the 10.10 standard driver. Then go to this site and download and install the SDK separately.
ATI mobility cards have always been quirky, so even if you get it to work, I'd report it on their boards to notify them of the problem.
|
OK, so I'm running a Dell 1747 laptop with Windows 7.
It turns out nothing I install changes the driver version shown in GPU-Z, except the official Dell drivers... I've tried loading ATI's 10.10, 10.10 APP, 10.11 and 10.11 APP, but they don't change the driver version. When I loaded the latest (I think) Dell drivers, it updated the version to 8.713.3.2000, but it is still not OpenCL compatible. So it appears that I have not managed to successfully load any of the drivers I thought I was loading.
I am wondering if my system will only accept Dell-modified drivers. Is that possible?
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
OK, so I'm running a Dell 1747 laptop with Windows 7.
It turns out nothing I install changes the driver version shown in GPU-Z, except the official Dell drivers... I've tried loading ATI's 10.10, 10.10 APP, 10.11 and 10.11 APP, but they don't change the driver version. When I loaded the latest (I think) Dell drivers, it updated the version to 8.713.3.2000, but it is still not OpenCL compatible. So it appears that I have not managed to successfully load any of the drivers I thought I was loading.
I am wondering if my system will only accept Dell-modified drivers. Is that possible?
Ah... now that's an easy fix....
Most OEM laptop makers restrict ATI (and NVidia for that matter) drivers to those that they modify themselves. Dell is horribly slow at updating these to current versions (I know personally with my own Dell laptop with an NVidia GPU). However, you can get the most recent "non-officially supported by Dell" drivers for most ATI mobility GPUs here with instructions on how to install using the Mobility Modder.
|
Vato Volunteer tester
I thought the official drivers now supported Mobility HD cards (since 10.4 or thereabouts), so the modder you refer to is no longer necessary. (It's certainly how I run the HD3450 in my laptop!)
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
In general, that's correct. But Dell is SO BAD about this that it may still be needed to get around the OEM issues.
|
In general, that's correct. But Dell is SO BAD about this that it may still be needed to get around the OEM issues.
Maybe he was downloading the desktop Catalyst and not the Mobility Catalyst? Let's wait to hear from him. He HAS to use MOBILITY Catalyst.
Anyway, the desktop Catalyst should have told him "INSTALLATION ABORTED", and in any case any unsuccessful installation should have warned him. Those are not Catalyst problems; maybe he was working so quickly that he didn't even read the messages...
|
Maybe he was downloading the desktop Catalyst and not the Mobility Catalyst? Let's wait to hear from him. He HAS to use MOBILITY Catalyst.
Anyway, the desktop Catalyst should have told him "INSTALLATION ABORTED", and in any case any unsuccessful installation should have warned him. Those are not Catalyst problems; maybe he was working so quickly that he didn't even read the messages...
Hi,
I've been reading this thread for a while, because I have exactly the same problem with the HD 4650 in my laptop, except mine is an HP laptop (with Win7).
I tried everything slowly and several times, and made sure to use the Mobility Catalyst version of the driver. (I tried both a plain update install and an install after uninstalling the old version.)
All I can say is that even with the Modder, which I used following the Vista instructions on that site, GPU-Z still shows Catalyst 9.7 as the driver (and I know it should be 10.11).
I do receive work for the GPU sieve, but it only computes for 1 or 2 seconds and then fails with a computation error.
Thanks for speaking up, Thomas. It sounds like you and I have exactly the same problem.
I definitely used Mobility drivers, cenit, and believe me, I have been reading every message, balloon, and pop-up that I get during installation. Nothing indicates that the drivers did not install correctly, except for the fact that the driver version does not change.
Most OEM laptop makers restrict ATI (and NVidia for that matter) drivers to those that they modify themselves. Dell is horribly slow at updating these to current versions (I know personally with my own Dell laptop with an NVidia GPU). However, you can get the most recent "non-officially supported by Dell" drivers for most ATI mobility GPUs here with instructions on how to install using the Mobility Modder.
I tried the Mobility Modder, following the instructions, but nothing changed. I don't see what I could have done wrong, but I'm not so sure I did anything wrong. Thomas says that the Modder didn't work for him either.
Now, my Dad knows a lot more about computers than I do. Usually if I can't figure something out, he can. We have the exact same computer, so he tried to get the drivers to load on his as well. He didn't have any luck either. So here's what we're thinking: what if Dell custom-built the GPU, or somehow restricted its basic functions? That would explain why, say, DirectCompute is not enabled, as cenit said it should be, or why the Mobility Modder has no effect. These are all guesses, but here's one reason Dell might have done it, at least on our computers: overheating. With i7-720 CPUs these computers run hot. After all, a laptop's cooling system is not the most effective. Now add a fully functional 1 GB mid-range ATI card to the mix, and you might just have a meltdown.
It sounds plausible to me that Dell would do something like that. Any thoughts?
____________
May the Force be with you always.
Kevin D Puckett,
We have a member of our team with a Mobility Radeon HD 4650 who cannot get it to run either.
We had them try the test files, and this is what came up.
tpsieve-cl-x86-windows.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 200000 -c 60 -M2
tpsieve version cl-0.2.3c (testing)
nstart=76, nstep=32
tpsieve initialized: 1201 <= k <= 9999, 76 <= n < 200000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Error: Creating Context. (clCreateContextFromType): Device not found.
____________
sdl*, I just ran the same test myself after reading your post. I am running Dell drivers: 8.632.1.2000. This is my result:
tpsieve-cl-x86-windows.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 200000 -c 60 -M2
tpsieve version cl-0.2.3c (testing)
nstart=76, nstep=32
tpsieve initialized: 1201 <= k <= 9999, 76 <= n < 200000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Error: Creating Context. (clCreateContextFromType): Device not found.
It looks like the exact same error.
____________
May the Force be with you always.
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
Sort of...
Dell probably did not custom-build the actual GPU. However, Dell certainly (as do almost all OEMs, e.g., HP, Lenovo, etc.) builds its own flavor of GPU drivers for its laptops. My guess is that (probably for the heating reasons you mention) Dell has not updated its driver to include OpenCL (and possibly other options) and is keeping the current ATI drivers from loading, most likely through software or, less likely, through the BIOS. This was certainly the case for a very long time with Dell not releasing CUDA drivers for NVidia cards (though the workaround there was much easier than with ATI).
Given that the Mobility Modder will not work in these cases, I think the only option you have is to complain (quite loudly) to Dell to update their drivers to fully support the devices they sell. :(
EDIT: You might try running Linux to see if the drivers will load there. That would bypass any Dell-created software barriers...
____________
141941*2^4299438-1 is prime!
Alright, for now at least I will be running Collatz with my GPU.
The computer with the ATI card is my main computer, so I need it to run Windows... If it were one of my secondary computers, that would be a good idea. Anyway, thanks for the idea.
____________
May the Force be with you always.
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
I found http://developer.amd.com/gpu/ATIStreamSDK/Pages/default.aspx#two. OpenCL support for all 4xxx-series Mobility chips is listed there as beta.
____________
Best wishes. Knowledge is power. by jjwhalen
rroonnaalldd,
We tried that driver and it did not help, but thanks for your input.
____________
So then what is Collatz doing differently? It runs on my card, no hassle, with the default Dell drivers.
____________
May the Force be with you always.
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
Collatz uses ATI's native Brook/CAL (the equivalent of CUDA for NVidia) rather than OpenCL. Drivers have supported that for a very long time now, but the SDK 2.2 runtime needed by the OpenCL app used here at PG is included only in the most recent drivers (10.10 APP and later) and was tested with only a few earlier driver versions (10.7 was the earliest, I believe).
____________
141941*2^4299438-1 is prime!
Hi everyone,
How do I correct this?
I get a Computation Error.
"Output file pps_sr2sieve_4650218_0_0 for task pps_sr2sieve_4650218_0 absent"
Can anyone help?
Thank you.
Gentilli.
Your computers are hidden.
If you are running Linux, check your permissions:
the files should all have BOINC as the owner.
BOINC needs to own the files so it can create and write to that output file.
That is my best guess.
Steve
____________
From the High Desert in New Mexico
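Steve's ownership check can be scripted. The sketch below is a hypothetical helper, not part of BOINC or PrimeGrid: it walks a directory tree and lists every entry whose owning user is not the expected account. The data directory path and the `boinc` account name are assumptions (both vary by distribution and install type), and it only runs on Linux/Unix because it relies on the `pwd` module.

```python
import os
import pwd
from typing import List


def files_not_owned_by(root: str, expected_owner: str) -> List[str]:
    """Return paths under `root` whose owning user is not `expected_owner`.

    Hypothetical helper for Linux/Unix only (uses the pwd module).
    """
    wrong: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            uid = os.lstat(path).st_uid  # lstat: do not follow symlinks
            try:
                owner = pwd.getpwuid(uid).pw_name
            except KeyError:  # uid without a passwd entry
                owner = str(uid)
            if owner != expected_owner:
                wrong.append(path)
    return wrong


if __name__ == "__main__":
    # Assumed data directory of the Debian/Ubuntu boinc-client package;
    # os.walk simply yields nothing if the path does not exist.
    for path in files_not_owned_by("/var/lib/boinc-client/projects", "boinc"):
        print("not owned by boinc:", path)
```

An empty result means every file already belongs to the expected account; any listed path is a candidate for `chown`.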
Hi Steve,
No, I am running Windows XP 64-bit. This is the first batch of ATI WUs I have received, and it went through all of them in a matter of 2 seconds.
Thank you,
Gentilli.
____________
OK.
Are these CPU or GPU WUs, or both?
If GPU, which card: nVidia or ATI?
Are you using an app_info.xml file?
If so, is there an <executable/> tag in the file_info section?
Which .exe are you using for the executable: the pps****** or the tps****** file?
What drivers are you using?
Is .NET installed?
Is the driver an OpenCL one?
I am still only guessing;
you have given very little information about what you have.
Steve
____________
From the High Desert in New Mexico
Hi again Steve,
I am going to try to explain it as best I can.
Three days ago I modified my "School" profile to accept ATI WUs and CPU WUs (a couple of CPU LLR subprojects) for this machine under "Separate preferences for school".
I did not modify anything else. Last night, after receiving no ATI WUs, I decided that today I would change whatever was necessary to get them.
But lo and behold, I had received ATI WUs from PrimeGrid before making any changes. I was happier than s****. So I suspended the Collatz WUs, and my cards (2x HD 5770) began chewing through the ATI WUs but produced only ERRORS...
So what do you think?
Regards,
Gentilli.
____________
If I understand you:
ATI GPU x2
No app_info.xml
There have been problems with scheduling ATI GPU WUs on the PrimeGrid server. I know they are working on this issue, and maybe others, to get things working without an app_info.xml file. You might be caught up in this issue.
Now I know you are running two ATI cards.
Do you have .NET (DotNet) installed on your computer?
Check "Control Panel" / "Add or Remove Programs", and if it's not there, add it. I have .NET version 3.5 SP1.
Did you install the ATI drivers, or did Windows install them?
If Windows installed them, you need to upgrade to the version that supports OpenCL. On my XP64 box I have 10.11 (the 2nd file to download, the 79.7 MB one, not the 63.5 MB one).
Hope this helps
Steve
____________
From the High Desert in New Mexico
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
Gentilli, it sounds like you have the standard driver installed rather than the APP driver. If that is the case, either update to the APP driver or install the SDK separately.
____________
141941*2^4299438-1 is prime!
Hmmm... Problem when restarting from checkpoint: here.
____________
Thank you Scott and Steve.
I just installed the mentioned drivers and everything is working well.
Thanks again.
Best regards,
Gentilli.
____________
Yikes! It's even worse! All WUs that were to crunch on my 4870 card errored out like this. On a second look, it's not a checkpointing problem but "elapsed time exceeded". Someone will need to fix the _fpops values in the WUs.
BR,
____________
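The "exceeded elapsed time limit" aborts in this thread come from a client-side check. As far as I understand the BOINC client of this era, it derives the limit by dividing the workunit's `rsc_fpops_bound` by the app version's `flops` estimate (under the anonymous platform, the `<flops>` value in app_info.xml). An inflated flops estimate therefore shrinks the time limit. The Python below only illustrates that arithmetic under this assumption; the numbers are hypothetical, not taken from PrimeGrid workunits.

```python
def elapsed_time_limit(rsc_fpops_bound: float, flops: float) -> float:
    """Seconds a task may run before the client aborts it.

    Assumed rule (sketch): limit = rsc_fpops_bound / flops.
    """
    if flops <= 0:
        raise ValueError("flops must be positive")
    return rsc_fpops_bound / flops


def min_fpops_bound(expected_runtime_s: float, flops: float,
                    margin: float = 2.0) -> float:
    """Smallest rsc_fpops_bound that lets a task of the given runtime finish.

    The default margin of 2.0 is an arbitrary illustrative choice.
    """
    return expected_runtime_s * flops * margin


# Hypothetical values: the same bound gives a 10x shorter limit
# when the flops estimate is 10x higher.
bound = 2.0e15
print(elapsed_time_limit(bound, 1.0e11))  # 20000.0 seconds
print(elapsed_time_limit(bound, 1.0e12))  # 2000.0 seconds
```

Under this rule, a card that needs about 52 minutes per task but is given a limit of about 46 minutes (as in the HD 4850 report later in the thread) fails every task, and editing `<flops>` in an app_info.xml changes how long a task may run before it is aborted.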
http://www.primegrid.com/result.php?resultid=206719641
Why?
Do you have Catalyst 10.10 APP or 10.11 APP installed? Not the standard one: the APP version is required for this app!!
Thanks, it helps.
But I have a new problem:
12/6/2010 6:11:51 PM PrimeGrid Aborting task pps_sr2sieve_4728390_2: exceeded elapsed time limit 22210.137012
See http://www.primegrid.com/forum_thread.php?id=2867
____________
Why is this application being pushed out under plan class "ati13ati" and not "ati_opencl"?
The newer class should prevent computers that lack the requirements from being assigned work they cannot do.
Also, I think the flops estimates are completely wrong. Time estimates are way off, and maybe the "time limit exceeded" errors stem from this. In the meantime, my BOINC manager downloads lots of WUs, thinking it will finish them in a blink!
I have two identical machines, each with XP64 and a 4870. One works, the other doesn't. I've tried everything I can think of, with no luck.
According to this, -117 error code is "This error is generated when the requested slot number (the working directory for a result) is negative." I have tried
- 10.10 APP
- 10.11 APP
- uninstalling BOINC, deleting all the slots, and reinstalling BOINC
- Using app_info.xml with manual app
- Not using app_info.xml with manual app
http://www.primegrid.com/result.php?resultid=207163599
Stderr output
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
- exit code -117 (0xffffff8b)
</message>
<stderr_txt>
Sieve started: 3480145000000000 <= p < 3480148000000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
Computation Error: no candidates found for p=3480145000002259 between 755799 and 1130583.
07:31:41 (2604): called boinc_finish
</stderr_txt>
]]>
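The stderr shows the exit status twice: as the signed value -117 and as the unsigned 32-bit pattern 0xffffff8b. The conversion is plain two's-complement arithmetic, nothing BOINC-specific, and a small sketch helps when matching hex codes from logs against BOINC's signed error numbers:

```python
def to_signed32(code: int) -> int:
    """Interpret a 32-bit unsigned exit status as a two's-complement signed int."""
    code &= 0xFFFFFFFF
    return code - (1 << 32) if code & 0x80000000 else code


def to_hex32(code: int) -> str:
    """Format a (possibly negative) exit status as an unsigned 32-bit hex string."""
    return f"0x{code & 0xFFFFFFFF:08x}"


print(to_signed32(0xFFFFFF8B))  # -117
print(to_hex32(-117))           # 0xffffff8b
print(to_hex32(-177))           # 0xffffff4f
```

The last line shows that -117 and -177, both of which come up in this thread, are distinct codes rather than two spellings of the same value.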
____________
Reno, NV
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
Are you sure that .NET Framework 3.5 is on both machines?
____________
141941*2^4299438-1 is prime!
I have the following .Net Framework installed on both:
2.0 SP2
3.0 SP2
3.5 SP1
4 Client Profile
____________
Reno, NV
Can you run Collatz?
Also, can you run the ATI app in standalone mode?
You may have to do a clean install of the full ATI drivers and remove any ATI Stream apps. Also remove any overclocking (OC) settings, if you use any.
If that doesn't work, it is likely a card problem: it could be something as simple as the connections (reseat the card) or something more serious.
Can you run Collatz?
Yep. DNETC & MW too.
Also, can you run the ATI app in standalone mode?
I haven't tried. I will try that when I get home tonight.
You may have to do clean install of the full ATI drivers and remove any ATI stream apps, if any.
This I already tried. Full removal and driver sweeper too.
Also remove any OC settings if any are being used.
No OC.
If those don't work then it is likely a card problem so it could be simple connections (reinstall the card) or something more serious.
If that were the case, then it wouldn't run any of the other projects. But it won't hurt to try. I will also try this tonight.
____________
Reno, NV
I got the "computation error" message and, being a novice, I tried your advice.
When SDK 2.2 installed on my machine, it didn't add a path variable. See:
http://developer.amd.com/gpu/ATIStreamSDK/assets/ATI_Stream_SDK_Installation_Notes.pdf
I restarted and it worked.
Hope it helps someone.
64-bit Win7
Or so I thought.
The first task to get past the 2-second mark ended with this:
http://www.primegrid.com/result.php?resultid=207032572
:(
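For the missing path variable described above, a small check can show whether the SDK environment is visible to a process. This is a hypothetical helper, not an AMD tool: it assumes the installer sets `ATISTREAMSDKROOT` (the variable named later in this thread) and that at least one `PATH` entry should lie inside that directory. Check the SDK installation notes for the exact subdirectories your version expects.

```python
import ntpath
import os
from typing import List, Mapping


def check_stream_sdk_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in a Windows-style environment mapping.

    Hypothetical check: ATISTREAMSDKROOT must be set, and at least one
    PATH entry must lie inside that directory.
    """
    root = env.get("ATISTREAMSDKROOT", "")
    if not root:
        return ["ATISTREAMSDKROOT is not set"]
    # ntpath handles Windows paths on any OS: normalize separators, ignore case.
    root_norm = ntpath.normcase(ntpath.normpath(root))
    entries = [ntpath.normcase(ntpath.normpath(p))
               for p in env.get("PATH", "").split(";") if p.strip()]
    inside = [p for p in entries
              if p == root_norm or p.startswith(root_norm + "\\")]
    return [] if inside else ["no PATH entry lies inside ATISTREAMSDKROOT"]


if __name__ == "__main__":
    print(check_stream_sdk_env(os.environ) or "SDK environment looks OK")
```

Keep in mind that BOINC installed as a service may see a different environment from your interactive session, so a clean result here does not guarantee the client sees the same values.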
I have two identical machines, each with XP64 and a 4870. One works, the other doesn't. I've tried everything I can think of, with no luck.
According to this, -117 error code is "This error is generated when the requested slot number (the working directory for a result) is negative."
I'm getting the same error... More here:
http://www.primegrid.com/forum_thread.php?id=2852&nowrap=true#28941
- But I have managed to complete at least 1 ATI task successfully.
BoincTasks also reports the error as "Computation error (-177,Maximum time exceeded)",
which makes a lot more sense, as ALL these tasks crash after the same length of time.
I have a funny feeling there is a too-short time limit in the code somewhere...!
____________
Also, can you run the ATI app in standalone mode?
I haven't tried. I will try that when I get home tonight.
I tried to run it manually, with no success. But I am not sure I am doing it right. I downloaded the zip file in the very first post of this thread. Is that still the current version I should be trying with this experiment? In any case, I did this:
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
And that resulted in thousands of lines that look like this:
Computation Error: no candidates found for p=42070009999983 between 503884 and 755788.
Computation Error: no candidates found for p=42070009999983 between 251980 and 503884.
Computation Error: no candidates found for p=42070009999983 between 76 and 251980.
Computation Error: no candidates found for p=42070009999983 between 503884 and 755788.
Computation Error: no candidates found for p=42070009999983 between 251980 and 503884.
Computation Error: no candidates found for p=42070009999983 between 76 and 251980.
Eventually, I had to just kill the task. Now what?
____________
Reno, NV
Should it be:
primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
On my XP64 box with an HD 4830 I get:
tpsieve version cl-0.2.3c (testing)
nstart=76, nstep=32
tpsieve initialized: 1203 <= k <= 9999, 76 <= n < 2000000
nstep changed to 22
CL setup complete.
cthread_count = 8192
42070003511309 | 6057*2^1043547+1
42070005645821 | 3633*2^119620-1
42070008458437 | 7095*2^1422761-1
Found 3 factors
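Each factor line has the form `p | k*2^n+1` (or `-1`). A claimed factor can be checked independently with modular exponentiation, without ever forming the huge number k·2^n±1. The sketch below is a hypothetical checker, not part of ppsieve or tpsieve, and its regular expression assumes exactly the line format shown in the output above.

```python
import re

# Matches lines like "42070003511309 | 6057*2^1043547+1"
FACTOR_LINE = re.compile(r"^\s*(\d+)\s*\|\s*(\d+)\*2\^(\d+)([+-])1\s*$")


def is_valid_factor_line(line: str) -> bool:
    """True if p really divides k*2^n+1 (or k*2^n-1) for one sieve output line."""
    m = FACTOR_LINE.match(line)
    if m is None:
        raise ValueError(f"not a factor line: {line!r}")
    p, k, n = int(m.group(1)), int(m.group(2)), int(m.group(3))
    c = 1 if m.group(4) == "+" else -1
    # pow(2, n, p) computes 2^n mod p directly, so values stay below k*p.
    return (k * pow(2, n, p) + c) % p == 0


print(is_valid_factor_line("7 | 3*2^1+1"))   # True: 3*2+1 = 7
print(is_valid_factor_line("11 | 1*2^2+1"))  # False: 5 is not divisible by 11
```

Every factor line from a correct run should pass this check; a failing line would point to a bad result rather than a display quirk.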
Not a clue :-)
It could be as simple as a Windows permission/access issue or a driver-related issue.
Does GPU-Z have the OpenCL box checked? It's at the very bottom of the 'Graphics Card' tab. Other tools can show this information too.
You might try running an OpenGL stress program like
FurMark, but please note I have not used it. The site also has some demos and other utilities that should work if OpenGL is available.
You could always try Linux or 32-bit Windows XP...
Should it be:
primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
So you are saying I should use something different than the app from the first post in this thread? Because that 1.35 is not part of that zip file.
Edit: Okay, I copied the 1.35 exe from the project folder. It works, I think:
C:\Documents and Settings\Administrator\My Documents\ppsieve-cl>primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
tpsieve version cl-0.2.3c (testing)
nstart=76, nstep=32
tpsieve initialized: 1203 <= k <= 9999, 76 <= n < 2000000
nstep changed to 22
CL setup complete.
cthread_count = 10240
So then why doesn't it work in BOINC?
____________
Reno, NV
Should it be:
primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
So you are saying I should use something different than the app from the first post in this thread? Because that 1.35 is not part of that zip file.
YES
I think the ppsieve is for the older WU's before the pause in the sub-project
The new WU's need the tpsieve to work.
Now, I may be wrong, but not too wrong.
Try what I PMed you: create a subdirectory in your www.primegrid.com folder and move everything there. Then copy everything I sent you into www.primegrid.com
and restart BOINC. Let me know the results.
Steve
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
I think the ppsieve is for the older WU's before the pause in the sub-project
The new WU's need the tpsieve to work.
Edit: That is true; however...
When you see Computation Errors (of no candidates found), that means something is going wrong on the GPU. (Or there's a bug, but many people aren't experiencing this, so I'm doubting that.)
____________
So now what? Standalone with 1.35 works. Running it under BOINC does not.
____________
Reno, NV
This is my setup to run ppsieve (GPU) & Sophie Germain (CPU) WUs on Windows with an ATI card
Works on Windows XP64 Pro & Vista64
BOINC 6.10.56
ATI Drivers 10.11 (APP, with OpenCL)
.NET 3.5 SP1
===Directory=== ... BOINC\projects\www.primegrid.com ====== contains these files ========
----------------------files---------------------------------------------------------------
llr.ini.5.09
primegrid_llr_wrapper_5.11_windows_intelx86.exe
primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe
primegrid_llr_5.09_windows_intelx86.exe.orig <--note------ .exe.orig
app_info.xml
-------------------------app_info.xml-------------------------------------------------
<app_info>
<app>
<name>pps_sr2sieve</name>
<user_friendly_name>Proth Prime Search (Sieve)</user_friendly_name>
</app>
<file_info>
<name>primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>pps_sr2sieve</app_name>
<version_num>135</version_num>
<plan_class>ati13ati</plan_class>
<avg_ncpus>0.05</avg_ncpus>
<max_ncpus>1</max_ncpus>
<flops>1.0e11</flops>
<coproc>
<type>ATI</type>
<count>1</count>
</coproc>
<cmdline></cmdline>
<file_ref>
<file_name>primegrid_tpsieve_1.35_windows_intelx86__ati13ati.exe</file_name>
<main_program/>
</file_ref>
</app_version>
<app>
<name>llrTPS</name>
<user_friendly_name>Sophie Germain Prime Search (LLR)</user_friendly_name>
</app>
<file_info>
<name>primegrid_llr_wrapper_5.11_windows_intelx86.exe</name>
<executable/>
</file_info>
<file_info>
<name>llr.ini.5.09</name>
</file_info>
<file_info>
<name>primegrid_llr_5.09_windows_intelx86.exe.orig</name>
</file_info>
<app_version>
<app_name>llrTPS</app_name>
<version_num>511</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>1.000000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<flops>1788215593.929880</flops>
<api_version>5.09</api_version>
<file_ref>
<file_name>primegrid_llr_wrapper_5.11_windows_intelx86.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>llr.ini.5.09</file_name>
<open_name>llr.ini.5.09</open_name>
</file_ref>
<file_ref>
<file_name>primegrid_llr_5.09_windows_intelx86.exe.orig</file_name>
<open_name>primegrid_llr_5.09_windows_intelx86.exe.orig</open_name>
</file_ref>
</app_version>
</app_info>
--------------------------------------------------------------------------------------------------------
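Steve's earlier checklist asked whether each executable's `<file_info>` carries an `<executable/>` tag. The hypothetical helper below uses Python's standard `xml.etree.ElementTree` to check that for every `<file_ref>` marked `<main_program/>` in an app_info.xml like the one above. It tests only this one rule and is not a full validator of BOINC's format.

```python
import xml.etree.ElementTree as ET
from typing import List


def main_programs_missing_executable(app_info_xml: str) -> List[str]:
    """Names of main programs whose <file_info> lacks <executable/> (or is absent)."""
    root = ET.fromstring(app_info_xml)
    executables = {
        (fi.findtext("name") or "").strip()
        for fi in root.findall("file_info")
        if fi.find("executable") is not None
    }
    missing: List[str] = []
    for av in root.findall("app_version"):
        for ref in av.findall("file_ref"):
            if ref.find("main_program") is None:
                continue  # data files such as llr.ini need no <executable/>
            name = (ref.findtext("file_name") or "").strip()
            if name not in executables:
                missing.append(name)
    return missing
```

From the project directory, `main_programs_missing_executable(open("app_info.xml").read())` should return an empty list for the file above, since both main programs have `<executable/>` in their `<file_info>` sections.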
Separate preferences for school
Resource share : 100
Use CPU : checked
Use ATI GPU : checked
Sophie Germain Prime Search (LLR) : checked
Send work from any subproject if selected projects have no work : checked
===============everything else is blank (not checked)=================================
Hope this will work
Steve Martin
____________
From the High Desert in New Mexico
I currently have two ATI cards set up to run this. One works fine and one doesn't.
On Monday I set up an HD5850, and it is happily munching through WUs in about 23 minutes.
Last night I set up an HD4850, and it appeared to be working with one quirk. While I watched it, it seemed like the WU would finish in about 52 minutes. At about 46 minutes it suddenly jumped to 100% and moved on to the next one. I didn't think anything was wrong, since I've seen stuff like this before in other projects. Today I noticed this machine was not getting any credit. Every WU errored at about 46 minutes with a maximum time exceeded error.
Is this something that can be fixed before the challenge? Is an HD4890 fast enough to complete these? Is an HD4870 fast enough to finish them?
____________
I have an HD 4890 on two Vista boxes and an HD 4830 on an XP box.
I am using the app_info.xml setup from the message above (#29023).
I know there was an issue with "Maximum elapsed time exceeded" on the default setup, so I went back to the app_info.xml file.
I am waiting for things to shake out with that issue.
So you need to either wait or go the app_info.xml route.
Good luck
Steve
____________
From the High Desert in New Mexico
nenym
Joined: 23 Apr 09 Posts: 22 ID: 39029 Credit: 1,395,911,998 RAC: 2,241,035
Task pps_sr2sieve_4945295_1 errored out with Maximum elapsed time exceeded. Host ID 10601 (XP x86, HD4770). I thought 4945295 was from the new batch that corrected that issue. I used the stock app without any app_info.xml.
See here; I think they didn't fix the problem!
http://www.primegrid.com/forum_thread.php?id=2852&nowrap=true#29044
Hi everyone,
I get the same error: exceeded time limit.
XP 64, 2x 5770, 10.10
Gentilli.
nenym
Joined: 23 Apr 09 Posts: 22 ID: 39029 Credit: 1,395,911,998 RAC: 2,241,035
You are right, it works fine using app_info.xml and the manual sieving app, as did.
Thanks. I used the file and it seems to be working fine.
____________
I get a 15% speedup with the latest ATI SDK 2.3 and driver 10.12, but I don't know whether the SDK or the driver gives me the gain, because I installed the SDK first and it requires driver version 10.12 to work.
I have found the 10.12 APP (OpenCL) drivers for Vista 64:
http://sites.amd.com/us/game/downloads/Pages/radeon_vista-64.aspx#1
But the page does not say anything about the ATI SDK 2.3.
Can you post a link?
____________
TIA Steve Martin
____________
From the High Desert in New Mexico
It's here
STE\/E Volunteer tester
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
I tried the 10.12 drivers on one box and didn't see any speed increase with the WUs.
____________
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
Ditto for me.
I also added SDK 2.3... with that, early results show a 1.5% increase (maybe?), but I will need to run a few more to verify.
____________
141941*2^4299438-1 is prime!
Same for me:
with SDK 2.3, one WU is about 3 min faster,
from 41 min down to 38 min.
Happy X-Mas
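The speedups reported here (15%, about 1.5%, and 41 down to 38 minutes) are easier to compare when expressed the same way. "Faster" can mean a shorter runtime per WU or more WUs per day, and the two percentages differ slightly. A small sketch of both conversions:

```python
def runtime_reduction_pct(before: float, after: float) -> float:
    """Percent less time per work unit."""
    return (before - after) / before * 100.0


def throughput_gain_pct(before: float, after: float) -> float:
    """Percent more work units per unit of time."""
    return (before / after - 1.0) * 100.0


# 41 min -> 38 min per WU, as reported above
print(f"{runtime_reduction_pct(41, 38):.1f}% shorter runtime")  # 7.3% shorter runtime
print(f"{throughput_gain_pct(41, 38):.1f}% more WUs per day")   # 7.9% more WUs per day
```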
I have a Radeon HD 4350, and after enabling it to receive GPU WUs, all of them error out.
This is the error I'm seeing:
<core_client_version>6.10.56</core_client_version>
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2258 ID: 1178 Credit: 10,867,108,087 RAC: 11,866,263
I have a Raedon HD 4350 and on enabling it to recieve gpu WU's all error out.
This is the error I'm seeing:
<core_client_version>6.10.56</core_client_version>
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
Uninstalled all ATI software and drivers, and then reinstalled the APP driver after a reboot.
I got the exact same error.
Reinstalled the SDK and still no help. I still have not found anywhere to set the ATISTREAMSDKROOT variable.
Thoughts?
Have you installed boinc as service or normal?
____________
Best wishes. Knowledge is power. by jjwhalen
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Have you installed BOINC as a service or as a normal application?
This would only be an issue on the one box with Win7. Service installs with GPU are fine on Win XP.
____________
141941*2^4299438-1 is prime!
I uninstalled all ATI software and drivers, then reinstalled the APP driver after a reboot.
I got the exact same error.
I reinstalled the SDK and still no help. I still have not found anywhere to set the ATISTREAMSDKROOT variable.
Thoughts?
DaveSun, all your PrimeGrid hosts are Windows machines without any BOINC-compatible GPU listed, so BOINC is not seeing the GPU. Once you get that addressed, you should try to run DNET or Collatz.
Another app I found is GPU Caps Viewer. It should show a check box for OpenGL, and the demo should run.
You should try to run the standalone version first; if the error remains, then it is probably a Windows-related error, such as permissions.
At the moment the GPU is disabled in my cc_config.xml to prevent trashing work units, so BOINC doesn't see the GPU. I have run GPU-Z, and it shows it running OpenCL. I'll check out Caps Viewer and see what it says. Unless the SDK is not installing properly, there should not be a permission problem.
I hope to get this sorted soon, as I have two other ATI GPUs and one Fermi GPU waiting to be moved into other systems.
Have you installed BOINC as a service or as a normal application?
This would only be an issue on the one box with Win7. Service installs with GPU are fine on Win XP.
The Win7 system is slated to get the Fermi card.
The system I am currently trying to get set up is one of my Q9300s,
running BOINC 6.10.56 as a service.
I'm wondering if I might need to upgrade BOINC.
Does it make a difference that these systems run headless?
At the moment the GPU is disabled in my cc_config.xml to prevent trashing work units, so BOINC doesn't see the GPU. I have run GPU-Z, and it shows it running OpenCL. I'll check out Caps Viewer and see what it says. Unless the SDK is not installing properly, there should not be a permission problem.
You really need to make sure that BOINC sees your cards and that another GPU project runs. If so, then download and run the latest standalone version (given in this thread) and post the output.
At the moment the GPU is disabled in my cc_config.xml to prevent trashing work units, so BOINC doesn't see the GPU. I have run GPU-Z, and it shows it running OpenCL. I'll check out Caps Viewer and see what it says. Unless the SDK is not installing properly, there should not be a permission problem.
You really need to make sure that BOINC sees your cards and that another GPU project runs. If so, then download and run the latest standalone version (given in this thread) and post the output.
BOINC does see the card just fine. I believe that I have sorted this thing out, and I'm currently crunching the first one on the GPU.
I did a complete uninstall of all ATI software, drivers, and anything else I could find. Then I did a search-and-destroy on the hard drive and registry for all ATI-related files.
I reinstalled the APP drivers and installed BOINC 6.10.58, and that's when it hit me that BOINC might be running in protected mode on that machine. I turned off protected mode, enabled the GPU, and crunching began.
GPU-Z is currently showing temps at 46 C and 99% load. :)
For both cases, for whatever reason, the SDK is not installing properly. If you did not install the APP version of the driver, you need to do so (look for it as the second or third entry, typically, in the list of drivers; the regular drivers do not have what is needed for OpenCL). If you did install the APP version, then some problem likely occurred during the installation. Reinstalling can sometimes fix this, but if not, two other options might work. First, you could install the SDK directly (google "ATI SDK 2.3" and choose the correct 32- or 64-bit version for your OS). If that does not work, uninstall the ATI drivers and software completely (*note: it is a good idea to run driver sweeper/cleaner software after a standard uninstall to make sure that the old install is completely wiped) and then do a full reinstall of the APP drivers.
I have followed exactly that procedure several times now, including Driver Sweeper. Oops, I just realized I used the W7 2.3 SDK by accident... hopefully that's it.
Now, even after a completely new, clean install of the 10.12 APP driver and a fresh install of the 2.3 SDK for XP 64, it still didn't work.
The solution was simple: I'm using W7 now, and things started working instantly. I am still guessing why; could the SDK installation under XP be bugged?
STE\/E Volunteer tester
The solution was simple: I'm using W7 now, and things started working instantly. I am still guessing why; could the SDK installation under XP be bugged?
All my boxes are XP, and they run the SDK okay on the ATI ones ... XP 64-bit.
____________
The solution was simple: I'm using W7 now, and things started working instantly. I am still guessing why; could the SDK installation under XP be bugged?
All my boxes are XP, and they run the SDK okay on the ATI ones ... XP 64-bit.
Do you use a box with two ATI cards?
STE\/E Volunteer tester
The solution was simple: I'm using W7 now, and things started working instantly. I am still guessing why; could the SDK installation under XP be bugged?
All my boxes are XP, and they run the SDK okay on the ATI ones ... XP 64-bit.
Do you use a box with two ATI cards?
I only have two boxes running ATI cards right now, and both of them have two HD5870s in them. At first I had about a dozen Win XP Pro 64-bit boxes with dual HD5850 and HD5870 cards running the ppsieve WUs, but I have slowly switched them over to NVIDIA cards. I want to go hit the GPUGrid project for a while, and ATI just ain't getting it there...
____________
Was the -117 error code problem ever solved?
____________
Reno, NV
I have something to report.
I was able to run these fine on a 4850 using SDK 2.2 on W7 64-bit. No errors whatsoever.
Recently, I updated the SDK to 2.3. Now all WUs from PG for the GPU fail with computation errors.
It seems to me this app doesn't support SDK 2.3 OpenCL. Can anyone else verify this? I think this is very likely, because I ran Milkyway@home, which uses CAL, and it ran fine. Only PG, which uses CL, conks out.
EDIT: I concur with the above post stating that GPU-Z does not display the OpenCL tick.
Verdict: SDK 2.3 isn't compatible with this app.
Recommendation: downgrade the SDK to 2.2.
I'll do so, and post back with results.
I have something to report.
I was able to run these fine on a 4850 using SDK 2.2 on W7 64-bit. No errors whatsoever.
Recently, I updated the SDK to 2.3. Now all WUs from PG for the GPU fail with computation errors.
It seems to me this app doesn't support SDK 2.3 OpenCL. Can anyone else verify this? I think this is very likely, because I ran Milkyway@home, which uses CAL, and it ran fine. Only PG, which uses CL, conks out.
EDIT: I concur with the above post stating that GPU-Z does not display the OpenCL tick.
Verdict: SDK 2.3 isn't compatible with this app.
Recommendation: downgrade the SDK to 2.2.
I'll do so, and post back with results.
Verdict: you didn't read any of the messages.
Why?
Because for everybody else, SDK 2.3 works better than 2.2, sometimes even with lower computing times.
Check your installation. Maybe you didn't fully uninstall Catalyst 10.11 and SDK 2.2 before installing the Catalyst 10.12 APP edition.
I stand corrected.
If my driver date is correct, I have 10.8. The requirements on the site say 10.9+ is required for SDK 2.3, which is most likely why. I'm going to have to remove CCC 10.8 and install 10.11.
I also suggest removing the SDK before CCC, just to make things cleaner, since the SDK relies on CCC and the ATI drivers.
If you remove the bottom of a tower of blocks first, the top would most likely collapse into a mess.
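Catalyst version numbers such as 10.8, 10.11 and 10.12 are year.month releases, so checking a requirement like "10.9+" by comparing them as decimal numbers goes wrong (as decimals, 10.12 is smaller than 10.9). A minimal Python sketch of a tuple-based comparison, assuming the plain "YY.M" form seen in this thread; the helper names are hypothetical:

```python
from typing import Tuple


def parse_catalyst(version: str) -> Tuple[int, int]:
    """Split a Catalyst version like "10.12" into (year, month) integers."""
    year, month = version.strip().split(".")
    return int(year), int(month)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed Catalyst release is at least the required one."""
    # Tuples compare element by element: year first, then month.
    return parse_catalyst(installed) >= parse_catalyst(minimum)


for installed in ["10.8", "10.11", "10.12", "11.3"]:
    print(installed, meets_minimum(installed, "10.9"))
```

Only 10.8 fails the 10.9+ check here; a float comparison would wrongly reject 10.11 and 10.12 as well.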
I've been running ppsieve for a few days now and have some questions. I've got an HD 5870 with the latest drivers, and Ubuntu.
1. Workunits are taking about 30 minutes, but reading this thread, it looks like they take 10 minutes for most people with a 5870. Does that have something to do with app_info.xml, or have the work units increased in length since then? I'm also running Riesel LLR at the same time on my 2 CPUs, if that makes a big difference.
2. Where can I find out what app_info.xml is?
Thanks
I've been running ppsieve for a few days now and have some questions. I've got an HD 5870 with the latest drivers, and Ubuntu.
1. Workunits are taking about 30 minutes, but reading this thread, it looks like they take 10 minutes for most people with a 5870. Does that have something to do with app_info.xml, or have the work units increased in length since then? I'm also running Riesel LLR at the same time on my 2 CPUs, if that makes a big difference.
2. Where can I find out what app_info.xml is?
Thanks
The WU length was increased several times over the last few months.
____________
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
- exit code -117 (0xffffff8b)
</message>
<stderr_txt>
Sieve started: 53120740000000000 <= p < 53120746000000000
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 0.
Computation Error: no candidates found for p=53120740348359503 between 95 and 381023.
10:25:52 (3380): called boinc_finish
</stderr_txt>
]]>
What does it mean? Why "Computation Error"?
Yesterday was OK, but now I have many errors.
http://www.primegrid.com/results.php?hostid=196693&offset=0&show_names=0&state=5&appid=
Yesterday, I installed the latest versions from ATI:
SDK 2.4 + CCC 11.3 + BM 6.10.60
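The two numbers in "exit code -117 (0xffffff8b)" are the same value written two ways: -117 as a signed 32-bit integer and 0xffffff8b as its two's-complement bit pattern. A small Python sketch of the conversion (helper names are hypothetical); it only relates the two notations and says nothing about what -117 means to BOINC:

```python
def to_unsigned32(code: int) -> int:
    """Bit pattern of a signed 32-bit exit code, as shown in parentheses."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Signed 32-bit exit code corresponding to an unsigned bit pattern."""
    value &= 0xFFFFFFFF
    # If the sign bit is set, subtract 2^32 to get the negative value back.
    return value - (1 << 32) if value & 0x80000000 else value


print(hex(to_unsigned32(-117)))  # 0xffffff8b
print(to_signed32(0xFFFFFF8B))   # -117
```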
____________
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
- exit code -117 (0xffffff8b)
</message>
<stderr_txt>
Sieve started: 53120740000000000 <= p < 53120746000000000
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 0.
Computation Error: no candidates found for p=53120740348359503 between 95 and 381023.
10:25:52 (3380): called boinc_finish
</stderr_txt>
]]>
What does it mean? Why "Computation Error"?
Yesterday was OK, but now I have many errors.
http://www.primegrid.com/results.php?hostid=196693&offset=0&show_names=0&state=5&appid=
Yesterday, I installed the latest versions from ATI:
SDK 2.4 + CCC 11.3 + BM 6.10.60
I have had problems on multiple ATI cards with the newer 11.x drivers. 11.3 was supposed to fix some of these issues, but it may not have. Rolling back to the 10.x drivers (and maybe a manual SDK install) may be needed.
____________
141941*2^4299438-1 is prime!
Hi all,
I just installed the latest ATI drivers (11.3) and BOINC 6.12.22 (x64),
with the following result:
<core_client_version>6.12.22</core_client_version>
<![CDATA[
<stderr_txt>
Sieve started: 55011394000000000 <= p < 55011400000000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
Thread 0 completed
Sieve complete: 55011394000000000 <= p < 55011400000000000
count=155652304,sum=0x3956cd29d2864d18
Elapsed time: 3906.73 sec. (1.71 init + 3905.02 sieve) at 1536537 p/sec.
Processor time: 299.66 sec. (1.72 init + 297.95 sieve) at 20138575 p/sec.
Average processor utilization: 1.01 (init), 0.08 (sieve)
17:00:37 (4132): called boinc_finish
</stderr_txt>
]]>
Best regards
parabol
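The "Average processor utilization" line appears to be processor (CPU) time divided by elapsed (wall-clock) time for each phase: 1.72 / 1.71 gives 1.01 for init and 297.95 / 3905.02 gives 0.08 for the sieve. The same ratio matches the other logs in this thread to within the rounding of the printed times, but this is inferred from the numbers, not taken from the ppsieve source. A Python sketch that recomputes it from the two summary lines (the regex and names are our own):

```python
import re
from typing import Dict, Tuple

# Summary lines copied from the stderr output above.
STDERR = """\
Elapsed time: 3906.73 sec. (1.71 init + 3905.02 sieve) at 1536537 p/sec.
Processor time: 299.66 sec. (1.72 init + 297.95 sieve) at 20138575 p/sec.
"""

# Matches "<Label> time: T sec. (I init + S sieve)".
PATTERN = re.compile(r"(\w+) time: ([\d.]+) sec\. \(([\d.]+) init \+ ([\d.]+) sieve\)")


def phase_times(text: str) -> Dict[str, Tuple[float, float]]:
    """Return {"Elapsed": (init, sieve), "Processor": (init, sieve)}."""
    return {m[1]: (float(m[3]), float(m[4])) for m in PATTERN.finditer(text)}


times = phase_times(STDERR)
wall_init, wall_sieve = times["Elapsed"]
cpu_init, cpu_sieve = times["Processor"]
print(f"init utilization:  {cpu_init / wall_init:.2f}")   # 1.01
print(f"sieve utilization: {cpu_sieve / wall_sieve:.2f}")  # 0.08
```

A sieve utilization this far below 1.0 means the CPU was mostly idle while the GPU worked, which is the opposite of the one-core-per-task behaviour reported later in the thread.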
Working well without invalids on both cores of my 5970 with Catalyst 11.4 and SDK 2.4. It takes about 36 minutes per task with core/memory @ 770/500.
It uses a full CPU core for each task, but it looks like it does that for everyone with multiple ATI GPUs. The default avg_ncpus of 0.79 is too low with multiple ATI GPUs and causes CPU overcommitment unless BOINC is configured to use fewer than all CPU cores. One day I will increase the avg_ncpus value to 1 in an app_info.xml to work around this. For now it's working, and that's good.
<core_client_version>6.12.22</core_client_version>
<
All PPS Sieve WUs are failing with Catalyst 11.6 on Vista 32-bit with an HD4670. They work fine with Catalyst 11.5.
According to ATI GPU (HD4350) Issues... "Scott Brown" wrote: ATI dropped OpenCL support for 4xxx cards in the newest drivers, so older drivers are the only option to get these cards to work at PG.
____________
Best wishes. Knowledge is power. by jjwhalen
Umm...
I have an HD4850 running on 11.9... PPS Sieve is working fine... and GPU-Z says it has OpenCL.
You might need to install the AMD APP SDK for it to work.
Hi,
I run XP Home Service Pack 3 on a 2.66 GHz P4 Northwood CPU, with an HD4670 AGP card installed.
The OCL code runs, but CPU usage is 100% all the time.
Runtime per WU is 2h 35min.
Have a look at my hostid=232541.
Is there any solution to avoid such high CPU utilization?
heinz
rroonnaalldd Volunteer developer Volunteer tester
Is there any solution to avoid such high CPU utilization?
Not really. The CPU utilization comes from the OpenCL code.
Take a look at Milkyway; the same thing happened there on nVidia cards.
____________
Best wishes. Knowledge is power. by jjwhalen
With an AMD 7970, I've got some problems:
20:39:02 (1032): Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 512 multiprocessors (2560 SPUs) on device 0.
Computation Error: no candidates found for p=42070000039579 between 755788 and 1001548.
20:39:03 (1032): called boinc_finish
22:15:22 (2496): Can't set up shared mem: -1. Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 512 multiprocessors (2560 SPUs) on device 0.
Computation Error: no candidates found for p=42070000039579 between 755788 and 1001548.
22:15:23 (2496): called boinc_finish
Drivers: AMD Catalyst 12.1 Preview + OpenCL 1.2 package
2012-01-11 22:08:54 | | ATI GPU 0: ATI unknown (CAL version 1.4.1658, 3072MB, 3033MB available, 11520 GFLOPS peak)
2012-01-11 22:08:54 | | OpenCL: ATI GPU 0: Tahiti (driver version CAL 1.4.1658 (VM), device version OpenCL 1.1 AMD-APP (851.6), 6144MB)
2012-01-11 22:08:54 | | ATI GPU is OpenCL-capable
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
With an AMD 7970, I've got some problems:
Well, I hoped this GCN problem would go away with updated drivers, but after two years, it looks like it hasn't. So I tried working on this again, and found that I can now compile versions that will run on my nVidia card. Not as fast as CUDA, but fast enough to help with debugging.
The good news is that I've figured out a way to double-check whether the card is working correctly on each prime factor, and to recover from errors. So Computation Errors will no longer be fatal, unless your card mis-computes a large portion (12% or more) of tests; finding out how often that happens is largely why I'm putting this alpha version out.
The better news is that I finally got around to porting the current CUDA algorithm to OpenCL. It's only 20% faster than the old algorithm on my nVidia card, but I'm hopeful it will be faster on AMD cards.
The bad news is I don't know if I can fix the problem on GCN cards that started all this, yet. The other bad news is that this double-checking uses more CPU time, so I need to find out if I need to optimize the CPU code. I gather from reading this thread that OpenCL uses one core on AMD cards too? If not, please let me know how much CPU it does use on a fast card. If so, um, just let me know if the new version doesn't run faster than the old one.
What I'd like you to do:
1. If you don't have an AMD card, Radeon 4000-series or later, or your computer is a Mac without Windows or Linux, skip this and go back to whatever you were doing before.
2. The rest of you, please download my test binaries. No, they're not built for BOINC, yet.
3. Please run the binary for your platform, with the following arguments:
-p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60
4. If the app works without errors, and finds that "356799896420257891 | 6125*2^4654257+1", please continue. Otherwise, stop here and report the errors you see in this thread.
5. Please get the old binary, from your BOINC directory, here, or here, and compare how long it takes to run with the same arguments. Its timing may be written to a file called "stderr.txt".
6. If you have a GCN card (7000-series, or R7 or R9-series, or the on-chip graphics of "Kaveri"), I'd like you to try some other timing tests on my new binaries. Try replacing -M2 with -M1, -M4, and maybe -M8. Then, please run each of those tests again with the additional argument "--vecsize=1". This will help me determine if I can make GCN cards faster with different default parameters. Also please report your CPU usage if it wasn't 100% on every test.
7. Please don't forget to report your results here. Thanks for your help!
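Step 4 asks you to confirm that the run reports "356799896420257891 | 6125*2^4654257+1", i.e. that p divides k*2^n+1. If you want to check a reported factor independently of the GPU, modular exponentiation makes that cheap even though k*2^n+1 has over a million digits. A minimal Python sketch; the function name is our own and is not part of tpsieve:

```python
def divides_proth(p: int, k: int, n: int) -> bool:
    """Return True if p divides k*2^n + 1, without building the huge number.

    pow(2, n, p) computes 2^n mod p by repeated squaring, so this needs only
    about log2(n) modular multiplications.
    """
    return (k * pow(2, n, p) + 1) % p == 0


# The factor line the test run is expected to print.
print(divides_proth(356799896420257891, 6125, 4654257))
```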
____________
Asus HD7950
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 28672
356799896420257891 | 6125*2^4654257+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 1 factor
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 4.82 sec. (2.14 init + 2.68 sieve) at 3814776 p/sec.
Processor time: 3.81 sec. (2.12 init + 1.68 sieve) at 6068109 p/sec.
Average processor utilization: 0.99 (init), 0.63 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M1 -c 60
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 28672
356799896420257891 | 6125*2^4654257+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 1 factor
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 4.69 sec. (2.09 init + 2.60 sieve) at 3932155 p/sec.
Processor time: 3.84 sec. (2.11 init + 1.73 sieve) at 5904107 p/sec.
Average processor utilization: 1.01 (init), 0.67 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M4 -c 60
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 7 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 28672
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 4.72 sec. (2.10 init + 2.62 sieve) at 3902138 p/sec.
Processor time: 3.74 sec. (2.11 init + 1.64 sieve) at 6241482 p/sec.
Average processor utilization: 1.00 (init), 0.63 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M8 -c 60
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 3 <= k <= 9995, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 28672
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 4.69 sec. (2.09 init + 2.60 sieve) at 3932154 p/sec.
Processor time: 3.79 sec. (2.11 init + 1.68 sieve) at 6068109 p/sec.
Average processor utilization: 1.01 (init), 0.65 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60 --vecsize=1
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 14336
356799896420257891 | 6125*2^4654257+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 1 factor
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 4.58 sec. (2.09 init + 2.49 sieve) at 4105865 p/sec.
Processor time: 3.79 sec. (2.11 init + 1.68 sieve) at 6068109 p/sec.
Average processor utilization: 1.01 (init), 0.68 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M1 -c 60 --vecsize=1
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 14336
356799896420257891 | 6125*2^4654257+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 1 factor
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 4.71 sec. (2.09 init + 2.62 sieve) at 3902138 p/sec.
Processor time: 3.70 sec. (2.09 init + 1.61 sieve) at 6362679 p/sec.
Average processor utilization: 1.00 (init), 0.61 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M4 -c 60 --vecsize=1
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 7 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 14336
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 4.62 sec. (2.10 init + 2.52 sieve) at 4056986 p/sec.
Processor time: 3.70 sec. (2.09 init + 1.61 sieve) at 6362679 p/sec.
Average processor utilization: 1.00 (init), 0.64 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M8 -c 60 --vecsize=1
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 3 <= k <= 9995, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 14336
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 4.67 sec. (2.11 init + 2.56 sieve) at 3993594 p/sec.
Processor time: 3.68 sec. (2.11 init + 1.58 sieve) at 6488672 p/sec.
Average processor utilization: 1.00 (init), 0.62 (sieve)
For these short tests, GPU usage was only 20%. I can't tell you how much CPU usage there was, but it looks like less than a full CPU core.
Ken_g6 Volunteer developer
Thanks, Rebirther! Now I know a few things:
- The Windows app works! I can't test it here, so I wasn't sure.
- Not all GCN cards are failing with the new app, which is a Very Good Thing.
- --vecsize=1 does produce a speedup on GCN. Now I have to figure out how to test if a card is GCN or not.
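One simple way to tell GCN from the older VLIW Radeons is to match the OpenCL device name, which the new binary already logs ("Device 0 is a Advanced Micro Devices, Inc. Tahiti."). A rough Python sketch, assuming a hand-maintained list of GCN chip codenames; the list is illustrative and incomplete (for example, Kaveri's on-chip graphics may report a different name), and an app would more likely read the name via the OpenCL API (clGetDeviceInfo with CL_DEVICE_NAME) than parse its own log:

```python
import re
from typing import Optional

# Illustrative, incomplete set of GCN chip codenames (HD 7000 and R7/R9 parts).
# Assumption: these match the names AMD's OpenCL driver reports.
GCN_CODENAMES = {"tahiti", "pitcairn", "capeverde", "oland", "hainan", "bonaire", "hawaii"}

# Matches e.g. "Device 0 is a Advanced Micro Devices, Inc. Tahiti."
DEVICE_LINE = re.compile(r"Device \d+ is an? .+?\s(?P<name>\S+)\.$")


def device_name(log_line: str) -> Optional[str]:
    """Return the chip name from a 'Device N is a <vendor> <name>.' line."""
    match = DEVICE_LINE.search(log_line.strip())
    return match.group("name") if match else None


def looks_like_gcn(name: str) -> bool:
    """Heuristic: the device counts as GCN if its codename is on the list."""
    return name.replace(" ", "").lower() in GCN_CODENAMES


name = device_name("Device 0 is a Advanced Micro Devices, Inc. Tahiti.")
print(name, name is not None and looks_like_gcn(name))  # Tahiti True
```

A name list needs updating for every new chip, so it is only a stopgap compared with detecting the architecture directly.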
Just a couple more things:
1. Could you please run the same range on the old app? (Step #5.)
2. If you want to run a longer range, try this:
-p356799891e9 -P356799900e9 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60
Same result, just a bigger range around it.
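The -p/-P values are written as an integer times a power of ten, so "35679989642e7" is 356799896420000000, exactly the lower bound the run prints after "Sieve started:". A Python sketch that expands them (parse_p is a hypothetical helper, assuming the mantissa-e-exponent form used in this thread), showing that the longer range is 900 times the size of the short test and still contains the expected factor:

```python
def parse_p(arg: str) -> int:
    """Turn a -p/-P value such as "35679989642e7" into an exact integer.

    Integer arithmetic avoids float rounding at these 18-digit magnitudes.
    """
    mantissa, _, exponent = arg.lower().partition("e")
    return int(mantissa) * 10 ** int(exponent or 0)


short_lo, short_hi = parse_p("35679989642e7"), parse_p("35679989643e7")
long_lo, long_hi = parse_p("356799891e9"), parse_p("356799900e9")
factor = 356799896420257891

print(short_lo, short_hi)                              # 356799896420000000 356799896430000000
print((long_hi - long_lo) // (short_hi - short_lo))    # 900
print(long_lo <= factor < long_hi)                     # True
```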
____________
Thanks, Rebirther! Now I know a few things:
- The Windows app works! I can't test it here, so I wasn't sure.
- Not all GCN cards are failing with the new app, which is a Very Good Thing.
- --vecsize=1 does produce a speedup on GCN. Now I have to figure out how to test if a card is GCN or not.
Just a couple more things:
1. Could you please run the same range on the old app? (Step #5.)
2. If you want to run a longer range, try this:
-p356799891e9 -P356799900e9 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60
Same result, just a bigger range around it.
Do you have a link for the old app, or is it the old link in the first post?
If you check the output, some runs found 0 factors and some found 1.
The results are based on Catalyst 13.12 on 64-bit Win7.
Edit:
Based on the app from 26.02.2011:
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60
tpsieve version cl-0.2.3e (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
nstep changed to 32
CL setup complete.
cthread_count = 28672
356799896420257891 | 6125*2^4654257+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 1 factor
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 5.55 sec. (2.17 init + 3.38 sieve) at 3026356 p/sec.
Processor time: 3.48 sec. (2.07 init + 1.40 sieve) at 7281731 p/sec.
Average processor utilization: 0.95 (init), 0.42 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M1 -c 60
tpsieve version cl-0.2.3e (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
nstep changed to 32
CL setup complete.
cthread_count = 28672
356799896420257891 | 6125*2^4654257+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 1 factor
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 5.60 sec. (2.14 init + 3.46 sieve) at 2956343 p/sec.
Processor time: 3.63 sec. (2.11 init + 1.53 sieve) at 6687303 p/sec.
Average processor utilization: 0.98 (init), 0.44 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M4 -c 60
tpsieve version cl-0.2.3e (testing)
nstart=3000000, nstep=46
tpsieve initialized: 7 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
nstep changed to 32
CL setup complete.
cthread_count = 28672
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 5.59 sec. (2.13 init + 3.46 sieve) at 2953780 p/sec.
Processor time: 3.57 sec. (2.12 init + 1.45 sieve) at 7046838 p/sec.
Average processor utilization: 1.00 (init), 0.42 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M8 -c 60
tpsieve version cl-0.2.3e (testing)
nstart=3000000, nstep=46
tpsieve initialized: 3 <= k <= 9995, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
nstep changed to 32
CL setup complete.
cthread_count = 28672
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 5.65 sec. (2.14 init + 3.52 sieve) at 2906747 p/sec.
Processor time: 3.57 sec. (2.12 init + 1.45 sieve) at 7046838 p/sec.
Average processor utilization: 0.99 (init), 0.41 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60 --vecsize=1
tpsieve version cl-0.2.3e (testing)
tpsieve-cl-x86-windows.exe: out of range argument --vecsize 1
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M1 -c 60 --vecsize=1
tpsieve version cl-0.2.3e (testing)
tpsieve-cl-x86-windows.exe: out of range argument --vecsize 1
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M4 -c 60 --vecsize=1
tpsieve version cl-0.2.3e (testing)
tpsieve-cl-x86-windows.exe: out of range argument --vecsize 1
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M8 -c 60 --vecsize=1
tpsieve version cl-0.2.3e (testing)
tpsieve-cl-x86-windows.exe: out of range argument --vecsize 1
Ken_g6 Volunteer developer
Do you have a link for the old app, or is it the old link in the first post?
5. Please get the old binary, from your BOINC directory, here, or here, and compare how long it takes to run with the same arguments. Its timing may be written to a file called "stderr.txt".
If you check the output, some runs found 0 factors and some found 1.
Whoa! And no Computation Errors reported either!
Well, whatever the problem is, I can reproduce it here with those particular command lines. So I ought to be able to fix it...
____________
Oh, we posted at nearly the same time. See the update below; I hope it was the right app.
Ken_g6 Volunteer developer
*facepalm* *doh*
-M needs to always be 2.
Try -m (lower-case) instead, with -m4 and -m16.
Edit: P.S. No, the times are not the same at all! The new algorithm is 33% faster on your machine!
CPU usage also went up by about 20%, after adjusting for the proportional speed increase. I expect some people may be CPU-limited if I release it like this. I'm going to try optimizing the CPU usage with SSE2 multiplication code at some point before release.
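Because every run in Rebirther's logs covered the same p range, both figures can be recomputed directly: the speed ratio is the ratio of the sieve p/sec rates, and CPU cost per unit of work reduces to the ratio of sieve CPU seconds. A Python sketch using the -M2 runs (the only -M value said to be valid). Which runs Ken actually compared isn't stated, so the speed ratio here need not equal his 33% exactly; the two new-app runs come out at roughly 1.26x and 1.36x:

```python
from typing import Dict, Tuple

# From the -M2 logs above: (sieve p/sec, sieve CPU seconds). Same p range in every run.
RUNS: Dict[str, Tuple[float, float]] = {
    "old cl-0.2.3e": (3026356, 1.40),
    "new cl-0.2.5-alpha": (3814776, 1.68),
    "new cl-0.2.5-alpha --vecsize=1": (4105865, 1.68),
}


def compare(baseline: str, candidate: str) -> Tuple[float, float]:
    """Return (speed ratio, CPU-cost ratio) of candidate relative to baseline.

    With an identical p range, CPU seconds per unit of work reduce to a plain
    ratio of sieve CPU times.
    """
    base_rate, base_cpu = RUNS[baseline]
    cand_rate, cand_cpu = RUNS[candidate]
    return cand_rate / base_rate, cand_cpu / base_cpu


for name in list(RUNS)[1:]:
    speed, cpu = compare("old cl-0.2.3e", name)
    print(f"{name}: {speed:.2f}x speed, {cpu:.2f}x CPU per unit of work")
```

The 1.20x CPU ratio is the "about 20%" increase mentioned above.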
____________
If these are the correct command lines...
older app
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -m2 -c 60
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 3 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 7168
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 6.71 sec. (2.18 init + 4.54 sieve) at 2253847 p/sec.
Processor time: 3.85 sec. (2.17 init + 1.68 sieve) at 6068109 p/sec.
Average processor utilization: 1.00 (init), 0.37 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -m4 -c 60
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 3 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 14336
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 5.20 sec. (2.18 init + 3.02 sieve) at 3382963 p/sec.
Processor time: 3.95 sec. (2.18 init + 1.76 sieve) at 5799610 p/sec.
Average processor utilization: 1.00 (init), 0.58 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -m16 -c 60
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 3 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 57344
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 4.85 sec. (2.17 init + 2.68 sieve) at 3813225 p/sec.
Processor time: 4.27 sec. (2.17 init + 2.11 sieve) at 4854486 p/sec.
Average processor utilization: 1.00 (init), 0.79 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -m2 -c 60 --vecsize=1
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 3 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 3584
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 7.11 sec. (2.17 init + 4.95 sieve) at 2065765 p/sec.
Processor time: 3.82 sec. (2.15 init + 1.67 sieve) at 6124819 p/sec.
Average processor utilization: 0.99 (init), 0.34 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -m4 -c 60 --vecsize=1
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 3 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 7168
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 5.18 sec. (2.17 init + 3.01 sieve) at 3396470 p/sec.
Processor time: 3.78 sec. (2.18 init + 1.59 sieve) at 6425058 p/sec.
Average processor utilization: 1.01 (init), 0.53 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -m16 -c 60 --vecsize=1
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 3 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 28672
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 4.63 sec. (2.17 init + 2.46 sieve) at 4157498 p/sec.
Processor time: 3.79 sec. (2.15 init + 1.64 sieve) at 6241482 p/sec.
Average processor utilization: 0.99 (init), 0.67 (sieve)
new app
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -m2 -c 60
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 3 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 7168
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 6.68 sec. (2.15 init + 4.53 sieve) at 2259329 p/sec.
Processor time: 3.81 sec. (2.15 init + 1.65 sieve) at 6182600 p/sec.
Average processor utilization: 1.00 (init), 0.37 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -m4 -c 60
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 3 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 14336
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 5.23 sec. (2.17 init + 3.06 sieve) at 3337681 p/sec.
Processor time: 3.92 sec. (2.15 init + 1.76 sieve) at 5799607 p/sec.
Average processor utilization: 0.99 (init), 0.58 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -m16 -c 60
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 3 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 57344
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 4.81 sec. (2.17 init + 2.65 sieve) at 3863659 p/sec.
Processor time: 4.21 sec. (2.18 init + 2.03 sieve) at 5041198 p/sec.
Average processor utilization: 1.01 (init), 0.77 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -m2 -c 60 --vecsize=1
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 3 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 3584
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 7.10 sec. (2.17 init + 4.93 sieve) at 2072885 p/sec.
Processor time: 3.84 sec. (2.17 init + 1.67 sieve) at 6124819 p/sec.
Average processor utilization: 1.00 (init), 0.34 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -m4 -c 60 --vecsize=1
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 3 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 7168
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 5.14 sec. (2.19 init + 2.95 sieve) at 3467906 p/sec.
Processor time: 3.79 sec. (2.20 init + 1.59 sieve) at 6425058 p/sec.
Average processor utilization: 1.01 (init), 0.54 (sieve)
f:\test>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -m16 -c 60 --vecsize=1
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 3 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 28672
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 0 factors
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 4.62 sec. (2.19 init + 2.43 sieve) at 4203649 p/sec.
Processor time: 3.90 sec. (2.18 init + 1.72 sieve) at 5957780 p/sec.
Average processor utilization: 1.00 (init), 0.71 (sieve)
|
|
|
|
|
*facepalm* *doh*
-M needs to always be 2.
Try -m (lower-case), with -m4, and -m16.
Edit: P.S. No, the times are not the same at all! The new algorithm is 33% faster on your machine!
After adjusting for the proportional speed increase, CPU usage also went up by about 20%. I expect some people may be CPU-limited if I release it like this, so I'm gonna go ahead and try optimizing the CPU usage with SSE2 multiplication code at some point before release.
lol, I meant the posting time, not the crunching time ;)
I forgot the -M option, but it looks OK for all 6 settings: every run found the same number of factors and reported the same count. |
|
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,621,444 RAC: 21
|
- The Windows app works! I can't test it here, so I wasn't sure.
- Not all GCN cards are failing with the new app, which is a Very Good Thing.
- --vecsize=1 does produce a speedup on GCN. Now I have to figure out how to test if a card is GCN or not.
Just a couple more things:
1. Could you please run the same range on the old app? (Step #5.)
2. If you want to run a longer range, try this:
-p356799891e9 -P356799900e9 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60
Same result, just a bigger range around it.
What is the -T command line option? Not mentioned in the help.
AMD 7970, X6 1100T
0.68 CPU cores, 82% GPU load:
>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 512 multiprocessors (2560 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 32768
356799896420257891 | 6125*2^4654257+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 1 factor
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 5.55 sec. (2.53 init + 3.03 sieve) at 3378109 p/sec.
Processor time: 4.98 sec. (2.53 init + 2.45 sieve) at 4174242 p/sec.
Average processor utilization: 1.00 (init), 0.81 (sieve)
>tpsieve-cl-x86-windows.exe -p356799891e9 -P356799900e9 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799891000000000 <= p < 356799900000000000
Thread 0 starting
Detected 512 multiprocessors (2560 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 32768
p=356799891563871745, 9.398M p/sec, 0.68 CPU cores, 6.3% done. ETA 26 Jan 11:55
p=356799892149239297, 9.708M p/sec, 0.68 CPU cores, 12.8% done. ETA 26 Jan 11:55
p=356799892737490433, 9.758M p/sec, 0.68 CPU cores, 19.3% done. ETA 26 Jan 11:55
p=356799893328363009, 9.801M p/sec, 0.68 CPU cores, 25.9% done. ETA 26 Jan 11:55
p=356799893919235585, 9.807M p/sec, 0.69 CPU cores, 32.4% done. ETA 26 Jan 11:55
p=356799894506438145, 9.746M p/sec, 0.68 CPU cores, 39.0% done. ETA 26 Jan 11:55
p=356799895085252097, 9.615M p/sec, 0.66 CPU cores, 45.4% done. ETA 26 Jan 11:55
p=356799895672454657, 9.718M p/sec, 0.68 CPU cores, 51.9% done. ETA 26 Jan 11:55
p=356799896253890049, 9.642M p/sec, 0.67 CPU cores, 58.4% done. ETA 26 Jan 11:55
356799896420257891 | 6125*2^4654257+1
p=356799896840830465, 9.744M p/sec, 0.68 CPU cores, 64.9% done. ETA 26 Jan 11:55
p=356799897431440897, 9.789M p/sec, 0.68 CPU cores, 71.5% done. ETA 26 Jan 11:55
p=356799898010254849, 9.601M p/sec, 0.67 CPU cores, 77.9% done. ETA 26 Jan 11:55
p=356799898589068801, 9.595M p/sec, 0.67 CPU cores, 84.3% done. ETA 26 Jan 11:55
p=356799899170504193, 9.638M p/sec, 0.68 CPU cores, 90.8% done. ETA 26 Jan 11:55
p=356799899745910273, 9.557M p/sec, 0.68 CPU cores, 97.2% done. ETA 26 Jan 11:55
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799891000000000 <= p < 356799900000000000
Found 1 factor
count=222687271,sum=0x11a0ce66e3586af7
Elapsed time: 933.80 sec. (2.53 init + 931.27 sieve) at 9664464 p/sec.
Processor time: 632.99 sec. (2.53 init + 630.46 sieve) at 14275537 p/sec.
Average processor utilization: 1.00 (init), 0.68 (sieve)
I ran the following tests only long enough to read the CPU and GPU load, not to completion:
-m2 I get 0.30 CPU cores, 88% GPU load.
-m6 I get 0.54 CPU cores, 84% GPU load.
--vecsize=1 I get 0.64 CPU cores, 84% GPU load.
-m2 --vecsize=1 I get 0.24 CPU cores, 79% GPU load.
-m4 --vecsize=1 I get 0.48 CPU cores, 85% GPU load.
-m6 --vecsize=1 I get 0.51 CPU cores, 85% GPU load.
-m16 --vecsize=1 I get 0.70 CPU cores, 83% GPU load.
The -m value is multiplied by 2048 to give cthread_count.
-m16 appears to be the default.
The obvious conclusion is to start cooking with gas:
-m64 --vecsize=1 I get 0.87 CPU cores, 97% GPU load:
>tpsieve-cl-x86-windows.exe -p356799891e9 -P356799900e9 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60 -m64 --vecsize=1
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799891000000000 <= p < 356799900000000000
Thread 0 starting
Detected 512 multiprocessors (2560 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 131072
p=356799891780140545, 13.00M p/sec, 0.83 CPU cores, 8.7% done. ETA 26 Jan 12:59
p=356799892602224129, 13.52M p/sec, 0.85 CPU cores, 17.8% done. ETA 26 Jan 12:59
p=356799893418540545, 13.45M p/sec, 0.85 CPU cores, 26.9% done. ETA 26 Jan 12:59
p=356799894245080577, 13.60M p/sec, 0.85 CPU cores, 36.1% done. ETA 26 Jan 12:59
p=356799895061134849, 13.39M p/sec, 0.84 CPU cores, 45.1% done. ETA 26 Jan 12:59
p=356799895871946241, 13.29M p/sec, 0.84 CPU cores, 54.1% done. ETA 26 Jan 12:59
356799896420257891 | 6125*2^4654257+1
p=356799896687476225, 13.41M p/sec, 0.85 CPU cores, 63.2% done. ETA 26 Jan 12:59
p=356799897501171201, 13.41M p/sec, 0.84 CPU cores, 72.2% done. ETA 26 Jan 12:59
p=356799898317487617, 13.44M p/sec, 0.85 CPU cores, 81.3% done. ETA 26 Jan 12:59
p=356799899127774721, 13.31M p/sec, 0.83 CPU cores, 90.3% done. ETA 26 Jan 12:59
p=356799899943828993, 13.41M p/sec, 0.84 CPU cores, 99.4% done. ETA 26 Jan 12:59
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799891000000000 <= p < 356799900000000000
Found 1 factor
count=222687271,sum=0x11a0ce66e3586af7
Elapsed time: 676.24 sec. (2.51 init + 673.73 sieve) at 13358792 p/sec.
Processor time: 570.17 sec. (2.51 init + 567.66 sieve) at 15854995 p/sec.
Average processor utilization: 1.00 (init), 0.84 (sieve)
>tpsieve-cl-x86-windows.exe -p356799891e9 -P356799900e9 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60 -m64
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799891000000000 <= p < 356799900000000000
Thread 0 starting
Detected 512 multiprocessors (2560 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 262144
p=356799891752091137, 12.53M p/sec, 0.81 CPU cores, 8.4% done. ETA 26 Jan 13:14
p=356799892525678081, 12.51M p/sec, 0.80 CPU cores, 17.0% done. ETA 26 Jan 13:14
p=356799893299265025, 12.53M p/sec, 0.80 CPU cores, 25.5% done. ETA 26 Jan 13:14
p=356799894072851969, 12.51M p/sec, 0.80 CPU cores, 34.1% done. ETA 26 Jan 13:14
p=356799894846438913, 12.51M p/sec, 0.80 CPU cores, 42.7% done. ETA 26 Jan 13:14
p=356799895620025857, 12.51M p/sec, 0.79 CPU cores, 51.3% done. ETA 26 Jan 13:14
p=356799896393612801, 12.45M p/sec, 0.79 CPU cores, 59.9% done. ETA 26 Jan 13:14
356799896420257891 | 6125*2^4654257+1
p=356799897165626881, 12.42M p/sec, 0.78 CPU cores, 68.5% done. ETA 26 Jan 13:14
p=356799897939475969, 12.58M p/sec, 0.80 CPU cores, 77.1% done. ETA 26 Jan 13:14
p=356799898713062913, 12.47M p/sec, 0.78 CPU cores, 85.7% done. ETA 26 Jan 13:14
p=356799899486387713, 12.46M p/sec, 0.78 CPU cores, 94.3% done. ETA 26 Jan 13:14
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799891000000000 <= p < 356799900000000000
Found 1 factor
count=222687271,sum=0x11a0ce66e3586af7
Elapsed time: 725.07 sec. (2.53 init + 722.54 sieve) at 12456272 p/sec.
Processor time: 574.58 sec. (2.53 init + 572.06 sieve) at 15733067 p/sec.
Average processor utilization: 1.00 (init), 0.79 (sieve)
<Edit> Updated after I realised I had resumed from a checkpoint.
Noticeably quicker with -m64 than with the default.
A bit slower on the AMD 7970 without --vecsize=1. |
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
The -m value is multiplied by 2048 to give cthread_count.
Technically, the -m value is multiplied by the multiprocessor count times four.
-m16 appears to be the default.
Actually, it's -m8 by default. (Unless I'm misreading my own code, which has happened quite a lot lately.)
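Ken's description can be checked against the logged values. A minimal Python sketch, assuming cthread_count = multiprocessors * 4 * m * vecsize. The vecsize factor and the default vecsize of 2 are inferences from the logs (for example, 3584 with --vecsize=1 versus 7168 without it on the 448-multiprocessor card), not something stated in the thread, and `cthread_count` is a hypothetical helper, not a tpsieve function.

```python
def cthread_count(multiprocessors: int, m: int = 8, vecsize: int = 2) -> int:
    """Hypothetical reconstruction of tpsieve's work-size calculation.

    Assumes cthread_count = multiprocessors * 4 * m * vecsize, with the
    defaults m=8 and vecsize=2 inferred from the logs in this thread.
    """
    return multiprocessors * 4 * m * vecsize


# (multiprocessors, m, vecsize) -> cthread_count as printed in the logs above
observed = [
    ((448, 2, 2), 7168),     # -m2
    ((448, 16, 1), 28672),   # -m16 --vecsize=1
    ((512, 8, 2), 32768),    # Roger's 7970 at the default settings
    ((512, 64, 1), 131072),  # -m64 --vecsize=1
    ((320, 8, 2), 20480),    # HD5870 at the default settings
]
for (mps, m, vec), expected in observed:
    assert cthread_count(mps, m, vec) == expected, (mps, m, vec)
print("all logged cthread_count values reproduced")
```

Under this model, Roger's factor of 2048 is just 512 multiprocessors * 4 on his card, and his default of 32768 = 2048 * 16 fits equally well as m=8 with vecsize=2.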
The obvious conclusion is to start cooking with gas:
-m64 --vecsize=1 I get 0.87 CPU cores, 97% GPU load:
Awesome!
About 300% quicker with -m64 than default.
I think you may have misinterpreted something here:
Elapsed time: 333.87 sec. (2.54 init + 331.33 sieve) at 13502319 p/sec.
The second number here (the p/sec rate) gives a correct estimate; the first (the elapsed time) does not. Why? Because:
Resuming from checkpoint p=356799895526440449 in tpcheck356799891e9.txt
You might want to delete tpcheck* files between runs. ;)
Still, you got a very impressive 40% improvement over the default settings.
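The arithmetic behind this can be worked through directly. A sketch using only numbers quoted in this thread: the resumed run covered the range from the checkpoint to -P, so its p/sec figure is valid but its elapsed time is not comparable to a full run. The constant names are illustrative only.

```python
P_START = 356799891 * 10**9       # -p356799891e9
P_END = 356799900 * 10**9         # -P356799900e9
RESUME = 356799895526440449       # "Resuming from checkpoint p=..."

# The resumed run only sieved from the checkpoint to P_END in 331.33 s.
resumed_rate = (P_END - RESUME) / 331.33

# Extrapolate a full 9e9 run from the reported rate of 13502319 p/sec.
full_time = (P_END - P_START) / 13502319
default_time = 933.80             # Roger's full run at the default settings

naive_ratio = default_time / 333.87   # misleading: compares against a partial run
speedup = default_time / full_time - 1

print(f"rate over resumed range: {resumed_rate / 1e6:.2f}M p/sec")
print(f"extrapolated full run: {full_time:.1f} s")
print(f"naive elapsed ratio: {naive_ratio:.1f}x, actual speedup: {speedup:.0%}")
```

The naive comparison gives roughly 2.8x, while the extrapolated full run of about 667 s gives the roughly 40% improvement; Roger's later full run (676.24 s at 13358792 p/sec) lands close to that extrapolation.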
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13633 ID: 53948 Credit: 280,904,358 RAC: 40,710
|
Elapsed time: 333.87 sec. (2.54 init + 331.33 sieve) at 13502319 p/sec.
The second number here (the p/sec rate) gives a correct estimate; the first (the elapsed time) does not. Why?
Complete wild-assed guess here, but this sounds a lot like a problem that had a lot of really smart people stumped when GeneferCUDA first came out. We couldn't figure out why the benchmarks were all messed up on the Linux build.
The underlying problem didn't matter so much with the CPU version of Genefer, but it did with the GPU version. The standard C library has functions for measuring time, and the same functions exist in both the Linux and Windows (MSVC) compilers, except that on one platform they measure elapsed (wall-clock) time and on the other they measure CPU time. Same exact function, but it behaves differently depending on whether you're on Windows or Linux. For a CPU program running at 100% utilization it doesn't matter much which one you're measuring, but for a GPU app it does.
Sorry, I don't recall exactly which function it was, and as you know I'm in the process of rebuilding my dev machine, so I can't easily look up the details at the moment. Given that the benchmarks worked correctly on Windows but not on Linux, I suspect that Linux was measuring CPU time and Windows was measuring elapsed time.
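To make the distinction concrete: elapsed (wall-clock) time and processor time diverge whenever a program waits, as a GPU app does while a kernel runs. One well-known C example of platform-dependent behavior is `clock()`, which MSVC documents as returning wall-clock time since process start, whereas ISO C specifies processor time; whether that was the function involved with GeneferCUDA isn't confirmed here. The sketch below uses Python's `time.perf_counter()` (wall clock) and `time.process_time()` (this process's CPU time), with `time.sleep` as a hypothetical stand-in for blocking on a GPU kernel.

```python
import time


def measure(work):
    """Run work() and return (elapsed_s, cpu_s, utilization)."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time consumed by this process
    work()
    elapsed = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return elapsed, cpu, (cpu / elapsed if elapsed > 0 else 0.0)


def busy_cpu():
    # CPU-bound loop: processor time tracks elapsed time closely.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def wait_for_gpu():
    # Stand-in for waiting on a GPU kernel: almost no CPU time is used.
    time.sleep(0.2)


for name, job in [("cpu-bound", busy_cpu), ("gpu-like wait", wait_for_gpu)]:
    elapsed, cpu, util = measure(job)
    print(f"{name}: elapsed {elapsed:.2f} s, cpu {cpu:.2f} s, utilization {util:.2f}")
```

A timer that silently reports CPU time would make the waiting job look nearly instantaneous, which is the kind of skew that breaks GPU benchmarks.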
____________
My lucky number is 75898^524288+1 |
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
|
The question was rhetorical. I edited my post to make that clear. |
|
|
pschoefer Volunteer developer Volunteer tester
Joined: 20 Sep 05 Posts: 667 ID: 845 Credit: 2,374,701,989 RAC: 15,281
|
|
R9-280X @ 1 GHz, i7-980X @ 4 GHz (HT enabled), Catalyst 13.12, Win7 x64
Old version:
>tpsieve-cl-x86-windows_old.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60
tpsieve version cl-0.2.3e (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 512 multiprocessors (2560 SPUs) on device 0.
nstep changed to 32
CL setup complete.
cthread_count = 32768
356799896420257891 | 6125*2^4654257+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 1 factor
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 6.21 sec. (2.88 init + 3.32 sieve) at 3075521 p/sec.
Processor time: 4.68 sec. (2.89 init + 1.79 sieve) at 5698744 p/sec.
Average processor utilization: 1.00 (init), 0.54 (sieve)
New version with default settings:
>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 512 multiprocessors (2560 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
nstep changed to 32
CL setup complete.
cthread_count = 32768
356799896420257891 | 6125*2^4654257+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 1 factor
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 5.69 sec. (2.97 init + 2.73 sieve) at 3748820 p/sec.
Processor time: 5.19 sec. (2.93 init + 2.26 sieve) at 4519694 p/sec.
Average processor utilization: 0.99 (init), 0.83 (sieve)
GPU load was ~64% in both cases; CPU load according to Task Manager matches the "Average processor utilization".
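The "Average processor utilization" line appears to be processor time divided by elapsed time for each phase. That relationship is inferred from the numbers, not taken from tpsieve's source, and because the printed times are rounded to 0.01 s a recomputed ratio can occasionally differ in the last digit. A small sketch that recomputes it from a pair of log lines; `utilization` is a hypothetical helper.

```python
import re

# Matches the "(X init + Y sieve)" part of the Elapsed/Processor time lines.
TIME_RE = re.compile(r"\(([\d.]+) init \+ ([\d.]+) sieve\)")


def utilization(elapsed_line: str, processor_line: str):
    """Return (init, sieve) utilization as processor time / elapsed time."""
    e_init, e_sieve = map(float, TIME_RE.search(elapsed_line).groups())
    p_init, p_sieve = map(float, TIME_RE.search(processor_line).groups())
    return p_init / e_init, p_sieve / e_sieve


# Lines from the default-settings run on the R9-280X above
elapsed = "Elapsed time: 5.69 sec. (2.97 init + 2.73 sieve) at 3748820 p/sec."
processor = "Processor time: 5.19 sec. (2.93 init + 2.26 sieve) at 4519694 p/sec."
init_u, sieve_u = utilization(elapsed, processor)
print(f"{init_u:.2f} (init), {sieve_u:.2f} (sieve)")  # 0.99 (init), 0.83 (sieve)
```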
New version with different -m settings:
-m1:
Elapsed time: 11.08 sec. (2.88 init + 8.20 sieve) at 1246407 p/sec.
Processor time: 4.98 sec. (2.89 init + 2.09 sieve) at 4890716 p/sec.
Average processor utilization: 1.00 (init), 0.25 (sieve)
-m2:
Elapsed time: 7.73 sec. (2.88 init + 4.85 sieve) at 2106538 p/sec.
Processor time: 5.05 sec. (2.89 init + 2.17 sieve) at 4714790 p/sec.
Average processor utilization: 1.00 (init), 0.45 (sieve)
-m4:
Elapsed time: 5.99 sec. (2.88 init + 3.11 sieve) at 3287148 p/sec.
Processor time: 5.12 sec. (2.89 init + 2.23 sieve) at 4582908 p/sec.
Average processor utilization: 1.00 (init), 0.72 (sieve)
-m16:
Elapsed time: 5.52 sec. (2.87 init + 2.64 sieve) at 3867965 p/sec.
Processor time: 4.96 sec. (2.85 init + 2.11 sieve) at 4854489 p/sec.
Average processor utilization: 0.99 (init), 0.80 (sieve)
-m32:
Elapsed time: 5.64 sec. (2.88 init + 2.77 sieve) at 3695962 p/sec.
Processor time: 5.10 sec. (2.89 init + 2.22 sieve) at 4615182 p/sec.
Average processor utilization: 1.00 (init), 0.80 (sieve)
-m64:
Elapsed time: 6.01 sec. (2.88 init + 3.12 sieve) at 3273465 p/sec.
Processor time: 5.09 sec. (2.90 init + 2.18 sieve) at 4681113 p/sec.
Average processor utilization: 1.01 (init), 0.70 (sieve)
In all cases, the factor was found and GPU load again ~64%.
New version with different -m settings and --vecsize=1:
-m4:
Elapsed time: 6.29 sec. (2.92 init + 3.37 sieve) at 3035342 p/sec.
Processor time: 4.66 sec. (2.61 init + 2.06 sieve) at 4964817 p/sec.
Average processor utilization: 0.89 (init), 0.61 (sieve)
-m8:
Elapsed time: 5.68 sec. (2.88 init + 2.80 sieve) at 3645875 p/sec.
Processor time: 5.04 sec. (2.89 init + 2.15 sieve) at 4748955 p/sec.
Average processor utilization: 1.00 (init), 0.77 (sieve)
-m16:
Elapsed time: 5.37 sec. (2.87 init + 2.50 sieve) at 4087578 p/sec.
Processor time: 5.01 sec. (2.87 init + 2.14 sieve) at 4783618 p/sec.
Average processor utilization: 1.00 (init), 0.85 (sieve)
-m32:
Elapsed time: 5.41 sec. (2.88 init + 2.52 sieve) at 4055150 p/sec.
Processor time: 4.96 sec. (2.85 init + 2.11 sieve) at 4854489 p/sec.
Average processor utilization: 0.99 (init), 0.84 (sieve)
Again, factor found and GPU load ~64% in all cases.
I'll try it on an old HD5870 later today. |
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
I'll try it on an old HD5870 later today.
Thanks, I look forward to it. I'd like to see how performance has changed on older cards. At some point, I'd also like to make sure this app can still run on 4000-series cards.
I think it's becoming clear that a longer test range is necessary. It doesn't have to be the one Roger did, but it needs to be bigger than that quick 5-second one. Perhaps "-p356799896e9 -P356799897e9" would be a good compromise? That's 100 times as long as the really short range, but 1/9 the size of the long range I suggested. It should take at least a minute on even a fast card, but hopefully it won't take hours on a slow card.
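The range sizes quoted here follow from the -p/-P arguments. A sketch, assuming bounds are an integer mantissa with a base-10 exponent, which reproduces the values echoed in the "Sieve started" lines (e.g. 35679989642e7 becomes 356799896420000000); `parse_bound` is a hypothetical helper, not tpsieve's parser.

```python
def parse_bound(arg: str) -> int:
    """Parse a bound like '35679989642e7' or '356799896e9' as an exact integer."""
    mantissa, _, exponent = arg.lower().partition("e")
    return int(mantissa) * 10 ** int(exponent or "0")


short = parse_bound("35679989643e7") - parse_bound("35679989642e7")
moderate = parse_bound("356799897e9") - parse_bound("356799896e9")
long_range = parse_bound("356799900e9") - parse_bound("356799891e9")

print(short, moderate, long_range)                # 10000000 1000000000 9000000000
print(moderate // short, long_range // moderate)  # 100 9

# Rough sieve time for the moderate range at a given rate (p/sec).
for rate in (5e6, 10e6):
    print(f"{rate / 1e6:.0f}M p/sec -> about {moderate / rate:.0f} s")
```

Integer arithmetic is used deliberately: bounds around 3.6e17 exceed the 53-bit precision of a double, so float parsing could shift them.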
At this point, please do test -m16, -m32, and -m64, as well as the default. I'm trying to narrow down what the default settings should be for both older cards and GCN cards, and if they should be different. I think they should be, which means that, apparently, I'll need to make a list of GCN cards.
|
|
|
pschoefer Volunteer developer Volunteer tester
Joined: 20 Sep 05 Posts: 667 ID: 845 Credit: 2,374,701,989 RAC: 15,281
|
|
HD5870 @ 850 MHz, Q9550 @ 2.83 GHz, Catalyst 13.12, Win7 x64
Old version:
>tpsieve-cl-x86-windows_old.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60
tpsieve version cl-0.2.3e (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 0.
nstep changed to 32
CL setup complete.
cthread_count = 20480
356799896420257891 | 6125*2^4654257+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 1 factor
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 6.45 sec. (2.61 init + 3.83 sieve) at 2667110 p/sec.
Processor time: 4.40 sec. (2.62 init + 1.78 sieve) at 5748733 p/sec.
Average processor utilization: 1.00 (init), 0.46 (sieve)
New version with default settings:
>tpsieve-cl-x86-windows.exe -p35679989642e7 -P35679989643e7 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60
tpsieve version cl-0.2.5-alpha (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896420000000 <= p < 356799896430000000
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Cypress.
nstep changed to 32
CL setup complete.
cthread_count = 20480
356799896420257891 | 6125*2^4654257+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896420000000 <= p < 356799896430000000
Found 1 factor
count=247653,sum=0x24ec7a976776d2ff
Elapsed time: 6.38 sec. (2.66 init + 3.72 sieve) at 2748865 p/sec.
Processor time: 4.88 sec. (2.65 init + 2.23 sieve) at 4582908 p/sec.
Average processor utilization: 1.00 (init), 0.60 (sieve)
CPU usage according to Task Manager matches the reported average processor utilization. GPU load is just ~33%, but goes higher (~90%) on the longer test range.
Different -m settings and --vecsize=1 don't improve performance for this card. |
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
|
|
I've uploaded a second alpha version in the same place. It should use less CPU, but it does require SSE2. I'm hoping this won't be a problem; I can't see anyone using an OpenCL-capable AMD card with an Athlon XP or a Pentium III. The newest Athlon XP is at least three years older than the oldest OpenCL-capable AMD card. If necessary I might be able to enable older processors, but it would be complicated, especially on Windows.
Please remember to use at least the moderate P range when testing:
tpsieve-cl-x86-windows.exe -p356799896e9 -P356799897e9 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60
The smaller range I gave earlier is just too small to get accurate results.
This version should also auto-detect your GCN-based cards and adjust for them: It multiplies the given -m by 8 (so the default of -m8 now acts like -m64 did) and divides the given --vecsize by 2 (for a default of 1).
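The GCN adjustment can be modelled the same way. A sketch, assuming the behavior is exactly as described above (multiply -m by 8, halve --vecsize) and reusing the multiprocessors * 4 * m * vecsize relationship inferred from the earlier cthread_count logs. Both helpers are hypothetical, and since the handling of an odd --vecsize on GCN isn't described, the sketch rejects it rather than guess.

```python
def gcn_adjusted(m: int = 8, vecsize: int = 2, is_gcn: bool = False):
    """Hypothetical model of the alpha2 GCN adjustment: -m x8, --vecsize /2."""
    if not is_gcn:
        return m, vecsize
    if vecsize % 2:
        raise ValueError("odd --vecsize on GCN is not covered by the description")
    return m * 8, vecsize // 2


def cthread_count(multiprocessors: int, m: int, vecsize: int) -> int:
    # Relationship inferred from logged values (an assumption, not tpsieve code).
    return multiprocessors * 4 * m * vecsize


m, vec = gcn_adjusted(is_gcn=True)              # defaults on a GCN card
print(m, vec, cthread_count(448, m, vec))       # 64 1 114688
print(gcn_adjusted(1, 4, is_gcn=True))          # (8, 2): back to the old defaults
```

Under this model, passing -m1 --vecsize=4 on a GCN card yields the same effective settings as the old defaults (-m8 with vecsize 2).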
So, I need someone to test this to make sure all of the above worked as intended - so that GPU performance is optimal without -m and --vecsize options, and CPU usage is reduced. I'd also still like to get a test on a 4000-series card, and some non-Tahiti GCN cards would be nice. But I suspect the next step is figuring out how to build BOINC on Windows again.
|
|
|
|
|
|
Asus HD7950
new one
f:\test>tpsieve-cl-x86-windows.exe -p356799896e9 -P356799897e9 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60
tpsieve version cl-0.2.5-alpha2 (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896000000000 <= p < 356799897000000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
GCN device detected; use -m1 --vecsize=4 to undo effect
nstep changed to 32
CL setup complete.
cthread_count = 114688
356799896420257891 | 6125*2^4654257+1
p=356799896612106241, 10.19M p/sec, 0.78 CPU cores, 61.2% done. ETA 27 Jan 16:52
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896000000000 <= p < 356799897000000000
Found 1 factor
count=24741711,sum=0x29445f772863e9fd
Elapsed time: 101.90 sec. (2.18 init + 99.72 sieve) at 10028972 p/sec.
Processor time: 80.17 sec. (2.06 init + 78.11 sieve) at 12803523 p/sec.
Average processor utilization: 0.95 (init), 0.78 (sieve)
old one
f:\test>tpsieve-cl-x86-windows.exe -p356799896e9 -P356799897e9 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60
tpsieve version cl-0.2.3e (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896000000000 <= p < 356799897000000000
Thread 0 starting
Detected 448 multiprocessors (2240 SPUs) on device 0.
nstep changed to 32
CL setup complete.
cthread_count = 28672
p=356799896288620545, 4.810M p/sec, 0.20 CPU cores, 28.9% done. ETA 27 Jan 16:58
356799896420257891 | 6125*2^4654257+1
p=356799896583008257, 4.863M p/sec, 0.19 CPU cores, 58.3% done. ETA 27 Jan 16:58
p=356799896876347393, 4.844M p/sec, 0.19 CPU cores, 87.6% done. ETA 27 Jan 16:58
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896000000000 <= p < 356799897000000000
Found 1 factor
count=24741711,sum=0x29445f772863e9fd
Elapsed time: 209.33 sec. (2.08 init + 207.25 sieve) at 4825399 p/sec.
Processor time: 41.78 sec. (2.09 init + 39.69 sieve) at 25199387 p/sec.
Average processor utilization: 1.01 (init), 0.19 (sieve)
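Both builds report the same factor and the same count/sum checksum, which suggests they tested the same candidates. A reported factor can also be checked independently on the CPU with modular exponentiation. The sketch below is a minimal Python example, not part of tpsieve; the helper names `divides_k2n_plus1` and `parse_factor_line` are hypothetical, and it assumes only the `p | k*2^n+1` line format shown in these logs.

```python
def divides_k2n_plus1(p: int, k: int, n: int) -> bool:
    """Return True if p divides k*2^n + 1.

    pow(2, n, p) reduces modulo p at every step, so the
    multi-million-digit number k*2^n + 1 is never built.
    """
    return (k * pow(2, n, p) + 1) % p == 0


def parse_factor_line(line: str) -> tuple[int, int, int]:
    """Parse 'p | k*2^n+1' (the format printed in the logs) into (p, k, n).

    Hypothetical helper: only the '+1' form is handled.
    """
    p_str, expr = (part.strip() for part in line.split("|"))
    if not expr.endswith("+1"):
        raise ValueError(f"unsupported form: {expr!r}")
    k_str, n_str = expr[:-2].split("*2^")
    return int(p_str), int(k_str), int(n_str)


if __name__ == "__main__":
    line = "356799896420257891 | 6125*2^4654257+1"
    p, k, n = parse_factor_line(line)
    print(line, "->", "valid" if divides_k2n_plus1(p, k, n) else "INVALID")
```

The check costs about log2(n) modular squarings, so it runs in well under a second even for n in the millions.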
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 921 ID: 3110 Credit: 218,950,902 RAC: 3,798
This...
Elapsed time: 101.90 sec. (2.18 init + 99.72 sieve) at 10028972 p/sec.
looks very good!
This...
Average processor utilization: 0.95 (init), 0.78 (sieve)
is somewhat disappointing. Do you happen to have the Alpha 1 version? Could you run it on the same range with -m64 --vecsize=1? Otherwise I'll have to wait for Roger to run a test, I guess.
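The two figures highlighted here can be reproduced from the sieve-phase numbers in the two HD7950 logs: the p/sec rate is the size of the p range divided by the elapsed sieve time, and processor utilization is CPU time divided by elapsed time. A minimal Python sketch (the function name `sieve_stats` is hypothetical; the times are copied from the logs) shows alpha2 running about 2.08x faster than 0.2.3e while spending roughly twice as many CPU-seconds per 10^9 p. Results agree with the logged rates to within the rounding of the printed times.

```python
RANGE_P = 356_799_897 * 10**9 - 356_799_896 * 10**9  # 10**9 values of p


def sieve_stats(range_p: int, elapsed_s: float, cpu_s: float) -> dict[str, float]:
    """Throughput and CPU cost for one run (sieve phase only)."""
    return {
        "p_per_sec": range_p / elapsed_s,         # as in the "Elapsed time" line
        "cores": cpu_s / elapsed_s,               # "Average processor utilization"
        "cpu_s_per_gp": cpu_s / (range_p / 1e9),  # CPU-seconds per 10**9 p
    }


# Sieve times (elapsed, processor) copied from the two HD7950 logs.
new = sieve_stats(RANGE_P, 99.72, 78.11)   # cl-0.2.5-alpha2
old = sieve_stats(RANGE_P, 207.25, 39.69)  # cl-0.2.3e

if __name__ == "__main__":
    for name, s in (("alpha2", new), ("0.2.3e", old)):
        print(f"{name}: {s['p_per_sec'] / 1e6:.2f}M p/sec, "
              f"{s['cores']:.2f} cores, {s['cpu_s_per_gp']:.1f} CPU-s per 1e9 p")
    print(f"speedup: {new['p_per_sec'] / old['p_per_sec']:.2f}x")
    print(f"CPU cost ratio: {new['cpu_s_per_gp'] / old['cpu_s_per_gp']:.2f}x")
```

Looking at CPU-seconds per unit of work, rather than utilization alone, separates "the GPU is busier" from "the host does more work per p".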
____________
I don't know which version is alpha 1, but you can read the version number from the test output. Give me one more hour before I can run this test again.
Ken_g6 Volunteer developer
Never mind: I just ran a separate test of the SSE2 multiply, and it's surprisingly slow! I'm going to back it out and then produce an alpha3 version (or a beta, since all the components are tested).
____________
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,621,444 RAC: 21
2nd Alpha version, GPU load 82%:
>tpsieve-cl-x86-windows.exe -p356799896e9 -P356799897e9 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60
tpsieve version cl-0.2.5-alpha2 (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799896000000000 <= p < 356799897000000000
Thread 0 starting
Detected 512 multiprocessors (2560 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
GCN device detected; use -m1 --vecsize=4 to undo effect
nstep changed to 32
CL setup complete.
cthread_count = 131072
356799896420257891 | 6125*2^4654257+1
p=356799896654311425, 10.90M p/sec, 0.91 CPU cores, 65.4% done. ETA 28 Jan 07:07
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799896000000000 <= p < 356799897000000000
Found 1 factor
count=24741711,sum=0x29445f772863e9fd
Elapsed time: 93.83 sec. (2.54 init + 91.29 sieve) at 10954754 p/sec.
Processor time: 85.21 sec. (2.53 init + 82.68 sieve) at 12095706 p/sec.
Average processor utilization: 0.99 (init), 0.91 (sieve)
2nd Alpha version, large range, GPU load 82%:
>tpsieve-cl-x86-windows.exe -p356799891e9 -P356799900e9 -k5 -K9999 -n3000000 -N6000000 -T -M2 -c 60
tpsieve version cl-0.2.5-alpha2 (testing)
nstart=3000000, nstep=46
tpsieve initialized: 5 <= k <= 9999, 3000000 <= n < 6000000
Sieve started: 356799891000000000 <= p < 356799900000000000
Thread 0 starting
Detected 512 multiprocessors (2560 SPUs) on device 0.
Device 0 is a Advanced Micro Devices, Inc. Tahiti.
GCN device detected; use -m1 --vecsize=4 to undo effect
nstep changed to 32
CL setup complete.
cthread_count = 131072
p=356799891725614593, 12.09M p/sec, 0.91 CPU cores, 8.1% done. ETA 28 Jan 07:40
p=356799892468006401, 12.17M p/sec, 0.90 CPU cores, 16.3% done. ETA 28 Jan 07:40
p=356799893215378945, 12.32M p/sec, 0.91 CPU cores, 24.6% done. ETA 28 Jan 07:40
p=356799893962489345, 12.29M p/sec, 0.90 CPU cores, 32.9% done. ETA 28 Jan 07:40
p=356799894709599745, 12.25M p/sec, 0.90 CPU cores, 41.2% done. ETA 28 Jan 07:40
p=356799895454875137, 12.22M p/sec, 0.91 CPU cores, 49.5% done. ETA 28 Jan 07:40
p=356799896202509825, 12.33M p/sec, 0.91 CPU cores, 57.8% done. ETA 28 Jan 07:40
356799896420257891 | 6125*2^4654257+1
p=356799896949620225, 12.25M p/sec, 0.90 CPU cores, 66.1% done. ETA 28 Jan 07:40
p=356799897694109185, 12.21M p/sec, 0.90 CPU cores, 74.4% done. ETA 28 Jan 07:40
p=356799898440695297, 12.31M p/sec, 0.90 CPU cores, 82.7% done. ETA 28 Jan 07:40
p=356799899182562817, 12.22M p/sec, 0.90 CPU cores, 90.9% done. ETA 28 Jan 07:40
p=356799899919187457, 12.09M p/sec, 0.91 CPU cores, 99.1% done. ETA 28 Jan 07:40
Thread 0 completed
Waiting for threads to exit
Sieve complete: 356799891000000000 <= p < 356799900000000000
Found 1 factor
count=222687271,sum=0x11a0ce66e3586af7
Elapsed time: 739.65 sec. (2.53 init + 737.12 sieve) at 12209881 p/sec.
Processor time: 669.34 sec. (2.53 init + 666.81 sieve) at 13497369 p/sec.
Average processor utilization: 1.00 (init), 0.90 (sieve)
SSE2 is not faster. I wonder whether SSE2 would be faster over a different test range, such as with GFN.
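The large-range run settles at about 12.2M p/sec, above the 10.95M p/sec summary of the short run on the same card. To compare steady-state rates across runs, the per-checkpoint progress lines can be averaged instead of relying on the single final figure. A minimal Python sketch, assuming only the progress-line format shown in these logs; `progress_rates` and the sample text are illustrative, not part of tpsieve.

```python
import re
from statistics import mean

# Progress lines look like:
# p=356799891725614593, 12.09M p/sec, 0.91 CPU cores, 8.1% done. ETA 28 Jan 07:40
PROGRESS = re.compile(r"^p=\d+, ([\d.]+)M p/sec, ([\d.]+) CPU cores")


def progress_rates(log_text: str) -> list[tuple[float, float]]:
    """Return (Mp/sec, CPU cores) for every progress line; other lines are skipped."""
    rates = []
    for line in log_text.splitlines():
        m = PROGRESS.match(line.strip())
        if m:
            rates.append((float(m.group(1)), float(m.group(2))))
    return rates


# Lines taken from the large-range run above; the factor line is skipped.
SAMPLE = """\
p=356799896202509825, 12.33M p/sec, 0.91 CPU cores, 57.8% done. ETA 28 Jan 07:40
356799896420257891 | 6125*2^4654257+1
p=356799896949620225, 12.25M p/sec, 0.90 CPU cores, 66.1% done. ETA 28 Jan 07:40
p=356799897694109185, 12.21M p/sec, 0.90 CPU cores, 74.4% done. ETA 28 Jan 07:40
"""

if __name__ == "__main__":
    rates = progress_rates(SAMPLE)
    print(f"{len(rates)} checkpoints, "
          f"mean {mean(r for r, _ in rates):.2f}M p/sec, "
          f"mean {mean(c for _, c in rates):.2f} CPU cores")
```

The same pattern also matches the older build's lines (e.g. `4.810M p/sec, 0.20 CPU cores`), so logs from both versions can be compared with one parser.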