ppsieve ATI/OpenCL testing
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
You read that right! I've managed to port PPSieve-CUDA to OpenCL and compile it with the ATI compiler!
The code base is very similar to PPSieve-CUDA, and will eventually be merged with it. But for now, if you have Linux and an ATI GPU, please download:
PPSieve-OpenCL (source, on the redcl branch)
And give it a shot with the usual test procedure:
32 bit: ./ppsieve-cl-boinc-x86-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
64 bit: ./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
It should output the following factors:
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Please provide the output from stderr.txt, especially in case of error. Please also provide as much detail about your system as possible, including your ATI GPU model number, driver version, and Stream Processing Unit clock speed if possible.
Also note that this first version doesn't use vectorized arithmetic, so it may be possible to make it ~1.5-2x faster. The first goal is to prove that this can run on an ATI GPU. :)
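The expected factor lines above can be sanity-checked independently of the sieve. The following is a minimal sketch, not part of ppsieve: the helper names (`parse_bound`, `parse_factor_line`, `divides`, `check_line`) are hypothetical, and the reading of `-p42070e9` as 42070 * 10^9 is an assumption that matches the "Sieve started" range printed in the logs later in this thread. A line `p | k*2^n+1` is valid when p divides k*2^n+1, which modular exponentiation can test without building the huge number.

```python
import re

def parse_bound(text):
    """Turn a bound such as '42070e9' into an exact integer (42070 * 10**9)."""
    match = re.fullmatch(r"(\d+)(?:[eE](\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised bound: {text!r}")
    mantissa, exponent = match.groups()
    return int(mantissa) * 10 ** int(exponent or 0)

# Matches lines of the form "42070000070587 | 9475*2^197534+1".
FACTOR_LINE = re.compile(r"(\d+) \| (\d+)\*2\^(\d+)\+1")

def parse_factor_line(line):
    """Return (p, k, n) from a factor line."""
    match = FACTOR_LINE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a factor line: {line!r}")
    p, k, n = (int(group) for group in match.groups())
    return p, k, n

def divides(p, k, n):
    """True if p divides k*2^n+1, computed modulo p."""
    return (k * pow(2, n, p) + 1) % p == 0

def check_line(line, p_min, p_max):
    """True if the line's p lies in [p_min, p_max) and really is a factor."""
    p, k, n = parse_factor_line(line)
    return p_min <= p < p_max and divides(p, k, n)

if __name__ == "__main__":
    low, high = parse_bound("42070e9"), parse_bound("42070010e6")
    print(check_line("42070000070587 | 9475*2^197534+1", low, high))
```

Running the same check over every line of a tester's output is a quick way to tell a wrong result from a merely incomplete one.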
____________
| |
|
HAmsty Volunteer tester
 Send message
Joined: 26 Dec 08 Posts: 132 ID: 33421 Credit: 12,510,712 RAC: 0
                
|
Sorry, my HD 3200 isn't able to run an OpenCL kernel :(
____________
| |
|
|
I entered this into the command line:
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
And got this:
./ppsieve-cl-boinc-x86_64-linux: error while loading shared libraries: libOpenCL.so: cannot open shared object file: No such file or directory
No stderr.txt file was created.
I was using my ATI Mobility Radeon 4650. I think that the driver version was 10.8.
This was running off of my jumpdrive, using Ubuntu 10.04
I must say that being unfamiliar with Linux and running on an 8GB jumpdrive is causing me problems trying to test this.
____________
May the Force be with you always.
| |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
I'm not sure how much work is required to set up for running an OpenCL app. That's part of what I'm testing here.
I downloaded the ATI Stream SDK from here and went through this installation procedure (PDF). I'm hoping the full installation isn't necessary, and that you only need to grab the lib/x86/libOpenCL.so or lib/x86_64/libOpenCL.so file and put it in the same directory with my app. But I really don't know.
Edit: Also, the Stream SDK version was 2.1 when I downloaded it, and that's what I used to compile this, but they're serving 2.2 now. Hopefully, that won't make a difference.
____________
| |
|
tocx Volunteer tester
 Send message
Joined: 23 Nov 09 Posts: 15 ID: 50535 Credit: 203,523,000 RAC: 0
               
|
OS: Debian GNU Linux Squeeze
GPU: ATI HD 5750 1024MB
Driver: 10.7
SDK: ati-stream-sdk-v2.2-lnx64
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.0.1-alpha (testing)
Compiled Aug 28 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
# more stderr.txt
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Error: Getting Platforms. (clGetPlatformsIDs)
called boinc_finish
EDIT: I don't know if this is important, but at the beginning of the test I used the wrong terminal and ran the app on a host with an NVIDIA GPU (GTX 260). The app ran without problems there.
____________
| |
|
|
O.K., I tested it again, with the libOpenCL.so from SDK V2.2. I get the same message:
./ppsieve-cl-boinc-x86_64-linux: error while loading shared libraries: libOpenCL.so: cannot open shared object file: No such file or directory
Is it expecting the file to be in a location other than where it launches from?
____________
May the Force be with you always.
| |
|
tocx Volunteer tester
 Send message
Joined: 23 Nov 09 Posts: 15 ID: 50535 Credit: 203,523,000 RAC: 0
               
|
O.K., I tested it again, with the libOpenCL.so from SDK V2.2. I get the same message:
./ppsieve-cl-boinc-x86_64-linux: error while loading shared libraries: libOpenCL.so: cannot open shared object file: No such file or directory
Is it expecting the file to be in a location other than where it launches from?
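Linux searches for shared libraries in the library path, not in the directory the program is launched from.
Have you set the two variables from the installation instructions?
If not, copy or link the library to /usr/lib/, or set the two variables.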
After that you can test with ldd:
ldd ppsieve-cl-boinc-x86_64-linux
linux-vdso.so.1 => (0x00007fff4e9ff000)
libOpenCL.so => /home/tocx/download/ati-stream-sdk-v2.2-lnx64/lib/x86_64/libOpenCL.so (0x00007f32533f9000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007f32531d1000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f3252ebc000)
libm.so.6 => /lib/libm.so.6 (0x00007f3252c3a000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f3252a24000)
libc.so.6 => /lib/libc.so.6 (0x00007f32526c2000)
/lib64/ld-linux-x86-64.so.2 (0x00007f3253600000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f32524be000)
librt.so.1 => /lib/librt.so.1 (0x00007f32522b6000)
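Alongside ldd, a short script can report whether an OpenCL library is visible at all on the machine. This is only a sketch under stated assumptions: `opencl_library_status` is a hypothetical helper, and `ctypes.util.find_library` does not search in exactly the same way as the dynamic loader, so the ldd output above remains the authoritative check for the ppsieve binary itself.

```python
import ctypes
import ctypes.util

def opencl_library_status():
    """Describe whether an OpenCL shared library can be located and loaded."""
    # find_library returns a library name (or path) if one is known, else None.
    name = ctypes.util.find_library("OpenCL")
    if name is None:
        return "not found"
    try:
        # Loading goes through the platform loader, like starting a program would.
        ctypes.CDLL(name)
    except OSError as error:
        return f"found {name}, but loading failed: {error}"
    return f"loadable as {name}"

print(opencl_library_status())
```

If this reports "not found", the library is probably neither in a system directory such as /usr/lib/ nor reachable through the library path, which fits the "cannot open shared object file" error quoted above.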
____________
| |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
Looks like you'll (both/all) have to follow this entire installation procedure.
Don't forget to get this ICD registration file and extract it in the root directory, as root.
____________
| |
|
tocx Volunteer tester
 Send message
Joined: 23 Nov 09 Posts: 15 ID: 50535 Credit: 203,523,000 RAC: 0
               
|
Don't forget to get this ICD registration file and extract it in the root directory, as root.
Thanks for the hint. I had extracted it, but not to /.
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.0.1-alpha (testing)
Compiled Aug 28 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
# more stderr.txt
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Resuming from checkpoint p=42070003407873 in ppcheck42070e9.txt
Thread 0 starting
Invalid MIT-MAGIC-COOKIE-1 keyDetected 4 multiprocessors (20 SPUs) on device 0.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 46.20 sec. (0.01 init + 46.18 sieve) at 147577 p/sec.
Processor time: 158.15 sec. (0.02 init + 158.13 sieve) at 43102 p/sec.
Average processor utilization: 1.10 (init), 3.42 (sieve)
called boinc_finish
____________
| |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
Looks like it was using your CPU, not your GPU. Probably because you're using an AMD processor. Try adding a "--device 1" to the end of the command line.
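The log itself hints at which device did the work. In the runs posted in this thread, the "Average processor utilization" figure for the sieve phase equals processor time divided by elapsed time; that reading is an inference from the numbers, not something the ppsieve source is quoted as saying here. A minimal sketch with hypothetical helper names:

```python
import re

# Matches the "Elapsed time" and "Processor time" summary lines in stderr.txt.
TIME_LINE = re.compile(
    r"(Elapsed|Processor) time: ([\d.]+) sec\. "
    r"\(([\d.]+) init \+ ([\d.]+) sieve\)"
)

def sieve_seconds(log_text):
    """Return sieve-phase seconds keyed by 'Elapsed' and 'Processor'."""
    return {m.group(1): float(m.group(4)) for m in TIME_LINE.finditer(log_text)}

def sieve_utilization(log_text):
    """CPU cores kept busy, on average, during the sieve phase."""
    times = sieve_seconds(log_text)
    return times["Processor"] / times["Elapsed"]

log = """Elapsed time: 46.20 sec. (0.01 init + 46.18 sieve) at 147577 p/sec.
Processor time: 158.15 sec. (0.02 init + 158.13 sieve) at 43102 p/sec."""
print(round(sieve_utilization(log), 2))  # prints 3.42
```

A value near the number of CPU cores suggests the OpenCL CPU device was selected. A run that really executes on the GPU should show a value well below 1.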
____________
| |
|
|
So, out of curiosity, why is Linux the first platform to test an application on?
I would like to help test this ATI app, but right now I simply do not know enough Linux to actively do so. I am afraid that I must wait for a Windows testing app, or learn a lot about Linux.
By the way, thanks Ken (and anyone else involved) for starting ATI development!
____________
May the Force be with you always.
| |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
So, out of curiosity, why is Linux the first platform to test an application on?
Because this is my main computer. No other reason.
____________
| |
|
tocx Volunteer tester
 Send message
Joined: 23 Nov 09 Posts: 15 ID: 50535 Credit: 203,523,000 RAC: 0
               
|
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 1
ppsieve version cl-0.0.1-alpha (testing)
Compiled Aug 28 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
# more stderr.txt
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Invalid MIT-MAGIC-COOKIE-1 keySIGSEGV: segmentation violation
Stack trace (10 frames):
./ppsieve-cl-boinc-x86_64-linux[0x414d8d]
/lib/libpthread.so.0(+0xef60)[0x7f0558473f60]
/home/tocx/download/ati-stream-sdk-v2.2-lnx64/lib/x86_64/libatiocl64.so(+0xca3b0)[0x7f055654f3b0]
/home/tocx/download/ati-stream-sdk-v2.2-lnx64/lib/x86_64/libatiocl64.so(+0x10b640)[0x7f0556590640]
/home/tocx/download/ati-stream-sdk-v2.2-lnx64/lib/x86_64/libatiocl64.so(clCreateCommandQueue+0x69)[0x7f0556533879]
./ppsieve-cl-boinc-x86_64-linux[0x410e15]
./ppsieve-cl-boinc-x86_64-linux[0x40e83e]
./ppsieve-cl-boinc-x86_64-linux[0x40ad5e]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7f0557976c4d]
./ppsieve-cl-boinc-x86_64-linux[0x409fd9]
Exiting...
Suspending the WUs on the GPU gives the same result.
---
Trying with other devices:
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Invalid MIT-MAGIC-COOKIE-1 keyCreating Command Queue. (clCreateCommandQueue)
called boinc_finish
____________
| |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
Well, I'm stumped. A segfault is hard enough to debug when you do have access to the hardware it happens on. Since it doesn't happen on a CPU, I don't.
I was hoping that someone would come along with an Intel processor and AMD graphics card, run the test, and find that it worked fine there. Since that hasn't happened, I've uploaded more-or-less the same code compiled with debugging on. If that doesn't produce a useful error message, I'm afraid ATI/OpenCL is at a standstill for now.
____________
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2394 ID: 1178 Credit: 18,680,830,069 RAC: 6,903,927
                                                
|
Would love to help out, but my ATI cards are all in WIN boxes...
____________
141941*2^4299438-1 is prime!
| |
|
pschoefer Volunteer developer Volunteer tester
 Send message
Joined: 20 Sep 05 Posts: 686 ID: 845 Credit: 2,910,184,413 RAC: 199,509
                              
|
Would love to help out, but my ATI cards are all in WIN boxes...
Same here... i7 + HD5850, but Win7. :)
Is there any live CD that includes the required drivers, or an easy guide on how to install them in a live environment?
____________
| |
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1957 ID: 352 Credit: 6,148,601,957 RAC: 2,303,078
                                      
|
Would love to help out, but my ATI cards are all in WIN boxes...
Same here as I wrote via PM.
____________
My stats | |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
Yeah, OK, it'll probably be easier to build a Windows version than to teach all of you how to use Linux. I'm working on it.
____________
| |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
Alright, folks, I stuck a Windows executable in that zipfile. It works on my CPU, but I kind of expect it will also segfault on a real GPU.
But test away; maybe we'll get lucky somewhere.
____________
| |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 14011 ID: 53948 Credit: 435,567,510 RAC: 868,496
                               
|
Alright, folks, I stuck a Windows executable in that zipfile. It works on my CPU, but I kind of expect it will also segfault on a real GPU.
But test away; maybe we'll get lucky somewhere.
Since it's OpenCL, is this supposed to work on NVIDIA also?
____________
My lucky number is 75898524288+1 | |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2394 ID: 1178 Credit: 18,680,830,069 RAC: 6,903,927
                                                
|
I get a "Null Platform found Exiting Application" message.
AMD Athlon x2 5600+
Win7 Enterprise 64-bit
HD 4650
Driver 10.7 (also tested with 10.5 with same error).
____________
141941*2^4299438-1 is prime!
| |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
No; nVIDIA already has a version specifically made for it. You can try compiling it for nVIDIA if you want, but it's not likely to be faster than the CUDA version.
____________
| |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
Scott, did you install the ATI Stream SDK? I think you have to do that. I haven't yet seen a way to run this app without the full installation.
____________
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2394 ID: 1178 Credit: 18,680,830,069 RAC: 6,903,927
                                                
|
Scott, did you install the ATI Stream SDK? I think you have to do that. I haven't yet seen a way to run this app without the full installation.
Figured that out after I posted :)
After the install, this is what I get:
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 2
ppsieve version cuda-0.1.5a (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Then the app crashes with no log file or factors found. (Note: device 0 = AMD CPU, device 1 = onboard AMD GPU, which is not OpenCL compatible.) The application does run fine on the CPU.
____________
141941*2^4299438-1 is prime!
| |
|
|
This is my experience.
I downloaded the zip file, extracted it, and ran ppsieve-cl-x86-windows.exe.
Problem 1: I got a pop-up, "Unable to locate OpenCL.dll".
Answer: download and install the ATI SDK.
Problem 2: running it gave "unable to determine platforms" or a similar message on the console.
Answer: reboot the machine (oops).
It then ran successfully. (However, I don't have a GPU; it's an Intel GMA.)
So here are my CPU OpenCL results.
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cuda-0.1.5a (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Device 0 looks like an 4-core CPU, not a GPU. Adjusting.
Detected 4 multiprocessors (20 SPUs) on device 0.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
p=42070001310721, 21.84K p/sec, 3.85 CPU cores, 13.1% done. ETA 08 Sep 10:30
p=42070002621441, 19.69K p/sec, 4.08 CPU cores, 26.2% done. ETA 08 Sep 10:30
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
p=42070003932161, 18.62K p/sec, 3.90 CPU cores, 39.3% done. ETA 08 Sep 10:30
p=42070005242881, 18.67K p/sec, 3.95 CPU cores, 52.4% done. ETA 08 Sep 10:31
42070006307657 | 1513*2^1771812+1
p=42070006553601, 18.76K p/sec, 3.97 CPU cores, 65.5% done. ETA 08 Sep 10:31
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
p=42070007864321, 18.82K p/sec, 3.95 CPU cores, 78.6% done. ETA 08 Sep 10:31
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
p=42070009175041, 18.54K p/sec, 3.91 CPU cores, 91.8% done. ETA 08 Sep 10:31
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 532.38 sec. (0.06 init + 532.33 sieve) at 19206 p/sec.
Processor time: 2097.70 sec. (0.06 init + 2097.64 sieve) at 4874 p/sec.
Average processor utilization: 1.07 (init), 3.94 (sieve)
Machine specs:
Windows XP SP3
Intel i5 @ 3.2 GHz
Intel GMA HD graphics (integrated)
So not fast, but it works. | |
|
|
Oh, and if I add --d 2
it crashes with a popup:
error code 0xc0000005 | |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
HD 4650
FYI, I've been looking at the kernel with the Stream Kernel Analyzer, and it looks like the current configuration (that writes bytes as output) won't compile on 4000-series GPUs. I'll look at changing this soon, but it would be nice if someone with a 5000 series produced valid results with this.
Edit: And I see tocx has a 5000 series, so that's not likely.
____________
| |
|
pschoefer Volunteer developer Volunteer tester
 Send message
Joined: 20 Sep 05 Posts: 686 ID: 845 Credit: 2,910,184,413 RAC: 199,509
                              
|
It's working for me... :)
i7 980X + HD5850 (GPU clock: 725 MHz, Memory clock: 1000 MHz) + GTX460
Win7 Prof. x64
Catalyst 10.8, SDK 2.2
First try:
>ppsieve-cl-x86-windows.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cuda-0.1.5a (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Device 0 looks like an 12-core CPU, not a GPU. Adjusting.
Detected 12 multiprocessors (60 SPUs) on device 0.
[...]
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 27 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 535.30 sec. (0.02 init + 535.27 sieve) at 56320 p/sec.
Processor time: 6247.65 sec. (0.03 init + 6247.62 sieve) at 4825 p/sec.
Average processor utilization: 1.30 (init), 11.67 (sieve)
OK, that's the CPU. However, results are correct.
Second try:
>ppsieve-cl-x86-windows.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 1
ppsieve version cuda-0.1.5a (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected 18 multiprocessors (90 SPUs) on device 1.
[...]
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 27 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 25.47 sec. (0.03 init + 25.44 sieve) at 1184985 p/sec.
Processor time: 1.62 sec. (0.03 init + 1.59 sieve) at 18945683 p/sec.
Average processor utilization: 1.16 (init), 0.06 (sieve)
About 96% load on the ATI GPU, correct results. :)
Can't get it to run on the GTX460, but we have the CUDA app for that one, so it's no problem.
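The two runs above can be compared directly. This is a small sketch using only the figures reported in this post; the `speedup` helper is hypothetical and the comparison assumes both runs covered the same p range, which the identical "Sieve complete" lines and checksums indicate.

```python
def speedup(baseline, candidate, higher_is_better=False):
    """Ratio by which `candidate` beats `baseline` (times or rates)."""
    if baseline <= 0 or candidate <= 0:
        raise ValueError("measurements must be positive")
    return candidate / baseline if higher_is_better else baseline / candidate

# Figures from the two runs above: device 0 (CPU) and device 1 (HD 5850).
cpu_elapsed, gpu_elapsed = 535.30, 25.47      # seconds
cpu_rate, gpu_rate = 56320, 1184985           # reported p/sec

print(f"{speedup(cpu_elapsed, gpu_elapsed):.1f}x by elapsed time")  # prints 21.0x by elapsed time
print(f"{speedup(cpu_rate, gpu_rate, higher_is_better=True):.1f}x by reported rate")
```

Both measures give roughly the same ratio for this range, so either is a reasonable basis for comparing later builds against this one.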
____________
| |
|
|
hd 4850, catalyst 10.8, sdk 2.2
windows 7 x64
Thread 0 starting
Device 1 looks like an 10-core CPU, not a GPU. Adjusting.
Detected 10 multiprocessors (50 SPUs) on device 1.
Error: Building Program (clBuildProgram)
| |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
It's working for me... :)
...
Second try:
>ppsieve-cl-x86-windows.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 1
ppsieve version cuda-0.1.5a (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected 18 multiprocessors (90 SPUs) on device 1.
[...]
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 27 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 25.47 sec. (0.03 init + 25.44 sieve) at 1184985 p/sec.
Processor time: 1.62 sec. (0.03 init + 1.59 sieve) at 18945683 p/sec.
Average processor utilization: 1.16 (init), 0.06 (sieve)
About 96% load on the ATI GPU, correct results. :)
YES!!!!!!!!!! :D
I'm guessing tocx just needed me to compile with the sdk 2.2 instead of 2.1. :)
hd 4850, catalyst 10.8, sdk 2.2
windows 7 x64
Thread 0 starting
Device 1 looks like an 10-core CPU, not a GPU. Adjusting.
Detected 10 multiprocessors (50 SPUs) on device 1.
Error: Building Program (clBuildProgram)
The error building the program on a 4000-series GPU is expected for now; I'll fix it soon. But this also points out another bug: the real multiprocessor counts are 16 times as large as what is being detected. I'll have to overhaul the device detection code.
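The arithmetic behind that correction can be written out. This is a sketch, not the actual detection code: the constant names are hypothetical, the factor 16 is the one stated above, and the factor 5 is read off log lines such as "Detected 10 multiprocessors (50 SPUs)".

```python
CORRECTION_FACTOR = 16   # "16 times as large", per the post above
SPUS_PER_UNIT = 5        # ratio seen in the logs, e.g. 10 multiprocessors -> 50 SPUs

def corrected_counts(reported_units):
    """Return (multiprocessors, SPUs) after applying the factor of 16."""
    units = reported_units * CORRECTION_FACTOR
    return units, units * SPUS_PER_UNIT

print(corrected_counts(10))  # prints (160, 800)
```

For the HD 4850 report quoted above (10 multiprocessors, 50 SPUs) this gives 160 multiprocessors and 800 SPUs, which is what the v0.0.2 log for the same card shows later in the thread.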
____________
| |
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1957 ID: 352 Credit: 6,148,601,957 RAC: 2,303,078
                                      
|
Catalyst 10.8, SDK 2.2, Windows 2008 R2 x64, HD5850.
>ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 2
The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail.
(getting same error when running ppsieve-cl-x86-windows.exe without any parameters)
Using sxstrace, I extracted the following:
=================
Begin Activation Context Generation.
Input Parameter:
Flags = 0
ProcessorArchitecture = Wow32
CultureFallBacks = en-US;en
ManifestPath = E:\ppsieve-cl-x86-windows.exe
AssemblyDirectory = E:\
Application Config File =
-----------------
INFO: Parsing Manifest File E:\ppsieve-cl-x86-windows.exe.
INFO: Manifest Definition Identity is (null).
INFO: Reference: Microsoft.VC90.CRT,processorArchitecture="x86",publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="9.0.21022.8"
INFO: Resolving reference Microsoft.VC90.CRT,processorArchitecture="x86",publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="9.0.21022.8".
INFO: Resolving reference for ProcessorArchitecture WOW64.
INFO: Resolving reference for culture Neutral.
INFO: Applying Binding Policy.
INFO: No publisher policy found.
INFO: No binding policy redirect found.
INFO: Begin assembly probing.
INFO: Did not find the assembly in WinSxS.
INFO: Attempt to probe manifest at C:\Windows\assembly\GAC_32\Microsoft.VC90.CRT\9.0.21022.8__1fc8b3b9a1e18e3b\Microsoft.VC90.CRT.DLL.
INFO: Did not find manifest for culture Neutral.
INFO: End assembly probing.
INFO: Resolving reference for ProcessorArchitecture x86.
INFO: Resolving reference for culture Neutral.
INFO: Applying Binding Policy.
INFO: No publisher policy found.
INFO: No binding policy redirect found.
INFO: Begin assembly probing.
INFO: Did not find the assembly in WinSxS.
INFO: Attempt to probe manifest at C:\Windows\assembly\GAC_32\Microsoft.VC90.CRT\9.0.21022.8__1fc8b3b9a1e18e3b\Microsoft.VC90.CRT.DLL.
INFO: Attempt to probe manifest at E:\Microsoft.VC90.CRT.DLL.
INFO: Attempt to probe manifest at E:\Microsoft.VC90.CRT.MANIFEST.
INFO: Attempt to probe manifest at E:\Microsoft.VC90.CRT\Microsoft.VC90.CRT.DLL.
INFO: Attempt to probe manifest at E:\Microsoft.VC90.CRT\Microsoft.VC90.CRT.MANIFEST.
INFO: Did not find manifest for culture Neutral.
INFO: End assembly probing.
ERROR: Cannot resolve reference Microsoft.VC90.CRT,processorArchitecture="x86",publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="9.0.21022.8".
ERROR: Activation Context generation failed.
End Activation Context Generation.
____________
My stats | |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
Try installing the 32-bit version of the Microsoft Visual C++ 2008 redistributable package.
P.S. Please also use code tags judiciously. When text inside them is too long, they make the forum overflow my screen to the right.
____________
| |
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1957 ID: 352 Credit: 6,148,601,957 RAC: 2,303,078
                                      
|
Try installing the 32-bit version of the Microsoft Visual C++ 2008 redistributable package.
Thanks, it helped.
>ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 1
ppsieve version cuda-0.1.5a (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 18 multiprocessors (90 SPUs) on device 1.
...
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 9.27 sec. (0.04 init + 9.23 sieve) at 1108234 p/sec.
Processor time: 1.22 sec. (0.05 init + 1.17 sieve) at 8738081 p/sec.
Average processor utilization: 1.10 (init), 0.13 (sieve)
____________
My stats | |
|
|
Alas, my ATI 5770, bought in April, is defunct. I put it in yesterday after it had sat idle in its package for two months. I will arrange a replacement with the vendor on Monday. Then I will be free to look into this. | |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
Alright, I have a new test version ready for everyone, v0.0.2 alpha. This version is vectorized, so it could be up to twice as fast as the last version, though I suspect it might actually be slower. Should that be the case, I have a version with no vectorization ready to go as well.
Vectorization is controlled with the -v parameter, so please try -v 2, -v 4, and maybe -v 3 (though that's likely to be really slow!). The default is 4.
Also, this code should now run on 4xxx series cards, though I will need comparisons to previous runs on 5xxx cards as well.
Good luck!
P.S. Gerrit, I hope your card gets well soon.
____________
| |
|
HAmsty Volunteer tester
 Send message
Joined: 26 Dec 08 Posts: 132 ID: 33421 Credit: 12,510,712 RAC: 0
                
|
I have neither Linux nor a decent Windows version *g*, nor an AMD graphics card, but I've tried the OpenCL version:
Z:\ppsieve-cl>ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 0
ppsieve version cl-0.0.2-alpha (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 192 multiprocessors (960 SPUs) on device 0.
clang: Unknown command line argument '-g'. Try: 'clang --help'
The GPU is an nVidia 8800 GTS with the G80 chipset (Wikipedia says this card is OpenCL capable), driver 258.96, OS Windows XP.
I've searched for this output on google and found this: http://stackoverflow.com/questions/3060201/compile-opencl-kernels-with-debug-information
____________
| |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
Hm, yeah, I probably should take the -g out at this point...OK, done!
Edit: In any case, you'd probably need to compile with nVIDIA's compiler to get this to work on nVIDIA.
____________
| |
|
HAmsty Volunteer tester
 Send message
Joined: 26 Dec 08 Posts: 132 ID: 33421 Credit: 12,510,712 RAC: 0
                
|
okay, got another error
Z:\ppsieve-cl>ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.0.2-alpha (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 192 multiprocessors (960 SPUs) on device 0.
Error: Building Program (clBuildProgram)
Ken, please don't tweak this app for my old card. I don't want to take up your time with this one ;)
____________
| |
|
|
hd 4850, catalyst 10.8, sdk 2.2
windows 7 x64
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 1
ppsieve version cl-0.0.2-alpha (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 1.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 17.40 sec. (0.03 init + 17.36 sieve) at 588844 p/sec.
Processor time: 1.78 sec. (0.05 init + 1.73 sieve) at 5904107 p/sec.
Average processor utilization: 1.38 (init), 0.10 (sieve)
with -v 2
Elapsed time: 15.69 sec. (0.03 init + 15.65 sieve) at 653138 p/sec.
Processor time: 1.29 sec. (0.05 init + 1.25 sieve) at 8191947 p/sec.
with -v 3
Error: Building Program (clBuildProgram) | |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
hd 4850, catalyst 10.8, sdk 2.2
windows 7 x64
Glad it's working on a 4xxx card. :)
with -v 2
Elapsed time: 15.69 sec. (0.03 init + 15.65 sieve) at 653138 p/sec.
Processor time: 1.29 sec. (0.05 init + 1.25 sieve) at 8191947 p/sec.
Huh. OK, I guess I'll go with -v 2 unless someone who ran the earlier app finds this one slower.
with -v 3
Error: Building Program (clBuildProgram)
Heh. I was right, though, -v 3 was much slower. :P
____________
| |
|
|
Hardware configuration:
Intel Q6600 processor
4 GB RAM
2 x Sapphire ATI Radeon HD 5870 (CrossFire)
Software configuration:
Gentoo Linux 10.0 system (64-bit)
2.6.34-gentoo-r6 kernel
ati-drivers 10.8 (Catalyst)
ati-stream-sdk-bin 2.2
ppsieve version cl-0.0.2-alpha (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
stderr.txt:
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 64 multiprocessors (320 SPUs) on device 0.
Thread 0 interrupted
Sieve incomplete: 42070000000000 <= p < 42070001310721
count=41690,sum=0x1857193c797374b0
Elapsed time: 16.99 sec. (0.01 init + 16.98 sieve) at 77199 p/sec.
Processor time: 59.81 sec. (0.01 init + 59.80 sieve) at 21919 p/sec.
Average processor utilization: 0.90 (init), 3.52 (sieve)
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Resuming from checkpoint p=42070001310721 in ppcheck42070e9.txt
Thread 0 starting
Detected 64 multiprocessors (320 SPUs) on device 0.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 97.11 sec. (0.01 init + 97.10 sieve) at 91791 p/sec.
Processor time: 384.34 sec. (0.01 init + 384.33 sieve) at 23191 p/sec.
Average processor utilization: 0.99 (init), 3.96 (sieve)
called boinc_finish | |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2394 ID: 1178 Credit: 18,680,830,069 RAC: 6,903,927
|
HD 4650, Catalyst 10.7, SDK 2.2
Win7 64-bit
with -v 2
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 1 -v2
ppsieve version cl-0.0.2-alpha (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 1.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 42.49 sec. (0.05 init + 42.44 sieve) at 240871 p/sec.
Processor time: 42.03 sec. (0.05 init + 41.98 sieve) at 243536 p/sec.
Average processor utilization: 1.04 (init), 0.99 (sieve)
with -v 3
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 1 -v3
ppsieve version cl-0.0.2-alpha (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 1.
Error: Building Program (clBuildProgram)
with -v 4
I get a driver crash, followed by the application running, but it is incredibly slow...while writing this message it has gotten just over 10% done.
Also, the SPU count is incorrect. A 4650 card has 320, not 640 as the app reports.
Edit: Also, I am wondering at the full core of CPU usage...maybe because this is an AMD CPU (Athlon x2 5600+)?
____________
141941*2^4299438-1 is prime!
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
|
Average processor utilization: 0.99 (init), 3.96 (sieve)
That's using your CPU. Try --device 1. Sorry I can't edit the first post to suggest that.
Also note that two cards in SLI are still two cards, and will probably be referenced separately, as --device 1 and --device 2.
Scott, I'm wondering about that too, but I doubt the CPU is the problem. Try --device 2 and --device 3 just to make sure, though.
____________
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2394 ID: 1178 Credit: 18,680,830,069 RAC: 6,903,927
|
Scott, I'm wondering about that too, but I doubt the CPU is the problem. Try --device 2 and --device 3 just to make sure, though.
--device 2 is the onboard graphics being used in tandem with the 4650 via the "surround view" feature...allows it to crunch over at Collatz, but is not OpenCL capable (app crashes as expected)
--device 0 runs the CPU...99% on the task manager and runs much slower than the 4650. Also shows up as 160 SPUs.
Edit: CPU results
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 0 -v2
ppsieve version cl-0.0.2-alpha (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 32 multiprocessors (160 SPUs) on device 0.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
p=42070001048577, 17.48K p/sec, 1.92 CPU cores, 10.5% done. ETA 09 Sep 14:59
p=42070002097153, 13.01K p/sec, 1.94 CPU cores, 21.0% done. ETA 09 Sep 15:01
p=42070003145729, 13.15K p/sec, 1.94 CPU cores, 31.5% done. ETA 09 Sep 15:01
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
p=42070004194305, 13.09K p/sec, 1.94 CPU cores, 41.9% done. ETA 09 Sep 15:02
p=42070005242881, 13.16K p/sec, 1.94 CPU cores, 52.4% done. ETA 09 Sep 15:02
p=42070006291457, 13.15K p/sec, 1.95 CPU cores, 62.9% done. ETA 09 Sep 15:02
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
p=42070007340033, 13.15K p/sec, 1.94 CPU cores, 73.4% done. ETA 09 Sep 15:02
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
p=42070008388609, 13.15K p/sec, 1.94 CPU cores, 83.9% done. ETA 09 Sep 15:02
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
p=42070009437185, 13.15K p/sec, 1.94 CPU cores, 94.4% done. ETA 09 Sep 15:02
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 761.70 sec. (0.05 init + 761.65 sieve) at 13423 p/sec.
Processor time: 1476.74 sec. (0.05 init + 1476.69 sieve) at 6923 p/sec.
Average processor utilization: 0.96 (init), 1.94 (sieve)
____________
141941*2^4299438-1 is prime!
| |
|
pschoefer Volunteer developer Volunteer tester
Joined: 20 Sep 05 Posts: 686 ID: 845 Credit: 2,910,184,413 RAC: 199,509
|
HD5850 again:
-v 2:
Elapsed time: 22.72 sec. (0.04 init + 22.69 sieve) at 1328786 p/sec.
Processor time: 2.73 sec. (0.03 init + 2.70 sieve) at 11170287 p/sec.
-v 3:
Elapsed time: 26.23 sec. (0.04 init + 26.19 sieve) at 1151050 p/sec.
Processor time: 3.06 sec. (0.05 init + 3.01 sieve) at 10012744 p/sec.
-v 4:
Elapsed time: 25.09 sec. (0.04 init + 25.05 sieve) at 1203483 p/sec.
Processor time: 2.96 sec. (0.06 init + 2.90 sieve) at 10389565 p/sec.
____________
| |
|
|
Sorry, I made a mistake in my first test.
GPU (first or second, I don't know) calculation:
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --device 1
ppsieve version cl-0.0.2-alpha (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 64 multiprocessors (320 SPUs) on device 0.
Thread 0 interrupted
Sieve incomplete: 42070000000000 <= p < 42070001572865
count=50065,sum=0x1d3ad93327b6df1f
Elapsed time: 17.79 sec. (0.01 init + 17.77 sieve) at 88504 p/sec.
Processor time: 69.20 sec. (0.01 init + 69.19 sieve) at 22734 p/sec.
Average processor utilization: 0.98 (init), 3.89 (sieve)
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Resuming from checkpoint p=42070001572865 in ppcheck42070e9.txt
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 1.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 6.20 sec. (0.01 init + 6.19 sieve) at 1398453 p/sec.
Processor time: 6.20 sec. (0.01 init + 6.19 sieve) at 1398381 p/sec.
Average processor utilization: 0.98 (init), 1.00 (sieve)
called boinc_finish
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
|
OK, a new version, 0.1.0-beta, is out. Major changes you should notice:
- It only runs on GPUs. So you shouldn't need to pass --device anymore unless you have two or more.
- It defaults to -v 2. So you shouldn't need to pass that anymore either.
So this should make it possible to run the app on BOINC PPSE WUs, with an appropriate app_info.xml file. (No, I don't know how to create one.)
Other things you might find interesting:
- -m works now. Default is 7, but you can fiddle to your heart's content. Not that I expect any significant improvements, but I've been wrong before.
- Better error messages. Not that you should notice.
Based on the number of crashes I saw from the previous version, I'm not sure any OpenCL app can come out of beta yet. But hopefully this is as stable as it can be. If you see any major issues, post them here.
Oh, and Scott, I have no answers for you about the 100% CPU usage on the 4650. I can only guess that either some driver is too old or it's a quirky card. I will say that I'm glad I didn't buy one. (I was thinking about it!)
____________
| |
|
|
Win7 x64
SDK 2.2
Catalyst 10.7
HD4870
-v2
00:53:22 (3460): Can't set up shared mem: -1. Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 1.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 14.10 sec. (0.05 init + 14.06 sieve) at 727317 p/sec.
Processor time: 1.64 sec. (0.06 init + 1.58 sieve) at 6488672 p/sec.
Average processor utilization: 1.33 (init), 0.11 (sieve)
00:53:36 (3460): called boinc_finish
-v3
00:53:57 (3572): Can't set up shared mem: -1. Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 1.
Error: Building Program (clBuildProgram)
00:53:58 (3572): called boinc_finish
-v4
00:54:20 (1656): Can't set up shared mem: -1. Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 1.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 15.62 sec. (0.05 init + 15.57 sieve) at 656588 p/sec.
Processor time: 2.14 sec. (0.06 init + 2.07 sieve) at 4927488 p/sec.
Average processor utilization: 1.33 (init), 0.13 (sieve)
00:54:36 (1656): called boinc_finish
| |
|
|
OK, a new version, 0.1.0-beta, is out. Major changes you should notice:
- It only runs on GPUs.
Nice, I couldn't use my GPU in BOINC with the previous version.
This app_info works for me, but I'm not sure it will for everyone.
<app_info>
  <app>
    <name>pps_sr2sieve</name>
    <user_friendly_name>Proth Prime Search (Sieve)</user_friendly_name>
  </app>
  <file_info>
    <name>pps_sr2sieve_20090322.sieveinput</name>
    <nbytes>243615956.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
    <sticky/>
  </file_info>
  <file_info>
    <name>ppsieve-cl-boinc-x86-windows.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>pps_sr2sieve</app_name>
    <version_num>124</version_num>
    <plan_class>ati13ati</plan_class>
    <avg_ncpus>0.05</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <flops>1.0e11</flops>
    <coproc>
      <type>ATI</type>
      <count>1</count>
    </coproc>
    <cmdline></cmdline>
    <file_ref>
      <file_name>ppsieve-cl-boinc-x86-windows.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info> | |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2394 ID: 1178 Credit: 18,680,830,069 RAC: 6,903,927
|
Oh, and Scott, I have no answers for you about the 100% CPU usage on the 4650. I can only guess that either some driver is too old or it's a quirky card. I will say that I'm glad I didn't buy one. (I was thinking about it!)
I will try the driver when I am in the office tomorrow...might be the difference between 10.7 vs. 10.8 since I do not have the CPU load with 10.8 on my home machine with a 4670 card (results below). Also, you can see from the results below that the m setting of 7 is not optimal at least for this card.
HD4670, catalyst 10.8, sdk 2.2
Vista 64-bit on i7-920
Default settings:
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.1.0-beta (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 0.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 38.44 sec. (0.03 init + 38.40 sieve) at 266233 p/sec.
Processor time: 1.31 sec. (0.05 init + 1.26 sieve) at 8090813 p/sec.
Average processor utilization: 1.38 (init), 0.03 (sieve)
with -m 1
Elapsed time: 57.24 sec. (0.03 init + 57.21 sieve) at 178710 p/sec.
Processor time: 1.42 sec. (0.03 init + 1.39 sieve) at 7363548 p/sec.
Average processor utilization: 0.92 (init), 0.02 (sieve)
with -m 2
Elapsed time: 29.49 sec. (0.04 init + 29.45 sieve) at 347152 p/sec.
Processor time: 1.54 sec. (0.05 init + 1.50 sieve) at 6826626 p/sec.
Average processor utilization: 1.26 (init), 0.05 (sieve)
with -m 3
Elapsed time: 41.41 sec. (0.03 init + 41.38 sieve) at 247073 p/sec.
Processor time: 1.56 sec. (0.05 init + 1.51 sieve) at 6756244 p/sec.
Average processor utilization: 1.38 (init), 0.04 (sieve)
with -m 4
Elapsed time: 31.42 sec. (0.03 init + 31.38 sieve) at 325780 p/sec.
Processor time: 1.51 sec. (0.05 init + 1.47 sieve) at 6971872 p/sec.
Average processor utilization: 1.38 (init), 0.05 (sieve)
with -m 5
Elapsed time: 38.29 sec. (0.03 init + 38.25 sieve) at 267270 p/sec.
Processor time: 1.31 sec. (0.05 init + 1.26 sieve) at 8090813 p/sec.
Average processor utilization: 1.38 (init), 0.03 (sieve)
with -m 6
Elapsed time: 31.84 sec. (0.03 init + 31.81 sieve) at 321376 p/sec.
Processor time: 1.33 sec. (0.03 init + 1.29 sieve) at 7895855 p/sec.
Average processor utilization: 0.95 (init), 0.04 (sieve)
with -m 8
Elapsed time: 31.99 sec. (0.03 init + 31.95 sieve) at 319958 p/sec.
Processor time: 1.28 sec. (0.03 init + 1.25 sieve) at 8191947 p/sec.
Average processor utilization: 0.92 (init), 0.04 (sieve)
with -m 9
Elapsed time: 35.72 sec. (0.04 init + 35.69 sieve) at 286464 p/sec.
Processor time: 1.40 sec. (0.03 init + 1.37 sieve) at 7447224 p/sec.
Average processor utilization: 0.89 (init), 0.04 (sieve)
with -m 10
Elapsed time: 32.77 sec. (0.03 init + 32.73 sieve) at 312334 p/sec.
Processor time: 1.40 sec. (0.05 init + 1.36 sieve) at 7532824 p/sec.
Average processor utilization: 1.38 (init), 0.04 (sieve)
So it looks like even m's are better with -m2 being best overall.
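A sweep like this one can be tabulated with a short script. This is only a sketch: the `parse_runs` helper and the convention of a "with -m N" label line before each result are assumptions for illustration and are not part of ppsieve. The sample text is copied from the results above.

```python
import re

# Sample text in the same shape as the results above: a label line
# followed by the "Elapsed time" line that ppsieve prints.
SAMPLE = """\
with -m 1
Elapsed time: 57.24 sec. (0.03 init + 57.21 sieve) at 178710 p/sec.
with -m 2
Elapsed time: 29.49 sec. (0.04 init + 29.45 sieve) at 347152 p/sec.
with -m 3
Elapsed time: 41.41 sec. (0.03 init + 41.38 sieve) at 247073 p/sec.
with -m 4
Elapsed time: 31.42 sec. (0.03 init + 31.38 sieve) at 325780 p/sec.
"""

LABEL = re.compile(r"^with (-m \d+)\s*$")
ELAPSED = re.compile(r"^Elapsed time: ([\d.]+) sec\..* at (\d+) p/sec\.")

def parse_runs(text):
    """Return {label: (elapsed_seconds, p_per_sec)} for each labelled run."""
    runs, label = {}, None
    for line in text.splitlines():
        m = LABEL.match(line)
        if m:
            label = m.group(1)
            continue
        m = ELAPSED.match(line)
        if m and label is not None:
            runs[label] = (float(m.group(1)), int(m.group(2)))
            label = None
    return runs

runs = parse_runs(SAMPLE)
# The fastest run is the one with the smallest elapsed time.
best = min(runs, key=lambda k: runs[k][0])
for label, (secs, rate) in sorted(runs.items(), key=lambda kv: kv[1][0]):
    print(f"{label}: {secs:.2f} s, {rate} p/sec")
print("fastest:", best)
```

Sorting by elapsed time rather than by the reported p/sec keeps the comparison tied to wall-clock time, which is what matters for a GPU app that barely uses the CPU.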
____________
141941*2^4299438-1 is prime!
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
|
Thanks, Scott! I now see that makes sense. I now have no idea where I got the 7, but searching for it I ran across this PDF. My "BLOCKSIZE" (which doesn't mean much in OpenCL except that it's a constant) is 128. That happens to be the right number of threads to fill a pair of wavefronts. Apparently four wavefronts avoid latency issues, so it makes sense that even m's would be better.
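The arithmetic behind that reasoning can be sketched as follows. It assumes that -m counts blocks of BLOCKSIZE threads, so that -m N means N * 128 threads; that is one reading of the post and is not confirmed by the source. The wavefront size of 64 is implied by "128 threads fill a pair of wavefronts". The `wavefronts` function is hypothetical.

```python
BLOCKSIZE = 128        # constant quoted in the post
WAVEFRONT_SIZE = 64    # implied by "128 threads fill a pair of wavefronts"

def wavefronts(m, blocksize=BLOCKSIZE, wavefront_size=WAVEFRONT_SIZE):
    """Wavefronts needed for m blocks (assumption: -m counts blocks)."""
    threads = m * blocksize
    # Round up: a partly filled wavefront still occupies a slot.
    return -(-threads // wavefront_size)

for m in (1, 2, 3, 4, 7, 8):
    w = wavefronts(m)
    print(f"-m {m}: {m * BLOCKSIZE} threads = {w} wavefronts, "
          f"multiple of four: {w % 4 == 0}")
```

Under these assumptions every even m gives a multiple of four wavefronts, while odd values such as the old default of 7 do not.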
Now, no offense, Scott, but I'm not going to optimize for your low-end GPU specifically. But if I can get some tests of higher-end GPUs verifying this (and I expect I will), I'll release a version where m defaults to 2 tomorrow.
P.S. Anyone want to try -v 4 with -m 2? I doubt it will help, but like I said, I've been wrong before.
____________
| |
|
pschoefer Volunteer developer Volunteer tester
Joined: 20 Sep 05 Posts: 686 ID: 845 Credit: 2,910,184,413 RAC: 199,509
|
Some test results for 0.1.0-beta on the HD5850:
default:
Elapsed time: 22.83 sec. (0.04 init + 22.80 sieve) at 1322316 p/sec.
Processor time: 2.90 sec. (0.05 init + 2.85 sieve) at 10559889 p/sec.
-m1:
Elapsed time: 68.73 sec. (0.04 init + 68.69 sieve) at 438898 p/sec.
Processor time: 3.20 sec. (0.06 init + 3.14 sieve) at 9614226 p/sec.
-m2:
Elapsed time: 40.08 sec. (0.03 init + 40.05 sieve) at 752812 p/sec.
Processor time: 2.90 sec. (0.05 init + 2.85 sieve) at 10559889 p/sec.
-m2, -v 4:
Elapsed time: 36.81 sec. (0.04 init + 36.78 sieve) at 819732 p/sec.
Processor time: 2.98 sec. (0.05 init + 2.93 sieve) at 10279039 p/sec.
-m3:
Elapsed time: 27.54 sec. (0.03 init + 27.50 sieve) at 1096096 p/sec.
Processor time: 2.68 sec. (0.05 init + 2.64 sieve) at 11434671 p/sec.
-m4:
Elapsed time: 21.34 sec. (0.04 init + 21.29 sieve) at 1415782 p/sec.
Processor time: 2.78 sec. (0.06 init + 2.71 sieve) at 11106090 p/sec.
-m8:
Elapsed time: 20.24 sec. (0.05 init + 20.19 sieve) at 1493280 p/sec.
Processor time: 2.59 sec. (0.05 init + 2.54 sieve) at 11855581 p/sec.
-m12:
Elapsed time: 20.22 sec. (0.04 init + 20.18 sieve) at 1493724 p/sec.
Processor time: 2.95 sec. (0.06 init + 2.89 sieve) at 10445728 p/sec.
-m16:
Elapsed time: 19.83 sec. (0.04 init + 19.80 sieve) at 1522851 p/sec.
Processor time: 2.64 sec. (0.03 init + 2.61 sieve) at 11571616 p/sec.
-m20:
Elapsed time: 19.94 sec. (0.05 init + 19.89 sieve) at 1515501 p/sec.
Processor time: 2.62 sec. (0.03 init + 2.59 sieve) at 11641324 p/sec.
-m24:
Elapsed time: 20.48 sec. (0.04 init + 20.44 sieve) at 1474724 p/sec.
Processor time: 3.04 sec. (0.05 init + 3.00 sieve) at 10064893 p/sec.
-m32:
Elapsed time: 19.71 sec. (0.04 init + 19.67 sieve) at 1532762 p/sec.
Processor time: 3.26 sec. (0.03 init + 3.23 sieve) at 9335555 p/sec.
Also tried m 5, 6, 9, 10 and some numbers above 32, but they were all slower than the default.
@tepek: Your app_info works, but you don't need to include the file_info for pps_sr2sieve_20090322.sieveinput, as ppsieve doesn't need the sievefile.
____________
| |
|
|
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.1.0-beta (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
stderr.txt :
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 0.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 6.03 sec. (0.01 init + 6.01 sieve) at 1700196 p/sec.
Processor time: 5.99 sec. (0.01 init + 5.98 sieve) at 1710700 p/sec.
Average processor utilization: 0.99 (init), 0.99 (sieve)
called boinc_finish
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --vecsize=4 --mthreads=2
ppsieve version cl-0.1.0-beta (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
stderr.txt :
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 0.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 10.15 sec. (0.01 init + 10.14 sieve) at 1008154 p/sec.
Processor time: 10.14 sec. (0.01 init + 10.13 sieve) at 1008975 p/sec.
Average processor utilization: 0.74 (init), 1.00 (sieve)
called boinc_finish
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --vecsize=4 --mthreads=7
ppsieve version cl-0.1.0-beta (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
stderr.txt :
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 0.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 6.87 sec. (0.01 init + 6.86 sieve) at 1490629 p/sec.
Processor time: 6.85 sec. (0.01 init + 6.83 sieve) at 1496237 p/sec.
Average processor utilization: 0.99 (init), 1.00 (sieve)
called boinc_finish
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --vecsize=4 --mthreads=10
ppsieve version cl-0.1.0-beta (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
stderr.txt :
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 0.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 7.95 sec. (0.01 init + 7.93 sieve) at 1288497 p/sec.
Processor time: 7.95 sec. (0.01 init + 7.94 sieve) at 1287153 p/sec.
Average processor utilization: 0.74 (init), 1.00 (sieve)
called boinc_finish
./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 --vecsize=4 --mthreads=20
ppsieve version cl-0.1.0-beta (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
stderr.txt :
Can't open init data file - running in standalone mode
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 320 multiprocessors (1600 SPUs) on device 0.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 7.79 sec. (0.01 init + 7.78 sieve) at 1314750 p/sec.
Processor time: 7.79 sec. (0.01 init + 7.78 sieve) at 1314739 p/sec.
Average processor utilization: 0.96 (init), 1.00 (sieve)
called boinc_finish | |
|
|
hd 4850
default
Elapsed time: 45.33 sec. (0.03 init + 45.29 sieve) at 665589 p/sec.
Processor time: 2.70 sec. (0.05 init + 2.65 sieve) at 11367408 p/sec.
-m 4
Elapsed time: 42.39 sec. (0.03 init + 42.36 sieve) at 711744 p/sec.
Processor time: 2.48 sec. (0.05 init + 2.43 sieve) at 12387563 p/sec.
-m 4 -v 4
Elapsed time: 39.20 sec. (0.03 init + 39.16 sieve) at 769739 p/sec.
Processor time: 3.28 sec. (0.05 init + 3.23 sieve) at 9335552 p/sec.
-m 8
Elapsed time: 40.09 sec. (0.03 init + 40.06 sieve) at 752615 p/sec.
Processor time: 2.48 sec. (0.05 init + 2.43 sieve) at 12387563 p/sec.
-m 8 -v 4
Elapsed time: 44.21 sec. (0.03 init + 44.18 sieve) at 682404 p/sec.
Processor time: 2.87 sec. (0.05 init + 2.82 sieve) at 10676572 p/sec.
-m 16
Elapsed time: 39.07 sec. (0.03 init + 39.04 sieve) at 772144 p/sec.
Processor time: 2.32 sec. (0.03 init + 2.29 sieve) at 13145986 p/sec.
-m 16 -v 4
Elapsed time: 46.20 sec. (0.03 init + 46.17 sieve) at 652917 p/sec.
Processor time: 2.84 sec. (0.03 init + 2.81 sieve) at 10735886 p/sec.
-m 32
Elapsed time: 39.60 sec. (0.03 init + 39.57 sieve) at 761783 p/sec.
Processor time: 2.78 sec. (0.05 init + 2.73 sieve) at 11042627 p/sec.
-m 64
Elapsed time: 39.12 sec. (0.03 init + 39.09 sieve) at 771216 p/sec.
Processor time: 2.37 sec. (0.03 init + 2.34 sieve) at 12883063 p/sec. | |
|
|
I have a lot of WU errors like this one: http://www.primegrid.com/result.php?resultid=188107021
Only a few WUs are OK.
What is the problem? | |
|
|
New app_info, faster than the previous one (from 1200 down to 1030 seconds on a 4850):
<app_info>
  <app>
    <name>pps_sr2sieve</name>
    <user_friendly_name>Proth Prime Search (Sieve)</user_friendly_name>
  </app>
  <file_info>
    <name>ppsieve-cl-boinc-x86-windows.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>pps_sr2sieve</app_name>
    <version_num>124</version_num>
    <plan_class>ati13ati</plan_class>
    <avg_ncpus>0.05</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <flops>1.0e11</flops>
    <coproc>
      <type>ATI</type>
      <count>1</count>
    </coproc>
    <cmdline>-m 16</cmdline>
    <file_ref>
      <file_name>ppsieve-cl-boinc-x86-windows.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info> | |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2394 ID: 1178 Credit: 18,680,830,069 RAC: 6,903,927
|
Now, no offense, Scott, but I'm not going to optimize for your low-end GPU specifically. But if I can get some tests of higher-end GPUs verifying this (and I expect I will), I'll release a version where m defaults to 2 tomorrow.
None taken...the 4600 series is mid-range usually, but since OpenCL only runs on 4xxx and 5xxx cards, the low-end designation is appropriate.
Also, I figured out the issue on the 4650 card with CPU utilization of one full core. Turns out it wasn't the driver or a quirky card, but rather a quirky set-up. I had "surround view" turned on with a dummy plug to allow the onboard graphics to crunch alongside the 4650 on Collatz (AMD 760 chipset, so not OpenCL capable). When I turn that off, the CPU load during sieve is about the same as the 4670 in my home machine with GPU times just a few seconds slower (as expected given the slower core clocks). I have not been able to find any info about the "Surround View" feature with OpenCL, so maybe this is a new issue that should be reported to AMD/ATI?
P.S. Anyone want to try -v 4 with -m 2? I doubt it will help, but like I said, I've been wrong before.
The -m2 setting remains the fastest for the 4600 series (I also tested m16, m32, and m64, each larger m being faster, but not as fast as the m2 setting). As before, the -v4 setting crashes the driver on the 4600 series.
____________
141941*2^4299438-1 is prime!
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2394 ID: 1178 Credit: 18,680,830,069 RAC: 6,903,927
|
Thanks, Scott! I now see that makes sense. I now have no idea where I got the 7, but searching for it I ran across this PDF. My "BLOCKSIZE" (which doesn't mean much in OpenCL except that it's a constant) is 128. That happens to be the right number of threads to fill a pair of wavefronts. Apparently four wavefronts avoid latency issues, so it makes sense that even m's would be better.
I wonder if the different -m performance across the cards so far is related to ATI employing different wavefront sizes in different cards? The 58xx/57xx and 48xx use 64, but a 5450 card has a wavefront size of 32 (I haven't found the size for the 4650 yet...some cards also have wavefronts of size 16, but I don't think any of these are OpenCL capable???).
____________
141941*2^4299438-1 is prime!
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
|
@Elgrande71: A computation error means just that: your GPU found a factor that your CPU couldn't verify. Usually, this means you've overclocked too much!
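To illustrate what verifying a factor on the CPU involves, here is a minimal sketch. It is a generic divisibility check, not ppsieve's actual verification code, and the `divides` function name is made up for this example. The factor line comes from the expected test output in this thread.

```python
def divides(p, k, n):
    """Return True if p divides k*2^n + 1."""
    # pow(2, n, p) does modular exponentiation, so the huge number
    # k*2^n + 1 is never built in full.
    return (k % p) * pow(2, n, p) % p == p - 1

# First factor line from the expected test output:
# 42070000070587 | 9475*2^197534+1
print(divides(42070000070587, 9475, 197534))
```

A check like this is cheap, which is why it can catch a wrong factor reported by the GPU, but it says nothing about factors the GPU failed to report at all.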
By the way, for PPSieve, there's not much reason to overclock the memory, if such things are separate in ATI like they are in nVIDIA. You can under-clock the memory, overclock the shaders, and get better performance (as long as you don't overclock too much).
@tepek
<cmdline>-m 16</cmdline> Nice! I didn't know about that. Now everyone can adjust their client manually as needed. :)
As for automatically finding a good setting, I'm not sure about that yet. :/ But it does look like -v 4 is best forgotten.
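For anyone who wants to script that adjustment, here is a rough sketch using Python's standard `xml.etree.ElementTree`. The `set_cmdline` helper is hypothetical, and the embedded XML is a trimmed-down stand-in for the app_info posted in this thread, not a complete file.

```python
import xml.etree.ElementTree as ET

# Trimmed-down app_info in the shape posted in this thread.
APP_INFO = """<app_info>
<app_version>
<app_name>pps_sr2sieve</app_name>
<cmdline></cmdline>
</app_version>
</app_info>"""

def set_cmdline(xml_text, cmdline):
    """Return xml_text with every <cmdline> element set to cmdline."""
    root = ET.fromstring(xml_text)
    for node in root.iter("cmdline"):
        node.text = cmdline
    return ET.tostring(root, encoding="unicode")

updated = set_cmdline(APP_INFO, "-m 16")
print(updated)
```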
____________
| |
|
|
@Elgrande71: A computation error means just that: your GPU found a factor that your CPU couldn't verify. Usually, this means you've overclocked too much!
By the way, for PPSieve, there's not much reason to overclock the memory, if such things are separate in ATI like they are in nVIDIA. You can under-clock the memory, overclock the shaders, and get better performance (as long as you don't overclock too much).
My GPU cards are a bit overclocked (Sapphire HD5870 Vapor-X), but now there is no problem at all.
The first WUs ended in errors, but now everything is working fine.
Thank you for your advice and your work.
Keep it up. | |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
|
I wonder if the different -m performance across the cards so far is related to ATI employing different wavefront sizes in different cards? The 58xx/57xx and 48xx use 64, but a 5450 card has a wavefront size of 32 (I haven't found the size for the 4650 yet...some cards also have wavefronts of size 16, but I don't think any of these are OpenCL capable???).
Yes, all three sizes are possible. And it seems to be hard to figure out which is which. That's what's stopping me from changing the default -m size right now.
About computation errors, note that most computation errors go undetected. Especially the important ones, where a factor is missed. So if you're getting detected computation errors, even if they stop, it would be a good idea to change something to prevent future errors.
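One cheap sanity check on the test range is to compare the count/sum line against the values that every completed run in this thread printed. The sketch below does that. What exactly the count and sum cover is not stated in the thread, so treating them as a reference checksum is an assumption, and a match does not show that no factor was missed. The `read_checksum` helper is hypothetical.

```python
import re

# Values printed by every completed run of the test range in this thread.
REFERENCE = (318533, 0xb9f8cbeb13d00db3)

CHECK = re.compile(r"count=(\d+),sum=0x([0-9a-fA-F]+)")

def read_checksum(stderr_text):
    """Return (count, sum) from the last count/sum line, or None."""
    matches = CHECK.findall(stderr_text)
    if not matches:
        return None
    count, hex_sum = matches[-1]
    return int(count), int(hex_sum, 16)

log = """Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
"""
print("matches reference:", read_checksum(log) == REFERENCE)
```

Taking the last match matters because a resumed run appends to stderr.txt, so an earlier "Sieve incomplete" block with a partial count can precede the final one.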
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
|
Alright, I've pushed a new version that just changes the default -m to match what CL_KERNEL_WORK_GROUP_SIZE returns. It's so similar that I didn't even bother changing the version number. (But the build date is today.) It also prevents using a -m higher than the default, as I think it returns a maximum.
Since the default is now the maximum possible size, I'd like to see what various cards set -m to on the test run. (It's printed to stdout.) It might make sense to make the default -m some fraction of what it is now. So let me know what your cards produce.
Thanks!
____________
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2394 ID: 1178 Credit: 18,680,830,069 RAC: 6,903,927
                                                
|
HD 4650 results...Something is not quite right with this one...
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.1.0-beta (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Resuming from checkpoint p=42070002359297 in ppcheck42070e9.txt
Thread 0 starting
Detected 8 SIMDs (640 SPUs?) on device 0.
Using 128 threads (about -m 0.25).
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
p=42070004718593, 39.32K p/sec, 0.03 CPU cores, 47.2% done. ETA 10 Sep 17:08
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
p=42070007077889, 35.53K p/sec, 0.03 CPU cores, 70.8% done. ETA 10 Sep 17:08
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
p=42070009437185, 38.39K p/sec, 0.03 CPU cores, 94.4% done. ETA 10 Sep 17:08
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 204.62 sec. (0.05 init + 204.57 sieve) at 38443 p/sec.
Processor time: 6.43 sec. (0.06 init + 6.36 sieve) at 1235588 p/sec.
Average processor utilization: 1.25 (init), 0.03 (sieve)
Also, -m2 is not allowed and is reduced to something less than 1 when tried.
____________
141941*2^4299438-1 is prime!
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
OK, that didn't work. I've reverted to the previous build. I'll have to figure out something else later.
____________
| |
|
Vato Volunteer tester
Joined: 2 Feb 08 Posts: 851 ID: 18447 Credit: 720,662,163 RAC: 1,697,414
                           
|
finally got time to try this...
HD4670, Ubuntu 10.04 x86_64 (collatz works fine on this)
i have done the ICD registration and ldd looks good
i use this wrapper
#!/bin/sh
export DISPLAY=:0.0
export ATISTREAMSDKROOT=${HOME}/ATI/ati-stream-sdk-v2.2-lnx64
export ATISTREAMSDKSAMPLESROOT=${ATISTREAMSDKROOT}
export LD_LIBRARY_PATH=${ATISTREAMSDKROOT}/lib/x86_64:${LD_LIBRARY_PATH}
exec ./ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60 "$@"
but it can't find the device (have tried various --device options)
$ ./ppsieve.sh
ppsieve version cl-0.1.0-beta (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
$ cat stderr.txt
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Error: Creating Context. (clCreateContextFromType): Device not found.
called boinc_finish
____________
| |
|
|
Trying to run on: Ubuntu 10.04 64-bit with an HD4870 (no overclock), driver 10.8 (with ATI Stream SDK 2.1 and 2.2, same result).
Command:
ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
Result:
ppsieve version cl-0.1.0-beta (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
stderr.txt:
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
Error: Building Program (clBuildProgram): Program build failure
called boinc_finish
Same error with a real WU: http://www.primegrid.com/result.php?resultid=188269557
What's wrong? | |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
Jip, did you complete all of the installation procedure (PDF), including setting up all necessary environment variables?
Vato: I'm afraid I don't have any personal experience with cards being found. (Since I don't have one. :P)
____________
| |
|
|
With this command:
ldd ppsieve-cl-boinc-x86_64-linux
I get:
linux-vdso.so.1 => (0x00007fff425cd000)
libOpenCL.so => /usr/lib/libOpenCL.so (0x00007f2c50e5c000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007f2c50c3f000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f2c5092a000)
libm.so.6 => /lib/libm.so.6 (0x00007f2c506a7000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f2c50490000)
libc.so.6 => /lib/libc.so.6 (0x00007f2c5010c000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2c5107b000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f2c4ff08000)
librt.so.1 => /lib/librt.so.1 (0x00007f2c4fd00000)
I have "atiocl32.icd" and "atiocl64.icd" in "/etc/OpenCL/vendors"
and "libatiocl64.so" and "libOpenCL.so" in /usr/lib64,
and as you can see, it does detect my GPU:
stderr.txt:
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
Error: Building Program (clBuildProgram): Program build failure
called boinc_finish
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
Check the output of: echo $ATISTREAMSDKROOT
____________
| |
|
|
I have no $ATISTREAMSDKROOT set, because "libatiocl64.so" and "libOpenCL.so" are in /usr/lib64 and are found there, as the ldd command shows. Is another library used?
I'll try the complete SDK install. It may be better, but I don't think it should be needed just to run the app.
For libOpenCL, ldd gives me:
ldd /usr/lib64/libOpenCL.so
linux-vdso.so.1 => (0x00007fff53dff000)
libdl.so.2 => /lib/libdl.so.2 (0x00007feb09147000)
librt.so.1 => /lib/librt.so.1 (0x00007feb08f3f000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007feb08c2a000)
libm.so.6 => /lib/libm.so.6 (0x00007feb089a7000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007feb08790000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007feb08572000)
libc.so.6 => /lib/libc.so.6 (0x00007feb081ef000)
/lib64/ld-linux-x86-64.so.2 (0x00007feb0956a000)
ldd on libatiocl64.so:
ldd /usr/lib64/libatiocl64.so
linux-vdso.so.1 => (0x00007fff039ff000)
libdl.so.2 => /lib/libdl.so.2 (0x00007fd4c995d000)
libX11.so.6 => /usr/lib/libX11.so.6 (0x00007fd4c9627000)
libGL.so.1 => /usr/lib/libGL.so.1 (0x00007fd4c944e000)
libGLU.so.1 => /usr/lib/libGLU.so.1 (0x00007fd4c91dd000)
librt.so.1 => /lib/librt.so.1 (0x00007fd4c8fd5000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fd4c8cc0000)
libm.so.6 => /lib/libm.so.6 (0x00007fd4c8a3d000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007fd4c8826000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007fd4c8608000)
libc.so.6 => /lib/libc.so.6 (0x00007fd4c8285000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd4ca96a000)
libxcb.so.1 => /usr/lib/libxcb.so.1 (0x00007fd4c8069000)
libXext.so.6 => /usr/lib/libXext.so.6 (0x00007fd4c7e56000)
libatiuki.so.1 => /usr/lib/libatiuki.so.1 (0x00007fd4c7d4d000)
libXau.so.6 => /usr/lib/libXau.so.6 (0x00007fd4c7b48000)
libXdmcp.so.6 => /usr/lib/libXdmcp.so.6 (0x00007fd4c7942000)
All dependencies seem correct, don't they? | |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
I think you need the complete SDK install because the OpenCL code is distributed as source code. The OpenCL SDK has to compile it, every time, before it can run!
I am aware that the current SDK allows compiling in binary OpenCL code; however, the app is currently set up to "bake in" certain constants.
____________
| |
|
|
Same error with the complete SDK install...
echo $ATISTREAMSDKROOT
/home/jip/ATI Stream/ati-stream-sdk-v2.2-lnx64
echo $ATISTREAMSDKSAMPLESROOT
/home/jip/ATI Stream/ati-stream-sdk-v2.2-lnx64
echo $LD_LIBRARY_PATH
/home/jip/ATI Stream/ati-stream-sdk-v2.2-lnx64/lib/x86_64:
ldd ppsieve-cl-boinc-x86_64-linux
linux-vdso.so.1 => (0x00007fff4531c000)
libOpenCL.so => /home/jip/ATI Stream/ati-stream-sdk-v2.2-lnx64/lib/x86_64/libOpenCL.so (0x00007f5d73951000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007f5d7371c000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f5d73407000)
libm.so.6 => /lib/libm.so.6 (0x00007f5d73184000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f5d72f6d000)
libc.so.6 => /lib/libc.so.6 (0x00007f5d72be9000)
/lib64/ld-linux-x86-64.so.2 (0x00007f5d73b58000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f5d729e5000)
librt.so.1 => /lib/librt.so.1 (0x00007f5d727dd000)
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
OK, I've pushed a new version that will report the compile error. Let's see what's wrong.
____________
| |
|
|
After a "sudo ldconfig",
when I run it in my boinc/projects/www.primegrid.com directory with this command:
ppsieve-cl-boinc-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.1.0-beta (testing)
Compiled Sep 9 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Found 10 factors
stderr.txt:
shmget in attach_shmem: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
Thread 0 completed
Sieve complete: 42070000000000 <= p < 42070010000000
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 21.26 sec. (0.02 init + 21.24 sieve) at 481269 p/sec.
Processor time: 20.93 sec. (0.02 init + 20.91 sieve) at 488934 p/sec.
Average processor utilization: 1.16 (init), 0.98 (sieve)
called boinc_finish
But with a real WU: http://www.primegrid.com/result.php?resultid=188323747
Stderr output
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 127 (0x7f, -129)
</message>
<stderr_txt>
../../projects/www.primegrid.com/ppsieve-cl-boinc-x86_64-linux: error while loading shared libraries: libOpenCL.so: cannot open shared object file: No such file or directory
</stderr_txt>
]]>
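The loader message above says libOpenCL.so could not be opened when the app was started by BOINC, although the same binary ran from a shell. One possible cause (an assumption, not confirmed in this thread) is that the BOINC client runs with a different environment, for example without LD_LIBRARY_PATH. A minimal Python sketch that can be run from the same environment as the client to see whether an OpenCL library is visible to the library search; the function name is made up for this example:

```python
import ctypes.util
import os


def describe_opencl_lookup():
    """Report what the library search finds for OpenCL in this environment."""
    # find_library returns a library name/path, or None if nothing was found.
    found = ctypes.util.find_library("OpenCL")
    # LD_LIBRARY_PATH is often set in a login shell but not for services.
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    return {"library": found, "LD_LIBRARY_PATH": ld_path}


if __name__ == "__main__":
    info = describe_opencl_lookup()
    if info["library"] is None:
        print("libOpenCL not found; LD_LIBRARY_PATH =",
              repr(info["LD_LIBRARY_PATH"]))
    else:
        print("OpenCL library found:", info["library"])
```

If this prints a library in your shell but not when launched the way BOINC is launched, the difference is probably in the environment rather than in ppsieve itself.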
| |
|
Vato Volunteer tester
Joined: 2 Feb 08 Posts: 851 ID: 18447 Credit: 720,662,163 RAC: 1,697,414
                           
|
Have now put 32bit and 64bit SDK libs in /lib32 and /lib64 respectively, re-run ldconfig, downloaded latest ppsieve-cl.zip, tried boinc and non-boinc and 32bit and 64bit executables - and still haven't got very far. Running under strace just shows lots of mmap() calls succeeding prior to calling clone() and futex(). Still HD4670 Ubuntu 10.04 x86_64 - any further hints gratefully received.
$ ./doit.sh
ppsieve version cl-0.1.0a-beta (testing)
Compiled Sep 12 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Error: Creating Context. (clCreateContextFromType): Device not found.
____________
| |
|
|
After several tests with my system (Q6600, 4 GB RAM, 2x ATI HD5870 CrossFire), it seems that the m parameter must not exceed 24, otherwise one of my GPUs produces compute errors.
With m equal to 32, the one GPU that doesn't produce compute errors finishes workunits in 6min30s instead of 7min30s. It's a shame that the second GPU can't compute workunits without errors.
I tried to find a fix for this (driver, etc.), but no luck.
Furthermore, I upgraded BOINC to version 6.10.58, but nothing changed. | |
|
|
After several tests with my system (Q6600, 4 GB RAM, 2x ATI HD5870 CrossFire), it seems that the m parameter must not exceed 24, otherwise one of my GPUs produces compute errors.
With m equal to 32, the one GPU that doesn't produce compute errors finishes workunits in 6min30s instead of 7min30s. It's a shame that the second GPU can't compute workunits without errors.
I tried to find a fix for this (driver, etc.), but no luck.
Furthermore, I upgraded BOINC to version 6.10.58, but nothing changed.
Don't hesitate to post another version of your app for testing. | |
|
|
After several tests with my system (Q6600, 4 GB RAM, 2x ATI HD5870 CrossFire), it seems that the m parameter must not exceed 24, otherwise one of my GPUs produces compute errors.
With m equal to 32, the one GPU that doesn't produce compute errors finishes workunits in 6min30s instead of 7min30s. It's a shame that the second GPU can't compute workunits without errors.
I tried to find a fix for this (driver, etc.), but no luck.
Furthermore, I upgraded BOINC to version 6.10.58, but nothing changed.
You have to disable CrossFire to make the second GPU work in OpenCL. It's a long-standing bug, fortunately now solved for "old" Brook/CAL apps but still not for OpenCL/CAL apps.
It's a big problem for people with a 5970, because you cannot disable CrossFire on that graphics card, so you're forced to use only one GPU.
It's not a problem with this app; it's a problem with both the drivers (10.7b, 10.8 & 10.9) and AMD SDK 2.2. | |
|
|
After several tests with my system (Q6600, 4 GB RAM, 2x ATI HD5870 CrossFire), it seems that the m parameter must not exceed 24, otherwise one of my GPUs produces compute errors.
With m equal to 32, the one GPU that doesn't produce compute errors finishes workunits in 6min30s instead of 7min30s. It's a shame that the second GPU can't compute workunits without errors.
I tried to find a fix for this (driver, etc.), but no luck.
Furthermore, I upgraded BOINC to version 6.10.58, but nothing changed.
You have to disable CrossFire to make the second GPU work in OpenCL. It's a long-standing bug, fortunately now solved for "old" Brook/CAL apps but still not for OpenCL/CAL apps.
It's a big problem for people with a 5970, because you cannot disable CrossFire on that graphics card, so you're forced to use only one GPU.
It's not a problem with this app; it's a problem with both the drivers (10.7b, 10.8 & 10.9) and AMD SDK 2.2.
To summarize my situation:
Without CrossFire, only one GPU is detected by BOINC.
With CrossFire, two GPUs are detected by BOINC.
All other configurations have been tested without success.
GPU workunit computation errors only occur if the m parameter is equal to or higher than 32, and only one GPU is affected (not the other).
Is that clear?
The ATI (AMD) developers have to fix their drivers and SDK. | |
|
|
I told you that the "second" gpu often makes computational errors with crossfire enabled. This is a problem with SDK 2.2 and AMD drivers. They'll fix this, eventually, with the following 2.3 release.
You can do two things:
- disable CrossFire (you can, since you don't have a 5970, where it can't be disabled) and make BOINC find the second GPU by using a dummy plug and extending the desktop to the secondary fake monitor (search for this procedure on Google)
- leave CrossFire enabled and crunch only on one GPU, the first one.
Otherwise, you'll have to live with the 2nd GPU making a lot of computational errors until the next SDK. | |
|
|
OS: Windows Vista SP-2 - ATI CCC 10.9 - ATI Stream SDK 2.2 - OpenCL 1.1
CPU: Core 2 Quad 9550 @ 3.4 GHz
GPU: ATI HD 4770 @ stock clock (750/800 MHz)
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.1.0a-beta (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 0.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 15.55 sec. (0.03 init + 15.52 sieve) at 658781 p/sec.
Processor time: 1.34 sec. (0.05 init + 1.29 sieve) at 7895855 p/sec.
Average processor utilization: 1.51 (init), 0.08 (sieve)
A complete work unit on the HD 4770 @ stock clock needs less time (45 seconds faster) than on my overclocked GTX 260-192 @ 667 MHz.
If I overclock the HD 4770 too it is up to 160 seconds faster than the already overclocked GTX.
An additional note:
The Windows GUI is much more responsive if I crunch PPS sieve WU's on the ATI instead of using the CUDA enabled GTX.
____________
| |
|
|
Hi all, I would also like to contribute.
OS: Win 7 - ATI CC 10.7 - ATI Stream SDK 2.2
CPU: i7
GPU: ATI HD5700 Series
ppsieve-cl-x86-windows -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.1.0a-beta (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 10.19 sec. (0.03 init + 10.15 sieve) at 1006898 p/sec.
Processor time: 1.00 sec. (0.03 init + 0.97 sieve) at 10570257 p/sec.
Average processor utilization: 0.92 (init), 0.10 (sieve)
Does that help?
What more can I do to help? | |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
GPU: ATI HD5700 Series
That's not much detail. I infer from the 800 SPUs that this is a 5770. Is it overclocked at all? (Edit: since it might be overclocked at the factory, can you determine its clock speed?)
Does that help?
What more can I do to help?
That's a little helpful, but not my main focus right now. Right now I'd like to get the new algorithms tested in the CUDA thread. Then I can port them here and you can help me test that.
____________
| |
|
|
That's not much detail. I infer from the 800 SPUs that this is a 5770. Is it overclocked at all? (Edit: since it might be overclocked at the factory, can you determine its clock speed?)
Yes, it is indeed a 5770! I think it was running @ 800 MHz.
That's a little helpful, but not my main focus right now. Right now I'd like to get the new algorithms tested in the CUDA thread. Then I can port them here and you can help me test that.
OK, I'll be glad to help any time. Just let me know what to do. | |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
Alright, everybody, I finally have a new version of PPSieve-CL! V0.2.0-beta incorporates the algorithms of the CUDA version 0.2.1a. Get it at the usual place. Windows builds included too!
Now, to testing! Please test:
- The usual range
- -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60
That should produce:
20070000475957 | 4995*2^1822738+1
20070001146497 | 4977*2^626298+1
20070001163929 | 3765*2^461308+1
20070001302811 | 7669*2^725426+1
20070001425977 | 5821*2^1775248+1
20070002245151 | 1221*2^646983+1
20070002606341 | 4809*2^497683+1
20070004816819 | 6699*2^1215561+1
20070005914001 | 9847*2^1634140+1
20070006187837 | 9923*2^287853+1
20070006875981 | 1645*2^965954+1
20070007170259 | 3889*2^49730+1
20070008329039 | 9065*2^832569+1
Found 13 factors
- -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
The reasons for this should quickly become obvious if they aren't already. That should produce:
20070000541441 | 3243*2^1584966-1
20070000674041 | 8143*2^1397047-1
20070001823101 | 7647*2^1022532-1
20070001843627 | 4955*2^248864-1
20070002680493 | 6455*2^1778260-1
20070003067151 | 9259*2^1869285-1
20070004606567 | 5269*2^257879-1
20070005404357 | 7893*2^796719-1
20070006186677 | 2413*2^1924519-1
20070007049707 | 2149*2^153375-1
20070007529777 | 1367*2^1532230-1
20070008213771 | 4041*2^1187467-1
20070008917537 | 6671*2^1260830-1
20070008991223 | 5523*2^101606-1
Found 14 factors
And let me know how fast it goes too! (Speed is printed in stderr.txt when using the BOINC apps.)
Thanks!
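Each output line of the form "p | k*2^n+1" (or "-1" when run with -R) claims that p divides that number. A minimal Python sketch for double-checking such lines independently of ppsieve; it uses modular exponentiation, so the huge number itself is never built. The function names are made up for this example and are not part of ppsieve:

```python
import re

# Matches lines such as "20070000475957 | 4995*2^1822738+1".
FACTOR_LINE = re.compile(r"^(\d+) \| (\d+)\*2\^(\d+)([+-])1$")


def parse_factor_line(line):
    """Return (p, k, n, c) with c = +1 or -1, or None for other lines."""
    m = FACTOR_LINE.match(line.strip())
    if m is None:
        return None
    p, k, n, sign = m.groups()
    return int(p), int(k), int(n), (1 if sign == "+" else -1)


def is_factor(p, k, n, c):
    """True if p divides k*2^n + c, computed modulo p."""
    return (k * pow(2, n, p) + c) % p == 0


def check_output(text):
    """Check every factor line in text; other lines are ignored."""
    results = []
    for line in text.splitlines():
        parsed = parse_factor_line(line)
        if parsed is not None:
            results.append((line.strip(), is_factor(*parsed)))
    return results


if __name__ == "__main__":
    sample = """20070000475957 | 4995*2^1822738+1
20070000541441 | 3243*2^1584966-1
Found 13 factors"""
    for line, ok in check_output(sample):
        print(line, "OK" if ok else "NOT A FACTOR")
```

Note that this only confirms factors that were reported. It cannot tell whether a factor in the range was missed, which is why comparing against the expected list still matters.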
____________
| |
|
Vato Volunteer tester
Joined: 2 Feb 08 Posts: 851 ID: 18447 Credit: 720,662,163 RAC: 1,697,414
                           
|
Did anyone have any advice for me on getting this to work at all?
I still always get "Device not found".
It's just infuriating running collatz instead of sieving.
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
Vato, I have a vague memory of seeing something about "Download the latest Catalyst drivers" somewhere. Other than that, I give up. Sorry.
____________
| |
|
|
Did anyone have any advice for me on getting this to work at all?
I still always get "Device not found".
It's just infuriating running collatz instead of sieving.
Did you install the Stream SDK 2.2?
____________
| |
|
|
hd4850, win7 64bit
-p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Elapsed time: 9.04 sec. (0.03 init + 9.01 sieve) at 1134494 p/sec.
Processor time: 1.14 sec. (0.05 init + 1.09 sieve) at 9362226 p/sec.
Average processor utilization: 1.42 (init), 0.12 (sieve)
-p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60
20070000475957 | 4995*2^1822738+1
20070001146497 | 4977*2^626298+1
20070001163929 | 3765*2^461308+1
20070001302811 | 7669*2^725426+1
20070001425977 | 5821*2^1775248+1
20070002245151 | 1221*2^646983+1
20070002606341 | 4809*2^497683+1
20070004816819 | 6699*2^1215561+1
20070005914001 | 9847*2^1634140+1
20070006187837 | 9923*2^287853+1
20070006875981 | 1645*2^965954+1
20070007170259 | 3889*2^49730+1
20070008329039 | 9065*2^832569+1
Found 13 factors
Elapsed time: 12.70 sec. (0.02 init + 12.67 sieve) at 806670 p/sec.
Processor time: 1.19 sec. (0.03 init + 1.15 sieve) at 8856163 p/sec.
Average processor utilization: 1.42 (init), 0.09 (sieve)
-p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
20070000541441 | 3243*2^1584966-1
20070000674041 | 8143*2^1397047-1
20070001823101 | 7647*2^1022532-1
20070001843627 | 4955*2^248864-1
20070002680493 | 6455*2^1778260-1
20070003067151 | 9259*2^1869285-1
20070004606567 | 5269*2^257879-1
20070005404357 | 7893*2^796719-1
20070006186677 | 2413*2^1924519-1
20070007049707 | 2149*2^153375-1
20070007529777 | 1367*2^1532230-1
20070008213771 | 4041*2^1187467-1
20070008917537 | 6671*2^1260830-1
20070008991223 | 5523*2^101606-1
Elapsed time: 12.61 sec. (0.02 init + 12.59 sieve) at 812052 p/sec.
Processor time: 1.64 sec. (0.05 init + 1.59 sieve) at 6425058 p/sec.
Average processor utilization: 1.95 (init), 0.13 (sieve)
| |
|
Vato Volunteer tester
Joined: 2 Feb 08 Posts: 851 ID: 18447 Credit: 720,662,163 RAC: 1,697,414
                           
|
Did you install the Stream SDK 2.2?
Yes - followed the install to the letter - see posts 26284 & 26357 earlier in this thread. I guess I have to chase my tail with many attempts to get the right combo or sequence, but that's scary with the recent fglrx issues. Why can't they just ship the SDK runtime with the cat drivers like normal folks do?
____________
| |
|
|
Did you install the Stream SDK 2.2?
Yes - followed the install to the letter - see posts 26284 & 26357 earlier in this thread. I guess I have to chase my tail with many attempts to get the right combo or sequence, but that's scary with the recent fglrx issues. Why can't they just ship the SDK runtime with the cat drivers like normal folks do?
Which Catalyst driver version do you use? 10.8 or newer?
____________
| |
|
Vato Volunteer tester
Joined: 2 Feb 08 Posts: 851 ID: 18447 Credit: 720,662,163 RAC: 1,697,414
                           
|
Whatever ubuntu 10.04 + patches is.
I always found that chasing the ATI drivers eventually got me an unbootable system.
And when CVE-2010-3081 got fixed, I got one of those anyway.
This is the downside of ATI...
I can't find a minimum cat driver version in this thread, and the SDK didn't specify either that I can remember - so, what is the actual minimum?
____________
| |
|
|
Whatever ubuntu 10.04 + patches is.
I always found that chasing the ATI drivers eventually got me an unbootable system.
And when CVE-2010-3081 got fixed, I got one of those anyway.
This is the downside of ATI...
I can't find a minimum cat driver version in this thread, and the SDK didn't specify either that I can remember - so, what is the actual minimum?
For the Stream SDK 2.2 / OpenCL 1.1 it's the latest 10.9 driver according to the SDK download page:
ATI Radeon™ HD - ATI Catalyst™ 10.9 Driver Suite
You can get them from the x-updates ppa:
https://launchpad.net/~ubuntu-x-swat/+archive/x-updates
____________
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2394 ID: 1178 Credit: 18,680,830,069 RAC: 6,903,927
                                                
|
I can't find a minimum cat driver version in this thread, and the SDK didn't specify either that I can remember - so, what is the actual minimum?
10.7 is the absolute minimum, though it is only partially supported (and did not work perfectly for me on the tests earlier). 10.8 is probably the true minimum for the SDK that was previously used in testing earlier in the thread, but ATI changes these relatively rapidly so you should update to the latest for best results.
____________
141941*2^4299438-1 is prime!
| |
|
blah Volunteer tester
Joined: 27 Sep 08 Posts: 19 ID: 29724 Credit: 3,462,933 RAC: 0
         
|
OS: Windows 7 64 bit - ATI CCC 10.9 - ATI Stream SDK 2.2 - OpenCL 1.1
CPU: Core 2 Quad 6600 @ 2.4 GHz
GPU: ATI HD 4770 @ 820/840 MHz
ppsieve version cl-0.2.0-beta (testing)
1) -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
Elapsed time: 8.72 sec. (0.05 init + 8.67 sieve) at 1178703 p/sec.
Processor time: 1.50 sec. (0.06 init + 1.44 sieve) at 7123434 p/sec.
Average processor utilization: 1.33 (init), 0.17 (sieve)
2) -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60
Elapsed time: 12.76 sec. (0.03 init + 12.73 sieve) at 803136 p/sec.
Processor time: 2.04 sec. (0.05 init + 2.00 sieve) at 5119967 p/sec.
Average processor utilization: 1.50 (init), 0.16 (sieve)
3) -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
Elapsed time: 12.79 sec. (0.06 init + 12.73 sieve) at 803136 p/sec.
Processor time: 2.31 sec. (0.06 init + 2.25 sieve) at 4551083 p/sec.
Average processor utilization: 1.00 (init), 0.18 (sieve)
Factors match for all 3 ranges.
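For anyone collecting these timing reports, a small Python sketch that pulls the numbers out of the "Elapsed time" and "Processor time" lines. In the runs posted in this thread, the reported sieve utilization agrees with processor sieve time divided by elapsed sieve time; that relation is inferred from the posted numbers, not stated by the developer. The names below are made up for this example:

```python
import re

# Matches e.g. "Elapsed time: 8.72 sec. (0.05 init + 8.67 sieve) at 1178703 p/sec."
TIME_LINE = re.compile(
    r"^(Elapsed|Processor) time: ([\d.]+) sec\. "
    r"\(([\d.]+) init \+ ([\d.]+) sieve\) at (\d+) p/sec\.$"
)

# Sample lines copied from the HD 4770 report above (range 1).
SAMPLE = """Elapsed time: 8.72 sec. (0.05 init + 8.67 sieve) at 1178703 p/sec.
Processor time: 1.50 sec. (0.06 init + 1.44 sieve) at 7123434 p/sec.
Average processor utilization: 1.33 (init), 0.17 (sieve)"""


def parse_times(text):
    """Return {'elapsed': {...}, 'processor': {...}} for the timing lines found."""
    out = {}
    for line in text.splitlines():
        m = TIME_LINE.match(line.strip())
        if m:
            kind, total, init, sieve, rate = m.groups()
            out[kind.lower()] = {
                "total": float(total),
                "init": float(init),
                "sieve": float(sieve),
                "rate": int(rate),
            }
    return out


def sieve_utilization(times):
    """CPU seconds spent sieving per wall-clock second of sieving."""
    return times["processor"]["sieve"] / times["elapsed"]["sieve"]


if __name__ == "__main__":
    t = parse_times(SAMPLE)
    print("sieve seconds:", t["elapsed"]["sieve"])
    print("p/sec:", t["elapsed"]["rate"])
    print("sieve utilization: %.2f" % sieve_utilization(t))
```

When comparing two reports, keep in mind that clocks, drivers and app versions differ between posters, so ratios between runs are only rough indications.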
| |
|
Vato Volunteer tester
Joined: 2 Feb 08 Posts: 851 ID: 18447 Credit: 720,662,163 RAC: 1,697,414
                           
|
Thanks chaps!
I'll go with your advice and see how it works.
(Though I'll probably wait to see if I can get it "for free" with ubuntu 10.10 which is only a few days away)
____________
| |
|
|
OS: Windows 7 64 bit - ATI CCC 10.7 - ATI Stream SDK 2.2 - OpenCL 1.1
CPU: i7 860 @ 2.8 GHz
GPU: ATI HD 5700 @ 770/1000 MHz
ppsieve version cl-0.2.0-beta (testing)
1) -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
Elapsed time: 7.32 sec. (0.05 init + 7.27 sieve) at 1406581 p/sec.
Processor time: 1.42 sec. (0.08 init + 1.34 sieve) at 7620414 p/sec.
Average processor utilization: 1.47 (init), 0.18 (sieve)
2) -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60
Elapsed time: 10.44 sec. (0.04 init + 10.40 sieve) at 982984 p/sec.
Processor time: 1.84 sec. (0.08 init + 1.76 sieve) at 5799610 p/sec.
Average processor utilization: 1.95 (init), 0.17 (sieve)
3) -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
Elapsed time: 10.47 sec. (0.04 init + 10.44 sieve) at 979687 p/sec.
Processor time: 2.04 sec. (0.05 init + 2.00 sieve) at 5119967 p/sec.
Average processor utilization: 1.20 (init), 0.19 (sieve)
Factors match for all 3 ranges. | |
|
pschoefer Volunteer developer Volunteer tester
Joined: 20 Sep 05 Posts: 686 ID: 845 Credit: 2,910,184,413 RAC: 199,509
                              
|
i7 980X @ 4 GHz + HD5850 @ 725/1000
Win7 Prof. x64, Catalyst 10.9, SDK 2.2
ppsieve version cl-0.2.0-beta (testing)
1) -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
Elapsed time: 4.20 sec. (0.02 init + 4.18 sieve) at 2444532 p/sec.
Processor time: 0.59 sec. (0.03 init + 0.56 sieve) at 18204347 p/sec.
Average processor utilization: 1.42 (init), 0.13 (sieve)
2) -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60
Elapsed time: 6.11 sec. (0.02 init + 6.09 sieve) at 1678383 p/sec.
Processor time: 0.62 sec. (0.05 init + 0.58 sieve) at 17712310 p/sec.
Average processor utilization: 2.13 (init), 0.09 (sieve)
3) -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
Elapsed time: 6.06 sec. (0.02 init + 6.04 sieve) at 1691995 p/sec.
Processor time: 0.69 sec. (0.03 init + 0.66 sieve) at 15603714 p/sec.
Average processor utilization: 1.49 (init), 0.11 (sieve)
All expected factors found.
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
I'm getting reports from a PSA tester of errors with the following range on Windows:
-p6900015e6 -P6900016e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
It should report the following factors, and does on my Linux64 CPU emulation:
6900015037673 | 7137*2^1445798-1
6900015118459 | 1577*2^1108206-1
6900015199963 | 8839*2^81555-1
6900015510929 | 6177*2^1704558-1
But I get errors on Win32 CPU emulation, which are different from the errors reported by the other user.
Can someone with Windows who's run the standard tests run this one to compare? How about someone with Linux 32-bit? That would clear up a few things.
Thanks!
____________
| |
|
pschoefer Volunteer developer Volunteer tester
Joined: 20 Sep 05 Posts: 686 ID: 845 Credit: 2,910,184,413 RAC: 199,509
                              
|
Works for me.
ppsieve-cl-x86-windows.exe -p6900015e6 -P6900016e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
ppsieve version cl-0.2.0-beta (testing)
nstart=70, nstep=30
ppsieve initialized: 1201 <= k <= 9999, 70 <= n < 2000000
Sieve started: 6900015000000 <= p < 6900016000000
Thread 0 starting
Detected 288 multiprocessors (1440 SPUs) on device 0.
Didn't change nstep from 30
6900015037673 | 7137*2^1445798-1
6900015118459 | 1577*2^1108206-1
6900015199963 | 8839*2^81555-1
6900015510929 | 6177*2^1704558-1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 6900015000000 <= p < 6900016000000
Found 4 factors
count=33918,sum=0x033f7557d250845a
Elapsed time: 0.87 sec. (0.01 init + 0.86 sieve) at 1222117 p/sec.
Processor time: 0.25 sec. (0.02 init + 0.23 sieve) at 4481075 p/sec.
Average processor utilization: 1.42 (init), 0.27 (sieve)
____________
| |
|
|
Works for me too.
ppsieve version cl-0.2.0-beta (testing)
nstart=70, nstep=30
ppsieve initialized: 1201 <= k <= 9999, 70 <= n < 2000000
Sieve started: 6900015000000 <= p < 6900016000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
Didn't change nstep from 30
6900015037673 | 7137*2^1445798-1
6900015118459 | 1577*2^1108206-1
6900015199963 | 8839*2^81555-1
6900015510929 | 6177*2^1704558-1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 6900015000000 <= p < 6900016000000
Found 4 factors
count=33918,sum=0x033f7557d250845a
Elapsed time: 1.60 sec. (0.01 init + 1.59 sieve) at 660683 p/sec.
Processor time: 0.53 sec. (0.03 init + 0.50 sieve) at 2100500 p/sec.
Average processor utilization: 2.23 (init), 0.31 (sieve) | |
|
blah Volunteer tester
Send message
Joined: 27 Sep 08 Posts: 19 ID: 29724 Credit: 3,462,933 RAC: 0
         
|
Worked for me also.
ppsieve-cl-x86-windows.exe -p6900015e6 -P6900016e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
ppsieve version cl-0.2.0-beta (testing)
nstart=70, nstep=30
ppsieve initialized: 1201 <= k <= 9999, 70 <= n < 2000000
Sieve started: 6900015000000 <= p < 6900016000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 0.
Didn't change nstep from 30
6900015037673 | 7137*2^1445798-1
6900015118459 | 1577*2^1108206-1
6900015199963 | 8839*2^81555-1
6900015510929 | 6177*2^1704558-1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 6900015000000 <= p < 6900016000000
Found 4 factors
count=33918,sum=0x033f7557d250845a
Elapsed time: 1.67 sec. (0.03 init + 1.64 sieve) at 640155 p/sec.
Processor time: 0.62 sec. (0.02 init + 0.61 sieve) at 1723486 p/sec.
Average processor utilization: 0.50 (init), 0.37 (sieve)
| |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
It worked for the guy with the original error as well! I'm now checking to see if his original command line produces the error.
But that was a good range to test anyway.
____________
| |
|
|
Testing went OK for me as well.
ppsieve version cl-0.2.0-beta (testing)
nstart=70, nstep=30
ppsieve initialized: 1201 <= k <= 9999, 70 <= n < 2000000
Sieve started: 6900015000000 <= p < 6900016000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
Didn't change nstep from 30
6900015037673 | 7137*2^1445798-1
6900015118459 | 1577*2^1108206-1
6900015199963 | 8839*2^81555-1
6900015510929 | 6177*2^1704558-1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 6900015000000 <= p < 6900016000000
Found 4 factors
count=33918,sum=0x033f7557d250845a
Elapsed time: 2.11 sec. (0.03 init + 2.08 sieve) at 504579 p/sec.
Processor time: 0.58 sec. (0.02 init + 0.56 sieve) at 1867113 p/sec.
Average processor utilization: 0.56 (init), 0.27 (sieve) | |
|
|
Is it safe to run it on BOINC? I'm already running it there, so tell me if I have to stop.
Thanks a lot! | |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
It certainly appears safe to run on BOINC. The Computation Errors only seem to appear when running with a sieve file. It's very possible that my Git merge created a bug in the sieve file reading code.
If you got a Computation Error in BOINC, your WU would error out. If you're not seeing that, there shouldn't be any problem.
____________
| |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
Alright, PPSieve-CL v0.2.1-beta is ready for testing. It should be faster than the last version; but I'm really unsure by how much. OpenCL seems to be a relatively slow language.
Also, the sieve file computation errors bug is still a mystery. It's not something I changed in reading the sieve file. So long as it finds the right factors without a sieve file, it should be OK to use this in BOINC.
____________
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2394 ID: 1178 Credit: 18,680,830,069 RAC: 6,903,927
                                                
|
i7-920
Windows Vista 64-bit
HD4670
ppsieve-cl-x86-windows.exe -p6900015e6 -P 6900016e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
ppsieve version cl-0.2.1-beta (testing)
nstart=70, nstep=30
ppsieve initialized: 1201 <= k <= 9999, 70 <= n < 2000000
Sieve started: 6900015000000 <= p < 6900016000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 0.
Didn't change nstep from 30
6900015037673 | 7137*2^1445798-1
6900015118459 | 1577*2^1108206-1
6900015199963 | 8839*2^81555-1
6900015510929 | 6177*2^1704558-1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 6900015000000 <= p < 6900016000000
Found 4 factors
count=33918,sum=0x033f7557d250845a
Elapsed time: 3.48 sec. (0.02 init + 3.46 sieve) at 303320 p/sec.
Processor time: 0.67 sec. (0.03 init + 0.64 sieve) at 1639414 p/sec.
Average processor utilization: 1.64 (init), 0.19 (sieve)
It still lists the HD 4670 incorrectly with 640 SPUs (it should be 320).
Do you need it tested on the other ranges in the thread?
____________
141941*2^4299438-1 is prime!
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2394 ID: 1178 Credit: 18,680,830,069 RAC: 6,903,927
                                                
|
Here are the other ranges:
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.2.1-beta (testing)
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 0.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
p=42070006815745, 113.6K p/sec, 0.02 CPU cores, 68.2% done. ETA 17 Oct 20:52
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070010000000
Found 10 factors
count=318533,sum=0xb9f8cbeb13d00db3
Elapsed time: 95.62 sec. (0.04 init + 95.58 sieve) at 106966 p/sec.
Processor time: 1.37 sec. (0.06 init + 1.31 sieve) at 7801857 p/sec.
Average processor utilization: 1.42 (init), 0.01 (sieve)
Screen was horribly sluggish with this one.
ppsieve-cl-x86-windows.exe -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.2.1-beta (testing)
nstart=74, nstep=31
ppsieve initialized: 1201 <= k <= 9999, 74 <= n < 2000000
Sieve started: 20070000000000 <= p < 20070010000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 0.
nstep changed to 22
Computation Error: no candidates found for p=20070000113153 between 1004042 and 1254026.
20070000475957 | 4995*2^1822738+1
Computation Error: no candidates found for p=20070000860671 between 1504010 and 1753994.
20070001146497 | 4977*2^626298+1
20070001163929 | 3765*2^461308+1
20070001302811 | 7669*2^725426+1
20070001425977 | 5821*2^1775248+1
20070002245151 | 1221*2^646983+1
Computation Error: no candidates found for p=20070002489651 between 754058 and 1004042.
20070002606341 | 4809*2^497683+1
Computation Error: no candidates found for p=20070004648247 between 504074 and 754058.
20070004816819 | 6699*2^1215561+1
20070005914001 | 9847*2^1634140+1
20070006187837 | 9923*2^287853+1
20070006875981 | 1645*2^965954+1
20070007170259 | 3889*2^49730+1
20070008329039 | 9065*2^832569+1
Computation Error: no candidates found for p=20070009297743 between 1504010 and 1753994.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 20070000000000 <= p < 20070010000000
Found 13 factors
count=326136,sum=0x5ad678173464405c
Elapsed time: 19.85 sec. (0.03 init + 19.82 sieve) at 515875 p/sec.
Processor time: 1.48 sec. (0.05 init + 1.44 sieve) at 7123434 p/sec.
Average processor utilization: 1.51 (init), 0.07 (sieve)
no screen problems here...
ppsieve-cl-x86-windows.exe -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
ppsieve version cl-0.2.1-beta (testing)
nstart=74, nstep=31
ppsieve initialized: 1201 <= k <= 9999, 74 <= n < 2000000
Sieve started: 20070000000000 <= p < 20070010000000
Thread 0 starting
Detected 128 multiprocessors (640 SPUs) on device 0.
nstep changed to 22
Computation Error: no candidates found for p=20070000292703 between 1753994 and 2000000.
Computation Error: no candidates found for p=20070000462433 between 1504010 and 1753994.
20070000541441 | 3243*2^1584966-1
20070000674041 | 8143*2^1397047-1
Computation Error: no candidates found for p=20070001722619 between 1004042 and 1254026.
20070001823101 | 7647*2^1022532-1
20070001843627 | 4955*2^248864-1
Computation Error: no candidates found for p=20070002155237 between 1004042 and 1254026.
20070002680493 | 6455*2^1778260-1
Computation Error: no candidates found for p=20070002886811 between 754058 and 1004042.
20070003067151 | 9259*2^1869285-1
20070004606567 | 5269*2^257879-1
20070005404357 | 7893*2^796719-1
20070006186677 | 2413*2^1924519-1
20070007049707 | 2149*2^153375-1
20070007529777 | 1367*2^1532230-1
20070008213771 | 4041*2^1187467-1
20070008917537 | 6671*2^1260830-1
20070008991223 | 5523*2^101606-1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 20070000000000 <= p < 20070010000000
Found 14 factors
count=326136,sum=0x5ad678173464405c
Elapsed time: 19.89 sec. (0.03 init + 19.85 sieve) at 514992 p/sec.
Processor time: 1.61 sec. (0.05 init + 1.56 sieve) at 6553558 p/sec.
Average processor utilization: 1.42 (init), 0.08 (sieve)
____________
141941*2^4299438-1 is prime!
| |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
Here are the other ranges:
ppsieve-cl-x86-windows.exe -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
...
Elapsed time: 95.62 sec. (0.04 init + 95.58 sieve) at 106966 p/sec.
Screen was horribly sluggish with this one.
That ain't right.
nstep changed to 22
Computation Error: no candidates found for p=20070000113153 between 1004042 and 1254026.
20070000475957 | 4995*2^1822738+1
Computation Error: no candidates found for p=20070000860671 between 1504010 and 1753994.
That ain't right either.
Looks like this one's going to need some work. :(
If someone's looking for a good version to use with BOINC, use this instead.
____________
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2394 ID: 1178 Credit: 18,680,830,069 RAC: 6,903,927
                                                
|
If someone's looking for a good version to use with BOINC, use this instead.
I get the same times and errors with the apps at this link.
____________
141941*2^4299438-1 is prime!
| |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
That link might not have been accessible for anyone but me. I've now reverted to v0.2.0-beta, and I'm sticking with that.
Upon further reflection, it may not be possible to implement the latest algorithm on OpenCL. It appears that converting from vector long to vector int actually takes some operations in OpenCL; the algorithm depends on casting long to int with no cost.
So I'm sticking with v0.2.0 beta and never buying an AMD GPU. Hey, at least I finally made a decision! :P
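To make the long-to-int point concrete, here is a small Python model of what such a cast does to an unsigned 64-bit value. It is only an illustration, not code from ppsieve: the helper names are hypothetical, and the remark about register pairs is an assumption about why the cast can be free on some hardware.

```python
MASK32 = 0xFFFFFFFF


def split64(x: int) -> tuple[int, int]:
    """Split an unsigned 64-bit value into (high, low) 32-bit words."""
    return (x >> 32) & MASK32, x & MASK32


def low32(x: int) -> int:
    """Model of a C-style cast from a 64-bit to a 32-bit unsigned integer."""
    return x & MASK32


if __name__ == "__main__":
    p = 42070000070587  # a factor from the standard test range
    hi, lo = split64(p)
    # Assumption: if the value already lives in two 32-bit registers,
    # the cast just reads the low one and needs no arithmetic.
    assert (hi << 32) | lo == p
    assert low32(p) == lo
    print(hex(hi), hex(lo))
```

If a compiler instead emits explicit conversion instructions for every vector element, that cost is paid on each use inside the inner loop.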
____________
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2394 ID: 1178 Credit: 18,680,830,069 RAC: 6,903,927
                                                
|
That link might not have been accessible for anyone but me. I've now reverted to v0.2.0-beta, and I'm sticking with that.
Upon further reflection, it may not be possible to implement the latest algorithm on OpenCL. It appears that converting from vector long to vector int actually takes some operations in OpenCL; the algorithm depends on casting long to int with no cost.
So I'm sticking with v0.2.0 beta and never buying an AMD GPU. Hey, at least I finally made a decision! :P
OpenCL certainly seems limited compared to native algorithms as with CUDA on the NVidia cards. Have you thought much about trying to do the app in ATI's native Brook/CAL? The Collatz project was able to do their app that way, but I have no idea how difficult it is to work with the ATI cards this way...
____________
141941*2^4299438-1 is prime!
| |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
OpenCL certainly seems limited compared to native algorithms as with CUDA on the NVidia cards. Have you thought much about trying to do the app in ATI's native Brook/CAL?
Yes.
The Collatz project was able to do their app that way, but I have no idea how difficult it is to work with the ATI cards this way...
Well, I couldn't do it without buying an ATI card, and I don't want to buy an ATI card if it's going to be slower than an nVIDIA card. Catch-22!
Also, the current fastest algorithm on nVIDIA is very linear. ATI needs instruction-level parallelism, and evidently that's not easy to come by. So I'm not sure CAL could do much either. Certainly not sure enough to buy an ATI card.
On the other hand, the vectorizing I did on ATI only gave about a 33% speedup. It might be worth un-vectorizing it and applying the newest algorithm. But ATI/OpenCL is so unpredictable that I'm not inclined to try this soon.
____________
| |
|
|
Rumor has it I'll get my vendor-"repaired" ATI 5770 back in the course of this week.
I would take a look into it, but CAL seemed like a big pain in the back to me, and the documentation available is (or was?) not as good as for CUDA. | |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
I found this PDF about CAL - about four days ago from the time stamp. Knock yourself out! :)
____________
| |
|
|
Very interesting work Ken_g6!
I'll have to see if I can get the code to compile and run on my Macintosh. So far the compilation part, at least, is going well with only minor modifications.
Some progress (the executable name lies):
$ ./ppsieve-cl-x86_64-linux -p42070e9 -P42070010e6 -k 1201 -K 9999 -N 2000000 -c 60
ppsieve version cl-0.2.0-beta (testing)
Compiled Oct 18 2010 with GCC 4.2.1 (Apple Inc. build 5659)
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n < 2000000
Sieve started: 42070000000000 <= p < 42070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
Error: Building Program (clBuildProgram): Program build failure
I still need to compile the boinc libraries to get the -boinc version... but even this failure is promising at this point. | |
|
|
GAAA i hate ATI!!!!
# yum localinstall --nogpgcheck fglrx64_6_9_0-8.712-1.x86_64.rpm
Loaded plugins: kernel-module, security
Setting up Local Package Process
Examining fglrx64_6_9_0-8.712-1.x86_64.rpm: fglrx64_6_9_0-8.712-1.x86_64
Marking fglrx64_6_9_0-8.712-1.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package fglrx64_6_9_0.x86_64 0:8.712-1 set to be updated
--> Finished Dependency Resolution
Beginning Kernel Module Plugin
Finished Kernel Module Plugin
Dependencies Resolved
========================================================================================================================
Package Arch Version Repository Size
========================================================================================================================
Installing:
fglrx64_6_9_0 x86_64 8.712-1 /fglrx64_6_9_0-8.712-1.x86_64 121 M
Transaction Summary
========================================================================================================================
Install 1 Package(s)
Upgrade 0 Package(s)
Total size: 121 M
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : fglrx64_6_9_0 1/1
Error! Bad return status for module build on kernel: 2.6.18-194.17.1.el5 (x86_64)
Consult the make.log in the build directory
/var/lib/dkms/fglrx/8.712/build/ for more information.
Error! Invalid number of parameters passed.
Usage: remove -m <module> -v <module-version> --all
or: remove -m <module> -v <module-version> -k <kernel-version>
DKMS part of installation failed. Please refer to /usr/share/ati/fglrx-install.log for details
Installed:
fglrx64_6_9_0.x86_64 0:8.712-1
Complete! | |
|
|
ATI is soooo coooool
# ./make.sh
AMD kernel module generator version 2.1
doing Makefile based build for kernel 2.6.x and higher
rm -rf *.c *.h *.o *.ko *.GCC* .??* *.symvers
make -C /lib/modules/2.6.18-194.17.1.el5/build SUBDIRS=/var/lib/dkms/fglrx/8.712/build/2.6.x modules
make[1]: Entering directory `/usr/src/kernels/2.6.18-194.17.1.el5-x86_64'
CC [M] /var/lib/dkms/fglrx/8.712/build/2.6.x/firegl_public.o
/var/lib/dkms/fglrx/8.712/build/2.6.x/firegl_public.c:2415: warning: ‘kcl_flush_tlb_one’ defined but not used
CC [M] /var/lib/dkms/fglrx/8.712/build/2.6.x/kcl_acpi.o
CC [M] /var/lib/dkms/fglrx/8.712/build/2.6.x/kcl_agp.o
CC [M] /var/lib/dkms/fglrx/8.712/build/2.6.x/kcl_debug.o
CC [M] /var/lib/dkms/fglrx/8.712/build/2.6.x/kcl_ioctl.o
/var/lib/dkms/fglrx/8.712/build/2.6.x/kcl_ioctl.c: In function ‘KCL_IOCTL_AllocUserSpace32’:
/var/lib/dkms/fglrx/8.712/build/2.6.x/kcl_ioctl.c:196: error: implicit declaration of function ‘compat_alloc_user_space’
/var/lib/dkms/fglrx/8.712/build/2.6.x/kcl_ioctl.c:196: warning: return makes pointer from integer without a cast
make[2]: *** [/var/lib/dkms/fglrx/8.712/build/2.6.x/kcl_ioctl.o] Error 1
make[1]: *** [_module_/var/lib/dkms/fglrx/8.712/build/2.6.x] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.18-194.17.1.el5-x86_64'
make: *** [kmod_build] Error 2
build failed with return value 2
https://access.redhat.com/kb/docs/DOC-40265
RHEL errata
Solution:
# diff -u /tmp/kcl_ioctl.c 2.6.x/kcl_ioctl.c
--- /tmp/kcl_ioctl.c 2010-10-19 21:30:27.000000000 +0200
+++ 2.6.x/kcl_ioctl.c 2010-10-19 21:23:52.000000000 +0200
@@ -193,7 +193,7 @@
*/
void* ATI_API_CALL KCL_IOCTL_AllocUserSpace32(long size)
{
- return compat_alloc_user_space(size);
+ return arch_compat_alloc_user_space(size);
}
#endif // __x86_64__ | |
|
|
No luck; I have stream-sdk 2.1 and the related ICD.
$ ./ppsieve-cl-x86_64-linux -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
./ppsieve-cl-x86_64-linux: /usr/lib64/libOpenCL.so: no version information available (required by ./ppsieve-cl-x86_64-linux)
ppsieve version cl-0.2.0-beta (testing)
Compiled Oct 6 2010 with GCC 4.3.3
nstart=74, nstep=31
ppsieve initialized: 1201 <= k <= 9999, 74 <= n < 2000000
Sieve started: 20070000000000 <= p < 20070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
nstep changed to 22
Error: Building Program (clBuildProgram): Program build failure
/tmp/OCLL5MUom.cl(76): error: mixed vector-scalar operation not allowed unless
up-convertable(scalar-type=>vector-element-type)
SHIFTMOD_REDCX(21)
^
/tmp/OCLL5MUom.cl(76): error: mixed vector-scalar operation not allowed unless
up-convertable(scalar-type=>vector-element-type)
SHIFTMOD_REDCX(21)
^
/tmp/OCLL5MUom.cl(77): error: mixed vector-scalar operation not allowed unless
up-convertable(scalar-type=>vector-element-type)
SHIFTMOD_REDCX(42)
^
/tmp/OCLL5MUom.cl(77): error: mixed vector-scalar operation not allowed unless
up-convertable(scalar-type=>vector-element-type)
SHIFTMOD_REDCX(42)
^
/tmp/OCLL5MUom.cl(99): warning: variable "n" was declared but never referenced
uint n = D_NMIN;
^
/tmp/OCLL5MUom.cl(174): error: expression must have integral type
my_factor_found <<= shift;
^
5 errors detected in the compilation of "/tmp/OCLL5MUom.cl".
The stream-sdk samples do work:
$ /opt/ati-stream-sdk/samples/opencl/bin/x86_64/CLInfo
/opt/ati-stream-sdk/samples/opencl/bin/x86_64/CLInfo: /usr/lib64/libOpenCL.so: no version information available (required by /opt/ati-stream-sdk/samples/opencl/bin/x86_64/CLInfo)
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.0 ATI-Stream-v2.1 (145)
Platform Name: ATI Stream
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd
Platform Name: ATI Stream
Number of devices: 2
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Max compute units: 8
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 2001Mhz
Address bits: 64
Max memory allocation: 1073741824
Image support: No
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: No
Round to +ve and infinity: No
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 0
Cache size: 0
Global memory size: 3221225472
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x2b8217c24228
Name: Genuine Intel(R) CPU @ 0000 @ 2.00GHz
Vendor: GenuineIntel
Driver version: 1.1
Profile: FULL_PROFILE
Version: OpenCL 1.0 ATI-Stream-v2.1 (145)
Extensions: cl_khr_icd cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_printf
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Max compute units: 10
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 960Mhz
Address bits: 32
Max memory allocation: 268435456
Image support: No
Max size of kernel argument: 1024
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: No
Round to +ve and infinity: No
IEEE754-2008 fused multiply-add: No
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 268435456
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x2b8217c24228
Name: Juniper
Vendor: Advanced Micro Devices, Inc.
Driver version: CAL 1.4.556
Profile: FULL_PROFILE
Version: OpenCL 1.0 ATI-Stream-v2.1 (145)
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_amd_device_attribute_query
Passed!
$ ls -l /usr/lib64/libOpenCL*
lrwxrwxrwx 1 root root 14 12. Okt 19:15 /usr/lib64/libOpenCL.so -> libOpenCL.so.1
lrwxrwxrwx 1 root root 16 12. Okt 19:15 /usr/lib64/libOpenCL.so.1 -> libOpenCL.so.1.0
lrwxrwxrwx 1 root root 18 12. Okt 19:15 /usr/lib64/libOpenCL.so.1.0 -> libOpenCL.so.1.0.0
-rwxr-xr-x 1 root root 20968 12. Okt 19:15 /usr/lib64/libOpenCL.so.1.0.0
$ ls -l /etc/OpenCL/vendors/
insgesamt 12
-r--r--r-- 1 root root 15 5. Mär 2010 atiocl32.icd
-r--r--r-- 1 root root 15 5. Mär 2010 atiocl64.icd
lrwxrwxrwx 1 root root 23 15. Jun 23:20 libatiocl32.so -> /usr/lib/libatiocl32.so
lrwxrwxrwx 1 root root 25 15. Jun 23:20 libatiocl64.so -> /usr/lib64/libatiocl64.so
-r--r--r-- 1 root root 11 12. Okt 19:15 nvidia.icd
$ ls -l /usr/lib64/libatiocl64.so
lrwxrwxrwx 1 root root 45 9. Apr 2010 /usr/lib64/libatiocl64.so -> /opt/ati-stream-sdk/lib/x86_64/libatiocl64.so
$ ls -l /opt/ati-stream-sdk/lib/x86_64/libatiocl64.so
-rwxr-xr-x 1 a0062995 a0062995 12477000 15. Apr 2010 /opt/ati-stream-sdk/lib/x86_64/libatiocl64.so | |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
Nothing worked for me until I upgraded to the 2.2 SDK.
____________
| |
|
|
Now I need a new driver (I love it so). The 10.9 can't be installed, so I will try 10.7... | |
|
|
10.8 works, but only if you trick the driver: build the module while you are building the distro-specific rpm, then copy it over to your kernel-module-extra-dir. What a mess...
$ ./ppsieve-cl-x86_64-linux -p20070e9 -P20070010e6 -k 1201 -K 9999 -N 2000000 -c 60 -R
./ppsieve-cl-x86_64-linux: /usr/lib64/libOpenCL.so: no version information available (required by ./ppsieve-cl-x86_64-linux)
ppsieve version cl-0.2.0-beta (testing)
Compiled Oct 6 2010 with GCC 4.3.3
nstart=74, nstep=31
ppsieve initialized: 1201 <= k <= 9999, 74 <= n < 2000000
Sieve started: 20070000000000 <= p < 20070010000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
nstep changed to 22
20070000541441 | 3243*2^1584966-1
20070000674041 | 8143*2^1397047-1
20070001823101 | 7647*2^1022532-1
20070001843627 | 4955*2^248864-1
20070002680493 | 6455*2^1778260-1
20070003067151 | 9259*2^1869285-1
20070004606567 | 5269*2^257879-1
20070005404357 | 7893*2^796719-1
20070006186677 | 2413*2^1924519-1
20070007049707 | 2149*2^153375-1
20070007529777 | 1367*2^1532230-1
20070008213771 | 4041*2^1187467-1
20070008917537 | 6671*2^1260830-1
20070008991223 | 5523*2^101606-1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 20070000000000 <= p < 20070010000000
Found 14 factors
count=326136,sum=0x5ad678173464405c
Elapsed time: 10.56 sec. (0.02 init + 10.54 sieve) at 970182 p/sec.
Processor time: 9.20 sec. (0.02 init + 9.18 sieve) at 1113732 p/sec.
Average processor utilization: 1.15 (init), 0.87 (sieve)
Scientific Linux SL release 5.5 (Boron)
Genuine Intel(R) CPU @ 0000 @ 2.00GHz (Xeon E5504 ES with HyperThreading)
6 GB RAM
MSI R5770 PMDIG (800 SPU @ 850 MHz; 1024 MB DDR5)
ati-stream-sdk-2.2
10.8-driver
gpuload as reported by aticonfig --odgc is 95% | |
|
Ken_g6 Volunteer developer
 Send message
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 261,913,874 RAC: 8,863
                            
|
Congrats! :)
____________
| |
|
|
# time ./ppsieve-cl-boinc-x86_64-linux -p1186491e9 -P1186492e9 -k 1201 -K 9999 -N 2000000 -c 60
./ppsieve-cl-boinc-x86_64-linux: /usr/lib64/libOpenCL.so: no version information available (required by ./ppsieve-cl-boinc-x86_64-linux)
ppsieve version cl-0.2.0-beta (testing)
Compiled Oct 6 2010 with GCC 4.3.3
nstart=86, nstep=37
ppsieve initialized: 1201 <= k <= 9999, 86 <= n < 2000000
nstep changed to 32
(...)
Found 27 factors
real 8m36.841s
user 1m21.796s
sys 7m7.255
Can't open init data file - running in standalone mode
Sieve started: 1186491000000000 <= p < 1186492000000000
Thread 0 starting
Detected 160 multiprocessors (800 SPUs) on device 0.
Thread 0 completed
Sieve complete: 1186491000000000 <= p < 1186492000000000
count=28805195,sum=0xbece431c19d67201
Elapsed time: 514.84 sec. (0.13 init + 514.71 sieve) at 1943001 p/sec.
Processor time: 509.01 sec. (0.14 init + 508.88 sieve) at 1965273 p/sec.
Average processor utilization: 1.02 (init), 0.99 (sieve)
called boinc_finish
Around 50% faster than a GT 240.
The factors found match the ones from my 9400 GT, which ran the same range. | |
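Comparing the factor lines of two runs by eye gets tedious for longer ranges. The following Python sketch automates that comparison. It is not part of ppsieve; the function names are hypothetical, and it assumes factor lines have exactly the `p | k*2^n+1` or `p | k*2^n-1` shape shown in the logs in this thread.

```python
import re

# Matches lines such as "20070000541441 | 3243*2^1584966-1".
FACTOR_RE = re.compile(r"^(\d+) \| (\d+)\*2\^(\d+)([+-])1$")


def parse_factors(log_text: str) -> set:
    """Collect (p, k, n, sign) tuples from ppsieve-style output."""
    found = set()
    for line in log_text.splitlines():
        match = FACTOR_RE.match(line.strip())
        if match:
            p, k, n, sign = match.groups()
            found.add((int(p), int(k), int(n), sign))
    return found


def compare_runs(log_a: str, log_b: str) -> tuple:
    """Return (factors only in run A, factors only in run B), both sorted."""
    a, b = parse_factors(log_a), parse_factors(log_b)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    run_a = """nstep changed to 22
20070000541441 | 3243*2^1584966-1
20070000674041 | 8143*2^1397047-1
Found 14 factors"""
    run_b = """Computation Error: no candidates found for p=20070000292703 between 1753994 and 2000000.
20070000541441 | 3243*2^1584966-1
Found 14 factors"""
    only_a, only_b = compare_runs(run_a, run_b)
    print("only in A:", only_a)
    print("only in B:", only_b)
```

Non-factor lines (status messages, Computation Error lines, timing summaries) are simply ignored, so two stderr.txt files can be passed in unedited.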
|
|
I couldn't do it without buying an ATI card, and I don't want to buy an ATI card if it's going to be slower than an nVIDIA card. Catch-22!
I don't want to be disrespectful, but I think that with a decent algorithm, ATI cards should be much faster than a comparable NVIDIA card in these calculations.
Anyway, thanks for your work. It's also open source, so whenever somebody is able to take it further (time is not free, unfortunately), they can start from there. | |
|
|
I don't want to be disrespectful, but I think that with a decent algorithm, ATI cards should be much faster than a comparable NVIDIA card in these calculations.
FWIW... the real 'problem' here is OpenCL and its JIT... It's crap.
Just a tip: http://sourceforge.net/projects/calpp/ should make it easier... | |
|
|
Maybe I'm asking this too early, since you're still testing the application and BOINC support will come later. In that case, I'll ask again later. ;)
After pschoefer told me that there is a way to crunch for PG on an ATI card, I tested it under BOINC. And cool, it works fine (as long as I manually deactivate my onboard ATI HD 2400 on device 0).
Before that, I tested the sieve in cmd.exe; it found the correct ATI card and ignored the HD 2400.
Under BOINC, however, it starts one WU on device 0 and computes correctly (remember, according to BOINC, device 0 should be the HD 2400?!?), then it starts a second WU on device 1 (normally the ATI 5750?!?) and immediately produces an error.
As a result (ignoring the wrong device output):
The ATI HD 5750 (725/1150 MHz; device 1) crunches and crunches and crunches... ;)
The ATI HD 2400 (when active) produces error after error after error... :/
My question: is there a way to exclude the HD 2400 for PG? The card could crunch for Collatz (RV610 chip; not fast, but the card works). Is there a way to write that in app_info.xml, or is this a problem I can't solve?
Thank you for your help. | |
|
|
I doubt that you crunch with the CUDA version on an ATI card :)
I think you tried to get the OpenCL version to work on your 5750, and since
OpenCL (to my knowledge) doesn't support your HD 2400, the application selects
the right card when run from the command line by simply ignoring the
non-OpenCL-capable HD 2400.*
If you use app_info.xml files for Collatz and PG, you could try adding a --device X
switch to the command lines, with X denoting the device you want to use for the
respective BOINC project (HD 2400 for Collatz and the 5750 for PG). But I don't
know what BOINC will do if it only has work for the PG project and tries to start two
work units, which would then end up on the same card (via the added --device X switch).
*Even if BOINC sees your HD 2400 as device 0, for the ATI OpenCL app
device 0 is most probably the first OpenCL capable device and that is
your 5750.
____________
| |
|
|
|