Message boards : Generalized Fermat Prime Search : Genefer OCL 2
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
A new Genefer application, "geneferocl2", is available from the Genefer repository (http://www.primegrid.com/forum_thread.php?id=6359).
A 32-bit binary for Windows is available (compiled with gcc 5.2.1).
geneferocl2 uses fixed-point arithmetic: Q63.64 fixed-point numbers for the data and Q0.63 for the sin/cos table.
Because the double type is not used, it can run on any GPU.
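To make the Q notation concrete, here is a minimal Python sketch of multiplying a Q63.64 data value by a Q0.63 table entry. It is only an illustration under my own assumptions, not the geneferocl2 kernel code: the helper names (`to_fixed`, `from_fixed`, `mul_data_by_trig`) are hypothetical, Python's unbounded integers stand in for the GPU's integer registers, and rounding to nearest is my choice.

```python
import math

DATA_FRAC_BITS = 64  # Q63.64: 64 fractional bits for the data
TRIG_FRAC_BITS = 63  # Q0.63: 63 fractional bits for sin/cos table entries


def to_fixed(x, frac_bits):
    """Scale a real number by 2**frac_bits and round to an integer."""
    return round(x * (1 << frac_bits))


def from_fixed(n, frac_bits):
    """Convert a fixed-point integer back to a float (for display only)."""
    return n / (1 << frac_bits)


def mul_data_by_trig(data_q, trig_q):
    """Multiply Q63.64 by Q0.63 and return a Q63.64 result.

    The raw product carries 64 + 63 fractional bits, so it is shifted
    right by 63 bits after adding half a unit to round to nearest.
    """
    half = 1 << (TRIG_FRAC_BITS - 1)
    return (data_q * trig_q + half) >> TRIG_FRAC_BITS


x = to_fixed(1234.5, DATA_FRAC_BITS)
c = to_fixed(math.cos(math.pi / 8), TRIG_FRAC_BITS)
product = from_fixed(mul_data_by_trig(x, c), DATA_FRAC_BITS)
print(product, 1234.5 * math.cos(math.pi / 8))
```

Only integer multiplies, adds and shifts are involved, which is why no double-precision hardware is needed. Note that the inputs here start as Python floats, so the demo cannot show the full 64-bit fractional precision.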
Some first tests on my computer (i7-3820 @ 3.6 GHz and GTX 680).
Generalized Fermat Number Bench OCL2
1203210^65536+1 Time: 831 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.6 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 2.73 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 5.52 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 11.7 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 24.7 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 59.2 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 120 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 8.
Generalized Fermat Number Bench OCL
1203210^65536+1 Time: 103 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 197 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 374 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 726 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.46 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 2.98 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 6.27 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 13.6 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 66.
Generalized Fermat Number CPU
1203210^65536+1 Time: 278 us/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 619 us/mul. Err: 0.1484 785521 digits
804904^262144+1 Time: 1.3 ms/mul. Err: 0.1484 1548156 digits
658332^524288+1 Time: 2.88 ms/mul. Err: 0.1406 3050541 digits
538452^1048576+1 Time: 6.14 ms/mul. Err: 0.1367 6009544 digits
440400^2097152+1 Time: 13.9 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 28.8 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 65.8 ms/mul. Err: 0.1328 45879398 digits
Genefer Mark = 15.
Generalized Fermat Number Bench x87
1203210^65536+1 Time: 2.95 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 5.91 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 13.2 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 26.5 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 58.5 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 117 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 256 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 513 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 2.
It is the "OpenCL version of genefer80": slower than geneferocl, but the error is smaller, so a larger range can be tested with it:
geneferocl_windows.exe -q "3149688^32768+1"
maxErr exceeded for 3149688^32768+1, 0.5000 > 0.4500
genefer_windows64.exe -q "3149688^32768+1"
maxErr exceeded for 3149688^32768+1, 0.5000 > 0.4500
geneferocl2_windows.exe -q "3149688^32768+1"
3149688^32768+1 is a probable prime. (212936 digits) (err = 0.0005) (time = 0:04:37)
genefer_windows64.exe -q "3149688^32768+1" -x x87
3149688^32768+1 is a probable prime. (212936 digits) (err = 0.0006) (time = 0:15:20)
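The digit counts printed in the output above can be cross-checked by hand: b^N+1 has floor(N * log10(b)) + 1 decimal digits whenever b is not a power of 10. A small Python sketch follows; the function name `gfn_digits` is mine and is not part of Genefer.

```python
import math


def gfn_digits(b, n):
    """Decimal digits of b**n + 1, assuming b > 1 is not a power of 10."""
    return math.floor(n * math.log10(b)) + 1


# Candidates taken from the benchmark table and the -q test above.
print(gfn_digits(1203210, 65536))
print(gfn_digits(3149688, 32768))
```

For the two candidates shown, this reproduces the 398482 and 212936 digits reported by the program.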
This is the first version, so it should be tested on different hardware and operating systems. Future improvements are possible.
Have fun!
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2382 ID: 1178 Credit: 17,991,226,846 RAC: 12,693,967
                                                
|
Runs fine on a GTX 660 (Win7 Enterprise 64-bit, Haswell-based Xeon). However, a GTX 645 (same OS, i7-860) crashes giving the error:
Error: OpenCL error detected: CL_INVALID_COMMAND_QUEUE.
An error (2964) occured.
| |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
                               
|
Does NOT work on my GTX 580:
C:\Temp\GFN\OCL2>geneferocl2_windows.exe -q "3149688^32768+1
geneferocl2 3.2.9-dev (Windows/OpenCL/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Command line: geneferocl2_windows.exe -q 3149688^32768+1
Priority change succeeded.
Testing 3149688^32768+1...
Using OCL2 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 580', version 'OpenCL 1.1 CUDA' and driver '344.75'.
Starting initialization...
Initialization complete (0.070 seconds).
Error: OpenCL error detected: CL_INVALID_COMMAND_QUEUE.
An error (2964) occured.
I get the same error when running -b. Is it perhaps compiled only for GTX6xx and later GPUs?
Minor: note that "occurred" is misspelled as "occured" in the error message.
____________
My lucky number is 75898^524288+1
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
Runs fine on a GTX 660 (Win7 Enterprise 64-bit, Haswell-based Xeon). However, a GTX 645 (same OS, i7-860) crashes giving the error:
Error: OpenCL error detected: CL_INVALID_COMMAND_QUEUE.
An error (2964) occured.
Did you check the NVIDIA driver version?
Maybe it is a problem with "old" drivers? The new ones are OpenCL 1.2; I didn't check the app on OpenCL 1.1.
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2382 ID: 1178 Credit: 17,991,226,846 RAC: 12,693,967
                                                
|
Runs fine on a GTX 660 (Win7 Enterprise 64-bit, Haswell-based Xeon). However, a GTX 645 (same OS, i7-860) crashes giving the error:
Error: OpenCL error detected: CL_INVALID_COMMAND_QUEUE.
An error (2964) occured.
Did you check the NVIDIA driver version?
Maybe it is a problem with "old" drivers? The new ones are OpenCL 1.2; I didn't check the app on OpenCL 1.1.
Updated driver seems to be working on the GTX 645 now, but it is NOT the OpenCL 1.1 issue as my GTX 660's slightly newer driver was also OpenCL 1.1
| |
|
Tyler Project administrator Volunteer tester Send message
Joined: 4 Dec 12 Posts: 1078 ID: 183129 Credit: 1,376,122,338 RAC: 4,719
                         
|
Works fine on Nvidia GTX 760 on Win 7 pro x64... Driver is 347.52, saying OCL 1.1.
Command line: geneferocl2_windows.exe -q 3149688^32768+1
Priority change succeeded.
Testing 3149688^32768+1...
Using OCL2 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 760', version 'OpenCL 1.1 CUDA' and driver '347.52'.
Starting initialization...
Initialization complete (0.037 seconds).
Estimated time remaining for 3149688^32768+1 is 0:06:50
3149688^32768+1 is a probable prime. (212936 digits) (err = 0.0005) (time = 0:06:30) 11:50:53
____________
275*2^3585539+1 is prime!!! (1079358 digits)
Proud member of Aggie the Pew
| |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
Updated driver seems to be working on the GTX 645 now, but it is NOT the OpenCL 1.1 issue as my GTX 660's slightly newer driver was also OpenCL 1.1
I'm running Windows 10 :o) and cannot install a driver prior to 352 :o(
Intel's OpenCL runs on my CPU but not on my HD Graphics 4000: "maxErr exceeded for 100234^64+1, 0.5000 > 0.4500 during final check". I will check this.
It seems to run on 347.52 but not on 344.75...
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
                               
|
Yves, the limit check ("-l") seems to be reporting much lower limits than OCL2 can actually process.
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
Minor: note that "occurred" is misspelled as "occured" in the error message.
Thanks, committed! | |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
Yves, the limit check ("-l") seems to be reporting much lower limits than OCL2 can actually process.
Yes, the test doesn't work if we start with a large b, because the error is not continuous with fixed-point arithmetic. Round-off error is continuous, but integer overflow may occur too. I'm working on finding the true limits.
Does it run on a GTX 580 with a new driver?
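One way to read the remark about fixed-point error: a round-off check looks at how far each result is from the nearest integer, and that distance grows gradually as b grows. An integer overflow is different, because a wrapped value is still an exact integer and gives no gradual warning. The sketch below is only an illustration in Python, not Genefer's code; `round_off_error` and `wrap_int64` are hypothetical helper names, and the 64-bit width is chosen for the demonstration rather than taken from the OCL2 kernels.

```python
def round_off_error(values):
    """Largest distance between any value and its nearest integer."""
    return max(abs(v - round(v)) for v in values)


def wrap_int64(x):
    """Reduce x to a signed 64-bit value, as two's-complement hardware would."""
    return (x + 2**63) % 2**64 - 2**63


# Round-off grows smoothly: slightly noisier results give a slightly larger error.
clean = [3.0001, 7.9998]
noisy = [3.01, 7.98]
assert round_off_error(clean) < round_off_error(noisy) < 0.45

# Overflow is abrupt: one step past the limit flips the sign entirely,
# yet the wrapped value is still an exact integer (round-off error 0).
largest = 2**63 - 1
assert wrap_int64(largest) == largest
assert wrap_int64(largest + 1) == -2**63
assert round_off_error([wrap_int64(largest + 1)]) == 0
```

This is why a limit search that steps b upward until the round-off error crosses a threshold can miss the point where overflow begins.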
| |
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,017,981,497 RAC: 1,586,802
                                      
|
Does it run on a GTX 580 with a new driver?
Yes Yves, it does.
Running benchmarks for transform implementation "OCL2"
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 580', version 'OpenCL 1.1 CUDA' and driver '353.62'.
2199064^8192+1 Time: 176 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 203 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 251 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 512 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.07 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.96 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.97 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.24 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 16 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 37.3 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 79.1 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 12.
Priority change succeeded.
____________
My stats | |
|
JimB Honorary cruncher Send message
Joined: 4 Aug 11 Posts: 918 ID: 107307 Credit: 977,945,376 RAC: 0
                     
|
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 570', version 'OpenCL 1.1 CUDA' and driver '347.25'.
Max b seems to be:
32768 between 16.9M-16.95M
65536 between 11.6M-11.7M
131072 between 7.7M-8M
262144 between 5.5M-6M
524288 between 3M-4M
(I'll keep updating this post as I keep testing) | |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
Does it run on a GTX 580 with a new driver?
Yes Yves, it does.
Thanks Mike, I will check why it doesn't run with driver '344.75' on a PC on Windows 7.
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 580', version 'OpenCL 1.1 CUDA' and driver '353.62'.
On my laptop, it is device 'GeForce GT 740M', version 'OpenCL 1.2 CUDA' and driver '353.62'.
I don't understand. Why OpenCL 1.2 on my computer and 1.1 on yours?
GTX 580 => Genefer Mark = 12 and GTX 680 Genefer Mark = 8.
GTX 580: 512 cores @ 772 (or 1544?) MHz.
GTX 680: 1536 cores @ 1006 MHz.
Why is the 580 50% faster?
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,017,981,497 RAC: 1,586,802
                                      
|
Max b seems to be:
32768 between 16.9M-16.95M
65536 between 11.5M-11.7M
Those are both within PRPNet current range.
32768 is reaching 10M and 65536 passed 4M a couple of days ago.
3966304^65536+1 takes ~12 minutes on a GTX 580 vs ~66 minutes on an i5-4670.
Guess what: I'm in the process of selling my GTX 580 box this week... and a new Genefer is on the scene.
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
                               
|
GTX 580, latest non-beta drivers (353.62), Core i5-4670K. All stock clocks.
The two extended precision transforms (OCL2 and x87) are in red.
GPU OCL:
2199064^8192+1 Time: 25.4 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 31.9 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 45 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 82.6 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 176 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 356 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 664 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.32 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 2.69 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 5.71 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 12.5 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 73.
GPU CUDA:
2199064^8192+1 Time: 160 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 177 us/mul. Err: 0.2344 102481 digits
1471094^32768+1 Time: 192 us/mul. Err: 0.2305 202102 digits
1203210^65536+1 Time: 265 us/mul. Err: 0.2188 398482 digits
984108^131072+1 Time: 425 us/mul. Err: 0.2344 785521 digits
804904^262144+1 Time: 694 us/mul. Err: 0.2188 1548156 digits
658332^524288+1 Time: 1.14 ms/mul. Err: 0.2031 3050541 digits
538452^1048576+1 Time: 1.97 ms/mul. Err: 0.1875 6009544 digits
440400^2097152+1 Time: 3.6 ms/mul. Err: 0.1953 11836006 digits
360204^4194304+1 Time: 7.09 ms/mul. Err: 0.1875 23305854 digits
294612^8388608+1 Time: 15.8 ms/mul. Err: 0.2031 45879398 digits
Genefer Mark = 55.
CPU FMA3:
6008024^256+1 Time: 0.822 us/mul. Err: 0.1484 1736 digits
4913974^512+1 Time: 1.36 us/mul. Err: 0.1562 3427 digits
4019150^1024+1 Time: 2.75 us/mul. Err: 0.1602 6763 digits
3287270^2048+1 Time: 5.72 us/mul. Err: 0.1406 13347 digits
2688666^4096+1 Time: 11.6 us/mul. Err: 0.1406 26336 digits
2199064^8192+1 Time: 25.6 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 51.8 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 118 us/mul. Err: 0.1641 202102 digits
1203210^65536+1 Time: 240 us/mul. Err: 0.1523 398482 digits
984108^131072+1 Time: 549 us/mul. Err: 0.1484 785521 digits
804904^262144+1 Time: 1.11 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 2.63 ms/mul. Err: 0.1445 3050541 digits
538452^1048576+1 Time: 5.49 ms/mul. Err: 0.1406 6009544 digits
440400^2097152+1 Time: 12.7 ms/mul. Err: 0.1328 11836006 digits
360204^4194304+1 Time: 25.8 ms/mul. Err: 0.1375 23305854 digits
294612^8388608+1 Time: 63.2 ms/mul. Err: 0.1289 45879398 digits
Genefer Mark = 16.
CPU AVX (Intel):
6008024^256+1 Time: 0.963 us/mul. Err: 0.1406 1736 digits
4913974^512+1 Time: 1.59 us/mul. Err: 0.1250 3427 digits
4019150^1024+1 Time: 3.34 us/mul. Err: 0.1250 6763 digits
3287270^2048+1 Time: 6.74 us/mul. Err: 0.1406 13347 digits
2688666^4096+1 Time: 14.1 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 30.2 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 62.5 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 137 us/mul. Err: 0.1641 202102 digits
1203210^65536+1 Time: 284 us/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 623 us/mul. Err: 0.1484 785521 digits
804904^262144+1 Time: 1.29 ms/mul. Err: 0.1484 1548156 digits
658332^524288+1 Time: 2.91 ms/mul. Err: 0.1406 3050541 digits
538452^1048576+1 Time: 6.2 ms/mul. Err: 0.1367 6009544 digits
440400^2097152+1 Time: 13.9 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 28.6 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 67.7 ms/mul. Err: 0.1328 45879398 digits
Genefer Mark = 14.
GPU OCL2:
2199064^8192+1 Time: 176 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 206 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 253 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 512 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.08 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.98 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.96 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.31 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 16.1 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 37.5 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 79.9 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 12.
CPU SSE4:
6008024^256+1 Time: 1.26 us/mul. Err: 0.1250 1736 digits
4913974^512+1 Time: 2.7 us/mul. Err: 0.1406 3427 digits
4019150^1024+1 Time: 5.51 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 12.2 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 24.4 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 51.8 us/mul. Err: 0.1719 51956 digits
1798620^16384+1 Time: 111 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 232 us/mul. Err: 0.1719 202102 digits
1203210^65536+1 Time: 496 us/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 1.04 ms/mul. Err: 0.1641 785521 digits
804904^262144+1 Time: 2.21 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 4.59 ms/mul. Err: 0.1484 3050541 digits
538452^1048576+1 Time: 10.2 ms/mul. Err: 0.1562 6009544 digits
440400^2097152+1 Time: 20.9 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 45.8 ms/mul. Err: 0.1250 23305854 digits
294612^8388608+1 Time: 93.2 ms/mul. Err: 0.1328 45879398 digits
Genefer Mark = 9.
CPU SSE2:
6008024^256+1 Time: 1.62 us/mul. Err: 0.1250 1736 digits
4913974^512+1 Time: 3.39 us/mul. Err: 0.1406 3427 digits
4019150^1024+1 Time: 6.96 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 14.2 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 29.6 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 62 us/mul. Err: 0.1719 51956 digits
1798620^16384+1 Time: 133 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 274 us/mul. Err: 0.1719 202102 digits
1203210^65536+1 Time: 579 us/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 1.2 ms/mul. Err: 0.1641 785521 digits
804904^262144+1 Time: 2.54 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 5.31 ms/mul. Err: 0.1484 3050541 digits
538452^1048576+1 Time: 11.4 ms/mul. Err: 0.1562 6009544 digits
440400^2097152+1 Time: 23.4 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 51 ms/mul. Err: 0.1250 23305854 digits
294612^8388608+1 Time: 104 ms/mul. Err: 0.1328 45879398 digits
Genefer Mark = 8.
CPU x87:
6008024^256+1 Time: 6.07 us/mul. Err: 0.0001 1736 digits
4913974^512+1 Time: 14.4 us/mul. Err: 0.0001 3427 digits
4019150^1024+1 Time: 30.2 us/mul. Err: 0.0001 6763 digits
3287270^2048+1 Time: 61.6 us/mul. Err: 0.0001 13347 digits
2688666^4096+1 Time: 134 us/mul. Err: 0.0001 26336 digits
2199064^8192+1 Time: 286 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 620 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.31 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.82 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 5.86 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 12.6 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 26.4 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 56.2 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 117 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 248 ms/mul. Err: 0.0001 23305854 digits
Bottom line: on this hardware, OCL2 is about 6 times faster than x87, but I'd be running 4 x87 tests simultaneously.
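The bottom line can be checked against the tables with a few lines of Python. The timings below are copied from the GPU OCL2 and CPU x87 listings in this post, for the sizes both listings cover. The factor of four assumes that four concurrent x87 tests run with no slowdown at all, which is my simplification and not something measured here.

```python
# Microseconds per multiplication, copied from the GTX 580 / i5-4670K tables above.
OCL2_US = {8192: 176, 16384: 206, 32768: 253, 65536: 512, 131072: 1080,
           262144: 1980, 524288: 3960, 1048576: 8310, 2097152: 16100,
           4194304: 37500}
X87_US = {8192: 286, 16384: 620, 32768: 1310, 65536: 2820, 131072: 5860,
          262144: 12600, 524288: 26400, 1048576: 56200, 2097152: 117000,
          4194304: 248000}

CONCURRENT_X87 = 4  # assumption: four x87 tests run side by side with no slowdown


def speedup(n):
    """How many times faster one OCL2 test is than one x87 test at size n."""
    return X87_US[n] / OCL2_US[n]


for n in sorted(OCL2_US):
    single = speedup(n)
    print(f"N={n:>8}: OCL2 is {single:.2f}x one x87 test, "
          f"{single / CONCURRENT_X87:.2f}x four concurrent x87 tests")
```

From N = 32768 upward the single-test ratio lies between roughly 5 and 7, consistent with "about 6 times faster". Under the perfect-scaling assumption, the advantage over four concurrent x87 tests shrinks to less than 2x.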
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,017,981,497 RAC: 1,586,802
                                      
|
I don't understand. Why OpenCL 1.2 on my computer and 1.1 on yours?
GTX 580: 512 cores @ 772 (or 1544?) MHz.
GTX 680: 1536 cores @ 1006 MHz.
Why is the 580 50% faster?
Well, GPU-Z says 512 shaders clocked at 1544 MHz, with the core clocked at half that.
And it is OpenCL 1.1 for the whole GTX 5xx series.
The GTX 580 has a 384-bit bus width; the GTX 680 has only 256-bit.
On the other hand, my box has a slower i5-661 (compared to the i7-3820), and Genefer takes a whole core.
|
|
352.78 OpenCL 1.2 driver (branch r352) Transform implementation "OCL2" on Nvidia Maxwell C.C5.0/5.2 and i-5 4440S CPU @ 2.9GHz:
GTX750 @ 1.412GHz:
Generalized Fermat Number Bench
2199064^8192+1 Time: 163 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 281 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 511 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 856 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 2.3 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 3.75 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 8.65 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 15.4 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 37.7 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 65.7 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 260 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 5.
GTX970 @ 1540MHz:
Generalized Fermat Number Bench
2199064^8192+1 Time: 174 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 181 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 288 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 385 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 943 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.62 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.52 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 6.58 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 14.9 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 28.4 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 64.1 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 14.
10 runs produced the same result: CPU OCL2 benchmark never completed full suite - results stop at 398482 digits.
(Microcode revision 1C) 2.9GHz (i-5 4440) OpenCL (Build148) Driver 4.2.0.148:
Generalized Fermat Number Bench
2199064^8192+1 Time: 1.45 ms/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 3 ms/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 6.5 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 13.8 ms/mul. Err: 0.0001 398482 digits
The current microcode for my Ivy i-5 is 1B. I've seen four different MC on my Haswell in last couple months: 19/1A/1B/1C.
The 970 peaked at 125 W during the OCL2 benchmark. Bus usage on the 970 was ~60% on a PCIe 3.0 x8 link; on the 750 (also a PCIe 3.0 x8 link) it was ~50%. MCU load peaked at 30% for both GPUs.
Genefer x64 OCL benchmark (140W peak) 970 at 1540MHz:
Generalized Fermat Number Bench
2199064^8192+1 Time: 53.4 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 58.7 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 73.6 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 101 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 175 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 340 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 664 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.31 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 2.55 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 5.31 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 10.6 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 78.
GTX750 at 1412MHz:
Generalized Fermat Number Bench
2199064^8192+1 Time: 61.1 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 78.1 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 131 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 242 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 451 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 937 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 1.86 ms/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 3.98 ms/mul. Err: 0.1758 6009544 digits
440400^2097152+1 Time: 7.98 ms/mul. Err: 0.1680 11836006 digits
360204^4194304+1 Time: 17.1 ms/mul. Err: 0.1563 23305854 digits
294612^8388608+1 Time: 34.5 ms/mul. Err: 0.1563 45879398 digits
Genefer Mark = 25.
FMA3 Haswell 2.9GHz (C72133RAM) Genefer x64:
Generalized Fermat Number Bench
6008024^256+1 Time: 1.16 us/mul. Err: 0.1484 1736 digits
4913974^512+1 Time: 1.44 us/mul. Err: 0.1562 3427 digits
4019150^1024+1 Time: 2.93 us/mul. Err: 0.1602 6763 digits
3287270^2048+1 Time: 6.12 us/mul. Err: 0.1406 13347 digits
2688666^4096+1 Time: 12.2 us/mul. Err: 0.1406 26336 digits
2199064^8192+1 Time: 27.5 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 53.9 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 123 us/mul. Err: 0.1641 202102 digits
1203210^65536+1 Time: 250 us/mul. Err: 0.1523 398482 digits
984108^131072+1 Time: 569 us/mul. Err: 0.1484 785521 digits
804904^262144+1 Time: 1.14 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 2.69 ms/mul. Err: 0.1445 3050541 digits
538452^1048576+1 Time: 5.51 ms/mul. Err: 0.1406 6009544 digits
440400^2097152+1 Time: 12.6 ms/mul. Err: 0.1328 11836006 digits
360204^4194304+1 Time: 25.6 ms/mul. Err: 0.1375 23305854 digits
294612^8388608+1 Time: 60.9 ms/mul. Err: 0.1289 45879398 digits
Genefer Mark = 16.
Are the OCL2 and OCL Genefer Marks comparable? Does each program have a separate rating design?
OCL2 (GTX750) 'b' limits:
Generalized Fermat Number b Limits
The upper bound m = 8192, b = 6685000, Err = 0.0009
Starting b = 6690000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 16384, b = 5435000, Err = 0.0011
Starting b = 5440000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 32768, b = 4405000, Err = 0.0008
Starting b = 4410000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 65536, b = 3575000, Err = 0.0009
Starting b = 3580000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 131072, b = 2905000, Err = 0.0013
Starting b = 2910000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 262144, b = 2355000, Err = 0.0008
Starting b = 2360000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 524288, b = 1915000, Err = 0.0008
Starting b = 1920000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 1048576, b = 1555000, Err = 0.0012
Starting b = 1560000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 2097152, b = 1255000, Err = 0.0008
Starting b = 1260000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 4194304, b = 1025000, Err = 0.0006
Starting b = 1030000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 8388608, b = 825000, Err = 0.0008
Starting b = 830000, Err b = 0, Err = 0.0000, 5 Err b = 0
OCL2 (GTX970) 'b' limits:
Generalized Fermat Number b Limits
The upper bound m = 8192, b = 6685000, Err = 0.0009
Starting b = 6690000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 16384, b = 5435000, Err = 0.0010
Starting b = 5440000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 32768, b = 4405000, Err = 0.0008
Starting b = 4410000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 65536, b = 3575000, Err = 0.0009
Starting b = 3580000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 131072, b = 2905000, Err = 0.0012
Starting b = 2910000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 262144, b = 2355000, Err = 0.0008
Starting b = 2360000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 524288, b = 1915000, Err = 0.0008
Starting b = 1920000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 1048576, b = 1555000, Err = 0.0008
Starting b = 1560000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 2097152, b = 1255000, Err = 0.0007
Starting b = 1260000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 4194304, b = 1025000, Err = 0.0006
Starting b = 1030000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 8388608, b = 825000, Err = 0.0007
Starting b = 830000, Err b = 0, Err = 0.0000, 5 Err b = 0
OCL2 (i-5 4440S) 'b' limits:
Generalized Fermat Number b Limits
The upper bound m = 8192, b = 6685000, Err = 0.0009
Starting b = 6690000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 16384, b = 5435000, Err = 0.0011
Starting b = 5440000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 32768, b = 4405000, Err = 0.0009
Starting b = 4410000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 65536, b = 3575000, Err = 0.0010
Starting b = 3580000, Err b = 0, Err = 0.0000, 5 Err b = 0
| |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
                               
|
Are the OCL2 and OCL Genefer Marks comparable? Does each program have a separate rating design?
If I remember correctly, they're all the same metric, except that GPU tasks measure elapsed time and CPU tasks measure CPU time. The Genefer Mark scores should therefore be comparable between any of the programs. That's the reason for it.
|
|
Thanks Yves, looks like it will be useful when we reach beyond the limits of the current application. I built the new code on my Mac with FirePro D700, but it fails for all tests:
Running tests for transform implementation "OCL2"
Testing 10234^64+1...
Using OCL2 transform
Running on platform 'Apple', device 'ATI Radeon HD - FirePro D700 Compute Engine', version 'OpenCL 1.2 ' and driver '1.2 (Jun 10 2015 16:27:05)'.
Starting initialization...
Initialization complete (0.000 seconds).
Testing 10234^64+1... 851 steps to go
maxErr exceeded for 10234^64+1, 0.5000 > 0.4500 during final check
However, it is working fine on my Linux machine with a Tesla K20m (Kepler-series) GPU:
Running benchmarks for transform implementation "OCL2"
Running on platform 'NVIDIA CUDA', device 'Tesla K20m', version 'OpenCL 1.1 CUDA' and driver '346.46'.
2199064^8192+1 Time: 198 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 220 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 344 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 636 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.17 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 2.3 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 4.56 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 9.4 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 20.4 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 42.9 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 87.9 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 10.
Running benchmarks for transform implementation "OCL"
Running on platform 'NVIDIA CUDA', device 'Tesla K20m', version 'OpenCL 1.1 CUDA' and driver '346.46'.
2199064^8192+1 Time: 42 us/mul. Err: 0.2031 51956 digits
1798620^16384+1 Time: 53.6 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 51.1 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 67.1 us/mul. Err: 0.1719 398482 digits
984108^131072+1 Time: 115 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 276 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 525 us/mul. Err: 0.1875 3050541 digits
538452^1048576+1 Time: 1.02 ms/mul. Err: 0.1680 6009544 digits
440400^2097152+1 Time: 1.99 ms/mul. Err: 0.1855 11836006 digits
360204^4194304+1 Time: 4.12 ms/mul. Err: 0.1641 23305854 digits
294612^8388608+1 Time: 8.65 ms/mul. Err: 0.1641 45879398 digits
Genefer Mark = 100.
For this setup, OCL2 is about 10 times slower than OCL, but with much improved accuracy...
I will see what I can do to debug the mac application.
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
Rafael Volunteer tester
 Send message
Joined: 22 Oct 14 Posts: 909 ID: 370496 Credit: 531,793,905 RAC: 407,603
                        
|
Well, I guess what I'm going to say was to be expected, but regardless, it does not seem to work on a Radeon 4350 with a Pentium E2180.
C:\Users\Paulo\Desktop\Arquivos\Setups>geneferocl2_windows -b
geneferocl2 3.2.9-dev (Windows/OpenCL/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Command line: geneferocl2_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"
Running on platform 'AMD Accelerated Parallel Processing', device 'ATI RV710', version 'OpenCL 1.0 AMD-APP (937.2)' and driver 'CAL 1.4.1734'.
Error: OpenCL error detected: CL_BUILD_PROGRAM_FAILURE.
Note: using -l seems to give error 2989.
I guess I'll just stick to PPS sieve then. I'll run the tests on my Gtx 970, once I get to my other house tomorrow. | |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
                               
|
Well, I guess what I'm gonna say was to be expected, but regardless, it seems to not work on a Radeon 4350 with a Pentium E2180.
Not expected, actually.
This is probably fixable; the requirements for OCL2 are less than for OCL. Double precision hardware isn't required for OCL2, so OCL2 can run on a wider variety of GPUs than can OCL or CUDA.
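To illustrate why no double-precision hardware is needed: the first post says OCL2 stores data as Q63.64 fixed-point numbers and the sin/cos table as Q0.63. The sketch below is not Genefer's kernel code. It only shows, under that assumption about the formats, how such a product can be formed with integer operations alone. The function names are hypothetical, and Python's unbounded integers stand in for the multi-word arithmetic a GPU kernel would need.

```python
import math

FRAC_DATA = 64  # Q63.64: 64 fractional bits for data values
FRAC_TRIG = 63  # Q0.63: 63 fractional bits for sin/cos values

def to_fixed(x, frac_bits):
    """Encode a float as an integer scaled by 2**frac_bits."""
    return round(x * (1 << frac_bits))

def to_float(v, frac_bits):
    """Decode a fixed-point integer back to a float (for display only)."""
    return v / (1 << frac_bits)

def mul_by_trig(a, w):
    """Multiply a Q63.64 value by a Q0.63 value using integer operations only.

    The raw product carries 64 + 63 fractional bits, so shifting right
    by 63 brings the result back to the Q63.64 format.
    """
    return (a * w) >> FRAC_TRIG

a = to_fixed(1234.5, FRAC_DATA)
w = to_fixed(math.cos(math.pi / 8), FRAC_TRIG)
p = mul_by_trig(a, w)
print(to_float(p, FRAC_DATA), 1234.5 * math.cos(math.pi / 8))
```

The two printed values should agree to many decimal places. Floats appear only to build the inputs and to display the result.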
____________
My lucky number is 75898^524288+1 | |
|
Rafael Volunteer tester
 Send message
Joined: 22 Oct 14 Posts: 909 ID: 370496 Credit: 531,793,905 RAC: 407,603
                        
|
Well, I guess what I'm gonna say was to be expected, but regardless, it seems to not work on a Radeon 4350 with a Pentium E2180.
Not expected, actually.
This is probably fixable; the requirements for OCL2 are less than for OCL. Double precision hardware isn't required for OCL2, so OCL2 can run on a wider variety of GPUs than can OCL or CUDA.
The dream lives on, I suppose...
I'll be happy to give another go if the app changes, or tinker with any settings that might fix it... if anyone knows of it. | |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
Resuming from a checkpoint is bugged... | |
|
pschoefer Volunteer developer Volunteer tester
 Send message
Joined: 20 Sep 05 Posts: 685 ID: 845 Credit: 2,886,414,412 RAC: 77,022
                              
|
AMD R9-280X, Windows 7 x64, Catalyst 15.7.1
Running benchmarks for transform implementation "OCL"
Running on platform 'AMD Accelerated Parallel Processing', device 'Tahiti', version 'OpenCL 1.2 AMD-APP (1800.8)' and driver '1800.8 (VM)'.
2199064^8192+1 Time: 44.3 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 47.7 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 53.5 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 77.4 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 122 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 282 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 465 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 897 us/mul. Err: 0.1758 6009544 digits
440400^2097152+1 Time: 1.73 ms/mul. Err: 0.1680 11836006 digits
360204^4194304+1 Time: 3.42 ms/mul. Err: 0.1563 23305854 digits
294612^8388608+1 Time: 7 ms/mul. Err: 0.1719 45879398 digits
Genefer Mark = 118.
Running benchmarks for transform implementation "OCL2"
Running on platform 'AMD Accelerated Parallel Processing', device 'Tahiti', version 'OpenCL 1.2 AMD-APP (1800.8)' and driver '1800.8 (VM)'.
2199064^8192+1 Time: 441 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 507 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 862 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 1.49 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 2.57 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 4.46 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 9.38 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 19.4 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 27 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 54.9 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 128 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 7.
geneferocl2_windows.exe -q "3149688^32768+1"
3149688^32768+1 is a probable prime. (212936 digits) (err = 0.0005) (time = 0:10:27) | |
|
Rafael Volunteer tester
 Send message
Joined: 22 Oct 14 Posts: 909 ID: 370496 Credit: 531,793,905 RAC: 407,603
                        
|
A6 3500, 2x8GB DDR3, Gtx 970, Driver 355.60.
Just because "why not?", I included benchmarks with the GPU at stock (1329/3005 MHz core/mem) and overclocked to 1501/3505 MHz, if these are of any interest.
I couldn't test genefercuda, as I'm getting an error message when running it: "The program can't start because cudart32_55.dll is missing from your computer". Any solution for that?
Also, my x87 failed when testing 3149688^32768+1 (it did fine on benchmarking, though).
EDIT: By coincidence, I found out that I have a 2nd computer, which also has a Radeon HD 4350. Just as with my previous message, it showed the same error and was unable to run the test.
OCL2 (Stock)
Running benchmarks for transform implementation "OCL2"
2199064^8192+1 Time: 194 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 207 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 237 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 439 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.08 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.83 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.74 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 7.46 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 15.6 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 33.2 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 74 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 13.
OCL2 (OCed)
Running benchmarks for transform implementation "OCL2"
2199064^8192+1 Time: 181 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 182 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 213 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 390 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 962 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.63 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.29 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 6.62 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 14.1 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 28.6 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 65.5 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 14.
OCL (Stock)
Running benchmarks for transform implementation "OCL"
2199064^8192+1 Time: 54.4 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 62.9 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 81.2 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 116 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 201 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 387 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 757 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.51 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 2.94 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 6.14 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 12.3 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 68.
OCL (OCed)
Running benchmarks for transform implementation "OCL"
2199064^8192+1 Time: 53.5 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 58.8 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 73.9 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 103 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 181 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 343 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 676 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.33 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 2.63 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 5.45 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 11 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 76.
CPU
Running benchmarks for transform implementation "Default"
6008024^256+1 Time: 4.61 us/mul. Err: 0.1562 1736 digits
4913974^512+1 Time: 9.65 us/mul. Err: 0.1406 3427 digits
4019150^1024+1 Time: 20.3 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 42.4 us/mul. Err: 0.1719 13347 digits
2688666^4096+1 Time: 91.6 us/mul. Err: 0.1875 26336 digits
2199064^8192+1 Time: 194 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 464 us/mul. Err: 0.2188 102481 digits
1471094^32768+1 Time: 987 us/mul. Err: 0.2031 202102 digits
1203210^65536+1 Time: 2.28 ms/mul. Err: 0.2188 398482 digits
984108^131072+1 Time: 4.87 ms/mul. Err: 0.2031 785521 digits
804904^262144+1 Time: 11.2 ms/mul. Err: 0.2031 1548156 digits
658332^524288+1 Time: 24.4 ms/mul. Err: 0.2031 3050541 digits
538452^1048576+1 Time: 54 ms/mul. Err: 0.1914 6009544 digits
440400^2097152+1 Time: 114 ms/mul. Err: 0.1738 11836006 digits
360204^4194304+1 Time: 247 ms/mul. Err: 0.1875 23305854 digits
SSE2
Running benchmarks for transform implementation "SSE2"
6008024^256+1 Time: 2.67 us/mul. Err: 0.1250 1736 digits
4913974^512+1 Time: 5.66 us/mul. Err: 0.1406 3427 digits
4019150^1024+1 Time: 11.4 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 23.4 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 48.8 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 101 us/mul. Err: 0.1719 51956 digits
1798620^16384+1 Time: 216 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 447 us/mul. Err: 0.1719 202102 digits
1203210^65536+1 Time: 1.33 ms/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 2.79 ms/mul. Err: 0.1641 785521 digits
804904^262144+1 Time: 7.68 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 15.4 ms/mul. Err: 0.1484 3050541 digits
538452^1048576+1 Time: 40.5 ms/mul. Err: 0.1562 6009544 digits
440400^2097152+1 Time: 81.1 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 203 ms/mul. Err: 0.1328 23305854 digits
294612^8388608+1 Time: 403 ms/mul. Err: 0.1406 45879398 digits
Genefer Mark = 2.
x87
Running benchmarks for transform implementation "x87 (80-bit)"
6008024^256+1 Time: 12.8 us/mul. Err: 0.0001 1736 digits
4913974^512+1 Time: 28.4 us/mul. Err: 0.0001 3427 digits
4019150^1024+1 Time: 61.7 us/mul. Err: 0.0001 6763 digits
3287270^2048+1 Time: 132 us/mul. Err: 0.0001 13347 digits
2688666^4096+1 Time: 288 us/mul. Err: 0.0001 26336 digits
2199064^8192+1 Time: 737 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 1.57 ms/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 3.86 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 8.19 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 19.5 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 41.2 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 93.2 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 194 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 432 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 894 ms/mul. Err: 0.0001 23305854 digits
Testing 3149688^32768+1...
OCL / CPU / SSE2 / x87
maxErr exceeded for 3149688^32768+1, 0.5000 > 0.4500
OCL2 (Stock)
3149688^32768+1 is a probable prime. (212936 digits) (err = 0.0005) (time = 0:02:45) 01:52:36
OCL2 (OCed)
3149688^32768+1 is a probable prime. (212936 digits) (err = 0.0005) (time = 0:02:49) 01:40:51 | |
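The "maxErr exceeded ... 0.5000 > 0.4500" message comes from a round-off check. The thread does not define "Err", so the sketch below assumes the usual meaning in floating-point transform multiplication: the largest distance between a computed value and the nearest integer. The threshold 0.45 is taken from the log line. The function name and the sample values are made up for illustration.

```python
MAX_ERR = 0.45  # threshold taken from the log line "0.5000 > 0.4500"

def round_with_check(values, max_err=MAX_ERR):
    """Round each value to the nearest integer and track the worst round-off error."""
    worst = 0.0
    rounded = []
    for x in values:
        r = round(x)
        worst = max(worst, abs(x - r))  # distance to the nearest integer
        rounded.append(r)
    if worst > max_err:
        # An error near 0.5 means the correct integer can no longer be identified.
        raise ArithmeticError(f"maxErr exceeded, {worst:.4f} > {max_err:.4f}")
    return rounded, worst

ints, err = round_with_check([3.0001, 41.9998, 7.1875])
print(ints, err)
```

Under this reading, an error of 0.0005 (OCL2) leaves a wide margin, while 0.5000 means the result cannot be trusted at all.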
|
JimB Honorary cruncher Send message
Joined: 4 Aug 11 Posts: 918 ID: 107307 Credit: 977,945,376 RAC: 0
                     
|
I couldn't test genefercuda, as I'm getting an error message when running it: "The program can't start because cudart32_55.dll is missing from your computer". Any solution for that?
cudart32_55.dll
cufft32_55.dll
In answer to the question below, put them in the same directory as the genefercuda program. | |
|
Rafael Volunteer tester
 Send message
Joined: 22 Oct 14 Posts: 909 ID: 370496 Credit: 531,793,905 RAC: 407,603
                        
|
I couldn't test genefercuda, as I'm getting an error message when running it: "The program can't start because cudart32_55.dll is missing from your computer". Any solution for that?
cudart32_55.dll
Forgot to mention, the cufft32_55.dll is missing as well.
Also, where am I supposed to put such files? | |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
A new version of "geneferocl2" is available from Genefer repository.
It performs about 50% faster than the previous version on a GTX 680.
It was tested on a GTX 275 with NVIDIA driver 320.49.
The resume-from-checkpoint bug is fixed.
The b limits were extended: they are similar to genefer80 limits:
Generalized Fermat Number b limits for transform implementation "OCL2"
m = 8192, maxErr(b = 125.40M) = 0.2497, maxErr(b = 125.41M) = 0.2752, maxErr(b = 157.76M) = 0.3984, maxErr(b = 157.77M) = 0.4232
m = 16384, maxErr(b = 102.36M) = 0.2638, maxErr(b = 102.37M) = 0.2789, maxErr(b = 124.64M) = 0.3669, maxErr(b = 124.65M) = 0.4137
m = 32768, maxErr(b = 81.66M) = 0.2567, maxErr(b = 81.67M) = 0.2855, maxErr(b = 95.51M) = 0.3840, maxErr(b = 95.52M) = 0.4312
m = 65536, maxErr(b = 64.00M) = 0.2287, maxErr(b = 64.01M) = 0.2885, maxErr(b = 81.66M) = 0.3758, maxErr(b = 81.67M) = 0.4217
m = 131072, maxErr(b = 51.03M) = 0.2489, maxErr(b = 51.04M) = 0.2810, maxErr(b = 60.42M) = 0.3172, maxErr(b = 60.43M) = 0.4365
m = 262144, maxErr(b = 45.07M) = 0.2607, maxErr(b = 45.08M) = 0.2757, maxErr(b = 50.78M) = 0.3181, maxErr(b = 50.79M) = 0.4282
m = 524288, maxErr(b = 34.86M) = 0.2320, maxErr(b = 34.87M) = 0.2717, maxErr(b = 41.34M) = 0.3326, maxErr(b = 41.35M) = 0.4123
m = 1048576, maxErr(b = 28.61M) = 0.2427, maxErr(b = 28.62M) = 0.3263, maxErr(b = 35.01M) = 0.3796, maxErr(b = 35.02M) = 0.4345
m = 2097152, maxErr(b = 23.82M) = 0.2611, maxErr(b = 23.83M) = 0.2742, maxErr(b = 28.30M) = 0.3888, maxErr(b = 28.31M) = 0.4547
m = 4194304, maxErr(b = 19.38M) = 0.2482, maxErr(b = 19.39M) = 0.2926, maxErr(b = 22.46M) = 0.3668, maxErr(b = 22.47M) = 0.4274
m = 8388608, maxErr(b = 16.03M) = 0.2524, maxErr(b = 16.04M) = 0.3122, maxErr(b = 19.38M) = 0.3738, maxErr(b = 19.39M) = 0.4348
The limit check was rewritten: a pessimistic and an optimistic limit are printed. A quick test cannot be more accurate (I think).
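For anyone who wants to tabulate these limits, here is a small parsing sketch. The helper name `parse_limit_line` is hypothetical. Reading the first pair of values as the pessimistic bracket and the second pair as the optimistic one is an assumption based on the sentence above, since the output itself does not label them.

```python
import re

# One "maxErr(b = 125.40M) = 0.2497" entry: b in millions, then the measured error.
PAIR = re.compile(r"maxErr\(b = ([\d.]+)M\) = ([\d.]+)")

def parse_limit_line(line):
    """Return (m, [(b_in_millions, max_err), ...]) for one line of the limits table."""
    m = int(re.search(r"m = (\d+)", line).group(1))
    pairs = [(float(b), float(err)) for b, err in PAIR.findall(line)]
    return m, pairs

line = ("m = 8192, maxErr(b = 125.40M) = 0.2497, maxErr(b = 125.41M) = 0.2752, "
        "maxErr(b = 157.76M) = 0.3984, maxErr(b = 157.77M) = 0.4232")
m, pairs = parse_limit_line(line)

# Assumption: entries 0-1 bracket the pessimistic limit, entries 2-3 the optimistic one.
pessimistic, optimistic = pairs[0][0], pairs[2][0]
print(f"m = {m}: pessimistic b limit ~{pessimistic}M, optimistic ~{optimistic}M")
```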
Note that inline assembler is necessary to get a performance boost because OpenCL has no "add with carry" instruction. It was done on NVIDIA but not on ATI (I don't know if it is possible and have no ATI card).
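To show what the missing "add with carry" costs: without a carry flag, portable code has to recover the carry with a comparison after each wrapping addition. The sketch below emulates that with 64-bit masks. It is an illustration only, not the code used in geneferocl2, and the function names are hypothetical. On NVIDIA, PTX provides carry instructions such as add.cc and addc, which is presumably what the inline assembler uses.

```python
MASK64 = (1 << 64) - 1

def add64(a, b):
    """Wrapping 64-bit addition, as an unsigned 64-bit integer type performs it."""
    return (a + b) & MASK64

def add_with_carry(a, b, carry_in=0):
    """Add two 64-bit words plus a carry bit; return (sum, carry_out).

    The carry is recovered by comparison: if the wrapped sum is smaller
    than an operand, the addition overflowed.
    """
    s = add64(a, b)
    c1 = 1 if s < a else 0
    t = add64(s, carry_in)
    c2 = 1 if t < s else 0
    return t, c1 | c2  # at most one of the two steps can overflow

def add128(x, y):
    """Add two 128-bit numbers given as (low, high) pairs of 64-bit words."""
    lo, carry = add_with_carry(x[0], y[0])
    hi, _ = add_with_carry(x[1], y[1], carry)
    return lo, hi

print(add128((MASK64, 5), (1, 7)))
```

Each emulated carry needs extra compare and select operations, which a hardware carry flag avoids.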
Yves | |
|
Rafael Volunteer tester
 Send message
Joined: 22 Oct 14 Posts: 909 ID: 370496 Credit: 531,793,905 RAC: 407,603
                        
|
A new version of "geneferocl2" is available from Genefer repository.
(...)
Yves
Results for my Gtx 970 will have to wait, as I'll only get home by Friday.
In the meantime, my Radeon HD 4350 still doesn't work. Though this time around, it gave off A BUNCH of info. Maybe this will be of use to you.
All command lines give out the following results, except for -l, which also adds the line "An error (2989) occurred."
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"
Running on platform 'AMD Accelerated Parallel Processing', device 'ATI RV710', version 'OpenCL 1.0 AMD-APP (937.2)' and driver 'CAL 1.4.1734'.
Error: build program failed.
"C:\Users\Paulo\AppData\Local\Temp\OCL8383.tmp.cl", line 687: warning: unknown
attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(8 / 4 * BLK8, 1, 1)))
^
"C:\Users\Paulo\AppData\Local\Temp\OCL8383.tmp.cl", line 712: warning: unknown
attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(16 / 4 * BLK16, 1, 1)))
^
"C:\Users\Paulo\AppData\Local\Temp\OCL8383.tmp.cl", line 737: warning: unknown
attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(32 / 4 * BLK32, 1, 1)))
^
"C:\Users\Paulo\AppData\Local\Temp\OCL8383.tmp.cl", line 766: warning: unknown
attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(64 / 4 * BLK64, 1, 1)))
^
"C:\Users\Paulo\AppData\Local\Temp\OCL8383.tmp.cl", line 795: warning: unknown
attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(128 / 4 * BLK128, 1, 1)))
^
"C:\Users\Paulo\AppData\Local\Temp\OCL8383.tmp.cl", line 828: warning: unknown
attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(256 / 4 * BLK256, 1, 1)))
^
Error: Requested compile size is bigger than the required workgroup size of 32 elements
Error: Creating kernel Forward64 failed!
Error: OpenCL error detected: CL_BUILD_PROGRAM_FAILURE. | |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
                               
|
Just so everyone knows, behind the scenes there's been a lot of discussion and work on what happens next with GFN. OCL2 will probably play a part in that. We're not ready to discuss anything yet, but I want you to know we're not ignoring the OCL2 discussion.
____________
My lucky number is 75898^524288+1 | |
|
|
A quick benchmark comparison -- driver 352.78
GTX970 @ 1540MHz failed err rate (version 2 OCL2):
2199064^8192+1 Time: 157 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 163 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 176 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 331 us/mul. Err: 0.5000 398482 digits
984108^131072+1 Time: 721 us/mul. Err: 0.5000 785521 digits
804904^262144+1 Time: 1.48 ms/mul. Err: 0.5000 1548156 digits
658332^524288+1 Time: 3.04 ms/mul. Err: 0.5000 3050541 digits
538452^1048576+1 Time: 6.14 ms/mul. Err: 0.5000 6009544 digits
440400^2097152+1 Time: 13.1 ms/mul. Err: 0.5000 11836006 digits
360204^4194304+1 Time: 26.2 ms/mul. Err: 0.5000 23305854 digits
294612^8388608+1 Time: 55.3 ms/mul. Err: 0.5000 45879398 digits
Genefer Mark = 16.
(ver.2 OCL2) GTX970 @ 1501MHz
2199064^8192+1 Time: 161 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 164 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 179 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 339 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 735 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.5 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.09 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 6.23 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 13.2 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 26.4 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 55.7 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 16.
(ver.2 OCL2) GTX750 @ 1412MHz:
2199064^8192+1 Time: 139 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 240 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 372 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 696 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.54 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 2.99 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 5.96 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 12.2 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 25.4 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 53.5 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 111 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 8.
(ver.1 OCL2) GTX750 @ 1412MHz:
2199064^8192+1 Time: 163 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 281 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 511 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 856 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 2.3 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 3.75 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 8.65 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 15.4 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 37.7 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 65.7 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 260 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 5.
(ver.1 OCL2) GTX970 @ 1540MHz:
2199064^8192+1 Time: 174 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 181 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 288 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 385 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 943 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.62 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.52 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 6.58 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 14.9 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 28.4 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 64.1 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 14.
A new version of "geneferocl2" is available from Genefer repository.
It performs about 50% faster than the previous version on a GTX 680.
Nvidia's Maxwell cards (C.C. 5.0 and 5.2) also improved their Genefer Mark with the new OCL2 version.
I'm curious to know if AMD's 3 GCN revisions reveal any architectural differences for fixed-point arithmetic: Tahiti GCN1.0 vs. Hawaii GCN1.1 vs. Tonga (Fiji) GCN1.2. | |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2382 ID: 1178 Credit: 17,991,226,846 RAC: 12,693,967
                                                
|
New version runs about 35% faster on GTX 660-OEM (347.52 driver) and GTX 645 (353.62 driver) cards.
| |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
                               
|
GTX 580:
Old version of OCL2:
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 580', version 'OpenCL 1.1 CUDA' and driver '355.60'.
2199064^8192+1 Time: 175 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 204 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 252 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 517 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.06 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.98 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 4.01 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.31 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 16.1 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 37.5 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 79.9 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 12.
New OCL2:
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 580', version 'OpenCL 1.1 CUDA' and driver '355.60'.
2199064^8192+1 Time: 111 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 128 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 161 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 366 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 726 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.36 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 2.79 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 5.61 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 11.5 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 24.2 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 52.2 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 17.
Regular OCL:
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 580', version 'OpenCL 1.1 CUDA' and driver '355.60'.
2199064^8192+1 Time: 25.6 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 31.8 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 44.9 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 82.8 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 176 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 361 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 662 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.34 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 2.69 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 5.7 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 12.5 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 73.
It's now roughly one quarter of the speed of OCL, with a greatly expanded b range. This is very exciting!
One of the things that we've been talking about is restarting and extending the n=17 search with OCL2, or using a combined OCL/OCL2 that can switch when needed, the way the CPU app can switch transforms.
____________
My lucky number is 75898^524288+1 | |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
One of the things that we've been talking about is restarting and extending the n=17 search with OCL2,
... starting at 42,598,524.
geneferocl2_windows.exe -q "42598524^131072+1"
Using OCL2 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 680', version 'OpenCL 1.2 CUDA' and driver '353.30'.
42598524^131072+1 is a probable composite. (RES=3661fa7c70613e8c) (1000001 digits) (err = 0.2120) (time = 0:53:15) | |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
One of the things that we've been talking about is restarting and extending the n=17 search with OCL2,
... starting at 42,598,524.
To be precise, 42,597,774^131072+1 is the smallest mega-number for n=17:
42597774^131072+1 is composite. (RES=3304d8cf4acbc370) (1000000 digits) (err = 0.2190) (time = 0:53:30)
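The digit counts quoted in these results follow from a logarithm. The sketch below uses floating-point log10, which is an approximation: it is reliable here because the fractional parts are not extremely close to an integer, but an exact integer comparison would be needed in a borderline case. The function name `gfn_digits` is hypothetical.

```python
import math

def gfn_digits(b, n):
    """Approximate number of decimal digits of b**n + 1.

    b**n has floor(n * log10(b)) + 1 digits; adding 1 does not change
    the count for the inputs used here.
    """
    return math.floor(n * math.log10(b)) + 1

for b in (42597773, 42597774, 42598524):
    print(b, gfn_digits(b, 131072))
```

Each extra digit at n=17 corresponds to roughly 750 consecutive values of b, which is why 42,598,524 already has 1,000,001 digits.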
| |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
I just checked the largest 'b' found with genefer80. geneferocl2 successfully passed the test.
Using OCL2 transform
140000374^2048+1 is a probable prime. (16684 digits) (err = 0.1968) (time = 0:00:05)
103109922^4096+1 is a probable prime. (32823 digits) (err = 0.1438) (time = 0:00:12)
100219912^8192+1 is a probable prime. (65544 digits) (err = 0.2099) (time = 0:00:25)
99941872^16384+1 is a probable prime. (131068 digits) (err = 0.3173) (time = 0:01:03)
15547296^32768+1 is a probable prime. (235657 digits) (err = 0.0121) (time = 0:03:31)
19502212^65536+1 is a probable prime. (477763 digits) (err = 0.0290) (time = 0:14:01)
Using x87 (80-bit) transform
140000374^2048+1 is a probable prime. (16684 digits) (err = 0.2188) (time = 0:00:05)
103109922^4096+1 is a probable prime. (32823 digits) (err = 0.1875) (time = 0:00:23)
100219912^8192+1 is a probable prime. (65544 digits) (err = 0.2500) (time = 0:01:32)
99941872^16384+1 is a probable prime. (131068 digits) (err = 0.3750) (time = 0:06:59)
15547296^32768+1 is a probable prime. (235657 digits) (err = 0.0137) (time = 0:25:01)
19502212^65536+1 is a probable prime. (477763 digits) (err = 0.0312) (time = 1:51:12)
| |
|
Roger Volunteer developer Volunteer tester
 Send message
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
                    
|
From pschoefer's test on AMD GPU, the OCL2 runtime > 15x OCL !! (n > 14).
Needs optimisation work. Intriguing though. | |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
From pschoefer's test on AMD GPU, the OCL2 runtime > 15x OCL !! (n > 14).
Needs optimisation work. Intriguing though.
On GeForce GTX 680, the OCL2 runtime ~ 4.5x OCL.
But on GTX 680, FP64 = 1/24 FP32 and on Radeon HD 79x0, FP64 = 1/4 FP32.
And inline assembler is used on NVIDIA in the latest version.
Radeon HD 79x0 cards are as fast as a Titan for GFN WR (FP64) but only as fast as a GTX 960 for PPS Sieve (FP32).
Note that pschoefer's test was done with the previous version... OCL2 may be a bit faster on ATI now ?? | |
|
pschoefer Volunteer developer Volunteer tester
 Send message
Joined: 20 Sep 05 Posts: 685 ID: 845 Credit: 2,886,414,412 RAC: 77,022
                              
|
Note that pschoefer's test was done with the previous version... OCL2 may be a bit faster on ATI now ??
It is:
Running benchmarks for transform implementation "OCL2"
Running on platform 'AMD Accelerated Parallel Processing', device 'Tahiti', version 'OpenCL 1.2 AMD-APP (1800.8)' and driver '1800.8 (VM)'.
2199064^8192+1 Time: 388 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 442 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 749 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 1.09 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 2.14 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 4.12 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 8.93 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 18.4 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 24.7 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 37.8 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 111 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 8.
geneferocl2_windows.exe -q "3149688^32768+1"
3149688^32768+1 is a probable prime. (212936 digits) (err = 0.0005) (time = 0:08:58)
____________
| |
|
|
Update on the Mac front - it appears that the app is not completely broken, as it works just fine on my integrated Intel Iris Pro GPU:
Running benchmarks for transform implementation "OCL2"
Running on platform 'Apple', device 'Iris Pro', version 'OpenCL 1.2 ' and driver '1.2(Jul 29 2015 02:40:39)'.
2199064^8192+1 Time: 418 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 681 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.36 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.08 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 4.49 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 10.5 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 20.9 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 49.2 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 106 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 157 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 331 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 2.
However, on my AMD FirePro 700, the OpenCL compiler fails with an internal error, and I haven't pinpointed yet exactly what causes it. I also have a couple of older Nvidia GPUs that I will try later.
I added a binary to the SVN if anyone is interested in testing: https://www.assembla.com/spaces/genefer/subversion/source/HEAD/trunk/bin/mac/geneferocl2_macintel
Cheers
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
I also have a couple of older Nvidia GPUs that I will try later.
Well, the OpenCL code compiles and runs successfully but it's very slow (7 ms/mul for 2199064^8192+1), and locked up my machine…
Does anyone have a more modern card that they want to try it on?
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
I also have a couple of older Nvidia GPUs that I will try later.
Well, the OpenCL code compiles and runs successfully but it's very slow (7 ms/mul for 2199064^8192+1), and locked up my machine…
Does anyone have a more modern card that they want to try it on?
- Iain
GT650m @ 835MHz (driver 353.51/branch r352/OpenCL 1.2)
2199064^8192+1 Time: 311 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 567 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.17 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.29 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 4.62 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 9.92 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 20.1 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 42.6 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 90.1 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 184 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 562 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 2.
C.C 3.0 (384 Kepler cores) is 4x slower than C.C 5.0 (512-core GTX750) and has 1/8 the OCL2 performance of a C.C 5.2 (1664-core GTX970). See posts below. | |
|
|
Does anyone have a more modern card that they want to try it on?
- Iain
GT650m @ 835MHz (driver 353.51/branch r352/OpenCL 1.2)
C.C. 3.0 (384 Kepler cores): 4x slower than C.C. 5.0 (512-core GTX750) and one eighth of the OCL2 performance of C.C. 5.2 (1664-core GTX970). See posts below.
I meant Nvidia on a Mac ;)
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420
|
A new version of "geneferocl2" is available from Genefer repository.
It performs faster than the previous version on NVIDIA GPU.
Some benches:
GTX 680
2199064^8192+1 Time: 86.7 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 108 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 209 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 416 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 755 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.44 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.02 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 6.34 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 13.8 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 27.7 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 56 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 15.
GTX 780Ti
2199064^8192+1 Time: 120 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 117 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 163 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 247 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 462 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 805 us/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 1.54 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 3.08 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 6.81 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 13.5 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 27.4 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 31.
Note that for N=524288, GTX 780Ti is twice as fast as GTX 680.
But for N=65536 and 131072, the ratio is 1.6/1.7 and it's even worse for N < 65536.
2880 cores are "too many" if N < 524288.
Yves | |
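The ratios quoted in the post above can be recomputed from the two tables. A minimal sketch in Python, using only timings copied from the GTX 680 and GTX 780Ti benches (the dictionary and function names are illustrative, not part of Genefer):

```python
# Time per multiplication in microseconds, copied from the two benches above.
gtx680 = {8192: 86.7, 16384: 108, 32768: 209, 65536: 416, 131072: 755, 524288: 3020}
gtx780ti = {8192: 120, 16384: 117, 32768: 163, 65536: 247, 131072: 462, 524288: 1540}


def speedup(n):
    """Return how many times faster the GTX 780Ti is than the GTX 680 at size n."""
    return gtx680[n] / gtx780ti[n]


for n in sorted(gtx680):
    print(f"N = {n:>6}: {speedup(n):.2f}x")
```

A ratio below 1 at the smallest sizes means the card with more cores is actually the slower one there.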
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2382 ID: 1178 Credit: 17,991,226,846 RAC: 12,693,967
|
New release bench:
GTX 660 OEM
2199064^8192+1 Time: 110 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 148 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 318 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 551 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.03 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 2.04 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 4.22 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.87 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 19.3 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 38.6 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 77 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 11.
This is about 1/3 the speed of the regular OCL app on this card...Nice improvement!
EDIT:
Genefer Mark Comparison
OCL2 ver.1 = 6
OCL2 ver.2 = 9
OCL2 ver.3 = 11
OCL = 34
| |
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,017,981,497 RAC: 1,586,802
|
On a side note - sold GTX 580 couple days ago, now in process of selling HD 7950.
(there is still R280X in my home computer).
Thinking about Fury Nano that should be introduced later this week.
4096 shaders? Guess it would need Skylake to feed this GPU...and working on that as well :-)
Do we have any tests on AMD Fury series with HBM?
____________
My stats | |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420
|
Maxwell has 2/3 of the performance of Kepler.
GTX 980
2199064^8192+1 Time: 150 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 158 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 168 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 211 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 621 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.23 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 2.53 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 5.3 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 10.5 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 22.9 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 47 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 19.
Genefer Mark was expected to be 4612/5046 * 31 = 28 (compared to GTX 780Ti).
Why? | |
|
|
GTX750 @ 1412MHz:
2199064^8192+1 Time: 114 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 177 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 263 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 451 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.99 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.99 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.06 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 16.9 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 34.7 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 72.2 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 12.
GTX750 (ver.2) OCL2 mark = 8.
GTX970 @ 1501MHz:
2199064^8192+1 Time: 127 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 129 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 147 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 228 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 516 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 981 us/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 2.14 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 4.27 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 8.89 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 17.7 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 38.5 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 23.
(OCL2 ver.3) improvement for a GTX970: 23 vs. 16 mark. Excellent work Yves!
| |
|
Rafael Volunteer tester
Joined: 22 Oct 14 Posts: 909 ID: 370496 Credit: 531,793,905 RAC: 407,603
|
Radeon HD 4350 still fails, same message as my last post.
Once again, I'll do GTX 970 tests on Friday... if I remember to do it, as I forgot about it last week. If there's any particular test you guys want, just hit me up and I'll try doing it. | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
|
Latest OCL2 on GTX 580:
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 580', version 'OpenCL 1.1 CUDA' and driver '355.60'.
2199064^8192+1 Time: 86.4 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 103 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 134 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 313 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 622 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.18 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 2.36 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 4.82 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 9.99 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 21.5 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 46.6 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 20.
Previous OCL2 had Genefer Mark of 17. OCL has Genefer Mark of 73.
____________
My lucky number is 75898^524288+1 | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
|
Yves, would a fixed-point CPU program (a CPU equivalent of OCL2) be able to use the integer versions of AVX instructions? If so, would such a program be likely to run a lot faster than the x87 transform on CPUs?
____________
My lucky number is 75898^524288+1 | |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2382 ID: 1178 Credit: 17,991,226,846 RAC: 12,693,967
|
FYI...the ver. 3 of OCL2 seems to have reintroduced the failing on older drivers problem. Just tried it on GT 540M with 344.75 drivers and failed with CL_INVALID_COMMAND_QUEUE error. Updated to latest drivers and running bench test with no problems.
| |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420
|
Yves, would a fixed-point CPU program (a CPU equivalent of OCL2) be able to use the integer versions of AVX instructions?
Yes with AVX2 (AVX has no integer instructions, just FP32 and FP64).
If so, would such a program be likely to run a lot faster than the x87 transform on CPUs?
A lot faster no, I think that it would be slower.
AVX2 is eight 32-bit integer units: an int32 add or mul takes 1 cycle and the clock is ~ 4 GHz.
An NVIDIA GPU is n cores: an int32 add takes 1 cycle, a mul takes 4. The clock is about 1 GHz.
So one Intel CPU core is about as fast as 50 NVIDIA GPU cores (allowing for some multiplications at the 4:1 cost).
GeForce 740M (384 cores @ 800 MHz): OCL2 Genefer Mark = 3.
i7-3820 @ 3.6 GHz: genefer80 Genefer Mark = 2 (one core).
On one core of this processor, we can expect that a fixed-point CPU genefer would have Genefer Mark = 0.5. | |
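The estimate above can be retraced with a few lines of arithmetic. This is only a sketch of one possible reading: it assumes throughput scales linearly with lanes, cores and clock, and that the Genefer Mark scales linearly with throughput, neither of which the post states explicitly. The function names are illustrative.

```python
def cpu_core_in_gpu_cores(simd_lanes, cpu_clock_ghz, gpu_clock_ghz, gpu_cycles_per_op):
    """Express one CPU core's int32 throughput as a number of GPU cores."""
    cpu_ops = simd_lanes * cpu_clock_ghz          # 10^9 int32 ops/s, 1 cycle per op
    gpu_ops = gpu_clock_ghz / gpu_cycles_per_op   # 10^9 int32 ops/s for one GPU core
    return cpu_ops / gpu_ops


# Bounds for 8 lanes at 4 GHz against a 1 GHz GPU core:
adds_only = cpu_core_in_gpu_cores(8, 4.0, 1.0, 1)   # every op is an add
muls_only = cpu_core_in_gpu_cores(8, 4.0, 1.0, 4)   # every op is a mul
print(adds_only, muls_only)  # the figure of 50 from the post lies between these


def estimated_cpu_mark(gpu_mark, gpu_cores, gpu_clock_ghz, equivalent_cores=50):
    """Scale a GPU's Genefer Mark down to one CPU core (assumed linear scaling)."""
    gpu_core_ghz = gpu_cores * gpu_clock_ghz      # GPU cores normalised to 1 GHz
    return gpu_mark * equivalent_cores / gpu_core_ghz


# GeForce 740M: 384 cores @ 800 MHz, OCL2 Genefer Mark = 3.
print(round(estimated_cpu_mark(3, 384, 0.8), 2))
```

Under these assumptions the result lands close to the 0.5 quoted for one core.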
|
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2639 ID: 29980 Credit: 568,393,769 RAC: 1,834
|
Do we have any tests on AMD Fury series with HBM?
I have a Fury X due to arrive tomorrow so if there are no surprises I'll try to test it then. | |
|
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2639 ID: 29980 Credit: 568,393,769 RAC: 1,834
|
Ok, joining in with whatever I have that can run it...
GTX 960
2199064^8192+1 Time: 131 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 138 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 222 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 455 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 897 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.8 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.9 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.13 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 17.2 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 33.1 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 72.3 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 12.
GTX 560 Ti
2199064^8192+1 Time: 97.6 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 121 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 243 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 545 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.03 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 2.08 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 4.27 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.92 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 18 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 39.3 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 85.3 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 11.
GeForce 9500 GT
Won't run - card too old? e.g.
Error: build program failed.
ptxas application ptx input, line 115; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 116; error : Instruction 'madc' requires .target sm_20 or higher
R9 280X
2199064^8192+1 Time: 432 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 401 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 820 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 1.1 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.78 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 3.61 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 7.53 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 15.5 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 21 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 33.2 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 95.3 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 9.
HD6850
2199064^8192+1 Time: 310 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 775 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.29 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 726 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.56 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 4.71 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 13 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 36 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 58.9 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 56.6 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 114 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 5.
Just to check, I downloaded tonight. It reports itself as geneferocl2 3.2.9-dev. Is this the v.3 talked about above? | |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420
|
GeForce 9500 GT
Won't run - card too old? e.g.
Error: build program failed.
ptxas application ptx input, line 115; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 116; error : Instruction 'madc' requires .target sm_20 or higher
Yes GeForce 400 series (Fermi) or its successors is required.
I don't know how to detect the microarchitecture with OpenCL.
It would be possible to replace madc with the pair of instructions mul + adc (slower on sm_20 or higher). | |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2382 ID: 1178 Credit: 17,991,226,846 RAC: 12,693,967
|
Yes GeForce 400 series (Fermi) or its successors is required.
Hmmm...I was able to run OCL2 ver.2 on an 8400M GS with driver 285.xx (which is OpenCL 1.0). Is there a specific change in ver.3 that now prevents the older cards from running?
| |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420
|
Yes GeForce 400 series (Fermi) or its successors is required.
Hmmm...I was able to run OCL2 ver.2 on an 8400M GS with driver 285.xx (which is OpenCL 1.0). Is there a specific change in ver.3 that now prevents the older cards from running?
Yes, "multiply and add with carry-in" is a ver.3 optimisation (was not used before).
I don't know how to detect the NVIDIA microarchitecture: if I could, I would use madc on GPU >= Fermi and mul + adc on GPU < Fermi.
The code is faster with madc, so this is the default.
If someone knows how to detect the NVIDIA microarchitecture...
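One possibility, not confirmed anywhere in this thread: NVIDIA's OpenCL implementation advertises the cl_nv_device_attribute_query extension, which lets clGetDeviceInfo return the compute capability through CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV and CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV. Assuming the major number has been obtained that way, the choice between the two code paths could be sketched as follows. The macro name USE_MADC is hypothetical and does not come from the Genefer sources.

```python
FERMI_MAJOR = 2  # sm_20, the minimum target named in the ptxas error above


def use_madc(compute_capability_major):
    """True when the GPU is Fermi or newer and can assemble mad.cc / madc."""
    return compute_capability_major >= FERMI_MAJOR


def kernel_define(compute_capability_major):
    """Build option selecting the code path (USE_MADC is a hypothetical macro)."""
    flag = 1 if use_madc(compute_capability_major) else 0
    return f"-DUSE_MADC={flag}"


print(kernel_define(1))  # pre-Fermi card: fall back to mul + adc
print(kernel_define(3))  # Kepler: keep the faster madc path
```

On non-NVIDIA platforms the extension is absent, so the query itself would need to be guarded by a check of the device's extension string.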
| |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420
|
FYI...the ver. 3 of OCL2 seems to have reintroduced the failing on older drivers problem. Just tried it on GT 540M with 344.75 drivers and failed with CL_INVALID_COMMAND_QUEUE error. Updated to latest drivers and running bench test with no problems.
I'm trying to understand why... | |
|
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2639 ID: 29980 Credit: 568,393,769 RAC: 1,834
|
Fury X
2199064^8192+1 Time: 441 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 466 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 512 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 629 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.24 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 2.25 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 4.54 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.98 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 13 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 16.1 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 55 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 17. | |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420
|
FYI...the ver. 3 of OCL2 seems to have reintroduced the failing on older drivers problem. Just tried it on GT 540M with 344.75 drivers and failed with CL_INVALID_COMMAND_QUEUE error. Updated to latest drivers and running bench test with no problems.
I found it; it's a little bit tricky to fix.
It is a bug in NVIDIA ptxas (ptx optimizing assembler).
It depends on driver version because it was fixed in CUDA 7.0.
Driver 344.75 (or 345.20) = CUDA 6.5 => bug and driver 347.09 = CUDA 7.0 => OK.
The problem occurs when a large number of ptx registers is used... I don't know how to reduce it without slowing the program.
It can be fixed by setting optimizer level to 2 (default is 3).
-cl-nv-opt-level=2 => OK, -cl-nv-opt-level=3 => bug.
To resolve the issue, I have to:
- check that 'platform' is "NVIDIA"
- check that 'driver' < 347
then set optimizer level to 2.
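The check described above can be sketched as a small helper. This is an illustration under stated assumptions, not the code used in geneferocl2: the function name is made up, the platform and driver strings are assumed to look like the ones printed in the benchmark logs of this thread ('NVIDIA CUDA', '344.75'), and an unparseable driver string is assumed to mean "leave the default".

```python
import re


def nvidia_build_options(platform_name, driver_version):
    """Return the extra OpenCL build option that avoids the ptxas bug, or ''."""
    if "NVIDIA" not in platform_name:
        return ""  # other vendors are not affected
    match = re.match(r"\s*(\d+)", driver_version)
    if match is None:
        return ""  # unknown version format: keep the default optimizer level
    driver_major = int(match.group(1))
    # Drivers before 347 ship the CUDA 6.5 ptxas, which has the bug at level 3.
    return "-cl-nv-opt-level=2" if driver_major < 347 else ""


print(nvidia_build_options("NVIDIA CUDA", "344.75"))
print(nvidia_build_options("NVIDIA CUDA", "355.60"))
```

The returned string would be appended to the options passed when building the OpenCL program.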
| |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420
|
A "release candidate" of "geneferocl2" is available from Genefer repository.
On NVIDIA GPU, the bug "failed on driver < 347" is fixed.
On ATI, it may perform a bit faster (the 'generic' code is faster on NVIDIA, it was not tested on ATI).
geneferocl2 is complete: no new features or improvements are planned (bug fixes only if necessary).
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
|
Yves,
Iain is on the road this week, so if there doesn't seem to be much immediate action on OCL2, that's why.
Having a release candidate is very exciting news, and it comes at a good time. Thanks!
____________
My lucky number is 75898^524288+1 | |
|
Rafael Volunteer tester
Joined: 22 Oct 14 Posts: 909 ID: 370496 Credit: 531,793,905 RAC: 407,603
|
A "release candidate" of "geneferocl2" is available from Genefer repository.
On NVIDIA GPU, the bug "failed on driver < 347" is fixed.
On ATI, it may perform a bit faster (the 'generic' code is faster on NVIDIA, it was not tested on ATI).
geneferocl2 is complete: no new features or improvements are planned (bug fixes only if necessary).
Still failing on the 4350 (dat OpenCL beta support.....). I'll have to set up a reminder to do the tests on my GTX 970, for I just forgot to do it again. If you guys could post updates on Fridays, that would be very helpful for me :) | |
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,017,981,497 RAC: 1,586,802
|
A "release candidate" of "geneferocl2" is available from Genefer repository.
On ATI, it may perform a bit faster (the 'generic' code is faster on NVIDIA, it was not tested on ATI).
280X, used to get mark=8 using earlier version.
2199064^8192+1 Time: 370 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 405 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 708 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 851 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.56 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 3.05 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 6.46 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 13.3 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 17.9 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 20.3 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 73.7 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 13.
____________
My stats | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
|
Latest OCL2 on GTX 580:
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 580', version 'OpenCL 1.1 CUDA' and driver '355.60'.
2199064^8192+1 Time: 86.7 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 103 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 134 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 313 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 621 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.18 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 2.36 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 4.81 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 9.94 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 21.5 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 46.5 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 20.
Previous version of OCL2:
2199064^8192+1 Time: 86.4 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 103 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 134 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 313 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 622 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.18 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 2.36 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 4.82 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 9.99 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 21.5 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 46.6 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 20.
Essentially identical speed.
____________
My lucky number is 75898^524288+1 | |
|
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2639 ID: 29980 Credit: 568,393,769 RAC: 1,834
|
R9 280X
Running on platform 'AMD Accelerated Parallel Processing', device 'Tahiti', version 'OpenCL 1.2 AMD-APP (1800.8)' and driver '1800.8 (VM)'.
2199064^8192+1 Time: 366 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 399 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 684 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 834 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.51 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 2.97 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 6.31 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 13 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 17.7 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 19.9 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 75 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 13.
Up from 9 previously.
GTX960
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 960', version 'OpenCL 1.2 CUDA' and driver '355.60'.
2199064^8192+1 Time: 125 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 134 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 215 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 447 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 880 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.76 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.82 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 7.94 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 17.2 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 33.1 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 72.6 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 12.
Unchanged score from previous. | |
|
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2639 ID: 29980 Credit: 568,393,769 RAC: 1,834
|
GTX980Ti
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 980 Ti', version 'OpenCL 1.2 CUDA' and driver '355.82'.
2199064^8192+1 Time: 151 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 158 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 172 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 208 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 429 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 786 us/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 1.56 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 3.11 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 6.25 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 12.8 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 28.3 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 32.
| |
|
Rafael Volunteer tester
Joined: 22 Oct 14 Posts: 909 ID: 370496 Credit: 531,793,905 RAC: 407,603
|
Gtx 970 (1499mhz):
Command line: geneferocl2_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2 CUDA' and driver '355.60'.
2199064^8192+1 Time: 121 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 128 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 146 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 243 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 514 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 969 us/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 2.12 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 4.24 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 8.44 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 17.6 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 38.7 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 23.
Priority change succeeded. | |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420
|
Another new Genefer application "geneferocl3" is available from Genefer repository (http://www.primegrid.com/forum_thread.php?id=6359).
A 32-bit binary for windows is available (compiled with Visual Studio 2013).
geneferocl3 uses a Number Theoretic Transform. The finite field is Z/p with p = 2^64 - 2^32 + 1.
Then maxErr = 0 and the limits can be computed using the relation p / 2 > m . b^2:
m = 1024, bMax = 94906266.
m = 2048, bMax = 67108864.
m = 4096, bMax = 47453133.
m = 8192, bMax = 33554432.
m = 16384, bMax = 23726566.
m = 32768, bMax = 16777216.
m = 65536, bMax = 11863283.
m = 131072, bMax = 8388608.
m = 262144, bMax = 5931642.
m = 524288, bMax = 4194304.
m = 1048576, bMax = 2965821.
m = 2097152, bMax = 2097152.
m = 4194304, bMax = 1482910.
m = 8388608, bMax = 1048576.
They are larger than geneferocl bounds but smaller than geneferocl2 limits.
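The table above can be reproduced with exact integer arithmetic. A minimal sketch in Python, assuming the size in the relation is the m of the table: the listed values match sqrt(p / (2m)) rounded to the nearest integer, which is an observation from the numbers rather than something stated in the post. Read strictly, the inequality would give a value one lower in some rows (for example m = 2048). The function name b_max is illustrative.

```python
from math import isqrt

P = 2**64 - 2**32 + 1  # the NTT prime quoted above


def b_max(m):
    """Nearest integer to sqrt(P / (2*m)), computed without floating point."""
    # round(sqrt(x)) == (floor(sqrt(4*x)) + 1) // 2, with 4*x = 2*P/m
    return (isqrt(2 * P // m) + 1) // 2


for k in range(10, 24):
    m = 2**k
    print(f"m = {m}, bMax = {b_max(m)}.")
```

Using isqrt keeps the computation exact, which matters here because 64-bit values do not fit in a double without rounding.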
geneferocl2 can run on any GPU supporting 'OpenCL 1.0'.
On a GTX 680, we have:
OCL: Genefer Mark = 66.
OCL2: Genefer Mark = 15.
OCL3: Genefer Mark = 45.
This is a beta version, fresh out of the oven.
| |
|
Tyler Project administrator Volunteer tester
Joined: 4 Dec 12 Posts: 1078 ID: 183129 Credit: 1,376,122,338 RAC: 4,719
|
Stock GTX 760:
OCL:
Command line: geneferocl_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL"
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 760', version 'OpenCL 1.1 CUDA' and driver '347.52'.
2199064^8192+1 Time: 41 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 47.2 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 71.4 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 125 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 243 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 445 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 887 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.79 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 3.67 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 7.71 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 16.7 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 54.
Priority change succeeded.
OCL2:
Command line: geneferocl2_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 760', version 'OpenCL 1.1 CUDA' and driver '347.52'.
2199064^8192+1 Time: 94.5 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 135 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 279 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 500 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 925 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.83 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.79 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 7.97 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 17.3 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 34.9 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 70.6 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 12.
Priority change succeeded.
OCL3:
Command line: geneferocl3_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 760', version 'OpenCL 1.1 CUDA' and driver '347.52'.
2199064^8192+1 Time: 55.4 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 61.7 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 97 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 181 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 354 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 685 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.38 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 2.78 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 5.77 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 11.7 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 24.7 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 35.
Priority change succeeded.
____________
275*2^3585539+1 is prime!!! (1079358 digits)
Proud member of Aggie the Pew
| |
|
Rafael Volunteer tester
Joined: 22 Oct 14 Posts: 909 ID: 370496 Credit: 531,793,905 RAC: 407,603
|
So, I tried it on my GTX 970 (stock 1329 / OC 1499 MHz). The application seems to be COMPLETELY CPU bound. Why do I say that? Because the GPU seemed to do little work during the entire test. One of the CPU cores was always at 100%, but GPU usage was about 1~3% for the first few tests, and then it would just stay at 0% for long periods.
In fact, it would spend so much time without working that the power saving measures (downclock and undervolt) would kick in, putting the card at idle speeds. Then, it would spike to, say, 85%, post the result, then begin the next test, which repeated the pattern.
I don't know if my CPU is too weak (A6-3500, overclocked to 3.2 GHz), but that's what happened. Considering that the score went up from 77 to 85, I suppose OCL3 is already a very good improvement, but I guess there's room for a lot more...
EDIT: Tried it on my trusty Radeon HD 4350. And.... nope, ain't working. See last quote for the error codes.
OCL (1499 mhz)
Command line: geneferocl_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL"
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2 CUDA' and driver '355.82'.
2199064^8192+1 Time: 52.5 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 55.4 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 73.3 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 99.8 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 178 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 343 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 673 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.33 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 2.59 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 5.41 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 10.9 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 77.
OCL3 (1499 MHz)
Command line: geneferocl3_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2 CUDA' and driver '355.82'.
2199064^8192+1 Time: 63.6 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 65.3 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 75.1 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 94 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 181 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 322 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 603 us/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 1.21 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 2.39 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 4.77 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 9.89 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 85.
Priority change succeeded.
Radeon HD 4350 (failed to run).
Running on platform 'AMD Accelerated Parallel Processing', device 'ATI RV710', version 'OpenCL 1.0 AMD-APP (937.2)' and driver 'CAL 1.4.1734'.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
Error: build program failed.
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 216: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((reqd_work_group_size(64 / 4 * BLK, 1, 1))) __attribute__((vec_type_hint(Zp)))
^
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 240: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((reqd_work_group_size(64 / 4 * BLK, 1, 1))) __attribute__((vec_type_hint(Zp)))
^
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 264: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((reqd_work_group_size(64 / 4 * BLK, 1, 1))) __attribute__((vec_type_hint(Zp)))
^
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 290: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((reqd_work_group_size(64 / 4 * BLK, 1, 1))) __attribute__((vec_type_hint(Zp)))
^
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 318: warning:
unknown attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(32 / 4 * BLK32, 1, 1))) __attribute__((vec_type_hint(Zp)))
^
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 318: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((work_group_size_hint(32 / 4 * BLK32, 1, 1))) __attribute__((vec_type_hint(Zp)))
^
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 344: warning:
unknown attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(64 / 4 * BLK64, 1, 1))) __attribute__((vec_type_hint(Zp)))
^
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 344: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((work_group_size_hint(64 / 4 * BLK64, 1, 1))) __attribute__((vec_type_hint(Zp)))
^
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 370: warning:
unknown attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(128 / 4 * BLK128, 1, 1))) __attribute__((vec_type_hint(Zp)))
^
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 370: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((work_group_size_hint(128 / 4 * BLK128, 1, 1))) __attribute__((vec_type_hint(Zp)))
^
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 399: warning:
unknown attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(256 / 4 * BLK256, 1, 1))) __attribute__((vec_type_hint(Zp)))
^
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 399: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((work_group_size_hint(256 / 4 * BLK256, 1, 1))) __attribute__((vec_type_hint(Zp)))
^
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 428: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((reqd_work_group_size(512 / 4, 1, 1))) __attribute__((vec_type_hint(Zp)))
^
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 458: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((reqd_work_group_size(1024 / 4, 1, 1))) __attribute__((vec_type_hint(Zp)))
^
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 490: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((vec_type_hint(Zp)))
^
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 518: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((vec_type_hint(Zp)))
^
Error: Requested compile size is bigger than the required workgroup size of 32 elements
Error: Creating kernel Forward64 failed!
Error: OpenCL error detected: CL_BUILD_PROGRAM_FAILURE. | |
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,017,981,497 RAC: 1,586,802
|
A 32-bit binary for windows is available (compiled with Visual Studio 2013).
c:\temp>geneferocl3_windows.exe
This version of c:\temp\geneferocl3_windows.exe is not compatible with the version of Windows you're running. Check your computer's system information to see whether you need a x86 (32-bit) or x64 (64-bit) version of the program, and then contact the software publisher.
What's the trick to make it run on Win7 x64?
I've got all MS Visual C++ Redistributable installed from version 2005 up to 2013, both x86 and x64
____________
My stats | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
|
GTX 580 Genefer Mark scores:
OCL: 73
OCL2: 20
OCL3: 39
For kicks, I'm running 42598524^131072+1 through OCL 3. The B limit for n=17 is about 8 million and this is 42 million, so it shouldn't work. It's been running several minutes and there's no error message. Should it be detecting a problem with the limit, or does it need to simply reject running any tests above the limit? I'll edit this when it completes.
EDIT: It finished without any error message, but produced the incorrect residue.
EDIT2: Corrected typo -- I meant I was going to test OCL3, not OCL2.
____________
My lucky number is 75898^524288+1 | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
|
What's the trick to make it run on Win7 x64?
I've got all MS Visual C++ Redistributable installed from version 2005 up to 2013, both x86 and x64
Mine just ran without doing anything special on Win7 x64.
I do have MS VS installed, but normally that shouldn't make a difference.
____________
My lucky number is 75898^524288+1 | |
|
Rafael Volunteer tester
Joined: 22 Oct 14 Posts: 909 ID: 370496 Credit: 531,793,905 RAC: 407,603
|
Okay, so looking at the scores, it seems that OCL 3 has lower performance than OCL on the GTX 580 / 680 / 760, BUT has increased performance on a GTX 970. Did I just win the lottery there?
Waiting for eXaPower's results on his 970, for the sake of comparison... | |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
For kicks, I'm running 42598524^131072+1 through OCL 2. The B limit for n=17 is about 8 million and this is 42 million, so it shouldn't work.
The bound is > 42 million:
m = 131072, maxErr(b = 51.03M) = 0.2489, maxErr(b = 51.04M) = 0.2810, maxErr(b = 60.42M) = 0.3172, maxErr(b = 60.43M) = 0.4365
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
|
For kicks, I'm running 42598524^131072+1 through OCL 2. The B limit for n=17 is about 8 million and this is 42 million, so it shouldn't work.
The bound is > 42 million:
m = 131072, maxErr(b = 51.03M) = 0.2489, maxErr(b = 51.04M) = 0.2810, maxErr(b = 60.42M) = 0.3172, maxErr(b = 60.43M) = 0.4365
That was a typo on my part... I meant OCL 3. Sorry!
____________
My lucky number is 75898^524288+1 | |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
Okay, so looking at the scores, it seems that OCL 3 has lower performance than OCL on the GTX 580 / 680 / 760, BUT has increased performance on a GTX 970. Did I just win the lottery there?
580: Fermi, FP64 = 1/8 FP32.
680, 760: Kepler, FP64 = 1/24 FP32.
970: Maxwell, FP64 = 1/32 FP32.
OCL 3 runs on FP32 units and OCL on FP64.
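The effect of these ratios can be read off the Genefer Marks already posted in this thread. The following is only a minimal sketch: the marks are the OCL and OCL3 figures reported above for the GTX 580 and the GTX 970, while the dictionary layout and the function name are illustrative and not part of Genefer.

```python
# Genefer Marks copied from posts in this thread (higher mark = faster).
# The FP64 ratios are the ones quoted for each architecture.
marks = {
    "GTX 580": {"arch": "Fermi", "fp64_ratio": 1 / 8, "OCL": 73, "OCL3": 39},
    "GTX 970": {"arch": "Maxwell", "fp64_ratio": 1 / 32, "OCL": 77, "OCL3": 85},
}


def ocl3_over_ocl(entry):
    """Ratio of the OCL3 mark to the OCL mark for one GPU."""
    return entry["OCL3"] / entry["OCL"]


for gpu, entry in marks.items():
    print(f"{gpu} ({entry['arch']}, FP64 = {entry['fp64_ratio']:.4f} FP32): "
          f"OCL3/OCL = {ocl3_over_ocl(entry):.2f}")
```

On these two cards the ratio is below 1 where FP64 is relatively strong (Fermi) and above 1 where it is weakest (Maxwell), which is the pattern the list above describes.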
| |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
That was a typo on my part... I meant OCL 3. Sorry!
OCL3 does not report an error: maxError stays at 0 and the test runs to completion, but the result is not correct.
In the final version, I'm going to check that p / 2 > n . b^2 during initialisation and set maxError=1 if it does not hold.
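The check p / 2 > n . b^2 can be sketched in a few lines. This is a hedged illustration only: the thread does not say which prime p OCL3 uses, so the 64-bit prime below is an assumption chosen for the example, and the function names are hypothetical.

```python
from math import isqrt

# ASSUMPTION: illustrative 64-bit prime modulus. The thread never states
# which p OCL3 actually uses.
P_ASSUMED = 2**64 - 2**32 + 1


def bound_holds(b: int, n: int, p: int = P_ASSUMED) -> bool:
    """True when p / 2 > n * b^2 (written without division to stay in integers)."""
    return p > 2 * n * b * b


def max_base(n: int, p: int = P_ASSUMED) -> int:
    """Largest b for which bound_holds(b, n, p) is True."""
    return isqrt((p - 1) // (2 * n))


n = 131072  # the transform size discussed above
print(max_base(n))               # largest admissible b under the assumed p
print(bound_holds(42598524, n))  # the b that produced an incorrect residue
```

With this assumed p, the largest admissible b for n = 131072 is just under 2^23, in line with the limit of about 8 million mentioned earlier in the thread, and b = 42598524 fails the check.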
| |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
c:\temp>geneferocl3_windows.exe
This version of c:\temp\geneferocl3_windows.exe is not compatible with the version of Windows you're running. Check your computer's system information to see whether you need a x86 (32-bit) or x64 (64-bit) version of the program, and then contact the software publisher.
What's the trick to make it run on Win7 x64?
This message indicates that the binary was corrupted during download.
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
|
FYI, our current plans are to produce a combined OCL/OCL2/OCL3 genefer app that will automatically switch to the appropriate transform. This will open up all the n ranges to continued GPU crunching.
____________
My lucky number is 75898^524288+1 | |
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,017,981,497 RAC: 1,586,802
|
280X
OCL: 115
OCL2: 13
OCL3: 49
2199064^8192+1 Time: 144 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 153 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 200 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 301 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 407 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 709 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.23 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 2.23 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 4.12 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 8.12 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 16.3 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 49.
____________
My stats | |
|
|
Running on platform 'Apple', device 'ATI Radeon HD - FirePro D700 Compute Engine', version 'OpenCL 1.2 ' and driver '1.2 (Jun 10 2015 16:27:05)'
OCL: Genefer Mark = 36
OCL2: Still broken on Apple/AMD (Apple/Intel works though)
OCL3: Genefer Mark = 26
2199064^8192+1 Time: 269 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 356 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 372 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 403 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 473 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 866 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.91 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 4.43 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 7.92 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 16.2 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 30.1 ms/mul. Err: 0.0000 45879398 digits
Thanks Yves, this looks very interesting indeed!
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
OCL3 benchmark:
GTX 970 @ 1.6GHz:
2199064^8192+1 Time: 71.8 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 66.3 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 70.4 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 91.8 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 179 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 316 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 666 us/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 1.19 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 2.28 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 4.57 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 9.58 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 88.
1540MHz:
2199064^8192+1 Time: 73.4 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 69.4 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 71.5 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 95.3 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 184 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 325 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 669 us/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 1.21 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 2.33 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 4.68 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 9.73 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 86.
GTX 750 @ 1451MHz:
2199064^8192+1 Time: 61.8 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 73.8 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 122 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 206 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 420 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 863 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.83 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 3.58 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 7.28 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 14.8 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 31.1 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 28.
OCL3 benchmark peaks:
970: (231W/85%MCU/60%BUS)
750: (75W/65%MCU/54%BUS)
OCL benchmark wattage peak: 145W (970) and 45W (750).
Ten overclocked benchmark passes on each GPU didn't trigger a max error.
My 970 hit max error above 1540MHz with ver.1 of OCL2, while ver.2/3 tripped max error above 1501MHz.
The regular OCL benchmark hits max error above 1565MHz.
*Edit*: I just read Yves's post stating there is no error check in OCL3.
There is no error, then maxError=0 and it works. But the result is not correct.
In the final version, I'm going to check if p / 2 > n . b^2 during initialisation and then set maxError=1.
970's marks:
88 = OCL3
78 = OCL
23 = OCL2 ver.3
750 marks:
28 = OCL3
25 = OCL
12 = OCL2 ver.3
| |
|
Rafael Volunteer tester
Joined: 22 Oct 14 Posts: 909 ID: 370496 Credit: 531,793,905 RAC: 407,603
|
So, I just remembered that I had a notebook with a G105m on it, so I decided to give it a go:
OCL 1: No OpenCL 1.2 support, so the application just says that there's no capable device.
Command line: geneferocl_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL"
No compatible OpenCL device found.
Device List:
0: GPU device 'GeForce G 105M' on 'NVIDIA CUDA'.
OCL 2: Another fail. Really, wasn't Ver. 2 supposed to work with older hardware, due to lower requirements? At any rate, here are the error codes:
Command line: geneferocl2_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"
Running on platform 'NVIDIA CUDA', device 'GeForce G 105M', version 'OpenCL 1.0 CUDA' and driver '341.81'.
Error: build program failed.
ptxas application ptx input, line 115; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 116; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 117; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 118; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 119; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 120; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 122; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 123; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 124; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 125; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 160; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 161; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 162; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 163; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 164; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 165; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 167; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 168; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 169; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 170; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 219; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 220; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 221; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 222; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 223; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 224; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 226; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 227; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 228; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 229; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 261; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 262; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 263; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 264; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 265; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 266; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 268; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 269; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 270; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 271; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, li
Error: OpenCL error detected: CL_INVALID_BINARY.
OCL 3: Kinda fail...? The program does start, and it actually completes a few numbers. BUT, whenever I get to the 8th test, it ALWAYS fails, the driver just crashes, period. I tried with 341.74 and 341.81 (latest, August 24th release), but nope. Stock speeds, insanely low underclocks, the thing just fails every time at this particular test, whatever the reason may be!
This was done at stock 640/500/1600 MHz on core/mem/shaders
Command line: geneferocl3_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
Running on platform 'NVIDIA CUDA', device 'GeForce G 105M', version 'OpenCL 1.0 CUDA' and driver '341.81'.
2199064^8192+1 Time: 1.54 ms/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 3.46 ms/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 7.64 ms/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 16 ms/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 29.5 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 60.4 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 177 ms/mul. Err: 0.0000 3050541 digits
Error: OpenCL error detected: CL_INVALID_COMMAND_QUEUE.
And this was done on 500/400/1000:
Command line: geneferocl3_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
Running on platform 'NVIDIA CUDA', device 'GeForce G 105M', version 'OpenCL 1.0 CUDA' and driver '341.81'.
2199064^8192+1 Time: 2.35 ms/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 5.18 ms/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 11.4 ms/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 23.8 ms/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 44.7 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 91.3 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 250 ms/mul. Err: 0.0000 3050541 digits
Error: OpenCL error detected: CL_INVALID_COMMAND_QUEUE. | |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2382 ID: 1178 Credit: 17,991,226,846 RAC: 12,693,967
|
I get the same fail pattern with OCL3 on an 8400M GS with 285.xx driver. The tests compute fine up to n=19, but fail at n=20.
| |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
So, I just remembered that I had a notebook with a G105m on it, so I decided to give it a go:
Running benchmarks for transform implementation "OCL3"
Running on platform 'NVIDIA CUDA', device 'GeForce G 105M', version 'OpenCL 1.0 CUDA' and driver '341.81'.
2199064^8192+1 Time: 1.54 ms/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 3.46 ms/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 7.64 ms/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 16 ms/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 29.5 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 60.4 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 177 ms/mul. Err: 0.0000 3050541 digits
Error: OpenCL error detected: CL_INVALID_COMMAND_QUEUE.
A video Timeout Detection and Recovery (TDR) occurred: the kernel ran longer than the Windows display driver timeout, so the driver was reset.
It's really slow, my integrated HD 4000 (i3-3217U CPU @ 1.80GHz) is 8 times faster!
Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4000', version 'OpenCL 1.2 ' and driver '10.18.10.4252'.
Running benchmarks for transform implementation "OCL3"
2199064^8192+1 Time: 451 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 622 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 1.08 ms/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 2.32 ms/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 4.47 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 9.24 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 20 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 41.7 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 85.5 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 183 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 373 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 2.
GeForce G 105M: Genefer Mark = 0.3 ...
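The "8 times faster" figure can be checked against the two listings above. This is a small sketch with the times copied from those listings and converted to microseconds by hand; the variable and function names are illustrative.

```python
# Time per multiplication in microseconds, keyed by the exponent N,
# copied from the G 105M and HD 4000 listings above.
g105m = {8192: 1540, 16384: 3460, 32768: 7640, 65536: 16000,
         131072: 29500, 262144: 60400, 524288: 177000}
hd4000 = {8192: 451, 16384: 622, 32768: 1080, 65536: 2320,
          131072: 4470, 262144: 9240, 524288: 20000}


def speedups(slow, fast):
    """Per-size ratio slow/fast for the sizes both devices completed."""
    return {n: slow[n] / fast[n] for n in slow if n in fast}


for n, ratio in speedups(g105m, hd4000).items():
    print(f"N = {n:>6}: HD 4000 is {ratio:.1f}x faster")
```

The ratio grows with the transform size and is close to the quoted figure only at N = 524288, the largest size the G 105M completed before the driver reset.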
| |
|
Rafael Volunteer tester
Joined: 22 Oct 14 Posts: 909 ID: 370496 Credit: 531,793,905 RAC: 407,603
|
So, I just remembered that I had a notebook with a G105m on it, so I decided to give it a go:
Running benchmarks for transform implementation "OCL3"
Running on platform 'NVIDIA CUDA', device 'GeForce G 105M', version 'OpenCL 1.0 CUDA' and driver '341.81'.
2199064^8192+1 Time: 1.54 ms/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 3.46 ms/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 7.64 ms/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 16 ms/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 29.5 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 60.4 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 177 ms/mul. Err: 0.0000 3050541 digits
Error: OpenCL error detected: CL_INVALID_COMMAND_QUEUE.
A video Timeout Detection and Recovery occurred.
It's really slow, my integrated HD 4000 (i3-3217U CPU @ 1.80GHz) is 8 times faster!
Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4000', version 'OpenCL 1.2 ' and driver '10.18.10.4252'.
Running benchmarks for transform implementation "OCL3"
2199064^8192+1 Time: 451 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 622 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 1.08 ms/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 2.32 ms/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 4.47 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 9.24 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 20 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 41.7 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 85.5 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 183 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 373 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 2.
GeForce G 105M: Genefer Mark = 0.3 ...
Hey, at least the GPU was stuck at 99% usage at all times, and the CPU core wasn't fully used, so I guess the hardware was being better utilized. Not that I'd crunch with it anyway, PPS Sieve just causes immense lag AND overheating.
But since this is supposed to be beta testing, I figured I might as well try it out......
| |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
Tried it on my trusty Radeon HD 4350. And.... nope, ain't working. See last quote for the error codes.
Radeon HD 4350 (failed to run).
Running on platform 'AMD Accelerated Parallel Processing', device 'ATI RV710', version 'OpenCL 1.0 AMD-APP (937.2)' and driver 'CAL 1.4.1734'.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
Error: build program failed.
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 216: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((reqd_work_group_size(64 / 4 * BLK, 1, 1))) __attribute__((vec_type_hint(Zp)))
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 344: warning:
unknown attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(64 / 4 * BLK64, 1, 1))) __attribute__((vec_type_hint(Zp)))
That is why I wrote "geneferocl3 can run on any GPU supporting 'OpenCL 1.0'."
vec_type_hint and work_group_size_hint are standard OpenCL 1.0 function qualifiers.
A driver that does not recognise them is therefore not a conforming OpenCL 1.0 implementation, so this is not an OpenCL-capable driver/GPU.
| |
|
|
The geneferocl_windows benchmark runs to successful completion on a GTX TITAN Black from the command line.
The OCL2 and OCL3 benchmarks default to the CPU's integrated graphics (Intel HD Graphics 4600) and will not run on the GTX TITAN Black from the command line. Am I using the correct binaries?
EDIT: I am running Windows 10 x64 with GeForce driver 355.82.
Command line: geneferocl2_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"
Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4600', version 'OpenCL 1.2 ' and driver '10.18.15.4256'.
geneferocl2 3.2.9-dev (Windows/OpenCL/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"
2199064^8192+1 Time: 336 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 649 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.21 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.18 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 4.7 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 9.68 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 20.4 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 43 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 93 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 177 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 380 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 2.
Priority change succeeded.
-------------------------------------------
Command line: geneferocl3_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4600', version 'OpenCL 1.2 ' and driver '10.18.15.4256'.
geneferocl3 3.2.9-dev (Windows/OpenCL-NTT/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
2199064^8192+1 Time: 176 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 337 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 685 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 1.47 ms/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 2.91 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 6.05 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 11.9 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 24.9 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 53.5 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 116 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 225 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 4.
Priority change succeeded. | |
|
|
OCL3:
GT 650m @ 790MHz (355.58):
2199064^8192+1 Time: 107 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 185 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 340 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 635 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 1.33 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 2.72 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 5.74 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 11.6 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 23.4 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 49.1 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 104 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 8.
I-5 4440S @ 2.9GHz OpenCL 1.2 (Build148) Driver 4.2.0.148:
2199064^8192+1 Time: 438 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 876 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 1.78 ms/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 3.8 ms/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 7.85 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 16.4 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 34.5 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 73.1 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 153 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 316 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 664 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 1.
| |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
970's marks:
88 = OCL3
78 = OCL
23 = OCL2 ver.3
750 marks:
28 = OCL3
25 = OCL
12 = OCL2 ver.3
The first version of OCL3 was not expected to run faster than OCL, but it does on Maxwell.
This is great news! | |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
geneferocl_windows benchmark runs to successful completion on GTX TITAN Black from command line.
OCL2 and OCL3 benchmarks default to CPU graphics - will not run on GTX TITAN Black from command line. Am I using the correct binaries?
Yes, Intel's GPU device is the default one.
You can try the command line:
geneferocl3_windows.exe -nvidia -b
or
geneferocl3_windows.exe -d 1 -b
| |
|
|
Yes, Intel's GPU device is the default one.
You can try the command line:
geneferocl3_windows.exe -nvidia -b
Thank you, Yves. I tried the first command line and it works:
Command line: geneferocl_windows -nvidia -b
Running on platform 'NVIDIA CUDA', device 'GeForce GTX TITAN Black', version
'OpenCL 1.2 CUDA' and driver '355.82'.
geneferocl 3.2.8 (Windows/OpenCL/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL"
2199064^8192+1 Time: 50.8 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 53.7 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 60.1 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 78.5 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 102 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 209 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 359 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 673 us/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 1.26 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 2.56 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 5.23 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 159.
Priority change succeeded.
----------------------------------
geneferocl2_windows.exe -nvidia -b
geneferocl2 3.2.9-dev (Windows/OpenCL/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"
2199064^8192+1 Time: 116 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 122 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 164 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 253 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 477 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 842 us/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 1.58 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 3.15 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 6.92 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 13.7 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 27.7 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 30.
Priority change succeeded.
----------------------------------
geneferocl3_windows.exe -nvidia -b
geneferocl3 3.2.9-dev (Windows/OpenCL-NTT/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
2199064^8192+1 Time: 81.5 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 86.4 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 99.7 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 135 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 239 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 395 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 719 us/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 1.39 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 2.82 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 5.57 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 12.3 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 72.
Priority change succeeded.
| |
|
mackerel Volunteer tester
 Send message
Joined: 2 Oct 08 Posts: 2639 ID: 29980 Credit: 568,393,769 RAC: 1,834
                              
|
GTX 980 Ti
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 980 Ti', version 'OpenCL 1.2 CUDA' and driver '355.82'.
Running benchmarks for transform implementation "OCL"
2199064^8192+1 Time: 57.5 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 53.3 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 65.4 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 86.1 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 136 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 277 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 504 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.02 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 1.97 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 4.09 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 8.18 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 101.
Running benchmarks for transform implementation "OCL2"
2199064^8192+1 Time: 150 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 158 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 171 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 207 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 432 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 776 us/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 1.55 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 3.12 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 6.11 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 12.7 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 28.3 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 32.
Running benchmarks for transform implementation "OCL3"
2199064^8192+1 Time: 75.6 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 75.3 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 78.1 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 88.1 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 143 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 257 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 527 us/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 924 us/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 1.75 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 3.62 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 7.48 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 113.
GTS 450
Running on platform 'NVIDIA CUDA', device 'GeForce GTS 450', version 'OpenCL 1.1 CUDA' and driver '355.82'.
Running benchmarks for transform implementation "OCL"
2199064^8192+1 Time: 59.9 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 64.9 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 122 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 256 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 530 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 1.1 ms/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 2.16 ms/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 4.63 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 9.46 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 20.3 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 45 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 21.
Running benchmarks for transform implementation "OCL2"
2199064^8192+1 Time: 114 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 231 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 493 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 1.01 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 2 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 4.15 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 8.71 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 18.2 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 36.8 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 80.3 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 236 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 5.
Running benchmarks for transform implementation "OCL3"
2199064^8192+1 Time: 65.3 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 116 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 199 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 419 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 844 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 1.74 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 3.58 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 7.58 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 14.3 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 32 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 67.6 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 13.
GTX 560 Ti
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 560 Ti', version 'OpenCL 1.1 CUDA' and driver '355.82'.
Running benchmarks for transform implementation "OCL"
2199064^8192+1 Time: 63.2 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 62.6 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 67 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 133 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 272 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 561 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 1.1 ms/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 2.33 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 4.73 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 10.1 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 22.2 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 41.
Running benchmarks for transform implementation "OCL2"
2199064^8192+1 Time: 98.2 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 121 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 244 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 549 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.03 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 2.08 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 4.28 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.94 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 18.1 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 39.5 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 85.6 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 11.
Running benchmarks for transform implementation "OCL3"
2199064^8192+1 Time: 64.8 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 65.5 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 113 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 221 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 441 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 889 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.8 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 3.79 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 7.11 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 15.8 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 33.5 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 27.
| |
|
Rafael Volunteer tester
 Send message
Joined: 22 Oct 14 Posts: 909 ID: 370496 Credit: 531,793,905 RAC: 407,603
                        
|
Tried it on my trusty Radeon HD 4350. And... nope, it isn't working. See the last quote for the error codes.
Radeon HD 4350 (failed to run).
Running on platform 'AMD Accelerated Parallel Processing', device 'ATI RV710', version 'OpenCL 1.0 AMD-APP (937.2)' and driver 'CAL 1.4.1734'.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
Error: build program failed.
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 216: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((reqd_work_group_size(64 / 4 * BLK, 1, 1))) __attribute__((vec_type_hint(Zp)))
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 344: warning:
unknown attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(64 / 4 * BLK64, 1, 1))) __attribute__((vec_type_hint(Zp)))
That is why I'm writing "geneferocl3 can run on any GPU supporting 'OpenCL 1.0'."
vec_type_hint and work_group_size_hint are some OpenCL 1.0 function qualifiers.
Then this is not an OpenCL-capable driver/GPU.
That's odd. Both BOINC and GPU-Z say the card is OpenCL capable; web searching also told me that the 4350 had OpenCL 1.0 support. Yet it doesn't? I tried downloading some SDK packages, but they didn't seem to make the app work... | |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
That's odd. Both BOINC and GPU-Z say the card is OpenCL capable; web searching also told me that the 4350 had OpenCL 1.0 support. Yet it doesn't? I tried downloading some SDK packages, but they didn't seem to make the app work...
You can try "GPU Caps Viewer". Check Work Group size. The minimum value is 256 for genefer.
| |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
                               
|
Yves,
I was experimenting with the behavior of OCL3 near the b limit.
In particular, I was testing at the n=2048 limit of 67108864. I wanted to see what OCL3 does when the limit is exceeded. More specifically, I wanted to see if 67108864 was the last good number, or the first bad number.
OCL3 produces a residue of FFFFFFFFFFFFFFFF for that number. That looks suspiciously like a bad result! However, the control test, running OCL2, also produces FFFFFFFFFFFFFFFF. So does the x87 genefer64 test. So that's the "correct" result. Is that one of the numbers genefer can't test?
What surprised me, however, is that OCL3 produced the same residue as OCL2 and x87 on 67108866, which is 2 beyond the limit.
Any thoughts on why it appears to work correctly beyond the limit?
____________
My lucky number is 75898^524288+1 | |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
In particular, I was testing at the n=2048 limit of 67108864.
Is that one of the numbers genefer can't test?
Yes, 67108864 = 2^26. It's a Fermat number.
The residue is a root of 1: it can be 1, -1, "7f7f7f7f7f7f7f7f", etc. | |
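A quick way to see why b = 67108864 is special at n = 2048 (a sketch in Python, not Genefer code; the variable names are only for illustration): since b = 2^26, the number b^2048+1 equals 2^53248+1, and 53248 = 13 * 4096, so it is divisible by the Fermat number F12 = 2^4096+1. Powers of 2 are also roots of 1 modulo this number.

```python
# Illustrative check only; none of this is taken from the Genefer source.
b = 67108864
n = 2048

# b is an exact power of two.
assert b == 2**26

# So b^n + 1 is 2^(26*n) + 1.
N = b**n + 1
exponent = 26 * n
assert N == 2**exponent + 1

# The exponent has the odd factor 13, so x^13 + 1 with x = 2^4096
# is divisible by x + 1, which is the Fermat number F12.
assert exponent == 13 * 4096
F12 = 2**4096 + 1
assert N % F12 == 0

# 2^exponent = -1 (mod N), hence 2^(2*exponent) = 1 (mod N):
# every power of 2 is a root of 1 modulo N.
assert pow(2, exponent, N) == N - 1
assert pow(2, 2 * exponent, N) == 1

print("b^n+1 = 2^%d+1 is divisible by F12" % exponent)
```

This only illustrates the arithmetic behind the remark above; it says nothing about how Genefer itself handles such a base.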
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2382 ID: 1178 Credit: 17,991,226,846 RAC: 12,693,967
                                                
|
That's odd. Both BOINC and GPU-Z say the card is OpenCL capable; web searching also told me that the 4350 had OpenCL 1.0 support. Yet it doesn't? I tried downloading some SDK packages, but they didn't seem to make the app work...
You can try "GPU Caps Viewer". Check Work Group size. The minimum value is 256 for genefer.
That isn't the issue with the 8400M GS error. The work group size is reported by GPU Caps Viewer as 512.
| |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
That isn't the issue with the 8400M GS error. The work group size is reported by GPU Caps Viewer as 512.
I think that the 8400M GS error is a video Timeout Detection and Recovery (TDR) event.
Time > 500 ms/mul.
| |
|
Rafael Volunteer tester
 Send message
Joined: 22 Oct 14 Posts: 909 ID: 370496 Credit: 531,793,905 RAC: 407,603
                        
|
That's odd. Both BOINC and GPU-Z say the card is OpenCL capable; web searching also told me that the 4350 had OpenCL 1.0 support. Yet it doesn't? I tried downloading some SDK packages, but they didn't seem to make the app work...
You can try "GPU Caps Viewer". Check Work Group size. The minimum value is 256 for genefer.
Oh... the Radeon 4350 is 128. I guess that explains it. Does that mean A: OpenCL IS supported, but the card lacks a few features; or B: OpenCL is not even supported on the card! Sorry, I don't understand much about specs.
If it's the first, there should be an indication on the application page, readme or somewhere, saying that this card (and maybe its family?) isn't supported, even though it would seem so. | |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2382 ID: 1178 Credit: 17,991,226,846 RAC: 12,693,967
                                                
|
That's odd. Both BOINC and GPU-Z say the card is OpenCL capable; web searching also told me that the 4350 had OpenCL 1.0 support. Yet it doesn't? I tried downloading some SDK packages, but they didn't seem to make the app work...
You can try "GPU Caps Viewer". Check Work Group size. The minimum value is 256 for genefer.
Oh... the Radeon 4350 is 128. I guess that explains it. Does that mean A: OpenCL IS supported, but the card lacks a few features; or B: OpenCL is not even supported on the card! Sorry, I don't understand much about specs.
If it's the first, there should be an indication on the application page, readme or somewhere, saying that this card (and maybe its family?) isn't supported, even though it would seem so.
All ATI HD 4xxx series cards supported OCL (at least in Beta). However, OCL support for these cards may or may not be included in the latest drivers from AMD. AMD's drivers are very quirky at times, so you would need to play around some to make sure it is working.
That said, your 4350 appears to be below the minimum specs for the genefer app being tested.
| |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
Oh... the Radeon 4350 is 128. I guess that explains it. Does that mean A: OpenCL IS supported, but the card lacks a few features; or B: OpenCL is not even supported on the card! Sorry, I don't understand much about specs.
Both.
A: WORK_GROUP_SIZE: the minimum value is 1, so the driver is compliant on that point (but that is not sufficient for the genefer app).
B:
Error: Requested compile size is bigger than the required workgroup size of 32 elements
Error: Creating kernel Forward64 failed!
The required workgroup size for Forward64 is 128. So the driver/compiler does not seem to be compliant with the OpenCL 1.0 specification.
Note that the Radeon 4350 is not in the list https://www.khronos.org/conformance/adopters/conformant-products#opencl. | |
|
mackerel Volunteer tester
 Send message
Joined: 2 Oct 08 Posts: 2639 ID: 29980 Credit: 568,393,769 RAC: 1,834
                              
|
User Device OCL OCL2 OCL3
eXaPower GT 650m 8
mackerel GTS 450 21 5 13
mackerel GTX 560 Ti 41 11 27
Michael Goetz GTX 580 73 20 39
Yves Gallot GTX 680 66 15 45
eXaPower GTX 750 25 12 28
1998golfer GTX 760 54 12 35
mackerel GTX 960 44 12 49
eXaPower GTX 970 78 23 88
Rafael GTX 970 77 85
mackerel GTX 980 Ti 101 32 113
Wingless Wonder GTX Titan Black 159 30 72
Yves Gallot HD Graphics 4000 2
Wingless Wonder HD Graphics 4600 2 4
Iain Bethune FirePro D700 36 26
Honza R9 280X 115 13 49
mackerel R9 280X 113 13 52
Since there is a lot of data, I thought I'd put the scores into a little table. In short, it looks like Maxwell does best with OCL3, and other nvidia and AMD do better on OCL.
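The same comparison can be sketched in a few lines of Python. The scores are copied from the table above; only rows that list all three marks are used, since for the other rows it is not clear which column a score belongs to. The helper name `best_transform` is just for illustration.

```python
# Genefer Marks copied from the table above, as (OCL, OCL2, OCL3).
# Rows with a missing score are left out on purpose.
marks = {
    ("mackerel", "GTS 450"): (21, 5, 13),
    ("mackerel", "GTX 560 Ti"): (41, 11, 27),
    ("Michael Goetz", "GTX 580"): (73, 20, 39),
    ("Yves Gallot", "GTX 680"): (66, 15, 45),
    ("eXaPower", "GTX 750"): (25, 12, 28),
    ("1998golfer", "GTX 760"): (54, 12, 35),
    ("mackerel", "GTX 960"): (44, 12, 49),
    ("eXaPower", "GTX 970"): (78, 23, 88),
    ("mackerel", "GTX 980 Ti"): (101, 32, 113),
    ("Wingless Wonder", "GTX Titan Black"): (159, 30, 72),
    ("Honza", "R9 280X"): (115, 13, 49),
    ("mackerel", "R9 280X"): (113, 13, 52),
}
names = ("OCL", "OCL2", "OCL3")


def best_transform(scores):
    """Return the name of the transform with the highest Genefer Mark."""
    return names[scores.index(max(scores))]


# Devices on which OCL3 scores highest.
ocl3_wins = sorted(
    {device for (user, device), s in marks.items() if best_transform(s) == "OCL3"}
)
print(ocl3_wins)  # ['GTX 750', 'GTX 960', 'GTX 970', 'GTX 980 Ti']
```

On these rows OCL3 wins only on the four Maxwell cards, and OCL wins everywhere else.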
Wingless Wonder - I saw the following comment while looking up the Titan Black:
Double precision performance of the GTX Titan & GTX Titan Black is either 1/3 or 1/24 of single-precision performance depending on a user-selected configuration option in the driver that boosts single-precision performance if double-precision is set to 1/24 of single-precision performance
Assuming it still exists, it would be interesting to see what impact this option has on performance. And which setting was used for the earlier results? | |
|
|
Wingless Wonder - I saw the following comment while looking up the Titan Black:
Double precision performance of the GTX Titan & GTX Titan Black is either 1/3 or 1/24 of single-precision performance depending on a user-selected configuration option in the driver that boosts single-precision performance if double-precision is set to 1/24 of single-precision performance
Assuming it still exists, it would be interesting to see what impact this option has on performance. And which setting was used for the earlier results?
By default, double-precision mode is off. It is switched on by the user through the NVIDIA Control Panel and was on for these OCL benchmarks. I leave the card in double-precision mode in normal use since it doesn't impact everyday tasks like web browsing, and only switch it off when gaming or benchmarking because it greatly impacts frame rate in those applications. PrimeGrid PPS (Sieve) is faster when double-precision mode is switched off.
When double-precision mode is selected through the NVIDIA Control Panel, it automatically reduces core and memory clock speeds for stability. For the OCL, OCL2, and OCL3 benchmarks the card was run at default double-precision core and memory clock speeds, although the benchmark test is so short that the card never came up to normal operating temperature, unlike when crunching regular Genefer short or World Record tasks. Clock speeds are reduced somewhat when the card warms up to 72 °C, which is the limit I set through software. | |
|
Roger Volunteer developer Volunteer tester
 Send message
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
                    
|
7970Ghz
OCL: 109 (3.2.5-dev, rev 747)
OCL2: 3
OCL3: 38
2199064^8192+1 Time: 121 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 130 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 165 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 240 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 434 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 720 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.32 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 2.56 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 5.36 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 10.4 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 24.3 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 38. | |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
                               
|
Has anyone tried running OCL2 or OCL3 on an Intel HD integrated GPU?
____________
My lucky number is 75898^524288+1 | |
|
|
Has anyone tried running OCL2 or OCL3 on an Intel HD integrated GPU?
Yes, both of them work (on Mac). I'll post some timings tomorrow.
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,017,981,497 RAC: 1,586,802
                                      
|
Intel i5 4670 / HD4600
OCL: N/A
OCL2: 2
OCL3: 3
OCL2
Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4600', version 'OpenCL 1.2 ' and driver '10.18.14.4264'.
2199064^8192+1 Time: 397 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 758 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.46 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.53 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 5.28 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 11 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 23.7 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 49.8 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 108 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 207 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 436 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 2.
OCL3
Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4600', version 'OpenCL 1.2 ' and driver '10.18.14.4264'.
2199064^8192+1 Time: 213 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 383 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 796 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 1.65 ms/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 3.18 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 6.59 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 13.6 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 27.7 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 59.6 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 127 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 250 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 3.
Priority change succeeded.
____________
My stats | |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester Send message
Joined: 1 Mar 14 Posts: 1022 ID: 301928 Credit: 543,195,386 RAC: 1
                        
|
Intel i5 4670 / HD4600
Did it pass the test of known primes? Intel/Linux is surely out of the question for now: the current Beignet failed at least two tests (it reported as composite what should be primes).
Btw Yves, could you print a big fat "!!! ERROR - ERROR - ERROR !!!" message if the built-in tests produce a wrong residue? Something that will catch the eye in this boring, long, scrolling output.
| |
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,017,981,497 RAC: 1,586,802
                                      
|
Intel i5 4670 / HD4600
Did it pass the test of known primes?
Yes, it did.
Or is there any particular prime you want me to test?
3149688^32768+1 is a probable prime. (212936 digits) (err = 0.0000) (time = 0:10:42)
3966304^65536+1 is a probable prime. (432432 digits) (err = 0.0000) (time = 0:42:56)
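For a rough idea of how the benchmark figures relate to these run times: a probable-prime test of b^n+1 needs about n*log2(b) modular squarings, so the total time is roughly that count multiplied by the time per multiplication. The Python sketch below assumes that the 796 us/mul measured for n = 32768 in the OCL3 benchmark above also applies to this b, and that OCL3 was the transform used; both are assumptions, and the function names are only for illustration.

```python
import math


def digits(b, n):
    """Number of decimal digits of b^n + 1 (same as b^n for these inputs)."""
    return math.floor(n * math.log10(b)) + 1


def estimated_seconds(b, n, us_per_mul):
    """Rough run time: about n*log2(b) squarings at us_per_mul each."""
    squarings = n * math.log2(b)
    return squarings * us_per_mul * 1e-6


# Digit count for the first probable prime above.
print(digits(3149688, 32768))

# Estimated run time in seconds, using the assumed 796 us/mul.
print(round(estimated_seconds(3149688, 32768, 796)))
```

The estimate comes out a little under ten minutes, in the same range as the reported 0:10:42; the gap is plausible given start-up overhead and the slightly different b.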
____________
My stats | |
|
|
Has anyone tried running OCL2 or OCL3 on an Intel HD integrated GPU?
Yes, both of them work (on Mac). I'll post some timings tomorrow.
- Iain
Intel i7-4750HQ @ 2.00GHz / Iris Pro
Running benchmarks for transform implementation "OCL2"
Running on platform 'Apple', device 'Iris Pro', version 'OpenCL 1.2 ' and driver '1.2(Jul 29 2015 02:40:39)'.
2199064^8192+1 Time: 441 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 573 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.34 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 1.78 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 4.57 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 8.24 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 21.1 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 37.7 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 101 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 132 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 328 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 3.
Running benchmarks for transform implementation "OCL3"
Running on platform 'Apple', device 'Iris Pro', version 'OpenCL 1.2 ' and driver '1.2(Jul 29 2015 02:40:39)'.
2199064^8192+1 Time: 163 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 229 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 443 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 838 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 1.74 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 3.47 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 7.18 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 15.1 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 31.7 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 67.1 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 139 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 6.
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester Send message
Joined: 1 Mar 14 Posts: 1022 ID: 301928 Credit: 543,195,386 RAC: 1
                        
|
Intel i5 4670 / HD4600
Did it pass the test of known primes?
Yes, it did.
Nice. But considering the speed I see nothing useful except "science interest" in making it run.
Or is there any particular prime you want me to test?
I think it's not necessary. It's clear that the Beignet codegen still has a lot of bugs. In particular, these two tests failed on my Haswell:
2485064^4096+1 is composite. (RES=15aa723e56f21e42) (26196 digits) (err = 0.0000) (time = 0:01:01) 00:38:08
2030234^8192+1 is a probable prime. (51672 digits) (err = 0.0000) (time = 0:02:42) 00:40:50
1651902^16384+1 is composite. (RES=7bb203c965b8e34d) (101876 digits) (err = 0.0000) (time = 0:11:14) 00:52:04
And I've aborted the following tests because they were going to take too much time to complete.
Generalized Fermat Number Bench
2199064^8192+1 Time: 952 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 2.02 ms/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 4.01 ms/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 8.59 ms/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 10.1 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 21.4 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 43.8 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 97 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 191 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 415 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 740 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 1.
| |
|
Rafael Volunteer tester
 Send message
Joined: 22 Oct 14 Posts: 909 ID: 370496 Credit: 531,793,905 RAC: 407,603
                        
|
But considering the speed I see nothing useful except "science interest" in making it run.
I disagree. Many PCs have an idle iGPU; while each is slow individually, with many of them working it's going to be better than a bunch of high-end GPUs (distributed computing inception here). Any speedup is much appreciated, especially if we consider Michael's proposal for the Prime Field of Dreams (GFN 32768), or really any other search/double check with super short tasks. I imagine the iGPU could handle those without much problem.
Not to mention, now that we have variable deadlines, we can configure GFN to extend deadlines for GPUs that are slowly but surely crunching. I don't think it's implemented yet, but I'm sure it could be done. | |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13958 ID: 53948 Credit: 393,418,261 RAC: 198,590
                               
|
But considering the speed I see nothing useful except "science interest" in making it run.
I disagree. Many PCs have an idle iGPU; while each is slow individually, with many of them working it's going to be better than a bunch of high-end GPUs (distributed computing inception here). Any speedup is much appreciated, especially if we consider Michael's proposal for the Prime Field of Dreams (GFN 32768), or really any other search/double check with super short tasks. I imagine the iGPU could handle those without much problem.
Not to mention, now that we have variable deadlines, we can configure GFN to extend deadlines for GPUs that are slowly but surely crunching. I don't think it's implemented yet, but I'm sure it could be done.
Since I'm the one who installed the ill-fated Android Sieve app, I'm not going to come out and say it's not worthwhile to use the large number of integrated GPUs. But I will say that, despite their greater numbers, they're pretty slow and overall don't make a big difference.
It comes down to how much effort it takes to set them up. The good news is that OpenCL is OpenCL is OpenCL, and the same binary executable will run on an Nvidia GPU, an AMD GPU, an integrated GPU, or even on the CPU itself. I don't even have to build different executables to run on an integrated GPU.
That's why you see people saying "I tested it and it seems to work". You can run the exact same binary on any type of GPU. That makes our job a lot easier.
The harder part is getting the BOINC server to play nice with the integrated GPUs. I'm not sure how much work is involved. This is a matter of my development time, figuring out what's required and then making it happen. If a BOINC server code upgrade is required, then we're also introducing the risk of breaking something when we upgrade.
So it comes down to how much effort we are willing to put into making it work. Probably not a lot in the beginning, but maybe that will change. First, however, we've got to get everything else up and running.
____________
My lucky number is 75898^524288+1 | |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
A "release candidate" of "geneferocl3" is available from the Genefer repository.
It performs a bit faster.
An error check was added to OCL3: each coefficient c_i of the transform should satisfy |c_i| <= n.(b-1)^2.
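The bound quoted above can be illustrated with a small sketch. It assumes digits in the range 0..b-1 and a schoolbook negacyclic squaring (arithmetic modulo x^n + 1); both are assumptions made for illustration, and the function names are hypothetical, not taken from the Genefer source.

```python
def negacyclic_square(digits):
    """Square a digit vector modulo x^n + 1 (schoolbook, O(n^2))."""
    n = len(digits)
    out = [0] * n
    for i, a in enumerate(digits):
        for j, c in enumerate(digits):
            k = i + j
            if k < n:
                out[k] += a * c
            else:
                out[k - n] -= a * c  # x^n wraps around with a sign flip
    return out


def coefficients_within_bound(coeffs, n, b):
    """Return True when every coefficient satisfies |c_i| <= n*(b-1)^2."""
    bound = n * (b - 1) ** 2
    return all(abs(c) <= bound for c in coeffs)


# Worst case: every digit equals b-1, so the last coefficient reaches the bound.
n, b = 8, 10
coeffs = negacyclic_square([b - 1] * n)
print(coeffs[-1], n * (b - 1) ** 2)
print(coefficients_within_bound(coeffs, n, b))
```

Each coefficient is a signed sum of n products of two digits, so its magnitude cannot exceed n.(b-1)^2 in a correct computation. A coefficient outside that range therefore points to a computation or hardware fault rather than a property of the input.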
Testing 67108864^2048+1...
Using OCL3 transform
67108864^2048+1 is composite. (RES=ffffffffffffffff) (16030 digits) (err = 0.0000)
Testing 67108866^2048+1...
Using OCL3 transform
maxErr exceeded for 67108866^2048+1, 1.0000 > 0.4500 during final check
"maxErr" is a boolean here, so it reads 0.0000 or 1.0000. The check may detect hardware errors. I think that eXaPower can check if it works...
Yves | |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2382 ID: 1178 Credit: 17,991,226,846 RAC: 12,693,967
                                                
|
Looks good on a GTX 660 OEM on Win7. Great work, Yves!
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 660', version 'OpenCL 1.1 CUDA' and driver '347.52'.
2199064^8192+1 Time: 63.2 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 66.8 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 98 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 193 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 374 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 738 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.47 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 3.01 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 6.05 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 12.6 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 27.4 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 33.
Priority change succeeded.
Edit: and that is just a bit faster than the previous OCL3 version that had a Genefer Mark = 31 on this card. | |
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,017,981,497 RAC: 1,586,802
                                      
|
280X
OCL: 115
OCL2: 13
OCL3: 49
OCL3 RC: 52, 5-10% faster than the previous OCL3
2199064^8192+1 Time: 112 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 121 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 183 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 262 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 374 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 643 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.1 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 2.1 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 3.95 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 7.77 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 15.7 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 52.
____________
My stats | |
|
Rafael Volunteer tester
 Send message
Joined: 22 Oct 14 Posts: 909 ID: 370496 Credit: 531,793,905 RAC: 407,603
                        
|
On my GTX 970 (1499 MHz), driver 355.82, OCL3 went from 85 to 88. Nice boost! Still nowhere close to the 280X's 115 on OCL, but hey, it's something...
OCL: 77
OCL2: 23
OCL3 RC: 88
Command line: geneferocl3_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2 CUDA' and driver '355.82'.
2199064^8192+1 Time: 58.6 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 64.2 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 68 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 90.5 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 170 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 313 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 597 us/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 1.18 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 2.32 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 4.64 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 9.33 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 88.
Priority change succeeded.
Also, some more overclock results. From testing with Folding@Home, GPU@Grid and Poem@Home, the max rock-solid OC I can get out of my 970 is around 1505 MHz. Interestingly, PrimeGrid seems a lot more stable than those, as I seem to be able to boost the card further without crashes (Edit: just had a crash at 1534 MHz while testing). I usually just run the card at the "be-all, end-all" 1506 MHz OC, but for the sake of testing, I pushed it further:
OCL 2 (v3) gets a 0.500 error in
1203210^65536+1 Time: 211 us/mul. Err: 0.5000 398482 digits
At speeds of 1535 MHz and above. 1534 MHz did pass without the error, but the system hard-crashed a little later, so take it with a grain of salt. All other numbers seem fine, though.
OCL 3 (RC) gets a 1.000 error in
1471094^32768+1 Time: 65.8 us/mul. Err: 1.0000 202102 digits
and
804904^262144+1 Time: 300 us/mul. Err: 1.0000 1548156 digits
At 1570 MHz and above. 1569 MHz and below showed no problem. At 1569 MHz, the Genefer Mark went to 91, vs 88 at 1.5 GHz, so that's another very nice boost, if the card is stable for this particular app. | |
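When comparing many overclock runs, the benchmark output can be scanned automatically. The sketch below assumes the line layout shown in the benchmark listings in this thread and reuses the 0.45 threshold from the "maxErr exceeded ... > 0.4500" message; the names `parse_bench` and `failing` are hypothetical helpers, not part of Genefer.

```python
import re

# Matches lines such as:
# 1203210^65536+1 Time: 211 us/mul. Err: 0.5000 398482 digits
BENCH_LINE = re.compile(
    r"(?P<b>\d+)\^(?P<n>\d+)\+1\s+Time:\s+(?P<time>[\d.]+)\s+(?P<unit>us|ms)/mul\.\s+"
    r"Err:\s+(?P<err>[\d.]+)\s+(?P<digits>\d+)\s+digits"
)
MAX_ERR = 0.45  # assumed threshold, copied from the "> 0.4500" log message


def parse_bench(text):
    """Return one dict per benchmark line, with times converted to microseconds."""
    rows = []
    for m in BENCH_LINE.finditer(text):
        scale = 1000.0 if m["unit"] == "ms" else 1.0
        rows.append({
            "b": int(m["b"]),
            "n": int(m["n"]),
            "us_per_mul": float(m["time"]) * scale,
            "err": float(m["err"]),
        })
    return rows


def failing(rows, max_err=MAX_ERR):
    """Return the rows whose reported error exceeds the threshold."""
    return [r for r in rows if r["err"] > max_err]


sample = """
1471094^32768+1 Time: 65.8 us/mul. Err: 1.0000 202102 digits
1203210^65536+1 Time: 90.5 us/mul. Err: 0.0000 398482 digits
538452^1048576+1 Time: 1.18 ms/mul. Err: 0.0000 6009544 digits
"""
for row in failing(parse_bench(sample)):
    print("unstable at n =", row["n"])
```

Converting every time to microseconds also makes it easier to compare runs at different clock speeds line by line.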
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,017,981,497 RAC: 1,586,802
                                      
|
On my GTX 970 (1499 MHz), driver 355.82, OCL3 went from 85 to 88. Nice boost! Still nowhere close to the 280X's 115 on OCL, but hey, it's something...
OCL3 does great on Maxwell cards compared to OCL, both in performance and in b limit.
And taking into account the power usage of the GTX 970, it's a nice match for OCL3.
Looking forward to seeing how Fury (Nano) will do...
____________
My stats | |
|
Rafael Volunteer tester
 Send message
Joined: 22 Oct 14 Posts: 909 ID: 370496 Credit: 531,793,905 RAC: 407,603
                        
|
On my GTX 970 (1499 MHz), driver 355.82, OCL3 went from 85 to 88. Nice boost! Still nowhere close to the 280X's 115 on OCL, but hey, it's something...
OCL3 does great on Maxwell cards compared to OCL, both in performance and in b limit.
And taking into account the power usage of the GTX 970, it's a nice match for OCL3.
Looking forward to seeing how Fury (Nano) will do...
Come to think of it... I could get my wattmeter and compare power consumption between OCL tests with different OCs for a points-per-watt comparison, if anyone is interested.
| |
|
|
OCL3 RC:
GTX 970 @ 1555MHz:
2199064^8192+1 Time: 69.1 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 65.9 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 70.4 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 91.8 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 175 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 310 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 599 us/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 1.16 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 2.26 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 4.49 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 9.39 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 90.
*Edit*: During the benchmark, only n=17 and n=18 hit max error from 1620 down to 1560MHz. Below 1559MHz, no errors.
I turned the dial up to 1637MHz and hit max error for all tests. It scored a 93 mark, though. This is the highest Genefer overclock I've had on my 970 without a driver crash. The well-designed OCL3 program should handle reasonable overclocks on Maxwell. OCL3 performance scales decently with overclocking, as the OCed 970's marks show.
GTX 750 @ 1412MHz:
2199064^8192+1 Time: 62.3 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 72.1 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 117 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 200 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 412 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 836 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.67 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 3.39 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 6.91 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 14.1 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 30.1 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 29.
Driver crash at 1451MHz. n=19 and n=20 hit max error from 1438 down to 1420MHz. | |
|
mackerel Volunteer tester
 Send message
Joined: 2 Oct 08 Posts: 2639 ID: 29980 Credit: 568,393,769 RAC: 1,834
                              
|
GTX 980 Ti
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 980 Ti', version 'OpenCL 1.2 CUDA' and driver '355.82'.
Running benchmarks for transform implementation "OCL3"
2199064^8192+1 Time: 80.2 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 77.3 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 79.5 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 86.7 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 137 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 244 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 456 us/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 893 us/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 1.72 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 3.46 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 7.16 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 117.
Was 113 on previous version. | |
|
Tyler Project administrator Volunteer tester Send message
Joined: 4 Dec 12 Posts: 1078 ID: 183129 Credit: 1,376,122,338 RAC: 4,719
                         
|
Has someone built, or can someone build, the OCL3 executable for Linux?
Thanks
____________
275*2^3585539+1 is prime!!! (1079358 digits)
Proud member of Aggie the Pew
| |
|
Roger Volunteer developer Volunteer tester
 Send message
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
                    
|
Laptop test:
>geneferocl2_windows_879.exe -b
geneferocl2 3.2.9-dev (Windows/OpenCL/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Command line: geneferocl2_windows_879.exe -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"
Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4000', version 'OpenCL 1.1 ' and driver '9.17.10.3040'.
Error: build program failed.
fcl build 1 succeeded.
fcl build 2 succeeded.
Error: internal error.
Error: internal error.
Error: internal error.
Error: internal error.
Error: internal error.
Error: internal error.
Error: internal error.
Error: internal error.
Error: OpenCL error detected: CL_BUILD_PROGRAM_FAILURE.
>geneferocl3_windows_887.exe -b
geneferocl3 3.2.9-dev (Windows/OpenCL-NTT/32-bit)
Command line: geneferocl3_windows_887.exe -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4000', version 'OpenCL 1.1 ' and driver '9.17.10.3040'.
2199064^8192+1 Time: 460 us/mul. Err: 1.0000 51956 digits
1798620^16384+1 Time: 850 us/mul. Err: 1.0000 102481 digits
1471094^32768+1 Time: 1.87 ms/mul. Err: 1.0000 202102 digits
1203210^65536+1 Time: 3.99 ms/mul. Err: 1.0000 398482 digits
984108^131072+1 Time: 5.97 ms/mul. Err: 1.0000 785521 digits
804904^262144+1 Time: 12.9 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 28.6 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 59.7 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 139 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 326 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 524 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 2.
Priority change succeeded.
Note the Err: 1.0000 on the small n (large b) benchmarks.
>geneferocl3_windows_887.exe -l
geneferocl3 3.2.9-dev (Windows/OpenCL-NTT/32-bit)
Command line: geneferocl3_windows_887.exe -l
Priority change succeeded.
Generalized Fermat Number b limits for transform implementation "OCL3"
m = 32, bMax = 536870912.
m = 64, bMax = 379625062.
m = 128, bMax = 268435456.
m = 256, bMax = 189812531.
m = 512, bMax = 134217728.
m = 1024, bMax = 94906266.
m = 2048, bMax = 67108864.
m = 4096, bMax = 47453133.
m = 8192, bMax = 33554432.
m = 16384, bMax = 23726566.
m = 32768, bMax = 16777216.
m = 65536, bMax = 11863283.
m = 131072, bMax = 8388608.
m = 262144, bMax = 5931642.
m = 524288, bMax = 4194304.
m = 1048576, bMax = 2965821.
m = 2097152, bMax = 2097152.
m = 4194304, bMax = 1482910.
m = 8388608, bMax = 1048576. | |
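The limits listed above appear to follow a simple pattern: each value matches sqrt(2^63 / m) rounded to the nearest integer. This is an observation from the table only, not a formula taken from the Genefer source, and the helper name `approx_b_max` is hypothetical. It would be in line with the |c_i| <= n.(b-1)^2 check mentioned earlier in the thread staying below 2^63, though that link is a guess.

```python
import math


def approx_b_max(m):
    """Guess at the OCL3 b limit for transform size m: round(sqrt(2^63 / m))."""
    return round(math.sqrt(2.0 ** 63 / m))


# A few values copied from the "-l" output above.
listed = {
    32: 536870912,
    64: 379625062,
    1024: 94906266,
    2048: 67108864,
    8388608: 1048576,
}
for m, b_max in listed.items():
    print(m, approx_b_max(m), b_max)
```

If the pattern holds, each doubling of m lowers the b limit by a factor of sqrt(2), which is what the table shows.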
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 803 ID: 164101 Credit: 305,704,038 RAC: 5,420

|
Laptop test:
Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4000', version 'OpenCL 1.1 ' and driver '9.17.10.3040'.
Both run on my HD 4000 (i3-3217U CPU), where Intel's driver is more recent:
Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4000', version 'OpenCL 1.2 ' and driver '10.18.10.4252'.
But there are some random errors during tests. | |
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,017,981,497 RAC: 1,586,802
                         |
|