## Other

Message boards : Generalized Fermat Prime Search : Genefer OCL 2

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87087 - Posted: 12 Aug 2015 | 14:59:23 UTC

A new Genefer application, "geneferocl2", is available from the Genefer repository (http://www.primegrid.com/forum_thread.php?id=6359).

A 32-bit binary for Windows is available (compiled with gcc 5.2.1).

geneferocl2 uses fixed-point arithmetic: Q63.64 fixed-point numbers for the data and Q0.63 for the sin/cos table.
As a result, it can run on any GPU, since the double type is not used.
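To make the Q notation concrete, here is a minimal Python sketch (an illustration, not code from geneferocl2). It assumes the usual convention where the sign bit is not counted: Q0.63 is a signed 64-bit word with 63 fraction bits, and Q63.64 is read as a signed 128-bit word with 63 integer and 64 fraction bits. Python integers are unbounded, so the word widths are modelled by the hypothetical helpers `fits` and `to_q`; the rounding and saturation choices are assumptions.

```python
import math

Q063_FRAC = 63   # Q0.63: sign bit + 63 fraction bits, in a 64-bit word
Q6364_FRAC = 64  # Q63.64 (assumed): sign + 63 integer bits + 64 fraction bits, in a 128-bit word

def fits(q: int, total_bits: int) -> bool:
    """True if q is representable as a signed two's-complement integer of total_bits."""
    return -(1 << (total_bits - 1)) <= q < (1 << (total_bits - 1))

def to_q(x: float, frac_bits: int, total_bits: int) -> int:
    """Round x to the nearest fixed-point value, saturating at the format's limits."""
    q = round(x * (1 << frac_bits))
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, q))

def from_q(q: int, frac_bits: int) -> float:
    return q / (1 << frac_bits)

def mul_data_by_twiddle(a: int, w: int) -> int:
    """Q63.64 * Q0.63 -> Q63.64, rounding to nearest.

    The exact product carries 64 + 63 fraction bits; shifting right by 63
    restores 64. Since |w| <= 1, the magnitude does not exceed |a| beyond rounding.
    """
    prod = a * w
    return (prod + (1 << (Q063_FRAC - 1))) >> Q063_FRAC

# Usage: scale a data value by cos(pi/3) ~ 0.5 using integer operations only.
a = to_q(3.5, Q6364_FRAC, 128)
w = to_q(math.cos(math.pi / 3), Q063_FRAC, 64)
r = mul_data_by_twiddle(a, w)
print(from_q(r, Q6364_FRAC))  # approximately 1.75
```

Note that Q0.63 cannot hold exactly +1.0 (cos 0), which is why `to_q` saturates to 2^63 - 1.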

Some first tests on my computer (i7-3820 @ 3.6 GHz and GTX 680).

Generalized Fermat Number Bench OCL2
1203210^65536+1 Time: 831 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.6 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 2.73 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 5.52 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 11.7 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 24.7 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 59.2 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 120 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 8.

Generalized Fermat Number Bench OCL
1203210^65536+1 Time: 103 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 197 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 374 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 726 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.46 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 2.98 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 6.27 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 13.6 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 66.

Generalized Fermat Number CPU
1203210^65536+1 Time: 278 us/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 619 us/mul. Err: 0.1484 785521 digits
804904^262144+1 Time: 1.3 ms/mul. Err: 0.1484 1548156 digits
658332^524288+1 Time: 2.88 ms/mul. Err: 0.1406 3050541 digits
538452^1048576+1 Time: 6.14 ms/mul. Err: 0.1367 6009544 digits
440400^2097152+1 Time: 13.9 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 28.8 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 65.8 ms/mul. Err: 0.1328 45879398 digits
Genefer Mark = 15.

Generalized Fermat Number Bench x87
1203210^65536+1 Time: 2.95 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 5.91 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 13.2 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 26.5 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 58.5 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 117 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 256 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 513 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 2.
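The "digits" column follows from the number of decimal digits of b^N, which is floor(N log10 b) + 1. A short Python check of that formula against the listed values (the helper name is hypothetical):

```python
import math

def gfn_digits(b: int, n: int) -> int:
    """Decimal digits of b^n + 1, computed as floor(n * log10(b)) + 1.

    Adding 1 does not change the digit count unless b^n + 1 is a power of
    ten, which does not happen for the benchmark bases used here.
    """
    return math.floor(n * math.log10(b)) + 1

print(gfn_digits(1203210, 65536))  # 398482, matching the benchmark line
```

Using logarithms avoids building the full 400,000-digit integer.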

It is the "OpenCL version of genefer80": it is slower than geneferocl, but its error is much smaller, so a larger range can be tested with it:

geneferocl_windows.exe -q "3149688^32768+1"
maxErr exceeded for 3149688^32768+1, 0.5000 > 0.4500

genefer_windows64.exe -q "3149688^32768+1"
maxErr exceeded for 3149688^32768+1, 0.5000 > 0.4500

geneferocl2_windows.exe -q "3149688^32768+1"
3149688^32768+1 is a probable prime. (212936 digits) (err = 0.0005) (time = 0:04:37)

genefer_windows64.exe -q "3149688^32768+1" -x x87
3149688^32768+1 is a probable prime. (212936 digits) (err = 0.0006) (time = 0:15:20)
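The "Err" values and the "maxErr exceeded" messages refer to round-off error. A common definition in FFT-based multiplication, assumed here rather than taken from Genefer's source, is the largest distance between an inverse-transform output and its nearest integer: exact products are integers, and at 0.5 a coefficient can no longer be rounded reliably. A minimal Python sketch, with the 0.45 threshold copied from the messages above and both function names hypothetical:

```python
MAX_ERR = 0.45  # threshold copied from the "0.5000 > 0.4500" messages

def max_roundoff_error(values):
    """Largest distance between a value and its nearest integer.

    Exact convolution outputs are integers, so this distance measures the
    round-off accumulated by the transform; 0.5 is the worst case.
    """
    return max(abs(v - round(v)) for v in values)

def check_roundoff(values, label):
    """Raise if the round-off error is too large to trust rounding."""
    err = max_roundoff_error(values)
    if err > MAX_ERR:
        raise ArithmeticError(f"maxErr exceeded for {label}, {err:.4f} > {MAX_ERR:.4f}")
    return err

print(check_roundoff([3.0001, -2.9999, 5.0], "example"))  # about 0.0001
```

An output stuck exactly halfway between two integers (error 0.5) is the failure mode reported for geneferocl and the 64-bit CPU build above.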

This is the first version, so it should be tested on different hardware and operating systems. Further improvements are possible.

Have fun!

Scott Brown
Volunteer moderator
Project administrator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2382
ID: 1178
Credit: 17,991,226,846
RAC: 12,693,967

Message 87090 - Posted: 12 Aug 2015 | 16:45:32 UTC
Last modified: 12 Aug 2015 | 16:45:42 UTC

Runs fine on a GTX 660 (Win7 Enterprise 64-bit, Haswell-based Xeon). However, a GTX 645 (same OS, i7-860) crashes giving the error:

Error: OpenCL error detected: CL_INVALID_COMMAND_QUEUE. An error (2964) occured.

Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87094 - Posted: 12 Aug 2015 | 17:23:03 UTC

Does NOT work on my GTX 580:

C:\Temp\GFN\OCL2>geneferocl2_windows.exe -q "3149688^32768+1
geneferocl2 3.2.9-dev (Windows/OpenCL/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.

Command line: geneferocl2_windows.exe -q 3149688^32768+1

Priority change succeeded.

Testing 3149688^32768+1...
Using OCL2 transform

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 580', version 'OpenCL 1.1 CUDA' and driver '344.75'.

Starting initialization...
Initialization complete (0.070 seconds).
Error: OpenCL error detected: CL_INVALID_COMMAND_QUEUE.
An error (2964) occured.

I get the same error when running -b. Is it perhaps compiled only for GTX6xx and later GPUs?

Minor: note that "occured" in the error message is misspelled.

____________
My lucky number is 75898^524288+1

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87097 - Posted: 12 Aug 2015 | 17:30:18 UTC - in response to Message 87090.

Runs fine on a GTX 660 (Win7 Enterprise 64-bit, Haswell-based Xeon). However, a GTX 645 (same OS, i7-860) crashes giving the error:
Error: OpenCL error detected: CL_INVALID_COMMAND_QUEUE. An error (2964) occured.

Did you check the NVIDIA driver version?
Maybe it is a problem with "old" drivers? New ones support OpenCL 1.2; I didn't test the app on OpenCL 1.1.

Scott Brown
Volunteer moderator
Project administrator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2382
ID: 1178
Credit: 17,991,226,846
RAC: 12,693,967

Message 87099 - Posted: 12 Aug 2015 | 17:47:54 UTC - in response to Message 87097.

Runs fine on a GTX 660 (Win7 Enterprise 64-bit, Haswell-based Xeon). However, a GTX 645 (same OS, i7-860) crashes giving the error:
Error: OpenCL error detected: CL_INVALID_COMMAND_QUEUE. An error (2964) occured.

Did you check the NVIDIA driver version?
Maybe it is a problem with "old" drivers? New ones support OpenCL 1.2; I didn't test the app on OpenCL 1.1.

The updated driver seems to be working on the GTX 645 now, but it is NOT an OpenCL 1.1 issue, as my GTX 660's slightly newer driver is also OpenCL 1.1.

Tyler
Project administrator
Volunteer tester

Joined: 4 Dec 12
Posts: 1078
ID: 183129
Credit: 1,376,122,338
RAC: 4,719

Message 87100 - Posted: 12 Aug 2015 | 17:48:34 UTC
Last modified: 12 Aug 2015 | 17:54:19 UTC

Works fine on an NVIDIA GTX 760 under Win 7 Pro x64. The driver is 347.52, reporting OpenCL 1.1.

Command line: geneferocl2_windows.exe -q 3149688^32768+1
Priority change succeeded.
Testing 3149688^32768+1...
Using OCL2 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 760', version 'OpenCL 1.1 CUDA' and driver '347.52'.
Starting initialization...
Initialization complete (0.037 seconds).
Estimated time remaining for 3149688^32768+1 is 0:06:50
3149688^32768+1 is a probable prime. (212936 digits) (err = 0.0005) (time = 0:06:30) 11:50:53

____________

275*2^3585539+1 is prime!!! (1079358 digits)

Proud member of Aggie the Pew

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87105 - Posted: 12 Aug 2015 | 18:07:59 UTC - in response to Message 87099.

The updated driver seems to be working on the GTX 645 now, but it is NOT an OpenCL 1.1 issue, as my GTX 660's slightly newer driver is also OpenCL 1.1.

I'm running Windows 10 :o) and cannot install a driver older than 352 :o(

Intel's OpenCL runs on my CPU but not on my HD Graphics 4000: "maxErr exceeded for 100234^64+1, 0.5000 > 0.4500 during final check". I will look into this.

It seems to run on 347.52 but not on 344.75...

Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87108 - Posted: 12 Aug 2015 | 18:22:20 UTC

Yves, the limit check ("-l") seems to be reporting much lower limits than OCL2 can actually process.

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87109 - Posted: 12 Aug 2015 | 18:22:51 UTC - in response to Message 87094.

Minor: note that "occured" in the error message is misspelled.

Thanks, committed!

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87111 - Posted: 12 Aug 2015 | 18:29:52 UTC - in response to Message 87108.

Yves, the limit check ("-l") seems to be reporting much lower limits than OCL2 can actually process.

Yes, the test doesn't work if we start with a large b, because with fixed-point arithmetic the error is not continuous: round-off error is continuous, but integer overflow may occur too. I'm working on finding the true limits.
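Why overflow breaks continuity can be seen with a toy model: fixed-width integers wrap around, so once an intermediate value crosses the word limit, the error jumps by the whole word range instead of growing gradually. The Python sketch below uses a hypothetical 16-bit word for readability; it models two's-complement wrap-around in general, not geneferocl2's actual word sizes or kernels.

```python
def wrap(x: int, bits: int) -> int:
    """Reduce x to a signed two's-complement integer of the given width,
    which is what fixed-width hardware integers do on overflow."""
    m = 1 << bits
    x &= m - 1
    return x - m if x >= (m >> 1) else x

BITS = 16  # toy word size chosen for readability (an assumption)

for true_value in (32000, 32767, 32768, 33000):
    stored = wrap(true_value, BITS)
    print(true_value, stored, "error:", true_value - stored)
```

Up to 32767 the stored value is exact; one step further the error is 65536 (2^16), with nothing in between, so probing from a large b says little about where the limit lies.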

Does it run on a GTX 580 with a new driver?

Honza
Volunteer moderator
Volunteer tester
Project scientist

Joined: 15 Aug 05
Posts: 1952
ID: 352
Credit: 6,017,981,497
RAC: 1,586,802

Message 87112 - Posted: 12 Aug 2015 | 19:02:22 UTC - in response to Message 87111.

Does it run on a GTX 580 with a new driver?

Yes Yves, it does.

Running benchmarks for transform implementation "OCL2"

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 580', version 'OpenCL 1.1 CUDA' and driver '353.62'.

2199064^8192+1 Time: 176 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 203 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 251 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 512 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.07 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.96 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.97 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.24 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 16 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 37.3 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 79.1 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 12.
Priority change succeeded.

____________
My stats

JimB
Honorary cruncher

Joined: 4 Aug 11
Posts: 918
ID: 107307
Credit: 977,945,376
RAC: 0

Message 87122 - Posted: 12 Aug 2015 | 20:10:12 UTC
Last modified: 13 Aug 2015 | 22:38:12 UTC

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 570', version 'OpenCL 1.1 CUDA' and driver '347.25'.

Max b seems to be:

32768 between 16.9M-16.95M
65536 between 11.6M-11.7M
131072 between 7.7M-8M
262144 between 5.5M-6M
524288 between 3M-4M

(I'll keep updating this post as I keep testing)
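Narrowing ranges like these is a bisection. A minimal Python sketch, assuming a hypothetical `passes(b)` predicate (for example, a wrapper that runs one test and reports whether maxErr was exceeded) and assuming results are monotone in b. Given the remark above that fixed-point errors are not continuous, any result should be confirmed with direct tests near the reported limit.

```python
from typing import Callable

def bisect_max_b(passes: Callable[[int], bool], lo: int, hi: int, step: int = 1) -> int:
    """Narrow [lo, hi] until hi - lo <= step and return lo, the largest b known to pass.

    Requires passes(lo) to be True and passes(hi) to be False; assumes every b
    below the true limit passes, which fixed-point overflow may violate.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    while hi - lo > step:
        mid = (lo + hi) // 2  # strictly between lo and hi because hi - lo >= 2
        if passes(mid):
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical stand-in for a real test run; the cut-off value is made up.
fake_limit = 16_925_000
result = bisect_max_b(lambda b: b < fake_limit, 16_900_000, 16_950_000, step=1_000)
print(result)
```

Each probe halves the interval, so narrowing a 50,000-wide range to 1,000 takes about six test runs.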

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87124 - Posted: 12 Aug 2015 | 20:21:28 UTC - in response to Message 87112.
Last modified: 12 Aug 2015 | 20:22:48 UTC

Does it run on a GTX 580 with a new driver?

Yes Yves, it does.

Thanks Mike, I will check why it doesn't run with driver '344.75' on a PC on Windows 7.

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 580', version 'OpenCL 1.1 CUDA' and driver '353.62'.

On my laptop, it is device 'GeForce GT 740M', version 'OpenCL 1.2 CUDA' and driver '353.62'.

I don't understand. Why is it OpenCL 1.2 on my computer and 1.1 on yours?

GTX 580: Genefer Mark = 12; GTX 680: Genefer Mark = 8.

GTX 580: 512 cores @ 772 (or 1544?) MHz.
GTX 680: 1536 cores @ 1006 MHz.

Why is the 580 50% faster?
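One way to put a number on the puzzle is nominal shader rate (cores x clock), taking 1544 MHz as the GTX 580 shader clock. This is plain arithmetic on the figures quoted in the post; it measures the gap but does not explain it.

```python
# Nominal figures quoted in the post; GTX 580 shader clock taken as 1544 MHz.
gpus = {
    "GTX 580": {"cores": 512, "mhz": 1544, "mark": 12},
    "GTX 680": {"cores": 1536, "mhz": 1006, "mark": 8},
}

for name, g in gpus.items():
    g["core_mhz"] = g["cores"] * g["mhz"]  # shader cores x clock, a crude peak-rate proxy
    print(f'{name}: {g["core_mhz"]:,} core-MHz, Genefer Mark {g["mark"]}')

raw_ratio = gpus["GTX 680"]["core_mhz"] / gpus["GTX 580"]["core_mhz"]  # ~1.95 in favour of the 680
mark_ratio = gpus["GTX 580"]["mark"] / gpus["GTX 680"]["mark"]         # 1.5 in favour of the 580
print(f"680/580 raw rate: {raw_ratio:.2f}, 580/680 Genefer Mark: {mark_ratio:.2f}")
```

Combined, the GTX 580 gets roughly 2.9 times as much OCL2 work done per nominal core-MHz (1.95 x 1.5).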

Honza
Volunteer moderator
Volunteer tester
Project scientist

Joined: 15 Aug 05
Posts: 1952
ID: 352
Credit: 6,017,981,497
RAC: 1,586,802

Message 87125 - Posted: 12 Aug 2015 | 20:25:18 UTC - in response to Message 87122.
Last modified: 12 Aug 2015 | 20:38:22 UTC

Max b seems to be:

32768 between 16.9M-16.95M
65536 between 11.5M-11.7M

Those are both within PRPNet's current range.
32768 is reaching 10M, and 65536 passed 4M a couple of days ago.
3966304^65536+1 is ~12 minutes on GTX580 vs ~66 mins on i5-4670.

Guess what: I'm in the process of selling my GTX580 box this week... and a new Genefer is on the scene.
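These run times can be sanity-checked with a rough model, which is an assumption rather than Genefer's own estimator: a PRP test of b^N+1 needs about N log2(b) squarings, each costing roughly one benchmark "mul" at that N. The per-mul times below are the 1203210^65536+1 lines posted in this thread for the GTX 580 (OCL2, 512 us) and the i5-4670K (x87, 2.82 ms).

```python
import math

def estimate_minutes(b: int, n: int, seconds_per_mul: float) -> float:
    """Rough PRP-test time: about n * log2(b) squarings, each costing one
    benchmark 'mul'. Ignores setup, error checks and extra multiplications."""
    squarings = n * math.log2(b)
    return squarings * seconds_per_mul / 60

gpu = estimate_minutes(3966304, 65536, 512e-6)   # GTX 580, OCL2
cpu = estimate_minutes(3966304, 65536, 2.82e-3)  # i5-4670K, x87
print(f"GTX 580 OCL2 ~{gpu:.0f} min, x87 ~{cpu:.0f} min")
```

The model gives about 12 and 68 minutes, consistent with the ~12 and ~66 minutes reported.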

Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87126 - Posted: 12 Aug 2015 | 20:33:33 UTC

GTX 580, latest non-beta drivers (353.62), Core i5-4670K. All stock clocks.

The two extended-precision transforms are OCL2 and x87.

GPU OCL:

2199064^8192+1 Time: 25.4 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 31.9 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 45 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 82.6 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 176 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 356 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 664 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.32 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 2.69 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 5.71 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 12.5 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 73.

GPU CUDA:
2199064^8192+1 Time: 160 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 177 us/mul. Err: 0.2344 102481 digits
1471094^32768+1 Time: 192 us/mul. Err: 0.2305 202102 digits
1203210^65536+1 Time: 265 us/mul. Err: 0.2188 398482 digits
984108^131072+1 Time: 425 us/mul. Err: 0.2344 785521 digits
804904^262144+1 Time: 694 us/mul. Err: 0.2188 1548156 digits
658332^524288+1 Time: 1.14 ms/mul. Err: 0.2031 3050541 digits
538452^1048576+1 Time: 1.97 ms/mul. Err: 0.1875 6009544 digits
440400^2097152+1 Time: 3.6 ms/mul. Err: 0.1953 11836006 digits
360204^4194304+1 Time: 7.09 ms/mul. Err: 0.1875 23305854 digits
294612^8388608+1 Time: 15.8 ms/mul. Err: 0.2031 45879398 digits
Genefer Mark = 55.

CPU FMA3:
6008024^256+1 Time: 0.822 us/mul. Err: 0.1484 1736 digits
4913974^512+1 Time: 1.36 us/mul. Err: 0.1562 3427 digits
4019150^1024+1 Time: 2.75 us/mul. Err: 0.1602 6763 digits
3287270^2048+1 Time: 5.72 us/mul. Err: 0.1406 13347 digits
2688666^4096+1 Time: 11.6 us/mul. Err: 0.1406 26336 digits
2199064^8192+1 Time: 25.6 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 51.8 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 118 us/mul. Err: 0.1641 202102 digits
1203210^65536+1 Time: 240 us/mul. Err: 0.1523 398482 digits
984108^131072+1 Time: 549 us/mul. Err: 0.1484 785521 digits
804904^262144+1 Time: 1.11 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 2.63 ms/mul. Err: 0.1445 3050541 digits
538452^1048576+1 Time: 5.49 ms/mul. Err: 0.1406 6009544 digits
440400^2097152+1 Time: 12.7 ms/mul. Err: 0.1328 11836006 digits
360204^4194304+1 Time: 25.8 ms/mul. Err: 0.1375 23305854 digits
294612^8388608+1 Time: 63.2 ms/mul. Err: 0.1289 45879398 digits
Genefer Mark = 16.

CPU AVX (Intel):
6008024^256+1 Time: 0.963 us/mul. Err: 0.1406 1736 digits
4913974^512+1 Time: 1.59 us/mul. Err: 0.1250 3427 digits
4019150^1024+1 Time: 3.34 us/mul. Err: 0.1250 6763 digits
3287270^2048+1 Time: 6.74 us/mul. Err: 0.1406 13347 digits
2688666^4096+1 Time: 14.1 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 30.2 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 62.5 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 137 us/mul. Err: 0.1641 202102 digits
1203210^65536+1 Time: 284 us/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 623 us/mul. Err: 0.1484 785521 digits
804904^262144+1 Time: 1.29 ms/mul. Err: 0.1484 1548156 digits
658332^524288+1 Time: 2.91 ms/mul. Err: 0.1406 3050541 digits
538452^1048576+1 Time: 6.2 ms/mul. Err: 0.1367 6009544 digits
440400^2097152+1 Time: 13.9 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 28.6 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 67.7 ms/mul. Err: 0.1328 45879398 digits
Genefer Mark = 14.

GPU OCL2:
2199064^8192+1 Time: 176 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 206 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 253 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 512 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.08 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.98 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.96 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.31 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 16.1 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 37.5 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 79.9 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 12.

CPU SSE4:
6008024^256+1 Time: 1.26 us/mul. Err: 0.1250 1736 digits
4913974^512+1 Time: 2.7 us/mul. Err: 0.1406 3427 digits
4019150^1024+1 Time: 5.51 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 12.2 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 24.4 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 51.8 us/mul. Err: 0.1719 51956 digits
1798620^16384+1 Time: 111 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 232 us/mul. Err: 0.1719 202102 digits
1203210^65536+1 Time: 496 us/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 1.04 ms/mul. Err: 0.1641 785521 digits
804904^262144+1 Time: 2.21 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 4.59 ms/mul. Err: 0.1484 3050541 digits
538452^1048576+1 Time: 10.2 ms/mul. Err: 0.1562 6009544 digits
440400^2097152+1 Time: 20.9 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 45.8 ms/mul. Err: 0.1250 23305854 digits
294612^8388608+1 Time: 93.2 ms/mul. Err: 0.1328 45879398 digits
Genefer Mark = 9.

CPU SSE2:
6008024^256+1 Time: 1.62 us/mul. Err: 0.1250 1736 digits
4913974^512+1 Time: 3.39 us/mul. Err: 0.1406 3427 digits
4019150^1024+1 Time: 6.96 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 14.2 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 29.6 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 62 us/mul. Err: 0.1719 51956 digits
1798620^16384+1 Time: 133 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 274 us/mul. Err: 0.1719 202102 digits
1203210^65536+1 Time: 579 us/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 1.2 ms/mul. Err: 0.1641 785521 digits
804904^262144+1 Time: 2.54 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 5.31 ms/mul. Err: 0.1484 3050541 digits
538452^1048576+1 Time: 11.4 ms/mul. Err: 0.1562 6009544 digits
440400^2097152+1 Time: 23.4 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 51 ms/mul. Err: 0.1250 23305854 digits
294612^8388608+1 Time: 104 ms/mul. Err: 0.1328 45879398 digits
Genefer Mark = 8.

CPU x87:
6008024^256+1 Time: 6.07 us/mul. Err: 0.0001 1736 digits
4913974^512+1 Time: 14.4 us/mul. Err: 0.0001 3427 digits
4019150^1024+1 Time: 30.2 us/mul. Err: 0.0001 6763 digits
3287270^2048+1 Time: 61.6 us/mul. Err: 0.0001 13347 digits
2688666^4096+1 Time: 134 us/mul. Err: 0.0001 26336 digits
2199064^8192+1 Time: 286 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 620 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.31 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.82 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 5.86 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 12.6 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 26.4 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 56.2 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 117 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 248 ms/mul. Err: 0.0001 23305854 digits

Bottom line: on this hardware, OCL2 is about 6 times faster than x87, but I'd be running 4 x87 tests simultaneously.
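The trade-off can be put in numbers using the 360204^4194304+1 lines above (37.5 ms/mul for OCL2, 248 ms/mul for x87). The sketch assumes four x87 tasks run concurrently, each at its single-task benchmark speed, and ignores memory contention and the CPU time the GPU task itself consumes:

```python
# ms/mul for 360204^4194304+1, copied from the benchmark lines above
ocl2_ms = 37.5   # one OCL2 task on the GTX 580
x87_ms = 248.0   # one x87 task on one core of the i5-4670K
cpu_tasks = 4    # assumption: four x87 tasks at once, each at single-task speed

per_task_speedup = x87_ms / ocl2_ms              # one OCL2 test vs one x87 test
throughput_ratio = per_task_speedup / cpu_tasks  # OCL2 vs four parallel x87 tests

print(f"per task: {per_task_speedup:.1f}x, total throughput: {throughput_ratio:.2f}x")
```

One OCL2 test is about 6.6 times faster than one x87 test, but against four parallel x87 tests the GPU is only about 1.65 times ahead in total throughput.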

Honza
Volunteer moderator
Volunteer tester
Project scientist

Joined: 15 Aug 05
Posts: 1952
ID: 352
Credit: 6,017,981,497
RAC: 1,586,802

Message 87128 - Posted: 12 Aug 2015 | 20:34:47 UTC - in response to Message 87124.

I don't understand. Why is it OpenCL 1.2 on my computer and 1.1 on yours?

GTX 580: 512 cores @ 772 (or 1544?) MHz.
GTX 680: 1536 cores @ 1006 MHz.

Why is the 580 50% faster?

Well, GPU-Z says 512 shaders clocked at 1544 MHz, with the core clocked at half that.
And OpenCL 1.1 applies to the whole GTX 5xx series.

The GTX 580 has a 384-bit bus; the GTX 680's is only 256-bit.
On the other hand, my box has a slower i5-661 (compared to the i7-3820), and Genefer is taking a whole core.

eXaPower

Joined: 30 Sep 13
Posts: 122
ID: 259902
Credit: 1,636,946,772
RAC: 0

Message 87130 - Posted: 12 Aug 2015 | 20:45:15 UTC
Last modified: 12 Aug 2015 | 21:12:53 UTC

Driver 352.78 (OpenCL 1.2, branch r352), transform implementation "OCL2" on NVIDIA Maxwell GPUs (compute capability 5.0/5.2) and an i5-4440S CPU @ 2.9 GHz:

GTX750 @ 1.412GHz:

Generalized Fermat Number Bench
2199064^8192+1 Time: 163 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 281 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 511 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 856 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 2.3 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 3.75 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 8.65 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 15.4 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 37.7 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 65.7 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 260 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 5.

GTX970 @ 1540MHz:

Generalized Fermat Number Bench
2199064^8192+1 Time: 174 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 181 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 288 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 385 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 943 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.62 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.52 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 6.58 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 14.9 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 28.4 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 64.1 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 14.

Ten runs produced the same result: the CPU OCL2 benchmark never completed the full suite; results stop at 398482 digits.

(Microcode revision 1C) 2.9GHz (i-5 4440) OpenCL (Build148) Driver 4.2.0.148:

Generalized Fermat Number Bench
2199064^8192+1 Time: 1.45 ms/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 3 ms/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 6.5 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 13.8 ms/mul. Err: 0.0001 398482 digits

The current microcode for my Ivy Bridge i5 is 1B. I've seen four different microcode revisions on my Haswell in the last couple of months: 19/1A/1B/1C.

The 970 peaked at 125 W during the OCL2 benchmark. The 970's bus usage was ~60% on a PCIe 3.0 x8 link; the 750's (also PCIe 3.0 x8) was ~50%. MCU load peaked at 30% for both GPUs.

Genefer x64 OCL benchmark (140W peak) 970 at 1540MHz:

Generalized Fermat Number Bench
2199064^8192+1 Time: 53.4 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 58.7 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 73.6 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 101 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 175 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 340 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 664 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.31 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 2.55 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 5.31 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 10.6 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 78.

GTX750 at 1412MHz:

Generalized Fermat Number Bench
2199064^8192+1 Time: 61.1 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 78.1 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 131 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 242 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 451 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 937 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 1.86 ms/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 3.98 ms/mul. Err: 0.1758 6009544 digits
440400^2097152+1 Time: 7.98 ms/mul. Err: 0.1680 11836006 digits
360204^4194304+1 Time: 17.1 ms/mul. Err: 0.1563 23305854 digits
294612^8388608+1 Time: 34.5 ms/mul. Err: 0.1563 45879398 digits
Genefer Mark = 25.

FMA3 Haswell 2.9GHz (C72133RAM) Genefer x64:

Generalized Fermat Number Bench
6008024^256+1 Time: 1.16 us/mul. Err: 0.1484 1736 digits
4913974^512+1 Time: 1.44 us/mul. Err: 0.1562 3427 digits
4019150^1024+1 Time: 2.93 us/mul. Err: 0.1602 6763 digits
3287270^2048+1 Time: 6.12 us/mul. Err: 0.1406 13347 digits
2688666^4096+1 Time: 12.2 us/mul. Err: 0.1406 26336 digits
2199064^8192+1 Time: 27.5 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 53.9 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 123 us/mul. Err: 0.1641 202102 digits
1203210^65536+1 Time: 250 us/mul. Err: 0.1523 398482 digits
984108^131072+1 Time: 569 us/mul. Err: 0.1484 785521 digits
804904^262144+1 Time: 1.14 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 2.69 ms/mul. Err: 0.1445 3050541 digits
538452^1048576+1 Time: 5.51 ms/mul. Err: 0.1406 6009544 digits
440400^2097152+1 Time: 12.6 ms/mul. Err: 0.1328 11836006 digits
360204^4194304+1 Time: 25.6 ms/mul. Err: 0.1375 23305854 digits
294612^8388608+1 Time: 60.9 ms/mul. Err: 0.1289 45879398 digits
Genefer Mark = 16.

Are the OCL2 and OCL Genefer Marks comparable? Does each program have a separate rating design?

OCL2 (GTX750) 'b' limits:

Generalized Fermat Number b Limits
The upper bound m = 8192, b = 6685000, Err = 0.0009
Starting b = 6690000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 16384, b = 5435000, Err = 0.0011
Starting b = 5440000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 32768, b = 4405000, Err = 0.0008
Starting b = 4410000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 65536, b = 3575000, Err = 0.0009
Starting b = 3580000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 131072, b = 2905000, Err = 0.0013
Starting b = 2910000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 262144, b = 2355000, Err = 0.0008
Starting b = 2360000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 524288, b = 1915000, Err = 0.0008
Starting b = 1920000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 1048576, b = 1555000, Err = 0.0012
Starting b = 1560000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 2097152, b = 1255000, Err = 0.0008
Starting b = 1260000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 4194304, b = 1025000, Err = 0.0006
Starting b = 1030000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 8388608, b = 825000, Err = 0.0008
Starting b = 830000, Err b = 0, Err = 0.0000, 5 Err b = 0

OCL2 (GTX970) 'b' limits:

Generalized Fermat Number b Limits
The upper bound m = 8192, b = 6685000, Err = 0.0009
Starting b = 6690000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 16384, b = 5435000, Err = 0.0010
Starting b = 5440000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 32768, b = 4405000, Err = 0.0008
Starting b = 4410000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 65536, b = 3575000, Err = 0.0009
Starting b = 3580000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 131072, b = 2905000, Err = 0.0012
Starting b = 2910000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 262144, b = 2355000, Err = 0.0008
Starting b = 2360000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 524288, b = 1915000, Err = 0.0008
Starting b = 1920000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 1048576, b = 1555000, Err = 0.0008
Starting b = 1560000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 2097152, b = 1255000, Err = 0.0007
Starting b = 1260000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 4194304, b = 1025000, Err = 0.0006
Starting b = 1030000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 8388608, b = 825000, Err = 0.0007
Starting b = 830000, Err b = 0, Err = 0.0000, 5 Err b = 0

OCL2 (i-5 4440S) 'b' limits:

Generalized Fermat Number b Limits
The upper bound m = 8192, b = 6685000, Err = 0.0009
Starting b = 6690000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 16384, b = 5435000, Err = 0.0011
Starting b = 5440000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 32768, b = 4405000, Err = 0.0009
Starting b = 4410000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 65536, b = 3575000, Err = 0.0010
Starting b = 3580000, Err b = 0, Err = 0.0000, 5 Err b = 0

Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87132 - Posted: 12 Aug 2015 | 20:55:38 UTC - in response to Message 87130.

Are the OCL2 and OCL Genefer Marks comparable? Does each program have a separate rating design?

If I remember correctly, they're all the same metric, except that GPU tasks measure elapsed time and CPU tasks measure CPU time. The Genefer Mark scores should therefore be comparable between any of the programs. That's the reason for it.
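The difference between the two clocks can be illustrated with Python's standard library (this is not Genefer's timing code): `time.perf_counter` measures elapsed wall-clock time, while `time.process_time` counts only the CPU time charged to the process. The helper name `timed` is hypothetical.

```python
import time

def timed(fn, *args):
    """Return (result, elapsed_seconds, cpu_seconds) for one call of fn.

    perf_counter measures wall-clock time; process_time counts only the
    CPU time used by this process.
    """
    t0, c0 = time.perf_counter(), time.process_time()
    result = fn(*args)
    return result, time.perf_counter() - t0, time.process_time() - c0

# A call that mostly waits shows elapsed time well above CPU time.
_, elapsed, cpu = timed(time.sleep, 0.2)
print(f"elapsed {elapsed:.2f} s, cpu {cpu:.2f} s")
```

For a task that keeps a core fully busy the two numbers are close; for one that waits on another device they diverge, which is why the choice of clock matters when comparing programs.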

Iain Bethune
Honorary cruncher

Joined: 28 Jan 09
Posts: 1588
ID: 34775
Credit: 194,056,043
RAC: 0

Message 87134 - Posted: 12 Aug 2015 | 21:25:48 UTC

Thanks Yves, it looks like it will be useful when we reach beyond the limits of the current application. I built the new code on my Mac with a FirePro D700, but it fails all tests:

Running tests for transform implementation "OCL2"
Testing 10234^64+1...
Using OCL2 transform
Running on platform 'Apple', device 'ATI Radeon HD - FirePro D700 Compute Engine', version 'OpenCL 1.2 ' and driver '1.2 (Jun 10 2015 16:27:05)'.
Starting initialization...
Initialization complete (0.000 seconds).
Testing 10234^64+1... 851 steps to go
maxErr exceeded for 10234^64+1, 0.5000 > 0.4500 during final check

However, it is working fine on my Linux machine with a Tesla K20m (Kepler-series) GPU:

Running benchmarks for transform implementation "OCL2"
Running on platform 'NVIDIA CUDA', device 'Tesla K20m', version 'OpenCL 1.1 CUDA' and driver '346.46'.
2199064^8192+1 Time: 198 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 220 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 344 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 636 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.17 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 2.3 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 4.56 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 9.4 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 20.4 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 42.9 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 87.9 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 10.

Running benchmarks for transform implementation "OCL"
Running on platform 'NVIDIA CUDA', device 'Tesla K20m', version 'OpenCL 1.1 CUDA' and driver '346.46'.
2199064^8192+1 Time: 42 us/mul. Err: 0.2031 51956 digits
1798620^16384+1 Time: 53.6 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 51.1 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 67.1 us/mul. Err: 0.1719 398482 digits
984108^131072+1 Time: 115 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 276 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 525 us/mul. Err: 0.1875 3050541 digits
538452^1048576+1 Time: 1.02 ms/mul. Err: 0.1680 6009544 digits
440400^2097152+1 Time: 1.99 ms/mul. Err: 0.1855 11836006 digits
360204^4194304+1 Time: 4.12 ms/mul. Err: 0.1641 23305854 digits
294612^8388608+1 Time: 8.65 ms/mul. Err: 0.1641 45879398 digits
Genefer Mark = 100.

For this setup, OCL2 is about 10 times slower than OCL, but with much improved accuracy...

I will see what I can do to debug the mac application.

- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime!

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 909
ID: 370496
Credit: 531,793,905
RAC: 407,603

Message 87136 - Posted: 12 Aug 2015 | 21:37:52 UTC
Last modified: 12 Aug 2015 | 21:38:42 UTC

Well, I guess what I'm gonna say was to be expected, but regardless, it doesn't seem to work on a Radeon HD 4350 with a Pentium E2180.

C:\Users\Paulo\Desktop\Arquivos\Setups>geneferocl2_windows -b
geneferocl2 3.2.9-dev (Windows/OpenCL/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.

Command line: geneferocl2_windows -b

Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"

Running on platform 'AMD Accelerated Parallel Processing', device 'ATI RV710', version 'OpenCL 1.0 AMD-APP (937.2)' and driver 'CAL 1.4.1734'.

Error: OpenCL error detected: CL_BUILD_PROGRAM_FAILURE.

Note: using -l seems to produce error 2989.

I guess I'll just stick to PPS Sieve then. I'll run the tests on my GTX 970 once I get to my other house tomorrow.

Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87143 - Posted: 13 Aug 2015 | 1:06:40 UTC - in response to Message 87136.

Well, I guess what I'm gonna say was to be expected, but regardless, it seems to not work on a Radeon 4350 with a pentium E2180.

Not expected, actually.

This is probably fixable; the requirements for OCL2 are lower than those for OCL. Double-precision hardware isn't required for OCL2, so OCL2 can run on a wider variety of GPUs than OCL or CUDA can.
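To make "no double precision needed" concrete, here is a minimal sketch of Q0.63 fixed-point arithmetic of the kind described for the OCL2 sin/cos table. It is not Genefer's kernel code: the function names are hypothetical, Python's unbounded integers stand in for the double-width product a GPU kernel would assemble from 64-bit pieces, and the table is assumed to be built once on the host in double precision. After that, the GPU side only needs integer multiplies and shifts.

```python
import math

FRAC_BITS = 63
Q_MAX = (1 << 63) - 1   # largest signed 64-bit integer, just below 1.0 in Q0.63
Q_MIN = -(1 << 63)      # exactly -1.0 in Q0.63

def to_q063(x: float) -> int:
    """Encode x in [-1, 1] as a Q0.63 integer, saturating +1.0 to Q_MAX."""
    v = round(x * (1 << FRAC_BITS))
    return max(Q_MIN, min(Q_MAX, v))

def from_q063(v: int) -> float:
    """Decode a Q0.63 integer back to a float (for inspection only)."""
    return v / (1 << FRAC_BITS)

def q063_mul(a: int, b: int) -> int:
    """Q0.63 x Q0.63 -> Q0.63: form the full double-width product, then drop 63 fraction bits."""
    return (a * b) >> FRAC_BITS  # arithmetic shift, i.e. rounds toward -infinity

def twiddle(k: int, n: int) -> tuple[int, int]:
    """Hypothetical table entry: Q0.63 (cos, sin) of the root of unity exp(-2*pi*i*k/n)."""
    angle = -2.0 * math.pi * k / n
    return to_q063(math.cos(angle)), to_q063(math.sin(angle))

# Once the table exists, |w|^2 = cos^2 + sin^2 is computed with integer ops only.
c, s = twiddle(1, 8)
print(from_q063(q063_mul(c, c) + q063_mul(s, s)))
```

The saturation in `to_q063` is needed because +1.0 itself is not representable in a signed Q0.63 word.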

____________
My lucky number is 75898^524288+1

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 909
ID: 370496
Credit: 531,793,905
RAC: 407,603

Message 87144 - Posted: 13 Aug 2015 | 1:17:23 UTC - in response to Message 87143.

Well, I guess what I'm gonna say was to be expected, but regardless, it seems to not work on a Radeon 4350 with a pentium E2180.

Not expected, actually.

This is probably fixable; the requirements for OCL2 are less than for OCL. Double precision hardware isn't required for OCL2, so OCL2 can run on a wider variety of GPUs than can OCL or CUDA.

The dream lives on, I suppose...

I'll be happy to give it another go if the app changes, or to tinker with any settings that might fix it, if anyone knows of any.

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87150 - Posted: 13 Aug 2015 | 10:05:58 UTC

Resuming from a checkpoint is bugged...

pschoefer
Volunteer developer
Volunteer tester

Joined: 20 Sep 05
Posts: 685
ID: 845
Credit: 2,886,414,412
RAC: 77,022

Message 87153 - Posted: 13 Aug 2015 | 13:28:04 UTC

AMD R9-280X, Windows 7 x64, Catalyst 15.7.1

Running benchmarks for transform implementation "OCL"
Running on platform 'AMD Accelerated Parallel Processing', device 'Tahiti', version 'OpenCL 1.2 AMD-APP (1800.8)' and driver '1800.8 (VM)'.
2199064^8192+1 Time: 44.3 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 47.7 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 53.5 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 77.4 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 122 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 282 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 465 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 897 us/mul. Err: 0.1758 6009544 digits
440400^2097152+1 Time: 1.73 ms/mul. Err: 0.1680 11836006 digits
360204^4194304+1 Time: 3.42 ms/mul. Err: 0.1563 23305854 digits
294612^8388608+1 Time: 7 ms/mul. Err: 0.1719 45879398 digits
Genefer Mark = 118.

Running benchmarks for transform implementation "OCL2"
Running on platform 'AMD Accelerated Parallel Processing', device 'Tahiti', version 'OpenCL 1.2 AMD-APP (1800.8)' and driver '1800.8 (VM)'.
2199064^8192+1 Time: 441 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 507 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 862 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 1.49 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 2.57 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 4.46 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 9.38 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 19.4 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 27 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 54.9 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 128 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 7.

geneferocl2_windows.exe -q "3149688^32768+1"
3149688^32768+1 is a probable prime. (212936 digits) (err = 0.0005) (time = 0:10:27)

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 909
ID: 370496
Credit: 531,793,905
RAC: 407,603

Message 87167 - Posted: 14 Aug 2015 | 4:50:13 UTC
Last modified: 14 Aug 2015 | 5:41:58 UTC

A6 3500, 2x8GB DDR3, GTX 970, driver 355.60.

Just cuz "why not?", I included the benchmarks with the GPU at stock 1329/3005 MHz core/mem and overclocked to 1501/3505 MHz, in case they are of any interest.

I couldn't test genefercuda, as I'm getting an error message when running it: "The program can't start because cudart32_55.dll is missing from your computer". Any solution for that?

Also, my x87 failed when testing 3149688^32768+1 (it did fine on benchmarking, though).

EDIT: By coincidence, I found out that I have a 2nd computer, which also has a Radeon HD 4350. Just as with my previous message, it showed the same error and was unable to run the test.

OCL2 (Stock)

Running benchmarks for transform implementation "OCL2"

2199064^8192+1 Time: 194 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 207 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 237 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 439 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.08 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.83 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.74 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 7.46 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 15.6 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 33.2 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 74 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 13.

OCL2 (OCed)
Running benchmarks for transform implementation "OCL2"

2199064^8192+1 Time: 181 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 182 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 213 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 390 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 962 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.63 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.29 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 6.62 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 14.1 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 28.6 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 65.5 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 14.

OCL (Stock)
Running benchmarks for transform implementation "OCL"

2199064^8192+1 Time: 54.4 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 62.9 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 81.2 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 116 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 201 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 387 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 757 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.51 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 2.94 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 6.14 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 12.3 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 68.

OCL (OCed)
Running benchmarks for transform implementation "OCL"

2199064^8192+1 Time: 53.5 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 58.8 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 73.9 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 103 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 181 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 343 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 676 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.33 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 2.63 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 5.45 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 11 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 76.

CPU
Running benchmarks for transform implementation "Default"
6008024^256+1 Time: 4.61 us/mul. Err: 0.1562 1736 digits
4913974^512+1 Time: 9.65 us/mul. Err: 0.1406 3427 digits
4019150^1024+1 Time: 20.3 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 42.4 us/mul. Err: 0.1719 13347 digits
2688666^4096+1 Time: 91.6 us/mul. Err: 0.1875 26336 digits
2199064^8192+1 Time: 194 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 464 us/mul. Err: 0.2188 102481 digits
1471094^32768+1 Time: 987 us/mul. Err: 0.2031 202102 digits
1203210^65536+1 Time: 2.28 ms/mul. Err: 0.2188 398482 digits
984108^131072+1 Time: 4.87 ms/mul. Err: 0.2031 785521 digits
804904^262144+1 Time: 11.2 ms/mul. Err: 0.2031 1548156 digits
658332^524288+1 Time: 24.4 ms/mul. Err: 0.2031 3050541 digits
538452^1048576+1 Time: 54 ms/mul. Err: 0.1914 6009544 digits
440400^2097152+1 Time: 114 ms/mul. Err: 0.1738 11836006 digits
360204^4194304+1 Time: 247 ms/mul. Err: 0.1875 23305854 digits

SSE2
Running benchmarks for transform implementation "SSE2"
6008024^256+1 Time: 2.67 us/mul. Err: 0.1250 1736 digits
4913974^512+1 Time: 5.66 us/mul. Err: 0.1406 3427 digits
4019150^1024+1 Time: 11.4 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 23.4 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 48.8 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 101 us/mul. Err: 0.1719 51956 digits
1798620^16384+1 Time: 216 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 447 us/mul. Err: 0.1719 202102 digits
1203210^65536+1 Time: 1.33 ms/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 2.79 ms/mul. Err: 0.1641 785521 digits
804904^262144+1 Time: 7.68 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 15.4 ms/mul. Err: 0.1484 3050541 digits
538452^1048576+1 Time: 40.5 ms/mul. Err: 0.1562 6009544 digits
440400^2097152+1 Time: 81.1 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 203 ms/mul. Err: 0.1328 23305854 digits
294612^8388608+1 Time: 403 ms/mul. Err: 0.1406 45879398 digits
Genefer Mark = 2.

x87
Running benchmarks for transform implementation "x87 (80-bit)"
6008024^256+1 Time: 12.8 us/mul. Err: 0.0001 1736 digits
4913974^512+1 Time: 28.4 us/mul. Err: 0.0001 3427 digits
4019150^1024+1 Time: 61.7 us/mul. Err: 0.0001 6763 digits
3287270^2048+1 Time: 132 us/mul. Err: 0.0001 13347 digits
2688666^4096+1 Time: 288 us/mul. Err: 0.0001 26336 digits
2199064^8192+1 Time: 737 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 1.57 ms/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 3.86 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 8.19 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 19.5 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 41.2 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 93.2 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 194 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 432 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 894 ms/mul. Err: 0.0001 23305854 digits

Testing 3149688^32768+1...
OCL / CPU / SSE2 / x87
maxErr exceeded for 3149688^32768+1, 0.5000 > 0.4500

OCL2 (Stock)
3149688^32768+1 is a probable prime. (212936 digits) (err = 0.0005) (time = 0:02:45) 01:52:36

OCL2 (OCed)
3149688^32768+1 is a probable prime. (212936 digits) (err = 0.0005) (time = 0:02:49) 01:40:51

JimB
Honorary cruncher

Joined: 4 Aug 11
Posts: 918
ID: 107307
Credit: 977,945,376
RAC: 0

Message 87170 - Posted: 14 Aug 2015 | 8:25:01 UTC - in response to Message 87167.
Last modified: 14 Aug 2015 | 14:00:42 UTC

I couldn't test genefercuda, as I'm getting an error message when running genefercuda. "The program can't start because cudar32_55dll is missing from your computer". Any solution for that?

cudart32_55.dll
cufft32_55.dll

In answer to the question below, put them in the same directory as the genefercuda program.

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 909
ID: 370496
Credit: 531,793,905
RAC: 407,603

Message 87180 - Posted: 14 Aug 2015 | 13:46:10 UTC - in response to Message 87170.

I couldn't test genefercuda, as I'm getting an error message when running genefercuda. "The program can't start because cudar32_55dll is missing from your computer". Any solution for that?

cudart32_55.dll

Forgot to mention, the cufft32_55.dll is missing as well.

Also, where am I supposed to put such files?

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87247 - Posted: 17 Aug 2015 | 13:11:34 UTC

A new version of "geneferocl2" is available from Genefer repository.

It performs about 50% faster than the previous version on a GTX 680.
It was tested on a GTX 275 with NVIDIA driver 320.49.
The bug when resuming from a checkpoint is fixed.
The b limits were extended: they are similar to genefer80 limits:

Generalized Fermat Number b limits for transform implementation "OCL2"
m = 8192, maxErr(b = 125.40M) = 0.2497, maxErr(b = 125.41M) = 0.2752, maxErr(b = 157.76M) = 0.3984, maxErr(b = 157.77M) = 0.4232
m = 16384, maxErr(b = 102.36M) = 0.2638, maxErr(b = 102.37M) = 0.2789, maxErr(b = 124.64M) = 0.3669, maxErr(b = 124.65M) = 0.4137
m = 32768, maxErr(b = 81.66M) = 0.2567, maxErr(b = 81.67M) = 0.2855, maxErr(b = 95.51M) = 0.3840, maxErr(b = 95.52M) = 0.4312
m = 65536, maxErr(b = 64.00M) = 0.2287, maxErr(b = 64.01M) = 0.2885, maxErr(b = 81.66M) = 0.3758, maxErr(b = 81.67M) = 0.4217
m = 131072, maxErr(b = 51.03M) = 0.2489, maxErr(b = 51.04M) = 0.2810, maxErr(b = 60.42M) = 0.3172, maxErr(b = 60.43M) = 0.4365
m = 262144, maxErr(b = 45.07M) = 0.2607, maxErr(b = 45.08M) = 0.2757, maxErr(b = 50.78M) = 0.3181, maxErr(b = 50.79M) = 0.4282
m = 524288, maxErr(b = 34.86M) = 0.2320, maxErr(b = 34.87M) = 0.2717, maxErr(b = 41.34M) = 0.3326, maxErr(b = 41.35M) = 0.4123
m = 1048576, maxErr(b = 28.61M) = 0.2427, maxErr(b = 28.62M) = 0.3263, maxErr(b = 35.01M) = 0.3796, maxErr(b = 35.02M) = 0.4345
m = 2097152, maxErr(b = 23.82M) = 0.2611, maxErr(b = 23.83M) = 0.2742, maxErr(b = 28.30M) = 0.3888, maxErr(b = 28.31M) = 0.4547
m = 4194304, maxErr(b = 19.38M) = 0.2482, maxErr(b = 19.39M) = 0.2926, maxErr(b = 22.46M) = 0.3668, maxErr(b = 22.47M) = 0.4274
m = 8388608, maxErr(b = 16.03M) = 0.2524, maxErr(b = 16.04M) = 0.3122, maxErr(b = 19.38M) = 0.3738, maxErr(b = 19.39M) = 0.4348

The limit check was rewritten: a pessimistic and an optimistic limit are printed. A quick test cannot be more accurate (I think).
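One way to use the two printed limits, as a minimal sketch under my own reading of the OCL2 b-limits table (not Genefer's code): for each m, take the "pessimistic" limit as the last b before the first jump in maxErr and the "optimistic" limit as the last b before the second jump, at the printed 0.01M resolution. The labels "safe", "uncertain" and "over" are hypothetical.

```python
# (pessimistic b, optimistic b) per transform length m, transcribed from the
# OCL2 b-limits table in this post (values printed in units of 0.01M).
OCL2_LIMITS = {
    8192: (125_400_000, 157_760_000),
    16384: (102_360_000, 124_640_000),
    32768: (81_660_000, 95_510_000),
    65536: (64_000_000, 81_660_000),
    131072: (51_030_000, 60_420_000),
    262144: (45_070_000, 50_780_000),
    524288: (34_860_000, 41_340_000),
    1048576: (28_610_000, 35_010_000),
    2097152: (23_820_000, 28_300_000),
    4194304: (19_380_000, 22_460_000),
    8388608: (16_030_000, 19_380_000),
}

def classify_b(m: int, b: int) -> str:
    """Place b relative to the pessimistic and optimistic OCL2 limits for length m."""
    pessimistic, optimistic = OCL2_LIMITS[m]  # KeyError for an m not in the table
    if b <= pessimistic:
        return "safe"
    if b <= optimistic:
        return "uncertain"
    return "over"

print(classify_b(131072, 42_597_774))
```

Anything in the "uncertain" band would, under this reading, need a full run with maxErr monitoring rather than trusting the quick test.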

Note that inline assembler is necessary to get a performance boost because OpenCL has no "add with carry" instruction. It was done on NVIDIA but not on ATI (I don't know if it is possible and have no ATI card).
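To illustrate the add-with-carry point, here is a sketch (Python with explicit 32-bit masking, not OpenCL) of what a portable kernel must do to add multi-limb integers: each limb needs extra compares just to recover the carry. NVIDIA's PTX has `add.cc.u32` / `addc.u32`, which propagate the carry in hardware, and is presumably what the inline assembler relies on. The 32-bit limb width and the function names are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF

def add_with_carry(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Emulate a 32-bit add-with-carry using only plain adds and compares."""
    s = (a + b) & MASK32
    c1 = 1 if s < a else 0          # wrapped around while adding b
    s2 = (s + carry_in) & MASK32
    c2 = 1 if s2 < s else 0         # wrapped around while adding the incoming carry
    return s2, c1 | c2              # at most one of c1, c2 can be set

def add_multiword(x: list[int], y: list[int]) -> tuple[list[int], int]:
    """Add two equal-length little-endian lists of 32-bit limbs; return (limbs, carry_out)."""
    out, carry = [], 0
    for a, b in zip(x, y):
        s, carry = add_with_carry(a, b, carry)
        out.append(s)
    return out, carry

print(add_multiword([0xFFFFFFFF, 0xFFFFFFFF], [1, 0]))  # ([0, 0], 1)
```

With a hardware carry flag, the two compares and the OR per limb disappear, which matters when the fixed-point words span several limbs.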

Yves

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 909
ID: 370496
Credit: 531,793,905
RAC: 407,603

Message 87248 - Posted: 17 Aug 2015 | 13:25:47 UTC - in response to Message 87247.
Last modified: 17 Aug 2015 | 13:27:10 UTC

A new version of "geneferocl2" is available from Genefer repository.
(...)
Yves

Results for my GTX 970 will have to wait, as I'll only get home by Friday.

In the meantime, my Radeon HD 4350 still doesn't work, though this time around it gave off A BUNCH of info. Maybe this will be of use to you.

All command lines give out the following results, except for -l, which also adds the line "An error (2989) occurred."

Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"

Running on platform 'AMD Accelerated Parallel Processing', device 'ATI RV710', version 'OpenCL 1.0 AMD-APP (937.2)' and driver 'CAL 1.4.1734'.

Error: build program failed.
"C:\Users\Paulo\AppData\Local\Temp\OCL8383.tmp.cl", line 687: warning: unknown
attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(8 / 4 * BLK8, 1, 1)))
^

"C:\Users\Paulo\AppData\Local\Temp\OCL8383.tmp.cl", line 712: warning: unknown
attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(16 / 4 * BLK16, 1, 1)))
^

"C:\Users\Paulo\AppData\Local\Temp\OCL8383.tmp.cl", line 737: warning: unknown
attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(32 / 4 * BLK32, 1, 1)))
^

"C:\Users\Paulo\AppData\Local\Temp\OCL8383.tmp.cl", line 766: warning: unknown
attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(64 / 4 * BLK64, 1, 1)))
^

"C:\Users\Paulo\AppData\Local\Temp\OCL8383.tmp.cl", line 795: warning: unknown
attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(128 / 4 * BLK128, 1, 1)))
^

"C:\Users\Paulo\AppData\Local\Temp\OCL8383.tmp.cl", line 828: warning: unknown
attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(256 / 4 * BLK256, 1, 1)))
^

Error: Requested compile size is bigger than the required workgroup size of 32 elements
Error: Creating kernel Forward64 failed!

Error: OpenCL error detected: CL_BUILD_PROGRAM_FAILURE.
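Reading the log, the kernels hint work-group sizes of radix/4 * BLK<radix>, while the compiler reports a required size of 32 elements. Assuming that reading is right (it is a guess from the log, not from the Genefer source), a host could choose the block factor per radix from the device limit, for example the value returned by `clGetDeviceInfo` for `CL_DEVICE_MAX_WORK_GROUP_SIZE`. A sketch of that arithmetic, with hypothetical helper names:

```python
def work_group_size(radix: int, blk: int) -> int:
    """Work-group size implied by the hint expression radix / 4 * BLK<radix> in the log."""
    return radix // 4 * blk

def largest_blk(radix: int, max_wg_size: int) -> int:
    """Largest power-of-two BLK whose work-group size fits max_wg_size; 0 if even BLK = 1 does not."""
    if work_group_size(radix, 1) > max_wg_size:
        return 0
    blk = 1
    while work_group_size(radix, blk * 2) <= max_wg_size:
        blk *= 2
    return blk

# With the 32-element limit reported in the RV710 log:
for radix in (8, 16, 32, 64, 128, 256):
    print(radix, largest_blk(radix, 32))
```

Under this assumption, the radix-256 kernel would need at least 64 work-items even with BLK = 1, so on such a device it could not be built at all and a smaller radix would have to be used instead.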

Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87249 - Posted: 17 Aug 2015 | 14:23:35 UTC

Just so everyone knows, behind the scenes there's been a lot of discussion and work on what happens next with GFN. OCL2 will probably play a part in that. We're not ready to discuss anything yet, but I want you to know we're not ignoring the OCL2 discussion.

eXaPower

Joined: 30 Sep 13
Posts: 122
ID: 259902
Credit: 1,636,946,772
RAC: 0

Message 87250 - Posted: 17 Aug 2015 | 14:33:43 UTC
Last modified: 17 Aug 2015 | 14:34:28 UTC

A quick benchmark comparison -- driver 352.78
GTX970 @ 1540MHz, error check failed (version 2 OCL2):

2199064^8192+1 Time: 157 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 163 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 176 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 331 us/mul. Err: 0.5000 398482 digits
984108^131072+1 Time: 721 us/mul. Err: 0.5000 785521 digits
804904^262144+1 Time: 1.48 ms/mul. Err: 0.5000 1548156 digits
658332^524288+1 Time: 3.04 ms/mul. Err: 0.5000 3050541 digits
538452^1048576+1 Time: 6.14 ms/mul. Err: 0.5000 6009544 digits
440400^2097152+1 Time: 13.1 ms/mul. Err: 0.5000 11836006 digits
360204^4194304+1 Time: 26.2 ms/mul. Err: 0.5000 23305854 digits
294612^8388608+1 Time: 55.3 ms/mul. Err: 0.5000 45879398 digits
Genefer Mark = 16.

(ver.2 OCL2) GTX970 @ 1501MHz

2199064^8192+1 Time: 161 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 164 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 179 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 339 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 735 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.5 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.09 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 6.23 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 13.2 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 26.4 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 55.7 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 16.

(ver.2 OCL2) GTX750 @ 1412MHz:

2199064^8192+1 Time: 139 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 240 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 372 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 696 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.54 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 2.99 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 5.96 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 12.2 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 25.4 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 53.5 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 111 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 8.

(ver.1 OCL2) GTX750 @ 1412MHz:

2199064^8192+1 Time: 163 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 281 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 511 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 856 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 2.3 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 3.75 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 8.65 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 15.4 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 37.7 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 65.7 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 260 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 5.

(ver.1 OCL2) GTX970 @ 1540MHz:

2199064^8192+1 Time: 174 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 181 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 288 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 385 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 943 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.62 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.52 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 6.58 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 14.9 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 28.4 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 64.1 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 14.

A new version of "geneferocl2" is available from Genefer repository.
It performs about 50% faster than the previous version on a GTX 680.

NVIDIA's Maxwell cards (compute capability 5.0 and 5.2) also improved their Genefer Mark with the new OCL2 version.
I'm curious to know if AMD's 3 GCN revisions reveal any architectural differences for fixed-point arithmetic: Tahiti GCN1.0 vs. Hawaii GCN1.1 vs. Tonga (Fiji) GCN1.2.

Scott Brown
Volunteer moderator
Project administrator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2382
ID: 1178
Credit: 17,991,226,846
RAC: 12,693,967

Message 87251 - Posted: 17 Aug 2015 | 14:49:22 UTC

New version runs about 35% faster on GTX 660-OEM (347.52 driver) and GTX 645 (353.62 driver) cards.

Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87252 - Posted: 17 Aug 2015 | 15:06:09 UTC

GTX 580:

Old version of OCL2:

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 580', version 'OpenCL 1.1 CUDA' and driver '355.60'.
2199064^8192+1 Time: 175 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 204 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 252 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 517 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.06 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.98 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 4.01 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.31 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 16.1 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 37.5 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 79.9 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 12.

New OCL2:
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 580', version 'OpenCL 1.1 CUDA' and driver '355.60'.
2199064^8192+1 Time: 111 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 128 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 161 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 366 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 726 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.36 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 2.79 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 5.61 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 11.5 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 24.2 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 52.2 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 17.

Regular OCL:
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 580', version 'OpenCL 1.1 CUDA' and driver '355.60'.
2199064^8192+1 Time: 25.6 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 31.8 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 44.9 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 82.8 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 176 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 361 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 662 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.34 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 2.69 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 5.7 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 12.5 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 73.

It's now roughly one quarter of the speed of OCL, with a greatly expanded b range. This is very exciting!

One of the things that we've been talking about is restarting and extending the n=17 search with OCL2, or using a combined OCL/OCL2 that can switch when needed, the way the CPU app can switch transforms.
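A combined OCL/OCL2 app could mirror the CPU app's transform switching. Below is a sketch under stated assumptions, not how Genefer is structured: each transform is a hypothetical (name, b_limit, run) triple ordered fastest first, `run(b)` returns (result, max_err), and the 0.45 round-off threshold is the one visible in the "maxErr exceeded ... > 0.4500" messages in this thread.

```python
from typing import Callable

MAX_ERR = 0.45  # threshold seen in this thread's "maxErr exceeded" messages

# (name, largest b this transform is trusted for at the current n, run function)
Transform = tuple[str, int, Callable[[int], tuple[str, float]]]

def run_with_fallback(b: int, transforms: list[Transform]) -> tuple[str, str]:
    """Try transforms fastest-first; skip those whose b limit is too small and
    fall back to the next one whenever the round-off error exceeds MAX_ERR."""
    for name, b_limit, run in transforms:
        if b > b_limit:
            continue                      # outside this transform's known range
        result, max_err = run(b)
        if max_err <= MAX_ERR:
            return name, result
        # round-off too large: retry with the next, more accurate transform
    raise RuntimeError(f"no available transform can test b = {b}")
```

For n = 17, the OCL2 entry's `b_limit` could be the pessimistic 51.03M from Yves's b-limits table; the OCL entry's limit would have to come from that transform's own limit check.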

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87254 - Posted: 17 Aug 2015 | 16:11:26 UTC - in response to Message 87252.
Last modified: 17 Aug 2015 | 17:03:37 UTC

One of the things that we've been talking about is restarting and extending the n=17 search with OCL2,

... starting at 42,598,524.

geneferocl2_windows.exe -q "42598524^131072+1"
Using OCL2 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 680', version 'OpenCL 1.2 CUDA' and driver '353.30'.
42598524^131072+1 is a probable composite. (RES=3661fa7c70613e8c) (1000001 digits) (err = 0.2120) (time = 0:53:15)

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87271 - Posted: 18 Aug 2015 | 10:26:24 UTC - in response to Message 87254.

One of the things that we've been talking about is restarting and extending the n=17 search with OCL2,

... starting at 42,598,524.

To be precise, 42,597,774^131072+1 is the smallest mega-number for n=17:

42597774^131072+1 is composite. (RES=3304d8cf4acbc370) (1000000 digits) (err = 0.2190) (time = 0:53:30)
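The digit counts Genefer prints follow from d = floor(2^n * log10(b)) + 1; for even b, b^(2^n) + 1 is odd and can never be a power of ten, so the +1 does not change the length. A sketch that reproduces the figures above and searches for the smallest even b (GFN searches only test even b) giving a mega-number. It uses double-precision logs, which is adequate here only because each step of b moves 2^n * log10(b) by roughly 0.001, far above the rounding error; the function names are hypothetical.

```python
import math

def gfn_digits(b: int, n: int) -> int:
    """Decimal digits of b^(2^n) + 1."""
    return math.floor((1 << n) * math.log10(b)) + 1

def smallest_even_b(n: int, target_digits: int = 1_000_000) -> int:
    """Smallest even b such that b^(2^n) + 1 has at least target_digits digits."""
    b = int(10 ** ((target_digits - 1) / (1 << n)))  # floating-point estimate
    b -= b % 2                                       # start on an even base
    while gfn_digits(b, n) >= target_digits:         # estimate too high: step down
        b -= 2
    while gfn_digits(b, n) < target_digits:          # step up to the first qualifying b
        b += 2
    return b

print(gfn_digits(42597774, 17), gfn_digits(42598524, 17))
print(smallest_even_b(17))
```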

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87273 - Posted: 18 Aug 2015 | 12:43:53 UTC

I just checked the largest 'b' found with genefer80. geneferocl2 successfully passed the test.

Using OCL2 transform
140000374^2048+1 is a probable prime. (16684 digits) (err = 0.1968) (time = 0:00:05)
103109922^4096+1 is a probable prime. (32823 digits) (err = 0.1438) (time = 0:00:12)
100219912^8192+1 is a probable prime. (65544 digits) (err = 0.2099) (time = 0:00:25)
99941872^16384+1 is a probable prime. (131068 digits) (err = 0.3173) (time = 0:01:03)
15547296^32768+1 is a probable prime. (235657 digits) (err = 0.0121) (time = 0:03:31)
19502212^65536+1 is a probable prime. (477763 digits) (err = 0.0290) (time = 0:14:01)

Using x87 (80-bit) transform
140000374^2048+1 is a probable prime. (16684 digits) (err = 0.2188) (time = 0:00:05)
103109922^4096+1 is a probable prime. (32823 digits) (err = 0.1875) (time = 0:00:23)
100219912^8192+1 is a probable prime. (65544 digits) (err = 0.2500) (time = 0:01:32)
99941872^16384+1 is a probable prime. (131068 digits) (err = 0.3750) (time = 0:06:59)
15547296^32768+1 is a probable prime. (235657 digits) (err = 0.0137) (time = 0:25:01)
19502212^65536+1 is a probable prime. (477763 digits) (err = 0.0312) (time = 1:51:12)

Roger
Volunteer developer
Volunteer tester

Joined: 27 Nov 11
Posts: 1138
ID: 120786
Credit: 268,668,824
RAC: 0

Message 87274 - Posted: 18 Aug 2015 | 14:44:01 UTC - in response to Message 87153.

From pschoefer's test on AMD GPU, the OCL2 runtime > 15x OCL !! (n > 14).
Needs optimisation work. Intriguing though.
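The "> 15x for n > 14" figure can be checked straight from pschoefer's two Tahiti runs. A small sketch that parses Genefer's benchmark lines and divides OCL2 time by OCL time for each exponent N; the regex assumes exactly the "b^N+1 Time: X us/mul." format shown in this thread, and the helper names are hypothetical.

```python
import re

# Matches entries such as "1471094^32768+1 Time: 862 us/mul." (flattened or one per line).
BENCH_RE = re.compile(r"(\d+)\^(\d+)\+1 Time: ([\d.]+) (us|ms)/mul\.")

def parse_bench(text: str) -> dict[int, float]:
    """Map the exponent N (= 2^n) to microseconds per multiplication."""
    times = {}
    for _b, n, t, unit in BENCH_RE.findall(text):
        times[int(n)] = float(t) * (1000.0 if unit == "ms" else 1.0)
    return times

def slowdown(slow: dict[int, float], fast: dict[int, float]) -> dict[int, float]:
    """Per-N ratio slow/fast for the exponents present in both runs."""
    return {n: slow[n] / fast[n] for n in sorted(slow.keys() & fast.keys())}

# Two entries each from pschoefer's first OCL and OCL2 runs on Tahiti.
ocl = "1471094^32768+1 Time: 53.5 us/mul. 538452^1048576+1 Time: 897 us/mul."
ocl2 = "1471094^32768+1 Time: 862 us/mul. 538452^1048576+1 Time: 19.4 ms/mul."
for n, r in slowdown(parse_bench(ocl2), parse_bench(ocl)).items():
    print(f"N = {n}: OCL2 is {r:.1f}x slower")
```

Pasting the full runs in instead of the two sample entries gives the ratio for every N.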

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87275 - Posted: 18 Aug 2015 | 15:35:52 UTC - in response to Message 87274.
Last modified: 18 Aug 2015 | 15:38:04 UTC

From pschoefer's test on AMD GPU, the OCL2 runtime > 15x OCL !! (n > 14).
Needs optimisation work. Intriguing though.

On GeForce GTX 680, the OCL2 runtime is ~4.5x OCL.

But on GTX 680, FP64 = 1/24 FP32 and on Radeon HD 79x0, FP64 = 1/4 FP32.
And inline assembler is used on NVIDIA in the latest version.

Radeon HD 79x0 are as fast as Titan for GFN WR (FP64) but are as fast as GTX 960 for PPS Sieve (FP32).

Note that pschoefer's test was done with the previous version... OCL2 may be a bit faster on ATI now ??

pschoefer
Volunteer developer
Volunteer tester

Joined: 20 Sep 05
Posts: 685
ID: 845
Credit: 2,886,414,412
RAC: 77,022

Message 87276 - Posted: 18 Aug 2015 | 15:57:31 UTC - in response to Message 87275.

Note that pschoefer's test was done with the previous version... OCL2 may be a bit faster on ATI now ??

It is:

Running benchmarks for transform implementation "OCL2"
Running on platform 'AMD Accelerated Parallel Processing', device 'Tahiti', version 'OpenCL 1.2 AMD-APP (1800.8)' and driver '1800.8 (VM)'.
2199064^8192+1 Time: 388 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 442 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 749 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 1.09 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 2.14 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 4.12 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 8.93 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 18.4 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 24.7 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 37.8 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 111 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 8.

geneferocl2_windows.exe -q "3149688^32768+1"
3149688^32768+1 is a probable prime. (212936 digits) (err = 0.0005) (time = 0:08:58)


Iain Bethune
Honorary cruncher

Joined: 28 Jan 09
Posts: 1588
ID: 34775
Credit: 194,056,043
RAC: 0

Message 87286 - Posted: 19 Aug 2015 | 12:49:01 UTC

Update on the Mac front - it appears that the app is not completely broken, as it works just fine on my integrated Intel Iris Pro GPU:

Running benchmarks for transform implementation "OCL2"
Running on platform 'Apple', device 'Iris Pro', version 'OpenCL 1.2 ' and driver '1.2(Jul 29 2015 02:40:39)'.
2199064^8192+1 Time: 418 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 681 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.36 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.08 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 4.49 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 10.5 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 20.9 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 49.2 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 106 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 157 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 331 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 2.

However, on my AMD FirePro 700, the OpenCL compiler fails with an internal error, and I haven't yet pinpointed exactly what causes it. I also have a couple of older Nvidia GPUs that I will try later.

I added a binary to the SVN if anyone is interested in testing: https://www.assembla.com/spaces/genefer/subversion/source/HEAD/trunk/bin/mac/geneferocl2_macintel

Cheers

- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime!

Iain Bethune
Honorary cruncher

Joined: 28 Jan 09
Posts: 1588
ID: 34775
Credit: 194,056,043
RAC: 0

Message 87291 - Posted: 19 Aug 2015 | 20:34:38 UTC - in response to Message 87286.
Last modified: 19 Aug 2015 | 20:35:19 UTC

I also have a couple of older Nvidia GPUs that I will try later.

Well, the OpenCL code compiles and runs successfully, but it's very slow (7 ms/mul for 2199064^8192+1) and locked up my machine...

Does anyone have a more modern card that they want to try it on?

- Iain

eXaPower

Joined: 30 Sep 13
Posts: 122
ID: 259902
Credit: 1,636,946,772
RAC: 0

Message 87297 - Posted: 20 Aug 2015 | 9:27:46 UTC - in response to Message 87291.
Last modified: 20 Aug 2015 | 9:50:17 UTC

I also have a couple of older Nvidia GPUs that I will try later.

Well, the OpenCL code compiles and runs successfully, but it's very slow (7 ms/mul for 2199064^8192+1) and locked up my machine...

Does anyone have a more modern card that they want to try it on?

- Iain

GT650m @ 835MHz (driver 353.51/branch r352/OpenCL 1.2)

2199064^8192+1 Time: 311 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 567 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.17 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.29 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 4.62 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 9.92 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 20.1 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 42.6 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 90.1 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 184 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 562 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 2.

C.C. 3.0 (384 Kepler cores): 4x slower than C.C. 5.0 (512-core GTX 750) and 1/8 the OCL2 performance of C.C. 5.2 (1664-core GTX 970). See posts below.

Iain Bethune
Honorary cruncher

Joined: 28 Jan 09
Posts: 1588
ID: 34775
Credit: 194,056,043
RAC: 0

Message 87298 - Posted: 20 Aug 2015 | 10:11:28 UTC - in response to Message 87297.

Does anyone have a more modern card that they want to try it on?

- Iain

GT650m @ 835MHz (driver 353.51/branch r352/OpenCL 1.2)

C.C. 3.0 (384 Kepler cores): 4x slower than C.C. 5.0 (512-core GTX 750) and 1/8 the OCL2 performance of C.C. 5.2 (1664-core GTX 970). See posts below.

I meant Nvidia on a Mac ;)

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87426 - Posted: 24 Aug 2015 | 12:07:29 UTC

A new version of "geneferocl2" is available from Genefer repository.

It is faster than the previous version on NVIDIA GPUs.

Some benches:
GTX 680

2199064^8192+1 Time: 86.7 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 108 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 209 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 416 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 755 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.44 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.02 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 6.34 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 13.8 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 27.7 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 56 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 15.

GTX 780Ti
2199064^8192+1 Time: 120 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 117 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 163 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 247 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 462 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 805 us/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 1.54 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 3.08 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 6.81 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 13.5 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 27.4 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 31.

Note that for N=524288, the GTX 780Ti is twice as fast as the GTX 680.
But for N=65536 and 131072, the ratio is only 1.6-1.7, and it's even worse for N < 65536.
2880 cores are "too many" if N < 524288.
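For reference, the ratios can be recomputed directly from the two benchmark runs posted above (times converted to ms/mul; the variable names are only illustrative):

```python
# OCL2 timings in ms/mul, copied from the GTX 680 and GTX 780 Ti benchmarks above.
gtx680 = {
    8192: 0.0867, 16384: 0.108, 32768: 0.209, 65536: 0.416,
    131072: 0.755, 262144: 1.44, 524288: 3.02, 1048576: 6.34,
    2097152: 13.8, 4194304: 27.7, 8388608: 56.0,
}
gtx780ti = {
    8192: 0.120, 16384: 0.117, 32768: 0.163, 65536: 0.247,
    131072: 0.462, 262144: 0.805, 524288: 1.54, 1048576: 3.08,
    2097152: 6.81, 4194304: 13.5, 8388608: 27.4,
}

def speedup(n):
    """How many times faster the GTX 780 Ti is than the GTX 680 at exponent n."""
    return gtx680[n] / gtx780ti[n]

for n in sorted(gtx680):
    print(f"N={n:>7}: {speedup(n):.2f}x")
```

The GTX 780 Ti only reaches roughly 2x from N = 524288 upward; at N = 8192 and 16384 it is actually slower than the GTX 680.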

Yves

Scott Brown
Volunteer moderator
Project administrator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2382
ID: 1178
Credit: 17,991,226,846
RAC: 12,693,967

Message 87430 - Posted: 24 Aug 2015 | 12:25:26 UTC - in response to Message 87426.
Last modified: 24 Aug 2015 | 15:51:46 UTC

New release bench:

GTX 660 OEM

2199064^8192+1 Time: 110 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 148 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 318 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 551 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.03 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 2.04 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 4.22 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.87 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 19.3 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 38.6 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 77 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 11.

This is about 1/3 the speed of the regular OCL app on this card...Nice improvement!

EDIT:
Genefer Mark Comparison

OCL2 ver.1 = 6
OCL2 ver.2 = 9
OCL2 ver.3 = 11
OCL = 34

Honza
Volunteer moderator
Volunteer tester
Project scientist

Joined: 15 Aug 05
Posts: 1952
ID: 352
Credit: 6,017,981,497
RAC: 1,586,802

Message 87431 - Posted: 24 Aug 2015 | 12:46:46 UTC

On a side note - I sold my GTX 580 a couple of days ago and am now in the process of selling my HD 7950
(there is still an R9 280X in my home computer).

Thinking about the Fury Nano, which should be introduced later this week.
4096 shaders? I guess it would need Skylake to feed this GPU... and I'm working on that as well :-)

Do we have any tests on AMD Fury series with HBM?
____________
My stats

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87432 - Posted: 24 Aug 2015 | 12:54:03 UTC - in response to Message 87426.

Maxwell has only about 2/3 of the performance of Kepler, relative to peak throughput.

GTX 980

2199064^8192+1 Time: 150 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 158 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 168 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 211 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 621 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.23 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 2.53 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 5.3 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 10.5 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 22.9 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 47 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 19.

Genefer Mark was expected to be 4612/5046 * 31 ≈ 28 (the GTX 780Ti mark scaled by the ratio of peak GFLOPS).

Why?
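The expected value is a straight proportional scaling. 4612 and 5046 appear to be the peak single-precision GFLOPS of the GTX 980 (2048 cores x 1126 MHz x 2) and the GTX 780 Ti (2880 cores x 876 MHz x 2); the sketch below assumes, as the estimate does, that Genefer Mark scales linearly with that figure:

```python
# Peak FP32 throughput (GFLOPS) and measured OCL2 Genefer Marks, as used in the post.
peak_gflops = {"GTX 780 Ti": 5046, "GTX 980": 4612}
genefer_mark = {"GTX 780 Ti": 31, "GTX 980": 19}

def expected_mark(target, reference):
    """Scale the reference card's mark by the ratio of peak throughputs (assumed linear)."""
    return peak_gflops[target] / peak_gflops[reference] * genefer_mark[reference]

expected = expected_mark("GTX 980", "GTX 780 Ti")
efficiency = genefer_mark["GTX 980"] / expected
print(f"expected {expected:.1f}, measured 19, efficiency {efficiency:.2f}")
```

The measured 19 is about 67% of the expected 28, which is where the 2/3 figure comes from.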

eXaPower

Joined: 30 Sep 13
Posts: 122
ID: 259902
Credit: 1,636,946,772
RAC: 0

Message 87433 - Posted: 24 Aug 2015 | 12:56:02 UTC

GTX750 @ 1412MHz:

2199064^8192+1 Time: 114 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 177 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 263 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 451 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.99 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.99 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.06 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 16.9 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 34.7 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 72.2 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 12.

GTX750 (ver.2) OCL2 mark = 8.

GTX970 @ 1501MHz:
2199064^8192+1 Time: 127 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 129 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 147 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 228 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 516 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 981 us/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 2.14 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 4.27 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 8.89 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 17.7 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 38.5 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 23.

OCL2 ver.3 improvement on a GTX 970: Genefer Mark 23 vs. 16. Excellent work, Yves!

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 909
ID: 370496
Credit: 531,793,905
RAC: 407,603

Message 87434 - Posted: 24 Aug 2015 | 14:57:44 UTC

Radeon HD 4350 still fails, same message as my last post.

Once again, I'll do GTX 970 tests on Friday... if I remember to, as I forgot about it last week. If there's any particular test you guys want, just hit me up and I'll try to do it.

Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87435 - Posted: 24 Aug 2015 | 15:18:05 UTC

Latest OCL2 on GTX 580:

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 580', version 'OpenCL 1.1 CUDA' and driver '355.60'.
2199064^8192+1 Time: 86.4 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 103 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 134 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 313 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 622 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.18 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 2.36 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 4.82 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 9.99 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 21.5 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 46.6 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 20.

Previous OCL2 had Genefer Mark of 17. OCL has Genefer Mark of 73.
____________
My lucky number is 75898^524288+1

Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87437 - Posted: 24 Aug 2015 | 15:54:20 UTC

Yves, would a fixed-point CPU program (a CPU equivalent of OCL2) be able to use the integer versions of AVX instructions? If so, would such a program be likely to run a lot faster than the x87 transform on CPUs?

Scott Brown
Volunteer moderator
Project administrator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2382
ID: 1178
Credit: 17,991,226,846
RAC: 12,693,967

Message 87439 - Posted: 24 Aug 2015 | 16:29:03 UTC

FYI... ver. 3 of OCL2 seems to have reintroduced the problem of failing on older drivers. I just tried it on a GT 540M with 344.75 drivers and it failed with a CL_INVALID_COMMAND_QUEUE error. After updating to the latest drivers, the bench test runs with no problems.

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87440 - Posted: 24 Aug 2015 | 17:00:44 UTC - in response to Message 87437.

Yves, would a fixed-point CPU program (a CPU equivalent of OCL2) be able to use the integer versions of AVX instructions?

Yes, with AVX2 (AVX has no integer instructions, only FP32 and FP64).

If so, would such a program be likely to run a lot faster than the x87 transform on CPUs?

Not a lot faster, no. I think it would be slower.

AVX2 has eight 32-bit integer lanes: an int32 add or mul takes 1 cycle, and the clock is ~4 GHz.

An NVIDIA GPU has n cores: an int32 add takes 1 cycle and a mul takes 4. The clock is about 1 GHz.

So one Intel CPU core is about as fast as 50 NVIDIA GPU cores (taking into account that some multiplications cost 4 cycles instead of 1).

GeForce 740M (384 cores @ 800 MHz): OCL2 Genefer Mark = 3.
i7-3820 @ 3.6 GHz: genefer80 Genefer Mark = 2 (one core).
On one core of this processor, we can expect a fixed-point CPU genefer to reach a Genefer Mark of about 0.5.
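The back-of-the-envelope estimate above can be written out as follows. The 50-core equivalence is the figure from the post; scaling it linearly with the actual clocks (3.6 GHz CPU, 800 MHz GPU) is my assumption, and the function name is hypothetical:

```python
# Rule of thumb from the post: at ~4 GHz vs ~1 GHz, one AVX2 CPU core does
# about as much 32-bit integer work as 50 NVIDIA GPU cores.
CPU_CORE_IN_GPU_CORES = 50
REF_CPU_GHZ, REF_GPU_GHZ = 4.0, 1.0

def estimated_cpu_mark(gpu_mark, gpu_cores, gpu_ghz, cpu_ghz):
    """Scale a measured OCL2 GPU mark down to one fixed-point AVX2 CPU core.

    Assumes throughput is linear in clock speed and in core count.
    """
    # Number of GPU cores that one CPU core is worth at the actual clocks
    equivalent = CPU_CORE_IN_GPU_CORES * (cpu_ghz / REF_CPU_GHZ) / (gpu_ghz / REF_GPU_GHZ)
    return gpu_mark * equivalent / gpu_cores

# GeForce 740M (384 cores @ 800 MHz, OCL2 mark 3) vs one i7-3820 core @ 3.6 GHz
print(round(estimated_cpu_mark(3, 384, 0.8, 3.6), 2))
```

This lands at roughly 0.44, in line with the ~0.5 quoted, and well below the x87 genefer80 mark of 2 on the same core.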

mackerel
Volunteer tester

Joined: 2 Oct 08
Posts: 2639
ID: 29980
Credit: 568,393,769
RAC: 1,834

Message 87443 - Posted: 24 Aug 2015 | 17:20:46 UTC - in response to Message 87431.

Do we have any tests on AMD Fury series with HBM?

I have a Fury X due to arrive tomorrow so if there are no surprises I'll try to test it then.

mackerel
Volunteer tester

Joined: 2 Oct 08
Posts: 2639
ID: 29980
Credit: 568,393,769
RAC: 1,834

Message 87462 - Posted: 24 Aug 2015 | 22:10:46 UTC - in response to Message 87443.
Last modified: 24 Aug 2015 | 22:21:27 UTC

Ok, joining in with whatever I have that can run it...

GTX 960

2199064^8192+1 Time: 131 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 138 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 222 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 455 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 897 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.8 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.9 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.13 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 17.2 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 33.1 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 72.3 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 12.

GTX 560 Ti
2199064^8192+1 Time: 97.6 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 121 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 243 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 545 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.03 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 2.08 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 4.27 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.92 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 18 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 39.3 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 85.3 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 11.

GeForce 9500 GT
Won't run - card too old? For example:
Error: build program failed.
ptxas application ptx input, line 115; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 116; error : Instruction 'madc' requires .target sm_20 or higher

R9 280X
2199064^8192+1 Time: 432 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 401 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 820 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 1.1 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.78 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 3.61 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 7.53 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 15.5 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 21 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 33.2 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 95.3 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 9.

HD6850
2199064^8192+1 Time: 310 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 775 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.29 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 726 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.56 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 4.71 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 13 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 36 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 58.9 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 56.6 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 114 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 5.

Just to check, I downloaded it tonight. It reports itself as geneferocl2 3.2.9-dev. Is this the v.3 talked about above?

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87464 - Posted: 24 Aug 2015 | 22:35:50 UTC - in response to Message 87462.

GeForce 9500 GT
Won't run - card too old? For example:
Error: build program failed.
ptxas application ptx input, line 115; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 116; error : Instruction 'madc' requires .target sm_20 or higher

Yes, a GeForce 400 series (Fermi) or later GPU is required.
I don't know how to detect the microarchitecture with OpenCL.
It would be possible to replace madc with the pair of instructions mul + adc (which is slower on sm_20 or higher).
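To show what the two variants compute, here is a small Python model of the arithmetic (not the actual OCL2 kernel). mad.lo.cc.u32 and madc.hi.u32 are the PTX instructions named in the error above; the four-instruction fallback (mul.lo/mul.hi followed by add.cc/addc) is my reading of "mul + adc", and the function names are hypothetical. Both compute a*b + c modulo 2^64 on 32-bit words:

```python
MASK32 = (1 << 32) - 1

def mad_lo_cc(a, b, c):
    """Model of PTX mad.lo.cc.u32: low word of a*b plus c; returns (word, carry-out)."""
    s = ((a * b) & MASK32) + c
    return s & MASK32, s >> 32

def madc_hi(a, b, c, carry_in):
    """Model of PTX madc.hi.u32: high word of a*b plus c plus carry-in."""
    return (((a * b) >> 32) + c + carry_in) & MASK32

def fused(a, b, c_lo, c_hi):
    """a*b + (c_hi:c_lo) mod 2^64 with the madc pair (sm_20 and later)."""
    lo, carry = mad_lo_cc(a, b, c_lo)
    hi = madc_hi(a, b, c_hi, carry)
    return hi, lo

def split(a, b, c_lo, c_hi):
    """Same result with separate multiplies and add-with-carry (mul + adc)."""
    p_lo = (a * b) & MASK32              # mul.lo.u32
    p_hi = (a * b) >> 32                 # mul.hi.u32
    s = p_lo + c_lo                      # add.cc.u32
    lo, carry = s & MASK32, s >> 32
    hi = (p_hi + c_hi + carry) & MASK32  # addc.u32
    return hi, lo
```

The fused form needs two instructions per 32x32-bit multiply-accumulate where the fallback needs four, which is consistent with madc being the faster choice on sm_20 and later.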

Scott Brown
Volunteer moderator
Project administrator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2382
ID: 1178
Credit: 17,991,226,846
RAC: 12,693,967

Message 87465 - Posted: 24 Aug 2015 | 23:05:49 UTC - in response to Message 87464.

Yes, a GeForce 400 series (Fermi) or later GPU is required.

Hmmm...I was able to run OCL2 ver.2 on an 8400M GS with driver 285.xx (which is OpenCL 1.0). Is there a specific change in ver.3 that now prevents the older cards from running?

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87471 - Posted: 25 Aug 2015 | 15:02:49 UTC - in response to Message 87465.

Yes, a GeForce 400 series (Fermi) or later GPU is required.

Hmmm...I was able to run OCL2 ver.2 on an 8400M GS with driver 285.xx (which is OpenCL 1.0). Is there a specific change in ver.3 that now prevents the older cards from running?

Yes, "multiply and add with carry-in" (madc) is a ver.3 optimisation; it was not used before.
I don't know how to detect the NVIDIA microarchitecture. If I could, I would use madc on GPUs >= Fermi and mul + adc on GPUs < Fermi.
The code is faster with madc, so that is the default.
If someone knows how to detect the NVIDIA microarchitecture...
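One possible route, offered as an assumption rather than something tested with Genefer: NVIDIA's OpenCL driver exposes the cl_nv_device_attribute_query extension, which adds CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV (0x4000) and CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV (0x4001) as clGetDeviceInfo queries; Fermi is compute capability 2.x. The decision logic could then look like this, with the query result passed in and hypothetical -D macro names:

```python
# Query that would supply cc_major via clGetDeviceInfo (cl_nv_device_attribute_query).
CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV = 0x4000

def carry_build_option(platform_name, extensions, cc_major=None):
    """Choose a hypothetical kernel define: madc on Fermi or newer, mul + adc otherwise."""
    if "NVIDIA" not in platform_name:
        return ""  # the PTX madc question only arises on NVIDIA
    if "cl_nv_device_attribute_query" not in extensions.split() or cc_major is None:
        return "-D USE_MUL_ADC"  # architecture unknown: take the safe, slower path
    return "-D USE_MADC" if cc_major >= 2 else "-D USE_MUL_ADC"
```

Defaulting to mul + adc when the extension is missing trades some speed for running on every card.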

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87472 - Posted: 25 Aug 2015 | 15:09:38 UTC - in response to Message 87439.

FYI... ver. 3 of OCL2 seems to have reintroduced the problem of failing on older drivers. I just tried it on a GT 540M with 344.75 drivers and it failed with a CL_INVALID_COMMAND_QUEUE error. After updating to the latest drivers, the bench test runs with no problems.

I'm trying to understand why...

mackerel
Volunteer tester

Joined: 2 Oct 08
Posts: 2639
ID: 29980
Credit: 568,393,769
RAC: 1,834

Message 87484 - Posted: 25 Aug 2015 | 19:06:55 UTC

Fury X

2199064^8192+1 Time: 441 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 466 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 512 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 629 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.24 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 2.25 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 4.54 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.98 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 13 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 16.1 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 55 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 17.

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87489 - Posted: 26 Aug 2015 | 9:38:05 UTC - in response to Message 87472.

FYI... ver. 3 of OCL2 seems to have reintroduced the problem of failing on older drivers. I just tried it on a GT 540M with 344.75 drivers and it failed with a CL_INVALID_COMMAND_QUEUE error. After updating to the latest drivers, the bench test runs with no problems.

I found it; it's a little bit tricky to fix.

It is a bug in NVIDIA ptxas (ptx optimizing assembler).
It depends on driver version because it was fixed in CUDA 7.0.
Driver 344.75 (or 345.20) = CUDA 6.5 => bug and driver 347.09 = CUDA 7.0 => OK.

The problem occurs when a large number of PTX registers is used... I don't know how to reduce that without slowing down the program.
It can be fixed by setting the optimizer level to 2 (the default is 3):
-cl-nv-opt-level=2 => OK, -cl-nv-opt-level=3 => bug.

To resolve the issue, I have to:
- check that 'platform' is "NVIDIA"
- check that 'driver' < 347
then set optimizer level to 2.
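The two checks can be sketched as follows. The platform and driver strings are what OpenCL reports for CL_PLATFORM_NAME and CL_DRIVER_VERSION (e.g. 'NVIDIA CUDA' and '355.60' in the benchmark outputs in this thread); the function name, and falling back to level 2 when the version cannot be parsed, are my assumptions:

```python
def nv_opt_level_option(platform_name, driver_version):
    """Extra build options working around the ptxas bug on NVIDIA drivers < 347.

    platform_name and driver_version are the strings reported by OpenCL,
    e.g. 'NVIDIA CUDA' and '344.75'.
    """
    if "NVIDIA" not in platform_name:
        return ""
    try:
        major = int(driver_version.split(".")[0])
    except ValueError:
        return "-cl-nv-opt-level=2"  # unparsable version: assume the buggy ptxas
    return "-cl-nv-opt-level=2" if major < 347 else ""
```

Level 2 only costs some speed, so erring towards it on unknown drivers is the conservative choice.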

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87553 - Posted: 31 Aug 2015 | 12:12:10 UTC

A "release candidate" of "geneferocl2" is available from Genefer repository.

On NVIDIA GPUs, the "failed on driver < 347" bug is fixed.
On ATI, it may be a bit faster (the 'generic' code is faster on NVIDIA; it was not tested on ATI).

geneferocl2 is complete: no new feature or improvement is scheduled (bug fixes only if necessary).

Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87559 - Posted: 31 Aug 2015 | 16:38:53 UTC

Yves,

Iain is on the road this week, so if there doesn't seem to be much immediate action on OCL2, that's why.

Having a release candidate is very exciting news, and it comes at a good time. Thanks!

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 909
ID: 370496
Credit: 531,793,905
RAC: 407,603

Message 87560 - Posted: 31 Aug 2015 | 16:53:42 UTC - in response to Message 87553.

A "release candidate" of "geneferocl2" is available from Genefer repository.

On NVIDIA GPUs, the "failed on driver < 347" bug is fixed.
On ATI, it may be a bit faster (the 'generic' code is faster on NVIDIA; it was not tested on ATI).

geneferocl2 is complete: no new feature or improvement is scheduled (bug fixes only if necessary).

Still failing on the 4350 (dat OpenCL beta support...). I'll have to set up a reminder to do the tests on my GTX 970, as I just forgot to do it again. If you guys could post updates on Fridays, that would be very helpful for me :)

Honza
Volunteer moderator
Volunteer tester
Project scientist

Joined: 15 Aug 05
Posts: 1952
ID: 352
Credit: 6,017,981,497
RAC: 1,586,802

Message 87562 - Posted: 31 Aug 2015 | 17:02:47 UTC - in response to Message 87553.

A "release candidate" of "geneferocl2" is available from Genefer repository.

On ATI, it may be a bit faster (the 'generic' code is faster on NVIDIA; it was not tested on ATI).

280X, which used to get Genefer Mark = 8 with the earlier version:
2199064^8192+1 Time: 370 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 405 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 708 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 851 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.56 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 3.05 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 6.46 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 13.3 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 17.9 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 20.3 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 73.7 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 13.


Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87564 - Posted: 31 Aug 2015 | 17:23:11 UTC

Latest OCL2 on GTX 580:

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 580', version 'OpenCL 1.1 CUDA' and driver '355.60'.
2199064^8192+1 Time: 86.7 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 103 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 134 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 313 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 621 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.18 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 2.36 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 4.81 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 9.94 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 21.5 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 46.5 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 20.

Previous version of OCL2:

2199064^8192+1 Time: 86.4 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 103 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 134 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 313 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 622 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.18 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 2.36 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 4.82 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 9.99 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 21.5 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 46.6 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 20.

Essentially identical speed.

mackerel
Volunteer tester

Joined: 2 Oct 08
Posts: 2639
ID: 29980
Credit: 568,393,769
RAC: 1,834

Message 87575 - Posted: 31 Aug 2015 | 19:32:25 UTC

R9 280X

Running on platform 'AMD Accelerated Parallel Processing', device 'Tahiti', version 'OpenCL 1.2 AMD-APP (1800.8)' and driver '1800.8 (VM)'.
2199064^8192+1 Time: 366 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 399 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 684 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 834 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.51 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 2.97 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 6.31 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 13 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 17.7 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 19.9 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 75 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 13.

Up from 9 previously.

GTX960
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 960', version 'OpenCL 1.2 CUDA' and driver '355.60'.
2199064^8192+1 Time: 125 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 134 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 215 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 447 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 880 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.76 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.82 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 7.94 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 17.2 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 33.1 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 72.6 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 12.

Unchanged score from previous.

mackerel
Volunteer tester

Joined: 2 Oct 08
Posts: 2639
ID: 29980
Credit: 568,393,769
RAC: 1,834

Message 87609 - Posted: 2 Sep 2015 | 18:47:35 UTC - in response to Message 87575.

GTX980Ti

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 980 Ti', version 'OpenCL 1.2 CUDA' and driver '355.82'.
2199064^8192+1 Time: 151 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 158 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 172 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 208 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 429 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 786 us/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 1.56 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 3.11 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 6.25 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 12.8 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 28.3 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 32.

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 909
ID: 370496
Credit: 531,793,905
RAC: 407,603

Message 87611 - Posted: 2 Sep 2015 | 23:26:35 UTC

Gtx 970 (1499mhz):

Command line: geneferocl2_windows -b

Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2
CUDA' and driver '355.60'.

2199064^8192+1 Time: 121 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 128 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 146 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 243 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 514 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 969 us/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 2.12 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 4.24 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 8.44 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 17.6 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 38.7 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 23.
Priority change succeeded.

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87618 - Posted: 3 Sep 2015 | 16:48:42 UTC

Another new Genefer application "geneferocl3" is available from Genefer repository (http://www.primegrid.com/forum_thread.php?id=6359).

A 32-bit binary for windows is available (compiled with Visual Studio 2013).

geneferocl3 uses a Number Theoretic Transform. The finite field is Z/pZ with p = 2^64 - 2^32 + 1.
So maxErr = 0, and the limits can be computed using the relation p / 2 > n * b^2, where n is the exponent m:

m = 1024, bMax = 94906266
m = 2048, bMax = 67108864
m = 4096, bMax = 47453133
m = 8192, bMax = 33554432
m = 16384, bMax = 23726566
m = 32768, bMax = 16777216
m = 65536, bMax = 11863283
m = 131072, bMax = 8388608
m = 262144, bMax = 5931642
m = 524288, bMax = 4194304
m = 1048576, bMax = 2965821
m = 2097152, bMax = 2097152
m = 4194304, bMax = 1482910
m = 8388608, bMax = 1048576
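The table can be reproduced from that relation. Taking n as the exponent m, the listed bMax values match sqrt(p / (2m)) rounded to the nearest integer; the rounding is inferred from the numbers, so read bMax as the approximate point above which p/2 > m * b^2 no longer holds:

```python
import math

P = 2**64 - 2**32 + 1  # NTT prime used by geneferocl3

def b_max(m):
    """Base limit from p/2 > m*b^2, i.e. b < sqrt(p / (2m)), rounded as in the table."""
    return round(math.sqrt(P / (2 * m)))

for k in range(10, 24):
    m = 2**k
    print(f"m = {m}, bMax = {b_max(m)}")
```

Each doubling of m divides the limit by sqrt(2), which is why the table alternates between exact powers of two and values near 2^k * sqrt(2).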

They are larger than geneferocl bounds but smaller than geneferocl2 limits.

geneferocl2 can run on any GPU supporting 'OpenCL 1.0'.

On a GTX 680, we have:
OCL: Genefer Mark = 66
OCL2: Genefer Mark = 15
OCL3: Genefer Mark = 45

This is a beta version: fresh out of the oven.
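As a rough illustration of why this prime suits an NTT (a sketch, not necessarily how geneferocl3 implements it): p - 1 = 2^32 * (2^32 - 1), so power-of-two roots of unity exist up to order 2^32, and because 2^64 ≡ 2^32 - 1 and 2^96 ≡ -1 (mod p), a 128-bit product can be reduced with shifts, additions and subtractions. The function names are hypothetical:

```python
P = 2**64 - 2**32 + 1
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def reduce128(x):
    """Reduce a product x < 2^128 mod P using 2^64 = 2^32 - 1 and 2^96 = -1 (mod P)."""
    lo = x & MASK64             # bits 0..63
    mid = (x >> 64) & MASK32    # coefficient of 2^64
    hi = x >> 96                # coefficient of 2^96
    return (lo + mid * (2**32 - 1) - hi) % P

def root_of_unity(n):
    """Primitive n-th root of unity mod P, for n a power of two dividing 2^32."""
    assert (n & (n - 1)) == 0 and (P - 1) % n == 0
    g = 2
    while pow(g, (P - 1) // 2, P) != P - 1:  # search for a quadratic non-residue
        g += 1
    return pow(g, (P - 1) // n, P)  # then w^(n/2) = g^((P-1)/2) = -1
```

Python's % hides the final correction step; a GPU kernel would replace it with one or two conditional additions or subtractions of p.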

Tyler
Project administrator
Volunteer tester

Joined: 4 Dec 12
Posts: 1078
ID: 183129
Credit: 1,376,122,338
RAC: 4,719

Message 87619 - Posted: 3 Sep 2015 | 16:59:13 UTC

Stock GTX 760:

OCL:

Command line: geneferocl_windows -b

Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL"

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 760', version 'OpenCL 1.1 CUDA' and driver '347.52'.

2199064^8192+1 Time: 41 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 47.2 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 71.4 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 125 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 243 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 445 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 887 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.79 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 3.67 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 7.71 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 16.7 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 54.
Priority change succeeded.

OCL2:
Command line: geneferocl2_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 760', version 'OpenCL 1.1 CUDA' and driver '347.52'.
2199064^8192+1 Time: 94.5 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 135 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 279 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 500 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 925 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 1.83 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 3.79 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 7.97 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 17.3 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 34.9 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 70.6 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 12.
Priority change succeeded.

OCL3:
Command line: geneferocl3_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 760', version 'OpenCL 1.1 CUDA' and driver '347.52'.
2199064^8192+1 Time: 55.4 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 61.7 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 97 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 181 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 354 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 685 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.38 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 2.78 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 5.77 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 11.7 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 24.7 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 35.
Priority change succeeded.

____________

275*2^3585539+1 is prime!!! (1079358 digits)

Proud member of Aggie the Pew

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 909
ID: 370496
Credit: 531,793,905
RAC: 407,603

Message 87620 - Posted: 3 Sep 2015 | 17:13:12 UTC
Last modified: 3 Sep 2015 | 17:42:14 UTC

So, I tried it on my GTX 970 (stock 1329 / OC 1499 MHz). The application seems to be COMPLETELY CPU-bound. Why do I say that? Because the GPU seemed to do little work during the entire test. One of the CPU cores was always at 100%, but GPU usage was about 1-3% for the first few tests, and then it would just stay at 0% for long periods.

In fact, it would spend so much time idle that the power-saving measures (downclock and undervolt) would kick in, putting the card at idle speeds. Then it would spike to, say, 85%, post the result, and begin the next test, which repeated the pattern.

I don't know if it's because my CPU is too weak (A6-3500, OCed to 3.2 GHz), but that's what happened. Considering that the score went up from 77 to 85, I suppose OCL3 is already a very good improvement, but I guess there's room for a lot more...

EDIT: Tried it on my trusty Radeon HD 4350. And... nope, ain't working. See the last quote for the error codes.

OCL (1499 MHz)

Command line: geneferocl_windows -b

Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL"

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2 CUDA' and driver '355.82'.

2199064^8192+1 Time: 52.5 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 55.4 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 73.3 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 99.8 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 178 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 343 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 673 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.33 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 2.59 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 5.41 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 10.9 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 77.

OCL3 (1499 MHz)
Command line: geneferocl3_windows -b

Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2 CUDA' and driver '355.82'.

2199064^8192+1 Time: 63.6 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 65.3 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 75.1 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 94 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 181 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 322 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 603 us/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 1.21 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 2.39 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 4.77 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 9.89 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 85.
Priority change succeeded.

Radeon HD 4350 (failed to run).
Running on platform 'AMD Accelerated Parallel Processing', device 'ATI RV710', version 'OpenCL 1.0 AMD-APP (937.2)' and driver 'CAL 1.4.1734'.

Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
Error: build program failed.
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 216: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((reqd_work_group_size(64 / 4 * BLK, 1, 1))) __attribute__((vec_type_hint(Zp)))
^

"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 240: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((reqd_work_group_size(64 / 4 * BLK, 1, 1))) __attribute__((vec_type_hint(Zp)))
^

"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 264: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((reqd_work_group_size(64 / 4 * BLK, 1, 1))) __attribute__((vec_type_hint(Zp)))
^

"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 290: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((reqd_work_group_size(64 / 4 * BLK, 1, 1))) __attribute__((vec_type_hint(Zp)))
^

"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 318: warning:
unknown attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(32 / 4 * BLK32, 1, 1))) __attribute__((vec_type_hint(Zp)))
^

"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 318: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((work_group_size_hint(32 / 4 * BLK32, 1, 1))) __attribute__((vec_type_hint(Zp)))
^

"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 344: warning:
unknown attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(64 / 4 * BLK64, 1, 1))) __attribute__((vec_type_hint(Zp)))
^

"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 344: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((work_group_size_hint(64 / 4 * BLK64, 1, 1))) __attribute__((vec_type_hint(Zp)))
^

"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 370: warning:
unknown attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(128 / 4 * BLK128, 1, 1))) __attribute__((vec_type_hint(Zp)))
^

"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 370: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((work_group_size_hint(128 / 4 * BLK128, 1, 1))) __attribute__((vec_type_hint(Zp)))
^

"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 399: warning:
unknown attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(256 / 4 * BLK256, 1, 1))) __attribute__((vec_type_hint(Zp)))
^

"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 399: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((work_group_size_hint(256 / 4 * BLK256, 1, 1))) __attribute__((vec_type_hint(Zp)))
^

"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 428: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((reqd_work_group_size(512 / 4, 1, 1))) __attribute__((vec_type_hint(Zp)))
^

"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 458: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((reqd_work_group_size(1024 / 4, 1, 1))) __attribute__((vec_type_hint(Zp)))
^

"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 490: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((vec_type_hint(Zp)))
^

"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 518: warning:
unknown attribute "vec_type_hint"
__kernel __attribute__((vec_type_hint(Zp)))
^

Error: Requested compile size is bigger than the required workgroup size of 32 elements
Error: Creating kernel Forward64 failed!

Error: OpenCL error detected: CL_BUILD_PROGRAM_FAILURE.

Honza
Volunteer moderator
Volunteer tester
Project scientist

Joined: 15 Aug 05
Posts: 1952
ID: 352
Credit: 6,017,981,497
RAC: 1,586,802

Message 87621 - Posted: 3 Sep 2015 | 17:36:49 UTC - in response to Message 87618.

A 32-bit binary for Windows is available (compiled with Visual Studio 2013).

c:\temp>geneferocl3_windows.exe
This version of c:\temp\geneferocl3_windows.exe is not compatible with the version of Windows you're running. Check your computer's system information to see whether you need a x86 (32-bit) or x64 (64-bit) version of the program, and then contact the software publisher.

What's the trick to make it run on Win7 x64?
I've got all the MS Visual C++ Redistributables installed, from version 2005 up to 2013, both x86 and x64.
____________
My stats

Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87622 - Posted: 3 Sep 2015 | 17:37:34 UTC
Last modified: 3 Sep 2015 | 17:59:04 UTC

GTX 580 Genefer Mark scores:

OCL: 73
OCL2: 20
OCL3: 39

For kicks, I'm running 42598524^131072+1 through OCL 3. The B limit for n=17 is about 8 million and this is 42 million, so it shouldn't work. It's been running several minutes and there's no error message. Should it be detecting a problem with the limit, or does it need to simply reject running any tests above the limit? I'll edit this when it completes.

EDIT: It finished without any error message, but produced the incorrect residue.
EDIT2: Corrected typo -- I meant I was going to test OCL3, not OCL2.
____________
My lucky number is 75898^524288+1

Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87623 - Posted: 3 Sep 2015 | 17:39:32 UTC - in response to Message 87621.

What's the trick to make it run on Win7 x64?
I've got all the MS Visual C++ Redistributables installed, from version 2005 up to 2013, both x86 and x64.

Mine just ran without doing anything special on Win7 x64.

I do have MS VS installed, but normally that shouldn't make a difference.


Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 909
ID: 370496
Credit: 531,793,905
RAC: 407,603

Message 87624 - Posted: 3 Sep 2015 | 17:45:08 UTC

Okay, so looking at the scores, it seems that OCL 3 has lower performance than OCL on the GTX 580 / 680 / 760, BUT has increased performance on a GTX 970. Did I just win the lottery there?

Waiting for eXaPower's results on his 970, for the sake of comparison...

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87626 - Posted: 3 Sep 2015 | 17:54:16 UTC - in response to Message 87622.

For kicks, I'm running 42598524^131072+1 through OCL 2. The B limit for n=17 is about 8 million and this is 42 million, so it shouldn't work.

The bound is > 42 million:
m = 131072, maxErr(b = 51.03M) = 0.2489, maxErr(b = 51.04M) = 0.2810, maxErr(b = 60.42M) = 0.3172, maxErr(b = 60.43M) = 0.4365

Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87627 - Posted: 3 Sep 2015 | 17:58:06 UTC - in response to Message 87626.

For kicks, I'm running 42598524^131072+1 through OCL 2. The B limit for n=17 is about 8 million and this is 42 million, so it shouldn't work.

The bound is > 42 million:
m = 131072, maxErr(b = 51.03M) = 0.2489, maxErr(b = 51.04M) = 0.2810, maxErr(b = 60.42M) = 0.3172, maxErr(b = 60.43M) = 0.4365

That was a typo on my part... I meant OCL 3. Sorry!

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87628 - Posted: 3 Sep 2015 | 18:01:15 UTC - in response to Message 87624.

Okay, so looking at the scores, it seems that OCL 3 has lower performance than OCL on the GTX 580 / 680 / 760, BUT has increased performance on a GTX 970. Did I just win the lottery there?

580: Fermi, FP64 = 1/8 FP32.
680, 760: Kepler, FP64 = 1/24 FP32.
970: Maxwell, FP64 = 1/32 FP32.

OCL 3 runs on FP32 units and OCL on FP64.
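To illustrate why an NTT over this p needs no FP64 at all, here is a sketch of reducing a 128-bit product modulo p = 2^64 - 2^32 + 1 using only masks, shifts, and integer add/subtract/multiply on 32- and 64-bit pieces. It relies on 2^64 ≡ 2^32 - 1 and 2^96 ≡ -1 (mod p), a standard trick for this prime; whether geneferocl3 reduces exactly this way is an assumption. Python integers hide the carries and borrows that a 64-bit GPU kernel would have to track explicitly.

```python
# Reduction of a 128-bit product modulo p = 2^64 - 2^32 + 1 using only
# masks, shifts and integer add/sub/mul on 32/64-bit pieces (no floating point).
P = 2**64 - 2**32 + 1
MASK32 = 2**32 - 1
MASK64 = 2**64 - 1


def reduce128(x: int) -> int:
    """Return x mod p for 0 <= x < 2^128.

    Uses 2^64 = 2^32 - 1 (mod p) and 2^96 = -1 (mod p):
    x = x0 + 2^64 * (lo + 2^32 * hi)  =>  x = x0 + (2^32 - 1) * lo - hi (mod p).
    """
    x0 = x & MASK64          # bits 0..63
    lo = (x >> 64) & MASK32  # bits 64..95
    hi = x >> 96             # bits 96..127
    t = x0 - hi
    if t < 0:                # a 64-bit kernel would see this as a borrow
        t += P
    t += lo * MASK32         # (2^32 - 1) * lo < 2^64, so now t <= 2p - 2
    if t >= P:               # one conditional subtraction is enough
        t -= P
    return t


def mulmod(a: int, b: int) -> int:
    """a * b mod p for 0 <= a, b < p."""
    return reduce128(a * b)
```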

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87629 - Posted: 3 Sep 2015 | 18:06:02 UTC - in response to Message 87627.

That was a typo on my part... I meant OCL 3. Sorry!

There is no error check, so maxError = 0 and the test runs to completion, but the result is not correct.
In the final version, I'm going to check whether p / 2 > n . b^2 during initialisation and, if it does not hold, set maxError=1.
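The check described above could look like the following sketch. `initial_max_error` is a hypothetical name, n is taken to be the transform length, and returning 1.0 on failure simply mirrors the "set maxError=1" wording; the real implementation may differ. With such a check, the 42598524^131072+1 run reported earlier would be flagged at start-up instead of silently producing a wrong residue.

```python
# Sketch of an initialisation-time check: report maxError = 1 when the
# exact-recovery condition p / 2 > n . b^2 fails (n = transform length).
P = 2**64 - 2**32 + 1


def initial_max_error(b: int, n: int) -> float:
    """0.0 if every convolution output is exactly recoverable, else 1.0."""
    # p / 2 > n * b^2  is equivalent to  2 * n * b^2 < p  for integers
    return 0.0 if 2 * n * b * b < P else 1.0


# 42598524^131072+1 is far beyond the n = 131072 limit; 984108 is a bench value.
print(initial_max_error(42598524, 131072))  # 1.0
print(initial_max_error(984108, 131072))    # 0.0
```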

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87630 - Posted: 3 Sep 2015 | 18:19:27 UTC - in response to Message 87621.

c:\temp>geneferocl3_windows.exe
This version of c:\temp\geneferocl3_windows.exe is not compatible with the version of Windows you're running. Check your computer's system information to see whether you need a x86 (32-bit) or x64 (64-bit) version of the program, and then contact the software publisher.

What's the trick to make it run on Win7 x64?

This message indicates that the binary was corrupted during the download.

Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87631 - Posted: 3 Sep 2015 | 18:33:51 UTC

FYI, our current plans are to produce a combined OCL/OCL2/OCL3 genefer app that will automatically switch to the appropriate transform. This will open up all the n ranges to continued GPU crunching.

Honza
Volunteer moderator
Volunteer tester
Project scientist

Joined: 15 Aug 05
Posts: 1952
ID: 352
Credit: 6,017,981,497
RAC: 1,586,802

Message 87633 - Posted: 3 Sep 2015 | 19:06:43 UTC
Last modified: 3 Sep 2015 | 19:11:07 UTC

280X
OCL: 115
OCL2: 13
OCL3: 49

2199064^8192+1 Time: 144 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 153 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 200 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 301 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 407 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 709 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.23 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 2.23 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 4.12 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 8.12 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 16.3 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 49.

Iain Bethune
Honorary cruncher

Joined: 28 Jan 09
Posts: 1588
ID: 34775
Credit: 194,056,043
RAC: 0

Message 87636 - Posted: 3 Sep 2015 | 19:19:45 UTC

Running on platform 'Apple', device 'ATI Radeon HD - FirePro D700 Compute Engine', version 'OpenCL 1.2 ' and driver '1.2 (Jun 10 2015 16:27:05)'

OCL: Genefer Mark = 36
OCL2: Still broken on Apple/AMD (Apple/Intel works though)
OCL3: Genefer Mark = 26

2199064^8192+1 Time: 269 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 356 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 372 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 403 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 473 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 866 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.91 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 4.43 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 7.92 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 16.2 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 30.1 ms/mul. Err: 0.0000 45879398 digits

Thanks Yves, this looks very interesting indeed!

- Iain

____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime!

eXaPower

Joined: 30 Sep 13
Posts: 122
ID: 259902
Credit: 1,636,946,772
RAC: 0

Message 87637 - Posted: 3 Sep 2015 | 19:38:58 UTC
Last modified: 3 Sep 2015 | 19:53:55 UTC

OCL3 benchmark:

GTX 970 @ 1.6GHz:

2199064^8192+1 Time: 71.8 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 66.3 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 70.4 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 91.8 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 179 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 316 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 666 us/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 1.19 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 2.28 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 4.57 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 9.58 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 88.

1540MHz:

2199064^8192+1 Time: 73.4 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 69.4 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 71.5 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 95.3 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 184 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 325 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 669 us/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 1.21 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 2.33 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 4.68 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 9.73 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 86.

GTX 750 @ 1451MHz:

2199064^8192+1 Time: 61.8 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 73.8 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 122 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 206 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 420 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 863 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.83 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 3.58 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 7.28 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 14.8 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 31.1 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 28.

OCL3 benchmark peaks:
970: (231W/85%MCU/60%BUS)
750: (75W/65%MCU/54%BUS)
OCL benchmark wattage peak: 145W (970) and 45W (750).

Ten OCed benchmark passes on each GPU didn't trigger a max error.
My 970 with ver. 1 of OCL2 would hit a max error above 1540 MHz, while ver. 2/3 tripped a max error above 1501 MHz.
The regular OCL benchmark hits a max error above 1565 MHz.

*Edit*: I just read Yves's posts stating there is no error check in OCL3.
There is no error check, so maxError = 0 and the test runs to completion, but the result is not correct.
In the final version, I'm going to check whether p / 2 > n . b^2 during initialisation and, if it does not hold, set maxError=1.

970's marks:
88 = OCL3
78 = OCL
23 = OCL2 ver.3

750 marks:

28 = OCL3
25 = OCL
12 = OCL2 ver.3

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 909
ID: 370496
Credit: 531,793,905
RAC: 407,603

Message 87638 - Posted: 3 Sep 2015 | 20:14:35 UTC

So, I just remembered that I had a notebook with a G105m on it, so I decided to give it a go:

OCL 1: No OpenCL 1.2 support, so the application just says that there's no compatible device.

Command line: geneferocl_windows -b

Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL"
No compatible OpenCL device found.
Device List:
0: GPU device 'GeForce G 105M' on 'NVIDIA CUDA'.

OCL 2: Another fail. Really, wasn't ver. 2 supposed to work with older hardware, due to its lower requirements? At any rate, here are the codes:
Command line: geneferocl2_windows -b

Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"

Running on platform 'NVIDIA CUDA', device 'GeForce G 105M', version 'OpenCL 1.0 CUDA' and driver '341.81'.

Error: build program failed.
ptxas application ptx input, line 115; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 116; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 117; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 118; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 119; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 120; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 122; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 123; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 124; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 125; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 160; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 161; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 162; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 163; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 164; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 165; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 167; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 168; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 169; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 170; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 219; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 220; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 221; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 222; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 223; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 224; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 226; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 227; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 228; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 229; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 261; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 262; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 263; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 264; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 265; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 266; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 268; error : Instruction 'mad.cc' requires .target sm_20 or higher
ptxas application ptx input, line 269; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 270; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, line 271; error : Instruction 'madc' requires .target sm_20 or higher
ptxas application ptx input, li
Error: OpenCL error detected: CL_INVALID_BINARY.

OCL 3: Kinda fail...? The program does start, and it actually completes a few numbers. BUT whenever it gets to the 8th test, it ALWAYS fails: the driver just crashes, period. I tried with 341.74 and 341.81 (the latest, August 24th release), but nope. Stock speeds, insanely low underclocks, the thing just fails every time at this particular test, whatever the reason may be!

This was done at stock 640/500/1600 MHz on core/mem/shaders:
Command line: geneferocl3_windows -b

Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"

Running on platform 'NVIDIA CUDA', device 'GeForce G 105M', version 'OpenCL 1.0 CUDA' and driver '341.81'.

2199064^8192+1 Time: 1.54 ms/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 3.46 ms/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 7.64 ms/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 16 ms/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 29.5 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 60.4 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 177 ms/mul. Err: 0.0000 3050541 digits
Error: OpenCL error detected: CL_INVALID_COMMAND_QUEUE.

And this was done at 500/400/1000 MHz:
Command line: geneferocl3_windows -b

Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"

Running on platform 'NVIDIA CUDA', device 'GeForce G 105M', version 'OpenCL 1.0 CUDA' and driver '341.81'.

2199064^8192+1 Time: 2.35 ms/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 5.18 ms/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 11.4 ms/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 23.8 ms/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 44.7 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 91.3 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 250 ms/mul. Err: 0.0000 3050541 digits
Error: OpenCL error detected: CL_INVALID_COMMAND_QUEUE.

Scott Brown
Volunteer moderator
Project administrator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2382
ID: 1178
Credit: 17,991,226,846
RAC: 12,693,967

Message 87640 - Posted: 3 Sep 2015 | 20:26:09 UTC

I get the same fail pattern with OCL3 on an 8400M GS with 285.xx driver. The tests compute fine up to n=19, but fail at n=20.

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87641 - Posted: 3 Sep 2015 | 21:06:04 UTC - in response to Message 87638.

So, I just remembered that I had a notebook with a G105m on it, so I decided to give it a go:

Error: OpenCL error detected: CL_INVALID_COMMAND_QUEUE.

A video Timeout Detection and Recovery occurred.
It's really slow, my integrated HD 4000 (i3-3217U CPU @ 1.80GHz) is 8 times faster!

Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4000', version 'OpenCL 1.2 ' and driver '10.18.10.4252'.
Running benchmarks for transform implementation "OCL3"
2199064^8192+1 Time: 451 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 622 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 1.08 ms/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 2.32 ms/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 4.47 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 9.24 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 20 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 41.7 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 85.5 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 183 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 373 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 2.

GeForce G 105M: Genefer Mark = 0.3 ...

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 909
ID: 370496
Credit: 531,793,905
RAC: 407,603

Message 87642 - Posted: 3 Sep 2015 | 21:13:18 UTC - in response to Message 87641.

So, I just remembered that I had a notebook with a G105m on it, so I decided to give it a go:

A video Timeout Detection and Recovery occurred.
It's really slow, my integrated HD 4000 (i3-3217U CPU @ 1.80GHz) is 8 times faster!

GeForce G 105M: Genefer Mark = 0.3 ...

Hey, at least the GPU was stuck at 99% usage the whole time, and the CPU core wasn't maxed out, so I guess the hardware was being better utilized. Not that I'd crunch with it anyway: PPS Sieve just causes immense lag AND overheating.

But since this is supposed to be beta testing, I figured I might as well try it out...

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87643 - Posted: 3 Sep 2015 | 21:25:21 UTC - in response to Message 87620.

Tried it on my trusty Radeon HD 4350. And.... nope, ain't working. See last quote for the error codes.

Radeon HD 4350 (failed to run).
Running on platform 'AMD Accelerated Parallel Processing', device 'ATI RV710', version 'OpenCL 1.0 AMD-APP (937.2)' and driver 'CAL 1.4.1734'.

Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
Error: build program failed.
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 216: warning: unknown attribute "vec_type_hint"
__kernel __attribute__((reqd_work_group_size(64 / 4 * BLK, 1, 1))) __attribute__((vec_type_hint(Zp)))

"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 344: warning: unknown attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(64 / 4 * BLK64, 1, 1))) __attribute__((vec_type_hint(Zp)))

That is why I'm writing "geneferocl3 can run on any GPU supporting 'OpenCL 1.0'."
vec_type_hint and work_group_size_hint are OpenCL 1.0 function qualifiers.
So this is not an OpenCL-capable driver/GPU.

Wingless Wonder

Joined: 25 Dec 12
Posts: 175
ID: 186381
Credit: 413,600,996
RAC: 128

Message 87644 - Posted: 3 Sep 2015 | 21:50:41 UTC
Last modified: 3 Sep 2015 | 21:51:53 UTC

geneferocl_windows benchmark runs to successful completion on the GTX TITAN Black from the command line.

OCL2 and OCL3 benchmarks default to the CPU's integrated graphics and will not run on the GTX TITAN Black from the command line. Am I using the correct binaries?

EDIT: I am running Windows 10 x64 GeForce driver 355.82.

Command line: geneferocl2_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"

Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4600',
version 'OpenCL 1.2 ' and driver '10.18.15.4256'.

geneferocl2 3.2.9-dev (Windows/OpenCL/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"
2199064^8192+1 Time: 336 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 649 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.21 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.18 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 4.7 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 9.68 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 20.4 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 43 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 93 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 177 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 380 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 2.
Priority change succeeded.
-------------------------------------------
Command line: geneferocl3_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"

Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4600',
version 'OpenCL 1.2 ' and driver '10.18.15.4256'.

geneferocl3 3.2.9-dev (Windows/OpenCL-NTT/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
2199064^8192+1 Time: 176 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 337 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 685 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 1.47 ms/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 2.91 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 6.05 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 11.9 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 24.9 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 53.5 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 116 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 225 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 4.
Priority change succeeded.

eXaPower

Joined: 30 Sep 13
Posts: 122
ID: 259902
Credit: 1,636,946,772
RAC: 0

Message 87645 - Posted: 3 Sep 2015 | 21:54:21 UTC

OCL3:
GT 650m @ 790MHz (355.58):

2199064^8192+1 Time: 107 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 185 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 340 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 635 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 1.33 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 2.72 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 5.74 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 11.6 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 23.4 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 49.1 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 104 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 8.

I-5 4440S @ 2.9GHz OpenCL 1.2 (Build148) Driver 4.2.0.148:

2199064^8192+1 Time: 438 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 876 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 1.78 ms/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 3.8 ms/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 7.85 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 16.4 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 34.5 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 73.1 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 153 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 316 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 664 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 1.

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87646 - Posted: 3 Sep 2015 | 21:59:28 UTC - in response to Message 87637.

970's marks:
88 = OCL3
78 = OCL
23 = OCL2 ver.3

750 marks:

28 = OCL3
25 = OCL
12 = OCL2 ver.3

The first version of OCL3 was not expected to run faster than OCL, but it does on Maxwell.
This is great news!

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87647 - Posted: 3 Sep 2015 | 22:08:05 UTC - in response to Message 87644.

geneferocl_windows benchmark runs to successful completion on the GTX TITAN Black from the command line.

OCL2 and OCL3 benchmarks default to the CPU's integrated graphics and will not run on the GTX TITAN Black from the command line. Am I using the correct binaries?

Yes, Intel's GPU device is the default one.

You can try the command line:

geneferocl3_windows.exe -nvidia -b

or

geneferocl3_windows.exe -d 1 -b

Wingless Wonder

Joined: 25 Dec 12
Posts: 175
ID: 186381
Credit: 413,600,996
RAC: 128

Message 87648 - Posted: 3 Sep 2015 | 22:28:29 UTC - in response to Message 87647.

Yes, Intel's GPU device is the default one.

You can try the command line:

geneferocl3_windows.exe -nvidia -b

Thank you, Yves. I tried the first command line and it works:

Command line: geneferocl_windows -nvidia -b
Running on platform 'NVIDIA CUDA', device 'GeForce GTX TITAN Black', version 'OpenCL 1.2 CUDA' and driver '355.82'.

geneferocl 3.2.8 (Windows/OpenCL/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL"
2199064^8192+1 Time: 50.8 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 53.7 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 60.1 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 78.5 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 102 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 209 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 359 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 673 us/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 1.26 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 2.56 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 5.23 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 159.
Priority change succeeded.
----------------------------------
geneferocl2_windows.exe -nvidia -b
geneferocl2 3.2.9-dev (Windows/OpenCL/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"
2199064^8192+1 Time: 116 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 122 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 164 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 253 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 477 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 842 us/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 1.58 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 3.15 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 6.92 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 13.7 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 27.7 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 30.
Priority change succeeded.
----------------------------------
geneferocl3_windows.exe -nvidia -b
geneferocl3 3.2.9-dev (Windows/OpenCL-NTT/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
2199064^8192+1 Time: 81.5 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 86.4 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 99.7 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 135 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 239 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 395 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 719 us/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 1.39 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 2.82 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 5.57 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 12.3 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 72.
Priority change succeeded.

mackerel
Volunteer tester

Joined: 2 Oct 08
Posts: 2639
ID: 29980
Credit: 568,393,769
RAC: 1,834

Message 87649 - Posted: 3 Sep 2015 | 22:44:57 UTC

GTX 980 Ti

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 980 Ti', version 'OpenCL 1.2 CUDA' and driver '355.82'.

Running benchmarks for transform implementation "OCL"
2199064^8192+1 Time: 57.5 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 53.3 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 65.4 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 86.1 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 136 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 277 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 504 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.02 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 1.97 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 4.09 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 8.18 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 101.

Running benchmarks for transform implementation "OCL2"
2199064^8192+1 Time: 150 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 158 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 171 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 207 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 432 us/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 776 us/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 1.55 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 3.12 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 6.11 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 12.7 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 28.3 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 32.

Running benchmarks for transform implementation "OCL3"
2199064^8192+1 Time: 75.6 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 75.3 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 78.1 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 88.1 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 143 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 257 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 527 us/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 924 us/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 1.75 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 3.62 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 7.48 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 113.

GTS 450
Running on platform 'NVIDIA CUDA', device 'GeForce GTS 450', version 'OpenCL 1.1 CUDA' and driver '355.82'.

Running benchmarks for transform implementation "OCL"
2199064^8192+1 Time: 59.9 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 64.9 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 122 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 256 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 530 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 1.1 ms/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 2.16 ms/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 4.63 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 9.46 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 20.3 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 45 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 21.

Running benchmarks for transform implementation "OCL2"
2199064^8192+1 Time: 114 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 231 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 493 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 1.01 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 2 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 4.15 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 8.71 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 18.2 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 36.8 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 80.3 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 236 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 5.

Running benchmarks for transform implementation "OCL3"
2199064^8192+1 Time: 65.3 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 116 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 199 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 419 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 844 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 1.74 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 3.58 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 7.58 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 14.3 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 32 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 67.6 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 13.

GTX 560 Ti
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 560 Ti', version 'OpenCL 1.1 CUDA' and driver '355.82'.

Running benchmarks for transform implementation "OCL"
2199064^8192+1 Time: 63.2 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 62.6 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 67 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 133 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 272 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 561 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 1.1 ms/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 2.33 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 4.73 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 10.1 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 22.2 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 41.

Running benchmarks for transform implementation "OCL2"
2199064^8192+1 Time: 98.2 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 121 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 244 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 549 us/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 1.03 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 2.08 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 4.28 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 8.94 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 18.1 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 39.5 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 85.6 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 11.

Running benchmarks for transform implementation "OCL3"
2199064^8192+1 Time: 64.8 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 65.5 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 113 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 221 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 441 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 889 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.8 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 3.79 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 7.11 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 15.8 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 33.5 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 27.

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 909
ID: 370496
Credit: 531,793,905
RAC: 407,603

Message 87650 - Posted: 4 Sep 2015 | 2:33:48 UTC - in response to Message 87643.

Tried it on my trusty Radeon HD 4350. And.... nope, ain't working. See last quote for the error codes.

Radeon HD 4350 (failed to run).
Running on platform 'AMD Accelerated Parallel Processing', device 'ATI RV710', version 'OpenCL 1.0 AMD-APP (937.2)' and driver 'CAL 1.4.1734'.

Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
Error: build program failed.
"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 216: warning: unknown attribute "vec_type_hint"
__kernel __attribute__((reqd_work_group_size(64 / 4 * BLK, 1, 1))) __attribute__((vec_type_hint(Zp)))

"C:\Users\ADMINI~1\AppData\Local\Temp\OCL78D8.tmp.cl", line 344: warning: unknown attribute "work_group_size_hint"
__kernel __attribute__((work_group_size_hint(64 / 4 * BLK64, 1, 1))) __attribute__((vec_type_hint(Zp)))

That is why I'm writing "geneferocl3 can run on any GPU supporting 'OpenCL 1.0'."
vec_type_hint and work_group_size_hint are OpenCL 1.0 function qualifiers.
So this is not an OpenCL-capable driver/GPU.

That's odd. Both BOINC and GPU-Z say the card is OpenCL capable; web searching also told me that the 4350 had OpenCL 1.0 support. Yet it doesn't? I tried downloading some SDK packages, but they didn't seem to make the app work...

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87654 - Posted: 4 Sep 2015 | 8:30:03 UTC - in response to Message 87650.

That's odd. Both BOINC and GPU-Z say the card is OpenCL capable; web searching also told me that the 4350 had OpenCL 1.0 support. Yet it doesn't? I tried downloading some SDK packages, but they didn't seem to make the app work...

You can try "GPU Caps Viewer". Check Work Group size. The minimum value is 256 for genefer.
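For anyone who prefers a script to GPU Caps Viewer, here is a minimal sketch of the same check. It assumes the third-party pyopencl package (needed only by list_devices()); the helper names are hypothetical and not part of genefer, and the 256 threshold is simply the minimum quoted above.

```python
# Minimal sketch: compare each OpenCL device's maximum work-group size
# (CL_DEVICE_MAX_WORK_GROUP_SIZE) with the minimum of 256 quoted for genefer.
# pyopencl is an assumed third-party dependency, used only in list_devices().
GENEFER_MIN_WORK_GROUP = 256  # value quoted in this thread


def meets_work_group_minimum(max_work_group_size, minimum=GENEFER_MIN_WORK_GROUP):
    """Return True if a device's max work-group size is large enough."""
    return max_work_group_size >= minimum


def list_devices():
    """Print every OpenCL device with its max work-group size and a verdict."""
    import pyopencl as cl  # imported lazily so the helper above works without it

    for platform in cl.get_platforms():
        for device in platform.get_devices():
            size = device.max_work_group_size
            verdict = "OK" if meets_work_group_minimum(size) else "too small"
            print(f"{platform.name} / {device.name}: {size} ({verdict})")
```

Calling list_devices() prints one line per OpenCL device; a card reporting 128, like the Radeon HD 4350 discussed in this thread, would be flagged as too small.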

Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87655 - Posted: 4 Sep 2015 | 11:35:00 UTC

Yves,

I was experimenting with the behavior of OCL3 near the b limit.

In particular, I was testing at the n=2048 limit of 67108864. I wanted to see what OCL3 does when the limit is exceeded. More specifically, I wanted to see if 67108864 was the last good number, or the first bad number.

OCL3 produces a residue of FFFFFFFFFFFFFFFF for that number. Suspiciously, it looks like a bad result! However, the control test, running OCL2, also produces FFFFFFFFFFFFFFFF. So does the x87 genefer64 test. So that's the "correct" result. Is that one of the numbers genefer can't test?

What surprised me, however, is that OCL3 produced the same residue as OCL2 and x87 on 67108866, which is 2 beyond the limit.

Any thoughts on why it appears to work correctly beyond the limit?

____________
My lucky number is 75898^524288+1

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87658 - Posted: 4 Sep 2015 | 14:12:09 UTC - in response to Message 87655.
Last modified: 4 Sep 2015 | 14:28:59 UTC

In particular, I was testing at the n=2048 limit of 67108864.

Is that one of the numbers genefer can't test?

Yes, 67108864 = 2^26. It's a Fermat number.
The residue is a root of unity: it can be 1, -1, "7f7f7f7f7f7f7f7f", etc.
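A small-scale check of why such a b gives a degenerate residue: with b = 2^k, N = b^n+1 = 2^(kn)+1, so 2^(kn) ≡ -1 (mod N) and every power of 2 modulo N is a (2kn)-th root of unity. The sketch below assumes a base-2 Fermat test purely for illustration (genefer's actual base and residue encoding are not described in this thread) and uses n = 8 instead of n = 2048 so that it runs instantly.

```python
# Minimal sketch, assuming a base-2 Fermat test (an illustrative choice,
# not necessarily genefer's base). For b = 2**k, N = b**n + 1 = 2**(k*n) + 1,
# so 2**(k*n) == -1 (mod N): the powers of 2 mod N are (2*k*n)-th roots of
# unity, and so is the Fermat residue 2**(N-1) mod N.

def fermat_residue(b, n, base=2):
    """Return (base**(N-1) mod N, N) for N = b**n + 1."""
    N = b**n + 1
    return pow(base, N - 1, N), N


def is_root_of_unity(r, order, N):
    """True if r**order == 1 (mod N)."""
    return pow(r, order, N) == 1


k = 26            # b = 67108864 = 2**26, the value tested above
n = 8             # small stand-in for n = 2048
b = 2**k
r, N = fermat_residue(b, n)
print(r == 1)                             # False: not a "probable prime" result
print(is_root_of_unity(r, 2 * k * n, N))  # True: r is a 416th root of unity
```

Here r works out to -2^16 mod N: a root of unity rather than a meaningful PRP residue, which is the situation described above.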

Scott Brown
Volunteer moderator
Project administrator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2382
ID: 1178
Credit: 17,991,226,846
RAC: 12,693,967

Message 87659 - Posted: 4 Sep 2015 | 14:20:58 UTC - in response to Message 87654.

That's odd. Both BOINC and GPU-Z say the card is OpenCL capable; web searching also told me that the 4350 had OpenCL 1.0 support. Yet it doesn't? I tried downloading some SDK packages, but they didn't seem to make the app work...

You can try "GPU Caps Viewer". Check Work Group size. The minimum value is 256 for genefer.

That isn't the issue with the 8400M GS error. The work group size is reported by GPU Caps Viewer as 512.

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87661 - Posted: 4 Sep 2015 | 14:28:20 UTC - in response to Message 87659.

That isn't the issue with the 8400M GS error. The work group size is reported by GPU Caps Viewer as 512.

I think the 8400M GS error is a video Timeout Detection and Recovery (TDR): the time exceeds 500 ms/mul.
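The benchmark output is regular enough to scan automatically for slow sizes. A minimal sketch, assuming every result has the form shown in this thread ("b^n+1 Time: X us/mul." or "ms/mul."); the helper names are hypothetical, and the 500 ms figure is just the value quoted above, not a documented TDR setting.

```python
import re

# Matches genefer benchmark results such as
# "658332^524288+1 Time: 177 ms/mul. Err: 0.0000 3050541 digits".
# Only "us" and "ms" units are handled (the ones seen in this thread).
LINE_RE = re.compile(
    r"(?P<b>\d+)\^(?P<n>\d+)\+1 Time: (?P<t>[\d.]+) (?P<unit>us|ms)/mul\."
)

TDR_RISK_MS = 500.0  # threshold quoted in this thread, not an official limit


def parse_bench(text):
    """Yield (b, n, ms_per_mul) for every benchmark result found in text."""
    for m in LINE_RE.finditer(text):
        t = float(m.group("t"))
        ms = t / 1000.0 if m.group("unit") == "us" else t
        yield int(m.group("b")), int(m.group("n")), ms


def tdr_risks(text, limit_ms=TDR_RISK_MS):
    """Return the results whose time per multiplication exceeds limit_ms."""
    return [r for r in parse_bench(text) if r[2] > limit_ms]
```

Because finditer scans the whole string, this also works on posts where several results ran together on one line.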

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 909
ID: 370496
Credit: 531,793,905
RAC: 407,603

Message 87664 - Posted: 4 Sep 2015 | 18:27:33 UTC - in response to Message 87654.

That's odd. Both BOINC and GPU-Z say the card is OpenCL capable; web searching also told me that the 4350 had OpenCL 1.0 support. Yet it doesn't? I tried downloading some SDK packages, but they didn't seem to make the app work...

You can try "GPU Caps Viewer". Check Work Group size. The minimum value is 256 for genefer.

Oh... the Radeon 4350 is 128. I guess that explains it. Does that mean A: OpenCL IS supported, but the card lacks a few features; or B: OpenCL is not supported on the card at all? Sorry, I don't understand much about specs.

If it's the first, there should be an indication on the application page, in the readme or somewhere, saying that this card (and maybe its family?) isn't supported, even though it would seem to be.

Scott Brown
Volunteer moderator
Project administrator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2382
ID: 1178
Credit: 17,991,226,846
RAC: 12,693,967

Message 87665 - Posted: 4 Sep 2015 | 19:50:15 UTC - in response to Message 87664.

That's odd. Both BOINC and GPU-Z say the card is OpenCL capable; web searching also told me that the 4350 had OpenCL 1.0 support. Yet it doesn't? I tried downloading some SDK packages, but they didn't seem to make the app work...

You can try "GPU Caps Viewer". Check Work Group size. The minimum value is 256 for genefer.

Oh... the Radeon 4350 is 128. I guess that explains it. Does that mean A: OpenCL IS supported, but the card lacks a few features; or B: OpenCL is not supported on the card at all? Sorry, I don't understand much about specs.

If it's the first, there should be an indication on the application page, in the readme or somewhere, saying that this card (and maybe its family?) isn't supported, even though it would seem to be.

All ATI HD 4xxx series cards supported OCL (at least in Beta). However, OCL support for these cards may or may not be included in the latest drivers from AMD. AMD's drivers are very quirky at times, so you would need to play around some to make sure it is working.

That said, your 4350 appears to be below the minimum specs for the genefer app being tested.

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87667 - Posted: 4 Sep 2015 | 21:27:09 UTC - in response to Message 87664.

Oh... the Radeon 4350 is 128. I guess that explains it. Does that mean A: OpenCL IS supported, but the card lacks a few features; or B: OpenCL is not supported on the card at all? Sorry, I don't understand much about specs.

Both.

A: WORK_GROUP_SIZE: the minimum value required by the specification is 1, so the driver is compliant on that point (but that is not sufficient for the genefer app).
B:
Error: Requested compile size is bigger than the required workgroup size of 32 elements
Error: Creating kernel Forward64 failed!

The required workgroup size for Forward64 is 128. So the driver/compiler does not seem to be compliant with the OpenCL 1.0 specification.

Note that the Radeon 4350 is not in the list of conformant products: https://www.khronos.org/conformance/adopters/conformant-products#opencl.

mackerel
Volunteer tester

Joined: 2 Oct 08
Posts: 2639
ID: 29980
Credit: 568,393,769
RAC: 1,834

Message 87673 - Posted: 5 Sep 2015 | 11:19:37 UTC

User             Device              OCL  OCL2  OCL3
eXaPower         GT 650m               -     -     8
mackerel         GTS 450              21     5    13
mackerel         GTX 560 Ti           41    11    27
Michael Goetz    GTX 580              73    20    39
Yves Gallot      GTX 680              66    15    45
eXaPower         GTX 750              25    12    28
1998golfer       GTX 760              54    12    35
mackerel         GTX 960              44    12    49
eXaPower         GTX 970              78    23    88
Rafael           GTX 970              77     -    85
mackerel         GTX 980 Ti          101    32   113
Wingless Wonder  GTX Titan Black     159    30    72
Yves Gallot      HD Graphics 4000      -     -     2
Wingless Wonder  HD Graphics 4600      -     2     4
Iain Bethune     FirePro D700         36     -    26
Honza            R9 280X             115    13    49
mackerel         R9 280X             113    13    52

Since there is a lot of data, I thought I'd put the scores into a little table. In short, it looks like Maxwell does best with OCL3, and other NVIDIA and AMD cards do better with OCL.
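One way to check that conclusion is to divide each device's OCL3 mark by its OCL mark. A minimal sketch: the numbers are copied from the table (devices with both marks only, first listed row per model), and the MAXWELL set is an added label based on GPU generation, not part of the table.

```python
# Genefer Mark (OCL, OCL3) per device, copied from the table above.
# Only devices with both marks are included; first listed row per model.
MARKS = {
    "GTS 450": (21, 13),
    "GTX 560 Ti": (41, 27),
    "GTX 580": (73, 39),
    "GTX 680": (66, 45),
    "GTX 750": (25, 28),
    "GTX 760": (54, 35),
    "GTX 960": (44, 49),
    "GTX 970": (78, 88),
    "GTX 980 Ti": (101, 113),
    "GTX Titan Black": (159, 72),
    "R9 280X": (115, 49),
}

# Added label (not in the table): Maxwell-generation GeForce cards.
MAXWELL = {"GTX 750", "GTX 960", "GTX 970", "GTX 980 Ti"}


def ocl3_over_ocl(device):
    """Ratio of OCL3 to OCL mark; above 1.0 means OCL3 is faster."""
    ocl, ocl3 = MARKS[device]
    return ocl3 / ocl


for device in sorted(MARKS, key=ocl3_over_ocl, reverse=True):
    tag = "Maxwell" if device in MAXWELL else ""
    print(f"{device:16} {ocl3_over_ocl(device):5.2f} {tag}")
```

Every Maxwell card lands above 1.0 and every other card below it, which matches the summary.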

Wingless Wonder - I saw the following comment while looking up the Titan Black:
Double precision performance of the GTX Titan & GTX Titan Black is either 1/3 or 1/24 of single-precision performance depending on a user-selected configuration option in the driver that boosts single-precision performance if double-precision is set to 1/24 of single-precision performance

Assuming it still exists, it would be interesting to see what impact this option has on performance. And which setting was used for the earlier results?

Wingless Wonder

Joined: 25 Dec 12
Posts: 175
ID: 186381
Credit: 413,600,996
RAC: 128

Message 87678 - Posted: 5 Sep 2015 | 12:24:03 UTC - in response to Message 87673.

Wingless Wonder - I saw the following comment while looking up the Titan Black:
Double precision performance of the GTX Titan & GTX Titan Black is either 1/3 or 1/24 of single-precision performance depending on a user-selected configuration option in the driver that boosts single-precision performance if double-precision is set to 1/24 of single-precision performance

Assuming it still exists, it would be interesting to see what impact this option has on performance. And which setting was used for the earlier results?

By default, double-precision mode is off. Double-precision mode is switched on by the user through the NVIDIA Control Panel and was on for these OCL benchmarks. I leave the card in double-precision mode in normal use since it doesn't impact everyday tasks like web browsing, and only switch it off when gaming or benchmarking because it greatly impacts frame rate in those applications. For PrimeGrid PPS (Sieve), it is faster when double-precision mode is switched off.

When double-precision mode is selected through the NVIDIA Control Panel, it automatically reduces core and memory clock speeds for stability. For the OCL, OCL2, and OCL3 benchmarks the card ran at the default double-precision core and memory clock speeds, although the benchmark test is so short that the card never reached normal operating temperature, unlike when crunching regular Genefer short or World Record tasks. Clock speeds are reduced somewhat when the card warms up to 72 °C, which is the limit I set through software.

Roger
Volunteer developer
Volunteer tester

Joined: 27 Nov 11
Posts: 1138
ID: 120786
Credit: 268,668,824
RAC: 0

Message 87697 - Posted: 6 Sep 2015 | 5:00:37 UTC

7970 GHz
OCL: 109 (3.2.5-dev, rev 747)
OCL2: 3
OCL3: 38

2199064^8192+1 Time: 121 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 130 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 165 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 240 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 434 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 720 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.32 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 2.56 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 5.36 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 10.4 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 24.3 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 38.

Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87711 - Posted: 6 Sep 2015 | 19:02:43 UTC

Has anyone tried running OCL2 or OCL3 on an Intel HD integrated GPU?
____________
My lucky number is 75898^524288+1

Iain Bethune
Honorary cruncher

Joined: 28 Jan 09
Posts: 1588
ID: 34775
Credit: 194,056,043
RAC: 0

Message 87712 - Posted: 6 Sep 2015 | 19:21:41 UTC - in response to Message 87711.

Has anyone tried running OCL2 or OCL3 on an Intel HD integrated GPU?

Yes, both of them work (on Mac). I'll post some timings tomorrow.

- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime!

Honza
Volunteer moderator
Volunteer tester
Project scientist

Joined: 15 Aug 05
Posts: 1952
ID: 352
Credit: 6,017,981,497
RAC: 1,586,802

Message 87742 - Posted: 7 Sep 2015 | 8:29:21 UTC
Last modified: 7 Sep 2015 | 8:31:42 UTC

Intel i5 4670 / HD4600

OCL: N/A
OCL2: 2
OCL3: 3

OCL2
Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4600', version 'OpenCL 1.2 ' and driver '10.18.14.4264'.

2199064^8192+1 Time: 397 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 758 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.46 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.53 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 5.28 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 11 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 23.7 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 49.8 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 108 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 207 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 436 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 2.

OCL3
Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4600', version 'OpenCL 1.2 ' and driver '10.18.14.4264'.

2199064^8192+1 Time: 213 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 383 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 796 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 1.65 ms/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 3.18 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 6.59 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 13.6 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 27.7 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 59.6 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 127 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 250 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 3.
Priority change succeeded.
____________
My stats

stream
Volunteer moderator
Project administrator
Volunteer developer
Volunteer tester

Joined: 1 Mar 14
Posts: 1022
ID: 301928
Credit: 543,195,386
RAC: 1

Message 87743 - Posted: 7 Sep 2015 | 10:55:46 UTC - in response to Message 87742.

Intel i5 4670 / HD4600

Did it pass the tests of known primes? Intel/Linux is surely out of the question for now: the current Beignet failed at least two tests (it reported as composite numbers that should be prime).

Btw Yves, could you make a big fat "!!! ERROR - ERROR - ERROR !!!" message if a built-in test produces a wrong residue? Something that will catch the eye in this long, boring, scrolling output.
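A banner like the one requested could be as simple as a wrapper that scans Genefer's output for a known prime reported as composite. Here is a minimal sketch. It assumes only the result-line format visible in the logs in this thread. `check_known_prime` and the banner text are hypothetical and are not an existing Genefer option.

```python
import re
from typing import Optional

# Result-line format as it appears in the logs in this thread, e.g.
# "2485064^4096+1 is composite. (RES=15aa723e56f21e42) (26196 digits) ..."
RESULT_RE = re.compile(r"^(\d+\^\d+\+1) is (a probable prime|composite)\.")


def check_known_prime(line: str) -> Optional[str]:
    """Return an eye-catching banner if a known prime is reported composite.

    Assumes every number fed through this check is a known prime, so a
    'composite' verdict points to a broken transform or faulty hardware.
    Lines that are not result lines are ignored (None is returned).
    """
    match = RESULT_RE.match(line.strip())
    if match is None:
        return None
    number, verdict = match.groups()
    if verdict == "composite":
        return f"!!! ERROR - ERROR - ERROR !!! known prime {number} reported as composite"
    return None
```

Each output line would be passed through the check, and any non-None result would be printed prominently or would stop the run.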

Honza
Volunteer moderator
Volunteer tester
Project scientist

Joined: 15 Aug 05
Posts: 1952
ID: 352
Credit: 6,017,981,497
RAC: 1,586,802

Message 87744 - Posted: 7 Sep 2015 | 11:11:34 UTC - in response to Message 87743.
Last modified: 7 Sep 2015 | 11:55:14 UTC

Intel i5 4670 / HD4600

Did it pass the tests of known primes?

Yes, it did.
Or is there any particular prime you want me to test?

3149688^32768+1 is a probable prime. (212936 digits) (err = 0.0000) (time = 0:10:42)
3966304^65536+1 is a probable prime. (432432 digits) (err = 0.0000) (time = 0:42:56)
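The benchmark's per-multiplication times can be turned into a rough wall-clock estimate. A probable-prime (Fermat) test of N = b^n+1 computes a^(N-1) mod N. Because N-1 = b^n, that exponentiation needs about log2(b^n) = n*log2(b) modular squarings. The sketch below assumes the run time is dominated by those squarings at the benchmark's us/mul rate. It ignores checkpoints, error checks, and the fact that the benchmark b differs from the b being tested. `estimated_prp_seconds` is a hypothetical helper.

```python
import math


def estimated_prp_seconds(b: int, n: int, us_per_mul: float) -> float:
    """Rough run time of a Fermat PRP test of b^n + 1.

    Assumes about n*log2(b) squarings (one per bit of the exponent b^n),
    each costing the benchmark's time per multiplication.
    """
    squarings = n * math.log2(b)
    return squarings * us_per_mul * 1e-6


if __name__ == "__main__":
    # HD 4600 OCL3 benchmark above: 796 us/mul at n = 32768
    secs = estimated_prp_seconds(3149688, 32768, 796)
    print(f"about {secs / 60:.1f} minutes")
```

With the HD 4600 OCL3 figure of 796 us/mul at n = 32768, this gives roughly 9.4 minutes for 3149688^32768+1. That is the same order as the 10:42 reported above. The post does not say which transform produced that run.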


Iain Bethune
Honorary cruncher

Joined: 28 Jan 09
Posts: 1588
ID: 34775
Credit: 194,056,043
RAC: 0

Message 87750 - Posted: 7 Sep 2015 | 13:19:38 UTC - in response to Message 87712.

Has anyone tried running OCL2 or OCL3 on an Intel HD integrated GPU?

Yes, both of them work (on Mac). I'll post some timings tomorrow.

- Iain

Intel i7-4750HQ @ 2.00GHz / Iris Pro

Running benchmarks for transform implementation "OCL2"
Running on platform 'Apple', device 'Iris Pro', version 'OpenCL 1.2 ' and driver '1.2(Jul 29 2015 02:40:39)'.
2199064^8192+1 Time: 441 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 573 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.34 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 1.78 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 4.57 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 8.24 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 21.1 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 37.7 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 101 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 132 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 328 ms/mul. Err: 0.0001 45879398 digits
Genefer Mark = 3.

Running benchmarks for transform implementation "OCL3"
Running on platform 'Apple', device 'Iris Pro', version 'OpenCL 1.2 ' and driver '1.2(Jul 29 2015 02:40:39)'.
2199064^8192+1 Time: 163 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 229 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 443 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 838 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 1.74 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 3.47 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 7.18 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 15.1 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 31.7 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 67.1 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 139 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 6.


stream
Volunteer moderator
Project administrator
Volunteer developer
Volunteer tester

Joined: 1 Mar 14
Posts: 1022
ID: 301928
Credit: 543,195,386
RAC: 1

Message 87751 - Posted: 7 Sep 2015 | 13:22:10 UTC - in response to Message 87744.

Intel i5 4670 / HD4600

Did it pass the tests of known primes?

Yes, it did.

Nice. But considering the speed I see nothing useful except "science interest" in making it run.

Or is there any particular prime you want me to test?

I think it's not necessary. It's clear that Beignet's codegen still has a lot of bugs. In particular, these two tests failed on my Haswell:

2485064^4096+1 is composite. (RES=15aa723e56f21e42) (26196 digits) (err = 0.0000) (time = 0:01:01) 00:38:08
2030234^8192+1 is a probable prime. (51672 digits) (err = 0.0000) (time = 0:02:42) 00:40:50
1651902^16384+1 is composite. (RES=7bb203c965b8e34d) (101876 digits) (err = 0.0000) (time = 0:11:14) 00:52:04

And I aborted the following tests because they were projected to take too long to complete.

Generalized Fermat Number Bench
2199064^8192+1 Time: 952 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 2.02 ms/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 4.01 ms/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 8.59 ms/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 10.1 ms/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 21.4 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 43.8 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 97 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 191 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 415 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 740 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 1.

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 909
ID: 370496
Credit: 531,793,905
RAC: 407,603

Message 87760 - Posted: 7 Sep 2015 | 17:53:16 UTC - in response to Message 87751.

But considering the speed I see nothing useful except "science interest" in making it run.

I disagree. MANY PCs have an idle iGPU; individually they are slow, but with many of them working, it's going to be better than a bunch of high-end GPUs (distributed computing inception here). Any speedup is appreciated, especially if we consider Michael's proposal for the Prime Field of Dreams (GFN 32768), or really any other search or double check with very short tasks. I imagine an iGPU could handle those without much problem.

Not to mention, now that we have variable deadlines, we can configure GFN to extend deadlines for GPUs that are crunching slowly but surely. I don't think it's implemented yet, but I'm sure it could be done.

Michael Goetz
Volunteer moderator
Project administrator

Joined: 21 Jan 10
Posts: 13958
ID: 53948
Credit: 393,418,261
RAC: 198,590

Message 87768 - Posted: 7 Sep 2015 | 21:54:15 UTC - in response to Message 87760.

But considering the speed I see nothing useful except "science interest" in making it run.

I disagree. MANY PCs have an idle iGPU; individually they are slow, but with many of them working, it's going to be better than a bunch of high-end GPUs (distributed computing inception here). Any speedup is appreciated, especially if we consider Michael's proposal for the Prime Field of Dreams (GFN 32768), or really any other search or double check with very short tasks. I imagine an iGPU could handle those without much problem.

Not to mention, now that we have variable deadlines, we can configure GFN to extend deadlines for GPUs that are crunching slowly but surely. I don't think it's implemented yet, but I'm sure it could be done.

Since I'm the one who installed the ill-fated Android Sieve app, I'm not going to come out and say it's not worthwhile to use the large number of integrated GPUs. But I will say that overall, despite their greater numbers, they're pretty slow and don't make a big difference.

It comes down to how much effort it takes to set them up. The good news is that OpenCL is OpenCL is OpenCL, and the same binary executable will run on an Nvidia GPU, an AMD GPU, an integrated GPU, or even on the CPU itself. I don't even have to build different executables to run on an integrated GPU.

That's why you see people saying "I tested it and it seems to work". You can run the exact same binary on any type of GPU. That makes our job a lot easier.

The harder part is getting the BOINC server to play nice with the integrated GPUs. I'm not sure how much work is involved. This is a matter of my development time, figuring out what's required and then making it happen. If a BOINC server code upgrade is required, then we're also introducing the risk of breaking something when we upgrade.

So it comes down to how much effort we are willing to put into making it work. Probably not a lot in the beginning, but maybe that will change. First, however, we've got to get everything else up and running.


Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87804 - Posted: 8 Sep 2015 | 16:25:26 UTC

A "release candidate" of "geneferocl3" is available from Genefer repository.

It performs a bit faster.

An error check was added to OCL3 (for each coefficient of the transform we should have |c_i| <= n*(b-1)^2).
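Here is where that bound comes from. Write each operand as n base-b digits in [0, b-1]. Multiplying modulo b^n+1 is then a negacyclic convolution: x^n wraps around to -1, because b^n ≡ -1. Each output coefficient is therefore a signed sum of exactly n digit products, each at most (b-1)^2 in size. Below is a minimal schoolbook sketch. It is not Genefer's transform code (OCL3 computes the convolution with an NTT), and all function names are hypothetical. It also shows that the bound is tight: when every digit is b-1, the top coefficient equals n*(b-1)^2 exactly.

```python
def negacyclic_mul(x: list[int], y: list[int]) -> list[int]:
    """Schoolbook product of two digit vectors modulo X^n + 1.

    A term landing at position k >= n wraps to k - n with its sign flipped,
    because X^n = -1. Each output coefficient is a sum of exactly n products.
    """
    n = len(x)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] += x[i] * y[j]
            else:
                c[k - n] -= x[i] * y[j]
    return c


def to_digits(v: int, b: int, n: int) -> list[int]:
    """Little-endian base-b digits of v, assuming 0 <= v < b**n."""
    digits = []
    for _ in range(n):
        v, r = divmod(v, b)
        digits.append(r)
    return digits


def from_coeffs(c: list[int], b: int) -> int:
    """Evaluate sum(c_i * b^i) and reduce modulo b^n + 1 (all carries at once)."""
    n = len(c)
    return sum(ci * b**i for i, ci in enumerate(c)) % (b**n + 1)


def within_bound(c: list[int], b: int) -> bool:
    """The sanity check quoted above: |c_i| <= n*(b-1)^2 for every coefficient."""
    limit = len(c) * (b - 1) ** 2
    return all(abs(ci) <= limit for ci in c)


if __name__ == "__main__":
    b, n = 10, 8
    x = to_digits(12345678, b, n)
    c = negacyclic_mul(x, x)
    assert from_coeffs(c, b) == 12345678**2 % (b**n + 1)
    assert within_bound(c, b)
    # Worst case: every digit is b-1; the largest |c_i| hits the bound exactly.
    worst = negacyclic_mul([b - 1] * n, [b - 1] * n)
    print(max(abs(ci) for ci in worst), n * (b - 1) ** 2)
```

A coefficient outside the bound cannot come from a correct convolution of valid digits, which is why the check can flag hardware or overclocking errors.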

Testing 67108864^2048+1...
Using OCL3 transform
67108864^2048+1 is composite. (RES=ffffffffffffffff) (16030 digits) (err = 0.0000)
Testing 67108866^2048+1...
Using OCL3 transform
maxErr exceeded for 67108866^2048+1, 1.0000 > 0.4500 during final check

"maxErr" is a boolean. The test may detect hardware errors. I think that eXaPower can check if it works...

Yves

Scott Brown
Volunteer moderator
Project administrator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2382
ID: 1178
Credit: 17,991,226,846
RAC: 12,693,967

Message 87806 - Posted: 8 Sep 2015 | 16:30:45 UTC - in response to Message 87804.
Last modified: 8 Sep 2015 | 16:35:13 UTC

Looks good on a GTX 660 OEM on Win7. Great work, Yves!

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 660', version 'OpenCL 1.1 CUDA' and driver '347.52'.
2199064^8192+1 Time: 63.2 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 66.8 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 98 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 193 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 374 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 738 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.47 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 3.01 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 6.05 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 12.6 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 27.4 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 33.
Priority change succeeded.

Edit: and that is just a bit faster than the previous OCL3 version that had a Genefer Mark = 31 on this card.

Honza
Volunteer moderator
Volunteer tester
Project scientist

Joined: 15 Aug 05
Posts: 1952
ID: 352
Credit: 6,017,981,497
RAC: 1,586,802

Message 87807 - Posted: 8 Sep 2015 | 16:35:05 UTC - in response to Message 87633.
Last modified: 8 Sep 2015 | 16:35:23 UTC

280X
OCL: 115
OCL2: 13
OCL3: 49
OCL3 RC: 52, 5-10% faster than the previous OCL3

2199064^8192+1 Time: 112 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 121 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 183 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 262 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 374 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 643 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.1 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 2.1 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 3.95 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 7.77 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 15.7 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 52.

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 909
ID: 370496
Credit: 531,793,905
RAC: 407,603

Message 87809 - Posted: 8 Sep 2015 | 16:54:54 UTC
Last modified: 8 Sep 2015 | 17:20:06 UTC

On my GTX 970 (1499 MHz), driver 355.82, OCL3 went from 85 to 88. Nice boost! Still nowhere close to the 280X's 115 on OCL, but hey, at least it's something...

OCL: 77
OCL2: 23
OCL3 RC: 88

Command line: geneferocl3_windows -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2 CUDA' and driver '355.82'.
2199064^8192+1 Time: 58.6 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 64.2 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 68 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 90.5 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 170 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 313 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 597 us/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 1.18 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 2.32 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 4.64 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 9.33 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 88.
Priority change succeeded.

Also, some more overclocking results. From testing with Folding@home, GPUGRID and POEM@HOME, the maximum rock-solid OC I can get out of my 970 is around 1505 MHz. Interestingly enough, PrimeGrid seems to be a lot more stable than those, as I seem to be able to boost the card further without crashes (Edit: just had a crash at 1534 MHz while testing). I usually just run the card at the "be-all, end-all" 1506 MHz OC, but for the sake of testing, I boosted it further:

OCL 2 (v3) gets a 0.500 error in
1203210^65536+1 Time: 211 us/mul. Err: 0.5000 398482 digits

At 1535 MHz and above. 1534 MHz did pass without the error, but the system hard-crashed a little later, so take it with a grain of salt. All other numbers seem fine, though.

OCL 3 (RC) gets a 1.000 error in
1471094^32768+1 Time: 65.8 us/mul. Err: 1.0000 202102 digits
804904^262144+1 Time: 300 us/mul. Err: 1.0000 1548156 digits

At 1570 MHz and above. 1569 MHz and below showed no problem. At 1569 MHz, the GFN score went to 91, vs 88 at 1.5 GHz, so that's another very nice boost, if the card is stable for this particular app.

Honza
Volunteer moderator
Volunteer tester
Project scientist

Joined: 15 Aug 05
Posts: 1952
ID: 352
Credit: 6,017,981,497
RAC: 1,586,802

Message 87810 - Posted: 8 Sep 2015 | 17:10:15 UTC - in response to Message 87809.
Last modified: 8 Sep 2015 | 17:11:02 UTC

On my GTX 970 (1499 MHz), driver 355.82, OCL3 went from 85 to 88. Nice boost! Still nowhere close to the 280X's 115 on OCL, but hey, at least it's something...

OCL3 does great on Maxwell cards compared to OCL, both performance-wise and b-limit-wise.
And taking into account power usage of GTX970, it's a nice match for OCL3.

Looking forward to seeing how the Fury (Nano) will do...

Rafael
Volunteer tester

Joined: 22 Oct 14
Posts: 909
ID: 370496
Credit: 531,793,905
RAC: 407,603

Message 87811 - Posted: 8 Sep 2015 | 17:17:16 UTC - in response to Message 87810.

On my GTX 970 (1499 MHz), driver 355.82, OCL3 went from 85 to 88. Nice boost! Still nowhere close to the 280X's 115 on OCL, but hey, at least it's something...

OCL3 does great on Maxwell cards compared to OCL, both performance-wise and b-limit-wise.
And taking into account power usage of GTX970, it's a nice match for OCL3.

Looking forward to seeing how the Fury (Nano) will do...

Come to think of it... I could get my wattmeter and compare power consumption between OCL tests at different OCs for a points-per-watt comparison, if anyone is interested.

eXaPower

Joined: 30 Sep 13
Posts: 122
ID: 259902
Credit: 1,636,946,772
RAC: 0

Message 87813 - Posted: 8 Sep 2015 | 18:50:37 UTC
Last modified: 8 Sep 2015 | 19:50:23 UTC

OCL3 RC:

GTX 970 @ 1555MHz:

2199064^8192+1 Time: 69.1 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 65.9 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 70.4 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 91.8 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 175 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 310 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 599 us/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 1.16 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 2.26 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 4.49 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 9.39 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 90.

*Edit*: During the benchmark, only n=17 and n=18 hit the max error from 1620 down to 1560 MHz. Below 1559 MHz, no errors.
I turned the dial up to 1637 MHz and hit the max error on all tests, though it still scored a mark of 93. This is the highest Genefer overclock I've had on my 970 without a driver crash. The well-designed OCL3 program should handle reasonable overclocks on Maxwell. OCL3 performance scales decently with overclocking, as the overclocked 970's marks show.

GTX 750 @ 1412MHz:
2199064^8192+1 Time: 62.3 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 72.1 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 117 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 200 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 412 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 836 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 1.67 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 3.39 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 6.91 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 14.1 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 30.1 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 29.

Driver crash at 1451 MHz. n=19 and n=20 hit max errors from 1438 down to 1420 MHz.

mackerel
Volunteer tester

Joined: 2 Oct 08
Posts: 2639
ID: 29980
Credit: 568,393,769
RAC: 1,834

Message 87817 - Posted: 8 Sep 2015 | 20:47:14 UTC

GTX 980 Ti

Running on platform 'NVIDIA CUDA', device 'GeForce GTX 980 Ti', version 'OpenCL 1.2 CUDA' and driver '355.82'.
Running benchmarks for transform implementation "OCL3"
2199064^8192+1 Time: 80.2 us/mul. Err: 0.0000 51956 digits
1798620^16384+1 Time: 77.3 us/mul. Err: 0.0000 102481 digits
1471094^32768+1 Time: 79.5 us/mul. Err: 0.0000 202102 digits
1203210^65536+1 Time: 86.7 us/mul. Err: 0.0000 398482 digits
984108^131072+1 Time: 137 us/mul. Err: 0.0000 785521 digits
804904^262144+1 Time: 244 us/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 456 us/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 893 us/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 1.72 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 3.46 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 7.16 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 117.

Was 113 on previous version.

Tyler
Project administrator
Volunteer tester

Joined: 4 Dec 12
Posts: 1078
ID: 183129
Credit: 1,376,122,338
RAC: 4,719

Message 87822 - Posted: 9 Sep 2015 | 2:35:55 UTC

Has someone built, or can someone build, the OCL3 executable for Linux?

Thanks
____________

275*2^3585539+1 is prime!!! (1079358 digits)

Proud member of Aggie the Pew

Roger
Volunteer developer
Volunteer tester

Joined: 27 Nov 11
Posts: 1138
ID: 120786
Credit: 268,668,824
RAC: 0

Message 87825 - Posted: 9 Sep 2015 | 3:45:21 UTC
Last modified: 9 Sep 2015 | 3:49:53 UTC

Laptop test:
>geneferocl2_windows_879.exe -b
geneferocl2 3.2.9-dev (Windows/OpenCL/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.

Command line: geneferocl2_windows_879.exe -b

Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL2"

Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4000', version 'OpenCL 1.1 ' and driver '9.17.10.3040'.

Error: build program failed.
fcl build 1 succeeded.
fcl build 2 succeeded.
Error: internal error.
Error: internal error.
Error: internal error.
Error: internal error.
Error: internal error.
Error: internal error.
Error: internal error.
Error: internal error.

Error: OpenCL error detected: CL_BUILD_PROGRAM_FAILURE.

>geneferocl3_windows_887.exe -b
geneferocl3 3.2.9-dev (Windows/OpenCL-NTT/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.

Command line: geneferocl3_windows_887.exe -b

Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL3"

Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4000', version 'OpenCL 1.1 ' and driver '9.17.10.3040'.

2199064^8192+1 Time: 460 us/mul. Err: 1.0000 51956 digits
1798620^16384+1 Time: 850 us/mul. Err: 1.0000 102481 digits
1471094^32768+1 Time: 1.87 ms/mul. Err: 1.0000 202102 digits
1203210^65536+1 Time: 3.99 ms/mul. Err: 1.0000 398482 digits
984108^131072+1 Time: 5.97 ms/mul. Err: 1.0000 785521 digits
804904^262144+1 Time: 12.9 ms/mul. Err: 0.0000 1548156 digits
658332^524288+1 Time: 28.6 ms/mul. Err: 0.0000 3050541 digits
538452^1048576+1 Time: 59.7 ms/mul. Err: 0.0000 6009544 digits
440400^2097152+1 Time: 139 ms/mul. Err: 0.0000 11836006 digits
360204^4194304+1 Time: 326 ms/mul. Err: 0.0000 23305854 digits
294612^8388608+1 Time: 524 ms/mul. Err: 0.0000 45879398 digits
Genefer Mark = 2.
Priority change succeeded.

Note the Err: 1.0000 on the five smallest benchmarks (n = 8192 to 131072).

>geneferocl3_windows_887.exe -l
geneferocl3 3.2.9-dev (Windows/OpenCL-NTT/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.

Command line: geneferocl3_windows_887.exe -l

Priority change succeeded.
Generalized Fermat Number b limits for transform implementation "OCL3"
m = 32, bMax = 536870912.
m = 64, bMax = 379625062.
m = 128, bMax = 268435456.
m = 256, bMax = 189812531.
m = 512, bMax = 134217728.
m = 1024, bMax = 94906266.
m = 2048, bMax = 67108864.
m = 4096, bMax = 47453133.
m = 8192, bMax = 33554432.
m = 16384, bMax = 23726566.
m = 32768, bMax = 16777216.
m = 65536, bMax = 11863283.
m = 131072, bMax = 8388608.
m = 262144, bMax = 5931642.
m = 524288, bMax = 4194304.
m = 1048576, bMax = 2965821.
m = 2097152, bMax = 2097152.
m = 4194304, bMax = 1482910.
m = 8388608, bMax = 1048576.
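The b limits above follow a simple pattern: bMax halves each time m grows by a factor of 4. Every listed value appears to equal round(sqrt(2^63 / m)), i.e. m*bMax^2 ≈ 2^63. That would fit the coefficient bound |c_i| <= n*(b-1)^2 from Yves's release-candidate post being held under 2^63. It would also fit the maxErr trip for 67108866^2048+1 in that post: 2048*67108865^2 just exceeds 2^63, while 2048*67108863^2 does not. This is an inference from the printed numbers only and is not taken from Genefer's source. `b_max`, `fits` and the 2^63 limit are assumptions.

```python
import math

# Assumed limit on n*(b-1)^2, inferred from the "-l" output above;
# this is not taken from Genefer's source.
ASSUMED_LIMIT = 2**63


def b_max(m: int) -> int:
    """Reproduce a listed bMax for transform length m as round(sqrt(2^63 / m))."""
    return round(math.sqrt(ASSUMED_LIMIT / m))


def fits(b: int, m: int) -> bool:
    """True if the worst-case coefficient m*(b-1)^2 stays within the assumed limit."""
    return m * (b - 1) ** 2 <= ASSUMED_LIMIT


if __name__ == "__main__":
    for k in range(5, 24):  # m = 32 ... 8388608, as in the listing
        m = 2**k
        print(f"m = {m}, bMax = {b_max(m)}.")
    # The pair from the release-candidate test: one passes, one trips maxErr.
    print(fits(67108864, 2048), fits(67108866, 2048))
```

Rounding rather than flooring is what matches the listing: for m = 1024, sqrt(2^53) ≈ 94906265.62, and the listed value is 94906266.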

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 803
ID: 164101
Credit: 305,704,038
RAC: 5,420

Message 87826 - Posted: 9 Sep 2015 | 7:59:45 UTC - in response to Message 87825.

Laptop test:
Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4000', version 'OpenCL 1.1 ' and driver '9.17.10.3040'.

Both run on my HD 4000 (i3-3217U CPU), but my Intel driver is more recent:

Running on platform 'Intel(R) OpenCL', device 'Intel(R) HD Graphics 4000', version 'OpenCL 1.2 ' and driver '10.18.10.4252'.

But there are some random errors during the tests.

Honza
Volunteer moderator
Volunteer tester
Project scientist

Joined: 15 Aug 05
Posts: 1952
ID: 352
Credit: 6,017,981,497
RAC: 1,586,802