Message boards : Generalized Fermat Prime Search : Genefer 3.3.3 testing
Hi folks,
Yves has been busy and we have just pushed out a new release of genefer for testing. What's new? On the CPU, expect ~10% speedups for the x87 transform (current n<=20 leading edge tests), especially on 64-bit machines. The vector transforms (sse2, avx, fma3...) have also been optimised with a focus on the large (n>=21) tests. It's hard to generate code that is faster on *all* CPU types, so let us know how it goes for you. There are no speed improvements for the OCL code.
You can download the binaries from here:
https://app.assembla.com/spaces/genefer/subversion/source/HEAD/trunk/bin
Ignore the comments that say 3.3.2 (my bad) - all of the binaries are the new version 3.3.3.
We will do a full test campaign to verify everything works on a range of platforms, but for now it would be interesting to see benchmark results from anyone that wants to give it a try.
Cheers
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime!
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1930 ID: 352 Credit: 5,463,422,052 RAC: 5,787,600
|
I took the production primegrid_genefer_3_3_2_3.16_windows_x86_64__cpuGFN15.exe and the 3.3.3 version of Genefer CPU mentioned above.
I ran the x87-specific benchmark on 3 different CPUs, all on Win10 x64.
CPU i5-3570K, version 3.3.2
Generalized Fermat Number benchmarks for transform implementation "x87 (80-bit)"
...
2688666^4096+1 Time: 131 us/mul. Err: 0.0001 26336 digits
2199064^8192+1 Time: 286 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 612 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.31 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.79 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 5.94 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 12.7 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 26.7 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 56.5 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 119 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 250 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 532 ms/mul. Err: 0.0001 45879398 digits
240964^16777216+1 Time: 1.1 s/mul. Err: 0.0001 90294174 digits
CPU i5-3570K, version 3.3.3
Generalized Fermat Number benchmarks for transform implementation "x87 (80-bit)"
...
2688666^4096+1 Time: 120 us/mul. Err: 0.0001 26336 digits
2199064^8192+1 Time: 260 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 560 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.2 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.55 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 5.44 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 11.7 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 24.9 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 53 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 112 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 235 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 495 ms/mul. Err: 0.0001 45879398 digits
240964^16777216+1 Time: 1.04 s/mul. Err: 0.0001 90294174 digits
CPU i5-6600, version 3.3.2
Generalized Fermat Number benchmarks for transform implementation "x87 (80-bit)"
...
2688666^4096+1 Time: 128 us/mul. Err: 0.0001 26336 digits
2199064^8192+1 Time: 277 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 600 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.28 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.75 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 5.85 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 12.5 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 26.1 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 56.9 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 116 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 243 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 512 ms/mul. Err: 0.0001 45879398 digits
240964^16777216+1 Time: 1.05 s/mul. Err: 0.0001 90294174 digits
CPU i5-6600, version 3.3.3
Generalized Fermat Number benchmarks for transform implementation "x87 (80-bit)"
...
2688666^4096+1 Time: 120 us/mul. Err: 0.0001 26336 digits
2199064^8192+1 Time: 254 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 547 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.18 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.51 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 5.37 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 11.4 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 23.8 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 50.5 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 106 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 223 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 469 ms/mul. Err: 0.0001 45879398 digits
240964^16777216+1 Time: 999 ms/mul. Err: 0.0001 90294174 digits
CPU i7-8700K, version 3.3.2
Generalized Fermat Number benchmarks for transform implementation "x87 (80-bit)"
...
2688666^4096+1 Time: 99.7 us/mul. Err: 0.0001 26336 digits
2199064^8192+1 Time: 214 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 462 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 988 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.11 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 4.47 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 9.47 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 20 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 42.5 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 88.7 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 187 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 388 ms/mul. Err: 0.0001 45879398 digits
240964^16777216+1 Time: 814 ms/mul. Err: 0.0001 90294174 digits
CPU i7-8700K, version 3.3.3
Generalized Fermat Number benchmarks for transform implementation "x87 (80-bit)"
...
2688666^4096+1 Time: 91.5 us/mul. Err: 0.0001 26336 digits
2199064^8192+1 Time: 199 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 427 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 914 us/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 1.94 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 4.13 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 8.77 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 18.6 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 39.6 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 82.8 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 175 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 366 ms/mul. Err: 0.0001 45879398 digits
240964^16777216+1 Time: 767 ms/mul. Err: 0.0001 90294174 digits
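For anyone who wants to turn listings like these into percentages, here is a minimal Python sketch. It assumes every line follows the `b^N+1 Time: <value> <unit>/mul.` layout seen in this thread; `parse_line` and `speedup_percent` are illustrative helper names and are not part of Genefer.

```python
import re

# Unit suffixes as printed in the benchmark output, converted to seconds.
UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}

# Matches e.g. "538452^1048576+1 Time: 42.5 ms/mul. Err: 0.0001 6009544 digits"
LINE_RE = re.compile(r"(\d+)\^(\d+)\+1\s+Time:\s+([\d.]+)\s+(ns|us|ms|s)/mul\.")


def parse_line(line):
    """Return (exponent N, seconds per multiplication) for one benchmark line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"not a benchmark line: {line!r}")
    exponent = int(m.group(2))
    seconds = float(m.group(3)) * UNIT_SECONDS[m.group(4)]
    return exponent, seconds


def speedup_percent(old_lines, new_lines):
    """Percent reduction in time per multiplication, keyed by exponent N."""
    old = dict(parse_line(line) for line in old_lines)
    new = dict(parse_line(line) for line in new_lines)
    return {n: 100.0 * (old[n] - new[n]) / old[n] for n in old if n in new}


# Three rows copied from the i7-8700K x87 listings above.
old_332 = [
    "2688666^4096+1 Time: 99.7 us/mul. Err: 0.0001 26336 digits",
    "538452^1048576+1 Time: 42.5 ms/mul. Err: 0.0001 6009544 digits",
    "240964^16777216+1 Time: 814 ms/mul. Err: 0.0001 90294174 digits",
]
new_333 = [
    "2688666^4096+1 Time: 91.5 us/mul. Err: 0.0001 26336 digits",
    "538452^1048576+1 Time: 39.6 ms/mul. Err: 0.0001 6009544 digits",
    "240964^16777216+1 Time: 767 ms/mul. Err: 0.0001 90294174 digits",
]

gains = speedup_percent(old_332, new_333)
for n, pct in sorted(gains.items()):
    print(f"N={n}: {pct:.1f}% less time per multiplication")
```

On these three rows the i7-8700K spends roughly 6% to 8% less time per multiplication with 3.3.3. The same helpers can be fed the full listings for the other CPUs.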
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186
|
Rafael Volunteer tester
Joined: 22 Oct 14 Posts: 905 ID: 370496 Credit: 459,496,650 RAC: 148,364
|
Does anyone have a link to the previous version? I'd love to test, but I really have no idea how to get the old release other than requesting a WU from BOINC and retrieving the program manually.
Speaking of which, it would be nice if "new version, test please" posts in general included links to both the old and the new versions. It would be pretty handy to have both in a single, convenient place.
|
|
Rafael wrote:
"Does anyone have a link to the previous version? I'd love to test, but I really have no idea how to get the old release other than requesting a WU from BOINC and retrieving the program manually.
Speaking of which, it would be nice if "new version, test please" posts in general included links to both the old and the new versions. It would be pretty handy to have both in a single, convenient place."
I believe these may be what you are looking for:
http://www.primegrid.com/download/primegrid_genefer_3_3_2_3.16_windows_x86_64__cpuGFN21.exe
http://www.primegrid.com/download/primegrid_genefer_3_3_2_3.16_x86_64-apple-darwin__cpuGFN21
http://www.primegrid.com/download/primegrid_genefer_3_3_2_3.16_x86_64-pc-linux-gnu__cpuGFN21
It is the same executable for all N values.
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
Rafael wrote:
"Speaking of which, it would be nice if "new version, test please" posts in general included links to both the old and the new versions. It would be pretty handy to have both in a single, convenient place."
That is a good suggestion.
____________
My lucky number is 75898^524288+1
|
|
I tested all the transforms on my i5-3317U running 64-bit Windows 10 Enterprise 2016 LTSB. It seems that the F64 transform was removed from 3.3.3, so I am omitting those results. In summary, AVX improved at some sizes and got slower at others; SSE2 and SSE4 got slower across the board; and x87 improved across the board.
Version 3.3.2:
Generalized Fermat Number benchmarks for transform implementation "AVX (Intel)"
10980642^32+1 Time: 128 ns/mul. Err: 0.1006 226 digits
8981090^64+1 Time: 242 ns/mul. Err: 0.1250 446 digits
7345652^128+1 Time: 1.46 us/mul. Err: 0.1719 879 digits
6008024^256+1 Time: 1.7 us/mul. Err: 0.1562 1736 digits
4913974^512+1 Time: 4.3 us/mul. Err: 0.1562 3427 digits
4019150^1024+1 Time: 5.14 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 15.5 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 21.2 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 65.5 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 94.8 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 288 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 432 us/mul. Err: 0.1406 398482 digits
984108^131072+1 Time: 1.27 ms/mul. Err: 0.1484 785521 digits
804904^262144+1 Time: 1.98 ms/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 6.22 ms/mul. Err: 0.1445 3050541 digits
538452^1048576+1 Time: 10.2 ms/mul. Err: 0.1367 6009544 digits
440400^2097152+1 Time: 30.8 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 51.4 ms/mul. Err: 0.1484 23305854 digits
294612^8388608+1 Time: 157 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 271 ms/mul. Err: 0.1250 90294174 digits
Genefer Mark = 7.
Generalized Fermat Number benchmarks for transform implementation "SSE2"
10980642^32+1 Time: 316 ns/mul. Err: 0.1250 226 digits
8981090^64+1 Time: 805 ns/mul. Err: 0.1875 446 digits
7345652^128+1 Time: 1.49 us/mul. Err: 0.1562 879 digits
6008024^256+1 Time: 2.55 us/mul. Err: 0.1562 1736 digits
4913974^512+1 Time: 5.01 us/mul. Err: 0.1562 3427 digits
4019150^1024+1 Time: 10.6 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 20.7 us/mul. Err: 0.1875 13347 digits
2688666^4096+1 Time: 45.8 us/mul. Err: 0.1875 26336 digits
2199064^8192+1 Time: 89.7 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 201 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 393 us/mul. Err: 0.1797 202102 digits
1203210^65536+1 Time: 881 us/mul. Err: 0.1719 398482 digits
984108^131072+1 Time: 1.71 ms/mul. Err: 0.1562 785521 digits
804904^262144+1 Time: 3.86 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 7.95 ms/mul. Err: 0.1562 3050541 digits
538452^1048576+1 Time: 17.9 ms/mul. Err: 0.1484 6009544 digits
440400^2097152+1 Time: 35.1 ms/mul. Err: 0.1562 11836006 digits
360204^4194304+1 Time: 87.5 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 170 ms/mul. Err: 0.1406 45879398 digits
240964^16777216+1 Time: 419 ms/mul. Err: 0.1250 90294174 digits
Genefer Mark = 5.
Generalized Fermat Number benchmarks for transform implementation "SSE4"
10980642^32+1 Time: 184 ns/mul. Err: 0.1250 226 digits
8981090^64+1 Time: 516 ns/mul. Err: 0.1875 446 digits
7345652^128+1 Time: 1.13 us/mul. Err: 0.1562 879 digits
6008024^256+1 Time: 1.86 us/mul. Err: 0.1562 1736 digits
4913974^512+1 Time: 3.73 us/mul. Err: 0.1562 3427 digits
4019150^1024+1 Time: 7.78 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 15.9 us/mul. Err: 0.1875 13347 digits
2688666^4096+1 Time: 34.8 us/mul. Err: 0.1875 26336 digits
2199064^8192+1 Time: 75.4 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 158 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 321 us/mul. Err: 0.1797 202102 digits
1203210^65536+1 Time: 700 us/mul. Err: 0.1719 398482 digits
984108^131072+1 Time: 1.42 ms/mul. Err: 0.1562 785521 digits
804904^262144+1 Time: 3.14 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 6.8 ms/mul. Err: 0.1562 3050541 digits
538452^1048576+1 Time: 15 ms/mul. Err: 0.1484 6009544 digits
440400^2097152+1 Time: 30.5 ms/mul. Err: 0.1562 11836006 digits
360204^4194304+1 Time: 76.7 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 151 ms/mul. Err: 0.1406 45879398 digits
240964^16777216+1 Time: 372 ms/mul. Err: 0.1250 90294174 digits
Genefer Mark = 6.
Generalized Fermat Number benchmarks for transform implementation "x87 (80-bit)"
10980642^32+1 Time: 811 ns/mul. Err: 0.0001 226 digits
8981090^64+1 Time: 1.75 us/mul. Err: 0.0001 446 digits
7345652^128+1 Time: 4 us/mul. Err: 0.0001 879 digits
6008024^256+1 Time: 8.34 us/mul. Err: 0.0001 1736 digits
4913974^512+1 Time: 18.1 us/mul. Err: 0.0001 3427 digits
4019150^1024+1 Time: 39.3 us/mul. Err: 0.0001 6763 digits
3287270^2048+1 Time: 85.1 us/mul. Err: 0.0001 13347 digits
2688666^4096+1 Time: 184 us/mul. Err: 0.0001 26336 digits
2199064^8192+1 Time: 400 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 858 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.83 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 3.9 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 8.38 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 17.8 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 37.7 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 80 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 169 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 355 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 748 ms/mul. Err: 0.0001 45879398 digits
240964^16777216+1 Time: 1.57 s/mul. Err: 0.0001 90294174 digits
Genefer Mark = 1.
Version 3.3.3:
Generalized Fermat Number benchmarks for transform implementation "AVX (Intel)"
4019150^1024+1 Time: 4.15 us/mul. Err: 0.1406 6763 digits
3287270^2048+1 Time: 8.9 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 20.1 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 41.2 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 90.2 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 187 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 404 us/mul. Err: 0.1328 398482 digits
984108^131072+1 Time: 847 us/mul. Err: 0.1289 785521 digits
804904^262144+1 Time: 1.89 ms/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 4.36 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 9.95 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 23.2 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 53.5 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 128 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 289 ms/mul. Err: 0.1094 90294174 digits
Genefer Mark = 8.
Generalized Fermat Number benchmarks for transform implementation "SSE2"
4019150^1024+1 Time: 15.5 us/mul. Err: 0.1406 6763 digits
3287270^2048+1 Time: 32.4 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 66.8 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 147 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 306 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 677 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 1.4 ms/mul. Err: 0.1328 398482 digits
984108^131072+1 Time: 3.03 ms/mul. Err: 0.1289 785521 digits
804904^262144+1 Time: 6.28 ms/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 13.7 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 28.2 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 61.2 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 128 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 279 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 579 ms/mul. Err: 0.1094 90294174 digits
Genefer Mark = 3.
Generalized Fermat Number benchmarks for transform implementation "SSE4"
4019150^1024+1 Time: 10.5 us/mul. Err: 0.1406 6763 digits
3287270^2048+1 Time: 22.2 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 46 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 108 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 227 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 519 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 1.08 ms/mul. Err: 0.1328 398482 digits
984108^131072+1 Time: 2.61 ms/mul. Err: 0.1289 785521 digits
804904^262144+1 Time: 5.02 ms/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 11.2 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 23.3 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 51.5 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 108 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 241 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 501 ms/mul. Err: 0.1094 90294174 digits
Genefer Mark = 4.
Generalized Fermat Number benchmarks for transform implementation "x87 (80-bit)"
10980642^32+1 Time: 681 ns/mul. Err: 0.0001 226 digits
8981090^64+1 Time: 1.54 us/mul. Err: 0.0001 446 digits
7345652^128+1 Time: 3.47 us/mul. Err: 0.0001 879 digits
6008024^256+1 Time: 7.43 us/mul. Err: 0.0001 1736 digits
4913974^512+1 Time: 16.1 us/mul. Err: 0.0001 3427 digits
4019150^1024+1 Time: 36.4 us/mul. Err: 0.0001 6763 digits
3287270^2048+1 Time: 78.1 us/mul. Err: 0.0001 13347 digits
2688666^4096+1 Time: 169 us/mul. Err: 0.0001 26336 digits
2199064^8192+1 Time: 366 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 785 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.68 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 3.57 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 7.64 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 16.4 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 35.1 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 74.6 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 158 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 332 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 698 ms/mul. Err: 0.0001 45879398 digits
240964^16777216+1 Time: 1.48 s/mul. Err: 0.0001 90294174 digits
Genefer Mark = 1.
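The first column in these listings has the form b^N+1 with N = 2^n, so the "n" used throughout the thread can be read off the exponent, and the digit count in the last column follows from the base-10 logarithm. A small Python sketch of both, with `gfn_n` and `gfn_digits` as hypothetical helper names (this is not how Genefer itself is known to compute them):

```python
import math


def gfn_n(exponent):
    """Return n such that exponent == 2**n (the GFN 'n' used in the thread)."""
    if exponent <= 0 or exponent & (exponent - 1):
        raise ValueError("exponent must be a power of two")
    return exponent.bit_length() - 1


def gfn_digits(base, exponent):
    """Decimal digits of base**exponent + 1, as floor(N * log10(b)) + 1.

    Double precision is an assumption here: it is adequate unless
    N * log10(b) happens to lie extremely close to an integer.
    """
    return math.floor(exponent * math.log10(base)) + 1


# Exponents where the listings above start, and the leading-edge size n=20.
for exponent in (32, 1024, 1048576):
    print(f"N={exponent} -> n={gfn_n(exponent)}")

# Two rows from the listings above: expected 226 and 26336 digits.
print(gfn_digits(10980642, 32))
print(gfn_digits(2688666, 4096))
```

Read this way, the 3.3.2 vector listings above start at n=5 (N=32), while the 3.3.3 AVX, SSE2 and SSE4 listings start at n=10 (N=1024).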
| |
|
|
Here are results from my X5675:
x87 3.3.2:
Generalized Fermat Number benchmarks for transform implementation "x87 (80-bit)"
1471094^32768+1 Time: 1.47 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 3.15 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 6.63 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 14.1 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 29.7 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 63.1 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 133 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 281 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 584 ms/mul. Err: 0.0001 45879398 digits
240964^16777216+1 Time: 1.24 s/mul. Err: 0.0001 90294174 digits
Genefer Mark = 2.
3.3.3:
1471094^32768+1 Time: 1.34 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.88 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 6.11 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 13 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 27.5 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 59.2 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 125 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 266 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 556 ms/mul. Err: 0.0001 45879398 digits
240964^16777216+1 Time: 1.17 s/mul. Err: 0.0001 90294174 digits
Genefer Mark = 2.
A good 10%+ speedup at lower n. Keeps some life in the old platform for non-AVX units.
SSE4 (gfn20+):
3.3.2
538452^1048576+1 Time: 12.1 ms/mul. Err: 0.1484 6009544 digits
440400^2097152+1 Time: 25.1 ms/mul. Err: 0.1562 11836006 digits
360204^4194304+1 Time: 55.9 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 114 ms/mul. Err: 0.1406 45879398 digits
240964^16777216+1 Time: 266 ms/mul. Err: 0.1250 90294174 digits
3.3.3
538452^1048576+1 Time: 19.7 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 42.9 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 88.7 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 193 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 397 ms/mul. Err: 0.1094 90294174 digits
Almost twice the time. Not a good result! (Also, if it matters, 3.3.2 starts SSE4/SSE2 benchmarks at n=5, 3.3.3 at n=10)
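To put a number on the slowdown, here is a quick Python sketch using the five SSE4 timings above (X5675, milliseconds per multiplication). The variable names are only for illustration.

```python
# SSE4 timings on the X5675 in ms/mul, keyed by n (the exponent is 2**n).
sse4_332 = {20: 12.1, 21: 25.1, 22: 55.9, 23: 114.0, 24: 266.0}
sse4_333 = {20: 19.7, 21: 42.9, 22: 88.7, 23: 193.0, 24: 397.0}

# A ratio above 1 means 3.3.3 takes longer than 3.3.2 at the same size.
ratios = {n: sse4_333[n] / sse4_332[n] for n in sse4_332}

for n, r in sorted(ratios.items()):
    print(f"n={n}: 3.3.3 takes {r:.2f}x the time of 3.3.2")

print(f"range: {min(ratios.values()):.2f}x to {max(ratios.values()):.2f}x")
```

On these figures 3.3.3 takes roughly 1.5x to 1.7x as long as 3.3.2 with SSE4 on this CPU.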
SSE2 (gfn20+)
3.3.2
538452^1048576+1 Time: 15.6 ms/mul. Err: 0.1484 6009544 digits
440400^2097152+1 Time: 31.2 ms/mul. Err: 0.1562 11836006 digits
360204^4194304+1 Time: 69.9 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 137 ms/mul. Err: 0.1406 45879398 digits
240964^16777216+1 Time: 329 ms/mul. Err: 0.1250 90294174 digits
3.3.3
538452^1048576+1 Time: 24.8 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 52.6 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 109 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 231 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 478 ms/mul. Err: 0.1094 90294174 digits
Ouch. Something not right?
On my 3930k:
x87
3.3.2:
1471094^32768+1 Time: 1.29 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.75 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 5.82 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 12.4 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 26.2 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 55.6 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 116 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 245 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 510 ms/mul. Err: 0.0001 45879398 digits
240964^16777216+1 Time: 1.07 s/mul. Err: 0.0001 90294174 digits
3.3.3:
1471094^32768+1 Time: 1.18 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.49 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 5.33 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 11.4 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 24.2 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 51.5 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 110 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 228 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 474 ms/mul. Err: 0.0001 45879398 digits
240964^16777216+1 Time: 978 ms/mul. Err: 0.0001 90294174 digits
A good result there.
AVX (20+):
3.3.2
538452^1048576+1 Time: 6.38 ms/mul. Err: 0.1367 6009544 digits
440400^2097152+1 Time: 17.6 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 29.4 ms/mul. Err: 0.1484 23305854 digits
294612^8388608+1 Time: 79.9 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 136 ms/mul. Err: 0.1250 90294174 digits
3.3.3
538452^1048576+1 Time: 5.89 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 12.6 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 28 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 61.1 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 135 ms/mul. Err: 0.1094 90294174 digits
Some speedup, some not.
SSE2:
3.3.2
538452^1048576+1 Time: 13.3 ms/mul. Err: 0.1484 6009544 digits
440400^2097152+1 Time: 25.9 ms/mul. Err: 0.1562 11836006 digits
360204^4194304+1 Time: 57.7 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 113 ms/mul. Err: 0.1406 45879398 digits
240964^16777216+1 Time: 251 ms/mul. Err: 0.1250 90294174 digits
3.3.3
538452^1048576+1 Time: 21.1 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 45.1 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 92.5 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 197 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 438 ms/mul. Err: 0.1094 90294174 digits
Another system with a serious slowdown on SSE2 (and SSE4, not going to post benchies).
____________
Eating more cheese on Thursdays.
|
Rafael Volunteer tester
Joined: 22 Oct 14 Posts: 905 ID: 370496 Credit: 459,496,650 RAC: 148,364
|
Here are some benchmarks for the i5 6600k, overclocked to 4100 MHz and with some high-speed RAM. Generally speaking, the new version is good if you're sure to use x87, but bad otherwise.
FMA3: both seem about the same, maybe with a small margin in favor of the old version.
Ver. 3.3.2
4019150^1024+1 Time: 2.01 us/mul. Err: 0.1602 6763 digits
3287270^2048+1 Time: 3.97 us/mul. Err: 0.1641 13347 digits
2688666^4096+1 Time: 8.29 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 17.5 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 36.6 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 78.8 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 166 us/mul. Err: 0.1504 398482 digits
984108^131072+1 Time: 363 us/mul. Err: 0.1562 785521 digits
804904^262144+1 Time: 753 us/mul. Err: 0.1523 1548156 digits
658332^524288+1 Time: 1.7 ms/mul. Err: 0.1562 3050541 digits
538452^1048576+1 Time: 3.64 ms/mul. Err: 0.1406 6009544 digits
440400^2097152+1 Time: 8.35 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 17.3 ms/mul. Err: 0.1328 23305854 digits
294612^8388608+1 Time: 40.3 ms/mul. Err: 0.1367 45879398 digits
Genefer Mark = 24.
Ver. 3.3.3
4019150^1024+1 Time: 1.97 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 4.12 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 8.64 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 18.5 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 38.5 us/mul. Err: 0.1406 102481 digits
1471094^32768+1 Time: 81.8 us/mul. Err: 0.1406 202102 digits
1203210^65536+1 Time: 172 us/mul. Err: 0.1289 398482 digits
984108^131072+1 Time: 366 us/mul. Err: 0.1406 785521 digits
804904^262144+1 Time: 779 us/mul. Err: 0.1250 1548156 digits
658332^524288+1 Time: 1.68 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 3.76 ms/mul. Err: 0.1172 6009544 digits
440400^2097152+1 Time: 8.27 ms/mul. Err: 0.1172 11836006 digits
360204^4194304+1 Time: 18.1 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 39.9 ms/mul. Err: 0.1104 45879398 digits
240964^16777216+1 Time: 86.9 ms/mul. Err: 0.1016 90294174 digits
Genefer Mark = 24.
AVX: marginal speedups overall (scored a 22 instead of a 21). Notice how the older version is consistently slower when "n" is odd and consistently faster when "n" is even... the new version loses a bit of speed with even n, but gains a lot more when n is odd, coming out on top when you compare the overall performance.
Ver. 3.3.2
4019150^1024+1 Time: 2.34 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 5.32 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 9.82 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 23.2 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 43.3 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 102 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 194 us/mul. Err: 0.1406 398482 digits
984108^131072+1 Time: 462 us/mul. Err: 0.1484 785521 digits
804904^262144+1 Time: 876 us/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 2.1 ms/mul. Err: 0.1445 3050541 digits
538452^1048576+1 Time: 4.13 ms/mul. Err: 0.1367 6009544 digits
440400^2097152+1 Time: 9.97 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 19.3 ms/mul. Err: 0.1484 23305854 digits
294612^8388608+1 Time: 46.8 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 90.7 ms/mul. Err: 0.1250 90294174 digits
Genefer Mark = 21.
Ver. 3.3.3
4019150^1024+1 Time: 2.2 us/mul. Err: 0.1406 6763 digits
3287270^2048+1 Time: 4.65 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 9.86 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 20.9 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 43.9 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 93 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 197 us/mul. Err: 0.1328 398482 digits
984108^131072+1 Time: 414 us/mul. Err: 0.1289 785521 digits
804904^262144+1 Time: 878 us/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 1.88 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 4.17 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 9.1 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 19.7 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 43.1 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 93.5 ms/mul. Err: 0.1094 90294174 digits
Genefer Mark = 22.
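The odd/even pattern can be checked directly from the two AVX listings above. A minimal Python sketch, with the timings copied by hand and converted to microseconds per multiplication (the dictionary names are illustrative):

```python
# AVX timings on the i5 6600k in us/mul, keyed by n (the exponent is 2**n).
avx_332 = {10: 2.34, 11: 5.32, 12: 9.82, 13: 23.2, 14: 43.3, 15: 102.0,
           16: 194.0, 17: 462.0, 18: 876.0, 19: 2100.0, 20: 4130.0,
           21: 9970.0, 22: 19300.0, 23: 46800.0, 24: 90700.0}
avx_333 = {10: 2.2, 11: 4.65, 12: 9.86, 13: 20.9, 14: 43.9, 15: 93.0,
           16: 197.0, 17: 414.0, 18: 878.0, 19: 1880.0, 20: 4170.0,
           21: 9100.0, 22: 19700.0, 23: 43100.0, 24: 93500.0}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# new/old time ratio: below 1 means 3.3.3 is faster at that n.
ratio = {n: avx_333[n] / avx_332[n] for n in avx_332}

odd_mean = mean(r for n, r in ratio.items() if n % 2 == 1)
even_mean = mean(r for n, r in ratio.items() if n % 2 == 0)

for n, r in sorted(ratio.items()):
    parity = "odd " if n % 2 else "even"
    print(f"n={n} ({parity}): new/old = {r:.3f}")
print(f"mean new/old, odd n:  {odd_mean:.3f}")
print(f"mean new/old, even n: {even_mean:.3f}")
```

On these figures 3.3.3 is about 10% faster at every odd n, and up to about 3% slower at even n from n=12 upward. n=10 is the one even size where 3.3.3 is also faster.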
SSE4: MAJOR slowdowns (7 vs 13).
Ver. 3.3.2
4019150^1024+1 Time: 4.35 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 8.93 us/mul. Err: 0.1875 13347 digits
2688666^4096+1 Time: 19.1 us/mul. Err: 0.1875 26336 digits
2199064^8192+1 Time: 39.3 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 85.4 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 175 us/mul. Err: 0.1797 202102 digits
1203210^65536+1 Time: 378 us/mul. Err: 0.1719 398482 digits
984108^131072+1 Time: 774 us/mul. Err: 0.1562 785521 digits
804904^262144+1 Time: 1.69 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 3.47 ms/mul. Err: 0.1562 3050541 digits
538452^1048576+1 Time: 7.61 ms/mul. Err: 0.1484 6009544 digits
440400^2097152+1 Time: 15.6 ms/mul. Err: 0.1562 11836006 digits
360204^4194304+1 Time: 34.3 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 69.6 ms/mul. Err: 0.1406 45879398 digits
240964^16777216+1 Time: 154 ms/mul. Err: 0.1250 90294174 digits
Genefer Mark = 13.
Ver. 3.3.3
4019150^1024+1 Time: 5.9 us/mul. Err: 0.1406 6763 digits
3287270^2048+1 Time: 12.4 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 25.9 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 60.2 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 125 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 287 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 598 us/mul. Err: 0.1328 398482 digits
984108^131072+1 Time: 1.34 ms/mul. Err: 0.1289 785521 digits
804904^262144+1 Time: 2.79 ms/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 6.18 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 12.8 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 28.1 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 57.7 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 125 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 257 ms/mul. Err: 0.1094 90294174 digits
Genefer Mark = 7.
SSE2: MAJOR slowdowns (6 vs 11)
Ver. 3.3.2
4019150^1024+1 Time: 5.26 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 11.4 us/mul. Err: 0.1875 13347 digits
2688666^4096+1 Time: 22.3 us/mul. Err: 0.1875 26336 digits
2199064^8192+1 Time: 48.3 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 97.3 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 210 us/mul. Err: 0.1797 202102 digits
1203210^65536+1 Time: 425 us/mul. Err: 0.1719 398482 digits
984108^131072+1 Time: 912 us/mul. Err: 0.1562 785521 digits
804904^262144+1 Time: 1.88 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 4.01 ms/mul. Err: 0.1562 3050541 digits
538452^1048576+1 Time: 8.36 ms/mul. Err: 0.1484 6009544 digits
440400^2097152+1 Time: 17.8 ms/mul. Err: 0.1562 11836006 digits
360204^4194304+1 Time: 37.4 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 78.5 ms/mul. Err: 0.1406 45879398 digits
Genefer Mark = 11.
Ver. 3.3.3
4019150^1024+1 Time: 8.19 us/mul. Err: 0.1406 6763 digits
3287270^2048+1 Time: 17.1 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 35.4 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 78.9 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 164 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 363 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 752 us/mul. Err: 0.1328 398482 digits
984108^131072+1 Time: 1.64 ms/mul. Err: 0.1289 785521 digits
804904^262144+1 Time: 3.4 ms/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 7.38 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 15.2 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 32.8 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 67.4 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 144 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 295 ms/mul. Err: 0.1094 90294174 digits
Genefer Mark = 6.
x87: A small speedup
Ver. 3.3.2
10980642^32+1 Time: 434 ns/mul. Err: 0.0001 226 digits
8981090^64+1 Time: 960 ns/mul. Err: 0.0001 446 digits
7345652^128+1 Time: 2.15 us/mul. Err: 0.0001 879 digits
6008024^256+1 Time: 4.86 us/mul. Err: 0.0001 1736 digits
4913974^512+1 Time: 10.8 us/mul. Err: 0.0001 3427 digits
4019150^1024+1 Time: 23.4 us/mul. Err: 0.0001 6763 digits
3287270^2048+1 Time: 51.3 us/mul. Err: 0.0001 13347 digits
2688666^4096+1 Time: 112 us/mul. Err: 0.0001 26336 digits
2199064^8192+1 Time: 241 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 519 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.11 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.37 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 5.03 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 10.7 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 22.5 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 47.6 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 99.5 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 209 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 436 ms/mul. Err: 0.0001 45879398 digits
240964^16777216+1 Time: 912 ms/mul. Err: 0.0001 90294174 digits
Genefer Mark = 2.
Ver. 3.3.3
10980642^32+1 Time: 371 ns/mul. Err: 0.0001 226 digits
8981090^64+1 Time: 835 ns/mul. Err: 0.0001 446 digits
7345652^128+1 Time: 1.92 us/mul. Err: 0.0001 879 digits
6008024^256+1 Time: 4.33 us/mul. Err: 0.0001 1736 digits
4913974^512+1 Time: 9.73 us/mul. Err: 0.0001 3427 digits
4019150^1024+1 Time: 21.6 us/mul. Err: 0.0001 6763 digits
3287270^2048+1 Time: 47.3 us/mul. Err: 0.0001 13347 digits
2688666^4096+1 Time: 103 us/mul. Err: 0.0001 26336 digits
2199064^8192+1 Time: 223 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 480 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.03 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 2.19 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 4.67 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 9.94 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 21 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 44.4 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 93.2 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 196 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 409 ms/mul. Err: 0.0001 45879398 digits
240964^16777216+1 Time: 854 ms/mul. Err: 0.0001 90294174 digits
Genefer Mark = 2. | |
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1930 ID: 352 Credit: 5,463,422,052 RAC: 5,787,600
                                   
|
i7-8700K, a bit of slowdown for FMA3 in 3.3.3.
CPU i7-8700K, version 3.3.2 FMA3
1471094^32768+1 Time: 70.5 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 150 us/mul. Err: 0.1504 398482 digits
984108^131072+1 Time: 329 us/mul. Err: 0.1562 785521 digits
804904^262144+1 Time: 679 us/mul. Err: 0.1523 1548156 digits
658332^524288+1 Time: 1.51 ms/mul. Err: 0.1562 3050541 digits
538452^1048576+1 Time: 3.17 ms/mul. Err: 0.1406 6009544 digits
440400^2097152+1 Time: 7.81 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 16.7 ms/mul. Err: 0.1328 23305854 digits
294612^8388608+1 Time: 39.2 ms/mul. Err: 0.1367 45879398 digits
240964^16777216+1 Time: 79.9 ms/mul. Err: 0.1289 90294174 digits
Genefer Mark = 26.
CPU i7-8700K, version 3.3.3 FMA3
1471094^32768+1 Time: 73 us/mul. Err: 0.1406 202102 digits
1203210^65536+1 Time: 155 us/mul. Err: 0.1289 398482 digits
984108^131072+1 Time: 329 us/mul. Err: 0.1406 785521 digits
804904^262144+1 Time: 702 us/mul. Err: 0.1250 1548156 digits
658332^524288+1 Time: 1.51 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 3.29 ms/mul. Err: 0.1172 6009544 digits
440400^2097152+1 Time: 7.69 ms/mul. Err: 0.1172 11836006 digits
360204^4194304+1 Time: 17.1 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 38.5 ms/mul. Err: 0.1104 45879398 digits
240964^16777216+1 Time: 87.1 ms/mul. Err: 0.1016 90294174 digits
Genefer Mark = 26.
i7-8700K, similar for AVX
CPU i7-8700K, version 3.3.2 AVX
1471094^32768+1 Time: 83.1 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 176 us/mul. Err: 0.1328 398482 digits
984108^131072+1 Time: 372 us/mul. Err: 0.1289 785521 digits
804904^262144+1 Time: 785 us/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 1.67 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 3.57 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 8.3 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 18.6 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 42.1 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 92 ms/mul. Err: 0.1094 90294174 digits
Genefer Mark = 24.
CPU i7-8700K, version 3.3.3 AVX
1471094^32768+1 Time: 83.3 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 176 us/mul. Err: 0.1328 398482 digits
984108^131072+1 Time: 373 us/mul. Err: 0.1289 785521 digits
804904^262144+1 Time: 793 us/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 1.67 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 3.62 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 8.35 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 18.5 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 41.4 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 93.2 ms/mul. Err: 0.1094 90294174 digits
Genefer Mark = 24.
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186 | |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 701 ID: 164101 Credit: 305,166,630 RAC: 523

|
"Generalized Fermat Prime Search benchmarks" are the most significant.
GFN15-GFN19 => x87.
x87 was written in C++ in ver 3.3.2 and is optimized in asm in 3.3.3, so it should be faster on every CPU.
GFN20 => x87 with 3.3.2. The error of the fast transforms in 3.3.3 (fma, avx, ...) is lower, so this version may still be able to test some n=20 candidates with them... but we are very close to the limit at b ~ 950000.
GFN21 is then the only n that uses the fma, avx, sse4, or sse2 transform. On a given processor, only the fastest transform is relevant. fma (i3/i5/i7 4th-8th gen) is expected to run at about the same speed, and avx (2nd-3rd gen) to be faster. I didn't check 1st gen (sse4); it seems to be a major slowdown.
I purchased a 7th gen i5 with a particular focus on fma transform on x-Lake processors, but it will take time before this version is available.
In production, ver 3.3.3 is an improvement for n <= 20.
One may wonder whether 3.3.2 or 3.3.3 is the more appropriate for n=21.
| |
|
|
"Generalized Fermat Prime Search benchmarks" are the most significant.
i.e. using the "-b2" option (this takes a little longer than the "-b" benchmark, but runs b values that are close to the current leading edge of the search).
For reference, the previous release binaries can be found here: https://app.assembla.com/spaces/genefer/subversion/source/HEAD/tags/3.3.2-7 (or from the download links posted above for the CPU versions).
Thanks for all the results so far!
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
"Generalized Fermat Prime Search benchmarks" are the most significant.
i.e. using the "-b2" option (this takes a little longer than the "-b" benchmark, but runs b values that are close to the current leading edge of the search).
For reference, the previous release binaries can be found here: https://app.assembla.com/spaces/genefer/subversion/source/HEAD/tags/3.3.2-7 (or from the download links posted above for the CPU versions).
Thanks for all the results so far!
- Iain
In that case, here are the -b2 results for the i5-3317U. Even with larger candidates, 3.3.3 seems to be faster.
Version 3.3.2:
Generalized Fermat Prime Search benchmarks
50000000^32768+1 252280 digits x87 (80-bit) Estimated time: 0:26:00
25000000^65536+1 484832 digits x87 (80-bit) Estimated time: 1:46:00
8000000^131072+1 904802 digits x87 (80-bit) Estimated time: 7:05:00
46000000^131072+1 1004373 digits x87 (80-bit) Estimated time: 7:51:00
2800000^262144+1 1690084 digits x87 (80-bit) Estimated time: 28:00:00
1500000^524288+1 3238051 digits x87 (80-bit) Estimated time: 117:00:00
900000^1048576+1 6243476 digits AVX (Intel) Estimated time: 60:40:00
130000^2097152+1 10724717 digits AVX (Intel) Estimated time: 313:00:00
100000^4194304+1 20971521 digits AVX (Intel) Estimated time: 1016:00:00
Version 3.3.3:
Generalized Fermat Prime Search benchmarks
75000000^32768+1 258051 digits x87 (80-bit) Estimated time: 0:24:20
27000000^65536+1 487022 digits x87 (80-bit) Estimated time: 1:37:00
10000000^131072+1 917505 digits x87 (80-bit) Estimated time: 6:38:00
48000000^131072+1 1006796 digits x87 (80-bit) Estimated time: 7:11:00
3600000^262144+1 1718696 digits x87 (80-bit) Estimated time: 26:20:00
1700000^524288+1 3266550 digits x87 (80-bit) Estimated time: 107:00:00
950000^1048576+1 6268098 digits AVX (Intel) Estimated time: 58:40:00
180000^2097152+1 11021106 digits AVX (Intel) Estimated time: 243:00:00
110000^4194304+1 21145134 digits AVX (Intel) Estimated time: 1066:00:00
As an aside, we could really use some spoiler tags. | |
|
|
On my Haswell (i7-4750HQ):
genefer 3.3.2-4 (Apple-x86/CPU/64-bit)
...
50000000^32768+1 252280 digits x87 (80-bit) Estimated time: 0:21:30
25000000^65536+1 484832 digits x87 (80-bit) Estimated time: 1:27:00
8000000^131072+1 904802 digits x87 (80-bit) Estimated time: 5:46:00
46000000^131072+1 1004373 digits x87 (80-bit) Estimated time: 6:24:00
2800000^262144+1 1690084 digits x87 (80-bit) Estimated time: 23:20:00
1500000^524288+1 3238051 digits x87 (80-bit) Estimated time: 93:10:00
900000^1048576+1 6243476 digits FMA3 Estimated time: 41:30:00
130000^2097152+1 10724717 digits FMA3 Estimated time: 147:00:00
100000^4194304+1 20971521 digits FMA3 Estimated time: 578:00:00
genefer 3.3.3 (Apple-x86/CPU/64-bit)
...
75000000^32768+1 258051 digits x87 (80-bit) Estimated time: 0:19:50
27000000^65536+1 487022 digits x87 (80-bit) Estimated time: 1:19:00
10000000^131072+1 917505 digits x87 (80-bit) Estimated time: 5:21:00
48000000^131072+1 1006796 digits x87 (80-bit) Estimated time: 5:51:00
3600000^262144+1 1718696 digits x87 (80-bit) Estimated time: 21:40:00
1700000^524288+1 3266550 digits x87 (80-bit) Estimated time: 85:40:00
950000^1048576+1 6268098 digits FMA3 Estimated time: 40:00:00
180000^2097152+1 11021106 digits FMA3 Estimated time: 150:00:00
110000^4194304+1 21145134 digits FMA3 Estimated time: 630:00:00
So 3.3.3 is clearly faster for n<=20, with something of a slowdown for n>20 (although the b value is higher). Comparing at the same b values also suggests something of a slowdown:
genefer 3.3.2-4 (Apple-x86/CPU/64-bit)
...
538452^1048576+1 Time: 6.95 ms/mul. Err: 0.1519 6009544 digits
440400^2097152+1 Time: 15.5 ms/mul. Err: 0.1484 11836006 digits
360204^4194304+1 Time: 32.1 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 75.4 ms/mul. Err: 0.1406 45879398 digits
genefer 3.3.3 (Apple-x86/CPU/64-bit)
...
538452^1048576+1 Time: 7.46 ms/mul. Err: 0.1172 6009544 digits
440400^2097152+1 Time: 16.2 ms/mul. Err: 0.1172 11836006 digits
360204^4194304+1 Time: 35.5 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 78.6 ms/mul. Err: 0.1104 45879398 digits
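To put a number on that slowdown, the per-multiplication times at identical b can be compared directly. A minimal Python sketch, using only the four FMA3 timings from the two listings above; the helper name `slowdown_percent` is made up for illustration and is not part of Genefer:

```python
# Per-multiplication times (ms/mul) copied from the i7-4750HQ listings above.
# Keys are the exponent N in b^N+1; values are (3.3.2-4, 3.3.3).
TIMES_MS = {
    1048576: (6.95, 7.46),
    2097152: (15.5, 16.2),
    4194304: (32.1, 35.5),
    8388608: (75.4, 78.6),
}


def slowdown_percent(old_ms, new_ms):
    """Percentage by which the new time exceeds the old one (negative = faster)."""
    return (new_ms / old_ms - 1.0) * 100.0


if __name__ == "__main__":
    for n, (old, new) in TIMES_MS.items():
        print(f"N={n:>8}: {slowdown_percent(old, new):+.1f}%")
```

With these inputs the slowdown lands between roughly 4% and 11%, depending on N. A single run per size is noisy, so this is an indication rather than a measurement.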
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
Rafael Volunteer tester
 Send message
Joined: 22 Oct 14 Posts: 905 ID: 370496 Credit: 459,496,650 RAC: 148,364
                   
|
New round of benches, using -b2 and with 2 more CPUs. TLDR: FMA3 seems inconsistent (I re-ran the tests manually and sometimes there was improvement, at others slowdown, and even about the same performance), x87 is always better, SSE2 is MUCH slower.
- i5 6600k -
Ver. 3.3.2
50000000^32768+1 252280 digits x87 (80-bit) Estimated time: 0:15:30
25000000^65536+1 484832 digits x87 (80-bit) Estimated time: 1:03:00
8000000^131072+1 904802 digits x87 (80-bit) Estimated time: 4:12:00
46000000^131072+1 1004373 digits x87 (80-bit) Estimated time: 4:40:00
2800000^262144+1 1690084 digits x87 (80-bit) Estimated time: 16:40:00
1500000^524288+1 3238051 digits x87 (80-bit) Estimated time: 67:30:00
900000^1048576+1 6243476 digits FMA3 Estimated time: 21:00:00
130000^2097152+1 10724717 digits FMA3 Estimated time: 83:20:00
100000^4194304+1 20971521 digits FMA3 Estimated time: 336:00:00
Ver. 3.3.3
75000000^32768+1 258051 digits x87 (80-bit) Estimated time: 0:14:40
27000000^65536+1 487022 digits x87 (80-bit) Estimated time: 0:59:10
10000000^131072+1 917505 digits x87 (80-bit) Estimated time: 3:57:00
48000000^131072+1 1006796 digits x87 (80-bit) Estimated time: 4:20:00
3600000^262144+1 1718696 digits x87 (80-bit) Estimated time: 15:40:00
1700000^524288+1 3266550 digits x87 (80-bit) Estimated time: 63:30:00
950000^1048576+1 6268098 digits FMA3 Estimated time: 21:50:00
180000^2097152+1 11021106 digits FMA3 Estimated time: 84:50:00
110000^4194304+1 21145134 digits FMA3 Estimated time: 357:00:00
- i5 4590 -
Ver 3.3.2
50000000^32768+1 252280 digits x87 (80-bit) Estimated time: 0:18:50
25000000^65536+1 484832 digits x87 (80-bit) Estimated time: 1:16:00
8000000^131072+1 904802 digits x87 (80-bit) Estimated time: 5:04:00
46000000^131072+1 1004373 digits x87 (80-bit) Estimated time: 5:38:00
2800000^262144+1 1690084 digits x87 (80-bit) Estimated time: 20:10:00
1500000^524288+1 3238051 digits x87 (80-bit) Estimated time: 81:50:00
900000^1048576+1 6243476 digits FMA3 Estimated time: 31:10:00
130000^2097152+1 10724717 digits FMA3 Estimated time: 129:00:00
100000^4194304+1 20971521 digits FMA3 Estimated time: 500:00:00
Ver 3.3.3
75000000^32768+1 258051 digits x87 (80-bit) Estimated time: 0:17:30
27000000^65536+1 487022 digits x87 (80-bit) Estimated time: 1:10:00
10000000^131072+1 917505 digits x87 (80-bit) Estimated time: 4:41:00
48000000^131072+1 1006796 digits x87 (80-bit) Estimated time: 5:09:00
3600000^262144+1 1718696 digits x87 (80-bit) Estimated time: 18:40:00
1700000^524288+1 3266550 digits x87 (80-bit) Estimated time: 76:50:00
950000^1048576+1 6268098 digits FMA3 Estimated time: 33:00:00
180000^2097152+1 11021106 digits FMA3 Estimated time: 125:00:00
110000^4194304+1 21145134 digits FMA3 Estimated time: 556:00:00
- Pentium Dual Core E2180 -
Ver 3.3.2
50000000^32768+1 252280 digits x87 (80-bit) Estimated time: 0:30:40
25000000^65536+1 484832 digits x87 (80-bit) Estimated time: 2:12:00
8000000^131072+1 904802 digits x87 (80-bit) Estimated time: 9:02:00
46000000^131072+1 1004373 digits x87 (80-bit) Estimated time: 10:00:00
2800000^262144+1 1690084 digits x87 (80-bit) Estimated time: 39:40:00
1500000^524288+1 3238051 digits x87 (80-bit) Estimated time: 158:00:00
900000^1048576+1 6243476 digits SSE2 Estimated time: 164:00:00
130000^2097152+1 10724717 digits SSE2 Estimated time: 571:00:00
100000^4194304+1 20971521 digits SSE2 Estimated time: 2897:00:00
Ver 3.3.3
75000000^32768+1 258051 digits x87 (80-bit) Estimated time: 0:29:00
27000000^65536+1 487022 digits x87 (80-bit) Estimated time: 2:04:00
10000000^131072+1 917505 digits x87 (80-bit) Estimated time: 8:54:00
48000000^131072+1 1006796 digits x87 (80-bit) Estimated time: 9:50:00
3600000^262144+1 1718696 digits x87 (80-bit) Estimated time: 39:50:00
1700000^524288+1 3266550 digits x87 (80-bit) Estimated time: 160:00:00
950000^1048576+1 6268098 digits SSE2 Estimated time: 245:00:00
180000^2097152+1 11021106 digits SSE2 Estimated time: 968:00:00
110000^4194304+1 21145134 digits SSE2 Estimated time: 3940:00:00 | |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
                              
|
I'm going to test the new Genefer on 5 different CPUs in a moment, but first this:
I:\GFN>genefer_windows64-3.3.2.exe -v
genefer 3.3.2-4 (Windows/CPU/64-bit)
Copyright 2001-2017, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2017, Iain Bethune
Genefer is free source code, under the MIT license.
Supported transform implementations: fma3 avx sse4 sse2 f64 x87
I:\GFN>genefer_windows64-3.3.3.exe -v
genefer 3.3.3 (Windows/CPU/64-bit)
Copyright 2001-2017, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2017, Iain Bethune
Genefer is free source code, under the MIT license.
Supported transform implementations: fma3 avx sse4 sse2 x87
The new Genefer is missing the old f64 transform. Does this mean the Genefer will no longer run on old CPUs like my recently defunct Sempron? If so, I need to know about this because it affects the server configuration. My strong preference would be for Genefer to continue to support the f64 transform; removal makes things a lot more complex on the server side.
____________
My lucky number is 75898^524288+1 | |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
                              
|
On second thought, I just noticed that the -b2 benchmarks were modified to test different values in 3.3.3 -- so comparing these benchmarks in the old and new versions isn't a good comparison.
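One rough way to make the -b2 rows comparable despite the different b values is to divide the estimated time by the number of decimal digits, on the assumption that at a fixed N the number of squarings in the test grows with the size of b^N+1. A Python sketch under that assumption; the function names are hypothetical, and the two sample rows are the i5 6600k FMA3 results for N=1048576 posted earlier in this thread:

```python
import math


def digits(b, n):
    """Decimal digits of b^n + 1, estimated with floating-point logarithms."""
    return math.floor(n * math.log10(b)) + 1


def parse_hms(text):
    """Convert an 'H:MM:SS' estimate such as '21:50:00' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_per_digit(b, n, estimate):
    """Estimated run time divided by the size of the candidate."""
    return parse_hms(estimate) / digits(b, n)


if __name__ == "__main__":
    old = seconds_per_digit(900000, 1048576, "21:00:00")  # ver 3.3.2 row
    new = seconds_per_digit(950000, 1048576, "21:50:00")  # ver 3.3.3 row
    print(f"3.3.3 vs 3.3.2, per digit: {(new / old - 1) * 100:+.1f}%")
```

For this pair the normalised figure is about 3.6% slower for 3.3.3, a little less than the raw 21:00 vs 21:50 estimates suggest. The same-b "-b" timings remain the cleaner comparison.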
____________
My lucky number is 75898^524288+1 | |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 701 ID: 164101 Credit: 305,166,630 RAC: 523

|
The new Genefer is missing the old f64 transform. Does this mean the Genefer will no longer run on old CPUs like my recently defunct Sempron?
Genefer 3.3.3 64-bit => fma3 avx sse4 sse2 x87
Genefer 3.3.3 32-bit => fma3 avx sse4 sse2 x87 f64
K7 Sempron is a 32-bit processor without SSE2 but K8 Sempron is a 64-bit processor with SSE2.
f64 was removed from the 64-bit version because the SSE2 is included in AMD64 specification.
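The effect on old hardware can be sketched as a simple lookup. The transform lists below are the ones quoted in this post; the CPU feature assumed for each transform, and the `usable_transforms` helper itself, are illustrative assumptions rather than Genefer's actual detection logic:

```python
# Transform lists per build of Genefer 3.3.3, as quoted in the post above.
BUILD_TRANSFORMS = {
    "64-bit": ["fma3", "avx", "sse4", "sse2", "x87"],
    "32-bit": ["fma3", "avx", "sse4", "sse2", "x87", "f64"],
}

# Assumed CPU feature needed by each transform (None = baseline x86 FPU only).
REQUIRED_FEATURE = {
    "fma3": "fma3",
    "avx": "avx",
    "sse4": "sse4",
    "sse2": "sse2",
    "x87": None,
    "f64": None,
}


def usable_transforms(build, cpu_features):
    """Transforms from the given build that the CPU's feature set can run."""
    return [
        name
        for name in BUILD_TRANSFORMS[build]
        if REQUIRED_FEATURE[name] is None or REQUIRED_FEATURE[name] in cpu_features
    ]


if __name__ == "__main__":
    # K7 Sempron: 32-bit, no SSE2. K8 Sempron: 64-bit, SSE2 available.
    print("K7, 32-bit build:", usable_transforms("32-bit", set()))
    print("K8, 64-bit build:", usable_transforms("64-bit", {"sse2"}))
```

Under these assumptions the K7 is left with x87 and f64 on the 32-bit build, while the K8 always has sse2 on the 64-bit build, which is why dropping f64 there loses nothing.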
| |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 701 ID: 164101 Credit: 305,166,630 RAC: 523

|
SSE2 is MUCH slower.
I'm working on SSE2/SSE4 code.
The theory is that AVX and SSE2/SSE4 code are similar and one 256-bit AVX instruction is replaced by two 128-bit SSE2 instructions; but gcc is foolish and generates a lot of unnecessary code in the second case. An unexpected and unusual occurrence that should be bypassed! | |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
                              
|
The new Genefer is missing the old f64 transform. Does this mean the Genefer will no longer run on old CPUs like my recently defunct Sempron?
Genefer 3.3.3 64-bit => fma3 avx sse4 sse2 x87
Genefer 3.3.3 32-bit => fma3 avx sse4 sse2 x87 f64
K7 Sempron is a 32-bit processor without SSE2 but K8 Sempron is a 64-bit processor with SSE2.
f64 was removed from the 64-bit version because the SSE2 is included in AMD64 specification.
Gotcha -- the 32 bit version still supports f64. All should be good then. Thanks for clearing that up for me.
____________
My lucky number is 75898^524288+1 | |
|
composite Volunteer tester Send message
Joined: 16 Feb 10 Posts: 995 ID: 55391 Credit: 865,414,824 RAC: 253,024
                       
|
i7-5820K (Haswell-E) @ 3.4 GHz (slight o/c, nominal is 3.3 GHz)
genefer 3.3.2-4 (Linux/CPU/64-bit)
...
50000000^32768+1 252280 digits x87 (80-bit) Estimated time: 0:18:30
25000000^65536+1 484832 digits x87 (80-bit) Estimated time: 1:15:00
8000000^131072+1 904802 digits x87 (80-bit) Estimated time: 4:59:00
46000000^131072+1 1004373 digits x87 (80-bit) Estimated time: 5:32:00
2800000^262144+1 1690084 digits x87 (80-bit) Estimated time: 19:30:00
1500000^524288+1 3238051 digits x87 (80-bit) Estimated time: 79:00:00
900000^1048576+1 6243476 digits FMA3 Estimated time: 28:30:00
130000^2097152+1 10724717 digits FMA3 Estimated time: 117:00:00
100000^4194304+1 20971521 digits FMA3 Estimated time: 478:00:00
genefer 3.3.3 (Linux/CPU/64-bit)
...
75000000^32768+1 258051 digits x87 (80-bit) Estimated time: 0:17:10
27000000^65536+1 487022 digits x87 (80-bit) Estimated time: 1:09:00
10000000^131072+1 917505 digits x87 (80-bit) Estimated time: 4:41:00
48000000^131072+1 1006796 digits x87 (80-bit) Estimated time: 5:07:00
3600000^262144+1 1718696 digits x87 (80-bit) Estimated time: 18:10:00
1700000^524288+1 3266550 digits x87 (80-bit) Estimated time: 73:40:00
950000^1048576+1 6268098 digits FMA3 Estimated time: 30:40:00
180000^2097152+1 11021106 digits FMA3 Estimated time: 118:00:00
110000^4194304+1 21145134 digits FMA3 Estimated time: 504:00:00
| |
|
|
Updated binaries (version 3.3.3-1) are now available from https://app.assembla.com/spaces/genefer/subversion/source/HEAD/trunk/bin, which address the performance issues of the sse2/sse4 transforms.
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
Rafael Volunteer tester
 Send message
Joined: 22 Oct 14 Posts: 905 ID: 370496 Credit: 459,496,650 RAC: 148,364
                   
|
Testing the new 3.3.3-1 SSE4 and SSE2 (TL;DR: better than the previous 3.3.3 release, but still not worth it over 3.3.2).
First, the i5 6600k Skylake. I had to go with the -b option, given that -b2 doesn't use the SSE transforms. Both SSE4 and SSE2 saw major improvements, but are still slightly slower than 3.3.2.
For the i5 4590 Haswell, comparing 3.3.3-1 vs 3.3.2, performance is just about the same for SSE4, with a marginal tip towards the slower side. SSE2 is slower; not by that much, but you can definitely notice it. Regardless, there's no reason to use the new version, other than the fact that lower Err. values could allow a faster transform to be used during transition periods.
I won't be able to test my LGA775 Pentium until next year, but to make up for it, I'm adding benchmarks for an i3 530, which is SSE4. A full suite of benchmarks is available below, with both -b2 and -b options tested. Across the board, 3.3.3-1 is noticeably slower than 3.3.2.
- i5 6600k -
SSE4: slight slowdowns (12 vs 13)
Ver. 3.3.2
4019150^1024+1 Time: 4.35 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 8.93 us/mul. Err: 0.1875 13347 digits
2688666^4096+1 Time: 19.1 us/mul. Err: 0.1875 26336 digits
2199064^8192+1 Time: 39.3 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 85.4 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 175 us/mul. Err: 0.1797 202102 digits
1203210^65536+1 Time: 378 us/mul. Err: 0.1719 398482 digits
984108^131072+1 Time: 774 us/mul. Err: 0.1562 785521 digits
804904^262144+1 Time: 1.69 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 3.47 ms/mul. Err: 0.1562 3050541 digits
538452^1048576+1 Time: 7.61 ms/mul. Err: 0.1484 6009544 digits
440400^2097152+1 Time: 15.6 ms/mul. Err: 0.1562 11836006 digits
360204^4194304+1 Time: 34.3 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 69.6 ms/mul. Err: 0.1406 45879398 digits
240964^16777216+1 Time: 154 ms/mul. Err: 0.1250 90294174 digits
Genefer Mark = 13.
Ver. 3.3.3-1
4019150^1024+1 Time: 4.84 us/mul. Err: 0.1406 6763 digits
3287270^2048+1 Time: 9.61 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 20.3 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 42.4 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 89.8 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 189 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 400 us/mul. Err: 0.1328 398482 digits
984108^131072+1 Time: 837 us/mul. Err: 0.1289 785521 digits
804904^262144+1 Time: 1.77 ms/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 3.7 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 7.86 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 16.5 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 34.9 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 73.4 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 155 ms/mul. Err: 0.1094 90294174 digits
Genefer Mark = 12.
SSE2: slight slowdowns (10 vs 11)
Ver 3.3.2
4019150^1024+1 Time: 5.26 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 11.4 us/mul. Err: 0.1875 13347 digits
2688666^4096+1 Time: 22.3 us/mul. Err: 0.1875 26336 digits
2199064^8192+1 Time: 48.3 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 97.3 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 210 us/mul. Err: 0.1797 202102 digits
1203210^65536+1 Time: 425 us/mul. Err: 0.1719 398482 digits
984108^131072+1 Time: 912 us/mul. Err: 0.1562 785521 digits
804904^262144+1 Time: 1.88 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 4.01 ms/mul. Err: 0.1562 3050541 digits
538452^1048576+1 Time: 8.36 ms/mul. Err: 0.1484 6009544 digits
440400^2097152+1 Time: 17.8 ms/mul. Err: 0.1562 11836006 digits
360204^4194304+1 Time: 37.4 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 78.5 ms/mul. Err: 0.1406 45879398 digits
Genefer Mark = 11.
Ver. 3.3.3-1
4019150^1024+1 Time: 6.55 us/mul. Err: 0.1406 6763 digits
3287270^2048+1 Time: 12.6 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 26.3 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 54 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 113 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 234 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 493 us/mul. Err: 0.1328 398482 digits
984108^131072+1 Time: 1.02 ms/mul. Err: 0.1289 785521 digits
804904^262144+1 Time: 2.16 ms/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 4.44 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 9.42 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 19.6 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 41.1 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 85.1 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 180 ms/mul. Err: 0.1094 90294174 digits
Genefer Mark = 10.
- i5 4590 -
SSE4: just about the same thing
Ver 3.3.2
4019150^1024+1 Time: 5.66 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 11.9 us/mul. Err: 0.1875 13347 digits
2688666^4096+1 Time: 25.4 us/mul. Err: 0.1875 26336 digits
2199064^8192+1 Time: 53.5 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 114 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 240 us/mul. Err: 0.1797 202102 digits
1203210^65536+1 Time: 511 us/mul. Err: 0.1719 398482 digits
984108^131072+1 Time: 1.06 ms/mul. Err: 0.1562 785521 digits
804904^262144+1 Time: 2.27 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 4.86 ms/mul. Err: 0.1562 3050541 digits
538452^1048576+1 Time: 10.4 ms/mul. Err: 0.1484 6009544 digits
440400^2097152+1 Time: 21.4 ms/mul. Err: 0.1562 11836006 digits
360204^4194304+1 Time: 47.7 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 103 ms/mul. Err: 0.1406 45879398 digits
240964^16777216+1 Time: 219 ms/mul. Err: 0.1250 90294174 digits
Genefer Mark = 9.
Ver. 3.3.3-1
4019150^1024+1 Time: 6.58 us/mul. Err: 0.1406 6763 digits
3287270^2048+1 Time: 12.2 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 26.4 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 55.3 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 119 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 250 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 535 us/mul. Err: 0.1328 398482 digits
984108^131072+1 Time: 1.11 ms/mul. Err: 0.1289 785521 digits
804904^262144+1 Time: 2.37 ms/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 4.94 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 10.7 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 22.3 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 48.4 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 102 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 220 ms/mul. Err: 0.1094 90294174 digits
Genefer Mark = 9.
SSE2: slight slowdowns
Ver 3.3.2
4019150^1024+1 Time: 7.16 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 14.3 us/mul. Err: 0.1875 13347 digits
2688666^4096+1 Time: 31.4 us/mul. Err: 0.1875 26336 digits
2199064^8192+1 Time: 62.5 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 138 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 275 us/mul. Err: 0.1797 202102 digits
1203210^65536+1 Time: 604 us/mul. Err: 0.1719 398482 digits
984108^131072+1 Time: 1.21 ms/mul. Err: 0.1562 785521 digits
804904^262144+1 Time: 2.64 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 5.35 ms/mul. Err: 0.1562 3050541 digits
538452^1048576+1 Time: 11.9 ms/mul. Err: 0.1484 6009544 digits
440400^2097152+1 Time: 23.8 ms/mul. Err: 0.1562 11836006 digits
360204^4194304+1 Time: 53.7 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 107 ms/mul. Err: 0.1406 45879398 digits
240964^16777216+1 Time: 243 ms/mul. Err: 0.1250 90294174 digits
Genefer Mark = 8.
Ver 3.3.3-1
4019150^1024+1 Time: 8.18 us/mul. Err: 0.1406 6763 digits
3287270^2048+1 Time: 16.5 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 33.9 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 70.3 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 149 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 309 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 652 us/mul. Err: 0.1328 398482 digits
984108^131072+1 Time: 1.35 ms/mul. Err: 0.1289 785521 digits
804904^262144+1 Time: 2.84 ms/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 5.88 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 12.5 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 26.2 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 56 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 118 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 251 ms/mul. Err: 0.1094 90294174 digits
Genefer Mark = 8.
- i3 530 -
-b2 benchmarks
Ver. 3.3.2
50000000^32768+1 252280 digits x87 (80-bit) Estimated time: 0:25:40
25000000^65536+1 484832 digits x87 (80-bit) Estimated time: 1:47:00
8000000^131072+1 904802 digits x87 (80-bit) Estimated time: 6:37:00
46000000^131072+1 1004373 digits x87 (80-bit) Estimated time: 7:18:00
2800000^262144+1 1690084 digits x87 (80-bit) Estimated time: 26:10:00
1500000^524288+1 3238051 digits x87 (80-bit) Estimated time: 109:00:00
900000^1048576+1 6243476 digits SSE4 Estimated time: 85:00:00
130000^2097152+1 10724717 digits SSE4 Estimated time: 298:00:00
100000^4194304+1 20971521 digits SSE4 Estimated time: 1332:00:00
Ver. 3.3.3-1
75000000^32768+1 258051 digits x87 (80-bit) Estimated time: 0:24:10
27000000^65536+1 487022 digits x87 (80-bit) Estimated time: 1:38:00
10000000^131072+1 917505 digits x87 (80-bit) Estimated time: 6:19:00
48000000^131072+1 1006796 digits x87 (80-bit) Estimated time: 6:52:00
3600000^262144+1 1718696 digits x87 (80-bit) Estimated time: 25:00:00
1700000^524288+1 3266550 digits x87 (80-bit) Estimated time: 100:00:00
950000^1048576+1 6268098 digits SSE4 Estimated time: 94:00:00
180000^2097152+1 11021106 digits SSE4 Estimated time: 352:00:00
110000^4194304+1 21145134 digits SSE4 Estimated time: 1461:00:00
SSE4: noticeable slowdowns
Ver. 3.3.2
4019150^1024+1 Time: 7.52 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 15.7 us/mul. Err: 0.1875 13347 digits
2688666^4096+1 Time: 34.9 us/mul. Err: 0.1875 26336 digits
2199064^8192+1 Time: 71.1 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 155 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 324 us/mul. Err: 0.1797 202102 digits
1203210^65536+1 Time: 700 us/mul. Err: 0.1719 398482 digits
984108^131072+1 Time: 1.44 ms/mul. Err: 0.1562 785521 digits
804904^262144+1 Time: 3.14 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 6.61 ms/mul. Err: 0.1562 3050541 digits
538452^1048576+1 Time: 14.7 ms/mul. Err: 0.1484 6009544 digits
440400^2097152+1 Time: 30.1 ms/mul. Err: 0.1562 11836006 digits
360204^4194304+1 Time: 69 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 140 ms/mul. Err: 0.1406 45879398 digits
240964^16777216+1 Time: 318 ms/mul. Err: 0.1250 90294174 digits
Genefer Mark = 6.
Ver. 3.3.3-1
4019150^1024+1 Time: 8.14 us/mul. Err: 0.1406 6763 digits
3287270^2048+1 Time: 17.3 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 37.6 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 80.9 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 175 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 371 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 796 us/mul. Err: 0.1328 398482 digits
984108^131072+1 Time: 1.67 ms/mul. Err: 0.1289 785521 digits
804904^262144+1 Time: 3.56 ms/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 7.65 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 16.3 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 34.7 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 75 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 160 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 343 ms/mul. Err: 0.1094 90294174 digits
Genefer Mark = 6.
SSE2: noticeable slowdowns
Ver. 3.3.2
4019150^1024+1 Time: 11.8 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 22.8 us/mul. Err: 0.1875 13347 digits
2688666^4096+1 Time: 51.3 us/mul. Err: 0.1875 26336 digits
2199064^8192+1 Time: 99.2 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 224 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 435 us/mul. Err: 0.1797 202102 digits
1203210^65536+1 Time: 969 us/mul. Err: 0.1719 398482 digits
984108^131072+1 Time: 1.89 ms/mul. Err: 0.1562 785521 digits
804904^262144+1 Time: 4.22 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 8.41 ms/mul. Err: 0.1562 3050541 digits
538452^1048576+1 Time: 19.1 ms/mul. Err: 0.1484 6009544 digits
440400^2097152+1 Time: 37.3 ms/mul. Err: 0.1562 11836006 digits
360204^4194304+1 Time: 86.5 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 168 ms/mul. Err: 0.1406 45879398 digits
240964^16777216+1 Time: 386 ms/mul. Err: 0.1250 90294174 digits
Genefer Mark = 5.
Ver. 3.3.3-1
4019150^1024+1 Time: 13 us/mul. Err: 0.1406 6763 digits
3287270^2048+1 Time: 26.3 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 55.1 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 116 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 243 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 510 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 1.07 ms/mul. Err: 0.1328 398482 digits
984108^131072+1 Time: 2.22 ms/mul. Err: 0.1289 785521 digits
804904^262144+1 Time: 4.67 ms/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 9.75 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 20.6 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 43.4 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 92.4 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 196 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 412 ms/mul. Err: 0.1094 90294174 digits
Genefer Mark = 5.
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
First, the i5 6600k Skylake. I had to go with the -b option, given that -b2 doesn't use the SSE transforms. Both SSE4 and SSE2 saw major improvements, but are still slightly slower than 3.3.2.
Note: As currently written, it is impossible to use -b2 to compare timings because 3.3.2 and 3.3.3 use different numbers in the -b2 benchmark.
You will always get somewhat longer times in the 3.3.3 -b2 benchmarks because it's testing larger numbers. For x87 tests, the code improvement is substantial enough that it outweighs the extra time needed to run more iterations of the calculation. With the other transforms you see a slowdown, but it's unclear whether that's because the transform is actually slower or because you're running a larger number.
I recommend using -b for comparing 3.3.2 vs 3.3.3.
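To get a feel for how much of a -b2 difference could come from the larger b alone, here is a rough sketch. It assumes the number of squarings for b^N+1 is about N*log2(b), which is roughly consistent with the "steps to go" count genefer prints for 52186^131072+1 in the OCL logs in this thread, and it assumes the time per squaring does not change with b. The function names are made up for illustration and are not part of genefer.

```python
import math

def approx_steps(b, N):
    """Rough number of squarings for b^N+1, assuming about N*log2(b)."""
    return N * math.log2(b)

def extra_work_from_larger_b(b_old, b_new, N):
    """Fractional increase in steps when only b changes and N is fixed."""
    return approx_steps(b_new, N) / approx_steps(b_old, N) - 1.0

# b values taken from the 3.3.2 and 3.3.3 -b2 listings for N = 1048576
print(f"{extra_work_from_larger_b(900000, 950000, 1048576):.2%}")

# Sanity check of the assumption against the count genefer printed
# ("2054077 steps to go" for 52186^131072+1)
print(round(approx_steps(52186, 131072)))
```

Comparing this fraction with the change in estimated time gives a rough idea of how much of the difference is left over for the transform itself.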
____________
My lucky number is 75898^524288+1 | |
|
|
Ran new 3.3.3-1 benches on my X5675:
3.3.2/SSE2:
538452^1048576+1 Time: 15.6 ms/mul. Err: 0.1484 6009544 digits
440400^2097152+1 Time: 31.2 ms/mul. Err: 0.1562 11836006 digits
360204^4194304+1 Time: 69.9 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 137 ms/mul. Err: 0.1406 45879398 digits
240964^16777216+1 Time: 329 ms/mul. Err: 0.1250 90294174 digits
3.3.3-1/SSE2:
538452^1048576+1 Time: 17.1 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 36 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 76.3 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 163 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 348 ms/mul. Err: 0.1094 90294174 digits
3.3.2/SSE4:
538452^1048576+1 Time: 12.1 ms/mul. Err: 0.1484 6009544 digits
440400^2097152+1 Time: 25.1 ms/mul. Err: 0.1562 11836006 digits
360204^4194304+1 Time: 55.9 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 114 ms/mul. Err: 0.1406 45879398 digits
240964^16777216+1 Time: 266 ms/mul. Err: 0.1250 90294174 digits
3.3.3-1/SSE4:
538452^1048576+1 Time: 13.5 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 28.8 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 62.3 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 135 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 292 ms/mul. Err: 0.1094 90294174 digits
Times in 3.3.3-1 are still about 6% to 19% longer than in 3.3.2, depending on n, but compared to 3.3.3-0 we're getting closer.
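For anyone who wants the per-size numbers, this small sketch computes the slowdown from the timings listed above. The values are copied from this post; the names `TIMINGS_MS` and `slowdowns` are just for illustration.

```python
# ms/mul timings copied from the X5675 listings above (3.3.2 vs 3.3.3-1)
N_VALUES = [1048576, 2097152, 4194304, 8388608, 16777216]
TIMINGS_MS = {
    "SSE2": {"3.3.2": [15.6, 31.2, 69.9, 137, 329],
             "3.3.3-1": [17.1, 36, 76.3, 163, 348]},
    "SSE4": {"3.3.2": [12.1, 25.1, 55.9, 114, 266],
             "3.3.3-1": [13.5, 28.8, 62.3, 135, 292]},
}

def slowdown_percent(old_ms, new_ms):
    """Percentage by which the new timing is longer than the old one."""
    return (new_ms / old_ms - 1.0) * 100.0

def slowdowns(transform):
    """Map N -> slowdown in percent for one transform."""
    old = TIMINGS_MS[transform]["3.3.2"]
    new = TIMINGS_MS[transform]["3.3.3-1"]
    return {n: slowdown_percent(o, w) for n, o, w in zip(N_VALUES, old, new)}

for transform in TIMINGS_MS:
    for n, pct in slowdowns(transform).items():
        print(f"{transform} N={n}: {pct:+.1f}%")
```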
____________
Eating more cheese on Thursdays. | |
|
|
Hi all,
Genefer 3.3.3 is now ready for testing. The test programme is quite comprehensive, as we need to check quite substantial changes to the CPU transforms and some minor changes to the OCL transforms.
Binaries are available from SVN:
Windows: https://www.assembla.com/spaces/genefer/subversion/source/HEAD/trunk/bin/windows
Linux: https://www.assembla.com/spaces/genefer/subversion/source/HEAD/trunk/bin/linux
Mac: https://www.assembla.com/spaces/genefer/subversion/source/HEAD/trunk/bin/mac
These binaries are version "3.3.3-2", due to some performance tweaks which had to be made.
There is a full list of tests, with some instructions on how and what to run, in a Google Sheet:
https://docs.google.com/spreadsheets/d/13K5NKoywrlE4UtDZTM0ws8mMoz6y11b7kR4hFbehnBo/edit?usp=sharing
There is a large set of manual tests for n=16,17 and n=22 (which will take a long time, unfortunately!), and also some BOINC tests to run under app_info.xml. Please check carefully, as some tests require specifying the transform to use (via the -x option) and some leave genefer free to select the appropriate transform.
Please post in this thread if you want to reserve tests, or if you have results and/or questions. I will try to keep the thread clean by hiding posts when I update the Google Sheet. Manual credit will be awarded for the standalone tests.
Thanks in advance for the help - the more people contribute, the sooner we can start using the new apps.
Cheers
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
I want to point out that one significant change to GeneferOCL is that we're reversing a change we made back in 2014.
Originally, GeneferCUDA ran at normal priority. Most BOINC programs run at idle priority so as not to interfere with whatever else the computer is doing. But a GPU program should run at normal priority so that the CPU can keep the GPU running full speed.
When GeneferOCL first came out, a driver "feature" meant the CPU portion of the program could consume a full CPU core, so we had to lower the priority of GeneferOCL.
Eventually that driver "feature" was fixed, or we found a way around it, and as a result GeneferOCL no longer uses a whole CPU core.
Now, with 3.3.3, we're able to once again raise the priority of GeneferOCL. As a result, the GPU doesn't get starved even if you're running LLR on all cores. You no longer need to keep a core free for the GPU. The GPU program takes precedence over the other BOINC tasks.
Note that the GPU versions of AP27 and PPS-Sieve already work this way, so you don't need to keep a core free for them either.
____________
My lucky number is 75898^524288+1 | |
|
|
It's been a long time since the last update to 3.3.3, so I definitely want to ask: has the performance regression of SSE2/4 been reversed?
I would like to reserve/do the Windows non-BOINC testing for 64-bit CPU and NVIDIA OCL through the n17 tests; after that I'll see about the n22 tests.
____________
Eating more cheese on Thursdays. | |
|
composite Volunteer tester
Joined: 16 Feb 10 Posts: 995 ID: 55391 Credit: 865,414,824 RAC: 253,024
|
geneferocl_linux64 starts 8 threads, but only the main one gets essentially all the CPU time (another one amassed a total of 0.10 seconds by the time the main thread had 10.5 minutes, and a third got 0.01 seconds). Is this normal? | |
|
composite Volunteer tester
Joined: 16 Feb 10 Posts: 995 ID: 55391 Credit: 865,414,824 RAC: 253,024
|
D48, D50, D52, D54, D56; note the failure and successful completion from checkpoint on D56
geneferocl 3.3.3-2 (Linux/OpenCL/64-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Command line: ./geneferocl_linux64 -q 52186^131072+1 -x OCL
Normal priority change succeeded.
Testing 52186^131072+1...
Using OCL transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 760', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '384.111'.
6 computeUnits @ 1137MHz, memSize=1996MB, cacheSize=96kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.
Starting initialization...
Initialization complete (0.113 seconds).
Testing 52186^131072+1... 2054077 steps to go
Estimated time for 52186^131072+1 is 0:09:16
52186^131072+1 is composite. (RES=1b196d6c0e4d778f) (618340 digits) (err = 0.0004) (time = 0:09:20) 21:01:23
geneferocl 3.3.3-2 (Linux/OpenCL/64-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Command line: ./geneferocl_linux64 -q 52186^131072+1 -x OCL2
Normal priority change succeeded.
Testing 52186^131072+1...
Using OCL2 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 760', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '384.111'.
6 computeUnits @ 1137MHz, memSize=1996MB, cacheSize=96kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.
Starting initialization...
Initialization complete (0.113 seconds).
Testing 52186^131072+1... 2054077 steps to go
Estimated time for 52186^131072+1 is 0:13:50
52186^131072+1 is composite. (RES=1b196d6c0e4d778f) (618340 digits) (err = 0.0000) (time = 0:13:43) 21:15:06
geneferocl 3.3.3-2 (Linux/OpenCL/64-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Command line: ./geneferocl_linux64 -q 52186^131072+1 -x OCL3
Normal priority change succeeded.
Testing 52186^131072+1...
Using OCL3 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 760', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '384.111'.
6 computeUnits @ 1137MHz, memSize=1996MB, cacheSize=96kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.
Starting initialization...
Initialization complete (0.113 seconds).
Testing 52186^131072+1... 2054077 steps to go
Estimated time for 52186^131072+1 is 0:11:40
52186^131072+1 is composite. (RES=1b196d6c0e4d778f) (618340 digits) (err = 0.0000) (time = 0:11:49) 21:26:55
geneferocl 3.3.3-2 (Linux/OpenCL/64-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Command line: ./geneferocl_linux64 -q 52186^131072+1 -x OCL4
Normal priority change succeeded.
Testing 52186^131072+1...
Using OCL4 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 760', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '384.111'.
6 computeUnits @ 1137MHz, memSize=1996MB, cacheSize=96kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.
Starting initialization...
Initialization complete (0.113 seconds).
Testing 52186^131072+1... 2054077 steps to go
Estimated time for 52186^131072+1 is 0:08:03
52186^131072+1 is composite. (RES=1b196d6c0e4d778f) (618340 digits) (err = 0.0000) (time = 0:08:06) 21:35:01
geneferocl 3.3.3-2 (Linux/OpenCL/64-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Command line: ./geneferocl_linux64 -q 52186^131072+1 -x OCL5
Normal priority change succeeded.
Testing 52186^131072+1...
Using OCL5 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 760', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '384.111'.
6 computeUnits @ 1137MHz, memSize=1996MB, cacheSize=96kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.
Starting initialization...
Initialization complete (0.113 seconds).
Testing 52186^131072+1... 2054077 steps to go
Estimated time for 52186^131072+1 is 0:09:34
maxErr exceeded for 52186^131072+1, 1.0000 > 0.4500
Errors occurred for all available transform implementations
reran OCL5, it picked up from the last checkpoint
geneferocl 3.3.3-2 (Linux/OpenCL/64-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Command line: ./geneferocl_linux64 -q 52186^131072+1 -x OCL5
Normal priority change succeeded.
Testing 52186^131072+1...
Using OCL5 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 760', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '384.111'.
6 computeUnits @ 1137MHz, memSize=1996MB, cacheSize=96kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.
Resuming 52186^131072+1 from a checkpoint (2054077 iterations left)
Estimated time for 52186^131072+1 is 0:09:30
52186^131072+1 is composite. (RES=1b196d6c0e4d778f) (618340 digits) (err = 0.0000) (time = 0:09:28) 21:47:16 | |
|
|
It's been a long time since the last update to 3.3.3, so I definitely want to ask: has the performance regression of SSE2/4 been reversed?
Yes, SSE2/4 should now give roughly the same performance as previous releases.
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
C10; did not switch to X87 transform
That's fine - the b limits of the 'fast' transforms are increased a little.
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
B10 reserved
...
EDIT: This was completed before the spreadsheet changed. I will run with the other transforms now
Sorry - I have updated the spreadsheet now. Please check posts in the thread to see if any tests are outstanding that are not yet in the spreadsheet.
Cheers
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
geneferocl 3.3.3-2 (Linux/OpenCL/64-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Command line: ./geneferocl_linux64 -q 52186^131072+1 -x OCL5
Normal priority change succeeded.
Testing 52186^131072+1...
Using OCL5 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 760', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '384.111'.
6 computeUnits @ 1137MHz, memSize=1996MB, cacheSize=96kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.
Starting initialization...
Initialization complete (0.113 seconds).
Testing 52186^131072+1... 2054077 steps to go
Estimated time for 52186^131072+1 is 0:09:34
maxErr exceeded for 52186^131072+1, 1.0000 > 0.4500
Errors occurred for all available transform implementations
reran OCL5, it picked up from the last checkpoint
geneferocl 3.3.3-2 (Linux/OpenCL/64-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Command line: ./geneferocl_linux64 -q 52186^131072+1 -x OCL5
Normal priority change succeeded.
Testing 52186^131072+1...
Using OCL5 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 760', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '384.111'.
6 computeUnits @ 1137MHz, memSize=1996MB, cacheSize=96kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.
Resuming 52186^131072+1 from a checkpoint (2054077 iterations left)
Estimated time for 52186^131072+1 is 0:09:30
52186^131072+1 is composite. (RES=1b196d6c0e4d778f) (618340 digits) (err = 0.0000) (time = 0:09:28) 21:47:16
This is strange. b=52186 is well below the b limit, and for OCL5 the calculation should be exact (err=0.0000). Does this reproduce on your hardware? It seems to run fine for Grebuloner and Van Zimmerman, and I also tested it on my own Linux system with a Tesla K40m. If it doesn't reproduce, I can only think it is either a hardware glitch or a (very rare) software bug. In either case the error correction worked and it restarted from the last known good checkpoint...
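One quick way to confirm that every transform ended up with the same answer is to pull the residues out of the result lines and compare them. This is only a sketch: the regular expression is written against the "is composite" lines pasted in this thread, not against any documented genefer output format, and the function names are made up.

```python
import re

# Matches result lines like the ones pasted in this thread (composite results only)
RESULT_RE = re.compile(
    r"(?P<number>\d+\^\d+\+1) is composite\. "
    r"\(RES=(?P<res>[0-9a-f]{16})\) "
    r"\((?P<digits>\d+) digits\) "
    r"\(err = (?P<err>[0-9.]+)\) "
    r"\(time = (?P<time>[0-9:]+)\)"
)

def parse_results(text):
    """Return one dict per composite result line found in the log text."""
    return [m.groupdict() for m in RESULT_RE.finditer(text)]

def residues_agree(results):
    """True if every tested number has exactly one distinct residue."""
    by_number = {}
    for r in results:
        by_number.setdefault(r["number"], set()).add(r["res"])
    return all(len(residues) == 1 for residues in by_number.values())

LOG = """\
52186^131072+1 is composite. (RES=1b196d6c0e4d778f) (618340 digits) (err = 0.0004) (time = 0:09:20) 21:01:23
52186^131072+1 is composite. (RES=1b196d6c0e4d778f) (618340 digits) (err = 0.0000) (time = 0:08:06) 21:35:01
"""

results = parse_results(LOG)
print(len(results), residues_agree(results))
```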
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
geneferocl_linux64 starts 8 threads, but only the main one gets essentially all the CPU time (another one amassed a total of 0.10 seconds by the time the main thread had 10.5 minutes, and a third got 0.01 seconds). Is this normal?
No idea - but I'm not surprised it spawns some additional threads (presumably in the OCL/driver stack, rather than the user code).
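If anyone wants to look at this on their own Linux box, per-thread CPU time can be read from `/proc/<pid>/task/<tid>/stat`. The sketch below is a generic Linux technique and has nothing genefer-specific in it; the function name is made up, and a thread that exits while the loop is running would raise an error that this sketch does not handle.

```python
import os
from pathlib import Path

def thread_cpu_seconds(pid):
    """Return {thread id: CPU seconds (user + system)} for a Linux process."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    result = {}
    for task in Path(f"/proc/{pid}/task").iterdir():
        stat = (task / "stat").read_text()
        # The command name is in parentheses and may contain spaces,
        # so split on the last ')' before counting fields.
        fields = stat.rsplit(")", 1)[1].split()
        # fields[0] is stat field 3 (state), so utime (14) and stime (15)
        # are at indexes 11 and 12.
        utime, stime = int(fields[11]), int(fields[12])
        result[int(task.name)] = (utime + stime) / ticks_per_second
    return result

if __name__ == "__main__" and Path("/proc/self/task").is_dir():
    for tid, secs in sorted(thread_cpu_seconds(os.getpid()).items()):
        print(tid, f"{secs:.2f}s")
```

Passing the PID of a running geneferocl process instead of `os.getpid()` would list its threads in the same way.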
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
composite Volunteer tester
Joined: 16 Feb 10 Posts: 995 ID: 55391 Credit: 865,414,824 RAC: 253,024
|
This is strange. b=52186 is well below the b limit, and for OCL5 the calculation should be exact (err=0.0000). Does this reproduce on your hardware? It seems to run fine for Grebuloner and Van Zimmerman, and I also tested it on my own Linux system with a Tesla K40m. If it doesn't reproduce, I can only think it is either a hardware glitch or a (very rare) software bug. In either case the error correction worked and it restarted from the last known good checkpoint...
This error happened the second time I ran those 5 tests OCL ... OCL5 using a bash 'for' loop.
The first set completed normally but I overwrote the output file accidentally so I ran it again.
On the second set, the first 4 tests completed successfully and the glitch happened within the first 10 seconds of starting OCL5, so I started that test manually and it picked up the checkpoint.
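For anyone repeating this kind of run, a driver along these lines does the same thing as the loop while appending to the log, so an earlier run's output is not overwritten. It is a sketch, not the script actually used: the binary name, `-q` and `-x` are taken from the command lines in the logs above, while the log file name and the function names are made up.

```python
import subprocess

TRANSFORMS = ["OCL", "OCL2", "OCL3", "OCL4", "OCL5"]

def build_command(transform, number="52186^131072+1",
                  binary="./geneferocl_linux64"):
    """Command line for one test, in the form shown in the logs."""
    return [binary, "-q", number, "-x", transform]

def run_all(log_path="ocl_tests.log"):
    """Run every transform in turn, appending all output to one log file."""
    # Append mode: output from an earlier run is kept, not overwritten.
    with open(log_path, "a") as log:
        for transform in TRANSFORMS:
            cmd = build_command(transform)
            log.write("$ " + " ".join(cmd) + "\n")
            log.flush()  # keep our header ahead of the child's output
            subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT,
                           check=False)

print(build_command("OCL5"))
```

Calling `run_all()` in the directory that holds the binary starts the five tests one after another.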
The GPU is factory overclocked but this is hardly ever an issue since I set the fan speed at 100%.
I was getting daily system crashes before the New Year Challenge while I was trying XMP memory speed, so I returned to normal memory speed and the system has been stable ever since. | |
|
composite Volunteer tester
Joined: 16 Feb 10 Posts: 995 ID: 55391 Credit: 865,414,824 RAC: 253,024
|
There's something slightly fishy given that the error value was exactly 1.000.
Is this the high limit of the range of error values?
Might this indicate some very early failure before the calculation has propagated and averaged down? | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
There's something slightly fishy given that the error value was exactly 1.000.
Is this the high limit of the range of error values?
Might this indicate some very early failure before the calculation has propagated and averaged down?
You get an error of 1.0 when something goes wrong with any of the fixed-point transforms, i.e., anything that's not the original OCL. Since they don't use floating point, any error they detect will always be an integer.
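To illustrate the difference, here is a minimal sketch. It assumes the floating-point error is the usual round-off measure for FFT multiplication (the largest distance of any result from its nearest integer), and it is not genefer's actual code. The 0.45 threshold comes from the "maxErr exceeded ... > 0.4500" message in the logs; the function names and the pass/fail modelling of the fixed-point case are hypothetical.

```python
MAX_ERR = 0.45  # threshold shown in the "maxErr exceeded ... > 0.4500" message

def roundoff_error(values):
    """Largest distance from any value to its nearest integer (0.0 to 0.5)."""
    return max(abs(v - round(v)) for v in values)

def check_float_transform(values):
    """Return (error, ok) for a floating-point result vector."""
    err = roundoff_error(values)
    return err, err <= MAX_ERR

def check_fixed_point_transform(consistent):
    """Hypothetical model: a fixed-point check either passes or fails,
    so the reported error is 0.0 or 1.0 and never a fraction."""
    err = 0.0 if consistent else 1.0
    return err, err <= MAX_ERR

print(check_float_transform([3.0, 4.125, 7.0]))
print(check_fixed_point_transform(False))
```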
____________
My lucky number is 75898^524288+1 | |
|
Andy
Joined: 27 Nov 17 Posts: 23 ID: 953480 Credit: 157,111,398 RAC: 155,601
|
Bit of a toaster of a datapoint compared to most things, but on my i7-4710MQ (2.5GHz) laptop (w7x64),
old:
Generalized Fermat Prime Search benchmarks
50000000^32768+1 252280 digits x87 (80-bit) Estimated time: 0:28:30
25000000^65536+1 484832 digits x87 (80-bit) Estimated time: 1:53:00
8000000^131072+1 904802 digits x87 (80-bit) Estimated time: 7:25:00
46000000^131072+1 1004373 digits x87 (80-bit) Estimated time: 8:00:00
2800000^262144+1 1690084 digits x87 (80-bit) Estimated time: 28:40:00
1500000^524288+1 3238051 digits x87 (80-bit) Estimated time: 115:00:00
900000^1048576+1 6243476 digits FMA3 Estimated time: 43:40:00
130000^2097152+1 10724717 digits FMA3 Estimated time: 184:00:00
100000^4194304+1 20971521 digits FMA3 Estimated time: 729:00:00
new:
Generalized Fermat Prime Search benchmarks
75000000^32768+1 258051 digits x87 (80-bit) Estimated time: 0:25:10
27000000^65536+1 487022 digits x87 (80-bit) Estimated time: 1:40:00
10000000^131072+1 917505 digits x87 (80-bit) Estimated time: 6:58:00
48000000^131072+1 1006796 digits x87 (80-bit) Estimated time: 8:05:00
3600000^262144+1 1718696 digits x87 (80-bit) Estimated time: 28:20:00
1700000^524288+1 3266550 digits x87 (80-bit) Estimated time: 115:00:00
950000^1048576+1 6268098 digits FMA3 Estimated time: 47:50:00
180000^2097152+1 11021106 digits FMA3 Estimated time: 182:00:00
110000^4194304+1 21145134 digits FMA3 Estimated time: 833:00:00
x87:
old:
10980642^32+1 Time: 761 ns/mul. Err: 0.0001 226 digits
8981090^64+1 Time: 1.69 us/mul. Err: 0.0001 446 digits
7345652^128+1 Time: 3.8 us/mul. Err: 0.0001 879 digits
6008024^256+1 Time: 8.46 us/mul. Err: 0.0001 1736 digits
4913974^512+1 Time: 18.7 us/mul. Err: 0.0001 3427 digits
4019150^1024+1 Time: 40.4 us/mul. Err: 0.0001 6763 digits
3287270^2048+1 Time: 89.4 us/mul. Err: 0.0001 13347 digits
2688666^4096+1 Time: 194 us/mul. Err: 0.0001 26336 digits
2199064^8192+1 Time: 416 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 886 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.9 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 4.06 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 8.67 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 18.3 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 39.2 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 85.7 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 205 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 433 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 892 ms/mul. Err: 0.0001 45879398 digits
240964^16777216+1 Time: 1.64 s/mul. Err: 0.0001 90294174 digits
Genefer Mark = 1.
new:
10980642^32+1 Time: 643 ns/mul. Err: 0.0001 226 digits
8981090^64+1 Time: 1.43 us/mul. Err: 0.0001 446 digits
7345652^128+1 Time: 3.3 us/mul. Err: 0.0001 879 digits
6008024^256+1 Time: 7.42 us/mul. Err: 0.0001 1736 digits
4913974^512+1 Time: 16.8 us/mul. Err: 0.0001 3427 digits
4019150^1024+1 Time: 38.5 us/mul. Err: 0.0001 6763 digits
3287270^2048+1 Time: 80.7 us/mul. Err: 0.0001 13347 digits
2688666^4096+1 Time: 175 us/mul. Err: 0.0001 26336 digits
2199064^8192+1 Time: 386 us/mul. Err: 0.0001 51956 digits
1798620^16384+1 Time: 823 us/mul. Err: 0.0001 102481 digits
1471094^32768+1 Time: 1.99 ms/mul. Err: 0.0001 202102 digits
1203210^65536+1 Time: 4.53 ms/mul. Err: 0.0001 398482 digits
984108^131072+1 Time: 9.57 ms/mul. Err: 0.0001 785521 digits
804904^262144+1 Time: 20.9 ms/mul. Err: 0.0001 1548156 digits
658332^524288+1 Time: 43.6 ms/mul. Err: 0.0001 3050541 digits
538452^1048576+1 Time: 93.2 ms/mul. Err: 0.0001 6009544 digits
440400^2097152+1 Time: 175 ms/mul. Err: 0.0001 11836006 digits
360204^4194304+1 Time: 347 ms/mul. Err: 0.0001 23305854 digits
294612^8388608+1 Time: 702 ms/mul. Err: 0.0001 45879398 digits
240964^16777216+1 Time: 1.49 s/mul. Err: 0.0001 90294174 digits
Genefer Mark = 1.
sse4:
old:
10980642^32+1 Time: 184 ns/mul. Err: 0.1250 226 digits
8981090^64+1 Time: 485 ns/mul. Err: 0.1875 446 digits
7345652^128+1 Time: 1.1 us/mul. Err: 0.1562 879 digits
6008024^256+1 Time: 1.83 us/mul. Err: 0.1562 1736 digits
4913974^512+1 Time: 3.94 us/mul. Err: 0.1562 3427 digits
4019150^1024+1 Time: 7.97 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 17 us/mul. Err: 0.1875 13347 digits
2688666^4096+1 Time: 36.2 us/mul. Err: 0.1875 26336 digits
2199064^8192+1 Time: 77.5 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 165 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 346 us/mul. Err: 0.1797 202102 digits
1203210^65536+1 Time: 724 us/mul. Err: 0.1719 398482 digits
984108^131072+1 Time: 1.58 ms/mul. Err: 0.1562 785521 digits
804904^262144+1 Time: 3.39 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 7.15 ms/mul. Err: 0.1562 3050541 digits
538452^1048576+1 Time: 14.9 ms/mul. Err: 0.1484 6009544 digits
440400^2097152+1 Time: 30.8 ms/mul. Err: 0.1562 11836006 digits
360204^4194304+1 Time: 66.6 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 132 ms/mul. Err: 0.1406 45879398 digits
240964^16777216+1 Time: 296 ms/mul. Err: 0.1250 90294174 digits
Genefer Mark = 6.
new:
4019150^1024+1 Time: 8.62 us/mul. Err: 0.1406 6763 digits
3287270^2048+1 Time: 17.8 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 38.6 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 80 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 172 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 354 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 754 us/mul. Err: 0.1328 398482 digits
984108^131072+1 Time: 1.59 ms/mul. Err: 0.1289 785521 digits
804904^262144+1 Time: 3.44 ms/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 7.07 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 14.9 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 30.9 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 69.4 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 140 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 293 ms/mul. Err: 0.1094 90294174 digits
Genefer Mark = 6.
sse2:
old:
10980642^32+1 Time: 255 ns/mul. Err: 0.1250 226 digits
8981090^64+1 Time: 672 ns/mul. Err: 0.1875 446 digits
7345652^128+1 Time: 1.36 us/mul. Err: 0.1562 879 digits
6008024^256+1 Time: 2.34 us/mul. Err: 0.1562 1736 digits
4913974^512+1 Time: 4.77 us/mul. Err: 0.1562 3427 digits
4019150^1024+1 Time: 10 us/mul. Err: 0.1562 6763 digits
3287270^2048+1 Time: 20 us/mul. Err: 0.1875 13347 digits
2688666^4096+1 Time: 44.1 us/mul. Err: 0.1875 26336 digits
2199064^8192+1 Time: 88.2 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 197 us/mul. Err: 0.1719 102481 digits
1471094^32768+1 Time: 394 us/mul. Err: 0.1797 202102 digits
1203210^65536+1 Time: 910 us/mul. Err: 0.1719 398482 digits
984108^131072+1 Time: 1.76 ms/mul. Err: 0.1562 785521 digits
804904^262144+1 Time: 3.86 ms/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 7.93 ms/mul. Err: 0.1562 3050541 digits
538452^1048576+1 Time: 17 ms/mul. Err: 0.1484 6009544 digits
440400^2097152+1 Time: 34.3 ms/mul. Err: 0.1562 11836006 digits
360204^4194304+1 Time: 75.7 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 155 ms/mul. Err: 0.1406 45879398 digits
240964^16777216+1 Time: 346 ms/mul. Err: 0.1250 90294174 digits
Genefer Mark = 6.
new:
4019150^1024+1 Time: 11.1 us/mul. Err: 0.1406 6763 digits
3287270^2048+1 Time: 22.5 us/mul. Err: 0.1562 13347 digits
2688666^4096+1 Time: 47.7 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 98.9 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 212 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 436 us/mul. Err: 0.1562 202102 digits
1203210^65536+1 Time: 952 us/mul. Err: 0.1328 398482 digits
984108^131072+1 Time: 1.98 ms/mul. Err: 0.1289 785521 digits
804904^262144+1 Time: 4.14 ms/mul. Err: 0.1406 1548156 digits
658332^524288+1 Time: 8.66 ms/mul. Err: 0.1250 3050541 digits
538452^1048576+1 Time: 18.1 ms/mul. Err: 0.1250 6009544 digits
440400^2097152+1 Time: 38.5 ms/mul. Err: 0.1250 11836006 digits
360204^4194304+1 Time: 79.5 ms/mul. Err: 0.1094 23305854 digits
294612^8388608+1 Time: 169 ms/mul. Err: 0.1250 45879398 digits
240964^16777216+1 Time: 354 ms/mul. Err: 0.1094 90294174 digits
Genefer Mark = 5. | |
|
|
F74 & F76 are in progress, with G74 and G76 starting shortly.
| |
|
|
Please note there are new Windows CPU builds (3.3.3-3), which fix a minor issue with priority when running standalone (i.e. BOINC is not affected). Anyone who is running the CPU code on Windows - please go back to https://app.assembla.com/spaces/genefer/subversion/source/HEAD/trunk/bin/windows and grab the updated binary!
Thanks for all the testing so far!
Cheers
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
Bit of a toaster of a datapoint compared to most things, but on my i7-4710MQ (2.5GHz) laptop (w7x64),
These results confirm pretty much what everyone else has seen - a decent speedup for the x87 transform, and roughly the same (or slightly slower, depending on hardware and n) performance for the vector transforms. Note that the b values used for the "Generalized Fermat Prime Search benchmarks" are increased in 3.3.3, so you can't compare them directly. The b limits for the vector transforms are slightly increased, which might bring some tests back into range.
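Since the -b output mixes ns, us, ms and s, a small parser makes the old/new comparison less error-prone. The sketch below is written against the line format pasted in this thread, uses three x87 lines from the quoted post as sample input, and the function names are made up for illustration.

```python
import re

UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}
LINE_RE = re.compile(
    r"^(\d+)\^(\d+)\+1\s+Time:\s+([\d.]+)\s+(ns|us|ms|s)/mul\.")

def parse_bench(text):
    """Map N -> seconds per multiplication for each '-b' output line."""
    timings = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            timings[int(m.group(2))] = float(m.group(3)) * UNIT_SECONDS[m.group(4)]
    return timings

def speedup(old_text, new_text):
    """Map N -> old/new time ratio (above 1.0 means the new build is faster)."""
    old, new = parse_bench(old_text), parse_bench(new_text)
    return {n: old[n] / new[n] for n in sorted(old.keys() & new.keys())}

OLD_X87 = """\
10980642^32+1 Time: 761 ns/mul. Err: 0.0001 226 digits
1471094^32768+1 Time: 1.9 ms/mul. Err: 0.0001 202102 digits
240964^16777216+1 Time: 1.64 s/mul. Err: 0.0001 90294174 digits
"""
NEW_X87 = """\
10980642^32+1 Time: 643 ns/mul. Err: 0.0001 226 digits
1471094^32768+1 Time: 1.99 ms/mul. Err: 0.0001 202102 digits
240964^16777216+1 Time: 1.49 s/mul. Err: 0.0001 90294174 digits
"""

for n, ratio in speedup(OLD_X87, NEW_X87).items():
    print(f"N={n}: {ratio:.3f}x")
```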
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
G48-G56:
OCL5 returned an error, and it looks like this isn't the first time it has occurred. I attempted to re-run it and got the same result.
dora:test vzimmerman$ ./test.sh
geneferocl 3.3.3-2 (Apple-x86/OpenCL/64-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Command line: ./geneferocl_macintel -q 52186^131072+1 -x OCL
Normal priority change succeeded.
Testing 52186^131072+1...
Using OCL transform
Running on platform 'Apple', device 'AMD Radeon RX 480 Compute Engine', vendor 'AMD', version 'OpenCL 1.2 ' and driver '1.2 (Nov 9 2017 18:48:40)'.
36 computeUnits @ 1266MHz, memSize=8192MB, cacheSize=0kB, cacheLineSize=0B, localMemSize=32kB, maxWorkGroupSize=256.
Starting initialization...
Initialization complete (0.117 seconds).
Estimated time for 52186^131072+1 is 0:07:18
52186^131072+1 is composite. (RES=1b196d6c0e4d778f) (618340 digits) (err = 0.0004) (time = 0:07:20) 16:54:56
geneferocl 3.3.3-2 (Apple-x86/OpenCL/64-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Command line: ./geneferocl_macintel -q 52186^131072+1 -x OCL2
Normal priority change succeeded.
Testing 52186^131072+1...
Using OCL2 transform
Running on platform 'Apple', device 'AMD Radeon RX 480 Compute Engine', vendor 'AMD', version 'OpenCL 1.2 ' and driver '1.2 (Nov 9 2017 18:48:40)'.
36 computeUnits @ 1266MHz, memSize=8192MB, cacheSize=0kB, cacheLineSize=0B, localMemSize=32kB, maxWorkGroupSize=256.
Starting initialization...
Initialization complete (0.117 seconds).
Estimated time for 52186^131072+1 is 0:13:00
52186^131072+1 is composite. (RES=1b196d6c0e4d778f) (618340 digits) (err = 0.0000) (time = 0:13:01) 17:07:57
geneferocl 3.3.3-2 (Apple-x86/OpenCL/64-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Command line: ./geneferocl_macintel -q 52186^131072+1 -x OCL3
Normal priority change succeeded.
Testing 52186^131072+1...
Using OCL3 transform
Running on platform 'Apple', device 'AMD Radeon RX 480 Compute Engine', vendor 'AMD', version 'OpenCL 1.2 ' and driver '1.2 (Nov 9 2017 18:48:40)'.
36 computeUnits @ 1266MHz, memSize=8192MB, cacheSize=0kB, cacheLineSize=0B, localMemSize=32kB, maxWorkGroupSize=256.
Starting initialization...
Initialization complete (0.117 seconds).
Estimated time for 52186^131072+1 is 0:10:50
52186^131072+1 is composite. (RES=1b196d6c0e4d778f) (618340 digits) (err = 0.0000) (time = 0:11:00) 17:18:57
geneferocl 3.3.3-2 (Apple-x86/OpenCL/64-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Command line: ./geneferocl_macintel -q 52186^131072+1 -x OCL4
Normal priority change succeeded.
Testing 52186^131072+1...
Using OCL4 transform
Running on platform 'Apple', device 'AMD Radeon RX 480 Compute Engine', vendor 'AMD', version 'OpenCL 1.2 ' and driver '1.2 (Nov 9 2017 18:48:40)'.
36 computeUnits @ 1266MHz, memSize=8192MB, cacheSize=0kB, cacheLineSize=0B, localMemSize=32kB, maxWorkGroupSize=256.
Starting initialization...
Initialization complete (0.117 seconds).
Estimated time for 52186^131072+1 is 0:09:19
52186^131072+1 is composite. (RES=1b196d6c0e4d778f) (618340 digits) (err = 0.0000) (time = 0:09:21) 17:28:18
geneferocl 3.3.3-2 (Apple-x86/OpenCL/64-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Command line: ./geneferocl_macintel -q 52186^131072+1 -x OCL5
Normal priority change succeeded.
Testing 52186^131072+1...
Using OCL5 transform
Running on platform 'Apple', device 'AMD Radeon RX 480 Compute Engine', vendor 'AMD', version 'OpenCL 1.2 ' and driver '1.2 (Nov 9 2017 18:48:40)'.
36 computeUnits @ 1266MHz, memSize=8192MB, cacheSize=0kB, cacheLineSize=0B, localMemSize=32kB, maxWorkGroupSize=256.
Starting initialization...
Initialization complete (0.117 seconds).
Estimated time for 52186^131072+1 is 0:10:10
Testing 52186^131072+1... 1855860 steps to go (0:09:18 remaining)
maxErr exceeded for 52186^131072+1, 1.0000 > 0.4500
Errors occurred for all available transform implementations | |
|
|
G48-G56:
OCL5 Returned an error, and it looks like it isn't the first time it has occurred. I attempted to re-run it, and got the same result.
Would be good if you could try running this test several times - and also with the previous genefer release (current stock PrimeGrid app will do). There does seem to be an issue here that we need to understand!
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
composite Volunteer tester Send message
Joined: 16 Feb 10 Posts: 995 ID: 55391 Credit: 865,414,824 RAC: 253,024
|
G48-G56:
OCL5 Returned an error, and it looks like it isn't the first time it has occurred. I attempted to re-run it, and got the same result.
Would be good if you could try running this test several times - and also with the previous genefer release (current stock PrimeGrid app will do). There does seem to be an issue here that we need to understand!
Don't forget to remove the checkpoint file between tests, otherwise they are not independent. | |
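The repeat-and-clean procedure described above could be scripted roughly as follows. This is only a sketch: the helper name `run_independent` and the checkpoint pattern `*.ckpt` are assumptions made for illustration (check what file your genefer build actually leaves in its working directory), while the command line in the comment is copied from the logs earlier in the thread.

```python
import glob
import os
import subprocess


def run_independent(cmd, checkpoint_glob, runs=5, cwd="."):
    """Run cmd several times, deleting checkpoint files before each run.

    Returns one boolean per run: True if the output did not contain
    'maxErr exceeded', False if it did.
    """
    results = []
    for _ in range(runs):
        # Remove leftover checkpoints so every run starts from scratch.
        for path in glob.glob(os.path.join(cwd, checkpoint_glob)):
            os.remove(path)
        proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
        output = proc.stdout + proc.stderr
        results.append("maxErr exceeded" not in output)
    return results


# Example call (checkpoint pattern is a guess, adjust to your setup):
# run_independent(
#     ["./geneferocl_macintel", "-q", "52186^131072+1", "-x", "OCL5"],
#     "*.ckpt",
#     runs=5,
# )
```

Checking the output text rather than the exit status is a deliberate simplification, since the thread does not say what exit code genefer returns on a round-off failure.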
|
|
Retesting the Win64 CPU tests on 3.3.3-3.
I am attempting to run the 32 bit Win CPU tests. It's not a pure environment (x64 Core 2 Duo on x86 Win 7), and I'm turning this laptop on for the first time in about 3 years. If it fails, would you accept a virtualized 32 bit Windows test?
EDIT: It's dead, Jim. We'll see if I can reinstall...
____________
Eating more cheese on Thursdays. | |
|
composite Volunteer tester Send message
Joined: 16 Feb 10 Posts: 995 ID: 55391 Credit: 865,414,824 RAC: 253,024
|
Not enough memory in an IBM T40 Thinkpad to emulate a disk big enough to run GFN when running Debian Live i386 on it. I guess it would be easiest to plug in a USB hard disk and tell it to swap there. | |
|
|
Repeat Mac/AMD OCL5 tests resulted in errors. Same error occurred on production version.
dora:test1 vzimmerman$ ./primegrid_genefer_3_3_2_3.18_x86_64-apple-darwin__openclatiGFN18 -q 52186^131072+1 -x OCL5
geneferocl 3.3.2-7 (Apple-x86/OpenCL/64-bit)
Copyright 2001-2017, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2017, Iain Bethune
Genefer is free source code, under the MIT license.
Command line: ./primegrid_genefer_3_3_2_3.18_x86_64-apple-darwin__openclatiGFN18 -q 52186^131072+1 -x OCL5
Low priority change succeeded.
Testing 52186^131072+1...
Using OCL5 transform
Running on platform 'Apple', device 'ATI Radeon RX 480 Compute Engine', vendor 'AMD', version 'OpenCL 1.2 ' and driver '1.2 (Nov 9 2017 18:48:40)'.
36 computeUnits @ 1266MHz, memSize=8192MB, cacheSize=0kB, cacheLineSize=0B, localMemSize=32kB, maxWorkGroupSize=256.
Starting initialization...
Initialization complete (0.120 seconds).
Estimated time for 52186^131072+1 is 0:11:10
Testing 52186^131072+1... 381732 steps to go (0:02:03 remaining)
maxErr exceeded for 52186^131072+1, 1.0000 > 0.4500
Errors occurred for all available transform implementations
| |
|
Michael Goetz Volunteer moderator Project administrator Send message
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
Repeat Mac/AMD OCL5 tests resulted in errors. Same error occurred on production version.
I just want to confirm Grebuloner's result... I ran this (Windows 7, NVidia GTX 1060) using OCL5 and it worked. I also ran it without specifying the transform and it ran OCL4, which also worked. (I.e., duplicating B53 and B55.)
____________
My lucky number is 75898^524288+1 | |
|
|
I got my little C2D back up again, reserving/doing 32 bit Win CPU tests.
EDIT: Never mind. Narrowly beaten to the punch.
____________
Eating more cheese on Thursdays. | |
|
dukebg Volunteer tester Send message
Joined: 21 Nov 17 Posts: 242 ID: 950482 Credit: 23,670,125 RAC: 0
|
I might even be able to run tests on a Pentium 4, which would be even "purer" in that it doesn't have 64-bit instructions at all. Sorry, Grebuloner :[ | |
|
dukebg Volunteer tester Send message
Joined: 21 Nov 17 Posts: 242 ID: 950482 Credit: 23,670,125 RAC: 0
|
Ok, that second computer turned out to be a Pentium 4 511 (Prescott, 90nm), one of the first chips to get Intel's x86-64 instruction set. So there's no real "pureness" difference between testing on it and on that Pentium dual-core. Meh.
So I ran a bunch of tests to see that the application works there, but I'm only going to commit to fully running the listed tests on the first computer, the one I posted about above.
Grebuloner, by the way, I'm not going to run the BOINC win32 test, you can still grab that one, if you want! | |
|
Michael Goetz Volunteer moderator Project administrator Send message
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
I have an old pc running 32-bit windows xp. It has an Intel Pentium E2180 (65nm technology, no SSE4 yet, but dual core). Should be fine for a "pure" windows32 test?
It's... better than testing under 64 bit Windows, but that's still a 64 bit CPU. It will execute 64 bit instructions. You *could* run a 64 bit operating system on it if you wanted to. However, testing under XP, in and of itself, is useful as it will show if the build has any incompatibilities with XP. Compiling code with modern compilers that will run on Win 10 as well as XP is getting harder and harder every year.
Until last year I had a true 32 bit computer running XP, but it died in 2017. It was very useful for testing that app builds truly had no instructions that wouldn't run on plain vanilla CPUs. No SSE2, no x64, nothing.
____________
My lucky number is 75898^524288+1 | |
|
|
I have an old pc running 32-bit windows xp. It has an Intel Pentium E2180 (65nm technology, no SSE4 yet, but dual core). Should be fine for a "pure" windows32 test?
It's... better than testing under 64 bit Windows, but that's still a 64 bit CPU. It will execute 64 bit instructions. You *could* run a 64 bit operating system on it if you wanted to. However, testing under XP, in and of itself, is useful as it will show if the build has any incompatibilities with XP. Compiling code with modern compilers that will run on Win 10 as well as XP is getting harder and harder every year.
Until last year I had a true 32 bit computer running XP, but it died in 2017. It was very useful for testing that app builds truly had no instructions that wouldn't run on plain vanilla CPUs. No SSE2, no x64, nothing.
Have you tried Intel® Software Development Emulator? It can emulate the Intel Quark CPU, which has the IA-32 architecture. It will complain if the program tries to execute an instruction not supported by the specified CPU.
____________
| |
|
|
Update on OCL5 "bug" seen on 52186^131072+1:
As far as we know at present this is likely to be a hardware-related issue rather than a code bug. Round-off errors have been seen on Van's Mac with Radeon RX 480 and composite's Linux box with GeForce GTX 760. On Van's machine the error is reproducible on most runs, but not always at the same point in the calculation. On composite's it has only been seen once. Van also sees the same behaviour with the current stock app (3.3.2).
We have over 40 successful tests in both Linux/Nvidia and Mac/AMD setups, and have additional successful tests on Windows/Nvidia and Mac/Nvidia.
composite - would you be able to re-run the test several times on the host where you first saw the bug and see if it appears to be reproducible?
In any case, I think we can continue with the test programme - if anyone has Linux and/or Windows systems with AMD GPUs, that could run some tests that would be very helpful - PM me if you're unsure what to do!
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
composite Volunteer tester Send message
Joined: 16 Feb 10 Posts: 995 ID: 55391 Credit: 865,414,824 RAC: 253,024
|
I still get the same error with OCL5 on Linux testing 52186^131072+1. The first time I tried it today, I got the error twice during the same test (with resumption). Sometimes I don't get the error. It's random.
This GPU works reliably with OCL2. No invalid tasks returned in the last 1256 GFN16 tasks completed during the challenge.
EDIT: When I remove the checkpoint file after failure, the error occurs at different iterations. | |
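For readers wondering what the "maxErr exceeded ... 1.0000 > 0.4500" message is guarding against, here is a generic illustration of round-off checking in floating-point transform multiplication. It is not Genefer's code: it uses a slow textbook DFT and a plain cyclic convolution, and only the 0.45 threshold is taken from the logs above. The idea is that each convolution output should be very close to an integer, and once the distance to the nearest integer approaches 0.5 the rounded result can no longer be trusted.

```python
import cmath

MAX_ERR = 0.45  # threshold shown in the genefer logs above


def dft(xs, sign):
    """Naive O(n^2) discrete Fourier transform (sign=-1 forward, +1 inverse, unscaled)."""
    n = len(xs)
    return [
        sum(x * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j, x in enumerate(xs))
        for k in range(n)
    ]


def cyclic_convolution(a, b):
    """Cyclic convolution of two equal-length sequences via the DFT."""
    fa, fb = dft(a, -1), dft(b, -1)
    prod = [x * y for x, y in zip(fa, fb)]
    n = len(a)
    return [v.real / n for v in dft(prod, +1)]


def max_roundoff(values):
    """Largest distance between any value and its nearest integer."""
    return max(abs(v - round(v)) for v in values)


# Two small polynomials, zero-padded so the cyclic result equals the plain product.
digits_a = [3, 1, 4, 1, 0, 0, 0, 0]
digits_b = [2, 7, 1, 8, 0, 0, 0, 0]

raw = cyclic_convolution(digits_a, digits_b)
err = max_roundoff(raw)
coeffs = [round(v) for v in raw]

if err > MAX_ERR:
    print("round-off error too large, result cannot be trusted:", err)
else:
    print("coefficients:", coeffs)
```

On healthy hardware a deterministic calculation gives the same round-off error at the same iteration every time, so an error that shows up at different iterations from run to run, as reported above, is consistent with the hardware explanation.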
|
|
G76 errored out, but I suspect that is my now-untrustworthy RX 480. | |
|
|
Genefer 3.3.3 has now been released and I've applied manual credit for all of the tests that were run. Thanks to everyone who contributed to the testing - and congrats to dukebg, composite and Kiska on your new "Volunteer tester" status!
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|