Genefer 3.2.5 and 3.2.6 testing
Hi all,
There are some binaries of the new 3.2.5-dev version available on assembla:
Mac: https://www.assembla.com/code/genefer/subversion/nodes/746/trunk/bin/mac
Windows: https://www.assembla.com/code/genefer/subversion/nodes/746/trunk/bin/windows
Linux: coming soon
This version includes an improved OpenCL transform, which should be up to 5% faster on some cards, and fixes the bug that caused very high MaxErr on Mac. Optimised transforms for AMD CPUs (FMA4 and AVX) are also included. For Intel and pre-Bulldozer AMD CPUs the code will automatically select the fastest transform without having to run benchmarks.
Please go ahead and test the new versions (standalone, on PRPnet or BOINC with app_info.xml) and let us know how it goes.
Thanks
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
pschoefer Volunteer developer Volunteer tester
 Send message
Joined: 20 Sep 05 Posts: 686 ID: 845 Credit: 3,014,129,146 RAC: 885,033
                              
|
Does not work on my R9-280X (Catalyst 14.4, Win7 x64):
>geneferocl-windows.exe -q "360204^4194304+1"
geneferocl 3.2.5-dev (Windows/OpenCL/32-bit)
Copyright 2001-2014, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: geneferocl-windows.exe -q 360204^4194304+1
Priority change succeeded.
Testing 360204^4194304+1...
Using OCL transform
Running on platform 'AMD Accelerated Parallel Processing', device 'Tahiti', version 'OpenCL 1.2 AMD-APP (1445.5)' and driver '1445.5 (VM)'.
Starting initialization...
Initialization complete (91.940 seconds).
Error: OpenCL error detected: CL_INVALID_KERNEL_ARGS.
An error (2948) occured.
____________
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2417 ID: 1178 Credit: 20,073,832,871 RAC: 21,734,063
                                                
|
Also crashes immediately on my GTX 645 (Win Vista x64, driver 331.82)
| |
|
|
Crash with geneferocl-windows.exe Win7 x64
Checkpoint file gets written to my disk before it crashes
E:\Programme>geneferocl-windows.exe -q "360204^4194304+1"
geneferocl 3.2.5-dev (Windows/OpenCL/32-bit)
Copyright 2001-2014, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: geneferocl-windows.exe -q 360204^4194304+1
Priority change succeeded.
Testing 360204^4194304+1...
Using OCL transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 770', version 'OpenCL 1.1 CUDA' and driver '337.88'.
Resuming 360204^4194304+1 from a checkpoint (77420369 iterations left)
E:\Programme>geneferocl-windows.exe -q "360204^4194304+1"
geneferocl 3.2.5-dev (Windows/OpenCL/32-bit)
Copyright 2001-2014, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: geneferocl-windows.exe -q 360204^4194304+1
Priority change succeeded.
Testing 360204^4194304+1...
Using OCL transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 770', version 'OpenCL 1.1 CUDA' and driver '337.88'.
Starting initialization...
Initialization complete (149.919 seconds).
Testing 360204^4194304+1... 77420369 steps to go | |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 478,633,013 RAC: 374,582
                               
|
The good news is geneferCUDA seems to work. That's pretty much the only good news, unfortunately.
Genefer64 (the CPU version) successfully runs the -V command. Anything beyond printing the banner just uses CPU time while nothing seems to happen. With "-t -x fma3" (or any other transform) it just sits there and outputs nothing.
GeneferOCL (on a GTX580) does this when running -t:
C:\Temp\GFN\Win\3.2.5>geneferocl-windows_3.2.5.exe -t
geneferocl 3.2.5-dev (Windows/OpenCL/32-bit)
Copyright 2001-2014, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: geneferocl-windows_3.2.5.exe -t
Priority change succeeded.
Running tests for transform implementation "OCL"
Testing 100234^64+1...
Using OCL transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 580', version 'OpenCL 1.1 CUDA' and driver '334.89'.
The checkpoint doesn't match current test: 100234^64+1 != 1277444^32768+1. Current test will be restarted
Starting initialization...
Initialization complete (0.001 seconds).
Estimated time remaining for 100234^64+1 is 0:00:00
maxErr exceeded for 100234^64+1, 0.5000 > 0.4500 during final check
Trying to run -l results in the program immediately crashing before any output and getting the Windows "The program stopped working" dialog.
____________
My lucky number is 75898^524288+1 | |
|
Roger Volunteer developer Volunteer tester
 Send message
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
                    
|
AMD HD7970 GPU, Windows 7 64 bit
>geneferocl-windows.exe -q "75898^524288+1"
geneferocl 3.2.5-dev (Windows/OpenCL/32-bit)
Copyright 2001-2014, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: geneferocl-windows.exe -q 75898^524288+1
Priority change succeeded.
Testing 75898^524288+1...
Using OCL transform
Running on platform 'AMD Accelerated Parallel Processing', device 'Tahiti', version 'OpenCL 1.2 AMD-APP (1445.5)' and driver '1445.5 (VM)'.
Starting initialization...
Initialization complete (3.984 seconds).
Error: OpenCL error detected: CL_INVALID_KERNEL_ARGS.
An error (2948) occured.
AMD X6 1100T CPU, Windows 7 64 bit
>genefer_windows64.exe -q "6960350^32768+1"
genefer 3.2.5-dev (Windows/CPU/64-bit)
Supported transform implementations: sse2 default x87
Copyright 2001-2014, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: genefer_windows64.exe -q 6960350^32768+1
Priority change succeeded.
Testing 6960350^32768+1...
Using SSE2 transform
Starting initialization...
Initialization complete (0.039 seconds).
Testing 6960350^32768+1... 744839 steps to go
maxErr exceeded for 6960350^32768+1, 0.5000 > 0.4500
maxErr exceeded while using SSE2; switching to Default.
Testing 6960350^32768+1...
Using Default transform
Resuming 6960350^32768+1 from a checkpoint (744839 iterations left)
maxErr exceeded for 6960350^32768+1, 0.5000 > 0.4500
maxErr exceeded while using Default; switching to x87 (80-bit).
Testing 6960350^32768+1...
Using x87 (80-bit) transform
Resuming 6960350^32768+1 from a checkpoint (744839 iterations left)
Estimated time remaining for 6960350^32768+1 is 0:32:38
Testing 6960350^32768+1... 741376 steps to go (0:32:06 remaining)
Successful computation progress with x87 (80-bit); switching back to SSE2.
Testing 6960350^32768+1...
Using SSE2 transform
Resuming 6960350^32768+1 from a checkpoint (741375 iterations left)
maxErr exceeded for 6960350^32768+1, 0.5000 > 0.4500
maxErr exceeded while using SSE2; switching to Default.
Too many errors with SSE2; Calculation will proceed using only more accurate transforms.
Testing 6960350^32768+1...
Using Default transform
Resuming 6960350^32768+1 from a checkpoint (741375 iterations left)
maxErr exceeded for 6960350^32768+1, 1.0000 > 0.4500
maxErr exceeded while using Default; switching to x87 (80-bit).
Too many errors with Default; Calculation will proceed using only more accurate transforms.
Testing 6960350^32768+1...
Using x87 (80-bit) transform
Resuming 6960350^32768+1 from a checkpoint (741375 iterations left)
Estimated time remaining for 6960350^32768+1 is 0:32:10
6960350^32768+1 is a probable prime. (224220 digits) (err = 0.0067) (time = 0:36:05) 09:05:30
After running this test I noticed a "verif.txt". Is that new, or is it always created when a prime is found? | |
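The CPU log above shows the program dropping to a more accurate transform each time maxErr exceeds 0.45, retrying the faster transform after some successful progress, and retiring a transform after repeated failures. A minimal Python sketch of that pattern follows. It is inferred only from the log output: the function name, the `max_failures` value and the idea of fixed "segments" are assumptions for illustration, not genefer's actual code.

```python
# Hypothetical sketch of the fallback pattern visible in the log; not genefer's code.
MAX_ERR_LIMIT = 0.45  # threshold printed in the log ("0.5000 > 0.4500")


def run_segments(transforms, max_err_of, segments, max_failures=2):
    """Return the transform that completed each segment of the computation.

    transforms:   names ordered fastest first, e.g. ["sse2", "default", "x87"].
    max_err_of:   callable giving the observed maxErr for a transform name
                  (a stand-in for actually running some iterations).
    segments:     number of stretches of work to simulate (assumed notion).
    max_failures: failures after which a transform is retired (assumed value).
    """
    failures = dict.fromkeys(transforms, 0)
    used = []
    for _ in range(segments):
        for name in transforms:
            if failures[name] >= max_failures:
                continue  # retired: too many errors with this transform
            if max_err_of(name) <= MAX_ERR_LIMIT:
                used.append(name)  # this transform made progress
                break
            failures[name] += 1  # maxErr exceeded, fall through to the next one
        else:
            raise RuntimeError("every transform exceeded the maxErr limit")
    return used


# Error values taken from the log: SSE2 and Default hit 0.5, x87 finished at 0.0067.
observed = {"sse2": 0.5, "default": 0.5, "x87": 0.0067}
result = run_segments(["sse2", "default", "x87"], observed.get, segments=3)
print(result)
```

With these inputs every segment ends up on x87, and both faster transforms are retired after their second failure, which mirrors the "Too many errors with SSE2" and "Too many errors with Default" messages.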
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 843 ID: 164101 Credit: 306,532,551 RAC: 5,333

|
I just committed a new geneferocl-windows.exe binary.
The previous build was not correct: the OpenCL src was an old version, not compatible with CPU API. | |
|
|
I just committed a new geneferocl-windows.exe binary.
The previous build was not correct: the OpenCL src was an old version, not compatible with CPU API.
The updated Windows OpenCL binary is here: https://www.assembla.com/code/genefer/subversion/nodes/747/trunk/bin/windows/geneferocl-windows.exe
I'm looking into the issue with the CPU app.
Cheers
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2417 ID: 1178 Credit: 20,073,832,871 RAC: 21,734,063
                                                
|
OpenCL runs now on the GTX 645. However, it is slower than the current app version on PRPnet in many cases:
Current PRPnet version
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL"
2199064^8192+1 Time: 142 us/mul. Err: 0.1934 51956 digits
1798620^16384+1 Time: 148 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 168 us/mul. Err: 0.2031 202102 digits
1203210^65536+1 Time: 285 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 538 us/mul. Err: 0.1953 785521 digits
804904^262144+1 Time: 1.05 ms/mul. Err: 0.1953 1548156 digits
658332^524288+1 Time: 2.13 ms/mul. Err: 0.1953 3050541 digits
538452^1048576+1 Time: 4.35 ms/mul. Err: 0.1953 6009544 digits
440400^2097152+1 Time: 8.78 ms/mul. Err: 0.1641 11836006 digits
360204^4194304+1 Time: 18.5 ms/mul. Err: 0.1719 23305854 digits
294612^8388608+1 Time: 38.5 ms/mul. Err: 0.1680 45879398 digits
Genefer Mark = 23.
New App
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 645', version 'OpenCL 1.1 CUDA' and driver '331.82'.
2199064^8192+1 Time: 160 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 170 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 190 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 296 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 538 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 1.01 ms/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 2 ms/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 4.09 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 8.43 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 17.5 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 36.9 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 24.
EDIT: Of course, the new app is faster in the ranges that we can still do GPU work on. :) | |
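The digits column in the benchmark output can be reproduced from b and N alone, since b^N+1 has floor(N*log10(b))+1 decimal digits for the values used here. A small Python sketch, with a hypothetical helper name, that checks two of the benchmark lines:

```python
import math


def gfn_digits(b: int, n: int) -> int:
    """Decimal digit count of b**n + 1, computed without building the number.

    Uses floating-point log10, which is accurate enough for the benchmark
    values but could misjudge a product that lands extremely close to an integer.
    """
    return math.floor(n * math.log10(b)) + 1


print(gfn_digits(2199064, 8192))      # benchmark line reports 51956 digits
print(gfn_digits(360204, 4194304))    # benchmark line reports 23305854 digits
```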
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 843 ID: 164101 Credit: 306,532,551 RAC: 5,333

|
EDIT: Of course, the new app is faster in the ranges that we can still do GPU work on. :)
I just installed the latest version of NVidia driver (the 340.62 which is included with Cuda 6.5) and genefer is slower in large ranges! :o(
On a GeForce GT 740M:
Generalized Fermat Number Bench OCL 3.2.5-dev (driver 335.23)
2199064^8192+1 Time: 243 us/mul. Err: 0.1836 51956 digits
1798620^16384+1 Time: 251 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 260 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 448 us/mul. Err: 0.1797 398482 digits
984108^131072+1 Time: 892 us/mul. Err: 0.1738 785521 digits
804904^262144+1 Time: 1.76 ms/mul. Err: 0.1821 1548156 digits
658332^524288+1 Time: 3.55 ms/mul. Err: 0.1719 3050541 digits
538452^1048576+1 Time: 7.3 ms/mul. Err: 0.1719 6009544 digits
440400^2097152+1 Time: 15.2 ms/mul. Err: 0.1719 11836006 digits
360204^4194304+1 Time: 31.6 ms/mul. Err: 0.1680 23305854 digits
294612^8388608+1 Time: 66.5 ms/mul. Err: 0.1563 45879398 digits
Genefer Mark = 13.
Generalized Fermat Number Bench OCL 3.2.5-dev (driver 340.62)
2199064^8192+1 Time: 70.6 us/mul. Err: 0.1836 51956 digits
1798620^16384+1 Time: 122 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 225 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 446 us/mul. Err: 0.1797 398482 digits
984108^131072+1 Time: 946 us/mul. Err: 0.1738 785521 digits
804904^262144+1 Time: 1.77 ms/mul. Err: 0.1821 1548156 digits
658332^524288+1 Time: 3.6 ms/mul. Err: 0.1719 3050541 digits
538452^1048576+1 Time: 7.51 ms/mul. Err: 0.1719 6009544 digits
440400^2097152+1 Time: 15.6 ms/mul. Err: 0.1719 11836006 digits
360204^4194304+1 Time: 33 ms/mul. Err: 0.1680 23305854 digits
294612^8388608+1 Time: 72.8 ms/mul. Err: 0.1563 45879398 digits
Genefer Mark = 13.
I'm trying to understand why... | |
|
pschoefer Volunteer developer Volunteer tester
 Send message
Joined: 20 Sep 05 Posts: 686 ID: 845 Credit: 3,014,129,146 RAC: 885,033
                              
|
Benchmark results for R9-280X (Catalyst 14.4, Win7 x64):
geneferocl 3.2.3
Generalized Fermat Number Bench
2199064^8192+1 Time: 48.2 us/mul. Err: 0.1934 51956 digits
1798620^16384+1 Time: 51.6 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 57.4 us/mul. Err: 0.2031 202102 digits
1203210^65536+1 Time: 98.9 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 139 us/mul. Err: 0.1953 785521 digits
804904^262144+1 Time: 288 us/mul. Err: 0.1953 1548156 digits
658332^524288+1 Time: 488 us/mul. Err: 0.1875 3050541 digits
538452^1048576+1 Time: 996 us/mul. Err: 0.1953 6009544 digits
440400^2097152+1 Time: 1.91 ms/mul. Err: 0.1641 11836006 digits
360204^4194304+1 Time: 3.83 ms/mul. Err: 0.1719 23305854 digits
294612^8388608+1 Time: 8.13 ms/mul. Err: 0.1641 45879398 digits
Genefer Mark = 105.
geneferocl 3.2.5-dev
Generalized Fermat Number Bench
2199064^8192+1 Time: 48.2 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 52.5 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 58 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 98.9 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 142 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 283 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 479 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 918 us/mul. Err: 0.1758 6009544 digits
440400^2097152+1 Time: 1.88 ms/mul. Err: 0.1680 11836006 digits
360204^4194304+1 Time: 3.83 ms/mul. Err: 0.1563 23305854 digits
294612^8388608+1 Time: 7.81 ms/mul. Err: 0.1719 45879398 digits
Genefer Mark = 108.
____________
| |
|
Roger Volunteer developer Volunteer tester
 Send message
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
                    
|
AMD HD7970GHz GPU, AMD X6 1100T CPU, Windows 7 64 bit.
Benchies show an average 7% speedup for the high N's we care about.
N=1048576 even shows a 10.5% speedup:
>primegrid_genefer_3_2_2_0_3.01_windows_intelx86__atiGFN.exe -b
geneferocl 3.2.2 (Windows/OpenCL/32-bit)
Copyright 2001-2014, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: primegrid_genefer_3_2_2_0_3.01_windows_intelx86__atiGFN.exe -b
Priority change succeeded.
Running on platform 'AMD Accelerated Parallel Processing', device 'Tahiti', version 'OpenCL 1.2 AMD-APP (1445.5)' and driver '1445.5 (VM)'.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL"
2199064^8192+1 Time: 55.3 us/mul. Err: 0.1934 51956 digits
1798620^16384+1 Time: 55.5 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 62 us/mul. Err: 0.2031 202102 digits
1203210^65536+1 Time: 90.8 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 136 us/mul. Err: 0.1953 785521 digits
804904^262144+1 Time: 282 us/mul. Err: 0.1953 1548156 digits
658332^524288+1 Time: 518 us/mul. Err: 0.1875 3050541 digits
538452^1048576+1 Time: 1.04 ms/mul. Err: 0.1953 6009544 digits
440400^2097152+1 Time: 2.01 ms/mul. Err: 0.1719 11836006 digits
360204^4194304+1 Time: 4 ms/mul. Err: 0.1719 23305854 digits
294612^8388608+1 Time: 8.28 ms/mul. Err: 0.1641 45879398 digits
Genefer Mark = 101.
Priority change succeeded.
>geneferocl-windows_rev747.exe -b
geneferocl 3.2.5-dev (Windows/OpenCL/32-bit)
Copyright 2001-2014, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: geneferocl-windows_rev747.exe -b
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "OCL"
Running on platform 'AMD Accelerated Parallel Processing', device 'Tahiti', version 'OpenCL 1.2 AMD-APP (1445.5)' and driver '1445.5 (VM)'.
2199064^8192+1 Time: 55.1 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 54.5 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 59.6 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 88.9 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 132 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 272 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 494 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 930 us/mul. Err: 0.1758 6009544 digits
440400^2097152+1 Time: 1.88 ms/mul. Err: 0.1719 11836006 digits
360204^4194304+1 Time: 3.78 ms/mul. Err: 0.1621 23305854 digits
294612^8388608+1 Time: 7.69 ms/mul. Err: 0.1621 45879398 digits
Genefer Mark = 109.
Priority change succeeded.
Limits tests:
geneferocl 3.2.2 (Windows/OpenCL/32-bit)
...
The upper bound m = 524288, b = 895000, Err = 0.2813
The upper bound m = 1048576, b = 750000, Err = 0.3086
The upper bound m = 2097152, b = 600000, Err = 0.2813
The upper bound m = 4194304, b = 505000, Err = 0.2896
The upper bound m = 8388608, b = 405000, Err = 0.3125
geneferocl 3.2.5-dev (Windows/OpenCL/32-bit)
...
The upper bound m = 524288, b = 905000, Err = 0.2969
The upper bound m = 1048576, b = 750000, Err = 0.2969
The upper bound m = 2097152, b = 625000, Err = 0.3086
The upper bound m = 4194304, b = 505000, Err = 0.3047
The upper bound m = 8388608, b = 405000, Err = 0.2813
Some improvement in the limits for N=524288 and N=2097152.
All in all 3.2.5 is an improvement over 3.2.2 for my PC | |
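The "average 7% speedup" figure can be checked directly from the two -b outputs above. The sketch below parses the benchmark lines and reports the reduction in time per multiplication for N >= 524288. The helper names are hypothetical, and "speedup" is taken here to mean (old - new) / old, which is an assumption about how the figure was computed.

```python
import re

# Matches e.g. "538452^1048576+1 Time: 1.04 ms/mul. Err: 0.1953 6009544 digits"
LINE = re.compile(r"\^(\d+)\+1\s+Time:\s+([\d.]+)\s+(us|ms)/mul")


def parse_bench(text):
    """Map N -> time per multiplication in microseconds."""
    times = {}
    for m in LINE.finditer(text):
        n, value, unit = int(m.group(1)), float(m.group(2)), m.group(3)
        times[n] = value * (1000.0 if unit == "ms" else 1.0)
    return times


def speedups(old, new, min_n=0):
    """Percentage time reduction per N, for sizes present in both runs."""
    return {n: 100.0 * (old[n] - new[n]) / old[n]
            for n in sorted(old) if n in new and n >= min_n}


# Lines copied from the 3.2.2 and 3.2.5-dev benchmark output in this post.
old_322 = """
658332^524288+1 Time: 518 us/mul. Err: 0.1875 3050541 digits
538452^1048576+1 Time: 1.04 ms/mul. Err: 0.1953 6009544 digits
440400^2097152+1 Time: 2.01 ms/mul. Err: 0.1719 11836006 digits
360204^4194304+1 Time: 4 ms/mul. Err: 0.1719 23305854 digits
294612^8388608+1 Time: 8.28 ms/mul. Err: 0.1641 45879398 digits
"""
new_325 = """
658332^524288+1 Time: 494 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 930 us/mul. Err: 0.1758 6009544 digits
440400^2097152+1 Time: 1.88 ms/mul. Err: 0.1719 11836006 digits
360204^4194304+1 Time: 3.78 ms/mul. Err: 0.1621 23305854 digits
294612^8388608+1 Time: 7.69 ms/mul. Err: 0.1621 45879398 digits
"""

gains = speedups(parse_bench(old_322), parse_bench(new_325), min_n=524288)
for n, pct in gains.items():
    print(f"N={n}: {pct:.1f}% less time per multiplication")
average = sum(gains.values()) / len(gains)
print(f"average: {average:.1f}%")
```

Under that definition the five large sizes average a little under 7%, with N=1048576 showing the largest gain, which is consistent with the figures quoted in the post.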
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 843 ID: 164101 Credit: 306,532,551 RAC: 5,333

|
Could you try AMD_OCL_BUILD_OPTIONS=-cl-opt-disable
http://www.primegrid.com/forum_thread.php?id=5255&nowrap=true#78922
on ATI cards?
If it is faster on all ATI GPUs, it will become the program's default.
Thanks. | |
|
Roger Volunteer developer Volunteer tester
 Send message
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
                    
|
Could you try AMD_OCL_BUILD_OPTIONS=-cl-opt-disable
http://www.primegrid.com/forum_thread.php?id=5255&nowrap=true#78922
on ATI cards?
If it is faster on all ATI GPUs, it will become the program's default.
Thanks.
You can check if this environment variable is set with the command:
>echo %AMD_OCL_BUILD_OPTIONS%
If it just comes back with "%AMD_OCL_BUILD_OPTIONS%" then it's not set.
Maybe this is a compile time only option. Seems to make no difference to my system.
I tried "-cl-opt-disable", "-g -O0" and "-O0". | |
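For anyone scripting these comparisons, the same check can be done from Python, along with building an environment in which only the launched program sees the variable. This is a minimal sketch: the helper names are hypothetical, and it assumes the OpenCL runtime reads AMD_OCL_BUILD_OPTIONS from the process environment when the application starts.

```python
import os

VAR = "AMD_OCL_BUILD_OPTIONS"


def describe(env=os.environ):
    """Report whether the variable is set in the given environment mapping."""
    value = env.get(VAR)
    return f"{VAR} is not set" if value is None else f"{VAR}={value}"


def env_with_option(option, base=None):
    """Return a copy of the environment with the variable set to `option`."""
    env = dict(os.environ if base is None else base)
    env[VAR] = option
    return env


print(describe())
child_env = env_with_option("-cl-opt-disable")
print(describe(child_env))
# child_env could then be passed to subprocess.run([...], env=child_env)
# so the benchmark binary sees the option without changing the whole session.
```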
|
pschoefer Volunteer developer Volunteer tester
 Send message
Joined: 20 Sep 05 Posts: 686 ID: 845 Credit: 3,014,129,146 RAC: 885,033
                              
|
AMD_OCL_BUILD_OPTIONS=-cl-opt-disable is slower for me:
Generalized Fermat Number Bench
2199064^8192+1 Time: 61.3 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 67.7 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 75.6 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 98.1 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 168 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 311 us/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 482 us/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 1.04 ms/mul. Err: 0.1758 6009544 digits
440400^2097152+1 Time: 1.81 ms/mul. Err: 0.1680 11836006 digits
360204^4194304+1 Time: 4.03 ms/mul. Err: 0.1563 23305854 digits
294612^8388608+1 Time: 8.16 ms/mul. Err: 0.1719 45879398 digits
Genefer Mark = 104.
____________
| |
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
Hi all,
There are some binaries of the new 3.2.5-dev version available on assembla:
Mac: https://www.assembla.com/code/genefer/subversion/nodes/746/trunk/bin/mac
Windows: https://www.assembla.com/code/genefer/subversion/nodes/746/trunk/bin/windows
Linux: coming soon
This version includes an improved OpenCL transform, which should be up to 5% faster on some cards, and fixes the bug that caused very high MaxErr on Mac. Optimised transforms for AMD CPUs (FMA4 and AVX) are also included. For Intel and pre-Bulldozer AMD CPUs the code will automatically select the fastest transform without having to run benchmarks.
Please go ahead and test the new versions (standalone, on PRPnet or BOINC with app_info.xml) and let us know how it goes.
Thanks
- Iain
The Linux apps are ready:
https://www.assembla.com/code/genefer/subversion/nodes/751/trunk/bin/linux
____________
Best wishes. Knowledge is power. by jjwhalen
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2417 ID: 1178 Credit: 20,073,832,871 RAC: 21,734,063
                                                
|
Verified faster times on different GTX 645s in Win 7 environment using slightly different driver (332.21).
However, using same driver and Win 7 environment, the new app is actually slower or no different on an OEM GTX 660 (only 524288 and 4194304 tests are slightly faster).
| |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 843 ID: 164101 Credit: 306,532,551 RAC: 5,333

|
Could you try AMD_OCL_BUILD_OPTIONS=-cl-opt-disable
http://www.primegrid.com/forum_thread.php?id=5255&nowrap=true#78922
on ATI cards?
Maybe this is a compile time only option. Seems to make no difference to my system.
I tried "-cl-opt-disable", "-g -O0" and "-O0".
It is slightly slower on Tahiti; the optimizer bug affects only Pitcairn.
AMD_OCL_BUILD_OPTIONS is a runtime option: on my R9 270, I used the same binary and just set the environment variable.
On this computer a GFN-Short test was done in about 42000 sec. I set AMD_OCL_BUILD_OPTIONS to -cl-opt-disable and now it is done in about 37000 sec (with BOINC 3.03 version).
Thanks. | |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 843 ID: 164101 Credit: 306,532,551 RAC: 5,333

|
Verified faster times on different GTX 645s in Win 7 environment using slightly different driver (332.21).
However, using same driver and Win 7 environment, the new app is actually slower or no different on an OEM GTX 660 (only 524288 and 4194304 tests are slightly faster).
Faster on GK106, slower on GK104... GPU design is a mystery.
| |
|
|
Hi all,
A new set of binaries has just been posted that should address all the problems reported in this thread:
Mac: https://www.assembla.com/code/genefer/subversion/nodes/763/trunk/bin/mac
Windows: https://www.assembla.com/code/genefer/subversion/nodes/763/trunk/bin/windows
Linux: https://www.assembla.com/code/genefer/subversion/nodes/763/trunk/bin/linux
It also contains updates to the FMA transform code which should be faster, but which on at least some processors has been shown to be slower. If anyone has an FMA-capable CPU, please try it out and report your results here. Bug reports, suggestions for improvement etc. are all welcome as usual.
Cheers
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
Mac: https://www.assembla.com/code/genefer/subversion/nodes/763/trunk/bin/mac
Windows: https://www.assembla.com/code/genefer/subversion/nodes/763/trunk/bin/windows
Linux: https://www.assembla.com/code/genefer/subversion/nodes/763/trunk/bin/linux
Anyone had a chance to test yet?
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1963 ID: 352 Credit: 6,410,580,365 RAC: 2,769,531
                                      
|
Anyone had a chance to test yet?
Windows 7 x64, i5-4670
First test is genefer from PRPNet client, second one is latest download from assembla.
c:\temp>genefer64.exe -b -x fma
genefer 3.2.1-0 (Windows/CPU/64-bit)
Supported transform implementations: default x87 sse2 sse4 avx fma
Copyright 2001-2014, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: genefer64.exe -b -x fma
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "FMA"
6008024^256+1 Time: 0 us/mul. Err: 0.1250 1736 digits
4913974^512+1 Time: 0 us/mul. Err: 0.1250 3427 digits
4019150^1024+1 Time: 14.6 us/mul. Err: 0.1250 6763 digits
3287270^2048+1 Time: 0 us/mul. Err: 0.1406 13347 digits
2688666^4096+1 Time: 11.5 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 22.8 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 48.5 us/mul. Err: 0.1484 102481 digits
1471094^32768+1 Time: 109 us/mul. Err: 0.1406 202102 digits
1203210^65536+1 Time: 225 us/mul. Err: 0.1484 398482 digits
984108^131072+1 Time: 495 us/mul. Err: 0.1406 785521 digits
804904^262144+1 Time: 1.02 ms/mul. Err: 0.1445 1548156 digits
658332^524288+1 Time: 2.31 ms/mul. Err: 0.1406 3050541 digits
538452^1048576+1 Time: 5.05 ms/mul. Err: 0.1328 6009544 digits
440400^2097152+1 Time: 11.6 ms/mul. Err: 0.1318 11836006 digits
360204^4194304+1 Time: 23.6 ms/mul. Err: 0.1250 23305854 digits
294612^8388608+1 Time: 59 ms/mul. Err: 0.1250 45879398 digits
Genefer Mark = 17.
Priority change succeeded.
c:\temp>genefer_windows64.exe -b -x fma3
genefer 3.2.5-dev (Windows/CPU/64-bit)
Supported transform implementations: fma3 avx-intel sse4 sse2 default x87
Copyright 2001-2014, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: genefer_windows64.exe -b -x fma3
Priority change succeeded.
Priority change succeeded.
Generalized Fermat Number Bench
Running benchmarks for transform implementation "FMA3"
6008024^256+1 Time: 0.735 us/mul. Err: 0.1484 1736 digits
4913974^512+1 Time: 1.22 us/mul. Err: 0.1562 3427 digits
4019150^1024+1 Time: 2.46 us/mul. Err: 0.1602 6763 digits
3287270^2048+1 Time: 5.13 us/mul. Err: 0.1406 13347 digits
2688666^4096+1 Time: 10.4 us/mul. Err: 0.1406 26336 digits
2199064^8192+1 Time: 23 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 46.2 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 106 us/mul. Err: 0.1641 202102 digits
1203210^65536+1 Time: 214 us/mul. Err: 0.1523 398482 digits
984108^131072+1 Time: 490 us/mul. Err: 0.1484 785521 digits
804904^262144+1 Time: 987 us/mul. Err: 0.1562 1548156 digits
658332^524288+1 Time: 2.28 ms/mul. Err: 0.1445 3050541 digits
538452^1048576+1 Time: 4.93 ms/mul. Err: 0.1406 6009544 digits
440400^2097152+1 Time: 11.4 ms/mul. Err: 0.1328 11836006 digits
360204^4194304+1 Time: 23.2 ms/mul. Err: 0.1375 23305854 digits
294612^8388608+1 Time: 57.6 ms/mul. Err: 0.1289 45879398 digits
Genefer Mark = 18.
Priority change succeeded.
____________
My stats | |
|
|
Intel AVX i5 3230m@2.60GHz/socket989/128bit/dualchannel1600MHz
Generalized Fermat Number Bench
6008024^256+1 Time: 1.1 us/mul. Err: 0.1406 1736 digits
4913974^512+1 Time: 1.78 us/mul. Err: 0.1250 3427 digits
4019150^1024+1 Time: 3.8 us/mul. Err: 0.1250 6763 digits
3287270^2048+1 Time: 7.5 us/mul. Err: 0.1406 13347 digits
2688666^4096+1 Time: 15.3 us/mul. Err: 0.1562 26336 digits
2199064^8192+1 Time: 33.5 us/mul. Err: 0.1562 51956 digits
1798620^16384+1 Time: 70.9 us/mul. Err: 0.1562 102481 digits
1471094^32768+1 Time: 154 us/mul. Err: 0.1641 202102 digits
1203210^65536+1 Time: 324 us/mul. Err: 0.1562 398482 digits
984108^131072+1 Time: 715 us/mul. Err: 0.1484 785521 digits
804904^262144+1 Time: 1.52 ms/mul. Err: 0.1484 1548156 digits
658332^524288+1 Time: 3.54 ms/mul. Err: 0.1406 3050541 digits
538452^1048576+1 Time: 7.34 ms/mul. Err: 0.1367 6009544 digits
440400^2097152+1 Time: 17.3 ms/mul. Err: 0.1406 11836006 digits
360204^4194304+1 Time: 35.6 ms/mul. Err: 0.1406 23305854 digits
294612^8388608+1 Time: 94.7 ms/mul. Err: 0.1328 45879398 digits
Genefer Mark = 11.
NVidia GT650m@835MHz/1000MHz(4000MHz)128bitDDR5memory/Driver343.98
Generalized Fermat Number Bench
2199064^8192+1 Time: 72.5 us/mul. Err: 0.1875 51956 digits
1798620^16384+1 Time: 122 us/mul. Err: 0.1875 102481 digits
1471094^32768+1 Time: 219 us/mul. Err: 0.1875 202102 digits
1203210^65536+1 Time: 427 us/mul. Err: 0.1875 398482 digits
984108^131072+1 Time: 916 us/mul. Err: 0.1797 785521 digits
804904^262144+1 Time: 1.72 ms/mul. Err: 0.1719 1548156 digits
658332^524288+1 Time: 3.51 ms/mul. Err: 0.1797 3050541 digits
538452^1048576+1 Time: 7.39 ms/mul. Err: 0.1738 6009544 digits
440400^2097152+1 Time: 15.4 ms/mul. Err: 0.1777 11836006 digits
360204^4194304+1 Time: 32.2 ms/mul. Err: 0.1592 23305854 digits
294612^8388608+1 Time: 70.8 ms/mul. Err: 0.1797 45879398 digits
Genefer Mark = 13.
-------Intel3230m B-limits
Generalized Fermat Number b Limits
The upper bound m = 256, b = 7670000, Err = 0.2812
Starting b = 9470000, Err b = 7675000, Err = 0.3125, 5 Err b = 0
The upper bound m = 512, b = 6910000, Err = 0.2812
Starting b = 7690000, Err b = 6915000, Err = 0.3125, 5 Err b = 0
The upper bound m = 1024, b = 5815000, Err = 0.2812
Starting b = 6250000, Err b = 5820000, Err = 0.3125, 5 Err b = 0
The upper bound m = 2048, b = 4680000, Err = 0.2812
Starting b = 5070000, Err b = 4685000, Err = 0.3125, 5 Err b = 0
The upper bound m = 4096, b = 3835000, Err = 0.2812
Starting b = 4120000, Err b = 3840000, Err = 0.3125, 5 Err b = 0
The upper bound m = 8192, b = 3185000, Err = 0.2812
Starting b = 3340000, Err b = 3190000, Err = 0.3125, 5 Err b = 0
The upper bound m = 16384, b = 2640000, Err = 0.2969
Starting b = 2720000, Err b = 2645000, Err = 0.3125, 5 Err b = 0
The upper bound m = 32768, b = 2165000, Err = 0.2969
Starting b = 2200000, Err b = 2170000, Err = 0.3125, 5 Err b = 0
The upper bound m = 65536, b = 1785000, Err = 0.2812
Starting b = 1790000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 131072, b = 1445000, Err = 0.2969
Starting b = 1450000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 262144, b = 1175000, Err = 0.2969
Starting b = 1180000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 524288, b = 955000, Err = 0.2656
Starting b = 960000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 1048576, b = 775000, Err = 0.2812
Starting b = 780000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 2097152, b = 625000, Err = 0.2812
Starting b = 630000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 4194304, b = 505000, Err = 0.2500
Starting b = 510000, Err b = 0, Err = 0.0000, 5 Err b = 0
-------GT650m B-limits
Generalized Fermat Number b Limits
The upper bound m = 8192, b = 2920000, Err = 0.2969
Starting b = 3340000, Err b = 2925000, Err = 0.3125, 5 Err b = 0
The upper bound m = 16384, b = 2435000, Err = 0.2969
Starting b = 2720000, Err b = 2440000, Err = 0.3125, 5 Err b = 0
The upper bound m = 32768, b = 1975000, Err = 0.2813
Starting b = 2200000, Err b = 1980000, Err = 0.3164, 5 Err b = 0
The upper bound m = 65536, b = 1645000, Err = 0.2842
Starting b = 1790000, Err b = 1650000, Err = 0.3125, 5 Err b = 0
The upper bound m = 131072, b = 1335000, Err = 0.2969
Starting b = 1450000, Err b = 1340000, Err = 0.3203, 5 Err b = 0
The upper bound m = 262144, b = 1090000, Err = 0.2969
Starting b = 1180000, Err b = 1095000, Err = 0.3281, 5 Err b = 0
The upper bound m = 524288, b = 925000, Err = 0.3008
Starting b = 960000, Err b = 930000, Err = 0.3281, 5 Err b = 0
The upper bound m = 1048576, b = 750000, Err = 0.2969
Starting b = 780000, Err b = 755000, Err = 0.3125, 5 Err b = 0
The upper bound m = 2097152, b = 620000, Err = 0.3042
Starting b = 630000, Err b = 625000, Err = 0.3125, 5 Err b = 0
The upper bound m = 4194304, b = 505000, Err = 0.2813
Starting b = 510000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 8388608, b = 405000, Err = 0.2891
Starting b = 410000, Err b = 0, Err = 0.0000, 5 Err b = 0 | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 478,633,013 RAC: 374,582
|
The Mac and Windows CPU versions of 3.2.5 are live as BOINC version 3.05. Please let me know if you encounter any problems.
The OpenCL versions for Mac and Windows were briefly online, but at least with the Windows versions there seems to have been a problem, so if your computer downloaded GFN or GFN-WR version 3.05 for your GPU, it may have crashed with a computation error. Right now, we're back to using 3.04 or 3.03 of the OpenCL apps, so any new tasks you download should work. Sorry for any inconvenience that may have caused.
Linux apps will be coming soon. CUDA is unchanged.
3.2.5 contains the following enhancements:
OpenCL:
Improved OpenCL b limit (on Mac) and performance.
CPU:
Optimised FMA4 and AVX transforms for AMD CPUs.
Transform auto-selection for Intel and some AMD CPUs.
Faster FMA transform.
____________
My lucky number is 75898^524288+1 | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 478,633,013 RAC: 374,582
|
The Windows and Mac OpenCL (both ATI and Nvidia) apps are now online. They are BOINC version 3.05 for Mac and 3.06 for Windows. Let me know if you experience any difficulties.
The Linux apps are in the pipeline and coming soon. ("Soon" means hours or days. If I had a better estimate, I'd give it to you.)
____________
My lucky number is 75898^524288+1 | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 478,633,013 RAC: 374,582
|
The new Linux apps are now online. Let me know if there are any difficulties.
____________
My lucky number is 75898^524288+1 | |
|
|
I'm encountering a behavior with the GFN-WR app 3.2.5 (3.06 under Windows 8.0 x64) after shutting down BOINC and restarting. The app won't run at full load on my graphics card, normally at 98%, unless I wait about an hour before restarting BOINC. There are no overheating issues with the card and there is never anything in the stderr.txt file to shed any light on what is happening. Here's a sample of the latest text in the file, although it's truncated because it's just more of the same preceding what I've posted during each shutdown and restart of a given task.
Terminating because BOINC client requested that we should quit.
geneferocl 3.2.5 (Windows/OpenCL/32-bit)
Copyright 2001-2014, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: projects/www.primegrid.com/primegrid_genefer_3_2_5_0_3.06_windows_intelx86__OCLcudaGFNWR.exe -boinc -q 35556^4194304+1 --device 0
Priority change succeeded.
Using OCL transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX TITAN Black', version 'OpenCL 1.1 CUDA' and driver '344.75'.
Resuming 35556^4194304+1 from a checkpoint (5372535 iterations left)
Estimated time remaining for 35556^4194304+1 is 4:27:24 | |
|
|
I'm encountering a behavior with the GFN-WR app 3.2.5 (3.06 under Windows 8.0 x64) after shutting down BOINC and restarting. The app won't run at full load on my graphics card, normally at 98%, unless I wait about an hour before restarting BOINC. There are no overheating issues with the card and there is never anything in the stderr.txt file to shed any light on what is happening. Here's a sample of the latest text in the file, although it's truncated because it's just more of the same preceding what I've posted during each shutdown and restart of a given task.
That is very strange. When BOINC shuts down it should kill the geneferocl process - can you check that this is what happens e.g. by eyeballing the list of processes in the task manager? Or better still check that the modified timestamps of the files in your BOINC slot directory are no later than the time that you shut down BOINC.
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
That is very strange. When BOINC shuts down it should kill the geneferocl process - can you check that this is what happens e.g. by eyeballing the list of processes in the task manager? Or better still check that the modified timestamps of the files in your BOINC slot directory are no later than the time that you shut down BOINC.
- Iain
...and of course after making sure that it was a repeatable behavior before posting about it, I now cannot reproduce it!
Thanks for your reply and suggestion. It has happened on occasion before so the next time it occurs I will check the task manager for a still-running process and the timestamps of the files in the slot directory at that time.
EDIT: When it occurred this last time, I even went so far as to restart Windows twice and each time that odd behavior was the same (load on GPU varying from about 85% to 95% every second or two). Right now this new task is running normally with a steady 98% GPU load. | |
|
composite Volunteer tester
Joined: 16 Feb 10 Posts: 1172 ID: 55391 Credit: 1,213,267,175 RAC: 1,197,593
|
I found a memory handling bug in genefer 3.2.5, in Linux using the valgrind memcheck tool.
backend->SetA uses alen, the number of bytes of data read from the checkpoint file, as the number of elements for the new integer array na. When SetA copies the checkpoint data to the new array (correctly) using memcpy with alen as the number of bytes, it leaves the last 3/4 of na untouched (and uninitialized).
Later Norm tries to shrink the array, testing from the end of the array for elements that are zero. There is a good chance that this fails to reduce the size properly because there is no guarantee that new integer[] performed by SetA zeroes the memory that it allocates.
To fix the problem, if allocating 4 times the number of elements that are read from the checkpoint file is intentional, then array na should be cleared in SetA before memcpy is used. Otherwise array na should be created with the proper number of elements and clearing is unnecessary.
Without examining the code in detail to determine the intent (I suspect it's the latter), I just verified that the safe approach of clearing the na array in SetA, in every backend, fixes the problem according to valgrind memcheck.
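The mix-up described above can be reproduced in a few lines. This is only a sketch under stated assumptions: `Integer`, `set_a_buggy`, `set_a_fixed` and `norm` are hypothetical stand-ins, not genefer's actual code, and the `junk` parameter simulates whatever the allocator happens to hand back for the untouched part of the array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for genefer's integer type: a buffer of 32-bit
// elements plus a length counted in elements.
struct Integer {
    std::vector<std::uint32_t> a;
    std::size_t len = 0;
};

// Buggy variant: the byte count read from the checkpoint is used as an
// element count. `junk` simulates the uninitialized content of the new array.
Integer set_a_buggy(const void* data, std::size_t size_bytes, std::uint32_t junk) {
    Integer n;
    n.a.assign(size_bytes, junk);               // 4x too many elements
    std::memcpy(n.a.data(), data, size_bytes);  // only the first quarter is overwritten
    n.len = size_bytes;                         // length is now really a size
    return n;
}

// Fixed variant: convert bytes to elements before allocating.
Integer set_a_fixed(const void* data, std::size_t size_bytes) {
    assert(size_bytes % sizeof(std::uint32_t) == 0);
    Integer n;
    n.a.assign(size_bytes / sizeof(std::uint32_t), 0u);
    std::memcpy(n.a.data(), data, size_bytes);
    n.len = n.a.size();
    return n;
}

// Norm-like shrink: drop zero elements from the end of the array.
void norm(Integer& n) {
    while (n.len > 1 && n.a[n.len - 1] == 0) --n.len;
}
```

With four saved elements (16 bytes), the buggy variant followed by `norm` ends up with length 4 when the junk is zero and stays at length 16 when the junk is non-zero. The fixed variant gives length 4 in both cases.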
| |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 843 ID: 164101 Credit: 306,532,551 RAC: 5,333
|
I found a memory handling bug in genefer 3.2.5, in Linux using the valgrind memcheck tool. [...]
You're right and it's a major bug!
The size (number of bytes) of the integer "a" is written to the checkpoint file, but when a checkpoint is restored, this value is used as a length (number of integers).
I will rewrite this part of the code without mixing size and length types.
Thanks, Yves | |
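One possible way to keep sizes and lengths from being mixed is to give them distinct types, so that passing one where the other is expected fails to compile. This is a sketch only, not the fix that was committed: `ByteSize`, `ElemCount`, `to_elements` and `to_bytes` are hypothetical names, and 32-bit elements are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Distinct wrapper types: a ByteSize cannot be passed where an ElemCount
// is expected without an explicit conversion.
struct ByteSize  { std::size_t value; };
struct ElemCount { std::size_t value; };

// Bytes -> number of 32-bit elements.
inline ElemCount to_elements(ByteSize s) {
    assert(s.value % sizeof(std::uint32_t) == 0);
    return ElemCount{ s.value / sizeof(std::uint32_t) };
}

// Number of 32-bit elements -> bytes.
inline ByteSize to_bytes(ElemCount n) {
    return ByteSize{ n.value * sizeof(std::uint32_t) };
}
```

A restore routine written this way would read a `ByteSize` from the checkpoint file and would have to call `to_elements` before allocating the array.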
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 478,633,013 RAC: 374,582
|
I found a memory handling bug in genefer 3.2.5, in Linux using the valgrind memcheck tool. [...]
You're right and it's a major bug!
The size (number of bytes) of the integer "a" is written to the checkpoint file, but when a checkpoint is restored, this value is used as a length (number of integers).
I will rewrite this part of the code without mixing size and length types.
Thanks, Yves
Does this affect all versions of Genefer (including OCL and CUDA) on all platforms? The backup and restore code should be identical in all of them, but not necessarily the part of the code that allocates the memory, if I remember correctly.
____________
My lucky number is 75898^524288+1 | |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 843 ID: 164101 Credit: 306,532,551 RAC: 5,333
|
Does this affect all versions of Genefer (including OCL and CUDA) on all platforms? The backup and restore code should be identical in all of them, but not necessarily the part of the code that allocates the memory, if I remember correctly.
All versions, including CUDA. You remember correctly: the previous version of the CUDA code was correct (Nlen = alen / sizeof(uint32_t)), but the current version uses the "__USE_CPU_INIT" flag and the common save and restore functions.
I just committed the fix.
Note that because of double-checking, I don't think that a result of a full test can be wrong. If the exponent is not correctly initialized after a restore on one computer, it will take a different value on another computer and the final residue will be different.
Surprisingly, there are no more invalid results with this bug...? | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 478,633,013 RAC: 374,582
|
Does this affect all versions of Genefer (including OCL and CUDA) on all platforms? The backup and restore code should be identical in all of them, but not necessarily the part of the code that allocates the memory, if I remember correctly.
All versions, including CUDA. You remember correctly: the previous version of the CUDA code was correct (Nlen = alen / sizeof(uint32_t)), but the current version uses the "__USE_CPU_INIT" flag and the common save and restore functions.
I just committed the fix.
Note that because of double-checking, I don't think that a result of a full test can be wrong. If the exponent is not correctly initialized after a restore on one computer, it will take a different value on another computer and the final residue will be different.
Surprisingly, there are no more invalid results with this bug...?
We will need to thoroughly test all the different builds and apps before putting them into production, especially since we don't currently seem to have a problem in production.
If I understand the manifestation of the problem correctly, I should get a different residue if I start and resume a genefer test than I would if I ran it straight through without stopping, correct? That's easy enough to test.
____________
My lucky number is 75898^524288+1 | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 478,633,013 RAC: 374,582
|
It seems to checkpoint and restore correctly...
With resuming from a checkpoint:
C:\PRPNet\prpclient-5.3.1-windows\prpclient-1>genefer64 -q "123456^8192+1"
genefer 3.2.5 (Windows/CPU/64-bit)
Supported transform implementations: fma3 avx-intel sse4 sse2 default x87
Copyright 2001-2014, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: genefer64 -q 123456^8192+1
Priority change succeeded.
Testing 123456^8192+1...
Using FMA3 transform
Starting initialization...
Initialization complete (0.002 seconds).
Estimated time remaining for 123456^8192+1 is 0:00:04
Testing 123456^8192+1... 126976 steps to go (0:00:06 remaining)
^C caught. Writing checkpoint.
C:\PRPNet\prpclient-5.3.1-windows\prpclient-1>genefer64 -q "123456^8192+1"
genefer 3.2.5 (Windows/CPU/64-bit)
Supported transform implementations: fma3 avx-intel sse4 sse2 default x87
Copyright 2001-2014, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: genefer64 -q 123456^8192+1
Priority change succeeded.
Testing 123456^8192+1...
Using FMA3 transform
Resuming 123456^8192+1 from a checkpoint (124666 iterations left)
Estimated time remaining for 123456^8192+1 is 0:00:03
123456^8192+1 is composite. (RES=85a840e32f44031b) (41710 digits) (err = 0.0005) (time = 0:00:08) 08:47:12
Without checkpointing:
C:\PRPNet\prpclient-5.3.1-windows\prpclient-4>genefer64 -q "123456^8192+1"
genefer 3.2.5 (Windows/CPU/64-bit)
Supported transform implementations: fma3 avx-intel sse4 sse2 default x87
Copyright 2001-2014, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: genefer64 -q 123456^8192+1
Priority change succeeded.
Testing 123456^8192+1...
Using FMA3 transform
Starting initialization...
Initialization complete (0.002 seconds).
Estimated time remaining for 123456^8192+1 is 0:00:04
123456^8192+1 is composite. (RES=85a840e32f44031b) (41710 digits) (err = 0.0005) (time = 0:00:07) 08:46:26
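When many such runs have to be compared, the residue can be pulled out of the result line mechanically. The sketch below assumes only the "(RES=...)" format visible in the output above; `extract_residue` and `residues_match` are hypothetical helper names and are not part of genefer.

```cpp
#include <string>

// Return the hex residue from a result line such as
// "... is composite. (RES=85a840e32f44031b) (41710 digits) ...",
// or an empty string if the line carries no "(RES=...)" field.
std::string extract_residue(const std::string& line) {
    const std::string tag = "(RES=";
    const std::string::size_type start = line.find(tag);
    if (start == std::string::npos) return "";
    const std::string::size_type begin = start + tag.size();
    const std::string::size_type end = line.find(')', begin);
    if (end == std::string::npos) return "";
    return line.substr(begin, end - begin);
}

// Two result lines agree only if both carry a residue and the residues are equal.
bool residues_match(const std::string& a, const std::string& b) {
    const std::string ra = extract_residue(a);
    const std::string rb = extract_residue(b);
    return !ra.empty() && ra == rb;
}
```

Fed the two result lines above (resumed and straight-through), `residues_match` returns true, since both report RES=85a840e32f44031b.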
____________
My lucky number is 75898^524288+1 | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 478,633,013 RAC: 374,582
|
Looking in the database, I do not see an unusually high number of invalid or inconclusive tasks. We tend to get a lot of those due to GPU overclocking errors, so the numbers are higher than they are for LLR, but the numbers are in line with what I'm used to seeing with GFN.
If this bug is causing errors, it's not causing them very often.
____________
My lucky number is 75898^524288+1 | |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 843 ID: 164101 Credit: 306,532,551 RAC: 5,333
|
It seems to checkpoint and restore correctly... [...]
Looking in the database, I do not see an unusually high number of invalid or inconclusive tasks. We tend to get a lot of those due to GPU overclocking errors, so the numbers are higher than they are for LLR, but the numbers are in line with what I'm used to seeing with GFN.
If this bug is causing errors, it's not causing them very often.
I checked the "last 3/4" that was allocated and not initialized. All these values are zero, on Windows with gcc 4.9.2 or with VS 2013.
I don't see any reason for that but it's lucky and that hides the bug.
We will need to thoroughly test all the different builds and apps before putting them into production, especially since we don't currently seem to have a problem in production.
I agree. How is a new app tested? Are residues compared with the app in production?
A new app should be stopped and restarted at some point during testing if we want to detect this sort of bug. | |
|
|
Yves, thanks for the fix. I think the code is much more self-consistent now.
If I followed things correctly both the read and write checkpoint routines would read/write the size (number of bytes) of a, and then read/write that number of bytes of data, so the data in the checkpoint files should always have been OK.
However, when the data read from the checkpoint is used to initialise the GFN 'Integer' in SetA, then we created an array with too many elements (4x as many), and also the 'len' which is stored is now the size rather than the number of elements. The memcpy is safe as we only copy 'alen' bytes into an array of 'alen*sizeof(unsigned int)' bytes.
So I don't see a correctness issue here, we are just copying around 3x as much junk data as is needed. Also, the problem that valgrind is complaining about is that we do touch uninitialised data. But I don't think that the junk data actually affects anything that goes on in the lower (valid) bytes of the array, right Yves?
If I missed something and there is really a chance of generating incorrect results please let me know!
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 843 ID: 164101 Credit: 306,532,551 RAC: 5,333
|
So I don't see a correctness issue here, we are just copying around 3x as much junk data as is needed. Also, the problem that valgrind is complaining about is that we do touch uninitialised data. But I don't think that the junk data actually affects anything that goes on in the lower (valid) bytes of the array, right Yves?
If I missed something and there is really a chance of generating incorrect results please let me know!
The result can be incorrect:
The len of 'a' was set to a-size = 4 * a-len (and not to a-len; this is the bug).
Then the function "Norm" shrinks 'a' to its correct size only if the last 3/4 of the array is zero.
It is the function "Norm" that reads uninitialised data and may then generate a wrong result.
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 478,633,013 RAC: 374,582
|
I checked the "last 3/4" that was allocated and not initialized. All these values are zero, on Windows with gcc 4.9.2 or with VS 2013.
I don't see any reason for that but it's lucky and that hides the bug.
Possibly not luck. The OS or the API may zero it.
I wonder if Linux/Mac also has zeros or not?
____________
My lucky number is 75898^524288+1 | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 478,633,013 RAC: 374,582
|
I also tested the 3.2.5 app under linux, and it seems to work fine.
Furthermore, I tested both linux and Windows with known PRPs, just in case there's an error in the residue beyond the 16 digits that are displayed. I can't observe any malfunctions.
I can't test the Mac app.
At least under Linux and Windows, it appears that the memory is zeroed out, so the program works despite the error. It probably fails under Valgrind because I imagine Valgrind is initializing the allocated memory to a non-zero pattern.
If my assumptions are correct, then at least on Windows and Linux the app works correctly despite the bug -- unless you're running Valgrind.
____________
My lucky number is 75898^524288+1 | |
|
composite Volunteer tester
Joined: 16 Feb 10 Posts: 1172 ID: 55391 Credit: 1,213,267,175 RAC: 1,197,593
|
At least under Linux and Windows, it appears that the memory is zeroed out, so the program works despite the error. It probably fails under Valgrind because I imagine Valgrind is initializing the allocated memory to a non-zero pattern.
If my assumptions are correct, then at least on Windows and Linux the app works correctly despite the bug -- unless you're running Valgrind.
Correct, to a degree; the O/S's behaviour can't be depended upon for program robustness. The O/S may initially hand the program zeroed memory pages as a security feature, but the program is at its own mercy after it starts to repeatedly allocate, write into, and free memory. You should be able to whip up a short program that demonstrates this to your satisfaction.
What saves the program in a quick test run is that the checkpoint file is read only once. A long-running program may behave differently.
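Reading genuinely uninitialized memory is not something a portable program can do safely, so the sketch below simulates the effect instead. `ToyHeap` is a made-up allocator (an assumption for illustration, not how any real runtime heap is implemented): its backing store starts out zeroed, like fresh pages from the O/S, and it reuses a freed block without clearing it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy allocator: zeroed backing store, freed blocks are recycled as-is.
class ToyHeap {
public:
    explicit ToyHeap(std::size_t words) : store_(words, 0u) {}

    // Hand out n words. Reuse the most recently freed block if it is large
    // enough, otherwise carve fresh (still zeroed) words from the store.
    std::uint32_t* allocate(std::size_t n) {
        if (free_size_ >= n) {
            free_size_ = 0;
            return store_.data() + free_offset_;
        }
        if (next_ + n > store_.size()) return nullptr;
        std::uint32_t* p = store_.data() + next_;
        next_ += n;
        return p;
    }

    // Remember the block for reuse; its contents are left untouched.
    void release(std::uint32_t* p, std::size_t n) {
        free_offset_ = static_cast<std::size_t>(p - store_.data());
        free_size_ = n;
    }

private:
    std::vector<std::uint32_t> store_;
    std::size_t next_ = 0;
    std::size_t free_offset_ = 0;
    std::size_t free_size_ = 0;
};

// True if all n words starting at p are zero.
bool all_zero(const std::uint32_t* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i] != 0) return false;
    }
    return true;
}
```

The first block handed out is all zeros. After writing to it, releasing it and allocating the same size again, the same block comes back with the old data still in it, which is the situation a long-running program can end up in.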
| |
|
|
According to my tests on Mac, the new 'na' array always contains zeros. However, as has already been correctly stated, this is not guaranteed, it only happens to be the case because this is likely to be new memory which has never before been allocated by the program. Since we only call read_checkpoint / SetA once (or not at all, for a first time run), it is unlikely but not impossible that we would get non-zeroed memory. This likely explains why we never saw the bug manifest itself (except when valgrind initialises directly with non-zero values).
I don't know how confident to be that the bug never occurred in practice. In theory, all tests which made a checkpoint restart (not including tests which were 'suspended in memory' in BOINC) are potentially exposed. The bug was introduced when the checkpoint file was modified to include the 'a' array in genefer version 3.1.0 (2013-03-04). Thoughts?
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 478,633,013 RAC: 374,582
|
Thoughts?
- Iain
Bug? Absolutely. It needs to be fixed. Emergency? No.
Two reasons:
1) If the calculation gets corrupted, it's likely to exceed maxErr and error out. We don't see that happening at an accelerated rate. If anything, error rates seem to have gone down.
2) As mentioned before, we're not seeing an unusual rate of inconclusive tests on BOINC. It seems nearly impossible for two computers to both get a corrupted result, not trip maxErr, and somehow get the exact same corrupted result that was caused by random contents of uninitialized memory.
On BOINC, therefore, I'm not worried at all. Everything is doublechecked. The PRPNet searches are more troublesome because we're not double checking -- but since there's no evidence of this ever causing a bad result in real life, I suspect that although the error rate may be greater than 0, it's likely still less than the general error rate (especially on GPUs!!!!). Eventually I want to double check all of the PRPNet GFN searches. I know there's lots of errors in there, so this is something that will be done eventually.
That's why I'm not panicking. It's also why we shouldn't rush this into production. The empirical evidence is that this isn't causing a problem in production. Lucky? Absolutely. I'll take lucky any day.
What would be bad is if we rushed 3.2.6 into production, fixing a bug that was not causing problems, and broke something else that did cause trouble. So let's regression test the new builds, make sure they're working correctly, and then it's a simple step to put them into production. There's a big gap between the January and March challenges, so people won't have the computers dedicated to those and will be free to test.
I'm sure composite will be first in line to test the new linux build of genefer64, which will be very helpful. Getting the linux and Mac builds tested is always problematic.
____________
My lucky number is 75898^524288+1 | |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 843 ID: 164101 Credit: 306,532,551 RAC: 5,333
|
1) If the calculation gets corrupted, it's likely to exceed maxErr and error out. We don't see that happening at an accelerated rate. If anything, error rates seem to have gone down.
I don't think so: the exponent is corrupted, not the computation.
If a corrupted result occurred, 2^x (mod p) was checked rather than 2^(p-1) (mod p).
But I'm not worried at all either, because if the result was also corrupted on another computer then 2^y (mod p) was tested with x != y, and the residues were different because of the random contents of uninitialized memory.
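A toy illustration of this argument, assuming nothing about genefer's internals: `pow_mod` is a generic square-and-multiply routine, p = 257 = 4^4+1 is chosen only because it is small, and the "corrupted" exponents 261 and 259 are invented values.

```cpp
#include <cstdint>

// Square-and-multiply: base^exp mod m. Only valid for moduli small enough
// that (m-1)*(m-1) fits in 64 bits, which is fine for a toy example.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp, std::uint64_t m) {
    std::uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % m;  // multiply when the low bit is set
        base = (base * base) % m;                   // square for the next bit
        exp >>= 1;
    }
    return result;
}
```

Here pow_mod(2, 256, 257) is 1, as expected for the correct exponent p-1, while the two different wrong exponents give pow_mod(2, 261, 257) = 32 and pow_mod(2, 259, 257) = 8. Two hosts that corrupt the exponent differently therefore disagree with each other as well as with the correct result.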
I agree that the 3.2.6 should be extensively tested before being in production.
We can compare a set of residues of some short and WR tasks computed with:
- an old version of geneferCUDA (without "Integer.h", the version that found 75898^524288+1 is a good choice).
- the 3.2.6, FMA3, FMA4, AVX-Intel, AVX-AMD, SSE4, SSE2, OpenCL, CUDA on Windows, Linux and MAC. The programs should be stopped and restarted during the computations. | |
|
composite Volunteer tester
Joined: 16 Feb 10 Posts: 1172 ID: 55391 Credit: 1,213,267,175 RAC: 1,197,593
|
2) As mentioned before, we're not seeing an unusual rate of inconclusive tests on BOINC. It seems nearly impossible for two computers to both get a corrupted result, not trip maxErr, and somehow get the exact same corrupted result that was caused by random contents of uninitialized memory.
Ay, there's the rub: corruption is not random.
When the O/S provides only zeroed memory pages to the program for new memory allocations, then the only source of memory corruption is the program itself, and that's deterministic. Two computers running the same version will have identical corrupted residues. What matters is whether the residue from the fixed version matches the residue from the buggy version.
What is the exact cause of incorrect residues in GPU versions with overclocking? Is it bad computation in GPU processors, bad synchronization as GPU processors read/write GPU RAM, or bad synchronization as the CPU reads/writes GPU RAM? If RAM is implicated, then merely using 4 times the amount increases the risk of non-deterministic corruptions, especially for GPUs with borderline stability that are sensitive to small temperature changes. The effect on error rate of fixing the bug will be minuscule if
1) errors predominantly occur within GPU arithmetic units, or
2) the proportion of GPUs running with borderline stability is small
As for not tripping maxErr, I believe that's a function of the rounding precision of individual elements of the array. You are more likely to trip maxErr when the array is longer. So this affects the b limit, and the bug won't be observable on tests that are orders of magnitude smaller. Has anyone checked the b limit? | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 478,633,013 RAC: 374,582
|
I'm not denying that the potential for errors exists. Show me a single, repeatable scenario (not involving Valgrind or other artificial modifications of the run time environment) where two different runs of Genefer produce identical erroneous results. For that matter, I have yet to see a single instance where this bug has produced an erroneous result. At the same time, there's plenty of evidence that Genefer works correctly most of the time -- and possibly all of the time.
Show me a real failure scenario. Then we'll talk about what kind of remediation is necessary, up to and including temporarily shutting down all GFN testing and retesting prior work.
Unless someone can demonstrate that we have a real problem as opposed to a theoretical problem, the plan is to fix genefer, test the new version, and put it into production as quickly as possible. But not before it's thoroughly tested. Putting insufficiently tested software into production is a mistake that's bitten us several times before. I don't want to trade a theoretical problem for a real problem. The fact that this software has already been in production for a while also affects the decision: If there's a mess we need to clean up, another month won't significantly change the magnitude of the cleanup.
____________
My lucky number is 75898^524288+1 | |
|
composite Volunteer tester
Joined: 16 Feb 10 Posts: 1172 ID: 55391 Credit: 1,213,267,175 RAC: 1,197,593
|
Has anyone checked the b limit?
Answering my own question - we don't have to: the b limit won't change, as the benchmark task is never checkpointed.
On the other hand, sufficiently long normal tests are checkpointed repeatedly. I've tried 8274638^32768+1 which uses all the CPU transforms and I can't get a "real" failure. This is good news.
In one instance of the test, I started with the new version and immediately stopped it before it could switch from SSE4 to SSE2. It wrote a checkpoint file which I copied to another directory that has the old version of genefer. Then I ran the new version in one directory and the old version in the other. Both versions starting from the same checkpoint had identical behaviour and ended with the same residue.
And these are not even exactly the same kind of executables. The old version I used is the prebuilt 3.2.5 downloaded from http://uwin.mine.nu/PRPNet/ and the new one has my changes in Makefile_linux64 for linking to dynamically loaded (.so) shared libraries, including the BOINC libraries installed from the libboinc-app7 Debian package in the Linux Mint repo. The dynamic executable is 21% of the size of the statically linked version (523 KB vs 2.5 MB).
Digressing somewhat... how much would using shared libraries conserve CPU cache used by program code, especially from the system libraries? In the statically linked version, different processes use distinct memory pages for the same program code, and multiple copies of code wastes L3 cache. Better cache utilization improves CPU performance, more so for systems having slower RAM. It's also likelier to keep warm code alive in the L3 cache despite the torrent of matrix data moving through it. That's worth investigating.
Here are the shared libraries used by the dynamically linked version of genefer, with file sizes in KB:
ldd genefer_linux64 | grep "=>" | sed 's/.*=> //;s/ .*//' | xargs ls -1Hs
1704 /lib/x86_64-linux-gnu/libc.so.6
88 /lib/x86_64-linux-gnu/libgcc_s.so.1
1016 /lib/x86_64-linux-gnu/libm.so.6
132 /lib/x86_64-linux-gnu/libpthread.so.0
52 /usr/lib/libboinc_api.so.7
452 /usr/lib/libboinc.so.7
956 /usr/lib/x86_64-linux-gnu/libstdc++.so.6
| |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 843 ID: 164101 Credit: 306,532,551 RAC: 5,333
|
The main problem with shared libraries is that they may not be found on the system.
I often see this kind of error with the linux versions:
http://www.primegrid.com/result.php?resultid=592581943
The versions for Windows use some static versions of the run-time libraries to solve this problem. | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 478,633,013 RAC: 374,582
|
Digressing somewhat... how much would using shared libraries conserve CPU cache used by program code, especially from the system libraries?
Dynamically linked executables, while they might help with cache miss percentages, will hurt the project's overall efficiency. The error rate (part of the so-called "3 second bug") will go up because inevitably there will be more hosts where every task errors out.
We lose tasks. Then we lose participants. The admins lose lots of time helping users fix the problems.
For this reason, it's a non-negotiable requirement that all apps be statically linked. Dynamically linked apps are our kryptonite. They are evil. :) (Unless there's no choice, such as with GPU libraries.)
____________
My lucky number is 75898^524288+1 | |
|
composite Volunteer tester
Joined: 16 Feb 10 Posts: 1172 ID: 55391 Credit: 1,213,267,175 RAC: 1,197,593
|
Those are really good reasons for sticking with statically linked executables for BOINC apps. As said about many things, when you decide to use BOINC to solve a problem, "now you have two problems".
I used to have BOINC problems with GPU libraries too. Mine disappeared when Mike posted 64-bit versions of BOINC apps for Linux. You can see remnants of what didn't work in the 32-bit app:
# ldd /var/lib/boinc-client/projects/www.primegrid.com/primegrid_genefer*i686*
/var/lib/boinc-client/projects/www.primegrid.com/primegrid_genefer_3_2_3_0_3.03_i686-pc-linux-gnu__cudaGFN:
linux-gate.so.1 (0xf76e0000)
libcufft.so.5.5 => not found
libcudart.so.5.5 => not found
libstdc++.so.6 => /usr/lib/i386-linux-gnu/libstdc++.so.6 (0xf75ca000)
libpthread.so.0 => /lib/i386-linux-gnu/i686/cmov/libpthread.so.0 (0xf75af000)
libm.so.6 => /lib/i386-linux-gnu/i686/cmov/libm.so.6 (0xf756c000)
libgcc_s.so.1 => /lib/i386-linux-gnu/libgcc_s.so.1 (0xf7550000)
libc.so.6 => /lib/i386-linux-gnu/i686/cmov/libc.so.6 (0xf73a0000)
/lib/ld-linux.so.2 (0xf76e1000)
And what works in the 64-bit apps:
# ldd /var/lib/boinc-client/projects/www.primegrid.com/primegrid_genefer*x86_64*
/var/lib/boinc-client/projects/www.primegrid.com/primegrid_genefer_3_2_3_0_3.03_x86_64-pc-linux-gnu__cudaGFN:
linux-vdso.so.1 (0x00007fff8c7fe000)
libcufft.so.5.5 => /usr/lib/x86_64-linux-gnu/libcufft.so.5.5 (0x00007f9702bfb000)
libcudart.so.5.5 => /usr/lib/x86_64-linux-gnu/libcudart.so.5.5 (0x00007f97029ae000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f97026aa000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f970248e000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f9702190000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f9701f79000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9701bcd000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f97019c9000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f97017c0000)
/lib64/ld-linux-x86-64.so.2 (0x00007f9707747000)
/var/lib/boinc-client/projects/www.primegrid.com/primegrid_genefer_3_2_5_0_3.05_x86_64-pc-linux-gnu__OCLcudaGFN:
linux-vdso.so.1 (0x00007fff97bfe000)
libOpenCL.so.1 => /usr/lib/x86_64-linux-gnu/libOpenCL.so.1 (0x00007f3ec9622000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f3ec931f000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f3ec9102000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3ec8e04000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f3ec8bee000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3ec8841000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f3ec863d000)
/lib64/ld-linux-x86-64.so.2 (0x00007f3ec9852000)
/var/lib/boinc-client/projects/www.primegrid.com/primegrid_genefer_3_2_5_0_3.05_x86_64-pc-linux-gnu__OCLcudaGFNWR:
linux-vdso.so.1 (0x00007fff64fca000)
libOpenCL.so.1 => /usr/lib/x86_64-linux-gnu/libOpenCL.so.1 (0x00007ff70f5c6000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff70f2c3000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff70f0a6000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff70eda8000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff70eb92000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff70e7e5000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff70e5e1000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff70f7f6000)
Yves and Mike succinctly identified the two problems with dynamic libraries in Linux (DLLs in Windows).
One problem is that most Linux users are not skilled Linux system administrators. They lack knowledge about identifying and solving system configuration problems, the most basic of which is not having required packages installed from a distro's package repositories. In some distros the repo simply doesn't have required packages and manual installation is needed, particularly GPU drivers. GPU driver and O/S upgrades are especially problematic. I've had tense occasions without a working video driver, which finally caused me to buy NVIDIA after being a long-time ATI user, and it also drove me away from Ubuntu to Linux Mint.
The second problem is the bewildering array of Linux distros, each with their own mix of software versions in their package repos. The builder of Linux apps has the unenviable decision of selecting library versions to require, and it usually comes down to whatever is installed now on the build platform. Inevitably this doesn't match some user's configuration and there are task failures, as in the example Yves shows.
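The failure mode in the 32-bit `ldd` output above (entries marked `not found`) can be detected mechanically. The following is a minimal Python sketch, not part of Genefer or BOINC; the function name `missing_libraries` and the `SAMPLE` constant are made up for illustration, and the sample text is copied from the 32-bit output in this post.

```python
import re


def missing_libraries(ldd_output):
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved dependencies look like: "libcufft.so.5.5 => not found"
        match = re.match(r"\s*(\S+)\s+=>\s+not found\s*$", line)
        if match:
            missing.append(match.group(1))
    return missing


# Excerpt of the 32-bit ldd output quoted in the post (illustrative input).
SAMPLE = """\
linux-gate.so.1 (0xf76e0000)
libcufft.so.5.5 => not found
libcudart.so.5.5 => not found
libstdc++.so.6 => /usr/lib/i386-linux-gnu/libstdc++.so.6 (0xf75ca000)
"""

if __name__ == "__main__":
    for name in missing_libraries(SAMPLE):
        print("missing:", name)
```

A non-empty result means the dynamically linked executable cannot start on that host, which is the situation the static-linking requirement is meant to avoid.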
| |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 843 ID: 164101 Credit: 306,532,551 RAC: 5,333
|
You can see remnants of what didn't work in the 32-bit app:
[...] it also drove me away from Ubuntu to Linux Mint.
According to https://developer.nvidia.com/cuda-toolkit-55-archive
the 32-bit version of CUDA is available only on Ubuntu...? | |
|
composite Volunteer tester
Joined: 16 Feb 10 Posts: 1172 ID: 55391 Credit: 1,213,267,175 RAC: 1,197,593
|
I had a pure 64-bit Ubuntu system, and added the 32-bit Linux compatibility libraries because BOINC demanded it (at the time). But that didn't work with the 64-bit video driver installation. I haven't done the housekeeping to clean out the 32-bit packages yet.
EDIT: The PrimeGrid apps demanded it, not BOINC itself. | |
|
composite Volunteer tester
Joined: 16 Feb 10 Posts: 1172 ID: 55391 Credit: 1,213,267,175 RAC: 1,197,593
|
I fetched the update, and I see the x87 transform SetA method remained unchanged. Is that a mistake, or is there a reason for that? | |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 843 ID: 164101 Credit: 306,532,551 RAC: 5,333
|
I fetched the update, and I see the x87 transform SetA method remained unchanged. Is that a mistake, or is there a reason for that?
It's a mistake: the x87 transform is not included in the VS 2013 solution and I forgot it.
The fix is committed. | |
|
composite Volunteer tester
Joined: 16 Feb 10 Posts: 1172 ID: 55391 Credit: 1,213,267,175 RAC: 1,197,593
|
Is __USE_CPU_INIT now the only approved method for 3.2.6? The alternative doesn't work in CUDASetA because now alen is not defined. | |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 843 ID: 164101 Credit: 306,532,551 RAC: 5,333
|
Is __USE_CPU_INIT now the only approved method for 3.2.6? The alternative doesn't work in CUDASetA because now alen is not defined.
It was another mistake.
Sorry :o( Fixed now. | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 478,633,013 RAC: 374,582
|
Where do we stand with this? Should we be doing production builds and starting testing?
____________
My lucky number is 75898524288+1 | |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 843 ID: 164101 Credit: 306,532,551 RAC: 5,333
|
Where do we stand with this? Should we be doing production builds and starting testing?
Yes we can. | |
|
|
I've posted 3.2.6-dev builds on the SVN (waiting for Linux, hopefully not long).
Mac: https://www.assembla.com/code/genefer/subversion/nodes/776/trunk/bin/mac
Windows: https://www.assembla.com/code/genefer/subversion/nodes/776/trunk/bin/windows
Linux: https://www.assembla.com/code/genefer/subversion/nodes/777/trunk/bin/linux
Testing protocol to come shortly.
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
I've downloaded your Linux 3.2.6 executables (rev 777) (cuda 5.5, 6.5, and OCL, all 64-bit) if I can help. LMK. Ubuntu 14.04, on a gtx770.
G | |
|
|
I have posted a google sheet containing details of all the candidates and combinations of hardware that we want to test :
https://docs.google.com/spreadsheets/d/1StsmFX5Vomr5i6OCYxdvIlriD09gS55pKFPXSBTT4oU
There are quite a lot of tests in total, so anyone who is able to help - please visit the google sheet, reserve yourself some tests and start running! Please note the instruction to do at least one checkpoint/restart during every test.
The binaries for testing are available from:
Mac: https://www.assembla.com/code/genefer/subversion/nodes/776/trunk/bin/mac
Windows: https://www.assembla.com/code/genefer/subversion/nodes/776/trunk/bin/windows
Linux: https://www.assembla.com/code/genefer/subversion/nodes/777/trunk/bin/linux
Thanks in advance for your help! Hopefully it won't take too long before we can make the public release - the code should be slightly faster than the previous version.
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1963 ID: 352 Credit: 6,410,580,365 RAC: 2,769,531
|
Just to be sure, is it mandatory to run - for example - sse2 test on sse2 hardware?
____________
My stats | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 478,633,013 RAC: 374,582
|
Just to be sure, is it mandatory to run - for example - sse2 test on sse2 hardware?
No.
____________
My lucky number is 75898524288+1 | |
|
|
Please note, I have also added columns to note whether you have tested using the 32 bit or 64 bit executable. Can those who already ran tests please go back and fill these in. We do not need to run every test with both binaries, but want to cover both on every platform.
Cheers
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
Also added a grid for testing integration with PRPNet and BOINC (with app_info.xml).
Thanks everyone for all your help so far!
Cheers
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
Updated the BOINC/PRPNet testing table to make sure all the apps that we will use are tested.
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
Artist Volunteer tester
Joined: 29 Sep 08 Posts: 88 ID: 29825 Credit: 413,280,050 RAC: 73,185
|
This task has the expected result but the calculation of the remaining time seems to be buggy.
$ ./genefer_linux32 -x sse2 -q 773620^262144+1
genefer 3.2.6-dev (Linux/CPU/32-bit)
Supported transform implementations: fma3 avx-intel sse4 sse2 default x87
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: ./genefer_linux32 -x sse2 -q 773620^262144+1
Priority change succeeded.
Testing 773620^262144+1...
Using SSE2 transform
Resuming 773620^262144+1 from a checkpoint (5049346 iterations left)
Estimated time remaining for 773620^262144+1 is 4:29:39
Testing 773620^262144+1... 3346432 steps to go (0:38:23 remaining)
Testing 773620^262144+1... 3321856 steps to go (0:40:08 remaining)
Testing 773620^262144+1... 3284992 steps to go (0:42:33 remaining)
Testing 773620^262144+1... 3235840 steps to go (0:45:23 remaining)
Testing 773620^262144+1... 3153920 steps to go (0:49:28 remaining)
Testing 773620^262144+1... 3035136 steps to go 2 remaining)
Testing 773620^262144+1... 2916352 steps to go
Testing 773620^262144+1... 2596864 steps to go
Testing 773620^262144+1... 1871872 steps to go (0:11:09 remaining)
Testing 773620^262144+1... 1089536 steps to go 8 remaining)
Testing 773620^262144+1... 888832 steps to go
Testing 773620^262144+1... 503808 steps to go (0:01:05 remaining)
Testing 773620^262144+1... 405504 steps to go (0:01:15 remaining)
773620^262144+1 is a probable prime. (1543643 digits) (err = 0.1875) (time = 4:13:55) 15:30:43 | |
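To see how far off the displayed remaining times are, one can compare them with a simple linear estimate. The sketch below is an illustration only and makes two assumptions that the log does not confirm: that the iteration rate is constant, and that the initial estimate (4:29:39 for 5049346 iterations) reflects that rate. It does not reproduce how Genefer itself computes the figure, and the helper names are hypothetical.

```python
def hms_to_seconds(text):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total):
    """Format a number of seconds as 'H:MM:SS' (rounded to the nearest second)."""
    total = int(round(total))
    return "%d:%02d:%02d" % (total // 3600, total % 3600 // 60, total % 60)


def linear_eta(steps_left, ref_steps, ref_seconds):
    """Remaining seconds, assuming ref_steps iterations take ref_seconds."""
    return steps_left * ref_seconds / ref_steps


# Values taken from the log above.
REF_STEPS = 5049346                       # iterations left when resuming
REF_SECONDS = hms_to_seconds("4:29:39")   # initial estimate for those iterations

expected = linear_eta(3346432, REF_STEPS, REF_SECONDS)

if __name__ == "__main__":
    print("linear estimate at 3346432 steps to go:", seconds_to_hms(expected))
    print("displayed in the log:                   0:38:23")
```

Under these assumptions the linear estimate at 3346432 steps to go is roughly three hours, so the displayed 0:38:23 is several times too small, while the total time reported at the end (4:13:55) is close to the initial estimate.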
|
|
Hi Artist, was that timing problem repeatable? I don't really know what to make of it...
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
We are still in need of testers who have:
CUDA and/or OpenCL devices on Linux
CUDA devices on Mac
If you can help with testing, please post here or PM me. It's not too hard to run the tests, and we can help!
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
I will attempt the Linux CUDA tests this afternoon. I can give the BOINC CUDA/OCL a shot (both on an Nvidia GTX580) if the initial tests work out, but I don't know how to set up an appinfo for Primegrid. Help, please?
____________
Eating more cheese on Thursdays. | |
|
|
After finally getting 32-bit Linux and NVIDIA/CUDA/OpenCL drivers to install (I now remember why I prefer ATI/AMD graphics on my Linux systems...), the final test is in progress and I'll check it in the morning. I am using the cuda55 version to test with. I tried getting cuda32 to run, but the necessary files aren't included in the latest CUDA dev packages. I can try to get this working tomorrow or Saturday if needed.
I will note that on my GTX 580, the OpenCL version takes roughly 2/3 the time of the CUDA version to complete the first 2 tests.
____________
Eating more cheese on Thursdays. | |
|
|
Hi Grebuloner,
Thanks for running the Linux tests! If it's possible for you also to do the tests under BOINC that would be great.
To set up an app_info, the basics that you need are as follows:
1) Make sure you have no outstanding tasks running; they will be abandoned.
2) Shut down your BOINC client.
3) In your BOINC_DIR/projects/www.primegrid.com directory, put the following app_info.xml file:
<app_info>
  <app>
    <name>genefer</name>
    <user_friendly_name>Genefer</user_friendly_name>
  </app>
  <file_info>
    <name>genefercuda_linux32_cuda55</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>genefer</app_name>
    <version_num>007</version_num>
    <api_version>6.10.25</api_version>
    <file_ref>
      <file_name>genefercuda_linux32_cuda55</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
I think you need to give the file the same owner and group as the other files in the directory.
4) Copy the Linux binary you want to test into the same directory.
5) Set your project prefs on the PG site to send you Genefer (CPU) work. The app info is set up to pose as a CPU app, as this is a little easier to configure.
6) Restart the BOINC client, and request new work from the project.
7) Your apps should start running and have the 'Application' label 'Anonymous Platform'
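Before restarting the client in step 6, it can save a round trip to check that the app_info.xml from step 3 is well-formed and internally consistent. The following Python sketch is only an illustration: the checks (root element name, every `app_version` naming a declared app, every `file_ref` naming a declared `file_info`) are assumptions about what is worth verifying, not BOINC's own validation, and `check_app_info` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def _text(element, tag):
    """Return the stripped text of a child element, or None if it is absent."""
    value = element.findtext(tag)
    return value.strip() if value is not None else None


def check_app_info(xml_text):
    """Return a list of problems found in app_info.xml text (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return ["not well-formed XML: %s" % exc]

    problems = []
    if root.tag != "app_info":
        problems.append("root element is <%s>, expected <app_info>" % root.tag)

    apps = {_text(app, "name") for app in root.findall("app")}
    files = {_text(info, "name") for info in root.findall("file_info")}

    for version in root.findall("app_version"):
        app_name = _text(version, "app_name")
        if app_name not in apps:
            problems.append("app_version refers to undeclared app %r" % app_name)
        for ref in version.findall("file_ref"):
            file_name = _text(ref, "file_name")
            if file_name not in files:
                problems.append("file_ref refers to undeclared file %r" % file_name)
    return problems


GOOD_APP_INFO = """
<app_info>
  <app><name>genefer</name></app>
  <file_info><name>genefercuda_linux32_cuda55</name><executable/></file_info>
  <app_version>
    <app_name>genefer</app_name>
    <file_ref><file_name>genefercuda_linux32_cuda55</file_name><main_program/></file_ref>
  </app_version>
</app_info>
"""

# Same file with a stray closing tag, to show what a malformed file reports.
BAD_APP_INFO = GOOD_APP_INFO.replace("</file_info>", "</file_info>\n</app_version>", 1)

if __name__ == "__main__":
    print(check_app_info(GOOD_APP_INFO))
    print(check_app_info(BAD_APP_INFO))
```

To check a real file, read it with `open(path).read()` and pass the text to `check_app_info`; an empty list means none of these particular checks failed.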
Cheers
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
Thank you, Iain! The first BOINC task is just starting, and when I get home from work tonight I'll change the app over to OCL and get a second one in.
____________
Eating more cheese on Thursdays. | |
|
|
Thanks, that's been a great help!
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 478,633,013 RAC: 374,582
|
I just put 3.2.6 into production on BOINC as 3.07. Let us know if anything's not working correctly.
____________
My lucky number is 75898524288+1 | |
|
|
I just put 3.2.6 into production on BOINC as 3.07. Let us know if anything's not working correctly.
And thanks to all of you who contributed to the testing, it's much appreciated!
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
Strange slowdown: I have a Mac Pro 4,1 with an R9 280X in it (http://www.primegrid.com/show_host_detail.php?hostid=475557), and an old quad-core i7 running Windows, also with a 280X in it (http://www.primegrid.com/show_host_detail.php?hostid=478715).
The i7 seems to have done ok with the updated version:
609776452 417260915 24 Feb 2015 | 7:39:51 UTC 27 Feb 2015 | 0:21:33 UTC Completed, waiting for validation 207,446.93 8,330.11 pending Genefer (World Record) v3.07 (atiGFNWR)
609075192 417249133 21 Feb 2015 | 17:50:43 UTC 24 Feb 2015 | 5:24:17 UTC Completed, waiting for validation 205,887.41 8,262.92 pending Genefer (World Record) v3.07 (atiGFNWR)
608174925 417221652 19 Feb 2015 | 4:37:34 UTC 21 Feb 2015 | 15:31:48 UTC Completed, waiting for validation 206,113.43 8,435.72 pending Genefer (World Record) v3.07 (atiGFNWR)
607132296 417221584 15 Feb 2015 | 15:48:11 UTC 19 Feb 2015 | 6:16:36 UTC Completed, waiting for validation 205,349.62 9,116.70 pending Genefer (World Record) v3.07 (atiGFNWR)
606735461 417201233 13 Feb 2015 | 18:38:40 UTC 16 Feb 2015 | 16:18:35 UTC Completed and validated 208,006.03 8,183.44 629,966.70 Genefer (World Record) v3.07 (atiGFNWR)
606239148 417107778 11 Feb 2015 | 12:41:19 UTC 14 Feb 2015 | 6:22:02 UTC Completed and validated 207,590.63 7,933.66 628,719.65 Genefer (World Record) v3.06 (atiGFNWR)
605035091 417147631 8 Feb 2015 | 7:26:56 UTC 11 Feb 2015 | 20:37:05 UTC Completed and validated 207,792.49 6,757.72 629,336.28 Genefer (World Record) v3.06 (atiGFNWR)
Not so with the mac:
606684846 417201244 14 Feb 2015 | 9:05:04 UTC 21 Feb 2015 | 23:00:40 UTC Completed and validated 649,986.67 27,239.87 629,969.57 Genefer (World Record) v3.07 (openclGFNWRMAC)
606227125 416913061 11 Feb 2015 | 9:42:35 UTC 14 Feb 2015 | 9:20:47 UTC Completed and validated 256,539.17 24,888.08 627,232.32 Genefer (World Record) v3.05 (openclGFNWRMAC)
605173845 417154291 8 Feb 2015 | 8:51:59 UTC 11 Feb 2015 | 10:03:12 UTC Completed and validated 257,357.95 24,750.17 629,371.15 Genefer (World Record) v3.05 (openclGFNWRMAC)
602710163 417000286 30 Jan 2015 | 11:14:30 UTC 2 Feb 2015 | 11:23:17 UTC Completed and validated 258,671.68 24,908.65 627,781.02 Genefer (World Record) v3.05 (openclGFNWRMAC)
602005390 416498084 27 Jan 2015 | 10:00:56 UTC 30 Jan 2015 | 11:32:05 UTC Completed and validated 261,088.38 25,024.37 625,347.09 Genefer (World Record) v3.05 (openclGFNWRMAC)
596750227 416315557 2 Jan 2015 | 7:55:58 UTC 5 Jan 2015 | 11:41:03 UTC Completed and validated 268,710.61 21,861.26 624,665.28 Genefer (World Record) v3.05 (openclGFNWRMAC)
595992868 417263805 21 Feb 2015 | 23:51:07 UTC 15 Mar 2015 | 0:51:07 UTC In progress --- --- --- Genefer (World Record) v3.07 (openclGFNWRMAC)
Any ideas, with completion time more than doubling? | |
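The size of the slowdown can be quantified from the figures listed for the Mac. This short Python sketch assumes that the first number after the status in each row is the run time in seconds (the usual BOINC task-list layout) and compares the single completed v3.07 task with the mean of the five v3.05 tasks. The variable names are illustrative.

```python
from statistics import mean

# Run times in seconds for the Mac host (openclGFNWRMAC), copied from the rows above.
RUN_TIMES_305 = [256539.17, 257357.95, 258671.68, 261088.38, 268710.61]
RUN_TIME_307 = 649986.67

baseline = mean(RUN_TIMES_305)
slowdown = RUN_TIME_307 / baseline

if __name__ == "__main__":
    print("v3.05 mean run time: %.0f s" % baseline)
    print("v3.07 run time:      %.0f s" % RUN_TIME_307)
    print("ratio:               %.2f" % slowdown)
```

With only one completed v3.07 task on this host, the ratio describes that single result rather than a trend.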
|
composite Volunteer tester
Joined: 16 Feb 10 Posts: 1172 ID: 55391 Credit: 1,213,267,175 RAC: 1,197,593
|
Van Zimmerman wrote: Strange slowdown [...]
Any ideas, with completion time more than doubling?
Did the FFT size increase in the slower tests? | |
|
|
Van Zimmerman wrote: Strange slowdown [...]
Any ideas, with completion time more than doubling?
Did the FFT size increase in the slower tests?
Firstly, there is no FFT size jump in Genefer. The FFT length is fixed for a given N (which is the reason why there is a b limit).
Anyway, I can't think of any reason why the 3.07 app would be any slower than 3.05 - the OpenCL transform code did not change at all. I ran a quick test on my Mac (with FirePro D700), and found that the estimated runtime was about the same for both versions, using the candidate that you tested:
./primegrid_genefer_3_2_5_0_3.05_i686-apple-darwin__openclGFNWRMAC -q 41220^4194304+1
...
Running on platform 'Apple', device 'ATI Radeon HD - FirePro D700 Compute Engine', version 'OpenCL 1.2 ' and driver '1.2 (Sep 28 2014 22:27:20)'.
...
Initialization complete (23.906 seconds).
Estimated time remaining for 41220^4194304+1 is 219:15:42
./primegrid_genefer_3_2_6_0_3.07_i686-apple-darwin__openclGFNWRMAC -q 41220^4194304+1
...
Initialization complete (24.019 seconds).
Estimated time remaining for 41220^4194304+1 is 219:15:22
As we know (and is confirmed by your task logs) the estimate is quite precise. I suspect (and this will be confirmed if you run further GFN-WR tasks), that something weird was happening on your machine at that time e.g. something else was running that was using the GPU, or perhaps your CPU was being kept busy so it couldn't keep the GPU fed? It might be worth rebooting the box just to clear it up. You could also run the same test as I did - just download the relevant apps from http://www.primegrid.com/download, and test manually.
So I think/hope this is just a one-off glitch. Especially as I have been using the new app with no problem, and the same for your Windows box. Let me know if it persists!
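For anyone repeating the manual comparison, the "Estimated time remaining" lines printed by the two versions can be compared programmatically. This is a minimal Python sketch that only parses the line format shown in the output above; the regular expression and the name `parse_estimate` are assumptions made for illustration, not anything provided by Genefer.

```python
import re

# Matches lines such as:
# "Estimated time remaining for 41220^4194304+1 is 219:15:42"
ESTIMATE_RE = re.compile(
    r"Estimated time remaining for (\S+) is (\d+):(\d{2}):(\d{2})"
)


def parse_estimate(line):
    """Return (candidate, seconds) from an estimate line, or None if no match."""
    match = ESTIMATE_RE.search(line)
    if match is None:
        return None
    candidate, hours, minutes, seconds = match.groups()
    return candidate, int(hours) * 3600 + int(minutes) * 60 + int(seconds)


# The two estimate lines quoted above (3.05 first, then 3.07).
old = parse_estimate("Estimated time remaining for 41220^4194304+1 is 219:15:42")
new = parse_estimate("Estimated time remaining for 41220^4194304+1 is 219:15:22")

difference = old[1] - new[1]        # seconds by which the 3.07 estimate is shorter
relative = abs(difference) / old[1]  # fraction of the 3.05 estimate

if __name__ == "__main__":
    print("difference: %d s (%.4f%% of the 3.05 estimate)" % (difference, relative * 100))
```

A difference this small relative to a run of more than 200 hours is consistent with the two builds performing the same on that machine.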
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|