Message boards : Generalized Fermat Prime Search : Genefer 3.2.9 testing
Hi all,
We are getting ready to release Genefer version 3.2.9, which has a couple of new features:
- OCL2 and OCL3 extended B-limit transforms
- Progress trickle reporting for BOINC
Builds for testing are available from the SVN:
Mac: https://www.assembla.com/code/genefer/subversion/nodes/939/trunk/bin/mac
Windows: https://www.assembla.com/code/genefer/subversion/nodes/939/trunk/bin/windows
Linux: https://www.assembla.com/code/genefer/subversion/nodes/939/trunk/bin/linux
A spreadsheet to record testing progress is available here:
https://docs.google.com/spreadsheets/d/1wbJOyqZP7oJ8k4bcJKNUfU8v5U3EGx-xNB8clqVcX84
Please read the instructions, then reserve and execute any tests that you can. There are 2 kinds of tests available:
- Manual tests of all the builds on all platforms for two small (n=16, n=17) candidates
- BOINC tests of all apps on all platforms (using app_info.xml)
Although the transform implementations have not changed apart from OCL2 and OCL3, there have been various changes to the build environment and we'd like to check everything works before releasing on BOINC.
Thanks
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime!
| |
|
|
We are aware of an issue where the Mac OCL2 app fails on ATI hardware (the OpenCL compiler returns an error). However, I have tested it on an Intel HD integrated graphics card and found that it works there.
If anyone has access to a Mac with ANY graphics card, could you please try the OpenCL apps (especially OCL2)? We need to understand whether the app works on Nvidia hardware, and whether it works on any ATI cards at all.
It's enough to run ./geneferocl2 -t and check that some primes are reported.
Please post your results here, thanks!
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
Command line: ./geneferocl2_macintel -t
Priority change succeeded.
Running tests for transform implementation "OCL2"
Testing 9999924^32+1...
Using OCL2 transform
Running on platform 'Apple', device 'ATI Radeon HD 5770', version 'OpenCL 1.2 ' and driver '1.2 (Aug 28 2015 19:26:19)'.
Error: build program failed.
Error returned by cvms_element_build_from_source
Error: OpenCL error detected: CL_BUILD_PROGRAM_FAILURE. | |
|
|
Command line: ./geneferocl2_macintel -t
Priority change succeeded.
Running tests for transform implementation "OCL2"
Testing 9999924^32+1...
Using OCL2 transform
Running on platform 'Apple', device 'GeForce GTX 680', version 'OpenCL 1.2 ' and driver '10.5.2 346.02.03f01'.
Starting initialization...
Initialization complete (0.000 seconds).
9999924^32+1 is a probable prime. (224 digits) (err = 0.0000) (time = 0:00:22) 10:29:45
Testing 9999822^64+1...
Using OCL2 transform
Starting initialization...
Initialization complete (0.000 seconds).
Estimated time remaining for 9999822^64+1 is 0:00:00
9999822^64+1 is a probable prime. (448 digits) (err = 0.0001) (time = 0:00:01) 10:29:46
Testing 8999972^128+1...
Using OCL2 transform
Starting initialization...
Initialization complete (0.000 seconds).
Estimated time remaining for 8999972^128+1 is 0:00:02
8999972^128+1 is a probable prime. (891 digits) (err = 0.0001) (time = 0:00:04) 10:29:50
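The digit counts in the log above can be cross-checked independently of Genefer. This is a minimal sketch in Python using exact integer arithmetic; the function name `gfn_digits` is made up for illustration and is not part of Genefer.

```python
def gfn_digits(base: int, exponent: int) -> int:
    """Return the number of decimal digits of base**exponent + 1."""
    return len(str(base ** exponent + 1))


# Candidates and digit counts exactly as reported in the self-test log above.
reported = {
    (9999924, 32): 224,
    (9999822, 64): 448,
    (8999972, 128): 891,
}

for (base, exponent), digits in reported.items():
    computed = gfn_digits(base, exponent)
    status = "matches" if computed == digits else "DIFFERS from"
    print(f"{base}^{exponent}+1: {computed} digits, {status} the log")
```

This only confirms the size of each candidate, not its primality.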
| |
|
|
Command line: ./geneferocl2_macintel -t
Priority change succeeded.
Running tests for transform implementation "OCL2"
Testing 9999924^32+1...
Using OCL2 transform
Running on platform 'Apple', device 'HD Graphics 4000', version 'OpenCL 1.2 ' and driver '1.2(Jul 29 2015 02:40:37)'.
Starting initialization...
Initialization complete (0.000 seconds).
Testing 9999924^32+1... 743 steps to go
maxErr exceeded for 9999924^32+1, 0.5000 > 0.4500 during final check | |
|
|
Command line: ./geneferocl2_macintel -t
Priority change succeeded.
Running tests for transform implementation "OCL2"
Testing 9999924^32+1...
Using OCL2 transform
Running on platform 'Apple', device 'ATI Radeon HD Tahiti XT Prototype Compute Engine', version 'OpenCL 1.2 ' and driver '1.2 (Jul 29 2015 02:43:15)'.
Error: build program failed.
Error returned by cvms_element_build_from_source
Error: OpenCL error detected: CL_BUILD_PROGRAM_FAILURE.
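For anyone collecting many of these -t logs, a small script can pull out the probable-prime lines and the error lines. This is only a sketch in Python: the line patterns are taken from the logs posted in this thread, other Genefer builds may word their output differently, and `summarize_selftest` is a hypothetical helper name.

```python
import re

# Pattern based on the success lines seen in this thread, e.g.
# "9999924^32+1 is a probable prime. (224 digits) (err = 0.0000) ..."
PRP_LINE = re.compile(r"^(\d+\^\d+\+1) is a probable prime\. \((\d+) digits\)")


def summarize_selftest(log_text: str) -> dict:
    """Collect probable-prime reports and error lines from a '-t' log."""
    primes = []
    errors = []
    for raw in log_text.splitlines():
        line = raw.strip()
        match = PRP_LINE.match(line)
        if match:
            primes.append((match.group(1), int(match.group(2))))
        elif line.startswith("Error") or line.startswith("maxErr exceeded"):
            # Failure lines as they appear in the posts above.
            errors.append(line)
    return {"primes": primes, "errors": errors, "ok": bool(primes) and not errors}


failed_log = """Testing 9999924^32+1...
Using OCL2 transform
Error: build program failed.
Error returned by cvms_element_build_from_source
Error: OpenCL error detected: CL_BUILD_PROGRAM_FAILURE."""

result = summarize_selftest(failed_log)
print("ok:", result["ok"])
for line in result["errors"]:
    print("  ", line)
```

A run counts as "ok" here only if at least one probable prime was reported and no error line was seen, which mirrors the "check that some primes are reported" instruction.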
| |
|
|
With OCL2 I only had success on a GTX 680 (results in separate posts). | |
|
|
OCL3 appears to work on a 5770:
Command line: ./geneferocl3_macintel -t
Priority change succeeded.
Running tests for transform implementation "OCL3"
Testing 100014^32+1...
Using OCL3 transform
Running on platform 'Apple', device 'ATI Radeon HD 5770', version 'OpenCL 1.2 ' and driver '1.2 (Aug 28 2015 19:26:19)'.
Starting initialization...
Initialization complete (0.000 seconds).
100014^32+1 is a probable prime. (161 digits) (err = 0.0000) (time = 0:00:03) 10:42:18
Testing 100234^64+1...
Using OCL3 transform
Starting initialization...
Initialization complete (0.000 seconds).
Estimated time remaining for 100234^64+1 is 0:00:00
100234^64+1 is a probable prime. (321 digits) (err = 0.0000) (time = 0:00:00) 10:42:18
| |
|
|
geneferocl appears to work on an r9 280x
Command line: ./geneferocl_macintel -t
Priority change succeeded.
Running tests for transform implementation "OCL"
Testing 100234^64+1...
Using OCL transform
Running on platform 'Apple', device 'ATI Radeon HD Tahiti XT Prototype Compute Engine', version 'OpenCL 1.2 ' and driver '1.2 (Jul 29 2015 02:43:15)'.
Starting initialization...
Initialization complete (0.000 seconds).
Estimated time remaining for 100234^64+1 is 0:00:00
100234^64+1 is a probable prime. (321 digits) (err = 0.0000) (time = 0:00:04) 10:41:19
| |
|
|
Thanks Van Zimmerman,
So OCL2 also fails on your ATI card (which is a different model to mine) with the same error that I saw.
I see that you get a round-off error on the HD 4000, but that runs OK on my Iris Pro.
Could you take the Mac/Nvidia tests in the testing spreadsheet?
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
                              
|
The question was specifically about OCL2. We know there's a problem with OCL2 and want to see how widespread it is. OCL and OCL3 aren't affected.
No need to post "OCL works" or "OCL3 works". We know. It will just clutter up the thread and make it harder to find the information about OCL2. Thank you!
(EDIT: I'm pointing this out not to criticize you for trying to be thorough and helpful. I want to prevent 100 other posts saying "OCL works!" by everyone else. It's just OCL2 on the Mac that's problematic.)
____________
My lucky number is 75898^524288+1 | |
|
|
Will do.
Specifically OCL2 failed on 2 ATI cards, a 5770 and an r9 280x. | |
|
Michael Goetz Volunteer moderator Project administrator
|
Will do.
Specifically OCL2 failed on 2 ATI cards, a 5770 and an r9 280x.
And I was really hoping it was just Iain's machine. ::sigh::
While we're testing stuff -- do any of your Macs have both an integrated GPU and a discrete GPU? If yes, when you run any of the OCL apps (OCL, OCL2, or OCL3), which GPU does it run on?
____________
My lucky number is 75898^524288+1 | |
|
Michael Goetz Volunteer moderator Project administrator
|
Hmmmm ...
Looking at the test matrix, it's probably not feasible to do the BOINC test of the 32 bit CPU app on an actual 32 bit CPU. Even with the deadline extension to 84 days for the n=21 tasks, which is about 2000 hours, most actual 32 bit CPUs can't complete the task that quickly. Mine will take about 3000 hours. Jim's is about the same speed. I think there are some 32 bit CPUs that have SSE2 which could probably do it within the deadline. It will still be very slow, but might be fast enough to make the deadline. I doubt it's worth the wait to use a real 32 bit CPU for the BOINC tests. We certainly don't want to be testing the apps for the next three months!
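The arithmetic behind that estimate, as a quick sketch in Python. It assumes the task runs around the clock for the whole deadline period; the function name is made up for illustration.

```python
HOURS_PER_DAY = 24


def fits_deadline(task_hours: float, deadline_days: float) -> bool:
    """True if a task of the given length finishes within the deadline,
    assuming the machine crunches 24 hours a day."""
    return task_hours <= deadline_days * HOURS_PER_DAY


deadline_days = 84
print("deadline in hours:", deadline_days * HOURS_PER_DAY)      # 2016
print("3000 hour task fits:", fits_deadline(3000, deadline_days))  # False
```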
____________
My lucky number is 75898^524288+1 | |
|
|
Will do.
Specifically OCL2 failed on 2 ATI cards, a 5770 and an r9 280x.
And I was really hoping it was just Iain's machine. ::sigh::
While we're testing stuff -- do any of your Macs have both an integrated GPU and a discrete GPU? If yes, when you run any of the OCL apps (OCL, OCL2, or OCL3), which GPU does it run on?
Not the ones that I have in reach at the moment, I can try it tonight. Although the discrete gpu is AMD/ATI. | |
|
|
I realized I may be doing something futile: testing the 32 bit Windows binaries for AMD/ATI on a 64-bit machine. Should I stop? | |
|
Michael Goetz Volunteer moderator Project administrator
|
I realized I may be doing something futile: testing the 32 bit Windows binaries for AMD/ATI on a 64-bit machine. Should I stop?
No, that's okay.
When we said "please test 32 bit apps on 32 bit cpus if possible" we were talking about the CPU apps.
It's expected that the 32 bit GPU apps (especially on Windows where they're all 32 bits) will be run on 64 bit systems.
It's exceptionally difficult to find 32 bit systems with usable GPUs.
____________
My lucky number is 75898^524288+1 | |
|
|
Genefercuda on the mac appears to be hard coded to the 6.0 toolkit:
dyld: Library not loaded: @rpath/libcufft.6.0.dylib
Referenced from: /Users/vzimmerman/Downloads/test/cuda/./genefercuda_macintel
Reason: image not found
Trace/BPT trap: 5
I have libcufft.7.5.dylib
| |
|
Michael Goetz Volunteer moderator Project administrator
|
Genefercuda on the mac appears to be hard coded to the 6.0 toolkit:
dyld: Library not loaded: @rpath/libcufft.6.0.dylib
Referenced from: /Users/vzimmerman/Downloads/test/cuda/./genefercuda_macintel
Reason: image not found
Trace/BPT trap: 5
I have libcufft.7.5.dylib
When used in BOINC, we ship the necessary dll's and libraries along with the rest of the app.
Please download the two libraries from https://www.assembla.com/spaces/genefer/subversion/source/HEAD/trunk/lib/mac and place them in the same folder as your executable. Those are the 6.0 libraries.
____________
My lucky number is 75898^524288+1 | |
|
|
It was a bit more complex than that, but I got it working. I had to copy the files to /usr/lib and rename them to *.6.0.dylib (symlinking didn't seem to want to work, nor did placing either the original files or the renamed files in the executable directory), but all is well now.
| |
|
|
All three of the apps attempt to use the discrete (AMD 6770M) GPU. OCL3 successfully, OCL2 fails with the same error as with other attempts, and OCL doesn't find any usable devices.
| |
|
Michael Goetz Volunteer moderator Project administrator
|
All three of the apps attempt to use the discrete (AMD 6770M) GPU. OCL3 successfully, OCL2 fails with the same error as with other attempts, and OCL doesn't find any usable devices.
Okay, that's good. I was hoping it would preferentially try to use the discrete GPU, because on Macs we don't currently have a way to tell it which type of GPU to use.
OCL doesn't see any suitable GPUs because it requires double precision hardware which is not available on the 6770M GPU.
____________
My lucky number is 75898^524288+1 | |
|
|
We are short of testing on Linux - in particular 32 bit linux with NVidia or ATI GPUs and 64 bit linux with ATI. If you have either of these systems please help! If you have any question about what to do please post here or PM me. The manual tests should only take a few hours to complete.
Cheers
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
We are short of testing on Linux - in particular 32 bit linux with NVidia or ATI GPUs and 64 bit linux with ATI. If you have either of these systems please help! If you have any question about what to do please post here or PM me. The manual tests should only take a few hours to complete.
Cheers
- Iain
Working on the 32 bit tests now. If I have time before work I'll do the 64 bit as well (needs a fresh install). Probably won't be able to do any BOINC testing.
____________
Eating more cheese on Thursdays. | |
|
Michael Goetz Volunteer moderator Project administrator
|
We are short of testing on Linux - in particular 32 bit linux with NVidia or ATI GPUs and 64 bit linux with ATI. If you have either of these systems please help! If you have any question about what to do please post here or PM me. The manual tests should only take a few hours to complete.
Cheers
- Iain
Working on the 32 bit tests now. If I have time before work I'll do the 64 bit as well (needs a fresh install). Probably won't be able to do any BOINC testing.
Every bit helps, thanks!
____________
My lucky number is 75898^524288+1 | |
|
|
32 bit Linux OCL tests done. Will try the 64s tonight. For the CUDA, can anyone tell me what packages (in ubuntu) are required to get CUDA going so that cufft*.so is located by the program? I've got 304 and 349 drivers installed (304 is the latest for easy OCL) and several other NV packages, but still get that error.
Since there was an error on my Windows OCL2 NV Boinc test, I'm rerunning it on two machines, the original and another. Since the spreadsheet is understandably locked for that cell, here are the links to the two tasks.
http://www.primegrid.com/result.php?resultid=651635241
http://www.primegrid.com/result.php?resultid=651635232
____________
Eating more cheese on Thursdays. | |
|
Michael Goetz Volunteer moderator Project administrator
|
32 bit Linux OCL tests done. Will try the 64s tonight. For the CUDA, can anyone tell me what packages (in ubuntu) are required to get CUDA going so that cufft*.so is located by the program? I've got 304 and 349 drivers installed (304 is the latest for easy OCL) and several other NV packages, but still get that error.
Since there was an error on my Windows OCL2 NV Boinc test, I'm rerunning it on two machines, the original and another. Since the spreadsheet is understandably locked for that cell, here are the links to the two tasks.
http://www.primegrid.com/result.php?resultid=651635241
http://www.primegrid.com/result.php?resultid=651635232
I *think* you can just download the appropriate libraries from our Assembla repositories and place them *I think* in the same directory from which you're running Genefer.
The links to the two libraries are:
32 bit:
https://www.assembla.com/spaces/genefer/subversion/source/HEAD/trunk/lib/linux/libcudart.so.5.5_32bit?_format=raw
https://www.assembla.com/spaces/genefer/subversion/source/HEAD/trunk/lib/linux/libcufft.so.5.5_32bit?_format=raw
64 bit:
https://www.assembla.com/spaces/genefer/subversion/source/HEAD/trunk/lib/linux/libcudart.so.5.5_64bit?_format=raw
https://www.assembla.com/spaces/genefer/subversion/source/HEAD/trunk/lib/linux/libcufft.so.5.5_64bit?_format=raw
EDIT: I think you need to rename those libraries to:
libcudart.so.5.5
libcufft.so.5.5
____________
My lucky number is 75898^524288+1 | |
|
|
32 bit Linux OCL tests done. Will try the 64s tonight. For the CUDA, can anyone tell me what packages (in ubuntu) are required to get CUDA going so that cufft*.so is located by the program?
As long as you have a CUDA-capable driver, I think you should be able to download the required runtime libs from the trunk/lib/linux directory in SVN, and pop them in the current directory. You might need to add the current directory to your LD_LIBRARY_PATH variable too.
Since there was an error on my Windows OCL2 NV Boinc test, I'm rerunning it on two machines, the original and another.
Thanks, let us know how you get on (and for all the other testing, of course).
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
Michael Goetz Volunteer moderator Project administrator
|
If you saw my earlier post regarding Linux CUDA libraries before I edited it, the libraries need to be renamed (remove _32bit or _64bit).
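The rename and the LD_LIBRARY_PATH suggestion from earlier in the thread could be scripted roughly as follows. This is a sketch in Python, not a tested install procedure: the helper names are invented, and the binary name in the closing comment is a placeholder, since the Linux CUDA binary name is not given in this thread.

```python
import os
import re


def runtime_name(downloaded_name: str) -> str:
    """Strip the _32bit/_64bit suffix from a downloaded library name,
    e.g. libcufft.so.5.5_64bit -> libcufft.so.5.5."""
    return re.sub(r"_(32|64)bit$", "", downloaded_name)


def env_with_library_dir(library_dir: str, base_env=None) -> dict:
    """Return a copy of the environment with library_dir placed first
    in LD_LIBRARY_PATH, keeping any existing entries after it."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = library_dir + (os.pathsep + existing if existing else "")
    return env


for name in ("libcudart.so.5.5_64bit", "libcufft.so.5.5_64bit"):
    print(name, "->", runtime_name(name))

# Hypothetical launch (placeholder binary name), left as a comment:
#   import subprocess
#   subprocess.run(["./genefercuda", "-t"], env=env_with_library_dir(os.getcwd()))
```

Passing a modified environment to a single run avoids copying files into system library directories, at the cost of having to launch Genefer through the script.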
____________
My lucky number is 75898^524288+1 | |
|
|
Okay, thanks for the CUDA help, running now. Should anyone search and find this later (and that includes me), some helpful notes:
- Driver 304 is the latest with built-in OpenCL, but won't run CUDA 5.5 (346 will)
- The libraries should be placed in the /lib/ folder to be easily found.
On the subject of the OCL2 tasks, it failed again on my main machine and is still running on the trusty 580. I'm giving it one last go with the lowest clocks I can set (being the Strix 980 ti OC model, that's still 1277 MHz boost on the core and 6600 on the memory) and the fans on 50% in case it is a clock/stability issue.
____________
Eating more cheese on Thursdays. | |
|
|
On the subject of the OCL2 tasks, it failed again on my main machine and is still running on the trusty 580. I'm giving it one last go with the lowest clocks I can set (being the Strix 980 ti OC model, that's still 1277 MHz boost on the core and 6600 on the memory) and the fans on 50% in case it is a clock/stability issue.
A 980 Ti was found to hit problems with core clock at 1535MHz but OK at 1500 here: http://www.primegrid.com/forum_thread.php?id=6391&nowrap=true#87809
What OC were you running when it failed?
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
On the subject of the OCL2 tasks, it failed again on my main machine and is still running on the trusty 580. I'm giving it one last go with the lowest clocks I can set (being the Strix 980 ti OC model, that's still 1277 MHz boost on the core and 6600 on the memory) and the fans on 50% in case it is a clock/stability issue.
A 980 Ti was found to hit problems with core clock at 1535MHz but OK at 1500 here: http://www.primegrid.com/forum_thread.php?id=6391&nowrap=true#87809
What OC were you running when it failed?
1360ish while under load, with all settings at the default OC clocks that the card ships with.
____________
Eating more cheese on Thursdays. | |
|
Michael Goetz Volunteer moderator Project administrator
|
On the subject of the OCL2 tasks, it failed again on my main machine and is still running on the trusty 580. I'm giving it one last go with the lowest clocks I can set (being the Strix 980 ti OC model, that's still 1277 MHz boost on the core and 6600 on the memory) and the fans on 50% in case it is a clock/stability issue.
Did it fail at the same place?
____________
My lucky number is 75898^524288+1 | |
|
|
On the subject of the OCL2 tasks, it failed again on my main machine and is still running on the trusty 580. I'm giving it one last go with the lowest clocks I can set (being the Strix 980 ti OC model, that's still 1277 MHz boost on the core and 6600 on the memory) and the fans on 50% in case it is a clock/stability issue.
Did it fail at the same place?
It didn't. It failed much earlier than the last task. Current attempt is still going and has lasted twice as long so far. The AMD OCL2 test is still going strong at 19 hrs/11% (I may have to give up my WSS 3rd place goal!).
The OCL test it ran earlier just validated, so at least I know the host can produce good results.
____________
Eating more cheese on Thursdays. | |
|
|
I could do some BOINC testing on linux64 for ATI. Will need some help with the app_info.xml, though (I suspect my issue is with plan_class). My "closest" to success resulted in either a string of computation errors, or a single work unit sitting ready to start and not executing. | |
|
|
I could do some BOINC testing on linux64 for ATI. Will need some help with the app_info.xml, though (I suspect my issue is with plan_class). My "closest" to success resulted in either a string of computation errors, or a single work unit sitting ready to start and not executing.
Here is the app_info I'm using successfully on my Windows ATI machine. The file is pretty OS/platform agnostic, so changing the file names to their linux equivalents (preceded by ./ I believe) should hopefully work for you.
<app_info>
<app>
<name>genefer</name>
<user_friendly_name>genefer</user_friendly_name>
</app>
<file_info>
<name>geneferocl2_windows.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>genefer</app_name>
<version_num>308</version_num>
<api_version>7.0.64</api_version>
<coproc>
<type>ATI</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>geneferocl2_windows.exe</file_name>
<main_program/>
</file_ref>
</app_version>
</app_info>
On a semi-related topic, now that all the standalone testing has been completed and there is only Boinc testing left, I wonder if there is something that can be done about the length of the tasks we run. Would there be a way to temporarily create a smaller GFN subproject that admins can give testers special account permission to access that might be seeded with old verified results to operate against (e.g. our recent double-check efforts on 18 and 19)?
Thanks to mackerel's Boinc on Linux thread, I'd love to sit down and get it working on my installs, and if I can run shorter tasks in testing I can finish it in a couple days or less (instead of the 3-4 weeks calculated, assuming no errors) and jump into the WSS challenge (or even get in the back end of the TRP). I'll note for the curious that all my linux testing is done serially on one computer by manually swapping OS drives and GPUs to the appropriate combo for a test. It's a 3930k with 4x8gb ram, and someday I'll see if there is a performance difference if the system can only utilize the capacity of less than half a single stick of memory.
____________
Eating more cheese on Thursdays. | |
|
|
OCL2 Windows done and done and validated :)
http://www.primegrid.com/result.php?resultid=651679238
____________
Eating more cheese on Thursdays. | |
|
Michael Goetz Volunteer moderator Project administrator
|
OCL2 Windows done and done and validated :)
http://www.primegrid.com/result.php?resultid=651679238
Thanks!
____________
My lucky number is 75898^524288+1 | |
|
|
No luck. Mine was identical except for the Linux-related changes and the API version (7.2.42). It downloads a work unit, which then sits in "ready to start".
Is there a link somewhere listing the appropriate plan_class choices for genefer?
The machine otherwise happily crunches gfn tasks with no app_info present.
http://www.primegrid.com/workunit.php?wuid=448117770
| |
|
Michael Goetz Volunteer moderator Project administrator
|
Is there a link somewhere listing the appropriate plan_class choices for genefer?
Plan class names for n=21 for Linux:
cpuGFN
cudaGFN
OCLcudaGFN (This is GeneferOCL running on an Nvidia GPU)
atiGFN (GeneferOCL running on an ATI/AMD GPU)
The plan classes are the same on Windows. On Mac, the ATI/AMD OCL plan class is openclGFNMAC, otherwise they're the same as Windows and Linux.
Different names are used for GFN-WR, and different names will be used for each of the new GFN sub-projects when they're released.
I'm not 100% sure WHY you can or need to specify the plan class in app_info. You may be able to use any plan class name that includes the keywords recognized by the boinc client, such as "cuda". Or it may need to be the exact plan classes used on the server, as noted above. I'm not 100% sure without digging through the BOINC client's and server's source code.
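For illustration only, here is a sketch in Python that generates an app_info.xml like the one posted earlier in the thread, with a plan_class element added to the app_version. The executable name geneferocl_linux64 is a placeholder, the version numbers are the ones mentioned by posters in this thread, and whether the plan class has to match the server's name exactly is the open question above.

```python
import xml.etree.ElementTree as ET


def build_app_info(executable: str, coproc_type: str, plan_class: str,
                   version_num: str = "308", api_version: str = "7.2.42") -> str:
    """Build an app_info.xml document as a string (illustrative values only)."""
    root = ET.Element("app_info")

    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = "genefer"
    ET.SubElement(app, "user_friendly_name").text = "genefer"

    file_info = ET.SubElement(root, "file_info")
    ET.SubElement(file_info, "name").text = executable
    ET.SubElement(file_info, "executable")

    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = "genefer"
    ET.SubElement(version, "version_num").text = version_num
    ET.SubElement(version, "api_version").text = api_version
    ET.SubElement(version, "plan_class").text = plan_class
    coproc = ET.SubElement(version, "coproc")
    ET.SubElement(coproc, "type").text = coproc_type
    ET.SubElement(coproc, "count").text = "1"
    file_ref = ET.SubElement(version, "file_ref")
    ET.SubElement(file_ref, "file_name").text = executable
    ET.SubElement(file_ref, "main_program")

    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    # ElementTree writes empty elements as "<tag />"; use the "<tag/>" form
    # that appears in the file posted in this thread.
    return ET.tostring(root, encoding="unicode").replace(" />", "/>")


# Placeholder executable name; plan class taken from the Linux list above.
xml_text = build_app_info("geneferocl_linux64", "ATI", "atiGFN")
print(xml_text)
```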
____________
My lucky number is 75898^524288+1 | |
|
|
Not sure why either, it has made the difference between working and not, at least on a mac.
Many thanks. It lives! Testing commenced. Should be just under 14 hours for ocl. | |
|
Michael Goetz Volunteer moderator Project administrator
|
Not sure why either, it has made the difference between working and not, at least on a mac.
Many thanks. It lives! Testing commenced. Should be just under 14 hours for ocl.
Great, thanks!
____________
My lucky number is 75898^524288+1 | |
|
|
To echo Grebuloner's sentiment, http://www.primegrid.com/workunit.php?wuid=447576738 is 98 hours in, and 72% complete. Assuming OCL2 (which this task is running) is faster than OCL on mac-cuda, I suspect the OCL run will take quite a while. OCL3 took just under 30 hours to complete http://www.primegrid.com/workunit.php?wuid=447803055.
| |
|
Michael Goetz Volunteer moderator Project administrator
|
To echo Grebuloner's sentiment, http://www.primegrid.com/workunit.php?wuid=447576738 is 98 hours in, and 72% complete. Assuming OCL2 (which this task is running) is faster than OCL on mac-cuda, I suspect the OCL run will take quite a while. OCL3 took just under 30 hours to complete http://www.primegrid.com/workunit.php?wuid=447803055.
You have it backwards. OCL2 is several times slower than OCL. OCL3 is usually somewhat slower than OCL, but may be faster than OCL on some GPUs. Both are (much) faster than OCL2.
Putting numerical values on it, faster being better, OCL is an 8, OCL3 is usually a 6 or 7, but sometimes as high as 9, and OCL2 is a 2 or 3.
On most systems, OCL is the fastest available app.
____________
My lucky number is 75898^524288+1 | |
|
|
Apologies.
Bearing that out, my BOINC test runs on linux64 with ATI finished 4 units at 13.5 hours each with OCL. I started OCL2 runs and, extrapolating, am looking at around 150 hours. Interestingly, but not necessarily surprisingly, the card is running about 7 °C cooler. Also interestingly, the OCL2 tasks (it's running a 7990) restarted themselves several times at approximately 0.010% before proceeding cleanly forward (so far).
| |
|
Michael Goetz Volunteer moderator Project administrator
|
...I wonder if there is something that can be done about the length of the tasks we run. Would there be a way to temporarily create a smaller GFN subproject...
We may have something to say about this idea in the next couple of days.
| |
|
Michael Goetz Volunteer moderator Project administrator
|
I have discovered a new bug in 3.2.9, at least in the Windows version of the CUDA and GPU apps. I don't know if it affects other builds. It's a BOINC interface problem, so it will only be seen when running real BOINC apps under app_info.
The problem is that the app isn't updating the BOINC client with its progress. Normally, you'll see the % complete bar go up periodically, and the time-remaining will change accordingly. Those updates aren't happening, so the BOINC client is faking the progress, incrementing the progress by exactly 0.001% each time, and never changing the time-remaining estimate.
For anyone who is currently running 3.2.9 under app_info, you might want to check to see if the progress is updating correctly. "Correctly" means A) the progress is going up by more than 0.001% at each increment, and B) the time remaining varies somewhat.
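The check described above can be sketched in a few lines of Python. This is only an illustration, assuming you note down a handful of (percent complete, seconds remaining) readings by hand from the BOINC Manager; the function name, the sample format and the tolerance are all made up for the example and are not part of BOINC or Genefer.

```python
def looks_like_faked_progress(samples, step=0.001, tol=1e-9):
    """Return True if the readings match the symptom described in the post.

    samples: list of (percent_done, seconds_remaining) tuples, in the order
    they were observed. Hypothetical format, recorded by hand.
    """
    if len(samples) < 3:
        return False  # too few readings to say anything

    # Difference in percent complete between consecutive readings.
    deltas = [later[0] - earlier[0] for earlier, later in zip(samples, samples[1:])]

    # Symptom A: every increment is exactly the 0.001% the client invents.
    fixed_step = all(abs(d - step) < tol for d in deltas)
    # Symptom B: the time-remaining estimate never changes.
    frozen_eta = len({remaining for _, remaining in samples}) == 1

    return fixed_step and frozen_eta


faked = [(0.001, 3600), (0.002, 3600), (0.003, 3600)]
healthy = [(1.2, 3600), (1.9, 3550), (2.7, 3490)]
print(looks_like_faked_progress(faked), looks_like_faked_progress(healthy))
```

Requiring both symptoms keeps the check conservative: a run that steps by 0.001% but whose time remaining still moves is not flagged.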
| |
|
|
I currently have 2 tests in flight.
For cuda/mac, it is incrementing by 0.001%, when it increments. It increments every second for 4-7 seconds or so, then increments every other second or so, then repeats. Time remaining varies by roughly 30-50 seconds each increment. Time remaining would appear to be overestimated: time elapsed is just under 20 hours, progress is just under 60%, and estimated time remaining is 591 hours.
For linux64/amd running ocl2, it is incrementing by 0.001%, when it increments, which is every 4-5 seconds. Time remaining changes a bit. Time remaining appears to be underestimated: time elapsed is 91 hours, progress is 62%, and time remaining is 20 hours.
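To put numbers on "overestimated" and "underestimated", here is a small Python sketch that compares the reported time remaining with a plain linear extrapolation from elapsed time and percent complete. It assumes the task progresses at a constant rate, which is only an approximation, and it rounds "just under 20 hours" and "just under 60%" to 20 and 60. The function name is invented for the example.

```python
def linear_remaining_hours(elapsed_hours, percent_done):
    """Hours left if the task keeps running at its average rate so far."""
    if not 0 < percent_done < 100:
        raise ValueError("percent_done must be strictly between 0 and 100")
    projected_total = elapsed_hours * 100.0 / percent_done
    return projected_total - elapsed_hours


# (elapsed hours, percent complete, time remaining shown by BOINC)
reports = {
    "cuda/mac": (20, 60, 591),
    "linux64/amd ocl2": (91, 62, 20),
}

for name, (elapsed, percent, shown) in reports.items():
    estimate = linear_remaining_hours(elapsed, percent)
    print(f"{name}: linear estimate {estimate:.1f} h, BOINC shows {shown} h")
```

On these figures the linear estimate is about 13 hours for cuda/mac and about 56 hours for the ocl2 run, so the first display is far too high and the second too low.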
| |
|
Michael Goetz Volunteer moderator Project administrator
|
...I wonder if there is something that can be done about the length of the tasks we run. Would there be a way to temporarily create a smaller GFN subproject...
We may have something to say about this idea in the next couple of days.
32768 and 65536 are open for beta testing!
If you go to the PrimeGrid preferences page, you'll see that you can now choose 32768 and 65536 tasks. You'll also see that it says "beta" next to every check box.
The server is now providing 32768 and 65536 tasks. However, with one exception, it's NOT yet sending you the apps for those tasks. We're still testing them. You asked for shorter tasks for the 3.2.9 tests, and now you have them. You'll need to use app_info.xml, just like you were with n=21.
What's the exception? We want to give OCL2 and OCL3 a good workout comparing residues against the older genefer residues, so if you select the OCL checkboxes for either Nvidia or ATI/AMD, the server will send you the app -- no app_info.xml is needed. For 32768 the server is sending the OCL2 app, and for 65536 the server is sending the OCL3 app. Note that the tests are technically small enough to run with OCL, but we want to give OCL2 and OCL3 a thorough shakeout.
You may also notice that GFN-WR now has a CPU checkbox. It's also in beta, but if you check it and use app_info, you can get CPU apps for the world record tasks.
EDIT: I forgot to mention that OCL2 is NOT available for Mac on ATI/AMD GPUs. This affects 32768, but not 65536.
EDIT 2: Also note that we will be double checking all prior PRPNet and external work before moving on to new numbers.
| |
|
Rafael Volunteer tester
Joined: 22 Oct 14 Posts: 905 ID: 370496 Credit: 459,496,650 RAC: 148,364
|
...I wonder if there is something that can be done about the length of the tasks we run. Would there be a way to temporarily create a smaller GFN subproject...
We may have something to say about this idea in the next couple of days.
32768 and 65536 are open for beta testing!
If you go to the PrimeGrid preferences page, you'll see that you can now choose 32768 and 65536 tasks. You'll also see that it says "beta" next to every check box.
The server is now providing 32768 and 65536 tasks. However, with one exception, it's NOT yet sending you the apps for those tasks. We're still testing them. You asked for shorter tasks for the 3.2.9 tests, and now you have them. You'll need to use app_info.xml, just like you were with n=21.
What's the exception? We want to give OCL2 and OCL3 a good workout comparing residues against the older genefer residues, so if you select the OCL checkboxes for either Nvidia or ATI/AMD, the server will send you the app -- no app_info.xml is needed. For 32768 the server is sending the OCL2 app, and for 65536 the server is sending the OCL3 app. Note that the tests are technically small enough to run with OCL, but we want to give OCL2 and OCL3 a thorough shakeout.
You may also notice that GFN-WR now has a CPU checkbox. It's also in beta, but if you check it and use app_info, you can get CPU apps for the world record tasks.
EDIT: I forgot to mention that OCL2 is NOT available for Mac on ATI/AMD GPUs. This affects 32768, but not 65536.
EDIT 2: Also note that we will be double checking all prior PRPNet and external work before moving on to new numbers.
Okay, so my 970 grabbed a few of the 65536 tasks. This is so weird... after being used to the week-long GFN-WR tasks, these <2 min tasks just seem insane. So far, 2 (edit: 3) validated already, everything seems fine.
Gonna see how it behaves overnight....
EDIT 2: It just whines like a train hitting the brakes...... | |
|
Michael Goetz Volunteer moderator Project administrator
|
Okay, so my 970 grabbed a few of the 65536 tasks. This is so weird... after being used to the week-long GFN-WR tasks, these <2 min tasks just seem insane. So far, 2 (edit: 3) validated already, everything seems fine.
Gonna see how it behaves overnight....
EDIT 2: It just whines like a train hitting the brakes......
All tasks (or very close to all tasks) have double check residues, so validation should occur within 60 seconds.
| |
|
Michael Goetz Volunteer moderator Project administrator
|
Regarding the PRPNet ports, once we're out of beta on 32768 and 65536 (i.e., you no longer need app info and the server is sending out OCL instead of OCL2 or OCL3) we'll be setting the PRPNet 32768 and 65536 ports to "no new work", and shut them down completely once all outstanding tasks are returned.
| |
|
|
You may also notice that GFN-WR now has a CPU checkbox. It's also in beta, but if you check it and use app_info, you can get CPU apps for the world record tasks.
Ok so here's a stupid question...
I managed to compile a genefer cpu app (3.2.9) for FreeBSD 64 bit and am currently testing it. Is the Genefer WR cpu app going to be the same as the Genefer cpu app found at:
https://www.assembla.com/spaces/genefer/subversion/source/HEAD/trunk/src/genefer
?
| |
|
Michael Goetz Volunteer moderator Project administrator
|
You may also notice that GFN-WR now has a CPU checkbox. It's also in beta, but if you check it and use app_info, you can get CPU apps for the world record tasks.
Ok so here's a stupid question...
I managed to compile a genefer cpu app (3.2.9) for FreeBSD 64 bit and am currently testing it. Is the Genefer WR cpu app going to be the same as the Genefer cpu app found at:
https://www.assembla.com/spaces/genefer/subversion/source/HEAD/trunk/src/genefer
?
Exact same app for any Genefer N range, including WR (N=22). So, yes.
| |
|
Rafael Volunteer tester
|
Okay, so I was playing with the new apps... Ain't credit a bit on the low side?
My last GFN-WR for the CUDA app (so it uses almost no CPU at all) gave me a 103k PPD.
My last GFN-21 for the OCL app gave me 104k PPD.
However, GFN-16 is giving around 53k PPD. And that's because it's using OCL3, which runs much faster on Maxwell cards than the regular OCL does, so I'd expect people with older cards to have even lower yields. GFN-15 is at 21k, which I guess comes from the fact that it runs the "uber slow" OCL2.
Doesn't feel like crunching small WUs when long ones reward you so much better.......
OH. Is that why? | |
|
Michael Goetz Volunteer moderator Project administrator
|
Okay, so I was playing with the new apps... Ain't credit a bit on the low side?
My last GFN-WR for the CUDA app (so it uses almost no CPU at all) gave me a 103k PPD.
My last GFN-21 for the OCL app gave me 104k PPD.
However, GFN-16 is giving around 53k PPD. And that's because it's using OCL3, which runs much faster on Maxwell cards than the regular OCL does, so I'd expect people with older cards to have even lower yields. GFN-15 is at 21k, which I guess comes from the fact that it runs the "uber slow" OCL2.
Doesn't feel like crunching small WUs when long ones reward you so much better.......
OH. Is that why?
N=21 and N=22 (WR) have bonuses because the tasks are so long. Other than that, the credit is based on the relative processing time if the CPU app is used. In a perfect world, if the bonuses are subtracted out, you'll get the same credit per day. There are a lot of complicating factors that make it less than perfect, so your mileage will vary.
EDIT: One of those "complicating factors" is that GPU apps are somewhat more efficient with larger N's. This seems to be more pronounced with CUDA than with OCL, and it may vary by GPU architecture, driver, etc.
| |
|
Michael Goetz Volunteer moderator Project administrator
|
Keeping up with my ongoing theme of forgetting to include useful information, here's the app and plan_class names you'll need to get app_info working correctly on 32768 and 65536:
For 32768:
app name: genefer15
plan class names:
cpuGFN15 (for CPU apps)
cudaGFN15 (for CUDA -- not sure if the server will send these since you can't select them.)
OCLcudaGFN15 (for OCL/OCL2/OCL3 on Nvidia GPUs)
atiGFN15 (for OCL/OCL2/OCL3 on ATI/AMD GPUs -- Windows/Linux only)
openclGFNMAC15 (for OCL/OCL2/OCL3 on ATI/AMD GPUs -- Mac only)
For 65536:
app name: genefer16
plan class names:
cpuGFN16 (for CPU apps)
cudaGFN16 (for CUDA -- not sure if the server will send these since you can't select them.)
OCLcudaGFN16 (for OCL/OCL2/OCL3 on Nvidia GPUs)
atiGFN16 (for OCL/OCL2/OCL3 on ATI/AMD GPUs -- Windows/Linux only)
openclGFNMAC16 (for OCL/OCL2/OCL3 on ATI/AMD GPUs -- Mac only)
Note that for openclGFNMAC15 and openclGFNMAC16 we expect the OCL2 app to fail. It's a known bug, with no known fix at the present time.
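For anyone assembling app_info.xml by hand, the names above slot into the usual BOINC anonymous-platform layout. The Python sketch below generates a minimal file for one app version. Treat it as a starting point only: the executable name is borrowed from the OCL3 stderr output posted in this thread, the version number 329 is a guess, and real GPU entries normally need extra elements (such as a coproc section) that are left out here. The helper name and the dictionary keys are invented for the example.

```python
import xml.etree.ElementTree as ET

# Plan class patterns taken from the list above; {n} is 15 or 16.
PLAN_CLASSES = {
    "cpu": "cpuGFN{n}",
    "cuda": "cudaGFN{n}",
    "ocl_nvidia": "OCLcudaGFN{n}",
    "ocl_ati": "atiGFN{n}",            # Windows/Linux only
    "ocl_ati_mac": "openclGFNMAC{n}",  # Mac only
}


def build_app_info(n, kind, exe_name, version_num):
    """Build a minimal <app_info> tree for genefer15 or genefer16."""
    app_name = f"genefer{n}"
    plan_class = PLAN_CLASSES[kind].format(n=n)

    root = ET.Element("app_info")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name

    file_info = ET.SubElement(root, "file_info")
    ET.SubElement(file_info, "name").text = exe_name
    ET.SubElement(file_info, "executable")

    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "version_num").text = str(version_num)  # assumed value
    ET.SubElement(version, "plan_class").text = plan_class
    file_ref = ET.SubElement(version, "file_ref")
    ET.SubElement(file_ref, "file_name").text = exe_name
    ET.SubElement(file_ref, "main_program")
    return root


root = build_app_info(16, "ocl_nvidia", "geneferocl3_windows.exe", 329)
if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
    ET.indent(root)
print(ET.tostring(root, encoding="unicode"))
```

Keeping the plan class patterns in one table makes it harder to mistype them, which matters because a wrong plan class usually means the server sends no work for that app version.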
| |
|
Michael Goetz Volunteer moderator Project administrator
|
For World Record (N=22) CPU tasks, the app name is genefer_wr and the plan class is cpuGFNWR.
| |
|
|
Thanks Michael! Giving my test machine a break from the PRP challenge, I got BOINC GPU crunching up and running on the Linux 32 partition, and have begun running 15/16 WUs (after making a small correction, apologies for 142 aborted WUs!). Updating the testing matrix as I go.
On GFN15/16, are the b's low enough for OCL to run? It'll save me some time before I swap out the 7950 for a 480 to start Nvidia crunching.
____________
Eating more cheese on Thursdays. | |
|
Michael Goetz Volunteer moderator Project administrator
|
On 15/16, b is currently low enough for anything to run.
| |
|
|
On 15/16, b is currently low enough for anything to run.
Excellent. I'll see about finishing Linux32 tonight after work and maybe starting 64.
I will note for you that at least on the Linux 32 OCL2 build, BOINC %progress/time remaining is updated correctly.
| |
|
|
I have not seen any progress anomalies on linux64/amd or mac/nvidia. The only one of those two suites I haven't run yet is linux64/amd ocl3, which I'll fire up tomorrow morning. | |
|
|
On 15/16, b is currently low enough for anything to run.
No kidding, these are fast. 140-1100 secs for genefer16 on an otherwise anemic amd 5770. Genefer15 240 seconds on an e5520. | |
|
Rafael Volunteer tester
|
Because I felt adventurous, I tried OCL 3 on GFN-WR. Looks like this one was a fail.. http://www.primegrid.com/result.php?resultid=653284461
EDIT: Max err 1.000 > 0.450... great. I thought our beta established that I could run at 1569 MHz just fine (http://www.primegrid.com/forum_thread.php?id=6391&nowrap=true#87809), but I guess 1480 MHz is not doing it for actual WUs. I'll play with it a bit further later.
Also, I've seen one of the GFN-16 tasks get stuck at 99% progress once. Both CPU and GPU usage were at 0%; the task was just idling there for no reason. Suspending and resuming the task fixed it, though, and it was able to resume computation from the last checkpoint (around 54%).
|
|
My 150 hour estimate after 20 minutes of run time for linux64/amd/ocl2 turned into 145.73 hours actual, so I'd have to say the progress indicator was pretty accurate.
It is interesting to see the performance variations across platforms/cards.
On Mac/Nvidia w/gtx680, I got the following run times (hours)
cuda 33.62
ocl 24.88
ocl2 135.24
ocl3 29.72
On linux64/amd w/7990, I got the following (for one unit on one gpu w/2 running)
ocl 13.51
ocl2 145.73
projecting ocl3 150.22 after 2 hours run time
Curious that a card that is so much better at DP looks like it gets clobbered by ocl3.
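To make the comparison easier to read, the run times above can be expressed relative to OCL on each card. The Python sketch below does that with the figures exactly as posted; note that the ocl3 number for the 7990 is a projection after 2 hours rather than a finished run. The dictionary layout and function name are just for this example.

```python
# Run times in hours, as reported above.
runtimes_hours = {
    "mac/gtx680": {"cuda": 33.62, "ocl": 24.88, "ocl2": 135.24, "ocl3": 29.72},
    "linux64/7990": {"ocl": 13.51, "ocl2": 145.73, "ocl3": 150.22},  # ocl3 is projected
}


def slowdown_vs_ocl(times):
    """Run time of each app divided by the OCL run time on the same card."""
    base = times["ocl"]
    return {app: round(hours / base, 2) for app, hours in times.items()}


for card, times in runtimes_hours.items():
    print(card, slowdown_vs_ocl(times))
```

On the GTX 680, ocl3 comes out roughly 1.2 times slower than ocl and ocl2 roughly 5.4 times slower; on the 7990 both ocl2 and the projected ocl3 are around 11 times slower than ocl.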
| |
|
Michael Goetz Volunteer moderator Project administrator
|
32768 and 65536 are now in normal production mode.
CPU and OCL apps are available by selecting the corresponding check boxes; app_info is not needed. Note that this is the original OCL app, which is substantially faster than OCL2, and on most GPUs is faster than OCL3. On some GPUs, notably Nvidia Maxwell architectures, OCL3 may be faster than OCL. For now, if you want to use OCL3 you'll have to use app_info. We have plans to make this automatic in the future, but I don't yet have a reliable estimate of when that will happen.
The N=22 (World Record) CPU tasks are still in "beta" testing, so you still need to use app_info if you want to test that app.
| |
|
|
32768 and 65536 are now in normal production mode.
CPU and OCL apps are available by selecting the corresponding check boxes; app_info is not needed. Note that this is the original OCL app, which is substantially faster than OCL2, and on most GPUs is faster than OCL3. On some GPUs, notably Nvidia Maxwell architectures, OCL3 may be faster than OCL. For now, if you want to use OCL3 you'll have to use app_info. We have plans to make this automatic in the future, but I don't yet have a reliable estimate of when that will happen.
The N=22 (World Record) CPU tasks are still in "beta" testing, so you still need to use app_info if you want to test that app.
Just noticed this, since the tasks are now about 3x faster (~40 secs) on my gtx 670 than with ocl2 :)
Everything looks good from here!
| |
|
|
32768 and 65536 are now in normal production mode.
CPU and OCL apps are available by selecting the corresponding check boxes; app_info is not needed. Note that this is the original OCL app, which is substantially faster than OCL2, and on most GPUs is faster than OCL3. On some GPUs, notably Nvidia Maxwell architectures, OCL3 may be faster than OCL. For now, if you want to use OCL3 you'll have to use app_info. We have plans to make this automatic in the future, but I don't yet have a reliable estimate of when that will happen.
The N=22 (World Record) CPU tasks are still in "beta" testing, so you still need to use app_info if you want to test that app.
Yeah man, grabbed a mix of 15 and 16. Question: since the original is faster, when selecting these for work, will you give us the option of using the OCL of our choice in the future?
Also, I noticed a post earlier that these are all double checks and as such a 2nd dc was not going to be done (at least I think I read that) so, if we get a hit will it count as a DC?
Cheers | |
|
Michael Goetz Volunteer moderator Project administrator
|
Question: since the original is faster, when selecting these for work, will you give us the option of using the OCL of our choice in the future?
Not planning on that, no. The plan is to have a single OCL app that automatically runs the faster transform, and automatically switches transforms because of B limits.
Also, I noticed a post earlier that these are all double checks and as such a 2nd dc was not going to be done (at least I think I read that) so, if we get a hit will it count as a DC?
If we get a hit, it should be a newly found prime since we've already excluded all the primes we're aware of. As with other projects that we've moved to BOINC, we're running everything on BOINC with residues imported from PRPnet. If there was a test on PRPNet, we're doing one test as a double check against that residue, otherwise we're doing two tests on each candidate.
| |
|
Michael Goetz Volunteer moderator Project administrator
|
One more thing about OCL -- on such small tasks, GPUs aren't as efficient as you're used to on the larger tasks. Especially with these smaller B values where AVX/FMA3 is usable. I can do four 32768s on the CPU in about 80 seconds (drawing about 60 watts), and one task on the GPU in about 30 seconds (drawing close to 200 watts). If I was going to crunch those GFNs, I'm not sure I'd use the GPU until the B values get large enough so the CPU is using x87 and the GPU is using OCL3. Right now, I think I'd stick with the CPU.
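The trade-off in those figures is easier to see per task. The short Python sketch below turns "four tasks in 80 seconds at 60 watts" and "one task in 30 seconds at 200 watts" into energy per task and tasks per hour. It takes the quoted wattages at face value and ignores idle power and the rest of the system, so it is a rough comparison only; the function name is made up for the example.

```python
def per_task(tasks, seconds, watts):
    """Return (joules per task, tasks per hour) for a batch of tasks."""
    joules_per_task = watts * seconds / tasks
    tasks_per_hour = tasks * 3600 / seconds
    return joules_per_task, tasks_per_hour


cpu = per_task(tasks=4, seconds=80, watts=60)   # four 32768 tasks at once on the CPU
gpu = per_task(tasks=1, seconds=30, watts=200)  # one 32768 task on the GPU

print(f"CPU: {cpu[0]:.0f} J/task, {cpu[1]:.0f} tasks/hour")
print(f"GPU: {gpu[0]:.0f} J/task, {gpu[1]:.0f} tasks/hour")
```

With these inputs the CPU works out to 1200 J per task at 180 tasks per hour, against 6000 J per task at 120 tasks per hour on the GPU, which is why sticking with the CPU looks attractive at these small B values.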
| |
|
Rafael Volunteer tester
|
I was looking at the GFN Preference page, and I saw this:
9 At this time the OCL2 app (used only for n=17-mega) is not available for Mac computers with ATI/AMD GPUs.
So..... when are we getting it? It wouldn't make sense to put that note in there, since we aren't crunching n=17. | |
|
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2577 ID: 29980 Credit: 548,779,887 RAC: 19,721
|
One more thing about OCL -- on such small tasks, GPUs aren't as efficient as you're used to on the larger tasks. Especially with these smaller B values where AVX/FMA3 is usable
Sounds like something for me to test... might take a while for existing CPU LLR work to filter through before I see results. | |
|
mackerel Volunteer tester
|
GFN-15 only
i7-2600k 3.4 GHz ~100s
i5-4570s 3.2 GHz ~80s
i7-6700k 4.0 GHz ~35s
R9 280X mostly ~35s, one unit 21s, some units >150s
GTX 980Ti ~45s
GTX 960 ~60s
Systems are running 3 CPU tasks and 1 GPU task at the same time. I note the AMD GPU doesn't seem to take noticeable CPU but to be safe I'm still leaving a core free for it.
Odd the R9 280X seems to have quite variable runtimes. This doesn't appear to be happening on nvidia cards. | |
|
mackerel Volunteer tester
|
GFN-16 only
i7-2600k 3.4 GHz ~392s
i5-4570s 3.2 GHz ~332s
i7-6700k 4.0 GHz ~226s
R9 280X mostly ~97s, some units <81s, longest unit >300s
GTX 980Ti ~113s
GTX 960 ~204s
Still the variation on the AMD GPU.
I assume since we're in double check, and units are currently very small, they might go up rapidly. So I don't know how long these timings will be useful for. | |
|
Rafael Volunteer tester
|
After my little fail at GFN-WR with OCL3, I decided to try again. I thought that the issue was my card reducing voltage due to thermals (and thus causing instability), so I closely monitored my temps and voltages to make sure my OC stayed stable. At first, everything seemed fine. The WU was crunching away... until it happened: I got the maxErr 1.0 > 0.45 error again. I was pretty pissed; after all, the WU was doing just fine before.
So I just said "screw it, I'm running stock speeds again, enough OC problems for today". But to my surprise, even at stock, it just keeps getting the error! At this rate, I'll have to abort the task and waste another 2h of work.
Any ideas? It's the second time this has happened, with or without OCing. Is the app at fault, or am I doing something wrong?
geneferocl3 3.2.9 (Windows/OpenCL-NTT/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Command line: projects/www.primegrid.com/geneferocl3_windows.exe -boinc -q 54650^4194304+1 --device 0
Priority change succeeded.
Using OCL3 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2 CUDA' and driver '358.50'.
Starting initialization...
Initialization complete (117.780 seconds).
Testing 54650^4194304+1...
Estimated time remaining for 54650^4194304+1 is 83:38:11
Terminating because BOINC client requested that we should quit.
geneferocl3 3.2.9 (Windows/OpenCL-NTT/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Command line: projects/www.primegrid.com/geneferocl3_windows.exe -boinc -q 54650^4194304+1 --device 0
Priority change succeeded.
Using OCL3 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2 CUDA' and driver '358.50'.
geneferocl3 3.2.9 (Windows/OpenCL-NTT/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Command line: projects/www.primegrid.com/geneferocl3_windows.exe -boinc -q 54650^4194304+1 --device 0
Priority change succeeded.
Using OCL3 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2 CUDA' and driver '358.50'.
Resuming 54650^4194304+1 from a checkpoint (65015953 iterations left)
Estimated time remaining for 54650^4194304+1 is 81:14:18
Terminating because BOINC client requested that we should quit.
geneferocl3 3.2.9 (Windows/OpenCL-NTT/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Command line: projects/www.primegrid.com/geneferocl3_windows.exe -boinc -q 54650^4194304+1 --device 0
Priority change succeeded.
Using OCL3 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2 CUDA' and driver '358.50'.
Resuming 54650^4194304+1 from a checkpoint (65000763 iterations left)
Estimated time remaining for 54650^4194304+1 is 81:15:43
Terminating because BOINC client requested that we should quit.
geneferocl3 3.2.9 (Windows/OpenCL-NTT/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Command line: projects/www.primegrid.com/geneferocl3_windows.exe -boinc -q 54650^4194304+1 --device 0
Priority change succeeded.
Using OCL3 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2 CUDA' and driver '358.50'.
Resuming 54650^4194304+1 from a checkpoint (64980892 iterations left)
Estimated time remaining for 54650^4194304+1 is 81:48:25
Terminating because BOINC client requested that we should quit.
geneferocl3 3.2.9 (Windows/OpenCL-NTT/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Command line: projects/www.primegrid.com/geneferocl3_windows.exe -boinc -q 54650^4194304+1 --device 0
Priority change succeeded.
Using OCL3 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2 CUDA' and driver '358.50'.
Resuming 54650^4194304+1 from a checkpoint (64901996 iterations left)
Estimated time remaining for 54650^4194304+1 is 81:23:58
Terminating because BOINC client requested that we should quit.
geneferocl3 3.2.9 (Windows/OpenCL-NTT/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Command line: projects/www.primegrid.com/geneferocl3_windows.exe -boinc -q 54650^4194304+1 --device 0
Priority change succeeded.
Using OCL3 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2 CUDA' and driver '358.50'.
Resuming 54650^4194304+1 from a checkpoint (64798865 iterations left)
Estimated time remaining for 54650^4194304+1 is 81:33:12
Terminating because BOINC client requested that we should quit.
geneferocl3 3.2.9 (Windows/OpenCL-NTT/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Command line: projects/www.primegrid.com/geneferocl3_windows.exe -boinc -q 54650^4194304+1 --device 0
Priority change succeeded.
Using OCL3 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2 CUDA' and driver '358.50'.
Resuming 54650^4194304+1 from a checkpoint (64649800 iterations left)
maxErr exceeded for 54650^4194304+1, 1.0000 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
maxErr exceeded by all available transform implementations
Waiting 10 minutes before attempting to continue from last checkpoint...
geneferocl3 3.2.9 (Windows/OpenCL-NTT/32-bit)
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Genefer is free source code, under the MIT license.
Command line: projects/www.primegrid.com/geneferocl3_windows.exe -boinc -q 54650^4194304+1 --device 0
Priority change succeeded.
Using OCL3 transform
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', version 'OpenCL 1.2 CUDA' and driver '358.50'.
Resuming 54650^4194304+1 from a checkpoint (64649800 iterations left)
maxErr exceeded for 54650^4194304+1, 1.0000 > 0.4500
MaxErr exceeded may be caused by overclocking, overheated GPUs and other transient errors.
maxErr exceeded by all available transform implementations
Waiting 10 minutes before attempting to continue from last checkpoint... | |
|
mackerel Volunteer tester
|
When you say without OC, is the card running at stock nvidia clocks, or is it factory overclocked anyway? I forget exactly which card it was but I had a factory OC one. It wouldn't crunch stably until I reduced it to nvidia standard clocks. | |
|
|
All run fine on my ATI HD 5850's. Are there new badges for these tasks?
Greetings | |
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1930 ID: 352 Credit: 5,463,422,052 RAC: 5,787,600
|
Are there new badges for these tasks?
All BOINC tasks from Genefer family are counted toward standard GFN badge.
(manual GFN sieving goes to PSA, IIRC)
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186 | |
|
Rafael Volunteer tester
|
When you say without OC, is the card running at stock nvidia clocks, or is it factory overclocked anyway? I forget exactly which card it was but I had a factory OC one. It wouldn't crunch stably until I reduced it to nvidia standard clocks.
I tried both. As soon as I unsuspend the task, it just straight up gives the Max Err warning. I suspect the checkpoint is corrupted at this point, so I give up. I'll start my 3rd WU from scratch, this time with Nvidia clocks from the get go, and see what happens.
If it fails again, I'll start suspecting there's something wrong with the app itself... | |
|
|
So are we now running "live" units on 15? I ask because I have a few pending and they have been for a while with a 2nd dc required.
Cheers | |
|
Michael Goetz Volunteer moderator Project administrator
|
So are we now running "live" units on 15? I ask because I have a few pending and they have been for a while with a 2nd dc required.
Cheers
We are live. Most are DC of existing residues for b<10M, but most with b<500K don't have residues so we need to do two tests on those. That means waiting for a wingman 50% of the time with those tasks. However, at the rate we're going through these tasks, that won't last long. It looks like it will take only about 48 hours to process everything below the OCL and AVX b limits. That limit is probably around 2.2M, but we're going to switch from OCL to OCL3 around 1.8M just to play it safe. The CPU app will switch automatically from AVX to x87 when it needs to do so.
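The switch-over rule described here can be written down as a tiny Python sketch. It is purely illustrative: the two numbers are the approximate figures from this post ("probably around 2.2M" and "around 1.8M"), they apply only to the small-n tasks being discussed, and the constant and function names are invented for the example rather than taken from the server code.

```python
OCL_B_LIMIT_ESTIMATE = 2_200_000  # "probably around 2.2M" (rough estimate from the post)
SERVER_SWITCH_B = 1_800_000       # planned switch point, chosen to play it safe


def gpu_app_for_b(b):
    """Which GPU app the server would send for base b under the stated plan."""
    return "OCL" if b < SERVER_SWITCH_B else "OCL3"


# Fraction of the estimated OCL range given up as a safety margin.
safety_margin = 1 - SERVER_SWITCH_B / OCL_B_LIMIT_ESTIMATE

print(gpu_app_for_b(500_000), gpu_app_for_b(2_000_000))
print(f"safety margin: {safety_margin:.0%}")
```

Switching at 1.8M instead of the estimated 2.2M leaves a margin of about 18% below the point where OCL would be expected to run into its b limit.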
| |
|
|
So are we now running "live" units on 15? I ask because I have a few pending and they have been for a while with a 2nd dc required.
Cheers
We are live. Most are DC of existing residues for b<10M, but most with b<500K don't have residues so we need to do two tests on those. That means waiting for a wingman 50% of the time with those tasks. However, at the rate we're going through these tasks, that won't last long. It looks like it will take only about 48 hours to process everything below the OCL and AVX b limits. That limit is probably around 2.2M, but we're going to switch from OCL to OCL3 around 1.8M just to play it safe. The CPU app will switch automatically from AVX to x87 when it needs to do so.
Great! thanks for the update.. kinda fun pushing this gfn's around.
Cheers | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
The initial processing rate is rather astounding. In all the years of GFN testing, 32768 has been tested up to around b=10M. In just two or three days, we'll have retested up to b=2M. After that, things will slow down because the CPU app will switch to x87 which is about 10 times slower.
____________
My lucky number is 75898^524288+1 | |
|
|
So the PRPNet ports have been shut down, then? | |
|
|
The initial processing rate is rather astounding. In all the years of GFN testing, 32768 has been tested up to around b=10M. In just two or three days, we'll have retested up to b=2M. After that, things will slow down because the CPU app will switch to x87 which is about 10 times slower.
It might be my imagination but looking at the server stats it looks like usage and traffic are up from "normal". | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
So the PRPNet ports have been shut down, then?
Those two, yes.
____________
My lucky number is 75898^524288+1 | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
The initial processing rate is rather astounding. In all the years of GFN testing, 32768 has been tested up to around b=10M. In just two or three days, we'll have retested up to b=2M. After that, things will slow down because the CPU app will switch to x87 which is about 10 times slower.
It might be my imagination but looking at the server stats it looks like usage and traffic are up from "normal".
It's not your imagination.
____________
My lucky number is 75898^524288+1 | |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester
Joined: 1 Mar 14 Posts: 981 ID: 301928 Credit: 543,185,506 RAC: 36,711
|
On the "My badges" page, I see no pending GFN credit anymore, although I have ~600 tasks waiting for validation. Pending credit for other projects seems to function normally. Some GFN credit is counting up too. Only "GFN Pending" is missing. Is something broken or was it intentionally disabled?
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
On the "My badges" page, I see no pending GFN credit anymore, although I have ~600 tasks waiting for validation. Pending credit for other projects seems to function normally. Some GFN credit is counting up too. Only "GFN Pending" is missing. Is something broken or was it intentionally disabled?
I'll fix it soon.
____________
My lucky number is 75898^524288+1 | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
Fixed.
____________
My lucky number is 75898^524288+1 | |
|
|
These '15 cpu tasks seem to be hyper-threading friendly.
My i5-m460 is doing tasks at ~5 minutes each, or 8.5 minutes with HT turned on. | |
|
|
These '15 cpu tasks seem to be hyper-threading friendly.
My i5-m460 is doing tasks at ~5 minutes each, or 8.5 minutes with HT turned on.
So, it's slower with HT turned on? | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
These '15 cpu tasks seem to be hyper-threading friendly.
My i5-m460 is doing tasks at ~5 minutes each, or 8.5 minutes with HT turned on.
So, it's slower with HT turned on?
No, that's faster with HT on. With HT off it's doing 1 task in 5 minutes. With HT on, it's doing two tasks in only 8.5 minutes. (With HT off, it would take 10 minutes to do those same two tasks.)
____________
My lucky number is 75898^524288+1 | |
|
|
These '15 cpu tasks seem to be hyper-threading friendly.
My i5-m460 is doing tasks at ~5 minutes each, or 8.5 minutes with HT turned on.
So, it's slower with HT turned on?
Slower per 4 wu's? No.
this cpu is a dual core with HT.
Without HT, 4 Wu would take ~10 minutes to complete. 2 units then 2 units, or 5 minutes + 5 minutes.
With HT turned on 4 units running at once take ~8.5 minutes. | |
|
|
No, that's faster with HT on. With HT off it's doing 1 task in 5 minutes. With HT on, it's doing two tasks in only 8.5 minutes. (With HT off, it would take 10 minutes to do those same two tasks.)
Without HT, 4 Wu would take ~10 minutes to complete. 2 units then 2 units, or 5 minutes + 5 minutes.
With HT turned on 4 units running at once take ~8.5 minutes.
Thanks, I understand now. | |
|
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2577 ID: 29980 Credit: 548,779,887 RAC: 19,721
|
These '15 cpu tasks seem to be hyper-threading friendly.
My i5-m460 is doing tasks at ~5 minutes each, or 8.5 minutes with HT turned on.
Had to try this myself. i3-4150T about 93 seconds running 2 tasks, 153 seconds running 4. So HT is providing about 22% extra throughput in this case.
Same calc on the i5-460M numbers above would be 18% increase in throughput. | |
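For anyone who wants to redo the hyper-threading arithmetic, here is a minimal sketch. It compares tasks per second with two concurrent tasks against four concurrent tasks, using only the timings quoted in this thread. The function name `ht_gain` is made up for illustration.

```python
def ht_gain(tasks_a, secs_a, tasks_b, secs_b):
    """Fractional throughput gain of run B over run A (tasks per second)."""
    rate_a = tasks_a / secs_a
    rate_b = tasks_b / secs_b
    return rate_b / rate_a - 1.0

# i3-4150T: 2 concurrent tasks finish in 93 s, 4 concurrent tasks in 153 s
i3_gain = ht_gain(2, 93, 4, 153)

# i5-460M: 2 concurrent tasks in ~5 minutes, 4 concurrent tasks in ~8.5 minutes
i5_gain = ht_gain(2, 5 * 60, 4, 8.5 * 60)

print(f"i3-4150T: {i3_gain:.0%} extra throughput")
print(f"i5-460M:  {i5_gain:.0%} extra throughput")
```

The per-task time goes up with HT on, but the number of tasks finished per unit of time goes up too, which is the figure that matters for throughput.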
|
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2577 ID: 29980 Credit: 548,779,887 RAC: 19,721
|
Doing same on '16 units, I'm seeing about 15% throughput gain from running HT on the i3-4150T.
Another day I think I'll have to try this on something bigger like SGS... | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
32768 is now running OCL3 instead of OCL. It's good to b=16,777,216. Depending on your GPU OCL3 may be somewhat slower or somewhat faster than OCL.
We're currently just below b=1.9M. I expect the CPU app to switch to x87 around 2.4M or 2.5M.
____________
My lucky number is 75898^524288+1 | |
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,621,444 RAC: 0
|
Run time for my HD7970 GPU jumped from about 40 seconds for OCL to about 113 seconds for OCL3. | |
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1930 ID: 352 Credit: 5,463,422,052 RAC: 5,787,600
|
32768 is now running OCL3 instead of OCL. It's good to b=16,777,216. Depending on your GPU OCL3 may be somewhat slower or somewhat faster than OCL.
When we started sieving a couple of years ago, 100M seemed ridiculously high for pre-AVX CPUs.
(I did test very low n like 4096 up to 100M using PFGW back in...2009?)
A couple of years later, we will double-check years of effort on PRPNet, surpass it and hit a b level of 16M quite soon.
Back on topic - there is a known issue with the Mac app; are trickle-ups fixed and working?
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186 | |
|
|
Run time for my HD7970 GPU jumped from about 40 seconds for OCL to about 113 seconds for OCL3.
I have seen the same thing on tahiti-based cards. The OCL3 performance hit is pretty big. | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
Back on topic - there is a known issue with the Mac app; are trickle-ups fixed and working?
The issue is with OCL2, not OCL3. OCL3 works fine on Mac. It has nothing to do with trickles; the app fails completely.
On ATI/AMD GPUs, the Mac version of OCL2 fails due to a Mac driver problem. Initially, this will only affect n=17-mega, although sooner or later 32768 and 65536 will get large enough to require OCL2. There's no known workaround, so ATI/AMD GPUs on Macs won't be usable with tasks large enough to require OCL2.
____________
My lucky number is 75898^524288+1 | |
|
|
Looks like I just hit the limit on these '15 cpu units. CPU times jumped from ~10-12 minutes to up over 20. X87 onward for my 460m
I'm going to do some testing and see if HT is still helping or harming now. | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
Looks like I just hit the limit on these '15 cpu units. CPU times jumped from ~10-12 minutes to up over 20. X87 onward for my 460m
I'm going to do some testing and see if HT is still helping or harming now.
Exactly when you will switch to x87 depends on the CPU. CPUs without SSE (we're talking REALLY old, like my 32 bit Sempron) will probably switch quickest because the "default" transform has the lowest B limit. FMA, AVX, SSE3, and SSE4 all have somewhat different limits. While one CPU may complete the task using a high speed instruction set, another might need to use x87. Additionally, the CPU B limits are rather fuzzy, so over a range of numbers you'll have some completing with the high speed instructions while others switch to x87.
My 32-bit Sempron switched from "default" to "x87" a few hours ago. That increased the task time by about 50%. "Only 50%?" you might ask, but it's not that the Sempron does x87 very quickly. It's that its normal mode isn't all that fast to start with, so switching to x87 doesn't hurt it all that much.
____________
My lucky number is 75898^524288+1 | |
|
|
At the pace we are doing these double checks how long before we finish? Or is that a silly question.
____________
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
At the pace we are doing these double checks how long before we finish? Or is that a silly question.
It's not a silly question, but it's an irrelevant question.
The reason it's irrelevant is that the current pace is going to slow down considerably because A) OCL3 on many GPUs is slower than OCL, B) the CPU tasks are going to switch over to the much slower x87 if they haven't already done so, and C) because of (A) and (B), I expect many people to switch over to 65536 from 32768. In a few days (I'm guessing a week or two at most) 65536 will also reach its OCL and not-x87 B limits, and a similar slowdown will occur there too.
It's just too soon to answer that question.
____________
My lucky number is 75898^524288+1 | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
Here are some numbers on 32768 and 65536.
32768:
Current leading edge: about 2.3M
We'll be doing double checks up to about 10.4M. There are about 590K residues to double check in total. Not every task has a double check residue, however. Some were obviously bad, and some were originally tested with programs other than Genefer, so they can't be double checked.
OCL3 can test up to about 16.7M, after which we switch to OCL2.
65536:
The current leading edge is about 223k, although we also have tested from about 671K to about 840K.
We'll continue testing from 223K to 671K, then we jump to 840K. We don't have (many) residues below 671K, so we're doing two tests on most of these numbers. Double checks will continue up to about 4.2M. There are about 330K double check residues.
OCL3 is good up to about 11.8M before we need to switch to OCL2.
65536 is currently running OCL and will switch to OCL3 (and x87 on CPUs) in the vicinity of 1.8M or 1.9M.
65536 tasks will be large enough to be reported to T5K once b>=842,555.
____________
My lucky number is 75898^524288+1 | |
|
|
The initial processing rate is rather astounding. In all the years of GFN testing, 32768 has been tested up to around b=10M. In just two or three days, we'll have retested up to b=2M. After that, things will slow down because the CPU app will switch to x87 which is about 10 times slower.
Will credit remain the same? | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
The initial processing rate is rather astounding. In all the years of GFN testing, 32768 has been tested up to around b=10M. In just two or three days, we'll have retested up to b=2M. After that, things will slow down because the CPU app will switch to x87 which is about 10 times slower.
Will credit remain the same?
Credit is based on the amount of work done in each task, and the amount of work isn't changing. Therefore the credit doesn't change.
____________
My lucky number is 75898^524288+1 | |
|
|
Thanks, Michael. I just finished one of the slower ones (32768-CPU). Run time almost tripled and credit remained the same. Comes out to less than 15cr/hr for me. I guess I was hoping when you said slower you really meant longer. Will switch to 65536 until I notice the slowdown. | |
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1930 ID: 352 Credit: 5,463,422,052 RAC: 5,787,600
|
The current leading edge is about 223k, although we also have tested from about 671K to about 840K.
Mike, can we reset b max in progress counter for Genefer stats page?
Showing current leading edge would be more convenient I think.
It will jump to 840K when we arrive there.
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186 | |
|
|
The GFN16 work units are running faster than the GFN15 ones now. Is there a point in the future where the faster instruction set will again be used or is this the new normal? | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
The current leading edge is about 223k, although we also have tested from about 671K to about 840K.
Mike, can we reset b max in progress counter for Genefer stats page?
Showing current leading edge would be more convenient I think.
It will jump to 840K when we arrive there.
Not worth the effort. It's not easy to do, and the coding changes would only be used for a few days. We'll close the gap soon enough.
____________
My lucky number is 75898^524288+1 | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
The GFN16 work units are running faster than the GFN15 ones now. Is there a point in the future where the faster instruction set will again be used or is this the new normal?
New normal.
____________
My lucky number is 75898^524288+1 | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
131072 (low) and 131072 (mega) are now live!
I have opened up both the normal, sequential (aka "low") 131072 range and the mega-prime 131072 range.
The low 131072 range starts at b=1.3M. That's where we left off when we crunched 131072 a few months ago. Currently, CPU tasks are still using the fast instruction sets (FMA, AVX, etc.), but will need to switch to the slower x87 instructions around b=1.6M. On the GPU side, we're using OCL3, which will be usable until b=8,388,608.
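A side observation on the OCL3 limits quoted in this thread (16,777,216 for 32768, about 11.8M for 65536, 8,388,608 for 131072): all three fit the pattern b^2 * N <= 2^63. The sketch below just checks that fit. The formula is an assumption inferred from those three numbers, not a documented Genefer rule, and `ocl3_b_limit` is a made-up helper name.

```python
import math

def ocl3_b_limit(n):
    """Largest b with b**2 * 2**n <= 2**63.

    Assumption: this is an empirical fit to the limits quoted in the
    thread, not a statement of how Genefer computes its limit.
    """
    return math.isqrt(2**63 // 2**n)

for n in (15, 16, 17):
    print(f"N = {2**n:6d}: fitted OCL3 limit b = {ocl3_b_limit(n):,}")

# The 131072 mega range starts above the fitted n=17 limit,
# which is consistent with it needing OCL2 instead of OCL3.
mega_start = 42598524
print("mega range above OCL3 limit:", mega_start > ocl3_b_limit(17))
```

If the fit holds, each doubling of N lowers the usable b by a factor of sqrt(2), which would explain why the larger ranges hit OCL2 sooner.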
The mega-prime range starts at b=42598524. This will be using x87 for CPU tasks and OCL2 for GPU tasks, i.e., the slowest transforms due to the large b value. However, this makes for a relatively fast test for a mega-prime on the GPU (about 35 minutes on my GTX 580).
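The starting value b=42598524 makes sense once you count digits: a mega-prime has at least one million decimal digits, and b^131072+1 reaches that size right around this b. A rough sketch of the digit count follows; `gfn_digits` is a hypothetical helper, and the floating-point logarithm could be off by one in the rare case where the true value sits extremely close to a power of ten.

```python
import math

def gfn_digits(b, n):
    """Approximate decimal digit count of b**(2**n) + 1.

    Uses floor(N * log10(b)) + 1 with N = 2**n. Adding 1 to b**N does
    not change the number of digits for the values considered here.
    """
    return math.floor((2**n) * math.log10(b)) + 1

digits_mega = gfn_digits(42598524, 17)   # start of the 131072 mega range
digits_low = gfn_digits(1300000, 17)     # start of the 131072 low range

print("mega range start:", digits_mega, "digits")
print("low range start: ", digits_low, "digits")
```

The low range at b=1.3M is well under a million digits, which is why the two ranges are run separately.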
All of this is new work; there's nothing to be double checked, so I do expect that there are more primes to be found here.
Double precision is NOT required on the GPUs for either OCL2 or OCL3. That not only means that a wider variety of ATI/AMD GPUs may be used, but it also means if you have a GTX TITAN you can run it in its higher clock speed mode rather than double precision mode.
EDIT: Unfortunately, if you have an ATI/AMD GPU on a Mac, you can't use the GPU for 131072 Mega tasks. An Apple OpenCL driver bug prevents the OCL2 app from working. The OCL and OCL3 apps are not affected, and since OCL2 is only currently being used for 131072 Mega you can still run any of the other available GFN ranges. We apologize for the inconvenience.
____________
My lucky number is 75898^524288+1 | |
|
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2577 ID: 29980 Credit: 548,779,887 RAC: 19,721
|
Grabbed a single low unit on i5-4570S (3.2 GHz) and that's estimating 30 minutes (other 3 cores on PPSE). Another low unit on a GTX 960 is estimating 13.5 minutes. | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13780 ID: 53948 Credit: 343,953,395 RAC: 9,334
|
Sample run times for 131072, with Core i5-4670K (Haswell w/FMA3) and Nvidia GTX 580:
Low:
CPU (FMA3): 25 minutes
GPU (OCL3): 14 minutes
Mega:
CPU (x87): 5.5 hours
GPU (OCL2): 35 minutes
____________
My lucky number is 75898^524288+1 | |
|
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2577 ID: 29980 Credit: 548,779,887 RAC: 19,721
|
I can confirm the GTX 960 is doing "low" units in just under 13 minutes. R9 280X just did one unit in just over 13 minutes.
Moved all my CPUs on it too, but will be a while before I have timings. | |
|
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2577 ID: 29980 Credit: 548,779,887 RAC: 19,721
|
Some CPU times, again only for "low" units:
i7-2600k, 3.4 GHz, 4 cores, 1333 ram, 30 minutes
i3-4150T, 3.0 GHz, 1 core, 1333 ram, 28 minutes
i3-4150T, 3.0 GHz, 2 cores, 1600 ram, 30 minutes
i5-4570S, 3.2 GHz, 3 cores, 1600 ram, 26 minutes
i7-6700k, 4.0 GHz, 4 cores, 3333 ram, 17 minutes
Systems running 1 or 3 cores have one core kept aside for GPU.
GPU-Z shows the GTX 960 running about 75% TDP, so although they're pretty fast, in power per unit they're still poor compared to CPU at this point.
e.g. a stock GTX 960 is 120W TDP, so 75% of that is 90W. My card is a factory OC one, so in reality the TDP is likely even higher. In an hour it would do nearly 5 units. Even my oldest CPU, the i7-2600k, would do 8 units in the same time, and has a TDP of 95W. | |
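The same comparison expressed as energy per unit, as a rough sketch. It uses only the figures above (GTX 960 at 75% of a 120W TDP doing one unit in about 13 minutes, i7-2600k at 95W TDP running 4 units at a time in about 30 minutes) and assumes the CPU draws its full TDP, which is an assumption rather than a measurement.

```python
def watt_hours_per_unit(power_w, units_per_hour):
    """Energy spent per completed unit, in watt-hours."""
    return power_w / units_per_hour

# GTX 960: ~13 minutes per unit at about 75% of a 120 W TDP
gpu_units_per_hour = 60 / 13
gpu_wh = watt_hours_per_unit(0.75 * 120, gpu_units_per_hour)

# i7-2600k: 4 cores, ~30 minutes per unit each, assumed to draw its 95 W TDP
cpu_units_per_hour = 4 * 60 / 30
cpu_wh = watt_hours_per_unit(95, cpu_units_per_hour)

print(f"GTX 960:  {gpu_units_per_hour:.1f} units/h, {gpu_wh:.1f} Wh per unit")
print(f"i7-2600k: {cpu_units_per_hour:.1f} units/h, {cpu_wh:.1f} Wh per unit")
```

This ignores the CPU core that feeds the GPU and the rest of the system, so treat it as a ballpark only.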
|
|
Hi,
I notice a few people are able to run their CAL AMD Radeon (Tahiti) 3072MB series GPU's.
At present I am still unable to get my CAL (Hawaii) (4096MB) GPU to do any PrimeGrid tasks.
Tested card with Collatz & Moo Wrapper. Works on those tasks.
Have installed ATI Stream SDK (not sure if this was necessary, but did it anyhow.)
Would really love to be able to put this card to work here on PG.
Am I missing something, or is this series of card simply not compatible?
Thanks.
Jpeg | |
|
|
Hi,
I notice a few people are able to run their CAL AMD Radeon (Tahiti) 3072MB series GPU's.
At present I am still unable to get my CAL (Hawaii) (4096MB) GPU to do any PrimeGrid tasks.
Tested card with Collatz & Moo Wrapper. Works on those tasks.
Have installed ATI Stream SDK (not sure if this was necessary, but did it anyhow.)
Would really love to be able to put this card to work here on PG.
Am I missing something, or is this series of card simply not compatible?
Thanks.
Jpeg
I had the same problem. I added an app_info file and had to manually download the app files because the files wouldn't install when the project attached. | |
|
|
Some CPU times, again only for "low" units:
i7-2600k, 3.4 GHz, 4 cores, 1333 ram, 30 minutes
i3-4150T, 3.0 GHz, 1 core, 1333 ram, 28 minutes
i3-4150T, 3.0 GHz, 2 cores, 1600 ram, 30 minutes
i5-4570S, 3.2 GHz, 3 cores, 1600 ram, 26 minutes
i7-6700k, 4.0 GHz, 4 cores, 3333 ram, 17 minutes
Systems running 1 or 3 cores have one core kept aside for GPU.
GPU-Z shows the GTX 960 running about 75% TDP, so although they're pretty fast, in power per unit they're still poor compared to CPU at this point.
e.g. a stock GTX 960 is 120W TDP, so 75% of that is 90W. My card is a factory OC one, so in reality the TDP is likely even higher. In an hour it would do nearly 5 units. Even my oldest CPU, the i7-2600k, would do 8 units in the same time, and has a TDP of 95W.
It would be interesting if we had an across the board GFN Challenge right now, utilizing all of the current Genefer subprojects for cpu and gpu. | |
|
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2577 ID: 29980 Credit: 548,779,887 RAC: 19,721
|
More data on n=17 low units.
i7-2600k, 3.4 GHz, 4 cores, 1333 ram, 29.8 minutes
i3-4150T, 3.0 GHz, 2 cores, 1333 ram, 28.2 minutes
i5-4570S, 3.2 GHz, 4 cores, 1600 ram, 26.3 minutes
i7-6700k, 4.0 GHz, 4 cores, 3333 ram, 16.7 minutes
Since the boxes have been running overnight I've got some refined numbers for CPU-only run times. No new data for GPUs: while the GPU was faster than the single CPU core it displaced, it used as much power as a higher end quad core to do it. I don't need that much heating yet.
Looking at the unit numbers some more, I would make the following observations:
1, if I normalise my Haswell system results for CPU clock they seem to be running about the same rate. It follows there doesn't appear to be a ram bandwidth dependency like for bigger LLRs. As far as I can see, there is no "FFT size" like on LLR for me to use to work out approximate data set size. Relatively speaking, tasks are smaller than PPS which was using 192k FFT as of a few days ago, which would take 1.5MB of ram. That is a nice figure, as i3 and i5 CPUs typically have 1.5MB of cache per core, hence not seeing the ram bandwidth dependency. i7 CPUs have at least 2MB/core so have a bit more headroom as unit sizes get bigger over time.
2, taking Haswell as the reference, with CPU clock normalised, Skylake is 26% faster. Sandy Bridge is 17% slower.
3, using published CPU TDP values, not measured actual power consumption, I calculated a "performance per watt" score, where higher is better. Skylake is about 3.8, Haswell between 2.9 and 3.4. Sandy Bridge 2.0. | |
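Observations 2 and 3 can be reproduced from the run times listed above with the sketch below. Two things are assumptions here: the TDP values for the Haswell and Skylake parts (35W, 65W and 91W, taken from Intel's published specs; only the 95W of the i7-2600k appears in this thread), and the score itself, taken to be units per day per watt of TDP because that happens to match the quoted numbers.

```python
# (name, family, clock in GHz, cores used, minutes per unit, TDP in watts)
# TDPs other than the i7-2600k's 95 W are assumed from Intel's published specs.
SYSTEMS = [
    ("i7-2600k", "Sandy Bridge", 3.4, 4, 29.8, 95),
    ("i3-4150T", "Haswell",      3.0, 2, 28.2, 35),
    ("i5-4570S", "Haswell",      3.2, 4, 26.3, 65),
    ("i7-6700k", "Skylake",      4.0, 4, 16.7, 91),
]

def speed_per_ghz(ghz, minutes):
    """Clock-normalised speed: units per (GHz * minute), higher is better."""
    return 1.0 / (ghz * minutes)

def units_per_day_per_watt(cores, minutes, tdp_w):
    """Assumed 'performance per watt' score: units per day per watt of TDP."""
    return cores * (24 * 60 / minutes) / tdp_w

# Reference: average clock-normalised speed of the two Haswell systems
haswell = [speed_per_ghz(g, m) for _, fam, g, _, m, _ in SYSTEMS if fam == "Haswell"]
reference = sum(haswell) / len(haswell)

results = {}
for name, family, ghz, cores, minutes, tdp_w in SYSTEMS:
    relative = speed_per_ghz(ghz, minutes) / reference - 1.0
    score = units_per_day_per_watt(cores, minutes, tdp_w)
    results[name] = (relative, score)
    print(f"{name} ({family}): {relative:+.0%} per clock vs Haswell, "
          f"{score:.1f} units/day/W")
```

Since TDP is a design figure rather than measured draw, the scores are only good for a rough ranking.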
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 701 ID: 164101 Credit: 305,166,630 RAC: 523
|
Looking at the unit numbers some more, I would make the following observations:
1, if I normalise my Haswell system results for CPU clock they seem to be running about the same rate. It follows there doesn't appear to be a ram bandwidth dependency like for bigger LLRs. As far as I can see, there is no "FFT size" like on LLR for me to use to work out approximate data set size. Relatively speaking, tasks are smaller than PPS which was using 192k FFT as of a few days ago, which would take 1.5MB of ram. That is a nice figure, as i3 and i5 CPUs typically have 1.5MB of cache per core, hence not seeing the ram bandwidth dependency. i7 CPUs have at least 2MB/core so have a bit more headroom as unit sizes get bigger over time.
For n=17, FFT size is 2^17 = 128k.
It is the size of the data array (read/write) but you should add to it the size of the sin/cos table.
It depends on the implementation; it is 3/8 * FFT size for the CPU version of Genefer. That is a read-only array. L2/L3 also act as instruction caches, so the size of the binary should be included too.
RAM size = 1.5 * FFT size is a better estimate for Genefer or LLR than RAM size = FFT size.
Then Genefer at n=17 takes about 1.5MB of ram.
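The estimate can be written out as a small sketch. It assumes 8 bytes per FFT element (one double), which is what makes a 192k FFT come to 1.5MB in the quoted post; the function name and the `factor` parameter are illustrative only.

```python
BYTES_PER_ELEMENT = 8  # assumption: one 64-bit double per FFT element

def ram_estimate_mb(n, factor=1.5):
    """Rough working-set estimate in MB: factor * FFT size * element size.

    FFT size is 2**n. factor=1.5 is the rule of thumb given above;
    factor=1.375 counts only the data array plus the 3/8 sin/cos table.
    """
    fft_size = 2 ** n
    return factor * fft_size * BYTES_PER_ELEMENT / 2**20

n17_total = ram_estimate_mb(17)                # rule-of-thumb figure
n17_arrays = ram_estimate_mb(17, 1 + 3 / 8)    # data array + sin/cos table only

print(f"n=17, rule of thumb:      {n17_total:.3f} MB")
print(f"n=17, data + trig table:  {n17_arrays:.3f} MB")
```

The gap between the two figures is the allowance for code and other overhead sharing the cache.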
| |
|
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2577 ID: 29980 Credit: 548,779,887 RAC: 19,721
 |
|