Message boards :
Generalized Fermat Prime Search :
Genefer 3.2.2 has been released
Author |
Message |
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
                               
|
The latest version of Genefer, v. 3.2.2, has been released and is live for both short and World Record tasks. Some notable changes:
1) The OpenCL and CPU apps are significantly faster than the previous versions.
2) It is likely that the OpenCL app is now faster than the CUDA app on any Nvidia GPU, but especially on newer (Kepler or later, 6xx or newer) GPUs. On the app selection page, the OpenCL app now appears before the CUDA app. NOTE: On Nvidia cards, the OpenCL driver hogs a CPU core. If you have an Nvidia GPU, you may choose between the faster app (OpenCL) and the app that uses very little CPU (CUDA).
3) The CPU app now supports the FMA3 instructions on Intel Haswell CPUs, which makes them even faster.
4) The bug in the Mac ATI app has been fixed.
____________
My lucky number is 75898^524288+1 | |
|
|
This is welcome news. Thanks to the developers for the new and improved versions of Genefer apps and the testers who took the time and effort to evaluate them. | |
|
|
Hello !
Thank you for this excellent news.
When will the new app be released ?
Just tried, and I received Genefer 3.0.1 ?
Or : is it the name of the new app ?
Thank You
Best Regards | |
|
|
Just tried, and I received Genefer 3.0.1 ?
Or : is it the name of the new app ?
Yes, 3.0.1 is the BOINC app version, this corresponds to version 3.2.2 of the Genefer code.
Cheers
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1963 ID: 352 Credit: 6,402,379,339 RAC: 2,512,564
                                      
|
R280X, slow C2D CPU and some benchmarks using different versions of Genefer.
This host is on PRPNet so I should know in an hour about the 827258^524288+1 test. I'm curious not only about the time but also the Err value.
EDIT: 827258^524288+1 is a probable composite. (RES=6dfe04ee1cce65c9) (3102549 digits) (err = 0.3906) (time = 1:19:20)
Note Err value and also note Genefer Mark.
Version 3.1.2.7
658332^524288+1 Time: 824 us/mul. Err: 0.5000 3050541 digits
360204^4194304+1 Time: 5.38 ms/mul. Err: 0.5000 23305854 digits
Genefer Mark = 74.
Version 3.2.0.0
658332^524288+1 Time: 551 us/mul. Err: 0.2266 3050541 digits
360204^4194304+1 Time: 3.91 ms/mul. Err: 0.1953 23305854 digits
Genefer Mark = 102.
Version 3.2.2.0
Generalized Fermat Number Bench
658332^524288+1 Time: 426 us/mul. Err: 0.1875 3050541 digits
360204^4194304+1 Time: 3.42 ms/mul. Err: 0.1719 23305854 digits
Genefer Mark = 116.
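For anyone who wants to sanity-check the digit counts in the benchmark output: the number of decimal digits of b^N+1 is floor(N*log10(b))+1. Here is a minimal Python sketch of that formula. The helper name `gfn_digits` is made up for illustration and is not part of Genefer, and it relies on floating-point `log10`, which is fine for these sizes unless N*log10(b) lands extremely close to an integer.

```python
import math

def gfn_digits(b: int, n: int) -> int:
    """Decimal digit count of the generalized Fermat number b^n + 1.

    Hypothetical helper for illustration only (not part of Genefer).
    For the even bases used in GFN searches, b^n + 1 is odd, so it can
    never be a power of 10 and has the same digit count as b^n.
    """
    return math.floor(n * math.log10(b)) + 1

# Candidates that appear in the benchmark output and the PRPNet test.
for b, n in [(658332, 524288), (360204, 4194304), (827258, 524288)]:
    print(f"{b}^{n}+1 has {gfn_digits(b, n)} digits")
```

This should reproduce the digit counts that Genefer prints next to each test.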
____________
My stats | |
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
                    
|
I started BOINC with a GFN short and copied the Genefer.exe from a slot directory.
I get the following when I try to run from the command line:
C:\Users\Downloads\genefer-3.2.2>primegrid_genefer_3_2_2_0_3.01_windows_intelx86__atiGFN.exe -l
This version of C:\Users\KarpinFamily\Downloads\genefer-3.2.2\primegrid_genefer_3_2_2_0_3.01_windows_intelx86__atiGFN.exe is not compatible with the version of Windows you're running. Check your computer's system information to see whether you need a x86 (32-bit) or x64 (64-bit) version of the program, and then contact the software publisher.
I am running Windows 7 64 bit.
Is this application 32 bit? How do I run it from the command line in 32 bit mode? Where can I download the 64 bit version?
When I try to run the exe from the command line I also get a comment box titled "Unsupported 16-Bit Application". Doubt that. | |
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
                    
|
I started BOINC with a GFN short and copied the Genefer.exe from a slot directory.
I get the following when I try to run from the command line:
C:\Users\Downloads\genefer-3.2.2>primegrid_genefer_3_2_2_0_3.01_windows_intelx86__atiGFN.exe -l
This version of C:\Users\KarpinFamily\Downloads\genefer-3.2.2\primegrid_genefer_3_2_2_0_3.01_windows_intelx86__atiGFN.exe is not compatible with the version of Windows you're running. Check your computer's system information to see whether you need a x86 (32-bit) or x64 (64-bit) version of the program, and then contact the software publisher.
I am running Windows 7 64 bit.
Is this application 32 bit? How do I run it from the command line in 32 bit mode? Where can I download the 64 bit version?
When I try to run the exe from the command line I also get a comment box titled "Unsupported 16-Bit Application". Doubt that.
Hmm. My fault. You have to copy the exe from ../BOINC/projects/www.primegrid.com/
Hint might have been the file size. b limit test in progress. | |
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
                    
|
>primegrid_genefer_3_2_2_0_3.01_windows_intelx86__atiGFN.exe -l
geneferocl 3.2.2 (Windows/OpenCL/32-bit)
Copyright 2001-2014, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: primegrid_genefer_3_2_2_0_3.01_windows_intelx86__atiGFN.exe -l
Priority change succeeded.
Running on platform 'AMD Accelerated Parallel Processing', device 'Tahiti', version 'OpenCL 1.2 AMD-APP (1445.5)' and driver '1445.5 (VM)'.
Generalized Fermat Number b Limits
Running limits test for transform implementation "OCL"
The upper bound m = 8192, b = 2915000, Err = 0.2969
Starting b = 3340000, Err b = 2920000, Err = 0.3281, 5 Err b = 0
The upper bound m = 16384, b = 2390000, Err = 0.2969
Starting b = 2720000, Err b = 2395000, Err = 0.3125, 5 Err b = 0
The upper bound m = 32768, b = 1965000, Err = 0.2969
Starting b = 2200000, Err b = 1970000, Err = 0.3438, 5 Err b = 0
The upper bound m = 65536, b = 1590000, Err = 0.2969
Starting b = 1790000, Err b = 1595000, Err = 0.3125, 5 Err b = 0
The upper bound m = 131072, b = 1295000, Err = 0.2969
Starting b = 1450000, Err b = 1300000, Err = 0.3125, 5 Err b = 0
The upper bound m = 262144, b = 1085000, Err = 0.2813
Starting b = 1180000, Err b = 1090000, Err = 0.3125, 5 Err b = 0
The upper bound m = 524288, b = 895000, Err = 0.2813
Starting b = 960000, Err b = 900000, Err = 0.3125, 5 Err b = 0
The upper bound m = 1048576, b = 750000, Err = 0.3086
Starting b = 780000, Err b = 755000, Err = 0.3125, 5 Err b = 0
The upper bound m = 2097152, b = 600000, Err = 0.2813
Starting b = 630000, Err b = 605000, Err = 0.3125, 5 Err b = 0
The upper bound m = 4194304, b = 505000, Err = 0.2896
Starting b = 510000, Err b = 0, Err = 0.0000, 5 Err b = 0
The upper bound m = 8388608, b = 405000, Err = 0.3125
Starting b = 410000, Err b = 0, Err = 0.0000, 5 Err b = 0
So no change vs 3.2.1 for HD7970 GPU. | |
|
|
As you can see there has been a significant improvement in the OpenCL code (and also in the CPU version for some CPUs). The vast majority of the credit for this goes to Yves and his rediscovery and implementation of the Z-transform algorithm.
Cheers
- Iain
Version 3.1.2.7
658332^524288+1 Time: 824 us/mul. Err: 0.5000 3050541 digits
360204^4194304+1 Time: 5.38 ms/mul. Err: 0.5000 23305854 digits
Genefer Mark = 74.
Version 3.2.0.0
658332^524288+1 Time: 551 us/mul. Err: 0.2266 3050541 digits
360204^4194304+1 Time: 3.91 ms/mul. Err: 0.1953 23305854 digits
Genefer Mark = 102.
Version 3.2.2.0
Generalized Fermat Number Bench
658332^524288+1 Time: 426 us/mul. Err: 0.1875 3050541 digits
360204^4194304+1 Time: 3.42 ms/mul. Err: 0.1719 23305854 digits
Genefer Mark = 116.
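To put numbers on the improvement, the per-multiplication times in the benchmark quoted here can be turned into speedup ratios against version 3.1.2.7. A small Python sketch follows; the variable and function names are only for illustration, and the figures come from this one R280X host, so other hardware will differ.

```python
# Per-multiplication times from the quoted benchmark, in microseconds.
# Each tuple is (658332^524288+1, 360204^4194304+1).
times_us = {
    "3.1.2.7": (824.0, 5380.0),
    "3.2.0.0": (551.0, 3910.0),
    "3.2.2.0": (426.0, 3420.0),
}

def speedup(old: float, new: float) -> float:
    """How many times faster `new` is than `old` (ratio of run times)."""
    return old / new

baseline = times_us["3.1.2.7"]
for version, times in times_us.items():
    ratios = [speedup(old, new) for old, new in zip(baseline, times)]
    print(version, ", ".join(f"{r:.2f}x" for r in ratios))
```

On these figures the smaller test runs a bit under twice as fast in 3.2.2.0 as in 3.1.2.7, and the larger one a bit over one and a half times as fast.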
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
Excellent news about the FMA3 support for CPUs, thanks everyone. | |
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 843 ID: 164101 Credit: 306,521,622 RAC: 5,385

|
I'm curious not only about the time but also the Err value.
The number of operations is smaller with the new transform. That's why error is smaller. The number of flops per primality test has decreased. | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
                               
|
We've improved the error handling a bit in most of the 3.2.2 apps and released 3.2.3 as BOINC 3.03.
Note that the 3.2.2/3.01 Windows CUDA and Windows CPU apps were already working as intended and have not been updated.
____________
My lucky number is 75898^524288+1 | |
|
|
Does the new Genefer 3.2.2 CPU app indicate whether or not it's using the AVX or the FMA transform? I do have a Haswell CPU and according to the stderr.txt file from the Genefer CPU app I'm running it picked the AVX transform. I'm running it on Windows 8.1 64 bit. Does that mean the app is using AVX instructions and FMA or it chose AVX alone? I'm a bit confused about this workunit. | |
|
|
One way to soften the impact of the "cpu hog" effect is to turn hyperthreading ON. GeneferOCL still wants 100% of a "core", but you've got twice as many "cores" to play with. There's no apparent impact on the overall runtime of GFN-short units, at least... the GPU is the deciding factor there; I've run a couple as a test using the newest stock app. This is on a 2600K, Ubuntu 14.04, GTX770.
Of course the down-side is that if you're running LLR on the other cores, they will take roughly twice as long to run (depending on how many you've got going). But if you're cpu-sieving, you should have HT ON anyway.
G | |
|
|
I don't know how advanced some of the volunteer programmers are here at PrimeGrid (thank you all), but I came across some very advanced optimization info and testing software from a guy named Agner Fog that may be of help in continuing to make the GFN app (and other PrimeGrid apps) better, faster, and more efficient. Hopefully there's something helpful there.
http://agner.org/optimize/
____________
Largest Primes to Date:
As Double Checker: SR5 109208*5^1816285+1 Dgts-1,269,534
As Initial Finder: SR5 243944*5^1258576-1 Dgts-879,713
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
                               
|
Yes it does. If it says it's using AVX, it's using AVX.
Please let us know about any irregularities.
____________
My lucky number is 75898^524288+1 | |
|
|
Yes it does. If it says it's using AVX, it's using AVX.
Please let us know about any irregularities.
It seems to be running just fine, but shouldn't the program have chosen FMA instead of AVX or am I just confused? I don't have any benchmarks to go off of since it's been a while since I've run an AVX CPU Genefer app. | |
|
|
Quick calculation of older vs newer on my system with the 750TI. The processor is a 1st gen i7 2.8G.
Last couple of WR I ran went over 13 days, about 1,130,000 seconds. The current one running is over 6% and I hand calculate it will take over 11 days, or about 1,002,000 seconds. About an 11% speed increase.
Does this fit in the expected speed increase on the NVidia cards? If so, one should now be able to run them on the GTX 640 and be able to get them in on time, instead of about a 2 day overage.
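As a quick check of the arithmetic, here is the calculation spelled out in Python. The function names are just for illustration. Note that "11%" is the reduction in run time; expressed as extra throughput (tasks per unit time) the same figures come out a little higher.

```python
def time_reduction(old_s: float, new_s: float) -> float:
    """Fraction of run time saved going from old_s to new_s."""
    return (old_s - new_s) / old_s

def throughput_gain(old_s: float, new_s: float) -> float:
    """Fractional increase in tasks completed per unit time."""
    return old_s / new_s - 1.0

old_s = 1_130_000  # previous WR task, seconds (a bit over 13 days)
new_s = 1_002_000  # hand-estimated time for the current task, seconds

print(f"run time reduced by {time_reduction(old_s, new_s):.1%}")
print(f"throughput up by    {throughput_gain(old_s, new_s):.1%}")
```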
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
                               
|
Yes it does. If it says it's using AVX, it's using AVX.
Please let us know about any irregularities.
It seems to be running just fine, but shouldn't the program have chosen FMA instead of AVX or am I just confused? I don't have any benchmarks to go off of since it's been a while since I've run an AVX CPU Genefer app.
Yes, it should have chosen FMA, at least in most cases. The program automatically chooses whichever transform is fastest. However, what sometimes can happen is that the cpu is busy when testing the FMA transform and not so busy when testing the AVX transform, so it might think AVX is faster and use that instead.
Is it consistently choosing AVX, or just that one time?
This may be an area where we might be able to improve on the program going forward.
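To illustrate why a busy CPU can tip the choice, and one common way to make such a selection less sensitive to it, here is a rough Python sketch. It is not how Genefer actually picks its transform (its internals aren't described in this thread); `pick_fastest`, its parameters, and the candidate names are all assumptions for the example. The idea is to time each candidate several times and compare the best (minimum) time, since interference from other work can only make a run slower, never faster.

```python
import time
from typing import Callable, Dict

def pick_fastest(candidates: Dict[str, Callable[[], None]],
                 repeats: int = 5,
                 clock: Callable[[], float] = time.perf_counter) -> str:
    """Return the name of the candidate with the smallest best-of-`repeats` time.

    Hypothetical illustration only; not Genefer's actual selection logic.
    """
    best: Dict[str, float] = {}
    for name, run in candidates.items():
        samples = []
        for _ in range(repeats):
            start = clock()
            run()  # one benchmark pass of this transform
            samples.append(clock() - start)
        # The minimum is the sample least disturbed by other load.
        best[name] = min(samples)
    return min(best, key=best.get)
```

With `repeats=1`, a single unlucky measurement of the FMA transform while the CPU is busy is enough to make AVX look faster; with several repeats, one clean run per transform is enough to get the comparison right.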
____________
My lucky number is 75898^524288+1 | |
|
|
Thanks for the suggestions and ideas. It looks like the program chose the AVX transform for the first workunit I was crunching, so I didn't want to but I aborted it and paused all of my CPU work for BOINC, then downloaded another couple of Genefer CPU workunits.
This time the new workunit did choose FMA instead. It looks like if the CPU is busy it may not benchmark correctly. But I would defer to the program's choice anyway, since FMA is not necessarily always faster than AVX for every workunit. | |
|
tng
Joined: 29 Aug 10 Posts: 499 ID: 66603 Credit: 50,799,389,667 RAC: 31,522,489
                                                    
|
Looks like a nice performance improvement for my Titan on WR.
____________
| |
|
|
Ya my 650 ti boost did the last WR in 259 hrs. and I'm estimating the new one will be done in about 205 hours. That's a nice improvement. | |
|
|
My GTX 660 Ti finished in 9:49 h (Genefer 3.03, GFN-Short), which is 5:15 h faster.
These are Short tasks, not WR. I don't know why, but it's true.
Perhaps it's the newest driver update: NVIDIA GeForce GTX 660 Ti (2048MB), driver: 34052.
Wow, what a magic speed! Incredible but true!
Special thx to all Volunteers ;)
Best Regards
Tom
____________
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
                               
|
We discovered that the CPU version of Genefer was trying to run the FMA3 transform on AMD Bulldozer CPUs -- which have FMA4 instructions but not FMA3.
That's been fixed in version 3.2.4 (BOINC v. 3.04), which is now live. Only the CPU Genefer apps have changed.
Please let me know if anything unexpected happens.
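For those curious how to tell the two instruction sets apart on their own machine: on Linux, /proc/cpuinfo lists FMA3 as the flag "fma" and AMD's FMA4 as the separate flag "fma4". Below is a minimal Python sketch of a flag check. The function names and the returned labels are hypothetical and this is not Genefer's detection code. It compares whole flag names, because a plain substring search for "fma" would also match "fma4".

```python
def read_cpu_flags(path: str = "/proc/cpuinfo") -> set:
    """Return the CPU feature flags from the first 'flags' line (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

def choose_transform(cpu_flags: set) -> str:
    """Pick a transform label from a set of flag names (illustrative labels)."""
    if "fma" in cpu_flags:   # exact set membership: "fma4" does not count
        return "FMA"
    if "avx" in cpu_flags:
        return "AVX"
    return "other"

# Flag sets shaped like a first-generation Bulldozer and a Haswell.
print(choose_transform({"sse2", "avx", "fma4", "xop"}))
print(choose_transform({"sse2", "avx", "avx2", "fma"}))
```

On a Linux box, `choose_transform(read_cpu_flags())` applies the same check to the local CPU.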
____________
My lucky number is 75898^524288+1 | |
|
|
Is there a difference between the version 3.01 OpenCL NVidia Genefer WR program and the version 3.03 OpenCL Genefer WR? I noticed when I downloaded the workunit on the 28th of July it was 3.01, since then the program itself has been updated to 3.03. If I understand correctly the underlying version is 3.2.2 and that hasn't changed. Is that just some minor improvements and adjustments or something more?
Also, thanks for all the hard work improving these programs. There's definitely a big speedup in the processing... from 259 hours to 216 hours. That's a huge improvement! | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
                               
|
Is there a difference between the version 3.01 OpenCL NVidia Genefer WR program and the version 3.03 OpenCL Genefer WR? I noticed when I downloaded the workunit on the 28th of July it was 3.01, since then the program itself has been updated to 3.03. If I understand correctly the underlying version is 3.2.2 and that hasn't changed. Is that just some minor improvements and adjustments or something more?
Also, thanks for all the hard work improving these programs. There's definitely a big speedup in the processing... from 259 hours to 216 hours. That's a huge improvement!
3.01 was version 3.2.2, and 3.03 is version 3.2.3. The difference is improved error handling.
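Since the two numbering schemes keep causing confusion, here is the mapping mentioned so far in this thread as a small Python lookup. The dictionary and function names are illustrative only. Note that 3.04 applies only to the CPU apps, and that the Windows CUDA and Windows CPU apps stayed at 3.2.2/3.01 when 3.03 came out.

```python
# BOINC app version -> Genefer code version, as stated in this thread.
BOINC_TO_GENEFER = {
    "3.01": "3.2.2",
    "3.03": "3.2.3",  # improved error handling
    "3.04": "3.2.4",  # CPU apps only: FMA3 no longer attempted on Bulldozer
}

def genefer_version(boinc_version: str) -> str:
    """Look up the Genefer code version for a BOINC app version string."""
    return BOINC_TO_GENEFER.get(boinc_version, "unknown")

print(genefer_version("3.03"))
```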
____________
My lucky number is 75898^524288+1 | |
|
|
primegrid_genefer_3_2_3_0_3.03_x86_64-pc-linux-gnu__OCLcudaGFN downloaded for me fine and appears to be running OK on my Ubuntu 14.04 (64-bit) box, with driver 331.79, GTX770. ETC per stderr.txt is 7:32:35, which is basically the same as several verified results I reported a few weeks ago with the prior version.
--Gary
EDIT: obviously this comment applies to the GPU app. | |
|
composite Volunteer tester
Joined: 16 Feb 10 Posts: 1172 ID: 55391 Credit: 1,210,027,308 RAC: 1,138,821
                        
|
primegrid_genefer_3_2_3_0_3.03_x86_64-pc-linux-gnu__OCLcudaGFN is also working with Linux Mint LMDE (another Debian derivative) using 64-bit Nvidia drivers from the distro's repo.
One result validated, and one failed with maxERR exceeded. I haven't been watching the GPU internal temperature lately; it's now at 71 C, previously 67 C. Cleaning the dust off the case grills didn't help, but manually setting the GPU fan speed to 100% drops the GPU temperature to 61 C. Looks like I can't trust the factory default GPU fan speed setting to keep the system cool enough during GFN. | |
|
|
One result validated, and one failed with maxERR exceeded. I haven't been watching the GPU internal temperature lately; it's now at 71 C, previously 67 C. Cleaning the dust off the case grills didn't help, but manually setting the GPU fan speed to 100% drops the GPU temperature to 61 C. Looks like I can't trust the factory default GPU fan speed setting to keep the system cool enough during GFN.
If you find that the results still fail with maxERR exceeded, try reducing the memory clock speed incrementally by 100 MHz until you have success. I had to do this for GFN-World Record tasks on a GTX TITAN Black SC and when I found a memory clock speed that worked, I then overclocked the GPU core clock by a bit, so overall no perceived loss in processing speed.
I am using a closed-loop all-in-one liquid cooler on this card and temps don't exceed 47° C when running GFN-WR tasks. These liquid a-i-o coolers do not cool the card's VRMs as well as the reference card's blower-type cooler, but the VRM's extra heat doesn't seem to affect this particular card. | |
|
|
The CPU app is really fast now, at least combined with an FMA3 CPU. I ran 3 WUs with HT off, leaving one core free, and they all completed in well under 33 hours (2 of them have validated). Isn't that now on a par with a 570 and beating a 670? i.e. the runtime for one leading-edge Genefer WU on a 570 is over 10 hours, and a bit slower on a 670. That's progress. I wonder when the time will come for WR tasks to be available for the CPU? A big advisory on the PG preferences page would certainly be in order, basically saying "Don't be silly with these, and even if you've got the fastest CPU money can buy, probably don't use all cores.". Would it be any "worse" than SoB? People are trusted to run those and the timeout is two months, resulting in threads like the "Side bets on wingmen" one.
Would it be possible, when a WU is sent out, IF there is a CPU asking for work, to send one duplicate to a GPU and the other to a CPU? i.e., never send the initial two both to CPU, and then create the potential for a 2-month wait before either task even gets re-sent. Just thinking aloud, although I really would like to test a WR task on my CPU. I think I've asked before, but please put me at the top of the list for a trial of GFN-WR on the CPU. | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
                               
|
Would it be possible, when a WU is sent out, IF there is a CPU asking for work, to send one duplicate to a GPU and the other to a CPU? i.e., never send the initial two both to CPU, and then create the potential for a 2-month wait before either task even gets re-sent. Just thinking aloud, although I really would like to test a WR task on my CPU. I think I've asked before, but please put me at the top of the list for a trial of GFN-WR on the CPU.
Not easily.
____________
My lucky number is 75898^524288+1 | |
|
|
What was the last important change you implemented to PG which was "easy"?
Yes, I am half-joking, my main reason for posting earlier was to say "Wow, the CPU Genefer app is fast, it used to be basically a GPU-or-almost-nothing thing.".
But now it isn't, so I was just looking ahead. | |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
                               
|
What was the last important change you implemented to PG which was "easy"?
Yes, I am half-joking, my main reason for posting earlier was to say "Wow, the CPU Genefer app is fast, it used to be basically a GPU-or-almost-nothing thing.".
But now it isn't, so I was just looking ahead.
This is the "forget about it" kind of difficult.
____________
My lucky number is 75898^524288+1 | |
|
|
Fair enough. I therefore revert to my original point - the CPU GFN app is now very fast, and I think more people should consider using it. I know the TDP of CPUs (especially when overclocked) is a highly misleading figure, but there are applications which can show you how much power your CPU is consuming, such as Asus' AI Suite. When running 3 GFN WUs, my 4770K@4.3 fluctuates between 95 and 105 watts. So not only will it beat a 670 to 3 completed WUs, it'll do it using something like 60% of the power. | |
|