Genefer or LLR or PFGW for GFN high b?
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13526 ID: 53948 Credit: 245,062,730 RAC: 273,562
With all of the GFN ports either well into x87 territory or close to it, it's time to ask whether it's better to stick with Genefer or go with either LLR or PFGW. Genefer is much faster, as long as it's on a level playing field with the other two programs. If they're all using FMA3, Genefer wins hands down.
But the algorithms are very different, and while LLR and PFGW can use the advanced SIMD instructions at any b level, Genefer can't and has to use the much slower x87 instructions beyond a certain b.
Here are some tests, all on the number 8624742^32768+1:
Genefer: (17:37)
C:\PRPNet\prpclient-5.3.1-windows\prpclient-1>genefer64 -q "8624742^32768+1"
genefer 3.2.5 (Windows/CPU/64-bit)
Supported transform implementations: fma3 avx-intel sse4 sse2 default x87
Copyright 2001-2014, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: genefer64 -q 8624742^32768+1
Priority change succeeded.
Testing 8624742^32768+1...
Using FMA3 transform
The checkpoint doesn't match current test: 8624742^32768+1 != 919024^524288+1.
Current test will be restarted
Starting initialization...
Initialization complete (0.023 seconds).
Testing 8624742^32768+1... 754975 steps to go
maxErr exceeded for 8624742^32768+1, 0.5000 > 0.4500
maxErr exceeded while using FMA3; switching to AVX (Intel).
Testing 8624742^32768+1...
Using AVX (Intel) transform
Resuming 8624742^32768+1 from a checkpoint (754975 iterations left)
maxErr exceeded for 8624742^32768+1, 0.5000 > 0.4500
maxErr exceeded while using AVX (Intel); switching to SSE4.
Testing 8624742^32768+1...
Using SSE4 transform
Resuming 8624742^32768+1 from a checkpoint (754975 iterations left)
maxErr exceeded for 8624742^32768+1, 0.5000 > 0.4500
maxErr exceeded while using SSE4; switching to SSE2.
Testing 8624742^32768+1...
Using SSE2 transform
Resuming 8624742^32768+1 from a checkpoint (754975 iterations left)
maxErr exceeded for 8624742^32768+1, 0.5000 > 0.4500
maxErr exceeded while using SSE2; switching to Default.
Testing 8624742^32768+1...
Using Default transform
Resuming 8624742^32768+1 from a checkpoint (754975 iterations left)
maxErr exceeded for 8624742^32768+1, 0.4688 > 0.4500
maxErr exceeded while using Default; switching to x87 (80-bit).
Testing 8624742^32768+1...
Using x87 (80-bit) transform
Resuming 8624742^32768+1 from a checkpoint (754975 iterations left)
Estimated time remaining for 8624742^32768+1 is 0:17:22
Testing 8624742^32768+1... 753664 steps to go (0:17:23 remaining)
Successful computation progress with x87 (80-bit); switching back to FMA3.
Testing 8624742^32768+1...
Using FMA3 transform
Resuming 8624742^32768+1 from a checkpoint (753663 iterations left)
maxErr exceeded for 8624742^32768+1, 0.5000 > 0.4500
maxErr exceeded while using FMA3; switching to AVX (Intel).
Too many errors with FMA3; Calculation will proceed using only more accurate transforms.
Testing 8624742^32768+1...
Using AVX (Intel) transform
Resuming 8624742^32768+1 from a checkpoint (753663 iterations left)
maxErr exceeded for 8624742^32768+1, 0.5000 > 0.4500
maxErr exceeded while using AVX (Intel); switching to SSE4.
Too many errors with AVX (Intel); Calculation will proceed using only more accurate transforms.
Testing 8624742^32768+1...
Using SSE4 transform
Resuming 8624742^32768+1 from a checkpoint (753663 iterations left)
maxErr exceeded for 8624742^32768+1, 0.5000 > 0.4500
maxErr exceeded while using SSE4; switching to SSE2.
Too many errors with SSE4; Calculation will proceed using only more accurate transforms.
Testing 8624742^32768+1...
Using SSE2 transform
Resuming 8624742^32768+1 from a checkpoint (753663 iterations left)
maxErr exceeded for 8624742^32768+1, 0.5000 > 0.4500
maxErr exceeded while using SSE2; switching to Default.
Too many errors with SSE2; Calculation will proceed using only more accurate transforms.
Testing 8624742^32768+1...
Using Default transform
Resuming 8624742^32768+1 from a checkpoint (753663 iterations left)
maxErr exceeded for 8624742^32768+1, 1.0000 > 0.4500
maxErr exceeded while using Default; switching to x87 (80-bit).
Too many errors with Default; Calculation will proceed using only more accurate transforms.
Testing 8624742^32768+1...
Using x87 (80-bit) transform
Resuming 8624742^32768+1 from a checkpoint (753663 iterations left)
Estimated time remaining for 8624742^32768+1 is 0:17:15
8624742^32768+1 is a probable composite. (RES=252e347bec60a92c) (227271 digits) (err = 0.0098) (time = 0:17:37) 11:32:02
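The fallback pattern visible in the Genefer log above can be sketched roughly as follows. This is only an illustration of the behaviour the log shows (try each transform in order, move on when the round-off error exceeds 0.45), not Genefer's actual code. The function `pick_transform` and the `observed` table are hypothetical names, and the sketch ignores the log's "switching back to FMA3" retry step.

```python
# Transform order as listed in the log's "Supported transform implementations"
# line, fastest first; x87 (80-bit) is the most accurate.
TRANSFORMS = ["fma3", "avx-intel", "sse4", "sse2", "default", "x87"]
MAX_ERR_LIMIT = 0.45  # threshold printed in the log ("0.5000 > 0.4500")


def pick_transform(round_off_error, transforms=TRANSFORMS, limit=MAX_ERR_LIMIT):
    """Return the first transform whose observed round-off error is within the limit.

    round_off_error is a hypothetical callable mapping a transform name to the
    maximum error observed with it; it stands in for actually running the test.
    """
    for name in transforms:
        if round_off_error(name) <= limit:
            return name
    return None  # no transform is accurate enough for this number


# Errors copied from the first pass of the log; x87 finished with err = 0.0098.
observed = {
    "fma3": 0.5, "avx-intel": 0.5, "sse4": 0.5,
    "sse2": 0.5, "default": 0.4688, "x87": 0.0098,
}
chosen = pick_transform(observed.get)
print(chosen)
```

For this number every SIMD transform overshoots the limit, which is why the whole test ends up running on x87.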
LLR: (13:13)
C:\PRPNet\prpclient-5.3.1-windows\prpclient-2>llr64 -d -q"8624742^32768+1"
Base factorized as : 2*3*7*173*1187
Base prime factor(s) taken : 173, 1187
Starting N-1 prime test of 8624742^32768+1
Using generic reduction FMA3 FFT length 80K, Pass1=320, Pass2=256, a = 3
8624742^32768+1 is not prime. RES64: 489C0D1BF2A9F72C. OLD64: D9D42753D7FDE581
Time : 793.501 sec.
PFGW (in PRP mode, without -tm): (13:09)
C:\PRPNet\prpclient-5.3.1-windows\prpclient-3>pfgw64 -V -q"8624742^32768+1"
PFGW Version 3.7.8.64BIT.20141125.Win_Dev [GWNUM 28.5]
Special modular reduction using generic reduction FMA3 FFT length 80K, Pass1=320, Pass2=256 on 8624742^32768+1
8624742^32768+1 is composite: RES64: [489C0D1BF2A9F72C] (789.1908s+0.0058s)
At least on my computer (when running just one core), Genefer is the slowest of the three, and LLR and PFGW are similar in speed and significantly faster than Genefer.
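To put the three timings above on one scale, here is a small sketch that converts the Genefer clock time to seconds and expresses each run relative to the fastest. The helper `to_seconds` and the `timings` table are illustrative names; the numbers are copied from the logs above.

```python
def to_seconds(clock):
    """Convert 'h:mm:ss' or 'mm:ss' to a number of seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


timings = {
    "Genefer (x87)": to_seconds("0:17:37"),  # "time = 0:17:37" in the Genefer log
    "LLR (FMA3)": 793.501,                   # "Time : 793.501 sec."
    "PFGW (FMA3)": 789.1908,                 # "(789.1908s+0.0058s)"
}

fastest = min(timings.values())
for name, secs in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:14s} {secs:8.1f} s  {secs / fastest:.2f}x the fastest time")
```

On these figures the x87 Genefer run takes roughly a third longer than either FMA3 run.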
LLR/PFGW will have less of an advantage on AVX CPUs (compared to FMA3), and even less on SSE3 CPUs. Genefer may be somewhat faster on AVX, and is likely faster on non-AVX CPUs. Does anyone want to run some tests and see what it's like on their computer?
____________
My lucky number is 75898^524288+1

Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2183 ID: 1178 Credit: 9,105,038,602 RAC: 12,910,096
Working on it with a T2450 (32-bit CoreDuo)...LLR and PFGW will take a little while.
Genefer x87 time is 1:02:14.
Edit: LLR has run only about 10% in one-half hour (i.e., 5 hours estimated).
Edit #2: PFGW's 32-bit version gives me an error and will not crunch.
Conclusion: on an SSE2-capable machine, Genefer is still MUCH faster.

Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2183 ID: 1178 Credit: 9,105,038,602 RAC: 12,910,096
Trying on an i7-860 now (SSE3, SSSE3, SSE4.1 & SSE4.2 capable).
Genefer x87 time is 23:04.
LLR64 time is 40:49.
PFGW64 time is 41:01.
Genefer is still faster here.

Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2183 ID: 1178 Credit: 9,105,038,602 RAC: 12,910,096
Trying on i7-2670QM (2.2 GHz) with AVX (but no FMA3):
Genefer x87 time is 33:04.
LLR64 time is 38:56.

Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1137 ID: 120786 Credit: 267,685,510 RAC: 4,344
AMD X6 1100T cpu @3.5GHz:
>genefer64.exe -q "8624742^32768+1"
8624742^32768+1 is a probable composite. (RES=252e347bec60a92c) (227271 digits)
(err = 0.0103) (time = 0:32:18) 07:08:58
i.e. 1938 seconds
>llr64.exe -d -q"8624742^32768+1"
8624742^32768+1 is not prime. RES64: 489C0D1BF2A9F72C. OLD64: D9D42753D7FDE581
Time : 2633.996 sec.
>pfgw64.exe -V -q"8624742^32768+1"
8624742^32768+1 is composite: RES64: [489C0D1BF2A9F72C] (2668.1334s+0.0365s)
Conclusion: Genefer x87 is still faster than LLR and PFGW on AMD CPUs.
Really, AMD CPUs are now lower-rung performers at PrimeGrid and are better suited to sieving than to finding primes.

Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13526 ID: 53948 Credit: 245,062,730 RAC: 273,562
Roger wrote:
"Conclusion is that genefer x87 is still faster than llr and pfgw for AMD CPUs.
Really AMD CPUs are now lower rung performers at Primegrid and are more suited to sieving than finding primes."
It looks like Genefer is faster for everything EXCEPT FMA3. That's hardly surprising.
By the way, high-b GFN tasks could be one place where AMD CPUs are competitive. Without AVX in play, I see no obvious reason for AMD to run the x87 transforms slower than Intel. Running the GFN ports that can only use x87 (which will probably be everything below n=21 by the end of 2015) on AMD may bring new life for AMD here.
____________
My lucky number is 75898^524288+1

JimB Honorary cruncher
Joined: 4 Aug 11 Posts: 916 ID: 107307 Credit: 974,494,172 RAC: 310
i7-2600 3.4GHz HT off:
LLR 16:57
PFGW 16:57
Genefer 16:41
i7-3770 3.4GHz HT off:
LLR 16:36
PFGW 16:35
Genefer 16:41

Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1137 ID: 120786 Credit: 267,685,510 RAC: 4,344
Roger wrote:
"Conclusion is that genefer x87 is still faster than llr and pfgw for AMD CPUs.
Really AMD CPUs are now lower rung performers at Primegrid and are more suited to sieving than finding primes."
Michael Goetz wrote:
"It looks like Genefer is faster for everything EXCEPT FMA3. That's hardly surprising.
By the way, high-b GFN tasks could be one place where AMD CPUs are competitive. Without AVX, I see no obvious reason for AMD to run the x87 transforms slower than Intel. Running the GFN ports that only can use x87 (which will probably be everything below n=21 by the end of 2015) on AMD may bring new life for AMD here."
I guess you're right. I was being too pessimistic. I do have 6 cores to your 4 after all, so the x87 performance of my CPU is really 81% of yours (6/4 * 1057 sec/1938 sec).
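The arithmetic behind that whole-chip comparison can be written out as a short sketch. It assumes one test per core and perfectly linear scaling across cores, which nobody in the thread measured; the constant names are illustrative.

```python
CORES_AMD, CORES_INTEL = 6, 4   # core counts stated in the post
T_INTEL = 17 * 60 + 37          # 0:17:37 per test with x87 Genefer, in seconds
T_AMD = 32 * 60 + 18            # 0:32:18 per test on the AMD X6 1100T, in seconds

# Throughput is tests per second for the whole chip: cores / time per test.
ratio = (CORES_AMD / T_AMD) / (CORES_INTEL / T_INTEL)
print(f"AMD chip throughput relative to the Intel chip: {ratio:.3f}")
```

The result lands between 0.81 and 0.82, in line with the figure quoted in the post.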

composite Volunteer tester
Joined: 16 Feb 10 Posts: 773 ID: 55391 Credit: 701,311,919 RAC: 265,125
It's nice to see genefer 3.2.6-dev turning in the same residue and shaving 6 seconds off the time of genefer 3.2.5.
Core i7-920 @ 2.8 GHz:
genefer 3.2.5 23:35
genefer 3.2.6-dev 23:29, and its estimate of 1409 seconds of CPU time is right on
time ./genefer_linux64 -x x87 -q "8624742^32768+1"
genefer 3.2.6-dev (Linux/CPU/64-bit)
Supported transform implementations: sse4 sse2 default x87
Copyright 2001-2015, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2015, Iain Bethune, Michael Goetz, Ronald Schneider
Command line: ./genefer_linux64 -x x87 -q 8624742^32768+1
Priority change succeeded.
Testing 8624742^32768+1...
Using x87 (80-bit) transform
Starting initialization...
Initialization complete (0.030 seconds).
Estimated time remaining for 8624742^32768+1 is 0:23:29
8624742^32768+1 is a probable composite. (RES=252e347bec60a92c) (227271 digits) (err = 0.0098) (time = 0:23:33) 23:50:38
real 23m33.179s
user 23m29.092s
sys 0m4.008s
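As a quick check of the "1409 seconds" figure, the fields printed by `time` above can be converted to seconds. This is a minimal sketch; `time_field_to_seconds` is a hypothetical helper that only handles the `NmS.SSSs` form shown in this output.

```python
import re


def time_field_to_seconds(field):
    """Convert a time field such as '23m29.092s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", field)
    if match is None:
        raise ValueError(f"unrecognised time field: {field!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


user = time_field_to_seconds("23m29.092s")  # "user" line above
real = time_field_to_seconds("23m33.179s")  # "real" line above
print(round(user), round(real))
```

The user time rounds to 1409 seconds, matching Genefer's initial estimate of 0:23:29.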