From Geoff Reynolds, the creator of gcwsieve. Versions can be downloaded here.
x86-64 Linux: gcwsieve-1.1.8 (~2X faster than 32 bit) STABLE
x86-64 Linux: gcwsieve-1.2.2 (~2X faster than 32 bit) TESTING
x86 Linux: gcwsieve-1.1.8 STABLE
x86 Linux: gcwsieve-1.2.2 TESTING
x86-64 Windows: gcwsieve-1.1.8 (~2X faster than 32 bit) STABLE
x86-64 Windows: gcwsieve-1.2.2 (~2X faster than 32 bit) TESTING
x86 Windows: gcwsieve-1.1.8 STABLE
x86 Windows: gcwsieve-1.2.2 TESTING
x86-64 Mac: gcwsieve-1.1.4 TESTING
version 1.2.2: (28 Jan 2008) TESTING
Has an improved powmod function. This doesn't noticeably affect PrimeGrid sieving, but should make some difference when there are only a small number of terms in the sieve.
(Multithreading still doesn't work in the Windows version).
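For readers unfamiliar with the term, powmod is modular exponentiation, the operation a sieve of this kind relies on to test whether a prime p divides a term n*b^n+/-1. The sketch below is only an illustration of that idea using plain binary exponentiation; it is not gcwsieve's actual routine, and the names `powmod` and `divides_term` are hypothetical.

```python
def powmod(base, exp, mod):
    """Right-to-left binary modular exponentiation: base**exp % mod."""
    result = 1 % mod
    base %= mod
    while exp > 0:
        if exp & 1:                      # current low bit of the exponent is set
            result = (result * base) % mod
        base = (base * base) % mod       # square for the next bit
        exp >>= 1
    return result


def divides_term(p, n, b, c):
    """True if p divides n*b^n + c, with c = +1 (Cullen) or c = -1 (Woodall)."""
    return (n * powmod(b, n, p) + c) % p == 0


if __name__ == "__main__":
    print(divides_term(3, 2, 2, +1))   # 2*2^2+1 = 9
    print(divides_term(7, 2, 2, -1))   # 2*2^2-1 = 7
```

Working modulo p keeps every intermediate value below p^2, so the test never has to form the full n*b^n.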
version 1.2.0: (4 Jan 2008) 32 & 64 bit Linux only TESTING
Has a new switch: `-t --threads N' will spread the work of the sieve over N cores. This is, however, no faster than just running N copies of the program. See README-threads for differences from the 1.1.x versions.
There is no Windows version yet because fork() is missing from the mingw library, but it might be possible to compile for Windows using the Cygwin compiler.
It should compile for MacOS X without modification.
version 1.1.8: (4 Jan 2008)
Has new switches that allow sieving a subset of the terms in an ABC file:
-b --base B Restrict sieve to base B terms.
-C --cullen Restrict sieve to Cullen terms.
-W --woodall Restrict sieve to Woodall terms.
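As a rough illustration of what these restrictions amount to, the sketch below filters a list of terms held as (n, b, c) tuples, where c is +1 for Cullen and -1 for Woodall. The tuple layout and the function name `select_terms` are assumptions made for the example, and how gcwsieve itself combines the switches is not stated here.

```python
def select_terms(terms, base=None, cullen=False, woodall=False):
    """Keep only the terms (n, b, c) that match the requested restrictions."""
    selected = []
    for n, b, c in terms:
        if base is not None and b != base:
            continue                     # like -b B: wrong base
        if cullen and c != +1:
            continue                     # like -C: not a Cullen term
        if woodall and c != -1:
            continue                     # like -W: not a Woodall term
        selected.append((n, b, c))
    return selected


if __name__ == "__main__":
    terms = [(10, 2, +1), (11, 2, -1), (12, 3, +1), (13, 3, -1)]
    print(select_terms(terms, base=2))
    print(select_terms(terms, woodall=True))
```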
version 1.1.7: (10 Dec 2007)
The x86-64 Windows executable is built to work around a potential compiler bug. I don't think the bug actually affected gcwsieve; this is just a precaution.
version 1.1.6: (6 Dec 2007)
Has a new switch: `gcwsieve -An ...' will set affinity to CPU number n.
The source also has a new option: add -DSMALL_P to the CPPFLAGS to get an executable which can sieve n*b^n+/-1 with factors in the range n/2 < p < 2^31 instead of the usual range n < p < 2^51 (or 2^62).
version 1.1.5: (19 Oct 2007)
Fixes a bug, introduced in version 1.1.0, that caused Woodall factors read with the -k switch to be incorrectly rejected.
version 1.1.3 & version 1.0.24: (27 Sep 2007) 32 & 64 bit Linux only
A minor bugfix for Linux builds: The HUP signal is now correctly ignored when run as `nohup gcwsieve ...'.
version 1.1.2: (26 Sep 2007) 64 bit Windows only
A 64-bit Windows executable for gcwsieve has been built.
See this thread if you're interested in testing. http://www.primegrid.com/forum_thread.php?id=732
version 1.1.0: (19 Sep 2007)
Sieving Cullen and Woodall together is now possible. There is little if any performance to be gained from this; it is mainly for convenience.
This feature required extensive changes to the source code, so some testing would probably be a good idea.
The input ABC file no longer needs to be sorted; the terms may appear in any order. The output file will be sorted with all Woodall terms followed by all Cullen terms (each in order of increasing n).
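The output ordering described above can be sketched as a single sort on (sign, n). This assumes terms are held as (n, b, c) tuples with c = -1 for Woodall and c = +1 for Cullen; the representation and the name `output_order` are hypothetical, and the sketch ignores how mixed bases would be ordered.

```python
def output_order(terms):
    """Sort so that Woodall terms (c = -1) come first, then Cullen terms (c = +1),
    each group in order of increasing n."""
    return sorted(terms, key=lambda t: (t[2], t[0]))


if __name__ == "__main__":
    unsorted_terms = [(5, 2, +1), (3, 2, -1), (4, 2, +1), (7, 2, -1)]
    for term in output_order(unsorted_terms):
        print(term)
```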
version 1.0.23: (19 Sep 2007)
I have added two new switches:
-n --nmin N0
-N --nmax N1
These restrict sieving to terms n*b^n+/-1 with n in the range N0 <= n <= N1. (Defaults are N0=0, N1=2^32-1). Terms outside this range will be discarded as the input file is read, and so will not be written to the output file either.
N0 and N1 may be specified using `e' notation, e.g. `-n 1e6' instead of `-n 1000000'.
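A minimal sketch of the range restriction and the `e' notation, under the assumption that terms are (n, b, c) tuples. The helper names `parse_count` and `in_range` are made up for the example, and the parser only handles an integer mantissa such as `1e6', which may be narrower than what gcwsieve accepts.

```python
def parse_count(text):
    """Parse '1000000' or '1e6' into an exact integer (no floating point)."""
    mantissa, _, exponent = text.lower().partition("e")
    return int(mantissa) * 10 ** int(exponent or 0)


def in_range(terms, nmin=0, nmax=2**32 - 1):
    """Keep terms (n, b, c) with nmin <= n <= nmax; the rest are discarded."""
    return [t for t in terms if nmin <= t[0] <= nmax]


if __name__ == "__main__":
    terms = [(999999, 2, +1), (1000000, 2, +1), (2000000, 2, +1), (2000001, 2, +1)]
    print(in_range(terms, parse_count("1e6"), parse_count("2e6")))
```

Both bounds are inclusive, matching N0 <= n <= N1.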
version 1.0.20: (1 Sep 2007) 64 bit only
The x86-64 executable now has separate code paths optimised for Intel (Core 2) and AMD (Athlon 64) CPUs. The Athlon 64 code should be about 15% faster than previous versions.
The appropriate code path should be selected automatically, but can be overridden with the --amd or --intel command-line switches.
version 1.0.18: (22 Aug 2007)
This version has two minor bugfixes:
Test for Extended 3DNow instead of just 3DNow to determine whether the prefetchnta instruction is available on AMD CPUs. This affected K6-2 CPUs.
Use the best benchmark time instead of the average benchmark time when deciding whether or not to use software prefetching. The average times could be inaccurate when there were other processes running on the same CPU.
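To see why the best time is a more robust choice than the average, consider the sketch below. The timings are invented for illustration, and `choose_prefetch` is a hypothetical helper, not the program's code: a single run disturbed by another process inflates the average but leaves the minimum untouched.

```python
from statistics import mean


def choose_prefetch(times_without, times_with):
    """Return True if prefetching looks faster, comparing best (minimum) times."""
    return min(times_with) < min(times_without)


if __name__ == "__main__":
    # Invented benchmark times in seconds; the last prefetch run was
    # interrupted by another process on the same CPU.
    times_without = [1.00, 1.01, 1.02]
    times_with = [0.90, 0.91, 2.50]

    print("by average:", mean(times_with) < mean(times_without))
    print("by best   :", choose_prefetch(times_without, times_with))
```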
There are also some changes to the status line display. The percentage of CPU usage (cpu_time/elapsed_time) is now reported, and the status line alternates between these two sets of stats:
p=1071802477019, 249775 p/sec, 16 factors, 100.0% cpu, 2953 sec/factor
p=1071817422251, 249836 p/sec, 16 factors, 16.9% done, ETA 24 Aug 14:23
And there are two new switches that change the information displayed on the status line:
Reports primes/sec (the number of prime factors tested per second) instead of p/sec (the increase in p per second).
Reports p/sec, primes/sec, and sec/factor using elapsed time instead of CPU time.
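The cpu percentage and sec/factor figures can be sketched as simple ratios. The function names and the input values below are illustrative assumptions, not taken from the program; the sketch uses CPU time for sec/factor, on the reading that elapsed time is only used when the second switch is given.

```python
def cpu_percent(cpu_time, elapsed_time):
    """Percentage of CPU usage: cpu_time / elapsed_time, as a percentage."""
    return 100.0 * cpu_time / elapsed_time


def sec_per_factor(seconds, factors):
    """Average number of seconds spent per factor found."""
    return seconds / factors


def status_fields(cpu_time, elapsed_time, factors):
    """Format the factor count, cpu percentage, and sec/factor fields."""
    return "%d factors, %.1f%% cpu, %.0f sec/factor" % (
        factors,
        cpu_percent(cpu_time, elapsed_time),
        sec_per_factor(cpu_time, factors),
    )


if __name__ == "__main__":
    # Illustrative values only: 47248 s of CPU time over 47248 s elapsed.
    print(status_fields(47248, 47248, 16))
```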
version 1.0.17: (18 Aug 2007)
Version 1.0.17 should properly detect the availability of prefetch instructions on AMD machines with 3DNow! but without SSE (some earlier Athlons).
A more compact ABC file format will now be written by default. The old format will still be written if the --multisieve switch is given. Either format can be used for the input file:
ABC $a*$b^$a$c // CW Sieved to: 100000000000 with gcwsieve
2000055 2 +1
2000110 2 +1
2000116 2 +1
2000128 2 +1
ABC $a*2^$a+1 // CW Sieved to: 100000000000 with gcwsieve
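For anyone post-processing these files, here is a minimal sketch of a reader for the three-column layout shown above (header `ABC $a*$b^$a$c', then one `n b c' line per term). It is not gcwsieve's parser, `parse_abc` is a hypothetical name, and the single-header compact form is deliberately not handled because its data lines are not shown here.

```python
def parse_abc(lines):
    """Parse the three-column ABC layout into a list of (n, b, c) tuples."""
    if not lines or not lines[0].startswith("ABC $a*$b^$a$c"):
        raise ValueError("unsupported or missing ABC header")
    terms = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != 3:
            continue                     # skip blank or malformed lines
        n, b, c = (int(f) for f in fields)   # int() accepts '+1' and '-1'
        terms.append((n, b, c))
    return terms


if __name__ == "__main__":
    sample = [
        "ABC $a*$b^$a$c // CW Sieved to: 100000000000 with gcwsieve",
        "2000055 2 +1",
        "2000110 2 +1",
    ]
    print(parse_abc(sample))
```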
version 1.0.16: (12 Aug 2007)
Version 1.0.16 has support for software prefetching, using the prefetchnta instruction available for SSE machines, or GCC's __builtin_prefetch() function for non x86/x86-64 builds.
Prefetching should result in a speedup in the case that the sieve is too large to fit in L2 cache (each sieve term takes 8 bytes), but on some machines it results in a slowdown instead, probably because it interferes with the automatic hardware prefetcher.
So before sieving starts, some test runs are made with and without prefetch, and the faster method is selected. Use the --verbose switch to see whether prefetch was selected. To override the automatic selection, use these new switches:
--prefetch: Force use of prefetch.
--no-prefetch: Prevent use of prefetch.
Here are some times for a 216000 term sieve (Primegrid Cullen 10M) at p=1000e9:
P3 450MHz, 512Kb L2: 1167 p/sec 1502 p/sec +29%
P3 600MHz, 256Kb L2: 1462 p/sec 1993 p/sec +36%
P4 2.9GHz, 512Kb L2: 12224 p/sec 11711 p/sec -4%
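The arithmetic behind these figures can be checked with a short sketch. It assumes "512Kb" means 512 * 1024 bytes, computes each percentage as the second rate relative to the first, and uses hypothetical helper names.

```python
BYTES_PER_TERM = 8  # each sieve term takes 8 bytes, as stated above


def sieve_bytes(num_terms):
    """Memory taken by the sieve terms alone."""
    return num_terms * BYTES_PER_TERM


def percent_change(first_rate, second_rate):
    """Change of the second p/sec figure relative to the first, rounded."""
    return round(100.0 * (second_rate / first_rate - 1.0))


if __name__ == "__main__":
    size = sieve_bytes(216000)
    print(size, "bytes; fits in 512 KiB L2:", size <= 512 * 1024)
    for first, second in [(1167, 1502), (1462, 1993), (12224, 11711)]:
        print("%+d%%" % percent_change(first, second))
```

At 1,728,000 bytes the 216000-term sieve is well over three times the size of a 512 KiB L2 cache, which is the situation where prefetching is expected to matter.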
version 1.0.15: (8 Aug 2007)
The main loop for x86 machines without SSE2 is now 100% assembly. It runs about 30% faster on my P3.
version 1.0.14: (5 Aug 2007)
The main loop for SSE2 and x86-64 machines is now 100% assembly instead of a mixture of C and inline assembly, and tries to read memory in a more predictable way.
The 32-bit executable runs about 15% faster on my P4, and the 64-bit executable runs about 60% faster on my C2D. (64-bit is now almost twice as fast as 32-bit on the C2D).
version 1.0.13: (2 Aug 2007)
This version fixes a memory allocation bug that could cause the program to abort at the end of a sieve range, or cause a memory leak if there were multiple ranges queued up in the work file.
No work needs to be repeated, as all results for the range would have been written to file before the abort. The affected builds were:
Windows: versions 1.0.0 - 1.0.10.
OS X: versions 1.0.0 - 1.0.12.
The bug didn't affect the Linux builds.