John Honorary cruncher
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
From Geoff Reynolds, the creator of sr2sieve. Versions can be downloaded here.
x86-64 Linux: sr2sieve-1.6.18 (~1.5X faster than 32 bit) STABLE
x86-64 Linux: sr2sieve-1.7.12 (~1.5X faster than 32 bit) TESTING
x86 Linux: sr2sieve-1.6.18 STABLE
x86 Linux: sr2sieve-1.7.12 TESTING
x86-64 Windows: sr2sieve-1.6.18 (~1.5X faster than 32 bit) STABLE
x86-64 Windows: sr2sieve-1.7.12 (~1.5X faster than 32 bit) TESTING
x86 Windows: sr2sieve-1.6.18 STABLE
x86 Windows: sr2sieve-1.7.12 TESTING
x86-64 Mac: sr2sieve-1.6.7 TESTING
Version 1.7.12: (24 July 2008) TESTING
This version has a new switch `-X --skip-cubic' which causes cubic and higher power residue tests to be skipped. This makes the algorithm more like that used by srsieve 0.6.x, and so might be faster and use less memory when there are very many sequences in the sieve or when the n-range is too short.
If you are currently sieving with srsieve 0.6.x because you have found it to be faster than sr2sieve, please try out sr2sieve -X and let me know whether it is faster.
Version 1.7.11: (12 June 2008) TESTING
This version fixes a bug reported by Chuck Lasher that could cause junk characters to be printed at the end of some screen messages. It showed up when a long file name was specified as argument to the -C switch in sr1sieve.
Version 1.7.10: (05 April 2008) TESTING
This version fixes a problem introduced in versions 1.7.0/1.3.0 that caused the elapsed time statistic reported in the checkpoint file and at the end of a range to be inaccurate.
Version 1.7.6: (19 Jan 2008) TESTING
This version adds the optimisation from 1.7.5 to non-asm and PPC64 builds. It also has a tweak to the 32-bit SSE2 build that improves the Pentium 4 times a little more.
Note that the changes in 1.7.5 might cause a slowdown if there are fewer than 4 sequences in the sieve, and 8-16 sequences might be needed before the gains become noticeable.
Version 1.7.5: (18 Jan 2008) TESTING
This version has vectorised powmod functions for x86 and x86-64.
I don't know why I didn't make this optimisation earlier: When p = 1 (mod r) the r-power residue test involves computing the power (-k/c)^((p-1)/r) mod p for each sequence k*b^n+c. Since the exponent (p-1)/r is fixed, this can be done efficiently by creating a vector of values -k/c mod p and running the powmod algorithm on the whole vector at once.
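As a rough illustration of the idea (not the actual x86 assembly), here is a minimal Python sketch of a vector powmod. The names vector_powmod and power_residue_flags are made up for this example, sequences are represented as (k, c) pairs, and it assumes p is prime, r divides p-1, and Python 3.8+ for the modular inverse pow(c, -1, p).

```python
def vector_powmod(bases, exponent, p):
    """Raise every base to the same exponent mod p.

    The bits of the exponent are scanned once and each step is applied
    to the whole vector, which is what makes the work easy to pipeline.
    """
    result = [1] * len(bases)
    acc = [b % p for b in bases]
    e = exponent
    while e > 0:
        if e & 1:
            # Multiply step, applied to every element.
            result = [(x * a) % p for x, a in zip(result, acc)]
        # Square step, applied to every element.
        acc = [(a * a) % p for a in acc]
        e >>= 1
    return result


def power_residue_flags(sequences, p, r):
    """For each (k, c), report whether (-k/c)^((p-1)/r) == 1 (mod p)."""
    if (p - 1) % r != 0:
        raise ValueError("the test only applies when p = 1 (mod r)")
    # Vector of -k/c mod p, one entry per sequence.
    bases = [(-k * pow(c, -1, p)) % p for k, c in sequences]
    powers = vector_powmod(bases, (p - 1) // r, p)
    return [v == 1 for v in powers]
```

For p = 13 and r = 3 the shared exponent is 4, and only the values -k/c that are cubes mod 13 come out as 1.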
The gains from this optimisation are mainly due to implementing the vector powmod function with fully pipelined code, which explains why the Pentium 4 benefits most:
                 Sob.dat 6k   SoB.dat 17k   riesel.dat 66k   sr5data.txt 220k
                 ----------   -----------   --------------   ----------------
1.6.18 p=1e15:   233 kp/s     127 kp/s      67.7 kp/s        35.0 kp/s
1.7.5 p=1e15:    238 kp/s     133 kp/s      75.4 kp/s        43.4 kp/s
P3 600MHz:       +2%          +5%           +11%             +24%

1.6.18 p=1e15:   2726 kp/s    1481 kp/s     730 kp/s         343 kp/s
1.7.5 p=1e15:    2825 kp/s    1597 kp/s     900 kp/s         494 kp/s
C2D 2.66GHz:     +4%          +8%           +23%             +44%

1.6.18 p=1e15:   975 kp/s     527 kp/s      257 kp/s         110 kp/s
1.7.5 p=1e15:    1047 kp/s    604 kp/s      345 kp/s         179 kp/s
P4 2.9GHz:       +7%          +15%          +34%             +63%
Version 1.7.1: (3 Jan 2008) Linux only TESTING
Fixes a bug in 1.7.0 that could cause factors at the very end of a completed range to be lost.
Version 1.7.0: (2 Jan 2008) Linux only TESTING
This version has a new switch: `-t --threads N' starts N child threads. There are still some problems to work out, so I recommend using the latest 1.6.x version for important work. See the README-threads file for a list of known problems.
The threading is done with fork(), so the `threads' are full unix processes with their own address space, communicating with the parent process via pipes. This method is very easy to bolt on to an existing program because there is no need for functions to be thread-safe. However it does have a lot of communication and scheduling overhead, and my implementation is surely less than perfect, so there is some performance loss from running one copy of sr2sieve with N threads compared to just running N copies of sr2sieve.
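To make the fork-and-pipe approach concrete, here is a small Python sketch of the general pattern on a Unix system. It is not taken from sr2sieve: run_children and the work callback are hypothetical names, and passing results back as text via repr() is just a convenient choice for the example.

```python
import ast
import os


def run_children(ranges, work):
    """Fork one child process per (lo, hi) range and collect the results.

    Each child is a full process with its own address space, so work()
    does not need to be thread-safe. Results come back through a pipe.
    """
    children = []
    for lo, hi in ranges:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: compute, write the result to the pipe, then exit.
            status = 1
            try:
                os.close(read_fd)
                with os.fdopen(write_fd, "w") as out:
                    out.write(repr(work(lo, hi)))
                status = 0
            finally:
                os._exit(status)
        # Parent: keep only the read end of this child's pipe.
        os.close(write_fd)
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd) as inp:
            results.append(ast.literal_eval(inp.read()))
        os.waitpid(pid, 0)
    return results
```

Even in this toy version the parent spends its time creating pipes, reading and waiting, which hints at where the communication and scheduling overhead comes from.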
Multithreading does save memory though, because the child threads don't need to contain a copy of the Sieve of Eratosthenes which dominates the memory requirements as sieve depth increases.
There is no Windows version yet because fork() and pipe() are missing from the MinGW library.
Version 1.6.18: (11 Jan 2008)
This version extends the maximum number of subsequences to 2^32-1. (In practice memory will run out long before this limit is reached).
In previous versions the limit was 2^16-1, but it was not always properly enforced, which could have caused factors to be missed for input files containing many more than 1000 sequences. This problem existed since version 1.4.22. It didn't affect SoB/PSP, RieselSieve or SR5.
Version 1.6.17: (26 Dec 2007)
These versions have a `-q --quiet' switch to prevent factors being printed to the screen. They also record the version number in the log file.
Version 1.6.16: (13 Dec 2007)
In version 1.6.16 the Windows x86-64 executable is built to work around a compiler bug that causes an access violation when sieving p > 2^51.
Version 1.6.14: (2 Dec 2007)
Fixes a segfault bug on narrow n-ranges. The bug didn't affect PrimeGrid sieving in any way.
Version 1.6.13: (7 Nov 2007)
When a factor p is found to divide a term k*b^n+c, these versions now check whether p == k*b^n+c, and if so the term is logged as a prime but not removed from the sieve and not reported as a factor.
This behaviour is compatible with srsieve, but not with NewPGen or most other sieve programs. It only ever affects terms with exponents n <= 64.
In some situations leaving a small prime term in the sieve can impede performance, as it can form exceptions to otherwise regular patterns in the remaining exponents. If in doubt, test all small terms with LLR then remove them from the sieve manually.
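The check described above can be sketched in a few lines of Python. The function name classify_factor and the returned strings are invented for this example; Python integers are arbitrary precision, so the 64-bit limit on p plays no role here.

```python
def classify_factor(p, k, b, n, c):
    """Decide how a factor p of the term k*b^n+c should be reported.

    Returns "prime" when p is the term itself (the term stays in the
    sieve), otherwise "factor" (the term can be removed).
    """
    term = k * b**n + c
    if term % p != 0:
        raise ValueError("p does not divide k*b^n+c")
    return "prime" if term == p else "factor"
```

For example, 3*2^2+1 = 13, so p = 13 is reported as a prime term, while p = 5 dividing 3*2^3+1 = 25 is an ordinary factor.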
Version 1.6.12: (2 Nov 2007)
This version fixes two bugs in the Sobistrator compatibility mode (when using the -j switch):
1. Ranges in SoBStatus.dat are now written with the pmax= line before the pmin= line. Ranges read from SoBStatus.dat and nextrange.txt can now have the lines in either order.
2. Now when the -r switch is given with the -j switch, RieselStatus.dat is used instead of SoBStatus.dat.
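A reader that accepts the two lines in either order might look like the Python sketch below. The only thing taken from the notes above is that the range is stored as a pmax= line and a pmin= line; treating the values as plain integers and ignoring any other lines are assumptions, and parse_range and format_range are hypothetical names.

```python
def parse_range(lines):
    """Read pmin=/pmax= lines in either order and return (pmin, pmax)."""
    values = {}
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if sep and key in ("pmin", "pmax"):
            values[key] = int(value)
    if "pmin" not in values or "pmax" not in values:
        raise ValueError("both pmin= and pmax= lines are required")
    return values["pmin"], values["pmax"]


def format_range(pmin, pmax):
    """Write the range with the pmax= line before the pmin= line."""
    return [f"pmax={pmax}", f"pmin={pmin}"]
```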
Version 1.6.11: (22 Oct 2007)
This version has a new -j or --sobistrator switch that will make sr2sieve behave in a similar way to JJsieve or proth_sieve, for compatibility with Sobistrator:
* Checkpoints will be written to SoBStatus.dat, including a kp/s speed measured using elapsed time.
* If no range is given on the command line, or if the range given matches the one in SoBStatus.dat, work will resume from SoBStatus.dat. If the range given on the command line is different to the one in SoBStatus.dat then the one given on the command line will be used and a warning given.
* After the current range finishes, subsequent ranges will be read from nextrange.txt.
* Factors will be written to fact.txt and duplicates to factexcl.txt (these file names can be overridden with the -f and -D switches.)
Ideally it should be possible to start a range with JJsieve and continue it with `sr2sieve -j -s', and vice versa.
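The rule for choosing between the command-line range and the one in SoBStatus.dat can be written down as a small decision function. This is only a sketch in Python: choose_range is a hypothetical name, ranges are (pmin, pmax) tuples, and the case where no status file exists is an assumption, since the notes above do not cover it.

```python
def choose_range(cli_range, status_range):
    """Return (range_to_sieve, resume_from_status, warning)."""
    if status_range is None:
        # Assumption: with no status file, just use the command line.
        return cli_range, False, None
    if cli_range is None or cli_range == status_range:
        # No range given, or it matches: resume from the status file.
        return status_range, True, None
    # Ranges differ: the command line wins, with a warning.
    warning = "range on command line differs from SoBStatus.dat"
    return cli_range, False, warning
```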
Version 1.6.9: (16 Oct 2007)
Has an improved giant-step method for x86-64 that combines modular multiplication operations with hashtable lookups in one pass, giving a 10-20% increase in speed.
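For readers unfamiliar with the term, the giant-step loop belongs to the baby-step giant-step method for finding n with b^n = target (mod p). The Python sketch below shows the generic textbook method only, assuming p is prime and does not divide b; it says nothing about how the x86-64 assembly fuses the two operations. The name bsgs is made up, and pow(b, -m, p) needs Python 3.8+.

```python
from math import isqrt


def bsgs(b, target, p, n_max):
    """Smallest n in [0, n_max) with b^n = target (mod p), or None."""
    m = isqrt(n_max) + 1

    # Baby steps: hashtable mapping b^j mod p to the smallest such j.
    baby = {}
    value = 1
    for j in range(m):
        baby.setdefault(value, j)
        value = value * b % p

    # Giant steps: one modular multiplication and one hashtable lookup
    # per iteration. This is the loop the release note is about.
    giant = pow(b, -m, p)
    gamma = target % p
    for i in range(m):
        j = baby.get(gamma)
        if j is not None and i * m + j < n_max:
            return i * m + j
        gamma = gamma * giant % p
    return None
```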
Version 1.6.7: (8 Oct 2007)
Has improvements to the hashtable code. It should benefit all machines, but maybe some more than others.
Version 1.6.6: (4 Oct 2007)
The 64-bit Windows build now uses the correct hashtable optimisations, and so should hopefully run close to the speed of the Linux build. I have also added this fix to version 1.5.21.
The gen/6 mulmod method added in version 1.6.4 caused a slowdown on Core 2, so now it will only be considered for use when running on AMD. It can still be selected manually with the -B and -G switches, e.g. -Bgen/6.
Version 1.6.5: (3 Oct 2007)
This version has some minor changes to the x86-64 assembly that might make it a little faster on Windows. Previous versions had to translate from the Win64 calling convention into the Linux convention for some functions.
The changes in version 1.6.4 seem to benefit the Athlon 64, but as usual what is fast on one machine is slow on everything else, so the Core 2 may have lost some ground. If that happened you could try using the -Bgen/8 command line switch to see if that gets the speed back on the Core 2. I may have to split the 64-bit code into AMD and Intel specific paths in future versions.
Version 1.6.4: (1 Oct 2007)
New command line switches available in all builds:
-e --elapsed-time       Use elapsed instead of CPU time for status line reports.
-f --factors FILE       Append found factors to FILE instead of factors.txt.
-D --duplicates FILE    Append duplicate factors to FILE.
-S --save TIME          Write checkpoint every TIME seconds (default 300).
The sr2sieve-amd and sr2sieve-intel executables have been merged. There is now one 32-bit sr2sieve executable with three separate critical code paths:
Intel code path: Optimised for Pentium 2 or Pentium 3
AMD code path: Optimised for Athlon/Duron.
SSE2 code path: Optimised for Pentium 4.
The correct code path should be selected automatically but can be overridden with command-line switches:
`sr2sieve --amd' is equivalent to running the old sr2sieve-amd executable,
`sr2sieve --intel' is equivalent to running the sr2sieve-intel executable.
Both --amd and --intel will use the SSE2 code path if the hardware supports it, unless the --no-sse2 switch is also used. The `-v --verbose' switch will report which code path was selected.
The AMD code path requires the CMOV instruction, so the Intel code path will be selected for early AMD chips (K6 and earlier). There is not a big difference between the AMD and Intel code paths yet; they were just compiled with different CFLAGS.
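The selection rules above can be summarised as a small function. This Python sketch is one reading of those rules, not the real detection code: select_code_path and its parameters are hypothetical, the vendor string is the usual CPUID value, and it assumes --no-sse2 also works without --amd or --intel.

```python
def select_code_path(vendor, has_cmov, has_sse2, force=None, no_sse2=False):
    """Pick "sse2", "amd" or "intel".

    force is None, "amd" (--amd) or "intel" (--intel).
    """
    # SSE2 is used whenever the hardware has it, even with --amd or
    # --intel, unless --no-sse2 is given.
    if has_sse2 and not no_sse2:
        return "sse2"
    if force in ("amd", "intel"):
        return force
    # The AMD path needs CMOV, so K6 and earlier fall back to Intel.
    if vendor == "AuthenticAMD" and has_cmov:
        return "amd"
    return "intel"
```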
Also for the x86-64 version: new gen/6 methods perform 6 mulmods in parallel, which might be faster on the Athlon 64.
Version 1.6.3: (27 Sep 2007) 32 & 64 bit Linux only
A minor bugfix for Linux so that running as `nohup sr2sieve ...' works correctly.
Version 1.6.2: (26 Sep 2007) 64 bit Windows only
Some minor changes to allow building with the GCC x86_64-gnu-linux to x86_64-pc-mingw32 cross-compiler.
Version 1.6.1: (24 Sep 2007) 32 bit Windows only
This version works with both forms k*b^n+/-1 and b^n+/-k together in the same sieve.
It doesn't appear to be any slower than 1.5.x when sieving k*b^n+/-1 alone, but if anyone notices a slowdown on their own hardware, please let me know.
(There are a number of extra branches in the code, but when all sequences in the sieve have the same form the branches are predictable).
Version 1.5.21: (4 Oct 2007) 64 bit Windows only
The 64-bit Windows build now uses the correct hashtable optimisations, and so should hopefully run close to the speed of the Linux build. The same fix is in version 1.6.6.
The gen/6 mulmod method added in version 1.6.4 caused a slowdown on Core 2, so now it will only be considered for use when running on AMD. It can still be selected manually with the -B and -G switches, e.g. -Bgen/6.
Version 1.5.20: (27 Sep 2007) STABLE
A 64-bit Windows executable has been added to the builds.
Version 1.5.19: (23 Sep 2007)
This version fixes a bug introduced in version 1.4.27 that caused an error message at startup if the --pmax switch was used without the --pmin switch when the sieve file contained the start of the sieve range.
Version 1.6.x will be able to sieve b^n+/-k for use on the Dual-Sierpinski problem. Thanks Phil Moore for pointing out how little change was needed to make this work. There is an experimental version 1.6.0 for those interested. Hopefully in a future version the code will be integrated into the standard sr2sieve binary so that b^n+/-k and k*b^n+/-1 can be sieved together.
Version 1.5.18: (11 Sep 2007)
This version just adds some code to reset the FPU precision before use. It shouldn't be necessary on a properly functioning system, but it doesn't take any extra time to do.
There is a small improvement to the x86-64 mulmod, about 1% faster on Core 2 CPUs. Otherwise no need to upgrade.
Version 1.5.8: (23 Jun 2007) Win64 bit only
I have fixed a corruption of the FPU stack in the sr2sieve-fpu build, but there may be other bugs remaining. Use sr2sieve-fpu for testing only for now.
In the standard x86-64 build I have enabled gen/8 methods. There are not enough general registers to do 8 mulmods in parallel, but they may still be faster than gen/4 methods in some cases.
____________
|
|
|
|
there are new versions :) 1.7.12 for example
|
|
|
Will using sr2sieve -X result in finding fewer factors, or is there any downside to trying it out to see if it's faster?
____________
|
|
|
geoff Volunteer developer
Joined: 3 Aug 07 Posts: 99 ID: 10427 Credit: 343,437 RAC: 0
 
|
sr2sieve -X should find exactly the same factors, but expect it to be much slower for projects like PSP or 321. I think it will only be faster when there are 1000's of sequences (1000's of k's) in the sieve, but there is no harm in trying it out.
|
|
|
|
Version 1.8.1:
New process priority behaviour is incompatible with previous versions:
Default is not to change process priority (previous default was idle).
-zz sets lowest priority (nice 20)
-z sets low priority (nice 10)
-Z sets high priority (nice -10)
-ZZ sets highest priority (nice -20)
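As a quick reference, the mapping above could be expressed like this in Python on a Unix system. The names are made up for the example and this is not how sr2sieve parses its arguments. Note that Linux clamps niceness to the range -20..19, and lowering niceness below the current value normally needs extra privileges.

```python
import os

# Switch to niceness, as listed in the 1.8.1 notes.
NICE_BY_SWITCH = {"-zz": 20, "-z": 10, "-Z": -10, "-ZZ": -20}


def requested_niceness(args):
    """Return the niceness asked for, or None to leave priority alone."""
    for arg in args:
        if arg in NICE_BY_SWITCH:
            return NICE_BY_SWITCH[arg]
    return None  # new default: do not change process priority


def apply_priority(args):
    """Apply the requested niceness to the current process, if any."""
    niceness = requested_niceness(args)
    if niceness is not None:
        os.setpriority(os.PRIO_PROCESS, 0, niceness)
```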
Changes in sr2sieve 1.8.1
Thanks, Geoff.
/Lennart
|
|
|
now testing the updates to sr2sieve:
Starting at
version 1.7.12
windows dual 1.481mpps
linux dual 1.331mpps
windows 658kpps
windows lap 431kpps
ending at
version 1.8.1
windows dual 1.606mpps
linux dual 2.723mpps
windows 630kpps
windows lap 435kpps
Is it really possible that the linux got THAT MUCH faster?? It's still at about 2.732mpps...
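To put numbers on the comparison above, here is a small Python calculation of the relative change per machine. It assumes 1 mpps = 1000 kpps so that all rates are in the same unit; percent_change is just a helper name for this example.

```python
# Rates in kpps, taken from the figures in the post above.
before = {"windows dual": 1481, "linux dual": 1331,
          "windows": 658, "windows lap": 435 - 4}
after = {"windows dual": 1606, "linux dual": 2723,
         "windows": 630, "windows lap": 435}


def percent_change(old, new):
    """Relative change from old to new, in percent, to one decimal."""
    return round(100 * (new - old) / old, 1)


changes = {name: percent_change(before[name], after[name]) for name in before}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

The Linux dual box roughly doubles (about +105%), while the other three machines move by less than 10% either way.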
____________
|
|
|
|
What does your command line look like?
If you didn't use -Z before, it can be true.
1.7.12 has default nice=19; 1.8.1 has nice=0
/Lennart
|
|
|
1.7.12 has default nice=19; 1.8.1 has nice=0
But this goes against what all BOINC projects do: low-priority processes that don't impact the system much.
I run BOINC on desktops and on servers and I'm fine with BOINC yielding to other tasks; this unobtrusiveness is why I run it.
If you're doing that because Linux manages power differently than Windows, it's just the wrong thing to do. You'd better educate the users about it and let them choose how to run BOINC (see this post).
If PrimeGrid is going to change BOINC's paradigm of using a system's spare computing cycles, frankly competing for computing resources, then I cannot run it on my systems.
Please, clarify.
____________
|
|
|
John Honorary cruncher
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
This thread discussion is for the manual sieving effort here at PrimeGrid. It does not impact the BOINC side at all.
EDIT: Currently, this only affects the 321 manual sieving effort. We start all of the sieves manually first and move to BOINC if needed. BOINC continues to be a "low impact" system.
____________
|
|
|
|
This thread discussion is for the manual sieving effort here at PrimeGrid. It does not impact the BOINC side at all.
OK, then.
Thanks a bunch.
____________
|
|
|
|
any chance we can get some x86-osx binaries running via boinc?
i have a few mac minis that i can install on if so..
i was able to download and compile the 1.8.9 version myself, so sr2sieve doesn't seem broken at all.
?
yes/no/maybe?
|
|
rogue Volunteer developer
Joined: 8 Sep 07 Posts: 1259 ID: 12001 Credit: 18,565,548 RAC: 0
 
|
I have had many difficulties linking anything with BOINC on OS X. You are welcome to try.
|
|
geoff Volunteer developer
Joined: 3 Aug 07 Posts: 99 ID: 10427 Credit: 343,437 RAC: 0
 
|
It is possible to compile simple BOINC applications for Linux, like the wrappers used by PrimeGrid, using only a small subset of the BOINC library. I had to do this because I don't have all the graphics and sound libraries required for a full build.
I used the following files from version 6.2.18, but it might be possible to trim this list further with a bit of editing:
include/app_ipc.h
include/boinc_api.h
include/common_defs.h
include/diagnostics.h
include/error_numbers.h
include/filesys.h
include/hostinfo.h
include/md5.h
include/md5_file.h
include/mfile.h
include/miofile.h
include/parse.h
include/prefs.h
include/proxy_info.h
include/shmem.h
include/str_util.h
include/util.h
include/version.h
src/app_ipc.C
src/boinc_api.C
src/diagnostics.C
src/filesys.C
src/hostinfo.C
src/md5.c
src/md5_file.C
src/mfile.C
src/miofile.C
src/parse.C
src/prefs.C
src/proxy_info.C
src/shmem.C
src/str_util.C
src/util.C
I think it might be possible to compile a minimal version of the BOINC library that doesn't use any C++ code. This would make things a lot simpler and the resulting executables a lot smaller.
|
|
|
rogue Volunteer developer
Joined: 8 Sep 07 Posts: 1259 ID: 12001 Credit: 18,565,548 RAC: 0
 
|
That sounds like a viable idea. Have you suggested to the folks over at BOINC to build a smaller library based upon the objects you are referencing?
|
|
|
wait i think i missed something..
is something else required other than the sr2sieve binary?
if so, can someone point me in the right direction?
|
|
geoff Volunteer developer
Joined: 3 Aug 07 Posts: 99 ID: 10427 Credit: 343,437 RAC: 0
 
|
PrimeGrid uses a wrapper program to run sr2sieve; all the BOINC stuff is done in the wrapper. You don't need the wrapper for manual sieving.
|
|
|
|
ok so we need the boinc wrapper for OSX also.
is that source code somewhere?
is there any value in trying to merge the 2 together? (wrapper and client)
|
|
|
geoff Volunteer developer
Joined: 3 Aug 07 Posts: 99 ID: 10427 Credit: 343,437 RAC: 0
 
|
I don't know where the wrapper source is.
The practical problem with combining the BOINC code into the sr2sieve executable is just that the assembly-language parts of sr2sieve can only be compiled with GCC, but the Windows BOINC library cannot be compiled with GCC, or at least I haven't found out how to do it.
I intend to add the necessary BOINC functions to sr2sieve, but because of the situation above it has not been a priority. (I have already added them to gcwsieve, but I don't think any real testing has been done yet.)
|
|
|
Rytis Volunteer moderator Project administrator
Joined: 22 Jun 05 Posts: 2653 ID: 1 Credit: 109,633,255 RAC: 47,554
                     
|
The source code for the wrapper is on SourceForge, http://www.sf.net/projects/primegrid/ - keep in mind that somebody has already shown interest in porting it to Mac OS, so don't duplicate the work.
____________
|
|
|
|
hi.
just want to inform you that geocities has been DEAD for at least half a year )))
please change the first post to google sites. )))
http://sites.google.com/site/geoffreywalterreynolds/programs/sr2sieve
like this
____________
wbr, Me. Dead J. Dona
|
|
|