Message boards : Project Staging Area : Servers recommended for my PS3
Now that AP26 has ended, my three PS3s need something else to do. Seeing as they got me to an amethyst badge on AP26, I'm hoping to improve my badge here on PSA.
I do have my first PS3 trying out a GCW13 task, but I'm looking for recommendations on which servers have the work they do best.
rogue (Volunteer developer)
Joined: 8 Sep 07 Posts: 1249 ID: 12001 Credit: 18,565,548 RAC: 0
[quote]Now that AP26 has ended, my three PS3s need something else to do. Seeing as they got me to an amethyst badge on AP26, I'm hoping to improve my badge here on PSA.
I do have my first PS3 trying out a GCW13 task, but I'm looking for recommendations on which servers have the work they do best.[/quote]
IIRC, you should be able to do any project except FPS (factorial) and PRS (primorial); nothing on the PS3 can run PRP tests for FPS or PRS. I don't know about SGS (Sophie Germain): the k values might be too high for phrot.
You will find the most primes on port 11000, but none of them will be large enough for the Top 5000, and you won't be able to do GFN divisibility testing.
GFN32768 and GFN65536 are possible because genefer can be built on the PS3, but they may be beyond genefer's ability to test (genefer cannot test above a certain value of b, which I don't recall). You can use the -l switch with genefer to detect the limits on your PS3, then check whether the server(s) have reached that limit yet before choosing to do work for them.
IMO, GCW13 is the most worthwhile project, because once a prime is found the project will end. I don't know if there are any plans to take it to n > 1000000 if nothing is found by then.
John (Honorary cruncher)
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
The following servers mentioned by rogue (listed in order of WU length) are available to phrot, which is what the PS3 uses:
- server=PPSElow:0:10:uwin.mine.nu:11000
- server=PPSEhigh:0:5:uwin.mine.nu:10000
- server=SGS:0:1:prpnet.primegrid.com:12000
- server=ESP:0:1:pgllr.mine.nu:9000
- server=27121:0:1:prpnet.primegrid.com:12006
- server=GCW13:0:1:prpnet.primegrid.com:12004
The GFN servers are also mentioned, but currently the PS3 build does not have a genefer option:
- server=GFN32768:0:1:prpnet.primegrid.com:12005
- server=GFN65536:0:1:prpnet.primegrid.com:12003
More information on each port:
PPSElow: Proth Prime Search Extended: k*2^n+1; 1200<k<10000 for n<500K
PPSEhigh: Proth Prime Search Extended: k*2^n+1; 1200<k<10000 for n>500K
SGS: Sophie Germain Prime Search: (k*2^666666-1)
ESP: Extended Sierpinski Problem
27121: 27121 Prime Search: k=27 and k=121 for k*2^n+/-1 for n<10M
GCW13: Generalized Woodall & Cullen; b=13 (n*13^n-/+1)
GFN32768: Generalized Fermat Number Prime Search (b^32768+1)
GFN65536: Generalized Fermat Number Prime Search (b^65536+1)
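The server= lines above are entries for the PRPNet client's configuration. As a rough aid, here is a minimal Python sketch that splits one such line into its five colon-separated fields. It assumes the layout suffix:percent:workunits:host:port; the field names and the meaning of the two middle numbers are my assumption rather than something stated in this thread, so check the comments in your own client configuration before relying on them.

```python
from typing import NamedTuple


class ServerEntry(NamedTuple):
    # Field names are assumptions about the PRPNet client's server= layout.
    suffix: str     # short label chosen for the server, e.g. "GCW13"
    percent: int    # assumed: share of work to take from this server
    workunits: int  # assumed: how many work units to fetch at a time
    host: str
    port: int


def parse_server_line(line: str) -> ServerEntry:
    """Parse 'server=SUFFIX:N:N:HOST:PORT', tolerating a leading '- ' bullet."""
    line = line.strip().removeprefix("- ")
    key, sep, value = line.partition("=")
    if key != "server" or not sep:
        raise ValueError(f"not a server= line: {line!r}")
    suffix, percent, workunits, host, port = value.split(":")
    return ServerEntry(suffix, int(percent), int(workunits), host, int(port))


for text in ("- server=PPSElow:0:10:uwin.mine.nu:11000",
             "- server=GCW13:0:1:prpnet.primegrid.com:12004"):
    entry = parse_server_line(text)
    print(f"{entry.suffix:8} {entry.host}:{entry.port} ({entry.workunits} WU)")
```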
rogue (Volunteer developer)
Joined: 8 Sep 07 Posts: 1249 ID: 12001 Credit: 18,565,548 RAC: 0
[quote]
The GFN servers are also mentioned, but currently the PS3 builds do not have a genefer option:
- server=GFN32768:0:1:prpnet.primegrid.com:12005
- server=GFN65536:0:1:prpnet.primegrid.com:12003
[/quote]
They don't? That is something that needs to be rectified. I don't know if I provided anyone with the RISC source to genefer, but it should build on the PS3. It uses its own FFT, so it isn't reliant on a third-party library for that. If anyone with a PS3 wants to try to build it, please send me an e-mail (not a PM) and we'll work offline to get it working.
Lexs (Volunteer developer)
Joined: 16 Mar 08 Posts: 61 ID: 20289 Credit: 49,033,000 RAC: 0
[quote]I don't know if I provided anyone with the RISC source to genefer, but it should build on the PS3. It uses its own FFT, so it isn't reliant on a third-party library for that. If anyone with a PS3 wants to try to build it, please send me an e-mail (not a PM) and we'll work offline to get it working.[/quote]
I have a genefer-2.2 build for the PS3 which also uses the 6 SPUs, but it is not very stable. I also have a CUDA version, but I have only tested it in CUDAEMU (CUDA emulation).
If someone has time to look into it more deeply, I'll send you the sources.
rogue (Volunteer developer)
Joined: 8 Sep 07 Posts: 1249 ID: 12001 Credit: 18,565,548 RAC: 0
[quote][quote]I don't know if I provided anyone with the RISC source to genefer, but it should build on the PS3. It uses its own FFT, so it isn't reliant on a third-party library for that. If anyone with a PS3 wants to try to build it, please send me an e-mail (not a PM) and we'll work offline to get it working.[/quote]
I have a genefer-2.2 build for the PS3 which also uses the 6 SPUs, but it is not very stable. I also have a CUDA version, but I have only tested it in CUDAEMU (CUDA emulation).
If someone has time to look into it more deeply, I'll send you the sources.[/quote]
It is probably my sources that you are working with. If you e-mail me, we can investigate the stability issues.
Lexs (Volunteer developer)
Joined: 16 Mar 08 Posts: 61 ID: 20289 Credit: 49,033,000 RAC: 0
[quote][quote]I have a genefer-2.2 build for the PS3 which also uses the 6 SPUs, but it is not very stable. I also have a CUDA version, but I have only tested it in CUDAEMU (CUDA emulation).
If someone has time to look into it more deeply, I'll send you the sources.[/quote]
It is probably my sources that you are working with. If you e-mail me, we can investigate the stability issues.[/quote]
Yes, it's based on your genefer 1.3 sources, with its FFT routines ripped out and replaced by Syoichiro Yamada's FFTW implementation, then forward-ported to the checkpointing and checks of genefer 2.2.
Since FFTW gained seamless SPU support in version 3.3.1, it now utilizes the SPUs without any change.
The CUDA version is quite similar: instead of doing the FFT with FFTW, it uses the CUFFT libraries.
It works OK, at least in simulation and for smaller exponents, but there is certainly more work to be done on boundary checking. I'll send you both sources by PM.
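Some background on what "its own FFT" versus FFTW means here. A GFN tester works modulo b^N+1, and because b^N ≡ -1 there, squaring a number stored as N base-b digits is a negacyclic convolution of the digit vector. A plain cyclic FFT can compute it if digit j is first multiplied by the weight e^(iπj/N) (whose N-th power is -1) and the weight is divided out afterwards. The Python sketch below illustrates only that general technique: a textbook recursive radix-2 FFT stands in for genefer's own FFT or FFTW, the function names are hypothetical, and carry propagation, checkpoints and genefer's other checks are left out. The largest distance from the nearest integer that it reports is the kind of round-off figure FFT-based testers watch; I am assuming, not confirming, that genefer's err values measure something similar.

```python
import cmath
import math
import random


def fft(a, invert=False):
    """Textbook recursive radix-2 FFT; len(a) must be a power of two.

    invert=True computes the inverse transform without the 1/N scaling.
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def negacyclic_square(digits):
    """Square a digit vector modulo x^N + 1 via a weighted (negacyclic) FFT.

    Returns (integer coefficients, largest distance from the nearest integer).
    Coefficients may be negative and are not carried back into base-b digits.
    """
    n = len(digits)
    weights = [cmath.exp(1j * math.pi * j / n) for j in range(n)]  # weight^N = -1
    spectrum = fft([d * w for d, w in zip(digits, weights)])
    conv = fft([s * s for s in spectrum], invert=True)
    coeffs, max_err = [], 0.0
    for j in range(n):
        value = (conv[j] / (n * weights[j])).real  # undo 1/N scaling and weight
        nearest = round(value)
        max_err = max(max_err, abs(value - nearest))
        coeffs.append(nearest)
    return coeffs, max_err


# Hypothetical toy case: 16 digits in base 1000, i.e. arithmetic modulo 1000^16 + 1.
b, n = 1000, 16
rng = random.Random(1)
digits = [rng.randrange(b) for _ in range(n)]
x = sum(d * b**j for j, d in enumerate(digits))
coeffs, err = negacyclic_square(digits)
modulus = b**n + 1
assert sum(c * b**j for j, c in enumerate(coeffs)) % modulus == x * x % modulus
print(f"largest round-off distance: {err:.1e}")
```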
rogue (Volunteer developer)
Joined: 8 Sep 07 Posts: 1249 ID: 12001 Credit: 18,565,548 RAC: 0
[quote][quote][quote]I have a genefer-2.2 build for the PS3 which also uses the 6 SPUs, but it is not very stable. I also have a CUDA version, but I have only tested it in CUDAEMU (CUDA emulation).
If someone has time to look into it more deeply, I'll send you the sources.[/quote]
It is probably my sources that you are working with. If you e-mail me, we can investigate the stability issues.[/quote]
Yes, it's based on your genefer 1.3 sources, with its FFT routines ripped out and replaced by Syoichiro Yamada's FFTW implementation, then forward-ported to the checkpointing and checks of genefer 2.2.
Since FFTW gained seamless SPU support in version 3.3.1, it now utilizes the SPUs without any change.
The CUDA version is quite similar: instead of doing the FFT with FFTW, it uses the CUFFT libraries.
It works OK, at least in simulation and for smaller exponents, but there is certainly more work to be done on boundary checking. I'll send you both sources by PM.[/quote]
I can't do much with the CUDA version, but the FFTW version intrigues me. I will certainly look into it and do some comparison timings. In the worst case, the original source for 2.2 should work. You stated that the FFTW version is based on 1.3, but with mods for 2.2. Since FFTW can be built on x86, do you (or anyone else) have timings comparing it to any of the current x86 versions? I suspect it would be faster than genefer80 and possibly genefer itself.
Finally, if you have any FFTW timings (without CUDA support), I would like to see them compared to genefer 2.2 on your machine. That would be very insightful.
Lexs (Volunteer developer)
Joined: 16 Mar 08 Posts: 61 ID: 20289 Credit: 49,033,000 RAC: 0
Only a few short benchmarks; I'm skipping 64K and above for now, since I can't waste that much CPU time during a challenge ;-)
CUDA emulation mode (sm_13) on a 2006 Mac mini (Core Duo 1.66 GHz) without any NVIDIA chipset:
./genefer-2.2cuda-1 genefer.work
5683936^256+1 is a probable composite. (RES=893994a255a3326f) (1730 digits) (err = 0.3750) (time = 0:03:16)
…this is so slow that someone on a faster machine and/or with a real NVIDIA card should do this benchmark...
PS3 FFTW-SPU version (using 1 PPU and 6 SPUs; on a Cell blade it should automatically use more SPUs):
./genefer-2.2-ps3 genefer.work
5683936^256+1 is a probable composite. (RES=893994a255a3326f) (1730 digits) (err = 0.2500) (time = 0:00:00)
4616790^512+1 is a probable composite. (RES=05704de8d08c2a0a) (3413 digits) (err = 0.2500) (time = 0:00:00)
3750000^1024+1 is a probable composite. (RES=0f807c291b252057) (6732 digits) (err = 0.2500) (time = 0:00:02)
3045946^2048+1 is a probable composite. (RES=bd72b5713f909aed) (13279 digits) (err = 0.2188) (time = 0:00:08)
2474076^4096+1 is a probable composite. (RES=0b53a7da1c7c9181) (26188 digits) (err = 0.2500) (time = 0:00:26)
2009574^8192+1 is a probable composite. (RES=ab752d28c1e60445) (51636 digits) (err = 0.2500) (time = 0:01:42)
1632282^16384+1 is a probable composite. (RES=eee7f094cb5f4f86) (101791 digits) (err = 0.2500) (time = 0:06:29)
1325824^32768+1 is a probable composite. (RES=7aea6cae5b1b0904) (200622 digits) (err = 0.2500) (time = 0:24:42)
With bigger FFTs the PS3 should become really good at this; see http://www.fftw.org/cell/ps3/
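The digit counts printed above can be checked without computing b^N itself: b^N+1 has floor(N·log10 b) + 1 decimal digits, except in tiny cases where b^N is one less than a power of ten (such as 3^2+1 = 10). A minimal Python sketch; the function name is hypothetical, and with these inputs it reproduces the 1730, 26188 and 200622 digits shown in the output.

```python
import math


def gfn_digits(b: int, n: int) -> int:
    """Decimal digits of b^n + 1, from logarithms instead of a huge power.

    floor(n*log10(b)) + 1 is the digit count of b^n, which equals that of
    b^n + 1 unless b^n is one less than a power of ten (e.g. 3^2 + 1 = 10).
    Double precision suffices unless n*log10(b) lands extremely close to
    an integer; in that case fall back to exact integer arithmetic.
    """
    return math.floor(n * math.log10(b)) + 1


for b, n in [(5683936, 256), (2474076, 4096), (1325824, 32768)]:
    print(f"{b}^{n}+1: {gfn_digits(b, n)} digits")
```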
Linux 32-bit LLVM-2.7/Clang FFTW version on a Pentium 4 2.8GHz:
./genefer-clang genefer.work
5683936^256+1 is a probable composite. (RES=893994a255a3326f) (1730 digits) (err = 0.2500) (time = 0:00:01)
4616790^512+1 is a probable composite. (RES=05704de8d08c2a0a) (3413 digits) (err = 0.2500) (time = 0:00:00)
3750000^1024+1 is a probable composite. (RES=0f807c291b252057) (6732 digits) (err = 0.3125) (time = 0:00:02)
3045946^2048+1 is a probable composite. (RES=bd72b5713f909aed) (13279 digits) (err = 0.2500) (time = 0:00:06)
2474076^4096+1 is a probable composite. (RES=0b53a7da1c7c9181) (26188 digits) (err = 0.2812) (time = 0:00:27)
2009574^8192+1 is a probable composite. (RES=ab752d28c1e60445) (51636 digits) (err = 0.2812) (time = 0:01:52)
1632282^16384+1 is a probable composite. (RES=eee7f094cb5f4f86) (101791 digits) (err = 0.3125) (time = 0:07:51)
1325824^32768+1 is a probable composite. (RES=7aea6cae5b1b0904) (200622 digits) (err = 0.2812) (time = 0:39:31)
FFTW on the Pentium 4 peaks at 8K FFTs (see http://www.fftw.org/speed/Pentium4-2.4GHz-gcc/), while the Core 2 processors seem to shine up to 64K FFTs (see http://www.fftw.org/speed/CoreDuo-3.0GHz-icc/).
Linux 32-bit GCC-4.4.2 x86-generic-32 version on a Pentium 4 2.8GHz:
./genefer_x86-gcc genefer.work
5683936^256+1 is a probable composite. (RES=893994a255a3326f) (1730 digits) (err = 0.2767) (time = 0:00:00)
4616790^512+1 is a probable composite. (RES=05704de8d08c2a0a) (3413 digits) (err = 0.3314) (time = 0:00:00)
3750000^1024+1 is a probable composite. (RES=0f807c291b252057) (6732 digits) (err = 0.3495) (time = 0:00:02)
3045946^2048+1 is a probable composite. (RES=bd72b5713f909aed) (13279 digits) (err = 0.3695) (time = 0:00:06)
2474076^4096+1 is a probable composite. (RES=0b53a7da1c7c9181) (26188 digits) (err = 0.3952) (time = 0:00:24)
2009574^8192+1 is a probable composite. (RES=ab752d28c1e60445) (51636 digits) (err = 0.3533) (time = 0:01:42)
1632282^16384+1 is a probable composite. (RES=eee7f094cb5f4f86) (101791 digits) (err = 0.3636) (time = 0:07:10)
1325824^32768+1 is a probable composite. (RES=7aea6cae5b1b0904) (200622 digits) (err = 0.3804) (time = 0:30:23)
The native version is faster so far, but FFTW has a lot of tuning options, and different CPU models show completely different speedups.
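To put the three runs side by side, the (time = H:MM:SS) field can be pulled out of each result line. A minimal Python sketch, assuming the genefer output format exactly as pasted in this post; the sample lines are the 32768 results from above, and the function name is hypothetical. With these inputs, the P4 native build takes about 1.23 times and the P4 FFTW build about 1.60 times as long as the PS3.

```python
import re

# Assumed layout: "<number> is a ... (time = H:MM:SS)", as in the lines above.
RESULT_RE = re.compile(r"^(\S+) is a .*\(time = (\d+):(\d{2}):(\d{2})\)")


def parse_result(line: str) -> tuple[str, int]:
    """Return (number, elapsed seconds) for one genefer result line."""
    m = RESULT_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    hours, minutes, seconds = (int(g) for g in m.group(2, 3, 4))
    return m.group(1), hours * 3600 + minutes * 60 + seconds


runs = {
    "PS3 FFTW-SPU": "1325824^32768+1 is a probable composite. (RES=7aea6cae5b1b0904) (200622 digits) (err = 0.2500) (time = 0:24:42)",
    "P4 FFTW (clang)": "1325824^32768+1 is a probable composite. (RES=7aea6cae5b1b0904) (200622 digits) (err = 0.2812) (time = 0:39:31)",
    "P4 native (gcc)": "1325824^32768+1 is a probable composite. (RES=7aea6cae5b1b0904) (200622 digits) (err = 0.3804) (time = 0:30:23)",
}
_, ps3_seconds = parse_result(runs["PS3 FFTW-SPU"])
for label, line in runs.items():
    number, secs = parse_result(line)
    print(f"{label:16} {number}: {secs:5d} s = {secs / ps3_seconds:.2f}x the PS3 time")
```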
rogue (Volunteer developer)
Joined: 8 Sep 07 Posts: 1249 ID: 12001 Credit: 18,565,548 RAC: 0
Those are great numbers. How do they compare to genefx64 and genefer80? Have you run it with the -l option to see what the upper limit is for b?
I will find some time in the next few days to see how PPC compares when using FFTW.
rogue (Volunteer developer)
Joined: 8 Sep 07 Posts: 1249 ID: 12001 Credit: 18,565,548 RAC: 0
I did some testing with the FFTW version of genefer on MacPPC. It performs poorly compared to the RISC version. I've added some of the RISC optimizations for unrolling loops, and although that improved it by about 20%, it is still about 20% slower than the code I've been using. What's even worse is that as the FFT size increases, it falls further behind the RISC version.
Here are the comparable benchmarks:
genefer_risc -b
Generalized Fermat Number Bench
5683936^256+1 Time: 4.16 us/mul. Err: 0.0000 1730 digits
4616790^512+1 Time: 9.04 us/mul. Err: 0.0000 3413 digits
3750000^1024+1 Time: 19.8 us/mul. Err: 0.0000 6732 digits
3045946^2048+1 Time: 42.8 us/mul. Err: 0.0000 13279 digits
2474076^4096+1 Time: 97.3 us/mul. Err: 0.0000 26188 digits
2009574^8192+1 Time: 213 us/mul. Err: 0.0000 51636 digits
1632282^16384+1 Time: 439 us/mul. Err: 0.0000 101791 digits
1325824^32768+1 Time: 965 us/mul. Err: 0.0000 200622 digits
1076904^65536+1 Time: 2.58 ms/mul. Err: 0.0000 395325 digits
874718^131072+1 Time: 6.38 ms/mul. Err: 0.0000 778813 digits
710492^262144+1 Time: 14.4 ms/mul. Err: 0.0000 1533952 digits
577098^524288+1 Time: 33.1 ms/mul. Err: 0.0000 3020555 digits
468750^1048576+1 Time: 77.1 ms/mul. Err: 0.0000 5946413 digits
380742^2097152+1 Time: 170 ms/mul. Err: 0.0000 11703432 digits
./genefer_fftw -b
Generalized Fermat Number Bench
5683936^256+1 Time: 4.54 us/mul. Err: 0.0000 1730 digits
4616790^512+1 Time: 9.99 us/mul. Err: 0.0000 3413 digits
3750000^1024+1 Time: 21.7 us/mul. Err: 0.0000 6732 digits
3045946^2048+1 Time: 48.3 us/mul. Err: 0.0000 13279 digits
2474076^4096+1 Time: 103 us/mul. Err: 0.0000 26188 digits
2009574^8192+1 Time: 262 us/mul. Err: 0.0000 51636 digits
1632282^16384+1 Time: 641 us/mul. Err: 0.0000 101791 digits
1325824^32768+1 Time: 2.04 ms/mul. Err: 0.0000 200622 digits
1076904^65536+1 Time: 8.58 ms/mul. Err: 0.0000 395325 digits
874718^131072+1 Time: 49.7 ms/mul. Err: 0.0000 778813 digits
710492^262144+1 Time: 113 ms/mul. Err: 0.0000 1533952 digits
577098^524288+1 Time: 234 ms/mul. Err: 0.0000 3020555 digits
468750^1048576+1 Time: 344 ms/mul. Err: 0.0000 5946413 digits
380742^2097152+1 Time: 879 ms/mul. Err: 0.0000 11703432 digits
Both were compiled with the following switches: -O3 -ffast-math -mdynamic-no-pic -mtune=G5 -mcpu=970 -fomit-frame-pointer -falign-loops=16
Unfortunately, the FFTW version hits rounding errors at lower limits than the RISC version, so it is completely unusable on MacPPC.
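Because the -b output switches from us/mul to ms/mul, the widening gap between the two builds is easy to miss. A minimal Python sketch that normalizes each row to microseconds and prints the FFTW/RISC ratio; it assumes the row layout shown above, the sample rows are copied from the two runs, and the function name is hypothetical. For the exponents 256, 32768 and 131072 the ratio comes out at roughly 1.09, 2.11 and 7.79.

```python
import re

# Assumed row layout: "<b>^<N>+1 Time: <value> us/mul. ..." (or ms/mul).
ROW_RE = re.compile(r"^(\d+)\^(\d+)\+1\s+Time:\s+([\d.]+)\s+(us|ms)/mul\.")
MICROSECONDS = {"us": 1.0, "ms": 1000.0}


def time_per_mul_us(row: str) -> tuple[int, float]:
    """Return (exponent N, microseconds per multiplication) for one -b row."""
    m = ROW_RE.match(row.strip())
    if m is None:
        raise ValueError(f"unrecognized bench row: {row!r}")
    return int(m.group(2)), float(m.group(3)) * MICROSECONDS[m.group(4)]


risc = ["5683936^256+1 Time: 4.16 us/mul. Err: 0.0000 1730 digits",
        "1325824^32768+1 Time: 965 us/mul. Err: 0.0000 200622 digits",
        "874718^131072+1 Time: 6.38 ms/mul. Err: 0.0000 778813 digits"]
fftw = ["5683936^256+1 Time: 4.54 us/mul. Err: 0.0000 1730 digits",
        "1325824^32768+1 Time: 2.04 ms/mul. Err: 0.0000 200622 digits",
        "874718^131072+1 Time: 49.7 ms/mul. Err: 0.0000 778813 digits"]

for r_row, f_row in zip(risc, fftw):
    exponent, t_risc = time_per_mul_us(r_row)
    _, t_fftw = time_per_mul_us(f_row)
    print(f"N = {exponent:6d}: FFTW/RISC = {t_fftw / t_risc:.2f}")
```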
rogue (Volunteer developer)
Joined: 8 Sep 07 Posts: 1249 ID: 12001 Credit: 18,565,548 RAC: 0
I realized that -ffast-math causes problems, so I removed it. That doesn't change the relative benchmarks, but it does impact performance. I need to investigate.
Lexs (Volunteer developer)
Joined: 16 Mar 08 Posts: 61 ID: 20289 Credit: 49,033,000 RAC: 0
[quote]I realized that -ffast-math causes problems, so I removed it.[/quote]
I had a similar problem with -ffast-math when compiling phrot.
You could try enabling individually everything that -ffast-math turns on, that is:
-fno-math-errno
-fno-rounding-math
-fno-signaling-nans
-fcx-limited-range
-fno-signed-zeros
-fno-trapping-math
-freciprocal-math
-ffinite-math-only
and then disabling:
-fno-associative-math
which caused the trouble for phrot on PS3.
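One well-known way reassociation can break FFT-based code (this thread does not say whether genefer or phrot use it): a fast way to round a double to the nearest integer is to add and then subtract 1.5·2^52, which relies on the intermediate sum being rounded to a double. With -fassociative-math the compiler is allowed to regroup (x + C) - C as x + (C - C), which is just x, so the rounding silently disappears. Python floats are IEEE doubles and the interpreter never reassociates, so the sketch below writes the regrouped version out by hand to show the difference; the function names are hypothetical.

```python
# 1.5 * 2**52: doubles between 2**52 and 2**53 are exactly 1.0 apart.
ROUND_MAGIC = 1.5 * 2.0**52


def round_by_magic(x: float) -> float:
    """Round x to the nearest integer (ties to even) for |x| < 2**51.

    The rounding happens when x + ROUND_MAGIC is stored as a double.
    """
    return (x + ROUND_MAGIC) - ROUND_MAGIC


def round_after_reassociation(x: float) -> float:
    """The same expression regrouped as x + (C - C), as -fassociative-math permits."""
    return x + (ROUND_MAGIC - ROUND_MAGIC)


for x in (2.3, 2.7, -2.7, 2.5):
    print(f"{x:5}: magic -> {round_by_magic(x)}, reassociated -> {round_after_reassociation(x)}")
```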
Meanwhile I've changed the FFT to an in-place transform. It doesn't make much difference for the FFTW version, but the CUDA version uses the same scheme/plan and could save memory and transfer time there.
I also started an OpenCL_FFT version, but hell, this is really hard to set up.
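An in-place transform writes its result back into the input buffer, so no second array of the same size is needed; that is where the memory saving comes from, and on a GPU it also means one buffer fewer to allocate. A minimal Python sketch of the textbook iterative radix-2 FFT (a bit-reversal permutation followed by butterfly passes on the same list); it is not the FFTW or CUFFT code discussed here, and the function name is hypothetical.

```python
import cmath
import math


def fft_in_place(a: list) -> None:
    """Overwrite a (length a power of two) with its forward DFT."""
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    # Bit-reversal permutation, done by swapping entries within the list.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly passes of doubling span; every result is written back into a.
    size = 2
    while size <= n:
        step = cmath.exp(-2j * math.pi / size)
        half = size // 2
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
                w *= step
        size *= 2


data = [complex(v) for v in (1, 2, 3, 4, 0, 0, 0, 0)]
fft_in_place(data)  # data now holds the spectrum; no second list of this size was made
print([f"{z.real:.3f}{z.imag:+.3f}j" for z in data])
```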
Another interesting point would be SMP and/or multicore support within the FFT:
http://www.fftw.org/parallel/parallel-fftw.html
Something I discovered: when using FFTW's MEASURE or PATIENT mode instead of ESTIMATE, the residue calculation goes wrong (it comes out as zero). I don't know why.
The bad thing is that on Cell/PS3:
"The FFTW_ESTIMATE mode may produce seriously suboptimal plans, and it becomes particularly confused if you enable both the SPEs and Altivec. If you care about performance, please use FFTW_MEASURE or FFTW_PATIENT until we figure out a more reliable performance model."
http://fftw.org/cell/index.html
And some more times from yesterday's run on the PS3:
1076904^65536+1 is a probable composite. (RES=86b640061bf8dce8) (395325 digits) (err = 0.2500) (time = 1:38:40)
874718^131072+1 is a probable composite. (RES=80a0dd44bf881dbe) (778813 digits) (err = 0.2500) (time = 6:28:55)
710492^262144+1 is a probable composite. (RES=9c216faec7e33833) (1533952 digits) (err = 0.2500) (time = 26:01:26)
rogue (Volunteer developer)
Joined: 8 Sep 07 Posts: 1249 ID: 12001 Credit: 18,565,548 RAC: 0
The one causing the problems is -funsafe-math-optimizations, which is part of -ffast-math. Oddly, this does not cause any problems with phrot on my G5. Removing that option is a performance killer, but using it leads to invalid results.