PrimeGrid
Please visit donation page to help the project cover running costs for this month

Toggle Menu

Join PrimeGrid

Returning Participants

Community

Leader Boards

Results

Other

drummers-lowrise

Advanced search

Message boards : AP26 - AP27 Search : Improving the AP26 application

Author Message
geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 12603 - Posted: 8 Jan 2009 | 6:27:40 UTC

If there is anyone here with C programming experience or even just experience with compiling C programs, you might be interested in trying to make the AP26 application run faster.

The other primality testing and sieving programs that PrimeGrid currently uses all employ assembly language in the critical sections and so tend to be unresponsive to optimisations made by the C compiler, but the important part of the AP26 program is implemented in plain C, and so the application could potentially be made faster by using a different compiler or better optimisation settings.

The original source is at http://www.math.uni.wroc.pl/~jwr/AP26/ with the modifications for BOINC at http://www.geocities.com/g_w_reynolds/AP26/. I have tried to make the modified source so that it can be compiled with the Intel or Microsoft compilers, but I haven't tested with anything other than GCC. If there are any incompatibilities I can probably help to fix them though.

MenipeProject donor
Volunteer tester
Send message
Joined: 2 Jan 08
Posts: 205
ID: 17041
Credit: 37,154,455
RAC: 36,596
321 LLR Gold: Earned 500,000 credits (645,811)Cullen LLR Gold: Earned 500,000 credits (666,506)ESP LLR Gold: Earned 500,000 credits (555,746)Generalized Cullen/Woodall LLR Gold: Earned 500,000 credits (555,123)PPS LLR Ruby: Earned 2,000,000 credits (3,773,468)PSP LLR Gold: Earned 500,000 credits (664,246)SoB LLR Gold: Earned 500,000 credits (631,080)SR5 LLR Gold: Earned 500,000 credits (606,500)SGS LLR Amethyst: Earned 1,000,000 credits (1,324,903)TPS LLR (retired) Bronze: Earned 10,000 credits (30,981)TRP LLR Gold: Earned 500,000 credits (837,635)Woodall LLR Gold: Earned 500,000 credits (502,434)321 Sieve Amethyst: Earned 1,000,000 credits (1,276,372)Cullen/Woodall Sieve (suspended) Ruby: Earned 2,000,000 credits (2,919,475)Generalized Cullen/Woodall Sieve (suspended) Amethyst: Earned 1,000,000 credits (1,410,644)PPS Sieve Jade: Earned 10,000,000 credits (13,945,392)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Gold: Earned 500,000 credits (712,299)TRP Sieve (suspended) Amethyst: Earned 1,000,000 credits (1,064,240)AP 26/27 Gold: Earned 500,000 credits (676,543)GFN Ruby: Earned 2,000,000 credits (3,312,604)PSA Amethyst: Earned 1,000,000 credits (1,041,391)
Message 12605 - Posted: 8 Jan 2009 | 14:07:11 UTC - in response to Message 12603.

I will give it a try to compile for Windows x86/x86_64 with Visual Studio 2008
____________

Rytis
Volunteer moderator
Project administrator
Avatar
Send message
Joined: 22 Jun 05
Posts: 2639
ID: 1
Credit: 21,279,785
RAC: 6,497
321 LLR Silver: Earned 100,000 credits (104,475)Cullen LLR Silver: Earned 100,000 credits (291,372)ESP LLR Bronze: Earned 10,000 credits (18,156)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (15,259)PPS LLR Silver: Earned 100,000 credits (113,978)PSP LLR Silver: Earned 100,000 credits (116,517)SoB LLR Silver: Earned 100,000 credits (151,232)SR5 LLR Bronze: Earned 10,000 credits (14,071)SGS LLR Silver: Earned 100,000 credits (100,082)TPS LLR (retired) Silver: Earned 100,000 credits (111,607)TRP LLR Jade: Earned 10,000,000 credits (15,976,138)Woodall LLR Silver: Earned 100,000 credits (101,463)321 Sieve Silver: Earned 100,000 credits (201,501)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (214,653)Generalized Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (14,200)PPS Sieve Silver: Earned 100,000 credits (302,417)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (200,232)TRP Sieve (suspended) Ruby: Earned 2,000,000 credits (2,453,872)AP 26/27 Silver: Earned 100,000 credits (473,058)GFN Silver: Earned 100,000 credits (163,887)PSA Bronze: Earned 10,000 credits (97,541)
Message 12608 - Posted: 8 Jan 2009 | 15:57:43 UTC
Last modified: 8 Jan 2009 | 15:58:11 UTC

I attempted to compile it with VS2005, and it fails due to compiler bug. It will be very interesting to know if VS2008 has it fixed (the compiler thinks there are too many nested loops whereas there isn't and compile fails).

BTW, it would be great if you could try to build a native BOINC version.
____________

MenipeProject donor
Volunteer tester
Send message
Joined: 2 Jan 08
Posts: 205
ID: 17041
Credit: 37,154,455
RAC: 36,596
321 LLR Gold: Earned 500,000 credits (645,811)Cullen LLR Gold: Earned 500,000 credits (666,506)ESP LLR Gold: Earned 500,000 credits (555,746)Generalized Cullen/Woodall LLR Gold: Earned 500,000 credits (555,123)PPS LLR Ruby: Earned 2,000,000 credits (3,773,468)PSP LLR Gold: Earned 500,000 credits (664,246)SoB LLR Gold: Earned 500,000 credits (631,080)SR5 LLR Gold: Earned 500,000 credits (606,500)SGS LLR Amethyst: Earned 1,000,000 credits (1,324,903)TPS LLR (retired) Bronze: Earned 10,000 credits (30,981)TRP LLR Gold: Earned 500,000 credits (837,635)Woodall LLR Gold: Earned 500,000 credits (502,434)321 Sieve Amethyst: Earned 1,000,000 credits (1,276,372)Cullen/Woodall Sieve (suspended) Ruby: Earned 2,000,000 credits (2,919,475)Generalized Cullen/Woodall Sieve (suspended) Amethyst: Earned 1,000,000 credits (1,410,644)PPS Sieve Jade: Earned 10,000,000 credits (13,945,392)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Gold: Earned 500,000 credits (712,299)TRP Sieve (suspended) Amethyst: Earned 1,000,000 credits (1,064,240)AP 26/27 Gold: Earned 500,000 credits (676,543)GFN Ruby: Earned 2,000,000 credits (3,312,604)PSA Amethyst: Earned 1,000,000 credits (1,041,391)
Message 12609 - Posted: 8 Jan 2009 | 16:10:28 UTC - in response to Message 12608.

I have been able to compile with VS2008, but when I run the code it tells me "long double type is too narrow".

When I print out the sizeof(long double) it tells me it is 8. It must be something with the way the compiler is treating doubles.

Once I can get it working I will attempt a native BOINC version.

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 12674 - Posted: 11 Jan 2009 | 1:32:41 UTC - in response to Message 12609.

I have been able to compile with VS2008, but when I run the code it tells me "long double type is too narrow".


The long double code was added to replace the GMP library functions used in the original source.

I have uploaded a new version or the source (11 January) that might get around this problem. Instead of using the C compiler's long double type, it uses inline assembly to do the necessary FPU calculations directly.

The new code is used when NO_LONG_DOUBLE is defined, so it can be tested with GCC by compiling as

gcc -O3 -Wall -DNO_LONG_DOUBLE -o AP26 AP26-boinc.c

I have little experience with the Microsoft inline assembly format so let me know if there are any syntax errors. It should be compatible with both 32 and 64-bit compilers as only x87 FPU instructions are used in the inline assembly, everything else is done in C.

HonzaProject donor
Volunteer moderator
Volunteer tester
Project scientist
Send message
Joined: 15 Aug 05
Posts: 1837
ID: 352
Credit: 2,594,195,029
RAC: 1,162,016
Discovered 5 mega primesEliminated 3 conjecture "k"sFound 2 primes in the 2018 Tour de PrimesFound 1 prime in the 2018 Tour de Primes Mountain Stage2019 Tour de Primes largest primeFound 4 primes in the 2019 Tour de PrimesFound 1 mega prime in the 2019 Tour de PrimesFound 1 prime in the 2019 Tour de Primes Mountain Stage321 LLR Emerald: Earned 50,000,000 credits (50,556,284)Cullen LLR Emerald: Earned 50,000,000 credits (50,296,190)ESP LLR Emerald: Earned 50,000,000 credits (50,853,190)Generalized Cullen/Woodall LLR Emerald: Earned 50,000,000 credits (50,309,119)PPS LLR Double Bronze: Earned 100,000,000 credits (100,157,974)PSP LLR Sapphire: Earned 20,000,000 credits (23,462,639)SoB LLR Emerald: Earned 50,000,000 credits (67,865,553)SR5 LLR Emerald: Earned 50,000,000 credits (50,320,316)SGS LLR Sapphire: Earned 20,000,000 credits (22,598,870)TPS LLR (retired) Bronze: Earned 10,000 credits (43,033)TRP LLR Emerald: Earned 50,000,000 credits (57,441,313)Woodall LLR Emerald: Earned 50,000,000 credits (50,823,915)321 Sieve Sapphire: Earned 20,000,000 credits (49,043,456)Cullen/Woodall Sieve (suspended) Ruby: Earned 2,000,000 credits (4,142,109)Generalized Cullen/Woodall Sieve (suspended) Emerald: Earned 50,000,000 credits (50,504,945)PPS Sieve Double Gold: Earned 500,000,000 credits (504,377,255)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Sapphire: Earned 20,000,000 credits (20,288,222)TRP Sieve (suspended) Sapphire: Earned 20,000,000 credits (20,149,354)AP 26/27 Double Silver: Earned 200,000,000 credits (214,118,887)GFN Double Gold: Earned 500,000,000 credits (625,583,002)PSA Double Gold: Earned 500,000,000 credits (531,254,992)
Message 12678 - Posted: 11 Jan 2009 | 8:44:24 UTC

Test under Win x64 went fine (comparing agains TEST-366384.txt)
Will AP26 be added as supported platform for Windows?
____________
My stats
Badge score: 1*1 + 5*1 + 7*1 + 8*8 + 9*7 + 11*1 + 12*3 = 187

Rytis
Volunteer moderator
Project administrator
Avatar
Send message
Joined: 22 Jun 05
Posts: 2639
ID: 1
Credit: 21,279,785
RAC: 6,497
321 LLR Silver: Earned 100,000 credits (104,475)Cullen LLR Silver: Earned 100,000 credits (291,372)ESP LLR Bronze: Earned 10,000 credits (18,156)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (15,259)PPS LLR Silver: Earned 100,000 credits (113,978)PSP LLR Silver: Earned 100,000 credits (116,517)SoB LLR Silver: Earned 100,000 credits (151,232)SR5 LLR Bronze: Earned 10,000 credits (14,071)SGS LLR Silver: Earned 100,000 credits (100,082)TPS LLR (retired) Silver: Earned 100,000 credits (111,607)TRP LLR Jade: Earned 10,000,000 credits (15,976,138)Woodall LLR Silver: Earned 100,000 credits (101,463)321 Sieve Silver: Earned 100,000 credits (201,501)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (214,653)Generalized Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (14,200)PPS Sieve Silver: Earned 100,000 credits (302,417)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (200,232)TRP Sieve (suspended) Ruby: Earned 2,000,000 credits (2,453,872)AP 26/27 Silver: Earned 100,000 credits (473,058)GFN Silver: Earned 100,000 credits (163,887)PSA Bronze: Earned 10,000 credits (97,541)
Message 12691 - Posted: 11 Jan 2009 | 18:18:18 UTC - in response to Message 12678.

Will AP26 be added as supported platform for Windows?

Once we have a native-BOINC 64bit application compiled, yes.

Honza, have you compiled it yourself? It would be nice if you could share :)
____________

HonzaProject donor
Volunteer moderator
Volunteer tester
Project scientist
Send message
Joined: 15 Aug 05
Posts: 1837
ID: 352
Credit: 2,594,195,029
RAC: 1,162,016
Discovered 5 mega primesEliminated 3 conjecture "k"sFound 2 primes in the 2018 Tour de PrimesFound 1 prime in the 2018 Tour de Primes Mountain Stage2019 Tour de Primes largest primeFound 4 primes in the 2019 Tour de PrimesFound 1 mega prime in the 2019 Tour de PrimesFound 1 prime in the 2019 Tour de Primes Mountain Stage321 LLR Emerald: Earned 50,000,000 credits (50,556,284)Cullen LLR Emerald: Earned 50,000,000 credits (50,296,190)ESP LLR Emerald: Earned 50,000,000 credits (50,853,190)Generalized Cullen/Woodall LLR Emerald: Earned 50,000,000 credits (50,309,119)PPS LLR Double Bronze: Earned 100,000,000 credits (100,157,974)PSP LLR Sapphire: Earned 20,000,000 credits (23,462,639)SoB LLR Emerald: Earned 50,000,000 credits (67,865,553)SR5 LLR Emerald: Earned 50,000,000 credits (50,320,316)SGS LLR Sapphire: Earned 20,000,000 credits (22,598,870)TPS LLR (retired) Bronze: Earned 10,000 credits (43,033)TRP LLR Emerald: Earned 50,000,000 credits (57,441,313)Woodall LLR Emerald: Earned 50,000,000 credits (50,823,915)321 Sieve Sapphire: Earned 20,000,000 credits (49,043,456)Cullen/Woodall Sieve (suspended) Ruby: Earned 2,000,000 credits (4,142,109)Generalized Cullen/Woodall Sieve (suspended) Emerald: Earned 50,000,000 credits (50,504,945)PPS Sieve Double Gold: Earned 500,000,000 credits (504,377,255)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Sapphire: Earned 20,000,000 credits (20,288,222)TRP Sieve (suspended) Sapphire: Earned 20,000,000 credits (20,149,354)AP 26/27 Double Silver: Earned 200,000,000 credits (214,118,887)GFN Double Gold: Earned 500,000,000 credits (625,583,002)PSA Double Gold: Earned 500,000,000 credits (531,254,992)
Message 12707 - Posted: 12 Jan 2009 | 12:33:56 UTC - in response to Message 12691.

Honza, have you compiled it yourself? It would be nice if you could share :)

I took the one provided in binary.

Just checked if I can complile AP26 myself...
(I struggled with psieve and gave up earlier due to syntax errors).

This time, Geoffrey's suggested commands worked when merged with Jaroslaw Wroblewski's libs. I used MinGW for Win x64.

Unfortunately, app is painfully slow - 15 mins vs. 6 mins.
From output, this line is missing so I guess there is not FPU involved as should be.
Found the FPU in double precision mode, switching to extended precision mode.

I'm not experienced with compiling apps. I may try a bit more later...
____________
My stats
Badge score: 1*1 + 5*1 + 7*1 + 8*8 + 9*7 + 11*1 + 12*3 = 187

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 12710 - Posted: 12 Jan 2009 | 16:39:46 UTC - in response to Message 12707.


Unfortunately, app is painfully slow - 15 mins vs. 6 mins.


That is quite strange. I thought that the only issue was the primality checking procedure.

I testrun original version of the program with GMP prime testing.

Then I replaced prime testing procedure with dummy procedure:

int PrimeQ(long long value){return (int)(value+1)%2;}

This should in no time mark all numbers as composite (and compiler has no way of knowing that only odd integers are delivered to PrimeQ as arguments). Hence I would expect the program do all the stuff except primality checking and produce no output.

The timerun gain by this change was 8.3%. That means that the original version uses 91.7% for sieving and 8.3% for primality testing. Therefore it should be quite immune for inefficiency of primality testing part.

Can you check how much time it takes to run the program with the dummy primality procedure above? It can be used to trace whether inefficiency is on the side of primality checks or the rest.

HonzaProject donor
Volunteer moderator
Volunteer tester
Project scientist
Send message
Joined: 15 Aug 05
Posts: 1837
ID: 352
Credit: 2,594,195,029
RAC: 1,162,016
Discovered 5 mega primesEliminated 3 conjecture "k"sFound 2 primes in the 2018 Tour de PrimesFound 1 prime in the 2018 Tour de Primes Mountain Stage2019 Tour de Primes largest primeFound 4 primes in the 2019 Tour de PrimesFound 1 mega prime in the 2019 Tour de PrimesFound 1 prime in the 2019 Tour de Primes Mountain Stage321 LLR Emerald: Earned 50,000,000 credits (50,556,284)Cullen LLR Emerald: Earned 50,000,000 credits (50,296,190)ESP LLR Emerald: Earned 50,000,000 credits (50,853,190)Generalized Cullen/Woodall LLR Emerald: Earned 50,000,000 credits (50,309,119)PPS LLR Double Bronze: Earned 100,000,000 credits (100,157,974)PSP LLR Sapphire: Earned 20,000,000 credits (23,462,639)SoB LLR Emerald: Earned 50,000,000 credits (67,865,553)SR5 LLR Emerald: Earned 50,000,000 credits (50,320,316)SGS LLR Sapphire: Earned 20,000,000 credits (22,598,870)TPS LLR (retired) Bronze: Earned 10,000 credits (43,033)TRP LLR Emerald: Earned 50,000,000 credits (57,441,313)Woodall LLR Emerald: Earned 50,000,000 credits (50,823,915)321 Sieve Sapphire: Earned 20,000,000 credits (49,043,456)Cullen/Woodall Sieve (suspended) Ruby: Earned 2,000,000 credits (4,142,109)Generalized Cullen/Woodall Sieve (suspended) Emerald: Earned 50,000,000 credits (50,504,945)PPS Sieve Double Gold: Earned 500,000,000 credits (504,377,255)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Sapphire: Earned 20,000,000 credits (20,288,222)TRP Sieve (suspended) Sapphire: Earned 20,000,000 credits (20,149,354)AP 26/27 Double Silver: Earned 200,000,000 credits (214,118,887)GFN Double Gold: Earned 500,000,000 credits (625,583,002)PSA Double Gold: Earned 500,000,000 credits (531,254,992)
Message 12711 - Posted: 12 Jan 2009 | 19:57:06 UTC - in response to Message 12710.
Last modified: 12 Jan 2009 | 19:57:30 UTC

That is quite strange. I thought that the only issue was the primality checking procedure.
I think the issue is that I'm unexperienced with compiling and this is why app that I compiled using MinGW is so slow.
I'm happy that I was able to make it running...
____________
My stats
Badge score: 1*1 + 5*1 + 7*1 + 8*8 + 9*7 + 11*1 + 12*3 = 187

MenipeProject donor
Volunteer tester
Send message
Joined: 2 Jan 08
Posts: 205
ID: 17041
Credit: 37,154,455
RAC: 36,596
321 LLR Gold: Earned 500,000 credits (645,811)Cullen LLR Gold: Earned 500,000 credits (666,506)ESP LLR Gold: Earned 500,000 credits (555,746)Generalized Cullen/Woodall LLR Gold: Earned 500,000 credits (555,123)PPS LLR Ruby: Earned 2,000,000 credits (3,773,468)PSP LLR Gold: Earned 500,000 credits (664,246)SoB LLR Gold: Earned 500,000 credits (631,080)SR5 LLR Gold: Earned 500,000 credits (606,500)SGS LLR Amethyst: Earned 1,000,000 credits (1,324,903)TPS LLR (retired) Bronze: Earned 10,000 credits (30,981)TRP LLR Gold: Earned 500,000 credits (837,635)Woodall LLR Gold: Earned 500,000 credits (502,434)321 Sieve Amethyst: Earned 1,000,000 credits (1,276,372)Cullen/Woodall Sieve (suspended) Ruby: Earned 2,000,000 credits (2,919,475)Generalized Cullen/Woodall Sieve (suspended) Amethyst: Earned 1,000,000 credits (1,410,644)PPS Sieve Jade: Earned 10,000,000 credits (13,945,392)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Gold: Earned 500,000 credits (712,299)TRP Sieve (suspended) Amethyst: Earned 1,000,000 credits (1,064,240)AP 26/27 Gold: Earned 500,000 credits (676,543)GFN Ruby: Earned 2,000,000 credits (3,312,604)PSA Amethyst: Earned 1,000,000 credits (1,041,391)
Message 12712 - Posted: 12 Jan 2009 | 22:03:07 UTC - in response to Message 12711.

After playing with the code to compile, I have managed to compile with Visual Studio 2008. The run time for the test is about 42 minutes on a Celeron Core 2 Duo 1.83Mhz.

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 12714 - Posted: 13 Jan 2009 | 5:01:09 UTC - in response to Message 12712.

After playing with the code to compile, I have managed to compile with Visual Studio 2008. The run time for the test is about 42 minutes on a Celeron Core 2 Duo 1.83Mhz.


I assume that is for a 32-bit executable. Could you compare it with my 32-bit GCC executable (AP26-i686-windows.exe) on the same machine?

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 12715 - Posted: 13 Jan 2009 | 5:19:25 UTC - in response to Message 12710.


Unfortunately, app is painfully slow - 15 mins vs. 6 mins.


That is quite strange. I thought that the only issue was the primality checking procedure.


On my C2D at 2.66GHz in 64-bit mode the test range took 303 seconds with the original GMP PrimeQ() and 287 seconds with the replacement long double PrimeQ(), so the replacement doesn't make a huge difference.

But if something goes wrong with the compilation so that PrimeQ() returns false positives, then that can make it run much slower even though the program might still produce correct output.

If anyone can compile the GMP library for Windows then try using that to compile the original source, or add -DGMP to the command line with my source. Even though the GMP routine is a little slower, an improvement in compiling the rest of the program might more than compensate.

MenipeProject donor
Volunteer tester
Send message
Joined: 2 Jan 08
Posts: 205
ID: 17041
Credit: 37,154,455
RAC: 36,596
321 LLR Gold: Earned 500,000 credits (645,811)Cullen LLR Gold: Earned 500,000 credits (666,506)ESP LLR Gold: Earned 500,000 credits (555,746)Generalized Cullen/Woodall LLR Gold: Earned 500,000 credits (555,123)PPS LLR Ruby: Earned 2,000,000 credits (3,773,468)PSP LLR Gold: Earned 500,000 credits (664,246)SoB LLR Gold: Earned 500,000 credits (631,080)SR5 LLR Gold: Earned 500,000 credits (606,500)SGS LLR Amethyst: Earned 1,000,000 credits (1,324,903)TPS LLR (retired) Bronze: Earned 10,000 credits (30,981)TRP LLR Gold: Earned 500,000 credits (837,635)Woodall LLR Gold: Earned 500,000 credits (502,434)321 Sieve Amethyst: Earned 1,000,000 credits (1,276,372)Cullen/Woodall Sieve (suspended) Ruby: Earned 2,000,000 credits (2,919,475)Generalized Cullen/Woodall Sieve (suspended) Amethyst: Earned 1,000,000 credits (1,410,644)PPS Sieve Jade: Earned 10,000,000 credits (13,945,392)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Gold: Earned 500,000 credits (712,299)TRP Sieve (suspended) Amethyst: Earned 1,000,000 credits (1,064,240)AP 26/27 Gold: Earned 500,000 credits (676,543)GFN Ruby: Earned 2,000,000 credits (3,312,604)PSA Amethyst: Earned 1,000,000 credits (1,041,391)
Message 12719 - Posted: 13 Jan 2009 | 14:04:47 UTC - in response to Message 12714.

Using the 32-bit GCC executable (AP26-i686-windows.exe) on the same machine took a little more than 41 minutes. Almost idential times.

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 12737 - Posted: 14 Jan 2009 | 4:22:15 UTC

I had made a small optimisation to SITO.H but forgot to include it with the
modified source. I have uploaded new source (14 January) with the modified SITO.H renamed to SITO-boinc.H

I read somewhere that the MS compiler can't use inline assembly in 64-bit mode, is that still true? I can write the assembly routines as external functions if that is necessary.

sesef
Send message
Joined: 11 Nov 08
Posts: 26
ID: 31577
Credit: 1,299,123
RAC: 0
321 LLR Bronze: Earned 10,000 credits (12,687)PPS LLR Silver: Earned 100,000 credits (127,235)SGS LLR Bronze: Earned 10,000 credits (37,872)TRP LLR Bronze: Earned 10,000 credits (40,627)Woodall LLR Bronze: Earned 10,000 credits (24,883)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,692)TRP Sieve (suspended) Silver: Earned 100,000 credits (271,736)AP 26/27 Gold: Earned 500,000 credits (762,684)
Message 12807 - Posted: 16 Jan 2009 | 20:44:37 UTC

Hello

Are you sure "int validate_ap26(int k, int d, int64_t f)" work fine?

I have compiled app on windows with GMP (used GMP port for Windows), and it didn't work correct for me:(

In SOL-AP26.txt i found only this:
10 366384 16137071418665303
11 366384 7315382125981751
11 366384 3007107694524497
12 366384 2248956393265657

most of "corrects" results marked as Non-Solution

For example app from Jarek's site return correct results (same GMP lib, same compilator ect)

Reslts:
10 366384 785436737771879
10 366384 16137071418665303
10 366384 13946622976437587
10 366384 8734502163109357
11 366384 6998839830228583
10 366384 785468127438811
11 366384 5581880890832023
25 366384 6171054912832631
10 366384 9684491327954279
10 366384 3267586960583089
10 366384 1459178643530617
10 366384 9735298495263823
10 366384 1727805601738891
10 366384 1574579779396307
10 366384 2138118918508471
11 366384 10613929284401483
10 366384 14083229671187167
11 366384 16518632605478693
11 366384 3007107694524497
10 366384 1255892682660361
11 366384 7036595108074969
10 366384 8188609810134857
10 366384 2411963918614357
11 366384 6569259450263561
10 366384 13700995611657901
14 366384 15879407069784169
11 366384 6250872237076277
10 366384 7687488261269507
10 366384 660669773389609
10 366384 11977568522771779
10 366384 1540799946122147
12 366384 14782924219657043
10 366384 13812949836154363
11 366384 2003229604981763
10 366384 14531182366753699
13 366384 15731878523384263
13 366384 2167218735183577

That show GMP port is not problem, something in app is wrong.

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 12808 - Posted: 16 Jan 2009 | 21:04:32 UTC - in response to Message 12807.
Last modified: 16 Jan 2009 | 21:43:06 UTC

I think that in this case primality checking inside the program went wrong.

For example you got

11 366384 7315382125981751

while it is AP11 containing last 11 terms of AP25.

The program checks primality back and forth so it shouldn't report AP as shorter than it actually is.

On the other hand there must be something else wrong, since the program shouldn't find this AP11 if it didn't encounter primes earlier. At the moment I do not understand that - failure of primality check by itself doesn't explain this.


EDIT:

OK, I understand, an extra code (I wasn't aware of) after
/* Check trailing terms */
explains why the tail of AP25 could show up.

At this point I can believe that primality checks inside the program go somewhat wrong and validate_ap26 disagrees with the findings.

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 12813 - Posted: 17 Jan 2009 | 5:03:44 UTC - in response to Message 12808.

After getting the exact printout from Sesef, I can confirm that the program he has compiled, finds AP's correctly, but validate_ap26 refuses them, for reasons I do not understand at the moment - there must be something wrong with primality check inside validate_ap26 and primes are recognized as composites.

We will try to pinpoint the problem.

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 12820 - Posted: 17 Jan 2009 | 15:42:10 UTC - in response to Message 12813.

We have managed to localize Sesef's problems.

The problem was caused by

mpz_set_ui(mpN,N);

which instead of converting 64-bit integer N to GMP, truncated it to 32 bits.

This could be cured by using universal conversion instead:

mpz_set_ui(mpN,N>>32);
mpz_mul_2exp(mpN,mpN,32);
mpz_add_ui(mpN,mpN,N&(INT64_C(4294967295)));

Sesef used Visual Studio and apparently the 32/64 bit stuff was messed up there. Despite that the above behaviour is characteristic to 32 bits, the conversion used in my PrimeQ64.h works fine, although it must have 64 bit system.

Oddly enough, attemp to read GMP_LIMB_BITS gave no clear answer as to its value !!!

printf("GMP_LIMB_BITS: %d\n",GMP_LIMB_BITS);
gave 64, while at compile time it appeared to be 32.

According to Visual Studio, Sesef used, only parts of the code corresponding to 32 bits should be compiled.

At the original version of the program

# include <gmp.h>
# if (GMP_LIMB_BITS==64)
# include "PrimeQ64.h"
# elif (GMP_LIMB_BITS==32)
# include "PrimeQ32.h"
# endif

Visual Studio marked that PrimeQ32.h will be compiled, but did compile PrimeQ64.h (since PrimeQ32.h was not included at all in the compiled source).

The bottom line is: This GMP_LIMB_BITS may be messed up and one must be very carefull.

Sesef is now trying to optimize the compilation and include all the BOINC stuff.

sesef
Send message
Joined: 11 Nov 08
Posts: 26
ID: 31577
Credit: 1,299,123
RAC: 0
321 LLR Bronze: Earned 10,000 credits (12,687)PPS LLR Silver: Earned 100,000 credits (127,235)SGS LLR Bronze: Earned 10,000 credits (37,872)TRP LLR Bronze: Earned 10,000 credits (40,627)Woodall LLR Bronze: Earned 10,000 credits (24,883)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,692)TRP Sieve (suspended) Silver: Earned 100,000 credits (271,736)AP 26/27 Gold: Earned 500,000 credits (762,684)
Message 12826 - Posted: 18 Jan 2009 | 2:19:48 UTC

I compiled application with all Boinc stuff. To compile i used Vistual Studio 2008 Team System, Intel Compiler and GMP port for Windows by Brian Gladman (http://fp.gladman.plus.com/computing/gmp4win.htm)

You can download it here:
http://www.ee.pw.edu.pl/~jaworows/primegrid/AP26_WINx64.zip

If you get any error first try install this:
http://www.microsoft.com/downloads/details.aspx?familyid=BA9257CA-337F-4B40-8C14-157CFDFFEE4E&displaylang=en


---------------------------------------------
My pc (Athlon 3200+ 2.0 Ghz@2.7 Ghz) test params took in 4:50 (290 sec)

---------------------------------------------
Have a nice tests :)

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 12827 - Posted: 18 Jan 2009 | 7:58:04 UTC - in response to Message 12826.
Last modified: 18 Jan 2009 | 8:00:46 UTC

Since in various versions of the program, linux GMP primality tests are being replaced by some other routines depending on how compiler handles e.g. long double or GMP_LIMB_BITS, it would be a good idea to test how it works on larger numbers. The original test for K=366,384 doesn't use as large numbers as may appear during the search.

The closest limitation of the current AP26 program with respect to possible overflows is:

K cannot exceed 76,419,953
This a distant problem, would appear around 116,000,000 WU, the search is highly unlikely to go that high.

Cure for this very remote problem is:
In MAKES.H replace all the expressions of the form
(j*STEP)%541
with corresponding
(j*(STEP%541))%541
but this is not a serious issue at the moment.

In order to test whether primality checks are working correctly for the numbers of size we may expect to encounter in forseeable future, we should testrun the program for the highest of expected parameters.

Say, K=76,000,000, SHIFT=640.
Since so large K wouldn't be matched with so large SHIFT anyway, this is more than enough for current needs.

For this test AP26-ini.txt should contain
76000000 76000000 640

Run of the original GMP program produced:
11 76000000 237196596534720983
10 76000000 261285664232309413
11 76000000 245863618658643989
11 76000000 255672492975704803
11 76000000 246758454654478271
12 76000000 239683100514196253
10 76000000 261577529875648267
11 76000000 265146928947598571
13 76000000 219960250410820847
10 76000000 264184740340459621
10 76000000 254437677249184591
10 76000000 262024996718320189
11 76000000 235066907929601569
11 76000000 241339505938284671

Any other version should give identical results.

Rytis
Volunteer moderator
Project administrator
Avatar
Send message
Joined: 22 Jun 05
Posts: 2639
ID: 1
Credit: 21,279,785
RAC: 6,497
321 LLR Silver: Earned 100,000 credits (104,475)Cullen LLR Silver: Earned 100,000 credits (291,372)ESP LLR Bronze: Earned 10,000 credits (18,156)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (15,259)PPS LLR Silver: Earned 100,000 credits (113,978)PSP LLR Silver: Earned 100,000 credits (116,517)SoB LLR Silver: Earned 100,000 credits (151,232)SR5 LLR Bronze: Earned 10,000 credits (14,071)SGS LLR Silver: Earned 100,000 credits (100,082)TPS LLR (retired) Silver: Earned 100,000 credits (111,607)TRP LLR Jade: Earned 10,000,000 credits (15,976,138)Woodall LLR Silver: Earned 100,000 credits (101,463)321 Sieve Silver: Earned 100,000 credits (201,501)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (214,653)Generalized Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (14,200)PPS Sieve Silver: Earned 100,000 credits (302,417)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (200,232)TRP Sieve (suspended) Ruby: Earned 2,000,000 credits (2,453,872)AP 26/27 Silver: Earned 100,000 credits (473,058)GFN Silver: Earned 100,000 credits (163,887)PSA Bronze: Earned 10,000 credits (97,541)
Message 12831 - Posted: 18 Jan 2009 | 11:54:37 UTC - in response to Message 12826.

If you get any error first try install this:
http://www.microsoft.com/downloads/details.aspx?familyid=BA9257CA-337F-4B40-8C14-157CFDFFEE4E&displaylang=en

Sadly, this is not acceptable for public release. I'm not sure if there are any differences between VS2005 and VS2008, but you should try to change project settings: C/C++ -> Code Generation -> Runtime Library -> Multi-threaded (/MT). Should remove the dependency on a few DLLs.

Try getting Dependency Walker (http://www.dependencywalker.com/), these compilation settings should remove the dependency on MSVCR90.DLL and LIBMMD.DLL.
____________

sesef
Send message
Joined: 11 Nov 08
Posts: 26
ID: 31577
Credit: 1,299,123
RAC: 0
321 LLR Bronze: Earned 10,000 credits (12,687)PPS LLR Silver: Earned 100,000 credits (127,235)SGS LLR Bronze: Earned 10,000 credits (37,872)TRP LLR Bronze: Earned 10,000 credits (40,627)Woodall LLR Bronze: Earned 10,000 credits (24,883)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,692)TRP Sieve (suspended) Silver: Earned 100,000 credits (271,736)AP 26/27 Gold: Earned 500,000 credits (762,684)
Message 12834 - Posted: 18 Jan 2009 | 12:35:01 UTC - in response to Message 12831.
Last modified: 18 Jan 2009 | 12:38:30 UTC

nvm this.

Little mistake.

sesef
Send message
Joined: 11 Nov 08
Posts: 26
ID: 31577
Credit: 1,299,123
RAC: 0
321 LLR Bronze: Earned 10,000 credits (12,687)PPS LLR Silver: Earned 100,000 credits (127,235)SGS LLR Bronze: Earned 10,000 credits (37,872)TRP LLR Bronze: Earned 10,000 credits (40,627)Woodall LLR Bronze: Earned 10,000 credits (24,883)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,692)TRP Sieve (suspended) Silver: Earned 100,000 credits (271,736)AP 26/27 Gold: Earned 500,000 credits (762,684)
Message 12836 - Posted: 18 Jan 2009 | 14:03:42 UTC

OK I recompiled it with /MT

http://www.ee.pw.edu.pl/~jaworows/primegrid/AP26_WINx64v2.zip

Profile BerndBrotProject donor
Volunteer tester
Avatar
Send message
Joined: 5 May 07
Posts: 110
ID: 7879
Credit: 21,318,408
RAC: 0
321 LLR Silver: Earned 100,000 credits (175,485)Cullen LLR Silver: Earned 100,000 credits (109,161)PPS LLR Amethyst: Earned 1,000,000 credits (1,017,344)PSP LLR Silver: Earned 100,000 credits (187,807)SoB LLR Silver: Earned 100,000 credits (175,519)SGS LLR Amethyst: Earned 1,000,000 credits (1,012,353)TPS LLR (retired) Bronze: Earned 10,000 credits (56,227)TRP LLR Silver: Earned 100,000 credits (446,036)Woodall LLR Silver: Earned 100,000 credits (122,789)321 Sieve Silver: Earned 100,000 credits (102,340)Cullen/Woodall Sieve (suspended) Ruby: Earned 2,000,000 credits (4,563,140)PPS Sieve Turquoise: Earned 5,000,000 credits (6,982,689)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (310,926)TRP Sieve (suspended) Silver: Earned 100,000 credits (109,509)AP 26/27 Silver: Earned 100,000 credits (107,429)GFN Bronze: Earned 10,000 credits (96,331)PSA Turquoise: Earned 5,000,000 credits (5,743,925)
Message 12838 - Posted: 18 Jan 2009 | 15:01:54 UTC - in response to Message 12836.
Last modified: 18 Jan 2009 | 15:10:20 UTC

OK I recompiled it with /MT

http://www.ee.pw.edu.pl/~jaworows/primegrid/AP26_WINx64v2.zip



Just testing with Vista x64


Can you explain what it do?


____________

Rytis
Volunteer moderator
Project administrator
Avatar
Send message
Joined: 22 Jun 05
Posts: 2639
ID: 1
Credit: 21,279,785
RAC: 6,497
321 LLR Silver: Earned 100,000 credits (104,475)Cullen LLR Silver: Earned 100,000 credits (291,372)ESP LLR Bronze: Earned 10,000 credits (18,156)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (15,259)PPS LLR Silver: Earned 100,000 credits (113,978)PSP LLR Silver: Earned 100,000 credits (116,517)SoB LLR Silver: Earned 100,000 credits (151,232)SR5 LLR Bronze: Earned 10,000 credits (14,071)SGS LLR Silver: Earned 100,000 credits (100,082)TPS LLR (retired) Silver: Earned 100,000 credits (111,607)TRP LLR Jade: Earned 10,000,000 credits (15,976,138)Woodall LLR Silver: Earned 100,000 credits (101,463)321 Sieve Silver: Earned 100,000 credits (201,501)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (214,653)Generalized Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (14,200)PPS Sieve Silver: Earned 100,000 credits (302,417)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (200,232)TRP Sieve (suspended) Ruby: Earned 2,000,000 credits (2,453,872)AP 26/27 Silver: Earned 100,000 credits (473,058)GFN Silver: Earned 100,000 credits (163,887)PSA Bronze: Earned 10,000 credits (97,541)
Message 12840 - Posted: 18 Jan 2009 | 16:37:29 UTC
Last modified: 18 Jan 2009 | 17:29:39 UTC

Seems to be running fine in standalone. I'm making this public version. Thanks, Sesef!

[edit] Well, for some reason Windows versions is twice as slow comparing with Linux...
____________

Profile BerndBrotProject donor
Volunteer tester
Avatar
Send message
Joined: 5 May 07
Posts: 110
ID: 7879
Credit: 21,318,408
RAC: 0
321 LLR Silver: Earned 100,000 credits (175,485)Cullen LLR Silver: Earned 100,000 credits (109,161)PPS LLR Amethyst: Earned 1,000,000 credits (1,017,344)PSP LLR Silver: Earned 100,000 credits (187,807)SoB LLR Silver: Earned 100,000 credits (175,519)SGS LLR Amethyst: Earned 1,000,000 credits (1,012,353)TPS LLR (retired) Bronze: Earned 10,000 credits (56,227)TRP LLR Silver: Earned 100,000 credits (446,036)Woodall LLR Silver: Earned 100,000 credits (122,789)321 Sieve Silver: Earned 100,000 credits (102,340)Cullen/Woodall Sieve (suspended) Ruby: Earned 2,000,000 credits (4,563,140)PPS Sieve Turquoise: Earned 5,000,000 credits (6,982,689)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (310,926)TRP Sieve (suspended) Silver: Earned 100,000 credits (109,509)AP 26/27 Silver: Earned 100,000 credits (107,429)GFN Bronze: Earned 10,000 credits (96,331)PSA Turquoise: Earned 5,000,000 credits (5,743,925)
Message 12842 - Posted: 18 Jan 2009 | 19:28:52 UTC - in response to Message 12840.

Seems to be running fine in standalone. I'm making this public version. Thanks, Sesef!

[edit] Well, for some reason Windows versions is twice as slow comparing with Linux...



Just finish my first WUs with the new app

wuid=57503315
wuid=57503318

about 800s

____________

ebahapoProject donor
Avatar
Send message
Joined: 11 Aug 05
Posts: 82
ID: 190
Credit: 2,358,134
RAC: 353
321 LLR Bronze: Earned 10,000 credits (13,495)Cullen LLR Bronze: Earned 10,000 credits (32,732)ESP LLR Bronze: Earned 10,000 credits (40,513)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (43,124)PPS LLR Silver: Earned 100,000 credits (371,184)PSP LLR Silver: Earned 100,000 credits (105,304)SoB LLR Silver: Earned 100,000 credits (339,297)SGS LLR Silver: Earned 100,000 credits (225,117)TPS LLR (retired) Silver: Earned 100,000 credits (143,637)Woodall LLR Bronze: Earned 10,000 credits (85,367)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (30,102)PPS Sieve Silver: Earned 100,000 credits (489,216)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (50,681)TRP Sieve (suspended) Silver: Earned 100,000 credits (296,390)GFN Bronze: Earned 10,000 credits (59,109)
Message 12845 - Posted: 18 Jan 2009 | 21:13:01 UTC - in response to Message 12719.

Using the 32-bit GCC executable (AP26-i686-windows.exe) on the same machine took a little more than 41 minutes. Almost idential times.

Since the run-time for the 32-bit application doesn't seem to be much slower than the 64-bit one, why not release it too? The increased number of hosts should make up for any slow-down in the application.

TIA

____________

JohnProject donor
Honorary cruncher
Avatar
Send message
Joined: 21 Feb 06
Posts: 2876
ID: 2449
Credit: 2,681,934
RAC: 0
321 LLR Bronze: Earned 10,000 credits (11,773)Cullen LLR Bronze: Earned 10,000 credits (14,945)ESP LLR Bronze: Earned 10,000 credits (26,855)PPS LLR Bronze: Earned 10,000 credits (84,876)PSP LLR Bronze: Earned 10,000 credits (15,311)SoB LLR Bronze: Earned 10,000 credits (21,440)SR5 LLR Bronze: Earned 10,000 credits (29,270)SGS LLR Bronze: Earned 10,000 credits (26,616)TPS LLR (retired) Bronze: Earned 10,000 credits (36,288)TRP LLR Bronze: Earned 10,000 credits (41,655)Woodall LLR Bronze: Earned 10,000 credits (15,807)321 Sieve Bronze: Earned 10,000 credits (20,014)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (23,405)PPS Sieve Bronze: Earned 10,000 credits (36,192)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,306)TRP Sieve (suspended) Bronze: Earned 10,000 credits (21,738)GFN Bronze: Earned 10,000 credits (86,217)PSA Ruby: Earned 2,000,000 credits (2,143,756)
Message 12846 - Posted: 19 Jan 2009 | 0:08:33 UTC - in response to Message 12845.

Using the 32-bit GCC executable (AP26-i686-windows.exe) on the same machine took a little more than 41 minutes. Almost idential times.

Since the run-time for the 32-bit application doesn't seem to be much slower than the 64-bit one, why not release it too? The increased number of hosts should make up for any slow-down in the application.

TIA

I think the conversation that quote was pulled from was comparing two 32 bit builds.

A single k test for 32 bit takes about 41 minutes.
A single k test for 64 bit takes about 5 minutes.
____________

ebahapoProject donor
Avatar
Send message
Joined: 11 Aug 05
Posts: 82
ID: 190
Credit: 2,358,134
RAC: 353
321 LLR Bronze: Earned 10,000 credits (13,495)Cullen LLR Bronze: Earned 10,000 credits (32,732)ESP LLR Bronze: Earned 10,000 credits (40,513)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (43,124)PPS LLR Silver: Earned 100,000 credits (371,184)PSP LLR Silver: Earned 100,000 credits (105,304)SoB LLR Silver: Earned 100,000 credits (339,297)SGS LLR Silver: Earned 100,000 credits (225,117)TPS LLR (retired) Silver: Earned 100,000 credits (143,637)Woodall LLR Bronze: Earned 10,000 credits (85,367)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (30,102)PPS Sieve Silver: Earned 100,000 credits (489,216)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (50,681)TRP Sieve (suspended) Silver: Earned 100,000 credits (296,390)GFN Bronze: Earned 10,000 credits (59,109)
Message 12847 - Posted: 19 Jan 2009 | 0:37:41 UTC - in response to Message 12846.

I think the conversation that quote was pulled from was comparing two 32 bit builds.

A single k test for 32 bit takes about 41 minutes.
A single k test for 64 bit takes about 5 minutes.

I see. Yet, other PrimeGrid applications take much longer than 41min to run, so why not 32-bit AP26 too?

TIA

____________

JohnProject donor
Honorary cruncher
Avatar
Send message
Joined: 21 Feb 06
Posts: 2876
ID: 2449
Credit: 2,681,934
RAC: 0
321 LLR Bronze: Earned 10,000 credits (11,773)Cullen LLR Bronze: Earned 10,000 credits (14,945)ESP LLR Bronze: Earned 10,000 credits (26,855)PPS LLR Bronze: Earned 10,000 credits (84,876)PSP LLR Bronze: Earned 10,000 credits (15,311)SoB LLR Bronze: Earned 10,000 credits (21,440)SR5 LLR Bronze: Earned 10,000 credits (29,270)SGS LLR Bronze: Earned 10,000 credits (26,616)TPS LLR (retired) Bronze: Earned 10,000 credits (36,288)TRP LLR Bronze: Earned 10,000 credits (41,655)Woodall LLR Bronze: Earned 10,000 credits (15,807)321 Sieve Bronze: Earned 10,000 credits (20,014)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (23,405)PPS Sieve Bronze: Earned 10,000 credits (36,192)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,306)TRP Sieve (suspended) Bronze: Earned 10,000 credits (21,738)GFN Bronze: Earned 10,000 credits (86,217)PSA Ruby: Earned 2,000,000 credits (2,143,756)
Message 12848 - Posted: 19 Jan 2009 | 1:05:40 UTC - in response to Message 12847.

I see. Yet, other PrimeGrid applications take much longer than 41min to run, so why not 32-bit AP26 too?

TIA

With AP26, there's a 1:8 processing time ratio between 64 bit and 32 bit. With the sieves, the greatest ratio is 1:2. And with LLR, it's 1:1.

Currently, we are not able to justify a 32 bit application for AP26 with such a large discrepancy between processing times. Relatively speaking, computing resources would be better utilized elsewhere. More specifically, 32 bit machines are best suited for any LLR project.
____________

ebahapoProject donor
Avatar
Send message
Joined: 11 Aug 05
Posts: 82
ID: 190
Credit: 2,358,134
RAC: 353
321 LLR Bronze: Earned 10,000 credits (13,495)Cullen LLR Bronze: Earned 10,000 credits (32,732)ESP LLR Bronze: Earned 10,000 credits (40,513)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (43,124)PPS LLR Silver: Earned 100,000 credits (371,184)PSP LLR Silver: Earned 100,000 credits (105,304)SoB LLR Silver: Earned 100,000 credits (339,297)SGS LLR Silver: Earned 100,000 credits (225,117)TPS LLR (retired) Silver: Earned 100,000 credits (143,637)Woodall LLR Bronze: Earned 10,000 credits (85,367)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (30,102)PPS Sieve Silver: Earned 100,000 credits (489,216)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (50,681)TRP Sieve (suspended) Silver: Earned 100,000 credits (296,390)GFN Bronze: Earned 10,000 credits (59,109)
Message 12849 - Posted: 19 Jan 2009 | 1:22:42 UTC - in response to Message 12848.

With AP26, there's a 1:8 processing time ratio between 64 bit and 32 bit. With the sieves, the greatest ratio is 1:2. And with LLR, it's 1:1.

It seems to me that as 32-bit systems are 6x more numerous (cursorily from this), that would make up for some of that.

Perhaps using MMX or SSE2 for processing the 64-bit types would help 32-bit hosts as well... Just let me know if I can be of any help.

Thanks.

____________

JohnProject donor
Honorary cruncher
Avatar
Send message
Joined: 21 Feb 06
Posts: 2876
ID: 2449
Credit: 2,681,934
RAC: 0
321 LLR Bronze: Earned 10,000 credits (11,773)Cullen LLR Bronze: Earned 10,000 credits (14,945)ESP LLR Bronze: Earned 10,000 credits (26,855)PPS LLR Bronze: Earned 10,000 credits (84,876)PSP LLR Bronze: Earned 10,000 credits (15,311)SoB LLR Bronze: Earned 10,000 credits (21,440)SR5 LLR Bronze: Earned 10,000 credits (29,270)SGS LLR Bronze: Earned 10,000 credits (26,616)TPS LLR (retired) Bronze: Earned 10,000 credits (36,288)TRP LLR Bronze: Earned 10,000 credits (41,655)Woodall LLR Bronze: Earned 10,000 credits (15,807)321 Sieve Bronze: Earned 10,000 credits (20,014)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (23,405)PPS Sieve Bronze: Earned 10,000 credits (36,192)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,306)TRP Sieve (suspended) Bronze: Earned 10,000 credits (21,738)GFN Bronze: Earned 10,000 credits (86,217)PSA Ruby: Earned 2,000,000 credits (2,143,756)
Message 12850 - Posted: 19 Jan 2009 | 2:01:27 UTC - in response to Message 12849.

With AP26, there's a 1:8 processing time ratio between 64 bit and 32 bit. With the sieves, the greatest ratio is 1:2. And with LLR, it's 1:1.

It seems to me that as 32-bit systems are 6x more numerous (cursorily from this), that would make up for some of that.

This is true for SETI and mostly true for PrimeGrid as well...see HERE.

However, comparing the amount of work being done, although 64 bit is mostly outnumbered, it more than makes up for it in processing "power". While users can crunch whatever they like (from what's offered), it continues to be our guidance to recommend that users place computers where they are most fully utilized.

Even with 32 bit machines outnumbering 64 bit machines, it's still better to place them in LLR projects. As more and more 64 bit machines become available, they more than make up for the loss, especially in sieving.

We also encourage users to utilize the fullest of their machines. That's why Wubi is often suggested to those who have a 32 bit OS on a 64 bit machine and have the flexibility to dual boot. Below are instructions for anyone who is interested. It really is quite easy. :) I'm typing this from a Wubi installed machine. :D

32 bit OS with 64 bit CPU
This is for those who are "driving with the hand brake active" (running a 32 bit OS on a 64 bit machine) ;) Wubi is a very nice tool that installs Ubuntu as a dual boot to your 64 bit machine. It's as simple as adding a program to Windows. You can even uninstall it like you would any other program. :)

Wubi - Ubuntu Installer

After getting 64 bit OS running on your machine, here's a link to ALL BOINC clients (including Linux x64): http://boinc.berkeley.edu/download_all.php If you are new to Linux and need help getting set up, let us know.

Alternately, you can just open a terminal (Applications/Accessories/Terminal) and type:

sudo apt-get install boinc-client boinc-manager

The latest BOINC client will be installed in your Applications/System Tools tab and you can run it from there.

It really is simple. :)

Ubuntu has a power save feature that throttles the CPU frequency down to save power. It does not treat BOINC as an active program so will throttle down thus causing under-performance. For those downloading the latest Ubuntu 8.10, please follow these directions to make sure the frequency is not throttled down while BOINC is running:

1. In Ubuntu, right click the top menu bar
2. Select add to panel
3. Select "CPU Frequency Scaling Monitor"
4. Click add

This will add a processor icon to the top menu. Click on the icon and set to "Performance". This also allows you to see the CPU frequency.
____________

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 12852 - Posted: 19 Jan 2009 | 5:42:33 UTC - in response to Message 12842.

Seems to be running fine in standalone. I'm making this public version. Thanks, Sesef!

[edit] Well, for some reason Windows versions is twice as slow comparing with Linux...



Just finish my first WUs with the new app

wuid=57503315
wuid=57503318

about 800s


800s per WU doesn't seem to be doubled time. It is of the order of magnitude what one should expect from the linux version. Geoff wrote that the test range took 287s. Since each WU contains 3 such ranges, this makes 861s, sometimes more since some WU's may require a few percent more.

I am unable to compensate for CPU type here, but the times are comparable in this case.

I got also a report from another user indicating that there is no significant performance difference between Linux and Win app (he claimed to have WU in 13-17 minutes on 2.5 GHz).

Since I do not have exact benchmark at hand, I could believe that Win app is somewhat slower, but certainly not by a factor close to 2.

Are you sure the Win64 application is so slow, Rytis? Perhaps there were some other factors in play that made you think the performance of WIN64 app is halved.

Rytis
Volunteer moderator
Project administrator
Avatar
Send message
Joined: 22 Jun 05
Posts: 2639
ID: 1
Credit: 21,279,785
RAC: 6,497
321 LLR Silver: Earned 100,000 credits (104,475)Cullen LLR Silver: Earned 100,000 credits (291,372)ESP LLR Bronze: Earned 10,000 credits (18,156)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (15,259)PPS LLR Silver: Earned 100,000 credits (113,978)PSP LLR Silver: Earned 100,000 credits (116,517)SoB LLR Silver: Earned 100,000 credits (151,232)SR5 LLR Bronze: Earned 10,000 credits (14,071)SGS LLR Silver: Earned 100,000 credits (100,082)TPS LLR (retired) Silver: Earned 100,000 credits (111,607)TRP LLR Jade: Earned 10,000,000 credits (15,976,138)Woodall LLR Silver: Earned 100,000 credits (101,463)321 Sieve Silver: Earned 100,000 credits (201,501)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (214,653)Generalized Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (14,200)PPS Sieve Silver: Earned 100,000 credits (302,417)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (200,232)TRP Sieve (suspended) Ruby: Earned 2,000,000 credits (2,453,872)AP 26/27 Silver: Earned 100,000 credits (473,058)GFN Silver: Earned 100,000 credits (163,887)PSA Bronze: Earned 10,000 credits (97,541)
Message 12853 - Posted: 19 Jan 2009 | 6:39:56 UTC - in response to Message 12852.
Last modified: 19 Jan 2009 | 6:40:50 UTC

Are you sure the Win64 application is so slow, Rytis? Perhaps there were some other factors in play that made you think the performance of WIN64 app is halved.

Most probably what happened is that I compared apples and oranges :) I based my claim on comparison with PSP sr2sieve application, because I had no way to run AP26 Linux on this host. Seems that other hosts are doing work much faster. I think the speed difference is from CPU cache - the fast hosts have 8MB cache, and my test host has only 2MB.

Augustine, while we are not going to release 32bit applications for AP26, it is possible to compile and run your own if you'd want to - the source code is public.
____________

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 12889 - Posted: 20 Jan 2009 | 15:13:28 UTC - in response to Message 12853.
Last modified: 20 Jan 2009 | 15:44:37 UTC

I was really puzzled where the huge 8:1 speed ratio between 64/32 come from, so I had a closer look at the most used time consuming operation of the AP26.

This operation is n%p , where n is a 64-bit variable integer (unpredictable for the compiler), and p is a small prime EXPLICITLY given in the code.

Here is a simple sample code which gave me more or less 8:1 speed ratio on the same computer, but with 64 versus 32 bit compilation:

int main(int argc, char *argv[])
{
long long n=127777777712345LL;
int i, sum=0;
for(i=1;i<100000000;i++)sum+=(n+i)%127;
// take instead of 127 any prime in the range 101 - 541
printf("sum: %d\n",sum);
return 0;
}

I have no clue why the above phenomena takes place, but if anyone can make the above division run faster on a 32-bit platform, there is an excellent chance to get a similar speed up of the 32-bit AP26 program.

EDIT:

In the division n%p you can assume n to have 48 bits and p to have 9 bits (or 8 if it helps).

ebahapoProject donor
Avatar
Send message
Joined: 11 Aug 05
Posts: 82
ID: 190
Credit: 2,358,134
RAC: 353
321 LLR Bronze: Earned 10,000 credits (13,495)Cullen LLR Bronze: Earned 10,000 credits (32,732)ESP LLR Bronze: Earned 10,000 credits (40,513)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (43,124)PPS LLR Silver: Earned 100,000 credits (371,184)PSP LLR Silver: Earned 100,000 credits (105,304)SoB LLR Silver: Earned 100,000 credits (339,297)SGS LLR Silver: Earned 100,000 credits (225,117)TPS LLR (retired) Silver: Earned 100,000 credits (143,637)Woodall LLR Bronze: Earned 10,000 credits (85,367)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (30,102)PPS Sieve Silver: Earned 100,000 credits (489,216)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (50,681)TRP Sieve (suspended) Silver: Earned 100,000 credits (296,390)GFN Bronze: Earned 10,000 credits (59,109)
Message 12890 - Posted: 20 Jan 2009 | 16:03:22 UTC - in response to Message 12853.

Augustine, while we are not going to release 32bit applications for AP26, it is possible to compile and run your own if you'd want to - the source code is public.

Where from?

TIA

____________

Rytis
Volunteer moderator
Project administrator
Avatar
Send message
Joined: 22 Jun 05
Posts: 2639
ID: 1
Credit: 21,279,785
RAC: 6,497
321 LLR Silver: Earned 100,000 credits (104,475)Cullen LLR Silver: Earned 100,000 credits (291,372)ESP LLR Bronze: Earned 10,000 credits (18,156)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (15,259)PPS LLR Silver: Earned 100,000 credits (113,978)PSP LLR Silver: Earned 100,000 credits (116,517)SoB LLR Silver: Earned 100,000 credits (151,232)SR5 LLR Bronze: Earned 10,000 credits (14,071)SGS LLR Silver: Earned 100,000 credits (100,082)TPS LLR (retired) Silver: Earned 100,000 credits (111,607)TRP LLR Jade: Earned 10,000,000 credits (15,976,138)Woodall LLR Silver: Earned 100,000 credits (101,463)321 Sieve Silver: Earned 100,000 credits (201,501)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (214,653)Generalized Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (14,200)PPS Sieve Silver: Earned 100,000 credits (302,417)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (200,232)TRP Sieve (suspended) Ruby: Earned 2,000,000 credits (2,453,872)AP 26/27 Silver: Earned 100,000 credits (473,058)GFN Silver: Earned 100,000 credits (163,887)PSA Bronze: Earned 10,000 credits (97,541)
Message 12891 - Posted: 20 Jan 2009 | 16:04:10 UTC - in response to Message 12890.

Where from?

http://www.geocities.com/g_w_reynolds/AP26/
____________

ebahapoProject donor
Avatar
Send message
Joined: 11 Aug 05
Posts: 82
ID: 190
Credit: 2,358,134
RAC: 353
321 LLR Bronze: Earned 10,000 credits (13,495)Cullen LLR Bronze: Earned 10,000 credits (32,732)ESP LLR Bronze: Earned 10,000 credits (40,513)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (43,124)PPS LLR Silver: Earned 100,000 credits (371,184)PSP LLR Silver: Earned 100,000 credits (105,304)SoB LLR Silver: Earned 100,000 credits (339,297)SGS LLR Silver: Earned 100,000 credits (225,117)TPS LLR (retired) Silver: Earned 100,000 credits (143,637)Woodall LLR Bronze: Earned 10,000 credits (85,367)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (30,102)PPS Sieve Silver: Earned 100,000 credits (489,216)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (50,681)TRP Sieve (suspended) Silver: Earned 100,000 credits (296,390)GFN Bronze: Earned 10,000 credits (59,109)
Message 12892 - Posted: 20 Jan 2009 | 16:11:18 UTC - in response to Message 12889.
Last modified: 20 Jan 2009 | 16:16:36 UTC

A 64-bit division (remainder) on x86-64 is a single instruction, whereas it needs to run a routine on x86, since it cannot divide a 64-bit number by another in a single instruction as x86-64 can.

However, C semantics is forcing the dividend and the divider to be 64 bits long. x86 does have an instruction, IDIV, which divides a 64-bit number by a 32-bit number with quotient and remainder 32 bits long. If the remainder is guaranteed to be less than 32 bits long with the input data for AP26, then this instruction could be inserted as in-line assembler.

Another trick that could be used, if the divisor is constant, is to use fixed-point arithmetic. Roughly, instead of using a division, it uses multiplication by the reciprocal.

GCC is capable of figuring this out if it uses feedback-directed optimizations, when the code is compiled with -fprofile-generate, run for a representative WU, and then recompiled with -fprofile-use. If it can find a handful of values for the divisor that happen often, it will generate code that checks for these values and use fixed-point arithmetic for them and regular arithmetic for others.

This trick was added in order to bump the performance of 176.vpr in SPEC CPU2000.

HTH
____________

ebahapoProject donor
Avatar
Send message
Joined: 11 Aug 05
Posts: 82
ID: 190
Credit: 2,358,134
RAC: 353
321 LLR Bronze: Earned 10,000 credits (13,495)Cullen LLR Bronze: Earned 10,000 credits (32,732)ESP LLR Bronze: Earned 10,000 credits (40,513)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (43,124)PPS LLR Silver: Earned 100,000 credits (371,184)PSP LLR Silver: Earned 100,000 credits (105,304)SoB LLR Silver: Earned 100,000 credits (339,297)SGS LLR Silver: Earned 100,000 credits (225,117)TPS LLR (retired) Silver: Earned 100,000 credits (143,637)Woodall LLR Bronze: Earned 10,000 credits (85,367)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (30,102)PPS Sieve Silver: Earned 100,000 credits (489,216)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (50,681)TRP Sieve (suspended) Silver: Earned 100,000 credits (296,390)GFN Bronze: Earned 10,000 credits (59,109)
Message 12893 - Posted: 20 Jan 2009 | 16:25:06 UTC - in response to Message 12891.
Last modified: 20 Jan 2009 | 16:33:24 UTC

Where from?

http://www.geocities.com/g_w_reynolds/AP26/

It's a true crime that this code resorts to x87 instructions. This is just wrong on x86-64 because it precludes vectorization. :-(
____________

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 12896 - Posted: 20 Jan 2009 | 16:54:59 UTC - in response to Message 12893.


It's a true crime that this code resorts to x87 instructions. This is just wrong on x86-64 because it precludes vectorization. :-(


I do not know what "x87 instructions" and "vectorization" are.
Can you explain it in a few words or is it too complicated?

ebahapoProject donor
Avatar
Send message
Joined: 11 Aug 05
Posts: 82
ID: 190
Credit: 2,358,134
RAC: 353
321 LLR Bronze: Earned 10,000 credits (13,495)Cullen LLR Bronze: Earned 10,000 credits (32,732)ESP LLR Bronze: Earned 10,000 credits (40,513)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (43,124)PPS LLR Silver: Earned 100,000 credits (371,184)PSP LLR Silver: Earned 100,000 credits (105,304)SoB LLR Silver: Earned 100,000 credits (339,297)SGS LLR Silver: Earned 100,000 credits (225,117)TPS LLR (retired) Silver: Earned 100,000 credits (143,637)Woodall LLR Bronze: Earned 10,000 credits (85,367)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (30,102)PPS Sieve Silver: Earned 100,000 credits (489,216)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (50,681)TRP Sieve (suspended) Silver: Earned 100,000 credits (296,390)GFN Bronze: Earned 10,000 credits (59,109)
Message 12898 - Posted: 20 Jan 2009 | 17:01:55 UTC - in response to Message 12896.

I do not know what "x87 instructions" and "vectorization" are.
Can you explain it in a few words or is it too complicated?

x87 instructions are instructions using the x87 FPU, or those starting with F... Vectorization is an optimization enabled by SSE instructions that operate on more than one data at a time. GCC is capable of using them in loops, improving the performance between 2 and 4 times, in theory.

I can rewrite the routines in primeq.h, but only in the weekend.

HTH

____________

sesef
Send message
Joined: 11 Nov 08
Posts: 26
ID: 31577
Credit: 1,299,123
RAC: 0
321 LLR Bronze: Earned 10,000 credits (12,687)PPS LLR Silver: Earned 100,000 credits (127,235)SGS LLR Bronze: Earned 10,000 credits (37,872)TRP LLR Bronze: Earned 10,000 credits (40,627)Woodall LLR Bronze: Earned 10,000 credits (24,883)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,692)TRP Sieve (suspended) Silver: Earned 100,000 credits (271,736)AP 26/27 Gold: Earned 500,000 credits (762,684)
Message 12900 - Posted: 20 Jan 2009 | 17:22:25 UTC - in response to Message 12898.

I can rewrite the routines in primeq.h, but only in the weekend.


Will it be faster then GMP prime check?? (Win app use GMP port)

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 12901 - Posted: 20 Jan 2009 | 18:09:19 UTC - in response to Message 12900.
Last modified: 20 Jan 2009 | 18:10:01 UTC

I have replaced most crucial 64-bit divisions by 32-bit ones. The patch is described in

http://www.math.uni.wroc.pl/~jwr/AP26/faster32.zip

Unfortunately I didn't have access to a computer which would unable me to do reliable benchmarks. Using an archaic djgpp under Win98 I got an indication of 2.6 times gain. If that was true, then the 8:1 ratio could be dropped to something around 3:1.

Does anyone here care enough about 32 bit version to verify it?

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 12902 - Posted: 20 Jan 2009 | 18:50:45 UTC - in response to Message 12901.

Oooops, sorry, I had a little bug there. The fixed version is in place now (contains corected patch and file Bug.txt).

The speed gain seems to be somewhat smaller now, something like 2.25, which would drop 8 to around 3.5.

ebahapoProject donor
Avatar
Send message
Joined: 11 Aug 05
Posts: 82
ID: 190
Credit: 2,358,134
RAC: 353
321 LLR Bronze: Earned 10,000 credits (13,495)Cullen LLR Bronze: Earned 10,000 credits (32,732)ESP LLR Bronze: Earned 10,000 credits (40,513)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (43,124)PPS LLR Silver: Earned 100,000 credits (371,184)PSP LLR Silver: Earned 100,000 credits (105,304)SoB LLR Silver: Earned 100,000 credits (339,297)SGS LLR Silver: Earned 100,000 credits (225,117)TPS LLR (retired) Silver: Earned 100,000 credits (143,637)Woodall LLR Bronze: Earned 10,000 credits (85,367)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (30,102)PPS Sieve Silver: Earned 100,000 credits (489,216)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (50,681)TRP Sieve (suspended) Silver: Earned 100,000 credits (296,390)GFN Bronze: Earned 10,000 credits (59,109)
Message 12903 - Posted: 20 Jan 2009 | 19:10:41 UTC - in response to Message 12900.

Will it be faster then GMP prime check?? (Win app use GMP port)

That would be an idea worth checking, since GMP does have routines optimized for x86-64.

GMP wouldn't be able to use vectorization though.

HTH

____________

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 12913 - Posted: 21 Jan 2009 | 5:16:04 UTC - in response to Message 12893.
Last modified: 21 Jan 2009 | 5:22:05 UTC

Where from?

http://www.geocities.com/g_w_reynolds/AP26/

It's a true crime that this code resorts to x87 instructions. This is just wrong on x86-64 because it precludes vectorization. :-(


The code that uses the x87 instructions is not time critical, it accounts for less than 2% of the total run time. The x87 instructions are used because the extra precision of x87 over SSE2 floating point is needed. (If SSE2 ops were used then the numbers would be limited to 2^52, x87 operations allow 2^62).

edit: BTW x87 ops can be vectorised by hand, sr2sieve uses highly vectorised ASM code and x87 ops are used in the x86_64 version instead of SSE2 when the factors exceed 2^51.

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 12914 - Posted: 21 Jan 2009 | 7:28:40 UTC - in response to Message 12913.

I have determined that the optimized 32-bit code is spending about half of the CPU time on instructions in SITO.H (or SITO-boinc.H) and it is likely that also in 64-bit version most time consuming part is there, although I haven't made explicit tests to measure that.

Since I have no idea about assembly coding, plain C was the only way I could write a working program.

SITO.H (or SITO-boinc.H) contains so simple code that it doesn't require to understand the whole algorithm to play with this part.

If someone thinks (s)he can rewrite this code in assembly or optimize in other way to make it run faster, please give it a try.
Note that n59 does not exceed 2^48.
Each of the varaibles "sito" and OKOK***[***] has 64 bits.

The most time consuming part starts at the line containing OKOK101.

ebahapoProject donor
Avatar
Send message
Joined: 11 Aug 05
Posts: 82
ID: 190
Credit: 2,358,134
RAC: 353
321 LLR Bronze: Earned 10,000 credits (13,495)Cullen LLR Bronze: Earned 10,000 credits (32,732)ESP LLR Bronze: Earned 10,000 credits (40,513)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (43,124)PPS LLR Silver: Earned 100,000 credits (371,184)PSP LLR Silver: Earned 100,000 credits (105,304)SoB LLR Silver: Earned 100,000 credits (339,297)SGS LLR Silver: Earned 100,000 credits (225,117)TPS LLR (retired) Silver: Earned 100,000 credits (143,637)Woodall LLR Bronze: Earned 10,000 credits (85,367)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (30,102)PPS Sieve Silver: Earned 100,000 credits (489,216)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (50,681)TRP Sieve (suspended) Silver: Earned 100,000 credits (296,390)GFN Bronze: Earned 10,000 credits (59,109)
Message 12922 - Posted: 21 Jan 2009 | 16:00:13 UTC - in response to Message 12913.

edit: BTW x87 ops can be vectorised by hand, sr2sieve uses highly vectorised ASM code and x87 ops are used in the x86_64 version instead of SSE2 when the factors exceed 2^51.

x87 cannot be vectorized because each instruction operates on only one datum, unlike SSE2, which operates on two or four data.

But I understand why x87 was used. However, I think that fixed point would be an alternative too.

____________

ebahapoProject donor
Avatar
Send message
Joined: 11 Aug 05
Posts: 82
ID: 190
Credit: 2,358,134
RAC: 353
321 LLR Bronze: Earned 10,000 credits (13,495)Cullen LLR Bronze: Earned 10,000 credits (32,732)ESP LLR Bronze: Earned 10,000 credits (40,513)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (43,124)PPS LLR Silver: Earned 100,000 credits (371,184)PSP LLR Silver: Earned 100,000 credits (105,304)SoB LLR Silver: Earned 100,000 credits (339,297)SGS LLR Silver: Earned 100,000 credits (225,117)TPS LLR (retired) Silver: Earned 100,000 credits (143,637)Woodall LLR Bronze: Earned 10,000 credits (85,367)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (30,102)PPS Sieve Silver: Earned 100,000 credits (489,216)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (50,681)TRP Sieve (suspended) Silver: Earned 100,000 credits (296,390)GFN Bronze: Earned 10,000 credits (59,109)
Message 12924 - Posted: 21 Jan 2009 | 17:03:21 UTC - in response to Message 12914.
Last modified: 21 Jan 2009 | 17:18:36 UTC

Note that n59 does not exceed 2^48.
Each of the varaibles "sito" and OKOK***[***] has 64 bits.

The most time consuming part starts at the line containing OKOK101.

Since n59 is smaller than 2^48, long double is not needed, as it fits in the 52 bits of mantissa in double.

How about this code instead?

#if 0 sito=OKOK61[r61]; if(sito&=OKOK67[r67]) if(sito&=OKOK71[r71]) if(sito&=OKOK73[r73]) if(sito&=OKOK79[r79]) if(sito&=OKOK83[r83]) if(sito&=OKOK89[r89]) if(sito&=OKOK97[r97]) #else sito=OKOK61[r61] & OKOK67[r67] & OKOK71[r71] & OKOK73[r73] & OKOK79[r79] & OKOK83[r83] & OKOK89[r89] & OKOK97[r97]; #endif if(sito&=OKOK101[n59-(int64_t)(n59*(1.0/101))]) if(sito&=OKOK103[n59-(int64_t)(n59*(1.0/103))]) if(sito&=OKOK107[n59-(int64_t)(n59*(1.0/107))]) if(sito&=OKOK109[n59-(int64_t)(n59*(1.0/109))]) if(sito&=OKOK113[n59-(int64_t)(n59*(1.0/113))]) if(sito&=OKOK127[n59-(int64_t)(n59*(1.0/127))]) if(sito&=OKOK131[n59-(int64_t)(n59*(1.0/131))]) if(sito&=OKOK137[n59-(int64_t)(n59*(1.0/137))]) if(sito&=OKOK139[n59-(int64_t)(n59*(1.0/139))]) if(sito&=OKOK149[n59-(int64_t)(n59*(1.0/149))]) if(sito&=OKOK151[n59-(int64_t)(n59*(1.0/151))]) if(sito&=OKOK157[n59-(int64_t)(n59*(1.0/157))]) if(sito&=OKOK163[n59-(int64_t)(n59*(1.0/163))]) if(sito&=OKOK167[n59-(int64_t)(n59*(1.0/167))]) if(sito&=OKOK173[n59-(int64_t)(n59*(1.0/173))]) if(sito&=OKOK179[n59-(int64_t)(n59*(1.0/179))]) if(sito&=OKOK181[n59-(int64_t)(n59*(1.0/181))]) if(sito&=OKOK191[n59-(int64_t)(n59*(1.0/191))]) if(sito&=OKOK193[n59-(int64_t)(n59*(1.0/193))]) if(sito&=OKOK197[n59-(int64_t)(n59*(1.0/197))]) if(sito&=OKOK199[n59-(int64_t)(n59*(1.0/199))]) if(sito&=OKOK211[n59-(int64_t)(n59*(1.0/211))]) if(sito&=OKOK223[n59-(int64_t)(n59*(1.0/223))]) if(sito&=OKOK227[n59-(int64_t)(n59*(1.0/227))]) if(sito&=OKOK229[n59-(int64_t)(n59*(1.0/229))]) if(sito&=OKOK233[n59-(int64_t)(n59*(1.0/233))]) if(sito&=OKOK239[n59-(int64_t)(n59*(1.0/239))]) if(sito&=OKOK241[n59-(int64_t)(n59*(1.0/241))]) if(sito&=OKOK251[n59-(int64_t)(n59*(1.0/251))]) if(sito&=OKOK257[n59-(int64_t)(n59*(1.0/257))]) if(sito&=OKOK263[n59-(int64_t)(n59*(1.0/263))]) if(sito&=OKOK269[n59-(int64_t)(n59*(1.0/269))]) if(sito&=OKOK271[n59-(int64_t)(n59*(1.0/271))]) if(sito&=OKOK277[n59-(int64_t)(n59*(1.0/277))]) if(sito&=OKOK281[n59-(int64_t)(n59*(1.0/281))]) if(sito&=OKOK283[n59-(int64_t)(n59*(1.0/283))]) if(sito&=OKOK293[n59-(int64_t)(n59*(1.0/293))]) if(sito&=OKOK307[n59-(int64_t)(n59*(1.0/307))]) if(sito&=OKOK311[n59-(int64_t)(n59*(1.0/311))]) if(sito&=OKOK313[n59-(int64_t)(n59*(1.0/313))]) if(sito&=OKOK317[n59-(int64_t)(n59*(1.0/317))]) if(sito&=OKOK331[n59-(int64_t)(n59*(1.0/331))])

The code that replaces the remainder operation should be much faster, but with so many chained branches, there ought to be a lot of CPU stalls due to branch misprediction.

HTH
____________

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 12925 - Posted: 21 Jan 2009 | 17:25:59 UTC - in response to Message 12924.


if(sito&=OKOK101[n59-(int64_t)(n59*(1.0/101))])



Shouldn't be
if(sito&=OKOK101[n59-101*(int64_t)(n59*(1.0/101))])

instead?

ebahapoProject donor
Avatar
Send message
Joined: 11 Aug 05
Posts: 82
ID: 190
Credit: 2,358,134
RAC: 353
321 LLR Bronze: Earned 10,000 credits (13,495)Cullen LLR Bronze: Earned 10,000 credits (32,732)ESP LLR Bronze: Earned 10,000 credits (40,513)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (43,124)PPS LLR Silver: Earned 100,000 credits (371,184)PSP LLR Silver: Earned 100,000 credits (105,304)SoB LLR Silver: Earned 100,000 credits (339,297)SGS LLR Silver: Earned 100,000 credits (225,117)TPS LLR (retired) Silver: Earned 100,000 credits (143,637)Woodall LLR Bronze: Earned 10,000 credits (85,367)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (30,102)PPS Sieve Silver: Earned 100,000 credits (489,216)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (50,681)TRP Sieve (suspended) Silver: Earned 100,000 credits (296,390)GFN Bronze: Earned 10,000 credits (59,109)
Message 12927 - Posted: 21 Jan 2009 | 17:51:03 UTC - in response to Message 12925.
Last modified: 21 Jan 2009 | 17:52:56 UTC

Shouldn't it be

if(sito&=OKOK101[n59-101*(int64_t)(n59*(1.0/101))])

instead?

Yes, thanks.

#if 0 sito=OKOK61[r61]; if(sito&=OKOK67[r67]) if(sito&=OKOK71[r71]) if(sito&=OKOK73[r73]) if(sito&=OKOK79[r79]) if(sito&=OKOK83[r83]) if(sito&=OKOK89[r89]) if(sito&=OKOK97[r97]) #else sito=OKOK61[r61] & OKOK67[r67] & OKOK71[r71] & OKOK73[r73] & OKOK79[r79] & OKOK83[r83] & OKOK89[r89] & OKOK97[r97]; #endif if(sito&=OKOK101[n59-101*(int64_t)(n59*(1.0/101))]) if(sito&=OKOK103[n59-103*(int64_t)(n59*(1.0/103))]) if(sito&=OKOK107[n59-107*(int64_t)(n59*(1.0/107))]) if(sito&=OKOK109[n59-109*(int64_t)(n59*(1.0/109))]) if(sito&=OKOK113[n59-113*(int64_t)(n59*(1.0/113))]) if(sito&=OKOK127[n59-127*(int64_t)(n59*(1.0/127))]) if(sito&=OKOK131[n59-131*(int64_t)(n59*(1.0/131))]) if(sito&=OKOK137[n59-137*(int64_t)(n59*(1.0/137))]) if(sito&=OKOK139[n59-139*(int64_t)(n59*(1.0/139))]) if(sito&=OKOK149[n59-149*(int64_t)(n59*(1.0/149))]) if(sito&=OKOK151[n59-151*(int64_t)(n59*(1.0/151))]) if(sito&=OKOK157[n59-157*(int64_t)(n59*(1.0/157))]) if(sito&=OKOK163[n59-163*(int64_t)(n59*(1.0/163))]) if(sito&=OKOK167[n59-167*(int64_t)(n59*(1.0/167))]) if(sito&=OKOK173[n59-173*(int64_t)(n59*(1.0/173))]) if(sito&=OKOK179[n59-179*(int64_t)(n59*(1.0/179))]) if(sito&=OKOK181[n59-181*(int64_t)(n59*(1.0/181))]) if(sito&=OKOK191[n59-191*(int64_t)(n59*(1.0/191))]) if(sito&=OKOK193[n59-193*(int64_t)(n59*(1.0/193))]) if(sito&=OKOK197[n59-197*(int64_t)(n59*(1.0/197))]) if(sito&=OKOK199[n59-199*(int64_t)(n59*(1.0/199))]) if(sito&=OKOK211[n59-211*(int64_t)(n59*(1.0/211))]) if(sito&=OKOK223[n59-223*(int64_t)(n59*(1.0/223))]) if(sito&=OKOK227[n59-227*(int64_t)(n59*(1.0/227))]) if(sito&=OKOK229[n59-229*(int64_t)(n59*(1.0/229))]) if(sito&=OKOK233[n59-233*(int64_t)(n59*(1.0/233))]) if(sito&=OKOK239[n59-239*(int64_t)(n59*(1.0/239))]) if(sito&=OKOK241[n59-241*(int64_t)(n59*(1.0/241))]) if(sito&=OKOK251[n59-251*(int64_t)(n59*(1.0/251))]) if(sito&=OKOK257[n59-257*(int64_t)(n59*(1.0/257))]) if(sito&=OKOK263[n59-263*(int64_t)(n59*(1.0/263))]) if(sito&=OKOK269[n59-269*(int64_t)(n59*(1.0/269))]) if(sito&=OKOK271[n59-271*(int64_t)(n59*(1.0/271))]) if(sito&=OKOK277[n59-277*(int64_t)(n59*(1.0/277))]) if(sito&=OKOK281[n59-281*(int64_t)(n59*(1.0/281))]) if(sito&=OKOK283[n59-283*(int64_t)(n59*(1.0/283))]) if(sito&=OKOK293[n59-293*(int64_t)(n59*(1.0/293))]) if(sito&=OKOK307[n59-307*(int64_t)(n59*(1.0/307))]) if(sito&=OKOK311[n59-311*(int64_t)(n59*(1.0/311))]) if(sito&=OKOK313[n59-313*(int64_t)(n59*(1.0/313))]) if(sito&=OKOK317[n59-317*(int64_t)(n59*(1.0/317))]) if(sito&=OKOK331[n59-331*(int64_t)(n59*(1.0/331))])

____________

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 12928 - Posted: 21 Jan 2009 | 18:02:31 UTC

I tried on a 64-bit linux comp someting like:

n=12345678901234LL; for(i=1;i<100000000;i++)sum+=(n+i)%101;

and then
for(i=1;i<100000000;i++)sum+=(n+i)-101*(long long)((n+i)*(1.0/101));

and the second version was 1.37 times slower :-(

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 12931 - Posted: 21 Jan 2009 | 20:00:41 UTC - in response to Message 12902.

The speed gain seems to be somewhat smaller now, something like 2.25, which would drop 8 to around 3.5.


I have incorporated Jarek's faster32.zip code into the BOINC executables just uploaded to http://www.geocities.com/g_w_reynolds/AP26/. Here are the new times for the range K=366384,SHIFT=0 on my Core 2 Duo 2.66GHz (Linux):

32-bit: 8 min. 10 sec.
64-bit: 4 min. 43 sec. (64-bit 1.7 times faster than 32-bit)

For comparison the old times before adding the faster32.zip code were:

32-bit: 28 min. 27 sec.
64-bit: 4 min. 43 sec. (64-bit 6.0 times faster than 32-bit)


Great work Jarek! Is the range of K and SHIFT that the new code can handle still the same? I added the changes to MAKES.H that you suggested, changing (j*SHIFT)%X into (j*(SHIFT%X))%X.

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 12933 - Posted: 21 Jan 2009 | 20:08:26 UTC

From what I can work out, part of the reason that the old 32-bit code was so much slower is that GCC (at least version 4.1.2) doesn't seem to be able to make an important optimisation for this sort of function in 32-bit mode:

unsigned long long mod541(unsigned long long n)
{
return n%541;
}

In 64-bit mode GCC transforms division operations into a multiplication by a generalised inverse. Because the modulus is a compile-time constant the inverse can be computed at compile time, and so the resulting code contains no div instruction at all.

Although it is possible in principle to perform a similar optimisation in 32-bit mode, for some reason GCC doesn't do it. Instead it makes a call to a generic library function to perform the mod, which can have no knowledge of the modulus at compile time and probably has to resort to a plain div instruction.

However when n is a 32-bit value GCC performs the optimisation in 32-bit
mode too.

ebahapoProject donor
Avatar
Send message
Joined: 11 Aug 05
Posts: 82
ID: 190
Credit: 2,358,134
RAC: 353
321 LLR Bronze: Earned 10,000 credits (13,495)Cullen LLR Bronze: Earned 10,000 credits (32,732)ESP LLR Bronze: Earned 10,000 credits (40,513)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (43,124)PPS LLR Silver: Earned 100,000 credits (371,184)PSP LLR Silver: Earned 100,000 credits (105,304)SoB LLR Silver: Earned 100,000 credits (339,297)SGS LLR Silver: Earned 100,000 credits (225,117)TPS LLR (retired) Silver: Earned 100,000 credits (143,637)Woodall LLR Bronze: Earned 10,000 credits (85,367)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (30,102)PPS Sieve Silver: Earned 100,000 credits (489,216)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (50,681)TRP Sieve (suspended) Silver: Earned 100,000 credits (296,390)GFN Bronze: Earned 10,000 credits (59,109)
Message 12934 - Posted: 21 Jan 2009 | 20:49:14 UTC - in response to Message 12933.

From what I can work out, part of the reason that the old 32-bit code was so much slower is that GCC (at least version 4.1.2) doesn't seem to be able to make an important optimisation for this sort of function in 32-bit mode:

That's because only x86-64 has 64-bit registers and 64-bit multiplications.

HTH

____________

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 12935 - Posted: 21 Jan 2009 | 20:59:58 UTC - in response to Message 12934.

From what I can work out, part of the reason that the old 32-bit code was so much slower is that GCC (at least version 4.1.2) doesn't seem to be able to make an important optimisation for this sort of function in 32-bit mode:

That's because only x86-64 has 64-bit registers and 64-bit multiplications.

HTH

The optimisation involves transforming a 64-bit division into a 64-bit multiplication by a constant, but there is no need for the multiplication to be done in a singe 64-bit register. Using 4x32-bit multiplies and some additions to synthesize a 64-bit multiplication would still be many times faster than calling a library function to do a 64-bit division.

rogue
Volunteer developer
Avatar
Send message
Joined: 8 Sep 07
Posts: 1188
ID: 12001
Credit: 18,565,548
RAC: 0
PPS LLR Bronze: Earned 10,000 credits (31,229)PSA Jade: Earned 10,000,000 credits (18,533,435)
Message 12936 - Posted: 22 Jan 2009 | 1:05:43 UTC - in response to Message 12935.

If these division routines are called very frequently (thousands of times), then pre-compute the magic number and store in a table. Will n ever exceed 16 bits? If not, then once you have the magic number (always presume that the compiler doesn't calculate it or the inverse), then you get

return n - ((n*magicNumber) >> magicShift);

instead of

return n%mod;

This can be done with 32-bit registers since neither n nor the magicNumber exceed 16 bits. The magic numbers and their shifts are calculated up front for each number you would divide by. If n does exceed 16 bits, then you will need to use a 64-bit value for the intermediate value. This should be doable in assembly code. If no 64-bit registers are available you could also consider:

double dn = (double) n;
return (int64_t) (n - (n * divR);

where divR is a constant 1.0 / divisor, which can be stored in a table or defined as a constant, such as

static const divR = 1.0 / 541;

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 12938 - Posted: 22 Jan 2009 | 3:19:51 UTC - in response to Message 12931.


Is the range of K and SHIFT that the new code can handle still the same?

Yes, the change has no effect on the range.

I added the changes to MAKES.H that you suggested, changing (j*SHIFT)%X into (j*(SHIFT%X))%X.

STEP, not SHIFT (The change you made is correct, there is just a typo in the above line).
This change allows to use K up to 318,000,000.

There is one more possible change I have been thinking of.
In AP26-boinc.h there are lines
#include "IF.H" if(n%7) if(n%11) if(n%13) if(n%17) if(n%19) if(n%23)


From the point of view of computations performed and output produced, any permutation of those lines would give identical effect. Perhaps the permutation

if(n%7) if(n%11) if(n%13) if(n%17) if(n%19) if(n%23) #include "IF.H"


would be a bit faster, or perpaps it is right to include "IF.H" somewhere in the middle. I haven't made any tests. This is unlikely to have noticable effect, but if you can measure any speed difference at all, then you can rearrange those lines.

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 12941 - Posted: 22 Jan 2009 | 5:40:55 UTC - in response to Message 12933.

From what I can work out, part of the reason that the old 32-bit code was so much slower is that GCC (at least version 4.1.2) doesn't seem to be able to make an important optimisation for this sort of function in 32-bit mode:

unsigned long long mod541(unsigned long long n)
{
return n%541;
}


The ONLY change I have done to the code was to replace the division of the form above to something that goes along the line:

int mod541(long long n)
{
return ( ((int)n&((1<<30)-1)) + Mod[2^30,541]*(int)(n>>30) )%541;
}

where Mod[2^30,541] is precomputed and explicitly given to the code.
I could have used unsigned, but I didn't bother to.

As it happened it was even slightly better than that, since n (n59 in the actual code) was the same in all the divisions, so ((int)n&((1<<30)-1)) and (int)(n>>30) could be precomputed for all the divisions being made.

Note that I have used my knowledge that n has at most 48 bits. The above division works whenever there is no overflow in the addition above. With unsigned one can replace 30 by 31 and then the key issue is whether
Mod[2^31,541]*(int)(n>>31) fits in 31 bits.

As it happened, 331 was the largest prime, I could make the above replacement for. Larger primes, up to 541, are being used as well, but those divisions are executed rarely and also I do not have limitation to 48 bits of the dividend there, so the above trick couldn't be implemented.

The above trick requires human knowledge that the divident is limited to, say, 48 bits. Even if a compiler was smart, can you pass this knowledge to the compiler somehow?

There is still an open question about the division like that on 64-bit machines. Can you be faster than the compiler is? Your advantage is the knowledge that the dividend is limited to 48 bits, which compiler does not know about.

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 12943 - Posted: 22 Jan 2009 | 8:21:11 UTC - in response to Message 12941.

In fact similar optimalization of such division on 32-bit platform can be performed even if the dividend is fully loaded with 64 significant bits. Simply we can cut the dividend into 3 pieces, 30, 17, 17 bits long and still there is enough room to handle any 13-bit divisor (or even any divisor up to 1.5*2^13).

I am really surprised compilers cannot be trusted to complile efficiently so basic operations like this kind of division.

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 12955 - Posted: 23 Jan 2009 | 7:27:49 UTC - in response to Message 12943.

I have worked out a replacement for all the remaining divisions in the AP26 program to suit 32-bit version.

The additional patch is described in:

http://www.math.uni.wroc.pl/~jwr/AP26/if32.zip

Since those divisions are not being executed frequently, do not expect a miracle, but I feel that a few % speed gain is quite likely.

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 13002 - Posted: 26 Jan 2009 | 23:43:18 UTC
Last modified: 26 Jan 2009 | 23:45:28 UTC

I haven't tried any ideas from the recent posts yet, but I reduced the number of branches from 49 to 7 in SITO-boinc.H by changing statements of the form

if (sito&=A1)
if (sito&=A2)
...
if (sito&=A7)

to

if (sito&=(A1 & A2 & ... & A7))


Only the 64-bit code benefitted. I also used the GCC __builtin_expect() operator to adjust the branch predictions, but I don't know how to do this
for other compilers.

The 64-bit time for the test range on my Core 2 Duo at 2.66GHz reduced from 4 min. 43 sec. to 4 min. 27 sec.

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 13009 - Posted: 27 Jan 2009 | 1:27:33 UTC - in response to Message 12941.

The above trick requires human knowledge that the divident is limited to, say, 48 bits. Even if a compiler was smart, can you pass this knowledge to the compiler somehow?


If you do something like n &= ((1LL<<48)-1) then the compiler should know that 0 <= n < 2^48 thereafter, but I don't know whether GCC makes much use of that knowledge.

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 13024 - Posted: 27 Jan 2009 | 9:56:03 UTC - in response to Message 13009.

The above trick requires human knowledge that the divident is limited to, say, 48 bits. Even if a compiler was smart, can you pass this knowledge to the compiler somehow?


If you do something like n &= ((1LL<<48)-1) then the compiler should know that 0 <= n < 2^48 thereafter, but I don't know whether GCC makes much use of that knowledge.


I have tested (gcc under 64-bit linux)

for(n=12345678901234LL;n<12345678901234LL+100000000;n++)sum+=n%DIV;

against

for(n=12345678901234LL;n<12345678901234LL+100000000;n++)sum+=(n&((1LL<<48)-1))%DIV;

The second version was slower than the first one for DIV=101, but significantly faster for DIV=127.

So at least it looks like for %127 the trick does help.

From the point of view of computations performed it is harmless to add the line
n59 &= ((1LL<<48)-1);
at the beginning of SITO-boinc.H, but I am not sure whether the compiler would remember that, when it gets to the divisions a few lines later.

The same line can be also added before divisions starting with the line
r61=n59%61;
in AP26-boinc.H, but since those divisions are not frequent, it may not give a measurable effect one way or the other.

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 13030 - Posted: 27 Jan 2009 | 11:38:57 UTC - in response to Message 13024.

It seems that 101 is one of very few exceptions.

For most divisors I have tried there seems to be a significant speed gain, sometimes nearly 30%.

spurious
Send message
Joined: 12 Jan 09
Posts: 5
ID: 34116
Credit: 104,678
RAC: 0
AP 26/27 Silver: Earned 100,000 credits (104,671)
Message 13036 - Posted: 27 Jan 2009 | 13:48:42 UTC

Modulus with a number of the form 2^n-1 can be done without division. Thats why 127 is so fast.

http://graphics.stanford.edu/~seander/bithacks.html

A book and a site that supports it ( source code avalable for free! )

http://www.hackersdelight.org/

Here is a site that talks about branch elimination.

http://www.cellperformance.com/articles/2006/07/tutorial_branch_elimination_pa.html

And don't forget to go to AMD and Intel's site. They both have very good manuals for programming that talk alot about optimization and the benifits of 64-bit.
____________

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 13038 - Posted: 27 Jan 2009 | 14:23:27 UTC - in response to Message 13036.

In the file
http://www.math.uni.wroc.pl/~jwr/AP26/division.txt
I have collected results of a simple benchmark of 64-bit versus 48-bit division (as suggested by Geoff).

Division by 127 is the fastest, but not by so much as one could expect.

64-bit division by 127 is slower than many 48-bit divisions by other numbers.

The timing ratio of 48/64 bit divisions is quite often around 0.72 in my test, which also performs some other operations, so the pure division should give even larger speed difference.

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 13055 - Posted: 28 Jan 2009 | 0:48:55 UTC - in response to Message 12955.

I have worked out a replacement for all the remaining divisions in the AP26 program to suit 32-bit version.

The additional patch is described in:

http://www.math.uni.wroc.pl/~jwr/AP26/if32.zip

Since those divisions are not being executed frequently, do not expect a miracle, but I feel that a few % speed gain is quite likely.


I added this patch to the executables uploaded today, the time for the 32-bit executable reduced from 8 min. 10 sec. to 7 min. 58 sec. on my Core 2 Duo 2.66GHz.

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 13142 - Posted: 30 Jan 2009 | 1:14:24 UTC - in response to Message 12936.

If no 64-bit registers are available you could also consider:

double dn = (double) n;
return (int64_t) (n - (n * divR);

where divR is a constant 1.0 / divisor, which can be stored in a table or defined as a constant, such as

static const divR = 1.0 / 541;


I implemented the main SITO.H code in assembly for SSE2 machines using this method (i.e. all multiplications done in floating point). The times for the test range on my machines are:

32-bit on P4 2.9 GHz: 11 min. 48 sec. (was 14 min. 25 sec.)
32-bit on C2D 2.66GHz: 6 min. 54 sec. (was 7 min. 58 sec.)
64-bit on C2D 2.66GHz: 4 min. 12 sec. (was 4 min. 27 sec.)

The code works on the assumption that n59 < 2^51 and p < 2^31 when computing the remainder n59%p, but doesn't make any special use of the fact that p is a compile-time constant. The remainder ends up being in the range 0 <= n59%p <= p, instead of the usual range 0 <= n59%p < p, so I extended the OKOK arrays by one element (e.g. OKOK101[101] = OKOK101[0]) to avoid the need to correct the result.

It is quite possible that this code will be slower on other machine types, so I have built seperate -pentium4 and -core2 executables that use the assembly routines. If someone could test them on an AMD machine and let me know whether it is faster or slower that would be great.

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 13147 - Posted: 30 Jan 2009 | 5:08:26 UTC - in response to Message 13142.

Geoff,
my tests described in http://www.math.uni.wroc.pl/~jwr/AP26/division.txt indicated that implementing your suggestion of telling the compiler that n59 has 48 bits by using (n59&((1LL<<48)-1))%DIV instead of n59%DIV speeds up the division significantly in most cases. I feel that this option may be more promissing than divisions using double.

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 13196 - Posted: 30 Jan 2009 | 23:02:34 UTC - in response to Message 13147.

my tests described in http://www.math.uni.wroc.pl/~jwr/AP26/division.txt indicated that implementing your suggestion of telling the compiler that n59 has 48 bits by using (n59&((1LL<<48)-1))%DIV instead of n59%DIV speeds up the division significantly in most cases. I feel that this option may be more promissing than divisions using double.


I tried adding that to the C code and it had little effect on the speed of the program overall, in fact it was a little slower. But I will try it in assembly when I get time.

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 13204 - Posted: 31 Jan 2009 | 8:50:16 UTC - in response to Message 13196.

Perhaps one cannot speed up the program that way :-(

But if you decide one day to put more of your own time into investigating the possibility of the 48-bit division speedup, here are a few hints. They may not help, but they shouldn't hurt at least.

This relates to division by primes 101-331.

1. According to my tests, there were some divisors, which were causing slowdown with 48-bit division trick. The are very badly placed since those divisions are among most frequent. The bad guys are 101, 103, 109, 131. There are some neutral primes: 113, 137, 191, 193, 241, 251, which gave no effect either way. Perhaps it is a good idea not to apply the trick to those primes, at least at the test stage.

Even that, I estimated that the good divisors should have twice as much the influence as the bad ones, so it doesn't explain by itself why there was no speed gain.

2. Each next division is executeded less frequently than the previous one. For example, for each 1000 divisions by 101, one should expect 105.65 divisions by 151 and 21.02 divisions by 211. Hence those far divisions should have minor effect if any.

3. Hence to verify whether the trick would work at all, it is enough to concentrate on five divisors 107, 127, 139, 149, 151. If it is possible to speed up the program, those five divisions should contribute more than all the rest combined. So the question is: can you speed up the program by playing with only those 5 divisions?

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 13206 - Posted: 31 Jan 2009 | 12:28:16 UTC - in response to Message 13204.


2. Each next division is executeded less frequently than the previous one. For example, for each 1000 divisions by 101, one should expect 105.65 divisions by 151 and 21.02 divisions by 211. Hence those far divisions should have minor effect if any.


I was a bit wrong with the estimates here and the actual proportion of frequencies of divisions by 101, 151, 211 is more like 1000:287:65, but this doesn't affect the general phenomena.

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 13210 - Posted: 31 Jan 2009 | 14:19:13 UTC - in response to Message 13206.
Last modified: 31 Jan 2009 | 15:02:07 UTC

Inspired by the trick with multiplication by double 1/101. I have tried the same thing with integer arithmetic instead. I defined

#define INV101 182641030432767838ULL
#define INV103 179094602657374288ULL
#define INV107 172399477324388333ULL
#define INV109 169236184162472951ULL
#define INV113 163245522776190723ULL
#define INV127 145249953336295683ULL
#define INV131 140814840257324822ULL
#define INV137 134647766961383589ULL
#define INV139 132710389019493178ULL
#define INV149 123803651501406387ULL
#define INV151 122163868037811601ULL
#define INV157 117495185182863387ULL
#define INV163 113170209041162894ULL
#define INV167 110459545351554202ULL
#define INV173 106628578460748854ULL
#define INV179 103054436165975149ULL
#define INV181 101915713114417413ULL
#define INV191 96579811904238491ULL
#define INV193 95578984837873325ULL
#define INV197 93638294790403816ULL
#define INV199 92697206400550511ULL

where INV101=Ceiling[2^64/101]

Then I tested the division of the form

(101*(unsigned int)(1+((INV101*n)>>40)))>>24

with
unsigned long long n;
and it seems to be a lot faster than pure n%101

Note that in AP26 program n59 can be made unsigned with no harm. The only issue here is that the right shift >>40 should fill left bits with zero's - I am not sure what are the standards here.

EDIT:

Let me think whether the above division is always correct - it did give correct results on 1,000,000,000 divisions, but I haven't conducted formal proof of its correctness yet.

EDIT 2:

There was a small bug, the formula above is fixed.

WRONG: (101*(unsigned int)((INV101*n)>>40))>>24
CORRECT: (101*(unsigned int)(1+((INV101*n)>>40)))>>24

Unfortunately the correction made the division much slower :-((((

EDIT 3:

Another correct version is:

CORRECT2: (256+101*(unsigned int)(((INV101*n)>>40)))>>24

CORRECT2 seems to be faster than CORRECT (but not as fast as WRONG).

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 13213 - Posted: 31 Jan 2009 | 16:02:22 UTC - in response to Message 13210.
Last modified: 31 Jan 2009 | 16:25:17 UTC

After all, the following seems to be my best suggestion for a replacement for n59%101:

#define INV101f 182641030432767837ULL

where INV101f=Floor[2^64/101]

(101*(unsigned int)(((INV101f*(n59+1))>>40)))>>24

n59+1 may be precomputed for all the divisions in SITO.H.
NOTE: n59 is always odd, so n59+1 is always even - this is a remark just in case someone wants to play with all possible bits, although I doubt it could help.

The problem with my first attempt was that I mixed rounding up (INV101=Ceiling) with rounding down (>>40 which discards low bits).

Here all the rounding is down and errors are small, so the formula is correct. Since (n59+1)>0 and INV101f<2^64/101

(101*(unsigned int)(((INV101f*(n59+1))>>40)))

is just below (n59%101 + 1) * 2^24 but is never equal to it.

Note that for divisor>256 shifts must be replaced by >>41 and >>23.

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 13227 - Posted: 31 Jan 2009 | 18:20:54 UTC - in response to Message 13213.

I am not sure whether it can help, but I have verified that
(INV101f*n59)>>57
will lead to a 7-bit integer uniquely corresponding to n59%101.
(NOTE: I assume that the bit-shift sets left bits to zero, so perhaps n59 should be changed to unsigned).

The correspondence is as follows:

n59%101 = 0 corresponds to 127
n59%101 = r corresponds to Floor[128*r/101]

Therefore I could imagine OKOK101 as a table with 128 entries. 101 of them placed at positions corresponding to 7-bit integers obtained from the above formula, 27 empty and irrelevant. This could continue up to OKOK127. I am not sure whether the outcome would be worth the programming effort. This would save only one multiplication by 101 and one bit-shift.

So the entry OKOK101[0] should be placed in NEW_OKOK101[127]
and for r=1,2,3,...,100
OKOK101[r] should be placed in NEW_OKOK101[(128*r)/101]

The new call replacing OKOK101[n59%101] would be then:

NEW_OKOK101[(INV101f*n59)>>57]

The same holds for larger 7-bit divisors (and 8-bit divisors with >>56). For 9-bit divisors the errors are becoming too large, so the correspondence is not 1-1 anymore.

For 131 it may be not worth the game for yet another reason: OKOK131 would have to be twice the size (256 instead of 131) and could waste too much processor's cache.

rogue
Volunteer developer
Avatar
Send message
Joined: 8 Sep 07
Posts: 1188
ID: 12001
Credit: 18,565,548
RAC: 0
PPS LLR Bronze: Earned 10,000 credits (31,229)PSA Jade: Earned 10,000,000 credits (18,533,435)
Message 13237 - Posted: 1 Feb 2009 | 0:24:51 UTC - in response to Message 13227.

For 131 it may be not worth the game for yet another reason: OKOK131 would have to be twice the size (256 instead of 131) and could waste too much processor's cache.


Which is why I was promoting a magic number. If all of your divisors are less then 15 bits, then you can compute a 16-bit magic number and a shift and do all of the work in a 32-bit register (signed). You can look at the code behind http://www.hackersdelight.org/magic.htm to learn the code or grab it from one of Geoff's sieving programs. Hacker's Delight is a 32-bit version, but you can easily write a 16-bit version to handle your small divisors.

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 13245 - Posted: 1 Feb 2009 | 7:43:20 UTC - in response to Message 13237.

That is more or less what I am suggesting, except there is 48-bit dividend here, so everything is done in 64-bit registers.

If I understand correctly, what you mean, to compute n%131 one has to multiply n by a magic number to get an 8-bit integer and then somehow map it to the corresponding residue in the range 0-130. Is that right?

The point is that we have the following 2 options:

1. Apply the mapping (which is some operation to perform, hence takes time) and take an entry from a table of size 131.

2. Use larger table of size 256 with 131 entries and 125 holes, but save time on performing the mapping.

rogue
Volunteer developer
Avatar
Send message
Joined: 8 Sep 07
Posts: 1188
ID: 12001
Credit: 18,565,548
RAC: 0
PPS LLR Bronze: Earned 10,000 credits (31,229)PSA Jade: Earned 10,000,000 credits (18,533,435)
Message 13271 - Posted: 1 Feb 2009 | 13:35:30 UTC - in response to Message 13245.

That is more or less what I am suggesting, except there is 48-bit dividend here, so everything is done in 64-bit registers.

If I understand correctly, what you mean, to compute n%131 one has to multiply n by a magic number to get an 8-bit integer and then somehow map it to the corresponding residue in the range 0-130. Is that right?

The point is that we have the following 2 options:

1. Apply the mapping (which is some operation to perform, hence takes time) and take an entry from a table of size 131.

2. Use larger table of size 256 with 131 entries and 125 holes, but save time on performing the mapping.


To divide by 131, you get the magic number of 16009 and a shift of 5. The code to do the mod is as follows:

(n - ((n * 16009) >> (5 + 16))*131)

The shift is 5+16 because the n * 16009 requires a 32-bit register, but because the assumption is that the divisor is less than 16 bites (thus how the magic number was calculated) you need to truncate the lower 16 bits of that operation. This can all be done in a 32-bit register. You can use a table or hard-code the values for each divisor.

My earlier mistake (which I often tend to forget) is that there are two multiplies and a shift in this operation. The advantage is that no 64-bit values are needed.

Here is the code to calculate the magic number (where d is the divisor):

uint16_t magic, d = 131;
uint16_t two15 = 0x8000;

uint16_t t = two15;
uint16_t anc = t - 1 - t%d; // Absolute value of nc.
uint16_t p = 15; // Init p.
uint16_t q1 = two15/anc; // Init q1 = 2**p/|nc|.
uint16_t r1 = two15 - q1*anc; // Init r1 = rem(2**p, |nc|).
uint16_t q2 = two15/d; // Init q2 = 2**p/|d|.
uint16_t r2 = two15- q2*d; // Init r2 = rem(2**p, |d|).
uint16_t delta, mag;

do {
p = p + 1;
q1 = 2*q1; // Update q1 = 2**p/|nc|.
r1 = 2*r1; // Update r1 = rem(2**p, |nc|.
if (r1 >= anc) { // (Must be an unsigned
q1 = q1 + 1; // comparison here).
r1 = r1 - anc;
}
q2 = 2*q2; // Update q2 = 2**p/|d|.
r2 = 2*r2; // Update r2 = rem(2**p, |d|.
if (r2 >= d) { // (Must be an unsigned
q2 = q2 + 1; // comparison here).
r2 = r2 - d;
}
delta = d - r2;
} while (q1 < delta || (q1 == delta && r1 == 0));

mag = q2 + 1;

magicShift = p - 16; // shift amount to return.

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 13274 - Posted: 1 Feb 2009 | 14:35:22 UTC - in response to Message 13271.


To divide by 131, you get the magic number of 16009 and a shift of 5. The code to do the mod is as follows:

(n - ((n * 16009) >> (5 + 16))*131)


This fails already for n=77682, so it is good only for 16-bit n.

In AP26 program n may be up tp 2^48. In order to do the above kind of trick you would need a magic number of similar accuracy, so the product (n* magic) would have no way of fitting in 64-bit register, it would require more like 96 bits.

However in my earlier posts I gave an idea how to perform a division like that, at a similar cost.

With unsigned n one can replace n%131 with

((unsigned int)((140814840257324821ULL*(n+1))>>40)*131U)>>24

for 64-bit computations or with

(((32786009U*(n+1))>>8)*131U)>>24

on 32-bits.

This should work correctly for n<2^56 and n<2^24 respectively (in fact you could drive almost up to 2^64/131 and 2^32/131 or perhaps even slightly over that - one would have to estimate errors accurately to find the exact limits).

The constants above are Floor[2^64/131] and Floor[2^32/131] respectively.

rogue
Volunteer developer
Avatar
Send message
Joined: 8 Sep 07
Posts: 1188
ID: 12001
Credit: 18,565,548
RAC: 0
PPS LLR Bronze: Earned 10,000 credits (31,229)PSA Jade: Earned 10,000,000 credits (18,533,435)
Message 13279 - Posted: 1 Feb 2009 | 17:12:32 UTC - in response to Message 13274.

I hadn't realized that n could be larger than 16 bits, although you probably stated it in this thread. I also presumed that you were sieving across n. I guess that what I really need to do is to look at your code and your paper to understand it. My initial thought is that something akin to what FermFact does might be possible, that is sieving across a rectangular space instead of a linear one.

Yet another of those things that I wish I had time to engross myself in...

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 13312 - Posted: 3 Feb 2009 | 0:07:41 UTC - in response to Message 13213.

After all, the following seems to be my best suggestion for a replacement for n59%101:

#define INV101f 182641030432767837ULL

where INV101f=Floor[2^64/101]

(101*(unsigned int)(((INV101f*(n59+1))>>40)))>>24


Note that for divisor>256 shifts must be replaced by >>41 and >>23.


I haven't finished the code yet, but in 64-bit mode this method looks like it will be substantially faster than the floating point method.

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 13361 - Posted: 5 Feb 2009 | 1:06:11 UTC - in response to Message 13213.

(101*(unsigned int)(((INV101f*(n59+1))>>40)))>>24


I have uploaded new executables, the 64-bit ones use the above method, the time for the test range on my machine:

64-bit on C2D 2.66GHz: 3 min. 46 sec. (was 4 min. 12 sec.)

It turns out that the assembly version is only marginally faster than the C code for this method.

I have put the faster32.zip method into assembly for the 32-bit version and that makes it faster than the floating-point method on my Core 2 Duo in 32-bit mode:

32-bit on C2D 2.66GHz: 6 min. 48 sec. (was 6 min. 54 sec.)

but the floating point method is still much faster on my Pentium 4 so I have left that unchanged in the AP26-pentium4-linux executable.


I haven't included the floating point (SSE2) code in the BOINC executable because it looks like it might only be faster on Pentium 4 machines, but if we can work out on which other machines the AP26-pentium4-* executable runs faster than the AP26-i686-* executable, then I will add both versions to the 32-bit BOINC executable and select which to use at runtime.

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 13878 - Posted: 25 Feb 2009 | 5:09:47 UTC

I just uploaded new files at http://www.geocities.com/g_w_reynolds/AP26/, the only change is that the Linux BOINC executables are now compiled with the C++ libraries statically linked.

sesef
Send message
Joined: 11 Nov 08
Posts: 26
ID: 31577
Credit: 1,299,123
RAC: 0
321 LLR Bronze: Earned 10,000 credits (12,687)PPS LLR Silver: Earned 100,000 credits (127,235)SGS LLR Bronze: Earned 10,000 credits (37,872)TRP LLR Bronze: Earned 10,000 credits (40,627)Woodall LLR Bronze: Earned 10,000 credits (24,883)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,692)TRP Sieve (suspended) Silver: Earned 100,000 credits (271,736)AP 26/27 Gold: Earned 500,000 credits (762,684)
Message 13905 - Posted: 26 Feb 2009 | 18:43:32 UTC
Last modified: 26 Feb 2009 | 18:49:49 UTC

I compiled newest source but it is still slower then first compilation. Compiled 2 versions with SITO-Boinc.h & MAKES-Boinc.h and original MAKES.h & SITO.h from Jarek's source



Version with SITO.h & MAKES.h

http://www.ee.pw.edu.pl/~jaworows/primegrid/AP26a.zip

Version with SITO-Boinc.h & MAKES-Boinc.h

http://www.ee.pw.edu.pl/~jaworows/primegrid/AP26b.zip

I dont use any assembler code, VS assembler compiler have problems with GCC asm syntax, more about it you can read here http://www.codeproject.com/KB/cpp/gccasm.aspx

-----------------------------------------------------------------
About x86 version. I will try to compile it today or tomorrow i need some time to download Intel x86 Compiler.

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 14065 - Posted: 2 Mar 2009 | 2:27:36 UTC

I have uploaded new executables dated 02 Mar 2009. The only change is to fix the invalid results produced by the 23 Feb 2009 x86_64 Windows executable.

ebahapoProject donor
Avatar
Send message
Joined: 11 Aug 05
Posts: 82
ID: 190
Credit: 2,358,134
RAC: 353
321 LLR Bronze: Earned 10,000 credits (13,495)Cullen LLR Bronze: Earned 10,000 credits (32,732)ESP LLR Bronze: Earned 10,000 credits (40,513)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (43,124)PPS LLR Silver: Earned 100,000 credits (371,184)PSP LLR Silver: Earned 100,000 credits (105,304)SoB LLR Silver: Earned 100,000 credits (339,297)SGS LLR Silver: Earned 100,000 credits (225,117)TPS LLR (retired) Silver: Earned 100,000 credits (143,637)Woodall LLR Bronze: Earned 10,000 credits (85,367)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (30,102)PPS Sieve Silver: Earned 100,000 credits (489,216)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (50,681)TRP Sieve (suspended) Silver: Earned 100,000 credits (296,390)GFN Bronze: Earned 10,000 credits (59,109)
Message 14197 - Posted: 5 Mar 2009 | 22:06:41 UTC - in response to Message 12848.

I see. Yet, other PrimeGrid applications take much longer than 41min to run, so why not 32-bit AP26 too?

With AP26, there's a 1:8 processing time ratio between 64 bit and 32 bit. With the sieves, the greatest ratio is 1:2. And with LLR, it's 1:1.

Since the x86 Linux application has been out, it seems that the 64 to 32-bit ratio is more like 1:2. Why not release a x86 Windows version too?

TIA

____________

JohnProject donor
Honorary cruncher
Avatar
Send message
Joined: 21 Feb 06
Posts: 2876
ID: 2449
Credit: 2,681,934
RAC: 0
321 LLR Bronze: Earned 10,000 credits (11,773)Cullen LLR Bronze: Earned 10,000 credits (14,945)ESP LLR Bronze: Earned 10,000 credits (26,855)PPS LLR Bronze: Earned 10,000 credits (84,876)PSP LLR Bronze: Earned 10,000 credits (15,311)SoB LLR Bronze: Earned 10,000 credits (21,440)SR5 LLR Bronze: Earned 10,000 credits (29,270)SGS LLR Bronze: Earned 10,000 credits (26,616)TPS LLR (retired) Bronze: Earned 10,000 credits (36,288)TRP LLR Bronze: Earned 10,000 credits (41,655)Woodall LLR Bronze: Earned 10,000 credits (15,807)321 Sieve Bronze: Earned 10,000 credits (20,014)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (23,405)PPS Sieve Bronze: Earned 10,000 credits (36,192)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,306)TRP Sieve (suspended) Bronze: Earned 10,000 credits (21,738)GFN Bronze: Earned 10,000 credits (86,217)PSA Ruby: Earned 2,000,000 credits (2,143,756)
Message 14198 - Posted: 5 Mar 2009 | 22:17:13 UTC - in response to Message 14197.


Since the x86 Linux application has been out, it seems that the 64 to 32-bit ratio is more like 1:2. Why not release a x86 Windows version too?

Should be coming soon.

____________

sesef
Send message
Joined: 11 Nov 08
Posts: 26
ID: 31577
Credit: 1,299,123
RAC: 0
321 LLR Bronze: Earned 10,000 credits (12,687)PPS LLR Silver: Earned 100,000 credits (127,235)SGS LLR Bronze: Earned 10,000 credits (37,872)TRP LLR Bronze: Earned 10,000 credits (40,627)Woodall LLR Bronze: Earned 10,000 credits (24,883)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,692)TRP Sieve (suspended) Silver: Earned 100,000 credits (271,736)AP 26/27 Gold: Earned 500,000 credits (762,684)
Message 14230 - Posted: 6 Mar 2009 | 20:15:05 UTC - in response to Message 14197.

I see. Yet, other PrimeGrid applications take much longer than 41min to run, so why not 32-bit AP26 too?

With AP26, there's a 1:8 processing time ratio between 64 bit and 32 bit. With the sieves, the greatest ratio is 1:2. And with LLR, it's 1:1.

Since the x86 Linux application has been out, it seems that the 64 to 32-bit ratio is more like 1:2. Why not release a x86 Windows version too?

TIA



I have problems with download Intel Compiler. I hope i will download it successful tonight (i have 10-15 kb/s speed on intel site :().

sesef
Send message
Joined: 11 Nov 08
Posts: 26
ID: 31577
Credit: 1,299,123
RAC: 0
321 LLR Bronze: Earned 10,000 credits (12,687)PPS LLR Silver: Earned 100,000 credits (127,235)SGS LLR Bronze: Earned 10,000 credits (37,872)TRP LLR Bronze: Earned 10,000 credits (40,627)Woodall LLR Bronze: Earned 10,000 credits (24,883)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,692)TRP Sieve (suspended) Silver: Earned 100,000 credits (271,736)AP 26/27 Gold: Earned 500,000 credits (762,684)
Message 14267 - Posted: 7 Mar 2009 | 23:43:47 UTC

x68 x64 compiled source form 02 Mar with Intel Compiler.

http://www.speedyshare.com/244920841.html

JohnProject donor
Honorary cruncher
Avatar
Send message
Joined: 21 Feb 06
Posts: 2876
ID: 2449
Credit: 2,681,934
RAC: 0
321 LLR Bronze: Earned 10,000 credits (11,773)Cullen LLR Bronze: Earned 10,000 credits (14,945)ESP LLR Bronze: Earned 10,000 credits (26,855)PPS LLR Bronze: Earned 10,000 credits (84,876)PSP LLR Bronze: Earned 10,000 credits (15,311)SoB LLR Bronze: Earned 10,000 credits (21,440)SR5 LLR Bronze: Earned 10,000 credits (29,270)SGS LLR Bronze: Earned 10,000 credits (26,616)TPS LLR (retired) Bronze: Earned 10,000 credits (36,288)TRP LLR Bronze: Earned 10,000 credits (41,655)Woodall LLR Bronze: Earned 10,000 credits (15,807)321 Sieve Bronze: Earned 10,000 credits (20,014)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (23,405)PPS Sieve Bronze: Earned 10,000 credits (36,192)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,306)TRP Sieve (suspended) Bronze: Earned 10,000 credits (21,738)GFN Bronze: Earned 10,000 credits (86,217)PSA Ruby: Earned 2,000,000 credits (2,143,756)
Message 14369 - Posted: 14 Mar 2009 | 18:37:21 UTC - in response to Message 14197.

Since the x86 Linux application has been out, it seems that the 64 to 32-bit ratio is more like 1:2. Why not release a x86 Windows version too?

A 32 bit Windows application has been released along with an updated 32 bit Linux application. Both have the speed improvements.

Good Luck!
____________

Jack
Send message
Joined: 26 Nov 08
Posts: 76
ID: 32168
Credit: 11,976,778
RAC: 0
321 LLR Silver: Earned 100,000 credits (107,488)Cullen LLR Bronze: Earned 10,000 credits (16,381)PPS LLR Silver: Earned 100,000 credits (291,347)PSP LLR Silver: Earned 100,000 credits (234,328)SoB LLR Silver: Earned 100,000 credits (145,831)SGS LLR Silver: Earned 100,000 credits (107,251)TPS LLR (retired) Bronze: Earned 10,000 credits (20,445)TRP LLR Bronze: Earned 10,000 credits (13,956)Woodall LLR Bronze: Earned 10,000 credits (13,948)321 Sieve Amethyst: Earned 1,000,000 credits (1,358,496)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (58,935)PPS Sieve Turquoise: Earned 5,000,000 credits (6,996,813)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (455,231)TRP Sieve (suspended) Silver: Earned 100,000 credits (213,378)AP 26/27 Amethyst: Earned 1,000,000 credits (1,555,454)PSA Silver: Earned 100,000 credits (388,566)
Message 14426 - Posted: 17 Mar 2009 | 1:29:40 UTC - in response to Message 14369.

This is just a thank you for the 32 bit windows app to all involved with the coding and testing. My wife only lets me use 64bit linux on her P.C. for short spurts as she likes windows. Now even at twice as long I can get more done.. Thanks all, Jack

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 15654 - Posted: 17 May 2009 | 2:27:42 UTC

I'm currently doing a port to the IBM Cell / B.E. Linux platform. Besides a few errors i'm getting with PPE/SPE communications, it seems the work units that do complete are tagged as invalid by the server.

Is the source code modified to report only APs greater than 20 in SOL-AP26.txt in the official applications?

A better question would be, is the source available in this thread able to be used as-is with the anonymous platform feature? As-is, it reports AP 10s and larger. I don't know if i'm chasing a problem with my results file format. All testing I can do on my end passes.

Thanks for any help.

Rytis
Volunteer moderator
Project administrator
Avatar
Send message
Joined: 22 Jun 05
Posts: 2639
ID: 1
Credit: 21,279,785
RAC: 6,497
321 LLR Silver: Earned 100,000 credits (104,475)Cullen LLR Silver: Earned 100,000 credits (291,372)ESP LLR Bronze: Earned 10,000 credits (18,156)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (15,259)PPS LLR Silver: Earned 100,000 credits (113,978)PSP LLR Silver: Earned 100,000 credits (116,517)SoB LLR Silver: Earned 100,000 credits (151,232)SR5 LLR Bronze: Earned 10,000 credits (14,071)SGS LLR Silver: Earned 100,000 credits (100,082)TPS LLR (retired) Silver: Earned 100,000 credits (111,607)TRP LLR Jade: Earned 10,000,000 credits (15,976,138)Woodall LLR Silver: Earned 100,000 credits (101,463)321 Sieve Silver: Earned 100,000 credits (201,501)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (214,653)Generalized Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (14,200)PPS Sieve Silver: Earned 100,000 credits (302,417)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (200,232)TRP Sieve (suspended) Ruby: Earned 2,000,000 credits (2,453,872)AP 26/27 Silver: Earned 100,000 credits (473,058)GFN Silver: Earned 100,000 credits (163,887)PSA Bronze: Earned 10,000 credits (97,541)
Message 15663 - Posted: 17 May 2009 | 14:39:55 UTC - in response to Message 15654.

Is the source code modified to report only APs greater than 20 in SOL-AP26.txt in the official applications?

A better question would be, is the source available in this thread able to be used as-is with the anonymous platform feature? As-is, it reports AP 10s and larger. I don't know if i'm chasing a problem with my results file format. All testing I can do on my end passes.

Thanks for any help.

The source can be used as-is. The format isn't the problem, we accept all APs with length>=10, however, I checked the WUs in question, and it seems that your client did not upload any files, therefore the WUs were marked as invalid.
____________

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 15669 - Posted: 17 May 2009 | 21:20:28 UTC
Last modified: 17 May 2009 | 21:21:04 UTC

Thank you for your quick response!

I was able to pinpoint the problem with your help, it was in my code resolving boinc's output file path.

Here is the first completed/validated AP26 workunit on Cell / B.E. hardware:

http://www.primegrid.com/result.php?resultid=106073142

JohnProject donor
Honorary cruncher
Avatar
Send message
Joined: 21 Feb 06
Posts: 2876
ID: 2449
Credit: 2,681,934
RAC: 0
321 LLR Bronze: Earned 10,000 credits (11,773)Cullen LLR Bronze: Earned 10,000 credits (14,945)ESP LLR Bronze: Earned 10,000 credits (26,855)PPS LLR Bronze: Earned 10,000 credits (84,876)PSP LLR Bronze: Earned 10,000 credits (15,311)SoB LLR Bronze: Earned 10,000 credits (21,440)SR5 LLR Bronze: Earned 10,000 credits (29,270)SGS LLR Bronze: Earned 10,000 credits (26,616)TPS LLR (retired) Bronze: Earned 10,000 credits (36,288)TRP LLR Bronze: Earned 10,000 credits (41,655)Woodall LLR Bronze: Earned 10,000 credits (15,807)321 Sieve Bronze: Earned 10,000 credits (20,014)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (23,405)PPS Sieve Bronze: Earned 10,000 credits (36,192)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,306)TRP Sieve (suspended) Bronze: Earned 10,000 credits (21,738)GFN Bronze: Earned 10,000 credits (86,217)PSA Ruby: Earned 2,000,000 credits (2,143,756)
Message 15672 - Posted: 17 May 2009 | 22:35:43 UTC - in response to Message 15669.

Here is the first completed/validated AP26 workunit on Cell / B.E. hardware:

http://www.primegrid.com/result.php?resultid=106073142

Congratulations!!! And a job well done. Testing is very fast. :)

____________

Rytis
Volunteer moderator
Project administrator
Avatar
Send message
Joined: 22 Jun 05
Posts: 2639
ID: 1
Credit: 21,279,785
RAC: 6,497
321 LLR Silver: Earned 100,000 credits (104,475)Cullen LLR Silver: Earned 100,000 credits (291,372)ESP LLR Bronze: Earned 10,000 credits (18,156)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (15,259)PPS LLR Silver: Earned 100,000 credits (113,978)PSP LLR Silver: Earned 100,000 credits (116,517)SoB LLR Silver: Earned 100,000 credits (151,232)SR5 LLR Bronze: Earned 10,000 credits (14,071)SGS LLR Silver: Earned 100,000 credits (100,082)TPS LLR (retired) Silver: Earned 100,000 credits (111,607)TRP LLR Jade: Earned 10,000,000 credits (15,976,138)Woodall LLR Silver: Earned 100,000 credits (101,463)321 Sieve Silver: Earned 100,000 credits (201,501)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (214,653)Generalized Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (14,200)PPS Sieve Silver: Earned 100,000 credits (302,417)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (200,232)TRP Sieve (suspended) Ruby: Earned 2,000,000 credits (2,453,872)AP 26/27 Silver: Earned 100,000 credits (473,058)GFN Silver: Earned 100,000 credits (163,887)PSA Bronze: Earned 10,000 credits (97,541)
Message 15676 - Posted: 18 May 2009 | 5:35:56 UTC - in response to Message 15669.

Great!

Now, when you decide when the app is ready, would you be willing to give the build to me to be made public?
____________

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 15678 - Posted: 18 May 2009 | 10:18:09 UTC - in response to Message 15676.

A few things to keep in mind, the app is reporting only the PPE cpu time, actual computation time is about 2 hours per work unit. Since the SPE's cpu time isn't counted (it's hidden in another thread), and the PPE is only being used at about 7% during computation (it's doing the PrimeQ() function for the SPE), that's why the calculation time is so low in the stats.

However, 6 SPEs are available on the PS3 platform (8 per cpu on other Cell processor hardware). I'm using a modified boinc client that i've told to use 6 cpus, so that it runs 6 threads at one time.

So about every 2 hours it reports 6 workunits. Not really fast, but not bad, either. Remember, the SPE is a vector engine, and i'm using pure scalar code at this time. I feel the code can be made to run faster with some work.

To release a public app, a few issues to be dealt with, i'd have to compile a static version of a boinc client and the AP26 app i'm using in order to distribute it. The only publicly available Cell boinc client is from ps3grid, and it's hacked to only use one processor, and cc_config.xml will not accept 6 cpus. Also, i'm still chasing the odd SPE/PPE comm error, which I think happens when killing boinc and restarting.

I do have 2 machines running for testing, now.

Profile 7ri9991 [MM]
Avatar
Send message
Joined: 24 Feb 09
Posts: 57
ID: 36070
Credit: 30,388,836
RAC: 0
321 LLR Bronze: Earned 10,000 credits (22,795)Cullen LLR Bronze: Earned 10,000 credits (23,087)PPS LLR Silver: Earned 100,000 credits (105,532)PSP LLR Bronze: Earned 10,000 credits (53,996)SoB LLR Bronze: Earned 10,000 credits (34,455)SR5 LLR Bronze: Earned 10,000 credits (11,626)SGS LLR Bronze: Earned 10,000 credits (20,209)TRP LLR Bronze: Earned 10,000 credits (18,601)Woodall LLR Silver: Earned 100,000 credits (113,413)321 Sieve Bronze: Earned 10,000 credits (21,376)Cullen/Woodall Sieve (suspended) Amethyst: Earned 1,000,000 credits (1,541,362)PPS Sieve Sapphire: Earned 20,000,000 credits (23,162,986)TRP Sieve (suspended) Bronze: Earned 10,000 credits (33,358)AP 26/27 Silver: Earned 100,000 credits (106,414)GFN Turquoise: Earned 5,000,000 credits (5,117,595)
Message 15754 - Posted: 21 May 2009 | 16:07:24 UTC - in response to Message 15678.

A few things to keep in mind, the app is reporting only the PPE cpu time, actual computation time is about 2 hours per work unit.

That's interesting. My 400Mz powerpc64 IBM pSeries gets one done in about 2.5 hours. Of course it can only do one at a time.

Profile 7ri9991 [MM]
Avatar
Send message
Joined: 24 Feb 09
Posts: 57
ID: 36070
Credit: 30,388,836
RAC: 0
321 LLR Bronze: Earned 10,000 credits (22,795)Cullen LLR Bronze: Earned 10,000 credits (23,087)PPS LLR Silver: Earned 100,000 credits (105,532)PSP LLR Bronze: Earned 10,000 credits (53,996)SoB LLR Bronze: Earned 10,000 credits (34,455)SR5 LLR Bronze: Earned 10,000 credits (11,626)SGS LLR Bronze: Earned 10,000 credits (20,209)TRP LLR Bronze: Earned 10,000 credits (18,601)Woodall LLR Silver: Earned 100,000 credits (113,413)321 Sieve Bronze: Earned 10,000 credits (21,376)Cullen/Woodall Sieve (suspended) Amethyst: Earned 1,000,000 credits (1,541,362)PPS Sieve Sapphire: Earned 20,000,000 credits (23,162,986)TRP Sieve (suspended) Bronze: Earned 10,000 credits (33,358)AP 26/27 Silver: Earned 100,000 credits (106,414)GFN Turquoise: Earned 5,000,000 credits (5,117,595)
Message 15755 - Posted: 21 May 2009 | 17:08:56 UTC

Well, maybe four hours. My last one was a short one.

Profile John M. Johnson "Novex"Project donor
Volunteer tester
Avatar
Send message
Joined: 16 Aug 07
Posts: 625
ID: 10876
Credit: 1,066,951
RAC: 0
321 LLR Bronze: Earned 10,000 credits (50,802)Cullen LLR Silver: Earned 100,000 credits (111,106)PPS LLR Silver: Earned 100,000 credits (101,350)PSP LLR Bronze: Earned 10,000 credits (15,210)SGS LLR Bronze: Earned 10,000 credits (27,055)TPS LLR (retired) Bronze: Earned 10,000 credits (67,826)Woodall LLR Bronze: Earned 10,000 credits (10,609)321 Sieve Bronze: Earned 10,000 credits (70,873)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (20,185)PPS Sieve Bronze: Earned 10,000 credits (20,034)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (24,551)TRP Sieve (suspended) Bronze: Earned 10,000 credits (40,040)AP 26/27 Silver: Earned 100,000 credits (159,148)PSA Silver: Earned 100,000 credits (344,961)
Message 15963 - Posted: 1 Jun 2009 | 14:37:13 UTC

Check this out and let me know what you think http://www.primegrid.com/forum_thread.php?id=1363
____________


John M. Johnson "Novex"

Profile roadrunner_gsProject donor
Volunteer developer
Send message
Joined: 11 Sep 08
Posts: 580
ID: 28785
Credit: 197,597,118
RAC: 123,470
321 LLR Silver: Earned 100,000 credits (344,212)PPS LLR Sapphire: Earned 20,000,000 credits (44,446,348)PSP LLR Amethyst: Earned 1,000,000 credits (1,113,016)SoB LLR Gold: Earned 500,000 credits (924,869)SGS LLR Gold: Earned 500,000 credits (655,953)TRP LLR Silver: Earned 100,000 credits (292,869)Woodall LLR Gold: Earned 500,000 credits (546,071)321 Sieve Gold: Earned 500,000 credits (934,518)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (283,632)PPS Sieve Jade: Earned 10,000,000 credits (10,281,980)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (339,564)TRP Sieve (suspended) Silver: Earned 100,000 credits (310,404)AP 26/27 Double Bronze: Earned 100,000,000 credits (129,560,480)GFN Gold: Earned 500,000 credits (556,571)PSA Turquoise: Earned 5,000,000 credits (6,999,200)
Message 16357 - Posted: 20 Jun 2009 | 12:37:44 UTC

I also finally got AP26-boinc to run on opensolaris.
But boinc does behave a little bit odd.
Would give it another try outside a VM.
The Results
Do not know whether this is fast or slow for a VM on a Core 2 Duo T8400.

Profile roadrunner_gsProject donor
Volunteer developer
Send message
Joined: 11 Sep 08
Posts: 580
ID: 28785
Credit: 197,597,118
RAC: 123,470
321 LLR Silver: Earned 100,000 credits (344,212)PPS LLR Sapphire: Earned 20,000,000 credits (44,446,348)PSP LLR Amethyst: Earned 1,000,000 credits (1,113,016)SoB LLR Gold: Earned 500,000 credits (924,869)SGS LLR Gold: Earned 500,000 credits (655,953)TRP LLR Silver: Earned 100,000 credits (292,869)Woodall LLR Gold: Earned 500,000 credits (546,071)321 Sieve Gold: Earned 500,000 credits (934,518)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (283,632)PPS Sieve Jade: Earned 10,000,000 credits (10,281,980)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (339,564)TRP Sieve (suspended) Silver: Earned 100,000 credits (310,404)AP 26/27 Double Bronze: Earned 100,000,000 credits (129,560,480)GFN Gold: Earned 500,000 credits (556,571)PSA Turquoise: Earned 5,000,000 credits (6,999,200)
Message 16375 - Posted: 20 Jun 2009 | 22:18:52 UTC

Oh dear!
Migrating OpenSolaris from a kvm-virtual machine driven by a Core 2 Duo T8400 (2.4 GHz) to Core 2 Duo E6550 (2.33 GHz but older model) makes the application slower,
I can't believe it...

Profile roadrunner_gsProject donor
Volunteer developer
Send message
Joined: 11 Sep 08
Posts: 580
ID: 28785
Credit: 197,597,118
RAC: 123,470
321 LLR Silver: Earned 100,000 credits (344,212)PPS LLR Sapphire: Earned 20,000,000 credits (44,446,348)PSP LLR Amethyst: Earned 1,000,000 credits (1,113,016)SoB LLR Gold: Earned 500,000 credits (924,869)SGS LLR Gold: Earned 500,000 credits (655,953)TRP LLR Silver: Earned 100,000 credits (292,869)Woodall LLR Gold: Earned 500,000 credits (546,071)321 Sieve Gold: Earned 500,000 credits (934,518)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (283,632)PPS Sieve Jade: Earned 10,000,000 credits (10,281,980)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (339,564)TRP Sieve (suspended) Silver: Earned 100,000 credits (310,404)AP 26/27 Double Bronze: Earned 100,000,000 credits (129,560,480)GFN Gold: Earned 500,000 credits (556,571)PSA Turquoise: Earned 5,000,000 credits (6,999,200)
Message 16437 - Posted: 23 Jun 2009 | 21:21:57 UTC
Last modified: 23 Jun 2009 | 21:22:06 UTC

So i finally achieved to put the boinc in 64-bit under Solaris to run.
The applications now also is 64-bit.
Now my E6550 is faster under Solaris than under Linux.

opensolaris 2009.06 ~ 769 - 925 seconds
RHEL 5.3 ~ 970 - 1040 seconds

But i would attribute the differences to the range: ~ 8450000 under RHEL whereas ~ 8000000 under opensolaris.
____________

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 16464 - Posted: 26 Jun 2009 | 2:56:16 UTC

After testing the Cell/BE Linux (Playstation 3) application, it seems to be stable enough to let others use if they are interested. Note that I have tested Yellow dog linux 6.0 and 6.1. Other distros such as Fedora/PPC64 (with appropriate Cell/BE packages) should also work.

Yellow Dog Linux 6.0 works fine out of the box except the note below.

Yellow Dog Linux 6.1 seems to have SPU scheduling problems, and will result in random workunit errors "invalid mailbox value from SPE" in stderr, along with the note below.


NOTE:
A known issue with the app arises when BOINC decides to pause / suspend work and run a random CPU benchmark every few days. This will kill all 6 running workunits with a signal 11 due to being run on the SPEs and BOINC paused the PPE parent thread. Same thing will happen if you manually pause BOINC or a single workunit. This isn't an issue if you leave the system running like me. A few errors every few days due to the autobench.


Shutting down BOINC and restarting BOINC all work fine with checkpointing.

If someone wants to run AP26 on their PS3, I can supply a tarball that should work for you. It will consist of the PS3 BOINC client from Lars Bausch (http://www.dotsch.de/boinc), the BOINC GUI manager from PS3grid.net (now GPUgrid.net), a cc_config.xml, app_info.xml, and of course the AP26 binary.


Expect 6 workunits to complete in 2 to 2.5 hours runtime. The SPEs do not have branch prediction, and cost is 18 cycles. Those familiar with the code know there are plenty of branches to miss. The beauty of the code is that everything but the PrimeQ function is loaded into the SPE's 256KB local store.

-Bryan Little

Profile roadrunner_gsProject donor
Volunteer developer
Send message
Joined: 11 Sep 08
Posts: 580
ID: 28785
Credit: 197,597,118
RAC: 123,470
321 LLR Silver: Earned 100,000 credits (344,212)PPS LLR Sapphire: Earned 20,000,000 credits (44,446,348)PSP LLR Amethyst: Earned 1,000,000 credits (1,113,016)SoB LLR Gold: Earned 500,000 credits (924,869)SGS LLR Gold: Earned 500,000 credits (655,953)TRP LLR Silver: Earned 100,000 credits (292,869)Woodall LLR Gold: Earned 500,000 credits (546,071)321 Sieve Gold: Earned 500,000 credits (934,518)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (283,632)PPS Sieve Jade: Earned 10,000,000 credits (10,281,980)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (339,564)TRP Sieve (suspended) Silver: Earned 100,000 credits (310,404)AP 26/27 Double Bronze: Earned 100,000,000 credits (129,560,480)GFN Gold: Earned 500,000 credits (556,571)PSA Turquoise: Earned 5,000,000 credits (6,999,200)
Message 16469 - Posted: 26 Jun 2009 | 8:19:41 UTC

You are user of the day: Congratulations!

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 16542 - Posted: 1 Jul 2009 | 3:09:37 UTC - in response to Message 16464.

Wow, this looks like quite an achievement!

Expect 6 workunits to complete in 2 to 2.5 hours runtime. The SPEs do not have branch prediction, and cost is 18 cycles. Those familiar with the code know there are plenty of branches to miss. The beauty of the code is that everything but the PrimeQ function is loaded into the SPE's 256KB local store.


I am sure there are ways to reduce the number of branches. I haven't looked at the AP26 code for a while, but it is OK to accumulate the bitmap comparisons (bitwise ands) in SITO.H rather than branching after every one. I think the x86 code ran fastest with a branch after every 7-10 comparisons, but maybe for the SPE code this should be much higher.

Profile [FVG] baxProject donor
Avatar
Send message
Joined: 22 Feb 09
Posts: 9
ID: 35963
Credit: 211,909,463
RAC: 0
321 LLR Silver: Earned 100,000 credits (127,308)PPS LLR Bronze: Earned 10,000 credits (82,705)321 Sieve Gold: Earned 500,000 credits (806,546)Cullen/Woodall Sieve (suspended) Amethyst: Earned 1,000,000 credits (1,296,583)PPS Sieve Double Silver: Earned 200,000,000 credits (208,806,026)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (241,215)TRP Sieve (suspended) Silver: Earned 100,000 credits (143,640)AP 26/27 Silver: Earned 100,000 credits (263,164)GFN Silver: Earned 100,000 credits (131,389)
Message 16582 - Posted: 2 Jul 2009 | 21:54:27 UTC
Last modified: 2 Jul 2009 | 21:55:12 UTC

What do I need to crunch with PS3 client ?

I'm already crunching YOYO on YDL 6.1 and GPUGrid on USB live linux distro (2nd PS3)

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 16584 - Posted: 2 Jul 2009 | 23:01:49 UTC
Last modified: 2 Jul 2009 | 23:08:01 UTC

Thank you for the info geoff, will keep this in mind. Most SPE branch reduction strategies involve calculating BOTH the if and else, then picking one of them. Basically using the SPE's raw computation speed to mask how dumb it is :)

As for public release, give me a little more time, will try to get the app where I originally wanted it, running 7 threads at a time (thanks to an idea from Lexs, this may be much easier than I originally thought).

Will post more info soon.

Profile [FVG] baxProject donor
Avatar
Send message
Joined: 22 Feb 09
Posts: 9
ID: 35963
Credit: 211,909,463
RAC: 0
321 LLR Silver: Earned 100,000 credits (127,308)PPS LLR Bronze: Earned 10,000 credits (82,705)321 Sieve Gold: Earned 500,000 credits (806,546)Cullen/Woodall Sieve (suspended) Amethyst: Earned 1,000,000 credits (1,296,583)PPS Sieve Double Silver: Earned 200,000,000 credits (208,806,026)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (241,215)TRP Sieve (suspended) Silver: Earned 100,000 credits (143,640)AP 26/27 Silver: Earned 100,000 credits (263,164)GFN Silver: Earned 100,000 credits (131,389)
Message 16609 - Posted: 5 Jul 2009 | 6:16:37 UTC - in response to Message 16584.

...As for public release, give me a little more time, will try to get the app where I originally wanted it, running 7 threads at a time (thanks to an idea from Lexs, this may be much easier than I originally thought).

Will post more info soon.


thx, I will wait for public release... waiting is not a hard job, I can efford it :-D

Profile pschoeferProject donor
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 20 Sep 05
Posts: 654
ID: 845
Credit: 1,876,756,105
RAC: 1,718,127
Found 2 primes in the 2018 Tour de PrimesFound 1 prime in the 2019 Tour de Primes321 LLR Turquoise: Earned 5,000,000 credits (5,570,600)Cullen LLR Turquoise: Earned 5,000,000 credits (5,098,443)ESP LLR Jade: Earned 10,000,000 credits (11,096,003)Generalized Cullen/Woodall LLR Jade: Earned 10,000,000 credits (19,415,453)PPS LLR Jade: Earned 10,000,000 credits (12,944,256)PSP LLR Turquoise: Earned 5,000,000 credits (6,435,120)SoB LLR Jade: Earned 10,000,000 credits (12,074,463)SR5 LLR Jade: Earned 10,000,000 credits (11,972,751)SGS LLR Turquoise: Earned 5,000,000 credits (6,242,555)TPS LLR (retired) Bronze: Earned 10,000 credits (76,372)TRP LLR Jade: Earned 10,000,000 credits (10,366,704)Woodall LLR Turquoise: Earned 5,000,000 credits (8,281,303)321 Sieve Amethyst: Earned 1,000,000 credits (1,909,583)Cullen/Woodall Sieve (suspended) Sapphire: Earned 20,000,000 credits (20,000,264)Generalized Cullen/Woodall Sieve (suspended) Jade: Earned 10,000,000 credits (11,111,347)PPS Sieve Double Silver: Earned 200,000,000 credits (342,928,524)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Ruby: Earned 2,000,000 credits (3,177,104)TRP Sieve (suspended) Turquoise: Earned 5,000,000 credits (5,556,644)AP 26/27 Double Bronze: Earned 100,000,000 credits (103,860,507)GFN Double Silver: Earned 200,000,000 credits (338,250,646)PSA Double Gold: Earned 500,000,000 credits (940,384,931)
Message 16685 - Posted: 9 Jul 2009 | 19:36:13 UTC

Some time ago, we did some tests and there was a Windows 64bit binary (02 Mar 2009), which was ~25% faster than the current stock app. Is there any reason why that fast binary is not released to the public?
____________

JohnProject donor
Honorary cruncher
Avatar
Send message
Joined: 21 Feb 06
Posts: 2876
ID: 2449
Credit: 2,681,934
RAC: 0
321 LLR Bronze: Earned 10,000 credits (11,773)Cullen LLR Bronze: Earned 10,000 credits (14,945)ESP LLR Bronze: Earned 10,000 credits (26,855)PPS LLR Bronze: Earned 10,000 credits (84,876)PSP LLR Bronze: Earned 10,000 credits (15,311)SoB LLR Bronze: Earned 10,000 credits (21,440)SR5 LLR Bronze: Earned 10,000 credits (29,270)SGS LLR Bronze: Earned 10,000 credits (26,616)TPS LLR (retired) Bronze: Earned 10,000 credits (36,288)TRP LLR Bronze: Earned 10,000 credits (41,655)Woodall LLR Bronze: Earned 10,000 credits (15,807)321 Sieve Bronze: Earned 10,000 credits (20,014)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (23,405)PPS Sieve Bronze: Earned 10,000 credits (36,192)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,306)TRP Sieve (suspended) Bronze: Earned 10,000 credits (21,738)GFN Bronze: Earned 10,000 credits (86,217)PSA Ruby: Earned 2,000,000 credits (2,143,756)
Message 16687 - Posted: 9 Jul 2009 | 23:35:40 UTC - in response to Message 16685.

Some time ago, we did some tests and there was a Windows 64bit binary (02 Mar 2009), which was ~25% faster than the current stock app. Is there any reason why that fast binary is not released to the public?

Good question!!! I am reviewing why that move didn't take place.
____________

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 16755 - Posted: 12 Jul 2009 | 5:52:33 UTC

AP26 on Playstation 3 CELL/BE development update:

Using a few tricks specific to the CELL/BE processor, I was able to speed up the application quite a bit. Also, the application is now stable on YDL 6.0, 6.1, 6.2 operating systems. Should work fine on others, too.

Runtime for the test range before today's code changes: 45m 5s
Runtime for the test range after today's code changes: 34m 25s

Using quick math based on the test range, one can approximate the PS3 is going to complete 6 workunits in about 104 minutes. This about one workunit every 17.3 minutes.
Actual computation time will vary by workunit.

In addition to the speedup, the PPE thread cpu usage is now 2-3%, down from 7-8% with the old code.

Today's code changes were the following:

*Double buffered and used optimal alignment for MFC transfers between SPE->PPE

*Modified SITO.H until I found exactly where branching was less expensive than bitwise ANDing more calculations.

*Offloaded all the result reporting code to the PPE thread, so we don't have to stall the SPE to write the output file.

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 16770 - Posted: 12 Jul 2009 | 23:52:28 UTC

More PS3 news:

Normally, one can compile a program with the profile flag, run the program, then recompile with the data generated during profiling. This way the compiler can see how to optimize the code.

When programming the CELL's SPEs, unless you have a very small program, this is not possible because the program with profiling code will easily exceed local store memory.

So, today was spent manually counting branch hits/misses in the important loops of the AP26 program in order to manually hint the branches with __builtin_expect.

A few notable things in AP26-boinc.h:

* if((sito>>b)&1) this evaluates to true only ~1.5% of the time
* the tests n%7 to n%23 pass ~61.3% of the time
* if(i59>1) evaluates to true ~97% of the time
* if(n59>=MOD) evaluates to true ~64% of the time

Also, each group in SITO-boinc.H was tested. The code was then adjusted so each if() test passes 55-70% of the time, with a branch hint of likely.

Additionally, since scalar loads/stores/operations are slow on the SPE, all integer r61 - r97 code related to the quick OKOK tests was vectorized. The important part is the calculations at the end of the main loop.

Here is the original code:

if(i59>1){ r61+=s61; r67+=s67; r71+=s71; r73+=s73; r79+=s79; r83+=s83; r89+=s89; r97+=s97; if(n59>=MOD){ n59-=MOD; r61-=m61; r67-=m67; r71-=m71; r73-=m73; r79-=m79; r83-=m83; r89-=m89; r97-=m97; if(r61<0)r61+=61; if(r67<0)r67+=67; if(r71<0)r71+=71; if(r73<0)r73+=73; if(r79<0)r79+=79; if(r83<0)r83+=83; if(r89<0)r89+=89; if(r97<0)r97+=97; }; if(r61>=61)r61-=61; if(r67>=67)r67-=67; if(r71>=71)r71-=71; if(r73>=73)r73-=73; if(r79>=79)r79-=79; if(r83>=83)r83-=83; if(r89>=89)r89-=89; if(r97>=97)r97-=97; }


Notice lots of branches and scalar operations. Now the vectorized version (vector declarations are not shown):
// testing showed greater than 97% of the time this is executed if(LIKELY(i59>1)){ n59+=S59; r_numvec1 = spu_add(r_numvec1, svec1); r_numvec2 = spu_add(r_numvec2, svec2); //testing showed greater than 64% of the time this is executed if(LIKELY(n59>=MOD)){ n59-=MOD; r_numvec1 = spu_sub(r_numvec1, mvec1); r_numvec2 = spu_sub(r_numvec2, mvec2); add_1 = spu_add(r_numvec1, numvec1); add_2 = spu_add(r_numvec2, numvec2); r_numvec1 = spu_sel(r_numvec1, add_1, spu_cmpgt(zerovec, r_numvec1) ); r_numvec2 = spu_sel(r_numvec2, add_2, spu_cmpgt(zerovec, r_numvec2) ); } minus_1 = spu_sub(r_numvec1, numvec1); minus_2 = spu_sub(r_numvec2, numvec2); r_numvec1 = spu_sel(r_numvec1, minus_1, spu_cmpgt(r_numvec1, numvec3) ); r_numvec2 = spu_sel(r_numvec2, minus_2, spu_cmpgt(r_numvec2, numvec4) ); }


This was time consuming, but resulted in the PS3 completing 6 workunits in about 96.9 minutes.

This is about one workunit every 16 minutes 9 seconds based on the test range. Previous version was 17 minutes 18 seconds.

-Bryan Little

Profile pschoeferProject donor
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 20 Sep 05
Posts: 654
ID: 845
Credit: 1,876,756,105
RAC: 1,718,127
Found 2 primes in the 2018 Tour de PrimesFound 1 prime in the 2019 Tour de Primes321 LLR Turquoise: Earned 5,000,000 credits (5,570,600)Cullen LLR Turquoise: Earned 5,000,000 credits (5,098,443)ESP LLR Jade: Earned 10,000,000 credits (11,096,003)Generalized Cullen/Woodall LLR Jade: Earned 10,000,000 credits (19,415,453)PPS LLR Jade: Earned 10,000,000 credits (12,944,256)PSP LLR Turquoise: Earned 5,000,000 credits (6,435,120)SoB LLR Jade: Earned 10,000,000 credits (12,074,463)SR5 LLR Jade: Earned 10,000,000 credits (11,972,751)SGS LLR Turquoise: Earned 5,000,000 credits (6,242,555)TPS LLR (retired) Bronze: Earned 10,000 credits (76,372)TRP LLR Jade: Earned 10,000,000 credits (10,366,704)Woodall LLR Turquoise: Earned 5,000,000 credits (8,281,303)321 Sieve Amethyst: Earned 1,000,000 credits (1,909,583)Cullen/Woodall Sieve (suspended) Sapphire: Earned 20,000,000 credits (20,000,264)Generalized Cullen/Woodall Sieve (suspended) Jade: Earned 10,000,000 credits (11,111,347)PPS Sieve Double Silver: Earned 200,000,000 credits (342,928,524)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Ruby: Earned 2,000,000 credits (3,177,104)TRP Sieve (suspended) Turquoise: Earned 5,000,000 credits (5,556,644)AP 26/27 Double Bronze: Earned 100,000,000 credits (103,860,507)GFN Double Silver: Earned 200,000,000 credits (338,250,646)PSA Double Gold: Earned 500,000,000 credits (940,384,931)
Message 16784 - Posted: 13 Jul 2009 | 12:43:02 UTC - in response to Message 16687.

Some time ago, we did some tests and there was a Windows 64bit binary (02 Mar 2009), which was ~25% faster than the current stock app. Is there any reason why that fast binary is not released to the public?

Good question!!! I am reviewing why that move didn't take place.

Any news on that topic? That fast binary is almost as fast as mfl0p's optimized (now official) Linux app on my Q9450, so it would be great to see it released before the challenge. ;)
____________

Rytis
Volunteer moderator
Project administrator
Avatar
Send message
Joined: 22 Jun 05
Posts: 2639
ID: 1
Credit: 21,279,785
RAC: 6,497
321 LLR Silver: Earned 100,000 credits (104,475)Cullen LLR Silver: Earned 100,000 credits (291,372)ESP LLR Bronze: Earned 10,000 credits (18,156)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (15,259)PPS LLR Silver: Earned 100,000 credits (113,978)PSP LLR Silver: Earned 100,000 credits (116,517)SoB LLR Silver: Earned 100,000 credits (151,232)SR5 LLR Bronze: Earned 10,000 credits (14,071)SGS LLR Silver: Earned 100,000 credits (100,082)TPS LLR (retired) Silver: Earned 100,000 credits (111,607)TRP LLR Jade: Earned 10,000,000 credits (15,976,138)Woodall LLR Silver: Earned 100,000 credits (101,463)321 Sieve Silver: Earned 100,000 credits (201,501)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (214,653)Generalized Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (14,200)PPS Sieve Silver: Earned 100,000 credits (302,417)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (200,232)TRP Sieve (suspended) Ruby: Earned 2,000,000 credits (2,453,872)AP 26/27 Silver: Earned 100,000 credits (473,058)GFN Silver: Earned 100,000 credits (163,887)PSA Bronze: Earned 10,000 credits (97,541)
Message 16785 - Posted: 13 Jul 2009 | 12:46:30 UTC
Last modified: 13 Jul 2009 | 12:55:15 UTC

PS3 app has been released for platform name powerpc64-ps3-linux-gnu.
____________

Rytis
Volunteer moderator
Project administrator
Avatar
Send message
Joined: 22 Jun 05
Posts: 2639
ID: 1
Credit: 21,279,785
RAC: 6,497
321 LLR Silver: Earned 100,000 credits (104,475)Cullen LLR Silver: Earned 100,000 credits (291,372)ESP LLR Bronze: Earned 10,000 credits (18,156)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (15,259)PPS LLR Silver: Earned 100,000 credits (113,978)PSP LLR Silver: Earned 100,000 credits (116,517)SoB LLR Silver: Earned 100,000 credits (151,232)SR5 LLR Bronze: Earned 10,000 credits (14,071)SGS LLR Silver: Earned 100,000 credits (100,082)TPS LLR (retired) Silver: Earned 100,000 credits (111,607)TRP LLR Jade: Earned 10,000,000 credits (15,976,138)Woodall LLR Silver: Earned 100,000 credits (101,463)321 Sieve Silver: Earned 100,000 credits (201,501)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (214,653)Generalized Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (14,200)PPS Sieve Silver: Earned 100,000 credits (302,417)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (200,232)TRP Sieve (suspended) Ruby: Earned 2,000,000 credits (2,453,872)AP 26/27 Silver: Earned 100,000 credits (473,058)GFN Silver: Earned 100,000 credits (163,887)PSA Bronze: Earned 10,000 credits (97,541)
Message 16786 - Posted: 13 Jul 2009 | 12:47:08 UTC - in response to Message 16687.

Some time ago, we did some tests and there was a Windows 64bit binary (02 Mar 2009), which was ~25% faster than the current stock app. Is there any reason why that fast binary is not released to the public?

Good question!!! I am reviewing why that move didn't take place.

Because BOINC version was never built out of it, only standalone versions.
____________

Profile VatoProject donor
Volunteer tester
Avatar
Send message
Joined: 2 Feb 08
Posts: 713
ID: 18447
Credit: 89,624,308
RAC: 391,668
321 LLR Amethyst: Earned 1,000,000 credits (1,902,702)Cullen LLR Ruby: Earned 2,000,000 credits (2,049,559)ESP LLR Ruby: Earned 2,000,000 credits (2,562,829)Generalized Cullen/Woodall LLR Ruby: Earned 2,000,000 credits (2,001,883)PPS LLR Ruby: Earned 2,000,000 credits (4,643,504)PSP LLR Amethyst: Earned 1,000,000 credits (1,825,922)SoB LLR Ruby: Earned 2,000,000 credits (2,023,559)SR5 LLR Ruby: Earned 2,000,000 credits (2,298,384)SGS LLR Ruby: Earned 2,000,000 credits (2,185,345)TPS LLR (retired) Silver: Earned 100,000 credits (103,523)TRP LLR Ruby: Earned 2,000,000 credits (2,457,903)Woodall LLR Ruby: Earned 2,000,000 credits (2,048,906)321 Sieve Jade: Earned 10,000,000 credits (10,574,611)Cullen/Woodall Sieve (suspended) Ruby: Earned 2,000,000 credits (4,119,699)Generalized Cullen/Woodall Sieve (suspended) Jade: Earned 10,000,000 credits (10,278,995)PPS Sieve Turquoise: Earned 5,000,000 credits (9,440,418)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Ruby: Earned 2,000,000 credits (4,080,177)TRP Sieve (suspended) Turquoise: Earned 5,000,000 credits (5,221,054)AP 26/27 Turquoise: Earned 5,000,000 credits (6,068,351)GFN Turquoise: Earned 5,000,000 credits (5,022,609)PSA Turquoise: Earned 5,000,000 credits (8,713,203)
Message 16792 - Posted: 13 Jul 2009 | 14:44:24 UTC - in response to Message 16786.

That's a shame - quite a lot of effort went into the original work and the testing of it.
____________

Profile Michael
Send message
Joined: 2 Dec 07
Posts: 45
ID: 15545
Credit: 6,643
RAC: 0

Message 16805 - Posted: 13 Jul 2009 | 23:45:08 UTC

And also is there someway to display the maximal chain the user found in each WU

JohnProject donor
Honorary cruncher
Avatar
Send message
Joined: 21 Feb 06
Posts: 2876
ID: 2449
Credit: 2,681,934
RAC: 0
321 LLR Bronze: Earned 10,000 credits (11,773)Cullen LLR Bronze: Earned 10,000 credits (14,945)ESP LLR Bronze: Earned 10,000 credits (26,855)PPS LLR Bronze: Earned 10,000 credits (84,876)PSP LLR Bronze: Earned 10,000 credits (15,311)SoB LLR Bronze: Earned 10,000 credits (21,440)SR5 LLR Bronze: Earned 10,000 credits (29,270)SGS LLR Bronze: Earned 10,000 credits (26,616)TPS LLR (retired) Bronze: Earned 10,000 credits (36,288)TRP LLR Bronze: Earned 10,000 credits (41,655)Woodall LLR Bronze: Earned 10,000 credits (15,807)321 Sieve Bronze: Earned 10,000 credits (20,014)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (23,405)PPS Sieve Bronze: Earned 10,000 credits (36,192)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,306)TRP Sieve (suspended) Bronze: Earned 10,000 credits (21,738)GFN Bronze: Earned 10,000 credits (86,217)PSA Ruby: Earned 2,000,000 credits (2,143,756)
Message 16807 - Posted: 14 Jul 2009 | 1:00:33 UTC - in response to Message 16786.

Some time ago, we did some tests and there was a Windows 64bit binary (02 Mar 2009), which was ~25% faster than the current stock app. Is there any reason why that fast binary is not released to the public?

Good question!!! I am reviewing why that move didn't take place.

Because BOINC version was never built out of it, only standalone versions.

We're looking for someone who can build a Win 64 bit BOINC version of the 02 Mar 2009 AP26 source. It can be found in here:

http://www.geocities.com/g_w_reynolds/AP26/

Thanks!
____________

Profile pschoeferProject donor
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 20 Sep 05
Posts: 654
ID: 845
Credit: 1,876,756,105
RAC: 1,718,127
Found 2 primes in the 2018 Tour de PrimesFound 1 prime in the 2019 Tour de Primes321 LLR Turquoise: Earned 5,000,000 credits (5,570,600)Cullen LLR Turquoise: Earned 5,000,000 credits (5,098,443)ESP LLR Jade: Earned 10,000,000 credits (11,096,003)Generalized Cullen/Woodall LLR Jade: Earned 10,000,000 credits (19,415,453)PPS LLR Jade: Earned 10,000,000 credits (12,944,256)PSP LLR Turquoise: Earned 5,000,000 credits (6,435,120)SoB LLR Jade: Earned 10,000,000 credits (12,074,463)SR5 LLR Jade: Earned 10,000,000 credits (11,972,751)SGS LLR Turquoise: Earned 5,000,000 credits (6,242,555)TPS LLR (retired) Bronze: Earned 10,000 credits (76,372)TRP LLR Jade: Earned 10,000,000 credits (10,366,704)Woodall LLR Turquoise: Earned 5,000,000 credits (8,281,303)321 Sieve Amethyst: Earned 1,000,000 credits (1,909,583)Cullen/Woodall Sieve (suspended) Sapphire: Earned 20,000,000 credits (20,000,264)Generalized Cullen/Woodall Sieve (suspended) Jade: Earned 10,000,000 credits (11,111,347)PPS Sieve Double Silver: Earned 200,000,000 credits (342,928,524)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Ruby: Earned 2,000,000 credits (3,177,104)TRP Sieve (suspended) Turquoise: Earned 5,000,000 credits (5,556,644)AP 26/27 Double Bronze: Earned 100,000,000 credits (103,860,507)GFN Double Silver: Earned 200,000,000 credits (338,250,646)PSA Double Gold: Earned 500,000,000 credits (940,384,931)
Message 16823 - Posted: 14 Jul 2009 | 18:58:53 UTC - in response to Message 16807.

Spent a few hours to get mingw-w64 running and at least I'm able to compile the standalone version, but trying to compile the BOINC version I get lots of undefined references:

C:\cds\AP26-Win64>gcc -O3 -DBOINC -o AP26 AP26-boinc.c -lstdc++ -lpthread -lboinc_api -lboinc C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp/ccSTFc3W.o:AP26-boinc.c:(.text+0x139): undefined reference to `boinc_resolve_filename' C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp/ccSTFc3W.o:AP26-boinc.c:(.text+0x144): undefined reference to `boinc_fopen' C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp/ccSTFc3W.o:AP26-boinc.c:(.text+0x1da): undefined reference to `boinc_time_to_checkpoint' C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp/ccSTFc3W.o:AP26-boinc.c:(.text+0x2a4): undefined reference to `boinc_end_critical_section' C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp/ccSTFc3W.o:AP26-boinc.c:(.text+0x2e1): undefined reference to `boinc_begin_critical_section' C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp/ccSTFc3W.o:AP26-boinc.c:(.text+0x304): undefined reference to `boinc_checkpoint_completed' C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp/ccSTFc3W.o:AP26-boinc.c:(.text+0x541): undefined reference to `boinc_is_standalone' C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp/ccSTFc3W.o:AP26-boinc.c:(.text+0x6a0): undefined reference to `boinc_is_standalone' C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp/ccSTFc3W.o:AP26-boinc.c:(.text+0x6d1): undefined reference to `boinc_is_standalone' C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp/ccSTFc3W.o:AP26-boinc.c:(.text+0x77fd): undefined reference to `boinc_init' C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp/ccSTFc3W.o:AP26-boinc.c:(.text+0x7a5a): undefined reference to `boinc_finish' C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp/ccSTFc3W.o:AP26-boinc.c:(.text+0x2dc): undefined reference to `boinc_fraction_done' collect2: ld returned 1 exit status

Not sure if I've put all BOINC headers into the correct place, maybe someone can tell me what exactly I need and where I have to put it?
____________

Profile roadrunner_gsProject donor
Volunteer developer
Send message
Joined: 11 Sep 08
Posts: 580
ID: 28785
Credit: 197,597,118
RAC: 123,470
321 LLR Silver: Earned 100,000 credits (344,212)PPS LLR Sapphire: Earned 20,000,000 credits (44,446,348)PSP LLR Amethyst: Earned 1,000,000 credits (1,113,016)SoB LLR Gold: Earned 500,000 credits (924,869)SGS LLR Gold: Earned 500,000 credits (655,953)TRP LLR Silver: Earned 100,000 credits (292,869)Woodall LLR Gold: Earned 500,000 credits (546,071)321 Sieve Gold: Earned 500,000 credits (934,518)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (283,632)PPS Sieve Jade: Earned 10,000,000 credits (10,281,980)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (339,564)TRP Sieve (suspended) Silver: Earned 100,000 credits (310,404)AP 26/27 Double Bronze: Earned 100,000,000 credits (129,560,480)GFN Gold: Earned 500,000 credits (556,571)PSA Turquoise: Earned 5,000,000 credits (6,999,200)
Message 16824 - Posted: 14 Jul 2009 | 19:39:51 UTC
Last modified: 14 Jul 2009 | 19:40:43 UTC

have you tried -lm?
Adding -m64 could not hurt too in order to get a 64-bit app explicitly.

LexsProject donor
Volunteer developer
Avatar
Send message
Joined: 16 Mar 08
Posts: 61
ID: 20289
Credit: 49,033,000
RAC: 0
321 LLR Amethyst: Earned 1,000,000 credits (1,281,253)Cullen LLR Ruby: Earned 2,000,000 credits (2,050,895)PPS LLR Jade: Earned 10,000,000 credits (10,537,729)PSP LLR Ruby: Earned 2,000,000 credits (2,162,049)SoB LLR Amethyst: Earned 1,000,000 credits (1,170,185)SGS LLR Ruby: Earned 2,000,000 credits (2,003,465)TPS LLR (retired) Silver: Earned 100,000 credits (253,775)TRP LLR Ruby: Earned 2,000,000 credits (2,057,863)Woodall LLR Amethyst: Earned 1,000,000 credits (1,156,322)321 Sieve Amethyst: Earned 1,000,000 credits (1,580,131)Cullen/Woodall Sieve (suspended) Amethyst: Earned 1,000,000 credits (1,010,671)PPS Sieve Jade: Earned 10,000,000 credits (17,904,428)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Amethyst: Earned 1,000,000 credits (1,000,607)TRP Sieve (suspended) Amethyst: Earned 1,000,000 credits (1,193,192)AP 26/27 Ruby: Earned 2,000,000 credits (2,500,663)PSA Amethyst: Earned 1,000,000 credits (1,166,390)
Message 16825 - Posted: 14 Jul 2009 | 19:46:06 UTC - in response to Message 16823.

Try adding -L/src/path/to/boinc/lib -L/src/path/to/boinc/api to your line.

Profile pschoeferProject donor
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 20 Sep 05
Posts: 654
ID: 845
Credit: 1,876,756,105
RAC: 1,718,127
Found 2 primes in the 2018 Tour de PrimesFound 1 prime in the 2019 Tour de Primes321 LLR Turquoise: Earned 5,000,000 credits (5,570,600)Cullen LLR Turquoise: Earned 5,000,000 credits (5,098,443)ESP LLR Jade: Earned 10,000,000 credits (11,096,003)Generalized Cullen/Woodall LLR Jade: Earned 10,000,000 credits (19,415,453)PPS LLR Jade: Earned 10,000,000 credits (12,944,256)PSP LLR Turquoise: Earned 5,000,000 credits (6,435,120)SoB LLR Jade: Earned 10,000,000 credits (12,074,463)SR5 LLR Jade: Earned 10,000,000 credits (11,972,751)SGS LLR Turquoise: Earned 5,000,000 credits (6,242,555)TPS LLR (retired) Bronze: Earned 10,000 credits (76,372)TRP LLR Jade: Earned 10,000,000 credits (10,366,704)Woodall LLR Turquoise: Earned 5,000,000 credits (8,281,303)321 Sieve Amethyst: Earned 1,000,000 credits (1,909,583)Cullen/Woodall Sieve (suspended) Sapphire: Earned 20,000,000 credits (20,000,264)Generalized Cullen/Woodall Sieve (suspended) Jade: Earned 10,000,000 credits (11,111,347)PPS Sieve Double Silver: Earned 200,000,000 credits (342,928,524)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Ruby: Earned 2,000,000 credits (3,177,104)TRP Sieve (suspended) Turquoise: Earned 5,000,000 credits (5,556,644)AP 26/27 Double Bronze: Earned 100,000,000 credits (103,860,507)GFN Double Silver: Earned 200,000,000 credits (338,250,646)PSA Double Gold: Earned 500,000,000 credits (940,384,931)
Message 16826 - Posted: 14 Jul 2009 | 20:33:07 UTC - in response to Message 16825.

Still no success, I'll try again tomorrow.
____________

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 16833 - Posted: 15 Jul 2009 | 0:43:16 UTC - in response to Message 16823.

C:\cds\AP26-Win64>gcc -O3 -DBOINC -o AP26 AP26-boinc.c -lstdc++ -lpthread -lboinc_api -lboinc C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp/ccSTFc3W.o:AP26-boinc.c:(.text+0x139): undefined reference to `boinc_resolve_filename'



I found that the order of the libraries makes a difference: Try the order -lboinc_api -lboinc -lstdc++ -lpthread

Profile pschoeferProject donor
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 20 Sep 05
Posts: 654
ID: 845
Credit: 1,876,756,105
RAC: 1,718,127
Found 2 primes in the 2018 Tour de PrimesFound 1 prime in the 2019 Tour de Primes321 LLR Turquoise: Earned 5,000,000 credits (5,570,600)Cullen LLR Turquoise: Earned 5,000,000 credits (5,098,443)ESP LLR Jade: Earned 10,000,000 credits (11,096,003)Generalized Cullen/Woodall LLR Jade: Earned 10,000,000 credits (19,415,453)PPS LLR Jade: Earned 10,000,000 credits (12,944,256)PSP LLR Turquoise: Earned 5,000,000 credits (6,435,120)SoB LLR Jade: Earned 10,000,000 credits (12,074,463)SR5 LLR Jade: Earned 10,000,000 credits (11,972,751)SGS LLR Turquoise: Earned 5,000,000 credits (6,242,555)TPS LLR (retired) Bronze: Earned 10,000 credits (76,372)TRP LLR Jade: Earned 10,000,000 credits (10,366,704)Woodall LLR Turquoise: Earned 5,000,000 credits (8,281,303)321 Sieve Amethyst: Earned 1,000,000 credits (1,909,583)Cullen/Woodall Sieve (suspended) Sapphire: Earned 20,000,000 credits (20,000,264)Generalized Cullen/Woodall Sieve (suspended) Jade: Earned 10,000,000 credits (11,111,347)PPS Sieve Double Silver: Earned 200,000,000 credits (342,928,524)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Ruby: Earned 2,000,000 credits (3,177,104)TRP Sieve (suspended) Turquoise: Earned 5,000,000 credits (5,556,644)AP 26/27 Double Bronze: Earned 100,000,000 credits (103,860,507)GFN Double Silver: Earned 200,000,000 credits (338,250,646)PSA Double Gold: Earned 500,000,000 credits (940,384,931)
Message 16943 - Posted: 20 Jul 2009 | 9:55:26 UTC

Sorry, no more ideas what to do now. Because nothing helped on Windows, I switched to Linux and mingw64-cross-compiler, but I can't build the BOINC libraries (error: #error _WIN32_IE setting conflicts). :|
____________

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 17065 - Posted: 24 Jul 2009 | 3:32:27 UTC - in response to Message 16464.
Last modified: 24 Jul 2009 | 3:33:03 UTC

The beauty of the code is that everything but the PrimeQ function is loaded into the SPE's 256KB local store.


How much is lost by not computing PrimeQ in the SPE? If it is significant then perhaps I could write a PrimeQ function that doesn't use any assembly, just plain C (with long long ints), if that would help?

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 17068 - Posted: 24 Jul 2009 | 10:03:03 UTC - in response to Message 17065.


How much is lost by not computing PrimeQ in the SPE? If it is significant then perhaps I could write a PrimeQ function that doesn't use any assembly, just plain C (with long long ints), if that would help?


I've since updated the code so the SPE doesn't have to wait on the PrimeQ calculation, it's all handled by the PPE, this is in the official app. It may actually be faster the way i'm doing the calculations now, since i'm gathering 12 N values and sending them via mfc to the PPE and using the fast ppc mulmod/expmod/getmagic functions included in the source.

At least this way the PPE is not totally idle.

Also, integer multiplication over 16bits (!) is very slow on the SPE.

JohnProject donor
Honorary cruncher
Avatar
Send message
Joined: 21 Feb 06
Posts: 2876
ID: 2449
Credit: 2,681,934
RAC: 0
321 LLR Bronze: Earned 10,000 credits (11,773)Cullen LLR Bronze: Earned 10,000 credits (14,945)ESP LLR Bronze: Earned 10,000 credits (26,855)PPS LLR Bronze: Earned 10,000 credits (84,876)PSP LLR Bronze: Earned 10,000 credits (15,311)SoB LLR Bronze: Earned 10,000 credits (21,440)SR5 LLR Bronze: Earned 10,000 credits (29,270)SGS LLR Bronze: Earned 10,000 credits (26,616)TPS LLR (retired) Bronze: Earned 10,000 credits (36,288)TRP LLR Bronze: Earned 10,000 credits (41,655)Woodall LLR Bronze: Earned 10,000 credits (15,807)321 Sieve Bronze: Earned 10,000 credits (20,014)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (23,405)PPS Sieve Bronze: Earned 10,000 credits (36,192)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,306)TRP Sieve (suspended) Bronze: Earned 10,000 credits (21,738)GFN Bronze: Earned 10,000 credits (86,217)PSA Ruby: Earned 2,000,000 credits (2,143,756)
Message 17155 - Posted: 30 Jul 2009 | 13:16:21 UTC - in response to Message 16807.

We're looking for someone who can build a Win 64 bit BOINC version of the 02 Mar 2009 AP26 source. It can be found in here:

http://www.geocities.com/g_w_reynolds/AP26/

Thanks!

This is still an open request. :)
____________

Profile roadrunner_gsProject donor
Volunteer developer
Send message
Joined: 11 Sep 08
Posts: 580
ID: 28785
Credit: 197,597,118
RAC: 123,470
321 LLR Silver: Earned 100,000 credits (344,212)PPS LLR Sapphire: Earned 20,000,000 credits (44,446,348)PSP LLR Amethyst: Earned 1,000,000 credits (1,113,016)SoB LLR Gold: Earned 500,000 credits (924,869)SGS LLR Gold: Earned 500,000 credits (655,953)TRP LLR Silver: Earned 100,000 credits (292,869)Woodall LLR Gold: Earned 500,000 credits (546,071)321 Sieve Gold: Earned 500,000 credits (934,518)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (283,632)PPS Sieve Jade: Earned 10,000,000 credits (10,281,980)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (339,564)TRP Sieve (suspended) Silver: Earned 100,000 credits (310,404)AP 26/27 Double Bronze: Earned 100,000,000 credits (129,560,480)GFN Gold: Earned 500,000 credits (556,571)PSA Turquoise: Earned 5,000,000 credits (6,999,200)
Message 17157 - Posted: 30 Jul 2009 | 15:26:54 UTC
Last modified: 30 Jul 2009 | 15:28:25 UTC

I built a Solaris@SPARC 64-bit-version but with 33 minutes - albeit without optimization other than -03 - on a SPARC64 V @ 1080 MHz i think i do not need to pursuit that approach any further.
I contacted Rytis to ask what he thinks but want to know your opinion none the less.

I may take a look into the windows-64bit app at home if it is in the open.
____________

Profile roadrunner_gsProject donor
Volunteer developer
Send message
Joined: 11 Sep 08
Posts: 580
ID: 28785
Credit: 197,597,118
RAC: 123,470
321 LLR Silver: Earned 100,000 credits (344,212)PPS LLR Sapphire: Earned 20,000,000 credits (44,446,348)PSP LLR Amethyst: Earned 1,000,000 credits (1,113,016)SoB LLR Gold: Earned 500,000 credits (924,869)SGS LLR Gold: Earned 500,000 credits (655,953)TRP LLR Silver: Earned 100,000 credits (292,869)Woodall LLR Gold: Earned 500,000 credits (546,071)321 Sieve Gold: Earned 500,000 credits (934,518)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (283,632)PPS Sieve Jade: Earned 10,000,000 credits (10,281,980)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (339,564)TRP Sieve (suspended) Silver: Earned 100,000 credits (310,404)AP 26/27 Double Bronze: Earned 100,000,000 credits (129,560,480)GFN Gold: Earned 500,000 credits (556,571)PSA Turquoise: Earned 5,000,000 credits (6,999,200)
Message 17161 - Posted: 30 Jul 2009 | 16:53:47 UTC

Forgot to say compared to around 7.5 Minutes on a Core 2 Duo with 2.16 GHz

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 17165 - Posted: 30 Jul 2009 | 21:12:30 UTC

Cell/BE - PS3 news:

Remember my old post about the SPEs being vector engines?

Today I was able to fully vectorize all the division/mod code on the SPEs in SITO.H file for the app, using vector addition/subtraction/compare (there's 128 registers per SPE, so plenty of room to do this).

it's now doing 6 workunits in about 45mins

or, about one every 7min 30sec

This is over twice as fast as the current version 1.00 app

I'm going to give the new app to Rytis to post as official.

And look for future optimizations - there's more to vectorize.

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 17220 - Posted: 2 Aug 2009 | 22:13:16 UTC - in response to Message 17065.


How much is lost by not computing PrimeQ in the SPE? If it is significant then perhaps I could write a PrimeQ function that doesn't use any assembly, just plain C (with long long ints), if that would help?


Geoff, i'm revisiting this due to having hit a wall with optimizations on the cell port. After making version 1.01 more than twice as fast it became apparent the program spends almost 2/3 of the time stalled transferring data to the PPE for PrimeQ testing.

You mention possibly writing a PrimeQ function that doesn't use any assembly, just plain C with long long ints. That would be great, as attempting to port the PPC asm to SPU asm is not going to be easy for me.

Few notes:
*All integer long long signed or unsigned math is ok.
*SPU has 64bit doubles.
*Double fpu math is ok, but single precision is not totally IEEE compliant on SPU.

Thanks for any help with this!

Bryan

geoffProject donor
Volunteer developer
Send message
Joined: 3 Aug 07
Posts: 99
ID: 10427
Credit: 343,437
RAC: 0
TRP Sieve (suspended) Bronze: Earned 10,000 credits (57,150)PSA Silver: Earned 100,000 credits (271,962)
Message 17263 - Posted: 5 Aug 2009 | 2:25:40 UTC - in response to Message 17220.

You mention possibly writing a PrimeQ function that doesn't use any assembly, just plain C with long long ints.


I have uploaded a no-assembly version of PrimeQ_ppc64.h at PrimeQ_gen.zip.

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 17277 - Posted: 6 Aug 2009 | 0:22:00 UTC - in response to Message 17263.


I have uploaded a no-assembly version of PrimeQ_ppc64.h at PrimeQ_gen.zip.


Thanks for the quick response Geoff, the code compiles and runs fine on the SPEs. Just trying a few quick tests it appears that the slow integer multiply on SPEs is going to be an issue computing primeQ on the SPEs. No hard numbers yet, but halfway thru the workunit I stopped the program because it was quite a bit slower.

I may have been wrong when I said the SPE-PPE MFC stalls are slowing the program down. I think the compiler ignored all the IF.H tests when I was doing a dummy test on the SITO code (testing speed). Therefore, lots of modulo operations were bypassed. oops. Will do some better testing when I get some time. If this is the case I may be able to do some spu-vector tricks on the n% code.

And just to show how much fun the SPEs are to program, I compiled my SPE optimized source on an Intel Core2 and it was painfully slower than your original code. Gotta love exotic hardware. Maybe Rytis can port AP26 to gpgpu :)

Again, thanks for the help!

Rytis
Volunteer moderator
Project administrator
Avatar
Send message
Joined: 22 Jun 05
Posts: 2639
ID: 1
Credit: 21,279,785
RAC: 6,497
321 LLR Silver: Earned 100,000 credits (104,475)Cullen LLR Silver: Earned 100,000 credits (291,372)ESP LLR Bronze: Earned 10,000 credits (18,156)Generalized Cullen/Woodall LLR Bronze: Earned 10,000 credits (15,259)PPS LLR Silver: Earned 100,000 credits (113,978)PSP LLR Silver: Earned 100,000 credits (116,517)SoB LLR Silver: Earned 100,000 credits (151,232)SR5 LLR Bronze: Earned 10,000 credits (14,071)SGS LLR Silver: Earned 100,000 credits (100,082)TPS LLR (retired) Silver: Earned 100,000 credits (111,607)TRP LLR Jade: Earned 10,000,000 credits (15,976,138)Woodall LLR Silver: Earned 100,000 credits (101,463)321 Sieve Silver: Earned 100,000 credits (201,501)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (214,653)Generalized Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (14,200)PPS Sieve Silver: Earned 100,000 credits (302,417)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (200,232)TRP Sieve (suspended) Ruby: Earned 2,000,000 credits (2,453,872)AP 26/27 Silver: Earned 100,000 credits (473,058)GFN Silver: Earned 100,000 credits (163,887)PSA Bronze: Earned 10,000 credits (97,541)
Message 17294 - Posted: 6 Aug 2009 | 15:12:13 UTC - in response to Message 17277.

Gotta love exotic hardware. Maybe Rytis can port AP26 to gpgpu :)

Yeah, if I got a right GPU when buying a new one... My ATI HD4670 does not support double precision numbers, for example :(
____________

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 17359 - Posted: 10 Aug 2009 | 23:55:38 UTC
Last modified: 11 Aug 2009 | 0:00:57 UTC

I've got the PS3 to do 6 workunits in parallel in about 32mins, or about 5mins 30sec a workunit now.

Also, PPE thread cpu usage down since i'm using the maximum MFC transfer size allowed, to reduce MFC overhead.

The speed increase came from modifying the source in a way that may make ALL cpus run faster on AP26. I have not compiled a x86_64 version to test this yet.

First, we add this to CONST-boinc.H:

//Bryan little, new for IF.H replacement #define INV7 UINT64_C(2635249153387078802) #define INV11 UINT64_C(1676976733973595601) #define INV13 UINT64_C(1418980313362273201) #define INV17 UINT64_C(1085102592571150095) #define INV19 UINT64_C(970881267037344821) #define INV23 UINT64_C(802032351030850070) #define INV337 UINT64_C(54738112978366622) #define INV347 UINT64_C(53160645745560667) #define INV349 UINT64_C(52856000211202153) #define INV353 UINT64_C(52257065364616293) #define INV359 UINT64_C(51383688227603207) #define INV367 UINT64_C(50263607830271257) #define INV373 UINT64_C(49455077945602015) #define INV379 UINT64_C(48672147951740241) #define INV383 UINT64_C(48163822646761231) #define INV389 UINT64_C(47420935922132523) #define INV397 UINT64_C(46465350311610961) #define INV401 UINT64_C(46001855545410353) #define INV409 UINT64_C(45102063749901104) #define INV419 UINT64_C(44025642180691053) #define INV421 UINT64_C(43816494236839790) #define INV431 UINT64_C(42799870240625409) #define INV433 UINT64_C(42602180308798040) #define INV439 UINT64_C(42019918163347497) #define INV443 UINT64_C(41640505809728107) #define INV449 UINT64_C(41084062524965593) #define INV457 UINT64_C(40364866682077793) #define INV461 UINT64_C(40014629227135686) #define INV463 UINT64_C(39841779856824085) #define INV467 UINT64_C(39500522641776341) #define INV479 UINT64_C(38510947961815347) #define INV487 UINT64_C(37878324586672590) #define INV491 UINT64_C(37569743530976683) #define INV499 UINT64_C(36967422993405915) #define INV503 UINT64_C(36673447462643243) #define INV509 UINT64_C(36241147492553146) #define INV521 UINT64_C(35406418567580713) #define INV523 UINT64_C(35271021173440825) #define INV541 UINT64_C(34097493666745936)



Next, we change the IF code from:
if(n%7) if(n%11) if(n%13) if(n%17) if(n%19) if(n%23) #include "AP26/IF.H"


To this:
if(LIKELY(REM(n,7,3) && REM(n,11,4) && REM(n,13,4) && REM(n,17,5) && REM(n,19,5) && REM(n,23,5))) if(LIKELY(OK337[REM(n,337,9)] & OK347[REM(n,347,9)] & OK349[REM(n,349,9)] & OK353[REM(n,353,9)] & OK359[REM(n,359,9)] & OK367[REM(n,367,9)] & OK373[REM(n,373,9)] & OK379[REM(n,379,9)])) if(LIKELY(OK383[REM(n,383,9)] & OK389[REM(n,389,9)] & OK397[REM(n,397,9)] & OK401[REM(n,401,9)] & OK409[REM(n,409,9)] & OK419[REM(n,419,9)] & OK421[REM(n,421,9)] & OK431[REM(n,431,9)])) if(LIKELY(OK433[REM(n,433,9)] & OK439[REM(n,439,9)] & OK443[REM(n,443,9)] & OK449[REM(n,449,9)] & OK457[REM(n,457,9)] & OK461[REM(n,461,9)] & OK463[REM(n,463,9)] & OK467[REM(n,467,9)])) if(LIKELY(OK479[REM(n,479,9)] & OK487[REM(n,487,9)] & OK491[REM(n,491,9)] & OK499[REM(n,499,9)] & OK503[REM(n,503,9)] & OK509[REM(n,509,9)] & OK521[REM(n,521,10)] & OK523[REM(n,523,10)] & OK541[REM(n,541,10)]))

So now we are using multiply/bit shift instead of modulo operations. Notice we can bitwise AND the OK# array values, they are always 1 or 0.

-Bryan Little

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 17380 - Posted: 11 Aug 2009 | 23:05:19 UTC - in response to Message 12827.

Today I saw the post quoted below, and testing the faster IF.H code I posted above, it overflows and gives incorrect results at tests this high. So, it is faster, but at large numbers will not find correct results. So a modified version of REM() will be needed at this level. So, in other words, don't use the above code.


Since in various versions of the program, linux GMP primality tests are being replaced by some other routines depending on how compiler handles e.g. long double or GMP_LIMB_BITS, it would be a good idea to test how it works on larger numbers. The original test for K=366,384 doesn't use as large numbers as may appear during the search.

This a distant problem, would appear around 116,000,000 WU, the search is highly unlikely to go that high.

In order to test whether primality checks are working correctly for the numbers of size we may expect to encounter in forseeable future, we should testrun the program for the highest of expected parameters.

Say, K=76,000,000, SHIFT=640.
Since so large K wouldn't be matched with so large SHIFT anyway, this is more than enough for current needs.

Any other version should give identical results.

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 17958 - Posted: 12 Sep 2009 | 23:11:42 UTC
Last modified: 12 Sep 2009 | 23:21:44 UTC

Today I was looking for a way to get more RAC in AP26 without spending more $. I had a few options, one being starting up an old Pentium III 1ghz computer and getting some nice, slow times out of it. Now that's not going to work very well :)

My other option was to write some code. I decided to port the AP26 application to run with Altivec on the PPE portion of the Cell processor. Stock March '09 code compiled with -o3 was doing the test range in 23 minutes. With my Altivec changes, this was brought down to 13 minutes.

Now I have a PS3 that can run AP26 on the PPE, and we already have the SPE application that's been out for a while. Now the problem remains, how do we run one thread on PPE, and six on the SPEs with BOINC's limited features in this area?

Easy, I compiled two custom boinc clients, one for the PPE, and one for the SPEs. Starting one of them with --no_gui_rpc flag will allow both clients to run at the same time. So now we have what appears to be 2 host machines on one computer. This allows two different binaries for the same project.

Since there is some CPU overhead with the six SPE threads running simultaneously with the PPE thread, workunit time for the PPE is about 55 minutes. I accomplished my goal, I do think this is faster than starting up the Pentium III 1ghz, and no additional power consumption :)

If anyone is interested in the Altivec code for testing on other PPC machines, I can post it. I do not know if it will be faster on other PPC CPUs, as I do not have one to use for compiling and testing.
____________

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 18197 - Posted: 27 Sep 2009 | 14:06:55 UTC

I have posted my 32bit and 64bit Linux source code with SSE/SSE2 modifications, and also mirrored some of Jaroslaw Wroblewski and Geoffrey Reynolds source code here:

http://mfl0p.angelfire.com/

An important note, the 32bit source will only work properly when compiled with -DBOINC and linked with BOINC libraries. To get it to work standalone, you will have to add some _mm_empty() function calls to maintain the FPU state when switching from MMX mode.

Geoff, if you aren't aware, Geocities is closing on 26 October 2009 and all files will be deleted.
____________

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 19409 - Posted: 29 Nov 2009 | 3:42:57 UTC

I have a question for anyone familiar with converting division by constant -> multiply by inverse and bit shift for 32bit integers.

If anyone can post how to convert a 32bit integer modulo operation to a multiply by pre-computed inverse it would be great.

All I have found by searching is 64bit integer routines (including the REM define in the AP26 code).

Thanks in advance
____________

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 19410 - Posted: 29 Nov 2009 | 5:55:54 UTC - in response to Message 19409.
Last modified: 29 Nov 2009 | 6:26:10 UTC

Here are my thoughts about finding remainder n%p with 32-bit multiplication only, where n, p are unsigned integers with p being a small constant.

Assumption

p has at most b bits and is not a power of 2.

Case 1

n has at most 32-b bits.

Precompute

INVp = Floor[(2^32)/p] = Floor[(2^32-1)/p]

Then n%p is given by

(p * ((INVp*(n+1)) >> b)) >> (32-b)

Case 2

n has at most 62-3b bits

Take

n1 = n&((1<<(31-b))-1)
(lowest 31-b bits of n)

n2 = n>>(31-b)
(highest 31-2b bits of n)

m = n1 + n2 * Mod[2^(31-b),p]

m has at most 32-b bits, so m%p falls in case 1.

Case 3

n has at most 91-5b bits

Cut n into n1,n2,n3 having (31-b), (30-2b), (30-2b) bits respectively

m = n1 + n2 * Mod[2^(31-b),p] + n3 * Mod[2^(61-3b),p]

Case 4

n has at most 120-7b bits

Cut n into n1,n2,n3,n4 having (30-b), (30-2b), (30-2b), (30-2b) bits respectively

m = n1 + n2 * Mod[2^(30-b),p] + n3 * Mod[2^(60-3b),p] + n4 * Mod[2^(90-5b),p]


EDIT: SORRY

I am afraid I haven't predicted all the rounding errors correctly. Will post a correction soon.

EDIT2: CORRECTION

In Case 1 we need

n <= (2^32)/p - 2^b

We could try to fix it by making sure n is sufficiently smaller than 2^(32-b), but it is much simpler if we just assume

p <= (2^32)/((2^(32-b))+2^b)

that is:

b=6 p<=63
b=7 p<=127
b=8 p<=255
b=9 p<=511
b=10 p<=1023
b=11 p<=2046
b=12 p<=4080
b=13 p<=8065
b=14 p<=15420
b=15 p<=26214
b=16 p<=32768

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 19411 - Posted: 29 Nov 2009 | 9:27:50 UTC - in response to Message 19410.

With the same restriction on p as above, we can deal better with longer n by the direct formula

(p * ((INVp*(n1+1)+INVp2*n2) >> b)) >> (32-b)

with

INVp2=Floor[Mod[2^(31-b),p]*2^32/p]

n1 = lowest (31-b) bits of n
n2 = highest (31-b) bits of n

n can have (62-2b) bits.

Similarly:

(p * ((INVp*(n1+1)+INVp2*n2+INVp3*n3) >> b)) >> (32-b)

with

INVp2=Floor[Mod[2^(31-b),p]*2^32/p]
INVp3=Floor[Mod[2^(61-2b),p]*2^32/p]

n1 = lowest (31-b) bits of n
n2 = next (30-b) bits of n
n3 = highest (30-b) bits of n

n can have (91-3b) bits.

Note that we must assure n1+n2+n3 < 2^(32-b) or at least
n1+n2+n3 < (2^32)/p-2^b.


Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 19421 - Posted: 30 Nov 2009 | 4:21:19 UTC - in response to Message 19411.

Thanks Jarek, I was able to use the information you posted, and it's working.

It seems most compilers automatically perform these optimizations, but when working with exotic hardware, the compilers are not very complete yet.

Manually optimizing the code makes readability horrible, but if it works, why not? :)
____________

Profile roadrunner_gsProject donor
Volunteer developer
Send message
Joined: 11 Sep 08
Posts: 580
ID: 28785
Credit: 197,597,118
RAC: 123,470
321 LLR Silver: Earned 100,000 credits (344,212)PPS LLR Sapphire: Earned 20,000,000 credits (44,446,348)PSP LLR Amethyst: Earned 1,000,000 credits (1,113,016)SoB LLR Gold: Earned 500,000 credits (924,869)SGS LLR Gold: Earned 500,000 credits (655,953)TRP LLR Silver: Earned 100,000 credits (292,869)Woodall LLR Gold: Earned 500,000 credits (546,071)321 Sieve Gold: Earned 500,000 credits (934,518)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (283,632)PPS Sieve Jade: Earned 10,000,000 credits (10,281,980)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Silver: Earned 100,000 credits (339,564)TRP Sieve (suspended) Silver: Earned 100,000 credits (310,404)AP 26/27 Double Bronze: Earned 100,000,000 credits (129,560,480)GFN Gold: Earned 500,000 credits (556,571)PSA Turquoise: Earned 5,000,000 credits (6,999,200)
Message 20167 - Posted: 28 Dec 2009 | 7:19:44 UTC
Last modified: 28 Dec 2009 | 7:21:03 UTC

geocities is no more available.
Where could is the boinc-code of reynolds situated now? Is this on mfl0p's page?

JohnProject donor
Honorary cruncher
Avatar
Send message
Joined: 21 Feb 06
Posts: 2876
ID: 2449
Credit: 2,681,934
RAC: 0
321 LLR Bronze: Earned 10,000 credits (11,773)Cullen LLR Bronze: Earned 10,000 credits (14,945)ESP LLR Bronze: Earned 10,000 credits (26,855)PPS LLR Bronze: Earned 10,000 credits (84,876)PSP LLR Bronze: Earned 10,000 credits (15,311)SoB LLR Bronze: Earned 10,000 credits (21,440)SR5 LLR Bronze: Earned 10,000 credits (29,270)SGS LLR Bronze: Earned 10,000 credits (26,616)TPS LLR (retired) Bronze: Earned 10,000 credits (36,288)TRP LLR Bronze: Earned 10,000 credits (41,655)Woodall LLR Bronze: Earned 10,000 credits (15,807)321 Sieve Bronze: Earned 10,000 credits (20,014)Cullen/Woodall Sieve (suspended) Bronze: Earned 10,000 credits (23,405)PPS Sieve Bronze: Earned 10,000 credits (36,192)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (20,306)TRP Sieve (suspended) Bronze: Earned 10,000 credits (21,738)GFN Bronze: Earned 10,000 credits (86,217)PSA Ruby: Earned 2,000,000 credits (2,143,756)
Message 20168 - Posted: 28 Dec 2009 | 9:03:00 UTC - in response to Message 20167.

geocities is no more available.
Where could is the boinc-code of reynolds situated now? Is this on mfl0p's page?

http://sites.google.com/site/geoffreywalterreynolds/programs/
____________

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 21362 - Posted: 25 Feb 2010 | 3:02:21 UTC

Question for Jarek, AP26 has searched a large range thus far, and currently at around K 35,000,000 for shift=0 and about 3,500,000 for shift=256

All CPU/GPU applications I have built used Geoff's code base with the change to up the limit of the application's test range ability ref:


K cannot exceed 76,419,953
This a distant problem, would appear around 116,000,000 WU, the search is highly unlikely to go that high.

Cure for this very remote problem is:
In MAKES.H replace all the expressions of the form
(j*STEP)%541
with corresponding
(j*(STEP%541))%541
but this is not a serious issue at the moment.

/* This file replaces MAKES.H from Jaroslaw Wroblewski's AP26 sample implementation. The only changes are to use int64_t instead of long long, and to replace expressions (j*STEP)%X with (j*(STEP%X))%X. */


Looking at other parts of the program, is there a set "ceiling" K or shift value that the application will start to overflow the data types used, or your search algorithm will no longer work properly? Another thing to be considered, the PrimeQ replacement code, which is different across CPU and GPU platforms, will it overflow at a known value?

I am looking to the future, since AP26 has not appeared so far.

Thanks,
Bryan
____________

Jarek
Volunteer developer
Send message
Joined: 28 Dec 08
Posts: 57
ID: 33488
Credit: 10
RAC: 0

Message 21365 - Posted: 25 Feb 2010 | 5:32:14 UTC - in response to Message 21362.

I do not know about limitations of various versions of PrimeQ. At the moment the largest primes tested are in 58 bits. With a given SHIFT and K all the primes tested are below MOD*(SHIFT+63)+K*23#*25.

SHIFT is safe up to 35584 which is sky high.

In my original AP26.h you can find lines:

n0=(N0*(K%17835)+((N0*17835)%MOD)*(K/17835)+N30)%MOD;

S31=(PRES2*(K%17835)+((PRES2*17835)%MOD)*(K/17835))%MOD;
S37=(PRES3*(K%17835)+((PRES3*17835)%MOD)*(K/17835))%MOD;
S41=(PRES4*(K%17835)+((PRES4*17835)%MOD)*(K/17835))%MOD;
S43=(PRES5*(K%17835)+((PRES5*17835)%MOD)*(K/17835))%MOD;
S47=(PRES6*(K%17835)+((PRES6*17835)%MOD)*(K/17835))%MOD;
S53=(PRES7*(K%17835)+((PRES7*17835)%MOD)*(K/17835))%MOD;
S59=(PRES8*(K%17835)+((PRES8*17835)%MOD)*(K/17835))%MOD;

where I partly guarded against modular multiplication overflow with a
dummy number 17835, which made program safe up to K=318,000,000.

In human readible form those lines should do
n0=(N0*K+N30)%MOD;

S31=(PRES2*K)%MOD;
etc...

but the above as is would very soon overflow 64-bit integers.

Profile mfl0pProject donor
Volunteer developer
Send message
Joined: 5 Apr 09
Posts: 195
ID: 38042
Credit: 658,862,905
RAC: 237,801
Discovered 2 AP26sFound 1 prime in the 2019 Tour de PrimesESP LLR Ruby: Earned 2,000,000 credits (3,689,780)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,696,299)PPS LLR Turquoise: Earned 5,000,000 credits (9,217,311)PSP LLR Bronze: Earned 10,000 credits (26,222)SoB LLR Turquoise: Earned 5,000,000 credits (6,906,699)TRP LLR Amethyst: Earned 1,000,000 credits (1,704,136)Woodall LLR Silver: Earned 100,000 credits (218,868)PPS Sieve Sapphire: Earned 20,000,000 credits (30,517,663)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Bronze: Earned 10,000 credits (47,384)TRP Sieve (suspended) Bronze: Earned 10,000 credits (49,560)AP 26/27 Double Gold: Earned 500,000,000 credits (558,897,095)GFN Sapphire: Earned 20,000,000 credits (45,426,073)PSA Silver: Earned 100,000 credits (456,000)
Message 56129 - Posted: 4 Jul 2012 | 21:40:04 UTC
Last modified: 4 Jul 2012 | 21:41:01 UTC

Anyone wanting to continue AP26 may want to modify the code for today's improved hardware. Source code for CUDA, PS3, and CPU apps can be found here:

https://sites.google.com/site/mfl0pwww/files
____________

Message boards : AP26 - AP27 Search : Improving the AP26 application

[Return to PrimeGrid main page]
DNS Powered by DNSEXIT.COM
Copyright © 2005 - 2019 Rytis Slatkevičius (contact) and PrimeGrid community. Server load 2.91, 2.78, 2.37
Generated 20 Oct 2019 | 2:09:01 UTC