PrimeGrid

Message boards : Number crunching : Max FFT size for each LLR project?

mackerel (Project donor, Volunteer tester)
Joined: 2 Oct 08
Posts: 2324
ID: 29980
Credit: 405,165,578
RAC: 167,005
Message 130938 - Posted: 8 Jul 2019 | 12:05:34 UTC

I think there was a post in the past that had FFT sizes for each project, but I can't find it again. Anyone know where it was? It is possible it would be out of date also, so is there some way I can find current values short of looking at random units?

Why? Ryzen 3000 (Zen 2) was launched yesterday, and comes with at least 32MB of L3 cache. I expect to get my sample on Tuesday and bench it. Based on what I currently understand of it, it has potential to be the best choice for LLR use in terms of a balance of power consumption, compute performance, and pricing. FFT size will allow a more educated guess into the optimal running configuration, which I can then test with benchmarks.

Michael Goetz (Project donor, Honorary cruncher)
Joined: 21 Jan 10
Posts: 13189
ID: 53948
Credit: 216,634,659
RAC: 40,228
Message 130940 - Posted: 8 Jul 2019 | 12:54:33 UTC - in response to Message 130938.

Current values:

+-------+----------------------------------------+------+------+
| appid | user_friendly_name                     | min  | max  |
+-------+----------------------------------------+------+------+
|     2 | Sophie Germain (LLR)                   |  128 |  128 |
|     3 | Woodall (LLR)                          | 1440 | 1920 |
|     4 | Cullen (LLR)                           | 1536 | 1920 |
|     7 | 321 (LLR)                              |  800 |  800 |
|     8 | Prime Sierpinski Problem (LLR)         | 1920 | 2048 |
|    10 | PPS (LLR)                              |  192 |  192 |
|    13 | Seventeen or Bust                      | 2560 | 2880 |
|    15 | The Riesel Problem (LLR)               |  720 | 1008 |
|    18 | PPSE (LLR)                             |  120 |  120 |
|    19 | Sierpinski/Riesel Base 5 Problem (LLR) |  560 |  720 |
|    20 | Extended Sierpinski Problem            | 1280 | 1280 |
|    21 | PPS-Mega (LLR)                         |  200 |  256 |
|    30 | Generalized Cullen/Woodall (LLR)       | 1440 | 1792 |
+-------+----------------------------------------+------+------+


The SoB line includes post-DC n=31M candidates.
____________
My lucky number is 75898^524288+1

mackerel
Message 130941 - Posted: 8 Jul 2019 | 13:14:47 UTC - in response to Message 130940.

Thanks Michael, looks like they will comfortably fit in the L3 cache so it could give some interesting performance numbers, all going well. Will write more once I've tested, hopefully tomorrow.
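
As a rough check of the "fits in L3" reasoning, the sketch below converts the maximum FFT sizes from the table into approximate memory footprints. It assumes, and nothing in this thread confirms it, that the sizes are in units of K (1024) elements stored as 8-byte doubles. It also ignores any additional working data, so real footprints will be larger. The function names are made up for illustration, and the 16 MB per-CCX figure comes from the Anandtech paragraph quoted further down the thread.

```python
# Maximum FFT size per project, copied from the "max" column of the table.
MAX_FFT_K = {
    "SGS": 128, "Woodall": 1920, "Cullen": 1920, "321": 800, "PSP": 2048,
    "PPS": 192, "SoB": 2880, "TRP": 1008, "PPSE": 120, "SR5": 720,
    "ESP": 1280, "PPS-Mega": 256, "GCW": 1792,
}

BYTES_PER_ELEMENT = 8  # assumption: one double-precision value per FFT element
MIB = 1024 * 1024


def fft_footprint_mib(fft_size_k: int) -> float:
    """Approximate size of the FFT data alone in MiB (assumes sizes are in K elements)."""
    return fft_size_k * 1024 * BYTES_PER_ELEMENT / MIB


def fits_in_cache(fft_size_k: int, cache_mib: float) -> bool:
    """True if the FFT data alone is strictly smaller than the given cache size."""
    return fft_footprint_mib(fft_size_k) < cache_mib


if __name__ == "__main__":
    for name, size_k in sorted(MAX_FFT_K.items(), key=lambda kv: kv[1]):
        print(f"{name:9s} {size_k:5d}K {fft_footprint_mib(size_k):6.2f} MiB  "
              f"under 16 MiB: {fits_in_cache(size_k, 16)}  "
              f"under 32 MiB: {fits_in_cache(size_k, 32)}")
```

Under these assumptions the largest SoB size (2880K) comes to 22.5 MiB, which is more than one 16 MB L3 slice but less than 32 MB, while the 2048K PSP size lands exactly on 16 MiB.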

Jay (Project donor)
Joined: 27 Feb 10
Posts: 97
ID: 56067
Credit: 50,781,982
RAC: 15,828
Message 130945 - Posted: 8 Jul 2019 | 16:39:30 UTC - in response to Message 130941.

I'm looking forward to reading what you learn. I'm very interested in AVX-512 performance on these. I've been waiting for a couple years to upgrade to something with good AVX-512 and a good amount of cache.

Grebuloner (Project donor, Volunteer tester)
Joined: 2 Nov 09
Posts: 280
ID: 49572
Credit: 1,521,549,860
RAC: 584,507
Message 130948 - Posted: 8 Jul 2019 | 16:55:59 UTC - in response to Message 130941.

I have been wondering myself how well a 64 MB cache would cope with large WUs, now that AMD's AVX2 implementation is at parity with Intel's.

I was reading the Anandtech review on the matter and came across this paragraph (emphasis mine):

What immediately catches the eye when switching between the two results is the new 16MB L3 cache capacity which doubles upon the 8MB of Matisse. We have to remind ourselves that even though the whole chip contains 64MB of L3 cache, this is not a unified cache and a single CPU core will only see its own CCX’s L3 cache before going into main memory, which is in contrast to Intel’s L3 cache where all the cores have access to the full amount.


The 3700X has 2 CCXes of 4 cores each, and the 3900X is supposedly 4 CCXes with 3 active cores each. I think that the cache exclusivity (and multithreaded LLR's scaling problem over multiple sockets) will mean that maximum efficiency will occur by choosing thread x task counts by CCX, while PSP, SoB, and possibly Cul/Woo will still need to work with main memory. All of this of course relying on Windows 10 1903 getting the process assignments right.
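
The "thread x task counts by CCX" idea can be sketched as follows: split the cores into one group per CCX and plan one multithreaded task per group. The helper names are hypothetical, and the sketch assumes that core IDs are numbered contiguously within each CCX with SMT siblings ignored. That numbering is not guaranteed and would need checking on a real system.

```python
from typing import Dict, List


def ccx_groups(total_cores: int, cores_per_ccx: int) -> List[List[int]]:
    """Split core IDs 0..total_cores-1 into contiguous groups, one per CCX."""
    if total_cores <= 0 or cores_per_ccx <= 0 or total_cores % cores_per_ccx != 0:
        raise ValueError("total_cores must be a positive multiple of cores_per_ccx")
    return [list(range(start, start + cores_per_ccx))
            for start in range(0, total_cores, cores_per_ccx)]


def plan_tasks(total_cores: int, cores_per_ccx: int) -> List[Dict]:
    """Plan one task per CCX, using as many threads as that CCX has cores."""
    return [{"task": index, "threads": len(cores), "cores": cores}
            for index, cores in enumerate(ccx_groups(total_cores, cores_per_ccx))]


if __name__ == "__main__":
    # Layouts as described in the post: 3700X = 2 CCXes x 4 cores,
    # 3900X = 4 CCXes x 3 active cores.
    for label, total, per_ccx in [("3700X", 8, 4), ("3900X", 12, 3)]:
        print(label)
        for task in plan_tasks(total, per_ccx):
            print("  ", task)
```

Each planned task's thread count would go into the LLR thread setting, and its core list would be used for pinning. How the pinning is done depends on the operating system; on Linux, for example, Python exposes os.sched_setaffinity for this.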

I am very much looking forward to your results! And I think we all are hoping that buying an AMD system no longer means making PG compromises...
____________
Eating more cheese on Thursdays.

Grebuloner
Message 130949 - Posted: 8 Jul 2019 | 17:01:23 UTC - in response to Message 130945.

AVX-512 is still exclusive to HEDT/Server Intel processors (will supposedly hit mainstream platforms whenever Ice Lake arrives). I can't find anything even rumor-based that it will come to AMD in Zen 3.

mackerel
Message 130951 - Posted: 8 Jul 2019 | 18:13:07 UTC - in response to Message 130948.

The more I learn, the less great this CPU series seems.

The bandwidth between the cores and IO is asymmetric, with only half speed writes.
All data leaving a CCX goes through IO die, even if the destination is another CCX on the same die.
That the L3 cache is not unified means running 8 cores on one task might not be so great. Treating it as 2x4 cores is probably better.

Still, my CPU has been dispatched so I hope to bench tomorrow evening.

Jay
Message 130955 - Posted: 8 Jul 2019 | 19:53:56 UTC - in response to Message 130949.

Dagnabit! I nearly pulled the trigger on a Xeon Phi 7290 a month ago because of AVX-512 and cache. But I decided to wait to see what AMD was bringing.

mackerel
Message 130991 - Posted: 9 Jul 2019 | 23:12:07 UTC
Last modified: 9 Jul 2019 | 23:31:07 UTC

OK, I won't start a new thread for now. Too much data and not enough conclusion. I don't have an image host to put the chart here, but it is at the following link along with a description of the testing.
https://linustechtips.com/main/topic/1080453-ryzen-3600-vs-8086k-for-prime-number-finding/

The question here is probably how fast is Zen 2? Well, the provisional result is they have matched Intel in FPU performance, and with optimisations to how you run tasks, the bigger cache means it can maintain higher performance at bigger FFT sizes.

Note the following is a prediction of throughput based on the Prime95 benchmark data. Do not complain to me if you see differently in reality, as I have also seen discrepancies between theory and reality. That will be for later testing to confirm or otherwise.

I'm comparing my 8086k and the 3600 with a better cooler than it comes with. Clocks did seem to fluctuate with temperature on the 3600.

For small tasks (SGS, PPSE, PPS) running one per core the 8086k takes a small lead, presumably from its higher clock, but the 3600 is close behind.

Mega is in an awkward spot for the 8086k, as it isn't fully efficient one per core, nor running -t3. Maybe -t2 would be better but that was not a tested condition. This size is no problem for the 3600 and it should be faster.

Into the mid sizes (TRP, SR5, 321, ESP) it is better to run the 8086k with 1 task of 6 threads as all the cache is needed to feed it. The 3600 differs here, and two tasks of 3 cores each was more optimal. This is probably due to the 32MB cache actually being split into two separate 16MB regions, so one task per region gives best performance. Running a single task with 6 cores was significantly below that.

The bigger tasks are where the differences really show. GCW, Woo, Cul, PSP and SoB are beyond the 8086k's cache, and performance drops towards what the RAM speed allows. I was only using 3000, which is wholly inadequate to feed it; it would take about double that. The 3600, still at two tasks of 3 cores each, could do these (excluding SoB) with high performance, up to 50% faster than the 8086k. For leading-edge SoB work the relative performance of the 3600 drops, presumably because it has now exceeded the 16MB chunk size of L3 cache and is partially hitting RAM again. Still, running a single task on 6 threads here gives significantly greater performance than the 8086k.
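
To put the RAM remark into numbers, here is a small sketch of the usual theoretical peak bandwidth calculation. It assumes that "3000" means DDR4-3000 and that the memory runs in dual channel. The post states neither, and real sustained bandwidth is lower than these peaks.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: int, channels: int = 2,
                        bytes_per_transfer: int = 8) -> float:
    """Theoretical peak in GB/s (10^9 bytes/s) for 64-bit wide memory channels."""
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000


if __name__ == "__main__":
    for speed in (3000, 6000):  # 6000 stands in for "about double that"
        print(f"{speed} MT/s, dual channel: {peak_bandwidth_gb_s(speed):.1f} GB/s")
```

With those assumptions the tested setup peaks at 48 GB/s, and "about double" would mean roughly 96 GB/s.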

To recap:
Small units: run 1 task per core on either CPU
Medium units: run 1 task 6 cores on Intel, 2 tasks 3 cores on 3600.
Large units: 1 task 6 cores on both systems, but 3600 will be much faster.
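
The recap can be written down as a small lookup table. This is only a transcription of the recap for the two 6-core CPUs tested, not a general rule. The names SIZE_CLASS, RECAP and suggest are made up for illustration, and the values are predictions from Prime95 benchmark data rather than measured LLR throughput.

```python
from typing import Dict, Tuple

# Size classes as grouped in the post. PPS-Mega is left out because the post
# describes it as an awkward in-between case on the 8086k.
SIZE_CLASS: Dict[str, str] = {
    "SGS": "small", "PPSE": "small", "PPS": "small",
    "TRP": "medium", "SR5": "medium", "321": "medium", "ESP": "medium",
    "GCW": "large", "Woo": "large", "Cul": "large", "PSP": "large", "SoB": "large",
}

# (cpu, size class) -> (simultaneous tasks, threads per task), from the recap.
RECAP: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("8086k", "small"): (6, 1),
    ("8086k", "medium"): (1, 6),
    ("8086k", "large"): (1, 6),
    ("3600", "small"): (6, 1),
    ("3600", "medium"): (2, 3),
    ("3600", "large"): (1, 6),
}


def suggest(cpu: str, project: str) -> Tuple[int, int]:
    """Return (simultaneous tasks, threads per task) according to the recap."""
    return RECAP[(cpu, SIZE_CLASS[project])]


if __name__ == "__main__":
    for cpu in ("8086k", "3600"):
        for project in ("PPS", "ESP", "SoB"):
            tasks, threads = suggest(cpu, project)
            print(f"{cpu:6s} {project:4s}: {tasks} task(s) x {threads} thread(s)")
```

The longer description above suggests that on the 3600 two tasks of 3 cores each also held up for the large projects other than SoB, so the "large" row for the 3600 is the least certain entry.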

I didn't run 3 tasks of 2 cores each on either system. It should be viable on 8086k and may fill the small performance hole observed at Mega size units. It probably isn't a good idea on the 3600 as you can't divide two cache chunks into 3 equal units, but the bigger cache allows you to jump from 1 to 3 cores per task anyway.

Similar principles could be applied to 8, 12, 16 core models.

I would add that the CPU ran hotter than I'd expect, given it wasn't drawing that much power in the greater scheme of things and I was using a cooler better than the stock one. It was running at around 80C. There is speculation that although the power might not be high, the die is relatively small, so power density is the problem. I'm not sure there is a good solution for that. Maybe the 8 core per die models will have more of a cooling problem.

Note I'm running PPSE on the 3600, 1 per core, and will leave it going overnight. If you look up the system, ignore the 1st 12 units done as I mistakenly left in -t2 from previous usage and was also doing things with affinity. It is looking slightly faster than a 6700k I'm using as reference, even though the 6700k is at 4 GHz and 3600 averaged around 50MHz under.

Michael Goetz
Message 130996 - Posted: 10 Jul 2019 | 0:23:28 UTC
Last modified: 10 Jul 2019 | 0:24:04 UTC

Would it be fair to say...

1) It looks like AMD finally built a CPU with full AVX performance.

2) The huge cache is a big advantage for those FFTs where the FFT fits in the Ryzen cache but not in the Intel cache.

3) Ryzen is now competitive with Intel unless you go with a (dual unit) AVX-512 CPU.

Is that correct?

GregCinAZ (Project donor)
Joined: 12 Nov 18
Posts: 34
ID: 1077873
Credit: 284,389,832
RAC: 1,303,122
Message 130998 - Posted: 10 Jul 2019 | 2:45:37 UTC - in response to Message 130991.

Wow mackerel, thank you so much for that. Very encouraging as my 3600X is arriving Thursday.

zombie67 [MM] (Project donor)
Joined: 30 Nov 06
Posts: 206
ID: 4065
Credit: 938,871,524
RAC: 1,710,947
Message 131001 - Posted: 10 Jul 2019 | 7:36:34 UTC - in response to Message 130998.

Yes! Thanks!
____________
Reno, NV

mackerel
Message 131002 - Posted: 10 Jul 2019 | 8:19:44 UTC - in response to Message 130996.

To Michael's questions:
1, the AVX performance is now comparable to Intel consumer CPUs. I still need further testing to get a more precise figure on that.
2, yes, the cache helps it work faster in more situations. I've seen similar with the desktop Broadwell CPUs, which had 128 MB L4 cache. Basically it didn't matter what RAM you had attached to it. I was running it single channel for a time when I had a shortage and performance remained high.
3, same as #1, but a two unit AVX-512 CPU has more potential, as well as more heat resulting from that. Even at stock I don't consider my 7800X to be safe, as it hits 100C with a high-end air cooler.

A later test will be performance per watt, and I suspect the 3600 will be ahead in that too.

rjs5
Joined: 20 Feb 11
Posts: 34
ID: 87238
Credit: 520,611,635
RAC: 747,158
Message 131017 - Posted: 10 Jul 2019 | 17:21:04 UTC - in response to Message 130940.

What are the units of the FFT sizes in the table?
Do those sizes represent all the data needed for the FFT plus the additional software prefetched data?


I think there was a post in the past that had FFT sizes for each project, but I can't find it again. Anyone know where it was? It is possible it would be out of date also, so is there some way I can find current values short of looking at random units?

Why? Ryzen 3000 (Zen 2) was launched yesterday, and comes with at least 32MB of L3 cache. I expect to get my sample on Tuesday and bench it. Based on what I currently understand of it, it has potential to be the best choice for LLR use in terms of a balance of power consumption, compute performance, and pricing. FFT size will allow a more educated guess into the optimal running configuration, which I can then test with benchmarks.


Current values:

+-------+----------------------------------------+------+------+
| appid | user_friendly_name                     | min  | max  |
+-------+----------------------------------------+------+------+
|     2 | Sophie Germain (LLR)                   |  128 |  128 |
|     3 | Woodall (LLR)                          | 1440 | 1920 |
|     4 | Cullen (LLR)                           | 1536 | 1920 |
|     7 | 321 (LLR)                              |  800 |  800 |
|     8 | Prime Sierpinski Problem (LLR)         | 1920 | 2048 |
|    10 | PPS (LLR)                              |  192 |  192 |
|    13 | Seventeen or Bust                      | 2560 | 2880 |
|    15 | The Riesel Problem (LLR)               |  720 | 1008 |
|    18 | PPSE (LLR)                             |  120 |  120 |
|    19 | Sierpinski/Riesel Base 5 Problem (LLR) |  560 |  720 |
|    20 | Extended Sierpinski Problem            | 1280 | 1280 |
|    21 | PPS-Mega (LLR)                         |  200 |  256 |
|    30 | Generalized Cullen/Woodall (LLR)       | 1440 | 1792 |
+-------+----------------------------------------+------+------+


The SoB line includes post-DC n=31M candidates.

Profile Michael GoetzProject donor
Honorary cruncher
Joined: 21 Jan 10
Posts: 13189
ID: 53948
Credit: 216,634,659
RAC: 40,228
The "Shut up already!" badge:  This loud mouth has mansplained on the forums over 10 thousand times!  Sheesh!!!Discovered the World's First GFN-19 prime!!!Discovered 1 mega primeFound 1 prime in the 2018 Tour de PrimesFound 1 prime in the 2019 Tour de PrimesFound 1 prime in the 2020 Tour de Primes321 LLR Ruby: Earned 2,000,000 credits (2,822,730)Cullen LLR Ruby: Earned 2,000,000 credits (2,005,249)ESP LLR Turquoise: Earned 5,000,000 credits (5,009,577)Generalized Cullen/Woodall LLR Ruby: Earned 2,000,000 credits (2,145,754)PPS LLR Jade: Earned 10,000,000 credits (12,841,928)PSP LLR Turquoise: Earned 5,000,000 credits (5,197,957)SoB LLR Sapphire: Earned 20,000,000 credits (34,291,181)SR5 LLR Jade: Earned 10,000,000 credits (10,002,028)SGS LLR Ruby: Earned 2,000,000 credits (2,014,138)TRP LLR Ruby: Earned 2,000,000 credits (2,737,347)Woodall LLR Ruby: Earned 2,000,000 credits (2,195,123)321 Sieve Turquoise: Earned 5,000,000 credits (8,208,975)Cullen/Woodall Sieve (suspended) Ruby: Earned 2,000,000 credits (4,170,256)Generalized Cullen/Woodall Sieve (suspended) Turquoise: Earned 5,000,000 credits (5,059,304)PPS Sieve Sapphire: Earned 20,000,000 credits (20,114,159)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Amethyst: Earned 1,000,000 credits (1,035,522)TRP Sieve (suspended) Ruby: Earned 2,000,000 credits (2,051,121)AP 26/27 Jade: Earned 10,000,000 credits (10,114,260)GFN Emerald: Earned 50,000,000 credits (72,213,603)PSA Jade: Earned 10,000,000 credits (12,404,447)
Message 131018 - Posted: 10 Jul 2019 | 18:17:27 UTC - in response to Message 131017.

What are the units of the FFT sizes in the table?
Do those sizes represent all the data needed for the FFT plus the additional software prefetched data?


Those are the FFT sizes (in K, as in 2K == 2048) that LLR chooses for its calculation. That's the number of elements in each array.

Each element is a complex (i.e., real + imaginary) number consisting of 2 64-bit double precision floating point numbers. That's 8 bytes for each number. I believe you need space to store 3 copies of that so you can do a multiply operation such as C = A * B, so the total memory usage should be 24 times the FFT size.

So, for example, for the largest SoB FFT of 2880K, the memory usage for the FFT calculations is 24 * 2880 * 1024 or 70778880. That's 67.5 MB.

For the smallest PPSE FFT of 120K, that's 24 * 120 * 1024 or 2.8 MB.

There's other memory used as well, but it's insignificant. What's important is that the FFT storage fits in cache, since cache is a LOT faster than main memory. If the entire FFT fits in cache the task can run a lot faster.
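The arithmetic above can be reproduced with a short Python sketch. The function names and the `bytes_per_element` parameter are made up for illustration; the default of 24 is simply the multiplier used in this post, not a value read out of LLR or gwnum, and it can be swapped for any other estimate.

```python
def fft_footprint_bytes(fft_size_k, bytes_per_element=24):
    """Estimated FFT working set in bytes.

    fft_size_k is the FFT size in K (1K = 1024 elements), as listed in the table.
    bytes_per_element is an assumed multiplier (24 in this post).
    """
    return fft_size_k * 1024 * bytes_per_element


def to_mb(n_bytes):
    """Convert bytes to MB, using 1 MB = 1024 * 1024 bytes as in the post."""
    return n_bytes / (1024 * 1024)


# The two examples worked out above: largest SoB FFT and smallest PPSE FFT.
for name, size_k in [("SoB max", 2880), ("PPSE min", 120)]:
    size = fft_footprint_bytes(size_k)
    print(f"{name}: {size_k}K -> {size} bytes ({to_mb(size):.1f} MB)")
```

Passing a different `bytes_per_element` makes it easy to see how much the estimate depends on that one assumption.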
____________
My lucky number is 75898524288+1

rjs5
Joined: 20 Feb 11
Posts: 34
ID: 87238
Credit: 520,611,635
RAC: 747,158
Message 131019 - Posted: 10 Jul 2019 | 18:34:37 UTC - in response to Message 131018.

What are the units of the FFT sizes in the table?
Do those sizes represent all the data needed for the FFT plus the additional software prefetched data?


Those are the FFT sizes (in K, as in 2K == 2048) that LLR chooses for its calculation. That's the number of elements in each array.

Each element is a complex (i.e., real + imaginary) number consisting of 2 64-bit double precision floating point numbers. That's 8 bytes for each number. I believe you need space to store 3 copies of that so you can do a multiply operation such as C = A * B, so the total memory usage should be 24 times the FFT size.

So, for example, for the largest SoB FFT of 2880K, the memory usage for the FFT calculations is 24 * 2880 * 1024 or 70778880. That's 67.5 MB.

For the smallest PPSE FFT of 120K, that's 24 * 120 * 1024 or 2.8 MB.

There's other memory used as well, but it's insignificant. What's important is that the FFT storage fits in cache, since cache is a LOT faster than main memory. If the entire FFT fits in cache the task can run a lot faster.


So the software prefetching is only bringing the 3 copies for the "C = A * B" into the caches?
I thought that the code might also be prefetching the next 3 copies to prepare for the next iteration.

thanks





Profile VatoProject donor
Volunteer tester
Joined: 2 Feb 08
Posts: 735
ID: 18447
Credit: 168,198,375
RAC: 252,249
Found 1 prime in the 2020 Tour de Primes321 LLR Ruby: Earned 2,000,000 credits (2,412,961)Cullen LLR Ruby: Earned 2,000,000 credits (2,198,091)ESP LLR Ruby: Earned 2,000,000 credits (2,573,645)Generalized Cullen/Woodall LLR Ruby: Earned 2,000,000 credits (2,014,176)PPS LLR Turquoise: Earned 5,000,000 credits (6,749,140)PSP LLR Ruby: Earned 2,000,000 credits (3,100,794)SoB LLR Ruby: Earned 2,000,000 credits (2,023,559)SR5 LLR Ruby: Earned 2,000,000 credits (2,979,625)SGS LLR Ruby: Earned 2,000,000 credits (2,515,111)TPS LLR (retired) Silver: Earned 100,000 credits (103,523)TRP LLR Ruby: Earned 2,000,000 credits (2,473,221)Woodall LLR Ruby: Earned 2,000,000 credits (2,068,051)321 Sieve Jade: Earned 10,000,000 credits (19,500,888)Cullen/Woodall Sieve (suspended) Ruby: Earned 2,000,000 credits (4,119,699)Generalized Cullen/Woodall Sieve (suspended) Jade: Earned 10,000,000 credits (10,278,995)PPS Sieve Sapphire: Earned 20,000,000 credits (26,285,305)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Ruby: Earned 2,000,000 credits (4,080,177)TRP Sieve (suspended) Turquoise: Earned 5,000,000 credits (5,221,054)AP 26/27 Jade: Earned 10,000,000 credits (14,206,910)GFN Jade: Earned 10,000,000 credits (19,081,239)PSA Sapphire: Earned 20,000,000 credits (34,211,039)
Message 131022 - Posted: 10 Jul 2019 | 20:39:47 UTC - in response to Message 131019.

the gwnum library that LLR uses will modify in place where necessary.
so it's a relatively static and localised data structure (i think).
especially if it lives fully in your cache!
____________

mackerelProject donor
Volunteer tester
Joined: 2 Oct 08
Posts: 2324
ID: 29980
Credit: 405,165,578
RAC: 167,005
Message 131030 - Posted: 10 Jul 2019 | 22:17:08 UTC - in response to Message 131018.
Last modified: 10 Jul 2019 | 22:36:09 UTC

Each element is a complex (i.e., real + imaginary) number consisting of 2 64-bit double precision floating point numbers. That's 8 bytes for each number. I believe you need space to store 3 copies of that so you can do a multiply operation such as C = A * B, so the total memory usage should be 24 times the FFT size.

I found 8x to work well for estimating performance. On the complex part, I had long wondered what that does as it is a benchmark option in P95, but I never used it. I took a data set recently but haven't plotted it against non-complex results yet. I think I'll do that now...

Edit: The complex benchmark run looks almost the same as the non-complex one. Values are slightly lower, but the falls occur at the same FFT sizes so this doesn't change any conclusions.

crashtech
Joined: 19 Jan 17
Posts: 1
ID: 484268
Credit: 250,139,000
RAC: 48,074
321 LLR Turquoise: Earned 5,000,000 credits (6,290,088)Cullen LLR Ruby: Earned 2,000,000 credits (2,564,989)ESP LLR Amethyst: Earned 1,000,000 credits (1,359,437)Generalized Cullen/Woodall LLR Ruby: Earned 2,000,000 credits (2,806,036)PPS LLR Turquoise: Earned 5,000,000 credits (9,282,346)PSP LLR Ruby: Earned 2,000,000 credits (4,586,516)SoB LLR Jade: Earned 10,000,000 credits (11,613,869)SR5 LLR Ruby: Earned 2,000,000 credits (3,232,624)SGS LLR Silver: Earned 100,000 credits (261,314)TRP LLR Ruby: Earned 2,000,000 credits (4,062,534)Woodall LLR Amethyst: Earned 1,000,000 credits (1,770,588)321 Sieve Ruby: Earned 2,000,000 credits (2,706,670)Generalized Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (158,385)PPS Sieve Double Bronze: Earned 100,000,000 credits (138,285,162)AP 26/27 Jade: Earned 10,000,000 credits (12,945,686)GFN Sapphire: Earned 20,000,000 credits (48,212,756)
Message 131093 - Posted: 14 Jul 2019 | 21:05:35 UTC

Hi,

There seems to be a bit of disagreement in regards to calculating the approximate L3 footprint of the various PG WUs. The difference between multiplying by 8 or 24 is pretty large! Is there a reason why 8x seems to work in practical terms when 24x appears to be the theoretical answer?

mackerelProject donor
Volunteer tester
Joined: 2 Oct 08
Posts: 2324
ID: 29980
Credit: 405,165,578
RAC: 167,005
Message 131099 - Posted: 15 Jul 2019 | 7:55:24 UTC - in response to Message 131093.

I've only ever seen and used 8x up to this thread and I don't see any evidence to use otherwise.

rjs5
Joined: 20 Feb 11
Posts: 34
ID: 87238
Credit: 520,611,635
RAC: 747,158
Message 131162 - Posted: 16 Jul 2019 | 21:59:24 UTC - in response to Message 131093.

Hi,

There seems to be a bit of disagreement in regards to calculating the approximate L3 footprint of the various PG WUs. The difference between multiplying by 8 or 24 is pretty large! Is there a reason why 8x seems to work in practical terms when 24x appears to be the theoretical answer?


There is a lot of empirical data showing that running the CPU at 50% (no hyper-threading) yields the "best" or "near best" performance. I don't know how long this has been the assumption, and I am not sure how often cache size is actually the bottleneck.

On the i9-9980XE I am running, it appears more like the Memory Fill Buffer limit is the problem. Fill Buffer unavailable event counts spike at 50% CPU load and the number of cache line evictions stays flat. That implies that there is room in the cache (no evictions), but memory Read/Write traffic is too high.

The information available about when a software prefetch consumes a Fill Buffer and/or generates a CPU stall is confusing to me, and I think the behavior changes between CPUs. I think Intel was even talking about changing the CPU behavior on Skylake so that a software prefetch is converted into a NOP if no Fill Buffers are available.

It looks more like gwnum is over-tuned for a single WU, which causes multiple WUs to choke the bus, stalling the other WUs.

I wish it were easier to build the Linux64 gwnum.a from source; then I could do some testing/analysis.



mackerelProject donor
Volunteer tester
Joined: 2 Oct 08
Posts: 2324
ID: 29980
Credit: 405,165,578
RAC: 167,005
Message 131178 - Posted: 17 Jul 2019 | 12:00:15 UTC - in response to Message 131162.
Last modified: 17 Jul 2019 | 12:00:44 UTC

The basic operating state for gwnum is for a single core on a single task. Running multiple tasks, and/or multi-threaded tasks takes a little more consideration.

I think George had said in the past the multi-thread code isn't the best, and breaks up the work into smaller bits before re-assembling them. So the practical result of that is, smaller tasks don't scale well running multiple threads.

With normalised data from the Prime95 benchmark it is possible to see how different scenarios behave. In general, if the total footprint of all running tasks is less than the L3 size, you get good performance. This could be simplified as FFT size * 8 * number of tasks running < L3 cache size. If you exceed the L3 cache size, then RAM bandwidth enters into the equation. RAM bandwidth shortage is the biggest limitation for bigger tasks. Dual channel is wholly inadequate for fast CPUs with more than 4 cores (most Intel Core CPUs, Zen 2 Ryzen). Running a single multi-threaded task helps in this scenario.
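As a rough illustration of that rule of thumb, here is a minimal Python sketch that estimates how many simultaneous tasks stay within L3. The function names are hypothetical, the FFT sizes are the "max" values from the table earlier in the thread, and the 32 MB L3 is only an example figure. Treating L3 as one shared pool is an assumption; on CPUs where L3 is split between groups of cores, the share available to each task may be smaller.

```python
def footprint_bytes(fft_size_k, multiplier=8):
    """Estimated footprint of one task: FFT size in K (1K = 1024) times the multiplier."""
    return fft_size_k * 1024 * multiplier


def max_tasks_in_l3(fft_size_k, l3_bytes, multiplier=8):
    """Largest number of tasks whose combined footprint does not exceed l3_bytes.

    Note: this uses <=, while the rule of thumb is written with a strict <.
    """
    return l3_bytes // footprint_bytes(fft_size_k, multiplier)


# Assumption: 32 MB of L3, treated as a single pool shared by all tasks.
L3_BYTES = 32 * 1024 * 1024

# Largest FFT size (in K) per project, taken from the table in this thread.
for name, size_k in [("PPSE", 120), ("PPS", 192), ("321", 800), ("ESP", 1280), ("SoB", 2880)]:
    tasks = max_tasks_in_l3(size_k, L3_BYTES)
    print(f"{name}: {size_k}K FFT -> up to {tasks} task(s) within L3")
```

This is only an estimate to narrow down which configurations are worth benchmarking; it does not replace an actual benchmark run.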

HT is a complicated matter. I've only occasionally seen hints that, in some limited scenarios, it can give an uplift in performance compared to not having/using it. In general it doesn't seem to give any significant boost but still increases power consumption. There are also scenarios where it can be used to lessen losses, e.g. the Windows scheduler has sucked in the past; I don't know if that has changed since then. Running multiple single-thread tasks, one per core, without affinity could lower throughput by ~10%. Setting affinity resolves that, turning off HT resolves that, or running more threads than cores resolves that (at higher power usage).

rjs5
Joined: 20 Feb 11
Posts: 34
ID: 87238
Credit: 520,611,635
RAC: 747,158
Message 131196 - Posted: 17 Jul 2019 | 20:34:58 UTC - in response to Message 131178.

The basic operating state for gwnum is for a single core on a single task. Running multiple tasks, and/or multi-threaded tasks takes a little more consideration.

I think George had said in the past the multi-thread code isn't the best, and breaks up the work into smaller bits before re-assembling them. So the practical result of that is, smaller tasks don't scale well running multiple threads.

With normalised data from the Prime95 benchmark it is possible to see how different scenarios behave. In general, if the total footprint of all running tasks is less than the L3 size, you get good performance. This could be simplified as FFT size * 8 * number of tasks running < L3 cache size. If you exceed the L3 cache size, then RAM bandwidth enters into the equation. RAM bandwidth shortage is the biggest limitation for bigger tasks. Dual channel is wholly inadequate for fast CPUs with more than 4 cores (most Intel Core CPUs, Zen 2 Ryzen). Running a single multi-threaded task helps in this scenario.

HT is a complicated matter. I've only occasionally seen hints that, in some limited scenarios, it can give an uplift in performance compared to not having/using it. In general it doesn't seem to give any significant boost but still increases power consumption. There are also scenarios where it can be used to lessen losses, e.g. the Windows scheduler has sucked in the past; I don't know if that has changed since then. Running multiple single-thread tasks, one per core, without affinity could lower throughput by ~10%. Setting affinity resolves that, turning off HT resolves that, or running more threads than cores resolves that (at higher power usage).


I think we are all 100% in agreement.

I understand the task that George tackled and solved. I understand completely what he was saying about limitations and why he was saying it. I also understand that running an empirical test is the easiest way to get a "reasonable best" performance.

I am just making one additional point that I think prefetching is aggravating the "ram bandwidth" problem ... not helping for threads or multiple WU. I think the "ram bandwidth" problem is triggered and aggravated earlier than "L3 cache full" by gwnum prefetching of data into L1 that is already in the L2 and L3 caches and not in DRAM. I am getting a huge spike in FILL BUFFER NOT AVAILABLE events and very few last level cache (LLC) evictions.

If the data is already in the cache hierarchy, the L1 prefetch instruction consumes a line fill buffer which is the read path from DRAM. All T0 and T1 prefetches would cause this problem. These instructions will insert dead spots in the data pipe from memory and starve the caches every time they prefetch cached data. Other threads or WU really needing data from DRAM will be stalled while the unneeded prefetches are completed.

I have been unable to use Linux performance tools to look at the run-time performance of sllr64 because of the way it is built. All the Linux tools print garbage for the sllr64 code and give up on run-time disassembly.

I could do more analysis if I could build sllr64 (especially on Linux), but I haven't had much success.




mackerelProject donor
Volunteer tester
Joined: 2 Oct 08
Posts: 2324
ID: 29980
Credit: 405,165,578
RAC: 167,005
Message 131231 - Posted: 18 Jul 2019 | 19:34:02 UTC - in response to Message 131196.

Been thinking about it a bit. If you can't test with llr, is it possible to try with prime95? Might be worth a post on mersenneforum.

Copyright © 2005 - 2020 Rytis Slatkevičius (contact) and PrimeGrid community. Server load 1.32, 1.20, 1.20
Generated 31 Mar 2020 | 0:25:05 UTC