PrimeGrid

Message boards : Number crunching : Haswell performance testing

Author Message
mackerel (Project donor, Volunteer tester)
Joined: 2 Oct 08
Posts: 2449
ID: 29980
Credit: 424,561,292
RAC: 318,249
Message 83249 - Posted: 6 Feb 2015 | 22:26:25 UTC

As mentioned in my other thread, I bought and built a new system around a Haswell i5-4570S, because I felt my old Sandy Bridge i7-2600K system wasn't cutting it any more for prime finding.

I just spent the evening testing various permutations of projects and settings, not only memory speed.

Tested were:
a, Haswell, 1 core, ram 1600 CL8, FMA 64-bit, 3.2 GHz CPU
b, Haswell, 4 cores, ram 1600 CL8, FMA 64-bit, 3.2 GHz CPU
c, Haswell, 4 cores, ram 1600 CL8, AVX 32-bit, 3.2 GHz CPU
d, Haswell, 4 cores, ram 1333 CL9, FMA 64-bit, 3.2 GHz CPU
e, Sandy Bridge, 4 cores, ram 1333 CL9, AVX 64-bit, 3.5 GHz CPU

The Haswell CPU turbo modes were deliberately set so that the same maximum clock of 3.2 GHz was used regardless of loading.

Tests run were :
PPSE 2703*2^1320249+1
SGS 1790320587027*2^1290000-1
MEGA 261*2^3425713+1
TRP 409753*2^7440347-1
ESP 202705*2^7528454+1
PSP 222113*2^17486093+1
SoB 55459*2^27389974+1

There's nothing special about the chosen numbers to test, other than I was assigned them in the recent past. Therefore they should be generally representative of recent sizes in each project.

That's a lot of results, which I'm too tired to sort through right now. I did some rough comparisons, so what follows is a generalised summary. I'll post the numbers with some analysis some time on the weekend.

Case a vs. b: Haswell running 1 core should not be limited on ram bandwidth, so should show best case scenario. 4 cores are generally expected to give best throughput, even if they run slower. The small tests of PPSE and SGS don't show much difference in time. Once we get to MEGA and above, we're looking at a single core running 32 to 55% faster than 4 cores active. Oddly ESP and TRP are the worst here, not PSP or SoB.

Case b vs. d: Comparing Haswell running 4 cores with ram at 1600 and 1333 shows no significant difference for PPSE, SGS and MEGA. ESP is about 8% faster with the faster ram, and TRP, PSP and SoB are 14% faster. If the processes were purely ram bandwidth limited, you'd expect a 20% difference between the two settings (1600/1333 ≈ 1.20).
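As a rough illustration of that 20% figure, here is a small Python sketch. It assumes runtime would scale purely with memory clock in the bandwidth-limited case, and it uses the rounded percentages quoted above rather than the raw timings; the function name is just for illustration.

```python
# Illustrative arithmetic only; the observed gains are the rounded
# percentages quoted in the post, not the underlying timings.

def expected_gain(fast_mhz: float, slow_mhz: float) -> float:
    """Fractional speed-up if performance scaled purely with memory clock."""
    return fast_mhz / slow_mhz - 1.0

ideal = expected_gain(1600, 1333)  # about 0.20

observed = {"ESP": 0.08, "TRP/PSP/SoB": 0.14}
for name, gain in observed.items():
    # Share of the purely bandwidth-limited gain that actually showed up.
    print(f"{name}: {gain:.0%} observed, {gain / ideal:.0%} of the ideal {ideal:.0%}")
```

The closer the observed gain gets to the ideal, the more memory-bound the task appears to be.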

Question: why doesn't MEGA show any difference with ram speeds, yet still shows a performance difference with running 1 vs. 4 instances? Something to do with the way it sits in the different levels of CPU cache perhaps?

Case b vs. e: Haswell vs. Sandy Bridge. Even without normalising for the different CPU clocks, it is a clear win for the Haswell based system: over 50% more throughput for PPSE, 10% for MEGA, and in the ball park of 30% for the rest.

Miklos M.
Joined: 14 Apr 12
Posts: 365
ID: 138455
Credit: 11,784,946,938
RAC: 9,489,930
Message 83264 - Posted: 8 Feb 2015 | 13:14:30 UTC - in response to Message 83249.

I found the Haswell to be significantly faster than the AMD FX-8150 Eight-Core in all tasks. Compared to the Intel Core i7-4960X @ 3.60GHz, it is faster in only two projects for me: PPSE LLR and PPS LLR. But its great value in speed is that my two GTX 770s crunch much faster than with the AMD. They are not quite as fast as with the i7-4960X, now that I am not using hyperthreading on the latter.

Miklos M.
Message 83267 - Posted: 8 Feb 2015 | 17:57:50 UTC - in response to Message 83264.

But I saw that when running ESP, my 770 GPUs take much longer per unit.

gazzyk1ns
Joined: 2 Aug 11
Posts: 1321
ID: 107047
Credit: 357,084,984
RAC: 0
Message 83270 - Posted: 8 Feb 2015 | 21:50:32 UTC

The 4960X is an Ivy Bridge-E. Not that it isn't a great CPU or anything, but for longer LLR tests especially, it will lose some ground to a Haswell because of the lack of FMA.

mackerel (Project donor, Volunteer tester)
Message 83276 - Posted: 9 Feb 2015 | 13:18:12 UTC - in response to Message 83249.
Last modified: 9 Feb 2015 | 13:24:08 UTC

Ok, took me a while but here's a summary table of the results.



This one shows the values as described in my 1st post showing how much "faster" the first listed item is compared to the 2nd. Except when comparing the cores, 4 cores were active when doing these comparisons.

In short, once you get to the bigger units, the ram performance starts making a significant contribution. The ram bandwidth is not the sole limiting factor though, as you would expect to see 20% improvement from the faster ram. Presumably for smaller units the cache and/or lower ram usage is not a bottleneck.

A lot of Haswell's performance has been attributed to FMA instructions. For smaller units, for sure there is a big improvement compared to "only" running AVX, although some difference may also be attributed to the AVX case being run in 32-bit mode in this test case. It is also interesting that there seems to be a cutover point where FMA provides a benefit and where ram starts making a bigger difference.

Regardless of the above, in most cases Haswell is noticeably faster than my Sandy Bridge box, which runs at a higher CPU clock but a lower ram clock. One might speculate whether putting in faster ram would close the gap between them. I'm not sure it is valid to do so, but if I subtract the Haswell ram difference from the Haswell vs. Sandy Bridge results, we're still looking at around 20% faster performance for the longer units.




This table shows the relative throughput from running a different number of instances of a given project. For the smallest units e.g. PPSE and SGS, scaling is near perfect. 4 instances = 4x the throughput. We see a slowdown with MEGA where 4 instances = 3x throughput, and give or take some variation this remains with the bigger projects.

I think this was of some strategic importance during the YOTS challenge, as some found by not running all cores, they could speed up some units in order to get them within a deadline. So for minimal runtime cases, running a reduced number of cores will help.
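To put that trade-off in numbers, here is a minimal Python sketch. It assumes all instances slow down equally and uses the rounded "4 instances = 3x throughput" figure from above; the function name is hypothetical.

```python
def per_task_slowdown(instances: int, relative_throughput: float) -> float:
    """How many times longer each task takes compared with running alone.

    relative_throughput is total throughput relative to a single instance,
    e.g. 3.0 when four instances together do three times the work of one.
    """
    return instances / relative_throughput

small = per_task_slowdown(4, 4.0)  # near-perfect scaling (PPSE, SGS)
large = per_task_slowdown(4, 3.0)  # MEGA and bigger, rounded figure

print(f"small units: each task takes {small:.2f}x as long as when run alone")
print(f"large units: each task takes {large:.2f}x as long as when run alone")
```

So for a deadline-critical unit, running it alone finishes it sooner, at the cost of total throughput.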

Miklos M.
Message 83334 - Posted: 13 Feb 2015 | 16:00:11 UTC - in response to Message 83264.

So far the biggest difference I have found is when crunching SGS: the Haswell runs them about 20% faster.

Neo (Project donor, Volunteer tester)
Joined: 28 Oct 10
Posts: 710
ID: 71509
Credit: 91,178,992
RAC: 0
Message 83337 - Posted: 14 Feb 2015 | 1:43:17 UTC - in response to Message 83276.

Mackerel,

Have you disabled turbo boost on the CPU?

Are you running at a set GHz via BIOS?

My i5 will fluctuate from 3.6 GHz to 3.9 GHz depending on temps.

Neo

mackerel (Project donor, Volunteer tester)
Message 83345 - Posted: 14 Feb 2015 | 9:29:54 UTC - in response to Message 83337.

As described in the 1st post I fixed the running speed to 3.2 GHz during this test.

mackerel (Project donor, Volunteer tester)
Message 83672 - Posted: 2 Mar 2015 | 17:49:56 UTC
Last modified: 2 Mar 2015 | 17:51:20 UTC

Finally got around to doing some sieve testing. I'm running only "The Sierpinski Problem ESP/PSP/SoB Sieve (ESP-Sieve)" because it is a CPU only project, plus it is the last active project for me to get to all gold badges :)

I'm testing it on 4 of my boxes since they're the only ones with 64-bit OSes, and my older ones are still 32-bit.

Normalised for clock speed, their relative per-thread performance is as follows: (value is results per day, per thread, per GHz)
Q6600 (4C): 14.6
X6 1055T (6C): 14.7
i5-4570S (4C): 21.5
i7-2600 (4C, 4T): 17.0
i7-2600 (4C, 8T): 11.7

This gives an indication of their relative efficiency. The older generation CPUs are about the same, with Sandy Bridge and later Haswell gaining some improvements on the way.

Hyper-Threading is a way to gain throughput so the drop in per-thread performance is expected, but the question is how much throughput does it actually give?

Multiplied by the number of running threads, we get 68.1 and 93.8 for HT off and on respectively, or we get 38% more throughput with HT running.
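The same calculation as a short Python sketch, using the rounded per-thread values from the list above. Because of that rounding the totals come out as 68.0 and 93.6 rather than the 68.1 and 93.8 quoted, but the gain still rounds to the same percentage.

```python
# (results per day per thread per GHz, running threads), rounded table values
per_thread = {
    "HT off": (17.0, 4),
    "HT on": (11.7, 8),
}

# Total per-GHz throughput is the per-thread rate times the thread count.
totals = {mode: rate * threads for mode, (rate, threads) in per_thread.items()}
ht_gain = totals["HT on"] / totals["HT off"] - 1.0

for mode, total in totals.items():
    print(f"{mode}: {total:.1f}")
print(f"HT throughput gain: {ht_gain:.0%}")
```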

Side note: the best throughput performance boost I ever saw from HT was 50% for a project called Life Mapper. That was in the Pentium 4 era when HT first came around.

mackerel (Project donor, Volunteer tester)
Message 85894 - Posted: 4 Jun 2015 | 12:38:57 UTC

While reading about the desktop Broadwell parts I was interested to see Crystal Well, which for our purposes is a 128MB L4 cache inside the processor package. On further looking at Crystal Well, I was surprised to see it was already implemented in Haswell parts! Why hadn't I seen it previously? Because it was only used in high end mobile parts, not desktop parts. But with Broadwell, desktop chips get some love too.

Why might this be of interest? As shown previously in this thread, when running multiple tasks on a CPU, it isn't hard to run out of cache, which in turn results in slower memory access. Running tasks on 4 cores doesn't get you 4x the throughput of one core except in the smallest tasks like SGS and PPSE. For bigger tasks you might only get 3x the performance when running 4 tasks. Taken to an extreme, during the recent FPS challenge on PRPnet, I found even worse scaling: three instances actually gave more throughput than running 4 instances.

Faster memory could help mitigate these effects, but practically speaking without some serious overclocking there isn't much to be done.

Enter Crystal Well: 128MB is considerably bigger than the existing 6 to 8 MB L3 caches of upper mainstream desktop parts. If the working data could sit inside that, memory bandwidth would once again matter less for performance. It isn't clear just how much bandwidth or latency is associated with Crystal Well, but I've seen a figure given as ~50GB/s. This compares with a theoretical peak of 25.6GB/s for dual channel DDR3-1600. I've also not been able to clearly quantify the L3 cache bandwidth, but it seems to be in the ball park of just under 200GB/s for Intel processors of recent generations.
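For reference, the dual channel DDR3 figure can be worked out as below. This is a sketch of theoretical peak bandwidth only, assuming a 64-bit (8 byte) bus per channel and GB meaning 10^9 bytes; real-world throughput will be lower. The Crystal Well and L3 numbers are simply the approximate figures quoted above, not measurements.

```python
def ddr_peak_gb_s(mega_transfers: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes) for DDR memory."""
    return mega_transfers * 1e6 * bus_bytes * channels / 1e9

ddr3_1600 = ddr_peak_gb_s(1600)  # 25.6
ddr3_1333 = ddr_peak_gb_s(1333)  # about 21.3

# Approximate figures quoted in the post, for scale only.
crystal_well = 50.0
l3_cache = 200.0

print(f"DDR3-1600 dual channel: {ddr3_1600:.1f} GB/s")
print(f"DDR3-1333 dual channel: {ddr3_1333:.1f} GB/s")
print(f"Crystal Well vs DDR3-1600: about {crystal_well / ddr3_1600:.1f}x")
print(f"L3 vs DDR3-1600: about {l3_cache / ddr3_1600:.1f}x")
```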

So it will be hard to say how much improvement this might bring short of testing it in practice. Might not be me testing it as I'm not planning on another build this generation.

mackerel (Project donor, Volunteer tester)
Message 86825 - Posted: 26 Jul 2015 | 11:22:30 UTC

Today I downloaded the latest version of Prime95 (28.5) and tried its benchmark. I was curious whether the results might be useful as an indicator of performance for tasks here.

The first part of the benchmark runs a sequence of different FFT sizes and threads/cores combinations. FFT sizes range from 1024k to 8192k, with one thread per core. For processors with hyper-threading, it also appears to test 2 threads per core for 1 core and max cores. For indication, I grabbed a SoB unit to see what FFT size that is running, and the one I got was 2880k.

Benchmarks were run on my Haswell and Sandy Bridge systems under normal conditions. Specifically, turbo mode is enabled so there will be clock differences depending on the number of active cores.

To simplify the results, I won't go into detail. There are some mild effects depending on FFT size, and also some difference between the best and average results. These are minor for the purposes of the relative comparisons I'm doing.

Let's get an easy one out of the way, hyper-threading. We all know it is not beneficial for this work, and the results here confirm that. The performance dropped some 10% as an overall average, which is actually more than I thought. Perhaps this is due to the FFT sizes being bigger than those I was looking at previously, where the penalty is less.

Core scaling: For Haswell, relative to 1 core, the throughput for 2, 3, and 4 cores is approximately 1.9x, 2.5x and 2.8x respectively. Similarly for SB we get 1.9x, 2.6x and 3.0x. It is interesting that the performance drop as the number of cores goes up is smaller on SB than on Haswell. We could speculate on the reasons. Perhaps it is due to the bigger L3 cache of my SB chip (the L2 is the same size on both), or is the higher performance of the Haswell cores pushing the limits outside the CPU cores?
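Expressed as per-core efficiency, the scaling figures look like this. A small Python sketch using only the approximate multipliers quoted above; "efficiency" here just means throughput divided by core count, which is my own framing rather than anything Prime95 reports.

```python
# Approximate throughput relative to 1 core, as quoted in the post.
scaling = {
    "Haswell": {2: 1.9, 3: 2.5, 4: 2.8},
    "Sandy Bridge": {2: 1.9, 3: 2.6, 4: 3.0},
}

def efficiency(relative_throughput: float, cores: int) -> float:
    """Fraction of ideal linear scaling achieved with this many cores."""
    return relative_throughput / cores

for cpu, points in scaling.items():
    for cores, throughput in points.items():
        print(f"{cpu}, {cores} cores: {efficiency(throughput, cores):.0%} of ideal")
```

With 4 cores active this works out to roughly 70% of ideal on Haswell and 75% on Sandy Bridge.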

Haswell vs. SB: my Haswell box is about 32% faster than SB for a single thread, and averaging perhaps 25% with 4 cores running. This is not clock normalised, and the Haswell box is a lower nominal clock than my SB box.

A question for those more familiar with Prime95 than I am: does it use the multiple threads on the same work, or is each thread doing different work? There could be some saving in cache and memory bandwidth usage if they were working on the same unit, but I'm not good enough at mathematics or programming to look at that further.

Michael Goetz (Project donor, Volunteer moderator, Project administrator)
Joined: 21 Jan 10
Posts: 13434
ID: 53948
Credit: 232,451,827
RAC: 163,944
Message 86826 - Posted: 26 Jul 2015 | 12:00:01 UTC

The reason you saw a bigger drop off in multi-core performance with Haswell as compared to SB is that the Haswell CPUs are significantly faster than SB because of the FMA3/AVX2 instructions that were added in Haswell.

The increase in memory speed between the architectures isn't enough to keep up with the speed increase in the CPUs, so as you increase the number of cores, the memory increasingly can't keep up with the CPUs. That's the cause of the reduced efficiency of adding more cores. Since Haswell is so much faster than SB, the effect of waiting for memory is more pronounced.

If you ran the tests with faster memory, either by using faster ram or on a quad-channel system, your results would have been different.
____________
My lucky number is 75898^524288+1

mackerel (Project donor, Volunteer tester)
Message 86933 - Posted: 4 Aug 2015 | 22:01:36 UTC

http://www.anandtech.com/show/9482/intel-broadwell-pt2-overclocking-ipc/3

Continuing on the topic of cache vs. memory, this page of Anandtech's Broadwell testing includes an interesting graph showing latency vs. data set size for recent generations of Intel processors. They have normalised the CPU clock to 3 GHz and the RAM to 1866 C9, so the differences are due to the architecture. The improvement from Broadwell's effective L4 cache (where available) is visible, picking up where the L3 cache runs out and giving another performance step before becoming RAM limited. It isn't as fast as L3 cache, but it isn't as slow as main memory either.

Is there a simple way to calculate the data set size (in MB) given an FFT size? I just want to get a better feel for how multiple instances of different LLR tests may fit in the various caches.

axn
Volunteer developer
Message 86936 - Posted: 5 Aug 2015 | 3:07:26 UTC - in response to Message 86933.

Is there a simple way to calculate the data set size (in MB) given an FFT size?

FFT size * 8 bytes (+ overheads).

mackerel
Message 86937 - Posted: 5 Aug 2015 | 10:42:06 UTC - in response to Message 86936.

Thanks. So to take my previous example of a SoB unit, that was 2880k FFT.

2880k * 8 = 23040k = 22.5MB?

Is that right?

Is the 8 bytes (64 bit) from the size of each piece of data in the used instructions? I've probably worded that badly but I hope the intent is there.

axn
Message 86941 - Posted: 5 Aug 2015 | 13:22:15 UTC - in response to Message 86937.

You got it. 8 bytes is the size of a "double" (an IEEE 754 64-bit double-precision floating-point number).
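
As a quick sanity check of the rule of thumb above, here is a minimal Python sketch. It assumes "k" means 1024 FFT elements and ignores the unspecified overheads, so the result is a lower bound. The function names and the cache sizes passed in are illustrative only and are not taken from LLR.

```python
BYTES_PER_DOUBLE = 8  # IEEE 754 64-bit double precision


def fft_data_size_mib(fft_size_k):
    """Approximate FFT data size in MiB for an FFT length given in K.

    Assumes 1K = 1024 elements and one double per element; overheads ignored.
    """
    return fft_size_k * 1024 * BYTES_PER_DOUBLE / (1024 * 1024)


def instances_that_fit(fft_size_k, cache_mib):
    """How many such data sets fit entirely in a cache of cache_mib MiB."""
    return int(cache_mib // fft_data_size_mib(fft_size_k))


print(fft_data_size_mib(2880))        # 22.5, matching the SoB example
print(instances_that_fit(2880, 128))  # 5 for a hypothetical 128 MiB cache
print(instances_that_fit(2880, 8))    # 0 for a hypothetical 8 MiB cache
```

Because overheads are left out, a data set that only just fits by this estimate may not fit in practice.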

Copyright © 2005 - 2020 Rytis Slatkevičius and PrimeGrid community.
Generated 30 Nov 2020 | 19:56:37 UTC