Author |
Message |
darkclown Volunteer tester
Joined: 3 Oct 06 Posts: 333 ID: 3605 Credit: 1,551,694,411 RAC: 550,408
|
I know this has been discussed in the Number Crunching thread some, but I think I'm seeing it specifically w/ Cullen WUs. Does Cullen suffer from the cache miss issue with Intel Core i7 with HT enabled? WUs on my i5-2400 system are taking much less time than on my i7-3820 w/ HT.
i5 WU: http://www.primegrid.com/workunit.php?wuid=336493532 (112k sec)
i7 WU: http://www.primegrid.com/workunit.php?wuid=336464178 (158k sec)
____________
My lucky #: 60133106^131072+1 (GFN 17-mega) |
|
|
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3245 ID: 50683 Credit: 152,646,050 RAC: 24,511
|
HT is just enabled or HT is fully used?
____________
92*10^1585996-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! |
|
|
darkclown Volunteer tester
Joined: 3 Oct 06 Posts: 333 ID: 3605 Credit: 1,551,694,411 RAC: 550,408
|
Fully used. I'm running on all 8 cores.
An aside, how would I configure PG to not use *all* of the HT cores, but partial? Set to 3/4 CPUs (and thus 6/8 cores?)
____________
My lucky #: 60133106^131072+1 (GFN 17-mega) |
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14036 ID: 53948 Credit: 475,889,971 RAC: 246,026
|
I know this has been discussed in the Number Crunching thread some, but I think I'm seeing it specifically w/ Cullen WUs. Does Cullen suffer from the cache miss issue with Intel Core i7 with HT enabled? WUs on my i5-2400 system are taking much less time than on my i7-3820 w/ HT.
i5 WU: http://www.primegrid.com/workunit.php?wuid=336493532 (112k sec)
i7 WU: http://www.primegrid.com/workunit.php?wuid=336464178 (158k sec)
Probably, but you should test it yourself. See how long tasks take when you're running 4, and see how long they take when you're running 8. If the 8-way tests take noticeably longer than double the 4-way tests, then it's likely the cause is increased cache misses.
Regardless of the cause, if 8-way processing takes more than twice as long as 4-way processing, it's better to do 4-way.
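The rule of thumb above can be written down as a quick check. This is only a minimal sketch: the function names are made up for illustration, and the sample figures are the two task times quoted earlier in the thread, which come from two different CPUs, so they are not a clean HT on/off comparison.

```python
def ht_pays_off(time_4way: float, time_8way: float) -> bool:
    """True if 8 tasks at once finish more work per second than 4 at once.

    Doubling the task count only helps if each task takes less than
    twice as long as it did before.
    """
    return time_8way < 2.0 * time_4way


def throughput_ratio(tasks_a: int, time_a: float,
                     tasks_b: int, time_b: float) -> float:
    """Throughput of setup B relative to setup A (1.0 means equal)."""
    return (tasks_b / time_b) / (tasks_a / time_a)


if __name__ == "__main__":
    # Task times in seconds as quoted in the thread (i5 4-way, i7 8-way).
    i5_time, i7_time = 112_000, 158_000
    print("8-way worthwhile:", ht_pays_off(i5_time, i7_time))
    print("relative throughput:",
          round(throughput_ratio(4, i5_time, 8, i7_time), 3))
```

To use it properly, plug in 4-way and 8-way times measured on the same machine and the same kind of task.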
____________
My lucky number is 75898524288+1 |
|
|
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3245 ID: 50683 Credit: 152,646,050 RAC: 24,511
|
i5 WU: http://www.primegrid.com/workunit.php?wuid=336493532 (112k sec)
i7 WU: http://www.primegrid.com/workunit.php?wuid=336464178 (158k sec)
This is a great result. You are one of only a few people who get increased performance with HT = ON. Many others see a drop in performance...
In your case:
112000 / 4 = 28000 per WU
158000 / 8 = 19750 per WU
So ideally it would be 14000 per WU, but your result is near perfect. As long as you compute below 56000 per WU with HT enabled, you have a performance increase...
____________
92*10^1585996-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! |
|
|
|
you should probably find it's faster per wu to use only 6/8.
You are one of only a few people who get increased performance with HT = ON.
have you tried for yourself to see what option gives the best times per wu? |
|
|
darkclown Volunteer tester
Joined: 3 Oct 06 Posts: 333 ID: 3605 Credit: 1,551,694,411 RAC: 550,408
|
you should probably find it's faster per wu to use only 6/8.
You are one of only a few people who get increased performance with HT = ON.
have you tried for yourself to see what option gives the best times per wu?
I may try later, to see if 3/4 CPUs (6/8 cores) gives me better time, but it's not a big deal. The only thing I really note is that when running 8/8 along w/ a PPS Sieve task, one of the Cullen (or any CPU-based) WUs suffers accordingly.
|
|
|
|
you should probably find it's faster per wu to use only 6/8.
You are one of only a few people who get increased performance with HT = ON.
have you tried for yourself to see what option gives the best times per wu?
I may try later, to see if 3/4 CPUs (6/8 cores) gives me better time, but it's not a big deal. The only thing I really note is that when running 8/8 along w/ a PPS Sieve task, one of the Cullen (or any CPU-based) WUs suffers accordingly.
that was really aimed at crunchi as he keeps saying that ht is slow for everyone but I don't think he's ever tried.
8/8 (or 12/12) is definitely slower per wu than 6/8 (9/12) on all my pcs - i7 970, 2600 and 3770. |
|
|
|
darkclown's i5 and i7 are not comparable simply by measuring HT "on" vs. "off". The i5-2400 is "mid range", while the i7-3820 is "extreme", with 0.5 GHz faster clocks and 4 MB more L3 cache.
I ran another test. This is an extension of a test I made earlier this month comparing the cache-miss effect on Sandy vs. Ivy. I posted about this on the AtP message board. At that time I was comparing the two processors, and not worrying about hyperthreading ("off" for all tests). Bottom line was that Ivy seemed more severely impacted under heavy load.
This time, I'm comparing HT "off" vs. HT "on" on the Ivy Bridge 3770K. I used prpnet instances to run the exact same Woodall number (11574717*2^11574717-1, which I had earlier run in BOINC and had validated) 1 through 8 times in parallel (just to a few percent, long enough to let the timing stabilize). Nothing else was going on on the box, not even GPU. These are the "time per iteration" values reported by LLR (3.8.9), in ms, so lower is faster. Ubuntu 12.04.
HT off:
1: 6.239
2: 6.657
3: 7.789
4: 9.940
HT on:
1: 6.229
2: 6.634
3: 7.649
4: 9.712
5: ~12.9
6: ~15.0
7: ~18.2
8: 20.624
The times for 5-7 are rougher averages over the various LLR instances, as you never know at any given point which one will be forced to "share", and which will have a dedicated core. The scheduler keeps them pretty well balanced when there are 4 or fewer, or 8.
Of course remember this is one machine, running one number. Your mileage may vary.
--Gary |
|
|
|
If I read that correctly it shows that 6 cores is best for you too. |
|
|
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3245 ID: 50683 Credit: 152,646,050 RAC: 24,511
|
If I read that correctly it shows that 6 cores is best for you too.
HT off:
1: 6.239
2: 6.657
3: 7.789
4: 9.940
HT on:
1: 6.229
2: 6.634
3: 7.649
4: 9.712
5: ~12.9/5 = 2.58
6: ~15.0/6 = 2.5
7: ~18.2/7 = 2.6
8: 20.624/8= 2.578
9.94/4 = 2.485 per WU when HT is off
20.624/8 = 2.578 per WU when HT is on.
So what I see is that the result with HT is always slower. Even the sweet spot of 6 WUs (with HT) averages 2.5 vs. 2.485 (without HT).
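For anyone who wants to redo this division for every row, here is a minimal sketch. The timings are copied from Gary's post (the values for 5 to 7 tasks are his rough averages), and the helper names are hypothetical, chosen only for this example. "Effective" time means ms per iteration divided by the number of tasks running at once, so lower is better.

```python
# ms per iteration reported by LLR, keyed by number of parallel tasks.
HT_OFF = {1: 6.239, 2: 6.657, 3: 7.789, 4: 9.940}
HT_ON = {1: 6.229, 2: 6.634, 3: 7.649, 4: 9.712,
         5: 12.9, 6: 15.0, 7: 18.2, 8: 20.624}  # 5-7 are approximate


def effective_ms_per_iteration(timings: dict[int, float]) -> dict[int, float]:
    """Divide each per-task time by the task count to get overall cost."""
    return {n: ms / n for n, ms in timings.items()}


def best_task_count(timings: dict[int, float]) -> tuple[int, float]:
    """Return (task count, effective ms) for the most efficient setting."""
    eff = effective_ms_per_iteration(timings)
    n = min(eff, key=eff.get)
    return n, eff[n]


if __name__ == "__main__":
    for label, table in (("HT off", HT_OFF), ("HT on", HT_ON)):
        for n, eff in effective_ms_per_iteration(table).items():
            print(f"{label} {n} tasks: {eff:.3f} ms effective")
        print(label, "best:", best_task_count(table))
```

Note that the table also has an HT on row for 4 tasks, which is worth including in the comparison. As Gary said, this is one machine running one number.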
____________
92*10^1585996-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! |
|
|
Dave
Joined: 13 Feb 12 Posts: 3253 ID: 130544 Credit: 2,422,141,945 RAC: 3,920,887
|
Apart from sieving maybe ;).
+ as the credit/time is now the same for all LLR projects, the above tests will be the same for every other subproj I would assume. |
|
|
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3245 ID: 50683 Credit: 152,646,050 RAC: 24,511
|
In sieving, my i7-2700K got about an 18% increase when using HT.
____________
92*10^1585996-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! |
|
|
|
If I read that correctly it shows that 6 cores is best for you too.
9.94/4 = 2.485 per WU when HT is off
20.624/8 = 2.578 per WU when HT is on.
So what I see is that the result with HT is always slower. Even the sweet spot of 6 WUs (with HT) averages 2.5 vs. 2.485 (without HT).
yes, but
6: ~15.0/6 = 2.5
and then you can have 2 cores for GPU work + whatever else you're doing on your pc (which doesn't use anywhere near as much cpu as if they were doing llr) while only being slightly slower overall/wu than using just 4 cores for CPU only work.
|
|
|
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3245 ID: 50683 Credit: 152,646,050 RAC: 24,511
|
In that case, you are right :)
Since I do just LLR-ing I put HT off :)
____________
92*10^1585996-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! |
|
|