PrimeGrid
drummers-lowrise
1) Message boards : 321 Prime Search : Multi-threading: Max # of threads for each task Setting? (Message 147570)
Posted 42 days ago by roadrunner_gs (Project donor)
So far (seconds per WU, Throughput, Speedup compared to 1 WU@1Thread).



The Gold 52xx is severely hampered by its cache (or the lack thereof).
2) Message boards : 321 Prime Search : Multi-threading: Max # of threads for each task Setting? (Message 147444)
Posted 48 days ago by roadrunner_gs (Project donor)
Ah no problem. 3.8.24 seems to be from July 2020.
I have a USB stick for my test runs with version 3.8.21 on it (see my post below or above, depending on sort order).
Since I have a fair amount of test data for "697*2^530150+1" on different CPUs, I would not change that for the time being (also because I have done days' worth of runs with that version by now).

Initial quick run with 3.8.24 on the Gold 6254:

1WU@18 threads decidedly slower @ 0.692 ms per bit (16408817)
1WU@4 threads decidedly faster @ 0.930 ms per bit
3) Message boards : 321 Prime Search : Multi-threading: Max # of threads for each task Setting? (Message 147438)
Posted 48 days ago by roadrunner_gs (Project donor)
Does the llr make use of the AVX512?
I only see "using FMA3" or the like in the output.
4) Message boards : 321 Prime Search : Multi-threading: Max # of threads for each task Setting? (Message 147395)
Posted 51 days ago by roadrunner_gs (Project donor)
Yeah, the Xeon Gold 5217 (8 Cores @ 3.0 GHz Turbo; 11 MB L3) is currently working on 8 WUs in parallel with a "Time per it: 16.570 ms", that is a lot slower than the E5-2690 with 9.317 ms and a lot slower than the Xeon Gold 6254.
It is running with 3.0 GHz, but only drawing 68 W as per rapl, whereas it was drawing 85 W (max TDP) when doing one WU with 8 threads.
Projected finishing time is around 3 days, yielding a throughput of 2.54/day, throughput was 7.22/day with 1 WU@8threads and 4.52/day with 1 WU@4threads.
But Fujitsu was cheap, fitting only one 32 GB memory module instead of populating all six channels.
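
The projected finishing time and throughput above follow from the time per iteration. A minimal sketch of that arithmetic, assuming the run tests 3*2^16408818+1 and therefore needs the 16408819 iterations that llr reports for it elsewhere in this thread (the function name is made up for illustration):

```python
SECONDS_PER_DAY = 86_400

def projected_throughput(ms_per_bit: float, bits: int, parallel_wus: int) -> tuple[float, float]:
    """Return (days per WU, WUs per day) for `parallel_wus` tasks running side by side."""
    seconds_per_wu = ms_per_bit / 1000.0 * bits
    days_per_wu = seconds_per_wu / SECONDS_PER_DAY
    return days_per_wu, parallel_wus / days_per_wu

# Assumption: 16_408_819 iterations, as in the llr output for 3*2^16408818+1.
days, per_day = projected_throughput(16.570, 16_408_819, parallel_wus=8)
print(f"{days:.2f} days per WU, {per_day:.2f} WUs/day")
```

With 16.570 ms per iteration and 8 WUs in parallel this comes out at roughly 3 days per WU and about 2.54 WUs per day.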

Will post back when finished
5) Message boards : 321 Prime Search : Multi-threading: Max # of threads for each task Setting? (Message 147385)
Posted 51 days ago by roadrunner_gs (Project donor)
Hi, are you talking L2 or L3 CPU cache? I ask because my CPU has only 2MB of L2 but 45MB L3. Should I be calculating for L2 or L3?

Generally speaking, it's L3, but there is an "it depends" based on cache inclusiveness. AMD and consumer Intel chips store a copy of L2 in L3, so only L3 matters (and in Ryzen 3k/5k, it's L3 per CCX).

Zen L3 cache is a victim cache then the size is L2 + L3... no?


For long-running "heavy" tasks and a non-inclusive cache, then almost yes.
6) Message boards : 321 Prime Search : Multi-threading: Max # of threads for each task Setting? (Message 147381)
Posted 51 days ago by roadrunner_gs (Project donor)
I managed to get hold of the following additional CPUs:
Xeon Gold 6128 (6 cores @ 3.6 GHz Turbo; 19.25 MB L3)
Xeon Gold 5217 (8 cores @ 3.0 GHz Turbo; 11 MB L3)
Xeon E7-4880 v2 (15 cores @ ? Turbo; 37.5 MB L3)

let's see...
7) Message boards : 321 Prime Search : Multi-threading: Max # of threads for each task Setting? (Message 147375)
Posted 52 days ago by roadrunner_gs (Project donor)
So here goes:
Xeon Gold 6254, 1 WU@1 Thread:
3*2^16408818+1 is prime! (4939547 decimal digits) Time : 62610.163 sec.

Xeon Gold 6254, 4 WU@1 Thread:
3*2^16408818+1 is prime! (4939547 decimal digits) Time : 63368.355 sec.
3*2^16408818+1 is prime! (4939547 decimal digits) Time : 63656.892 sec.
3*2^16408818+1 is prime! (4939547 decimal digits) Time : 63385.569 sec.
3*2^16408818+1 is prime! (4939547 decimal digits) Time : 63500.971 sec.

Xeon Gold 6254, 1 WU@4 Threads:
3*2^16408818+1 is prime! (4939547 decimal digits) Time : 18658.265 sec.

Xeon E5-2690, 4 WU@1 Thread:
3*2^16408818+1 is prime! (4939547 decimal digits) Time : 103037.590 sec.
3*2^16408818+1 is prime! (4939547 decimal digits) Time : 103037.493 sec.
3*2^16408818+1 is prime! (4939547 decimal digits) Time : 103038.159 sec.
3*2^16408818+1 is prime! (4939547 decimal digits) Time : 103036.720 sec.

Xeon E5-2690, 1 WU@2Thread:
3*2^16408818+1 is prime! (4939547 decimal digits) Time : 26465.674 sec.

Throughput/day (4WU@1T vs 1WU@4T):
Xeon Gold 6254 => 5.24 vs 4.63
Xeon E5-2690 => 3.35 vs 3.25
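
The throughput figures can be recomputed from the timings posted above. This is only a sketch under one assumption: a batch of parallel WUs counts as finished when its slowest member finishes. Recomputing this way gives about 5.43 for the Gold 6254 with 4 WU@1T, a bit above the 5.24 posted, so the posted figure may have been derived differently.

```python
SECONDS_PER_DAY = 86_400

def throughput_per_day(run_times_sec):
    """WUs completed per day when all runs in the list execute concurrently.

    The batch is only done when the slowest run finishes, so the
    longest time is used as the batch duration (an assumption).
    """
    return len(run_times_sec) * SECONDS_PER_DAY / max(run_times_sec)

# Timings in seconds, copied from the llr results above.
runs = {
    "Xeon Gold 6254, 4 WU@1T": [63368.355, 63656.892, 63385.569, 63500.971],
    "Xeon Gold 6254, 1 WU@4T": [18658.265],
    "Xeon E5-2690, 4 WU@1T": [103037.590, 103037.493, 103038.159, 103036.720],
    "Xeon E5-2690, 1 WU multi-threaded": [26465.674],
}

for label, times in runs.items():
    print(f"{label}: {throughput_per_day(times):.2f} WUs/day")
```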

I will go for 8WUs@1T vs 1WU@8T now.
8) Message boards : 321 Prime Search : Multi-threading: Max # of threads for each task Setting? (Message 147331)
Posted 53 days ago by roadrunner_gs (Project donor)
When those WUs are finished i will give the last found 321-prime ( 3*2^1832496+1) a shot

Do you mean 3*2^16408818+1?


Yes, my bad, wrong buffer ([ctrl]-[v] vs middle mouse button).
The code has the right one, obviously
9) Message boards : 321 Prime Search : Multi-threading: Max # of threads for each task Setting? (Message 147325)
Posted 53 days ago by roadrunner_gs (Project donor)
I don't know.
I have not done extensive testing so far, since the runtime of the WUs forbids it, but I currently have the following 4 WUs: two running with 2 threads per WU (the upper two) and two running with 1 thread per WU (the lower two).



This is a Xeon E5-2690 with 8 cores, 256 kB L2 cache per core and 20 MB shared L3 cache for the CPU, so yes this would be above the

One can see that the 2-threaded WUs are slower than the single-threaded ones.

When those WUs are finished i will give the last found 321-prime ( 3*2^1832496+1) a shot (since I know the outcome, although it takes longer still) and run it 8 times in parallel versus once 8-threaded, twice 4-threaded, and four times 2-threaded.
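
The test plan above covers every way of splitting 8 cores evenly between parallel WUs and threads per WU. A small sketch that enumerates those splits (the helper name is made up for illustration):

```python
def core_splits(cores: int):
    """Yield (parallel_wus, threads_per_wu) pairs that use every core exactly once."""
    for wus in range(1, cores + 1):
        if cores % wus == 0:
            yield wus, cores // wus

for wus, threads in core_splits(8):
    print(f"{wus} WU(s) @ {threads} thread(s) each")
```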

I also got a Xeon Gold 6254 with 1 MB L2 cache per core and 24.75 MB L3 cache per CPU and will run it on that one, too. A single-threaded baseline is fired up for now and seems to be running at 3.4 GHz.
With 4 processes in parallel it should need 30 MB of cache. Since the L3 cache is a non-inclusive victim cache (same as in my E5-2690 above), it should be able to use 28.75 MB of cache, and as per the Specification Update it should still run at 3.4 GHz.
I bind it to the second node to avoid cache-copies between nodes as well as foreign-node-ram-access.
Hyperthreading is not deactivated, but you have to use what you have got.

$ numactl -N 1 ./llr64 -d -q"3*2^16408818+1" -a1 & disown
$ Starting Proth prime test of 3*2^16408818+1
Using all-complex FMA3 FFT length 960K, Pass1=384, Pass2=2560, a = 5
3*2^16408818+1, bit: 30000 / 16408819 [0.18%]. Time per bit: 3.825 ms.
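
The cache estimate for the Gold 6254 can be reproduced with a few lines. This is a sketch resting on two assumptions that are not stated explicitly in the post: the working set of one llr process is about the FFT length times 8 bytes (one double per element), and with a non-inclusive L3 the usable cache is L3 plus the L2 of each active core.

```python
MIB = 1024 * 1024

def fft_working_set_mib(fft_length_k: int, bytes_per_element: int = 8) -> float:
    """Approximate FFT data size in MiB (assumes one 8-byte double per element)."""
    return fft_length_k * 1024 * bytes_per_element / MIB

def effective_cache_mib(l3_mib: float, l2_per_core_mib: float,
                        active_cores: int, inclusive: bool) -> float:
    """Usable cache: L3 only if inclusive, otherwise L3 plus the active cores' L2."""
    if inclusive:
        return l3_mib
    return l3_mib + l2_per_core_mib * active_cores

processes = 4
needed = processes * fft_working_set_mib(960)  # 4 x 7.5 = 30.0
available = effective_cache_mib(24.75, 1.0, processes, inclusive=False)  # 28.75

print(f"needed {needed} MiB, available {available} MiB")
```

Under these assumptions four parallel processes overshoot the available cache by a little over 1 MB.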
10) Message boards : AP26 - AP27 Search : New feature with latest AP27 gpu application (Message 126514)
Posted 749 days ago by roadrunner_gs (Project donor)
Hello
The AP27 version 2.03 is no longer compatible with Red Hat Enterprise Linux 6.
My rig http://www.primegrid.com/results.php?hostid=222147&offset=0&show_names=0&state=7&appid=11 is failing with
../../projects/www.primegrid.com/primegrid_ap27_2.03_x86_64-pc-linux-gnu__OCL_cuda_AP27: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ../../projects/www.primegrid.com/primegrid_ap27_2.03_x86_64-pc-linux-gnu__OCL_cuda_AP27)
../../projects/www.primegrid.com/primegrid_ap27_2.03_x86_64-pc-linux-gnu__OCL_cuda_AP27: /lib64/libc.so.6: version `GLIBC_2.17' not found (required by ../../projects/www.primegrid.com/primegrid_ap27_2.03_x86_64-pc-linux-gnu__OCL_cuda_AP27)


Only found out now.

Since Red Hat Enterprise Linux 6 (or CentOS/Scientific Linux) is still in the wild and supported until July 2024, would it be possible to get this running with glibc 2.12?
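
For anyone triaging similar failures, here is a minimal sketch that pulls the required glibc versions out of loader messages like the ones above and compares them with what a host provides. The helper names are made up for illustration, and the error text is shortened.

```python
import re

def required_glibc_versions(loader_output: str):
    """Return the sorted, de-duplicated GLIBC_x.y versions named in the text."""
    found = {
        tuple(int(part) for part in match.split("."))
        for match in re.findall(r"GLIBC_(\d+(?:\.\d+)+)", loader_output)
    }
    return sorted(found)

def is_satisfied(available, required):
    """True if the available glibc version covers every required version."""
    return all(version <= available for version in required)

# Shortened copy of the two loader messages quoted above.
error_text = (
    "/lib64/libc.so.6: version `GLIBC_2.14' not found\n"
    "/lib64/libc.so.6: version `GLIBC_2.17' not found\n"
)

required = required_glibc_versions(error_text)
print(required)
print(is_satisfied((2, 12), required))  # glibc 2.12, as on RHEL 6
```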


Copyright © 2005 - 2021 Rytis Slatkevičius (contact) and PrimeGrid community.
Generated 26 Feb 2021 | 11:58:42 UTC