PrimeGrid
11) Message boards : Generalized Fermat Prime Search : Multi-threaded GFN-19 3.4.0.2 (Message 155579)
Posted 77 days ago by pascaltec
Tasks returned in the last 24 hours

Setup: PrimeGrid Preferences / Job Control and Multi-threading / Multi-threading: Max # of threads for each Task
Result: Throughput (Tasks/day) = 86,400 (s/day) / average Elapsed time (s) x Concurrent Tasks (Maximum # of simultaneous PrimeGrid Tasks)

Date Time   | Sub-project | Host   | SMT | Tasks | Thr/Task | Setup  | Done | Firsts | First% | Send/recv (s) avg/min/max | Elapsed (s) avg/min/max  | Tasks/day | CPU time (s) avg/min/max
23/05 07:00 | Genefer 19  | valtin | gpu | 1     | 0        | -      | 1    | 0      | 0.00   | 26,918 / 26,918 / 26,918  | 15,153 / 15,153 / 15,153 | 5.7       | 267 / 267 / 267
23/05 07:00 | Genefer 19  | valtin | off | 4     | 3        | 4Tx3t  | 4    | 4      | 100.00 | 13,365 / 13,312 / 13,420  | 13,356 / 13,303 / 13,411 | 25.9      | 39,976 / 39,857 / 40,115
23/05 07:00 | Genefer 19  | valtin | off | 2     | 6        | 2Tx6t  | 8    | 4      | 50.00  | 9,530 / 6,868 / 14,399    | 7,072 / 6,814 / 7,380    | 24.4      | 42,198 / 40,803 / 43,927
24/05 07:00 | Genefer 19  | valtin | gpu | 1     | 0        | -      | 2    | 2      | 100.00 | 42,410 / 35,418 / 49,402  | 15,139 / 15,098 / 15,180 | 5.7       | 253 / 249 / 257
24/05 07:00 | Genefer 19  | valtin | off | 4     | 3        | 4Tx3t  | 8    | 8      | 100.00 | 13,565 / 13,312 / 14,049  | 13,555 / 13,303 / 14,038 | 25.5      | 40,520 / 39,857 / 41,912
24/05 07:00 | Genefer 19  | valtin | off | 2     | 6        | 2Tx6t  | 2    | 2      | 100.00 | 6,799 / 6,793 / 6,805     | 6,784 / 6,779 / 6,789    | 25.5      | 40,630 / 40,616 / 40,645
24/05 07:00 | Genefer 17  | valtin | off | 12    | 1        | 12Tx1t | 12   | 2      | 16.67  | 2,281 / 2,250 / 2,321     | 2,262 / 2,242 / 2,280    | 458       | 2,256 / 2,237 / 2,270
24/05 07:00 | Genefer 17  | valtin | off | 6     | 2        | 6Tx2t  | 6    | 2      | 33.33  | 1,351 / 1,330 / 1,386     | 1,293 / 1,276 / 1,316    | 401       | 2,578 / 2,544 / 2,625
24/05 07:00 | Genefer 16  | valtin | off | 12    | 1        | 12Tx1t | 12   | 12     | 100.00 | 559 / 536 / 714           | 524 / 520 / 526          | 1978      | 520 / 519 / 521
24/05 07:00 | Genefer 16  | valtin | off | 6     | 2        | 6Tx2t  | 6    | 5      | 83.33  | 377 / 358 / 391           | 311 / 310 / 312          | 1667      | 618 / 614 / 620
24/05 07:00 | Genefer 16  | valtin | off | 4     | 3        | 4Tx3t  | 6    | 2      | 33.33  | 322 / 239 / 462           | 220 / 218 / 223          | 1571      | 653 / 645 / 663

Running a GPU task decreases the CPU throughput:
without GPU: 2Tx6t: 25.5 Tasks/day
with GPU: 2Tx6t: 24.4 Tasks/day
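The throughput formula can be checked numerically against the table. A quick sketch (the function name is mine; the elapsed times and task counts come from the table above):

```python
# Throughput (Tasks/day) = 86,400 (s/day) / average Elapsed time (s) x Concurrent Tasks
def throughput(avg_elapsed_s: float, concurrent_tasks: int) -> float:
    return 86_400 / avg_elapsed_s * concurrent_tasks

# 4 tasks x 3 threads, average elapsed 13,356 s -> 25.9 Tasks/day
print(round(throughput(13_356, 4), 1))
# 2 tasks x 6 threads (with GPU), average elapsed 7,072 s -> 24.4 Tasks/day
print(round(throughput(7_072, 2), 1))
```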

The GPU task and/or any other task takes CPU time from any core from time to time, because
cores are NOT reserved exclusively by BOINC or genefer, even if the genefer threads are pinned!
Even if only one thread is slightly slowed down, the lost time unbalances the parallel execution.
Scalability is perfect only in the absence of any perturbation (including background and user tasks).
Use another device to browse the PrimeGrid Challenge Stats, listen to music, compute stats, ...
12) Message boards : Problems and Help : Is there a table or chart that shows the optimal CPU cache for each project? (Message 155574)
Posted 77 days ago by pascaltec
Clearly, the question has been analyzed in depth! And such optimizations are uncommon ... in my field.
Out of pure curiosity, I took a quick look at the multi-threaded part of the genefer source and at the 2015 article.
https://www.researchgate.net/publication/307642059_Genefer_Programs_for_Finding_Large_Probable_Generalized_Fermat_Primes

Is the methodology above described in an article complementary to the 2015 one (the 2014 article)?

My tools are derived from the following works, but they are not as subtly optimized as yours.
https://www.researchgate.net/scientific-contributions/Paul-N-Swarztrauber-70757609
https://sites.ecmwf.int/docs/atlas/getting_started/install_transi/
13) Message boards : Generalized Fermat Prime Search : Multi-threaded GFN-19 3.4.0.2 (Message 155570)
Posted 78 days ago by pascaltec
Tasks returned in the last 24 hours
Setup: PrimeGrid Preferences / Job Control and Multi-threading / Multi-threading: Max # of threads for each Task
Result: Throughput (Tasks/day) = 86,400 (s/day) / average Elapsed time (s) x Concurrent Tasks (Maximum # of simultaneous PrimeGrid Tasks)

Date Time   | Sub-project | Host   | SMT | Tasks | Thr/Task | Setup | Done | Firsts | First% | Send/recv (s) avg/min/max | Elapsed (s) avg/min/max  | Tasks/day | CPU time (s) avg/min/max
23/05 07:00 | Genefer 19  | valtin | gpu | 1     | 0        | -     | 1    | 0      | 0.00   | 26,918 / 26,918 / 26,918  | 15,153 / 15,153 / 15,153 | 5.7       | 267 / 267 / 267
23/05 09:00 | Genefer 19  | valtin | off | 4     | 3        | 4Tx3t | 4    | 4      | 100.00 | 13,365 / 13,312 / 13,420  | 13,356 / 13,303 / 13,411 | 25.9      | 39,976 / 39,857 / 40,115
23/05 07:00 | Genefer 19  | valtin | off | 2     | 6        | 2Tx6t | 8    | 4      | 50.00  | 9,530 / 6,868 / 14,399    | 7,072 / 6,814 / 7,380    | 24.4      | 42,198 / 40,803 / 43,927

Fewer threads seem to yield better throughput ... to be confirmed.
14) Message boards : Generalized Fermat Prime Search : Multi-threaded GFN-19 3.4.0.2 (Message 155569)
Posted 78 days ago by pascaltec
Reminder (as explained in the Geek Pride Day Challenge thread (everything is named thread (!)))
If you don't know how to (or don't want to) disable Symmetric MultiThreading,
you can set the following two parameters in PrimeGrid Preferences / Job Control and Multi-threading:
Multi-threading: Max # of threads for each task = a quarter of the total number of (hyperthreaded) cores = 6 for 24 hyperthreaded cores on 12 physical cores
Max # of simultaneous PrimeGrid tasks = half of the total number of (hyperthreaded) cores divided by the number of threads per task = 2 for 24 hyperthreaded cores on 12 physical cores
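The two settings above can be computed mechanically from the logical core count. A minimal sketch (the helper is mine, assuming SMT exposes twice the physical cores):

```python
def smt_on_settings(logical_cores: int) -> tuple:
    """Suggested PrimeGrid settings when SMT stays enabled.

    threads per task   = a quarter of the logical cores
    simultaneous tasks = half of the logical cores / threads per task
    """
    threads_per_task = logical_cores // 4
    simultaneous_tasks = (logical_cores // 2) // threads_per_task
    return threads_per_task, simultaneous_tasks

# 24 hyperthreaded cores (12 physical) -> 6 threads per task, 2 tasks
print(smt_on_settings(24))
```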
15) Message boards : Generalized Fermat Prime Search : Multi-threaded GFN-19 3.4.0.2 (Message 155567)
Posted 78 days ago by pascaltec
Tasks returned in the last 24 hours
SETUP: PrimeGrid Preferences / Job Control and Multi-threading / Multi-threading: Max # of threads for each Task
RESULT: Throughput (Tasks/day) = 86,400 (s/day) / average Elapsed time (s) x Concurrent Tasks (number of Tasks running at the same time)

Date Time   | Sub-project | Host   | SMT | Tasks | Thr/Task | Setup | Done | Firsts | First% | Send/recv (s) avg/min/max | Elapsed (s) avg/min/max  | Tasks/day | CPU time (s) avg/min/max
23/05 07:00 | Genefer 19  | valtin | gpu | 1     | 0        | -     | 1    | 0      | 0.00   | 26,918 / 26,918 / 26,918  | 15,153 / 15,153 / 15,153 | 5.7       | 267 / 267 / 267
23/05 07:00 | Genefer 19  | valtin | off | 2     | 6        | 2Tx6t | 8    | 4      | 50.00  | 9,530 / 6,868 / 14,399    | 7,072 / 6,814 / 7,380    | 24.4      | 42,198 / 40,803 / 43,927
16) Message boards : Problems and Help : Is there a table or chart that shows the optimal CPU cache for each project? (Message 155565)
Posted 78 days ago by pascaltec
https://www.ecmwf.int/sites/default/files/medialibrary/2022-01/NL-170-C3-DellAcqua-Figure-2.jpg

I am surprised that this computing facility doesn't use a hot-aisle/cold-aisle design to increase energy efficiency.
https://datacenterenclosure.com/hot-and-cold-aisles-in-your-data-center-what-to-know/

The answer is water cooling. Some sort of geek-inspired water cooling ... at another scale!

The data centre cooling system relies on three sources, operating alternately and together, based on the season: geothermal wells, adiabatic dry coolers and chillers.
https://www.ecmwf.int/en/newsletter/170/computing/ecmwfs-new-data-centre-italy

A heat exchanger at the bottom of the rack connects to the building’s water cooling system.
https://www.ecmwf.int/en/newsletter/163/computing/hpc2020-ecmwfs-new-high-performance-computing-facility
17) Message boards : Generalized Fermat Prime Search : Multi-threaded GFN-19 3.4.0.2 (Message 155554)
Posted 78 days ago by pascaltec
If I convert this table to throughput, I find
Threads per task:   2     3     6    12
Number of tasks:    6     4     2     1
GFN-19/day:      24.8  25.8  25.0  13.0

The fastest configuration is 4 tasks x 3 threads. But 2 x 6 is about as fast and the likelihood of being first is greater.

I agree that the four percent difference is indeed not significant in the presented tables.
As you probably observed, it reflects genefer's dispersion from one sample to the next.
The table is not built on enough samples to be statistically significant. The challenge will help!

Let's conclude that the following configurations provide the same throughput:
- with SMT disabled, 2 tasks of 6 threads, 4 tasks of 3 threads and 6 tasks of 2 threads
- with SMT enabled, 4 tasks of 6 threads

Without loss of throughput, it is the cruncher's choice to prefer (SMT disabled):
- a long runtime: 6 tasks of 2 threads, or
- a rapid return: 2 tasks of 6 threads, or
- a balanced option: 4 tasks of 3 threads.
If SMT is enabled, the same throughput is achieved only with 4 tasks of 6 threads.
Notice that it takes the same wall (run) time as the balanced option of 4 tasks of 3 threads!

As underlined by Yves, 2 tasks of 6 threads with SMT off provides
the minimum return time, hence maximizing the likelihood of being first !

Good luck to everyone in the Geek Pride Day Challenge!
18) Message boards : Problems and Help : Is there a table or chart that shows the optimal CPU cache for each project? (Message 155538)
Posted 79 days ago by pascaltec
Does the FFT memory access pattern already maximize the cache hit ratio?

The basic FFT algorithm is not cache-efficient, but the implementations in gwnum (LLR2) and genefer are optimized for the modern cache hierarchy.
A simple test for beginners is to compare, on current CPUs, the recursive implementation of the Cooley-Tukey algorithm with the iterative one.
The iterative version would be expected to run faster because function calls are replaced by indexing, but this is not the case, because of the memory access pattern.
The first memory-efficient algorithm was implemented in 1989 on Cray supercomputers: "FFTs in External or Hierarchical Memory".
Today many algorithms exist, but they are all based on the same method: replace the array of size n with a matrix of about sqrt(n) x sqrt(n) and perform the transform in two passes.
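The recursive-versus-iterative experiment suggested above starts from the textbook algorithm. A minimal recursive radix-2 Cooley-Tukey sketch (deliberately not cache-optimized, unlike gwnum or genefer), checked against the direct O(n^2) DFT:

```python
import cmath

def fft_recursive(x):
    """Textbook recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    # The strided slices below are exactly the memory access pattern
    # that stresses the cache hierarchy for large n.
    even = fft_recursive(x[0::2])
    odd = fft_recursive(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def dft(x):
    """Direct O(n^2) DFT, used as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

xs = [1.0, 2.0, 3.0, 4.0]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft_recursive(xs), dft(xs)))
```

Timing this against an iterative (bit-reversal) version at large n is the beginner's experiment mentioned above.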

In modern meteorology, direct and inverse spherical-harmonic transforms make intensive use of the FFTW library
to solve the Navier-Stokes equations in the spectral domain across multiple nodes, sockets and cores.
The "ECMWF HPC2020" European facility is dedicated to weather prediction and climate-change research.
FFTW notably features auto-tuning functionality to adapt to all kinds of topologies and architectures ...
Keeping hands on the software implementation details is, most of the time, a very valuable approach too!
References:
https://www.fftw.org/ and https://events.ecmwf.int/event/169/timetable/
https://www.ecmwf.int/sites/default/files/medialibrary/2022-01/NL-170-C3-DellAcqua-Figure-2.jpg
19) Message boards : Generalized Fermat Prime Search : Multi-threaded GFN-19 3.4.0.2 (Message 155537)
Posted 79 days ago by pascaltec
@composite:
Use Environment lines instead of putting them on the ExecStart line.
Then you would not need to invoke a shell in ExecStart.
Multiple Environment lines are allowed.

[Service]
Environment="YIELD_SLEEP_TIME=1000"
Environment="LD_PRELOAD=/var/lib/boinc/libsleep.so"
ExecStart=/usr/bin/boinc
...

Rather than having a default in the source code, or in addition to the source code,
the default is contained in the systemd file as above. Have the source code error out
if the environment variable is not set or has an unreasonable value.

The user would use an override file with the command "systemctl edit boinc-client"
to set a different sleep time. The override file would contain

[Service]
Environment="YIELD_SLEEP_TIME=5000"

Setting up libsleep (update)
The first option is to use an override file, as proposed by composite: "systemctl edit boinc-client".
The override file avoids messing up the setup.
OR
The second option is to modify the original file of the service:

$ sudo vi /usr/lib/systemd/system/boinc-client.service
[Unit]
Description=Berkeley Open Infrastructure Network Computing Client
Documentation=man:boinc(1)
After=network-online.target

[Service]
Type=simple
ProtectHome=true
ProtectSystem=strict
ProtectControlGroups=true
ReadWritePaths=-/var/lib/boinc -/etc/boinc-client
Nice=10
User=boinc
WorkingDirectory=/var/lib/boinc
Environment="YIELD_SLEEP_TIME=1000"
Environment="LD_PRELOAD=/var/lib/boinc/libsleep.so"
ExecStart=/usr/bin/boinc
#ExecStart=/bin/sh -c 'YIELD_SLEEP_TIME="500" LD_PRELOAD="/var/lib/boinc/libsleep.so" /usr/bin/boinc --dir /var/lib/boinc-client >/var/log/boinc.log 2>/var/log/boincerr.log'
#ExecStart=/bin/sh -c 'LD_PRELOAD="/var/lib/boinc/libsleep.so" /usr/bin/boinc'
ExecStop=/usr/bin/boinccmd --quit
ExecReload=/usr/bin/boinccmd --read_cc_config
ExecStopPost=/bin/rm -f lockfile
...

In any case, do not forget to:
$ sudo systemctl daemon-reload
$ sudo systemctl restart boinc-client.service
20) Message boards : Generalized Fermat Prime Search : Multi-threaded GFN-19 3.4.0.2 (Message 155533)
Posted 79 days ago by pascaltec
pin-genefer-threads is useful for
- processors with multiple dies (Ryzen 9) or processors with hybrid Performance and Efficient cores (Intel 12900K, 12700K),
- or AMD Epyc / Intel Xeon platforms with multiple physical processors (sockets) and Non-Uniform Memory Access (NUMA).
Why?
- a task should run (with all its threads) behind one and the same L3 cache,
- each thread (of this task) is pinned to one core and keeps the same L1d, L1i and L2
(all the cores of the same task sharing the same L3 cache),
so a core always accesses the same, fastest "memory" (caches and DDR) (20% gain).
Each thread also uses the same AVX2 unit throughout the duration of the task (no gain).
- the memory required by the tasks sharing the same L3 cache should not exceed its capacity,
as pointed out by Yves Gallot (one GFN-19 task: 10 MB; Ryzen 9 L3 cache: 32 MB; Ryzen 7 5800X3D: 96 MB)
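The per-L3 pinning described above boils down to giving each task a contiguous block of cores sitting behind one L3 cache. A sketch that derives the corresponding taskset core ranges (the helper name is mine; it assumes SMT is off and that cores 0..N-1 are enumerated CCD by CCD, which should be verified with lscpu):

```python
def core_sets(physical_cores, tasks, threads_per_task):
    """One contiguous block of cores per task (one CCD/L3 per task when sizes match)."""
    assert tasks * threads_per_task <= physical_cores
    return [list(range(t * threads_per_task, (t + 1) * threads_per_task))
            for t in range(tasks)]

# 2 tasks x 6 threads on a 12-core Ryzen 9: one task per 6-core CCD (same L3)
for t, cores in enumerate(core_sets(12, 2, 6)):
    print(f"task {t}: taskset -c {cores[0]}-{cores[-1]} ...")
```

For 2 tasks of 6 threads this yields cores 0-5 for the first task and 6-11 for the second, i.e. one CCD (one L3 cache) each.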
Intel Alder Lake
pin-genefer-threads has NOT been tested on the 12900K, 12700K and 12600K, and probably requires changes
to account for the topology of their (Golden Cove + Gracemont) cores: (8+8), (8+4) and (6+4) respectively.
Windows
A similar approach could likely be implemented under Windows using the equivalent of taskset,
telling the Windows task scheduler how to distribute the tasks and how to pin the genefer threads.
LLR
Just replace "grep genefer_linux" with the LLR application (in /var/lib/boinc/projects/www.primegrid.com).


Copyright © 2005 - 2022 Rytis Slatkevičius and PrimeGrid community.
Generated 9 Aug 2022 | 8:42:16 UTC