1)
Message boards :
Number crunching :
Threading and processor affinity tool for Linux
(Message 134045)
Posted 1444 days ago by Hazel
It's not clear to me, how many tasks you were running?
It depends on the CPU... the processor count is listed as a percentage, the same way BOINC is configured. That is, the i7-7700K is a 4C/8T CPU, so "Processors: 50% Threads: 2" means running 2 tasks, because 50% of 8 is 4, and 4/2 is 2.
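The arithmetic above can be written down as a small helper. This is only a sketch: the function name `tasks_for` is made up for illustration, and rounding the percentage to the nearest whole logical processor is an assumption about how the percentage is interpreted, not something taken from the script.

```python
def tasks_for(logical_cpus: int, processors_pct: int, threads_per_task: int) -> int:
    """Number of simultaneous tasks implied by a 'Processors: P% Threads: T' line."""
    # Assumption: the percentage is applied to logical processors and
    # rounded to the nearest whole processor (e.g. 88% of 8 -> 7).
    usable = round(logical_cpus * processors_pct / 100)
    # Each task occupies `threads_per_task` logical processors.
    return max(1, usable // threads_per_task)


# i7-7700K: 4 cores / 8 threads, "Processors: 50% Threads: 2"
print(tasks_for(8, 50, 2))  # prints 2
```

With the same assumptions, "Processors: 88% Threads: 7" on that CPU comes out as a single 7-thread task.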
|
2)
Message boards :
Number crunching :
Threading and processor affinity tool for Linux
(Message 133997)
Posted 1446 days ago by Hazel
I did some benchmarks on my machines for the upcoming 321 challenge, using python3 primegrid.py --benchmark-llr '3*2^15083930+1'
i7-7700K @ 4.7GHz
Processors: 100% Threads: 1 Layout: spread Tasks/day 7.22±0.06
Processors: 100% Threads: 1 Layout: free Tasks/day 7.23±0.12
Processors: 100% Threads: 2 Layout: clump Tasks/day 7.54±0.16
Processors: 100% Threads: 2 Layout: free Tasks/day 7.66±0.24
Processors: 100% Threads: 2 Layout: spread Tasks/day 7.70±0.07
Processors: 50% Threads: 1 Layout: spread Tasks/day 7.88±0.32
Processors: 50% Threads: 1 Layout: free Tasks/day 7.88±0.24
Processors: 50% Threads: 4 Layout: free Tasks/day 8.32±0.49
Processors: 100% Threads: 4 Layout: clump Tasks/day 8.35±0.29
Processors: 50% Threads: 2 Layout: free Tasks/day 8.40±0.06
Processors: 50% Threads: 2 Layout: spread Tasks/day 8.44±0.22
Processors: 100% Threads: 4 Layout: free Tasks/day 8.46±0.13
Processors: 100% Threads: 4 Layout: spread Tasks/day 8.55±0.07
Processors: 50% Threads: 4 Layout: spread Tasks/day 8.67±0.08
Processors: 88% Threads: 7 Layout: free Tasks/day 9.04±0.04
Processors: 100% Threads: 8 Layout: free Tasks/day 9.13±0.03
Hyper-threading is really doing work on this CPU...
Xeon E3-1225 V2 @ stock
Processors: 100% Threads: 1 Layout: free Tasks/day 2.94±0.03
Processors: 100% Threads: 1 Layout: spread Tasks/day 2.95±0.22
Processors: 75% Threads: 3 Layout: free Tasks/day 2.98±0.12
Processors: 100% Threads: 2 Layout: free Tasks/day 3.18±0.29
Processors: 100% Threads: 2 Layout: spread Tasks/day 3.26±0.06
Processors: 100% Threads: 4 Layout: spread Tasks/day 3.79±0.01
Processors: 100% Threads: 4 Layout: free Tasks/day 3.81±0.01
Basically as expected, multithreading is faster on this 4-core CPU without HT.
i7-3770K @ 4.3GHz couldn't decide between these three:
Processors: 100% Threads: 8 Layout: free Tasks/day 4.59±0.03
Processors: 88% Threads: 7 Layout: free Tasks/day 4.59±0.04
Processors: 50% Threads: 4 Layout: spread Tasks/day 4.63±0.02
Which is, honestly, pretty fascinating. This CPU is right on the line between HT helping and hurting.
i7-8750H @ 45W couldn't decide between these:
Processors: 100% Threads: 6 Layout: free Tasks/day 5.86±0.13
Processors: 92% Threads: 11 Layout: free Tasks/day 5.88±0.08
Processors: 50% Threads: 1 Layout: spread Tasks/day 5.95±0.06
Processors: 50% Threads: 3 Layout: free Tasks/day 5.96±0.11
Processors: 100% Threads: 6 Layout: spread Tasks/day 5.97±0.07
Processors: 50% Threads: 1 Layout: free Tasks/day 5.97±0.03
Processors: 50% Threads: 6 Layout: spread Tasks/day 6.02±0.07
Processors: 50% Threads: 2 Layout: spread Tasks/day 6.04±0.04
Processors: 50% Threads: 3 Layout: spread Tasks/day 6.04±0.07
This is a laptop and it's probably just trading off power budget between the RAM and the cores.
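One way to read the "couldn't decide" cases above is that the error bars of the top configurations overlap. Here is a rough sketch that parses result lines in the format shown and lists every configuration whose interval still reaches the interval of the best one. The helper names `parse` and `undecided` are made up for illustration, and the overlap rule is an assumption, not the script's actual stopping criterion.

```python
import re

# Matches lines like:
#   Processors: 50% Threads: 4 Layout: spread Tasks/day 4.63±0.02
LINE = re.compile(
    r"Processors:\s*(\d+)%\s+Threads:\s*(\d+)\s+Layout:\s*(\w+)"
    r"\s+Tasks/day\s+([\d.]+)±([\d.]+)"
)


def parse(line):
    """Turn one result line into a dict, or None if it does not match."""
    m = LINE.search(line)
    if m is None:
        return None
    pct, threads, layout, mean, err = m.groups()
    return {
        "processors": int(pct),
        "threads": int(threads),
        "layout": layout,
        "mean": float(mean),
        "err": float(err),
    }


def undecided(results):
    """Configurations whose upper bound reaches the best one's lower bound."""
    best = max(results, key=lambda r: r["mean"])
    floor = best["mean"] - best["err"]
    return [r for r in results if r["mean"] + r["err"] >= floor]


# The three i7-3770K lines from the post above.
lines = [
    "Processors: 100% Threads: 8 Layout: free Tasks/day 4.59±0.03",
    "Processors: 88% Threads: 7 Layout: free Tasks/day 4.59±0.04",
    "Processors: 50% Threads: 4 Layout: spread Tasks/day 4.63±0.02",
]
print(len(undecided([parse(l) for l in lines])))  # prints 3
```

Because `LINE.search` is used rather than a full match, the same parser also accepts lines carrying the `INFO:__main__:` prefix.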
|
3)
Message boards :
Number crunching :
Threading and processor affinity tool for Linux
(Message 133553)
Posted 1458 days ago by Hazel
The script now supports managing the CPU temperature.
Managing CPU Temperature
For systems that thermal throttle, or are simply too loud, the script can manage the CPU temperature if you specify '--target-temp 95' or some other temperature in °C. The script will then adjust the maximum allowed CPU frequency until that temperature is reached but not exceeded. Do not use this feature on overclocked systems.
Thermal throttling can degrade performance when the CPU runs too fast, overheats, and then runs very slowly until it cools down, then repeats this process over and over again. It is more efficient to run the CPU at a more consistent, intermediate speed.
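To illustrate the idea of settling on a steady intermediate speed, here is a minimal sketch of one step of such a control loop. It is not the script's actual algorithm: the function name, the 100 MHz step and the 2 °C dead band are all hypothetical, and reading the temperature or writing the cap (for example through the Linux cpufreq `scaling_max_freq` files) is deliberately left out.

```python
def next_max_freq_khz(current_khz, temp_c, target_c, min_khz, max_khz,
                      step_khz=100_000):
    """Return the next frequency cap in kHz for one control-loop step."""
    if temp_c > target_c:
        # Too hot: lower the cap by one step.
        proposed = current_khz - step_khz
    elif temp_c < target_c - 2:
        # Clear headroom (hypothetical 2 °C dead band): raise the cap.
        proposed = current_khz + step_khz
    else:
        # Close enough to the target: hold steady to avoid oscillating.
        proposed = current_khz
    # Never leave the range the hardware supports.
    return max(min_khz, min(max_khz, proposed))


# 97 °C measured with a 95 °C target: the cap drops from 4.0 GHz to 3.9 GHz.
print(next_max_freq_khz(4_000_000, 97, 95, 800_000, 4_700_000))  # prints 3900000
```

Stepping in small increments with a dead band is what keeps the CPU near one consistent speed instead of bouncing between full speed and throttling.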
|
4)
Message boards :
Number crunching :
Threading and processor affinity tool for Linux
(Message 133521)
Posted 1459 days ago by Hazel
Is it giving you an error? It shouldn't require any particular numpy feature... Numpy is just required for scipy.
|
5)
Message boards :
Number crunching :
Threading and processor affinity tool for Linux
(Message 133511)
Posted 1459 days ago by Hazel
Download
The latest version is here: https://github.com/hazelybell/scripts/blob/master/primegrid.py
Requirements
- Python 3.7
- numpy
- scipy
- util-linux (with the taskset command)
Benchmarking
To run the benchmark, stop primegrid and call:
python3.7 primegrid.py --benchmark-llr 'k*2^n+1'
For example:
python3.7 primegrid.py --benchmark-llr '25*2^3962242+1'
You can get a relevant prime for the project you're interested in from the subproject status page.
This will then run the benchmark with various thread counts, with and without HT, etc. The benchmark will run until it's relatively confident that the strategy (thread count, HT, affinity) it ranks first really is the best.
Example output:
INFO:__main__:---------- Current results:
INFO:__main__:Processors: 88% Threads: 7 Layout: free Tasks/day 48.68±0.60
INFO:__main__:Processors: 50% Threads: 4 Layout: free Tasks/day 50.39±0.48
INFO:__main__:Processors: 100% Threads: 1 Layout: free Tasks/day 51.79±0.48
INFO:__main__:Processors: 100% Threads: 1 Layout: spread Tasks/day 51.98±0.20
INFO:__main__:Processors: 100% Threads: 8 Layout: free Tasks/day 52.24±0.28
INFO:__main__:Processors: 50% Threads: 4 Layout: spread Tasks/day 52.90±0.30
INFO:__main__:Processors: 50% Threads: 2 Layout: free Tasks/day 57.94±0.69
INFO:__main__:Processors: 50% Threads: 2 Layout: spread Tasks/day 58.86±0.16
INFO:__main__:Processors: 100% Threads: 4 Layout: spread Tasks/day 61.50±0.59
INFO:__main__:Processors: 100% Threads: 4 Layout: clump Tasks/day 61.80±0.16
INFO:__main__:Processors: 100% Threads: 4 Layout: free Tasks/day 61.83±0.21
INFO:__main__:Processors: 100% Threads: 2 Layout: spread Tasks/day 62.34±1.40
INFO:__main__:Processors: 100% Threads: 2 Layout: free Tasks/day 64.24±1.65
INFO:__main__:Processors: 100% Threads: 2 Layout: clump Tasks/day 65.65±1.27
INFO:__main__:Processors: 50% Threads: 1 Layout: spread Tasks/day 72.00±3.00
INFO:__main__:Processors: 50% Threads: 1 Layout: free Tasks/day 72.51±2.46
INFO:__main__:----------
This indicates that the benchmark has completed a round, and that using 50% of the processors with tasks that have one thread each and letting Linux manage thread affinity was the fastest. It will continue running until it can be confident in its choice, but you can stop whenever the error bars (the numbers after the ±) get small enough for you, or if you're just sick of waiting.
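For a feel of why the error bars shrink as more rounds complete, here is a sketch that computes the half-width of a confidence interval from repeated tasks/day measurements. It uses a normal approximation from the standard library (Python 3.8+ for `NormalDist`); the script itself depends on scipy and may well compute its intervals differently, and the sample values below are invented.

```python
from statistics import NormalDist, stdev


def error_bar(samples, ci=0.90):
    """Half-width of a two-sided confidence interval for the mean.

    Normal approximation only; this is an illustration, not the
    script's own calculation.
    """
    # Two-sided interval: put (1 - ci) / 2 of the probability in each tail.
    z = NormalDist().inv_cdf(0.5 + ci / 2)
    # Standard error of the mean shrinks with the square root of the count.
    return z * stdev(samples) / len(samples) ** 0.5


few = [70.0, 74.0]   # hypothetical tasks/day measurements
many = few * 8       # the same spread, but 16 measurements
print(error_bar(few) > error_bar(many))  # prints True
```

A higher confidence level widens the bar for the same data, which is why it would take more benchmark rounds to reach the same width.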
Affinity
The tool currently supports 3 different affinity types:
'free': Let the OS manage affinity itself
'spread': Spread out the threads of a single task across different cores
'clump': Put the threads of a single task on the same cores
Example: Consider a 4-core CPU with hyperthreading. For a task with 2 threads, 'spread' will put the two threads on two different cores, and 'clump' will put the two threads on the same core. For a task with 4 threads, 'spread' will put one thread on each core, while 'clump' will put all the threads on 2 cores.
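The 'spread' and 'clump' layouts can be sketched as a mapping from a task's threads to logical CPU numbers. This is an illustration under a loud assumption: logical CPUs `c` and `c + cores` are taken to be the two hyperthreads of physical core `c`, which is a common numbering on Linux but not guaranteed. The function name `cpus_for_task` is hypothetical and is not taken from the script.

```python
def cpus_for_task(layout, threads, cores, first_core=0):
    """Logical CPUs for one task, or None to leave placement to the OS.

    Assumes logical CPU c and c + cores share physical core c.
    """
    if layout == "free":
        return None  # no pinning at all
    if layout == "spread":
        if first_core + threads > cores:
            raise ValueError("not enough cores to spread this task")
        # One thread per physical core, using the first sibling of each.
        return {first_core + t for t in range(threads)}
    if layout == "clump":
        cpus = set()
        for t in range(threads):
            core = first_core + t // 2   # two threads share each core
            sibling = t % 2              # 0 = first hyperthread, 1 = second
            cpus.add(core + sibling * cores)
        return cpus
    raise ValueError(f"unknown layout: {layout}")


# 4-core CPU with hyperthreading, as in the example above.
print(sorted(cpus_for_task("spread", 2, 4)))  # prints [0, 1]
print(sorted(cpus_for_task("clump", 2, 4)))   # prints [0, 4]
print(sorted(cpus_for_task("clump", 4, 4)))   # prints [0, 1, 4, 5]
```

A set like this is the kind of CPU list that can be handed to 'taskset' for a running process.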
Managing Affinity
If you decide that you want to run primegrid with an affinity layout other than 'free' you can run the script with '--layout clump' or '--layout spread' as root. The script will watch for BOINC to start primegrid LLR tasks and manage their affinity using 'taskset'.
Advanced Options
'--processors N' Benchmark using only N processors, counting logical processors. For example, on a 4-core processor with hyperthreading, '--processors 4' only benchmarks configurations equivalent to setting 50% of CPUs in BOINC. This will reduce the number of different benchmarks run.
'--threads N' Benchmark only using tasks with N threads. This will reduce the number of different benchmarks to run.
'--layout free|spread|clump' Benchmark only tasks using the specified affinity layout. This will reduce the number of benchmarks to run.
Example for a 6-core hyperthreading processor: '--processors 6 --layout free' will only benchmark 1x6, 2x3, 3x2 and 6x1 tasks x threads without worrying about CPU affinity.
'--ci .90' Change the confidence interval used to compute the error bars. This will change the number of times the script re-runs benchmarks before it's "confident" in the results.
'--llr-executable path/to/llr' Specify a specific LLR executable.
Even more options: See '--help' output, but be wary, these can have unfortunate effects on your system.
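The "tasks x threads" combinations in the '--processors 6' example above are just the ways of dividing 6 logical processors evenly among tasks. A small sketch, with a made-up function name, and assuming only plans that use every allowed processor are considered:

```python
def plans(processors):
    """All (tasks, threads) pairs that use exactly `processors` logical CPUs."""
    return [
        (processors // threads, threads)
        for threads in range(1, processors + 1)
        if processors % threads == 0  # threads must divide the CPUs evenly
    ]


print(plans(6))  # prints [(6, 1), (3, 2), (2, 3), (1, 6)]
```

The pairs read as tasks x threads, so the output corresponds to 6x1, 3x2, 2x3 and 1x6.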
Known Bugs
Systems with complicated topologies are not modelled by the script. That includes systems with multiple CPU sockets, NUMA, and Ryzen 3000-series CPUs. For thread counts strictly greater than 2 with hyperthreading, or greater than 1 without hyperthreading, the CPU affinity may be handled poorly. It is better to let the OS manage CPU affinity in these situations. For example, consider the Ryzen 5 3600X, a 6-core processor with hyperthreading. On this CPU, cores are organized into groups called CCXs, and communication inside a single CCX is much faster than communication between cores in different CCXs. Thus plans like 4 threads x 3 tasks (100%) may have very poor performance if 'clump' or 'spread' is chosen. I plan to support these systems better, eventually, if I can get my hands on one.
Future features
- Manage CPU temperature (for systems which thermal throttle :x)
- Model NUMA/multisocket/CCX CPU topologies
- Manage C-states
- Manage GPU driver affinity
- Collect power usage and temperature stats
Example Results
PPS-DIV on an i7-7700K @ 4.7GHz: use 1 thread, 50% CPUs, 'free' affinity
PPS-DIV on an i7-3770K @ 4.3GHz: use 1 thread, 50% CPUs, 'free' affinity
PPS-DIV on an i7-8750H @ 45W: use 1 thread, 50% CPUs, 'spread' affinity
PPS-DIV on an i7-4700MQ @ 27W: use 1 thread, 50% CPUs, 'spread' affinity
PPS-DIV on a Xeon E3-1225 V2 @ stock: use 1 thread, 100% CPUs, 'spread' affinity
PPS-DIV on an i7-9700K @ 4.8GHz: use 1 thread, 100% CPUs, 'free' affinity
SoB on an i7-7700K @ 4.7GHz: use 4 threads, 50% CPUs, 'free' affinity
Conclusions
From experimenting with this script I've come to the following conclusions. Note that these are only for Linux. Other operating systems handle threads very differently.
- Using 1 thread on half the CPUs with hyperthreading, or on all the CPUs without hyperthreading, is generally an okay choice. Even for SoB, using 1 thread isn't much slower than 4 threads.
- Choosing whether or not to use all logical processors (threads) on a hyperthreading CPU matters.
- Setting thread affinity can improve performance on some systems.
- Best thread count changes depending on project. Both k and n can have an effect here and they can have different effects.
Feel free to post your results below!
|
6)
Message boards :
Generalized Fermat Prime Search :
High CPU usage again: Genefer 21 3.19 GPU (OCLcudaGFN)
(Message 120988)
Posted 1819 days ago by Hazel
I have confirmed the LD_PRELOAD hack still fixes it. This seems to affect all my machines (but I set up all of my machines very similarly, they're all running close to the same versions of everything.)
Can you let me know specifically what libraries you linked via LD_PRELOAD to resolve this?
Cheers
- Iain
That tiny libsleep.c one that overrides sched_yield.
|
7)
Message boards :
Generalized Fermat Prime Search :
High CPU usage again: Genefer 21 3.19 GPU (OCLcudaGFN)
(Message 120929)
Posted 1823 days ago by Hazel
I have confirmed the LD_PRELOAD hack still fixes it. This seems to affect all my machines (but I set up all of my machines very similarly, they're all running close to the same versions of everything.)
|
8)
Message boards :
Generalized Fermat Prime Search :
High CPU usage again: Genefer 21 3.19 GPU (OCLcudaGFN)
(Message 120918)
Posted 1824 days ago by Hazel
The GFN GPU tasks on my Linux machines still use 100% CPU (100% of a single thread) even though this is supposed to be fixed according to Mike Goetz.
The systems involved are (driver versions saved here for reference):
crystal:
http://www.primegrid.com/show_host_detail.php?hostid=936874
Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz [Family 6 Model 158 Stepping 10] (12 processors)
NVIDIA GeForce GTX 1060 (4095MB) driver: 396.54, INTEL Intel(R) UHD Graphics Coffee Lake Halo GT2 (4096MB)
Debian GNU/Linux testing (buster) [4.18.0-1-amd64|libc 2.27 (Debian GLIBC 2.27-6)]
Output from this machine:
geneferocl 3.3.3-2 (Linux/OpenCL/64-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 1060', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '396.54'.
10 computeUnits @ 1733MHz, memSize=6078MB, cacheSize=160kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.
Supported transform implementations: ocl ocl2 ocl3 ocl4 ocl5
Command line: ../../projects/www.primegrid.com/primegrid_genefer_3_3_3_3.19_x86_64-pc-linux-gnu__OCLcudaGFN15 -boinc -q 101950630^32768+1 --device 0
Normal priority change failed (needs superuser privileges.
Checking available transform implementations...
OCL transform is past its b limit.
OCL3 transform is past its b limit.
OCL4 transform is past its b limit.
OCL5 transform is past its b limit.
Using OCL2 transform
Starting initialization...
Initialization complete (0.054 seconds).
Testing 101950630^32768+1...
Estimated time for 101950630^32768+1 is 0:03:26
101950630^32768+1 is complete. (262419 digits) (err = 0.0000) (time = 0:03:30) 15:20:24
15:20:24 (4685): called boinc_finish
(The GPU on this system is locked into a power-saving mode, so it should run at a fraction of the speed you would expect from a normal 1060, and the CPU thread should have to communicate with it even less often. Despite the fact that it says 10 compute units at 1733MHz, it's actually running at 607MHz at best. ETA for GFN 21 was about 7 days...)
buttercup:
GenuineIntel
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz [Family 6 Model 58 Stepping 9] (8 processors)
NVIDIA GeForce GTX 970 (4041MB) driver: 390.67
Debian GNU/Linux 9 (stretch) 4.17.0-0.bpo.1-amd64
http://www.primegrid.com/show_host_detail.php?hostid=910164
Output from this machine:
geneferocl 3.3.3-2 (Linux/OpenCL/64-bit)
Copyright 2001-2018, Yves Gallot
Copyright 2009, Mark Rodenkirch, David Underbakke
Copyright 2010-2012, Shoichiro Yamada, Ken Brazier
Copyright 2011-2014, Michael Goetz, Ronald Schneider
Copyright 2011-2018, Iain Bethune
Genefer is free source code, under the MIT license.
Running on platform 'NVIDIA CUDA', device 'GeForce GTX 970', vendor 'NVIDIA Corporation', version 'OpenCL 1.2 CUDA' and driver '390.67'.
13 computeUnits @ 1177MHz, memSize=4041MB, cacheSize=208kB, cacheLineSize=128B, localMemSize=48kB, maxWorkGroupSize=1024.
Supported transform implementations: ocl ocl2 ocl3 ocl4 ocl5
Command line: ../../projects/www.primegrid.com/primegrid_genefer_3_3_3_3.19_x86_64-pc-linux-gnu__OCLcudaGFN -boinc -q 266884^2097152+1 --device 0
Normal priority change failed (needs superuser privileges.
Checking available transform implementations...
A benchmark is needed to determine best transform, testing available transform implementations...
Testing OCL transform...
Testing OCL2 transform...
Testing OCL3 transform...
Testing OCL4 transform...
Testing OCL5 transform...
Benchmarks completed (20.951 seconds).
Using OCL4 transform
Starting initialization...
Initialization complete (12.814 seconds).
Testing 266884^2097152+1...
Estimated time for 266884^2097152+1 is 21:30:00
I tested both GFN-15 and GFN-21. This is all just via BOINC.
Since this interferes with my CPU tasks, I have switched to AP27 for now.
I have confirmed BOINC is using the latest GPU version: primegrid_genefer_3_3_3_3.19_x86_64-pc-linux-gnu__OCLcudaGFN
Both systems use the nVidia GPU for X11.
I will try the LD_PRELOAD hack later or this weekend if I have time.
|
9)
Message boards :
News :
Another PPS-Mega Prime!
(Message 111445)
Posted 2148 days ago by Hazel
Hi! I found this prime! I was wondering if there would be any way to get the decimal representation of it? I know it's over 1M digits, is there any software that can produce it?
You can get the decimal representation directly on the PrimeGrid website.
If you go to your primes page, you'll see a link next to the prime's length. Click where it says decimal.
But to answer your exact question, if you ever want to get the full decimal representation of an arbitrary number, you can use PFGW with the -od command line arguments. For example:
pfgw64 -od -q"943*2^3442990+1"
Congratulations!
Thanks!
|
10)
Message boards :
News :
Another PPS-Mega Prime!
(Message 111432)
Posted 2148 days ago by Hazel
Hi! I found this prime! I was wondering if there would be any way to get the decimal representation of it? I know it's over 1M digits, is there any software that can produce it?
|