drummers-lowrise
|
1)
Message boards :
Number crunching :
How to optimize the number of threads
(Message 161199)
Posted 7 days ago by mackerel
I don't know of a situation (maybe very rare) where HT enabled is better when using LLR.
Based on my observations, Windows can do some weird things with scheduling when HT is on. For a general use PC where you don't want to be turning HT on and off, the following may be considered.
If we assume the HT off case is the reference, I have not found a case where HT on gives significantly more throughput for LLR or similar tasks. Maybe 2% at most but I'd consider this within measurement tolerances.
With HT on and you are running multi-thread tasks, Windows seems to do a good job with scheduling and there is no loss in performance.
With HT on and you are running single thread tasks, Windows can do a bad job and you may see a ballpark 10% throughput loss "as is". Running more total threads than cores can reduce that loss, giving the impression of an increase in throughput. The workaround is to manually set affinity so that BOINC (not BOINC Manager) can run on only odd or only even numbered CPU threads. This can be done through Task Manager, PowerShell, or other software like Process Lasso. The setting is not remembered between restarts of BOINC.
If you have a Ryzen CPU with more than one CCX (Ryzen 3000 or earlier with 6 or more cores, Ryzen 5000/7000 with 12 or more cores), then you may find improvement from running Process Lasso to keep tasks on the same CCX. This probably also applies to multi-socket systems.
The above applies to CPU use only. I don't know how to optimise in the scenario where you also want to keep a thread for GPU at the same time. Maybe Process Lasso can be told to only work on LLR tasks for example, but I have not tried this.
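As a rough illustration of the affinity workaround, the sketch below builds the bitmask that selects every even numbered logical CPU. It assumes the two HT siblings of a core are numbered consecutively (0/1, 2/3, and so on), which is worth checking on your own system; the function name is made up for this example.

```python
def even_thread_mask(logical_cpus: int) -> int:
    """Return an affinity bitmask selecting logical CPUs 0, 2, 4, ...

    Assumption: HT siblings are numbered consecutively, so taking every
    second logical CPU gives one thread per physical core.
    """
    mask = 0
    for cpu in range(0, logical_cpus, 2):
        mask |= 1 << cpu
    return mask


# Example: 8 cores / 16 threads with HT on.
print(hex(even_thread_mask(16)))  # 0x5555
```

A mask like this is what tools that take a numeric affinity expect. How you apply it, and to which process, depends on the tool you use.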
|
2)
Message boards :
Number crunching :
How to optimize the number of threads
(Message 161181)
Posted 8 days ago by mackerel
This is a very nice quality of life feature, and will save some work figuring out what FFT sizes the current work is using.
It should be noted, these sizes aren't static and will tend to increase over time. So what is optimal now might change later. Occasional checking is not a bad idea.
|
3)
Message boards :
Number crunching :
I knew ram made a difference, but...
(Message 161122)
Posted 14 days ago by mackerel
The effect of rank has been known for some time, but I'm expanding it to multiple 2R modules per channel, as well as the lower end 1Rx16 modules which I hadn't encountered until more recently.
|
4)
Message boards :
Number crunching :
I knew ram made a difference, but...
(Message 161118)
Posted 14 days ago by mackerel
I wanted to look further at this in a desktop environment so managed to get some relatively cheap 1Rx16 modules for testing on a desktop. This would be a wall of text if I wrote everything, so I'll just go over the Prime95 ram-limited scenario here, which could similarly apply to bigger LLR tests. I'm looking at making a more digestible version of all the data later on.
The test system has an 11700K with a practically unlimited power limit. This is a dual channel 4 slot system, so tests with two modules are dual channel 1 DPC, and tests with 4 modules are dual channel 2 DPC. I did not test 2 modules in one channel at 2 DPC.
So that I'm testing the ram configuration, and not other factors like ram speed/timings, this testing was run at JEDEC 2133 timings. I did note the 1Rx8 modules had a much lower tRFC than the other modules but for this test I'm not sure it made a significant impact.
I normalised the results to two 1Rx8 modules, configured in dual channel. This is likely the most common configuration unless you have high ram quantity.
104% 2x 2Rx8
104% 4x 1Rx8
103% 4x 2Rx8
100% 2x 1Rx8
98% 4x 1Rx16
74% 2x 1Rx16
I've known for a while that having effectively two ranks per channel, however you reach it, performs better. But the benefit here is a bit lower than I was expecting, at up to 4%. During the Skylake era, I had seen an improvement of 25% from having 2 ranks. I'll need to repeat this at a ram speed of 3200 to see how that impacts things, as that will be more representative of how it would normally be used.
The nasty 1Rx16 memory is significantly behind, but if you somehow have 4 of these it mostly catches up with the other configurations.
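To put a number on "mostly catches up", here is a small sketch that compares any two of the configurations using only the normalised percentages listed above. The dictionary and function names are made up for this example.

```python
# Relative throughput from the list above, in percent of 2x 1Rx8.
results = {
    "2x 2Rx8": 104,
    "4x 1Rx8": 104,
    "4x 2Rx8": 103,
    "2x 1Rx8": 100,
    "4x 1Rx16": 98,
    "2x 1Rx16": 74,
}


def relative_gain(new: str, old: str) -> float:
    """Throughput ratio of configuration `new` over configuration `old`."""
    return results[new] / results[old]


# Going from two to four 1Rx16 modules.
print(f"{relative_gain('4x 1Rx16', '2x 1Rx16'):.2f}")  # 1.32
```

So in this test, doubling up the 1Rx16 modules was worth roughly a third more throughput, while the same step with 1Rx8 modules was worth about 4%.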
|
5)
Message boards :
Problems and Help :
7950x RAM problem
(Message 161117)
Posted 15 days ago by mackerel
Nice to hear a solution has been found. Adjusting timings wouldn't be something I'd normally think of trying, as there are so many of them. Given it is resolved by a relaxed CL: I have heard, but never achieved myself, that tightening those timings needs the ram voltage to be raised. At this point, if it is working fine with the relaxed timing I'd leave it.
|
6)
Message boards :
Problems and Help :
7950x RAM problem
(Message 161022)
Posted 19 days ago by mackerel
Ok, no problem for
Sophie Germain Prime Search LLR (SGS)
http://www.primegrid.com/results.php?hostid=1172949&offset=0&show_names=0&state=0&appid=10
Proth Prime Search LLR (PPS)
http://www.primegrid.com/results.php?hostid=1172949&offset=0&show_names=0&state=0&appid=2
These are "small" tasks and will be mainly done in CPU cache, not significantly hitting memory.
|
7)
Message boards :
Problems and Help :
7950x RAM problem
(Message 160980)
Posted 21 days ago by mackerel
Was looking at the 7950X for other reasons and noticed the following:
Max Memory Speed
2x1R DDR5-5200
2x2R DDR5-5200
4x1R DDR5-3600
4x2R DDR5-3600
The same info was in the footnotes of the mobo page previously linked, but I missed it. I knew there was a drop going from 2 to 4 sticks, but that's a big drop. 6000 ram is above that already but it is common for many to use ram above supported speeds, through XMP or EXPO. But that isn't guaranteed. The higher you go, the more chance there is of being unstable.
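As a quick sketch of how far a kit sits above the supported speed, the lookup below uses only the four figures listed above. The names are made up for this example, and the ranks of the modules in the problem system are not known here, so both 4 module rows are equally applicable.

```python
# Maximum supported memory speed (MT/s) per configuration, as listed above.
MAX_SUPPORTED = {"2x1R": 5200, "2x2R": 5200, "4x1R": 3600, "4x2R": 3600}


def over_spec(config: str, kit_speed: int) -> int:
    """Return how far kit_speed exceeds the supported speed (0 if within spec)."""
    return max(0, kit_speed - MAX_SUPPORTED[config])


print(over_spec("2x1R", 6000))  # 800
print(over_spec("4x1R", 6000))  # 2400
```

With two modules a 6000 kit is 800 above the listed speed; with four it is 2400 above, which is a much bigger ask of the memory controller.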
Curious if any decision has been made on what will be done on the problem system? Increasing voltages? Lowering speeds? Use only two modules, either the existing or swap them with 2x32?
|
8)
Message boards :
Number crunching :
International Women's Day Challenge
(Message 160954)
Posted 21 days ago by mackerel
Hi, settings for the SGS says multithreading is available but is not recommended. Should we use multithreading at all or do you suggest to limit 1 task per thread? Thanks
It depends a bit on your goals. If it is to complete as many tasks as possible for maximum challenge score, 1 thread per task is likely optimal on most systems.
If the goal is to maximise the number of "1st" then 2 cores per task may be considered. It is impossible to give a definitive answer as it depends on hardware, settings, and what other people are doing at the time. In my opinion, as these primes are not t5k reportable, there is little value in chasing "1st" outside of the extremely slim chance you get an actual SGS.
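The trade-off is simple arithmetic, sketched below: more threads per task means fewer tasks running at once. This assumes one thread per physical core and ignores any scaling loss from multithreading, which is exactly the part that depends on hardware; the function name is made up for this example.

```python
def concurrent_tasks(cores: int, threads_per_task: int) -> int:
    """Number of tasks that can run at once with one thread per core."""
    return cores // threads_per_task


# Example: an 8 core CPU.
print(concurrent_tasks(8, 1))  # 8
print(concurrent_tasks(8, 2))  # 4
```

Each task at 2 threads has to finish in less than half the time for total throughput to stay the same, so it mainly helps with getting results back sooner.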
|
9)
Message boards :
Problems and Help :
7950x RAM problem
(Message 160927)
Posted 23 days ago by mackerel
Can raise some voltage manually?
Maybe, but I'm not up to speed on ram overclocking on either AMD or DDR5.
Really? So far PG only has a problem :(
Primegrid and other intensive compute applications may show it faster, but your system is unstable even if you haven't seen it elsewhere yet. There could be silent corruption going on.
Basically the memory controller has to do more work when 4 slots are populated, and depending on how close it is to the limit, intensive use could push it over where less stressful tasks do not. If you really need 64GB of ram, 2x32 may work better than 4x16.
|
10)
Message boards :
Problems and Help :
7950x RAM problem
(Message 160921)
Posted 24 days ago by mackerel
What speed is the ram? It is not unusual for there to be stability problems when 4 high speed modules are fitted, even if it is ok with two. This can affect other apps too, but depending on how intensively the ram is used it can take much longer to show.
If this is the case, the workaround is to back off the ram speed a little.
|