1) Message boards : Aggie The Pew message board (Message 153482)
Posted 6 days ago by mackerel Project donor
Seeing around 80% firsts on PPSE just running 1 task per core so I guess I'll keep going for a bit.
2) Message boards : Aggie The Pew message board (Message 153478)
Posted 6 days ago by mackerel Project donor
On the challenge it would have been nice to personally be a position higher.

I had gone off PPSE since I like that PPS has no race to be 1st, even if the bigger primes mean finds are much less likely. Still, I'll move some power to PPSE again and see how it goes.
3) Message boards : Number crunching : 8 AVX-512 units (Message 153304)
Posted 10 days ago by mackerel Project donor
Since I wasn't running all cores, not sure power would be that indicative. I did conclude in past testing that the Rocket Lake system had near enough the same perf/W as the Comet Lake system. Where power was higher, so was the perf.

The Rocket Lake system I've power limited to 95W and it was running off that, with 4/8 cores in use. Don't know about the Comet Lake system without re-running it. The Skylake-X system has never reported power correctly so I don't have numbers for that.

I've pretty much given up on buying Alder Lake myself. While CPUs are ok, the mobo pricing is on the high side, and with the reported killing of unofficial AVX-512 functionality it would take away a possible use case. Also I refuse to buy another DDR4 mobo, and DDR5 ram is rarer than a GPU.
4) Message boards : Number crunching : 8 AVX-512 units (Message 153301)
Posted 11 days ago by mackerel Project donor
After I wrote the above, I did think, am I answering the right question? I was looking at it from an architecture viewpoint. But we don't buy architectures, we buy products.

SKX 7920X 402s @ 2.9 GHz = 291s @ 4.0 GHz (+56% rel. RKL, +90% rel. CML)
CML 10600k 491s @ 4.5 GHz = 552s @ 4.0 GHz
RKL 11700k 395s @ 4.6 GHz* = 454s @ 4.0 GHz (+22% rel. CML)
RKL 11700k 395s @ 4.4 GHz* = 434s @ 4.0 GHz

Again, the Rocket Lake results use an eyeball average of the clock, which varied a lot due to me setting a low power limit. I was running a limited number of tasks in all these cases. The 7920X system runs at 2.9 GHz whenever there is AVX-512 code going, so that is predictable. The Comet Lake system was running slightly below full load (I set 45% in BOINC) so it may or may not have been running at a slightly higher clock than if all cores were used. The Rocket Lake system was running at 25% so it may have boosted more than if all cores were loaded.

Still, based on what I've seen here, I wouldn't worry about running AVX-512 even on single unit CPUs, but if you do want to ensure you're getting the most then it might be worth manually testing. If you don't want to go through the pain of editing BOINC files, you might be able to do something similar with the Prime95 benchmark, although I think you still have to edit a config file to disable AVX-512 for comparison. You can disable AVX-512 in the torture test, but I don't think that option is present in benchmark mode.

*as before, the Rocket Lake clocks varied a lot during the run so this may not be very accurate. I went with 4.6 GHz average, but also presented a 4.4 GHz average for indication.
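
For anyone wanting to redo the normalisation with their own timings, here is a minimal Python sketch of the arithmetic. It assumes, as the figures above do, that run time scales inversely with clock speed, which is only a rough approximation. The function names are made up for illustration.

```python
# Normalise measured run times to a common clock, assuming time scales
# inversely with clock speed (an approximation, not a measured law).

def normalise_time(seconds, clock_ghz, target_ghz=4.0):
    """Estimated run time at target_ghz, given a time measured at clock_ghz."""
    return seconds * clock_ghz / target_ghz

def speedup_percent(baseline_s, other_s):
    """How much faster (in percent) 'other' is than 'baseline'."""
    return (baseline_s / other_s - 1) * 100

# (measured seconds, running clock in GHz) as quoted in the post
runs = {
    "SKX 7920X": (402, 2.9),
    "CML 10600k": (491, 4.5),
    "RKL 11700k": (395, 4.6),  # clock is an eyeball average
}

normalised = {name: normalise_time(t, c) for name, (t, c) in runs.items()}

for name, seconds in normalised.items():
    vs_cml = speedup_percent(normalised["CML 10600k"], seconds)
    print(f"{name}: {seconds:.0f}s @ 4.0 GHz ({vs_cml:+.0f}% rel. CML)")

# Sensitivity check: a lower assumed Rocket Lake clock makes it look faster.
for clock in (4.6, 4.4):
    rkl = normalise_time(395, clock)
    print(f"RKL @ {clock} GHz: {speedup_percent(normalised['CML 10600k'], rkl):+.0f}% rel. CML")
```

The last loop shows why the Rocket Lake number deserves less confidence: the result moves by several percent depending on which average clock is assumed.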
5) Message boards : Number crunching : 8 AVX-512 units (Message 153298)
Posted 11 days ago by mackerel Project donor
Ok, ran a handful of tasks on Comet Lake (no AVX-512), Rocket Lake (1 unit), and Skylake-X (2 unit). After normalising for their running clock, Rocket Lake was 22% faster than Comet Lake, and Skylake-X was 90% faster than Comet Lake.

So the numbers vary a bit from what I wrote earlier, which may be due to many reasons, but as a ballpark they show that a 1 unit implementation still has a benefit and does not show a regression in this case, and 2 unit clearly gives a big boost.

It should be noted that my Rocket Lake system is the only one I'm running on a low power limit as I'm limited in cooling potential. The clocks were varying a lot during the run, and I did an "eyeball average" of active cores at the time. If the average clock was actually lower, its relative performance would be higher. So take it with lower confidence than the other two.
6) Message boards : Number crunching : 8 AVX-512 units (Message 153295)
Posted 11 days ago by mackerel Project donor
I think someone at Intel messed up that entry. CPUs with AVX-512 have either 1 or 2 units per core, and that is what used to be shown in Ark historically.

That CPU has 8 cores, and someone added it all up to get 8 units. It is 1 unit per core, and will perform somewhat better than no AVX-512, but not as well as 2 unit AVX-512 implementations.

I forget the exact number, but I think I saw about 40% on Rocket Lake (1 unit) and 80% on Skylake-X (2 unit), relative to Skylake. That's normalised for cores and clocks, and for situations where there are no other limiting factors such as ram bandwidth. As such it can be seen as a best case. This was based on Prime95 benchmarks.

I think I've run small LLR units and saw great performance, but don't recall as it was a while ago and I don't run that system often on PrimeGrid nowadays.

Edit: I have to know! Getting ready to run some SGS units on 0, 1, 2 unit AVX-512 systems now.
7) Message boards : Aggie The Pew message board (Message 153291)
Posted 11 days ago by mackerel Project donor
Put a 980 Ti on for pre-testing. It's running at 70% power limit. My 1st rate is... not good! The other GPUs I'll add for the event itself will be much faster.
8) Message boards : Aggie The Pew message board (Message 152534)
Posted 51 days ago by mackerel Project donor
Forgot this was on. Just put on two GPUs, possibly one more part time later.
9) Message boards : Aggie The Pew message board (Message 152390)
Posted 63 days ago by mackerel Project donor
Unfortunately my GTX 1080 is out of action due to a fried mobo or otherwise my throughput would have been slightly more than double. I am having a difficult time locating a replacement mobo for my i7 7700k.

Given its age, is it worth replacing? The 7700k still sells well on the used market, so maybe pick up a newer CPU and mobo instead?

I do have a spare Z170 board, but by the time you factor in postage and customs fees, it is unlikely to make any sense.
10) Message boards : Number crunching : Calculating FFT length (Message 152336)
Posted 67 days ago by mackerel Project donor
I don't claim to understand the whole process, but can answer parts of the questions.

FFT size is chosen to optimise performance on the hardware. You want the smallest FFT for highest performance, but the numerical precision scales with size. So the FFT has to be big enough not to introduce errors in calculation. Generally speaking, bigger numbers require bigger FFTs, but this is not a precise thing.
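
To illustrate the size versus precision trade-off, here is a toy Python sketch. It is a plain textbook floating-point FFT squaring, not what gwnum actually does, and the function names are made up. Packing more bits into each FFT element gives a shorter (faster) FFT, but the results land further from exact integers, and that distance is the rounding error the software has to keep under control.

```python
import cmath
import random

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def square_with_fft(x, bits_per_word):
    """Square x with a floating-point FFT.

    Returns (product, max roundoff error, FFT length used).
    """
    mask = (1 << bits_per_word) - 1
    words = []
    while x:
        words.append(x & mask)
        x >>= bits_per_word
    # Smallest power of two that can hold the full product.
    fft_len = 1
    while fft_len < 2 * len(words):
        fft_len *= 2
    words += [0] * (fft_len - len(words))
    spectrum = fft([complex(w) for w in words])
    squared = fft([s * s for s in spectrum], invert=True)
    product, max_err = 0, 0.0
    for i, c in enumerate(squared):
        value = c.real / fft_len
        nearest = round(value)
        # Distance from the nearest integer is the roundoff error.
        max_err = max(max_err, abs(value - nearest))
        product += nearest << (bits_per_word * i)
    return product, max_err, fft_len

if __name__ == "__main__":
    random.seed(1)
    x = random.getrandbits(4096)
    for bits in (8, 12, 16):
        product, err, fft_len = square_with_fft(x, bits)
        print(f"{bits} bits/word, FFT length {fft_len}: "
              f"max roundoff {err:.3g}, correct={product == x * x}")
```

Pushing the bits per element too far eventually makes the rounding wrong and the product incorrect, which is why the FFT can't simply be made as small as possible.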

The following applies to Prime95, LLR, PFGW and other implementations using the gwnum library. To my understanding, depending on the CPU architecture, different FFT sizes may be picked. Personally I'd think an FP64 operation is the same as any other FP64 operation, but apparently there are architecture specific optimisations that can affect that. The code will pick from different implementations for best performance. FFT sizes may be slightly bigger or smaller than on other architectures due to this.

As for FFT size being dynamic, it can change during a test. Because of the required precision and performance balance, the best transition point may not be exactly calculated. The software can detect rounding errors, and if they happen, it'll repeat the calculation from the last checkpoint. It may first repeat at the same FFT size to check if it might be a transient error, such as bad hardware. If the error is repeatable, then it may increase FFT size.
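
The retry behaviour described above can be sketched roughly like this in Python. This is only a model of the logic as I understand it, not the actual code in LLR or gwnum. The `run_block` callback, the 0.4 error limit, and the choice to stay on the larger FFT for the rest of the test are all assumptions for illustration.

```python
ROUNDOFF_LIMIT = 0.4  # assumed threshold; real software may use another value

def run_with_fft_fallback(num_blocks, fft_sizes, run_block):
    """Process blocks of iterations, growing the FFT on repeatable errors.

    run_block(block, fft_size) is a hypothetical callback that redoes one
    block from its checkpoint and returns the max roundoff error seen.
    fft_sizes is a list of candidate sizes, smallest first.
    Returns (final FFT size, log of (block, fft_size, outcome)).
    """
    size_idx = 0
    block = 0
    retried = False
    log = []
    while block < num_blocks:
        fft_size = fft_sizes[size_idx]
        err = run_block(block, fft_size)
        if err <= ROUNDOFF_LIMIT:
            log.append((block, fft_size, "ok"))
            block += 1        # block is good, so the checkpoint moves on
            retried = False
        elif not retried:
            # First failure: repeat at the same size in case it was transient.
            log.append((block, fft_size, "retry"))
            retried = True
        else:
            # Same block failed twice at this size: treat as repeatable.
            if size_idx + 1 >= len(fft_sizes):
                raise RuntimeError("roundoff error persists at largest FFT size")
            log.append((block, fft_size, "grow"))
            size_idx += 1
            retried = False
    return fft_sizes[size_idx], log

if __name__ == "__main__":
    # Simulated test where the smaller FFT is not precise enough from block 1 on.
    def too_small(block, fft_size):
        return 0.45 if (block >= 1 and fft_size < 1152) else 0.1

    final_size, log = run_with_fft_fallback(3, [1024, 1152], too_small)
    print(final_size, log)
```

A transient error (bad hardware, say) shows up in the log as a single "retry" followed by "ok" at the same size, while a genuine precision problem shows "retry" then "grow".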

