Message boards : Number crunching : Calculating FFT length

Werinbert

Joined: 9 Jun 13
Posts: 163
ID: 233452
Credit: 363,048,523
RAC: 284,771

Message 152334 - Posted: 21 Nov 2021 | 12:24:35 UTC

How is FFT length calculated? Is it solely based on the prime being calculated, or are there other external factors, such as hardware, that can affect the FFT being used? And is the choice of FFT dynamic? (I am not asking about fitting a given FFT into cache, which is covered in other threads.)
____________
Werinbert is not prime... or PRPnet keeps telling me so.
Badge score: 12x3 + 1x4 + 2x6 + 2x7 + 1x8 + 1x9 + 1x10 = 93

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 13787
ID: 53948
Credit: 345,142,938
RAC: 16,195

Message 152335 - Posted: 21 Nov 2021 | 13:24:33 UTC - in response to Message 152334.

How is FFT length calculated? Is it solely based on the prime being calculated, or are there other external factors, such as hardware, that can affect the FFT being used? And is the choice of FFT dynamic? (I am not asking about fitting a given FFT into cache, which is covered in other threads.)

tl;dr: It's the number and the CPU type, but the details are very complicated.

The number (magnitude, size of the component parts, and I believe the type of number) and the hardware (CPU architecture) both have a role in selecting the FFT size.

The software tries to pick the fastest (usually the smallest) FFT size and FFT type. It guesses what it thinks the best FFT is, and uses that. If it guesses too aggressively and uses an FFT that is too small, a loss of precision will occur due to rounding errors, which will be detected, and all or part of the calculation will be retried with the next larger FFT.

If the guess is not aggressive enough, then a larger (i.e., slower) FFT is used when a faster FFT would have worked.

It works pretty well, but it's not an exact science.
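The selection-and-escalation strategy described above can be sketched roughly as follows. This is an illustrative Python sketch, not gwnum's actual code; the names (`candidate_sizes`, `run_iteration`, `MAX_ROUNDOFF`) and the threshold value are assumptions for demonstration only.

```python
# Hypothetical sketch: try the smallest candidate FFT first, and fall back
# to the next larger size whenever the measured round-off error exceeds the
# allowed threshold. All names and values here are illustrative.

MAX_ROUNDOFF = 0.4  # assumed threshold; real libraries flag errors near 0.5


def run_iteration(fft_size):
    """Stand-in for one multiplication pass; returns a round-off error.

    Smaller FFTs pack more bits per word, so error grows as size shrinks.
    """
    return 0.45 if fft_size < 640 else 0.1


def pick_and_run(candidate_sizes):
    # Smallest first: the smallest FFT that stays within precision is
    # usually also the fastest.
    for size in sorted(candidate_sizes):
        err = run_iteration(size)
        if err <= MAX_ROUNDOFF:
            return size
        # Too much rounding error: retry with the next larger FFT.
    raise RuntimeError("no FFT size met the precision requirement")


print(pick_and_run([512, 640, 768]))  # the 512 guess fails, 640 succeeds
```

In this toy model the aggressive guess (512) overflows the error bound and the code silently escalates to 640, mirroring the "guess, detect, retry larger" behavior described above.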

Note that if you're using multithreading, it's even more complicated because the "best" size (i.e., smallest FFT that won't cause an error overflow) is **NOT** deterministic. Because threads will complete in random order, when you're close to an FFT boundary, some orderings of threads completing may produce an error overflow while other orderings won't. Here's an example...

Suppose there are four threads, each producing an error of either +1 or -1, and that the maximum allowable cumulative error is +2 (+2 is ok, but +3 is failure). Say three of the threads produce an error of +1 while one thread produces an error of -1. Consider these two orderings of how the four threads complete:

ordering one:
+1 (cumulative error +1, ok)
-1 (cumulative error 0, ok)
+1 (cumulative error +1, ok)
+1 (cumulative error +2, ok)

ordering two:
+1 (cumulative error +1, ok)
+1 (cumulative error +2, ok)
+1 (cumulative error +3, CALCULATION FAILS)
-1 (don't care)

In the second ordering, the result of the calculation has incurred too much rounding error to be considered valid, but if the threads complete in a different order, the calculation is valid.

That's not exactly how the error calculation works, but it's close enough to demonstrate why multithreading makes the rounding errors non-deterministic.
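The two orderings above can be checked mechanically. This is a toy simulation of the simplified error model in the post (per-thread errors of ±1, cumulative bound of +2), not the real error calculation:

```python
from itertools import permutations

MAX_ERROR = 2             # cumulative +2 is ok, +3 fails (from the example)
errors = [+1, +1, +1, -1]  # per-thread error contributions


def run_ordering(order):
    """Return True if this completion order stays within the error bound."""
    cumulative = 0
    for e in order:
        cumulative += e
        if cumulative > MAX_ERROR:
            return False   # calculation fails at this point
    return True


results = [run_ordering(p) for p in set(permutations(errors))]
# Some orderings pass and some fail, so whether the run overflows
# depends purely on the (random) order in which threads complete.
print(any(results), all(results))  # prints: True False
```

Every ordering in which the -1 thread finishes anywhere before the last +1 thread succeeds; only the ordering where the three +1 threads all finish first fails, which is exactly the nondeterminism described above.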
____________
My lucky number is 75898524288+1

mackerel
Volunteer tester

Joined: 2 Oct 08
Posts: 2583
ID: 29980
Credit: 550,189,840
RAC: 19,528

Message 152336 - Posted: 21 Nov 2021 | 13:31:28 UTC - in response to Message 152334.

I don't claim to understand the whole process, but can answer parts of the questions.

FFT size is chosen to optimise performance on the hardware. You want the smallest FFT for highest performance, but the numerical precision scales with size. So the FFT has to be big enough not to introduce errors in calculation. Generally speaking, bigger numbers require bigger FFTs, but this is not a precise thing.

The following applies to Prime95, LLR, PFGW and other implementations using the gwnum library. To my understanding, different FFT sizes may be picked depending on the CPU architecture. Personally I'd think an FP64 operation is the same as any other FP64 operation, but apparently there are architecture-specific optimisations that can affect that. The code will pick from different implementations for best performance, so FFT sizes may be slightly bigger or smaller on one architecture than on another.

As for FFT size being dynamic, it can change during a test. Because of the balance between precision and performance, the best transition point may not be known exactly in advance. The software can detect rounding errors, and if one occurs, it will repeat the calculation from the last checkpoint. It may first repeat at the same FFT size to check whether the error was transient, such as one caused by marginal hardware. If the error is repeatable, then it may increase the FFT size.
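That recovery strategy can be sketched as follows. This is an illustrative outline, not gwnum's actual logic; the function names and the doubling step are assumptions (real candidate FFT sizes are not simple doublings):

```python
# Hypothetical sketch of the recovery strategy described above: on a
# rounding error, first re-run the same segment at the same FFT size to
# rule out a transient (e.g. hardware) fault, and only move to a larger
# FFT if the error reproduces. Names and the doubling step are illustrative.

def resume_from_checkpoint(fft_size, segment_errors):
    """segment_errors(size) -> True if the segment hit a rounding error."""
    if not segment_errors(fft_size):
        return fft_size                 # no error: keep the current size
    if not segment_errors(fft_size):    # same size again: transient check
        return fft_size                 # error did not reproduce: transient
    return fft_size * 2                 # repeatable: escalate to larger FFT


# A segment that always errors at size 512 but never above it:
flaky = lambda size: size <= 512

print(resume_from_checkpoint(512, flaky))   # repeatable error: escalates
print(resume_from_checkpoint(1024, flaky))  # clean run: size unchanged
```

The key design point is the extra same-size retry: it distinguishes a hardware glitch (don't change anything) from a genuinely too-small FFT (grow it), so the program doesn't permanently slow down over a one-off fault.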

Werinbert

Joined: 9 Jun 13
Posts: 163
ID: 233452
Credit: 363,048,523
RAC: 284,771

Message 152337 - Posted: 21 Nov 2021 | 19:30:20 UTC

Thank you both.
I find the variations in PG tasks to be small and more or less consistent, which is great for projecting long-term credit and hourly goals. SRBase tasks, on the other hand, seem to have FFTs that are all over the place, which in turn makes for inconsistent credit/hr calculations.

____________
Werinbert is not prime... or PRPnet keeps telling me so.
Badge score: 12x3 + 1x4 + 2x6 + 2x7 + 1x8 + 1x9 + 1x10 = 93

dannyridel
Volunteer tester

Joined: 3 Feb 19
Posts: 962
ID: 1097922
Credit: 32,465,708
RAC: 42,022

Message 152340 - Posted: 22 Nov 2021 | 0:37:31 UTC - in response to Message 152337.