Message boards :
Number crunching :
Calculating FFT length
How is FFT length calculated? Is it solely based on the prime being calculated, or are there other external factors, such as hardware, that can affect the FFT being used? And is the choice of FFT dynamic? (I am not asking about fitting a given FFT into cache, which is covered in other threads.)
____________
Werinbert is not prime... or PRPnet keeps telling me so.
Badge score: 2x3 + 5x4 + 5x5 + 5x7 + 1x9 + 3x10 = 125
Michael Goetz (Volunteer moderator, Project administrator)
Joined: 21 Jan 10 | Posts: 14011 | ID: 53948 | Credit: 433,230,186 | RAC: 922,880
> How is FFT length calculated? Is it solely based on the prime being calculated, or are there other external factors, such as hardware, that can affect the FFT being used? And is the choice of FFT dynamic? (I am not asking about fitting a given FFT into cache, which is covered in other threads.)
tl;dr: It's the number and the CPU type, but the details are very complicated.
The number (magnitude, size of the component parts, and I believe the type of number) and the hardware (CPU architecture) both have a role in selecting the FFT size.
The software tries to pick the fastest (usually the smallest) FFT size and FFT type. It guesses what it thinks the best FFT is, and uses that. If it guesses too aggressively and uses an FFT that is too small, a loss of precision will occur due to rounding errors, which will be detected, and all or part of the calculation will be retried with the next larger FFT.
If the guess is not aggressive enough, then a larger (i.e., slower) FFT is used when a faster FFT would have worked.
It works pretty well, but it's not an exact science.
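To make the guess-and-fall-back behaviour concrete, here is a minimal, runnable Python sketch. The FFT length table, the 0.4 round-off limit, and the simulated error model are illustrative assumptions, not gwnum's actual internals.

```python
import random

# Toy sketch of the guess-and-retry behaviour described above. The FFT
# length table, the 0.4 round-off limit and the simulated error model are
# illustrative assumptions, not gwnum's real internals.

FFT_LENGTHS = [256 * 1024, 320 * 1024, 384 * 1024, 512 * 1024]
ROUNDOFF_LIMIT = 0.4

def simulated_roundoff(bits, fft_length):
    # Pretend round-off grows with how tightly bits are packed into each
    # FFT word, plus a little noise; purely a stand-in for the real check.
    packing = bits / fft_length
    return max(0.0, (packing - 14.0) / 10.0) + random.uniform(0.0, 0.05)

def run_test(bits, guess_index=0):
    # Start at the guessed (smallest plausible) FFT and fall back to larger
    # sizes whenever the observed round-off error is too large.
    for fft_length in FFT_LENGTHS[guess_index:]:
        err = simulated_roundoff(bits, fft_length)
        if err < ROUNDOFF_LIMIT:
            return fft_length, err
        # Too aggressive: precision lost, redo the work with a larger FFT.
    raise RuntimeError("even the largest FFT in the table is too small")

fft, err = run_test(bits=5_000_000)
print(f"used FFT length {fft} with max round-off {err:.3f}")
```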
Note that if you're using multithreading, it's even more complicated because the "best" size (i.e., smallest FFT that won't cause an error overflow) is **NOT** deterministic. Because threads will complete in random order, when you're close to an FFT boundary, some orderings of threads completing may produce an error overflow while other orderings won't. Here's an example...
Suppose there are four threads, each producing an error of either +1 or -1, that the maximum allowable error is +2 (+2 is ok, but +3 is a failure), and that three of the threads produce an error of +1 while one thread produces an error of -1. Consider these two orderings of how the four threads complete:
ordering one:
+1 (cumulative error +1, ok)
-1 (cumulative error 0, ok)
+1 (cumulative error +1, ok)
+1 (cumulative error +2, ok)
ordering two:
+1 (cumulative error +1, ok)
+1 (cumulative error +2, ok)
+1 (cumulative error +3, CALCULATION FAILS)
-1 (don't care)
In the second ordering, the result of the calculation has incurred too much rounding error to be considered valid, but if the threads complete in a different order, the calculation is valid.
That's not exactly how the error calculation works, but it's close enough to demonstrate why multithreading makes the rounding errors non-deterministic.
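The short simulation below runs through every distinct completion order of those four contributions and shows that exactly one ordering trips the limit. The numbers are the ones from the example above, not real rounding errors.

```python
from itertools import permutations

# Runs the ordering example above: four thread contributions of +1, +1, +1
# and -1, with a running-total limit of +2. Whether the limit is ever
# exceeded depends only on the completion order.

CONTRIBUTIONS = (+1, +1, +1, -1)
LIMIT = +2

def fails(order):
    total = 0
    for err in order:
        total += err
        if total > LIMIT:   # cumulative error exceeded the allowable limit
            return True
    return False

for order in sorted(set(permutations(CONTRIBUTIONS))):
    print(order, "CALCULATION FAILS" if fails(order) else "ok")
```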
____________
My lucky number is 75898^524288+1
mackerel (Volunteer tester)
Joined: 2 Oct 08 | Posts: 2645 | ID: 29980 | Credit: 568,565,361 | RAC: 198
I don't claim to understand the whole process, but can answer parts of the questions.
FFT size is chosen to optimise performance on the hardware. You want the smallest FFT for the highest performance, but the numerical precision you need scales with the size of the number, so the FFT has to be big enough not to introduce errors into the calculation. Generally speaking, bigger numbers require bigger FFTs, but this is not a precise relationship.
The following applies to Prime95, LLR, PFGW and other implementations using the gwnum library. To my understanding, different FFT sizes may be picked depending on the CPU architecture. Personally I'd have thought one FP64 operation is the same as any other FP64 operation, but apparently there are architecture-specific optimisations that can affect this. The code will pick from different implementations for the best performance, so FFT sizes may be slightly bigger or smaller than on other architectures.
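As a made-up illustration of that point: gwnum does ship instruction-set-specific FFT code, but the per-architecture length tables below (and the sse2/avx/avx-512 labels as used here) are invented for the example, reusing the same crude bits-per-word heuristic as the earlier sketch. The point is only that the same number can land on a different FFT length depending on which table the architecture provides.

```python
# Illustrative only: different instruction sets ship different FFT kernels,
# so the available (and fastest) lengths can differ between architectures.
# These tables are made up for the example; they are not gwnum's real data.

FFT_TABLES = {
    "sse2":    [384 * 1024, 448 * 1024, 512 * 1024],
    "avx":     [384 * 1024, 400 * 1024, 448 * 1024, 512 * 1024],
    "avx-512": [384 * 1024, 416 * 1024, 448 * 1024, 512 * 1024],
}

def smallest_usable_fft(arch, bits, max_bits_per_word=17.5):
    # Same crude bits-per-word packing heuristic as in the earlier sketch.
    for fft_length in FFT_TABLES[arch]:
        if bits / fft_length <= max_bits_per_word:
            return fft_length
    return None

for arch in FFT_TABLES:
    print(arch, smallest_usable_fft(arch, bits=7_000_000))
```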
As for the FFT size being dynamic, it can change during a test. Because of the balance between precision and performance, the best transition point cannot be calculated exactly. The software can detect rounding errors, and if they happen it will repeat the calculation from the last checkpoint. It may first repeat at the same FFT size, to check whether it was a transient error such as one caused by bad hardware. If the error is repeatable, it may then increase the FFT size.
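A toy sketch of that recovery loop follows, under the same illustrative assumptions as the earlier sketches; the error model, chunk count and 0.4 limit are made up and are not LLR/gwnum code.

```python
import random

# Toy sketch of the recovery behaviour described above: on a round-off
# error, the chunk since the last checkpoint is first retried at the same
# FFT size (the error might be transient, e.g. flaky hardware); only if it
# repeats is the FFT size increased. Error rates, chunk counts and the 0.4
# limit are illustrative assumptions.

FFT_LENGTHS = [320 * 1024, 384 * 1024, 448 * 1024]
LIMIT = 0.4

def run_chunk(fft_index):
    # Stand-in for computing one checkpoint's worth of iterations and
    # measuring the worst round-off error: larger FFTs give smaller errors.
    return max(0.0, 0.45 - 0.1 * fft_index + random.uniform(-0.1, 0.1))

def run_test(chunks=10):
    fft_index = 0
    for chunk in range(chunks):
        err = run_chunk(fft_index)
        if err >= LIMIT:
            # Retry the same chunk at the same FFT size first.
            err = run_chunk(fft_index)
            if err >= LIMIT:
                # Reproducible error: increase the FFT size and redo the chunk.
                fft_index += 1
                err = run_chunk(fft_index)
        print(f"chunk {chunk}: FFT {FFT_LENGTHS[fft_index]}, round-off {err:.2f}")

run_test()
```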
Thank you both.
I find the variations in PG tasks to be small and more or less consistent, which is great for projecting long-term credit and hourly goals. SRBase tasks, on the other hand, seem to have FFTs that are all over the place, which in turn makes for inconsistent credit/hr calculations.
____________
Werinbert is not prime... or PRPnet keeps telling me so.
Badge score: 2x3 + 5x4 + 5x5 + 5x7 + 1x9 + 3x10 = 125
The reason for the FFT inconsistency lies in the bases that LLR faces with SRBase, from base 2 all the way up to base 1030, which complicates LLR's FFT selection.
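As a rough, purely illustrative sketch of why mixed bases spread the FFT lengths out: for numbers of the form k*b^n+1 the size scales with n*log2(b), so bases from 2 up to 1030 differ by a factor of roughly ten in size for the same n. The FFT table and packing limit below are the same invented assumptions as in the earlier sketches, not LLR's actual selection logic.

```python
import math

# Rough illustration (not LLR's actual logic): the bit length of k*b^n+1
# scales with n*log2(b), so a mix of bases gives widely varying sizes and
# therefore widely varying FFT lengths for similar n. The FFT table and the
# 17.5 bits-per-word limit are the same illustrative assumptions as above.

FFT_LENGTHS = [64 * 1024, 128 * 1024, 256 * 1024, 512 * 1024, 1024 * 1024]

def bits(k, b, n):
    return math.log2(k) + n * math.log2(b)

def rough_fft(k, b, n, max_bits_per_word=17.5):
    need = bits(k, b, n)
    return next((L for L in FFT_LENGTHS if need / L <= max_bits_per_word), None)

for base in (2, 3, 10, 251, 1030):
    size_mbit = bits(5, base, 1_000_000) / 1e6
    print(f"base {base:4d}: ~{size_mbit:.2f} Mbit -> FFT {rough_fft(5, base, 1_000_000)}")
```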
____________
My lucky number is 6219*2^3374198+1