Other

drummers-lowrise

Message boards : General discussion : an idea for speeding up LLR testing

Author Message
composite
Volunteer tester

Joined: 16 Feb 10
Posts: 1002
ID: 55391
Credit: 879,366,379
RAC: 316,449

Message 150846 - Posted: 18 Jul 2021 | 9:44:10 UTC

How about testing 2 candidate primes at the same time for most of the LLR iterations?

Since we are repeating the same calculations for different candidates, say X and Y, just with different modulii and different numbers of iterations, then use a common calculation for (X-1) iterations with the modulus which is the product of the candidates (X*Y).
At iteration X compute the LLR residue for candidate X as A (mod X), and replace the intermediate result A with A (mod Y), then continue as normally until iteration Y with A (mod Y).

Of course, we will be doing (X-1) iterations with a modulus twice as long, plus 1 additional (normal length) modulus operation.
But if the modulus code is faster than the rest of the code in each iteration, then we have a speedup by using the slower code just Y times for each 2 candidates instead of (X+Y) times.
There's a net speedup if the overhead of multiplying the candidates X and Y together, plus 1 additional normal length modulus operation, plus the extra time of using the longer modulus for (X-1) iterations,
is less than the time saved by not repeating the first X iterations of the slower code for candidate Y.

i.e.
A(n+1) = An2-2 (mod X*Y)
At iteration X instead taking AX (mod X*Y)
compute LLR residue for candidate X = AX (mod X)
and replace AX with AX (mod Y)
then continue as normal until iteration Y with
A(n+1) = An2-2 (mod Y)

MiauiKatze

Joined: 4 Apr 20
Posts: 22
ID: 1251453
Credit: 1,168,485
RAC: 0

Message 150933 - Posted: 23 Jul 2021 | 12:34:31 UTC

I'm not an expert, but it sounds logical to me. But I doubt it is really that easy to implement. And does the modulo function really behave like this?

A math expert should look at it. What I wrote are just my 2 cents (or maybe even only one :D )

Best regards,
MiauiKatze
____________
--------
"MiauiKatze" is german and means as much as "mewing cat"!
--------

Yves Gallot
Volunteer developer
Project scientist

Joined: 19 Aug 12
Posts: 705
ID: 164101
Credit: 305,166,630
RAC: 8

Message 150934 - Posted: 23 Jul 2021 | 14:32:36 UTC - in response to Message 150846.

Let X ~ Y be two n-bit numbers.
The computation time for one mul modulo X is M(X) = K * n * log(n).
If Z = X * Y, the computation time for one mul modulo X*Y is M(X*Y) = K * 2*n * log(2*n).

With 2 LLR, the computation time is 2 * n * M(X) = 2 * K * n^2 * log(n).
With your algorithm, it is n * M(X*Y) = 2 * K * n^2 * log(2*n).

That's not faster.

Pavel Atnashev

Joined: 11 Aug 17
Posts: 57
ID: 914937
Credit: 3,943,381,807
RAC: 1,256,394

Message 150965 - Posted: 27 Jul 2021 | 6:49:30 UTC - in response to Message 150934.

Not to mention twofold increase in cache and memory bandwidth requirements.

The first rule: one should keep FFT size as small as possible.

Message boards : General discussion : an idea for speeding up LLR testing