PrimeGrid

Message boards : Number crunching : Running GPU slows down CPU a lot

Michael Goetz
Project donor
Volunteer moderator
Project administrator
Joined: 21 Jan 10
Posts: 13396
ID: 53948
Credit: 229,001,509
RAC: 166,186
Message 121226 - Posted: 19 Oct 2018 | 14:37:21 UTC
Last modified: 19 Oct 2018 | 14:38:31 UTC

Right now, I'm running Cullen tasks on my CPU and GFN-17-Mega on the GPU.

System is a Haswell i5-4670K (4c/4t) with faster-than-stock DDR3 memory, and a GTX 1060 6GB GPU.

If I'm not running the GPU, Cullen tasks take about 12 hours, and when I am running the GPU they take about 16 hours. The LLR tasks are running -t4.

That's a pretty big performance hit.

I did some standalone benchmarks with the GPU running a 17-Mega.

With the GPU running:

C:\bench>cllr64.3.8.21.exe -t4 -d -q"16944075*2^16944075+1"
Starting Proth prime test of 16944075*2^16944075+1
Using zero-padded FMA3 FFT length 1792K, Pass1=448, Pass2=4K, 4 threads, a = 11
16944075*2^16944075+1, bit: 20000 / 16944099 [0.11%]. Time per bit: 3.258 ms.
Caught signal. Terminating.

C:\bench>del z*

C:\bench>cllr64.3.8.21.exe -t3 -d -q"16944075*2^16944075+1"
Starting Proth prime test of 16944075*2^16944075+1
Using zero-padded FMA3 FFT length 1792K, Pass1=448, Pass2=4K, 3 threads, a = 11
16944075*2^16944075+1, bit: 20000 / 16944099 [0.11%]. Time per bit: 3.450 ms.
Caught signal. Terminating.


Same test, this time with the GPU idle:

C:\bench>cllr64.3.8.21.exe -t4 -d -q"16944075*2^16944075+1"
Starting Proth prime test of 16944075*2^16944075+1
Using zero-padded FMA3 FFT length 1792K, Pass1=448, Pass2=4K, 4 threads, a = 11
16944075*2^16944075+1, bit: 20000 / 16944099 [0.11%]. Time per bit: 2.778 ms.
Caught signal. Terminating.

C:\bench>del z*

C:\bench>cllr64.3.8.21.exe -t3 -d -q"16944075*2^16944075+1"
Starting Proth prime test of 16944075*2^16944075+1
Using zero-padded FMA3 FFT length 1792K, Pass1=448, Pass2=4K, 3 threads, a = 11
16944075*2^16944075+1, bit: 20000 / 16944099 [0.11%]. Time per bit: 3.196 ms.
Caught signal. Terminating.


Notice that the idle-GPU 3-thread test is faster than the busy-GPU 4-thread test. Running the GPU costs the LLR task more than a full core's worth of performance.
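To put numbers on that comparison, here is a small Python sketch that computes the relative slowdowns from the time-per-bit figures quoted above. The dictionary layout and the `slowdown_pct` helper are illustrative names only; they are not part of LLR or any PrimeGrid tool.

```python
# Time per bit (ms) copied from the four benchmark runs above,
# keyed by (GPU state, LLR thread count).
times_ms = {
    ("busy", 4): 3.258,
    ("busy", 3): 3.450,
    ("idle", 4): 2.778,
    ("idle", 3): 3.196,
}

def slowdown_pct(slow, fast):
    """Percentage by which `slow` exceeds `fast`."""
    return (slow / fast - 1.0) * 100.0

for threads in (4, 3):
    pct = slowdown_pct(times_ms[("busy", threads)], times_ms[("idle", threads)])
    print(f"-t{threads}: {pct:.1f}% slower with the GPU busy")

# Whole-task run times quoted earlier: about 12 h with the GPU idle, about 16 h with it busy.
print(f"Cullen task: {slowdown_pct(16, 12):.1f}% slower")

# The idle-GPU 3-thread run beats the busy-GPU 4-thread run.
print(times_ms[("idle", 3)] < times_ms[("busy", 4)])
```

On these figures the busy-GPU penalty works out to roughly 17% at -t4 and 8% at -t3 in the short benchmark, against roughly 33% for the whole-task run times.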

However, when looking at Task Manager, the Genefer task is only using about 1% of the CPU (4% of a thread), and PrecisionX is using about 5% of the CPU (25% of a thread). In total, the GPU software is only using about 30% of a thread's CPU cycles, and yet it's effectively slowing down the CPU by more than a full thread. Something else is happening here.

While I don't have the tools necessary to definitively see what's happening, I suspect it's our old bugaboo, memory bandwidth.

GPU programs spend a lot of time transferring data between main memory and video memory. Those transfers don't use much (or any) CPU cycles, but they do consume memory bandwidth. And, of course, LLR -- especially on large numbers -- uses a lot of memory bandwidth. It appears as if running the GPU has a significant effect on LLR performance above and beyond the use of a CPU core.

This has serious ramifications for all of us, specifically:

* Running the GPU will slow down LLR even if you leave a full core free.

* Running the GPU will slow down LLR even if you have hyperthreading enabled to service the GPU.

Of course, your mileage will vary. I suspect that this effect varies significantly with the size of the number being tested. The GPU effect is probably much more pronounced with SoB than with SGS. How fast your memory is will make a big difference. What type of memory you have and the number of memory channels will probably also have a significant effect.

Is anyone else seeing similar results?
____________
My lucky number is 75898^524288+1

Penguin
Project donor
Joined: 14 Sep 12
Posts: 595
ID: 172547
Credit: 1,526,854,568
RAC: 1,934,808
Message 121229 - Posted: 19 Oct 2018 | 15:24:43 UTC - in response to Message 121226.

Is anyone else seeing similar results?


Absolutely. On any project, running any sort of GPU app will slow down the CPU tasks significantly.

I notice that on my AMD FX 8350 running a 1060.

Penguin
Project donor
Joined: 14 Sep 12
Posts: 595
ID: 172547
Credit: 1,526,854,568
RAC: 1,934,808
Message 121230 - Posted: 19 Oct 2018 | 15:31:57 UTC - in response to Message 121226.

All those calculations have to be done somewhere, and there are a lot of bits moving around in memory. That slows things down.

robish
Project donor
Volunteer moderator
Joined: 7 Jan 12
Posts: 1683
ID: 126266
Credit: 4,815,925,221
RAC: 4,112,261
Message 121231 - Posted: 19 Oct 2018 | 15:34:11 UTC

I've noticed it too, Michael.

I don't have metrics, but I always thought it was just me doing something wrong.

However, it doesn't seem to matter. All PCs seem to average 50% firsts regardless. :)
____________
My lucky numbers 1059094^1048576+1 and 224584605939537911+81292139*23#*n for n=0..26

Rafael
Volunteer tester
Joined: 22 Oct 14
Posts: 870
ID: 370496
Credit: 324,597,363
RAC: 147,209
Message 121232 - Posted: 19 Oct 2018 | 15:34:56 UTC - in response to Message 121226.

Is anyone else seeing similar results?


i5-4590 @ 3.5 GHz with dual-channel, dual-rank, 2133 MHz RAM. I can confirm a big time loss:

Idle
C:\Users\Rafale\Desktop\Arquivos\PrimeGrid\LLR>cllr64.exe -t4 -d -q"16944075*2^16944075+1"
Starting Proth prime test of 16944075*2^16944075+1
Using zero-padded FMA3 FFT length 1792K, Pass1=448, Pass2=4K, 4 threads, a = 11
16944075*2^16944075+1, bit: 20000 / 16944099 [0.11%]. Time per bit: 2.796 ms.
Caught signal. Terminating.

C:\Users\Rafale\Desktop\Arquivos\PrimeGrid\LLR>cllr64.exe -t3 -d -q"16944075*2^16944075+1"
Resuming Proth prime test of 16944075*2^16944075+1 at bit 20929 [0.12%]
Using zero-padded FMA3 FFT length 1792K, Pass1=448, Pass2=4K, 3 threads, a = 11
16944075*2^16944075+1, bit: 40000 / 16944099 [0.23%]. Time per bit: 3.282 ms.
Caught signal. Terminating.


GPU running
C:\Users\Rafale\Desktop\Arquivos\PrimeGrid\LLR>cllr64.exe -t4 -d -q"16944075*2^16944075+1"
Resuming Proth prime test of 16944075*2^16944075+1 at bit 81222 [0.47%]
Using zero-padded FMA3 FFT length 1792K, Pass1=448, Pass2=4K, 4 threads, a = 11
16944075*2^16944075+1, bit: 100000 / 16944099 [0.59%]. Time per bit: 3.072 ms.
Caught signal. Terminating.

C:\Users\Rafale\Desktop\Arquivos\PrimeGrid\LLR>cllr64.exe -t3 -d -q"16944075*2^16944075+1"
Resuming Proth prime test of 16944075*2^16944075+1 at bit 101305 [0.59%]
Using zero-padded FMA3 FFT length 1792K, Pass1=448, Pass2=4K, 3 threads, a = 11
16944075*2^16944075+1, bit: 120000 / 16944099 [0.70%]. Time per bit: 3.535 ms.
Caught signal. Terminating.


Special attention to the GPU-running times. My t3 is a tad slower, but that's expected for a lower-clocked CPU, especially if your 4670K is overclocked. However, look at the t4: it's considerably faster. Assuming you're running slower RAM (2133 is pretty good for DDR3), this could easily be the cause; assuming the speed is similar, this could be the result of a fluke in the test.


On the flip side, N=21 manual sieving seems to be unaffected for the most part.
C:\Users\Rafale\Desktop\Arquivos\PrimeGrid\LLR>cllr64.exe -t4 -d -q"16944075*2^16944075+1"
Resuming Proth prime test of 16944075*2^16944075+1 at bit 40736 [0.24%]
Using zero-padded FMA3 FFT length 1792K, Pass1=448, Pass2=4K, 4 threads, a = 11
16944075*2^16944075+1, bit: 60000 / 16944099 [0.35%]. Time per bit: 2.880 ms.
Caught signal. Terminating.

C:\Users\Rafale\Desktop\Arquivos\PrimeGrid\LLR>cllr64.exe -t3 -d -q"16944075*2^16944075+1"
Resuming Proth prime test of 16944075*2^16944075+1 at bit 60594 [0.35%]
Using zero-padded FMA3 FFT length 1792K, Pass1=448, Pass2=4K, 3 threads, a = 11
16944075*2^16944075+1, bit: 80000 / 16944099 [0.47%]. Time per bit: 3.313 ms.
Caught signal. Terminating.
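The three sets of runs above can be compared side by side with a short Python sketch. It assumes the last pair of runs was taken while N=21 manual sieving was running on the GPU (the post does not say so explicitly); the `runs` table and the `overhead_pct` helper are illustrative names, not part of LLR.

```python
# Time per bit (ms) from the runs above, keyed by GPU state and thread count.
# The "N=21 sieve" label is an assumption about what the GPU was doing.
runs = {
    "idle":        {4: 2.796, 3: 3.282},
    "GPU running": {4: 3.072, 3: 3.535},
    "N=21 sieve":  {4: 2.880, 3: 3.313},
}

def overhead_pct(state, threads):
    """Slowdown of `state` relative to the idle run, in percent."""
    return (runs[state][threads] / runs["idle"][threads] - 1.0) * 100.0

for state in ("GPU running", "N=21 sieve"):
    for threads in (4, 3):
        print(f"{state:12s} -t{threads}: +{overhead_pct(state, threads):.1f}%")
```

On these figures the overhead is roughly 10% (-t4) and 8% (-t3) with the GPU running, against roughly 3% and 1% for the last pair of runs.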




Yves Gallot
Volunteer developer
Project scientist
Joined: 19 Aug 12
Posts: 582
ID: 164101
Credit: 304,815,144
RAC: 1,809
Message 121235 - Posted: 19 Oct 2018 | 17:19:15 UTC - in response to Message 121226.

Is anyone else seeing similar results?

On an 8-core Ryzen 7 1800X (HT off) with GFN21 -nt 7, computation time is about 99,000 seconds if no other application is running, but about 117,000 seconds if geneferocl (GFN20) is running.

While I don't have the tools necessary to definitively see what's happening, I suspect it's our old bugaboo, memory bandwidth.

I suspect the task scheduling.

When I tried to bench the multithreaded version of genefer, I noticed some abnormal results. In the end I realised that no other application should run during the benchmark. Another application could be Firefox (with just some simple web pages opened, no video) or Visual Studio (just "idle", not compiling).

Multithreaded applications are highly sensitive to the CPU load. The problem is that threads must be synchronised at a high rate (about every ms). If one thread is delayed, all threads have to wait.
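A toy model can illustrate why one delayed thread holds up all the others. The Python sketch below is not a measurement of LLR or genefer: the step count, the 1 ms work time, and the 5% delay probability are made-up parameters, and `run_time_ms` is a hypothetical helper that treats every iteration as ending at a barrier.

```python
import random

def run_time_ms(steps, threads, work_ms, delay_ms, delay_prob, seed=1):
    """Total time for `steps` synchronised iterations.

    Every iteration ends at a barrier, so it lasts as long as the slowest
    thread. Each thread independently suffers an extra `delay_ms` with
    probability `delay_prob` (a stand-in for being preempted).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        per_thread = [
            work_ms + (delay_ms if rng.random() < delay_prob else 0.0)
            for _ in range(threads)
        ]
        total += max(per_thread)  # everyone waits for the slowest thread
    return total

baseline = run_time_ms(10_000, 4, work_ms=1.0, delay_ms=0.0, delay_prob=0.0)
disturbed = run_time_ms(10_000, 4, work_ms=1.0, delay_ms=1.0, delay_prob=0.05)
print(f"undisturbed: {baseline:.0f} ms, disturbed: {disturbed:.0f} ms")
```

In this model each thread is delayed in only 5% of its steps, yet the chance that at least one of four threads is delayed in a given step is 1 - 0.95^4, about 19%, so the whole run slows down by roughly that much.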

I noticed that if 7 threads are running on 8 cores, the load of each core is about 7/8 (in fact less than 7/8). We could expect to have 7 cores loaded at 100% and 1 core idle, but the threads migrate between cores. Affinity doesn't improve this because we can't set the affinity of all other applications to the 8th core.

Note that if it is memory bandwidth, it should be true if a single thread is running... I didn't notice that behaviour with a single threaded app (or multiple single threaded apps running concurrently).
I think that Linux performs better than Windows, another indication that it could be the task scheduling.

GPU programs spend a lot of time transferring data between main memory and video memory. Those transfers don't use much (or any) CPU cycles, but they do consume memory bandwidth.

Little data is transferred; all data is loaded once into GPU memory. But a lot of system/driver functions are called per second. I don't know if a large stack is created by the driver and if it consumes memory bandwidth. It may be very different with an AMD GPU.

Note that the CPU load of geneferocl is the load of the application but not the load of the driver. The Nvidia driver is itself a hidden multithreaded application!
With a single-threaded 3D/OpenGL app, the task manager shows one core at 100%, but the full load on the CPU may be equivalent to 100% of 2 or 3 cores because of the driver threads.

Michael Goetz
Project donor
Volunteer moderator
Project administrator
Joined: 21 Jan 10
Posts: 13396
ID: 53948
Credit: 229,001,509
RAC: 166,186
Message 121236 - Posted: 19 Oct 2018 | 17:52:21 UTC - in response to Message 121235.

I suspect the task scheduling....


You may be right. It does make sense.

Later I think I'll do some tests with single threaded LLR and see how it behaves.

Keith
Project donor
Volunteer tester
Joined: 8 Dec 13
Posts: 430
ID: 284516
Credit: 400,769,714
RAC: 115,487
Message 121237 - Posted: 19 Oct 2018 | 18:02:38 UTC

What happens when the opposite is tested?
Is the time a GFN GPU task takes affected by whether an LLR task is running on the CPU or not?

In testing with AP27, I found that the AP tasks run faster when a full core is left free for a higher-end card, while HT threads were good enough to keep a low-end card happy, compared to running without HT and with all cores busy with LLR.
But my testing had a lot of moving parts so I'm not sure how accurate my assumption here is.
However, I didn't check to see if CPU tasks were faster without any AP units running. I might have to check that during the upcoming challenge.
____________
My Primes
Badge Score: 2*1 + 4*2 + 6*5 + 7*8 + 9*2 + 10*1 = 124

composite
Volunteer tester
Joined: 16 Feb 10
Posts: 764
ID: 55391
Credit: 688,015,200
RAC: 81,291
Message 121255 - Posted: 20 Oct 2018 | 6:44:45 UTC - in response to Message 121235.

Affinity doesn't improve this because we can't set the affinity of all other applications to the 8th core.

Yes, you can set affinity for all other applications, but it's a lot of manual work. One of these years I need to figure out how to set up and use kernel cgroups to do that automatically.
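As a rough illustration of what that manual work amounts to, here is a minimal Python sketch, assuming a Linux system where `os.sched_setaffinity` is available. The helpers `split_cpus` and `pin_process` are hypothetical names introduced here; this is not the cgroups setup mentioned above, just the per-process equivalent.

```python
import os

def split_cpus(cpu_ids, reserved_count=1):
    """Split CPU ids into (compute set, set for everything else)."""
    ordered = sorted(cpu_ids)
    if not 0 < reserved_count < len(ordered):
        raise ValueError("reserved_count must leave at least one CPU on each side")
    return set(ordered[:-reserved_count]), set(ordered[-reserved_count:])

def pin_process(pid, cpus):
    """Pin `pid` to `cpus`; returns False where the call is unavailable or refused."""
    if not hasattr(os, "sched_setaffinity"):  # Linux-only API
        return False
    try:
        os.sched_setaffinity(pid, cpus)
        return True
    except OSError:  # covers PermissionError and ProcessLookupError
        return False

# Example split for an 8-core machine: 7 cores for compute, the last one for the rest.
compute, others = split_cpus(range(8))
print("compute threads on", sorted(compute), "- everything else on", sorted(others))
```

Applying this to "all other applications" would mean calling `pin_process(pid, others)` for every other process id, which typically needs elevated privileges and has to be repeated for newly started processes. That is the part a cgroup-based setup would automate.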

Honza
Project donor
Volunteer moderator
Volunteer tester
Project scientist
Joined: 15 Aug 05
Posts: 1884
ID: 352
Credit: 3,036,090,600
RAC: 1,069,146
Message 121264 - Posted: 20 Oct 2018 | 21:08:40 UTC

A bit more of a mystery.

Today, I got my RTX 2080 replaced. Same model.
Before it got replaced, I tried it in a different computer - i5-4670 / DDR3.
http://www.primegrid.com/result.php?resultid=941152973
It took ~240 sec on GFN17Mega, 4 tasks, all validated.

The replacement runs in my i7 8700K.
http://www.primegrid.com/result.php?resultid=941209060
It takes ~370 sec on GFN17Mega.

Same GPU model, same GPU clock (underclocked to 1800MHz), same driver, same OS version, negligible CPU time and what have you.
But it takes considerably longer on the faster computer, even when 5 cores are idle.

For PPS Sieve and AP27, run times are about the same.
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186

Yves Gallot
Volunteer developer
Project scientist
Joined: 19 Aug 12
Posts: 582
ID: 164101
Credit: 304,815,144
RAC: 1,809
Message 121272 - Posted: 21 Oct 2018 | 11:00:16 UTC - in response to Message 121264.

A bit more of a mystery.
[...]
Same GPU model, same GPU clock (underclocked to 1800MHz), same driver, same OS version, negligible CPU time and what have you.
But it takes considerably longer on the faster computer, even when 5 cores are idle.

This time I suspect the power management and C-states.
It was "improved" in the latest processors. The core C1/C1E states may be a problem. It takes time for the processor to wake up from these states.

Did you set the "high performance" power plan (Control Panel / Hardware and Sound / Power Options)? I think that this helps. (?)

Power management is a brainteaser for fast computations.
I think that it's no longer possible to estimate the speed of a program with a benchmark; a full test is required.

Below is an example of what new processors can do:

Agner Fog, An optimization guide for assembly programmers and compiler makers, 3. The microarchitecture of Intel, AMD and VIA CPUs, 11 Skylake
Warm-up period for YMM and ZMM vector instructions
The processor turns off the upper parts of the vector execution units when it is not used, in order to save power. Instructions with 256-bit vectors have a throughput that is approximately 4.5 times slower than normal during an initial warm-up period of approximately 56,000 clock cycles or 14 μs. A sequence of code containing 256-bit vector operations will run at full speed after this warm-up period. The processor returns to the mode of slow 256-bit execution 2.7 million clock cycles, or 675 μs, after the last 256-bit instruction (These times were measured on a 4 GHz processor). Similar times apply to 512-bit vectors.
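The cycle counts and times in the quoted passage can be cross-checked with a few lines of Python. This only redoes the unit conversion at the 4 GHz clock stated in the quote; `cycles_to_us` is an illustrative helper name.

```python
CLOCK_HZ = 4e9  # the 4 GHz processor mentioned in the quoted passage

def cycles_to_us(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to microseconds at the given clock."""
    return cycles / clock_hz * 1e6

warm_up_us = cycles_to_us(56_000)        # warm-up period for 256-bit vectors
cool_down_us = cycles_to_us(2_700_000)   # idle time before returning to slow mode
print(f"warm-up: {warm_up_us:.0f} us, return to slow mode after: {cool_down_us:.0f} us")
```

Both conversions agree with the 14 μs and 675 μs figures given in the quote.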

Honza
Project donor
Volunteer moderator
Volunteer tester
Project scientist
Joined: 15 Aug 05
Posts: 1884
ID: 352
Credit: 3,036,090,600
RAC: 1,069,146
Message 121273 - Posted: 21 Oct 2018 | 11:35:30 UTC

Interesting stuff, Yves.

The i7 8700K runs at 4.3 GHz with reasonable temperatures (no thermal throttling, 30+ C to TjMAX limit).

Looking back, 240 secs for GFN17Mega on an RTX 2080 is an anomaly; times in the 360-400 sec range are what is to be expected.

If the card was faulty and not stable, it did a great job on this one while it lasted.

Hopefully the replacement will be rock stable, and so far it is doing a great job...

Myrskylyhty
Project donor
Joined: 27 Jan 18
Posts: 110
ID: 972376
Credit: 438,055,885
RAC: 397,537
Message 122246 - Posted: 8 Nov 2018 | 13:00:54 UTC

Does setting a higher priority in the task manager help?

Daniel Furr
Project donor
Joined: 20 Oct 17
Posts: 6
ID: 937944
Credit: 354,424,954
RAC: 0
Message 124134 - Posted: 2 Jan 2019 | 19:15:18 UTC - in response to Message 121237.
Last modified: 2 Jan 2019 | 19:21:32 UTC

What happens when the opposite is tested?
Is the time a GFN GPU task takes affected by whether an LLR task is running on the CPU or not?


I've been putting off posting about this until I saw this thread. I have the exact reverse of Michael's problem: running LLR tasks multi-threaded slows down the GPU tasks. -t6 slows them down the most, and I can see a performance boost in the GPU tasks with every core I remove from LLR tasks.

Machine is i7-8700k, 2080ti, Windows 10 Pro updated daily. Temps well within range, high performance power mode (never sleep).

I have tried every driver for the GPU, updated chipset software, BIOS, everything I can think of. I saw somewhere else in this thread that someone else with a 2080 has a similar issue to mine. It's extremely frustrating, especially considering the upcoming SoB challenge.

One thing I did notice, running SR5 tasks in LLR does not seem to affect GPU performance oddly enough, but SoB tasks, GCW, Cullen all do and I assume all other LLR tasks as well. I've tried many different things but have just gone back to doing GCW sieves until I can find a fix.

I also wanted to add that this was a new problem with the new GPU. I never had this problem with my 1080ti card. In fact, I have my 1080ti and a 1070ti in another machine and it runs LLR tasks fine. CPU is i7-3930k, Win10 Home.

xii5ku
Joined: 17 Dec 16
Posts: 64
ID: 476505
Credit: 576,169,134
RAC: 406,407
Message 124186 - Posted: 3 Jan 2019 | 13:28:48 UTC - in response to Message 124134.

Daniel Furr wrote:
I also wanted to add that this was a new problem with the new GPU. I never had this problem with my 1080ti card.

In my and others' experience, various GPU based distributed computing applications are slowed down by concurrent RAM and cache heavy applications such as LLR, on pretty much every GPU. (The faster the GPU, the more it may be noticeable, perhaps.)

--------
Michael Goetz wrote:
FFT length 1792K
[...]
* Running the GPU will slow down LLR even if you leave a full core free.
* Running the GPU will slow down LLR even if you have hyperthreading enabled to service the GPU.

That's because the processor's execution units are not the bottleneck here. Cache and RAM performance are. Haswell's wide execution units are underutilized to begin with.
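A rough back-of-the-envelope check of the cache argument, as a minimal sketch: it assumes one 8-byte double per FFT element (LLR's real working set is larger than this), and the L3 sizes below are assumptions taken from published specifications for CPUs mentioned in this thread, not measurements from anyone's machine.

```python
# Rough size of the FFT data for the quoted FFT length (1792K).
# Assumption: one 8-byte double per element; the real working set is larger.

BYTES_PER_DOUBLE = 8
MIB = 1024 * 1024


def fft_data_mib(fft_length_k: int) -> float:
    """Return the size in MiB of an FFT array of fft_length_k * 1024 doubles."""
    return fft_length_k * 1024 * BYTES_PER_DOUBLE / MIB


# Assumed L3 cache sizes in MiB (from published specs, not from the thread).
L3_CACHE_MIB = {"i7-4790": 8, "i5-8400": 9, "i7-8700K": 12}

size = fft_data_mib(1792)
fits_in_l3 = {cpu: size <= l3 for cpu, l3 in L3_CACHE_MIB.items()}

print(f"FFT data for 1792K: {size:.1f} MiB")
for cpu, fits in fits_in_l3.items():
    print(f"{cpu}: fits in L3 = {fits}")
```

Under these assumptions the FFT data alone does not fit in the L3 cache of any of these CPUs, so each iteration has to go out to RAM, which is consistent with memory traffic rather than execution units being the limit.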

KeithProject donor
Volunteer tester
Joined: 8 Dec 13
Posts: 430
ID: 284516
Credit: 400,769,714
RAC: 115,487
Message 124214 - Posted: 3 Jan 2019 | 18:35:37 UTC - in response to Message 124134.

What happens when the opposite is tested?
Is the time a GFN GPU task takes affected by whether an LLR task is running on the CPU or not?


I've been putting off posting about this until I saw this thread. I have the exact reverse of Michael's problem: running LLR tasks multi-threaded slows down the GPU tasks. -t6 slows them down the most, and I can see a performance boost in the GPU tasks with every core I remove from LLR tasks.

Machine is i7-8700k, 2080ti, Windows 10 Pro updated daily. Temps well within range, high performance power mode (never sleep).

I have tried every driver for the GPU, updated chipset software, BIOS, everything I can think of. I saw somewhere else in this thread that someone else with a 2080 has a similar issue to mine. It's extremely frustrating, especially considering the upcoming SoB challenge.

One thing I did notice, running SR5 tasks in LLR does not seem to affect GPU performance oddly enough, but SoB tasks, GCW, Cullen all do and I assume all other LLR tasks as well. I've tried many different things but have just gone back to doing GCW sieves until I can find a fix.

I also wanted to add that this was a new problem with the new GPU. I never had this problem with my 1080ti card. In fact, I have my 1080ti and a 1070ti in another machine and it runs LLR tasks fine. CPU is i7-3930k, Win10 Home.


I posed the original question and now I seem to have my own answer about the behaviour.
Running SGS (5 tasks, one per core) + PPS-SV (one task) on my i5-8400 (DDR4-3000) + 1070ti gave me PPS-SV run times of around 250 seconds.
Running SoB (one task on 5 cores) + PPS-SV gives me 450 seconds on PPS-SV.

Clearly running SoB doesn't allow for the rest of the system to keep the GPU fed.

Similar, but less drastic, behaviour is noted on my i7-4790 (DDR3-1600) + 1050ti, where PPS-SV run times went from 850 seconds to 950 seconds when switching from SGS (4 tasks, one core each) to SoB (one task on 4 cores).

So I would agree with xii5ku's assessment of what's going on.
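To put the run times above on a common scale, here is a minimal sketch that converts them to a relative slowdown. The figures are the approximate ones quoted in this post; the function name and dictionary layout are only illustrative.

```python
# Relative slowdown of the GPU task (PPS-SV) when the CPU workload changes
# from SGS (one task per core) to SoB (one multi-threaded task).
# Run times are the approximate figures quoted in the post, in seconds.


def slowdown_percent(baseline_s: float, loaded_s: float) -> float:
    """Return how much longer the loaded run takes, as a percentage of baseline."""
    return (loaded_s - baseline_s) / baseline_s * 100.0


# host -> (PPS-SV time alongside SGS, PPS-SV time alongside SoB)
measurements = {
    "i5-8400 + 1070ti": (250.0, 450.0),
    "i7-4790 + 1050ti": (850.0, 950.0),
}

results = {
    host: slowdown_percent(sgs_s, sob_s)
    for host, (sgs_s, sob_s) in measurements.items()
}

for host, pct in results.items():
    print(f"{host}: PPS-SV takes {pct:.1f}% longer with SoB on the CPU")
```

That works out to roughly 80% longer on the i5-8400 machine and about 12% longer on the i7-4790 machine.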
____________
My Primes
Badge Score: 2*1 + 4*2 + 6*5 + 7*8 + 9*2 + 10*1 = 124

Daniel FurrProject donor
Joined: 20 Oct 17
Posts: 6
ID: 937944
Credit: 354,424,954
RAC: 0
Message 124266 - Posted: 4 Jan 2019 | 3:01:47 UTC - in response to Message 124186.

In my and others' experience, various GPU based distributed computing applications are slowed down by concurrent RAM and cache heavy applications such as LLR, on pretty much every GPU. (The faster the GPU, the more it may be noticeable, perhaps.)


Thank you for this information. This has been bothering me since I got the new GPU. It's so frustrating. So would getting a CPU with more cache help? Or getting higher performance RAM? Or both, probably?

Copyright © 2005 - 2020 Rytis Slatkevičius (contact) and PrimeGrid community. Server load 0.40, 0.50, 0.53
Generated 28 Oct 2020 | 17:26:20 UTC