Message boards :
Generalized Fermat Prime Search :
GFN 4.01 Ryzen all cores: weird segmentation violation
https://www.primegrid.com/result.php?resultid=1411978492
Bonjour Yves and fellow code warriors,
Could you please take a closer look at what caused this weird segmentation violation?
This was my first GFN-21 v4.01 task, using 12 cores on 2 CCDs with 32 MB of L3 cache each (Ryzen 9 5900X).
[edit] The segmentation violation occurred after three hours of run time (expected total duration: 20 hours).
Threads 0 to 5 are pinned to CCD #1 and threads 6 to 11 to CCD #2 [end of edit].
The reported data size is 40.5 MB: what does it mean? Is it the data per thread or per task?
Does the entire data set have to fit in each L3 cache? If the cache overflows, is the data simply evicted?
Let me know if you need a more precise diagnostic (I can rerun with other debugging options).
Sincerely, Pascal.
Stderr output
<core_client_version>7.18.1</core_client_version>
<
|
Bonjour Pascal,
I don't like weird issues, especially on a Sunday...
Many GFN-21 tasks were tested on Zen 3/4, but all of them on Windows or macOS. I can't find a test on Linux.
The size (40.5 MB) is the data size of the task, so your settings (one task with 2 x 6 threads, 6 per CCD) make sense. Note that on a Ryzen 9 5950X the throughput is better with two 8-thread tasks, but the threads were not pinned during the 16-thread test.
The memory is allocated once, as a single block; there is no memory allocation during the loop, except when a file is written. So why after 3 hours?
A new requirement for each GFN-21 task is 1.2 GB of disk space (2.5 GB for GFN-22 and DYFL). 128 temporary "checkpoints" are written to disk, and the proof is built from them at the end of the computation. The segmentation violation may have occurred during a checkpoint.
I see that another task is in progress. If the error is still there, it would be interesting to test the same settings with a smaller GFN and also without pinning threads. Note that on Windows, if the number of threads is 12, then 15 threads are created. There are some other threads, maybe for I/O or to control the 12 computing threads.
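Since the disk requirement is new, a quick check before starting a long task can rule it out as a cause. The following is only a sketch: the byte thresholds assume decimal gigabytes for the figures quoted above, and the function name and the choice of directory are illustrative, not part of genefer or BOINC.

```python
import shutil

# Disk space per task as quoted in the post above (decimal GB assumed).
REQUIRED_BYTES = {
    "GFN-21": 1_200_000_000,
    "GFN-22": 2_500_000_000,
    "DYFL": 2_500_000_000,
}


def has_room_for(task, path="."):
    """Return True if the filesystem holding `path` has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= REQUIRED_BYTES[task]


if __name__ == "__main__":
    for task in REQUIRED_BYTES:
        status = "OK" if has_room_for(task) else "not enough free space"
        print(task, status)
```

In practice `path` would be the directory where the task writes its temporary files, which is not specified in this thread.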
|
|
Yves,
For the first instance, threads were pinned after one hour,
not at the beginning (first minute), but caches should be
managed in accordance with the distribution of threads...
Does Genefer dislike untimely, externally driven thread migration?
This isn't really a problem, but a fact to be aware of. I'll test this.
Under Linux, 13 child threads are created; the first child is
probably the one dedicated to housekeeping and checkpoints,
since its elapsed run time counts in seconds. It was pinned
to core 0 for the first instance and to any core for the second.
Indeed, another instance is in progress: 33% done in 6h16m.
It should end on Monday at 0 UTC. Feedback on Monday...
Thank you for your fast feedback, have a nice Sunday!
Pascal.
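The thread count and the pinning described above can be inspected from outside the process. Below is a minimal, Linux-only sketch that lists every thread of a process together with the CPUs it is allowed to run on. It relies on `/proc/<pid>/task` and on the fact that the Linux affinity call accepts a thread ID; the function name is hypothetical, and the PID of a genefer task has to be looked up separately.

```python
import os


def thread_affinities(pid):
    """Map each thread ID of `pid` to the set of CPUs it may run on."""
    task_dir = "/proc/%d/task" % pid
    if not os.path.isdir(task_dir) or not hasattr(os, "sched_getaffinity"):
        return {}  # not Linux, or the process no longer exists
    result = {}
    for name in os.listdir(task_dir):
        tid = int(name)
        try:
            # On Linux the underlying system call works per thread ID.
            result[tid] = set(os.sched_getaffinity(tid))
        except OSError:
            continue  # the thread exited between listing and query
    return result


if __name__ == "__main__":
    for tid, cpus in sorted(thread_affinities(os.getpid()).items()):
        print(tid, sorted(cpus))
```

Run against a genefer PID, this would show whether the low-CPU housekeeping thread and the computing threads ended up on the expected cores.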
|
|
https://www.primegrid.com/result.php?resultid=1412400625
Outcome: 100% completed after 19 hours, but with a computation error.
Next step: tests on GFN-18
Name genefer21_54597860_0
Workunit 836692659
Created 24 Nov 2022 | 12:50:06 UTC
Sent 27 Nov 2022 | 5:58:07 UTC
Received 28 Nov 2022 | 4:16:16 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 1 (0x00000001) Unknown error code
Computer ID 1167280
Report deadline 19 Dec 2022 | 6:58:07 UTC
Run time 68,617.57
CPU time 807,238.50
Validate state Invalid
Credit 0.00
Application version: Genefer 21 v4.01 (cpuGFN21_mt)
Stderr output
<core_client_version>7.18.1</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
genefer22 version 22.11.4 (linux x64, gcc-11.3.0, boinc-7.20.2)
Copyright (c) 2022, Yves Gallot
genefer22 is free source code, under the MIT license.
Command line: '-boinc -p -n 21 -b 847374 -f gproof --nthreads 12'
Using fma implementation, 12 thread(s), data size: 40.5 MB.
Error: value is zero.
02:01:45 (60439): called boinc_finish(1)
</stderr_txt>
]]> | |
|
|
https://www.primegrid.com/result.php?resultid=1413918645
https://www.primegrid.com/result.php?resultid=1413912825
https://www.primegrid.com/result.php?resultid=1413910547
https://www.primegrid.com/result.php?resultid=1413924282
1/ No compute errors with thread pinning disabled.
2/ When thread pinning is enabled, collecting the results fails, so no file is produced
(460 GB available on disk). Moreover, this doesn't happen for all instances.
Really weird, isn't it?! To be continued...
| |
|
Yves Gallot Volunteer developer Project scientist Send message
Joined: 19 Aug 12 Posts: 820 ID: 164101 Credit: 305,989,513 RAC: 2,326

|
One hypothesis is that OpenMP already applies an affinity mask. The default places are optimized per CCD. If an affinity is then also set externally, some threads are not executed.
Affinity needs to be controlled with the OpenMP environment variables.
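As an illustration of what controlling affinity through OpenMP environment variables could look like, here is a sketch that builds an environment with one OpenMP place per CCD. `OMP_PLACES`, `OMP_PROC_BIND` and `OMP_DISPLAY_ENV` are standard OpenMP variables; everything else is an assumption: the helper names are hypothetical, logical CPUs 0 to 5 are assumed to belong to the first CCD and 6 to 11 to the second (SMT disabled), and how such an environment would be handed to a task started by the BOINC client is not covered in this thread.

```python
import os


def omp_places(n_ccds, cores_per_ccd):
    """Build an OMP_PLACES string with one place per CCD, e.g. '{0:6},{6:6}'."""
    return ",".join(
        f"{{{ccd * cores_per_ccd}:{cores_per_ccd}}}" for ccd in range(n_ccds)
    )


def omp_env(n_ccds, cores_per_ccd):
    """Copy the current environment and add OpenMP affinity settings."""
    env = dict(os.environ)
    env["OMP_PLACES"] = omp_places(n_ccds, cores_per_ccd)
    env["OMP_PROC_BIND"] = "close"    # fill places in order with consecutive threads
    env["OMP_DISPLAY_ENV"] = "true"   # have the runtime print its settings at startup
    return env


if __name__ == "__main__":
    env = omp_env(n_ccds=2, cores_per_ccd=6)
    print(env["OMP_PLACES"], env["OMP_PROC_BIND"])
```

With places defined per CCD rather than per core, threads stay within their CCD but can still move between its cores, which is a looser constraint than pinning each thread to one core.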
|
Crun-chi Volunteer tester
 Send message
Joined: 25 Nov 09 Posts: 3233 ID: 50683 Credit: 151,443,349 RAC: 73,965
                         
|
Maybe it is not related, but on my 3700X I have trouble in Prime95 when more cores than needed are assigned to a worker (I use 4 cores for 512K results).
So in your case, try lowering the number of threads: for GFN-16 you don't need 12 threads, you can do it with one thread.
Second, I always disable HT, so I only use real cores (I don't know how much HT "cores" help in the case of Genefer, and in any case the CPU runs much cooler :)
Command line: '-boinc -p -n 16 -b 175662450 -f gproof --nthreads 12'
Using fma implementation, 12 thread(s), data size: 1.32 MB.
Warning: b > 160,000,000: the test may fail.
And, like you, that was under Linux.
____________
92*10^1585996-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! | |
|
Yves Gallot Volunteer developer Project scientist

|
Crun-chi wrote: So in your case, try lowering the number of threads: for GFN-16 you don't need 12 threads, you can do it with one thread.
You didn't read the thread: "No compute errors with thread pinning disabled."
12 threads and GFN-16 is fine (https://www.primegrid.com/result.php?resultid=1413939660), but it doesn't work when affinity is set.
GFN-16 is just a quick test because GFN-21 takes too long. HT is disabled; the Ryzen 9 5900X is a 12-core processor.
|
Crun-chi Volunteer tester
                         
|
Yves Gallot wrote: You didn't read the thread: "No compute errors with thread pinning disabled."
12 threads and GFN-16 is fine (https://www.primegrid.com/result.php?resultid=1413939660), but it doesn't work when affinity is set.
GFN-16 is just a quick test because GFN-21 takes too long. HT is disabled; the Ryzen 9 5900X is a 12-core processor.
Mea culpa :(
|
|
Pascaltec wrote: https://www.primegrid.com/result.php?resultid=1413918645
https://www.primegrid.com/result.php?resultid=1413912825
https://www.primegrid.com/result.php?resultid=1413910547
https://www.primegrid.com/result.php?resultid=1413924282
These don't show a segmentation violation. Instead, they show "validation failed". (From what I understand, this means that the Gerbicz-Li self test indicated a computation error.)
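To give an idea of how a self test of this kind can flag a computation error, here is a toy sketch of the simpler Gerbicz-style check for plain repeated squaring. It is not the Gerbicz-Li variant and not genefer's code: the function name, the small modulus and the injected fault are all illustrative. The check keeps a running product `d` of the value at every block boundary and verifies the identity `d_new == a * d_prev^(2^block_len)`, which holds only if the squarings in the block were correct.

```python
def checked_squarings(a, modulus, blocks, block_len, corrupt_at=-1):
    """Compute a^(2^(blocks*block_len)) mod modulus with a running check.

    Returns (value, ok). `corrupt_at` injects a fault right after that
    squaring (counting from 1) to show the check failing.
    """
    u = a % modulus   # u = a^(2^i), starting at i = 0
    d = u             # d = product of u at every block boundary so far
    ok = True
    i = 0
    for _ in range(blocks):
        d_prev = d
        for _ in range(block_len):
            u = u * u % modulus
            i += 1
            if i == corrupt_at:
                u = (u + 1) % modulus  # simulated hardware error
        d = d * u % modulus
        # Holds when every squaring in this block was computed correctly.
        expected = a * pow(d_prev, 1 << block_len, modulus) % modulus
        if d != expected:
            ok = False
    return u, ok


if __name__ == "__main__":
    print(checked_squarings(3, 65537, blocks=4, block_len=4))
    print(checked_squarings(3, 65537, blocks=4, block_len=4, corrupt_at=4))
```

The point of the design is that the check costs one extra multiplication per block plus an occasional verification, far less than running the whole computation twice.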
|
Yves Gallot Volunteer developer Project scientist

|
These don't show a segmentation violation. Instead, "validation failed". (From what I understand, this means that the Gerbicz-Li self test indicated a computation error.)
See http://www.primegrid.com/forum_thread.php?id=10054&nowrap=true#158125
If my hypothesis is correct, incorrect data can generate "validation failed" or "value is zero" (2nd GFN-21 test) or SIGSEGV. | |
|
|
@Yves and @Crun-chi: Thank you both for your helpful comments.
Crun-chi points out that issuing more threads than physical cores from the same task leads to trouble.
Yves points out that OpenMP has built-in mechanisms to manage threads and choose cores.
Moving a thread from one core to another is not a problem when threads work independently.
taskset talks to the kernel's task scheduler, and OpenMP talks to the task scheduler too.
But some OpenMP operations are not independent...
With two genefer tasks and the same thread pinning mechanism, genefer never runs into trouble,
probably because threads issued by one OpenMP task can be assigned to the cores of the other task,
especially the OpenMP thread(s?) dedicated to collecting results from the other threads (e.g. a reduction).
This is not true if only one task runs on all physical cores.
It may be that the problem arises when the thread pinning (automated by cron every minute)
occurs during checkpointing, which lasts longer for a GFN-21. This would explain the randomness of the error.
Migrating threads during a reduction, for example, is far from a good idea! Mea culpa!
The only solution is to dive into the code, especially the checkpointing part. Yves reported that 15 threads are issued.
Only 13 threads appear under Linux. The first (checkpointing) child thread consumes very little CPU time.
What is the purpose / algorithm of the checkpointing? Is the checkpointing multithreaded?
Does this checkpointing thread issue (2) child threads when reducing, collecting and writing results?
During checkpointing and exchange of data between threads, it is really a bad idea to migrate threads from core to core!
@Yves: QED, you're right. Obviously, my basic thread pinning algorithm needs some refinements! TBC...
|
|
In the end, it was just the overclocking settings, which made the CPU unstable during checkpointing because fewer threads are involved. Back to more conservative settings. The previously described thread pinning is OK.
|
Yves Gallot Volunteer developer Project scientist

|
Pascaltec wrote: What is the purpose / algorithm of the checkpointing? Is the checkpointing multithreaded?
Does this checkpointing thread issue (2) child threads when reducing, collecting and writing results?
FYI, a modular exponentiation is calculated, and the checkpoints are intermediate values saved during the computation. At the end, the program reads them and computes the proof.
Only the multiplications are multi-threaded (#pragma omp parallel directive). All I/O operations are single-threaded.
I noticed that BOINC creates two threads.
On Windows, the number of threads is surprising. For "--nthreads n", n + 3 threads are initially created, but after a few seconds 3 threads end and the final number is n. With the "-boinc" option it is n + 5, then n + 2.
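The idea of saving intermediate values during a modular exponentiation can be sketched in a few lines. This is a toy model only: the function names are hypothetical, the values are kept in memory instead of being written to disk, and the consistency check simply recomputes each segment, whereas the actual proof construction used by genefer is not described in this thread.

```python
def power_with_checkpoints(a, squarings, modulus, n_checkpoints):
    """Return (a^(2^squarings) mod modulus, evenly spaced intermediate values)."""
    if squarings % n_checkpoints != 0:
        raise ValueError("squarings must be a multiple of n_checkpoints")
    step = squarings // n_checkpoints
    u = a % modulus
    checkpoints = []
    for i in range(1, squarings + 1):
        u = u * u % modulus
        if i % step == 0:
            checkpoints.append(u)  # a real program would write this to disk
    return u, checkpoints


def checkpoints_consistent(a, step, modulus, checkpoints):
    """Recompute every segment from the previous checkpoint (slow stand-in for a proof)."""
    prev = a % modulus
    for c in checkpoints:
        if pow(prev, 1 << step, modulus) != c:
            return False
        prev = c
    return True


if __name__ == "__main__":
    value, cps = power_with_checkpoints(3, 128, 1000003, 8)
    print(value, len(cps), checkpoints_consistent(3, 16, 1000003, cps))
```

Because only the squarings are parallel and the saves are single-threaded, a checkpoint is a moment where most computing threads are idle, which matches the load change mentioned elsewhere in the thread.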
|
|
So the suspicions voiced in this thread, as I understand them, were entirely unfounded: namely that the OpenMP implementation with which the current Linux binary of genefer22 is compiled (that is, gcc's implementation) would (a) fail to execute threads if a desired CPU binding did not succeed, or (b) access the wrong thread-private data when the kernel moves a thread to a different CPU, e.g. due to a user-issued taskset. It would have been astounding indeed if a production release of an OpenMP implementation had such severe bugs.
|