LLR2 task validation error but no Gerbicz error?
Bur Volunteer tester
Joined: 25 Feb 20 Posts: 337 ID: 1241833 Credit: 41,186,971 RAC: 847,832
I have this failed task:
Click
But according to stderr output nothing went wrong during the LLR2 task:
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<stderr_txt>
BOINC PrimeGrid wrapper 2.01 (Aug 11 2020 22:09:56)
running ../../projects/www.primegrid.com/llr2_1.0.0_win64_200814.exe -v
LLR2 Program - Version 1.0.0, using Gwnum Library Version 29.8
running ../../projects/www.primegrid.com/llr2_1.0.0_win64_200814.exe -oGerbicz=1 -oProofName=proof -oProofCount=128 -oProductName=prod -oPietrzak=1 -oCachePoints=0 -pSavePoints -q3*2^16391988-1 -d -t4 -oDiskWriteTime=1
Gerbicz check is requested, switching to PRP.
Starting probable prime test of 3*2^16391988-1
Using FMA3 FFT length 896K, Pass1=448, Pass2=2K, clm=2, 4 threads, a = 3, L2 = 459*279, M = 128061
Compressed 128 points to 7 products. Time : 95.896 sec.
Testing complete.
11:51:50 (5804): called boinc_finish(0)
</stderr_txt>
]]>
So I guess the certificate didn't match? Is it possible to find out what went wrong? Apparently the problem was on my side as the second task for that WU went through fine.
____________
Primes: 1281979 & 12+8+1979 & 1+2+8+1+9+7+9 & 1^2+2^2+8^2+1^2+9^2+7^2+9^2 & 12*8+19*79 & 12^8-1979 & 1281979 + 4 (cousin prime)
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,492,256 RAC: 386,886
It can happen, but it's rare. The Gerbicz error checking can't catch errors at the very end of the calculation.
I see that this computer has had several tasks fail outright before they completed. I suspect whatever caused those failures was also responsible for this error.
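The blind spot at the end of the calculation can be illustrated with a toy model. This is only a sketch of the general Gerbicz idea for a chain of modular squarings, not LLR2's actual code: the function name, the `fault_at` parameter, the tiny modulus and the block length are all made up for the illustration, and a real implementation checks less often and rolls back instead of stopping.

```python
def prp_with_gerbicz(a, n_iters, N, L, fault_at=None):
    """Square a (mod N) n_iters times, checking after every L iterations.

    Returns (final_value, error_detected). If fault_at is given, a hardware
    error is simulated at that iteration by adding 1 to the result.
    """
    u = a % N
    d = u                      # running product u_0 * u_L * u_2L * ...
    for i in range(1, n_iters + 1):
        u = u * u % N
        if i == fault_at:
            u = (u + 1) % N    # simulated bad result
        if i % L == 0:
            d_prev, d = d, d * u % N
            # In an error-free run: d == d_prev^(2^L) * u_0 (mod N)
            if pow(d_prev, 1 << L, N) * (a % N) % N != d:
                return u, True
    return u, False


N = 2**31 - 1   # toy modulus, not a real PrimeGrid candidate
clean, _ = prp_with_gerbicz(3, 105, N, 10)
mid_run = prp_with_gerbicz(3, 105, N, 10, fault_at=55)
tail = prp_with_gerbicz(3, 105, N, 10, fault_at=103)
print(mid_run[1])                  # True: caught by the check at iteration 60
print(tail[1], tail[0] != clean)   # False True: wrong result, never checked
```

With 105 iterations and a block length of 10, the last check happens at iteration 100, so an error at iteration 103 goes through unnoticed and only shows up when the final residue is compared against another result.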
____________
My lucky number is 75898^524288+1
Bur Volunteer tester
Joined: 25 Feb 20 Posts: 337 ID: 1241833 Credit: 41,186,971 RAC: 847,832
Can you see more than these four occurrences and the recent GFN error on 22/10?
clickety
The Woodall errors happened when I had to shut the computer down by removing power due to problems with a gfx card I installed on those dates.
The GFN error has no immediate explanation; I hope it's a singular occurrence.
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester
Joined: 1 Mar 14 Posts: 850 ID: 301928 Credit: 495,083,024 RAC: 300,345
Bur wrote: "I have this failed task [...] But according to stderr output nothing went wrong during the LLR2 task"
This task returned an incorrect residue. When the certificate is created, the residue is regenerated on the server and compared with your result. They didn't match. A second task confirmed that your residue was wrong.
The simple part of the answer is: there are a few steps which are not protected by Gerbicz, and you had a hardware error during one of these steps (in this case, in the last few iterations of the test). The complex part is: it looks like your system is having problems, but they do not look like the common case of CPU failures and Gerbicz errors. One of your tasks crashed with an access violation (code 0xC0000005 in the log). Other tasks failed to start because the executable file was corrupted on disk ( https://www.primegrid.com/result.php?resultid=1130869675 ) or its checksum was calculated incorrectly due to other hardware problems.
Bur Volunteer tester
Joined: 25 Feb 20 Posts: 337 ID: 1241833 Credit: 41,186,971 RAC: 847,832
stream, both tasks were running when I was installing and removing the gfx card which unfortunately included a few cold shut-downs (i.e. pulling the plug). I hope it was just this that caused the problems.
It doesn't explain the failed 321 task though, as that happened two days ago. But for now I'll wait and see if it occurs again.
Bur Volunteer tester
Joined: 25 Feb 20 Posts: 337 ID: 1241833 Credit: 41,186,971 RAC: 847,832
Unfortunately the same computer is now acting up again, and this time I didn't restart it during the computation.
It began when I switched to GCW: 3 of 3 tasks have failed so far. They all show this stderr:
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
BOINC PrimeGrid wrapper 2.01 (Aug 11 2020 22:09:56)
running ../../projects/www.primegrid.com/llr2_1.0.0_win64_200814.exe -v
LLR2 Program - Version 1.0.0, using Gwnum Library Version 29.8
running ../../projects/www.primegrid.com/llr2_1.0.0_win64_200814.exe -oGerbicz=1 -oProofName=proof -oProofCount=64 -oProductName=prod -oPietrzak=1 -oCachePoints=1 -pSavePoints -q2061438*11^4122876+1 -d -t4 -oDiskWriteTime=1
Gerbicz check is requested, switching to PRP.
Starting probable prime test of 2061438*11^4122876+1
Using zero-padded FMA3 FFT length 1680K, Pass1=448, Pass2=3840, clm=2, 4 threads, a = 3, L2 = 327*197, M = 64419
Application terminated with exit code -1073741819 (0xC0000005)
15:07:43 (1552): called boinc_finish(195)
</stderr_txt>
BOINC message log shows:
10.11.2020 02:45:56 | PrimeGrid | Computation for task llrGCW_352241013_0 finished
10.11.2020 02:45:56 | PrimeGrid | Output file llrGCW_352241013_0_r1696868142_0 for task llrGCW_352241013_0 absent
10.11.2020 02:45:56 | PrimeGrid | Output file llrGCW_352241013_0_r1696868142_1 for task llrGCW_352241013_0 absent
10.11.2020 02:45:56 | PrimeGrid | Output file llrGCW_352241013_0_r1696868142_2 for task llrGCW_352241013_0 absent
10.11.2020 02:45:56 | PrimeGrid | Output file llrGCW_352241013_0_r1696868142_3 for task llrGCW_352241013_0 absent
10.11.2020 02:45:56 | PrimeGrid | Output file llrGCW_352241013_0_r1696868142_4 for task llrGCW_352241013_0 absent
10.11.2020 02:45:56 | PrimeGrid | Output file llrGCW_352241013_0_r1696868142_5 for task llrGCW_352241013_0 absent
10.11.2020 02:45:56 | PrimeGrid | Output file llrGCW_352241013_0_r1696868142_6 for task llrGCW_352241013_0 absent
10.11.2020 02:45:56 | PrimeGrid | Output file llrGCW_352241013_0_r1696868142_7 for task llrGCW_352241013_0 absent
10.11.2020 02:45:56 | PrimeGrid | Output file llrGCW_352241013_0_r1696868142_8 for task llrGCW_352241013_0 absent
10.11.2020 02:45:56 | PrimeGrid | Output file llrGCW_352241013_0_r1696868142_9 for task llrGCW_352241013_0 absent
10.11.2020 02:45:56 | PrimeGrid | Output file llrGCW_352241013_0_r1696868142_10 for task llrGCW_352241013_0 absent
10.11.2020 02:45:56 | PrimeGrid | Output file llrGCW_352241013_0_r1696868142_11 for task llrGCW_352241013_0 absent
10.11.2020 02:45:56 | PrimeGrid | Output file llrGCW_352241013_0_r1696868142_14 for task llrGCW_352241013_0 absent
10.11.2020 02:45:56 | PrimeGrid | Output file llrGCW_352241013_0_r1696868142_15 for task llrGCW_352241013_0 absent
10.11.2020 02:45:56 | PrimeGrid | Output file llrGCW_352241013_0_r1696868142_16 for task llrGCW_352241013_0 absent
10.11.2020 02:45:58 | PrimeGrid | Started upload of llrGCW_352241013_0_r1696868142_12
10.11.2020 02:45:58 | PrimeGrid | Started upload of llrGCW_352241013_0_r1696868142_13
10.11.2020 02:45:59 | PrimeGrid | Finished upload of llrGCW_352241013_0_r1696868142_13
10.11.2020 02:46:00 | PrimeGrid | Finished upload of llrGCW_352241013_0_r1696868142_12
10.11.2020 02:47:41 | PrimeGrid | Sending scheduler request: To report completed tasks.
10.11.2020 02:47:41 | PrimeGrid | Reporting 1 completed tasks
]]>
In every case only output files 12 and 13 are present; the others are missing.
I found this thread where it was said that just about anything hardware-related can cause the error. The CPU (i5-4790K) runs at 77-78 °C all the time, which according to some gaming forums is a good temperature.
I changed my preference to SoB, so if the 4th task fails as well I'll see whether only GCW is affected. Maybe b != 2 is more demanding and therefore more error-prone?
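For reference, the two numbers on the "Application terminated" line of the stderr output are the same value written two ways: -1073741819 is 0xC0000005 read as a signed 32-bit integer. A small sketch of the conversion follows. The helper name `decode_exit_code` is made up for this example, and the lookup table holds only two entries taken from Microsoft's published NTSTATUS list, so anything else comes back as "unknown".

```python
# Two NTSTATUS values from Microsoft's published list; not a complete table.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
}


def decode_exit_code(signed_code):
    """Return (hex string, name) for a signed 32-bit Windows exit code."""
    unsigned = signed_code & 0xFFFFFFFF   # reinterpret as unsigned 32-bit
    return f"0x{unsigned:08X}", NTSTATUS_NAMES.get(unsigned, "unknown")


print(decode_exit_code(-1073741819))   # ('0xC0000005', 'STATUS_ACCESS_VIOLATION')
print(hex(195))                        # 0xc3, the exit code passed to boinc_finish
```

An access violation means the program touched memory it was not allowed to, which by itself does not say whether RAM, CPU, disk or software is at fault.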
Bur Volunteer tester
Joined: 25 Feb 20 Posts: 337 ID: 1241833 Credit: 41,186,971 RAC: 847,832
After 4 consecutive failed GCW tasks I switched to SoB, which has now been running for 24 h with no problems. So the error seems to be triggered specifically by GCW. Does it put specific stress on some hardware component? RAM?
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester
Joined: 1 Mar 14 Posts: 850 ID: 301928 Credit: 495,083,024 RAC: 300,345
Bur wrote: "After 4 consecutive failed GCW tasks I switched to SoB, which has now been running for 24 h with no problems. So the error seems to be triggered specifically by GCW. Does it put specific stress on some hardware component? RAM?"
I don't think so. The most stressful tasks (for heat and CPU load) are PPS, PPS-MEGA and DIV. But GCW uses the largest amount of RAM. It looks like your task made it almost to the end but crashed during compression of the checkpoints, where the maximum amount of RAM is used. Could it be that this PC is running out of memory? Each GCW task can consume up to 1 GByte of RAM.
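A rough back-of-the-envelope check of that figure, under an assumption that is not stated anywhere in this thread: that during compression each of the 64 proof points is held in memory as an array of 8-byte doubles at the full FFT length. The FFT length and the point count are taken from the stderr output above; the element size and the helper name `cached_points_mib` are guesses for illustration only.

```python
FFT_LENGTH = 1680 * 1024      # "FFT length 1680K" from the stderr output
PROOF_POINTS = 64             # -oProofCount=64 from the command line
BYTES_PER_ELEMENT = 8         # assumption: one double per FFT element


def cached_points_mib(fft_length, points, elem_bytes=BYTES_PER_ELEMENT):
    """Memory in MiB needed to hold `points` arrays of `fft_length` elements."""
    return fft_length * elem_bytes * points / 2**20


print(cached_points_mib(FFT_LENGTH, PROOF_POINTS))   # 840.0
```

If the assumption holds, that is the same order of magnitude as the 1 GByte quoted above. It says nothing about whether the memory itself is faulty.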
Bur Volunteer tester
Joined: 25 Feb 20 Posts: 337 ID: 1241833 Credit: 41,186,971 RAC: 847,832
It has 32 GB of RAM, maybe the RAM has problems?
So even during the checkpoint compression stage it has RAM consumption higher than other subprojects?