Message boards :
Problems and Help :
Gerbicz error checking frequency?
Author |
Message |
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2468 ID: 29980 Credit: 449,105,070 RAC: 342,739
                           
|
How often does Gerbicz error checking take place while running a unit? Per iteration? Per n iterations? Per x% progress? If it varies by project, narrowing it to CUL/WOO would suffice for now.
Back story: I'm running an unstable system, which was found to be unstable during the previous challenge. I'm itching to get some more work done in this challenge, so I decided to run it while simultaneously trying to fix it. I know it isn't the best idea to do this on live data, but because the errors occur infrequently I thought I'd take that chance. The fix attempted last time, lowering the power limit, did not help at all and actually made it worse. So this time I'm going the opposite route: increasing the voltage offset. At the same power cap this might reduce average clocks, but I'd take stability over speed. I've been monitoring stderr while the first unit has been running, and roughly halfway through a Gerbicz error was detected. I just made the first voltage offset increase and will see if there are more. Knowing how often Gerbicz error checks are performed would tell me how much in-progress work might be done in a potential error state after I make configuration changes. | |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester
Joined: 1 Mar 14 Posts: 850 ID: 301928 Credit: 495,083,024 RAC: 300,345
                       
|
How often does Gerbicz error checking take place while running a unit?
The search space is split into 'ProofCount' blocks, and the integrity of each block is checked.
In your particular case, you can run manual tests with a very high ProofCount (you don't need to save checkpoints): -oProofCount=2048 -oGerbicz=1
Note that the Gerbicz check code has a different load pattern (CPU load and power usage) from standard crunching, so running Gerbicz checks too often may change the CPU temperature and hide possible problems. Look at the LLR status line ("xxxx checked") to find out when a check happened.
| |
|
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2468 ID: 29980 Credit: 449,105,070 RAC: 342,739
                           
|
Thanks. I see from the unit output that 128 blocks are used, so for that particular system on WOO tasks, a check is performed a bit more often than every 4 minutes. It isn't obvious to me whether, or where, I can check the LLR output of a live running task. | |
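The interval estimate above is just the task's runtime divided by the block count. A minimal sketch of that arithmetic, assuming the blocks are equal-sized and one check happens per block; the 8-hour runtime is an illustrative figure, not a number taken from the task output:

```python
def check_interval_minutes(runtime_minutes: float, proof_count: int) -> float:
    """Approximate minutes between Gerbicz checks, assuming equal-sized blocks."""
    if proof_count <= 0:
        raise ValueError("proof_count must be positive")
    return runtime_minutes / proof_count


# Illustrative numbers only: an 8-hour task split into 128 blocks.
interval = check_interval_minutes(runtime_minutes=8 * 60, proof_count=128)
print(f"{interval:.2f} minutes between checks")  # prints "3.75 minutes between checks"
```

By the same arithmetic, a higher ProofCount shortens the window of unchecked work proportionally.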
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester
Joined: 1 Mar 14 Posts: 850 ID: 301928 Credit: 495,083,024 RAC: 300,345
                       
|
It's not possible to see the output of a live BOINC task: the status line is consumed by the wrapper. You can only monitor stderr.txt for the presence of Gerbicz and ROUND OFF (0.5 and above) errors.
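Watching stderr.txt by hand can be replaced with a small scan. The sketch below is only an illustration: it assumes error lines contain the words "Gerbicz" or "ROUND OFF" (the exact wording LLR2 writes is not shown in this thread), and `find_error_lines` and `scan_stderr` are hypothetical helper names, not part of BOINC or LLR2:

```python
import re
from pathlib import Path
from typing import List

# Assumption: error lines mention "Gerbicz" or "ROUND OFF"; the exact
# message format is not shown in this thread, so this match is deliberately loose.
ERROR_PATTERN = re.compile(r"gerbicz|round\s*off", re.IGNORECASE)


def find_error_lines(text: str) -> List[str]:
    """Return the lines of stderr output that match the error pattern."""
    return [line for line in text.splitlines() if ERROR_PATTERN.search(line)]


def scan_stderr(path: str) -> List[str]:
    """Read a stderr.txt file (path supplied by the caller) and scan it."""
    return find_error_lines(Path(path).read_text(errors="replace"))
```

Call `scan_stderr` with the path to the running task's stderr.txt, wherever that file lives on your install. A loose match like this may also flag harmless lines that merely mention Gerbicz, so treat hits as a prompt to look at the file.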
When the challenge is over, you can run LLR2 as a separate task / stress test. An example for 4 threads and 1024 Gerbicz blocks:
llr2_whatever.exe -d -t4 -oProofCount=1024 -oGerbicz=1 -q"test"
The status line will contain the text "xxxx checked", where xxxx updates each time a Gerbicz block validates.
Edit: on Linux you may try using "peekfd" to attach to the existing LLR2 process and monitor its stdout but, as peekfd's man page says, "Don't be surprised if the process you are monitoring dies." | |
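If the status output of a standalone run is captured to a file or pipe, the "xxxx checked" text can be pulled out programmatically. This is a rough sketch under a loud assumption: that xxxx is a decimal number immediately before the word "checked". The real LLR2 status format may differ, and `last_checked_value` is a hypothetical helper:

```python
import re
from typing import Optional

# Assumption: "xxxx" is a decimal number directly before the word "checked";
# adjust the pattern if the actual status line looks different.
CHECKED_PATTERN = re.compile(r"(\d+)\s+checked")


def last_checked_value(status_text: str) -> Optional[int]:
    """Return the most recent 'xxxx checked' number found, or None if absent."""
    matches = CHECKED_PATTERN.findall(status_text)
    return int(matches[-1]) if matches else None
```

Comparing successive values shows when a new Gerbicz block has validated.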
|
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2468 ID: 29980 Credit: 449,105,070 RAC: 342,739
                           
|
Thanks for the info. The system hasn't produced any new errors since I increased the voltage offset, so I'm hoping that is enough. For now, an absence of errors is sufficient. If further investigation becomes necessary after the challenge is over, I'll keep that in mind. | |
|