Message boards :
Generalized Fermat Prime Search :
Validation inconclusive?
http://www.primegrid.com/workunit.php?wuid=272427839
That's pretty much it. I completed a Genefer_WR WU as a double-checker and apparently this is what it says.
Any idea what's up?
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,923,882 RAC: 275,501
That usually means the residuals from the two results didn't match, which in turn means that at least one of the two computers had an error. (It's possible, although unlikely, that both results are erroneous.)
A third task will be sent out to another computer, and hopefully its result will match one of the two original results. The two matching results will be declared correct and will receive credit. The remaining result, which doesn't match, was an error and won't receive credit.
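The matching rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: `validate`, the task ids, and the residue strings are hypothetical, and the real server-side validator is not shown in this thread.

```python
from collections import Counter

def validate(residues, quorum=2):
    """Compare residues reported for one workunit.

    residues: dict mapping a (hypothetical) task id to its residue string.
    Returns (status, valid_ids, invalid_ids).
    """
    if not residues:
        return "inconclusive", [], []
    # Find the residue reported most often and how many tasks reported it.
    residue, count = Counter(residues.values()).most_common(1)[0]
    if count < quorum:
        # No two results agree yet: another task has to be sent out.
        return "inconclusive", [], []
    valid = sorted(t for t, r in residues.items() if r == residue)
    invalid = sorted(t for t, r in residues.items() if r != residue)
    return "validated", valid, invalid
```

With two mismatching results the status is "inconclusive"; once a third result agrees with one of them, those two are valid and the odd one out is invalid.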
____________
My lucky number is 75898^524288+1
|
|
Oh, I see. Well, I certainly hope mine isn't the error. That's about 300 hours of computation wasted. :/
|
|
This just happened to one of my GFN WUs also. But the third result came in a couple of days ago.
http://www.primegrid.com/workunit.php?wuid=306227219
Now the other two results are listed as "Complete and validated" with credit and mine still just says "Completed, validation inconclusive" with "pending" credit.
Does that mean my result was the bad one? In the task log I do see "613282^524288+1 is complete. (3034401 digits) (err = 0.2188)", which I assume shows that the task did not run completely cleanly.
____________
Proud member of Team Aggie the Pew
"Wir müssen wissen. Wir werden wissen."
"We must know, we shall know."
- David Hilbert, 1930
|
|
Does that mean my result was the bad one? In the task log I do see "613282^524288+1 is complete. (3034401 digits) (err = 0.2188)", which I assume shows that the task did not run completely cleanly.
I think an err value greater than 0 is fine (look at the stderr output of the other two WUs: one has a higher value than yours and the other has the same value). Tasks only fail (are tagged as errors) if the err value is greater than 0.45, if I'm not mistaken.
However, even though your err value was within limits, your task reported a residue different from the other two because of some unknown error, so it did not validate.
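For anyone who wants to check their own task logs, here is a minimal sketch that parses a "is complete" line like the one quoted in this thread. The function name `check_line` is hypothetical, the 0.45 limit is the one mentioned in this thread, and the digit check uses the standard formula floor(N * log10(b)) + 1. Passing both checks does not prove the residue is right; only validation against another result does that.

```python
import math
import re

# Matches lines such as:
#   613282^524288+1 is complete. (3034401 digits) (err = 0.2188)
LINE = re.compile(
    r"(\d+)\^(\d+)\+1 is complete\. \((\d+) digits\) \(err = ([0-9.]+)\)"
)
MAX_ERR = 0.45  # limit quoted in this thread, not read from the program

def check_line(line):
    """Parse one log line; return None if it does not match."""
    m = LINE.search(line)
    if m is None:
        return None
    base, exponent = int(m[1]), int(m[2])
    digits, err = int(m[3]), float(m[4])
    # Number of decimal digits of base^exponent + 1.
    expected_digits = math.floor(exponent * math.log10(base)) + 1
    return {
        "base": base,
        "exponent": exponent,
        "err": err,
        "err_ok": err <= MAX_ERR,
        "digits_ok": digits == expected_digits,
    }
```

Calling `check_line` on the line from the task log above gives `err_ok` and `digits_ok` both true, which fits the point made here: the err value was within limits, and the task still failed to validate.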
____________
676754^262144+1 is prime
|
|
It looks like my Genefer tasks have no chance of validating if they are suspended during execution:
http://www.primegrid.com/result.php?resultid=422571085
http://www.primegrid.com/result.php?resultid=422581912
http://www.primegrid.com/result.php?resultid=422967245
1 task was marked as invalid after revalidation,
2 tasks are waiting for revalidation...
Michael Goetz, could there be a problem with task initialization after resumption?
Since moving to N=20, I have not been able to finish any task without a suspension:
Estimated total run time for 11512^1048576+1 is 13:03:41
My last successful task was
http://www.primegrid.com/result.php?resultid=422831388
which had no suspensions and ran from start to finish in one go.
P.S.: there are 3 different ways the tasks may be suspended:
1) manually suspended tasks
2) Suspend GPU from BOINC menu or tray icon
3) rebooting or shutting down the OS...
Do you know how Genefer responds to the terminate message sent to all applications after a 'shutdown' or 'reboot' command?
|
Michael Goetz Volunteer moderator Project administrator
It looks like my Genefer tasks have no chance of validating if they are suspended during execution:
http://www.primegrid.com/result.php?resultid=422571085
http://www.primegrid.com/result.php?resultid=422581912
http://www.primegrid.com/result.php?resultid=422967245
1 task was marked as invalid after revalidation,
2 tasks are waiting for revalidation...
In the first task, your GPU produced the wrong result. See below for the explanation.
In the other two tasks, it's not yet determined whether the result you returned was correct, but judging by the residual values sent back by your computer those will be marked invalid as well.
Michael Goetz, could there be a problem with task initialization after resumption?
Not as far as I know. Tasks get restarted very, very often. On systems with multiple GPUs of different types, the tasks even start on one type of GPU and finish on another type of GPU and complete successfully.
However, as I'm sure you're aware, overclocking does not work very well with GeneferCUDA and your GTX 460 is significantly overclocked. If you look at the tasks where you had computation errors, you'll see errors that look like this:
maxErr exceeded for 685676^524288+1, 0.5000 > 0.4500
That is the type of error you see when overclocking is causing calculation errors. You will note that this is one of the older n=19 tasks. The newer n=20 tasks are longer, and therefore are even more likely to fail due to overclocking.
Reduce the clock rates on your GPU -- especially the memory clock -- and that should eliminate the source of the errors.
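As a rough illustration of where a figure like "0.5000 > 0.4500" can come from: FFT-based multiplication runs in floating point, so each output coefficient should land very close to an integer, and the distance to the nearest integer is a natural error measure. The sketch below squares a small number this way in pure Python. It is a teaching example under those assumptions, not GeneferCUDA's code; `fft`, `square_with_roundoff`, and the base-10 digit split are hypothetical choices made for readability.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def square_with_roundoff(x, base=10):
    """Square a non-negative int via FFT; return (result, max round-off error)."""
    digits = []
    t = x
    while t:
        t, d = divmod(t, base)
        digits.append(d)          # least significant digit first
    digits = digits or [0]
    size = 1
    while size < 2 * len(digits):  # room for the full product
        size *= 2
    padded = [complex(d) for d in digits] + [0j] * (size - len(digits))
    spectrum = fft(padded)
    conv = [v.real / size for v in fft([s * s for s in spectrum], invert=True)]
    # Each coefficient should be an integer; measure how far off it is.
    max_err = max(abs(v - round(v)) for v in conv)
    # Python ints absorb the carries when the coefficients are summed.
    result = sum(round(v) * base**i for i, v in enumerate(conv))
    return result, max_err
```

On healthy hardware and small inputs the measured error is tiny. A check of the form "error greater than 0.45" only fires when a coefficient ends up nowhere near an integer.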
____________
My lucky number is 75898^524288+1
|
Michael Goetz Volunteer moderator Project administrator
It looks like my Genefer tasks have no chance of validating if they are suspended during execution:
In order to verify that it's the overclocking and not the restart, here's one of yours that failed without restarting:
http://www.primegrid.com/result.php?resultid=422581929
____________
My lucky number is 75898^524288+1
|
|
I know, but none of the 8 completed tasks shows "maxErr exceeded":
http://www.primegrid.com/results.php?hostid=156165&offset=0&show_names=0&state=2&appid=16
423734155: 18678^1048576+1 is complete. (4478815 digits) (err = 0.0003)
423102783: 12792^1048576+1 is complete. (4306438 digits) (err = 0.0002)
423095982: 15496^1048576+1 is complete. (4393764 digits) (err = 0.0002)
422967245: 11636^1048576+1 is complete. (4263305 digits) (err = 0.0000)
422581912: 8214^1048576+1 is complete. (4104709 digits) (err = 0.0000)
422226763: 9092^1048576+1 is complete. (4150956 digits) (err = 0.0001)
421722159: 4550^1048576+1 is complete. (3835703 digits) (err = 0.0000)
421722120: 4398^1048576+1 is complete. (3820230 digits) (err = 0.0000)
2 of them are waiting for revalidation.
I am afraid that all the other tasks will meet the same fate.
From the point of view of probability theory, how is it possible to get 8 incorrect residues instead of maxErr errors?
Can you see that these tasks returned incorrect residues?
Maybe these values are trivial, so that no check is even needed to say the tasks are incorrect?
Actually, I've already excluded one of my twin GTX 460s (--device 1) from the Genefer subproject, because I noticed that most tasks ended with maxErr precisely after that card had worked on them.
The remaining GPU works better and maxErr has almost disappeared, but...
|
Michael Goetz Volunteer moderator Project administrator
I know, but none of the 8 completed tasks shows "maxErr exceeded":
From the point of view of probability theory, how is it possible to get 8 incorrect residues instead of maxErr errors?
Absolutely.
MaxErr is NOT designed to capture this error; it just happens to do so most of the time.
Your cards are overclocked and producing calculation errors. Some of these errors are caught by the maxErr detection; some are not. The errors that aren't caught make it through to the validation stage, where they are eventually rejected. But whether the error is caught by maxErr or not, the results are invalid.
There's no mystery here. There's no coincidence here. Overclocked GPUs don't work well with GeneferCUDA and produce exactly the errors you're seeing. You need to reduce the clock speeds or choose a less demanding sub-project to run.
I also have a 460, but mine's running at stock clock speeds. Generally I have a 100% success rate. It takes my GTX 460 2 or 3 hours longer to run a WU than your GTX 460 -- but my results validate and I get credit. Yours finish sooner, but don't validate, and there's no credit.
At stock clock speeds, you get valid results. At overclocked speeds, you get errors. It's really that simple.
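To see why a round-off check cannot catch every hardware error, consider this minimal sketch. It assumes the check looks only at how far each floating-point value sits from the nearest integer; `roundoff_ok` and the sample values are hypothetical and are not taken from GeneferCUDA.

```python
MAX_ERR = 0.45  # limit shown in the "maxErr exceeded" message quoted earlier

def roundoff_ok(values, limit=MAX_ERR):
    """Return True when every value is within `limit` of an integer."""
    return max(abs(v - round(v)) for v in values) <= limit

# Values a healthy card might produce: they round to 12, 8, 3.
correct = [12.0001, 7.9998, 3.0002]

# A corrupted value that lands halfway between integers: flagged.
caught = [12.0001, 7.5000, 3.0002]

# A corrupted value that lands near the wrong integer: rounds to 10
# instead of 8, yet the round-off check sees nothing unusual.
missed = [12.0001, 9.9998, 3.0002]
```

Here `roundoff_ok(caught)` is false while `roundoff_ok(missed)` is true, even though `missed` rounds to different integers than `correct`. Errors of the second kind pass the check and only show up later as a residue mismatch during validation.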
____________
My lucky number is 75898^524288+1
|
|
I investigated how the estimated total run time for 18678^1048576+1 depends on the GPU memory clock and the GPU core clock on my GTX 460, and this is what I found:
[Figure: estimated time vs. GPU core clock (GPU memory clock held constant)]
[Figure: estimated time vs. GPU memory clock (GPU core clock held constant)]
I have not yet decided which values to downclock the GPU to, but if the memory clock needs to be lowered, I suppose 1600 MHz is the best choice.