http://www.primegrid.com/workunit.php?wuid=288850681
Error = "Too many total results"
I wonder if this is a valid reason to abandon a B value. I've seen >20 attempts for some other B values. |
|
|
|
Although it may look abandoned right now, before long the upper limit on the number of times it can be sent out will be automatically raised to 35 (I believe). So it's not really going to be abandoned in the end.
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 476,993,571 RAC: 281,271
|
The b values are never abandoned. The only reason they would be abandoned would be if there was some mathematical or computational reason why that value could not be tested. Merely being unlucky enough to have been sent to 15, 35, 50, 60, or more hosts that were not properly configured for this job is not sufficient reason to stop processing a number.
Don't forget that there are still a lot of computers out there running buggy drivers which will fail every CUDA WU they ever get. Fortunately, the number of "100% failure rate" computers is dropping, and we're not seeing as many WUs with dozens of failures as we used to.
Whether the b value takes only 2 computers to perform 2 valid tests or 100 computers to perform 2 valid tests depends solely on the computers it's sent to, not the number.
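As a rough illustration of that point, here is a minimal sketch of how many tasks a WU needs on average before it collects 2 valid results. It assumes every host succeeds independently with the same probability, which real hosts don't; the function name and the success rates are made up for the example.

```python
def expected_tasks(quorum: int, success_rate: float) -> float:
    """Average number of tasks sent before `quorum` valid results come back.

    Assumption: each host returns a valid result independently with
    probability `success_rate` (mean of a negative binomial distribution).
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return quorum / success_rate


if __name__ == "__main__":
    # Hypothetical success rates, not measured PrimeGrid figures.
    for rate in (1.0, 0.5, 0.1):
        print(f"success rate {rate:.0%}: about {expected_tasks(2, rate):.0f} tasks")
```

The b value never appears in the calculation; only the hosts' success rate does.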
____________
My lucky number is 75898^524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
|
I forgot to mention the most important part. Since you've completed processing that WU, if the WU were abandoned now, you would miss out on the 300,000+ credits you will get when the result you returned is verified by another computer.
The number of allowable tasks for that WU will be automatically increased from 15 to 60 within the next 24 hours. We used to bump them to 35 or 50, but the 295/296 driver bug was causing so many errors even 50 wasn't enough.
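A minimal sketch of that bump, assuming a simplified work unit record. The class, field names and function below are hypothetical and only mirror the numbers mentioned in this thread (limit of 15, raised to 60, quorum of 2); they are not PrimeGrid's actual server code.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    total_results: int           # tasks created so far (valid, failed, pending)
    valid_results: int           # results that validated
    max_total_results: int = 15  # initial limit quoted in the thread
    quorum: int = 2              # valid results needed to finish the WU

    def needs_more_tasks(self) -> bool:
        return self.valid_results < self.quorum

    def can_send(self) -> bool:
        # No more tasks go out once the limit is hit ("Too many total results").
        return self.needs_more_tasks() and self.total_results < self.max_total_results


def raise_limit(wu: WorkUnit, new_limit: int = 60) -> WorkUnit:
    """Lift the limit on a WU that hit it without reaching quorum."""
    if wu.needs_more_tasks() and wu.total_results >= wu.max_total_results:
        wu.max_total_results = max(wu.max_total_results, new_limit)
    return wu


if __name__ == "__main__":
    wu = WorkUnit(total_results=15, valid_results=1)
    print(wu.can_send())   # stuck at the initial limit
    raise_limit(wu)
    print(wu.can_send())   # tasks can go out again
```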
____________
My lucky number is 75898^524288+1 |
|
|
|
That sounds good.
I was afraid that there was a more morbid reason (I actually observed this while browsing some other computers' tasks - those that were wingmen and whatnot): I've seen a task that was finished but then deemed unneeded (probably the concurrently running sieve eliminated that particular task).
If this is just an initial replication limit hit, then that's fine. Sanity limits are of course needed - just in case a program would totally reproducibly fail even 100 times on perfectly working cards. It is a much better strategy to table it, debug and then add more WUs.
Thanks, I learned something new. :-)
____________
My unrelatedly lucky number is 2186*7^739474 - 1 |
|
|
Michael Goetz Volunteer moderator Project administrator
|
I was afraid that there was a more morbid reason (I actually observed this while browsing some other computers' tasks - those that were wingmen and whatnot): I've seen a task that was finished but then deemed unneeded (probably the concurrently running sieve eliminated that particular task).
There's no such state as "finished and unneeded" as far as I know. Could you provide specific details?
We do not kill in-progress WUs because they've been removed by the sieve. What we may sometimes do, if a WU has one result returned and is then removed by the sieve, is lower the quorum to 1 -- this allows the returned result to get credit while not sending out any more tasks if we already know it's composite. Note that with n=19 sieving completed, and n=22 sieving at a very high P, this is not likely to happen often in the future.
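The quorum rule described above can be sketched as follows. This is only an illustration under the assumption of a default quorum of 2; the function and parameter names are hypothetical.

```python
def quorum_after_sieve(returned_results: int, removed_by_sieve: bool, quorum: int = 2) -> int:
    """Return the quorum to use for a WU after a sieve update.

    If the candidate was removed by the sieve and exactly one result has
    already come back, the quorum drops to 1 so that result can be credited
    without sending out any more tasks. Otherwise the quorum is unchanged.
    """
    if removed_by_sieve and returned_results == 1:
        return 1
    return quorum


if __name__ == "__main__":
    print(quorum_after_sieve(returned_results=1, removed_by_sieve=True))   # 1
    print(quorum_after_sieve(returned_results=1, removed_by_sieve=False))  # 2
```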
If this is just an initial replication limit hit, then that's fine.
Yup.
There's an awful lot of computers out there that regularly fail ALL (or many) CUDA tasks of any type because they're configured incorrectly. There are additional computers that fail GeneferCUDA (especially the long WR tasks) because of overclocking and overheating problems. Throw in that Nvidia driver bug (yes, many are still using it!) and we've been chewing through tasks like a drunken Florida voter going through hanging chads on election day. :)
____________
My lucky number is 75898^524288+1 |
|
|
|
I was afraid that there was a more morbid reason (I actually observed this while browsing some other computers' tasks - those that were wingmen and whatnot): I've seen a task that was finished but then deemed unneeded (probably the concurrently running sieve eliminated that particular task).
There's no such state as "finished and unneeded" as far as I know. Could you provide specific details?
I'm going to assume that serge is referring to a unit like this one.
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
|
|
|
Michael Goetz Volunteer moderator Project administrator
|
I'm going to assume that serge is referring to a unit like this one.
He specifically said "finished but then deemed unneeded" and the task you're referring to was never sent out. It was never started, let alone finished.
But if that is what he's referring to, yes, that happens, but it simply means there's one extra row in a database table that's got lots and lots of rows.
That's why I was asking him for details. If it's this, fine. If it's something else, it might need to be looked into.
____________
My lucky number is 75898^524288+1 |
|
|
|
Oh. I didn't think it was important.
I'll have to try to retrace my steps, or maybe I took a note somewhere ...but my notes are a mess. I vaguely remember that it might have been a genefer non-WR WU (and a bit after the challenge was closed). |
|
|
|
P.S. It could have been this one - I've bookmarked it (on July 09):
http://www.primegrid.com/workunit.php?wuid=285850652
I remember that there was an unusual message in it, like "Completed, can't validate" or maybe "Completed, not needed" and credit was 0.0.
The link leads nowhere at this moment. I couldn't find another instance using random walks over random WUs. There's nothing to debug; could have been a glitch. |
|
|
Michael Goetz Volunteer moderator Project administrator
|
P.S. It could have been this one - I've bookmarked it (on July 09):
http://www.primegrid.com/workunit.php?wuid=285850652
I remember that there was an unusual message in it, like "Completed, can't validate" or maybe "Completed, not needed" and credit was 0.0.
The link leads nowhere at this moment. I couldn't find another instance using random walks over random WUs. There's nothing to debug; could have been a glitch.
It wasn't that WU.
285850652 was a WR task (B=4788) that had 2 completed results and 11 total results. Since it never got to 15 tasks, it would never have shown the "too many tasks" message.
____________
My lucky number is 75898^524288+1 |
|
|