Message boards :
Generalized Fermat Prime Search :
Exited with zero status but no finished file
Dave
Joined: 13 Feb 12 Posts: 3253 ID: 130544 Credit: 2,431,231,788 RAC: 4,061,228
Common issue, I think, but one of my cards (it can be either) keeps revving down for 10 minutes, and when the task restarts it returns the error in the title in the BOINC Manager log. I don't want to reset the project. I've underclocked the core and memory, and I even have a desktop fan blowing at the PC to keep temperatures below (or a little over) 80 °C. AVG is set to avoid the relevant folders, and Identity Protection is off. I also left a CPU core free, but it made no difference. I've tried virtually every trick in the book.
Dodgy unit?
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,051,011 RAC: 285,770
Do you have a link to one of the failed tasks?
____________
My lucky number is 75898^524288+1
Dave
Joined: 13 Feb 12 Posts: 3253 ID: 130544 Credit: 2,431,231,788 RAC: 4,061,228
Definitely this one:
http://www.primegrid.com/result.php?resultid=595428852
It jumped between cards, so it's not a card or temperature problem (the top card runs 5 to 10 °C warmer than the bottom one).
The Genefer text file in the slot directory has reported about 10 maxErr errors so far.
Dave
Joined: 13 Feb 12 Posts: 3253 ID: 130544 Credit: 2,431,231,788 RAC: 4,061,228
It just errored out after hitting the 6-retry limit. I guess I was just unlucky.
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,051,011 RAC: 285,770
It's hard to say exactly what the problem was, because there are two different types of failures in that log.
The first is a GPU or driver crash. That has an almost infinite number of causes, and it's impossible to diagnose remotely. The only way to find the cause is to systematically remove possible causes, such as "don't run this", "don't do that", or removing one of the video cards, and so on. If you make a change and the "cufft error 6" messages stop appearing, then you may have found the culprit.
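To make that elimination process a little more systematic, one option is to tally the error messages in a task's log before and after each change. The helper below is a hypothetical sketch, not a PrimeGrid or BOINC tool. Its regular expressions are assumptions built only from the fragments quoted in this thread ("cufft error 6" and "128.000 > 0.4500"), and the check for the word "maxerr" on the same line is also assumed, so real log lines may need different patterns.

```python
import re
from collections import Counter

# Assumed patterns, built only from fragments quoted in this thread; real
# Genefer/BOINC log lines may be worded differently.
CUFFT_RE = re.compile(r"cufft error\s+(\d+)", re.IGNORECASE)
MAXERR_RE = re.compile(r"(\d+\.\d+)\s*>\s*(\d+\.\d+)")


def tally_errors(lines):
    """Count cuFFT errors and maxErr failures; also return the worst maxErr seen."""
    counts = Counter()
    worst = 0.0
    for line in lines:
        m = CUFFT_RE.search(line)
        if m:
            counts[f"cufft error {m.group(1)}"] += 1
            continue
        m = MAXERR_RE.search(line)
        # Assumption: maxErr failure lines contain the word "maxerr" (any case).
        if m and "maxerr" in line.lower():
            counts["maxErr exceeded"] += 1
            worst = max(worst, float(m.group(1)))
    return counts, worst


def tally_file(path):
    """Run tally_errors over a saved log file (the path is up to you)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return tally_errors(f)
```

For example, paste the stderr output from a task's result page into a text file, call `tally_file()` on it, make one change, and compare the counts for the next task. If the "cufft error" count drops to zero after a change, that change is a candidate culprit.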
The second set of errors, the maxerr exceeded errors, may be indicative of a GPU/overclocking/overheating problem -- but it might just be a side effect of the earlier GPU/driver crashes. Normally, you see something like "0.500 > 0.450". I don't think I've ever seen "128.000 > 0.4500" before. So my guess is that this problem is caused by the previous crashes.
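For readers wondering what the maxErr numbers measure: the "cufft" messages show the GPU code relies on FFTs, and in FFT-based multiplication each product is computed in floating point and should land very close to an integer. A common sanity check records the largest distance of any output from its nearest integer and aborts if that exceeds a limit. The sketch below shows this generic check on a toy squaring. It is an illustration under that assumption, not Genefer's actual code. The names `square_with_maxerr`, `check` and `MAXERR_LIMIT` are hypothetical, and the 0.45 limit only echoes the "0.500 > 0.450" example above.

```python
import cmath

MAXERR_LIMIT = 0.45  # hypothetical limit, echoing the "0.500 > 0.450" example


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def square_with_maxerr(digits):
    """Square a number given as little-endian digits via FFT convolution.

    Returns (integer convolution coefficients, maxErr), where maxErr is the
    largest distance of any raw floating-point output from the nearest integer.
    """
    n = 1
    while n < 2 * len(digits):  # room for the full product, no wraparound
        n *= 2
    a = [complex(d) for d in digits] + [0j] * (n - len(digits))
    fa = fft(a)
    sq = fft([x * x for x in fa], invert=True)
    raw = [v.real / n for v in sq]  # inverse FFT needs the 1/n scaling
    rounded = [round(v) for v in raw]
    max_err = max(abs(v - r) for v, r in zip(raw, rounded))
    return rounded, max_err


def check(max_err, limit=MAXERR_LIMIT):
    """Abort the computation if the round-off error is suspiciously large."""
    if max_err > limit:
        raise ArithmeticError(f"maxErr exceeded: {max_err:.3f} > {limit:.3f}")
```

Because `round()` picks the nearest integer, an error measured this way can never exceed 0.5 for finite outputs. If Genefer's maxErr is defined the same way (not confirmed here), a reading like 128.000 would point to corrupted data rather than ordinary round-off, which fits the guess above.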
Figure out what's causing the GPU/driver to crash and both problems may be solved.