Message boards :
Sieving :
Invalid Command queue error?
Author |
Message |
tng Send message
Joined: 29 Aug 10 Posts: 465 ID: 66603 Credit: 45,659,905,036 RAC: 23,802,726
                                                   
|
I have begun getting errors like this on the system that I have running manual sieving:
Error clEnqueueWriteBuffer on device 20P/day) Found 713 ETA 39h00m
Error code -36, message: Invalid command queue
GPU Error -36 @ 2426: Invalid command queu
The errors are occurring on this system, running six instances of GFN-18 manual sieving. It seems to happen to two instances at a time, sometimes on the same card and sometimes not. They restart just fine. The interval between occurrences seems to be hours.
Any ideas what is going on?
____________
| |
|
JimB Honorary cruncher Send message
Joined: 4 Aug 11 Posts: 918 ID: 107307 Credit: 977,945,376 RAC: 0
                     
|
No idea here. I'm always tempted to reboot machines having problems like that - a reboot can fix a lot of problems. The other thing that comes to mind is the power supply: whether it's still putting out its rated power and can handle that number of cards. | |
|
Dave  Send message
Joined: 13 Feb 12 Posts: 3171 ID: 130544 Credit: 2,233,021,669 RAC: 614,280
                           
|
...Congrats on 13B total by the way. | |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester Send message
Joined: 1 Mar 14 Posts: 1022 ID: 301928 Credit: 543,195,386 RAC: 2
                        
|
I think it's a driver bug. Running 6 instances may expose hidden driver bugs or race conditions that do not reveal themselves with 1-2 tasks.
Try decreasing the 'B' parameter by one step. The different timings and lower GPU resource requirements may change the behavior of the driver.
| |
|
tng Send message
Joined: 29 Aug 10 Posts: 465 ID: 66603 Credit: 45,659,905,036 RAC: 23,802,726
                                                   
|
OK -- thinking about it, I had something weird on this system before (driven from my mind by problems with another system since). At the beginning of this month, when I first started running manual sieving on this system, the application would report that the system had 4 RTX 2070s, not 2. Only 2 showed up in Device Manager. Reinstalling drivers got rid of that, although I think I had to do it a couple of times. I'm going to try a clean installation of the latest drivers and see if that fixes this.
If not, are utilities like DDU worth trying, or should I just go straight to a clean OS install (just a crunchbox at this point, so that's just an annoying expenditure of time)?
____________
| |
|
Monkeydee Volunteer tester
 Send message
Joined: 8 Dec 13 Posts: 532 ID: 284516 Credit: 1,435,131,127 RAC: 1,804,590
                           
|
If not, are utilities like DDU worth trying, or should I just go straight to a clean OS install (just a crunchbox at this point, so that's just an annoying expenditure of time)?
DDU is definitely worth trying before resorting to a full OS install.
In your case the install difficulty is minimal, but still time consuming. DDU would be a minimal time investment that could prevent a further time investment in an OS reinstall.
____________
My Primes
Badge Score: 4*2 + 6*2 + 7*4 + 8*8 + 11*3 + 12*1 = 157
| |
|
tng Send message
Joined: 29 Aug 10 Posts: 465 ID: 66603 Credit: 45,659,905,036 RAC: 23,802,726
                                                   
|
If not, are utilities like DDU worth trying, or should I just go straight to a clean OS install (just a crunchbox at this point, so that's just an annoying expenditure of time)?
DDU is definitely worth trying before resorting to a full OS install.
In your case the install difficulty is minimal, but still time consuming. DDU would be a minimal time investment that could prevent a further time investment in an OS reinstall.
OK -- I'll try that first if the driver reinstall doesn't resolve the problem.
____________
| |
|
tng Send message
Joined: 29 Aug 10 Posts: 465 ID: 66603 Credit: 45,659,905,036 RAC: 23,802,726
                                                   
|
Update:
Just checked and had the same error on three instances and a different one on one instance:
Error clGetEventInfo on device 1 (131.7P/day) Found 6465 ETA 42h22m
Error code -5, message: Out of resources
GPU Error -5 @ 2365: Out of resources
Ran out of GPU RAM, maybe?
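For reference, the two numeric codes in these logs are standard OpenCL status values. Here is a minimal lookup sketch in Python (the table is only a small subset, and the helper name `describe_cl_error` is made up for illustration, not part of the sieve program). Note that -5 (`CL_OUT_OF_RESOURCES`) is a different code from -4 (`CL_MEM_OBJECT_ALLOCATION_FAILURE`), so it does not necessarily mean the card ran out of RAM.

```python
# Subset of OpenCL status codes, with the values defined in the Khronos CL/cl.h header.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -36: "CL_INVALID_COMMAND_QUEUE",
}


def describe_cl_error(code: int) -> str:
    """Return the symbolic name for an OpenCL status code, if it is in the table."""
    return OPENCL_ERRORS.get(code, f"unknown OpenCL status {code}")


# The two codes reported in this thread.
for code in (-36, -5):
    print(code, describe_cl_error(code))
```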
____________
| |
|
|
I think you should reduce your clock and memory speeds. Those errors look like what comes up when the card is pushed to its limits. Reduce them a bit and see if the errors go away.
I learned from Azmodes that if you get errors like that, you reduce clocks and memory till they go away. Conversely, you can increase clocks till those errors occur, then reduce till they subside. We learned that for sieving, high clock speed is obviously best, but low memory speed is also fine and doesn't affect sieving performance. | |
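The "reduce till the errors go away" routine can be written down as a simple loop. This is only a sketch in Python: `find_stable_offset` and `fake_check` are hypothetical names, and in practice the stability check would be a person running the sieve for several hours at each setting and watching for errors, not a function call.

```python
from typing import Callable, Optional


def find_stable_offset(start_mhz: int, floor_mhz: int, step_mhz: int,
                       is_stable: Callable[[int], bool]) -> Optional[int]:
    """Lower the clock offset one step at a time until is_stable() passes.

    Returns the first (highest) offset that passed, or None if nothing
    down to floor_mhz was stable.
    """
    if step_mhz <= 0:
        raise ValueError("step_mhz must be positive")
    offset = start_mhz
    while offset >= floor_mhz:
        if is_stable(offset):
            return offset
        offset -= step_mhz
    return None


# Hypothetical stand-in for "sieve for a few hours at this offset, no errors seen".
def fake_check(offset_mhz: int) -> bool:
    return offset_mhz <= 50


print(find_stable_offset(150, -100, 25, fake_check))
```

The same loop covers memory clocks; since memory speed reportedly has little effect on sieving throughput, it makes sense to lower that first.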
|
tng Send message
Joined: 29 Aug 10 Posts: 465 ID: 66603 Credit: 45,659,905,036 RAC: 23,802,726
                                                   
|
I think you should reduce your clock and memory speeds. Those errors look like what comes up when the card is pushed to its limits. Reduce them a bit and see if the errors go away.
I learned from Azmodes that if you get errors like that, you reduce clocks and memory till they go away. Conversely, you can increase clocks till those errors occur, then reduce till they subside. We learned that for sieving, high clock speed is obviously best, but low memory speed is also fine and doesn't affect sieving performance.
Will try that if needed, but I haven't overclocked these cards, and one is definitely at stock clocks (the other came with the system). Also, the errors are occurring on both cards. Will at least reinstall drivers first.
It really seems to act like a memory leak -- I stopped and restarted all sieving tasks last night, and no problems so far (maybe 20-22 hours).
____________
| |
|
|
I think you should reduce your clock and memory speeds. Those errors look like what comes up when the card is pushed to its limits. Reduce them a bit and see if the errors go away.
I learned from Azmodes that if you get errors like that, you reduce clocks and memory till they go away. Conversely, you can increase clocks till those errors occur, then reduce till they subside. We learned that for sieving, high clock speed is obviously best, but low memory speed is also fine and doesn't affect sieving performance.
Will try that if needed, but I haven't overclocked these cards, and one is definitely at stock clocks (the other came with the system). Also, the errors are occurring on both cards. Will at least reinstall drivers first.
It really seems to act like a memory leak -- I stopped and restarted all sieving tasks last night, and no problems so far (maybe 20-22 hours).
I figured nothing was overclocked. You've sieved more than just about everyone else here... Maybe it can't handle that many instances at once... | |
|
tng Send message
Joined: 29 Aug 10 Posts: 465 ID: 66603 Credit: 45,659,905,036 RAC: 23,802,726
                                                   
|
It appears that this may have been due to a physical misconfiguration. When I installed the second card, the system had two PCI power cables available, which were bundled together and had 6+2 connectors. The card required one 8-pin connector. I plugged in the 6-pin connector from one cable and the 2-pin connector from the other.
Since I fixed this, no errors.
There's still an issue with this system. The reason I am running 3 sieving reservations on each card is that I can't get the expected performance without doing that. Running one or two sieving tasks per card, I get just under 170P/day for each task. Running three per card, performance is slightly less per task, which puts it in line with expectations.
Still trying to figure that out.
____________
| |
|
|
It appears that this may have been due to a physical misconfiguration. When I installed the second card, the system had two PCI power cables available, which were bundled together and had 6+2 connectors. The card required one 8-pin connector. I plugged in the 6-pin connector from one cable and the 2-pin connector from the other.
Since I fixed this, no errors.
There's still an issue with this system. The reason I am running 3 sieving reservations on each card is that I can't get the expected performance without doing that. Running one or two sieving tasks per card, I get just under 170P/day for each task. Running three per card, performance is slightly less per task, which puts it in line with expectations.
Still trying to figure that out.
GFN 18 requires a lot more CPU power to feed faster GPUs, and one instance will use at most 1 full CPU core. If it's a Windows system, look in Task Manager and see how much CPU power the sieve program is using. | |
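To put numbers on the CPU check: Task Manager's overall CPU percentage is a share of all logical processors, so one fully loaded core shows up as 100/N percent on an N-thread machine. A rough sketch of that arithmetic in Python follows; the function names are made up for illustration, and the sample figures are hypothetical, not measurements from this system.

```python
def percent_for_one_core(logical_cpus: int) -> float:
    """Overall CPU percentage that corresponds to one fully busy logical CPU."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return 100.0 / logical_cpus


def cores_used(cpu_seconds_start: float, cpu_seconds_end: float,
               wall_seconds: float) -> float:
    """Average number of cores a process kept busy between two readings
    of its accumulated CPU time (e.g. Task Manager's 'CPU time' column)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return (cpu_seconds_end - cpu_seconds_start) / wall_seconds


# Hypothetical readings: CPU time went from 100.0 s to 158.5 s over 60 s of wall time.
print(percent_for_one_core(8))
print(cores_used(100.0, 158.5, 60.0))
```

If a sieve instance sits close to 1.0 cores used, it is likely limited by the CPU and not the GPU, which would fit with needing a third instance per card to reach the expected throughput.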
|
|
I was getting these same errors when trying to run multiple instances of msieves on a couple cards as well. I dunno what causes it but it seems some configurations have issues with several instances.
I tried reducing clocks to stock speeds and got the same errors so it had nothing to do with speeds.
The lower ranges seem to be tricky to sieve if you have a fast card and want to max out your P/day performance. | |
|
mikey Send message
Joined: 17 Mar 09 Posts: 1652 ID: 37043 Credit: 733,524,559 RAC: 85,491
                     
|
OK -- thinking about it, I had something weird on this system before (driven from my mind by problems with another system since). At the beginning of this month, when I first started running manual sieving on this system, the application would report that the system had 4 RTX 2070s, not 2. Only 2 showed up in Device Manager. Reinstalling drivers got rid of that, although I think I had to do it a couple of times. I'm going to try a clean installation of the latest drivers and see if that fixes this.
If not, are utilities like DDU worth trying, or should I just go straight to a clean OS install (just a crunchbox at this point, so that's just an annoying expenditure of time)?
Get a free imaging program and make an image of your clean install once it's back up again. That way, next time you don't have to go through the whole process; just reload the image. I do that on all of my BOINC-only machines and it's much faster. I use Macrium Reflect, but even the Linux-based Clonezilla will work.
| |
|
darkclown Volunteer tester Send message
Joined: 3 Oct 06 Posts: 328 ID: 3605 Credit: 1,422,865,129 RAC: 337,605
                         
|
You can also use Microsoft's Media Creation Tool to create a USB key of install media based on your install. Helpful once you have a clean image. | |
|