Message boards :
AP26 - AP27 Search :
Scheduler Wait
Author |
Message |
|
Hi,
I've been trying to get my computer to run AP27 GPU tasks, but they only run for a few seconds then go into Scheduler Wait state for a while before the task reinitialises and starts again from zero percent done. This repeats forever and I end up having to abort the tasks. From what I can gather on other BOINC sites, Scheduler Wait means it's waiting for more memory, though I can't find any exact matches on the forum here.
My GPU is an AMD/ATI Radeon (Turks) HD6670 with 2Gb RAM on board. BOINC reports it with CAL version 1.4.1848, 2048MB, 1970MB available, 1632GFlops peak, driver version 1800.11, OpenCL 1.2 AMD-APP.
Other PrimeGrid GPU tasks run fine; only AP27 seems to have this problem. I know it says AP27 tasks need 1.5Gb+ of VRAM, but as mine has more than this I thought it might be ok. I've tried this in the past and given up, but thought I'd revisit it with the Wallis is Born Challenge coming up. I was hoping it might have been a bug which has been fixed by now, but apparently not. Is there a setting I'm missing somewhere to get this working, or is the HD6670 just a non-starter?
Thanks,
Gary.
| |
|
JimB Volunteer moderator Project administrator Project developer Send message
Joined: 4 Aug 11 Posts: 892 ID: 107307 Credit: 868,280,498 RAC: 716,156
                    
|
In the BOINC Manager, on the Disk and Memory tab, have a look at the memory section. Mine, for example, says BOINC can use 50% of the RAM if the computer is in use and 90% if it's not in use. As my computers all have at least 12GB of RAM, this never affects me. But it could explain what's happening to you. If your settings are similar, I'd raise the percentage of memory available to BOINC while your computer is in use. I'd also look at the Computing tab and check the "When to Suspend" settings. I've got everything in there unchecked, so I don't think my system ever hits the "in use" state.
I'm not sure this will fix your problem, but it's where I'd start looking. | |
|
|
Thanks for the suggestions JimB. My memory usage options are set at 95% when in use and 99% when idle (I don't tend to do much else with this particular computer, so just let it sit doing BOINC work most of the time and have most of the memory & cpu to itself). The BOINC log reports 3.86Gb of physical memory available (non-GPU) and 1.91Gb virtual. Plenty of disk space too (48.99Gb free). I've ticked the boxes to run cpu and gpu tasks while the computer is in use as well, so it shouldn't be suspending them unless non-BOINC work exceeds 80%, which it occasionally does when background admin tasks are running, but then ALL the tasks get suspended, not just one. I've even tried manually suspending all other tasks so there's only the AP27 one running, but it still does the same thing. I did notice that if I clicked on Properties for the task while it was in Scheduler Wait state, the virtual memory size and working set size were both only a few hundred megabytes each, but I don't know if that relates to the GPU's VRAM or just normal computer RAM, or even if that's relevant. All I can think of is that the task wants to allocate more than the 1.9Gb available, but I can't find any messages to support this or say anything else which might give me a clue what's up. | |
|
JimB Volunteer moderator Project administrator Project developer Send message
Joined: 4 Aug 11 Posts: 892 ID: 107307 Credit: 868,280,498 RAC: 716,156
                    
|
In your BOINC Manager, if you click on Options / Event Log options (might be under some other menu, I think it's moved around between sections over time), there are three debug flags that could be interesting to look at. They are: cpu_sched, cpu_sched_debug and cpu_sched_status. I would start with only cpu_sched checked and see if that gives you enough new information. cpu_sched_status looks like it outputs the current state at intervals and cpu_sched_debug is maybe too low-level to be of use here.
Anyway, with one or more of those turned on in the options, you can see what your BOINC log (Tools / Event Log) is saying when you're seeing this scheduler behavior. Ideally it should tell you why it's suspending the job. You can turn those options off again if/when you get useful information. You don't need to restart BOINC or anything - they take effect immediately when you click OK in the options dialog. | |
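If the Event Log options dialog isn't available in your version of the Manager, the same flags can be set in the client's cc_config.xml file instead. Here's a rough Python sketch that generates the file contents for the three flags named above. The helper name build_cc_config is made up for illustration, and the script only prints the text rather than writing it anywhere, since the location of the BOINC data directory varies between installs.

```python
import xml.etree.ElementTree as ET

# The three scheduler debug flags mentioned in the post above.
SCHED_FLAGS = ["cpu_sched", "cpu_sched_debug", "cpu_sched_status"]


def build_cc_config(flags):
    """Return cc_config.xml text with each named log flag set to 1.

    build_cc_config is a hypothetical helper, not part of BOINC.
    """
    lines = ["<cc_config>", "  <log_flags>"]
    lines += ["    <{0}>1</{0}>".format(name) for name in flags]
    lines += ["  </log_flags>", "</cc_config>"]
    text = "\n".join(lines) + "\n"
    # Sanity check: raises ParseError if a flag name produced malformed XML.
    ET.fromstring(text)
    return text


config_text = build_cc_config(SCHED_FLAGS)
print(config_text)
```

Save the printed text as cc_config.xml in the BOINC data directory and get the client to reread its config files (or restart it). Set a flag to 0, or remove its line, to switch it off again.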
|
|
I don't seem to have those options under any of the BOINC Manager menus, but I managed to turn them on via my cc_config.xml file and there's a menu option to reread the config file(s), so fairly easy to turn them on & off that way. I didn't get much from it with just cpu_sched, so added the others you mentioned, and a couple more which looked interesting... there's a bit more in the log now, but I'm still not seeing what the problem is :
Thu 17 Nov 2016 09:25:00 GMT | | [mem_usage] All others: RAM 1424.62MB, page 24331.59MB, user 59.920, kernel 47.770
Thu 17 Nov 2016 09:25:00 GMT | | [mem_usage] non-BOINC CPU usage: 2.98%
Thu 17 Nov 2016 09:25:07 GMT | PrimeGrid | task ap27_155075_0 resumed by user
Thu 17 Nov 2016 09:25:07 GMT | | [cpu_sched_debug] Request CPU reschedule: task suspended, resumed or aborted by user
Thu 17 Nov 2016 09:25:08 GMT | | [cpu_sched_debug] schedule_cpus(): start
Thu 17 Nov 2016 09:25:08 GMT | PrimeGrid | [cpu_sched_debug] earliest deadline: 1479978854 ap27_155075_0
Thu 17 Nov 2016 09:25:08 GMT | PrimeGrid | [cpu_sched_debug] scheduling ap27_155075_0 (coprocessor job, EDF) (prio -1.000000)
Thu 17 Nov 2016 09:25:08 GMT | PrimeGrid | [cpu_sched_debug] reserving 1.000000 of coproc ATI
Thu 17 Nov 2016 09:25:08 GMT | | [cpu_sched_debug] enforce_schedule(): start
Thu 17 Nov 2016 09:25:08 GMT | | [cpu_sched_debug] preliminary job list:
Thu 17 Nov 2016 09:25:08 GMT | PrimeGrid | [cpu_sched_debug] 0: ap27_155075_0 (MD: yes; UTS: yes)
Thu 17 Nov 2016 09:25:08 GMT | | [cpu_sched_debug] final job list:
Thu 17 Nov 2016 09:25:08 GMT | PrimeGrid | [cpu_sched_debug] 0: ap27_155075_0 (MD: yes; UTS: no)
Thu 17 Nov 2016 09:25:08 GMT | | [mem_usage] enforce: available RAM 3754.82MB swap 1464.91MB
Thu 17 Nov 2016 09:25:08 GMT | PrimeGrid | [coproc] Assigning ATI instance 0 to ap27_155075_0
Thu 17 Nov 2016 09:25:08 GMT | PrimeGrid | [cpu_sched_debug] scheduling ap27_155075_0
Thu 17 Nov 2016 09:25:08 GMT | | [cpu_sched_debug] using 0.06 out of 2 CPUs
Thu 17 Nov 2016 09:25:08 GMT | Einstein@Home | [cpu_sched_debug] LATeah0006L_1008.0_0_0.0_15110025_0 sched state 1 next 1 task state 0
Thu 17 Nov 2016 09:25:08 GMT | Einstein@Home | [cpu_sched_debug] LATeah0006L_1040.0_0_0.0_2077950_0 sched state 1 next 1 task state 0
Thu 17 Nov 2016 09:25:08 GMT | Einstein@Home | [cpu_sched_debug] p2030.20160120.G176.30-00.40.N.b4s0g0.00000_192_1 sched state 1 next 1 task state 0
Thu 17 Nov 2016 09:25:08 GMT | PrimeGrid | [cpu_sched_debug] ap27_155075_0 sched state 1 next 2 task state 0
Thu 17 Nov 2016 09:25:08 GMT | PrimeGrid | [task] ACTIVE_TASK::start(): forked process: pid 5145
Thu 17 Nov 2016 09:25:08 GMT | PrimeGrid | [task] task_state=EXECUTING for ap27_155075_0 from start
Thu 17 Nov 2016 09:25:08 GMT | PrimeGrid | [cpu_sched] Restarting task ap27_155075_0 using ap26 version 201 (opencl_ati_AP27) in slot 3
Thu 17 Nov 2016 09:25:08 GMT | PrimeGrid | [css] running ap27_155075_0 (0.0572 CPUs + 1 ATI GPU)
Thu 17 Nov 2016 09:25:08 GMT | | [cpu_sched_debug] enforce_schedule: end
Thu 17 Nov 2016 09:25:10 GMT | PrimeGrid | [mem_usage] ap27_155075_0: WS 120.64MB, smoothed 83.20MB, page 271.52MB, 0.00 page faults/sec, user CPU 1.900, kernel CPU 0.140
Thu 17 Nov 2016 09:25:10 GMT | | [mem_usage] All others: RAM 1424.70MB, page 24331.65MB, user 60.450, kernel 48.080
Thu 17 Nov 2016 09:25:10 GMT | | [mem_usage] non-BOINC CPU usage: 4.16%
Thu 17 Nov 2016 09:25:15 GMT | PrimeGrid | [task] Process for ap27_155075_0 exited, status 0, task state 1
Thu 17 Nov 2016 09:25:15 GMT | PrimeGrid | [task] task called temporary_exit(60.000000)
Thu 17 Nov 2016 09:25:15 GMT | PrimeGrid | [task] task_state=UNINITIALIZED for ap27_155075_0 from temporary exit
Thu 17 Nov 2016 09:25:15 GMT | | [cpu_sched_debug] Request CPU reschedule: application exited
Thu 17 Nov 2016 09:25:15 GMT | | [cpu_sched_debug] schedule_cpus(): start
Thu 17 Nov 2016 09:25:15 GMT | PrimeGrid | [cpu_sched_debug] thrashing prevention: mark ap27_155075_0 as deadline miss
Thu 17 Nov 2016 09:25:15 GMT | | [cpu_sched_debug] enforce_schedule(): start
Thu 17 Nov 2016 09:25:15 GMT | | [cpu_sched_debug] preliminary job list:
Thu 17 Nov 2016 09:25:15 GMT | | [cpu_sched_debug] final job list:
Thu 17 Nov 2016 09:25:15 GMT | | [mem_usage] enforce: available RAM 3754.82MB swap 1464.91MB
Thu 17 Nov 2016 09:25:15 GMT | | [cpu_sched_debug] using 0.00 out of 2 CPUs
Thu 17 Nov 2016 09:25:15 GMT | Einstein@Home | [cpu_sched_debug] LATeah0006L_1008.0_0_0.0_15110025_0 sched state 1 next 1 task state 0
Thu 17 Nov 2016 09:25:15 GMT | Einstein@Home | [cpu_sched_debug] LATeah0006L_1040.0_0_0.0_2077950_0 sched state 1 next 1 task state 0
Thu 17 Nov 2016 09:25:15 GMT | Einstein@Home | [cpu_sched_debug] p2030.20160120.G176.30-00.40.N.b4s0g0.00000_192_1 sched state 1 next 1 task state 0
Thu 17 Nov 2016 09:25:15 GMT | PrimeGrid | [cpu_sched_debug] ap27_155075_0 sched state 2 next 1 task state 0
Thu 17 Nov 2016 09:25:15 GMT | | [cpu_sched_debug] enforce_schedule: end
The Einstein tasks mentioned are all suspended, as I suspended the project so that there's only my one PrimeGrid AP27 task active. I don't have any other tasks on that computer at the moment.
Does any of this look unusual to you perhaps?
| |
|
|
Also forgot to mention, this is on BOINC Client version 7.2.42 for x86_64-pc-linux-gnu on an AMD Athlon dual core 64-bit cpu with AMD Radeon HD6670 gpu coprocessor. | |
|
stream Volunteer developer Volunteer tester Send message
Joined: 1 Mar 14 Posts: 579 ID: 301928 Credit: 451,697,000 RAC: 150
                     
|
Does any of this look unusual to you perhaps?
This one.
Thu 17 Nov 2016 09:25:15 GMT | PrimeGrid | [task] Process for ap27_155075_0 exited, status 0, task state 1
Thu 17 Nov 2016 09:25:15 GMT | PrimeGrid | [task] task called temporary_exit(60.000000)
Thu 17 Nov 2016 09:25:15 GMT | PrimeGrid | [task] task_state=UNINITIALIZED for ap27_155075_0 from temporary exit
temporary_exit is an unusual state (not an error, not a normal exit); maybe the author of the app could help you find out what the app dislikes.
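If you want to see how often this happens over a longer log, a small script can pick the pattern out for you. This is only a sketch in Python, assuming the log lines keep the "timestamp | project | message" layout shown in the excerpt; the function names are my own, and treating the number in temporary_exit(...) as a requested restart delay in seconds is my reading of the BOINC API rather than something the log itself states.

```python
import re
from datetime import datetime

# A few lines copied from the event log excerpt in this thread.
SAMPLE_LOG = """\
Thu 17 Nov 2016 09:25:08 GMT | PrimeGrid | [task] task_state=EXECUTING for ap27_155075_0 from start
Thu 17 Nov 2016 09:25:10 GMT | | [mem_usage] non-BOINC CPU usage: 4.16%
Thu 17 Nov 2016 09:25:15 GMT | PrimeGrid | [task] Process for ap27_155075_0 exited, status 0, task state 1
Thu 17 Nov 2016 09:25:15 GMT | PrimeGrid | [task] task called temporary_exit(60.000000)
Thu 17 Nov 2016 09:25:15 GMT | PrimeGrid | [task] task_state=UNINITIALIZED for ap27_155075_0 from temporary exit
"""

START_RE = re.compile(r"task_state=EXECUTING for (\S+)")
DELAY_RE = re.compile(r"task called temporary_exit\(([\d.]+)\)")
END_RE = re.compile(r"task_state=UNINITIALIZED for (\S+) from temporary exit")


def parse_line(line):
    """Split one log line into (timestamp, project, message), or None."""
    parts = line.split("|", 2)
    if len(parts) != 3:
        return None
    stamp, project, message = (p.strip() for p in parts)
    when = datetime.strptime(stamp, "%a %d %b %Y %H:%M:%S GMT")
    return when, project, message


def find_temporary_exits(log_text):
    """Return (task, seconds_run, requested_delay) for each temporary exit."""
    started = {}   # task name -> time it entered EXECUTING
    delay = None   # delay from the most recent temporary_exit(...) line
    results = []
    for line in log_text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, _project, message = parsed
        m = START_RE.search(message)
        if m:
            started[m.group(1)] = when
            continue
        m = DELAY_RE.search(message)
        if m:
            delay = float(m.group(1))
            continue
        m = END_RE.search(message)
        if m and m.group(1) in started:
            ran = (when - started.pop(m.group(1))).total_seconds()
            results.append((m.group(1), ran, delay))
            delay = None
    return results


runs = find_temporary_exits(SAMPLE_LOG)
for task, ran, wait in runs:
    print("{}: ran {:.0f}s, then temporary exit asking for {:.0f}s".format(task, ran, wait))
```

On the lines above it reports that ap27_155075_0 ran for 7 seconds before the temporary exit with a value of 60, which matches the "runs for a few seconds, then waits" behaviour described in the first post.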
| |
|
|
Thanks, yes, that does look a little unusual! Ah well, guess my gpu won't be contributing to the current challenge then, not to worry, I'll clock up a few cpu tasks instead. It would be nice to know what's wrong at some point though. Will wait and see if any app authors respond.
Cheers!
| |
|
|
Does any of this look unusual to you perhaps?
This one.
Thu 17 Nov 2016 09:25:15 GMT | PrimeGrid | [task] Process for ap27_155075_0 exited, status 0, task state 1
Thu 17 Nov 2016 09:25:15 GMT | PrimeGrid | [task] task called temporary_exit(60.000000)
Thu 17 Nov 2016 09:25:15 GMT | PrimeGrid | [task] task_state=UNINITIALIZED for ap27_155075_0 from temporary exit
temporary_exit is an unusual state (not an error, not a normal exit); maybe the author of the app could help you find out what the app dislikes.
The app calls boinc_temporary_exit() if the GPU is out of memory. Your GPU is reported to have 2048MB of VRAM, and the AP27 app requires less than 1500MB. So BOINC restarts the task, hoping that whatever was using up the memory has stopped, but in your case it never seems to succeed.
So either it's the case that something is running that is consuming VRAM (could be your OS/windowing system/video driver), which you might be able to reduce, OR BOINC is reporting more memory than is actually present in your card, either due to a BOINC bug, video driver bug, or OpenCL driver bug. From a brief look, most of the "HD 6570/6670/7570/7670 series" have only 512 or 1024MB, although some have 2048MB. If you're sure you have 2GB of VRAM, I'd try upgrading to the latest driver from AMD.
Cheers
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
Thanks for the info Iain, I'll have a look into this. I remember having a nightmare getting it working at all with various Linux drivers and thought I had the right AMD one, but maybe not. Will have a check tomorrow with a fresh head and cup of coffee! Cheers. :-)
| |
|