drummers-lowrise
|
1)
Message boards :
Number crunching :
Way too much work
(Message 156217)
Posted 328 days ago by Tuna Ertemalp
Reported as https://github.com/BOINC/boinc/issues/4827
Tuna
A fix has been made by David Anderson and tested by me. I'll post here when it is released.
Tuna
One caveat I discovered just now after rolling the private release out to all my hosts, and which seems like fair behavior to leave in place.
Out of my 8 hosts, 2 were still fetching one extra CPU task. It turns out those were the 6C/12T hosts running at the 50% CPU setting. With my thread setting in PG prefs being 4 for LLR, only one task can run using 4 threads, but BOINC 7.20.1+ still sees 2 CPUs of capacity left, so it requests a PG task and downloads it. That task needs 4 threads, so it just sits waiting.
I think this is a very special case, given that PG prefs allow setups like this, so I will not pursue getting it fixed. I "solved" the issue by creating a PG location that uses only 3T per LLR task and putting those hosts into that location. These two hosts now get 2 such tasks each, and BOINC isn't trying to download a 3rd task.
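For anyone who wants to check their own hosts, the arithmetic above can be sketched in a few lines of Python. This is only a rough model: the function name `llr_task_slots` is made up for illustration, and rounding the CPU percentage down is an assumption, not something taken from the BOINC source.

```python
def llr_task_slots(hw_threads, cpu_percent, threads_per_task):
    """Return (tasks that fit, leftover threads) for one host.

    Assumption: usable threads = hw_threads * cpu_percent / 100, rounded down.
    """
    usable = hw_threads * cpu_percent // 100
    return divmod(usable, threads_per_task)


# 6C/12T host at 50% CPU with 4 threads per LLR task:
# one task fits and 2 threads are left over, which is the
# spare capacity that triggers the extra fetch described above.
print(llr_task_slots(12, 50, 4))

# Same host with 3 threads per LLR task: two tasks fit, nothing left over.
print(llr_task_slots(12, 50, 3))
```

Any combination where the second number is non-zero but smaller than the per-task thread count is a candidate for the same "one extra task waiting" behavior.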
Tuna
|
2)
Message boards :
Number crunching :
Way too much work
(Message 156216)
Posted 328 days ago by Tuna Ertemalp
Reported as https://github.com/BOINC/boinc/issues/4827
Tuna
A fix has been made by David Anderson and tested by me. I'll post here when it is released.
Tuna
|
3)
Message boards :
Number crunching :
Way too much work
(Message 156200)
Posted 330 days ago by Tuna Ertemalp
Reported as https://github.com/BOINC/boinc/issues/4827
Tuna
|
4)
Message boards :
Number crunching :
Way too much work
(Message 156199)
Posted 330 days ago by Tuna Ertemalp
And, by setting the <work_fetch_debug> flag in my cc_config.xml, here is a snippet, starting with an instance of correct state, then ending up with "buffer low":
. . .
CORRECT STATE: 7/5/2022 8:35:52 PM | PrimeGrid | can't fetch NVIDIA GPU: zero resource share
7/5/2022 8:35:52 PM | | [work_fetch] No project chosen for work fetch
7/5/2022 8:36:52 PM | | choose_project(): 1657078612.772133
7/5/2022 8:36:52 PM | | [work_fetch] ------- start work fetch state -------
7/5/2022 8:36:52 PM | | [work_fetch] target work buffer: 180.00 + 0.00 sec
7/5/2022 8:36:52 PM | | [work_fetch] --- project states ---
7/5/2022 8:36:52 PM | PrimeGrid | [work_fetch] REC 5378101.528 prio -0.021 can request work
7/5/2022 8:36:52 PM | | [work_fetch] --- state for CPU ---
7/5/2022 8:36:52 PM | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2433.06 busy 0.00
7/5/2022 8:36:52 PM | PrimeGrid | [work_fetch] share 0.000 zero resource share
7/5/2022 8:36:52 PM | | [work_fetch] --- state for NVIDIA GPU ---
7/5/2022 8:36:52 PM | | [work_fetch] shortfall 28.82 nidle 0.00 saturated 151.18 busy 0.00
BAD DECISION: 7/5/2022 8:36:52 PM | PrimeGrid | [work_fetch] share 1.000
7/5/2022 8:36:52 PM | | [work_fetch] ------- end work fetch state -------
7/5/2022 8:36:52 PM | PrimeGrid | choose_project: scanning
7/5/2022 8:36:52 PM | PrimeGrid | can't fetch CPU: zero resource share
7/5/2022 8:36:52 PM | PrimeGrid | can fetch NVIDIA GPU
BAD STATE: 7/5/2022 8:36:52 PM | PrimeGrid | NVIDIA GPU needs work - buffer low
7/5/2022 8:36:52 PM | PrimeGrid | checking CPU
7/5/2022 8:36:52 PM | PrimeGrid | CPU don't need
7/5/2022 8:36:52 PM | PrimeGrid | checking NVIDIA GPU
7/5/2022 8:36:52 PM | PrimeGrid | NVIDIA GPU set_request: 1.000000
7/5/2022 8:36:52 PM | PrimeGrid | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (1.00 sec, 0.00 inst)
7/5/2022 8:36:52 PM | PrimeGrid | Sending scheduler request: To fetch work.
BAD ACTION: 7/5/2022 8:36:52 PM | PrimeGrid | Requesting new tasks for NVIDIA GPU
7/5/2022 8:36:54 PM | PrimeGrid | Scheduler request completed: got 1 new tasks
. . .
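The numbers in the snippet above can be cross-checked with a small Python sketch. It parses the `[work_fetch]` lines copied from the log and compares each reported shortfall with `target - saturated`. That relation is only inferred from this excerpt (one GPU, `nidle 0.00`); it is an assumption for illustration, not a statement of how the BOINC client actually computes shortfall. The function name `parse_work_fetch` is hypothetical.

```python
import re

# Lines copied from the log excerpt above.
LOG = """\
7/5/2022 8:36:52 PM | | [work_fetch] target work buffer: 180.00 + 0.00 sec
7/5/2022 8:36:52 PM | | [work_fetch] --- state for CPU ---
7/5/2022 8:36:52 PM | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 2433.06 busy 0.00
7/5/2022 8:36:52 PM | | [work_fetch] --- state for NVIDIA GPU ---
7/5/2022 8:36:52 PM | | [work_fetch] shortfall 28.82 nidle 0.00 saturated 151.18 busy 0.00
"""


def parse_work_fetch(log):
    """Return (target buffer in seconds, {resource: state dict})."""
    target = None
    resource = None
    states = {}
    for line in log.splitlines():
        msg = line.split("|")[-1].strip()
        m = re.search(r"target work buffer: ([\d.]+) \+ ([\d.]+) sec", msg)
        if m:
            target = float(m.group(1)) + float(m.group(2))
            continue
        m = re.search(r"--- state for (.+?) ---", msg)
        if m:
            resource = m.group(1)
            continue
        m = re.search(r"shortfall ([\d.]+) nidle ([\d.]+) saturated ([\d.]+)", msg)
        if m and resource:
            states[resource] = {
                "shortfall": float(m.group(1)),
                "nidle": float(m.group(2)),
                "saturated": float(m.group(3)),
            }
    return target, states


target, states = parse_work_fetch(LOG)
for name, s in states.items():
    # Assumed relation: shortfall = max(0, target - saturated)
    inferred = round(max(0.0, target - s["saturated"]), 2)
    print(name, "reported:", s["shortfall"], "inferred:", inferred)
```

If the inferred and reported values agree, the fetch is being driven by the 180-second target buffer alone: the GPU is busy (`nidle 0.00`) but has less than 180 seconds of work left, so the client treats the buffer as low.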
Tuna
|
5)
Message boards :
Number crunching :
Way too much work
(Message 156198)
Posted 330 days ago by Tuna Ertemalp
https://github.com/BOINC/boinc/pull/2837 is the pull request that broke the "no prefetch at 0/0/0" behavior back in 2018 by introducing a 3-minute prefetch at all times to avoid idle time. https://github.com/BOINC/boinc/issues/4396 seems to be the issue whose fix went into 7.20.1 via the pull request https://github.com/BOINC/boinc/pull/4800, the actual code change being https://github.com/BOINC/boinc/pull/4800/commits/4c7657ae9c7c61a6e98532beb7b8f098cb0b28be, in case anyone here can tell why it is not working. It looks as if a prefetch happens shortly after my GPU tasks get down to about 2m30s of estimated remaining time.
Tuna
|
6)
Message boards :
Number crunching :
Way too much work
(Message 156197)
Posted 330 days ago by Tuna Ertemalp
There is a NEW "Development Version" of BOINC 7.20.1 that has been released for testing. It's supposed to eliminate the mass downloading / flooding PC of excess tasks.
https://boinc.berkeley.edu/download_all.php
. . .
Client: if project has zero resource share AND work buf min is zero, don't fetch from it unless device instance is actually idle.
Tried it on one of my hosts.
- PrimeGrid Resource Share=0
- Store at least 0 days of work
- Store up to an additional 0 days of work
- 16C/32T CPU
- 3 GPU
- Use at most 50% of the CPUs
- Running LLR apps with Multithreading=4 in Primegrid project preferences on the web
- Therefore, already have (32*50%)/4=4 tasks running
- End up with anywhere from 1 to 3 CPU tasks in "Ready to start (4 CPUs)" status
- Similar story with the GPU: despite all being busy, there seems to be always at least 1 GPU task downloaded, waiting.
Oh well...
Tuna
|
7)
Message boards :
Number crunching :
Can I dictate number of CPU & GPU jobs?
(Message 156140)
Posted 334 days ago by Tuna Ertemalp
Yup, that, too. :)
|
8)
Message boards :
Number crunching :
Can I dictate number of CPU & GPU jobs?
(Message 156135)
Posted 334 days ago by Tuna Ertemalp
Thank you, Michael. All of this is MUCH appreciated. I'd been doing SETI before BOINC was separated out of it way back in the day (SETI@home member since 15 Nov 1999). And then full on BOINC well over a decade (BOINCstats Joined: 2007-10-31). But I was the guy who gave 100% access to all BOINC projects at ResShare=100 on all hosts I kept building. Sort of being an equal opportunity provider to avoid choosing whom to provide for... :) In that world, I didn't care about maximizing contributions to one project or minimizing task times, but just helping out everyone. And that essentially meant big multi-day queues for dozen+ connected projects to give BOINC the freedom to choose to do whatever & whenever. Being concentrated on just one project and trying to optimize my contribution after those "general service" decades is relatively new to me.
My cache was 0.5/0, mostly because I use BOINCstats to manage my hosts and that seemed to make sense. So, on one 4C/8T + 1 GPU machine, I took over the local prefs, and set it to 0/0 and to use 50% CPUs to avoid turning off HT. I also set the ResShare for PG to 0 on that host. Then I placed it on Pluto with:
Use CPU=Yes
Use NVIDIA GPU=YES
Max # of simultaneous PrimeGrid tasks=No limit
Multi-threading: Max # of threads for each task=4
And selected PPSE for CPU, WW for GPU. All that results in the expectation of 2 PPSE and 1 WW active at all times with 0 queue buildup, right? That should all work, right? Should...
It did start out with 2 CPU tasks and 1 GPU task. No problem. But just as they were a few minutes away from completion, prefetch happened. Argh. So, the "and then (???) supposedly fixed it" part seems to have not happened. Sadly. As I type this, I am staring at 1 GPU and 8 CPU tasks "Ready to start"... Maybe things will settle a bit better (or worse?) after enough tasks pass through to affect the task stats.
So, the options seem to be:
- Use BOINC 7.14.2 to avoid prefetch at ResShare=0
- Use multiple BOINC instances
- Convince someone to allow specifying "Max # of simultaneous PrimeGrid tasks" optionally in CPU & GPU categories.
It feels like using multiple instances is the way to go. If I go back a version, who knows what else is broken; as a lifelong software developer, downgrading software bothers me instinctively. Expecting someone at PG to do work to make up for BOINC's faults is unfair. And going through the exercise of running multiple BOINCs on one host has the added bonus of being able to divide up resources between multiple projects: for example, three BOINCs each seeing 1 GPU on a 3 GPU machine, with each BOINC serving a different project or even a different subproject of the same project.
Thanks for all the input, everyone. I wish BOINC was better maintained with enough resources...
Tuna
|
9)
Message boards :
Number crunching :
Can I dictate number of CPU & GPU jobs?
(Message 156118)
Posted 335 days ago by Tuna Ertemalp
I even thought of installing BOINC twice on my Win10 machine(s), but couldn't figure out a way to do that, let alone have them appear as two separate hosts, one with just the GPUs, another with the CPU, each using a different PrimeGrid location where I can dictate one as CPU tasks, the other as GPU tasks, etc.
That's actually fairly easy to do. Just run a second copy of BOINC using a different data directory and a different RPC port (the default is 31416, aka "pi"). There's also a setting in (I think) cc_config.xml that you have to turn on to allow the computer to run multiple BOINC instances. Then you start the second copy of BOINC with command line parameters to set the port, the data directory, and also the flag for multiple clients:
"c:\Program Files\BOINC\boinc.exe" --allow_multiple_clients --redirectio --detach_console --gui_rpc_port 31418 --dir C:\ProgramData\BOINC2
This is great info that I wasn't aware of! I'll experiment with this for any future cases where I might want to run GPU tasks from one project and CPU tasks from another project, without BOINC algorithms messing the queue up for me... :)
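If I end up scripting this, a minimal Python sketch of the quoted launch command could look like the following. The flags, executable path, data directory and port 31418 are taken from the command above; the helper name `second_client_command` is made up, and the default port constant is my understanding of BOINC's default rather than something confirmed in this thread.

```python
import subprocess

# Assumption: the first (default) BOINC client listens on this GUI RPC port.
DEFAULT_GUI_RPC_PORT = 31416


def second_client_command(boinc_exe, data_dir, rpc_port):
    """Build the argument list for a second BOINC client instance."""
    if rpc_port == DEFAULT_GUI_RPC_PORT:
        raise ValueError("second instance needs a port different from the default")
    return [
        boinc_exe,
        "--allow_multiple_clients",
        "--redirectio",
        "--detach_console",
        "--gui_rpc_port", str(rpc_port),
        "--dir", data_dir,
    ]


cmd = second_client_command(
    r"c:\Program Files\BOINC\boinc.exe",
    r"C:\ProgramData\BOINC2",
    31418,
)

# Show the command line that would be run.
print(subprocess.list2cmdline(cmd))

# To actually start the second client on a Windows host with BOINC installed:
# subprocess.Popen(cmd)
```

Building the command as a list rather than one string avoids quoting trouble with the space in "Program Files". The launch itself is left commented out so the sketch can be run safely on a machine without BOINC.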
Thank you very much for this! But I am still interested in hearing if separate "Max # of simultaneous PrimeGrid CPU/GPU task" settings in PrimeGrid locations is a future possibility, or why not if not. If it is a technical issue, I'd love to know what that is just out of curiosity. If it is a resource/stability issue, I totally understand.
Tuna
|
10)
Message boards :
Number crunching :
Can I dictate number of CPU & GPU jobs?
(Message 156117)
Posted 335 days ago by Tuna Ertemalp
Ok. You asked for it... :)
This is:
- My hosts are serving only PrimeGrid, no other project
- Each host has N' real cores (therefore 2*N' HT threads) and X' GPUs.
- Let's assume each CPU task is to be run on just 1 core, for simplicity.
- At any moment, I want only N' CPU tasks and X' GPU tasks to be on my host, all "In Progress".
- I.e. nothing else waiting in my local queue to be grabbed when one of these tasks is completed.
- I.e. every single completion of any task results in an immediate upload of that task followed by a download of a fresh new task
- To achieve these last two bullet points, I have the "fetch on update" and "report completed tasks immediately" settings turned on in my BOINC config (7.16.20). That works beautifully and doesn't need any further solutions.
- Currently, I have set my locations in PrimeGrid to
"Use CPU" = NO
"Max # of simultaneous PrimeGrid task" = X'
"Multi-threading: Max # of threads for each task" = 1
- Each host is in a group that matches their X'
- Which means I am running as many "in progress" GPU tasks on each host at any time as there are GPUs, the queue is empty as I want it to be, and each of those GPU tasks is downloaded, processed, and uploaded, and a new one is immediately downloaded. Obviously, that keeps my dozen+ GPUs occupied 100% of the time while leaving all my CPU cores idling.
- Why? One reason is that I found this increases the odds of me being "(1st)" on any task I complete since my GPUs are fast, for now, maybe for another 6-12 months, until 4080/4090 come out and are affordable. There are a few other reasons, mostly having to do with my OCD. :)
- Now, how do I add exactly N' CPU tasks to this mix without affecting the constant presence of X' in-progress GPU tasks? That is the question.
- I experimented with:
"Use CPU" = YES
"Max # of simultaneous PrimeGrid task" = N'+X'
- What happens eventually is that the BOINC/PrimeGrid estimate of task durations changes (I don't know who is responsible for that, BOINC or the project), and BOINC's idea of the "NVIDIA GPU task request deferral interval" (there is probably also a "CPU task request deferral interval") starts favoring CPU tasks. The host then drifts toward X'+N' CPU tasks with no GPU tasks: it uploads a GPU task and receives a CPU task in return, even though the CPU is already busy with N' tasks, so this new (N'+1)th CPU task waits for another CPU task to finish while one GPU sits idle. When another CPU job finishes and gets returned, I most likely get another CPU job as its replacement, and boom, I have one more extra CPU job waiting in the local queue and one more idle GPU. Or I end up with some other mix that is not exactly N' CPU tasks and X' GPU tasks, all in progress, with nothing waiting. Sometimes this gets to the point where I have only CPU jobs, in progress and waiting, with all GPUs idle, and it stays like that for days until I reset PrimeGrid in BOINC.
- If only there were a way to set the following in PrimeGrid location settings:
"Max # of simultaneous PrimeGrid CPU task" = N'/4-1
"Max # of simultaneous PrimeGrid GPU task" = X'
then I could set this back to YES:
"Use CPU" = YES
and adjust this lovely and wonderful setting that already works beautifully:
"Multi-threading: Max # of threads for each task" = 4
to achieve my personal goal of having always just enough CPU tasks running concurrently each with 4 threads, plus 1 core free for any personal/background non-BOINC work as well as serving the CPU portion of the GPU tasks, plus exactly the correct number of GPU jobs to keep each GPU fully utilized all the time, with no tasks in my queue on my host waiting their turn.
Phew...
Yeah. That's it in a nutshell. A large one.
Basically, if PrimeGrid (could have) had the "Max # of simultaneous PrimeGrid CPU/GPU task" settings separately, a lot could be done without ever worrying about what BOINC version the host is running, what the task cost predictions are, etc. For example, I could have set:
"Max # of simultaneous PrimeGrid GPU task" = X'+5
to make sure that my local GPU task queue always has X' in progress and 5 extra ones, completely separately from what I would want to do with my CPU tasks and their thread numbers. The sky is the limit!
Thanks for listening
Tuna
|