Message boards : Number crunching : LLR 3.8.20 Going Live!
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 2,648
Effective Immediately:
LLR v3.8.20 is live for SoB. (It's v8.00 in BOINC)
Other LLR projects will be upgraded in the near future.
You may use app_config to enable multithreading if you so desire.
We can't support 32-bit Macs anymore due to compiler incompatibilities, so once 8.00 goes live for a particular app, there will be no more 32-bit Mac tasks available. (There was only a single 32-bit Mac successfully running tasks at PrimeGrid.)
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 2,648
3.8.20 is now live for all LLR projects.
____________
My lucky number is 75898^524288+1
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 929 ID: 3110 Credit: 236,858,205 RAC: 30,463
You may use app_config to enable multithreading if you so desire.
How would I go about that? I got an app_info.xml working, but then I couldn't do Genefer at the same time. With a simple app_config, it said it couldn't find the apps named llrPPS or llrPPSE.
<app_config>
    <app>
        <name>llrPPS</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <app>
        <name>llrPPSE</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <app_version>
        <app_name>llrPPS</app_name>
        <cmdline>-t 2</cmdline>
    </app_version>
    <app_version>
        <app_name>llrPPSE</app_name>
        <cmdline>-t 2</cmdline>
    </app_version>
</app_config>
Thu 16 Mar 2017 04:58:33 PM MDT | PrimeGrid | Entry in app_config.xml for app 'llrPPS', plan class '' doesn't match any app versions
Thu 16 Mar 2017 04:58:33 PM MDT | PrimeGrid | Entry in app_config.xml for app 'llrPPSE', plan class '' doesn't match any app versions
____________
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 2,648
You may use app_config to enable multithreading if you so desire.
How would I go about that? I got an app_info.xml working, but then I couldn't do Genefer at the same time. With a simple app_config, it said it couldn't find the apps named llrPPS or llrPPSE.
You cannot use both app_config and app_info simultaneously, as far as I know.
If you're using app_info, just put the same <cmdline>, <avg_ncpus>, <max_ncpus>, and <max_concurrent> flags into the appropriate <app_version> blocks in app_info. You must also be using both the new 3.8.20 version of LLR and the new 8.00 version of the wrapper.
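Purely as an illustrative sketch (not taken from a real installation): one of those <app_version> blocks might end up looking roughly like this, with llrSOB and -t 4 as placeholder choices; your existing <version_num>, <api_version>, and <file_ref> entries stay exactly as they are:
<app_version>
    <app_name>llrSOB</app_name>
    <version_num>800</version_num>
    <cmdline>-t 4</cmdline>
    <avg_ncpus>4</avg_ncpus>
    <max_ncpus>4</max_ncpus>
    <!-- your existing <file_ref> blocks stay here unchanged -->
</app_version>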
If you're asking how to set up app_config without app_info, I posted detailed instructions in another thread. I'll repost them here. (Note that this example does not include max_concurrent. Feel free to add that if BOINC insists on downloading too many tasks.)
While the new app supports multithreading, you must use app_config.xml to activate the multithreading.
Let's say you have a 4-core CPU like a Core i5 or a Core i7 (ignore the hyperthreads -- if you have an i7, tell BOINC to only use 50% of the "CPUs", or disable hyperthreading in the BIOS). This is an example of app_config.xml:
<app_config>
    <app>
        <name>llrSOB</name>
        <fraction_done_exact/>
    </app>
    <app_version>
        <app_name>llrSOB</app_name>
        <cmdline>-t 4</cmdline>
        <avg_ncpus>4</avg_ncpus>
        <max_ncpus>4</max_ncpus>
    </app_version>
</app_config>
Two notes about that example:
1) To set up more than one app for multithreading, duplicate the <app> and <app_version> sections within the <app_config> block (a sketch follows below). There should only be one <app_config> block.
2) Notice that in the <cmdline> tag, there IS a space between "-t" and "4". This is different than if you were feeding the -t parameter directly to LLR (such as when running it by hand), where you must NOT use a space.
That example goes in a file called app_config.xml, which, for Windows, goes in the directory C:\ProgramData\BOINC\projects\www.primegrid.com
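To illustrate note 1, a hypothetical two-app file could look like this (using the llrPPS/llrPPSE app names from the earlier post; the -t 4 thread counts are only placeholders for whatever suits your CPU):
<app_config>
    <app>
        <name>llrPPS</name>
        <fraction_done_exact/>
    </app>
    <app>
        <name>llrPPSE</name>
        <fraction_done_exact/>
    </app>
    <app_version>
        <app_name>llrPPS</app_name>
        <cmdline>-t 4</cmdline>
        <avg_ncpus>4</avg_ncpus>
        <max_ncpus>4</max_ncpus>
    </app_version>
    <app_version>
        <app_name>llrPPSE</app_name>
        <cmdline>-t 4</cmdline>
        <avg_ncpus>4</avg_ncpus>
        <max_ncpus>4</max_ncpus>
    </app_version>
</app_config>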
____________
My lucky number is 75898^524288+1
Nortech Volunteer tester
Joined: 7 Jun 10 Posts: 23 ID: 61946 Credit: 253,897,493 RAC: 7,508
Thu 16 Mar 2017 04:58:33 PM MDT | PrimeGrid | Entry in app_config.xml for app 'llrPPSE', plan class '' doesn't match any app versions
I also found those lines in my event log when I deleted app_info.xml and created an app_config.xml. However, when looking at processor usage, multi-threading was actually working as intended. As a work-unit of each type is downloaded, the corresponding "app version" complaint seems to disappear from the event log...
Hello,
I have a question:
I got LLR 3.8.20 to work under Ubuntu 17.04, but it seems I'm doing something wrong.
http://www.primegrid.com/show_host_detail.php?hostid=537602
I get these error messages in the BOINC event log, but the multithreading works fine.
Fri 17 Mar 2017 01:59:03 PM CET | PrimeGrid | Scheduler request completed: got 1 new tasks
Fri 17 Mar 2017 01:59:03 PM CET | PrimeGrid | [error] Can't create link file projects/www.primegrid.com/stat_icon
Fri 17 Mar 2017 01:59:04 PM CET | PrimeGrid | [error] Can't create link file projects/www.primegrid.com/slideshow_gcwsieve_00
Fri 17 Mar 2017 01:59:04 PM CET | PrimeGrid | [error] Can't create link file projects/www.primegrid.com/slideshow_psp_sr2sieve_00
Fri 17 Mar 2017 01:59:05 PM CET | PrimeGrid | [error] Can't create link file projects/www.primegrid.com/slideshow_primegen_00
Fri 17 Mar 2017 01:59:05 PM CET | PrimeGrid | [error] Can't create link file projects/www.primegrid.com/slideshow_llrTPS_00
Fri 17 Mar 2017 01:59:05 PM CET | PrimeGrid | [error] Can't create link file projects/www.primegrid.com/slideshow_llrWOO_00
Fri 17 Mar 2017 01:59:06 PM CET | PrimeGrid | [error] Can't create link file projects/www.primegrid.com/slideshow_llrCUL_00.
Also, all workunits report as "Anonymous platform (CPU)".
Here is my llr.ini.6.07:
Work=0
PgenOutputFile=llrout.txt
PgenLine=0
Expr=Not yet implemented
ExprFile=Not yet implemented
ExprFileLine=1
WorkDone=0
OutputIterations=7000
ResultsFileIterations=99999999
DiskWriteTime=10
TwoBackupFiles=1
CumulativeTiming=0
RunOnBattery=1
HideIcon=1
TrayIcon=0
Left=36
Top=23
Right=804
Bottom=535
And my app_info.xml:
<app_info>
    <app>
        <name>llrTPS</name>
        <user_friendly_name>SGS (LLR)</user_friendly_name>
        <fraction_done_exact>1</fraction_done_exact>
    </app>
    <app>
        <name>llrSR5</name>
        <user_friendly_name>SR5 (LLR)</user_friendly_name>
        <fraction_done_exact>1</fraction_done_exact>
    </app>
    <app>
        <name>llrWOO</name>
        <user_friendly_name>Woodall (LLR)</user_friendly_name>
        <fraction_done_exact>1</fraction_done_exact>
    </app>
    <app>
        <name>llrMEGA</name>
        <user_friendly_name>PPS-MEGA (LLR)</user_friendly_name>
        <fraction_done_exact>1</fraction_done_exact>
    </app>
    <file_info>
        <name>primegrid_llr_wrapper_8.00_x86_64-pc-linux-gnu</name>
        <executable/>
    </file_info>
    <file_info>
        <name>llr64.3.8.20</name>
        <executable/>
    </file_info>
    <file_info>
        <name>llr.ini.6.07</name>
    </file_info>
    <app_version>
        <app_name>llrTPS</app_name>
        <version_num>800</version_num>
        <api_version>6.10.6</api_version>
        <cmdline>-t 16</cmdline>
        <file_ref>
            <file_name>primegrid_llr_wrapper_8.00_x86_64-pc-linux-gnu</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>llr64.3.8.20</file_name>
            <open_name>primegrid_llr</open_name>
            <copy_file/>
        </file_ref>
        <file_ref>
            <file_name>llr.ini.6.07</file_name>
            <open_name>llr.ini</open_name>
            <copy_file/>
        </file_ref>
    </app_version>
    <app_version>
        <app_name>llrSR5</app_name>
        <version_num>800</version_num>
        <api_version>6.10.6</api_version>
        <cmdline>-t 16</cmdline>
        <file_ref>
            <file_name>primegrid_llr_wrapper_8.00_x86_64-pc-linux-gnu</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>llr64.3.8.20</file_name>
            <open_name>primegrid_llr</open_name>
            <copy_file/>
        </file_ref>
        <file_ref>
            <file_name>llr.ini.6.07</file_name>
            <open_name>llr.ini</open_name>
            <copy_file/>
        </file_ref>
    </app_version>
    <app_version>
        <app_name>llrWOO</app_name>
        <version_num>800</version_num>
        <api_version>6.10.6</api_version>
        <cmdline>-t 16</cmdline>
        <file_ref>
            <file_name>primegrid_llr_wrapper_8.00_x86_64-pc-linux-gnu</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>llr64.3.8.20</file_name>
            <open_name>primegrid_llr</open_name>
            <copy_file/>
        </file_ref>
        <file_ref>
            <file_name>llr.ini.6.07</file_name>
            <open_name>llr.ini</open_name>
            <copy_file/>
        </file_ref>
    </app_version>
    <app_version>
        <app_name>llrMEGA</app_name>
        <version_num>800</version_num>
        <api_version>6.10.6</api_version>
        <cmdline>-t 16</cmdline>
        <file_ref>
            <file_name>primegrid_llr_wrapper_8.00_x86_64-pc-linux-gnu</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>llr64.3.8.20</file_name>
            <open_name>primegrid_llr</open_name>
            <copy_file/>
        </file_ref>
        <file_ref>
            <file_name>llr.ini.6.07</file_name>
            <open_name>llr.ini</open_name>
            <copy_file/>
        </file_ref>
    </app_version>
</app_info>
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 2,648
Hello,
I have a question:
I got LLR 3.8.20 to work under Ubuntu 17.04, but it seems I'm doing something wrong.
http://www.primegrid.com/show_host_detail.php?hostid=537602
Since 3.8.20 is now the standard production application, is there a reason you're using app_info? If you remove app_info, it should all work automatically as long as BOINC is installed correctly.
(Note that removing app_info will kill any tasks currently in progress.)
____________
My lucky number is 75898^524288+1
I got it running. Thanks.
I can't seem to get it to run for llrMEGA. Created an app_config.xml file just as Mike said and placed in the correct directory for Win 7. Help!
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 2,648
I can't seem to get it to run for llrMEGA. Created an app_config.xml file just as Mike said and placed in the correct directory for Win 7. Help!
Did you restart the BOINC client?
____________
My lucky number is 75898^524288+1
I did not. Restarting now. Thanks Mike!
http://www.primegrid.com/forum_thread.php?id=7348&nowrap=true#106105 Thank you. That was exactly what I was thinking about. This method will kill the competition on LLR tests, but it will cost efficiency too. /OT
Surprisingly, you may find it increases efficiency.
Hi, I'm experiencing some strange behaviour on my host (id: 518288). I set up app_config.xml for the llrMEGA app. Should I not do that? My host downloads 4 tasks, and it finishes the first one in 1000s, the second in 2000s, the third in 3000s and the fourth in 3700s. The expected time should be ~1000s per multithreaded task, which works out worse than the ~3600s run time for 4 single-threaded tasks:
1000*4 > 3600.
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3114 ID: 50683 Credit: 76,797,694 RAC: 4,051
http://www.primegrid.com/forum_thread.php?id=7348&nowrap=true#106105 Thank you. That was exactly what I was thinking about. This method will kill the competition on LLR tests, but it will cost efficiency too. /OT
Surprisingly, you may find it increases efficiency.
Hi, I'm experiencing some strange behaviour on my host (id: 518288). I set up app_config.xml for the llrMEGA app. Should I not do that? My host downloads 4 tasks, and it finishes the first one in 1000s, the second in 2000s, the third in 3000s and the fourth in 3700s. The expected time should be ~1000s per multithreaded task, which works out worse than the ~3600s run time for 4 single-threaded tasks:
1000*4 > 3600.
When you run a multithreaded task, scaling is not perfect, especially for tasks that can fit within your CPU's cache. Your CPU has 8 MB of cache (2 MB per core), and the FMA3 FFT length 256K tests you are doing fit within that, so you will lose some time when you use the multithreaded option.
The multithreaded option is better suited to much longer tasks...
____________
92*10^1439761-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
314187728^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie!
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2329 ID: 1178 Credit: 15,618,807,018 RAC: 11,302,221
http://www.primegrid.com/forum_thread.php?id=7348&nowrap=true#106105 Thank you. That was exactly what I was thinking about. This method will kill the competition on LLR tests, but it will cost efficiency too. /OT
Surprisingly, you may find it increases efficiency.
Hi, I'm experiencing some strange behaviour on my host (id: 518288). I set up app_config.xml for the llrMEGA app. Should I not do that? My host downloads 4 tasks, and it finishes the first one in 1000s, the second in 2000s, the third in 3000s and the fourth in 3700s. The expected time should be ~1000s per multithreaded task, which works out worse than the ~3600s run time for 4 single-threaded tasks:
1000*4 > 3600.
Initially it will download the number of tasks equal to the number of MT CPUs set. After that, it will download a single task. For the initial set, the elapsed time counter will count from the beginning for all of them. In other words, the task with 3700s took much less than that... the 3700s is counting from the beginning of the first task.
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 2,648
http://www.primegrid.com/forum_thread.php?id=7348&nowrap=true#106105 Thank you. That was exactly what I was thinking about. This method will kill the competition on LLR tests, but it will cost efficiency too. /OT
Surprisingly, you may find it increases efficiency.
Hi, I'm experiencing some strange behaviour on my host (id: 518288). I set up app_config.xml for the llrMEGA app. Should I not do that? My host downloads 4 tasks, and it finishes the first one in 1000s, the second in 2000s, the third in 3000s and the fourth in 3700s. The expected time should be ~1000s per multithreaded task, which works out worse than the ~3600s run time for 4 single-threaded tasks:
1000*4 > 3600.
Yeah, we all saw that happen too. It only happens when the first tasks are downloaded as far as we can tell.
____________
My lucky number is 75898^524288+1
I see. I didn't notice that the server-side run times were not the real run times (~990s). I thought my tasks were taking longer than they needed to for some unknown reason or bug. Actually, it was too strange to be real.
Why do you not reset the run time counter for every task? Is it not possible?
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 2,648
I see. I didn't notice that the server-side run times were not the real run times (~990s). I thought my tasks were taking longer than they needed to for some unknown reason or bug. Actually, it was too strange to be real.
Why do you not reset the run time counter for every task? Is it not possible?
It's not us. That's the way the BOINC client -- software we have no control over that you download directly from Berkeley -- is behaving.
____________
My lucky number is 75898^524288+1
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 5,414
Can someone make up a reference app_config.xml with ALL LLR projects in it, or at least point to a list of the correct names to use to reference subprojects? I'm trying to patch one up and searching the forum isn't productive.
Edit: am I misunderstanding something here, that applications do not correlate with subprojects? If so, which applications apply to which subprojects?
Edit2: more by chance than any knowledge on my part, I seem to have got it working, but I would still appreciate it if it could be cleared up how to configure subprojects separately as far as practical.
llrTPS (SGS)
llrWOO
llrCUL
llrPSP
pps_sr2sieve
llrPPS
ap26
llrSOB
trp_sr2sieve
llrTRP
genefer (GFN 21)
genefer_wr (GFN 22)
llrPPSE
llrSR5
llrESP
llrMEGA (PPS-MEGA)
genefer15
genefer16
genefer17low
genefer17mega
genefer18
genefer19
genefer20
gcw_sieve
llrGCW (GCW-LLR, not yet running)
Michael posted this a while back; you have to use advanced search and change the time frame to find it. These are the app names used.
____________
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 2,648
point to a list of the correct names to use to reference subprojects?
List of app names can be found here.
____________
My lucky number is 75898^524288+1
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 5,414
Thanks, both. Looks like my guesses weren't far off, but they were incomplete and I will have to update them later. Doing thread-scaling testing on PSP now.
I also think I got confused when the BOINC client appeared to report an error; now I suspect it only does so if a subproject hasn't been run on a particular system before.
dh1saj Volunteer tester
Joined: 13 Jul 08 Posts: 49 ID: 25532 Credit: 3,622,195,060 RAC: 55,838
I also think I got confused when the BOINC client appeared to report an error; now I suspect it only does so if a subproject hasn't been run on a particular system before.
Right.
Had the same here, but once the first SoB WU (in that particular case) was downloaded, all ran fine...
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 5,414
Some early PSP multi-thread results, which are interesting...
PSP work is currently 1280K FFT, or near enough 10 MB of RAM required per task, so even a single task won't fit in the cache of consumer-level Intel CPUs. Times listed below in kiloseconds are total CPU time (roughly cores multiplied by wall clock time). Times listed in hours are elapsed time.
i5-4570S, which runs at 3.2 GHz with all cores loaded, with 6 MB of L3 cache and dual-channel dual-rank DDR3 at 2400. Running units one per core took 92.2ks average, going down to 90.8ks (6.3h) for 4 threads. So 4t is 4.06x faster than single. Not a significant boost, but no loss either, and an overall benefit as you return tasks faster.
i5-5675C OC'd to 3.5 GHz with all cores loaded, with 4 MB of L3 cache and 128 MB of L4 cache. Running units one per core took 77.0ks average, going UP to 81.9ks (5.7h) for 4 threads. So 4t is only 3.76x faster than single. I think this is a unique situation: with the smaller L3 and huge L4 cache, running singles isn't really limited, and the 4t overheads were not insignificant.
i7-6700k OC'd to 4.2 GHz with all cores loaded, with 8 MB of L3 cache and dual-channel dual-rank DDR4 at 2666. Running units one per core took 71.1ks average, going down to 61.0ks (4.2h) for 4 threads. So 4t is 4.67x faster than single. That is a nice boost.
I need to get some tests going on my 6600k to see if the cache helps the i7 or not.
For fun:
Xeon E5-2683v3 14 cores @ 2.3 GHz, estimating around 152ks total, but that is only 3 hours in real time! I suspect there is some scaling problem, as the CPU is only loaded around 85%. I want to try 2x7t later and see if that helps, but it will be a tradeoff of throughput and return time.
Ryzen 1700 8 cores @ 3.2, estimating 152ks (5.3h). That puts its throughput comparable to the i5 systems above, but IPC is way down.
For fun:
Xeon E5-2683v3 14 cores @ 2.3 GHz, estimating around 152ks total, but that is only 3 hours in real time! I suspect there is some scaling problem, as the CPU is only loaded around 85%. I want to try 2x7t later and see if that helps, but it will be a tradeoff of throughput and return time.
From the tests I have run, it looks like certain tasks will only consume so much CPU, and running too many threads actually increases run time. Smaller tasks use less CPU; larger tasks, more. For example, when I ran anything more than 3 threads on an SGS, completion time increased. Granted, that was done on old non-AVX CPUs in a dual-CPU machine, so there may have been other things going on.
Once you see CPU usage below what you would expect from the number of threads running, backing off thread count may actually improve return time--so perhaps no tradeoff.
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1931 ID: 352 Credit: 5,709,300,438 RAC: 1,076,981
For fun:
Xeon E5-2683v3 14 cores @ 2.3 GHz, estimating around 152ks total, but that is only 3 hours in real time! I suspect there is some scaling problem, as the CPU is only loaded around 85%. I want to try 2x7t later and see if that helps, but it will be a tradeoff of throughput and return time.
Can you try 13 cores?
Or in general, one core idle scenario (Ryzen).
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 5,414
Once you see CPU usage below what you would expect from the number of threads running, backing off thread count may actually improve return time--so perhaps no tradeoff.
I still have my working assumption, which I'm only starting to test, that the benefits are related to fitting the work into the CPU cache. I will concentrate my initial testing on relatively large units (TRP or bigger), as I believe this is where the biggest gains will be seen. It seems like SR5 is around the break-even point, based on limited tests with the earlier (buggy) version. This is only based on running 1x4 vs 4x1, and doesn't consider scenarios like 2x2.
Can you try 13 cores?
Or in general, one core idle scenario (Ryzen).
I will try fewer cores, but not 13. I haven't decided how many as there are many permutations, but 7 is a logical place since it splits the CPU in two. Maybe I'll try 4, 10 and 12 also.
I don't intend to try further testing on Ryzen in the short term, after the current single unit is done. I might revisit this in some future version of LLR once it uses the FMA3 transform, which is currently only implemented in recent test versions of 29.xx, while LLR is still built off 28.xx.
When you run a multithreaded task, scaling is not perfect, especially for tasks that can fit within your CPU's cache. Your CPU has 8 MB of cache (2 MB per core), and the FMA3 FFT length 256K tests you are doing fit within that, so you will lose some time when you use the multithreaded option.
The multithreaded option is better suited to much longer tasks...
What is the formula to check if it can fit?
I'm running 4 SoB tests without mt and they look very slow.
8.75h, 54% ==> 16.2h
Yesterday I completed one SoB test with mt in ~3.5h => 4 tests in 14h.
So I will switch to multithreading again, but I would like to understand if there is a better configuration for my CPU (i7-4770K).
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 5,414
What is the formula to check if it can fit?
The data set size is the FFT size multiplied by 8. You can find the FFT size of completed work by clicking on tasks in your stats.
e.g. PSP work is currently 1280k FFT size. Multiply by 8, and you get 10240k or about 10MB.
If you run multiple separate tasks, you need multiple amounts of that. If you run one task with multi-threads, that is all you need. If the number is similar to or smaller than the L3 cache of the CPU, data will be held locally on the CPU and you get relatively high performance. If that number exceeds the L3 cache of the CPU, then you will be doing a lot of ram access, and performance can be limited by that.
Note the above doesn't strictly account for all usage, only that of the FFT data set, so a little bit more may be required for other code. As a guide, though, it seems to work well for predicting relative performance.
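As a worked example, using only numbers reported elsewhere in this thread (so treat it as illustrative): the i7-4770K asked about above has 8 MB of L3 cache, and an SoB task at 1280K FFT needs roughly 1280K x 8 = 10 MB. Four single-threaded SoB tasks would want about 4 x 10 MB = 40 MB, far beyond the 8 MB cache, whereas one -t 4 task needs only the single 10 MB working set, which comes much closer to fitting. That is consistent with the multithreaded SoB run coming out faster in the post above.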
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,621,444 RAC: 0
My Phenom II X6 1100T CPU has 6 cores and 6 MB of L3 Cache.
https://en.wikipedia.org/wiki/List_of_AMD_Phenom_microprocessors#.22Thuban.22_.28E0.2C_45_nm.2C_Hexa-core.29
Based on FFT size I would expect SR5, PPS and SGS to scale well, but others not so much.
Mike has conveniently posted a table of FFT size vs Candidate Digits:
http://www.primegrid.com/forum_thread.php?id=7093&nowrap=true#105501
I'll have to do some testing to confirm this: 1x6 vs 2x3 vs 3x2 vs 6x1 (threads x instances).
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 2,648
Mike has conveniently posted a table of FFT size vs Candidate Digits:
http://www.primegrid.com/forum_thread.php?id=7093&nowrap=true#105501
I read this and thought, "I did???"
I guess I did. :)
FFT vs. digits was not the intended purpose of that table, so there are a few things you should know about it.
YMMV. Seriously.
That table is specifically for the PSP double check, and since "K" affects FFT size, that table is only valid for PSP (and probably SoB and ESP, which have similar K values).
FFT sizes may differ for Riesel (c = -1) numbers, or numbers where the base is not 2.
FFT sizes will vary according to CPU type. In particular, the computer that does the FFT determination testing doesn't support AVX. Computers with AVX or FMA3 will use different FFT sizes for the same candidate.
The bottom line is you need to take that FFT size vs digits chart with a truckload of salt, and use it only as a generality. If you want to see what FFT size a number will actually use, actually run that number on YOUR computer and see what FFT size is used.
____________
My lucky number is 75898^524288+1
I'm running the new LLR on three cores out of four (i5-6600) and running two GFN tasks on my two GPUs, a 960 and a 1060.
I noticed the following:
1) The GFN CPU jobs together take about 10% of the CPU, and the GPU load is below 90%.
2) When I suspend the three LLR jobs, the CPU used by the GFN jobs drops to about 2%-3% and the GPU load rises into the nineties.
Does the new LLR use the GPU?
I'm running the new LLR on three cores out of four (i5-6600) and running two GFN tasks on my two GPUs, a 960 and a 1060.
I noticed the following:
1) The GFN CPU jobs together take about 10% of the CPU, and the GPU load is below 90%.
2) When I suspend the three LLR jobs, the CPU used by the GFN jobs drops to about 2%-3% and the GPU load rises into the nineties.
Does the new LLR use the GPU?
No, it doesn't use the GPU. Usually leaving one core free to feed a GPU would be enough, but perhaps in your case there are issues caused by e.g. LLR data pushing the Genefer working set out of cache, or the LLR and genefer processes migrating from one core to another.
If you are running two GeneferOCL tasks at the same time, either leave 2 cores free for those tasks, or consider turning on hyperthreading so those tasks can more efficiently share a single core. This is one of those cases where as has been said above, YMMV and there is no substitute for testing to find the best config on your machine.
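For concreteness, one hedged sketch of that kind of app_config.xml for the 4-core i5 above (llrSOB is an arbitrary example app; -t 2 with max_concurrent 1 keeps LLR on two cores, with the intent of leaving the other two free for the GPU tasks -- whether cores actually stay idle also depends on your other BOINC settings):
<app_config>
    <app>
        <name>llrSOB</name>
        <max_concurrent>1</max_concurrent>
        <fraction_done_exact/>
    </app>
    <app_version>
        <app_name>llrSOB</app_name>
        <cmdline>-t 2</cmdline>
        <avg_ncpus>2</avg_ncpus>
        <max_ncpus>2</max_ncpus>
    </app_version>
</app_config>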
P.S. I assume in all the above that you are running LLR on just a single thread.
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime!
GDB
Joined: 15 Nov 11 Posts: 280 ID: 119185 Credit: 3,407,482,018 RAC: 3,936,454
I'm running the new LLR on three cores out of four (i5-6600) and running two GFN tasks on my two GPUs, a 960 and a 1060.
I noticed the following:
1) The GFN CPU jobs together take about 10% of the CPU, and the GPU load is below 90%.
2) When I suspend the three LLR jobs, the CPU used by the GFN jobs drops to about 2%-3% and the GPU load rises into the nineties.
Does the new LLR use the GPU?
Memory bus capacity is probably the limiting factor. All jobs (CPU & GPU) access memory. LLR jobs use enough of the memory bus to start limiting the GPU jobs, which have to wait longer for their memory access, so their GPU load falls.
That means that if LLR jobs aren't running, more memory bus is available for the GPU jobs. They don't have to wait much for memory access, and GPU load rises.
For benchmarking multithreaded LLR, can we use Prime95.exe v28.10?
Both use LLR v3.8.20.
Assuming that memory bandwidth is a limiting factor that impairs the GPUs, how does that explain why the Genefer jobs consume more CPU time in the presence of LLR jobs?
I have monitored the machine closely since I installed the 1060 a while back and didn't notice anything until the new LLR version came along.
I will, however, keep testing.
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 2,648
Assuming that memory bandwidth is a limiting factor that impairs the GPUs, how does that explain why the Genefer jobs consume more CPU time in the presence of LLR jobs?
I have monitored the machine closely since I installed the 1060 a while back and didn't notice anything until the new LLR version came along.
I will, however, keep testing.
I have also noticed this behavior. Running LLR on all 4 cores of a Haswell i5, I see the CPU usage of Genefer's GPU task (also on a GTX 1060) increase, but the GPU itself keeps running at close to 100%. It's not clear to me whether the CPU usage that's being reported is accurate. It might be real, or it could be a measurement artifact. Regardless of what task manager shows, what is happening is that I'm able to run LLR on all cores (with an overall improvement in throughput, at least with some tasks), while the GPU continues to run at full speed. I'm not concerned about the CPU usage by Genefer. It doesn't appear to be affecting anything, at least on my system.
____________
My lucky number is 75898^524288+1
GDB
Joined: 15 Nov 11 Posts: 280 ID: 119185 Credit: 3,407,482,018 RAC: 3,936,454
Assuming that memory bandwidth is a limiting factor that impairs the GPUs, how does that explain why the Genefer jobs consume more CPU time in the presence of LLR jobs?
I have monitored the machine closely since I installed the 1060 a while back and didn't notice anything until the new LLR version came along.
I will, however, keep testing.
Could be that the GPU process spins the CPU while waiting for a memory transfer to complete. If the memory bus is busy, that could explain why more CPU is used by the GPU process, to account for the memory-transfer wait.
I'm assuming you have at least a dual-channel memory config? I've purchased an 8GB laptop that only had a single memory stick.
Question: Can I also use LLR v3.8.20 on PRPNet?
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 2,648
Question: Can I also use LLR v3.8.20 on PRPNet?
You can, but the only obvious reason why you would want to do so would be to use multi-threading, and I'm not sure if that's currently possible.
If you want to try multithreading, it was briefly discussed here.
There's currently no way to pass the -t4 command line argument to LLR with PRPNet. Perhaps you can use llr.ini to invoke multi-threading?
____________
My lucky number is 75898^524288+1
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,621,444 RAC: 0
Question: Can I also use LLR v3.8.20 on PRPNet?
I just got multithreading working on PRPNet.
Assuming you want to use 2 threads, first create a cllr64.bat:
@echo off
IF "%~1"=="" (
    echo. No command line arguments passed
    cllr64.exe -t2
) ELSE (
    IF "%~2"=="" (
        REM echo. One command line argument passed %1
        cllr64.exe %1 -t2
    ) ELSE (
        IF "%~3"=="" (
            REM echo. Two command line arguments passed %1 %2
            cllr64.exe %1 %2 -t2
        ) ELSE (
            IF "%~4"=="" (
                REM echo. Three command line arguments passed %1 %2 %3
                cllr64.exe %1 %2 %3 -t2
            ) ELSE (
                IF "%~5"=="" (
                    REM echo. Four command line arguments passed %1 %2 %3 %4
                    cllr64.exe %1 %2 %3 %4 -t2
                ) ELSE (
                    echo. Five or more command line arguments passed
                )
            )
        )
    )
)
Then in prpclient.ini use the line:
llrexe=cllr64.bat
Where cllr64.exe is the llr 3.8.20 version copied into your prpclient directory.
Happy crunching! :-)
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 2,648
I haven't tried it, but I think this much simpler batch file will do the same thing:
@echo off
cllr64.exe %1 %2 %3 %4 %5 %6 %7 %8 %9 -t2
This is for Windows, of course, but a similar shell file should be possible on Linux.
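An untested sketch of such a shell file (assuming the Linux LLR binary is copied into the prpclient directory as cllr64, this script is saved next to it and made executable, and prpclient.ini points llrexe= at the script):
#!/bin/sh
# Pass through whatever arguments PRPNet supplies, then append -t2.
exec ./cllr64 "$@" -t2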
____________
My lucky number is 75898^524288+1
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 5,414
Quick data point comparing the i7-6700k at 4.2 GHz with 2666 RAM to the i5-6600k at 4.2 GHz with 3000 RAM, running PSP with 4 threads: the i5 took about 4% longer. It isn't entirely clear, but it doesn't seem to suggest a strong influence of either RAM or cache quantity.
The 14 core Xeon is more interesting.
Running 14 threads, PSP tasks took 129.1k CPU seconds in 11.0ks real time, for 84% utilisation.
Running 12 threads, PSP tasks took 126.0k CPU seconds in 11.9ks real time, for 88% utilisation. So less CPU time in more real time than 14.
Running 10 threads, a PSP task took 124.1k CPU seconds in 13.5ks real time, for 92% utilisation.
Running 7 threads, a PSP task took 121.3k CPU seconds in 18.1ks real time, for 96% utilisation.
Running 2x7 threads, PSP tasks took 121.7k CPU seconds in unknown real time due to reporting weirdness.
Maybe more data points would help, but this confirms my observation that beyond some point, more threads aren't able to saturate the allocated cores. For comparison, the quad core Intels were above 97% utilisation, and Ryzen 8t was about 93%.
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 2,648
Multi-threading is always going to have some inherent inefficiencies due to the need for threads to pause while they wait to synchronize with the other threads. The more threads you run, the larger the overhead. The tradeoff is whether you gain more speed because of improved cache efficiencies from running a single task than you lose from thread synchronization.
On my Haswell i5-4670K, -t4 only runs at about 90% CPU utilization, but it's still usually more efficient than running 4 separate tasks. (Task size plays a factor here, of course.) I may lose 10% to the multithreading, but running 4 separate tasks loses even more although CPU utilization "rises" to 100%. The CPU may be fully occupied, but in reality it's running slower due to memory bottlenecks.
Here are some numbers from a test I ran on one of our 8-core servers:
Cores   time     Optimal   Overhead   Increase   hours
1       47.018   47.018    0.00%      0.00%      412.9
2       23.972   23.509    1.97%      96.14%     210.5
3       16.092   15.673    2.68%      192.18%    141.3
4       12.191   11.755    3.71%      285.68%    107.0
5        9.869    9.404    4.95%      376.42%     86.7
6        8.455    7.836    7.89%      456.10%     74.2
7        7.322    6.717    9.01%      542.15%     64.3
The test was running a full sized leading edge n=31M SoB task, specifically 21181*2^31611548+1. The "time" column is the per-iteration time, in milliseconds, reported on the second timing output (at 20000 bits). Lower is faster.
The next column, "optimal", is simply "time/cores", i.e., the theoretical time you would get with 100% multithreading efficiency. "Overhead" is the difference between Optimal and the actual observed time. This is what you're giving up by using multithreading.
"Increase" is the increase in speed, as compared to running a single thread. In an ideal world, 2 threads would run with 100% increase, 3 threads at 200%, etc.
Finally, "hours" is how long the full test would run, if allowed to run to completion.
Note that AVX is not used in this example. Also, the server was otherwise idle (at least the 7 cores used in this test), so this is NOT a comparison of "7 single core tasks vs. 1 7-core task". It's a look at how much inefficiency multithreading causes.
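Taking the 4-core row as a worked example of those definitions: optimal = 47.018 / 4 = 11.755; overhead = 12.191 / 11.755 - 1 = 3.71%; increase = 47.018 / 12.191 - 1 = 285.68%.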
____________
My lucky number is 75898^524288+1
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester
Joined: 1 Mar 14 Posts: 989 ID: 301928 Credit: 543,186,393 RAC: 7
I haven't tried it, but I think this much simpler batch file will do the same thing:
@echo off
cllr64.exe %1 %2 %3 %4 %5 %6 %7 %8 %9 -t2
This is the shortest one:
@cllr64.exe -t2 %*
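(For anyone unfamiliar with batch syntax: %* expands to all of the arguments passed to the script, so this one-liner forwards everything PRPNet supplies and adds -t2, just like the longer version above.)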
I kept on testing, trying various configurations, and I found that a 4-core LLR task interferes severely with the GPUs; a 3-core LLR task leaves the GPUs alone but chokes the 1-core LLR task that uses the 4th core.
It is probably voodoo, but the following configuration works fine for me, running 4 BOINC clients:
Client #1: GPU #0 (1060), genefer20, No CPU task
Client #2: GPU #1 (960), AP27 | genefer17mega, No CPU task
Client #3: No GPU task, LLR SOB (2 cores)
Client #4: No GPU task, LLR MEGA (2 cores) | LLR SGS (2 cores)
All clients use zero queue.
I witnessed the same thing. However, with CPUs that support hyperthreading, I have been successful turning that on and limiting CPU threads to the physical cores. The "free" virtual cores seem to be enough to feed the GPUs.
On both a 6700K and a 5930K I have 4 and 6 threads' worth of LLR, with HT on, and there is no discernible impact on the GPU tasks (GFN20) on the 1080 in the former and the 2 W9100s in the latter. If I ran 4 and 6 threads with HT off, it crushed the GPU tasks.
This also seems to work on an i3-4130 with a 290x.
I kept on testing, trying various configurations, and I found that a 4-core LLR task interferes severely with the GPUs; a 3-core LLR task leaves the GPUs alone but chokes the 1-core LLR task that uses the 4th core.
It is probably voodoo, but the following configuration works fine for me, running 4 BOINC clients:
Client #1: GPU #0 (1060), genefer20, No CPU task
Client #2: GPU #1 (960), AP27 | genefer17mega, No CPU task
Client #3: No GPU task, LLR SOB (2 cores)
Client #4: No GPU task, LLR MEGA (2 cores) | LLR SGS (2 cores)
All clients use zero queue.
I witnessed the same thing. However, with CPUs that support hyperthreading, I have been successful turning that on and limiting CPU threads to the physical cores. The "free" virtual cores seem to be enough to feed the GPUs.
Did you do that via processor affinity, and are you under Linux or Windows?
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 5,414
Did you do that via processor affinity, and are you under Linux or Windows?
I have observed, on Windows at least, that when using multi-threads it seems to park each thread on a pair of associated virtual cores. One of those is usually taking most of the work, but not always the same one. This is an improvement over Windows' behaviour with separate tasks, which seem to wander around wherever Windows likes; that has resulted in a potential 10% throughput drop when HT is enabled and running one task per real core. I think this affinity behaviour is inherited from gwnum, but I can't be sure.
Not sure whether it has been reported already, or should be reported elsewhere:
Multithreaded LLR tasks are reported with the wrong runtime. The runtimes listed in the results tables on this web site are all over the place, anywhere between the actual runtime and the actual runtime times the number of threads.
Obviously there is a race condition in the runtime accounting. Is this done in the application, or in the BOINC client, or...?
As an example, see this results list: http://www.primegrid.com/results.php?hostid=530208
This host with 2 sockets, 14 cores per socket, 2.9 GHz all-core AVX turbo, has been working on PSP-LLR with <cmdline>-t 7</cmdline> for more than two weeks now. Timestamp of app_config.xml is April 4, 06:00 UTC. Right now, the boincmgr UI is showing an estimated task duration of 8h40m = 31,000 s. (Average runtimes earlier in April were 5h30m...6h on this host with -t 7, but WUs have gotten bigger since then.)
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 2,648
Not sure whether it has been reported already, or should be reported elsewhere:
Multithreaded LLR tasks are reported with the wrong runtime. The runtimes listed in the results tables on this web site are all over the place, anywhere between the actual runtime and the actual runtime times the number of threads.
Obviously there is a race condition in the runtime accounting. Is this done in the application, or in the BOINC client, or...?
As an example, see this results list: http://www.primegrid.com/results.php?hostid=530208
This host with 2 sockets, 14 cores per socket, 2.9 GHz all-core AVX turbo, has been working on PSP-LLR with <cmdline>-t 7</cmdline> for more than two weeks now. Timestamp of app_config.xml is April 4, 06:00 UTC. Right now, the boincmgr UI is showing an estimated task duration of 8h40m = 31,000 s. (Average runtimes earlier in April were 5h30m...6h on this host with -t 7, but WUs have gotten bigger since then.)
When you first download multithreaded tasks, BOINC downloads too many of them. For example, with a four-core CPU set up with -t 4, you'll initially get 4 tasks instead of 1. The elapsed times of those 4 tasks will be strange. For example, suppose the actual run time is 10,000 seconds. The first of those 4 tasks will correctly report running for 10,000 seconds, but the second will show 20,000 seconds, the third will show 30,000 seconds, and the fourth will show 40,000 seconds.
After that, BOINC will only download one task at a time, and the time will be correct.
There's nothing we can do about this as it's part of the BOINC client that you download directly from Berkeley and we have no control over that software.
____________
My lucky number is 75898^524288+1
When you first download multithreaded tasks, BOINC downloads too many of them. For example, with a four-core CPU set up with -t 4, you'll initially get 4 tasks instead of 1. The elapsed times of those 4 tasks will be strange. For example, suppose the actual run time is 10,000 seconds. The first of those 4 tasks will correctly report running for 10,000 seconds, but the second will show 20,000 seconds, the third will show 30,000 seconds, and the fourth will show 40,000 seconds.
After that, BOINC will only download one task at a time, and the time will be correct.
In my case, the times continue to be incorrect. Possibly because this machine works on four such tasks simultaneously, and/or downloads several tasks at once.
There's nothing we can do about this as it's part of the BOINC client that you download directly from Berkeley and we have no control over that software.
Ah, OK, so the issue is with the client, not with the applications.
PS: Oh, I see there was a related discussion on March 24 in this very thread.
point to a list of the correct names to use to reference subprojects?
List of app names can be found here.
Thanks a lot for the app names and the instructions to build an app_config.xml - it's working perfectly.
Is there any more information on which PG tasks do not benefit from MT?
I've seen SGS mentioned in several threads as never being more efficient in MT mode than on discrete cores.
Which other tasks will not benefit from a -t4 on a bread-and-butter four-core CPU?
Other than being able to push out a few more WUs before the end of a challenge, of course..
Cheers
Holger
____________
My stats
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1931 ID: 352 Credit: 5,709,300,438 RAC: 1,076,981
Is there any more information on which PG tasks do not benefit from MT?
I've seen SGS mentioned in several threads as never being more efficient in MT mode than on discrete cores.
Which other tasks will not benefit from a -t4 on a bread-and-butter four-core CPU?
Other than being able to push out a few more WUs before the end of a challenge, of course..
Cheers
Holger
I don't think there is a single, simple answer that covers all cases.
Basically, if the FFT fits in the CPU cache for all tasks, there is likely no benefit to running -t4.
This might not be an issue with 4 small SGS tasks, but it plays a larger role with longer tasks with larger FFTs.
As you may see, it depends not only on the current FFT size of each LLR subproject but also on the size of a particular CPU's cache. An Intel i3 is not the same as a server Xeon.
Memory speed may also play a role.
Personally, I prefer to run MT for larger tasks not only when it is faster but also when it is about the same speed. This approach gives a better chance of being the prime finder, is less disk-intensive and less memory-demanding, makes it easier to meet deadlines, leaves fewer tasks in progress, etc.
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 5,414
There are a lot of variables and it is hard to say what is "best". There are two slightly different goals: doing units as fast as possible, doing as many units in a given time as possible.
In the following, when I refer to cores, I mean physical cores, not any extra threads from HT or SMT. While I have not tested it, it is assumed they still don't provide a significant benefit apart from possibly helping drive a GPU at the same time if you need to.
Doing LARGE units as fast as possible seems to be as simple as using all the cores you have. There seems to be some difficulty scaling beyond 8 cores, where each core seems unable to be fully loaded.
From observation during the PSP challenge, it looks like RAM bandwidth is far less important than it was when running a separate task on each core. Similarly, it doesn't look like cache size plays much of a role, so a value sweet spot for LLR may have just moved from i3s to i5s.
For small tasks, there seems to be some inefficiency. Personally, I wouldn't run 4 cores on anything MEGA or smaller. It might be worth it on a dual-core, but I haven't tested it.
The one issue with i5s is the lack of HT, if you are trying to drive a GPU. I had to keep my i5s restricted to 3 threads or the GPU would starve.
I took an even more conservative approach: I didn't scale any units past six cores. I noticed the same drop in utilization if I went beyond that.
I just reported my first SoB using multithreading (-t4), and the run time was not what I was expecting (i7 6700k), with the unit taking a bit over 7 hrs to complete (FFT length 1280K). I have units also running on my two other Skylakes so as to build a better overall picture.
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 929 ID: 3110 Credit: 236,858,205 RAC: 30,463
I just reported my first SoB using multithreading (-t4), and the run time was not what I was expecting (i7 6700k), with the unit taking a bit over 7 hrs to complete (FFT length 1280K). I have units also running on my two other Skylakes so as to build a better overall picture.
Given the credit you got for that unit, that seems fairly reasonable. My i7 6700 non-K, which doesn't have enough heat sink yet to run above 3.3GHz, did a bigger PSP WU in about 11 hours.
____________
composite Volunteer tester
Joined: 16 Feb 10 Posts: 1022 ID: 55391 Credit: 888,563,321 RAC: 135,762
HT is beneficial to throughput with multithreading because of better CPU utilization, typically when a thread stalls due to a cache miss. Here's a demonstration of 8.4% throughput increase using HT and multithreading just 1 task, compared to running 1 task per core with HT off.
5820K (Haswell-E) 3.3 GHz no OC; 6 physical cores, HT on (12 threads), 15 MB L3 cache; 16 GB RAM
Stock fan, speed is set in the BIOS to 100% to keep higher core temperatures from throttling the speed.
6 physical cores, HT on, 1 task, 10 threads
BOINC computing preference 100% (so 12 threads allowed).
appconfig: llr321 -t 10
CPU loading is one task of LLR321 and one task of GFN22 simultaneously. No CPU affinity setting, letting Linux manage threads. The speed of GFN 22 is not impacted, as there are 2 threads available for GFN. GPU utilization is 99% even with only 1 thread for GFN.
LLR321 FMA3 FFT length 768k
According to BOINC task properties: virtual memory size ~ 153 MB, working set size ~ 47 MB. There is plenty of RAM to hold the task, but the working set (memory used by data to solve the problem) does not fit in the CPU cache. One FFT fits in the L3 cache but there will be frequent cache misses for the rest of the working set.
Task CPU time 77,528s = 21h32m8s
Actual run time 8,901s = 2h28m21s
Average thread loading = run time / CPU time = 8.71 threads
Thread efficiency = thread loading / threads occupied = 8.71 / 10 = 87%.
Multithreading overhead (by Michael's definition) = 10 / 8.71 - 1 = 14.8%
But look at this:
Core efficiency = thread loading / physical cores = 8.71 / 6 = 145%
The increase in core efficiency more than compensates for the multithreading overhead, for a net increase in throughput of 8.4%, calculated here...
6 physical cores, HT off, 6 tasks, 1 thread per task
This configuration was not directly measured, but uses mackerel's estimate of a 10% performance drop when HT is left on.
HT on
BOINC computing preference at 50% (so 6 threads allowed)
no multithreading
BOINC ran a GFN22 task and 6 LLR321 tasks in parallel. The LLR average run time (also average CPU time) was 18h2m, so on average one LLR task completed every 3h. Now, depending on how you measure a 10% performance drop, the computed runtime with HT off would be either 10,800s / 1.1 = 9,818s or 10,800s * 0.9 = 9,720s. Just to please the skeptics, let's take the more aggressive benefit of turning off HT, with a runtime of 9,720s.
Change in Throughput
The reduction in average runtime from using multithreading and more threads than cores with HT on is 9,720s - 8,901s = 819s. This is an increase in throughput of 819 / 9,720 = 8.4%, compared to running 1 task per core on all cores with HT off.
As Michael said, the cost of using more threads to get faster task completion is lower CPU efficiency. Other effects are higher CPU temperature and higher energy usage per task. With only physical cores, using more threads incurs a performance penalty in terms of lower throughput.
When a hyperthread stalls, the other hyperthread on the core may be able to execute, and the result can be higher throughput.
I haven't yet explored the best number of threads for highest throughput, or for better energy efficiency at the same throughput, but so far I'm pretty happy with -t 10. I can't play with other LLR loads right now since Rafael is getting his payback with this computer for helping me on the TRP sieve. I invite others to turn on HT, try running a single LLR task with more threads than cores, and relay their results back here.
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 5,414
I suppose now is as good a time as any for me to try and test this...
I've got a 6700k idle; I will set up a batch file to run (with HT on) 8, 6, 4, 3, 2, 1 threads on the same TBD task. I'll probably try one of the previous SR5 units as an initial test, then expand to bigger projects once I've got the above working as intended.
Edit: for my interests, I'm only concerned about CPU performance. Adding a GPU complicates matters somewhat...
I just reported my first SoB using multithreading (-t4), and the run time was not what I was expecting (i7 6700k), with the unit taking a bit over 7 hrs to complete (FFT length 1280K). I have units also running on my two other Skylakes so as to build a better overall picture.
Given the credit you got for that unit, that seems fairly reasonable. My i7 6700 non-K, which doesn't have enough heat sink yet to run above 3.3GHz, did a bigger PSP WU in about 11 hours.
This system is now offline since something went horribly wrong a number of hours ago. I will know in a few hours if the PSU is the culprit.
I just checked one of my other Skylake systems (i3 6300) and the CPU temperature is fluctuating between 98 and 100%. I have set BOINC to "no new tasks" so as to leave the system idle, and for me to decide what to do with multithreading once the two tasks in queue (one virtually finished) are done with. The time of year does not help, with ambient temperatures continuing to rise (we are expecting 29°C / 84°F later this week). I have a suspicion I will be turning back to sieving so as to minimize the stress on my CPUs this time of the year, with summer just around the block. In other words, LLR and multithreading can wait for winter.
(I will apply fresh thermal paste and proceed with a small experiment involving multithreading one last time.)
composite Volunteer tester
Joined: 16 Feb 10 Posts: 1022 ID: 55391 Credit: 888,563,321 RAC: 135,762
Tune down the number of threads in app_config.xml so that your system runs cooler. If you have the option of CPU fan speed control in your BIOS, crank it up to the highest tolerable noise level.
Summer is indeed just around the block. It's snowing again as I look out the window right now.
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 929 ID: 3110 Credit: 236,858,205 RAC: 30,463
Tune down the number of threads in app_config.xml so that your system runs cooler. If you have the option of CPU fan speed control in your BIOS, crank it up to the highest tolerable noise level.
I generally find that using all available cores, but lowering the CPU speed with something like ThrottleStop, is better than reducing the number of cores used. (I don't know about HT threads; I await Mackerel's tests with interest!)
____________
composite Volunteer tester
Joined: 16 Feb 10 Posts: 1022 ID: 55391 Credit: 888,563,321 RAC: 135,762
I said...
Average thread loading = run time / CPU time = 8.71 threads
Doh! Never enough time for proof-reading. This should be
Average thread loading = CPU time / run time = 77,528 / 8,901 = 8.71 threads
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 5,414
-t   time (s)   relative
1    8168       1.00
2    4410       1.85
3    3016       2.71
4    2347       3.48
5    2337       3.50
6    2245       3.64
7    2191       3.73
8    2108       3.87
This is for the previously "bug" SR5 unit 64598*5^2318694-1. Ran 1 through 8 threads on i7-6700k fixed at 4.2 GHz, HT enabled, 2666 dual rank dual channel ram. It does look like running on those extra threads does claw towards the ideal scaling for 4 cores. Running 8 threads over 4 gains about 11%. Those feeding GPUs can probably spare one without much impact to CPU, but someone else can test that.
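(As a quick check of that 11% figure against the table: the 8-thread relative speedup over 4 threads is 3.87 / 3.48 ≈ 1.11.)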
I'll set up 4 tasks in parallel to run while I'm at work, for comparison against the old case of running one task per core. I can't quickly set HT off or affinity on the system so it'll likely have the HT penalty in this state.
I could also repeat the scaling on an i5-6600k too... smaller cache, faster ram, no HT.
Edit: I don't believe the 11% I saw was the HT penalty, as running with -t makes LLR appear to set the affinity of each thread to a pair of virtual cores. Let's see if the i5 result scales similarly up to 4 cores, which would suggest this; if it scales better, it could still be the HT penalty.
...but will the CPU be able to do 8 tasks in parallel in 8168s?
____________
My stats
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 5,414
No, and I wouldn't expect it to do 4 in that time either. That is why I'm running 4 in parallel, to see how long that takes; from that we can work out the benefit. While I expect the tasks to complete long before then, I won't be able to see the results for another 7 hours or so. | |
|
compositeVolunteer tester Send message
Joined: 16 Feb 10 Posts: 1022 ID: 55391 Credit: 888,563,321 RAC: 135,762
                       
|
We are exploring the benefits of HT when multithreading a single task. So far it looks like this produces a winning combination: minimizing a task's runtime while also maximizing the CPU's throughput.
I'd like to see the impact on throughput of HT and multithreading with other combinations of tasks and threads on 4 cores:
2 tasks with -t 4
3 tasks with -t 2
4 tasks with -t 2
and the wildcard: 3 tasks with -t 3
I also have an interesting observation:
For a single LLR task using CPU affinity for all threads, the task's master thread has a higher CPU occupancy than all the other threads of the task (~97% versus ~82% for 10 threads). If other CPU-intensive work occupies the other hyperthread on the same core as the master thread, then all of the LLR task's threads have reduced CPU occupancy (and probably a longer runtime). This tells me that the master thread of each simultaneous task should be pinned to its own physical core for maximum throughput. However, I don't think there's a mechanism for BOINC to handle CPU affinity. Too bad. Can we fork BOINC?
Edit: or maybe a better idea would be to run a PG coprocess that communicates with the wrappers to hand out CPU assignments. | |
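BOINC itself has no affinity mechanism, but as a rough illustration of the kind of pinning described above, something outside BOINC could set affinities. A hypothetical Python sketch using psutil follows; the PID and core numbers are made up, and it pins the whole process rather than just the master thread, which is as close as an external script easily gets:
# hypothetical sketch: pin an LLR task's process to one physical core
# using psutil (pip install psutil). The PID and core numbers are
# illustrative only; this is not an existing PrimeGrid or BOINC feature.
import psutil

llr_pid = 12345           # made-up PID of a running LLR task
p = psutil.Process(llr_pid)
p.cpu_affinity([0, 1])    # logical CPUs 0 and 1 = both hyperthreads of core 0
print(p.cpu_affinity())   # confirm the new affinity mask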
|
mackerel Volunteer tester
 Send message
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 5,414
                             
|
For small tasks, the scaling of multiple smaller tasks with thread count is potentially interesting. In this scenario, multiple -t 2 or -t 4 tasks might be tested. I might have a go at this once I'm done with the big tasks.
For big tasks, I doubt running multiple tasks with fewer threads each would help. I'll leave that as an exercise for someone else.
As for the benefit of manual affinity setting, I'll await the i5 test results before speculating in depth. It does look like multithreaded LLR already does some affinity setting itself. | |
|
mackerel Volunteer tester
 Send message
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 5,414
                             
|
OK, some more data is in.
Running 4 single-thread tasks took 18% longer than running four tasks with 4 threads each, or 31% longer than running four tasks with 8 threads each. Remember this is with HT on and without affinity on the single-thread tasks, so past experience suggests about a 10% penalty. I'm re-running -t 4 now with HT off to check this.
The 6600K has also completed a set. Like for like against the i7, it is 1 to 2% faster for -t 1 through 3. The -t 4 result is obviously too long; I suspect some Windows background task interfered with it, and I am re-running it now.
For now, it looks like there is no significant benefit to be gained from extra affinity tinkering when running more than one thread per task.
The 6600K is set to the same clock speed as the 6700K, but has a smaller cache and faster RAM. I think RAM speed influence is what I'll look at after the current tests are done. | |
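To put those percentages in throughput terms, a back-of-envelope Python sketch, assuming the 18% figure compares the same batch of four tasks and reusing the -t 4 time from the scaling table above:
# rough throughput comparison (assumption: the 18% penalty applies to
# the same batch of four tasks; -t 4 time taken from the earlier table)
t_mt4 = 2347                       # seconds per task at -t 4
batch_mt = 4 * t_mt4               # four 4-thread tasks back to back
batch_st = 1.18 * batch_mt         # four 1-thread tasks in parallel, +18%
for label, total in (("-t 4, one at a time", batch_mt),
                     ("-t 1, four in parallel", batch_st)):
    print(f"{label}: {4 * 3600 / total:.2f} tasks/hour")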
|
|
<app_config>
<app>
<name>llr321</name>
<fraction_done_exact/>
</app>
<app_version>
<app_name>llr321</app_name>
<cmdline>-t 4</cmdline>
<avg_ncpus>4</avg_ncpus>
<max_ncpus>4</max_ncpus>
</app_version>
</app_config>
Is this correct XML content for app_config.xml for 321?
I am running Debian 8. Where do I put the file?
Once I have the file in place, what are the steps to get started with it? (At the moment I am running 4 tasks in parallel.) | |
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1931 ID: 352 Credit: 5,709,300,438 RAC: 1,076,981
                                   
|
Once I have the file in place, what are the steps to get started with it? (At the moment I am running 4 tasks in parallel.)
Since you have already downloaded some llr321 tasks, you can search for the respective app_name in your client_state.xml file.
(If you set up an unknown app_name, BOINC will ignore those settings and should continue using the defaults. In case of doubt, disable BOINC network activity, stop BOINC, and make a backup.)
Basically, you can continue using 4 cores per task.
Stop BOINC, place app_config.xml, and run BOINC again.
BOINC will grab the first task, finish it using 4 cores from the last checkpoint, then proceed to the next one, etc.
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186 | |
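Following up on the suggestion to search client_state.xml: if you'd rather not eyeball the file, here is a minimal Python sketch that lists the app names BOINC knows about. The path assumes a Debian-style boinc-client install; adjust it to your data directory:
# list the app names recorded in client_state.xml
# (path assumes a Debian/Ubuntu boinc-client package; adjust as needed)
import xml.etree.ElementTree as ET

root = ET.parse("/var/lib/boinc-client/client_state.xml").getroot()
for app in root.iter("app"):
    name = app.findtext("name")
    if name:
        print(name)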
|
|
Once I have the file in place, what are the steps to get started with it? (At the moment I am running 4 tasks in parallel.)
Since you have already downloaded some llr321 tasks, you can search for the respective app_name in your client_state.xml file.
(If you set up an unknown app_name, BOINC will ignore those settings and should continue using the defaults. In case of doubt, disable BOINC network activity, stop BOINC, and make a backup.)
Basically, you can continue using 4 cores per task.
Stop BOINC, place app_config.xml, and run BOINC again.
BOINC will grab the first task, finish it using 4 cores from the last checkpoint, then proceed to the next one, etc.
"place app_config.xml" -- where exactly? | |
|
RafaelVolunteer tester
 Send message
Joined: 22 Oct 14 Posts: 906 ID: 370496 Credit: 481,974,741 RAC: 371,699
                   
|
"place app_config.xml" -- where exactly?
Assuming default BOINC settings on install:
C:\ProgramData\BOINC\projects\www.primegrid.com
While you're at it, I suggest pinning that BOINC directory to Quick Access, so that it's easy to reach and make changes on the fly (whether to BOINC, PrimeGrid or any other project). | |
|
mackerel Volunteer tester
 Send message
Joined: 2 Oct 08 Posts: 2584 ID: 29980 Credit: 550,654,016 RAC: 5,414
                             
|
For default Windows install it is C:\ProgramData\BOINC\projects\www.primegrid.com | |
|
|
Guys, I am on Debian 8 -- not Windows. Where do I put app_config.xml on this system?
I have tried ".BOINC" and I have tried "/var/lib/boinc-client/" and "/etc/boinc-client/" | |
|
Tyler Project administrator Volunteer tester Send message
Joined: 4 Dec 12 Posts: 1077 ID: 183129 Credit: 1,365,637,185 RAC: 0
                        
|
Guys, I am on Debian 8 -- not Windows. Where do I put app_config.xml on this system?
I have tried ".BOINC" and I have tried "/var/lib/boinc-client/" and "/etc/boinc-client/"
Put it in /var/lib/boinc-client/projects/www.primegrid.com | |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 2,648
                              
|
<app_config>
<app>
<name>llr321</name>
<fraction_done_exact/>
</app>
<app_version>
<app_name>llr321</app_name>
<cmdline>-t 4</cmdline>
<avg_ncpus>4</avg_ncpus>
<max_ncpus>4</max_ncpus>
</app_version>
</app_config>
Is this correct XML content for app_config.xml for 321?
I am running Debian 8. Where do I put the file?
Once I have the file in place, what are the steps to get started with it? (At the moment I am running 4 tasks in parallel.)
I'm afraid you've received some responses that were less than 100% helpful. Unfortunately, I'm not home, so this response also won't be 100% helpful. :)
Your app_config is close enough. It will work. I forget which, but either max_ncpus or avg_ncpus is not used in app_config and will be ignored. Its presence in app_config won't hurt anything. (I think there's an earlier message by me in this thread which says which one is unneeded, but I'm not going to search for it while on my phone.)
As to where it goes, that would be projects/www.primegrid.com in your BOINC data directory. Look at the top of your BOINC log to see where the BOINC data directory is on your Linux system.
To activate it, you can restart BOINC, or you can select "Read config files" from within the BOINC Manager.
Once it's activated, expect some strange behavior from the 321 tasks already on your system. They will work, but might not use all the cores they're supposed to use.
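If digging out the log file is awkward, here is a minimal Python sketch that pulls the same information from the client's startup messages, assuming boinccmd is installed and the client is running (the client prints a "Data directory" line when it starts):
# print the data-directory line from the BOINC client's startup messages
# (assumes boinccmd is installed and the client is running)
import subprocess

msgs = subprocess.run(["boinccmd", "--get_messages"],
                      capture_output=True, text=True).stdout
for line in msgs.splitlines():
    if "Data directory" in line:
        print(line)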
____________
My lucky number is 75898524288+1 | |
|
|
Thanks for all your replies. It is now working, although with xsensors the core temperatures appear to be up from 73°C to 80°C. (Running at 40 x 100 MHz on a 4770K.)
Again, ~10^4000000 thanks!
P.S. I noticed that I had missed out a "<" in my app_config.xml file. | |
|
|
Combining a few posts from another thread:
http://www.primegrid.com/forum_thread.php?id=7415&nowrap=true#107580
http://www.primegrid.com/forum_thread.php?id=7415&nowrap=true#107589
http://www.primegrid.com/forum_thread.php?id=7415&nowrap=true#107592
http://www.primegrid.com/forum_thread.php?id=7415&nowrap=true#107594
http://www.primegrid.com/forum_thread.php?id=7415&nowrap=true#107595
I created the file below, which I am now using on all of my 4/8/12/16-thread hosts.
Note that you cannot use it as-is; some minimal editing is required. You will need to:
1) Totally exit BOINCMgr (not just minimize it!)
2) Create {BOINCDataFolder}\projects\www.primegrid.com\app_config.xml, copy/paste this into it, and then ReplaceAll "NumProcs" with whatever value you want it to be. If you decide to give different NumProcs values to different apps, be aware that each app occurs twice in this file, and make sure each pair is consistent. (A scripted version of this ReplaceAll is sketched after the XML below.)
3) Restart BOINCMgr
Also, when you restart BOINCMgr, you might get a bunch of errors/warnings like "Your app_config.xml file refers to an unknown application 'llrSOB'. Known applications: 'pps_sr2sieve', 'ap26', 'llrPSP', 'llrTPS', 'trp_sr2sieve', 'llrPPSE', 'llrWOO', 'llrCUL', 'genefer', 'llrPPS', 'llrSR5', 'llrMEGA', 'gcw_sieve', 'llrGCW'". These are ignorable. All it means is that your host is probably set to receive tasks from only a subset of projects, which you probably hand-picked using http://www.primegrid.com/prefs.php?subset=project, so not all applications are known to that host. BOINCMgr just reports this, ignores it, and uses whatever is useful to it.
I am sure I will stand corrected if I made a mistake. There are a few warnings about doing this both in this thread and in Michael's http://www.primegrid.com/forum_thread.php?id=7415&nowrap=true#107595 post. So, be careful. For now, I maxed each out on all my hosts. I'll see over time if that was deadly...
Tuna
<app_config>
<app>
<name>llrGCW</name>
<fraction_done_exact/>
<max_concurrent>1</max_concurrent>
</app>
<app_version>
<app_name>llrGCW</app_name>
<cmdline>-t NumProcs</cmdline>
<avg_ncpus>NumProcs</avg_ncpus>
</app_version>
<app>
<name>llrWOO</name>
<fraction_done_exact/>
<max_concurrent>1</max_concurrent>
</app>
<app_version>
<app_name>llrWOO</app_name>
<cmdline>-t NumProcs</cmdline>
<avg_ncpus>NumProcs</avg_ncpus>
</app_version>
<app>
<name>llrCUL</name>
<fraction_done_exact/>
<max_concurrent>1</max_concurrent>
</app>
<app_version>
<app_name>llrCUL</app_name>
<cmdline>-t NumProcs</cmdline>
<avg_ncpus>NumProcs</avg_ncpus>
</app_version>
<app>
<name>llrPSP</name>
<fraction_done_exact/>
<max_concurrent>1</max_concurrent>
</app>
<app_version>
<app_name>llrPSP</app_name>
<cmdline>-t NumProcs</cmdline>
<avg_ncpus>NumProcs</avg_ncpus>
</app_version>
<app>
<name>llrSOB</name>
<fraction_done_exact/>
<max_concurrent>1</max_concurrent>
</app>
<app_version>
<app_name>llrSOB</app_name>
<cmdline>-t NumProcs</cmdline>
<avg_ncpus>NumProcs</avg_ncpus>
</app_version>
<app>
<name>llrTRP</name>
<fraction_done_exact/>
<max_concurrent>1</max_concurrent>
</app>
<app_version>
<app_name>llrTRP</app_name>
<cmdline>-t NumProcs</cmdline>
<avg_ncpus>NumProcs</avg_ncpus>
</app_version>
<app>
<name>llrSR5</name>
<fraction_done_exact/>
<max_concurrent>1</max_concurrent>
</app>
<app_version>
<app_name>llrSR5</app_name>
<cmdline>-t NumProcs</cmdline>
<avg_ncpus>NumProcs</avg_ncpus>
</app_version>
<app>
<name>llrESP</name>
<fraction_done_exact/>
<max_concurrent>1</max_concurrent>
</app>
<app_version>
<app_name>llrESP</app_name>
<cmdline>-t NumProcs</cmdline>
<avg_ncpus>NumProcs</avg_ncpus>
</app_version>
</app_config>
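As mentioned in step 2 above, the ReplaceAll can also be scripted. A minimal Python sketch, assuming the XML above has been saved as a template file (the file names and the thread count are illustrative, not part of any PrimeGrid tooling):
# fill in the NumProcs placeholder and write app_config.xml
# (paths and the thread count are illustrative; adjust to your host)
from pathlib import Path

num_procs = "4"  # e.g. one task using all cores of a 4-core host
template = Path("app_config_template.xml").read_text()
Path("app_config.xml").write_text(template.replace("NumProcs", num_procs))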
____________
| |
|
|
You will need to:
1) Totally exit BOINCMgr (not just minimize!)
2) Create a {BOINCDataFolder}\projects\www.primegrid.com\app_config.xml, copy/paste this into it, and then ReplaceAll "NumProcs" with whatever value you want it to be. If you decide to give different NumProcs to different apps, then make sure to realize that each app occurs twice in this file, and to make sure each pair is consistent with each other.
3) Restart BOINCMgr
BOINC restart is not needed when you create or modify app_config.xml. It is enough to do this:
- create/update the app_config.xml file;
- disable the option "Leave non-GPU tasks in memory while suspended" (Options/Computing preferences/Disk and memory tab);
- suspend all PrimeGrid tasks (or, easier, suspend CPU usage);
- reload the config files (Options/Read config files);
- resume the tasks.
Note: after doing this, the GUI will still display the previous CPU count for the apps. This is a display bug only; the number of running tasks and their command lines will be adjusted properly.
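On headless hosts the same reload can be triggered from a terminal. A minimal Python sketch, assuming boinccmd is installed and on the PATH (as far as I know, the --read_cc_config command also makes current clients reread app_config.xml, the same as the Manager's "Read config files" menu item):
# tell a running BOINC client to reread its config files without a restart
# (assumes boinccmd is installed; on Debian it ships with boinc-client)
import subprocess

subprocess.run(["boinccmd", "--read_cc_config"], check=True)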
____________
| |
|
|
In Program Files > BOINC:
app_config.xml:
<?xml version="1.0"?>
<app_config>
<app>
<name>llrMEGA</name>
<fraction_done_exact/>
</app>
<app_version>
<app_name>llrMEGA</app_name>
<cmdline>-t 2</cmdline>
<avg_ncpus>2</avg_ncpus>
<max_ncpus>2</max_ncpus>
</app_version>
<app>
<name>llrSOB</name>
<fraction_done_exact/>
</app>
<app_version>
<app_name>llrSOB</app_name>
<cmdline>-t 4</cmdline>
<avg_ncpus>4</avg_ncpus>
<max_ncpus>4</max_ncpus>
</app_version>
<app>
<name>llrCUL</name>
<fraction_done_exact/>
</app>
<app_version>
<app_name>llrCUL</app_name>
<cmdline>-t 4</cmdline>
<avg_ncpus>4</avg_ncpus>
<max_ncpus>4</max_ncpus>
</app_version>
<app>
<name>llrPSP</name>
<fraction_done_exact/>
</app>
<app_version>
<app_name>llrPSP</app_name>
<cmdline>-t 4</cmdline>
<avg_ncpus>4</avg_ncpus>
<max_ncpus>4</max_ncpus>
</app_version>
<app>
<name>llr321</name>
<fraction_done_exact/>
</app>
<app_version>
<app_name>llr321</app_name>
<cmdline>-t 4</cmdline>
<avg_ncpus>4</avg_ncpus>
<max_ncpus>4</max_ncpus>
</app_version>
</app_config>
This isn't doing anything. Suggestions? | |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13804 ID: 53948 Credit: 345,369,032 RAC: 2,648
                              
|
It goes in c:\ProgramData\BOINC\projects\www.primegrid.com\
____________
My lucky number is 75898524288+1 | |
|
|
Thank you! My ProgramData folder is hidden, and when I searched for BOINC the projects folder was also hidden, so I just had to search C:\ for "www.prime" to find it. | |
|
|
You will need to:
1) Totally exit BOINCMgr (not just minimize!)
2) Create a {BOINCDataFolder}\projects\www.primegrid.com\app_config.xml, copy/paste this into it, and then ReplaceAll "NumProcs" with whatever value you want it to be. If you decide to give different NumProcs to different apps, then make sure to realize that each app occurs twice in this file, and to make sure each pair is consistent with each other.
3) Restart BOINCMgr
BOINC restart is not needed when you create or modify app_config.xml. It is enough to do this:
- create/update the app_config.xml file;
- disable the option "Leave non-GPU tasks in memory while suspended" (Options/Computing preferences/Disk and memory tab);
- suspend all PrimeGrid tasks (or, easier, suspend CPU usage);
- reload the config files (Options/Read config files);
- resume the tasks.
Note: after doing this, the GUI will still display the previous CPU count for the apps. This is a display bug only; the number of running tasks and their command lines will be adjusted properly.
I guess I found...
- File/Exit
- Create app_config.xml
- DblClick on BOINCManager
... to be easier. ;)
Tuna
| |
|
|
Seems I forgot to include 321 in my previous app_config.xml. So, this should also be included:
<app>
<name>llr321</name>
<fraction_done_exact/>
<max_concurrent>1</max_concurrent>
</app>
<app_version>
<app_name>llr321</app_name>
<cmdline>-t NumProcs</cmdline>
<avg_ncpus>NumProcs</avg_ncpus>
</app_version>
Tuna
| |
|
tng Send message
Joined: 29 Aug 10 Posts: 450 ID: 66603 Credit: 40,687,114,436 RAC: 20,036,224
                                                
|
You will need to:
1) Totally exit BOINCMgr (not just minimize!)
2) Create a {BOINCDataFolder}\projects\www.primegrid.com\app_config.xml, copy/paste this into it, and then ReplaceAll "NumProcs" with whatever value you want it to be. If you decide to give different NumProcs to different apps, then make sure to realize that each app occurs twice in this file, and to make sure each pair is consistent with each other.
3) Restart BOINCMgr
BOINC restart is not needed when you create or modify app_config.xml. It is enough to do this:
- create/update the app_config.xml file;
- disable the option "Leave non-GPU tasks in memory while suspended" (Options/Computing preferences/Disk and memory tab);
- suspend all PrimeGrid tasks (or, easier, suspend CPU usage);
- reload the config files (Options/Read config files);
- resume the tasks.
Note: after doing this, the GUI will still display the previous CPU count for the apps. This is a display bug only; the number of running tasks and their command lines will be adjusted properly.
After resuming the tasks, you may need to force an update on the PrimeGrid project to download more tasks if you are decreasing the number of CPUs for a subproject.
____________
| |
|
|
What about "The Riesel Problem LLR" what Name do we use here?
<app_config>
<app>
<name> ????? </name>
<max_concurrent>4</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name> ????? </app_name>
<cmdline>-t 6</cmdline>
<avg_ncpus>6</avg_ncpus>
<max_ncpus>6</max_ncpus>
</app_version>
</app_config>
Thank you, I may have missed it somewhere.
____________
Crunching@EVGA The Number One Team in the BOINC Community. Folding@EVGA The Number One Team in the Folding@Home Community. | |
|
|
llrTRP
This is what I have in my app_config.xml:
<app>
<name>llrTRP</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name>llrTRP</app_name>
<cmdline>-t 4</cmdline>
<avg_ncpus>4</avg_ncpus>
</app_version>
| |
|
|
llrTRP
This is what I have in my app_config.xml:
<app>
<name>llrTRP</name>
<max_concurrent>1</max_concurrent>
<fraction_done_exact/>
</app>
<app_version>
<app_name>llrTRP</app_name>
<cmdline>-t 4</cmdline>
<avg_ncpus>4</avg_ncpus>
</app_version>
Great, thank you.
____________
Crunching@EVGA The Number One Team in the BOINC Community. Folding@EVGA The Number One Team in the Folding@Home Community. | |
|