Message boards :
Cullen/Woodall prime search :
Woodall - 2 cores vs 4 cores
I get a much better return running WOO units with "-t 2" on my i3 Skylake than with "-t 4" on my i7 Skylake. The i7 finishes about 10% sooner (shorter elapsed time) but uses twice as many cores to do it.
Have others noticed something similar? I am tempted to run two units at once on my i7 6700k with the "-t 2" parm in lieu of "-t 4".
|
Rafael (Volunteer tester)
Joined: 22 Oct 14 Posts: 918 ID: 370496 Credit: 604,672,541 RAC: 565,795
|
> I get a much better return running WOO units with "-t 2" on my i3 Skylake than with "-t 4" on my i7 Skylake. The i7 finishes about 10% sooner (shorter elapsed time) but uses twice as many cores to do it.
> Have others noticed something similar? I am tempted to run two units at once on my i7 6700k with the "-t 2" parm in lieu of "-t 4".
Using multiple cores per WU will pretty much always give worse total throughput than using one core for each WU (when WUs are this big and go over cache regardless). So yeah, this is expected. Whether it's worth it or not is another story.
Personally, I'd take the small performance loss and run "-t 4" in the hope of a much better chance of being the prime finder, in case we actually find one.
|
mackerel (Volunteer tester)
Joined: 2 Oct 08 Posts: 2652 ID: 29980 Credit: 570,442,335 RAC: 12,182
|
Sounds like a case of RAM bandwidth limiting. What are the RAM speed and number of channels on the i3 and i7? An i3 is practically not limited even with cheap RAM, but when you double the cores without much faster RAM, bandwidth can quickly become the limit. I find dual channel, dual rank 3200 RAM on my 6700k at 4.0 GHz is close to being unlimited.
By all means try 2x2t but I wouldn't expect it to be much different than 1x4t. For big units like this, I'd not expect much difference in overall throughput between 4x1t and 1x4t, and I'd still prefer 1x4t just to keep unit times down.
Edit:
Here's a post I did on another forum while testing the 7800X, which also includes data from my main 6700k system. Note the 6700k has two flat lines: no significant difference between 1x4t and 4x1t unless you're running smaller FFT sizes, where there is some reduction with 4t. I'm not sure I understand the Skylake-X results yet; there is something going on with the new cache structure.
|
|
Gents,
Thanks for your input. Here is a very good comparison using actual data (of mine):
i3 6100 - 2 physical cores --> 62,500 sec run time - 123,000 sec CPU time
i7 6700k - 4 physical cores --> 56,750 sec run time - 218,000 sec CPU time
The RAM in both machines is dual channel 2666 MHz but the mobo is limited to 2133 MHz. Since I am processing extra units to force the badge flip rather than waiting for completed units to verify, there is no great urgency on my part to crunch WOO. As a result, I will revert to "-t 2" so that two work units crunch simultaneously, with the expectation that I will emulate my i3's performance.
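For anyone who wants to check the arithmetic behind those two result lines, here is a quick Python sketch. The figures are the ones quoted above; the dictionary keys and function names are made up for illustration, and it assumes both runs processed WOO units of comparable size, which the post does not state explicitly.

```python
# Run time and CPU time in seconds, as quoted in the post above.
# Assumption: both runs processed WOO units of similar size.
runs = {
    "i3-6100": {"cores": 2, "run_s": 62_500, "cpu_s": 123_000},
    "i7-6700K": {"cores": 4, "run_s": 56_750, "cpu_s": 218_000},
}


def elapsed_speedup(slow, fast):
    """How many times sooner `fast` finishes one unit than `slow`."""
    return slow["run_s"] / fast["run_s"]


def busy_cores(run):
    """Average number of cores kept busy (CPU time / run time)."""
    return run["cpu_s"] / run["run_s"]


def core_seconds(run):
    """Wall-clock seconds multiplied by the cores reserved for the unit."""
    return run["cores"] * run["run_s"]


i3, i7 = runs["i3-6100"], runs["i7-6700K"]
speedup = elapsed_speedup(i3, i7)
cost_ratio = core_seconds(i7) / core_seconds(i3)

print(f"i7 elapsed-time speedup over i3: {speedup:.2f}x")
print(f"busy cores: i3 {busy_cores(i3):.2f}, i7 {busy_cores(i7):.2f}")
print(f"core-seconds per unit, i7 vs i3: {cost_ratio:.2f}x")
```

On these numbers the i7 finishes a unit about 10% sooner but spends roughly 1.8 times the core-seconds doing it, which is the poor return described at the top of the thread.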
|
mackerel (Volunteer tester)
Joined: 2 Oct 08 Posts: 2652 ID: 29980 Credit: 570,442,335 RAC: 12,182
|
2133 RAM speed is going to choke the 6700k. Do let us know the results for 2x2t, but I wouldn't expect i3 speeds out of it.
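A rough way to see why 2133 RAM hurts four cores more than two is to divide the theoretical peak bandwidth by the core count. The sketch below assumes dual-channel DDR4 with the standard 64-bit (8-byte) channel width; the function names are hypothetical, real sustained bandwidth is lower than the peak, and nothing in this thread says how much bandwidth LLR actually needs per core.

```python
BYTES_PER_TRANSFER = 8  # one 64-bit DDR4 channel moves 8 bytes per transfer


def peak_bandwidth_gb_s(mt_per_s, channels=2):
    """Theoretical peak bandwidth in GB/s for a transfer rate given in MT/s."""
    return mt_per_s * BYTES_PER_TRANSFER * channels / 1000


def per_core_gb_s(mt_per_s, cores, channels=2):
    """Peak bandwidth shared evenly between the active cores."""
    return peak_bandwidth_gb_s(mt_per_s, channels) / cores


# Configurations mentioned in the thread (core counts are physical cores).
configs = [
    ("i3-6100, 2 cores, DDR4-2133", 2133, 2),
    ("i7-6700K, 4 cores, DDR4-2133", 2133, 4),
    ("i7-6700K, 4 cores, DDR4-3200", 3200, 4),
]

for label, speed, cores in configs:
    total = peak_bandwidth_gb_s(speed)
    share = per_core_gb_s(speed, cores)
    print(f"{label}: {total:.1f} GB/s total, {share:.1f} GB/s per core")
```

At the same RAM speed each i7 core gets half the share an i3 core gets, and faster RAM only partly closes that gap.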
|
Rafael (Volunteer tester)
Joined: 22 Oct 14 Posts: 918 ID: 370496 Credit: 604,672,541 RAC: 565,795
|
> The RAM in both machines is dual channel 2666 MHz but the mobo is limited to 2133 MHz. Since I am processing extra units to force the badge flip rather than waiting for completed units to verify, there is no great urgency on my part to crunch WOO. As a result, I will revert to "-t 2" so that two work units crunch simultaneously, with the expectation that I will emulate my i3's performance.
That's the reason: RAM is slowing the i7 down. This is a "problem" with LLR: it's so efficient that RAM can't keep up with the cores' processing power.
I can only imagine how much worse the 6c Cannon Lake with AVX512 (once implemented) and dual channel will be...
|
|
> 2133 RAM speed is going to choke the 6700k. Do let us know the results for 2x2t, but I wouldn't expect i3 speeds out of it.
It must be the RAM, since HWMonitor reports that the CPUs are not running at full capacity (with "-t 4") and, in fact, the multiplier drops significantly for a second or two before going back to 42x.
By the way, I changed the app_config file to the following, shut down BM and restarted it, but the single unit is still running with all 4 CPUs:
<app>
  <name>llrWOO</name>
  <max_concurrent>1</max_concurrent>
  <fraction_done_exact>1</fraction_done_exact>
</app>
<app_version>
  <app_name>llrWOO</app_name>
  <cmdline>-t 2</cmdline>
  <avg_ncpus>4</avg_ncpus>
  <max_ncpus>4</max_ncpus>
</app_version>
Have I screwed up somewhere, or must the current unit complete before the "-t 2" takes effect?
|
mackerel (Volunteer tester)
Joined: 2 Oct 08 Posts: 2652 ID: 29980 Credit: 570,442,335 RAC: 12,182
|
I'd change the other two 4's to 2's also. A restart should do it; I've done so in the past... Is it BOINC indicating 4 cores in use, or do you actually see 4 in use in Task Manager? The former would be down to that config; the latter isn't expected. Worst case, do a reboot to ensure things are reset.
|
Rafael (Volunteer tester)
Joined: 22 Oct 14 Posts: 918 ID: 370496 Credit: 604,672,541 RAC: 565,795
|
> By the way, I changed the app_config file to the following, shut down BM and restarted it, but the single unit is still running with all 4 CPUs:
> <app>
>   <name>llrWOO</name>
>   <max_concurrent>1</max_concurrent>
>   <fraction_done_exact>1</fraction_done_exact>
> </app>
> <app_version>
>   <app_name>llrWOO</app_name>
>   <cmdline>-t 2</cmdline>
>   <avg_ncpus>4</avg_ncpus>
>   <max_ncpus>4</max_ncpus>
> </app_version>
> Have I screwed up somewhere, or must the current unit complete before the "-t 2" takes effect?
If by "still running with all 4 CPUs" you mean "BOINC says it's running on 4", then that's because you've set <avg_ncpus> to 4. BOINC doesn't actually measure usage; it just reports whatever that parameter says.
|
|
@mackerel - I also changed the other two parms to "2" but BM still shows only one work unit running, now with two cores.
@Rafael - I have modified the app_config with "2" in the three critical lines, but only one unit is running, with 2 cores showing in BM as being in use.
Edit: With "2" specified in all three critical lines in the app_config file, the CPU clocks are now fixed at 3998 MHz (i.e. no reduction in the multipliers) but the CPU usage is pathetically low (this means I should be able to run my GTX 1080 soon with no impact).
|
mackerel (Volunteer tester)
Joined: 2 Oct 08 Posts: 2652 ID: 29980 Credit: 570,442,335 RAC: 12,182
|
You might have to play with the CPU usage in the BOINC settings to get it to run the desired number of simultaneous units. This isn't great...
|
|
> You might have to play with the CPU usage in the BOINC settings to get it to run the desired number of simultaneous units. This isn't great...
As with all LLRs, I have it at 50%... I aborted the extra WOO work units and BOINC then downloaded CUL (my next target project) instead; a second unit has now kicked in and is processing.
Life is now back to normal.
|
|
Change <max_ncpus> back to 4 and it should run 2 Woodall WUs (or any other sub project) on 2 threads each. That sets the max number of TOTAL cores to use. So 2 WUs with 2t each = 4 cores needed to run the same sub project simultaneously.
When you got the Cullen WU, it had its own settings (if Woodall and Cullen both have <max_ncpus> set to 2, then that's a TOTAL of 4), which is probably why it's working now (running 2 WUs at 2t each). But to run 2 of the same sub project you need to do the above.
____________
Largest Primes to Date:
As Double Checker: SR5 109208*5^1816285+1 Dgts-1,269,534
As Initial Finder: SR5 243944*5^1258576-1 Dgts-879,713
|
|
Thanks, Neo.
|
|
> <max_concurrent>1</max_concurrent>
This is why WOO is only running 1 at a time.
|
|
> <max_concurrent>1</max_concurrent>
> This is why WOO is only running 1 at a time.
You don't even need this, as my app_config is:
<app>
  <name>llrWOO</name>
  <fraction_done_exact/>
</app>
<app_version>
  <app_name>llrWOO</app_name>
  <cmdline>-t 2</cmdline>
  <avg_ncpus>2</avg_ncpus>
  <max_ncpus>4</max_ncpus>
</app_version>
The <max_concurrent>1</max_concurrent> line could also limit the number of WUs of each sub project that you can run simultaneously.
|
|
> The <max_concurrent>1</max_concurrent> line could also limit the number of WUs of each sub project that you can run simultaneously.
Exactly my point.
In Message 109570 (7th post in this thread) Anthony posted his app_config for WOO and said it was only running one unit. The rest of the thread was trying to fix that, ending with him switching to CUL and thinking everything was OK. However, the next time he runs WOO it's still only going to run one unit at a time.
He needs to change WOO's max_concurrent value to 2 in his app_config.
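The behaviour described in this thread can be mimicked with a deliberately simplified model: the client fits as many tasks as the usable CPUs allow at avg_ncpus each, and max_concurrent then caps that count. This is only a sketch, not BOINC's actual scheduler. The function name concurrent_tasks is made up, the figure of 4 usable CPUs is an assumption based on the 50% CPU setting mentioned earlier, and max_ncpus is left out on purpose because the thread does not settle what it does.

```python
def concurrent_tasks(usable_cpus, avg_ncpus, max_concurrent=None):
    """Simplified model of how many tasks of one app run at the same time.

    usable_cpus:    CPUs the client is allowed to use (assumed to be 4 here)
    avg_ncpus:      CPUs budgeted per task in app_config
    max_concurrent: optional cap on simultaneous tasks of this app
    """
    fit_by_cpu = int(usable_cpus // avg_ncpus)
    if max_concurrent is None:
        return fit_by_cpu
    return min(fit_by_cpu, max_concurrent)


USABLE_CPUS = 4  # assumption, see the note above

# Config as first posted: avg_ncpus 4 with max_concurrent 1.
as_posted = concurrent_tasks(USABLE_CPUS, avg_ncpus=4, max_concurrent=1)

# After changing the 4's to 2's but keeping max_concurrent 1.
after_first_fix = concurrent_tasks(USABLE_CPUS, avg_ncpus=2, max_concurrent=1)

# With the max_concurrent line removed and avg_ncpus 2.
cap_removed = concurrent_tasks(USABLE_CPUS, avg_ncpus=2)

print(as_posted, after_first_fix, cap_removed)
```

In this model the middle case still yields a single task, which matches the report of one unit running on two cores, and only removing (or raising) max_concurrent gets to two units.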
|
|
> > The <max_concurrent>1</max_concurrent> line could also limit the number of WUs of each sub project that you can run simultaneously.
> Exactly my point.
> In Message 109570 (7th post in this thread) Anthony posted his app_config for WOO and said it was only running one unit. The rest of the thread was trying to fix that, ending with him switching to CUL and thinking everything was OK. However, the next time he runs WOO it's still only going to run one unit at a time.
> He needs to change WOO's max_concurrent value to 2 in his app_config.
Or set it to 4 if he wants to run 4 single-threaded WUs. Or better yet, just remove the <max_concurrent>1</max_concurrent> line entirely for all CPU sub projects in app_config.
|
composite (Volunteer tester)
Joined: 16 Feb 10 Posts: 1172 ID: 55391 Credit: 1,207,370,874 RAC: 1,073,754
|
> I'd change the other two 4's to 2's also. A restart should do it; I've done so in the past... Is it BOINC indicating 4 cores in use, or do you actually see 4 in use in Task Manager? The former would be down to that config; the latter isn't expected. Worst case, do a reboot to ensure things are reset.
On Linux you can restart the BOINC client instead: "sudo service boinc-client restart" causes work units in progress to start using the project's new app_config.xml settings. That works if your filesystem contains the file /etc/init.d/boinc-client.
You can even have the BOINC Manager open while you restart the service, and if you do, you'll see the work units in progress change to using the new number of CPUs.
|
|
Gents,
MANY thanks for your feedback. I have removed all references to <max_concurrent>1</max_concurrent> and changed the parms so as to use two CPUs per work unit. The new config file was reread within BM and it is now working like a charm!
Since I get the same behaviour whether I am running "-t 2" on my i3 Skylake or "-t 4" on my i7 Skylake, I purposely switched the i7 to two units running simultaneously with two cores per unit, thus replicating the scenario on the i3 Skylake. Since 321 runs fairly fast, it will be interesting to see how things pan out when the units complete.
I will post back to the group when results become available, since this is something that may (should) further optimize performance.
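The final file was not posted, so the following is only a guess at its shape: a small Python sketch that builds an app_config.xml with two threads per unit and no max_concurrent line. The element names are the ones quoted earlier in the thread plus the usual app_config root element; the helper name build_app_config is hypothetical, and max_ncpus is left out because the thread gives conflicting advice about it.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, threads):
    """Return app_config.xml text for one app, with no max_concurrent cap."""
    root = ET.Element("app_config")

    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "fraction_done_exact")  # empty flag element

    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "cmdline").text = f"-t {threads}"
    ET.SubElement(version, "avg_ncpus").text = str(threads)

    ET.indent(root)  # pretty-printing; requires Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


# Two threads per Woodall unit, as described in the post above.
xml_text = build_app_config("llrWOO", threads=2)
print(xml_text)
```

The printed text would go into app_config.xml in the project's directory, after which the client has to reread its config files or be restarted, as discussed above.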
|