Message boards : Proth Prime Search : Multithreading PPSE
Has anyone got experience with multithreading PPSE? I tried it on an i7-7700HQ and was really disappointed:
2 threads needed ~7% less time on average, 3 threads ~22% less. I didn't bother going beyond 3 threads.
As soon as my Ryzen has finished its SoB tasks I will try it there too; maybe it's Intel-related.
JimB (Honorary cruncher)
Joined: 4 Aug 11 Posts: 918 ID: 107307 Credit: 977,945,376 RAC: 0
PPSE candidates are too small for multithreading to be worth it. I've set up multithreading here for SOB, PSP, ESP, TRP, 321, CUL, WOO and SR5. PPS, PPSE, SGS and even PPS MEGA aren't worth it as far as I'm concerned.
Thanks, Jim, for that info.
mackerel (Volunteer tester)
Joined: 2 Oct 08 Posts: 2619 ID: 29980 Credit: 566,949,824 RAC: 14,002
I just did some testing with an i7-8700 at stock clocks. The turbo clock varies with the number of cores running, so I tried to compensate for that in the results below, but they may not be exact. I basically ran the same PPSE test (120k FFT) at 12, 11, 6, 5, 4, 3, 2 and 1 threads, repeated each 3 times, and used the lowest of each to work around potential system interruptions.
6x 1t: 100% time taken, 100% throughput
3x 2t: 68% time taken, 76% throughput
2x 3t: 51% time taken, 72% throughput
1x 4t: 46% time taken, 42% throughput
1x 5t: 43% time taken, 44% throughput
1x 6t: 43% time taken, 45% throughput
1x 11t: 44% time taken, 43% throughput
1x 12t: 46% time taken, 42% throughput
The test task was the prime 6385*2^1509894+1, which I pulled off the recently found list. It took 516 s, unnormalised for clock, running on a single core.
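For anyone who wants to check the arithmetic, this is roughly how the raw timings turn into the percentages above (a sketch only; my real figures were additionally adjusted for the turbo clock differences, and the example inputs below are made up, not taken from the runs above):

# Normalise a multithreaded run against the one-task-per-core baseline (6x 1t at 516 s each).
def relative_stats(seconds_per_task, concurrent_tasks,
                   baseline_seconds=516.0, baseline_tasks=6):
    time_taken = 100.0 * seconds_per_task / baseline_seconds
    throughput = 100.0 * (concurrent_tasks / seconds_per_task) / (baseline_tasks / baseline_seconds)
    return time_taken, throughput

# Hypothetical example: two 3-thread tasks finishing in 263 s each.
print(relative_stats(263.0, 2))  # -> roughly (51.0, 65.4)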
Michael Goetz (Volunteer moderator, Project administrator)
Joined: 21 Jan 10 Posts: 13881 ID: 53948 Credit: 383,609,451 RAC: 119,268
I just did some testing with an i7-8700 at stock clocks. [...]
Based on that, I'd go with 6x 1t (max throughput) or 2x 3t (fastest tasks). I wouldn't do 3x 2t because 2x 3t gives you a significant improvement in speed with only a small decrease in throughput. More than 3 threads exacts a significant penalty in throughput without a corresponding increase in speed.
____________
My lucky number is 75898^524288+1
mackerel (Volunteer tester)
Joined: 2 Oct 08 Posts: 2619 ID: 29980 Credit: 566,949,824 RAC: 14,002
I hadn't mentioned one variable outside user control: when the server sends the units out. Are there stats on typically how much time might pass between sending out 1st and 2nd units? For short units such as PPSE, the server side of the equation might dominate and it matters little how fast your system is. Only for longer tasks does multi-thread usage yield a clearer advantage.
I suppose I should repeat this with PPS and MEGA tasks at the smaller end.
Michael Goetz (Volunteer moderator, Project administrator)
Joined: 21 Jan 10 Posts: 13881 ID: 53948 Credit: 383,609,451 RAC: 119,268
I hadn't mentioned one variable outside user control: when the server sends the units out. Are there stats on typically how much time might pass between sending out 1st and 2nd units?
No, and even if I had such information, it would be obsolete.
First of all you can exclude the scenario where a task errors or times out or is aborted. The replacement task will obviously be sent out much later than the other task, and that's not under the server's control. It's going to vary according to what users are doing.
But if you confine your question to the scenario where both of the original tasks are sent to hosts that return them successfully, the question becomes complicated. I'm not going to go into the details, but the scheduler is very complex. Right now, I don't know the answer to your question. If I can figure out what the answer is, I'll let you know. Don't hold your breath.
____________
My lucky number is 75898^524288+1
Michael Goetz (Volunteer moderator, Project administrator)
Joined: 21 Jan 10 Posts: 13881 ID: 53948 Credit: 383,609,451 RAC: 119,268
Are there stats on typically how much time might pass between sending out 1st and 2nd units?
[...] If I can figure out what the answer is, I'll let you know. Don't hold your breath.
You can exhale...
+-------+------+-------+-----------+
| cnt | min | max | avg |
+-------+------+-------+-----------+
| 36888 | 0 | 39238 | 1193.4199 |
+-------+------+-------+-----------+
There are 36,888 PPSE workunits where exactly 2 tasks have been sent out. Those are the minimum, maximum, and average seconds between the first and second tasks being sent.
Here's the count of workunits where the difference was a specific number of minutes:
+---------+------+
| minutes | cnt |
+---------+------+
| 0 | 1934 |
| 1 | 1838 |
| 2 | 1647 |
| 3 | 1596 |
| 4 | 1532 |
| 5 | 1309 |
| 6 | 1265 |
| 7 | 1320 |
| 8 | 1136 |
| 9 | 1154 |
| 10 | 1066 |
| 11 | 1014 |
| 12 | 973 |
| 13 | 930 |
| 14 | 908 |
| 15 | 851 |
| 16 | 828 |
| 17 | 793 |
| 18 | 780 |
| 19 | 674 |
| 20 | 639 |
| 21 | 612 |
| 22 | 587 |
| 23 | 565 |
| 24 | 523 |
| 25 | 512 |
| 26 | 505 |
| 27 | 434 |
| 28 | 399 |
| 29 | 410 |
| 30 | 390 |
| 31 | 393 |
| 32 | 384 |
| 33 | 334 |
| 34 | 344 |
| 35 | 314 |
| 36 | 288 |
| 37 | 285 |
| 38 | 261 |
| 39 | 236 |
| 40 | 243 |
| 41 | 238 |
| 42 | 223 |
| 43 | 216 |
| 44 | 199 |
| 45 | 194 |
| 46 | 177 |
| 47 | 169 |
| 48 | 161 |
| 49 | 162 |
| 50 | 142 |
| 51 | 151 |
| 52 | 121 |
| 53 | 118 |
| 54 | 129 |
| 55 | 118 |
| 56 | 100 |
| 57 | 107 |
| 58 | 101 |
| 59 | 91 |
| 60 | 87 |
| 61 | 96 |
| 62 | 75 |
| 63 | 70 |
| 64 | 70 |
| 65 | 69 |
| 66 | 61 |
| 67 | 59 |
| 68 | 78 |
| 69 | 74 |
| 70 | 64 |
| 71 | 40 |
| 72 | 54 |
| 73 | 52 |
| 74 | 33 |
| 75 | 42 |
| 76 | 45 |
| 77 | 40 |
| 78 | 34 |
| 79 | 36 |
| 80 | 31 |
| 81 | 35 |
| 82 | 39 |
| 83 | 31 |
| 84 | 37 |
| 85 | 25 |
| 86 | 32 |
| 87 | 16 |
| 88 | 26 |
| 89 | 22 |
| 90 | 23 |
| 91 | 15 |
| 92 | 14 |
| 93 | 12 |
| 94 | 10 |
| 95 | 8 |
| 96 | 11 |
| 97 | 13 |
| 98 | 15 |
| 99 | 12 |
| 100 | 14 |
| 101 | 17 |
| 102 | 8 |
| 103 | 7 |
| 104 | 10 |
| 105 | 6 |
| 106 | 8 |
| 107 | 10 |
| 108 | 8 |
| 109 | 9 |
| 110 | 5 |
| 111 | 6 |
| 112 | 5 |
| 113 | 9 |
| 114 | 2 |
| 115 | 5 |
| 116 | 4 |
| 117 | 3 |
| 118 | 5 |
| 119 | 7 |
| 120 | 1 |
| 121 | 5 |
| 122 | 5 |
| 123 | 7 |
| 124 | 1 |
| 125 | 3 |
| 126 | 4 |
| 127 | 1 |
| 128 | 7 |
| 129 | 4 |
| 130 | 4 |
| 131 | 1 |
| 132 | 1 |
| 133 | 4 |
| 134 | 5 |
| 135 | 1 |
| 136 | 2 |
| 137 | 5 |
| 138 | 2 |
| 139 | 1 |
| 140 | 3 |
| 143 | 1 |
| 144 | 3 |
| 145 | 1 |
| 146 | 2 |
| 147 | 2 |
| 149 | 1 |
| 150 | 1 |
| 152 | 2 |
| 154 | 4 |
| 161 | 1 |
| 162 | 1 |
| 163 | 2 |
| 165 | 1 |
| 169 | 2 |
| 171 | 1 |
| 183 | 2 |
| 186 | 1 |
| 187 | 1 |
| 188 | 1 |
| 193 | 1 |
| 278 | 1 |
| 653 | 1 |
+---------+------+
This is the number of times the difference was N seconds, from 0 to 60 seconds:
+---------+-----+
| seconds | cnt |
+---------+-----+
| 0 | 19 |
| 1 | 26 |
| 2 | 37 |
| 3 | 38 |
| 4 | 36 |
| 5 | 31 |
| 6 | 32 |
| 7 | 27 |
| 8 | 27 |
| 9 | 26 |
| 10 | 38 |
| 11 | 28 |
| 12 | 29 |
| 13 | 41 |
| 14 | 34 |
| 15 | 25 |
| 16 | 39 |
| 17 | 38 |
| 18 | 36 |
| 19 | 30 |
| 20 | 28 |
| 21 | 28 |
| 22 | 34 |
| 23 | 35 |
| 24 | 29 |
| 25 | 24 |
| 26 | 26 |
| 27 | 22 |
| 28 | 31 |
| 29 | 46 |
| 30 | 31 |
| 31 | 34 |
| 32 | 32 |
| 33 | 32 |
| 34 | 33 |
| 35 | 41 |
| 36 | 31 |
| 37 | 30 |
| 38 | 27 |
| 39 | 35 |
| 40 | 33 |
| 41 | 29 |
| 42 | 33 |
| 43 | 33 |
| 44 | 38 |
| 45 | 45 |
| 46 | 34 |
| 47 | 32 |
| 48 | 34 |
| 49 | 38 |
| 50 | 33 |
| 51 | 30 |
| 52 | 29 |
| 53 | 42 |
| 54 | 25 |
| 55 | 29 |
| 56 | 32 |
| 57 | 24 |
| 58 | 36 |
| 59 | 39 |
| 60 | 29 |
+---------+-----+
For future reference, this is the SQL for those three queries:
select count(*) cnt,min(abs(r0.sent_time-r1.sent_time)) min, max(abs(r0.sent_time-r1.sent_time)) max, avg(abs(r0.sent_time-r1.sent_time)) avg from result r0 join result r1 on r0.workunitid=r1.workunitid and right(r0.name,2)='_0' and right(r1.name,2)='_1' left join result r2 on r0.workunitid=r2.workunitid and right(r2.name,2)='_2' where r0.appid=18 and r0.server_state>=4 and r1.server_state>=4 and r2.id is null;
select floor(abs(r0.sent_time-r1.sent_time)/60) minutes, count(*) cnt from result r0 join result r1 on r0.workunitid=r1.workunitid and right(r0.name,2)='_0' and right(r1.name,2)='_1' left join result r2 on r0.workunitid=r2.workunitid and right(r2.name,2)='_2' where r0.appid=18 and r0.server_state>=4 and r1.server_state>=4 and r2.id is null group by minutes;
select abs(r0.sent_time-r1.sent_time) seconds, count(*) cnt from result r0 join result r1 on r0.workunitid=r1.workunitid and right(r0.name,2)='_0' and right(r1.name,2)='_1' left join result r2 on r0.workunitid=r2.workunitid and right(r2.name,2)='_2' where r0.appid=18 and r0.server_state>=4 and r1.server_state>=4 and r2.id is null group by seconds limit 61;
____________
My lucky number is 75898^524288+1
mackerel (Volunteer tester)
Joined: 2 Oct 08 Posts: 2619 ID: 29980 Credit: 566,949,824 RAC: 14,002
Thanks for the numbers... I'm not sure how to handle them; maybe someone better at math can make something more out of it than what I'll try below.
The average gap of 1193 s is massive compared to the 516 s it took to run a unit one-per-core on my test system. In that average scenario, if you get the 1st unit you're going to be 1st; if you get it 2nd, you're going to be 2nd, assuming both hosts run with zero cache.
There's going to be some smaller subset of units where a speedup gives you an advantage. It'll probably involve percentiles to come up with a number, but it feels like a relatively small proportion of the time.
Let's say you go from (with rounding) 480s to 240s by using multi-threads. 8 minutes to 4 minutes. Also assume the other person is also running a system that does 1t units in the same time, 8 minutes.
In the case both run 1t, you can be first as long as you receive the unit first. If the server is random, that's 50%.
You set up -t3 and it does it in half the time. Now you can be first even if you receive the unit up to 4 minutes after the other host. There's a 19% chance the two tasks are sent within 4 minutes of each other. We only care about the case where you get it 2nd, so that's half that, or just under 10%.
So the tradeoff now is: by using -t3 you get to be 1st in 10% of all units that you wouldn't have before. If my brain is functioning, it is no longer 50/50, but 60/40 in your favour. However, also from my numbers, you only get 72% of the throughput. 60% of 72% is lower than 50% of 100%, so on this basis the higher throughput of running one task per core would seem better than doing fewer units with a higher chance of being first.
Of course, this doesn't take into consideration a load of other factors. Not everyone has the same CPU. Not everyone will run multi-thread under the same settings. Not everyone runs zero cache. There will be resends for whatever reason. There are probably other factors I haven't thought of. We can get the average unit time from the project selection page, and maybe figure that in too.
As a generalisation, I think if you have a fast CPU (recent Intel at 4 GHz+), you have more to lose from using threading than not. If you have a slower system (anything AMD currently sells, or Intel CPUs without AVX), the time improvement starts to look more useful.
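Spelled out as a quick back-of-envelope (a sketch only, treating the gap histogram above as the whole story, assuming zero cache on both hosts, and hard-coding just the 0-3 minute buckets and the 36,888 total from Michael's table):

total_workunits = 36888
gap_counts_0_to_3_minutes = [1934, 1838, 1647, 1596]

# Chance the two tasks go out within my assumed 4-minute speed advantage:
p_gap_under_4min = sum(gap_counts_0_to_3_minutes) / total_workunits  # ~0.19

# I only gain when I'm the host sent second, i.e. half of those cases.
p_first_with_t3 = 0.5 + p_gap_under_4min / 2   # ~0.60
throughput_t3 = 0.72                           # 2x 3t throughput from my i7-8700 test

print(0.5 * 1.00)                       # one task per core: expected firsts per unit of work
print(p_first_with_t3 * throughput_t3)  # -t3: ~0.43, so fewer firsts overall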
Michael Goetz (Volunteer moderator, Project administrator)
Joined: 21 Jan 10 Posts: 13881 ID: 53948 Credit: 383,609,451 RAC: 119,268
Thanks for the numbers... I'm not sure how to handle them; maybe someone better at math can make something more out of it than what I'll try below.
My impression, first and foremost, is that the gap is significantly larger than I would have guessed. The average between the first and second tasks is almost 20 minutes.
The shorter delays are more likely than the longer delays, but the numbers tail off fairly slowly.
What this tells me is that it may not be worth it to worry about the 2-minute pre-fetch. If the average delay is 20 minutes, that pre-fetch time is dwarfed. As you said, if you get the task first, you'll finish first. If you get it second, you'll finish second. As long as your computer is relatively fast and you have 0 cache, the other factors won't affect who finishes first.
One thing I didn't include in the earlier numbers: the vast majority of workunits had only 2 tasks. Only 2293 workunits had a third task, while 26210 workunits validated with only the first two tasks. (The difference between the 38K workunits in the earlier numbers and the 26K workunits here is the workunits with only 2 tasks where those tasks aren't yet completed.)
____________
My lucky number is 75898^524288+1
dthonon (Volunteer tester)
Joined: 6 Dec 17 Posts: 434 ID: 957147 Credit: 1,728,691,174 RAC: 13,910
Another piece of information: I went back to my stats during TdP last year. During the second half of February, I was running lots of PPSE, PPS, MEGA in parallel (around 600 of each). All were single-threaded on 2.3 GHz Intel virtual cores.
PPSE:
- Finder : 30
- DC : 20
PPS:
- Finder : 1
- DC : 1
MEGA:
- Finder : 1
- DC : 2
For PPSE, the number of primes is significant and shows a 60% chance of being first, even with fairly slow hardware. And then it goes down with larger tasks.
My advice would be to go single-threaded for PPSE to get the highest throughput possible, and then 2 to 4 threads for PPS and MEGA.
pschoefer (Volunteer developer, Volunteer tester)
Joined: 20 Sep 05 Posts: 677 ID: 845 Credit: 2,859,613,872 RAC: 217,213
I would guess that the gap depends heavily on how many active users there are, because the second task has to be sent to a different user. When a large contributor fetches lots of tasks, it may take a while until sufficiently many other users have requested tasks. This effect probably also worked to dthonon's advantage last year.
You can exhale...
No. I've been assuming that the send time between tasks was minimal; now I have questions.
Due to the vast number of variables, the next question should be: what is the ratio of 1st tasks to 1st returns? We all know it's not 1:1; I have _0 tasks returned 2nd and many _1s (and even a _2) returned 1st.
I suspect that getting _0 tasks is less of an advantage overall than people think.
For future reference, this is the SQL for those three queries:
Now I know where to send my "How do I do this" questions :)
Michael Goetz (Volunteer moderator, Project administrator)
Joined: 21 Jan 10 Posts: 13881 ID: 53948 Credit: 383,609,451 RAC: 119,268
the next question should be what is the ratio of 1st tasks to 1st returns. We all know it's not 1:1, I have _0 tasks returned 2nd and many _1s (and even a _2) returned 1st.
I suspect that getting _0 tasks is less of an advantage overall than people think.
That's actually a good question, but it includes a completely false (although reasonable) assumption.
_0 is NOT the first task sent out. If you want to know which one is sent out first, you need to look at the actual timestamps. My guess is _0 is sent out before _1 only 50% of the time.
For future reference, this is the SQL for those three queries:
Now I know where to send my "How do I do this" questions :)
I view SQL as being more magic and art than science. Whereas most of the stuff I do is drudgery, the SQL is akin to poetry for me. And I'm always learning new features or techniques.
Ask your questions of multiple people, and you'll likely get different answers and learn more. For example, I tend to prefer using joins to solve most problems, while others may prefer to either use sub-queries or break down the SQL into smaller pieces and add some PHP to tie it all together. I may have an overly optimistic opinion of the optimizer's abilities.
Notice the "... left join ..." followed by "... where foo.id is null..."? That's how you get all the workunits where there is no third task.
Now, addressing both parts, let's modify the most recent query to also tell us how often _0 goes out before _1. I'll also throw in a clause to skip double check tasks with a residue.
This is the original SQL:
select
r0.appid,
a.name,
count(*) cnt,
min(abs(r0.sent_time-r1.sent_time)) min,
max(abs(r0.sent_time-r1.sent_time)) max,
avg(abs(r0.sent_time-r1.sent_time)) avg
from
result r0
join app a
on r0.appid=a.id
join result r1
on r0.workunitid=r1.workunitid and right(r0.name,2)='_0' and right(r1.name,2)='_1'
left join result r2
on r0.workunitid=r2.workunitid and right(r2.name,2)='_2'
where
r0.server_state>=4
and r1.server_state>=4
and r2.id is null
group by r0.appid
with rollup;
Here's the SQL to also tally whether _0 was sent first or received first. Spoiler alert: they're both close to 50%.
select
r0.appid,
a.name,
count(*) cnt,
min(abs(r0.sent_time-r1.sent_time)) min,
max(abs(r0.sent_time-r1.sent_time)) max,
avg(abs(r0.sent_time-r1.sent_time)) avg,
concat(format(avg(if(r0.sent_time<=r1.sent_time,1,0))*100,2),'%') '_0 sent first',
concat(format(avg(if(r0.received_time<=r1.received_time,1,0))*100,2),'%') '_0 rcvd first'
from
result r0
join app a
on r0.appid=a.id
join result r1
on r0.workunitid=r1.workunitid and right(r0.name,2)='_0' and right(r1.name,2)='_1'
left join result r2
on r0.workunitid=r2.workunitid and right(r2.name,2)='_2'
where
r0.server_state>=4
and r0.userid>0
and r0.hostid>0
and r1.server_state>=4
and r2.id is null
and r0.validate_state=1
and r1.validate_state=1
group by r0.appid
with rollup;
+-------+---------------+--------+------+--------+-------------+---------------+---------------+
| appid | name | cnt | min | max | avg | _0 sent first | _0 rcvd first |
+-------+---------------+--------+------+--------+-------------+---------------+---------------+
| 2 | llrTPS | 52896 | 0 | 7056 | 659.4473 | 48.80% | 50.33% |
| 3 | llrWOO | 117 | 82 | 432951 | 70342.7350 | 52.14% | 44.44% |
| 4 | llrCUL | 135 | 233 | 493923 | 124931.8889 | 51.85% | 48.89% |
| 7 | llr321 | 379 | 76 | 638994 | 60070.1266 | 49.08% | 48.02% |
| 8 | llrPSP | 325 | 125 | 731013 | 100984.3046 | 50.46% | 50.46% |
| 9 | pps_sr2sieve | 157309 | 0 | 92840 | 887.9967 | 50.80% | 49.69% |
| 10 | llrPPS | 6011 | 0 | 197056 | 1751.4179 | 50.57% | 50.34% |
| 11 | ap26 | 15516 | 0 | 50242 | 3340.1893 | 51.04% | 49.74% |
| 13 | llrSOB | 3805 | 0 | 590197 | 34121.9338 | 51.83% | 49.72% |
| 15 | llrTRP | 377 | 310 | 388280 | 50212.0769 | 51.19% | 48.01% |
| 16 | genefer | 2140 | 0 | 292120 | 13498.0346 | 52.10% | 48.93% |
| 17 | genefer_wr | 38 | 3223 | 190014 | 60128.8158 | 57.89% | 60.53% |
| 18 | llrPPSE | 27975 | 0 | 39238 | 1171.3507 | 50.15% | 50.47% |
| 19 | llrSR5 | 1905 | 3 | 300668 | 13483.4383 | 50.39% | 48.82% |
| 20 | llrESP | 93 | 421 | 452500 | 121632.3871 | 54.84% | 46.24% |
| 21 | llrMEGA | 10672 | 0 | 23514 | 1872.4408 | 50.07% | 49.92% |
| 22 | genefer15 | 53063 | 0 | 176509 | 1054.2948 | 50.15% | 50.11% |
| 23 | genefer16 | 23500 | 0 | 14682 | 1011.1820 | 51.38% | 49.99% |
| 24 | genefer17low | 4557 | 2 | 45226 | 4115.3553 | 50.05% | 49.99% |
| 25 | genefer17mega | 15518 | 0 | 14739 | 1611.5070 | 50.93% | 50.65% |
| 26 | genefer18 | 2374 | 1 | 82311 | 7532.4701 | 50.88% | 50.88% |
| 27 | genefer19 | 2544 | 5 | 111139 | 13245.6726 | 50.35% | 50.12% |
| 28 | genefer20 | 385 | 73 | 276357 | 42184.2156 | 51.95% | 50.65% |
| 29 | gcw_sieve | 74579 | 0 | 77368 | 455.8260 | 48.78% | 50.29% |
| 30 | llrGCW | 101 | 417 | 387209 | 43737.8020 | 63.37% | 48.51% |
| NULL | llrGCW | 456314 | 0 | 731013 | 1790.4174 | 50.15% | 50.03% |
+-------+---------------+--------+------+--------+-------------+---------------+---------------+
I'm also using only validated tasks here, since I'm looking at when they come back.
Overall, _0 goes out before _1 50.15% of the time, and comes back first 50.03% of the time.
tl;dr: 50/50
____________
My lucky number is 75898^524288+1
Michael, is it me or are you missing SGS in this table?
dthonon (Volunteer tester)
Joined: 6 Dec 17 Posts: 434 ID: 957147 Credit: 1,728,691,174 RAC: 13,910
I would guess that the gap depends heavily on how many active users there are, because the second task has to be sent to a different user. When a large contributor fetches lots of tasks, it may take a while until sufficiently many other users have requested tasks. This effect probably also worked to dthonon's advantage last year.
I am not sure about that. The scheduler throttles the large contributor to about 1/3 of the total throughput, if I remember correctly. So, even with an infinite number of cores, you would not be able to starve the other users.
TimT
Joined: 2 Dec 11 Posts: 496 ID: 121414 Credit: 2,400,007,313 RAC: 1,511,085
Michael, is it me or are you missing SGS in this table?
Sophie Germain (LLR) -- "llrTPS"
Michael, is it me or are you missing SGS in this table?
Sophie Germain (LLR) -- "llrTPS"
Maybe someday I will know this... maybe...
Thanks
Michael Goetz (Volunteer moderator, Project administrator)
Joined: 21 Jan 10 Posts: 13881 ID: 53948 Credit: 383,609,451 RAC: 119,268
I added a new column, answering the question we really want to know: How much does getting sent the first task affect your chance of "winning"?
The extra clause in the sql for this column is:
concat(format(avg(if((r0.sent_time<=r1.sent_time and r0.received_time<=r1.received_time) or (r0.sent_time>r1.sent_time and r0.received_time>r1.received_time),1,0))*100,2),'%') 'First sent wins'
+-------+---------------+--------+------+--------+-------------+---------------+---------------+-----------------+
| appid | name | cnt | min | max | avg | _0 sent first | _0 rcvd first | First sent wins |
+-------+---------------+--------+------+--------+-------------+---------------+---------------+-----------------+
| 2 | llrTPS | 52856 | 0 | 7056 | 658.9424 | 48.84% | 50.31% | 60.50% |
| 3 | llrWOO | 117 | 82 | 432951 | 70342.7350 | 52.14% | 44.44% | 58.12% |
| 4 | llrCUL | 136 | 233 | 493923 | 125035.3750 | 52.21% | 49.26% | 61.76% |
| 7 | llr321 | 384 | 76 | 638994 | 59925.2969 | 49.22% | 47.40% | 60.16% |
| 8 | llrPSP | 325 | 125 | 731013 | 100984.3046 | 50.46% | 50.46% | 60.00% |
| 9 | pps_sr2sieve | 157522 | 0 | 92840 | 888.7028 | 50.79% | 49.71% | 53.04% |
| 10 | llrPPS | 5970 | 0 | 197056 | 1764.0735 | 50.59% | 50.40% | 61.49% |
| 11 | ap26 | 15560 | 0 | 50242 | 3337.2667 | 50.98% | 49.69% | 68.71% |
| 13 | llrSOB | 3809 | 0 | 590197 | 34138.0827 | 51.77% | 49.78% | 47.39% |
| 15 | llrTRP | 379 | 310 | 388280 | 50276.9789 | 50.92% | 48.02% | 59.10% |
| 16 | genefer | 2140 | 0 | 292120 | 13498.0346 | 52.10% | 48.93% | 57.94% |
| 17 | genefer_wr | 38 | 3223 | 190014 | 60128.8158 | 57.89% | 60.53% | 65.79% |
| 18 | llrPPSE | 28180 | 0 | 39238 | 1167.5918 | 50.13% | 50.42% | 60.85% |
| 19 | llrSR5 | 1937 | 3 | 300668 | 13417.4517 | 50.65% | 48.84% | 76.51% |
| 20 | llrESP | 93 | 421 | 452500 | 121632.3871 | 54.84% | 46.24% | 76.34% |
| 21 | llrMEGA | 10728 | 0 | 23514 | 1864.0462 | 50.09% | 49.94% | 61.73% |
| 22 | genefer15 | 53058 | 0 | 176509 | 1055.5176 | 50.18% | 50.18% | 75.18% |
| 23 | genefer16 | 23556 | 0 | 14682 | 1008.6593 | 51.38% | 49.98% | 63.69% |
| 24 | genefer17low | 4579 | 2 | 45226 | 4101.0240 | 49.90% | 50.10% | 77.96% |
| 25 | genefer17mega | 15589 | 0 | 14739 | 1610.2470 | 51.01% | 50.68% | 81.05% |
| 26 | genefer18 | 2355 | 1 | 82311 | 7580.2119 | 50.87% | 50.87% | 81.91% |
| 27 | genefer19 | 2540 | 5 | 111139 | 13217.2689 | 50.28% | 50.12% | 74.72% |
| 28 | genefer20 | 385 | 73 | 276357 | 42184.2156 | 51.95% | 50.65% | 72.73% |
| 29 | gcw_sieve | 74708 | 0 | 77368 | 455.4160 | 48.76% | 50.27% | 50.60% |
| 30 | llrGCW | 102 | 417 | 387209 | 43396.3529 | 63.73% | 48.04% | 60.78% |
| NULL | llrGCW | 457046 | 0 | 731013 | 1790.9844 | 50.15% | 50.03% | 59.54% |
+-------+---------------+--------+------+--------+-------------+---------------+---------------+-----------------+
For PPSE, 60.85% of the time the task sent out first was returned first. Overall, it's 59.54%.
Only in SoB is it below 50%.
The GFN projects seem to have really high numbers, including two over 80%.
____________
My lucky number is 75898^524288+1
mackerel (Volunteer tester)
Joined: 2 Oct 08 Posts: 2619 ID: 29980 Credit: 566,949,824 RAC: 14,002
For PPSE, 60.85% of the time, the task sent out first was returned first. Overall, it's 59.54%
I'm trying to figure out what this actually means... if everyone had identical systems and zero cache, it should be 100%. The 40% who didn't return first, maybe they have slow systems. Maybe they run a cache. If a user consistently has units where [time taken to compute] is much less than [time between sending and returning] then that could indicate cache.
Michael Goetz (Volunteer moderator, Project administrator)
Joined: 21 Jan 10 Posts: 13881 ID: 53948 Credit: 383,609,451 RAC: 119,268
I'm trying to figure out what this actually means... [...]
This is how I interpret these numbers...
The average time delay between the first and second tasks is almost 20 minutes. For a short task such as PPSE, you would expect the first task to win a large majority of the time.
That isn't happening.
What that tells me is that many people are not using their systems optimally. In particular, many use a significant cache, or are running on the hyperthreads. Probably both.
The numbers are telling me that I don't have to worry too much about being first. With my Intel AVX-or-better hosts, all I'll need to do is turn off hyperthreading and set the cache to zero and I'll win most of the time without having to resort to extreme measures.
____________
My lucky number is 75898^524288+1
I view SQL as being more magic and art than science. [...]
I'm self-taught in Excel, VBA and SQL. I've done things like take a report that took one person about half an hour every day to create and fully automate it to run at 3 AM and email the results, but only for days when production was entered. I even added a retro feature in case the server goes down, so at the next run it will go back and catch up on any missed days.
And yet, I can look at your code and know that I know jack squat. Or rather, I think I know jack squat?
No poetry here, just pain... followed by giddiness when it all works as intended.
I'm not a programmer and not in the IT department; I literally do this as a "side job" at work.
mackerel (Volunteer tester)
Joined: 2 Oct 08 Posts: 2619 ID: 29980 Credit: 566,949,824 RAC: 14,002
I switched 3 systems over to PPSE, running one task per core with zero cache. For each system I manually looked at the first 5 pages of valid PPSE results (100 tasks) and counted the 1st ratio.
8350k (4.0 GHz): 69% first, 581s average run time
5675C (3.6 GHz): 68% first, 732s average run time
6600k (3.6 GHz): 72% first, 654s average run time
I also looked at the first 10 pages (200 results) for all valid PPSE tasks, and that came out at 67.5% first.
Maybe a bigger sample average is needed but I don't see a significant correlation between run time and first rate here.
Going back to the earlier stat that 60% of first-sent units were returned 1st, as opposed to the expected 100% if all systems were equal, that implies 40% of first assignees were running slow for whatever reason. Assuming a fair server, I would get 1st assignment 50% of the time, and that would be the expected 1st rate (again, assuming all else is equal). Let's say I get that 50% 1st rate by running zero cache and being assigned first. The other 50% of the time I would be assigned 2nd. Of those, we know the first-sent host fails to return 1st about 40% of the time, which hands me another 20% of all units (40% of that 50% slice). Added to the 50% from being assigned first, that gives an estimated 70% 1st rate, which is pretty close to what I observe.
I don't expect this to be static, so there may be changes over time if people change configuration. This also assumes a single user is a relatively insignificant part of the total pool of users; otherwise they would skew the outcome.
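The same estimate as a small sketch (assumptions only: a fair scheduler that assigns me first 50% of the time, the ~60% "First sent wins" figure for llrPPSE from Michael's table, and that I never lose a unit I was sent first):

p_assigned_first = 0.50
p_first_sent_wins = 0.60   # llrPPSE "First sent wins" column, ~60.85%

# If I'm sent the task first and run optimally, assume I keep that win.
wins_when_sent_first = p_assigned_first * 1.0
# If I'm sent it second, I still win whenever the first-sent host is one of
# the ~40% that fail to return first.
wins_when_sent_second = (1 - p_assigned_first) * (1 - p_first_sent_wins)

print(wins_when_sent_first + wins_when_sent_second)  # -> 0.70, close to the 67-72% observed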
mackerel (Volunteer tester)
Joined: 2 Oct 08 Posts: 2619 ID: 29980 Credit: 566,949,824 RAC: 14,002
I've now got some similar data for PPS... this needs even more caution, as I'll describe below.
8350k (4.0 GHz): 67% first, 1575s average run time
5675C (3.6 GHz): 65% first, 2088s average run time
6600k (3.6 GHz): 76% first, 1801s average run time
The procedure for this was a little different from before. I observed that on the 1st page of results I had a very low 1st rate: 7, 5, and 2 out of 20 for the above systems respectively. It was apparent as I went through the pages that the later ones were much higher, so the above are the averages of results 21 to 120.
There is a further complication: the 6600k was set to NNW and stopped crunching around 10:00, and it is now about 17:30. My assumption is that if I'm 2nd, units validate instantly; if I'm 1st, I have to wait for the wingman to report before I know I'm 1st. I think that is why the 1st page is skewed so heavily. Because the 6600k was set to NNW for many hours, that may contribute to its higher 1st rate. What I don't have an answer for is: how long is long enough before I get a good long-term picture? This skew effect may also apply to PPSE, but it wasn't obvious at the time.
The average time between units is 1764s, so run times are comparable to that magnitude. An efficiency test would be needed to check if multi-thread would offer a benefit in this case.
Edit: looking only at the 5675C system, I have 52 pending PPS units. Assuming no computation errors, that'll be 52 more 1st units not counted in the results.
Monkeydee (Volunteer tester)
Joined: 8 Dec 13 Posts: 527 ID: 284516 Credit: 1,385,652,182 RAC: 852,134
Mackerel, for the purpose of your testing I would count "Pending" as 1st. The only way it wouldn't be 1st is if it is invalid. And if you have invalid results then you have something else to fix before you can draw any conclusions from your testing.
Also, if you go back more than (I think) 3 days in the results, you will see a higher proportion of 1sts as the older units are purged from the system. The ones where you came second will be purged before the ones where you came first, because many of those 1sts will have been pending for a while before the wingperson came in.
____________
My Primes
Badge Score: 4*2 + 6*2 + 7*4 + 8*8 + 11*3 + 12*1 = 157
mackerel (Volunteer tester)
Joined: 2 Oct 08 Posts: 2619 ID: 29980 Credit: 566,949,824 RAC: 14,002
I flipped the systems back to PPSE and will get a fresh set of data in the morning :)
dukebg (Volunteer tester)
Joined: 21 Nov 17 Posts: 242 ID: 950482 Credit: 23,670,125 RAC: 0
Because the 6600k was set to NNW for many hours, that may contribute to its higher 1st rate. What I don't have an answer for, is how long is long enough before I get a good long term picture? This skew effect may also apply to PPSE but it wasn't obvious at the time.
I'm not sure what skew you are describing, but there is a significant skew on the last pages.
If you've been running a subproject long enough, the last pages are where the units are about to be purged or have already been purged. Units where you were second are going to be purged before units where you were first, because the purge timer on the units where you were second starts right away, while on the ones where you were 1st it starts only after the wingman completes, who by construction completes after you. In some cases much, much after you, so those 1sts of yours will linger the longest on the last pages.
mackerel (Volunteer tester)
Joined: 2 Oct 08 Posts: 2619 ID: 29980 Credit: 566,949,824 RAC: 14,002
Looks like about 3 days or so for units to fall out, and as I was only looking at the most recent units, that shouldn't be a problem for PPSE.
Anyway, just for the 8350k I looked at the latest 100 units. 4 were still in progress. 1st + pending works out to 67%. So... no significant change from last time.
There is an exception to every rule: if you are simply too slow, you will always be last.
I ran a test on my Ryzen 1700: single-threaded, 2500 s or ~41 min per task, and only 25% first...
Let's see if I can tweak that so I have a run time of 1200 s or 20 min.
So I did some testing with my Ryzen 7 1700 (16 threads, 15 available for BOINC/PrimeGrid).
(all times in seconds; % is the run average relative to the 1-thread run average)
threads   run min    run max    cpu min    cpu max    run avg    cpu avg    %       run/cpu   cpu/run   concurrent tasks
1         2,329.46   2,388.45   2,322.19   2,381.47   2,360.87   2,352.39   100.00   1.00      1.00      15
2         1,100.00   2,733.91   2,588.59   2,795.09   1,883.60   2,692.28    79.78   0.70      1.43       7
3           988.00   2,906.53   2,781.97   2,906.53   1,393.29   2,837.94    59.02   0.49      2.04       5
4           653.00   2,448.08   2,252.27   2,456.38   1,389.05   2,398.94    58.84   0.58      1.73       3
7           576.00   1,046.00   2,834.58   2,903.75     662.25   2,874.76    28.05   0.23      4.34       2
14          512.00     545.00   4,119.69   4,152.22     530.63   4,134.25    22.48   0.13      7.79       1
3-threaded has a competitive average run time with decent throughput.
I will run 2-threaded until this evening and then decide whether to run 2 or 3 threads next month...
Reason: most (more than every second) 2nd tasks, as far as I have seen, are sent ~10 minutes after the first task; all others are sent within seconds or 30+ minutes later. So a run time of 20-25 minutes is enough to win if my task is sent first. If I get the second task I have little chance to win (less than 20%), but I can only change that if I reduce my throughput drastically. That's not worth it...
Michael Goetz (Volunteer moderator, Project administrator)
Joined: 21 Jan 10 Posts: 13881 ID: 53948 Credit: 383,609,451 RAC: 119,268
So I did some testing with my Ryzen 7 1700 (16 threads, 15 available for BOINC/PrimeGrid). [...]
When you tested "1 thread", did that mean you were running 1 task of 1 thread? Or does it mean you ran 15 tasks of 1 thread each? The latter is how you would probably be running, and tests a realistic memory contention situation.
____________
My lucky number is 75898^524288+1
The last column is the number of concurrent tasks run, so at 1-threaded I ran 15 tasks at the same time.
Michael Goetz (Volunteer moderator, Project administrator)
Joined: 21 Jan 10 Posts: 13881 ID: 53948 Credit: 383,609,451 RAC: 119,268
The last column is the number of concurrent tasks run, so at 1-threaded I ran 15 tasks at the same time.
Ah, missed that, my fault.
____________
My lucky number is 75898^524288+1
So, I decided to take the 3-threaded path. At least 50% of my tasks come back 1st, so I'm happy :)