Message boards : Number crunching : Year of the Rat Challenge
Michael Gutierrez Volunteer moderator Project administrator Project scientist
Joined: 21 Mar 17 Posts: 376 ID: 764476 Credit: 46,631,160 RAC: 14,424
Welcome to the Year of the Rat Challenge!
The second challenge of the 2020 Series will be a 5-day challenge celebrating Chinese New Year (better late than never)! The challenge will be offered on the Sierpinski / Riesel Base 5 (LLR) application, beginning 12 March 06:00 UTC and ending 17 March 06:00 UTC.
The Year of the Rat is the first zodiac sign in the Chinese zodiac cycle. According to the Chinese zodiac story, in the competition held by the Jade Emperor to decide the zodiac animals, the quick-witted rat asked the diligent ox to take him on a ride to cross the river and jumped down before the ox crossed the finish line, so the rat won the race and became the first of the zodiac animals.
The Rat is also associated with the Earthly Branch (地支—dì zhī) Zi (子) and the midnight hours. In the terms of yin and yang (阴阳—yīn yáng), the Rat is yang and represents the beginning of a new day. Rats are clever, quick thinkers; successful, but content with living a quiet and peaceful life.
So, theoretically, that should translate into good luck for our crunchers! We got a glimpse of it with our first SR5 prime of the year just a few days ago. However, the horoscope does indicate 5 as an unlucky number in the Year of the Rat, so we might be in for an unusually challenging challenge.
To participate in the Challenge, please select only the Sierpinski / Riesel Base 5 LLR (SR5) project in your PrimeGrid preferences section.
NOTE: If the candidate being tested is indeed prime, the task will take ALMOST 10 TIMES AS LONG TO COMPLETE. If a task is taking longer than expected, DO NOT ABORT IT!
Application builds are available for Linux 32 and 64 bit, Windows 32 and 64 bit, and MacIntel. Intel CPUs with FMA3 capabilities (Haswell, Broadwell, Skylake, Kaby Lake, Coffee Lake) will have a very large advantage, and Intel CPUs with dual AVX-512 (certain recent Intel Skylake-X and Xeon CPUs) will be the fastest.
ATTENTION: The primality program LLR is CPU intensive; so, it is vital to have a stable system with good cooling. It does not tolerate "even the slightest of errors." Please see this post for more details on how you can "stress test" your computer. Tasks on one CPU core will take 18 hours on fast/newer computers and 3 days+ on slower/older computers. If your computer is highly overclocked, please consider "stress testing" it. Sieving is an excellent alternative for computers that are not able to LLR. :)
Highly overclocked Haswell, Broadwell, Skylake, Kaby Lake or Coffee Lake (i.e., Intel Core i7, i5, and i3 -4xxx or better) computers running the application will see the fastest times. Note that SR5 is running the latest AVX-512 version of LLR, which takes full advantage of the features of these newer CPUs. It's faster than the previous LLR app, but it draws more power and produces more heat. If you have one of the recent Intel Skylake-X or Xeon CPUs with AVX-512, especially if it's overclocked or has overclocked memory, and you haven't run the new AVX-512 LLR before, we strongly suggest running it before the challenge while monitoring temperatures.
Please, please, please make sure your machines are up to the task.
Multi-threading optimisation instructions
Those looking to maximise their computer's performance during this challenge, or when running LLR in general, may find this information useful.
- Your mileage may vary. Before the challenge starts, take some time and experiment and see what works best on your computer.
- If you have an Intel CPU with hyperthreading, either turn off the hyperthreading in the BIOS, or set BOINC to use 50% of the processors.
- If you're using a GPU for other tasks, it may be beneficial to leave hyperthreading on in the BIOS and instead tell BOINC to use 50% of the CPU's. This will allow one of the hyperthreads to service the GPU.
- The new multi-threading system is now live. This will allow you to select multi-threading from the project preferences web page. No more app_config.xml. It works like this:
- In the preferences section, there are settings for "max jobs" and "max cpus", similar to the settings in app_config.
- Unlike app_config, these two settings apply to ALL apps. You can't choose 1 thread for SGS and 4 for SoB. When you change apps, you need to change your multithreading settings if you want to run a different number of threads.
- There will be individual settings for each venue (location).
- This will eliminate the problem of BOINC downloading 1 task for every core.
- The hyperthreading control isn't possible at this time.
- The "max cpus" control will only apply to LLR apps. The "max jobs" control applies to all apps.
- If you want to continue to use app_config.xml for LLR tasks, you need to change it if you want it to work. Please see this message for more information.
- Some people have observed that when using multithreaded LLR, hyperthreading is actually beneficial. We encourage you to experiment and see what works best for you.
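For those who do keep using app_config.xml, a minimal multithreaded-LLR fragment might look like the sketch below. This is an illustration only: the app name llrSR5 and the 4-thread choice are assumptions here, not something stated in this post, so verify the exact app name in your own BOINC client logs.

```xml
<app_config>
  <app_version>
    <app_name>llrSR5</app_name>   <!-- assumed app name; check your client logs -->
    <cmdline>-t 4</cmdline>       <!-- run LLR with 4 threads per task -->
    <avg_ncpus>4</avg_ncpus>      <!-- tell BOINC each task occupies 4 CPUs -->
  </app_version>
</app_config>
```

Place it in the PrimeGrid project directory and use "Read config files" in the BOINC Manager to apply it without restarting.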
Time zone converter:
The World Clock - Time Zone Converter
NOTE: The countdown clock on the front page uses the host computer time. Therefore, if your computer time is off, the countdown clock will be off as well. For precise timing, use the UTC Time in the data section at the very top, above the countdown clock.
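To sanity-check your machine's clock against the challenge window, a quick snippet (illustrative only, not a PrimeGrid tool; the dates are the ones from the opening post):

```python
from datetime import datetime, timezone

# Challenge window in UTC, per the opening post.
start = datetime(2020, 3, 12, 6, 0, tzinfo=timezone.utc)
end = datetime(2020, 3, 17, 6, 0, tzinfo=timezone.utc)

now = datetime.now(timezone.utc)
print("UTC now:", now)
print("Challenge running:", start <= now < end)
```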
Scoring Information
Scores will be kept for individuals and teams. Only tasks issued AFTER 12th March 2020 06:00 UTC and received BEFORE 17th March 2020 06:00 UTC will be considered for credit. We will be using the same scoring method as we currently use for BOINC credits. A quorum of 2 is NOT needed to award Challenge score - i.e. no double checker. Therefore, each returned result will earn a Challenge score. Please note that if the result is eventually declared invalid, the score will be removed.
At the Conclusion of the Challenge
We kindly ask users "moving on" to ABORT their tasks instead of DETACHING, RESETTING, or PAUSING.
ABORTING tasks allows them to be recycled immediately; thus a much faster "clean up" to the end of an LLR Challenge. DETACHING, RESETTING, and PAUSING tasks causes them to remain in limbo until they EXPIRE. Therefore, we must wait until tasks expire to send them out to be completed.
Please consider either completing what's in the queue or ABORTING them. Thank you. :)
About the SR5 Project
Sierpinski Base 5 - The smallest even Sierpinski base 5 number is conjectured to be k=159986. To prove this, it is sufficient to show that k*5^n+1 is prime for each even k < 159986. This has currently been achieved for all even k, with the exception of the following 31 values (as of 26 April 2019):
k = 6436, 7528, 10918, 26798, 29914, 31712, 36412, 41738, 44348, 44738, 45748, 51208, 58642, 60394, 62698, 64258, 67612, 67748, 71492, 74632, 76724, 83936, 84284, 90056, 92906, 93484, 105464, 118568, 126134, 139196, 152588
Riesel Base 5 - The smallest even Riesel base 5 number is conjectured to be k=346802. To prove this, it is sufficient to show that k*5^n-1 is prime for each even k < 346802. This has currently been achieved for all even k, with the exception of the following 66 values (as of 5 March 2020):
k = 3622, 4906, 23906, 26222, 35248, 52922, 63838, 64598, 68132, 71146, 76354, 81134, 92936, 102818, 102952, 109238, 109838, 109862, 127174, 131848, 134266, 136804, 143632, 145462, 145484, 146264, 146756, 147844, 151042, 152428, 154844, 159388, 164852, 170386, 170908, 177742, 182398, 187916, 189766, 190334, 195872, 201778, 204394, 206894, 207494, 213988, 231674, 238694, 239062, 239342, 246238, 248546, 259072, 265702, 267298, 271162, 273662, 285598, 285728, 298442, 304004, 313126, 318278, 325922, 335414, 338866
History
Robert Smith originally presented the idea of a Sierpinski/Riesel base 5 search on 17 September 2004, in the PrimeForm Yahoo group. Using {3,7,13,31,601} as the covering set, he proposed that k=346802 is the smallest Riesel base 5 number. Shortly afterwards, Guido Smetrijns proposed that k=159986 is the smallest Sierpinski base 5 number.
After doing most of the initial work himself, Robert posted in the mersenneforum.org on 28 September 2004, and thus, the distributed effort began. Other principal players in the development, management, and growth of the project are Lars Dausch, Geoff Reynolds, Anand S Nair, and Thomas Masser.
Primes found by PrimeGrid
35816*5^2945294-1 found by Pavel Atnashev on 5 March 2020
322498*5^2800819-1 found by Jordan Romaidis on 23 June 2019 | Official Announcement
88444*5^2799269-1 found by Scott Brown on 21 June 2019 | Official Announcement
138514*5^2771922+1 found by Ken Ito on 26 April 2019 | Official Announcement
194368*5^2638045-1 found by Honza Cholt on 15 August 2018 | Official Announcement
66916*5^2628609-1 found by Honza Cholt on 29 July 2018 | Official Announcement
81556*5^2539960+1 found by Jiří Bočan on 20 June 2018 | Official Announcement
327926*5^2542838-1 found by Seiya Tsuji on 19 June 2018 | Official Announcement
301562*5^2408646-1 found by Håkan Lind on 17 September 2017 | Official Announcement
171362*5^2400996-1 found by Frank Schwegler on 25 August 2017 | Official Announcement
180062*5^2249192-1 found by Stefan Larsson on 20 August 2016 | Official Announcement
53546*5^2216664-1 found by Tom Greer on 30 May 2016 | Official Announcement
296024*5^2185270-1 found by Steven Wong on 25 March 2016 | Official Announcement
92158*5^2145024+1 found by Karl Burridge on 15 March 2016 | Official Announcement
77072*5^2139921+1 found by Wolfgang Becker on 6 March 2016 | Official Announcement
306398*5^2112410-1 found by André Ahlfors Dahl on 11 January 2016 | Official Announcement
154222*5^2091432+1 found by Scott Brown on 10 November 2015 | Official Announcement
100186*5^2079747-1 found by Toshitaka Kumagai on 21 October 2015 | Official Announcement
144052*5^2018290+1 found by Wolfgang Schmidt on 23 May 2015 | Official Announcement
109208*5^1816285+1 found by Scott Brown on 18 October 2014 | Official Announcement
325918*5^1803339-1 found by Jörg Meili on 21 September 2014 | Official Announcement
133778*5^1785689+1 found by Guo Hua Miao on 17 August 2014 | Official Announcement
24032*5^1768249+1 found by Hiroyuki Okazaki on 23 July 2014 | Official Announcement
138172*5^1714207-1 found by Walter Darimont on 27 June 2014 | Official Announcement
22478*5^1675150-1 found by Guo Hua Miao on 19 June 2014 | Official Announcement
326834*5^1634978-1 found by Scott Brown on 25 April 2014 | Official Announcement
207394*5^1612573-1 found by Honza Cholt on 9 April 2014 | Official Announcement
104944*5^1610735-1 found by Brian Smith on 9 April 2014 | Official Announcement
330286*5^1584399-1 found by Scott Brown on 21 March 2014 | Official Announcement
22934*5^1536762-1 found by Keishi Toda on 6 February 2014 | Official Announcement
178658*5^1525224-1 found by Keishi Toda on 31 January 2014 | Official Announcement
59912*5^1500861+1 found by Raymond Ottusch on 17 January 2014 | Official Announcement
37292*5^1487989+1 found by Stephen R Cilliers on 29 December 2013 | Official Announcement
173198*5^1457792-1 found by Motohiro Ohno on 4 December 2013 | Official Announcement
245114*5^1424104-1 found by David Yost on 1 November 2013
175124*5^1422646-1 found by David Yost on 31 October 2013
256612*5^1335485-1 found by Wolfgang Schwieger on 4 August 2013
268514*5^1292240-1 found by Raymond Schouten on 16 July 2013
243944*5^1258576-1 found by Tod Slakans on 5 July 2013
97366*5^1259955-1 found by Jörg Meili on 4 July 2013
84466*5^1215373-1 found by Raymond Schouten on 29 June 2013
150344*5^1205508-1 found by Randy Ready on 28 June 2013
1396*5^1146713-1 found by Randy Ready on 23 June 2013
17152*5^1131205-1 found by Bob Benson on 22 June 2013
92182*5^1135262+1 found by Randy Ready on 21 June 2013
329584*5^1122935-1 found by Stephen R Cilliers on 21 June 2013
305716*5^1093095-1 found by Randy Ready on 18 June 2013
130484*5^1080012-1 found by Randy Ready on 17 June 2013
97768*5^987383-1 found by Ulrich Hartel on 17 June 2013
55154*5^1063213+1 found by Senji Yamashita on 16 June 2013
243686*5^1036954-1 found by Katsumi Hirai on 16 June 2013
70082*5^936972-1 found by Scott Brown on 30 May 2013
102976*5^929801-1 found by David Yost on 9 May 2013
110488*5^917100+1 found by Ronny Willig on 25 March 2013
162434*5^856004-1 found by Predrag Kurtovic on 10 January 2013
174344*5^855138-1 found by Ronny Willig on 9 January 2013
57406*5^844253-1 found by David Yost on 7 November 2012
48764*5^831946-1 found by David Yost on 12 October 2012
162668*5^785748-1 found by Lennart Vogel on 3 July 2012
289184*5^770116-1 found by David Yost on 7 June 2012
11812*5^769343-1 found by Göran Schmidt on 2 June 2012
316594*5^766005-1 found by Michael Becker on 30 May 2012
340168*5^753789-1 found by Kimmo Myllyvirta on 18 May 2012
338948*5^743996-1 found by Ricky L Hubbard on 7 May 2012
18656*5^735326-1 found by Lennart Vogel on 3 May 2012
5374*5^723697-1 found by Kelvin Lewis on 13 April 2012
72532*5^708453-1 found by Göran Schmidt on 7 February 2012
2488*5^679769-1 found by Sascha Beat Dinkel on 24 November 2011
331882*5^674961-1 found by Ronny Willig on 11 November 2011
27994*5^645221-1 found by Philipp Bliedung on 18 July 2011
262172*5^643342-1 found by Kimmo Myllyvirta on 13 July 2011
49568*5^640900-1 found by Sascha Beat Dinkel on 1 July 2011
270748*5^614625-1 found by Puzzle Peter on 14 February 2011
266206*5^608649-1 found by Puzzle Peter on 10 February 2011
210092*5^618136-1 found by Puzzle Peter on 31 January 2011
301016*5^586858-1 found by Puzzle Peter on 24 January 2011
Primes found by SR5 since collaboration
109988*5^544269+1 found by ltd on 23 April 2011
68492*5^542553+1 found by ltd on 24 April 2011
Primes found by others
114986*5^1052966-1 found by Sergey Batalov on 3 June 2013
119878*5^1019645-1 found by Sergey Batalov on 3 June 2013
I have only one thing to say to this:
https://www.youtube.com/watch?v=0zSWqJGTa-I
Dave
Joined: 13 Feb 12 Posts: 3254 ID: 130544 Credit: 2,447,397,123 RAC: 4,246,072
Hmm just before my alarm goes off.
Hmm just before my alarm goes off.
ATP breaks out the ice breakers to go rescue AC :)
One prime found just before the challenge...let's see if we get lucky and find another couple during the challenge.
If my math is right, we should find one SR5 prime per 40k tasks. With 10k tasks/day we have a good chance to find one during the challenge.
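Taking that estimate at face value, a back-of-envelope Poisson calculation gives the chance of at least one find during the 5 days. The rates below are the poster's rough figures, not official numbers:

```python
import math

# Rough figures from the post above: ~1 prime per 40,000 tasks,
# ~10,000 tasks/day, over a 5-day challenge.
rate_per_task = 1 / 40_000
tasks = 10_000 * 5

expected_primes = rate_per_task * tasks          # 1.25
p_at_least_one = 1 - math.exp(-expected_primes)  # Poisson approximation

print(f"expected primes: {expected_primes:.2f}")
print(f"P(>=1 prime):    {p_at_least_one:.1%}")  # roughly 71%
```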
With just 20 minutes left to the challenge, my SoB task has 20 hours 😂😂
Definitely no top 300 this time
____________
My lucky number is 6219*2^3374198+1
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14039 ID: 53948 Credit: 479,790,081 RAC: 436,036
One prime found just before the challenge...let's see if we get lucky and find another couple during the challenge.
There were two found right before the challenge.
____________
My lucky number is 75898^524288+1
One prime found just before the challenge...let's see if we get lucky and find another couple during the challenge.
There were two found right before the challenge.
Perhaps we were all too excited...
Besides, who is going to report challenge stats day-to-day now?
____________
My lucky number is 6219*2^3374198+1
Is there a YOTRC Stats page anywhere?
Dave
Joined: 13 Feb 12 Posts: 3254 ID: 130544 Credit: 2,447,397,123 RAC: 4,246,072
Is there a YOTRC Stats page anywhere?
There is now.
I am trying to run this challenge, but my Preferences page states that CPU work is disabled. Repeatedly requesting WUs gets me zero. What is wrong in this scenario?
Tern Volunteer developer Volunteer tester
Joined: 20 Sep 15 Posts: 32 ID: 421148 Credit: 536,166,476 RAC: 1,219,329
I am trying to run this challenge, on my Preferences sheet, it states the CPU effort is disabled? Repeatedly pushing for WU's gets me zero, what is wrong in this scenario?
"Use CPU" probably turned off at the top of your preferences.
Thank you, sometimes it is the tiniest overlooked errors. I am human.
You dirty rats!!!
I am trying to run AP27 tasks and I get sent an SR5 task. In response we called the pest control people to deal with the vermin; unfortunately, they claim they are unable to come out until after next week. Coincidence? I think not.
(We actually did call the exterminator as we have rats and squirrels eating from our fruit trees.)
____________
Werinbert is not prime... or PRPnet keeps telling me so.
Badge score: 2x3 + 5x4 + 5x5 + 4x7 + 1x8 + 1x9 + 3x10 = 126
I'm running SR5 on a Ryzen 3900X and a Threadripper 3970X, but the Threadripper somehow takes more than twice as long as the 3900X with the same settings.
Changing the number of threads per task shows the same symptom.
Is the Threadripper incompatible with SR5?
Is it because the L3 cache is not shared?
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2652 ID: 29980 Credit: 570,442,335 RAC: 7,565
Is it because the L3 cache is not shared?
This shouldn't matter at low thread counts per task. From what I have observed for small to medium sized tasks, as long as you don't exceed a CCX with each task the performance should be good. You appear to be running 2 threads per task, and only using the total threads equal to the core count. This should be fine.
Without seeing the system running, I can only guess to possible causes.
1, what clock are the 3900X and 3970X running at while doing this work?
3970X: 280W TDP (=PPT?) / 32 cores = 8.75W/core
3900X: 142W PPT / 12 cores = 11.8W/core
Note above is ignoring IOD requirements, assuming it scales similarly.
The 3900X would be expected to clock a bit higher from the extra potential power per core, but not 2x difference. There may be other limits in place, if you use Ryzen Master you can see if maybe the current limits are activating, or if there is thermal throttling.
2, Fire up task manager and set it to show CPU core activity individually. Check Windows is not doing something silly with where it is placing the tasks. Going across, every pair are the two threads of one core from SMT. You should see approximately one filled box per two threads. This may be entirely on one, or split over two. You should not see both maxed out, or both idle. If you do, this is Windows not being smart. A workaround would be to disable SMT while running this work.
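The per-core power arithmetic in point 1 can be spelled out as below. As the post itself cautions, this treats the package power limit as if it were split evenly across cores and ignores the I/O die:

```python
# Per-core power budget, mirroring the arithmetic in point 1 above.
budgets_w = {"3970X": 280, "3900X": 142}  # TDP/PPT figures from the post
cores = {"3970X": 32, "3900X": 12}

for cpu in budgets_w:
    per_core = budgets_w[cpu] / cores[cpu]
    print(f"{cpu}: {per_core:.2f} W/core")
```

A higher per-core budget lets the 3900X sustain somewhat higher clocks, but as noted, nowhere near enough to explain a 2x runtime difference.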
I have common settings for both as follows
Multithreading : 2core per task
SMT : OFF
CPU USE : 100% (6 and 16 tasks simultaneously)
1.
-3970X : Use PBO PPT=350W(@83%usage) CPU Power≒198W, TDC,EDC,Temp limit not reached.
Clock : 4000~4020MHz all core
Elapsed time : 23,388 / 17,713 / 28,532
CPU time : 42,503 / 29,973 / 48,719
-3900X : Use PBO PPT=180W(@82%usage) CPU Power≒118W, TDC,EDC,Temp limit not reached.
Clock : 4070MHz all core
Elapsed time : 11,129 / 8,759 / 12,105
CPU time : 21,533 / 16,944 / 23,430
2.Tasks seem to be distributed without problems to all cores.
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2652 ID: 29980 Credit: 570,442,335 RAC: 7,565
I didn't notice that SMT was already disabled. I don't know of any reason why there is the slowdown in performance given what has been stated. I'd check that nothing else is running in Windows that might be taking up CPU time. Spot checking one result the time between server sending/receiving lines up with the runtime so that shouldn't be it.
The only remaining possibility I can think of is if Windows is for some reason splitting a task across different CCX, but I have no idea how to check for this.
It might be an interesting test to limit the number of running tasks to match the 3900X, and see if you then get similar times or if it is still slow.
GDB
Joined: 15 Nov 11 Posts: 304 ID: 119185 Credit: 4,286,700,235 RAC: 1,738,363
Each SR5 task uses about 6 MB of L3 cache.
If you exceed 72 MB of L3 usage, you're going to get throttling because of accessing memory.
You need to have <12 SR5 tasks to stay within your L3 cache.
Try 8 tasks by 4 threads.
JimB Honorary cruncher
Joined: 4 Aug 11 Posts: 920 ID: 107307 Credit: 989,873,511 RAC: 44,026
Everyone please keep in mind that LLR is doing a PRP test on SR5 candidates. If that test comes out as PRP (probable prime), then LLR starts a second round of testing. If c=1 that test takes as long as the first test and so it'll take twice as long overall. If c=-1 the second test takes 4x as long as the first one for a total of 5 times as long. If you have a candidate that's taking longer, do not abort the test. You could be throwing away your chance to be a prime finder.
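The runtime multipliers described above can be sketched as a tiny function. The multipliers come from the explanation itself; the function name is mine, purely for illustration:

```python
def sr5_runtime_multiplier(is_prp: bool, c: int) -> int:
    """Relative runtime of an SR5 LLR task testing k*5^n+c,
    compared with an ordinary (composite) candidate.

    A PRP result triggers a second, deterministic test whose
    cost depends on the sign of c, per the explanation above.
    """
    if c not in (1, -1):
        raise ValueError("SR5 candidates have c = +1 or -1")
    if not is_prp:
        return 1   # the common case: one PRP test, candidate composite
    if c == 1:
        return 2   # proof test costs about as much as the PRP test
    return 5       # c == -1: proof test costs ~4x the PRP test
```

So a task that suddenly runs 2x or 5x longer than its siblings may be a prime in the making; aborting it throws that chance away.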
GCW behaves similarly, just so everyone knows.
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2652 ID: 29980 Credit: 570,442,335 RAC: 7,565
Each SR5 task uses about 6 MB of L3 cache.
If you exceed 72 MB of L3 usage, you're going to get throttling because of accessing memory.
You need to have <12 SR5 tasks to stay within your L3 cache.
Try 8 tasks by 4 threads.
Zen 2 CPUs have 16MB L3 per CCX (up to 4 cores depending on model). Running multiple tasks on 2 cores should not exhaust L3. The 3970X has 128MB of L3 in total.
Edit: within the limits of my testing, comparing the FFT data size to L3 cache has generally shown good indication. I'm wondering now that we are getting more cores than ever, if other effects could start coming into play. I'm aware that in addition to the FFT data there is some other lookup data used during the calculation. In the past this doesn't seem to have an impact, but with more tasks running at the same time, without a similar upgrade in connectivity bandwidth and ram bandwidth, they may start to limit. I'd agree that running 8 tasks of 4 threads each would be interesting to see if that helps. Based on benchmarks I did using Prime95 on 3700X, running with 4 threads per task was only about 2% lower throughput than 2 threads per task in that scenario.
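The cache arithmetic being debated here, as a quick sketch. The ~6 MB per-task figure is GDB's estimate above, not a measured value, and this is only the rule of thumb, subject to the caveats about lookup data and memory bandwidth:

```python
# Zen 2 topology: 16 MB of L3 per CCX, and L3 is not shared between CCXs.
L3_PER_CCX_MB = 16
FFT_DATA_PER_TASK_MB = 6   # GDB's estimate for current SR5 FFT sizes

# Tasks whose FFT data stays inside one CCX's L3:
tasks_per_ccx = L3_PER_CCX_MB // FFT_DATA_PER_TASK_MB
print(tasks_per_ccx)       # 2 tasks of ~6 MB fit in a 16 MB CCX

# A 3970X has 8 CCXs (128 MB L3 total), so by this rule of thumb
# up to 16 such tasks could run without spilling to main memory.
print(8 * tasks_per_ccx)
```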
Thank you everyone.
No other heavy application is running.
I set it to 4 threads per task.
It doesn't seem to improve throughput at the moment.
I will look at the situation for 24 hours.
At present the Zen 2 Threadripper is only half as efficient as the AM4 Ryzen, so my 3970X has the same or lower throughput than a 3950X.
In other tasks such as PPSE, the throughput scaled with clock and core count, so I think this is a phenomenon that occurs only in SR5.
tng
Joined: 29 Aug 10 Posts: 500 ID: 66603 Credit: 50,901,391,226 RAC: 30,731,800
Everyone please keep in mind that LLR is doing a PRP test on SR5 candidates. If that test comes out as PRP (probable prime), then LLR starts a second round of testing. If c=1 that test takes as long as the first test and so it'll take twice as long overall. If c=-1 the second test takes 4x as long as the first one for a total of 5 times as long. If you have a candidate that's taking longer, do not abort the test. You could be throwing away your chance to be a prime finder.
GCW behaves similarly, just so everyone knows.
Didn't know that GCW did that -- glad I didn't spot that one and abort it.
The 2nd prime that was found before the challenge (too late to be mentioned in the initial post above):
DeleteNull: 146264*5^2953282-1
A prime that was found during the challenge:
(pending verification): 238694*5^2979422-1
/JeppeSN
Chooka
Joined: 15 May 18 Posts: 335 ID: 1014486 Credit: 1,312,549,885 RAC: 3,982,703
The 2nd prime that was found before the challenge (too late to be mentioned in the initial post above):
DeleteNull: 146264*5^2953282-1
A prime that was found during the challenge:
(pending verification): 238694*5^2979422-1
/JeppeSN
It's not my surname :(
____________
Слава Україні!
The 2nd prime that was found before the challenge (too late to be mentioned in the initial post above):
DeleteNull: 146264*5^2953282-1
A prime that was found during the challenge:
(pending verification): 238694*5^2979422-1
/JeppeSN
It's not my surname :(
It's mine and I am thrilled with it!
This is only my second prime and it is much larger than the first one I found back in 2015. The 2015 find was 388,341 decimal digits; this new one dwarfs it at 2,082,532 digits.
Congrats 👏
And the double checker was my threadripper. lol
http://www.primegrid.com/workunit.php?wuid=649556277
I know that tasks involving a prime take much longer, but the Threadripper still seems slow.
The Threadripper running 4 threads per task shows the same processing time as the AM4 Ryzen running 2 threads per task.
http://www.primegrid.com/workunit.php?wuid=649556277
Congratulations.
http://www.primegrid.com/workunit.php?wuid=649556277
Khali, when I look at your task from this link, I see 14 threads (-oThreadsPerTest=14). When I look at your computer from this link, I see 213 tasks "In progress". That is too many.
I think you should look into not buffering tasks, because then you can return them much sooner after you receive them. In addition, I think you should check if 14 threads is not too many.
With such changes, you will increase your probability of being the finder of more primes.
/JeppeSN
http://www.primegrid.com/workunit.php?wuid=649556277
Congratulations.
http://www.primegrid.com/workunit.php?wuid=649556277
Khali, when I look at your task from this link, I see 14 threads (-oThreadsPerTest=14). When I look at your computer from this link, I see 213 tasks "In progress". That is too many.
I think you should look into not buffering tasks, because then you can return them much sooner after you receive them. In addition, I think you should check if 14 threads is not too many.
With such changes, you will increase your probability of being the finder of more primes.
/JeppeSN
The buffer thing was a mistake. It was a setting for another project that runs out of work almost weekly, and I wanted a big buffer so I had work during the periods the project didn't have any available. Somehow that setting overrode all my other projects' buffer settings. I have since fixed that issue. Instead of aborting all those tasks I have simply set PrimeGrid to not get new work. I should be good for another 4 or 5 days at this rate.
As for the threads, my CPU has 16 threads and I reserved 2 for GPU tasks. I just put the other 14 to work on CPU tasks. I'm not sure how fewer threads would make things go faster, but I might try dropping it down as a test.
Michael Gutierrez Volunteer moderator Project administrator Project scientist
Joined: 21 Mar 17 Posts: 376 ID: 764476 Credit: 46,631,160 RAC: 14,424
Sorry for the delay, one of these days I'll get my act together! :)
Challenge: Year of the Rat
App: 19 (SR5-LLR)
(As of 2020-03-14 23:48:28 UTC)
83986 tasks have been sent out. [CPU/GPU/anonymous_platform: 83946 (100%) / 0 (0%) / 40 (0%)]
Of those tasks that have been sent out:
6847 (8%) were aborted. [6847 (8%) / 0 (0%) / 0 (0%)]
620 (1%) came back with some kind of an error. [620 (1%) / 0 (0%) / 0 (0%)]
60015 (71%) have returned a successful result. [59976 (71%) / 0 (0%) / 39 (0%)]
16504 (20%) are still in progress. [16503 (20%) / 0 (0%) / 1 (0%)]
Of the tasks that have been returned successfully:
11629 (19%) are pending validation. [11626 (19%) / 0 (0%) / 3 (0%)]
48177 (80%) have been successfully validated. [48141 (80%) / 0 (0%) / 36 (0%)]
167 (0%) were invalid. [167 (0%) / 0 (0%) / 0 (0%)]
42 (0%) are inconclusive. [42 (0%) / 0 (0%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is n=3025127. The leading edge was at n=2968224 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 1.92% as much as it had prior to the challenge!
robish Volunteer moderator Volunteer tester
Joined: 7 Jan 12 Posts: 2223 ID: 126266 Credit: 7,959,544,636 RAC: 5,439,112
http://www.primegrid.com/workunit.php?wuid=649556277
Congratulations.
http://www.primegrid.com/workunit.php?wuid=649556277
Khali, when I look at your task from this link, I see 14 threads (-oThreadsPerTest=14). When I look at your computer from this link, I see 213 tasks "In progress". That is too many.
I think you should look into not buffering tasks, because then you can return them much sooner after you receive them. In addition, I think you should check if 14 threads is not too many.
With such changes, you will increase your probability of being the finder of more primes.
/JeppeSN
The buffer thing was a mistake. It was a setting for another project that runs out of work almost weekly, and I wanted a big buffer so I had work during the periods the project didn't have any available. Somehow that setting overrode all my other projects' buffer settings. I have since fixed that issue. Instead of aborting all those tasks I have simply set PrimeGrid to not get new work. I should be good for another 4 or 5 days at this rate.
As for the threads, my CPU has 16 threads and I reserved 2 for GPU tasks. I just put the other 14 to work on CPU tasks. I'm not sure how fewer threads would make things go faster, but I might try dropping it down as a test.
It's OK to abort tasks; they get recycled anyway. Congrats!
____________
My lucky number 1059094^1048576+1
The 2nd prime that was found before the challenge (too late to be mentioned in the initial post above):
DeleteNull: 146264*5^2953282-1
A prime that was found during the challenge:
(pending verification): 238694*5^2979422-1
/JeppeSN
It's not my surname :(
It's mine and I am thrilled with it!
This is only my second prime and it is much larger than the first one I found back in 2015. The 2015 find was 388,341 decimal digits; this new one dwarfs it at 2,082,532 digits.
Congratulations!!!!!!! Great job!
____________
My lucky number is 6219*2^3374198+1
James Project administrator Volunteer tester
Joined: 19 Sep 14 Posts: 101 ID: 366225 Credit: 1,565,253,279 RAC: 27,899
http://www.primegrid.com/workunit.php?wuid=649556277
Congratulations.
http://www.primegrid.com/workunit.php?wuid=649556277
Khali, when I look at your task from this link, I see 14 threads (-oThreadsPerTest=14). When I look at your computer from this link, I see 213 tasks "In progress". That is too many.
I think you should look into not buffering tasks, because then you can return them much sooner after you receive them. In addition, I think you should check if 14 threads is not too many.
With such changes, you will increase your probability of being the finder of more primes.
/JeppeSN
The buffer thing was a mistake. It was a setting for another project that runs out of work almost weekly, and I wanted a big buffer so I had work during the periods the project didn't have any available. Somehow that setting overrode all my other projects' buffer settings. I have since fixed that issue. Instead of aborting all those tasks I have simply set PrimeGrid to not get new work. I should be good for another 4 or 5 days at this rate.
As for the threads, my CPU has 16 threads and I reserved 2 for GPU tasks. I just put the other 14 to work on CPU tasks. I'm not sure how fewer threads would make things go faster, but I might try dropping it down as a test.
It's OK to abort tasks; they get recycled anyway. Congrats!
Not only is it okay to abort tasks, it'd be preferable (given that you have 200-ish long tasks) to abort them instead of setting PrimeGrid to No New Tasks. Once a task is aborted, the server immediately recycles it and sends it back out (less waiting for both you and others).
The longer you leave a task in a cache, the lower the likelihood of it being the first one returned to the server, and so the lower your chances of being the finder of a prime.
| |
|
|
Sorry for the delay, one of these days I'll get my act together! :)
Challenge: Year of the Rat
App: 19 (SR5-LLR)
(As of 2020-03-14 23:48:28 UTC)
83986 tasks have been sent out. [CPU/GPU/anonymous_platform: 83946 (100%) / 0 (0%) / 40 (0%)]
Of those tasks that have been sent out:
6847 (8%) were aborted. [6847 (8%) / 0 (0%) / 0 (0%)]
620 (1%) came back with some kind of an error. [620 (1%) / 0 (0%) / 0 (0%)]
60015 (71%) have returned a successful result. [59976 (71%) / 0 (0%) / 39 (0%)]
16504 (20%) are still in progress. [16503 (20%) / 0 (0%) / 1 (0%)]
Of the tasks that have been returned successfully:
11629 (19%) are pending validation. [11626 (19%) / 0 (0%) / 3 (0%)]
48177 (80%) have been successfully validated. [48141 (80%) / 0 (0%) / 36 (0%)]
167 (0%) were invalid. [167 (0%) / 0 (0%) / 0 (0%)]
42 (0%) are inconclusive. [42 (0%) / 0 (0%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is n=3025127. The leading edge was at n=2968224 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 1.92% as much as it had prior to the challenge!
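For anyone curious, the "advanced X% as much as it had" figure appears to be just the challenge-period advance divided by the pre-challenge leading edge; a minimal sketch using the n values from this update (the formula is my assumption, not an official one):

```python
# Leading-edge values taken from the stats update above.
start_n = 2968224    # leading edge at the start of the challenge
current_n = 3025127  # leading edge as of this update

advance = current_n - start_n
percent = 100 * advance / start_n
print(f"advanced by {advance}, i.e. {percent:.2f}% of the pre-challenge edge")
```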
A small suggestion, possibly off topic:
Could you put the number of primes found in that post?
____________
My lucky number is 6219*2^3374198+1
| |
|
|
@SAKAGE@AMD@jisaku
I see your point. I also had similar problems with my 3900X when I tried to run 12 PSP tasks with 2 threads. The CPU (for some reason back then) couldn't handle it: temperatures dropped, watts used dropped, all counterintuitively.
My solution was to set affinity (Task Manager, Details, primegrid_cllr.exe) to certain threads so that multiple primegrid_cllr instances do not overlap each other and don't cross weird boundaries in the CCD/CCX architecture. That is, when you have a 32-core setup, try running it with 8 tasks and 8 threads. So task 1 gets affinity 0-7, task 2 gets affinity 8-15, etc. | |
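The affinity layout described above (each task pinned to its own contiguous block of logical CPUs) can be sketched as follows. This is an illustrative sketch only: the 64-logical-CPU count and the `affinity_groups` helper are my assumptions, and actually applying the mask is OS-specific (Task Manager on Windows, `taskset` or `os.sched_setaffinity` on Linux).

```python
# Sketch: split 64 logical CPUs into 8 contiguous affinity groups,
# so each primegrid_cllr instance stays inside one block and
# doesn't straddle CCD/CCX boundaries.
def affinity_groups(logical_cpus, tasks):
    per_task = logical_cpus // tasks
    return [list(range(i * per_task, (i + 1) * per_task))
            for i in range(tasks)]

groups = affinity_groups(64, 8)
print(groups[0])  # task 1 -> CPUs 0-7
print(groups[1])  # task 2 -> CPUs 8-15
```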
|
|
The buffer thing was a mistake. It was a setting for another project that runs out of work almost weekly and I wanted a big buffer so I had work during the periods the project didn't have any available. Some how that setting overrode all my other projects buffer settings.
I think the "Computing preferences" (as opposed to "PrimeGrid preferences") will be shared across different projects as soon as one of your computers knows of both projects' servers.
So you may need to use "venues" (Mercury, Venus, etc.) to avoid that. Or else, make sure no computer knows of more than one BOINC project before you change settings ("Computing preferences") that you want to apply to one project only.
Like others said, it is good to abort tasks you acquire by accident.
/JeppeSN | |
|
mikey Send message
Joined: 17 Mar 09 Posts: 1904 ID: 37043 Credit: 828,891,437 RAC: 735,790
                     
|
Congrats 👏
And the double checker was my threadripper. lol
http://www.primegrid.com/workunit.php?wuid=649556277
I know that tasks that find a prime take very long, but the Threadripper still seems to be slow.
Threadripper's simultaneous execution of 4 threads results in the same processing time as AM4 Ryzen's simultaneous execution of 2 threads.
You are right, the 3970X Threadripper is slower than an Intel. I run the same workunits on an i7-3930K and do them on 6 threads, with HT disabled, in around half the time:
http://www.primegrid.com/result.php?resultid=1078380671
Google says you have 32 cores and 64 threads with HT; have you tried upping the threads per workunit to see if that's faster overall? I'm guessing you are running 8 workunits at a time with 4 threads per workunit right now. I have an older 1920X Threadripper, 12/24, and when I put 11 threads per workunit on it, it almost matched the same i7-3930K CPU in time for a different LLR project. My Threadrippers (I have 2 of them) are doing a different project right now, so I can't compare them on SR5. | |
|
|
I tried settings of 1 thread and 8 threads, but the situation has not changed.
Only half the throughput (per core) compared to AM4 Ryzen (Zen2). | |
|
Chooka  Send message
Joined: 15 May 18 Posts: 335 ID: 1014486 Credit: 1,312,549,885 RAC: 3,982,703
                         
|
I'm running a 1950X and a 3950X if you want to check my stats.
I've been a bit all over the place though. I was running 4 * 4 but now running 8 * 2 on both setups. Also I haven't had the 3950X running 24hrs as I shut down BOINC at times over the weekend for some gaming.
The figures should become cleaner today/tomorrow but currently the 3950X looks slightly slower than the 1950X despite the difference in clock speeds. (CPU time is 50,756 vs 48,025)
____________
Слава Україні! | |
|
mackerel Volunteer tester
 Send message
Joined: 2 Oct 08 Posts: 2652 ID: 29980 Credit: 570,442,335 RAC: 7,565
                              
|
Hi SAKAGE@AMD@jisaku and Chooka,
I'm still interested in the performance you report and only wish I had the hardware to try it myself. In theory the Zen 2 CPUs should be really great in this use case, but I only bought up to 3700X (8 cores) so I can't test if something else is happening at higher core counts.
Could I request you do a Prime95 benchmark as I describe in the post linked?
http://www.primegrid.com/forum_thread.php?id=8240&nowrap=true#138938
SAKAGE@AMD@jisaku: please try 1, 2, 4, 8, 16, 32 workers on 3970X
Chooka: please try 1, 2, 4, 8, 16 workers on both 1950X and 3950X
Obviously this should be done when nothing else is using CPU. It should be ok to suspend computation in BOINC temporarily for this. Results are also written to a text file results.bench.txt in the prime95 folder. Suggest you copy/paste the results in a private message to me to avoid filling this thread up. | |
|
Chooka  Send message
Joined: 15 May 18 Posts: 335 ID: 1014486 Credit: 1,312,549,885 RAC: 3,982,703
                         
|
Happy to help. I need to get through the working day first though :/
____________
Слава Україні! | |
|
Michael Gutierrez Volunteer moderator Project administrator Project scientist
 Send message
Joined: 21 Mar 17 Posts: 376 ID: 764476 Credit: 46,631,160 RAC: 14,424
                 
|
One day left, folks! We still might see another prime sometime soon, they seem to come in groups of two!
Challenge: Year of the Rat
App: 19 (SR5-LLR)
(As of 2020-03-16 02:25:54 UTC)
114957 tasks have been sent out. [CPU/GPU/anonymous_platform: 114901 (100%) / 0 (0%) / 56 (0%)]
Of those tasks that have been sent out:
7862 (7%) were aborted. [7862 (7%) / 0 (0%) / 0 (0%)]
805 (1%) came back with some kind of an error. [805 (1%) / 0 (0%) / 0 (0%)]
89023 (77%) have returned a successful result. [88968 (77%) / 0 (0%) / 55 (0%)]
17267 (15%) are still in progress. [17266 (15%) / 0 (0%) / 1 (0%)]
Of the tasks that have been returned successfully:
12788 (14%) are pending validation. [12782 (14%) / 0 (0%) / 6 (0%)]
75935 (85%) have been successfully validated. [75886 (85%) / 0 (0%) / 49 (0%)]
252 (0%) were invalid. [252 (0%) / 0 (0%) / 0 (0%)]
48 (0%) are inconclusive. [48 (0%) / 0 (0%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is n=3046959. The leading edge was at n=2968224 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 2.65% as much as it had prior to the challenge! | |
|
|
Hi SAKAGE@AMD@jisaku and Chooka,
I'm still interested in the performance you report and only wish I had the hardware to try it myself. In theory the Zen 2 CPUs should be really great in this use case, but I only bought up to 3700X (8 cores) so I can't test if something else is happening at higher core counts.
Could I request you do a Prime95 benchmark as I describe in the post linked?
http://www.primegrid.com/forum_thread.php?id=8240&nowrap=true#138938
SAKAGE@AMD@jisaku: please try 1, 2, 4, 8, 16, 32 workers on 3970X
Chooka: please try 1, 2, 4, 8, 16 workers on both 1950X and 3950X
Obviously this should be done when nothing else is using CPU. It should be ok to suspend computation in BOINC temporarily for this. Results are also written to a text file results.bench.txt in the prime95 folder. Suggest you copy/paste the results in a private message to me to avoid filling this thread up.
I tested prime95 on 3970x and 3900x.
From these results, it seems that Prime95 achieves throughput in line with the number of cores.
CPUModel="AMD Ryzen Threadripper 3970X 32-Core Processor "
FFTlen=768K all-complex, (32 cores, 1 worker): 0.81 ms. Throughput: 1230.75 iter/sec.
FFTlen=768K all-complex, (32 cores, 2 workers): 0.63, 0.59 ms. Throughput: 3265.77 iter/sec.
FFTlen=768K all-complex, (32 cores, 4 workers): 0.58, 0.57, 0.60, 0.54 ms. Throughput: 6991.00 iter/sec.
FFTlen=768K all-complex, (32 cores, 8 workers): 0.73, 0.69, 0.75, 0.71, 0.78, 0.69, 0.69, 0.69 ms. Throughput: 11184.88 iter/sec.
FFTlen=768K all-complex, (32 cores, 16 workers): 1.47, 1.36, 1.38, 1.37, 1.43, 1.48, 1.35, 1.38, 1.49, 1.44, 1.35, 1.35, 1.36, 1.39, 1.37, 1.35 ms. Throughput: 11484.03 iter/sec.
FFTlen=768K all-complex, (32 cores, 32 workers): 11.87, 10.07, 8.86, 8.67, 9.76, 9.27, 8.96, 8.96, 10.31, 9.34, 9.81, 9.28, 9.46, 9.26, 9.03, 9.06, 10.95, 9.23, 8.53, 8.32, 11.08, 9.54, 9.00, 8.65, 9.56, 9.33, 9.41, 9.39, 11.29, 8.84, 9.05, 8.62 ms. Throughput: 3405.75 iter/sec.
CPUModel="AMD Ryzen 9 3900X 12-Core Processor "
FFTlen=768K all-complex, (12 cores, 1 worker): 0.49 ms. Throughput: 2056.66 iter/sec.
FFTlen=768K all-complex, (12 cores, 2 workers): 0.62, 0.59 ms. Throughput: 3295.53 iter/sec.
FFTlen=768K all-complex, (12 cores, 4 workers): 0.88, 0.88, 0.88, 0.88 ms. Throughput: 4558.04 iter/sec.
FFTlen=768K all-complex, (12 cores, 6 workers): 2.50, 1.27, 2.38, 1.28, 0.86, 0.87 ms. Throughput: 4696.47 iter/sec.
FFTlen=768K all-complex, (12 cores, 12 workers): 5.83, 5.46, 4.31, 4.41, 5.75, 5.02, 4.55, 4.52, 6.02, 4.28, 4.78, 5.94 ms. Throughput: 2405.59 iter/sec. | |
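As a quick sanity check on the 3970X numbers above: total throughput peaks around 8-16 workers and collapses at 32. A small sketch re-reading those figures (the throughput values are copied from the benchmark output above; the per-worker breakdown is my own derivation):

```python
# 3970X throughput (iter/sec) by worker count, from the results above.
throughput = {1: 1230.75, 2: 3265.77, 4: 6991.00, 8: 11184.88,
              16: 11484.03, 32: 3405.75}

best = max(throughput, key=throughput.get)
for workers, total in throughput.items():
    print(f"{workers:2d} workers: {total:8.2f} iter/sec "
          f"({total / workers:7.2f} per worker)")
print(f"best total throughput: {best} workers")
```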
|
mackerel Volunteer tester
 Send message
Joined: 2 Oct 08 Posts: 2652 ID: 29980 Credit: 570,442,335 RAC: 7,565
                              
|
Hi SAKAGE@AMD@jisaku and Chooka,
Thanks for the Prime95 benchmark results. They are within my expectations in that throughput seems to be optimal when tasks are each run with 2 or 3-4 cores (depending on CPU model). If I compare the per-task throughput with my 3700X (average 5400 seconds at 4 cores per task, running approx. 4.1 GHz), they are roughly within expectations. So this doesn't reproduce the observed longer runtimes.
My only guess at this point remains that doing FFTs is known to involve other data usage (lookup tables to save CPU time). When running multiple tasks this other data might start to become more significant, although I don't see any big difference in scaling between 2 and 4 cores per task, with the corresponding change in number of tasks. Prime95 might be able to share data better since it is one program instance, whereas we are running multiple LLR instances.
I think I might have to buy a higher core count CPU just to test this myself... | |
|
|
I have two 3970X machines. FWIW, I find that MT tasks run ~2x faster under linux than Windows. My guess is that it has something to do with the way the OS keeps MT tasks assigned to the chiplets. I think linux is just better at it.
____________
Reno, NV
| |
|
|
I have two 3970X machines. FWIW, I find that MT tasks run ~2x faster under linux than Windows. My guess is that it has something to do with the way the OS keeps MT tasks assigned to the chiplets. I think linux is just better at it.
On my 2990WX, Linux is at least 4x faster than Windows on these SR5 tasks. The big TR's really need Linux to perform at their potential.
____________
| |
|
|
With proper configuration there is zero difference between Windows and Linux. But it could be rather challenging to achieve the proper configuration with BOINC. | |
|
|
I have two 3970X machines. FWIW, I find that MT tasks run ~2x faster under linux than Windows. My guess is that it has something to do with the way the OS keeps MT tasks assigned to the chiplets. I think linux is just better at it.
On my 2990WX, Linux is at least 4x faster than Windows on these SR5 tasks. The big TR's really need Linux to perform at their potential.
hmm..
Thank you for the helpful information. | |
|
|
With proper configuration there is zero difference between Windows and Linux. But it could be rather challenging to achieve the proper configuration with BOINC.
I disagree. I have identical machines, and the run times are significantly different windows vs linux.
I don't think this is a BOINC-level issue. I think it comes down to how the OS manages CPU threads, and whether it keeps them assigned to the chiplets. If an MT task can keep its threads in the same chiplet, it is much faster than having to communicate over the bus to the other chiplets. Similar to when you have a multi-CPU setup, but now we are talking about the chiplets within AMD CPUs. And I think Windows is just bad at it.
____________
Reno, NV
| |
|
|
Setting affinity of threads is a part of the proper configuration. If you don't do it, you're betting that OS's default scheduler plan coincides with the layout of a particular chip. Things get even more complicated when you have NUMA architecture. | |
|
|
Setting affinity of threads is a part of the proper configuration. If you don't do it, you're betting that OS's default scheduler plan coincides with the layout of a particular chip. Things get even more complicated when you have NUMA architecture.
Yes, I think we are saying the same thing.
And I believe that linux is doing a better job with the defaults.
____________
Reno, NV
| |
|
Michael Gutierrez Volunteer moderator Project administrator Project scientist
 Send message
Joined: 21 Mar 17 Posts: 376 ID: 764476 Credit: 46,631,160 RAC: 14,424
                 
|
Less than 12 hours to go!
Challenge: Year of the Rat
App: 19 (SR5-LLR)
(As of 2020-03-16 16:59:14 UTC)
130492 tasks have been sent out. [CPU/GPU/anonymous_platform: 130427 (100%) / 0 (0%) / 65 (0%)]
Of those tasks that have been sent out:
8321 (6%) were aborted. [8321 (6%) / 0 (0%) / 0 (0%)]
870 (1%) came back with some kind of an error. [870 (1%) / 0 (0%) / 0 (0%)]
105087 (81%) have returned a successful result. [105023 (80%) / 0 (0%) / 64 (0%)]
16214 (12%) are still in progress. [16213 (12%) / 0 (0%) / 1 (0%)]
Of the tasks that have been returned successfully:
12614 (12%) are pending validation. [12609 (12%) / 0 (0%) / 5 (0%)]
92093 (88%) have been successfully validated. [92034 (88%) / 0 (0%) / 59 (0%)]
316 (0%) were invalid. [316 (0%) / 0 (0%) / 0 (0%)]
64 (0%) are inconclusive. [64 (0%) / 0 (0%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is n=3056908. The leading edge was at n=2968224 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 2.99% as much as it had prior to the challenge! | |
|
mikey Send message
Joined: 17 Mar 09 Posts: 1904 ID: 37043 Credit: 828,891,437 RAC: 735,790
                     
|
I have two 3970X machines. FWIW, I find that MT tasks run ~2x faster under linux than Windows. My guess is that it has something to do with the way the OS keeps MT tasks assigned to the chiplets. I think linux is just better at it.
On my 2990WX, Linux is at least 4x faster than Windows on these SR5 tasks. The big TR's really need Linux to perform at their potential.
hmm..
Thank you for the helpful information.
You can load Linux onto a 120 GB SSD in under an hour and test it out. Sometimes I just unplug the Windows drive, plug in a bare drive, and load up a Linux distro I like for testing. In my case Windows 10 does not always like my older machines. | |
|
Michael Gutierrez Volunteer moderator Project administrator Project scientist
 Send message
Joined: 21 Mar 17 Posts: 376 ID: 764476 Credit: 46,631,160 RAC: 14,424
                 
|
Some friendly reminders... :)
At the Conclusion of the Challenge
When the challenge completes, we would prefer users "moving on" to finish the tasks they have downloaded; if not, then please ABORT the WUs (and then UPDATE the PrimeGrid project) instead of DETACHING, RESETTING, or PAUSING.
ABORTING WUs allows them to be recycled immediately, and thus a much faster "clean up" at the end of a Challenge. DETACHING, RESETTING, and PAUSING WUs causes them to remain in limbo until they EXPIRE. Therefore, we must wait until the WUs expire before we can send them out to be completed.
Likewise, if you're shutting down the computer for an extended period of time, or deleting the VM (Virtual Machine), please ABORT all remaining tasks first. Also, be aware that merely shutting off a cloud server doesn't stop the billing. You have to destroy/delete the server if you don't want to continue to be charged for it.
Thank you! | |
|
|
Setting affinity of threads is a part of the proper configuration. If you don't do it, you're betting that OS's default scheduler plan coincides with the layout of a particular chip. Things get even more complicated when you have NUMA architecture.
Yes, I think we are saying the same thing.
And I believe that linux is doing a better job with the defaults.
This is what I am talking about. Same machine, dual boot. Same BOINC settings (HT off, 8 threads per task, 4 tasks at a time). No manual affinity modifications. bone = win10, l-bone = linux
P.S. In this case, linux is only about 40% faster. I have seen even more on other apps combined with different MT settings. But this was the example I have to share due to the current challenge.
____________
Reno, NV
| |
|
Michael Gutierrez Volunteer moderator Project administrator Project scientist
 Send message
Joined: 21 Mar 17 Posts: 376 ID: 764476 Credit: 46,631,160 RAC: 14,424
                 
|
The challenge is winding down, just thirty minutes left on the clock!
Challenge: Year of the Rat
App: 19 (SR5-LLR)
(As of 2020-03-17 05:32:48 UTC)
143426 tasks have been sent out. [CPU/GPU/anonymous_platform: 143354 (100%) / 0 (0%) / 72 (0%)]
Of those tasks that have been sent out:
10010 (7%) were aborted. [10010 (7%) / 0 (0%) / 0 (0%)]
939 (1%) came back with some kind of an error. [939 (1%) / 0 (0%) / 0 (0%)]
119000 (83%) have returned a successful result. [118929 (83%) / 0 (0%) / 71 (0%)]
13477 (9%) are still in progress. [13477 (9%) / 0 (0%) / 1 (0%)]
Of the tasks that have been returned successfully:
11598 (10%) are pending validation. [11594 (10%) / 0 (0%) / 4 (0%)]
106970 (90%) have been successfully validated. [106903 (90%) / 0 (0%) / 67 (0%)]
375 (0%) were invalid. [375 (0%) / 0 (0%) / 0 (0%)]
57 (0%) are inconclusive. [57 (0%) / 0 (0%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is n=3064654. The leading edge was at n=2968224 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 3.25% as much as it had prior to the challenge! | |
|
|
Aw man my last task came in just too late for the end :(
Congrats to Pavel and all others who participated!
____________
My lucky number is 6219*2^3374198+1
| |
|
Chooka  Send message
Joined: 15 May 18 Posts: 335 ID: 1014486 Credit: 1,312,549,885 RAC: 3,982,703
                         
|
Setting affinity of threads is a part of the proper configuration. If you don't do it, you're betting that OS's default scheduler plan coincides with the layout of a particular chip. Things get even more complicated when you have NUMA architecture.
Yes, I think we are saying the same thing.
And I believe that linux is doing a better job with the defaults.
This is what I am talking about. Same machine, dual boot. Same BOINC settings (HT off, 8 threads per task, 4 tasks at a time). No manual affinity modifications. bone = win10, l-bone = linux
P.S. In this case, linux is only about 40% faster. I have seen even more on other apps combined with different MT settings. But this was the example I have to share due to the current challenge.
Wow that's a huge difference!
I'm too lazy and not dedicated enough to run Linux. :D
____________
Слава Україні! | |
|
compositeVolunteer tester Send message
Joined: 16 Feb 10 Posts: 1172 ID: 55391 Credit: 1,215,537,736 RAC: 1,264,686
                        
|
P.S. In this case, linux is only about 40% faster. I have seen even more on other apps combined with different MT settings. But this was the example I have to share due to the current challenge.
AMD Zen 2 uses NUMA processor architecture, which is more common in server-class CPUs. There are 4 cores in a CCX connected by a crossbar switch to the local portion of L3 cache. Each core in a CCX has equal access time to the L3 cache in that CCX. A chiplet contains 2 CCXs, and there are 4 chiplets in that CPU, connected by Infinity fabric (a bus is not as fast as a crossbar switch). Linux knows about NUMA architectures and aligns memory usage closer to the cores. I speculate that Windows (desktop version) does this relatively poorly for this architecture. If you reduce your thread count per task from 8 to 4, does performance become nearly identical between Windows and Linux? If so, then you should use Linux for tasks that run with more than 4 threads. However, I would rather stick to 4 threads to maximize the utility of that architecture, especially with HT off. | |
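The topology described above (4 cores per CCX, 2 CCXs per chiplet) can be expressed as a toy core-to-CCX mapping. This is a rough model only, ignoring SMT siblings and any BIOS/OS renumbering of cores:

```python
# Toy model of the Zen 2 layout described above:
# 4 cores per CCX, 2 CCXs per chiplet (CCD).
def locate(core):
    ccx = core // 4   # which CCX (and thus which local L3 slice)
    ccd = ccx // 2    # which chiplet
    return ccd, ccx

print(locate(0))   # (0, 0): chiplet 0, CCX 0
print(locate(5))   # (0, 1): same chiplet, different CCX -> different L3 slice
print(locate(9))   # (1, 2): crossing to chiplet 1 goes over Infinity Fabric
```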
|
mackerel Volunteer tester
 Send message
Joined: 2 Oct 08 Posts: 2652 ID: 29980 Credit: 570,442,335 RAC: 7,565
                              
|
Consumer Zen 2 isn't NUMA, as that refers to how the memory is organised, not the CPU internals as such. The fragmented L3 and limited internal bandwidth is likely a contributing factor in demanding use cases, and this is where a monolithic design can make things easier.
I recall that AMD have worked with Microsoft in getting more consistent performance from Zen 2. I'm not sure what the latest on it is, but the general guideline is to make sure you motherboard has latest bios, get Windows to the latest version and patch level, and also install the latest chipset driver package from AMD. Even if you have updated it in the past, check again as they do update it from time to time. | |
|
|
Right. The NUMA thing becomes a problem with more than 64 threads. At that point, Windows requires a second NUMA node to run, which can impact performance where MT tasks might get split over multiple NUMA nodes. In this case, with HT off, there are only 32 threads and a single NUMA node. Even with HT on, that is only 64 threads, which still fits in a single NUMA node. The only Threadripper chip that has to deal with multiple NUMA nodes is the 3990X, which is 64 cores / 128 threads, so two NUMA nodes.
____________
Reno, NV
| |
|
Michael Gutierrez Volunteer moderator Project administrator Project scientist
 Send message
Joined: 21 Mar 17 Posts: 376 ID: 764476 Credit: 46,631,160 RAC: 14,424
                 
|
Here are the final stats for the challenge, incredible job everybody! We managed to find TWO SR5 primes during this challenge, for a record total of FOUR THIS MONTH!
Cleanup is starting, it will likely take about 4-6 weeks.
Challenge: Year of the Rat
App: 19 (SR5-LLR)
(As of 2020-03-17 14:30:11 UTC)
143767 tasks have been sent out. [CPU/GPU/anonymous_platform: 143695 (100%) / 0 (0%) / 72 (0%)]
Of those tasks that have been sent out:
10131 (7%) were aborted. [10131 (7%) / 0 (0%) / 0 (0%)]
945 (1%) came back with some kind of an error. [945 (1%) / 0 (0%) / 0 (0%)]
119453 (83%) have returned a successful result. [119382 (83%) / 0 (0%) / 71 (0%)]
9523 (7%) are still in progress. [9523 (7%) / 0 (0%) / 0 (0%)]
Of the tasks that have been returned successfully:
8957 (7%) are pending validation. [8955 (7%) / 0 (0%) / 2 (0%)]
110057 (92%) have been successfully validated. [109988 (92%) / 0 (0%) / 69 (0%)]
395 (0%) were invalid. [395 (0%) / 0 (0%) / 0 (0%)]
44 (0%) are inconclusive. [44 (0%) / 0 (0%) / 0 (0%)]
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is n=3064654. The leading edge was at n=2968224 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 3.25% as much as it had prior to the challenge! | |
|
mikey Send message
Joined: 17 Mar 09 Posts: 1904 ID: 37043 Credit: 828,891,437 RAC: 735,790
                     
|
Here are the final stats for the challenge, incredible job everybody! We managed to find TWO SR5 primes during this challenge, for a record total of FOUR THIS MONTH!
The current leading edge (i.e., latest work unit for which work has actually been sent out to a host) is n=3064654. The leading edge was at n=2968224 at the beginning of the challenge. Since the challenge started, the leading edge has advanced 3.25% as much as it had prior to the challenge!
That 3.25% sounds like alot in a challenge like this. | |
|
|
Yeah! Considering that it's base 5 and not base 2, it is a lot indeed
____________
My lucky number is 6219*2^3374198+1
| |
|
Michael Gutierrez Volunteer moderator Project administrator Project scientist
 Send message
Joined: 21 Mar 17 Posts: 376 ID: 764476 Credit: 46,631,160 RAC: 14,424
                 
|
Cleanup status:
Mar 18: Year of the Rat: 7578 tasks outstanding; 4560 affecting individual (256) scoring positions; 4308 affecting team (39) scoring positions.
Mar 19: Year of the Rat: 4886 tasks outstanding; 2603 affecting individual (228) scoring positions; 2154 affecting team (24) scoring positions.
Mar 20: Year of the Rat: 2602 tasks outstanding; 1233 affecting individual (175) scoring positions; 704 affecting team (15) scoring positions.
Mar 21: Year of the Rat: 2499 tasks outstanding; 1150 affecting individual (172) scoring positions; 653 affecting team (12) scoring positions.
Mar 22: Year of the Rat: 1801 tasks outstanding; 751 affecting individual (136) scoring positions; 348 affecting team (9) scoring positions. | |
|
|
The last prime found was 207494*5^3017502 - 1 (an addition; the finder should be EXT64). /JeppeSN
|
compositeVolunteer tester Send message
Joined: 16 Feb 10 Posts: 1172 ID: 55391 Credit: 1,215,537,736 RAC: 1,264,686
                        
|
Consumer Zen 2 isn't NUMA, as that refers to how the memory is organised, not the CPU internals as such. The fragmented L3 and limited internal bandwidth is likely a contributing factor in demanding use cases, and this is where a monolithic design can make things easier.
I see your point and agree with you. I was mistakenly thinking of the cache as part of the memory architecture but NUMA strictly considers the unreplicated parts of the memory (i.e. the RAM), whereas cache contains copies of data in RAM.
NUMA considerations aside, fetching data from another CCX's cache is slower than accessing data from the local cache. On top of that we don't know if hitting a CCX's cache with 8 threads saturates it, or if there is a penalty to local cores' access time when remote cores access a cache.
So I reiterate, I would like to see zombie try an experiment, comparing run time between Windows and Linux using 4 threads per task rather than 8. | |
|
|
So I reiterate, I would like to see zombie try an experiment, comparing run time between Windows and Linux using 4 threads per task rather than 8.
I am in the middle of trying to help a teammate hit a goal at seti before it shuts down at the end of the month. I will run this experiment after that.
____________
Reno, NV
| |
|
Michael Gutierrez Volunteer moderator Project administrator Project scientist
 Send message
Joined: 21 Mar 17 Posts: 376 ID: 764476 Credit: 46,631,160 RAC: 14,424
                 
|
Cleanup Status:
Mar 23: Year of the Rat: 1391 tasks outstanding; 557 affecting individual (120) scoring positions; 265 affecting team (8) scoring positions.
Mar 24: Year of the Rat: 897 tasks outstanding; 310 affecting individual (76) scoring positions; 187 affecting team (7) scoring positions.
Mar 25: Year of the Rat: 614 tasks outstanding; 197 affecting individual (55) scoring positions; 134 affecting team (5) scoring positions.
Mar 26: Year of the Rat: 337 tasks outstanding; 90 affecting individual (36) scoring positions; 20 affecting team (3) scoring positions.
Mar 27: Year of the Rat: 205 tasks outstanding; 49 affecting individual (23) scoring positions; 11 affecting team (2) scoring positions.
Mar 28: Year of the Rat: 106 tasks outstanding; 12 affecting individual (7) scoring positions; 4 affecting team (2) scoring positions.
Mar 29: Year of the Rat: 79 tasks outstanding; 4 affecting individual (3) scoring positions; 1 affecting team (1) scoring positions. | |
|
Michael Gutierrez Volunteer moderator Project administrator Project scientist
 Send message
Joined: 21 Mar 17 Posts: 376 ID: 764476 Credit: 46,631,160 RAC: 14,424
                 
|
The results are final!
Top 3 Individuals:
1. Pavel Atnashev
2. tng*
3. Sean
Top 3 Teams:
1. Czech National Team
2. Ural Federal University
3. Aggie The Pew
Very well done everyone! See you at Sophie Germain's Birthday Challenge! | |
|
|
Current Overall Standings updated accordingly.
____________
"Accidit in puncto, quod non contingit in anno."
Something that does not occur in a year may, perchance, happen in a moment. | |
|
|
So I reiterate, I would like to see zombie try an experiment, comparing run time between Windows and Linux using 4 threads per task rather than 8.
I am in the middle of trying to help a teammate hit a goal at seti before it shuts down at the end of the month. I will run this experiment after that.
Okay, here are the results. The first image is with 8 threads per task x 4 tasks. The second image is with 4 threads per task x 8 tasks. As before, l-bone is the Linux OS and bone is the Win10 OS. Same machine, just dual boot. HT is off in all cases.
For the run with 4 threads, I ran 8 tasks at a time, two sets of tasks, without any stopping in between, so fewer than 8 tasks were running only at the very end of the second set. I only mention this because I am not sure of the timing when I ran with 8 threads.
Observations: Wow, Win does even worse. It really just doesn't know how to work the affinity thing. At least not yet. Also, I should have been running this with Linux on 4 threads for the challenge, not 8. :)
8 threads per task:
4 threads per task:
____________
Reno, NV
| |
|
compositeVolunteer tester Send message
Joined: 16 Feb 10 Posts: 1172 ID: 55391 Credit: 1,215,537,736 RAC: 1,264,686
                        
|
zombie67 wrote: I think it comes down to how the OS manages CPU threads, and do they keep them assigned to the chiplets. If an MT task can keep the threads in the same chiplet, it is much faster than having to communicate over the bus to the other chiplets. Similar to when you have a multi-CPU net up, but now we are talking about the chiplets with the AMD CPUs. And I think windows is just bad at it.
zombie67 wrote: Observations: Wow, Win does even worse. It really just doesn't know how to work the affinity thing. At least not yet. Also, I should have ben running this with linux on 4 threads for the challenge, not 8. :)
For this dataset Zombie's conclusion is supported by the measurements.
From the perspective of average throughput, highest to lowest for SR5:
Linux, 8 simultaneous tasks, 4 threads per task: 103.3 tasks/day
Linux, 4 simultaneous tasks, 8 threads per task: 80.9 tasks/day
Windows, 8 simultaneous tasks, 4 threads per task: 62.8 tasks/day
Windows, 4 simultaneous tasks, 8 threads per task: 56.7 tasks/day
From the perspective of "firsts per day" for SR5:
Warning: the sample size is small, but the relative ranking is fine
Linux, 8 simultaneous tasks, 4 threads per task: 77 firsts/day
Linux, 4 simultaneous tasks, 8 threads per task: 60 firsts/day
Windows, 8 simultaneous tasks, 4 threads per task: 34 firsts/day
Windows, 4 simultaneous tasks, 8 threads per task: 15 firsts/day
Since I was way off in my guess that Windows and Linux would perform about the same at 8 threads per task, I would be looking at what services are running on Windows at the same time as BOINC. If you can't control that, then the Linux setup is your crunching powerhouse. | |
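The tasks/day figures above follow directly from average wall time per task: with N tasks in flight, throughput is N x 86400 / seconds-per-task. A small sketch of the arithmetic (the task time below is hypothetical, picked only to land near the Linux 8x4 figure):

```python
# Tasks/day for a box running N simultaneous tasks, each taking
# an average of t seconds of wall time.
def tasks_per_day(simultaneous_tasks, seconds_per_task):
    return simultaneous_tasks * 86400 / seconds_per_task

# Hypothetical example: 8 tasks at a time, ~6690 s each on average,
# works out to about 103.3 tasks/day.
print(round(tasks_per_day(8, 6690), 1))  # → 103.3
```

Comparing configurations this way normalizes out run length, which is why a 5-day challenge result can be quoted as a per-day rate.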
|
robish Volunteer moderator Volunteer tester
 Send message
Joined: 7 Jan 12 Posts: 2223 ID: 126266 Credit: 7,959,544,636 RAC: 5,439,112
|
zombie67 wrote: I think it comes down to how the OS manages CPU threads, and whether it keeps them assigned to the chiplets. If an MT task can keep its threads in the same chiplet, it is much faster than having to communicate over the bus to the other chiplets. It's similar to a multi-CPU setup, but now we are talking about the chiplets within AMD CPUs. And I think Windows is just bad at it.
zombie67 wrote: Observations: Wow, Windows does even worse. It really just doesn't know how to work the affinity thing, at least not yet. Also, I should have been running Linux with 4 threads for the challenge, not 8. :)
For this dataset Zombie's conclusion is supported by the measurements.
From the perspective of average throughput, highest to lowest for SR5:
Linux, 8 simultaneous tasks, 4 threads per task: 103.3 tasks/day
Linux, 4 simultaneous tasks, 8 threads per task: 80.9 tasks/day
Windows, 8 simultaneous tasks, 4 threads per task: 62.8 tasks/day
Windows, 4 simultaneous tasks, 8 threads per task: 56.7 tasks/day
From the perspective of "firsts per day" for SR5:
Warning: the sample size is small, but the relative ranking is fine
Linux, 8 simultaneous tasks, 4 threads per task: 77 firsts/day
Linux, 4 simultaneous tasks, 8 threads per task: 60 firsts/day
Windows, 8 simultaneous tasks, 4 threads per task: 34 firsts/day
Windows, 4 simultaneous tasks, 8 threads per task: 15 firsts/day
Since I was way off in my guess that Windows and Linux would perform about the same at 8 threads per task, I would be looking at what services are running on Windows at the same time as BOINC. If you can't control that, then the Linux setup is your crunching powerhouse.
What distro? might have to switch :)
____________
My lucky number 1059094^1048576+1 | |
|
|
What distro? might have to switch :)
Linux Mint. I think it is based on Ubuntu.
____________
Reno, NV
| |
|
robish Volunteer moderator Volunteer tester
 Send message
Joined: 7 Jan 12 Posts: 2223 ID: 126266 Credit: 7,959,544,636 RAC: 5,439,112
|
What distro? might have to switch :)
Linux Mint. I think it is based on Ubuntu.
Thanks, I'm familiar with it. I might try making a dual-boot box and running the same tests 👍
____________
My lucky number 1059094^1048576+1 | |
|
|
FWIW, I tried the same experiment with my Intel machines, which include both single- and dual-chip configurations. I did not find any significant speed differences between Windows and Linux. I guess Windows is smart enough to use affinity with Intel chips. Maybe the Zen 2 stuff is just so new that Microsoft hasn't updated Windows yet.
____________
Reno, NV
| |
|