1) Message boards : Number crunching : Winter Solstice Challenge (Message 123950)
Posted 1548 days ago by River~~
.... Is there an easy way to limit each task to one physical processor? I am not very skilled at linux so the word easy is important. The only way I would know how to do this is to just disable one of the sockets in BIOS and then run the task at 16 threads.


I have started a new thread to answer this question.
2) Message boards : Number crunching : Setting core affinities: Linux, GFN multi-threading (Message 123949)
Posted 1548 days ago by River~~
k4m1k4z3 asked in the recent challenge thread

.... Is there an easy way to limit each task to one physical processor? I am not very skilled at linux so the word easy is important.


I thought my answer was probably too detailed to be regarded as "on-topic" in the thread where the question was asked.

Do you have the taskset command installed? (Enter the command and see whether it gives you brief usage instructions or an error.) (NB: tasksel is a different thing -- don't accept it as an alternative if it is offered!)

If it is not installed, Ubuntu and some Ubuntu derivatives will tell you which package to install -- but on Debian / Ubuntu / Linux Mint, try this before you go looking the hard way:

sudo apt-get install util-linux schedtool


Having installed the relevant commands, the next hardest thing is to figure out which apparent cores are doubled up into a physical core. On a four-core machine with HT on, logical cores 0 and 4 share a physical core, as do 1 and 5, 2 and 6, and so on. I do not know how that works with two sockets. Perhaps someone else can enlighten us? With a multi-socket motherboard, does that depend on the motherboard as well as on the processors?

This Page might help you find out how the numbering works on your system. It refers to a command called lstopo; on Debian-family systems that comes in the hwloc package, if it is not already installed. Good luck with that!
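
If lstopo is not to hand, here are two other ways to see the pairing -- a sketch assuming a reasonably recent util-linux and kernel, so check how they behave on your own system:

# one row per logical CPU, showing which core and socket it belongs to
lscpu --extended=CPU,CORE,SOCKET

# or ask the kernel directly which logical CPUs share physical core 0
cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list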


In the following I will assume that adding (or subtracting) 20 to (or from) the number of a virtual core brings you back to the same physical core. I am imagining cores 0-9 on socket A, 10-19 on socket B, then 20-29 back on socket A, and 30-39 on socket B again. But there are other possibilities.

Assuming you have figured out which core numbers share a physical core, the rest is straightforward.

My suggestion is to force each genefer thread onto its own core, choosing the higher number of each pair. So I would want to force the genefer threads onto logical cores 20 upwards (if the numbering is as I am guessing).

When genefer is already running, from the command line, run
top -H


This shows the process number of each thread and its CPU usage, among other things. If app_config said 10 threads, you will likely see 11. This is because the -nt count does not include an "overhead" thread that is always there in practice but uses no CPU. You should see 10 threads each using 99% or more CPU -- those are the process numbers you want to change. Press q to exit; the info you wanted from top usually stays on screen.
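
If you prefer a one-shot listing to interactive top, something like the following shows much the same information (a sketch: replace <pid> with the genefer process ID; the psr column is the processor each thread last ran on):

ps -Lo tid,pcpu,psr,comm -p <pid>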

The easiest syntax for the taskset command is

taskset -pc 20 <pid1>
taskset -pc 21 <pid2>
...


where 20, 21, etc. are the core numbers on which you want each thread to run, and <pidn> is the PID shown by top.
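
If typing one taskset per thread gets tedious, a rough loop along these lines pins all the busy threads in one go. This is only a sketch: <pid> is the genefer process ID as before, the 50% CPU filter is an arbitrary way of skipping the idle overhead thread, and you may need sudo if BOINC runs as its own user.

core=20
for tid in $(ps -Lo tid=,pcpu= -p <pid> | awk '$2 > 50 {print $1}'); do
    taskset -pc "$core" "$tid"   # pin this worker thread to one logical core
    core=$((core + 1))
done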

For the remainder of that thread's life, it will only run on the specified core. The Linux kernel will notice that the core is busy and will not schedule anything else alongside it unless it has already filled all the other cores.

Each thread will retain its local context in the local cache of its own core. This affinity setting survives the task being suspended in memory, but is lost if the task is suspended to disk, or if it restarts from a checkpoint.
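
You can check what a thread's affinity currently is (and confirm that your change stuck) by giving taskset a PID with no core list:

taskset -pc <pid1>    # prints e.g. "pid NNN's current affinity list: 20"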

BAD NEWS

You need to do this for each new CPU task, after it starts. Without altering the genefer threading code there seems to be no *easy* way to automate the affinity setting. There is no urgency about it: the taskset command gives you a small extra edge, not an enormous one. Do the taskset stuff when checking the machine, but (I suggest) don't get up in the night specially to do it.

Bonus info

The above means that each genefer thread stays in its "own" hypercore for the rest of its life, eliminating migrations that destroy the cache.

We can do even better: we can ensure that nothing else runs in that hypercore (it might run in the same physical core of course). To do this we confine the Linux kernel to using only one apparent core out of each physical one. If we never issue a taskset command, this will be just like disabling HT in BIOS, except that we retain the ability to move task threads into the excluded zone.

You need to edit a system file as root. First make a backup copy of it as it is,

sudo cp /etc/default/grub{,-bkp}


then open an editor, either

sudo gedit /etc/default/grub


or try "nano" or "xed" instead of gedit. CLI gurus will want to use vim or emacs.

Find the line that starts GRUB_CMDLINE_LINUX_DEFAULT; it should already have a string in quotes. Insert a space and then the following, just before the closing quote:

isolcpus=20-39


This "isolates" logical cores 20-39 from the Linux scheduler, but allows us to use them via taskset.

To include this in your boot code, from the command line run

sudo update-grub
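
Note that update-grub only rewrites the boot configuration -- the isolation takes effect at the next reboot. After rebooting, you can confirm which cores the kernel has set aside:

cat /sys/devices/system/cpu/isolated    # should print 20-39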



I hope that helps: how easy it is depends on your previous experience, but I have made it as easy as I know how.

Expect at best a 1% improvement -- as has been said, even without this fine-tuning the Linux scheduler gets it right most of the time, and xii5ku says it is not worth his while to do this. In my case I enjoy tweaking so I did it.

It did not affect my challenge standing: I would have been able to submit exactly the same number of WUs without it; that 1% was not enough to squeeze in an extra qualifying task.

With one exception, the above should not make things worse.

If performance drops drastically, the most likely explanation is that you have misunderstood which apparent cores are paired up -- and I can't really give you any help with that.


Hope that is useful to someone
3) Message boards : Number crunching : Winter Solstice Challenge (Message 123947)
Posted 1548 days ago by River~~
I decided to see how long the multithreaded CPU tasks are taking. In each chart below, p_model is the CPU type, count is the number of validated tasks, ...


Yes, those are interesting results. They show a huge decrease in end-to-end time, but without single-threaded results from the same machines we don't know whether throughput was better, worse, or about the same. Of course, for a challenge it is the end-to-end time that matters.

I think we can say the multi-threaded concept is hereby validated

I am fairly sure from those tables that I had the slowest machine to get a qualifying WU ;)

(Mine is the m3 in the 2-threads table with a quoted clock of 0.9GHz -- not only the slowest, but I think the only sub-GHz entry)

What seemed odd for a moment is that my other machine, which was fast enough to get a massive TWO tasks in, does not show up at all. It turns out its first WU was returned before the tables were produced and had not validated by that time, and its second WU was still in progress.

But I got the coveted "(1st)" marker on that one -- not something I had even hoped for.

My ancient desktop with 2.6GHz clock is old enough not to have AVX and would not have got anything in.

4) Message boards : Number crunching : Winter Solstice Challenge (Message 123699)
Posted 1554 days ago by River~~
I have noticed a consistent near-coincidence throughout this challenge, so far.

I wonder if it will go away now I mention it.

At the time of this post there are just under 400 different individuals on the scoreboard (392 to be exact) and the current top scorer, Ryan Propper, has returned either 401 or 402 workunits, judging by their score and the largest/smallest WU scores I have seen.

This means that over the challenge so far, the servers have been receiving work back from first-time reporters at almost the same rate as receiving work from Ryan's 56 machines.

That is just a quick look at the figures. After the challenge is over I might do a further analysis of how the work was distributed over the scoreboard. Dividing competitors into cohorts by the number of WUs returned, which cohort achieved the most?

Let me know (here or by PM) if you would be interested in that.

Intuition suggests it would be neither Ryan at the top, nor the cohort containing me near the bottom, but somewhere mid-table.

Warmly
R~~
5) Message boards : Number crunching : Winter Solstice Challenge (Message 123608)
Posted 1556 days ago by River~~
Michael Goetz wrote:
...
I also added <report_results_immediately/>, which goes outside of any <app> or <app_version> block. You only need this once. I'm not actually sure that this is necessary, but it won't hurt. This too is set on the server, but if the BOINC client is ignoring the server's <fraction_done_exact/> perhaps it's also ignoring the server's <report_results_immediately/>. By including it explicitly you can insure it's turned on.


I am using app_config to run GFN multithreaded, without the <report_results_immediately/> flag.

My first CPU GFN of this challenge completed computation at 01:07:20 and by 01:07:27 had been reported. My impression is that this is about as "immediate" as the upload and report process ever gets.

BOINC client versions may vary in how they treat the interaction between server settings and app_config settings -- the above was with client v7.6.33 on Linux, if that makes a difference.

As you say, no harm in making the flag explicit, and it's a wise precaution. But, like, don't panic if you left it out...
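
For anyone wanting to replicate the setup, here is a sketch of the kind of app_config.xml I mean. The app name, plan class, and thread count are placeholders -- check client_state.xml (or the forum threads) for the real values for your GFN subproject:

<app_config>
  <app_version>
    <app_name>genefer16</app_name>   <!-- placeholder: use the real app name -->
    <plan_class>mt</plan_class>      <!-- placeholder: match your tasks' plan class -->
    <avg_ncpus>10</avg_ncpus>        <!-- cores BOINC should budget for the task -->
    <cmdline>-nt 10</cmdline>        <!-- -nt sets the genefer thread count -->
  </app_version>
  <report_results_immediately/>      <!-- top level, outside any app block -->
</app_config>

It goes in the project's folder under the BOINC data directory; then tell the client to re-read config files, or restart it.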

In other news:

I got up to check at 2am and found it had uploaded almost an hour before. I am now going back to bed happy that I achieved my basic goal of getting at least one task onto the scoreboard, and happy that it completed sooner than I expected (the current projection is that I will have a total of 3 across 2 computers).

And delighted that my laptop with its passively cooled CPU has beaten a wingmate running the task on a GPU who is yet to return it. A nice surprise there.

We are currently just over a third of the way through, and there are just over fifty of us at the foot of the scoreboard with one task each (all slightly different scores, of course) -- about one in six individual participants. If I do get a total of 3 tasks by full time, I am likely to have some company in the three-task zone :)

I have put in a delay before the next task starts, hoping to get a higher b and therefore a few extra points to put me nearer the top of that zone. The tricky decision is how much delay: push it too far and one task will fall off the end, and I have no feel for how the times will increase.

I will achieve this delay by running a script to allow new work at a preset time (the Linux "at" command runs a script at an appointed time, and the script calls boinccmd).
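
For the curious, the mechanism is roughly this -- a sketch, with the time of day illustrative, and boinccmd may also need --passwd depending on your setup:

# stop fetching new work now, then allow it again at 03:30
boinccmd --project http://www.primegrid.com/ nomorework
echo 'boinccmd --project http://www.primegrid.com/ allowmorework' | at 03:30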

Sporting good wishes to fellow CPU crunchers
6) Message boards : Number crunching : Best FREE version of Linux (Message 123587)
Posted 1556 days ago by River~~
Hope it is helpful to add my two penn'orth at this late stage -- I only just saw this thread and thought I would share my experience from 2017.

Another twist to consider is to install Linux CLI-only (in my opinion the Debian net-install disk is the way to go for that).

Install boinc-client rather than boinc. If you install the full boinc package on a CLI machine, it pulls in a lot of graphics stuff.

Edit the BOINC config files to allow RPC access from other machines on your local network (you put a password into one file, and a list of IPs into another, if I remember right).
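
From memory, the two files are these -- paths as laid out by the Debian-family boinc-client package, so treat both the locations and the example address as things to verify on your own system:

# the password BOINC Manager must supply:
echo 'choose-a-password' | sudo tee /var/lib/boinc-client/gui_rpc_auth.cfg
# the machines allowed to connect, one IP per line:
echo '192.168.1.10' | sudo tee -a /var/lib/boinc-client/remote_hosts.cfg
sudo systemctl restart boinc-client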

Then, in BOINC Manager on your Windows machine on the same LAN, you can open another BOINC window and attach to the remote computer.

Depending on the Linux distro, it often works right away. In some cases you will also need to relax the firewall protection that the distro devs thought you would like.
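
The port to open is BOINC's GUI RPC port, TCP 31416. For example, with ufw you might allow just your LAN (adjust the range to match your network):

sudo ufw allow from 192.168.1.0/24 to any port 31416 proto tcp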

The only downside is that it is so much like running BOINC on the same machine that sometimes you forget which client a given BOINC Manager window is talking to.

A final option is to download and run BoincView (which has not been updated for years, but still runs OK under Windows, and runs reasonably under Linux using Wine). BoincView is a GUI specifically designed to display and control up to a dozen or twenty crunching boxes, one of which might or might not be the local host. It used to be a pain to read the small print and tiny icons, but modern monitors remove that problem.

In the TDP 2017 I was running eight identical PCs, none of them with a GUI, all of them controlled from a BoincView running on a laptop. I also ran that hardware when we had an AP challenge that year; one of my boxes found an AP, and altogether they were wingboxes for about ten more.

If I remember right, those boxes were about 3% faster running without graphics compared to running Linux with graphics. Windows was not considered because I did not have access to eight Windows licences (that might or might not matter to you -- but with Linux legal and free, my thought was: why break the law?).

I tell a lie: the boxes did have stickers entitling me to run Vista on them...

R~~
7) Message boards : Number crunching : Output File Absent (Message 123583)
Posted 1557 days ago by River~~
Just swapped a GTX 970 from one PC to another, tried to run GFN and I get an error after 2 seconds:

Output file genefer16_20259418_0_r1191446169_0 for task genefer16_20259418_0 absent

Both PC's were doing GFN before the swap with no problems.

Any suggestions?

Thanks


No suggestions, but I can comment that I have seen this on CPU tasks as well.

The output file is absent because the task fails before it begins to produce any output; it is the BOINC client that produces that message when it cannot find the file to send back.

So it could be almost any error that is almost immediately fatal at the point where control is passed to the PG code.

In the case of graphics cards my initial guess would be drivers?

More detail:

On CPU tasks I have seen it twice. Once it was triggered by grossly insufficient memory allocated to the virtual machine running BOINC; the other time it was an as-yet unidentified effect that only occurs when the Xen hypervisor runs a virtual machine using Qubes OS, and then only on PPS sieve. There were too many interacting bits to properly identify the actual trigger.

For your sake I hope it is drivers, as it is reasonably straightforward to check which ones were involved with each card before and after the swap

Don't waste time (as I did) looking for the output file.

Do look at the stderr listing on the website for your task -- there is a slim chance it might tell you something helpful.

R~~


8) Message boards : Number crunching : Some T5k stats (Message 123581)
Posted 1557 days ago by River~~
A lot of work went into this! I appreciated it silently at the time, but thought I would look back at it today.

It is cheeky of me to ask, but would you be willing to do a similar analysis around the end of the year?

How far the T5K boundaries have shifted in 2018 would (I think) interest many people.

R~~

(PS: of course, a valid response is that if I want this I should do it myself...)
9) Message boards : Number crunching : Winter Solstice Challenge (Message 123580)
Posted 1557 days ago by River~~

I had five boxes reboot overnight, grrrr, thankfully no driver issues........ (I hope).


I hear ya. One here only, but I woke up and was like why is it so cool in here? GPUs were idling not crunching.


Sincere commiserations to everyone who has been slowed down by this. Penguin posted about 48 hours into the challenge -- and I just picked his post to quote at random.

I can afford to be sincere: as a CPU-only participant, in just one day anyone with a GPU will be so far ahead of me that it makes no difference.

This strikes me as being like Ferrari doing a recall on its racing cars half way through Le Mans. ;)

Microsoft does not help its own machines by leaving you no wiggle room on updates. (And yes, I do understand why they think that is a good idea.)

When I was performance-testing on the 9th, to see how many threads to run concurrently, I started to do the same tests on Windows. OK, OK, I am a Linux fanboy, but if Windows had run faster I would have used it for the challenge; I am not that one-track-minded.

The first machine I tried Windows on locked me out for Windows Update (probably that gradual roll-out that stream mentioned, because it took a long time). After twenty minutes of "configuring updates" with no clue how much longer it would take, I forced a reboot of the same hardware into Linux -- and that is the slower of the two machines I have in the challenge.

After that experience I did not even try to test GFN21 on the faster machine under Windows -- that one is representing me under Linux too.

My suggestion for the next challenge is to dual boot your machine. Get BOINC running under each OS.

If Windows is your preferred OS, or if it runs faster for the tasks you are running, go with that.

But you will have a tested, working alternate system to fall back on if the Windows BOINC falls over, whether that is caused by Microsoft, or by a virus, or whatever.

(Ditto if it is a Mac -- get it ready to run under Linux as an alternative)

If you feel inclined to take my advice, one further annoyance is that Windows Update usually prevents you from booting into installed alternatives.

My tip: immediately after you install the dual boot, as soon as you are in your installed Linux system, run grub-install again from the command line, this time installing onto a dedicated USB stick. This installs only a very few files onto the USB, but sets up the bootloader.
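
Something like this, run from the installed Linux system (a sketch for a BIOS/MBR setup -- EFI machines differ, and do double-check the device name with lsblk first, because this writes to the stick's boot record):

sudo grub-install /dev/sdX    # sdX = the USB stick, NOT your internal drive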

That means you can get back into Linux when Windows Update next deletes the Linux boot info. Windows Update does not trash your whole install, just the bootloader, and that USB stick gets you back in.

Hope that seems useful.

R~~
10) Message boards : Number crunching : Please consider greater variability in deadlines at the "short" end (Message 123579)
Posted 1557 days ago by River~~

Once-a-week will be even worse on long tasks, since they'll probably only be running a few hours a week. I lump them in the same category as once-a-month or once-a-year: usually they're not going to complete anything.

I did not explain my point well at all.

I think you took me to mean machines that only run for a few hours once a week -- and in that context I agree with your response.

In contrast, I was thinking of machines that are crunching 24/7 but for one reason or another only connect to the PG servers once a week -- including machines that crunch PG 24/7, like mine at present.

Most of the points towards my SOB badge were generated on machines that were "on" 24/7 but "offline", touching the internet only sporadically. They were used as background heating in an outbuilding, as I preferred to use the electricity to do something else as well as keep the frost off. (In really cold weather, additional electric heat was deployed.) Through the quirks of thermodynamics, it costs no more power to deliver so many watts of heat with crunching on the side than it costs to run a dumb fan heater.

My first ever distributed computing contribution was from a machine that took 120 days to crunch its first task (CPDN, in its pre-BOINC incarnation) and was connected at weekends to trickle up.

My two computers currently taking part in the Winter Solstice Challenge will connect on around four occasions during the challenge -- right now they are running offline with trickles pending, and I won't be connecting them until one of them has a task to return. After that, each will connect when it has done all it can complete during the challenge.

They are crunching throughout the challenge even though you will not see trickles every day.

It would be easy (I am guessing) to distinguish that from a machine that has run PG for a few hours once a week.

Reflecting on this use case, does your opinion change? Or do you still feel that this use case is too rare to warrant making changes?

It's up to the user to run tasks that are appropriate to his computer and lifestyle. It's simply not possible for us to accommodate *everyone*.


Absolutely. And if something is not worth doing, then it does not matter how easy it might be. I do get that.

R~~

