PrimeGrid
1) Message boards : Number crunching : LLR2 installed on all big LLR projects (Message 143768)
Posted 22 hours ago by Profile Michael GoetzProject donor
composite wrote:
Something's wrong for me too.


AAAAAAAAAAAAAAAAAAAAAAAAAAAAAH!

You're running DO directly, you're not using TSC. Oops.

What I said in the previous message still applies -- except it's your responsibility to make sure you have several GB of swap space available rather than Rytis's.

I recommend at least a 5 GB swap file.
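For anyone provisioning a droplet by hand, here is a minimal sketch of setting up a 5 GB swap file on Linux. The path /swapfile is an assumption, and the commands that need root are shown as comments rather than executed:

```shell
# Sketch: provision a 5 GB swap file on a Linux droplet.
# /swapfile is an assumed path; the privileged commands are commented out.
SWAP_FILE=/swapfile
SWAP_MB=$((5 * 1024))   # 5 GB expressed in MiB
echo "would allocate ${SWAP_MB} MiB at ${SWAP_FILE}"
# As root:
#   fallocate -l ${SWAP_MB}M ${SWAP_FILE}
#   chmod 600 ${SWAP_FILE}
#   mkswap ${SWAP_FILE}
#   swapon ${SWAP_FILE}
#   free -h   # verify the new swap is active
```

On ext4 fallocate is fine; on filesystems that don't support it, `dd if=/dev/zero of=/swapfile bs=1M count=5120` is the usual fallback.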
2) Message boards : Number crunching : LLR2 installed on all big LLR projects (Message 143767)
Posted 23 hours ago by Profile Michael GoetzProject donor
Something's wrong for me too. I've just had 2 LLR2 SR5 tasks fail on a DO droplet that has never had failures before - in fact it's already run several LLR2 SR5 tasks without issues in the last few days.

Does LLR2 have a memory leak? Its memory usage keeps growing as the computation continues - the PrimeGrid website also tells me that the first task was killed for using too much RAM, and you can see on the left of this graph the moment it was killed. With the second one, memory usage was again growing, and then when it came to generating the files, you can see that it absolutely slammed the disk before suddenly saying that every file was missing. (link to the task: https://www.primegrid.com/result.php?resultid=1133440100)

It can't be a disk space issue because the amount of disk storage used remained steady at 12% the whole time. BOINC is allowed to use a full 25% of the disk's storage.


It turns out that our "fix" was actually always in place. At this point, we don't understand why your first task failed.

The behavior you see is correct for the second task on those charts. With SR5 (or GCW), to work around a problem in gwnum, we keep those large checkpoint files in memory as the calculation progresses. That's why you see the memory ramping up. It's not a memory leak; it's by design.

On a 1 GB droplet, you *almost* have room to fit those files in memory, but not quite. So it starts using the swap file to hold the excess. That's fine because we're not actively using that data until the very end. You see this on the second task, where the memory levels out and there's disk activity as the swap file becomes active.
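As a back-of-the-envelope illustration of that ramp (the checkpoint size and interval below are made-up numbers for illustration, not PrimeGrid's actual values), caching one fixed-size checkpoint every N iterations grows resident memory linearly with progress:

```shell
# Toy model of checkpoint caching: memory grows linearly with progress.
# checkpoint_mb and checkpoint_every are assumed values, not real ones.
checkpoint_mb=30
checkpoint_every=100000
iterations_done=1500000
cached=$(( iterations_done / checkpoint_every ))   # checkpoints held so far
echo "cached checkpoints: ${cached}, approx RAM: $(( cached * checkpoint_mb )) MB"
```

On a 1 GB droplet that line eventually crosses physical RAM, which is exactly the point where the swap file starts absorbing the excess.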
3) Message boards : Number crunching : LLR2 installed on all big LLR projects (Message 143763)
Posted 1 day ago by Profile Michael GoetzProject donor
Until Rytis pushes the change, however, neither GCW nor SR5 will work on 1 or 3 core droplets.


The change is now in place, and all PG apps can be used on 1 and 3 core droplets. The only restriction is that you can only run a single GCW task at a time on the 3 core droplets. If you're running GCW on a TSC 3 core droplet, make sure to set "Multi-threading: Max # of threads for each task" to "No Limit" on PrimeGrid's Project Preferences Settings.
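If you prefer to enforce the one-GCW-task-at-a-time rule on the client side instead of via the website preferences, BOINC's app_config.xml (placed in the PrimeGrid project directory) can cap concurrency. The app name llrGCW below is an assumption; check client_state.xml for the exact name on your host:

```xml
<!-- app_config.xml in the PrimeGrid project directory (sketch).
     "llrGCW" is an assumed app name; verify it in client_state.xml. -->
<app_config>
  <app>
    <name>llrGCW</name>
    <max_concurrent>1</max_concurrent>
  </app>
</app_config>
```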
4) Message boards : Number crunching : LLR2 installed on all big LLR projects (Message 143759)
Posted 1 day ago by Profile Michael GoetzProject donor
I am curious about the first graph - the memory builds up until just before midnight, and then again until just before 12 pm. Is there something about 12/24 hour clocks?


The first big drop is the first LLR2 SR5 task that failed. The moment where it drops is right where it got terminated for using too much memory. After that, the second build-up is the second LLR2 SR5 task, which also eventually failed after using around the same amount of RAM.

After SR5 switched to LLR2 the tasks grew to about 12 hours each (up from around 8 hours before), which is why the graph looks like it's running from midnight to noon - a 12 hour period. In fact the time shown in that graph is in my time zone, not the VPS's time zone, so the precise hours are merely a coincidence. :)

Typically, these droplets will have 3 cores and 1GB. If you’re running single threaded you have only about 300 MB per task. I’m not sure if virtual memory is enabled by default on these images. If not, the tasks will just fail if memory is exceeded.


In my case it's a single-core droplet, so one task has almost all of the 1GB RAM to itself. I didn't expect a single task to end up using what appears to be >800MB of RAM.

Pavel, thanks for the advice about b!=2 tests. For now I've switched the droplet to TRP and 321 LLR2, hopefully things run a bit more smoothly now. :)


<insert lots of admin chat here>

Rytis should be able to put in a fix for this tomorrow, hopefully. The reason DO/TSC is so inexpensive is, in part, because the droplets have very little memory.

Which we never needed.

Until now.

We have a workaround. It won't slow down the calculation. It will just work.

CAUTION: On a three core droplet, if you're going to run GCW, you MUST run -t3. These use a lot of memory and you can't fit three of them in 1 GB. With SR5, the memory is used just for the cached checkpoints, and that's OK. You should be able to run 3 of them in 1 GB after the fix. But not GCW.

Until Rytis pushes the change, however, neither GCW nor SR5 will work on 1 or 3 core droplets.
5) Message boards : Number crunching : LLR2 installed on all big LLR projects (Message 143749)
Posted 2 days ago by Profile Michael GoetzProject donor
This is normal behavior. As mentioned before, we had to turn on caching of intermediate checkpoints in memory for GCW and SR5. Due to radix conversion issues of b!=2 numbers, file reading is extremely slow (for now). Checkpoints are stored in memory in ready-to-use format, so they can be accessed fast. Note that they're not needed until the end of the test (until the "compression" stage), so it's okay if they're swapped out to disk.

Please increase BOINC memory limits if you're doing SR5 or GCW.


Typically, these droplets will have 3 cores and 1GB. If you’re running single threaded you have only about 300 MB per task. I’m not sure if virtual memory is enabled by default on these images. If not, the tasks will just fail if memory is exceeded.
6) Message boards : Number crunching : How to compare performance for a particular sub-project (Message 143738)
Posted 2 days ago by Profile Michael GoetzProject donor
I'm curious to see the (near) median values for cpu_time and elapsed_time for this example.
select MIN(cpu_time) FROM (
    SELECT TOP 50 PERCENT cpu_time
    from result r
    join host h on r.hostid = h.id
    join workunit w on r.workunitid = w.id
    where r.appid = 13 and server_state = 5 and outcome = 1
      and validate_state in (0,1,4)
      and !(w.opaque & 0x4000)
      and h.p_model like '%3970x%'
);
select MIN(elapsed_time) FROM (
    SELECT TOP 50 PERCENT elapsed_time
    from result r
    join host h on r.hostid = h.id
    join workunit w on r.workunitid = w.id
    where r.appid = 13 and server_state = 5 and outcome = 1
      and validate_state in (0,1,4)
      and !(w.opaque & 0x4000)
      and h.p_model like '%3970x%'
);



MySQL does not support TOP.
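For what it's worth, a hedged sketch of how the same median could be expressed in MySQL 8+ using window functions (table and column names are taken from the query above; this is untested against PrimeGrid's actual schema):

```sql
-- Median cpu_time via ROW_NUMBER(), MySQL 8+ (sketch, untested).
-- Averaging the two middle rows handles even row counts.
SELECT AVG(cpu_time) AS median_cpu_time
FROM (
    SELECT cpu_time,
           ROW_NUMBER() OVER (ORDER BY cpu_time) AS rn,
           COUNT(*)    OVER ()                   AS cnt
    FROM result r
    JOIN host h     ON r.hostid = h.id
    JOIN workunit w ON r.workunitid = w.id
    WHERE r.appid = 13
      AND server_state = 5 AND outcome = 1
      AND validate_state IN (0,1,4)
      AND !(w.opaque & 0x4000)
      AND h.p_model LIKE '%3970x%'
) t
WHERE rn IN (FLOOR((cnt + 1) / 2), CEIL((cnt + 1) / 2));
```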
7) Message boards : Number crunching : LLR2 installed on all big LLR projects (Message 143728)
Posted 2 days ago by Profile Michael GoetzProject donor
One of my hosts has problems with LLR2 Extended Sierpinski Problem v9.00.

It crunches the WU, but then file transfer errors appear. Proof-of-computation transfers?

http://www.primegrid.com/result.php?resultid=1130877910

<file_xfer_error>
  <file_name>llrESP_345876363_0_r1102069354_0</file_name>
  <error_code>-161</error_code>
</file_xfer_error>


It could be an old BOINC client or an old Linux... I haven't tested it yet on other LLR projects.


Are you running out of disk space? Not just actual space on the disk, but the amount of space you've configured BOINC to use? LLR2 uses a lot of disk space compared to the old LLR.
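A quick way to check the actual free space on a Linux host (the data directory below is an assumption; Debian/Ubuntu packages default to /var/lib/boinc-client). BOINC's own usage cap is a separate setting under its computing preferences:

```shell
# Check free disk space where BOINC lives; the path is an assumed default.
BOINC_DIR="${BOINC_DIR:-/var/lib/boinc-client}"
df -h "$BOINC_DIR" 2>/dev/null || df -h /
```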
8) Message boards : 321 Prime Search : 321 Sieve is being SUSPENDED (Message 143686)
Posted 4 days ago by Profile Michael GoetzProject donor

We will need to restart the CW sieve at some point.

That would be a disaster.


Why would it be a disaster?
9) Message boards : Number crunching : How to compare performance for a particular sub-project (Message 143675)
Posted 4 days ago by Profile Michael GoetzProject donor
I have particular problems with SOB, and I'm unsure whether my settings are in the best config. I have 3 GPUs, and task times between the cards (all 2070 Supers) seem to vary quite a bit as well. My system acts funny when I turn HT off - I'm running Ubuntu, and when I turn HT off in the BIOS, funny things happen. Generally, 50% of CPUs gets me the best result. I found that SOB runs best using 8 threads per task, whereas for most other subprojects the magic number is 4 threads per task.


+---------+---------+----------+-------------------+-------------------+
| userid  | hostid  | count(*) | avg(cpu_time)     | avg(elapsed_time) |
+---------+---------+----------+-------------------+-------------------+
|         |         |       18 | 7276841.777777778 | 895767.7073937221 |
|         |         |       26 | 5213018.423076923 | 639105.9981281538 |
| 1257095 | 1002364 |        4 | 801615.3          | 105434.445987     |
+---------+---------+----------+-------------------+-------------------+


(SQL:
select r.userid, r.hostid, count(*), avg(cpu_time), avg(elapsed_time)
from result r
join host h on r.hostid = h.id
join workunit w on r.workunitid = w.id
where r.appid = 13 and server_state = 5 and outcome = 1
  and validate_state in (0,1,4)
  and !(w.opaque & 0x4000)
  and h.p_model like '%3970x%'
group by r.userid, r.hostid;)

Considering SOB times vary depending on the K, I'm not sure if there's any useful information you can get from that data. It excludes the fast DC tasks. Those would really mess with the numbers!

EDIT: I incorrectly removed the fast DC tasks before. The numbers now are correct. The sample size is larger but the times didn't change much.
10) Message boards : Number crunching : How to compare performance for a particular sub-project (Message 143651)
Posted 5 days ago by Profile Michael GoetzProject donor
Let's say I want to crunch SOB. And I want to compare my computer's performance against the rest of the field for that sub-project. Specifically, I want to see what machines are fastest, and then drill down to see their configs (and threads per task). Is there a way to show all results and filter by sub-project?


Unfortunately not.

I can't even do that manually querying the database. At least not if I want to get solid, reliable results.

The reason I don't even try to get that information for myself, let alone make it available to users, is that the signal to noise ratio is so low as to make it almost useless.

While I can see the CPU models, and with some difficulty can see how many threads were used, we don't have any information about hyperthreading, what else is running on the computer, clock speeds, memory speeds, etc.


Copyright © 2005 - 2020 Rytis Slatkevičius (contact) and PrimeGrid community.
Generated 30 Sep 2020 | 10:51:48 UTC