PrimeGrid
1) Message boards : Number crunching : LLR2 installed on all big LLR projects (Message 143411)
Posted 11 hours ago by Michael Goetz
We haven't looked at it yet, but since 321 now takes half as long to check as it did before, the optimal sieving point for the 321 sieve should be half of what it was before.
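For anyone who wants to see where that factor of two comes from, here's the usual back-of-envelope rule, sketched in Python with made-up numbers (these are not our real figures): sieving stays worthwhile only while removing one candidate by sieving is cheaper than the LLR work that candidate would otherwise cost, and fast double checks roughly halve that LLR work.

# Toy break-even comparison; every number here is invented for illustration.
avg_llr_hours = 40.0                            # hypothetical average first-test time over the remaining sieve range
old_cost_per_candidate = 2.00 * avg_llr_hours   # old scheme: full double check, every candidate tested twice
new_cost_per_candidate = 1.01 * avg_llr_hours   # LLR2: one full test plus a fast DC that's about 1% of it

# Keep sieving only while a factor is found faster than this many hours:
print("old break-even:", old_cost_per_candidate, "hours per factor")
print("new break-even:", new_cost_per_candidate, "hours per factor")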

Off the top of my head, we're almost certainly now past the revised optimal sieving point. We haven't run the numbers yet, but those of you prone to panicking about getting that next/last badge whenever something is shut down should probably start thinking about your plans for 321 sieve badges.

It's entirely possible that the official 30-day warning may be given in the very near future. On the other hand, this decision has not been made yet, and there are other reasons to keep the sieve running. So it's not 100% certain that we'll be suspending the sieve soon, but it's certainly possible.

Consider this to be the pre-warning to the 30-day warning for the suspension of the 321-sieve.
2) Message boards : 321 Prime Search : More than 60 hours to find a factor? (Message 143410)
Posted 11 hours ago by Michael Goetz
I was looking through my stats, and from a little over 3000 sieve tasks I found 55 factors, or one in 63 tasks. Since each task now takes more than one hour, that's 63 hours to eliminate one candidate.

321 LLR takes 8 hours on average, so I could have done 350 or more LLR tests instead of finding 55 composites.

I know the n-range of the sieve is much larger, but it still seems a waste of time to me. By the time LLR reaches the current n values of the sieve, computers will be faster as well, so will they really take 63 hours to LLR the same number?

Or am I missing something? I'm quite sure you guys have a good reason to keep the sieve running even now, but could someone please explain? :)


You answered your own question: We're not just sieving the candidates we're testing now, but also the candidates we'll be testing with LLR years from now. 321 is currently in the n=16M range, and we're sieving up to n=50M. Those tasks will take about 10 times as long to run on LLR as the current tasks.

So, yes, you found 55 factors, but by the end of the sieve file, you would only be able to do about 35 tasks on LLR in that time, rather than the 350 you could do today.
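The rough arithmetic behind that, assuming LLR test time grows roughly with n squared (an approximation, since both the FFT length and the iteration count grow with n):

# Back-of-envelope scaling sketch; treats LLR time as proportional to n^2.
n_now, n_end = 16e6, 50e6                 # current 321 leading edge vs. the top of the sieve range
scale = (n_end / n_now) ** 2
print(round(scale, 1))                    # ~9.8, i.e. tests near n=50M take roughly 10x as long
print(int(350 / scale))                   # ~35: what 350 of today's tests buys at the far end of the sieve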

That being said, we just switched to fast double checks, which effectively makes LLR twice as fast relative to the sieve (or, equivalently, the sieve twice as slow relative to LLR). When this is taken into account, we may already be at the point where it's time to stop sieving 321. More information on this aspect will be forthcoming once we decide what we're going to do.
3) Message boards : Number crunching : LLR2 installed on all big LLR projects (Message 143385)
Posted 18 hours ago by Michael Goetz
All seven of the projects where LLR2 fast double checking is enabled should now be primarily sending out tasks using fast double checking. There will be some stragglers with the old LLR for a while, because of resends, but the old tasks have all now been sent out.
4) Message boards : Number crunching : LLR2 installed on all big LLR projects (Message 143372)
Posted 1 day ago by Michael Goetz
The initial (large) tasks and the double check (small) tasks all show up as 1st. Is/was this expected?


Yes.
5) Message boards : Number crunching : LLR2 installed on all big LLR projects (Message 143362)
Posted 1 day ago by Michael Goetz
There seems to be quite a bit of confusion as to how LLR2 fast double checking works. I'm going to describe what the process is, so you have a better understanding of what to expect, and also what the drawbacks are. I'm going to skip all the math parts that make this possible, because it's not necessary to understand the "how" in order to comprehend the "what".


OLD LLR process:

Identical tasks are sent to two computers, which run the full, long, computation. A very short result is sent back to the server, and the results from the two computers must match to be validated. We're all familiar with this paradigm.


NEW LLR2 process:

Just one full task is sent out. At the end of the computation, the same (or at least similar) short result is returned to the server.

There's an additional new step, however. At various times during the computation, LLR2 is recording checkpoints to disk. These are fairly large, and a substantial number of them are recorded. This takes up disk space on your computer, so you'll need somewhat more disk space for LLR2 than you did for LLR.

At the end of the computation, all of those checkpoints are compressed, and the compressed checkpoints are sent to the server along with the normal short result. The checkpoints are then deleted from your computer, freeing up your disk space.

Sending the compressed checkpoints to the server uses a lot of bandwidth, and then the checkpoint files use a lot of disk space on the server. When we built these servers, our applications did not consume a lot of bandwidth and did not use a lot of disk space. The current servers aren't designed for LLR2's requirements, so we have to manage LLR2's rollout very carefully, or PrimeGrid will essentially die.

Once the short result and the compressed checkpoints are sent to the server, your computer tells the server that the task is completed. Your task then goes into the "pending validation" state.

The server now uses the compressed checkpoints from your full task to create the fast DC task. It decompresses the checkpoints, does a moderately lengthy FFT-style computation (a few tens of seconds on a single core for a TRP task), and creates the fast DC task in a new workunit. The large checkpoint files are then deleted, freeing up disk space on the server. The new fast DC task gets sent to a user, is quickly processed (it's less than 1% of the main task), and is sent back to the server. The server checks this result against information from the main result, and if they agree, both tasks are validated and get credit. The fast DC's credit is proportionally smaller than the credit for the main task, of course.
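For the curious, here's a very rough Python sketch of that lifecycle. It's purely illustrative: the names, sizes, and especially the verification step are invented (the real LLR2 proof is far cleverer, which is why the real fast DC is under 1% of the main task), but the flow of checkpoints from client to server to double checker is the same.

# Toy sketch of the LLR2 fast-DC lifecycle. Not real LLR2/BOINC code; all names and
# parameters are invented for illustration.
import gzip, pickle

def run_main_task(N, iterations, checkpoint_every):
    # Client side: run the long computation, saving periodic checkpoints.
    checkpoints, u = [], 4                            # placeholder starting value
    for i in range(1, iterations + 1):
        u = (u * u - 2) % N                           # LLR-style repeated squaring step
        if i % checkpoint_every == 0:
            checkpoints.append((i, u))                # in the real client these go to disk
    short_result = u % 2**64                          # short residue reported to the server
    blob = gzip.compress(pickle.dumps(checkpoints))   # compressed checkpoints are uploaded too
    return short_result, blob                         # the local checkpoint files are then deleted

def build_fast_dc_task(blob):
    # Server side: unpack the uploaded checkpoints, build the small DC task,
    # then delete the big checkpoint files to free disk space.
    checkpoints = pickle.loads(gzip.decompress(blob))
    return {"start": checkpoints[-1]}                 # toy DC: just re-run the final stretch

def run_fast_dc(dc_task, N, iterations):
    # Double-checker side: a short computation whose result the server can compare
    # against the main task's short result.
    i, u = dc_task["start"]
    for _ in range(iterations - i):
        u = (u * u - 2) % N
    return u % 2**64

N, iters, every = 2**521 - 1, 2000, 128
short_result, blob = run_main_task(N, iters, every)
dc_result = run_fast_dc(build_fast_dc_task(blob), N, iters)
print("validated:", short_result == dc_result)        # a match means both tasks get credit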

From the server's perspective, there are three problems here:

* The bandwidth used sending the compressed checkpoints from the main task up to the server.

* The CPU time consumed processing those checkpoints in order to create the fast DC tasks.

* The potential for exhausting all of the server's disk space if we fall behind in creating the fast DC tasks.

Smaller, more numerous tasks are a bigger problem than a small number of larger tasks. This is why we're moving the big tasks to LLR2 while the small tasks, at least for now, are staying with the old-style full double checking.
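To put rough (and purely hypothetical) numbers on that last point: the server does about the same per-task work, one checkpoint upload to receive and one proof to process, no matter how long the task ran, so shorter tasks multiply the server-side load for the same amount of crunching.

# Hypothetical illustration of per-task server overhead vs. task length.
host_hours_per_day = 24.0
for name, task_hours in [("long LLR2 task", 12.0), ("short LLR2 task", 0.25)]:
    tasks_per_day = host_hours_per_day / task_hours
    print(f"{name}: ~{tasks_per_day:.0f} checkpoint uploads + proofs per host per day")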
6) Message boards : Number crunching : LLR2 installed on all big LLR projects (Message 143359)
Posted 1 day ago by Michael Goetz
Given that all of the work for DIV has been generated, would it make sense to remove, re-process, and move it to LLR2?


The state of DIV is currently being discussed. It's a big temptation to move it to LLR2 considering the upcoming challenge, which would then get a 2x boost. On the other hand, the challenge is also a problem: the network bandwidth required to upload the checkpoints and the CPU resources needed to validate them in a reasonable time may overload our current infrastructure.


One option would be to only send out the actual tasks during the challenge and then, after it's over, send out the checkpoint tasks. Since the checkpoint tasks are smaller, they will be done very quickly, and most people will get their credits for the challenge within a couple of days after it's over. There has always been 'cleanup time' after a challenge anyway, so this would just be part of that. You could even put a 2 or 3 day time limit on the checkpoint tasks to ensure they get done ASAP.


That is the worst possible thing we could do.

The main LLR2 tasks generate large checkpoint files that must be sent to the server (that's the bandwidth problem), and then must be stored on the server until the server processes them and creates the fast double check task (that's the disk space problem). If the creation of the double check tasks is delayed -- either because we do it intentionally, or because the server falls behind -- those files start filling up our disks.

The expected number of tasks processed during the DIV Cullen/Woodall challenge will consume over three times the amount of free disk space that exists on the server. If we don't process those files and create the double check tasks promptly, the server dies. PrimeGrid would be completely shut down for an extended period while we clean up the fiasco.
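As a toy model of why that's fatal (every number below is invented, not our actual capacity): whenever compressed checkpoints arrive faster than the server turns them into fast DC tasks and deletes them, the backlog eats disk at the difference between the two rates.

# Invented numbers, for illustration only.
arrive_gb_per_hour  = 50.0    # compressed checkpoints uploaded during the challenge
process_gb_per_hour = 30.0    # checkpoints converted into fast DC tasks and then deleted
free_disk_gb        = 500.0

backlog_growth = arrive_gb_per_hour - process_gb_per_hour
if backlog_growth > 0:
    print(f"disk full after about {free_disk_gb / backlog_growth:.0f} hours")
else:
    print("the server keeps up and the backlog never grows")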

Actually sending out the double check tasks, once they're created, and processing their results, have very little impact on the server.
7) Message boards : Number crunching : GPU queue (Message 143332)
Posted 3 days ago by Michael Goetz
What a mess.

Ok, first: the limit for CPUs and the limit for GPUs have both been increased to 1000. As you figured out, that's total, not per core. It can't be made any larger than that or you may start having trouble reporting the tasks back to the server.

It used to be per core, but there's a bug in the BOINC software. Unless you set your "Max threads per task" to "unlimited", you don't get the per-core part. And, of course, you don't want to run your SGS tasks multi-threaded using ALL cores, so that's not a great option. You might, maybe, possibly be able to beat BOINC into submission by setting the threads to "no limit" and then using app_info to force it to run single threaded. Go ahead and try it if you want more than 1000 tasks. If you do, let us know how it works out.

Now I've got a question for you: Why do you want such a large queue for tasks like these? You're very likely to have almost all the tasks come in second, no?
8) Message boards : General discussion : You've found a huge prime, now what ? (Message 143327)
Posted 3 days ago by Michael Goetz
I think if I found big one I would have it engraved on my headstone when I die. They better not get any digits mixed up or I'll come back and haunt them :)


I think they charge by the letter. :)
9) Message boards : Number crunching : LLR2 installed on all big LLR projects (Message 143326)
Posted 3 days ago by Michael Goetz
greetings ... I have some 9.00 SoB wu's downloaded ... so I guess there are no old 8.04 work units ... left ...


Indeed, as of right now, SoB, TRP, 321, and PSP are sending out the new 9.00 tasks.

Woodall still has 287 old workunits to go. Cullen still has 13 to go. Finally, ESP has 241 to go.
10) Message boards : Number crunching : GPU queue (Message 143322)
Posted 3 days ago by Michael Goetz
While we're there, I guess it's worth mentioning that the 300 CPU task limit also doesn't last very long with the shorter tasks and 8+ core CPUs.


That limit is per core. It's 320 tasks per core. That's 2,560 tasks for an 8-core CPU.

Likewise, the 100 task limit on GPUs is per GPU, but an RTX 3090 is still just one GPU. It can chew through 100 small tasks very quickly.

Was it changed during the challenge? I was only able to download 320 total even on a 48 core.


It's always per core.

It is possible that the number was lowered to less than 320, but that's usually only for the start of the challenge. Then it's raised back to normal.

I don't do the challenges anymore, so I'm not sure what was done during this last challenge.


it may have once been per core but it isn't any more:




I just tried it, and I couldn't get more than 320 SGS tasks either. Not sure what's up here. That's not the way it's supposed to be working.

