Join PrimeGrid
Returning Participants
Community
Leader Boards
Results
Other
drummerslowrise

Message boards :
Sieving :
Questions about sieving process..
Author 
Message 

1. Am I right in assuming that you are looking for lowest possible p for a given n, or is there only one possible combination, or does the p even matter? If lowest, then I am confused as to why the woodall sieve has been sieved with so high a p before the low range p was sieved...? Won't you miss lower p's for the n's?  Or am I misunderstanding the goal entirely, and that's not to identify the p's at all, it's merely to remove the n's from the sieve file once a possible p is discovered?
2. Does the Sieve time per p decrease slightly with each factor discovered? (and subsequently smaller sieve file..)  

JohnHonorary cruncher
Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0

1. Am I right in assuming that you are looking for lowest possible p for a given n, or is there only one possible combination, or does the p even matter?
Basically yes, lowest p that factors n*2^n1 (Woodall) or n*2^n+1 (Cullen). Lowest meaning that we started out at 1 and are working our way up towards optimal sieve depth.
Minimally, optimal sieve depth means sieving to a point in which the time it takes to find a factor p is the same as the time it takes to LLR test an n. There are other reasons to sieve higher/deeper...mainly error rates in the LLR testing. i.e. LLR returning an incorrect result...which happens. So that's why double checking an LLR result can be a good thing.
Finding a factor in the sieving processs, we definitely know that that n is not prime.
If lowest, then I am confused as to why the woodall sieve has been sieved with so high a p before the low range p was sieved...? Won't you miss lower p's for the n's?  Or am I misunderstanding the goal entirely...
If you're referring to the Woodall 2M sieve file where 1490G1500G was sieved after 1090G, that was just testing.
At p=1500G, it was taking twice as long to find a factor as it was to test n=2M. Also, PG was ready for new work so we decided to stop sieving at p=1090G and start on the 10M range. Was it optimal, no. But it was a good depth.
Yes, there are lower p's in the 1090G1490G that would have removed n...and if we had more time, we would have sieved to around 1200G.
2. Does the Sieve time per p decrease slightly with each factor discovered? (and subsequently smaller sieve file..)
Yes, we have found out that sieve time is related proportionally to the size of the sieve file...number of n remaining.
Therefore, the smaller the sieve file (less n), the faster the sieve.
Currently the sieve files are split up as so: 1<n<2M, 2M<n<10M, 10M<n<25M. This will more than likely change as we do our best to stay ahead of LLR testing. Right now, we're behind. We may have to break out 2M<n<3M and sieve it separately.
Also, we'll look for more appropriate break points.
Ideally we'd like to sieve one file and break off the front of it as needed for LLR testing by PG. This may never happen but it's a goal. Right now, n=10M is a lofty goal and will give us plenty of time to decide the best course. The 25M sieve will have to wait until then.
____________
 


So then, to *FIRST* sieve 110M140M, and then going back and sieve (for the first time) 0110M using the sieve file from 110M140M would be improper?
Case in question :
current sieve file and corresponding gcwsievecommandline file: woodall10M_p170Gplus520G.zip (291767 candidates remaining at p=170G including 200G520G factors)
Yet, 192200G has not been sieved yet... I'm not understanding why the results from the 200G520G sieve are being removed from the list of candidates for 192200G....?  

JohnHonorary cruncher
Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0

So then, to *FIRST* sieve 110M140M, and then going back and sieve (for the first time) 0110M using the sieve file from 110M140M would be improper?
Case in question :
current sieve file and corresponding gcwsievecommandline file: woodall10M_p170Gplus520G.zip (291767 candidates remaining at p=170G including 200G520G factors)
Yet, 192200G has not been sieved yet... I'm not understanding why the results from the 200G520G sieve are being removed from the list of candidates for 192200G....?
We are working with one master sieve file, Woodall10M (2000000<n<10000000), and running factor ranges against it. As we run factor ranges (p; example 200G520G or 200000000000520000000000) it removes candidates (n) from the master sieve file in which sieving finds factors for.
Since p=200G520G was already completed, I decided to remove the candidates (n) in which factors were found, from the master sieve file...so the sieve would run faster. 11033 candidates (n) were removed from master file Woodall 10M which represented about 4% decrease in candidates.
edit: factor ranges 192G200G will still find factors for the remaining n.
____________
 


So then, to *FIRST* sieve 110M140M, and then going back and sieve (for the first time) 0110M using the sieve file from 110M140M would be improper?
Case in question :
current sieve file and corresponding gcwsievecommandline file: woodall10M_p170Gplus520G.zip (291767 candidates remaining at p=170G including 200G520G factors)
Yet, 192200G has not been sieved yet... I'm not understanding why the results from the 200G520G sieve are being removed from the list of candidates for 192200G....?
We are working with one master sieve file, Woodall10M (2000000<n<10000000), and running factor ranges against it. As we run factor ranges (p; example 200G520G or 200000000000520000000000) it removes candidates (n) from the master sieve file in which sieving finds factors for.
Since p=200G520G was already completed, I decided to remove the candidates (n) in which factors were found, from the master sieve file...so the sieve would run faster. 11033 candidates (n) were removed from master file Woodall 10M which represented about 4% decrease in candidates.
edit: factor ranges 192G200G will still find factors for the remaining n.
So... ok.. it doesn't really matter what order they are removed in, those numbers are discarded.. it's the remaining numbers that we really care about, correct?
(sorry, I know very little about what we're doing here, I just wanted to help out cuz this is 'the boincstats coproject')  

JohnHonorary cruncher
Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0

So... ok.. it doesn't really matter what order they are removed in, those numbers are discarded.. it's the remaining numbers that we really care about, correct?
That's correct. We continue to search for factors to the remaining n.
Also, a factor p could easily remove a small n as well as a large n...and vice versa. For example:
factor 5172511811 removes 939813*2^939813+1
factor 5170252987 removes 1976440*2^1976440+1
I even received this today from someone doing other testing:
factor 307653395464509967970123309 removes 251*2^251+1
Look how LARGE the factor p is and how small the n is. He actually removed our smallest n in the Cullen 2M file. While it's good to have this factor, an LLR test of 251*2^251+1 would take only milliseconds...so not a major removal from the sieve file. However, it is a VERY NICE factor!
(sorry, I know very little about what we're doing here, I just wanted to help out cuz this is 'the boincstats coproject')
No need to be sorry. I'm receiving a lot of technical support and advice from others who are much more knowledgeable than me.
BTW, thanks for contributing to the sieve. We need all the volunteers we can get and your help is greatly appreciated.
____________
 


After reading this whole Article I want to thank you John so much for answering questions that Aaron asked that I had as well. You help so much my friend. I just donated because of how nicely this community helps each other and I wanted to do my part. Thanks again my friend :)  


Basically yes, lowest p that factors n*2^n1 (Woodall) or n*2^n+1 (Cullen). Lowest meaning that we started out at 1 and are working our way up towards optimal sieve depth.
Minimally, optimal sieve depth means sieving to a point in which the time it takes to find a factor p is the same as the time it takes to LLR test an n. There are other reasons to sieve higher/deeper...mainly error rates in the LLR testing. i.e. LLR returning an incorrect result...which happens. So that's why double checking an LLR result can be a good thing.
Hi John (and Rytis)
Surely this is key to the process. It seems to me that the rate at which we are finding factors is not nearly high enough and the depth to which we are sieveing is not enough to justify the relative time taken to LLR test. Should the focus not be on extending the depth of the sieving?
____________
Warped
 


In case there is a misunderstanding, what I mean by the above post is that we should sieve more. That is more deeply as this would be more efficient.
In the very short time I've been involved, I've crunched well over 300 sieve tasks and found only 2 factors. This seems to be a very small ratio relative to the rate at which the remaining tasks will yield primes (most likely nil). The time taken to crunch a sieve unit is very short in comparison to the time taken to run an LLR task.
The only explanation (in my extremely limited understanding) would be that the sieving work has been broken up into smaller blocks of p and I've come in at the time when the "deep end" and therefore least productive part of the sieve range is being crunched.
If this is the case, could the sieve range not be extended? I would expect this would result in less load on the servers and bandwidth.
____________
Warped
 

JohnHonorary cruncher
Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0

Should the focus not be on extending the depth of the sieving?
The simple answer is yes, the primary focus should ALWAYS be on sieving. :)
If you look at PrimeGrid's Projects post, you'll see that 3 of 4 priorities are sieving. The one that's not is just cleanup work that will be done hopefully in 45 days.
Yes, in an ideal situation, we'd sieve to "optimal" depth and then begin LLRing. However, that can become long and tedious. Throwing a little splash of "prime finding" fun in helps keep the process interesting and moving forward.
The current 321 LLR effort is on a 1<n<5M sieve file that was sieved to a depth of p=805T which is very close to “optimal”. The current 321 Sieve effort is for 5M<n<25M and we want to reach at least p=805T; “optimal” will be quite a bit further.
The PPS LLR is only in its testing phase. It can be stopped at any time if sieving slows down too much. At the current sieve depth of p=65T, the rate is one factor every 320 seconds. On the same machine, the LLR rate for one candidate is 50 seconds. Each factor found removes two LLR tests since we are double checking; so basically one candidate is worth 100 seconds to LLR.
Therefore, LLR is removing candidates 3.2 times faster than sieving. While this may look nice, it definitely does not mean we should stop sieving. Since we plan on testing up to n=5M, we still have a long way to go. The “optimal” depth is around 43,000 seconds per factor. We expect to be in that ballpark somewhere around p=800T.
The GCWsieve is advancing nicely as well. So much so that it is close to "optimal" for 32 bit machines...64 bit machines can still continue. This sieve file includes 5M<n<10M.
The PSPsieve is managed by the PSP project. This sieve file includes 15 k's for 991<n<50M. Since they are checking up to n=50M (if necessary), this sieve is far from "optimal". However, at the current n for LLRing, it's in a good place for SoB and an excellent place for PSP.
BTW, increasing the sieve WU range will increase the factors found per WU. However, it will not affect the rate in which they are found.
It’s a balancing game, and we will do our best to minimize any inefficiency.
The only explanation (in my extremely limited understanding) would be that the sieving work has been broken up into smaller blocks of p and I've come in at the time when the "deep end" and therefore least productive part of the sieve range is being crunched.
Yes, that is the case. :)
If this is the case, could the sieve range not be extended? I would expect this would result in less load on the servers and bandwidth.
Again, it's a balancing game of "feeling productive" and keeping WU's short. GCW and PSP sieve WU's are relatively short even after recently quadrupling their length due to server/database issues. If long WU's are desired, the PPS sieve (64 bit only) has 34 hr long WU's and the manual 321 sieve can be as long as you like. :)
BTW, the Lunar Landing Challenge was dedicated to all the sievers. Without them, prime finding would be an extremely timeconsuming task.
____________
 


How often will you update the sieve files? I'm mainly interested in 321 sieving because that's what I am doing ;)
As far as I understand, each line of the sieve file is one prime candidate, so at the moment, there are about 2.1M possible primes left in the 5M25M range, correct? So for the update to be worth the trouble, the file should be reduced by several thousand lines, I guess. Or maybe I'm guessing wrong. So how are you planning to do this?
Regards, Peter  


How often will you update the sieve files? I'm mainly interested in 321 sieving because that's what I am doing ;)
As far as I understand, each line of the sieve file is one prime candidate, so at the moment, there are about 2.1M possible primes left in the 5M25M range, correct? So for the update to be worth the trouble, the file should be reduced by several thousand lines, I guess. Or maybe I'm guessing wrong. So how are you planning to do this?
Regards, Peter
You can get a new one now.
It is the same name + E1
All factors found is removed.
We usually make a new file when we have 35% factor's compared to candidates in sievefile.
Thank's for all help
/Lennart  


We usually make a new file when we have 35% factor's compared to candidates in sievefile.
/Lennart
I see. Thanks for the quick answer and for the quick update, of course!
Regards, Peter  


A general technical question about sieving:
As far as I remember, you only have to testdivide a candidate x using the primes which are smaller than sqrt(x) or, in our case, smaller than a desired limit.
The candidates are stored in the sieve file. But where is the list of divisors? Storing them in the application code seems rather unflexible as the sieving range would be limited. Besides, the application file seems to be too small for containing a long list.
Just curious,
Peter
____________
There are only 10 kinds of people  those who understand binary and those who don't
 

geoffVolunteer developer Send message
Joined: 3 Aug 07 Posts: 99 ID: 10427 Credit: 343,437 RAC: 0

As far as I remember, you only have to testdivide a candidate x using the primes which are smaller than sqrt(x) or, in our case, smaller than a desired limit.
The candidates are stored in the sieve file. But where is the list of divisors? Storing them in the application code seems rather unflexible as the sieving range would be limited. Besides, the application file seems to be too small for containing a long list.
The primes used to divide x (which are much much smaller than sqrt(x)), are not stored but generated on the fly using another sieve. And the primes used by that sieve were generated at program startup by yet another sieve. Only a very small list of primes is needed to bootstrap that whole process.  


That sounds very sophisticated. Thanks for explaining!
____________
There are only 10 kinds of people  those who understand binary and those who don't
 

Message boards :
Sieving :
Questions about sieving process.. 