
Message boards : Sieving : Curiousity

Allen Paschke

Joined: 12 Nov 15
Posts: 38
ID: 428118
Credit: 47,485,872
RAC: 24,458

Message 131592 - Posted: 31 Jul 2019 | 16:23:33 UTC

I am curious and I like to learn.

When I run Manual Sieving for GFN17, for each 1P that I run, approximately 100 lines are written, such as 213888748689316642817 | 120405194^131072+1.
I assume this means that 213888748689316642817 is a factor of the GFN17 candidate 120405194^131072+1.

However, for each 1P, for the approximately 100 records that are written:
- Why are only about 20 records considered factors?
- Why are about 5 of the 20 records removed from the sieve?
- What are the other 80 records?

How is 213888 . . . . from 213888P - 213889P related to 120405194^131072+1?
There is a “C” file for restarting with a value of 815920922987789, but I expected to see a number between 213888P and 213889P.
How are the “C” file value (815 . . .), 213888 . . . and 120405194^131072+1 all related?

Crun-chi
Volunteer tester

Joined: 25 Nov 09
Posts: 3250
ID: 50683
Credit: 152,646,050
RAC: 10,054

Message 131593 - Posted: 31 Jul 2019 | 16:53:52 UTC - in response to Message 131592.

213888748689316642817 | 120405194^131072+1.

That means 120405194^131072+1 has the factor 213888748689316642817 and cannot be prime, since it is divisible by 1, itself, and 213888748689316642817.
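That divisibility claim is cheap to verify with modular exponentiation, without ever constructing the million-digit number itself. A quick sketch in Python, using only the numbers from the line above:

```python
# Verify 213888748689316642817 | 120405194^131072 + 1 without ever
# building the huge number: three-argument pow() does the whole
# exponentiation modulo p.
p = 213888748689316642817   # the reported factor
b = 120405194               # the GFN17 base
n = 131072                  # 2**17

# p divides b^n + 1  <=>  b^n ≡ -1 (mod p)
print(pow(b, n, p) == p - 1)
```

Because `pow(b, n, p)` reduces modulo p at every squaring step, the check completes in a fraction of a second even though b^131072 itself has over a million digits.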

Why are only about 20 records considered factors?

Every line is one factor for one candidate.

Why are about 5 of the 20 records removed from the sieve?
What are the other 80 records?

Because you run the sieveless program, it is normal that a candidate has a smaller factor and was already removed some time before. But your program does not know that fact and finds one more valid factor for that candidate.

How is 213888 . . . . from 213888P - 213889P related to 120405194^131072+1?

It is not related: you reserved the range from 213888P to 213889P, and in that range you found a factor for that candidate. Simple as that :)

And lastly, 213888748689316642817 can be written as 2^18*7*116560117824727+1 :)
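Both remarks can be checked mechanically. Every odd prime factor of b^(2^17)+1 must have the form k*2^18+1 (since b^(2^17) ≡ -1 mod p, the multiplicative order of b modulo p is exactly 2^18, which therefore divides p-1), and the factor also lands inside the reserved 213888P to 213889P range. A small Python sketch:

```python
p = 213888748689316642817
P = 10**15  # 1 peta

# The factor has the form k*2^18 + 1, as every odd prime factor
# of b^(2^17)+1 must.
assert p == 2**18 * 7 * 116560117824727 + 1

# And it lies inside the reserved sieving range 213888P - 213889P.
assert 213888 * P <= p < 213889 * P
print("both checks pass")
```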

For a double factor, look at this example:

15901*52^413236+1 has the factor 480222013133 but also the factor 10000154315101.
So both divide the same candidate :)
____________
92*10^1585996-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie!

JeppeSN

Joined: 5 Apr 14
Posts: 1852
ID: 306875
Credit: 52,576,736
RAC: 30,650

Message 131594 - Posted: 31 Jul 2019 | 17:00:10 UTC - in response to Message 131592.

are written, such as 213888748689316642817 | 120405194^131072+1.

It means that the relatively small prime 213888748689316642817 divides the huge GFN17 number 120405194^131072+1 also known as 120405194^(2^17)+1.

So that is good, we will not have to test 120405194^131072+1 in the genefer project for GFN17.

The small number 213888748689316642817 is approximately what you could call 213 exa (or 213E) or 213000 peta (or 213000P).

/JeppeSN

JimB
Honorary cruncher

Joined: 4 Aug 11
Posts: 920
ID: 107307
Credit: 990,113,257
RAC: 55,890

Message 131595 - Posted: 31 Jul 2019 | 17:42:30 UTC

The statistics are skewed in that they only show factors affecting my current sieve file. For GFN17 that stops just below b=400M. Any factors for b=400M-2G are still completely valid and I could produce a sieve from them in about an hour, but currently such candidates are not testable by the genefer program. It's the same with the graphs for the various GFN sieving. You can see that http://www.primegrid.com/sieving/gfn/GFN131072.png goes to b=400M on the y-axis while for http://www.primegrid.com/sieving/gfn/GFN262144.png b only goes to 100M. That's because all the early sieving for GFN18 was done to only b=100M. It's currently being redone (by me) to b=2G and so in late August that graph will change.

If we got to the point of never needing to sieve a GFN subproject any further, I'd take the time to produce a bmax=2G sieve. The reason I don't right now is that it just makes everything take a lot longer with nothing much to show for it.

And since it seems like a lot of people have never seen those graphs, they're available from the Show Stats link on the manual sieving projects listing.

As far as 213888748689316642817 | 120405194^131072+1 goes, 213888748689316642817 is a factor of 120405194^131072+1. So 120405194^131072+1 gets removed from the sieve file and will never be looked at again. Someone else may find another factor at some point, but that's OK. You can compare the counts in the red columns to those in the yellow columns; the difference is what's already been removed by prior sieving. Click the + by the title to open up the full table.

Finding factors takes a lot of time; verifying that a factor is correct takes a fraction of a second. All submitted factors are tested as the first step in processing them. We get maybe ten a year that aren't correct; not sure why, and really don't care. They're automatically eliminated from consideration.

JimB
Honorary cruncher

Joined: 4 Aug 11
Posts: 920
ID: 107307
Credit: 990,113,257
RAC: 55,890

Message 131703 - Posted: 3 Aug 2019 | 19:12:53 UTC

I have to amend my original answer. It's been so long since I set this up (six years or so) that I forgot some of the details. For any GFN n value, I have two completely separate sieve files. While I think I've gone through the process before in the big manual sieving system thread, it probably bears repeating here so I can point people to a short thread.

When I'm about to validate manual sieving, I do the following:

1) Update the actual reservations themselves. In the very beginning I used to do this manually but it was error-prone and as more sieving happened it became a huge pain. So now there's a program that synchronizes the reservations between the PrimeGrid server and my local server (which generates the GFN stats and graphs). For those with a technical bent, I keep a tunnel open through each SSH client to the mysql port on each server. So my local workstation can run queries and update my local server quickly and efficiently.

2) Download the actual factor files. Again, this used to be done manually, but I wrote a small program that connects to the server and retrieves all the pending factor files. It just grabs every file in each reservation upload directory without regard to what kind of extension it has.

3) I run a program that I call "normalize". It does the following:
a) If the file is in .zip, .7z or .rar format, unpack it.
b) Test every factor to make sure it's valid. The n value is tested to make sure it matches the filename.
c) Sort the entries in factor, n order. The sieving program's output can be out of order.
d) Remove duplicate entries from the factor file. If you restart after a crash, there can be duplicates. That's mostly to keep the stats accurate, as duplicate factors don't matter otherwise.
e) The output is always in "factor | candidate" format like 123803926529 | 3480^65536+1, as opposed to early versions of David Underbakke's GFN sieving program, which produced output like 1*3480^65536 + 1 factor : 123803926529. The program can read both styles. That's how it originally got the name "normalize".
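As an illustration of steps (c) through (e), a sketch of the parse/sort/dedupe logic might look like the following. This is my own sketch, not the actual normalize program; the function names and regular expressions are invented.

```python
import re

def parse_line(line):
    """Parse either 'factor | b^n+1' or the older
    '1*b^n + 1 factor : f' style into a (factor, b, n) tuple."""
    m = re.match(r"\s*(\d+)\s*\|\s*(\d+)\^(\d+)\+1", line)
    if m:
        f, b, n = map(int, m.groups())
        return f, b, n
    m = re.match(r"\s*1\*(\d+)\^(\d+)\s*\+\s*1\s*factor\s*:\s*(\d+)", line)
    if m:
        b, n, f = map(int, m.groups())
        return f, b, n
    raise ValueError(f"unrecognized factor line: {line!r}")

def normalize(lines):
    # A set removes exact duplicates; sorted() then orders the
    # tuples by factor first, matching "factor, n order".
    entries = sorted({parse_line(l) for l in lines})
    return [f"{f} | {b}^{n}+1" for f, b, n in entries]
```

Feeding it the same factor in both styles yields a single line in the modern format, which is the behavior the post describes.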

4) I run a program that resieves from the last factor appearing in each factor file to the end of the range (as given by the filename). It's quite common for GFN22 factor files to not have factors for up to 0.12P without it being an error. For any file apparently missing more than 0.1P of sieving the program throws up a message. Unless I interrupt it, it'll finish sieving on each range. This is where I find most of the problems with uploaded factor files - they end far earlier than they should. If I'm doing processing around the 0400 UTC deadline for the system to give credit (more on that below), then I interrupt processing and remove that factor file from consideration, writing a PM to the user involved. If doing it at a different time of day and the range is not huge, I may let my workstation finish the sieving.
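The "missing tail" test in step 4 can be sketched like this. The 0.1P threshold comes from the post; the function itself is illustrative, not the real resieving code.

```python
P = 10**15  # 1 peta

def missing_tail(last_factor, range_end_in_P, limit=0.1 * P):
    """Flag a factor file whose last factor stops more than
    ~0.1P short of the end of its reserved range."""
    gap = range_end_in_P * P - last_factor
    return gap > limit

# A file for a range ending at 213889P whose last factor is
# 213888748689316642817 leaves a ~0.25P tail and gets flagged.
print(missing_tail(213888748689316642817, 213889))
```

A flagged file is then resieved from the last factor to the end of the range, which is where most problem uploads are caught.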

5) Once #4 is finished, the factor files are automatically copied to their appropriate directories on both my workstation and my home server. Each n has its own directory (names are 32768, 65536, 131072 etc. so it's harder to accidentally be in the wrong directory than names like GFN15, GFN16 etc.).

6) Local workstation is done first. For each n in which there are new factors, I run a program that opens the old sieve, reads every factor file and applies it to the sieve. As part of that process, the following tests are done:
a) The header line is tested to make sure it's appropriate for the file. n and bmax values must be valid. Any file missing a header line is flagged.
b) First and last factors are checked to make sure they correspond with the filename
c) Every factor is again tested to make sure it really divides the candidate in question.
d) Gaps between successive candidates are looked at and flagged if they're too far apart.
e) Early sieving with Underbakke's program could have continuations. Special care was taken to ensure there was no gap around a continuation. As there was no checkpoint file, the user involved had to re-enter all the parameters of the search and often made mistakes: the factor value could have gaps, the n value could completely change, a different range could be appended, etc.
f) Every newly-generated factor file is expected to have b values above 100M if the user is running the right program. Any factor file that doesn't is flagged here.

Those tests are all run on my local workstation where the only output is the new sieve and nothing else can get screwed up by bad files. Any file that doesn't pass testing here is removed from my local server.

7) A similar program is run on my local server. This program doesn't do all the checking that happened on my workstation, but talks to my local database. It makes certain that factor files exist to completely cover each reservation. Factor counts, sieve removals and the values of the removals themselves are all recorded in database files. At the end of each run (one per n) two lists are printed. One is the list of newly-removed candidates that are currently loaded on PrimeGrid and should be removed. The other is a list of missing ranges that haven't yet been submitted (the gaps in the sieving). It's painful and time-consuming to remove bad data from the database, which is why this program is only run after the one on my workstation completes without errors.

8) I have a web page where I copy and paste the factors to remove sieved-out work already loaded on the server. While this could be automated, it's helpful for me to see what shows up. This web page either removes candidates entirely if not yet turned into a workunit or cancels the workunit, turns it to quorum 1 (any finished job validates immediately) and sets the residue field to "FACTOR FOUND". A factor is better than a genefer test result and that value will not be overwritten by the validator.

9) After doing all current n values on the local server, I run another command there that generates the stats and regenerates any graph where the data has changed. That program automatically copies those updated files to PrimeGrid's server when it finishes.

10) Somewhere in all of this, usually as each n is done testing on my workstation, I manually validate each pending manual sieving reservation that I downloaded factors for. It's not unusual for more uploads to happen during this processing and those are either left until the next time I "do" manual sieving or downloaded immediately and processed before step 9 above. Each factor file moves from its upload directory to the factor file directory for that n.

11) At 0400 UTC each day, credit is moved from the PSA badge pending (PRPNet and manual sieving) into actual PrimeGrid credit. The amount transferred is up to 80% of your current Recent Average Credit (RAC). Of course this has the effect of boosting your RAC so if you have too much credit to transfer all at once, the amount transferred the next day is much larger.
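The compounding effect in step 11 can be illustrated with a toy model. The 80% cap is from the post; the RAC-update rule below is a deliberate simplification of mine, not BOINC's real exponential-decay formula.

```python
def drain_pending(pending, rac, days, cap=0.8, rac_bump=0.1):
    """Toy model: each day transfer up to cap*RAC from the PSA
    pending balance into real credit; assume (simplification)
    the transfer raises RAC by a fraction of the amount moved."""
    for _ in range(days):
        xfer = min(pending, cap * rac)
        pending -= xfer
        rac += rac_bump * xfer
    return pending, rac
```

Under this model a large pending balance drains faster each successive day, because every transfer pushes RAC (and hence the next day's cap) higher.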

Back to the sieve files: The sieves on my local workstation are for the full b range for that n. For example, on GFN19 (524288) early sieving only went to b=100M so that's what my sieve goes to. On GFN15, GFN16, GFN17 and soon GFN18 sieving went to b=2G from the beginning and sieves go that high too. Those sieves on my local workstation also have candidates removed due to algebraic factors (some candidates can't possibly be prime as they have known divisors that won't be found by our sieving). Those workstation sieves are the ones used to produce new work to be loaded on PrimeGrid.

Sieve files on my local server are only for the stats and graphs. They all end at either b=100M or b=400M. But it's also useful as another copy of the factor files involved. There are at least four copies of every factor file kept by us. One is on the PrimeGrid web server box, one is on the PrimeGrid database server box which autosyncs with the web server, one is on my workstation and one is on my server. Additionally, every three months I make a backup of my entire sieving directory structure (565 gigabytes at the moment) onto a completely different local box. I have a year's worth of those. And when we finish sieving any project, I burn a copy to DVDR. We're serious about not losing data. Technically I don't need the factors after a new sieve has been generated, but if there are ever questions about whether sieving was done properly, those factors are invaluable.

Finally, bear in mind it takes a lot longer to talk (or read) about this processing than it takes to do it. Most of the tests don't ever find anything wrong, but we can't have improperly-eliminated candidates.

Kellen

Joined: 10 Jan 18
Posts: 484
ID: 967938
Credit: 1,600,003,090
RAC: 0

Message 131704 - Posted: 3 Aug 2019 | 19:48:15 UTC - in response to Message 131703.

Hi Jim,

Many thanks for this excellent summary of how the manual sieving is processed! It was an interesting read and very informative. Step 8 is something I have long been curious about, and it is great to see that all factor files are kept in addition to the sieve files.

Despite the numerous automations to the process, it still seems like a significant effort to run and maintain the sieves for GFN. Your efforts are greatly appreciated!

Regards,
Kellen

walli
Volunteer moderator

Joined: 12 Jun 16
Posts: 11
ID: 449456
Credit: 7,345,540,741
RAC: 4,165,263

Message 131705 - Posted: 3 Aug 2019 | 20:16:42 UTC - in response to Message 131704.

+1 :)

Allen Paschke

Joined: 12 Nov 15
Posts: 38
ID: 428118
Credit: 47,485,872
RAC: 24,458

Message 131745 - Posted: 5 Aug 2019 | 16:19:37 UTC - in response to Message 131703.

Jim, thank you!!!!

Thank you for all your efforts!!!! They are greatly appreciated!!!!

vaughan

Joined: 11 Aug 05
Posts: 351
ID: 224
Credit: 14,235,517,916
RAC: 34,779,650

Message 131760 - Posted: 5 Aug 2019 | 23:50:55 UTC

Jim that is really good information. Thank you for your efforts with this. Greatly appreciated.
