Message boards :
Sieving :
New manual sieving: GFN65536 and GFN131072
JimB Honorary cruncher Send message
Joined: 4 Aug 11 Posts: 920 ID: 107307 Credit: 989,270,184 RAC: 150,909
|
I've just created two new manual sieving projects. We've been looking at how well sieved we are for GFN32768, GFN65536 and GFN131072, since those are likely to start as BOINC projects once the OCL/OCL2/OCL3 client has been written and tested. GFN32768 isn't badly off at all, but GFN65536 and GFN131072 are very undersieved given the OCL3 maximum b.
GFN65536 and GFN131072 are the current sieving priority.
At those n levels, you should run 4 copies of gfnsvcuda for 65536 and 3 copies for 131072. At least that's what's optimal on my machine. And be aggressive with the b values. I'm using b11 because b12 starts to generate driver timeouts on my card. Experiment and see what the best overall P/day is that you can get. The program reports it; just give it 15-30 seconds for the number to stabilize after starting the program (or an additional copy of the program). Alternatively, you could run one thread of this sieving while the GPU is busy with something else and see how that works out.
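For anyone scripting their experiments, the advice above (run several copies, try different b values, keep whichever combination reports the best total P/day) boils down to a simple comparison. A minimal Python sketch; the configurations and P/day figures here are made-up placeholders, not measurements:

```python
# Hypothetical helper for comparing manual-sieve configurations.
# Map each (copies, b_value) pair to the total P/day you measured
# from gfnsvcuda's own throughput report after it stabilizes.

def best_config(measurements):
    """Return the (copies, b_value) pair with the highest total P/day."""
    return max(measurements, key=measurements.get)

# Example measurements (illustrative numbers only):
rates = {
    (1, 11): 20.4,   # one copy at b11
    (3, 11): 52.0,   # three copies at b11
    (4, 11): 60.0,   # four copies at b11 (JimB's GFN65536 suggestion)
}
print(best_config(rates))  # -> (4, 11)
```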
Sieving is removing candidates over 100x faster than they can be checked with genefer. Those need to be sieved now and will probably keep being sieved even after we start testing with genefer. I'll be removing candidates already loaded as sieving comes in.
The more these are sieved, the sooner we can start looking for primes. Now is the time to try manual sieving if you haven't before. | |
|
Rafael Volunteer tester
Send message
Joined: 22 Oct 14 Posts: 912 ID: 370496 Credit: 552,372,755 RAC: 457,337
|
A few preliminary tests on my GTX 970 (overclocked to 1510 MHz). One thing to note is that GFN 65536 completely sucks up CPU cycles. I could do GFN 2097152 and use only around 1-2% of CPU per task while keeping GPU usage at around 90-95% all the time (with temperatures through the roof...). This one, however, eats an entire CPU core, with the GPU barely being used and temps in the high 40s / low 50s (depending on when usage spikes).
First, B sizes:
B7 = 17.4 P/Day
B8 = 19.0 P/Day
B9 = 19.6 P/Day
B10 = 19.6 P/Day
B11 = 20.4 P/Day
B12 = 20.4 P/Day
B13 = 20.7 P/Day.
Next up, multiple tasks running. Er... I'm not going over that. A quick test using 2x B12 and B13 gave me around 39 P/day, so that's almost 2x scaling... at the cost of 2x CPU usage. And that is BAD. GPU usage and temps were still very low, so I suppose I could push it further with more tasks running. But I won't do that, since my CPU only has 3 cores and I want to use my PC for other things. So as soon as I'm done with my 20P reservation, I'll abandon GFN 65536.
BUT, one thing I did notice is that if I'm doing a regular BOINC GFN-WR WHILE doing the sieve for GFN 65536, there doesn't seem to be any performance loss on EITHER task. Well, sure, there is a 0.1-0.2 P/day reduction on the sieve, and every once in a while I get one additional second on the BOINC WU, but aside from that, one doesn't seem to interfere much with the other. Makes sense, as WR (CUDA) uses almost no CPU, while the sieve uses very little GPU.
Still sucks resources from my CPU WUs, but hey, at least that's already something.... | |
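Rafael's b-value table above makes the diminishing returns easy to quantify. This small Python sketch just recomputes the marginal P/day gained per b step from the numbers listed in the post:

```python
# Rafael's measured throughput per b value on a GTX 970 (from the post).
rates = {7: 17.4, 8: 19.0, 9: 19.6, 10: 19.6, 11: 20.4, 12: 20.4, 13: 20.7}

# Marginal P/day gained for each one-step increase in b:
gains = {b: round(rates[b] - rates[b - 1], 1)
         for b in sorted(rates) if b - 1 in rates}
print(gains)  # {8: 1.6, 9: 0.6, 10: 0.0, 11: 0.8, 12: 0.0, 13: 0.3}
```

Most of the benefit is already there by B8-B9, which matches the later advice in the thread that lower B values plus multiple copies can beat a single high-B instance.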
|
|
B7 default yields 35% GPU usage on my GTX 760M. What does your 970 show for usage at the default level?
My 760M is at 1110 MHz.
Also looking at 16% CPU usage for that task as well. Not quite one core.
Running 65536 and 4194304 at the same time is showing about 40 P/day. | |
|
|
How high of a b value can we set for the sieve?
____________
| |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 14011 ID: 53948 Credit: 435,567,510 RAC: 868,496
|
How high of a b value can we set for the sieve?
I think it goes up to 13 -- but bigger doesn't necessarily mean faster. If you make it too high, it runs slower.
____________
My lucky number is 75898^524288+1 | |
|
JimB Honorary cruncher Send message
Joined: 4 Aug 11 Posts: 920 ID: 107307 Credit: 989,270,184 RAC: 150,909
|
Run the program with no arguments to get help on the parameters. B13 is the highest, but on my GTX 570, using B13 crashes the driver with a warning that the card took too long to return a result (or something like that). It's been a while, but I'm pretty sure it also kills GFNSvCUDA. I find it better to run lower B values and either run multiple sieves simultaneously or run at B9 while simultaneously running genefer under BOINC. | |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester Send message
Joined: 1 Mar 14 Posts: 1033 ID: 301928 Credit: 543,624,271 RAC: 5,083
|
As a general rule, use a tool that monitors GPU usage (one bundled with your video card, or the freeware GPU-Z) and run many copies of the sieve, simultaneously playing with B values, to get GPU usage close to 100%. Two copies are the minimum required: the program is written in a straightforward single-threaded way, so when the CPU is working the GPU is idle, and vice versa. I have to run 4 copies at B11 (GFN65536) to feed even a mid-range GTX 750 Ti at a rate close to maximum, and still not quite the max. So I'm also running the PPR12M sieve to pick up the remaining GPU time (which appears to be about 25% :-( ). Even at 4x B11 the screen lag is so big that the system is unusable; thankfully it's a dedicated cruncher. Lower B values avoid the lag but decrease overall performance in my tests.
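stream's observation that the sieve alternates CPU and GPU phases can be captured in a toy utilization model: with n independent copies, the expected GPU busy fraction is roughly min(1, n x gpu_time / (cpu_time + gpu_time)). The phase durations below are assumptions for illustration only, not measurements of gfnsvcuda:

```python
# Toy model of why multiple single-threaded sieve copies are needed:
# each copy alternates a CPU phase (GPU idle) and a GPU phase.

def gpu_utilization(n_copies, cpu_time, gpu_time):
    """Approximate GPU busy fraction with n independent alternating tasks."""
    return min(1.0, n_copies * gpu_time / (cpu_time + gpu_time))

# Assumed 3:1 CPU-to-GPU phase ratio, purely for illustration:
for n in (1, 2, 4):
    print(n, round(gpu_utilization(n, cpu_time=3.0, gpu_time=1.0), 2))
# 1 copy -> 0.25, 2 copies -> 0.5, 4 copies -> 1.0
```

Under this model one copy can never saturate the GPU, which is consistent with stream needing four copies to approach full load.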
| |
|
Dave  Send message
Joined: 13 Feb 12 Posts: 3209 ID: 130544 Credit: 2,288,183,124 RAC: 754,139
|
65536 nearly at 10000. Will it stop then with no more reservations, deemed as optimal? | |
|
JimB Honorary cruncher Send message
Joined: 4 Aug 11 Posts: 920 ID: 107307 Credit: 989,270,184 RAC: 150,909
|
65536 nearly at 10000. Will it stop then with no more reservations, deemed as optimal?
No. While we're using the OCL3 transform, the optimal sieving point is somewhere around 50,000P (50E). At the moment it's almost five times faster to remove candidates by sieving than it is by running two OCL3 tests per candidate.
If/when we eventually move to a slower transform, the optimal sieving point gets even higher, because sieving stays close to the same speed. Actually, sieving slows down at p >= 9223P and again at p >= 18446P (the 2^63 and 2^64 boundaries) because the algorithm used changes. We've recently reached that point on GFN65536, and I need to go back and account for that in the credit given out. That also brings down the optimal sieving point, but not by much.
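The "almost five times faster" comparison above can be expressed as a simple ratio: sieving is ahead while it removes candidates faster than testing clears them, where each candidate costs two OCL3 tests (first pass plus double check). The daily figures in this Python sketch are illustrative assumptions chosen to match JimB's 5x claim, not project statistics:

```python
# Break-even sketch: sieving removes `factors_per_day` candidates, while
# primality testing clears `tests_per_day / 2` candidates per day, because
# every candidate must be tested twice (first pass + double check).

def sieve_vs_test_ratio(factors_per_day, tests_per_day):
    """How many times faster sieving eliminates candidates than testing."""
    return factors_per_day / (tests_per_day / 2)

# Illustrative numbers: ~430 factors/day from sieving vs ~172 OCL3 tests/day.
print(round(sieve_vs_test_ratio(430, 172), 1))  # -> 5.0
```

Sieving stops being worthwhile roughly when this ratio drops to 1, which is what defines the ~50,000P optimal point JimB mentions.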
I'll get the credit adjustments done and display multiple credit rates per GFN project by the end of the day, testing it all now.
Edit: Since sieving slows but we want credit per hour to remain the same, the credit will increase on GFN65536 for ranges above 9223P. There's one range that spans that point; it'll be given the old rate below 9223P and the new rate above 9223P.
Later edit: Credit is now adjusted. I've verified that the higher credit is correct and that it's been added to the PSA credit queue. Newly-vetted ranges will now get the new credit. I'm just sieving a tiny range here since the upload routines changed a bit (for better error handling) and then I'll work on the backlog of completed sieving work. | |
|
GDB Send message
Joined: 15 Nov 11 Posts: 298 ID: 119185 Credit: 4,070,348,388 RAC: 1,960,737
|
With the GFN challenge finishing in 5 days, you may want to post some notices to entice some GPU users to switch to manual sieving. While they won't find primes sieving, they will eliminate candidates 400X faster @ n=22 than running a GFN-WR. | |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester Send message
Joined: 1 Mar 14 Posts: 1033 ID: 301928 Credit: 543,624,271 RAC: 5,083
|
65536 nearly at 10000. Will it stop then with no more reservations, deemed as optimal?
No. While we're using the OCL3 transform, the optimal sieving point is somewhere around 50,000P (50E). At the moment it's almost five times faster to remove candidates by sieving than it is by running two OCL3 tests per candidate.
Hmm... it may depend very much on the GPU type. For example, my 750 Ti did a maximum of 50P of sieving per day (and a lot of tricks were required to reach the 100% GPU load level). That was before the speed dropped; the numbers will be lower now. Re-adjusting the stats of the last crunched 60P range down to 50P, we get about 218 OCL3 factors removed per day. Genefer/OCL3 takes about 250 seconds on this GPU, i.e. 345 workunits per day. Considering the double check, that's not at the optimal point yet, but very close. OCL4 is even better: 200 seconds, 432 workunits per day (but its maximum range is expected to be less than OCL3's).
It's just a small refinement, though. Really, it doesn't matter much, because we get these OCL3-range factors "for free" while sieving the OCL2 range, and the OCL2 range is still far from the optimal sieve point.
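stream's arithmetic above can be reproduced directly from the figures given in the post; this short Python sketch just redoes the calculation:

```python
# Reproducing stream's break-even arithmetic for a GTX 750 Ti
# (all input figures are taken from the post above).
SECONDS_PER_DAY = 86400

factors_per_day = 218            # OCL3-range factors removed/day by sieving
ocl3_wu = SECONDS_PER_DAY / 250  # genefer/OCL3 workunits per day (~345.6)
ocl4_wu = SECONDS_PER_DAY / 200  # genefer/OCL4 workunits per day (432)

# Every candidate is tested twice (double check), so candidates cleared/day:
print(round(ocl3_wu / 2, 1))  # 172.8 -> sieving (218/day) is still ahead
print(round(ocl4_wu / 2, 1))  # 216.0 -> OCL4 is nearly at break-even
```

So on this particular card, sieving beats OCL3 testing by roughly 218/172.8, i.e. only about 1.26x, and OCL4 testing nearly closes the gap, which is why stream calls it "very close" to the optimal point.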
| |
|
Rafael Volunteer tester
Send message
Joined: 22 Oct 14 Posts: 912 ID: 370496 Credit: 552,372,755 RAC: 457,337
|
With the GFN challenge finishing in 5 days, you may want to post some notices to entice some GPU users to switch to manual sieving. While they won't find primes sieving, they will eliminate candidates 400X faster @ n=22 than running a GFN-WR.
Poor AMD users, can't even do sieving... | |
|
JimB Honorary cruncher Send message
Joined: 4 Aug 11 Posts: 920 ID: 107307 Credit: 989,270,184 RAC: 150,909
|
Poor AMD users, can't even do sieving...
Because the person who wrote the program didn't own an AMD GPU at the time, and probably still doesn't. Feel free to write a similar program for AMD users. | |
|
Azmodes Volunteer tester
 Send message
Joined: 30 Dec 16 Posts: 184 ID: 479275 Credit: 2,197,541,354 RAC: 252
|
I hope it's okay to sort of necro this thread instead of creating a new one.
Just wanted to share that while I haven't played around with the b values yet, for my GTX 1060 6GB the sweet spot in simultaneous sieving instances for GFN 16 seems to be 6-7. After that the increase in P/day plateaus or might even decrease. P/day maxed out that way is ~33-33.5; with 1 instance it's 12.9.
I have also tried it with a GTX 750 Ti, up to 3 instances, which yields 11.7 P/day as opposed to 8.2 for one.
For GFN 19 (1060) it's closer to 4 instances at once. (~45 P/day; 35.2 with 1 instance).
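The scaling figures in this post can be summarized as speedups over a single instance. A quick Python sketch using only the numbers quoted above (taking 33 P/day for the 1060/GFN16 sweet spot and 45 P/day for GFN19):

```python
# Multi-instance scaling from Azmodes's measurements:
# {number_of_instances: total P/day} per card and sieve.
data = {
    "GTX 1060, GFN16":   {1: 12.9, 6: 33.0},
    "GTX 750 Ti, GFN16": {1: 8.2, 3: 11.7},
    "GTX 1060, GFN19":   {1: 35.2, 4: 45.0},
}

for setup, rates in data.items():
    n = max(rates)  # the largest instance count reported
    speedup = rates[n] / rates[1]
    print(f"{setup}: {n} instances -> {speedup:.2f}x over one")
```

The takeaway matches the post: scaling is far from linear (roughly 2.6x from six GFN16 instances, and under 1.3x from four GFN19 instances), so extra copies mainly mop up idle GPU time rather than multiply throughput.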
In terms of credit/sec, GFN 16 appears to be the best choice if you max it out that way, though it's pretty annoying to set up this many Command Prompts and all those low-GFN sieves demand a lot of CPU power as well. | |
|
GDB Send message
Joined: 15 Nov 11 Posts: 298 ID: 119185 Credit: 4,070,348,388 RAC: 1,960,737
|
If you want to maximize your GPU usage, try running 1 GFN22 sieve, and 1+ other sieve. GFN22 sieves need a lot of GPU, but don't use too much CPU. | |
|
Azmodes Volunteer tester
 Send message
Joined: 30 Dec 16 Posts: 184 ID: 479275 Credit: 2,197,541,354 RAC: 252
|
I hadn't even considered combining different n values. I'll definitely give this a try. | |
|
Azmodes Volunteer tester
 Send message
Joined: 30 Dec 16 Posts: 184 ID: 479275 Credit: 2,197,541,354 RAC: 252
|
Just tried 1x GFN 22 b=11 along with 1x GFN 16 b=8. About 71 and 8.6 P/day, respectively. Already far more efficient than just piling GFN 16 sieves on top of each other without upping the b values. Still, probably a lot of room for tweaking and improvement.
Hah, every time I think I've got something figured out on this site... | |
|
|
On my 1080 Ti I run at B13 for max productivity. I find 3-4 files running at the same time to be best. Although it's a bit extreme for most, I was running two 40P sieves of GFN22 and one 17P sieve of GFN16, and I was getting 110 P/day for GFN22 and 1.7 P/day for GFN16. I stopped my GPU tasks in the BOINC client and my GFN22 went up to 120 P/day and my GFN16 went up to 1.8 P/day. When I finished the GFN22 and just ran GFN16, it went up to 19.7 P/day. For me GFN22 gives way more cred/sec; however, my A10-7800 can run them both within ~12% of each other, so whatever sieve gives the most cred per P is best. | |
|
Azmodes Volunteer tester
 Send message
Joined: 30 Dec 16 Posts: 184 ID: 479275 Credit: 2,197,541,354 RAC: 252
|
I haven't yet tried two instances of GFN22, since I figured a single one almost maxes out my GPU anyway.
Currently running 1x GFN22 (~53 P/day) and 3x GFN16 (10.2 P/day each) on the same card, all with b = 11. That amounts to about 5.64 credits per second, if you care about that sort of thing. (That's more efficient than PPS sieve by 0.2 on my 1060 6GB; I suppose I could push that value with further instances of GFN16 or who knows what. Btw, 7x GFN16 with b = 7 was about 3.32.)
So far the setup seems like a good compromise between crunching and still being able to use my computer without going crazy with the lag. | |
|
GDB Send message
Joined: 15 Nov 11 Posts: 298 ID: 119185 Credit: 4,070,348,388 RAC: 1,960,737
|
GFN22 is so massively undersieved that I can't believe that people run GFN22, when manual GFN22 sieving eliminates candidates almost 1,000 times faster now. | |
|
Rafael Volunteer tester
Send message
Joined: 22 Oct 14 Posts: 912 ID: 370496 Credit: 552,372,755 RAC: 457,337
|
GFN22 is so massively undersieved that I can't believe that people run GFN22, when manual GFN22 sieving eliminates candidates almost 1,000 times faster now.
I asked this as well... basically, the thought process is "who knows if PrimeGrid will exist 5-10 years from now, so instead of sieving the entire B range, only candidates we would test within that amount of time (at the current pace) are taken into consideration for sieving purposes, at which point we're oversieved / close to optimal depth". Oh, and a reminder: this is also taken with a GTX Titan (the original one) as the reference for GFN and SV performance.
Personally, I don't agree, but that's how it goes. | |
|
GDB Send message
Joined: 15 Nov 11 Posts: 298 ID: 119185 Credit: 4,070,348,388 RAC: 1,960,737
|
Is there any tracking of GPU TeraFLOPs available to PG? If so, I'm guesstimating you'd see about 30% annual growth in PG GPU TeraFLOPs. That means your 5-10 year projections are probably wrong if you're doing a linear estimate. | |
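The difference GDB is pointing at is compounding. A quick Python check of 30% annual growth (his guesstimate) versus a naive linear extrapolation:

```python
# Compound vs. linear growth of aggregate GPU throughput at 30%/year.
GROWTH = 1.30  # GDB's guesstimated annual growth factor

for years in (5, 10):
    compound = GROWTH ** years
    linear = 1 + 0.30 * years
    print(f"{years} years: {compound:.1f}x compound vs {linear:.1f}x linear")
# 5 years: 3.7x compound vs 2.5x linear
# 10 years: 13.8x compound vs 4.0x linear
```

At 10 years the compound estimate is more than triple the linear one, which is why a linear 5-10 year projection would badly understate how much testing capacity catches up with a fixed sieve depth.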
|
|
Ohh, Jim and Scott are very intelligent and aren't planning a linear 5 years. I do agree that we could sieve to a higher factor for the area we're searching; however, I haven't seen any of the code to know if that's a feasible adjustment. If it's not, then I'm definitely not volunteering to write brand new code, because I don't know how. | |
|