PrimeGrid

Message boards : Project Staging Area : WFS/WSS task size thoughts

Grebuloner (Project donor)
Volunteer tester
Joined: 2 Nov 09
Posts: 185
ID: 49572
Credit: 563,552,941
RAC: 566,494
321 LLR Turquoise: Earned 5,000,000 credits (5,034,434)Cullen LLR Ruby: Earned 2,000,000 credits (2,756,796)ESP LLR Turquoise: Earned 5,000,000 credits (5,107,987)Generalized Cullen/Woodall LLR Turquoise: Earned 5,000,000 credits (5,803,667)PPS LLR Turquoise: Earned 5,000,000 credits (8,435,758)PSP LLR Turquoise: Earned 5,000,000 credits (7,106,332)SoB LLR Ruby: Earned 2,000,000 credits (4,136,044)SR5 LLR Turquoise: Earned 5,000,000 credits (5,456,203)SGS LLR Turquoise: Earned 5,000,000 credits (6,850,184)TRP LLR Turquoise: Earned 5,000,000 credits (6,675,139)Woodall LLR Ruby: Earned 2,000,000 credits (2,533,429)321 Sieve (suspended) Bronze: Earned 10,000 credits (38,652)Cullen/Woodall Sieve (suspended) Ruby: Earned 2,000,000 credits (4,178,073)Generalized Cullen/Woodall Sieve Turquoise: Earned 5,000,000 credits (9,963,914)PPS Sieve Double Silver: Earned 200,000,000 credits (200,021,529)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Turquoise: Earned 5,000,000 credits (9,468,384)TRP Sieve (suspended) Jade: Earned 10,000,000 credits (10,076,645)AP 26/27 Emerald: Earned 50,000,000 credits (98,078,172)GFN Emerald: Earned 50,000,000 credits (88,638,756)PSA Emerald: Earned 50,000,000 credits (83,200,632)
Message 102911 - Posted: 1 Jan 2017 | 4:10:14 UTC

After the last few challenges, and after looking at all the data from my various GPUs as well as the recent optimization thread and other threads about modern GPU resource utilization, I did some thinking and testing on the workunit size of the Wieferich and Wall-Sun-Sun subprojects. My main conclusion is that the workunits are simply too small. Based on testing on my machines, I have the following proposal:

What if we increased the unit size by 10x? Instead of a 1e11/1e10 range, how about 10e11 (i.e. 1e12) for WFS and 10e10 (1e11) for WSS? (10x the credit too, of course.)

Here is what I found testing 1e10/1e11 vs. 10e10/10e11 ranges; the results across several ranges were quite consistent. (Methodology: I ran a big 10x range at or near the current leading edge as one unit, and the same range as ten 1x units, one unit at a time on each GPU; tested on Maxwell and Fermi.)

1. The "init time" was within a few hundredths of a second for short and long ranges.
2. p/sec actually increased by up to 13% (and GPU load was consistently higher, as measured with GPU-Z).
3. The 10x range done as one unit took ~9-9.3 times (Maxwell) and ~8.9-9.1 times (Fermi) as long as a single 1x range, i.e. less than the 10x needed to run ten separate 1x units, and that is before counting the interstitial time the prpclient adds between workunits.
4. Memory usage didn't change.
5. On the little Fermi (GT430, 96 CUDA cores), 100% GPU usage is easily reached, so the time gains were generally limited to doing a single init instead of 10; more importantly, it never took more than 10x the time to do 10x the work.
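As a rough check of items 2 and 3, the ratios can be recomputed from the GTX 980 Ti outputs posted below in this message. A minimal Python sketch; the helper name `scaling_summary` is hypothetical, the figures are copied from those logs, and (as in the logs) the two WFS runs cover different ranges:

```python
def scaling_summary(t_1x, t_10x, rate_1x, rate_10x):
    """Compare one 10x unit with ten 1x units run back to back.

    t_* are wall-clock seconds and rate_* the reported p/sec figures.
    Gaps between workunits added by the client are NOT modelled.
    """
    ratio = t_10x / t_1x                  # 10x time expressed in units of 1x time
    saved = 1.0 - t_10x / (10.0 * t_1x)   # fraction saved vs. ten separate 1x units
    rate_gain = rate_10x / rate_1x - 1.0  # relative change in p/sec
    return ratio, saved, rate_gain

# (1x clock s, 10x clock s, 1x p/sec, 10x p/sec) from the GTX 980 Ti logs in this post
runs = {
    "WFS": (26.40, 236.72, 92552827, 103248005),
    "WSS": (15.94, 148.22, 15683416, 16866230),
}

for name, figures in runs.items():
    ratio, saved, gain = scaling_summary(*figures)
    print(f"{name}: 10x took {ratio:.2f}x the 1x time, "
          f"saving {saved:.1%}, p/sec {gain:+.1%}")
```

With these figures it reports roughly 8.97x (10.3% saved) for WFS and 9.30x (7.0% saved) for WSS, in line with items 2 and 3.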

In real-world timing, on my old and hopefully soon-to-be-upgraded GTX580, a 10e11 WFS unit took ~510 seconds (8m30s), which is still rather fast for a card that is slower than a GTX1050. Lesser GPUs will certainly take much longer (my unoptimized 980 Ti is over 20x faster than my GT430 at WSS, but people will often run what they have, big or small, and that's OK), but compared to some BOINC projects, it's not a bad time at all. Heck, make the task size 100x and it's about the same time requirement as an AP27 task (in theory; I didn't test it), though I think 10x is a good place to start the discussion, especially considering midrange and lower-end GPU users. I know I'm probably ignoring the needs of CPU users, but as with PPS Sieve, the GPU-over-CPU advantage is already too great.

I like completing 70k tasks in a weeklong challenge (more if I didn't also love gaming); who wouldn't? But if I could do 8-12% more work in the same time frame with one fewer digit of tasks, wouldn't that be more worthwhile project-wise?

Sample raw comparison outputs of single ranges vs. 10x ranges on my 980 Ti follow. 104 Mp/s! On PRPNet, I can't even crack 100 consistently with 2+ tasks running at the same time:

WFS

>wwwwcl64.exe -v -p 591537500000000000 -P 591537600000000000 -T Wieferich
wwwwcl v2.2.5, a GPU program to search for Wieferich and WallSunSun primes
Platform 0 is a NVIDIA Corporation NVIDIA CUDA, version OpenCL 1.2 CUDA 8.0.0
Device 0 is a NVIDIA Corporation GeForce GTX 980 Ti
workGroupSize = 8650752 = 12288 * 32 * 22 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 16 threads
Allocated memory (prior to sieving): 1584 MB in CPU, 1584 MB in GPU
Sieve started: (cmdline) 591537500000000000 <= p < 591537600000000000

Sieve complete: 591537500000000001 <= p < 591537600000000000 2443719505 primes tested
Clock time: 26.40 seconds at at 92552827 p/sec.
Processor time: 241.24 sec. (25.43 init + 215.81 sieve).
Seconds spent in CPU and GPU: 49.85 (cpu), 194.01 (gpu)
Percent of time spent in CPU vs. GPU: 20.44 (cpu), 79.56 (gpu)
CPU/GPU utilization: 9.14 (cores), 1.00 (devices)
Percent of GPU time waiting for GPU: 39.67

>wwwwcl64.exe -v -p 587713200000000000 -P 587714200000000000 -T Wieferich
wwwwcl v2.2.5, a GPU program to search for Wieferich and WallSunSun primes
Platform 0 is a NVIDIA Corporation NVIDIA CUDA, version OpenCL 1.2 CUDA 8.0.0
Device 0 is a NVIDIA Corporation GeForce GTX 980 Ti
workGroupSize = 8650752 = 12288 * 32 * 22 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 16 threads
Allocated memory (prior to sieving): 1584 MB in CPU, 1584 MB in GPU
Sieve started: (cmdline) 587713200000000000 <= p < 587714200000000000
p=587714181312583151, 104.4M p/sec, 9.33 CPU cores, 98.1% done. ETA 31 Dec 18:31
Sieve complete: 587713200000000001 <= p < 587714200000000000 24440820358 primes tested
Clock time: 236.72 seconds at at 103248005 p/sec.
Processor time: 2206.95 sec. (23.49 init + 2183.45 sieve).
Seconds spent in CPU and GPU: 156.59 (cpu), 2107.95 (gpu)
Percent of time spent in CPU vs. GPU: 6.91 (cpu), 93.09 (gpu)
CPU/GPU utilization: 9.32 (cores), 1.00 (devices)
Percent of GPU time waiting for GPU: 49.41


WSS
>wwwwcl64.exe -v -p 235389880000000000 -P 235389890000000000 -T WallSunSun
wwwwcl v2.2.5, a GPU program to search for Wieferich and WallSunSun primes
setting 3072
Platform 0 is a NVIDIA Corporation NVIDIA CUDA, version OpenCL 1.2 CUDA 8.0.0
Device 0 is a NVIDIA Corporation GeForce GTX 980 Ti
workGroupSize = 2162688 = 3072 * 32 * 22 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 4 threads
Allocated memory (prior to sieving): 115 MB in CPU, 115 MB in GPU
Sieve started: (cmdline) 235389880000000000 <= p < 235389890000000000

Sieve complete: 235389880000000001 <= p < 235389890000000000 249992251 primes tested
Clock time: 15.94 seconds at at 15683416 p/sec.
Processor time: 18.35 sec. (3.14 init + 15.21 sieve).
Seconds spent in CPU and GPU: 0.71 (cpu), 51.90 (gpu)
Percent of time spent in CPU vs. GPU: 1.35 (cpu), 98.65 (gpu)
CPU/GPU utilization: 1.15 (cores), 1.00 (devices)
Percent of GPU time waiting for GPU: 56.40

>wwwwcl64.exe -v -p 235389880000000000 -P 235389980000000000 -T WallSunSun
wwwwcl v2.2.5, a GPU program to search for Wieferich and WallSunSun primes
setting 3072
Platform 0 is a NVIDIA Corporation NVIDIA CUDA, version OpenCL 1.2 CUDA 8.0.0
Device 0 is a NVIDIA Corporation GeForce GTX 980 Ti
workGroupSize = 2162688 = 3072 * 32 * 22 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 4 threads
Allocated memory (prior to sieving): 115 MB in CPU, 115 MB in GPU
Sieve started: (cmdline) 235389880000000000 <= p < 235389980000000000
p=235389975458280079, 16.93M p/sec, 1.04 CPU cores, 95.5% done. ETA 31 Dec 18:47
Sieve complete: 235389880000000001 <= p < 235389980000000000 2499971252 primes tested
Clock time: 148.22 seconds at at 16866230 p/sec.
Processor time: 157.58 sec. (3.12 init + 154.46 sieve).
Seconds spent in CPU and GPU: 4.95 (cpu), 522.39 (gpu)
Percent of time spent in CPU vs. GPU: 0.94 (cpu), 99.06 (gpu)
CPU/GPU utilization: 1.06 (cores), 1.00 (devices)
Percent of GPU time waiting for GPU: 63.11
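To compare runs without reading numbers off by hand, the relevant summary lines can be pulled out of these outputs with a few regular expressions. A minimal Python sketch; the patterns assume the exact line formats printed by wwwwcl v2.2.5 above (older versions print `Elapsed time:` instead of `Clock time:`, so they would not match), and `parse_wwww_log` is a hypothetical helper:

```python
import re

# Line formats copied from the wwwwcl v2.2.5 outputs above (assumed stable).
STARTED_RE = re.compile(r"Sieve started: (?:\(cmdline\) )?(\d+) <= p < (\d+)")
PRIMES_RE = re.compile(r"(\d+) primes tested")
CLOCK_RE = re.compile(r"Clock time: ([\d.]+) seconds")

def parse_wwww_log(text):
    """Return (range_width, primes_tested, clock_seconds) for one wwwwcl run."""
    started = STARTED_RE.search(text)
    primes = PRIMES_RE.search(text)
    clock = CLOCK_RE.search(text)
    if not (started and primes and clock):
        raise ValueError("expected summary lines not found")
    lo, hi = int(started.group(1)), int(started.group(2))
    return hi - lo, int(primes.group(1)), float(clock.group(1))

# Trimmed excerpts of the two WFS runs on the GTX 980 Ti.
small = """Sieve started: (cmdline) 591537500000000000 <= p < 591537600000000000
Sieve complete: 591537500000000001 <= p < 591537600000000000 2443719505 primes tested
Clock time: 26.40 seconds at at 92552827 p/sec."""
big = """Sieve started: (cmdline) 587713200000000000 <= p < 587714200000000000
Sieve complete: 587713200000000001 <= p < 587714200000000000 24440820358 primes tested
Clock time: 236.72 seconds at at 103248005 p/sec."""

w1, n1, t1 = parse_wwww_log(small)
w10, n10, t10 = parse_wwww_log(big)
print(f"range x{w10 // w1}, primes x{n10 / n1:.3f}, time x{t10 / t1:.2f}")
```

The range is exactly 10x wider and contains almost exactly 10x as many primes, while the clock time grows by under 9x.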

____________
Eating more cheese on Thursdays.

Michael Goetz (Project donor)
Volunteer moderator
Project scientist
Joined: 21 Jan 10
Posts: 9640
ID: 53948
Credit: 107,070,964
RAC: 95,795
321 LLR Amethyst: Earned 1,000,000 credits (1,169,719)Cullen LLR Amethyst: Earned 1,000,000 credits (1,157,331)ESP LLR Amethyst: Earned 1,000,000 credits (1,179,211)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,330,821)PPS LLR Amethyst: Earned 1,000,000 credits (1,254,873)PSP LLR Ruby: Earned 2,000,000 credits (2,632,269)SoB LLR Ruby: Earned 2,000,000 credits (2,153,211)SR5 LLR Turquoise: Earned 5,000,000 credits (6,048,315)SGS LLR Amethyst: Earned 1,000,000 credits (1,680,461)TRP LLR Amethyst: Earned 1,000,000 credits (1,183,026)Woodall LLR Amethyst: Earned 1,000,000 credits (1,145,077)321 Sieve (suspended) Silver: Earned 100,000 credits (200,576)Cullen/Woodall Sieve (suspended) Ruby: Earned 2,000,000 credits (4,170,256)Generalized Cullen/Woodall Sieve Ruby: Earned 2,000,000 credits (2,085,723)PPS Sieve Jade: Earned 10,000,000 credits (18,175,834)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Amethyst: Earned 1,000,000 credits (1,035,522)TRP Sieve (suspended) Ruby: Earned 2,000,000 credits (2,051,121)AP 26/27 Turquoise: Earned 5,000,000 credits (5,881,239)GFN Sapphire: Earned 20,000,000 credits (42,507,589)PSA Jade: Earned 10,000,000 credits (10,028,791)
Message 102912 - Posted: 1 Jan 2017 | 4:58:25 UTC

Yes, you're ignoring the CPU users.
____________

Please do not PM me with support questions. They will usually go unanswered. Ask on the forums instead. Thank you!

Grebuloner (Project donor)
Volunteer tester
Message 102914 - Posted: 1 Jan 2017 | 7:17:43 UTC - in response to Message 102912.

> Yes, you're ignoring the CPU users.


Very well, let me "unignore" the CPU users.

I ran a 1x WFS task on my 3.8 GHz Sandy Bridge 3930K; it finished fairly quickly, in 41 minutes. I'll multiply the time by about 2 to approximate a full CPU+HT load, so (rough calculation) a 10x CPU unit would take 13.5 or so hours on a fully loaded CPU, maybe more if I'm overly optimistic about scaling. That's not a bad time, and multiple 10x tasks could be completed in a 1-week challenge. I'll do a real 12-thread test tomorrow and see how close my calculation was. The program is sieve-based, so do older CPUs (e.g. Core 2, Phenom) hold up here as robustly as they do on other sieve projects?
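Written out, the estimate above is just three multiplications. A tiny Python sketch; the 2x full-load penalty is the guess stated in the post, and linear scaling with task size is an assumption:

```python
single_task_min = 41.0   # measured: one 1x WFS task alone on the 3930K
ht_penalty = 2.0         # guess: slowdown per task when all 12 threads are busy
size_factor = 10         # proposed 10x task size, assuming linear scaling

est_min = single_task_min * ht_penalty * size_factor
print(f"{est_min:.0f} min = {est_min / 60:.1f} h")   # 820 min = 13.7 h
```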

I imagine WSS would be shorter, seeing as WSS tasks complete in about half the time of WFS tasks on GPUs. That, of course, would also be completely ignoring the content of this thread, which suggests that CPU users are already out in the cold (I can't run it, either, so I have no data).

It's admittedly silly to use the PPS Sieve average runtimes to compare CPU and GPU times on a sieve (the ratio happens to be 52x; I doubt it's representative of anything, although old hardware is pretty good at sieving), but the concept is enough to make the point: PPS Sieve and AP27 task lengths are designed (as you've said repeatedly in other threads) to be run primarily by GPUs, which cut through the tasks like a fork through soup, not by CPUs. Why is it unreasonable to extend the same concept to PRPNet projects, where the user base is far smaller and the tasks can currently be completed in under a minute on a $100 GPU?

Tasks have time floors due to overheads in running and initialization, among other limitations. Increasing task length would help mitigate this waste in WFS/WSS with no configuration trickery or advanced knowledge needed (aside from wwww.ini, but that's not the issue here): "flip the switch," as it were, and throughput could be up 10% immediately. Of course, more data are needed. I'd love to see some midrange GPU owners chime in with tests, and some owners of older CPUs report runtime info as well.

Dave
Joined: 13 Feb 12
Posts: 1896
ID: 130544
Credit: 534,438,690
RAC: 842,662
321 LLR Turquoise: Earned 5,000,000 credits (5,008,573)Cullen LLR Ruby: Earned 2,000,000 credits (2,527,712)ESP LLR Ruby: Earned 2,000,000 credits (2,502,087)Generalized Cullen/Woodall LLR Ruby: Earned 2,000,000 credits (2,502,040)PPS LLR Ruby: Earned 2,000,000 credits (2,500,022)PSP LLR Ruby: Earned 2,000,000 credits (3,521,831)SoB LLR Ruby: Earned 2,000,000 credits (2,504,136)SR5 LLR Ruby: Earned 2,000,000 credits (2,500,985)SGS LLR Ruby: Earned 2,000,000 credits (2,555,530)TRP LLR Ruby: Earned 2,000,000 credits (2,500,365)Woodall LLR Ruby: Earned 2,000,000 credits (2,513,877)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (268,250)Generalized Cullen/Woodall Sieve Ruby: Earned 2,000,000 credits (4,846,098)PPS Sieve Double Silver: Earned 200,000,000 credits (255,555,510)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Jade: Earned 10,000,000 credits (10,000,133)TRP Sieve (suspended) Jade: Earned 10,000,000 credits (10,000,970)AP 26/27 Sapphire: Earned 20,000,000 credits (47,214,154)GFN Emerald: Earned 50,000,000 credits (85,416,417)PSA Emerald: Earned 50,000,000 credits (90,000,001)
Message 102916 - Posted: 1 Jan 2017 | 8:05:16 UTC

Personally, I never knew it could run on a CPU.

I like being able to blip the throttle & do a few filler tasks as required. Just yesterday I needed to do exactly 16 tasks to get my PSA total to a clean point. The concept of longer tasks still has merit, of course; how short will they become on, say, the upcoming 1080 Ti?!

GTX580 a) still rocks & b) ~58 secs for me with wwww.ini & no BOINC.

JeppeSN (Project donor)
Joined: 5 Apr 14
Posts: 566
ID: 306875
Credit: 7,695,120
RAC: 1,911
PPS LLR Bronze: Earned 10,000 credits (65,501)TRP LLR Bronze: Earned 10,000 credits (14,746)PSA Turquoise: Earned 5,000,000 credits (7,614,290)
Message 102922 - Posted: 1 Jan 2017 | 10:09:28 UTC - in response to Message 102911.

> What if we increased the unit size by 10x? instead of a 1e11/1e10 range, how about 10e11 for WFS and 10e10 for WSS? (10x credit, too, of course)

I think it is a good idea. /JeppeSN

Roger
Volunteer moderator
Volunteer developer
Volunteer tester
Project scientist
Joined: 27 Nov 11
Posts: 945
ID: 120786
Credit: 186,194,961
RAC: 65,390
321 LLR Amethyst: Earned 1,000,000 credits (1,130,571)Cullen LLR Amethyst: Earned 1,000,000 credits (1,040,598)ESP LLR Amethyst: Earned 1,000,000 credits (1,019,489)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,088,286)PPS LLR Amethyst: Earned 1,000,000 credits (1,002,303)PSP LLR Ruby: Earned 2,000,000 credits (2,420,512)SoB LLR Amethyst: Earned 1,000,000 credits (1,071,208)SR5 LLR Ruby: Earned 2,000,000 credits (2,035,801)SGS LLR Amethyst: Earned 1,000,000 credits (1,765,522)TRP LLR Amethyst: Earned 1,000,000 credits (1,465,760)Woodall LLR Amethyst: Earned 1,000,000 credits (1,059,108)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (207,387)Generalized Cullen/Woodall Sieve Ruby: Earned 2,000,000 credits (2,025,618)PPS Sieve Emerald: Earned 50,000,000 credits (50,460,532)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Ruby: Earned 2,000,000 credits (3,227,972)TRP Sieve (suspended) Turquoise: Earned 5,000,000 credits (5,021,659)AP 26/27 Jade: Earned 10,000,000 credits (14,146,457)GFN Emerald: Earned 50,000,000 credits (52,714,641)PSA Sapphire: Earned 20,000,000 credits (43,298,465)
Message 102928 - Posted: 1 Jan 2017 | 15:41:09 UTC - in response to Message 102911.

I will have to give the suggested ranges a try tomorrow.
Over the last 2 days I have been successfully testing WFS/WSS on an AMD R9 280X, Catalyst 14.12, with the following results from PRPNet:
Wieferich, wwwwcl v2.1.9
6 threads, 1024 blocks, 5x WU, 89% GPU, 98% CPU: 320 WUs in 5:13:09. That is an average of one work unit every 58.7 seconds.
6 threads, 2048 blocks, 5x WU, 93% GPU, 99% CPU: 630 WUs in 9:46:01. That is an average of one work unit every 55.8 seconds.

WallSunSun, wwwwcl v2.2.5
blocks=4096, threads=2, 94-99% GPU, 0-33% CPU: 300 Work Units in 3:16:19. That is an average of one work unit every 39.2 seconds.
2 Directories:
blocks=4096, threads=1, 85 Work Units in 1:41:14. That is an average of one work unit every 71.5 seconds.
blocks=4096, threads=1, 85 Work Units in 1:42:30. That is an average of one work unit every 72.4 seconds.
So combined, that is one work unit every 36.0 seconds. Running 2 directories is therefore faster.
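The per-WU averages and the combined rate follow from the totals above. The combined figure assumes the two directories ran concurrently on the same GPU over roughly the same period, so their throughputs (WUs per second) add. A small Python sketch; the helper names are hypothetical:

```python
def seconds(hms):
    """Convert an 'H:MM:SS' duration string to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return 3600 * h + 60 * m + s

def avg_per_wu(hms, work_units):
    """Average wall-clock seconds per work unit."""
    return seconds(hms) / work_units

# WallSunSun, two PRPNet directories running side by side on the same GPU
a = avg_per_wu("1:41:14", 85)
b = avg_per_wu("1:42:30", 85)

# Concurrent instances: throughputs (WUs per second) add, so combine the reciprocals
combined = 1.0 / (1.0 / a + 1.0 / b)
print(f"{a:.1f} s, {b:.1f} s -> combined {combined:.1f} s per WU")
```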

There is overhead from init time and from talking to the servers every 1-20 WUs. Wasted GPU time can be avoided by running multiple instances per GPU.
2 instances are better than 1, as shown above. I am not sure how far that scales up, though.

Michael Goetz (Project donor)
Volunteer moderator
Project scientist
Message 102931 - Posted: 1 Jan 2017 | 17:15:17 UTC - in response to Message 102922.
Last modified: 1 Jan 2017 | 17:18:07 UTC

> What if we increased the unit size by 10x? instead of a 1e11/1e10 range, how about 10e11 for WFS and 10e10 for WSS? (10x credit, too, of course)


We'll think about it. If you really want us to...

Bear in mind that us thinking about this might be a bad thing.

WWWW doesn't produce a usable residue, so double check comparisons are impossible. Not double checking GPU tasks is a HORRIBLE idea. I'm sure you're aware of how we feel about the need to double check results.

We know many of the wwww results are faulty because of the occasional false near-finds. We just have no way of detecting them, or determining how frequently they occur.

We've been gradually moving all of the projects off of PRPNet and onto BOINC mostly to get them into an environment where we can easily double check everything. It also brings vastly more participation, of course, but double checking is the primary reason.

WSS and Wieferich will not be moving to BOINC.

If someone came to us with the wwww app today, we wouldn't run it due to the lack of double checking.

Basically, we don't trust any of the results. To me, that makes it worthless.

If we start thinking about wwww, we're as likely to shut it off as we are to make changes.

EDIT: I wrote this post several hours ago, and was discussing it with the other admins before posting it. No decisions have been made, but my personal feelings that it's pointless to be running this project without double checking are shared by a lot of others. The genie is officially out of the bottle. Pandora's Box is wide open. We shall see what comes of this. (I'm not saying we're shutting it down, just that the project as it exists today doesn't make a lot of sense. It's way too early to be worrying about what comes next.)

Grebuloner (Project donor)
Volunteer tester
Message 102934 - Posted: 1 Jan 2017 | 18:11:17 UTC
Last modified: 1 Jan 2017 | 18:14:38 UTC

I ran a full CPU (with HT) load of WFS tasks, and the average time was about 68 minutes, a 1.66x increase over a single task running alone, which is much better than I guessed yesterday. I'd expect it to be a little higher on CPUs where the ratio of threads to memory channels is larger. I'm finding that calculation time on CPU scales roughly linearly with task size, since task initialization is minimal, so I'll still use the "worst case scenario" of 10x the time for 10x the size.

So, CPU-wise, a 10x task on every core would take just 680 minutes, or 11.3 hours, which is a really good time, all things considered. The credit is poor (just 2000 points per core per day, perhaps one of the lowest rates in all of PG outside of the x87 GFN tasks), but by the same token it's low for GPUs, too.
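The projection swaps the earlier 2x guess for the measured full-load average; as a sketch (linear scaling with task size is still assumed):

```python
single_min = 41.0      # one 1x WFS task running alone (previous post)
full_load_min = 68.0   # average 1x WFS task with every thread busy
size_factor = 10       # proposed task size; linear scaling assumed (worst case)

slowdown = full_load_min / single_min
projected_min = full_load_min * size_factor
print(f"full-load slowdown: {slowdown:.2f}x")                        # 1.66x
print(f"10x task: {projected_min:.0f} min = {projected_min / 60:.1f} h")  # 680 min = 11.3 h
```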

Edit: Just saw your post, Michael, and I definitely agree that the lack of a double check (or even the ability to verify results) is a problem. I think many of us would love to see an updated and fully usable wwww app, but I would understand if you just shut it off one day.

JeppeSN (Project donor)
Message 102942 - Posted: 1 Jan 2017 | 21:13:35 UTC - in response to Message 102931.

> WWWW doesn't produce a usable residue, so double check comparisons are impossible. Not double checking GPU tasks is a HORRIBLE idea. I'm sure you're aware of how we feel about the need to double check results.
>
> We know many of the wwww results are faulty because of the occasional false near-finds. We just have no way of detecting them, or determining how frequently they occur.


I have also thought about this major shortcoming. I hope one day someone will come up with a way to produce a "result" for a WWWW range. It could be the XOR of the last 64 bits of the A values of all the primes in that range (when the residue modulo p^2 is written as ±1 + A*p or 0 + A*p, that "A" is what I am talking about), or something similar. Then it would need to be implemented in each WWWW processor flavor, and at that point PrimeGrid ought to restart the entire search from zero, with double checking.
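To make the idea concrete, here is a hypothetical Python sketch of such a range checksum; it is not anything wwww implements. It assumes the normalizations 2^(p-1) ≡ 1 + A*p (mod p^2) for Wieferich and F(p - (p/5)) ≡ 0 + A*p (mod p^2) for Wall-Sun-Sun, where A = 0 exactly for a Wieferich (resp. Wall-Sun-Sun) prime. Other choices, such as 2^((p-1)/2) ≡ ±1, would give different A values. Trial division stands in for a real sieve, so it only suits tiny demo ranges, and all function names are made up:

```python
MASK64 = (1 << 64) - 1  # keep only the low 64 bits of each A

def is_prime(n):
    """Trial division: only suitable for tiny demo ranges, not real WWWW ranges."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def wieferich_a(p):
    """A with 2^(p-1) = 1 + A*p (mod p^2); A == 0 exactly for Wieferich primes."""
    r = pow(2, p - 1, p * p)
    return (r - 1) // p

def fib_pair(n, m):
    """Return (F(n) mod m, F(n+1) mod m) using fast doubling."""
    if n == 0:
        return 0, 1
    a, b = fib_pair(n >> 1, m)          # a = F(k), b = F(k+1), k = n // 2
    c = a * ((2 * b - a) % m) % m       # F(2k)
    d = (a * a + b * b) % m             # F(2k+1)
    return (d, (c + d) % m) if n & 1 else (c, d)

def wss_a(p):
    """A with F(p - (p/5)) = 0 + A*p (mod p^2); A == 0 exactly for Wall-Sun-Sun primes."""
    r = p % 5
    legendre = 0 if r == 0 else (1 if r in (1, 4) else -1)  # Legendre symbol (p/5)
    f, _ = fib_pair(p - legendre, p * p)
    return f // p

def range_checksum(lo, hi, a_func):
    """Hypothetical range 'residue': XOR of the low 64 bits of A over odd primes lo <= p < hi."""
    acc = 0
    for p in range(max(lo, 3), hi):
        if is_prime(p):
            acc ^= a_func(p) & MASK64
    return acc

print(range_checksum(1000, 1100, wieferich_a))
print(range_checksum(1000, 1100, wss_a))
```

Two independent runs over the same range should print the same value, which is exactly what a double check needs. XOR is cheap and order-independent, but a single flipped bit in one A would cancel with a matching flip elsewhere, so a real design might prefer a stronger hash.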

It would be cool.

/JeppeSN

Roger
Volunteer moderator
Volunteer developer
Volunteer tester
Project scientist
Message 102946 - Posted: 2 Jan 2017 | 0:11:08 UTC - in response to Message 102911.

Running Grebuloner's suggested single range vs. 10x range on my AMD R9 280X gave ratios of 9.68x for WFS and 9.36x for WSS.

Stopwatch time for WFS v2.1.9 below is the sum of "init" and "sieve" in the parentheses after "Elapsed time".
Stopwatch time for WSS v2.2.5 below is simply "Clock time".

WFS

>wwwwcl64.exe -v -p 591537500000000000 -P 591537600000000000 -T Wieferich
wwwwcl v2.1.9, a GPU program to search for Wieferich and WallSunSun primes
Platform 0 is an Advanced Micro Devices, Inc. AMD Accelerated Parallel Processing, version OpenCL 2.0 AMD-APP (1642.5)
Device 0 is an Advanced Micro Devices, Inc. Tahiti
workGroupSize = 4194304 = 2048 * 64 * 32 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 6 threads
Allocated memory (prior to sieving): 288 MB in CPU, 288 MB in GPU
Sieve started: 591537500000000000 <= p < 591537600000000000

Sieve complete: 591537500000000001 <= p < 591537600000000000 2443719505 primes tested
Elapsed time: 182.81 sec. (3.69 init + 52.49 sieve) at 43411320 p/sec.
Processor time: 306.79 sec. (16.97 init + 289.82 sieve).
Seconds spent in CPU and GPU: 126.52 (cpu), 65.70 (gpu)
Percent of time spent in CPU vs. GPU: 0.66 (cpu), 0.34 (gpu)
CPU/GPU utilization: 0.17 (cores), 0.09 (devices)

>wwwwcl64.exe -v -p 587713200000000000 -P 587714200000000000 -T Wieferich
wwwwcl v2.1.9, a GPU program to search for Wieferich and WallSunSun primes
Platform 0 is an Advanced Micro Devices, Inc. AMD Accelerated Parallel Processing, version OpenCL 2.0 AMD-APP (1642.5)
Device 0 is an Advanced Micro Devices, Inc. Tahiti
workGroupSize = 4194304 = 2048 * 64 * 32 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 6 threads
Allocated memory (prior to sieving): 288 MB in CPU, 288 MB in GPU
Sieve started: 587713200000000000 <= p < 587714200000000000
p=587713273849653499, 47.67M p/sec, 5.76 CPU cores, 7.4% done. ETA 02 Jan 07:5
p=587713436880537173, 47.19M p/sec, 5.76 CPU cores, 23.7% done. ETA 02 Jan 07:
p=587713555036130687, 46.85M p/sec, 5.76 CPU cores, 35.5% done. ETA 02 Jan 07:
p=587713670440993127, 46.79M p/sec, 5.78 CPU cores, 47.0% done. ETA 02 Jan 07:
p=587713776778918211, 46.89M p/sec, 5.77 CPU cores, 57.7% done. ETA 02 Jan 07:
p=587713897588352671, 46.91M p/sec, 5.77 CPU cores, 69.8% done. ETA 02 Jan 07:
p=587714017044026473, 46.86M p/sec, 5.77 CPU cores, 81.7% done. ETA 02 Jan 07:
p=587714116490963333, 45.67M p/sec, 5.60 CPU cores, 91.6% done. ETA 02 Jan 07:48
Sieve complete: 587713200000000001 <= p < 587714200000000000 24440820358 primes tested
Elapsed time: 1806.12 sec. (3.65 init + 540.34 sieve) at 44919133 p/sec.
Processor time: 2980.34 sec. (16.85 init + 2963.49 sieve).
Seconds spent in CPU and GPU: 1262.01 (cpu), 652.69 (gpu)
Percent of time spent in CPU vs. GPU: 0.66 (cpu), 0.34 (gpu)
CPU/GPU utilization: 0.17 (cores), 0.09 (devices)


WSS
>wwwwcl64.exe -v -p 235389880000000000 -P 235389890000000000 -T WallSunSun
wwwwcl v2.2.5, a GPU program to search for Wieferich and WallSunSun primes
Platform 0 is an Advanced Micro Devices, Inc. AMD Accelerated Parallel Processing, version OpenCL 2.0 AMD-APP (1642.5)
Device 0 is an Advanced Micro Devices, Inc. Tahiti
workGroupSize = 8388608 = 4096 * 64 * 32 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 2 threads
Allocated memory (prior to sieving): 224 MB in CPU, 224 MB in GPU
Sieve started: (cmdline) 235389880000000000 <= p < 235389890000000000

Sieve complete: 235389880000000001 <= p < 235389890000000000 249992251 primes tested
Clock time: 36.67 seconds at at 6816775 p/sec.
Processor time: 21.37 sec. (3.68 init + 17.69 sieve).
Seconds spent in CPU and GPU: 0.87 (cpu), 57.98 (gpu)
Percent of time spent in CPU vs. GPU: 1.48 (cpu), 98.52 (gpu)
CPU/GPU utilization: 0.58 (cores), 1.00 (devices)
Percent of GPU time waiting for GPU: 29.04

>wwwwcl64.exe -v -p 235389880000000000 -P 235389980000000000 -T WallSunSun
wwwwcl v2.2.5, a GPU program to search for Wieferich and WallSunSun primes
Platform 0 is an Advanced Micro Devices, Inc. AMD Accelerated Parallel Processing, version OpenCL 2.0 AMD-APP (1642.5)
Device 0 is an Advanced Micro Devices, Inc. Tahiti
workGroupSize = 8388608 = 4096 * 64 * 32 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 2 threads
Allocated memory (prior to sieving): 224 MB in CPU, 224 MB in GPU
Sieve started: (cmdline) 235389880000000000 <= p < 235389980000000000
p=235389888890408723, 7.283M p/sec, 0.53 CPU cores, 8.9% done. ETA 02 Jan 07:3
p=235389897950144033, 7.413M p/sec, 0.52 CPU cores, 18.0% done. ETA 02 Jan 07:
p=235389957013068167, 7.411M p/sec, 0.51 CPU cores, 77.0% done. ETA 02 Jan 07:
p=235389966069204251, 7.311M p/sec, 0.52 CPU cores, 86.1% done. ETA 02 Jan 07:
p=235389974961156503, 7.245M p/sec, 0.52 CPU cores, 95.0% done. ETA 02 Jan 07:25
Sieve complete: 235389880000000001 <= p < 235389980000000000 2499971252 primes tested
Clock time: 343.41 seconds at at 7279915 p/sec.
Processor time: 179.31 sec. (3.68 init + 175.63 sieve).
Seconds spent in CPU and GPU: 5.81 (cpu), 587.34 (gpu)
Percent of time spent in CPU vs. GPU: 0.98 (cpu), 99.02 (gpu)
CPU/GPU utilization: 0.52 (cores), 1.00 (devices)
Percent of GPU time waiting for GPU: 35.52

Profile Roger
Volunteer moderator
Volunteer developer
Volunteer tester
Project scientist
Joined: 27 Nov 11
Posts: 945
ID: 120786
Credit: 186,194,961
RAC: 65,390
321 LLR Amethyst: Earned 1,000,000 credits (1,130,571)Cullen LLR Amethyst: Earned 1,000,000 credits (1,040,598)ESP LLR Amethyst: Earned 1,000,000 credits (1,019,489)Generalized Cullen/Woodall LLR Amethyst: Earned 1,000,000 credits (1,088,286)PPS LLR Amethyst: Earned 1,000,000 credits (1,002,303)PSP LLR Ruby: Earned 2,000,000 credits (2,420,512)SoB LLR Amethyst: Earned 1,000,000 credits (1,071,208)SR5 LLR Ruby: Earned 2,000,000 credits (2,035,801)SGS LLR Amethyst: Earned 1,000,000 credits (1,765,522)TRP LLR Amethyst: Earned 1,000,000 credits (1,465,760)Woodall LLR Amethyst: Earned 1,000,000 credits (1,059,108)Cullen/Woodall Sieve (suspended) Silver: Earned 100,000 credits (207,387)Generalized Cullen/Woodall Sieve Ruby: Earned 2,000,000 credits (2,025,618)PPS Sieve Emerald: Earned 50,000,000 credits (50,460,532)Sierpinski (ESP/PSP/SoB) Sieve (suspended) Ruby: Earned 2,000,000 credits (3,227,972)TRP Sieve (suspended) Turquoise: Earned 5,000,000 credits (5,021,659)AP 26/27 Jade: Earned 10,000,000 credits (14,146,457)GFN Emerald: Earned 50,000,000 credits (52,714,641)PSA Sapphire: Earned 20,000,000 credits (43,298,465)
Message 102950 - Posted: 2 Jan 2017 | 3:06:50 UTC
Last modified: 2 Jan 2017 | 3:13:22 UTC

I figured out why the checksum is zero, for WSS at least.
The kernel doesn't save the A value unless the result is a Special Result.
In wallsunsun_kernel.h after:

"result[gid] = 0;\n" \
just add the line:
"quot[gid] = c21;\n" \
The A value will then always be available in ii_QuotientList[].

The checksum is currently just a simple sum, not an XOR:
il_CheckSum += ii_QuotientList[ii];
To print the checksum, add this line to ChildTestRange() in WallSunSun.cpp, after the for loop:
ip_WWWW->ReportSpecial("Final Checksum: %016llx", il_CheckSum);
For some reason the checksum isn't making it as far as LogStats() in WWWW.cpp, but that should be easy to debug.

Example output:
>wwwwcl64.exe -v -p 1217727803528000 -P 1217727803529000 -T WallSunSun
wwwwcl v2.2.5, a GPU program to search for Wieferich and WallSunSun primes
Platform 0 is an Advanced Micro Devices, Inc. AMD Accelerated Parallel Processing, version OpenCL 2.0 AMD-APP (1642.5)
Device 0 is an Advanced Micro Devices, Inc. Tahiti
workGroupSize = 8388608 = 4096 * 64 * 32 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 2 threads
Allocated memory (prior to sieving): 224 MB in CPU, 224 MB in GPU
Sieve started: (cmdline) 1217727803528000 <= p < 1217727803529000
1217727803528521 is a special instance (+0 -49 p)
Final Checksum: ffffffffffffa5f7

Sieve complete: 1217727803528001 <= p < 1217727803529000 31 primes tested
Clock time: 1.90 seconds at at 16 p/sec.
Processor time: 0.80 sec. (0.78 init + 0.02 sieve).
Seconds spent in CPU and GPU: 0.56 (cpu), 0.96 (gpu)
Percent of time spent in CPU vs. GPU: 36.97 (cpu), 63.03 (gpu)
CPU/GPU utilization: 0.42 (cores), 0.50 (devices)

Profile Roger
Volunteer moderator
Volunteer developer
Volunteer tester
Project scientist
Message 102952 - Posted: 2 Jan 2017 | 5:43:34 UTC

OK, I got LogStats() in WWWW.cpp working too. The threads report the checksum to the main App in a call to WriteCheckPoint(), which is currently only called if the sieve is interrupted. To fix it, add a call to WriteCheckpoint() before the "Sieve complete" message is printed in App.cpp Finish(). That and the additional line in wallsunsun_kernel.h are all you need to get the checksum reported for WSS.

Similarly, quot[gid] is not set in wieferich_kernel.h unless the result is a Special Result.



Copyright © 2005 - 2017 Rytis Slatkevičius and PrimeGrid community.
Generated 23 Nov 2017 | 9:21:09 UTC