PrimeGrid
drummers-lowrise
1) Message boards : Sieving : Factorial/Primorial Sieve Discussion (Message 60094)
Posted 2636 days ago by Profile BiBiProject donor
With an AMD Phenom II X6 1100T I managed 6 FPS WU with average time 14h 49m. This works out to 9.7 per day for my CPU. Given I can discover 82 new factors per day with my GPU, (see my last post), that means my GPU is currently 8.5x more effective than my CPU. So my 82.5 hours of GPU Sieving has saved 29 days of CPU time.

Of course, sieving effectiveness will decrease as the range increases, and FPS WUs will also take longer as the numbers being prime tested get larger. Not sure what the values would be at the limit. Does anyone know how I can test a 1e6!+1 WU? It would be interesting to know whether GPU sieving becomes more or less effective compared to CPU work. Factorials go up exceedingly fast.
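
The arithmetic in the quoted post can be checked with a short Python sketch. The variable names are illustrative, and it follows the post's own assumption that one factor found by the sieve is worth one avoided FPS work unit. With unrounded intermediate values the GPU advantage comes out slightly under the quoted 8.5x, which was derived from the rounded 9.7 per day.

```python
# Figures quoted in the post above; names are illustrative only.
CORES = 6                       # AMD Phenom II X6 1100T
WU_HOURS = 14 + 49 / 60         # average FPS work unit time: 14h 49m
GPU_FACTORS_PER_DAY = 82        # new factors per day from GPU sieving
GPU_HOURS_SIEVED = 82.5         # total GPU sieving time so far

# One work unit runs per core, so this is the CPU's daily output.
cpu_wu_per_day = CORES * 24 / WU_HOURS

# Assumption from the post: each factor removes one candidate that
# would otherwise have needed a full primality test.
gpu_vs_cpu = GPU_FACTORS_PER_DAY / cpu_wu_per_day

# CPU time that the same number of removed candidates would have cost.
cpu_days_saved = (GPU_HOURS_SIEVED / 24) * gpu_vs_cpu

print(f"CPU output:     {cpu_wu_per_day:.1f} WU/day")
print(f"GPU advantage:  {gpu_vs_cpu:.1f}x")
print(f"CPU days saved: {cpu_days_saved:.0f}")
```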

Here we go again: put TDP into the equation and it gives a better measure of effectiveness, in W/unit.

1100T TDP max 125 W
7970 TDP max 230 W

Intel is more efficient than AMD (in W/u)
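
As a rough sketch of what putting TDP into the equation looks like, the snippet below converts the two TDP figures and the throughput numbers from the quoted post (9.7 work units per day on the CPU, 82 factors per day on the GPU) into watt-hours per unit. It assumes constant draw at maximum TDP, which is an upper bound rather than a measurement, and it ignores the CPU core that the GPU sieve also occupies. The function name is hypothetical.

```python
def wh_per_unit(tdp_watts: float, units_per_day: float) -> float:
    """Upper-bound energy per unit in watt-hours, assuming draw at full TDP."""
    return tdp_watts * 24 / units_per_day


# TDP values from this post, throughput values from the quoted post.
cpu_wh = wh_per_unit(125, 9.7)   # 1100T, FPS work units per day
gpu_wh = wh_per_unit(230, 82)    # 7970, factors found per day

print(f"CPU: {cpu_wh:.0f} Wh per work unit")
print(f"GPU: {gpu_wh:.0f} Wh per factor")
print(f"GPU uses {cpu_wh / gpu_wh:.1f}x less energy per unit under these assumptions")
```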
2) Message boards : Sieving : Factorial/Primorial Sieve Discussion (Message 59827)
Posted 2645 days ago by Profile BiBiProject donor
Thought to give it a try on my 4650:

fsievecl v1.0.7, a GPU program to find factors numbers of the form n!+/-1
Platform 0 is an Advanced Micro Devices, Inc. AMD Accelerated Parallel Processing, version OpenCL 1.2 AMD-APP (937.2)
Device 0 is an Advanced Micro Devices, Inc. ATI RV730
OpenCL Error: Program build failure in call to clBuildProgram
"C:\Users\xxx\AppData\Local\Temp\OCLxxx.tmp.cl", line 88: error: function "atomic_inc" declared implicitly
int old = atomic_inc(counter);
^
1 error detected in the compilation of "C:\Users\xx\AppData\Local\Temp\OCLxxx.tmp.cl".
Internal error: clc compiler invocation failed.
3) Message boards : Sieving : Factorial/Primorial Sieve Discussion (Message 59775)
Posted 2648 days ago by Profile BiBiProject donor
If you d/l the .pfgw file and use that with the -i option to fsievecl, then it will not output factors for numbers that have already been factored.


I get this error when I use the file in the sieve post:
Fatal Error: Line 27 +1
is malformed
4) Message boards : Sieving : Factorial/Primorial Sieve Discussion (Message 59707)
Posted 2650 days ago by Profile BiBiProject donor
I also have the problem of this gobbling up an entire CPU core, no doubt due to the buggy driver behavior that BiBi posted a link to earlier.


The CPU waits for the GPU to finish before it can do anything. Apparently the Linux kernel is counting that wait time against the CPU. I can't do anything about that.


What I can do is start a sieve together with some LLR jobs, one for each core. I am curious how much CPU the fsieve job would take.

EDIT: fsieve takes 100%, it IS busy ...
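
The difference between a wait that is charged to the CPU and one that is not can be illustrated with a small Python sketch. This is not how fsievecl or the OpenCL driver is implemented; it only assumes, for illustration, that the host polls in a tight loop until the device is done. A timer thread stands in for the GPU, and the function name is hypothetical.

```python
import threading
import time


def cpu_seconds_while_waiting(spin: bool, wait_seconds: float = 0.3) -> float:
    """Return the process CPU time used while waiting for a simulated device."""
    done = threading.Event()
    # The timer stands in for a GPU kernel that finishes after wait_seconds.
    timer = threading.Timer(wait_seconds, done.set)
    start = time.process_time()
    timer.start()
    if spin:
        # Busy-wait: poll the flag in a tight loop, keeping one core occupied.
        while not done.is_set():
            pass
    else:
        # Blocking wait: the thread sleeps until it is signalled.
        done.wait()
    used = time.process_time() - start
    timer.join()
    return used


spin_cpu = cpu_seconds_while_waiting(spin=True)
blocked_cpu = cpu_seconds_while_waiting(spin=False)
print(f"busy-wait:     {spin_cpu:.3f} s of CPU time")
print(f"blocking wait: {blocked_cpu:.3f} s of CPU time")
```

Both calls take about the same wall-clock time, but only the busy-wait variant shows up as CPU load in a tool such as top.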
5) Message boards : Sieving : Factorial/Primorial Sieve Discussion (Message 59674)
Posted 2651 days ago by Profile BiBiProject donor
How many factors did we remove from the factor file in the first 50G with fsievecl?
6) Message boards : Project Staging Area : Call for wwwwcl beta testers (OpenCL) (Message 59434)
Posted 2658 days ago by Profile BiBiProject donor
You can run with more threads, but the GPU will continue to wait for work to do. The best balance for wwwwcl is to actually run two instances with fewer threads and more blocks.


With one instance and two threads I get 16.86M p/sec.
With two instances, each with two threads, I get 2x 15.10M p/sec.
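
A quick check of those two measurements, as a minimal Python sketch with illustrative variable names: each instance gets slower when two run together, but the combined rate is considerably higher.

```python
ONE_INSTANCE = 16.86       # M p/sec, one instance with two threads
PER_INSTANCE_DUAL = 15.10  # M p/sec, each of two instances with two threads

combined = 2 * PER_INSTANCE_DUAL
speedup = combined / ONE_INSTANCE
per_instance_loss = 1 - PER_INSTANCE_DUAL / ONE_INSTANCE

print(f"combined rate: {combined:.2f}M p/sec")
print(f"overall gain:  {speedup:.2f}x")
print(f"each instance is {per_instance_loss:.1%} slower than when running alone")
```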

Thanks!
7) Message boards : Project Staging Area : Call for wwwwcl beta testers (OpenCL) (Message 59430)
Posted 2658 days ago by Profile BiBiProject donor
EDIT:
using -t2 in 2.1.8 improves the speed by factor ~2
EDIT2:
found the problem to be in the new compile, rebuilding 2.1.8 makes it slow (yeah)
EDIT3: the -g option for debugging makes it slow; now switched back to 2.2.2
EDIT4: new 2.2.2 build without debugging
~/wwwwcl_2.2.2$ ./wwwwcl -p14528620000000000 -P14528630000000000 -TWieferich -t2 -b8000
wwwwcl v2.2.2, a GPU program to search for Wieferich and WallSunSun primes
Sieve started: (cmdline) 14528620000000000 <= p < 14528630000000000

Sieve complete: 14528620000000001 <= p < 14528630000000000 268704542 primes tested
Clock time: 14.05 seconds at at 19119407 p/sec.
Processor time: 26.31 sec. (0.38 init + 25.93 sieve).
Seconds spent in CPU and GPU: 0.72 (cpu), 10.86 (gpu)
Percent of time spent in CPU vs. GPU: 6.20 (cpu), 93.80 (gpu)
CPU/GPU utilization: 1.87 (cores), 0.77 (devices)

But processor usage is at 190%

EDIT5: removed the comparison between 2.1.8 and 2.2.2 because
the old 2.1.8 also uses CPU
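
The summary lines in the wwwwcl output above already explain the 190% figure. The Python sketch below copies four of those lines verbatim, pulls out the numbers with regular expressions, and recomputes the utilization values. The helper name and the patterns are illustrative, not part of wwwwcl.

```python
import re

# Lines copied from the wwwwcl v2.2.2 run shown above.
LOG = """\
Sieve complete: 14528620000000001 <= p < 14528630000000000 268704542 primes tested
Clock time: 14.05 seconds at at 19119407 p/sec.
Processor time: 26.31 sec. (0.38 init + 25.93 sieve).
Seconds spent in CPU and GPU: 0.72 (cpu), 10.86 (gpu)
"""


def grab(pattern: str) -> float:
    """Return the first number captured by pattern in LOG."""
    return float(re.search(pattern, LOG).group(1))


primes = grab(r"(\d+) primes tested")
clock = grab(r"Clock time: ([\d.]+)")
processor = grab(r"Processor time: ([\d.]+)")
gpu_busy = grab(r"\(cpu\), ([\d.]+) \(gpu\)")

cores_used = processor / clock   # CPU seconds charged per wall-clock second
gpu_util = gpu_busy / clock      # fraction of wall-clock time the GPU was busy
rate = primes / clock            # primes tested per second

print(f"cores used:      {cores_used:.2f}")
print(f"GPU utilization: {gpu_util:.2f}")
print(f"rate:            {rate / 1e6:.2f}M p/sec")
```

Processor time divided by clock time gives about 1.87 cores, which matches both the reported "CPU/GPU utilization" line and the roughly 190% processor usage seen with -t2.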
8) Message boards : Sieving : Factorial/Primorial Sieve Discussion (Message 59429)
Posted 2658 days ago by Profile BiBiProject donor
I can only guess that something about the build itself is causing the high CPU usage.

I guess you mean the build that the driver does on the cl code?

There are some differences between wwwwcl and fsievecl as I made a number of enhancements to the framework between those programs. I have updated wwwwcl to use the same framework as fsievecl, so after I release it, it would be interesting to see if it starts to exhibit the same behavior.

I will be happy to test it, I am very curious.

The main difference is between the two kernels. The fsievecl kernel requires more parameters and thus more memory than the wwwwcl kernel, but I can't imagine that would impact performance by much.

It could if the internal registers are full but they aren't as the profiler showed.

fsievecl is far more GPU intensive than wwwwcl, so fsievecl should use far less CPU, even if only one thread is used.

I could not get the GPU usage above the 47% that the profiler showed playing with the -b and -s options. Adding an extra thread only caused the CPU to be waiting for the GPU.
9) Message boards : Sieving : Factorial/Primorial Sieve Discussion (Message 59385)
Posted 2659 days ago by Profile BiBiProject donor
I checked wwwwcl and that application does not take >100% but only 20%, which seems normal when the CPU is only supporting the GPU calculations.


10) Message boards : Sieving : Factorial/Primorial Sieve Discussion (Message 59379)
Posted 2659 days ago by Profile BiBiProject donor
@rogue I was using -t1 -b2000 -s1000 in the run with the profiler. What the profiler shows is that the CPU is only used for 20 microseconds while executing the kernel, which takes 116.3 milliseconds. Strange, because top shows that the application uses the equivalent of a full core.
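
To put the two profiler figures side by side, here is a minimal Python sketch with illustrative names. It only restates the numbers from this post and compares the CPU share the profiler implies with the full core that top reports.

```python
CPU_US = 20          # CPU time per kernel launch, in microseconds (profiler)
KERNEL_US = 116_300  # kernel execution time, 116.3 ms in microseconds (profiler)
TOP_SHARE = 1.0      # top reports roughly one full core

expected_share = CPU_US / KERNEL_US   # CPU share if the host slept during the kernel
kernel_to_cpu = KERNEL_US / CPU_US    # how much longer the kernel runs than the CPU work

print(f"CPU share implied by profiler: {expected_share:.4%}")
print(f"kernel runs {kernel_to_cpu:.0f}x longer than the CPU work per launch")
print(f"top shows about {TOP_SHARE / expected_share:.0f}x more CPU than that")
```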

Could it be that I am experiencing the Nvidia OpenCL bug described here?


Copyright © 2005 - 2020 Rytis Slatkevičius and PrimeGrid community.
Generated 23 Feb 2020 | 10:53:24 UTC