Message boards :
Sieving :
OpenCL GFN Sieve now available, sieving GFN16 to b=2G, reopening GFN15 to public sieving
JimB Honorary cruncher Send message
Joined: 4 Aug 11 Posts: 920 ID: 107307 Credit: 989,158,850 RAC: 327,515
|
Thanks to the efforts of stream, we now have Windows and Linux OpenCL versions of the GFN sieving software. They also run well on nVidia hardware, and on my system they're faster than the CUDA version we've been using. There are 32-bit and 64-bit versions for Windows; 64-bit is faster than 32-bit. The Linux version is 64-bit only.
At the same time there's a new version of axn's GFN CUDA sieve. All programs will now return factors from b=0-2G instead of b=0-100M (that's 20 times as many factors being returned). The old program was already computing factors in that range, but they were not being output because at the time the Genefer program couldn't test that high. The new sieving program is now REQUIRED for GFN15 and GFN16 sieving.
Stream has been sieving GFN15 for b=2G since February. I'm reopening public sieving on GFN15 now, and we're starting over on GFN16 from p=20P. Sieving below p=20P needed special software and has been completed. I've reserved 20P-1000P for myself, as those will be the largest factor files. Those files are up to 50x larger than files from sieving that starts at 1000P.
The old GFN16, now labelled "GFN16 b=1-100M", is waiting on a single reservation to finish, then it will be hidden. Anyone who sieved there will still be able to see their old reservations. The new GFN16 sieving is a separate sieving subproject. While we won't need those extra candidates above b=100M for several years, it's time to start sieving for them now. Does this mean we're redoing the previous GFN16 sieving from b=1-100M? Yes, but it's unavoidable.
Actual file sizes for GFN sieving for b=1-2G p=20P-100P:
416,089,582 f16_20P_100P.7z
1,656,151,469 f16_20P_100P.txt
I suggest that everyone use the new programs for all GFN sieving. Anyone using the old program to sieve GFN15 or GFN16 will not get credit. There's a huge red warning on the "how to sieve" page you see when reserving a range. Please consider uploading factor files in .zip, .7z or .rar format to save yourself bandwidth. The server will currently not allow an upload greater than 32MB.
For GFNs 17 and above, genefer testing doesn't progress quickly enough for us to foresee needing these additional candidates. Nonetheless, new sieving should use the newer programs, because they do exactly the same work, just without suppressing output of factors where b>100M. If you use the same program for every GFN n, you'll never have to worry about picking the right one. Please note that credit rates will not be changing: the work being done has not changed in any way, there's just a larger output file to upload.
Testing has shown that to sieve GFN15 effectively with the OpenCL program (use 100% of the GPU), you need to run 3 separate jobs at the same time. For GFN16, 2 or 3 jobs are needed. And I'm talking about full non-HT cores. Your mileage may vary.
And to be clear about the CUDA version (which I'm not sure anyone will be running any more as the OpenCL seems faster), GFNSvCUDA-0_7-win32.exe is the old program and GFNSvCUDA-0_7_1-win32.exe is the new one. They run at the same speed as each other, so please use the new one (which MUST be used for GFN15 and GFN16).
Edit: Please don't send me offers to take over the overdue reservation on GFN16 b=1-100M as I've already run it myself.
Edit 2: The reason the old and new GFN16 show different credit rates is that the program is faster at lower P values, hence the credit is lower. The older GFN16 had that same credit rate when it was at that P level. The rate shown automatically adjusts based on how high we've sieved. | |
|
Rafael Volunteer tester Send message
Joined: 22 Oct 14 Posts: 911 ID: 370496 Credit: 547,778,147 RAC: 423,598
|
Does it work on Intel iGPUs? If so, are there any weird requirements (like AP27's 1.4gb of vRAM)?
Also, has the app been fully tested on various different hardware to make sure there's no random bug going on, or was it just a handful of GPUs?
EDIT: Here's what I've seen on my GTX 970 (all for n=22).
1- The new CUDA APP runs a bit slower than the previous one. Around 5~6 slower.
2- The new OCL apps both run faster than the CUDA one.
3- Both OCL apps require a full CPU core, whereas both CUDA apps require almost no CPU time (for n=22 anyway). So while the new apps are faster, there's still a compelling reason to use the old ones.
4- The OCL 64 app was actually 1P/day slower. Just a bit, but it was consistently slower.
5- B9 is still the best choice. Now, you could use B12 or 13 and get that extra 3P/day (out of 100), but the screen lag is just unbearable if you'll be using your GPU as a daily driver, so I would say it's only worth it for dedicated crunchers / when you leave your PC on for the night while you sleep. | |
|
JimB Honorary cruncher Send message
Joined: 4 Aug 11 Posts: 920 ID: 107307 Credit: 989,158,850 RAC: 327,515
|
There's an extra flag I forgot to talk about: -w1
Use it on GFN21 and GFN22, it will cause the program to use a lot less CPU on nVidia GPUs. It will not help on GFN 15, 16 or 17.
As far as Intel iGPU, I neither know nor care. I doubt it. | |
|
Rafael Volunteer tester Send message
Joined: 22 Oct 14 Posts: 911 ID: 370496 Credit: 547,778,147 RAC: 423,598
|
There's an extra flag I forgot to talk about: -w1
Use it on GFN21 and GFN22, it will cause the program to use a lot less CPU on nVidia GPUs. It will not help on GFN 15, 16 or 17.
Well, it does reduce CPU usage, at a 2P/day cost. Worth it to keep the core "free" for things such as LLR testing, but I figured I'd mention the small loss anyway. Too bad, I was getting to the 100P/day mark, now I'm stuck at 98. Mergh....
As far as Intel iGPU, I neither know nor care. I doubt it.
Okay, let's change the question then. Is the app vRAM bandwidth intensive? | |
|
Rafael Volunteer tester Send message
Joined: 22 Oct 14 Posts: 911 ID: 370496 Credit: 547,778,147 RAC: 423,598
|
News on the Intel iGPU front: it works.... kinda. The app runs and seems to be returning factors as expected. But it throws this message after the 79-bit core part and right before the calculations:
Build Log (119 bytes):
1:70:2: warning: No built-in carry flag support known, using plain C code
fcl build 1 succeeded.
bcl build succeeded.
Jim, could you quickly explain it for a noob like me?
...
Hm?
What is it?
Oh yeah, that's right, performance... I almost forgot about it. It was a mighty 1.5P/day for my HD 530 (Skylake) with 3000MHz RAM. Earth-shattering numbers, I know. Not too good at first glance, but it's something nonetheless. I've yet to see whether this affects LLR testing due to bandwidth limitations; that's something I'll have to check later, for I don't have the time right now. But as far as CPU usage goes, it was minimal, so this could be promising.
PS: Guyz, don't give up on intel just because it's not as powerful as a dGPU. Please. That's unused potential just waiting to be awakened. | |
|
|
PS: Guyz, don't give up on intel just because it's not as powerful as a dGPU. Please. That's unused potential just waiting to be awakened.
I agree with you fully. Hopefully we can use our Intel iGPU's for AP27. | |
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1957 ID: 352 Credit: 6,125,985,225 RAC: 2,246,678
|
Testing has shown that to sieve GFN15 effectively with the OpenCL program (use 100% of the GPU), you need to run 3 separate jobs at the same time. For GFN16, 2 or 3 jobs are needed. And I'm talking about full non-HT cores. Your mileage may vary.
GTX 1070/Skylake.
Each instance is doing about 21P/day; OCLW64 is the way to go.
After a 4th job, the GPU started to be saturated, with each instance still doing 20P/day.
Will test GFN22 later.
Good effort, thanks to those involved.
____________
My stats | |
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1957 ID: 352 Credit: 6,125,985,225 RAC: 2,246,678
|
Also getting warning on Fury Nano.
(Doing around 19.5P/day on my i5-3570 single instance).
c:\temp\GFN_Sieve>gfnsvocl_w64_2G.exe 15 14022 14032 B12
GFNSvCUDA+ v0.7.1 (c) 2015 Anand Nair (anand.s.nair AT gmail)
OpenCL port by Roman Trunov (stream AT proxyma ru)
GFN Sieve for k^32768+1 [k == 2 to 2000000000]
Using factor file 'f15_14022P_14032P.txt'
Using checkpoint file 'c15_14022P_14032P.txt'
Found OCL platform "AMD Accelerated Parallel Processing" by "Advanced Micro Devices, Inc."
GPU devices on platform: 1
D0: "Fiji"
Using 64 bit core.
Build Log (193 bytes):
C:\Users\ADMINI~1\AppData\Local\Temp\\OCL6036T5.cl:63:2: warning: No built-in carry flag support known, using plain C code
#warning No built-in carry flag support known, using plain C code
^
14022000960238256129 | 60237696^32768+1
14022001375917506561 | 859700300^32768+1
____________
My stats | |
|
Crun-chi Volunteer tester
 Send message
Joined: 25 Nov 09 Posts: 3232 ID: 50683 Credit: 151,439,306 RAC: 133,570
|
I ran it on my R280 and got about 37P on my GFN16 reservation, but if I start a second instance each runs at half speed, so there is no benefit from a second instance. GPU load with one instance is 99%.
____________
92*10^1585996-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! | |
|
JimB Honorary cruncher Send message
Joined: 4 Aug 11 Posts: 920 ID: 107307 Credit: 989,158,850 RAC: 327,515
|
I did not write this software, so I can't answer questions about it. Stream is away for the rest of July and does not have internet access to answer questions until he gets back. Technical questions about this program will have to wait until after that.
On GFN16, I'm getting about 85P/day with my first-generation Titan running two instances and 70P/day on my GTX 570 also running two instances. I think I could squeeze out more with a third instance, but the cores are already committed elsewhere. | |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester Send message
Joined: 1 Mar 14 Posts: 1032 ID: 301928 Credit: 543,593,602 RAC: 8,480
|
The warning about "plain C code" was left in for debugging purposes and can be ignored. It only means that the pure C version of the code has been used, and performance then depends on the quality of the compiler (video driver).
The number of CPU cores required depends on the speed of your video card. The numbers 3 and 2 were based on 750 Ti output (55 P/day).
| |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester Send message
Joined: 1 Mar 14 Posts: 1032 ID: 301928 Credit: 543,593,602 RAC: 8,480
|
I'm writing from my phone (hard to type) and only when I have WiFi nearby (not every day), so please hold big questions until my return. | |
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1957 ID: 352 Credit: 6,125,985,225 RAC: 2,246,678
|
Thanks for feedback, Stream, and enjoy wherever you are.
____________
My stats | |
|
|
It does work on Intel GPU. On my HD Graphics 6000 it was going at about 8P/day, which is fine for me.
Also, the Linux version interestingly doesn't work on Mac, but the Windows version does work using Wine.
____________
1 PPSE (+2 DC) & 5 SGS primes | |
|
|
I am able to report that I was able to sieve successfully on my LINUX MINT 18 64-bit system.
“sudo ./gfnsvocl_linux_x86_64 16 1131 1132
GFNSvCUDA+ v0.7.1 (c) 2015 Anand Nair (anand.s.nair AT gmail)
OpenCL port by Roman Trunov (stream AT proxyma ru)
GFN Sieve for k^65536+1 [k == 2 to 2000000000]
Using factor file 'f16_1131P_1132P.txt'
Using checkpoint file 'c16_1131P_1132P.txt'
Found OCL platform "NVIDIA CUDA" by "NVIDIA Corporation"
GPU devices on platform: 1
D0: "GeForce 9800 GT"
size_t on device 0: 32 bits, host: 64 bits
Using 63 bit core.
1131000120986173441 | 570060946^65536+1
…”
However, it seems that my particular system is too weak to sieve manually in any reasonable way, as it hung up several times and I had to restart again and again. I also was not able to sieve my last reservation: “22 153940 153941”. I got an error message right at the start, but I think it is not a problem with the program; it seems to be my hardware.
| |
|
|
Question: Has anybody manually sieved with the new OpenCL GFN sieve program on recent AMD cards? RX480/479/460 or R 390/380? Linux or Windows? Experiences? Thanks! | |
|
|
@stream-any chance you could compile mac versions for the genefer sieving? | |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester Send message
Joined: 1 Mar 14 Posts: 1032 ID: 301928 Credit: 543,593,602 RAC: 8,480
|
@stream-any chance you could compile mac versions for the genefer sieving?
Nope. Not a single mac in sight. But it's open source - https://github.com/stream1972/gfnsieve_ocl
Published with permission of original author. Both old-style cuda and new opencl versions can be built from single source using corresponding makefiles/targets. Patches and makefiles for new platforms are always welcome.
| |
|
|
@stream-any chance you could compile mac versions for the genefer sieving?
Nope. Not a single mac in sight. But it's open source - https://github.com/stream1972/gfnsieve_ocl
Published with permission of original author. Both old-style cuda and new opencl versions can be built from single source using corresponding makefiles/targets. Patches and makefiles for new platforms are always welcome.
Mac build is available at http://www.epcc.ed.ac.uk/~ibethune/files/gfnsvocl_macos_x86_64.tar.gz. Please test it first as I've done nothing except to check that it *runs*. If there is a reference range available that would be best.
I've created a pull request into stream's repo now.
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 14009 ID: 53948 Credit: 428,222,075 RAC: 1,089,553
|
Mac build is available at http://www.epcc.ed.ac.uk/~ibethune/files/gfnsvocl_macos_x86_64.tar.gz. Please test it first as I've done nothing except to check that it *runs*. If there is a reference range available that would be best.
I've created a pull request into stream's repo now.
- Iain
For testing, pick any small GFN 15 or GFN 16 range that's already been done and Jim can compare the results to the original run.
____________
My lucky number is 75898524288+1 | |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester Send message
Joined: 1 Mar 14 Posts: 1032 ID: 301928 Credit: 543,593,602 RAC: 8,480
|
Please test it first as I've done nothing except to check that it *runs*. If there is a reference range available that would be best.
There should be three reference ranges, one for each core; I've used 2001P-2002P, 9301P-9302P, 18600P-18601P. You can run them and send me the results, or I'll give you a link to these reference files a bit later.
| |
|
|
Please test it first as I've done nothing except to check that it *runs*. If there is a reference range available that would be best.
There should be three reference ranges, one for each core; I've used 2001P-2002P, 9301P-9302P, 18600P-18601P. You can run them and send me the results, or I'll give you a link to these reference files a bit later.
Results from those three ranges are at http://www.epcc.ed.ac.uk/~ibethune/files/gfnsvocl_tests.tar.gz. As expected, they ran with the 63,64 and 79 bit cores, respectively. The only possible issue I observed (assuming the results are correct), is that for the latter two, I got a warning:
Build Log (159 bytes):
<program source>:72:2: warning: No built-in carry flag support known, using plain C code
#warning No built-in carry flag support known, using plain C code
^
Cheers
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
Please test it first as I've done nothing except to check that it *runs*. If there is a reference range available that would be best.
There should be three reference ranges, one for each core; I've used 2001P-2002P, 9301P-9302P, 18600P-18601P. You can run them and send me the results, or I'll give you a link to these reference files a bit later.
Running those three ranges for n=15 now.
Same cores as Iain, same warnings. | |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester Send message
Joined: 1 Mar 14 Posts: 1032 ID: 301928 Credit: 543,593,602 RAC: 8,480
|
Please test it first as I've done nothing except to check that it *runs*. If there is a reference range available that would be best.
There should be three reference ranges, one for each core; I've used 2001P-2002P, 9301P-9302P, 18600P-18601P. You can run them and send me the results, or I'll give you a link to these reference files a bit later.
Results from those three ranges are at http://www.epcc.ed.ac.uk/~ibethune/files/gfnsvocl_tests.tar.gz.
Oops. Forgot to mention that my reference files are from GFN-21; it uses almost no CPU time and completes faster. Never mind, by luck I've sieved two of the three ranges on GFN-15 too and still have those factors (all except 2001-2002). I'll compare them later at home.
The only possible issue I observed (assuming the results are correct), is that for the latter two, I got a warning:
Build Log (159 bytes):
<program source>:72:2: warning: No built-in carry flag support known, using plain C code
#warning No built-in carry flag support known, using plain C code
^
Which GPU do you have? This message is normal for ATI/Intel/whatever cards, where the core falls back to slower plain C code. But you should not see it on NVIDIA, which should pick up the separate, faster version of the core with inline assembly.
| |
|
|
Which GPU do you have? This message is normal for ATI/Intel/whatever cards, where the core falls back to slower plain C code. But you should not see it on NVIDIA, which should pick up the separate, faster version of the core with inline assembly.
Found OCL platform "Apple" by "Apple"
GPU devices on platform: 2
D0: "AMD Radeon HD - FirePro D700 Compute Engine"
size_t on device 0: 32 bits, host: 64 bits
D1: "AMD Radeon HD - FirePro D700 Compute Engine"
size_t on device 1: 32 bits, host: 64 bits
So I think that's expected, then.
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! | |
|
|
I got the same results as Iain, except for the 2001P-2002P, where diff complained.
From Iain's file (around line 9360 or so)
2001799268747968513 | 371057386^32768+1
2001799267012509697 | 1507867900^32768+1
2001799263202902017 | 1125660446^32768+1
From my file
2001799268747968513 | 371057386^32768+1
2001799263202902017 | 1125660446^32768+1
2001799267012509697 | 1507867900^32768+1
They appear to be the same, just ordered differently. | |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester Send message
Joined: 1 Mar 14 Posts: 1032 ID: 301928 Credit: 543,593,602 RAC: 8,480
|
They appear to be the same, just ordered differently.
Yes, it's recommended to sort files before diffing. Output order from GPU is unpredictable.
| |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester Send message
Joined: 1 Mar 14 Posts: 1032 ID: 301928 Credit: 543,593,602 RAC: 8,480
|
Please test it first as I've done nothing except to check that it *runs*. If there is a reference range available that would be best.
There should be three reference ranges, one for each core; I've used 2001P-2002P, 9301P-9302P, 18600P-18601P. You can run them and send me the results, or I'll give you a link to these reference files a bit later.
Results from those three ranges are at http://www.epcc.ed.ac.uk/~ibethune/files/gfnsvocl_tests.tar.gz.
Finally, I compared them with my files and they all matched.
| |
|
|
The only possible issue I observed (assuming the results are correct), is that for the latter two, I got a warning:
Build Log (159 bytes):
<program source>:72:2: warning: No built-in carry flag support known, using plain C code
#warning No built-in carry flag support known, using plain C code
^
Which GPU you have? This message is normal for ATI/Intel/whatever cards where core falls back to slower plain C source code. But you should not see this on NVIDIA which should pick up separate faster version of the core with inline assembly.
Nvidia. GTX 980. Using most recent Nvidia web drivers.
| |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester Send message
Joined: 1 Mar 14 Posts: 1032 ID: 301928 Credit: 543,593,602 RAC: 8,480
|
The only possible issue I observed (assuming the results are correct), is that for the latter two, I got a warning:
Build Log (159 bytes):
<program source>:72:2: warning: No built-in carry flag support known, using plain C code
#warning No built-in carry flag support known, using plain C code
^
Which GPU do you have? This message is normal for ATI/Intel/whatever cards, where the core falls back to slower plain C code. But you should not see it on NVIDIA, which should pick up the separate, faster version of the core with inline assembly.
Nvidia. GTX 980. Using most recent Nvidia web drivers.
That seems to be some broken quoting, so let me ask again: you have an NVIDIA GPU and still see this warning?
If so, I'm afraid you'll have to find out how to deal with it yourself, because I have no Macs anywhere in sight. The decision is made during the compilation phase of the OpenCL core by checking
#ifdef __NV_CL_C_VERSION
// we have NVIDIA, use inline assembly
#else
// use plain C code
#endif
__NV_CL_C_VERSION is defined by all PC NVIDIA drivers. You'll have to find out what the hell is going on on the Mac and which predefined symbol can be used to distinguish NVIDIA from other drivers.
| |
|
|
I'm trying to understand why the OpenCL program uses 100% of my CPU. In fact, it seems to take four instances of the OpenCL program to fully use my NVIDIA GPU, each one using 100% of its own CPU core.
I'm currently running on GFN15, but even GFN22 uses 60% of a CPU core per process. If one process runs at 10P per day, two or three run at 10P per day each, and four run at about 9P per day, on one machine.
Machines:
Windows 10, 64 bit, Skylake CPU, GTX 970, using the 64 bit OpenCL program
Linux, 64 bit, GTX 470, using the OpenCL program. Same behavior when compiling from your Git repo, stream.
Linux, 64 bit, Tesla M2050 accelerator, compiled from Git repo (the binary distribution was built with a libc that is too new for this system)
All three require multiple processes to fully use a single GPU, and the processes peg a CPU core at 100% each.
I notice that you're using `usleep` in at least one place, and I don't see any infinite loops anywhere - so why is this program using 100% CPU? If this isn't happening on your systems is there anything I can do to help you find the problem with mine?
____________
| |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester Send message
Joined: 1 Mar 14 Posts: 1032 ID: 301928 Credit: 543,593,602 RAC: 8,480
|
I'm trying to understand why the OpenCL program uses 100% of my CPU. In fact, it seems to take four instances of the OpenCL program to fully use my NVIDIA GPU, each one using 100% of its own CPU core.
There are two points. First, GFN-15/16 indeed requires a lot of CPU power. If you can build from the git repo, look for the xxxx_sim program which is also built. This is a benchmark/simulation tool which does everything except the GPU part. Try it at different GFNs to measure the performance of your CPU, and compare your numbers with my benchmarks in the README file. I used an i7 Haswell overclocked to 4000 MHz, and your Skylake should be at least close even if not overclocked. The CPU part does a lot of complex and slow calculations on 96- and 192-bit numbers which are beyond my understanding, so I didn't try to optimize them beyond some tweaking of compiler optimization options.
Second is a problem in the NVIDIA drivers. Each GFN is two times faster than the previous one, so GFN-22 should not use noticeable CPU time. Make sure that you've enabled the NVIDIA workaround with the W1 option. The NVIDIA driver likes to use 100% CPU; this is a problem for many applications. When an asynchronous OpenCL function requires synchronization, instead of working asynchronously (or sleeping) the driver enters an infinite loop waiting for the function to complete, consuming 100% of a CPU core. The sieving program has a very simple workaround: it checks kernel completion status itself in a loop with a usleep inside. This is enabled by the W1 option; don't forget to use it, at least on NVIDIA. Note that running two or more copies on NVIDIA, even with W1, may alas still trigger the infinite loop inside the driver. But under normal circumstances, running one copy of GFN-22 with the W1 option, you should get 100% GPU load and 2-3% CPU. This problem does not exist on ATI/AMD.
| |
|
Crun-chi Volunteer tester
 Send message
Joined: 25 Nov 09 Posts: 3232 ID: 50683 Credit: 151,439,306 RAC: 133,570
|
And if you use the W1 option you will lose 1 to 3 P per day. Sieving GFN22 on my GTX 750 Ti is optimal: I leave one core for the whole OS plus sieving, and can use the rest of the cores for other things.
____________
92*10^1585996-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! | |
|
Roger Volunteer developer Volunteer tester Send message
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
|
Trying GFN sieving for the first time. I am not having much luck. Does it work on AMD?
Running an AMD GIGABYTE R9 280X GPU in Windows 7, Catalyst 14.12
>gfnsvocl_w64_2G.exe 16 49234 49239
GFNSvCUDA+ v0.7.1 (c) 2015 Anand Nair (anand.s.nair AT gmail)
OpenCL port by Roman Trunov (stream AT proxyma ru)
GFN Sieve for k^65536+1 [k == 2 to 2000000000]
Using factor file 'f16_49234P_49239P.txt'
Using checkpoint file 'c16_49234P_49239P.txt'
Found OCL platform "AMD Accelerated Parallel Processing" by "Advanced Micro Devi
ces, Inc."
GPU devices on platform: 1
D0: "Tahiti"
size_t on device 0: 32 bits, host: 64 bits
Using 79 bit core.
Error building cl program on device 0
Error code -11, message: Program build failure
Build Log (922 bytes):
"C:\Users\AppData\Local\Temp\OCL3020T5.cl", line 70: warning:
#warning directive: No built-in carry flag support known, using
plain C code
#warning No built-in carry flag support known, using plain C code
^
"C:\Users\AppData\Local\Temp\OCL3020T5.cl", line 90: error:
identifier "init_fac_shift" is undefined
u32 si = index >> init_fac_shift;
^
"C:\Users\AppData\Local\Temp\OCL3020T5.cl", line 113: error:
identifier "param_count" is undefined
count = param_count;
^
"C:\Users\AppData\Local\Temp\OCL3020T5.cl", line 132: error:
identifier "bmax" is undefined
if( ((i1|i2) == 0) && (i0 < bmax) ) {
^
3 errors detected in the compilation of "C:\Users\AppData\Local\Temp\OC
L3020T5.cl".
Frontend phase failed compilation.
GPU Error -11 @ 2582: Program build failure | |
|
Rafael Volunteer tester Send message
Joined: 22 Oct 14 Posts: 911 ID: 370496 Credit: 547,778,147 RAC: 423,598
|
Trying GFN sieving for the first time. I am not having much luck. Does it work on AMD?
Supposedly, it works. Try reinstalling the driver; I recently updated my Windows to the latest version and it broke my OpenCL support until I reinstalled it. | |
|
stream Volunteer moderator Project administrator Volunteer developer Volunteer tester Send message
Joined: 1 Mar 14 Posts: 1032 ID: 301928 Credit: 543,593,602 RAC: 8,480
|
Trying GFN sieving for the first time. I am not having much luck. Does it work on AMD?
It looks like a bug in a specific version of the video driver. It ignored all the macro definitions specified in the "command line options" during compilation of the source code. These definitions MUST be supported according to the OpenCL standard, so this is definitely a driver bug.
There's not much that can be done here. Try reinstalling the driver, or use a different version (even an older one).
| |
|
Roger Volunteer developer Volunteer tester Send message
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
|
Trying GFN sieving for the first time. I am not having much luck. Does it work on AMD?
It looks like a bug in a specific version of the video driver. It ignored all the macro definitions specified in the "command line options" during compilation of the source code. These definitions MUST be supported according to the OpenCL standard, so this is definitely a driver bug.
There's not much that can be done here. Try reinstalling the driver, or use a different version (even an older one).
It is an older driver. I don't like upgrading all the time because it can break apps that previously worked, and I'm unfortunately too busy to play games anymore (work/wife/kids OMG). I'll upgrade to the newest Catalyst driver and report back. Not a Friday-night activity.
Cheers,
Roger | |
|
|
I just tested on a 280x running 17.8.2 drivers, and it works fine. | |
|
Roger Volunteer developer Volunteer tester Send message
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
                    
|
Installed the latest Catalyst 17.9.1 graphics driver and it seems to have done the trick.
>gfnsvocl_w64_2G.exe 16 49234 49239
GFNSvCUDA+ v0.7.1 (c) 2015 Anand Nair (anand.s.nair AT gmail)
OpenCL port by Roman Trunov (stream AT proxyma ru)
GFN Sieve for k^65536+1 [k == 2 to 2000000000]
Using factor file 'f16_49234P_49239P.txt'
Using checkpoint file 'c16_49234P_49239P.txt'
Found OCL platform "AMD Accelerated Parallel Processing" by "Advanced Micro Devi
ces, Inc."
GPU devices on platform: 1
D0: "Tahiti"
size_t on device 0: 32 bits, host: 64 bits
Using 79 bit core.
Build Log (244 bytes):
"C:\Users\AppData\Local\Temp\OCL2832T1.cl", line 70: warning:
#warning directive: No built-in carry flag support known, using
plain C code
#warning No built-in carry flag support known, using plain C code
^
49234000255544983553 4773.9/s (1.2P/day) Found 0 ETA 97h53m | |
|
Roger Volunteer developer Volunteer tester Send message
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
                    
|
I tried different block sizes:
B7 one instance - 1.2P/day no lag
B8 one instance - 2.1P/day small lag
B9 one instance - 3.2P/day small lag
B10 one instance - 3.3P/day some lag
B11 one instance - 3.3P/day very laggy
B9 seems like a good compromise.
Using 99% GPU so having a second instance running isn't likely to give benefit.
Light 3% use of a CPU core, which is nice. | |
|