After seeing this thread (http://www.primegrid.com/forum_thread.php?id=8214), I started to wonder. It reports slow AP performance with the new 400.xx series Nvidia drivers.
And now today, after installing a 1070 alongside my 1060, the 1070 began running AP tasks slower than the 1060 did a few months back. This was with the latest Nvidia driver, 416.34, released 10/11/2018.
So, as per the thread above, I reverted to 397.93.
AP is now running as it should: fast. I'm not sure why, but those newer drivers slow down AP crunching/searching. |
|
|
|
AP tasks take about twice as long with the 400.xx series drivers as with 397.93 :( About 51 min vs about 25 min on the 1070, whereas the 1060 was getting about 37-38 min before the 400.xx drivers. |
|
|
|
This is very interesting.
My two GTX 1060s are running Collatz, and I see an obvious difference in run times per unit (705 seconds vs 770 seconds), but in the reverse direction: the card with the 416.16 driver is faster than the card with the 397.64 driver.
I was scratching my head as to why but deep down I did have a suspicion the driver may be the reason. Your message is further evidence that the driver is most probably the culprit. I will update the card with the older driver to the latest driver and I will report back later today.
Thanks for the heads-up. |
|
|
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 660 ID: 164101 Credit: 305,042,960 RAC: 102
|
With geneferocl (GFN20) I noticed that Run time (GPU time) is about the same but CPU time increased:
3xx drivers ~ 40 s
4xx drivers ~ 150 s
|
|
|
|
As promised, I updated the driver for the card with the higher run time to the latest driver (416.36), but the run time did not change: the card remains slower than the other one, with an elapsed time similar to what it had with the 397.64 driver. |
|
|
|
I take it all back. After exiting the remote connect software (TeamViewer) and letting the system run totally unattended, I do have a reduction in the run time thanks to the latest driver.
As far as Collatz is concerned, the latest driver yields noticeably faster run times than the earlier 397.64 driver. The CPU contribution remains trivial at about 0.70 seconds. |
|
|
|
Good, interesting results. I believe the post I learned of this from said GFN was basically the same, while AP was being affected greatly. There was another post somewhere about a compute option and the effect of turning it on or off. It has something to do with how the driver performs certain calculations. Exactly what they changed I do not know. It seems better optimized for some things but worse for others. |
|
|
|
So to clarify, what you've found is that the 400 series drivers increase run time?
I'm running 398.82 on a 1070 Ti. Does anyone know if the performance of these drivers is equivalent to that of the 397.93 drivers you mentioned, or would it be worthwhile rolling back?
|
|
|
|
I'm going to say those are fine; it was only after installing the 400.xx series that I saw things slow down. If you are getting times of around 30 min or less, that is what it should be. With the 400.xx I was getting times around 50 min. |
|
|
|
Ok great.
I'll take a look at speeds when I'm back and post just fyi |
|
|
|
My 1070 is under 30 min. When I used the 400.xx drivers it was around 50-55 min. Your 1070 Ti should be under 30 min as well. |
|
|
|
I just saw your speeds, under 1300 sec. That's how it should be. Your drivers are not slowing down your AP search. You are good to go. |
|
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1900 ID: 352 Credit: 3,330,840,823 RAC: 1,494,305
|
The 416.81 driver is out.
I've updated to it on an RTX 2080 and it is working now.
(Updating over 416.34 made both AP27 and GFN tasks error out at startup, so I had to uninstall the old driver, run DDU in safe mode, and then install 416.81.)
For older cards like the GTX 1070, I'm staying with the latest 3xx-series driver (399.24).
Running 416.34 slowed down AP27, and I can't test the latest driver right now (the machine is at a remote location and I don't want to break it).
Is anyone willing to try 416.81 on older cards before the AP27 Challenge and share their experience?
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186 |
|
|
Honza Volunteer moderator Volunteer tester Project scientist
|
I was told that - at least on 1080Ti - latest 416.81 still slows down AP27.
|
|
|
Azmodes Volunteer tester
Joined: 30 Dec 16 Posts: 184 ID: 479275 Credit: 1,752,411,013 RAC: 523,716
|
I have also seen significant reductions in speed on my 1060 6GB and 3GB (in the ballpark of 10 minutes longer per task), but I don't dare revert the driver (416.34), since I had a pretty hard time getting those two to run properly with an RTX on the same board. At least the RTX more than makes up for the speed loss (947 seconds per task), but it's still lame.
____________
Long live the sievers.
+ Encyclopaedia Metallum: The Metal Archives + |
|
|
|
Hello!
I've been running some AP27 WUs this evening on two of my humble machines, both with the latest driver (416.81).
Both machines got a clean driver install and a restart. If the WUs are not too different, it seems that my 1070 Ti runs faster: run time about 100 sec and CPU time about 20 sec faster!
I installed the driver yesterday evening around 20:00 UTC and have completed 8 WUs so far.
My other PC, with an "old" 680, has only completed one (one is still running); run time has increased a lot but CPU time has gone down.
At least faster than the previous driver (416.34)!
The stats are not great, but this is just what I have so far!
With regards,
Hans Sveen
____________
MyStats
My Badges |
|
|
Honza Volunteer moderator Volunteer tester Project scientist
|
Hans, I see that your run times on the 1070 Ti are about 2150-2300 secs.
This is quite slow, most probably due to the latest drivers.
My 1070 (no Ti) with the older 399.xx driver takes around 1550-1600 secs.
It might be worth trying the 399.xx driver...
|
|
|
|
Agreeing with Honza here. Those times are slow for a 1070 Ti; my regular 1070 beats them by a good margin. Try the 399.xx drivers and see if your AP times improve. |
|
|
|
Hello!
Thank You all for the advice and help!
For the record, my 1070 Ti now takes about 1300 seconds to run an AP26/AP27 task; with driver 416.34 it took about 2100-2300 seconds. Nice achievement!
My old Nvidia 680 took between 7700 and 8100 seconds with driver 416.34 and is now down to about 5000 seconds with driver 399.05. Very nice!
Ready and eager to start the November challenge indeed 😉
With regards,
Hans Sveen
|
|
|
|
I am very glad we could help.
Whatever is different in those driver releases is the cause. I saw a thread mentioning something about some "compute" type setting in the nvidia control panel. I did not look around for it. Maybe it's the cause and is only in the 400 series drivers. |
|
|
|
Penguin wrote: ... I saw a thread mentioning something about some "compute" type setting in the nvidia control panel ...
According to what recoil44 posted here, it's intended for Nvidia second-generation Maxwell GPUs only.
____________
"Accidit in puncto, quod non contingit in anno."
Something that does not occur in a year may, perchance, happen in a moment. |
|
|
|
Wow the 400 series driver is a pig of a thing...
I went from 390.65 to 416.34 and the run times on my GTX1080 went from 1250 sec to 2200 sec!... I've gone back to 390.65.
The same thing happened on my Windows 10 PC with a GTX970, BUT it will now not let me roll back to the older driver or anything in the 300's
It will only let me install 411.63 or above; with anything older I get these messages:
'This Nvidia graphics driver is not compatible with this version of Windows'
and
'This graphics driver could not find compatible graphics hardware'
Any suggestions for how I can get 3xx back on my Win 10 machine?
Thanks |
|
|
Honza Volunteer moderator Volunteer tester Project scientist
|
Try DDU (Display Driver Uninstaller) to clean out any previous driver version before installing the new one.
Personally, for older cards like the GTX 1070, I'm staying with the latest of the 3xx series (399.24).
4xx is good (only good??) for the RTX 20xx series.
|
|
|
|
Do the 300 drivers work for the 20xx series or do you have to use the 400 ones for the RTX stuff to work? 2080 coming soon to Penguin land guess I can experiment then. |
|
|
|
Try DDU (Display Driver Uninstaller) to clean previous/any driver version before installing new one.
Thanks I tried that but I'm still getting the same error message...
|
|
|
Honza Volunteer moderator Volunteer tester Project scientist
|
Do the 300 drivers work for the 20xx series or do you have to use the 400 ones for the RTX stuff to work? 2080 coming soon to Penguin land guess I can experiment then.
You need to use the 4xx series for RTX 20xx.
But don't worry - performance is great with this combination.
See RUMOR [Confirmed] NVIDIA Launching RTX 2080 Ti. I posted times for the RTX 2080 there two months ago.
|
|
|
|
Cool thanks, going to read that thread now. 2080 on the way. |
|
|
|
Is this an OpenCL or CUDA Issue?
Need a little help here to give NVIDIA the scoop |
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13569 ID: 53948 Credit: 249,359,006 RAC: 133,198
|
AP27 is an OpenCL app.
____________
My lucky number is 75898524288+1 |
|
|
|
Good to know Nvidia knows of this bug now and is working on it. I'm not sure if it affects any other projects or any of PrimeGrid's other GPU apps. So far they are treating AP as the only thing it adversely affects. |
|
|
|
Nvidia has responded with a diagnosis and a suggested fix for the app:
Topic: GeForce Drivers 4xx.xx drop more than 2/3 in OpenCL Performance from the 3xx.xx Drivers
There is no point in testing newer drivers; I don't expect any changes in this respect. Changes are required in the application if they want to restore performance with the newer drivers.
Current Scenario in ap26 app:
1. App queries CL_KERNEL_WORK_GROUP_SIZE in order to decide local work group size of either 1024 (seems optimal) or 64 (sub-optimal). If app gets value for query <1024 it reduces local work group size to 64 assuming device doesn't support 1024.
2. Nvidia OpenCL Driver changed return value for CL_KERNEL_WORK_GROUP_SIZE from 1024 to 256.
3. App is not using CL_KERNEL_WORK_GROUP_SIZE returned by driver as is, but just choosing a non-optimal local work-group size (64) based on this query.
What should developers do:
• Query CL_KERNEL_WORK_GROUP_SIZE to get just hint about work group size from driver and use it to launch kernel with that specific value. It need not be optimal for all kernels.
• App is free to choose any value from range [1 , CL_DEVICE_MAX_WORK_GROUP_SIZE] to get best possible work group size for different kernels, irrespective of CL_KERNEL_WORK_GROUP_SIZE returned by driver.
Suggestions specific to ap26:
• App can query CL_DEVICE_MAX_WORK_GROUP_SIZE and set work group size accordingly instead of using CL_KERNEL_WORK_GROUP_SIZE.
• Simplest solution for ap26 would be to use 1024 work group size directly if it comes in range [1 , CL_DEVICE_MAX_WORK_GROUP_SIZE].
I don't know how to best communicate the above information to the developers. If there is a good way to do that, please advise.
from here - https://devtalk.nvidia.com/default/topic/1044539/cuda-programming-and-performance/geforce-drivers-4xx-xx-drop-more-than-2-3-in-opencl-performance-from-the-3xx-xx-drivers/post/5303366/#5303366 |
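To make the quoted explanation concrete, here is a minimal sketch of the selection logic in plain Python. It does not call OpenCL at all: the integers stand in for the values a driver would return for CL_KERNEL_WORK_GROUP_SIZE and CL_DEVICE_MAX_WORK_GROUP_SIZE, the function names are hypothetical and not taken from the AP26/AP27 source, and the device maximum of 1024 is an assumption for illustration only. Whether a launch at 1024 actually succeeds on a given driver is something that still has to be tested on real hardware.

```python
# Illustrative stand-ins only; nothing here is taken from the real app.

def legacy_local_size(kernel_wg_size: int) -> int:
    """Behaviour described in the quoted post: anything below 1024 falls back to 64."""
    return 1024 if kernel_wg_size >= 1024 else 64


def hinted_local_size(kernel_wg_size: int) -> int:
    """First suggestion: use the value reported for the kernel as-is."""
    return kernel_wg_size


def preferred_local_size(device_max_wg_size: int, preferred: int = 1024) -> int:
    """ap26-specific suggestion: use 1024 if it lies in [1, device maximum]."""
    if 1 <= preferred <= device_max_wg_size:
        return preferred
    # Otherwise fall back to the largest size the device reports.
    return device_max_wg_size


ASSUMED_DEVICE_MAX = 1024  # assumption, not a measured value

# 1024 models the 3xx drivers, 256 models the 4xx drivers (per the quoted post).
for reported in (1024, 256):
    print(
        f"kernel query={reported}: "
        f"legacy={legacy_local_size(reported)}, "
        f"hinted={hinted_local_size(reported)}, "
        f"preferred={preferred_local_size(ASSUMED_DEVICE_MAX)}"
    )
```

With the legacy rule, a reported value of 256 drops the work-group size all the way to 64, which matches the slowdown described in this thread; the other two rules keep it at 256 or 1024.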
|
|
mfl0p Project administrator Volunteer developer
Joined: 5 Apr 09 Posts: 226 ID: 38042 Credit: 891,353,470 RAC: 296,607
|
I'll get to work on the changes in the AP app.
2. Nvidia OpenCL Driver changed return value for CL_KERNEL_WORK_GROUP_SIZE from 1024 to 256.
Why?
This change caused the slowdown because the app falls back to slower code that doesn't use the super-fast on-chip local memory. |
|
|
|
I really don't know. You'd have to discuss it with Nvidia. It does seem foolish if it does what you say and keeps the app from using the on-chip local memory... |
|
|
mfl0p Project administrator Volunteer developer
|
I'm just wondering why they decided to change the CL_KERNEL_WORK_GROUP_SIZE returned value to 256 now, after years of it returning 1024.
We'll need to test some changes to the app to see if 1024 will still work. |
|
|
Yves Gallot Volunteer developer Project scientist
|
CL_KERNEL_WORK_GROUP_SIZE: This provides a mechanism for the application to query the maximum work-group size that can be used to execute the kernel on a specific device given by device. The OpenCL implementation uses the resource requirements of the kernel (register usage etc.) to determine what this work-group size should be.
As a result, and unlike CL_DEVICE_MAX_WORK_GROUP_SIZE, this value may vary from one kernel to another as well as from one device to another.
CL_KERNEL_WORK_GROUP_SIZE will be less than or equal to CL_DEVICE_MAX_WORK_GROUP_SIZE for a given kernel object.
If the OpenCL compiler is different, then the register usage is different and CL_KERNEL_WORK_GROUP_SIZE may differ on some devices.
A benchmark is included in genefer: OpenCL profiling is used to select the optimal parameters for each GPU.
|
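The benchmarking idea mentioned for genefer can be sketched roughly as follows. This is not genefer's code: `pick_fastest` and the `measure` callback are hypothetical names, and the timings in the usage example are made-up placeholders rather than measurements. In a real application `measure` would run the kernel with the given work-group size and return the elapsed time from OpenCL profiling events.

```python
from typing import Callable, Iterable


def pick_fastest(candidates: Iterable[int],
                 measure: Callable[[int], float],
                 repeats: int = 3) -> int:
    """Return the candidate work-group size with the lowest measured time.

    Each candidate is measured `repeats` times and the best run is kept,
    which reduces the influence of one-off slow runs.
    """
    best_size = None
    best_time = float("inf")
    for size in candidates:
        elapsed = min(measure(size) for _ in range(repeats))
        if elapsed < best_time:
            best_size, best_time = size, elapsed
    if best_size is None:
        raise ValueError("no candidate work-group sizes supplied")
    return best_size


# Placeholder timings in arbitrary units, NOT real benchmark data.
placeholder_timings = {64: 3.0, 256: 2.0, 1024: 1.0}
chosen = pick_fastest(placeholder_timings, lambda size: placeholder_timings[size])
print("selected work-group size:", chosen)
```

Selecting by measurement rather than by a fixed threshold means a change in the value the driver reports would not silently push the application onto a slow path.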
|
|
|
Just read through the entire thread. Not sure if the 400.xx problem is now officially fixed or should we still be using 399.xx? |
|
|
Monkeydee Volunteer tester
Joined: 8 Dec 13 Posts: 441 ID: 284516 Credit: 448,963,025 RAC: 510,473
|
It has been fixed
____________
My Primes
Badge Score: 2*1 + 4*2 + 6*4 + 7*10 + 9*1 + 10*2 = 133
|
|
|
dukebg Volunteer tester
Joined: 21 Nov 17 Posts: 238 ID: 950482 Credit: 23,670,125 RAC: 33
|
To be more specific, it's fixed in the new apps that have been live since 30 Jan 2019; they are also faster and have a command-line flag to speed things up even more. |
|
|
|
Thank you Keith and Dukebg! |
|
|
Michael Goetz Volunteer moderator Project administrator
|
This topic is obsolete; the problem discussed here is no longer an issue. I'm locking the thread.
|
|
|