Is it possible to run 2 WUs at the same time on one GTX 560?
rroonnaalldd Volunteer developer Volunteer tester
Yes, you need an app_info.xml file for this.
Inside the XML, search for:

    <coproc>
        <type>CUDA</type>
        <count>1.0</count>
    </coproc>

...and change it to:

    <coproc>
        <type>CUDA</type>
        <count>0.5</count>
    </coproc>
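For reference, that <coproc> block lives inside an <app_version> section. A minimal sketch of the whole file follows; the app name and executable name here are placeholders, so use the names of the files BOINC has already downloaded into your project directory:

    <app_info>
        <app>
            <name>pps_sr2sieve</name>
        </app>
        <file_info>
            <name>primegrid_pps_sieve_cuda.exe</name>
            <executable/>
        </file_info>
        <app_version>
            <app_name>pps_sr2sieve</app_name>
            <version_num>100</version_num>
            <avg_ncpus>0.2</avg_ncpus>
            <coproc>
                <type>CUDA</type>
                <!-- 0.5 GPUs per task, i.e. two tasks share one GPU -->
                <count>0.5</count>
            </coproc>
            <file_ref>
                <file_name>primegrid_pps_sieve_cuda.exe</file_name>
                <main_program/>
            </file_ref>
        </app_version>
    </app_info>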
____________
Best wishes. Knowledge is power. by jjwhalen
Thanks for the help.
Do you know if running 2 WUs on the same GTX 450 really makes a difference in PrimeGrid like it does, for example, in SETI (roughly a 50% increase in RAC)?
rroonnaalldd Volunteer developer Volunteer tester
Calculating two units in parallel can result in a shorter total runtime than calculating them one after the other. It depends on the power of your GPU...
Please try it out and post your results.
____________
Best wishes. Knowledge is power. by jjwhalen
I tried a similar experiment, but one which is not directly applicable to the original question because I used two different apps. But hopefully the result will be interesting in its own right. Instead of going the app_info route, I continued to run 1 PPS Sieve WU at a time, one after the other, as usual. Simultaneously, I ran 1 PRPnet GeneferCUDA 262144 task. Note, this is one sieve and one primality test at the same time on one GPU.
Results:
- My PRPnet GeneferCUDA time was almost exactly double what it normally is when run solo. Maybe even just a tad slower than that.
- My PPS Sieve (CUDA) WUs have been very steady at 700-705 seconds each. While GeneferCUDA was running, they took 1233-1236 seconds each. Note that PPS Sieve (CUDA) uses less than 10% of a CPU (hyper)thread on average; GeneferCUDA uses 100% of a thread.
- I did not notice any significant change in the thermal state of the GPU, CPU, or motherboard.
- Interactive screen response was noticeably laggier when both were running than with PPS Sieve alone. I don't recall how responsive it is when running only GeneferCUDA, so I can't comment on that.
Setup: GTX 570 @ 786/2100/1572, factory clocks. i7-2600K, moderate overclock.
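Working through those numbers: PPS Sieve ran at 702/1234, i.e. about 0.57x its solo speed, and (taking my estimate above) GeneferCUDA at about 0.5x. Combined, that is roughly 1.07x the throughput of running one task alone, so essentially no gain; the GPU's capacity was simply split between the two tasks.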
I await news from anyone who runs two sieves via app_info. I generally avoid messing around at that level, but if called for I could take one for the team (Go AtP!)
--Gary
____________
"I am he as you are he as you are me and we are all together"
87*2^3496188+1 is prime! (1052460 digits)
4 is not prime! (1 digit)
rroonnaalldd Volunteer developer Volunteer tester
Gary, I posted some run times with a GT 440 earlier in this thread, and later today (Saturday in Europe...) I will post new times for a GTS 450 eco version.
____________
Best wishes. Knowledge is power. by jjwhalen
My experience is that the GPUs are kept at a constant 99-100% usage when running either of the sieve jobs, even one at a time.
Doubling up tasks will probably just cause resource contention on the card and may negatively impact cache performance.
For other projects, which frequently do not use 100% of the GPU, it may help to double up tasks to take advantage of the otherwise wasted processing power. One of the MilkyWay apps was like that for me. However, the sieves seem to be very efficient at keeping the GPU running at full load.
My recommendation, for this project, is that you not double up tasks on a GPU.
Even my slow Core 2 Duo can keep a GTX 560 busy at 100% GPU usage while crunching other LLR tasks on both CPU cores without a problem.
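If you want to verify the load on your own card before deciding, a minimal sketch like this (assuming the nvidia-ml-py package is installed; the script is just an illustration, not project tooling) prints the utilization once a second:

    # Sketch: poll GPU utilization once a second via NVML (nvidia-ml-py).
    import time
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
    try:
        while True:
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            print("GPU: %d%%  memory controller: %d%%" % (util.gpu, util.memory))
            time.sleep(1)
    finally:
        pynvml.nvmlShutdown()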
rroonnaalldd Volunteer developer Volunteer tester
I will test this too on my new GTS 450 "eco" later today.
____________
Best wishes. Knowledge is power. by jjwhalen
Would the same principles apply to an AMD card instead of a CUDA card? :-)
Some of us are just fans of AMD GPUs, even though they don't perform nearly as well on PrimeGrid.
Well, when I see a load of 98-100%, I would not advise you to do that; the GPU is already busy, and running 2 instances to reclaim the last 2% is a no-go for me.
A GPU constantly jumping between 98 and 100% means, in my book, a 100% load.
If it were under 90% I would try it. The trick is to use this only when the GPU is not fully loaded and enough capacity sits unused to gain something.
It pays off the lower the load is over time: the Einstein app, for instance, only produces a GPU load of about 40%, so 2 instances are fine. You might think 3 instances would work there, but Einstein is a tricky one because it needs between 300 and 450 MB of video RAM. So if your card has 1 GB of RAM, running 3 is out of the question :D A rough rule of thumb is sketched below.
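(A sketch only; max_instances is a made-up helper, and real limits depend on the app, driver, and card.)

    # Rough rule of thumb: instances are capped both by VRAM and by idle capacity.
    def max_instances(card_vram_mb, task_vram_mb, gpu_load):
        by_memory = card_vram_mb // task_vram_mb   # how many tasks fit in VRAM
        by_load = max(1, int(1.0 / gpu_load))      # how many the load leaves room for
        return max(1, min(by_memory, by_load))

    # The Einstein example above: ~40% load, up to 450 MB per task, 1 GB card:
    print(max_instances(1024, 450, 0.40))  # -> 2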
The reason NVIDIA cards perform better is that CUDA is a much older and more mature compute toolkit.
ATI came later to the GPU-computing train, so they have a lot of catching up to do.
Our hope is that OpenCL will narrow the gap between them, but of course NVIDIA, being the bigger and richer company, probably spends more on research and development.
rroonnaalldd Volunteer developer Volunteer tester
I don't know the load on an ATI/AMD card. Try it out and post your results. ;)
____________
Best wishes. Knowledge is power. by jjwhalen
> I don't know the load on an ATI/AMD card. Try it out and post your results. ;)
I may just do that once the challenge is over :-)
Although I think PrimeGrid is pretty good about using 100% utilization on an AMD card.
For other projects, though, I could see huge benefits. I have a dual-GPU card (AMD 6990), so it is always running two WUs already anyway, but other projects, such as DNETC and some others, like to run a single WU across both GPUs. That works most of the time; however, towards the end of a WU it always seems that one of the GPUs finishes its "half" of the processing first and sits idle, waiting for the other GPU to finish. Splitting may be ideal... although for all I know I'd then end up with two WUs, each with half their work on each GPU. :-)
rroonnaalldd Volunteer developer Volunteer tester
> Although I think PrimeGrid is pretty good about using 100% utilization on an AMD card.
I think this is a mistake.
PPS Sieve was originally designed for CUDA and was then cross-compiled to OpenCL, the only common basis between the two vendors.
IMO, OpenCL is in the same situation as Java: neither a native language for the different hardware vendors nor the holy grail of performance, only platform-independent...
____________
Best wishes. Knowledge is power. by jjwhalen
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
>> Although I think PrimeGrid is pretty good about using 100% utilization on an AMD card.
> I think this is a mistake.
> PPS Sieve was originally designed for CUDA and was then cross-compiled to OpenCL, the only common basis between the two vendors.
> IMO, OpenCL is in the same situation as Java: neither a native language for the different hardware vendors nor the holy grail of performance, only platform-independent...
True...but even inefficient code can ultimately use the GPU at 100% (i.e., OpenCL code runs slower than CAL even at 100% utilization).
____________
141941*2^4299438-1 is prime!
@Gary:
By what mechanism or configuration did you cause 2 WUs to run at the same time, if not with an app_info file?
thx
Dave
> @Gary:
> By what mechanism or configuration did you cause 2 WUs to run at the same time, if not with an app_info file?
> thx
> Dave
One on BOINC via the normal mechanism, the other using PRPnet. With multiple PRPnet instances on your machine, you could run as many as you want at the same time, though it would become increasingly inefficient.
This wasn't a recommended configuration. It was just an experiment.
--Gary