John Honorary cruncher
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
Anyone interested in beta testing GeneferCUDA in BOINC, please PM me. You must have Windows OS and a GPU with compute capability >= 1.3.
We are taking a limited number for testing. The validator still needs updating, so no credit is granted for now. Tasks take less than a minute, but they will be bumped up once the current cache is complete.
____________
John Honorary cruncher
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
The response has been good so far. Thank you!
With the GFN Prime Search transitioning to BOINC, it is even more of a priority to get the sieves to their proper depths.
If you have Win64 (or Linux 64 running Windows in VM), you can participate. Please see GFN Prime Search Sieving.
Depths are respectable right now but it sure would be nice to push them to their goals.
____________
John, can you extend the deadline? With only 1h, my client aborted all tasks that missed the deadline.
John, can you extend the deadline? With only 1h, my client aborted all tasks that missed the deadline.
Perhaps lowering your buffer will help?
Normally you would easily make the deadline if your buffer is not too large.
Currently crunching smoothly at ~34 seconds per unit on a GTX 580.
GPU usage is 99% and temps stay pretty low compared to PPS sieve.
60°C for genefer versus 75°C for PPS sieve.
____________
John, can you extend the deadline? With only 1h, my client aborted all tasks that missed the deadline.
Perhaps lowering your buffer will help?
Normally you would easily make the deadline if your buffer is not too large.
Currently crunching smoothly at ~34 seconds per unit on a GTX 580.
GPU usage is 99% and temps stay pretty low compared to PPS sieve.
I had 0.1, now 0.01. 1h is pretty low ^^
Ok that is pretty low.
How long do your units take, and on what GPU?
Ah you just edited your msg already.
I see why I didn't have that problem: I had a PPS Sieve waiting, just returned that one, and I also got 129 tasks on a 0.1-day buffer.
So I agree 1h is pretty low ;)
____________
Ok that is pretty low.
How long do your units take, and on what GPU?
GTX460, 46 sec.
I assume when the tasks get bigger the deadline will be adjusted.
Only 500 remaining in cache according to the frontpage.
I just OC'ed the core/shaders of my card by 15%, but strangely enough it had no effect on runtimes.
____________
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2259 ID: 1178 Credit: 11,068,521,226 RAC: 11,429,379
I just OC'ed the core/shaders of my card by 15%, but strangely enough it had no effect on runtimes.
That depends on how much you OC'ed the shaders. Keep in mind 2 things:
1) Shader frequency can be adjusted in single-unit increments in software like MSI Afterburner, but the actual hardware only adjusts in clock "chunks". For example, on my old 9600 GSO, a 1700 shader clock does not bump up the clock to the next "chunk", but going to a 1703 clock does (and you need to go past 1728 for the next one, if I remember correctly). You might not be pushing past a particular block of shader frequency, whereas one or two more clock ticks would do it (sorry, I do not remember how the frequency blocks are organized on Fermi cards).
2) Even ignoring the above point, GeneferCUDA is using the DP capabilities of the card. Whereas increasing the shader clocks gives a directly noticeable gain on SP applications like PPS Sieve (i.e., a 10% shader clock increase = about a 10% performance increase), the effect on DP processes is much lower. This is due to the DP capability of consumer cards being limited to 1/8th of the SP FLOPs. With your GTX 580 at stock 1544 shader clocks, the theoretical DP throughput will be about 197 GFLOPs (1581 for SP). OC the shaders to 1644, and DP only increases to about 210 GFLOPs (1683 in SP...about a 6% gain, but a 102 GFLOPs gain in SP vs. 13 GFLOPs in DP). With the 34 sec/task time you reported above, that 6% gain would only equate to 2 seconds faster (assuming the same stock starting clocks, your 15% OC would equate to about a 4 sec/task reduction). Add in the fact that actual FLOPs run lower than theoretical, and you should see very little gain, if any...a quirk of the combination of extremely short app times and the 1/8th DP capability issue.
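Scott's throughput arithmetic can be reproduced directly. A minimal sketch, assuming the GTX 580's 512 CUDA cores, two single-precision ops per core per clock (one fused multiply-add), and the 1/8th DP:SP ratio of consumer Fermi cards:

```python
# Theoretical GFLOPs for a GTX 580, reproducing the numbers above.
# Assumptions: 512 CUDA cores, 2 SP ops per core per clock (FMA),
# and consumer-Fermi DP capped at 1/8th of SP.

def gflops(shader_mhz, cores=512, dp_ratio=1.0 / 8.0):
    sp = cores * 2 * shader_mhz / 1000.0  # single precision, GFLOPs
    return sp, sp * dp_ratio              # (SP, DP)

sp_stock, dp_stock = gflops(1544)  # ~1581 SP, ~197.6 DP
sp_oc, dp_oc = gflops(1644)        # ~1683 SP, ~210 DP
print(sp_oc / sp_stock - 1.0)      # the ~6% relative gain Scott mentions
```

Note the relative gain is identical for SP and DP; it is the absolute DP gain (~13 GFLOPs) that is small.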
____________
141941*2^4299438-1 is prime!
I used MSI afterburner.
I bumped the core/shaders up to 825/1650.
That takes 1 second off the runtime hehe.
I guess it's not much use to OC it, at least not for these small units.
Thanx for explaining.
____________
just finished my first genefer - GTX460 - 20 seconds - o/c'd on shader to 1790
____________
@AggieThePew
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
I assume when the tasks get bigger the deadline will be adjusted.
Only 500 remaining in cache according to the frontpage.
LOL.
It will be increased just a tiny, itsy, bitsy, bit.
I don't know exactly how we're going to ramp up to it, since it's still being discussed, but we're currently running really small WUs just for testing.
When we get going for real, this project's goal is to find a new world record prime number. Not a new world record GFN prime. Not a new record for PrimeGrid. A new world record prime, period. To be completely unambiguous: #1 on Chris Caldwell's Top 5000 prime list. That's the goal here.
These test WUs take 40 seconds on my GPU.
The real WUs take about 8 days. They'll actually start slightly shorter than that because we'll be starting just below world record territory. But after the first 200 or so WUs, the run time will be about 8 days, and it will slowly climb from there.
Times are for a stock 460.
It's not up to me, but my guess would be a deadline of 3 weeks is probably appropriate. That's similar to SoB.
Mike
Yeah, the deadline will be a little bit longer. ;-)
____________
My lucky number is 75898^524288+1
8 days on a single unit on a GPU?
I hope the save points work properly then ;)
@Rick: 20 seconds? WTF, how did you do that? That's over 50% faster than my GTX580. Even with the shaders 'only' at 1650, I would expect the GTX580 to be at least close to the 20 seconds on your GTX460.
By the way, cache is empty ;)
____________
@Rick: 20 seconds? WTF, how did you do that? That's over 50% faster than my GTX580. Even with the shaders 'only' at 1650, I would expect the GTX580 to be at least close to the 20 seconds on your GTX460.
By the way, cache is empty ;)
Not sure really - I've only done the one and it said run time was 20.53 seconds. It pays to have old technology sometimes... NOT.
Really it's the fact that a rat is quicker than a cow so these short wu's fit real well.
____________
@AggieThePew
Strange, though, that a non-OC'd 460 takes 40 seconds and your OC'd card does it in half the time?
That would be a 100% gain in runtime, and given Michael's explanation above, it doesn't seem very trustworthy.
It sounds like something is wrong there?
Too bad there is no validator active yet to see if the task is valid.
____________
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
8 days on a single unit on a GPU?
I hope the save points work properly then ;)
It does.
In fact, unlike most boinc programs, I made genefer's boinc interface a little bit atypical, such that if boinc shuts genefer down (i.e., you suspend the task with "keep tasks in memory" turned off), genefer will write a checkpoint immediately before shutting down. That way, you don't lose any work done since the last checkpoint.
It may (or may not) checkpoint when you shut boinc down completely, such as when you reboot. I ought to test that, but if it doesn't checkpoint when boinc shuts down, there's really nothing more I could do beyond what I'm already doing.
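The shutdown behaviour described here can be sketched in outline. This is an illustrative stand-in, not genefer's actual code; the checkpoint file name, the JSON state format, and the quit_requested hook are all invented for the example:

```python
# Sketch of "checkpoint immediately before shutting down":
# on a quit request, persist progress to disk, then exit; on restart,
# resume from the last checkpoint. Purely illustrative, not genefer code.

import json
import os
import tempfile

def write_checkpoint(path, state):
    # Write atomically so a crash mid-write never corrupts the file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def main_loop(total_steps, ckpt="genefer.ckpt", quit_requested=lambda: False):
    state = {"step": 0}
    if os.path.exists(ckpt):          # resume from the last checkpoint
        with open(ckpt) as f:
            state = json.load(f)
    while state["step"] < total_steps:
        state["step"] += 1            # one unit of real work would go here
        if quit_requested():          # the client asked us to stop:
            write_checkpoint(ckpt, state)  # checkpoint *now*, then exit
            return state["step"]
    return state["step"]
```

The key point is that the checkpoint is written on the way out, so no work since the last periodic checkpoint is lost.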
____________
My lucky number is 75898^524288+1
Rick: you have one finished now in 4 seconds? Something is not going as it should, I think ;)
____________
Rick: you have one finished now in 4 seconds? Something is not going as it should, I think ;)
Even though they are not showing an error, I can't see how they are right... so something is amiss, but I'm not reporting it as an error.. have to go reboot
Yep - the beta doesn't like it o/c'd that much... cw works fine at 1790 but I dropped it back down to 1600 and it's now finishing them correctly, it appears... at about 1:22 a unit
so A, never fear, your GPU is faster :)
____________
@AggieThePew
"Tasks are less than a minute"? Not on my GTX 570 apparently :| Just ran 2 units that both needed ~62 seconds. Could this be due to old drivers perhaps?
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
John Honorary cruncher
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
"Tasks are less than a minute"? Not on my GTX 570 apparently :| Just ran 2 units that both needed ~62 seconds. Could this be due to old drivers perhaps?
New work has been added at the next N. Previous work was at N=8192. Current work is at N=16384.
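For context, the work units are Generalized Fermat numbers b^N + 1 with N a power of two (8192, 16384, ...). A minimal CPU-side Fermat probable-prime check of that form of number, with an invented function name; genefer performs an equivalent test with floating-point FFT arithmetic on the GPU, orders of magnitude faster:

```python
# Fermat probable-prime check for a Generalized Fermat number b^n + 1,
# n a power of two. Toy CPU version for illustration only.

def is_gfn_prp(b, n, base=3):
    """Fermat PRP test for b^n + 1, with n a power of two."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    candidate = b ** n + 1
    # Fermat's little theorem: if candidate is prime,
    # base^(candidate-1) == 1 (mod candidate).
    return pow(base, candidate - 1, candidate) == 1

print(is_gfn_prp(2, 16))  # 2^16 + 1 = 65537 is prime -> True
print(is_gfn_prp(5, 4))   # 5^4 + 1 = 626 = 2 * 313  -> False
```

At the sizes this project targets, `candidate` has hundreds of thousands of digits, which is why the FFT-based GPU arithmetic matters.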
____________
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
Yep - the beta doesn't like it o/c'd that much... cw works fine at 1790 but I dropped it back down to 1600 and it's now finishing them correctly, it appears... at about 1:22 a unit
Awesome. I was wondering what was causing that particular problem. It's a real computing error, and will, in the next release, show up as a computing error on the boinc pages, rather than as a 'success' that would eventually be tossed by the validator.
What's actually happening, if you're interested, is that for whatever reason (heat, power fluctuations, or just circuits that simply can't respond fast enough) you're getting floating point errors in the result of the math that's being done. The actual culprit could be almost anything on the card -- the actual FPU, memory, the SMs, or who knows what else. But something's not working quite right at that speed.
Genefer (and llrCUDA, which has a lot of similarities) literally use different circuitry than the sieves use. In particular, just like their CPU counterparts, the sieves do integer math and the primality programs use floating point math. I'm not surprised that you can overclock one type of program more than the other.
Personally, I'd recommend not overclocking for this project. With the amount of credit that each WU is going to be worth, I wouldn't want computation errors popping up 95% of the way through a WU. Plus, I wouldn't want my wingman to find the world record prime because I had my card overclocked just a smidgeon too much.
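To illustrate where those floating point errors live: FFT-based primality code multiplies huge numbers in floating point, and every product coefficient must then be rounded back to an integer. The distance of the worst coefficient from the nearest integer is the "maxErr" quantity that appears in this thread's error reports. A toy pure-Python version (nothing like genefer's optimized CUDA code) looks like this:

```python
# Multiply two digit sequences via a floating-point FFT and track the
# maximum round-off error of the raw coefficients -- the quantity a
# maxErr check monitors. Toy illustration only.

import cmath

def fft(a, invert=False):
    # Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    n = len(a)
    if n == 1:
        return a
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def multiply(a_digits, b_digits):
    """Convolve two coefficient lists; return (rounded result, max error)."""
    n = 1
    while n < len(a_digits) + len(b_digits):
        n *= 2
    fa = fft([complex(x) for x in a_digits] + [0j] * (n - len(a_digits)))
    fb = fft([complex(x) for x in b_digits] + [0j] * (n - len(b_digits)))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    coeffs = [c.real / n for c in prod]          # raw floating-point results
    max_err = max(abs(c - round(c)) for c in coeffs)
    return [round(c) for c in coeffs], max_err
```

When the hardware misbehaves (or the operands are too large for the precision), coefficients drift away from integers; once the worst one passes a threshold like the 0.45 seen in this thread, the rounding is no longer trustworthy and the result must be discarded.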
____________
My lucky number is 75898^524288+1
Shall current work be validated or just used for test purposes?
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
Shall current work be validated or just used for test purposes?
It's my understanding that the plan is for the current work to be validated (those that ARE valid, that is), and that they will receive credit. However, I have no insight into what the problem is, so it's conceivable that they can't validate those WUs.
That being said, from my perspective, they're just for test purposes. It's possible they're also being used to double check work that's been done previously, but I don't think so.
I don't expect testing to last long. The guts of this program have been in production for a year already; the testing is primarily to ensure that genefer and boinc play nicely together.
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
It may (or may not) checkpoint when you shut boinc down completely, such as when you reboot. I ought to test that, but if it doesn't checkpoint when boinc shuts down, there's really nothing more I could do beyond what I'm already doing.
I just checked, and it will checkpoint during a boinc shutdown.
____________
My lucky number is 75898^524288+1
Michael, is the BOINC GeneferCUDA application slower than the original one in PSA?
Cheers.
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
Michael, is the BOINC GeneferCUDA application slower than the original one in PSA?
Cheers.
The one we're using for boinc should be the fastest of all of them.
However, there's a flaw in the benchmarks of older versions, so the benchmarks will say that 0.99 (what they're using in PSA right now) is faster. But if you actually run real tests through the two, you should get identical run times.
But, pretty soon the boinc version will be faster than any other, which will be most noticeable at higher Ns. There's a CPU-bound initialization process which is trivial at lower N. However, at N=4194304, this initialization takes two hours. I've been able to optimize that 2-hour phase down to 21 minutes, and I think I can get it down to about a minute. That's not released yet, so the boinc version is pretty much the same speed as the PSA version. (The boinc version, btw, is compatible with PRPNet.)
EDIT: I think they're using 0.97 on PRPNet, but the timing should be the same.
____________
My lucky number is 75898^524288+1
I got some -161 errors within the 32768 search. Cannot find the problem elsewhere.
I got some -161 errors within the 32768 search. Cannot find the problem elsewhere.
Is your GPU running hot?
I got the same error on some wu's, but after a restart I have no problem.
As you all can see, I raised the delay to 12 hr.
Lennart
I got some -161 errors within the 32768 search. Cannot find the problem elsewhere.
Is your GPU running hot?
I got the same error on some wu's, but after a restart I have no problem.
As you all can see, I raised the delay to 12 hr.
Lennart
No, it's cold, and thx for the extended deadline.
Somehow my boinc is refusing to get more than 1 unit at a time at the moment.
I have my buffer at 0.20 days and a task only takes 2 minutes.
But it doesn't ask for more units until the one is finished.
____________
Somehow my boinc is refusing to get more than 1 unit at a time at the moment.
I have my buffer at 0.20 days and a task only takes 2 minutes.
But it doesn't ask for more units until the one is finished.
Could be a high duration correction factor.
Even with the buffer at 10 days, not a single extra task.
____________
The current (beta) program is only for 64-bit machines. Will this be the same for the final release, or can there be a 32-bit version?
____________
Member team AUSTRALIA
My lucky number is 9291*2^1085585+1
According to the settings, 32-bit Windows should also work.
____________
According to the settings, 32-bit Windows should also work.
Confirmed. It also runs on one of my x86 hosts...
Regards Odi
____________
Thank you so much Michael for such a thorough answer.
I hope the new Kepler won't let us down with its DP performance. I'm saving money to get this new card to hunt for those elusive megaprimes.
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2259 ID: 1178 Credit: 11,068,521,226 RAC: 11,429,379
Does anyone know if the issues with GeneferCUDA and the GTX 550 Ti have been fixed in the BOINC version?
____________
141941*2^4299438-1 is prime!
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
I got some -161 errors within the 32768 search. Cannot find the problem elsewhere.
Could you share a link to one of those results, or post the whole stderr output from the result page?
Thanks.
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
The current (beta) program is only for 64-bit machines. Will this be the same for the final release, or can there be a 32-bit version?
As far as I know, that's incorrect. Certainly, it wasn't correct when we opened the beta up as I've seen at least one 32 bit XP and one 32 bit Windows 7 computer successfully complete WUs.
If something has changed, it was unintentional. GeneferCUDA is a 32-bit app, and is likely to stay a 32-bit app. I saw no speed improvement at 64-bits (which is exactly what I would expect since the CPU isn't really doing much).
However, the beta -- and production -- is only for Compute Capability 1.3 or above GPUs. GeneferCUDA (and llrCUDA) need double precision floating point hardware, which isn't available on older GPUs. So you're limited to 4xx, 5xx, and GTX (NOT GTS) 2xx class GPUs (i.e., GTX 260 and above.) Note that the 3xx series of GPUs is only CC 1.2 and can not be used.
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
Does anyone know if the issues with GeneferCUDA and the GTX 550 Ti have been fixed in the BOINC version?
Unfortunately not. That part of the code is unchanged, and neither I nor Shoichiro have any idea what's wrong with it. IIRC, he said that HIS 550TI works fine under Linux. I'm not sure if it's Linux, or if it's that only some of the 550 TIs are affected.
Warning: Pure, unadulterated, right off the top of my head speculation follows. It's probably all wrong.
I can think of a few possibilities.
1) Everyone who is having a problem with the 550 TI has it OC'd too high. The problem seems to be similar to what I'm observing now with some OC'd GPUs. So you could try underclocking the 550TI as far down as it will go, as a test to see if it changes the results. It's possible that, for some reason, "OC'd too high" includes stock clock rates on some cards.
2) Ok, really geeky math nerd stuff follows. There may be one or two people reading this who completely understand this stuff. I'm not one of them ;-)
Almost every type of primality testing program we have uses Fast Fourier Transforms. That includes geneferCUDA, which uses Nvidia's cuFFT libraries.
Nvidia's documentation says it's compatible with the open source FFTW libraries, which makes sense because FFTW is one of the fastest out there. So I suspect cuFFT works, internally, in a similar fashion to the way FFTW works.
FFTW doesn't do things in a predetermined manner. During initialization, it tests various strategies to see which is fastest. Then it executes the FFT with the fastest method.
I'm guessing cuFFT does the same thing. This is important because this means it's not doing the same thing on different computers. So it's entirely possible that on a 550 TI it's doing something differently than on other GPUs. This might be because the different memory architecture makes it faster to, for example, do more floating point operations of type X, which are faster, instead of doing fewer floating point operations of type Y, which are slower. Perhaps the extra floating point ops push the rounding errors over the limit. Again, this is 100% speculation.
If this is true, then it's not just the 550 TI causing the problem. It's the combination of 550 TI and your CPU, since the CPU performance could affect how it decides to execute the FFT.
Or, the different execution path could have a bug in it.
One thing that you can try (forgive me if I already suggested this earlier) is to run a smaller number through GeneferCUDA on the 550 TI. The larger numbers (like a lot of those in the -b tests) are REALLY close to the max-b limits, so it doesn't take much to push them over. Try this command line with either the boinc 1.01 or the PSA 0.97 releases of geneferCUDA and see if it works:
geneferCUDA (or whatever) -q "1234^8192+1"
Or try any of the WUs currently being handed out by boinc. I think they're all pretty small, at least so far.
Mike
____________
My lucky number is 75898^524288+1
I tested my "bad" GTX550 Ti with genefercuda boinc. At the current search range, I noticed no errors. The error will probably occur again when the range of gfn262144 and gfn524288 is reached.
Regards Odi
____________
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2259 ID: 1178 Credit: 11,068,521,226 RAC: 11,429,379
Unfortunately not. That part of the code is unchanged, and neither I nor Shoichiro have any idea what's wrong with it. IIRC, he said that HIS 550TI works fine under Linux. I'm not sure if it's Linux, or if it's that only some of the 550 TIs are affected.
Warning: Pure, unadulterated, right off the top of my head speculation follows. It's probably all wrong.
I can think of a few possibilities.
1) Everyone who is having a problem with the 550 TI has it OC'd too high. The problem seems to be similar to what I'm observing now with some OC'd GPUs. So you could try underclocking the 550TI as far down as it will go, as a test to see if it changes the results. It's possible that, for some reason, "OC'd too high" includes stock clock rates on some cards.
2) Ok, really geeky math nerd stuff follows. There may be one or two people reading this who completely understand this stuff. I'm not one of them ;-)
Almost every type of primality testing program we have uses Fast Fourier Transforms. That includes geneferCUDA, which uses Nvidia's cuFFT libraries.
Nvidia's documentation says it's compatible with the open source FFTW libraries, which makes sense because FFTW is one of the fastest out there. So I suspect cuFFT works, internally, in a similar fashion to the way FFTW works.
FFTW doesn't do things in a predetermined manner. During initialization, it tests various strategies to see which is fastest. Then it executes the FFT with the fastest method.
I'm guessing cuFFT does the same thing. This is important because this means it's not doing the same thing on different computers. So it's entirely possible that on a 550 TI it's doing something differently than on other GPUs. This might be because the different memory architecture makes it faster to, for example, do more floating point operations of type X, which are faster, instead of doing fewer floating point operations of type Y, which are slower. Perhaps the extra floating point ops push the rounding errors over the limit. Again, this is 100% speculation.
If this is true, then it's not just the 550 TI causing the problem. It's the combination of 550 TI and your CPU, since the CPU performance could affect how it decides to execute the FFT.
Or, the different execution path could have a bug in it.
One thing that you can try (forgive me if I already suggested this earlier) is to run a smaller number through GeneferCUDA on the 550 TI. The larger numbers (like a lot of those in the -b tests) are REALLY close to the max-b limits, so it doesn't take much to push them over. Try this command line with either the boinc 1.01 or the PSA 0.97 releases of geneferCUDA and see if it works:
geneferCUDA (or whatever) -q "1234^8192+1"
Or try any of the WUs currently being handed out by boinc. I think they're all pretty small, at least so far.
Mike
Thanks for the reply Mike.
As for possibility #1, it can be eliminated I think. My wife's 550 card is a stock clocked EVGA offering and I never adjusted the clocks (it is not even a factory OC version).
I think we can also eliminate the CPU/GPU combo possibility in #2. Her 550Ti is paired with an AMD 1100T, so I don't think that the CPU performance is an issue.
I also already had done the smaller numbers test and they had worked fine. I think you are on to something with the FFT initialization issue. As I recall, when the 550Ti first came out, several games had issues with the card due to the odd memory configuration, and these needed some driver tweaking by NVidia to fix (i.e., they wrote a software workaround for the issue on this card). Such tweaking probably equates to something similar to what you have described above...and since the Linux drivers would not be tweaked the same way as the Windows drivers (e.g., some of the game tweaks would not be applicable to Linux), that might explain why problems exist only on Windows boxes.
____________
141941*2^4299438-1 is prime!
Thanks for the insights and answers Mike.
The wu's are definitely getting longer - now 7+ minutes on my 460. But Arjant, I'm able to get a cache load of work; not sure, you may have already solved that issue. I did drop my cache because it looked like it was too high and I had a lot of aborted work.
____________
@AggieThePew
The project seems to be very 'sensitive' to overclocking..
You can look at my results above..
Greetings .. parabol
____________
I'm a prime millionaire !
9*2^3497442+1
I tested my "bad" GTX550 Ti with genefercuda boinc. At the current search range, I noticed no errors.
Same with mine, so far.
Thanks for the insights and answers Mike.
The wu's are definitely getting longer - now 7+ minutes on my 460. But Arjant, I'm able to get a cache load of work; not sure, you may have already solved that issue. I did drop my cache because it looked like it was too high and I had a lot of aborted work.
Still not solved. Only getting 1 task at a time, pretty annoying!
Any solutions?
____________
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
Project seems to be very 'sensitive' towards overclocking..
can look at my results about..
Greetings .. parabol
Yes, same as with the CPU, primality testing is rougher on the hardware than sieving.
____________
My lucky number is 75898^524288+1
Have you checked the correction factor? Mine has crazily risen to 95!
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
Thanks for the insights and answers Mike.
The wu's are definitely getting longer - now 7+ minutes on my 460. But Arjant, I'm able to get a cache load of work; not sure, you may have already solved that issue. I did drop my cache because it looked like it was too high and I had a lot of aborted work.
Still not solved. Only getting 1 task at a time, pretty annoying!
Any solutions?
Did you run PPS LLR tasks when they were very large during the challenge? If so, your DCF probably got erroneously raised to a value much higher than it should be.
From the boinc client's project tab, select PrimeGrid and hit the Properties button.
Near the bottom you'll see "Duration correction factor." You'll need to scroll down to see it. If it's higher than about 10, shut down boinc (the whole thing, not just the GUI) and edit client_state.xml to change that value to something reasonable like 1.0. Restart boinc and see if that fixes the problem.
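For anyone who would rather script that edit than do it by hand, a small sketch (my own helper, not an official BOINC tool). It assumes the value lives in a <duration_correction_factor> element, which is the tag name in the client_state.xml files I have seen; verify against your own file, and run it only while the client is completely stopped:

```python
# Reset the duration correction factor in client_state.xml to 1.0.
# CAUTION: run only with the BOINC client fully shut down, and note
# this rewrites every project's DCF entry, not just PrimeGrid's.
# A .bak backup of the original file is kept.

import re
import shutil

def reset_dcf(path, new_value=1.0):
    with open(path) as f:
        xml = f.read()
    shutil.copy(path, path + ".bak")  # keep a backup of the original
    fixed = re.sub(
        r"<duration_correction_factor>[^<]*</duration_correction_factor>",
        "<duration_correction_factor>%f</duration_correction_factor>" % new_value,
        xml,
    )
    with open(path, "w") as f:
        f.write(fixed)
```

The client recomputes DCF over time, so this only needs to be done once after an event (like unusually long tasks) has inflated it.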
____________
My lucky number is 75898^524288+1
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
Have you checked the correction factor? Mine has crazily raised to 95!
I think when they increased the WU size on genefer, they didn't increase the number of flops in the Work description. Or didn't increase it enough. As a result, all of our computers are now about 100 times slower than they used to be, at least according to BOINC.
Then again, this IS beta testing, and one of the purposes of testing is to figure out what the right values for things like that are.
If everything worked the first time, we wouldn't need testing.
Mike
____________
My lucky number is 75898^524288+1
Did you run PPS LLR tasks when they were very large during the challenge? If so, your DCF probably got erroneously raised to a value much higher than it should be.
DCF is rising fast with the genefers getting longer and longer. The first I did (yesterday) took ~40 seconds. They're taking over 20 minutes now on my card. I think the estimated size of the tasks has not been updated, causing the DCF to go nuts.
update:
If everything worked the first time, we wouldn't need testing.
Agreed. Validation would be nice, though. Not because of credit, but to check the reliability of the GPU.
Out of work now.
Thanks for the insights and answers Mike.
The wu's are definitely getting longer - now 7+ minutes on my 460. But Arjant, I'm able to get a cache load of work; not sure, you may have already solved that issue. I did drop my cache because it looked like it was too high and I had a lot of aborted work.
Still not solved. Only getting 1 task at a time, pretty annoying!
Any solutions?
Did you run PPS LLR tasks when they were very large during the challenge? If so, your DCF probably got erroneously raised to a value much higher than it should be.
From the boinc client's project tab, select PrimeGrid and hit the Properties button.
Near the bottom you'll see "Duration correction factor." You'll need to scroll down to see it. If it's higher than about 10, shut down boinc (the whole thing, not just the GUI) and edit client_state.xml to change that value to something reasonable like 1.0. Restart boinc and see if that fixes the problem.
It's 100! I have corrected it, let's see if it helps (currently the cache is empty so no tasks available).
____________
Thanks for the insights and answers Mike.
The wu's are definitely getting longer - now 7+ minutes on my 460. But Arjant, I'm able to get a cache load of work; not sure, you may have already solved that issue. I did drop my cache because it looked like it was too high and I had a lot of aborted work.
Still not solved. Only getting 1 task at a time, pretty annoying!
Any solutions?
Did you run PPS LLR tasks when they were very large during the challenge? If so, your DCF probably got erroneously raised to a value much higher than it should be.
From the boinc client's project tab, select PrimeGrid and hit the Properties button.
Near the bottom you'll see "Duration correction factor." You'll need to scroll down to see it. If it's higher than about 10, shut down boinc (the whole thing, not just the GUI) and edit client_state.xml to change that value to something reasonable like 1.0. Restart boinc and see if that fixes the problem.
It's 100. Strange, because when I started the test on Friday I didn't have the problem.
Anyway, I have corrected it, let's see if it helps (currently the cache is empty so no tasks available).
I have the same value. The runtimes didn't change anymore. You can only reset the debt values with cc_config.xml.
I've noticed that the incredible result of the GTX 570 has a simple explanation:
all of its WUs finished with
maxErr exceeded for ..., 0.5000 > 0.4500
____________
|
|
|
|
|
|
Changing WUs to 262144 brings the error back to the GTX 550 Ti:
<stderr_txt>
5480^262144+1end
[...]
Testing 5480^262144+1...
Testing 5480^262144+1... 3255817 steps to go
Testing 5480^262144+1... 3211264 steps to go
maxErr exceeded for 5480^262144+1, 0.5000 > 0.4500
20:16:46 (6696): called boinc_finish
</stderr_txt>
But without the validator, the task is displayed as finished and not as invalid.
Regards Odi
____________
|
|
|
|
|
|
Editing the client_state.xml has done the job.
Thanx.
Jobs are getting much longer and heavier for the GPU. Temps are rising and now the lag on the screen is really noticeable.
____________
|
|
|
|
|
Changing WUs to 262144 brings the error back to the GTX 550 Ti:
Same here. Doing a test with extreme underclocking (602/1804 vs the stock 900/1800). Estimated time (my estimate, after 2.5% done) is 7.5 hours...
If it works, could it mean there's really a problem with the memory of this card (GDDR5), unable to keep up with the fast clocks of the cores? |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
If everything worked the first time, we wouldn't need testing.
Agreed. Validation would be nice, though. Not because of credit, but to check the reliability of the gpu.
I think it would be hard to not get the maxErr exceeded error and still have a computation error, but I agree, it would be nice to have the validator working so we could know that for sure.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Editing the client_state.xml has done the job.
Thanx.
Jobs are getting much longer and heavier for the GPU. Temps are rising and now the lag is on the screen is really noticable.
Temps should be similar to running any other GPU WU. Screen lag... Not sure what to say about that at this point. I don't see lag on my machine (GTX460 and Core2Quad) most of the time.
"Most" being the important word there.
There's two times I notice lag when running genefercuda:
1) When User Account Control dims the whole screen
2) When running certain Microsoft apps, like Live Mail. They seem to do something funky with the display driver, and it gets slowed down by genefer. Really not sure why.
Most of the time I don't see any ill effects, at least not until getting to much larger WUs (as in, N >= at least 1 million)
____________
My lucky number is 75898524288+1 |
|
|
RytisVolunteer moderator Project administrator
 Send message
Joined: 22 Jun 05 Posts: 2651 ID: 1 Credit: 60,632,486 RAC: 119,580
                     
|
If everything worked the first time, we wouldn't need testing.
Agreed. Validation would be nice, though. Not because of credit, but to check the reliability of the gpu.
I think it would be hard to not get the maxErr exceeded error and still have a computation error, but I agree, it would be nice to have the validator working so we could know that for sure.
Here's one for you: http://www.primegrid.com/result.php?resultid=334011936
Edit: also
http://www.primegrid.com/workunit.php?wuid=238917357
http://www.primegrid.com/workunit.php?wuid=238990514
____________
|
|
|
|
|
Editing the client_state.xml has done the job.
Thanx.
Jobs are getting much longer and heavier for the GPU. Temps are rising and now the lag on the screen is really noticeable.
Temps should be similar to running any other GPU WU. Screen lag... Not sure what to say about that at this point. I don't see lag on my machine (GTX460 and Core2Quad) most of the time.
Most of the time I don't see any ill effects, at least not until getting to much larger WUs (as in, N >= at least 1 million)
I'm not seeing temps rising, but the screen lag is there. Not sure if it is caused by the task itself or by the underclock described earlier (35% after 2h15 now; the previous task, at stock clocks, failed after 2 or 3 minutes with the maxErr error). |
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2259 ID: 1178 Credit: 11,068,521,226 RAC: 11,429,379
                                        
|
|
I am noticing numerous computation errors on a remote machine that is in use doing other things frequently. I have it set to not use GPU when active, and it looks like this may be causing the errors when a user starts working on the box. Can anyone confirm that this happens?
____________
141941*2^4299438-1 is prime!
|
|
|
|
|
|
With the 34sec tasks my GPU was at 61c, now at 75c.
Which is the same as for PPS Sieve. Same applies to the screen lag.
So it's not strange; I was surprised it wasn't there on the smaller units.
Temps are not a problem, I can always raise my fan speed. But it does indicate tasks are getting heavier.
____________
|
|
|
|
|
I think it would be hard to not get the maxErr exceeded error and still have a computation error, but I agree, it would be nice to have the validator working so we could know that for sure.
Here's one for you: http://www.primegrid.com/result.php?resultid=334011936
Edit: also http://www.primegrid.com/workunit.php?wuid=238917357
If you look, you'll see that the residues are different in those tasks because the error is different for both wingmen (zero in one and above zero in the other, although within the maxErr limit).
I imagine that unless the error is zero, no consensus will be achieved on genefer tasks. |
|
|
|
|
|
It would indeed be nice to have a validator.
I have looked through some of my finished work and see a couple of users with strange runtimes, like this unit.
http://www.primegrid.com/workunit.php?wuid=239173395
I have several units with Ardo as wingman (GTX570) and he has several of these very short runtimes.
Perhaps my own GTX580 is generating rubbish results, would be nice to know.
____________
|
|
|
|
|
|
May be late to the party, but I've had a bunch that show validated. Credit's showing up anywhere from 0.01 to 2+ per unit. Also had one WU that took 11,000+ seconds, but it validated. So far no invalids, but several user-aborted tasks.
Anyone else showing validated work?
____________
@AggieThePew
|
|
|
|
|
May be late to the party, but I've had a bunch that show validated. Credit's showing up anywhere from 0.01 to 2+ per unit. Also had one WU that took 11,000+ seconds, but it validated. So far no invalids, but several user-aborted tasks.
Anyone else showing validated work?
Yep. Half a dozen. Max credit 1.08. Guessing badges will be hard to get :) |
|
|
|
|
|
The GTX 550 Ti really can't handle these tasks. Heavily underclocked, maxErr exceeded after almost three hours:
http://www.primegrid.com/result.php?resultid=334072737 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
I am noticing numerous computation errors on a remote machine that is in use doing other things frequently. I have it set to not use GPU when active, and it looks like this may be causing the errors when a user starts working on the box. Can anyone confirm that this happens?
I was thinking about this scenario -- having genefer shut down, but stay in memory, might be problematic because the user may screw up the GPU.
I was thinking that having "leave task in memory" turned off might actually be better.
If that resolves the problem, I might know how to fix it -- maybe. It depends on how BOINC does the in-memory suspend.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
If everything worked the first time, we wouldn't need testing.
Agreed. Validation would be nice, though. Not because of credit, but to check the reliability of the gpu.
I think it would be hard to not get the maxErr exceeded error and still have a computation error, but I agree, it would be nice to have the validator working so we could know that for sure.
Here's one for you: http://www.primegrid.com/result.php?resultid=334011936
Edit: also
http://www.primegrid.com/workunit.php?wuid=238917357
http://www.primegrid.com/workunit.php?wuid=238990514
For 238917357, one of the two results is wrong, one is right. My GPU matches one of those two. So, I was wrong; you can get a bad result without hitting maxErr.
For 238990514, two of the 3 completed results MATCH, so those two should have validated.
Same thing with 239147407; two of the three residuals match and they should have validated.
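For what it's worth, the matching logic amounts to something quite simple. Here's a toy sketch of a quorum-of-residues validator (the function and field names are invented; PrimeGrid's actual validator is server-side code):

```python
from collections import Counter

def pick_canonical(results, quorum=2):
    """Toy bitwise validator: the residue reported by the most results wins,
    and the results carrying it validate once `quorum` of them agree."""
    counts = Counter(r["res"] for r in results)
    res, n = counts.most_common(1)[0]
    if n < quorum:
        return []                # no consensus yet; another wingman is needed
    return [r["id"] for r in results if r["res"] == res]
```

In a case like 238990514 above, two of the three residues match, so those two tasks should come back valid.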
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
I imagine that unless error is zero, no consensus will be achieved on genefer tasks.
Oddly enough, no.
My dinner's getting cold, so no long explanation this time. :)
Errors below 0.45 are to be expected and are OK. It's because we're doing bizarre stuff with integers using floating point arithmetic. The closest analogy I can come up with is using the Transporter from Star Trek to 'beam' a number from one place to another, decomposing it into little tiny bits (pun intended), then putting it back together. Some weirdness (the small rounding errors) is expected and not a problem.
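For the curious, here's a toy illustration of those rounding errors (a plain recursive FFT multiply, not genefer's actual transform): the convolution comes back as floats that are only *close* to integers, and "maxErr" is simply the worst distance from the nearest integer. As long as it stays below the threshold, rounding recovers the exact result.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT on a list of complex numbers."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def fft_mul(a_digits, b_digits, base=10):
    """Multiply two little-endian digit lists with a floating-point FFT and
    report the largest rounding error seen (the analogue of maxErr)."""
    n = 1
    while n < len(a_digits) + len(b_digits):
        n *= 2
    fa = fft([complex(d) for d in a_digits] + [0j] * (n - len(a_digits)))
    fb = fft([complex(d) for d in b_digits] + [0j] * (n - len(b_digits)))
    # Pointwise multiply, inverse transform, then scale by 1/n.
    raw = [v / n for v in fft([x * y for x, y in zip(fa, fb)], invert=True)]
    max_err = max(abs(v.real - round(v.real)) for v in raw)
    digits, carry = [], 0
    for v in raw:                        # carry propagation back into digits
        carry += int(round(v.real))
        digits.append(carry % base)
        carry //= base
    while carry:
        digits.append(carry % base)
        carry //= base
    while len(digits) > 1 and digits[-1] == 0:
        digits.pop()
    return digits, max_err
```

At genefer's sizes the error grows with N and b, which is exactly why the bigger WUs push marginal cards over the 0.45 limit.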
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
|
I have 2 WUs that did validate, so the validator IS validating. I don't understand why those two WUs that Rytis pointed out did not validate.
____________
My lucky number is 75898524288+1 |
|
|
|
|
The GTX 550 Ti really can't handle these tasks. Heavily underclocked, maxErr exceeded after almost three hours:
http://www.primegrid.com/result.php?resultid=334072737
Thx for the info. I also thought about downclocking the card, but for now I'm retiring my GTX 550 Ti from GeneferCUDA to prevent more invalid results.
But it seems that GTX 570 is also one of these "bad" cards.
Regards Odi
____________
|
|
|
|
|
The GTX 550 Ti really can't handle these tasks. Heavily underclocked, maxErr exceeded after almost three hours:
http://www.primegrid.com/result.php?resultid=334072737
Thx for the info. I also thought about downclocking the card, but for now I'm retiring my GTX 550 Ti from GeneferCUDA to prevent more invalid results.
But it seems that GTX 570 is also one of these "bad" cards.
Regards Odi
Actually, it failed crunching a 524288 task, but then finished a 262144 within the maxErr:
http://www.primegrid.com/result.php?resultid=334075619. I'm waiting to see if it will validate. But the card is definitely incapable of doing the longer tasks.
I'm testing a GFN262144 at PRPNet, at 800core/1600shader Mhz on the 550 ti. 1000000 steps to go... If it finishes, it will be the first for this card.
Update: it finished successfully in ~two hours.
I've also seen those weird 570 tasks, but all belonged to the same host (Ardo's, if I remember). So it could be an OC problem rather than a model issue (as it seems to be with the 550 Ti). |
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
If that resolves the problem, I might know how to fix it -- maybe. It depends on how boinc does the -in-memory suspend.
As far as I know, there is no suspend for GPU work. If you suspend a task on the GPU, the task is completely unloaded from GPU memory and you lose the work state if there is no existing checkpoint.
This behaviour is totally different from work on CPUs.
The BOINC devs discussed this problem on their mailing list sometime in the past, and IIRC the conclusion was that the setting "leave tasks in memory" does not work with GPUs. I don't know if this has ever changed; maybe I simply missed that email. If so, "Ageless" should be able to find this information in his emails; as far as I know, he archives all emails from the list. Or you could try to find this info in the Berkeley mailman archive.
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
If that resolves the problem, I might know how to fix it -- maybe. It depends on how boinc does the -in-memory suspend.
As far as I know, there is no suspend for GPU work. If you suspend a task on the GPU, the task is completely unloaded from GPU memory and you lose the work state if there is no existing checkpoint.
This behaviour is totally different from work on CPUs.
The BOINC devs discussed this problem on their mailing list sometime in the past, and IIRC the conclusion was that the setting "leave tasks in memory" does not work with GPUs. I don't know if this has ever changed; maybe I simply missed that email. If so, "Ageless" should be able to find this information in his emails; as far as I know, he archives all emails from the list. Or you could try to find this info in the Berkeley mailman archive.
I came to the same conclusion.
If you kill the GPU task, it can restart from its checkpoint. If it's suspended in memory, there's no guarantee that stuff it has stored on the GPU will be preserved while the task is suspended.
So it's better if the GPU task is killed and restarted. That would work particularly well with my save-before-killing method, since you usually won't lose any work when the task is unloaded.
I would swear, however, that I tested this and genefer *did* stay in memory when it was suspended with the keep in memory flag on. It would be better if it didn't, however.
Of course, the easiest way to see what's going on is just to look at the code for the boinc client.
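The save-before-killing flow I'm describing looks roughly like this (a toy model, not genefer's actual source; a real BOINC app receives the quit request through the BOINC API rather than a raw signal, and the file name here is made up):

```python
import json
import os
import signal

CHECKPOINT = "genefer_demo.ckpt"
quit_requested = False

def request_quit(signum, frame):
    # Stand-in for BOINC telling the app "it's time to go".
    global quit_requested
    quit_requested = True

signal.signal(signal.SIGTERM, request_quit)

def load_state():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["step"]
    return 0

def save_state(step):
    # Write-then-rename so a crash never leaves a half-written checkpoint.
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp, CHECKPOINT)

def run(total_steps=1000):
    step = load_state()                 # resume from the last checkpoint
    while step < total_steps:
        step += 1                       # stand-in for one squaring iteration
        if step % 100 == 0 or quit_requested:
            save_state(step)            # checkpoint before honoring a quit
        if quit_requested:
            return "stopped", step      # killed, but no work is lost
    return "done", step
```

Because the checkpoint is written just before exiting, killing and restarting the task costs essentially nothing, which is why kill-and-restart beats an unreliable in-memory suspend on the GPU.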
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
Wow 1.08 credits max per task. Will this be fixed or revalidated with fixed credits per level? |
|
|
|
|
|
That's only when testing validation.
Lennart |
|
|
|
|
|
I keep having problems with no buffer.
I changed the duration correction factor yesterday and did get some more units. Now it's back to 100,000 and I'm only getting 1 task.
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
I keep having problems with no buffer.
I changed the duration correction factor yesterday and did get some more units. Now it's back to 100,000 and I'm only getting 1 task.
I can't speak with authority on this, but my guess is they've been having trouble keeping the GFLOPS settings on the workunits accurate. The workunits have varied in duration from 40 seconds to 5 hours, a factor of 500:1. If they didn't *correctly* change the GFLOPS setting each time they changed the WU size, it's going to wreak havoc with the DCF.
Knowing/estimating/guessing what the correct GFLOPS settings should be is not easy.
By the way, there may be some *REAL* WUs in the queue now. Real, in this case, means new, unsearched virgin numbers for our crunching pleasure. If your WU has an N of 262144 and b above 500,000 those are real WUs. Those should take about 90 minutes to process. They're not yet the 8 day world record search numbers, but they're real numbers to be searched. It's exactly the same as the numbers being searched over on PSA at the GFN262144 port.
Anything with N of 262144 or 524288 is real if the b is greater than the range being searched over on the PSA. If N is greater than 524288, it's a real WU regardless of b.
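A quick way to gauge the size of these numbers: the decimal digit count of b^N+1 is just floor(N*log10(b))+1, so a b just above 500,000 at N=262144 is roughly 1.49 million digits, and each doubling of N doubles that.

```python
import math

def gfn_digits(b, n):
    """Decimal digits of the generalized Fermat number b^n + 1
    (floating-point estimate; plenty accurate at these sizes)."""
    return int(n * math.log10(b)) + 1
```

For example, 501082^262144+1 works out to 1,494,197 digits, matching what genefer prints in its stderr output.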
____________
My lucky number is 75898524288+1 |
|
|
|
|
I keep having problems with no buffer.
I changed the duration correction factor yesterday and did get some more units. Now it's back to 100,000 and I'm only getting 1 task.
I can't speak with authority on this, but my guess is they've been having trouble keeping the GFLOPS settings on the workunits accurate. The workunits have varied in duration from 40 seconds to 5 hours, a factor of 500:1. If they didn't *correctly* change the GFLOPS setting each time they changed the WU size, it's going to wreak havoc with the DCF.
Knowing/estimating/guessing what the correct GFLOPS settings should be is not easy.
By the way, there may be some *REAL* WUs in the queue now. Real, in this case, means new, unsearched virgin numbers for our crunching pleasure. If your WU has an N of 262144 and b above 500,000 those are real WUs. Those should take about 90 minutes to process. They're not yet the 8 day world record search numbers, but they're real numbers to be searched. It's exactly the same as the numbers being searched over on PSA at the GFN262144 port.
Anything with N of 262144 or 524288 is real if the b is greater than the range being searched over on the PSA. If N is greater than 524288, it's a real WU regardless of b.
It is correct! We have real work at N=262144 in now.
Rytis is working on validation, but remember, he has a real life too :)!
We are also working on the credit issue.
Time increases with N but also with b.
The deadline is at 24 hours now on the new work. This will be increased as well when we come out of beta mode.
Lennart
|
|
|
|
|
|
Pausing the BOINC application leads to an immediate error on the tasks over here.
So that is a serious problem for me. I am not shutting down BOINC every time I want to do something else with my computer... The option 'keep tasks in memory' is not making a difference.
____________
|
|
|
ardo  Send message
Joined: 12 Dec 10 Posts: 168 ID: 76659 Credit: 1,693,455,577 RAC: 1
                   
|
gtx550 ti really can't handle these tasks. heavily underclocked, max err exceeded after almost three hours:
http://www.primegrid.com/result.php?resultid=334072737
Thx for the info. I also thinked about downclocking the card, but for now, I retire my gtx 550ti from genefercuda to prevent more invalid results.
But it seems that GTX 570 is also one of these "bad" cards.
Regards Odi
Actually, it failed crunching a 524288 task, but then finished within the maxerr a 262144
http://www.primegrid.com/result.php?resultid=334075619. I'm waiting to see if it will validate. But the card is definitely incapable of doing the longer tasks.
I'm testing a GFN262144 at PRPNet, at 800core/1600shader Mhz on the 550 ti. 1000000 steps to go... If it finishes, it will be the first for this card.
Update: it finished successfully in ~two hours.
I've also seen those 570 weird tasks, but all belonged to the same host (Ardo's, if i remember) . So, it could be an oc probem rather than a model issue (as it seems to be with the 550TI).
That host has two ASUS GTX570 DirectCU II cards with factory settings...
____________
Badge score: 2*5 + 8*7 + 3*8 + 3*9 + 1*10 + 1*11 + 1*13 = 151
|
|
|
|
|
I am noticing numerous computation errors on a remote machine that is in use doing other things frequently. I have it set to not use GPU when active, and it looks like this may be causing the errors when a user starts working on the box. Can anyone confirm that this happens?
Confirmed.
And I can say even more. My work GPU is the second in SLI; my master GPU is free for DC.
When I don't touch the computer, a WU finishes fine.
When I start to use the GPU, for example for Flash 11 (which uses GPU acceleration), WUs break with maxErr.
____________
|
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2259 ID: 1178 Credit: 11,068,521,226 RAC: 11,429,379
                                        
|
|
Well, it looks like my GTX 550 has completed two of the 262k units successfully. For example...
501082^262144+1 is a probable composite. (RES=23c1964028671da9) (1494197 digits) (err = 0.1016) (time = 1:35:52) 01:38:03
01:38:03 (5904): called boinc_finish
...so I'll keep it churning on these to see if a 524k unit can be done, also.
____________
141941*2^4299438-1 is prime!
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Pausing the BOINC application leads to an immediate error on the tasks over here.
So that is a serious problem for me. I am not shutting down BOINC every time I want to do something else with my computer... The option 'keep tasks in memory' is not making a difference.
Your situation has me puzzled at the moment. You're running the same environment (win 7 x64) and the same boinc client (6.12.34) as I am, so that makes things simple. This worked fine when I tested it, so something's different here, and I'm not sure what.
The four errors have this in their output:
Terminating because BOINC client request that we should quit.
12:35:07 (6328): called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>genefer_262144_6241_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
The first part says BOINC told Genefer "It's time to go, stop processing now", and the second part says "Hey, you stopped before you were done, so something must be wrong."
That doesn't make a lot of sense to me at the moment. The first part DOES make sense; when you pause the task, BOINC will shut it down. The second part about not seeing the output file seems to indicate that BOINC forgot it told genefer to shut down.
So, for now, it's a mystery... until it's not. ;-)
____________
My lucky number is 75898524288+1 |
|
|
|
|
Terminating because BOINC client request that we should quit.
12:35:07 (6328): called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>genefer_262144_6241_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
The first part says BOINC told Genefer "It's time to go, stop processing now", and the second part says "Hey, you stopped before you were done, so something must be wrong."
That doesn't make a lot of sense to me at the moment. The first part DOES make sense; when you pause the task, BOINC will shut it down. The second part about not seeing the output file seems to indicate that BOINC forgot it told genefer to shut down.
So, for now, it's a mystery... until it's not. ;-)
I also looked in my results and found some -161 errors in the 65536 range. But on this host I never stopped and restarted GPU work, because these GPUs run 24/7 exclusively for BOINC. On this machine I only work on a card which is excluded from BOINC.
Maybe it stopped because other GPU work was in the queue, but I don't remember if this was 2 days ago.
Regards Odi
____________
|
|
|
|
|
Pausing the BOINC application leads to an immediate error on the tasks over here.
So that is a serious problem for me. I am not shutting down BOINC every time I want to do something else with my computer... The option 'keep tasks in memory' is not making a difference.
Your situation has me puzzled at the moment. You're running the same environment (win 7 x64) and the same boinc client (6.12.34) as I am, so that makes things simple. This worked fine when I tested it, so something's different here, and I'm not sure what.
The four errors have this in their output:
Terminating because BOINC client request that we should quit.
12:35:07 (6328): called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>genefer_262144_6241_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
The first part says BOINC told Genefer "It's time to go, stop processing now", and the second part says "Hey, you stopped before you were done, so something must be wrong."
That doesn't make a lot of sense to me at the moment. The first part DOES make sense; when you pause the task, BOINC will shut it down. The second part about not seeing the output file seems to indicate that BOINC forgot it told genefer to shut down.
So, for now, it's a mystery... until it's not. ;-)
Same problem again, could be 1:100. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
I also looked in my results and found some -161 errors in the 65536 range. But on this host I never stopped and restarted GPU work, because these GPUs run 24/7 exclusively for BOINC. On this machine I only work on a card which is excluded from BOINC.
Maybe it stopped because other GPU work was in the queue, but I don't remember if this was 2 days ago.
Regards Odi
You have a lot of hosts. :)
I found two that had genefer WUs with errors, and in both cases, for all the errors, genefer detected that boinc had told it to shut down. I couldn't find any instances of errors where that was not the case.
But there's a lot of WUs to look through, so I may have missed it. Could you provide a link to one of those WUs?
Thanks.
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
For reasons I can't comprehend BOINC has shoved the genefer unit into hi-prio because the deadline is <24 hours away but the remaining time is <1 hour.
Also noticing the screenlag reported before.
As suggested somewhere I tried freeing up one (or more) CPU cores. This does help a bit, but doesn't remove the screenlag completely. It's just reduced.
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
For reasons I can't comprehend BOINC has shoved the genefer unit into hi-prio because the deadline is <24 hours away but the remaining time is <1 hour.
Also noticing the screenlag reported before.
As suggested somewhere I tried freeing up one (or more) CPU cores. This does help a bit, but doesn't remove the screenlag completely. It's just reduced.
Yes, that is correct. Having a free core does make the GPU more responsive, both in terms of more crunching time and also screen responsiveness. Sometimes it can make a big difference.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
|
I thought my browser just died a horrible death. Or the forums died a horrible death. When I went to read this thread, it was somewhere in the PSA section. Now it's not. :) I was very disoriented when I popped up a level and wasn't where I expected to be!
Thanks, John (presumably), for giving Genefer (actually GFN) its own topic.
Mike
____________
My lucky number is 75898524288+1 |
|
|
John Honorary cruncher
 Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
Thanks, John (presumably), for giving Genefer (actually GFN) its own topic.
Thanks goes to Rytis!
____________
|
|
|
|
|
I thought my browser just died a horrible death. Or the forums died a horrible death. When I went to read this thread, it was somewhere in the PSA section. Now it's not. :) I was very disoriented when I popped up a level and wasn't where I expected to be!
Thanks, John (presumably), for giving Genefer (actually GFN) its own topic.
Mike
Me too - was wondering what happened.. now all is right again. |
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2259 ID: 1178 Credit: 11,068,521,226 RAC: 11,429,379
                                        
|
Pausing the BOINC application leads to an immediate error on the tasks over here.
So that is a serious problem for me. I am not shutting down BOINC every time I want to do something else with my computer... The option 'keep tasks in memory' is not making a difference.
Your situation has me puzzled at the moment. You're running the same environment (win 7 x64) and the same boinc client (6.12.34) as I am, so that makes things simple. This worked fine when I tested it, so something's different here, and I'm not sure what.
The four errors have this in their output:
Terminating because BOINC client request that we should quit.
12:35:07 (6328): called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>genefer_262144_6241_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
The first part says BOINC told Genefer "It's time to go, stop processing now", and the second part says "Hey, you stopped before you were done, so something must be wrong."
That doesn't make a lot of sense to me at the moment. The first part DOES make sense; when you pause the task, BOINC will shut it down. The second part about not seeing the output file seems to indicate that BOINC forgot it told genefer to shut down.
So, for now, it's a mystery... until it's not. ;-)
I am getting tons of these on this host. Same error across different workunit sizes. Machine is remote and set to not use GPU while in use (resumes after 1 minute idle).
____________
141941*2^4299438-1 is prime!
|
|
|
|
|
Pausing the BOINC application leads to an immediate error on the tasks over here.
So that is a serious problem for me. I am not shutting down BOINC every time I want to do something else with my computer... The option 'keep tasks in memory' is not making a difference.
Seeing the same here. Paused a genefer unit because it messed with my video playback but this only caused it to error out, report and start a new one.
Relevant units are http://www.primegrid.com/result.php?resultid=334129310 and http://www.primegrid.com/result.php?resultid=334128885
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
|
|
|
ardo  Send message
Joined: 12 Dec 10 Posts: 168 ID: 76659 Credit: 1,693,455,577 RAC: 1
                   
|
|
OK, figured out what might have caused the host with the GTX570s to misbehave: I was using a GT210 as my main video card so that the two GTX570s could use their full potential for number crunching. In that setup I just ran the GeneferCUDA benchmark and got too-high max errors for the larger numbers. I took out the GT210 and moved the GTX570s into the slots per the MB manual. Rerunning the GeneferCUDA benchmarks, things look much better for each card.
I resumed the boinc tasks and things look better.
However, I just noticed that when a shorter task on one card finishes, the task on the other card is also "finished", but without an output file...
____________
Badge score: 2*5 + 8*7 + 3*8 + 3*9 + 1*10 + 1*11 + 1*13 = 151
|
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2259 ID: 1178 Credit: 11,068,521,226 RAC: 11,429,379
                                        
|
Terminating because BOINC client request that we should quit.
12:35:07 (6328): called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>genefer_262144_6241_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
Okay, I am getting this kind of error on a machine where the GPU is set to always run, even when in use. See here. This specific error I can trace to an exact event... My wife started up the game FreeCell on her machine. So it looks like anything active on the GPU other than Genefer might create this error???
____________
141941*2^4299438-1 is prime!
|
|
|
|
|
This specific error I can trace to an exact event...My wife started up the game FreeCell on her machine. So it looks like anything active on the GPU other than Genefer might create this error???
The 550 Ti seems to be extremely sensitive when running genefers. I've seen it happen after waking the display from the screen saver, and once it happened just from a task climbing up on the BOINC Manager task page. At least with the current drivers (under Windows), I believe genefers are outside the 550 Ti's limits. |
|
|
ardo  Send message
Joined: 12 Dec 10 Posts: 168 ID: 76659 Credit: 1,693,455,577 RAC: 1
                   
|
OK, I figured out what might have caused the host with the GTX570s to misbehave: I was using a GT210 as my main video card so that the two GTX570s could use their full potential for number crunching. In that setup I ran the GeneferCUDA benchmark and got max errors that were too high for the larger numbers. I took out the GT210 and moved the GTX570s into the slots recommended by the motherboard manual. Rerunning the GeneferCUDA benchmarks, things look much better for each card.
I resumed the BOINC tasks and things look better.
However, I just noticed that when a shorter task on one card finishes, the task on the other card is also "finished", but without an output file..
That last was apparently a leftover from before, as the last couple of tasks have been processed to completion successfully...
____________
Badge score: 2*5 + 8*7 + 3*8 + 3*9 + 1*10 + 1*11 + 1*13 = 151
|
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2259 ID: 1178 Credit: 11,068,521,226 RAC: 11,429,379
                                        
|
|
This is not the "-161" error that we are seeing across lots of cards that is related to pausing the GPU, etc.
Rather, I am surprised that my GTX 550 Ti is completing any of the 262k or larger units, as none of these worked on PRPnet. Looking at the results so far, I am seeing a very mixed bag. About half end in the maxErr problem that occurred on PRPnet, though at different steps:
Testing 501002^262144+1... 4963553 steps to go
Testing 501002^262144+1... 4915200 steps to go
maxErr exceeded for 501002^262144+1, 0.5000 > 0.4500
04:46:08 (1952): called boinc_finish
Testing 500538^262144+1... 3604480 steps to go
maxErr exceeded for 500538^262144+1, 0.5000 > 0.4500
05:12:57 (4044): called boinc_finish
Testing 500454^262144+1... 2621440 steps to go
maxErr exceeded for 500454^262144+1, 0.5000 > 0.4500
05:58:23 (4280): called boinc_finish
Testing 501922^262144+1... 1114112 steps to go
maxErr exceeded for 501922^262144+1, 0.5000 > 0.4500
14:52:49 (3992): called boinc_finish
Testing 500574^262144+1... 4259840 steps to go
maxErr exceeded for 500574^262144+1, 0.5000 > 0.4500
18:26:48 (4048): called boinc_finish
Testing 502846^262144+1... 3735552 steps to go
maxErr exceeded for 502846^262144+1, 0.5000 > 0.4500
18:12:42 (5412): called boinc_finish
Testing 501500^262144+1... 4194304 steps to go
maxErr exceeded for 501500^262144+1, 0.5000 > 0.4500
07:49:44 (3388): called boinc_finish
Testing 502246^262144+1... 4325376 steps to go
maxErr exceeded for 502246^262144+1, 0.5000 > 0.4500
13:38:31 (4552): called boinc_finish
Testing 5100^524288+1... 6029312 steps to go
maxErr exceeded for 5100^524288+1, 0.5000 > 0.4500
15:30:39 (3748): called boinc_finish
While about half complete with no problems:
501082^262144+1 is a probable composite. (RES=23c1964028671da9) (1494197 digits) (err = 0.1016) (time = 1:35:52) 01:38:03
01:38:03 (5904): called boinc_finish
500874^262144+1 is a probable composite. (RES=d5e98a93688693dd) (1494150 digits) (err = 0.1094) (time = 1:35:42) 07:34:09
07:34:09 (5476): called boinc_finish
501756^262144+1 is a probable composite. (RES=6fe75633c14c20ba) (1494350 digits) (err = 0.1094) (time = 1:35:49) 11:49:56
11:49:56 (5616): called boinc_finish
501498^262144+1 is a probable composite. (RES=8f5dd4e9aed53071) (1494292 digits) (err = 0.3379) (time = 1:36:16) 09:26:03
09:26:03 (2356): called boinc_finish
501860^262144+1 is a probable composite. (RES=663982d90bfb4e4a) (1494374 digits) (err = 0.1016) (time = 1:35:39) 13:25:38
13:25:38 (992): called boinc_finish
5866^262144+1 is composite. (RES=d2289cc9c4b54a4d) (987849 digits) (err = 0.0020) (time = 1:03:10) 16:53:54
16:53:54 (540): called boinc_finish
5316^262144+1 is composite. (RES=ee53b0c252c2a799) (976640 digits) (err = 0.0000) (time = 1:02:19) 03:41:53
03:41:53 (1884): called boinc_finish
5580^262144+1 is composite. (RES=c81b367c9ec237a4) (982158 digits) (err = 0.0000) (time = 1:02:39) 04:44:35
04:44:35 (5708): called boinc_finish
Does anyone see a pattern in what is going on here? Or is it just random instability, such that the 550s are better retired to another project permanently?
____________
141941*2^4299438-1 is prime!
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
|
I suspect that the pausing problem is caused by my super special code that checkpoints before being shut down. Unfortunately, that seems to cause problems sometimes, but not always, in my test environment. I've pulled that code out, so the problem should go away with v1.02 when it's released. That means that GeneferCUDA will only checkpoint when it's scheduled to checkpoint, according to the preferences you set.
____________
My lucky number is 75898^524288+1 |
|
|
|
|
|
Can anyone tell me how run times of GeneferCUDA on a GTX570 would compare to those on an i7-2600K when searching for a prime of the same digit length?
I'm curious which is better in terms of money spent, given that the i7 can run 8 threads and consumes half as much power. Also, the CPU overclocks much better, while an overclocked GPU, as stated in this thread, starts to error.
Thank you for your answers. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Can anyone tell me how run times of GeneferCUDA on a GTX570 would compare to those on an i7-2600K when searching for a prime of the same digit length?
I'm curious which is better in terms of money spent, given that the i7 can run 8 threads and consumes half as much power. Also, the CPU overclocks much better, while an overclocked GPU, as stated in this thread, starts to error.
Thank you for your answers.
I get about a 10:1 ratio with a GTX 460 vs. a Q6600. Both are slower than the 570/2600K, but the i7's per-thread speed will be roughly cut in half by hyperthreading. My guess is you'd be in the vicinity of 10:1, but that's just a guess.
There are folks here who have 570/i7 combos, so hopefully you'll get a better answer.
____________
My lucky number is 75898^524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
I suspect that the pausing problem is caused by my super special code that checkpoints before being shut down. Unfortunately, that seems to cause problems sometimes, but not always, in my test environment. I've pulled that code out, so the problem should go away with v1.02 when it's released. That means that GeneferCUDA will only checkpoint when it's scheduled to checkpoint, according to the preferences you set.
I forgot that now that the genefer server is running, I can test new versions of the software against the real server using app_info.xml. That makes things SO much simpler!
v1.02 does in fact seem to solve the problem. It suspends properly regardless of whether the 'keep-in-memory' flag is on or off.
____________
My lucky number is 75898^524288+1 |
|
|
|
|
|
Thank you Michael for the reply!
Here's another question. ATI cards show much better double-precision performance than NVIDIA's. For instance, an HD 5850 performs twice as fast as a GTX 580. Maybe the next step should be an OpenCL version of Genefer?
We would crunch faster and attract more crunchers with ATI cards from other projects.
Cheers! |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Thank you Michael for the reply!
Here's another question. ATI cards show much better double-precision performance than NVIDIA's. For instance, an HD 5850 performs twice as fast as a GTX 580. Maybe the next step should be an OpenCL version of Genefer?
We would crunch faster and attract more crunchers with ATI cards from other projects.
Cheers!
All true, except perhaps for the OpenCL part. OpenCL is portable, but when speed counts (and when doesn't speed count when you're talking about a GPU???) you might be better off with something written specifically for the ATI than with something portable. This is far more true of GPUs than of CPUs.
____________
My lucky number is 75898^524288+1 |
|
|
|
|
|
maxErr with 1.02:
http://www.primegrid.com/result.php?resultid=334128749
Tried to play a flash game during the crunch.
The difference from 1.01 is in the exit code:
1.01 - 0 (0x0)
1.02 - 10 (0xa)
____________
|
|
|
|
|
|
x3mEn [Kyiv]
Could it be that you are running BOINC v7+, maybe??
I also had problems with that version
____________
I'm a prime millionaire !
9*2^3497442+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
maxErr with 1.02:
http://www.primegrid.com/result.php?resultid=334128749
Tried to play a flash game during the crunch.
The difference from 1.01 is in the exit code:
1.01 - 0 (0x0)
1.02 - 10 (0xa)
That's exactly what's supposed to happen. MaxErr is now handled as an error immediately.
The MaxErr isn't supposed to happen, of course. I'm running that number on my 460 and we'll see what happens.
If you're curious as to whether the error might have been caused by the flash game, try running the test manually from the command line. Use this as the parameters:
-q "504424^262144+1"
The quotes ARE necessary when the program is run from the Windows command line. They're not necessary if genefer is invoked from another program, and probably not necessary on any other OS. They might not even be necessary under really old versions of Windows. That applies to all PrimeGrid programs that have a -q (or similar) parameter. Windows uses "^" as an escape character on the command line, so if you don't use the quotes, the program receives this as its command line:
-q "504424262144+1"
The caret is stripped when the quotes are omitted.
OK, good, I've babbled on long enough that my test run has progressed past the point where yours failed:
C:\GeneferCUDA test\geneferCUDA-boinc.1.02>GeneferCUDA-boinc-windows.exe -q "504424^262144+1"
GeneferCUDA-boinc 1.02 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
Command line: GeneferCUDA-boinc-windows.exe -q 504424^262144+1
Testing 504424^262144+1... 4390912 steps to go
I'll let it run to the end just to make sure.
It looks like your other WUs completed normally, so maybe it was the game.
____________
My lucky number is 75898^524288+1 |
|
|
|
|
x3mEn [Kyiv]
Could it be that you are running BOINC v7+, maybe??
I also had problems with that version
Yes, I am using the 7.07 version.
But I don't think it's the main problem.
Half of my WUs finished successfully while I was sitting quietly and watching their progress.
The first "maxErr" appeared when I gave up my seat so my wife could play a flash game.
Today I tried the new 1.02 version, played a flash game myself, and "maxErr" happened again.
Now I am trying to run it from the command line.
I'm sure that if I keep waiting, it will reach the end.
Now I'm here:
primegrid_genefer_1.02_windows_intelx86__cuda32_13.exe -q 504424^262144+1 --device 0
Testing 504424^262144+1... 3801088 steps to go
And I don't play flash game now.
____________
|
|
|
|
|
|
Unfortunately, I was wrong.
Even from the command line, the program finished with maxErr...
primegrid_genefer_1.02_windows_intelx86__cuda32_13.exe -q "504424^262144+1" --device 0
GeneferCUDA-boinc 1.02 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
Command line: primegrid_genefer_1.02_windows_intelx86__cuda32_13.exe -q 504424^262144+1 --device 0
Testing 504424^262144+1... 3473408 steps to go
maxErr exceeded for 504424^262144+1, 0.5000 > 0.4500
____________
|
|
|
|
|
|
Actually, I was NOT playing the flash game, but it was running in the background... so the test wasn't clean.
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
|
It might be a good idea to turn off the "Use GPU while computer is in use" checkbox.
Chances are that if some other program is interfering with GeneferCUDA, it's interfering with any CUDA program you run. GeneferCUDA may simply appear more sensitive because it does more strenuous internal checks than other programs. Or it may actually be more sensitive.
____________
My lucky number is 75898^524288+1 |
|
|
|
|
|
Maybe you are right, but I've been using GPUs with "Use GPU while computer is in use" checked for a very long time and have had no problems with any GPU project, except maybe GPUGrid.
They say that GPUGrid is very sensitive to GPU memory accuracy.
I think that Genefer is very close to GPUGrid in its demands on GPU memory.
A 2x bigger GFN needs 2x more GPU memory, I guess, so overclocked GPUs produce memory errors more frequently. My GTX 460 is factory overclocked: 810/2000/1620 instead of 675/1800/1350.
GDDR5 is not ECC, unfortunately...
____________
|
|
|
|
|
|
x, I'm running an EVGA GTX 460 with the shaders clocked to 1600. I had to drop it down from 1790 before any WUs would finish correctly. It's a 1 GB card, but I'm ONLY running PrimeGrid GPU tasks and have both CPU cores idle. I'm also running a 32-bit Vista OS (don't laugh).
So now my question: it looks like most of the testing has settled down, so I was wondering, when a WU is run and reported, do I need to review the logs to see if it actually finished correctly, or is that now reported to PG as an errored or invalid WU? The reason I'm asking is that even though I know we are not validating, some validation testing must be going on, because I have a lot of "valid" WUs with credit and of course lots of pending units.
____________
@AggieThePew
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Maybe you are right, but I've been using GPUs with "Use GPU while computer is in use" checked for a very long time and have had no problems with any GPU project, except maybe GPUGrid.
You probably never played "Civilization V" then. ;-)
There are programs that absolutely will trample all over any CUDA program that has the misfortune of trying to run at the same time.
Generally speaking, I agree with you. Most of the time CUDA co-exists nicely with other programs, but not all of the time. Although I keep my GPU crunching while I'm using the computer, I do have a list of programs in cc_config.xml that BOINC isn't allowed to run alongside.
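For reference, the exclusion mechanism mentioned here is the <exclusive_gpu_app> option in the BOINC client's cc_config.xml, which suspends GPU computing while a named program is running. A minimal sketch (the executable names below are placeholders, not recommendations; use the actual file names of the programs you want to exclude):

```xml
<cc_config>
  <options>
    <!-- Suspend GPU computing while either of these programs runs.
         Example names only; substitute your own executables. -->
    <exclusive_gpu_app>CivilizationV.exe</exclusive_gpu_app>
    <exclusive_gpu_app>FlashPlayerPlugin.exe</exclusive_gpu_app>
  </options>
</cc_config>
```

The file lives in the BOINC data directory, and the client picks it up on restart or via "Read config file" in the manager.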
They say that GPUGrid is very sensitive to GPU memory accuracy.
I think that Genefer is very close to GPUGrid in its demands on GPU memory.
A 2x bigger GFN needs 2x more GPU memory, I guess, so overclocked GPUs produce memory errors more frequently. My GTX 460 is factory overclocked: 810/2000/1620 instead of 675/1800/1350.
GDDR5 is not ECC, unfortunately...
Actually, that's not true -- and I'm not sure why not. GPU memory usage seems to peak at around 45% (of 1 GB) no matter how high N goes.
My calculations on the 460 just finished:
C:\GeneferCUDA test\geneferCUDA-boinc.1.02>GeneferCUDA-boinc-windows.exe -q "504424^262144+1"
GeneferCUDA-boinc 1.02 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
Command line: GeneferCUDA-boinc-windows.exe -q 504424^262144+1
504424^262144+1 is a probable composite. (RES=3c7571725ce19f93) (1494954 digits) (err = 0.1016) (time = 1:33:51) 09:02:33
I think it was the flash game. That's interesting, and not really what one would expect.
Definitely good information to know.
____________
My lucky number is 75898^524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
So now my question: it looks like most of the testing has settled down, so I was wondering, when a WU is run and reported, do I need to review the logs to see if it actually finished correctly, or is that now reported to PG as an errored or invalid WU? The reason I'm asking is that even though I know we are not validating, some validation testing must be going on, because I have a lot of "valid" WUs with credit and of course lots of pending units.
Anything from here on out (specifically with v1.02) reports maxErr as an error, so you won't need to look at the logs. For WUs run with 1.01, the plan was to have the validator check the output for maxErr-exceeded errors and mark those WUs as invalid rather than as computation errors.
I don't have any information on the status of the validator.
____________
My lucky number is 75898^524288+1 |
|
|
|
|
You probably never played "Civilization V" then. ;-)
Yes, it's been 2 years since I was a gamer :)
That's how it goes: you start crunching on your GPU and you stop playing games :)
My calculations on the 460 just finished:
Ok, I'll try to test this one again in conditions of pharmaceutical purity: no flash :), an absolutely idle CPU, and so on...
____________
|
|
|
|
|
|
So I just finished calculations on my GTX 460:
primegrid_genefer_1.02_windows_intelx86__cuda32_13.exe -q "504424^262144+1" --device 0
GeneferCUDA-boinc 1.02 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
Command line: primegrid_genefer_1.02_windows_intelx86__cuda32_13.exe -q 504424^262144+1 --device 0
504424^262144+1 is a probable composite. (RES=3c7571725ce19f93) (1494954 digits) (err = 0.1016) (time = 1:19:02) 18:10:55
So, it was the flash game.
Michael, I've noticed that the first time I started the command line program, it resumed from a checkpoint:
Resuming 504424^262144+1 from a checkpoint (3538943 iterations left)
So I have an idea: what about restarting the program from a checkpoint when "maxErr" appears? At least when the checkpoint has changed since the program (re)started.
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
So I have an idea. What if to restart program from a checkpoint when "maxErr" appeared? At least when a checkpoint has changed from the (re)start of program.
Theoretically possible, but there are two problems:
* By the time you would want to do this, BOINC will have already reported the task back to the server as a computation error.
* This could be very dangerous -- the actual errors (or *some* of the errors) may have occurred prior to the checkpoint. In this situation, maxErr getting too high is one manifestation of a computation error; other computation errors could have occurred that did not cause a rounding error large enough to be detected. So it's safer just to abandon the WU. You know something went wrong, but you don't know how badly, or when.
____________
My lucky number is 75898^524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
But, pretty soon the BOINC version will be faster than any other, which will be most noticeable at higher Ns. There's a CPU-bound initialization process which is trivial at lower N. However, at N=4194304, this initialization takes two hours. I've been able to optimize that 2-hour phase down to 21 minutes, and I think I can get it down to about a minute. That's not released yet, so the BOINC version is pretty much the same speed as the PSA version. (The BOINC version, btw, is compatible with PRPNet.)
Alright, I've now got that working the way I want it.
That 2-hour initialization phase now takes a small fraction of a second. (Isn't math wonderful?)
There will be a 1.03 with this optimization code before we bump N up beyond 524288. I'll be doing a bit more testing on this version before releasing it, since this is the first change I've made to the actual computations.
____________
My lucky number is 75898^524288+1 |
|
|
|
|
|
Noticed a bunch more workunits. Just wondering what these were testing and what we needed to look for with these. thanks |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Noticed a bunch more workunits. Just wondering what these were testing and what we needed to look for with these. thanks
It looks like just more of the "real" WUs at N=262144. Mostly, at this point, the purpose is to do lots of testing to shake any other potential bugs out of the woodwork.
____________
My lucky number is 75898^524288+1 |
|
|
|
|
Noticed a bunch more workunits. Just wondering what these were testing and what we needed to look for with these. thanks
It looks like just more of the "real" WUs at N=262144. Mostly, at this point, the purpose is to do lots of testing to shake any other potential bugs out of the woodwork.
Thanks.. so one more question: is the beta now available to the general public? I was thinking no, since there weren't a lot of comments or questions flying around, but I just wondered. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Thanks.. so one more question: is the beta now available to the general public? I was thinking no, since there weren't a lot of comments or questions flying around, but I just wondered.
It's not released to BOINC yet. I put it out mostly so Shoichiro could get the source code for that b^N calculation.
I haven't started to thoroughly test it yet, and I'll want to do that. I'm not even sure yet exactly what kind of testing I want to do with it. It will be going where no Genefer has ever gone before, and if it's doing something wrong it might be very hard to know something is amiss.
I can't stop anyone from grabbing it and running it with either PRPNet or BOINC (you would need an app_info.xml, but that's not hard), but I haven't done that myself yet, so you could get to be the pioneer. It seems to work fine as a standalone app.
____________
My lucky number is 75898^524288+1 |
|
|
|
|
|
Is it just me or have others seen a drop in runtimes... mine have gone from 4880 seconds down to 3212 seconds |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Is it just me or have others seen a drop in runtimes... mine have gone from 4880 seconds down to 3212 seconds
Just you. ;-)
The short WU was a re-issue of an older, shorter WU that needed to be sent to another computer.
____________
My lucky number is 75898^524288+1 |
|
|
|
|
Is it just me or have others seen a drop in runtimes... mine have gone from 4880 seconds down to 3212 seconds
Just you. ;-)
The short WU was a re-issue of an older, shorter WU that needed to be sent to another computer.
Dang lol - thanks for letting me know |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Thanks.. so one more question: is the beta now available to the general public? I was thinking no, since there weren't a lot of comments or questions flying around, but I just wondered.
As it turns out, the code I wrote stops working correctly at a very low b value at N=4194304. So it's back to the drawing board.
This is why we do testing. :)
____________
My lucky number is 75898^524288+1 |
|
|
|
|
|
Something I do find interesting: even though there's no recorded credit other than what each unit shows, it has affected my overall RAC. I had to laugh because all those 0.01 credits really added up.
____________
@AggieThePew
|
|
|
|
|
|
Because I don't want to get thousands of answers I will ask here. Opinions on the best brand and model of a gtx 570. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Because I don't want to get thousands of answers I will ask here. Opinions on the best brand and model of a gtx 570.
I'm partial to EVGA myself, but that's not a very strong preference. My GTX 460 isn't EVGA; it was on sale. :)
DO make sure that whatever you buy has a lifetime warranty. That way a dead card == (usually) a free upgrade if enough time has passed so they don't have your old card in stock anymore.
My dead 280 got RMA'd with a 470. (That's an upgrade as my 460 crunches better than my 280 did).
Also, most manufacturers require that in order to enable the lifetime warranty, you much register the card. So don't forget to register.
____________
My lucky number is 75898^524288+1 |
|
|
|
|
|
Thanks Mike. I have an evga and I like it a lot. |
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2259 ID: 1178 Credit: 11,068,521,226 RAC: 11,429,379
                                        
|
|
I have both EVGA and ASUS cards. I like them both. My preference between the two is ASUS. Both have been good quality and durable for me, but the ASUS cards have always run a bit cooler in my machines. (My EVGA cards are a GTX 550 Ti and a 9800 GTX; my ASUS cards are an 8800GS, 9600GSO, GTS 450, and GTX 460-768.)
____________
141941*2^4299438-1 is prime!
|
|
|
|
|
Because I don't want to get thousands of answers I will ask here. Opinions on the best brand and model of a gtx 570.
I used a Palit for half a year and liked it a lot. It overclocked perfectly and ran stably crunching GCWSieve at 950/324, with temps around 54-56 Celsius. It's a good deal for the money. |
|
|
|
|
|
Things had gotten pretty messed up overnight here. I've been running some TRP sieves for a couple of days, for which the runtime estimates were way off. The first ones I got had estimates of 14 days, even though they only take about 2 hours.
Before turning in last night, I had my buffer set to 0.5 days. When I woke up this morning I had about 130 genefer tasks with deadlines of 24h and estimated runtimes of 3 or 4 minutes. Now, my GTX 570 is not slow, but it does currently take close to an hour (3200-3300 seconds) per unit, i.e. finishing more than 28 in a day is near impossible.
If I remember correctly, this is because BOINC doesn't understand that different subprojects can have different WU lengths, and it messes up the runtime estimates because of it... Right? If not, could someone have a look at it?
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
|
|
|
|
|
|
I have changed the ETA, but it will not take effect until the next time we load more work.
Lennart |
|
|
|
|
Things had gotten pretty messed up overnight here. I've been running some TRP sieves for a couple of days, for which the runtime estimates were way off. The first ones I got had estimates of 14 days, even though they only take about 2 hours.
Before turning in last night, I had my buffer set to 0.5 days. When I woke up this morning I had about 130 genefer tasks with deadlines of 24h and estimated runtimes of 3 or 4 minutes. Now, my GTX 570 is not slow, but it does currently take close to an hour (3200-3300 seconds) per unit, i.e. finishing more than 28 in a day is near impossible.
If I remember correctly, this is because BOINC doesn't understand that different subprojects can have different WU lengths, and it messes up the runtime estimates because of it... Right? If not, could someone have a look at it?
That's the reason why I set the buffer to 0.01. These units really mess up the estimates. I had to cancel a lot of units last weekend because of the faulty runtimes.
____________
|
|
|
|
|
|
Just looked, and I now have a few invalid units, so does that mean more testing is being done on the validator? |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Just looked and I now have a few invalid units so that means more testing is being done on the validator?
I've got a lot more validated WUs than I did before, so it sure looks as if the validator is, well, validating.
____________
My lucky number is 75898^524288+1 |
|
|
|
|
|
Well, I know I had a lot of invalids due to the o/c issue before I caught it, but that's the first time I'd seen them start to show up.
Great, it sounds like this project is moving along :) |
|
|
|
|
|
Just suspended a unit because I wanted my GPU completely at my own disposal. Doing so completely messed up Windows for a minute, though. The screen turned black and unresponsive. After that minute it responded again. I haven't seen this behavior before with genefer or other GPU apps.
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Just suspended a unit because I wanted my GPU completely at my own disposal. Doing so completely messed up windows for a minute though. Screen turned black and unresponsive. After that minute it responded again. Haven't seen this behavior before with genefer or other gpu-apps.
That's a possibility. I've seen that happen once.
What most likely happened is that BOINC shut genefer down right in the middle of using the GPU -- which is actually pretty likely, considering that genefer is constantly using the GPU. However, there appear to be times when shutting genefer down breaks the video driver. That doesn't happen often.
As you observed, Windows restarts the video driver, so you get that short interval while everything is blanked out, then things go back to normal.
BOINC has a mechanism to prevent that, but when I tried using it, it prevented genefer from ever shutting down. After looking at the BOINC client's source code, I concluded that there's a fundamental flaw in the way that mechanism was built -- or, more specifically, the way it's designed -- and that it could not possibly ever work. It behaves exactly as I would expect it to, based on what I saw in the source code.
So the viable choices are an app that occasionally kills the video driver (which gets reset by Windows), or an app that BOINC SAYS it suspended but which keeps on running. The once-in-a-while killing of the video driver seemed like the better choice.
For those interested in the technical details:
The Boinc API has calls that enter a "critical section" during which Boinc will not terminate your process. They give a specific example for using this with GPU kernels:
Enter critical section
<< execute GPU kernel >>
Leave critical section
(process may get terminated here)
Enter critical section
<< execute GPU kernel >>
Leave critical section
...repeat...
Simple enough, but the way boinc works, its attempt to kill the app isn't remembered if it doesn't succeed; it appears that boinc merely tries again, once per second, to kill the app. The odds of an attempt happening during the tiny fraction of a second between kernels are astronomically tiny. As a result, the critical section code causes a bigger problem than the one it solves.
____________
My lucky number is 75898524288+1 |
|
|
John Honorary cruncher
 Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
|
geneferCUDA-boinc.1.03a has been released in BOINC as 1.03 (cuda32_13). Installation time: 12 Jan 2012 | 8:51:30 UTC
____________
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
|
Killing the video driver is neither a solution nor a workaround. It also only works on newer Windows versions, and maybe Linux.
On WinXP the GPU driver runs entirely in kernel space, and rendering the desktop is done there as well.
This behaviour changed with Vista, as a result of the mass of bluescreens caused by too much GPU overclocking or simply poor driver quality. Since then the GPU driver has been split into two parts: the rudimentary functions (GPU detection, registration in the system, supported features, amount of VRAM, etc.) are managed in kernel space, while everything else, like rendering or clock-rate tuning, runs in user space. Moving the rest of the driver into user space makes a big difference: anything in user space can be restarted without a reboot.
My suggestion would be to subscribe to the dev mailing list and write a mail about this problem, because posting in the dev forum only has an effect if the devs read it regularly.
Don't be surprised if this takes much longer than expected. Some devs are more interested in developing a new credit scheme or programming new features than in fixing some really old bugs.
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Killing the video driver is neither a solution nor a workaround. It also only works on newer Windows versions, and maybe Linux.
On WinXP the GPU driver runs entirely in kernel space, and rendering the desktop is done there as well.
This behaviour changed with Vista, as a result of the mass of bluescreens caused by too much GPU overclocking or simply poor driver quality. Since then the GPU driver has been split into two parts: the rudimentary functions (GPU detection, registration in the system, supported features, amount of VRAM, etc.) are managed in kernel space, while everything else, like rendering or clock-rate tuning, runs in user space. Moving the rest of the driver into user space makes a big difference: anything in user space can be restarted without a reboot.
My suggestion would be to subscribe to the dev mailing list and write a mail about this problem, because posting in the dev forum only has an effect if the devs read it regularly.
Don't be surprised if this takes much longer than expected. Some devs are more interested in developing a new credit scheme or programming new features than in fixing some really old bugs.
Makes sense. I remember when they changed the driver behavior for Vista. So it's not a problem for me, but it is a problem for some. I may have thought of a way of doing this that might work. I'll give it a try. If it works I'd rather do that than wait on the Boinc devs to fix a bug. Their priorities are their priorities and obviously can't match the priorities of the hundred or so individual projects.
As for Linux, I'm not certain if the critical section code works the same as it does in Windows. A Linux build of Genefer might work correctly with the boinc critical section calls uncommented.
Sorry, John, if I can make a fix for this it would mean another release! I know I promised no more for a while. ;-)
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
|
I think I have that working right now.
Grrrrrrrrr. Boinc will drive me nuts, I swear.
They provide two mechanisms to alter the default handling of suspending or aborting a task.
They give you critical sections, which let you prevent your task from being killed at a bad time.
They also give you the ability to poll for an abort command, and handle the abort request yourself rather than having boinc kill the task from outside.
The documentation explicitly says to use both methods together with CUDA programs; you're supposed to protect the CUDA calls with a critical section and then poll for the abort flag between kernels.
Critical sections would work, but not with CUDA where you're in a critical section all the time.
Polling will work.
The two of them together -- as recommended in the documentation -- cannot possibly work.
Ignoring the critical sections and doing just polling seems to work perfectly. It also has the benefit of re-enabling the checkpoint-before-suspend code that I had in there before.
It seems to be working pretty well.
I'm going to make it available, along with an app_info file, for anyone who wants to test this. As per Ronald's comments, it is recommended if you are using a version of Windows prior to Vista. It's not so terrible for Vista or Win 7 either, as it will checkpoint immediately before suspending itself.
Note that this app_info file ONLY contains the information for Genefer. No other PrimeGrid project will work unless you modify app_info.
Turning on app_info will likely kill all WUs you currently have in progress. You probably want to let everything finish before installing app_info.
Download here: GeneferCuda v1.04 beta
Unzip the two files in there into your (boinc-data-directory)/projects/www.primegrid.com directory.
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
Michael Goetz,
http://www.primegrid.com/result.php?resultid=335875952
____________
|
|
|
|
|
|
Surprise surprise surprise - big chunks of validations just came through along with big chunks of credits. Guess the validation testing is going well and the unexpected credit is a nice present even if xmas is gone.
____________
@AggieThePew
|
|
|
|
|
|
Yeah good going on the validation and increase of credits :)
I have question on the credits though. Is the 3600 a fixed value or is it simply maxing out at that value? Guessing the first, but confirmation would be nice.
Will the project also be added to the overview on your account page?
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
|
|
|
|
|
|
Hey Pyrus, did you notice that some of the wu's got a credit of 1 and others of 3600... wonder what made that difference? The units looked the same to me... so to speak |
|
|
|
|
|
Just went to check. I have two of those, but can't find any good explanation either.
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
|
|
|
|
|
|
would rather they all had a 3600 credit but hey that's what testing is for and I wasn't even expecting credit to begin with. |
|
|
|
|
|
Michael Goetz,
http://www.primegrid.com/result.php?resultid=334658038
http://www.primegrid.com/result.php?resultid=334658050
Maybe something wrong with logging?
____________
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
Michael Goetz,
http://www.primegrid.com/result.php?resultid=334658038
http://www.primegrid.com/result.php?resultid=334658050
Maybe something wrong with logging?
Are you doing beta tests with BOINC 7.0.7?
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Michael Goetz,
http://www.primegrid.com/result.php?resultid=335875952
It would help a LOT if your computers weren't hidden. It's not at all obvious what is happening with your WUs, and the more information I have, the better the chances of figuring out what's happening.
____________
My lucky number is 75898524288+1 |
|
|
|
|
Michael Goetz,
http://www.primegrid.com/result.php?resultid=334658038
http://www.primegrid.com/result.php?resultid=334658050
Maybe something wrong with logging?
Are you doing beta tests with BOINC 7.0.7?
yes
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
|
x3mEn [Kyiv] wrote:
Michael Goetz,
http://www.primegrid.com/result.php?resultid=335875952
This is where I stand with your WUs:
1) Some of your WUs are terminating early, with no indication of why.
One possibility is that you're hitting "suspend" or "abort", or the boinc client is doing that automatically for any of a dozen of reasons, and the 7.0.7 version of boinc doesn't behave the way the stable release version does. That could be an error in boinc, or it could be a design feature; I haven't looked at 7.0.7.
Boinc should NOT be able to kill Genefer; it's supposed to ask Genefer to kill itself. That's not happening here for some reason.
I'm inclined to think that's what's happening because I don't see this behavior on any other computer. If you can, could you post the boinc log showing the termination of the erroneous genefer WU? Maybe there's some information there that would be useful. One thing that you could try is to revert to 6.12.34 and see if the problem goes away.
Then again, maybe it's that flash game again.
Another possibility is that something other than Boinc is killing genefer, such as an antivirus program. Or it's being killed manually via task manager. Or something else.
Yet another possibility is that genefer is just encountering some internal error that it can't handle. Genefer is designed to detect any possible errors and report them before terminating itself; that's not happening here. The only thing it can't detect is if the CUDA DLLs are missing, but that's not what's happening.
2) Those errored WUs are showing up as VALID. I don't know how that's possible; the client should be marking it as a computation error if genefer died unexpectedly, and furthermore, the validator can't be seeing the output file from genefer. So there are several things wrong there. Rytis is looking at the validator problem. The part about it being considered a successful return might be due to 7.0.7 again. It's hard to say.
So that's what I know, and what I don't know. With the information I have, there's nothing I can think of to explain the behavior other than it being related to 7.0.7. Is there anything environmental, such as something running on the computer, user actions, or anything else that might shed some light on what's happening? Are the aborts happening spontaneously by themselves, or is it happening when you hit Suspend or take some other action?
I need more information to dig into this further. The output from this workunit shouldn't be possible.
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
Michael Goetz,
http://www.primegrid.com/result.php?resultid=335875952
http://www.primegrid.com/result.php?resultid=334658038
http://www.primegrid.com/result.php?resultid=334658050
Those errored WUs are showing up as VALID. I don't know how that's possible
all these 3 WUs finished absolutely correct without any errors.
You can see that 334658038 even has correct log ending:
...
Testing 515132^262144+1... 65536 steps to go
515132^262144+1 is a probable composite. (RES=3d7dfc1f8c6f2249) (1497346 digits) (err = 0.1094) (time = 1:19:15) 21:47:46
21:47:46 (5608): called boinc_finish
I'm inclined to think that's logging problem with 2 others WUs.
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Michael Goetz,
http://www.primegrid.com/result.php?resultid=335875952
http://www.primegrid.com/result.php?resultid=334658038
http://www.primegrid.com/result.php?resultid=334658050
Those errored WUs are showing up as VALID. I don't know how that's possible
all these 3 WUs finished absolutely correct without any errors.
I can see that Boinc says it finished correctly. The log says it didn't finish. One possibility is the log is not getting flushed at the end.
That would be a problem in 7.0.7 -- and it would also nicely explain what we're seeing. I'll know better once Rytis checks the validator. If there was a valid output file, then it's just the log that's not getting flushed by boinc. I can try explicitly flushing the log before terminating.
I'm inclined to think that's logging problem with 2 others WUs.
I think you're right.
I put 1.04 beta2 up on the website. Give it a try and see if it fixes the logging problem with 7.0.7.
(At the moment my GPU is busy and I don't want to interrupt it, so I can't test beta2 myself. It ought to work the same as the first beta except for the 7.0.7 logging.)
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
Michael Goetz,
http://www.primegrid.com/result.php?resultid=336054775
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Michael Goetz,
http://www.primegrid.com/result.php?resultid=336054775
Could you try suspending and resuming a WU? It would be useful to see if that works.
Thanks
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
It works.
As you can see for example here:
http://www.primegrid.com/result.php?resultid=334658038
Are you interested how it works with 1.04 beta 2 ?
Or you want to test logging with 1.04 beta 2 when suspend and resume a WU?
Oh... It will be much better to test on short WUs...
1h 20min for every test - it's quite a lot.
FYI:
1. I upgraded to Boinc 7.0.8
2. I was playing flash game during the last test and... nothing bad has happened :)
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
It works.
As you can see for example here:
http://www.primegrid.com/result.php?resultid=334658038
Are you interested how it works with 1.04 beta 2 ?
Or you want to test logging with 1.04 beta 2 when suspend and resume a WU?
Yes, I'd like to know how beta2 does with suspend/resume with 7.0.8 if it's not too much trouble.
Oh... It will be much better to test on short WUs...
1h 20min for every 1 test - it's quite a lot.
That's why we started with those little 30 second WUs. :)
FYI:
1. I upgraded to Boinc 7.0.8
2. I was playing flash game during the last test and... nothing bad has happened :)
Awesome.
____________
My lucky number is 75898524288+1 |
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
|
Mike,
FYI: NOTE: Some previously experimental features in the client software are now important to have for the 7.0 release. The existing 7.0 branch will no longer be maintained, we'll create a new branch for the 7.0 client line at a later date. Change Log people, changes will be back in Trunk.
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
|
|
|
Michael Goetz,
http://www.primegrid.com/result.php?resultid=336118801
There were 3 suspension points:
1st: I suspended and resumed.
2nd: I suspended, shut down Boinc, started Boinc up again, and resumed.
3rd: I shut down Boinc (which suspended the task) and started Boinc up again (which resumed it).
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Michael Goetz,
http://www.primegrid.com/result.php?resultid=336118801
There were 3 suspension points:
1st: I suspended and resumed.
2nd: I suspended, shut down Boinc, started Boinc up again, and resumed.
3rd: I shut down Boinc (which suspended the task) and started Boinc up again (which resumed it).
Looks good, thanks! Hopefully this fixed the problem. Let me know if you get any more WUs with truncated logs.
Mike
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
Quick question. I just noticed in the preferences that the cpu option of the geneferCuda app has been added. Does that mean that a wu will try to download for a cpu if checked? Was just wondering cause I didn't want a cpu job to run. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Quick question. I just noticed in the preferences that the cpu option of the geneferCuda app has been added. Does that mean that a wu will try to download for a cpu if checked? Was just wondering cause I didn't want a cpu job to run.
That's news to me. There's no boxes there at all when I look at it.
____________
My lucky number is 75898524288+1 |
|
|
|
|
Quick question. I just noticed in the preferences that the cpu option of the geneferCuda app has been added. Does that mean that a wu will try to download for a cpu if checked? Was just wondering cause I didn't want a cpu job to run.
That's news to me. There's no boxes there at all when I look at it.
You must pm John and ask for it (see first post in this thread). |
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2259 ID: 1178 Credit: 11,068,521,226 RAC: 11,429,379
                                        
|
Quick question. I just noticed in the preferences that the cpu option of the geneferCuda app has been added. Does that mean that a wu will try to download for a cpu if checked? Was just wondering cause I didn't want a cpu job to run.
That's news to me. There's no boxes there at all when I look at it.
You must pm John and ask for it (see first post in this thread).
The GPU box has been there for some time now, but a CPU check box is now available when one edits preferences (Mike, maybe you are working with cached pages?). I haven't checked the box, but if someone has, are you getting CPU work?
____________
141941*2^4299438-1 is prime!
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Quick question. I just noticed in the preferences that the cpu option of the geneferCuda app has been added. Does that mean that a wu will try to download for a cpu if checked? Was just wondering cause I didn't want a cpu job to run.
That's news to me. There's no boxes there at all when I look at it.
You must pm John and ask for it (see first post in this thread).
The GPU box has been there for some time now, but a CPU check box is now available when one edits preferences (Mike, maybe you are working with cached pages?). I haven't checked the box, but if someone has, are you getting CPU work?
If I had to guess, it has something to do with my "volunteer developer" status. I know it didn't used to say that, and I also know I used to have a box I could check for the genefer project.
Now both of those are no longer true, so perhaps they are related.
Regardless of what it all means, I don't know what makes the boxes appear or not on the website. Or I guess I do know -- boxes appear on the website because Rytis makes them appear. ;-)
____________
My lucky number is 75898524288+1 |
|
|
John Honorary cruncher
 Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
Quick question. I just noticed in the preferences that the cpu option of the geneferCuda app has been added. Does that mean that a wu will try to download for a cpu if checked? Was just wondering cause I didn't want a cpu job to run.
That's news to me. There's no boxes there at all when I look at it.
You must pm John and ask for it (see first post in this thread).
The GPU box has been there for some time now, but a CPU check box is now available when one edits preferences (Mike, maybe you are working with cached pages?). I haven't checked the box, but if someone has, are you getting CPU work?
Yes, the CPU checkbox has been there since: 13 Jan 2012 | 12:11:00 UTC. A quick look at the Applications page will show that a "Mac OS 10.5+ running on an Intel 64-bit CPU" build is available.
The Generalized Fermat Prime Search project has 4 flavors of Genefer available: Genefer80, Genefer, GenefX64, and GeneferCUDA. Currently available in BOINC are GenefX64 (courtesy of Iain) for the CPU and GeneferCUDA (courtesy of Mike) for the GPU.
The MacIntel CPU GenefX64 app is testing well and validating against the GPU tasks, as evidenced by this host: http://www.primegrid.com/results.php?hostid=190430. We hope to have other builds available soon.
____________
|
|
|
|
|
|
msi gtx 570 superclock - out of box at 1590. finished suspended wu from a 460 with no apparent errors. gonna leave it at out of box settings for now to see if it has any issues.
so what's the "normal" runtimes for a 570 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
msi gtx 570 superclock - out of box at 1590. finished suspended wu from a 460 with no apparent errors. gonna leave it at out of box settings for now to see if it has any issues.
so what's the "normal" runtimes for a 570
Here are some sample times in seconds from some 570s out there for the current WUs:
3170
3159
3236
3020
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
Thanks Mike.. will compare to these |
|
|
|
|
|
Thanks for adding me to the project, John.
My machine has been trying to download WUs since I selected the CPU and CUDA options, but I constantly get only a "No work" message - which can't be right, when I see that there are over 1,000 WUs for GFPS available.
What can be wrong? At first I thought it could be a client problem and that it only works with the version 7 client, like Volpex, so I downloaded 7.0.8, but there is no difference. The graphics card is a GTX 260, which should work...
Do I have to do something else?
____________
Life is Science, and Science rules. To the universe and beyond
Proud member of BOINC@Heidelberg
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Thanks for adding me to the project, John.
My machine keeps trying to download WUs since I selected CPU and CUDA option, but I constantly get only a "No work" message since then - which can't be, when I see that there are over 1.000 WUs for GFPS available.
What can be wrong? Thought it could be a client problem first and that it only works with the 7 version like Volpex, so I downloaded 7.0.8, but there is no difference. Graphic card GTX 260, should work...
Do I have to do something else?
First of all, don't use the version 7 beta client if you don't have to. The released version of genefer does not work correctly with 7, so you'll need to use a beta version of genefer and app_info to get it working 100% correctly.
My guess is that there's a configuration problem on the server. I noticed in testing that none of the computers that received genefer WUs had GTX200 series GPUs, even though some of them, including yours, ARE capable of running genefer.
So I think the server isn't configured to send WUs to them for some reason. The GTX 260 is CC 1.3 and is supposed to be able to get these WUs. I think it's something Rytis will need to look at.
____________
My lucky number is 75898524288+1 |
|
|
|
|
First of all, don't use the version 7 beta client if you don't have to. The released version of genefer does not work correctly with 7, so you'll need to use a beta version of genefer and app_info to get it working 100% correctly.
Okay, no problem, then I change back to the previous one. Don't like the 7-version anyway. ;-)
So I think the server isn't configured to send WUs to them for some reason. The GTX 260 is CC 1.3 and is supposed to be able to get these WUs. I think it's something Rytis will need to look at.
Hm, that's bad. So I have to wait 'til Rytis fixed that...
Thanks for the info anyway.
____________
Life is Science, and Science rules. To the universe and beyond
Proud member of BOINC@Heidelberg
|
|
|
RytisVolunteer moderator Project administrator
 Send message
Joined: 22 Jun 05 Posts: 2651 ID: 1 Credit: 60,632,486 RAC: 119,580
                     
|
|
Please let me know the GPU initialization line from BOINC event log (near the top, similar to 2012.01.18 01:16:04 | | NVIDIA GPU 0: NVS 3100M (driver version 28562, CUDA version 4010, compute capability 1.2, 512MB, 47 GFLOPS peak)). Something is mismatching between these numbers and the plan class I've set up.
____________
|
|
|
|
|
|
Okay, here it is:
17.01.2012 20:10:50 | | NVIDIA GPU 0: GeForce GTX 260 (driver version 285.62, CUDA version 4.10, compute capability 1.3, 896MB, 744MB available, 715 GFLOPS peak)
Probably next line could be helpful too:
17.01.2012 20:10:50 | | OpenCL: NVIDIA GPU 0: GeForce GTX 260 (driver version 285.62, device version OpenCL 1.0 CUDA, 896MB)
____________
Life is Science, and Science rules. To the universe and beyond
Proud member of BOINC@Heidelberg
|
|
|
RytisVolunteer moderator Project administrator
 Send message
Joined: 22 Jun 05 Posts: 2651 ID: 1 Credit: 60,632,486 RAC: 119,580
                     
|
|
I see absolutely no reason why it doesn't get work. Can you send me (admin@primegrid.com) C:\programdata\boinc\sched_request_www.primegrid.com.xml after attempting to get work?
____________
|
|
|
|
|
|
Okay, done.
Unfortunately I still haven't downgraded to the previous client version yet, will try to do this now. Maybe it already helps... ;-)
____________
Life is Science, and Science rules. To the universe and beyond
Proud member of BOINC@Heidelberg
|
|
|
|
|
Okay, done.
Unfortunately I still haven't downgraded to the previous client version yet, will try to do this now. Maybe it already helps... ;-)
Note: A downgrade from 7 to <6.13.3 is impossible due to changes in client_state.xml. |
|
|
|
|
|
Umph, great, just noticed that already... -_-
Got 6.13.12 now, but no change obviously. ;-)
____________
Life is Science, and Science rules. To the universe and beyond
Proud member of BOINC@Heidelberg
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Umph, great, just noticed that already... -_-
Got 6.13.12 now, but no change obviously. ;-)
You can always just un-install and re-install from scratch.
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
Just a note on overclocking with a 570.
It looks like maybe the 570 is a much more stable card than my old 460. I'm slowly increasing my o/c on the 570 just to see how it works with genefer. I've been able to raise the shaders from 1520 to 1720 but kept the core linked as I increased shaders. I also unlocked the voltage and raised it to +100 mv. I also raised the memory from stock to 2050. So far the work units are showing completed. I'm also running a mix of cw and genefer.
So, this is just a hunch, but taking into account comments from before, I'm thinking that some of the o/c issues may have been caused by the core/memory not being raised when the shaders were, since the sieves don't seem to care and people were trying to save temps. This of course is just a wild ass guess. One more observation: the cpu times have increased quite a bit as the run time of the work unit has decreased.
All the work Mike's done I KNOW has helped. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Just a note on overclocking with a 570.
It looks like maybe the 570 is a much more stable card than my old 460. I'm slowly increasing my o/c on the 570 just to see how it works with genefer. I've been able to raise the shaders from 1520 to 1720 but kept the core linked as I increased shaders. I also unlocked the voltage and raised it to +100 mv. I also raised the memory from stock to 2050. So far the work units are showing completed. I'm also running a mix of cw and genefer.
So, this is just a hunch, but taking into account comments from before, I'm thinking that some of the o/c issues may have been caused by the core/memory not being raised when the shaders were, since the sieves don't seem to care and people were trying to save temps. This of course is just a wild ass guess. One more observation: the cpu times have increased quite a bit as the run time of the work unit has decreased.
All the work Mike's done I KNOW has helped.
OVERCLOCKING
(This is NOT directed at you specifically. Just general comments to the community at large...)
General comment about overclocking: This isn't a game. Ok, that sounds very patronizing, but hear me out.
In its normal use of creating impressive visualizations of just how awesome you are at fragging your opponents, the GPU doesn't have to work 100% correctly. A few wrong bits here and there, or even sometimes an instruction that doesn't work right won't wreck the game. A bad instruction might crash the program, but some bad bits in the display won't be noticeable.
At worst, the game crashes, you lower the clocks a little bit, and hit reset.
Sometimes the game crashed because you set the clock rates under circumstances that have changed. It's warmer today, for example, so the card is a few degrees hotter than when you did your OC tests. Today, things are crashing, but they didn't yesterday.
We all know that crunching is more sensitive than playing games. But the part about the environment changing the reliability of the GPU is the same for games and crunching. If the room is warmer, or the heatsinks have a little more dust on them, the GPU might not run reliably at the same speeds as it used to.
When we get going full steam, WUs are going to take about 4 to 6 DAYS on your 570. That's an extrapolation based on the one known time that I have: they take 8 days on my 460.
I will not OC at all on these WUs. I do not want to lose a WU after an entire week of crunching. I'm going to be extremely cautious. I'd rather be running 10% or 20% slower but never get an error on a WU. The stakes (and the rewards) are too high.
So, getting down from the soapbox, my recommendation is that if you OC, please do so very conservatively.
MEMORY OC vs. SHADER OC
Regarding your comments about memory OC: Anyone who has run this and has a tool to measure video ram usage knows that Genefer uses about 400K to 500K of video ram. You would think, therefore, that memory clock settings would be more significant than with the sieves.
The truth is I don't know if it's true or not. Certainly, Genefer does use a lot of memory. However, one of the methods for making CUDA code run fast is to NOT use the video ram often. Yes, a large chunk of it is used, but the software attempts to use other memory on the card rather than the video memory. By comparison, the video memory (called "global memory" in CUDA parlance) is very slow. So an attempt is made to access the slow global memory as little as possible.
The bottom line is I think memory overclocking will help Genefer a lot more than it helps the sieves, but I've never tested it.
On the other hand, in my opinion, the idea about a mismatch between memory and core/shader clocks causing errors is unlikely to be correct. Ram is always slower than the CPU, and all computer designs since the 1970s incorporate mechanisms for the CPU to wait for its slow memory. I don't think, therefore, that the memory clock rate NOT being raised is causing errors. If anything, it's simply that the core/shader clocks are too high. It seems unlikely that raising the memory clock would make things better -- if anything, it should make things worse.
CPU TIME
As for CPU time, my guess is that a WU will take X number of CPU seconds regardless of the clock rate on the GPU. The OC doesn't affect the work that the CPU has to do. So, if CPU time is being measured accurately (hint: it's not), if you increase the GPU clocks, elapsed time will decrease and CPU time will be unchanged.
However, measuring CPU time is a lot like trying to count the number of mosquitoes in a moving swarm. The best you can do is take a good guess. So if you see the CPU time change, most likely it's a measurement error because you changed the pattern of CPU usage, rather than a real change in the amount of CPU time.
CPU time "measurements" are actually statistical samples. Certain CPU usage patterns could cause significant discrepancies between actual usage and the samples. In particular, the "only use the CPU once in a while" pattern that you see with a GPU program is particularly prone to measuring CPU usage incorrectly.
Bottom line is I wouldn't put any faith at all in the CPU time measurements for a GPU app. Actual CPU usage could go up while the reported CPU usage goes down, for example.
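Mike's mosquito analogy can be made concrete with a toy simulation (all numbers are invented; this only illustrates why tick-based sampling misses short CPU bursts, not how Windows or BOINC actually account time):

```python
# Toy model of sampled CPU-time accounting (illustrative only; the
# numbers are made up and do not come from BOINC or Windows internals).
# A sampler checks every TICK ms which process is on the CPU and credits
# the whole tick to it. A GPU-feeder that runs in sub-tick bursts can be
# credited far less (or far more) CPU time than it actually used.

TICK = 10  # sampler period in ms

def sampled_cpu_time(bursts, total_ms, tick=TICK):
    """Credit a full tick whenever a burst covers the sampling instant."""
    credited = 0
    for t in range(0, total_ms, tick):
        if any(start <= t < start + length for start, length in bursts):
            credited += tick
    return credited

# 100 bursts of 2 ms each, each starting just AFTER a sampling instant:
bursts = [(t + 1, 2) for t in range(0, 1000, 10)]
actual = sum(length for _, length in bursts)  # 200 ms really used
measured = sampled_cpu_time(bursts, 1000)     # 0 ms credited

print(actual, measured)  # -> 200 0
```

Shift the same bursts to land on the sampling instants and the sampler credits five times the real usage instead of zero: same actual work, wildly different "measurement".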
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
Sounds pretty in depth to me. Like I said mine was just a guess based on how my card was acting versus the issues reported. It's still too early in my experimenting to tell for sure and I'm not going to make any drastic change to the card. I'm going to continue a gradual increase for a bit and see what happens.
However, on the CPU time question: the reported run time and the reported CPU times are not accurate? Or, more specifically, the reported CPU time? That kind of bites. |
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2259 ID: 1178 Credit: 11,068,521,226 RAC: 11,429,379
                                        
|
|
Just a couple of comments to add to Mike's response:
1) The core and shader clocks are separately adjustable on only pre-Fermi cards (i.e., 300 series and earlier...though note that a strange little OEM card--the GeForce 405--is just a rebranded 310/210). For the Genefer app, that means this issue only applies to the GTX 2xx series cards. On all Fermi cards, the core and shader clocks will OC in unison (with a 2 to 1 shader to core clock ratio). BTW, for those ATI card users (should we be able to use them with Genefer at some point in the future), core and stream processor clocks are matched in a 1-to-1 ratio.
2) Overclocking of shaders only happens in 54 clock cycle increments. For example, if I use MSI Afterburner or something similar to set shader clocks at 1600, the actual clocks match a 54 point increment and would be set at 1566. The next meaningful increase would not be until one hits 1620 (i.e., there is no difference between setting shader clocks to 1580, 1590, 1600, or 1610...they are all effectively 1566). See here for a nice short explanation of this.
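Scott's 54-point steps are easy to sketch as arithmetic (a sketch assuming the grid starts at zero and rounds down, which matches his 1566/1620 examples but may not hold for every card or BIOS):

```python
# Snap a requested shader clock to the 54-point grid described above.
# Assumption: the grid starts at 0 and the driver rounds down; real
# cards/BIOSes may anchor the steps differently.

STEP = 54

def effective_shader_clock(requested_mhz, step=STEP):
    """Round the requested clock down to the nearest multiple of `step`."""
    return (requested_mhz // step) * step

# 1580, 1590, 1600, and 1610 all land on the same effective clock:
print({mhz: effective_shader_clock(mhz) for mhz in (1580, 1600, 1610, 1620)})
```

Under this assumption, nothing changes until the requested clock crosses the next multiple of 54 (1566, 1620, 1674, ...).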
____________
141941*2^4299438-1 is prime!
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Sounds pretty in depth to me. Like I said mine was just a guess based on how my card was acting versus the issues reported. It's still too early in my experimenting to tell for sure and I'm not going to make any drastic change to the card. I'm going to continue a gradual increase for a bit and see what happens.
However, on the cpu time question. The reported run time and the reported cpu times are not accurate? or more specifically the reported cpu time? that kind of bites.
For many GPU tasks -- and ONLY GPU tasks -- the reported CPU time might not be very accurate. This only applies to GPU tasks like ours that use a very small amount of CPU time. If a significant portion of a full CPU core is being used, the measurement should be fairly accurate.
All CPU tasks should have a fairly accurate CPU time measurement.
Elapsed time, i.e., "wall clock time", should always be accurate.
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
Here's a funky unit.... http://www.primegrid.com/workunit.php?wuid=239667396
Two have been validated and the third is still on inconclusive.
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
|
|
|
|
|
|
Hey Mike,
To start out I have a Phenom II x6 1100T, 8GB Mem, MSI Twin Frozr II/OC GTX460. You can see the rest of stats/info on 'my computers'. Computer ID 217242
After testing for the last couple days I have some issues and observations to report. First which has nothing to do with you is the estimated task size (#GFLOPS) set for the WUs. They really mess things up in BOINC. Had to abort a bunch of WUs.
Well, the main problem I'm having is the screen lag. Firefox is barely usable while GeneferCUDA is running, as is the Windows task bar. VLC runs Xvid files OK (except when moving the mouse) but not mkv files. YouTube vids will run in Firefox once you get past the VERY sluggish mouse pointer movement. Flash objects mostly work.
I've experimented with suspending and restarting, over and over, with different progs and settings, and nothing seems to faze the WUs and app. So far the Firefox and task bar lags (the mouse pointer is the worst, and it sporadically happens in other progs) are the only major problems. Yes, I always have a lot of tabs open (50-150), but they're in a few different groups and I suspend the groups not currently in use (75% of tabs or so). I know that Firefox is a memory hog, but I have 8GB and rarely go above 4GB overall.
I've tried different clocks on my GPU (stock overclock 751 Core/1502 Shaders, 1800 Mem). I find the best combo of more speed/less screen lag is 1900 Shaders/1900 Mem. As you can see, I can increase the Shaders a lot, but the memory only a little. The unstable point is 2020 Shaders and 2050 Memory, where occasional errors/driver crashes begin to happen (I can go to 2070 Shaders on PPS sieves). What's strange about my card is that if I go below 1250 Memory, the screen will lock up after a few minutes to a few hours no matter what's going on (or not (idle)), even at stock Shader settings.
Usage is always 98-99%. I see that the app runs at normal priority when it should be at idle for BOINC (is this just for the testing phase?), and if I do change it to low/idle it helps with the screen lag a little, but nowhere near enough to bring Firefox/Windows task bar back to near-normal usability. Also, WU times will vary a little more (~2 mins.).
Just a few hours ago I had a system crash that corrupted Windows, with missing files and registry damage. Worst crash ever. I've never had one that affected Windows like that. It took a couple hours to repair, but the Genefer WU that was running just picked up at its checkpoint and finished OK. The 2 early WUs that errored out happened when the Mem setting was too high, but the more recent one is a mystery. I did have Firefox open, and VLC was open but not playing. It happened while I was away from the computer sooo???
Any more info you need just ask. Any suggestions lay it on me. Anything you want me to try, aye. I'm happy to help in anyway I can. I'll be in & out so getting back to you MIGHT be a little bit.
NeoMetal*
____________
Largest Primes to Date:
As Double Checker: SR5 109208*5^1816285+1 Dgts-1,269,534
As Initial Finder: SR5 243944*5^1258576-1 Dgts-879,713
|
|
|
|
|
|
Neo, don't feel alone. Even running the 570 my screen lags. Maybe not as bad as you are getting but it lags. I was thinking someone else posted the same thing earlier on the lag. Lucky for me I don't actually have to use that pc for anything other than crunching. |
|
|
Lumiukko Volunteer tester Send message
Joined: 7 Jul 08 Posts: 165 ID: 25183 Credit: 749,143,289 RAC: 40,743
                           
|
...Well, the main problem I'M having is the screen lag. Firefox is barely usable while GeneferCUDA is running as well as Windows task bar ....
NeoMetal*
Have you disabled acceleration from Firefox configuration?
You could try changing (by about:config):
gfx.direct2d.disabled = TRUE
layers.acceleration.disabled = TRUE
those may help a bit.
--
Lumiukko |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Here's a funky unit.... http://www.primegrid.com/workunit.php?wuid=239667396
Two have been validated and the third is still on inconclusive.
Nothing funky about it. Go to the three results and look at the bottom of the stderr output for each of them, particularly the residual that each one is reporting. Th...
Oh, wait, I see what you're saying. That third one should be marked as invalid, not merely pending. Gotcha.
That's something for John and/or Rytis to look into.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Hey Mike,
To start out I have a Phenom II x6 1100T, 8GB Mem, MSI Twin Frozr II/OC GTX460. You can see the rest of stats/info on 'my computers'. Computer ID 217242
I've got a Core2 Quad and a single GTX 460, so except for having 2 fewer cores and one less GPU, my system has similar speeds to yours.
Well, the main problem I'M having is the screen lag.
On *most* applications, I don't see noticeable screen lag. Video in a window is fine, but full screen video doesn't play well if I'm crunching on the GPU.
Some Microsoft apps -- Microsoft Live Mail, and ... something else, maybe IE, work poorly when stuff is crunching on the GPU.
Unfortunately, there probably isn't much I can do to improve video performance. One thing is likely, however -- based on my tests, I expect this condition to get worse when we go to the longer WUs. So, a few recommendations:
1) Leave a CPU core free. This not only lets the GPU run a little bit faster, but I found that it helped the problem with screen lag. A lot.
2) Try chrome rather than firefox. I have no idea if that will help, but I use chrome and don't usually have a problem. That's more likely to be due to having 2 cores free on my system.
3) Uncheck "Use GPU when the computer is in use". Genefer (at least once 1.04 is deployed) checkpoints right before exiting, and its initialization time is now negligible, so you lose nothing if it's turned on and off repeatedly. I can pretty much guarantee this will solve the screen lag problem!
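The checkpoint-before-exit behavior in point 3 can be sketched generically (this is not Genefer's actual code; the state layout and helper names are invented to show the pattern):

```python
# Sketch of the "checkpoint right before exiting" pattern described above.
# NOT Genefer's source: save_state/load_state and the JSON layout are
# hypothetical. The idea is that when a suspend/quit request arrives, the
# app writes a checkpoint first, so no work is lost no matter how often
# BOINC toggles "use GPU while computer is in use".

import json
import os
import tempfile

def save_state(path, iteration):
    """Write the checkpoint atomically so a crash can't corrupt it."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump({"iteration": iteration}, f)
    os.replace(tmp, path)  # atomic rename on POSIX and Windows

def load_state(path):
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["iteration"]
    return 0

def run(path, total, quit_requested):
    i = load_state(path)  # resume from the last checkpoint, if any
    while i < total:
        i += 1  # one unit of (pretend) work
        if quit_requested(i):
            save_state(path, i)  # checkpoint BEFORE exiting
            return i
    return i
```

With cheap initialization on restart, repeatedly suspending this loop costs nothing beyond the last partial unit of work, which is exactly why the checkbox becomes painless.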
After testing for the last couple days I have some issues and observations to report. First which has nothing to do with you is the estimated task size (#GFLOPS) set for the WUs. They really mess things up in BOINC. Had to abort a bunch of WUs.
Yup, it's a problem. It's up to John or Rytis to fix that.
I see that the app runs at normal priority which it should be at idle for BOINC (is this just for the testing phase?)
Yes, and what a royal pain in the rear it was to make it do that! The BOINC mechanism for doing that didn't work, and I had to bypass BOINC to achieve that.
CPU apps are supposed to run at idle priority, but GPU apps are supposed to run at normal priority. The priority ONLY affects CPU scheduling and doesn't affect when things run on the GPU. Having these tasks at normal keeps the GPU from being delayed by the other CPU BOINC tasks. As you noticed, the priority has little effect on the screen lag, but setting it to idle would slow down the GPU task due to interference from the CPU tasks.
Just a few hours ago I had a system crash that corrupted Windows and had missing files and registry damage. Worst crash ever.
I don't know if that was due to Genefer, but if it was, my apologies. V1.04 corrects a situation that could cause a video driver crash in rare circumstances, but if you're running Vista or 7, that shouldn't crash Windows.
The 2 early WUs that errored out were when Mem setting was too high, but the more recent one is a mystery. I did have Firefox open and VLC was open but not playing. It happened while I was away from computer sooo???
I'm afraid you lost me there. Not sure which WU's you're talking about. Also, not sure what VLC is. VNC maybe? VNC plays nicely with CUDA in my experience.
____________
My lucky number is 75898524288+1 |
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2259 ID: 1178 Credit: 11,068,521,226 RAC: 11,429,379
                                        
|
|
@Mike
VLC is a video player that is a popular alternative to Windows Media Player, etc.
@those with screen lag
In addition to Mike's suggestions, if you are a Win7 or Vista user, try turning Windows Aero off for the desktop. This takes up more system resources than most realize, and if you don't need those screen/desktop bells and whistles, it might solve much of the screen lag problem.
____________
141941*2^4299438-1 is prime!
|
|
|
|
|
|
http://www.primegrid.com/result.php?resultid=337404545
This is the first "invalid" wu I've had since we moved to the larger units. In looking at it this says it's a probable prime and yet I get it marked invalid. Can anyone tell me why my job was bad? |
|
|
|
|
http://www.primegrid.com/result.php?resultid=337404545
This is the first "invalid" wu I've had since we moved to the larger units. In looking at it this says it's a probable prime and yet I get it marked invalid. Can anyone tell me why my job was bad?
Three results seem to match each other. No clue why two of them got marked invalid. |
|
|
|
|
http://www.primegrid.com/result.php?resultid=337404545
This is the first "invalid" wu I've had since we moved to the larger units. In looking at it this says it's a probable prime and yet I get it marked invalid. Can anyone tell me why my job was bad?
Three results seem to match each other. No clue why two of them got marked invalid.
And I would never have looked at it except it showed invalid I haven't had any of those for several days. I was thinking maybe my o/c testing had caused it but I've had lots actually validate in the last 24 hours. |
|
|
RytisVolunteer moderator Project administrator
 Send message
Joined: 22 Jun 05 Posts: 2651 ID: 1 Credit: 60,632,486 RAC: 119,580
                     
|
|
Validator had a trouble with this task, fixed.
____________
|
|
|
|
|
|
Maybe the validator is not set to accept Probable Primes :)
I think none has been found (or at least reported) in Boinc, yet. It would also be nice to see the Genefer subproject total in the users page (and new badges too...). |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
http://www.primegrid.com/result.php?resultid=337404545
This is the first "invalid" wu I've had since we moved to the larger units. In looking at it this says it's a probable prime and yet I get it marked invalid. Can anyone tell me why my job was bad?
Yup.
The good news is it looks like you broke the validator code by finding a 1,499,526 digit mega-prime!!! Congratulations! Actually, it's only a probable prime and will need to be checked by PFGW or LLR for primality, but the odds of it being PRP and not actually prime are extremely low.
I sent an email off to John and Rytis to make sure they know, if they didn't already. (Since the validator is marking these as invalid, it's probably going to keep sending these to additional computers.)
(The bad news is it looks like you're the double-checker. User "KWSN Raw Data" returned the result about 8 hours earlier.)
Should everything be working correctly, and this is indeed a prime number, it would be the second largest GFN prime ever found, and, um, ... the 9th largest prime ever found at PrimeGrid and the 24th largest known prime.
We've been testing Genefer on Boinc for a few weeks now, right? Not too shabby for a few weeks!
____________
My lucky number is 75898524288+1 |
|
|
John Honorary cruncher
 Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
Maybe the validator is not set to accept Probable Primes :)
I think none has been found (or at least reported) in Boinc, yet. It would also be nice to see the Genefer subproject total in the users page (and new badges too...).
in time...this is still beta.
____________
|
|
|
|
|
http://www.primegrid.com/result.php?resultid=337404545
This is the first "invalid" wu I've had since we moved to the larger units. In looking at it this says it's a probable prime and yet I get it marked invalid. Can anyone tell me why my job was bad?
Yup.
The good news is it looks like you broke the validator code by finding a 1,499,526 digit mega-prime!!! Congratulations! Actually, it's only a probable prime and will need to be checked by PFGW or LLR for primality, but the odds of it being PRP and not actually prime are extremely low.
I sent an email off to John and Rytis to make sure they know, if they didn't already. (Since the validator is marking these as invalid, it's probably going to keep sending these to additional computers.)
(The bad news is it looks like you're the double-checker. User "KWSN Raw Data" returned the result about 8 hours earlier.)
Should everything be working correctly, and this is indeed a prime number, it would be the second largest GFN prime ever found, and, um, ... the 9th largest prime ever found at PrimeGrid and the 24th largest known prime.
We've been testing Genefer on Boinc for a few weeks now, right? Not too shabby for a few weeks!
LOL, just my luck... does it count for anything that I reported the problem? J/K. If it is a prime, then that's wonderful for this project, since it was found rather "quickly" as far as those go. Being the double checker (if I am, and wasn't beaten on that as well) is not a bad thing, just not quite as nice as being the owner. |
|
|
John Honorary cruncher
 Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
http://www.primegrid.com/result.php?resultid=337404545
This is the first "invalid" wu I've had since we moved to the larger units. In looking at it this says it's a probable prime and yet I get it marked invalid. Can anyone tell me why my job was bad?
Yup.
The good news is it looks like you broke the validator code by finding a 1,499,526 digit mega-prime!!! Congratulations! Actually, it's only a probable prime and will need to be checked by PFGW or LLR for primality, but the odds of it being PRP and not actually prime are extremely low.
I sent an email off to John and Rytis to make sure they know, if they didn't already. (Since the validator is marking these as invalid, it's probably going to keep sending these to additional computers.)
Already on it. PRP being verified now. stderr has been updated to remove info.
____________
|
|
|
|
|
I see absolutely no reason why it doesn't get work. Can you send me (admin@primegrid.com) C:\programdata\boinc\sched_request_www.primegrid.com.xml after attempting to get work?
Anything new on the GTX 260-front, Rytis?
I still get the "no work"-message...
____________
Life is Science, and Science rules. To the universe and beyond
Proud member of BOINC@Heidelberg
|
|
|
John Honorary cruncher
 Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
The good news is it looks like you broke the validator code by finding a 1,499,526 digit mega-prime!!! Congratulations! Actually, it's only a probable prime and will need to be checked by PFGW or LLR for primality, but the odds of it being PRP and not actually prime are extremely low.
Confirmed PRP using GenefX64: time = 8:14:42
pfgw still has another 36 hours to prove primality.
____________
|
|
|
|
|
|
Mike,
Sorry for delay. Scott is right that VLC is a media player. Now in regards to using a different browser and changing task bar/display settings, I know all that. I know the inner workings of windows very well, but to tell someone to change windows settings IS NOT the right thing to do. FIXING the software IS! If you were selling this app would you tell your customers to change settings or just live with it? I think not. What happens when it's released from beta and 1000s start complaining. In PRPnet you had no GUI. You need to change some code to ease up on things. As far as the WUs errors I gave you my computer ID but the one with unknown reason for error is http://www.primegrid.com/result.php?resultid=337226432 and as I said before I was away from computer when it happened.
I know you've put a lot of effort into getting GeneferCUDA to work in BOINC but there is still some tweaking needed with the screen lag problem. Maybe a sleep or wait code adjustment is needed. I wouldn't know as I only know a little about coding. All in all I think things are progressing OK.
NeoMetal*
____________
Largest Primes to Date:
As Double Checker: SR5 109208*5^1816285+1 Dgts-1,269,534
As Initial Finder: SR5 243944*5^1258576-1 Dgts-879,713
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
FIXING the software IS! If you were selling this app would you tell your customers to change settings or just live with it? I think not. What happens when it's released from beta and 1000s start complaining. In PRPnet you had no GUI.
Unfortunately, the boinc version of geneferCuda is identical in all important respects to the PRPNet version. Except for the initialization (which is irrelevant to this discussion), it's character-by-character the exact same software, apart from the code that interfaces with boinc. That code has no effect whatsoever on the GPU, the windows GUI, or screen lag.
So if you're seeing a difference between the PRPNet version and the boinc version, something very peculiar is happening, because 99% of the software is unchanged, and that 1% wouldn't have anything to do with screen lag.
Or did I misunderstand what you're saying?
but to tell someone to change windows settings IS NOT the right thing to do
I don't believe that I suggested changing any Windows settings; I think that was Scott suggesting that you turn off Aero, correct? I actually think that's a good idea -- for diagnostic purposes. It might be informative to know what kind of effect it has on your system. Personally, I like Aero and wouldn't turn it off, but it would be useful to know if it's contributing to the problem.
I did suggest that you try Chrome, but that's because that was the one obvious difference between your machine and mine, and I'm not seeing severe problems with these WUs. I have personally observed that some programs are affected worse than others. That's the reason I suggested trying Chrome.
The other possible contributing factors might be:
1) are you running the cpu cores at 100% on all cores? I'm not, so that might be important. (I vaguely recall that running with all cores did cause worse screen lag. I recommend leaving a core idle to service the GPUs, as I consider it advantageous to keep the GPUs running at full speed even if it means less CPU crunching, rather than the other way around.)
2) I've only got one GPU. Maybe there's something about having two GPUs.
As for selling the software, I'm not. Nobody's saying you have to run it. You're a volunteer. I'm a volunteer. Nobody's forcing me to write the software, and nobody's forcing you to run it. It's your choice. I can tell you, however, that the side effects will be worse with the bigger work units.
The source code is available in several locations. If you see something in there that I can improve upon, I'd be happy to do so. Outside of slowing the program down significantly -- which most people would object to -- I'm not sure if there is a feasible way of making it not affect the user experience.
The boinc v1.04 software can be found here. The prpnet version of the software can be found on the Mersenne forums, although I don't see that exact version.
There is a boinc setting to suspend the GPU when the user is active. I made the program do immediate checkpoints when it gets suspended so that there's no undue penalty to using that feature. I.e., you don't lose an hour's work when you suspend the program for five seconds. I also sped up the initialization by a factor of about 3600:1, so restarting genefer no longer incurs a huge penalty (nearly two hours in the full size WU). If the screen lag is so bad, I suggest you turn off that checkbox. That's what it's there for.
The best I can tell you is that if I can make the program behave better without slowing it down, I will do so. Honestly, however, I wouldn't hold your breath. There's no obvious way to do that without slowing the program down, and I'm not going to do that. If you don't want to use the "Use GPU when computer is in use feature", you might want to pass on GFN. Or try to figure out why you're seeing worse screen lag than other people are.
As for that WU error, it's not going to be possible to figure out what happened in this particular case. The Nvidia API reported a non-specific error, which could be caused by anything at all -- including any number of things not under control of the program. It may have had something to do with Genefer or it may not. There's no way to tell.
I'm not ignoring that error -- but the simple truth is that GPU programs run in a rather uncontrolled environment. If I see a pattern of errors happening, I'll track down the bug and squash it. But a single occurrence of an error could be due to anything, and quite honestly there are a zillion outside factors that can crash CUDA. In your case, Nvidia simply said "oops, something's wrong". No info at all on what broke, or why.
There is not a single BOINC GPU program I can't make crash in at least half a dozen ways. Most of the time, it's the user who is able to best diagnose what the problem is because he or she can observe what else is happening on the computer. So with no information on what happened, no way to reproduce it, and the very real possibility that the problem was caused by an outside factor beyond my control, there's not much I can do.
Often, the only way to track problems like that down is to keep track of every process running on the computer, and most people would consider that an invasion of privacy. It's certainly not something I would ever attempt to do.
____________
My lucky number is 75898524288+1 |
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
|
The screen lag is known for all nVidia gpu's and has many causes.
The first cause is the high utilization (around 99%), and the second is the architecture, which differs from AMD/ATI's.
AMD uses a true many-core design with up to 2048 cores, running core and shaders at the same clock rate, while nVidia uses a few-core design with up to 512 cores and different clock rates: the shaders run at double the core frequency. nVidia tries to compensate for the smaller number of cores with much higher clock rates. That can work, but not in all cases. A positive side effect of a many-core GPU design is that in most cases there are always enough GPU cores free to render the desktop.
From CPU architecture it is known that it is easier to saturate a few cores than many, and if you find a way (better code) to saturate a many-core design, you get a much more powerful processor (GPU or CPU).
That is why AMD/ATI migrated from VLIW5 (HD5000) to VLIW4 (HD6000). The aim was to achieve higher utilization of the available cores. Reducing the stream processors to VLIW4 let AMD save transistors on each individual SP and add more overall in the future (that future has now arrived with the HD7000 series).
Collatz as example uses different command line parameters (i, S, L) to control the performance and resource utilization of their app.
- Ix
Default Value: I8
Valid Values: I5 through I8
Purpose: Controls the number of items per loop. The setting represents the power of 2 that will be used for each dimension of the two-dimensional array of items being calculated, e.g. I5 = 2^5 rows by 2^5 columns = 32x32 = 1024 numbers calculated per loop, and I8 = 2^8 rows by 2^8 columns = 65536 numbers calculated per loop. Dimensions below 32 (i.e., values below I5) could be used but result in the GPU being only partially utilized. Values above 8 exceed the amount of memory allowed per CUDA kernel. Using the parameters L13 I5 takes 257 seconds to complete the sample workunit; using L13 I8 takes only 44 seconds. Anything below I7 drastically increases the run time and will require more GPU time to complete the same workunit.
- Lxx
Default Value: L5
Valid Values: L1 through L13
Purpose: controls the number of loops per reduction. The higher the number, the better the GPU utilization and the faster the workunit will complete. Also, the higher the number, the less responsive the system will be. Machines which are dedicated crunchers will likely want to use L13. Machines used while crunching will want to use a value from 1-5. The lower the number, the higher the elapsed time will be. For example, a value of L1 runs at 73% GPU utilization and takes 67 seconds, whereas L13 runs at 99% GPU utilization and takes 43 seconds. By comparison, the v2.03 application takes about 51 seconds. The value is actually the power of 2 that is used, so L3 = 2^3 = 8 loops per reduction and L13 = 2^13 = 8192 loops per reduction. There is about a 2% difference in run time and a 1-2% difference in GPU utilization using L5 versus L13 on a 9800 GTX+.
In general the more items per loop (Ix) and the more loops per reduction (Lxx) the faster the workunit will complete and the worse the video response will be.
- Sxxxx
Default Value: S1
Valid Values: S0 through S4294967295
Purpose: controls the number of milliseconds to wait for the application to complete the loops and reduction. Setting the value to 0 will cause it to use the CPU while waiting for the GPU to finish its calculations, but will result in the fastest elapsed time. It will not increase or reduce the GPU time needed. Settings from 1 to 10 will have little effect on the runtime if using many loops per reduction (e.g. L13) but will drastically reduce GPU utilization when using fewer loops per reduction (e.g. the stock setting of L3). For example, using S10 results in an elapsed time of 83 seconds with L3 and 44 seconds with L13. Note: Setting this to the max value would require 136 YEARS to complete a workunit.
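Putting the I and L parameters together, the amount of work batched between reductions grows quickly; a back-of-the-envelope sketch using only the arithmetic quoted above:

```python
# Back-of-the-envelope arithmetic for the Collatz tuning parameters
# quoted above: Ix gives a 2^x by 2^x grid of items per loop, and Lxx
# gives 2^xx loops per reduction. More work batched per reduction means
# better GPU utilization but a less responsive desktop.

def items_per_loop(i):
    return (2 ** i) ** 2  # square grid of items

def items_per_reduction(i, l):
    return items_per_loop(i) * (2 ** l)

print(items_per_loop(5))           # I5 -> 32 x 32 = 1024
print(items_per_loop(8))           # I8 -> 256 x 256 = 65536
print(items_per_reduction(8, 13))  # I8 L13 -> 65536 * 8192 items
```

So going from L13 I5 to L13 I8 multiplies the batch size 64-fold, which lines up with the 257-second versus 44-second timings quoted above.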
Ken implemented something similar in his TPsieve-app for Cuda with the parameter "-m".
I think it should be possible to do the same for GeneferCUDA. But this would also result in longer computation times...
My 0.02 cents
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Ken implemented something similar in his TPsieve-app for Cuda with the parameter "-m".
I think it should be possible to do the same for GeneferCUDA. But this would also result in longer computation times...
My 0.02 cents
I'll take all the loose change I can get. ;-)
Unfortunately, most of the GPU calls that genefer does are fairly atomic, inasmuch as each kernel is doing ONE thing, and one thing only. You can't get any smaller than that.
The exception is when it uses Nvidia's cuFFT FFT library, which sieves don't use. That is a single call to do the FFT, and the library is responsible for splitting it up to run on the shaders. I have absolutely no control over what it's doing internally, so if the problem is in there, there's nothing that can be done.
There are two routines in Shoichiro's code that do loops inside the kernels, and they CAN be made smaller.
Neometal*, if you want to be a tester and are comfortable with using app_info to run custom apps, I could see if that helps. It's not guaranteed that it will fix the problem, and there may be significant performance implications. Or it could be a silver bullet. We won't know without trying.
Also, my apologies if I seemed a bit grumpy in that last post. I should know better than to be answering stuff on the boards right before going to sleep or before I have my morning coffee. Unfortunately, that's when I seem to be on the boards the most. ;-)
On a related note, if this DOES help, it would be really nice if the user had a way of controlling this behavior. It's almost a certainty that the lag is going to vary a lot from system to system. Short of using app_info (which I don't consider suitable for use by the masses), anyone know of a *good* way of having a per-computer method of configuring how a boinc app runs? One way I can think of is to have the user set a system environment variable with the configuration parameters.
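A minimal sketch of the environment-variable idea, in Python for brevity (the variable name GENEFER_OPTS and the "L13 S10"-style format are purely hypothetical, not an existing GeneferCUDA feature):

```python
import os

def read_opts(default="L3 S1"):
    # Hypothetical per-machine tuning via an environment variable:
    # each token is a letter (the parameter) followed by its value,
    # e.g. "L13 S10" -> {"L": 13, "S": 10}.
    raw = os.environ.get("GENEFER_OPTS", default)
    opts = {}
    for token in raw.split():
        opts[token[0]] = int(token[1:])
    return opts
```

The appeal is that an environment variable is per-machine and survives app updates, unlike an app_info.xml that must be maintained by hand.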
Mike
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
Mike,
You were right and I was wrong on who suggested changing Aero or other display settings and I apologize for it.
So if you're seeing a difference between the PRPNet version and the boinc version, something very peculiar is happening, because 99% of the software is unchanged, and that 1% wouldn't have anything to do with screen lag.
Since the app was coded for NO GUI it probably was written to run at max utilization. Just adding a GUI (BOINC) interface shouldn't change GeneferCUDA, but maybe it has somehow added to the overall overhead, thus causing lag on some systems. The original code needs to be adjusted with a slight backing off on the full-bore utilization. I'm not at all near knowledgeable enough at coding to find and fix it even if I did have the source code, so someone else will have to do it. Maybe Ken_G6, or quel at DistrRTgen, as he is a regular here at PG including challenges and a talented coder.
I did suggest that you try Chrome, but that's because that was the one obvious difference between your machine and mine, and I'm not seeing severe problems with these WUs. I have personally observed that some programs are affected worse than other. That's the reason I suggested trying Chrome.
This actually is sort of a settings change by suggesting using different software. I will only use Firefox because of the security and privacy stuff. I also use addons that I wouldn't be without like multi tab rows, multi groups that can be suspended OR hibernated, but especially No Script. So I'm not giving up my Firefox.
As far as # of cores: if you saw my computer stats from the ID# I gave in the first msg you'd see I have 5 cores going on a 6 core all the time, this being my main computer. I also have Throttle set at 99% even though it's not really needed with a free core, as total CPU utilization is about 85-87% (my other 6 core IS running all 6 and I have Throttle set at 97%, so 3% free overhead available per core there), so there's no lack of CPU available for feeding the GPU.
As for selling the software, that was a 'what if'. I never meant that you were, after all this is all open source I think. I was just meaning that you should make sure it works properly on 99.99% of computers WITHOUT significant lag before releasing it to the masses.
I'm typing this in Firefox as GeneferCUDA runs and for once I can type faster than Firefox can keep up (at my 15-20 words a minute rate), and the cursor only appears sporadically. Also some letters are being switched around when two are typed fast together, or are missing altogether. Even outright wrong letters sometimes. So typing as well as mouse movements are still really bad in Firefox, but the Windows task bar problems are much better with this batch of WUs. I haven't changed anything or rebooted since the last batch, so that's a mystery. One major thing: if I turn off desktop composition, the entire computer is unusable, taking up to a minute just to move the mouse pointer across the screen. Everything else in the display performance menu (even Aero) makes no difference, better or worse, in lag of any kind. It's just Firefox still, but slightly better since I closed most tabs; with only a few tabs open it's gone from virtually unusable to barely usable. Why only Firefox, and why so bad in Firefox!?! I'll keep messin' with it. [2+ hours to type AND correct everything]
Last thing for now is the Gflops/Fpops estimates. I figured out ABOUT what it should be, I THINK. Instead of
26,000 Gflops/fpops - client_state file <rsc_fpops_est>26000000000000.000000</rsc_fpops_est>
try setting it to
760,000 Gflops - <rsc_fpops_est>760000000000000.000000</rsc_fpops_est>
That's about 30 times the current value. This should get it in the ballpark for the 262144 WUs and greatly reduce the struggle between CPU tasks & GeneferCUDA that has the estimated-time-left numbers constantly increasing, decreasing, increasing, decreasing as each finishes in BOINC. Tell John or Rytis if they don't see this themselves.
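As a quick sanity check on those two figures (just the arithmetic on the numbers quoted above):

```python
# Both values are <rsc_fpops_est> entries, i.e. total FLOPs per workunit.
old_est = 26_000e9   # the current 26,000 GFLOPS estimate
new_est = 760_000e9  # the suggested 760,000 GFLOPS estimate
ratio = new_est / old_est  # about 29x, i.e. roughly a thirtyfold increase
```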
I'll keep at it for now.
NeoMetal*
____________
Largest Primes to Date:
As Double Checker: SR5 109208*5^1816285+1 Dgts-1,269,534
As Initial Finder: SR5 243944*5^1258576-1 Dgts-879,713
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
I was just meaning that you should make sure it works properly on 99.99% of computers WITHOUT significant lag before releasing it to the masses.
I believe this is not possible with nVidia GPUs at the moment, but nVidia made a step forward in this direction with the Fermi architecture and its ability to switch faster between different kernels.
PS: I have screen lags in every CUDA project I am attached to, and these are Einstein, Collatz, dnetc/Moo wrapper, Milkyway, DistrRTgen and Primegrid.
Einstein seems to have the lowest utilization rate of these projects and therefore only sometimes produces a lag.
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Since the app was coded for NO GUI it probably was written to run at max utilization. Just adding a GUI (BOINC) interface shouldn't change GeneferCUDA, but maybe it has somehow added to the overall overhead, thus causing lag on some systems.
I'm afraid you've made a false assumption there. A reasonable assumption, but incorrect nonetheless. GeneferCUDA has no GUI component at all.
GeneferCUDA is a command line program. The BOINC client is also a command line program. The only GUI part is the BOINC Manager, which is a separate program which isn't even necessary to run BOINC. The BOINC manager, the only GUI program in the whole BOINC ecosphere (not counting screen savers, anyway), communicates with both the BOINC client and the apps themselves via RPC's, essentially network messages. So there's no GUI code in Genefer at all.
The only way that affects Genefer is that it's occasionally doing network calls. If anything, that slows down Genefer, so if there's any noticeable effect at all, it should be to IMPROVE any lag, not make it worse. In practice, however, those network calls happen very infrequently -- at a human scale, not a CPU scale, so they'll have zero impact on performance.
The BOINC code shouldn't be the problem. If you're really seeing a significant difference between performance under PRPNET and BOINC, then since the Genefer code is, for all intents and purposes, identical on both platforms, we need to look elsewhere for the cause. One possibility for the difference is the CUDA DLLs -- the version of PRPNET you're running *might* have the v4.0 DLLs, whereas my version has the 3.2 version because they're faster.
Another potential difference is that BOINC itself is doing something funny and is causing the problem. However, you're using the same BOINC version as I am, so that's probably not the culprit.
This is how I would proceed:
1) Are you certain the problem is worse under BOINC than under PRPNET? If you're certain there's a difference, then we're going to have to spend time figuring out why there's a difference because there shouldn't be any difference, and if there is then we must understand why before we can try to fix it. If they're behaving the same way then we can concentrate on fixing the problem.
If you're certain there's a difference, then I would ask you to take the BOINC geneferCUDA executable and copy it to the PRPNET directory(s) from which you run PRPNET, and rename it to GeneferCUDA.exe. This will run the BOINC version in PRPNET. See how much lag you get in that configuration.
The original code needs to be adjusted with a slight backing off on the full bore utilization.
Not necessarily. What's happening is NOT that the CUDA program is using too much of the GPU -- that's what it's supposed to do, after all -- but rather that it's using it for too long without giving Windows an opportunity to run.
So the fix is usually for the CUDA program to do its thing in smaller chunks. Actually putting delays in there is never done.
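The smaller-chunks idea can be sketched abstractly, with plain Python standing in for CUDA kernel launches (the function names are illustrative only; this is not genefer's actual code):

```python
def run_in_chunks(data, chunk_size, launch):
    # Instead of one long GPU pass, issue many short "launches" so the
    # driver (and hence Windows) gets control back between chunks.
    # launch() stands in for a real kernel call operating on one slice.
    results = []
    for start in range(0, len(data), chunk_size):
        results.extend(launch(data[start:start + chunk_size]))
    return results

def square_chunk(chunk):
    # Illustrative stand-in for a kernel: square each element of a slice.
    return [x * x for x in chunk]
```

The total work is identical either way; only the maximum time the GPU is tied up in a single launch shrinks, which is what lets the display stay responsive.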
I did suggest that you try Chrome, but that's because that was the one obvious difference between your machine and mine, and I'm not seeing severe problems with these WUs. I have personally observed that some programs are affected worse than other. That's the reason I suggested trying Chrome.
This actually is sort of a settings change by suggesting using different software. I will only use Firefox because of the security and privacy stuff. I also use addons that I wouldn't be without like multi tab rows, multi groups that can be suspended OR hibernated, but especially No Script. So I'm not giving up my Firefox.
That's more of a diagnostic suggestion than a fix or workaround suggestion. The idea is to find out if it's a contributing factor so that the problem can be better understood. If GeneferCUDA caused EVERYONE to have screen lag ALL THE TIME, that would be pretty straightforward. But that's not what's happening. Most people don't have any lag most of the time, as far as I know. Certainly I don't. But I do have problems with a few specific programs. So the suggestion to try Chrome was to see if Firefox is part of the problem. If you don't want to try that, fine, but the less information that I have the less likely it is that you'll end up with a satisfactory solution.
I have 5 cores going on a 6 core
Ok, so that's not part of the problem.
I also have Throttle set at 99% even though it's not really needed
Hmmmm.
I could envision a scenario where setting Throttle to anything other than 100% could cause disastrous lag problems with a GPU program.
Try turning that off. That might be the cause of your problem. If I'm correct, not only does it solve your problem but that would be extremely useful information to know. Best part is it will take all of about 60 seconds to test. If it doesn't fix anything, then at least we know that's not the problem.
One thing you need to realize is that you are the only person who has complained about the lag problem. What you are describing is a pretty severe lag problem, as bad as I've ever seen. It's a darn good bet that if others had a problem anywhere close to that, we'd be hearing about it loud and clear from people. Nobody else is complaining, so it's a good bet nobody has a problem quite like that. I'm sure some folks are seeing some lag, at least under certain circumstances, but nobody else seems to be having a problem that makes their computer unusable.
There's something different that's happening on your computer that isn't happening on other computers.
We both need to understand what's different before a fix can be made. You need to understand the problem because it will lead to a fix that directly affects you. I need to understand it because sooner or later there's going to be a second person who has the same problem, and it would be nice if we could either fix the problem, or at a minimum explain what the cause is so that person has options.
For that reason, it would help both of us if when there's a suggestion about changing something on your end, that you try it. Don't want to use Chrome? You don't have to. But try it and see if it affects the problem. Don't want to turn off Aero? Fine, but try it so we know if it's a factor. (It has been a factor in the past, so that's a really good suggestion.)
If you aren't willing to help diagnose this, seriously, you might be happier just not running genefer. Someone's got to run the tests, and the person running the test MUST be experiencing the problem. You're the only person with the problem, so if you're not willing to do the tests it's not going to get fixed.
Last thing for now is the Gflops/Fpops estimates.
That's a server configuration issue, and beyond my control. I know that the admins are aware of the problem, but I have no information on when or if any action will be taken.
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
Because of Firefox lag this will be short for now. I've tried everything you've said, except try chrome(which I will today), as well as a ton of other things. My last post I talked about aero and the other display settings and the one new critical thing, turning off desktop composition makes ENTIRE computer come to a NEAR lockup. Try it and see if you get any problems with it off. The other aero effects make no difference with anything on computer or Firefox.
A new thought is a memory issue. Firefox uses memory a lot and I think you said earlier in this thread that genefer uses system memory a lot too. I tried a memory optimizer and it helped a fair amount but gradually went back to the original lag after 10-15 seconds. Any thoughts?
Last thing for now is I've been noticing a difference in lag, sometimes significant, from WU to WU. There's another to ponder.
For now,
NeoMetal*
____________
Largest Primes to Date:
As Double Checker: SR5 109208*5^1816285+1 Dgts-1,269,534
As Initial Finder: SR5 243944*5^1258576-1 Dgts-879,713
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
turning off desktop composition makes ENTIRE computer come to a NEAR lockup. Try it and see if you get any problems with it off.
I actually have no idea what desktop composition is. Explain?
A new thought is a memory issue. Firefox uses memory alot and I think you said earlier in this thread that genefer uses system memory alot too. I tried a memory optimizer and it helped a fair amount but gradually went back to original lag after 10-15 seconds. Any thoughts?
When I mentioned "global" memory, that's a type of memory on the GPU card. When they talk about a GPU having, say, 1 GB of memory, they're talking about "global" memory. It's got nothing to do with memory on the host computer.
Genefer on my system only uses about 40-50 MB of host memory for this size WU, so Firefox's memory usage shouldn't be an issue unless your virtual memory is significantly overcommitted. You can check, via the task properties in the boinc manager, how much it's using on your system. It should be using a similar amount.
Last thing for now is I've been noticing a difference in lag, sometimes significant, from WU to WU. There's another to ponder.
It's unlikely it has anything to do with the WUs, as similar WUs should have very similar characteristics. Something funky is happening; we just have to figure out what it is. You can try re-running identical WU's from the command line and see if the problem is repeatable. The command line is printed in the stderr output; just use that command line without the -boinc option.
Have you tried running boinc with CPU utilization set to 100%? Since you say this only happens under boinc (or at least I *think* you said that -- can you confirm?) this points a HUGE, GLOWING, NEON sign right in the direction of the CPU UTILIZATION setting.
Therefore, please answer the following:
1) Does the problem go away if you change the "CPU Utilization" to 100%?
2) If no, then does this problem only happen under boinc or does it happen under PRPNET also?
3) If it happens only under PRPNET, does it continue to happen if you replace the PRPNET GeneferCUDA with the BOINC GeneferCUDA?
I'll need those answers to continue.
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
I noticed some strange stderr output on some genefercuda WUs.
Only half of the output was transferred:
http://www.primegrid.com/result.php?resultid=337472446
This one is completely empty:
http://www.primegrid.com/result.php?resultid=337472204
But the WUs were successfully verified by my wingman.
Is this a general boinc error or something specific to genefercuda? I've never seen this on other WUs on my hosts.
Regards Odi
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
I noticed some strange stderr output on some genefercuda WUs.
Only half of the output was transferred:
http://www.primegrid.com/result.php?resultid=337472446
This one is completely empty:
http://www.primegrid.com/result.php?resultid=337472204
But the WUs were successfully verified by my wingman.
Is this a general boinc error or something specific to genefercuda? I've never seen this on other WUs on my hosts.
Regards Odi
I'm stumped. The WU's seem to be producing output files, or they wouldn't be validating. This problem was reported previously, and I was hoping a fix in 1.04 would correct it, but apparently not.
I don't know what is messing up the stderr output. It's not critical to the operation of the WU and is really only used for debugging, but I wouldn't mind understanding and fixing this. I don't have any ideas about what could be causing it, however.
::ponders:: You're not looking at the stderr.out file in the slot/## directory by any chance, are you? That's the only thing I can think of that might cause this.
Mike
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
I'll get to answering more a little later, but at the moment there's very little lag. If I stop and start BOINC, the WU elapsed time backs up, and there's a different level of lag when restarted. If I restart it enough times it will start at the beginning with 0 elapsed time.
Desktop composition is one of the selections in the same place you turn aero on or off.
Control panel>System>System>Advanced system setting>Performance>Visual effects
CPU utilization has been at 100% in Throttle since I first mentioned it earlier.
Haven't been able to use PRPnet since trying to upgrade to the current version. The last version was a pain to get working. Had to run in C:\ only. Tried for 20+ hours with UAC settings and other stuff but the root folder was the only way. Now with the new version it worked once and now none of the ini commands work (can't find files in path...blah blah). It works in my other OS on this box. I have dual boot into 2 different Win7s. AAAAAHHHH... I'll boot into the other Win7 and see if any of these issues happen on it. Have to do it a little later though, got to run out in a bit, but will be back in a few hours.
Me
____________
Largest Primes to Date:
As Double Checker: SR5 109208*5^1816285+1 Dgts-1,269,534
As Initial Finder: SR5 243944*5^1258576-1 Dgts-879,713
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Desktop composition is one of the selections in the same place you turn aero on or off.
Control panel>System>System>Advanced system setting>Performance>Visual effects
I've never changed anything in there. I don't intend to, either. :)
The only setting I see there that references Aero is "Aero Peek", which I don't think is the same thing as turning Aero on and off. I think this is just one single feature of Aero that this controls.
Turning Aero off is accomplished by going to the Windows Theme selection dialog, and choosing a non-Aero theme.
If you want my honest opinion, I think your Windows settings are messed up somehow, and re-installing Windows might not be a terrible idea. You simply should not be seeing these problems. I think that would be the best thing for you -- but not the best for me.
This discussion has given me some ideas about reducing screen lag that I'd like to try out, and your system -- the way it is now -- might be the best place to test those ideas. But don't let that stop you if you think re-installing is a good idea. I can always find another way to test.
Mike
____________
My lucky number is 75898524288+1 |
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
Had to run in C:\ only. Tried for 20+ hours with UAC settings and other stuff but the root folder was the only way. Now with the new version it worked once and now none of the ini commands work (can't find files in path...blah blah). It works in my other OS on this box. I have dual boot into 2 different Win7s. AAAAAHHHH... I'll boot into the other Win7 and see if any of these issues happen on it. Have to do it a little later though, got to run out in a bit, but will be back in a few hours.
If you want to use a folder in the root of drive C you need to adjust the read/write permissions of that folder to "any".
I walked into the same trap more than once...
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
|
|
|
Credits are only at 3600 max; any chance to adjust them relative to GCW and PPS sieve? |
|
|
|
|
|
Well Mike,
I've switched to my other Win7 OS and so far no problems at all, except when playing video in VLC, which stutters significantly, but that's almost expected. Strange how on my other OS VLC played smoothly and Firefox gave me all that trouble. There is the occasional stutter here and there anywhere on the system, but no big deal.
Before switching this morning I tried installing Chrome and holy cowhide Batman... It nearly locked up the entire OS, just like when desktop composition was turned off. And just by closing it, it would not revert back to near normal (for whatever normal is on that OS) until I stopped and restarted BOINC. More strangeness for that OS.
Well, as you said earlier, that OS is just messed up. I'm going to start using the other one now, but I'll have to reinstall it as that big bad crash a few days ago has affected this OS as well. Boot time is 10x longer and none of the control panel links work. This leads me to believe that the crash was hard drive related, as both OS partitions are on the same drive (WD VelociRaptor 74GB). So all of the problems were unique to that OS and its custom screwiness. I'm going to keep on testing on this OS 'as is' for now up to the challenge. Then after the challenge I'll reinstall the OS, but before doing so I'll do some crazy stuff and see if the OS can handle it while GeneferCUDA is running. May stumble onto something.
Anyways Thanks for all the help so far (hoping that I won't need more) and I'll report if anything new comes up. If you want me to do any custom testing before I reinstall OS get with me right after the challenge has ended.
NeoMetal*
____________
Largest Primes to Date:
As Double Checker: SR5 109208*5^1816285+1 Dgts-1,269,534
As Initial Finder: SR5 243944*5^1258576-1 Dgts-879,713
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
|
Neometal,
I did some tests. There's an internal setting I can vary that does have an effect on screen lag, but there's a trade off.
The larger this parameter is, the more screen lag there is. However, raising this parameter also lowers CPU usage and increases GPU utilization. So you can lower this value, which improves the screen lag problem, but causes more CPU to be used and also slows the GPU (and hence the WU).
I can make it so the user has control over this setting.
In my opinion, however, the trade off is a poor one. To make a substantial difference in the lag (which only happens with certain programs), you will end up significantly slowing the GPU and using a significant portion of a CPU core. I don't envision a situation where this is preferable to using the "Use GPU when computer is in use" setting.
____________
My lucky number is 75898524288+1 |
|
|
|
|
Neometal,
I did some tests. There's an internal setting I can vary that does have an effect on screen lag, but there's a trade off.
The larger this parameter is, the more screen lag there is. However, raising this parameter also lowers CPU usage and increases GPU utilization. So you can lower this value, which improves the screen lag problem, but causes more CPU to be used and also slows the GPU (and hence the WU).
I can make it so the user has control over this setting.
In my opinion, however, the trade off is a poor one. To make a substantial difference in the lag (which only happens with certain programs), you will end up significantly slowing the GPU and using a significant portion of a CPU core. I don't envision a situation where this is preferable to using the "Use GPU when computer is in use" setting.
Unless it's really critical, my vote would be to have the GPU run as fast as possible. Even if you give the user "control" we would then need to do quite a bit more testing to ensure the control piece is doing what it needs to. Of course I do understand Neometal's problem. |
|
|
|
|
|
I would vote exactly the opposite. As posted before, I have the lags too. Even with one (or more) core(s) free, my system is nearly unworkable. That means I won't be running GFNCuda while my pc is in use, i.e. the choice between a little bit of runtime and a lot of runtime.
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
I would vote exactly the opposite. As posted before, I have the lags too. Even with one (or more) core(s) free, my system is nearly unworkable. That means I won't be running GFNCuda while my pc is in use, i.e. the choice between a little bit of runtime and a lot of runtime.
I don't think I made myself clear enough.
You need to give up a lot of GPU performance and a lot of CPU usage in order to gain a little bit of improvement in screen lag. The trade off, while possible, isn't worth it. I can't imagine anyone who has problems with lag gaining enough of an improvement to make it worth it.
Therefore, don't get your hopes up. I'm going to start a thread specifically for discussing this problem.
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
Noticed the unit count is falling. Just wondering if testing will be suspended for a bit since the challenge is fixing to start. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Noticed the unit count is falling. Just wondering if testing will be suspended for a bit since the challenge is fixing to start.
It could be that they're going to refill the queue with larger WUs once this empties out. At least that's what I'm hoping.
____________
My lucky number is 75898524288+1 |
|
|
|
|
Noticed the unit count is falling. Just wondering if testing will be suspended for a bit since the challenge is fixing to start.
It could be that they're going to refill the queue with larger WUs once this empties out. At least that's what I'm hoping.
lol - not something we normally hear.. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Noticed the unit count is falling. Just wondering if testing will be suspended for a bit since the challenge is fixing to start.
It could be that they're going to refill the queue with larger WUs once this empties out. At least that's what I'm hoping.
lol - not something we normally hear..
We're not going to set a world record with little WUs. ;-)
Big WUs == Big Prime Numbers. The GFN domain has a pretty dense concentration of primes. This is a "target rich" environment.
____________
My lucky number is 75898524288+1 |
|
|
|
|
Noticed the unit count is falling. Just wondering if testing will be suspended for a bit since the challenge is fixing to start.
It could be that they're going to refill the queue with larger WUs once this empties out. At least that's what I'm hoping.
lol - not something we normally hear..
We're not going to set a world record with little WUs. ;-)
Big WUs == Big Prime Numbers. The GFN domain has a pretty dense concentration of primes. This is a "target rich" environment.
Now that is something good to hear.. speaking of which, John said that last hit was a PRP.. what does that mean? |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Now that is something good to hear.. speaking of John said that last hit was a PRP.. that means what?
LLR is a primality test. Genefer is a probable primality test. PRP == Probable Prime.
Probable, in this sense, is a VERY high probability -- something like 99.99%. But it's not proof of primality. With some projects, such as GFN, there's a very fast PRP test, so we search with that. When we find a PRP, we then use a slower program to do the actual proof.
Using my hardware as an example, a world record GFN might take 8 days to PRP test with GeneferCUDA on the GPU, but would take 40 times that (320 days!!!) to do the actual proof using PFGW64 on my CPU.
So when GeneferCUDA found that prime a few days ago -- actually a probable prime, or PRP -- we still had to wait about 2 days or so for it to be proven prime using the PFGW64 program. (The LLR program can also do primality tests on GFNs in a time comparable to PFGW64.)
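Putting those timing figures side by side (just the arithmetic from the numbers above):

```python
prp_days = 8            # world-record-size GFN, GeneferCUDA PRP test on the GPU
proof_slowdown = 40     # deterministic proof (PFGW64 on the CPU) vs the PRP test
proof_days = prp_days * proof_slowdown  # 320 days for the actual primality proof
```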
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
So did I miss it or was it not proven to be a prime? Just wondering cause it was such a large find and it would be great if it was. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
So did I miss it or was it not proven to be a prime? Just wondering cause it was such a large find and it would be great if it was.
http://www.primegrid.com/forum_thread.php?id=3966
and
http://www.primegrid.com/download/gfn-525094_262144.pdf
____________
My lucky number is 75898524288+1 |
|
|
|
|
So did I miss it or was it not proven to be a prime? Just wondering cause it was such a large find and it would be great if it was.
http://www.primegrid.com/forum_thread.php?id=3966
and
http://www.primegrid.com/download/gfn-525094_262144.pdf
thanks John ! this is great for the new project. |
|
|
John Honorary cruncher
 Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
|
Thanks to Ronald, we will soon have a Linux build of GeneferCUDA in BOINC. If you are interested in testing, please PM me. You must have Linux OS and a GPU with compute capability >= 1.3
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
|
WU Estimated GFLOPS setting changed
Kudos go to Rytis for changing this!
The estimated GFLOPS on a WU was increased by almost a factor of 1000, and is now perfect -- for me, anyway. My DCF is now 0.985 (1.0 is what it should be in a perfect universe.) It should be reasonable (or better) for everyone else.
Note that there may still be some old WUs sitting in your cache, or perhaps resends of older WUs, so you still might get the odd WU that screws up your DCF. By and large, however, this should hopefully now be more or less fixed for everyone.
I've given Rytis estimates for larger WUs, so hopefully those will work well also when we increase the WU size.
Over time, BOINC will correct the DCF on your clients as you complete more WUs, but BOINC intentionally is designed to lower DCF very slowly, so it will take several days for it to reset to where it should be.
If you want to accelerate the process, you can do the following:
1) Shut down the BOINC client (Advanced -> Shut Down Connected Client)
2) Edit client_state.xml in your BOINC data directory, and find the <duration_correction_factor> tag in the PrimeGrid <project> section of the file. Change the value to 1.0, and save the file.
3) Restart the BOINC client (Advanced -> Select Computer...).
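As a rough illustration of step 2, the edit can also be scripted. This is an unofficial sketch that operates on the XML text (back up client_state.xml first; `reset_primegrid_dcf` and the sample fragment are made up for this example):

```python
import re

def reset_primegrid_dcf(xml: str) -> str:
    """Set <duration_correction_factor> to 1.0, but only inside the
    <project> block whose master URL mentions primegrid."""
    def fix(match):
        block = match.group(0)
        if "primegrid" not in block:
            return block  # leave other projects' DCF untouched
        return re.sub(
            r"<duration_correction_factor>[^<]*</duration_correction_factor>",
            "<duration_correction_factor>1.000000</duration_correction_factor>",
            block,
        )
    return re.sub(r"<project>.*?</project>", fix, xml, flags=re.DOTALL)

# Example fragment of a client_state.xml:
sample = (
    "<project><master_url>http://www.primegrid.com/</master_url>"
    "<duration_correction_factor>104.500000</duration_correction_factor>"
    "</project>"
)
print(reset_primegrid_dcf(sample))
```

Only run something like this while the BOINC client is shut down, exactly as in step 1 above.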
____________
My lucky number is 75898^524288+1 |
|
|
|
|
Thanks to Ronald, we will soon have a Linux build of GeneferCUDA in BOINC. If you are interested in testing, please PM me. You must have Linux OS and a GPU with compute capability >= 1.3
Hopefully, TheDawgz are getting these messages because the Linux version isn't out there for testing in BOINC quite yet.
Wed 25 Jan 2012 04:57:14 PM MST | PrimeGrid | Message from server: Genefer is not available for Linux running on an AMD x86_64 or Intel EM64T CPU.
TheDawgz thank both Ronald and Michael for all their time and effort !!!
____________
There's someone in our head but it's not us. |
|
|
John Honorary cruncher
 Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
Thanks to Ronald, we will soon have a Linux build of GeneferCUDA in BOINC. If you are interested in testing, please PM me. You must have Linux OS and a GPU with compute capability >= 1.3
Hopefully, TheDawgz are getting these messages because the Linux version isn't out there for testing in BOINC quite yet.
Wed 25 Jan 2012 04:57:14 PM MST | PrimeGrid | Message from server: Genefer is not available for Linux running on an AMD x86_64 or Intel EM64T CPU.
It has not been released yet. It will be announced when it is.
____________
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
|
Okey-dokey, PM sent.
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
|
|
|
http://www.primegrid.com/result.php?resultid=339575333
Can someone tell me why this work shows complete for me but the output doesn't look right? |
|
|
John Honorary cruncher
 Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
http://www.primegrid.com/result.php?resultid=339575333
Can someone tell me why this work shows complete for me but the output doesn't look right?
There are several posts discussing this...but this is all you really need to know:
Known bugs:
Sometimes the output on the result page is truncated. The cause is not understood, but this does not prevent the WU from validating correctly.
____________
|
|
|
|
|
|
Thanks John |
|
|
|
|
|
Michael Goetz,
do you remember my troubles with GFN and flash games?
I think I've found the source of my problem: I'm fairly sure it's SLI.
I disabled SLI and many things changed for the better.
First of all, I found that the temperature of the master card decreased significantly.
Earlier, the temperature of the idle (master) card was even higher than on the working (secondary) card. I thought the reason was bad ventilation, but actually it wasn't.
Now the temperature of the idle card is up to 20 degrees lower than the working one's.
Second, the system lag is gone. I suspect the system lag led to the maxErr errors.
After disabling SLI, I found that the GPU usage line is close to flat (99%).
So, I'll observe the situation for several days... but I feel it will help.
SLI is evil for DC!
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
|
Ah, SLI rears its ugly head again!
Thanks for that information. I only have one GPU, so SLI had slipped my mind.
I remember that when CUDA was new, CUDA wouldn't work at all if SLI was turned on. But I hadn't heard much about SLI lately, so I figured Nvidia had worked the kinks out and that now SLI and CUDA had learned to play nicely together.
I guess not. :)
Thanks for the information, I'm sure that will be useful to a lot of people. Looking at my wingmen, it sure seems like a lot of people have multiple GPUs.
Thanks,
Mike
____________
My lucky number is 75898^524288+1 |
|
|
|
|
|
Michael Goetz,
unfortunately the maxErr errors did not go away completely.
1 of 3 tasks was terminated with maxErr:
http://www.primegrid.com/result.php?resultid=340072575
2 others are still pending, but 1 of them looks questionable:
http://www.primegrid.com/result.php?resultid=340072587
http://www.primegrid.com/result.php?resultid=340072524
The log of the first ends with:
Terminating because BOINC client requested that we should quit.
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Michael Goetz,
unfortunately the maxErr errors did not go away completely.
1 of 3 tasks was terminated with maxErr:
http://www.primegrid.com/result.php?resultid=340072575
2 others are still pending, but 1 of them looks questionable:
http://www.primegrid.com/result.php?resultid=340072587
http://www.primegrid.com/result.php?resultid=340072524
The log of the first ends with:
Terminating because BOINC client requested that we should quit.
GPU=GeForce GTX 460
Global memory=1073741824 Shared memory/block=49152 Registers/block=32768 Warp size=32
Max threads/block=1024
Max thread dim=1024 1024 64
Max grid=65535 65535 65535
CC=2.1
Clock=1620 MHz
This is stock:
GPU=GeForce GTX 460
Global memory=1073741824 Shared memory/block=49152 Registers/block=32768 Warp size=32
Max threads/block=1024
Max thread dim=1024 1024 64
Max grid=65535 65535 65535
CC=2.1
Clock=1350 MHz
Genefer is a very picky girl and doesn't seem to like overclocking. People have reported that overclocking will cause Genefer to have problems with rounding errors, even though other apps seem to work fine at the same speeds.
As for the "terminating" message, it means exactly what it says: Genefer detected that the Boinc client asked for it to shut down. Perhaps your Boinc log says why.
____________
My lucky number is 75898^524288+1 |
|
|
|
|
|
Michael Goetz,
GPU=GeForce GTX 460
Clock=1620 MHz
This is stock:
GPU=GeForce GTX 460
Clock=1350 MHz
Genefer is a very picky girl and doesn't seem to like overclocking. People have reported that overclocking will cause Genefer to have problems with rounding errors, even though other apps seem to work fine at the same speeds.
Zotac GTX 460 AMP! Edition factory overclocked.
As for the "terminating" message, it means exactly what it says: Genefer detected that the Boinc client asked for it to shut down. Perhaps your Boinc log says why.
FYI, If it will help:
01/26/12 21:10:16 | PrimeGrid | [task] Process for genefer_262144_13055_1 exited
01/26/12 21:10:16 | PrimeGrid | [task] task_state=EXITED for genefer_262144_13055_1 from handle_exited_app
01/26/12 21:10:16 | | [cpu_sched_debug] Request CPU reschedule: application exited
01/26/12 21:10:16 | PrimeGrid | Computation for task genefer_262144_13055_1 finished
01/26/12 21:10:16 | PrimeGrid | [task] result state=FILES_UPLOADING for genefer_262144_13055_1 from CS::app_finished
01/26/12 21:10:16 | PrimeGrid | [debt] recent est credit: 31.60G in 31.17 sec, 565.581249 + 2.175951 ->567.757200
01/26/12 21:10:16 | | [cpu_sched_debug] Request CPU reschedule: handle_finished_apps
01/26/12 21:10:16 | | [cpu_sched_debug] schedule_cpus(): start
01/26/12 21:10:16 | PrimeGrid | [prio] -1.000000 rsf 1.000000 rt 567.757200 rs 567.757200
01/26/12 21:10:16 | PrimeGrid | [prio] -1.000000 rsf 1.000000 rt 567.757200 rs 567.757200
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Genefer is a very picky girl and doesn't seem to like overclocking. People have reported that overclocking will cause Genefer to have problems with rounding errors, even though other apps seem to work fine at the same speeds.
Zotac GTX 460 AMP! Edition factory overclocked.
It's still overclocked by 20% regardless of who overclocked it. Remember, the 'factory' that overclocked it was Zotac's factory, not Nvidia's. Should it work? Yes, according to Zotac. If Nvidia warranted it to work at 1620 MHz, it wouldn't be called "overclocked".
Overclocking by 20% doesn't just mean that the card is running 20% faster. It's also running 20% hotter, and is drawing 20% more power. It could be another part of the computer that's having trouble keeping up even if the GPU is up to it.
That might be a problem, or it might not. There's only one way to know. If the card runs more reliably at stock clock speeds, then the problem is the overclocking. If it's the same, then the problem is elsewhere.
(Actually, that's not entirely true. A software bug could cause timing-related issues that would be sensitive to clock speed. I suspect that as one of the possible causes of the 550 Ti problem. However, neither Shoichiro nor I can find any such problem in the Genefer code, but such a problem might still exist in either Nvidia's cuFFT library or the video drivers. In either case it would manifest itself more as the clock speed goes up. Regardless of whether it's a hardware problem or an Nvidia software problem, there's little I can do about it. The only possible solution (other than slowing the clocks) is, if the problem is in cuFFT, to write our own software to replace cuFFT. It may come to that if I can prove the problem is in cuFFT, but I don't see that happening quickly, if at all.)
As for the "terminating" message, it means exactly what it says: Genefer detected that the Boinc client asked for it to shut down. Perhaps your Boinc log says why.
FYI, If it will help:
01/26/12 21:10:16 | PrimeGrid | [task] Process for genefer_262144_13055_1 exited
01/26/12 21:10:16 | PrimeGrid | [task] task_state=EXITED for genefer_262144_13055_1 from handle_exited_app
01/26/12 21:10:16 | | [cpu_sched_debug] Request CPU reschedule: application exited
01/26/12 21:10:16 | PrimeGrid | Computation for task genefer_262144_13055_1 finished
01/26/12 21:10:16 | PrimeGrid | [task] result state=FILES_UPLOADING for genefer_262144_13055_1 from CS::app_finished
01/26/12 21:10:16 | PrimeGrid | [debt] recent est credit: 31.60G in 31.17 sec, 565.581249 + 2.175951 ->567.757200
01/26/12 21:10:16 | | [cpu_sched_debug] Request CPU reschedule: handle_finished_apps
01/26/12 21:10:16 | | [cpu_sched_debug] schedule_cpus(): start
01/26/12 21:10:16 | PrimeGrid | [prio] -1.000000 rsf 1.000000 rt 567.757200 rs 567.757200
01/26/12 21:10:16 | PrimeGrid | [prio] -1.000000 rsf 1.000000 rt 567.757200 rs 567.757200
Were there any messages before that? The first message there is Genefer exiting, and we already knew it did that. If there's anything interesting in the log, it would be right before that. If there's nothing there, then at the moment there's nothing I could do to trace the cause. It's not reproducible on my system, and since Genefer is being told to exit by the Boinc client, the real question is what's happening inside Boinc.
Which brings us to the fact that you're running a beta version of Boinc. Therefore, the question becomes "Why is Boinc 7.0.11 setting the quit_request flag in the BOINC_STATUS data structure?". That's something you'll have to take up with the Boinc developers. I've got enough on my plate getting Genefer running with the released versions of Boinc; I make no claims that it will work properly with versions that are still under development, sorry. To me this is clearly a case of the Boinc client doing something unexpected. Specifically, it's turning on the flag that tells the application to shut down. That's the only way Genefer will print that message.
____________
My lucky number is 75898^524288+1 |
|
|
|
|
If there's anything interesting in the log, it would be right before that.
Nothing interesting.
Which brings us to the fact that you're running a beta version of Boinc. Therefore, the question becomes "Why is Boinc 7.0.11 setting the quit_request flag in the BOINC_STATUS data structure?". That's something you'll have to take up with the Boinc developers. I've got enough on my plate getting Genefer running with the released versions of Boinc; I make no claims that it will work properly with versions that are still under development, sorry.
Don't worry, I am aware that I'm using a development version of BOINC and beta GFN at my own risk. I've spent a lot of otherwise-inefficient GPU time specifically trying to find something helpful to you as a developer. I'm not asking you to solve my problems. :)
____________
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
Were there any messages before that? The first message there is Genefer exiting, and we already knew it did that. If there's anything interesting in the log, it would be right before that. If there's nothing there, then at the moment there's nothing I could do to trace the cause. It's not reproducible on my system, and since Genefer is being told to exit by the Boinc client, the real question is what's happening inside Boinc.
Mike, I also have one or two units with the same problem on Linux with boinc-client 6.10.56.
I checked my logs and found PPSsieve units on the host. I have no debugging enabled, and my LTD value is reset after a client restart.
After a GPU work request my client goes immediately into EDF mode when genefer units come in, and BoincView displays the "deadline miss" message for PG. The short deadline of genefer units and/or an incorrect flops value seems to be the cause. Last time I saw something around 260 days in BoincView for one unit...
[add]
I got a new PPSsieve unit with 104 days to completion. Either PG has increased some WU values or genefer causes trouble inside the boinc-client.
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
I got a new PPSsieve-unit with 104 days to completion. Either PG has increased some WU values or genefer causes troubles inside the boinc-client.
Please see the instructions for quickly resetting DCF in Generalized Fermat Prime Search
____________
My lucky number is 75898^524288+1 |
|
|
|
|
|
Hi Michael!
It's very interesting that in spite of
Terminating because BOINC client requested that we should quit.
that WU is marked as Valid and 3600 credits granted:
http://www.primegrid.com/result.php?resultid=340072587
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Hi Michael!
It's very interesting that in spite of
Terminating because BOINC client requested that we should quit.
that WU is marked as Valid and 3600 credits granted:
http://www.primegrid.com/result.php?resultid=340072587
I noticed the same thing.
The only scenario I can think of to explain it is this:
1) Boinc told Genefer to shut down for some reason.
2) Boinc restarted Genefer, but the output from Genefer wasn't included in the stderr.txt file for some reason.
Since we already know there's a problem with some output not getting into stderr.txt, this scenario isn't completely beyond the realm of possibility. Since we know two things to be facts: that Boinc told Genefer to shut down, and that Genefer nevertheless completed processing and produced valid results, this seems to be a scenario that fits the facts.
Another theoretical possibility is that Genefer never actually shut down and just completed what it was doing, but nothing else was recorded in stderr.txt. This should not be possible, however, because 5 lines down from where that message gets printed Genefer calls exit(), so it's definitely going to stop.
So there's definitely something weird going on there. Which brings us back to it being a beta Boinc client, so there could be all sort of bugs in the client that could cause just about anything. So while I'd really like to understand what happened better, I'm not going to spend a lot of time worrying about it.
____________
My lucky number is 75898^524288+1 |
|
|
samuel7 Volunteer tester
 Send message
Joined: 1 May 09 Posts: 89 ID: 39425 Credit: 257,425,010 RAC: 0
                    
|
|
Some test results for the Linux app v1.05.
Setup:
GTX 480 (15 SM, factory clocks, memory 1536 MiB, 384-bit bus)
nVidia driver 280.13
Ubuntu 11.10 desktop 64-bit
i7 875K @2.93GHz, 8 GiB
BOINC PPS LLR's running on 7 cores during tests. App_info in use (thanks ronald!).
GFN Candidates: b^262144+1, b ~ 570k
First the results with different shifts (shifts 6 & 9 with just one data point, for now at least):
Shift  Elapsed (s)  CPU (s)  Avg CPU load
6      3183         2434     0.76
7      3032         1175     0.39
8      3524          862     0.24
9      5330          907     0.17
The app correctly wrote the checkpoint file and quit when:
- the task was suspended via the suspend task button in BOINC Manager
- GPU computation was suspended
- All computation was suspended
- BOINC Client was shut down manually
Upon resumption, the app continued from the checkpoint.
A software reboot didn't trigger checkpoint creation, but the app resumed from a previous checkpoint.
Pressing the MB reset button caused the checkpoint file to be empty, and the WU was restarted from scratch:
GeneferCUDA-boinc 1.05 beta 3 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
...
The checkpoint version doesn't match current test. Current test will be restarted
A cold reboot would presumably cause the same result.
A minor thing I noticed is that when a task is suspended, the boinc_task_state file in the slot folder isn't updated. Apparently, this file contains the elapsed time info for BOINC. On resumption, the elapsed time goes back to the value in the file. The final elapsed time for the task as reported by BOINC will be too low, but the difference is quite small unless there's a huge number of suspensions.
Didn't notice any screen lag at any time.
All in all, very smooth running :)
____________
|
|
|
samuel7 Volunteer tester
 Send message
Joined: 1 May 09 Posts: 89 ID: 39425 Credit: 257,425,010 RAC: 0
                    
|
|
On my Linux host I now have 20 validated tasks and one error: maxErr exceeded for 576822^262144+1, 0.5000 > 0.4500
Many validated since that error.
____________
|
|
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1906 ID: 352 Credit: 4,143,243,628 RAC: 4,833,233
                                 
|
|
We are currently running b ~580k; the GeneferCUDA limit is ~815k.
I wonder when we will reach this limit at the current pace.
Of course, it is expected to go even faster when this subproject comes out of beta... two weeks ago we were at ~513k.
Running 4 LLRs alongside GeneferCUDA has no negative impact on performance.
SHIFT=7 and GPU time ~2630 secs.
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
We are currently running b ~580k, GeneferCUDA limit is ~815K.
I wonder when we will reach this limit at current pace.
Of course, it is expected to run even faster when this subproject goes out of beta...two weeks ago we were at ~513k.
Running 4 LLRs along GeneferCUDA has no negative impact on performance.
SHIFT=7 and GPU time ~2630 secs.
The answer to your question is most likely "never", at least on Boinc. I'm sure we'll hit that limit eventually on the PSA, but the pace there will probably slow down since I expect the Boinc project to eventually draw users with the faster GPUs here from the PSA.
I imagine the Boinc GFN project will advance to either N=524288 or N=1048576 long before we get near the b limit for N=262144.
Running the little 262144 WUs on the Boinc side is useful for the beta test so we can find and hopefully fix as many problems as possible. Sooner or later, however, we'll start doing what this was designed for -- crunching N=4194304.
I am, however, quite pleased at the rate crunching is advancing on the Boinc side. That bodes well for finding a world record prime.
We could use a few more people to help out with the sieving. There's still a lot of sieving to do, and every factor that's found there -- and I'm still finding about 40 factors per hour -- saves a week of GPU crunching. Every hour spent sieving therefore is worth about 9 months of GPU time.
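The leverage claimed above checks out; a quick back-of-the-envelope calculation using the numbers as given in this post (40 factors per hour, a week of GPU crunching saved per factor):

```python
# Sanity-check the "1 hour of sieving ~ 9 months of GPU time" claim.
factors_per_hour = 40              # CPU sieve factor rate quoted in the post
gpu_hours_per_factor = 7 * 24      # "a week of GPU crunching" per factor

gpu_hours_saved = factors_per_hour * gpu_hours_per_factor   # 6720 GPU-hours
months_saved = gpu_hours_saved / (30 * 24)                  # ~9.3 months
print(f"{gpu_hours_saved} GPU-hours ~= {months_saved:.1f} months")
```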
____________
My lucky number is 75898^524288+1 |
|
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1906 ID: 352 Credit: 4,143,243,628 RAC: 4,833,233
                                 
|
We could use a few more people to help out with the sieving. There's still a lot of sieving to do, and every factor that's found there -- and I'm still finding about 40 factors per hour -- saves a week of GPU crunching. Every hour spent sieving therefore is worth about 9 months of GPU time.
Yes, a bit more sieving needs to be done, that's for sure.
When talking about the factor rate found - it is for the whole 100M range, and only b<715k is suitable for the GPU (so far), which makes the factor rate for the GPU range 140x smaller.
N values of 32768 and 65536 are suitable for CPUs. Are there going to be both BOINC and PRPNet versions? N=32768 is slipping off the TOP5000 with b~2.35M; the latest, b~3.15M, are at position 4500.
Assuming 262144 and 524288 are PRPNet ranges, BOINC can start with the 1048576 range. Or run 20<n<22 at once? Will there be a chance to choose?
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
When talking about factor rate found - it is for whole 100M range and only b<715k are suitable for GPU (so far) which makes factors rate for GPU range 140X smaller.
That was for the whole 100M range, which, of course, isn't searchable by GeneferCUDA (at least not the way it's currently written). Restricting the numbers to that which is searchable by the CUDA application reduces the factor rate by a ratio of about 200:1 -- which means I'm finding 1 factor every 5 hours on the CPU. Each factor found on the CPU sieve eliminates about 200 hours of GPU crunching, so it's still extremely advantageous to keep sieving. The primary reason we'll stop sieving around P=9000P is actually that the sieve program can't go any higher due to numeric overflows. If our CPUs could do 128 bit math, we might go even higher.
N values of 32768, 65536 are suitable for CPUs. Is there going to be both BOINC and PRPNet version? N=32768 are slipping off TOP5000 with b~2,35M, latest b~3,15M are at position 4500.
I wish it were so. So does everyone else. My understanding is that running a Boinc server with multiple subprojects is a hack of the Boinc code and a strain on the server. The official response to your question when I asked it was that they wished they could do it, but the server probably couldn't handle it.
Personally, I'd prefer to see N=1M and N=2M on the Boinc side in addition to the 4M GFN-FTW crunching, since those are not being crunched anywhere at all right now. But I suspect it would be much easier to add them to the PSA than it would be to add yet more subprojects to boinc.
Assuming 262144 and 524288 are PRPNet ranges, BOINC can start with 1048576 range. Or run 20<n<22 at once? Will there be a change to choose?
If the specifics have been decided, John would be the one to answer. The only thing I know for certain is that N=4194304 is the goal of this project, so we'll eventually be crunching that. Other than that, I don't know. What I would like and what is feasible are two different things.
____________
My lucky number is 75898^524288+1 |
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2259 ID: 1178 Credit: 11,068,521,226 RAC: 11,429,379
                                        
|
When talking about factor rate found - it is for whole 100M range and only b<715k are suitable for GPU (so far) which makes factors rate for GPU range 140X smaller.
That was for the whole 100M range, which, of course, isn't searchable by GeneferCUDA (at least not the way it's currently written). Restricting the numbers to that which is searchable by the CUDA application reduces the factor rate by a ratio of about 200:1 -- which means I'm finding 1 factor every 5 hours on the CPU. Each factor found on the CPU sieve eliminates about 200 hours of GPU crunching, so it's still extremely advantageous to keep sieving. The primary reason we'll stop sieving around P=9000P is actually that the sieve program can't go any higher due to numeric overflows. If our CPUs could do 128 bit math, we might go even higher.
N values of 32768, 65536 are suitable for CPUs. Is there going to be both BOINC and PRPNet version? N=32768 are slipping off TOP5000 with b~2,35M, latest b~3,15M are at position 4500.
I wish it were so. So does everyone else. My understanding is that running a Boinc server with multiple subprojects is a hack of the Boinc code and a strain on the server. The official response to your question when I asked it was that they wished they could do it, but the server probably couldn't handle it.
Personally, I'd prefer to see N=1M and N=2M on the Boinc side in addition to the 4M GFN-FTW crunching, since those are not being crunched anywhere at all right now. But I suspect it would be much easier to add them to the PSA than it would be to add yet more subprojects to boinc.
Assuming 262144 and 524288 are PRPNet ranges, BOINC can start with 1048576 range. Or run 20<n<22 at once? Will there be a change to choose?
If the specifics have been decided, John would be the one to answer. The only thing I know for certain is that N=4194304 is the goal of this project, so we'll eventually be crunching that. Other than that, I don't know. What I would like and what is feasible are two different things.
I had asked John this question via PM a few days ago. I'll let him fill in any details he'd like (his answer to me was relatively brief), but in a nutshell, the live BOINC project will be only N=4194304 (perhaps until a world record is found, or until the Mersenne search finds its next one, which would keep GFN from being the record at current N values). All other GFN N values will be on PRPNet.
____________
141941*2^4299438-1 is prime!
|
|
|
|
|
|
Mike.
I started back testing about a day ago (had 2 long GPUGrid WUs to finish after the challenge). I'm still on the newer Win7 OS, testing the shift numbers on my 460 while loading up apps and fiddling with things.
So far everything is still smooth EXCEPT Firefox with 100+ tabs open, which has a just-noticeable increase in minor stuttering in mouse movement, but still very much acceptable. When scrolling with the mouse wheel there are occasional stutters, unless I scroll fast, and then there are always 1-2 long pauses (up to a full second) as I start.
One other oddity with Firefox: the little window alerting me to a new Firefox version, which slides up and then down in the lower right corner, usually takes 2 seconds up, pause, 2 seconds down; now it takes 10-15 seconds up and the same down. The odd part is that the entire OS goes into very significant lag, both screen and mouse, while this is happening. Nothing else will cause any lag anywhere close to that. With shift set to 7 it's about 1/3 better, and at shift 6 it's 2/3 better. At shift 6 all other even minor lag is pretty much gone, with faster times as well. I'll post some results in the benchmark thread soon.
I'll be switching back to my old OS shortly and really see if the different shifts will help that deranged OS there. Just giving you something of an update since I've not posted since before the challenge.
NeoMetal*
____________
Largest Primes to Date:
As Double Checker: SR5 109208*5^1816285+1 Dgts-1,269,534
As Initial Finder: SR5 243944*5^1258576-1 Dgts-879,713
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
EXCEPT Firefox with 100+ tabs open does have a just noticeable increase in minor stuttering in mouse movement
LOL -- I didn't think ANYONE was worse than me when it came to keeping a lot of browser tabs open, but you win!!!
One other oddity with Firefox is when the little window alerting me of a new Firefox version that slides up then down in lower right corner, usually takes 2 seconds up, pause, 2 seconds down, will take 10-15 seconds up and same down. The odd part is the entire OS goes into very significant lag both screen and mouse while this is happening.
That actually makes a lot of sense. I'm not surprised at that behavior.
I'll be switching back to my old OS shortly and really see if the different shifts will help that deranged OS there. Just giving you something of an update since I've not posted since before the challenge.
That could be interesting.
____________
My lucky number is 75898^524288+1 |
|
|
STE\/E Volunteer tester
 Send message
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
                     
|
|
Can the GFN CUDA Wu's be run on a Windows OS or is it just Linux ... ???
____________
|
|
|
Honza Volunteer moderator Volunteer tester Project scientist Send message
Joined: 15 Aug 05 Posts: 1906 ID: 352 Credit: 4,143,243,628 RAC: 4,833,233
                                 
|
Can the GFN CUDA Wu's be run on a Windows OS or is it just Linux ... ???
Sure, you can stretch your GPUs on Windows using GFN CUDA :-)
____________
My stats
Badge score: 1*1 + 5*1 + 8*3 + 9*11 + 10*1 + 11*1 + 12*3 = 186 |
|
|
STE\/E Volunteer tester
 Send message
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
                     
|
|
Okay I'll try 1 Windows Box too, Thanks Honza ... I thought it was only Linux for some reason ...
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Can the GFN CUDA Wu's be run on a Windows OS or is it just Linux ... ???
::SNORT::
LOL, yup, Windows is fine. I originally did the Boinc port under Windows. In fact, for a while it looked like we wouldn't have anyone in position to do a Linux build and we could *only* run under Windows. Thankfully, Ronald came to the rescue so we now have a Linux build.
____________
My lucky number is 75898524288+1 |
|
|
STE\/E Volunteer tester
 Send message
Joined: 10 Aug 05 Posts: 573 ID: 103 Credit: 3,630,330,192 RAC: 0
                     
|
Can the GFN CUDA Wu's be run on a Windows OS or is it just Linux ... ???
::SNORT::
??? What ???
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
|
I updated that post. ;-)
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
http://www.primegrid.com/result.php?resultid=343169293
Well, had my first unit error out in a few days. I have no idea in reading this what may be the issue other than it just crapped out. I suppose there's any number of reasons. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
http://www.primegrid.com/result.php?resultid=343169293
Well, had my first unit error out in a few days. I have no idea in reading this what may be the issue other than it just crapped out. I suppose there's any number of reasons.
Since you're running with a good bit of OC and were stable before, my guess would be the ambient temperature went up a little bit and the extra degree or two caused some instability in the FPU or memory.
____________
My lucky number is 75898524288+1 |
|
|
|
|
http://www.primegrid.com/result.php?resultid=343169293
Well, had my first unit error out in a few days. I have no idea in reading this what may be the issue other than it just crapped out. I suppose there's any number of reasons.
Since you're running with a good bit of OC and were stable before, my guess would be the ambient temperature went up a little bit and the extra degree or two caused some instability in the FPU or memory.
Unfortunately that's what I was thinking as well. I'd hoped you might have a magic answer other than my OC'd card :) I will have to watch it closely for a couple of days. If I start getting more errors I will have to downclock. |
|
|
|
|
|
What is your GPU voltage?
____________
92*10^1439761-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
314187728^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! |
|
|
|
|
What is your GPU voltage?
It's upped +100 mV but I'm not sure of the exact number. I will have to check when I get home from work. |
|
|
|
|
|
Generalized Fermat Prime Search is now open to all ( Public )
Lennart |
|
|
|
|
|
Just tested GFN on my GPU and all WU are erroring out.
Output file genefer_262144_20027_1_0 for task genefer_262144_20027_1 absent ...
|
|
|
|
|
Just tested GFN on my GPU and all WU are erroring out.
Output file genefer_262144_20027_1_0 for task genefer_262144_20027_1 absent ...
Whats the content of your stderr.out file? |
|
|
|
|
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
The environment is incorrect. (0xa) - exit code 10 (0xa)
</message>
<stderr_txt>
GeneferCUDA-boinc 1.06 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
Command line: projects/www.primegrid.com/primegrid_genefer_1.06_windows_intelx86__cuda32_13.exe -boinc -q hidden --device 0
Priority change succeeded.
GPU=GeForce GTX 460
Global memory=1073741824 Shared memory/block=49152 Registers/block=32768 Warp size=32
Max threads/block=1024
Max thread dim=1024 1024 64
Max grid=65535 65535 65535
CC=2.1
Clock=1622 MHz
# of MP=7
No project preference specified; using SHIFT=7
maxErr during b^N initialization = 0.0000 (0.198 seconds).
Testing b^262144+1...
maxErr exceeded for 609980^262144+1, 0.5000 > 0.4500
20:12:38 (14924): called boinc_finish
</stderr_txt>
]]> |
|
|
|
|
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
The environment is incorrect. (0xa) - exit code 10 (0xa)
</message>
<stderr_txt>
GeneferCUDA-boinc 1.06 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
Command line: projects/www.primegrid.com/primegrid_genefer_1.06_windows_intelx86__cuda32_13.exe -boinc -q hidden --device 0
Priority change succeeded.
GPU=GeForce GTX 460
Global memory=1073741824 Shared memory/block=49152 Registers/block=32768 Warp size=32
Max threads/block=1024
Max thread dim=1024 1024 64
Max grid=65535 65535 65535
CC=2.1
Clock=1622 MHz
# of MP=7
No project preference specified; using SHIFT=7
maxErr during b^N initialization = 0.0000 (0.198 seconds).
Testing b^262144+1...
maxErr exceeded for 609980^262144+1, 0.5000 > 0.4500
20:12:38 (14924): called boinc_finish
</stderr_txt>
]]>
Did you run another gpu session (game)? Crashed the WU immediately? OC card? |
|
|
|
|
|
no, no and no
All WUs crashed randomly about 2 to 3 minutes after the start.
Right now I'm testing the blocksize setting. With a setting of 9 the actual WU runs about 13 minutes, but the interface lag is very bad. |
|
|
|
|
no, no and no
All WUs crashed randomly about 2 to 3 minutes after the start.
Right now I'm testing the blocksize setting. With a setting of 9 the actual WU runs about 13 minutes, but the interface lag is very bad.
I also have a GTX 460, but at 725/1450MHz, running with blocksize 7; driver is 285.62, BOINC 7.0.14. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
no, no and no
All WUs crashed randomly about 2 to 3 minutes after the start.
Right now I'm testing the blocksize setting. With a setting of 9 the actual WU runs about 13 minutes, but the interface lag is very bad.
Most likely it's the overclocking and/or temperature. Try lowering the shader and memory clocks, and/or set the fan to run at 100% to lower the GPU temperature.
It's unlikely that changing the block size will help unless you set it so low that the GPU is running much slower than optimal.
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
I've tried two WU's and both errored with error 6: http://www.primegrid.com/result.php?resultid=343885572
____________
|
|
|
|
|
|
Try different drivers. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
I've tried two WU's and both errored with error 6: http://www.primegrid.com/result.php?resultid=343885572
I'll be honest with you: I have no idea why that's happening. Most people who have lots of errors usually have trouble because their GPU is clocked too high. In your case, however, the error is different and your GPU isn't overclocked.
Your drivers are fine. There's enough video memory. There's no overclocking. There's nothing I can see from here that should be a problem.
Is there anything unusual about your setup? Any programs that are running that might be interfering with the GPU? Watch any videos while the GPU was crunching? Did you use Remote Desktop, or switch to another user login? Play any games? What's the GPU temperature?
____________
My lucky number is 75898524288+1 |
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2259 ID: 1178 Credit: 11,068,521,226 RAC: 11,429,379
                                        
|
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
The environment is incorrect. (0xa) - exit code 10 (0xa)
</message>
<stderr_txt>
GeneferCUDA-boinc 1.06 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
Command line: projects/www.primegrid.com/primegrid_genefer_1.06_windows_intelx86__cuda32_13.exe -boinc -q hidden --device 0
Priority change succeeded.
GPU=GeForce GTX 460
Global memory=1073741824 Shared memory/block=49152 Registers/block=32768 Warp size=32
Max threads/block=1024
Max thread dim=1024 1024 64
Max grid=65535 65535 65535
CC=2.1
Clock=1622 MHz
# of MP=7
No project preference specified; using SHIFT=7
maxErr during b^N initialization = 0.0000 (0.198 seconds).
Testing b^262144+1...
maxErr exceeded for 609980^262144+1, 0.5000 > 0.4500
20:12:38 (14924): called boinc_finish
</stderr_txt>
]]>
Your card is overclocked considerably (above the stock 1350 shader clock--manual OC or factory OC is not different). Three problems are the most likely, and I'd approach solutions to those in this order:
1) Heat...if you are running over 80C on the card, get it lower than that. Increase the GPU fan speed if you can, but often this won't help much if it's the room or computer case that's creating the heating problem. Try opening up the case and see where your temps are (or open a window if you are in the northern hemisphere to let a little winter in).
2) If not heat, the higher shader clocks may just be enough to cause errors (not all GTX 460--or any other card series for that matter--are created equal). Try dropping shaders by 54MHz increments (from 1620, which is what your 1622 is really running at--i.e., try it at 1566...then 1512, etc. until it is stable).
3) drop your memory clocks some. This is the fix for the GTX 550 Ti, but some of the 1GB 460 cards are a second version with 192-bit memory bus that may have the same error. If your 460 is one of these, try this fix first.
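Scott's step-down advice in point 2 can be sketched as a simple schedule (a hedged illustration only: the 54 MHz step and the 1350 MHz stock floor come from his post, and `stepdown_schedule` is a hypothetical helper, not part of any PrimeGrid tool):

```python
def stepdown_schedule(start, floor, step=54):
    """Candidate shader clocks to try, dropping in fixed steps until the stock floor."""
    clocks = []
    c = start - step
    while c >= floor:
        clocks.append(c)
        c -= step
    return clocks

# From a 1620 MHz shader clock down to the stock 1350 MHz:
# [1566, 1512, 1458, 1404, 1350]
```

Stop at the first setting that runs error-free over several WUs.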
____________
141941*2^4299438-1 is prime!
|
|
|
|
|
Is there anything unusual about your setup? Any programs that are running that might be interfering with the GPU? Watch any videos while the GPU was crunching? Did you use Remote Desktop, or switch to another user login? Play any games? What's the GPU temperature?
At the time of the first error, I wasn't on the computer. But at the second time, the only thing the computer was running was Firefox and GPU-Z. The temperature was 49º C
Oh, and in the BOINC log also appeared "Output file ... for task ... absent ... " both times.
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Is there anything unusual about your setup? Any programs that are running that might be interfering with the GPU? Watch any videos while the GPU was crunching? Did you use Remote Desktop, or switch to another user login? Play any games? What's the GPU temperature?
At the time of the first error, I wasn't on the computer. But at the second time, the only thing the computer was running was Firefox and GPU-Z. The temperature was 49º C
Oh, and in the BOINC log also appeared "Output file ... for task ... absent ... " both times.
No idea what's wrong. I don't have any personal experience with the 430, and not too many people have spoken up about their experiences with it. Maybe Scott has an idea since he seems to have one of everything. ;-)
It certainly isn't the temperature. I think my GPU is hotter than that when it's powered off. :)
____________
My lucky number is 75898524288+1 |
|
|
Lumiukko Volunteer tester Send message
Joined: 7 Jul 08 Posts: 165 ID: 25183 Credit: 749,143,289 RAC: 40,743
                           
|
I see absolutely no reason why it doesn't get work. Can you send me (admin@primegrid.com) C:\programdata\boinc\sched_request_www.primegrid.com.xml after attempting to get work?
Anything new on the GTX 260-front, Rytis?
I still get the "no work"-message...
Now that the GeneferCUDA for BOINC went public I also tried, but
I'm having the same issue as DoctorNow with my two GTX275 and GTX285.
These hosts:
70038 Win7 x64 BOINC 6.12.34 driver 285.62
70039 Win7 x64 BOINC 6.12.34 driver 285.62
70044 WinXP x86 BOINC 6.12.34 driver 280.26
Not able to get any GeneferCUDA work.
My Linux host 101168 with GTX480 (with same venue as Win hosts) does get work. (BOINC 6.12.34, driver 290.10).
Something still wrong with the GTX200-series?
--
Lumiukko |
|
|
|
|
Your card is overclocked considerably (above the stock 1350 shader clock--manual OC or factory OC is not different). Three problems are the most likely, and I'd approach solutions to those in this order:
1) Heat...if you are running over 80C on the card, get it lower than that. Increase the GPU fan if you can, but often this won't help much if the conditions are the room or computer case that are creating the heating problem. Try opening up the case and see where your temps are at (or a window if you are in the northern hemisphere to let a little winter in).
2) If not heat, the higher shader clocks may just be enough to cause errors (not all GTX 460--or any other card series for that matter--are created equal). Try dropping shaders by 54MHz increments (from 1620, which is what your 1622 is really running at--i.e., try it at 1566...then 1512, etc. until it is stable).
3) drop your memory clocks some. This is the fix for the GTX 550 Ti, but some of the 1GB 460 cards are a second version with 192-bit memory bus that may have the same error. If your 460 is one of these, try this fix first.
1.) Temp is ~60°C at 99% Load. I don't think this is too hot.
2.) The clocks are the stock settings for this card (MSI N460GTX Hawk) and I never had a problem with them.
3.) It seems my card has a 192-bit memory interface.
I just got a new bunch of workunits and the first one is finished. I didn't change anything on the GPU, just set the blocksize setting to 6, which results in 0.6 usage of one CPU core, but it finished.
Edit: blocksize setting back to 0 = error after 10 minutes |
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2259 ID: 1178 Credit: 11,068,521,226 RAC: 11,429,379
                                        
|
Is there anything unusual about your setup? Any programs that are running that might be interfering with the GPU? Watch any videos while the GPU was crunching? Did you use Remote Desktop, or switch to another user login? Play any games? What's the GPU temperature?
At the time of the first error, I wasn't on the computer. But at the second time, the only thing the computer was running was Firefox and GPU-Z. The temperature was 49º C
Oh, and in the BOINC log also appeared "Output file ... for task ... absent ... " both times.
No idea what's wrong. I don't have any personal experience with the 430, and not too many people have spoke up about their experiences with it. Maybe Scott has an idea since he seems to have one of everything. ;-)
It certainly isn't the temperature. I think my GPU is hotter than that when it's powered off. :)
Not quite one of everything. ;)
Output file absent errors often occur when there are permission errors with the BOINC files or directories. On Linux this happens a lot when users haven't set permissions correctly, but since you are on Windows, I'd check to make sure that your anti-virus program isn't the culprit; you might even need to exempt the BOINC directories from the virus scanner.
____________
141941*2^4299438-1 is prime!
|
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2259 ID: 1178 Credit: 11,068,521,226 RAC: 11,429,379
                                        
|
1.) Temp is ~60°C at 99% Load. I don't think this is too hot.
2.) Clocks are the Stocksettings for this Card (MSI N460GTX Hawk) and I never had a problem with these.
3.) It seems like my Card has a 192-bit memory interface.
I just got a new bunch of workunits and the first one is finished. I didn't change anything on the GPU just set the blocksize setting to 6, results in a .6 usage of one core but its finished.
Edit: blocksize setting back to 0 = error after 10 minutes
Yes! (forgive my excitement, but it is nice to have the 192-bit GTX 460 in the game as this might get us closer to the memory errors found on the GTX 550 Ti and other cards like it) Also, just to verify, yours is a GTX 460 with 192-bit bus and 1GB of video RAM (not the 768mb version), correct?
Getting some work to complete with no real adjustments to the card may happen (I was getting about 50% completing normally on the default clocks with superclocked 550). Keep your clocks as you had them for core and shader. Also, keep the default block size set at 7. Drop your memory clock significantly and see if that corrects things. On the GTX 550's, we dropped to 1700 clock with no errors and some have been able to gradually increase the memory clock up to 2100. I don't think it is a fixed clock level, however, but more likely a ratio of timings with whatever is going on with the rest of the card (e.g., core and shader clocks, memory load, etc.).
Also, if you are interested, I am sure that Mike would love the opportunity to see one of these 192-bit 1GB version 2 GTX 460 cards under some of his testing applications that we have been trying out on the GTX 550's.
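The gradual re-increase Scott describes (drop to a known-good memory clock, then creep back up) can be sketched as a walk-up from a stable floor. A hedged illustration only: `is_stable` stands in for a long real-world burn-in at each clock, and the 50 MHz step is an arbitrary example, not a recommendation:

```python
def highest_stable_clock(floor, ceiling, is_stable, step=50):
    # Start from a memory clock known to be error-free and raise it in small
    # steps, stopping just below the first unstable setting.
    clock = floor
    while clock + step <= ceiling and is_stable(clock + step):
        clock += step
    return clock

# With a card that errors above 2100 MHz, the walk-up from 1700 stops at 2100.
```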
____________
141941*2^4299438-1 is prime!
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Output file absent errors often occur when there are permission errors with the BOINC files or directories. On Linux this happens alot when users haven't set permissions correctly, but since you are on Windows, I'd check to make sure that your anti-virus program isn't the culprit; you might even need to exempt the BOINC directories from the virus scanner.
That's ONE cause of the output file missing, but it's not the only cause. That's not the problem here. This was a case where the program itself had an error. The cuFFT failed for an unknown reason. The program aborted after that, which resulted in there being no output file.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Yes! (forgive my excitement, but it is nice to have the 192-bit GTX 460 in the game as this might get us closer to the memory errors found on the GTX 550 Ti and other cards like it) Also, just to verify, yours is a GTX 460 with 192-bit bus and 1GB of video RAM (not the 768mb version), correct?
Scott, the recent evidence points to a more mundane problem: I think some of the 550 Ti's simply can't handle the temperatures produced at the clock rates they're shipped with.
The failure mode on the 550 Ti seems to be identical to the failures on all other cards, except that it's happening at lower (compared to stock) speeds and temperatures. But the 550 Ti comes aggressively clocked from the factory.
The fact that decreasing temperature alone (via increasing the fan speed) seems to eliminate the problem would tend to rule out any kind of software or logical design flaw. On at least some cards, it's close to certain that the FPU is simply making errors due to a combination of clock speed and temperature.
I can easily explain why software could cause this problem with varying clock speeds. But I can't think of a single scenario where software (or logic design) could cause a problem that varies with temperature. That's just a plain old hardware fault. The solution is to slow the card down and cool it off.
Since the Ti's symptoms are identical to the symptoms on the other cards, and the 550 Ti investigation has, so far, not produced any usable information, I have to assume that the 550 Ti's problem is the same problem as what we see with too much overclocking on the other GPUs. It's just starting off at a higher clock rate.
I guess "Ti" stands for "Temperature==inferno". :)
____________
My lucky number is 75898524288+1 |
|
|
|
|
Yes! (forgive my excitement, but it is nice to have the 192-bit GTX 460 in the game as this might get us closer to the memory errors found on the GTX 550 Ti and other cards like it) Also, just to verify, yours is a GTX 460 with 192-bit bus and 1GB of video RAM (not the 768mb version), correct?
Getting some work to complete with no real adjustments to the card may happen (I was getting about 50% completing normally on the default clocks with superclocked 550). Keep your clocks as you had them for core and shader. Also, keep the default block size set at 7. Drop your memory clock significantly and see if that corrects things. On the GTX 550's, we dropped to 1700 clock with no errors and some have been able to gradually increase the memory clock up to 2100. I don't think it is a fixed clock level, however, but more likely a ratio of timings with whatever is going on with the rest of the card (e.g., core and shader clocks, memory load, etc.).
Also, if you are interested, I am sure that Mike would love the opportunity to see one of these 192-bit 1GB version 2 GTX 460 cards under some of his testing applications that we have been trying out on the GTX 550's.
Unfortunately, I made a mistake about the 192-bit bus. It's really 256-bit; there was just a site that stated it was 192-bit.
|
|
|
|
|
Scott, the recent evidence points to a more mundane problem: I think some of the 550 Ti's simply can't handle the temperatures produced at the clock rates they're shipped with.
I guess "Ti" stands for "Temperature==inferno". :)
My 550 Ti never went above 53°C running at full load (99%) with a high OC. That didn't prevent errors from showing up. At stable clocks (i.e., with just one error in over 100 GFN262k tasks) temps are 51-52°C. So I do not think that high temperatures can explain most of the errors seen. |
|
|
|
|
|
When I bought my 560 OC version, it had problems with the OC clock speed shipped from the factory. The solution was easy: I just needed to add a few mV to the GPU voltage, and from then on there were no errors or any problems. So an aggressive factory OC can be the reason. Once more: test your card under full GPU stress. You can use FurMark, OCCT or 3DMark. Run the test at least one hour before you declare the card error-free. I personally like OCCT since it shows errors, and if your card gets even one error in at least one hour, your shader or memory clock is too high. Then you have two choices: try to lower the shader and memory frequencies, or boost the GPU voltage a little more and try again.
____________
92*10^1439761-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
314187728^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! |
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2259 ID: 1178 Credit: 11,068,521,226 RAC: 11,429,379
                                        
|
Scott, the recent evidence points to a more mundane problem: I think some of the 550 Ti's simply can't handle the temperatures produced at the clock rates they're shipped with.
I guess "Ti" stands for "Temperature==inferno". :)
My 550 Ti never went above 53°C running at full load (99%) with a high OC. That didn't prevent errors from showing up. At stable clocks (i.e., with just one error in over 100 GFN262k tasks) temps are 51-52°C. So I do not think that high temperatures can explain most of the errors seen.
I have to say that heat isn't the culprit in my case either. The only difference in my card failing at 50% rate and having zero errors over more than a week of crunching now is the lower memory clock, which did not alter the card temp at all (or maybe 1C lower).
____________
141941*2^4299438-1 is prime!
|
|
|
|
|
I guess "Ti" stands for "Temperature==inferno". :)
My 550 Ti never went above 53°C running at full load (99%) with a high OC. That didn't prevent errors from showing up. At stable clocks (i.e., with just one error in over 100 GFN262k tasks) temps are 51-52°C. So I do not think that high temperatures can explain most of the errors seen.
The cards in my host run at a lot higher temperatures. This could be a reason why I errored at 2100MHz while yours runs fine at that clock. I will check the temperature difference next time while running at 1900MHz, which produces no errors on mine.
Regards Odi
____________
|
|
|
|
|
|
http://www.primegrid.com/result.php?resultid=343969808
GPU=GeForce GTX 460
Global memory=1073741824 Shared memory/block=49152 Registers/block=32768 Warp size=32
Max threads/block=1024
Max thread dim=1024 1024 64
Max grid=65535 65535 65535
CC=2.1
Clock=1620 MHz
# of MP=7
No project preference specified; using SHIFT=7
Resuming b^262144+1 from a checkpoint (3342335 iterations left)
608120^262144+1 is a probable composite. (RES=6a6a6a6a6a6a6a6a) (1516239 digits) (err = 0.1172) (time = 1:23:13) 07:45:31
07:45:31 (4184): called boinc_finish
It isn't maxErr, but I'm sure the WU will be wrong.
Overclocking/overheating again?
FYI.
Actually the GPU worked at 1500MHz.
I changed the clock while the WU was running.
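A residue like RES=6a6a6a6a6a6a6a6a, one byte repeated eight times, is itself a strong hint of a hardware fault rather than a genuine result. A hedged sketch of such a plausibility check (my own illustration, not the validator's actual logic):

```python
def looks_corrupted(res_hex):
    # A 64-bit residue whose eight bytes are all identical (e.g. "6a6a6a6a6a6a6a6a")
    # almost certainly came from a memory/FPU fault, not a real computation.
    byte_pairs = [res_hex[i:i + 2] for i in range(0, len(res_hex), 2)]
    return len(set(byte_pairs)) == 1

assert looks_corrupted("6a6a6a6a6a6a6a6a")      # the residue above
assert not looks_corrupted("1f3a9c0de4b27781")  # a normal-looking residue
```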
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
(RES=6a6a6a6a6a6a6a6a)
It isn't maxErr, but I'm sure the WU will be wrong.
Overclocking/overheating again?
Looks like it. It appears to be the same kind of problem as MaxErr, but there's a small chance the WU will finish without the error being detected. Except that the result is wrong and won't validate, of course.
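Conceptually, maxErr measures how far the floating-point transform results drift from exact integers; once the largest drift crosses the threshold (0.45 in the logs above), the rounding is no longer trustworthy. A toy illustration of the idea (assumptions: this is my own sketch of the concept, not Genefer's actual code):

```python
def max_round_err(values):
    # In an FFT-based multiply, every intermediate value should land very close
    # to an integer; the largest distance to the nearest integer is "maxErr".
    return max(abs(v - round(v)) for v in values)

THRESHOLD = 0.45

# Healthy iteration: tiny drift, well under the threshold.
assert max_round_err([3.0001, -7.9998, 12.0]) < THRESHOLD
# A value near the halfway point means the rounding is meaningless.
assert max_round_err([5.5, 12.0]) >= THRESHOLD
```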
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
At the moment GFN is running stable on my GPU. Memory clock set to 1700 MHz, down from 1950 MHz. Next I will try to set the clock back up step by step; let's see how far I can go. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
At the moment GFN is running stable on my GPU. Memory Clock set to 1700 MHz down from 1950 MHz. Next I will try to set the clock back up step by step, let's see how far I can go.
I would recommend that once you find the "safe" limit that you back off considerably for GFN crunching.
GFN seems to be very sensitive to clock speed, temperature, or both, and slight changes in ambient temperature could make a previously stable GPU unstable. With WUs that are going to run a week or more, you'll want your GPU to be stable for the whole run. An error after 7 days of crunching won't be much fun.
____________
My lucky number is 75898524288+1 |
|
|
|
|
At the moment GFN is running stable on my GPU. Memory Clock set to 1700 MHz down from 1950 MHz.
At the moment GFN is running stable on my GPU. Memory Clock set to 1500 MHz down from 1620 MHz.
:)
6 WUs without maxErr at least.
2 of 6 are already valid.
2 of the remaining 4 have no info about the number that was checked:
http://www.primegrid.com/result.php?resultid=344048896
http://www.primegrid.com/result.php?resultid=343912326
I'm not sure I'm so lucky to find 2 primes at once :)
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
2 of the remaining 4 have no info about the number that was checked:
http://www.primegrid.com/result.php?resultid=344048896
http://www.primegrid.com/result.php?resultid=343912326
I'm not sure I'm so lucky to find 2 primes at once :)
That's a bug. I'm not sure what's causing it, but it doesn't affect the results. It certainly has nothing to do with finding a prime, unfortunately.
If there's a prime, you'll see the normal messaging, except that the actual number would be replaced with "b". For example:
b^262144+1 is a probable prime.
____________
My lucky number is 75898524288+1 |
|
|
BiBi Volunteer tester Send message
Joined: 6 Mar 10 Posts: 151 ID: 56425 Credit: 34,290,031 RAC: 0
                   
|
|
======================
/var/lib/boinc-client/projects/www.primegrid.com$ ./primegrid_genefer_1.06_i686-pc-linux-gnu__cuda32_13
GeneferCUDA-boinc 1.06 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
./primegrid_genefer_1.06_i686-pc-linux-gnu__cuda32_13
Usage: GeneferCUDA [options] Options may be specified in any order
Options: -b run benchmarks
-b2 n run benchmarks at N=2^n for different SHIFT values
-shift n Set SHIFT=n to override internal kernel blocking factor
-t run tests of known prime GFNs
-r run residue test on known composite GFNs
-l compute approximate usable upper limit of b at each N
-q "b^N+1" test quick expression
-d N or --device N set device number=N (default 0)
<filename> test GFNs in <filename>, one GFN per line, in the format b N
-v or -V print the startup banner and immediately exit
-boinc operate as a BOINC client app
No options were specified, using interactive mode:
1. bench
2. test
3. test residue
4. normal
1
device number: 0
Error: API mismatch: the NVIDIA kernel module has version 280.13,
but this NVIDIA driver component has version 270.18. Please make
sure that the kernel module and all NVIDIA driver components
have the same version.
GeneferCUDA-boinc.cu(107) : cudaSafeCall() Runtime API error : no CUDA-capable device is detected.
================
I updated to the latest drivers that ubuntu 11.10 will get....
GPU is a 460 |
|
|
|
|
I guess "Ti" stands for "Temperature==inferno". :)
My 550 TI never went above 53º running in full load (99%) with high OC. That did't prevent errors to show up. At stable clocks (ie, with just one error in over 100 GFN262k tasks) temps are 51º-52º. So I do not think that high temperatures can explain most of the errors seen.
The cards in my host running with a lot higher temperature. This could a reason that I errored at 2100Mhz while yours running fine with this. I will check the temp difference next time while running at 1900Mhz, which produces no error on mine.
Regards Odi
Here where I am (Portugal) it is cold (by our standards): the max temp outside is around 12°C. At dawn it has been around 2-3°C in the last few days. Room temperature is less than 18-19°C. The EVGA cards seem to be tuned for low noise instead of low temperature. In my host, with the stock fan speeds, temps would be considerably (10°C or more) higher. So I use MSI Afterburner (a little better than EVGA Precision, IMHO), which has a nice user-defined software automatic fan control feature that I use to make the fans run a little faster than the stock settings. 56-59% fan speed is more than enough to keep temps below 52°C. |
|
|
|
|
|
Hello,
I want to ask the developer if it is possible to add the device number to the log message.
GPU=GeForce GTX 460
Device#= (0,1,2,x)
Global memory=1073741824 Shared memory/block=49152 Registers/block=32768 Warp size=32
Max threads/block=1024
Max thread dim=1024 1024 64
Max grid=65535 65535 65535
CC=2.1
Clock=1460 MHz
# of MP=7
I have some tasks with errors; it would help me identify the faulty element (card). I have two GTX 460s (no SLI).
Thank you
dD |
|
|
rroonnaalldd Volunteer developer Volunteer tester
 Send message
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
                 
|
Error: API mismatch: the NVIDIA kernel module has version 280.13,
but this NVIDIA driver component has version 270.18. Please make
sure that the kernel module and all NVIDIA driver components
have the same version.
GeneferCUDA-boinc.cu(107) : cudaSafeCall() Runtime API error : no CUDA-capable device is detected.
================
I updated to the latest drivers that Ubuntu 11.10 will get....
Please don't use any nVidia driver from the Ubuntu repo. I had nearly the same troubles with my Intrepid install.
Download driver 285.05.33 directly from nVidia and install it with the 32-bit libs.
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
Hello,
I want to ask the developer if it is possible to add the device number to the log message.
GPU=GeForce GTX 460
Device#= (0,1,2,x)
Global memory=1073741824 Shared memory/block=49152 Registers/block=32768 Warp size=32
Max threads/block=1024
Max thread dim=1024 1024 64
Max grid=65535 65535 65535
CC=2.1
Clock=1460 MHz
# of MP=7
I have some tasks with errors; the device number would help me identify the faulty card. I have two GTX 460s (no SLI).
Thank you
dD
Just above that part of the output is the command line:
Command line: projects/www.primegrid.com/primegrid_genefer_1.06_windows_intelx86__cuda32_13.exe -boinc -q hidden --device 0
Priority change succeeded.
GPU=GeForce GTX 460
Global memory=1073741824 Shared memory/block=49152 Registers/block=32768 Warp size=32
Max threads/block=1024
Max thread dim=1024 1024 64
Max grid=65535 65535 65535
CC=2.1
Clock=1350 MHz
# of MP=7
The last parameter of the command line specifies which device you're running on. Also, it really helps a lot if you unhide your computers before asking for help.
____________
My lucky number is 75898524288+1 |
|
|
BiBi Volunteer tester Send message
Joined: 6 Mar 10 Posts: 151 ID: 56425 Credit: 34,290,031 RAC: 0
                   
|
Please don't use any nVidia-driver from the Ubuntu-repo. I had nearly the same troubles with my Intrepid-inst.
Download the driver 285.05.33 directly from nVidia and install it with the 32bit-libs.
OK, thanks for confirming the driver issue. I remember doing that before; I wonder if it still needs to be done from one of the terminal logins.
|
|
|
|
|
Just above that part of the output is the command line:
Command line: projects/www.primegrid.com/primegrid_genefer_1.06_windows_intelx86__cuda32_13.exe -boinc -q hidden --device 0
Priority change succeeded.
GPU=GeForce GTX 460
Global memory=1073741824 Shared memory/block=49152 Registers/block=32768 Warp size=32
Max threads/block=1024
Max thread dim=1024 1024 64
Max grid=65535 65535 65535
CC=2.1
Clock=1350 MHz
# of MP=7
The last parameter of the command line specifies which device you're running on. Also, it really helps a lot if you unhide your computers before asking for help.
OK, my computer ID is
http://www.primegrid.com/show_host_detail.php?hostid=225109
It failed 9 times on device 0 (Gainward Golden Sample) and once on device 1 (Palit Sonic Platinum).
Thank you
dD
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13634 ID: 53948 Credit: 281,255,329 RAC: 18,239
                           
|
OK, my computer ID is
http://www.primegrid.com/show_host_detail.php?hostid=225109
It failed 9 times on device 0 (Gainward Golden Sample) and once on device 1 (Palit Sonic Platinum).
Thank you
dD
The symptoms are indicative of an overclocking problem. Genefer uses parts of the GPU that are not typically used by many other GPU programs and are rarely, if ever, used by games and other graphical programs. Overclocking -- even factory overclocking -- doesn't seem to work as well when running Genefer.
For some cards, in particular the 550 Ti, I suspect that the hardware doesn't work right even at stock clock speeds.
My suggestion is to lower the clock speeds and/or increase the fan speed. There's some evidence to suggest that simply keeping the card colder will help. With the 550 Ti, lowering the memory clock below stock seems to help. I do not know if this applies to other cards.
____________
My lucky number is 75898524288+1 |
|
|
|
|
|
I set the memory clock speed down to 1900 MHz and everything looks OK.
Thank you
dD |
|
|