John Honorary cruncher
Anyone interested in beta testing GeneferCUDA in BOINC, please PM me. You must be running Windows and have a GPU with compute capability >= 1.3.
We are taking a limited number of testers. The validator still needs updating, so no credit is granted for now. Tasks take less than a minute, but they will be bumped up once the current cache is complete.
____________
John Honorary cruncher
The response has been good so far. Thank you!
With the GFN Prime Search transitioning to BOINC, it is even more of a priority to get the sieves to their proper depths.
If you have Win64 (or 64-bit Linux running Windows in a VM), you can participate. Please see GFN Prime Search Sieving.
Depths are respectable right now but it sure would be nice to push them to their goals.
____________
John, can you extend the deadline? With only 1 hour, my client aborted all tasks that would miss the deadline.
John, can you extend the deadline? With only 1 hour, my client aborted all tasks that would miss the deadline.
Perhaps lowering your buffer will help?
Normally you would easily make the deadline if your buffer is not too large.
Currently crunching smoothly at ~34 seconds per unit on a GTX 580.
GPU usage is 99% and temps stay pretty low compared to PPS Sieve.
60°C for Genefer versus 75°C for PPS Sieve.
____________
John, can you extend the deadline? With only 1 hour, my client aborted all tasks that would miss the deadline.
Perhaps lowering your buffer will help?
Normally you would easily make the deadline if your buffer is not too large.
Currently crunching smoothly at ~34 seconds per unit on a GTX 580.
GPU usage is 99% and temps stay pretty low compared to PPS Sieve.
I had 0.1, now 0.01. 1 hour is pretty low ^^
OK, that is pretty low.
How long do your units take, and on what GPU?
Ah, you just edited your message already.
I see why I didn't have that problem: I had a PPS Sieve task waiting. I just returned it, and I also got 129 tasks on a 0.1-day buffer.
So I agree, 1 hour is pretty low ;)
____________
OK, that is pretty low.
How long do your units take, and on what GPU?
GTX 460, 46 seconds.
I assume when the tasks get bigger the deadline will be adjusted.
Only 500 remaining in the cache according to the front page.
I just OC'ed the core/shaders of my card by 15%, but strangely enough it had no effect on runtimes.
____________
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
I just OC'ed the core/shaders of my card by 15%, but strangely enough it had no effect on runtimes.
That depends on how much you OC'ed the shaders. Keep in mind two things:
1) The shader frequency can be adjusted in single-unit increments in software like MSI Afterburner, but the actual hardware only adjusts in clock "chunks". For example, on my old 9600 GSO, a 1700 shader clock does not bump up the clock to the next "chunk", but going to a 1703 clock does (and you need to go past 1728 for the next one, if I remember correctly). You might not be pushing past a particular block of shader frequency, whereas one or two more clock ticks would do it (sorry, I do not remember how the frequency blocks are organized on Fermi cards).
2) Even ignoring the above point, GeneferCUDA uses the DP capabilities of the card. Whereas increasing the shader clocks gives a directly noticeable gain on SP applications like PPS Sieve (i.e., a 10% shader clock increase gives about a 10% performance increase), the effect on DP workloads is much lower. This is because the DP capability of consumer cards is limited to 1/8th of the SP FLOPS. On your GTX 580 with stock 1544 shader clocks, the theoretical DP throughput is about 197 GFLOPS (1581 GFLOPS for SP). OC the shaders to 1644 and the DP throughput only increases to about 210 GFLOPS (1683 GFLOPS for SP)... about a 6% gain, but a 102 GFLOPS gain in SP vs. a 13 GFLOPS gain in DP. With the 34 sec/task time you reported above, that 6% gain would only equate to about 2 seconds faster (assuming the same stock starting clocks, your 15% OC would equate to about a 4 sec/task reduction). Add in the fact that actual FLOPS run lower than the theoretical, and you should see very little gain, if any... a quirk of the combination of extremely short app times and the 1/8th DP capability issue.
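To make the arithmetic above concrete, here is a minimal sketch of that theoretical-throughput calculation. It assumes a GTX 580 with 512 CUDA cores, 2 single-precision operations per shader clock (fused multiply-add), and the 1/8 SP-to-DP ratio of consumer Fermi GeForce cards mentioned above; the numbers are back-of-the-envelope estimates, not measurements.

#include <cstdio>

// Theoretical throughput for a consumer Fermi GeForce card.
// Assumptions: 2 SP operations per shader clock per CUDA core (FMA),
// and DP limited to 1/8 of SP on GeForce parts.
static double sp_gflops(int cuda_cores, double shader_mhz) {
    return cuda_cores * (shader_mhz / 1000.0) * 2.0;  // GFLOPS
}

int main() {
    const int cores = 512;                              // GTX 580
    const double clocks[] = {1544.0, 1644.0, 1776.0};   // stock, mild OC, ~15% OC
    for (double clk : clocks) {
        double sp = sp_gflops(cores, clk);
        double dp = sp / 8.0;                           // 1/8 SP:DP ratio on GeForce
        std::printf("shaders %.0f MHz: SP ~%.0f GFLOPS, DP ~%.0f GFLOPS\n", clk, sp, dp);
    }
    return 0;
}

At stock clocks this reproduces the ~1581 SP / ~197 DP GFLOPS figures above (within rounding).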
____________
141941*2^4299438-1 is prime!
I used MSI Afterburner.
I bumped the core/shaders up to 825/1650.
That takes 1 second off the runtime, hehe.
I guess it's not much use to OC it, at least not for these small units.
Thanks for explaining.
____________
Just finished my first Genefer - GTX 460 - 20 seconds - o/c'd on the shaders to 1790.
____________
@AggieThePew
Michael Goetz Volunteer moderator Project administrator
I assume when the tasks get bigger the deadline will be adjusted.
Only 500 remaining in the cache according to the front page.
LOL.
It will be increased just a tiny, itsy, bitsy, bit.
I don't know exactly how we're going to ramp up to it, since it's still being discussed, but we're currently running really small WUs just for testing.
When we get going for real, this project's goal is to find a new world record prime number. Not a new world record GFN prime. Not a new record for PrimeGrid. A new world record prime, period. To be completely unambiguous: #1 on Chris Caldwell's top 500 prime list. That's the goal here.
These test WUs take 40 seconds on my GPU.
The real WUs take about 8 days. They'll actually start slightly shorter than that because we'll be starting just below world record territory. But after the first 200 or so WUs, the run time will be about 8 days, and it will slowly climb from there.
Times are for a stock 460.
It's not up to me, but my guess would be a deadline of 3 weeks is probably appropriate. That's similar to SoB.
Mike
Yeah, the deadline will be a little bit longer. ;-)
____________
My lucky number is 75898524288+1
8 days on a single unit on a GPU?
I hope the save points work properly then ;)
@Rick: 20 seconds? WTF, how did you do that? That's over 50% faster than my GTX 580. Even with the shaders 'only' at 1650, I would expect the GTX 580 to be at least close to the 20 seconds on your GTX 460.
By the way, the cache is empty ;)
____________
@Rick: 20 seconds? WTF, how did you do that? That's over 50% faster than my GTX 580. Even with the shaders 'only' at 1650, I would expect the GTX 580 to be at least close to the 20 seconds on your GTX 460.
By the way, the cache is empty ;)
Not sure really - I've only done the one and it said run time was 20.53 seconds. It pays to have old technology sometimes... NOT.
Really it's the fact that a rat is quicker than a cow, so these short WUs fit real well.
____________
@AggieThePew
Strange, though, that a non-OC'd 460 takes 40 seconds and your OC'd card does it in half the time?
That would be double the speed, and given Michael's explanation above, it doesn't seem very trustworthy.
It sounds like something is wrong there?
Too bad there is no validator active yet to see whether the task is valid.
____________
Michael Goetz Volunteer moderator Project administrator
8 days on a single unit on a GPU?
I hope the save points work properly then ;)
It does.
In fact, unlike most boinc programs, I made genefer's boinc interface a little bit atypical, such that if boinc shuts genefer down (i.e., you suspend the task with "keep tasks in memory" turned off), Genefer will write a checkpoint immediately before shutting down. That way, you don't lose any work done since the last checkpoint.
It may (or may not) checkpoint when you shut boinc down completely, such as when you reboot. I ought to test that, but if it doesn't checkpoint when boinc shuts down, there's really nothing more I could do beyond what I'm already doing.
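For the curious, here is a minimal sketch of that kind of shutdown handling using the standard BOINC API. It is not Genefer's actual code: write_checkpoint() and total_steps are placeholders, and the only point is the pattern of polling the client status each iteration and saving a checkpoint before exiting on a quit request.

#include <cstdlib>
#include "boinc_api.h"

static const long total_steps = 1000000;   // placeholder for the real iteration count

static void write_checkpoint(long step) {  // placeholder for the app's own state saving
    (void)step; /* ... write state file ... */
}

int main(int argc, char** argv) {
    boinc_init();
    for (long step = 0; step < total_steps; ++step) {
        // ... do one slice of work on the GPU ...

        BOINC_STATUS status;
        boinc_get_status(&status);
        if (status.quit_request || status.no_heartbeat) {
            write_checkpoint(step);        // save first, so no work is lost
            std::exit(0);                  // the client restarts the task from this checkpoint later
        }
        if (boinc_time_to_checkpoint()) {  // normal periodic checkpointing
            write_checkpoint(step);
            boinc_checkpoint_completed();
        }
        boinc_fraction_done((double)step / total_steps);
    }
    boinc_finish(0);                       // normal completion
}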
____________
My lucky number is 75898524288+1
Rick: you have one finished now in 4 seconds? Something is not going as it should, I think ;)
____________
Rick: you have one finished now in 4 seconds? Something is not going as it should, I think ;)
Even though they are not showing an error, I can't see how they are right... so something is amiss, but I'm not reporting it as an error... have to go reboot.
Yep - the beta doesn't like it o/c'd that much... cw works fine at 1790, but I dropped it back down to 1600 and it's now finishing them correctly, it appears... at about 1:22 a unit
So A, never fear, your GPU is faster :)
____________
@AggieThePew
"Tasks are less than a minute"? Not on my GTX 570 apparently :| Just ran 2 units that both needed ~62 seconds. Could this be due to old drivers perhaps?
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
John Honorary cruncher
"Tasks are less than a minute"? Not on my GTX 570 apparently :| Just ran 2 units that both needed ~62 seconds. Could this be due to old drivers perhaps?
New work has been added at the next N. Previous work was at N=8192; current work is at N=16384.
____________
Michael Goetz Volunteer moderator Project administrator
Yep - the beta doesn't like it o/c'd that much... cw works fine at 1790, but I dropped it back down to 1600 and it's now finishing them correctly, it appears... at about 1:22 a unit
Awesome. I was wondering what was causing that particular problem. It's a real computing error, and will, in the next release, show up as a computing error on the boinc pages, rather than as a 'success' that would eventually be tossed by the validator.
What's actually happening, if you're interested, is that for whatever reason (heat, power fluctuations, or just circuits that simply can't respond fast enough) you're getting floating point errors in the result of the math that's being done. The actual culprit could be almost anything on the card -- the actual FPU, memory, the SMs, or who knows what else. But something's not working quite right at that speed.
Genefer (and llrCUDA, which has a lot of similarities) literally use different circuitry than the sieves use. In particular, just like their CPU counterparts, the sieves do integer math and the primality programs use floating point math. I'm not surprised that you can overclock one type of program more than the other.
Personally, I'd recommend not overclocking for this project. With the amount of credit that each WU is going to be worth, I wouldn't want computation errors popping up 95% of the way through a WU. Plus, I wouldn't want my wingman to find the world record prime because I had my card overclocked just a smidgeon too much.
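To illustrate what such a computation error looks like to the program, here is a minimal sketch of a round-off check of the sort the "maxErr exceeded" messages later in this thread come from. It is not Genefer's actual code; the 0.45 threshold is taken from the error messages quoted below, and the data is made up. The idea is that after the floating-point arithmetic, every element should sit very close to an integer, and the distance to the nearest integer is the rounding error.

#include <cmath>
#include <cstdio>
#include <vector>

// Largest distance from any element to its nearest integer.
// If this creeps toward 0.5, rounding back to integers is no longer reliable.
static double max_roundoff_error(const std::vector<double>& values) {
    double max_err = 0.0;
    for (double v : values) {
        double err = std::fabs(v - std::nearbyint(v));
        if (err > max_err) max_err = err;
    }
    return max_err;
}

int main() {
    std::vector<double> limbs = {3.0000001, -7.0000452, 12.4999};  // toy data, not real limbs
    const double limit = 0.45;  // threshold seen in the messages quoted later in the thread
    double err = max_roundoff_error(limbs);
    if (err > limit)
        std::printf("maxErr exceeded, %.4f > %.4f\n", err, limit);
    else
        std::printf("OK, maxErr = %.4f\n", err);
    return 0;
}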
____________
My lucky number is 75898524288+1
Will the current work be validated, or is it just used for test purposes?
Michael Goetz Volunteer moderator Project administrator
Will the current work be validated, or is it just used for test purposes?
It's my understanding that the plan is for the current work to be validated (those that ARE valid, that is), and that they will receive credit. However, I have no insight into what the problem is, so it's conceivable that they can't validate those WUs.
That being said, from my perspective, they're just for test purposes. It's possible they're also being used to double check work that's been done previously, but I don't think so.
I don't expect testing to last long. The guts of this program have been in production for a year already; the testing is primarily to ensure that genefer and boinc play nicely together.
____________
My lucky number is 75898524288+1
Michael Goetz Volunteer moderator Project administrator
It may (or may not) checkpoint when you shut boinc down completely, such as when you reboot. I ought to test that, but if it doesn't checkpoint when boinc shuts down, there's really nothing more I could do beyond what I'm already doing.
I just checked, and it will checkpoint during a boinc shutdown.
____________
My lucky number is 75898524288+1
Michael, is the BOINC GeneferCUDA application slower than the original one in PSA?
Cheers.
Michael Goetz Volunteer moderator Project administrator
Michael, is the BOINC GeneferCUDA application slower than the original one in PSA?
Cheers.
The one we're using for boinc should be the fastest of all of them.
However, there's a flaw in the benchmarks of older versions, so the benchmarks will say that 0.99 (what they're using in PSA right now) is faster. But if you actually run real tests through the two, you should get identical run times.
But pretty soon the boinc version will be faster than any other, which will be most noticeable at higher Ns. There's a CPU-bound initialization process which is trivial at lower N. However, at N=4194304, this initialization takes two hours. I've been able to optimize that two-hour phase down to 21 minutes, and I think I can get it down to about a minute. That's not released yet, so the boinc version is pretty much the same speed as the PSA version. (The boinc version, by the way, is compatible with PRPNet.)
EDIT: I think they're using 0.97 on PRPNet, but the timing should be the same.
____________
My lucky number is 75898524288+1
I got some -161 errors within the 32768 search. I cannot find the problem elsewhere.
I got some -161 errors within the 32768 search. I cannot find the problem elsewhere.
Is your GPU running hot?
I got the same error on some WUs, but after a restart I have no problem.
As you all can see, I raised the delay to 12 hours.
Lennart
I got some -161 errors within the 32768 search. I cannot find the problem elsewhere.
Is your GPU running hot?
I got the same error on some WUs, but after a restart I have no problem.
As you all can see, I raised the delay to 12 hours.
Lennart
No, it's cold, and thanks for the extended deadline.
Somehow my BOINC is refusing to get more than 1 unit at a time at the moment.
I have my buffer at 0.20 days and a task only takes 2 minutes.
But it doesn't ask for more units until that one is finished.
____________
Somehow my BOINC is refusing to get more than 1 unit at a time at the moment.
I have my buffer at 0.20 days and a task only takes 2 minutes.
But it doesn't ask for more units until that one is finished.
Could be a high duration correction factor.
Even with the buffer at 10 days, not even a single extra task.
____________
The current (beta) program is only for 64-bit machines. Will this be the same for the final release, or can there be a 32-bit version?
____________
Member team AUSTRALIA
My lucky number is 9291*2^1085585+1
According to the settings, 32-bit Windows should also work.
____________
According to the settings, 32-bit Windows should also work.
Confirmed. It also runs on one of my x86 hosts...
Regards Odi
____________
Thank you so much, Michael, for such a thorough answer.
I hope the new Kepler won't let us down with its DP performance. I'm saving money to get this new card to hunt for those elusive megaprimes.
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Does anyone know if the issues with GeneferCUDA and the GTX 550 Ti have been fixed in the BOINC version?
____________
141941*2^4299438-1 is prime!
Michael Goetz Volunteer moderator Project administrator
I got some -161 errors within the 32768 search. I cannot find the problem elsewhere.
Could you share a link to one of those results, or post the whole stderr output from the result page?
Thanks.
____________
My lucky number is 75898524288+1
Michael Goetz Volunteer moderator Project administrator
The current (beta) program is only for 64-bit machines. Will this be the same for the final release, or can there be a 32-bit version?
As far as I know, that's incorrect. Certainly, it wasn't correct when we opened the beta up, as I've seen at least one 32-bit XP and one 32-bit Windows 7 computer successfully complete WUs.
If something has changed, it was unintentional. GeneferCUDA is a 32-bit app, and is likely to stay a 32-bit app. I saw no speed improvement at 64 bits (which is exactly what I would expect, since the CPU isn't really doing much).
However, the beta -- and production -- is only for GPUs with compute capability 1.3 or above. GeneferCUDA (and llrCUDA) need double-precision floating-point hardware, which isn't available on older GPUs. So you're limited to 4xx, 5xx, and GTX (NOT GTS) 2xx class GPUs (i.e., GTX 260 and above). Note that the 3xx series of GPUs is only CC 1.2 and cannot be used.
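If you want to check a card before attaching it, here is a minimal sketch using the CUDA runtime API to read the compute capability. The 1.3 threshold is the requirement stated above; everything else is just illustrative.

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA-capable GPU found.\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Double precision requires compute capability 1.3 or higher.
        bool has_dp = (prop.major > 1) || (prop.major == 1 && prop.minor >= 3);
        std::printf("GPU %d: %s, compute capability %d.%d -> %s\n",
                    i, prop.name, prop.major, prop.minor,
                    has_dp ? "OK for GeneferCUDA" : "too old (no double precision)");
    }
    return 0;
}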
____________
My lucky number is 75898524288+1
Michael Goetz Volunteer moderator Project administrator
Does anyone know if the issues with GeneferCUDA and the GTX 550 Ti have been fixed in the BOINC version?
Unfortunately not. That part of the code is unchanged, and neither I nor Shoichiro have any idea what's wrong with it. IIRC, he said that HIS 550 Ti works fine under Linux. I'm not sure if it's Linux, or if it's that only some of the 550 Tis are affected.
Warning: Pure, unadulterated, right off the top of my head speculation follows. It's probably all wrong.
I can think of a few possibilities.
1) Everyone who is having a problem with the 550 Ti has it OC'd too high. The problem seems to be similar to what I'm observing now with some OC'd GPUs. So you could try underclocking the 550 Ti as far down as it will go, as a test, to see if it changes the results. It's possible that, for some reason, "OC'd too high" includes stock clock rates on some cards.
2) Ok, really geeky math nerd stuff follows. There may be one or two people reading this who completely understand this stuff. I'm not one of them ;-)
Almost every type of primality testing program we have uses Fast Fourier Transforms. That includes geneferCUDA, which uses Nvidia's cuFFT libraries.
Nvidia's documentation says it's compatible with the open source FFTW libraries, which makes sense because FFTW is one of the fastest out there. So I suspect cuFFT works, internally, in a similar fashion to the way FFTW works.
FFTW doesn't do things in a predetermined manner. During initialization, it tests various strategies to see which is fastest. Then it executes the FFT with the fastest method.
I'm guessing cuFFT does the same thing. This is important because this means it's not doing the same thing on different computers. So it's entirely possible that on a 550 TI it's doing something differently than on other GPUs. This might be because the different memory architecture makes it faster to, for example, do more floating point operations of type X, which are faster, instead of doing fewer floating point operations of type Y, which are slower. Perhaps the extra floating point ops push the rounding errors over the limit. Again, this is 100% speculation.
If this is true, then it's not just the 550 Ti causing the problem. It's the combination of the 550 Ti and your CPU, since the CPU performance could affect how it decides to execute the FFT.
Or, the different execution path could have a bug in it.
One thing that you can try (forgive me if I already suggested this earlier) is to run a smaller number through GeneferCUDA on the 550 TI. The larger numbers (like a lot of those in the -b tests) are REALLY close to the max-b limits, so it doesn't take much to push them over. Try this command line with either the boinc 1.01 or the PSA 0.97 releases of geneferCUDA and see if it works:
geneferCUDA (or whatever) -q "1234^8192+1"
Or try any of the WUs currently being handed out by boinc. I think they're all pretty small, at least so far.
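For anyone curious where that plan-time decision happens, here is a minimal cuFFT sketch of a double-precision transform. The transform length and the in-place data are placeholders and this is not Genefer's code; the relevant point is that cufftPlan1d() is where the library settles on an execution strategy for the GPU it finds, before cufftExecZ2Z() runs the transform.

#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

int main() {
    const int n = 16384;  // placeholder transform length
    cufftDoubleComplex* data = nullptr;
    cudaMalloc(&data, n * sizeof(cufftDoubleComplex));
    cudaMemset(data, 0, n * sizeof(cufftDoubleComplex));

    // Plan creation is where cuFFT decides how to execute the FFT on this GPU.
    cufftHandle plan;
    if (cufftPlan1d(&plan, n, CUFFT_Z2Z, 1) != CUFFT_SUCCESS) {
        std::printf("cufftPlan1d failed\n");
        return 1;
    }
    // Double-precision complex-to-complex transform, forward direction, in place.
    cufftExecZ2Z(plan, data, data, CUFFT_FORWARD);
    cudaDeviceSynchronize();

    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}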
Mike
____________
My lucky number is 75898524288+1
I tested my "bad" GTX 550 Ti with GeneferCUDA BOINC. At the current search range, I noticed no errors. The error will probably occur again when the gfn262144 and gfn524288 ranges are reached.
Regards Odi
____________
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Unfortunately not. That part of the code is unchanged, and neither I nor Shoichiro have any idea what's wrong with it. IIRC, he said that HIS 550 Ti works fine under Linux. I'm not sure if it's Linux, or if it's that only some of the 550 Tis are affected.
Warning: Pure, unadulterated, right off the top of my head speculation follows. It's probably all wrong.
I can think of a few possibilities.
1) Everyone who is having a problem with the 550 Ti has it OC'd too high. The problem seems to be similar to what I'm observing now with some OC'd GPUs. So you could try underclocking the 550 Ti as far down as it will go, as a test, to see if it changes the results. It's possible that, for some reason, "OC'd too high" includes stock clock rates on some cards.
2) Ok, really geeky math nerd stuff follows. There may be one or two people reading this who completely understand this stuff. I'm not one of them ;-)
Almost every type of primality testing program we have uses Fast Fourier Transforms. That includes geneferCUDA, which uses Nvidia's cuFFT libraries.
Nvidia's documentation says it's compatible with the open source FFTW libraries, which makes sense because FFTW is one of the fastest out there. So I suspect cuFFT works, internally, in a similar fashion to the way FFTW works.
FFTW doesn't do things in a predetermined manner. During initialization, it tests various strategies to see which is fastest. Then it executes the FFT with the fastest method.
I'm guessing cuFFT does the same thing. This is important because this means it's not doing the same thing on different computers. So it's entirely possible that on a 550 TI it's doing something differently than on other GPUs. This might be because the different memory architecture makes it faster to, for example, do more floating point operations of type X, which are faster, instead of doing fewer floating point operations of type Y, which are slower. Perhaps the extra floating point ops push the rounding errors over the limit. Again, this is 100% speculation.
If this is true, then it's not just the 550 Ti causing the problem. It's the combination of the 550 Ti and your CPU, since the CPU performance could affect how it decides to execute the FFT.
Or, the different execution path could have a bug in it.
One thing that you can try (forgive me if I already suggested this earlier) is to run a smaller number through GeneferCUDA on the 550 TI. The larger numbers (like a lot of those in the -b tests) are REALLY close to the max-b limits, so it doesn't take much to push them over. Try this command line with either the boinc 1.01 or the PSA 0.97 releases of geneferCUDA and see if it works:
geneferCUDA (or whatever) -q "1234^8192+1"
Or try any of the WUs currently being handed out by boinc. I think they're all pretty small, at least so far.
Mike
Thanks for the reply Mike.
As for possibility #1, it can be eliminated, I think. My wife's 550 card is a stock-clocked EVGA offering and I never adjusted the clocks (it is not even a factory OC version).
I think we can also eliminate the CPU/GPU combo possibility in #2. Her 550Ti is paired with an AMD 1100T, so I don't think that the CPU performance is an issue.
I also had already done the smaller-numbers test and it worked fine. I think you are on to something with the FFT initialization issue. As I recall, when the 550 Ti first came out, several games had issues with the card due to the odd memory configuration, and these needed some driver tweaking by NVidia to fix (i.e., they wrote a software workaround for the issue on this card). Such tweaking probably equates to something similar to what you have described above... and since the Linux drivers would not be tweaked the same way as the Windows drivers (e.g., some of the game tweaks would not be applicable to Linux), that might explain why problems exist only on Windows boxes.
____________
141941*2^4299438-1 is prime!
Thanks for the insights and answers, Mike.
The WUs are definitely getting longer - now 7+ minutes on my 460 - but Arjant, I'm able to get a cache load of work. Not sure, you may have already solved that issue. I did drop my cache because it looks like it was too high and I had a lot of aborted work.
____________
@AggieThePew
The project seems to be very 'sensitive' towards overclocking...
You can look at my results...
Greetings .. parabol
____________
I'm a prime millionaire !
9*2^3497442+1
I tested my "bad" GTX 550 Ti with GeneferCUDA BOINC. At the current search range, I noticed no errors.
Same with mine, so far.
Thanks for the insights and answers, Mike.
The WUs are definitely getting longer - now 7+ minutes on my 460 - but Arjant, I'm able to get a cache load of work. Not sure, you may have already solved that issue. I did drop my cache because it looks like it was too high and I had a lot of aborted work.
Still not solved. Only getting 1 task at a time, pretty annoying!
Any solutions?
____________
Michael Goetz Volunteer moderator Project administrator
The project seems to be very 'sensitive' towards overclocking...
You can look at my results...
Greetings .. parabol
Yes, same as with the CPU, primality testing is rougher on the hardware than sieving.
____________
My lucky number is 75898524288+1
Have you checked the duration correction factor? Mine has crazily risen to 95!
Michael Goetz Volunteer moderator Project administrator
Thanks for the insights and answers, Mike.
The WUs are definitely getting longer - now 7+ minutes on my 460 - but Arjant, I'm able to get a cache load of work. Not sure, you may have already solved that issue. I did drop my cache because it looks like it was too high and I had a lot of aborted work.
Still not solved. Only getting 1 task at a time, pretty annoying!
Any solutions?
Did you run PPS LLR tasks when they were very large during the challenge? If so, your DCF probably got erroneously raised to a value much higher than it should be.
From the boinc client's project tab, select PrimeGrid and hit the Properties button.
Near the bottom you'll see "Duration correction factor." You'll need to scroll down to see it. If it's higher than about 10, shut down boinc (the whole thing, not just the GUI) and edit client_state.xml to change that value to something reasonable like 1.0. Restart Boinc and see if that fixes the problem.
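For reference, the value lives in the <project> element for PrimeGrid inside client_state.xml; a minimal sketch of the relevant fragment is below (surrounding elements omitted, value illustrative). Only edit it while the client is fully shut down.

<project>
    <master_url>http://www.primegrid.com/</master_url>
    ...
    <duration_correction_factor>1.000000</duration_correction_factor>
</project>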
____________
My lucky number is 75898524288+1
Michael Goetz Volunteer moderator Project administrator
Have you checked the duration correction factor? Mine has crazily risen to 95!
I think when they increased the WU size on genefer, they didn't increase the number of flops in the Work description. Or didn't increase it enough. As a result, all of our computers are now about 100 times slower than they used to be, at least according to BOINC.
Then again, this IS beta testing, and one of the purposes of testing is to figure out what the right values for things like that are.
If everything worked the first time, we wouldn't need testing.
Mike
____________
My lucky number is 75898524288+1
Did you run PPS LLR tasks when they were very large during the challenge? If so, your DCF probably got erroneously raised to a value much higher than it should be.
DCF is rising fast with the Genefer tasks getting longer and longer. The first one I did (yesterday) took ~40 seconds. They're taking over 20 minutes now on my card. I think the estimated size of the tasks has not been updated, causing DCF to go nuts.
update:
If everything worked the first time, we wouldn't need testing.
Agreed. Validation would be nice, though. Not because of credit, but to check the reliability of the GPU.
Out of work now.
Thanks for the insights and answers, Mike.
The WUs are definitely getting longer - now 7+ minutes on my 460 - but Arjant, I'm able to get a cache load of work. Not sure, you may have already solved that issue. I did drop my cache because it looks like it was too high and I had a lot of aborted work.
Still not solved. Only getting 1 task at a time, pretty annoying!
Any solutions?
Did you run PPS LLR tasks when they were very large during the challenge? If so, your DCF probably got erroneously raised to a value much higher than it should be.
From the boinc client's project tab, select PrimeGrid and hit the Properties button.
Near the bottom you'll see "Duration correction factor." You'll need to scroll down to see it. If it's higher than about 10, shut down boinc (the whole thing, not just the GUI) and edit client_state.xml to change that value to something reasonable like 1.0. Restart Boinc and see if that fixes the problem.
It's 100! I have corrected it; let's see if it helps (currently the cache is empty, so no tasks are available).
____________
Thanks for the insights and answers, Mike.
The WUs are definitely getting longer - now 7+ minutes on my 460 - but Arjant, I'm able to get a cache load of work. Not sure, you may have already solved that issue. I did drop my cache because it looks like it was too high and I had a lot of aborted work.
Still not solved. Only getting 1 task at a time, pretty annoying!
Any solutions?
Did you run PPS LLR tasks when they were very large during the challenge? If so, your DCF probably got erroneously raised to a value much higher than it should be.
From the boinc client's project tab, select PrimeGrid and hit the Properties button.
Near the bottom you'll see "Duration correction factor." You'll need to scroll down to see it. If it's higher than about 10, shut down boinc (the whole thing, not just the GUI) and edit client_state.xml to change that value to something reasonable like 1.0. Restart Boinc and see if that fixes the problem.
It's 100. Strange, because when I started the test on Friday I didn't have the problem.
Anyway, I have corrected it; let's see if it helps (currently the cache is empty, so no tasks are available).
I have the same value. The runtimes didn't change anymore. You can only reset the debt values with cc_config.xml.
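If resetting the debts is the goal, a minimal cc_config.xml sketch is below. As far as I know the <zero_debts> option zeroes the long-term debts when the client starts (assumption: the option is available in the client version you run); the file goes in the BOINC data directory, and the client needs a restart to pick it up.

<cc_config>
    <options>
        <zero_debts>1</zero_debts>
    </options>
</cc_config>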
I've noticed that the incredible result of the GTX 570 has a simple explanation:
all of its WUs finished with
maxErr exceeded for ..., 0.5000 > 0.4500
____________
Changing WUs to 262144 brings the error back on the GTX 550 Ti:
<stderr_txt>
5480^262144+1end
[...]
Testing 5480^262144+1...
Testing 5480^262144+1... 3255817 steps to go
Testing 5480^262144+1... 3211264 steps to go
maxErr exceeded for 5480^262144+1, 0.5000 > 0.4500
20:16:46 (6696): called boinc_finish
</stderr_txt>
But without the validator, the task is displayed as finished and not as invalid.
Regards Odi
____________
Editing the client_state.xml has done the job.
Thanks.
Jobs are getting much longer and heavier for the GPU. Temps are rising, and now the lag on the screen is really noticeable.
____________
Changing WUs to 262144 brings the error back on the GTX 550 Ti:
Same here. Doing a test with extreme underclocking (602/1804 vs. the stock 900/1800). Estimated time (my estimate, after 2.5% done) is 7.5 hours...
If it works, could it mean there's really a problem with the memory of this card (GDDR5), unable to keep up with the fast clocks of the cores?
Michael Goetz Volunteer moderator Project administrator
If everything worked the first time, we wouldn't need testing.
Agreed. Validation would be nice, though. Not because of credit, but to check the reliability of the GPU.
I think it would be hard to not get the maxErr exceeded error and still have a computation error, but I agree, it would be nice to have the validator working so we could know that for sure.
____________
My lucky number is 75898524288+1
Michael Goetz Volunteer moderator Project administrator
Editing the client_state.xml has done the job.
Thanks.
Jobs are getting much longer and heavier for the GPU. Temps are rising, and now the lag on the screen is really noticeable.
Temps should be similar to running any other GPU WU. Screen lag... Not sure what to say about that at this point. I don't see lag on my machine (GTX460 and Core2Quad) most of the time.
"Most" being the important word there.
There are two times I notice lag when running GeneferCUDA:
1) When the User Authorization system dims the whole screen
2) When running certain Microsoft apps, like Live Mail. They seem to do something funky with the display driver, and it gets slowed down by genefer. Really not sure why.
Most of the time I don't see any ill effects, at least not until getting to much larger WUs (as in, N >= at least 1 million)
____________
My lucky number is 75898524288+1
Rytis Volunteer moderator Project administrator
If everything worked the first time, we wouldn't need testing.
Agreed. Validation would be nice, though. Not because of credit, but to check the reliability of the GPU.
I think it would be hard to not get the maxErr exceeded error and still have a computation error, but I agree, it would be nice to have the validator working so we could know that for sure.
Here's one for you: http://www.primegrid.com/result.php?resultid=334011936
Edit: also
http://www.primegrid.com/workunit.php?wuid=238917357
http://www.primegrid.com/workunit.php?wuid=238990514
____________
Editing the client_state.xml has done the job.
Thanks.
Jobs are getting much longer and heavier for the GPU. Temps are rising, and now the lag on the screen is really noticeable.
Temps should be similar to running any other GPU WU. Screen lag... Not sure what to say about that at this point. I don't see lag on my machine (GTX460 and Core2Quad) most of the time.
Most of the time I don't see any ill effects, at least not until getting to much larger WUs (as in, N >= at least 1 million)
I'm not seeing temps rising, but the screen lag is there. Not sure if it is caused by the task itself or by the underclock described below (35% after 2h15 now; the previous task, at stock clocks, failed after 2 or 3 minutes with the maxErr error).
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
I am noticing numerous computation errors on a remote machine that is in use doing other things frequently. I have it set to not use GPU when active, and it looks like this may be causing the errors when a user starts working on the box. Can anyone confirm that this happens?
____________
141941*2^4299438-1 is prime!
With the 34-second tasks my GPU was at 61°C; now it's at 75°C.
Which is the same as for PPS Sieve. The same applies to the screen lag.
So that's not strange; I was surprised it wasn't there with the smaller units.
Temps are not a problem, I can always raise my fan speed. But it does indicate the tasks are getting heavier.
____________
I think it would be hard to not get the maxErr exceeded error and still have a computation error, but I agree, it would be nice to have the validator working so we could know that for sure.
Here's one for you: http://www.primegrid.com/result.php?resultid=334011936
Edit: also http://www.primegrid.com/workunit.php?wuid=238917357
If you look, you'll see that the residues are different in those tasks because the error is different for the two wingmen (zero in one and above zero in the other, although within the maxErr limit).
I imagine that unless the error is zero, no consensus will be achieved on Genefer tasks.
It would indeed be nice to have a validator.
I have looked through some of my finished work and see a couple of users with strange runtimes, like this unit:
http://www.primegrid.com/workunit.php?wuid=239173395
I have several units with Ardo as wingman (GTX 570), and he has several of these very short runtimes.
Perhaps my own GTX 580 is generating rubbish results; it would be nice to know.
____________
I may be late to the party, but I have a bunch that show validated. Credit is showing up anywhere from 0.01 to 2+ per unit. I also had one WU that took 11,000+ seconds, but it validated. So far no invalids, but several user-aborted tasks.
Anyone else showing validated work?
____________
@AggieThePew
I may be late to the party, but I have a bunch that show validated. Credit is showing up anywhere from 0.01 to 2+ per unit. I also had one WU that took 11,000+ seconds, but it validated. So far no invalids, but several user-aborted tasks.
Anyone else showing validated work?
Yep. Half a dozen. Max credit 1.08. Guessing badges will be hard to get :)
The GTX 550 Ti really can't handle these tasks. Heavily underclocked, maxErr exceeded after almost three hours:
http://www.primegrid.com/result.php?resultid=334072737
Michael Goetz Volunteer moderator Project administrator
I am noticing numerous computation errors on a remote machine that is in use doing other things frequently. I have it set to not use GPU when active, and it looks like this may be causing the errors when a user starts working on the box. Can anyone confirm that this happens?
I was thinking about this scenario -- having genefer shut down, but stay in memory, might be problematic because the user may screw up the GPU.
I was thinking that having "leave tasks in memory" turned off might actually be better.
If that resolves the problem, I might know how to fix it -- maybe. It depends on how boinc does the in-memory suspend.
____________
My lucky number is 75898524288+1
Michael Goetz Volunteer moderator Project administrator
If everything worked the first time, we wouldn't need testing.
Agreed. Validation would be nice, though. Not because of credit, but to check the reliability of the GPU.
I think it would be hard to not get the maxErr exceeded error and still have a computation error, but I agree, it would be nice to have the validator working so we could know that for sure.
Here's one for you: http://www.primegrid.com/result.php?resultid=334011936
Edit: also
http://www.primegrid.com/workunit.php?wuid=238917357
http://www.primegrid.com/workunit.php?wuid=238990514
For 238917357, one of the two results is wrong, one is right. My GPU matches one of those two. So, I was wrong; you can get a bad result without hitting maxErr.
For 238990514, two of the 3 completed results MATCH, so those two should have validated.
Same thing with 239147407; two of the three residues match and they should have validated.
____________
My lucky number is 75898524288+1
Michael Goetz Volunteer moderator Project administrator
I imagine that unless the error is zero, no consensus will be achieved on Genefer tasks.
Oddly enough, no.
My dinner's getting cold, so no long explanation this time. :)
Errors below 0.45 are to be expected and are OK. That's because we're doing bizarre stuff with integers using floating point arithmetic. The closest analogy I can come up with is using the Transporter from Star Trek to 'beam' a number from one place to another, decomposing it into little tiny bits (pun intended), then putting it back together. Some weirdness (the small rounding errors) is expected and not a problem.
____________
My lucky number is 75898524288+1
Michael Goetz Volunteer moderator Project administrator
I have 2 WUs that did validate, so the validator IS validating. I don't understand why those two WUs that Rytis pointed out did not validate.
____________
My lucky number is 75898524288+1
The GTX 550 Ti really can't handle these tasks. Heavily underclocked, maxErr exceeded after almost three hours:
http://www.primegrid.com/result.php?resultid=334072737
Thanks for the info. I also thought about downclocking the card, but for now I'm retiring my GTX 550 Ti from GeneferCUDA to prevent more invalid results.
But it seems the GTX 570 is also one of these "bad" cards.
Regards Odi
____________
The GTX 550 Ti really can't handle these tasks. Heavily underclocked, maxErr exceeded after almost three hours:
http://www.primegrid.com/result.php?resultid=334072737
Thanks for the info. I also thought about downclocking the card, but for now I'm retiring my GTX 550 Ti from GeneferCUDA to prevent more invalid results.
But it seems the GTX 570 is also one of these "bad" cards.
Regards Odi
Actually, it failed crunching a 524288 task, but then finished a 262144 within the maxErr limit:
http://www.primegrid.com/result.php?resultid=334075619. I'm waiting to see if it will validate. But the card is definitely incapable of doing the longer tasks.
I'm testing a GFN262144 at PRPNet, at 800 core / 1600 shader MHz on the 550 Ti. 1,000,000 steps to go... If it finishes, it will be the first for this card.
Update: it finished successfully in ~two hours.
I've also seen those weird 570 tasks, but all belonged to the same host (Ardo's, if I remember). So it could be an OC problem rather than a model issue (as it seems to be with the 550 Ti).
rroonnaalldd Volunteer developer Volunteer tester
If that resolves the problem, I might know how to fix it -- maybe. It depends on how boinc does the in-memory suspend.
As far as I know, there is no suspend for GPU work. If you suspend a task on the GPU, the task will be completely unloaded from GPU memory and you lose the work state unless there is an existing checkpoint.
This behaviour is totally different from work on CPUs.
The BOINC devs discussed this problem on their mailing list some time in the past, and IIRC the conclusion was that the setting "leave tasks in memory" does not work with GPUs. I don't know if this has ever changed; maybe I simply missed that email. If so, "Ageless" should be able to find this information in his emails; as far as I know, he archives all emails from the list. Or you could try to find this info in the Berkeley mailman archive.
____________
Best wishes. Knowledge is power. by jjwhalen
Michael Goetz Volunteer moderator Project administrator
If that resolves the problem, I might know how to fix it -- maybe. It depends on how boinc does the in-memory suspend.
As far as I know, there is no suspend for GPU work. If you suspend a task on the GPU, the task will be completely unloaded from GPU memory and you lose the work state unless there is an existing checkpoint.
This behaviour is totally different from work on CPUs.
The BOINC devs discussed this problem on their mailing list some time in the past, and IIRC the conclusion was that the setting "leave tasks in memory" does not work with GPUs. I don't know if this has ever changed; maybe I simply missed that email. If so, "Ageless" should be able to find this information in his emails; as far as I know, he archives all emails from the list. Or you could try to find this info in the Berkeley mailman archive.
I came to the same conclusion.
If you kill the GPU task, it can restart from its checkpoint. If it's suspended in memory, there's no guarantee that stuff it has stored on the GPU will be preserved while the task is suspended.
So it's better if the GPU task is killed and restarted. That would work particularly well with my save-before-killing method, since you usually won't lose any work when the task is unloaded.
I would swear, however, that I tested this and genefer *did* stay in memory when it was suspended with the keep in memory flag on. It would be better if it didn't, however.
Of course, the easiest way to see what's going on is just to look at the code for the boinc client.
____________
My lucky number is 75898524288+1
Wow, 1.08 credits max per task. Will this be fixed or revalidated with fixed credits per level?
That's only while testing validation.
Lennart
I keep having problems with no buffer.
I changed the duration correction factor yesterday and did get some more units. Now it's back to 100,000 and I'm only getting 1 task.
____________
Michael Goetz Volunteer moderator Project administrator
I keep having problems with no buffer.
I changed the duration correction factor yesterday and did get some more units. Now it's back to 100,000 and I'm only getting 1 task.
I can't speak with authority on this, but my guess is they've been having trouble with keeping the GFLOPS settings on the workunits accurate. The workunits have varied in duration from 40 seconds to 5 hours, a factor of 500:1. If they didn't *correctly* change the GFLOPS setting each time they changed the WU size, it's going to wreak havoc with the DCF.
Knowing/estimating/guessing what the correct GFLOPS settings should be is not easy.
By the way, there may be some *REAL* WUs in the queue now. Real, in this case, means new, unsearched virgin numbers for our crunching pleasure. If your WU has an N of 262144 and a b above 500,000, those are real WUs. Those should take about 90 minutes to process. They're not yet the 8-day world record search numbers, but they're real numbers to be searched. It's exactly the same as the numbers being searched over on PSA at the GFN262144 port.
Anything with N of 262144 or 524288 is real if the b is greater than the range being searched over on the PSA. If N is greater than 524288, it's a real WU regardless of b.
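To show why a stale fpops estimate wrecks the runtime estimates, here is a rough sketch of how the client's estimated duration behaves. The exact scheduler formula is more involved, and both numbers below are made up; treat this only as an approximation of the mechanism.

#include <cstdio>

// Rough model: estimated duration ~ rsc_fpops_est / device_flops * DCF.
// If the server's fpops estimate stays fixed while the real work grows,
// the client compensates by slowly inflating the duration correction factor.
static double estimated_seconds(double rsc_fpops_est, double device_flops, double dcf) {
    return rsc_fpops_est / device_flops * dcf;
}

int main() {
    const double device_flops = 2.0e11;  // made-up effective device speed
    const double fpops_est    = 8.0e12;  // made-up server-side estimate, left unchanged
    const double dcfs[] = {1.0, 10.0, 100.0};
    for (double dcf : dcfs) {
        std::printf("DCF %.0f -> estimated %.0f s per task\n",
                    dcf, estimated_seconds(fpops_est, device_flops, dcf));
    }
    return 0;
}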
____________
My lucky number is 75898524288+1
I keep having problems with no buffer.
I changed the duration correction factor yesterday and did get some more units. Now it's back to 100,000 and I'm only getting 1 task.
I can't speak with authority on this, but my guess is they've been having trouble with keeping the GFLOPS settings on the workunits accurate. The workunits have varied in duration from 40 seconds to 5 hours, a factor of 500:1. If they didn't *correctly* change the GFLOPS setting each time they changed the WU size, it's going to wreak havoc with the DCF.
Knowing/estimating/guessing what the correct GFLOPS settings should be is not easy.
By the way, there may be some *REAL* WUs in the queue now. Real, in this case, means new, unsearched virgin numbers for our crunching pleasure. If your WU has an N of 262144 and a b above 500,000, those are real WUs. Those should take about 90 minutes to process. They're not yet the 8-day world record search numbers, but they're real numbers to be searched. It's exactly the same as the numbers being searched over on PSA at the GFN262144 port.
Anything with N of 262144 or 524288 is real if the b is greater than the range being searched over on the PSA. If N is greater than 524288, it's a real WU regardless of b.
That is correct! We have real work at N=262144 in now.
Rytis is working on validation, but remember, he has a real life also :)!
We are also working on the credit issue.
Run time increases with N, but also with b.
The delay is at 24 hours now on the new work. This will be increased as well when we come out of beta mode.
Lennart
Pausing the BOINC application leads to an immediate error on the tasks over here.
So that is a serious problem for me. I am not shutting down BOINC every time I want to do something else with my computer... The 'keep tasks in memory' option is not making a difference.
____________
ardo
The GTX 550 Ti really can't handle these tasks. Heavily underclocked, maxErr exceeded after almost three hours:
http://www.primegrid.com/result.php?resultid=334072737
Thanks for the info. I also thought about downclocking the card, but for now I'm retiring my GTX 550 Ti from GeneferCUDA to prevent more invalid results.
But it seems the GTX 570 is also one of these "bad" cards.
Regards Odi
Actually, it failed crunching a 524288 task, but then finished a 262144 within the maxErr limit:
http://www.primegrid.com/result.php?resultid=334075619. I'm waiting to see if it will validate. But the card is definitely incapable of doing the longer tasks.
I'm testing a GFN262144 at PRPNet, at 800 core / 1600 shader MHz on the 550 Ti. 1,000,000 steps to go... If it finishes, it will be the first for this card.
Update: it finished successfully in ~two hours.
I've also seen those weird 570 tasks, but all belonged to the same host (Ardo's, if I remember). So it could be an OC problem rather than a model issue (as it seems to be with the 550 Ti).
That host has two ASUS GTX570 DirectCU II cards with factory settings...
____________
Badge score: 2*5 + 8*7 + 3*8 + 3*9 + 1*10 + 1*11 + 1*13 = 151
I am noticing numerous computation errors on a remote machine that is in use doing other things frequently. I have it set to not use GPU when active, and it looks like this may be causing the errors when a user starts working on the box. Can anyone confirm that this happens?
Confirmed.
And I can even say more. My work GPU is the 2nd one in SLI. My master GPU is free for DC.
When I don't touch the computer, a WU finishes fine.
When I start to use the GPU, for example for Flash 11 (which uses GPU acceleration), WUs break with maxErr.
____________
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Well, it looks like my GTX 550 has completed two of the 262k units successfully. For example...
501082^262144+1 is a probable composite. (RES=23c1964028671da9) (1494197 digits) (err = 0.1016) (time = 1:35:52) 01:38:03
01:38:03 (5904): called boinc_finish
...so I'll keep it churning on these to see if a 524k unit can be done, also.
____________
141941*2^4299438-1 is prime!
Michael Goetz Volunteer moderator Project administrator
Pausing the BOINC application leads to an immediate error on the tasks over here.
So that is a serious problem for me. I am not shutting down BOINC every time I want to do something else with my computer... The 'keep tasks in memory' option is not making a difference.
Your situation has me puzzled at the moment. You're running the same environment (win 7 x64) and the same boinc client (6.12.34) as I am, so that makes things simple. This worked fine when I tested it, so something's different here, and I'm not sure what.
The four errors have this in their output:
Terminating because BOINC client request that we should quit.
12:35:07 (6328): called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>genefer_262144_6241_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
The first part says BOINC told Genefer "It's time to go, stop processing now", and the second part says "Hey, you stopped before you were done, so something must be wrong."
That doesn't make a lot of sense to me at the moment. The first part DOES make sense; when you pause the task, BOINC will shut it down. The second part about not seeing the output file seems to indicate that BOINC forgot it told genefer to shut down.
So, for now, it's a mystery... until it's not. ;-)
____________
My lucky number is 75898524288+1
Terminating because BOINC client request that we should quit.
12:35:07 (6328): called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>genefer_262144_6241_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
The first part says BOINC told Genefer "It's time to go, stop processing now", and the second part says "Hey, you stopped before you were done, so something must be wrong."
That doesn't make a lot of sense to me at the moment. The first part DOES make sense; when you pause the task, BOINC will shut it down. The second part about not seeing the output file seems to indicate that BOINC forgot it told genefer to shut down.
So, for now, it's a mystery... until it's not. ;-)
I also looked at my results and found some -161 errors in the 65536 range. But on these hosts I never stopped and restarted GPU work, because these GPUs run 24/7 exclusively for BOINC. On this machine I only work on a card which is excluded from BOINC.
Maybe it stopped because other GPU work was in the queue, but I don't remember if that was 2 days ago.
Regards Odi
____________
Pausing the BOINC application leads to an immediate error on the tasks over here.
So that is a serious problem for me. I am not shutting down BOINC every time I want to do something else with my computer... The 'keep tasks in memory' option is not making a difference.
Your situation has me puzzled at the moment. You're running the same environment (win 7 x64) and the same boinc client (6.12.34) as I am, so that makes things simple. This worked fine when I tested it, so something's different here, and I'm not sure what.
The four errors have this in their output:
Terminating because BOINC client request that we should quit.
12:35:07 (6328): called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>genefer_262144_6241_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
The first part says BOINC told Genefer "It's time to go, stop processing now", and the second part says "Hey, you stopped before you were done, so something must be wrong."
That doesn't make a lot of sense to me at the moment. The first part DOES make sense; when you pause the task, BOINC will shut it down. The second part about not seeing the output file seems to indicate that BOINC forgot it told genefer to shut down.
So, for now, it's a mystery... until it's not. ;-)
Same problem again; could be 1 in 100.
Michael Goetz Volunteer moderator Project administrator
I also looked at my results and found some -161 errors in the 65536 range. But on these hosts I never stopped and restarted GPU work, because these GPUs run 24/7 exclusively for BOINC. On this machine I only work on a card which is excluded from BOINC.
Maybe it stopped because other GPU work was in the queue, but I don't remember if that was 2 days ago.
Regards Odi
You have a lot of hosts. :)
I found two that had genefer WUs with errors, and in both cases, for all the errors, genefer detected that boinc had told it to shut down. I couldn't find any instances of errors where that was not the case.
But there are a lot of WUs to look through, so I may have missed it. Could you provide a link to one of those WUs?
Thanks.
____________
My lucky number is 75898524288+1 |
|
|
|
For reasons I can't comprehend, BOINC has shoved the genefer unit into high priority because the deadline is <24 hours away, even though the remaining time is <1 hour.
I'm also noticing the screen lag reported before.
As suggested somewhere, I tried freeing up one (or more) CPU cores. This does help a bit, but doesn't remove the screen lag completely. It's just reduced.
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,359,622 RAC: 386,594
                          
|
For reasons I can't comprehend, BOINC has shoved the genefer unit into high priority because the deadline is <24 hours away, even though the remaining time is <1 hour.
I'm also noticing the screen lag reported before.
As suggested somewhere, I tried freeing up one (or more) CPU cores. This does help a bit, but doesn't remove the screen lag completely. It's just reduced.
Yes, that is correct. Having a free core does make the GPU more responsive, both in terms of more crunching time and also screen responsiveness. Sometimes it can make a big difference.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,359,622 RAC: 386,594
                          
|
I thought my browser just died a horrible death. Or the forums died a horrible death. When I went to read this thread, it was somewhere in the PSA section. Now it's not. :) I was very disoriented when I popped up a level and wasn't where I expected to be!
Thanks, John (presumably), for giving Genefer (actually GFN) its own topic.
Mike
____________
My lucky number is 75898524288+1 |
|
|
John Honorary cruncher
 Send message
Joined: 21 Feb 06 Posts: 2875 ID: 2449 Credit: 2,681,934 RAC: 0
                 
|
Thanks, John (presumably), for giving Genefer (actually GFN) its own topic.
Thanks goes to Rytis!
____________
|
|
|
|
I thought my browser just died a horrible death. Or the forums died a horrible death. When I went to read this thread, it was somewhere in the PSA section. Now it's not. :) I was very disoriented when I popped up a level and wasn't where I expected to be!
Thanks, John (presumably), for giving Genefer (actually GFN) its own topic.
Mike
Me too - was wondering what happened.. now all is right again. |
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2179 ID: 1178 Credit: 9,035,206,113 RAC: 13,584,644
                                      
|
Pausing the boinc application leads to an immediate error on the tasks over here.
So that is a serious problem for me. I am not shutting down BOINC every time I want to do something else with my computer... The 'keep tasks in memory' option is not making a difference.
Your situation has me puzzled at the moment. You're running the same environment (win 7 x64) and the same boinc client (6.12.34) as I am, so that makes things simple. This worked fine when I tested it, so something's different here, and I'm not sure what.
The four errors have this in their output:
Terminating because BOINC client request that we should quit.
12:35:07 (6328): called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>genefer_262144_6241_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
The first part says BOINC told Genefer "It's time to go, stop processing now", and the second part says "Hey, you stopped before you were done, so something must be wrong."
That doesn't make a lot of sense to me at the moment. The first part DOES make sense; when you pause the task, BOINC will shut it down. The second part about not seeing the output file seems to indicate that BOINC forgot it told genefer to shut down.
So, for now, it's a mystery... until it's not. ;-)
I am getting tons of these on this host. Same error across different workunit sizes. Machine is remote and set to not use GPU while in use (resumes after 1 minute idle).
____________
141941*2^4299438-1 is prime!
|
|
|
|
Pausing the boinc application leads to an immediate error on the tasks over here.
So that is a serious problem for me. I am not shutting down BOINC every time I want to do something else with my computer... The 'keep tasks in memory' option is not making a difference.
Seeing the same here. I paused a genefer unit because it messed with my video playback, but this only caused it to error out, report, and start a new one.
Relevant units are http://www.primegrid.com/result.php?resultid=334129310 and http://www.primegrid.com/result.php?resultid=334128885
____________
PrimeGrid Challenge Overall standings --- Last update: From Pi to Paddy (2016)
|
|
|
ardo  Send message
Joined: 12 Dec 10 Posts: 168 ID: 76659 Credit: 1,690,471,713 RAC: 0
                   
|
OK, I figured out what might have caused the host with the GTX570s to misbehave: I was using a GT210 as my main video card so that the two GTX570s could use their full potential for number crunching. In that setup I ran the GeneferCUDA benchmark and got maxErr values that were too high for the larger numbers. I took out the GT210 and moved the GTX570s into the slots per the motherboard manual. Rerunning the GeneferCUDA benchmarks, things look much better for each card.
I resumed the BOINC tasks and things look better.
However, I just noticed that when a shorter task on one card finishes, the task on the other card is also "finished" but without an output file...
____________
Badge score: 2*5 + 8*7 + 3*8 + 3*9 + 1*10 + 1*11 + 1*13 = 151
|
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2179 ID: 1178 Credit: 9,035,206,113 RAC: 13,584,644
                                      
|
Terminating because BOINC client request that we should quit.
12:35:07 (6328): called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>genefer_262144_6241_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
Okay, I am getting this kind of error on a machine where the GPU is set to always run, even when the machine is in use. See here. This specific error I can trace to an exact event... my wife started up the game FreeCell on her machine. So it looks like anything active on the GPU other than Genefer might create this error???
____________
141941*2^4299438-1 is prime!
|
|
|
|
This specific error I can trace to an exact event... my wife started up the game FreeCell on her machine. So it looks like anything active on the GPU other than Genefer might create this error???
The 550 Ti seems to be extremely sensitive when running genefers. I've seen it happen after waking the display from the screen saver, and once it happened just from a task moving up the BOINC Manager task page. At least with the current drivers (under Windows), I believe genefer is beyond the 550 Ti's limits. |
|
|
ardo  Send message
Joined: 12 Dec 10 Posts: 168 ID: 76659 Credit: 1,690,471,713 RAC: 0
                   
|
OK, I figured out what might have caused the host with the GTX570s to misbehave: I was using a GT210 as my main video card so that the two GTX570s could use their full potential for number crunching. In that setup I ran the GeneferCUDA benchmark and got maxErr values that were too high for the larger numbers. I took out the GT210 and moved the GTX570s into the slots per the motherboard manual. Rerunning the GeneferCUDA benchmarks, things look much better for each card.
I resumed the BOINC tasks and things look better.
However, I just noticed that when a shorter task on one card finishes, the task on the other card is also "finished" but without an output file...
That last issue was apparently a leftover from before, as the last couple of tasks have been processed to completion successfully...
____________
Badge score: 2*5 + 8*7 + 3*8 + 3*9 + 1*10 + 1*11 + 1*13 = 151
|
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2179 ID: 1178 Credit: 9,035,206,113 RAC: 13,584,644
                                      
|
This is not the "-161" error that we are seeing across lots of cards, which is related to pausing the GPU, etc.
Rather, I am surprised that my GTX 550 TI is completing any of the 262k or larger units at all, as none of these worked on PRPnet. Looking at results so far, I am seeing a very mixed bag. About half result in the maxErr problem that occurred on PRPnet, though at different steps:
Testing 501002^262144+1... 4963553 steps to go
Testing 501002^262144+1... 4915200 steps to go
maxErr exceeded for 501002^262144+1, 0.5000 > 0.4500
04:46:08 (1952): called boinc_finish
Testing 500538^262144+1... 3604480 steps to go
maxErr exceeded for 500538^262144+1, 0.5000 > 0.4500
05:12:57 (4044): called boinc_finish
Testing 500454^262144+1... 2621440 steps to go
maxErr exceeded for 500454^262144+1, 0.5000 > 0.4500
05:58:23 (4280): called boinc_finish
Testing 501922^262144+1... 1114112 steps to go
maxErr exceeded for 501922^262144+1, 0.5000 > 0.4500
14:52:49 (3992): called boinc_finish
Testing 500574^262144+1... 4259840 steps to go
maxErr exceeded for 500574^262144+1, 0.5000 > 0.4500
18:26:48 (4048): called boinc_finish
Testing 502846^262144+1... 3735552 steps to go
maxErr exceeded for 502846^262144+1, 0.5000 > 0.4500
18:12:42 (5412): called boinc_finish
Testing 501500^262144+1... 4194304 steps to go
maxErr exceeded for 501500^262144+1, 0.5000 > 0.4500
07:49:44 (3388): called boinc_finish
Testing 502246^262144+1... 4325376 steps to go
maxErr exceeded for 502246^262144+1, 0.5000 > 0.4500
13:38:31 (4552): called boinc_finish
Testing 5100^524288+1... 6029312 steps to go
maxErr exceeded for 5100^524288+1, 0.5000 > 0.4500
15:30:39 (3748): called boinc_finish
While about half complete with no problems:
501082^262144+1 is a probable composite. (RES=23c1964028671da9) (1494197 digits) (err = 0.1016) (time = 1:35:52) 01:38:03
01:38:03 (5904): called boinc_finish
500874^262144+1 is a probable composite. (RES=d5e98a93688693dd) (1494150 digits) (err = 0.1094) (time = 1:35:42) 07:34:09
07:34:09 (5476): called boinc_finish
501756^262144+1 is a probable composite. (RES=6fe75633c14c20ba) (1494350 digits) (err = 0.1094) (time = 1:35:49) 11:49:56
11:49:56 (5616): called boinc_finish
501498^262144+1 is a probable composite. (RES=8f5dd4e9aed53071) (1494292 digits) (err = 0.3379) (time = 1:36:16) 09:26:03
09:26:03 (2356): called boinc_finish
501860^262144+1 is a probable composite. (RES=663982d90bfb4e4a) (1494374 digits) (err = 0.1016) (time = 1:35:39) 13:25:38
13:25:38 (992): called boinc_finish
5866^262144+1 is composite. (RES=d2289cc9c4b54a4d) (987849 digits) (err = 0.0020) (time = 1:03:10) 16:53:54
16:53:54 (540): called boinc_finish
5316^262144+1 is composite. (RES=ee53b0c252c2a799) (976640 digits) (err = 0.0000) (time = 1:02:19) 03:41:53
03:41:53 (1884): called boinc_finish
5580^262144+1 is composite. (RES=c81b367c9ec237a4) (982158 digits) (err = 0.0000) (time = 1:02:39) 04:44:35
04:44:35 (5708): called boinc_finish
Anyone see any pattern in what is going on here? Or is it just random instability, such that the 550s are better retired to another project permanently?
____________
141941*2^4299438-1 is prime!
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,359,622 RAC: 386,594
                          
|
I suspect that the pausing problem is caused by my super special code that checkpoints before being shut down. Unfortunately, that seems to cause problems sometimes, but not always in my test environment. I've pulled that code out, so the problem should go away with v1.02 when it's released. That means that GeneferCUDA will only checkpoint when it's scheduled to checkpoint according to the preferences you set.
____________
My lucky number is 75898524288+1 |
|
|
|
Can anyone tell me how run times of GeneferCUDA on a GTX 570 would compare to those on an i7-2600K when searching for a prime of the same digit length?
I'm curious which is better in terms of money spent, given that the i7 can run 8 threads and consumes about half as much power. Also, the CPU overclocks much better, whereas an overclocked GPU, as noted in this thread, starts to error.
Thank you for any answers. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,359,622 RAC: 386,594
                          
|
Can anyone tell me how run times of GeneferCUDA on a GTX 570 would compare to those on an i7-2600K when searching for a prime of the same digit length?
I'm curious which is better in terms of money spent, given that the i7 can run 8 threads and consumes about half as much power. Also, the CPU overclocks much better, whereas an overclocked GPU, as noted in this thread, starts to error.
Thank you for any answers.
I get a 10:1 ratio with a GTX 460 vs. a Q6600. Both are slower than the 570/2600K, but the i7's speed will be cut in half by hyperthreading. My guess would be you'd be in the vicinity of 10:1, but that's just a guess.
There are folks here who have 570/i7 combos, so hopefully you'll get a better answer.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,359,622 RAC: 386,594
                          
|
I suspect that the pausing problem is caused by my super special code that checkpoints before being shut down. Unfortunately, that seems to cause problems sometimes, but not always in my test environment. I've pulled that code out, so the problem should go away with v1.02 when it's released. That means that GeneferCUDA will only checkpoint when it's scheduled to checkpoint according to the preferences you set.
I forgot that now that the genefer server is running, I can test new versions of the software against the real server using app_info.xml. That makes things SO much simpler!
v1.02 does in fact seem to solve the problem. It suspends properly regardless of whether the 'keep-in-memory' flag is on or off.
____________
My lucky number is 75898524288+1 |
|
|
|
Thank you Michael for the reply!
Here's another question. ATI cards show much better double-precision performance than NVIDIA. For instance, an HD 5850 performs about twice as fast as a GTX 580 in double precision. Maybe the next step should be getting an OpenCL version of Genefer?
We would crunch faster and attract more crunchers with ATI cards from other projects.
Cheers! |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,359,622 RAC: 386,594
                          
|
Thank you Michael for the reply!
Here's another question. ATI cards show much better double-precision performance than NVIDIA. For instance, an HD 5850 performs about twice as fast as a GTX 580 in double precision. Maybe the next step should be getting an OpenCL version of Genefer?
We would crunch faster and attract more crunchers with ATI cards from other projects.
Cheers!
All true, except perhaps for the OpenCL part. OpenCL is portable, but when speed counts (and when doesn't speed count when you're talking about a GPU???) you might be better off with something written specifically for the ATI than something that's portable. This is far more true of GPUs than CPUs.
____________
My lucky number is 75898524288+1 |
|
|
|
maxErr with 1.02:
http://www.primegrid.com/result.php?resultid=334128749
I tried to play a flash game during the crunch.
The difference with 1.01 is in Exit code:
1.01 - 0 (0x0)
1.02 - 10 (0xa)
____________
|
|
|
|
x3mEn [Kyiv]
Could it be that you are running BOINC version 7+, maybe??
I also had problems with that version.
____________
I'm a prime millionaire !
9*2^3497442+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,359,622 RAC: 386,594
                          
|
maxErr with 1.02:
http://www.primegrid.com/result.php?resultid=334128749
I tried to play a flash game during the crunch.
The difference with 1.01 is in Exit code:
1.01 - 0 (0x0)
1.02 - 10 (0xa)
That's exactly what's supposed to happen. MaxErr is now handled as an error immediately.
The MaxErr isn't supposed to happen, of course. I'm running that number on my 460 and we'll see what happens.
If you're curious as to whether the error might have been caused by the flash game, try running the test manually from the command line. Use this as the parameters:
-q "504424^262144+1"
The quotes ARE necessary when the program is run from the Windows command line. Not necessary if genefer is invoked from another program, and probably not necessary on any other O.S. Might not even be necessary under really old versions of Windows, either. That applies to all PrimeGrid programs that have a -q (or similar) parameter. Windows uses "^" as a special character on the command line, so if you don't use the quotes, the program receives this as the command line:
-q "504424262144+1"
The caret is removed without the quotes.
Ok, good, I've babbled on long enough so that my test run has progressed past where yours failed:
C:\GeneferCUDA test\geneferCUDA-boinc.1.02>GeneferCUDA-boinc-windows.exe -q "504
424^262144+1"
GeneferCUDA-boinc 1.02 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
Command line: GeneferCUDA-boinc-windows.exe -q 504424^262144+1
Testing 504424^262144+1... 4390912 steps to go
I'll let it run to the end just to make sure.
It looks like your other WUs completed normally, so maybe it was the game.
____________
My lucky number is 75898524288+1 |
|
|
|
x3mEn [Kyiv]
Could it be that you are running BOINC version 7+, maybe??
I also had problems with that version.
Yes, I am using version 7.07.
But I don't think that's the main problem.
Half of my WUs finished successfully while I was sitting quietly and watching their progress.
The "maxErr" first appeared when I gave up my seat so my wife could play a flash game.
Today I tried the new 1.02 version, played a flash game myself, and "maxErr" happened again.
Now I am trying to run it from the command line.
I'm sure that if I just wait, it will reach the end.
Now I'm here:
primegrid_genefer_1.02_windows_intelx86__cuda32_13.exe -q 504424^2
62144+1 --device 0
Testing 504424^262144+1... 3801088 steps to go
And I'm not playing a flash game now.
____________
|
|
|
|
Unfortunately, I was wrong.
Even from the command line, the program finished with maxErr...
primegrid_genefer_1.02_windows_intelx86__cuda32_13.exe -q "504424^262144+1" --device 0
GeneferCUDA-boinc 1.02 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
Command line: primegrid_genefer_1.02_windows_intelx86__cuda32_13.exe -q 504424^262144+1 --device 0
Testing 504424^262144+1... 3473408 steps to go
maxErr exceeded for 504424^262144+1, 0.5000 > 0.4500
____________
|
|
|
|
Actually, I was NOT playing the flash game, but it was running in the background... so the test wasn't clean.
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,359,622 RAC: 386,594
                          
|
It might be a good idea to turn off the "Use GPU while computer is in use" checkbox.
Chances are that if some other program is interfering with GeneferCUDA, it's probably interfering with any CUDA program you run. GeneferCUDA may simply appear to be more sensitive because it does more strenuous internal checks than other programs. Or it may genuinely be more sensitive.
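To give a rough feel for what those internal checks look like: after each multiplication pass of the floating-point transform, every element should land very close to a whole number, and the distance to the nearest integer is the round-off error. If the worst of those errors climbs past a threshold (the 0.4500 figure in the logs above), the result can no longer be trusted. The following is only an illustrative plain-C sketch of that idea -- the array and function names are made up, and it is not GeneferCUDA's actual code:
#include <math.h>
#include <stdio.h>
#define MAX_ERR_LIMIT 0.45  /* same threshold as the "0.5000 > 0.4500" log lines */
/* Round every element of the transform output to the nearest integer and
   track the worst round-off error seen.  Returns nonzero if the result
   can no longer be trusted.  (Illustrative only -- names are made up.) */
static int round_and_check(double *x, int n, double *max_err)
{
    double worst = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        double nearest = floor(x[i] + 0.5);   /* nearest integer */
        double err = fabs(x[i] - nearest);    /* round-off error for this element */
        if (err > worst)
            worst = err;
        x[i] = nearest;                       /* continue with the rounded value */
    }
    *max_err = worst;
    return worst > MAX_ERR_LIMIT;
}
int main(void)
{
    double data[4] = { 3.0001, -7.9998, 12.4999, 5.0 };  /* 12.4999 is "too noisy" */
    double max_err;
    if (round_and_check(data, 4, &max_err))
        printf("maxErr exceeded, %.4f > %.4f\n", max_err, MAX_ERR_LIMIT);
    else
        printf("ok, maxErr = %.4f\n", max_err);
    return 0;
}
Once the worst error approaches 0.5, the rounded value is essentially a coin flip, which is why the run is abandoned rather than patched up.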
____________
My lucky number is 75898524288+1 |
|
|
|
Maybe you are right, but I've been running GPUs with "Use GPU while computer is in use" checked for a very long time and had no problems with any GPU project, except maybe GPUGrid.
They say that GPUGrid is very sensitive to GPU memory accuracy.
I think Genefer is very close to GPUGrid in its GPU memory requirements.
A 2x bigger GFN needs 2x more GPU memory, I guess, so OC'd GPUs produce memory errors more frequently. My GTX 460 is factory OC'd: 810/2000/1620 instead of 675/1800/1350.
GDDR5 is not ECC, unfortunately...
____________
|
|
|
|
x, I'm running an EVGA GTX 460 with the shaders clocked to 1600. I had to drop it down from 1790 before any WUs would finish correctly. It's a 1 GB card, but I'm ONLY running PrimeGrid GPU tasks and have both CPU cores idle. I'm also running a 32-bit Vista OS (don't laugh).
So now my question: it looks like most of the testing has settled down, so I was wondering, when a WU is run and reported, do I need to review the logs to see whether it actually finished correctly, or is that now being reported to PG as an errored or invalid WU? The reason I'm asking is that even though I know we are not validating, some validation testing must be going on, because I have a lot of "valid" WUs with credit and of course lots of pending units.
____________
@AggieThePew
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,359,622 RAC: 386,594
                          
|
Maybe you are right, but I've been running GPUs with "Use GPU while computer is in use" checked for a very long time and had no problems with any GPU project, except maybe GPUGrid.
You probably never played "Civilization V" then. ;-)
There are programs that absolutely will trample all over any CUDA program that has the misfortune to try to run at the same time.
Generally speaking, I agree with you. Most of the time CUDA co-exists nicely with other programs, but not all of the time. Although I keep my GPU crunching while I'm using the computer, I do have a list of programs in cc_config.xml that BOINC isn't allowed to run alongside.
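For anyone who wants to set up the same kind of list: the BOINC client reads cc_config.xml from its data directory, and -- assuming your client version supports it -- the exclusive_gpu_app option suspends GPU computing while the named program is running. The executable names below are placeholders only; substitute whatever programs fight with your GPU tasks, then re-read the config file or restart the client:
<cc_config>
  <options>
    <!-- Suspend GPU crunching while these programs are running (names are examples) -->
    <exclusive_gpu_app>Civ5.exe</exclusive_gpu_app>
    <exclusive_gpu_app>FreeCell.exe</exclusive_gpu_app>
  </options>
</cc_config>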
They say that GPUGrid is very sensitive to GPU memory accuracy.
I think Genefer is very close to GPUGrid in its GPU memory requirements.
A 2x bigger GFN needs 2x more GPU memory, I guess, so OC'd GPUs produce memory errors more frequently. My GTX 460 is factory OC'd: 810/2000/1620 instead of 675/1800/1350.
GDDR5 is not ECC, unfortunately...
Actually, that's not true -- and I'm not sure why not. GPU memory usage seems to peak at around 45% (out of 1 gig) no matter how high N goes.
My calculations on the 460 just finished:
C:\GeneferCUDA test\geneferCUDA-boinc.1.02>GeneferCUDA-boinc-windows.exe -q "504
424^262144+1"
GeneferCUDA-boinc 1.02 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
Command line: GeneferCUDA-boinc-windows.exe -q 504424^262144+1
504424^262144+1 is a probable composite. (RES=3c7571725ce19f93) (1494954 digit
s) (err = 0.1016) (time = 1:33:51) 09:02:33
I think it was the flash game. That's interesting, and not really what one would expect.
Definitely good information to know.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,359,622 RAC: 386,594
                          
|
So now my question: it looks like most of the testing has settled down, so I was wondering, when a WU is run and reported, do I need to review the logs to see whether it actually finished correctly, or is that now being reported to PG as an errored or invalid WU? The reason I'm asking is that even though I know we are not validating, some validation testing must be going on, because I have a lot of "valid" WUs with credit and of course lots of pending units.
Anything from here on out (specifically with v1.02) reports maxErr as an error, so you won't need to look at the logs. For WUs run with 1.01, the plan was to have the validator check for maxErr exceeded errors in the output and mark the WUs as invalid, rather than as computation errors.
I don't have any information on the status of the validator.
____________
My lucky number is 75898524288+1 |
|
|
|
You probably never played "Civilization V" then. ;-)
Yes, I haven't been a gamer for 2 years now :)
That's how it happened: I started crunching on the GPU and stopped playing games :)
My calculations on the 460 just finished:
Ok, I'll try to test this one again under conditions of pharmaceutical purity: no flash :), an absolutely idle CPU, and so on...
____________
|
|
|
|
So I just finished the calculations on my GTX 460:
primegrid_genefer_1.02_windows_intelx86__cuda32_13.exe -q "504424^262144+1" --device 0
GeneferCUDA-boinc 1.02 (CUDA3.2) based on GeneferCUDA 1.049 and Genefer 2.2.1
Copyright (C) 2001-2003, Yves Gallot (v1.3)
Copyright (C) 2009-2011, Mark Rodenkirch, David Underbakke (v2.2.1)
Copyright (C) 2010-2012, Shoichiro Yamada (CUDA)
Portions of this software written by Michael Goetz 2011-2012 (BOINC)
A program for finding large probable generalized Fermat primes.
Command line: primegrid_genefer_1.02_windows_intelx86__cuda32_13.exe -q 504424^262144+1 --device 0
504424^262144+1 is a probable composite. (RES=3c7571725ce19f93) (1494954 digits) (err = 0.1016) (time = 1:19:02) 18:10:55
So, it was the flash game.
Michael, I've noticed that the first time I started the command-line program, it resumed from a checkpoint:
Resuming 504424^262144+1 from a checkpoint (3538943 iterations left)
So I have an idea: what if the program restarted from the checkpoint when "maxErr" appears? At least when the checkpoint has changed since the (re)start of the program.
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,359,622 RAC: 386,594
                          
|
So I have an idea: what if the program restarted from the checkpoint when "maxErr" appears? At least when the checkpoint has changed since the (re)start of the program.
Theoretically possible, but there are two problems:
* By the time you would want to do this, BOINC will have already reported the task back to the server as a computation error.
* This could be very dangerous -- the actual errors (or *some* of the errors) may have occurred prior to the checkpoint. In this situation, maxErr getting too high is a manifestation of a computation error. There could be other computation errors that did not cause a large, detectable rounding error. So it's safer just to abandon the WU. You know something went wrong, but you don't know how badly, or when.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,359,622 RAC: 386,594
                          
|
But, pretty soon the boinc version will be faster than any other, which will be most noticeable at higher Ns. There's a CPU-bound initialization process which is trivial at lower N. However, at N=4194304, this initialization takes two hours. I've been able to optimize that 2-hour phase down to 21 minutes, and I think I can get it down to about a minute. That's not released yet, so the boinc version is pretty much the same speed as the PSA version. (The boinc version, btw, is compatible with PRPNet.)
Alright, I've now got that working the way I want it.
That 2-hour initialization phase now takes a small fraction of a second. (Isn't math wonderful?)
There will be a 1.03, with this optimization code, before we bump N up beyond 524288. I'll be doing a bit more testing with this version before releasing it since this is the first change I've made to the actual computations.
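A note on the math, with the caveat that nothing in this thread says exactly how the new initialization works, so the following is only a guess at the general flavor of the trick, not a description of the real b^N code. Building b^N one multiplication at a time costs N multiplications (about 4.2 million at N=4194304), whereas binary "square-and-multiply" exponentiation costs roughly log2(N) squarings -- 22 of them for N=4194304. A minimal plain-C sketch, using 64-bit modular arithmetic as a stand-in because the real numbers are enormous (the modulus is an arbitrary placeholder):
#include <stdint.h>
#include <stdio.h>
/* (a * b) mod m using a 128-bit intermediate (GCC/Clang extension). */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    return (uint64_t)((unsigned __int128)a * b % m);
}
/* Square-and-multiply: base^exp mod m in O(log exp) steps instead of O(exp). */
static uint64_t powmod(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1;
    base %= m;
    while (exp > 0) {
        if (exp & 1)                      /* low bit set: fold the base in */
            result = mulmod(result, base, m);
        base = mulmod(base, base, m);     /* square for the next bit */
        exp >>= 1;
    }
    return result;
}
int main(void)
{
    /* ~23 loop passes instead of ~4.2 million repeated multiplications. */
    uint64_t m = 1000000007ULL;           /* arbitrary placeholder modulus */
    printf("504424^4194304 mod %llu = %llu\n",
           (unsigned long long)m,
           (unsigned long long)powmod(504424ULL, 4194304ULL, m));
    return 0;
}
The same step-count collapse applies whether the operands fit in a 64-bit word or span millions of digits; only the cost of each individual multiplication changes.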
____________
My lucky number is 75898524288+1 |
|
|
|
Noticed a bunch more workunits. Just wondering what these were testing and what we needed to look for with these. thanks |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,359,622 RAC: 386,594
                          
|
Noticed a bunch more workunits. Just wondering what these were testing and what we needed to look for with these. thanks
It looks like just more of the "real" WUs at N=262144. Mostly, at this point, the purpose is to do lots of testing to shake any other potential bugs out of the woodwork.
____________
My lucky number is 75898524288+1 |
|
|
|
Noticed a bunch more workunits. Just wondering what these were testing and what we needed to look for with these. thanks
It looks like just more of the "real" WUs at N=262144. Mostly, at this point, the purpose is to do lots of testing to shake any other potential bugs out of the woodwork.
Thanks... so one more question: is the beta now available to the general public? I was thinking no, since there weren't a lot of comments or questions flying around, but I just wondered. |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,359,622 RAC: 386,594
                          
|
Thanks... so one more question: is the beta now available to the general public? I was thinking no, since there weren't a lot of comments or questions flying around, but I just wondered.
It's not released to boinc yet. I put it out mostly so Shoichiro could get the source code for that b^N calculation.
I haven't started to thoroughly test it yet, and I'll want to do that. I'm not even sure yet exactly what kind of testing I want to do with it. It will be going where no Genefer has ever gone before, and if it's doing something wrong it might be very hard to know something is amiss.
I can't stop anyone from grabbing it and running it with either prpnet or boinc (you would need an app_info.xml, but that's not hard), but I haven't done that yet myself so you could get to be the pioneer. It seems to work fine as a standalone app.
____________
My lucky number is 75898524288+1 |
|
|
|
Is it just me or have others seen a drop in runtimes... mine have gone from 4880 seconds down to 3212 seconds |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,359,622 RAC: 386,594
                          
|
Is it just me or have others seen a drop in runtimes... mine have gone from 4880 seconds down to 3212 seconds
Just you. ;-)
The short WU was a re-issue of an older, shorter WU that needed to be sent to another computer.
____________
My lucky number is 75898524288+1 |
|
|
|
Is it just me or have others seen a drop in runtimes... mine have gone from 4880 seconds down to 3212 seconds
Just you. ;-)
The short WU was a re-issue of an older, shorter WU that needed to be sent to another computer.
Dang lol - thanks for letting me know |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,359,622 RAC: 386,594
                          
|
Thanks... so one more question: is the beta now available to the general public? I was thinking no, since there weren't a lot of comments or questions flying around, but I just wondered.
As it turns out, the code I wrote stops working correctly at a very low b value at N=4194304. So it's back to the drawing board.
This is why we do testing. :)
____________
My lucky number is 75898524288+1 |
|
|
|
Something I do find interesting: even though there's no recorded credit other than what each unit shows, it has affected my overall RAC. I had to laugh because all those 0.01 credits really added up.
____________
@AggieThePew
|
|
|
|
Because I don't want to get thousands of answers, I will ask here: opinions on the best brand and model of GTX 570? |
|
|
Michael Goetz Volunteer moderator Project administrator
 Send message
Joined: 21 Jan 10 Posts: 13524 ID: 53948 Credit: 244,359,622 RAC: 386,594
                          
|
Because I don't want to get thousands of answers, I will ask here: opinions on the best brand and model of GTX 570?
I'm partial to EVGA myself, but that's not a very strong preference. My GTX 460 isn't EVGA; it was on sale. :)
DO make sure that whatever you buy has a lifetime warranty. That way a dead card usually means a free upgrade, if enough time has passed that they no longer have your old card in stock.
My dead 280 got RMA'd with a 470. (That's an upgrade, as my 460 crunches better than my 280 did.)
Also, most manufacturers require that you register the card in order to enable the lifetime warranty. So don't forget to register.
____________
My lucky number is 75898524288+1 |
|
|
|