Author |
Message |
|
I have just found that my wingman on SOB has produced one of the '3 second errors' mentioned in other threads. Up to now it has been dismissed as a simple setup error affecting one user, but I now have records of 4 users and 5 machines doing it.
My SOB wingman is new to PG, joined in April, has a modern i7 with Win8 and I don't think it is overclocked. He probably has the latest drivers and BOINC. He has the same fault on at least 75% of work from several subprojects in his task list: 321, PPS LLR, SOB, TRP LLR. All have mainly failures. I have sent him a PM to alert him to the problem and suggested deleting BOINC and reloading from scratch.
With 4 users who happen to have been my or my wife's wingman, there must be many others out there whom I have not met. We may have a problem, as this is going to make a massive list of jobs in progress waiting for rechecks.
I am unable to progress any investigation any further, but I think it needs looking at.
____________
Member team AUSTRALIA
My lucky number is 9291*2^1085585+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
I have just found that my wingman on SOB has produced one of the '3 second errors' mentioned in other threads. Up to now it has been dismissed as a simple setup error affecting one user, but I now have records of 4 users and 5 machines doing it.
My SOB wingman is new to PG, joined in April, has a modern i7 with Win8 and I don't think it is overclocked. He probably has the latest drivers and BOINC. He has the same fault on at least 75% of work from several subprojects in his task list: 321, PPS LLR, SOB, TRP LLR. All have mainly failures. I have sent him a PM to alert him to the problem and suggested deleting BOINC and reloading from scratch.
With 4 users who happen to have been my or my wife's wingman, there must be many others out there whom I have not met. We may have a problem, as this is going to make a massive list of jobs in progress waiting for rechecks.
I am unable to progress any investigation any further, but I think it needs looking at.
It's not limited to a single person -- but it's also not widespread.
Fortunately, one of the people who is affected by this is a frequent contributor here and has been working with us to try to figure out what the problem is.
It's not anything obvious and the problem is, so far, unexplained.
I will tell you what we know so far:
1) It only seems to affect LLR tasks, so the problem seems to be limited to either the llr app itself or the llr wrapper. Another possibility is that the small downloaded data file is not being created for some reason.
2) It doesn't affect everyone, but it does affect more than one person.
3) The problem started without any obvious changes being made on either the host computer or on the server. So far, there's no explanation for why it worked fine on one day and failed on the next day.
One thing that I'm considering is that this may be due to interference from an anti-virus program. That could explain the behavior we're seeing. That's just a hunch so far.
____________
My lucky number is 75898524288+1 |
|
|
|
I happen to be one of those who, unfortunately, produced 60 - 3 second errors during the challenge. In the hopes of helping to find a cure for it, here's what I experienced:
It began as my first WOO was reported and a new one to replace it began to run. I cut back that core and continued on. I checked everything I could think of and continued monitoring. Somewhere down the line of cutting back to just 4 cores, I reset the project. That seemed to fix it and I merrily finished the challenge without further incident.
After the challenge ended, I finished up the remaining time on my WOOs in progress. As they finished, I switched over to PPS & SGS llrs and it started again with them. Haven't counted how many of them I produced. This time re-setting did not work.
As I had just recently upgraded to BOINC 7.0.64, I thought maybe that had something to do with it. Trying to repair my 7.0.64 resulted in an error code I do not remember. I did a complete re-install of 7.0.64 and all is running fine to date. Since then I have found other 3-seconders running versions previous to 7.0.64.
I am currently half way through my second set of WOO units (running 6 cores) to assist with the clean-up until it is done. Everything is running fine. |
|
|
Michael Goetz Volunteer moderator Project administrator
|
Paperboy,
What, if any, anti-virus program is running on your computer?
____________
My lucky number is 75898524288+1 |
|
|
|
Running Winblows 8 Enterprise 64-bit with stock Win 8 protection which I believe is listed as Windows Defender but is really MS Essentials in disguise. |
|
|
Michael Goetz Volunteer moderator Project administrator
|
I have a request for anyone experiencing the 3-second problem: could you try two tests to help us diagnose the problem?
The first test is relatively easy: Try configuring your anti-virus scanner to ignore the boinc data directory. Then detach from PrimeGrid and re-attach to PrimeGrid. (Obviously, make sure all in-progress PrimeGrid tasks are completed first since this will kill them.) Now try running LLR again. Does this solve the problem?
I'm not optimistic that first test will work, but it needs to be tried.
The second test is a little more involved:
1) Create a new folder somewhere outside the boinc directory. For example, call it C:\wrapper
2) COPY (don't move) the following files from ...\boinc\projects\www.primegrid.com\ to \wrapper:
* primegrid_llr_wrapper_6.15_windows_x86_64.exe
* primegrid_cllr64_3.8.9_windows_x86_64.exe
* llr.ini.6.00
* pps_llr_xxxxxxxxxxx
That last file is the data input file for LLR that tells it what number to crunch. The x's will be replaced by a number. If the file name ends in "_0", that is the output file, not the input file. You need the input file. If you don't have a file like that, make sure your PrimeGrid preferences are set to send you PPS tasks, set the BOINC client to Activity: Suspend, and download some more tasks.
3) Rename the files in the \wrapper directory -- do NOT modify the originals in the boinc\projects\www.primegrid directory!!!
* To make things simple, rename primegrid_llr_wrapper_6.15_windows_x86_64.exe to wrapper.exe
* You MUST rename primegrid_cllr64_3.8.9_windows_x86_64.exe to primegrid_cllr.exe
* You MUST rename the input file, pps_llr_xxxxxxxxxxx, to llr.in (note that the extension is "in" and not the more common "ini")
4) Now you're ready to run the llr program the way boinc runs it. Open up a console window, CD to \wrapper, and finally type wrapper and hit enter and hopefully the llr program will work. If it does, you'll see output on the console like this:
C:\Temp\wrapper>wrapper
C:\Temp\wrapper>
When it's finished crunching that number it will have produced a file called stderr.txt that contains something like this:
BOINC LLR 6.03 wrapper: starting
09:11:05 (5296): Can't open init data file - running in standalone mode
Major OS version: 6; Minor OS version: 1
FFT length: 128K
All done!
10:07:59 (5296): called boinc_finish
There should also be an output file called llr.out that contains something like this:
393*2^1718845+1 is not prime. Proth RES64: 0E8B2594345F748C Time : 3412.717 sec.
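For anyone who would rather script steps 1 to 3 than do them by hand, here is a minimal Python sketch. It is an illustration only, not an official PrimeGrid tool. The function names (`find_input_file`, `stage_wrapper_test`) are made up for this example, the file names are the ones listed above, and you have to pass in your own BOINC project directory since its full path differs per install.

```python
import shutil
from pathlib import Path

# File names exactly as listed in the post; they are specific to these app versions.
WRAPPER = "primegrid_llr_wrapper_6.15_windows_x86_64.exe"
LLR_APP = "primegrid_cllr64_3.8.9_windows_x86_64.exe"
LLR_INI = "llr.ini.6.00"


def find_input_file(project_dir: Path) -> Path:
    """Return one pps_llr_* input file, skipping output files that end in "_0"."""
    candidates = sorted(
        p for p in project_dir.glob("pps_llr_*")
        if p.is_file() and not p.name.endswith("_0")
    )
    if not candidates:
        raise FileNotFoundError("no pps_llr_ input file found; download more PPS tasks first")
    return candidates[0]


def stage_wrapper_test(project_dir: Path, dest_dir: Path) -> Path:
    """Copy (never move) the four files into dest_dir under the names the wrapper expects."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    renames = {
        project_dir / WRAPPER: "wrapper.exe",          # renamed only for convenience
        project_dir / LLR_APP: "primegrid_cllr.exe",   # required name
        project_dir / LLR_INI: LLR_INI,                # keeps its original name
        find_input_file(project_dir): "llr.in",        # required name ("in", not "ini")
    }
    for source, new_name in renames.items():
        shutil.copy2(source, dest_dir / new_name)      # originals are left untouched
    return dest_dir
```

Call it as `stage_wrapper_test(Path(your_project_dir), Path(r"C:\wrapper"))`, then open a console in the destination folder and run `wrapper` as described in step 4.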
____________
My lucky number is 75898524288+1 |
|
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
|
I have the 3 second problem for LLR. PPS Sieve still works fine. I added the BOINC directories to the AVG exceptions, removed the PrimeGrid project in BOINC, restarted the PC, then added PrimeGrid back. The problem still occurs. Once in a while a single LLR task will complete successfully.
Next I created the wrapper directory, copied the files in from ...\boinc\projects\www.primegrid.com\ and renamed them. Then I executed the wrapper and got the following in stderr.txt:
BOINC LLR 6.03 wrapper: starting
04:54:46 (4892): Can't open init data file - running in standalone mode
Major OS version: 6; Minor OS version: 2
FFT length: 80K
All done!
05:11:20 (4892): called boinc_finish
The llr.out had the following:
7347*2^1065719+1 is not prime. Proth RES64: 18A5F4BE99FE2245 Time : 992.830 sec.
So this manual test passed. Bit of a strange one. |
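If several people run the manual test, the llr.out lines are easier to compare once they are pulled apart. Below is a small Python sketch that parses a result line of the form shown in this thread. It only handles the "is not prime" format quoted here; any other output LLR may produce is outside what this thread shows, so the function (a hypothetical helper, `parse_llr_result`) simply returns None for anything else.

```python
import re
from typing import NamedTuple, Optional


class LlrResult(NamedTuple):
    candidate: str   # e.g. "7347*2^1065719+1"
    res64: str       # 64-bit residue as 16 hex digits
    seconds: float   # run time reported by LLR


# Matches lines like:
# 7347*2^1065719+1 is not prime. Proth RES64: 18A5F4BE99FE2245 Time : 992.830 sec.
RESULT_LINE = re.compile(
    r"^(?P<candidate>\S+) is not prime\.\s+"
    r"\w+ RES64:\s*(?P<res64>[0-9A-Fa-f]{16})\s+"
    r"Time\s*:\s*(?P<seconds>\d+(?:\.\d+)?) sec\."
)


def parse_llr_result(line: str) -> Optional[LlrResult]:
    """Parse one llr.out line; return None if it is not in the expected format."""
    match = RESULT_LINE.match(line.strip())
    if match is None:
        return None
    return LlrResult(
        candidate=match.group("candidate"),
        res64=match.group("res64").upper(),
        seconds=float(match.group("seconds")),
    )
```

A run time of many minutes, like the two quoted above, is what a healthy task looks like; the failing tasks discussed in this thread stop after about 3 seconds.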
|
|
|
TheDawgz don't run Windoze so we may be way off here -
Any chance that it has to do with permissions/ownership/inheritance on the slot sub-directories or on the target files of the soft links that are created as part of starting up the wu?
____________
There's someone in our head but it's not us. |
|
|
Michael Goetz Volunteer moderator Project administrator
|
TheDawgz don't run Windoze so we may be way off here -
Any chance that it has to do with permissions/ownership/inheritance on the slot sub-directories or on the target files of the soft links that are created as part of starting up the wu?
Checking that was going to be my next suggestion.
____________
My lucky number is 75898524288+1 |
|
|
Roger Volunteer developer Volunteer tester
|
Under Windows 8 you right click a file, choose Properties then click on the Security tab.
For files and directory the permissions are:
SYSTEM: Full control, Modify, Read & execute, Read, Write
boinc_admins: Modify, Read & execute, Read, Write, Special permissions
boinc_users: Read & execute, Read
boinc_projects: Modify, Read & execute, Read, Write, Special permissions
Administrators: Full control, Modify, Read & execute, Read, Write
I haven't checked the pps_llr_xxxx permissions. I don't know about any "soft links". |
|
|
Michael Goetz Volunteer moderator Project administrator
|
Under Windows 8 you right click a file, choose Properties then click on the Security tab.
For files and directory the permissions are:
SYSTEM: Full control, Modify, Read & execute, Read, Write
boinc_admins: Modify, Read & execute, Read, Write, Special permissions
boinc_users: Read & execute, Read
boinc_projects: Modify, Read & execute, Read, Write, Special permissions
Administrators: Full control, Modify, Read & execute, Read, Write
I haven't checked the pps_llr_xxxx permissions. I don't know about any "soft links".
You checked that all of the individual slot directories have the same security settings, right? That's one thing that wouldn't get reset by detaching from PrimeGrid. That would require un-installing BOINC.
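Clicking through the Security tab for every slot directory is tedious, so here is a rough Python sketch that compares them in one go. It shells out to the standard Windows `icacls` command. The parsing is an assumption on my part: it expects the first output line to start with the folder path and the last line to be the English "Successfully processed ..." summary. The function names are invented for this example.

```python
import subprocess
from collections import Counter
from pathlib import Path


def normalize_acl(icacls_output: str, path: str) -> frozenset:
    """Drop the leading path and the summary line so different folders can be compared."""
    entries = set()
    for line in icacls_output.splitlines():
        line = line.strip()
        if line.startswith(path):
            line = line[len(path):].strip()
        # Assumes an English-language Windows summary line.
        if line and not line.startswith("Successfully processed"):
            entries.add(line)
    return frozenset(entries)


def read_acl(path: Path) -> frozenset:
    """Return the ACL entries icacls reports for one folder (Windows only)."""
    output = subprocess.run(
        ["icacls", str(path)], capture_output=True, text=True, check=True
    ).stdout
    return normalize_acl(output, str(path))


def odd_ones_out(acls: dict) -> list:
    """Return the folder names whose ACL differs from the most common ACL."""
    if not acls:
        return []
    common, _count = Counter(acls.values()).most_common(1)[0]
    return sorted(name for name, acl in acls.items() if acl != common)
```

Usage would be something like `odd_ones_out({p.name: read_acl(p) for p in slots_dir.iterdir() if p.is_dir()})`, where `slots_dir` is the slots folder inside your own BOINC data directory. An empty list means every slot has the same settings.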
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
|
Jim just inspired an idea: Roger (or anyone else with the 3 second problem), could you try running a PPS-LLR task from the test server ("CompositeGrid" at lt-a.primegrid.com)? This is just to see if you get the same problem there, which I suspect you will.
If the same problem exists with tasks coming from the test server, I'll put together a more verbose wrapper and install it on the test server that will hopefully tell us what is going wrong.
____________
My lucky number is 75898524288+1 |
|
|
|
Just started creating a brand new batch of 3 seconders (5 or 6 before I could get to NNT) on WOO cleanup tasks after running 12 without a hitch. The remaining 5 tasks still in progress are humming along just fine. (Couldn't have anything to do with it being the 13th one done - Oh No! - just kidding)
Maybe it's heat related. Was a bit hot where I am today and haven't turned on the AC yet. Ambient room temp is 27 - 28C, Core Temp shows all cores in the range of 61-65C, which is warmer than I like but well below max for my i7-3820 and the mobo is at 32C.
Will look at it again when the temps drop back to 50ish. |
|
|
Michael Goetz Volunteer moderator Project administrator
|
Just started creating a brand new batch of 3 seconders (5 or 6 before I could get to NNT) on WOO cleanup tasks after running 12 without a hitch. The remaining 5 tasks still in progress are humming along just fine. (Couldn't have anything to do with it being the 13th one done - Oh No! - just kidding)
Maybe it's heat related. Was a bit hot where I am today and haven't turned on the AC yet. Ambient room temp is 27 - 28C, Core Temp shows all cores in the range of 61-65C, which is warmer than I like but well below max for my i7-3820 and the mobo is at 32C.
Will look at it again when the temps drop back to 50ish.
Two questions:
1) Do you now get the three second error on every new LLR task, regardless of which LLR project you try to run? In particular, do you get the error with PPS-LLR too?
2) If the answer to question 1 is yes, then could you try attaching to our test server (http://lt-a.primegrid.com/) and try running a PPS-LLR task from there? Hopefully you'll get the same error on tasks coming from that server.
(In both cases, if you do NOT get the 3-second error, you don't have to complete the computation. There's certainly no need to complete the task on the test server.)
____________
My lucky number is 75898524288+1 |
|
|
|
Yes to both questions |
|
|
Michael Goetz Volunteer moderator Project administrator
|
Yes to both questions
Good, that's what I expected.
In a few days I'm going to install a modified llr app (actually the wrapper, not llr itself) on the test server that will print out more diagnostic information so we can figure out what's happening. I'll let you know when it's ready.
____________
My lucky number is 75898524288+1 |
|
|
|
I don't know if this helps but I recently started doing PG tasks after a long break. The first couple of days of LLR tasks went OK but then I started consistently getting the 3 second errors on all the LLR projects I tried. I switched over to sieving and these tasks were succeeding.
After reading your previous post I logged onto the test server and downloaded 4 LLR tasks which all ran for a couple of minutes without failing.
I then went back to the main server, aborted the 4 sieve tasks that had just commenced and downloaded 4 new PPS LLR tasks - just to check. However, they are all running without problem!
I'll keep running LLR tasks to see if they start failing again.
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
|
Yes to both questions
I've put a new version of PPS-LLR for 64 bit Windows on the test server (http://lt-a.primegrid.com/). Could you (or anyone who has this problem) give it a try again? It should fail, just like the previous version, but hopefully it will provide the information necessary to understand what the problem is.
When the task starts up, you may notice a few DOS command windows briefly opening and closing. That's intentional.
Thank you.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
|
I have the 3 second problem for LLR. PPS Sieve still works fine. I added the BOINC directories to the AVG exceptions, removed the PrimeGrid project in BOINC, restarted the PC, then added PrimeGrid back. The problem still occurs. Once in a while a single LLR task will complete successfully.
Roger, I see you ran some tests on the test server, and they (unfortunately) worked. Are you still getting the 3 second error on the real PrimeGrid server?
____________
My lucky number is 75898524288+1 |
|
|
Roger Volunteer developer Volunteer tester
|
I ran my work host on the test server to figure out how to do it. My work host never had the 3 second problem, only my home host. I haven't upgraded the work host at all recently and it runs PPS LLR just fine.
I'll give the home host a go on CompositeGrid shortly and report back. |
|
|
Tyler Project administrator Volunteer tester
Joined: 4 Dec 12 Posts: 1081 ID: 183129 Credit: 1,384,625,026 RAC: 7,097
|
Roger, I see you ran some tests on the test server, and they (unfortunately) worked. Are you still getting the 3 second error on the real PrimeGrid server?
Why unfortunately? Wouldn't it be a good thing that the tests were working? Or is there a big fix you have to implement to fix it?
____________
275*2^3585539+1 is prime!!! (1079358 digits)
Proud member of Aggie the Pew
|
|
|
Michael Goetz Volunteer moderator Project administrator
|
Roger, I see you ran some tests on the test server, and they (unfortunately) worked. Are you still getting the 3 second error on the real PrimeGrid server?
Why unfortunately? Wouldn't it be a good thing that the tests were working? Or is there a big fix you have to implement to fix it?
If you read the whole thread, you'll see that the purpose of the test is to try to find the cause of the problem. If the problem doesn't happen in the test, it can't be fixed.
____________
My lucky number is 75898524288+1 |
|
|
Roger Volunteer developer Volunteer tester
|
Well, it's bad luck then.
I tried CompositeGrid PPS LLR on my home host and it successfully completed a round of tasks. Then I tried PrimeGrid straight away and I got the 3 second errors. I removed the PrimeGrid project and added again and got the 3 second errors. Maybe I have to do a long term CompositeGrid test? The problem only seems to occur after a few days. |
|
|
Michael Goetz Volunteer moderator Project administrator
|
Well, it's bad luck then.
I tried CompositeGrid PPS LLR on my home host and it successfully completed a round of tasks. Then I tried PrimeGrid straight away and I got the 3 second errors. I removed the PrimeGrid project and added again and got the 3 second errors. Maybe I have to do a long term CompositeGrid test? The problem only seems to occur after a few days.
Or, the problem could be in the wrapper and the special wrapper I'm using for the diagnostics isn't the same as the wrapper in production.
I reset the test server to use the production wrapper again. Could you try running a pps-llr task from the test server once more? If it's the wrapper, then the problem should come back.
____________
My lucky number is 75898524288+1 |
|
|
Tyler Project administrator Volunteer tester
|
I've seen a lot of 3 second errors on Sophie Germain... Almost all of my tasks show someone doing it before me with a 3 second error.
____________
275*2^3585539+1 is prime!!! (1079358 digits)
Proud member of Aggie the Pew
|
|
|
Michael Goetz Volunteer moderator Project administrator
|
Well, it's bad luck then.
Actually, this could be exceptionally GOOD luck, if it turns out to be the wrapper. I would not mind if all it takes to fix the problem is installing a new version of the wrapper. I'm not certain if this wrapper is the exact same source as the production wrapper, but it's using different versions of the BOINC libraries and is built with VS2012's run-time libraries rather than VS 2005's, so there are plenty of reasons why a new build of the wrapper could conceivably fix the problem.
Here's to hoping that the problem returns when you try the test server again.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
|
For PPS-LLR Windows *ONLY*, I have installed a new version of the wrapper. This is version 6.21.
Please let me know if anyone has any difficulties running 6.21, if you see any 3 second errors from anyone running 6.21, or if you see a host that had been getting 3 second errors which is now returning good results with 6.21.
If this works, I'll install the new version for the other LLR projects.
Also, please let me know if anyone's aware of this problem occurring on Linux or Mac computers. As far as I know, the problem is limited to Windows.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
|
Unfortunately, I've had to pull the new wrapper. While it seems to have fixed the 3 second error, for at least one computer it created a brand new error.
____________
My lucky number is 75898524288+1 |
|
|
Tyler Project administrator Volunteer tester
|
Unfortunately, I've had to pull the new wrapper. While it seems to have fixed the 3 second error, for at least one computer it created a brand new error.
Whelp.. That didn't last long
____________
275*2^3585539+1 is prime!!! (1079358 digits)
Proud member of Aggie the Pew
|
|
|
Michael Goetz Volunteer moderator Project administrator
|
Unfortunately, I've had to pull the new wrapper. While it seems to have fixed the 3 second error, for at least one computer it created a brand new error.
It turns out the new problem seems to be limited to 32 bit hosts. The two computers I'm aware of that are affected by the new problem (one of which, fortunately, is mine) are both running 32 bit XP.
So, the new app is turned back on for 64 bit Windows, while 32 bit Windows get the old app. I don't yet know if the 3 second bug affects 32 bit hosts.
____________
My lucky number is 75898524288+1 |
|
|
Roger Volunteer developer Volunteer tester
|
All my PPS LLR v6.21 on CompositeGrid worked. Most v6.15 failed. I've stopped crunching there now.
I moved over to PrimeGrid and started receiving PPS LLR v6.15 which are all getting the 3 second error. I have only 64 bit hosts. |
|
|
Michael Goetz Volunteer moderator Project administrator
|
All my PPS LLR v6.21 on CompositeGrid worked. Most v6.15 failed. I've stopped crunching there now.
I moved over to PrimeGrid and started receiving PPS LLR v6.15 which are all getting the 3 second error. I have only 64 bit hosts.
I stopped and started the server -- you should be getting 6.21 now.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
|
Ok, I've got the problem figured out -- VS2012 by default doesn't produce executables that will run on XP. (It's not 32 bits that's the problem; it's XP, and also Windows Server 2k3 that are the problem.)
There's a setting to compile code that's compatible with XP, so I'll rebuild the wrapper, and hopefully that will work on XP and still cure the 3 second bug.
More later...
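A quick way to spot this kind of build problem without an XP machine is to read the minimum subsystem version stored in the executable's PE header. The Python sketch below does that. It rests on an assumption the post does not spell out: that the failure comes from the header requesting a newer subsystem version than the OS provides (as far as I know, VS2012's default toolset requests 6.0, while 32-bit XP is 5.1 and XP x64 / Server 2003 is 5.2). Passing this check is necessary but not sufficient, since run-time library imports can also break XP. The function names are invented for the example.

```python
import struct


def subsystem_version(image: bytes) -> tuple:
    """Return (major, minor) subsystem version from a PE image's optional header."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # The DOS header stores the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # Optional header follows the 4-byte signature and the 20-byte COFF header.
    optional_header = pe_offset + 4 + 20
    # Major/MinorSubsystemVersion sit at offsets 48 and 50 in both PE32 and PE32+.
    return struct.unpack_from("<HH", image, optional_header + 48)


def subsystem_allows(image: bytes, os_version: tuple) -> bool:
    """True if the image's requested subsystem version is not newer than os_version."""
    return subsystem_version(image) <= os_version
```

For a real file, something like `subsystem_allows(open("wrapper.exe", "rb").read(), (5, 1))` would tell you whether the header alone rules out 32-bit XP.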
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
|
That didn't go exactly as planned, but I think everything should now be working for everyone.
Telling VS2012 to build XP-compatible code was supposed to work -- except that with that setting, the boinc libraries didn't compile.
Next I tried compiling in VS2010 mode. That worked, AND the wrapper ran under both Windows 7 64 bit and XP 32 bit. So I installed those here as 6.22...
...and saw that a Windows 2000 host was now failing (that's a 32-bit OS). I pulled the 32-bit 6.22.
Right now, we're running 64-bit 6.22 and the old 32-bit 6.15 on PPS-LLR. It looks like this works for everyone. If this holds up, I'll install 6.22 for 64 bit Windows for all the other LLR projects tomorrow.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
|
Roger, does this version (6.22) fix the 3 second problem on your computer?
____________
My lucky number is 75898524288+1 |
|
|
Roger Volunteer developer Volunteer tester
|
I am remote from my home PC now, but looking at my task list v6.21 worked and v6.22 has the 3 second error. We've found the problem, just have to find the right solution. |
|
|
Michael Goetz Volunteer moderator Project administrator
|
I am remote from my home PC now, but looking at my task list v6.21 worked and v6.22 has the 3 second error. We've found the problem, just have to find the right solution.
I saw that same behavior on another host. So it's not the solution -- but I can now build a version of the wrapper with the debugging code that still has the 3 second bug, so we can use that to hopefully find the problem. I'll put that together, install it on the test server, and let you know when it's ready.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
|
The test server is now running the debug version of the wrapper. Hopefully that will show something useful.
____________
My lucky number is 75898524288+1 |
|
|
Roger Volunteer developer Volunteer tester
|
PPS LLR v6.24 on CompositeGrid looks like no error for my home host. This one has the debugs on. Not sure if you want it to fail or not?
Do you have an estimate for how many hosts are affected by this problem?
Maybe being a Volunteer tester is my niche here at PrimeGrid? Certainly happy to help where I can. |
|
|
Michael Goetz Volunteer moderator Project administrator
|
PPS LLR v6.24 on CompositeGrid looks like no error for my home host. This one has the debugs on. Not sure if you want it to fail or not?
Do you have an estimate for how many hosts are affected by this problem?
Maybe being a Volunteer tester is my niche here at PrimeGrid? Certainly happy to help where I can.
That is surprising. I've put the test server back to the same version that's currently running on the real server -- it's 6.23 on the test server. I want to see if that fails on your machine. The only difference between 6.23 and 6.24 is that 6.24 prints a lot of information to stderr.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
|
It looks like your computer is failing on 6.23 but succeeding on 6.24, which on the surface doesn't make a lot of sense.
This is going to need some more thought.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
|
Roger, for now, I've shut down the test server until I think of another way to proceed. I've got a few ideas, but I need to think about them a bit. Thanks for your help!
____________
My lucky number is 75898524288+1 |
|
|
Roger Volunteer developer Volunteer tester
|
PPS LLR v6.25 on CompositeGrid and PPS LLR v6.22 on PrimeGrid are giving me the 3 second errors. Note that I am running a 50G Factorial Sieve on the GPU in the background (shouldn't be affecting PPS LLR though). |
|
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 940 ID: 3110 Credit: 265,153,553 RAC: 110,745
|
...and saw that a Windows 2000 host was now failing (that's a 32-bit OS). I pulled the 32-bit 6.22.
I have a Windows 2000 VM I can use to test your wrappers if you want. But note that Windows 2000 is unsupported, and has been for a long time.
____________
|
|
|
Tyler Project administrator Volunteer tester
|
I've started to get these errors on my machine: Windows 7 Pro 64 bit, 24 GB RAM, ONLY on pps_llr_xxxxxxxx, but not on pps_llr_extended_xxxxxxxx... This only started after I had reset my project from BOINC Manager. Hope this info helps. I'm on the newest version of BOINC Manager.
EDIT: One of my WUs http://www.primegrid.com/workunit.php?wuid=336546705 has been completed 5 times, all with the 3 second error.
____________
275*2^3585539+1 is prime!!! (1079358 digits)
Proud member of Aggie the Pew
|
|
|
JimB Honorary cruncher
Joined: 4 Aug 11 Posts: 920 ID: 107307 Credit: 989,553,981 RAC: 23,780
|
I've started to get these errors on my machine: Windows 7 Pro 64 bit, 24 GB RAM, ONLY on pps_llr_xxxxxxxx, but not on pps_llr_extended_xxxxxxxx... This only started after I had reset my project from BOINC Manager. Hope this info helps. I'm on the newest version of BOINC Manager.
EDIT: One of my WUs http://www.primegrid.com/workunit.php?wuid=336546705 has been completed 5 times, all with the 3 second error.
That's not a 3-second error. That's a workunit that has a small factor (3) and should have been sieved out ages ago. The validator wasn't handling them properly - it understood small factors but didn't like how fast they were being processed. That problem has now been fixed and we're looking into the sieve file used to generate it.
By the way, when I say it's fixed, everything will work properly when the next host reports and causes the validator to look at that workunit again. I just manually triggered validation on the one you referenced and it validated.
The 3-second problem will always produce a Validate error.
[later edit]
Since those jobs take such a short time, they will now give 0.05 credits, which given that they take about 2-3 seconds is quite fair. Jobs already validated under the old rules will retain the full credit they were given. |
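To see why a candidate with a small factor finishes in a couple of seconds, note that divisibility of k*2^n+1 by a small number p can be checked with modular arithmetic alone, without ever building the million-digit number. The Python sketch below illustrates the arithmetic only; it is not the project's sieve or validator code, and the function names are invented for this example.

```python
def proth_remainder(k: int, n: int, p: int) -> int:
    """Return (k*2^n + 1) mod p using modular exponentiation."""
    return (k * pow(2, n, p) + 1) % p


def smallest_factor_below(k: int, n: int, limit: int):
    """Return the smallest p in [2, limit) dividing k*2^n + 1, or None if there is none.

    The smallest divisor greater than 1 of any integer is prime, so trying
    every p in order is enough; no separate primality check on p is needed.
    """
    for p in range(2, limit):
        if proth_remainder(k, n, p) == 0:
            return p
    return None
```

For example, 7*2^3+1 = 57 = 3*19, so `smallest_factor_below(7, 3, 10)` returns 3. A candidate like that should never reach LLR because sieving is supposed to remove it first.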
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
PPS LLR v6.25 on CompositeGrid and PPS LLR v6.22 on PrimeGrid are giving me the 3 second errors. Note that I am running a 50G Factorial Sieve on the GPU in the background (shouldn't be affecting PPS LLR though).
Roger (or anyone else with the 3 second problem), when you get a chance, could you try running some test PPS-LLR work units from the test server? It's back to 6.15, but I've changed the workunit definition. It's a shot in the dark, but if it works, then I think I know what the problem is.
Thanks.
____________
My lucky number is 75898524288+1 |
|
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
|
I just tried the PPS LLR v6.15 on CompositeGrid and it seems to be working OK. |
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
I just tried the PPS LLR v6.15 on CompositeGrid and it seems to be working OK.
That's interesting and encouraging. More later...
____________
My lucky number is 75898524288+1 |
|
|
|
Just tried 4 of the PPS-LLR 6.15 tasks from the test server. All of them failed with the 3 second error.
Since my last post, and after a round of 6 more WOO cleanup tasks finished, I've had another 3 second episode on various LLR tasks. In one episode on TRP-LLR 6.15, the first 2 tasks failed at 3 seconds and the fourth one ran fine to completion.
Once again a reinstall of the BOINC Manager seemed to solve the problem. |
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
Paperboy, I'm not sure why your computer still got the 3 second problem on the test server, but I made the same change on the live server about 20 minutes ago on PPS-LLR. Since then, there have been ZERO 3 second errors on PPS-LLR, but we're continuing to get them on other LLR projects.
I'll let it run a little longer, but I'm hopeful this is the answer. Of course, I've said that before. :)
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
I think this problem is fixed for good. I changed the work generator about 45 minutes ago, which affects all LLR projects, and also modified all existing PPS-LLR workunits in case any of them got resent if an error was returned by a host.
Since I made that change, there have been no occurrences of this error, so I've gone ahead and also modified all existing LLR workunits on the other projects.
The workunits were specifying a memory limit that was right at the boundary of what's needed to run, so on some computers, some of the time, it was being exceeded.
Raising the memory limit solved the problem.
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
And, once again -- still not fixed.
____________
My lucky number is 75898524288+1 |
|
|
|
And, once again -- still not fixed.
Mike, how many times have you slapped your forehead after you think it's fixed, only to find out it's not? >:\
EDIT: By the way I've never had one of those 3 second errors.
____________
Largest Primes to Date:
As Double Checker: SR5 109208*5^1816285+1 Dgts-1,269,534
As Initial Finder: SR5 243944*5^1258576-1 Dgts-879,713
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
And, once again -- still not fixed.
Mike, how many times have you slapped your forehead after you think it's fixed, only to find out it's not? >:\
EDIT: By the way I've never had one of those 3 second errors.
Too many, certainly.
It's not affecting a lot of computers, which is both good, and bad. It's bad because it's very hard to diagnose. What's worse is the problem is so fickle; even minor changes make it go away.
____________
My lucky number is 75898524288+1 |
|
|
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3247 ID: 50683 Credit: 152,646,050 RAC: 18,212
|
This host has downloaded more than 100,000 WUs and 99.99% are errors...
http://www.primegrid.com/show_host_detail.php?hostid=339862
____________
92*10^1585996-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! |
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
This host has downloaded more than 100,000 WUs and 99.99% are errors...
http://www.primegrid.com/show_host_detail.php?hostid=339862
And that tells us what that we don't already know? Yes, hosts that are experiencing this problem will generate many errors. That's of no interest to anyone other than the people that own those hosts.
The number of errors has a negligible effect on anyone else.
There are currently 12,811 active hosts. Of those, 51 have llr validation errors with less than 10 seconds of cpu time, which is probably a decent way of identifying the problem tasks. Lots of those errors are caused by other miscellaneous problems, however, so you want to look at those that have a multitude of these errors. This problem causes 100% errors on the affected hosts, so if there's only a few errors, it's probably something else.
Of those 51, 12 have more than 100 such errors. So there's a total of 12 hosts, out of 12,000, which are experiencing this problem. 99.9% of hosts are fine.
99.9% success is great -- unless your computer is one of the 0.1% experiencing this problem. It's something that needs to be fixed. If I knew what the problem was, I'd fix it. I haven't given up trying, but it's very elusive.
Of those 12, 11 are 64 bit Windows 7 or Windows 8 computers. 1 is a Linux computer which might or might not be a different problem.
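For anyone who wants to check the arithmetic behind those percentages, here is a minimal Python sketch. The counts are the ones quoted above; the names are only illustrative, and this is not the query that was run against the project database.

```python
# Counts quoted in the post above; all names here are illustrative.
ACTIVE_HOSTS = 12_811          # currently active hosts
HOSTS_WITH_SHORT_ERRORS = 51   # hosts with LLR validate errors under 10 s of CPU time
HOSTS_OVER_100_ERRORS = 12     # of those 51, hosts with more than 100 such errors
AFFECTED_WIN7_OR_WIN8 = 11     # of those 12, hosts on 64-bit Windows 7 or 8


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


affected_pct = percent(HOSTS_OVER_100_ERRORS, ACTIVE_HOSTS)
healthy_pct = 100.0 - affected_pct
windows_pct = percent(AFFECTED_WIN7_OR_WIN8, HOSTS_OVER_100_ERRORS)

print(f"hosts likely affected: {affected_pct:.2f}%")
print(f"hosts unaffected:      {healthy_pct:.1f}%")
print(f"affected on Win 7/8:   {windows_pct:.0f}%")
```

The 100-error cutoff is the rough filter described above for separating this bug from miscellaneous one-off failures; a different cutoff would give a somewhat different host count.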
____________
My lucky number is 75898524288+1 |
|
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
|
My 6 core Windows 8 64 bit host was one of the afflicted 12 Werehosts, but I fixed it by:
- Uninstalling BOINC (through Control Panel, Programs and Features)
- Deleting the BOINC data directory (C:\ProgramData\BOINC)
- Reinstalling BOINC (v7.0.64 x64)
- a.k.a Silver Bullet
I've tested for a week with the SR5 challenge and PPSE WUs and no errors!
Previously most of my LLR tasks would finish in 3 seconds.
With one LLR WU left it would sometimes complete correctly. |
|
|
|
ALL tasks are coming back incorrect on my computer.
Only the sieve tasks are running well.
I have too many tasks returned as incorrect, and the machine is not overheating.
One task = 2 seconds.
PPS LLR tasks are correct,
but not the others, such as Sierpinski/Riesel Base 5 Problem (LLR).
I've stopped it now!!!
All tasks are incorrect, ALL.
What is wrong?
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
what is wrong ?
The "3-second bug."
We're not sure what causes this. Suggested action: Uninstall BOINC and delete the BOINC directories, especially the data directory in C:\ProgramData. Then re-install BOINC. Hopefully this will fix the problem (it did for one person.)
This affects a very small number of computers, but we've been unable to identify the cause.
____________
My lucky number is 75898524288+1 |
|
|
|
The "3-second bug."
Is it my imagination or are all these 3 second bugs on Windows 8 machines? All the ones I've checked at random seem to be Windows 8 machines, and I wonder if there is a protection/permission problem that may not have been ironed out.
____________
My lucky numbers are 121*2^4553899-1 and 3756801695685*2^666669±1
My movie https://vimeo.com/manage/videos/502242 |
|
|
|
Mike,
I've uninstalled and reinstalled BOINC, and it is running well now.
All problems are gone.
Thanks.
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
Mike,
I've uninstalled and reinstalled BOINC, and it is running well now.
All problems are gone.
Thanks.
That's great news. Perhaps now we have a method for fixing this problem!
____________
My lucky number is 75898524288+1 |
|
|
|
Not so sure the "Uninstall / Reinstall" is a fix. I went that route and it didn't work for me. Matter of fact, it made it worse. I couldn't run a single LLR at all and my previous method of Reinstall as a Repair was useless. I was, however, able to return to some normalcy with going to BOINC 7.1.17. Still had to monitor it but when the "3 second bug" arose I could once again fix it temporarily with my previous "Reinstall / Repair" method.
Pooh Bear, I think you're on to something with the Windows 8 theory. Add that to the start of the problem during the "Low Country - Woodall" Challenge and maybe it can help to better pinpoint what the cause may be. |
|
|
|
You might find, as I did, that all is OK until you reboot the machine. After that any new workunits will fail after 3 seconds. That's what consistently happens on my machine.
In addition, I found you don't need to actually delete the BOINC data directory. After a reboot I set "no new work", allow any existing tasks to complete, uninstall and reinstall BOINC from the Control Panel and then allow new work.
64 bit Windows 8 here as well...
FWIW, I've started a couple of SoB tasks to see what happens with them after a reboot.
Ian |
|
|
|
As I am not seriously running BOINC any longer, and do see the 3 second problem, I thought I'd try a little test.
Yesterday I installed BOINC on my 4 core 64 bit Windows 8 machine. I set my CPU processor usage preference to 25% and allowed it to download a single SoB task. It started up and worked as expected.
Some hours later I set my CPU processor pref to 50% and allowed another SoB task to download. This started up and worked as expected.
Last night both tasks were progressing (~ 7% and 2%) so I shut down BOINC and then turned off the computer for the night.
This morning I restarted the system and both tasks continued from where they had finished the previous night.
I then set my CPU processor preferences to 75% and allowed another task to download. This failed after 3 seconds. One more task downloaded and also failed after 3 seconds.
I then "no new worked" and set the CPU processor usage back to 50%.
Going from my previous experiences, to get back to processing new tasks I will need to allow the current tasks to complete (or abort them of course) and then uninstall/reinstall BOINC (no folder deletions necessary).
Ian |
|
|
|
OK: today the problems are back.
This is a Windows 8 problem, perhaps in combination with the BOINC Manager, and maybe the PG server? I don't know,
but I think so.
All LLR tasks run for 3.0 seconds and abort.
In other BOINC projects everything is OK.
Mike,
Can you make a special new Windows 8 test version for downloads and uploads on the PG server?
____________
|
|
|
|
Mike,
If you restart your computer, then all tasks go wrong.
(Only the GPU is running.)
The BOINC Manager is fine; all tasks are running in other BOINC projects.
Can you give me these download settings again for my computer? I deleted them. One day ago I did not have this problem with Windows 8.
Here: http://www.primegrid.com/forum_thread.php?id=3982&nowrap=true#65322
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
Can you make a special new Windows 8 test version for downloads and uploads on the PG server?
Can you give me these download settings again for my computer? I deleted them. One day ago I did not have this problem with Windows 8.
I'm afraid I don't understand either question. What special Windows 8 test version? What download settings?
____________
My lucky number is 75898524288+1 |
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
I'm not certain it's exclusively a Windows 8 problem, but if it isn't, it's definitely a lot more prevalent on Windows 8. This first chart is LLR tasks that had an error with less than 3 seconds of CPU time. Note that Win 8 is about 90% of the total:
+-------+------------------------------------+
| cnt   | os_name                            |
+-------+------------------------------------+
| 75435 | Microsoft Windows 8                |
|  5607 | Microsoft Windows 7                |
|  1519 | Linux                              |
|   140 | Microsoft Windows XP               |
|    17 | Microsoft Windows Vista            |
|     5 | Darwin                             |
|     4 | Microsoft Windows 8 Server         |
|     3 | Microsoft Windows Server 2008 "R2" |
|     1 | Microsoft Windows Server 2008      |
|     1 | Microsoft Windows Server 2003 "R2" |
+-------+------------------------------------+
The second chart is LLR tasks that were successfully validated. Note that Win 8 is closer to 10%:
+--------+-------------------------------------+
| cnt    | os_name                             |
+--------+-------------------------------------+
| 405298 | Microsoft Windows 7                 |
| 192361 | Microsoft Windows Server 2003 "R2"  |
| 123889 | Linux                               |
|  42112 | Microsoft Windows Server 2008 "R2"  |
|  42107 | Microsoft Windows 8 Server          |
|  32734 | Microsoft Windows 8                 |
|  29324 | Microsoft Windows XP                |
|  26232 | NULL                                |
|  19121 | Darwin                              |
|  11076 | Microsoft Windows Vista             |
|   5354 | Microsoft Windows Server 2008       |
|   4636 | Microsoft Windows Server 2003       |
|   4390 | Microsoft                           |
|   3801 | Microsoft Windows Server 2012       |
|   2002 | Microsoft Windows Server "Longhorn" |
|    643 |                                     |
|    549 | Microsoft Windows 2000              |
|    182 | FreeBSD                             |
|      1 | Microsoft Windows NT                |
+--------+-------------------------------------+
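The shares can be recomputed from the two charts with a short Python sketch. The counts are copied from the tables above; the dictionary and function names are illustrative, and the "(blank)" key stands in for the row with an empty os_name. Whether "Microsoft Windows 8 Server" should be counted together with "Microsoft Windows 8" is a judgment call, so both figures are computed.

```python
# Counts copied from the two charts above; names are illustrative only.
short_errors = {
    "Microsoft Windows 8": 75435, "Microsoft Windows 7": 5607, "Linux": 1519,
    "Microsoft Windows XP": 140, "Microsoft Windows Vista": 17, "Darwin": 5,
    "Microsoft Windows 8 Server": 4, 'Microsoft Windows Server 2008 "R2"': 3,
    "Microsoft Windows Server 2008": 1, 'Microsoft Windows Server 2003 "R2"': 1,
}
validated = {
    "Microsoft Windows 7": 405298, 'Microsoft Windows Server 2003 "R2"': 192361,
    "Linux": 123889, 'Microsoft Windows Server 2008 "R2"': 42112,
    "Microsoft Windows 8 Server": 42107, "Microsoft Windows 8": 32734,
    "Microsoft Windows XP": 29324, "NULL": 26232, "Darwin": 19121,
    "Microsoft Windows Vista": 11076, "Microsoft Windows Server 2008": 5354,
    "Microsoft Windows Server 2003": 4636, "Microsoft": 4390,
    "Microsoft Windows Server 2012": 3801,
    'Microsoft Windows Server "Longhorn"': 2002, "(blank)": 643,
    "Microsoft Windows 2000": 549, "FreeBSD": 182, "Microsoft Windows NT": 1,
}


def share(counts: dict, names: list) -> float:
    """Percentage of all tasks in counts that belong to the named OS rows."""
    return 100.0 * sum(counts.get(n, 0) for n in names) / sum(counts.values())


win8 = ["Microsoft Windows 8"]
win8_family = ["Microsoft Windows 8", "Microsoft Windows 8 Server"]

error_share = share(short_errors, win8)
validated_share = share(validated, win8)
validated_share_family = share(validated, win8_family)

print(f"Win 8 share of short errors:    {error_share:.1f}%")
print(f"Win 8 share of validated tasks: {validated_share:.1f}%")
print(f"...including Win 8 Server:      {validated_share_family:.1f}%")
```

These are task counts, not host counts, so a handful of hosts failing thousands of tasks each can dominate the first chart.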
____________
My lucky number is 75898524288+1 |
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2417 ID: 1178 Credit: 20,022,333,648 RAC: 20,456,197
|
I am curious if anyone with Win8 has tried turning off the User Account Control (not advised for security reasons) to see if that would solve the 3 second issue. That would certainly identify things as a permissions issue (and as I recall was often needed when Win7 first was introduced).
|
|
|
|
Disabled UAC, rebooted (just to be safe) and enabled downloading new SoB tasks. Two tasks failed, the only difference was that the first failed in 4 seconds rather than the usual 3.
FWIW, if it was UAC/permissions or whatever then I would expect it to fail on all tasks, not just those downloaded after a system restart. |
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
Disabled UAC, rebooted (just to be safe) and enabled downloading new SoB tasks. Two tasks failed, the only difference was that the first failed in 4 seconds rather than the usual 3.
FWIW, if it was UAC/permissions or whatever then I would expect it to fail on all tasks, not just those downloaded after a system restart.
My guess is it's got something to do with some background process, such as virus scanning, indexing, backups, etc. Virus scanners are unlikely since one of the first things we checked was which scanner afflicted computers were using, and there wasn't any commonality amongst the anti virus products.
The LLR apps -- and, more importantly, the wrapper used by them to launch LLR -- date back many years and the wrapper manually moves files from the project directory into the slot directory. This is an antiquated way of doing things, and there may be some reason why that's not playing nicely with Windows 8. More modern apps would simply configure the app description to have BOINC automatically copy (or symlink) the files into the slot directory.
It's not at all clear, however, what exactly the problem might be.
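To make the "manually moves files" pattern concrete, here is a rough Python sketch of staging input files from a project directory into a slot directory. This is only an illustration under stated assumptions: the real wrapper is not written in Python, and the function name, directory layout and file name below are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def stage_inputs(project_dir, slot_dir, names):
    """Copy each named input file from project_dir into slot_dir (hypothetical helper)."""
    project_dir, slot_dir = Path(project_dir), Path(slot_dir)
    slot_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for name in names:
        source = project_dir / name
        if not source.is_file():
            raise FileNotFoundError(f"missing input file: {source}")
        target = slot_dir / name
        shutil.copy2(source, target)
        # Fail loudly if the copy is incomplete instead of letting the app start.
        if target.stat().st_size != source.stat().st_size:
            raise OSError(f"incomplete copy: {target}")
        staged.append(target)
    return staged


# Demo in a throwaway directory; the layout and file name are made up.
with tempfile.TemporaryDirectory() as root:
    project = Path(root) / "projects" / "example_project"
    slot = Path(root) / "slots" / "0"
    project.mkdir(parents=True)
    (project / "llr_input.txt").write_text("demo input\n")
    staged = stage_inputs(project, slot, ["llr_input.txt"])
    staged_names = [p.name for p in staged]
    staged_text = staged[0].read_text()

print(staged_names)
```

If a copy like this were blocked or interrupted by something else on the machine, the app would start without its inputs and exit almost immediately, which would at least be consistent with a failure after a few seconds. That is speculation, not a diagnosis.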
____________
My lucky number is 75898524288+1 |
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2417 ID: 1178 Credit: 20,022,333,648 RAC: 20,456,197
|
I wonder if the old wrapper is not playing nicely with Windows 8's new AppContainer security level. This site and this site may be informative for thinking about this issue.
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
I wonder if the old wrapper is not playing nicely with Windows 8's new AppContainer security level. This site and this site may be informative for thinking about this issue.
Hmmmm.
So while it should be possible to run standard desktop applications in Windows 8 using the new sandbox, unless specifically designed to work with AppContainer, only the most basic of programs are likely to run without severely compromised functionality.
When installing BOINC, isn't there an option about installing BOINC in a sandboxed, protected mode? I've never bothered with that, so I'm not sure what it's doing, exactly, but if it's doing what that article is describing, it would be a problem.
On the other hand, I seem to recall that using that protected BOINC installation also precluded using GPUs, so for this to be the cause, either that restriction has been overcome, or nobody with the 3 second bug is also running a GPU.
That being said, this appears to apply only (mostly?) to Metro apps, so it's not at all clear if or why this should affect pre-Metro programs. If it did, absolutely nothing even slightly more complex than "Hello world!" would work on Windows 8. It's also perplexing why this would be causing problems only some of the time.
____________
My lucky number is 75898524288+1 |
|
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2417 ID: 1178 Credit: 20,022,333,648 RAC: 20,456,197
|
That being said, this appears to apply only (mostly?) to Metro apps, so it's not at all clear if or why this should affect pre-Metro programs. If it did, absolutely nothing even slightly more complex than "Hello world!" would work on Windows 8. It's also perplexing why this would be causing problems only some of the time.
I was thinking the same way at first, but then I thought a bit more about what you said a couple of messages above about the wrapper moving files. The wrapper is certainly a pre-Metro program, but if it is using some piece of Windows Explorer (or other Windows component) in the moving process, could that be treated more like a modern application? |
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
That being said, this appears to apply only (mostly?) to Metro apps, so it's not at all clear if or why this should affect pre-Metro programs. If it did, absolutely nothing even slightly more complex than "Hello world!" would work on Windows 8. It's also perplexing why this would be causing problems only some of the time.
I was thinking the same way at first, but then I thought a bit more about what you said a couple of messages above about the wrapper moving files. The wrapper is certainly a pre-Metro program, but if it is using some piece of Windows Explorer (or other Windows component) in the moving process, could that be treated more like a modern application?
It's not. It uses standard C RTL calls, and if those didn't work, Windows 8 wouldn't have ever made it out of the lab.
____________
My lucky number is 75898524288+1 |
|
|
|
I did the Father's Day Challenge with Windows 8, and now???
From December until today everything was OK, but now it is failing.
____________
|
|
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
|
I am sorry to report the same: restart the PC and the 3 second error comes back for LLR. It is fixed by uninstalling BOINC, deleting the BOINC data directory and reinstalling BOINC, but that is a hassle to do every time you restart your PC.
We have a workaround, sure, but it is so repeatable that there must be a fix. Yeah, I am on Windows 8. |
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
That's good news, of a sort. This probably means that the 'success' people were experiencing on the test server wouldn't have survived a reboot, and that in turn means we might be able to replicate the problem more easily on the test server.
Unfortunately, I'm going to be too busy to work on this for a little while. Complicating the problem is that I don't have a Windows 8 machine, so I can't test this myself.
____________
My lucky number is 75898524288+1 |
|
|
|
1. Uninstall BOINC and delete all of its data from your PC
2. Restart your PC
3. Install BOINC
4. Run the installer exe "as Admin"
5. Install
6. Crunch
7. Ready
Some LLR tasks run, some do not!
Windows 8?
Better to crunch only with the GPU.
____________
|
|
|
|
I know that the 3 second error has happened on other OSes, but quite a few cases are on Windows 8. I have now found another project that has problems with Windows 8. There must be a service in Windows 8 that might be causing these issues. I am unsure if a new wrapper is needed or if BOINC itself needs to adjust properties on the folders or something similar.
The issue I saw on another project is with a type of unit that runs about 12 hours, with no checkpoint or percentage graph shown during processing. You can lose many hours if you stop and restart BOINC because there is no checkpoint. The issue is that the unit runs nearly 2 hours and then gets stuck. It has happened more than once. I have not tried too many times because it is a waste of processing power, but there seems to be something different about Windows 8 that probably needs to be addressed somewhere. This is not the first OS change that has caused issues. Remember Vista?
____________
My lucky numbers are 121*2^4553899-1 and 3756801695685*2^666669±1
My movie https://vimeo.com/manage/videos/502242 |
|
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
|
Extra step, installing BOINC as Admin, seems to have worked for me too. Awesome.
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
Extra step, installing BOINC as Admin, seems to have worked for me too. Awesome.
Do you mean you installed BOINC while logged into Windows as Administrator, or are you referring to one of the BOINC installation options?
____________
My lucky number is 75898524288+1 |
|
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
|
I right-clicked the BOINC installer and chose "Run as administrator". No LLR 3 second errors even after a computer restart. |
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
I right-clicked the BOINC installer and chose "Run as administrator". No LLR 3 second errors even after a computer restart.
Anybody able to verify that this works for them? Neither Jim nor I have any machines running Windows 8.
____________
My lucky number is 75898524288+1 |
|
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
|
Well, it started off working and then I left it for a few hours, and now it's gone back to having 3 second LLR errors.
@T.Armstrong: Did you install logged in as Admin or do as I did and install BOINC with the "Run as administrator" option? Is yours still working? |
|
|
|
Roger,
"Run as Administrator" (not "as Admin", sorry). Server DNS boot your PC as "Your" PC... and? No problems. So you're allowed, hehe.
... and run
"fascinated"
My UOTD "111"? It's a prime!
____________
|
|
|
|
No, that's the same problem:
3 second tasks.
____________
|
|
|
|
Please note:
For the Perseid Shower Challenge project
you can crunch with the CPU
without '3 second' errors.
This is a sieve (The Riesel Problem (Sieve)) with CPU-only tasks.
LLR is not running, but the sieve works on CPUs.
____________
|
|
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
|
Server DNS boot your PC as " Your" PC ...
@T.Armstrong: Can you explain this step? Not sure what you mean.
Thanks |
|
|
|
Roger,
That step does nothing for the '3 second' error problem!
"As Admin" or "as Administrator" or not, it's the same.
Sorry.
...we still have a '3 second' error problem... hm
____________
|
|
|
GDB
Joined: 15 Nov 11 Posts: 304 ID: 119185 Credit: 4,281,187,359 RAC: 1,751,930
|
I've been running BOINC ver. 7.0.64 (x64) on a new Win 8 machine without 3 sec. errors on LLR and sieve tasks. Why have so many other people had problems? |
|
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
|
See Message 66290 above. "So there's a total of 12 hosts, out of 12,000, which are experiencing this problem. 99.9% of hosts are fine."
I ran Win 8 for months before I encountered the 3 second problem. Now I can't get rid of it. I can temporarily fix it, until the next PC reboot, then it occurs again. |
|
|
GDB
Joined: 15 Nov 11 Posts: 304 ID: 119185 Credit: 4,281,187,359 RAC: 1,751,930
|
What do you do to "temporarily fix it"? |
|
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
|
To temporarily fix:
- Uninstall BOINC (through Control Panel, Programs and Features)
- Delete the BOINC data directory (C:\ProgramData\BOINC)
- Reinstall BOINC (v7.0.64 x64, right click the BOINC installer and choose "Run as administrator")
Works until the next PC reboot. |
|
|
GDB
Joined: 15 Nov 11 Posts: 304 ID: 119185 Credit: 4,281,187,359 RAC: 1,751,930
|
Ouch! That's a huge hassle to get PrimeGrid to work again. You said you ran for quite a while before you started getting 3-sec errors? And you have no idea what may have triggered it to happen? Nothing was installed, or updated, or setting changed? |
|
|
|
End of these problems:
My LLRs are running fine.
Step one:
Uninstall the BOINC Manager.
Step two:
Install the BOINC Manager.
Do not close the BOINC Manager.
If it has been closed, go to "Downloads", click the installer EXE to reinstall, and select "Repair"; the Manager window opens again.
Do not close the Manager and do not shut down the computer, and all LLRs run normally and fine.
____________
|
|
|
|
End of these problems:
I have 124 LLRs done.
My LLRs are running fine.
Step one:
Uninstall the BOINC Manager.
Step two:
Reinstall the BOINC Manager.
Never close the BOINC Manager.
If it has been closed, go to "Downloads", click the installer EXE to reinstall, and select "Repair"; the BOINC Manager window will open again.
Do not close the Manager and do not shut down the computer, and all LLRs run normally and fine.
You can see it on my profile, or here: http://www.primegrid.com/results.php?hostid=374938
You see: all LLRs are running well.
Please note:
If you have closed the BOINC Manager, open it only via "Repair", or you will have the 3 second problem again!!
@Mike,
Please test it and take a look to see if that's right!
On my 321 LLRs I am 21% done and they are running fine.
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
@Mike,
Please test it and take a look to see if that's right!
On my 321 LLRs I am 21% done and they are running fine.
That's certainly good news!
I can't test your fix because none of us have ever been able to recreate this problem on our computers.
____________
My lucky number is 75898524288+1 |
|
|
|
Mike,
This is going well
____________
|
|
|
|
Instructions to start Windows 8 without 3 second errors:
Start your PC.
Under Downloads, select the BOINC exe.
Always select "Repair" to open the BOINC Manager window.
Once it is running, never close the BOINC Manager...
That way you never get this 3 second error.
Look, the LLRs are running well: http://www.primegrid.com/results.php?hostid=374938
____________
|
|
|
|
It has worked for me for weeks, tested on 2 PCs.
So always start the BOINC Manager from the exe.
And always choose "Repair".
This way there are no errors at all, however small.
It really works.
Note: don't start the BOINC client from Windows; start the BOINC Manager with the BOINC exe and then choose "Repair". The BOINC window will then open automatically. Now you can work fine with your BOINC Manager, without these 3 second errors.
The exe is under "Downloads". Only from there can you open the BOINC Manager with "Repair".
It really is a crazy way, but it works.
You don't have to be a professor to do this. It is easy, only 5 mouse clicks.
Armstrong
____________
|
|
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
|
The exe is under "Downloads". Only from there can you open the BOINC Manager with "Repair".
Armstrong
The installer program is downloaded and saved in:
C:\Users\Roger\Downloads\boinc_7.0.64_windows_x86_64.exe
@Armstrong: Are you saying to start BOINC from the installer and choose repair rather than choose BOINC Manager through the start menu or icon on the taskbar?
Like this:
I will give it a try.
____________
|
|
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
|
I will give it a try.
I used the BOINC installer to "repair" 5 times with SGS and PPSE LLR WUs after computer restarts and it worked each time. Then I started BOINC through the Taskbar icon after a computer restart and after a short while it failed, proving the "repair" method didn't give a false positive. Even after it breaks, a "repair" is all that's needed. Forget the old routine of uninstalling, deleting the data directory and reinstalling; it is unnecessary.
With the "repair" method you don't have to reinstall and reconfigure BOINC and lose long LLR WUs. This is a good workaround, but BOINC still needs to be fixed in the long term.
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
I will give it a try.
I used the BOINC installer to "repair" 5 times with SGS and PPSE LLR WUs after computer restarts and it worked each time. Then I tried it through the Taskbar icon after a computer restart and after a short while it failed, proving the "repair" method didn't give a false positive.
With the "repair" method you don't have to reinstall and reconfigure BOINC and lose long LLR WUs. This is a good workaround, but BOINC still needs to be fixed in the long term.
It would be really useful to know what the "repair" is "fixing".
This is only happening to a small number of computers, so I suspect there's something that's modifying the BOINC installation somehow.
____________
My lucky number is 75898^524288+1 |
|
|
|
Roger,
here are my perfect LLRs: http://www.primegrid.com/results.php?hostid=374938&offset=0&show_names=0&state=0&appid=19
and one SOB: http://www.primegrid.com/results.php?hostid=374938&offset=0&show_names=0&state=3&appid=13
I can't run a test with SOB; I have 800 other tasks in my cache.
If I start BOINC through the taskbar icon after a computer restart, I get nothing but errors. So I will never use the taskbar icon.
Yes, your picture with the "Repair" option shows the right way. Do it! When it has finished, open the BOINC Manager and you can run all the LLRs.
If you shut down your PC and start it again later, find the BOINC installer EXE, click Repair again, and the tasks carry on!
Going through the Repair option is the right way to crunch happily,
with absolutely no problems or errors. Roger, after restarting your PC you must run the BOINC installer EXE and choose Repair; the BOINC Manager then opens automatically. But don't close the BOINC Manager window; minimize it instead. Use the minimize button on the BOINC window, not the "X" that closes it.
If you have questions, ask me by private message.
It is the craziest way to get rid of the errors, but it works.
@Mike,
The magic is somewhere in Repair, but what is it?
What is the secret fix that "Repair" applies?
Who is the expert who can find it? An IT man?
Armstrong
No 3 second errors
-----------------------------------------------------------------------
Comparing what the installer does with what Repair does
should reveal the missing file, which must then be added to the installer. The right file is not in the installer.
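One way to act on the comparison suggested above is to record the state of the BOINC folders while the errors are occurring, run "Repair", and record it again. The following is only a minimal sketch of that idea: the function names `snapshot` and `diff_snapshots` are made up for illustration, and which folders to point it at (program folder, data folder) is left to the reader, since the thread does not say where the difference lies.

```python
import hashlib
import os


def snapshot(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            rel = os.path.relpath(full, root)
            try:
                with open(full, "rb") as handle:
                    digests[rel] = hashlib.sha256(handle.read()).hexdigest()
            except OSError:
                # Locked or unreadable files are recorded rather than skipped.
                digests[rel] = "<unreadable>"
    return digests


def diff_snapshots(before, after):
    """Return (added, removed, changed) lists of relative paths, each sorted."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return added, removed, changed
```

Call `snapshot()` on the same folder before and after the repair and pass both results to `diff_snapshots()`. Note the limits of this approach: it only sees file contents, so a repair that changes registry entries, services or folder permissions would show no difference here.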
____________
|
|
|
|
Roger,
I will test SGS for you; one moment, please.
Don't open BOINC with an icon; open it with the installer EXE and the Repair option.
When the SGS tasks are finished I will post the link here, and you will see: NO ERRORS.
UPDATE
Some SGS tasks are running now (10% done). During this test I shut down and restarted my PC 10 times (BOINC EXE -> Repair). When I'm finished I will post the link to the completed tasks here. Please wait 30 minutes.
Link: http://www.primegrid.com/results.php?hostid=374938&offset=0&show_names=0&state=3&appid=2
OK: ten shutdowns and restarts.
Some tasks are pending, but you can see that it works.
Please feel free to do the same.
(That's the American way of life :) ) A crazy way, but good for kicking out these 3 second errors.
If you want, I will test PPSE LLR for you. Just write.
____________
|
|
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
|
But don't close the BOINC Manager window; minimize it instead. Use the minimize button on the BOINC window, not the "X" that closes it.
I use the "X" to close the BOINC Manager, then open it again with the icon in the taskbar, and it works without a problem for me. I only start getting 3 second errors when I restart the PC. |
|
|
|
OK, Roger
My question: does the error come first?
Please note:
Step one: restart your PC.
When it is ready, don't wait; click on the BOINC installer EXE
and open the Manager with the "Repair" option, not with an icon.
Step two:
Please report the result; write what happened.
I have restarted 20 times now, and I have 120 SGS LLR test tasks finished without errors.
My SGS LLR errors (you will see nothing): http://www.primegrid.com/results.php?hostid=374938&offset=0&show_names=0&state=5&appid=2
and here are the perfect tasks: http://www.primegrid.com/results.php?hostid=374938&offset=0&show_names=0&state=3&appid=2
But now I must do my 769 TRP sv tasks; I have 3 days until the deadline. After those 3 days I can test PPSE and SOB.
____________
|
|
|
|
Roger
I see you have 5 errors, ähm, hehe
http://www.primegrid.com/results.php?hostid=405102&offset=0&show_names=0&state=5&appid=2
That is nothing compared with the 3 second problem; other members have 500 errors and more.
____________
|
|
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
|
Roger
I see you have 5 errors, ähm, hehe
http://www.primegrid.com/results.php?hostid=405102&offset=0&show_names=0&state=5&appid=2
That is nothing compared with the 3 second problem; other members have 500 errors and more.
These are manual Aborts, not related to 3 second error testing. To cause the error you sometimes have to load a fresh WU and I don't want to spend an hour waiting. Manually aborting a WU is legitimate, especially when deliberately testing.
3 second error WUs appear in the Invalid task list, not the Error task list:
http://www.primegrid.com/results.php?userid=120786&offset=0&show_names=0&state=4&appid=
I have 45 of those currently in the database. When testing this 3 second error issue I use short WUs and don't leave it going for long. We have to cause this issue in order to learn about it. Well done for coming up with this "repair" procedure! |
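The task-list links posted in this thread all follow one pattern, so the Invalid and Error lists for any host or user can be built the same way. This is a small sketch only: the helper name `results_url` is made up, and the meaning attached to each `state` number is inferred from how the posters describe their links (state=4 as the Invalid list, state=5 as the Error list, state=3 as the "perfect tasks" list), not taken from PrimeGrid documentation.

```python
from urllib.parse import urlencode

RESULTS_PAGE = "http://www.primegrid.com/results.php"

# State codes as they appear in the links in this thread. The labels are
# inferred from the posts, not from any official documentation.
STATE_VALID = 3    # the "perfect tasks" links
STATE_INVALID = 4  # where the 3 second error WUs show up
STATE_ERROR = 5    # manual aborts and other errors


def results_url(state, hostid=None, userid=None, appid=""):
    """Build a task-list URL for exactly one host or one user, filtered by state."""
    if (hostid is None) == (userid is None):
        raise ValueError("give exactly one of hostid or userid")
    owner = ("hostid", hostid) if hostid is not None else ("userid", userid)
    query = [owner, ("offset", 0), ("show_names", 0), ("state", state), ("appid", appid)]
    return RESULTS_PAGE + "?" + urlencode(query)


if __name__ == "__main__":
    # Reproduces the Invalid-list link quoted above for user 120786.
    print(results_url(STATE_INVALID, userid=120786))
```

Leaving `appid` empty lists every subproject, as in the link above; passing a number restricts the list to one subproject, as the host links elsewhere in the thread do.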
|
|
|
Roger, you're right
Perhaps some members will test the "repair" procedure; then we will know more. My SGS tasks are all fine with the "repair" procedure.
http://www.primegrid.com/results.php?hostid=374938&offset=0&show_names=0&state=3&appid=2
I have now brought my system down and up 50 times. No errors with the "repair" procedure.
There is nothing on my error page, here:
http://www.primegrid.com/results.php?hostid=374938&offset=0&show_names=0&state=5&appid=2
Armstrong
____________
|
|
|
|
Roger,
When my cache is finished, in 2 days, I will test loading new tasks, with the one-hour wait, the icon and the restart.
____________
|
|
|
|
Successful problem solving
How do you get the right settings?
I have tested with 321 LLR.
(My Windows is in German; I'm in Germany with my American company.)
Compatibility problem (German: Kompatibilitätsproblem):
The BOINC EXE is not compatible with Windows 8.
It is a two-part job.
Step by step:
Select the BOINC installer EXE.
Right-click and select
"Troubleshoot compatibility".
Click and select
"Compatible with Windows 7".
Click "Next" and so on
until it is done.
Now install BOINC normally.
Right-click the BOINC icon in the Task Manager.
Please do not open it; use the right mouse button.
Select "Properties"
and do exactly the same:
compatible with Windows 7.
Then shut down the computer.
Not a restart, a shutdown.
Next step:
Start the Windows 8 computer.
Click on the BOINC icon and it will run fine. Have fun!
Now I'm so glad: I have shut down and started the PC 10 times
and restarted it 10 times.
The problem is gone.
Hua !!!
Armstrong
Picture Guide
Step 1: do not click the first option; click "Problems".
Step 2: select "Next".
Step 3: select "Next".
Step 4: select "Program test" and then "Next".
Step 5: select the first option.
Find the problem and close.
Now, without restarting,
select the BOINC icon in the Task Manager under "Programs"
and do the same steps. I don't know why, but do it.
____________
|
|
|
|
Ouch, what the devil...
All LLR tasks are running, but the TRP sv tasks fail with 4 second errors.
What kind of stupid game is this? Kicking out the sieve???
The 4 second errors are here: http://www.primegrid.com/results.php?hostid=374938&offset=0&show_names=0&state=5&appid=14
The LLRs are running fine, the TRP sieve tasks are kicked out with 4 second errors, and WR is running. All GPU tasks are running, but not the sieve on the CPU.
If 5 second errors come next, I will run a nuclear brain test on myself... huuuuuuu
Hey, that's no good. But with the "repair" procedure TRP sv runs well. All very stupid.
Only with the "repair" procedure does TRP sv run well.
____________
|
|
|
|
FWIW ...
Since returning (occasionally) to BOINC and PG I've consistently had the problem with the 3 second errors, on Windows 8 64-bit and now 8.1 64-bit.
After reading the last few messages I had a play and have, so far, been able to avoid the issue. What I did was -
Go to the Program Files folder for BOINC
For both BOINC.EXE and BOINCMGR.EXE right click and select properties.
Select the "Compatibility mode" and choose Windows 7
Select the "Run as Administrator" option
NB Both of the above options need to be changed - just one of them on its own doesn't seem to work (I'm not sure if both BOINC and BOINCMGR need to be changed though?).
NB Choosing the Administrator option is a bit of a pain because of UAC - it means that BOINC won't automatically restart after a reboot/power down. You have to manually restart it and acknowledge the UAC prompt.
Works (so far) for me anyway! It's still a bit of a bother, but easier than the uninstall/reinstall procedure
Ian |
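For anyone who wants to apply the two Compatibility-tab settings described above to both executables without clicking through the dialogs, here is a hedged sketch. It rests on assumptions that are not from this thread: that Windows stores per-user Compatibility-tab choices as string values under the `AppCompatFlags\Layers` registry key, with the tokens `WIN7RTM` and `RUNASADMIN`, and that BOINC is installed in `C:\Program Files\BOINC`. The script only prints `reg add` commands for review and changes nothing itself.

```python
# Assumed registry location of per-user compatibility settings.
LAYERS_KEY = r"HKCU\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\Layers"


def layer_value(win7_mode=True, run_as_admin=True):
    """Compose the value data for the chosen compatibility options."""
    flags = []
    if win7_mode:
        flags.append("WIN7RTM")      # assumed token for Windows 7 mode
    if run_as_admin:
        flags.append("RUNASADMIN")   # assumed token for "Run as administrator"
    # Windows 8 and later are assumed to prefix the list with "~".
    return " ".join(["~"] + flags)


def reg_add_command(exe_path, value):
    """Return a reg.exe command line that would store the setting for one EXE."""
    return 'reg add "{}" /v "{}" /t REG_SZ /d "{}" /f'.format(LAYERS_KEY, exe_path, value)


if __name__ == "__main__":
    install_dir = r"C:\Program Files\BOINC"  # hypothetical install location
    for exe in ("boinc.exe", "boincmgr.exe"):
        print(reg_add_command(install_dir + "\\" + exe, layer_value()))
```

Printing instead of executing is deliberate: the commands can be checked against what the Properties dialog actually wrote on a machine where the manual fix already works before anything is run.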
|
|
|
It appears upgrading to BOINC 7.2.28 corrects the 3 second issue. I did the upgrade just before the challenge. Did everything that seemed to trigger 3 second errors before and they didn't happen. No further need for the compatibility fix being used.
Nice to see my rig listed as Windows 8 again
|
|
|
|
Sorry about making a duplicate thread. I thought about posting in this forum first, but saw the description said "for new users" and for some reason assumed it wouldn't apply to me. Oops. Well I mean I'm not new around here by any means but whatever.
So I get the impression this was an issue with Windows 8 / 8.1 and the previous version of BOINC? My issues seem to have been resolved since upgrading to the stable 7.2.28 release. Not sure how we went DOWN a minor version (prior to this I was running 7.2.5) but no complaints seeing as it's working.
I'm glad it wasn't a problem related to antivirus software, seeing as I don't have and never have had any such software beyond what's built into the OS...and glad it's not because of 8.1, though admittedly upgrading to 8.1 was a mistake and the list of known issues remains tremendous, but that's me bringing work home with me again so I'll shush right there.
I'll keep watching for new developments though. :)
____________
|
|
|
|
I have experienced the same issue on Windows 8.1 x64 using the BOINC 7.2.28 64-bit client. The errors were all on my GPU and PPS sieve (atiPPSsieve). I had non-stop errors for a large number of WUs, all coming back as validate errors. Non-overclocked XFX 7850 OC edition. CPU is an AMD A6-5400K. I stopped running the PPS sieve apps on the GPU and directed them to my servers instead. Now I'm having an issue with only one task running on the GPU and then the GPU shutting down in the next WU. I started a separate thread for that issue: http://www.primegrid.com/forum_thread.php?id=5376 |
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
I have experienced the same issue on Windows 8.1 x64 using the BOINC 7.2.28 64-bit client. The errors were all on my GPU and PPS sieve (atiPPSsieve). I had non-stop errors for a large number of WUs, all coming back as validate errors. Non-overclocked XFX 7850 OC edition. CPU is an AMD A6-5400K. I stopped running the PPS sieve apps on the GPU and directed them to my servers instead. Now I'm having an issue with only one task running on the GPU and then the GPU shutting down in the next WU. I started a separate thread for that issue: http://www.primegrid.com/forum_thread.php?id=5376
The "3-second" error is an LLR-specific error. What you're experiencing with PPS Sieve on your GPU is an unrelated problem. It's best to continue the discussion about PPS Sieve problems in the other thread you started. Thank you! (Link: http://www.primegrid.com/forum_thread.php?id=5376)
____________
My lucky number is 75898^524288+1 |
|
|
|
I'm back to having errors on all LLR tasks. They time out after 1-4 seconds; random ones run through as normal, but the client burns through many at a time before getting one that does. I currently have Genefer short running on the GPU (7850) and three CPU threads running PPS and PPSE LLR tasks OK. The fourth thread is continuously uploading after a few seconds. Running BOINC 7.2.33 on Windows 8.1 with an AMD A10-6800K. This just started today.
My computer id is 416092 and the results are listed under invalid |
|
|
|
Running boinc.exe as admin and then opening boinc manager as admin makes it work this time. I've managed a few PPS and PPSE tasks now |
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|
Running boinc.exe as admin and then opening boinc manager as admin makes it work this time. I've managed a few PPS and PPSE tasks now
Sounds like a directory permission problem.
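A quick way to test the directory-permission theory is to check whether the account BOINC runs under can actually create a file in the folders it uses. The sketch below is an illustration under assumptions: `can_write` is a made-up helper, and the folder to check (for example the BOINC data directory or a project's slot directory) has to be supplied by whoever runs it, because the thread does not identify which directory is affected.

```python
import os
import tempfile


def can_write(directory):
    """Return True if the current user can create and delete a file in directory."""
    try:
        fd, path = tempfile.mkstemp(dir=directory)
    except OSError:
        # Covers missing directories as well as access-denied errors.
        return False
    os.close(fd)
    os.remove(path)
    return True
```

Running it once from a normal prompt and once from an elevated ("Run as administrator") prompt against the same folder would show whether elevation is what makes the difference, which is what the admin workaround reported in this thread hints at.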
____________
My lucky number is 75898^524288+1 |
|
|
|
I'll look into that as soon as I can, my power supply took a dump and I'm waiting on an RMA from Visiontek. |
|
|
|
Hey guys-- I was also having the "3-second-error" problem. I closed and re-opened BOINC manager, running as Administrator. This fixed the problem. Thanks for that suggestion! |
|
|
Roger Volunteer developer Volunteer tester
Joined: 27 Nov 11 Posts: 1138 ID: 120786 Credit: 268,668,824 RAC: 0
|
Running BOINC 7.2.33. Problem still occurs. |
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,161,398 RAC: 289,514
|