Message boards : Problems and Help : Problem with Project Communication
I seem to be having a problem connecting to PrimeGrid. Whenever I try to do an update, or BOINC tries to do an update on its own, it starts the communication, waits for 5 minutes, then says the connection's timed out. Enabling the HTTP debug and the HTTP transfer debug seems to indicate that the PrimeGrid server's not returning anything at all. I've checked the server status page, and it indicates that all the servers are up, so I'm not sure what's going on here.
A request to the notices.php page does go through, so it looks like the rest of the communication is fine. Obviously, I can reach the web servers to post this message.
We do have a transparent caching proxy on our connection, but it doesn't appear to be caching any response from the servers. Also, this computer has been running fine behind this connection for a couple of weeks without issue, so I don't believe the proxy is the problem.
03/11/2011 1:00:44 AM | PrimeGrid | update requested by user
03/11/2011 1:00:45 AM | PrimeGrid | Sending scheduler request: Requested by user.
03/11/2011 1:00:45 AM | PrimeGrid | Reporting 13 completed tasks, requesting new tasks for CPU and ATI GPU
03/11/2011 1:00:45 AM | | [http] HTTP_OP::init_post(): http://www.primegrid.com/cgi/cgi
03/11/2011 1:00:45 AM | | [http] HTTP_OP::libcurl_exec(): ca-bundle set
03/11/2011 1:00:45 AM | | [http] [ID#1] Info: About to connect() to www.primegrid.com port 80 (#1)
03/11/2011 1:00:45 AM | | [http] [ID#1] Info: Trying 217.67.244.150...
03/11/2011 1:00:45 AM | | [http] [ID#1] Info: Connected to www.primegrid.com (217.67.244.150) port 80 (#1)
03/11/2011 1:00:45 AM | | [http] [ID#1] Sent header to server: POST /cgi/cgi HTTP/1.1
03/11/2011 1:00:45 AM | | [http] [ID#1] Sent header to server: User-Agent: BOINC client (windows_x86_64 6.12.34)
03/11/2011 1:00:45 AM | | [http] [ID#1] Sent header to server: Host: www.primegrid.com
03/11/2011 1:00:45 AM | | [http] [ID#1] Sent header to server: Accept: */*
03/11/2011 1:00:45 AM | | [http] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip
03/11/2011 1:00:45 AM | | [http] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded
03/11/2011 1:00:45 AM | | [http] [ID#1] Sent header to server: Content-Length: 25377
03/11/2011 1:00:45 AM | | [http] [ID#1] Sent header to server: Expect: 100-continue
03/11/2011 1:00:45 AM | | [http] [ID#1] Sent header to server:
03/11/2011 1:00:46 AM | | [http] [ID#1] Info: Done waiting for 100-continue
03/11/2011 1:05:52 AM | | [http] [ID#1] Info: Operation too slow. Less than 10 bytes/sec transferred the last 300 seconds
03/11/2011 1:05:52 AM | | [http] [ID#1] Info: Closing connection #1
03/11/2011 1:05:52 AM | | [http] HTTP error: Timeout was reached
03/11/2011 1:05:53 AM | PrimeGrid | Scheduler request failed: Timeout was reached
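As a rough aid for reading a transcript like the one above, here is a minimal Python sketch that measures how long the client sat waiting between the "Done waiting for 100-continue" line and the "Operation too slow" line. The function names are made up for illustration, and the day/month order in the timestamp format is an assumption; it does not affect a same-day difference.

```python
from datetime import datetime


def parse_stamp(line):
    """Return the datetime at the start of a BOINC event-log line."""
    stamp = line.split("|", 1)[0].strip()
    # Assumption: day/month/year order. Swapping day and month would not
    # change the elapsed time between two lines from the same day.
    return datetime.strptime(stamp, "%d/%m/%Y %I:%M:%S %p")


def stall_seconds(lines,
                  start_marker="Done waiting for 100-continue",
                  end_marker="Operation too slow"):
    """Seconds from the first start_marker line to the next end_marker line."""
    start = None
    for line in lines:
        if start is None and start_marker in line:
            start = parse_stamp(line)
        elif start is not None and end_marker in line:
            return (parse_stamp(line) - start).total_seconds()
    return None  # one of the markers was not found


log = [
    "03/11/2011 1:00:46 AM | | [http] [ID#1] Info: Done waiting for 100-continue",
    "03/11/2011 1:05:52 AM | | [http] [ID#1] Info: Operation too slow. "
    "Less than 10 bytes/sec transferred the last 300 seconds",
]
print(stall_seconds(log))  # 306.0
```

The 306 seconds matches the 300-second low-speed window reported in the log plus a few seconds of slack, which is consistent with the request being sent and nothing coming back.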
____________
Are you in the Pacific time zone? If so, I think 1 AM is right about the time when portions of, or all of, the PG site are down for maintenance. I have no detailed information on their downtime schedule, but I often have difficulty accessing PG around 1 AM PT. I think I saw a posting elsewhere alluding to this as well. With me it's usually the message boards, since I keep BOINC fed with a decent-sized backlog, so it doesn't notice.
Or is this a persistent problem? I just happened to notice the "1 AM" timestamps in the transcript you posted.
--Gary
____________
"I am he as you are he as you are me and we are all together"
87*2^3496188+1 is prime! (1052460 digits)
4 is not prime! (1 digit)
DoES Volunteer tester
Joined: 11 Oct 08 Posts: 784 ID: 30382 Credit: 75,064,140 RAC: 0
Happens at 6.00pm for me in Australia -- UTC +10
____________
Member of AtP
Shown here is an Australian native rat (Ratus Kickarsus)
mikey
Joined: 17 Mar 09 Posts: 2478 ID: 37043 Credit: 1,131,895,633 RAC: 200,676
I seem to be having a problem connecting to PrimeGrid. Whenever I try to do an update, or BOINC tries to do an update on its own, it starts the communication, waits for 5 minutes, then says the connection's timed out. Enabling the HTTP debug and the HTTP transfer debug seems to indicate that the PrimeGrid server's not returning anything at all. I've checked the server status page, and it indicates that all the servers are up, so I'm not sure what's going on here.
The Server Status page on any project is NOT perfectly in sync with what we see; it is a web page that is cached at intervals. So if the interval is 60 minutes, you could be looking at a 59-minute-old page. I believe each project can adjust that refresh rate, and I do not know what PG's rate is. I am just saying the Server Status page is not an accurate measure of what the project is doing minute by minute. Projects have crashed while the cached page still showed that all was fine. I don't know about your particular point in time, though.
Are you in the Pacific time zone? If so I think 1AM is right about the time when portions of, or all of, the PG site are down for maintenance. I have no detailed information on their downtime schedule but I often have difficulty with accessing PG around 1AM PT. I think I saw a posting elsewhere alluding to this as well. With me it's usually the message boards, since I keep boinc fed with a decent size backlog so it doesn't notice..
Or is this a persistent problem? I just happened to notice the "1 AM" timestamps in the transcript you posted.
--Gary
I believe this has been happening for a day or two now, because my credits have dropped significantly since two days ago. That would be a direct cause of my GPU not getting any work to keep it going. And the times are Eastern, so it's 1AM EDT, or 10PM PDT.
I seem to be having a problem connecting to PrimeGrid. Whenever I try to do an update, or BOINC tries to do an update on its own, it starts the communication, waits for 5 minutes, then says the connection's timed out. Enabling the HTTP debug and the HTTP transfer debug seems to indicate that the PrimeGrid server's not returning anything at all. I've checked the server status page, and it indicates that all the servers are up, so I'm not sure what's going on here.
The Server Status page on all Projects is NOT perfectly in sync with our viewing of it, it is a webpage that is cached at intervals. So if the interval is 60 minutes you could be looking at a 59 minute old page. I believe each Project can adjust that refresh rate and I do not know what PG's rate is. I am just saying the Server Status page is not an accurate measuring stick of what the Project is doing minute by minute. Projects have crashed and since the page is cached it looked like all was fine when in fact it was not. Don't know about your particular point in time though.
I thought the server status page was cached because it is on other projects, but the timestamp in the top left updated when I refreshed the page, even 20 seconds later. So I figured it was real time. If it is cached (which makes sense that it would be), they should put the time when it was last updated on the page as well. That would avoid problems like this if it was a failed server. At this point, it doesn't look like it is, because my desktop still hasn't reported tasks, although it does look like they've all uploaded.
____________
Seriously, is there nothing anyone can suggest for this? It's been 10 days now. The last tasks I have that haven't timed out are three 40 hour LLR tasks. If those time out, there are other projects that are more deserving of my CPU time and energy.
____________
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
I see many tasks with "Timed out - no response" and some tasks with "Error while computing".
Have you restricted network usage via the tab of the same name in your BOINC Manager?
All errored-out tasks have the same log message:
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
couldn't start Can't write init file: -108: -108
</message>
]]>
This could be caused by a file permission problem.
PG goes into maintenance every day at 8 AM UTC for at most 1 hour.
____________
Best wishes. Knowledge is power. by jjwhalen
I hadn't seen those errored tasks, but I suspect they were part of a permissions problem. Before this started, I had a problem where the permissions on the ProgramData\BOINC Data folder were reset. I fixed that a couple of days before the current issue, and PG was updating just fine in between. The timed-out tasks are the ones that are timing out because of this issue.
As for the network usage, the settings are wide open. No restrictions on time or amount of data transferred, 0.25 days of extra work requested. The CC is attempting to contact PG, but not receiving a reply back from the scheduler. So it doesn't look like a problem on this end.
The issue has been happening for 10+ days now, with BOINC attempting to submit completed work and get new work throughout the entire time. The maintenance period isn't the issue.
____________
So, weird things are happening. All my work timed out, so I just detached the computer from the project, and then for curiosity's sake, I reattached it. And everything's working as normal again. So it was a problem with just my machine, whether in PG's database or BOINC locally.
____________
I spoke too soon. The reattached machine is unable to contact PG again. At least I only seem to have low CPU time sieve tasks now. Any more ideas from people?
____________
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
Do you use a virus scanner or any other security suite?
If yes, did you exclude the BOINC folders from scanning, and do all BOINC executables have permission to connect to the web?
____________
Best wishes. Knowledge is power. by jjwhalen
I do, and it does appear that the problem is with the suite, except not the virus scanning part. I've already excluded the program data folder as it was causing problems with WCG before. I'm using F-Secure Internet Security 2012, and if I disable the firewall, it starts working right away. It's disabled right now until I can figure out an exception rule. It's not a big problem that it's disabled anyway because our entire network is behind a NAT and global firewall on our internet gateway.
Sorry for making a big fuss over nothing. PG has been running on BOINC ever since, and I will continue to contribute cycles towards primes.
I'll post back here if I can figure out an effective exception rule for others to use.
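For anyone wanting to repeat the firewall on/off comparison outside of BOINC, a small POST probe along these lines may help. This is only a sketch using the Python standard library: `probe_post` is a made-up helper, and the scheduler URL in the usage comment is the one from the log at the top of the thread. It does not speak the scheduler protocol; it only reports whether any HTTP response comes back at all.

```python
import socket
import urllib.error
import urllib.request


def probe_post(url, body=b"probe", timeout=15.0):
    """POST a small body to url and classify the outcome as a short string."""
    req = urllib.request.Request(url, data=body, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return f"response {resp.status}"
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx status.
        return f"response {err.code}"
    except (socket.timeout, TimeoutError):
        # Connected and sent the request, but nothing came back in time.
        return "timeout"
    except urllib.error.URLError as err:
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return f"failed: {err.reason}"
    except OSError as err:
        return f"failed: {err}"


# Usage (run once with the firewall enabled and once with it disabled):
# print(probe_post("http://www.primegrid.com/cgi/cgi"))
```

If the probe returns a response of any kind with the firewall off and "timeout" with it on, that would point at the firewall's handling of outgoing POST requests rather than at the project servers.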
____________
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
I'll post back here if I can figure out an effective exception rule for others to use.
An easy exception rule would be to allow the BOINC executable itself (boinc on Linux, boinc.exe on Windows) to open connections to the web.
Do you also have problems with the boincmgr executable connecting to your local BOINC client? If yes, then TCP port 31416 is blocked by your security suite or firewall...
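A quick way to check the port mentioned above is to attempt a plain TCP connection to it. The sketch below uses only the Python standard library; `port_open` is a made-up helper name, and checking on 127.0.0.1 assumes the client runs on the same machine as the manager.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out.
        return False


# Usage, with the BOINC client running locally:
# print(port_open("127.0.0.1", 31416))
```

A False result while the client is running would suggest something between the manager and the client is blocking the port. A True result only shows that the port accepts connections, not that the manager can authenticate.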
____________
Best wishes. Knowledge is power. by jjwhalen
Sorry, didn't see this message until today. No, no problems between the manager and the CC, although I might have to open 31416 in the future for remote manager access. I've already excepted the BOINC executables from the firewall, and had done so before this all started, but I guess that's not enough. Windows Firewall is also turned off, probably turned off when F-Secure was installed.
____________