PrimeGrid
11) Message boards : Number crunching : I'm my own wingman + [resolved] ghost tasks (Message 20129)
Posted 3587 days ago by dh1saj
It seems that "ghost tasks" appear under AP26 (CUDA) as well.
I have at least one (http://www.primegrid.com/workunit.php?wuid=97919430) that appears online in the website's task list for one of my computers but not on that computer itself.
12) Message boards : Project Staging Area : GCW13 Mini-Challenge Series (Message 19520)
Posted 3607 days ago by dh1saj
Wow.
Looks like Lennart caught a GCW13 prime.

Any details available (besides the fact that he obviously put maaaaaaaany cores into this)?
13) Message boards : Project Staging Area : PRNet Discussion (Old) (Message 19348)
Posted 3619 days ago by dh1saj

uwin.mine.nu:10000 seems to be down.
Also, uwin.mine.nu:10000/user_stats.html shows a blank page but no HTTP error.

Same issue with port 11k.

PRPNet on the PG (PrimeGrid) server is up and running (at least for GCW13 and PPSE>450).

Client is 2.4.6 / WIN-XP32.
14) Message boards : Project Staging Area : PRNet Discussion (Old) (Message 19090)
Posted 3632 days ago by dh1saj

The server at uwin.mine.nu seems to be down.
Ports 10k and 11k don't hand out any work.

On the positive side: 2.4.5 has now run two nights on a Q6600 without a single crash.
15) Message boards : Project Staging Area : PRNet Discussion (Old) (Message 19046)
Posted 3634 days ago by dh1saj
Port 10k running dry?

[2009-11-10 17:22:17 GMT] PPSE10k: No active candidates found on server

16) Message boards : Project Staging Area : PRNet Discussion( Old ) (Message 19000)
Posted 3636 days ago by dh1saj
Maybe this helps:
I ran ports 10k and 11k on ONE core of a slow machine (~35-45 sec/WU), with the number of WUs to be downloaded per connect set to "2".

It ran a full night with NO crash!

The only issue was the known server-side message about unknown WUs ("Server has no record of this test...").
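As a rough check on how often that single core was talking to the server: assuming the client contacts the server once per batch (returning the finished WUs and fetching new ones), 35-45 s per WU with 2 WUs per connect means one upload every 70-90 s, i.e. roughly 40-51 server contacts per hour, all without a crash. A tiny Python sketch of that arithmetic (the function name is only for illustration, not anything from PRPNet):

```python
def server_contacts_per_hour(seconds_per_wu, wus_per_connect):
    """Server contacts per hour for one core, ASSUMING one contact per completed batch."""
    seconds_per_batch = seconds_per_wu * wus_per_connect
    return 3600 / seconds_per_batch


for seconds_per_wu in (35, 45):
    # prints "35 51.4" and then "45 40.0"
    print(seconds_per_wu, round(server_contacts_per_hour(seconds_per_wu, 2), 1))
```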
17) Message boards : Project Staging Area : PRNet Discussion( Old ) (Message 18923)
Posted 3638 days ago by dh1saj
Sorry for confusion.
My clients ARE definitely crashing: they terminate themselves, including the console (DOS) window (Win32), and the OS reports a program crash as well.

What I've posted is what I found in each client's log, wherever anything looked unusual. Quite often there is no evidence of the crash in the log, yet the client has shut down anyway.

I've also never seen a crash while the client is doing the actual math. It always, and only, happens when the client is attempting to upload results to the server.

I can confirm exactly what cmorton reported earlier.

Also, I ran 4 cores exclusively on PPSE (PrimeGrid server, port 12007) for half the night with NO trouble at all. Trouble only occurs when the 10k and 11k ports are involved.
18) Message boards : Project Staging Area : PRNet Discussion( Old ) (Message 18913)
Posted 3638 days ago by dh1saj

After locally restarting the Win32 Q9450 (not overclocked!), 3 cores crashed within 1 hour with nothing suspicious in the logs; the logs simply stopped.

One core crashed quite quickly; see its log:

[2009-11-05 17:49:13 GMT] PRPNet Client application v2.4.4 started
[2009-11-05 17:49:13 GMT] User name dh1saj at email address is dh1saj@arcor.de
[2009-11-05 17:49:28 GMT] PPSE11K: 4295*2^115435+1 is not prime. Residue 34F305935B3B226E
[2009-11-05 17:49:44 GMT] PPSE11K: 7497*2^115435+1 is not prime. Residue 6376B782B329DE6F
[2009-11-05 17:49:59 GMT] PPSE11K: 5031*2^115437+1 is not prime. Residue D8A4DF7BBFCD1313
[2009-11-05 17:50:14 GMT] PPSE11K: 9507*2^115436+1 is not prime. Residue 7BB2C29A73E94922
[2009-11-05 17:50:28 GMT] PPSE11K: 3487*2^115438+1 is not prime. Residue BE5AD084158A7758
[2009-11-05 17:50:42 GMT] PPSE11K: 7681*2^115436+1 is not prime. Residue BF8663D2D0D50DE2
[2009-11-05 17:50:56 GMT] PPSE11K: 8651*2^115435+1 is not prime. Residue E418C51593BD3371
[2009-11-05 17:51:10 GMT] PPSE11K: 8557*2^115436+1 is not prime. Residue F49F8012A3DBFC5E
[2009-11-05 17:51:23 GMT] PPSE11K: 8979*2^115434+1 is not prime. Residue 79C263DA4429CF5E
[2009-11-05 17:51:37 GMT] PPSE11K: 9597*2^115436+1 is not prime. Residue E541F200A1DFEE3B
[2009-11-05 17:51:51 GMT] PPSE11K: 7071*2^115435+1 is not prime. Residue 4F545D8CEDC1580B
[2009-11-05 17:52:05 GMT] PPSE11K: 1353*2^115437+1 is not prime. Residue 76F7B5A600C3EC08
[2009-11-05 17:52:19 GMT] PPSE11K: 8667*2^115436+1 is not prime. Residue 9B62214941357396
[2009-11-05 17:52:32 GMT] PPSE11K: 2197*2^115436+1 is not prime. Residue 67EA3E41580AD9F9
[2009-11-05 17:52:46 GMT] PPSE11K: 6975*2^115437+1 is not prime. Residue A10301CCB873C675
[2009-11-05 17:53:00 GMT] PPSE11K: 1353*2^115438+1 is not prime. Residue 10BE1965199A9ADB
[2009-11-05 17:53:14 GMT] PPSE11K: 3687*2^115438+1 is not prime. Residue 0232B6C4DBF7E71D
[2009-11-05 17:53:28 GMT] PPSE11K: 4779*2^115437+1 is not prime. Residue E518791E8A68DE9B
[2009-11-05 17:53:42 GMT] PPSE11K: 1797*2^115438+1 is not prime. Residue 1711B5359485EF59
[2009-11-05 17:53:56 GMT] PPSE11K: 2013*2^115436+1 is not prime. Residue 71FA2C8F4E543759
[2009-11-05 17:53:56 GMT] Total Time: 0:04:46 Total Tests: 20 Total PRPs Found: 0
[2009-11-05 17:54:01 GMT] PPSE11K: Returning work to server uwin.mine.nu at port 11000
[2009-11-05 17:54:01 GMT] PPSE11K: INFO: Server has no record of this test. The test result for 4295*2^115435+1 has been logged
[2009-11-05 17:54:01 GMT] PPSE11K: INFO: Server has no record of this test. The test result for 7497*2^115435+1 has been logged
[2009-11-05 17:54:01 GMT] PPSE11K: INFO: Server has no record of this test. The test result for 5031*2^115437+1 has been logged
[2009-11-05 17:54:01 GMT] PPSE11K: ERROR: Workunit 9507*2^115436+1 not found on server
[2009-11-05 17:54:01 GMT] PPSE11K: ERROR: ReceiveCompletedWork error. Message [Test Result: llr.exe 7BB2C29A73E94922] cannot be parsed
[2009-11-05 17:54:01 GMT] PPSE11K: ERROR: ReceiveCompletedWork error. Message [End of WorkUnit] cannot be parsed
[2009-11-05 17:54:01 GMT] PPSE11K: INFO: Server has no record of this test. The test result for 3487*2^115438+1 has been logged
[2009-11-05 17:54:01 GMT] PPSE11K: INFO: Server has no record of this test. The test result for 7681*2^115436+1 has been logged
[2009-11-05 17:54:01 GMT] PPSE11K: ERROR: Workunit 8651*2^115435+1 not found on server
[2009-11-05 17:54:01 GMT] PPSE11K: ERROR: ReceiveCompletedWork error. Message [Test Result: llr.exe E418C51593BD3371] cannot be parsed
[2009-11-05 17:54:01 GMT] PPSE11K: ERROR: ReceiveCompletedWork error. Message [End of WorkUnit] cannot be parsed
[2009-11-05 17:54:01 GMT] PPSE11K: INFO: Server has no record of this test. The test result for 8557*2^115436+1 has been logged
[2009-11-05 17:54:01 GMT] PPSE11K: INFO: Server has no record of this test. The test result for 8979*2^115434+1 has been logged

:-(
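In the log above, the upload stops right after the reply for 8979*2^115434+1; by my count, 11 of the 20 tested candidates never get a logged server reply. For anyone wanting to check their own logs for the same pattern, here is a minimal Python sketch that pairs each tested candidate with the server's reply. It assumes the message wording of the PRPNet 2.4.x log lines quoted in this thread; the regular expressions and the function names `upload_outcomes` / `never_reported` are my own, not part of PRPNet.

```python
import re

# Patterns ASSUMED from the PRPNet 2.4.x log lines quoted in this thread;
# other client versions may word these messages differently.
TESTED_RE = re.compile(r"(\S+) is not prime\.")
NO_RECORD_RE = re.compile(r"The test result for (\S+) has been logged")
NOT_FOUND_RE = re.compile(r"Workunit (\S+) not found on server")


def upload_outcomes(log_text):
    """Map each candidate to the server reply logged for it, or None if none was logged."""
    outcomes = {}  # dicts keep insertion order (Python 3.7+), so test order is preserved
    for line in log_text.splitlines():
        tested = TESTED_RE.search(line)
        if tested:
            outcomes.setdefault(tested.group(1), None)
            continue
        no_record = NO_RECORD_RE.search(line)
        if no_record:
            outcomes[no_record.group(1)] = "no record, result logged"
            continue
        not_found = NOT_FOUND_RE.search(line)
        if not_found:
            outcomes[not_found.group(1)] = "workunit not found"
    return outcomes


def never_reported(log_text):
    """Candidates that were tested but for which no server reply appears in the log."""
    return [c for c, reply in upload_outcomes(log_text).items() if reply is None]


if __name__ == "__main__":
    sample = """\
[2009-11-05 17:49:28 GMT] PPSE11K: 4295*2^115435+1 is not prime. Residue 34F305935B3B226E
[2009-11-05 17:50:14 GMT] PPSE11K: 9507*2^115436+1 is not prime. Residue 7BB2C29A73E94922
[2009-11-05 17:53:56 GMT] PPSE11K: 2013*2^115436+1 is not prime. Residue 71FA2C8F4E543759
[2009-11-05 17:54:01 GMT] PPSE11K: INFO: Server has no record of this test. The test result for 4295*2^115435+1 has been logged
[2009-11-05 17:54:01 GMT] PPSE11K: ERROR: Workunit 9507*2^115436+1 not found on server
"""
    print(never_reported(sample))  # prints ['2013*2^115436+1']
```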
19) Message boards : Project Staging Area : PRNet Discussion( Old ) (Message 18892)
Posted 3639 days ago by dh1saj
Sorry, I've been out all day, unexpectedly.
Just returning to the crashed machine...

It's PRPNet 2.4.4 / Win32 / ports 10000 & 11000 on uwin.mine.nu.

Here is the log of the core that keeps crashing:


[2009-11-04 18:10:48 GMT] PPSE10k: Attempting to send previously completed work
[2009-11-04 18:10:53 GMT] PPSE10k: Returning work to server uwin.mine.nu at port 10000
[2009-11-04 18:10:53 GMT] PPSE10k: ERROR: Workunit 1613*2^111641+1 not found on server
[2009-11-04 18:10:53 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [Test Result: llr.exe B1ABD2E75F045581] cannot be parsed
[2009-11-04 18:10:53 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [End of WorkUnit] cannot be parsed
[2009-11-04 18:10:53 GMT] PPSE10k: ERROR: Workunit 4863*2^111640+1 not found on server
[2009-11-04 18:10:53 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [Test Result: llr.exe 7CA195A986FDA145] cannot be parsed
[2009-11-04 18:10:53 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [End of WorkUnit] cannot be parsed
[2009-11-04 18:10:53 GMT] PPSE10k: ERROR: Workunit 5831*2^111639+1 not found on server
[2009-11-04 18:10:53 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [Test Result: llr.exe 1F08926457276B93] cannot be parsed
[2009-11-04 18:10:53 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [End of WorkUnit] cannot be parsed
[2009-11-04 18:10:53 GMT] PPSE10k: ERROR: Workunit 6523*2^111638+1 not found on server
[2009-11-04 18:10:53 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [Test Result: llr.exe B7DABE04EBDF20DD] cannot be parsed
[2009-11-04 18:10:53 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [End of WorkUnit] cannot be parsed
[2009-11-04 18:10:53 GMT] PPSE10k: ERROR: Workunit 6547*2^111638+1 not found on server
[2009-11-04 18:10:53 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [Test Result: llr.exe 6E566D3073DDAA43] cannot be parsed
[2009-11-04 18:10:53 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [End of WorkUnit] cannot be parsed
[2009-11-04 18:10:53 GMT] PPSE10k: ERROR: Workunit 9757*2^111638+1 not found on server
[2009-11-04 18:12:05 GMT] PPSE10k: Attempting to send previously completed work
[2009-11-04 18:12:08 GMT] PPSE10k: Returning work to server uwin.mine.nu at port 10000
[2009-11-04 18:12:09 GMT] PPSE10k: ERROR: Workunit 1613*2^111641+1 not found on server
[2009-11-04 18:12:09 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [Test Result: llr.exe B1ABD2E75F045581] cannot be parsed
[2009-11-04 18:12:09 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [End of WorkUnit] cannot be parsed
[2009-11-04 18:12:09 GMT] PPSE10k: ERROR: Workunit 4863*2^111640+1 not found on server
[2009-11-04 18:12:09 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [Test Result: llr.exe 7CA195A986FDA145] cannot be parsed
[2009-11-04 18:12:09 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [End of WorkUnit] cannot be parsed
[2009-11-04 18:12:09 GMT] PPSE10k: ERROR: Workunit 5831*2^111639+1 not found on server
[2009-11-04 18:12:09 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [Test Result: llr.exe 1F08926457276B93] cannot be parsed
[2009-11-04 18:12:09 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [End of WorkUnit] cannot be parsed
[2009-11-04 18:12:09 GMT] PPSE10k: ERROR: Workunit 6523*2^111638+1 not found on server
[2009-11-04 18:12:09 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [Test Result: llr.exe B7DABE04EBDF20DD] cannot be parsed
[2009-11-04 18:12:09 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [End of WorkUnit] cannot be parsed
[2009-11-04 18:12:09 GMT] PPSE10k: ERROR: Workunit 6547*2^111638+1 not found on server
[2009-11-04 18:12:09 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [Test Result: llr.exe 6E566D3073DDAA43] cannot be parsed
[2009-11-04 18:12:09 GMT] PPSE10k: ERROR: ReceiveCompletedWork error. Message [End of WorkUnit] cannot be parsed
[2009-11-04 18:12:09 GMT] PPSE10k: ERROR: Workunit 9757*2^111638+1 not found on server
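In the log above, the upload is retried a little over a minute later and the server rejects exactly the same six workunits again, which suggests the client keeps resending results the server will never accept. A minimal Python sketch to check a log for that loop, assuming the message wording quoted above; the function names are hypothetical, and treating each "Returning work to server" line as the start of a new upload attempt is my own assumption.

```python
import re

# Patterns ASSUMED from the PRPNet 2.4.x log lines quoted in this thread.
ATTEMPT_RE = re.compile(r"Returning work to server \S+ at port \d+")
NOT_FOUND_RE = re.compile(r"Workunit (\S+) not found on server")


def rejected_per_attempt(log_text):
    """Split the log at each upload attempt; list the workunits rejected in each one."""
    attempts = []
    for line in log_text.splitlines():
        if ATTEMPT_RE.search(line):
            attempts.append([])  # a new upload attempt starts here
            continue
        match = NOT_FOUND_RE.search(line)
        if match and attempts:
            attempts[-1].append(match.group(1))
    return attempts


def rejected_every_time(log_text):
    """Workunits rejected in every upload attempt, i.e. results stuck in a resend loop."""
    attempts = rejected_per_attempt(log_text)
    if not attempts:
        return []
    stuck = set(attempts[0])
    for rejected in attempts[1:]:
        stuck &= set(rejected)
    return sorted(stuck)
```

Fed the two attempts quoted above, both per-attempt lists contain the same six candidates (1613*2^111641+1 through 9757*2^111638+1), so all six would be reported as stuck.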

20) Message boards : Project Staging Area : PRNet Discussion( Old ) (Message 18859)
Posted 3640 days ago by dh1saj
I see the same problems as already reported by dar1008.
After approx. 1 hr running on 4 cores of a Q9450 / Win32, all 4 cores had crashed with similar log entries.

Restarting all 4 cores caused an immediate crash of 1 core.

Terminating 1-3 and restarting all 4 again caused 1 core to crash immediately. I've done this 3 times; the behaviour stays the same.

Meanwhile, the whole machine seems to be down as I can't get remote access any more. I'll check the logs tomorrow....


Regards
dh1saj


Copyright © 2005 - 2019 Rytis Slatkevičius and PrimeGrid community.
Generated 23 Oct 2019 | 19:41:51 UTC