PrimeGrid
drummers-lowrise
1) Message boards : Number crunching : Perseid Challenge (Message 78832)
Posted 2234 days ago by Grandpa
I have 48 of the Megas running at the moment. They are at 17% and show about 4 hrs to complete, and I have it set to no new tasks. If they follow the same pattern, they should start completing about halfway through, so we should see them completing in about 2 hrs, which would be at about 50%.


Totally different program, and the failure mode is different. They will probably run to completion, but the residues will be wrong if there are calculation errors.


Some of the results are now in, and in every case so far where there's a wingman to compare against, the residues failed to match. 100% mismatch so far. That's pretty conclusive. Not as conclusive as it will be when the third tasks on each of those workunits come back, but it's enough.
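
The residue comparison described above can be sketched roughly as follows. This is only an illustration, not PrimeGrid's actual validator: the function name `validate_workunit`, the quorum of 2, and the sample residue strings are all assumptions made for the example.

```python
from collections import Counter

def validate_workunit(residues, quorum=2):
    """Hypothetical quorum check: return (canonical_residue, matching_indices).

    Returns (None, []) while no residue has been reported by at least
    `quorum` tasks, i.e. while the workunit still needs another task.
    """
    if not residues:
        return None, []
    counts = Counter(residues)
    residue, votes = counts.most_common(1)[0]
    if votes < quorum:
        return None, []
    matching = [i for i, r in enumerate(residues) if r == residue]
    return residue, matching

# Two wingmen disagree: nothing can be decided yet.
print(validate_workunit(["A1", "B2"]))
# A third task agrees with the second one: that residue becomes canonical.
print(validate_workunit(["A1", "B2", "B2"]))
```

With only two mismatching residues nobody can say which host is wrong, which is why the third task on each workunit is what makes the result conclusive.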


Yep, I am pretty sure they are going to fail; now I just need to figure out what broke. I am happy to see that the validation process is working properly on these. dmesg is not giving me any clues, so I am going to have to do some digging and testing :(
2) Message boards : Number crunching : Perseid Challenge (Message 78824)
Posted 2234 days ago by Grandpa
EDIT:
I have removed this machine from TRP for the time being and aborted all of the running WUs; it was still not completing them to 100%. I will run it on another PrimeGrid project (Mega) for testing, to see what it does there. I ran them in the last challenge with no apparent problem, so I will run a limited number of them to see if the problem exists there or not.


With Mega, we should see pretty quickly if it's having problems.

EDIT: The only completed LLR task currently in the database for that host is an SR5 where it returned an incorrect residue, which is consistent with the problems you're seeing on the sieve.


I have 48 of the Megas running at the moment. They are at 17% and show about 4 hrs to complete, and I have it set to no new tasks. If they follow the same pattern, they should start completing about halfway through, so we should see them completing in about 2 hrs, which would be at about 50%.

How is it possible to clock this Opteron so high? Is this some SE version? Aren't you afraid for the VRM on the motherboard?

Damn, it's some nice oc. :D


No VM. I am running a custom BIOS that allows OCing on some server boards, although it is not needed on these chips since they are ES versions.
3) Message boards : Number crunching : Perseid Challenge (Message 78820)
Posted 2234 days ago by Grandpa

As for overclocking, CPU temperatures during sieve are significantly lower than on short LLR. But don't forget that each TRP Sieve task requires ~160 Megabytes of memory, so your 48 cores will use almost 8 Gigabytes only for BOINC, not counting other tasks and services run on this host.


Temps are not a problem on that machine. It is liquid cooled, so even LLR tasks cannot get it hot. Current temps are good while running these; I have never seen anything above 42C when running any PrimeGrid work on it.

Detected processor: Family 15h (Bulldozer/Interlagos/Valencia) Processor
Machine has 8 nodes
Processor has 6 cores
Processor has 7 p-states
Processor has 2 boost states
Processor temperature slew rate: 9.0°C

Temperature table:
Node 0  C0:36 C1:36 C2:36 C3:36 C4:36 C5:36
Node 1  C0:36 C1:36 C2:36 C3:36 C4:36 C5:36
Node 2  C0:38 C1:38 C2:38 C3:38 C4:38 C5:38
Node 3  C0:37 C1:37 C2:37 C3:37 C4:37 C5:37
Node 4  C0:37 C1:37 C2:37 C3:37 C4:37 C5:37
Node 5  C0:38 C1:38 C2:38 C3:38 C4:38 C5:38
Node 6  C0:38 C1:38 C2:38 C3:38 C4:38 C5:38
Node 7  C0:37 C1:37 C2:37 C3:37 C4:37 C5:37


As far as memory goes, it has 32GB and is only using 25% of total available memory, and I have recently run memtest on it with no errors. I do have the ability to adjust the core voltage on these chips (63xx), so I will adjust the voltage up on them. I have run them crunching at up to 4.1 GHz before without issue. I have dropped the OC a little, to 245 x 15.5 = 3797.5 MHz, and upped the voltage by .0125 V, so we shall see if that fixes the problem or if it will need another bump.
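
For anyone wanting to check the arithmetic in this exchange, here is a small sketch. The helper names are made up for the example, and it assumes the ~160 MB per task and 32GB figures can be treated as binary units (MiB/GiB), which is only an approximation.

```python
def core_clock_mhz(reference_mhz, multiplier):
    """Core clock = reference clock x multiplier."""
    return reference_mhz * multiplier

def sieve_memory_gib(tasks, mb_per_task=160):
    """Total memory for `tasks` concurrent sieve tasks, assuming ~160 MB each."""
    return tasks * mb_per_task / 1024

clock = core_clock_mhz(245, 15.5)   # the reduced overclock from the post
memory = sieve_memory_gib(48)       # one sieve task per core on 48 cores
share = memory / 32                 # fraction of the 32GB installed

print(f"{clock} MHz, {memory} GiB, {share:.1%} of RAM")
```

The 48 tasks come to 7.5 GiB, a little under a quarter of the installed memory, which lines up with both the "almost 8 Gigabytes" estimate and the 25% usage figure.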

EDIT:
I have removed this machine from TRP for the time being and aborted all of the running WUs; it was still not completing them to 100%. I will run it on another PrimeGrid project (Mega) for testing, to see what it does there. I ran them in the last challenge with no apparent problem, so I will run a limited number of them to see if the problem exists there or not.
4) Message boards : Number crunching : Perseid Challenge (Message 78816)
Posted 2234 days ago by Grandpa
This WU seems a bit odd. It may be that I rebooted the machine while it was running, but I am unsure of that. According to the validation process, it took an unusually short amount of time to run for this machine, but it came back as Valid. I did see where it was mentioned there was a problem with short runtimes. If the runtime is correct, I very seriously doubt this Valid status is correct.

http://www.primegrid.com/workunit.php?wuid=402260432

canonical result 565491504
Work Unit 402260432


Indeed, that computer had caught our attention.

It's not working correctly. If it's overclocked, it's overclocked way too high. If it's not overclocked, something is broken. (My first guess would be memory, but it could be anything, really.)

Most of the tasks are dying well before they complete. We have some 'sanity checks' in place to catch tasks with unreasonably short run times, but there are computers much faster than this one that can complete these tasks in about 20 minutes, so we can't set those thresholds very high or we'll be tossing out perfectly good tasks.

Assuming the result is still correct, some of this computer's tasks would be marked invalid, and some (which ran longer before failing) would be marked valid and would receive credit. (We're not thrilled with this, and may seek to improve the validation at some point.)

This computer (and one other) got the benefit of the doubt as far as credit is concerned. (The tracking for the challenge leaderboards is more strict since it's not fair to other participants to have faulty tasks get challenge points.)

The bottom line is this computer isn't working correctly and you should take a look at it.
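
A minimal sketch of the kind of run-time sanity check described above. The function name and the 600-second threshold are hypothetical, chosen only to sit below the roughly 20 minutes (1200 seconds) that the fastest hosts are said to need; the real thresholds are not given in the thread.

```python
# Hypothetical threshold: must stay below what a fast, healthy host needs
# (about 20 minutes = 1200 s), or good results would be rejected.
MIN_PLAUSIBLE_SECONDS = 600

def is_suspiciously_short(run_time_seconds, threshold=MIN_PLAUSIBLE_SECONDS):
    """Flag a task whose run time is below the plausibility threshold."""
    return run_time_seconds < threshold

print(is_suspiciously_short(300))    # died early: flagged
print(is_suspiciously_short(900))    # failed later: slips through
print(is_suspiciously_short(1200))   # fast healthy host: accepted
```

The middle case is the weakness mentioned in the post: a task that runs long enough before failing clears the threshold and is not caught by this check alone.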



Yes, it is a 48 core Opteron overclocked to 3.9 GHz. Are these WUs more sensitive to overclocking than others? This machine has run many other BOINC and PrimeGrid projects without problems at the current settings. I will fix the OC on this machine while it is running the cleanup, or remove them and run something else if it cannot be fixed.

This computer (and one other) got the benefit of the doubt as far as credit is concerned. (The tracking for the challenge leaderboards is more strict since it's not fair to other participants to have faulty tasks get challenge points.)


Is this other one you are referring to one of my computers? If it is, I cannot find another one with short run times, for some reason. If it is mine, do you know the machine ID number, so I can monitor it and fix it?


One of the things I am a little worried about is that the WU was validated against another computer, which I do not believe was correct; it should have come up as invalid. Is there also something wrong with the normal validation process on these WUs? I think it should have failed validation when checked against another computer's results. I would think this could really be messing up the project's results.




Name                 TRP_sieve_7436884_1
Workunit             402260432
Created              14 Aug 2014 | 8:44:16 UTC
Sent                 14 Aug 2014 | 10:21:36 UTC
Received             14 Aug 2014 | 12:49:00 UTC
Server state         Over
Outcome              Success
Client state         Done
Exit status          0 (0x0)
Computer ID          418989
Report deadline      18 Aug 2014 | 11:21:36 UTC
Run time             2,404.61
CPU time             2,400.95
Validate state       Valid
Credit               120.14
Application version  The Riesel Problem (Sieve) v1.12


Name                 TRP_sieve_7436884_2
Workunit             402260432
Created              15 Aug 2014 | 16:06:39 UTC
Sent                 15 Aug 2014 | 16:14:47 UTC
Received             15 Aug 2014 | 16:31:39 UTC
Server state         Over
Outcome              Success
Client state         Done
Exit status          0 (0x0)
Computer ID          432611
Report deadline      19 Aug 2014 | 17:14:47 UTC
Run time             913.15
CPU time             872.70
Validate state       Valid
Credit               120.14
Application version  The Riesel Problem (Sieve) v1.07
5) Message boards : Number crunching : Perseid Challenge (Message 78814)
Posted 2234 days ago by Grandpa
This WU seems a bit odd. It may be that I rebooted the machine while it was running, but I am unsure of that. According to the validation process, it took an unusually short amount of time to run for this machine, but it came back as Valid. I did see where it was mentioned there was a problem with short runtimes. If the runtime is correct, I very seriously doubt this Valid status is correct.

http://www.primegrid.com/workunit.php?wuid=402260432

canonical result 565491504
Work Unit 402260432
6) Message boards : Number crunching : New primes (Message 78313)
Posted 2255 days ago by Grandpa
Another mega prime to unofficially announce:

129*2^3328805+1 was discovered earlier today by Eric Clifton on the PPS-Mega project. It's a factor of xGF(3328804,7,5).

(For those of you wondering what "xGF(3328804,7,5)" is, it's shorthand for 7^2^3328804+5^2^3328804, which is referred to as an "eXtended Generalized Fermat number".)
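
To make the xGF notation concrete, here is a small sketch of how one could test whether a number p divides a^2^n + b^2^n by repeated squaring modulo p. The function names are made up for this example, and it is not the software PrimeGrid uses. It is demonstrated on the classic small case 641 = 5*2^7+1, which divides 2^2^5 + 1; running it on the mega prime itself would take n = 3328804 squarings of million-digit numbers, which plain Python is far too slow for.

```python
import math

def divides_xgf(p, n, a, b):
    """True if p divides a^(2^n) + b^(2^n), using n modular squarings."""
    x, y = a % p, b % p
    for _ in range(n):
        x = x * x % p
        y = y * y % p
    return (x + y) % p == 0

def decimal_digits(k, n):
    """Number of decimal digits of k*2^n + 1, estimated with logarithms."""
    return math.floor(n * math.log10(2) + math.log10(k)) + 1

# Small analogue: 641 = 5*2^7+1 divides 2^2^5 + 1^2^5 (the Fermat number F5).
print(divides_xgf(641, 5, 2, 1))
# Digit count of 129*2^3328805+1, to confirm it clears one million digits.
print(decimal_digits(129, 3328805) > 1_000_000)
```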

This prime is special for a couple of reasons:

1) It's the first prime found by the MEGA project since we moved it from PRPNet to BOINC.

2) It's the smallest mega prime ever discovered by PrimeGrid. (Yes, that's a dubious distinction because that's essentially calling it the smallest large prime, which is rather silly.)

Saving the best for last...

3) This is the 100th mega prime ever discovered!!!


And to go along with it being the smallest mega prime, it took 46.159833333 minutes to run it on an i7 2700k @ 4853 MHz.
7) Message boards : News : Mega Prime Search Now Running on BOINC (Message 78310)
Posted 2255 days ago by Grandpa
Well, it seems I am a lucky one. I apparently found one yesterday, 129*2^3328805+1, so it appears I join the ranks of those with a prime of 1,000,000+ digits.
8) Message boards : Number crunching : Badges for discovering large primes (Message 75331)
Posted 2363 days ago by Grandpa
Well, I was lucky and got one today that made the top 5000 list, and I am actually a bit proud of it. I think it is a good idea to be able to show it off with a badge. I do like the examples given by Crun-chi.
9) Message boards : Number crunching : Double Top Secret Mystery Challenge (Message 74836)
Posted 2380 days ago by Grandpa
*** 248 tasks, 237 affecting scoring positions, of Double Top Secret Mystery Challenge (PSP-LLR) cleanup work are currently available! ***


^^^^^^^I was wondering how long it was going to take for that to hit ^^^^^^^^
10) Message boards : Number crunching : Double Top Secret Mystery Challenge (Message 74268)
Posted 2394 days ago by Grandpa
I was curious as to why a few of the WUs appear to have very short CPU time compared to all the others. They have been confirmed as valid and credited properly, so is it correct that there is nothing wrong with them, or is there some type of bug in the tracking software?


Although it's unclear what the cause is, sometimes the cpu time is reported erroneously.

If those happen to be your computers, you are in a much better position to figure out what's causing that than we are. From the server side all we see is that the BOINC client is telling us it took X cpu seconds. If they're not your computers, you can safely ignore it.


They are on my computers, and it appears to affect all of them a small percentage of the time (2.5%). I am guessing it is a software bug, since all of the computers seem to be affected from time to time. It looks like the CPU tracking software freezes, but idk.


Copyright © 2005 - 2020 Rytis Slatkevičius (contact) and PrimeGrid community. Server load 3.18, 3.24, 3.28
Generated 27 Sep 2020 | 11:19:48 UTC