Hi all,
I noticed over the last couple of days that the newly released SGS version is usually 5-10% faster than the previous one. However, the time consumed once the WU begins crunching, before the status updates from 0.000% to 1.050%, is substantial and varies a lot.
The WUs on one of my desktops take around 10 mins, but with an extra 10-40 seconds or so seemingly wasted at the beginning of each one. Since this is a reasonable fraction of the entire calculation time, I'm just curious whether I'm the only one seeing this issue, or whether there is in fact a good reason for it.
Thanks
Darryl
____________
|
|
|
|
and there I thought I was imagining things. Yes, I've seen this too.
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13956 ID: 53948 Credit: 393,160,197 RAC: 187,115
|
I have done a little investigation into this problem -- and I've found nothing out of the ordinary.
I tried running the current 3.8.8 LLR 64 bit executable, both standalone and via BOINC.
The BOINC status update in the GUI is driven by the LLR command-line client outputting status messages, which are read by the wrapper and passed to the BOINC manager, which draws the progress in the display window.
So, the display updates whenever LLR prints those progress messages.
What I found was that, on my hardware, when running standalone, those messages were coming about 10 to 12 seconds apart. The first one appeared about 10 seconds after LLR started.
Running under BOINC, I found that the first status update occurred about 10-12 seconds after the WU started running, and updates continued at intervals of about 10-12 seconds. That's exactly what should be happening; it matches the output of LLR in standalone mode.
As far as I can determine, it's working the way it's supposed to.
What I would recommend doing is this:
Wait until a new SGS starts, and time how long it takes from when the status changes to "Running" until the progress increments from 0.000 to 1.005 (or whatever). Then also time how long it takes to get to the second update at around 2%. If the two times are roughly the same, then it's working as intended. If the first update took substantially longer than the second, then something's wrong.
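The check described above can be sketched in a few lines of Python. This is only an illustration: the function name `startup_overhead` and the sample timings are made up for the example and are not measurements from any host in this thread.

```python
from statistics import median


def startup_overhead(update_times):
    """Compare the wait for the first progress update with later intervals.

    update_times: seconds since the task went to "Running" at which each
    progress update was observed (hand-timed values are fine).
    Returns (extra_seconds, ratio) for the first update versus the
    median interval between the later updates.
    """
    if len(update_times) < 3:
        raise ValueError("need at least three progress updates")
    first = update_times[0]
    later = [b - a for a, b in zip(update_times, update_times[1:])]
    typical = median(later)
    return first - typical, first / typical


# Hypothetical stopwatch readings, in seconds.
times = [30.0, 36.0, 42.5, 48.5, 55.0]
extra, ratio = startup_overhead(times)
print(f"first update took {extra:.2f} s longer than a typical one ({ratio:.1f}x)")
```

A ratio near 1 would mean the first update behaves like the later ones; a ratio well above 1 points to extra time being spent before the first update.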
Since it's not happening on my computer (at least not when running just 1 core, which is how I tested it), perhaps it would help if you provided lots of details about the environment in which you have seen this problem occur. Since nobody seems to know what's causing it right now, including as much detail as possible might help find the clue that solves this.
My CPU is a Core2Quad, so no AVX, but it is 64 bits.
____________
My lucky number is 75898524288+1 |
|
|
|
Hi Michael,
Thanks for the info. I thought someone might say this, so I did indeed time it. From 0.000% to 1.050% takes anywhere from 20-50 seconds. From then on, i.e. 1.050% to 2.100%, takes around 6 secs, and every status update thereafter also comes in 6 or 7 secs later. So for the initial 15+ secs the workunit is reported as "running" but stays at 0.000%. Something is wrong here; I just assume there is some code that requires a 15-30 second initialisation before the calculation can begin...
Regards,
Darryl
____________
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13956 ID: 53948 Credit: 393,160,197 RAC: 187,115
|
Hi michael,
Thanks for the info. I thought someone might say this so i did indeed time it. From 0.000% to 1.050% takes anywhere from 20-50 seconds. From then on, i.e. 1.050 to 2.100% takes around 6 secs, and every status update thereafter also comes in 6 or 7 secs later. But the initial 15+ secs of the workunit are reported as "running" but at 0.000%. Something is wrong here, i just assume there is some code that requires a 15-30 second initialisation before the calculation can begin...
Regards,
Darryl
Awesome, it's reproducible on your computer and all your computers are running 64 bit Windows 7. That makes it simple:
Download the latest version of cllr.exe to a folder on your computer, rename it to something manageable like "cllr.exe", then run it with this command:
cllr.exe -d -q"4415066877765*2^666669-1"
(or use one of the SGS numbers from your WUs.)
Copy and paste the output here.
This is what you SHOULD see:
C:\PRPNet\prpclient-5.0.4-windows-gpu\prpclient-1-gpu>cllr388 -q"4415066877765*2
^666669-1" -d
Starting Lucas Lehmer Riesel prime test of 4415066877765*2^666669-1
Using zero-padded Core2 type-1 FFT length 72K, Pass1=96, Pass2=768
V1 = 5 ; Computing U0...done.
4415066877765*2^666669-1, iteration : 10000 / 666669 [1.49%]. Time per iteratio
4415066877765*2^666669-1, iteration : 20000 / 666669 [2.99%]. Time per iteratio
4415066877765*2^666669-1, iteration : 30000 / 666669 [4.49%]. Time per iteratio
4415066877765*2^666669-1, iteration : 40000 / 666669 [5.99%]. Time per iteratio
n : 1.344 ms.
Caught signal. Terminating.
(Note that the progress messages are on multiple lines because the 80 column window is too short, but that's useful in this circumstance.)
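Since the progress display depends on these printed lines, it can help to pull the numbers out of them when comparing runs. Below is a minimal sketch, assuming complete (unwrapped) progress lines of the form shown in this thread. The regular expression and the function name `parse_progress` are assumptions for illustration; this is not how the BOINC wrapper actually parses LLR output.

```python
import re

# Pattern inferred from the console output quoted in this thread.
PROGRESS_RE = re.compile(
    r"iteration\s*:\s*(?P<done>\d+)\s*/\s*(?P<total>\d+)\s*"
    r"\[(?P<pct>[\d.]+)%\]\.\s*Time per iteration\s*:\s*(?P<ms>[\d.]+)\s*ms"
)


def parse_progress(line):
    """Return progress fields from one LLR status line, or None if no match."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None  # e.g. a line cut off by an 80-column console
    done, total = int(m["done"]), int(m["total"])
    ms = float(m["ms"])
    return {
        "done": done,
        "total": total,
        "percent": float(m["pct"]),
        "ms_per_iter": ms,
        # Rough remaining time if the per-iteration cost stays constant.
        "eta_s": (total - done) * ms / 1000.0,
    }


sample = ("8829660260085*2^666668-1, iteration : 10000 / 666668 [1.49%]. "
          "Time per iteration : 1.184 ms.")
print(parse_progress(sample))
```

Lines truncated by the console (ending in "Time per iteratio") simply return `None`, so redirecting the output to a file or widening the window is the easier way to capture full lines.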
____________
My lucky number is 75898524288+1 |
|
|
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3208 ID: 50683 Credit: 135,132,479 RAC: 57,320
|
Great, so I am not crazy :)
This is the same symptom I saw on my machine...
From 0.000% to 1.050% takes anywhere from 20-50 seconds. From then on, i.e. 1.050 to 2.100% takes around 6 secs, and every status update thereafter also comes in 6 or 7 secs later.
Michael, this does not show up when you use only one core: on my 3.7 GHz machine, one core finished a WU in 9 minutes and a few seconds.
But when you enable four cores, the crunch time increases to 10:30???
Yesterday I tried a few combinations in app.info with the old llr.exe and old wrappers, but that did not resolve the problem. On the other hand, on Linux, with one core or all cores, the crunch time is the same: 9:05.
So the mystery is even bigger... At first I suspected my Windows installation, but since others have the same problem it could not be my Windows installation. On the other hand, I did try the "old" llr and wrappers and the problem is the same....
I can even put a small movie on YouTube to show how it looks, but something is strange.
I accept your explanation that it is OK, but why does the crunch time rise on Windows but not on Linux?
____________
92*10^1439761-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! |
|
|
|
Please remember that Windows 7 in particular is very, very bad at scheduling long CPU-intensive tasks. So, to me at least, the only really valid tests under Windows 7 are those run when all available cores are crunching and no GPU WUs are running. Note that hyperthreading must be disabled in the BIOS, not by restricting BOINC to 50% of cores. I found the Core Temp program useful, as it showed the processor frequency changing when not all available cores are working (none of the other similar tools showed this).
Really you should do as Michael suggests. But I would go further and say run half of your processors on SGS with BOINC and run the other half on PRPNet's SGS (server=SGS:100:1:prpnet.primegrid.com:12000). After the initial setup these should provide similar run times. Otherwise there is a bigger issue related to BOINC. |
|
|
|
What I also see is that during this "initialization time" the task takes more than 25% CPU time (on a 4-core CPU). Sometimes 50% or even 75% of the total CPU power...
____________
ESP: Eliminated k=94373
SR5: Eliminated k=97366 and k=325918 |
|
|
|
C:\Users\Darryl2\Downloads>cllr.exe -d -q"8829660260085*2^666668-1"
Starting Lucas Lehmer Riesel prime test of 8829660260085*2^666668-1
Using zero-padded AMD K10 type-1 FFT length 72K, Pass1=96, Pass2=768
V1 = 9 ; Computing U0...done.
8829660260085*2^666668-1, iteration : 10000 / 666668 [1.49%]. Time per iteratio
8829660260085*2^666668-1, iteration : 20000 / 666668 [2.99%]. Time per iteratio
8829660260085*2^666668-1, iteration : 30000 / 666668 [4.49%]. Time per iteratio
8829660260085*2^666668-1, iteration : 40000 / 666668 [5.99%]. Time per iteratio
8829660260085*2^666668-1, iteration : 50000 / 666668 [7.49%]. Time per iteratio
8829660260085*2^666668-1, iteration : 60000 / 666668 [8.99%]. Time per iteratio
8829660260085*2^666668-1, iteration : 70000 / 666668 [10.49%]. Time per iterati
8829660260085*2^666668-1, iteration : 80000 / 666668 [11.99%]. Time per iterati
8829660260085*2^666668-1, iteration : 90000 / 666668 [13.49%]. Time per iterati
8829660260085*2^666668-1, iteration : 100000 / 666668 [14.99%]. Time per iterat
ion : 0.939 ms.
Caught signal. Terminating.
This is what I see at the command line. The time per iteration jumps up and down a bit but I don't see anything here that really explains this behaviour. (All BOINC computations were suspended when I ran this)
____________
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
|
Do you see the same problem if you manually pin the LLR app to a specific core via Task Manager or Process Explorer?
Andreas Stiller from http://ctmagazin.de developed a Windows tool called "Launch".
The syntax is "Launch <app> /p="" /a=1 /c=N"
/p="xxxx yyyy zzz" Parameters for the application; use "" for a " inside the string
/a=xxx Processor affinity for the application; xxx=0 means none (the default)
1: Processor 0, 2: Processor 1, 3: Processor 0+1, 4: Processor 2 ...
/c=xxx Priority class; xxx=N[ormal] is the default, I[dle], B[elow], H[igh], A[bove], R[ealtime]
launch cllr.exe /p="-d -q""8829660260085*2^666668-1""" /a=1 /c=N
This means cllr.exe is executed only on the first core (Processor 0) with normal priority.
If you press "Enter" after each progress message, the next "Time per iteration" message appears on a new line.
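The /a values listed above (1, 2, 3, 4 mapping to Processor 0, Processor 1, Processors 0+1, Processor 2) look like an ordinary affinity bitmask, where bit n selects processor n. Assuming that reading is right, the small sketch below converts between a mask value and a list of processors. The helper names `processors_in_mask` and `mask_for` are made up for this example and are not part of the Launch tool.

```python
def processors_in_mask(mask):
    """List the processor indices selected by an affinity bitmask."""
    if mask < 0:
        raise ValueError("mask must be non-negative")
    # Bit n set means processor n is allowed; 0 selects nothing
    # (the tool describes 0 as "no affinity").
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def mask_for(processors):
    """Build the bitmask value for a collection of processor indices."""
    mask = 0
    for p in processors:
        mask |= 1 << p
    return mask


for value in (1, 2, 3, 4):
    print(value, "->", processors_in_mask(value))
print("processors 2 and 3 ->", mask_for([2, 3]))
```

Under this assumption, pinning to the fourth core alone would be /a=8, and the two cores of the second die on a Core2Quad would be /a=12.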
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13956 ID: 53948 Credit: 393,160,197 RAC: 187,115
|
What i also see is that in this "initialization time" the task takes more than 25% CPU time (on a 4-core CPU). Sometimes 50% or even 75% of the total CPU power...
That's odd.
I'm rapidly running out of ideas that don't involve asking Jean or George.
Here's a question: is anyone experiencing this problem on a CPU that is NOT AVX? If this is only happening on AVX CPUs, that would narrow the problem down.
____________
My lucky number is 75898524288+1 |
|
|
|
The PC I noticed it on has AVX. I presumed it was an HT quirk, as it was only the 7th or 8th WU to start that displayed this 'problem'. It never bothered me enough to check whether it occurred with subsequent WUs. |
|
|
|
What i also see is that in this "initialization time" the task takes more than 25% CPU time (on a 4-core CPU). Sometimes 50% or even 75% of the total CPU power...
That's odd.
I'm rapidly running out of ideas that don't involve asking Jean or George.
Here's a question: is anyone experiencing this problem on a CPU that is NOT AVX? If this is only happening on AVX CPUs, that would narrow the problem down.
I remember seeing this problem a long time ago, when the first AVX release came out. I've never seen it on my non-AVX processor. But now with the new official release 1.00 I can see it on my non-AVX processor as well...
____________
ESP: Eliminated k=94373
SR5: Eliminated k=97366 and k=325918 |
|
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,016,767,981 RAC: 1,578,906
|
What i also see is that in this "initialization time" the task takes more than 25% CPU time (on a 4-core CPU). Sometimes 50% or even 75% of the total CPU power...
That's odd.
I'm rapidly running out of ideas that don't involve asking Jean or George.
Here's a question: is anyone experiencing this problem on a CPU that is NOT AVX? If this is only happening on AVX CPUs, that would narrow the problem down.
Yes, on a CPU that has AVX, in an environment that is not AVX-aware (VMWare 4.x).
Described 3 days ago in the LLR 3.8.8 Test in Boinc thread (http://www.primegrid.com/forum_thread.php?id=4206&nowrap=true#52235)
____________
My stats |
|
|
|
I can see the "effect" on my AMD X6 1100T and also on my AMD X4 630.
However, I'm running some SGS units on an Intel Xeon 5506, and although I can see the same effect there, it is very small. WU times there are around 1150 secs, and the 0 --> 1% jump is only about 2 secs longer than the 1 --> 2% jump, 2 --> 3% jump, etc. For comparison, the AMD processors take around 2-3 times as long to get from 0 to 1% as they do for any subsequent 1% jump.
A sample size of only 3 isn't really enough to draw conclusions, but the AMDs seem to have the biggest problem, at least for me.
____________
|
|
|
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
|
Honza wrote: What i also see is that in this "initialization time" the task takes more than 25% CPU time (on a 4-core CPU). Sometimes 50% or even 75% of the total CPU power...
That's odd.
I'm rapidly running out of ideas that don't involve asking Jean or George.
Here's a question: is anyone experiencing this problem on a CPU that is NOT AVX? If this is only happening on AVX CPUs, that would narrow the problem down.
Yes, on CPU that is AVX in enviroment that is not AVX aware (VMWare 4.x).
Described 3 days ago in LLR 3.8.8 Test in Boinc thread
Perhaps I can explain some things, and hopefully somebody is able to see the root cause.
What you see should depend on the OS in use. With older OSes (everything pre-Vista), an app can and will switch between all available CPU cores to increase overall system performance.
This behaviour changed a little with Vista, because it would be suboptimal for Intel's Turbo 1.0 and 2.0 features. If all cores are under load, Turbo 1.0 cannot switch to a higher speed bin and Turbo 2.0 can go only one speed bin higher. How many speed bins a CPU can climb depends on the state of all the other available cores and on the TDP, via the CPU temperature. If the other cores have reached a deep or deeper sleep state, the running core(s) can reach higher clock rates until they hit their programmed maximum.
W7 changed nothing here. A single-threaded process or app hopefully runs on only one core so that this core can reach a higher speed bin, but this depends on the state of all the other available cores and on the programmed TDP. Once the maximum TDP is reached, a CPU in Turbo mode will always lower its clock rate until its temperature has dropped.
But from time to time the OS can wake up a sleeping core to have it do some work. If this happens, a single-threaded process/app can be switched to another core with a lower clock rate.
For AMD's BD (Bulldozer), this Intel optimization is completely counterproductive. The BD has much lower single-core performance than any up-to-date Intel processor and is also slower than its K10 predecessor. BD is only fast when it can run multi-threaded processes/apps, but it has some architectural limitations. SSE is 128 bits wide and can run on either part of a module; AVX is 256 bits wide and needs both parts of a module. Therefore, on AMD's BD it makes no difference in computation time whether you use AVX or SSE.
Another point is the affinity of a process/app. In the past, Crunch3r released a BOINC client with built-in affinity support: version 6.1.0.32 for Win32 and 6.1.0.64 for Win64. (Linux did not need this workaround, because it is smart enough to run a single-threaded app on only one core and keep it on the core where it was started.) This was meant to keep a single-threaded app running on one core instead of switching across all available cores. The affinity feature inside the client was a workaround to mask an Intel design decision. Intel's Core2Quad was not a native quad-core like AMD's Phenom; Intel put two separate silicon dies (each equivalent to a single Core 2 Duo) on one MCM. The two dies can communicate only over the slow FSB, so Intel's quads could not reach the scalability of AMD's quads. If a process runs on core1 and is moved by the OS to core2, there is no penalty, because both cores are on the same die. But if the OS decides to move the app to core3 or core4, you pay a penalty: the CPU state with all registers must be transferred over the slow FSB before the app can continue on core3/4. Intel changed this with Nehalem, their first native quad-core.
The BOINC developers did not want this affinity feature in their source, because it made no difference on AMD's native quads (Phenom and newer), and they also received no measurements that showed the relationship or the usefulness.
VMware: as said in another thread, VMware uses full virtualization, and only the CPU itself is directly available in a virtual machine. It is up to the OS installed inside a VM whether or not to use the features of the host CPU. For AVX you need at least W7 SP1 or its binary-equivalent W2k8-R2, or Linux with kernel 2.6.30 or newer. If you see no AVX inside a VMware vSphere 4/5 VM, you should check the EVC setting or some settings in the physical BIOS. EVC is sometimes needed if you want to use vMotion with different CPU versions. VMware updated their KB entry "Enhanced vMotion Compatibility (EVC) processor support" on March 7th.
[add]
Here are two computation times. In the first run, "cllr.exe" switched across all cores of my Core2Quad for the complete runtime (viewed with Sysinternals "Process Explorer"); the second run was pinned with affinity to core 2:
D:\>cllr.exe -d -q"8829660260085*2^666668-1"
Starting Lucas Lehmer Riesel prime test of 8829660260085*2^666668-1
Using zero-padded Core2 type-1 FFT length 72K, Pass1=96, Pass2=768
V1 = 9 ; Computing U0...done.
8829660260085*2^666668-1 is not prime. LLR Res64: 1DF5917BCC5ED911 Time : 808.023 sec.
D:\>launch cllr.exe /p="-d -q""8829660260085*2^666668-1""" /a=2 /c=N
"D:\\cllr.exe" -d -q"8829660260085*2^666668-1" in VZ:D:\
run Instanz 0 Handle= 704 ID= 9A8 Th= 700 ThID= CCC
D:\>Starting Lucas Lehmer Riesel prime test of 8829660260085*2^666668-1
Using zero-padded Core2 type-1 FFT length 72K, Pass1=96, Pass2=768
V1 = 9 ; Computing U0...done.
8829660260085*2^666668-1, iteration : 10000 / 666668 [1.49%]. Time per iteration : 1.184 ms.
D:\>8829660260085*2^666668-1, iteration : 20000 / 666668 [2.99%]. Time per iteration : 1.145 ms.
D:\>8829660260085*2^666668-1, iteration : 30000 / 666668 [4.49%]. Time per iteration : 1.141 ms.
D:\>8829660260085*2^666668-1, iteration : 40000 / 666668 [5.99%]. Time per iteration : 1.144 ms.
D:\>8829660260085*2^666668-1, iteration : 50000 / 666668 [7.49%]. Time per iteration : 1.140 ms.
D:\>8829660260085*2^666668-1, iteration : 60000 / 666668 [8.99%]. Time per iteration : 1.259 ms.
8829660260085*2^666668-1, iteration : 80000 / 666668 [11.99%]. Time per iteration : 1.257 ms. ms.
D:\>8829660260085*2^666668-1, iteration : 90000 / 666668 [13.49%]. Time per iteration : 1.149 ms.
D:\>8829660260085*2^666668-1, iteration : 100000 / 666668 [14.99%]. Time per iteration : 1.253 ms.
8829660260085*2^666668-1 is not prime. LLR Res64: 1DF5917BCC5ED911 Time : 792.911 sec.
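To put the two totals above side by side, here is a quick calculation of how much the pinned run saved. The function name `relative_saving` is made up for this example; the two times are the ones printed by cllr.exe above. With only one run of each, this is a single data point rather than a benchmark.

```python
def relative_saving(baseline_s, other_s):
    """Return (seconds saved, percent saved) of other_s against baseline_s."""
    saved = baseline_s - other_s
    return saved, 100.0 * saved / baseline_s


unpinned_s = 808.023  # first run, free to move across all cores
pinned_s = 792.911    # second run, started through Launch with /a=2

saved, percent = relative_saving(unpinned_s, pinned_s)
print(f"pinned run was {saved:.3f} s faster ({percent:.2f}% of the unpinned time)")
```

That works out to roughly 15 seconds, a little under 2% of the unpinned run time.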
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3208 ID: 50683 Credit: 135,132,479 RAC: 57,320
|
Both my CPUs are non-AVX, but as I said before, on Linux there is no such problem (since I use app.info).
But on Windows I get similar results with the stock SGS applications and with app.info, and that is what confuses me...
Since there is a new wrapper and a new application for Windows, OK, let's say they are "not good".
But the problem also occurs with app.info, where you can use any combination of wrapper and cllr.
In the last contest I successfully ran PPS with app.info, and the difference between the Windows and Linux hosts was a few seconds at the same clock speed, so I will try to run it again with the app.info I used in that contest, to see whether there is some problem with Windows (maybe a driver update or something like that).
I doubt it is connected with the series of numbers we have been crunching in the last few days...
____________
92*10^1439761-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! |
|
|
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3208 ID: 50683 Credit: 135,132,479 RAC: 57,320
|
New info:
Stock app 3.8.8 and wrapper 1.0 (both x64)
AMD 960T (OC 3.8 GHz)
Win 7 SP1 x64 (4 GB RAM)
In BOINC, the first set of 4 WUs started at the same time
finished in 538-558 seconds
second set of 4 WUs
finished in 547-563
third set of 4 WUs
finished in 551-583
in the fourth set I already have one WU that was processed in 600 seconds :(
in the fifth set I already have one WU that was processed in 630 seconds :(
Now I am afraid to see the crunch time tomorrow morning...
So with every new set, WU processing becomes slower and slower
But if you use only one core, then a WU finishes in 538 sec
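The drift in these numbers is easier to see as a percentage over the single-core time. The sketch below uses the slowest time reported for each set; for the fourth and fifth sets only one WU time was given, so those entries are lower bounds on the slowest time. The variable names are made up for this example.

```python
single_core_s = 538  # reported run time when only one core is used

# Slowest reported time per set of 4 WUs (sets 4 and 5: the one WU reported).
slowest_per_set_s = [558, 563, 583, 600, 630]

slowdown_pct = [100.0 * (t / single_core_s - 1.0) for t in slowest_per_set_s]

for set_no, (t, pct) in enumerate(zip(slowest_per_set_s, slowdown_pct), start=1):
    print(f"set {set_no}: {t} s, {pct:.1f}% slower than single-core")
```

On these figures the slowest WU goes from under 4% slower than the single-core time in the first set to about 17% slower by the fifth.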
____________
92*10^1439761-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! |
|
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,016,767,981 RAC: 1,578,906
|
Perhaps i can explain some things and hopefully somebody is able to see the root cause.
It should depend of the used OS, what you see. With older OS (all pre-Vista) an app
Thanks for the detailed description.
Running VMWare ESXi 4.1 (no vSphere) and Windows 2008R2SP1. CPU-Z sees neither AVX nor VT-x; VT is enabled in the BIOS.
Which version of CLLR can I downgrade to?
I would like to test with an older version to see whether the huge deviation in run times goes away (and whether the average run time is much better).
____________
My stats |
|
|
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
|
Measurements inside BOINC are tricky and depend on the WU size. You cannot compare two BOINC units of different size.
I suggest waiting until the challenge is over and doing all tests manually via CMD or bash.
Have you tried to pin a WU manually to only one core via Task Manager or Process Explorer?
IIRC, AMD's Phenom X6 also has a Turbo feature, but it works differently from Intel's implementation. If only half the cores of a CPU have work, the working cores can switch to a higher clock rate in one step. Depending on the CPU, this step should be 400 or 500 MHz. On the other hand, only some processes are able to saturate a core completely, and the OS always takes what it needs.
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,016,767,981 RAC: 1,578,906
|
Yes, I have been using Task manager to see core usage and CPU time.
(I do not consider CPU time in BOINC: Task properties reliable).
So when I reported that a task is using more than one core (i.e. 75%, which is 3 of 4 cores), it was according to Windows Task Manager. CPU affinity wasn't changed; apps are free to use whatever cores the OS decides.
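The conversion behind the "75% is 3 of 4 cores" reading can be written out as below. It assumes the percentage is the share of total CPU capacity across all cores, which is how the figures in this thread are being interpreted. The function name `cores_busy` is made up for this example.

```python
def cores_busy(total_cpu_percent, core_count):
    """Convert a share of total CPU capacity into an equivalent number of cores."""
    return total_cpu_percent / 100.0 * core_count


for pct in (25, 50, 75):
    print(f"{pct}% on a 4-core CPU = {cores_busy(pct, 4):g} core(s) fully busy")
```

A strictly single-threaded task should therefore never show more than 25% on a 4-core machine, which is why the 50% and 75% readings stand out.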
____________
My stats |
|
|
rroonnaalldd Volunteer developer Volunteer tester
Joined: 3 Jul 09 Posts: 1213 ID: 42893 Credit: 34,634,263 RAC: 0
|
Thanks for detailed description.
Running VMWare ESXi 4.1 (no vSphere) and Windows 2008R2SP1. CPU-Z does not see AVX nor VT-x; VT is enabled in BIOS.
Which version of CLLR can I downgrade to?
I would like to test with older version if there would be no such a huge deviation of run times (and that average run time would be much better).
vSphere4 is also used as a synonym for ESX4 and ESXi4.
vSphere5 is a synonym for ESXi5, because there is no ESX5 anymore.
I believe you always associate vSphere with vCenter. The vCenter is part of all VMware kits. VMware compares all kits at http://www.vmware.com/products/datacenter-virtualization/vsphere/small-business/compare-kits.html.
AVX should be no problem on a vSphere4 host. If you have a CPU with AVX and cannot see it inside a VM, then you should check the EVC settings on the host and the CPU mask in every VM config.
VT-x is normally only available on the host itself and is needed for 64-bit VMs as a workaround for some limitations in older Intel CPUs. If you need VT inside a VM, then you should read JMattson's "Running Nested VMs". Nested VMs are useful if you want to run one virtualization product inside another, for example a vSphere VM inside a Workstation VM, or Hyper-V inside VMware, and so on. If you have trouble with VT-x, read JMattson's "Trouble-shooting Intel VT-x Issues".
Older versions of LLR are available in all older PRPNet versions, or you can download the version you want directly from Jean Penne at "officially releases and development versions".
____________
Best wishes. Knowledge is power. by jjwhalen
|
|
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,016,767,981 RAC: 1,578,906
|
Let's put it another way: SGS on i5-2500, run time ~28,000 - 35,000 secs.
http://www.primegrid.com/result.php?resultid=366000370
http://www.primegrid.com/result.php?resultid=365942073
http://www.primegrid.com/result.php?resultid=366000366
____________
My stats |
|
|
|
Those examples are a little extreme, to say the least.
What I can see in my results is that CPU time and run time are near enough the same pretty much all the time on the i7 950.
On the i7 970 and the 2600K, a good proportion show a 5-20s difference between the two times.
** The 950 has a single video card and the other 2 boxes have 2 cards each. The 970 is only running 10 cores out of 12. I can't remember off-hand if the 2600K is on 7 or 8. |
|
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,016,767,981 RAC: 1,578,906
|
those examples are a little extreme to say the least.
Yes, they are.
And that's the point...and mystery.
Situation right now: 4 tasks are "running" according to BOINC, all at 0% progress.
According to Task Manager, one task is taking ~20% CPU with 95 minutes, another 0% with 34 secs, another 80%, and the 4th one is not present.
A minute later, the 4th task was running at 99% CPU, all the others at 0%, and all of them were still at 0% progress.
____________
My stats |
|
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,016,767,981 RAC: 1,578,906
|
Reverted back to the 3.8.6 version. No benefit from AVX, but no excessive run times, no long startup time, no more than single-core usage per task, etc.
____________
My stats |
|
|
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3208 ID: 50683 Credit: 135,132,479 RAC: 57,320
|
Yes, 3.8.6 solves the problem, but it is slower, much slower than 3.8.8 64-bit.
I can get one WU done in 8:58 at 3.8 GHz with 3.8.8, but it would take 10 min on 3.8.6.
And it looks like 3.8.7 has the same problem as 3.8.8 (can anyone confirm this?)
This is on an AMD 960T (does not have AVX), and only on Windows.
Linux 3.8.8 x64 works perfectly, and it is faster than 3.8.8 32-bit.
____________
92*10^1439761-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! |
|
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,016,767,981 RAC: 1,578,906
|
Yes 3.8.6 solves problem but it is slower, much slower then 3.8.8 64 bit.
I can get one WU in 8:58 on 3.8 GHz with 3.8.8 but it would take 10 min on 3.8.6
Slower? In my case it's ~490 non-AVX vs 350 secs AVX version.
Much slower? It's up to 35 000 secs with 3.8.8 vs. 490 secs with 3.8.6.
I would rather go slower but smooth, I can't baby sit all hosts.
(I might upgrade to VmWare 5.x in the future and see if it is any better)
____________
My stats |
|
|
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3208 ID: 50683 Credit: 135,132,479 RAC: 57,320
|
Honza, I finally got a 3.8.7 x64 build (non-AVX) that works like a charm.
Try this (Windows version):
http://www.asgoodasitgoetz.com/distribution/cllr387dev-win-x64.7z
Now I get crunch times of 8:58 - 9:06, the first percent comes after 8 seconds on all cores, and everything looks OK!
Good luck!
____________
92*10^1439761-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! |
|
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,016,767,981 RAC: 1,578,906
|
IIRC, there was/were some version(s) that had issues.
I'm not sure if 3.8.7 is not among them.
This is why I asked "Which version of CLLR can I downgrade to?"
____________
My stats |
|
|
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3208 ID: 50683 Credit: 135,132,479 RAC: 57,320
|
There were two versions of 3.8.7: one with 26.5 and one with 27.3.
I now use the one with 26.5 and it works OK...
____________
92*10^1439761-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! |
|
|
|
Hi, just to clarify re: LLR versions.
Please, please, please do not go back to LLR 3.8.7 (with gwnum 26.5) - this version contains a bug which may cause some results to be incorrect, and to have to go back and double-check these is very time-consuming for the project.
The only version of LLR which is safe to use is LLR 3.8.8 (with gwnum 27.5). Gwnum 27.4 and above contains the fix. Any other version is potentially buggy.
Unfortunately 3.8.8 has this performance issue that is being discussed on this thread, but I hope you will agree that it is more important that our results are correct than we get them more quickly.
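The rule stated above (gwnum 27.4 and above contains the fix) amounts to a simple version comparison. The sketch below is only an illustration of that rule; the helper names `parse_version` and `has_fix` are made up, and nothing here inspects an actual LLR binary.

```python
def parse_version(text):
    """Turn a dotted version string such as "27.4" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# First gwnum version said in this thread to contain the fix.
FIXED_IN = parse_version("27.4")


def has_fix(gwnum_version):
    """True if the given gwnum version is 27.4 or later."""
    return parse_version(gwnum_version) >= FIXED_IN


for version in ("26.5", "27.3", "27.4", "27.5"):
    print(version, "contains the fix" if has_fix(version) else "is potentially buggy")
```

Comparing tuples of integers rather than strings or floats keeps a version like 27.10 ordered after 27.4.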
Thanks
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! |
|
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,016,767,981 RAC: 1,578,906
|
Yes, correct results are top priority, not speed nor credit.
Iain, is 3.8.6 OK to use?
Or should I revert to an even older version on the affected host? Which version?
____________
My stats |
|
|
|
Hi Honza,
Unfortunately the 'LLR bug' has been in all versions of gwnum right up until it was found and fixed in 27.4. So all previous LLR versions are suspect. From my point-of-view the right thing is to keep with LLR 3.8.8 for now, and try to debug the cause of the slow tasks ASAP.
Can you PM or email me the details of some WUs which are running really slow (and ideally some which are also running OK on the same host?). If you can recreate it outside of BOINC even better. Then I'll get Jean and George involved. Am I correct in saying this is only affecting AVX hosts with HT on? Sorry, I've only had time to skim-read this thread so far!
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! |
|
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 13956 ID: 53948 Credit: 393,160,197 RAC: 187,115
|
Honza, I finally got one 3.8.7 x64 bit ( non AVX) that works like a charm.
Try this ( win version)
http://www.asgoodasitgoetz.com/distribution/cllr387dev-win-x64.7z
And...
Please, please, please do not go back to LLR 3.8.7 (with gwnum 26.5) - this version contains a bug which may cause some results to be incorrect, and to have to go back and double-check these is very time-consuming for the project.
The only version of LLR which is safe to use is LLR 3.8.8 (with gwnum 27.5). Gwnum 27.4 and above contains the fix. Any other version is potentially buggy.
Unfortunately 3.8.8 has this performance issue that is being discussed on this thread, but I hope you will agree that it is more important that our results are correct than we get them more quickly.
Due to the buggy nature of the 3.8.7 build I have pulled that release from the website. Obviously, getting correct results is more important than getting potentially incorrect results faster.
____________
My lucky number is 75898524288+1 |
|
|
|
To folks that have seen these inconsistent runtimes - can you try running the LLR 3.8.8 32 bit app? Does this still exhibit the same problem?
Cheers
- Iain
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! |
|
|
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3208 ID: 50683 Credit: 135,132,479 RAC: 57,320
|
Not in the same way, but yes (and it is much slower than the 64-bit build).
The 1% update comes 28-38 seconds after the start of the WU.
____________
92*10^1439761-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! |
|
|
Honza Volunteer moderator Volunteer tester Project scientist
Joined: 15 Aug 05 Posts: 1952 ID: 352 Credit: 6,016,767,981 RAC: 1,578,906
|
To folks that have seen these inconsistent runtimes - can you try running the LLR 3.8.8 32 bit app? Does this still exhibit the same problem?
No improvement: it behaves as badly as the x64 version of LLR 3.8.8.
____________
My stats |
|
|
|
Thanks - will look into this further and keep you all posted...
____________
Twitter: IainBethune
Proud member of team "Aggie The Pew". Go Aggie!
3073428256125*2^1290000-1 is Prime! |
|
|
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3208 ID: 50683 Credit: 135,132,479 RAC: 57,320
|
Any news????
____________
92*10^1439761-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie! |
|
|