PrimeGrid
1) Message boards : General discussion : Multitiple tasks on 1 GPU (Message 165514)
Posted 3 days ago by Profile Crun-chiProject donor
You will not gain any speed up...
2) Message boards : General discussion : For new users of PG -FFT- L3 cache ... (Message 165504)
Posted 4 days ago by Profile Crun-chiProject donor
Why lone prime hunters love the lowest possible k in k*b^n+/-1

We already said that the FFT size doesn't show how many digits a candidate has, but a larger FFT will have a slower iteration time.

I will give you an example (both candidates are composite):

2*10^1000000+1 and 7867*10^1000000+1

The first candidate has 1000001 digits and will use a 192K FFT.
The second candidate has 1000004 digits, so the difference is only 3 digits, but it will use a 320K FFT.

Both candidates have nearly the same number of iterations. But if we look at the benchmark times, the situation becomes clear.
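The digit counts and the iteration count quoted above can be checked with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, and the iteration formula (about one squaring per bit of k*b^n) is an assumption that gives an estimate. The number PRST reports is the one to trust.

```python
import math

def decimal_digits(k: int, b: int, n: int) -> int:
    """Decimal digits of k*b^n+/-1 (the +/-1 does not change the count for these examples)."""
    return math.floor(n * math.log10(b) + math.log10(k)) + 1

def approx_iterations(k: int, b: int, n: int) -> int:
    """Estimated iteration count, assuming about one squaring per bit of k*b^n."""
    return math.ceil(n * math.log2(b) + math.log2(k))

for k in (2, 7867):
    print(k, decimal_digits(k, 10, 1_000_000), approx_iterations(k, 10, 1_000_000))
```

The two candidates differ by only a dozen or so estimated iterations out of more than three million, so nearly all of the difference in run time has to come from the FFT size.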

Timings for 192K all-complex FFT length (8 cores, 8 workers): 0.54, 0.54, 0.54, 0.54, 0.55, 0.54, 0.54, 0.54 ms. Throughput: 14809.10 iter/sec.

Timings for 320K all-complex FFT length (8 cores, 8 workers): 0.94, 0.95, 0.94, 0.94, 0.95, 0.94, 0.94, 0.94 ms. Throughput: 8472.28 iter/sec.

The first candidate will be done in 0.54 ms/iter * 3321930 = 1793 seconds.
The second candidate will be done in 0.95 ms/iter * 3321930 = 3155 seconds.

So with a difference of only 3 digits, the run time in this example is about 1.76 times longer, which is nearly double...
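The same arithmetic as a small Python sketch, using the iteration count and the per-iteration timings from the benchmark lines above. The function name is only illustrative.

```python
def estimated_seconds(iterations: int, ms_per_iteration: float) -> float:
    """Benchmark timing per iteration times iteration count, converted to seconds."""
    return iterations * ms_per_iteration / 1000.0

ITERATIONS = 3_321_930  # iteration count used for both candidates in the post

t_192k = estimated_seconds(ITERATIONS, 0.54)  # 192K FFT, 8 workers
t_320k = estimated_seconds(ITERATIONS, 0.95)  # 320K FFT, 8 workers
ratio = t_320k / t_192k

print(f"192K FFT: {t_192k:.0f} s, 320K FFT: {t_320k:.0f} s, ratio {ratio:.2f}")
```

The ratio depends only on the two timings, so any candidate pair that falls on either side of the same FFT boundary will show a similar slowdown.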
3) Message boards : General discussion : Thought for the day (Message 165480)
Posted 5 days ago by Profile Crun-chiProject donor
Also I ate some dog food. Ruff.

It is OK to eat some cat and dog food. Just don't mix them, or you will become vufmijauuu :)
4) Message boards : General discussion : For new users of PG -FFT- L3 cache ... (Message 165459)
Posted 5 days ago by Profile Crun-chiProject donor
This is a guide to simply and quickly get real timings for your results with your configuration. I will explain two possible scenarios: when a candidate fits into L3 cache and when it does not (and how that impacts real-world results).
For this you need only two tools: PRST v10 and any newer version of mprime/Prime95.

Let’s say you have an AMD 3700X, which is an 8/16 (cores/threads) CPU. I always turn off HT/SMT, but that is your decision. Run a benchmark in Prime95 from 320K to 1024K using 1, 2, 4 and 8 workers. The first line will show the speed of one worker with eight cores, the second will show the speed of two workers with 4 cores each, and so on.

First, you must know one very important thing: the timings are final, so you do no math with them. Many users think that you need to multiply or divide the benchmark results by the number of workers, but that is not true. Second, the FFT size of a candidate is very important, but it doesn't show how many digits the candidate has. That is shown by the number of iterations.

So, you run the benchmark and the results will be written in the mprime/Prime95 directory.

Example one
Let’s try a real example: 92*10^1635152-1
PRST will show that this candidate will use a 384K FFT size, has 1635154 digits and needs 5431864 iterations.

If we now look at the Prime95 benchmark, we will see this for the 384K FFT:

Timings for 384K all-complex FFT length (8 cores, 1 worker): 0.26 ms. Throughput: 3919.32 iter/sec.
Timings for 384K all-complex FFT length (8 cores, 2 workers): 0.33, 0.33 ms. Throughput: 5978.46 iter/sec.
Timings for 384K all-complex FFT length (8 cores, 4 workers): 0.64, 0.64, 0.64, 0.64 ms. Throughput: 6268.72 iter/sec.
Timings for 384K all-complex FFT length (8 cores, 8 workers): 1.24, 1.22, 1.22, 1.22, 1.21, 1.22, 1.21, 1.21 ms. Throughput: 6561.73 iter/sec.

Simple math says 384K*8 is 3072 KB. So, this candidate will use 3 MB of L3 cache. Since we know the 3700X has 32 MB of L3 cache, even 8 workers with one core each will only use 8*3 or 24 MB of L3 cache, so everything stays on the chip. That is also confirmed by the benchmark results. Since this benchmark gives us total throughput, it is crystal clear that 8 workers, each with its own core, will produce the most. The candidate has 5431864 iterations, and the benchmark shows around 1.23 ms/iter, so your processor will be done in 6681 seconds or 111 minutes.
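The cache check can be written down as a short Python sketch. It follows the rule of thumb from this post (FFT size times 8 bytes per worker, compared against the total L3 size); the function names are invented for illustration, and the model ignores everything else that competes for cache.

```python
BYTES_PER_FFT_ELEMENT = 8  # rule of thumb from the post: FFT size * 8

def fft_footprint_mb(fft_size_k: int) -> float:
    """Approximate memory for one candidate, in MB (1K = 1024 elements)."""
    return fft_size_k * 1024 * BYTES_PER_FFT_ELEMENT / (1024 * 1024)

def fits_in_l3(fft_size_k: int, workers: int, l3_mb: float) -> bool:
    """True if all workers' FFT data fit into L3 together (simplified model)."""
    return workers * fft_footprint_mb(fft_size_k) <= l3_mb

L3_MB = 32  # AMD 3700X, as stated in the post
print(fft_footprint_mb(384), fits_in_l3(384, workers=8, l3_mb=L3_MB))
```

For the 384K FFT this gives 3.0 MB per worker, and 8 workers need 24 MB, which is under the 32 MB limit.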


Now let’s go higher.
Let’s take another real candidate: 4*53^2309166+1

Again, PRST will show us that this candidate will use a 960K FFT size, has 3981640 digits and needs 13226715 iterations.

Timings for 960K all-complex FFT length (8 cores, 1 worker): 0.61 ms. Throughput: 1626.60 iter/sec.
Timings for 960K all-complex FFT length (8 cores, 2 workers): 0.86, 0.86 ms. Throughput: 2322.01 iter/sec.
Timings for 960K all-complex FFT length (8 cores, 4 workers): 1.68, 1.68, 1.68, 1.68 ms. Throughput: 2383.00 iter/sec.
Timings for 960K all-complex FFT length (8 cores, 8 workers): 9.13, 9.12, 9.07, 9.12, 8.86, 8.86, 8.87, 8.86 ms. Throughput: 890.65 iter/sec.

If we look at this benchmark, we can see that 8 workers with one core each is not the fastest solution. Since it is a huge FFT, 8 copies of it cannot fit into L3 cache (if we set up 8 workers x 1 core per worker), so the computer uses main memory, and since communication with memory is always slower than communication with L3 cache inside the CPU, the timings are very slow. In this case 960K*8 = 7680 KB, or about 7.5 MB of L3 cache per candidate. Since the 3700X has 32 MB of L3 cache, in theory about 4 candidates fit at once. The benchmark again confirms it, since the throughput for 4 workers with two cores each is the fastest.
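Picking the fastest combination by eye works, but it can also be scripted. The sketch below parses benchmark lines in the format quoted in this post and returns the worker count with the highest total throughput. The regular expression is written only against the lines shown here, so it is an assumption that other Prime95 versions print the same layout.

```python
import re

# Pattern matched against the benchmark line format quoted in this post.
BENCH_LINE = re.compile(
    r"Timings for (\d+)K all-complex FFT length \((\d+) cores?, (\d+) workers?\):"
    r".*?Throughput: ([\d.]+) iter/sec"
)

def best_worker_count(lines):
    """Return (workers, throughput) for the line with the highest throughput."""
    best = None
    for line in lines:
        match = BENCH_LINE.search(line)
        if match is None:
            continue  # skip anything that is not a benchmark line
        workers = int(match.group(3))
        throughput = float(match.group(4))
        if best is None or throughput > best[1]:
            best = (workers, throughput)
    return best

BENCH_960K = [
    "Timings for 960K all-complex FFT length (8 cores, 1 worker): 0.61 ms. Throughput: 1626.60 iter/sec.",
    "Timings for 960K all-complex FFT length (8 cores, 2 workers): 0.86, 0.86 ms. Throughput: 2322.01 iter/sec.",
    "Timings for 960K all-complex FFT length (8 cores, 4 workers): 1.68, 1.68, 1.68, 1.68 ms. Throughput: 2383.00 iter/sec.",
    "Timings for 960K all-complex FFT length (8 cores, 8 workers): 9.13, 9.12, 9.07, 9.12, 8.86, 8.86, 8.87, 8.86 ms. Throughput: 890.65 iter/sec.",
]

print(best_worker_count(BENCH_960K))
```

For the 960K lines this selects 4 workers, the same choice the cache estimate points to.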

Real time vs benchmark times

OK, let’s try another example.

On an Intel Xeon we have these benchmark results:

Prime95 64-bit version 30.16, RdtscTiming=1
Timings for 960K all-complex FFT length (18 cores, 2 workers): 0.84, 0.83 ms. Throughput: 2398.20 iter/sec.
Timings for 960K all-complex FFT length (18 cores, 3 workers): 1.20, 1.18, 1.19 ms. Throughput: 2517.56 iter/sec.

4*53^2311158+1 has 3985075 digits, uses a 960K FFT and needs 13238131 iterations.

So the predicted time to process this candidate is 13238131 * 1.19 ms = 15753375 ms or 15753 seconds. Real data show the processing time was 16320 seconds. So, the real run took around 3.6% longer than predicted.
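To compare a prediction with a real run, the gap can be expressed as a percentage of the predicted time. A minimal sketch with the numbers from this example (the function name is only illustrative):

```python
def overrun_percent(predicted_s: float, actual_s: float) -> float:
    """How much longer the real run took, as a percentage of the prediction."""
    return (actual_s - predicted_s) / predicted_s * 100.0

predicted_s = 13_238_131 * 1.19 / 1000.0  # iterations * ms/iter, in seconds
actual_s = 16_320.0                       # measured processing time from the post

overrun = overrun_percent(predicted_s, actual_s)
print(f"predicted {predicted_s:.0f} s, actual {actual_s:.0f} s, overrun {overrun:.1f}%")
```

Measuring the overrun on one finished candidate gives a rough correction factor for later estimates on the same machine, assuming the same settings and similar load.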

In short:
Take any candidate and find its number of iterations and FFT size.
Multiply the FFT size by 8 to find how much L3 cache will be used.
Take the benchmark data and find the fastest combination for the chosen FFT size; you can also calculate the time the CPU needs to process the result.
Of course there are many combinations of CPU, topology and other factors, but as a start it is a nice tutorial.
5) Message boards : Aggie The Pew message board (Message 165457)
Posted 6 days ago by Profile Crun-chiProject donor
After all the testing on another Xeon, this time an 18/36 2695v4, I found the following.
This Xeon has nearly the same performance as the 3700X from AMD. It has more L3 cache and it can be configured in different ways, so it may be a little better for some projects on PG, but it also draws more power (about 30-40 W more), and those cheap MBs are a lottery: which USB, sound or other part of the chipset will or will not work.
When you buy an AMD Ryzen 7 3700X and a nice MB you don't need to worry about anything. Yes, at the start it is much cheaper than an AMD kit, but in the long run I will bet on AMD...
But for every school in life you must pay :)
6) Message boards : Aggie The Pew message board (Message 165380)
Posted 9 days ago by Profile Crun-chiProject donor
Yes, it looks nice, but it is not nice at all. If you dig a little below the surface you will find a few things:
Many motherboards don't have the X99 chipset: they have another chipset modded to work with Xeons, since those chipsets are cheaper. Because they are not made to work with Xeons, many things on those MBs don't work: for example, you have 6 USB ports and only two work. Then the most important part: they use the cheapest VRM and I run 24/7, so it will not last long... Nearly all sensors do not work, so you cannot know the temperature of the CPU or MOSFETs... But it is a little experiment, and for every lesson you must pay :)



So for the test I run 12 of 14 cores on the 2680 V4, HT off.

Last thing: since it runs at only 2.8 GHz, you need more cores than on a current Intel or AMD CPU.
And since it is not the latest technology it is not as efficient, regardless of the fact that it is designed and built for large server farms. So it breaks even with a current Ryzen for about one year. After that it will eat more electricity than you saved by buying it cheap...

And if the Intel core temp sensor is OK, then since those CPUs have a huge surface it is easy to cool them down. Looking at the core temperatures, they are around 52 °C. Of course the frequency is also lower, so that has an effect too... But generally it is a much cooler CPU than a new Ryzen...
7) Message boards : Aggie The Pew message board (Message 165218)
Posted 14 days ago by Profile Crun-chiProject donor
Yes, very cheap: so I just need to find an MB that is also cheap, to hold it and run prime search at 100% CPU usage 24/7.
A new era begins... Now I see the E5 2695v4 does SOB on all cores in around 20 hours...
8) Message boards : Aggie The Pew message board (Message 165144)
Posted 18 days ago by Profile Crun-chiProject donor
Since prime hunting is very addictive and new CPUs have become expensive, I am turning a new page in my computing and will try my luck with a few Xeons. I found one kit with MB+RAM+E5 2680V4 (I got some advice to stick with the 268x V4 and 269x V4).
Yes, per core clock they are slower than new CPUs, but they have a different L3 cache scheme, they are much cheaper, and they use the same PSU, same cooler and same SSD as an ordinary PC.
So my first kit will arrive in ten days, then I install Linux, and the games can start :)

I hope the Xeon will find a huge prime if the Ryzen cannot :)
9) Message boards : Number crunching : Nvidia 4xxx card GFN17 runtime (Message 164763)
Posted 44 days ago by Profile Crun-chiProject donor
Nvidia Ti cards are always something in between: one step up from the smaller card and one step down from the bigger one. But paying 60% more and getting only 15% more performance is not a good deal.
In the end I found, on both Linux and Windows and for GFN, that you can lower the TDP a lot while losing only a fraction of performance. But I was thinking that the difference in power and TDP between 3xxx and 4xxx was much bigger than it is in reality...
Also, I now know that 4xxx has a huge L2 cache compared to the 3xxx series, and that could be the biggest and most important difference (but I don't know how much that affects GFN computing time).
10) Message boards : Aggie The Pew message board (Message 164745)
Posted 46 days ago by Profile Crun-chiProject donor
I am really sorry Crunchi. It sounded like it was a great relationship and I was very pleased for you a couple of years ago.

I hope you feel better soon and of course come to London.

Kind rgds

T

I've survived much more difficult times in my life, so I'll do this too. It's a little easier now, I have experience in these kinds of things, and I'm older, so it's a little easier, I'll forget sooner :)
Thanks for your sympathy

P.S. She was in London last month :)


Copyright © 2005 - 2023 Rytis Slatkevičius (contact) and PrimeGrid community. Server load 3.48, 2.85, 2.56
Generated 29 Sep 2023 | 13:08:39 UTC