## Other


Message boards : Sieving : ppsieve/tpsieve CUDA testing

John
Honorary cruncher

Joined: 21 Feb 06
Posts: 2875
ID: 2449
Credit: 2,681,934
RAC: 0

Message 21651 - Posted: 9 Mar 2010 | 5:49:32 UTC

Exciting News!!!

Over in the PST forum, Ken and the crew have been working on a GPU application for sieving. The program is called ppsieve and is currently being used on the PPSE sieve. If all goes well, we may be able to merge the PPS and PPSE sieves, bringing PPSE into BOINC.

Currently available for 32 & 64 bit Linux AND 32 bit Windows (will run on 64 bit). It should work on cards with any compute capability. You can download it here:

ppsieve-cuda.zip (source)

To test, please use the following command line:

./ppsieve-cuda-boinc-(version) -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -c 60

It should output the following factors:

Range: 42070e9 to 42070030e6
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
42070010190569 | 5625*2^1903125+1
42070011430123 | 3821*2^1406279+1
42070012301263 | 1957*2^1185814+1
42070013521999 | 1965*2^404493+1
42070013970587 | 7143*2^1462422+1
42070013989247 | 5037*2^838603+1
42070017332953 | 6237*2^1916994+1
42070018235321 | 1941*2^363948+1
42070019542387 | 8587*2^1703626+1
42070023987581 | 9811*2^318944+1
42070024339237 | 9257*2^1170495+1
42070024532551 | 4311*2^1690093+1
42070024936837 | 5679*2^1726142+1
42070024995961 | 9111*2^1707153+1
42070026021997 | 4039*2^1819590+1
42070027452199 | 1323*2^854008+1
42070029006583 | 5943*2^663870+1
Found 27 factors
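
For anyone who wants to double-check a reported factor independently of ppsieve, here is a minimal Python sketch. It assumes each factor line has the form "p | k*2^n+1" as in the list above; the function names (parse_factor, divides, check_line) are made up for this example and are not part of ppsieve.

```python
import re

# Matches lines such as "42070000070587 | 9475*2^197534+1"
# (format copied from the output above; "-1" is accepted for Riesel-style lines).
FACTOR_LINE = re.compile(r"^(\d+) \| (\d+)\*2\^(\d+)([+-])1$")


def parse_factor(line):
    """Return (p, k, n, c) for a factor line, or None if the line is not one."""
    m = FACTOR_LINE.match(line.strip())
    if m is None:
        return None
    p, k, n = (int(m.group(i)) for i in (1, 2, 3))
    c = 1 if m.group(4) == "+" else -1
    return p, k, n, c


def divides(p, k, n, c=1):
    """True if p divides k*2^n + c, using modular exponentiation."""
    return (k % p * pow(2, n, p) + c) % p == 0


def check_line(line):
    """Parse one output line and confirm the claimed divisibility."""
    parsed = parse_factor(line)
    return parsed is not None and divides(*parsed)


if __name__ == "__main__":
    # One of the lines listed above; prints whether the divisibility holds.
    print(check_line("42070000070587 | 9475*2^197534+1"))
```

Because pow(2, n, p) reduces modulo p at every step, the check stays fast even for n near 2,000,000.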

Thank you for testing!

p.s. If you wish to test the CUDA time vs. CPU time, you can download the CPU build here: ppsieve-bin.zip (source)

Just run the same test range and then compare the results.
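
One way to do that comparison is sketched below in Python. It assumes you have captured the console output of both runs as text and that factor lines are the ones containing " | ", as in the listings in this thread; the helper names are hypothetical.

```python
def factor_lines(output_text):
    """Collect the 'p | k*2^n+1' lines from a captured ppsieve output."""
    return {line.strip() for line in output_text.splitlines() if " | " in line}


def compare_outputs(gpu_text, cpu_text):
    """Return (only_in_gpu, only_in_cpu); both lists are empty when the runs agree."""
    gpu, cpu = factor_lines(gpu_text), factor_lines(cpu_text)
    return sorted(gpu - cpu), sorted(cpu - gpu)


if __name__ == "__main__":
    # Tiny made-up captures, just to show the call; real captures would be
    # the full console output of the CUDA and CPU builds.
    gpu_capture = "Sieve started\n13 | 3*2^2+1\n7 | 3*2^1+1\nFound 2 factors\n"
    cpu_capture = "Sieve started\n13 | 3*2^2+1\nFound 1 factors\n"
    print(compare_outputs(gpu_capture, cpu_capture))
```

Comparing sets rather than raw text ignores timing lines and banner differences between the two builds.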

Other sample test cases:

Range: 20070e9 to 20070010e6
20070000475957 | 4995*2^1822738+1
20070001146497 | 4977*2^626298+1
20070001163929 | 3765*2^461308+1
20070001302811 | 7669*2^725426+1
20070001425977 | 5821*2^1775248+1
20070002245151 | 1221*2^646983+1
20070002606341 | 4809*2^497683+1
20070004816819 | 6699*2^1215561+1
20070005914001 | 9847*2^1634140+1
20070006187837 | 9923*2^287853+1
20070006875981 | 1645*2^965954+1
20070007170259 | 3889*2^49730+1
20070008329039 | 9065*2^832569+1
Found 13 factors

Range: 249871e9 to 2498711e8
249871003789289 | 6295*2^266404+1
249871009510013 | 2771*2^1272671+1
249871010360639 | 1743*2^1337710+1
249871027030549 | 8865*2^1534637+1
249871030776329 | 7815*2^1679937+1
249871032591751 | 2335*2^23512+1
249871038523049 | 7527*2^204096+1
249871049497963 | 6497*2^505399+1
249871066947839 | 8497*2^1221770+1
249871068167599 | 7311*2^450531+1
249871089712009 | 9281*2^1650023+1
249871091913587 | 2139*2^1290902+1
249871099624639 | 8381*2^350375+1
Found 13 factors

Range: 42070e9 to 42070100e6
Found 68 factors
____________

tocx
Volunteer tester

Joined: 23 Nov 09
Posts: 15
ID: 50535
Credit: 203,523,000
RAC: 0

Message 21657 - Posted: 9 Mar 2010 | 10:25:50 UTC

System: Debian GNU Linux (Squeeze), Kernel 2.6.32 AMD64
Intel i5-750
GeForce 9500 GT (silent)
Nvidia Driver Ver.: 190.53
Cuda-Toolkit: 2.3 Ubuntu 9.04
Running AP26 on all 4 Cores, no other boinc-based GPU-apps running
GPU-Temperature changes from 38°C to 40°C during the sieve runs

./ppsieve-cuda-x86_64-linux -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Detected GPU 0: GeForce 9500 GT
Detected compute capability: 1.1
Detected 4 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1

Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 24.00 sec. (0.01 init + 23.99 sieve) at 131122 p/sec.
Processor time: 3.42 sec. (0.02 init + 3.40 sieve) at 925156 p/sec.
Average processor utilization: 1.14 (init), 0.14 (sieve)
____________

Microcruncher*
Volunteer tester

Joined: 28 Jun 09
Posts: 391
ID: 42625
Credit: 45,226,534
RAC: 0

Message 21658 - Posted: 9 Mar 2010 | 10:46:49 UTC

System: OpenSuSE 11.2 Kernel 2.6.31.12 amd64
CPU: Core 2 Quad Q9550 (E0 stepping)
GPU: GeForce GTX 260-192 - NVIDIA driver version: 190.53 (cuda 2.3 support)

Test run with the CPU idling and the GPU at stock clock:

./ppsieve-cuda-x86_64-linux -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal

ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1

Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 5.81 sec. (0.02 init + 5.80 sieve) at 542504 p/sec.
Processor time: 1.35 sec. (0.02 init + 1.33 sieve) at 2363791 p/sec.
Average processor utilization: 1.10 (init), 0.23 (sieve)

Test run with the CPU idling and the GPU at 667 MHz (shaders linked):

./ppsieve-cuda-x86_64-linux -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1

Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 4.98 sec. (0.02 init + 4.96 sieve) at 634269 p/sec.
Processor time: 1.22 sec. (0.02 init + 1.20 sieve) at 2617475 p/sec.
Average processor utilization: 0.95 (init), 0.24 (sieve)

Microcruncher*
Volunteer tester

Joined: 28 Jun 09
Posts: 391
ID: 42625
Credit: 45,226,534
RAC: 0

Message 21660 - Posted: 9 Mar 2010 | 14:38:08 UTC

To give an impression of the current speeds, here are the runtimes of the test range on a Q9550 @ 3.4 GHz (FSB 400 * 8.5) CPU, using the linux-x86_64 version of the ppsieve-cpu application.

ppsieve version 0.3.4a (testing)
Compiled Feb 19 2010 with GCC 4.3.3
Algorithm not specified, starting benchmark...
bsf takes 350000; mul takes 490000; using standard algorithm.
nstart=1999980, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000

Sieve complete: 42070000000000 <= p < 42070003000000
Found 8 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 16.44 sec. (0.90 init + 15.54 sieve) at 193956 p/sec.
Processor time: 16.41 sec. (0.86 init + 15.55 sieve) at 193898 p/sec.
Average processor utilization: 0.95 (init), 1.00 (sieve)

ppsieve version 0.3.4a (testing)
Compiled Feb 19 2010 with GCC 4.3.3
Algorithm not specified, starting benchmark...
bsf takes 350000; mul takes 540000; using standard algorithm.
nstart=1999980, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000

Sieve complete: 42070000000000 <= p < 42070003000000
Found 8 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 4.94 sec. (0.91 init + 4.03 sieve) at 748919 p/sec.
Processor time: 16.11 sec. (0.91 init + 15.21 sieve) at 198232 p/sec.
Average processor utilization: 0.99 (init), 3.78 (sieve)

---

The "Elapsed time" on a GTX 260-192 @ 667 MHz is nearly the same as on the Q9550 @ 3.4 GHz. So the output of the GTX 260-192 at stock clock is roughly the same as that of 4 cores of my Q9550 at stock clock. This ratio also applies to the current AP26 apps (1.01 (cuda23) vs 1.04).
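
For reference, the ratios can be pulled straight out of the "Elapsed time" lines. The Python sketch below does that for three of the lines posted above; the helper names are made up for this example, and it compares wall-clock time only.

```python
import re

# Matches the summary line printed at the end of each run.
ELAPSED = re.compile(r"Elapsed time: ([\d.]+) sec\.")


def elapsed_seconds(log_text):
    """Extract the wall-clock seconds from a ppsieve summary."""
    m = ELAPSED.search(log_text)
    if m is None:
        raise ValueError("no 'Elapsed time' line found")
    return float(m.group(1))


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Summary lines copied from the runs posted above (3M test range).
cpu_1_thread = "Elapsed time: 16.44 sec. (0.90 init + 15.54 sieve) at 193956 p/sec."
cpu_4_threads = "Elapsed time: 4.94 sec. (0.91 init + 4.03 sieve) at 748919 p/sec."
gpu_667mhz = "Elapsed time: 4.98 sec. (0.02 init + 4.96 sieve) at 634269 p/sec."

for label, line in [("vs 1 CPU thread", cpu_1_thread), ("vs 4 CPU threads", cpu_4_threads)]:
    ratio = speedup(elapsed_seconds(line), elapsed_seconds(gpu_667mhz))
    print(f"GPU @ 667 MHz {label}: {ratio:.2f}x")
```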

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 21661 - Posted: 9 Mar 2010 | 16:38:02 UTC - in response to Message 21660.

Sieve complete: 42070000000000 <= p < 42070003000000
Found 8 factors

Whoops! It looks like I left some flags in the ppconfig.txt file for the CPU version of PPSieve that was on my site, including one that forces sieving for Riesel numbers instead of Proth.

The timing information here is fine, but before doing anything in the PPSE sieve, anyone getting "-1"s in their results file should either download what I just uploaded, or edit ppconfig.txt to remove the "riesel" line.
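
A quick way to check for this, sketched in Python. It assumes Riesel results show up as factor lines ending in "-1" and that the offending setting is a line in ppconfig.txt containing only the word "riesel"; both are assumptions based on the description above, and the function names are made up.

```python
def riesel_lines(results_text):
    """Return factor lines that end in '-1' (Riesel form) instead of '+1'."""
    return [
        line.strip()
        for line in results_text.splitlines()
        if " | " in line and line.rstrip().endswith("-1")
    ]


def strip_riesel(config_text):
    """Return the config text without any line that is exactly 'riesel'."""
    kept = [
        line
        for line in config_text.splitlines(keepends=True)
        if line.strip() != "riesel"
    ]
    return "".join(kept)


if __name__ == "__main__":
    # Made-up sample data; "someoption" is a placeholder, not a real ppsieve flag.
    print(riesel_lines("13 | 3*2^2+1\n11 | 3*2^2-1\nFound 2 factors\n"))
    print(strip_riesel("someoption\nriesel\n"))
```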

OK, now that's over with, about the GPU testing...

3M is actually a small test range for the GPU code. I use it because it has good known factors, and because I don't have a compute-capable GPU and the emulator's really slow! But a 30M range or something would probably be better for speed comparison, if you have a minute or four:

./ppsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q

Also, I haven't tried the 32-bit app at all! I'd be interested to know if it works!
____________

Microcruncher*
Volunteer tester

Joined: 28 Jun 09
Posts: 391
ID: 42625
Credit: 45,226,534
RAC: 0

Message 21662 - Posted: 9 Mar 2010 | 16:48:39 UTC

64 bit GPU app - Test range 30M

./ppsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q

My GTX 260-192 at stock clock with no load on the CPU:

ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.

Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 49.30 sec. (0.02 init + 49.29 sieve) at 611635 p/sec.
Processor time: 3.85 sec. (0.02 init + 3.83 sieve) at 7866200 p/sec.
Average processor utilization: 1.04 (init), 0.08 (sieve)

ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.

Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 42.05 sec. (0.02 init + 42.02 sieve) at 717374 p/sec.
Processor time: 3.72 sec. (0.02 init + 3.70 sieve) at 8142356 p/sec.
Average processor utilization: 0.82 (init), 0.09 (sieve)

Microcruncher*
Volunteer tester

Joined: 28 Jun 09
Posts: 391
ID: 42625
Credit: 45,226,534
RAC: 0

Message 21663 - Posted: 9 Mar 2010 | 17:01:07 UTC

32 bit GPU app - Test range 30M

./ppsieve-cuda-x86-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q

My GTX 260-192 at stock clock with no load on the CPU:

ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.

Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 49.59 sec. (0.02 init + 49.56 sieve) at 608232 p/sec.
Processor time: 4.10 sec. (0.02 init + 4.07 sieve) at 7399053 p/sec.
Average processor utilization: 1.11 (init), 0.08 (sieve)

ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.

Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 42.28 sec. (0.02 init + 42.26 sieve) at 713325 p/sec.
Processor time: 3.96 sec. (0.02 init + 3.94 sieve) at 7648693 p/sec.
Average processor utilization: 1.07 (init), 0.09 (sieve)

Microcruncher*
Volunteer tester

Joined: 28 Jun 09
Posts: 391
ID: 42625
Credit: 45,226,534
RAC: 0

Message 21664 - Posted: 9 Mar 2010 | 17:13:00 UTC

64 bit CPU app - Test range 30M

1 thread - Same CPU and clock rate as stated above:

./ppsieve-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
ppsieve version 0.3.4a (testing)
Compiled Feb 19 2010 with GCC 4.3.3
Algorithm not specified, starting benchmark...
bsf takes 350000; mul takes 500000; using standard algorithm.
nstart=1999980, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
p=42070023724033, 196.6K p/sec, 1.00 CPU cores, 79.1% done. ETA 09 Mar 18:11
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 153.56 sec. (0.87 init + 152.69 sieve) at 196576 p/sec.
Processor time: 153.24 sec. (0.87 init + 152.37 sieve) at 196995 p/sec.
Average processor utilization: 1.00 (init), 1.00 (sieve)

4 threads - Same CPU and clock rate as stated above:

ppsieve version 0.3.4a (testing)
Compiled Feb 19 2010 with GCC 4.3.3
Algorithm not specified, starting benchmark...
bsf takes 350000; mul takes 520000; using standard algorithm.
nstart=1999980, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000

Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 39.77 sec. (0.90 init + 38.88 sieve) at 772082 p/sec.
Processor time: 152.13 sec. (0.89 init + 151.24 sieve) at 198467 p/sec.
Average processor utilization: 0.99 (init), 3.89 (sieve)

Matthias Sobczyk

Joined: 2 Nov 07
Posts: 2
ID: 14318
Credit: 2,417,761
RAC: 0

Message 21668 - Posted: 9 Mar 2010 | 17:27:09 UTC

64 bit

System: Archlinux, Kernel 2.6.32-ARCH
Intel Xeon CPU X3360 @ 3.4GHz (C1 stepping)
GeForce GTX 285
Nvidia Driver Version: 190.53
Cuda-Toolkit: 2.3

Running TRP-Sieve on all 4 Cores, no other boinc-based GPU-apps running

./ppsieve-cuda-x86_64-linux -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Detected GPU 0: GeForce GTX 285
Detected compute capability: 1.3
Detected 30 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1

Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 3.60 sec. (0.02 init + 3.58 sieve) at 878148 p/sec.
Processor time: 0.99 sec. (0.01 init + 0.97 sieve) at 3232123 p/sec.
Average processor utilization: 0.68 (init), 0.27 (sieve)
____________

HAmsty
Volunteer tester

Joined: 26 Dec 08
Posts: 132
ID: 33421
Credit: 12,510,712
RAC: 0

Message 21729 - Posted: 11 Mar 2010 | 18:37:30 UTC

does this version work for 1.0 cards?
____________

Microcruncher*
Volunteer tester

Joined: 28 Jun 09
Posts: 391
ID: 42625
Credit: 45,226,534
RAC: 0

Message 21730 - Posted: 11 Mar 2010 | 19:00:34 UTC - in response to Message 21729.

does this version work for 1.0 cards?

I find no checks for a specific compute capability (1.0, 1.1 or 1.3) in the source code. You can download the zip file with the binaries in it (< 100 K) and simply start the 32 or the 64 bit binary with the commands John gave in his post. The test range is very short.

Or, of course, we could simply read John's initial post. I overlooked it too:

Currently available for 32 & 64 bit Linux. It should work on cards with any compute capability. You can download it here:

____________

HAmsty
Volunteer tester

Joined: 26 Dec 08
Posts: 132
ID: 33421
Credit: 12,510,712
RAC: 0

Message 21735 - Posted: 11 Mar 2010 | 19:28:06 UTC

Oh, sorry, I missed that too. :-(
____________

samuel7
Volunteer tester

Joined: 1 May 09
Posts: 89
ID: 39425
Credit: 257,425,010
RAC: 0

Message 21767 - Posted: 13 Mar 2010 | 16:19:47 UTC

Ubuntu 9.10, kernel 2.6.31-20-generic
GeForce 9800 GT, NVIDIA driver 190.42

This 30M range was run with the CPU cores busy on TRP sieve. Another run with the cores idle was just as fast (as expected).

./ppsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q

ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Detected GPU 0: GeForce 9800 GT
Detected compute capability: 1.1
Detected 14 multiprocessors.
p=42070029884417, 498.0K p/sec, 0.14 CPU cores, 99.6% done. ETA 13 Mar 17:49
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 60.52 sec. (0.01 init + 60.51 sieve) at 498248 p/sec.
Processor time: 9.42 sec. (0.01 init + 9.41 sieve) at 3203673 p/sec.
Average processor utilization: 0.71 (init), 0.16 (sieve)

____________

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 14045
ID: 53948
Credit: 483,803,363
RAC: 628,990

Message 21768 - Posted: 13 Mar 2010 | 16:35:05 UTC - in response to Message 21767.

Has anyone given any thought to what's going to happen when this goes live via BOINC?

In particular, consider how I've got PrimeGrid set up, which is probably a fairly common arrangement for people with CUDA capable GPUs:

1) I've got a quad-core CPU and a GTX280 GPU.

2) I run AP26 on the GPU

3) I run other PrimeGrid stuff on the CPU

4) I do not want to run AP26 on the CPU because I can run it much, much faster on the GPU (about 5 min/WU).

5) There's no explicit BOINC mechanism to say "run X on the CPU and Y on the GPU", unless PrimeGrid makes two separate sub-projects, "AP26-CPU" and "AP26-GPU".

6) So, I have ONLY the CPU tasks selected on project preferences page to feed the right tasks to the CPU

7) ... and "Send work from any subproject..." to send AP26 to the GPU. This works because nothing else exists for the GPU, so it has to send AP26 tasks.

As soon as there's more than one GPU project, there will be no way of selecting what you want to run on the GPU, unless you want to also allow those to run on the CPU (which I think most people would prefer not to do.)

One possible solution is to make separate sub-projects for the GPU versions, but I realize that's far from ideal.
____________
My lucky number is 75898524288+1

Vato
Volunteer tester

Joined: 2 Feb 08
Posts: 863
ID: 18447
Credit: 882,676,451
RAC: 1,290,092

Message 21769 - Posted: 13 Mar 2010 | 16:40:42 UTC - in response to Message 21768.

I think it would be a mistake to build too much complexity into local bespoke code.

We need the BOINC client to do the right thing with GPU scheduling. We need the BOINC server to allow per subproject CPU/GPU preferences. We should try and feed those requirements into the mainstream BOINC development process and then make use of it when released.

All IMHO of course, and somewhat idealistic, since BOINC development seems to follow whatever direction the Berkeley folks are meandering in at a particular moment in time...

____________

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 14045
ID: 53948
Credit: 483,803,363
RAC: 628,990

Message 21771 - Posted: 13 Mar 2010 | 17:13:12 UTC - in response to Message 21769.

I intentionally avoided the whole concept of "well, the BOINC software really *should* do this..." for the obvious reasons, primarily that we're going to have a problem in the near future whereas the BOINC client might eventually get around to solving a problem like this anywhere from tomorrow to never. They've got much bigger GPU scheduling problems to solve first before they could get around to this one.
____________
My lucky number is 75898524288+1

samuel7
Volunteer tester

Joined: 1 May 09
Posts: 89
ID: 39425
Credit: 257,425,010
RAC: 0

Message 21775 - Posted: 13 Mar 2010 | 18:56:23 UTC - in response to Message 21768.

As soon as there's more than one GPU project, there will be no way of selecting what you want to run on the GPU ...

The anonymous platform mechanism lets you control exactly what you want to run. I deployed it today on the Linux side of my quad and it even picked up existing tasks correctly.

Obviously, this isn't a solution for the masses, but it works for me.
____________

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 14045
ID: 53948
Credit: 483,803,363
RAC: 628,990

Message 21778 - Posted: 13 Mar 2010 | 19:07:08 UTC - in response to Message 21775.

As soon as there's more than one GPU project, there will be no way of selecting what you want to run on the GPU ...

The anonymous platform mechanism lets you control exactly what you want to run. I deployed it today on the Linux side of my quad and it even picked up existing tasks correctly.

Obviously, this isn't a solution for the masses, but it works for me.

It's been a loooooong time since I've set up app_info by hand, and I don't really remember how to do it. Anyone know of a good reference for what exactly needs to be done?
____________
My lucky number is 75898524288+1

samuel7
Volunteer tester

Joined: 1 May 09
Posts: 89
ID: 39425
Credit: 257,425,010
RAC: 0

Message 21782 - Posted: 13 Mar 2010 | 20:25:36 UTC - in response to Message 21778.

As soon as there's more than one GPU project, there will be no way of selecting what you want to run on the GPU ...

The anonymous platform mechanism lets you control exactly what you want to run. I deployed it today on the Linux side of my quad and it even picked up existing tasks correctly.

Obviously, this isn't a solution for the masses, but it works for me.

It's been a loooooong time since I've set up app_info by hand, and I don't really remember how to do it. Anyone know of a good reference for what exactly needs to be done?

The BOINC wiki has this article. You can use the example in the format section as a template.

Open the client_state file in the data directory and find the PrimeGrid project data. Then, for each subproject you want to run:

1) Copy the app_version section of the subproject and paste it over the app_version in the template, removing only the platform tag.

2) You can edit the flops value if you know it's off for your current duration correction factor.

3) Correct the <app> <name> and declare the files in the app_version with <file_info> tags as well, like in the example.

Save the result as app_info.xml in the PG project folder and restart BOINC. You should make sure all the files you declared actually exist in the PG folder.

It is probably a good idea to run down your cache and/or make a backup of the data directory (suspend network activity, too!) before deploying.

Below is a portion of my Win app_info for Primegrid.

<app_info>

<app>
<name>ap26</name>
</app>
<file_info>
<name>primegrid_ap26_1.01_windows_intelx86__cuda23.exe</name>
<executable/>
</file_info>
<file_info>
<name>cudart.dll</name>
<executable/>
</file_info>
<app_version>
<app_name>ap26</app_name>
<version_num>101</version_num>
<plan_class>cuda23</plan_class>
<avg_ncpus>0.050000</avg_ncpus>
<max_ncpus>0.050000</max_ncpus>
<flops>5604000000.000000</flops>
<coproc>
<type>CUDA</type>
<count>1.000000</count>
</coproc>
<file_ref>
<file_name>primegrid_ap26_1.01_windows_intelx86__cuda23.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>cudart.dll</file_name>
</file_ref>
</app_version>

<app>
<name>psp_sr2sieve</name>
</app>
<file_info>
<name>primegrid_sr2sieve_wrapper_1.12_windows_x86_64.exe</name>
<executable/>
</file_info>
<file_info>
<name>primegrid_sr2sieve_1.8.10_windows_x86_64.exe.orig</name>
<executable/>
</file_info>
<app_version>
<app_name>psp_sr2sieve</app_name>
<version_num>112</version_num>
<flops>2876776723.690640</flops>
<file_ref>
<file_name>primegrid_sr2sieve_wrapper_1.12_windows_x86_64.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>primegrid_sr2sieve_1.8.10_windows_x86_64.exe.orig</file_name>
<open_name>primegrid_sr2sieve_1.8.10_windows_x86_64.exe.orig</open_name>
</file_ref>
</app_version>
</app_info>
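
To check that every declared file is really present before restarting BOINC, something like the following Python sketch could be used. It assumes the app_info.xml layout shown above (file_info/name and file_ref/file_name elements); the function names and the sample file names are made up for illustration.

```python
import os
import xml.etree.ElementTree as ET


def declared_files(app_info_xml):
    """Return (names in <file_info>, names in <file_ref>) from app_info.xml text."""
    root = ET.fromstring(app_info_xml)
    declared = {el.findtext("name") for el in root.iter("file_info") if el.findtext("name")}
    referenced = {
        el.findtext("file_name") for el in root.iter("file_ref") if el.findtext("file_name")
    }
    return declared, referenced


def check_app_info(app_info_xml, project_dir):
    """Return (declared files missing on disk, referenced files never declared)."""
    declared, referenced = declared_files(app_info_xml)
    missing_on_disk = sorted(
        name for name in declared if not os.path.isfile(os.path.join(project_dir, name))
    )
    undeclared = sorted(referenced - declared)
    return missing_on_disk, undeclared


# Small made-up sample in the same shape as the app_info above.
SAMPLE = """<app_info>
<app><name>demo</name></app>
<file_info><name>demo_app.exe</name><executable/></file_info>
<app_version>
<app_name>demo</app_name>
<file_ref><file_name>demo_app.exe</file_name><main_program/></file_ref>
<file_ref><file_name>helper.dll</file_name></file_ref>
</app_version>
</app_info>"""

if __name__ == "__main__":
    # "." stands in for the PrimeGrid project folder.
    print(check_app_info(SAMPLE, "."))
```

Two empty lists mean every declared file exists and every file_ref has a matching file_info.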

____________

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 14045
ID: 53948
Credit: 483,803,363
RAC: 628,990

Message 21784 - Posted: 13 Mar 2010 | 20:46:54 UTC - in response to Message 21782.

Thanks, that makes sense and helps a lot.

____________
My lucky number is 75898524288+1

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 21915 - Posted: 17 Mar 2010 | 16:26:17 UTC

Alright, I've posted a release candidate version to the locations given at the beginning of the thread. It may be slightly faster; it may also lock up your GPU until it's done. Let me know how it goes.

In any case, I've recently been looking at some other projects' GPU speeds, and I'm finding myself disappointed with my speeds. When Milkyway@Home is 17 times faster on high-end NVidia (PDF), and even a simple Collatz app (not the Collatz, but the only source code I could find) is more than twice as fast as a CPU on a mid-range card, but my code is only as fast as a CPU on a high-end card, I wonder if I'm doing something wrong. Would any of the experienced CUDA developers around here care to give my code the once-over, to see if I'm doing something obviously stupid like not giving the card enough threads?

I suppose the other side of the coin could be that my CUDA code isn't bad, but that my and Geoff's CPU code is extraordinarily good.

Edit: P.S. There's a BOINC capable (I think) executable in the zipfile as well. :)
____________

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 14045
ID: 53948
Credit: 483,803,363
RAC: 628,990

Message 21919 - Posted: 17 Mar 2010 | 21:27:56 UTC - in response to Message 21915.

In any case, I've recently been looking at some other projects' GPU speeds, and I'm finding myself disappointed with my speeds. When Milkyway@Home is 17 times faster on high-end NVidia (PDF), and even a simple Collatz app (not the Collatz, but the only source code I could find) is more than twice as fast as a CPU on a mid-range card, but my code is only as fast as a CPU on a high-end card, I wonder if I'm doing something wrong.

One thing to consider is that some problems just don't lend themselves very well to parallel processing. Even with the best code in the world it still might not work very well on a GPU. Remember, the GPU isn't all that fast compared to a CPU. It's its ability to run several hundred calculations simultaneously that makes it fast. If the problem doesn't fit the hardware well, the GPU won't be able to crunch it very quickly.

____________
My lucky number is 75898524288+1

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 21921 - Posted: 17 Mar 2010 | 22:03:42 UTC - in response to Message 21919.

Yeah...the application here is computationally bound and doesn't require much memory. Probably the slowest part is the 64-bit multiplies. When Fermi comes out, I expect that each stream processor will run my app (once recompiled) twice as fast.

Another part of it could be that others are comparing GPU speed to CPU speed on one core. In that case my app is 4 times as fast as the CPU version. :)
____________

mfl0p

Joined: 5 Apr 09
Posts: 251
ID: 38042
Credit: 2,757,874,746
RAC: 17,206

Message 21922 - Posted: 17 Mar 2010 | 22:38:08 UTC

Nice work so far, Ken. I only see one issue: this appears to be a compute-mode-only CUDA application, meaning it will not run on the primary adapter under Windows in its current form (driver watchdog timer). Correct?
____________

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 21923 - Posted: 17 Mar 2010 | 22:46:33 UTC - in response to Message 21922.

I have no idea what you just said! (I'm new at this CUDA stuff.)

If you mean it's not using the driver API, that's correct. I was hoping to avoid it.

Edit: Did you see cuda_sleep_memcpy.cu?
____________

mfl0p

Joined: 5 Apr 09
Posts: 251
ID: 38042
Credit: 2,757,874,746
RAC: 17,206

Message 21924 - Posted: 17 Mar 2010 | 22:51:45 UTC - in response to Message 21923.

I have no idea what you just said! (I'm new at this CUDA stuff.)

If you mean it's not using the driver API, that's correct. I was hoping to avoid it.

In Windows, if a CUDA kernel runs longer than 5 seconds the program will be terminated by the driver. Briefly looking at your posted source, it appears you're running one huge kernel.

RE: app speeds, currently in AP26 a 1.3 CUDA card is about 5.5 times as fast as one core of an Intel Q6600 CPU. So your app isn't exactly slow, it's just doing things the GPU isn't good at.
____________

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 21925 - Posted: 17 Mar 2010 | 22:55:03 UTC

Have you investigated whether some of the later compute capabilities add features that increase speed? It is nice to see an application that works on all CUDA cards, but given that only a handful of models are compute capability 1.0 (G80 chips), added features such as atomic functions on compute capability 1.1 cards might help with speed, depending on the processes computed in the application.

____________
141941*2^4299438-1 is prime!

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 21926 - Posted: 17 Mar 2010 | 23:54:09 UTC - in response to Message 21924.

In Windows, if a CUDA kernel runs longer than 5 seconds the program will be terminated by the driver. Briefly looking at your posted source, it appears you're running one huge kernel.

Not exactly. I load up the GPU with either 384 or 768 P's per multiprocessor, run just those, further check any that found a factor on the CPU, then repeat. There's no specific time checking, but I estimate the kernel won't run more than 1 or 2 seconds at a time.
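
The structure described here can be sketched in Python (not the actual CUDA code). The gpu_stage and cpu_recheck callables are stand-ins for the kernel launch and the CPU re-check, and all names are made up for illustration.

```python
def batch_size(multiprocessors, per_multiprocessor=384):
    """Number of p values handed to the GPU per launch (384 or 768 per multiprocessor)."""
    return multiprocessors * per_multiprocessor


def batches(candidates, size):
    """Yield successive lists of at most `size` candidates."""
    batch = []
    for p in candidates:
        batch.append(p)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def sieve_in_batches(candidates, size, gpu_stage, cpu_recheck):
    """Run one short 'kernel' per batch, then confirm flagged values on the CPU."""
    confirmed = []
    for batch in batches(candidates, size):
        flagged = gpu_stage(batch)  # stand-in for a single short kernel launch
        confirmed.extend(p for p in flagged if cpu_recheck(p))
    return confirmed


if __name__ == "__main__":
    # A GTX 260 reports 24 multiprocessors in the logs above.
    print(batch_size(24, 768))
    # Toy stages: the "GPU" flags even numbers, the "CPU" keeps multiples of 4.
    print(sieve_in_batches(range(1, 11), 4,
                           lambda b: [p for p in b if p % 2 == 0],
                           lambda p: p % 4 == 0))
```

Keeping each launch to one fixed-size batch is what bounds the time any single kernel runs.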

Scott: I looked into it. I'm not using much global memory, or any shared memory, so atomic functions don't matter. I'm not sure; double-precision might have enough precision to be useful in one case, but it would be tricky. Otherwise there's nothing until compute capability 2.0, which as I mentioned makes multiplication faster.

____________

mfl0p

Joined: 5 Apr 09
Posts: 251
ID: 38042
Credit: 2,757,874,746
RAC: 17,206

Message 21928 - Posted: 18 Mar 2010 | 1:22:54 UTC - in response to Message 21926.

OK, I'll have to pay more attention to the code when reading it. That should work fine in Windows, too.
____________

Microcruncher*
Volunteer tester

Joined: 28 Jun 09
Posts: 391
ID: 42625
Credit: 45,226,534
RAC: 0

Message 21933 - Posted: 18 Mar 2010 | 8:38:09 UTC - in response to Message 21662.

64 bit GPU app - Test range 30M

./ppsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q

My GTX 260-192 at stock clock with no load on the CPU:

ppsieve version cuda-0.1.1-rc1 (testing)
Compiled Mar 17 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.

Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 49.12 sec. (0.02 init + 49.10 sieve) at 613957 p/sec.
Processor time: 3.81 sec. (0.02 init + 3.79 sieve) at 7951250 p/sec.
Average processor utilization: 1.04 (init), 0.08 (sieve)

./ppsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
ppsieve version cuda-0.1.1-rc1 (testing)
Compiled Mar 17 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.

Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 41.87 sec. (0.02 init + 41.86 sieve) at 720229 p/sec.
Processor time: 3.69 sec. (0.02 init + 3.67 sieve) at 8215573 p/sec.
Average processor utilization: 1.04 (init), 0.09 (sieve)
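
A note on reading these summaries: the "Average processor utilization" figure appears to be processor time divided by elapsed time for each phase. That is inferred from the printed numbers, not from the source, and the helper name below is hypothetical.

```c
/* Hypothetical helper: ratio of CPU time to wall-clock time for one phase. */
double cpu_utilization(double processor_sec, double elapsed_sec)
{
    if (elapsed_sec <= 0.0)
        return 0.0;          /* no meaningful ratio for an empty interval */
    return processor_sec / elapsed_sec;
}
```

For the first run, 3.79 / 49.10 is about 0.077, which rounds to the reported 0.08. For the second, 3.67 / 41.86 is about 0.088, matching the reported 0.09.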

____________

Microcruncher*
Volunteer tester

Joined: 28 Jun 09
Posts: 391
ID: 42625
Credit: 45,226,534
RAC: 0

Message 21934 - Posted: 18 Mar 2010 | 8:40:34 UTC - in response to Message 21933.

64 bit GPU app - Test range 30M - speed comparison

./ppsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q

My GTX 260-192 at stock clock with no load on the CPU:

ppsieve version cuda-0.1.1-beta (testing) : Elapsed time: 49.30 sec. (0.02 init + 49.29 sieve) at 611635 p/sec.
ppsieve version cuda-0.1.1-rc1 (testing) : Elapsed time: 49.12 sec. (0.02 init + 49.10 sieve) at 613957 p/sec.

ppsieve version cuda-0.1.1-beta (testing) : Elapsed time: 42.05 sec. (0.02 init + 42.02 sieve) at 717374 p/sec.
ppsieve version cuda-0.1.1-rc1 (testing) : Elapsed time: 41.87 sec. (0.02 init + 41.86 sieve) at 720229 p/sec.
____________

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 22207 - Posted: 30 Mar 2010 | 5:18:56 UTC

Thanks for all your testing help, guys! I've got one more thing to test, and I hope you won't mind because I expect it to be slower. (But I've been wrong before!)

Linux 64-bit users *only*, with pre-Fermi cards (Fermi isn't in stores yet if you didn't know), please try the two binaries in this zipfile. This is an experiment in 24-bit multiplies instead of 64-bit ones. Both binaries do 24-bit multiplies, despite their names, but they do other stuff differently. Even if it doesn't work here, this is a plausible algorithm for ATI if I can ever figure out how to develop for OpenCL without their GPU.

If anyone reading this *does* have a Fermi (GTX4xx), I'd love to see a benchmark from the original code (linked in the first post by John). If Fermi doesn't run 50-100% faster per shader, I may have to recompile or something for maximum speed.
____________

Benva
Volunteer tester

Joined: 5 May 08
Posts: 73
ID: 22332
Credit: 2,715,050
RAC: 0

Message 22230 - Posted: 30 Mar 2010 | 20:52:30 UTC

SYSTEM Ubuntu 9.10
Intel Core2Duo T9550 @ 2.66GHZ
G105M
195.36.15 drivers

./ppsieve-cuda-x86_64-linux -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal

ppsieve version cuda-0.1.1-rc1 (testing)
Compiled Mar 17 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Detected GPU 0: GeForce G 105M
Detected compute capability: 1.1
Detected 1 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1

Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 80.06 sec. (0.05 init + 80.02 sieve) at 39313 p/sec.
Processor time: 12.80 sec. (0.03 init + 12.77 sieve) at 246337 p/sec.
Average processor utilization: 0.67 (init), 0.16 (sieve)

pps-cuda-a1
./ppsieve-cuda-64bit-x86_64-linux -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal

ppsieve version cuda-0.1.0-beta (testing)
Compiled Mar 29 2010 with GCC 4.3.3
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Detected GPU 0: GeForce G 105M
Detected compute capability: 1.1
Detected 1 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1

Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 166.18 sec. (0.04 init + 166.14 sieve) at 18934 p/sec.
Processor time: 13.11 sec. (0.04 init + 13.07 sieve) at 240683 p/sec.
Average processor utilization: 1.01 (init), 0.08 (sieve)

./ppsieve-cuda-24bit-x86_64-linux -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal

ppsieve version cuda-0.1.0-beta (testing)
Compiled Mar 29 2010 with GCC 4.3.3
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Detected GPU 0: GeForce G 105M
Detected compute capability: 1.1
Detected 1 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1

Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 163.43 sec. (0.04 init + 163.39 sieve) at 19252 p/sec.
Processor time: 12.32 sec. (0.04 init + 12.28 sieve) at 256167 p/sec.
Average processor utilization: 1.01 (init), 0.08 (sieve)
____________

Microcruncher*
Volunteer tester

Joined: 28 Jun 09
Posts: 391
ID: 42625
Credit: 45,226,534
RAC: 0

Message 22256 - Posted: 1 Apr 2010 | 3:50:10 UTC - in response to Message 22207.

64 bit GPU app - Test range 30M

./ppsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q

My GTX 260-192 at stock clock with no load on the CPU:

ppsieve version cuda-0.1.0-beta (testing)
Compiled Mar 29 2010 with GCC 4.3.3
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.
p=42070020971521, 349.5K p/sec, 0.06 CPU cores, 69.9% done. ETA 01 Apr 05:42
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 87.11 sec. (0.02 init + 87.09 sieve) at 346140 p/sec.
Processor time: 4.61 sec. (0.02 init + 4.59 sieve) at 6566015 p/sec.
Average processor utilization: 1.11 (init), 0.05 (sieve)

./ppsieve-cuda-24bit-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q

My GTX 260-192 at stock clock with no load on the CPU:

ppsieve version cuda-0.1.0-beta (testing)
Compiled Mar 29 2010 with GCC 4.3.3
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.
p=42070019922945, 332.0K p/sec, 0.06 CPU cores, 66.4% done. ETA 01 Apr 05:47
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 94.13 sec. (0.02 init + 94.12 sieve) at 320306 p/sec.
Processor time: 4.69 sec. (0.02 init + 4.67 sieve) at 6449443 p/sec.
Average processor utilization: 1.10 (init), 0.05 (sieve)

____________

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 22257 - Posted: 1 Apr 2010 | 4:23:38 UTC

Yeah, apparently my test version is slower as expected. There's probably no need to test it any more. But thanks for the testing you did!
____________

mfl0p

Joined: 5 Apr 09
Posts: 251
ID: 38042
Credit: 2,757,874,746
RAC: 17,206

Message 22274 - Posted: 2 Apr 2010 | 1:48:02 UTC

Ken, whenever you get the code finalized, I can build a Win32 version if you need.
____________

Volunteer developer

Joined: 11 Sep 08
Posts: 600
ID: 28785
Credit: 331,699,243
RAC: 0

Message 22304 - Posted: 3 Apr 2010 | 13:44:52 UTC

Okay, took another range: 174550 to 174551

Intel Xeon W3520 => 43 factors found; 855.903 k p/s
Nvidia GTX 260 => 43 factors found; 894.724 k p/s
Nvidia FX 580 => manually aborted due to long runtime; ~115 k p/s

As in AP26 with mfl0p's newest app, the GTX 260 is a bit faster than the Xeon W3520.

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 22394 - Posted: 8 Apr 2010 | 1:34:35 UTC

OK, new version uploaded, probably the finalized version, at the links in the first post. I found a somewhat major bug in previous versions: around 30 of the highest N's in the ranges were being skipped! But that's fixed now.

Bryan, see if you can get this code to build in VS, perhaps without BOINC first. If you need to make changes, perhaps I should set up a GitHub account?
____________

mfl0p

Joined: 5 Apr 09
Posts: 251
ID: 38042
Credit: 2,757,874,746
RAC: 17,206

Message 22397 - Posted: 8 Apr 2010 | 10:07:36 UTC - in response to Message 22394.

Ok, will try building Win32 version soon. Thanks Ken
____________

Iain Bethune
Honorary cruncher

Joined: 28 Jan 09
Posts: 1588
ID: 34775
Credit: 194,056,043
RAC: 0

Message 22408 - Posted: 9 Apr 2010 | 5:54:51 UTC - in response to Message 22397.

Ken, if you're ready for me to do a Mac port, I'd be very happy to start on that if I can get the source code.

Cheers

- Iain

HAmsty
Volunteer tester

Joined: 26 Dec 08
Posts: 132
ID: 33421
Credit: 12,510,712
RAC: 0

Message 22409 - Posted: 9 Apr 2010 | 6:38:32 UTC

@Iain the source code is linked in the first post of this thread
____________

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 22528 - Posted: 14 Apr 2010 | 20:21:10 UTC

Well, I thought I was done, but I've made a few more changes for the release version of PPSieve-CUDA. The biggest change is compiling with CUDA 3.0. I hope it works for everyone!

The biggest code change is that I gave up using boinc_init_parallel() in favor of boinc_init(), because it's more compatible. The rest of the code changes are to header files and paths to BOINC header files. So nothing major there.

By the way, apparently CUDA 3.0 introduces an easier way to lower CPU usage. It might go from 5% down to 1 or 2%. But I'm going to leave lowering CPU usage for V0.1.2, if it's needed.
____________

Iain Bethune
Honorary cruncher

Joined: 28 Jan 09
Posts: 1588
ID: 34775
Credit: 194,056,043
RAC: 0

Message 22530 - Posted: 14 Apr 2010 | 20:41:00 UTC - in response to Message 22528.

Thanks Ken - I just got the 0.1.1-rc2 version ported to Mac OS X (only minor tweaks required as __thread attribute is not supported by GCC on Mac OS X), I'll pick up the new code and rebuild ASAP and post here when it's done (hopefully in a couple of days)...

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 22533 - Posted: 14 Apr 2010 | 23:08:24 UTC - in response to Message 22528.

Compiling with CUDA 3.0 will probably mean that many will be forced to upgrade drivers to at least the 195.xx series. Just FYI...the 196.xx and 197.xx drivers have been noted to slow down many cards' computational speeds compared to the 190.xx and 191.xx drivers (especially 8xxx and 9xxx series cards under Win7 and Vista), so the gain in freer CPU may actually be lost (and maybe exceeded) by the loss in speed on some cards.

____________
141941*2^4299438-1 is prime!

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 22534 - Posted: 14 Apr 2010 | 23:19:08 UTC - in response to Message 22533.

OK, it doesn't have to be compiled with 3.0 (yet). I wasn't sure if 2.3 would support Fermi. Since it looks like it does (PDF), I'll see about going back to 2.3.

Edit: To be clear, only the binaries will change, not the source code.
____________

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 14045
ID: 53948
Credit: 483,803,363
RAC: 628,990

Message 22535 - Posted: 14 Apr 2010 | 23:22:11 UTC - in response to Message 22533.

Compiling with CUDA 3.0 will probably mean that many will be forced to upgrade drivers to at least the 195.xx series. Just FYI...the 196.xx and 197.xx drivers have been noted to slow down many cards' computational speeds compared to the 190.xx and 191.xx drivers (especially 8xxx and 9xxx series cards under Win7 and Vista), so the gain in freer CPU may actually be lost (and maybe exceeded) by the loss in speed on some cards.

The slowdowns some people have reported with some versions of the drivers have been significant. Around 25%, IIRC.

____________
My lucky number is 75898524288+1

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 22536 - Posted: 14 Apr 2010 | 23:50:11 UTC - in response to Message 22534.

OK, it doesn't have to be compiled with 3.0 (yet). I wasn't sure if 2.3 would support Fermi. Since it looks like it does (PDF), I'll see about going back to 2.3.

Edit: To be clear, only the binaries will change, not the source code.

That's good...am I reading it correctly that CUDA 2.3 devices will use the native CUBIN that can work with the older drivers, but Fermi devices will need to have the 195.xx driver or higher to utilize the PTX code?

____________
141941*2^4299438-1 is prime!

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 22537 - Posted: 14 Apr 2010 | 23:51:37 UTC - in response to Message 22534.

...And we're back to binaries compiled with CUDA 2.3. :)

Edit: Scott, I think that's right. I doubt there are any drivers older than that for Fermi.
____________

Iain Bethune
Honorary cruncher

Joined: 28 Jan 09
Posts: 1588
ID: 34775
Credit: 194,056,043
RAC: 0

Message 22565 - Posted: 16 Apr 2010 | 10:26:27 UTC - in response to Message 22537.

The Mac OS X / CUDA version of ppsieve is now available for testing:

Note that only 32 bit CUDA executables are supported on the Mac, but as most runtime is spent on the GPU, this is not a problem. Since the upgrade to Mac OS 10.6.3, Apple only supports CUDA 3.0, so this app is built and linked with the CUDA 3.0 libraries. However, it should work fine on machines where CUDA 2.3 is installed. If you have a Mac running OS 10.5 and/or CUDA 2.3 I'd be very grateful for your testing.

To test the app, please use the same inputs as in the original post, and obviously the output should be the same!

On my machine (MacBookPro, 2.66 GHz Core 2 Duo, GeForce 9400M / 9600M GT) with the CPU idling, the 9400M takes

Elapsed time: 67.96 sec. (0.02 init + 67.94 sieve) at 46303 p/sec.
Processor time: 6.93 sec. (0.03 init + 6.90 sieve) at 455624 p/sec.
Average processor utilization: 1.54 (init), 0.10 (sieve)

and the 9600M GT takes:

Elapsed time: 41.52 sec. (0.02 init + 41.50 sieve) at 75805 p/sec.
Processor time: 3.90 sec. (0.03 init + 3.87 sieve) at 812352 p/sec.
Average processor utilization: 1.35 (init), 0.09 (sieve)

Thanks

- Iain

Kevin
Volunteer tester

Joined: 4 Aug 09
Posts: 61
ID: 44488
Credit: 5,675,896
RAC: 0

Message 22569 - Posted: 16 Apr 2010 | 14:00:23 UTC

Does anyone have plans for a Proth Prime Sieve ATI GPU app in the near future?
____________
May the Force be with you always.

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 22574 - Posted: 16 Apr 2010 | 16:10:46 UTC - in response to Message 22569.

An ATI app should be possible. This is the kind of highly-parallel, low-memory work that should work very well on ATI.

However, I haven't even been able to get their OpenCL compiler to run. Right now I'm focusing on getting the CPU and CUDA apps into BOINC, so ATI is off my radar for now.
____________

KPX

Joined: 8 Jan 07
Posts: 20
ID: 4756
Credit: 126,838,812
RAC: 0

Message 22864 - Posted: 1 May 2010 | 22:13:06 UTC - in response to Message 22397.

Ok, will try building Win32 version soon. Thanks Ken

Any progress on the Windows version? Please? :-)

Gary Craig
Volunteer tester

Joined: 30 Dec 09
Posts: 3213
ID: 52890
Credit: 1,005,618,748
RAC: 0

Message 22880 - Posted: 2 May 2010 | 20:24:53 UTC

I ran the Mac version. Here's how the BOINC client sees my computer (24" iMac, OS X 10.6.3):

Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU E8335 @ 2.93GHz [x86 Family 6 Model 23 Stepping 10]
Processor features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM SSE3 MON DSCPL VMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1
OS: Darwin: 10.3.0
Memory: 4.00 GB physical, 559.57 GB virtual
Disk: 595.85 GB total, 559.32 GB free
Local time is UTC -7 hours
NVIDIA GPU 0: GeForce GT 120 (driver version unknown, CUDA version 3000, compute capability 1.1, 256MB, 80 GFLOPS peak)

I first suspended all BOINC tasks. Here's the output:

% /usr/bin/time ./ppsieve-cuda-boinc-i686-apple-darwin -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1 (testing)
Compiled Apr 16 2010 with GCC 4.2.1 (Apple Inc. build 5646) (dot 1)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1
Found 12 factors
28.37 real 3.61 user 0.13 sys

Screen repainting was pretty herky-jerky when the test was running, not that that's unexpected... it would have been pretty annoying if I was trying to do anything else.

-- Gary

Iain Bethune
Honorary cruncher

Joined: 28 Jan 09
Posts: 1588
ID: 34775
Credit: 194,056,043
RAC: 0

Message 22902 - Posted: 4 May 2010 | 7:24:19 UTC - in response to Message 22880.

That's great - thanks for testing Gary!

Volunteer developer

Joined: 11 Sep 08
Posts: 600
ID: 28785
Credit: 331,699,243
RAC: 0

Message 23775 - Posted: 14 May 2010 | 19:12:36 UTC

I think we have a problem here...:

Here come my results:

ppsieve-0.3.4 on Intel Core 2 Quad Q9550

ppsieve-cuda 0.1.1-rc1 (testing) on GeForce GTX260

Running ppsieve-cuda-x86_64-linux.
ppsieve version cuda-0.1.1-rc1 (testing)
Compiled Mar 17 2010 with GCC 4.3.3
Scanning ABCD file...
Found K's from 1201 to 9999.
Found N's from 0 to 2000000.
nstart=80, nstep=32, gpu_nstep=35
Reading ABCD file.
Read 324490054 terms from ABCD format input file `ppse_137TE0.txt'
ppsieve initialized: 1201 <= k <= 9999, 80 <= n <= 2000000
Sieve started: 174550000000000 <= p < 174551000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 27 multiprocessors.
p=174550966262785, 895.5K p/sec, 0.07 CPU cores, 96.6% done. ETA 14 May 21:05
Thread 0 completed
Waiting for threads to exit
Sieve complete: 174550000000000 <= p < 174551000000000
count=30492087,sum=0x87435f1f71650555
Elapsed time: 1203.92 sec. (81.91 init + 1122.01 sieve) at 891327 p/sec.
Processor time: 161.52 sec. (80.80 init + 80.73 sieve) at 12388301 p/sec.
Average processor utilization: 0.99 (init), 0.07 (sieve)
Found 43 factors
Run completed successfully!

[roadrunner@rr022 ppsieve-cuda]# diff -u fppse_174550G-174551G.txt ../ppsieve/fppse_174550G-174551G.txt
--- fppse_174550G-174551G.txt 2010-05-14 21:05:53.000000000 +0200
+++ ../ppsieve/fppse_174550G-174551G.txt 2010-05-14 21:11:44.000000000 +0200
@@ -1,43 +1,16 @@
 174550025415817 | 7911*2^73648+1
 174550045592773 | 2793*2^586237+1
-174550069177949 | 8745*2^1984556+1
-174550072026563 | 5457*2^226986+1
-174550072429729 | 9075*2^1747880+1
 174550087108373 | 3009*2^653483+1
-174550160034671 | 1329*2^1681186+1
 174550160534359 | 3255*2^959816+1
 174550164384991 | 8355*2^47924+1
 174550169778407 | 8553*2^689552+1
 174550180112447 | 2569*2^714210+1
-174550180935937 | 1933*2^370384+1
-174550234719989 | 9149*2^1030559+1
-174550274164087 | 6729*2^1373601+1
-174550276818167 | 6207*2^1373038+1
-174550316241167 | 7731*2^1931925+1
-174550374684949 | 8743*2^638110+1
-174550399908163 | 1383*2^1894880+1
 174550460586391 | 6543*2^1032642+1
-174550469318573 | 9217*2^1762344+1
-174550494079007 | 4001*2^157237+1
-174550503180689 | 5391*2^1644311+1
-174550579748341 | 3225*2^291262+1
-174550596690163 | 6799*2^1459850+1
-174550612854811 | 2377*2^1165082+1
 174550639882459 | 5079*2^1786863+1
-174550644475447 | 4901*2^820583+1
 174550668538153 | 7905*2^1360676+1
-174550683576527 | 1715*2^1236227+1
 174550695157613 | 2919*2^951421+1
-174550731814651 | 2127*2^1357850+1
 174550734921757 | 8485*2^1891676+1
-174550743947477 | 4499*2^832027+1
-174550772799551 | 6351*2^874484+1
 174550799235079 | 7803*2^1302508+1
 174550886243347 | 8783*2^1311925+1
-174550889752607 | 3299*2^1161717+1
-174550900074251 | 2993*2^1487561+1
-174550900755097 | 3675*2^1656171+1
-174550902785663 | 6123*2^493737+1
 174550916859199 | 3197*2^1643463+1
-174550954763071 | 4191*2^1993452+1
 174550971619799 | 5577*2^1063059+1

That is not good, I think...

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 23779 - Posted: 14 May 2010 | 21:39:00 UTC - in response to Message 23775.

Hm, not good is right. The CUDA app performed fine - all those factors are valid. But the CPU app didn't! :Q
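
For anyone who wants to check a reported line by hand, here is a minimal C sketch of testing whether p divides k*2^n+1 using modular exponentiation. It is not taken from ppsieve; the function names are hypothetical, and the slow double-and-add multiply is chosen only so the code stays portable for p below 2^63.

```c
#include <stdint.h>

/* (a + b) mod m, assuming a, b < m < 2^63 so the sum cannot overflow. */
static uint64_t addmod(uint64_t a, uint64_t b, uint64_t m)
{
    uint64_t s = a + b;
    return s >= m ? s - m : s;
}

/* (a * b) mod m by double-and-add; slow but portable, m < 2^63. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    uint64_t r = 0;
    a %= m;
    while (b) {
        if (b & 1)
            r = addmod(r, a, m);
        a = addmod(a, a, m);
        b >>= 1;
    }
    return r;
}

/* 2^n mod m by square-and-multiply. */
static uint64_t pow2mod(uint64_t n, uint64_t m)
{
    uint64_t result = 1 % m, base = 2 % m;
    while (n) {
        if (n & 1)
            result = mulmod(result, base, m);
        base = mulmod(base, base, m);
        n >>= 1;
    }
    return result;
}

/* Returns 1 if p divides k*2^n+1, else 0. */
int is_factor(uint64_t p, uint64_t k, uint64_t n)
{
    if (p < 2)
        return 0;
    uint64_t r = mulmod(k % p, pow2mod(n, p), p);   /* k*2^n mod p */
    return addmod(r, 1 % p, p) == 0;                /* is k*2^n+1 == 0 mod p? */
}
```

A call such as is_factor(174550069177949ULL, 8745, 1984556) would check the first line that the CPU output was missing. No result is claimed here beyond the statement above that those factors are valid.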

I'm in another race, but in about 3 hours I'll have enough free memory to look into this.
____________

Volunteer developer

Joined: 11 Sep 08
Posts: 600
ID: 28785
Credit: 331,699,243
RAC: 0

Message 23781 - Posted: 14 May 2010 | 22:34:55 UTC

Okay. Meanwhile I am cross-checking some other ranges I have on file and will keep you posted on the results.

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 23782 - Posted: 15 May 2010 | 0:34:10 UTC - in response to Message 23781.

Well, I can't reproduce your CPU results. I get exactly the same results as you got with CUDA. But there could be several reasons for that.

For one thing, the version I downloaded from the PPSE Sieve Reservations page is 0.3.3. Not that 0.3.4 should produce bad results like that, but I can't compare directly.

PPSieve runs my CPU hot, at least as hot as fast LLR tests. Is it possible your machine isn't entirely stable?

Otherwise, PM me and I'll see about getting your version of files to test with.
____________

Volunteer developer

Joined: 11 Sep 08
Posts: 600
ID: 28785
Credit: 331,699,243
RAC: 0

Message 23783 - Posted: 15 May 2010 | 8:45:32 UTC

Okay, I took 0.3.3 and all is fine.

Intel C2Q Q9550 vs Nvidia GeForce GTX260

diff -u fppse_174550G-174551G.txt ../ppsieve/fppse_174550G-174551G.txt

Doing 198000 to 198001 now on four platforms:

Intel C2Q Q9550
Intel Xeon W3520
Nvidia GeForce GTX260

I think the host can be ruled out, since it has not produced faulty results in a year and its BOINC WUs validate without problems.

Volunteer developer

Joined: 11 Sep 08
Posts: 600
ID: 28785
Credit: 331,699,243
RAC: 0

Message 23784 - Posted: 15 May 2010 | 11:17:38 UTC

All okay, the only thing that was different is:

# head -38 ../ppsieve/fppse_198000G-198500G.txt | diff -u fppse_198000G-198001G.gtx260.txt -
--- fppse_198000G-198001G.gtx260.txt 2010-05-15 08:00:15.000000000 +0200
+++ - 2010-05-15 13:15:06.343799000 +0200
@@ -13,9 +13,9 @@
 198000446092087 | 8441*2^657907+1
 198000480975821 | 5379*2^1828509+1
 198000523921751 | 6541*2^909876+1
-198000544962289 | 8067*2^925640+1
 198000545674577 | 5467*2^1099466+1
 198000546654689 | 5593*2^925632+1
+198000544962289 | 8067*2^925640+1
 198000583273579 | 2783*2^1822821+1
 198000609451933 | 2667*2^1881395+1
 198000664307197 | 1435*2^1989456+1

But this is okay, the numbers are only in another order.
All done with 0.3.3 for cpu and 0.1.1-rc1 for cuda.
My copy of 0.3.4 for cpu must be somewhat defective.
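
Since the two outputs can list the same factors in a different order, a plain diff is noisier than it needs to be. A minimal C sketch of an order-insensitive comparison follows; the function name is hypothetical, and it assumes each factor line is already held as a string in memory.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_lines(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Returns 1 if both arrays hold the same lines regardless of order,
 * 0 if they differ, -1 if memory could not be allocated. */
int same_lines_any_order(const char *const *a, size_t na,
                         const char *const *b, size_t nb)
{
    if (na != nb)
        return 0;
    if (na == 0)
        return 1;

    /* Sort copies of the pointer arrays so the inputs stay untouched. */
    const char **sa = malloc(na * sizeof *sa);
    const char **sb = malloc(nb * sizeof *sb);
    int same = -1;
    if (sa && sb) {
        memcpy(sa, a, na * sizeof *sa);
        memcpy(sb, b, nb * sizeof *sb);
        qsort(sa, na, sizeof *sa, cmp_lines);
        qsort(sb, nb, sizeof *sb, cmp_lines);
        same = 1;
        for (size_t i = 0; i < na; i++) {
            if (strcmp(sa[i], sb[i]) != 0) {
                same = 0;
                break;
            }
        }
    }
    free(sa);
    free(sb);
    return same;
}
```

On the command line, sorting both files before running diff gives the same effect.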

Volunteer developer

Joined: 11 Sep 08
Posts: 600
ID: 28785
Credit: 331,699,243
RAC: 0

Message 23785 - Posted: 15 May 2010 | 13:18:32 UTC

What does this message mean? I got it while computing the 198900G to 199000G range with cuda-0.1.1-rc1:

Computation Error: no candidates found for p=198908075406077

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 23790 - Posted: 15 May 2010 | 17:39:28 UTC - in response to Message 23785.

Two possibilities for computation errors. Most likely this one is because you're using 0.1.1-rc1. I believe I fixed a bug between rc2 and the final release that could rarely cause this error. It could definitely cause factors to be missed near NMax.

A computation error means that the GPU says it found some factor (it doesn't return what factor), but the CPU failed to find a factor in that range. So it could also be caused by an unstable GPU or rarely an unstable CPU.
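
A rough C sketch of that check, for illustration only: the GPU side is represented just by the fact that p was flagged, and the CPU side rescans the n range for that p. The function names and the halving loop are assumptions for this sketch, not the project's code. It relies on the identity that p divides k*2^n+1 exactly when k is congruent to -2^(-n) mod p.

```c
#include <stdint.h>

/* Hypothetical CPU-side recheck of one p that the GPU flagged.
 * Walks x = -2^(-n) mod p for n = 1..nmax by repeated halving mod p.
 * Returns how many (k, n) pairs satisfy kmin <= k <= kmax, k odd,
 * nmin <= n <= nmax. Assumes p is odd, p < 2^63 and kmax < p. */
unsigned recheck_on_cpu(uint64_t p, uint64_t kmin, uint64_t kmax,
                        unsigned nmin, unsigned nmax)
{
    unsigned found = 0;
    uint64_t x = p - 1;                       /* -1 mod p, the n = 0 value */
    for (uint64_t n = 1; n <= nmax; n++) {
        x = (x & 1) ? (x + p) / 2 : x / 2;    /* divide by 2 mod p */
        if (n >= nmin && (x & 1) && x >= kmin && x <= kmax)
            found++;
    }
    return found;
}

/* Returns 0 when the GPU's flag is confirmed, or -1 for the
 * "no candidates found" computation error described above. */
int confirm_gpu_flag(uint64_t p, uint64_t kmin, uint64_t kmax,
                     unsigned nmin, unsigned nmax)
{
    return recheck_on_cpu(p, kmin, kmax, nmin, nmax) > 0 ? 0 : -1;
}
```

A small worked case: for p = 13 the values of x for n = 1..5 are 6, 3, 8, 4, 2, so the only odd k in range is 3 at n = 2, matching 3*2^2+1 = 13.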
____________

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 23936 - Posted: 22 May 2010 | 0:35:28 UTC

FYI, the source for PPSieve CUDA is now on GitHub!

http://github.com/Ken-g6/PSieve-CUDA

So now you can see the various directions I've considered. The redc branch is the current one, but I have an idea for the other branch that might pull it ahead, if I can find a large enough, fast-enough area of memory; maybe texture memory.

But first, since I've heard nothing from mfl0p, I think I'd better try to set up a WinXP VM and build a version for Windows.
____________

Microcruncher*
Volunteer tester

Joined: 28 Jun 09
Posts: 391
ID: 42625
Credit: 45,226,534
RAC: 0

Message 23938 - Posted: 22 May 2010 | 7:55:43 UTC - in response to Message 23936.

Thank you very much for setting up the repository. This makes it easier to follow the developments. I think it is time for me to reinstall the NVIDIA drivers and their CUDA toolkit under Lucid Lynx and get my GTX 260-192 out of hibernation mode again (in the last few weeks I've crunched with a HD 4770 under Windows and Linux).

By the way: The repo contains a file named pps/ppse_37TE1.txt that is a link to a file in a /downloads/... directory that is not in the repo. Is this file too large to include in the repository or are there other reasons not to include the file?
____________

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 23942 - Posted: 22 May 2010 | 18:55:21 UTC

Heh, didn't know that file was in there. It's a 1.2 GB file, so there's no way to include it. Plus it's not going to be used with BOINC, so its only purpose here would be for testing with many_n_test.sh and maybe some of the other testing scripts. It's not useful for the testing we're doing in this thread.

Edit: By the way, the code hasn't changed in about a month. I just made the code and its previous changes easier to access.
____________

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 24060 - Posted: 1 Jun 2010 | 21:40:52 UTC

Alright, people, I need HELP with MSVC++.

I tried compiling the source with VC++ 2008 Express. The files compile fine, but when linking, it's like no file sees any other file's header. I included the header files - even some new ones to replace missing Linux versions, so I'm not sure what's going on.

If you know anything about MSVC++ (since I don't), could you please take a look at my source code?

Thanks!

P.S. What all has to be included in the source code to save the proper build instructions? Does the .sln file need to be there? I really want to avoid including the gigantic .ncb file.
____________

Tanya Puckett

Joined: 11 Aug 09
Posts: 19
ID: 44850
Credit: 5,549,656
RAC: 0

Message 24063 - Posted: 2 Jun 2010 | 15:25:39 UTC

Ken, I took a look at your source code. I am pretty new to C++, but I think I may have spotted your problem.

#include <assert.h>
To include the header, I believe it needs to be enclosed in quotes.
#include "assert.h"

I tried to compile the code after changing it. It compiled further before failing, but I think the reason I couldn't finish compiling is that I am running the 2010 version of Visual C++.

I also noticed that a couple of the headers you are trying to load don't appear to exist, "util.h" and "gfn_app.h" There is a "putil.h" so maybe the name was just mistyped.

Hope this helps,

____________

Jay
Volunteer tester

Joined: 28 Apr 10
Posts: 82
ID: 59636
Credit: 10,419,429
RAC: 0

Message 24065 - Posted: 2 Jun 2010 | 15:47:30 UTC

Tanya, angle brackets (<>) are used for including non-user-written libraries that are (or should be) in your compiler path. You use the quotes ("") when what you're including is in the same directory or in the directory of another file that includes it. If it's still not found, I think it then checks the compiler path.
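
A tiny C illustration of the two forms, as a sketch: on the usual compilers a quoted include falls back to the same search path as the angle-bracket form, which is why a system header still resolves when written with quotes. The function name is made up for the example.

```c
#include <stdint.h>   /* angle brackets: searched on the compiler's include path */
#include "limits.h"   /* quotes: the including file's directory is tried first,
                         then the same include path as the form above */

/* Uses one name from each header to show that both were found. */
int bits_in_uint64(void)
{
    return (int)(sizeof(uint64_t) * CHAR_BIT);
}
```

So switching a system header from angle brackets to quotes normally changes nothing, and unresolved externals at link time point to a missing library or object file, not to the include style.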
____________

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 24066 - Posted: 2 Jun 2010 | 15:58:56 UTC - in response to Message 24063.

I also noticed that a couple of the headers you are trying to load don't appear to exist, "util.h" and "gfn_app.h" There is a "putil.h" so maybe the name was just mistyped.
No, the name was not mistyped. Look at the #ifdef's. If USE_BOINC is #define'd, it will use util.h; but I'm not trying to compile the BOINC version yet.

By the way, I wasn't sure if project-level preprocessor directives got included in the file I zipped up. You should make sure NDEBUG is #defined in the project, or you may get more errors than I did.
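
For readers unfamiliar with project-level defines, a minimal sketch of what the two macros do. USE_BOINC and NDEBUG are the names mentioned above; the helper functions and the string labels are hypothetical. Such macros are normally set in the project settings or on the compiler command line (for example -DNDEBUG with GCC), not in the source file.

```c
#include <assert.h>

/* Pick a code path at compile time, the way the #ifdef around the
 * header includes does. */
#ifdef USE_BOINC
#define BUILD_FLAVOR "boinc"        /* the branch that would pull in util.h */
#else
#define BUILD_FLAVOR "standalone"   /* the non-BOINC branch */
#endif

const char *build_flavor(void)
{
    return BUILD_FLAVOR;
}

/* Returns 1 if assert() is active in this build, 0 if NDEBUG removed it. */
int asserts_enabled(void)
{
    int evaluated = 0;
    assert((evaluated = 1));   /* the expression only runs without NDEBUG */
    (void)evaluated;
    return evaluated;
}
```

With NDEBUG defined, every assert() expands to nothing, so code that only compiles or links in debug mode because of an assert drops out of the build.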

I couldn't find any reference to gfn_app.h or gfn_main.h. Where did you see that?

But I still don't think that will fix the 131 errors with 74 unresolved externals.
____________

Tanya Puckett

Joined: 11 Aug 09
Posts: 19
ID: 44850
Credit: 5,549,656
RAC: 0

Message 24067 - Posted: 2 Jun 2010 | 16:18:44 UTC - in response to Message 24065.

Jay, what you are saying sounds mostly right, so I'm not sure if you're saying I got something wrong in my earlier message. I do know that for including headers, at least with the 2010 version of Visual C++, I need to use quotes or the header won't work, and I have used a non-user-written library before: #include <iostream>. I didn't think that the non-user-written library had to be in the compiler's path, although I don't know where else it would be.

Perhaps I have misunderstood something, as I have done very little with C++.
____________

Tanya Puckett

Joined: 11 Aug 09
Posts: 19
ID: 44850
Credit: 5,549,656
RAC: 0

Message 24068 - Posted: 2 Jun 2010 | 16:36:16 UTC

I couldn't find any reference to gfn_app.h or gfn_main.h. Where did you see that?

Instead of bringing up the full project in Visual C++, I went looking through the folder at the individual files. One file was named gfn_main.c, which I opened to look at the code. That was where I saw the line
#include <gfn_app.h>
That is also where the line "#include <gfn_main.c>" is.

By the way, I wasn't sure if project-level preprocessor directives got included in the file I zipped up. You should make sure NDEBUG is #defined in the project, or you may get more errors than I did.

I'm afraid you've lost me here.
____________

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 24073 - Posted: 2 Jun 2010 | 19:53:47 UTC

Getting better. It seems most of my problem was self-inflicted. I read something about how to get rid of

LINK : warning LNK4098: defaultlib "LIBCMT" conflicts with use of other libs; use /NODEFAULTLIB:library

Another large part was solved by linking the CUDA libraries. I'm down to two unresolved references, which are probably just because those functions aren't in MSVC.

Thanks! I'll let you know if I hit any more roadblocks.
____________

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 24095 - Posted: 3 Jun 2010 | 22:15:40 UTC

Alright, I think I have a working binary! So if you have Win32 or Win64 and want to test it, please download my source and binary zipfile, and run the usual test on the binary in the Release folder.

Next steps include making it work for BOINC and fixing a checkpointing bug in *all other versions*. Don't let me forget to do that!
____________

Tanya Puckett

Joined: 11 Aug 09
Posts: 19
ID: 44850
Credit: 5,549,656
RAC: 0

Message 24100 - Posted: 3 Jun 2010 | 23:55:39 UTC

I downloaded the zipfile and tried to run the exe in the Release folder. I got a message that the program can't start because cudart.dll is missing from my computer. I think I may have found a place to get the cudart.dll. Do I need to get it and put it in the directory with the exe, or is there something else I need to do?

____________

shoelace

Joined: 29 Oct 07
Posts: 40
ID: 14166
Credit: 2,324,276
RAC: 0

Message 24102 - Posted: 4 Jun 2010 | 1:02:28 UTC

Just for fun I ran your Windows CUDA ppsieve on my Win32 XP machine... with NO CUDA card.
(Yes, I know.)

After grabbing a cudart.dll (from the distributed.net beta client),
it ran like this:

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Thread 0 starting
Detected GPU 0: Device Emulation (CPU)
Detected compute capability: 9999.9999
Detected 16 multiprocessors.
Insufficient available memory on GPU 0.
Waiting for threads to exit
Sieve incomplete: 42070000000000 <= p < 42070000000001
Found 0 factors
count=0,sum=0x0000000000000000
Elapsed time: 0.03 sec. (0.03 init + 0.00 sieve) at -1 p/sec.
Processor time: 0.05 sec. (0.05 init + 0.00 sieve) at -1 p/sec.
Average processor utilization: 1.50 (init), -1.#J (sieve)

So... it didn't fail, which is good. :)

As a comparison, the dnetc client goes like this:

distributed.net client for CUDA 2.2 on Win32
Copyright 1997-2009, distributed.net
Please visit http://www.distributed.net/ for up-to-date contest information.
Start the client with '-help' for a list of valid command line options.

dnetc v2.9107-516-CTR-09122712 for CUDA 2.2 on Win32 (WindowsNT 5.1).
Please provide the *entire* version descriptor when submitting bug reports.
The distributed.net bug report pages are at http://bugs.distributed.net/

[Jun 04 01:04:36 UTC] Unable to locate CUDA module handle
[Jun 04 01:04:36 UTC] No CUDA-supported GPU found.

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 24106 - Posted: 4 Jun 2010 | 1:42:11 UTC - in response to Message 24100.

I downloaded the zipfile and tried to run the exe in the Release folder. I got a message that the program can't start because cudart.dll is missing from my computer. I think I may have found a place to get the cudart.dll. Do I need to get it and put it in the directory with the exe, or is there something else I need to do?

You should be able to copy it from the BOINC directory.

____________
141941*2^4299438-1 is prime!

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 24107 - Posted: 4 Jun 2010 | 1:50:30 UTC

i7-920 (stock clocks)
6GB RAM
Vista Home Premium 64-bit [Version 6.0.6002]
BOINC suspended for tests
9500GT (512mb, factory OC card, 191.07 driver)

C:\Users\Scott\Downloads\ppsieve-cuda-vc\Release>ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Detected GPU 0: GeForce 9500 GT
Detected compute capability: 1.1
Detected 4 multiprocessors.
p=42070007340033, 1.490K p/sec, 0.01 CPU cores, 24.5% done. ETA 04 Jun 00:09

all factors match.

____________
141941*2^4299438-1 is prime!

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 24109 - Posted: 4 Jun 2010 | 2:01:30 UTC - in response to Message 24107.

p=42070007340033, 1.490K p/sec, 0.01 CPU cores, 24.5% done. ETA 04 Jun 00:09

all factors match.

I did a spit take! But then I realized that's the wrong line. What did the line that starts with "Elapsed time" say?
____________

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 24110 - Posted: 4 Jun 2010 | 2:24:22 UTC - in response to Message 24109.

p=42070007340033, 1.490K p/sec, 0.01 CPU cores, 24.5% done. ETA 04 Jun 00:09

all factors match.

I did a spit take! But then I realized that's the wrong line. What did the line that starts with "Elapsed time" say?

Sorry, I stopped it running at about 25% (I am switching the card out this evening for an ATI 4670 that I just picked up). A wall clock estimate for the total run time based on the 25% complete would be in the neighborhood of about 3-3.5 hours. Also, and interestingly, I had very little delayed screen response.

I am at home, but tomorrow I can test it on 32-bit systems with various CUDA cards (9600 GSO, 9600GS, 8600 GT, 8400 GS, 8300 GS). Might try my laptop's 8400M GS tonight...is there a memory minimum limit?

EDIT:
Okay, before pulling the 9500GT, I have gone back and run the shorter 3M test with the following output/results:

ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Ignoring invalid checkpoint in ppcheck42070e9.txt
Detected GPU 0: GeForce 9500 GT
Detected compute capability: 1.1
Detected 4 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
p=42070001310721, 21.85K p/sec, 0.11 CPU cores, 43.7% done. ETA 03 Jun 22:38
42070001040127 | 6471*2^37907+1
p=42070001572865, 4.369K p/sec, 0.04 CPU cores, 52.4% done. ETA 03 Jun 22:40
p=42070002097153, 8.738K p/sec, 0.03 CPU cores, 69.9% done. ETA 03 Jun 22:40
p=42070002359297, 4.369K p/sec, 0.03 CPU cores, 78.6% done. ETA 03 Jun 22:41
p=42070002621441, 4.369K p/sec, 0.02 CPU cores, 87.4% done. ETA 03 Jun 22:42
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
p=42070003145729, 3.757K p/sec, 0.02 CPU cores, 104.9% done. ETA 03 Jun 22:43
42070002875941 | 4081*2^1494668+1

Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 496.42 sec. (0.03 init + 496.38 sieve) at 6337 p/sec.
Processor time: 18.74 sec. (0.05 init + 18.69 sieve) at 168320 p/sec.
Average processor utilization: 1.38 (init), 0.04 (sieve)
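
For readers checking the summary lines by hand, the arithmetic can be sketched as below. This is not ppsieve's code. It assumes the rate is the number of candidates actually processed divided by the sieve time, and that about 3,145,728 candidates were processed (read off the last progress line, p=42070003145729) rather than the nominal 3,000,000. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Candidates processed per second. Returns 0 for a zero-length interval
// instead of dividing by zero.
double rate_per_sec(double processed, double seconds) {
    assert(seconds >= 0.0);
    return seconds > 0.0 ? processed / seconds : 0.0;
}

// Fraction of one CPU core used: CPU seconds divided by wall-clock seconds.
double cpu_utilization(double cpu_seconds, double wall_seconds) {
    assert(wall_seconds >= 0.0);
    return wall_seconds > 0.0 ? cpu_seconds / wall_seconds : 0.0;
}

// Rounds to two decimal places, the precision used in the summary lines.
double round2(double x) {
    return std::round(x * 100.0) / 100.0;
}
```

With the numbers from this run, rate_per_sec(3145728, 496.38) comes to about 6337 and round2(cpu_utilization(18.69, 496.38)) to 0.04, which agrees with the printed values under the assumptions above.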
____________
141941*2^4299438-1 is prime!

Redstar3894

Joined: 23 Mar 07
Posts: 32
ID: 6678
Credit: 154,883,580
RAC: 17,493

Message 24111 - Posted: 4 Jun 2010 | 2:48:08 UTC - in response to Message 24095.

i7-920 @ 2.8 GHz
6GB RAM
Win7-64
GTX 260 Core 216 (Factory OC)
BOINC suspended for all tests

D:\Patrick\ppsieve-cuda-vc\Release>ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 27 multiprocessors.
p=42070030146561, 17.48K p/sec, 0.03 CPU cores, 100.5% done. ETA 03 Jun 21:40
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 842.70 sec. (0.03 init + 842.66 sieve) at 35775 p/sec.
Processor time: 37.74 sec. (0.03 init + 37.71 sieve) at 799528 p/sec.
Average processor utilization: 0.97 (init), 0.04 (sieve)

I swiped the cudart.dll from Collatz...

I am at home, but tomorrow I can test it on 32-bit systems with various CUDA cards (9600 GSO, 9600GS, 8600 GT, 8400 GS, 8300 GS). Might try my laptop's 8400M GS tonight...is there a memory minimum limit?

I am also curious as to what this may be...it depends on what version of the CUDA SDK this was compiled with...newer versions will run considerably faster on newer cards as well as include increased capabilities (double precision, anyone?)

I'll try getting the cudart.dll from a project like GPUGrid or Milkyway, which both use at least CUDA 2.2 (due to double precision support) and see what, if any, difference that makes...

EDIT: Whoa, put my foot in my mouth a bit there...Collatz would use 2.2...my bad... :p
And also, when I switched the cudart.dll with the one from GPUGrid, it made NO difference whatsoever...
____________

Redstar3894

Joined: 23 Mar 07
Posts: 32
ID: 6678
Credit: 154,883,580
RAC: 17,493

Message 24112 - Posted: 4 Jun 2010 | 2:51:52 UTC - in response to Message 24110.

Here's my result from the shorter test Scott posted above:

D:\Patrick\ppsieve-cuda-vc\Release>ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Thread 0 starting
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 27 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit

Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 11.19 sec. (0.05 init + 11.14 sieve) at 282411 p/sec.
Processor time: 4.18 sec. (0.06 init + 4.12 sieve) at 763818 p/sec.
Average processor utilization: 1.30 (init), 0.37 (sieve)

____________

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 24113 - Posted: 4 Jun 2010 | 3:05:49 UTC - in response to Message 24112.

OK, I had thought it was running over 1MP/s; it was just 1KP/s. I think something may be wrong with my sleep timing. I'll look into it and get back to you.
____________

Redstar3894

Joined: 23 Mar 07
Posts: 32
ID: 6678
Credit: 154,883,580
RAC: 17,493

Message 24114 - Posted: 4 Jun 2010 | 3:11:08 UTC - in response to Message 24113.

OK, I had thought it was running over 1MP/s; it was just 1KP/s. I think something may be wrong with my sleep timing. I'll look into it and get back to you.

I did notice (through watching GPU usage on EVGA Precision) that the GPU usage never stayed constant...it would spike for a second or two to around 75% and then fall to zero for about 10-20 seconds....

Hope that helps!

And BTW, thanks Ken for building a Windows version! It seems like it has some more ground to cover to catch up with the Linux builds, but great job nonetheless!
____________

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 24115 - Posted: 4 Jun 2010 | 4:52:38 UTC

Well whaddaya know - I found a second bug, also in code I had thought was stable. That makes two bugs that - while they didn't affect results - could severely impact usability.

Alright, give the newly-updated zipfile a try and I'll see what's developed tomorrow. Thanks for testing!
____________

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 24116 - Posted: 4 Jun 2010 | 5:23:31 UTC

Updated code on 9500GT (short test - 3M):

ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Detected GPU 0: GeForce 9500 GT
Detected compute capability: 1.1
Detected 4 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
p=42070001310721, 21.85K p/sec, 0.11 CPU cores, 43.7% done. ETA 04 Jun 01:14
42070001040127 | 6471*2^37907+1
p=42070001572865, 4.369K p/sec, 0.04 CPU cores, 52.4% done. ETA 04 Jun 01:16
p=42070002097153, 8.738K p/sec, 0.03 CPU cores, 69.9% done. ETA 04 Jun 01:16
p=42070002359297, 4.369K p/sec, 0.03 CPU cores, 78.6% done. ETA 04 Jun 01:17
p=42070002621441, 4.369K p/sec, 0.02 CPU cores, 87.4% done. ETA 04 Jun 01:17
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
p=42070003145729, 3.763K p/sec, 0.02 CPU cores, 104.9% done. ETA 04 Jun 01:19
42070002875941 | 4081*2^1494668+1

Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 496.11 sec. (0.04 init + 496.07 sieve) at 6341 p/sec.
Processor time: 18.77 sec. (0.05 init + 18.72 sieve) at 168040 p/sec.
Average processor utilization: 1.06 (init), 0.04 (sieve)

____________
141941*2^4299438-1 is prime!

valterc
Volunteer tester

Joined: 30 May 07
Posts: 121
ID: 8810
Credit: 20,947,020,974
RAC: 5,581,214

Message 24118 - Posted: 4 Jun 2010 | 11:03:00 UTC - in response to Message 24116.

My own test follows: Q9450 @ 3400, W7U (cudart v2.3)

ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Detected GPU 0: GeForce GTX 275
Detected compute capability: 1.3
Detected 30 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1

Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 9.75 sec. (0.02 init + 9.73 sieve) at 323155 p/sec.
Processor time: 3.56 sec. (0.03 init + 3.53 sieve) at 892248 p/sec.
Average processor utilization: 2.00 (init), 0.36 (sieve)

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 24120 - Posted: 4 Jun 2010 | 12:14:22 UTC

Pentium D 965 Extreme Edition (HT turned on)
ASUS 9600 GSO (factory OC "TOP" version, 384mb)
Microsoft Windows XP Pro (32-bit) [Version 5.1.2600]

ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Detected GPU 0: GeForce 9600 GSO
Detected compute capability: 1.1
Detected 12 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1

Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 6.56 sec. (0.05 init + 6.52 sieve) at 482798 p/sec.
Processor time: 1.39 sec. (0.06 init + 1.33 sieve) at 2368548 p/sec.
Average processor utilization: 1.33 (init), 0.20 (sieve)

____________
141941*2^4299438-1 is prime!

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 24121 - Posted: 4 Jun 2010 | 12:18:51 UTC

Same 9600 GSO on 30M test:

ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Detected GPU 0: GeForce 9600 GSO
Detected compute capability: 1.1
Detected 12 multiprocessors.
p=42070029360129, 489.3K p/sec, 0.19 CPU cores, 97.9% done. ETA 04 Jun 08:16
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 62.28 sec. (0.05 init + 62.23 sieve) at 484404 p/sec.
Processor time: 12.30 sec. (0.25 init + 12.05 sieve) at 2502438 p/sec.
Average processor utilization: 5.33 (init), 0.19 (sieve)

____________
141941*2^4299438-1 is prime!

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 14045
ID: 53948
Credit: 483,803,363
RAC: 628,990

Message 24122 - Posted: 4 Jun 2010 | 12:22:18 UTC - in response to Message 24118.

I ran the test on my Vista-32/Q6600/GTX280.

I won't bother posting the output from the program, because that's not the interesting part.

My GPU temperature *barely* nudged from its idle temperature. That's a really, really, bad sign and indicates that the GPU isn't being utilized efficiently.

Taking a look at the GPU utilization graph on GPU-Z, it showed that the vast majority of the time the utilization was 0%. About every 15 seconds or so, the utilization briefly spiked way up, then returned back to 0. Even stranger was that it wasn't using the CPU during the time the GPU was idle. CPU utilization was at about 10% to 20% of a single core according to task manager. (The output from the program said it was using 0.03 CPU cores, which was significantly lower than what task manager was showing.)

So, for most of the run time, it's not using the GPU or the CPU. I would guess that it's either waiting on a resource or sleeping.
____________
My lucky number is 75898524288+1

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 24123 - Posted: 4 Jun 2010 | 12:29:05 UTC

Pentium D 830 (using --device # option to test on both GPUs)
9600 GSO (ASUS factory OC "TOP" version, 384mb)
9600 GS (768 mb)
Microsoft Windows XP Home (32-bit) [Version 5.1.2600]

DEVICE 0:

ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Detected GPU 0: GeForce 9600 GSO
Detected compute capability: 1.1
Detected 12 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1

Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 6.72 sec. (0.08 init + 6.64 sieve) at 473710 p/sec.
Processor time: 1.66 sec. (0.11 init + 1.55 sieve) at 2033602 p/sec.
Average processor utilization: 1.40 (init), 0.23 (sieve)

DEVICE 1:

ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal --device 1

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Detected GPU 1: GeForce 9600 GS
Detected compute capability: 1.1
Detected 6 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1

Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 15.97 sec. (0.05 init + 15.92 sieve) at 197573 p/sec.
Processor time: 2.78 sec. (0.09 init + 2.69 sieve) at 1170503 p/sec.
Average processor utilization: 2.00 (init), 0.17 (sieve)

____________
141941*2^4299438-1 is prime!

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 24124 - Posted: 4 Jun 2010 | 12:38:31 UTC - in response to Message 24122.

I ran the test on my Vista-32/Q6600/GTX280.

I won't bother posting the output from the program, because that's not the interesting part.

My GPU temperature *barely* nudged from its idle temperature. That's a really, really, bad sign and indicates that the GPU isn't being utilized efficiently.

Taking a look at the GPU utilization graph on GPU-Z, it showed that the vast majority of the time the utilization was 0%. About every 15 seconds or so, the utilization briefly spiked way up, then returned back to 0. Even stranger was that it wasn't using the CPU during the time the GPU was idle. CPU utilization was at about 10% to 20% of a single core according to task manager. (The output from the program said it was using 0.03 CPU cores, which was significantly lower than what task manager was showing.)

So, for most of the run time, it's not using the GPU or the CPU. I would guess that it's either waiting on a resource or sleeping.

Hmmm...this might show something about Vista specifically. On my 9600GSO under 32-bit XP Pro, GPU-Z shows the GPU utilization at 99% for the whole test. My unusually long 9500GT results (which on the OC'ed 32-shader card should be similar to the stock clocked 9600 GS 48-shader card) are also obtained on Vista (albeit 64-bit). Looks like something in the code is not activating the GPU properly under Vista (and I'd suspect under Win 7 also).

____________
141941*2^4299438-1 is prime!

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 24125 - Posted: 4 Jun 2010 | 12:44:17 UTC

9600 GS on 30M test:

ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q --device 1

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Detected GPU 1: GeForce 9600 GS
Detected compute capability: 1.1
Detected 6 multiprocessors.
p=42070024641537, 204.4K p/sec, 0.15 CPU cores, 82.1% done. ETA 04 Jun 08:42
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 147.89 sec. (0.06 init + 147.83 sieve) at 203930 p/sec.
Processor time: 22.25 sec. (0.09 init + 22.16 sieve) at 1360635 p/sec.
Average processor utilization: 1.50 (init), 0.15 (sieve)

____________
141941*2^4299438-1 is prime!

Tanya Puckett

Joined: 11 Aug 09
Posts: 19
ID: 44850
Credit: 5,549,656
RAC: 0

Message 24128 - Posted: 4 Jun 2010 | 14:04:11 UTC

The application was unable to start correctly (0x000007b).

Any idea what's causing this?
____________

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 24129 - Posted: 4 Jun 2010 | 14:20:40 UTC - in response to Message 24128.

The application was unable to start correctly (0x000007b).

Any idea what's causing this?

A stop error with that code is usually associated with a problematic boot device (usually a hard drive)...kinda weird to see it with this CUDA application. You aren't by chance trying to run it off of a USB stick?

____________
141941*2^4299438-1 is prime!

Tanya Puckett

Joined: 11 Aug 09
Posts: 19
ID: 44850
Credit: 5,549,656
RAC: 0

Message 24130 - Posted: 4 Jun 2010 | 14:31:13 UTC

No USB stick. I've actually tried running it on several different computers, just in case it was a problem with the NVidia card. All three computers were running Windows 7.

____________

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 24133 - Posted: 4 Jun 2010 | 15:34:44 UTC

Pentium 4 (HT) 3.6Ghz
8600 GT (256mb)
Microsoft Windows XP Pro (32-bit) [Version 5.1.2600]

3M Test:

ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Detected GPU 0: GeForce 8600 GT
Detected compute capability: 1.1
Detected 4 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1

Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 18.92 sec. (0.05 init + 18.88 sieve) at 166660 p/sec.
Processor time: 3.31 sec. (0.08 init + 3.23 sieve) at 972592 p/sec.
Average processor utilization: 1.67 (init), 0.17 (sieve)

30M Test:

ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Detected GPU 0: GeForce 8600 GT
Detected compute capability: 1.1
Detected 4 multiprocessors.
p=42070020185089, 165.3K p/sec, 0.17 CPU cores, 67.3% done. ETA 04 Jun 11:30
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 180.02 sec. (0.05 init + 179.97 sieve) at 167509 p/sec.
Processor time: 30.97 sec. (0.28 init + 30.69 sieve) at 982373 p/sec.
Average processor utilization: 6.00 (init), 0.17 (sieve)

____________
141941*2^4299438-1 is prime!

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 24135 - Posted: 4 Jun 2010 | 16:51:44 UTC

Pentium 4 (HT) 3.8 Ghz
8400 GS (256mb)
Microsoft Windows XP Pro (32-bit) [Version 5.1.2600]

3M Test:

ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Detected GPU 0: GeForce 8400 GS
Detected compute capability: 1.1
Detected 2 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
p=42070002883585, 48.06K p/sec, 0.10 CPU cores, 96.1% done. ETA 04 Jun 11:43
42070002875941 | 4081*2^1494668+1

Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 63.83 sec. (0.05 init + 63.78 sieve) at 49319 p/sec.
Processor time: 6.34 sec. (0.20 init + 6.14 sieve) at 512281 p/sec.
Average processor utilization: 4.33 (init), 0.10 (sieve)

30M Test:

ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Detected GPU 0: GeForce 8400 GS
Detected compute capability: 1.1
Detected 2 multiprocessors.
p=42070028835841, 47.17K p/sec, 0.09 CPU cores, 96.1% done. ETA 04 Jun 11:54
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 632.41 sec. (0.05 init + 632.36 sieve) at 47673 p/sec.
Processor time: 59.66 sec. (0.23 init + 59.42 sieve) at 507331 p/sec.
Average processor utilization: 5.00 (init), 0.09 (sieve)

____________
141941*2^4299438-1 is prime!

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 24136 - Posted: 4 Jun 2010 | 17:01:15 UTC

Pentium 4 (HT) 3.6Ghz
8300 GS (128mb) ...This is about as slow as CUDA devices get!
Microsoft Windows XP Pro (32-bit) [Version 5.1.2600]

3M Test:

ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Detected GPU 0: GeForce 8300 GS
Detected compute capability: 1.1
Detected 1 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
p=42070001572865, 26.21K p/sec, 0.10 CPU cores, 52.4% done. ETA 04 Jun 11:33
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1

Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 119.53 sec. (0.06 init + 119.47 sieve) at 26331 p/sec.
Processor time: 11.86 sec. (0.08 init + 11.78 sieve) at 267011 p/sec.
Average processor utilization: 1.25 (init), 0.10 (sieve)

30M Test:

ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Detected GPU 0: GeForce 8300 GS
Detected compute capability: 1.1
Detected 1 multiprocessors.
p=42070029622273, 26.21K p/sec, 0.10 CPU cores, 98.7% done. ETA 04 Jun 11:56
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 1186.17 sec. (0.06 init + 1186.11 sieve) at 25416 p/sec.
Processor time: 119.03 sec. (0.30 init + 118.73 sieve) at 253899 p/sec.
Average processor utilization: 4.75 (init), 0.10 (sieve)

____________
141941*2^4299438-1 is prime!

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 24137 - Posted: 4 Jun 2010 | 17:08:38 UTC

Alright, I'm getting the impression that sleep isn't fixed - at least not in all cases.

So, I'd like those of you who had problems with it in particular, and anyone else, to test the version I just uploaded, and please report the sleep diagnostics it outputs. By the way, "OVERslept by WAY TOO LONG!" is the line that indicates trouble; but I can only learn the magnitude of the trouble from the other lines.

The other option is to use one CPU core 100%, but I'd like to avoid that if I can.
____________

HAmsty
Volunteer tester

Joined: 26 Dec 08
Posts: 132
ID: 33421
Credit: 12,510,712
RAC: 0

Message 24138 - Posted: 4 Jun 2010 | 17:32:31 UTC

I have this oversleeping, too.

Just some lines; there are really a lot more of these:

Will sleep 590625 usec next time.
Sleeping 590625 usec.
Actually sleeping 590625 usec.
OVERslept by 46875 usec.
Will sleep 543750 usec next time.
Sleeping 543750 usec.
Actually sleeping 543750 usec.
OVERslept by 46875 usec.
Will sleep 496875 usec next time.
Sleeping 496875 usec.
Actually sleeping 496875 usec.
OVERslept by 46875 usec.
Will sleep 450000 usec next time.
Sleeping 450000 usec.
Actually sleeping 450000 usec.
OVERslept by 46875 usec.
Will sleep 403125 usec next time.
Sleeping 403125 usec.
Actually sleeping 403125 usec.
OVERslept by 15625 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping 371875 usec.
Underslept by 0 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 403125 usec next time.
Sleeping 403125 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping 387500 usec.
Underslept by 0 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 403125 usec next time.
Sleeping 403125 usec.
Actually sleeping 403125 usec.
Underslept by 390625 usec.
Will sleep 793750 usec next time.
Sleeping 793750 usec.
Actually sleeping 793750 usec.
OVERslept by 46875 usec.
Will sleep 746875 usec next time.
Sleeping 746875 usec.
Actually sleeping 746875 usec.
OVERslept by 46875 usec.
Will sleep 700000 usec next time.
Sleeping 700000 usec.
Actually sleeping 700000 usec.
OVERslept by 46875 usec.
Will sleep 653125 usec next time.
Sleeping 653125 usec.
Actually sleeping 653125 usec.
OVERslept by 46875 usec.
Will sleep 606250 usec next time.
Sleeping 606250 usec.
Actually sleeping 590625 usec.
OVERslept by WAY TOO LONG!
Will sleep 543750 usec next time.
Sleeping 543750 usec.
Actually sleeping 528125 usec.
OVERslept by 46875 usec.
Will sleep 496875 usec next time.
Sleeping 496875 usec.
Actually sleeping 496875 usec.
OVERslept by WAY TOO LONG!
Will sleep 434375 usec next time.
Sleeping 434375 usec.
Actually sleeping 418750 usec.
OVERslept by 31250 usec.
Will sleep 403125 usec next time.
Sleeping 403125 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping 371875 usec.
Underslept by 0 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 403125 usec next time.
Sleeping 403125 usec.
Actually sleeping 403125 usec.
OVERslept by 15625 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping 387500 usec.
Underslept by 0 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 403125 usec next time.

Nvidia 8800 GTS 320MB G80

Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 104.80 sec. (0.03 init + 104.77 sieve) at 287752 p/sec.
Processor time: 4.61 sec. (0.05 init + 4.56 sieve) at 6607465 p/sec.
Average processor utilization: 1.50 (init), 0.04 (sieve)

____________

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 24139 - Posted: 4 Jun 2010 | 18:08:28 UTC - in response to Message 24138.

Sleeping 403125 usec.
Actually sleeping 403125 usec.
Underslept by 390625 usec.
Will sleep 793750 usec next time.

There's the "money shot". Combined with other parts, this tells me that the timing is far too random for the current method to work.

I'll look at other options. I didn't see any timing less than 300,000 usec, so I might try finding the minimum.
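
For anyone following along, the diagnostics quoted above look like a simple additive correction: each "Will sleep ... next time" value is the previous sleep plus the reported undersleep, or minus the reported oversleep (403125 + 390625 = 793750, for instance). Here is a minimal Python sketch of that rule next to the "find the minimum" idea. It is inferred from the log lines only, not taken from the ppsieve source, and both function names are hypothetical.

```python
def next_sleep_additive(current_usec, error_usec):
    """Additive rule inferred from the log lines (an assumption).

    error_usec > 0 means "Underslept by" (still had to wait after waking),
    error_usec < 0 means "OVERslept by" (woke up later than needed).
    """
    return max(0, current_usec + error_usec)


def next_sleep_minimum(observed_usec):
    """Hypothetical alternative: always sleep for the shortest duration
    seen so far, so one noisy long sample cannot cause an oversleep."""
    return min(observed_usec)


# Values copied from the quoted diagnostics.
print(next_sleep_additive(403125, 390625))   # underslept by 390625
print(next_sleep_additive(590625, -46875))   # overslept by 46875

# Illustrative samples only; the list is not a measurement from the thread.
print(next_sleep_minimum([403125, 387500, 793750]))
```

With the additive rule a single outlier (the 390625 usec undersleep) nearly doubles the next sleep, which then takes several passes to decay; the minimum-based variant ignores such outliers at the cost of some extra waiting after each sleep.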
____________

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 14045
ID: 53948
Credit: 483,803,363
RAC: 628,990

Message 24140 - Posted: 4 Jun 2010 | 18:19:54 UTC - in response to Message 24138.

OK, here are the results:

Vista-32/Q6600/GTX 280
BOINC shut down

C:\Temp\Release>ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Detected GPU 0: GeForce GTX 280
Detected compute capability: 1.3
Detected 30 multiprocessors.
Sleeping 0 usec.
Underslept by 826000 usec.
Will sleep 826000 usec next time.
Sleeping 826000 usec.
Actually sleeping 785000 usec.
Underslept by 825000 usec.
Will sleep 1651000 usec next time.
Sleeping 1651000 usec.
Actually sleeping 1628000 usec.
Underslept by 825000 usec.
Will sleep 2476000 usec next time.
Sleeping 2476000 usec.
Actually sleeping 2464000 usec.
Underslept by 825000 usec.
Will sleep 3301000 usec next time.
Sleeping 3301000 usec.
Actually sleeping 3261000 usec.
Underslept by 825000 usec.
Will sleep 4126000 usec next time.
Sleeping 4126000 usec.
Actually sleeping 4085000 usec.
Underslept by 825000 usec.
Will sleep 4951000 usec next time.
Sleeping 4951000 usec.
Actually sleeping 4933000 usec.
Underslept by 825000 usec.
Will sleep 5776000 usec next time.
Sleeping 5776000 usec.
Actually sleeping 5753000 usec.
Underslept by 825000 usec.
Will sleep 6601000 usec next time.
Sleeping 6601000 usec.
Actually sleeping 6578000 usec.
Underslept by 825000 usec.
Will sleep 7426000 usec next time.
Sleeping 7426000 usec.
Actually sleeping 7408000 usec.
Underslept by 825000 usec.
Will sleep 8251000 usec next time.
Sleeping 8251000 usec.
Actually sleeping 8208000 usec.
Underslept by 825000 usec.
Will sleep 9076000 usec next time.
Sleeping 9076000 usec.
Actually sleeping 9047000 usec.
Underslept by 825000 usec./sec, 0.17 CPU cores, 31.5% done. ETA 04 Jun 13:44
Will sleep 9901000 usec next time.
Sleeping 9901000 usec.
Actually sleeping 9872000 usec.
Underslept by 825000 usec.
Will sleep 10726000 usec next time.
Sleeping 10726000 usec.
Actually sleeping 10685000 usec.
Underslept by 826000 usec.
Will sleep 11552000 usec next time.
Sleeping 11552000 usec.
Actually sleeping 11544000 usec.
Underslept by 825000 usec.
Will sleep 12377000 usec next time.
Sleeping 12377000 usec.
Actually sleeping 12339000 usec.
Underslept by 828000 usec.
Will sleep 13205000 usec next time.
Sleeping 13205000 usec.
Actually sleeping 13146000 usec.
Underslept by 826000 usec./sec, 0.08 CPU cores, 43.7% done. ETA 04 Jun 13:45
Will sleep 14031000 usec next time.
Sleeping 14031000 usec.
Actually sleeping 14001000 usec.
Underslept by 825000 usec.
Will sleep 14856000 usec next time.
Sleeping 14856000 usec.
Actually sleeping 14831000 usec.
Underslept by 827000 usec.
Will sleep 15683000 usec next time.
Sleeping 15683000 usec.
Actually sleeping 15670000 usec.
Underslept by 825000 usec.
Will sleep 16508000 usec next time.
Sleeping 16508000 usec.
Actually sleeping 16472000 usec.
Underslept by 827000 usec./sec, 0.06 CPU cores, 53.3% done. ETA 04 Jun 13:46
Will sleep 17335000 usec next time.
Sleeping 17335000 usec.
Actually sleeping 17310000 usec.
Underslept by 826000 usec.
Will sleep 18161000 usec next time.
Sleeping 18161000 usec.
Actually sleeping 18128000 usec.
Underslept by 825000 usec.
Will sleep 18986000 usec next time.
Sleeping 18986000 usec.
Actually sleeping 18950000 usec.
Underslept by 825000 usec./sec, 0.04 CPU cores, 61.2% done. ETA 04 Jun 13:47
Will sleep 19811000 usec next time.
Sleeping 19811000 usec.
Actually sleeping 19758000 usec.
Underslept by 825000 usec.
Will sleep 20636000 usec next time.
Sleeping 20636000 usec.
Actually sleeping 20606000 usec.
Underslept by 827000 usec.
Will sleep 21463000 usec next time.
Sleeping 21463000 usec.
Actually sleeping 21432000 usec.
Underslept by 825000 usec./sec, 0.05 CPU cores, 68.2% done. ETA 04 Jun 13:48
Will sleep 22288000 usec next time.
Sleeping 22288000 usec.
Actually sleeping 22276000 usec.
Underslept by 826000 usec.
Will sleep 23114000 usec next time.
Sleeping 23114000 usec.
Actually sleeping 23096000 usec.
Underslept by 826000 usec.
Will sleep 23940000 usec next time.
Sleeping 23940000 usec.
Actually sleeping 23938000 usec.
Underslept by 825000 usec.
Will sleep 24765000 usec next time.
Sleeping 24765000 usec.
Actually sleeping 24746000 usec.
Underslept by 825000 usec.
Will sleep 25590000 usec next time.
Sleeping 25590000 usec.
Actually sleeping 25577000 usec.
Underslept by 826000 usec./sec, 0.04 CPU cores, 78.6% done. ETA 04 Jun 13:50
Will sleep 26416000 usec next time.
Sleeping 26416000 usec.
Actually sleeping 26391000 usec.
Underslept by 826000 usec.
Will sleep 27242000 usec next time.
Sleeping 27242000 usec.
Actually sleeping 27217000 usec.
Underslept by 825000 usec./sec, 0.03 CPU cores, 83.0% done. ETA 04 Jun 13:50
Will sleep 28067000 usec next time.
Sleeping 28067000 usec.
Actually sleeping 28041000 usec.
Underslept by 825000 usec.
Will sleep 28892000 usec next time.
Sleeping 28892000 usec.
Actually sleeping 28856000 usec.
Underslept by 825000 usec./sec, 0.03 CPU cores, 88.3% done. ETA 04 Jun 13:51
Will sleep 29717000 usec next time.
Sleeping 29717000 usec.
Actually sleeping 29693000 usec.
Underslept by 826000 usec.
Will sleep 30543000 usec next time.
Sleeping 30543000 usec.
Actually sleeping 30518000 usec.
Underslept by 825000 usec.
Will sleep 31368000 usec next time.
Sleeping 31368000 usec.
Actually sleeping 31345000 usec.
Underslept by 825000 usec.
Will sleep 32193000 usec next time.
Sleeping 32193000 usec.
Actually sleeping 32158000 usec.
Underslept by 824000 usec.
Will sleep 33017000 usec next time.
Sleeping 33017000 usec.
Actually sleeping 33015000 usec.
Underslept by 824000 usec./sec, 0.03 CPU cores, 97.9% done. ETA 04 Jun 13:52
Will sleep 33841000 usec next time.
Sleeping 33841000 usec.
Actually sleeping 33817000 usec.
Underslept by 826000 usec./sec, 0.02 CPU cores, 100.5% done. ETA 04 Jun 13:53
Will sleep 34667000 usec next time.
Sleeping 34667000 usec.
Actually sleeping 34621000 usec.
Underslept by 826000 usec.
Will sleep 35493000 usec next time.
Sleeping 35493000 usec.
Actually sleeping 35475000 usec.
Underslept by 824000 usec.sec, 0.03 CPU cores, 100.5% done. ETA 04 Jun 13:54
Will sleep 36317000 usec next time.

Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 819.74 sec. (0.05 init + 819.70 sieve) at 36778 p/sec.
Processor time: 39.81 sec. (0.05 init + 39.76 sieve) at 758125 p/sec.
Average processor utilization: 1.02 (init), 0.05 (sieve)

Same GPU spiking as before; average utilization was 5%.
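
As a rough cross-check, the growing sleeps account for almost all of the elapsed time. The Python sketch below assumes every pass adds a constant 825,000 usec (the typical "Underslept by" value in this log) and that one further wait of about the same length follows each sleep. Both are assumptions read off the log, not values the program reports.

```python
STEP_USEC = 825_000            # typical "Underslept by" value, assumed constant
LAST_SLEEP_USEC = 35_493_000   # last "Sleeping ..." line in the log above

# Under the assumption, the sleeps are roughly STEP, 2*STEP, ..., n*STEP.
n = round(LAST_SLEEP_USEC / STEP_USEC)
total_sleep_sec = STEP_USEC * n * (n + 1) / 2 / 1e6

# Assumed: one wait of about STEP after each of the n sleeps,
# plus one after the initial 0 usec sleep.
total_wait_sec = STEP_USEC * (n + 1) / 1e6

estimate_sec = total_sleep_sec + total_wait_sec
print(f"passes={n} sleep={total_sleep_sec:.1f}s "
      f"wait={total_wait_sec:.1f}s total={estimate_sec:.1f}s")
```

The estimate comes out at roughly 817 seconds, close to the reported elapsed time of 819.74 sec, so nearly the whole run is spent sleeping.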

____________
My lucky number is 75898524288+1

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 24142 - Posted: 4 Jun 2010 | 18:56:17 UTC - in response to Message 24139.

There's the "money shot". Combined with other parts, this tells me that the timing is far too random for the current method to work.

I'll look at other options. I didn't see any timing less than 300,000 usec, so I might try finding the minimum.

Here are the results from one of the 9600GSO on XP Pro which look a fair bit different from the others:

Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping 262500 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping 262500 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 231250 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 231250 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
p=42070028573697, 476.2K p/sec, 0.10 CPU cores, 95.2% done. ETA 04 Jun 14:53
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.

Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 63.91 sec. (0.05 init + 63.86 sieve) at 472077 p/sec.
Processor time: 6.56 sec. (0.27 init + 6.30 sieve) at 4787543 p/sec.
Average processor utilization: 5.67 (init), 0.10 (sieve)

____________
141941*2^4299438-1 is prime!

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 24143 - Posted: 4 Jun 2010 | 18:57:24 UTC - in response to Message 24140.

That is just weird!

The only way I can see that happening is if computation doesn't start when the kernel is called. Since it's not happening to everyone, maybe it's a CUDA runtime dll problem?

So let's all standardize on this DLL and try again.
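
To make the hypothesis concrete, here is a toy Python model (no CUDA involved; the function and parameter names are hypothetical). It assumes an additive controller that adds whatever wait remains after the sleep to the next sleep. If the kernel runs while the host sleeps, the sleep settles at the kernel time; if the kernel only begins once the host starts waiting, the sleep grows by one kernel time on every pass.

```python
def simulate(kernel_usec, passes, deferred_start):
    """Return the sleep chosen for each pass under a toy additive controller.

    deferred_start=False: the kernel runs while the host sleeps.
    deferred_start=True:  the kernel only begins once the host starts waiting.
    """
    sleep = 0
    history = []
    for _ in range(passes):
        history.append(sleep)
        if deferred_start:
            remaining = kernel_usec                  # nothing ran during the sleep
        else:
            remaining = max(0, kernel_usec - sleep)  # the sleep overlapped the kernel
        sleep += remaining                           # "Underslept by ..." correction
    return history


print(simulate(825_000, 5, deferred_start=False))
print(simulate(825_000, 5, deferred_start=True))
```

The second case reproduces the steadily increasing "Will sleep ..." values in the GTX 280 log, which is why a deferred kernel start is a plausible (but unconfirmed) explanation.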
____________

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 24145 - Posted: 4 Jun 2010 | 20:15:47 UTC - in response to Message 24143.

That is just weird!

The only way I can see that happening is if computation doesn't start when the kernel is called. Since it's not happening to everyone, maybe it's a CUDA runtime dll problem?

So let's all standardize on this DLL and try again.

Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
.
.
.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping 262500 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.

Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 62.47 sec. (0.03 init + 62.44 sieve) at 482828 p/sec.
Processor time: 8.92 sec. (0.27 init + 8.66 sieve) at 3482635 p/sec.
Average processor utilization: 8.50 (init), 0.14 (sieve)

Will give my Vista boxes (32-bit and 64-bit) a try when I get home.

____________
141941*2^4299438-1 is prime!

HAmsty
Volunteer tester

Joined: 26 Dec 08
Posts: 132
ID: 33421
Credit: 12,510,712
RAC: 0

Message 24146 - Posted: 4 Jun 2010 | 20:22:53 UTC

Same on my side. I've PM'ed Ken my std error output.
____________

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 24147 - Posted: 4 Jun 2010 | 20:28:16 UTC - in response to Message 24146.

Same on my side. I've PM'ed Ken my std error output.

Same as Scott; not the same as you had before. Note the lack of "OVERslept by WAY TOO LONG!" messages. Combined with the steady rate, that tells me the new DLL made the run efficient. :)

Now I'm just hoping it fixes Michael Goetz's problem too.
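
For testers with very long stderr dumps, a small Python helper along these lines can pull out the two things that matter here: the number of "OVERslept by WAY TOO LONG!" lines and the range of planned sleeps. This is only a sketch; the function name is hypothetical, and the message formats are simply the ones pasted in this thread.

```python
import re


def summarize(log_text):
    """Count the trouble marker and collect the planned sleep values."""
    way_too_long = log_text.count("OVERslept by WAY TOO LONG!")
    planned = [int(v) for v in
               re.findall(r"Will sleep (\d+) usec next time\.", log_text)]
    return {
        "way_too_long": way_too_long,
        "planned_min": min(planned) if planned else None,
        "planned_max": max(planned) if planned else None,
    }


# A few lines in the format posted above.
sample = """\
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping 262500 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
"""
print(summarize(sample))
```

A count of zero together with a narrow min/max range corresponds to the steady behavior described above; a planned maximum in the tens of seconds would match the runaway case.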
____________

Michael Goetz
Volunteer moderator

Joined: 21 Jan 10
Posts: 14045
ID: 53948
Credit: 483,803,363
RAC: 628,990

Message 24149 - Posted: 4 Jun 2010 | 20:41:09 UTC - in response to Message 24143.

That is just weird!

The only way I can see that happening is if computation doesn't start when the kernel is called. Since it's not happening to everyone, maybe it's a CUDA runtime dll problem?

So let's all standardize on this DLL and try again.

OK, here's the result using that DLL, which I believe is the same as the one I was already using. Note that the sleep time starts at just under 1 second and steadily increases to more than 30 seconds. During the sleep time, the GPU is idle.

EDIT: Correction, the new DLL is not the same one I used before. WinRAR seems to have a bug and was telling me they were the same when they were not. Nevertheless, this test was done with the correct DLL.

C:\Temp\Release>ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Detected GPU 0: GeForce GTX 280
Detected compute capability: 1.3
Detected 30 multiprocessors.
Sleeping 0 usec.
Underslept by 831000 usec.
Will sleep 831000 usec next time.
Sleeping 831000 usec.
Actually sleeping 778000 usec.
Underslept by 828000 usec.
Will sleep 1659000 usec next time.
Sleeping 1659000 usec.
Actually sleeping 1631000 usec.
Underslept by 833000 usec.
Will sleep 2492000 usec next time.
Sleeping 2492000 usec.
Actually sleeping 2470000 usec.
Underslept by 830000 usec.
Will sleep 3322000 usec next time.
Sleeping 3322000 usec.
Actually sleeping 3263000 usec.
Underslept by 828000 usec.
Will sleep 4150000 usec next time.
Sleeping 4150000 usec.
Actually sleeping 4095000 usec.
Underslept by 834000 usec.
Will sleep 4984000 usec next time.
Sleeping 4984000 usec.
Actually sleeping 4954000 usec.
Underslept by 829000 usec.
Will sleep 5813000 usec next time.
Sleeping 5813000 usec.
Actually sleeping 5780000 usec.
Underslept by 844000 usec.
Will sleep 6657000 usec next time.
Sleeping 6657000 usec.
Actually sleeping 6623000 usec.
Underslept by 829000 usec.
Will sleep 7486000 usec next time.
Sleeping 7486000 usec.
Actually sleeping 7458000 usec.
Underslept by 829000 usec.
Will sleep 8315000 usec next time.
Sleeping 8315000 usec.
Actually sleeping 8255000 usec.
Underslept by 830000 usec.
Will sleep 9145000 usec next time.
Sleeping 9145000 usec.
Actually sleeping 9098000 usec.
Underslept by 829000 usec./sec, 0.16 CPU cores, 31.5% done. ETA 04 Jun 15:52
Will sleep 9974000 usec next time.
Sleeping 9974000 usec.
Actually sleeping 9939000 usec.
Underslept by 904000 usec.
Will sleep 10878000 usec next time.
Sleeping 10878000 usec.
Actually sleeping 10812000 usec.
Underslept by 827000 usec.
Will sleep 11705000 usec next time.
Sleeping 11705000 usec.
Actually sleeping 11694000 usec.
Underslept by 829000 usec.
Will sleep 12534000 usec next time.
Sleeping 12534000 usec.
Actually sleeping 12479000 usec.
Underslept by 835000 usec.
Will sleep 13369000 usec next time.
Sleeping 13369000 usec.
Actually sleeping 13297000 usec.
Underslept by 862000 usec./sec, 0.07 CPU cores, 43.7% done. ETA 04 Jun 15:53
Will sleep 14231000 usec next time.
Sleeping 14231000 usec.
Actually sleeping 14122000 usec.
Underslept by 829000 usec.
Will sleep 15060000 usec next time.
Sleeping 15060000 usec.
Actually sleeping 15031000 usec.
Underslept by 828000 usec.
Will sleep 15888000 usec next time.
Sleeping 15888000 usec.
Actually sleeping 15867000 usec.
Underslept by 833000 usec.
Will sleep 16721000 usec next time.
Sleeping 16721000 usec.
Actually sleeping 16674000 usec.
Underslept by 839000 usec./sec, 0.06 CPU cores, 53.3% done. ETA 04 Jun 15:54
Will sleep 17560000 usec next time.
Sleeping 17560000 usec.
Actually sleeping 17525000 usec.
Underslept by 826000 usec.
Will sleep 18386000 usec next time.
Sleeping 18386000 usec.
Actually sleeping 18317000 usec.
Underslept by 826000 usec.
Will sleep 19212000 usec next time.
Sleeping 19212000 usec.
Actually sleeping 19164000 usec.
Underslept by 825000 usec./sec, 0.04 CPU cores, 61.2% done. ETA 04 Jun 15:55
Will sleep 20037000 usec next time.
Sleeping 20037000 usec.
Actually sleeping 19965000 usec.
Underslept by 827000 usec.
Will sleep 20864000 usec next time.
Sleeping 20864000 usec.
Actually sleeping 20821000 usec.
Underslept by 826000 usec.
Will sleep 21690000 usec next time.
Sleeping 21690000 usec.
Actually sleeping 21620000 usec.
Underslept by 832000 usec./sec, 0.04 CPU cores, 68.2% done. ETA 04 Jun 15:56
Will sleep 22522000 usec next time.
Sleeping 22522000 usec.
Actually sleeping 22498000 usec.
Underslept by 827000 usec.
Will sleep 23349000 usec next time.
Sleeping 23349000 usec.
Actually sleeping 23328000 usec.
Underslept by 827000 usec.
Will sleep 24176000 usec next time.
Sleeping 24176000 usec.
Actually sleeping 24172000 usec.
Underslept by 826000 usec.
Will sleep 25002000 usec next time.
Sleeping 25002000 usec.
Actually sleeping 24975000 usec.
Underslept by 825000 usec.
Will sleep 25827000 usec next time.
Sleeping 25827000 usec.
Actually sleeping 25810000 usec.
Underslept by 826000 usec./sec, 0.03 CPU cores, 78.6% done. ETA 04 Jun 15:57
Will sleep 26653000 usec next time.
Sleeping 26653000 usec.
Actually sleeping 26623000 usec.
Underslept by 827000 usec.
Will sleep 27480000 usec next time.
Sleeping 27480000 usec.
Actually sleeping 27452000 usec.
Underslept by 838000 usec./sec, 0.03 CPU cores, 83.0% done. ETA 04 Jun 15:58
Will sleep 28318000 usec next time.
Sleeping 28318000 usec.
Actually sleeping 28285000 usec.
Underslept by 827000 usec.
Will sleep 29145000 usec next time.
Sleeping 29145000 usec.
Actually sleeping 29082000 usec.
Underslept by 862000 usec./sec, 0.03 CPU cores, 88.3% done. ETA 04 Jun 15:59
Will sleep 30007000 usec next time.
Sleeping 30007000 usec.
Actually sleeping 29962000 usec.
Underslept by 829000 usec.
Will sleep 30836000 usec next time.
Sleeping 30836000 usec.
Actually sleeping 30793000 usec.
Underslept by 825000 usec.
Will sleep 31661000 usec next time.
Sleeping 31661000 usec.
Actually sleeping 31632000 usec.
Underslept by 826000 usec.
Will sleep 32487000 usec next time.
Sleeping 32487000 usec.
Actually sleeping 32446000 usec.
Underslept by 831000 usec.
Will sleep 33318000 usec next time.
Sleeping 33318000 usec.
Actually sleeping 33315000 usec.
Underslept by 826000 usec./sec, 0.03 CPU cores, 97.9% done. ETA 04 Jun 16:00
Will sleep 34144000 usec next time.
Sleeping 34144000 usec.
Actually sleeping 34107000 usec.
Underslept by 827000 usec./sec, 0.01 CPU cores, 100.5% done. ETA 04 Jun 16:01
Will sleep 34971000 usec next time.
Sleeping 34971000 usec.
Actually sleeping 34914000 usec.
Underslept by 829000 usec.
Will sleep 35800000 usec next time.
Sleeping 35800000 usec.
Actually sleeping 35778000 usec.
Underslept by 824000 usec.sec, 0.03 CPU cores, 100.5% done. ETA 04 Jun 16:02
Will sleep 36624000 usec next time.

Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 828.16 sec. (0.04 init + 828.12 sieve) at 36404 p/sec.
Processor time: 37.16 sec. (0.03 init + 37.13 sieve) at 811958 p/sec.
Average processor utilization: 0.78 (init), 0.04 (sieve)

I'm running driver version 197.45
Vista -32 SP3
Q6600
GTX 280
____________
My lucky number is 75898524288+1

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 24150 - Posted: 4 Jun 2010 | 22:08:55 UTC

With the same DLL, I am getting the exact same results (just much slower) on my 9500GT in a Vista 64-bit system (driver 191.07). GPU-Z shows off-and-on spikes to about 30% GPU load, but never higher.

So far, all the issues are on Vista even with the same DLL and across multiple driver versions. We really need a Win7 machine to test this, too (unfortunately my one Win7 box has an ATI card).

____________
141941*2^4299438-1 is prime!

cromido

Joined: 1 Feb 10
Posts: 31
ID: 54429
Credit: 88,764,532
RAC: 8,274

Message 24151 - Posted: 4 Jun 2010 | 22:25:04 UTC - in response to Message 24150.

Here are my results:
Windows XP 32-bit SP3
Intel E6850
8500GT
Latest program and standard cuda.dll (from this thread)
Driver 196.21
BOINC suspended

OVERslept by 46875 usec. p/sec, 0.00 CPU cores, 97.0% done. ETA 04 Jun 23:17
Will sleep 903125 usec next time.
Sleeping 903125 usec.
Actually sleeping 903125 usec.
OVERslept by 46875 usec.
Will sleep 856250 usec next time.
Sleeping 856250 usec.
Actually sleeping 856250 usec.
OVERslept by 46875 usec.
Will sleep 809375 usec next time.
Sleeping 809375 usec.
Actually sleeping 809375 usec.
OVERslept by 46875 usec.
Will sleep 762500 usec next time.
Sleeping 762500 usec.
Actually sleeping 762500 usec.
OVERslept by 46875 usec.
Will sleep 715625 usec next time.
Sleeping 715625 usec.
Actually sleeping 715625 usec.
OVERslept by WAY TOO LONG!
Will sleep 653125 usec next time.
Sleeping 653125 usec.
Actually sleeping 637500 usec.
OVERslept by WAY TOO LONG!
Will sleep 590625 usec next time.
Sleeping 590625 usec.
Actually sleeping 590625 usec.
OVERslept by 46875 usec.
Will sleep 543750 usec next time.
Sleeping 543750 usec.
Actually sleeping 543750 usec.
OVERslept by 46875 usec.
Will sleep 496875 usec next time.
Sleeping 496875 usec.
Actually sleeping 496875 usec.
OVERslept by 46875 usec.
Will sleep 450000 usec next time.
Sleeping 450000 usec.
Actually sleeping 450000 usec.
OVERslept by 46875 usec.
Will sleep 403125 usec next time.
Sleeping 403125 usec.
Actually sleeping 403125 usec.
OVERslept by 46875 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 325000 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 325000 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
Underslept by 0 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.

Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 1139.53 sec. (0.06 init + 1139.47 sieve) at 26457 p/sec.
Processor time: 19.95 sec. (0.05 init + 19.91 sieve) at 1514427 p/sec.
Average processor utilization: 0.75 (init), 0.02 (sieve)
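
The sleep figures in the debug output above are consistent with a simple feedback rule: the next sleep is the current sleep plus the reported undersleep, or minus the reported oversleep. The sketch below reproduces that arithmetic only. It is inferred from the log lines, not taken from the ppsieve source, and the function name `next_sleep_usec` is made up for illustration. It does not model the "WAY TOO LONG" case, where the log steps down by 62500 usec instead.

```python
def next_sleep_usec(current_usec: int, underslept_usec: int) -> int:
    """Return the next sleep length in microseconds.

    underslept_usec is positive when the log says "Underslept by N usec"
    and negative when it says "OVERslept by N usec".
    """
    # Hypothetical rule inferred from the log: apply the whole error at once,
    # and never return a negative sleep time.
    return max(0, current_usec + underslept_usec)


# Values taken from the logs in this thread.
print(next_sleep_usec(26653000, 827000))  # Vista run, underslept -> 27480000
print(next_sleep_usec(903125, -46875))    # XP run, overslept     -> 856250
print(next_sleep_usec(340625, 15625))     # XP run, underslept    -> 356250
```

On the XP run the correction settles into a 15625 usec back-and-forth between 340625 and 356250, whereas on the Vista run every step reports roughly 827000 usec of undersleep and the sleep time keeps climbing.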
____________

Ken_g6
Volunteer developer

Joined: 4 Jul 06
Posts: 941
ID: 3110
Credit: 265,313,309
RAC: 65,012

Message 24152 - Posted: 4 Jun 2010 | 23:28:58 UTC - in response to Message 24151.

OVERslept by WAY TOO LONG!
Nuts, that didn't work.

OK, one more try. I think I've figured out the CUDA initialization here. I can't test it because I don't have a real card, but if it works the app should be perfectly efficient.
____________

Scott Brown
Volunteer moderator
Volunteer tester
Project scientist

Joined: 17 Oct 05
Posts: 2420
ID: 1178
Credit: 20,215,893,042
RAC: 23,936,759

Message 24153 - Posted: 5 Jun 2010 | 0:20:37 UTC - in response to Message 24152.

Nuts, that didn't work.

OK, one more try. I think I've figured out the CUDA initialization here. I can't test it because I don't have a real card, but if it works the app should be perfectly efficient.

T8100 Vostro 1510
8400M GS (256 MB)

Microsoft Windows Vista (32-bit) [Version 6.0.6002]

ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q

ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Resuming from checkpoint p=42070004194305 in ppcheck42070e9.txt
Detected GPU 0: GeForce 8400M GS
Detected compute capability: 1.1
Detected 2 multiprocessors.
p=42070029097985, 34.95K p/sec, 0.01 CPU cores, 97.0% done. ETA 04 Jun 20:15
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 698.68 sec. (0.37 init + 698.31 sieve) at 37164 p/sec.
Processor time: 4.38 sec. (0.14 init + 4.24 sieve) at 6116159 p/sec.
Average processor utilization: 0.38 (init), 0.01 (sieve)

GPU-Z reported 96%-99% GPU utilization (mostly 99%) with 38 MB of VRAM used, and GPU temps were excellent.

I will test in a little while on the 9500GT in 64-bit Vista, but I think this got it. Nicely done, Ken!
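
For comparing runs, the "Average processor utilization" figure for the sieve phase appears to be processor time divided by elapsed time. The quick check below uses the sieve-phase numbers from the three summaries posted in this thread. That reading of the figure is an assumption that matches the posted values but has not been checked against the ppsieve source, and the helper name `utilization` is made up for illustration. Note that the 8400M GS run resumed from a checkpoint, so its elapsed time covers less than the full range.

```python
def utilization(cpu_sec: float, elapsed_sec: float) -> float:
    """Fraction of one CPU core used: processor time over wall-clock time."""
    return cpu_sec / elapsed_sec


# (sieve processor time, sieve elapsed time) in seconds, from the posts above.
runs = {
    "GTX 280, Vista 32-bit": (37.13, 828.12),
    "8500GT, XP 32-bit": (19.91, 1139.47),
    "8400M GS, Vista 32-bit": (4.24, 698.31),
}

for name, (cpu, wall) in runs.items():
    print(f"{name}: {utilization(cpu, wall):.2f}")
```

This prints 0.04, 0.02 and 0.01, matching the sieve utilization reported in each summary.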

____________
141941*2^4299438-1 is prime!

Scott Brown
Volunteer moderator