ppsieve/tpsieve CUDA testing
John Honorary cruncher
Exciting News!!!
Over in the PST forum, Ken and the crew have been working on a GPU application for sieving. The program is called ppsieve and is currently being used on the PPSE sieve. If all goes well, we may be able to merge the PPS and PPSE sieves, bringing PPSE into BOINC.
Currently available for 32 & 64 bit Linux AND 32 bit Windows (will run on 64 bit). It should work on cards with any compute capability. You can download it here:
ppsieve-cuda.zip (source)
To test, please use the following command line:
./ppsieve-cuda-boinc-(version) -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -c 60
It should output the following factors:
Range: 42070e9 to 42070030e6
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070003101727 | 4207*2^1054290+1
42070003511309 | 6057*2^1043547+1
42070006307657 | 1513*2^1771812+1
42070006388603 | 2059*2^1816098+1
42070007177519 | 5437*2^1121592+1
42070007396759 | 7339*2^1803518+1
42070008823897 | 4639*2^952018+1
42070008858187 | 2893*2^317690+1
42070010190569 | 5625*2^1903125+1
42070011430123 | 3821*2^1406279+1
42070012301263 | 1957*2^1185814+1
42070013521999 | 1965*2^404493+1
42070013970587 | 7143*2^1462422+1
42070013989247 | 5037*2^838603+1
42070017332953 | 6237*2^1916994+1
42070018235321 | 1941*2^363948+1
42070019542387 | 8587*2^1703626+1
42070023987581 | 9811*2^318944+1
42070024339237 | 9257*2^1170495+1
42070024532551 | 4311*2^1690093+1
42070024936837 | 5679*2^1726142+1
42070024995961 | 9111*2^1707153+1
42070026021997 | 4039*2^1819590+1
42070027452199 | 1323*2^854008+1
42070029006583 | 5943*2^663870+1
Found 27 factors
Please provide as much detail about your system as possible.
Thank you for testing!
p.s. If you wish to test the CUDA time vs. CPU time, you can download the CPU build here: ppsieve-bin.zip (source)
Just run the same test range and then compare the results.
Other sample test cases:
Range: 20070e9 to 20070010e6
20070000475957 | 4995*2^1822738+1
20070001146497 | 4977*2^626298+1
20070001163929 | 3765*2^461308+1
20070001302811 | 7669*2^725426+1
20070001425977 | 5821*2^1775248+1
20070002245151 | 1221*2^646983+1
20070002606341 | 4809*2^497683+1
20070004816819 | 6699*2^1215561+1
20070005914001 | 9847*2^1634140+1
20070006187837 | 9923*2^287853+1
20070006875981 | 1645*2^965954+1
20070007170259 | 3889*2^49730+1
20070008329039 | 9065*2^832569+1
Found 13 factors
Range: 249871e9 to 2498711e8
249871003789289 | 6295*2^266404+1
249871009510013 | 2771*2^1272671+1
249871010360639 | 1743*2^1337710+1
249871027030549 | 8865*2^1534637+1
249871030776329 | 7815*2^1679937+1
249871032591751 | 2335*2^23512+1
249871038523049 | 7527*2^204096+1
249871049497963 | 6497*2^505399+1
249871066947839 | 8497*2^1221770+1
249871068167599 | 7311*2^450531+1
249871089712009 | 9281*2^1650023+1
249871091913587 | 2139*2^1290902+1
249871099624639 | 8381*2^350375+1
Found 13 factors
Range: 42070e9 to 42070100e6
Found 68 factors
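Each reported line has the form `p | k*2^n+1`, meaning that the prime p divides the Proth number k*2^n+1. A line can be checked independently of ppsieve with modular exponentiation. The following is only a minimal Python sketch, not part of ppsieve; the names `parse_factor_line` and `divides` are made up for illustration.

```python
import re

# Matches factor lines of the form "p | k*2^n+1" (or "-1" for the Riesel form).
FACTOR_LINE = re.compile(r"^(\d+) \| (\d+)\*2\^(\d+)([+-])1$")

def parse_factor_line(line):
    """Return (p, k, n, c) with c = +1 or -1, or None if this is not a factor line."""
    m = FACTOR_LINE.match(line.strip())
    if m is None:
        return None
    p, k, n = (int(g) for g in m.group(1, 2, 3))
    c = 1 if m.group(4) == "+" else -1
    return p, k, n, c

def divides(p, k, n, c=1):
    """True if p divides k*2^n + c, computed without building the huge number."""
    return (k * pow(2, n, p) + c) % p == 0

if __name__ == "__main__":
    # 641 | 1*2^32+1 is the classic factor of the Fermat number F5.
    print(divides(641, 1, 32))
```

To check a line from this thread, pass it through both helpers, for example `divides(*parse_factor_line("42070000070587 | 9475*2^197534+1"))`.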
____________
tocx Volunteer tester
System: Debian GNU Linux (Squeeze), Kernel 2.6.32 AMD64
Intel i5-750
GeForce 9500 GT (silent)
Nvidia Driver Ver.: 190.53
Cuda-Toolkit: 2.3 Ubuntu 9.04
Running AP26 on all 4 Cores, no other boinc-based GPU-apps running
GPU-Temperature changes from 38°C to 40°C during the sieve runs
./ppsieve-cuda-x86_64-linux -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Starting 1 threads.
Detected GPU 0: GeForce 9500 GT
Detected compute capability: 1.1
Detected 4 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 24.00 sec. (0.01 init + 23.99 sieve) at 131122 p/sec.
Processor time: 3.42 sec. (0.02 init + 3.40 sieve) at 925156 p/sec.
Average processor utilization: 1.14 (init), 0.14 (sieve)
____________
System: OpenSuSE 11.2 Kernel 2.6.31.12 amd64
CPU: Core 2 Quad Q9550 (E0 stepping)
GPU: GeForce GTX 260-192 - NVIDIA driver version: 190.53 (cuda 2.3 support)
Test run with the CPU idling and the GPU at stock clock:
./ppsieve-cuda-x86_64-linux -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Starting 1 threads.
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 5.81 sec. (0.02 init + 5.80 sieve) at 542504 p/sec.
Processor time: 1.35 sec. (0.02 init + 1.33 sieve) at 2363791 p/sec.
Average processor utilization: 1.10 (init), 0.23 (sieve)
Test run with the CPU idling and the GPU at 667 MHz (shaders linked):
./ppsieve-cuda-x86_64-linux -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Starting 1 threads.
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 4.98 sec. (0.02 init + 4.96 sieve) at 634269 p/sec.
Processor time: 1.22 sec. (0.02 init + 1.20 sieve) at 2617475 p/sec.
Average processor utilization: 0.95 (init), 0.24 (sieve)
To give an impression of the current speeds, here are the runtimes of the test range on a Q9550 @ 3.4 GHz (FSB 400 * 8.5) CPU, using the linux-x86_64 version of the ppsieve-cpu application.
Running only 1 thread:
ppsieve version 0.3.4a (testing)
Compiled Feb 19 2010 with GCC 4.3.3
Algorithm not specified, starting benchmark...
bsf takes 350000; mul takes 490000; using standard algorithm.
nstart=1999980, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Thread 0 starting
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 8 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 16.44 sec. (0.90 init + 15.54 sieve) at 193956 p/sec.
Processor time: 16.41 sec. (0.86 init + 15.55 sieve) at 193898 p/sec.
Average processor utilization: 0.95 (init), 1.00 (sieve)
Running 4 threads:
ppsieve version 0.3.4a (testing)
Compiled Feb 19 2010 with GCC 4.3.3
Algorithm not specified, starting benchmark...
bsf takes 350000; mul takes 540000; using standard algorithm.
nstart=1999980, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Thread 0 starting
Thread 1 starting
Thread 2 starting
Thread 3 starting
Thread 3 completed
Waiting for threads to exit
Thread 0 completed
Thread 1 completed
Thread 2 completed
Sieve complete: 42070000000000 <= p < 42070003000000
Found 8 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 4.94 sec. (0.91 init + 4.03 sieve) at 748919 p/sec.
Processor time: 16.11 sec. (0.91 init + 15.21 sieve) at 198232 p/sec.
Average processor utilization: 0.99 (init), 3.78 (sieve)
---
The "Elapsed time" on a GTX 260-192 @ 667 MHz is nearly the same as on the Q9550 @ 3.4 GHz. So the output of the GTX 260-192 at stock clock is roughly the same as that of 4 cores of my Q9550 at stock clock. This ratio also applies to the current AP26 apps (1.01 (cuda23) vs 1.04).
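This kind of comparison can be reproduced from the `Elapsed time` lines of any two runs. Here is a rough Python sketch that uses the two 3M-range figures from the logs in this thread (4.98 sec. for the GTX 260 at 667 MHz, 4.94 sec. for four CPU threads). The helper names `elapsed_seconds` and `speed_ratio` are illustrative and not part of ppsieve.

```python
import re

# Matches the wall-clock figure at the start of a ppsieve "Elapsed time" line.
ELAPSED = re.compile(r"^Elapsed time: ([0-9.]+) sec\.")

def elapsed_seconds(log_text):
    """Extract the wall-clock seconds from the first 'Elapsed time' line in a log."""
    for line in log_text.splitlines():
        m = ELAPSED.match(line.strip())
        if m:
            return float(m.group(1))
    raise ValueError("no 'Elapsed time' line found")

def speed_ratio(baseline_log, candidate_log):
    """How many times faster the candidate run finished than the baseline run."""
    return elapsed_seconds(baseline_log) / elapsed_seconds(candidate_log)

if __name__ == "__main__":
    cpu4 = "Elapsed time: 4.94 sec. (0.91 init + 4.03 sieve) at 748919 p/sec."
    gpu = "Elapsed time: 4.98 sec. (0.02 init + 4.96 sieve) at 634269 p/sec."
    print(f"GPU vs 4 CPU threads: {speed_ratio(cpu4, gpu):.2f}x")
```

Wall-clock time is the fairer basis here, since the GPU runs report very low processor time while the card does the work.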
Ken_g6 Volunteer developer
Sieve complete: 42070000000000 <= p < 42070003000000
Found 8 factors
Whoops! It looks like I left some flags in the ppconfig.txt file for the CPU version of PPSieve that was on my site, including one that forces sieving for Riesel numbers instead of Proth.
The timing information here is fine, but before doing anything in the PPSE sieve, anyone getting "-1"s in their results file should either download what I just uploaded, or edit ppconfig.txt to remove the "riesel" line.
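A quick way to tell whether a results file was produced with the stray Riesel setting is to look for factor lines ending in `-1` instead of `+1`. The sketch below assumes the results file uses the same `p | k*2^n+1` line format as the console output shown in this thread, which may not hold for every build. The name `riesel_lines` is hypothetical.

```python
import re

# Factor lines look like "p | k*2^n+1" (Proth) or "p | k*2^n-1" (Riesel).
RIESEL_LINE = re.compile(r"^\d+ \| \d+\*2\^\d+-1$")

def riesel_lines(lines):
    """Return the lines that report a factor of a Riesel number k*2^n-1."""
    return [ln.strip() for ln in lines if RIESEL_LINE.match(ln.strip())]

if __name__ == "__main__":
    sample = [
        "42070000070587 | 9475*2^197534+1",
        "11 | 3*2^2-1",  # made-up Riesel-form line, for illustration only
        "Found 2 factors",
    ]
    print(riesel_lines(sample))
```

Any non-empty result suggests the file came from a run with the "riesel" line still present in ppconfig.txt.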
OK, now that's over with, about the GPU testing...
3M is actually a small test range for the GPU code. I use it because it has good known factors, and because I don't have a compute-capable GPU and the emulator's really slow! But a 30M range or something would probably be better for speed comparison, if you have a minute or four:
./ppsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
Also, I haven't tried the 32-bit app at all! I'd be interested to know if it works!
____________
64 bit GPU app - Test range 30M
./ppsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
My GTX 260-192 at stock clock with no load on the CPU:
ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Starting 1 threads.
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 49.30 sec. (0.02 init + 49.29 sieve) at 611635 p/sec.
Processor time: 3.85 sec. (0.02 init + 3.83 sieve) at 7866200 p/sec.
Average processor utilization: 1.04 (init), 0.08 (sieve)
and again at 667 MHz with no load on the CPU (shaders linked):
ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Starting 1 threads.
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 42.05 sec. (0.02 init + 42.02 sieve) at 717374 p/sec.
Processor time: 3.72 sec. (0.02 init + 3.70 sieve) at 8142356 p/sec.
Average processor utilization: 0.82 (init), 0.09 (sieve)
32 bit GPU app - Test range 30M
./ppsieve-cuda-x86-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
My GTX 260-192 at stock clock with no load on the CPU:
ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Starting 1 threads.
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 49.59 sec. (0.02 init + 49.56 sieve) at 608232 p/sec.
Processor time: 4.10 sec. (0.02 init + 4.07 sieve) at 7399053 p/sec.
Average processor utilization: 1.11 (init), 0.08 (sieve)
and again at 667 MHz with no load on the CPU (shaders linked):
ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Starting 1 threads.
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 42.28 sec. (0.02 init + 42.26 sieve) at 713325 p/sec.
Processor time: 3.96 sec. (0.02 init + 3.94 sieve) at 7648693 p/sec.
Average processor utilization: 1.07 (init), 0.09 (sieve)
64 bit CPU app - Test range 30M
1 thread - Same CPU and clock rate as stated above:
./ppsieve-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
ppsieve version 0.3.4a (testing)
Compiled Feb 19 2010 with GCC 4.3.3
Algorithm not specified, starting benchmark...
bsf takes 350000; mul takes 500000; using standard algorithm.
nstart=1999980, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
p=42070023724033, 196.6K p/sec, 1.00 CPU cores, 79.1% done. ETA 09 Mar 18:11
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 153.56 sec. (0.87 init + 152.69 sieve) at 196576 p/sec.
Processor time: 153.24 sec. (0.87 init + 152.37 sieve) at 196995 p/sec.
Average processor utilization: 1.00 (init), 1.00 (sieve)
4 threads - Same CPU and clock rate as stated above:
ppsieve version 0.3.4a (testing)
Compiled Feb 19 2010 with GCC 4.3.3
Algorithm not specified, starting benchmark...
bsf takes 350000; mul takes 520000; using standard algorithm.
nstart=1999980, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 1 starting
Thread 0 starting
Thread 3 starting
Thread 2 starting
Thread 3 completed
Waiting for threads to exit
Thread 1 completed
Thread 0 completed
Thread 2 completed
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 39.77 sec. (0.90 init + 38.88 sieve) at 772082 p/sec.
Processor time: 152.13 sec. (0.89 init + 151.24 sieve) at 198467 p/sec.
Average processor utilization: 0.99 (init), 3.89 (sieve)
64 bit
System: Archlinux, Kernel 2.6.32-ARCH
Intel Xeon CPU X3360 @ 3.4GHz (C1 stepping)
GeForce GTX 285
Nvidia Driver Version: 190.53
Cuda-Toolkit: 2.3
Running TRP-Sieve on all 4 Cores, no other boinc-based GPU-apps running
./ppsieve-cuda-x86_64-linux -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Starting 1 threads.
Detected GPU 0: GeForce GTX 285
Detected compute capability: 1.3
Detected 30 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 3.60 sec. (0.02 init + 3.58 sieve) at 878148 p/sec.
Processor time: 0.99 sec. (0.01 init + 0.97 sieve) at 3232123 p/sec.
Average processor utilization: 0.68 (init), 0.27 (sieve)
____________
HAmsty Volunteer tester
does this version work for 1.0 cards?
____________
HAmsty wrote: "does this version work for 1.0 cards?"
I find no checks for a specific compute capability (1.0, 1.1 or 1.3) in the source code. You can download the zip file with the binaries in it (< 100 K) and simply start the 32 or the 64 bit binary with the commands John gave in his post. The test range is very short.
Or, of course, we could simply read John's initial post. I overlooked it too:
John wrote: "Currently available for 32 & 64 bit Linux. It should work on cards with any compute capability. You can download it here:"
____________
HAmsty Volunteer tester
Oh, sorry, I missed that too. :-(
____________
samuel7 Volunteer tester
Ubuntu 9.10, kernel 2.6.31-20-generic
Core2 Quad Q9550
GeForce 9800 GT, NVIDIA driver 190.42
This 30M range was run with the CPU cores busy on TRP sieve. Another run with the cores idle was just as fast (as expected).
./ppsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
ppsieve version cuda-0.1.1-beta (testing)
Compiled Mar 7 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Starting 1 threads.
Detected GPU 0: GeForce 9800 GT
Detected compute capability: 1.1
Detected 14 multiprocessors.
p=42070029884417, 498.0K p/sec, 0.14 CPU cores, 99.6% done. ETA 13 Mar 17:49
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 60.52 sec. (0.01 init + 60.51 sieve) at 498248 p/sec.
Processor time: 9.42 sec. (0.01 init + 9.41 sieve) at 3203673 p/sec.
Average processor utilization: 0.71 (init), 0.16 (sieve)
____________
Michael Goetz Volunteer moderator Project administrator
Has anyone given any thought to what's going to happen when this goes live via BOINC?
In particular, consider how I've got PrimeGrid set up, which is probably a fairly common arrangement for people with CUDA capable GPUs:
1) I've got a quad-core CPU and a GTX280 GPU.
2) I run AP26 on the GPU
3) I run other PrimeGrid stuff on the CPU
4) I do not want to run AP26 on the CPU because I can run it much, much faster on the GPU (about 5 min/WU).
5) There's no explicit BOINC mechanism to say "run X on the CPU and Y on the GPU", unless PrimeGrid makes two separate sub-projects, "AP26-CPU" and "AP26-GPU".
6) So, I have ONLY the CPU tasks selected on the project preferences page to feed the right tasks to the CPU
7) ... and "Send work from any subproject..." to send AP26 to the GPU. This works because nothing else exists for the GPU, so it has to send AP26 tasks.
As soon as there's more than one GPU project, there will be no way of selecting what you want to run on the GPU, unless you want to also allow those to run on the CPU (which I think most people would prefer not to do.)
One possible solution is to make separate sub-projects for the GPU versions, but I realize that's far from ideal.
____________
My lucky number is 75898^524288+1
Vato Volunteer tester
I think it would be a mistake to build too much complexity into local bespoke code.
We need the BOINC client to do the right thing with GPU scheduling. We need the BOINC server to allow per subproject CPU/GPU preferences. We should try and feed those requirements into the mainstream BOINC development process and then make use of it when released.
All IMHO of course, and somewhat idealistic, since BOINC development seems to follow whatever direction the Berkeley folks are meandering in at a particular moment in time...
____________
Michael Goetz Volunteer moderator Project administrator
I intentionally avoided the whole concept of "well, the BOINC software really *should* do this..." for the obvious reasons, primarily that we're going to have a problem in the near future whereas the BOINC client might eventually get around to solving a problem like this anywhere from tomorrow to never. They've got much bigger GPU scheduling problems to solve first before they could get around to this one.
____________
My lucky number is 75898^524288+1
samuel7 Volunteer tester
Michael Goetz wrote: "As soon as there's more than one GPU project, there will be no way of selecting what you want to run on the GPU ..."
The anonymous platform mechanism lets you control exactly what you want to run. I deployed it today on the Linux side of my quad and it even picked up existing tasks correctly.
Obviously, this isn't a solution for the masses, but it works for me.
____________
Michael Goetz Volunteer moderator Project administrator
Michael Goetz wrote: "As soon as there's more than one GPU project, there will be no way of selecting what you want to run on the GPU ..."
samuel7 wrote: "The anonymous platform mechanism lets you control exactly what you want to run. I deployed it today on the Linux side of my quad and it even picked up existing tasks correctly. Obviously, this isn't a solution for the masses, but it works for me."
It's been a loooooong time since I've set up app_info by hand, and I don't really remember how to do it. Anyone know of a good reference for what exactly needs to be done?
____________
My lucky number is 75898^524288+1
samuel7 Volunteer tester
Michael Goetz wrote: "As soon as there's more than one GPU project, there will be no way of selecting what you want to run on the GPU ..."
samuel7 wrote: "The anonymous platform mechanism lets you control exactly what you want to run. I deployed it today on the Linux side of my quad and it even picked up existing tasks correctly. Obviously, this isn't a solution for the masses, but it works for me."
Michael Goetz wrote: "It's been a loooooong time since I've set up app_info by hand, and I don't really remember how to do it. Anyone know of a good reference for what exactly needs to be done?"
The BOINC wiki has this article. You can use the example in the format section as a template.
Open the client_state file in the data directory and find the PrimeGrid project data. Copy the app_version section of the first subproject you want to run and paste it over the app_version in the template, removing only the platform tag. You can edit the flops value if you know it's off for your current duration correction factor. Correct the <app> <name>, and declare the files referenced in the app_version with <file_info> tags as well, as in the example. Repeat for the other subprojects you want to run. Save the result as app_info.xml in the PG project folder and restart BOINC. Make sure all the files you declared actually exist in the PG folder.
It is probably a good idea to run down your cache and/or make a backup of the data directory (suspend network activity, too!) before deploying.
Below is a portion of my Windows app_info for PrimeGrid.
<app_info>
  <app>
    <name>ap26</name>
  </app>
  <file_info>
    <name>primegrid_ap26_1.01_windows_intelx86__cuda23.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>cudart.dll</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>ap26</app_name>
    <version_num>101</version_num>
    <plan_class>cuda23</plan_class>
    <avg_ncpus>0.050000</avg_ncpus>
    <max_ncpus>0.050000</max_ncpus>
    <flops>5604000000.000000</flops>
    <coproc>
      <type>CUDA</type>
      <count>1.000000</count>
    </coproc>
    <file_ref>
      <file_name>primegrid_ap26_1.01_windows_intelx86__cuda23.exe</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>cudart.dll</file_name>
    </file_ref>
  </app_version>
  <app>
    <name>psp_sr2sieve</name>
  </app>
  <file_info>
    <name>primegrid_sr2sieve_wrapper_1.12_windows_x86_64.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>primegrid_sr2sieve_1.8.10_windows_x86_64.exe.orig</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>psp_sr2sieve</app_name>
    <version_num>112</version_num>
    <flops>2876776723.690640</flops>
    <file_ref>
      <file_name>primegrid_sr2sieve_wrapper_1.12_windows_x86_64.exe</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>primegrid_sr2sieve_1.8.10_windows_x86_64.exe.orig</file_name>
      <open_name>primegrid_sr2sieve_1.8.10_windows_x86_64.exe.orig</open_name>
    </file_ref>
  </app_version>
</app_info>
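Since every file named in a <file_ref> has to be declared with a <file_info> and be present in the project folder, a small consistency check before restarting BOINC can catch typos. The following is only a sketch using the Python standard library. The element names are taken from the example above, while the function name `check_app_info` and the idea of passing the project folder as an argument are my own assumptions. This is not a BOINC tool.

```python
import os
import xml.etree.ElementTree as ET

def check_app_info(xml_text, project_dir=None):
    """Return a list of problems found in an app_info.xml document.

    Checks that every <file_ref><file_name> is declared by a <file_info><name>
    and, if project_dir is given, that every declared file exists there.
    """
    root = ET.fromstring(xml_text)
    names = (fi.findtext("name") for fi in root.iter("file_info"))
    declared = {n for n in names if n}
    problems = []
    for ref in root.iter("file_ref"):
        name = ref.findtext("file_name")
        if name not in declared:
            problems.append(f"file_ref without file_info: {name}")
    if project_dir is not None:
        for name in sorted(declared):
            if not os.path.isfile(os.path.join(project_dir, name)):
                problems.append(f"missing from project folder: {name}")
    return problems

if __name__ == "__main__":
    # Deliberately broken example: the second file_ref is never declared.
    example = """<app_info>
      <app><name>ap26</name></app>
      <file_info><name>cudart.dll</name><executable/></file_info>
      <app_version>
        <app_name>ap26</app_name>
        <file_ref><file_name>cudart.dll</file_name></file_ref>
        <file_ref><file_name>not_declared.exe</file_name></file_ref>
      </app_version>
    </app_info>"""
    for problem in check_app_info(example):
        print(problem)
```

An empty list means the declarations are at least internally consistent; it says nothing about whether the version numbers or flops values are right.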
____________
Michael Goetz Volunteer moderator Project administrator
Thanks, that makes sense and helps a lot.
____________
My lucky number is 75898^524288+1
Ken_g6 Volunteer developer
Alright, I've posted a release candidate version to the locations given at the beginning of the thread. It may be slightly faster; it may also lock up your GPU until it's done. Let me know how it goes.
In any case, I've recently been looking at some other projects' GPU speeds, and I'm finding myself disappointed with my speeds. When Milkyway@Home is 17 times faster on high-end NVidia (PDF), and even a simple Collatz app (not the Collatz, but the only source code I could find) is more than twice as fast as a CPU on a mid-range card, but my code is only as fast as a CPU on a high-end card, I wonder if I'm doing something wrong. Would any of the experienced CUDA developers around here care to give my code the once-over, to see if I'm doing something obviously stupid like not giving the card enough threads?
I suppose the other side of the coin could be that my CUDA code isn't bad, but that my and Geoff's CPU code is extraordinarily good.
Edit: P.S. There's a BOINC capable (I think) executable in the zipfile as well. :)
____________
Michael Goetz Volunteer moderator Project administrator
Ken_g6 wrote: "In any case, I've recently been looking at some other projects' GPU speeds, and I'm finding myself disappointed with my speeds. When Milkyway@Home is 17 times faster on high-end NVidia (PDF), and even a simple Collatz app (not the Collatz, but the only source code I could find) is more than twice as fast as a CPU on a mid-range card, but my code is only as fast as a CPU on a high-end card, I wonder if I'm doing something wrong."
One thing to consider is that some problems just don't lend themselves very well to parallel processing. Even with the best code in the world it still might not work very well on a GPU. Remember, the GPU isn't all that fast compared to a CPU. It's its ability to run several hundred calculations simultaneously that makes it fast. If the problem doesn't fit the hardware well, the GPU won't be able to crunch it very quickly.
____________
My lucky number is 75898^524288+1
Ken_g6 Volunteer developer
Yeah...the application here is computationally bound and doesn't require much memory. The slowest part is probably the 64-bit multiplies. When Fermi comes out, I expect that each stream processor will run my app (once recompiled) twice as fast.
Another part of it could be that others are comparing GPU speed to CPU speed on one core. In that case my app is 4 times as fast as the CPU version. :)
____________
mfl0p
Nice work so far, Ken. I only see one issue: this appears to be a compute-mode only CUDA application, meaning it will not run on the primary adapter under Windows in its current form (driver watchdog timer). Correct?
____________
Ken_g6 Volunteer developer
I have no idea what you just said! (I'm new at this CUDA stuff.)
If you mean it's not using the driver API, that's correct. I was hoping to avoid it.
Edit: Did you see cuda_sleep_memcpy.cu?
____________
mfl0p
Ken_g6 wrote: "I have no idea what you just said! (I'm new at this CUDA stuff.) If you mean it's not using the driver API, that's correct. I was hoping to avoid it."
In Windows, if a CUDA kernel runs longer than 5 seconds the program will be terminated by the driver. Briefly looking at your posted source, it appears you're running one huge kernel.
RE: app speeds, currently in AP26 a 1.3 CUDA card is about 5.5 times as fast as one core of an Intel Q6600 CPU. So your app isn't exactly slow, it's just doing things the GPU isn't good at.
____________
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Have you investigated whether some of the later compute capabilities add features that increase speed? It is nice to see an application that works on all CUDA cards, but given that only a handful of models are compute capability 1.0 (G80 chips), added features such as atomic functions in compute capability 1.1 cards might help with speed, depending on the processes computed in the application.
____________
141941*2^4299438-1 is prime!
Ken_g6 Volunteer developer
mfl0p wrote: "In Windows, if a CUDA kernel runs longer than 5 seconds the program will be terminated by the driver. Briefly looking at your posted source, it appears you're running one huge kernel."
Not exactly. I load up the GPU with either 384 or 768 P's per multiprocessor, run just those, further check any that found a factor on the CPU, then repeat. There's no specific time checking, but I estimate the kernel won't run more than 1 or 2 seconds at a time.
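As a rough illustration of the batch sizes involved: at 384 or 768 P's per multiprocessor, the cards reported in this thread would each receive the number of candidates computed below per kernel launch. This is plain arithmetic on the figures above, not code from ppsieve. The name `batch_size` is hypothetical, and the multiprocessor counts come from the "Detected N multiprocessors" lines in the posted logs.

```python
def batch_size(multiprocessors, per_mp):
    """Number of candidate primes sent to the GPU in one kernel launch."""
    if per_mp not in (384, 768):
        raise ValueError("the thread mentions only 384 or 768 P's per multiprocessor")
    return multiprocessors * per_mp

if __name__ == "__main__":
    # Multiprocessor counts as reported by the cards in this thread.
    cards = {
        "GeForce 9500 GT": 4,
        "GeForce 9800 GT": 14,
        "GeForce GTX 260": 24,
        "GeForce GTX 285": 30,
    }
    for name, mps in cards.items():
        print(name, batch_size(mps, 384), batch_size(mps, 768))
```

Working in bounded batches like this is what keeps each individual kernel launch short, which matters for the Windows watchdog limit raised earlier.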
Scott: I looked into it. I'm not using much global memory, or any shared memory, so atomic functions don't matter. I'm not sure; double-precision might have enough precision to be useful in one case, but it would be tricky. Otherwise there's nothing until compute capability 2.0, which as I mentioned makes multiplication faster.
____________
| |
|
mfl0p
Joined: 5 Apr 09 Posts: 251 ID: 38042 Credit: 2,757,874,746 RAC: 17,206
|
OK, I'll have to pay more attention when reading the code. That should work fine in Windows, too.
____________
| |
|
|
64 bit GPU app - Test range 30M
./ppsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
My GTX 260-192 at stock clock with no load on the CPU:
ppsieve version cuda-0.1.1-rc1 (testing)
Compiled Mar 17 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 49.12 sec. (0.02 init + 49.10 sieve) at 613957 p/sec.
Processor time: 3.81 sec. (0.02 init + 3.79 sieve) at 7951250 p/sec.
Average processor utilization: 1.04 (init), 0.08 (sieve)
and again at 667 MHz with no load on the CPU (shaders linked):
./ppsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
ppsieve version cuda-0.1.1-rc1 (testing)
Compiled Mar 17 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 41.87 sec. (0.02 init + 41.86 sieve) at 720229 p/sec.
Processor time: 3.69 sec. (0.02 init + 3.67 sieve) at 8215573 p/sec.
Average processor utilization: 1.04 (init), 0.09 (sieve)
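The summary lines in these logs can be read mechanically. The sketch below parses them and recomputes the sieve-phase utilization. It assumes, based only on the numbers posted here, that "Average processor utilization" is processor seconds divided by elapsed seconds; `parse_summary` and `sieve_utilization` are hypothetical helper names, not part of ppsieve.

```python
import re

# Matches both the "Elapsed time" and the "Processor time" summary lines.
SUMMARY = re.compile(
    r"(Elapsed|Processor) time: ([\d.]+) sec\. "
    r"\(([\d.]+) init \+ ([\d.]+) sieve\) at (\d+) p/sec\."
)


def parse_summary(line: str):
    """Return (kind, total_sec, init_sec, sieve_sec, p_per_sec) or None."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    kind, total, init, sieve, rate = m.groups()
    return kind, float(total), float(init), float(sieve), int(rate)


def sieve_utilization(elapsed_line: str, processor_line: str) -> float:
    """CPU cores kept busy during the sieve phase: CPU seconds / wall seconds."""
    wall_sieve = parse_summary(elapsed_line)[3]
    cpu_sieve = parse_summary(processor_line)[3]
    return cpu_sieve / wall_sieve


if __name__ == "__main__":
    elapsed = "Elapsed time: 49.12 sec. (0.02 init + 49.10 sieve) at 613957 p/sec."
    processor = "Processor time: 3.81 sec. (0.02 init + 3.79 sieve) at 7951250 p/sec."
    print(parse_summary(elapsed))
    print(round(sieve_utilization(elapsed, processor), 2))
```

A low utilization figure here means the CPU is mostly idle while the GPU sieves, so the elapsed p/sec rate is the one to compare between cards.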
____________
| |
|
|
64 bit GPU app - Test range 30M - speed comparison
./ppsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
My GTX 260-192 at stock clock with no load on the CPU:
ppsieve version cuda-0.1.1-beta (testing) : Elapsed time: 49.30 sec. (0.02 init + 49.29 sieve) at 611635 p/sec.
ppsieve version cuda-0.1.1-rc1 (testing) : Elapsed time: 49.12 sec. (0.02 init + 49.10 sieve) at 613957 p/sec.
and again at 667 MHz with no load on the CPU (shaders linked):
ppsieve version cuda-0.1.1-beta (testing) : Elapsed time: 42.05 sec. (0.02 init + 42.02 sieve) at 717374 p/sec.
ppsieve version cuda-0.1.1-rc1 (testing) : Elapsed time: 41.87 sec. (0.02 init + 41.86 sieve) at 720229 p/sec.
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
Thanks for all your testing help, guys! I've got one more thing to test, and I hope you won't mind because I expect it to be slower. (But I've been wrong before!)
Linux 64-bit users *only*, with pre-Fermi cards (Fermi isn't in stores yet if you didn't know), please try the two binaries in this zipfile. This is an experiment in 24-bit multiplies instead of 64-bit ones. Both binaries do 24-bit multiplies, despite their names, but they do other stuff differently. Even if it doesn't work here, this is a plausible algorithm for ATI if I can ever figure out how to develop for OpenCL without their GPU.
If anyone reading this *does* have a Fermi (GTX4xx), I'd love to see a benchmark from the original code (linked in the first post by John). If Fermi doesn't run 50-100% faster per shader, I may have to recompile or something for maximum speed.
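For readers unfamiliar with the idea of 24-bit multiplies: a wide product can be assembled from 24-bit pieces ("limbs"), so that every individual multiplication is at most 24 bits by 24 bits. The Python sketch below shows only that generic schoolbook decomposition. It is an assumption about the general technique, not the code in the test binaries, and `to_limbs` and `mul_24bit_limbs` are hypothetical names.

```python
LIMB_BITS = 24
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(x: int, count: int):
    """Split x into `count` little-endian 24-bit limbs."""
    if x < 0 or x >> (LIMB_BITS * count):
        raise ValueError("value does not fit in the requested number of limbs")
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]


def mul_24bit_limbs(a: int, b: int, count: int = 2) -> int:
    """Schoolbook multiply where every partial product is 24 bits x 24 bits."""
    total = 0
    for i, ai in enumerate(to_limbs(a, count)):
        for j, bj in enumerate(to_limbs(b, count)):
            partial = ai * bj                      # always fits in 48 bits
            assert partial < (1 << 48)
            total += partial << (LIMB_BITS * (i + j))
    return total


if __name__ == "__main__":
    p = 42070000070587                             # a p from this thread, below 2^48
    print(mul_24bit_limbs(p, p) == p * p)
```

Two limbs cover values below 2^48, which is enough for the p values tested in this thread (around 4.2e13). The trade-off is more multiplications per product, which may be why the experiment was expected to be slower on NVIDIA cards.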
____________
| |
|
Benva Volunteer tester
Joined: 5 May 08 Posts: 73 ID: 22332 Credit: 2,715,050 RAC: 0
|
SYSTEM Ubuntu 9.10
Intel Core2Duo T9550 @ 2.66GHZ
G105M
195.36.15 drivers
./ppsieve-cuda-x86_64-linux -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1-rc1 (testing)
Compiled Mar 17 2010 with GCC 4.3.3
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Thread 0 starting
Detected GPU 0: GeForce G 105M
Detected compute capability: 1.1
Detected 1 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 80.06 sec. (0.05 init + 80.02 sieve) at 39313 p/sec.
Processor time: 12.80 sec. (0.03 init + 12.77 sieve) at 246337 p/sec.
Average processor utilization: 0.67 (init), 0.16 (sieve)
pps-cuda-a1
./ppsieve-cuda-64bit-x86_64-linux -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.0-beta (testing)
Compiled Mar 29 2010 with GCC 4.3.3
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Starting 1 threads.
Detected GPU 0: GeForce G 105M
Detected compute capability: 1.1
Detected 1 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 166.18 sec. (0.04 init + 166.14 sieve) at 18934 p/sec.
Processor time: 13.11 sec. (0.04 init + 13.07 sieve) at 240683 p/sec.
Average processor utilization: 1.01 (init), 0.08 (sieve)
./ppsieve-cuda-24bit-x86_64-linux -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.0-beta (testing)
Compiled Mar 29 2010 with GCC 4.3.3
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Starting 1 threads.
Detected GPU 0: GeForce G 105M
Detected compute capability: 1.1
Detected 1 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 163.43 sec. (0.04 init + 163.39 sieve) at 19252 p/sec.
Processor time: 12.32 sec. (0.04 init + 12.28 sieve) at 256167 p/sec.
Average processor utilization: 1.01 (init), 0.08 (sieve)
____________
| |
|
|
64 bit GPU app - Test range 30M
./ppsieve-cuda-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
My GTX 260-192 at stock clock with no load on the CPU:
ppsieve version cuda-0.1.0-beta (testing)
Compiled Mar 29 2010 with GCC 4.3.3
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Starting 1 threads.
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.
p=42070020971521, 349.5K p/sec, 0.06 CPU cores, 69.9% done. ETA 01 Apr 05:42
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 87.11 sec. (0.02 init + 87.09 sieve) at 346140 p/sec.
Processor time: 4.61 sec. (0.02 init + 4.59 sieve) at 6566015 p/sec.
Average processor utilization: 1.11 (init), 0.05 (sieve)
./ppsieve-cuda-24bit-x86_64-linux -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
My GTX 260-192 at stock clock with no load on the CPU:
ppsieve version cuda-0.1.0-beta (testing)
Compiled Mar 29 2010 with GCC 4.3.3
nstart=76, nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Starting 1 threads.
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 24 multiprocessors.
p=42070019922945, 332.0K p/sec, 0.06 CPU cores, 66.4% done. ETA 01 Apr 05:47
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 94.13 sec. (0.02 init + 94.12 sieve) at 320306 p/sec.
Processor time: 4.69 sec. (0.02 init + 4.67 sieve) at 6449443 p/sec.
Average processor utilization: 1.10 (init), 0.05 (sieve)
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
Yeah, apparently my test version is slower as expected. There's probably no need to test it any more. But thanks for the testing you did!
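To put the posted timings side by side, here is a small sketch that computes how much slower each test binary ran relative to 0.1.1-rc1 on the same card. The elapsed times are copied from the logs above; the labels and the `slowdown` and `report` helpers are hypothetical. The two cards were run on different ranges (30M and 3M), so only comparisons within one card are meaningful.

```python
# Elapsed times in seconds, copied from the logs posted above.
RUNS = {
    "GTX 260 (30M range)": {"0.1.1-rc1": 49.12, "64bit test": 87.11, "24bit test": 94.13},
    "G 105M (3M range)": {"0.1.1-rc1": 80.06, "64bit test": 166.18, "24bit test": 163.43},
}


def slowdown(baseline_sec: float, candidate_sec: float) -> float:
    """Ratio above 1.0 means the candidate took longer than the baseline."""
    return candidate_sec / baseline_sec


def report(runs=RUNS, baseline="0.1.1-rc1"):
    """Return (card, build, slowdown rounded to 2 places) for every run."""
    rows = []
    for card, times in runs.items():
        for build, secs in times.items():
            rows.append((card, build, round(slowdown(times[baseline], secs), 2)))
    return rows


if __name__ == "__main__":
    for row in report():
        print(row)
```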
____________
| |
|
mfl0p
Joined: 5 Apr 09 Posts: 251 ID: 38042 Credit: 2,757,874,746 RAC: 17,206
|
Ken, whenever you get the code finalized, I can build a Win32 version if you need.
____________
| |
|
|
Okay, I took another range: 174550G to 174551G
Intel Xeon W3520 => 43 factors found; 855.903 k p/s
Nvidia GTX 260 => 43 factors found; 894.724 k p/s
Nvidia FX 580 => manually aborted due to long runtime; ~115 k p/s
Like in AP26 with mfl0p's newest app, the GTX 260 is a bit faster than the Xeon W3520. | |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
OK, new version uploaded, probably the finalized version, at the links in the first post. I found a somewhat major bug in previous versions: around 30 of the highest N's in the ranges were being skipped! But that's fixed now.
Bryan, see if you can get this code to build in VS, perhaps without BOINC first. If you need to make changes, perhaps I should set up a GitHub account?
____________
| |
|
mfl0p
Joined: 5 Apr 09 Posts: 251 ID: 38042 Credit: 2,757,874,746 RAC: 17,206
|
Ok, will try building Win32 version soon. Thanks Ken
____________
| |
|
|
Ken, if you're ready for me to do a Mac port, I'd be very happy to start on that if I can get the source code.
Cheers
- Iain | |
|
HAmsty Volunteer tester
Joined: 26 Dec 08 Posts: 132 ID: 33421 Credit: 12,510,712 RAC: 0
|
@Iain the source code is linked in the first post of this thread
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
Well, I thought I was done, but I've made a few more changes for the release version of PPSieve-CUDA. The biggest change is compiling with CUDA 3.0. I hope it works for everyone!
The biggest code change is that I gave up using boinc_init_parallel() in favor of boinc_init(), because it's more compatible. The rest of the code changes are to header files and paths to BOINC header files. So nothing major there.
By the way, apparently CUDA 3.0 introduces an easier way to lower CPU usage. It might go from 5% down to 1 or 2%. But I'm going to leave lowering CPU usage for V0.1.2, if it's needed.
____________
| |
|
|
Thanks Ken - I just got the 0.1.1-rc2 version ported to Mac OS X (only minor tweaks were required, as the __thread attribute is not supported by GCC on Mac OS X). I'll pick up the new code, rebuild ASAP, and post here when it's done (hopefully in a couple of days)... | |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
|
Compiling with CUDA 3.0 will probably mean that many will be forced to upgrade drivers to at least the 195.xx series. Just FYI... the 196.xx and 197.xx drivers have been noted to slow down many cards' computational speeds compared to the 190.xx and 191.xx drivers (especially 8xxx and 9xxx series cards under Win7 and Vista), so the gain from a freer CPU may actually be lost (and maybe exceeded) by the loss in speed on some cards.
____________
141941*2^4299438-1 is prime!
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
OK, it doesn't have to be compiled with 3.0 (yet). I wasn't sure if 2.3 would support Fermi. Since it looks like it does (PDF), I'll see about going back to 2.3.
Edit: To be clear, only the binaries will change, not the source code.
____________
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14045 ID: 53948 Credit: 483,803,363 RAC: 628,990
|
Compiling with CUDA 3.0 will probably mean that many will be forced to upgrade drivers to at least the 195.xx series. Just FYI... the 196.xx and 197.xx drivers have been noted to slow down many cards' computational speeds compared to the 190.xx and 191.xx drivers (especially 8xxx and 9xxx series cards under Win7 and Vista), so the gain from a freer CPU may actually be lost (and maybe exceeded) by the loss in speed on some cards.
The slowdowns some people have reported with some versions of the drivers have been significant. Around 25%, IIRC.
____________
My lucky number is 75898524288+1 | |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
|
OK, it doesn't have to be compiled with 3.0 (yet). I wasn't sure if 2.3 would support Fermi. Since it looks like it does (PDF), I'll see about going back to 2.3.
Edit: To be clear, only the binaries will change, not the source code.
That's good... am I reading it correctly that CUDA 2.3 devices will use the native CUBIN that can work with the older drivers, but Fermi devices will need to have the 195.xx driver or higher to utilize the PTX code?
____________
141941*2^4299438-1 is prime!
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
...And we're back to binaries compiled with CUDA 2.3. :)
Edit: Scott, I think that's right. I doubt there are any drivers older than that for Fermi.
____________
| |
|
|
The Mac OS X / CUDA version of ppsieve is now available for testing:
Mac 32 bit (OS 10.5+ required) - http://www.pyramid-productions.net/downloads/ppsieve-cuda-boinc-i686-apple-darwin.tar.gz
Note that only 32 bit CUDA executables are supported on the Mac, but as most runtime is spent on the GPU, this is not a problem. Since the upgrade to Mac OS 10.6.3, Apple only supports CUDA 3.0, so this app is built and linked with the CUDA 3.0 libraries. However, it should work fine on machines where CUDA 2.3 is installed. If you have a Mac running OS 10.5 and/or CUDA 2.3, I'd be very grateful for your testing.
To test the app, please use the same inputs as in the original post, and obviously the output should be the same!
On my machine (MacBookPro, 2.66 GHz Core 2 Duo, GeForce 9400M / 9600M GT) with the CPU idling, the 9400M takes
Elapsed time: 67.96 sec. (0.02 init + 67.94 sieve) at 46303 p/sec.
Processor time: 6.93 sec. (0.03 init + 6.90 sieve) at 455624 p/sec.
Average processor utilization: 1.54 (init), 0.10 (sieve)
and the 9600M GT takes:
Elapsed time: 41.52 sec. (0.02 init + 41.50 sieve) at 75805 p/sec.
Processor time: 3.90 sec. (0.03 init + 3.87 sieve) at 812352 p/sec.
Average processor utilization: 1.35 (init), 0.09 (sieve)
Please post any problems or performance results to this thread.
Thanks
- Iain | |
|
Kevin Volunteer tester
Joined: 4 Aug 09 Posts: 61 ID: 44488 Credit: 5,675,896 RAC: 0
|
Does anyone have plans for a Proth Prime Sieve ATI GPU app in the near future?
____________
May the Force be with you always.
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
An ATI app should be possible. This is the kind of highly-parallel, low-memory work that should work very well on ATI.
However, I haven't even been able to get their OpenCL compiler to run. Right now I'm focusing on getting the CPU and CUDA apps into BOINC, so ATI is off my radar for now.
____________
| |
|
|
Ok, will try building Win32 version soon. Thanks Ken
Any progress on the Windows version? Please? :-) | |
|
|
I ran the Mac version. Here's how the BOINC client sees my computer (24" iMac, OS X 10.6.3):
Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU E8335 @ 2.93GHz [x86 Family 6 Model 23 Stepping 10]
Processor features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM SSE3 MON DSCPL VMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1
OS: Darwin: 10.3.0
Memory: 4.00 GB physical, 559.57 GB virtual
Disk: 595.85 GB total, 559.32 GB free
Local time is UTC -7 hours
NVIDIA GPU 0: GeForce GT 120 (driver version unknown, CUDA version 3000, compute capability 1.1, 256MB, 80 GFLOPS peak)
I first suspended all BOINC tasks. Here's the output:
% /usr/bin/time ./ppsieve-cuda-boinc-i686-apple-darwin -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1 (testing)
Compiled Apr 16 2010 with GCC 4.2.1 (Apple Inc. build 5646) (dot 1)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1
Found 12 factors
28.37 real 3.61 user 0.13 sys
Screen repainting was pretty herky-jerky when the test was running, not that that's unexpected... it would have been pretty annoying if I was trying to do anything else.
-- Gary | |
|
|
That's great - thanks for testing Gary! | |
|
|
I think we have a problem here...:
I downloaded ppsieve-cuda from the link; the CPU ppsieve I use is from http://primesearchteam.com/showthread.php?t=25 and is version 0.3.4.
Here are my results:
ppsieve-0.3.4 on Intel Core 2 Quad Q9550
Running ppsieve-x86_64-linux with 4 threads.
ppsieve version 0.3.4 (testing)
Compiled Feb 21 2010 with GCC 4.1.2 20080704 (Red Hat 4.1.2-46)
Scanning ABCD file...
Found K's from 1201 to 9999.
Found N's from 0 to 2000000.
Algorithm not specified, starting benchmark...
bsf takes 420000; mul takes 580000; using standard algorithm.
nstart=1999980, nstep=35
Reading ABCD file.
Read 324490054 terms from ABCD format input file `ppse_137TE0.txt'
ppsieve initialized: 1201 <= k <= 9999, 80 <= n <= 2000000
Sieve started: 174550000000000 <= p < 174551000000000
Thread 0 starting
Thread 3 starting
Thread 2 starting
Thread 1 starting
p=174550993787905, 668.5K p/sec, 3.62 CPU cores, 99.4% done. ETA 14 May 20:14
Thread 3 completed
Waiting for threads to exit
Thread 1 completed
Thread 0 completed
Thread 2 completed
Sieve complete: 174550000000000 <= p < 174551000000000
count=30492087,sum=0x87435f1f71650555
Elapsed time: 1595.65 sec. (82.99 init + 1512.67 sieve) at 661137 p/sec.
Processor time: 5550.66 sec. (78.34 init + 5472.32 sieve) at 182752 p/sec.
Average processor utilization: 0.94 (init), 3.62 (sieve)
Found 16 factors
Run completed successfully!
ppsieve-cuda 0.1.1-rc1 (testing) on GeForce GTX260
Running ppsieve-cuda-x86_64-linux.
ppsieve version cuda-0.1.1-rc1 (testing)
Compiled Mar 17 2010 with GCC 4.3.3
Scanning ABCD file...
Found K's from 1201 to 9999.
Found N's from 0 to 2000000.
nstart=80, nstep=32, gpu_nstep=35
Reading ABCD file.
Read 324490054 terms from ABCD format input file `ppse_137TE0.txt'
ppsieve initialized: 1201 <= k <= 9999, 80 <= n <= 2000000
Sieve started: 174550000000000 <= p < 174551000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 27 multiprocessors.
p=174550966262785, 895.5K p/sec, 0.07 CPU cores, 96.6% done. ETA 14 May 21:05
Thread 0 completed
Waiting for threads to exit
Sieve complete: 174550000000000 <= p < 174551000000000
count=30492087,sum=0x87435f1f71650555
Elapsed time: 1203.92 sec. (81.91 init + 1122.01 sieve) at 891327 p/sec.
Processor time: 161.52 sec. (80.80 init + 80.73 sieve) at 12388301 p/sec.
Average processor utilization: 0.99 (init), 0.07 (sieve)
Found 43 factors
Run completed successfully!
[roadrunner@rr022 ppsieve-cuda]# diff -u fppse_174550G-174551G.txt ../ppsieve/fppse_174550G-174551G.txt
--- fppse_174550G-174551G.txt 2010-05-14 21:05:53.000000000 +0200
+++ ../ppsieve/fppse_174550G-174551G.txt 2010-05-14 21:11:44.000000000 +0200
@@ -1,43 +1,16 @@
174550025415817 | 7911*2^73648+1
174550045592773 | 2793*2^586237+1
-174550069177949 | 8745*2^1984556+1
-174550072026563 | 5457*2^226986+1
-174550072429729 | 9075*2^1747880+1
174550087108373 | 3009*2^653483+1
-174550160034671 | 1329*2^1681186+1
174550160534359 | 3255*2^959816+1
174550164384991 | 8355*2^47924+1
174550169778407 | 8553*2^689552+1
174550180112447 | 2569*2^714210+1
-174550180935937 | 1933*2^370384+1
-174550234719989 | 9149*2^1030559+1
-174550274164087 | 6729*2^1373601+1
-174550276818167 | 6207*2^1373038+1
-174550316241167 | 7731*2^1931925+1
-174550374684949 | 8743*2^638110+1
-174550399908163 | 1383*2^1894880+1
174550460586391 | 6543*2^1032642+1
-174550469318573 | 9217*2^1762344+1
-174550494079007 | 4001*2^157237+1
-174550503180689 | 5391*2^1644311+1
-174550579748341 | 3225*2^291262+1
-174550596690163 | 6799*2^1459850+1
-174550612854811 | 2377*2^1165082+1
174550639882459 | 5079*2^1786863+1
-174550644475447 | 4901*2^820583+1
174550668538153 | 7905*2^1360676+1
-174550683576527 | 1715*2^1236227+1
174550695157613 | 2919*2^951421+1
-174550731814651 | 2127*2^1357850+1
174550734921757 | 8485*2^1891676+1
-174550743947477 | 4499*2^832027+1
-174550772799551 | 6351*2^874484+1
174550799235079 | 7803*2^1302508+1
174550886243347 | 8783*2^1311925+1
-174550889752607 | 3299*2^1161717+1
-174550900074251 | 2993*2^1487561+1
-174550900755097 | 3675*2^1656171+1
-174550902785663 | 6123*2^493737+1
174550916859199 | 3197*2^1643463+1
-174550954763071 | 4191*2^1993452+1
174550971619799 | 5577*2^1063059+1
That is not good, I think... | |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
Hm, not good is right. The CUDA app performed fine - all those factors are valid. But the CPU app didn't! :Q
I'm in another race, but in about 3 hours I'll have enough free memory to look into this.
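Anyone can check a reported factor independently of either app, since a line such as `p | k*2^n+1` only claims that p divides k*2^n+1. The sketch below is a minimal Python check using modular exponentiation. The helper names (`parse_factor`, `is_valid_factor`, `check_lines`) are hypothetical, and the line format is assumed from the output posted in this thread.

```python
import re

# One factor line as printed in this thread, e.g. "13 | 3*2^2+1".
FACTOR_LINE = re.compile(r"^(\d+) \| (\d+)\*2\^(\d+)\+1$")


def parse_factor(line: str):
    """Split a factor line into the integers (p, k, n)."""
    m = FACTOR_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a factor line: {line!r}")
    p, k, n = (int(g) for g in m.groups())
    return p, k, n


def is_valid_factor(p: int, k: int, n: int) -> bool:
    """True if p divides k*2^n + 1; pow(2, n, p) keeps the numbers small."""
    return (k * pow(2, n, p) + 1) % p == 0


def check_lines(lines):
    """Return the lines whose claimed factor does not actually divide."""
    return [ln for ln in lines if not is_valid_factor(*parse_factor(ln))]


if __name__ == "__main__":
    # One of the lines that only the CUDA app reported (leading '-' removed).
    print(check_lines(["174550069177949 | 8745*2^1984556+1"]))
```

An empty list from `check_lines` means every claimed factor divides. Lines taken from a diff need their leading `-` or `+` stripped first.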
____________
| |
|
|
Okay. Meanwhile, I am cross-checking some other ranges I have on file and will keep you posted on the results. | |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
Well, I can't reproduce your CPU results. I get exactly the same results as you got with CUDA. But there could be several reasons for that.
For one thing, the version I downloaded from the PPSE Sieve Reservations page is 0.3.3. Not that 0.3.4 should produce bad results like that, but I can't compare directly.
PPSieve runs my CPU hot, at least as hot as fast LLR tests. Is it possible your machine isn't entirely stable?
Otherwise, PM me and I'll see about getting your version of files to test with.
____________
| |
|
|
Okay, I took 0.3.3 and all is fine.
Intel C2Q Q9550 vs Nvidia GeForce GTX260
diff -u fppse_174550G-174551G.txt ../ppsieve/fppse_174550G-174551G.txt
Doing 198000G to 198001G now on four platforms:
Intel C2Q Q9550
Intel Xeon W3520
Nvidia GeForce GTX260
Nvidia Quadro FX580
I think the host can be ruled out, since it has not produced faulty results in a year and BOINC WUs validate without problems. | |
|
|
All okay, the only thing that was different is:
# head -38 ../ppsieve/fppse_198000G-198500G.txt | diff -u fppse_198000G-198001G.gtx260.txt -
--- fppse_198000G-198001G.gtx260.txt 2010-05-15 08:00:15.000000000 +0200
+++ - 2010-05-15 13:15:06.343799000 +0200
@@ -13,9 +13,9 @@
198000446092087 | 8441*2^657907+1
198000480975821 | 5379*2^1828509+1
198000523921751 | 6541*2^909876+1
-198000544962289 | 8067*2^925640+1
198000545674577 | 5467*2^1099466+1
198000546654689 | 5593*2^925632+1
+198000544962289 | 8067*2^925640+1
198000583273579 | 2783*2^1822821+1
198000609451933 | 2667*2^1881395+1
198000664307197 | 1435*2^1989456+1
But this is okay; the factors are just in a different order.
All done with 0.3.3 for cpu and 0.1.1-rc1 for cuda.
My copy of 0.3.4 for cpu must be somewhat defective. | |
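Since output order can differ between runs, a plain `diff` reports differences even when both files hold the same factors. A minimal Python sketch of an order-insensitive comparison follows; `factor_set` and `compare_factor_lists` are hypothetical helper names, not tools from the ppsieve package.

```python
def factor_set(lines):
    """Collect non-empty lines, ignoring order and surrounding whitespace."""
    return {ln.strip() for ln in lines if ln.strip()}


def compare_factor_lists(a_lines, b_lines):
    """Return (only_in_a, only_in_b); both empty means the same factors."""
    a, b = factor_set(a_lines), factor_set(b_lines)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    gpu = [
        "198000544962289 | 8067*2^925640+1",
        "198000545674577 | 5467*2^1099466+1",
        "198000546654689 | 5593*2^925632+1",
    ]
    cpu = [gpu[1], gpu[2], gpu[0]]          # same factors, different order
    print(compare_factor_lists(gpu, cpu))
```

To compare files, pass open file objects instead of lists, since the helpers only iterate over lines.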
|
|
What does this error mean? I got it while computing the 198900G to 199000G range with cuda-0.1.1-rc1:
Computation Error: no candidates found for p=198908075406077 | |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
Two possibilities for computation errors. Most likely this one is because you're using 0.1.1-rc1. I believe I fixed a bug between rc2 and the final release that could rarely cause this error. It could definitely cause factors to be missed near NMax.
So please download the latest version from the link in the top post.
A computation error means that the GPU says it found some factor (it doesn't return what factor), but the CPU failed to find a factor in that range. So it could also be caused by an unstable GPU or rarely an unstable CPU.
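The CPU-side recheck described above can be illustrated with a short sketch. For an odd p, p divides k*2^n+1 exactly when k is congruent to -(2^-n) mod p, so stepping n while repeatedly multiplying by the inverse of 2 yields the one possible k for each n. This Python version is an illustration under that assumption, not PPSieve's actual implementation: `find_factors` and `recheck` are hypothetical names, and the sketch requires p to be odd and larger than kmax.

```python
def find_factors(p: int, kmin: int, kmax: int, nmin: int, nmax: int):
    """List (k, n) with kmin <= k <= kmax, nmin <= n <= nmax and p | k*2^n + 1.

    Requires p odd and p > kmax, so each n has at most one candidate k.
    """
    inv2 = (p + 1) // 2              # inverse of 2 modulo an odd p
    x = pow(inv2, nmin, p)           # 2^-nmin mod p
    hits = []
    for n in range(nmin, nmax + 1):
        k = (p - x) % p              # k == -(2^-n) mod p
        if kmin <= k <= kmax:
            hits.append((k, n))
        x = x * inv2 % p             # advance to 2^-(n+1) mod p
    return hits


def recheck(p: int, kmin: int, kmax: int, nmin: int, nmax: int):
    """Mimic the situation above: the GPU flagged p, so the CPU must find a factor."""
    hits = find_factors(p, kmin, kmax, nmin, nmax)
    if not hits:
        raise RuntimeError(f"Computation Error: no candidates found for p={p}")
    return hits


if __name__ == "__main__":
    print(recheck(13, 1, 12, 1, 4))
```

If the GPU flags a p by mistake (for example because of unstable hardware), the search comes back empty and the error is raised, which matches the behavior described in the post.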
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
FYI, the source for PPSieve CUDA is now on GitHub!
http://github.com/Ken-g6/PSieve-CUDA
So now you can see the various directions I've considered. The redc branch is the current one, but I have an idea for the other branch that might pull it ahead, if I can find a large enough, fast-enough area of memory; maybe texture memory.
But first, since I've heard nothing from mfl0p, I think I'd better try to set up a WinXP VM and build a version for Windows.
____________
| |
|
|
Thank you very much for setting up the repository. This makes it easier to follow the developments. I think it is time for me to reinstall the NVIDIA drivers and their CUDA toolkit under Lucid Lynx and get my GTX 260-192 out of hibernation mode again (in the last few weeks I've crunched with a HD 4770 under Windows and Linux).
By the way: The repo contains a file named pps/ppse_37TE1.txt that is a link to a file in a /downloads/... directory that is not in the repo. Is this file too large to include in the repository or are there other reasons not to include the file?
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
Heh, didn't know that file was in there. It's a 1.2 GB file, so there's no way to include it. Plus it's not going to be used with BOINC, so its only purpose here would be for testing with many_n_test.sh and maybe some of the other testing scripts. It's not useful for the testing we're doing in this thread.
Edit: By the way, the code hasn't changed in about a month. I just made the code and its previous changes easier to access.
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
Alright, people, I need HELP with MSVC++.
I tried compiling the source with VC++ 2008 Express. The files compile fine, but when linking, it's like no file sees any other file's header. I included the header files - even some new ones to replace missing Linux versions, so I'm not sure what's going on.
If you know anything about MSVC++ (since I don't), could you please take a look at my source code?
Thanks!
P.S. What all has to be included in the source code to save the proper build instructions? Does the .sln file need to be there? I really want to avoid including the gigantic .ncb file.
____________
| |
|
|
Ken, I took a look at your source code. I am pretty new to C++, but I think I may have spotted your problem.
In your code you used
#include <assert.h>
To include the header, I believe it needs to be enclosed in quotation marks.
#include "assert.h"
I tried to compile the code after changing it. It compiled further before failing, but I think the reason I couldn't finish the build is that I am running Visual C++ 2010.
I also noticed that a couple of the headers you are trying to load don't appear to exist: "util.h" and "gfn_app.h". There is a "putil.h", so maybe the name was just mistyped.
Hope this helps,
____________
| |
|
Jay Volunteer tester
Joined: 28 Apr 10 Posts: 82 ID: 59636 Credit: 10,419,429 RAC: 0
|
Tanya, angle brackets (<>) are used for including non-user-written libraries that are (or should be) in your compiler path. You use the quotes ("") when what you're including is in the same directory or in the directory of another file that includes it. If it's still not found, I think it then checks the compiler path.
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
I also noticed that a couple of the headers you are trying to load don't appear to exist, "util.h" and "gfn_app.h" There is a "putil.h" so maybe the name was just mistyped.
No, the name was not mistyped. Look at the #ifdef's. If USE_BOINC is #define'd, it will use util.h; but I'm not trying to compile the BOINC version yet.
By the way, I wasn't sure if project-level preprocessor directives got included in the file I zipped up. You should make sure NDEBUG is #defined in the project, or you may get more errors than I did.
I couldn't find any reference to gfn_app.h or gfn_main.h. Where did you see that?
But I still don't think that will fix the 131 errors with 74 unresolved externals.
____________
| |
|
|
Jay, what you are saying sounds mostly right, so I'm not sure if you're saying I got something wrong in my earlier message. I do know that for including headers, at least with the 2010 version of Visual C++, I need to use quotes or the header won't work, and I have used a non-user-written library before: #include <iostream>. I didn't think that the non-user-written library had to be included in the compiler's path, although I don't know where it would be.
Perhaps I have misunderstood something, as I have done very little with C++.
____________
| |
|
|
I couldn't find any reference to gfn_app.h or gfn_main.h. Where did you see that?
Instead of bringing up the full project in Visual C++, I went looking through the folder at the individual files. One file was named gfn_main.c, which I opened to look at the code. That was where I saw the line
#include <gfn_app.h>
That is also where the line "#include <gfn_main.c>" is.
By the way, I wasn't sure if project-level preprocessor directives got included in the file I zipped up. You should make sure NDEBUG is #defined in the project, or you may get more errors than I did.
I'm afraid you've lost me here.
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
Getting better. It seems most of my problem was self-inflicted. I read something about how to get rid of
LINK : warning LNK4098: defaultlib "LIBCMT" conflicts with use of other libs; use /NODEFAULTLIB:library
It involved not loading a bunch of default libraries. Hence linker errors.
Another large part was solved by linking the CUDA libraries. I'm down to two unresolved references, which are probably just because those functions aren't in MSVC.
Thanks! I'll let you know if I hit any more roadblocks.
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
Alright, I think I have a working binary! So if you have Win32 or Win64 and want to test it, please download my source and binary zipfile, and run the usual test on the binary in the Release folder.
Next steps include making it work for BOINC and fixing a checkpointing bug in *all other versions*. Don't let me forget to do that!
____________
| |
|
|
I downloaded the zipfile and tried to run the exe in the Release folder. I got a message that the program can't start because cudart.dll is missing from my computer. I think I may have found a place to get the cudart.dll. Do I need to get it and put it in the directory with the exe, or is there something else I need to do?
____________
| |
|
|
Just for fun I ran your Windows CUDA ppsieve on my Win32 XP machine... with NO CUDA card.
(yes, I know)
After grabbing a cudart.dll (from the distributed.net beta client),
it ran like this:
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Thread 0 starting
Detected GPU 0: Device Emulation (CPU)
Detected compute capability: 9999.9999
Detected 16 multiprocessors.
Insufficient available memory on GPU 0.
Waiting for threads to exit
Sieve incomplete: 42070000000000 <= p < 42070000000001
Found 0 factors
count=0,sum=0x0000000000000000
Elapsed time: 0.03 sec. (0.03 init + 0.00 sieve) at -1 p/sec.
Processor time: 0.05 sec. (0.05 init + 0.00 sieve) at -1 p/sec.
Average processor utilization: 1.50 (init), -1.#J (sieve)
So... it didn't fail, which is good. :)
As a comparison, the dnetc client goes like this:
distributed.net client for CUDA 2.2 on Win32 Copyright 1997-2009, distributed.net
Please visit http://www.distributed.net/ for up-to-date contest information.
Start the client with '-help' for a list of valid command line options.
dnetc v2.9107-516-CTR-09122712 for CUDA 2.2 on Win32 (WindowsNT 5.1).
Please provide the *entire* version descriptor when submitting bug reports.
The distributed.net bug report pages are at http://bugs.distributed.net/
[Jun 04 01:04:36 UTC] Unable to locate CUDA module handle
[Jun 04 01:04:36 UTC] No CUDA-supported GPU found.
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
|
I downloaded the zipfile and tried to run the exe in the Release folder. I got a message that the program can't start because cudart.dll is missing from my computer. I think I may have found a place to get the cudart.dll. Do I need to get it and put it in the directory with the exe, or is there something else I need to do?
You should be able to copy it from the BOINC directory.
____________
141941*2^4299438-1 is prime!
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
|
i7-920 (stock clocks)
6GB RAM
Vista Home Premium 64-bit [Version 6.0.6002]
BOINC suspended for tests
9500GT (512mb, factory OC card, 191.07 driver)
C:\Users\Scott\Downloads\ppsieve-cuda-vc\Release>ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce 9500 GT
Detected compute capability: 1.1
Detected 4 multiprocessors.
p=42070007340033, 1.490K p/sec, 0.01 CPU cores, 24.5% done. ETA 04 Jun 00:09
all factors match.
____________
141941*2^4299438-1 is prime!
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
p=42070007340033, 1.490K p/sec, 0.01 CPU cores, 24.5% done. ETA 04 Jun 00:09
all factors match.
I did a spit take! But then I realized that's the wrong line. What did the line that starts with "Elapsed time" say?
____________
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
|
p=42070007340033, 1.490K p/sec, 0.01 CPU cores, 24.5% done. ETA 04 Jun 00:09
all factors match.
I did a spit take! But then I realized that's the wrong line. What did the line that starts with "Elapsed time" say?
Sorry, I stopped it running at about 25% (I am switching the card out this evening for an ATI 4670 that I just picked up). A wall clock estimate for the total run time based on the 25% complete would be in the neighborhood of about 3-3.5 hours. Also, and interestingly, I had very little delayed screen response.
I am at home, but tomorrow I can test it on 32-bit systems with various CUDA cards (9600 GSO, 9600GS, 8600 GT, 8400 GS, 8300 GS). Might try my laptop's 8400M GS tonight...is there a memory minimum limit?
EDIT:
Okay, before pulling the 9500GT, I have gone back and run the shorter 3M test with the following output/results:
ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Ignoring invalid checkpoint in ppcheck42070e9.txt
Thread 0 starting
Detected GPU 0: GeForce 9500 GT
Detected compute capability: 1.1
Detected 4 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
p=42070001310721, 21.85K p/sec, 0.11 CPU cores, 43.7% done. ETA 03 Jun 22:38
42070001040127 | 6471*2^37907+1
p=42070001572865, 4.369K p/sec, 0.04 CPU cores, 52.4% done. ETA 03 Jun 22:40
p=42070002097153, 8.738K p/sec, 0.03 CPU cores, 69.9% done. ETA 03 Jun 22:40
p=42070002359297, 4.369K p/sec, 0.03 CPU cores, 78.6% done. ETA 03 Jun 22:41
p=42070002621441, 4.369K p/sec, 0.02 CPU cores, 87.4% done. ETA 03 Jun 22:42
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
p=42070003145729, 3.757K p/sec, 0.02 CPU cores, 104.9% done. ETA 03 Jun 22:43
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 496.42 sec. (0.03 init + 496.38 sieve) at 6337 p/sec.
Processor time: 18.74 sec. (0.05 init + 18.69 sieve) at 168320 p/sec.
Average processor utilization: 1.38 (init), 0.04 (sieve)
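As a sanity check on the summary lines above, the reported rate can be reproduced from figures already in the log. This is a minimal Python sketch, not part of ppsieve. It assumes (an inference from the numbers, not something confirmed in this thread) that the rate is the distance actually sieved, up to the last reported p, divided by the sieve time.

```python
# Figures copied from the 9500 GT log above.
P_START = 42070000000000   # lower bound from "Sieve started"
P_END = 42070003000000     # requested upper bound
P_LAST = 42070003145729    # p in the final progress line (104.9% done)
SIEVE_SECONDS = 496.38     # sieve part of the "Elapsed time" line


def rate(p_from: int, p_to: int, seconds: float) -> float:
    """Return throughput in p/sec over the range [p_from, p_to)."""
    return (p_to - p_from) / seconds


# Assumption: ppsieve counts the p it actually reached, not the requested range.
actual = rate(P_START, P_LAST, SIEVE_SECONDS)
nominal = rate(P_START, P_END, SIEVE_SECONDS)

print(f"actual:  {actual:.0f} p/sec")   # agrees with the logged 6337 p/sec
print(f"nominal: {nominal:.0f} p/sec")  # lower, because the run overshoots 3M
```

The overshoot past 100% would also explain why the final progress line reads 104.9% done.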
____________
141941*2^4299438-1 is prime!
| |
|
|
i7-920 @ 2.8 GHz
6GB RAM
Win7-64
GTX 260 Core 216 (Factory OC)
BOINC suspended for all tests
D:\Patrick\ppsieve-cuda-vc\Release>ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 27 multiprocessors.
p=42070030146561, 17.48K p/sec, 0.03 CPU cores, 100.5% done. ETA 03 Jun 21:40
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 842.70 sec. (0.03 init + 842.66 sieve) at 35775 p/sec.
Processor time: 37.74 sec. (0.03 init + 37.71 sieve) at 799528 p/sec.
Average processor utilization: 0.97 (init), 0.04 (sieve)
I swiped the cudart.dll from Collatz...
I am at home, but tomorrow I can test it on 32-bit systems with various CUDA cards (9600 GSO, 9600GS, 8600 GT, 8400 GS, 8300 GS). Might try my laptop's 8400M GS tonight...is there a memory minimum limit?
I am also curious as to what this may be...it depends on what version of the CUDA SDK this was compiled with...newer versions will run considerably faster on newer cards as well as include increased capabilities (double precision, anyone?)
I'll try getting the cudart.dll from a project like GPUGrid or Milkyway, which both use at least CUDA 2.2 (due to double precision support) and see what, if any, difference that makes...
EDIT: Whoa, put my foot in my mouth a bit there...Collatz would use 2.2...my bad... :p
And also, when I switched the cudart.dll with the one from GPUGrid, it made NO difference whatsoever...
____________
| |
|
|
Here's my result from the shorter test Scott posted above:
D:\Patrick\ppsieve-cuda-vc\Release>ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Thread 0 starting
Detected GPU 0: GeForce GTX 260
Detected compute capability: 1.3
Detected 27 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 11.19 sec. (0.05 init + 11.14 sieve) at 282411 p/sec.
Processor time: 4.18 sec. (0.06 init + 4.12 sieve) at 763818 p/sec.
Average processor utilization: 1.30 (init), 0.37 (sieve)
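The "Average processor utilization" figure for the sieve phase looks like processor time divided by elapsed time. A short Python check under that assumption, using numbers from this log and from the earlier 9500 GT run (the function name is only illustrative):

```python
def utilization(processor_seconds: float, elapsed_seconds: float) -> float:
    """Fraction of one CPU core used: processor time over wall-clock time."""
    return processor_seconds / elapsed_seconds


# (processor sieve seconds, elapsed sieve seconds), copied from the logs.
runs = {
    "GTX 260, 3M test": (4.12, 11.14),
    "9500 GT, 3M test": (18.69, 496.38),
}

for name, (cpu, wall) in runs.items():
    print(f"{name}: {utilization(cpu, wall):.2f} CPU cores")
```

Both values round to the figures the program printed (0.37 and 0.04). The init-phase figures do not reproduce as cleanly from the rounded times, so this only covers the sieve phase.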
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
OK, I had thought it was running over 1MP/s; it was just 1KP/s. I think something may be wrong with my sleep timing. I'll look into it and get back to you.
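The K-versus-M mix-up is easy to make when reading progress lines by eye. Here is a small Python sketch that turns the rate field into a plain number. The parser is hypothetical, and the "M" suffix is an assumption: only "K" and unsuffixed rates appear in this thread.

```python
import re

# "M" is assumed for completeness; the logs in this thread only show "K" or no suffix.
SUFFIXES = {"": 1.0, "K": 1e3, "M": 1e6}


def parse_rate(text: str) -> float:
    """Extract the p/sec figure from a ppsieve progress or summary line."""
    match = re.search(r"([0-9.]+)([KM]?) p/sec", text)
    if match is None:
        raise ValueError("no p/sec figure found")
    value, suffix = match.groups()
    return float(value) * SUFFIXES[suffix]


line = "p=42070007340033, 1.490K p/sec, 0.01 CPU cores, 24.5% done. ETA 04 Jun 00:09"
rate = parse_rate(line)
print(f"{rate:.0f} p/sec, {rate / 1e6:.5f} MP/s")
```

For the quoted 9500 GT line this gives about 1490 p/sec, roughly three orders of magnitude below 1 MP/s.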
____________
| |
|
|
OK, I had thought it was running over 1MP/s; it was just 1KP/s. I think something may be wrong with my sleep timing. I'll look into it and get back to you.
I did notice (through watching GPU usage on EVGA Precision) that the GPU usage never stayed constant...it would spike for a second or two to around 75% and then fall to zero for about 10-20 seconds....
Hope that helps!
And BTW, thanks Ken for building a Windows version! It seems like it has some more ground to cover to catch up with the Linux builds, but great job nonetheless!
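To put the spike pattern described above in numbers, here is a rough Python estimate of the GPU duty cycle. It uses only the figures from that observation (1 to 2 seconds busy, 10 to 20 seconds idle, peaks around 75%) and is an illustration, not a measurement.

```python
def duty_cycle(busy_seconds: float, idle_seconds: float) -> float:
    """Fraction of wall-clock time during which the GPU is doing work."""
    return busy_seconds / (busy_seconds + idle_seconds)


PEAK_USAGE = 0.75  # approximate utilization during a spike

best = duty_cycle(2, 10)   # longest spike, shortest idle gap
worst = duty_cycle(1, 20)  # shortest spike, longest idle gap

print(f"duty cycle: {worst:.1%} to {best:.1%}")
print(f"average load: {worst * PEAK_USAGE:.1%} to {best * PEAK_USAGE:.1%}")
```

Even in the best case the card is busy for well under a fifth of the run, which fits the idea that the host thread is sleeping too long between kernel launches.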
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
Well whaddaya know - I found a second bug, also in code I had thought was stable. That makes two bugs that - while they didn't affect results - could severely impact usability.
Alright, give the newly-updated zipfile a try and I'll see what's developed tomorrow. Thanks for testing!
____________
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
|
Updated code on 9500GT (short test - 3M):
ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Thread 0 starting
Detected GPU 0: GeForce 9500 GT
Detected compute capability: 1.1
Detected 4 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
p=42070001310721, 21.85K p/sec, 0.11 CPU cores, 43.7% done. ETA 04 Jun 01:14
42070001040127 | 6471*2^37907+1
p=42070001572865, 4.369K p/sec, 0.04 CPU cores, 52.4% done. ETA 04 Jun 01:16
p=42070002097153, 8.738K p/sec, 0.03 CPU cores, 69.9% done. ETA 04 Jun 01:16
p=42070002359297, 4.369K p/sec, 0.03 CPU cores, 78.6% done. ETA 04 Jun 01:17
p=42070002621441, 4.369K p/sec, 0.02 CPU cores, 87.4% done. ETA 04 Jun 01:17
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
p=42070003145729, 3.763K p/sec, 0.02 CPU cores, 104.9% done. ETA 04 Jun 01:19
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 496.11 sec. (0.04 init + 496.07 sieve) at 6341 p/sec.
Processor time: 18.77 sec. (0.05 init + 18.72 sieve) at 168040 p/sec.
Average processor utilization: 1.06 (init), 0.04 (sieve)
____________
141941*2^4299438-1 is prime!
| |
|
valterc Volunteer tester
Joined: 30 May 07 Posts: 121 ID: 8810 Credit: 20,947,020,974 RAC: 5,581,214
|
My own test follows. Q9450@3400, W7U (cudart v2.3)
ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Thread 0 starting
Detected GPU 0: GeForce GTX 275
Detected compute capability: 1.3
Detected 30 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 9.75 sec. (0.02 init + 9.73 sieve) at 323155 p/sec.
Processor time: 3.56 sec. (0.03 init + 3.53 sieve) at 892248 p/sec.
Average processor utilization: 2.00 (init), 0.36 (sieve)
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
|
Pentium D 965 Extreme Edition (HT turned on)
ASUS 9600 GSO (factory OC "TOP" version, 384mb)
Microsoft Windows XP Pro (32-bit) [Version 5.1.2600]
ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Thread 0 starting
Detected GPU 0: GeForce 9600 GSO
Detected compute capability: 1.1
Detected 12 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 6.56 sec. (0.05 init + 6.52 sieve) at 482798 p/sec.
Processor time: 1.39 sec. (0.06 init + 1.33 sieve) at 2368548 p/sec.
Average processor utilization: 1.33 (init), 0.20 (sieve)
____________
141941*2^4299438-1 is prime!
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
|
Same 9600 GSO on 30M test:
ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce 9600 GSO
Detected compute capability: 1.1
Detected 12 multiprocessors.
p=42070029360129, 489.3K p/sec, 0.19 CPU cores, 97.9% done. ETA 04 Jun 08:16
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 62.28 sec. (0.05 init + 62.23 sieve) at 484404 p/sec.
Processor time: 12.30 sec. (0.25 init + 12.05 sieve) at 2502438 p/sec.
Average processor utilization: 5.33 (init), 0.19 (sieve)
____________
141941*2^4299438-1 is prime!
| |
|
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14045 ID: 53948 Credit: 483,803,363 RAC: 628,990
|
I ran the test on my Vista-32/Q6600/GTX280.
I won't bother posting the output from the program, because that's not the interesting part.
My GPU temperature *barely* nudged from its idle temperature. That's a really, really, bad sign and indicates that the GPU isn't being utilized efficiently.
Taking a look at the GPU utilization graph on GPU-Z, it showed that the vast majority of the time the utilization was 0%. About every 15 seconds or so, the utilization briefly spiked way up, then returned back to 0. Even stranger was that it wasn't using the CPU during the time the GPU was idle. CPU utilization was at about 10% to 20% of a single core according to task manager. (The output from the program said it was using 0.03 CPU cores, which was significantly lower than what task manager was showing.)
So, for most of the run time, it's not using the GPU or the CPU. I would guess that it's either waiting on a resource or sleeping.
____________
My lucky number is 75898^524288+1
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
|
Pentium D 830 (using --device # option to test on both GPUs)
9600 GSO (ASUS factory OC "TOP" version, 384mb)
9600 GS (768 mb)
Microsoft Windows XP Home (32-bit) [Version 5.1.2600]
DEVICE 0:
ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Thread 0 starting
Detected GPU 0: GeForce 9600 GSO
Detected compute capability: 1.1
Detected 12 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 6.72 sec. (0.08 init + 6.64 sieve) at 473710 p/sec.
Processor time: 1.66 sec. (0.11 init + 1.55 sieve) at 2033602 p/sec.
Average processor utilization: 1.40 (init), 0.23 (sieve)
DEVICE 1:
ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal --device 1
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Thread 0 starting
Detected GPU 1: GeForce 9600 GS
Detected compute capability: 1.1
Detected 6 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 15.97 sec. (0.05 init + 15.92 sieve) at 197573 p/sec.
Processor time: 2.78 sec. (0.09 init + 2.69 sieve) at 1170503 p/sec.
Average processor utilization: 2.00 (init), 0.17 (sieve)
____________
141941*2^4299438-1 is prime!
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
|
I ran the test on my Vista-32/Q6600/GTX280.
I won't bother posting the output from the program, because that's not the interesting part.
My GPU temperature *barely* nudged from its idle temperature. That's a really, really, bad sign and indicates that the GPU isn't being utilized efficiently.
Taking a look at the GPU utilization graph on GPU-Z, it showed that the vast majority of the time the utilization was 0%. About every 15 seconds or so, the utilization briefly spiked way up, then returned back to 0. Even stranger was that it wasn't using the CPU during the time the GPU was idle. CPU utilization was at about 10% to 20% of a single core according to task manager. (The output from the program said it was using 0.03 CPU cores, which was significantly lower than what task manager was showing.)
So, for most of the run time, it's not using the GPU or the CPU. I would guess that it's either waiting on a resource or sleeping.
Hmmm...this might show something about Vista specifically. On my 9600GSO under 32-bit XP Pro, GPU-Z shows the GPU utilization at 99% for the whole test. My unusually long 9500GT results (which on the OC'ed 32-shader card should be similar to the stock clocked 9600 GS 48-shader card) are also obtained on Vista (albeit 64-bit). Looks like something in the code is not activating the GPU properly under Vista (and I'd suspect under Win 7 also).
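The size of the gap can be read straight off the summary lines already posted in this thread. A quick Python comparison follows; the shader counts are the ones quoted in the paragraph above, and clock differences are ignored, so treat it as a rough illustration only.

```python
# p/sec from the "Elapsed time" lines of the 3M tests earlier in the thread.
results = {
    "9500 GT, Vista 64-bit": 6341,    # updated build
    "9600 GS, XP 32-bit": 197573,     # run with --device 1
}

# Shader counts as stated in the post: 32 on the 9500 GT, 48 on the 9600 GS.
shader_ratio = 48 / 32

ratio = results["9600 GS, XP 32-bit"] / results["9500 GT, Vista 64-bit"]
print(f"throughput ratio: {ratio:.1f}x")
print(f"shader ratio:     {shader_ratio:.1f}x")
```

The throughput differs by a factor of about 31 while the shader counts differ by 1.5, so the hardware alone cannot account for the slow Vista result.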
____________
141941*2^4299438-1 is prime!
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
|
9600 GS on 30M test:
ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q --device 1
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 1: GeForce 9600 GS
Detected compute capability: 1.1
Detected 6 multiprocessors.
p=42070024641537, 204.4K p/sec, 0.15 CPU cores, 82.1% done. ETA 04 Jun 08:42
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 147.89 sec. (0.06 init + 147.83 sieve) at 203930 p/sec.
Processor time: 22.25 sec. (0.09 init + 22.16 sieve) at 1360635 p/sec.
Average processor utilization: 1.50 (init), 0.15 (sieve)
____________
141941*2^4299438-1 is prime!
| |
|
|
I redownloaded the zipfile, got the cudart.dll. Now I get this:
The application was unable to start correctly (0x000007b).
Any idea what's causing this?
____________
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
|
I redownloaded the zipfile, got the cudart.dll. Now I get this:
The application was unable to start correctly (0x000007b).
Any idea what's causing this?
A stop error with that code is usually associated with a problematic boot device (usually a hard drive)...kinda weird to see it with this CUDA application. You aren't by chance trying to run it off of a USB stick?
____________
141941*2^4299438-1 is prime!
| |
|
|
No USB stick. I've actually tried running it on several different computers, just in case it was a problem with the NVidia card. All three computers were running Windows 7.
____________
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
|
Pentium 4 (HT) 3.6Ghz
8600 GT (256mb)
Microsoft Windows XP Pro (32-bit) [Version 5.1.2600]
3M Test:
ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Thread 0 starting
Detected GPU 0: GeForce 8600 GT
Detected compute capability: 1.1
Detected 4 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 18.92 sec. (0.05 init + 18.88 sieve) at 166660 p/sec.
Processor time: 3.31 sec. (0.08 init + 3.23 sieve) at 972592 p/sec.
Average processor utilization: 1.67 (init), 0.17 (sieve)
30M Test:
ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce 8600 GT
Detected compute capability: 1.1
Detected 4 multiprocessors.
p=42070020185089, 165.3K p/sec, 0.17 CPU cores, 67.3% done. ETA 04 Jun 11:30
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 180.02 sec. (0.05 init + 179.97 sieve) at 167509 p/sec.
Processor time: 30.97 sec. (0.28 init + 30.69 sieve) at 982373 p/sec.
Average processor utilization: 6.00 (init), 0.17 (sieve)
____________
141941*2^4299438-1 is prime!
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
|
Pentium 4 (HT) 3.8 Ghz
8400 GS (256mb)
Microsoft Windows XP Pro (32-bit) [Version 5.1.2600]
3M Test:
ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Thread 0 starting
Detected GPU 0: GeForce 8400 GS
Detected compute capability: 1.1
Detected 2 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
p=42070002883585, 48.06K p/sec, 0.10 CPU cores, 96.1% done. ETA 04 Jun 11:43
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 63.83 sec. (0.05 init + 63.78 sieve) at 49319 p/sec.
Processor time: 6.34 sec. (0.20 init + 6.14 sieve) at 512281 p/sec.
Average processor utilization: 4.33 (init), 0.10 (sieve)
30M Test:
ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce 8400 GS
Detected compute capability: 1.1
Detected 2 multiprocessors.
p=42070028835841, 47.17K p/sec, 0.09 CPU cores, 96.1% done. ETA 04 Jun 11:54
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 632.41 sec. (0.05 init + 632.36 sieve) at 47673 p/sec.
Processor time: 59.66 sec. (0.23 init + 59.42 sieve) at 507331 p/sec.
Average processor utilization: 5.00 (init), 0.09 (sieve)
____________
141941*2^4299438-1 is prime!
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
|
Pentium 4 (HT) 3.6Ghz
8300 GS (128mb) ...This is about as slow as CUDA devices get!
Microsoft Windows XP Pro (32-bit) [Version 5.1.2600]
3M Test:
ppsieve-cuda.exe -p42070e9 -P42070003e6 -k 1201 -K 9999 -N 2000000 -z normal
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070003000000
Thread 0 starting
Detected GPU 0: GeForce 8300 GS
Detected compute capability: 1.1
Detected 1 multiprocessors.
42070000070587 | 9475*2^197534+1
42070000198537 | 3373*2^1046686+1
42070000300049 | 9139*2^461846+1
42070000345343 | 1715*2^635711+1
42070000464001 | 4179*2^1577462+1
42070000949861 | 4707*2^571847+1
42070001011573 | 7113*2^215532+1
42070001040127 | 6471*2^37907+1
p=42070001572865, 26.21K p/sec, 0.10 CPU cores, 52.4% done. ETA 04 Jun 11:33
42070002482267 | 9951*2^1920408+1
42070002690167 | 2553*2^1888870+1
42070002698543 | 4239*2^368773+1
42070002875941 | 4081*2^1494668+1
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070003000000
Found 12 factors
count=95668,sum=0x37dacb7121ccffe4
Elapsed time: 119.53 sec. (0.06 init + 119.47 sieve) at 26331 p/sec.
Processor time: 11.86 sec. (0.08 init + 11.78 sieve) at 267011 p/sec.
Average processor utilization: 1.25 (init), 0.10 (sieve)
30M Test:
ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce 8300 GS
Detected compute capability: 1.1
Detected 1 multiprocessors.
p=42070029622273, 26.21K p/sec, 0.10 CPU cores, 98.7% done. ETA 04 Jun 11:56
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 1186.17 sec. (0.06 init + 1186.11 sieve) at 25416 p/sec.
Processor time: 119.03 sec. (0.30 init + 118.73 sieve) at 253899 p/sec.
Average processor utilization: 4.75 (init), 0.10 (sieve)
____________
141941*2^4299438-1 is prime!
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
Alright, I'm getting the impression that sleep isn't fixed - at least not in all cases.
So, I'd like those of you who had problems with it in particular, and anyone else, to test the version I just uploaded, and please report the sleep diagnostics it outputs. By the way, "OVERslept by WAY TOO LONG!" is the line that indicates trouble; but I can only learn the magnitude of the trouble from the other lines.
The other option is to use one CPU core 100%, but I'd like to avoid that if I can.
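For anyone reading the diagnostics, here is a minimal sketch of the feedback rule the "Will sleep N usec next time." lines appear to follow. It is inferred from the logs posted in this thread, not taken from the ppsieve source: the name `next_sleep_usec` and the fixed 62500 usec back-off for the "WAY TOO LONG" case are assumptions chosen to match those logs.

```python
# Hypothetical reconstruction of the sleep adjustment, inferred from the
# "Will sleep N usec next time." lines in this thread. Not the ppsieve source.

WAY_TOO_LONG_BACKOFF_USEC = 62_500  # assumption: matches the posted logs


def next_sleep_usec(current, overslept=0, underslept=0, way_too_long=False):
    """Return the sleep to try next, given how far off the last one was."""
    if way_too_long:
        # No magnitude is printed for this case, so back off by a fixed
        # step (assumed value).
        adjusted = current - WAY_TOO_LONG_BACKOFF_USEC
    else:
        # Shorten the sleep after oversleeping, lengthen it after undersleeping.
        adjusted = current - overslept + underslept
    return max(adjusted, 0)  # a sleep can never be negative


if __name__ == "__main__":
    # Input values are taken from log lines posted in this thread.
    print(next_sleep_usec(590_625, overslept=46_875))    # 543750
    print(next_sleep_usec(403_125, underslept=390_625))  # 793750
    print(next_sleep_usec(606_250, way_too_long=True))   # 543750
```

A rule like this only settles if the time left over after each sleep is reasonably repeatable, which is what the requested diagnostics are meant to show.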
____________
HAmsty Volunteer tester
Joined: 26 Dec 08 Posts: 132 ID: 33421 Credit: 12,510,712 RAC: 0
I have this oversleeping, too.
Just some lines; there are really a lot more of these:
Will sleep 590625 usec next time.
Sleeping 590625 usec.
Actually sleeping 590625 usec.
OVERslept by 46875 usec.
Will sleep 543750 usec next time.
Sleeping 543750 usec.
Actually sleeping 543750 usec.
OVERslept by 46875 usec.
Will sleep 496875 usec next time.
Sleeping 496875 usec.
Actually sleeping 496875 usec.
OVERslept by 46875 usec.
Will sleep 450000 usec next time.
Sleeping 450000 usec.
Actually sleeping 450000 usec.
OVERslept by 46875 usec.
Will sleep 403125 usec next time.
Sleeping 403125 usec.
Actually sleeping 403125 usec.
OVERslept by 15625 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping 371875 usec.
Underslept by 0 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 403125 usec next time.
Sleeping 403125 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping 387500 usec.
Underslept by 0 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 403125 usec next time.
Sleeping 403125 usec.
Actually sleeping 403125 usec.
Underslept by 390625 usec.
Will sleep 793750 usec next time.
Sleeping 793750 usec.
Actually sleeping 793750 usec.
OVERslept by 46875 usec.
Will sleep 746875 usec next time.
Sleeping 746875 usec.
Actually sleeping 746875 usec.
OVERslept by 46875 usec.
Will sleep 700000 usec next time.
Sleeping 700000 usec.
Actually sleeping 700000 usec.
OVERslept by 46875 usec.
Will sleep 653125 usec next time.
Sleeping 653125 usec.
Actually sleeping 653125 usec.
OVERslept by 46875 usec.
Will sleep 606250 usec next time.
Sleeping 606250 usec.
Actually sleeping 590625 usec.
OVERslept by WAY TOO LONG!
Will sleep 543750 usec next time.
Sleeping 543750 usec.
Actually sleeping 528125 usec.
OVERslept by 46875 usec.
Will sleep 496875 usec next time.
Sleeping 496875 usec.
Actually sleeping 496875 usec.
OVERslept by WAY TOO LONG!
Will sleep 434375 usec next time.
Sleeping 434375 usec.
Actually sleeping 418750 usec.
OVERslept by 31250 usec.
Will sleep 403125 usec next time.
Sleeping 403125 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping 371875 usec.
Underslept by 0 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 403125 usec next time.
Sleeping 403125 usec.
Actually sleeping 403125 usec.
OVERslept by 15625 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping 387500 usec.
Underslept by 0 usec.
Will sleep 387500 usec next time.
Sleeping 387500 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 403125 usec next time.
Nvidia 8800 GTS 320MB G80
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 104.80 sec. (0.03 init + 104.77 sieve) at 287752 p/sec.
Processor time: 4.61 sec. (0.05 init + 4.56 sieve) at 6607465 p/sec.
Average processor utilization: 1.50 (init), 0.04 (sieve)
____________
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
Sleeping 403125 usec.
Actually sleeping 403125 usec.
Underslept by 390625 usec.
Will sleep 793750 usec next time.
There's the "money shot". Combined with other parts, this tells me that the timing is far too random for the current method to work.
I'll look at other options. I didn't see any timing less than 300,000 usec, so I might try finding the minimum.
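One way the "find the minimum" idea could look, as a rough sketch only: remember the shortest kernel time seen so far, sleep for a little less than that, and poll for the remainder. The class name `MinSleepEstimator` and the 15625 usec safety margin are assumptions for illustration (the margin simply mirrors the 15625 usec steps visible in the posted logs); nothing here is from the ppsieve source.

```python
# Hypothetical sketch of a "sleep for the minimum observed time" policy.
# The names and the margin value are illustrative assumptions.


class MinSleepEstimator:
    def __init__(self, margin_usec=15_625):
        self.margin_usec = margin_usec  # wake up this much early, then poll
        self.min_usec = None            # shortest kernel time seen so far

    def observe(self, kernel_usec):
        """Record one measured kernel duration."""
        if self.min_usec is None or kernel_usec < self.min_usec:
            self.min_usec = kernel_usec

    def sleep_usec(self):
        """Sleep to use before polling; 0 until something was measured."""
        if self.min_usec is None:
            return 0
        return max(self.min_usec - self.margin_usec, 0)


if __name__ == "__main__":
    est = MinSleepEstimator()
    for usec in (403_125, 387_500, 793_750):  # illustrative durations
        est.observe(usec)
    print(est.sleep_usec())  # 371875
```

Because the estimate can only go down, a single slow iteration cannot inflate later sleeps the way an additive correction can.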
____________
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14045 ID: 53948 Credit: 483,803,363 RAC: 628,990
OK, here are the results:
Vista-32/Q6600/GTX 280
BOINC shut down
C:\Temp\Release>ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce GTX 280
Detected compute capability: 1.3
Detected 30 multiprocessors.
Sleeping 0 usec.
Underslept by 826000 usec.
Will sleep 826000 usec next time.
Sleeping 826000 usec.
Actually sleeping 785000 usec.
Underslept by 825000 usec.
Will sleep 1651000 usec next time.
Sleeping 1651000 usec.
Actually sleeping 1628000 usec.
Underslept by 825000 usec.
Will sleep 2476000 usec next time.
Sleeping 2476000 usec.
Actually sleeping 2464000 usec.
Underslept by 825000 usec.
Will sleep 3301000 usec next time.
Sleeping 3301000 usec.
Actually sleeping 3261000 usec.
Underslept by 825000 usec.
Will sleep 4126000 usec next time.
Sleeping 4126000 usec.
Actually sleeping 4085000 usec.
Underslept by 825000 usec.
Will sleep 4951000 usec next time.
Sleeping 4951000 usec.
Actually sleeping 4933000 usec.
Underslept by 825000 usec.
Will sleep 5776000 usec next time.
Sleeping 5776000 usec.
Actually sleeping 5753000 usec.
Underslept by 825000 usec.
Will sleep 6601000 usec next time.
Sleeping 6601000 usec.
Actually sleeping 6578000 usec.
Underslept by 825000 usec.
Will sleep 7426000 usec next time.
Sleeping 7426000 usec.
Actually sleeping 7408000 usec.
Underslept by 825000 usec.
Will sleep 8251000 usec next time.
Sleeping 8251000 usec.
Actually sleeping 8208000 usec.
Underslept by 825000 usec.
Will sleep 9076000 usec next time.
Sleeping 9076000 usec.
Actually sleeping 9047000 usec.
Underslept by 825000 usec./sec, 0.17 CPU cores, 31.5% done. ETA 04 Jun 13:44
Will sleep 9901000 usec next time.
Sleeping 9901000 usec.
Actually sleeping 9872000 usec.
Underslept by 825000 usec.
Will sleep 10726000 usec next time.
Sleeping 10726000 usec.
Actually sleeping 10685000 usec.
Underslept by 826000 usec.
Will sleep 11552000 usec next time.
Sleeping 11552000 usec.
Actually sleeping 11544000 usec.
Underslept by 825000 usec.
Will sleep 12377000 usec next time.
Sleeping 12377000 usec.
Actually sleeping 12339000 usec.
Underslept by 828000 usec.
Will sleep 13205000 usec next time.
Sleeping 13205000 usec.
Actually sleeping 13146000 usec.
Underslept by 826000 usec./sec, 0.08 CPU cores, 43.7% done. ETA 04 Jun 13:45
Will sleep 14031000 usec next time.
Sleeping 14031000 usec.
Actually sleeping 14001000 usec.
Underslept by 825000 usec.
Will sleep 14856000 usec next time.
Sleeping 14856000 usec.
Actually sleeping 14831000 usec.
Underslept by 827000 usec.
Will sleep 15683000 usec next time.
Sleeping 15683000 usec.
Actually sleeping 15670000 usec.
Underslept by 825000 usec.
Will sleep 16508000 usec next time.
Sleeping 16508000 usec.
Actually sleeping 16472000 usec.
Underslept by 827000 usec./sec, 0.06 CPU cores, 53.3% done. ETA 04 Jun 13:46
Will sleep 17335000 usec next time.
Sleeping 17335000 usec.
Actually sleeping 17310000 usec.
Underslept by 826000 usec.
Will sleep 18161000 usec next time.
Sleeping 18161000 usec.
Actually sleeping 18128000 usec.
Underslept by 825000 usec.
Will sleep 18986000 usec next time.
Sleeping 18986000 usec.
Actually sleeping 18950000 usec.
Underslept by 825000 usec./sec, 0.04 CPU cores, 61.2% done. ETA 04 Jun 13:47
Will sleep 19811000 usec next time.
Sleeping 19811000 usec.
Actually sleeping 19758000 usec.
Underslept by 825000 usec.
Will sleep 20636000 usec next time.
Sleeping 20636000 usec.
Actually sleeping 20606000 usec.
Underslept by 827000 usec.
Will sleep 21463000 usec next time.
Sleeping 21463000 usec.
Actually sleeping 21432000 usec.
Underslept by 825000 usec./sec, 0.05 CPU cores, 68.2% done. ETA 04 Jun 13:48
Will sleep 22288000 usec next time.
Sleeping 22288000 usec.
Actually sleeping 22276000 usec.
Underslept by 826000 usec.
Will sleep 23114000 usec next time.
Sleeping 23114000 usec.
Actually sleeping 23096000 usec.
Underslept by 826000 usec.
Will sleep 23940000 usec next time.
Sleeping 23940000 usec.
Actually sleeping 23938000 usec.
Underslept by 825000 usec.
Will sleep 24765000 usec next time.
Sleeping 24765000 usec.
Actually sleeping 24746000 usec.
Underslept by 825000 usec.
Will sleep 25590000 usec next time.
Sleeping 25590000 usec.
Actually sleeping 25577000 usec.
Underslept by 826000 usec./sec, 0.04 CPU cores, 78.6% done. ETA 04 Jun 13:50
Will sleep 26416000 usec next time.
Sleeping 26416000 usec.
Actually sleeping 26391000 usec.
Underslept by 826000 usec.
Will sleep 27242000 usec next time.
Sleeping 27242000 usec.
Actually sleeping 27217000 usec.
Underslept by 825000 usec./sec, 0.03 CPU cores, 83.0% done. ETA 04 Jun 13:50
Will sleep 28067000 usec next time.
Sleeping 28067000 usec.
Actually sleeping 28041000 usec.
Underslept by 825000 usec.
Will sleep 28892000 usec next time.
Sleeping 28892000 usec.
Actually sleeping 28856000 usec.
Underslept by 825000 usec./sec, 0.03 CPU cores, 88.3% done. ETA 04 Jun 13:51
Will sleep 29717000 usec next time.
Sleeping 29717000 usec.
Actually sleeping 29693000 usec.
Underslept by 826000 usec.
Will sleep 30543000 usec next time.
Sleeping 30543000 usec.
Actually sleeping 30518000 usec.
Underslept by 825000 usec.
Will sleep 31368000 usec next time.
Sleeping 31368000 usec.
Actually sleeping 31345000 usec.
Underslept by 825000 usec.
Will sleep 32193000 usec next time.
Sleeping 32193000 usec.
Actually sleeping 32158000 usec.
Underslept by 824000 usec.
Will sleep 33017000 usec next time.
Sleeping 33017000 usec.
Actually sleeping 33015000 usec.
Underslept by 824000 usec./sec, 0.03 CPU cores, 97.9% done. ETA 04 Jun 13:52
Will sleep 33841000 usec next time.
Sleeping 33841000 usec.
Actually sleeping 33817000 usec.
Underslept by 826000 usec./sec, 0.02 CPU cores, 100.5% done. ETA 04 Jun 13:53
Will sleep 34667000 usec next time.
Sleeping 34667000 usec.
Actually sleeping 34621000 usec.
Underslept by 826000 usec.
Will sleep 35493000 usec next time.
Sleeping 35493000 usec.
Actually sleeping 35475000 usec.
Underslept by 824000 usec.sec, 0.03 CPU cores, 100.5% done. ETA 04 Jun 13:54
Will sleep 36317000 usec next time.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 819.74 sec. (0.05 init + 819.70 sieve) at 36778 p/sec.
Processor time: 39.81 sec. (0.05 init + 39.76 sieve) at 758125 p/sec.
Average processor utilization: 1.02 (init), 0.05 (sieve)
Same GPU spiking as before; average utilization was 5%.
____________
My lucky number is 75898^524288+1
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
There's the "money shot". Combined with other parts, this tells me that the timing is far too random for the current method to work.
I'll look at other options. I didn't see any timing less than 300,000 usec, so I might try finding the minimum.
Here are the results from one of the 9600GSO on XP Pro which look a fair bit different from the others:
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping 262500 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping 262500 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 231250 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 231250 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
p=42070028573697, 476.2K p/sec, 0.10 CPU cores, 95.2% done. ETA 04 Jun 14:53
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping -34375 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 63.91 sec. (0.05 init + 63.86 sieve) at 472077 p/sec.
Processor time: 6.56 sec. (0.27 init + 6.30 sieve) at 4787543 p/sec.
Average processor utilization: 5.67 (init), 0.10 (sieve)
____________
141941*2^4299438-1 is prime!
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
That is just weird!
The only way I can see that happening is if computation doesn't start when the kernel is called. Since it's not happening to everyone, maybe it's a CUDA runtime dll problem?
So let's all standardize on this DLL and try again.
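To make that hypothesis concrete, here is a small simulation, offered as a sketch under stated assumptions rather than a description of what the driver does. It assumes the additive "underslept, so sleep longer" rule seen in the logs, a fixed 825000 usec of GPU work per cycle, and 44 cycles (both figures read off the GTX 280 log earlier in the thread). The function name `simulate` and the flag `starts_at_launch` are made up for illustration.

```python
# Hypothetical model: what happens to the sleep length if the kernel only
# starts running once the host waits on it, versus starting at launch.


def simulate(work_usec, iterations, starts_at_launch):
    sleep = 0      # current sleep length in usec
    total = 0      # total wall time in usec
    history = []   # sleep used in each cycle
    for _ in range(iterations):
        history.append(sleep)
        if starts_at_launch:
            # The kernel runs during the sleep; only the remainder is waited for.
            remaining = max(work_usec - sleep, 0)
        else:
            # The kernel does not start until the host waits, so the full
            # work time is always still outstanding after the sleep.
            remaining = work_usec
        total += sleep + remaining
        sleep += remaining  # "Underslept by <remaining>": sleep longer next time
    return history, total


if __name__ == "__main__":
    history, total = simulate(825_000, 44, starts_at_launch=False)
    print(history[-1], total)  # 35475000 816750000
    _, total_ok = simulate(825_000, 44, starts_at_launch=True)
    print(total_ok)            # 36300000
```

In the deferred-start case the model gives about 817 seconds of wall time, close to the 819.74 seconds reported in that log, while the same work with the kernel starting at launch would take about 36 seconds under these assumptions.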
____________
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
That is just weird!
The only way I can see that happening is if computation doesn't start when the kernel is called. Since it's not happening to everyone, maybe it's a CUDA runtime dll problem?
So let's all standardize on this DLL and try again.
Downloaded and ran, but my results look essentially the same:
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping -65625 usec.
.
.
.
Sleeping 246875 usec.
Actually sleeping -50000 usec.
Underslept by 0 usec.
Will sleep 246875 usec next time.
Sleeping 246875 usec.
Actually sleeping 246875 usec.
Underslept by 15625 usec.
Will sleep 262500 usec next time.
Sleeping 262500 usec.
Actually sleeping 262500 usec.
OVERslept by 15625 usec.
Will sleep 246875 usec next time.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 62.47 sec. (0.03 init + 62.44 sieve) at 482828 p/sec.
Processor time: 8.92 sec. (0.27 init + 8.66 sieve) at 3482635 p/sec.
Average processor utilization: 8.50 (init), 0.14 (sieve)
Will give my Vista boxes (32-bit and 64-bit) a try when I get home.
____________
141941*2^4299438-1 is prime!
HAmsty Volunteer tester
Joined: 26 Dec 08 Posts: 132 ID: 33421 Credit: 12,510,712 RAC: 0
Same on my side. I've PM'ed Ken my stderr output.
____________
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
same on my side. i've pm'ed ken my std error output.
Same as Scott; not the same as you had before. Note the lack of "OVERslept by WAY TOO LONG!" messages. Combined with the steady rate, that tells me the new DLL made the run efficient. :)
Now I'm just hoping it fixes Michael Goetz's problem too.
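For long outputs, a quick script can do the same check. This is only a convenience sketch: the function name `summarize` is made up, and the patterns are based on the diagnostic lines as they appear in this thread.

```python
import re

# Matches "OVERslept by 46875 usec." and "Underslept by 825000 usec.",
# even when progress-line text trails on the same line.
LINE_RE = re.compile(r"(OVERslept|Underslept) by (\d+) usec")


def summarize(log_text):
    """Count the trouble lines and find the largest reported errors."""
    summary = {"way_too_long": 0, "max_overslept": 0, "max_underslept": 0}
    for line in log_text.splitlines():
        if "OVERslept by WAY TOO LONG!" in line:
            summary["way_too_long"] += 1
            continue
        match = LINE_RE.search(line)
        if match:
            kind, usec = match.group(1), int(match.group(2))
            key = "max_overslept" if kind == "OVERslept" else "max_underslept"
            summary[key] = max(summary[key], usec)
    return summary


if __name__ == "__main__":
    sample = "\n".join([
        "Sleeping 606250 usec.",
        "Actually sleeping 590625 usec.",
        "OVERslept by WAY TOO LONG!",
        "OVERslept by 46875 usec.",
        "Underslept by 390625 usec.",
    ])
    print(summarize(sample))
```

A run with a zero "way_too_long" count and small maxima would correspond to the steady behaviour described above.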
____________
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14045 ID: 53948 Credit: 483,803,363 RAC: 628,990
That is just weird!
The only way I can see that happening is if computation doesn't start when the kernel is called. Since it's not happening to everyone, maybe it's a CUDA runtime dll problem?
So let's all standardize on this DLL and try again.
OK, here's the result using that DLL, which I believe is the same as the one I was already using. Note that the sleep time starts at just under 1 second and steadily increases to around 30 seconds. During the sleep time, the GPU is idle.
EDIT: Correction, the new DLL is not the same one I used before. WinRAR seems to have a bug and was telling me they were the same when they were not. Nevertheless, this test was done with the correct DLL.
C:\Temp\Release>ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Thread 0 starting
Detected GPU 0: GeForce GTX 280
Detected compute capability: 1.3
Detected 30 multiprocessors.
Sleeping 0 usec.
Underslept by 831000 usec.
Will sleep 831000 usec next time.
Sleeping 831000 usec.
Actually sleeping 778000 usec.
Underslept by 828000 usec.
Will sleep 1659000 usec next time.
Sleeping 1659000 usec.
Actually sleeping 1631000 usec.
Underslept by 833000 usec.
Will sleep 2492000 usec next time.
Sleeping 2492000 usec.
Actually sleeping 2470000 usec.
Underslept by 830000 usec.
Will sleep 3322000 usec next time.
Sleeping 3322000 usec.
Actually sleeping 3263000 usec.
Underslept by 828000 usec.
Will sleep 4150000 usec next time.
Sleeping 4150000 usec.
Actually sleeping 4095000 usec.
Underslept by 834000 usec.
Will sleep 4984000 usec next time.
Sleeping 4984000 usec.
Actually sleeping 4954000 usec.
Underslept by 829000 usec.
Will sleep 5813000 usec next time.
Sleeping 5813000 usec.
Actually sleeping 5780000 usec.
Underslept by 844000 usec.
Will sleep 6657000 usec next time.
Sleeping 6657000 usec.
Actually sleeping 6623000 usec.
Underslept by 829000 usec.
Will sleep 7486000 usec next time.
Sleeping 7486000 usec.
Actually sleeping 7458000 usec.
Underslept by 829000 usec.
Will sleep 8315000 usec next time.
Sleeping 8315000 usec.
Actually sleeping 8255000 usec.
Underslept by 830000 usec.
Will sleep 9145000 usec next time.
Sleeping 9145000 usec.
Actually sleeping 9098000 usec.
Underslept by 829000 usec./sec, 0.16 CPU cores, 31.5% done. ETA 04 Jun 15:52
Will sleep 9974000 usec next time.
Sleeping 9974000 usec.
Actually sleeping 9939000 usec.
Underslept by 904000 usec.
Will sleep 10878000 usec next time.
Sleeping 10878000 usec.
Actually sleeping 10812000 usec.
Underslept by 827000 usec.
Will sleep 11705000 usec next time.
Sleeping 11705000 usec.
Actually sleeping 11694000 usec.
Underslept by 829000 usec.
Will sleep 12534000 usec next time.
Sleeping 12534000 usec.
Actually sleeping 12479000 usec.
Underslept by 835000 usec.
Will sleep 13369000 usec next time.
Sleeping 13369000 usec.
Actually sleeping 13297000 usec.
Underslept by 862000 usec./sec, 0.07 CPU cores, 43.7% done. ETA 04 Jun 15:53
Will sleep 14231000 usec next time.
Sleeping 14231000 usec.
Actually sleeping 14122000 usec.
Underslept by 829000 usec.
Will sleep 15060000 usec next time.
Sleeping 15060000 usec.
Actually sleeping 15031000 usec.
Underslept by 828000 usec.
Will sleep 15888000 usec next time.
Sleeping 15888000 usec.
Actually sleeping 15867000 usec.
Underslept by 833000 usec.
Will sleep 16721000 usec next time.
Sleeping 16721000 usec.
Actually sleeping 16674000 usec.
Underslept by 839000 usec./sec, 0.06 CPU cores, 53.3% done. ETA 04 Jun 15:54
Will sleep 17560000 usec next time.
Sleeping 17560000 usec.
Actually sleeping 17525000 usec.
Underslept by 826000 usec.
Will sleep 18386000 usec next time.
Sleeping 18386000 usec.
Actually sleeping 18317000 usec.
Underslept by 826000 usec.
Will sleep 19212000 usec next time.
Sleeping 19212000 usec.
Actually sleeping 19164000 usec.
Underslept by 825000 usec./sec, 0.04 CPU cores, 61.2% done. ETA 04 Jun 15:55
Will sleep 20037000 usec next time.
Sleeping 20037000 usec.
Actually sleeping 19965000 usec.
Underslept by 827000 usec.
Will sleep 20864000 usec next time.
Sleeping 20864000 usec.
Actually sleeping 20821000 usec.
Underslept by 826000 usec.
Will sleep 21690000 usec next time.
Sleeping 21690000 usec.
Actually sleeping 21620000 usec.
Underslept by 832000 usec./sec, 0.04 CPU cores, 68.2% done. ETA 04 Jun 15:56
Will sleep 22522000 usec next time.
Sleeping 22522000 usec.
Actually sleeping 22498000 usec.
Underslept by 827000 usec.
Will sleep 23349000 usec next time.
Sleeping 23349000 usec.
Actually sleeping 23328000 usec.
Underslept by 827000 usec.
Will sleep 24176000 usec next time.
Sleeping 24176000 usec.
Actually sleeping 24172000 usec.
Underslept by 826000 usec.
Will sleep 25002000 usec next time.
Sleeping 25002000 usec.
Actually sleeping 24975000 usec.
Underslept by 825000 usec.
Will sleep 25827000 usec next time.
Sleeping 25827000 usec.
Actually sleeping 25810000 usec.
Underslept by 826000 usec./sec, 0.03 CPU cores, 78.6% done. ETA 04 Jun 15:57
Will sleep 26653000 usec next time.
Sleeping 26653000 usec.
Actually sleeping 26623000 usec.
Underslept by 827000 usec.
Will sleep 27480000 usec next time.
Sleeping 27480000 usec.
Actually sleeping 27452000 usec.
Underslept by 838000 usec./sec, 0.03 CPU cores, 83.0% done. ETA 04 Jun 15:58
Will sleep 28318000 usec next time.
Sleeping 28318000 usec.
Actually sleeping 28285000 usec.
Underslept by 827000 usec.
Will sleep 29145000 usec next time.
Sleeping 29145000 usec.
Actually sleeping 29082000 usec.
Underslept by 862000 usec./sec, 0.03 CPU cores, 88.3% done. ETA 04 Jun 15:59
Will sleep 30007000 usec next time.
Sleeping 30007000 usec.
Actually sleeping 29962000 usec.
Underslept by 829000 usec.
Will sleep 30836000 usec next time.
Sleeping 30836000 usec.
Actually sleeping 30793000 usec.
Underslept by 825000 usec.
Will sleep 31661000 usec next time.
Sleeping 31661000 usec.
Actually sleeping 31632000 usec.
Underslept by 826000 usec.
Will sleep 32487000 usec next time.
Sleeping 32487000 usec.
Actually sleeping 32446000 usec.
Underslept by 831000 usec.
Will sleep 33318000 usec next time.
Sleeping 33318000 usec.
Actually sleeping 33315000 usec.
Underslept by 826000 usec./sec, 0.03 CPU cores, 97.9% done. ETA 04 Jun 16:00
Will sleep 34144000 usec next time.
Sleeping 34144000 usec.
Actually sleeping 34107000 usec.
Underslept by 827000 usec./sec, 0.01 CPU cores, 100.5% done. ETA 04 Jun 16:01
Will sleep 34971000 usec next time.
Sleeping 34971000 usec.
Actually sleeping 34914000 usec.
Underslept by 829000 usec.
Will sleep 35800000 usec next time.
Sleeping 35800000 usec.
Actually sleeping 35778000 usec.
Underslept by 824000 usec.sec, 0.03 CPU cores, 100.5% done. ETA 04 Jun 16:02
Will sleep 36624000 usec next time.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 828.16 sec. (0.04 init + 828.12 sieve) at 36404 p/sec.
Processor time: 37.16 sec. (0.03 init + 37.13 sieve) at 811958 p/sec.
Average processor utilization: 0.78 (init), 0.04 (sieve)
I'm running driver version 197.45
Vista -32 SP3
Q6600
GTX 280
____________
My lucky number is 75898^524288+1
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
With the same DLL, I am getting the exact same results (it just takes much longer) on my 9500GT in a Vista 64-bit system (driver 191.07). GPU-Z shows off-and-on spikes to about 30% GPU load, but never higher.
So far, all the issues are on Vista even with the same DLL and across multiple driver versions. We really need a Win7 machine to test this, too (unfortunately my one Win7 box has an ATI card).
____________
141941*2^4299438-1 is prime!
| |
|
|
Here are my results:
Windows XP 32-bit SP3
Intel E6850
8500GT
Latest program and the standard cuda.dll (from this thread)
Driver 196.21
BOINC suspended
OVERslept by 46875 usec.
[...] p/sec, 0.00 CPU cores, 97.0% done. ETA 04 Jun 23:17
Will sleep 903125 usec next time.
Sleeping 903125 usec.
Actually sleeping 903125 usec.
OVERslept by 46875 usec.
Will sleep 856250 usec next time.
Sleeping 856250 usec.
Actually sleeping 856250 usec.
OVERslept by 46875 usec.
Will sleep 809375 usec next time.
Sleeping 809375 usec.
Actually sleeping 809375 usec.
OVERslept by 46875 usec.
Will sleep 762500 usec next time.
Sleeping 762500 usec.
Actually sleeping 762500 usec.
OVERslept by 46875 usec.
Will sleep 715625 usec next time.
Sleeping 715625 usec.
Actually sleeping 715625 usec.
OVERslept by WAY TOO LONG!
Will sleep 653125 usec next time.
Sleeping 653125 usec.
Actually sleeping 637500 usec.
OVERslept by WAY TOO LONG!
Will sleep 590625 usec next time.
Sleeping 590625 usec.
Actually sleeping 590625 usec.
OVERslept by 46875 usec.
Will sleep 543750 usec next time.
Sleeping 543750 usec.
Actually sleeping 543750 usec.
OVERslept by 46875 usec.
Will sleep 496875 usec next time.
Sleeping 496875 usec.
Actually sleeping 496875 usec.
OVERslept by 46875 usec.
Will sleep 450000 usec next time.
Sleeping 450000 usec.
Actually sleeping 450000 usec.
OVERslept by 46875 usec.
Will sleep 403125 usec next time.
Sleeping 403125 usec.
Actually sleeping 403125 usec.
OVERslept by 46875 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 325000 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 325000 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
Underslept by 0 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 0 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Sleeping 356250 usec.
Actually sleeping 356250 usec.
OVERslept by 15625 usec.
Will sleep 340625 usec next time.
Sleeping 340625 usec.
Actually sleeping 340625 usec.
Underslept by 15625 usec.
Will sleep 356250 usec next time.
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 1139.53 sec. (0.06 init + 1139.47 sieve) at 26457 p/sec.
Processor time: 19.95 sec. (0.05 init + 19.91 sieve) at 1514427 p/sec.
Average processor utilization: 0.75 (init), 0.02 (sieve)
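A note on reading the sleep lines in the logs above: the "Will sleep N usec next time" values look like a plain feedback rule, where the reported undersleep is added to the current sleep length and the reported oversleep is subtracted from it. The sketch below is inferred from the logged numbers only, not taken from the ppsieve source; the function name `next_sleep_usec` and its parameters are made up for illustration.

```python
def next_sleep_usec(current_usec, error_usec, overslept):
    """Guess at the adjustment rule seen in the logs (an assumption, not ppsieve code).

    Shorten the next sleep after oversleeping, lengthen it after undersleeping.
    """
    if overslept:
        # Never return a negative sleep length.
        return max(0, current_usec - error_usec)
    return current_usec + error_usec


# Values copied from the two logs above.
print(next_sleep_usec(29145000, 862000, overslept=False))  # 30007000
print(next_sleep_usec(903125, 46875, overslept=True))      # 856250
print(next_sleep_usec(340625, 15625, overslept=False))     # 356250
```

The "OVERslept by WAY TOO LONG!" case is not covered: in the XP log the sleep drops by 62500 usec each time that message appears (715625 to 653125, then 653125 to 590625), so it seems to be handled separately.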
____________
| |
|
Ken_g6 Volunteer developer
Joined: 4 Jul 06 Posts: 941 ID: 3110 Credit: 265,313,309 RAC: 65,012
|
"OVERslept by WAY TOO LONG!"
Nuts, that didn't work.
OK, one more try. I think I've figured out the CUDA initialization here. I can't test it because I don't have a real card, but if it works the app should be perfectly efficient.
____________
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
|
Nuts, that didn't work.
OK, one more try. I think I've figured out the CUDA initialization here. I can't test it because I don't have a real card, but if it works the app should be perfectly efficient.
T8100 Vostro 1510
8400M GS (256 MB)
Microsoft Windows Vista (32-bit) [Version 6.0.6002]
ppsieve-cuda.exe -p42070e9 -P42070030e6 -k 1201 -K 9999 -N 2000000 -z normal -q
ppsieve version cuda-0.1.1 (testing)
nstart=76, nstep=32, gpu_nstep=32
ppsieve initialized: 1201 <= k <= 9999, 76 <= n <= 2000000
Sieve started: 42070000000000 <= p < 42070030000000
Resuming from checkpoint p=42070004194305 in ppcheck42070e9.txt
Thread 0 starting
Detected GPU 0: GeForce 8400M GS
Detected compute capability: 1.1
Detected 2 multiprocessors.
p=42070029097985, 34.95K p/sec, 0.01 CPU cores, 97.0% done. ETA 04 Jun 20:15
Thread 0 completed
Waiting for threads to exit
Sieve complete: 42070000000000 <= p < 42070030000000
Found 97 factors
count=955289,sum=0x2dbc17167afb6a8d
Elapsed time: 698.68 sec. (0.37 init + 698.31 sieve) at 37164 p/sec.
Processor time: 4.38 sec. (0.14 init + 4.24 sieve) at 6116159 p/sec.
Average processor utilization: 0.38 (init), 0.01 (sieve)
GPU-Z reported 96%-99% GPU utilization (mostly 99%) with 38 MB of VRAM used, and GPU temps were excellent.
I will test in a little while on the 9500GT in 64-bit Vista, but I think this got it. Nicely done, Ken!
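For anyone comparing runs, the "Average processor utilization" line in the summary above appears to be processor time divided by elapsed time for each phase. That is an assumption drawn from the numbers; the ppsieve source would have to confirm it. A quick check, with the hypothetical helper `utilization` and the times copied from the 8400M GS run:

```python
def utilization(processor_sec, elapsed_sec):
    """Fraction of one CPU core kept busy over the interval (assumed definition)."""
    return processor_sec / elapsed_sec


# (phase, processor seconds, elapsed seconds) from the 8400M GS summary lines.
phases = [("init", 0.14, 0.37), ("sieve", 4.24, 698.31)]
for label, cpu, wall in phases:
    print(f"{label}: {utilization(cpu, wall):.2f}")
```

This prints 0.38 for init and 0.01 for sieve, matching the reported line. The sieve figure is the one to watch: the lower it is, the less CPU the app burns while waiting on the GPU.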
____________
141941*2^4299438-1 is prime!
| |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
Joined: 17 Oct 05 Posts: 2420 ID: 1178 Credit: 20,215,893,042 RAC: 23,936,759
|
|