Message boards :
Number crunching :
Genefer on 5070 Ti
| Author |
Message |
mackerel Volunteer tester
 Send message
Joined: 2 Oct 08 Posts: 2883 ID: 29980 Credit: 716,426,128 RAC: 203,941
                                   
|
|
I got the MSI Ventus 5070 Ti. Ran some Genefer on it, thought some might be interested in how it performs. The Ventus is a lower cost model. They call it an "OC" model but it looks like MSI did the bare minimum to call it that. As such it is a lower performing model and close to reference.
Genefer 19 - 93% load, 250W, 1273s average
Genefer 18 - 85% load, 195W, 424s (1 unit)
Genefer 17 - 70% load, 150W, 182s average
Wonder how it compares to 4080 or 4080 Super as the nearest gaming performance models of last gen. Up to Ampere at least I saw greater compute improvements gen on gen relative to gaming. Never tested Ada in the same way.
If anyone has specific test requests I'll consider them. Must be on Windows. | |
|
EA6LE   Send message
Joined: 4 Feb 21 Posts: 65 ID: 1345478 Credit: 14,178,312,718 RAC: 14,121,605
                            
|
|
Looks like it is slower than a 4070 Ti.
check his link: https://www.primegrid.com/forum_thread.php?id=10186 | |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2526 ID: 1178 Credit: 27,697,724,091 RAC: 4,968,202
                                                           
|
|
Hmmm...maybe the drivers are not optimal yet?
That performance is very underwhelming. By comparison (running single units), the GFN18 time is only 30s faster than my RTX 4070 (not Super or Ti) Founders Edition. The GFN17 time is only 4s faster than my RTX 4060 Ti (also Founders Edition), and I get notably better throughput with that card when running two units at a time.
I'd be very curious to see your times running two GFN17 and two GFN18 at a time on that card (GFN17 throughput is better on the 4060 Ti and 4070, but GFN18 throughput is not better on either when doubling up).
| |
|
mackerel Volunteer tester
 Send message
Joined: 2 Oct 08 Posts: 2883 ID: 29980 Credit: 716,426,128 RAC: 203,941
                                   
|
Looks like it is slower than a 4070 Ti.
check his link: https://www.primegrid.com/forum_thread.php?id=10186
Those results are from 2 years ago. Work today could be very different. | |
|
mackerel Volunteer tester
 Send message
Joined: 2 Oct 08 Posts: 2883 ID: 29980 Credit: 716,426,128 RAC: 203,941
                                   
|
That performance is very underwhelming. By comparison (running single units), the GFN18 time is only 30s faster than my RTX 4070 (not Super or Ti) Founders Edition. The GFN17 time is only 4s faster than my RTX 4060 Ti (also Founders Edition), and I get notably better throughput with that card when running two units at a time.
Looking through your systems, your 4070 ran 3x GFN17 a week or so ago at around 280 seconds. Anything to note about its config? Looks like they're singles, if so that's quite a bit slower than the 182s I got. I also have a 4070 FE, running a couple units just now 212s average of two. Skimming through some of your many listed 4060 Ti systems I see the faster ones around 330s.
I did mention the reported loading since I knew the smaller units can't fully load the GPU. Anyone got a link to how to set up to run 2 units at once on GPU? | |
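On the question of running 2 units at once: the BOINC client reads an optional app_config.xml from the project's folder, and a gpu_usage of 0.5 tells it that each task claims half a GPU, so two run together. A minimal sketch that builds such a file follows. The app name genefer17low and the cpu_usage value are assumptions for illustration, not taken from this thread, so check the short names your own client reports before using it.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_names, tasks_per_gpu=2, cpu_usage=1.0):
    """Return app_config.xml text letting `tasks_per_gpu` tasks share one GPU."""
    root = ET.Element("app_config")
    for name in app_names:
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        gpu_versions = ET.SubElement(app, "gpu_versions")
        # gpu_usage is the fraction of a GPU each task claims: 0.5 means two at once.
        ET.SubElement(gpu_versions, "gpu_usage").text = str(1.0 / tasks_per_gpu)
        # cpu_usage is the CPU share reserved per GPU task (assumed value).
        ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


# Hypothetical app name; replace it with the name shown by your client.
config_text = build_app_config(["genefer17low"], tasks_per_gpu=2)
print(config_text)
```

The output would be saved as app_config.xml in the PrimeGrid project directory inside the BOINC data folder, then picked up by asking the client to re-read its config files or by restarting it.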
|
tng Send message
Joined: 29 Aug 10 Posts: 601 ID: 66603 Credit: 63,876,606,271 RAC: 15,486,133
                                                             
|
|
I've been doing some testing on a 5080 and a 5070TI. Still incomplete, especially for the TI.
(Hmm -- can't make an image work)
Basically unimpressive except for the 5080 on DYFL.
____________
| |
|
tng Send message
Joined: 29 Aug 10 Posts: 601 ID: 66603 Credit: 63,876,606,271 RAC: 15,486,133
                                                             
|
Looks like it is slower than a 4070 Ti.
check his link: https://www.primegrid.com/forum_thread.php?id=10186
Those results are from 2 years ago. Work today could be very different.
Testing underway now.
____________
| |
|
mackerel Volunteer tester
 Send message
Joined: 2 Oct 08 Posts: 2883 ID: 29980 Credit: 716,426,128 RAC: 203,941
                                   
|
I've been doing some testing on a 5080 and a 5070TI. Still incomplete, especially for the TI.
I just ran one unit of AP27, taking 169s and around 270W. The system does have various things loaded as it is in use, which may have some impact. Also, as mentioned, mine is not a heavily factory-overclocked board. Which model do you have? Your time of 146s is about 16% faster.
Edit: forgot to say, in gaming testing, I found very little difference in performance at a 250W power limit vs the default 300W. Some games didn't even pass 250W normally, and those that did exceed it lost less than 2% from the lower power limit. This might be an area to explore later. When manually overclocking I saw about a 12% increase, which appears to scale directly with the core clock. | |
|
tng Send message
Joined: 29 Aug 10 Posts: 601 ID: 66603 Credit: 63,876,606,271 RAC: 15,486,133
                                                             
|
I've been doing some testing on a 5080 and a 5070TI. Still incomplete, especially for the TI.
I just ran one unit of AP27, taking 169s and around 270W. The system does have various things loaded as it is in use, which may have some impact. Also, as mentioned, mine is not a heavily factory-overclocked board. Which model do you have? Your time of 146s is about 16% faster.
Edit: forgot to say, in gaming testing, I found very little difference in performance at a 250W power limit vs the default 300W. Some games didn't even pass 250W normally, and those that did exceed it lost less than 2% from the lower power limit. This might be an area to explore later. When manually overclocking I saw about a 12% increase, which appears to scale directly with the core clock.
My card is the ASUS TUF Gaming OC.
____________
| |
|
tng Send message
Joined: 29 Aug 10 Posts: 601 ID: 66603 Credit: 63,876,606,271 RAC: 15,486,133
                                                             
|
|
This task was on a 4090.
This was a 5080.
5080 only an hour slower.
____________
| |
|
Chooka   Send message
Joined: 15 May 18 Posts: 422 ID: 1014486 Credit: 1,731,155,421 RAC: 303,783
                                
|
|
Following with interest.
So far the 50 series looks like a complete dud! Especially on price. I can't justify dropping AUD$2500 on a 5080. No way. Same with the 5070Ti - Cheapest is AUD$1700...... for a 70 series! Insane.
It will be interesting to see if the 9070 XT puts downward pressure on the ridiculous NGREEDIA prices.
Appreciate the testing guys. Thank you.
____________
Слава Україні! | |
|
Chooka   Send message
Joined: 15 May 18 Posts: 422 ID: 1014486 Credit: 1,731,155,421 RAC: 303,783
                                
|
I've been doing some testing on a 5080 and a 5070TI. Still incomplete, especially for the TI.
I just ran one unit of AP27, taking 169s and around 270W. The system does have various things loaded as it is in use, which may have some impact. Also, as mentioned, mine is not a heavily factory-overclocked board. Which model do you have? Your time of 146s is about 16% faster.
Edit: forgot to say, in gaming testing, I found very little difference in performance at a 250W power limit vs the default 300W. Some games didn't even pass 250W normally, and those that did exceed it lost less than 2% from the lower power limit. This might be an area to explore later. When manually overclocking I saw about a 12% increase, which appears to scale directly with the core clock.
I'm doing some AP now with my 4070 Ti (non super) and run times are 188 seconds. HOWEVER, I am also running CPU work at the moment, and I have my card undervolted to 900mV so it's only drawing 170W.
____________
Слава Україні! | |
|
Chooka   Send message
Joined: 15 May 18 Posts: 422 ID: 1014486 Credit: 1,731,155,421 RAC: 303,783
                                
|
|
My quick testing on my 4070 Ti (non super)
GFN-18 - 370 sec - 250W
GFN-19 - 1105 sec (stock gpu) - 285W - 97%
GFN-19 - 1300 sec (undervolted to 900mV) - 167W - 96% - 75 degrees.
The CPU is currently crunching PRS on the cores. It's also only a 3900X CPU so not the fastest.
I'm not sure why your 5070Ti would be slower on GFN-19?
Same with GFN-18 (370 vs 424)
Interesting stuff!
____________
Слава Україні! | |
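The stock vs undervolted GFN-19 runs above can also be compared by energy per unit rather than by time alone. A rough sketch, assuming the reported wattage is a fair average over the whole run (the posts only give a single power reading, so treat the results as approximate):

```python
def energy_kj(watts, seconds):
    """Energy for one unit in kilojoules: power (W) times run time (s) / 1000."""
    return watts * seconds / 1000.0


# (card and setting, watts, seconds per GFN-19 unit) as quoted in the thread
gfn19_runs = [
    ("4070 Ti stock", 285, 1105),
    ("4070 Ti undervolted 900mV", 167, 1300),
    ("5070 Ti Ventus", 250, 1273),
]

for label, watts, seconds in gfn19_runs:
    kj = energy_kj(watts, seconds)
    # 1 Wh = 3.6 kJ
    print(f"{label}: {kj:.1f} kJ per unit ({kj / 3.6:.1f} Wh)")
```

On these figures the undervolted 4070 Ti takes about 18% longer per unit but uses roughly 30% less energy per unit than at stock.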
|
mackerel Volunteer tester
 Send message
Joined: 2 Oct 08 Posts: 2883 ID: 29980 Credit: 716,426,128 RAC: 203,941
                                   
|
|
Interesting results there. I have no idea what it means for now. I'll try to run some more/bigger units overnight when I'm not using that system on both my 5070 Ti and 4070. | |
|
Scott Brown Volunteer moderator Project administrator Volunteer tester Project scientist
 Send message
Joined: 17 Oct 05 Posts: 2526 ID: 1178 Credit: 27,697,724,091 RAC: 4,968,202
                                                           
|
That performance is very underwhelming. By comparison (running single units), the GFN18 time is only 30s faster than my RTX 4070 (not Super or Ti) Founders Edition. The GFN17 time is only 4s faster than my RTX 4060 Ti (also Founders Edition), and I get notably better throughput with that card when running two units at a time.
Looking through your systems, your 4070 ran 3x GFN17 a week or so ago at around 280 seconds. Anything to note about its config? Looks like they're singles, if so that's quite a bit slower than the 182s I got. I also have a 4070 FE, running a couple units just now 212s average of two. Skimming through some of your many listed 4060 Ti systems I see the faster ones around 330s.
I did mention the reported loading since I knew the smaller units can't fully load the GPU. Anyone got a link to how to set up to run 2 units at once on GPU?
The 182s value was from my wife's machine with a 4060Ti Founders running single GFN17 units (less than 90% load). The 330s value is from my 4060Ti Founders running 2xGFN17 units (around a 95% load).
The 4070 Founders (my office machine) with the 280s times is also set to run 2xGFN17 units. I haven't run single GFN17 units on that card in quite some time, but it is likely at around 150s in that configuration.
| |
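Since some of the times in this thread are for single units and some are for two units running side by side, a small helper makes them comparable. This is only a sketch using figures quoted above; the 150s single-unit time for the 4070 is the poster's own estimate, not a measurement.

```python
def effective_seconds(batch_seconds, units_at_once=1):
    """Wall-clock seconds per completed unit when several run concurrently."""
    return batch_seconds / units_at_once


def units_per_hour(batch_seconds, units_at_once=1):
    """Completed units per hour for a given concurrency."""
    return 3600.0 * units_at_once / batch_seconds


# (label, reported seconds per task, tasks running at once), GFN17 only
gfn17_runs = [
    ("5070 Ti, 1x", 182, 1),
    ("4060 Ti FE, 2x", 330, 2),
    ("4070 FE, 2x", 280, 2),
    ("4070 FE, 1x (estimate)", 150, 1),
]

for label, seconds, concurrent in gfn17_runs:
    print(
        f"{label}: {effective_seconds(seconds, concurrent):.0f}s per unit, "
        f"{units_per_hour(seconds, concurrent):.1f} units/hour"
    )
```

Read this way, the 4070 running two GFN17 at 280s each is finishing a unit every 140s, which is ahead of the 182s single-unit time on the 5070 Ti.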
|
mackerel Volunteer tester
 Send message
Joined: 2 Oct 08 Posts: 2883 ID: 29980 Credit: 716,426,128 RAC: 203,941
                                   
|
My quick testing on my 4070 Ti (non super)
GFN-18 - 370 sec - 250W
Ran some more GFN18:
5070 Ti: 441s average of 7 units
4070 FE: 488s average of 48 units
It is looking like Blackwell is slower than Ada of a similar tier. I guess the question is why? I think they changed something about the core arrangement when it comes to compute. Will have to take a closer look now. Edit: from a quick reading, in Ada the cores in each SM could all do FP32, with half also doing INT32. In Blackwell, they can all do both. Unless something else has changed about them, that doesn't seem to be a likely cause. | |
|
|
|
From a quick reading, in Ada the cores in each SM could all do FP32, with half also doing INT32. In Blackwell, they can all do both. Unless something else has changed about them, that doesn't seem to be a likely cause.
Waiting for https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#arithmetic-instructions to be updated with Compute Capability 10.0 (RTX 50 series).
According to https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capability-10-0, the number of INT32 cores is 64 (like CC 8.9, RTX 40 series).
According to https://images.nvidia.com/aem-dam/Solutions/geforce/blackwell/nvidia-rtx-blackwell-gpu-architecture.pdf, page 12, "The number of possible INT32 integer operations in Blackwell are doubled compared to Ada, by fully unifying them with FP32 cores". The number of INT32 cores is expected to be 128.
The two documents are inconsistent... | |
|
mackerel Volunteer tester
 Send message
Joined: 2 Oct 08 Posts: 2883 ID: 29980 Credit: 716,426,128 RAC: 203,941
                                   
|
|
The illustration in the RTX Blackwell pdf page 12 is the one I saw elsewhere.
As for CUDA Compute Capability, I am even more confused.
https://developer.nvidia.com/cuda-gpus
Above page lists 5080/5090 as Compute Capability 10.0
GPU-Z reports my 5070 Ti as Compute Capability 12.0. I don't know if this can be read from the hardware, or if they are using a lookup table. The latter having more scope for error.
In the CUDA C Programming Guide, comparing Compute Capability 10.0 and 12.0, the difference is 10.0 appears to have full rate FP64, whereas 12.0 has minimal support. Thus 12.0 looks more like the RTX series and 10.0 may be the server version. | |
|
mfl0p Send message
Joined: 5 Apr 09 Posts: 262 ID: 38042 Credit: 4,747,340,532 RAC: 4,261,451
                                 
|
|
As Yves has mentioned before, Nvidia doesn't like to post specifics and at times we have to guess what their marketing really means. The Blackwell marketing says we now have 2x integer compute. But what operations? You can find unofficial information on forums that we actually had an integer multiply using the FP32 compute units before Blackwell. But why doesn't Nvidia tell us this directly? | |
|
|
|
As Yves has mentioned before, Nvidia doesn't like to post specifics and at times we have to guess what their marketing really means. The Blackwell marketing says we now have 2x integer compute. But what operations? You can find unofficial information on forums that we actually had an integer multiply using the FP32 compute units before Blackwell. But why doesn't Nvidia tell us this directly?
Thank you for posting your comment -- this gives me context for the comment below, which Yves gave me over here: https://www.mersenneforum.org/node/1065385
In other words, trust Yves and internet forums over Nvidia marketing :)
---------------------
Compute Capability 8.6 (GeForce 30): SM = 64 MAD32_64/FP32 + 64 INT32/FP32 + 2 FP64. Half of the cores are able to execute a MAD instruction z += x * y, where x and y are 32-bit integers and both the product and z are 64-bit integers. The other half of the cores execute other instructions (add, shift, logical operations, etc).
Compute Capability 8.9 (GeForce 40): SMs are identical to 8.6, but the process size is 5 nm (Ampere was 8 nm), so the GPU operates at a higher frequency. More importantly, the L2 cache is 10x larger: 40x0 cards are at least 50% faster than 30x0. | |
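For readers unfamiliar with the MAD operation described above, here is a small emulation of z += x * y with 32-bit inputs and a 64-bit accumulator. It assumes unsigned operands and wraparound on overflow; the quoted comment does not specify signedness, and the function name is made up for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mad32_64(z, x, y):
    """Emulate z += x * y: x and y are 32-bit, the product and z are 64-bit."""
    product = (x & MASK32) * (y & MASK32)  # full 64-bit product, nothing lost
    return (z + product) & MASK64          # the accumulator wraps at 64 bits


# The largest 32-bit inputs still fit in the 64-bit result.
print(hex(mad32_64(0, MASK32, MASK32)))
```

The point of the wide result is that the product of two 32-bit values never overflows, which is why this one instruction is so useful for multi-word integer arithmetic.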
|
tito Send message
Joined: 28 Jun 08 Posts: 6 ID: 24708 Credit: 1,097,658,040 RAC: 4,569
                       
|
|
Is it a 5090?
https://www.primegrid.com/hosts_user.php?userid=228 If so: not impressive at all. | |
|
mfl0p Send message
Joined: 5 Apr 09 Posts: 262 ID: 38042 Credit: 4,747,340,532 RAC: 4,261,451
                                 
|
|
GFN18 won't be utilizing all the 21760 "cores"
We need to see some larger GFN to see the full potential. | |
|
mikey Send message
Joined: 17 Mar 09 Posts: 2339 ID: 37043 Credit: 1,055,100,737 RAC: 156,395
                        
|
GFN18 won't be utilizing all the 21760 "cores"
We need to see some larger GFN to see the full potential.
So are there enough left over to run two tasks at the same time? How about 3 or 10? | |
|
|
|
So are there enough left over to run two tasks at the same time? How about 3 or 10?
Sounds about right. That's what I do when running GFN-16s on my 4090s. I run 2 or 3 (sorry, can't remember which right now, as there's entirely too much blood in my coffee stream). | |
|
composite Volunteer tester Send message
Joined: 16 Feb 10 Posts: 1256 ID: 55391 Credit: 2,053,436,982 RAC: 83,341
                            
|
Compute Capability 8.6 (GeForce 30): SM = 64 MAD32_64/FP32 + 64 INT32/FP32 + 2 FP64. Half of the cores are able to execute a MAD instruction z += x * y, where x and y are 32-bit integers and both the product and z are 64-bit integers. The other half of the cores execute other instructions (add, shift, logical operations, etc).
Compute Capability 8.9 (GeForce 40): SMs are identical to 8.6, but the process size is 5 nm (Ampere was 8 nm), so the GPU operates at a higher frequency. More importantly, the L2 cache is 10x larger: 40x0 cards are at least 50% faster than 30x0.
Given that cache has a superscalar effect on performance, does using only the half of the 30x0 cores which have the MAD32 instruction result in a small performance penalty, but with much better power efficiency than using all cores? | |
|
|
|
|
As per the March 26 Nvidia Forums moderator post on the "Blackwell Integer" thread here:
https://forums.developer.nvidia.com/t/blackwell-integer/320578
Nvidia has fixed their GPU Compute Capability link below to show GeForce RTX 5080 and 5090 as Compute Capability 12.0 rather than 10.0
https://developer.nvidia.com/cuda-gpus
Also, FYI, see Yves Gallot's post here, where he generates assembly code to try to understand the Blackwell architecture:
https://www.primegrid.com/forum_thread.php?id=10929 | |
|
|
|
|
Yes, the 50xx generation is simply "crippled". Even if the 5090 is faster on big GFNs, the price/time ratio on tasks shows the RTX 4090 is much better. | |
|