Message boards : Generalized Fermat Prime Search : Genefer performance in relation to PCI Express bus bandwidth
I was wondering: how much does the PCI Express bus bandwidth affect the performance of the Genefer GPU application?
It probably depends on how often the CPU feeds the GPU with new work while collecting the results of previous work at the same time.
You see, I am thinking of reusing one of those former cryptocurrency miners as a dedicated BOINC machine. Now that mining most currencies has lost profitability, such GPU-based miners can be bought for a small fraction of the original parts cost. They are usually built with 6 to 12 GPUs; the RTX 3060 Ti seems to be a popular model here, and it still has decent computing power. However, those cards always use risers, which act both as extenders and as x16-to-x1 adapters, so that the cards can be connected to the motherboard's x1 PCI Express ports; otherwise they would not fit next to each other if connected directly to the motherboard. So the communication with the CPU/RAM is either 16 times slower (if the motherboard's x1 ports still run in 3.0 mode) or 64 times slower (if the x1 ports are switched to 1.0 speed, which seems to be popular because it improves stability when running multiple GPUs) than it would be over a regular x16 PCI Express 3.0 port.
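As a sanity check on the 16x and ~64x figures above, here is a quick back-of-the-envelope calculation of theoretical per-direction PCIe bandwidth (a rough Python sketch; real-world throughput is somewhat lower):

# Theoretical per-direction PCIe bandwidth. Gen 1/2 use 8b/10b encoding,
# gen 3 uses 128b/130b, so usable GB/s = GT/s * encoding efficiency / 8.
GT_PER_S   = {1: 2.5, 2: 5.0, 3: 8.0}
EFFICIENCY = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130}

def lane_gb_s(gen: int) -> float:
    return GT_PER_S[gen] * EFFICIENCY[gen] / 8  # GB/s per lane

x16_gen3 = 16 * lane_gb_s(3)  # ~15.75 GB/s
x1_gen3  =  1 * lane_gb_s(3)  # ~0.98 GB/s  -> 16x slower than x16 gen 3
x1_gen1  =  1 * lane_gb_s(1)  # ~0.25 GB/s  -> ~63x slower than x16 gen 3

print(f"x16 gen3: {x16_gen3:.2f} GB/s")
print(f"x1  gen3: {x1_gen3:.2f} GB/s ({x16_gen3 / x1_gen3:.0f}x slower)")
print(f"x1  gen1: {x1_gen1:.2f} GB/s ({x16_gen3 / x1_gen1:.0f}x slower)")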
Crun-chi Volunteer tester
Joined: 25 Nov 09 Posts: 3247 ID: 50683 Credit: 152,646,050 RAC: 18,212
All my GFN-16 and GFN-17 primes (to date) were found on GPUs attached to risers, so I don't have any problem with that. The time is the same whether I run on a PCIe x16 slot on the motherboard or on an x1 slot via a riser.
I also use my miner for that purpose: a fresh install of Linux, a little tuning with nvidia-smi for lower power consumption, and that is that :)
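For anyone wanting to do the same kind of tuning, a minimal sketch of the nvidia-smi power-limiting step (the 150 W cap and the 6-GPU loop are placeholder examples, not Crun-chi's actual settings; setting power limits requires root and must stay within the card's supported range):

import subprocess

POWER_LIMIT_W = 150          # hypothetical per-card cap; tune for your GPUs
GPU_COUNT = 6                # e.g. a 6-GPU mining frame

def limit_power(gpu_index: int, watts: int) -> None:
    # Enable persistence mode so the driver keeps settings between runs.
    subprocess.run(["nvidia-smi", "-i", str(gpu_index), "-pm", "1"], check=True)
    # Cap the board power draw in watts.
    subprocess.run(["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)], check=True)

for idx in range(GPU_COUNT):
    limit_power(idx, POWER_LIMIT_W)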
____________
92*10^1585996-1 NEAR-REPDIGIT PRIME :) :) :)
4 * 650^498101-1 CRUS PRIME
2022202116^131072+1 GENERALIZED FERMAT
Proud member of team Aggie The Pew. Go Aggie!
Michael Goetz Volunteer moderator Project administrator
Joined: 21 Jan 10 Posts: 14037 ID: 53948 Credit: 477,051,011 RAC: 285,770
> I was wondering: how much does the PCI Express bus bandwidth affect the performance of the Genefer GPU application?
The interface bandwidth is important for gaming, and probably useful for some other things.
But, as with mining, it doesn't affect our apps much. The data transfers between the CPU and the GPU are relatively small and infrequent. You could probably replace the PCIe x16 connection with a carrier pigeon without slowing down the app. :)
____________
My lucky number is 75898^524288+1
I was finally able to test this, and it seems that Genefer GPU application performance actually is affected by PCIe bus bandwidth to some degree, although that may only be the case with newer, faster GPUs.
I've got a machine with 4 x RTX 4070 Ti GPUs installed, using a Ryzen 5950X CPU on an Asus TUF Gaming B550-Plus motherboard. This motherboard has one PCIe 4.0 x16 slot; one PCIe 3.0 slot that is physically x16 but runs at x4 speed (the other lanes are not connected to anything); and three PCIe 3.0 x1 slots, which share lanes with the second x16 slot, so if any of the x1 slots is used the second x16 slot runs at x1 rather than x4 speed. All of the GPUs are connected via standard powered mining risers: a small PCB is inserted into each PCIe motherboard slot, and a USB 3.0 cable runs from it to a larger PCB that carries a power connector and a regular x16 PCIe slot into which the GPU is plugged.
I couldn't get this machine to boot correctly at first. As advised in many cryptocurrency-miner building guides, I connected only one GPU directly to the motherboard and it finally booted; then, in the BIOS, I changed the mode/generation of each PCIe slot to 1.0 and enabled Above 4G Decoding. This allowed the machine to boot every time with all GPUs connected via risers.
I tested performance when running 4 GFN-19 tasks at the same time, with no other CPU tasks running to interfere. Each GPU ran its task at a slightly different speed and power consumption. GPU core usage was in the 70-75% range, with power consumption in the 150-220 W range (this GPU model is rated at 285 W), which resulted in task run times of up to about 40 minutes. Here is a screenshot of all the test parameters at about 2/3 progress.
This was much slower than the average time of 18 minutes per task reported in this thread.
Thinking it might be related to the PCIe slots' 1.0 mode, I changed the mode of all PCIe slots to 2.0 and, contrary to most guides, the machine still booted correctly every time. I ran the test again. This time GPU core usage was in the 75-85% range, with power consumption in the 190-225 W range, which resulted in task run times of up to about 28 minutes. Better, but still slower than the 18 minutes I was expecting. Here is a screenshot of all the test parameters at about 2/3 progress.
In both test cases none of the tasks resulted in a computation error, and all of them validated correctly later.
I couldn't get the machine to boot at all after trying to change all PCIe slots to 3.0 mode. Probably the riser cables can't handle signals at that speed.
What is interesting to note is that it is not like this with every BOINC project, and probably not even with every application of a given project (I haven't tested any other PrimeGrid applications on this machine yet). It is probably related to how often the CPU feeds new data to the GPU. For example, with the Distributed.net RC5-72 client I was able to get the full rated speed compared with others using this GPU, with 99% GPU usage and 280 W power consumption.
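To put rough numbers on the slowdowns described above (taking the ~18-minute time as the reference and using theoretical per-direction link bandwidths), a small sketch:

# Observed worst-case GFN-19 run times from the tests above vs. the
# ~18-minute reference, alongside the theoretical bandwidth of the x1 link.
reference_min = 18.0
observed_min  = {"gen1 x1": 40.0, "gen2 x1": 28.0}   # minutes, approximate
link_gb_s     = {"gen1 x1": 0.25, "gen2 x1": 0.50}   # GB/s per direction

for link, minutes in observed_min.items():
    slowdown = minutes / reference_min
    print(f"{link}: ~{link_gb_s[link]:.2f} GB/s, ~{slowdown:.1f}x slower than reference")

# gen1 x1 -> ~2.2x slower, gen2 x1 -> ~1.6x slower: run time shrinks as the
# link speeds up, which points at the bus, not the GPU, as the bottleneck here.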
mackerel Volunteer tester
Joined: 2 Oct 08 Posts: 2652 ID: 29980 Credit: 570,442,335 RAC: 10,182
You can try running GPU-Z, which shows "Bus Interface Load". As a quick example, I'm running a GFN-17 and I'm seeing 1% bus load with a 4070 on a 3.0 x16 interface. If this value starts moving much away from zero, it might start to add transfer latency and reduce performance that way.
Maybe you can get faster bus speeds if you get better risers. Many of the mining-era ones looked pretty bad, recycling things like physical USB connections which were never intended for that use case.
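On Linux, where GPU-Z is not available, a rough equivalent is to poll the PCIe throughput counters through NVML; here is a minimal sketch using the pynvml bindings (GPU index 0 and the one-second polling interval are arbitrary example choices):

# Poll PCIe RX/TX throughput and GPU utilization via NVML.
# Requires the pynvml package (pip install pynvml) and an NVIDIA driver.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # first GPU; adjust as needed

try:
    for _ in range(10):
        rx = pynvml.nvmlDeviceGetPcieThroughput(handle, pynvml.NVML_PCIE_UTIL_RX_BYTES)
        tx = pynvml.nvmlDeviceGetPcieThroughput(handle, pynvml.NVML_PCIE_UTIL_TX_BYTES)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        # NVML reports PCIe throughput in KB/s over a short sampling window.
        print(f"PCIe RX {rx / 1024:.1f} MB/s  TX {tx / 1024:.1f} MB/s  GPU {util.gpu}%")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()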
Yves Gallot Volunteer developer Project scientist
Joined: 19 Aug 12 Posts: 843 ID: 164101 Credit: 306,521,622 RAC: 5,385
PCIe bus load depends on the GFN subproject.
If PCIe bus bandwidth is the problem, then testing GFN-20 or GFN-21 should increase the power consumption and reach 280 W.
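A rough illustration of why larger transforms should help, assuming each GFN level doubles the transform size while the per-iteration host-to-GPU traffic stays about constant (a back-of-the-envelope model, not a description of Genefer's internals):

import math

# GFN-n tests numbers of the form b^(2^n)+1, so the transform length grows
# roughly like 2^n and the FFT cost per modular squaring like 2^n * n.
def work_per_squaring(n: int) -> float:
    size = 2 ** n
    return size * math.log2(size)   # arbitrary units

base = work_per_squaring(19)
for n in (19, 20, 21):
    print(f"GFN-{n}: ~{work_per_squaring(n) / base:.1f}x the per-squaring GPU work of GFN-19")

# With roughly constant per-iteration traffic, each transfer is amortized over
# more GPU work at higher n, so GPU load and power draw should rise.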