PrimeGrid
1) Message boards : Number crunching : PrimeGrid will be offline for server upgrades (Message 165567)
Posted 3 days ago by Zyfdnug

I'm not familiar with either of those, sorry. That being said, I've heard good things about Mastodon, but Discord is very feature rich and suits us well. We're not looking to change, but we are open to hearing about why something else is amazingly better than what we already have. :)


I'm far from being a good consultant for social media myself... however, Mastodon is probably not exactly a good replacement for Discord. It is aimed more at providing micro-blogging functionality like Twitter.

For more in-depth communication without intruding on a wide audience's timelines, other tools would be more suitable.

For the Mastodon thing (which actually is not a good term, as Mastodon is just one frontend for a family of protocols...) it would be really easy to use -- set up an account somewhere, start posting, use hashtags. See what happens. As access to the protocol is really open, tools for crossposting, mirroring, automating, whatever seem to be available, but personally I know none of those.

Advantages: Open protocol, open tools, federated i.e. no dependency on particular providers with their own rules and walls. More privacy friendly than a commercial platform.

For more discussion, including closed user groups, I'd recommend Matrix. Sometimes annoying with its delayed key exchanges and hard to understand implications of some settings, but a federated communications system built around strict privacy and encryption considerations is, in this sense, hard to beat. May need a bit more time to get used to, but there are public servers available where it's possible to set up both open and closed groups. My experience is limited as well, I'm just a paranoid user and found it works. A certain tech affinity is going to help, but PrimeGrid seems to have that sort of user base :-)

What I can help with: I promise to follow any "official" PrimeGrid account on Mastodon. Started looking for something, even: https://social.tchncs.de/@Zyfdnug/111148086870506164

I would also join a PrimeGrid Matrix group or channel (don't even know the terminology...) and I think I could even just start something... not much time to actually interact there these days, though, so please remind me in three weeks if that would be interesting!

Cheers,

Zyfdnug

EDIT & PS: I just noticed this is not the best thread for this particular discussion. Please feel free to move it to a more suitable place!
2) Message boards : Number crunching : PrimeGrid will be offline for server upgrades (Message 165539)
Posted 5 days ago by Zyfdnug

Discord, of course, will remain available.

Now that you mention it -- would it be possible to use a more open and privacy friendly communications channel, such as Matrix or Mastodon?

Zyfdnug
3) Message boards : Problems and Help : AMD ROCm CL_OUT_OF_HOST_MEMORY when running through BOINC (Message 162444)
Posted 145 days ago by Zyfdnug
I managed to resolve this -- patience, actually thinking a bit myself, and a good search engine brought me to

https://github.com/BOINC/boinc/issues/4948

which contained the actual solution here.

In short, adding
ProtectSystem=full
via
systemctl edit boinc-client.service
proved to be a useful solution.
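
For anyone who prefers not to edit interactively, here is a minimal sketch of the same change as a drop-in file. `write_boinc_override` is a hypothetical helper name, and the scratch directory is only there so the result can be reviewed first; as far as I know, `systemctl edit` stores its result as `override.conf` in a `boinc-client.service.d` directory under `/etc/systemd/system`.

```shell
# Sketch only: write the drop-in that "systemctl edit boinc-client.service"
# would otherwise create interactively. write_boinc_override is a hypothetical
# helper; the directory is an argument so the file can be reviewed first.
write_boinc_override() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    # [Service] is the unit section that holds sandboxing options.
    printf '[Service]\nProtectSystem=full\n' > "$dropin_dir/override.conf"
}

# Example: generate the file in a scratch directory and print it.
scratch="$(mktemp -d)"
write_boinc_override "$scratch" && cat "$scratch/override.conf"
```

To apply it for real, the target directory would presumably be /etc/systemd/system/boinc-client.service.d, followed by `systemctl daemon-reload` and `systemctl restart boinc-client.service`. `systemctl show boinc-client.service -p ProtectSystem` should then report the new value.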

I'll reach out to the Debian project's boinc package maintainers with this information, so the installation can either be adapted, or at least the documentation be updated.

Zyfdnug
4) Message boards : Problems and Help : AMD ROCm CL_OUT_OF_HOST_MEMORY when running through BOINC (Message 162352)
Posted 147 days ago by Zyfdnug
Looks like this is an issue only when running under systemd. Both root and the boinc client user can, when the client is started from the shell, properly crunch their numbers using OpenCL.


Unfortunately, I have no reference system where I could try PrimeGrid under boinc with a different Linux distribution.

Has anybody ever needed to change systemd unit settings, and can recommend anything?
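
To narrow down which unit settings could matter, a small filter over the unit file may help. This is a rough sketch, not a complete list of systemd hardening options: `list_sandbox_settings` is a hypothetical helper, and the directive names in the pattern are just the ones I would look at first.

```shell
# list_sandbox_settings (hypothetical helper): read unit-file text on stdin
# and print only directives that restrict what the service may see or do.
list_sandbox_settings() {
    grep -E '^(Protect[A-Za-z]*|Private[A-Za-z]*|ReadWritePaths|ReadOnlyPaths|NoNewPrivileges|LimitMEMLOCK)='
}

# Example with made-up unit content (not the real Debian unit file):
printf '[Service]\nUser=boinc\nProtectSystem=strict\nPrivateTmp=true\n' |
    list_sandbox_settings
```

On a real system the input would come from `systemctl cat boinc-client.service | list_sandbox_settings`. `systemd-analyze security boinc-client.service` gives a broader overview of the same kind of settings.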

Thanks,

Zyfdnug
5) Message boards : Problems and Help : SGS: execv failed twice: Text file busy (Message 162337)
Posted 147 days ago by Zyfdnug
Looks like this is indeed resolved after waiting / restarting.

I'm still a bit surprised that this affected only a subset of the tasks, but have not checked if they had something -- most likely the slot they were assigned to -- in common.

I guess this question can reasonably well be considered fully answered -- thanks!

Zyfdnug
6) Message boards : Problems and Help : SGS: execv failed twice: Text file busy (Message 161992)
Posted 158 days ago by Zyfdnug
I've noticed a considerable number of Sophie Germain tasks fail with stderr such as

BOINC llr wrapper (version 8.04)
Using Jean Penne's llr (64 bit)
execl failed once: Text file busy
execl failed twice: Text file busy
Error reading the LLR version number, continuing...
LLR command line: primegrid_llr -d -oDiskWriteTime=1 -oThreadsPerTest=4 llr.in
execv failed once: Text file busy
execv failed twice: Text file busy
app error: 27648
20:07:43 (128522): called boinc_finish(27648)


I noticed these situations only today, while running boinc and Primegrid in a non-standard way.

However, this affects only a fraction of all SGS jobs:
https://www.primegrid.com/results.php?userid=1196790&offset=0&show_names=0&state=0&appid=2
Also, this seems to happen with only one host, which is actually a system I'm currently deploying and which is, software-wise, not in a particularly good state.

This, however, does not explain this particular issue. First, only a selection of tasks is affected. Second, it's an error that I would not attribute to broken hardware, but which looks more like a software issue, with boinc wrapping Primegrid which in turn manages binaries for different projects, and potentially updates the actual worker binaries occasionally.

On the other hand, I have seen such exec*() errors very rarely, and would be quite astonished to find that the actual worker software is that frequently updated today.

The fact that those issues apparently affected this particular host only, starting today at 14:00 UTC, correlates with me starting the boinc client, as boinc user, in a terminal. I can not claim to see a potential cause for that, but it *is* an interesting coincidence.

Has anybody noticed similar errors recently, or can suggest what I could do to further analyze things?

Best,

Zyfdnug
7) Message boards : Problems and Help : AMD ROCm CL_OUT_OF_HOST_MEMORY when running through BOINC (Message 161990)
Posted 158 days ago by Zyfdnug
The GPU exclusion via configuration is indeed what I tried next.


I have
coproc_info.xml:
<coprocs>
  <ati_opencl>
    <name>AMD Radeon RX 6750 XT</name>
    <vendor>Advanced Micro Devices, Inc.</vendor>
    <vendor_id>4098</vendor_id>
    <available>1</available>
    <half_fp_config>0</half_fp_config>
    <single_fp_config>191</single_fp_config>
    <double_fp_config>63</double_fp_config>
    <endian_little>1</endian_little>
    <execution_capabilities>1</execution_capabilities>
    <extensions>cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program </extensions>
    <global_mem_size>12868124672</global_mem_size>
    <local_mem_size>65536</local_mem_size>
    <max_clock_frequency>2880</max_clock_frequency>
    <max_compute_units>20</max_compute_units>
    <nv_compute_capability_major>0</nv_compute_capability_major>
    <nv_compute_capability_minor>0</nv_compute_capability_minor>
    <amd_simd_per_compute_unit>4</amd_simd_per_compute_unit>
    <amd_simd_width>32</amd_simd_width>
    <amd_simd_instruction_width>1</amd_simd_instruction_width>
    <opencl_platform_version>OpenCL 2.1 AMD-APP (3513.0)</opencl_platform_version>
    <opencl_device_version>OpenCL 2.0 </opencl_device_version>
    <opencl_driver_version>3513.0 (HSA1.1,LC)</opencl_driver_version>
    <device_num>0</device_num>
    <peak_flops>14745600000000.000000</peak_flops>
    <opencl_available_ram>12868124672.000000</opencl_available_ram>
    <opencl_device_index>0</opencl_device_index>
    <warn_bad_cuda>0</warn_bad_cuda>
  </ati_opencl>
  <ati_opencl>
    <name>gfx1036</name>
    <vendor>Advanced Micro Devices, Inc.</vendor>
    <vendor_id>4098</vendor_id>
    <available>1</available>
    <half_fp_config>0</half_fp_config>
    <single_fp_config>191</single_fp_config>
    <double_fp_config>63</double_fp_config>
    <endian_little>1</endian_little>
    <execution_capabilities>1</execution_capabilities>
    <extensions>cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program </extensions>
    <global_mem_size>536870912</global_mem_size>
    <local_mem_size>65536</local_mem_size>
    <max_clock_frequency>2200</max_clock_frequency>
    <max_compute_units>1</max_compute_units>
    <nv_compute_capability_major>0</nv_compute_capability_major>
    <nv_compute_capability_minor>0</nv_compute_capability_minor>
    <amd_simd_per_compute_unit>4</amd_simd_per_compute_unit>
    <amd_simd_width>32</amd_simd_width>
    <amd_simd_instruction_width>1</amd_simd_instruction_width>
    <opencl_platform_version>OpenCL 2.1 AMD-APP (3513.0)</opencl_platform_version>
    <opencl_device_version>OpenCL 2.0 </opencl_device_version>
    <opencl_driver_version>3513.0 (HSA1.1,LC)</opencl_driver_version>
    <device_num>1</device_num>
    <peak_flops>563200000000.000000</peak_flops>
    <opencl_available_ram>536870912.000000</opencl_available_ram>
    <opencl_device_index>1</opencl_device_index>
    <warn_bad_cuda>0</warn_bad_cuda>
  </ati_opencl>
  <warning>NVIDIA: libcuda.so: cannot open shared object file: No such file or directory</warning>
  <warning>ATI: libaticalrt.so: cannot open shared object file: No such file or directory</warning>
</coprocs>


cc_config.xml:
<cc_config>
  <log_flags>
    <task>1</task>
    <file_xfer>1</file_xfer>
    <sched_ops>1</sched_ops>
  </log_flags>
  <options>
    <exclude_gpu>
      <url>http://www.primegrid.com/</url>
      <device_num>1</device_num>
    </exclude_gpu>
  </options>
</cc_config>


The coproc_info file should make it clear what the id numbers refer to.
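
As a quick way to read that mapping without scanning the whole file, here is a rough sketch that prints each device name next to its device_num. `list_coprocs` is a hypothetical helper, and it assumes every device entry carries exactly one <name> followed later by one <device_num>, as in the file above.

```shell
# list_coprocs (hypothetical helper): read coproc_info.xml text on stdin and
# print "<name><TAB><device_num>" for each device entry.
list_coprocs() {
    tr -d '\n' |
        grep -oE '<name>[^<]*</name>|<device_num>[^<]*</device_num>' |
        sed -E 's/<[^>]+>//g' |   # strip the surrounding tags
        paste - -                 # join each name with the number that follows
}

# Example with a shortened, made-up entry:
printf '<coprocs><ati_opencl><name>gfx1036</name><device_num>1</device_num></ati_opencl></coprocs>\n' |
    list_coprocs
```

Assuming the file sits in the data directory the client reports at startup, the real call would be `list_coprocs < /var/lib/boinc-client/coproc_info.xml`. The number printed for a device is the one to put into <device_num> inside <exclude_gpu>.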

This is what I observe now:

root@Zwerg:/var/lib/boinc# sudo -u boinc boinc
27-Apr-2023 16:37:46 [---] Starting BOINC client version 7.20.5 for x86_64-pc-linux-gnu
27-Apr-2023 16:37:46 [---] This a development version of BOINC and may not function properly
27-Apr-2023 16:37:46 [---] log flags: file_xfer, sched_ops, task
27-Apr-2023 16:37:46 [---] Libraries: libcurl/7.88.1 OpenSSL/3.0.8 zlib/1.2.13 brotli/1.0.9 zstd/1.5.4 libidn2/2.3.3 libpsl/0.21.2 (+libidn2/2.3.3) libssh2/1.10.0 nghttp2/1.52.0 librtmp/2.3
27-Apr-2023 16:37:46 [---] Data directory: /var/lib/boinc-client
27-Apr-2023 16:37:50 [---] OpenCL: AMD/ATI GPU 0: AMD Radeon RX 6750 XT (driver version 3513.0 (HSA1.1,LC), device version OpenCL 2.0, 12272MB, 12272MB available, 14746 GFLOPS peak)
27-Apr-2023 16:37:50 [---] OpenCL: AMD/ATI GPU 1 (ignored by config): gfx1036 (driver version 3513.0 (HSA1.1,LC), device version OpenCL 2.0, 512MB, 512MB available, 563 GFLOPS peak)
27-Apr-2023 16:37:50 [---] libc: version 2.36
27-Apr-2023 16:37:50 [---] Host name: Zwerg
27-Apr-2023 16:37:50 [---] Processor: 24 AuthenticAMD AMD Ryzen 9 7900X 12-Core Processor [Family 25 Model 97 Stepping 2]
27-Apr-2023 16:37:50 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl avx512vbmi u
27-Apr-2023 16:37:50 [---] OS: Linux Debian: Debian GNU/Linux 12 (bookworm) [6.1.0-7-amd64|libc 2.36]
27-Apr-2023 16:37:50 [---] Memory: 61.94 GB physical, 976.00 MB virtual
27-Apr-2023 16:37:50 [---] Disk: 464.48 GB total, 438.70 GB free
27-Apr-2023 16:37:50 [---] Local time is UTC +2 hours
27-Apr-2023 16:37:50 [---] Config: GUI RPCs allowed from:
27-Apr-2023 16:37:50 [---] 192.168.0.1
27-Apr-2023 16:37:50 [---] Zwerg.redacteddomainname
27-Apr-2023 16:37:50 [PrimeGrid] Config: excluded GPU. Type: all. App: all. Device: 1
27-Apr-2023 16:37:50 [PrimeGrid] General prefs: from PrimeGrid (last modified 29-Nov-2020 16:20:55)
27-Apr-2023 16:37:50 [PrimeGrid] Computer location: home


I think the above log output shows that the correct GPU device is disabled for Primegrid.
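
To check this without reading the full startup log, a small filter can pull out just the devices the client marks as ignored. This is a sketch under the assumption that the startup line always has the form shown in the log above; `ignored_gpus` is a hypothetical helper name.

```shell
# ignored_gpus (hypothetical helper): read BOINC client startup log text on
# stdin and print "<device number> <device name>" for every GPU that the
# client reports as "(ignored by config)".
ignored_gpus() {
    grep -F '(ignored by config)' |
        sed -E 's/.*GPU ([0-9]+) \(ignored by config\): ([^ ]+).*/\1 \2/'
}

# Example using the two OpenCL lines from the log above:
ignored_gpus <<'EOF'
27-Apr-2023 16:37:50 [---] OpenCL: AMD/ATI GPU 0: AMD Radeon RX 6750 XT (driver version 3513.0 (HSA1.1,LC), device version OpenCL 2.0, 12272MB, 12272MB available, 14746 GFLOPS peak)
27-Apr-2023 16:37:50 [---] OpenCL: AMD/ATI GPU 1 (ignored by config): gfx1036 (driver version 3513.0 (HSA1.1,LC), device version OpenCL 2.0, 512MB, 512MB available, 563 GFLOPS peak)
EOF
```

For the lines above this prints `1 gfx1036`, which matches the device_num given in cc_config.xml.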

Also, task reports show that only gfx1031 is in use now.

However, jobs running under systemd fail with opencl error: CL_OUT_OF_HOST_MEMORY. If I run the boinc client on the shell, as the boinc user, things seem to proceed correctly.
On the shell, as root user, the client managed to chew through https://www.primegrid.com/result.php?resultid=1508328550

I'll let the client run for a while now, but at this time, it looks like systemd or the unit file introduces some problem. I already tried running the boinc client with a much increased locked memory limit, but that did not lead to success.
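
To confirm which locked-memory limit a running process actually got, its limits can be read from /proc. A minimal sketch, Linux only; `memlock_limits` is a hypothetical helper and simply picks the soft and hard values from the "Max locked memory" row.

```shell
# memlock_limits (hypothetical helper): read the text of /proc/<pid>/limits on
# stdin and print the soft and hard "Max locked memory" values.
memlock_limits() {
    awk '/^Max locked memory/ { print $4, $5 }'
}

# Example: limits of the current shell, if /proc is available.
if [ -r "/proc/$$/limits" ]; then
    memlock_limits < "/proc/$$/limits"
fi
```

For the service, the PID could be taken from `systemctl show boinc-client.service -p MainPID` and the result compared with the client started by hand. If both show the same limit, the locked memory setting is probably not the difference between the two runs.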
8) Message boards : Problems and Help : AMD ROCm CL_OUT_OF_HOST_MEMORY when running through BOINC (Message 161959)
Posted 159 days ago by Zyfdnug
In this case, both devices are handled by the same vendor's software, so disabling a vendor for OpenCL would be counterproductive ;-)

I tried disabling the CPU-integrated device in boinc's cc_config.xml file. The boinc log reports the device as disabled, but also reports that the exclude_gpu tag is not recognized, and Primegrid's software still uses the device when run through boinc.

So I tried running the boinc client in the foreground (not through systemd), and got very promising results: https://www.primegrid.com/result.php?resultid=1508473533

Getting this behaviour controlled through configuration seems to be a bit tricky ;-)
9) Message boards : Problems and Help : AMD ROCm CL_OUT_OF_HOST_MEMORY when running through BOINC (Message 161956)
Posted 160 days ago by Zyfdnug
That is quite interesting... first, I was not aware this CPU had an integrated graphics device.

Also, if I start the binary from the shell, as root user, it uses a different device (which would be the discrete Radeon card):

# /var/lib/boinc-client/projects/www.primegrid.com/genefer22g_linux64_22.12.02 -p -n 22 -b 1053460 -f gproof
geneferg version 22.12.2 (linux x64, gcc-7.5.0, boinc-7.20.2)
Copyright (c) 2022, Yves Gallot
genefer is free source code, under the MIT license.
Command line: '-p -n 22 -b 1053460 -f gproof'
Running on device 'gfx1031', vendor 'Advanced Micro Devices, Inc.', version 'OpenCL 2.0 ', driver '3513.0 (HSA1.1,LC)', data size: 96 MB.
Resuming from a checkpoint.
7.58% done, 26:07:20 remaining, 1.21 ms/bit.


I did not notice the different device identifiers (and if I had, I wouldn't have been able to interpret them anyway ;-)

So it appears that, for whatever reason, the binary picks different OpenCL devices to work with when called from the shell and from the boinc manager.

My question then is -- how can I tell boinc or PrimeGrid which GPU to use for OpenCL?
10) Message boards : Problems and Help : AMD ROCm CL_OUT_OF_HOST_MEMORY when running through BOINC (Message 161926)
Posted 161 days ago by Zyfdnug
I have the same problem here.

New computer I'm currently setting up, and the software is in a somewhat messy state after experimenting with different ways to get OpenCL up and running at all.

Now I have the AMD OpenCL stack in usable shape, as far as I can see, and the result is as can be seen here: https://www.primegrid.com/result.php?resultid=1507777023

I doubt it's a real out-of-memory situation, as (without a GPU task running):

# LANG=C free -m
               total        used        free      shared  buff/cache   available
Mem:           63427        5956       52301          25        5903       57471
Swap:            975           0         975


I would appreciate any hint about what I can do to fix this.

Thanks,

Zyfdnug


Copyright © 2005 - 2023 Rytis Slatkevičius (contact) and PrimeGrid community. Server load 1.91, 2.43, 2.67
Generated 3 Oct 2023 | 7:40:11 UTC