Fix Longhorn Volumes Stuck in Attach/Detach Loop on OpenShift/OKD

By Péter Nyári on 2022.12.12.

If you are using Longhorn Volumes on OpenShift/OKD, you may have encountered the dreaded “Attach/Detach loop”: volumes get stuck in an endless cycle of attaching to and detaching from the cluster nodes. In this article we explore the cause of this issue and some potential solutions.

The issue is not limited to Longhorn: other storage solutions that rely on iSCSI, such as OpenEBS, are affected as well and need the same steps to resolve it.

1. What are Longhorn Volumes?

Longhorn is an open-source project that provides persistent block storage for applications running in Kubernetes. It is designed to simplify setting up and managing persistent storage, making it easier and faster to deploy applications that need it.

Longhorn aims to provide consistent, scalable, and reliable storage, and lets users set up and manage storage quickly without having to understand the complexities of storage management.

Longhorn volumes are block devices that can be presented to workloads either as file systems or as raw block devices, and Longhorn can back up volumes to external targets such as S3-compatible object stores or NFS.

2. Problem Description

The issue does not depend on the Longhorn version: all versions of Longhorn are affected, because the cause lies in the host operating system. The symptom is that all volumes are stuck in an Attach/Detach loop. To diagnose the problem, run dmesg on the storage nodes; you will see errors such as the following:

[Sat Dec 10 18:52:01 2022] audit: type=1400 audit(1670698321.515:7214): avc: denied { dac_override } for pid=231579 comm="iscsiadm" capability=1 scontext=system_u:system_r:iscsid_t:s0 tcontext=system_u:system_r:iscsid_t:s0 tclass=capability permissive=0

[Sat Dec 10 18:52:01 2022] audit: type=1300 audit(1670698321.515:7214): arch=c000003e syscall=83 success=no exit=-13 a0=55b9035185c0 a1=1f8 a2=ffffffffffffff00 a3=0 items=0 ppid=231163 pid=231579 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="iscsiadm" exe="/usr/sbin/iscsiadm" subj=system_u:system_r:iscsid_t:s0 key=(null)

[Sat Dec 10 18:52:01 2022] audit: type=1327 audit(1670698321.515:7214): proctitle=697363736961646D002D6D00646973636F76657279002D740073656E6474617267657473002D700031302E3133312E312E31363
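To check a storage node for this symptom, the following Python sketch filters iscsiadm AVC denials out of captured dmesg text and decodes the hex-encoded proctitle field, which holds the NUL-separated command line of the denied process. The regular expressions are an assumption modeled on the lines above, not a full parser for the audit record format, and the SAMPLE string simply reproduces that excerpt.

```python
import re

# Patterns modeled on the dmesg excerpt above (assumption, not a full audit parser).
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]+?)\s*\}.*?"
    r'comm="(?P<comm>[^"]+)".*?'
    r"scontext=(?P<scontext>\S+)\s+tcontext=(?P<tcontext>\S+)\s+"
    r"tclass=(?P<tclass>\S+)"
)
PROCTITLE_RE = re.compile(r"proctitle=(?P<hex>[0-9A-Fa-f]+)")

SAMPLE = (
    "audit: type=1400 audit(1670698321.515:7214): avc: denied { dac_override } "
    'for pid=231579 comm="iscsiadm" capability=1 '
    "scontext=system_u:system_r:iscsid_t:s0 "
    "tcontext=system_u:system_r:iscsid_t:s0 tclass=capability permissive=0\n"
    "audit: type=1327 audit(1670698321.515:7214): proctitle=697363736961646D002D6D00"
    "646973636F76657279002D740073656E6474617267657473002D700031302E3133312E312E31363"
)


def find_denials(log_text, comm="iscsiadm"):
    """Return (permissions, scontext, tcontext, tclass) for each AVC denial of `comm`."""
    found = []
    for line in log_text.splitlines():
        m = AVC_RE.search(line)
        if m and m["comm"] == comm:
            found.append((m["perms"].split(), m["scontext"], m["tcontext"], m["tclass"]))
    return found


def decode_proctitle(hex_value):
    """Decode an audit proctitle value: hex-encoded argv separated by NUL bytes."""
    # Excerpts can be cut off mid-byte, so drop a trailing odd hex digit.
    usable = hex_value[: len(hex_value) // 2 * 2]
    raw = bytes.fromhex(usable)
    return [arg.decode("utf-8", "replace") for arg in raw.split(b"\x00") if arg]


if __name__ == "__main__":
    for perms, scontext, tcontext, tclass in find_denials(SAMPLE):
        print("denied", perms, "from", scontext, "to", tcontext, "class", tclass)
    for m in PROCTITLE_RE.finditer(SAMPLE):
        print("command:", " ".join(decode_proctitle(m["hex"])))
```

Applied to the excerpt, this reports a dac_override denial for iscsid_t on the capability class, and the decoded command line is iscsiadm -m discovery -t sendtargets -p 10.131.1.16 followed by a truncated remainder, because the proctitle value in the excerpt is cut off mid-byte.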

3. Cause of the problem

This problem is likely to occur on certain OKD versions, such as 4.11.0-0.okd-2022-12-02-145640, whose host operating system ships open-iscsi version 2.1.4 or earlier: the host SELinux policy prevents iscsiadm from operating properly.
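If you are not sure which open-iscsi version a node runs, a small check like the one below can compare it against the 2.1.4 threshold. It assumes the input is the text printed by `iscsiadm --version` (for example `iscsiadm version 2.1.4`); distribution package versions may follow a different numbering scheme, so this sketch is not meant for package-manager output.

```python
import re

# The article reports the conflict for open-iscsi 2.1.4 and earlier.
AFFECTED_MAX = (2, 1, 4)


def parse_version(text):
    """Extract the first dotted version number (e.g. '2.1.4') from `text` as a tuple."""
    m = re.search(r"\d+(?:\.\d+)+", text)
    if m is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def is_affected(version_output):
    """Return True if the reported open-iscsi version is 2.1.4 or earlier."""
    version = parse_version(version_output)
    # Pad to three components so that '2.1' compares like '2.1.0'.
    version += (0,) * (3 - len(version))
    return version <= AFFECTED_MAX


if __name__ == "__main__":
    # Assumed output format of `iscsiadm --version`.
    for sample in ("iscsiadm version 2.1.4", "iscsiadm version 2.1.5"):
        print(sample, "->", "affected" if is_affected(sample) else "not affected")
```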

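In newer versions of open-iscsi, the mode passed to mkdir() was changed from 0660 to 0770 to avoid SELinux dac_override conflicts, and the umask was adjusted so that directories created by iscsid and iscsiadm keep their execute bit. Creating a file or subdirectory inside a directory requires both write and execute (search) permission on it; when the execute bit is missing, even root can only proceed through the dac_override capability, which the host SELinux policy denies to iscsid_t. The audit record above shows exactly this: syscall=83 is mkdir() on x86_64, and the denied capability is dac_override.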
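The effect of these two changes can be illustrated with the usual mkdir() arithmetic, where the permission bits of a new directory are the requested mode with the process umask bits cleared. This is an illustrative sketch only: the umask values 0177 and 0077 are hypothetical examples, not values taken from the open-iscsi source.

```python
import stat


def effective_dir_mode(requested, umask):
    """Permission bits mkdir() gives a new directory: requested mode with umask bits cleared."""
    return requested & ~umask & 0o777


def can_search(mode):
    """True if the owner's execute (search) bit is set.

    Creating entries in a directory needs write and search permission on it;
    without the search bit even root has to rely on the dac_override capability.
    """
    return bool(mode & stat.S_IXUSR)


if __name__ == "__main__":
    # Hypothetical umask values, chosen only to illustrate the arithmetic.
    cases = [(0o660, 0o177), (0o770, 0o177), (0o770, 0o077)]
    for requested, umask in cases:
        mode = effective_dir_mode(requested, umask)
        print(f"mkdir mode {requested:04o}, umask {umask:04o} -> "
              f"{stat.filemode(stat.S_IFDIR | mode)} (search: {can_search(mode)})")
```

With the hypothetical old umask, even a 0770 request loses the owner's execute bit, which is why both the mkdir() mode and the umask had to change.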

Although the Fedora CoreOS releases used by earlier OKD versions worked correctly, in certain releases SELinux started blocking dac_override for iscsiadm. As a result, Longhorn volumes could not be attached using iscsiadm.

For cases where upgrading to a newer release is not feasible, ioflair has developed a workaround that allows affected OKD versions to work again.

4. Fixing the problem

There are three potential solutions to this problem. Upgrading to a working version is ideal; if that is not feasible, our recommended workaround is the next best option.

✅ Upgrade OKD to a newer version

Upgrading to a newer OKD version whose Fedora CoreOS release fixes this issue is the ideal solution. However, upgrading from OKD 4.11 to 4.12 may not be possible in your environment, so this may not be the right solution for you. At the time of writing, the current stable builds are not suitable, and it may take a while before a working release is available.

Update: Our fix is included in OKD 4.12 as of 4.12.0-0.okd-2023-03-05-022504.
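To tell whether a given OKD 4.12 build already contains the fix, its release name can be compared with the first fixed build. The sketch below assumes release names of the form shown in this article (X.Y.Z-0.okd-YYYY-MM-DD-HHMMSS) and deliberately returns None for other minor versions, because the update only covers the 4.12 stream.

```python
import re
from typing import Optional

# OKD release names as they appear in this article, e.g. 4.12.0-0.okd-2023-03-05-022504.
OKD_RE = re.compile(
    r"^(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)-0\.okd-"
    r"(?P<stamp>\d{4}-\d{2}-\d{2}-\d{6})$"
)
FIRST_FIXED_412 = "4.12.0-0.okd-2023-03-05-022504"


def _key(version):
    """Turn a release name into a sortable tuple; the fixed-width timestamp sorts chronologically."""
    m = OKD_RE.match(version)
    if m is None:
        raise ValueError(f"not an OKD release name: {version!r}")
    return (int(m["major"]), int(m["minor"]), int(m["patch"]), m["stamp"])


def includes_fix(version: str) -> Optional[bool]:
    """True/False for OKD 4.12 builds; None for other minor versions (not covered by the update)."""
    key = _key(version)
    if key[:2] != (4, 12):
        return None
    return key >= _key(FIRST_FIXED_412)


if __name__ == "__main__":
    for v in ("4.11.0-0.okd-2022-12-02-145640", FIRST_FIXED_412):
        print(v, "->", includes_fix(v))
```

The release name of a running cluster appears in the VERSION column of `oc get clusterversion`.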

⚠️ Upgrade open-iscsi to a newer version

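Upgrading individual packages in Fedora CoreOS is not possible, because OKD manages the operating system image as a whole. Since the problem stems from the interaction between the SELinux policy and an outdated open-iscsi version, the fix has to arrive with an updated Fedora CoreOS release. You can check whether any of the Fedora CoreOS updates shipped with OKD are suitable for you.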

✅ Apply SELinux Permissions using a local CIL via MachineConfig

In an affected environment, ioflair suggests granting iscsiadm the dac_override permission through a local CIL (Common Intermediate Language) SELinux policy module, deployed with a MachineConfig.
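The rule itself follows directly from the denial shown in the Problem Description: source type iscsid_t, target type iscsid_t, class capability, permission dac_override. The Python sketch below derives a CIL allow rule from such an AVC line, a simplified stand-in for what audit2allow does; it only handles single denials like the one in this article and is an illustration, not necessarily the exact module ioflair uses.

```python
import re

AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]+?)\s*\}.*?"
    r"scontext=(?P<scontext>\S+)\s+tcontext=(?P<tcontext>\S+)\s+"
    r"tclass=(?P<tclass>\S+)"
)

AVC = ('avc: denied { dac_override } for pid=231579 comm="iscsiadm" capability=1 '
       "scontext=system_u:system_r:iscsid_t:s0 tcontext=system_u:system_r:iscsid_t:s0 "
       "tclass=capability permissive=0")


def selinux_type(context):
    """Return the type field of an SELinux context user:role:type:level."""
    return context.split(":")[2]


def cil_allow_rule(avc_line):
    """Build a CIL allow rule granting exactly the permissions denied in `avc_line`."""
    m = AVC_RE.search(avc_line)
    if m is None:
        raise ValueError("not an AVC denial")
    source = selinux_type(m["scontext"])
    target = selinux_type(m["tcontext"])
    if target == source:
        target = "self"  # CIL keyword referring to the source type itself
    perms = " ".join(m["perms"].split())
    return f"(allow {source} {target} ({m['tclass']} ({perms})))"


if __name__ == "__main__":
    print(cil_allow_rule(AVC))
```

For the denial above it yields `(allow iscsid_t self (capability (dac_override)))`, which grants nothing beyond the single denied capability.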

Granting the permission this way is a safe workaround: SELinux stays enabled, and only the single missing permission is granted to the iscsid_t domain that iscsiadm runs in. Once a new release fixes the underlying issue, the MachineConfig can be removed.
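One way to deliver such a module is a MachineConfig that writes the CIL file to the nodes, plus a oneshot systemd unit that loads it with `semodule -i`. The Python sketch below builds that MachineConfig as JSON. The file path, unit name, MachineConfig name, and the choice of the worker pool are hypothetical, and the whole block is a sketch of the approach under those assumptions rather than ioflair's actual MachineConfig.

```python
import base64
import json

# Hypothetical names, chosen for this sketch only.
MODULE_NAME = "iscsiadm_dac_override"
CIL_PATH = f"/etc/selinux/local/{MODULE_NAME}.cil"
UNIT_NAME = "iscsiadm-selinux-module.service"

# Grants the permission denied in the AVC record: iscsid_t may use dac_override.
CIL_RULE = "(allow iscsid_t self (capability (dac_override)))\n"

# Oneshot unit that installs the CIL module at boot, before kubelet starts.
UNIT = f"""[Unit]
Description=Install local SELinux CIL module {MODULE_NAME}
Before=kubelet.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/semodule -i {CIL_PATH}

[Install]
WantedBy=multi-user.target
"""


def machine_config(role="worker"):
    """Return a MachineConfig that writes the CIL file and loads it at boot."""
    encoded = base64.b64encode(CIL_RULE.encode()).decode()
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "name": f"99-{role}-{MODULE_NAME.replace('_', '-')}",
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {
            "config": {
                "ignition": {"version": "3.2.0"},
                "storage": {
                    "files": [{
                        "path": CIL_PATH,
                        "mode": 0o644,  # serialized as decimal 420, as Ignition expects
                        "overwrite": True,
                        "contents": {
                            "source": "data:text/plain;charset=utf-8;base64," + encoded
                        },
                    }]
                },
                "systemd": {
                    "units": [{"name": UNIT_NAME, "enabled": True, "contents": UNIT}]
                },
            }
        },
    }


if __name__ == "__main__":
    # JSON output that `oc apply -f` can consume.
    print(json.dumps(machine_config(), indent=2))
```

Save the output to a file and apply it with `oc apply -f`; the Machine Config Operator then rolls the change out to the pool, which typically reboots the nodes one at a time. If your storage nodes belong to a different pool, pass that pool's role instead. When the MachineConfig is later removed, the module that was already installed may remain in the node's policy store until it is removed with `semodule -r iscsiadm_dac_override`.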

Conclusion

Longhorn Volumes are a great way to extend the storage capabilities of OpenShift/OKD. However, it is important to understand the underlying architecture in detail before moving forward with an implementation.

Longhorn Volumes can become stuck in an attach/detach loop when the host SELinux policy denies iscsiadm the dac_override capability. Fortunately, this can be resolved with a few simple configuration steps.

We hope this article has been useful in understanding why Longhorn Volumes can become stuck in an attach/detach loop and how to fix it.

Founder, CEO at ioflair
