Storage technologies have changed rapidly over the last three decades, and the current trend is towards software-driven data center technologies. We now have software-driven cluster file systems such as Gluster, which give you more elasticity and scalability.
In a clustered environment there is a possibility that you will face a split-brain scenario. In simple terms, split-brain occurs when two nodes of a cluster are disconnected from each other and each node thinks the other one is not working.
Let's look at what split-brain means.
What is Split-Brain?
- The difference in file data/metadata across the bricks of a replica.
- Cannot identify which brick holds the good copy, even when all bricks are available.
- Each brick accuses the other of needing healing.
- All modification FOPs (file operations) fail with an Input/output error (EIO).
- Data split-brain: The contents of the file are different on different replica pairs and automatic healing is not possible.
- Metadata split-brain: The metadata of the file (for example, a user-defined extended attribute) is different on different replica pairs and automatic healing is not possible.
- Entry split-brain: The file has a different GFID on each replica pair.
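One way to confirm an entry split-brain is to compare the GFID stored on each brick. The snippet below is a minimal sketch, assuming the GFID is stored on the brick as the trusted.gfid extended attribute and that getfattr is installed on the nodes; gfid_from_xattrs is a hypothetical helper name, not part of Gluster.

```shell
# Print the GFID (hex string) found in the output of
#   getfattr -d -m . -e hex <file-on-brick>
# which is read from stdin. Prints nothing if no trusted.gfid line is present.
gfid_from_xattrs() {
    sed -n 's/^trusted\.gfid=//p'
}

# Usage (run as root on each node, against the file's path on the brick):
#   getfattr -d -m . -e hex /path/to/brick/file | gfid_from_xattrs
# If the values printed on the two nodes differ, the file is in entry split-brain.
```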
How did the GlusterFS data domain enter the split-brain condition?
In this incident, network latency affected the storage master node, and as a result both storage domains went offline.
How to identify the files in a split-brain condition
To check whether any files on the GlusterFS volume are in split-brain, run gluster volume heal gvol0 info split-brain. As the output below shows, the command lists the files that are in a split-brain condition.
[root@KVM01 dom_md]# gluster volume heal gvol0 info split-brain
Brick KVM01:/nodirectwritedata/glusterfs/brick1/gvol0
/22a3d534-86b2-4f63-aa44-9ac555404692/images/6639bd7e-33b1-42a5-89b0-0eee2b3a7262/e9981ace-c0c0-4bc6-9e6d-e03a805f083a
/22a3d534-86b2-4f63-aa44-9ac555404692/dom_md/ids
/22a3d534-86b2-4f63-aa44-9ac555404692/images/bdea3934-edce-494b-91cf-06fb536a9f9c/c04307ed-e8dc-459d-8bd5-02446b2b9175
/22a3d534-86b2-4f63-aa44-9ac555404692/images/a9b5b747-2fae-4b32-b839-2ea03dfcf35e/cb1b3014-1fbd-44d6-854c-fe55dc22f4a3
/22a3d534-86b2-4f63-aa44-9ac555404692/images/d28ab741-0d69-4ad6-97e3-4449b42b782f/10bc2496-ff19-4087-bc55-aab201b39936
/22a3d534-86b2-4f63-aa44-9ac555404692/images/7c25b5da-aabd-49ad-bf4a-f458f382e525/a44a71d7-98d4-47cd-aeae-f8fe5ac4bf1e
/22a3d534-86b2-4f63-aa44-9ac555404692/images/d7882784-cf18-4c8c-af22-f46fe3a96c8e/4fa5c17b-2739-46a7-8c20-3e943cc764b5
/22a3d534-86b2-4f63-aa44-9ac555404692/images/05caeb56-9287-484b-aef0-8f389d27f1bf/d370a0d8-889d-488d-bcaa-4ac652f7c5fe
/22a3d534-86b2-4f63-aa44-9ac555404692/dom_md/leases
/22a3d534-86b2-4f63-aa44-9ac555404692/dom_md/outbox
Status: Connected
Number of entries in split-brain: 10
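When the list is long, it helps to turn this output into one line per affected file. The following is a rough sketch that reads the heal info output from stdin; list_split_brain_entries is a hypothetical helper name, and the script assumes entries start with either "/" (a path) or "<gfid:" (an entry reported by GFID only).

```shell
# Print "<brick><TAB><entry>" for every entry listed by
#   gluster volume heal <VOLNAME> info split-brain
# The command output is read from stdin; status and summary lines are ignored.
list_split_brain_entries() {
    awk '
        # A "Brick host:/path" line starts a new section; remember the brick.
        /^Brick / { brick = substr($0, 7); next }
        # Entries are file paths or <gfid:...> identifiers.
        /^\// || /^<gfid:/ { print brick "\t" $0 }
    '
}

# Usage (on a cluster node):
#   gluster volume heal gvol0 info split-brain | list_split_brain_entries
```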
Recovery Process
There are a few ways to recover from a GlusterFS split-brain. All of the recovery scenarios are covered in the GlusterFS documentation: https://docs.gluster.org/en/main/Troubleshooting/resolving-splitbrain/.
This was a rare situation in which we needed to pick the most recently modified copy as the recovery source. The last modification time of each copy can be checked with the stat command.
As the stat output below shows, the copy on brick 2 has the latest modification timestamp (Modify: 10:08:08, versus 10:05:15 on brick 1). This brick can be used as the source to heal the files and get out of the split-brain condition.
[root@KVM01 dom_md]# stat /nodirectwritedata/glusterfs/brick1/gvol0/22a3d534-86b2-4f63-aa44-9ac555404692/images/a9b5b747-2fae-4b32-b839-2ea03dfcf35e/cb1b3014-1fbd-44d6-854c-fe55dc22f4a3
File: /nodirectwritedata/glusterfs/brick1/gvol0/22a3d534-86b2-4f63-aa44-9ac555404692/images/a9b5b747-2fae-4b32-b839-2ea03dfcf35e/cb1b3014-1fbd-44d6-854c-fe55dc22f4a3
Size: 268435456000 Blocks: 524301176 IO Block: 4096 regular file
Device: fc07h/64519d Inode: 3224096484 Links: 2
Access: (0660/-rw-rw----) Uid: ( 36/ vdsm) Gid: ( 36/ kvm)
Context: system_u:object_r:glusterd_brick_t:s0
Access: 2022-10-13 10:05:08.396792385 -0400
Modify: 2022-10-13 10:05:15.705792522 -0400
Change: 2022-10-14 09:59:16.348788467 -0400
Birth: 2022-09-22 11:18:48.043805922 -0400
[root@KVM02 log]# stat /nodirectwritedata/glusterfs/brick2/gvol0/22a3d534-86b2-4f63-aa44-9ac555404692/images/a9b5b747-2fae-4b32-b839-2ea03dfcf35e/cb1b3014-1fbd-44d6-854c-fe55dc22f4a3
File: /nodirectwritedata/glusterfs/brick2/gvol0/22a3d534-86b2-4f63-aa44-9ac555404692/images/a9b5b747-2fae-4b32-b839-2ea03dfcf35e/cb1b3014-1fbd-44d6-854c-fe55dc22f4a3
Size: 268435456000 Blocks: 524300992 IO Block: 4096 regular file
Device: fc07h/64519d Inode: 4109 Links: 2
Access: (0660/-rw-rw----) Uid: ( 36/ vdsm) Gid: ( 36/ kvm)
Context: system_u:object_r:glusterd_brick_t:s0
Access: 2022-10-05 15:15:46.368629123 -0400
Modify: 2022-10-13 10:08:08.513029740 -0400
Change: 2022-10-14 10:01:50.679076234 -0400
Birth: 2022-09-22 11:21:29.195597907 -0400
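Comparing timestamps by eye gets tedious with many files, so the comparison can be scripted. This is only a sketch: latest_copy is a hypothetical helper, the labels are whatever you choose to call each brick, and it assumes you have already collected each copy's modification time in epoch seconds with GNU stat (stat -c '%Y') on the node that hosts the brick.

```shell
# Read "<label> <mtime_epoch_seconds>" lines on stdin and print the label
# of the copy with the most recent modification time.
latest_copy() {
    awk '
        NR == 1 || $2 + 0 > max { max = $2 + 0; label = $1 }
        END { if (NR > 0) print label }
    '
}

# Usage: run stat on each node against that node's brick path, then compare.
#   mtime1=$(stat -c '%Y' /path/on/brick1/file)   # on the first node
#   mtime2=$(stat -c '%Y' /path/on/brick2/file)   # on the second node
#   printf 'brick1 %s\nbrick2 %s\n' "$mtime1" "$mtime2" | latest_copy
```

Whole seconds are enough here because the two copies differ by several minutes; if the timestamps are equal, the first label read is printed, so treat ties as "needs a manual look".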
Healing
While the healing is running, make sure your terminal session stays up without any interruption. You can use tmux to keep the session alive; this cheat sheet is a useful introduction to tmux: https://tmuxcheatsheet.com/. Healing time varies with the file size; in our case a 500 GB VM disk took 4 hours to heal.
gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE>
Sample output of the heal
[root@KVM02 ~]#gluster volume heal gvol0 split-brain source-brick KVM01.local.com:/nodirectwritedata/glusterfs/brick2/gvol0 /22a3d534-86b2-4f63-aa44-9ac555404692/images/a9b5b747-2fae-4b32-b839-2ea03dfcf35e/cb1b3014-1fbd-44d6-854c-fe55dc22f4a3
Healed /22a3d534-86b2-4f63-aa44-9ac555404692/images/a9b5b747-2fae-4b32-b839-2ea03dfcf35e/cb1b3014-1fbd-44d6-854c-fe55dc22f4a3.
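With ten files in split-brain, typing the heal command by hand for each one is error-prone. Below is a small sketch that only prints the commands so they can be reviewed first; print_heal_commands is a hypothetical helper name, and it assumes the same source brick is correct for every file you feed it, which you should verify with stat for each file.

```shell
# Read file paths (relative to the volume root) on stdin and print one
# heal command per file. Nothing is executed by this function.
#   $1 = volume name, $2 = source brick as HOSTNAME:BRICKNAME
print_heal_commands() {
    volume=$1
    source_brick=$2
    while IFS= read -r file; do
        [ -n "$file" ] || continue   # skip blank lines
        printf 'gluster volume heal %s split-brain source-brick %s %s\n' \
            "$volume" "$source_brick" "$file"
    done
}

# Usage, inside a tmux session (for example one started with: tmux new -s heal):
#   print_heal_commands gvol0 HOSTNAME:BRICKNAME < files-to-heal.txt
# Review the printed commands, then run them one at a time.
```

Printing instead of executing is deliberate: choosing the wrong source brick overwrites the good copy, so a manual review step is worth the extra minute.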
Conclusion
There can be situations where a GlusterFS replicated volume moves to an inconsistent state because of network traffic problems. This can be avoided by having 3 bricks for the GlusterFS volume or by enabling fencing via the OLVM engine. In the next blog post I will elaborate on how you can increase the network threshold to 100%, which gives you some breathing space to avoid split-brain conditions.