XFS filesystem shutdown when filesystem is close to full


SOLUTION Verified - Updated September 27, 2019 at 10:09


Environment

  • Red Hat Enterprise Linux 7.5
  • kernel-3.10.0-862.2.3.el7
  • XFS Filesystems with V4 superblocks

Issue

  • We are seeing XFS filesystem forced shutdowns when a filesystem is close to full


kernel: XFS (sdl1): Mounting V4 Filesystem
kernel: XFS (sdl1): Ending clean mount
...
kernel: XFS (sdl1): Internal error xfs_trans_cancel at line 984 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x4af/0x750 [xfs]
kernel: CPU: 11 PID: 32290 Comm: java Kdump: loaded Not tainted 3.10.0-862.2.3.el7.x86_64 #1
kernel: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 02/17/2017
kernel: Call Trace:
kernel: [<ffffffff9bf0d78e>] dump_stack+0x19/0x1b
kernel: [<ffffffffc06203db>] xfs_error_report+0x3b/0x40 [xfs]
kernel: [<ffffffffc063104f>] ? xfs_create+0x4af/0x750 [xfs]
kernel: [<ffffffffc063c85d>] xfs_trans_cancel+0xbd/0xe0 [xfs]
kernel: [<ffffffffc063104f>] xfs_create+0x4af/0x750 [xfs]
kernel: [<ffffffffc062dfd0>] xfs_generic_create+0xd0/0x2b0 [xfs]
kernel: [<ffffffffc062e1e4>] xfs_vn_mknod+0x14/0x20 [xfs]
kernel: [<ffffffffc062e223>] xfs_vn_create+0x13/0x20 [xfs]
kernel: [<ffffffff9ba27bd3>] vfs_create+0xd3/0x140
kernel: [<ffffffff9ba2b9c0>] do_last+0x10f0/0x12c0
kernel: [<ffffffff9ba2bc67>] path_openat+0xd7/0x640
kernel: [<ffffffff9ba2a2dc>] ? filename_lookup+0x7c/0xc0
kernel: [<ffffffff9ba2d7fd>] do_filp_open+0x4d/0xb0
kernel: [<ffffffff9ba3ac77>] ? __alloc_fd+0x47/0x170
kernel: [<ffffffff9ba19b07>] do_sys_open+0x137/0x240
kernel: [<ffffffff9ba19c2e>] SyS_open+0x1e/0x20
kernel: [<ffffffff9bf1f795>] system_call_fastpath+0x1c/0x21
kernel: XFS (sdl1): xfs_do_force_shutdown(0x8) called from line 985 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffffc063c876
kernel: XFS (sdl1): Corruption of in-memory data detected.  Shutting down filesystem
kernel: XFS (sdl1): Please umount the filesystem and rectify the problem(s)
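
To check whether a host has logged the same kind of forced shutdown, the kernel log can be scanned for the `xfs_do_force_shutdown` message shown above. The following is a minimal sketch, not part of the original solution; the function name `find_xfs_shutdowns` is hypothetical, and it only matches the message format seen in this log.

```shell
#!/bin/sh
# find_xfs_shutdowns (hypothetical helper): reads kernel log lines on stdin
# and prints "<device> <shutdown flags>" once for every device that logged
# an xfs_do_force_shutdown message in the format shown in this article.
find_xfs_shutdowns() {
    sed -n 's/.*XFS (\([^)]*\)): xfs_do_force_shutdown(\(0x[0-9a-fA-F]*\)).*/\1 \2/p' | sort -u
}
```

It could be fed from the running kernel log, for example `dmesg -t | find_xfs_shutdowns` or `journalctl -k | find_xfs_shutdowns`.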

Resolution

Upgrade to kernel-3.10.0-1062.el7 (Errata RHSA-2019:2029) or later.

Alternatively, on the RHEL 7.6 z-stream, upgrade to kernel-3.10.0-957.35.1.el7 (Errata RHSA-2019:2837) or later.

Workaround

  • Try to maintain at least 5% free space on the filesystem.
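
The 5% guideline can be monitored with a small script. This is a sketch under stated assumptions: the function names `free_pct` and `check_mount` are hypothetical, the percentage is computed from the `df -Pk` block counts and rounded down, and the default threshold of 5 simply mirrors the workaround above.

```shell
#!/bin/sh
# free_pct TOTAL_KB AVAIL_KB (hypothetical helper)
# Prints the integer percentage of free space, rounded down.
free_pct() {
    total=$1
    avail=$2
    if [ "$total" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( avail * 100 / total ))
}

# check_mount MOUNTPOINT [MIN_PCT] (hypothetical helper)
# Warns and returns 1 when free space is below MIN_PCT (default 5).
check_mount() {
    mnt=$1
    min=${2:-5}
    # df -Pk prints one POSIX-format line per filesystem:
    # filesystem, 1024-blocks, used, available, capacity, mount point
    set -- $(df -Pk "$mnt" | awk 'NR==2 {print $2, $4}')
    [ -n "$1" ] && [ -n "$2" ] || return 2
    pct=$(free_pct "$1" "$2")
    if [ "$pct" -lt "$min" ]; then
        echo "WARNING: $mnt has ${pct}% free (below ${min}%)"
        return 1
    fi
    echo "OK: $mnt has ${pct}% free"
}
```

For example, `check_mount /mnt 5` could be run from cron against each XFS mount point so that an alert fires before the filesystem gets close to full.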

Diagnostic Steps

Working with a filesystem metadump


$ bzcat metadump.bz2 | xfs_mdrestore - metadump.image

$ xfs_db -c version metadump.image
versionnum [0xbdb4+0x8a] = V4,NLINK,DIRV2,ATTR,ALIGN,DALIGN,LOGV2,EXTFLG,SECTOR,MOREBITS,ATTR2,LAZYSBCOUNT,PROJID32BIT

$ xfs_db -c sb -c "print agcount" metadump.image
count = 32

xfs_db> p sectsize
sectsize = 4096

Each allocation group appears to have free blocks available, but the counts appear to be close to the limit, because XFS needs to keep a minimum number of free blocks available for metadata changes (even to delete files).


$ for AG in $(seq 0 31); do xfs_db -c "agf $AG" -c "print freeblks" metadump.image; done
freeblks = 416
freeblks = 121
...
freeblks = 75
freeblks = 79
...
$ for AG in $(seq 0 31); do xfs_db -c "agi $AG" -c "print freecount" metadump.image 2>/dev/null; done
freecount = 22519
freecount = 4036
...
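
The per-AG `freeblks` output is easier to read when summarised. The sketch below is an illustration only: the function name `summarize_freeblks` is hypothetical, and the threshold argument (default 100) is an arbitrary value chosen for the example, not the actual reserve that XFS enforces.

```shell
#!/bin/sh
# summarize_freeblks [LIMIT] (hypothetical helper)
# Reads "freeblks = N" lines on stdin, one per allocation group in AG order,
# and prints the AG count, the total free blocks, the smallest value with
# its AG number, and how many AGs are below LIMIT (illustrative default 100).
summarize_freeblks() {
    awk -v limit="${1:-100}" '
        $1 == "freeblks" && $2 == "=" {
            n = $3 + 0
            total += n
            if (min == "" || n < min) { min = n; min_ag = ag }
            if (n < limit) low++
            ag++
        }
        END {
            if (ag == 0) { print "no freeblks lines found"; exit 1 }
            printf "ags=%d total=%d min=%d (ag %d) below_%d=%d\n", ag, total, min, min_ag, limit, low
        }'
}
```

The loop shown above can be piped straight into it: `for AG in $(seq 0 31); do xfs_db -c "agf $AG" -c "print freeblks" metadump.image; done | summarize_freeblks 100`.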

Mounting the filesystem image in a test environment shows that it is close to full: block usage is at 100%, while inode usage is at 81%.


[root@rhel7 ~]# uname -r
3.10.0-862.2.3.el7.x86_64
[root@rhel7 ~]# mount metadump.image /mnt
[root@rhel7 ~]# df -i /mnt
Filesystem     Inodes  IUsed IFree IUse% Mounted on
/dev/loop0     372560 299423 73137   81% /mnt
[root@rhel7 ~]# df /mnt
Filesystem      1K-blocks       Used Available Use% Mounted on
/dev/loop0     3905070088 3905070036        52 100% /mnt

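The problem could not be triggered easily in the test environment. Attempts to create files fail with "No space left on device", but the filesystem does not shut down.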


[root@rhel7 ~]# touch /mnt/a
touch: cannot touch ‘/mnt/a’: No space left on device
[root@rhel7 ~]# find /mnt -type d -exec touch '{}'/a_file \; > touch_in_each_dir 2>&1 
[root@rhel7 ~]# tail -1 touch_in_each_dir 
touch: cannot touch ‘/mnt/dir1/dir2/a_file’: No space left on device
[root@rhel7 ~]# dmesg -t | tail -2
XFS (loop0): Mounting V4 Filesystem
XFS (loop0): Ending clean mount