summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2026-03-04ceph: supply snapshot context in ceph_zero_partial_object()ethanwu
[ Upstream commit f16bd3fa74a2084ee7e16a8a2be7e7399b970907 ] The ceph_zero_partial_object function was missing proper snapshot context for its OSD write operations, which could lead to data inconsistencies in snapshots. Reproducer: ../src/vstart.sh --new -x --localhost --bluestore ./bin/ceph auth caps client.fs_a mds 'allow rwps fsname=a' mon 'allow r fsname=a' osd 'allow rw tag cephfs data=a' mount -t ceph fs_a@.a=/ /mnt/mycephfs/ -o conf=./ceph.conf dd if=/dev/urandom of=/mnt/mycephfs/foo bs=64K count=1 mkdir /mnt/mycephfs/.snap/snap1 md5sum /mnt/mycephfs/.snap/snap1/foo fallocate -p -o 0 -l 4096 /mnt/mycephfs/foo echo 3 > /proc/sys/vm/drop/caches md5sum /mnt/mycephfs/.snap/snap1/foo # get different md5sum!! Cc: stable@vger.kernel.org Fixes: ad7a60de882ac ("ceph: punch hole support") Signed-off-by: ethanwu <ethanwu@synology.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Tested-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04btrfs: continue trimming remaining devices on failurejinbaohong
[ Upstream commit 912d1c6680bdb40b72b1b9204706f32b6eb842c3 ] Commit 93bba24d4b5a ("btrfs: Enhance btrfs_trim_fs function to handle error better") intended to make device trimming continue even if one device fails, tracking failures and reporting them at the end. However, it used 'break' instead of 'continue', causing the loop to exit on the first device failure. Fix this by replacing 'break' with 'continue'. Fixes: 93bba24d4b5a ("btrfs: Enhance btrfs_trim_fs function to handle error better") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: jinbaohong <jinbaohong@synology.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04ocfs2: fix reflink preserve cleanup issueHeming Zhao
[ Upstream commit 5138c936c2c82c9be8883921854bc6f7e1177d8c ] commit c06c303832ec ("ocfs2: fix xattr array entry __counted_by error") doesn't handle all cases and the cleanup job for preserved xattr entries still has bug: - the 'last' pointer should be shifted by one unit after cleanup an array entry. - current code logic doesn't cleanup the first entry when xh_count is 1. Note, commit c06c303832ec is also a bug fix for 0fe9b66c65f3. Link: https://lkml.kernel.org/r/20251210015725.8409-2-heming.zhao@suse.com Fixes: 0fe9b66c65f3 ("ocfs2: Add preserve to reflink.") Signed-off-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04nfsd: fix return error code for nfsd_map_name_to_[ug]idAnthony Iliopoulos
[ Upstream commit 404d779466646bf1461f2090ff137e99acaecf42 ] idmap lookups can time out while the cache is waiting for a userspace upcall reply. In that case cache_check() returns -ETIMEDOUT to callers. The nfsd_map_name_to_[ug]id functions currently proceed with attempting to map the id to a kuid despite a potentially temporary failure to perform the idmap lookup. This results in the code returning the error NFSERR_BADOWNER which can cause client operations to return to userspace with failure. Fix this by returning the failure status before attempting kuid mapping. This will return NFSERR_JUKEBOX on idmap lookup timeout so that clients can retry the operation instead of aborting it. Fixes: 65e10f6d0ab0 ("nfsd: Convert idmap to use kuids and kgids") Cc: stable@vger.kernel.org Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04xfs: fix remote xattr valuelblk checkDarrick J. Wong
[ Upstream commit bd3138e8912c9db182eac5fed1337645a98b7a4f ] In debugging other problems with generic/753, it turns out that it's possible for the system go to down in the middle of a remote xattr set operation such that the leaf block entry is marked incomplete and valueblk is set to zero. Make this no longer a failure. Cc: <stable@vger.kernel.org> # v4.15 Fixes: 13791d3b833428 ("xfs: scrub extended attribute leaf space") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04xfs: fix freemap adjustments when adding xattrs to leaf blocksDarrick J. Wong
[ Upstream commit 3eefc0c2b78444b64feeb3783c017d6adc3cd3ce ] xfs/592 and xfs/794 both trip this assertion in the leaf block freemap adjustment code after ~20 minutes of running on my test VMs: ASSERT(ichdr->firstused >= ichdr->count * sizeof(xfs_attr_leaf_entry_t) + xfs_attr3_leaf_hdr_size(leaf)); Upon enabling quite a lot more debugging code, I narrowed this down to fsstress trying to set a local extended attribute with namelen=3 and valuelen=71. This results in an entry size of 80 bytes. At the start of xfs_attr3_leaf_add_work, the freemap looks like this: i 0 base 448 size 0 rhs 448 count 46 i 1 base 388 size 132 rhs 448 count 46 i 2 base 2120 size 4 rhs 448 count 46 firstused = 520 where "rhs" is the first byte past the end of the leaf entry array. This is inconsistent -- the entries array ends at byte 448, but freemap[1] says there's free space starting at byte 388! By the end of the function, the freemap is in worse shape: i 0 base 456 size 0 rhs 456 count 47 i 1 base 388 size 52 rhs 456 count 47 i 2 base 2120 size 4 rhs 456 count 47 firstused = 440 Important note: 388 is not aligned with the entries array element size of 8 bytes. Based on the incorrect freemap, the name area starts at byte 440, which is below the end of the entries array! That's why the assertion triggers and the filesystem shuts down. How did we end up here? First, recall from the previous patch that the freemap array in an xattr leaf block is not intended to be a comprehensive map of all free space in the leaf block. In other words, it's perfectly legal to have a leaf block with: * 376 bytes in use by the entries array * freemap[0] has [base = 376, size = 8] * freemap[1] has [base = 388, size = 1500] * the space between 376 and 388 is free, but the freemap stopped tracking that some time ago If we add one xattr, the entries array grows to 384 bytes, and freemap[0] becomes [base = 384, size = 0]. So far, so good. But if we add a second xattr, the entries array grows to 392 bytes, and freemap[0] gets pushed up to [base = 392, size = 0]. This is bad, because freemap[1] hasn't been updated, and now the entries array and the free space claim the same space. The fix here is to adjust all freemap entries so that none of them collide with the entries array. Note that this fix relies on commit 2a2b5932db6758 ("xfs: fix attr leaf header freemap.size underflow") and the previous patch that resets zero length freemap entries to have base = 0. Cc: <stable@vger.kernel.org> # v2.6.12 Fixes: 1da177e4c3f415 ("Linux-2.6.12-rc2") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04xfs: delete attr leaf freemap entries when emptyDarrick J. Wong
[ Upstream commit 6f13c1d2a6271c2e73226864a0e83de2770b6f34 ] Back in commit 2a2b5932db6758 ("xfs: fix attr leaf header freemap.size underflow"), Brian Foster observed that it's possible for a small freemap at the end of the end of the xattr entries array to experience a size underflow when subtracting the space consumed by an expansion of the entries array. There are only three freemap entries, which means that it is not a complete index of all free space in the leaf block. This code can leave behind a zero-length freemap entry with a nonzero base. Subsequent setxattr operations can increase the base up to the point that it overlaps with another freemap entry. This isn't in and of itself a problem because the code in _leaf_add that finds free space ignores any freemap entry with zero size. However, there's another bug in the freemap update code in _leaf_add, which is that it fails to update a freemap entry that begins midway through the xattr entry that was just appended to the array. That can result in the freemap containing two entries with the same base but different sizes (0 for the "pushed-up" entry, nonzero for the entry that's actually tracking free space). A subsequent _leaf_add can then allocate xattr namevalue entries on top of the entries array, leading to data loss. But fixing that is for later. For now, eliminate the possibility of confusion by zeroing out the base of any freemap entry that has zero size. Because the freemap is not intended to be a complete index of free space, a subsequent failure to find any free space for a new xattr will trigger block compaction, which regenerates the freemap. It looks like this bug has been in the codebase for quite a long time. Cc: <stable@vger.kernel.org> # v2.6.12 Fixes: 1da177e4c3f415 ("Linux-2.6.12-rc2") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04xfs: mark data structures corrupt on EIO and ENODATADarrick J. Wong
[ Upstream commit f39854a3fb2f06dc69b81ada002b641ba5b4696b ] I learned a few things this year: first, blk_status_to_errno can return ENODATA for critical media errors; and second, the scrub code doesn't mark data structures as corrupt on ENODATA or EIO. Currently, scrub failing to capture these errors isn't all that impactful -- the checking code will exit to userspace with EIO/ENODATA, and xfs_scrub will log a complaint and exit with nonzero status. Most people treat fsck tools failing as a sign that the fs is corrupt, but online fsck should mark the metadata bad and keep moving. Cc: stable@vger.kernel.org # v4.15 Fixes: 4700d22980d459 ("xfs: create helpers to record and deal with scrub problems") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04jfs: nlink overflow in jfs_renameJori Koolstra
[ Upstream commit 9218dc26fd922b09858ecd3666ed57dfd8098da8 ] If nlink is maximal for a directory (-1) and inside that directory you perform a rename for some child directory (not moving from the parent), then the nlink of the first directory is first incremented and later decremented. Normally this is fine, but when nlink = -1 this causes a wrap around to 0, and then drop_nlink issues a warning. After applying the patch syzbot no longer issues any warnings. I also ran some basic fs tests to look for any regressions. Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl> Reported-by: syzbot+9131ddfd7870623b719f@syzkaller.appspotmail.com Closes: https://syzbot.org/bug?extid=9131ddfd7870623b719f Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04jfs: Add missing set_freezable() for freezable kthreadHaotian Zhang
[ Upstream commit eb0cfcf265714b419cc3549895a00632e76732ae ] The jfsIOWait() thread calls try_to_freeze() but lacks set_freezable(), causing it to remain non-freezable by default. This prevents proper freezing during system suspend. Add set_freezable() to make the thread freezable as intended. Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04pstore: ram_core: fix incorrect success return when vmap() failsRuipeng Qi
[ Upstream commit 05363abc7625cf18c96e67f50673cd07f11da5e9 ] In persistent_ram_vmap(), vmap() may return NULL on failure. If offset is non-zero, adding offset_in_page(start) causes the function to return a non-NULL pointer even though the mapping failed. persistent_ram_buffer_map() therefore incorrectly returns success. Subsequent access to prz->buffer may dereference an invalid address and cause crashes. Add proper NULL checking for vmap() failures. Signed-off-by: Ruipeng Qi <ruipengqi3@gmail.com> Link: https://patch.msgid.link/20260203020358.3315299-1-ruipengqi3@gmail.com Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04minix: Add required sanity checking to minix_check_superblock()Jori Koolstra
[ Upstream commit 8c97a6ddc95690a938ded44b4e3202f03f15078c ] The fs/minix implementation of the minix filesystem does not currently support any other value for s_log_zone_size than 0. This is also the only value supported in util-linux; see mkfs.minix.c line 511. In addition, this patch adds some sanity checking for the other minix superblock fields, and moves the minix_blocks_needed() checks for the zmap and imap also to minix_check_super_block(). This also closes a related syzbot bug report. Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl> Link: https://patch.msgid.link/20251208153947.108343-1-jkoolstra@xs4all.nl Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: syzbot+5ad0824204c7bf9b67f2@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=5ad0824204c7bf9b67f2 Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04hfsplus: pretend special inodes as regular filesTetsuo Handa
[ Upstream commit ed8889ca21b6ab37bc1435c4009ce37a79acb9e6 ] Since commit af153bb63a33 ("vfs: catch invalid modes in may_open()") requires any inode be one of S_IFDIR/S_IFLNK/S_IFREG/S_IFCHR/S_IFBLK/ S_IFIFO/S_IFSOCK type, use S_IFREG for special inodes. Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/d0a07b1b-8b73-4002-8e29-e2bd56871262@I-love.SAKURA.ne.jp Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04hfsplus: fix volume corruption issue for generic/498Viacheslav Dubeyko
[ Upstream commit 9a8c4ad44721da4c48e1ff240ac76286c82837fe ] The xfstests' test-case generic/498 leaves HFS+ volume in corrupted state: sudo ./check generic/498 FSTYP -- hfsplus PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.18.0-rc1+ #18 SMP PREEMPT_DYNAMIC Thu Dec 4 12:24:45 PST 2025 MKFS_OPTIONS -- /dev/loop51 MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch generic/498 _check_generic_filesystem: filesystem on /dev/loop51 is inconsistent (see XFSTESTS-2/xfstests-dev/results//generic/498.full for details) Ran: generic/498 Failures: generic/498 Failed 1 of 1 tests sudo fsck.hfsplus -d /dev/loop51 ** /dev/loop51 Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K. Executing fsck_hfs (version 540.1-Linux). ** Checking non-journaled HFS Plus Volume. The volume name is untitled ** Checking extents overflow file. ** Checking catalog file. Invalid leaf record count (It should be 16 instead of 2) ** Checking multi-linked files. CheckHardLinks: found 1 pre-Leopard file inodes. ** Checking catalog hierarchy. ** Checking extended attributes file. ** Checking volume bitmap. ** Checking volume information. Verify Status: VIStat = 0x0000, ABTStat = 0x0000 EBTStat = 0x0000 CBTStat = 0x8000 CatStat = 0x00000000 ** Repairing volume. ** Rechecking volume. ** Checking non-journaled HFS Plus Volume. The volume name is untitled ** Checking extents overflow file. ** Checking catalog file. ** Checking multi-linked files. CheckHardLinks: found 1 pre-Leopard file inodes. ** Checking catalog hierarchy. ** Checking extended attributes file. ** Checking volume bitmap. ** Checking volume information. ** The volume untitled was repaired successfully. The generic/498 test executes such steps on final phase: mkdir $SCRATCH_MNT/A mkdir $SCRATCH_MNT/B mkdir $SCRATCH_MNT/A/C touch $SCRATCH_MNT/B/foo $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/B/foo ln $SCRATCH_MNT/B/foo $SCRATCH_MNT/A/C/foo $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/A "Simulate a power failure and mount the filesystem to check that what we explicitly fsync'ed exists." _flakey_drop_and_remount The FSCK tool complains about "Invalid leaf record count". HFS+ b-tree header contains leaf_count field is updated by hfs_brec_insert() and hfs_brec_remove(). The hfs_brec_insert() is involved into hard link creation process. However, modified in-core leaf_count field is stored into HFS+ b-tree header by hfs_btree_write() method. But, unfortunately, hfs_btree_write() hasn't been called by hfsplus_cat_write_inode() and hfsplus_file_fsync() stores not fully consistent state of the Catalog File's b-tree. This patch adds calling hfs_btree_write() method in the hfsplus_cat_write_inode() with the goal of storing consistent state of Catalog File's b-tree. Finally, it makes FSCK tool happy. sudo ./check generic/498 FSTYP -- hfsplus PLATFORM -- Linux/x86_64 hfsplus-testing-0001 6.18.0-rc1+ #22 SMP PREEMPT_DYNAMIC Sat Dec 6 17:01:31 PST 2025 MKFS_OPTIONS -- /dev/loop51 MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch generic/498 33s ... 31s Ran: generic/498 Passed all 1 tests Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20251207035821.3863657-1-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04ext4: fix memory leak in ext4_ext_shift_extents()Zilin Guan
commit ca81109d4a8f192dc1cbad4a1ee25246363c2833 upstream. In ext4_ext_shift_extents(), if the extent is NULL in the while loop, the function returns immediately without releasing the path obtained via ext4_find_extent(), leading to a memory leak. Fix this by jumping to the out label to ensure the path is properly released. Fixes: a18ed359bdddc ("ext4: always check ext4_ext_find_extent result") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20251225084800.905701-1-zilin@seu.edu.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-04ext4: don't cache extent during splitting extentZhang Yi
commit 8b4b19a2f96348d70bfa306ef7d4a13b0bcbea79 upstream. Caching extents during the splitting process is risky, as it may result in stale extents remaining in the status tree. Moreover, in most cases, the corresponding extent block entries are likely already cached before the split happens, making caching here not particularly useful. Assume we have an unwritten extent, and then DIO writes the first half. [UUUUUUUUUUUUUUUU] on-disk extent U: unwritten extent [UUUUUUUUUUUUUUUU] extent status tree |<- ->| ----> dio write this range First, when ext4_split_extent_at() splits this extent, it truncates the existing extent and then inserts a new one. During this process, this extent status entry may be shrunk, and calls to ext4_find_extent() and ext4_cache_extents() may occur, which could potentially insert the truncated range as a hole into the extent status tree. After the split is completed, this hole is not replaced with the correct status. [UUUUUUU|UUUUUUUU] on-disk extent U: unwritten extent [UUUUUUU|HHHHHHHH] extent status tree H: hole Then, the outer calling functions will not correct this remaining hole extent either. Finally, if we perform a delayed buffer write on this latter part, it will re-insert the delayed extent and cause an error in space accounting. In adition, if the unwritten extent cache is not shrunk during the splitting, ext4_cache_extents() also conflicts with existing extents when caching extents. In the future, we will add checks when caching extents, which will trigger a warning. Therefore, Do not cache extents that are being split. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Cc: stable@kernel.org Message-ID: <20251129103247.686136-6-yi.zhang@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-04btrfs: fix invalid leaf access in btrfs_quota_enable() if ref key not foundFilipe Manana
[ Upstream commit ecb7c2484cfc83a93658907580035a8adf1e0a92 ] If btrfs_search_slot_for_read() returns 1, it means we did not find any key greater than or equals to the key we asked for, meaning we have reached the end of the tree and therefore the path is not valid. If this happens we need to break out of the loop and stop, instead of continuing and accessing an invalid path. Fixes: 5223cc60b40a ("btrfs: drop the path before adding qgroup items when enabling qgroups") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04ovl: Fix uninit-value in ovl_fill_realQing Wang
[ Upstream commit 1992330d90dd766fcf1730fd7bf2d6af65370ac4 ] Syzbot reported a KMSAN uninit-value issue in ovl_fill_real. This iusse's call chain is: __do_sys_getdents64() -> iterate_dir() ... -> ext4_readdir() -> fscrypt_fname_alloc_buffer() // alloc -> fscrypt_fname_disk_to_usr // write without tail '\0' -> dir_emit() -> ovl_fill_real() // read by strcmp() The string is used to store the decrypted directory entry name for an encrypted inode. As shown in the call chain, fscrypt_fname_disk_to_usr() write it without null-terminate. However, ovl_fill_real() uses strcmp() to compare the name against "..", which assumes a null-terminated string and may trigger a KMSAN uninit-value warning when the buffer tail contains uninit data. Reported-by: syzbot+d130f98b2c265fae5297@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d130f98b2c265fae5297 Fixes: 4edb83bb1041 ("ovl: constant d_ino for non-merge dirs") Signed-off-by: Qing Wang <wangqing7171@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://patch.msgid.link/20260128132406.23768-2-amir73il@gmail.com Acked-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAINOlga Kornievskaia
[ Upstream commit 5248d8474e594d156bee1ed10339cc16e207a28b ] It is possible to have a task get stuck on waiting on the NFS_LAYOUT_DRAIN in the following scenario 1. cpu a: waiter test NFS_LAYOUT_DRAIN (1) and plh_outstanding (1) 2. cpu b: atomic_dec_and_test() -> clear bit -> wake up 3. cpu c: sets NFS_LAYOUT_DRAIN again 4. cpu a: calls wait_on_bit() sleeps forever. To expand on this we have say 2 outstanding pnfs write IO that get ESTALE which causes both to call pnfs_destroy_layout() and set the NFS_LAYOUT_DRAIN bit but the 1st one doesn't call the pnfs_put_layout_hdr() yet (as that would prevent the 2nd ESTALE write from trying to call pnfs_destroy_layout()). If the 1st ESTALE write is the one that initially sets the NFS_LAYOUT_DRAIN so that new IO on this file initiates new LAYOUTGET. Another new write would find NFS_LAYOUT_DRAIN set and phl_outstanding>0 (step 1) and would wait_on_bit(). LAYOUTGET completes doing step 2. Now, the 2nd of ESTALE writes is calling pnfs_destory_layout() and set the NFS_LAYOUT_DRAIN bit (step 3). Finally, the waiting write wakes up to check the bit and goes back to sleep. The problem revolves around the fact that if NFS_LAYOUT_INVALID_STID was already set, it should not do the work of pnfs_mark_layout_stateid_invalid(), thus NFS_LAYOUT_DRAIN will not be set more than once for an invalid layout. Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> Fixes: 880265c77ac4 ("pNFS: Avoid a live lock condition in pnfs_update_layout()") Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04procfs: fix missing RCU protection when reading real_parent in do_task_stat()Jinliang Zheng
[ Upstream commit 76149d53502cf17ef3ae454ff384551236fba867 ] When reading /proc/[pid]/stat, do_task_stat() accesses task->real_parent without proper RCU protection, which leads to: cpu 0 cpu 1 ----- ----- do_task_stat var = task->real_parent release_task call_rcu(delayed_put_task_struct) task_tgid_nr_ns(var) rcu_read_lock <--- Too late to protect task->real_parent! task_pid_ptr <--- UAF! rcu_read_unlock This patch uses task_ppid_nr_ns() instead of task_tgid_nr_ns() to add proper RCU protection for accessing task->real_parent. Link: https://lkml.kernel.org/r/20260128083007.3173016-1-alexjlzheng@tencent.com Fixes: 06fffb1267c9 ("do_task_stat: don't take rcu_read_lock()") Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: David Hildenbrand <david@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: ruippan <ruippan@tencent.com> Cc: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04fat: avoid parent link count underflow in rmdirZhiyu Zhang
[ Upstream commit 8cafcb881364af5ef3a8b9fed4db254054033d8a ] Corrupted FAT images can leave a directory inode with an incorrect i_nlink (e.g. 2 even though subdirectories exist). rmdir then unconditionally calls drop_nlink(dir) and can drive i_nlink to 0, triggering the WARN_ON in drop_nlink(). Add a sanity check in vfat_rmdir() and msdos_rmdir(): only drop the parent link count when it is at least 3, otherwise report a filesystem error. Link: https://lkml.kernel.org/r/20260101111148.1437-1-zhiyuzhang999@gmail.com Fixes: 9a53c3a783c2 ("[PATCH] r/o bind mounts: unlink: monitor i_nlink") Signed-off-by: Zhiyu Zhang <zhiyuzhang999@gmail.com> Reported-by: Zhiyu Zhang <zhiyuzhang999@gmail.com> Closes: https://lore.kernel.org/linux-fsdevel/aVN06OKsKxZe6-Kv@casper.infradead.org/T/#t Tested-by: Zhiyu Zhang <zhiyuzhang999@gmail.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04nfsd: never defer requests during idmap lookupAnthony Iliopoulos
[ Upstream commit f9c206cdc4266caad6a9a7f46341420a10f03ccb ] During v4 request compound arg decoding, some ops (e.g. SETATTR) can trigger idmap lookup upcalls. When those upcall responses get delayed beyond the allowed time limit, cache_check() will mark the request for deferral and cause it to be dropped. This prevents nfs4svc_encode_compoundres from being executed, and thus the session slot flag NFSD4_SLOT_INUSE never gets cleared. Subsequent client requests will fail with NFSERR_JUKEBOX, given that the slot will be marked as in-use, making the SEQUENCE op fail. Fix this by making sure that the RQ_USEDEFERRAL flag is always clear during nfs4svc_decode_compoundargs(), since no v4 request should ever be deferred. Fixes: 2f425878b6a7 ("nfsd: don't use the deferral service, return NFS4ERR_DELAY") Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04pstore/ram: fix buffer overflow in persistent_ram_save_old()Sai Ritvik Tanksalkar
[ Upstream commit 5669645c052f235726a85f443769b6fc02f66762 ] persistent_ram_save_old() can be called multiple times for the same persistent_ram_zone (e.g., via ramoops_pstore_read -> ramoops_get_next_prz for PSTORE_TYPE_DMESG records). Currently, the function only allocates prz->old_log when it is NULL, but it unconditionally updates prz->old_log_size to the current buffer size and then performs memcpy_fromio() using this new size. If the buffer size has grown since the first allocation (which can happen across different kernel boot cycles), this leads to: 1. A heap buffer overflow (OOB write) in the memcpy_fromio() calls 2. A subsequent OOB read when ramoops_pstore_read() accesses the buffer using the incorrect (larger) old_log_size The KASAN splat would look similar to: BUG: KASAN: slab-out-of-bounds in ramoops_pstore_read+0x... Read of size N at addr ... by task ... The conditions are likely extremely hard to hit: 0. Crash with a ramoops write of less-than-record-max-size bytes. 1. Reboot: ramoops registers, pstore_get_records(0) reads old crash, allocates old_log with size X 2. Crash handler registered, timer started (if pstore_update_ms >= 0) 3. Oops happens (non-fatal, system continues) 4. pstore_dump() writes oops via ramoops_pstore_write() size Y (>X) 5. pstore_new_entry = 1, pstore_timer_kick() called 6. System continues running (not a panic oops) 7. Timer fires after pstore_update_ms milliseconds 8. pstore_timefunc() → schedule_work() → pstore_dowork() → pstore_get_records(1) 9. ramoops_get_next_prz() → persistent_ram_save_old() 10. buffer_size() returns Y, but old_log is X bytes 11. Y > X: memcpy_fromio() overflows heap Requirements: - a prior crash record exists that did not fill the record size (almost impossible since the crash handler writes as much as it can possibly fit into the record, capped by max record size and the kmsg buffer almost always exceeds the max record size) - pstore_update_ms >= 0 (disabled by default) - Non-fatal oops (system survives) Free and reallocate the buffer when the new size differs from the previously allocated size. This ensures old_log always has sufficient space for the data being copied. Fixes: 201e4aca5aa1 ("pstore/ram: Should update old dmesg buffer before reading") Signed-off-by: Sai Ritvik Tanksalkar <stanksal@purdue.edu> Link: https://patch.msgid.link/20260201132240.2948732-1-stanksal@purdue.edu Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04iomap: fix submission side handling of completion side errorsChristoph Hellwig
[ Upstream commit 4ad357e39b2ecd5da7bcc7e840ee24d179593cd5 ] The "if (dio->error)" in iomap_dio_bio_iter exists to stop submitting more bios when a completion already return an error. Commit cfe057f7db1f ("iomap_dio_actor(): fix iov_iter bugs") made it revert the iov by "copied", which is very wrong given that we've already consumed that range and submitted a bio for it. Fixes: cfe057f7db1f ("iomap_dio_actor(): fix iov_iter bugs") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04btrfs: qgroup: return correct error when deleting qgroup relation itemFilipe Manana
[ Upstream commit 51b1fcf71c88c3c89e7dcf07869c5de837b1f428 ] If we fail to delete the second qgroup relation item, we end up returning success or -ENOENT in case the first item does not exist, instead of returning the error from the second item deletion. Fixes: 73798c465b66 ("btrfs: qgroup: Try our best to delete qgroup relations") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04gfs2: Fix use-after-free in iomap inline data write pathDeepanshu Kartikey
[ Upstream commit faddeb848305e79db89ee0479bb0e33380656321 ] The inline data buffer head (dibh) is being released prematurely in gfs2_iomap_begin() via release_metapath() while iomap->inline_data still points to dibh->b_data. This causes a use-after-free when iomap_write_end_inline() later attempts to write to the inline data area. The bug sequence: 1. gfs2_iomap_begin() calls gfs2_meta_inode_buffer() to read inode metadata into dibh 2. Sets iomap->inline_data = dibh->b_data + sizeof(struct gfs2_dinode) 3. Calls release_metapath() which calls brelse(dibh), dropping refcount to 0 4. kswapd reclaims the page (~39ms later in the syzbot report) 5. iomap_write_end_inline() tries to memcpy() to iomap->inline_data 6. KASAN detects use-after-free write to freed memory Fix by storing dibh in iomap->private and incrementing its refcount with get_bh() in gfs2_iomap_begin(). The buffer is then properly released in gfs2_iomap_end() after the inline write completes, ensuring the page stays alive for the entire iomap operation. Note: A C reproducer is not available for this issue. The fix is based on analysis of the KASAN report and code review showing the buffer head is freed before use. [agruenba: Take buffer head reference in gfs2_iomap_begin() to avoid leaks in gfs2_iomap_get() and gfs2_iomap_alloc().] Reported-by: syzbot+ea1cd4aa4d1e98458a55@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ea1cd4aa4d1e98458a55 Fixes: d0a22a4b03b8 ("gfs2: Fix iomap write page reclaim deadlock") Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04gfs2: Add metapath_dibh helperAndreas Gruenbacher
[ Upstream commit 92099f0c92270c8c7a79e6bc6e0312ad248ea331 ] Add a metapath_dibh() helper for extracting the inode's buffer head from a metapath. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Stable-dep-of: faddeb848305 ("gfs2: Fix use-after-free in iomap inline data write path") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04gfs2: Move the inode glock locking to gfs2_file_buffered_writeAndreas Gruenbacher
[ Upstream commit b924bdab7445946e2ed364a0e6e249d36f1f1158 ] So far, for buffered writes, we were taking the inode glock in gfs2_iomap_begin and dropping it in gfs2_iomap_end with the intention of not holding the inode glock while iomap_write_actor faults in user pages. It turns out that iomap_write_actor is called inside iomap_begin ... iomap_end, so the user pages were still faulted in while holding the inode glock and the locking code in iomap_begin / iomap_end was completely pointless. Move the locking into gfs2_file_buffered_write instead. We'll take care of the potential deadlocks due to faulting in user pages while holding a glock in a subsequent patch. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Stable-dep-of: faddeb848305 ("gfs2: Fix use-after-free in iomap inline data write path") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04gfs2: Add wrapper for iomap_file_buffered_writeAndreas Gruenbacher
[ Upstream commit 2eb7509a05443048fb4df60b782de3f03c6c298b ] Add a wrapper around iomap_file_buffered_write. We'll add code for when the operation needs to be retried here later. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Stable-dep-of: faddeb848305 ("gfs2: Fix use-after-free in iomap inline data write path") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04gfs2: Replace gfs2_lblk_to_dblk with gfs2_get_extentAndreas Gruenbacher
[ Upstream commit 152f58c9af21abf913699e671b425fd38447b170 ] We don't need two very similar functions for mapping logical blocks to physical blocks. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Stable-dep-of: faddeb848305 ("gfs2: Fix use-after-free in iomap inline data write path") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04gfs2: Turn gfs2_extent_map into gfs2_{get,alloc}_extentAndreas Gruenbacher
[ Upstream commit 9153dac13a6966b63183bac450d5cd39b07cc85c ] Convert gfs2_extent_map to iomap and split it into gfs2_get_extent and gfs2_alloc_extent. Instead of hardcoding the extent size, pass it in via the extlen parameter. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Stable-dep-of: faddeb848305 ("gfs2: Fix use-after-free in iomap inline data write path") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04gfs2: Add new gfs2_iomap_get helperAndreas Gruenbacher
[ Upstream commit 54992257fe4bb9f76f66b3863492aa8cc5567790 ] Rename the current gfs2_iomap_get and gfs2_iomap_alloc functions to __*. Add a new gfs2_iomap_get helper that doesn't expose struct metapath. Rename gfs2_iomap_get_alloc to gfs2_iomap_alloc. Use the new helpers where they make sense. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Stable-dep-of: faddeb848305 ("gfs2: Fix use-after-free in iomap inline data write path") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04fs: add <linux/init_task.h> for 'init_fs'Ben Dooks
[ Upstream commit 589cff4975afe1a4eaaa1d961652f50b1628d78d ] The init_fs symbol is defined in <linux/init_task.h> but was not included in fs/fs_struct.c so fix by adding the include. Fixes the following sparse warning: fs/fs_struct.c:150:18: warning: symbol 'init_fs' was not declared. Should it be static? Fixes: 3e93cd671813e ("Take fs_struct handling to new file") Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Link: https://patch.msgid.link/20260108115856.238027-1-ben.dooks@codethink.co.uk Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04hfsplus: return error when node already exists in hfs_bnode_createShardul Bankar
[ Upstream commit d8a73cc46c8462a969a7516131feb3096f4c49d3 ] When hfs_bnode_create() finds that a node is already hashed (which should not happen in normal operation), it currently returns the existing node without incrementing its reference count. This causes a reference count inconsistency that leads to a kernel panic when the node is later freed in hfs_bnode_put(): kernel BUG at fs/hfsplus/bnode.c:676! BUG_ON(!atomic_read(&node->refcnt)) This scenario can occur when hfs_bmap_alloc() attempts to allocate a node that is already in use (e.g., when node 0's bitmap bit is incorrectly unset), or due to filesystem corruption. Returning an existing node from a create path is not normal operation. Fix this by returning ERR_PTR(-EEXIST) instead of the node when it's already hashed. This properly signals the error condition to callers, which already check for IS_ERR() return values. Reported-by: syzbot+1c8ff72d0cd8a50dfeaa@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=1c8ff72d0cd8a50dfeaa Link: https://lore.kernel.org/all/784415834694f39902088fa8946850fc1779a318.camel@ibm.com/ Fixes: 634725a92938 ("[PATCH] hfs: cleanup HFS+ prints") Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/20251229204938.1907089-1-shardul.b@mpiricsoftware.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-19f2fs: fix to avoid UAF in f2fs_write_end_io()Chao Yu
[ Upstream commit ce2739e482bce8d2c014d76c4531c877f382aa54 ] As syzbot reported an use-after-free issue in f2fs_write_end_io(). It is caused by below race condition: loop device umount - worker_thread - loop_process_work - do_req_filebacked - lo_rw_aio - lo_rw_aio_complete - blk_mq_end_request - blk_update_request - f2fs_write_end_io - dec_page_count - folio_end_writeback - kill_f2fs_super - kill_block_super - f2fs_put_super : free(sbi) : get_pages(, F2FS_WB_CP_DATA) accessed sbi which is freed In kill_f2fs_super(), we will drop all page caches of f2fs inodes before call free(sbi), it guarantee that all folios should end its writeback, so it should be safe to access sbi before last folio_end_writeback(). Let's relocate ckpt thread wakeup flow before folio_end_writeback() to resolve this issue. Cc: stable@kernel.org Fixes: e234088758fc ("f2fs: avoid wait if IO end up when do_checkpoint for better performance") Reported-by: syzbot+b4444e3c972a7a124187@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b4444e3c972a7a124187 Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> [ folio => page ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-19f2fs: fix out-of-bounds access in sysfs attribute read/writeYongpeng Yang
[ Upstream commit 98ea0039dbfdd00e5cc1b9a8afa40434476c0955 ] Some f2fs sysfs attributes suffer from out-of-bounds memory access and incorrect handling of integer values whose size is not 4 bytes. For example: vm:~# echo 65537 > /sys/fs/f2fs/vde/carve_out vm:~# cat /sys/fs/f2fs/vde/carve_out 65537 vm:~# echo 4294967297 > /sys/fs/f2fs/vde/atgc_age_threshold vm:~# cat /sys/fs/f2fs/vde/atgc_age_threshold 1 carve_out maps to {struct f2fs_sb_info}->carve_out, which is a 8-bit integer. However, the sysfs interface allows setting it to a value larger than 255, resulting in an out-of-range update. atgc_age_threshold maps to {struct atgc_management}->age_threshold, which is a 64-bit integer, but its sysfs interface cannot correctly set values larger than UINT_MAX. The root causes are: 1. __sbi_store() treats all default values as unsigned int, which prevents updating integers larger than 4 bytes and causes out-of-bounds writes for integers smaller than 4 bytes. 2. f2fs_sbi_show() also assumes all default values are unsigned int, leading to out-of-bounds reads and incorrect access to integers larger than 4 bytes. This patch introduces {struct f2fs_attr}->size to record the actual size of the integer associated with each sysfs attribute. With this information, sysfs read and write operations can correctly access and update values according to their real data size, avoiding memory corruption and truncation. Fixes: b59d0bae6ca3 ("f2fs: add sysfs support for controlling the gc_thread") Cc: stable@kernel.org Signed-off-by: Jinbao Liu <liujinbao1@xiaomi.com> Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> [ f2fs_sbi_show() changes + .size for F2FS_STAT_ATTR ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-19fs: dlm: fix invalid derefence of sb_lvbptrAlexander Aring
[ Upstream commit 7175e131ebba47afef47e6ac4d5bab474d1e6e49 ] I experience issues when putting a lkbsb on the stack and have sb_lvbptr field to a dangled pointer while not using DLM_LKF_VALBLK. It will crash with the following kernel message, the dangled pointer is here 0xdeadbeef as example: [ 102.749317] BUG: unable to handle page fault for address: 00000000deadbeef [ 102.749320] #PF: supervisor read access in kernel mode [ 102.749323] #PF: error_code(0x0000) - not-present page [ 102.749325] PGD 0 P4D 0 [ 102.749332] Oops: 0000 [#1] PREEMPT SMP PTI [ 102.749336] CPU: 0 PID: 1567 Comm: lock_torture_wr Tainted: G W 5.19.0-rc3+ #1565 [ 102.749343] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-2.module+el8.7.0+15506+033991b0 04/01/2014 [ 102.749344] RIP: 0010:memcpy_erms+0x6/0x10 [ 102.749353] Code: cc cc cc cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe [ 102.749355] RSP: 0018:ffff97a58145fd08 EFLAGS: 00010202 [ 102.749358] RAX: ffff901778b77070 RBX: 0000000000000000 RCX: 0000000000000040 [ 102.749360] RDX: 0000000000000040 RSI: 00000000deadbeef RDI: ffff901778b77070 [ 102.749362] RBP: ffff97a58145fd10 R08: ffff901760b67a70 R09: 0000000000000001 [ 102.749364] R10: ffff9017008e2cb8 R11: 0000000000000001 R12: ffff901760b67a70 [ 102.749366] R13: ffff901760b78f00 R14: 0000000000000003 R15: 0000000000000001 [ 102.749368] FS: 0000000000000000(0000) GS:ffff901876e00000(0000) knlGS:0000000000000000 [ 102.749372] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 102.749374] CR2: 00000000deadbeef CR3: 000000017c49a004 CR4: 0000000000770ef0 [ 102.749376] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 102.749378] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 102.749379] PKRU: 55555554 [ 102.749381] Call Trace: [ 102.749382] <TASK> [ 102.749383] ? send_args+0xb2/0xd0 [ 102.749389] send_common+0xb7/0xd0 [ 102.749395] _unlock_lock+0x2c/0x90 [ 102.749400] unlock_lock.isra.56+0x62/0xa0 [ 102.749405] dlm_unlock+0x21e/0x330 [ 102.749411] ? lock_torture_stats+0x80/0x80 [dlm_locktorture] [ 102.749416] torture_unlock+0x5a/0x90 [dlm_locktorture] [ 102.749419] ? preempt_count_sub+0xba/0x100 [ 102.749427] lock_torture_writer+0xbd/0x150 [dlm_locktorture] [ 102.786186] kthread+0x10a/0x130 [ 102.786581] ? kthread_complete_and_exit+0x20/0x20 [ 102.787156] ret_from_fork+0x22/0x30 [ 102.787588] </TASK> [ 102.787855] Modules linked in: dlm_locktorture torture rpcsec_gss_krb5 intel_rapl_msr intel_rapl_common kvm_intel iTCO_wdt iTCO_vendor_support kvm vmw_vsock_virtio_transport qxl irqbypass vmw_vsock_virtio_transport_common drm_ttm_helper crc32_pclmul joydev crc32c_intel ttm vsock virtio_scsi virtio_balloon snd_pcm drm_kms_helper virtio_console snd_timer snd drm soundcore syscopyarea i2c_i801 sysfillrect sysimgblt i2c_smbus pcspkr fb_sys_fops lpc_ich serio_raw [ 102.792536] CR2: 00000000deadbeef [ 102.792930] ---[ end trace 0000000000000000 ]--- This patch fixes the issue by checking also on DLM_LKF_VALBLK on exflags is set when copying the lvbptr array instead of if it's just null which fixes for me the issue. I think this patch can fix other dlm users as well, depending how they handle the init, freeing memory handling of sb_lvbptr and don't set DLM_LKF_VALBLK for some dlm_lock() calls. It might a there could be a hidden issue all the time. However with checking on DLM_LKF_VALBLK the user always need to provide a sb_lvbptr non-null value. There might be more intelligent handling between per ls lvblen, DLM_LKF_VALBLK and non-null to report the user the way how DLM API is used is wrong but can be added for later, this will only fix the current behaviour. Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> [ Adjust context ] Signed-off-by: Bin Lan <lanbincn@139.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-19romfs: check sb_set_blocksize() return valueDeepanshu Kartikey
[ Upstream commit ab7ad7abb3660c58ffffdf07ff3bb976e7e0afa0 ] romfs_fill_super() ignores the return value of sb_set_blocksize(), which can fail if the requested block size is incompatible with the block device's configuration. This can be triggered by setting a loop device's block size larger than PAGE_SIZE using ioctl(LOOP_SET_BLOCK_SIZE, 32768), then mounting a romfs filesystem on that device. When sb_set_blocksize(sb, ROMBSIZE) is called with ROMBSIZE=4096 but the device has logical_block_size=32768, bdev_validate_blocksize() fails because the requested size is smaller than the device's logical block size. sb_set_blocksize() returns 0 (failure), but romfs ignores this and continues mounting. The superblock's block size remains at the device's logical block size (32768). Later, when sb_bread() attempts I/O with this oversized block size, it triggers a kernel BUG in folio_set_bh(): kernel BUG at fs/buffer.c:1582! BUG_ON(size > PAGE_SIZE); Fix by checking the return value of sb_set_blocksize() and failing the mount with -EINVAL if it returns 0. Reported-by: syzbot+9c4e33e12283d9437c25@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9c4e33e12283d9437c25 Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Link: https://patch.msgid.link/20260113084037.1167887-1-kartikey406@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-19nilfs2: Fix potential block overflow that cause system hangEdward Adam Davis
commit ed527ef0c264e4bed6c7b2a158ddf516b17f5f66 upstream. When a user executes the FITRIM command, an underflow can occur when calculating nblocks if end_block is too small. Since nblocks is of type sector_t, which is u64, a negative nblocks value will become a very large positive integer. This ultimately leads to the block layer function __blkdev_issue_discard() taking an excessively long time to process the bio chain, and the ns_segctor_sem lock remains held for a long period. This prevents other tasks from acquiring the ns_segctor_sem lock, resulting in the hang reported by syzbot in [1]. If the ending block is too small, typically if it is smaller than 4KiB range, depending on the usage of the segment 0, it may be possible to attempt a discard request beyond the device size causing the hang. Exiting successfully and assign the discarded size (0 in this case) to range->len. Although the start and len values in the user input range are too small, a conservative strategy is adopted here to safely ignore them, which is equivalent to a no-op; it will not perform any trimming and will not throw an error. [1] task:segctord state:D stack:28968 pid:6093 tgid:6093 ppid:2 task_flags:0x200040 flags:0x00080000 Call Trace: rwbase_write_lock+0x3dd/0x750 kernel/locking/rwbase_rt.c:272 nilfs_transaction_lock+0x253/0x4c0 fs/nilfs2/segment.c:357 nilfs_segctor_thread_construct fs/nilfs2/segment.c:2569 [inline] nilfs_segctor_thread+0x6ec/0xe00 fs/nilfs2/segment.c:2684 [ryusuke: corrected part of the commit message about the consequences] Fixes: 82e11e857be3 ("nilfs2: add nilfs_sufile_trim_fs to trim clean segs") Reported-by: syzbot+7eedce5eb281acd832f0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7eedce5eb281acd832f0 Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-06writeback: fix 100% CPU usage when dirtytime_expire_interval is 0Laveesh Bansal
[ Upstream commit 543467d6fe97e27e22a26e367fda972dbefebbff ] When vm.dirtytime_expire_seconds is set to 0, wakeup_dirtytime_writeback() schedules delayed work with a delay of 0, causing immediate execution. The function then reschedules itself with 0 delay again, creating an infinite busy loop that causes 100% kworker CPU usage. Fix by: - Only scheduling delayed work in wakeup_dirtytime_writeback() when dirtytime_expire_interval is non-zero - Cancelling the delayed work in dirtytime_interval_handler() when the interval is set to 0 - Adding a guard in start_dirtytime_writeback() for defensive coding Tested by booting kernel in QEMU with virtme-ng: - Before fix: kworker CPU spikes to ~73% - After fix: CPU remains at normal levels - Setting interval back to non-zero correctly resumes writeback Fixes: a2f4870697a5 ("fs: make sure the timestamps for lazytime inodes eventually get written") Cc: stable@vger.kernel.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220227 Signed-off-by: Laveesh Bansal <laveeshb@laveeshbansal.com> Link: https://patch.msgid.link/20260106145059.543282-2-laveeshb@laveeshbansal.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> [ adapted system_percpu_wq to system_wq for the workqueue used in dirtytime_interval_handler() ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-06xfs: set max_agbno to allow sparse alloc of last full inode chunkBrian Foster
[ Upstream commit c360004c0160dbe345870f59f24595519008926f ] Sparse inode cluster allocation sets min/max agbno values to avoid allocating an inode cluster that might map to an invalid inode chunk. For example, we can't have an inode record mapped to agbno 0 or that extends past the end of a runt AG of misaligned size. The initial calculation of max_agbno is unnecessarily conservative, however. This has triggered a corner case allocation failure where a small runt AG (i.e. 2063 blocks) is mostly full save for an extent to the EOFS boundary: [2050,13]. max_agbno is set to 2048 in this case, which happens to be the offset of the last possible valid inode chunk in the AG. In practice, we should be able to allocate the 4-block cluster at agbno 2052 to map to the parent inode record at agbno 2048, but the max_agbno value precludes it. Note that this can result in filesystem shutdown via dirty trans cancel on stable kernels prior to commit 9eb775968b68 ("xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags") because the tail AG selection by the allocator sets t_highest_agno on the transaction. If the inode allocator spins around and finds an inode chunk with free inodes in an earlier AG, the subsequent dir name creation path may still fail to allocate due to the AG restriction and cancel. To avoid this problem, update the max_agbno calculation to the agbno prior to the last chunk aligned agbno in the AG. This is not necessarily the last valid allocation target for a sparse chunk, but since inode chunks (i.e. records) are chunk aligned and sparse allocs are cluster sized/aligned, this allows the sb_spino_align alignment restriction to take over and round down the max effective agbno to within the last valid inode chunk in the AG. Note that even though the allocator improvements in the aforementioned commit seem to avoid this particular dirty trans cancel situation, the max_agbno logic improvement still applies as we should be able to allocate from an AG that has been appropriately selected. The more important target for this patch however are older/stable kernels prior to this allocator rework/improvement. Cc: stable@vger.kernel.org # v4.2 Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure") Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> [ xfs_ag_block_count(args.mp, pag_agno(pag)) => args.mp->m_sb.sb_agblocks ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-06btrfs: fix deadlock in wait_current_trans() due to ignored transaction typeRobbie Ko
commit 5037b342825df7094a4906d1e2a9674baab50cb2 upstream. When wait_current_trans() is called during start_transaction(), it currently waits for a blocked transaction without considering whether the given transaction type actually needs to wait for that particular transaction state. The btrfs_blocked_trans_types[] array already defines which transaction types should wait for which transaction states, but this check was missing in wait_current_trans(). This can lead to a deadlock scenario involving two transactions and pending ordered extents: 1. Transaction A is in TRANS_STATE_COMMIT_DOING state 2. A worker processing an ordered extent calls start_transaction() with TRANS_JOIN 3. join_transaction() returns -EBUSY because Transaction A is in TRANS_STATE_COMMIT_DOING 4. Transaction A moves to TRANS_STATE_UNBLOCKED and completes 5. A new Transaction B is created (TRANS_STATE_RUNNING) 6. The ordered extent from step 2 is added to Transaction B's pending ordered extents 7. Transaction B immediately starts commit by another task and enters TRANS_STATE_COMMIT_START 8. The worker finally reaches wait_current_trans(), sees Transaction B in TRANS_STATE_COMMIT_START (a blocked state), and waits unconditionally 9. However, TRANS_JOIN should NOT wait for TRANS_STATE_COMMIT_START according to btrfs_blocked_trans_types[] 10. Transaction B is waiting for pending ordered extents to complete 11. Deadlock: Transaction B waits for ordered extent, ordered extent waits for Transaction B This can be illustrated by the following call stacks: CPU0 CPU1 btrfs_finish_ordered_io() start_transaction(TRANS_JOIN) join_transaction() # -EBUSY (Transaction A is # TRANS_STATE_COMMIT_DOING) # Transaction A completes # Transaction B created # ordered extent added to # Transaction B's pending list btrfs_commit_transaction() # Transaction B enters # TRANS_STATE_COMMIT_START # waiting for pending ordered # extents wait_current_trans() # waits for Transaction B # (should not wait!) Task bstore_kv_sync in btrfs_commit_transaction waiting for ordered extents: __schedule+0x2e7/0x8a0 schedule+0x64/0xe0 btrfs_commit_transaction+0xbf7/0xda0 [btrfs] btrfs_sync_file+0x342/0x4d0 [btrfs] __x64_sys_fdatasync+0x4b/0x80 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Task kworker in wait_current_trans waiting for transaction commit: Workqueue: btrfs-syno_nocow btrfs_work_helper [btrfs] __schedule+0x2e7/0x8a0 schedule+0x64/0xe0 wait_current_trans+0xb0/0x110 [btrfs] start_transaction+0x346/0x5b0 [btrfs] btrfs_finish_ordered_io.isra.0+0x49b/0x9c0 [btrfs] btrfs_work_helper+0xe8/0x350 [btrfs] process_one_work+0x1d3/0x3c0 worker_thread+0x4d/0x3e0 kthread+0x12d/0x150 ret_from_fork+0x1f/0x30 Fix this by passing the transaction type to wait_current_trans() and checking btrfs_blocked_trans_types[cur_trans->state] against the given type before deciding to wait. This ensures that transaction types which are allowed to join during certain blocked states will not unnecessarily wait and cause deadlocks. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Cc: Motiejus Jakštys <motiejus@jakstys.lt> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-06ext4: fix iloc.bh leak in ext4_xattr_inode_update_refYang Erkun
commit d250bdf531d9cd4096fedbb9f172bb2ca660c868 upstream. The error branch for ext4_xattr_inode_update_ref forget to release the refcount for iloc.bh. Find this when review code. Fixes: 57295e835408 ("ext4: guard against EA inode refcount underflow in xattr update") Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20251213055706.3417529-1-yangerkun@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-06pnfs/flexfiles: Fix memory leak in nfs4_ff_alloc_deviceid_node()Zilin Guan
[ Upstream commit 0c728083654f0066f5e10a1d2b0bd0907af19a58 ] In nfs4_ff_alloc_deviceid_node(), if the allocation for ds_versions fails, the function jumps to the out_scratch label without freeing the already allocated dsaddrs list, leading to a memory leak. Fix this by jumping to the out_err_drain_dsaddrs label, which properly frees the dsaddrs list before cleaning up other resources. Fixes: d67ae825a59d6 ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-19NFS: add barriers when testing for NFS_FSDATA_BLOCKEDNeilBrown
commit 99bc9f2eb3f79a2b4296d9bf43153e1d10ca50d3 upstream. dentry->d_fsdata is set to NFS_FSDATA_BLOCKED while unlinking or renaming-over a file to ensure that no open succeeds while the NFS operation progressed on the server. Setting dentry->d_fsdata to NFS_FSDATA_BLOCKED is done under ->d_lock after checking the refcount is not elevated. Any attempt to open the file (through that name) will go through lookp_open() which will take ->d_lock while incrementing the refcount, we can be sure that once the new value is set, __nfs_lookup_revalidate() *will* see the new value and will block. We don't have any locking guarantee that when we set ->d_fsdata to NULL, the wait_var_event() in __nfs_lookup_revalidate() will notice. wait/wake primitives do NOT provide barriers to guarantee order. We must use smp_load_acquire() in wait_var_event() to ensure we look at an up-to-date value, and must use smp_store_release() before wake_up_var(). This patch adds those barrier functions and factors out block_revalidate() and unblock_revalidate() far clarity. There is also a hypothetical bug in that if memory allocation fails (which never happens in practice) we might leave ->d_fsdata locked. This patch adds the missing call to unblock_revalidate(). Reported-and-tested-by: Richard Kojedzinszky <richard+debian+bugreport@kojedz.in> Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071501 Fixes: 3c59366c207e ("NFS: don't unhash dentry during unlink/rename") Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-19NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENTNeilBrown
commit f16857e62bac60786104c020ad7c86e2163b2c5b upstream. nfs_unlink() calls d_delete() twice if it receives ENOENT from the server - once in nfs_dentry_handle_enoent() from nfs_safe_remove and once in nfs_dentry_remove_handle_error(). nfs_rmddir() also calls it twice - the nfs_dentry_handle_enoent() call is direct and inside a region locked with ->rmdir_sem It is safe to call d_delete() twice if the refcount > 1 as the dentry is simply unhashed. If the refcount is 1, the first call sets d_inode to NULL and the second call crashes. This patch guards the d_delete() call from nfs_dentry_handle_enoent() leaving the one under ->remdir_sem in case that is important. In mainline it would be safe to remove the d_delete() call. However in older kernels to which this might be backported, that would change the behaviour of nfs_unlink(). nfs_unlink() used to unhash the dentry which resulted in nfs_dentry_handle_enoent() not calling d_delete(). So in older kernels we need the d_delete() in nfs_dentry_remove_handle_error() when called from nfs_unlink() but not when called from nfs_rmdir(). To make the code work correctly for old and new kernels, and from both nfs_unlink() and nfs_rmdir(), we protect the d_delete() call with simple_positive(). This ensures it is never called in a circumstance where it could crash. Fixes: 3c59366c207e ("NFS: don't unhash dentry during unlink/rename") Fixes: 9019fb391de0 ("NFS: Label the dentry with a verifier in nfs_rmdir() and nfs_unlink()") Signed-off-by: NeilBrown <neilb@suse.de> Tested-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-19nfsd: provide locking for v4_end_graceNeilBrown
[ Upstream commit 2857bd59feb63fcf40fe4baf55401baea6b4feb4 ] Writing to v4_end_grace can race with server shutdown and result in memory being accessed after it was freed - reclaim_str_hashtbl in particularly. We cannot hold nfsd_mutex across the nfsd4_end_grace() call as that is held while client_tracking_op->init() is called and that can wait for an upcall to nfsdcltrack which can write to v4_end_grace, resulting in a deadlock. nfsd4_end_grace() is also called by the landromat work queue and this doesn't require locking as server shutdown will stop the work and wait for it before freeing anything that nfsd4_end_grace() might access. However, we must be sure that writing to v4_end_grace doesn't restart the work item after shutdown has already waited for it. For this we add a new flag protected with nn->client_lock. It is set only while it is safe to make client tracking calls, and v4_end_grace only schedules work while the flag is set with the spinlock held. So this patch adds a nfsd_net field "client_tracking_active" which is set as described. Another field "grace_end_forced", is set when v4_end_grace is written. After this is set, and providing client_tracking_active is set, the laundromat is scheduled. This "grace_end_forced" field bypasses other checks for whether the grace period has finished. This resolves a race which can result in use-after-free. Reported-by: Li Lingfeng <lilingfeng3@huawei.com> Closes: https://lore.kernel.org/linux-nfs/20250623030015.2353515-1-neil@brown.name/T/#t Fixes: 7f5ef2e900d9 ("nfsd: add a v4_end_grace file to /proc/fs/nfsd") Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neil@brown.name> Tested-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [ Adjust context ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-19NFS: Fix up the automount fs_context to use the correct credTrond Myklebust
[ Upstream commit a2a8fc27dd668e7562b5326b5ed2f1604cb1e2e9 ] When automounting, the fs_context should be fixed up to use the cred from the parent filesystem, since the operation is just extending the namespace. Authorisation to enter that namespace will already have been provided by the preceding lookup. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-19NFSv4: ensure the open stateid seqid doesn't go backwardsScott Mayhew
[ Upstream commit 2e47c3cc64b44b0b06cd68c2801db92ff143f2b2 ] We have observed an NFSv4 client receiving a LOCK reply with a status of NFS4ERR_OLD_STATEID and subsequently retrying the LOCK request with an earlier seqid value in the stateid. As this was for a new lockowner, that would imply that nfs_set_open_stateid_locked() had updated the open stateid seqid with an earlier value. Looking at nfs_set_open_stateid_locked(), if the incoming seqid is out of sequence, the task will sleep on the state->waitq for up to 5 seconds. If the task waits for the full 5 seconds, then after finishing the wait it'll update the open stateid seqid with whatever value the incoming seqid has. If there are multiple waiters in this scenario, then the last one to perform said update may not be the one with the highest seqid. Add a check to ensure that the seqid can only be incremented, and add a tracepoint to indicate when old seqids are skipped. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-19ext4: fix out-of-bound read in ext4_xattr_inode_dec_ref_all()Ye Bin
[ Upstream commit 5701875f9609b000d91351eaa6bfd97fe2f157f4 ] There's issue as follows: BUG: KASAN: use-after-free in ext4_xattr_inode_dec_ref_all+0x6ff/0x790 Read of size 4 at addr ffff88807b003000 by task syz-executor.0/15172 CPU: 3 PID: 15172 Comm: syz-executor.0 Call Trace: __dump_stack lib/dump_stack.c:82 [inline] dump_stack+0xbe/0xfd lib/dump_stack.c:123 print_address_description.constprop.0+0x1e/0x280 mm/kasan/report.c:400 __kasan_report.cold+0x6c/0x84 mm/kasan/report.c:560 kasan_report+0x3a/0x50 mm/kasan/report.c:585 ext4_xattr_inode_dec_ref_all+0x6ff/0x790 fs/ext4/xattr.c:1137 ext4_xattr_delete_inode+0x4c7/0xda0 fs/ext4/xattr.c:2896 ext4_evict_inode+0xb3b/0x1670 fs/ext4/inode.c:323 evict+0x39f/0x880 fs/inode.c:622 iput_final fs/inode.c:1746 [inline] iput fs/inode.c:1772 [inline] iput+0x525/0x6c0 fs/inode.c:1758 ext4_orphan_cleanup fs/ext4/super.c:3298 [inline] ext4_fill_super+0x8c57/0xba40 fs/ext4/super.c:5300 mount_bdev+0x355/0x410 fs/super.c:1446 legacy_get_tree+0xfe/0x220 fs/fs_context.c:611 vfs_get_tree+0x8d/0x2f0 fs/super.c:1576 do_new_mount fs/namespace.c:2983 [inline] path_mount+0x119a/0x1ad0 fs/namespace.c:3316 do_mount+0xfc/0x110 fs/namespace.c:3329 __do_sys_mount fs/namespace.c:3540 [inline] __se_sys_mount+0x219/0x2e0 fs/namespace.c:3514 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x67/0xd1 Memory state around the buggy address: ffff88807b002f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88807b002f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff88807b003000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff88807b003080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88807b003100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Above issue happens as ext4_xattr_delete_inode() isn't check xattr is valid if xattr is in inode. To solve above issue call xattr_check_inode() check if xattr if valid in inode. In fact, we can directly verify in ext4_iget_extra_inode(), so that there is no divergent verification. Fixes: e50e5129f384 ("ext4: xattr-in-inode support") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250208063141.1539283-3-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: David Nyström <david.nystrom@est.tech> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>