kernel - Hosts the 0x221E linux distro kernel.

Age	Commit message (Collapse)	Author
2026-02-19	f2fs: fix incomplete block usage in compact SSA summaries	Daeho Jeong
	commit 91b76f1059b60f453b51877f29f0e35693737383 upstream. In a previous commit, a bug was introduced where compact SSA summaries failed to utilize the entire block space in non-4KB block size configurations, leading to inefficient space management. This patch fixes the calculation logic to ensure that compact SSA summaries can fully occupy the block regardless of the block size. Reported-by: Chris Mason <clm@meta.com> Fixes: e48e16f3e37f ("f2fs: support non-4KB block size without packed_ssa feature") Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-19	f2fs: fix to do sanity check on node footer in {read,write}_end_io	Chao Yu
	[ Upstream commit 50ac3ecd8e05b6bcc350c71a4307d40c030ec7e4 ] -----------[ cut here ]------------ kernel BUG at fs/f2fs/data.c:358! Call Trace: <IRQ> blk_update_request+0x5eb/0xe70 block/blk-mq.c:987 blk_mq_end_request+0x3e/0x70 block/blk-mq.c:1149 blk_complete_reqs block/blk-mq.c:1224 [inline] blk_done_softirq+0x107/0x160 block/blk-mq.c:1229 handle_softirqs+0x283/0x870 kernel/softirq.c:579 __do_softirq kernel/softirq.c:613 [inline] invoke_softirq kernel/softirq.c:453 [inline] __irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:680 irq_exit_rcu+0x9/0x30 kernel/softirq.c:696 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1050 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1050 </IRQ> In f2fs_write_end_io(), it detects there is inconsistency in between node page index (nid) and footer.nid of node page. If footer of node page is corrupted in fuzzed image, then we load corrupted node page w/ async method, e.g. f2fs_ra_node_pages() or f2fs_ra_node_page(), in where we won't do sanity check on node footer, once node page becomes dirty, we will encounter this bug after node page writeback. Cc: stable@kernel.org Reported-by: syzbot+803dd716c4310d16ff3a@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=803dd716c4310d16ff3a Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> [ Context ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-19	f2fs: fix to do sanity check on node footer in __write_node_folio()	Chao Yu
	[ Upstream commit 0a736109c9d29de0c26567e42cb99b27861aa8ba ] Add node footer sanity check during node folio's writeback, if sanity check fails, let's shutdown filesystem to avoid looping to redirty and writeback in .writepages. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-19	Revert "f2fs: block cache/dio write during f2fs_enable_checkpoint()"	Chao Yu
	[ Upstream commit 3996b70209f145bfcf2afc7d05dd92c27b233b48 ] This reverts commit 196c81fdd438f7ac429d5639090a9816abb9760a. Original patch may cause below deadlock, revert it. write remount - write_begin - lock_page --- lock A - prepare_write_begin - f2fs_map_lock - f2fs_enable_checkpoint - down_write(cp_enable_rwsem) --- lock B - sync_inode_sb - writepages - lock_page --- lock A - down_read(cp_enable_rwsem) --- lock A Cc: stable@kernel.org Fixes: 196c81fdd438 ("f2fs: block cache/dio write during f2fs_enable_checkpoint()") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> [ drop tracing bits ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-19	f2fs: optimize f2fs_overwrite_io() for f2fs_iomap_begin	Yeongjin Gil
	commit d860974a7e38d35e9e2c4dc8a9f4223b38b6ad99 upstream. When overwriting already allocated blocks, f2fs_iomap_begin() calls f2fs_overwrite_io() to check block mappings. However, f2fs_overwrite_io() iterates through all mapped blocks in the range, which can be inefficient for fragmented files with large I/O requests. This patch optimizes f2fs_overwrite_io() by adding a 'check_first' parameter and introducing __f2fs_overwrite_io() helper. When called from f2fs_iomap_begin(), we only check the first mapping to determine if the range is already allocated, which is sufficient for setting map.m_may_create. This optimization significantly reduces the number of f2fs_map_blocks() calls in f2fs_overwrite_io() when called from f2fs_iomap_begin(), especially for fragmented files with large I/O requests. Cc: stable@kernel.org Fixes: 351bc761338d ("f2fs: optimize f2fs DIO overwrites") Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com> Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-19	f2fs: fix to avoid mapping wrong physical block for swapfile	Chao Yu
	commit 5c145c03188bc9ba1c29e0bc4d527a5978fc47f9 upstream. Xiaolong Guo reported a f2fs bug in bugzilla [1] [1] https://bugzilla.kernel.org/show_bug.cgi?id=220951 Quoted: "When using stress-ng's swap stress test on F2FS filesystem with kernel 6.6+, the system experiences data corruption leading to either: 1 dm-verity corruption errors and device reboot 2 F2FS node corruption errors and boot hangs The issue occurs specifically when: 1 Using F2FS filesystem (ext4 is unaffected) 2 Swapfile size is less than F2FS section size (2MB) 3 Swapfile has fragmented physical layout (multiple non-contiguous extents) 4 Kernel version is 6.6+ (6.1 is unaffected) The root cause is in check_swap_activate() function in fs/f2fs/data.c. When the first extent of a small swapfile (< 2MB) is not aligned to section boundaries, the function incorrectly treats it as the last extent, failing to map subsequent extents. This results in incorrect swap_extent creation where only the first extent is mapped, causing subsequent swap writes to overwrite wrong physical locations (other files' data). Steps to Reproduce 1 Setup a device with F2FS-formatted userdata partition 2 Compile stress-ng from https://github.com/ColinIanKing/stress-ng 3 Run swap stress test: (Android devices) adb shell "cd /data/stressng; ./stress-ng-64 --metrics-brief --timeout 60 --swap 0" Log: 1 Ftrace shows in kernel 6.6, only first extent is mapped during second f2fs_map_blocks call in check_swap_activate(): stress-ng-swap-8990: f2fs_map_blocks: ino=11002, file offset=0, start blkaddr=0x43143, len=0x1 (Only 4KB mapped, not the full swapfile) 2 in kernel 6.1, both extents are correctly mapped: stress-ng-swap-5966: f2fs_map_blocks: ino=28011, file offset=0, start blkaddr=0x13cd4, len=0x1 stress-ng-swap-5966: f2fs_map_blocks: ino=28011, file offset=1, start blkaddr=0x60c84b, len=0xff The problematic code is in check_swap_activate(): if ((pblock - SM_I(sbi)->main_blkaddr) % blks_per_sec \|\| nr_pblocks % blks_per_sec \|\| !f2fs_valid_pinned_area(sbi, pblock)) { bool last_extent = false; not_aligned++; nr_pblocks = roundup(nr_pblocks, blks_per_sec); if (cur_lblock + nr_pblocks > sis->max) nr_pblocks -= blks_per_sec; /* this extent is last one */ if (!nr_pblocks) { nr_pblocks = last_lblock - cur_lblock; last_extent = true; } ret = f2fs_migrate_blocks(inode, cur_lblock, nr_pblocks); if (ret) { if (ret == -ENOENT) ret = -EINVAL; goto out; } if (!last_extent) goto retry; } When the first extent is unaligned and roundup(nr_pblocks, blks_per_sec) exceeds sis->max, we subtract blks_per_sec resulting in nr_pblocks = 0. The code then incorrectly assumes this is the last extent, sets nr_pblocks = last_lblock - cur_lblock (entire swapfile), and performs migration. After migration, it doesn't retry mapping, so subsequent extents are never processed. " In order to fix this issue, we need to lookup block mapping info after we migrate all blocks in the tail of swapfile. Cc: stable@kernel.org Fixes: 9703d69d9d15 ("f2fs: support file pinning for zoned devices") Cc: Daeho Jeong <daehojeong@google.com> Reported-and-tested-by: Xiaolong Guo <guoxiaolong2008@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220951 Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-19	f2fs: support non-4KB block size without packed_ssa feature	Daeho Jeong
	commit e48e16f3e37fac76e2f0c14c58df2b0398a323b0 upstream. Currently, F2FS requires the packed_ssa feature to be enabled when utilizing non-4KB block sizes (e.g., 16KB). This restriction limits the flexibility of filesystem formatting options. This patch allows F2FS to support non-4KB block sizes even when the packed_ssa feature is disabled. It adjusts the SSA calculation logic to correctly handle summary entries in larger blocks without the packed layout. Cc: stable@kernel.org Fixes: 7ee8bc3942f2 ("f2fs: revert summary entry count from 2048 to 512 in 16kb block support") Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-19	f2fs: fix to avoid UAF in f2fs_write_end_io()	Chao Yu
	commit ce2739e482bce8d2c014d76c4531c877f382aa54 upstream. As syzbot reported an use-after-free issue in f2fs_write_end_io(). It is caused by below race condition: loop device umount - worker_thread - loop_process_work - do_req_filebacked - lo_rw_aio - lo_rw_aio_complete - blk_mq_end_request - blk_update_request - f2fs_write_end_io - dec_page_count - folio_end_writeback - kill_f2fs_super - kill_block_super - f2fs_put_super : free(sbi) : get_pages(, F2FS_WB_CP_DATA) accessed sbi which is freed In kill_f2fs_super(), we will drop all page caches of f2fs inodes before call free(sbi), it guarantee that all folios should end its writeback, so it should be safe to access sbi before last folio_end_writeback(). Let's relocate ckpt thread wakeup flow before folio_end_writeback() to resolve this issue. Cc: stable@kernel.org Fixes: e234088758fc ("f2fs: avoid wait if IO end up when do_checkpoint for better performance") Reported-by: syzbot+b4444e3c972a7a124187@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b4444e3c972a7a124187 Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-19	f2fs: fix out-of-bounds access in sysfs attribute read/write	Yongpeng Yang
	commit 98ea0039dbfdd00e5cc1b9a8afa40434476c0955 upstream. Some f2fs sysfs attributes suffer from out-of-bounds memory access and incorrect handling of integer values whose size is not 4 bytes. For example: vm:~# echo 65537 > /sys/fs/f2fs/vde/carve_out vm:~# cat /sys/fs/f2fs/vde/carve_out 65537 vm:~# echo 4294967297 > /sys/fs/f2fs/vde/atgc_age_threshold vm:~# cat /sys/fs/f2fs/vde/atgc_age_threshold 1 carve_out maps to {struct f2fs_sb_info}->carve_out, which is a 8-bit integer. However, the sysfs interface allows setting it to a value larger than 255, resulting in an out-of-range update. atgc_age_threshold maps to {struct atgc_management}->age_threshold, which is a 64-bit integer, but its sysfs interface cannot correctly set values larger than UINT_MAX. The root causes are: 1. __sbi_store() treats all default values as unsigned int, which prevents updating integers larger than 4 bytes and causes out-of-bounds writes for integers smaller than 4 bytes. 2. f2fs_sbi_show() also assumes all default values are unsigned int, leading to out-of-bounds reads and incorrect access to integers larger than 4 bytes. This patch introduces {struct f2fs_attr}->size to record the actual size of the integer associated with each sysfs attribute. With this information, sysfs read and write operations can correctly access and update values according to their real data size, avoiding memory corruption and truncation. Fixes: b59d0bae6ca3 ("f2fs: add sysfs support for controlling the gc_thread") Cc: stable@kernel.org Signed-off-by: Jinbao Liu <liujinbao1@xiaomi.com> Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-19	f2fs: fix IS_CHECKPOINTED flag inconsistency issue caused by concurrent ↵	Yongpeng Yang
	atomic commit and checkpoint writes commit 7633a7387eb4d0259d6bea945e1d3469cd135bbc upstream. During SPO tests, when mounting F2FS, an -EINVAL error was returned from f2fs_recover_inode_page. The issue occurred under the following scenario Thread A Thread B f2fs_ioc_commit_atomic_write - f2fs_do_sync_file // atomic = true - f2fs_fsync_node_pages : last_folio = inode folio : schedule before folio_lock(last_folio) f2fs_write_checkpoint - block_operations// writeback last_folio - schedule before f2fs_flush_nat_entries : set_fsync_mark(last_folio, 1) : set_dentry_mark(last_folio, 1) : folio_mark_dirty(last_folio) - __write_node_folio(last_folio) : f2fs_down_read(&sbi->node_write)//block - f2fs_flush_nat_entries : {struct nat_entry}->flag \|= BIT(IS_CHECKPOINTED) - unblock_operations : f2fs_up_write(&sbi->node_write) f2fs_write_checkpoint//return : f2fs_do_write_node_page() f2fs_ioc_commit_atomic_write//return SPO Thread A calls f2fs_need_dentry_mark(sbi, ino), and the last_folio has already been written once. However, the {struct nat_entry}->flag did not have the IS_CHECKPOINTED set, causing set_dentry_mark(last_folio, 1) and write last_folio again after Thread B finishes f2fs_write_checkpoint. After SPO and reboot, it was detected that {struct node_info}->blk_addr was not NULL_ADDR because Thread B successfully write the checkpoint. This issue only occurs in atomic write scenarios. For regular file fsync operations, the folio must be dirty. If block_operations->f2fs_sync_node_pages successfully submit the folio write, this path will not be executed. Otherwise, the f2fs_write_checkpoint will need to wait for the folio write submission to complete, as sbi->nr_pages[F2FS_DIRTY_NODES] > 0. Therefore, the situation where f2fs_need_dentry_mark checks that the {struct nat_entry}->flag /wo the IS_CHECKPOINTED flag, but the folio write has already been submitted, will not occur. Therefore, for atomic file fsync, sbi->node_write should be acquired through __write_node_folio to ensure that the IS_CHECKPOINTED flag correctly indicates that the checkpoint write has been completed. Fixes: 608514deba38 ("f2fs: set fsync mark only for the last dnode") Cc: stable@kernel.org Signed-off-by: Sheng Yong <shengyong1@xiaomi.com> Signed-off-by: Jinbao Liu <liujinbao1@xiaomi.com> Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-19	f2fs: fix to check sysfs filename w/ gc_pin_file_thresh correctly	Chao Yu
	commit 0eda086de85e140f53c6123a4c00662f4e614ee4 upstream. Sysfs entry name is gc_pin_file_thresh instead of gc_pin_file_threshold, fix it. Cc: stable@kernel.org Fixes: c521a6ab4ad7 ("f2fs: fix to limit gc_pin_file_threshold") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-19	f2fs: fix to add gc count stat in f2fs_gc_range	Zhiguo Niu
	commit 761dac9073cd67d4705a94cd1af674945a117f4c upstream. It missed the stat count in f2fs_gc_range. Cc: stable@kernel.org Fixes: 9bf1dcbdfdc8 ("f2fs: fix to account gc stats correctly") Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-12-09	Merge tag 'f2fs-for-6.19-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This series focuses on minor clean-ups and performance optimizations across sysfs, documentation, debugfs, tracepoints, slab allocation, and GC. Furthermore, it resolves several corner-case bugs caught by xfstests, as well as issues related to 16KB page support and f2fs_enable_checkpoint. Enhancement: - wrap ASCII tables in literal blocks to fix LaTeX build - optimize trace_f2fs_write_checkpoint with enums - support to show curseg.next_blkoff in debugfs - add a sysfs entry to show max open zones - add fadvise tracepoint - use global inline_xattr_slab instead of per-sb slab cache - set default valid_thresh_ratio to 80 for zoned devices - maintain one time GC mode is enabled during whole zoned GC cycle Bug fix: - ensure node page reads complete before f2fs_put_super() finishes - do not account invalid blocks in get_left_section_blocks() - revert summary entry count from 2048 to 512 in 16kb block support - detect recoverable inode during dryrun of find_fsync_dnodes() - fix age extent cache insertion skip on counter overflow - add sanity checks before unlinking and loading inodes - ensure minimum trim granularity accounts for all devices - block cache/dio write during f2fs_enable_checkpoint() - propagate error from f2fs_enable_checkpoint() - invalidate dentry cache on failed whiteout creation - avoid updating compression context during writeback - avoid updating zero-sized extent in extent cache - avoid potential deadlock" * tag 'f2fs-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (39 commits) f2fs: ignore discard return value f2fs: optimize trace_f2fs_write_checkpoint with enums f2fs: fix to not account invalid blocks in get_left_section_blocks() f2fs: support to show curseg.next_blkoff in debugfs docs: f2fs: wrap ASCII tables in literal blocks to fix LaTeX build f2fs: expand scalability of f2fs mount option f2fs: change default schedule timeout value f2fs: introduce f2fs_schedule_timeout() f2fs: use memalloc_retry_wait() as much as possible f2fs: add a sysfs entry to show max open zones f2fs: wrap all unusable_blocks_per_sec code in CONFIG_BLK_DEV_ZONED f2fs: simplify list initialization in f2fs_recover_fsync_data() f2fs: revert summary entry count from 2048 to 512 in 16kb block support f2fs: fix to detect recoverable inode during dryrun of find_fsync_dnodes() f2fs: fix return value of f2fs_recover_fsync_data() f2fs: add fadvise tracepoint f2fs: fix age extent cache insertion skip on counter overflow f2fs: Add sanity checks before unlinking and loading inodes f2fs: Rename f2fs_unlink exit label f2fs: ensure minimum trim granularity accounts for all devices ...
2025-12-05	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull KVM updates from Paolo Bonzini: "ARM: - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM - Minor fixes to KVM and selftests Loongarch: - Get VM PMU capability from HW GCFG register - Add AVEC basic support - Use 64-bit register definition for EIOINTC - Add KVM timer test cases for tools/selftests RISC/V: - SBI message passing (MPXY) support for KVM guest - Give a new, more specific error subcode for the case when in-kernel AIA virtualization fails to allocate IMSIC VS-file - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually in small chunks - Fix guest page fault within HLV* instructions - Flush VS-stage TLB after VCPU migration for Andes cores s390: - Always allocate ESCA (Extended System Control Area), instead of starting with the basic SCA and converting to ESCA with the addition of the 65th vCPU. The price is increased number of exits (and worse performance) on z10 and earlier processor; ESCA was introduced by z114/z196 in 2010 - VIRT_XFER_TO_GUEST_WORK support - Operation exception forwarding support - Cleanups x86: - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO SPTE caching is disabled, as there can't be any relevant SPTEs to zap - Relocate a misplaced export - Fix an async #PF bug where KVM would clear the completion queue when the guest transitioned in and out of paging mode, e.g. when handling an SMI and then returning to paged mode via RSM - Leave KVM's user-return notifier registered even when disabling virtualization, as long as kvm.ko is loaded. On reboot/shutdown, keeping the notifier registered is ok; the kernel does not use the MSRs and the callback will run cleanly and restore host MSRs if the CPU manages to return to userspace before the system goes down - Use the checked version of {get,put}_user() - Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC timers can result in a hard lockup in the host - Revert the periodic kvmclock sync logic now that KVM doesn't use a clocksource that's subject to NTP corrections - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter behind CONFIG_CPU_MITIGATIONS - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast path; the only reason they were handled in the fast path was to paper of a bug in the core #MC code, and that has long since been fixed - Add emulator support for AVX MOV instructions, to play nice with emulated devices whose guest drivers like to access PCI BARs with large multi-byte instructions x86 (AMD): - Fix a few missing "VMCB dirty" bugs - Fix the worst of KVM's lack of EFER.LMSLE emulation - Add AVIC support for addressing 4k vCPUs in x2AVIC mode - Fix incorrect handling of selective CR0 writes when checking intercepts during emulation of L2 instructions - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on VMRUN and #VMEXIT - Fix a bug where KVM corrupt the guest code stream when re-injecting a soft interrupt if the guest patched the underlying code after the VM-Exit, e.g. when Linux patches code with a temporary INT3 - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to userspace, and extend KVM "support" to all policy bits that don't require any actual support from KVM x86 (Intel): - Use the root role from kvm_mmu_page to construct EPTPs instead of the current vCPU state, partly as worthwhile cleanup, but mostly to pave the way for tracking per-root TLB flushes, and elide EPT flushes on pCPU migration if the root is clean from a previous flush - Add a few missing nested consistency checks - Rip out support for doing "early" consistency checks via hardware as the functionality hasn't been used in years and is no longer useful in general; replace it with an off-by-default module param to WARN if hardware fails a check that KVM does not perform - Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32] on VM-Enter - Misc cleanups - Overhaul the TDX code to address systemic races where KVM (acting on behalf of userspace) could inadvertantly trigger lock contention in the TDX-Module; KVM was either working around these in weird, ugly ways, or was simply oblivious to them (though even Yan's devilish selftests could only break individual VMs, not the host kernel) - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a TDX vCPU, if creating said vCPU failed partway through - Fix a few sparse warnings (bad annotation, 0 != NULL) - Use struct_size() to simplify copying TDX capabilities to userspace - Fix a bug where TDX would effectively corrupt user-return MSR values if the TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected Selftests: - Fix a math goof in mmu_stress_test when running on a single-CPU system/VM - Forcefully override ARCH from x86_64 to x86 to play nice with specifying ARCH=x86_64 on the command line - Extend a bunch of nested VMX to validate nested SVM as well - Add support for LA57 in the core VM_MODE_xxx macro, and add a test to verify KVM can save/restore nested VMX state when L1 is using 5-level paging, but L2 is not - Clean up the guest paging code in anticipation of sharing the core logic for nested EPT and nested NPT guest_memfd: - Add NUMA mempolicy support for guest_memfd, and clean up a variety of rough edges in guest_memfd along the way - Define a CLASS to automatically handle get+put when grabbing a guest_memfd from a memslot to make it harder to leak references - Enhance KVM selftests to make it easer to develop and debug selftests like those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs often result in hard-to-debug SIGBUS errors - Misc cleanups Generic: - Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for irqfd cleanup - Fix a goof in the dirty ring documentation - Fix choice of target for directed yield across different calls to kvm_vcpu_on_spin(); the function was always starting from the first vCPU instead of continuing the round-robin search" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits) KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX} KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected" KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot() KVM: arm64: Add endian casting to kvm_swap_s[12]_desc() KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n KVM: arm64: selftests: Add test for AT emulation KVM: arm64: nv: Expose hardware access flag management to NV guests KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW KVM: arm64: Implement HW access flag management in stage-1 SW PTW KVM: arm64: Propagate PTW errors up to AT emulation KVM: arm64: Add helper for swapping guest descriptor KVM: arm64: nv: Use pgtable definitions in stage-2 walk KVM: arm64: Handle endianness in read helper for emulated PTW KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW KVM: arm64: Call helper for reading descriptors directly KVM: arm64: nv: Advertise support for FEAT_XNX KVM: arm64: Teach ptdump about FEAT_XNX permissions KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions ...
2025-12-04	f2fs: ignore discard return value	Chaitanya Kulkarni
	__blkdev_issue_discard() always returns 0, making the error assignment in __submit_discard_cmd() dead code. Initialize err to 0 and remove the error assignment from the __blkdev_issue_discard() call to err. Move fault injection code into already present if branch where err is set to -EIO. This preserves the fault injection behavior while removing dead error handling. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: optimize trace_f2fs_write_checkpoint with enums	YH Lin
	This patch optimizes the tracepoint by replacing these hardcoded strings with a new enumeration f2fs_cp_phase. 1.Defines enum f2fs_cp_phase with values for each checkpoint phase. 2.Updates trace_f2fs_write_checkpoint to accept a u16 phase argument instead of a string pointer. 3.Uses __print_symbolic in TP_printk to convert the enum values back to their corresponding strings for human-readable trace output. This change reduces the storage overhead for each trace event by replacing a variable-length string with a 2-byte integer, while maintaining the same readable output in ftrace. Signed-off-by: YH Lin <yhli@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: fix to not account invalid blocks in get_left_section_blocks()	Chao Yu
	w/ LFS mode, in get_left_section_blocks(), we should not account the blocks which were used before and now are invalided, otherwise those blocks will be counted as freed one in has_curseg_enough_space(), result in missing to trigger GC in time. Cc: stable@kernel.org Fixes: 249ad438e1d9 ("f2fs: add a method for calculating the remaining blocks in the current segment in LFS mode.") Fixes: bf34c93d2645 ("f2fs: check curseg space before foreground GC") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: support to show curseg.next_blkoff in debugfs	Chao Yu
	cat /sys/kernel/debug/f2fs/status Main area: 17 segs, 17 secs 17 zones TYPE blkoff segno secno zoneno dirty_seg full_seg valid_blk - COLD data: 0 4 4 4 0 0 0 - WARM data: 0 7 7 7 0 0 0 - HOT data: 1 5 5 5 2 0 512 - Dir dnode: 3 0 0 0 1 0 2 - File dnode: 0 1 1 1 0 0 0 - Indir nodes: 0 2 2 2 0 0 0 - Pinned file: 0 -1 -1 -1 - ATGC data: 0 -1 -1 -1 Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: expand scalability of f2fs mount option	Chao Yu
	opt field in structure f2fs_mount_info and opt_mask field in structure f2fs_fs_context is 32-bits variable, now we're running out of available bits in them, let's expand them to 64-bits for better scalability. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: change default schedule timeout value	Chao Yu
	This patch changes default schedule timeout value from 20ms to 1ms, in order to give caller more chances to check whether IO or non-IO congestion condition has already been mitigable. In addition, default interval of periodical discard submission is kept to 20ms. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: introduce f2fs_schedule_timeout()	Chao Yu
	In f2fs retry logic, we will call f2fs_io_schedule_timeout() to sleep as uninterruptible state (waiting for IO) for a while, however, in several paths below, we are not blocked by IO: - f2fs_write_single_data_page() return -EAGAIN due to racing on cp_rwsem. - f2fs_flush_device_cache() failed to submit preflush command. - __issue_discard_cmd_range() sleeps periodically in between two in batch discard submissions. So, in order to reveal state of task more accurate, let's introduce f2fs_schedule_timeout() and call it in above paths in where we are waiting for non-IO reasons. Then we can get real reason of uninterruptible sleep for a thread in tracepoint, perfetto, etc. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: use memalloc_retry_wait() as much as possible	Chao Yu
	memalloc_retry_wait() is recommended in memory allocation retry logic, use it as much as possible. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: add a sysfs entry to show max open zones	Yongpeng Yang
	This patch adds a sysfs entry showing the max zones that F2FS can write concurrently. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: wrap all unusable_blocks_per_sec code in CONFIG_BLK_DEV_ZONED	Yongpeng Yang
	The usage of unusable_blocks_per_sec is already wrapped by CONFIG_BLK_DEV_ZONED, except for its declaration and the definitions of CAP_BLKS_PER_SEC and CAP_SEGS_PER_SEC. This patch ensures that all code related to unusable_blocks_per_sec is properly wrapped under the CONFIG_BLK_DEV_ZONED option. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: simplify list initialization in f2fs_recover_fsync_data()	Baolin Liu
	In f2fs_recover_fsync_data(),use LIST_HEAD() to declare and initialize the list_head in one step instead of using INIT_LIST_HEAD() separately. No functional change. Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: revert summary entry count from 2048 to 512 in 16kb block support	Daeho Jeong
	The recent increase in the number of Segment Summary Area (SSA) entries from 512 to 2048 was an unintentional change in logic of 16kb block support. This commit corrects the issue. To better utilize the space available from the erroneous 2048-entry calculation, we are implementing a solution to share the currently unused SSA space with neighboring segments. This enhances overall SSA utilization without impacting the established 8MB segment size. Fixes: d7e9a9037de2 ("f2fs: Support Block Size == Page Size") Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: fix to detect recoverable inode during dryrun of find_fsync_dnodes()	Chao Yu
	mkfs.f2fs -f /dev/vdd mount /dev/vdd /mnt/f2fs touch /mnt/f2fs/foo sync # avoid CP_UMOUNT_FLAG in last f2fs_checkpoint.ckpt_flags touch /mnt/f2fs/bar f2fs_io fsync /mnt/f2fs/bar f2fs_io shutdown 2 /mnt/f2fs umount /mnt/f2fs blockdev --setro /dev/vdd mount /dev/vdd /mnt/f2fs mount: /mnt/f2fs: WARNING: source write-protected, mounted read-only. For the case if we create and fsync a new inode before sudden power-cut, without norecovery or disable_roll_forward mount option, the following mount will succeed w/o recovering last fsynced inode. The problem here is that we only check inode_list list after find_fsync_dnodes() in f2fs_recover_fsync_data() to find out whether there is recoverable data in the iamge, but there is a missed case, if last fsynced inode is not existing in last checkpoint, then, we will fail to get its inode due to nat of inode node is not existing in last checkpoint, so the inode won't be linked in inode_list. Let's detect such case in dyrun mode to fix this issue. After this change, mount will fail as expected below: mount: /mnt/f2fs: cannot mount /dev/vdd read-only. dmesg(1) may have more information after failed mount system call. demsg: F2FS-fs (vdd): Need to recover fsync data, but write access unavailable, please try mount w/ disable_roll_forward or norecovery Cc: stable@kernel.org Fixes: 6781eabba1bd ("f2fs: give -EINVAL for norecovery and rw mount") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: fix return value of f2fs_recover_fsync_data()	Chao Yu
	With below scripts, it will trigger panic in f2fs: mkfs.f2fs -f /dev/vdd mount /dev/vdd /mnt/f2fs touch /mnt/f2fs/foo sync echo 111 >> /mnt/f2fs/foo f2fs_io fsync /mnt/f2fs/foo f2fs_io shutdown 2 /mnt/f2fs umount /mnt/f2fs mount -o ro,norecovery /dev/vdd /mnt/f2fs or mount -o ro,disable_roll_forward /dev/vdd /mnt/f2fs F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 0 F2FS-fs (vdd): Mounted with checkpoint version = 7f5c361f F2FS-fs (vdd): Stopped filesystem due to reason: 0 F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 1 Filesystem f2fs get_tree() didn't set fc->root, returned 1 ------------[ cut here ]------------ kernel BUG at fs/super.c:1761! Oops: invalid opcode: 0000 [#1] SMP PTI CPU: 3 UID: 0 PID: 722 Comm: mount Not tainted 6.18.0-rc2+ #721 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:vfs_get_tree.cold+0x18/0x1a Call Trace: <TASK> fc_mount+0x13/0xa0 path_mount+0x34e/0xc50 __x64_sys_mount+0x121/0x150 do_syscall_64+0x84/0x800 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7fa6cc126cfe The root cause is we missed to handle error number returned from f2fs_recover_fsync_data() when mounting image w/ ro,norecovery or ro,disable_roll_forward mount option, result in returning a positive error number to vfs_get_tree(), fix it. Cc: stable@kernel.org Fixes: 6781eabba1bd ("f2fs: give -EINVAL for norecovery and rw mount") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: add fadvise tracepoint	Jaegeuk Kim
	This adds a tracepoint in the fadvise call path. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: fix age extent cache insertion skip on counter overflow	Xiaole He
	The age extent cache uses last_blocks (derived from allocated_data_blocks) to determine data age. However, there's a conflict between the deletion marker (last_blocks=0) and legitimate last_blocks=0 cases when allocated_data_blocks overflows to 0 after reaching ULLONG_MAX. In this case, valid extents are incorrectly skipped due to the "if (!tei->last_blocks)" check in __update_extent_tree_range(). This patch fixes the issue by: 1. Reserving ULLONG_MAX as an invalid/deletion marker 2. Limiting allocated_data_blocks to range [0, ULLONG_MAX-1] 3. Using F2FS_EXTENT_AGE_INVALID for deletion scenarios 4. Adjusting overflow age calculation from ULLONG_MAX to (ULLONG_MAX-1) Reproducer (using a patched kernel with allocated_data_blocks initialized to ULLONG_MAX - 3 for quick testing): Step 1: Mount and check initial state # dd if=/dev/zero of=/tmp/test.img bs=1M count=100 # mkfs.f2fs -f /tmp/test.img # mkdir -p /mnt/f2fs_test # mount -t f2fs -o loop,age_extent_cache /tmp/test.img /mnt/f2fs_test # cat /sys/kernel/debug/f2fs/status \| grep -A 4 "Block Age" Allocated Data Blocks: 18446744073709551612 # ULLONG_MAX - 3 Inner Struct Count: tree: 1(0), node: 0 Step 2: Create files and write data to trigger overflow # touch /mnt/f2fs_test/{1,2,3,4}.txt; sync # cat /sys/kernel/debug/f2fs/status \| grep -A 4 "Block Age" Allocated Data Blocks: 18446744073709551613 # ULLONG_MAX - 2 Inner Struct Count: tree: 5(0), node: 1 # dd if=/dev/urandom of=/mnt/f2fs_test/1.txt bs=4K count=1; sync # cat /sys/kernel/debug/f2fs/status \| grep -A 4 "Block Age" Allocated Data Blocks: 18446744073709551614 # ULLONG_MAX - 1 Inner Struct Count: tree: 5(0), node: 2 # dd if=/dev/urandom of=/mnt/f2fs_test/2.txt bs=4K count=1; sync # cat /sys/kernel/debug/f2fs/status \| grep -A 4 "Block Age" Allocated Data Blocks: 18446744073709551615 # ULLONG_MAX Inner Struct Count: tree: 5(0), node: 3 # dd if=/dev/urandom of=/mnt/f2fs_test/3.txt bs=4K count=1; sync # cat /sys/kernel/debug/f2fs/status \| grep -A 4 "Block Age" Allocated Data Blocks: 0 # Counter overflowed! Inner Struct Count: tree: 5(0), node: 4 Step 3: Trigger the bug - next write should create node but gets skipped # dd if=/dev/urandom of=/mnt/f2fs_test/4.txt bs=4K count=1; sync # cat /sys/kernel/debug/f2fs/status \| grep -A 4 "Block Age" Allocated Data Blocks: 1 Inner Struct Count: tree: 5(0), node: 4 Expected: node: 5 (new extent node for 4.txt) Actual: node: 4 (extent insertion was incorrectly skipped due to last_blocks = allocated_data_blocks = 0 in __get_new_block_age) After this fix, the extent node is correctly inserted and node count becomes 5 as expected. Fixes: 71644dff4811 ("f2fs: add block_age-based extent cache") Cc: stable@kernel.org Signed-off-by: Xiaole He <hexiaole1994@126.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: Add sanity checks before unlinking and loading inodes	Nikola Z. Ivanov
	Add check for inode->i_nlink == 1 for directories during unlink, as their value is decremented twice, which can trigger a warning in drop_nlink. In such case mark the filesystem as corrupted and return from the function call with the relevant failure return value. Additionally add the check for i_nlink == 1 in sanity_check_inode in order to detect on-disk corruption early. Reported-by: syzbot+c07d47c7bc68f47b9083@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c07d47c7bc68f47b9083 Tested-by: syzbot+c07d47c7bc68f47b9083@syzkaller.appspotmail.com Signed-off-by: Nikola Z. Ivanov <zlatistiv@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: Rename f2fs_unlink exit label	Nikola Z. Ivanov
	Rename "fail" label to "out" as it's used as a default exit path out of f2fs_unlink as well as error path. Signed-off-by: Nikola Z. Ivanov <zlatistiv@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: ensure minimum trim granularity accounts for all devices	Yongpeng Yang
	When F2FS uses multiple block devices, each device may have a different discard granularity. The minimum trim granularity must be at least the maximum discard granularity of all devices, excluding zoned devices. Use max_t instead of the max() macro to compute the maximum value. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: fix uninitialized one_time_gc in victim_sel_policy	Xiaole He
	The one_time_gc field in struct victim_sel_policy is conditionally initialized but unconditionally read, leading to undefined behavior that triggers UBSAN warnings. In f2fs_get_victim() at fs/f2fs/gc.c:774, the victim_sel_policy structure is declared without initialization: struct victim_sel_policy p; The field p.one_time_gc is only assigned when the 'one_time' parameter is true (line 789): if (one_time) { p.one_time_gc = one_time; ... } However, this field is unconditionally read in subsequent get_gc_cost() at line 395: if (p->one_time_gc && (valid_thresh_ratio < 100) && ...) When one_time is false, p.one_time_gc contains uninitialized stack memory. Hence p.one_time_gc is an invalid bool value. UBSAN detects this invalid bool value: UBSAN: invalid-load in fs/f2fs/gc.c:395:7 load of value 77 is not a valid value for type '_Bool' CPU: 3 UID: 0 PID: 1297 Comm: f2fs_gc-252:16 Not tainted 6.18.0-rc3 #5 PREEMPT(voluntary) Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x70/0x90 dump_stack+0x14/0x20 __ubsan_handle_load_invalid_value+0xb3/0xf0 ? dl_server_update+0x2e/0x40 ? update_curr+0x147/0x170 f2fs_get_victim.cold+0x66/0x134 [f2fs] ? sched_balance_newidle+0x2ca/0x470 ? finish_task_switch.isra.0+0x8d/0x2a0 f2fs_gc+0x2ba/0x8e0 [f2fs] ? _raw_spin_unlock_irqrestore+0x12/0x40 ? __timer_delete_sync+0x80/0xe0 ? timer_delete_sync+0x14/0x20 ? schedule_timeout+0x82/0x100 gc_thread_func+0x38b/0x860 [f2fs] ? gc_thread_func+0x38b/0x860 [f2fs] ? __pfx_autoremove_wake_function+0x10/0x10 kthread+0x10b/0x220 ? __pfx_gc_thread_func+0x10/0x10 [f2fs] ? _raw_spin_unlock_irq+0x12/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x11a/0x160 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> This issue is reliably reproducible with the following steps on a 100GB SSD /dev/vdb: mkfs.f2fs -f /dev/vdb mount /dev/vdb /mnt/f2fs_test fio --name=gc --directory=/mnt/f2fs_test --rw=randwrite \ --bs=4k --size=8G --numjobs=12 --fsync=4 --runtime=10 \ --time_based echo 1 > /sys/fs/f2fs/vdb/gc_urgent The uninitialized value causes incorrect GC victim selection, leading to unpredictable garbage collection behavior. Fix by zero-initializing the entire victim_sel_policy structure to ensure all fields have defined values. Fixes: e791d00bd06c ("f2fs: add valid block ratio not to do excessive GC for one time GC") Cc: stable@kernel.org Signed-off-by: Xiaole He <hexiaole1994@126.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: ensure node page reads complete before f2fs_put_super() finishes	Jan Prusakowski
	Xfstests generic/335, generic/336 sometimes crash with the following message: F2FS-fs (dm-0): detect filesystem reference count leak during umount, type: 9, count: 1 ------------[ cut here ]------------ kernel BUG at fs/f2fs/super.c:1939! Oops: invalid opcode: 0000 [#1] SMP NOPTI CPU: 1 UID: 0 PID: 609351 Comm: umount Tainted: G W 6.17.0-rc5-xfstests-g9dd1835ecda5 #1 PREEMPT(none) Tainted: [W]=WARN Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:f2fs_put_super+0x3b3/0x3c0 Call Trace: <TASK> generic_shutdown_super+0x7e/0x190 kill_block_super+0x1a/0x40 kill_f2fs_super+0x9d/0x190 deactivate_locked_super+0x30/0xb0 cleanup_mnt+0xba/0x150 task_work_run+0x5c/0xa0 exit_to_user_mode_loop+0xb7/0xc0 do_syscall_64+0x1ae/0x1c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> ---[ end trace 0000000000000000 ]--- It appears that sometimes it is possible that f2fs_put_super() is called before all node page reads are completed. Adding a call to f2fs_wait_on_all_pages() for F2FS_RD_NODE fixes the problem. Cc: stable@kernel.org Fixes: 20872584b8c0b ("f2fs: fix to drop all dirty meta/node pages during umount()") Signed-off-by: Jan Prusakowski <jprusakowski@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: block cache/dio write during f2fs_enable_checkpoint()	Chao Yu
	If there are too many background IOs during f2fs_enable_checkpoint(), sync_inodes_sb() may be blocked for long time due to it will loop to write dirty datas which are generated by in parallel write() continuously. Let's change as below to resolve this issue: - hold cp_enable_rwsem write lock to block any cache/dio write - decrease DEF_ENABLE_INTERVAL from 16 to 5 In addition, dump more logs during f2fs_enable_checkpoint(). Testcase: 1. fill data into filesystem until 90% usage. 2. mount -o remount,checkpoint=disable:10% /data 3. fio --rw=randwrite --bs=4kb --size=1GB --numjobs=10 \ --iodepth=64 --ioengine=psync --time_based --runtime=600 \ --directory=/data/fio_dir/ & 4. mount -o remount,checkpoint=enable /data Before: F2FS-fs (dm-51): f2fs_enable_checkpoint() finishes, writeback:7232, sync:39793, cp:457 After: F2FS-fs (dm-51): f2fs_enable_checkpoint end, writeback:5032, lock:0, sync_inode:5552, sync_fs:84 Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: fix to propagate error from f2fs_enable_checkpoint()	Chao Yu
	In order to let userspace detect such error rather than suffering silent failure. Fixes: 4354994f097d ("f2fs: checkpoint disabling") Cc: stable@kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: change the unlock parameter of f2fs_put_page to bool	Yongpeng Yang
	Change the type of the unlock parameter of f2fs_put_page to bool. All callers should consistently pass true or false. No logical change. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: invalidate dentry cache on failed whiteout creation	Deepanshu Kartikey
	F2FS can mount filesystems with corrupted directory depth values that get runtime-clamped to MAX_DIR_HASH_DEPTH. When RENAME_WHITEOUT operations are performed on such directories, f2fs_rename performs directory modifications (updating target entry and deleting source entry) before attempting to add the whiteout entry via f2fs_add_link. If f2fs_add_link fails due to the corrupted directory structure, the function returns an error to VFS, but the partial directory modifications have already been committed to disk. VFS assumes the entire rename operation failed and does not update the dentry cache, leaving stale mappings. In the error path, VFS does not call d_move() to update the dentry cache. This results in new_dentry still pointing to the old inode (new_inode) which has already had its i_nlink decremented to zero. The stale cache causes subsequent operations to incorrectly reference the freed inode. This causes subsequent operations to use cached dentry information that no longer matches the on-disk state. When a second rename targets the same entry, VFS attempts to decrement i_nlink on the stale inode, which may already have i_nlink=0, triggering a WARNING in drop_nlink(). Example sequence: 1. First rename (RENAME_WHITEOUT): file2 → file1 - f2fs updates file1 entry on disk (points to inode 8) - f2fs deletes file2 entry on disk - f2fs_add_link(whiteout) fails (corrupted directory) - Returns error to VFS - VFS does not call d_move() due to error - VFS cache still has: file1 → inode 7 (stale!) - inode 7 has i_nlink=0 (already decremented) 2. Second rename: file3 → file1 - VFS uses stale cache: file1 → inode 7 - Tries to drop_nlink on inode 7 (i_nlink already 0) - WARNING in drop_nlink() Fix this by explicitly invalidating old_dentry and new_dentry when f2fs_add_link fails during whiteout creation. This forces VFS to refresh from disk on subsequent operations, ensuring cache consistency even when the rename partially succeeds. Reproducer: 1. Mount F2FS image with corrupted i_current_depth 2. renameat2(file2, file1, RENAME_WHITEOUT) 3. renameat2(file3, file1, 0) 4. System triggers WARNING in drop_nlink() Fixes: 7e01e7ad746b ("f2fs: support RENAME_WHITEOUT") Reported-by: syzbot+632cf32276a9a564188d@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=632cf32276a9a564188d Suggested-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/all/20251022233349.102728-1-kartikey406@gmail.com/ [v1] Cc: stable@vger.kernel.org Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: use global inline_xattr_slab instead of per-sb slab cache	Chao Yu
	As Hong Yun reported in mailing list: loop7: detected capacity change from 0 to 131072 ------------[ cut here ]------------ kmem_cache of name 'f2fs_xattr_entry-7:7' already exists WARNING: CPU: 0 PID: 24426 at mm/slab_common.c:110 kmem_cache_sanity_check mm/slab_common.c:109 [inline] WARNING: CPU: 0 PID: 24426 at mm/slab_common.c:110 __kmem_cache_create_args+0xa6/0x320 mm/slab_common.c:307 CPU: 0 UID: 0 PID: 24426 Comm: syz.7.1370 Not tainted 6.17.0-rc4 #1 PREEMPT(full) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:kmem_cache_sanity_check mm/slab_common.c:109 [inline] RIP: 0010:__kmem_cache_create_args+0xa6/0x320 mm/slab_common.c:307 Call Trace: __kmem_cache_create include/linux/slab.h:353 [inline] f2fs_kmem_cache_create fs/f2fs/f2fs.h:2943 [inline] f2fs_init_xattr_caches+0xa5/0xe0 fs/f2fs/xattr.c:843 f2fs_fill_super+0x1645/0x2620 fs/f2fs/super.c:4918 get_tree_bdev_flags+0x1fb/0x260 fs/super.c:1692 vfs_get_tree+0x43/0x140 fs/super.c:1815 do_new_mount+0x201/0x550 fs/namespace.c:3808 do_mount fs/namespace.c:4136 [inline] __do_sys_mount fs/namespace.c:4347 [inline] __se_sys_mount+0x298/0x2f0 fs/namespace.c:4324 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x8e/0x3a0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e The bug can be reproduced w/ below scripts: - mount /dev/vdb /mnt1 - mount /dev/vdc /mnt2 - umount /mnt1 - mounnt /dev/vdb /mnt1 The reason is if we created two slab caches, named f2fs_xattr_entry-7:3 and f2fs_xattr_entry-7:7, and they have the same slab size. Actually, slab system will only create one slab cache core structure which has slab name of "f2fs_xattr_entry-7:3", and two slab caches share the same structure and cache address. So, if we destroy f2fs_xattr_entry-7:3 cache w/ cache address, it will decrease reference count of slab cache, rather than release slab cache entirely, since there is one more user has referenced the cache. Then, if we try to create slab cache w/ name "f2fs_xattr_entry-7:3" again, slab system will find that there is existed cache which has the same name and trigger the warning. Let's changes to use global inline_xattr_slab instead of per-sb slab cache for fixing. Fixes: a999150f4fe3 ("f2fs: use kmem_cache pool during inline xattr lookups") Cc: stable@kernel.org Reported-by: Hong Yun <yhong@link.cuhk.edu.hk> Tested-by: Hong Yun <yhong@link.cuhk.edu.hk> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: fix to avoid updating compression context during writeback	Chao Yu
	Bai, Shuangpeng <sjb7183@psu.edu> reported a bug as below: Oops: divide error: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 11441 Comm: syz.0.46 Not tainted 6.17.0 #1 PREEMPT(full) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:f2fs_all_cluster_page_ready+0x106/0x550 fs/f2fs/compress.c:857 Call Trace: <TASK> f2fs_write_cache_pages fs/f2fs/data.c:3078 [inline] __f2fs_write_data_pages fs/f2fs/data.c:3290 [inline] f2fs_write_data_pages+0x1c19/0x3600 fs/f2fs/data.c:3317 do_writepages+0x38e/0x640 mm/page-writeback.c:2634 filemap_fdatawrite_wbc mm/filemap.c:386 [inline] __filemap_fdatawrite_range mm/filemap.c:419 [inline] file_write_and_wait_range+0x2ba/0x3e0 mm/filemap.c:794 f2fs_do_sync_file+0x6e6/0x1b00 fs/f2fs/file.c:294 generic_write_sync include/linux/fs.h:3043 [inline] f2fs_file_write_iter+0x76e/0x2700 fs/f2fs/file.c:5259 new_sync_write fs/read_write.c:593 [inline] vfs_write+0x7e9/0xe00 fs/read_write.c:686 ksys_write+0x19d/0x2d0 fs/read_write.c:738 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xf7/0x470 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f The bug was triggered w/ below race condition: fsync setattr ioctl - f2fs_do_sync_file - file_write_and_wait_range - f2fs_write_cache_pages : inode is non-compressed : cc.cluster_size = F2FS_I(inode)->i_cluster_size = 0 - tag_pages_for_writeback - f2fs_setattr - truncate_setsize - f2fs_truncate - f2fs_fileattr_set - f2fs_setflags_common - set_compress_context : F2FS_I(inode)->i_cluster_size = 4 : set_inode_flag(inode, FI_COMPRESSED_FILE) - f2fs_compressed_file : return true - f2fs_all_cluster_page_ready : "pgidx % cc->cluster_size" trigger dividing 0 issue Let's change as below to fix this issue: - introduce a new atomic type variable .writeback in structure f2fs_inode_info to track the number of threads which calling f2fs_write_cache_pages(). - use .i_sem lock to protect .writeback update. - check .writeback before update compression context in f2fs_setflags_common() to avoid race w/ ->writepages. Fixes: 4c8ff7095bef ("f2fs: support data compression") Cc: stable@kernel.org Reported-by: Bai, Shuangpeng <sjb7183@psu.edu> Tested-by: Bai, Shuangpeng <sjb7183@psu.edu> Closes: https://lore.kernel.org/lkml/44D8F7B3-68AD-425F-9915-65D27591F93F@psu.edu Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: fix to avoid updating zero-sized extent in extent cache	Chao Yu
	As syzbot reported: F2FS-fs (loop0): __update_extent_tree_range: extent len is zero, type: 0, extent [0, 0, 0], age [0, 0] ------------[ cut here ]------------ kernel BUG at fs/f2fs/extent_cache.c:678! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI CPU: 0 UID: 0 PID: 5336 Comm: syz.0.0 Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:__update_extent_tree_range+0x13bc/0x1500 fs/f2fs/extent_cache.c:678 Call Trace: <TASK> f2fs_update_read_extent_cache_range+0x192/0x3e0 fs/f2fs/extent_cache.c:1085 f2fs_do_zero_range fs/f2fs/file.c:1657 [inline] f2fs_zero_range+0x10c1/0x1580 fs/f2fs/file.c:1737 f2fs_fallocate+0x583/0x990 fs/f2fs/file.c:2030 vfs_fallocate+0x669/0x7e0 fs/open.c:342 ioctl_preallocate fs/ioctl.c:289 [inline] file_ioctl+0x611/0x780 fs/ioctl.c:-1 do_vfs_ioctl+0xb33/0x1430 fs/ioctl.c:576 __do_sys_ioctl fs/ioctl.c:595 [inline] __se_sys_ioctl+0x82/0x170 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f07bc58eec9 In error path of f2fs_zero_range(), it may add a zero-sized extent into extent cache, it should be avoided. Fixes: 6e9619499f53 ("f2fs: support in batch fzero in dnode page") Cc: stable@kernel.org Reported-by: syzbot+24124df3170c3638b35f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/68e5d698.050a0220.256323.0032.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: fix to avoid potential deadlock	Chao Yu
	As Jiaming Zhang and syzbot reported, there is potential deadlock in f2fs as below: Chain exists of: &sbi->cp_rwsem --> fs_reclaim --> sb_internal#2 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- rlock(sb_internal#2); lock(fs_reclaim); lock(sb_internal#2); rlock(&sbi->cp_rwsem); * DEADLOCK * 3 locks held by kswapd0/73: #0: ffffffff8e247a40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:7015 [inline] #0: ffffffff8e247a40 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0x951/0x2800 mm/vmscan.c:7389 #1: ffff8880118400e0 (&type->s_umount_key#50){.+.+}-{4:4}, at: super_trylock_shared fs/super.c:562 [inline] #1: ffff8880118400e0 (&type->s_umount_key#50){.+.+}-{4:4}, at: super_cache_scan+0x91/0x4b0 fs/super.c:197 #2: ffff888011840610 (sb_internal#2){.+.+}-{0:0}, at: f2fs_evict_inode+0x8d9/0x1b60 fs/f2fs/inode.c:890 stack backtrace: CPU: 0 UID: 0 PID: 73 Comm: kswapd0 Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120 print_circular_bug+0x2ee/0x310 kernel/locking/lockdep.c:2043 check_noncircular+0x134/0x160 kernel/locking/lockdep.c:2175 check_prev_add kernel/locking/lockdep.c:3165 [inline] check_prevs_add kernel/locking/lockdep.c:3284 [inline] validate_chain+0xb9b/0x2140 kernel/locking/lockdep.c:3908 __lock_acquire+0xab9/0xd20 kernel/locking/lockdep.c:5237 lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868 down_read+0x46/0x2e0 kernel/locking/rwsem.c:1537 f2fs_down_read fs/f2fs/f2fs.h:2278 [inline] f2fs_lock_op fs/f2fs/f2fs.h:2357 [inline] f2fs_do_truncate_blocks+0x21c/0x10c0 fs/f2fs/file.c:791 f2fs_truncate_blocks+0x10a/0x300 fs/f2fs/file.c:867 f2fs_truncate+0x489/0x7c0 fs/f2fs/file.c:925 f2fs_evict_inode+0x9f2/0x1b60 fs/f2fs/inode.c:897 evict+0x504/0x9c0 fs/inode.c:810 f2fs_evict_inode+0x1dc/0x1b60 fs/f2fs/inode.c:853 evict+0x504/0x9c0 fs/inode.c:810 dispose_list fs/inode.c:852 [inline] prune_icache_sb+0x21b/0x2c0 fs/inode.c:1000 super_cache_scan+0x39b/0x4b0 fs/super.c:224 do_shrink_slab+0x6ef/0x1110 mm/shrinker.c:437 shrink_slab_memcg mm/shrinker.c:550 [inline] shrink_slab+0x7ef/0x10d0 mm/shrinker.c:628 shrink_one+0x28a/0x7c0 mm/vmscan.c:4955 shrink_many mm/vmscan.c:5016 [inline] lru_gen_shrink_node mm/vmscan.c:5094 [inline] shrink_node+0x315d/0x3780 mm/vmscan.c:6081 kswapd_shrink_node mm/vmscan.c:6941 [inline] balance_pgdat mm/vmscan.c:7124 [inline] kswapd+0x147c/0x2800 mm/vmscan.c:7389 kthread+0x70e/0x8a0 kernel/kthread.c:463 ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> The root cause is deadlock among four locks as below: kswapd - fs_reclaim --- Lock A - shrink_one - evict - f2fs_evict_inode - sb_start_intwrite --- Lock B - iput - evict - f2fs_evict_inode - sb_start_intwrite --- Lock B - f2fs_truncate - f2fs_truncate_blocks - f2fs_do_truncate_blocks - f2fs_lock_op --- Lock C ioctl - f2fs_ioc_commit_atomic_write - f2fs_lock_op --- Lock C - __f2fs_commit_atomic_write - __replace_atomic_write_block - f2fs_get_dnode_of_data - __get_node_folio - f2fs_check_nid_range - f2fs_handle_error - f2fs_record_errors - f2fs_down_write --- Lock D open - do_open - do_truncate - security_inode_need_killpriv - f2fs_getxattr - lookup_all_xattrs - f2fs_handle_error - f2fs_record_errors - f2fs_down_write --- Lock D - f2fs_commit_super - read_mapping_folio - filemap_alloc_folio_noprof - prepare_alloc_pages - fs_reclaim_acquire --- Lock A In order to avoid such deadlock, we need to avoid grabbing sb_lock in f2fs_handle_error(), so, let's use asynchronous method instead: - remove f2fs_handle_error() implementation - rename f2fs_handle_error_async() to f2fs_handle_error() - spread f2fs_handle_error() Fixes: 95fa90c9e5a7 ("f2fs: support recording errors into superblock") Cc: stable@kernel.org Reported-by: syzbot+14b90e1156b9f6fc1266@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/68eae49b.050a0220.ac43.0001.GAE@google.com Reported-by: Jiaming Zhang <r772577952@gmail.com> Closes: https://lore.kernel.org/lkml/CANypQFa-Gy9sD-N35o3PC+FystOWkNuN8pv6S75HLT0ga-Tzgw@mail.gmail.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: use f2fs_filemap_get_folio() to support fault injection	Chao Yu
	Use f2fs_filemap_get_folio() instead of __filemap_get_folio() in: - f2fs_find_data_folio - f2fs_write_begin - f2fs_read_merkle_tree_page So that, we can trigger fault injection in those places. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: use f2fs_filemap_get_folio() instead of f2fs_pagecache_get_page()	Chao Yu
	Let's use f2fs_filemap_get_folio() instead of f2fs_pagecache_get_page() in ra_data_block() and move_data_block(), then remove f2fs_pagecache_get_page() since it has no user. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: convert add_ipu_page() to use folio	Chao Yu
	No logic changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04	f2fs: clean up w/ bio_add_folio_nofail()	Chao Yu
	In add_bio_entry(), adding a page to newly allocated bio should never fail, let's use bio_add_folio_nofail() instead of bio_add_page() & unnecessary error handling for cleanup. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-01	Merge tag 'vfs-6.19-rc1.fs_header' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fs header updates from Christian Brauner: "This contains initial work to start splitting up fs.h. Begin the long-overdue work of splitting up the monolithic fs.h header. The header has grown to over 3000 lines and includes types and functions for many different subsystems, making it difficult to navigate and causing excessive compilation dependencies. This series introduces new focused headers for superblock-related code: - Rename fs_types.h to fs_dirent.h to better reflect its actual content (directory entry types) - Add fs/super_types.h containing superblock type definitions - Add fs/super.h containing superblock function declarations This is the first step in a longer effort to modularize the VFS headers. Cleanups: - Inode Field Layout Optimization (Mateusz Guzik) Move inode fields used during fast path lookup closer together to improve cache locality during path resolution. - current_umask() Optimization (Mateusz Guzik) Inline current_umask() and move it to fs_struct.h. This improves performance by avoiding function call overhead for this frequently-used function, and places it in a more appropriate header since it operates on fs_struct" * tag 'vfs-6.19-rc1.fs_header' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: move inode fields used during fast path lookup closer together fs: inline current_umask() and move it to fs_struct.h fs: add fs/super.h header fs: add fs/super_types.h header fs: rename fs_types.h to fs_dirent.h
2025-12-01	Merge tag 'vfs-6.19-rc1.folio' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull folio updates from Christian Brauner: "Add a new folio_next_pos() helper function that returns the file position of the first byte after the current folio. This is a common operation in filesystems when needing to know the end of the current folio. The helper is lifted from btrfs which already had its own version, and is now used across multiple filesystems and subsystems: - btrfs - buffer - ext4 - f2fs - gfs2 - iomap - netfs - xfs - mm This fixes a long-standing bug in ocfs2 on 32-bit systems with files larger than 2GiB. Presumably this is not a common configuration, but the fix is backported anyway. The other filesystems did not have bugs, they were just mildly inefficient. This also introduce uoff_t as the unsigned version of loff_t. A recent commit inadvertently changed a comparison from being unsigned (on 64-bit systems) to being signed (which it had always been on 32-bit systems), leading to sporadic fstests failures. Generally file sizes are restricted to being a signed integer, but in places where -1 is passed to indicate "up to the end of the file", it is convenient to have an unsigned type to ensure comparisons are always unsigned regardless of architecture" * tag 'vfs-6.19-rc1.folio' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: Add uoff_t mm: Use folio_next_pos() xfs: Use folio_next_pos() netfs: Use folio_next_pos() iomap: Use folio_next_pos() gfs2: Use folio_next_pos() f2fs: Use folio_next_pos() ext4: Use folio_next_pos() buffer: Use folio_next_pos() btrfs: Use folio_next_pos() filemap: Add folio_next_pos()
2025-12-01	Merge tag 'vfs-6.19-rc1.writeback' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull writeback updates from Christian Brauner: "Features: - Allow file systems to increase the minimum writeback chunk size. The relatively low minimal writeback size of 4MiB means that written back inodes on rotational media are switched a lot. Besides introducing additional seeks, this also can lead to extreme file fragmentation on zoned devices when a lot of files are cached relative to the available writeback bandwidth. This adds a superblock field that allows the file system to override the default size, and sets it to the zone size for zoned XFS. - Add logging for slow writeback when it exceeds sysctl_hung_task_timeout_secs. This helps identify tasks waiting for a long time and pinpoint potential issues. Recording the starting jiffies is also useful when debugging a crashed vmcore. - Wake up waiting tasks when finishing the writeback of a chunk Cleanups: - filemap_* writeback interface cleanups. Adding filemap_fdatawrite_wbc ended up being a mistake, as all but the original btrfs caller should be using better high level interfaces instead. This series removes all these low-level interfaces, switches btrfs to a more specific interface, and cleans up other too low-level interfaces. With this the writeback_control that is passed to the writeback code is only initialized in three places. - Remove __filemap_fdatawrite, __filemap_fdatawrite_range, and filemap_fdatawrite_wbc - Add filemap_flush_nr helper for btrfs - Push struct writeback_control into start_delalloc_inodes in btrfs - Rename filemap_fdatawrite_range_kick to filemap_flush_range - Stop opencoding filemap_fdatawrite_range in 9p, ocfs2, and mm - Make wbc_to_tag() inline and use it in fs" * tag 'vfs-6.19-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: Make wbc_to_tag() inline and use it in fs. xfs: set s_min_writeback_pages for zoned file systems writeback: allow the file system to override MIN_WRITEBACK_PAGES writeback: cleanup writeback_chunk_size mm: rename filemap_fdatawrite_range_kick to filemap_flush_range mm: remove __filemap_fdatawrite_range mm: remove filemap_fdatawrite_wbc mm: remove __filemap_fdatawrite mm,btrfs: add a filemap_flush_nr helper btrfs: push struct writeback_control into start_delalloc_inodes btrfs: use the local tmp_inode variable in start_delalloc_inodes ocfs2: don't opencode filemap_fdatawrite_range in ocfs2_journal_submit_inode_data_buffers 9p: don't opencode filemap_fdatawrite_range in v9fs_mmap_vm_close mm: don't opencode filemap_fdatawrite_range in filemap_invalidate_inode writeback: Add logging for slow writeback (exceeds sysctl_hung_task_timeout_secs) writeback: Wake up waiting tasks when finishing the writeback of a chunk.