| Age | Commit message (Collapse) | Author |
|
We could have detected the quick inherit bug more directly if we had
an extra warning about squota hierarchy consistency while modifying the
hierarchy. In squotas, the parent usage always simply adds up to the sum of
its children, so we can just check for that when changing membership and
detect more accounting bugs.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Replace open-coded if/else blocks with the boolean directly and introduce
local const bool variables, making the code shorter and easier to read.
Signed-off-by: Sun YangKai <sunk67188@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Replace integer literals 0/1 with true/false when calling
btrfs_inc_ref() and btrfs_dec_ref() to make the code self-documenting
and avoid mixing bool/integer types.
Signed-off-by: Sun YangKai <sunk67188@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Drop the obsolete @refs parameter from the comment so the argument list
matches the current function signature after commit f8c4d59de23c9
("btrfs: drop unused parameter refs from visit_node_for_delete()").
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Sun YangKai <sunk67188@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We allocate the bitmap but we never free it in free_raid_bio_pointers().
Fix this by adding a bitmap_free() call against the stripe_uptodate_bitmap
of a raid bio.
Fixes: 1810350b04ef ("btrfs: raid56: move sector_ptr::uptodate into a dedicated bitmap")
Reported-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/linux-btrfs/20260126045315.GA31641@lst.de/
Reviewed-by: Qu Wenruo <wqu@suse.com>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
They can either be removed or replaced with IS_ENABLED().
Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
I/O requests beyond the end of the filesystem should be zeroed out,
similar to loopback devices and that is what we expect.
Fixes: ce63cb62d794 ("erofs: support unencoded inodes for fileio")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
The EROFS on-disk format uses a tiny, plain metadata design that
prioritizes performance and minimizes complex inconsistencies against
common writable disk filesystems (almost all serious metadata
inconsistency cannot happen in well-designed immutable filesystems like
EROFS). EROFS deliberately avoids artificial design flaws to eliminate
serious security risks from untrusted remote sources by design,
although human-made implementation bugs can still happen sometimes.
Currently, there is no strict check to prevent compressed inodes,
especially LZ4-compressed inodes, from being read in plain filesystems.
Starting with erofs-utils 1.0 and Linux 5.3, LZ4_0PADDING sb feature
is automatically enabled for LZ4-compressed EROFS images to support
in-place decompression. Furthermore, since Linux 5.4 LTS is no longer
supported, we no longer need to handle ancient LZ4-compressed EROFS
images generated by erofs-utils prior to 1.0.
To formally distinguish different filesystem types for improved
security:
- Use the presence of LZ4_0PADDING or a non-zero
`dsb->u1.lz4_max_distance` as a marker for compressed filesystems
containing LZ4-compressed inodes only;
- For other algorithms, use `dsb->u1.available_compr_algs` bitmap.
Note: LZ4_0PADDING has been supported since Linux 5.4 (the first formal
kernel version), so exposing it via sysfs is no longer necessary and is
now deprecated (but remain it for five more years until 2031):
`dsb->u1` has been strictly non-zero for all EROFS images containing
compressed inodes starting with erofs-utils v1.3 and it is actually
a much better marker for compressed filesystems.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Symlink lengths are now cached in in-memory inodes directly so that
readlink can be sped up.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Look up the fsverity_info once in end_buffer_async_read_io, and then
pass it along to the I/O completion workqueue in
struct postprocess_bh_ctx.
This amortizes the lookup better once it becomes less efficient.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20260202060754.270269-8-hch@lst.de
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Pass a struct fsverity_info to the verification and readahead helpers,
and push the lookup into the callers. Right now this is a very dumb
almost mechanic move that open codes a lot of fsverity_info_addr() calls
in the file systems. The subsequent patches will clean this up.
This prepares for reducing the number of fsverity_info lookups, which
will allow to amortize them better when using a more expensive lookup
method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Acked-by: David Sterba <dsterba@suse.com> # btrfs
Link: https://lore.kernel.org/r/20260202060754.270269-7-hch@lst.de
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
A lot of file system code expects a non-const inode pointer. Dropping
the const qualifier here allows using the inode pointer in
verify_data_block and prepares for further argument reductions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Link: https://lore.kernel.org/r/20260202060754.270269-6-hch@lst.de
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Currently all reads of the fsverity hashes are kicked off from the data
I/O completion handler, leading to needlessly dependent I/O. This is
worked around a bit by performing readahead on the level 0 nodes, but
still fairly ineffective.
Switch to a model where the ->read_folio and ->readahead methods instead
kick off explicit readahead of the fsverity hashed so they are usually
available at I/O completion time.
For 64k sequential reads on my test VM this improves read performance
from 2.4GB/s - 2.6GB/s to 3.5GB/s - 3.9GB/s. The improvements for
random reads are likely to be even bigger.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Sterba <dsterba@suse.com> # btrfs
Link: https://lore.kernel.org/r/20260202060754.270269-5-hch@lst.de
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Keep all the read into pagecache code in a single file.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20260202060754.270269-4-hch@lst.de
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
A recent change for the range check started triggering a clang warning:
fs/jfs/jfs_dtree.c:2906:31: error: result of comparison of constant 128 with expression of type 's8' (aka 'signed char') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
2906 | if (stbl[i] < 0 || stbl[i] >= DTPAGEMAXSLOT) {
| ~~~~~~~ ^ ~~~~~~~~~~~~~
fs/jfs/jfs_dtree.c:3111:30: error: result of comparison of constant 128 with expression of type 's8' (aka 'signed char') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
3111 | if (stbl[0] < 0 || stbl[0] >= DTPAGEMAXSLOT) {
| ~~~~~~~ ^ ~~~~~~~~~~~~~
Both the old and the new check were useless, but the previous version
apparently did not lead to the warning.
Remove the extraneous range check for simplicity.
Fixes: cafc6679824a ("jfs: replace hardcoded magic number with DTPAGEMAXSLOT constant")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Require the invalidate_lock to be held over calls to
page_cache_ra_unbounded instead of acquiring it in this function.
This prepares for calling page_cache_ra_unbounded from ->readahead for
fsverity read-ahead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20260202060754.270269-3-hch@lst.de
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Issuing more reads on errors is not a good idea, especially when the
most common error here is -ENOMEM.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20260202060754.270269-2-hch@lst.de
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
It is possible to have a task get stuck on waiting on the
NFS_LAYOUT_DRAIN in the following scenario
1. cpu a: waiter test NFS_LAYOUT_DRAIN (1) and plh_outstanding (1)
2. cpu b: atomic_dec_and_test() -> clear bit -> wake up
3. cpu c: sets NFS_LAYOUT_DRAIN again
4. cpu a: calls wait_on_bit() sleeps forever.
To expand on this we have say 2 outstanding pnfs write IO that get
ESTALE which causes both to call pnfs_destroy_layout() and set the
NFS_LAYOUT_DRAIN bit but the 1st one doesn't call the
pnfs_put_layout_hdr() yet (as that would prevent the 2nd ESTALE write
from trying to call pnfs_destroy_layout()). If the 1st ESTALE write
is the one that initially sets the NFS_LAYOUT_DRAIN so that new IO
on this file initiates new LAYOUTGET. Another new write would find
NFS_LAYOUT_DRAIN set and phl_outstanding>0 (step 1) and would
wait_on_bit(). LAYOUTGET completes doing step 2. Now, the 2nd of
ESTALE writes is calling pnfs_destory_layout() and set the
NFS_LAYOUT_DRAIN bit (step 3). Finally, the waiting write wakes up
to check the bit and goes back to sleep.
The problem revolves around the fact that if NFS_LAYOUT_INVALID_STID
was already set, it should not do the work of
pnfs_mark_layout_stateid_invalid(), thus NFS_LAYOUT_DRAIN will not
be set more than once for an invalid layout.
Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Fixes: 880265c77ac4 ("pNFS: Avoid a live lock condition in pnfs_update_layout()")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Reproducer:
1. server: supports SMB1, directories are exported read-only
2. client: mount -t cifs -o vers=1.0 //${server_ip}/export /mnt
3. client: dd if=/dev/zero of=/mnt/file bs=512 count=1000 oflag=direct
4. client: umount /mnt
5. client: sleep 1
6. client: modprobe -r cifs
The error message is as follows:
=============================================================================
BUG cifs_small_rq (Not tainted): Objects remaining on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Object 0x00000000d34491e6 @offset=896
Object 0x00000000bde9fab3 @offset=4480
Object 0x00000000104a1f70 @offset=6272
Object 0x0000000092a51bb5 @offset=7616
Object 0x000000006714a7db @offset=13440
...
WARNING: mm/slub.c:1251 at __kmem_cache_shutdown+0x379/0x3f0, CPU#7: modprobe/712
...
Call Trace:
<TASK>
kmem_cache_destroy+0x69/0x160
cifs_destroy_request_bufs+0x39/0x40 [cifs]
cleanup_module+0x43/0xfc0 [cifs]
__se_sys_delete_module+0x1d5/0x300
__x64_sys_delete_module+0x1a/0x30
x64_sys_call+0x2299/0x2ff0
do_syscall_64+0x6e/0x270
entry_SYSCALL_64_after_hwframe+0x76/0x7e
...
kmem_cache_destroy cifs_small_rq: Slab cache still has objects when called from cifs_destroy_request_bufs+0x39/0x40 [cifs]
WARNING: mm/slab_common.c:532 at kmem_cache_destroy+0x142/0x160, CPU#7: modprobe/712
Link: https://lore.kernel.org/linux-cifs/9751f02d-d1df-4265-a7d6-b19761b21834@linux.dev/T/#mf14808c144448b715f711ce5f0477a071f08eaf6
Fixes: 6be09580df5c ("cifs: Make smb1's SendReceive() wrap cifs_send_recv()")
Reported-by: Paulo Alcantara <pc@manguebit.org>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Reproducer:
1. server: directories are exported read-only
2. client: mount -t cifs //${server_ip}/export /mnt
3. client: dd if=/dev/zero of=/mnt/file bs=512 count=1000 oflag=direct
4. client: umount /mnt
5. client: sleep 1
6. client: modprobe -r cifs
The error message is as follows:
=============================================================================
BUG cifs_small_rq (Not tainted): Objects remaining on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
Object 0x00000000d47521be @offset=14336
...
WARNING: mm/slub.c:1251 at __kmem_cache_shutdown+0x34e/0x440, CPU#0: modprobe/1577
...
Call Trace:
<TASK>
kmem_cache_destroy+0x94/0x190
cifs_destroy_request_bufs+0x3e/0x50 [cifs]
cleanup_module+0x4e/0x540 [cifs]
__se_sys_delete_module+0x278/0x400
__x64_sys_delete_module+0x5f/0x70
x64_sys_call+0x2299/0x2ff0
do_syscall_64+0x89/0x350
entry_SYSCALL_64_after_hwframe+0x76/0x7e
...
kmem_cache_destroy cifs_small_rq: Slab cache still has objects when called from cifs_destroy_request_bufs+0x3e/0x50 [cifs]
WARNING: mm/slab_common.c:532 at kmem_cache_destroy+0x16b/0x190, CPU#0: modprobe/1577
Link: https://lore.kernel.org/linux-cifs/9751f02d-d1df-4265-a7d6-b19761b21834@linux.dev/T/#mf14808c144448b715f711ce5f0477a071f08eaf6
Fixes: e255612b5ed9 ("cifs: Add fallback for SMB2 CREATE without FILE_READ_ATTRIBUTES")
Reported-by: Paulo Alcantara <pc@manguebit.org>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Reviewed-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
Move the conflicting declaration to the end of the corresponding
structure. Notice that `struct dlm_message` is a flexible
structure, this is a structure that contains a flexible-array
member.
Fix the following warning:
fs/dlm/dlm_internal.h:609:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
Currently it is not possible to distinguish between the case where a
process has already exited and the case where a process is in a
different namespace, as both return -ESRCH.
glibc's pidfd_getpid() procfs-based implementation returns -EREMOTE
in the latter, so that distinguishing the two is possible, as the
fdinfo in procfs will list '0' as the PID in that case:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/pidfd_getpid.c;h=860829cf07da2267484299ccb02861822c0d07b4;hb=HEAD#l121
Change the error code so that the kernel also returns -EREMOTE in
that case.
Fixes: 7477d7dce48a ("pidfs: allow to retrieve exit information")
Signed-off-by: Luca Boccassi <luca.boccassi@gmail.com>
Link: https://patch.msgid.link/20260127225209.2293342-1-luca.boccassi@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
commit c06c303832ec ("ocfs2: fix xattr array entry __counted_by error")
doesn't handle all cases and the cleanup job for preserved xattr entries
still has bug:
- the 'last' pointer should be shifted by one unit after cleanup
an array entry.
- current code logic doesn't cleanup the first entry when xh_count is 1.
Note, commit c06c303832ec is also a bug fix for 0fe9b66c65f3.
Link: https://lkml.kernel.org/r/20251210015725.8409-2-heming.zhao@suse.com
Fixes: 0fe9b66c65f3 ("ocfs2: Add preserve to reflink.")
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fix from Ard Biesheuvel:
- Fix regression in efivarfs error propagation
* tag 'efi-fixes-for-v6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efivarfs: fix error propagation in efivar_entry_get()
|
|
This patch introduces two new tracepoints for debug purpose.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If userspace thread has held f2fs rw semaphore, due to its low priority,
it could be runnable or preempted state for long time, during the time,
it will block high priority thread which is trying to grab the same rw
semaphore, e.g. cp_rwsem, io_rwsem...
To fix such issue, let's detect thread's priority when it tries to grab
f2fs_rwsem lock, if the priority is lower than a priority threshold, let's
uplift the priority before it enters into critical region of lock, and
restore the priority after it leaves from critical region.
Meanwhile, introducing two new sysfs nodes:
- /sys/fs/f2fs/<disk>/adjust_lock_priority, it is used to control whether
the functionality is enable or not.
========== ==================
Flag_Value Flag_Description
========== ==================
0x00000000 Disabled (default)
0x00000001 cp_rwsem
0x00000002 node_change
0x00000004 node_write
0x00000008 gc_lock
0x00000010 cp_global
0x00000020 io_rwsem
========== ==================
- /sys/fs/f2fs/<disk>/lock_duration_priority, it is used to control
priority threshold.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When overwriting already allocated blocks, f2fs_iomap_begin() calls
f2fs_overwrite_io() to check block mappings. However,
f2fs_overwrite_io() iterates through all mapped blocks in the range,
which can be inefficient for fragmented files with large I/O requests.
This patch optimizes f2fs_overwrite_io() by adding a 'check_first'
parameter and introducing __f2fs_overwrite_io() helper. When called from
f2fs_iomap_begin(), we only check the first mapping to determine if the
range is already allocated, which is sufficient for setting
map.m_may_create.
This optimization significantly reduces the number of f2fs_map_blocks()
calls in f2fs_overwrite_io() when called from f2fs_iomap_begin(),
especially for fragmented files with large I/O requests.
Cc: stable@kernel.org
Fixes: 351bc761338d ("f2fs: optimize f2fs DIO overwrites")
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com>
Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Rework this code that was totally busted at least as of my most
recent changes. Introduce a separate list for delayed delegations
so that they can't get lost and don't clutter up the returns list.
Add a missing spin_unlock in the helper marking it as a regular
pending return.
Fixes: 0ebe655bd033 ("NFS: add a separate delegation return list")
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Drop the pointless delegation->lock held over setting multiple
atomic bits in different structures, and use separate labels
for the delay vs abort cases.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
This will allow to simplify the error handling flow.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
All callers now pass a non-NULL delegation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Replace the integer used as boolean with a bool type, and tidy up
the prototype and top of function comment.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
The caller doesn't check the return value, so drop it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
None of the callers checks the return value, so drop it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
In a previous commit, a bug was introduced where compact SSA summaries
failed to utilize the entire block space in non-4KB block size
configurations, leading to inefficient space management.
This patch fixes the calculation logic to ensure that compact SSA
summaries can fully occupy the block regardless of the block size.
Reported-by: Chris Mason <clm@meta.com>
Fixes: e48e16f3e37f ("f2fs: support non-4KB block size without packed_ssa feature")
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Compiling the NFSv4 module without any minorversion support doesn't make
much sense, so this patch sets NFS v4.1 as the default, always enabled
NFS version allowing us to replace all the CONFIG_NFS_V4_1s scattered
throughout the code with CONFIG_NFS_V4.
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
I introduce NFS4_MIN_MINOR_VERSION as a parallel to
NFS4_MAX_MINOR_VERSION to check if NFS v4.0 has been compiled in and
return an appropriate error if not.
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
At the same time, I move the NFS v4.0 functions into nfs40proc.c to keep
v4.0 features together in their own files.
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
No functional change in this patch. This just makes the next patch where
I introduce "sequence slot operations" simpler.
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
They don't need to be visible outside of nfs40proc.c anymore now that
the minor version ops have been moved over.
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
This is the first step in extracting NFS v4.0 into its own set of files
that can be disabled through Kconfig.
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
The inline data buffer head (dibh) is being released prematurely in
gfs2_iomap_begin() via release_metapath() while iomap->inline_data
still points to dibh->b_data. This causes a use-after-free when
iomap_write_end_inline() later attempts to write to the inline data
area.
The bug sequence:
1. gfs2_iomap_begin() calls gfs2_meta_inode_buffer() to read inode
metadata into dibh
2. Sets iomap->inline_data = dibh->b_data + sizeof(struct gfs2_dinode)
3. Calls release_metapath() which calls brelse(dibh), dropping refcount
to 0
4. kswapd reclaims the page (~39ms later in the syzbot report)
5. iomap_write_end_inline() tries to memcpy() to iomap->inline_data
6. KASAN detects use-after-free write to freed memory
Fix by storing dibh in iomap->private and incrementing its refcount
with get_bh() in gfs2_iomap_begin(). The buffer is then properly
released in gfs2_iomap_end() after the inline write completes,
ensuring the page stays alive for the entire iomap operation.
Note: A C reproducer is not available for this issue. The fix is based
on analysis of the KASAN report and code review showing the buffer head
is freed before use.
[agruenba: Take buffer head reference in gfs2_iomap_begin() to avoid
leaks in gfs2_iomap_get() and gfs2_iomap_alloc().]
Reported-by: syzbot+ea1cd4aa4d1e98458a55@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ea1cd4aa4d1e98458a55
Fixes: d0a22a4b03b8 ("gfs2: Fix iomap write page reclaim deadlock")
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|