| Age | Commit message (Collapse) | Author |
|
Remove redundant netif_napi_del() call from disconnect path.
A WARN may be triggered in __netif_napi_del_locked() during USB device
disconnect:
WARNING: CPU: 0 PID: 11 at net/core/dev.c:7417 __netif_napi_del_locked+0x2b4/0x350
This happens because netif_napi_del() is called in the disconnect path while
NAPI is still enabled. However, it is not necessary to call netif_napi_del()
explicitly, since unregister_netdev() will handle NAPI teardown automatically
and safely. Removing the redundant call avoids triggering the warning.
Full trace:
lan78xx 1-1:1.0 enu1: Failed to read register index 0x000000c4. ret = -ENODEV
lan78xx 1-1:1.0 enu1: Failed to set MAC down with error -ENODEV
lan78xx 1-1:1.0 enu1: Link is Down
lan78xx 1-1:1.0 enu1: Failed to read register index 0x00000120. ret = -ENODEV
------------[ cut here ]------------
WARNING: CPU: 0 PID: 11 at net/core/dev.c:7417 __netif_napi_del_locked+0x2b4/0x350
Modules linked in: flexcan can_dev fuse
CPU: 0 UID: 0 PID: 11 Comm: kworker/0:1 Not tainted 6.16.0-rc2-00624-ge926949dab03 #9 PREEMPT
Hardware name: SKOV IMX8MP CPU revC - bd500 (DT)
Workqueue: usb_hub_wq hub_event
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __netif_napi_del_locked+0x2b4/0x350
lr : __netif_napi_del_locked+0x7c/0x350
sp : ffffffc085b673c0
x29: ffffffc085b673c0 x28: ffffff800b7f2000 x27: ffffff800b7f20d8
x26: ffffff80110bcf58 x25: ffffff80110bd978 x24: 1ffffff0022179eb
x23: ffffff80110bc000 x22: ffffff800b7f5000 x21: ffffff80110bc000
x20: ffffff80110bcf38 x19: ffffff80110bcf28 x18: dfffffc000000000
x17: ffffffc081578940 x16: ffffffc08284cee0 x15: 0000000000000028
x14: 0000000000000006 x13: 0000000000040000 x12: ffffffb0022179e8
x11: 1ffffff0022179e7 x10: ffffffb0022179e7 x9 : dfffffc000000000
x8 : 0000004ffdde8619 x7 : ffffff80110bcf3f x6 : 0000000000000001
x5 : ffffff80110bcf38 x4 : ffffff80110bcf38 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 1ffffff0022179e7 x0 : 0000000000000000
Call trace:
__netif_napi_del_locked+0x2b4/0x350 (P)
lan78xx_disconnect+0xf4/0x360
usb_unbind_interface+0x158/0x718
device_remove+0x100/0x150
device_release_driver_internal+0x308/0x478
device_release_driver+0x1c/0x30
bus_remove_device+0x1a8/0x368
device_del+0x2e0/0x7b0
usb_disable_device+0x244/0x540
usb_disconnect+0x220/0x758
hub_event+0x105c/0x35e0
process_one_work+0x760/0x17b0
worker_thread+0x768/0xce8
kthread+0x3bc/0x690
ret_from_fork+0x10/0x20
irq event stamp: 211604
hardirqs last enabled at (211603): [<ffffffc0828cc9ec>] _raw_spin_unlock_irqrestore+0x84/0x98
hardirqs last disabled at (211604): [<ffffffc0828a9a84>] el1_dbg+0x24/0x80
softirqs last enabled at (211296): [<ffffffc080095f10>] handle_softirqs+0x820/0xbc8
softirqs last disabled at (210993): [<ffffffc080010288>] __do_softirq+0x18/0x20
---[ end trace 0000000000000000 ]---
lan78xx 1-1:1.0 enu1: failed to kill vid 0081/0
Fixes: e110bc825897 ("net: usb: lan78xx: Convert to PHYLINK for improved PHY and MAC management")
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20260305143429.530909-5-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Do not configure Latency Tolerance Messaging (LTM) on USB 2.0 hardware.
The LAN7850 is a High-Speed (USB 2.0) only device and does not support
SuperSpeed features like LTM. Currently, the driver unconditionally
attempts to configure LTM registers during initialization. On the
LAN7850, these registers do not exist, resulting in writes to invalid
or undocumented memory space.
This issue was identified during a port to the regmap API with strict
register validation enabled. While no functional issues or crashes have
been observed from these invalid writes, bypassing LTM initialization
on the LAN7850 ensures the driver strictly adheres to the hardware's
valid register map.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20260305143429.530909-4-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Account for hardware auto-padding in TX byte counters to reflect actual
wire traffic.
The LAN7850 hardware automatically pads undersized frames to the minimum
Ethernet frame length (ETH_ZLEN, 60 bytes). However, the driver tracks
the network statistics based on the unpadded socket buffer length. This
results in the tx_bytes counter under-reporting the actual physical
bytes placed on the Ethernet wire for small packets (like short ARP or
ICMP requests).
Use max_t() to ensure the transmission statistics accurately account for
the hardware-generated padding.
Fixes: d383216a7efe ("lan78xx: Introduce Tx URB processing improvements")
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20260305143429.530909-3-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Do not drop packets with checksum errors at the USB driver level;
pass them to the network stack.
Previously, the driver dropped all packets where the 'Receive Error
Detected' (RED) bit was set, regardless of the specific error type. This
caused packets with only IP or TCP/UDP checksum errors to be dropped
before reaching the kernel, preventing the network stack from accounting
for them or performing software fallback.
Add a mask for hard hardware errors to safely drop genuinely corrupt
frames, while allowing checksum-errored frames to pass with their
ip_summed field explicitly set to CHECKSUM_NONE.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20260305143429.530909-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
include/net/tc_wrapper.h changes should be reviewed by TC maintainers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260307120607.3504191-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When removing a nexthop from a group, remove_nh_grp_entry() publishes
the new group via rcu_assign_pointer() then immediately frees the
removed entry's percpu stats with free_percpu(). However, the
synchronize_net() grace period in the caller remove_nexthop_from_groups()
runs after the free. RCU readers that entered before the publish still
see the old group and can dereference the freed stats via
nh_grp_entry_stats_inc() -> get_cpu_ptr(nhge->stats), causing a
use-after-free on percpu memory.
Fix by deferring the free_percpu() until after synchronize_net() in the
caller. Removed entries are chained via nh_list onto a local deferred
free list. After the grace period completes and all RCU readers have
finished, the percpu stats are safely freed.
Fixes: f4676ea74b85 ("net: nexthop: Add nexthop group entry stats")
Cc: stable@vger.kernel.org
Signed-off-by: Mehul Rao <mehulrao@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260306233821.196789-1-mehulrao@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A reproducer triggers a KASAN slab-use-after-free in pty_write_room()
when caif_serial's TX path calls tty_write_room(). The faulting access
is on tty->link->port.
Hold an extra kref on tty->link for the lifetime of the caif_serial line
discipline: get it in ldisc_open() and drop it in ser_release(), and
also drop it on the ldisc_open() error path.
With this change applied, the reproducer no longer triggers the UAF in
my testing.
Link: https://gist.github.com/shuangpengbai/c898debad6bdf170a84be7e6b3d8707f
Link: https://lore.kernel.org/netdev/20260301220525.1546355-1-shuangpeng.kernel@gmail.com
Fixes: e31d5a05948e ("caif: tty's are kref objects so take a reference")
Signed-off-by: Shuangpeng Bai <shuangpeng.kernel@gmail.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://patch.msgid.link/20260306034006.3395740-1-shuangpeng.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With the current sfp_fixup_ignore_tx_fault() fixup we ignore the TX_FAULT
signal, but we also need to apply sfp_fixup_ignore_los() in order to be
able to communicate with the module even if the fiber isn't connected for
configuration purposes.
This is needed for all the MA5671a firmwares, excluding the FS modded
firmware.
Fixes: 2069624dac19 ("net: sfp: Add tx-fault workaround for Huawei MA5671A SFP ONT")
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260306125139.213637-1-noltari@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
graph_util_is_ports0() identifies DPCM front-end (ports@0) vs back-end
(ports@1) by calling of_get_child_by_name() to find the first "ports"
child and comparing pointers. This relies on child iteration order
matching DTS source order.
When the DPCM topology comes from a DT overlay, __of_attach_node()
inserts new children at the head of the sibling list, reversing the
order. of_get_child_by_name() then returns ports@1 instead of ports@0,
causing all front-end links to be classified as back-ends. The card
registers with no PCM devices.
Fix this by matching the unit address directly from the node name
instead of relying on sibling order.
Fixes: 92939252458f ("ASoC: simple-card-utils: add asoc_graph_is_ports0()")
Signed-off-by: Sen Wang <sen@ti.com>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/20260309042109.2576612-1-sen@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Merge series from "Sheetal ." <sheetal@nvidia.com>:
Add Tegra238 sound card support in the Tegra audio graph card driver,
as Tegra238 requires different PLLA and PLLA_OUT0 clock rates compared
to other Tegra platforms.
|
|
AppArmor was putting the reference to i_private data on its end after
removing the original entry from the file system. However the inode
can aand does live beyond that point and it is possible that some of
the fs call back functions will be invoked after the reference has
been put, which results in a race between freeing the data and
accessing it through the fs.
While the rawdata/loaddata is the most likely candidate to fail the
race, as it has the fewest references. If properly crafted it might be
possible to trigger a race for the other types stored in i_private.
Fix this by moving the put of i_private referenced data to the correct
place which is during inode eviction.
Fixes: c961ee5f21b20 ("apparmor: convert from securityfs to apparmorfs for policy ns files")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Maxime Bélair <maxime.belair@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
There is a race condition that leads to a use-after-free situation:
because the rawdata inodes are not refcounted, an attacker can start
open()ing one of the rawdata files, and at the same time remove the
last reference to this rawdata (by removing the corresponding profile,
for example), which frees its struct aa_loaddata; as a result, when
seq_rawdata_open() is reached, i_private is a dangling pointer and
freed memory is accessed.
The rawdata inodes weren't refcounted to avoid a circular refcount and
were supposed to be held by the profile rawdata reference. However
during profile removal there is a window where the vfs and profile
destruction race, resulting in the use after free.
Fix this by moving to a double refcount scheme. Where the profile
refcount on rawdata is used to break the circular dependency. Allowing
for freeing of the rawdata once all inode references to the rawdata
are put.
Fixes: 5d5182cae401 ("apparmor: move to per loaddata files, instead of replicating in profiles")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Maxime Bélair <maxime.belair@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Differential encoding allows loops to be created if it is abused. To
prevent this the unpack should verify that a diff-encode chain
terminates.
Unfortunately the differential encode verification had two bugs.
1. it conflated states that had gone through check and already been
marked, with states that were currently being checked and marked.
This means that loops in the current chain being verified are treated
as a chain that has already been verified.
2. the order bailout on already checked states compared current chain
check iterators j,k instead of using the outer loop iterator i.
Meaning a step backwards in states in the current chain verification
was being mistaken for moving to an already verified state.
Move to a double mark scheme where already verified states get a
different mark, than the current chain being kept. This enables us
to also drop the backwards verification check that was the cause of
the second error as any already verified state is already marked.
Fixes: 031dcc8f4e84 ("apparmor: dfa add support for state differential encoding")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
An unprivileged local user can load, replace, and remove profiles by
opening the apparmorfs interfaces, via a confused deputy attack, by
passing the opened fd to a privileged process, and getting the
privileged process to write to the interface.
This does require a privileged target that can be manipulated to do
the write for the unprivileged process, but once such access is
achieved full policy management is possible and all the possible
implications that implies: removing confinement, DoS of system or
target applications by denying all execution, by-passing the
unprivileged user namespace restriction, to exploiting kernel bugs for
a local privilege escalation.
The policy management interface can not have its permissions simply
changed from 0666 to 0600 because non-root processes need to be able
to load policy to different policy namespaces.
Instead ensure the task writing the interface has privileges that
are a subset of the task that opened the interface. This is already
done via policy for confined processes, but unconfined can delegate
access to the opened fd, by-passing the usual policy check.
Fixes: b7fd2c0340eac ("apparmor: add per policy ns .load, .replace, .remove interface files")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
if ns_name is NULL after
1071 error = aa_unpack(udata, &lh, &ns_name);
and if ent->ns_name contains an ns_name in
1089 } else if (ent->ns_name) {
then ns_name is assigned the ent->ns_name
1095 ns_name = ent->ns_name;
however ent->ns_name is freed at
1262 aa_load_ent_free(ent);
and then again when freeing ns_name at
1270 kfree(ns_name);
Fix this by NULLing out ent->ns_name after it is transferred to ns_name
Fixes: 145a0ef21c8e9 ("apparmor: fix blob compression when ns is forced on a policy load
")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
The verify_dfa() function only checks DEFAULT_TABLE bounds when the state
is not differentially encoded.
When the verification loop traverses the differential encoding chain,
it reads k = DEFAULT_TABLE[j] and uses k as an array index without
validation. A malformed DFA with DEFAULT_TABLE[j] >= state_count,
therefore, causes both out-of-bounds reads and writes.
[ 57.179855] ==================================================================
[ 57.180549] BUG: KASAN: slab-out-of-bounds in verify_dfa+0x59a/0x660
[ 57.180904] Read of size 4 at addr ffff888100eadec4 by task su/993
[ 57.181554] CPU: 1 UID: 0 PID: 993 Comm: su Not tainted 6.19.0-rc7-next-20260127 #1 PREEMPT(lazy)
[ 57.181558] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 57.181563] Call Trace:
[ 57.181572] <TASK>
[ 57.181577] dump_stack_lvl+0x5e/0x80
[ 57.181596] print_report+0xc8/0x270
[ 57.181605] ? verify_dfa+0x59a/0x660
[ 57.181608] kasan_report+0x118/0x150
[ 57.181620] ? verify_dfa+0x59a/0x660
[ 57.181623] verify_dfa+0x59a/0x660
[ 57.181627] aa_dfa_unpack+0x1610/0x1740
[ 57.181629] ? __kmalloc_cache_noprof+0x1d0/0x470
[ 57.181640] unpack_pdb+0x86d/0x46b0
[ 57.181647] ? srso_alias_return_thunk+0x5/0xfbef5
[ 57.181653] ? srso_alias_return_thunk+0x5/0xfbef5
[ 57.181656] ? aa_unpack_nameX+0x1a8/0x300
[ 57.181659] aa_unpack+0x20b0/0x4c30
[ 57.181662] ? srso_alias_return_thunk+0x5/0xfbef5
[ 57.181664] ? stack_depot_save_flags+0x33/0x700
[ 57.181681] ? kasan_save_track+0x4f/0x80
[ 57.181683] ? kasan_save_track+0x3e/0x80
[ 57.181686] ? __kasan_kmalloc+0x93/0xb0
[ 57.181688] ? __kvmalloc_node_noprof+0x44a/0x780
[ 57.181693] ? aa_simple_write_to_buffer+0x54/0x130
[ 57.181697] ? policy_update+0x154/0x330
[ 57.181704] aa_replace_profiles+0x15a/0x1dd0
[ 57.181707] ? srso_alias_return_thunk+0x5/0xfbef5
[ 57.181710] ? __kvmalloc_node_noprof+0x44a/0x780
[ 57.181712] ? aa_loaddata_alloc+0x77/0x140
[ 57.181715] ? srso_alias_return_thunk+0x5/0xfbef5
[ 57.181717] ? _copy_from_user+0x2a/0x70
[ 57.181730] policy_update+0x17a/0x330
[ 57.181733] profile_replace+0x153/0x1a0
[ 57.181735] ? rw_verify_area+0x93/0x2d0
[ 57.181740] vfs_write+0x235/0xab0
[ 57.181745] ksys_write+0xb0/0x170
[ 57.181748] do_syscall_64+0x8e/0x660
[ 57.181762] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 57.181765] RIP: 0033:0x7f6192792eb2
Remove the MATCH_FLAG_DIFF_ENCODE condition to validate all DEFAULT_TABLE
entries unconditionally.
Fixes: 031dcc8f4e84 ("apparmor: dfa add support for state differential encoding")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
The match_char() macro evaluates its character parameter multiple
times when traversing differential encoding chains. When invoked
with *str++, the string pointer advances on each iteration of the
inner do-while loop, causing the DFA to check different characters
at each iteration and therefore skip input characters.
This results in out-of-bounds reads when the pointer advances past
the input buffer boundary.
[ 94.984676] ==================================================================
[ 94.985301] BUG: KASAN: slab-out-of-bounds in aa_dfa_match+0x5ae/0x760
[ 94.985655] Read of size 1 at addr ffff888100342000 by task file/976
[ 94.986319] CPU: 7 UID: 1000 PID: 976 Comm: file Not tainted 6.19.0-rc7-next-20260127 #1 PREEMPT(lazy)
[ 94.986322] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 94.986329] Call Trace:
[ 94.986341] <TASK>
[ 94.986347] dump_stack_lvl+0x5e/0x80
[ 94.986374] print_report+0xc8/0x270
[ 94.986384] ? aa_dfa_match+0x5ae/0x760
[ 94.986388] kasan_report+0x118/0x150
[ 94.986401] ? aa_dfa_match+0x5ae/0x760
[ 94.986405] aa_dfa_match+0x5ae/0x760
[ 94.986408] __aa_path_perm+0x131/0x400
[ 94.986418] aa_path_perm+0x219/0x2f0
[ 94.986424] apparmor_file_open+0x345/0x570
[ 94.986431] security_file_open+0x5c/0x140
[ 94.986442] do_dentry_open+0x2f6/0x1120
[ 94.986450] vfs_open+0x38/0x2b0
[ 94.986453] ? may_open+0x1e2/0x2b0
[ 94.986466] path_openat+0x231b/0x2b30
[ 94.986469] ? __x64_sys_openat+0xf8/0x130
[ 94.986477] do_file_open+0x19d/0x360
[ 94.986487] do_sys_openat2+0x98/0x100
[ 94.986491] __x64_sys_openat+0xf8/0x130
[ 94.986499] do_syscall_64+0x8e/0x660
[ 94.986515] ? count_memcg_events+0x15f/0x3c0
[ 94.986526] ? srso_alias_return_thunk+0x5/0xfbef5
[ 94.986540] ? handle_mm_fault+0x1639/0x1ef0
[ 94.986551] ? vma_start_read+0xf0/0x320
[ 94.986558] ? srso_alias_return_thunk+0x5/0xfbef5
[ 94.986561] ? srso_alias_return_thunk+0x5/0xfbef5
[ 94.986563] ? fpregs_assert_state_consistent+0x50/0xe0
[ 94.986572] ? srso_alias_return_thunk+0x5/0xfbef5
[ 94.986574] ? arch_exit_to_user_mode_prepare+0x9/0xb0
[ 94.986587] ? srso_alias_return_thunk+0x5/0xfbef5
[ 94.986588] ? irqentry_exit+0x3c/0x590
[ 94.986595] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 94.986597] RIP: 0033:0x7fda4a79c3ea
Fix by extracting the character value before invoking match_char,
ensuring single evaluation per outer loop.
Fixes: 074c1cd798cb ("apparmor: dfa move character match into a macro")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Currently the number of policy namespaces is not bounded relying on
the user namespace limit. However policy namespaces aren't strictly
tied to user namespaces and it is possible to create them and nest
them arbitrarily deep which can be used to exhaust system resource.
Hard cap policy namespaces to the same depth as user namespaces.
Fixes: c88d4c7b049e8 ("AppArmor: core policy routines")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Reviewed-by: Ryan Lee <ryan.lee@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
The profile removal code uses recursion when removing nested profiles,
which can lead to kernel stack exhaustion and system crashes.
Reproducer:
$ pf='a'; for ((i=0; i<1024; i++)); do
echo -e "profile $pf { \n }" | apparmor_parser -K -a;
pf="$pf//x";
done
$ echo -n a > /sys/kernel/security/apparmor/.remove
Replace the recursive __aa_profile_list_release() approach with an
iterative approach in __remove_profile(). The function repeatedly
finds and removes leaf profiles until the entire subtree is removed,
maintaining the same removal semantic without recursion.
Fixes: c88d4c7b049e ("AppArmor: core policy routines")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
The function sets `*ns = NULL` on every call, leaking the namespace
string allocated in previous iterations when multiple profiles are
unpacked. This also breaks namespace consistency checking since *ns
is always NULL when the comparison is made.
Remove the incorrect assignment.
The caller (aa_unpack) initializes *ns to NULL once before the loop,
which is sufficient.
Fixes: dd51c8485763 ("apparmor: provide base for multiple profiles to be replaced at once")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Start states are read from untrusted data and used as indexes into the
DFA state tables. The aa_dfa_next() function call in unpack_pdb() will
access dfa->tables[YYTD_ID_BASE][start], and if the start state exceeds
the number of states in the DFA, this results in an out-of-bound read.
==================================================================
BUG: KASAN: slab-out-of-bounds in aa_dfa_next+0x2a1/0x360
Read of size 4 at addr ffff88811956fb90 by task su/1097
...
Reject policies with out-of-bounds start states during unpacking
to prevent the issue.
Fixes: ad5ff3db53c6 ("AppArmor: Add ability to load extended policy")
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Reviewed-by: Cengiz Can <cengiz.can@canonical.com>
Signed-off-by: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
When a sound card is unbound while a PCM stream is open, a
use-after-free can occur in snd_soc_dapm_stream_event(), called from
the close_delayed_work workqueue handler.
During unbind, snd_soc_unbind_card() flushes delayed work and then
calls soc_cleanup_card_resources(). Inside cleanup,
snd_card_disconnect_sync() releases all PCM file descriptors, and
the resulting PCM close path can call snd_soc_dapm_stream_stop()
which schedules new delayed work with a pmdown_time timer delay.
Since this happens after the flush in snd_soc_unbind_card(), the
new work is not caught. soc_remove_link_components() then frees
DAPM widgets before this work fires, leading to the use-after-free.
The existing flush in soc_free_pcm_runtime() also cannot help as it
runs after soc_remove_link_components() has already freed the widgets.
Add a flush in soc_cleanup_card_resources() after
snd_card_disconnect_sync() (after which no new PCM closes can
schedule further delayed work) and before soc_remove_link_dais()
and soc_remove_link_components() (which tear down the structures the
delayed work accesses).
Fixes: e894efef9ac7 ("ASoC: core: add support to card rebind")
Signed-off-by: Matteo Cotifava <cotifavamatteo@gmail.com>
Link: https://patch.msgid.link/20260309215412.545628-3-cotifavamatteo@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The delayed_work_pending() check before flush_delayed_work() in
soc_free_pcm_runtime() is unnecessary and racy. flush_delayed_work()
is safe to call unconditionally - it is a no-op when no work is
pending. Remove the check.
The original check was added by commit 9c9b65203492 ("ASoC: core:
only flush inited work during free") but delayed_work_pending()
followed by flush_delayed_work() has a time-of-check/time-of-use
window where work can become pending between the two calls.
Fixes: 9c9b65203492 ("ASoC: core: only flush inited work during free")
Signed-off-by: Matteo Cotifava <cotifavamatteo@gmail.com>
Link: https://patch.msgid.link/20260309215412.545628-2-cotifavamatteo@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Dual LVDS output (available on the SN65DSI84) requires HSYNC_PULSE_WIDTH
and HORIZONTAL_BACK_PORCH to be divided by two with respect to the values
used for single LVDS output.
While not clearly stated in the datasheet, this is needed according to the
DSI Tuner [0] output. It also makes sense intuitively because in dual LVDS
output two pixels at a time are output and so the output clock is half of
the pixel clock.
Some dual-LVDS panels refuse to show any picture without this fix.
Divide by two HORIZONTAL_FRONT_PORCH too, even though this register is used
only for test pattern generation which is not currently implemented by this
driver.
[0] https://www.ti.com/tool/DSI-TUNER
Fixes: ceb515ba29ba ("drm/bridge: ti-sn65dsi83: Add TI SN65DSI83 and SN65DSI84 driver")
Cc: stable@vger.kernel.org
Reviewed-by: Marek Vasut <marek.vasut@mailbox.org>
Link: https://patch.msgid.link/20260226-ti-sn65dsi83-dual-lvds-fixes-and-test-pattern-v1-2-2e15f5a9a6a0@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
|
|
The DSI frequency must be in the range:
(CHA_DSI_CLK_RANGE * 5 MHz) <= DSI freq < ((CHA_DSI_CLK_RANGE + 1) * 5 MHz)
So the register value should point to the lower range value, but
DIV_ROUND_UP() rounds the division to the higher range value, resulting in
an excess of 1 (unless the frequency is an exact multiple of 5 MHz).
For example for a 437100000 MHz clock CHA_DSI_CLK_RANGE should be 87 (0x57):
(87 * 5 = 435) <= 437.1 < (88 * 5 = 440)
but current code returns 88 (0x58).
Fix the computation by removing the DIV_ROUND_UP().
Fixes: ceb515ba29ba ("drm/bridge: ti-sn65dsi83: Add TI SN65DSI83 and SN65DSI84 driver")
Cc: stable@vger.kernel.org
Reviewed-by: Marek Vasut <marek.vasut@mailbox.org>
Link: https://patch.msgid.link/20260226-ti-sn65dsi83-dual-lvds-fixes-and-test-pattern-v1-1-2e15f5a9a6a0@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
|
|
aesbs_setkey() and aesbs_cbc_ctr_setkey() allocate struct crypto_aes_ctx
on the stack. On arm64, the kernel-mode NEON context is also stored on
the stack, causing the combined frame size to exceed 1024 bytes and
triggering -Wframe-larger-than= warnings.
Allocate struct crypto_aes_ctx on the heap instead and use
kfree_sensitive() to ensure the key material is zeroed on free.
Use a goto-based cleanup path to ensure kfree_sensitive() is always
called.
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Fixes: 4fa617cc6851 ("arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack")
Link: https://lore.kernel.org/r/20260306064254.2079274-1-yphbchou0911@gmail.com
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Fix a warning for:
$ ./scripts/kconfig/merge_config.sh .config extra.config
Using .config as base
Merging extra.config
./scripts/kconfig/merge_config.sh: 384: [: false: unexpected operator
The shellcheck report is also attached:
if [ "$STRICT" == "true" ] && [ "$STRICT_MODE_VIOLATED" == "true" ]; then
^-- SC3014 (warning): In POSIX sh, == in place of = is undefined.
^-- SC3014 (warning): In POSIX sh, == in place of = is undefined.
Fixes: dfc97e1c5da5 ("scripts: kconfig: merge_config.sh: use awk in checks too")
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
Reviewed-by: Mikko Rapeli <mikko.rapeli@linaro.org>
Link: https://patch.msgid.link/20260309121505.40454-1-o451686892@gmail.com
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
|
|
The makefile tries to delete a file named ".builtin-dtb.S" but the file
created by scripts/Makefile.vmlinux is actually called ".builtin-dtbs.S".
Fixes: 654102df2ac2a ("kbuild: add generic support for built-in boot DTBs")
Cc: stable@vger.kernel.org
Signed-off-by: Charles Mirabile <cmirabil@redhat.com>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Link: https://patch.msgid.link/20260308044338.181403-1-cmirabil@redhat.com
[nathan: Small commit message adjustments]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
|
|
Since the caller, __io_uring_run_bpf_filters(), doesn't prevent
migration, it should use the migration disabling variant for running
the BPF program.
Fixes: d42eb05e60fe ("io_uring: add support for BPF filtering for opcode restrictions")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The driver uses devm_spi_register_controller() for registration, which
automatically unregisters the controller via devm cleanup when the
device is removed. The manual call to spi_unregister_controller() in
the remove() callback can lead to a double-free.
And to make sure controller is unregistered before DMA buffer is
unmapped, switch to use spi_register_controller() in probe().
Fixes: 8011709906d0 ("spi: rockchip-sfc: Support pm ops")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Link: https://patch.msgid.link/20260310-sfc-v2-1-67fab04b097f@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The wacom_intuos_bt_irq() function processes Bluetooth HID reports
without sufficient bounds checking. A maliciously crafted short report
can trigger an out-of-bounds read when copying data into the wacom
structure.
Specifically, report 0x03 requires at least 22 bytes to safely read
the processed data and battery status, while report 0x04 (which
falls through to 0x03) requires 32 bytes.
Add explicit length checks for these report IDs and log a warning if
a short report is received.
Signed-off-by: Benoît Sevens <bsevens@google.com>
Reviewed-by: Jason Gerecke <jason.gerecke@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux
Pull cpupower utility updates for 7.0-rc4 from Shuah Khan:
"linux-cpupower-7.0-rc4
- Adds support for setting EPP via systemd service
- Fixes swapped power/energy unit labels
- Adds intel_pstate turbo boost support for Intel platforms"
* tag 'linux-cpupower-7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
cpupower: Add intel_pstate turbo boost support for Intel platforms
cpupower: Add support for setting EPP via systemd service
cpupower: fix swapped power/energy unit labels
|
|
Commit 60f3ada4174f ("kunit: Add --list_suites to show suites") introduced
the --list_suites option to kunit.py, but the update to the corresponding
run_wrapper documentation was omitted.
Add the missing description for --list_suites to keep the documentation in
sync with the tool's supported arguments.
Fixes: 60f3ada4174f ("kunit: Add --list_suites to show suites")
Signed-off-by: Ryota Sakamoto <sakamo.ryota@gmail.com>
Reviewed-by: David Gow <david@davidgow.net>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
scx_enable() uses double-checked locking to lazily initialize a static
kthread_worker pointer. The fast path reads helper locklessly:
if (!READ_ONCE(helper)) { // lockless read -- no helper_mutex
The write side initializes helper under helper_mutex, but previously
used a plain assignment:
helper = kthread_run_worker(0, "scx_enable_helper");
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
plain write -- KCSAN data race with READ_ONCE() above
Since READ_ONCE() on the fast path and the plain write on the
initialization path access the same variable without a common lock,
they constitute a data race. KCSAN requires that all sides of a
lock-free access use READ_ONCE()/WRITE_ONCE() consistently.
Use a temporary variable to stage the result of kthread_run_worker(),
and only WRITE_ONCE() into helper after confirming the pointer is
valid. This avoids a window where a concurrent caller on the fast path
could observe an ERR pointer via READ_ONCE(helper) before the error
check completes.
Fixes: b06ccbabe250 ("sched_ext: Fix starvation of scx_enable() under fair-class saturation")
Signed-off-by: zhidao su <suzhidao@xiaomi.com>
Acked-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
The insn state is getting saved on the stack twice for each recursive
iteration. No need for that, once is enough.
Fixes the following reported stack overflow:
drivers/scsi/qla2xxx/qla_dbg.o: error: SIGSEGV: objtool stack overflow!
Segmentation fault
Fixes: 70589843b36f ("objtool: Add option to trace function validation")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Closes: https://lore.kernel.org/90956545-2066-46e3-b547-10c884582eb0@app.fastmail.com
Link: https://patch.msgid.link/8b97f62d083457f3b0a29a424275f7957dd3372f.1772821683.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
For no apparent reason (possibly related to CONFIG_KMSAN), Clang can
randomly pass the value of RSP to other registers and then back again to
RSP. Handle that accordingly.
Fixes the following warnings:
drivers/input/misc/uinput.o: warning: objtool: uinput_str_to_user+0x165: undefined stack state
drivers/input/misc/uinput.o: warning: objtool: uinput_str_to_user+0x165: unknown CFA base reg -1
Reported-by: Arnd Bergmann <arnd@arndb.de>
Closes: https://lore.kernel.org/90956545-2066-46e3-b547-10c884582eb0@app.fastmail.com
Link: https://patch.msgid.link/240e6a172cc73292499334a3724d02ccb3247fc7.1772818491.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
The actual code is right, but the comment is the wrong way around.
Fixes: ed82f35b926b ("io_uring: allow registration of per-task restrictions")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Dingisoul with KASAN reports a use after free if device_add() fails in
nd_async_device_register().
Commit b6eae0f61db2 ("libnvdimm: Hold reference on parent while
scheduling async init") correctly added a reference on the parent device
to be held until asynchronous initialization was complete. However, if
device_add() results in an allocation failure the ref count of the
device drops to 0 prior to the parent pointer being accessed. Thus
resulting in use after free.
The bug bot AI correctly identified the fix. Save a reference to the
parent pointer to be used to drop the parent reference regardless of the
outcome of device_add().
Reported-by: Dingisoul <dingiso.kernel@gmail.com>
Closes: http://lore.kernel.org/8855544b-be9e-4153-aa55-0bc328b13733@gmail.com
Fixes: b6eae0f61db2 ("libnvdimm: Hold reference on parent while scheduling async init")
Cc: stable@vger.kernel.org
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20260306-fix-uaf-async-init-v1-1-a28fd7526723@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
|
|
Tegra238 platforms use different clock rates for plla and
plla_out0 clocks. Add Tegra238 support in the Tegra
sound card driver to apply specific clock configurations.
Signed-off-by: Aditya Bavanari <abavanari@nvidia.com>
Signed-off-by: Sheetal <sheetal@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260303100249.3214529-3-sheetal@nvidia.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
In iptfs_reassem_cont(), IP-TFS attempts to append data to the new inner
packet 'newskb' that is being reassembled. First a zero-copy approach is
tried if it succeeds then newskb becomes non-linear.
When a subsequent fragment in the same datagram does not meet the
fast-path conditions, a memory copy is performed. It calls skb_put() to
append the data and as newskb is non-linear it triggers
SKB_LINEAR_ASSERT check.
Oops: invalid opcode: 0000 [#1] SMP NOPTI
[...]
RIP: 0010:skb_put+0x3c/0x40
[...]
Call Trace:
<IRQ>
iptfs_reassem_cont+0x1ab/0x5e0 [xfrm_iptfs]
iptfs_input_ordered+0x2af/0x380 [xfrm_iptfs]
iptfs_input+0x122/0x3e0 [xfrm_iptfs]
xfrm_input+0x91e/0x1a50
xfrm4_esp_rcv+0x3a/0x110
ip_protocol_deliver_rcu+0x1d7/0x1f0
ip_local_deliver_finish+0xbe/0x1e0
__netif_receive_skb_core.constprop.0+0xb56/0x1120
__netif_receive_skb_list_core+0x133/0x2b0
netif_receive_skb_list_internal+0x1ff/0x3f0
napi_complete_done+0x81/0x220
virtnet_poll+0x9d6/0x116e [virtio_net]
__napi_poll.constprop.0+0x2b/0x270
net_rx_action+0x162/0x360
handle_softirqs+0xdc/0x510
__irq_exit_rcu+0xe7/0x110
irq_exit_rcu+0xe/0x20
common_interrupt+0x85/0xa0
</IRQ>
<TASK>
Fix this by checking if the skb is non-linear. If it is, linearize it by
calling skb_linearize(). As the initial allocation of newskb originally
reserved enough tailroom for the entire reassembled packet we do not
need to check if we have enough tailroom or extend it.
Fixes: 5f2b6a909574 ("xfrm: iptfs: add skb-fragment sharing code")
Reported-by: Hao Long <me@imlonghao.com>
Closes: https://lore.kernel.org/netdev/DGRCO9SL0T5U.JTINSHJQ9KPK@imlonghao.com/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
When UBLK_F_NO_AUTO_PART_SCAN is set, GD_SUPPRESS_PART_SCAN is cleared
unconditionally, including for unprivileged daemons. Keep it consistent
with the code block for setting GD_SUPPRESS_PART_SCAN by not clearing
it for unprivileged daemons.
In reality this isn't a problem because ioctl(BLKRRPART) requires
CAP_SYS_ADMIN, but it is more reliable to not clear the bit.
Cc: Alexander Atanasov <alex@zazolabs.com>
Fixes: 8443e2087e70 ("ublk: add UBLK_F_NO_AUTO_PART_SCAN feature flag")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The driver uses devm_dma_request_chan() which registers automatic cleanup
via devm_add_action_or_reset(). Calling dma_release_channel() manually on
the RX channel when TX channel request fails causes a double-free when
the devm cleanup runs.
Remove the unnecessary manual cleanup and simplify the error handling
since devm will properly release channels on probe failure or driver
detach.
Fixes: 34e3815ea459 ("spi: atcspi200: Add ATCSPI200 SPI controller driver")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Link: https://patch.msgid.link/20260305-atcspi2000-v1-1-eafe08dcca60@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Fix three bugs in aml_sfc_dma_buffer_setup() error paths:
1. Unnecessary goto: When the first DMA mapping (sfc->daddr) fails,
nothing needs cleanup. Use direct return instead of goto.
2. Double-unmap bug: When info DMA mapping failed, the code would
unmap sfc->daddr inline, then fall through to out_map_data which
would unmap it again, causing a double-unmap.
3. Wrong unmap size: The out_map_info label used datalen instead of
infolen when unmapping sfc->iaddr, which could lead to incorrect
DMA sync behavior.
Fixes: 4670db6f32e9 ("spi: amlogic: add driver for Amlogic SPI Flash Controller")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Link: https://patch.msgid.link/20260306-spifc-a4-v1-1-f22c9965f64a@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Jarkko does now work for Intel anymore and since I'm currently
maintaining this driver, update my contact information here to make sure
patches get Cc'd to me as well.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> (internally)
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
|
|
device_property_read_foo() returns 0 on success and only then modifies
'val'. Currently, val is left uninitialized if the aforementioned
function returns non-zero, making nhi_wake_supported() return true
almost always (random != 0) if the property is not present in device
firmware.
Invert the check to make it make sense.
Fixes: 3cdb9446a117 ("thunderbolt: Add support for Intel Ice Lake")
Cc: stable@vger.kernel.org
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
|
|
When `ceph_process_folio_batch` encounters a folio past the end of the
current object, it should leave it in the batch so that it is picked up
in the next iteration.
Removing the folio from the batch means that it does not get written
back and remains dirty instead. This makes `fsync()` silently skip some
of the data, delays capability release, and breaks coherence with
`O_DIRECT`.
The link below contains instructions for reproducing the bug.
Cc: stable@vger.kernel.org
Fixes: ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method")
Link: https://tracker.ceph.com/issues/75156
Signed-off-by: Hristo Venev <hristo@venev.name>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Add __putname() calls to error code paths that did not free the "path"
pointer obtained by __getname(). If ownership of this pointer is not
passed to the caller via path_info.path, the function must free it
before returning.
Cc: stable@vger.kernel.org
Fixes: 3fd945a79e14 ("ceph: encode encrypted name in ceph_mdsc_build_path and dentry release")
Fixes: 550f7ca98ee0 ("ceph: give up on paths longer than PATH_MAX")
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
ceph_mdsc_build_path() must be called with a zero-initialized
ceph_path_info parameter, or else the following
ceph_mdsc_free_path_info() may crash.
Example crash (on Linux 6.18.12):
virt_to_cache: Object is not a Slab page!
WARNING: CPU: 184 PID: 2871736 at mm/slub.c:6732 kmem_cache_free+0x316/0x400
[...]
Call Trace:
[...]
ceph_open+0x13d/0x3e0
do_dentry_open+0x134/0x480
vfs_open+0x2a/0xe0
path_openat+0x9a3/0x1160
[...]
cache_from_obj: Wrong slab cache. names_cache but object is from ceph_inode_info
WARNING: CPU: 184 PID: 2871736 at mm/slub.c:6746 kmem_cache_free+0x2dd/0x400
[...]
kernel BUG at mm/slub.c:634!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
RIP: 0010:__slab_free+0x1a4/0x350
Some of the ceph_mdsc_build_path() callers had initializers, but
others had not, even though they were all added by commit 15f519e9f883
("ceph: fix race condition validating r_parent before applying state").
The ones without initializer are suspectible to random crashes. (I can
imagine it could even be possible to exploit this bug to elevate
privileges.)
Unfortunately, these Ceph functions are undocumented and its semantics
can only be derived from the code. I see that ceph_mdsc_build_path()
initializes the structure only on success, but not on error.
Calling ceph_mdsc_free_path_info() after a failed
ceph_mdsc_build_path() call does not even make sense, but that's what
all callers do, and for it to be safe, the structure must be
zero-initialized. The least intrusive approach to fix this is
therefore to add initializers everywhere.
Cc: stable@vger.kernel.org
Fixes: 15f519e9f883 ("ceph: fix race condition validating r_parent before applying state")
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
During async unlink, we drop the `i_nlink` counter before we receive
the completion (that will eventually update the `i_nlink`) because "we
assume that the unlink will succeed". That is not a bad idea, but it
races against deletions by other clients (or against the completion of
our own unlink) and can lead to an underrun which emits a WARNING like
this one:
WARNING: CPU: 85 PID: 25093 at fs/inode.c:407 drop_nlink+0x50/0x68
Modules linked in:
CPU: 85 UID: 3221252029 PID: 25093 Comm: php-cgi8.1 Not tainted 6.14.11-cm4all1-ampere #655
Hardware name: Supermicro ARS-110M-NR/R12SPD-A, BIOS 1.1b 10/17/2023
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : drop_nlink+0x50/0x68
lr : ceph_unlink+0x6c4/0x720
sp : ffff80012173bc90
x29: ffff80012173bc90 x28: ffff086d0a45aaf8 x27: ffff0871d0eb5680
x26: ffff087f2a64a718 x25: 0000020000000180 x24: 0000000061c88647
x23: 0000000000000002 x22: ffff07ff9236d800 x21: 0000000000001203
x20: ffff07ff9237b000 x19: ffff088b8296afc0 x18: 00000000f3c93365
x17: 0000000000070000 x16: ffff08faffcbdfe8 x15: ffff08faffcbdfec
x14: 0000000000000000 x13: 45445f65645f3037 x12: 34385f6369706f74
x11: 0000a2653104bb20 x10: ffffd85f26d73290 x9 : ffffd85f25664f94
x8 : 00000000000000c0 x7 : 0000000000000000 x6 : 0000000000000002
x5 : 0000000000000081 x4 : 0000000000000481 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff08727d3f91e8
Call trace:
drop_nlink+0x50/0x68 (P)
vfs_unlink+0xb0/0x2e8
do_unlinkat+0x204/0x288
__arm64_sys_unlinkat+0x3c/0x80
invoke_syscall.constprop.0+0x54/0xe8
do_el0_svc+0xa4/0xc8
el0_svc+0x18/0x58
el0t_64_sync_handler+0x104/0x130
el0t_64_sync+0x154/0x158
In ceph_unlink(), a call to ceph_mdsc_submit_request() submits the
CEPH_MDS_OP_UNLINK to the MDS, but does not wait for completion.
Meanwhile, between this call and the following drop_nlink() call, a
worker thread may process a CEPH_CAP_OP_IMPORT, CEPH_CAP_OP_GRANT or
just a CEPH_MSG_CLIENT_REPLY (the latter of which could be our own
completion). These will lead to a set_nlink() call, updating the
`i_nlink` counter to the value received from the MDS. If that new
`i_nlink` value happens to be zero, it is illegal to decrement it
further. But that is exactly what ceph_unlink() will do then.
The WARNING can be reproduced this way:
1. Force async unlink; only the async code path is affected. Having
no real clue about Ceph internals, I was unable to find out why the
MDS wouldn't give me the "Fxr" capabilities, so I patched
get_caps_for_async_unlink() to always succeed.
(Note that the WARNING dump above was found on an unpatched kernel,
without this kludge - this is not a theoretical bug.)
2. Add a sleep call after ceph_mdsc_submit_request() so the unlink
completion gets handled by a worker thread before drop_nlink() is
called. This guarantees that the `i_nlink` is already zero before
drop_nlink() runs.
The solution is to skip the counter decrement when it is already zero,
but doing so without a lock is still racy (TOCTOU). Since
ceph_fill_inode() and handle_cap_grant() both hold the
`ceph_inode_info.i_ceph_lock` spinlock while set_nlink() runs, this
seems like the proper lock to protect the `i_nlink` updates.
I found prior art in NFS and SMB (using `inode.i_lock`) and AFS (using
`afs_vnode.cb_lock`). All three have the zero check as well.
Cc: stable@vger.kernel.org
Fixes: 2ccb45462aea ("ceph: perform asynchronous unlink if we have sufficient caps")
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
At the end of this function, d is the traversal cursor of flist, but the
code completes found instead. This can lead to issues such as NULL pointer
dereferences, double completion, or descriptor leaks.
Fix this by completing d instead of found in the final
list_for_each_entry_safe() loop.
Fixes: aa8d18becc0c ("dmaengine: idxd: add callback support for iaa crypto")
Signed-off-by: Tuo Li <islituo@gmail.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20260106032428.162445-1-islituo@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|