| Age | Commit message (Collapse) | Author |
|
A GCS mapping should not be accessible after mprotect(PROT_NONE). Add a
kselftest for this scenario.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
vm_get_page_prot() short-circuits the protection_map[] lookup for a
VM_SHADOW_STACK mapping since it uses a different PIE index from the
typical read/write/exec permissions. However, the side effect is that it
also ignores mprotect(PROT_NONE) by creating an accessible PTE.
Special-case the !(vm_flags & VM_ACCESS_FLAGS) flags to use the
protection_map[VM_NONE] permissions instead. No GCS attributes are
required for an inaccessible PTE.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 6497b66ba694 ("arm64/mm: Map pages for guarded control stack")
Cc: stable@vger.kernel.org
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When FEAT_LPA2 is enabled, bits 8-9 of the PTE replace the
shareability attribute with bits 50-51 of the output address. The
_PAGE_GCS{,_RO} definitions include the PTE_SHARED bits as 0b11 (this
matches the other _PAGE_* definitions) but using this macro directly
leads to the following panic when enabling GCS on a system/model with
LPA2:
Unable to handle kernel paging request at virtual address fffff1ffc32d8008
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000060f4d000
[fffff1ffc32d8008] pgd=100000006184b003, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] SMP
CPU: 0 UID: 0 PID: 513 Comm: gcs_write_fault Tainted: G M 7.0.0-rc1 #1 PREEMPT
Tainted: [M]=MACHINE_CHECK
Hardware name: QEMU QEMU Virtual Machine, BIOS 2025.02-8+deb13u1 11/08/2025
pstate: 03402005 (nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : zap_huge_pmd+0x168/0x468
lr : zap_huge_pmd+0x2c/0x468
sp : ffff800080beb660
x29: ffff800080beb660 x28: fff00000c2058180 x27: ffff800080beb898
x26: fff00000c2058180 x25: ffff800080beb820 x24: 00c800010b600f41
x23: ffffc1ffc30af1a8 x22: fff00000c2058180 x21: 0000ffff8dc00000
x20: fff00000c2bc6370 x19: ffff800080beb898 x18: ffff800080bebb60
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000007
x14: 000000000000000a x13: 0000aaaacbbbffff x12: 0000000000000000
x11: 0000ffff8ddfffff x10: 00000000000001fe x9 : 0000ffff8ddfffff
x8 : 0000ffff8de00000 x7 : 0000ffff8da00000 x6 : fff00000c2bc6370
x5 : 0000ffff8da00000 x4 : 000000010b600000 x3 : ffffc1ffc0000000
x2 : fff00000c2058180 x1 : fffff1ffc32d8000 x0 : 000000c00010b600
Call trace:
zap_huge_pmd+0x168/0x468 (P)
unmap_page_range+0xd70/0x1560
unmap_single_vma+0x48/0x80
unmap_vmas+0x90/0x180
unmap_region+0x88/0xe4
vms_complete_munmap_vmas+0xf8/0x1e0
do_vmi_align_munmap+0x158/0x180
do_vmi_munmap+0xac/0x160
__vm_munmap+0xb0/0x138
vm_munmap+0x14/0x20
gcs_free+0x70/0x80
mm_release+0x1c/0xc8
exit_mm_release+0x28/0x38
do_exit+0x190/0x8ec
do_group_exit+0x34/0x90
get_signal+0x794/0x858
arch_do_signal_or_restart+0x11c/0x3e0
exit_to_user_mode_loop+0x10c/0x17c
el0_da+0x8c/0x9c
el0t_64_sync_handler+0xd0/0xf0
el0t_64_sync+0x198/0x19c
Code: aa1603e2 d34cfc00 cb813001 8b011861 (f9400420)
Similarly to how the kernel handles protection_map[], use a
gcs_page_prot variable to store the protection bits and clear PTE_SHARED
if LPA2 is enabled.
Also remove the unused PAGE_GCS{,_RO} macros.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 6497b66ba694 ("arm64/mm: Map pages for guarded control stack")
Reported-by: Emanuele Rocca <emanuele.rocca@arm.com>
Cc: stable@vger.kernel.org
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The only caller of ioremap_prot() outside of the generic ioremap()
implementation is generic_access_phys(), which passes a 'pgprot_t' value
determined from the user mapping of the target 'pfn' being accessed by
the kernel. On arm64, the 'pgprot_t' contains all of the non-address
bits from the pte, including the permission controls, and so we end up
returning a new user mapping from ioremap_prot() which faults when
accessed from the kernel on systems with PAN:
| Unable to handle kernel read from unreadable memory at virtual address ffff80008ea89000
| ...
| Call trace:
| __memcpy_fromio+0x80/0xf8
| generic_access_phys+0x20c/0x2b8
| __access_remote_vm+0x46c/0x5b8
| access_remote_vm+0x18/0x30
| environ_read+0x238/0x3e8
| vfs_read+0xe4/0x2b0
| ksys_read+0xcc/0x178
| __arm64_sys_read+0x4c/0x68
Extract only the memory type from the user 'pgprot_t' in ioremap_prot()
and assert that we're being passed a user mapping, to protect us against
any changes in future that may require additional handling. To avoid
falsely flagging users of ioremap(), provide our own ioremap() macro
which simply wraps __ioremap_prot().
Cc: Zeng Heng <zengheng4@huawei.com>
Cc: Jinjiang Tu <tujinjiang@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 893dea9ccd08 ("arm64: Add HAVE_IOREMAP_PROT support")
Reported-by: Jinjiang Tu <tujinjiang@huawei.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Rename our ioremap_prot() implementation to __ioremap_prot() and convert
all arch-internal callers over to the new function.
ioremap_prot() remains as a #define to __ioremap_prot() for
generic_access_phys() and will be subsequently extended to handle user
permissions in 'prot'.
Cc: Zeng Heng <zengheng4@huawei.com>
Cc: Jinjiang Tu <tujinjiang@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Panther Lake systems introduced an autonomous power gating feature for
the integrated Gigabit Ethernet in shutdown state (S5) state. As part of
it, the reset value of DPG_EN bit was changed to 1. Clear this bit after
performing hardware reset to avoid errors such as Tx/Rx hangs, or packet
loss/corruption.
Fixes: 0c9183ce61bc ("e1000e: Add support for the next LOM generation")
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add new board type for Panther Lake devices for separating device-specific
features and flows.
Additionally, remove the deprecated device IDs 0x57B5 and 0x57B6, which
are not used by any existing devices.
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
It may happen that VF spawned for E610 adapter has problem with setting
link up. This happens when ixgbevf supporting mailbox API 1.6 cooperates
with PF driver which doesn't support this version of API, and hence
doesn't support new approach for getting PF link data.
In that case VF asks PF to provide link data but as PF doesn't support
it, returns -EOPNOTSUPP what leads to early bail from link configuration
sequence.
Avoid such situation by using legacy VFLINKS approach whenever negotiated
API version is less than 1.6.
To reproduce the issue just create VF and set its link up - adapter must
be any from the E610 family, ixgbevf must support API 1.6 or higher while
ixgbevf must not.
Fixes: 53f0eb62b4d2 ("ixgbevf: fix getting link speed data for E610 devices")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: stable@vger.kernel.org
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Using get_cpu() in the tracepoint assignment causes an obvious preempt
count leak because nothing invokes put_cpu() to undo it:
softirq: huh, entered softirq 3 NET_RX with preempt_count 00000100, exited with 00000101?
This clearly has seen a lot of testing in the last 3+ years...
Use smp_processor_id() instead.
Fixes: 6d4d584a7ea8 ("i40e: Add i40e_napi_poll tracepoint")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Since the conversion of ice to page pool, the ethtool loopback test
crashes:
BUG: kernel NULL pointer dereference, address: 000000000000000c
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1100f1067 P4D 0
Oops: Oops: 0002 [#1] SMP NOPTI
CPU: 23 UID: 0 PID: 5904 Comm: ethtool Kdump: loaded Not tainted 6.19.0-0.rc7.260128g1f97d9dcf5364.49.eln154.x86_64 #1 PREEMPT(lazy)
Hardware name: [...]
RIP: 0010:ice_alloc_rx_bufs+0x1cd/0x310 [ice]
Code: 83 6c 24 30 01 66 41 89 47 08 0f 84 c0 00 00 00 41 0f b7 dc 48 8b 44 24 18 48 c1 e3 04 41 bb 00 10 00 00 48 8d 2c 18 8b 04 24 <89> 45 0c 41 8b 4d 00 49 d3 e3 44 3b 5c 24 24 0f 83 ac fe ff ff 44
RSP: 0018:ff7894738aa1f768 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000700 RDI: 0000000000000000
RBP: 0000000000000000 R08: ff16dcae79880200 R09: 0000000000000019
R10: 0000000000000001 R11: 0000000000001000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ff16dcae6c670000
FS: 00007fcf428850c0(0000) GS:ff16dcb149710000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000000c CR3: 0000000121227005 CR4: 0000000000773ef0
PKRU: 55555554
Call Trace:
<TASK>
ice_vsi_cfg_rxq+0xca/0x460 [ice]
ice_vsi_cfg_rxqs+0x54/0x70 [ice]
ice_loopback_test+0xa9/0x520 [ice]
ice_self_test+0x1b9/0x280 [ice]
ethtool_self_test+0xe5/0x200
__dev_ethtool+0x1106/0x1a90
dev_ethtool+0xbe/0x1a0
dev_ioctl+0x258/0x4c0
sock_do_ioctl+0xe3/0x130
__x64_sys_ioctl+0xb9/0x100
do_syscall_64+0x7c/0x700
entry_SYSCALL_64_after_hwframe+0x76/0x7e
[...]
It crashes because we have not initialized libeth for the rx ring.
Fix it by treating ICE_VSI_LB VSIs slightly more like normal PF VSIs and
letting them have a q_vector. It's just a dummy, because the loopback
test does not use interrupts, but it contains a napi struct that can be
passed to libeth_rx_fq_create() called from ice_vsi_cfg_rxq() ->
ice_rxq_pp_create().
Fixes: 93f53db9f9dc ("ice: switch to Page Pool")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix IRDMA hardware initialization timeout (-110) after resume by
separating VSI-dependent configuration from RDMA resource allocation,
ensuring VSI is rebuilt before IRDMA accesses it.
After resume from suspend, IRDMA hardware initialization fails:
ice: IRDMA hardware initialization FAILED init_state=4 status=-110
Separate RDMA initialization into two phases:
1. ice_init_rdma() - Allocate resources only (no VSI/QoS access, no plug)
2. ice_rdma_finalize_setup() - Assign VSI/QoS info and plug device
This allows:
- ice_init_rdma() to stay in ice_resume() (mirrors ice_deinit_rdma()
in ice_suspend())
- VSI assignment deferred until after ice_vsi_rebuild() completes
- QoS info updated after ice_dcb_rebuild() completes
- Device plugged only when control queues, VSI, and DCB are all ready
Fixes: bc69ad74867db ("ice: avoid IRQ collision to fix init failure on ACPI S3 resume")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
When deleting a flow rule using "ethtool -N <dev> delete <location>",
idpf_sideband_action_ena() incorrectly validates fsp->ring_cookie even
though ethtool doesn't populate this field for delete operations. The
uninitialized ring_cookie may randomly match RX_CLS_FLOW_DISC or
RX_CLS_FLOW_WAKE, causing validation to fail and preventing legitimate
rule deletions. Remove the unnecessary sideband action enable check and
ring_cookie validation during delete operations since action validation
is not required when removing existing rules.
Fixes: ada3e24b84a0 ("idpf: add flow steering support")
Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The code uses the vidx for the IRQ name but that doesn't match ethtool
reporting nor netdev naming, this makes it hard to tune the device and
associate queues with IRQs. Sequentially requesting irqs starting from
'0' makes the output consistent.
This commit changes the interrupt numbering but preserves the name
format, maintaining ABI compatibility. Existing tools relying on the old
numbering are already non-functional, as they lack a useful correlation
to the interrupts.
Before:
ethtool -L eth1 tx 1 combined 3
grep . /proc/irq/*/*idpf*/../smp_affinity_list
/proc/irq/67/idpf-Mailbox-0/../smp_affinity_list:0-55,112-167
/proc/irq/68/idpf-eth1-TxRx-1/../smp_affinity_list:0
/proc/irq/70/idpf-eth1-TxRx-3/../smp_affinity_list:1
/proc/irq/71/idpf-eth1-TxRx-4/../smp_affinity_list:2
/proc/irq/72/idpf-eth1-Tx-5/../smp_affinity_list:3
ethtool -S eth1 | grep -v ': 0'
NIC statistics:
tx_q-0_pkts: 1002
tx_q-1_pkts: 2679
tx_q-2_pkts: 1113
tx_q-3_pkts: 1192 <----- tx_q-3 vs idpf-eth1-Tx-5
rx_q-0_pkts: 1143
rx_q-1_pkts: 3172
rx_q-2_pkts: 1074
After:
ethtool -L eth1 tx 1 combined 3
grep . /proc/irq/*/*idpf*/../smp_affinity_list
/proc/irq/67/idpf-Mailbox-0/../smp_affinity_list:0-55,112-167
/proc/irq/68/idpf-eth1-TxRx-0/../smp_affinity_list:0
/proc/irq/70/idpf-eth1-TxRx-1/../smp_affinity_list:1
/proc/irq/71/idpf-eth1-TxRx-2/../smp_affinity_list:2
/proc/irq/72/idpf-eth1-Tx-3/../smp_affinity_list:3
ethtool -S eth1 | grep -v ': 0'
NIC statistics:
tx_q-0_pkts: 118
tx_q-1_pkts: 134
tx_q-2_pkts: 228
tx_q-3_pkts: 138 <--- tx_q-3 matches idpf-eth1-Tx-3
rx_q-0_pkts: 111
rx_q-1_pkts: 366
rx_q-2_pkts: 120
Fixes: d4d558718266 ("idpf: initialize interrupts and enable vport")
Signed-off-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
rss_data->rss_key needs to be nullified after it is freed.
Checks like "if (!rss_data->rss_key)" in the code could fail
if it is not nullified.
Tested: built and booted the kernel.
Fixes: 83f38f210b85 ("idpf: Fix RSS LUT NULL pointer crash on early ethtool operations")
Signed-off-by: Li Li <boolli@google.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In idpf_txq_group_alloc(), if any txq group's txqs failed to
allocate memory:
for (j = 0; j < tx_qgrp->num_txq; j++) {
tx_qgrp->txqs[j] = kzalloc(sizeof(*tx_qgrp->txqs[j]),
GFP_KERNEL);
if (!tx_qgrp->txqs[j])
goto err_alloc;
}
It would cause a NULL ptr kernel panic in idpf_txq_group_rel():
for (j = 0; j < txq_grp->num_txq; j++) {
if (flow_sch_en) {
kfree(txq_grp->txqs[j]->refillq);
txq_grp->txqs[j]->refillq = NULL;
}
kfree(txq_grp->txqs[j]);
txq_grp->txqs[j] = NULL;
}
[ 6.532461] BUG: kernel NULL pointer dereference, address: 0000000000000058
...
[ 6.534433] RIP: 0010:idpf_txq_group_rel+0xc9/0x110
...
[ 6.538513] Call Trace:
[ 6.538639] <TASK>
[ 6.538760] idpf_vport_queues_alloc+0x75/0x550
[ 6.538978] idpf_vport_open+0x4d/0x3f0
[ 6.539164] idpf_open+0x71/0xb0
[ 6.539324] __dev_open+0x142/0x260
[ 6.539506] netif_open+0x2f/0xe0
[ 6.539670] dev_open+0x3d/0x70
[ 6.539827] bond_enslave+0x5ed/0xf50
[ 6.540005] ? rcutree_enqueue+0x1f/0xb0
[ 6.540193] ? call_rcu+0xde/0x2a0
[ 6.540375] ? barn_get_empty_sheaf+0x5c/0x80
[ 6.540594] ? __kfree_rcu_sheaf+0xb6/0x1a0
[ 6.540793] ? nla_put_ifalias+0x3d/0x90
[ 6.540981] ? kvfree_call_rcu+0xb5/0x3b0
[ 6.541173] ? kvfree_call_rcu+0xb5/0x3b0
[ 6.541365] do_set_master+0x114/0x160
[ 6.541547] do_setlink+0x412/0xfb0
[ 6.541717] ? security_sock_rcv_skb+0x2a/0x50
[ 6.541931] ? sk_filter_trim_cap+0x7c/0x320
[ 6.542136] ? skb_queue_tail+0x20/0x50
[ 6.542322] ? __nla_validate_parse+0x92/0xe50
ro[o t t o6 .d5e4f2a5u4l0t]- ? security_capable+0x35/0x60
[ 6.542792] rtnl_newlink+0x95c/0xa00
[ 6.542972] ? __rtnl_unlock+0x37/0x70
[ 6.543152] ? netdev_run_todo+0x63/0x530
[ 6.543343] ? allocate_slab+0x280/0x870
[ 6.543531] ? security_capable+0x35/0x60
[ 6.543722] rtnetlink_rcv_msg+0x2e6/0x340
[ 6.543918] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[ 6.544138] netlink_rcv_skb+0x16a/0x1a0
[ 6.544328] netlink_unicast+0x20a/0x320
[ 6.544516] netlink_sendmsg+0x304/0x3b0
[ 6.544748] __sock_sendmsg+0x89/0xb0
[ 6.544928] ____sys_sendmsg+0x167/0x1c0
[ 6.545116] ? ____sys_recvmsg+0xed/0x150
[ 6.545308] ___sys_sendmsg+0xdd/0x120
[ 6.545489] ? ___sys_recvmsg+0x124/0x1e0
[ 6.545680] ? rcutree_enqueue+0x1f/0xb0
[ 6.545867] ? rcutree_enqueue+0x1f/0xb0
[ 6.546055] ? call_rcu+0xde/0x2a0
[ 6.546222] ? evict+0x286/0x2d0
[ 6.546389] ? rcutree_enqueue+0x1f/0xb0
[ 6.546577] ? kmem_cache_free+0x2c/0x350
[ 6.546784] __x64_sys_sendmsg+0x72/0xc0
[ 6.546972] do_syscall_64+0x6f/0x890
[ 6.547150] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 6.547393] RIP: 0033:0x7fc1a3347bd0
...
[ 6.551375] RIP: 0010:idpf_txq_group_rel+0xc9/0x110
...
[ 6.578856] Rebooting in 10 seconds..
We should skip deallocating txqs[j] if it is NULL in the first place.
Tested: with this patch, the kernel panic no longer appears.
Fixes: 1c325aac10a8 ("idpf: configure resources for TX queues")
Signed-off-by: Li Li <boolli@google.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In idpf_rxq_group_alloc(), if rx_qgrp->splitq.bufq_sets failed to get
allocated:
rx_qgrp->splitq.bufq_sets = kcalloc(vport->num_bufqs_per_qgrp,
sizeof(struct idpf_bufq_set),
GFP_KERNEL);
if (!rx_qgrp->splitq.bufq_sets) {
err = -ENOMEM;
goto err_alloc;
}
idpf_rxq_group_rel() would attempt to deallocate it in
idpf_rxq_sw_queue_rel(), causing a kernel panic:
```
[ 7.967242] early-network-sshd-n-rexd[3148]: knetbase: Info: [ 8.127804] BUG: kernel NULL pointer dereference, address: 00000000000000c0
...
[ 8.129779] RIP: 0010:idpf_rxq_group_rel+0x101/0x170
...
[ 8.133854] Call Trace:
[ 8.133980] <TASK>
[ 8.134092] idpf_vport_queues_alloc+0x286/0x500
[ 8.134313] idpf_vport_open+0x4d/0x3f0
[ 8.134498] idpf_open+0x71/0xb0
[ 8.134668] __dev_open+0x142/0x260
[ 8.134840] netif_open+0x2f/0xe0
[ 8.135004] dev_open+0x3d/0x70
[ 8.135166] bond_enslave+0x5ed/0xf50
[ 8.135345] ? nla_put_ifalias+0x3d/0x90
[ 8.135533] ? kvfree_call_rcu+0xb5/0x3b0
[ 8.135725] ? kvfree_call_rcu+0xb5/0x3b0
[ 8.135916] do_set_master+0x114/0x160
[ 8.136098] do_setlink+0x412/0xfb0
[ 8.136269] ? security_sock_rcv_skb+0x2a/0x50
[ 8.136509] ? sk_filter_trim_cap+0x7c/0x320
[ 8.136714] ? skb_queue_tail+0x20/0x50
[ 8.136899] ? __nla_validate_parse+0x92/0xe50
[ 8.137112] ? security_capable+0x35/0x60
[ 8.137304] rtnl_newlink+0x95c/0xa00
[ 8.137483] ? __rtnl_unlock+0x37/0x70
[ 8.137664] ? netdev_run_todo+0x63/0x530
[ 8.137855] ? allocate_slab+0x280/0x870
[ 8.138044] ? security_capable+0x35/0x60
[ 8.138235] rtnetlink_rcv_msg+0x2e6/0x340
[ 8.138431] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[ 8.138650] netlink_rcv_skb+0x16a/0x1a0
[ 8.138840] netlink_unicast+0x20a/0x320
[ 8.139028] netlink_sendmsg+0x304/0x3b0
[ 8.139217] __sock_sendmsg+0x89/0xb0
[ 8.139399] ____sys_sendmsg+0x167/0x1c0
[ 8.139588] ? ____sys_recvmsg+0xed/0x150
[ 8.139780] ___sys_sendmsg+0xdd/0x120
[ 8.139960] ? ___sys_recvmsg+0x124/0x1e0
[ 8.140152] ? rcutree_enqueue+0x1f/0xb0
[ 8.140341] ? rcutree_enqueue+0x1f/0xb0
[ 8.140528] ? call_rcu+0xde/0x2a0
[ 8.140695] ? evict+0x286/0x2d0
[ 8.140856] ? rcutree_enqueue+0x1f/0xb0
[ 8.141043] ? kmem_cache_free+0x2c/0x350
[ 8.141236] __x64_sys_sendmsg+0x72/0xc0
[ 8.141424] do_syscall_64+0x6f/0x890
[ 8.141603] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 8.141841] RIP: 0033:0x7f2799d21bd0
...
[ 8.149905] Kernel panic - not syncing: Fatal exception
[ 8.175940] Kernel Offset: 0xf800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 8.176425] Rebooting in 10 seconds..
```
Tested: With this patch, the kernel panic no longer appears.
Fixes: 95af467d9a4e ("idpf: configure resources for RX queues")
Signed-off-by: Li Li <boolli@google.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Currently, in idpf_wait_for_sw_marker_completion(), when an
IDPF_TXD_COMPLT_SW_MARKER packet is found, the routine breaks out of
the for loop and does not increment the next_to_clean counter. This
causes the subsequent NAPI polls to run into the same
IDPF_TXD_COMPLT_SW_MARKER packet again and print out the following:
[ 23.261341] idpf 0000:05:00.0 eth1: Unknown TX completion type: 5
Instead, we should increment next_to_clean regardless when an
IDPF_TXD_COMPLT_SW_MARKER packet is found.
Tested: with the patch applied, we do not see the errors above from NAPI
polls anymore.
Fixes: 9d39447051a0 ("idpf: remove SW marker handling from NAPI")
Signed-off-by: Li Li <boolli@google.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
On x86, the HYPERVISOR_CALLBACK_VECTOR is used to receive synthetic
interrupts (SINTs) from the hypervisor for doorbells and intercepts.
There is no such vector reserved for arm64.
On arm64, the hypervisor exposes a synthetic register that can be read
to find the INTID that should be used for SINTs. This INTID is in the
PPI range.
To better unify the code paths, introduce mshv_sint_vector_init() that
either reads the synthetic register and obtains the INTID (arm64) or
just uses HYPERVISOR_CALLBACK_VECTOR as the interrupt vector (x86).
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
Rename mshv_synic_init() to mshv_synic_cpu_init() and
mshv_synic_cleanup() to mshv_synic_cpu_exit() to better reflect that
these functions handle per-cpu synic setup and teardown.
Use mshv_synic_init/cleanup() to perform init/cleanup that is not per-cpu.
Move all the synic related setup from mshv_parent_partition_init.
Move the reboot notifier to mshv_synic.c because it currently only
operates on the synic cpuhp state.
Move out synic_pages from the global mshv_root since its use is now
completely local to mshv_synic.c.
This is in preparation for adding more stuff to mshv_synic_init().
No functional change.
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fixes from Niklas Cassel:
- The newly introduced feature that issues a deferred (non-NCQ) command
from a workqueue, forgot to consider the case where the deferred QC
times out. Fix the code to take timeouts into consideration, which
avoids a use after free (Damien)
- The newly introduced feature that issues a deferred (non-NCQ) command
from a workqueue, when unloading the module, calls cancel_work_sync(),
a function that can sleep, while holding a spin lock. Move the function
call outside the lock (Damien)
* tag 'ata-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-core: fix cancellation of a port deferred qc work
ata: libata-eh: correctly handle deferred qc timeouts
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix an uninitialized variable in file_getattr().
The flags_valid field wasn't initialized before calling
vfs_fileattr_get(), triggering KMSAN uninit-value reports in fuse
- Fix writeback wakeup and logging timeouts when DETECT_HUNG_TASK is
not enabled.
sysctl_hung_task_timeout_secs is 0 in that case causing spurious
"waiting for writeback completion for more than 1 seconds" warnings
- Fix a null-ptr-deref in do_statmount() when the mount is internal
- Add missing kernel-doc description for the @private parameter in
iomap_readahead()
- Fix mount namespace creation to hold namespace_sem across the mount
copy in create_new_namespace().
The previous drop-and-reacquire pattern was fragile and failed to
clean up mount propagation links if the real rootfs was a shared or
dependent mount
- Fix /proc mount iteration where m->index wasn't updated when
m->show() overflows, causing a restart to repeatedly show the same
mount entry in a rapidly expanding mount table
- Return EFSCORRUPTED instead of ENOSPC in minix_new_inode() when the
inode number is out of range
- Fix unshare(2) when CLONE_NEWNS is set and current->fs isn't shared.
copy_mnt_ns() received the live fs_struct so if a subsequent
namespace creation failed the rollback would leave pwd and root
pointing to detached mounts. Always allocate a new fs_struct when
CLONE_NEWNS is requested
- fserror bug fixes:
- Remove the unused fsnotify_sb_error() helper now that all callers
have been converted to fserror_report_metadata
- Fix a lockdep splat in fserror_report() where igrab() takes
inode::i_lock which can be held in IRQ context.
Replace igrab() with a direct i_count bump since filesystems
should not report inodes that are about to be freed or not yet
exposed
- Handle error pointer in procfs for try_lookup_noperm()
- Fix an integer overflow in ep_loop_check_proc() where recursive calls
returning INT_MAX would overflow when +1 is added, breaking the
recursion depth check
- Fix a misleading break in pidfs
* tag 'vfs-7.0-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
pidfs: avoid misleading break
eventpoll: Fix integer overflow in ep_loop_check_proc()
proc: Fix pointer error dereference
fserror: fix lockdep complaint when igrabbing inode
fsnotify: drop unused helper
unshare: fix unshare_fs() handling
minix: Correct errno in minix_new_inode
namespace: fix proc mount iteration
mount: hold namespace_sem across copy in create_new_namespace()
iomap: Describe @private in iomap_readahead()
statmount: Fix the null-ptr-deref in do_statmount()
writeback: Fix wakeup and logging timeouts for !DETECT_HUNG_TASK
fs: init flags_valid before calling vfs_fileattr_get
|
|
mdc800_device_read() submits download_urb and waits for completion.
If the timeout fires and the device has not responded, the function
returns without killing the URB, leaving it active.
A subsequent read() resubmits the same URB while it is still
in-flight, triggering the WARN in usb_submit_urb():
"URB submitted while active"
Check the return value of wait_event_timeout() and kill the URB if
it indicates timeout, ensuring the URB is complete before its status
is inspected or the URB is resubmitted.
Similar to
- commit 372c93131998 ("USB: yurex: fix control-URB timeout handling")
- commit b98d5000c505 ("media: rc: iguanair: handle timeouts")
Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu>
Cc: stable <stable@kernel.org>
Link: https://patch.msgid.link/20260209151937.2247202-1-n7l8m4@u.northwestern.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
If a signal arrives after a read has partially completed,
we need to return the number of bytes read. -EINTR is correct
only if that number is zero.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Cc: stable <stable@kernel.org>
Link: https://patch.msgid.link/20260209142048.1503791-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
A null-pointer-dereference bug was reported by syzbot:
Oops: general protection fault, probably for address 0xdffffc0000000000:
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:bitmap_subset include/linux/bitmap.h:433 [inline]
RIP: 0010:cpumask_subset include/linux/cpumask.h:836 [inline]
RIP: 0010:rebuild_sched_domains_locked kernel/cgroup/cpuset.c:967
RSP: 0018:ffffc90003ecfbc0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000020
RDX: ffff888028de0000 RSI: ffffffff8200f003 RDI: ffffffff8df14f28
RBP: 0000000000000000 R08: 0000000000000cc0 R09: 00000000ffffffff
R10: ffffffff8e7d95b3 R11: 0000000000000001 R12: 0000000000000000
R13: 00000000000f4240 R14: dffffc0000000000 R15: 0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2f463fff CR3: 000000003704c000 CR4: 00000000003526f0
Call Trace:
<TASK>
rebuild_sched_domains_cpuslocked kernel/cgroup/cpuset.c:983 [inline]
rebuild_sched_domains+0x21/0x40 kernel/cgroup/cpuset.c:990
sched_rt_handler+0xb5/0xe0 kernel/sched/rt.c:2911
proc_sys_call_handler+0x47f/0x5a0 fs/proc/proc_sysctl.c:600
new_sync_write fs/read_write.c:595 [inline]
vfs_write+0x6ac/0x1070 fs/read_write.c:688
ksys_write+0x12a/0x250 fs/read_write.c:740
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The issue occurs when generate_sched_domains() returns ndoms = 1 and
doms = NULL due to a kmalloc failure. This leads to a null-pointer
dereference when accessing doms in rebuild_sched_domains_locked().
Fix this by adding a NULL check for doms before accessing it.
Fixes: 6ee43047e8ad ("cpuset: Remove unnecessary checks in rebuild_sched_domains_locked")
Reported-by: syzbot+460792609a79c085f79f@syzkaller.appspotmail.com
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Use the kernel.org address as a unified single entry to send patches
to. At the same time, update mailmap to group all past contributions.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
cadence_nand_init()
Fix wrong variable used for error checking after dma_alloc_coherent()
call. The function checks cdns_ctrl->dma_cdma_desc instead of
cdns_ctrl->cdma_desc, which could lead to incorrect error handling.
Fixes: ec4ba01e894d ("mtd: rawnand: Add new Cadence NAND driver to MTD subsystem")
Cc: stable@vger.kernel.org
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
Given CONFIG_FORTIFY_SOURCE=y and a recent compiler,
commit 439a1bcac648 ("fortify: Use __builtin_dynamic_object_size() when
available") produces the warning below and an oops.
Searching for RedBoot partition table in 50000000.flash at offset 0x7e0000
------------[ cut here ]------------
WARNING: lib/string_helpers.c:1035 at 0xc029e04c, CPU#0: swapper/0/1
memcmp: detected buffer overflow: 15 byte read of buffer size 14
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.19.0 #1 NONE
As Kees said, "'names' is pointing to the final 'namelen' many bytes
of the allocation ... 'namelen' could be basically any length at all.
This fortify warning looks legit to me -- this code used to be reading
beyond the end of the allocation."
Since the size of the dynamic allocation is calculated with strlen()
we can use strcmp() instead of memcmp() and remain within bounds.
Cc: Kees Cook <kees@kernel.org>
Cc: stable@vger.kernel.org
Cc: linux-hardening@vger.kernel.org
Link: https://lore.kernel.org/all/202602151911.AD092DFFCD@keescook/
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Kees Cook <kees@kernel.org>
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
Takahiro has been an active contributor to the SPI NOR subsystem,
providing valuable patches and reviews. Add him as a designated
reviewer to help facilitate patch processing and maintenance.
Cc: Takahiro Kuwano <takahiro.kuwano@infineon.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Acked-by: Takahiro Kuwano <takahiro.kuwano@infineon.com>
Acked-by: Michael Walle <mwalle@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
I have not been actively involved in SPI NOR development recently and
would like to step down to focus on my current day-to-day work.
The subsystem remains in good hands with Pratyush and Michael.
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Acked-by: Michael Walle <mwalle@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
When Linux is running as guest, runs a user space process and the
user space process accesses a page that the host has paged out,
the guest gets a pfault interrupt and schedules a different process.
Without this mechanism the host would have to suspend the whole
virtual CPU until the page has been paged in.
To setup the pfault interrupt the real address of parameter list
should be passed to DIAGNOSE 0x258, but a virtual address is passed
instead.
That has a performance impact, since the pfault setup never succeeds,
the interrupt is never delivered to a guest and the whole virtual CPU
is suspended as result.
Cc: stable@vger.kernel.org
Fixes: c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces")
Reported-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
s390_reset_system() calls set_prefix(0), which switches back to the
absolute lowcore. At that point the stack protector canary no longer
matches the canary from the lowcore the function was entered with, so
the stack check fails.
Mark s390_reset_system() __no_stack_protector. This is safe here since
its callers (__do_machine_kdump() and __do_machine_kexec()) are
effectively no-return and fall back to disabled_wait() on failure.
Fixes: f5730d44e05e ("s390: Add stackprotector support")
Reported-by: Nikita Dubrovskii <nikita@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Pull a small idle/vtime series with two cputime accounting fixes plus
related cleanups and minor improvements.
* idle-vtime-fixes-cleanups:
s390/idle: Remove psw_idle() prototype
s390/vtime: Use lockdep_assert_irqs_disabled() instead of BUG_ON()
s390/vtime: Use __this_cpu_read() / get rid of READ_ONCE()
s390/irq/idle: Remove psw bits early
s390/idle: Inline update_timer_idle()
s390/idle: Slightly optimize idle time accounting
s390/idle: Add comment for non obvious code
s390/vtime: Fix virtual timer forwarding
s390/idle: Fix cpu idle exit cpu time accounting
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
psw_idle() does not exist anymore. Remove its prototype.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Use lockdep_assert_irqs_disabled() instead of BUG_ON(). This avoids
crashing the kernel, and generates better code if CONFIG_PROVE_LOCKING
is disabled.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
do_account_vtime() runs always with interrupts disabled, therefore use
__this_cpu_read() instead of this_cpu_read() to get rid of a pointless
preempt_disable() / preempt_enable() pair.
Also there are no concurrent writers to the cpu time accounting fields
in lowcore. Therefore get rid of READ_ONCE() usages.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Remove wait, io, external interrupt bits early in do_io_irq()/do_ext_irq()
when previous context was idle. This saves one conditional branch and is
closer to the original old assembly code.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Inline update_timer_idle() again to avoid an extra function call. This
way the generated code is close to old assembler version again.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Slightly optimize account_idle_time_irq() and update_timer_idle():
- Use fast single instruction __atomic64() primitives to update per
cpu idle_time and idle_count, instead of READ_ONCE() / WRITE_ONCE()
pairs
- stcctm() is an inline assembly with a full memory barrier. This
leads to a not necessary extra dereference of smp_cpu_mtid in
update_timer_idle(). Avoid this and read smp_cpu_mtid into a
variable
- Use __this_cpu_add() instead of this_cpu_add() to avoid disabling /
enabling of preemption several times in a loop in update_timer_idle().
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Add a comment to update_timer_idle() which describes why wall time (not
steal time) is added to steal_timer. This is not obvious and was reported
by Frederic Weisbecker.
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Closes: https://lore.kernel.org/all/aXEVM-04lj0lntMr@localhost.localdomain/
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Since delayed accounting of system time [1] the virtual timer is
forwarded by do_account_vtime() but also vtime_account_kernel(),
vtime_account_softirq(), and vtime_account_hardirq(). This leads
to double accounting of system, guest, softirq, and hardirq time.
Remove accounting from the vtime_account*() family to restore old behavior.
There is only one user of the vtimer interface, which might explain
why nobody noticed this so far.
Fixes: b7394a5f4ce9 ("sched/cputime, s390: Implement delayed accounting of system time") [1]
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
With the conversion to generic entry [1] cpu idle exit cpu time accounting
was converted from assembly to C. This introduced an reversed order of cpu
time accounting.
On cpu idle exit the current accounting happens with the following call
chain:
-> do_io_irq()/do_ext_irq()
-> irq_enter_rcu()
-> account_hardirq_enter()
-> vtime_account_irq()
-> vtime_account_kernel()
vtime_account_kernel() accounts the passed cpu time since last_update_timer
as system time, and updates last_update_timer to the current cpu timer
value.
However the subsequent call of
-> account_idle_time_irq()
will incorrectly subtract passed cpu time from timer_idle_enter to the
updated last_update_timer value from system_timer. Then last_update_timer
is updated to a sys_enter_timer, which means that last_update_timer goes
back in time.
Subsequently account_hardirq_exit() will account too much cpu time as
hardirq time. The sum of all accounted cpu times is still correct, however
some cpu time which was previously accounted as system time is now
accounted as hardirq time, plus there is the oddity that last_update_timer
goes back in time.
Restore previous behavior by extracting cpu time accounting code from
account_idle_time_irq() into a new update_timer_idle() function and call it
before irq_enter_rcu().
Fixes: 56e62a737028 ("s390: convert to generic entry") [1]
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
We should use READ_ONCE when reading from a SQE, make sure timeout gets
a stable timespec address.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The default bdl_pos_adj of 32 for Nvidia HDA controllers is
insufficient on GA102 (and likely other recent Nvidia GPUs) after S3
suspend/resume. The controller's DMA timing degrades after resume,
causing premature IRQ detection in azx_position_ok() which results in
silent HDMI/DP audio output despite userspace reporting a valid
playback state and correct ELD data.
Increase bdl_pos_adj to 64 for AZX_DRIVER_NVIDIA, matching the value
already used by Intel Apollo Lake for the same class of timing issue.
Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221069
Suggested-by: Charalampos Mitrodimas <charmitro@posteo.net>
Signed-off-by: Panagiotis Foliadis <pfoliadis@posteo.net>
Link: https://patch.msgid.link/20260225-nvidia-audio-fix-v1-1-b1383c37ec49@posteo.net
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Using a prerelease version as a minimum supported version for
CONFIG_WARN_CONTEXT_ANALYSIS was reasonable to do while LLVM 22 was the
development version so that people could immediately build from main and
start testing and validating this in their own code. However, it can be
problematic when using prerelease versions of LLVM 22, such as Android
clang 22.0.1 (the current android mainline compiler) or when bisecting
LLVM between llvmorg-22-init and llvmorg-23-init, to build the kernel,
as all compiler fixes for the context analysis may not be present,
potentially resulting in warnings that can easily turn into errors.
Now that LLVM 22 is released as 22.1.0, upgrade the check to require at
least this version to ensure that a user's toolchain actually has all
the changes needed for a smooth experience with context analysis.
Fixes: 3269701cb256 ("compiler-context-analysis: Add infrastructure for Context Analysis with Clang")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Marco Elver <elver@google.com>
Link: https://patch.msgid.link/20260224-bump-clang-ver-context-analysis-v1-1-16cc7a90a040@kernel.org
|
|
Make sure that __perf_event_overflow() runs with IRQs disabled for all
possible callchains. Specifically the software events can end up running
it with only preemption disabled.
This opens up a race vs perf_event_exit_event() and friends that will go
and free various things the overflow path expects to be present, like
the BPF program.
Fixes: 592903cdcbf6 ("perf_counter: add an event_list")
Reported-by: Simond Hu <cmdhh1767@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Simond Hu <cmdhh1767@gmail.com>
Link: https://patch.msgid.link/20260224122909.GV1395416@noisy.programming.kicks-ass.net
|
|
When the system is booted with kernel command line argument "nosmt" or
"maxcpus" to limit the number of CPUs, disabling turbo via:
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
results in a crash:
PF: supervisor read access in kernel mode
PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP PTI
...
RIP: 0010:store_no_turbo+0x100/0x1f0
...
This occurs because for_each_possible_cpu() returns CPUs even if they
are not online. For those CPUs, all_cpu_data[] will be NULL. Since
commit 973207ae3d7c ("cpufreq: intel_pstate: Rearrange max frequency
updates handling code"), all_cpu_data[] is dereferenced even for CPUs
which are not online, causing the NULL pointer dereference.
To fix that, pass CPU number to intel_pstate_update_max_freq() and use
all_cpu_data[] for those CPUs for which there is a valid cpufreq policy.
Fixes: 973207ae3d7c ("cpufreq: intel_pstate: Rearrange max frequency updates handling code")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221068
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 6.16+ <stable@vger.kernel.org> # 6.16+
Link: https://patch.msgid.link/20260225001752.890164-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The ioctl structures in libxfs/xfs_fs.h are missing static size checks.
It is useful to have static size checks for these structures as adding
new fields to them could cause issues (e.g. extra padding that may be
inserted by the compiler). So add these checks to xfs/xfs_ondisk.h.
Due to different padding/alignment requirements across different
architectures, to avoid build failures, some structures are ommited from
the size checks. For example, structures with "compat_" definitions in
xfs/xfs_ioctl32.h are ommited.
Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
In libxfs/xfs_ondisk.h, remove some duplicate entries of
XFS_CHECK_STRUCT_SIZE().
Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Add comments explaining when to use XFS_IS_CORRUPT() and ASSERT()
Suggested-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Update lazy counters in xfs_growfs_rt_bmblock() similar to the way it
is done xfs_growfs_data_private(). This is because the lazy counters are
not always updated and synching the counters will avoid inconsistencies
between frexents and rtextents(total realtime extent count). This will
be more useful once realtime shrink is implemented as this will prevent
some transient state to occur where frexents might be greater than
total rtextents.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|