summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)Author
2025-12-19drm/i915/colorop: do not include headers from headersJani Nikula
drm_colorop.h doesn't need the intel_display_types.h include for anything. Don't include headers from headers if it can be avoided. Fixes: 3e9b06559aa1 ("drm/i915: Add intel_color_op") Cc: Suraj Kandpal <suraj.kandpal@intel.com> Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Cc: Uma Shankar <uma.shankar@intel.com> Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Link: https://patch.msgid.link/20251218141807.409751-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-12-19drm/i915/dp: Restrict max source rate for WCL to HBR3Ankit Nautiyal
WCL supports a maximum of HBR3 8.1 Gbps for both eDP/DP. Limit the max source rate to HBR3 for WCL. v2: Move the check inside mtl_max_source_rate(). (Suraj) Bspec:74286 Signed-off-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/20251122053651.759389-1-ankit.k.nautiyal@intel.com
2025-12-19Merge tag 'drm-xe-fixes-2025-12-19' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes UAPI Changes: - Limit num_syncs to prevent oversized kernel allocations (Shuicheng) - Disallow 0 OA property values (Ashutosh) - Disallow 0 EU stall property values (Ashutosh) Driver Changes: - Fix kobject leak (Shuicheng) - Workaround (Vinay) - Loop variable reference fix (Matt Brost) - Fix a CONFIG corner-case incorrect number of arguments (Arnd Bergmann) - Skip reason prefix while emitting array (Raag) - VF migration fix (Tomasz) - Fix context in mei interrupt top half (Junxiao) - Don't include the CCS metadata in the dma-buf sg-table (Thomas) - VF queueing recovery work fix (Satyanarayana) - Increase TDF timeout (Jagmeet) - GT reset registers vs scheduler ordering fix (Jan) - Adjust long-running workload timeslices (Matt Brost) - Always set OA_OAGLBCTXCTRL_COUNTER_RESUME (Ashutosh) - Fix a return value (Dan Carpenter) - Drop preempt-fences when destroying imported dma-bufs (Thomas) - Use usleep_range for accurate long-running workload timeslicing (Matthew) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/aUSMlQ4iruzm0NQR@fedora
2025-12-18drm/xe: Fix documentation heading levels in xe_guc_pc.cSwaraj Gaikwad
Sphinx reports htmldocs warnings: Documentation/gpu/xe/xe_firmware:31: ./drivers/gpu/drm/xe/xe_guc_pc.c:76: ERROR: A level 2 section cannot be used here. Documentation/gpu/xe/xe_firmware:31: ./drivers/gpu/drm/xe/xe_guc_pc.c:87: ERROR: A level 2 section cannot be used here. The xe_guc_pc.c documentation is included inside xe_firmware.rst. The headers in the C file currently use '=' underlines, which conflict with the parent document's section levels. Fix this by demoting "Frequency management" and "Render-C States" headers from '=' to '-' to correctly nest them as subsections. Build environment: Python 3.13.7 Sphinx 8.2.3 docutils 0.22.3 Signed-off-by: Swaraj Gaikwad <swarajgaikwad1925@gmail.com> Link: https://patch.msgid.link/20251209094836.18589-1-swarajgaikwad1925@gmail.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-12-18drm/xe/xe_survivability: Remove unused indexRiana Tauro
Remove unused index variable and fix for loop. Fixes: f4e9fc967afd ("drm/xe/xe_survivability: Redesign survivability mode") Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/intel-xe/20251210075757.GA1206705@ax162/ Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20251218105151.586575-5-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-12-19Merge tag 'drm-misc-fixes-2025-12-18' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes drm-misc-fixes for v6.19-rc2: - Add -EDEADLK handling in drm unit tests. - Plug DRM_IOCTL_GEM_CHANGE_HANDLE leak. - Fix regression in sony-td4353-jdi. - Kconfig fix for visionox-rm69299. - Do not load amdxdna when running virtualized. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patch.msgid.link/21861d1b-54bf-4853-9c35-97abe3c5deba@linux.intel.com
2025-12-18drm/xe/nvm: enable cri platformAlexander Usyskin
Mark CRI as one that have the CSC NVM device. Update the writable override flow to take the information from the scratch register for CRI. Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20251216111034.3093507-1-alexander.usyskin@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-12-18drm/xe: Use usleep_range for accurate long-running workload timeslicingMatthew Brost
msleep is not very accurate in terms of how long it actually sleeps, whereas usleep_range is precise. Replace the timeslice sleep for long-running workloads with the more accurate usleep_range to avoid jitter if the sleep period is less than 20ms. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20251212182847.1683222-3-matthew.brost@intel.com (cherry picked from commit ca415c4d4c17ad676a2c8981e1fcc432221dce79) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-12-18drm/xe: Drop preempt-fences when destroying imported dma-bufs.Thomas Hellström
When imported dma-bufs are destroyed, TTM is not fully individualizing the dma-resv, but it *is* copying the fences that need to be waited for before declaring idle. So in the case where the bo->resv != bo->_resv we can still drop the preempt-fences, but make sure we do that on bo->_resv which contains the fence-pointer copy. In the case where the copying fails, bo->_resv will typically not contain any fences pointers at all, so there will be nothing to drop. In that case, TTM would have ensured all fences that would have been copied are signaled, including any remaining preempt fences. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Fixes: fa0af721bd1f ("drm/ttm: test private resv obj on release/destroy") Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.16+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Tested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251217093441.5073-1-thomas.hellstrom@linux.intel.com (cherry picked from commit 425fe550fb513b567bd6d01f397d274092a9c274) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-12-18drm/xe/eustall: Disallow 0 EU stall property valuesAshutosh Dixit
An EU stall property value of 0 is invalid and will cause a NPD. Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6453 Fixes: 1537ec85ebd7 ("drm/xe/uapi: Introduce API for EU stall sampling") Cc: stable@vger.kernel.org Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Link: https://patch.msgid.link/20251212061850.1565459-4-ashutosh.dixit@intel.com (cherry picked from commit 5bf763e908bf795da4ad538d21c1ec41f8021f76) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-12-18drm/xe/oa: Disallow 0 OA property valuesAshutosh Dixit
An OA property value of 0 is invalid and will cause a NPD. Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6452 Fixes: cc4e6994d5a2 ("drm/xe/oa: Move functions up so they can be reused for config ioctl") Cc: stable@vger.kernel.org Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Link: https://patch.msgid.link/20251212061850.1565459-3-ashutosh.dixit@intel.com (cherry picked from commit 7a100e6ddcc47c1f6ba7a19402de86ce24790621) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-12-18drm/xe/xe_sriov_vfio: Fix return value in xe_sriov_vfio_migration_supported()Dan Carpenter
The xe_sriov_vfio_migration_supported() function is type bool so returning -EPERM means returning true. Return false instead. Fixes: bd45d46ffc8f ("drm/xe/pf: Export helpers for VFIO") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/aTLEZ4g-FD-iMQ2V@stanley.mountain Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> (cherry picked from commit 0a2404c8f6a3a120f79c57ef8a3302c8e8bc34d9) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-12-18drm/xe/oa: Always set OAG_OAGLBCTXCTRL_COUNTER_RESUMEAshutosh Dixit
Reports can be written out to the OA buffer using ways other than periodic sampling. These include mmio trigger and context switches. To support these use cases, when periodic sampling is not enabled, OAG_OAGLBCTXCTRL_COUNTER_RESUME must be set. Fixes: 1db9a9dc90ae ("drm/xe/oa: OA stream initialization (OAG)") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patch.msgid.link/20251205212613.826224-4-ashutosh.dixit@intel.com (cherry picked from commit 88d98e74adf3e20f678bb89581a5c3149fdbdeaa) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-12-18drm/xe: Adjust long-running workload timeslices to reasonable valuesMatthew Brost
A 10ms timeslice for long-running workloads is far too long and causes significant jitter in benchmarks when the system is shared. Adjust the value to 5ms for preempt-fencing VMs, as the resume step there is quite costly as memory is moved around, and set it to zero for pagefault VMs, since switching back to pagefault mode after dma-fence mode is relatively fast. Also change min_run_period_ms to 'unsiged int' type rather than 's64' as only positive values make sense. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20251212182847.1683222-2-matthew.brost@intel.com (cherry picked from commit 33a5abd9a68394aa67f9618b20eee65ee8702ff4) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-12-18drm/xe/oa: Limit num_syncs to prevent oversized allocationsShuicheng Lin
The OA open parameters did not validate num_syncs, allowing userspace to pass arbitrarily large values, potentially leading to excessive allocations. Add check to ensure that num_syncs does not exceed DRM_XE_MAX_SYNCS, returning -EINVAL when the limit is violated. v2: use XE_IOCTL_DBG() and drop duplicated check. (Ashutosh) Fixes: c8507a25cebd ("drm/xe/oa/uapi: Define and parse OA sync properties") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251205234715.2476561-6-shuicheng.lin@intel.com (cherry picked from commit e057b2d2b8d815df3858a87dffafa2af37e5945b) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-12-18drm/xe: Limit num_syncs to prevent oversized allocationsShuicheng Lin
The exec and vm_bind ioctl allow userspace to specify an arbitrary num_syncs value. Without bounds checking, a very large num_syncs can force an excessively large allocation, leading to kernel warnings from the page allocator as below. Introduce DRM_XE_MAX_SYNCS (set to 1024) and reject any request exceeding this limit. " ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1217 at mm/page_alloc.c:5124 __alloc_frozen_pages_noprof+0x2f8/0x2180 mm/page_alloc.c:5124 ... Call Trace: <TASK> alloc_pages_mpol+0xe4/0x330 mm/mempolicy.c:2416 ___kmalloc_large_node+0xd8/0x110 mm/slub.c:4317 __kmalloc_large_node_noprof+0x18/0xe0 mm/slub.c:4348 __do_kmalloc_node mm/slub.c:4364 [inline] __kmalloc_noprof+0x3d4/0x4b0 mm/slub.c:4388 kmalloc_noprof include/linux/slab.h:909 [inline] kmalloc_array_noprof include/linux/slab.h:948 [inline] xe_exec_ioctl+0xa47/0x1e70 drivers/gpu/drm/xe/xe_exec.c:158 drm_ioctl_kernel+0x1f1/0x3e0 drivers/gpu/drm/drm_ioctl.c:797 drm_ioctl+0x5e7/0xc50 drivers/gpu/drm/drm_ioctl.c:894 xe_drm_ioctl+0x10b/0x170 drivers/gpu/drm/xe/xe_device.c:224 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:598 [inline] __se_sys_ioctl fs/ioctl.c:584 [inline] __x64_sys_ioctl+0x18b/0x210 fs/ioctl.c:584 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xbb/0x380 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f ... " v2: Add "Reported-by" and Cc stable kernels. v3: Change XE_MAX_SYNCS from 64 to 1024. (Matt & Ashutosh) v4: s/XE_MAX_SYNCS/DRM_XE_MAX_SYNCS/ (Matt) v5: Do the check at the top of the exec func. (Matt) Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reported-by: Koen Koning <koen.koning@intel.com> Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6450 Cc: <stable@vger.kernel.org> # v6.12+ Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: Carl Zhang <carl.zhang@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Ivan Briano <ivan.briano@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251205234715.2476561-5-shuicheng.lin@intel.com (cherry picked from commit b07bac9bd708ec468cd1b8a5fe70ae2ac9b0a11c) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-12-18drm/xe: Drop preempt-fences when destroying imported dma-bufs.Thomas Hellström
When imported dma-bufs are destroyed, TTM is not fully individualizing the dma-resv, but it *is* copying the fences that need to be waited for before declaring idle. So in the case where the bo->resv != bo->_resv we can still drop the preempt-fences, but make sure we do that on bo->_resv which contains the fence-pointer copy. In the case where the copying fails, bo->_resv will typically not contain any fences pointers at all, so there will be nothing to drop. In that case, TTM would have ensured all fences that would have been copied are signaled, including any remaining preempt fences. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Fixes: fa0af721bd1f ("drm/ttm: test private resv obj on release/destroy") Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.16+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Tested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251217093441.5073-1-thomas.hellstrom@linux.intel.com
2025-12-18drm/imagination: Disallow exporting of PM/FW protected objectsAlessio Belle
These objects are meant to be used by the GPU firmware or by the PM unit within the GPU, in which case they may contain physical addresses. This adds a layer of protection against exposing potentially exploitable information outside of the driver. Fixes: ff5f643de0bf ("drm/imagination: Add GEM and VM related code") Signed-off-by: Alessio Belle <alessio.belle@imgtec.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251208-no-export-pm-fw-obj-v1-1-83ab12c61693@imgtec.com Signed-off-by: Matt Coster <matt.coster@imgtec.com>
2025-12-18drm/gem: Fix kerneldoc warningsLoïc Molinari
Fix incorrect parameters in drm_gem_shmem_init() and missing " *" on empty lines in drm_gem_get_huge_mnt(). Signed-off-by: Loïc Molinari <loic.molinari@collabora.com> Fixes: 6e0b1b82017b ("drm/gem: Add huge tmpfs mountpoint helpers") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/dri-devel/20251216115605.4babbce0@canb.auug.org.au/ Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20251217172404.31216-1-loic.molinari@collabora.com Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2025-12-18drm/panthor: unlock on error in panthor_ioctl_bo_create()Dan Carpenter
Call drm_dev_exit() before returning -EINVAL. Fixes: cd2c9c3015e6 ("drm/panthor: Add flag to map GEM object Write-Back Cacheable") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://patch.msgid.link/aUOxxvXXtHHfFCcg@stanley.mountain Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2025-12-18drm/i915: Add intel_gvt_driver_remove() onto error cleanup pathJuha-Pekka Heikkila
Add intel_gvt_driver_remove() onto error cleanup path. Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Reviewed-by: Zhenyu Wang <zhenyuw.linux@gmail.com> Signed-off-by: Mika Kahola <mika.kahola@intel.com> Link: https://patch.msgid.link/20251216080754.221974-3-juhapekka.heikkila@gmail.com
2025-12-18drm/i915: switch to use kernel standard error injectionJuha-Pekka Heikkila
Switch error injection testing from i915_inject_probe_failure to ALLOW_ERROR_INJECTION. Here taken out calls to i915_inject_probe_failure and changed to use ALLOW_ERROR_INJECTION for the same functions. Below functions are dropped from testing since I couldn't hit those at module bind time, testing these would just fail the tests. To include these in test would need to find way to cause resetting in i915 which would trigger these: intel_gvt_init intel_wopcm_init intel_uc_fw_upload intel_gt_init with expected -EIO (-EINVAL is tested) lrc_init_wa_ctx intel_huc_auth guc_check_version_range intel_uc_fw_fetch uc_fw_xfer __intel_uc_reset_hw guc_enable_communication uc_init_wopcm ..and all stages of __force_fw_fetch_failures Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Mika Kahola <mika.kahola@intel.com> Link: https://patch.msgid.link/20251216080754.221974-2-juhapekka.heikkila@gmail.com
2025-12-18drm/syncobj: Convert syncobj idr to xarrayTvrtko Ursulin
IDR is deprecated and syncobj looks pretty trivial to convert so lets just do it. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: intel-xe@lists.freedesktop.org Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://lore.kernel.org/r/20251205150910.92913-1-tvrtko.ursulin@igalia.com
2025-12-18drivers: gpu: Update ARef imports from sync::arefShankari Anand
Update call sites to import `ARef` from `sync::aref` instead of `types`. This aligns with the ongoing effort to move `ARef` and `AlwaysRefCounted` to sync. Suggested-by: Benno Lossin <lossin@kernel.org> Link: https://github.com/Rust-for-Linux/linux/issues/1173 Signed-off-by: Shankari Anand <shankari.ak0208@gmail.com> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Acked-by: Alexandre Courbot <acourbot@nvidia.com> Link: https://patch.msgid.link/20251123092438.182251-3-shankari.ak0208@gmail.com [aliceryhl: keep trailing // at last import] Signed-off-by: Alice Ryhl <aliceryhl@google.com>
2025-12-17drm/xe/eustall: Disallow 0 EU stall property valuesAshutosh Dixit
An EU stall property value of 0 is invalid and will cause a NPD. Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6453 Fixes: 1537ec85ebd7 ("drm/xe/uapi: Introduce API for EU stall sampling") Cc: stable@vger.kernel.org Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Link: https://patch.msgid.link/20251212061850.1565459-4-ashutosh.dixit@intel.com
2025-12-17drm/xe/oa: Disallow 0 OA property valuesAshutosh Dixit
An OA property value of 0 is invalid and will cause a NPD. Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6452 Fixes: cc4e6994d5a2 ("drm/xe/oa: Move functions up so they can be reused for config ioctl") Cc: stable@vger.kernel.org Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Link: https://patch.msgid.link/20251212061850.1565459-3-ashutosh.dixit@intel.com
2025-12-17drm/xe/oa: Move default oa unit assignment earlier during stream openAshutosh Dixit
De-referencing param.oa_unit, when an OA unit id is not provided during stream open, results in NPD below. Oops: general protection fault, probably for non-canonical address... KASAN: null-ptr-deref in range... RIP: 0010:xe_oa_stream_open_ioctl+0x169/0x38a0 xe_observation_ioctl+0x19f/0x270 drm_ioctl_kernel+0x1f4/0x410 Fix this by moving default oa unit assignment before the dereference. Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6840 Fixes: c7e269aa565f ("drm/xe/oa: Allow exec_queue's to be specified only for OAG OA unit") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Link: https://patch.msgid.link/20251212061850.1565459-2-ashutosh.dixit@intel.com
2025-12-17drm/xe/pf: Add handling for MLRC adverse event thresholdDaniele Ceraolo Spurio
Since it is illegal to register a MLRC context when scheduler groups are enabled, the GuC consider the VF doing so as an adverse event. Like for other adverse event, there is a threshold for how many times the event can happen before the GuC throws an error, which we need to add support for. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251216214902.1429-5-michal.wajdeczko@intel.com
2025-12-17drm/xe/pf: Prepare for new threshold KLVsMichal Wajdeczko
We want to extend our macro-based KLV list definitions with new information about the version from which given KLV is supported. Prepare our code generators to emit dedicated version check if a KLV was defined with the version information. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251216214902.1429-4-michal.wajdeczko@intel.com
2025-12-17drm/xe/guc: Introduce GUC_FIRMWARE_VER_AT_LEAST helperMichal Wajdeczko
There are already few places in the code where we need to check GuC firmware version. Wrap existing raw conditions into a named helper macro to make it clear and avoid explicit call of the MAKE_GUC_VER. Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251216214902.1429-3-michal.wajdeczko@intel.com
2025-12-17drm/xe: Introduce IF_ARGS macro utilityMichal Wajdeczko
We want to extend our macro-based KLV list definitions with new information about the version from which given KLV is supported. Add utility IF_ARGS macro that can be used in code generators to emit different code based on the presence of additional arguments. Introduce macro itself and extend our kunit tests to cover it. We will use this macro in next patch. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251217224018.3490-1-michal.wajdeczko@intel.com
2025-12-17drm/xe: Fix NULL pointer dereference in xe_exec_ioctlTapani Pälli
Helper function xe_sync_needs_wait expects sync->fence when accessing flags, patch makes sure we call only when sync->fence exists. v2: move null checking to xe_sync_needs_wait and make xe_sync_entry_wait utilize this helper (Matthew Auld) v3: further simplify code (Matthew Auld) Fixes NULL pointer dereference seen with Vulkan workloads: [ 118.410401] RIP: 0010:xe_sync_needs_wait+0x27/0x50 [xe] Fixes: 4ac9048d0501 ("drm/xe: Wait on in-syncs when swicthing to dma-fence mode") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251217132412.435755-1-tapani.palli@intel.com
2025-12-17drm/panthor: fix for dma-fence safe access rulesChia-I Wu
Commit 506aa8b02a8d6 ("dma-fence: Add safe access helpers and document the rules") details the dma-fence safe access rules. The most common culprit is that drm_sched_fence_get_timeline_name may race with group_free_queue. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Cc: stable@vger.kernel.org # v6.17+ Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patch.msgid.link/20251204174545.399059-1-olvaffe@gmail.com
2025-12-17drm/panthor: Fix NULL pointer dereference on panthor_fw_unplugKarunika Choo
This patch removes the MCU halt and wait for halt procedures during panthor_fw_unplug() as the MCU can be in a variety of states or the FW may not even be loaded/initialized at all, the latter of which can lead to a NULL pointer dereference. It should be safe on unplug to just disable the MCU without waiting for it to halt as it may not be able to. Fixes: 514072549865 ("drm/panthor: Support GLB_REQ.STATE field for Mali-G1 GPUs") Suggested-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Karunika Choo <karunika.choo@arm.com> Reviewed-by: Liviu Dudau <liviu@dudau.co.uk> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://patch.msgid.link/20251215203312.1084182-1-karunika.choo@arm.com Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2025-12-17drm/xe/xe_sriov_vfio: Fix return value in xe_sriov_vfio_migration_supported()Dan Carpenter
The xe_sriov_vfio_migration_supported() function is type bool so returning -EPERM means returning true. Return false instead. Fixes: 17f22465c5a5 ("drm/xe/pf: Export helpers for VFIO") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/aTLEZ4g-FD-iMQ2V@stanley.mountain Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2025-12-17drm/xe/vf: fix return type in vf_migration_init_late()Dan Carpenter
The vf_migration_init_late() function is supposed to return zero on success and negative error codes on failure. The error code eventually gets propagated back to the probe() function and returned. The problem is it's declared as type bool so it returns true on error. Change it to type int instead. Fixes: 2e2dab20dd66 ("drm/xe/vf: Enable VF migration only on supported GuC versions") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/aTK9pwJ_roc8vpDi@stanley.mountain Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2025-12-16drm/xe/oa: Always set OAG_OAGLBCTXCTRL_COUNTER_RESUMEAshutosh Dixit
Reports can be written out to the OA buffer using ways other than periodic sampling. These include mmio trigger and context switches. To support these use cases, when periodic sampling is not enabled, OAG_OAGLBCTXCTRL_COUNTER_RESUME must be set. Fixes: 1db9a9dc90ae ("drm/xe/oa: OA stream initialization (OAG)") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patch.msgid.link/20251205212613.826224-4-ashutosh.dixit@intel.com
2025-12-16drm/xe/rtp: Whitelist OAMERT MMIO trigger registersAshutosh Dixit
Whitelist OAMERT registers to enable userspace to execute MMIO triggers on OAMERT units. Registers are whitelisted for compute and copy class engines. Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patch.msgid.link/20251205212613.826224-3-ashutosh.dixit@intel.com
2025-12-16drm/xe/oa/uapi: Expose MERT OA unitAshutosh Dixit
A MERT OA unit is available in the SoC on some platforms. Add support for this OA unit and expose it to userspace. The MERT OA unit does not have any HW engines attached, but is otherwise similar to an OAM unit. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patch.msgid.link/20251205212613.826224-2-ashutosh.dixit@intel.com
2025-12-16drm/amdkfd: Fix improper NULL termination of queue restore SMI event stringBrian Kocoloski
Pass character "0" rather than NULL terminator to properly format queue restoration SMI events. Currently, the NULL terminator precedes the newline character that is intended to delineate separate events in the SMI event buffer, which can break userspace parsers. Signed-off-by: Brian Kocoloski <brian.kocoloski@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6e7143e5e6e21f9d5572e0390f7089e6d53edf3c)
2025-12-16drm/amd/pm: restore SCLK settings after S0ix resumemythilam
User-configured SCLK(GPU core clock)frequencies were not persisting across S0ix suspend/resume cycles on smu v14 hardware. The issue occurred because of the code resetting clock frequency to zero during resume. This patch addresses the problem by: - Preserving user-configured values in driver and sets the clock frequency across resume - Preserved settings are sent to the hardware during resume Signed-off-by: mythilam <mythilam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 20ba98326f4c69e6bf8d1f42942ece485a675b27)
2025-12-16drm/amdgpu: fix a job->pasid access race in gpu recoveryAlex Deucher
Avoid a possible UAF in GPU recovery due to a race between the sched timeout callback and the tdr work queue. The gpu recovery function calls drm_sched_stop() and later drm_sched_start(). drm_sched_start() restarts the tdr queue which will eventually free the job. If the tdr queue frees the job before time out callback completes, the job will be freed and we'll get a UAF when accessing the pasid. Cache it early to avoid the UAF. Example KASAN trace: [ 493.058141] BUG: KASAN: slab-use-after-free in amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.067530] Read of size 4 at addr ffff88b0ce3f794c by task kworker/u128:1/323 [ 493.074892] [ 493.076485] CPU: 9 UID: 0 PID: 323 Comm: kworker/u128:1 Tainted: G E 6.16.0-1289896.2.zuul.bf4f11df81c1410bbe901c4373305a31 #1 PREEMPT(voluntary) [ 493.076493] Tainted: [E]=UNSIGNED_MODULE [ 493.076495] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS V1.03.B10 04/01/2019 [ 493.076500] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched] [ 493.076512] Call Trace: [ 493.076515] <TASK> [ 493.076518] dump_stack_lvl+0x64/0x80 [ 493.076529] print_report+0xce/0x630 [ 493.076536] ? _raw_spin_lock_irqsave+0x86/0xd0 [ 493.076541] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 493.076545] ? amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.077253] kasan_report+0xb8/0xf0 [ 493.077258] ? amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.077965] amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.078672] ? __pfx_amdgpu_device_gpu_recover+0x10/0x10 [amdgpu] [ 493.079378] ? amdgpu_coredump+0x1fd/0x4c0 [amdgpu] [ 493.080111] amdgpu_job_timedout+0x642/0x1400 [amdgpu] [ 493.080903] ? pick_task_fair+0x24e/0x330 [ 493.080910] ? __pfx_amdgpu_job_timedout+0x10/0x10 [amdgpu] [ 493.081702] ? _raw_spin_lock+0x75/0xc0 [ 493.081708] ? __pfx__raw_spin_lock+0x10/0x10 [ 493.081712] drm_sched_job_timedout+0x1b0/0x4b0 [gpu_sched] [ 493.081721] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 493.081725] process_one_work+0x679/0xff0 [ 493.081732] worker_thread+0x6ce/0xfd0 [ 493.081736] ? __pfx_worker_thread+0x10/0x10 [ 493.081739] kthread+0x376/0x730 [ 493.081744] ? __pfx_kthread+0x10/0x10 [ 493.081748] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 493.081751] ? __pfx_kthread+0x10/0x10 [ 493.081755] ret_from_fork+0x247/0x330 [ 493.081761] ? __pfx_kthread+0x10/0x10 [ 493.081764] ret_from_fork_asm+0x1a/0x30 [ 493.081771] </TASK> Fixes: a72002cb181f ("drm/amdgpu: Make use of drm_wedge_task_info") Link: https://github.com/HansKristian-Work/vkd3d-proton/pull/2670 Cc: SRINIVASAN.SHANMUGAM@amd.com Cc: vitaly.prosyak@amd.com Cc: christian.koenig@amd.com Suggested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 20880a3fd5dd7bca1a079534cf6596bda92e107d)
2025-12-16drm/amd/display: Fix DP no audio issueCharlene Liu
[why] need to enable APG_CLOCK_ENABLE enable first also need to wake up az from D3 before access az block Reviewed-by: Swapnil Patel <swapnil.patel@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bf5e396957acafd46003318965500914d5f4edfa)
2025-12-16drm/amd/display: Fix scratch registers offsets for DCN351Ray Wu
[Why] Different platforms use different NBIO header files, causing display code to use differnt offset and read wrong accelerated status. [How] - Unified NBIO offset header file across platform. - Correct scratch registers offsets to proper locations. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4667 Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 576e032e909c8a6bb3d907b4ef5f6abe0f644199) Cc: stable@vger.kernel.org
2025-12-16drm/amd/display: Fix scratch registers offsets for DCN35Ray Wu
[Why] Different platforms use differnet NBIO header files, causing display code to use differnt offset and read wrong accelerated status. [How] - Unified NBIO offset header file across platform. - Correct scratch registers offsets to proper locations. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4667 Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 49a63bc8eda0304ba307f5ba68305f936174f72d) Cc: stable@vger.kernel.org
2025-12-16drm/amd: Resume the device in thaw() callback when console suspend is disabledMario Limonciello (AMD)
If console suspend has been disabled using `no_console_suspend` also wake up during thaw() so that some messages can be seen for debugging. Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4191 Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 63387cbbb714d9f0d179d9d4560de1408d0906de)
2025-12-16drm/radeon: Convert legacy DRM logging in evergreen.c to drm_* helpersAbhishek Rajput
Replace DRM_DEBUG(), DRM_ERROR(), and DRM_INFO() calls with the corresponding drm_dbg(), drm_err(), and drm_info() helpers in the radeon driver. The drm_*() logging helpers take a struct drm_device * argument, allowing the DRM core to prefix log messages with the correct device name and instance. This is required to correctly distinguish log messages on systems with multiple GPUs. This change aligns radeon with the DRM TODO item: "Convert logging to drm_* functions with drm_device parameter". Signed-off-by: Abhishek Rajput <abhiraj21put@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16drm/amdgpu: Add gfx v12_1 interrupt source headerHawking Zhang
To acommandate specific interrupt source for gfx v12_1 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16drm/amdkfd: Override KFD SVM mappings for GFX 12.1Mukul Joshi
Override the local MTYPE mappings in KFD SVM code with mtype_local modprobe param for GFX 12.1. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16drm/amdgpu: correct rlc autoload for xcc harvestLikun Gao
If the number instances of firmware is RLC_NUM_INS_CODE0(Only 1 inst), need to copy it directly for rlcautolad. For the firmware which instances number bigger than 1, only copy for enabled XCC to save copy time. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>