summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd
AgeCommit message (Collapse)Author
2025-08-06drm/amd/pm: Add unique ids for SMUv13.0.6 SOCsLijo Lazar
Fetch and store the unique ids for AIDs/XCDs in SMUv13.0.6 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amdgpu: Add helpers to set/get unique idsLijo Lazar
Add a struct to store unique id information for each type. Add helper to fetch the unique id. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amdgpu: Prevent hardware access in dpc stateLijo Lazar
Don't allow hardware access while in dpc state. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Ce Sun <cesun102@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amdgpu/vcn: Fix double-free of vcn dump bufferLijo Lazar
The buffer is already freed as part of amdgpu_vcn_reg_dump_fini(). The issue is introduced by below patch series. Fixes: de55cbff5ce9 ("drm/amdgpu/vcn: Add regdump helper functions") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amdgpu: Log reset source during recoveryLijo Lazar
To get more context, add reset source to identify the source of gpu recovery - job timeout, RAS, HWS hang etc. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amdgpu: Generate BP threshold exceed CPER once threshold exceededXiang Liu
The bad pages threshold exceed CPER should be generated once threshold exceeded, no matter the bad_page_threshold setted or not. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amd/pm: Enable temperature metrics capsAsad Kamal
Enable temperature metrics caps for smu_v13_0_12 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amd/pm: Add temperature metrics sysfs entryAsad Kamal
Add temperature metrics sysfs entry to expose gpuboard/baseboard temperature metrics v2: Removed unused function, rename functions(Lijo) v3: Remove unnecessary initialization Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amd/pm: Fetch and fill temperature metricsAsad Kamal
Fetch system metrics table to fill gpuboard/baseboard temperature metrics data for smu_v13_0_12 v2: Remove unnecessary checks, used separate metrics time for temperature metrics table(Lijo) v3: Use cached values for back to back system metrics query(Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amd/pm: Update pmfw header for smu_v13_0_12Asad Kamal
Update pmfw header for smu_v13_0_12 with system temperature metrics table Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amd/pm: Add smu interface for temp metricsAsad Kamal
Add smu interface to get baseboard/gpuboard temperature metrics v2: Rename is_support to is_supported(Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amd/pm: Add dpm interface for temp metricsAsad Kamal
Add dpm interface to get gpuboard/baseboard temperature metrics v2: Add temperature metrics support check(Lijo) v3: Return error code in case of operation not supported(Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amd/display: Fix vupdate_offload_work docAurabindo Pillai
Fix the following warning in struct documentation: drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:168: warning: expecting prototype for struct dm_vupdate_work. Prototype was for struct vupdate_offload_work instead Fixes: c210b757b400 ("drm/amd/display: fix dmub access race condition") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amdkfd: return migration pages from copy functionJames Zhu
dst MIGRATE_PFN_VALID bit and src MIGRATE_PFN_MIGRATE bit should always be set when migration success. cpage includes src MIGRATE_PFN_MIGRATE bit set and MIGRATE_PFN_VALID bit unset pages for both ram and vram when memory is only allocated without being populated before migration, those ram pages should be counted as migrate pages and those vram pages should not be counted as migrate pages. Here migration pages refer to how many vram pages involved. -v2 use dst to check MIGRATE_PFN_VALID bit (suggested-by Philip) -v3 add warning when vram pages is less than migration pages return migration pages directly from copy function -v4 correct comments and copy function return mpage (suggested-by Felix) Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amdkfd: remove unused codeJames Zhu
upages is assigned under cpages = 0, so it isn't really used in this function. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Philip.Yang<Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amd/pm: Add priority messages for SMU v13.0.6Lijo Lazar
Certain messages will processed with high priority by PMFW even if it hasn't responded to a previous message. Send the priority message regardless of the success/fail status of the previous message. Add support on SMUv13.0.6 and SMUv13.0.12 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amdgpu: Set dpc status appropriatelyLijo Lazar
Set the dpc status based on hardware state. Also, clear the status before reinitialization after a successful reset. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Ce Sun <cesun102@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amdkfd: Destroy KFD debugfs after destroy KFD wqAmber Lin
Since KFD proc content was moved to kernel debugfs, we can't destroy KFD debugfs before kfd_process_destroy_wq. Move kfd_process_destroy_wq prior to kfd_debugfs_fini to fix a kernel NULL pointer problem. It happens when /sys/kernel/debug/kfd was already destroyed in kfd_debugfs_fini but kfd_process_destroy_wq calls kfd_debugfs_remove_process. This line debugfs_remove_recursive(entry->proc_dentry); tries to remove /sys/kernel/debug/kfd/proc/<pid> while /sys/kernel/debug/kfd is already gone. It hangs the kernel by kernel NULL pointer. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Eric Huang <jinhuieric.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amdgpu: Wait for bootloader after PSPv11 resetLijo Lazar
Some PSPv11 SOCs take a longer time for PSP based mode-1 reset. Instead of checking for C2PMSG_33 status, add the callback wait_for_bootloader. Wait for bootloader to be back to steady state is already part of the generic mode-1 reset flow. Increase the retry count for bootloader wait and also fix the mask to prevent fake pass. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amdgpu/gfx9.4.3: remove redundant repeated nested 0 checkEthan Carter Edwards
The repeated checks on grbm_soft_reset are unnecessary. Remove them. Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amdgpu/gfx9: remove redundant repeated nested 0 checkEthan Carter Edwards
The repeated checks on grbm_soft_reset are unnecessary. Remove them. Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06drm/amdgpu/gfx10: remove redundant repeated nested 0 checkEthan Carter Edwards
The repeated checks on grbm_soft_reset are unnecessary. Remove them. Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-06amdgpu/amdgpu_discovery: increase timeout limit for IFWI initXaver Hugl
With a timeout of only 1 second, my rx 5700XT fails to initialize, so this increases the timeout to 2s. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3697 Signed-off-by: Xaver Hugl <xaver.hugl@kde.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04drm/amdgpu: Update SDMA firmware version check for user queue supportJesse.Zhang
This commit fixes a firmware version check for enabling user queue support in SDMA v7.0. The previous version check (7836028) was incorrect and could lead to issues with PROTECTED_FENCE_SIGNAL commands causing register conflicts between MCU_DBG0 and MCU_DBG1. Fixes: 8c011408ed84 ("drm/amdgpu/sdma7: add ucode version checks for userq support") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 92e2449241516c95aab95eea91faecd0fa2b7ed5) Cc: stable@vger.kernel.org
2025-08-04drm/amdgpu: Add NULL check for asic_funcsLijo Lazar
If driver load fails too early, asic_funcs pointer remains unassigned. Add NULL check to sanitize unwind path. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 582bf7c5158dce16f7dc5b8345b7876bd8031224) Cc: stable@vger.kernel.org
2025-08-04drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value"Mario Limonciello
This reverts commit 66abb996999de0d440a02583a6e70c2c24deab45. This broke custom brightness curves but it wasn't obvious because of other related changes. Custom brightness curves are always from a 0-255 input signal. The correct fix was to fix the default value which was done by [1]. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4412 Link: https://lore.kernel.org/amd-gfx/0f094c4b-d2a3-42cd-824c-dc2858a5618d@kernel.org/T/#m69f875a7e69aa22df3370b3e3a9e69f4a61fdaf2 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6ec8a5cbec751625133461600d0d4950ffd3a214) Cc: stable@vger.kernel.org
2025-08-04drm/amd/display: fix a Null pointer dereference vulnerabilitySiyang Liu
[Why] A null pointer dereference vulnerability exists in the AMD display driver's (DC module) cleanup function dc_destruct(). When display control context (dc->ctx) construction fails (due to memory allocation failure), this pointer remains NULL. During subsequent error handling when dc_destruct() is called, there's no NULL check before dereferencing the perf_trace member (dc->ctx->perf_trace), causing a kernel null pointer dereference crash. [How] Check if dc->ctx is non-NULL before dereferencing. Link: https://lore.kernel.org/r/tencent_54FF4252EDFB6533090A491A25EEF3EDBF06@qq.com Co-developed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> (Updated commit text and removed unnecessary error message) Signed-off-by: Siyang Liu <Security@tencent.com> Signed-off-by: Roman Li <roman.li@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9dd8e2ba268c636c240a918e0a31e6feaee19404) Cc: stable@vger.kernel.org
2025-08-04drm/amd/display: Add primary plane to commits for correct VRR handlingMichel Dänzer
amdgpu_dm_commit_planes calls update_freesync_state_on_stream only for the primary plane. If a commit affects a CRTC but not its primary plane, it would previously not trigger a refresh cycle or affect LFC, violating current UAPI semantics. Fixes e.g. atomic commits affecting only the cursor plane being limited to the minimum refresh rate. Don't do this for the legacy cursor ioctls though, it would break the UAPI semantics for those. Suggested-by: Xaver Hugl <xaver.hugl@kde.org> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3034 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cc7bfba95966251b254cb970c21627124da3b7f4) Cc: stable@vger.kernel.org
2025-08-04drm/amdgpu: update mmhub 3.3 client id mappingsAlex Deucher
Update the client id mapping so the correct clients get printed when there is a mmhub page fault. v2: fix typos spotted by David Wu. v3: fix additional typo spotted by David. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e932f4779a2d329841bb9ca70bb80a4bb2d707b6) Cc: stable@vger.kernel.org
2025-08-04drm/amdgpu: update mmhub 3.0.1 client id mappingsAlex Deucher
Update the client id mapping so the correct clients get printed when there is a mmhub page fault. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2a2681eda73b99a2c1ee8cdb006099ea5d0c2505) Cc: stable@vger.kernel.org
2025-08-04drm/amdgpu: Retain job->vm in amdgpu_job_prepare_jobYuanShang
The field job->vm is used in function amdgpu_job_run to get the page table re-generation counter and decide whether the job should be skipped. Specifically, function amdgpu_vm_generation checks if the VM is valid for this job to use. For instance, if a gfx job depends on a cancelled sdma job from entity vm->delayed, then the gfx job should be skipped. Fixes: 26c95e838e63 ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare") Signed-off-by: YuanShang <YuanShang.Mao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ed76936c6b10b547c6df4ca75412331e9ef6d339) Cc: stable@vger.kernel.org
2025-08-04drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming.Timur Kristóf
Apparently, both DCE 6.0 and 6.4 have 3 PLLs, but PLL0 can only be used for DP. Make sure to initialize the correct amount of PLLs in DC for these DCE versions and use PLL0 only for DP. Also, on DCE 6.0 and 6.4, the PLL0 needs to be powered on at initialization as opposed to DCE 6.1 and 7.x which use a different clock source for DFS. The following functions were used as reference from the old radeon driver implementation of DCE 6.x: - radeon_atom_pick_pll - atombios_crtc_set_disp_eng_pll Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 35222b5934ec8d762473592ece98659baf6bc48e) Cc: stable@vger.kernel.org
2025-08-04drm/amd/display: Don't overwrite dce60_clk_mgrTimur Kristóf
dc_clk_mgr_create accidentally overwrites the dce60_clk_mgr with the dce_clk_mgr, causing incorrect behaviour on DCE6. Fix it by removing the extra dce_clk_mgr_construct. Fixes: 62eab49faae7 ("drm/amd/display: hide VGH asic specific structs") Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bbddcbe36a686af03e91341b9bbfcca94bd45fb6) Cc: stable@vger.kernel.org
2025-08-04drm/amdkfd: Fix checkpoint-restore on multi-xccDavid Yat Sin
GPUs with multi-xcc have multiple MQDs per queue. This patch saves and restores all the MQDs within the partition. Signed-off-by: David Yat Sin <David.YatSin@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a578f2a58c3ab38f0643b1b6e7534af860233cb1) Cc: stable@vger.kernel.org
2025-08-04drm/amd: Restore cached manual clock settings during resumeMario Limonciello
If the SCLK limits have been set before S3 they will not be restored. The limits are however cached in the driver and so they can be restored by running a commit sequence during resume. Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250725031222.3015095-3-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4e9526924d09057a9ba854305e17eded900ced82) Cc: stable@vger.kernel.org
2025-08-04drm/amd: Restore cached power limit during resumeMario Limonciello
The power limit will be cached in smu->current_power_limit but if the ASIC goes into S3 this value won't be restored. Restore the value during SMU resume. Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250725031222.3015095-2-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 26a609e053a6fc494403e95403bc6a2470383bec) Cc: stable@vger.kernel.org
2025-08-04drm/amdgpu: Update external revid for GC v9.5.0Lijo Lazar
Use different external revid for GC v9.5.0 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 21c6764ed4bfaecad034bc4fd15dd64c5a436325) Cc: stable@vger.kernel.org
2025-08-04drm/amdgpu: Update supported modes for GC v9.5.0Lijo Lazar
For GC v9.5.0 SOCs, both CPX and QPX compute modes are also supported in NPS2 mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9d1ac25c7f830e0132aa816393b1e9f140e71148) Cc: stable@vger.kernel.org
2025-08-04drm/amd/pm: Make static table support conditionalLijo Lazar
Add PMFW version check for static table support on SMU v13.0.6 VFs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04drm/amdgpu: Fix vcn v4.0.3 poison irq call trace on sriov guestXiang Liu
Sriov guest side doesn't init ras feature hence the poison irq shouldn't be put during hw fini. [25209.468816] Call Trace: [25209.468817] <TASK> [25209.468818] ? srso_alias_return_thunk+0x5/0x7f [25209.468820] ? show_trace_log_lvl+0x28e/0x2ea [25209.468822] ? show_trace_log_lvl+0x28e/0x2ea [25209.468825] ? vcn_v4_0_3_hw_fini+0xaf/0xe0 [amdgpu] [25209.468936] ? show_regs.part.0+0x23/0x29 [25209.468939] ? show_regs.cold+0x8/0xd [25209.468940] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu] [25209.469038] ? __warn+0x8c/0x100 [25209.469040] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu] [25209.469135] ? report_bug+0xa4/0xd0 [25209.469138] ? handle_bug+0x39/0x90 [25209.469140] ? exc_invalid_op+0x19/0x70 [25209.469142] ? asm_exc_invalid_op+0x1b/0x20 [25209.469146] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu] [25209.469241] vcn_v4_0_3_hw_fini+0xaf/0xe0 [amdgpu] [25209.469343] amdgpu_ip_block_hw_fini+0x34/0x61 [amdgpu] [25209.469511] amdgpu_device_fini_hw+0x3b3/0x467 [amdgpu] Fixes: 4c4a89149608 ("drm/amdgpu: Register aqua vanjaram vcn poison irq") Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04drm/amdgpu: Fix jpeg v4.0.3 poison irq call trace on sriov guestXiang Liu
Sriov guest side doesn't init ras feature hence the poison irq shouldn't be put during hw fini. [25209.467154] Call Trace: [25209.467156] <TASK> [25209.467158] ? srso_alias_return_thunk+0x5/0x7f [25209.467162] ? show_trace_log_lvl+0x28e/0x2ea [25209.467166] ? show_trace_log_lvl+0x28e/0x2ea [25209.467171] ? jpeg_v4_0_3_hw_fini+0x6f/0x90 [amdgpu] [25209.467300] ? show_regs.part.0+0x23/0x29 [25209.467303] ? show_regs.cold+0x8/0xd [25209.467304] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu] [25209.467403] ? __warn+0x8c/0x100 [25209.467407] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu] [25209.467503] ? report_bug+0xa4/0xd0 [25209.467508] ? handle_bug+0x39/0x90 [25209.467511] ? exc_invalid_op+0x19/0x70 [25209.467513] ? asm_exc_invalid_op+0x1b/0x20 [25209.467518] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu] [25209.467613] ? amdgpu_irq_put+0x5f/0xc0 [amdgpu] [25209.467709] jpeg_v4_0_3_hw_fini+0x6f/0x90 [amdgpu] [25209.467805] amdgpu_ip_block_hw_fini+0x34/0x61 [amdgpu] [25209.467971] amdgpu_device_fini_hw+0x3b3/0x467 [amdgpu] Fixes: 1b2231de4163 ("drm/amdgpu: Register aqua vanjaram jpeg poison irq") Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04drm/amdgpu: Add wrapper function for dpc stateLijo Lazar
Use wrapper functions to set/indicate dpc status. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Ce Sun <cesun102@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04drm/amd/pm: Allow static metrics table query in VFLijo Lazar
Allow statics metrics table to be queried on SMUv13.0.6 SOCs in VF mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04drm/amdgpu: Update SDMA firmware version check for user queue supportJesse.Zhang
This commit fixes a firmware version check for enabling user queue support in SDMA v7.0. The previous version check (7836028) was incorrect and could lead to issues with PROTECTED_FENCE_SIGNAL commands causing register conflicts between MCU_DBG0 and MCU_DBG1. Fixes: 8c011408ed84 ("drm/amdgpu/sdma7: add ucode version checks for userq support") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04drm/amd/pm: Use cached metrics data on arcturusLijo Lazar
Cached metrics data validity is 1ms on arcturus. It's not reasonable for any client to query gpu_metrics at a faster rate and constantly interrupt PMFW. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04drm/amd/pm: Use cached metrics data on aldebaranLijo Lazar
Cached metrics data validity is 1ms on aldebaran. It's not reasonable for any client to query gpu_metrics at a faster rate and constantly interrupt PMFW. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04drm/amdgpu: Add NULL check for asic_funcsLijo Lazar
If driver load fails too early, asic_funcs pointer remains unassigned. Add NULL check to sanitize unwind path. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04drm/amd/display: Promote DC to 3.2.344Taimur Hassan
Summary: * Add interface to log hw state when underflow happens * Fix hubp programming of 3dlut fast load * Avoid Read Remote DPCD Many Times * More liberal vmin/vmax update for freesync * Fix dmub access race condition Acked-by: Sun peng (Leo) Li <sunpeng.li@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04drm/amd/display: Adding interface to log hw state when underflow happensMuhammad Ahmed
[why] Will help us better debug underflow issues. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Muhammad Ahmed <Muhammad.Ahmed@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-04drm/amd/display: Toggle for Disable Force Pstate Allow on DisableRyan Seto
[Why & How] In theory, driver should be able to support disabling force pstate allow after hardware release however this behavior is not tested yet. Introducing a new toggle to disable the force on the fly. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Ryan Seto <ryanseto@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>