| Age | Commit message (Collapse) | Author |
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.20-2026-02-13:
amdgpu:
- SMU 13.x fixes
- DC resume lag fix
- MPO fixes
- DCN 3.6 fix
- VSDB fixes
- HWSS clean up
- Replay fixes
- DCE cursor fixes
- DCN 3.5 SR DDR5 latency fixes
- HPD fixes
- Error path unwind fixes
- SMU13/14 mode1 reset fixes
- PSP 15 updates
- SMU 15 updates
- RAS fixes
- Sync fix in amdgpu_dma_buf_move_notify()
- HAINAN fix
- PSP 13.x fix
- GPUVM locking fix
amdkfd:
- APU GTT as VRAM fix
radeon:
- HAINAN fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260213220825.1454189-1-alexander.deucher@amd.com
|
|
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
- Regresion fix for HDR 4k displays (#15503)
- Fixup for Dell XPS 13 7390 eDP rate limit
- Memory leak fix on ACPI _DSM handling
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patch.msgid.link/aY8CtbhijtetQ6P3@jlahtine-mobl
|
|
Skip creating CCS sysfs files in VF mode to ensure VFs do not
try to change CCS mode, as it is predefined and immutable in
the SR-IOV mode.
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Nareshkumar Gollakoti <naresh.kumar.g@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260202170810.1393147-5-naresh.kumar.g@intel.com
(cherry picked from commit 4e8f602ac3574cf1ebc7acfb6624d06e04b30c91)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Remove the unnecessary VRAM channel entry introduced in xe_hwmon_channel.
Without this, adding any new hwmon channel causes extra VRAM channel
to appear. This remained unnoticed earlier because VRAM was the
final xe hwmon channel.
v2: Use MAX_VRAM_CHANNELS with in_range() instead of
CHANNEL_VRAM_N_MAX. (Raag)
Fixes: 49a498338417 ("drm/xe/hwmon: Expose individual VRAM channel temperature")
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patch.msgid.link/20260206081655.2115439-1-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 48eb073c7d95883eca2789447f94e1e8cafbabe5)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Passing a structure by value into a function is sometimes problematic,
for a number of reasons. Of of these is a warning from the 32-bit arm
compiler:
drivers/gpu/drm/drm_gpusvm.c: In function '__drm_gpusvm_unmap_pages':
drivers/gpu/drm/drm_gpusvm.c:1152:33: note: parameter passing for argument of type 'struct drm_pagemap_addr' changed in GCC 9.1
1152 | dpagemap->ops->device_unmap(dpagemap,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1153 | dev, *addr);
| ~~~~~~~~~~~
This particular problem is harmless since we are not mixing compiler versions
inside of the compiler. However, passing this by reference avoids the warning
along with providing slightly better calling conventions as it avoids an
extra copy on the stack.
Fixes: 75af93b3f5d0 ("drm/pagemap, drm/xe: Support destination migration over interconnect")
Fixes: 2df55d9e66a2 ("drm/xe: Support pcie p2p dma as a fast interconnect")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20260216134644.1025365-1-arnd@kernel.org
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
(cherry picked from commit 95162db0208aee122d10ac1342fe97a1721cd258)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
As per uapi documentation[1], the prerequisite for wedged device is to
redirected page faults to a dummy page. Follow it.
[1] Documentation/gpu/drm-uapi.rst
v2: Add uapi reference and fixes tag (Matthew Brost)
Fixes: 7bc00751f877 ("drm/xe: Use device wedged event")
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260212055622.2054991-1-raag.jadav@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit c020fff70d757612933711dd3cc3751d7d782d3c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
vram_bar_size is registered as an int module parameter and is documented
to accept negative values to disable BAR resizing.
Store it as an int in xe_modparam as well, so negative values work as
intended and the module_param type matches.
Fixes: 80742a1aa26e ("drm/xe: Allow to drop vram resizing")
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://patch.msgid.link/20260202181853.1095736-2-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 25c9aa4dcb5ef2ad9f354d19f8f1eeb690d1c161)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
When the media GT is not allowed, a VF must not attempt to read
the media version from the GuC. The GuC may not be loaded, and
any attempt to communicate with it would result in a timeout
and a VF probe failure:
(...)
[ 1912.406046] xe 0000:01:00.1: [drm] *ERROR* Tile0: GT1: GuC mmio request 0x5507: no reply 0x5507
[ 1912.407277] xe 0000:01:00.1: [drm] *ERROR* Tile0: GT1: [GUC COMMUNICATION] MMIO send failed (-ETIMEDOUT)
[ 1912.408689] xe 0000:01:00.1: [drm] *ERROR* VF: Tile0: GT1: Failed to reset GuC state (-ETIMEDOUT)
[ 1912.413986] xe 0000:01:00.1: probe with driver xe failed with error -110
Let's skip reading the media version for VFs when the media GT is not
allowed.
v2: move the condition directly to the VF path
Fixes: 7abd69278bb5 ("drm/xe/configfs: Add attribute to disable GT types")
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://patch.msgid.link/20260202115041.2863357-1-piotr.piorkowski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
(cherry picked from commit 0bcacf56dc0b265f9c47056c6a4f0c1394a8a3f0)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The PSS_CHICKEN register has been part of the RCS engine's LRC since it
was first introduced in Xe_LP. That means that any workarounds that
adjust its value (such as Wa_14019988906 and Wa_14019877138) need to be
implemented in the lrc_was[] table so that they become part of the
default LRC from which all subsequent LRCs are copied. Although these
workarounds were implemented correctly on most platforms, they were
incorrectly placed on the engine_was[] table for Xe2_HPG.
Move the workarounds to the proper lrc_was[] table and switch the
'xe_rtp_match_first_render_or_compute' rule to specifically match the
RCS since that's the engine whose LRC manages the register.
Bspec: 65182
Fixes: 7f3ee7d88058 ("drm/xe/xe2hpg: Add initial GT workarounds")
Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Link: https://patch.msgid.link/20260205220508.51905-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit e04c609eedf4d6748ac0bcada4de1275b034fed6)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
xe_mmio_read64_2x32() was adjusting register addresses and then
calling xe_mmio_read32(), which applies the adjustment again.
This may shift accesses twice if adj_offset < adj_limit. There is
no issue currently, as for media gt, adj_offset > adj_limit, so
the 2nd adjust will be a no-op. But it may not work in future.
To fix it, replace the adjusted-address comparison with a direct
sanity check that ensures the MMIO address adjustment cutoff never
falls within the 8-byte range of a 64-bit register. And let
xe_mmio_read32() handle address translation.
v2: rewrite the sanity check in a more natural way. (Matt)
v3: Add Fixes tag. (Jani)
Fixes: 07431945d8ae ("drm/xe: Avoid 64-bit register reads")
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://patch.msgid.link/20260130165621.471408-2-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit a30f999681126b128a43137793ac84b6a5b7443f)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
When user provides a bogus pat_index value through the madvise IOCTL, the
xe_pat_index_get_coh_mode() function performs an array access without
validating bounds. This allows a malicious user to trigger an out-of-bounds
kernel read from the xe->pat.table array.
The vulnerability exists because the validation in madvise_args_are_sane()
directly calls xe_pat_index_get_coh_mode(xe, args->pat_index.val) without
first checking if pat_index is within [0, xe->pat.n_entries).
Although xe_pat_index_get_coh_mode() has a WARN_ON to catch this in debug
builds, it still performs the unsafe array access in production kernels.
v2(Matthew Auld)
- Using array_index_nospec() to mitigate spectre attacks when the value
is used
v3(Matthew Auld)
- Put the declarations at the start of the block
Fixes: ada7486c5668 ("drm/xe: Implement madvise ioctl for xe")
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.18+
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Jia Yao <jia.yao@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260205161529.1819276-1-jia.yao@intel.com
(cherry picked from commit 944a3329b05510d55c69c2ef455136e2fc02de29)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
On some configs and old compilers we can get following build errors:
../drivers/gpu/drm/xe/xe_configfs.h: In function 'xe_configfs_get_ctx_restore_mid_bb':
../drivers/gpu/drm/xe/xe_configfs.h:40:76: error: parameter name omitted
static inline u32 xe_configfs_get_ctx_restore_mid_bb(struct pci_dev *pdev, enum xe_engine_class,
^~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_configfs.h: In function 'xe_configfs_get_ctx_restore_post_bb':
../drivers/gpu/drm/xe/xe_configfs.h:42:77: error: parameter name omitted
static inline u32 xe_configfs_get_ctx_restore_post_bb(struct pci_dev *pdev, enum xe_engine_class,
^~~~~~~~~~~~~~~~~~~~
when trying to define our configfs stub functions. Fix that.
Fixes: 7a4756b2fd04 ("drm/xe/lrc: Allow to add user commands mid context switch")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://patch.msgid.link/20260203193745.576-1-michal.wajdeczko@intel.com
(cherry picked from commit f59cde8a2452b392115d2af8f1143a94725f4827)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
In case of devm_add_action_or_reset() failure the provided cleanup
action will be run immediately on the not yet initialized kobject.
This may lead to errors like:
[ ] kobject: '(null)' (ff110001393608e0): is not initialized, yet kobject_put() is being called.
[ ] WARNING: lib/kobject.c:734 at kobject_put+0xd9/0x250, CPU#0: kworker/0:0/9
[ ] RIP: 0010:kobject_put+0xdf/0x250
[ ] Call Trace:
[ ] xe_sriov_pf_sysfs_init+0x21/0x100 [xe]
[ ] xe_sriov_pf_init_late+0x87/0x2b0 [xe]
[ ] xe_sriov_init_late+0x5f/0x2c0 [xe]
[ ] xe_device_probe+0x5f2/0xc20 [xe]
[ ] xe_pci_probe+0x396/0x610 [xe]
[ ] local_pci_probe+0x47/0xb0
[ ] refcount_t: underflow; use-after-free.
[ ] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x68/0xb0, CPU#0: kworker/0:0/9
[ ] RIP: 0010:refcount_warn_saturate+0x68/0xb0
[ ] Call Trace:
[ ] kobject_put+0x174/0x250
[ ] xe_sriov_pf_sysfs_init+0x21/0x100 [xe]
[ ] xe_sriov_pf_init_late+0x87/0x2b0 [xe]
[ ] xe_sriov_init_late+0x5f/0x2c0 [xe]
[ ] xe_device_probe+0x5f2/0xc20 [xe]
[ ] xe_pci_probe+0x396/0x610 [xe]
[ ] local_pci_probe+0x47/0xb0
Fix that by calling kobject_init() and kobject_add() separately
and register cleanup action after the kobject is initialized.
Also make this cleanup registration a part of the create helper to
fix another mistake, as in the loop we were wrongly passing parent
kobject while registering cleanup action, and this resulted in some
undetected leaks.
Fixes: 5c170a4d9c53 ("drm/xe/pf: Prepare sysfs for SR-IOV admin attributes")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://patch.msgid.link/20260203235332.1350-1-michal.wajdeczko@intel.com
(cherry picked from commit 98b16727f07e26a5d4de84d88805ce7ffcfdd324)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Add the missing check for a valid slice count during
mode validation when DSC is enabled.
Cc: Vinod Govindapillai <vinod.govindapillai@intel.com>
Fixes: 745395b51c26 ("drm/i915/dp: Add intel_dp_mode_valid_with_dsc()")
Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patch.msgid.link/20260216070421.714884-2-imre.deak@intel.com
(cherry picked from commit ec4db429fd38e5c5cbea3521049739fd2718845c)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
The function devm_drm_dev_alloc() returns a pointer error upon failure
not NULL. Change null check to pointer error check.
Detected by Smatch:
drivers/gpu/drm/tiny/sharp-memory.c:549 sharp_memory_probe() error:
'smd' dereferencing possible ERR_PTR()
Fixes: b8f9f21716fec ("drm/tiny: Add driver for Sharp Memory LCD")
Signed-off-by: Ethan Tidmore <ethantidmore06@gmail.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20260216040438.43702-1-ethantidmore06@gmail.com
|
|
These WARN_ONs seem to trigger a lot, and we don't seem to have a
plan to fix them, so just drop them, as they are most likely
harmless.
Cc: stable@vger.kernel.org
Fixes: 176fdcbddfd2 ("drm/nouveau/gsp/r535: add support for booting GSP-RM")
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patch.msgid.link/20241121014601.229391-1-airlied@gmail.com
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
This partly reverts commit 8f582bcd132c ("drm/hyperv: Remove reference
to hyperv_fb driver") which was messed up by me while trying to fix a
merge conflict.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Remove hyperv_fb reference as the driver is removed.
Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Fallback to polling to detect hotplug events on systems without
interrupts.
On systems where the interrupt line of the bridge is not connected,
the bridge cannot notify hotplug events. Only add the
DRM_BRIDGE_OP_HPD flag if an interrupt has been registered
otherwise remain in polling mode.
Fixes: 55e8ff842051 ("drm/bridge: ti-sn65dsi86: Add HPD for DisplayPort connector type")
Cc: stable@vger.kernel.org # 6.16: 9133bc3f0564: drm/bridge: ti-sn65dsi86: Add
Signed-off-by: Franz Schnyder <franz.schnyder@toradex.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
[dianders: Adjusted Fixes/stable line based on discussion]
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://patch.msgid.link/20260206123758.374555-1-fra.schnyder@gmail.com
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.20-2026-02-06:
amdgpu:
- DML 2.1 fixes
- Panel replay fixes
- Display writeback fixes
- MES 11 old firmware compat fix
- DC CRC improvements
- DPIA fixes
- XGMI fixes
- ASPM fix
- SMU feature bit handling fixes
- DC LUT fixes
- RAS fixes
- Misc memory leak in error path fixes
- SDMA queue reset fixes
- PG handling fixes
- 5 level GPUVM page table fix
- SR-IOV fix
- Queue reset fix
amdkfd:
- Fix possible double deletion of validate list
- Event setup fix
- Device disconnect regression fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260206192706.59396-1-alexander.deucher@amd.com
|
|
In order to be able to use only vma_flags_t in vm_area_desc we must adjust
shmem file setup functions to operate in terms of vma_flags_t rather than
vm_flags_t.
This patch makes this change and updates all callers to use the new
functions.
No functional changes intended.
[akpm@linux-foundation.org: comment fixes, per Baolin]
Link: https://lkml.kernel.org/r/736febd280eb484d79cef5cf55b8a6f79ad832d2.1769097829.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Yury Norov <ynorov@nvidia.com>
Cc: Chris Mason <clm@fb.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The VM was not locked in the past since we initially only cleared the
linked list element and not added it to any VM state.
But this has changed quite some time ago, we just never realized this
problem because the VM state lock was masking it.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
eng_id can be negative and that stream_enc_regs[]
can be indexed out of bounds.
eng_id is used directly as an index into stream_enc_regs[], which has
only 5 entries. When eng_id is 5 (ENGINE_ID_DIGF) or negative, this can
access memory past the end of the array.
Add a bounds check using ARRAY_SIZE() before using eng_id as an index.
The unsigned cast also rejects negative values.
This avoids out-of-bounds access.
Fixes the below smatch error:
dcn*_resource.c: stream_encoder_create() may index
stream_enc_regs[eng_id] out of bounds (size 5).
drivers/gpu/drm/amd/amdgpu/../display/dc/resource/dcn351/dcn351_resource.c
1246 static struct stream_encoder *dcn35_stream_encoder_create(
1247 enum engine_id eng_id,
1248 struct dc_context *ctx)
1249 {
...
1255
1256 /* Mapping of VPG, AFMT, DME register blocks to DIO block instance */
1257 if (eng_id <= ENGINE_ID_DIGF) {
ENGINE_ID_DIGF is 5. should <= be <?
Unrelated but, ugh, why is Smatch saying that "eng_id" can be negative?
end_id is type signed long, but there are checks in the caller which prevent it from being negative.
1258 vpg_inst = eng_id;
1259 afmt_inst = eng_id;
1260 } else
1261 return NULL;
1262
...
1281
1282 dcn35_dio_stream_encoder_construct(enc1, ctx, ctx->dc_bios,
1283 eng_id, vpg, afmt,
--> 1284 &stream_enc_regs[eng_id],
^^^^^^^^^^^^^^^^^^^^^^^ This stream_enc_regs[] array has 5 elements so we are one element beyond the end of the array.
...
1287 return &enc1->base;
1288 }
v2: use explicit bounds check as suggested by Roman/Dan; avoid unsigned int cast
v3: The compiler already knows how to compare the two values, so the
cast (int) is not needed. (Roman)
Fixes: 2728e9c7c842 ("drm/amd/display: add DC changes for DCN351")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Mario Limonciello <superm1@kernel.org>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com>
Cc: Roman Li <roman.li@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add a check in mqd_on_vram. If the device prefers GTT, it returns false
Fixes: d4a814f400d4 ("drm/amdkfd: Move gfx9.4.3 and gfx 9.5 MQD to HBM")
Signed-off-by: Siwei He <siwei.he@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add support for psp v13_0_15 ip block
Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Set the family for GC 11.5.4
Fixes: 47ae1f938d12 ("drm/amdgpu: add support for GC IP version 11.5.4")
Cc: Tim Huang <tim.huang@amd.com>
Cc: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Cc: Roman Li <Roman.Li@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch limits the clock speeds of the AMD Radeon R5 M420 GPU from
850/1000MHz (core/memory) to 800/950 MHz, making it work stably. This
patch is for amdgpu.
Signed-off-by: decce6 <decce6@proton.me>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch limits the clock speeds of the AMD Radeon R5 M420 GPU from
850/1000MHz (core/memory) to 800/950 MHz, making it work stably. This
patch is for radeon.
Signed-off-by: decce6 <decce6@proton.me>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Move SMU v15_0_0 specific functions to IP file
- smu_v15_0_0_set_default_dpm_tables and
- smu_v15_0_0_update_table
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Invalidating a dmabuf will impact other users of the shared BO.
In the scenario where process A moves the BO, it needs to inform
process B about the move and process B will need to update its
page table.
The commit fixes a synchronisation bug caused by the use of the
ticket: it made amdgpu_vm_handle_moved behave as if updating
the page table immediately was correct but in this case it's not.
An example is the following scenario, with 2 GPUs and glxgears
running on GPU0 and Xorg running on GPU1, on a system where P2P
PCI isn't supported:
glxgears:
export linear buffer from GPU0 and import using GPU1
submit frame rendering to GPU0
submit tiled->linear blit
Xorg:
copy of linear buffer
The sequence of jobs would be:
drm_sched_job_run # GPU0, frame rendering
drm_sched_job_queue # GPU0, blit
drm_sched_job_done # GPU0, frame rendering
drm_sched_job_run # GPU0, blit
move linear buffer for GPU1 access #
amdgpu_dma_buf_move_notify -> update pt # GPU0
It this point the blit job on GPU0 is still running and would
likely produce a page fault.
Cc: stable@vger.kernel.org
Fixes: a448cb003edc ("drm/amdgpu: implement amdgpu_gem_prime_move_notify v2")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
These definitions are used by user APIs.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
end the function flow when ras table checksum is error
Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Tune the sleep interval in the PSP fence wait loop from 10-100us to
60-100us.This adjustment results in an overall wait window of 1.2s
(60us * 20000 iterations) to 2 seconds (100us * 20000 iterations),
which guarantees that we can retrieve the correct fence value
Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
SMU 15_0_0 exports only soft limits for CLKs
Use correct messages
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Enable GFXOff for GC 11.5.4
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Set the default reset method to mode2 for SMU 15.0.0.
Signed-off-by: Kanala Ramalingeswara Reddy <Kanala.RamalingeswaraReddy@amd.com>
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Not supported on SMU 15_0_0
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
drop set_driver_table_location
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use multi args for get_enabled_mask to fix is_dpm_running
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use smu_v15_0_0_update_table instead of common api
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use multi param based get op for metrics_table
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add update_table for SMU 15_0_0
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Some SMU messages have changed to multi reg read/write
Initialize during smu_early_init
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
TOC and TA both are required
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Send unload command to smu during modprobe -r amdgpu for smu 13/14.
1. This can fix the high voltage/temperatue issue after driver is unloaded.
2. Reloading driver could fail but with the debug port based mode1 reset
during driver is reloaded, it is good and safe.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
use debug port for mode1 reset request so fw can handle mode1 reset
even when it is stuck.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Currently DCE doesn't support the overlay cursor, so the
dm_crtc_get_cursor_mode() function returns DM_CURSOR_NATIVE_MODE
unconditionally. The outcome is that it doesn't check for the
conditions that would necessitate the overlay cursor, meaning
that it doesn't reject cases where the native cursor mode isn't
supported on DCE.
Remove the early return from dm_crtc_get_cursor_mode() for
DCE and instead let it perform the necessary checks and
return DM_CURSOR_OVERLAY_MODE. Add a later check that rejects
when DM_CURSOR_OVERLAY_MODE would be used with DCE.
Fixes: 1b04dcca4fb1 ("drm/amd/display: Introduce overlay cursor mode")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4600
Suggested-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The driver uses strncmp() to compare sysfs attribute strings,
which does not handle trailing newlines and lacks NULL safety.
sysfs_streq() is the recommended function for sysfs string equality
checks in the kernel, providing safer and more correct behavior.
replace strncmp() with sysfs_streq() in drivers/gpu/drm/amd/pm/amdgpu_pm.c
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The address watch clear code receives watch_id as an unsigned value
(u32), but some helper functions were using a signed int and checked
bits by shifting with watch_id.
If a very large watch_id is passed from userspace, it can be converted
to a negative value. This can cause invalid shifts and may access
memory outside the watch_points array.
drm/amdkfd: Fix watch_id bounds checking in debug address watch v2
Fix this by checking that watch_id is within MAX_WATCH_ADDRESSES before
using it. Also use BIT(watch_id) to test and clear bits safely.
This keeps the behavior unchanged for valid watch IDs and avoids
undefined behavior for invalid ones.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c:448
kfd_dbg_trap_clear_dev_address_watch() error: buffer overflow
'pdd->watch_points' 4 <= u32max user_rl='0-3,2147483648-u32max' uncapped
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c
433 int kfd_dbg_trap_clear_dev_address_watch(struct kfd_process_device *pdd,
434 uint32_t watch_id)
435 {
436 int r;
437
438 if (!kfd_dbg_owns_dev_watch_id(pdd, watch_id))
kfd_dbg_owns_dev_watch_id() doesn't check for negative values so if
watch_id is larger than INT_MAX it leads to a buffer overflow.
(Negative shifts are undefined).
439 return -EINVAL;
440
441 if (!pdd->dev->kfd->shared_resources.enable_mes) {
442 r = debug_lock_and_unmap(pdd->dev->dqm);
443 if (r)
444 return r;
445 }
446
447 amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
--> 448 pdd->watch_points[watch_id] = pdd->dev->kfd2kgd->clear_address_watch(
449 pdd->dev->adev,
450 watch_id);
v2: (as per, Jonathan Kim)
- Add early watch_id >= MAX_WATCH_ADDRESSES validation in the set path to
match the clear path.
- Drop the redundant bounds check in kfd_dbg_owns_dev_watch_id().
Fixes: e0f85f4690d0 ("drm/amdkfd: add debug set and clear address watch points operation")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Jonathan Kim <jonathan.kim@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
amdgpu_ib_schedule() returns early after calling amdgpu_ring_undo().
This skips the common free_fence cleanup path. Other error paths were
already changed to use goto free_fence, but this one was missed.
Change the early return to goto free_fence so all error paths clean up
the same way.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c:232 amdgpu_ib_schedule()
warn: missing unwind goto?
drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
124 int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
125 struct amdgpu_ib *ibs, struct amdgpu_job *job,
126 struct dma_fence **f)
127 {
...
224
225 if (ring->funcs->insert_start)
226 ring->funcs->insert_start(ring);
227
228 if (job) {
229 r = amdgpu_vm_flush(ring, job, need_pipe_sync);
230 if (r) {
231 amdgpu_ring_undo(ring);
--> 232 return r;
The patch changed the other error paths to goto free_fence but
this one was accidentally skipped.
233 }
234 }
235
236 amdgpu_ring_ib_begin(ring);
...
338
339 free_fence:
340 if (!job)
341 kfree(af);
342 return r;
343 }
Fixes: f903b85ed0f1 ("drm/amdgpu: fix possible fence leaks from job structure")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|