summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h
AgeCommit message (Collapse)Author
2026-01-14drm/amdgpu: make sure userqs are enabled in userq IOCTLsAlex Deucher
These IOCTLs shouldn't be called when userqs are not enabled. Make sure they are enabled before executing the IOCTLs. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d967509651601cddce7ff2a9f09479f3636f684d) Cc: stable@vger.kernel.org
2025-11-04drm/amdgpu: Implement user queue reset functionalityJesse.Zhang
This patch adds robust reset handling for user queues (userq) to improve recovery from queue failures. The key components include: 1. Queue detection and reset logic: - amdgpu_userq_detect_and_reset_queues() identifies failed queues - Per-IP detect_and_reset callbacks for targeted recovery - Falls back to full GPU reset when needed 2. Reset infrastructure: - Adds userq_reset_work workqueue for async reset handling - Implements pre/post reset handlers for queue state management - Integrates with existing GPU reset framework 3. Error handling improvements: - Enhanced state tracking with HUNG state - Automatic reset triggering on critical failures - VRAM loss handling during recovery 4. Integration points: - Added to device init/reset paths - Called during queue destroy, suspend, and isolation events - Handles both individual queue and full GPU resets The reset functionality works with both gfx/compute and sdma queues, providing better resilience against queue failures while minimizing disruption to unaffected queues. v2: add detection and reset calls when preemption/unmaped fails. add a per device userq counter for each user queue type.(Alex) v3: make sure we hold the adev->userq_mutex when we call amdgpu_userq_detect_and_reset_queues. (Alex) warn if the adev->userq_mutex is not held. v4: make sure we have all of the uqm->userq_mutex held. warn if the uqm->userq_mutex is not held. v5: Use array for user queue type counters.(Alex) all of the uqm->userq_mutex need to be held when calling detect and reset. (Alex) v6: fix lock dep warning in amdgpu_userq_fence_dence_driver_process v7: add the queue types in an array and use a loop in amdgpu_userq_detect_and_reset_queues (Lijo) v8: remove atomic_set(&userq_mgr->userq_count[i], 0). it should already be 0 since we kzalloc the structure (Alex) v9: For consistency with kernel queues, We may want something like: amdgpu_userq_is_reset_type_supported (Alex) Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28drm/amdgpu: Convert amdgpu userqueue management from IDR to XArrayJesse.Zhang
This commit refactors the AMDGPU userqueue management subsystem to replace IDR (ID Allocation) with XArray for improved performance, scalability, and maintainability. The changes address several issues with the previous IDR implementation and provide better locking semantics. Key changes: 1. **Global XArray Introduction**: - Added `userq_doorbell_xa` to `struct amdgpu_device` for global queue tracking - Uses doorbell_index as key for efficient global lookup - Replaces the previous `userq_mgr_list` linked list approach 2. **Per-process XArray Conversion**: - Replaced `userq_idr` with `userq_mgr_xa` in `struct amdgpu_userq_mgr` - Maintains per-process queue tracking with queue_id as key - Uses XA_FLAGS_ALLOC for automatic ID allocation 3. **Locking Improvements**: - Removed global `userq_mutex` from `struct amdgpu_device` - Replaced with fine-grained XArray locking using XArray's internal spinlocks 4. **Runtime Idle Check Optimization**: - Updated `amdgpu_runtime_idle_check_userq()` to use xa_empty 5. **Queue Management Functions**: - Converted all IDR operations to equivalent XArray functions: - `idr_alloc()` → `xa_alloc()` - `idr_find()` → `xa_load()` - `idr_remove()` → `xa_erase()` - `idr_for_each()` → `xa_for_each()` Benefits: - **Performance**: XArray provides better scalability for large numbers of queues - **Memory Efficiency**: Reduced memory overhead compared to IDR - **Thread Safety**: Improved locking semantics with XArray's internal spinlocks v2: rename userq_global_xa/userq_xa to userq_doorbell_xa/userq_mgr_xa Remove xa_lock and use its own lock. v3: Set queue->userq_mgr = uq_mgr in amdgpu_userq_create() v4: use xa_store_irq (Christian) hold the read side of the reset lock while creating/destroying queues and the manager data structure. (Chritian) Acked-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amdgpu: validate userq va for GEM unmapPrike Liang
When a user unmaps a userq VA, the driver must ensure the queue has no in-flight jobs. If there is pending work, the kernel should wait for the attached eviction (bookkeeping) fence to signal before deleting the mapping. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amdgpu: add userq object va track helpersPrike Liang
Add the userq object virtual address list_add() helpers for tracking the userq obj va address usage. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13drm/amdgpu/userq: extend userq statePrike Liang
Extend the userq state for identifying the userq invalid cases. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-15drm/amdgpu: validate userq buffer virtual address and sizePrike Liang
It needs to validate the userq object virtual address to determine whether it is residented in a valid vm mapping. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-09drm/amdgpu: clean up the amdgpu_userq_active()Prike Liang
This is no invocation for amdgpu_userq_active(). Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05drm/amdgpu/userq: add a detect and reset callbackJesse.Zhang
Add a detect and reset callback and add the implementation for mes. The callback will detect all hung queues of a particular ip type (e.g., GFX or compute or SDMA) and reset them. v2: increase reset counter and set fence force completion v3: Removed userq_mutex in mes_userq_detect_and_reset since the driver holds it when calling Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05drm/amdgpu: Add preempt and restore callbacks to userq funcsJesse.Zhang
Add two new function pointers to struct amdgpu_userq_funcs: - preempt: To handle preemption of user mode queues - restore: To restore preempted user mode queues These callbacks will allow the driver to properly manage queue preemption and restoration when needed, such as during context switching or priority changes. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-04drm/amdgpu: add support of debugfs for mqd informationSunil Khatri
Add debugfs support for mqd for each queue of the client. The address exposed to debugfs could be used to dump the mqd. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250704075548.1549849-5-sunil.khatri@amd.com Signed-off-by: Christian König <christian.koenig@amd.com>
2025-05-05drm/amdgpu: add drm_file reference in userq_mgrSunil Khatri
drm_file will be used in usermode queues code to enable better process information in logging and hence add drm_file part of the userq_mgr struct. update the drm_file pointer in userq_mgr for each amdgpu_driver_open_kms. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-22drm/amdgpu/userq: use consistent function namingAlex Deucher
s/userqueue/userq/ 1. remove the mix of amdgpu_userqueue and amdgpu_userq 2. to be consistent with other amdgpu_userq_fence.c 3. it's shorter Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>