kernel/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h, branch linux-rolling-stable

drm/amdgpu: make sure userqs are enabled in userq IOCTLs

2026-01-14T19:57:55Z

These IOCTLs shouldn't be called when userqs are not enabled. Make sure they are enabled before executing the IOCTLs. Reviewed-by: Christian König Signed-off-by: Alex Deucher (cherry picked from commit d967509651601cddce7ff2a9f09479f3636f684d) Cc: stable@vger.kernel.org

drm/amdgpu: Implement user queue reset functionality

2025-11-04T16:53:05Z

This patch adds robust reset handling for user queues (userq) to improve recovery from queue failures. The key components include: 1. Queue detection and reset logic: - amdgpu_userq_detect_and_reset_queues() identifies failed queues - Per-IP detect_and_reset callbacks for targeted recovery - Falls back to full GPU reset when needed 2. Reset infrastructure: - Adds userq_reset_work workqueue for async reset handling - Implements pre/post reset handlers for queue state management - Integrates with existing GPU reset framework 3. Error handling improvements: - Enhanced state tracking with HUNG state - Automatic reset triggering on critical failures - VRAM loss handling during recovery 4. Integration points: - Added to device init/reset paths - Called during queue destroy, suspend, and isolation events - Handles both individual queue and full GPU resets The reset functionality works with both gfx/compute and sdma queues, providing better resilience against queue failures while minimizing disruption to unaffected queues. v2: add detection and reset calls when preemption/unmaped fails. add a per device userq counter for each user queue type.(Alex) v3: make sure we hold the adev->userq_mutex when we call amdgpu_userq_detect_and_reset_queues. (Alex) warn if the adev->userq_mutex is not held. v4: make sure we have all of the uqm->userq_mutex held. warn if the uqm->userq_mutex is not held. v5: Use array for user queue type counters.(Alex) all of the uqm->userq_mutex need to be held when calling detect and reset. (Alex) v6: fix lock dep warning in amdgpu_userq_fence_dence_driver_process v7: add the queue types in an array and use a loop in amdgpu_userq_detect_and_reset_queues (Lijo) v8: remove atomic_set(&userq_mgr->userq_count[i], 0). it should already be 0 since we kzalloc the structure (Alex) v9: For consistency with kernel queues, We may want something like: amdgpu_userq_is_reset_type_supported (Alex) Signed-off-by: Jesse Zhang Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: Convert amdgpu userqueue management from IDR to XArray

2025-10-28T13:59:22Z

This commit refactors the AMDGPU userqueue management subsystem to replace IDR (ID Allocation) with XArray for improved performance, scalability, and maintainability. The changes address several issues with the previous IDR implementation and provide better locking semantics. Key changes: 1. **Global XArray Introduction**: - Added `userq_doorbell_xa` to `struct amdgpu_device` for global queue tracking - Uses doorbell_index as key for efficient global lookup - Replaces the previous `userq_mgr_list` linked list approach 2. **Per-process XArray Conversion**: - Replaced `userq_idr` with `userq_mgr_xa` in `struct amdgpu_userq_mgr` - Maintains per-process queue tracking with queue_id as key - Uses XA_FLAGS_ALLOC for automatic ID allocation 3. **Locking Improvements**: - Removed global `userq_mutex` from `struct amdgpu_device` - Replaced with fine-grained XArray locking using XArray's internal spinlocks 4. **Runtime Idle Check Optimization**: - Updated `amdgpu_runtime_idle_check_userq()` to use xa_empty 5. **Queue Management Functions**: - Converted all IDR operations to equivalent XArray functions: - `idr_alloc()` → `xa_alloc()` - `idr_find()` → `xa_load()` - `idr_remove()` → `xa_erase()` - `idr_for_each()` → `xa_for_each()` Benefits: - **Performance**: XArray provides better scalability for large numbers of queues - **Memory Efficiency**: Reduced memory overhead compared to IDR - **Thread Safety**: Improved locking semantics with XArray's internal spinlocks v2: rename userq_global_xa/userq_xa to userq_doorbell_xa/userq_mgr_xa Remove xa_lock and use its own lock. v3: Set queue->userq_mgr = uq_mgr in amdgpu_userq_create() v4: use xa_store_irq (Christian) hold the read side of the reset lock while creating/destroying queues and the manager data structure. (Chritian) Acked-by: Alex Deucher Suggested-by: Christian König Signed-off-by: Jesse Zhang Signed-off-by: Alex Deucher

drm/amdgpu: validate userq va for GEM unmap

2025-10-13T18:14:34Z

When a user unmaps a userq VA, the driver must ensure the queue has no in-flight jobs. If there is pending work, the kernel should wait for the attached eviction (bookkeeping) fence to signal before deleting the mapping. Suggested-by: Christian König Signed-off-by: Prike Liang Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: add userq object va track helpers

2025-10-13T18:14:33Z

Add the userq object virtual address list_add() helpers for tracking the userq obj va address usage. Signed-off-by: Prike Liang Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu/userq: extend userq state

2025-10-13T18:14:32Z

Extend the userq state for identifying the userq invalid cases. Signed-off-by: Prike Liang Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: validate userq buffer virtual address and size

2025-09-15T20:52:15Z

It needs to validate the userq object virtual address to determine whether it is residented in a valid vm mapping. Signed-off-by: Prike Liang Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu: clean up the amdgpu_userq_active()

2025-09-09T20:18:18Z

This is no invocation for amdgpu_userq_active(). Signed-off-by: Prike Liang Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher

drm/amdgpu/userq: add a detect and reset callback

2025-09-05T21:38:39Z

Add a detect and reset callback and add the implementation for mes. The callback will detect all hung queues of a particular ip type (e.g., GFX or compute or SDMA) and reset them. v2: increase reset counter and set fence force completion v3: Removed userq_mutex in mes_userq_detect_and_reset since the driver holds it when calling Reviewed-by: Alex Deucher Signed-off-by: Jesse Zhang Signed-off-by: Alex Deucher

drm/amdgpu: Add preempt and restore callbacks to userq funcs

2025-09-05T21:37:50Z

Add two new function pointers to struct amdgpu_userq_funcs: - preempt: To handle preemption of user mode queues - restore: To restore preempted user mode queues These callbacks will allow the driver to properly manage queue preemption and restoration when needed, such as during context switching or priority changes. Reviewed-by: Alex Deucher Signed-off-by: Jesse Zhang Signed-off-by: Alex Deucher