kernel/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c, branch linux-rolling-stable

drm/amdgpu: Show warning message if IH ring overflow

2024-12-18T17:39:07Z

If IH primary ring and KFD ih fifo overflows, we may miss CP, SDMA interrupts and cause application soft hang. Show warning message with ring name if overflow happens. Add function to get ih ring name to avoid duplicating it. To keep warning message consistent between GPU generations, change all *_ih.c except ASICs older than Vega which has only one ih ring. Signed-off-by: Philip Yang Reviewed-by: Christian König Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: Queue interrupt work to different CPU

2024-12-18T17:39:07Z

For CPX mode, each KFD node has interrupt worker to process ih_fifo to send events to user space. Currently all interrupt workers of same adev queue to same CPU, all workers execution are actually serialized and this cause KFD ih_fifo overflow when CPU usage is high. Use per-GPU unbounded highpri queue with number of workers equals to number of partitions, let queue_work select the next CPU round robin among the local CPUs of same NUMA. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdgpu: Optimize gfx v9 GPU page fault handling

2024-12-18T17:39:07Z

After GPU page fault, there are lots of page fault interrupts generated at short period even with CAM filter enabled because the fault address is different. Each page fault copy to KFD ih fifo to send event to user space by KFD interrupt worker, this could cause KFD ih fifo overflow while other processes generate events at same time. KFD process is aborted after GPU page fault, we only need one GPU page fault interrupt sent to KFD ih fifo to send memory exception event to user space. Incease KFD ih fifo size to 2 times of IH primary ring size, to handle the burst events case. This patch handle the gfx v9 path, cover retry on/off and CAM filter on/off cases. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: KFD interrupt access ih_fifo data in-place

2024-12-18T17:39:07Z

To handle 40000 to 80000 interrupts per second running CPX mode with 4 streams/queues per KFD node, KFD interrupt handler becomes the performance bottleneck. Remove the kfifo_out memcpy overhead by accessing ih_fifo data in-place and updating rptr with kfifo_skip_count. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: Cleanup workqueue during module unload

2024-03-22T19:54:48Z

Destroy the high priority workqueue that handles interrupts during KFD node cleanup. Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: Introduce kfd_node struct (v5)

2023-06-09T13:42:27Z

Introduce a new structure, kfd_node, which will now represent a compute node. kfd_node is carved out of kfd_dev structure. kfd_dev struct now will become the parent of kfd_node, and will store common resources such as doorbells, GTT sub-alloctor etc. kfd_node struct will store all resources specific to a compute node, such as device queue manager, interrupt handling etc. This is the first step in adding compute partition support in KFD. v2: introduce kfd_node struct to gc v11 (Hawking) v3: make reference to kfd_dev struct through kfd_node (Morris) v4: use kfd_node instead for kfd isr/mqd functions (Morris) v5: rebase (Alex) Signed-off-by: Mukul Joshi Tested-by: Amber Lin Reviewed-by: Felix Kuehling Signed-off-by: Hawking Zhang Signed-off-by: Morris Zhang Signed-off-by: Alex Deucher

drm/amdkfd: use time_is_before_jiffies(a + b) to replace "jiffies - a > b"

2022-07-29T19:17:31Z

time_is_before_jiffies deals with timer wrapping correctly. Signed-off-by: Yu Zhe Signed-off-by: Felix Kuehling Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher

drm/amdkfd: Improve concurrency of event handling

2022-04-07T20:34:24Z

Use rcu_read_lock to read p->event_idr concurrently with other readers and writers. Use p->event_mutex only for creating and destroying events and in kfd_wait_on_events. Protect the contents of the kfd_event structure with a per-event spinlock that can be taken inside the rcu_read_lock critical section. This eliminates contention of p->event_mutex in set_event, which tends to be on the critical path for dispatch latency even when busy waiting is used. It also eliminates lock contention in event interrupt handlers. Since the p->event_mutex is now used much less, the impact of requiring it in kfd_wait_on_events should also be much smaller. This should improve event handling latency for processes using multiple GPUs concurrently. v2: Reschedule the worker periodically to avoid soft lockup warnings Signed-off-by: Felix Kuehling Reviewed-by: Sean Keely # v1 Tested-by: Sanjay Tripathi Signed-off-by: Alex Deucher

drm/amdkfd: Use real device for messages

2022-02-23T19:02:50Z

kfd_chardev() doesn't provide much useful information in dev_... messages on multi-GPU systems because there is only one KFD device, which doesn't correspond to any particular GPU. Use the actual GPU device to indicate the GPU that caused a message. Signed-off-by: Felix Kuehling Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdkfd: Drop IH ring overflow message to dbg

2022-02-22T19:40:26Z

When this was first implemented, overflows weren't expected in regular operations, and tests weren't in place to cause said overflow. Now there are cases where overflows occur with real workloads, but we know that the kernel can handle this robustly, so move the message to a debug message. Signed-off-by: Kent Russell Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher