| Age | Commit message (Collapse) | Author |
|
size to GPU page size
The control stack size is calculated based on the number of CUs and
waves, and is then aligned to PAGE_SIZE. When the resulting control
stack size is aligned to 64 KB, GPU hangs and queue preemption
failures are observed while running RCCL unit tests on systems with
more than two GPUs.
amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with
doorbell_id: 80030008
amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues
amdgpu 0048:0f:00.0: amdgpu: GPU reset begin!. Source: 4
amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with
doorbell_id: 80030008
amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues
amdgpu 0048:0f:00.0: amdgpu: Failed to restore process queues
This issue is observed on both 4 KB and 64 KB system page-size
configurations.
This patch fixes the issue by aligning the control stack size to
AMDGPU_GPU_PAGE_SIZE instead of PAGE_SIZE, so the control stack size
will not be 64 KB on systems with a 64 KB page size and queue
preemption works correctly.
Additionally, In the current code, wg_data_size is aligned to PAGE_SIZE,
which can waste memory if the system page size is large. In this patch,
wg_data_size is aligned to AMDGPU_GPU_PAGE_SIZE. The cwsr_size, calculated
from wg_data_size and the control stack size, is aligned to PAGE_SIZE.
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a3e14436304392fbada359edd0f1d1659850c9b7)
|
|
The AQL queue size can be 4K, but the minimum buffer object (BO)
allocation size is PAGE_SIZE. On systems with a page size larger
than 4K, the expected queue size does not match the allocated BO
size, causing queue creation to fail.
Align the expected queue size to PAGE_SIZE so that it matches the
allocated BO size and allows queue creation to succeed.
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b01cd158a2f5230b137396c5f8cda3fc780abbc2)
|
|
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
HW-supported EOP buffer sizes are 4K and 32K. On systems that do not
use 4K pages, the minimum buffer object (BO) allocation size is
PAGE_SIZE (for example, 64K). During queue buffer acquisition, the driver
currently checks the allocated BO size against the supported EOP buffer
size. Since the allocated BO is larger than the expected size, this check
fails, preventing queue creation.
Relax the strict size validation and allow PAGE_SIZE-sized BOs to be used.
Only the required 4K region of the buffer will be used as the EOP buffer
and avoids queue creation failures on non-4K page systems.
Acked-by: Christian König <christian.koenig@amd.com>
Suggested-by: Philip Yang <yangp@amd.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add support for separate ring-buffer for metadata packets when using
compute queues. Userspace application allocate the metadata ring-buffer
and the queue ring-buffer with a single allocation. The metadata
ring-buffer starts after the queue ring-buffer.
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Update the SGPR, VGPR, HWREG size and number of waves supported
for GFX 12.1 CWSR memory limits. The CU calculation changed in
topology, as a result, the values need to be updated.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
GFX1151 has 1.5x the number of available physical VGPRs per SIMD.
Bump total memory availability for acquire checks on queue creation.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Over allocation of save area is not fatal, only under allocation is.
ROCm has various components that independently claim authority over save
area size.
Unless KFD decides to claim single authority, relax size checks.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Remove workaround code for the early engineering
samples GC v9.4.3 SOCs with revID 0
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Through KFD IOCTL Fuzzing we encountered a NULL pointer derefrence
when calling kfd_queue_acquire_buffers.
Fixes: 629568d25fea ("drm/amdkfd: Validate queue cwsr area and eop buffer size")
Signed-off-by: Andrew Martin <Andrew.Martin@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Andrew Martin <Andrew.Martin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
To workaround queue full h/w issue on Gfx7/8, when application create
AQL queue, the ring buffer bo allocate size is queue_size/2 and
map queue_size ring buffer to GPU in 2 pieces using 2 attachments, each
attachment map size is queue_size/2, with same ring_bo backing memory.
For Gfx7/8, user queue buffer validation should use queue_size/2 to
verify ring_bo allocation and mapping size.
Fixes: 68e599db7a54 ("drm/amdkfd: Validate user queue buffers")
Suggested-by: Tomáš Trnka <trnka@scm.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Update cwsr area size for gfx950 to fit the new user queue buffer validation.
The size of LDS calculation is referred from gfx950 thunk implementation.
Signed-off-by: Le Ma <le.ma@amd.com>
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add helper function kfd_queue_unreference_buffers to reduce queue buffer
refcount, separate it from release queue buffers.
Because it is circular locking to hold dqm_lock to take vm lock,
kfd_ioctl_destroy_queue should take vm lock, unreference queue buffers
first, but not release queue buffers, to handle error in case failed to
hold vm lock. Then hold dqm_lock to remove queue from queue list and
then release queue buffers.
Restore process worker restore queue hold dqm_lock, will always find
the queue with valid queue buffers.
v2 (Felix):
- renamed kfd_queue_unreference_buffer(s) to kfd_queue_unref_bo_va(s)
- added two FIXME comments for follow up
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fixes the below if kernel config not enable HMM support
>> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:107:26: error:
implicit declaration of function 'svm_range_from_addr'
>> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:107:24: error:
assignment to 'struct svm_range *' from 'int' makes pointer from integer
without a cast [-Wint-conversion]
>> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:111:28: error:
invalid use of undefined type 'struct svm_range'
Fixes: b049504e211e ("drm/amdkfd: Validate user queue svm memory residency")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202407252127.zvnxaKRA-lkp@intel.com/
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The fix involves setting 'err' to '-EINVAL' before each 'goto
out_err_unreserve'.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c:265 kfd_queue_acquire_buffers()
warn: missing error code 'err'
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_queue.c
226 int kfd_queue_acquire_buffers(struct kfd_process_device *pdd, struct queue_properties *properties)
227 {
228 struct kfd_topology_device *topo_dev;
229 struct amdgpu_vm *vm;
230 u32 total_cwsr_size;
231 int err;
232
233 topo_dev = kfd_topology_device_by_id(pdd->dev->id);
234 if (!topo_dev)
235 return -EINVAL;
236
237 vm = drm_priv_to_vm(pdd->drm_priv);
238 err = amdgpu_bo_reserve(vm->root.bo, false);
239 if (err)
240 return err;
241
242 err = kfd_queue_buffer_get(vm, properties->write_ptr, &properties->wptr_bo, PAGE_SIZE);
243 if (err)
244 goto out_err_unreserve;
245
246 err = kfd_queue_buffer_get(vm, properties->read_ptr, &properties->rptr_bo, PAGE_SIZE);
247 if (err)
248 goto out_err_unreserve;
249
250 err = kfd_queue_buffer_get(vm, (void *)properties->queue_address,
251 &properties->ring_bo, properties->queue_size);
252 if (err)
253 goto out_err_unreserve;
254
255 /* only compute queue requires EOP buffer and CWSR area */
256 if (properties->type != KFD_QUEUE_TYPE_COMPUTE)
257 goto out_unreserve;
This is clearly a success path.
258
259 /* EOP buffer is not required for all ASICs */
260 if (properties->eop_ring_buffer_address) {
261 if (properties->eop_ring_buffer_size != topo_dev->node_props.eop_buffer_size) {
262 pr_debug("queue eop bo size 0x%lx not equal to node eop buf size 0x%x\n",
263 properties->eop_buf_bo->tbo.base.size,
264 topo_dev->node_props.eop_buffer_size);
--> 265 goto out_err_unreserve;
This has err in the label name. err = -EINVAL?
266 }
267 err = kfd_queue_buffer_get(vm, (void *)properties->eop_ring_buffer_address,
268 &properties->eop_buf_bo,
269 properties->eop_ring_buffer_size);
270 if (err)
271 goto out_err_unreserve;
272 }
273
274 if (properties->ctl_stack_size != topo_dev->node_props.ctl_stack_size) {
275 pr_debug("queue ctl stack size 0x%x not equal to node ctl stack size 0x%x\n",
276 properties->ctl_stack_size,
277 topo_dev->node_props.ctl_stack_size);
278 goto out_err_unreserve;
err?
279 }
280
281 if (properties->ctx_save_restore_area_size != topo_dev->node_props.cwsr_size) {
282 pr_debug("queue cwsr size 0x%x not equal to node cwsr size 0x%x\n",
283 properties->ctx_save_restore_area_size,
284 topo_dev->node_props.cwsr_size);
285 goto out_err_unreserve;
err? Not sure.
286 }
287
288 total_cwsr_size = (topo_dev->node_props.cwsr_size + topo_dev->node_props.debug_memory_size)
289 * NUM_XCC(pdd->dev->xcc_mask);
290 total_cwsr_size = ALIGN(total_cwsr_size, PAGE_SIZE);
291
292 err = kfd_queue_buffer_get(vm, (void *)properties->ctx_save_restore_area_address,
293 &properties->cwsr_bo, total_cwsr_size);
294 if (!err)
295 goto out_unreserve;
296
297 amdgpu_bo_unreserve(vm->root.bo);
298
299 err = kfd_queue_buffer_svm_get(pdd, properties->ctx_save_restore_area_address,
300 total_cwsr_size);
301 if (err)
302 goto out_err_release;
303
304 return 0;
305
306 out_unreserve:
307 amdgpu_bo_unreserve(vm->root.bo);
308 return 0;
309
310 out_err_unreserve:
311 amdgpu_bo_unreserve(vm->root.bo);
312 out_err_release:
313 kfd_queue_release_buffers(pdd, properties);
314 return err;
315 }
Fixes: 629568d25fea ("drm/amdkfd: Validate queue cwsr area and eop buffer size")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Philip Yang <Philip.Yang@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When creating KFD user compute queue, check if queue eop buffer size,
cwsr area size, ctl stack size equal to the size of KFD node
properities.
Check the entire cwsr area which may split into multiple svm ranges
aligned to granularity boundary.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use the queue eop buffer size, cwsr area size, ctl stack size
calculation from Thunk, store the value to KFD node properties.
Those will be used to validate queue eop buffer size, cwsr area size,
ctl stack size when creating KFD user compute queue.
Those will be exposed to user space via sysfs KFD node properties, to
remove the duplicate calculation code from Thunk.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Queue CWSR area maybe registered to GPU as svm memory, create queue to
ensure svm mapped to GPU with KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag.
Add queue_refcount to struct svm_range, to track queue CWSR area usage.
Because unmap mmu notifier callback return value is ignored, if
application unmap the CWSR area while queue is active, pr_warn message
in dmesg log. To be safe, evict user queue.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add atomic queue_refcount to struct bo_va, return -EBUSY to fail unmap
BO from the GPU if the bo_va queue_refcount is not zero.
Create queue to increase the bo_va queue_refcount, destroy queue to
decrease the bo_va queue_refcount, to ensure the queue buffers mapped on
the GPU when queue is active.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Find user queue rptr, ring buf, eop buffer and cwsr area BOs, and
check BOs are mapped on the GPU with correct size and take the BO
reference.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add helper function kfd_queue_acquire_buffers to get queue wptr_bo
reference from queue write_ptr if it is mapped to the KFD node with
expected size.
Add wptr_bo to structure queue_properties because structure queue is
allocated after queue buffers are validated, then we can remove wptr_bo
parameter from pqm_create_queue.
Rename structure queue wptr_bo_gart to hold wptr_bo reference for GART
mapping and umapping. Move MES wptr_bo_gart mapping to init_user_queue,
the same location with queue ctx_bo GART mapping.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Update the SPDX License header for all the KFD files.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Remove BUG_ONs that check for NULL pointer arguments that are
dereferenced in the same function. Dereferencing the NULL pointer
will generate a BUG anyway, so the explicit check is redundant and
unnecessary overhead.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
See https://kernel.org/doc/html/latest/process/coding-style.html
under "14) Allocating Memory" for rationale behind removing the
x=alloc(sizeof(struct) style and using x=alloc(sizeof(*x) instead
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Allow init_queue() to take 'struct queue_properties' by reference.
Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
The queue module enables allocating and initializing queues uniformly.
v3: Removed typedef and redundant memset call. Broke long pr_debug print to one
liners and Added documentation.
v5: Move amdkfd from drm/radeon/ to drm/amd/
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
|