kernel/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c, branch linux-4.17.y

drm/amdgpu: drop compute ring timeout setting for non-sriov only (v2)

2018-04-03T17:52:56Z

SR-IOV still wants these error messages on timeout, so for the SR-IOV use case the timeout setting on compute rings is kept. v2: clean up the code. Signed-off-by: Evan Quan Reviewed-by: Christian König Reviewed-by: Monk Liu Signed-off-by: Alex Deucher
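
The timeout policy described by this commit and the earlier "no job timeout setting on compute queues" commit can be sketched as a small userspace model. This is only an illustration under stated assumptions: the types, the `NO_TIMEOUT` sentinel, and `pick_job_timeout()` are hypothetical names, not the driver's actual code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's ring description. */
enum ring_type { RING_TYPE_GFX, RING_TYPE_COMPUTE, RING_TYPE_SDMA };

struct ring {
    enum ring_type type;
    bool is_sriov_vf; /* true when running as an SR-IOV virtual function */
};

/* Sentinel meaning "never time out" in this model. */
#define NO_TIMEOUT LONG_MAX

/*
 * Compute rings on bare metal get no job timeout, because long-running
 * compute jobs would otherwise trigger confusing timeout messages.
 * Under SR-IOV the configured timeout is kept for every ring type.
 */
long pick_job_timeout(const struct ring *ring, long lockup_timeout_ms)
{
    assert(ring != NULL);
    if (ring->type == RING_TYPE_COMPUTE && !ring->is_sriov_vf)
        return NO_TIMEOUT;
    return lockup_timeout_ms;
}
```

In this model only the combination "compute ring and not SR-IOV" drops the timeout; every other ring keeps the configured value.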

drm/amdgpu: no job timeout setting on compute queues

2018-03-21T19:36:57Z

In some heavy compute environments (e.g. the dgemm test), the ASIC takes more than 10 seconds to finish the dispatched job, which triggers the timeout. This is quite confusing, although it does not seem to cause any real problems. As a quick workaround, we choose not to enforce the timeout setting on compute queues. Signed-off-by: Evan Quan Acked-by: Alex Deucher Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: rename amdgpu_gpu_recover

2017-12-18T15:59:58Z

Add device to the name for consistency. Acked-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: Simplify amdgpu_lockup_timeout usage.

2017-12-15T22:15:00Z

With the introduction of amdgpu_gpu_recovery, we no longer need to rely on amdgpu_lockup_timeout == 0 to disable GPU reset. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm/amdgpu: Add gpu_recovery parameter

2017-12-15T22:14:50Z

Add a new parameter to control the GPU recovery procedure. v2: Add auto logic where reset is disabled for bare metal and enabled for SR-IOV. Allow forced reset from debugfs. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König Signed-off-by: Alex Deucher
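
The auto logic from v2 can be sketched as follows. The encoding of the parameter (-1 for auto, 0 for disabled, 1 for enabled) and the function name `should_recover_gpu()` are assumptions made for this illustration; the commit message only states that auto disables reset on bare metal, enables it for SR-IOV, and that a reset can be forced from debugfs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed parameter encoding for this sketch:
 *   -1 = auto, 0 = recovery disabled, 1 = recovery enabled.
 * "forced" models a reset requested explicitly (e.g. through debugfs),
 * which bypasses the parameter.
 */
bool should_recover_gpu(int gpu_recovery, bool is_sriov_vf, bool forced)
{
    assert(gpu_recovery >= -1 && gpu_recovery <= 1);

    if (forced)
        return true;
    if (gpu_recovery == -1)
        return is_sriov_vf; /* auto: off on bare metal, on for SR-IOV */
    return gpu_recovery == 1;
}
```

Keeping the decision in one predicate means the timeout handler and the debugfs path can share the same check and differ only in the `forced` argument.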

drm/amdgpu: no need with INT for fence polling

2017-12-12T19:50:00Z

We are polling so no need for INT. Signed-off-by: Monk Liu Reviewed-by: Christian König Signed-off-by: Alex Deucher

drm: move amd_gpu_scheduler into common location

2017-12-07T16:51:56Z

This moves and renames the AMDGPU scheduler to a common location in DRM in order to facilitate re-use by other drivers. This is mostly a straightforward rename with no code changes. One notable exception is the function to_drm_sched_fence(), which is no longer an inline header function, to avoid the need to export the drm_sched_fence_ops_scheduled and drm_sched_fence_ops_finished structures. Reviewed-by: Chunming Zhou Tested-by: Dieter Nützel Acked-by: Alex Deucher Signed-off-by: Lucas Stach Signed-off-by: Alex Deucher

drm/amdgpu:implement new GPU recover(v3)

2017-12-04T21:41:30Z

1. The new implementation is named amdgpu_gpu_recover, which gives more of a hint about what it does than gpu_reset.
2. gpu_recover unifies bare-metal and SR-IOV; only the ASIC reset part is implemented differently.
3. gpu_recover increases the hung job's karma and marks its entity/context as guilty if the karma exceeds the limit.
V2:
4. In the scheduler main routine, a job from a guilty context is immediately fake-signaled after it is popped from the queue, and its fence is set with the "-ECANCELED" error.
5. In the scheduler recovery routine, all jobs from the guilty entity are dropped.
6. In the run_job() routine, the real IB submission is skipped if the @skip parameter equals true or VRAM was lost.
V3:
7. Replace the deprecated gpu reset with the new gpu recover.
Signed-off-by: Monk Liu Reviewed-by: Christian König Signed-off-by: Alex Deucher
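
Points 3, 4 and 6 of this commit can be sketched in a small userspace model. All struct and function names here (`note_hang()`, `schedule_one()`, the `submitted` flag) are hypothetical and chosen for illustration; only the karma/guilty rule, the -ECANCELED fence error, and the skip conditions come from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct entity {
    bool guilty;      /* set once a job of this entity hangs too often */
};

struct job {
    int karma;        /* number of hangs attributed to this job */
    struct entity *entity;
    int fence_error;  /* 0, or a negative errno set on the job's fence */
    bool signaled;    /* fence signaled (possibly faked) */
    bool submitted;   /* real IB submission happened */
};

/* Point 3: raise karma; mark the entity guilty once the limit is exceeded. */
void note_hang(struct job *bad, int hang_limit)
{
    assert(bad != NULL && bad->entity != NULL);
    bad->karma++;
    if (bad->karma > hang_limit)
        bad->entity->guilty = true;
}

/*
 * Point 4: in the main scheduler routine, a job from a guilty entity is
 * fake-signaled with -ECANCELED. Returns true if the job should still
 * be passed on to run_job().
 */
bool schedule_one(struct job *job)
{
    assert(job != NULL && job->entity != NULL);
    if (job->entity->guilty) {
        job->fence_error = -ECANCELED;
        job->signaled = true;
        return false;
    }
    return true;
}

/* Point 6: skip the real submission if asked to, or if VRAM was lost. */
void run_job(struct job *job, bool skip, bool vram_lost)
{
    assert(job != NULL);
    if (skip || vram_lost)
        return;
    job->submitted = true;
}
```

With a hang limit of 1 in this model, the first hang leaves the entity innocent and the second one marks it guilty, after which its jobs are cancelled rather than run.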

drm/amdgpu: change redundant init logs to debug level

2017-12-04T21:33:12Z

When this VF stays in exclusive mode for too long, other VFs are impacted. The redundant messages cause an exclusive mode timeout when they are redirected. Redirecting the guest log to a virtual serial port is a normal use case for cloud services. Reviewed-by: Alex Deucher Signed-off-by: pding Signed-off-by: Alex Deucher

drm/amdgpu:add hang_limit for sched(v2)

2017-12-04T21:33:08Z

Since the gpu_scheduler source domain cannot access amdgpu variables, a hang_limit member is needed in sched, which the upcoming GPU RESET patches can refer to. v2: make hang_limit a parameter of sched_init(). Signed-off-by: Monk Liu Reviewed-by: Chunming Zhou Signed-off-by: Alex Deucher
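
The v2 design choice, passing the limit in at init time instead of having the scheduler read a driver variable, can be sketched like this. The struct layout and the `sched_init()` signature below are simplified assumptions for illustration and do not match the real scheduler's init function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified, hypothetical scheduler object. */
struct sched {
    const char *name;
    unsigned int hang_limit; /* copied from the driver at init time */
};

/*
 * The driver hands its hang limit to the scheduler explicitly, so the
 * scheduler code never has to reference a driver-private variable.
 */
int sched_init(struct sched *sched, const char *name, unsigned int hang_limit)
{
    assert(sched != NULL);
    if (name == NULL)
        return -EINVAL;
    sched->name = name;
    sched->hang_limit = hang_limit;
    return 0;
}
```

A driver would call something like `sched_init(&s, "gfx", driver_hang_limit)`, where `driver_hang_limit` stands for whatever value the driver keeps in its own module parameter.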