<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/drivers/accel/amdxdna/aie2_ctx.c, branch linux-rolling-stable</title>
<subtitle>Hosts the 0x221E Linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-rolling-stable</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-rolling-stable'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2026-03-19T15:14:56Z</updated>
<entry>
<title>accel/amdxdna: Fix runtime suspend deadlock when there is pending job</title>
<updated>2026-03-19T15:14:56Z</updated>
<author>
<name>Lizhi Hou</name>
<email>lizhi.hou@amd.com</email>
</author>
<published>2026-03-10T18:00:58Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=ac72e7385a2c7533dd766de4197134d96230be85'/>
<id>urn:sha1:ac72e7385a2c7533dd766de4197134d96230be85</id>
<content type='text'>
[ Upstream commit 6b13cb8f48a42ddf6dd98865b673a82e37ff238b ]

The runtime suspend callback drains the running job workqueue before
suspending the device. If a job is still executing and calls
pm_runtime_resume_and_get(), it can deadlock with the runtime suspend
path.

Fix this by moving pm_runtime_resume_and_get() from the job execution
routine to the job submission routine, ensuring the device is resumed
before the job is queued and avoiding the deadlock during runtime
suspend.

Fixes: 063db451832b ("accel/amdxdna: Enhance runtime power management")
Reviewed-by: Mario Limonciello (AMD) &lt;superm1@kernel.org&gt;
Signed-off-by: Lizhi Hou &lt;lizhi.hou@amd.com&gt;
Link: https://patch.msgid.link/20260310180058.336348-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>accel/amdxdna: Fill invalid payload for failed command</title>
<updated>2026-03-12T11:09:45Z</updated>
<author>
<name>Lizhi Hou</name>
<email>lizhi.hou@amd.com</email>
</author>
<published>2026-02-27T00:48:41Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=84a4b37eb03699a068205414636d510d3dc1e5e7'/>
<id>urn:sha1:84a4b37eb03699a068205414636d510d3dc1e5e7</id>
<content type='text'>
[ Upstream commit 89ff45359abbf9d8d3c4aa3f5a57ed0be82b5a12 ]

Newer userspace applications may read the payload of a failed command
to obtain detailed error information. However, the driver and old firmware
versions may not support returning advanced error information.
In this case, initialize the command payload with an invalid value so
userspace can detect that no detailed error information is available.

Fixes: aac243092b70 ("accel/amdxdna: Add command execution")
Reviewed-by: Mario Limonciello (AMD) &lt;superm1@kernel.org&gt;
Signed-off-by: Lizhi Hou &lt;lizhi.hou@amd.com&gt;
Link: https://patch.msgid.link/20260227004841.3080241-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>accel/amdxdna: Fix command hang on suspended hardware context</title>
<updated>2026-03-12T11:09:14Z</updated>
<author>
<name>Lizhi Hou</name>
<email>lizhi.hou@amd.com</email>
</author>
<published>2026-02-11T20:53:41Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=105caae0ee7b0395bb88017872a351f1b8c59c00'/>
<id>urn:sha1:105caae0ee7b0395bb88017872a351f1b8c59c00</id>
<content type='text'>
[ Upstream commit 07efce5a6611af6714ea3ef65694e0c8dd7e44f5 ]

When a hardware context is suspended, the job scheduler is stopped. If a
command is submitted while the context is suspended, the job is queued in
the scheduler but aie2_sched_job_run() is never invoked to restart the
hardware context. As a result, the command hangs.

Fix this by modifying the hardware context suspend routine to keep the job
scheduler running so that queued jobs can trigger context restart properly.

Fixes: aac243092b70 ("accel/amdxdna: Add command execution")
Reviewed-by: Mario Limonciello (AMD) &lt;superm1@kernel.org&gt;
Signed-off-by: Lizhi Hou &lt;lizhi.hou@amd.com&gt;
Link: https://patch.msgid.link/20260211205341.722982-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>accel/amdxdna: Fix dead lock for suspend and resume</title>
<updated>2026-03-12T11:09:14Z</updated>
<author>
<name>Lizhi Hou</name>
<email>lizhi.hou@amd.com</email>
</author>
<published>2026-02-11T20:46:44Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=ac24537478dd8eb2fd3984b4652bb19461e5e74c'/>
<id>urn:sha1:ac24537478dd8eb2fd3984b4652bb19461e5e74c</id>
<content type='text'>
[ Upstream commit 1aa82181a3c285c7351523d587f7981ae4c015c8 ]

When an application issues a query IOCTL while auto suspend is running,
a deadlock can occur. The query path holds dev_lock and then calls
pm_runtime_resume_and_get(), which waits for the ongoing suspend to
complete. Meanwhile, the suspend callback attempts to acquire dev_lock
and blocks, resulting in a deadlock.

Fix this by releasing dev_lock before calling pm_runtime_resume_and_get()
and reacquiring it after the call completes. Also acquire dev_lock in the
resume callback to keep the locking consistent.

Fixes: 063db451832b ("accel/amdxdna: Enhance runtime power management")
Reviewed-by: Mario Limonciello (AMD) &lt;superm1@kernel.org&gt;
Signed-off-by: Lizhi Hou &lt;lizhi.hou@amd.com&gt;
Link: https://patch.msgid.link/20260211204644.722758-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>accel/amdxdna: Reduce log noise during process termination</title>
<updated>2026-03-12T11:09:14Z</updated>
<author>
<name>Mario Limonciello</name>
<email>mario.limonciello@amd.com</email>
</author>
<published>2026-02-10T16:42:51Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c503a8b3de4892caf63b5baea5e38c1af4cf3b9f'/>
<id>urn:sha1:c503a8b3de4892caf63b5baea5e38c1af4cf3b9f</id>
<content type='text'>
[ Upstream commit 57aa3917a3b3bd805a3679371f97a1ceda3c5510 ]

During process termination, several error messages are logged that are
not actual errors but expected conditions when a process is killed or
interrupted. This creates unnecessary noise in the kernel log.

The specific scenarios are:

1. HMM invalidation returns -ERESTARTSYS when the wait is interrupted by
   a signal during process cleanup. This is expected when a process is
   being terminated and should not be logged as an error.

2. Context destruction returns -ENODEV when the firmware or device has
   already stopped, which commonly occurs during cleanup if the device
   was already torn down. This is also an expected condition during
   orderly shutdown.

Downgrade these expected error conditions from error level to debug level
to reduce log noise while still keeping genuine errors visible.

Fixes: 97f27573837e ("accel/amdxdna: Fix potential NULL pointer dereference in context cleanup")
Reviewed-by: Lizhi Hou &lt;lizhi.hou@amd.com&gt;
Signed-off-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Signed-off-by: Lizhi Hou &lt;lizhi.hou@amd.com&gt;
Link: https://patch.msgid.link/20260210164521.1094274-3-mario.limonciello@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>accel/amdxdna: Switch to always use chained command</title>
<updated>2026-03-12T11:09:13Z</updated>
<author>
<name>Lizhi Hou</name>
<email>lizhi.hou@amd.com</email>
</author>
<published>2026-02-06T06:02:51Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f2360a4678cc2fe873517faa49ec0c15e656ba4d'/>
<id>urn:sha1:f2360a4678cc2fe873517faa49ec0c15e656ba4d</id>
<content type='text'>
[ Upstream commit c68a6af400ca80596e8c37de0a1cb564aa9da8a4 ]

Preempt commands are only supported when submitted as chained commands.
To ensure preempt support works consistently, always submit commands in
chained command format.

Set force_cmdlist to true so that single commands are filled using the
chained command layout, enabling correct handling of preempt commands.

Fixes: 3a0ff7b98af4 ("accel/amdxdna: Support preemption requests")
Reviewed-by: Karol Wachowski &lt;karol.wachowski@linux.intel.com&gt;
Reviewed-by: Mario Limonciello (AMD) &lt;superm1@kernel.org&gt;
Signed-off-by: Lizhi Hou &lt;lizhi.hou@amd.com&gt;
Link: https://patch.msgid.link/20260206060251.4050512-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>accel/amdxdna: Move RPM resume into job run function</title>
<updated>2026-02-26T23:01:03Z</updated>
<author>
<name>Lizhi Hou</name>
<email>lizhi.hou@amd.com</email>
</author>
<published>2026-02-04T17:11:17Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=044a2da3411aea7fd82d7a5c3b08f98ef569609b'/>
<id>urn:sha1:044a2da3411aea7fd82d7a5c3b08f98ef569609b</id>
<content type='text'>
[ Upstream commit 69674c1c704c0199ca7a3947f3cdcd575973175d ]

Currently, amdxdna_pm_resume_get() is called during job creation, and
amdxdna_pm_suspend_put() is called when the hardware notifies job
completion. If a job is canceled before it is run, no hardware
completion notification is generated, resulting in an unbalanced
runtime PM resume/suspend pair.

Fix this by moving amdxdna_pm_resume_get() to the job run path, ensuring
runtime PM is only resumed for jobs that are actually executed.

Fixes: 063db451832b ("accel/amdxdna: Enhance runtime power management")
Reviewed-by: Mario Limonciello (AMD) &lt;superm1@kernel.org&gt;
Signed-off-by: Lizhi Hou &lt;lizhi.hou@amd.com&gt;
Link: https://patch.msgid.link/20260204171118.3165607-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>accel/amdxdna: Fix incorrect error code returned for failed chain command</title>
<updated>2026-02-26T23:01:03Z</updated>
<author>
<name>Lizhi Hou</name>
<email>lizhi.hou@amd.com</email>
</author>
<published>2026-02-03T18:40:37Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=90c123a69a8295e2ed361184bf60d7f29ae6318c'/>
<id>urn:sha1:90c123a69a8295e2ed361184bf60d7f29ae6318c</id>
<content type='text'>
[ Upstream commit 750817a7c41de083ca5d73052e97bb7b67d7c394 ]

The driver currently returns an incorrect error code when a chain command
fails. In this case, ERT_CMD_STATE_ERROR is expected to be reported for
failed chain commands.

Fixes: aac243092b70 ("accel/amdxdna: Add command execution")
Reviewed-by: Mario Limonciello (AMD) &lt;superm1@kernel.org&gt;
Reviewed-by: Maciej Falkowski &lt;maciej.falkowski@linux.intel.com&gt;
Signed-off-by: Lizhi Hou &lt;lizhi.hou@amd.com&gt;
Link: https://patch.msgid.link/20260203184037.2751889-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>accel/amdxdna: Remove hardware context status</title>
<updated>2026-02-26T23:01:03Z</updated>
<author>
<name>Lizhi Hou</name>
<email>lizhi.hou@amd.com</email>
</author>
<published>2026-02-02T21:24:50Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a9140306b0734127c2b8dcb721c98276f72f29a9'/>
<id>urn:sha1:a9140306b0734127c2b8dcb721c98276f72f29a9</id>
<content type='text'>
[ Upstream commit b853007fdcdd64b49601a993c2b30c28279ae15d ]

One newly supported command does not require hardware context configuration
to be performed upfront. As a result, checking hardware context status
causes this command to fail incorrectly.

Remove hardware context status handling entirely. For other commands,
if userspace submits a request without configuring the hardware context
first, the firmware will report an error or time out as appropriate.

Fixes: aac243092b70 ("accel/amdxdna: Add command execution")
Reviewed-by: Mario Limonciello (AMD) &lt;superm1@kernel.org&gt;
Signed-off-by: Lizhi Hou &lt;lizhi.hou@amd.com&gt;
Link: https://patch.msgid.link/20260202212450.2681273-1-lizhi.hou@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>accel/amdxdna: Enable temporal sharing only mode</title>
<updated>2026-02-26T23:01:03Z</updated>
<author>
<name>Lizhi Hou</name>
<email>lizhi.hou@amd.com</email>
</author>
<published>2025-12-17T19:11:50Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=09650dfc3ca47855ee6ab7a1bcdcddb43a6dd38b'/>
<id>urn:sha1:09650dfc3ca47855ee6ab7a1bcdcddb43a6dd38b</id>
<content type='text'>
[ Upstream commit 7818618a09a06320f409571bf28801ccfe7e0a30 ]

Newer firmware versions prefer temporal sharing only mode. In this mode,
the driver no longer needs to manage AIE array column allocation. Instead,
a new field, num_unused_col, is added to the hardware context creation
request to specify how many columns will not be used by this hardware
context.

Reviewed-by: Mario Limonciello (AMD) &lt;superm1@kernel.org&gt;
Signed-off-by: Lizhi Hou &lt;lizhi.hou@amd.com&gt;
Link: https://patch.msgid.link/20251217191150.2145937-1-lizhi.hou@amd.com
Stable-dep-of: b853007fdcdd ("accel/amdxdna: Remove hardware context status")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
