<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/drivers/gpu/drm/xe/xe_survivability_mode.c, branch linux-rolling-stable</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-rolling-stable</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-rolling-stable'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2025-09-04T20:33:51Z</updated>
<entry>
<title>drm/xe/configfs: Don't expose survivability_mode if not applicable</title>
<updated>2025-09-04T20:33:51Z</updated>
<author>
<name>Michal Wajdeczko</name>
<email>michal.wajdeczko@intel.com</email>
</author>
<published>2025-09-02T13:17:44Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=3088f485dea2e98c8acb07495759363c5ae546bd'/>
<id>urn:sha1:3088f485dea2e98c8acb07495759363c5ae546bd</id>
<content type='text'>
The survivability_mode attribute is applicable only for DGFX and
platforms newer than BATTLEMAGE. Use .is_visible() hook to hide
this attribute when above conditions are not met. Remove code that
was trying to fix such configuration during the runtime.

Signed-off-by: Michal Wajdeczko &lt;michal.wajdeczko@intel.com&gt;
Cc: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
Cc: Riana Tauro &lt;riana.tauro@intel.com&gt;
Reviewed-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
Reviewed-by: Stuart Summers &lt;stuart.summers@intel.com&gt;
Link: https://lore.kernel.org/r/20250902131744.5076-4-michal.wajdeczko@intel.com
</content>
</entry>
<entry>
<title>drm/xe/configfs: Don't touch survivability_mode on fini</title>
<updated>2025-09-04T20:31:26Z</updated>
<author>
<name>Michal Wajdeczko</name>
<email>michal.wajdeczko@intel.com</email>
</author>
<published>2025-09-04T10:35:21Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=079a5c83dbd23db7a6eed8f558cf75e264d8a17b'/>
<id>urn:sha1:079a5c83dbd23db7a6eed8f558cf75e264d8a17b</id>
<content type='text'>
This is a user controlled configfs attribute, we should not
modify that outside the configfs attr.store() implementation.

Fixes: bc417e54e24b ("drm/xe: Enable configfs support for survivability mode")
Signed-off-by: Michal Wajdeczko &lt;michal.wajdeczko@intel.com&gt;
Cc: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
Cc: Riana Tauro &lt;riana.tauro@intel.com&gt;
Reviewed-by: Stuart Summers &lt;stuart.summers@intel.com&gt;
Reviewed-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
Link: https://lore.kernel.org/r/20250904103521.7130-1-michal.wajdeczko@intel.com
</content>
</entry>
<entry>
<title>drm/xe/xe_hw_error: Handle CSC Firmware reported Hardware errors</title>
<updated>2025-08-26T14:11:34Z</updated>
<author>
<name>Riana Tauro</name>
<email>riana.tauro@intel.com</email>
</author>
<published>2025-08-26T06:34:16Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a7df563b45b03ac8aa480051cce8d49d7b489ec6'/>
<id>urn:sha1:a7df563b45b03ac8aa480051cce8d49d7b489ec6</id>
<content type='text'>
Add support to handle CSC firmware reported errors. When CSC firmware
errors are encoutered, a error interrupt is received by the GFX device as
a MSI interrupt.

Device Source control registers indicates the source of the error as CSC
The HEC error status register indicates that the error is firmware reported
Depending on the type of error, the error cause is written to the HEC
Firmware error register.

On encountering such CSC firmware errors, the graphics device is
non-recoverable from driver context. The only way to recover from these
errors is firmware flash.

System admin/userspace is notified of the necessity of firmware flash
with a combination of vendor-specific drm device edged uevent, dmesg logs
and runtime survivability sysfs. It is the responsiblity of the consumer
to verify all the actions and then trigger a firmware flash using tools
like fwupd.

$ udevadm monitor --property --kernel
monitor will print the received events for:
KERNEL - the kernel uevent

KERNEL[754.709341] change   /devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0 (drm)
ACTION=change
DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0
SUBSYSTEM=drm
WEDGED=vendor-specific
DEVNAME=/dev/dri/card0
DEVTYPE=drm_minor
SEQNUM=5973
MAJOR=226
MINOR=0

Logs

xe 0000:03:00.0: [drm] *ERROR* [Hardware Error]: Tile0 reported NONFATAL error 0x20000
xe 0000:03:00.0: [drm] *ERROR* [Hardware Error]: NONFATAL: HEC Uncorrected FW FD Corruption error reported, bit[2] is set
xe 0000:03:00.0: Runtime Survivability mode enabled
xe 0000:03:00.0: [drm] *ERROR* CRITICAL: Xe has declared device 0000:03:00.0 as wedged.
               IOCTLs and executions are blocked. Only a rebind may clear the failure
               Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new
xe 0000:03:00.0: [drm] device wedged, needs recovery
xe 0000:03:00.0: Firmware flash required, Please refer to the userspace documentation for more details!

Runtime survivability Sysfs:

/sys/bus/pci/devices/&lt;device&gt;/survivability_mode

v2: use vendor recovery method with
    runtime survivability (Christian, Rodrigo, Raag)
v3: move declare wedged to runtime survivability mode (Rodrigo)
v4: update commit message

Signed-off-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Reviewed-by: Umesh Nerlige Ramappa &lt;umesh.nerlige.ramappa@intel.com&gt;
Link: https://lore.kernel.org/r/20250826063419.3022216-10-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe/doc: Document device wedged and runtime survivability</title>
<updated>2025-08-26T14:11:34Z</updated>
<author>
<name>Riana Tauro</name>
<email>riana.tauro@intel.com</email>
</author>
<published>2025-08-26T06:34:14Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f646c9f9371b28b8f93e619fe003415f6aaeb416'/>
<id>urn:sha1:f646c9f9371b28b8f93e619fe003415f6aaeb416</id>
<content type='text'>
Add documentation for vendor specific device wedged recovery method
and runtime survivability.

v2: fix documentation (Raag)
v3: add userspace tool for firmware update (Raag)
v4: use consistent documentation (Raag)
v5: add more documentation

Signed-off-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Reviewed-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Reviewed-by: Raag Jadav &lt;raag.jadav@intel.com&gt;
Link: https://lore.kernel.org/r/20250826063419.3022216-8-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe/xe_survivability: Add support for Runtime survivability mode</title>
<updated>2025-08-26T14:11:34Z</updated>
<author>
<name>Riana Tauro</name>
<email>riana.tauro@intel.com</email>
</author>
<published>2025-08-26T06:34:13Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a2ca0633a0fef925a0d8125d8f3e4495a5ecb310'/>
<id>urn:sha1:a2ca0633a0fef925a0d8125d8f3e4495a5ecb310</id>
<content type='text'>
Certain runtime firmware errors can cause the device to be in a unusable
state requiring a firmware flash to restore normal operation.
Runtime Survivability Mode indicates firmware flash is necessary by
wedging the device and exposing survivability mode sysfs.

The below sysfs is an indication that device is in survivability mode

/sys/bus/pci/devices/&lt;device&gt;/survivability_mode

v2: Fix kernel-doc (Umesh)
v3: Add user friendly dmesg (Frank)

Signed-off-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Reviewed-by: Raag Jadav &lt;raag.jadav@intel.com&gt;
Link: https://lore.kernel.org/r/20250826063419.3022216-7-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe/xe_survivability: Refactor survivability mode</title>
<updated>2025-08-26T14:11:34Z</updated>
<author>
<name>Riana Tauro</name>
<email>riana.tauro@intel.com</email>
</author>
<published>2025-08-26T06:34:12Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=41ff795aff53613d4184a35bbfae11a15caf07c5'/>
<id>urn:sha1:41ff795aff53613d4184a35bbfae11a15caf07c5</id>
<content type='text'>
Refactor survivability mode code to support both boot
and runtime survivability.

Signed-off-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Reviewed-by: Raag Jadav &lt;raag.jadav@intel.com&gt;
Link: https://lore.kernel.org/r/20250826063419.3022216-6-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe/xe_i2c: Add support for i2c in survivability mode</title>
<updated>2025-07-10T14:19:41Z</updated>
<author>
<name>Riana Tauro</name>
<email>riana.tauro@intel.com</email>
</author>
<published>2025-07-01T12:22:52Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f5c5d29522ecb3b6797130995e74e4774caf4548'/>
<id>urn:sha1:f5c5d29522ecb3b6797130995e74e4774caf4548</id>
<content type='text'>
Initialize i2c in survivability mode to allow firmware
update of Add-In Management Controller (AMC) in
survivability mode.

Signed-off-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Signed-off-by: Heikki Krogerus &lt;heikki.krogerus@linux.intel.com&gt;
Reviewed-by: Raag Jadav &lt;raag.jadav@intel.com&gt;
Reviewed-by: Andi Shyti &lt;andi.shyti@linux.intel.com&gt;
Link: https://lore.kernel.org/r/20250701122252.2590230-6-heikki.krogerus@linux.intel.com
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: Enable configfs support for survivability mode</title>
<updated>2025-04-09T05:24:00Z</updated>
<author>
<name>Riana Tauro</name>
<email>riana.tauro@intel.com</email>
</author>
<published>2025-04-07T05:14:13Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=bc417e54e24bc9c96d3c6eba2c8c60f7919e5afe'/>
<id>urn:sha1:bc417e54e24bc9c96d3c6eba2c8c60f7919e5afe</id>
<content type='text'>
Enable survivability mode if supported and configfs attribute is set.
Enabling survivability mode manually is useful in cases where pcode does
not detect failure, validation and for IFR (in-field-repair).

To set configfs survivability mode attribute for a device

echo 1 &gt; /sys/kernel/config/xe/0000:03:00.0/survivability_mode

The card enters survivability mode if supported

v2: add a log if survivability mode is enabled for unsupported
    platforms (Rodrigo)

Signed-off-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Reviewed-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
Link: https://lore.kernel.org/r/20250407051414.1651616-4-riana.tauro@intel.com
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: Add documentation for survivability mode</title>
<updated>2025-04-09T05:23:59Z</updated>
<author>
<name>Riana Tauro</name>
<email>riana.tauro@intel.com</email>
</author>
<published>2025-04-07T05:14:12Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=77052ab24590cb72598e31de4a7c29f99d51d201'/>
<id>urn:sha1:77052ab24590cb72598e31de4a7c29f99d51d201</id>
<content type='text'>
Add survivability mode document to pcode document as it is enabled
when pcode detects a failure.

v2: fix kernel-doc (Lucas)

Signed-off-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Reviewed-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
Link: https://lore.kernel.org/r/20250407051414.1651616-3-riana.tauro@intel.com
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
<entry>
<title>drm/xe: Set survivability mode before heci init</title>
<updated>2025-03-21T18:48:22Z</updated>
<author>
<name>Lucas De Marchi</name>
<email>lucas.demarchi@intel.com</email>
</author>
<published>2025-03-14T13:54:26Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=14efa739ca70514e8b923a02b5bcb42511dd1ee8'/>
<id>urn:sha1:14efa739ca70514e8b923a02b5bcb42511dd1ee8</id>
<content type='text'>
Commit d40f275d96e8 ("drm/xe: Move survivability entirely to xe_pci")
tried to follow the logic: initialize everything needed and if
everything succeeds, set the flag that it's enabled. While it fixed some
corner cases of those calls failing, it was wrong for setting the flag
after the call to xe_heci_gsc_init(): that function does a different
initialization for survivability mode.

Fix that and add comments about this being done on purpose.

Suggested-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Fixes: d40f275d96e8 ("drm/xe: Move survivability entirely to xe_pci")
Reviewed-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250314-fix-survivability-v5-2-fdb3559ea965@intel.com
Signed-off-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
</content>
</entry>
</feed>
