| Age | Commit message (Collapse) | Author |
|
Pull in rc1 to include all changes since the merge window closed,
and grab all fixes and changes from drm/drm-next.
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
|
|
Commit 9ac09bb9feac ("mm: consistently use current->mm in
mm_get_unmapped_area()") upstream dropped a parameter from
mm_get_unmapped_area() while commit 99bda20d6d4c ("drm/gem: Introduce
drm_gem_get_unmapped_area() fop") in drm-misc-next added a new user.
Drop the extra parameter from the call.
Fixes: 7f790dd21a93 ("Merge drm/drm-next into drm-misc-next")
Cc: Maxime Ripard <mripard@kernel.org>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patch.msgid.link/20251215092706.3218018-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Aligning with the parent interface struct definitions, also group and
sort the parent interface wrappers to improve clarity on where to add
new stuff.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/b61af1d33d0448cd904cccccb2714f0d07d85b07.1765548786.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Sort the member initializers to improve clarity.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/0af6654afb2174c472f75710cea328eb443f4b73.1765548786.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Sort the member initializers to improve clarity. Separate individual
function initializers with a blank line in between.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/7f5deefc30703006bc2daa1ce1093a4947f6e049.1765548786.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Let's kickstart the v6.20 (7.0?) release cycle.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
Stop open coding pending job list in drivers. Add pending job list
iterator which safely walks DRM scheduler list asserting DRM scheduler
is stopped.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20251209200039.1366764-3-matthew.brost@intel.com
|
|
In the past, drivers used to reach into scheduler internals—this must
end because it makes it difficult to change scheduler internals, as
driver-side code must also be updated.
Add helpers to check if the scheduler is stopped and to query a job’s
signaled state to avoid reaching into scheduler internals. These are
expected to be used driver-side in recovery and debug flows.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patch.msgid.link/20251209200039.1366764-2-matthew.brost@intel.com
|
|
Pull more drm fixes from Dave Airlie:
"These are the enqueued fixes that ended up in our fixes branch,
nouveau mostly, along with some small fixes in other places.
plane:
- Handle IS_ERR vs NULL in drm_plane_create_hotspot_properties()
ttm:
- fix devcoredump for evicted bos
panel:
- Fix stack usage warning in novatek-nt35560
nouveau:
- alloc fwsec sb at boot to avoid s/r problems
- fix strcpy usage
- fix i2c encoder crash
bridge:
- Ignore spurious PLL_UNLOCK bit in ti-sn65dsi83
mgag200:
- Fix bigendian handling in mgag200
tilcdc:
- Fix probe failure in tilcdc"
* tag 'drm-fixes-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel:
drm/mgag200: Fix big-endian support
drm/tilcdc: Fix removal actions in case of failed probe
drm/ttm: Avoid NULL pointer deref for evicted BOs
drm: nouveau: Replace sprintf() with sysfs_emit()
drm/nouveau: fix circular dep oops from vendored i2c encoder
drm/nouveau: refactor deprecated strcpy
drm/plane: Fix IS_ERR() vs NULL check in drm_plane_create_hotspot_properties()
drm/bridge: ti-sn65dsi83: ignore PLL_UNLOCK errors
drm/nouveau/gsp: Allocate fwsec-sb at boot
drm/panel: novatek-nt35560: avoid on-stack device structure
|
|
Pull drm fixes from Dave Airlie:
"This is the weekly fixes for what is in next tree, mostly amdgpu and
some i915, panthor and a core revert.
core:
- revert dumb bo 8 byte alignment
amdgpu:
- SI fix
- DC reduce stack usage
- HDMI fixes
- VCN 4.0.5 fix
- DP MST fix
- DC memory allocation fix
amdkfd:
- SVM fix
- Trap handler fix
- VGPR fixes for GC 11.5
i915:
- Fix format string truncation warning
- FIx runtime PM reference during fbdev BO creation
panthor:
- fix UAF
renesas:
- fix sync flag handling"
* tag 'drm-next-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel:
Revert "drm/amd/display: Fix pbn to kbps Conversion"
drm/amd: Fix unbind/rebind for VCN 4.0.5
drm/i915: Fix format string truncation warning
drm/i915/fbdev: Hold runtime PM ref during fbdev BO creation
drm/amd/display: Improve HDMI info retrieval
drm/amdkfd: bump minimum vgpr size for gfx1151
drm/amd/display: shrink struct members
drm/amdkfd: Export the cwsr_size and ctl_stack_size to userspace
drm/amd/display: Refactor dml_core_mode_support to reduce stack frame
drm/amdgpu: don't attach the tlb fence for SI
drm/amd/display: Use GFP_ATOMIC in dc_create_plane_state()
drm/amdkfd: Trap handler support for expert scheduling mode
drm/amdkfd: Use huge page size to check split svm range alignment
drm/rcar-du: dsi: Handle both DRM_MODE_FLAG_N.SYNC and !DRM_MODE_FLAG_P.SYNC
drm/gem-shmem: revert the 8-byte alignment constraint
drm/gem-dma: revert the 8-byte alignment constraint
drm/panthor: Prevent potential UAF in group creation
|
|
LNL has been out long enough that all of our internal usage of
pre-production hardware has been phased out and we no longer need to
maintain workarounds that were exclusive to pre-production parts.
Production LNL hardware always has B0 or later steppings for both
graphics and media IP. Eliminate all workarounds that were exclusive to
A-step hardware and set the 'has_prod_wa_only' device flag for LNL to
make sure we warn and taint if someone tries to load the driver on an
old pre-production part.
Bspec: 70821
Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Link: https://patch.msgid.link/20251212181411.294854-4-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
When we're initially enabling driver support for a new platform/IP, we
usually implement all workarounds documented in the WA database in the
driver. Many of those workarounds are restricted to early steppings
that only showed up in pre-production hardware (i.e., internal test
chips that are not available to the general public). Since the
workarounds for early, pre-production steppings tend to be some of the
ugliest and most complicated workarounds, we generally want to eliminate
them and simplify the code once the platform has launched and our
internal usage of those pre-production parts have been phased out.
Let's add a flag to the device info that tracks which platforms still
have support for pre-production workarounds for so that we can print a
warning and taint if someone tries to load the driver on a
pre-production part for a platform without pre-production workarounds.
This will help our internal users understand the likely problems they'll
encounter if they try to load the driver on an old pre-production
device.
The Xe behavior here is similar to what we've done for many years on
i915 (see intel_detect_preproduction_hw()), except that instead of
manually coding up ranges of device steppings that we believe to be
pre-production hardware, Xe will use the hardware's own production vs
pre-production fusing status, which we can read from the FUSE2 register.
This fuse didn't exist on older Intel hardware, but should be present on
all platforms supported by the Xe driver.
Going forward, let's set the expectation that we'll start looking into
removing pre-production workarounds for a platform around the time that
platforms of the next major IP stepping are having their force_probe
requirement lifted. This timing is just a rough guideline; there may be
cases where some instances of pre-production parts are still being
actively used in CI farms, internal device pools, etc. and we'll need to
wait a bit longer for those to be swapped out.
v2:
- Fix inverted forcewake check
v3:
- Invert flag and add it to the platforms on which we still have
pre-prod workarounds. (Jani, Lucas)
v4:
- Avoid checking pre-production on VF since they don't have access to
the FUSE2 register.
Bspec: 78271, 52544
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251212181411.294854-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Allow for runtime modification to page reclamation feature through
debugfs configuration. This parameter will only take effect if the
platform supports the page reclamation feature by default.
v2:
- Minor comment tweaks. (Shuicheng)
- Convert to kstrtobool_from_user. (Michal)
- Only expose page reclaim file if page reclaim flag
initially supported and with that, remove
xe_match_desc usage. (Michal)
Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251212213225.3564537-22-brian3.nguyen@intel.com
|
|
There are additional hardware managed L2$ flushing such as the
transient display. In those scenarios, page reclamation is
unnecessary resulting in redundant cacheline flushes, so skip
over those corresponding ranges.
v2:
- Elaborated on reasoning for page reclamation skip based on
Tejas's discussion. (Matthew A, Tejas)
v3:
- Removed MEDIA_IS_ON due to racy condition resulting in removal of
relevant registers and values. (Matthew A)
- Moved l3 policy access to xe_pat. (Matthew A)
v4:
- Updated comments based on previous change. (Tejas)
- Move back PAT index macros to xe_pat.c.
Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251212213225.3564537-21-brian3.nguyen@intel.com
|
|
Add page reclamation action to tlb inval backend. The page reclamation
action is paired with range tlb invalidations so both are issued at the
same time.
Page reclamation will issue the TLB invalidation with an invalid seqno
and a H2G page reclamation action with the fence's corresponding seqno
and handle the fence accordingly on page reclaim action done handler.
If page reclamation fails, tlb timeout handler will be responsible for
signalling fence and cleaning up.
v2:
- add send_page_reclaim to patch.
- Remove flush_cache and use prl_sa pointer to determine PPC flush
instead of explicit bool. Add NULL as fallback for others. (Matthew B)
v3:
- Add comments for flush_cache with media.
Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251212213225.3564537-20-brian3.nguyen@intel.com
|
|
Use page reclaim list as indicator if page reclaim action is desired and
pass it to tlb inval fence to handle.
Job will need to maintain its own embedded copy to ensure lifetime of
PRL exist until job has run.
v2:
- Use xe variant of WARN_ON (Michal)
v3:
- Add comments for PRL tile handling and flush behavior with media.
(Matthew Brost)
Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251212213225.3564537-19-brian3.nguyen@intel.com
|
|
Page reclamation feature needs the PRL to be suballocated into a
GGTT-mapped BO. On allocation failure, fallback to default tlb
invalidation with full PPC flush.
PRL's BO allocation is managed in separate pool to ensure 4K alignment
for proper GGTT address.
With BO, pass into TLB invalidation backend and modify fence to
accomadate accordingly.
v2:
- Removed page reclaim related variables from TLB fence. (Matthew B)
- Allocate PRL bo size to num_entries. (Matthew B)
- Move PRL bo allocation to tlb_inval run_job. (Matthew B)
v5:
- Use xe_page_reclaim_list_valid. (Matthew B)
Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251212213225.3564537-18-brian3.nguyen@intel.com
|
|
Page reclaim list (PRL) is preparation work for the page reclaim feature.
The PRL is firstly owned by pt_update_ops and all other page reclaim
operations will point back to this PRL. PRL generates its entries during
the unbind page walker, updating the PRL.
This PRL is restricted to a 4K page, so 512 page entries at most.
v2:
- Removed unused function. (Shuicheng)
- Compacted warning checking, update commit message,
spelling, etc. (Shuicheng, Matthew B)
- Fix kernel docs
- Moved PRL max entries overflow handling out from
generate_reclaim_entry to caller (Shuicheng)
- Add xe_page_reclaim_list_init for clarity. (Matthew B)
- Modify xe_guc_page_reclaim_entry to use macros
for greater flexbility. (Matthew B)
- Add fallback for PTE outside of page reclaim supported
4K, 64K, 2M pages (Matthew B)
- Invalidate PRL for early abort page walk.
- Removed page reclaim related variables from tlb fence
(Matthew Brost)
- Remove error handling in *alloc_entries failure. (Matthew B)
v3:
- Fix NULL pointer dereference check.
- Modify reclaim_entry to QW and bitfields accordingly. (Matthew B)
- Add vm_dbg prints for PRL generation and invalidation. (Matthew B)
v4:
- s/GENMASK/GENMASK_ULL && s/BIT/BIT_ULL (CI)
v5:
- Addition of xe_page_reclaim_list_is_new() to avoid continuous
allocation of PRL if consecutive VMAs cause a PRL invalidation.
- Add xe_page_reclaim_list_valid() helpers for clarity. (Matthew B)
- Move xe_page_reclaim_list_entries_put in
xe_page_reclaim_list_invalidate.
Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251212213225.3564537-17-brian3.nguyen@intel.com
|
|
Add page reclamation related changes to GuC interface, handlers, and
senders to support page reclamation.
Currently TLB invalidations will perform an entire PPC flush in order to
prevent stale memory access for noncoherent system memory. Page
reclamation is an extension of the typical TLB invalidation
workflow, allowing disabling of full PPC flush and enable selective PPC
flushing. Selective flushing will be decided by a list of pages whom's
address is passed to GuC at time of action.
Page reclamation interfaces require at least GuC FW ver 70.31.0.
v2:
- Moved send_page_reclaim to first patch usage.
- Add comments explaining shared done handler. (Matthew B)
- Add FW version fallback to disable page reclaim
on older versions. (Matthew B, Shuicheng)
Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251212213225.3564537-16-brian3.nguyen@intel.com
|
|
Starting from Xe3p, HW adds a feature assisting range based page
reclamation. Introduce a bit in device info to indicate whether
device has such capability.
Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251212213225.3564537-15-brian3.nguyen@intel.com
|
|
Allow tlb_invalidation to control when driver wants to flush the
Private Physical Cache (PPC) as a process of the tlb invalidation
process.
Default behavior is still to always flush the PPC but driver now has the
option to disable it.
v2:
- Revise commit/kernel doc descriptions. (Shuicheng)
- Remove unused function. (Shuicheng)
- Remove bool flush_cache parameter from fence,
and various function inputs. (Matthew B)
Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251212213225.3564537-14-brian3.nguyen@intel.com
|
|
Certain TLB invalidation operations send multiple H2G messages per seqno
with only the final H2G containing the valid seqno - the others carry an
invalid seqno. The G2H handler drops these invalid seqno to aovid
prematurely signaling a TLB invalidation fence.
With TLB_INVALIDATION_SEQNO_INVALID used to indicate in progress
multi-step TLB invalidations, reset tdr to ensure that timeout
won't prematurely trigger when G2H actions are still ongoing.
v2: Remove lock from xe_tlb_inval_reset_timeout. (Matthew B)
v3: Squash with dependent patch from Matthew Brost' series.
Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251212213225.3564537-13-brian3.nguyen@intel.com
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.19-rc1:
- Fix stack usage warning in novatek-nt35560.
- Fix s/r, i2c issues in nouveau and update string handling.
- Ignore spurious PLL_UNLOCK bit in ti-sn65dsi83.
- Handle IS_ERR vs NULL in drm_plane_create_hotspot_properties().
- Fix devcoredump crash on reading evicted bo's.
- Fix bigendian handling in mgag200.
- Fix probe failure in tilcdc.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/6c371dc1-08bf-4a34-895c-9ef348b6061b@linux.intel.com
|
|
During GT reset recovery in do_gt_restart(), xe_uc_start() was called
before xe_reg_sr_apply_mmio() restored engine-specific registers. This
created a race window where the scheduler could run jobs before hardware
state was fully restored.
This caused failures in eudebug tests (xe_exec_sip_eudebug@breakpoint-
waitsip-*) where TD_CTL register (containing TD_CTL_GLOBAL_DEBUG_ENABLE)
wasn't restored before jobs started executing. Breakpoints would fail to
trigger SIP entry because the debug enable bit wasn't set yet.
Fix by moving xe_uc_start() after all MMIO register restoration,
including engine registers and CCS mode configuration, ensuring all
hardware state is fully restored before any jobs can be scheduled.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Jan Maslak <jan.maslak@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251210145618.169625-2-jan.maslak@intel.com
|
|
There are some corner cases where flushing transient
data may take slightly longer than the 150us timeout
we currently allow. Update the driver to use a 300us
timeout instead based on the latest guidance from
the hardware team. An update to the bspec to formally
document this is expected to arrive soon.
Fixes: c01c6066e6fa ("drm/xe/device: implement transient flush")
Signed-off-by: Jagmeet Randhawa <jagmeet.randhawa@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/0201b1d6ec64d3651fcbff1ea21026efa915126a.1765487866.git.jagmeet.randhawa@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Enable I2C controller for Crescent Island and while at it, rely on
has_i2c flag instead of manual platform checks.
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20251128084414.306265-1-raag.jadav@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Add missing drm_gem_object_put() call when drm_gem_object_lookup()
successfully returns an object. This fixes a GEM object reference
leak that can prevent driver modules from unloading when using
prime buffers.
Fixes: 53096728b891 ("drm: Add DRM prime interface to reassign GEM handle")
Cc: <stable@vger.kernel.org> # v6.18+
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20251212134133.475218-1-karol.wachowski@linux.intel.com
|
|
Move the panic handling to the display parent interface, making display
more independent of i915 and xe driver implementations.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/e27eca5424479e8936b786018d0af19a34f839f6.1765474612.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The intel_panic.c implementation is i915 specific, and xe has its
own. Move it to i915 core as i915_panic.c.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/8dc7af0ae1f859d17b0be269a545146c5536d8fc.1765474612.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Fedora/CentOS/RHEL CI is reporting intermittent failures while running
the drm_validate_modeset test [1]:
# drm_test_check_connector_changed_modeset: EXPECTATION FAILED at
# drivers/gpu/drm/tests/drm_atomic_state_test.c:162
Expected ret == 0, but
ret == -35 (0xffffffffffffffdd)
Change the set_up_atomic_state() helper function to return on error and
restart the atomic sequence when the returned error is EDEADLK.
[1] https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/2106744096/test_x86_64/11762450343/artifacts/jobwatch/logs/recipes/19797909/tasks/204139142/results/945095586/logs/dmesg.log
Fixes: 73d934d7b6e3 ("drm/tests: Add test for drm_atomic_helper_commit_modeset_disables()")
Closes: https://datawarehouse.cki-project.org/issue/4004
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Link: https://patch.msgid.link/20251104102535.12212-2-jose.exposito89@gmail.com
|
|
Fedora/CentOS/RHEL CI is reporting intermittent failures while running
the drm_test_check_valid_clones() KUnit test.
The error log can be either [1]:
# drm_test_check_valid_clones: ASSERTION FAILED at
# drivers/gpu/drm/tests/drm_atomic_state_test.c:295
Expected ret == param->expected_result, but
ret == -35 (0xffffffffffffffdd)
param->expected_result == 0 (0x0)
Or [2] depending on the test case:
# drm_test_check_valid_clones: ASSERTION FAILED at
# drivers/gpu/drm/tests/drm_atomic_state_test.c:295
Expected ret == param->expected_result, but
ret == -35 (0xffffffffffffffdd)
param->expected_result == -22 (0xffffffffffffffea)
Restart the atomic sequence when EDEADLK is returned.
[1] https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/2113057246/test_x86_64/11802139999/artifacts/jobwatch/logs/recipes/19824965/tasks/204347800/results/946112713/logs/dmesg.log
[2] https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/trusted-artifacts/2106744297/test_aarch64/11762450907/artifacts/jobwatch/logs/recipes/19797942/tasks/204139727/results/945094561/logs/dmesg.log
Fixes: 88849f24e2ab ("drm/tests: Add test for drm_atomic_helper_check_modeset()")
Closes: https://datawarehouse.cki-project.org/issue/4004
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Link: https://patch.msgid.link/20251104102535.12212-1-jose.exposito89@gmail.com
|
|
EDEADLK
Fedora/CentOS/RHEL CI is reporting intermittent failures while running
the KUnit tests present in drm_hdmi_state_helper_test.c [1].
While the specific test causing the failure change between runs, all of
them are caused by drm_kunit_helper_enable_crtc_connector() returning
-EDEADLK. The error trace always follow this structure:
# <test name>: ASSERTION FAILED at
# drivers/gpu/drm/tests/drm_hdmi_state_helper_test.c:<line>
Expected ret == 0, but
ret == -35 (0xffffffffffffffdd)
As documented, if the drm_kunit_helper_enable_crtc_connector() function
returns -EDEADLK (-35), the entire atomic sequence must be restarted.
Handle this error code for all function calls.
Closes: https://datawarehouse.cki-project.org/issue/4039 [1]
Fixes: 6a5c0ad7e08e ("drm/tests: hdmi_state_helpers: Switch to new helper")
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Link: https://patch.msgid.link/20251104102258.10026-1-jose.exposito89@gmail.com
|
|
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
drm/i915 fixes for v6.19-rc1:
- Fix format string truncation warning
- FIx runtime PM reference during fbdev BO creation
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patch.msgid.link/281309f78560bcceebac8d5c0511efe66baf641c@intel.com
|
|
Add kernel documentation for Multi Queue group GuC interface.
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-36-niranjana.vishwanathapura@intel.com
|
|
Add kernel documentation for Multi Queue group and update
the corresponding rst.
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-35-niranjana.vishwanathapura@intel.com
|
|
Add support to keep the group active after the primary queue is
destroyed. Instead of killing the primary queue during exec_queue
destroy ioctl, kill it when all the secondary queues of the group
are killed.
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-34-niranjana.vishwanathapura@intel.com
|
|
Add xe_exec_queue_create_multi_queue event with
multi-queue information.
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-33-niranjana.vishwanathapura@intel.com
|
|
Upon a job timeout, teardown the multi-queue group by
triggering TDR on all queues of the multi-queue group
and by skipping timeout checks in them.
v5: Ban the group while triggering TDR for the guc
reported errors
Add FIXME in TDR to take multi-queue group off HW
(Matt Brost)
v6: Trigger cleanup of group only for multi-queue case
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-32-niranjana.vishwanathapura@intel.com
|
|
If GuC doesn't response to CGP_SYNC message, trigger
GT reset and cleanup of all the queues of the multi
queue group.
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-31-niranjana.vishwanathapura@intel.com
|
|
Trigger multi-queue context cleanup upon CGP context error
notification from GuC.
v4: Fix error message
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-30-niranjana.vishwanathapura@intel.com
|
|
To properly support soft light restore between batches
being arbitrated at the CFEG, PIPE_CONTROL instructions
have a new bit in the first DW, QUEUE_DRAIN_MODE. When
set, this indicates to the CFEG that it should only
drain the current queue.
Additionally we no longer want to set the CS_STALL bit
for these multi queue queues as this causes the entire
pipeline to stall waiting for completion of the prior
batch, preventing this soft light restore from occurring
between queues in a queue group.
v4: Assert !multi_queue where applicable (Matt Roper)
Bspec: 56551
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-29-niranjana.vishwanathapura@intel.com
|
|
As all queues of a multi queue group use the primary queue of the group
to interface with GuC. Hence there is a dependency between the queues of
the group. So, when primary queue of a multi queue group is cleaned up,
also trigger a cleanup of the secondary queues also. During cleanup, stop
and re-start submission for all queues of a multi queue group to avoid
any submission happening in parallel when a queue is being cleaned up.
v2: Initialize group->list_lock, add fs_reclaim dependency, remove
unwanted secondary queues cleanup (Matt Brost)
v3: Properly handle cleanup of multi-queue group (Matt Brost)
v4: Fix IS_ENABLED(CONFIG_LOCKDEP) check (Matt Brost)
Revert stopping/restarting of submissions on queues of the
group in TDR as it is not needed.
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-28-niranjana.vishwanathapura@intel.com
|
|
Dump multi queue specific information in the guc exec queue
dump.
v2: Move multi queue related fields inside the multi_queue
sub-structure (Matt Brost)
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-27-niranjana.vishwanathapura@intel.com
|
|
Support dynamic priority change for multi queue group queues via
exec queue set_property ioctl. Issue CGP_SYNC command to GuC through
the drm scheduler message interface for priority to take effect.
v2: Move is_multi_queue check to exec_queue layer and assert
is_multi_queue being set in guc submission layer (Matt Brost)
v3: Assert CGP_SYNC message length is valid (Matt Brost)
Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-26-niranjana.vishwanathapura@intel.com
|
|
This patch adds support for exec_queue set_property ioctl.
It is derived from the original work which is part of
https://patchwork.freedesktop.org/series/112188/
Currently only DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY
property can be dynamically set.
v2: Check for and update kernel-doc which property this ioctl
supports (Matt Brost)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-25-niranjana.vishwanathapura@intel.com
|
|
Only MULTI_QUEUE_PRIORITY property is valid for secondary queues of a
multi queue group. MULTI_QUEUE_PRIORITY only applies to multi queue group
queues. Detect invalid user queue property setting and return error.
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-24-niranjana.vishwanathapura@intel.com
|
|
Add support for queues of a multi queue group to set
their priority within the queue group by adding property
DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY.
This is the only other property supported by secondary
queues of a multi queue group, other than
DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE.
v2: Add kernel doc for enum xe_multi_queue_priority,
Add assert for priority values, fix includes and
declarations (Matt Brost)
v3: update uapi kernel-doc (Matt Brost)
v4: uapi change due to rebase
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-23-niranjana.vishwanathapura@intel.com
|
|
Implement GuC commands and response along with the Context
Group Page (CGP) interface for multi queue support.
Ensure that only primary queue (q0) of a multi queue group
communicate with GuC. The secondary queues of the group only
need to maintain LRCA and interface with drm scheduler.
Use primary queue's submit_wq for all secondary queues of a multi
queue group. This serialization avoids any locking around CGP
synchronization with GuC.
v2: Fix G2H_LEN_DW_MULTI_QUEUE_CONTEXT value, add more comments
(Matt Brost)
v3: Minor code refactro, use xe_gt_assert
v4: Use xe_guc_ct_wake_waiters(), remove vf recovery support
(Matt Brost)
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-22-niranjana.vishwanathapura@intel.com
|
|
Multi Queue is a new mode of execution supported by the compute and
blitter copy command streamers (CCS and BCS, respectively). It is an
enhancement of the existing hardware architecture and leverages the
same submission model. It enables support for efficient, parallel
execution of multiple queues within a single context. All the queues
of a group must use the same address space (VM).
The new DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE execution queue
property supports creating a multi queue group and adding queues to
a queue group. All queues of a multi queue group share the same
context.
A exec queue create ioctl call with above property specified with value
DRM_XE_SUPER_GROUP_CREATE will create a new multi queue group with the
queue being created as the primary queue (aka q0) of the group. To add
secondary queues to the group, they need to be created with the above
property with id of the primary queue as the value. The properties of
the primary queue (like priority, timeslice) applies to the whole group.
So, these properties can't be set for secondary queues of a group.
Once destroyed, the secondary queues of a multi queue group can't be
replaced. However, they can be dynamically added to the group up to a
total of 64 queues per group. Once the primary queue is destroyed,
secondary queues can't be added to the queue group.
v2: Remove group->lock, fix xe_exec_queue_group_add()/delete()
function semantics, add additional comments, remove unused
group->list_lock, add XE_BO_FLAG_GGTT_INVALIDATE for cgp bo,
Assert LRC is valid, update uapi kernel doc.
(Matt Brost)
v3: Use XE_BO_FLAG_PINNED_LATE_RESTORE/USER_VRAM/GGTT_INVALIDATE
flags for cgp bo (Matt)
v4: Ensure queue is not a vm_bind queue
uapi change due to rebase
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-21-niranjana.vishwanathapura@intel.com
|
|
Add multi_queue_enable_mask field to the gt information structure
which is bitmask of all engine classes with multi queue support
enabled.
v2: Rename multi_queue_enable_mask to multi_queue_engine_class_mask
(Matt Brost)
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-20-niranjana.vishwanathapura@intel.com
|