<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/drivers/gpu/drm/i915/intel_ringbuffer.h, branch linux-5.1.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2019-02-07T16:13:21Z</updated>
<entry>
<title>drm/i915: Hack and slash, throttle execbuffer hogs</title>
<updated>2019-02-07T16:13:21Z</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2019-02-07T07:18:22Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d6f328bfeb0b0a9304166c52ec15c08076ca5246'/>
<id>urn:sha1:d6f328bfeb0b0a9304166c52ec15c08076ca5246</id>
<content type='text'>
Apply backpressure to hogs that emit requests faster than the GPU can
process them by waiting for their ring to be less than half-full before
proceeding with taking the struct_mutex.

This is a gross hack to apply throttling backpressure, the long term
goal is to remove the struct_mutex contention so that each client
naturally waits, preferably in an asynchronous, nonblocking fashion
(pipelined operations for the win), for their own resources and never
blocks another client within the driver at least. (Realtime priority
goals would extend to ensuring that resource contention favours high
priority clients as well.)

This patch only limits excessive request production and does not attempt
to throttle clients that block waiting for eviction (either global GTT or
system memory) or any other global resources, see above for the long term
goal.

No microbenchmarks are harmed (to the best of my knowledge).

Testcase: igt/gem_exec_schedule/pi-ringfull-*
Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Cc: Joonas Lahtinen &lt;joonas.lahtinen@linux.intel.com&gt;
Cc: John Harrison &lt;John.C.Harrison@Intel.com&gt;
Reviewed-by: Joonas Lahtinen &lt;joonas.lahtinen@linux.intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20190207071829.5574-1-chris@chris-wilson.co.uk
</content>
</entry>
<entry>
<title>drm/i915/pmu: Fix enable count array size and bounds checking</title>
<updated>2019-02-06T10:22:47Z</updated>
<author>
<name>Tvrtko Ursulin</name>
<email>tvrtko.ursulin@intel.com</email>
</author>
<published>2019-02-05T13:03:53Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=26a11deea685b41a43edb513194718aa1f461c9a'/>
<id>urn:sha1:26a11deea685b41a43edb513194718aa1f461c9a</id>
<content type='text'>
Enable count array is supposed to have one counter for each possible
engine sampler. As such, array sizing and bounds checking is not correct
and would blow up the asserts if more samplers were added.

No ill-effect in the current code base but lets fix it for correctness.

At the same time tidy the assert for readability and robustness.

v2:
 * One check per assert. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Fixes: b46a33e271ed ("drm/i915/pmu: Expose a PMU interface for perf queries")
Reviewed-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20190205130353.21105-1-tvrtko.ursulin@linux.intel.com
</content>
</entry>
<entry>
<title>drm/i915: Allow normal clients to always preempt idle priority clients</title>
<updated>2019-02-04T09:47:28Z</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2019-02-04T08:41:05Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=3d7a64b992eab27be75c6cf734be4e87fc88e4b5'/>
<id>urn:sha1:3d7a64b992eab27be75c6cf734be4e87fc88e4b5</id>
<content type='text'>
When first enabling preemption, we hesitated from making it a free-for-all
where every higher priority client would force a preempt-to-idle cycle
and take over from all lower priority clients. We hesitated because we
were uncertain just how well preemption would work in practice, whether
the preemption latency itself would detract from the latency gains for
higher priority tasks and whether it would work at all. Since
introducing preemption, we have been enabling it for more common tasks,
even giving normal clients a small preemptive boost when they first
start (to aide fairness and improve interactivity). Now lets take one
step further and give permission for all normal (priority:0) clients to
preempt any idle (priority:&lt;0) task so that users running long compute
jobs do not overly impact other jobs (i.e. their desktop) and the system
remains responsive under such idle loads.

References: f6322eddaff7 ("drm/i915/preemption: Allow preemption between submission ports")
References: b16c765122f9 ("drm/i915: Priority boost for new clients")
Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Cc: Michal Wajdeczko &lt;michal.wajdeczko@intel.com&gt;
Cc: Michał Winiarski &lt;michal.winiarski@intel.com&gt;
Cc: "Bloomfield, Jon" &lt;jon.bloomfield@intel.com&gt;
Cc: "Stead, Alan" &lt;alan.stead@intel.com&gt;
Reviewed-by: Daniele Ceraolo Spurio &lt;daniele.ceraolospurio@intel.com&gt;
Reviewed-by: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20190204084116.3013-1-chris@chris-wilson.co.uk
</content>
</entry>
<entry>
<title>drm/i915: Drop fake breadcrumb irq</title>
<updated>2019-01-29T21:45:30Z</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2019-01-29T20:52:30Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=789659f4307abc1c2356db09eeefefd3f445c12d'/>
<id>urn:sha1:789659f4307abc1c2356db09eeefefd3f445c12d</id>
<content type='text'>
Missed breadcrumb detection is defunct due to the tight coupling with
dma_fence signaling and the myriad ways we may signal fences from
everywhere but from an interrupt, i.e. we frequently signal a fence
before we even see its interrupt. This means that even if we miss an
interrupt for a fence, it still is signaled before our breadcrumb
hangcheck fires, so simplify the breadcrumb hangchecking by moving it
into the GPU hangcheck and forgo fake interrupts.

Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Reviewed-by: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-3-chris@chris-wilson.co.uk
</content>
</entry>
<entry>
<title>drm/i915: Replace global breadcrumbs with per-context interrupt tracking</title>
<updated>2019-01-29T21:45:22Z</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2019-01-29T20:52:29Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=52c0fdb25c7c919334b97976d05096b441a3eada'/>
<id>urn:sha1:52c0fdb25c7c919334b97976d05096b441a3eada</id>
<content type='text'>
A few years ago, see commit 688e6c725816 ("drm/i915: Slaughter the
thundering i915_wait_request herd"), the issue of handling multiple
clients waiting in parallel was brought to our attention. The
requirement was that every client should be woken immediately upon its
request being signaled, without incurring any cpu overhead.

To handle certain fragility of our hw meant that we could not do a
simple check inside the irq handler (some generations required almost
unbounded delays before we could be sure of seqno coherency) and so
request completion checking required delegation.

Before commit 688e6c725816, the solution was simple. Every client
waiting on a request would be woken on every interrupt and each would do
a heavyweight check to see if their request was complete. Commit
688e6c725816 introduced an rbtree so that only the earliest waiter on
the global timeline would woken, and would wake the next and so on.
(Along with various complications to handle requests being reordered
along the global timeline, and also a requirement for kthread to provide
a delegate for fence signaling that had no process context.)

The global rbtree depends on knowing the execution timeline (and global
seqno). Without knowing that order, we must instead check all contexts
queued to the HW to see which may have advanced. We trim that list by
only checking queued contexts that are being waited on, but still we
keep a list of all active contexts and their active signalers that we
inspect from inside the irq handler. By moving the waiters onto the fence
signal list, we can combine the client wakeup with the dma_fence
signaling (a dramatic reduction in complexity, but does require the HW
being coherent, the seqno must be visible from the cpu before the
interrupt is raised - we keep a timer backup just in case).

Having previously fixed all the issues with irq-seqno serialisation (by
inserting delays onto the GPU after each request instead of random delays
on the CPU after each interrupt), we can rely on the seqno state to
perfom direct wakeups from the interrupt handler. This allows us to
preserve our single context switch behaviour of the current routine,
with the only downside that we lose the RT priority sorting of wakeups.
In general, direct wakeup latency of multiple clients is about the same
(about 10% better in most cases) with a reduction in total CPU time spent
in the waiter (about 20-50% depending on gen). Average herd behaviour is
improved, but at the cost of not delegating wakeups on task_prio.

v2: Capture fence signaling state for error state and add comments to
warm even the most cold of hearts.
v3: Check if the request is still active before busywaiting
v4: Reduce the amount of pointer misdirection with list_for_each_safe
and using a local i915_request variable inside the loops
v5: Add a missing pluralisation to a purely informative selftest message.

References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")
Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Reviewed-by: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-2-chris@chris-wilson.co.uk
</content>
</entry>
<entry>
<title>drm/i915/execlists: Suppress preempting self</title>
<updated>2019-01-29T20:00:05Z</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2019-01-29T18:54:52Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c9a646228816efeeacb05cc9400464ad8aa90017'/>
<id>urn:sha1:c9a646228816efeeacb05cc9400464ad8aa90017</id>
<content type='text'>
In order to avoid preempting ourselves, we currently refuse to schedule
the tasklet if we reschedule an inflight context. However, this glosses
over a few issues such as what happens after a CS completion event and
we then preempt the newly executing context with itself, or if something
else causes a tasklet_schedule triggering the same evaluation to
preempt the active context with itself.

However, when we avoid preempting ELSP[0], we still retain the preemption
value as it may match a second preemption request within the same time period
that we need to resolve after the next CS event. However, since we only
store the maximum preemption priority seen, it may not match the
subsequent event and so we should double check whether or not we
actually do need to trigger a preempt-to-idle by comparing the top
priorities from each queue. Later, this gives us a hook for finer
control over deciding whether the preempt-to-idle is justified.

The sequence of events where we end up preempting for no avail is:

1. Queue requests/contexts A, B
2. Priority boost A; no preemption as it is executing, but keep hint
3. After CS switch, B is less than hint, force preempt-to-idle
4. Resubmit B after idling

v2: We can simplify a bunch of tests based on the knowledge that PI will
ensure that earlier requests along the same context will have the highest
priority.
v3: Demonstrate the stale preemption hint with a selftest

References: a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context")
Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Reviewed-by: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-4-chris@chris-wilson.co.uk
</content>
</entry>
<entry>
<title>drm/i915: Rename execlists-&gt;queue_priority to queue_priority_hint</title>
<updated>2019-01-29T20:00:03Z</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2019-01-29T18:54:51Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=4d97cbe01980c0d008d125903ef9ff05b6640c2d'/>
<id>urn:sha1:4d97cbe01980c0d008d125903ef9ff05b6640c2d</id>
<content type='text'>
After noticing that we trigger preemption events for currently executing
requests, as well as requests that complete before the preemption and
attempting to suppress those preemption events, it is wise to not
consider the queue_priority to be authoritative. As we only track the
maximum priority seen between dequeue passes, if the maximum priority
request is no longer available for dequeuing (it completed or is even
executing on another engine), we have no knowledge of the previous
queue_priority as it would require us to keep a full history of enqueued
requests -- but we already have that history in the priolists!

Rename the queue_priority to queue_priority_hint so that we do not
confuse it as being exactly the maximum priority in the queue, but merely
an indication that we have seen a new maximum priority value and as such
we should check whether it should preempt the currently running request.

v2: s/preempt_priority_hint/queue_priority_hint/ as preempt implies it
being only used for the singular task of preemption and not the wider
question of waking up due to a change in the queue.

Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Reviewed-by: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-3-chris@chris-wilson.co.uk
</content>
</entry>
<entry>
<title>drm/i915: Identify active requests</title>
<updated>2019-01-29T19:59:59Z</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2019-01-29T18:54:50Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=8547444137ec6138ce52fc1938980b737a0d4d9e'/>
<id>urn:sha1:8547444137ec6138ce52fc1938980b737a0d4d9e</id>
<content type='text'>
To allow requests to forgo a common execution timeline, one question we
need to be able to answer is "is this request running?". To track
whether a request has started on HW, we can emit a breadcrumb at the
beginning of the request and check its timeline's HWSP to see if the
breadcrumb has advanced past the start of this request. (This is in
contrast to the global timeline where we need only ask if we are on the
global timeline and if the timeline has advanced past the end of the
previous request.)

There is still confusion from a preempted request, which has already
started but relinquished the HW to a high priority request. For the
common case, this discrepancy should be negligible. However, for
identification of hung requests, knowing which one was running at the
time of the hang will be much more important.

Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Reviewed-by: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-2-chris@chris-wilson.co.uk
</content>
</entry>
<entry>
<title>drm/i915: Allocate a status page for each timeline</title>
<updated>2019-01-28T19:07:02Z</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2019-01-28T18:18:09Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=52954edd1f7030f753a63093c16826ef50805098'/>
<id>urn:sha1:52954edd1f7030f753a63093c16826ef50805098</id>
<content type='text'>
Allocate a page for use as a status page by a group of timelines, as we
only need a dword of storage for each (rounded up to the cacheline for
safety) we can pack multiple timelines into the same page. Each timeline
will then be able to track its own HW seqno.

v2: Reuse the common per-engine HWSP for the solitary ringbuffer
timeline, so that we do not have to emit (using per-gen specialised
vfuncs) the breadcrumb into the distinct timeline HWSP and instead can
keep on using the common MI_STORE_DWORD_INDEX. However, to maintain the
sleight-of-hand for the global/per-context seqno switchover, we will
store both temporarily (and so use a custom offset for the shared timeline
HWSP until the switch over).

v3: Keep things simple and allocate a page for each timeline, page
sharing comes next.

v4: I was caught repeating the same MI_STORE_DWORD_IMM over and over
again in selftests.

v5: And caught red handed copying create timeline + check.

Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Reviewed-by: Tvrtko Ursulin &lt;tvrtko.ursulin@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-3-chris@chris-wilson.co.uk
</content>
</entry>
<entry>
<title>drm/i915: Always allocate an object/vma for the HWSP</title>
<updated>2019-01-28T16:24:19Z</updated>
<author>
<name>Chris Wilson</name>
<email>chris@chris-wilson.co.uk</email>
</author>
<published>2019-01-28T10:23:55Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=0ca88ba0d6347cf8c4ea9f264c384594f8fefa11'/>
<id>urn:sha1:0ca88ba0d6347cf8c4ea9f264c384594f8fefa11</id>
<content type='text'>
Currently we only allocate an object and vma if we are using a GGTT
virtual HWSP, and a plain struct page for a physical HWSP. For
convenience later on with global timelines, it will be useful to always
have the status page being tracked by a struct i915_vma. Make it so.

Signed-off-by: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Reviewed-by: Matthew Auld &lt;matthew.auld@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-4-chris@chris-wilson.co.uk
</content>
</entry>
</feed>
