kernel/drivers/gpu/drm/i915/i915_gem_execbuffer.c, branch linux-4.1.y

drm/i915: Always mark the object as dirty when used by the GPU

2015-09-21T17:05:27Z

commit 51bc140431e233284660b1d22c47dec9ecdb521e upstream. There have been many hard to track down bugs whereby userspace forgot to flag a write buffer and then cause graphics corruption or a hung GPU when that buffer was later purged under memory pressure (as the buffer appeared clean, its pages would have been evicted rather than preserved and any changes more recent than in the backing storage would be lost). In retrospect this is a rare optimisation against memory pressure, already the slow path. If we always mark the buffer as dirty when accessed by the GPU, anything not used can still be evicted cheaply (ideal behaviour for mark-and-sweep eviction) but we do not run the risk of corruption. For correct read serialisation, userspace still has to notify when the GPU writes to an object. However, there are certain situations under which userspace may wish to tell white lies to the kernel... Signed-off-by: Chris Wilson Cc: Daniel Vetter Cc: Kristian Høgsberg Cc: Jesse Barnes Cc: "Goel, Akash" Cc: Michał Winiarski Reviewed-by: Daniel Vetter Signed-off-by: Jani Nikula Signed-off-by: Greg Kroah-Hartman

drm/i915: Skip allocating shadow batch for 0-length batches

2015-03-27T14:28:41Z

Since commit 17cabf571e50677d980e9ab2a43c5f11213003ae Author: Chris Wilson Date: Wed Jan 14 11:20:57 2015 +0000 drm/i915: Trim the command parser allocations we may then try to allocate a zero-sized object and attempt to extract its pages. Understandably this fails. Testcase: igt/gem_exec_nop #ivb,byt,hsw Signed-off-by: Chris Wilson Signed-off-by: Daniel Vetter

drm/i915: Track page table reload need

2015-03-20T10:48:18Z

This patch was formerly known as, "Force pd restore when PDEs change, gen6-7." I had to change the name because it is needed for GEN8 too. The real issue this is trying to solve is when a new object is mapped into the current address space. The GPU does not snoop the new mapping so we must do the gen specific action to reload the page tables. GEN8 and GEN7 do differ in the way they load page tables for the RCS. GEN8 does so with the context restore, while GEN7 requires the proper load commands in the command streamer. Non-render is similar for both. Caveat for GEN7 The docs say you cannot change the PDEs of a currently running context. We never map new PDEs of a running context, and expect them to be present - so I think this is okay. (We can unmap, but this should also be okay since we only unmap unreferenced objects that the GPU shouldn't be tryingto va->pa xlate.) The MI_SET_CONTEXT command does have a flag to signal that even if the context is the same, force a reload. It's unclear exactly what this does, but I have a hunch it's the right thing to do. The logic assumes that we always emit a context switch after mapping new PDEs, and before we submit a batch. This is the case today, and has been the case since the inception of hardware contexts. A note in the comment let's the user know. It's not just for gen8. If the current context has mappings change, we need a context reload to switch v2: Rebased after ppgtt clean up patches. Split the warning for aliasing and true ppgtt options. And do not break aliasing ppgtt, where to->ppgtt is always null. v3: Invalidate PPGTT TLBs inside alloc_va_range. v4: Rename ppgtt_invalidate_tlbs to mark_tlbs_dirty and move pd_dirty_rings from i915_address_space to i915_hw_ppgtt. Fixes when neither ctx->ppgtt and aliasing_ppgtt exist. v5: Removed references to teardown_va_range. v6: Updated needs_pd_load_pre/post. v7: Fix pd_dirty_rings check in needs_pd_load_post, and update/move comment about updated PDEs to object_pin/bind (Mika). Cc: Mika Kuoppala Signed-off-by: Ben Widawsky Signed-off-by: Michel Thierry (v2+) Reviewed-by: Mika Kuoppala Signed-off-by: Daniel Vetter

drm/i915: Fallback to using CPU relocations for large batch buffers

2015-03-20T10:48:13Z

If the batch buffer is too large to fit into the aperture and we need a GTT mapping for relocations, we currently fail. This only applies to a subset of machines for a subset of environments, quite undesirable. We can simply check after failing to insert the batch into the GTT as to whether we only need a mappable binding for relocation and, if so, we can revert to using a non-mappable binding and an alternate relocation method. However, using relocate_entry_cpu() is excruciatingly slow for large buffers on non-LLC as the entire buffer requires clflushing before and after the relocation handling. Alternatively, we can implement a third relocation method that only clflushes around the relocation entry. This is still slower than updating through the GTT, so we prefer using the GTT where possible, but is orders of magnitude faster as we typically do not have to then clflush the entire buffer. An alternative idea of using a temporary WC mapping of the backing store is promising (it should be faster than using the GTT itself), but requires fairly extensive arch/x86 support - along the lines of kmap_atomic_prof_pfn() (which is not universally implemented even for x86). Testcase: igt/gem_exec_big #pnv,byt Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88392 Signed-off-by: Chris Wilson [danvet: Add a WARN_ONCE for the impossible reloc case and explain in a short comment why we want to avoid ping-pong.] Signed-off-by: Daniel Vetter

drm/i915: pass which operation triggered the frontbuffer tracking

2015-03-17T21:29:51Z

We want to port FBC to the frontbuffer tracking infrastructure, but for that we need to know what caused the object invalidation so we can react accordingly: CPU mmaps need manual, GTT mmaps and flips don't need handling and ring rendering needs nukes. v2: - s/ORIGIN_RENDER/ORIGIN_CS/ (Daniel, Rodrigo) - Fix copy/pasted wrong documentation - Rebase v3: - Rebase v4: - Don't pass the operation to flushes (Daniel). Signed-off-by: Paulo Zanoni Reviewed-by: Rodrigo Vivi Signed-off-by: Daniel Vetter

drm/i915: Fix trivial typos in comments and warning message

2015-03-17T21:29:48Z

Change 'mutliple' to 'multiple' Change 'mutlipler' to 'multiplier' Change 'Haswel' to 'Haswell' Signed-off-by: Yannick Guerrini Signed-off-by: Daniel Vetter

drm/i915: Rename 'flags' to 'dispatch_flags' for better code reading

2015-02-25T21:43:29Z

There is a flags word that is passed through the execbuffer code path all the way from initial decoding of the user parameters down to the very final dispatch buffer call. It is simply called 'flags'. Unfortuantely, there are many other flags words floating around in the same blocks of code. Even more once the GPU scheduler arrives. This patch makes it more obvious exactly which flags word is which by renaming 'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word already have an 'I915_DISPATCH_' prefix on them and so are not quite so ambiguous. OTC-Jira: VIZ-1587 Signed-off-by: John Harrison [danvet: Resolve conflict with Chris' rework of the bb parsing.] Signed-off-by: Daniel Vetter

drm/i915: Trim the command parser allocations

2015-02-23T16:07:40Z

Currently, the command parser tries to create a secondary batch exactly as large as the original, and vmap both. This is open to abuse by userspace using extremely large batch objects, but only executing very short batches. For example, this would be if userspace were to implement a command submission ringbuffer. However, we only need to allocate pages for just the contents of the command sequence in the batch - all relocations copied to the secondary batch will reference the original batch and so there can be no access to the secondary batch outside of the explicit execution region. Testcase: igt/gem_exec_big #ivb,byt,hsw Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88308 Signed-off-by: Chris Wilson Reviewed-by: John Harrison Signed-off-by: Daniel Vetter

drm/i915: Specify bsd rings through exec flag

2015-01-27T08:51:05Z

On Skylake GT3 we have 2 Video Command Streamers (VCS), which is asymmetrical. For example, HEVC GPU commands can be only dispatched to VCS1 ring. But userspace has no control when using VCS1 or VCS2. This patch introduces a mechanism to avoid the default ping-pong mode and use one specific ring through execution flag. This mechanism is usable for all the platforms with 2 VCS rings. The open source usage is from these two commits in vaapi/intel: commit 702050f04131a44ef8ac16651708ce8a8d98e4b8 Author: Zhao, Yakui Date: Mon Nov 17 12:44:19 2014 +0800 Allow the batchbuffer to be submitted with override flag commit a56efcdf27d11ad9b21664b4a2cda72d7f90f5a8 Author: Zhao Yakui Date: Mon Nov 17 12:44:22 2014 +0800 Add the override flag to assure that HEVC video command always uses BSD ring0 for SKL GT3 machine v2: fix whitespace (Rodrigo) v3: remove incorrect chunk that came on -collector rebase. (Rodrigo) v4: change the comment (Zhipeng) v5: address Daniel's comment (Zhipeng) Signed-off-by: Zhipeng Gong Reviewed-by: Rodrigo Vivi Signed-off-by: Daniel Vetter

Merge tag 'topic/i915-hda-componentized-2015-01-12' into drm-intel-next-queued

2015-01-12T22:07:46Z

Conflicts: drivers/gpu/drm/i915/intel_runtime_pm.c Separate branch so that Takashi can also pull just this refactoring into sound-next. Signed-off-by: Daniel Vetter