kernel/drivers/dma-buf/heaps/system_heap.c, branch linux-rolling-stable

dma-buf: system_heap: use larger contiguous mappings instead of per-page mmap

2025-11-21T05:18:00Z

We can allocate high-order pages, but mapping them one by one is inefficient. This patch changes the code to map as large a chunk as possible. The code looks somewhat complicated mainly because supporting mmap with a non-zero offset is a bit tricky.

Using the micro-benchmark below, we see that mmap becomes 35X faster:

    #include <fcntl.h>
    #include <linux/dma-heap.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <time.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define SIZE   (512UL * 1024 * 1024)
    #define PAGE   4096
    #define STRIDE (PAGE/sizeof(int))
    #define PAGES  (SIZE/PAGE)

    int main(void) {
        int heap = open("/dev/dma_heap/system", O_RDONLY);
        struct dma_heap_allocation_data d = {
            .len = SIZE,
            .fd_flags = O_RDWR|O_CLOEXEC
        };
        ioctl(heap, DMA_HEAP_IOCTL_ALLOC, &d);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        int *p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, d.fd, 0);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        for (int i = 0; i < PAGES; i++)
            p[i*STRIDE] = i;
        for (int i = 0; i < PAGES; i++)
            if (p[i*STRIDE] != i) {
                fprintf(stderr, "mismatch at page %d\n", i);
                exit(1);
            }

        long ns = (t1.tv_sec-t0.tv_sec)*1000000000L + (t1.tv_nsec-t0.tv_nsec);
        printf("mmap 512MB took %.3f us, verify OK\n", ns/1000.0);
        return 0;
    }

W/ patch:

    ~ # ./a.out
    mmap 512MB took 200266.000 us, verify OK
    ~ # ./a.out
    mmap 512MB took 198151.000 us, verify OK
    ~ # ./a.out
    mmap 512MB took 197069.000 us, verify OK
    ~ # ./a.out
    mmap 512MB took 196781.000 us, verify OK
    ~ # ./a.out
    mmap 512MB took 198102.000 us, verify OK
    ~ # ./a.out
    mmap 512MB took 195552.000 us, verify OK

W/o patch:

    ~ # ./a.out
    mmap 512MB took 6987470.000 us, verify OK
    ~ # ./a.out
    mmap 512MB took 6970739.000 us, verify OK
    ~ # ./a.out
    mmap 512MB took 6984383.000 us, verify OK
    ~ # ./a.out
    mmap 512MB took 6971311.000 us, verify OK
    ~ # ./a.out
    mmap 512MB took 6991680.000 us, verify OK

Signed-off-by: Barry Song
Acked-by: John Stultz
Reviewed-by: Maxime Ripard
Signed-off-by: Sumit Semwal
[sumits: correct from 3.5x to 35x]
Link: https://patch.msgid.link/20251021042022.47919-1-21cnbao@gmail.com
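The chunk-wise mapping with a non-zero offset described in this commit can be sketched in userspace. This is only an illustration under stated assumptions, not the kernel implementation: `struct chunk`, `struct map_call` and `plan_mmap()` are hypothetical names, and each recorded `map_call` merely stands for what a single call such as `remap_pfn_range()` would be asked to cover.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one scatterlist entry: a run of physically
 * contiguous pages backing part of the buffer. */
struct chunk {
    unsigned long first_pfn;  /* first page frame number of the run */
    unsigned long nr_pages;   /* length of the run in pages */
};

/* Hypothetical record of one mapping call. */
struct map_call {
    unsigned long vm_page;    /* page index inside the user mapping */
    unsigned long pfn;        /* first page frame covered by this call */
    unsigned long nr_pages;   /* number of pages covered by this call */
};

/*
 * Plan the mapping of `len` pages that starts `pgoff` pages into the buffer.
 * One call is emitted per contiguous chunk (or partial chunk) instead of one
 * per page. Returns the number of calls written to `out`.
 */
static size_t plan_mmap(const struct chunk *chunks, size_t nr_chunks,
                        unsigned long pgoff, unsigned long len,
                        struct map_call *out, size_t max_out)
{
    size_t n = 0;
    unsigned long vm_page = 0;

    for (size_t i = 0; i < nr_chunks && len > 0; i++) {
        unsigned long avail = chunks[i].nr_pages;
        unsigned long start = chunks[i].first_pfn;

        /* Skip chunks that lie entirely before the requested offset. */
        if (pgoff >= avail) {
            pgoff -= avail;
            continue;
        }

        /* The offset may land in the middle of a chunk. */
        start += pgoff;
        avail -= pgoff;
        pgoff = 0;

        unsigned long take = avail < len ? avail : len;

        assert(n < max_out);
        out[n++] = (struct map_call){ vm_page, start, take };
        vm_page += take;
        len -= take;
    }
    return n;
}
```

With chunks of 256, 16 and 1 pages, mapping all 273 pages takes three calls in this sketch rather than 273, which is the kind of saving the benchmark above measures. The offset handling (skipping whole chunks, then starting partway into one) is the part the commit message calls tricky.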

dma-buf: system_heap: No separate allocation for attachment sg_tables

2025-07-09T10:21:39Z

struct dma_heap_attachment is a separate allocation from the struct sg_table it contains, but there is no reason for this. Let's use the slab allocator just once instead of twice for dma_heap_attachment.

Signed-off-by: T.J. Mercier
Reviewed-by: Christian König
Signed-off-by: Sumit Semwal
Link: https://lore.kernel.org/r/20250417180943.1559755-1-tjmercier@google.com
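The difference between a pointed-to and an embedded table can be shown with a small userspace sketch. All types and functions here are hypothetical stand-ins (the real struct dma_heap_attachment has other fields), and `calloc()` takes the place of the slab allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal stand-in for the kernel's struct sg_table (fields are assumed). */
struct sg_table_stub {
    void *sgl;
    unsigned int nents;
};

/* Before: the attachment holds a pointer, so the table is allocated separately. */
struct attachment_ptr {
    struct sg_table_stub *table;
    int mapped;
};

/* After: the table is embedded, so one allocation covers both. */
struct attachment_embedded {
    struct sg_table_stub table;
    int mapped;
};

static int nr_allocs;  /* counts calls to the allocator below */

static void *counted_zalloc(size_t size)
{
    nr_allocs++;
    return calloc(1, size);
}

static struct attachment_ptr *attach_two_allocs(void)
{
    struct attachment_ptr *a = counted_zalloc(sizeof(*a));

    if (!a)
        return NULL;
    a->table = counted_zalloc(sizeof(*a->table));
    if (!a->table) {  /* extra failure path that the embedded form avoids */
        free(a);
        return NULL;
    }
    return a;
}

static struct attachment_embedded *attach_one_alloc(void)
{
    return counted_zalloc(sizeof(struct attachment_embedded));
}
```

Besides halving the number of allocations per attachment, the embedded form has one fewer error path and one fewer object to free on detach.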

dma-buf: heaps: system: Remove global variable

2025-04-08T09:48:39Z

The system heap is storing its struct dma_heap pointer in a global variable but isn't using it anywhere. Let's move the global variable into system_heap_create() to make it local.

Signed-off-by: Maxime Ripard
Reviewed-by: Christian König
Reviewed-by: Mattijs Korpershoek
Link: https://patchwork.freedesktop.org/patch/msgid/20250407-dma-buf-ecc-heap-v3-1-97cdd36a5f29@kernel.org
Signed-off-by: Christian König

dma-buf: heaps: Add __init to CMA and system heap module_init functions

2024-09-09T10:21:54Z

Shrink the kernel .text a bit after successful initialization of the heaps.

Signed-off-by: T.J. Mercier
Acked-by: John Stultz
Signed-off-by: Sumit Semwal
Link: https://patchwork.freedesktop.org/patch/msgid/20240906000314.2368749-1-tjmercier@google.com

dma-buf/heaps: Correct the types of fd_flags and heap_flags

2024-06-19T14:35:34Z

dma_heap_allocation_data defines the UAPI as follows:

    struct dma_heap_allocation_data {
        __u64 len;
        __u32 fd;
        __u32 fd_flags;
        __u64 heap_flags;
    };

But dma heaps are casting both fd_flags and heap_flags into unsigned long. This patch makes the dma heaps (cma heap and system heap) use types consistent with the UAPI.

Signed-off-by: Barry Song
Acked-by: John Stultz
Reviewed-by: Carlos Llamas
Signed-off-by: Sumit Semwal
Link: https://patchwork.freedesktop.org/patch/msgid/20240606020213.49854-1-21cnbao@gmail.com
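One possible consequence of the mismatched types can be sketched in userspace. This is not a bug the commit claims to have observed: the sketch assumes a 32-bit ABI where unsigned long is 32 bits wide, models it with a hypothetical `ulong32` typedef, and mirrors the UAPI struct with `<stdint.h>` types.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mirror of the UAPI layout quoted above; uint64_t/uint32_t stand
 * in for __u64/__u32. */
struct alloc_data {
    uint64_t len;
    uint32_t fd;
    uint32_t fd_flags;
    uint64_t heap_flags;
};

/* 8 + 4 + 4 + 8 bytes, with no padding between the fields. */
static_assert(sizeof(struct alloc_data) == 24, "unexpected struct layout");

/* Stand-in for `unsigned long` on a 32-bit kernel (assumption: ILP32 ABI). */
typedef uint32_t ulong32;

/* Passing heap_flags through a 32-bit unsigned long drops the upper half. */
static uint64_t through_ulong32(uint64_t heap_flags)
{
    return (ulong32)heap_flags;
}

/* Passing it through a matching 64-bit type preserves every bit. */
static uint64_t through_u64(uint64_t heap_flags)
{
    return heap_flags;
}
```

A flag value with bit 40 set would lose that bit on the 32-bit path, while the path whose type matches the UAPI keeps it.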

dma-buf/heaps: Don't assert held reservation lock for dma-buf mmapping

2023-06-21T17:22:20Z

Don't assert that the dma-buf reservation lock is held when memory-mapping an exported buffer.

We're going to change the dma-buf mmap() locking policy such that exporters will have to handle the lock. The previous locking policy caused a deadlock problem for DRM drivers in the case of self-imported dma-bufs once these drivers are moved to use the reservation lock universally. The problem is solved by moving the lock down to exporters. This patch prepares dma-buf heaps for the locking policy update.

Reviewed-by: Emil Velikov
Signed-off-by: Dmitry Osipenko
Link: https://patchwork.freedesktop.org/patch/msgid/20230529223935.2672495-3-dmitry.osipenko@collabora.com

Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2023-04-28T02:42:02Z

Pull MM updates from Andrew Morton:

- Nick Piggin's "shoot lazy tlbs" series, to improve the performance of switching from a user process to a kernel thread.
- More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
- zsmalloc performance improvements from Sergey Senozhatsky.
- Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables.
- VFS rationalizations from Christoph Hellwig:
    - removal of most of the callers of write_one_page()
    - make __filemap_get_folio()'s return value more useful
- Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use 'mount -o noswap'.
- Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits.
- Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n).
- Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultfd, permitting userspace to wr-protect anon memory unpopulated ptes.
- Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning.
- Axel Rasmussen gives userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte.
- Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness.
- Cleanups to do_fault_around() by Lorenzo Stoakes.
- Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c.
- Lorenzo Stoakes removed vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more.
- Lorenzo has also converted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases.
- Lorenzo has also contributed further cleanups of vma_merge().
- Chaitanya Prakash provides some fixes to the mmap selftesting code.
- Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults.
- Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking.
- Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads.
- Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic.
- Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used.
- Yosry Ahmed has improved the performance of memcg statistics flushing.
- David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem.
- Christoph Hellwig has provided some cleanup work to zram's IO-related code paths.
- David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing.
- Pankaj Raghav has made page_endio() unneeded and has removed it.
- Peter Xu contributed some rationalizations of the userfaultfd selftests.
- Yosry Ahmed has fixed an issue around memcg's page reclaim accounting.
- Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code.
- Longlong Xia has improved the way in which KSM handles hwpoisoned pages.
- Peter Xu fixes a few issues with uffd-wp at fork() time.
- Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis.

* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
  shmem: restrict noswap option to initial user namespace
  mm/khugepaged: fix conflicting mods to collapse_file()
  sparse: remove unnecessary 0 values from rc
  mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
  hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
  maple_tree: fix allocation in mas_sparse_area()
  mm: do not increment pgfault stats when page fault handler retries
  zsmalloc: allow only one active pool compaction context
  selftests/mm: add new selftests for KSM
  mm: add new KSM process and sysfs knobs
  mm: add new api to enable ksm per process
  mm: shrinkers: fix debugfs file permissions
  mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
  migrate_pages_batch: fix statistics for longterm pin retry
  userfaultfd: use helper function range_in_vma()
  lib/show_mem.c: use for_each_populated_zone() simplify code
  mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
  fs/buffer: convert create_page_buffers to folio_create_buffers
  fs/buffer: add folio_create_empty_buffers helper
  ...

dma-buf: heaps: remove MODULE_LICENSE in non-modules

2023-04-13T20:13:52Z

Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message.

So remove it in the files in this commit, none of which can be built as modules.

Signed-off-by: Nick Alcock
Suggested-by: Luis Chamberlain
Cc: Luis Chamberlain
Cc: linux-modules@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Hitomi Hasegawa
Cc: Sumit Semwal
Cc: "Christian König"
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Luis Chamberlain

dma-buf: system_heap: avoid reclaim for order 4

2023-03-28T23:20:12Z

Using order 4 pages would be helpful for IOMMU mapping, but trying to get order 4 pages could take quite a long time in page allocation. From the perspective of responsiveness, deterministic memory allocation speed is, I think, quite important.

An order 4 allocation with __GFP_RECLAIM may spend much time in the reclaim and compaction logic. __GFP_NORETRY may also have an effect. These cause unpredictable delays.

To get reasonable allocation speed from the dma-buf system heap, use HIGH_ORDER_GFP for order 4 to avoid reclaim. Also remove the meaningless __GFP_COMP for order 0.

According to my tests, order 4 with MID_ORDER_GFP could get more order 4 pages, but the elapsed times could be very slow.

            time   order 8   order 4   order 0
        584 usec         0       160         0
     28,428 usec         0       160         0
    100,701 usec         0       160         0
     76,645 usec         0       160         0
     25,522 usec         0       160         0
     38,798 usec         0       160         0
     89,012 usec         0       160         0
     23,015 usec         0       160         0
     73,360 usec         0       160         0
     76,953 usec         0       160         0
     31,492 usec         0       160         0
     75,889 usec         0       160         0
     84,551 usec         0       160         0
     84,352 usec         0       160         0
     57,103 usec         0       160         0
     93,452 usec         0       160         0

If HIGH_ORDER_GFP is used for order 4, the number of order 4 pages could decrease, but the elapsed times were quite stable and fast enough.

            time   order 8   order 4   order 0
      1,356 usec         0       155        80
      1,901 usec         0        11      2384
      1,912 usec         0         0      2560
      1,911 usec         0         0      2560
      1,884 usec         0         0      2560
      1,577 usec         0         0      2560
      1,366 usec         0         0      2560
      1,711 usec         0         0      2560
      1,635 usec         0        28      2112
        544 usec        10         0         0
        633 usec         2       128         0
        848 usec         0       160         0
        729 usec         0       160         0
      1,000 usec         0       160         0
      1,358 usec         0       160         0
      2,638 usec         0        31      2064

Link: https://lkml.kernel.org/r/20230303050332.10138-1-jaewon31.kim@samsung.com
Signed-off-by: Jaewon Kim
Reviewed-by: John Stultz
Cc: Daniel Vetter
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Sumit Semwal
Cc: T.J. Mercier
Signed-off-by: Andrew Morton
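The fallback behaviour behind these tables can be sketched in userspace. This is a simplified model, not the heap's code: `struct free_blocks`, `try_alloc()` and `fill_buffer()` are hypothetical, the order list {8, 4, 0} is taken from the table columns, and "fail fast" stands for an allocation that does not enter reclaim or compaction. The 2560-page total used in the example matches every row above (160 order-4 blocks of 16 pages each).

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ORDERS 3
static const unsigned int orders[NUM_ORDERS] = { 8, 4, 0 };

/* Hypothetical allocator state: how many free blocks of each order can be
 * handed out immediately, without reclaim or compaction. */
struct free_blocks {
    unsigned long avail[NUM_ORDERS];
};

/* Try to take one block of orders[idx]; fail fast instead of reclaiming. */
static int try_alloc(struct free_blocks *fb, size_t idx)
{
    if (fb->avail[idx] == 0)
        return 0;
    fb->avail[idx]--;
    return 1;
}

/*
 * Fill a buffer of `nr_pages` pages, trying the largest order that still
 * fits and falling back to smaller orders on failure. counts[i] receives the
 * number of blocks of orders[i] that were used.
 * Returns 1 on success, 0 if even an order-0 allocation failed.
 */
static int fill_buffer(struct free_blocks *fb, unsigned long nr_pages,
                       unsigned long counts[NUM_ORDERS])
{
    size_t max_idx = 0;  /* do not retry orders above the last success */

    for (size_t i = 0; i < NUM_ORDERS; i++)
        counts[i] = 0;

    while (nr_pages > 0) {
        size_t i;

        for (i = max_idx; i < NUM_ORDERS; i++) {
            unsigned long block = 1UL << orders[i];

            if (nr_pages < block)
                continue;   /* block is larger than what is left */
            if (!try_alloc(fb, i))
                continue;   /* fast failure, try a smaller order */
            counts[i]++;
            nr_pages -= block;
            max_idx = i;
            break;
        }
        if (i == NUM_ORDERS)
            return 0;
    }
    return 1;
}
```

With 11 order-4 blocks immediately available, the sketch yields 11 order-4 blocks plus 2384 order-0 pages, the same split as the second row of the HIGH_ORDER_GFP table. The trade-off the commit describes is that each failed high-order attempt costs almost nothing, at the price of a more fragmented buffer.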

dma-buf/heaps: Assert held reservation lock for dma-buf mmapping

2022-11-11T20:49:51Z

When userspace mmaps a dma-buf's fd, the dma-buf reservation lock must be held. Add locking sanity checks to the dma-buf mmapping callbacks to ensure that the locking assumptions won't regress in the future.

Suggested-by: Daniel Vetter
Signed-off-by: Dmitry Osipenko
Acked-by: Christian König
Link: https://patchwork.freedesktop.org/patch/msgid/20221110201349.351294-5-dmitry.osipenko@collabora.com