<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/include/rdma/ib_umem.h, branch linux-3.17.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-3.17.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-3.17.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2014-09-19T16:55:42Z</updated>
<entry>
<title>IB: ib_umem_release() should decrement mm-&gt;pinned_vm from ib_umem_get</title>
<updated>2014-09-19T16:55:42Z</updated>
<author>
<name>Shawn Bohrer</name>
<email>sbohrer@rgmadvisors.com</email>
</author>
<published>2014-09-03T17:13:57Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=87773dd56d5405ac28119fcfadacefd35877c18f'/>
<id>urn:sha1:87773dd56d5405ac28119fcfadacefd35877c18f</id>
<content type='text'>
In debugging an application that receives -ENOMEM from ib_reg_mr(), I
found that ib_umem_get() can fail because the pinned_vm count has
wrapped causing it to always be larger than the lock limit even with
RLIMIT_MEMLOCK set to RLIM_INFINITY.

The wrapping of pinned_vm occurs because the process that calls
ib_reg_mr() will have its mm-&gt;pinned_vm count incremented.  Later a
different process with a different mm_struct than the one that
allocated the ib_umem struct ends up releasing it which results in
decrementing the new processes mm-&gt;pinned_vm count past zero and
wrapping.

I'm not entirely sure what circumstances cause a different process to
release the ib_umem than the one that allocated it but the kernel
stack trace of the freeing process from my situation looks like the
following:

    Call Trace:
     [&lt;ffffffff814d64b1&gt;] dump_stack+0x19/0x1b
     [&lt;ffffffffa0b522a5&gt;] ib_umem_release+0x1f5/0x200 [ib_core]
     [&lt;ffffffffa0b90681&gt;] mlx4_ib_destroy_qp+0x241/0x440 [mlx4_ib]
     [&lt;ffffffffa0b4d93c&gt;] ib_destroy_qp+0x12c/0x170 [ib_core]
     [&lt;ffffffffa0cc7129&gt;] ib_uverbs_close+0x259/0x4e0 [ib_uverbs]
     [&lt;ffffffff81141cba&gt;] __fput+0xba/0x240
     [&lt;ffffffff81141e4e&gt;] ____fput+0xe/0x10
     [&lt;ffffffff81060894&gt;] task_work_run+0xc4/0xe0
     [&lt;ffffffff810029e5&gt;] do_notify_resume+0x95/0xa0
     [&lt;ffffffff814e3dd0&gt;] int_signal+0x12/0x17

The following patch fixes the issue by storing the pid struct of the
process that calls ib_umem_get() so that ib_umem_release and/or
ib_umem_account() can properly decrement the pinned_vm count of the
correct mm_struct.

Signed-off-by: Shawn Bohrer &lt;sbohrer@rgmadvisors.com&gt;
Reviewed-by: Shachar Raindel &lt;raindel@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
</entry>
<entry>
<title>IB: Refactor umem to use linear SG table</title>
<updated>2014-03-04T18:34:28Z</updated>
<author>
<name>Yishai Hadas</name>
<email>yishaih@mellanox.com</email>
</author>
<published>2014-01-28T11:40:15Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=eeb8461e36c99fdf2d058751be924a2aab215005'/>
<id>urn:sha1:eeb8461e36c99fdf2d058751be924a2aab215005</id>
<content type='text'>
This patch refactors the IB core umem code and vendor drivers to use a
linear (chained) SG table instead of chunk list.  With this change the
relevant code becomes clearer—no need for nested loops to build and
use umem.

Signed-off-by: Shachar Raindel &lt;raindel@mellanox.com&gt;
Signed-off-by: Yishai Hadas &lt;yishaih@mellanox.com&gt;
Signed-off-by: Roland Dreier &lt;roland@purestorage.com&gt;
</content>
</entry>
<entry>
<title>IB: expand ib_umem_get() prototype</title>
<updated>2008-04-29T15:06:12Z</updated>
<author>
<name>Arthur Kepner</name>
<email>akepner@sgi.com</email>
</author>
<published>2008-04-29T08:00:34Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=cb9fbc5c37b69ac584e61d449cfd590f5ae1f90d'/>
<id>urn:sha1:cb9fbc5c37b69ac584e61d449cfd590f5ae1f90d</id>
<content type='text'>
Add a new parameter, dmasync, to the ib_umem_get() prototype.  Use dmasync = 1
when mapping user-allocated CQs with ib_umem_get().

Signed-off-by: Arthur Kepner &lt;akepner@sgi.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Jesse Barnes &lt;jbarnes@virtuousgeek.org&gt;
Cc: Jes Sorensen &lt;jes@sgi.com&gt;
Cc: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Cc: Roland Dreier &lt;rdreier@cisco.com&gt;
Cc: James Bottomley &lt;James.Bottomley@HansenPartnership.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Grant Grundler &lt;grundler@parisc-linux.org&gt;
Cc: Michael Ellerman &lt;michael@ellerman.id.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>IB/umem: Add hugetlb flag to struct ib_umem</title>
<updated>2007-10-10T02:59:13Z</updated>
<author>
<name>Joachim Fenkes</name>
<email>fenkes@de.ibm.com</email>
</author>
<published>2007-09-13T16:15:28Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c8d8beea0383e47c9d65d45f0ca95626ec435fcd'/>
<id>urn:sha1:c8d8beea0383e47c9d65d45f0ca95626ec435fcd</id>
<content type='text'>
During ib_umem_get(), determine whether all pages from the memory
region are hugetlb pages and report this in the "hugetlb" member.
Low-level drivers can use this information if they need it.

Signed-off-by: Joachim Fenkes &lt;fenkes@de.ibm.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
</entry>
<entry>
<title>Detach sched.h from mm.h</title>
<updated>2007-05-21T16:18:19Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2007-05-20T21:22:52Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e8edc6e03a5c8562dc70a6d969f732bdb355a7e7'/>
<id>urn:sha1:e8edc6e03a5c8562dc70a6d969f732bdb355a7e7</id>
<content type='text'>
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.

This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
   getting them indirectly

Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
   they don't need sched.h
b) sched.h stops being dependency for significant number of files:
   on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
   after patch it's only 3744 (-8.3%).

Cross-compile tested on

	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
	alpha alpha-up
	arm
	i386 i386-up i386-defconfig i386-allnoconfig
	ia64 ia64-up
	m68k
	mips
	parisc parisc-up
	powerpc powerpc-up
	s390 s390-up
	sparc sparc-up
	sparc64 sparc64-up
	um-x86_64
	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>IB: Put rlimit accounting struct in struct ib_umem</title>
<updated>2007-05-09T01:00:37Z</updated>
<author>
<name>Roland Dreier</name>
<email>rolandd@cisco.com</email>
</author>
<published>2007-04-19T03:20:28Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1bf66a30421ca772820f489d88c16d0c430d6a67'/>
<id>urn:sha1:1bf66a30421ca772820f489d88c16d0c430d6a67</id>
<content type='text'>
When memory pinned with ib_umem_get() is released, ib_umem_release()
needs to subtract the amount of memory being unpinned from
mm-&gt;locked_vm.  However, ib_umem_release() may be called with
mm-&gt;mmap_sem already held for writing if the memory is being released
as part of an munmap() call, so it is sometimes necessary to defer
this accounting into a workqueue.

However, the work struct used to defer this accounting is dynamically
allocated before it is queued, so there is the possibility of failing
that allocation.  If the allocation fails, then ib_umem_release has no
choice except to bail out and leave the process with a permanently
elevated locked_vm.

Fix this by allocating the structure to defer accounting as part of
the original struct ib_umem, so there's no possibility of failing a
later allocation if creating the struct ib_umem and pinning memory
succeeds.

Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
</entry>
<entry>
<title>IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules</title>
<updated>2007-05-09T01:00:37Z</updated>
<author>
<name>Roland Dreier</name>
<email>rolandd@cisco.com</email>
</author>
<published>2007-03-05T00:15:11Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f7c6a7b5d59980b076abbf2ceeb8735591290285'/>
<id>urn:sha1:f7c6a7b5d59980b076abbf2ceeb8735591290285</id>
<content type='text'>
Export ib_umem_get()/ib_umem_release() and put low-level drivers in
control of when to call ib_umem_get() to pin and DMA map userspace,
rather than always calling it in ib_uverbs_reg_mr() before calling the
low-level driver's reg_user_mr method.

Also move these functions to be in the ib_core module instead of
ib_uverbs, so that driver modules using them do not depend on
ib_uverbs.

This has a number of advantages:
 - It is better design from the standpoint of making generic code a
   library that can be used or overridden by device-specific code as
   the details of specific devices dictate.
 - Drivers that do not need to pin userspace memory regions do not
   need to take the performance hit of calling ib_mem_get().  For
   example, although I have not tried to implement it in this patch,
   the ipath driver should be able to avoid pinning memory and just
   use copy_{to,from}_user() to access userspace memory regions.
 - Buffers that need special mapping treatment can be identified by
   the low-level driver.  For example, it may be possible to solve
   some Altix-specific memory ordering issues with mthca CQs in
   userspace by mapping CQ buffers with extra flags.
 - Drivers that need to pin and DMA map userspace memory for things
   other than memory regions can use ib_umem_get() directly, instead
   of hacks using extra parameters to their reg_phys_mr method.  For
   example, the mlx4 driver that is pending being merged needs to pin
   and DMA map QP and CQ buffers, but it does not need to create a
   memory key for these buffers.  So the cleanest solution is for mlx4
   to call ib_umem_get() in the create_qp and create_cq methods.

Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
</entry>
</feed>
