<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/mm/truncate.c, branch linux-4.10.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-4.10.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-4.10.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2017-05-20T12:31:02Z</updated>
<entry>
<title>mm: fix data corruption due to stale mmap reads</title>
<updated>2017-05-20T12:31:02Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2017-05-12T22:46:50Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=717946b469cfafc9f31ebfcefa34a18ff369fc2e'/>
<id>urn:sha1:717946b469cfafc9f31ebfcefa34a18ff369fc2e</id>
<content type='text'>
commit cd656375f94632d7b5af57bf67b7b5c0270c591c upstream.

Currently, we don't invalidate page tables during invalidate_inode_pages2()
for DAX.  That can result in e.g. a 2MiB zero page being mapped into
page tables while there are already underlying blocks allocated, and
thus data seen through mmap differs from data seen by read(2).
The following sequence reproduces the problem (a rough userspace sketch
follows the list):

 - open an mmap over a 2MiB hole

 - read from a 2MiB hole, faulting in a 2MiB zero page

 - write to the hole with write(2). The write succeeds but we
   incorrectly leave the 2MiB zero page mapping intact.

 - via the mmap, read the data that was just written. Since the zero
   page mapping is still intact we read back zeroes instead of the new
   data.
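
For illustration, a rough userspace reproducer of this sequence could look
like the following (illustrative sketch only, not the t_mmap_stale test;
the path and the assumption of a PMD-aligned 2MiB hole are placeholders):

#include &lt;fcntl.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;sys/mman.h&gt;
#include &lt;unistd.h&gt;

#define HOLE_SZ	(2UL &lt;&lt; 20)	/* 2MiB */

int main(void)
{
	char buf[4096];
	/* placeholder path: any file on a DAX-mounted filesystem */
	int fd = open("/mnt/dax/testfile", O_CREAT | O_RDWR, 0644);
	char *map;

	if (fd &lt; 0 || ftruncate(fd, HOLE_SZ))
		return 1;

	/* mmap over the 2MiB hole */
	map = mmap(NULL, HOLE_SZ, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return 1;

	/* read through the mapping, faulting in the zero page */
	memcpy(buf, map, sizeof(buf));

	/* write(2) allocates blocks but (on buggy kernels) leaves the
	 * zero page mapping intact */
	memset(buf, 0xab, sizeof(buf));
	if (pwrite(fd, buf, sizeof(buf), 0) != sizeof(buf))
		return 1;

	/* on affected kernels this prints 0x00 instead of 0xab */
	printf("mmap sees 0x%02x, expected 0xab\n", (unsigned char)map[0]);
	return 0;
}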

Fix the problem by unconditionally calling invalidate_inode_pages2_range()
in dax_iomap_actor() for new block allocations and by properly
invalidating page tables in invalidate_inode_pages2_range() for DAX
mappings.
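
The mm/truncate.c side of this is roughly the following (a simplified
sketch, not the literal diff): once the radix tree entries in the range
have been invalidated, also zap any page table entries covering the range
so the next access refaults and sees the newly allocated blocks.

	/*
	 * Sketch: regular page cache pages are unmapped individually above,
	 * but DAX entries have no struct page, so drop the whole range.
	 */
	if (dax_mapping(mapping))
		unmap_mapping_range(mapping,
				    (loff_t)start &lt;&lt; PAGE_SHIFT,
				    (loff_t)(end - start + 1) &lt;&lt; PAGE_SHIFT, 0);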

Fixes: c6dcf52c23d2 ("mm: Invalidate DAX radix tree entries only if appropriate")
Link: http://lkml.kernel.org/r/20170510085419.27601-3-jack@suse.cz
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dax: prevent invalidation of mapped DAX entries</title>
<updated>2017-05-20T12:31:02Z</updated>
<author>
<name>Ross Zwisler</name>
<email>ross.zwisler@linux.intel.com</email>
</author>
<published>2017-05-12T22:46:47Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=35223d76e2cf6366be1d281548f099c345ace3f1'/>
<id>urn:sha1:35223d76e2cf6366be1d281548f099c345ace3f1</id>
<content type='text'>
commit 4636e70bb0a8b871998b6841a2e4b205cf2bc863 upstream.

Patch series "mm,dax: Fix data corruption due to mmap inconsistency",
v4.

This series fixes data corruption that can happen for DAX mounts when
page faults race with write(2) and as a result page tables get out of
sync with block mappings in the filesystem and thus data seen through
mmap is different from data seen through read(2).

The series passes testing with the t_mmap_stale test program from Ross
and also other mmap-related tests on a DAX filesystem.

This patch (of 4):

dax_invalidate_mapping_entry() currently removes DAX exceptional entries
only if they are clean and unlocked.  This is done via:

  invalidate_mapping_pages()
    invalidate_exceptional_entry()
      dax_invalidate_mapping_entry()

However, for page cache pages removed in invalidate_mapping_pages()
there is an additional criterion: the page must not be mapped.  This is
noted in the comments above invalidate_mapping_pages() and is checked
in invalidate_inode_page().
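
For regular page cache pages that check is straightforward; as of this
series, invalidate_inode_page() in mm/truncate.c looks roughly like this
(lightly abridged sketch):

int invalidate_inode_page(struct page *page)
{
	struct address_space *mapping = page_mapping(page);

	if (!mapping)
		return 0;
	if (PageDirty(page) || PageWriteback(page))
		return 0;
	/* the additional criterion: mapped pages are left alone */
	if (page_mapped(page))
		return 0;
	return invalidate_complete_page(mapping, page);
}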

For DAX entries this means that we can end up in a situation where a
DAX exceptional entry, either a huge zero page or a regular DAX entry,
could end up mapped but without an associated radix tree entry.  This is
inconsistent with the rest of the DAX code and with what happens in the
page cache case.

We aren't able to unmap the DAX exceptional entry because, per its
comments, invalidate_mapping_pages() isn't allowed to block, and
unmap_mapping_range() takes a write lock on the mapping-&gt;i_mmap_rwsem.

Since we essentially never have unmapped DAX entries to evict from the
radix tree, just remove dax_invalidate_mapping_entry().

Fixes: c6dcf52c23d2 ("mm: Invalidate DAX radix tree entries only if appropriate")
Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.cz
Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Reported-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: Invalidate DAX radix tree entries only if appropriate</title>
<updated>2016-12-27T04:29:24Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-08-10T15:22:44Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c6dcf52c23d2d3fb5235cec42d7dd3f786b87d55'/>
<id>urn:sha1:c6dcf52c23d2d3fb5235cec42d7dd3f786b87d55</id>
<content type='text'>
Currently invalidate_inode_pages2_range() and invalidate_mapping_pages()
just delete all exceptional radix tree entries they find.  For DAX this
is not desirable, as we track cache dirtiness in these entries and when
they are evicted we may fail to flush caches even though it is
necessary.  This can manifest, for example, when we write to the same
block both via mmap and via write(2) (to different offsets) and fsync(2)
then does not properly flush CPU caches because the modification via
write(2) was the last one.

Create appropriate DAX functions to handle invalidation of DAX entries
for invalidate_inode_pages2_range() and invalidate_mapping_pages() and
wire them up into the corresponding mm functions.
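
The core of the invalidate_inode_pages2_range() flavour boils down to the
following (simplified sketch with an illustrative name and without the DAX
entry locking; see fs/dax.c for the real helpers):

/* sketch only: the real helper also serializes against the entry lock */
static int dax_invalidate_entry(struct address_space *mapping,
				pgoff_t index, bool trunc)
{
	int ret = 0;
	void *entry;

	spin_lock_irq(&amp;mapping-&gt;tree_lock);
	entry = radix_tree_lookup(&amp;mapping-&gt;page_tree, index);
	if (!entry || !radix_tree_exceptional_entry(entry))
		goto out;
	/* keep dirty entries so fsync(2) can still find and flush them */
	if (!trunc &amp;&amp;
	    (radix_tree_tag_get(&amp;mapping-&gt;page_tree, index,
				PAGECACHE_TAG_DIRTY) ||
	     radix_tree_tag_get(&amp;mapping-&gt;page_tree, index,
				PAGECACHE_TAG_TOWRITE)))
		goto out;
	radix_tree_delete(&amp;mapping-&gt;page_tree, index);
	mapping-&gt;nrexceptional--;
	ret = 1;
out:
	spin_unlock_irq(&amp;mapping-&gt;tree_lock);
	return ret;
}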

Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>mm: workingset: move shadow entry tracking to radix tree exceptional tracking</title>
<updated>2016-12-13T02:55:08Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2016-12-13T00:43:52Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=14b468791fa955d442f962fdf5207dfd39a131c8'/>
<id>urn:sha1:14b468791fa955d442f962fdf5207dfd39a131c8</id>
<content type='text'>
Currently, we track the shadow entries in the page cache in the upper
bits of the radix_tree_node-&gt;count, behind the back of the radix tree
implementation.  Because the radix tree code has no awareness of them,
we rely on random subtleties throughout the implementation (such as the
node-&gt;count != 1 check in the shrinking code, which is meant to exclude
multi-entry nodes but also happens to skip nodes with only one shadow
entry, as that's accounted in the upper bits).  This is error prone and
has, in fact, caused the bug fixed in d3798ae8c6f3 ("mm: filemap: don't
plant shadow entries without radix tree node").

To remove these subtleties, this patch moves shadow entry tracking from
the upper bits of node-&gt;count to the existing counter for exceptional
entries.  node-&gt;count goes back to being a simple counter of valid
entries in the tree node and can be shrunk to a single byte.

This vastly simplifies the page cache code.  All accounting happens
natively inside the radix tree implementation, and maintaining the LRU
linkage of shadow nodes is consolidated into a single function in the
workingset code that is called for leaf nodes affected by a change in
the page cache tree.
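
That consolidated function ends up looking roughly like this (simplified
sketch of the update callback; the 'private' argument is the mapping and
is unused here):

static void workingset_update_node(struct radix_tree_node *node,
				   void *private)
{
	/* a node holding only shadow entries goes on the shrinker LRU */
	if (node-&gt;count &amp;&amp; node-&gt;count == node-&gt;exceptional) {
		if (list_empty(&amp;node-&gt;private_list))
			list_lru_add(&amp;shadow_nodes, &amp;node-&gt;private_list);
	} else {
		if (!list_empty(&amp;node-&gt;private_list))
			list_lru_del(&amp;shadow_nodes, &amp;node-&gt;private_list);
	}
}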

This also removes the last user of the __radix_tree_delete_node() return
value.  Eliminate it.

Link: http://lkml.kernel.org/r/20161117193211.GE23430@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@linuxonhyperv.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>lib: radix-tree: check accounting of existing slot replacement users</title>
<updated>2016-12-13T02:55:08Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2016-12-13T00:43:43Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=6d75f366b9242f9b17ed7d0b0604d7460f818f21'/>
<id>urn:sha1:6d75f366b9242f9b17ed7d0b0604d7460f818f21</id>
<content type='text'>
The bug in khugepaged fixed earlier in this series shows that radix tree
slot replacement is fragile; and it will become more so when not only
NULL&lt;-&gt;!NULL transitions need to be caught but transitions from and to
exceptional entries as well.  We need checks.

Re-implement radix_tree_replace_slot() on top of the sanity-checked
__radix_tree_replace().  This requires existing callers to also pass the
radix tree root, but it'll warn us when somebody replaces slots with
contents that need proper accounting (transitions between NULL entries,
real entries, exceptional entries) and where a replacement through the
slot pointer would corrupt the radix tree node counts.
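
The accounting itself reduces to comparing the "type" of the old and new
values; a sketch of that helper (close to, but not literally, the
lib/radix-tree.c code):

static void replace_slot(struct radix_tree_node *node, void **slot,
			 void *item, bool warn_typeswitch)
{
	void *old = rcu_dereference_raw(*slot);
	int count = !!item - !!old;
	int exceptional = !!radix_tree_exceptional_entry(item) -
			  !!radix_tree_exceptional_entry(old);

	/* unchecked callers must not change the type of an entry */
	WARN_ON_ONCE(warn_typeswitch &amp;&amp; (count || exceptional));

	if (node) {
		node-&gt;count += count;
		node-&gt;exceptional += exceptional;
	}
	rcu_assign_pointer(*slot, item);
}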

Link: http://lkml.kernel.org/r/20161117193021.GB23430@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Suggested-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Matthew Wilcox &lt;mawilcox@linuxonhyperv.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: fix false-positive WARN_ON() in truncate/invalidate for hugetlb</title>
<updated>2016-12-01T00:32:52Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-11-30T23:54:19Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5cbc198ae08d84bd416b672ad8bd1222acd0855c'/>
<id>urn:sha1:5cbc198ae08d84bd416b672ad8bd1222acd0855c</id>
<content type='text'>
Hugetlb pages have -&gt;index in units of the huge page size (PMD_SIZE or
PUD_SIZE), not in PAGE_SIZE units like other page types.  This means we
cannot use page_to_pgoff() to check whether we've got the right page
for the radix-tree index.

Let's introduce page_to_index(), which returns the radix-tree index for
a given page.

We will be able to get rid of this once hugetlb is switched to
multi-order entries.
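
A simplified sketch of the resulting pair of helpers (abridged from
include/linux/pagemap.h):

/* radix-tree index of a page: tail pages derive it from the head page */
static inline pgoff_t page_to_index(struct page *page)
{
	pgoff_t pgoff;

	if (likely(!PageTransTail(page)))
		return page-&gt;index;

	pgoff = compound_head(page)-&gt;index;
	pgoff += page - compound_head(page);
	return pgoff;
}

/* file offset of a page in PAGE_SIZE units: hugetlb needs scaling */
static inline pgoff_t page_to_pgoff(struct page *page)
{
	if (unlikely(PageHeadHuge(page)))
		return page-&gt;index &lt;&lt; compound_order(page);

	return page_to_index(page);
}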

Fixes: fc127da085c2 ("truncate: handle file thp")
Link: http://lkml.kernel.org/r/20161123093053.mjbnvn5zwxw5e6lk@black.fi.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reported-by: Doug Nelson &lt;doug.nelson@intel.com&gt;
Tested-by: Doug Nelson &lt;doug.nelson@intel.com&gt;
Reviewed-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.8+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>truncate: handle file thp</title>
<updated>2016-07-26T23:19:19Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-07-26T22:26:07Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=fc127da085c26beb89f83ad804cf73422c3b6855'/>
<id>urn:sha1:fc127da085c26beb89f83ad804cf73422c3b6855</id>
<content type='text'>
For shmem/tmpfs we only need to tweak truncate_inode_page() and
invalidate_mapping_pages().

truncate_inode_pages_range() and invalidate_inode_pages2_range() are
adjusted to use page_to_pgoff().
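
In those range walks the consistency check against the radix-tree index
then takes the following shape (rough sketch of the per-page check):

	/* the page found in the tree must still belong at 'index' */
	WARN_ON(page_to_pgoff(page) != index);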

Link: http://lkml.kernel.org/r/1466021202-61880-26-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>dax: New fault locking</title>
<updated>2016-05-19T21:20:54Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2016-05-12T16:29:18Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=ac401cc782429cc8560ce4840b1405d603740917'/>
<id>urn:sha1:ac401cc782429cc8560ce4840b1405d603740917</id>
<content type='text'>
Currently DAX page fault locking is racy.

CPU0 (write fault)		CPU1 (read fault)

__dax_fault()			__dax_fault()
  get_block(inode, block, &amp;bh, 0) -&gt; not mapped
				  get_block(inode, block, &amp;bh, 0)
				    -&gt; not mapped
  if (!buffer_mapped(&amp;bh))
    if (vmf-&gt;flags &amp; FAULT_FLAG_WRITE)
      get_block(inode, block, &amp;bh, 1) -&gt; allocates blocks
  if (page) -&gt; no
				  if (!buffer_mapped(&amp;bh))
				    if (vmf-&gt;flags &amp; FAULT_FLAG_WRITE) {
				    } else {
				      dax_load_hole();
				    }
  dax_insert_mapping()

And we are in a situation where we fail in dax_radix_entry() with -EIO.

Another problem with the current DAX page fault locking is that there is
no race-free way to clear the dirty tag in the radix tree.  We can always
end up with a clean radix tree and dirty data in the CPU cache.

We fix the first problem by introducing locking of exceptional radix
tree entries in DAX mappings acting very similarly to page lock and thus
synchronizing properly faults against the same mapping index. The same
lock can later be used to avoid races when clearing radix tree dirty
tag.
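
A rough sketch of how the fault path uses the entry lock (names as in
fs/dax.c after this change; error handling and the actual mapping work
are elided):

	/* serialize faults on this file offset: lock the radix tree entry */
	entry = grab_mapping_entry(mapping, vmf-&gt;pgoff);
	if (IS_ERR(entry))
		return PTR_ERR(entry);	/* sketch: really mapped to a VM_FAULT_* code */

	/* ... get_block(), then dax_insert_mapping() or dax_load_hole() ... */

	put_locked_mapping_entry(mapping, vmf-&gt;pgoff, entry);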

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Reviewed-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
</content>
</entry>
<entry>
<title>mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros</title>
<updated>2016-04-04T17:41:08Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2016-04-01T12:29:47Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=09cbfeaf1a5a67bfb3201e0c83c810cecb2efa5a'/>
<id>urn:sha1:09cbfeaf1a5a67bfb3201e0c83c810cecb2efa5a</id>
<content type='text'>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the
page cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And it likely never will.

We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE.  And it's a constant source of confusion over whether a
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - &lt;foo&gt; &lt;&lt; (PAGE_CACHE_SHIFT - PAGE_SHIFT) -&gt; &lt;foo&gt;;

 - &lt;foo&gt; &gt;&gt; (PAGE_CACHE_SHIFT - PAGE_SHIFT) -&gt; &lt;foo&gt;;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -&gt; PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -&gt; get_page();

 - page_cache_release() -&gt; put_page();

This patch contains automated changes generated with coccinelle using
the script below.  For some reason, coccinelle doesn't patch header
files; I've called spatch on them manually.

The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.

There are a few places in the code that coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation will
also be addressed in a separate patch.

virtual patch

@@
expression E;
@@
- E &lt;&lt; (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E &gt;&gt; (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: remove unnecessary uses of lock_page_memcg()</title>
<updated>2016-03-15T23:55:16Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2016-03-15T21:57:25Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=fdf1cdb91b6ab7a8a91df68c384f36b8a0909cab'/>
<id>urn:sha1:fdf1cdb91b6ab7a8a91df68c384f36b8a0909cab</id>
<content type='text'>
There are several users that nest lock_page_memcg() inside lock_page()
to prevent page-&gt;mem_cgroup from changing.  But the page lock prevents
pages from moving between cgroups, so that is unnecessary overhead.

Remove lock_page_memcg() in contexts with locked pages and fix the
debug code in the page stat functions to be okay with the page lock.
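
The pattern being removed is essentially this (illustrative sketch; the
page lock alone already pins page-&gt;mem_cgroup):

	lock_page(page);
	lock_page_memcg(page);	/* redundant under the page lock */
	/* ... update page state and memcg statistics ... */
	unlock_page_memcg(page);
	unlock_page(page);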

Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Vladimir Davydov &lt;vdavydov@virtuozzo.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
