<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/arch/s390/mm/pgalloc.c, branch linux-6.2.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.2.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.2.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2022-02-09T21:56:04Z</updated>
<entry>
<title>s390/mm: use CRST_ALLOC_ORDER instead of number</title>
<updated>2022-02-09T21:56:04Z</updated>
<author>
<name>Heiko Carstens</name>
<email>hca@linux.ibm.com</email>
</author>
<published>2022-02-07T13:02:18Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f413f685c6c094a7b968f66ca2e8512720807203'/>
<id>urn:sha1:f413f685c6c094a7b968f66ca2e8512720807203</id>
<content type='text'>
Use CRST_ALLOC_ORDER to make it more obvious what the order means,
and also to be consistent with other code, e.g. the vmemmap code.

Signed-off-by: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/mm: check 2KB-fragment page on release</title>
<updated>2021-12-16T18:58:08Z</updated>
<author>
<name>Alexander Gordeev</name>
<email>agordeev@linux.ibm.com</email>
</author>
<published>2021-11-04T06:14:46Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=4c88bb96e40b757f4796f70a4a7507df554467c4'/>
<id>urn:sha1:4c88bb96e40b757f4796f70a4a7507df554467c4</id>
<content type='text'>
When CONFIG_DEBUG_VM is defined check that pending remove
and tracking nibbles (bits 31-24 of the page refcount) are
cleared. Should the earlier stages of the page lifespan
have a race or logical error, such check could help in
exposing the issue.

Signed-off-by: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Reviewed-by: Gerald Schaefer &lt;gerald.schaefer@linux.ibm.com&gt;
Signed-off-by: Heiko Carstens &lt;hca@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/mm: better annotate 2KB pagetable fragments handling</title>
<updated>2021-12-16T18:58:08Z</updated>
<author>
<name>Alexander Gordeev</name>
<email>agordeev@linux.ibm.com</email>
</author>
<published>2021-11-04T06:14:45Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1194372db6f3c917c9c6f6907e8378cf1076c557'/>
<id>urn:sha1:1194372db6f3c917c9c6f6907e8378cf1076c557</id>
<content type='text'>
Explicitly encode immediate value of pending remove nibble
(bits 31-28) and tracking nibble (bits 27-24) of the page
refcount whenever these nibbles are tested or changed, for
better readability. Also, add some comments describing how
the fragments are handled.

Reviewed-by: Gerald Schaefer &lt;gerald.schaefer@linux.ibm.com&gt;
Signed-off-by: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Signed-off-by: Heiko Carstens &lt;hca@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/mm: fix 2KB pgtable release race</title>
<updated>2021-12-16T18:58:08Z</updated>
<author>
<name>Alexander Gordeev</name>
<email>agordeev@linux.ibm.com</email>
</author>
<published>2021-11-04T06:14:44Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c2c224932fd0ee6854d6ebfc8d059c2bcad86606'/>
<id>urn:sha1:c2c224932fd0ee6854d6ebfc8d059c2bcad86606</id>
<content type='text'>
There is a race on concurrent 2KB-pgtables release paths when
both upper and lower halves of the containing parent page are
freed, one via page_table_free_rcu() + __tlb_remove_table(),
and the other via page_table_free(). The race might lead to a
corruption as result of remove of list item in page_table_free()
concurrently with __free_page() in __tlb_remove_table().

Let's assume first the lower and next the upper 2KB-pgtables are
freed from a page. Since both halves of the page are allocated
the tracking byte (bits 24-31 of the page _refcount) has value
of 0x03 initially:

CPU0				CPU1
----				----

page_table_free_rcu() // lower half
{
	// _refcount[31..24] == 0x03
	...
	atomic_xor_bits(&amp;page-&gt;_refcount,
			0x11U &lt;&lt; (0 + 24));
	// _refcount[31..24] &lt;= 0x12
	...
	table = table | (1U &lt;&lt; 0);
	tlb_remove_table(tlb, table);
}
...
__tlb_remove_table()
{
	// _refcount[31..24] == 0x12
	mask = _table &amp; 3;
	// mask &lt;= 0x01
	...

				page_table_free() // upper half
				{
					// _refcount[31..24] == 0x12
					...
					atomic_xor_bits(
						&amp;page-&gt;_refcount,
						1U &lt;&lt; (1 + 24));
					// _refcount[31..24] &lt;= 0x10
					// mask &lt;= 0x10
					...
	atomic_xor_bits(&amp;page-&gt;_refcount,
			mask &lt;&lt; (4 + 24));
	// _refcount[31..24] &lt;= 0x00
	// mask &lt;= 0x00
	...
	if (mask != 0) // == false
		break;
	fallthrough;
	...
					if (mask &amp; 3) // == false
						...
					else
	__free_page(page);			list_del(&amp;page-&gt;lru);
	^^^^^^^^^^^^^^^^^^	RACE!		^^^^^^^^^^^^^^^^^^^^^
}					...
				}

The problem is page_table_free() releases the page as result of
lower nibble unset and __tlb_remove_table() observing zero too
early. With this update page_table_free() will use the similar
logic as page_table_free_rcu() + __tlb_remove_table(), and mark
the fragment as pending for removal in the upper nibble until
after the list_del().

In other words, the parent page is considered as unreferenced and
safe to release only when the lower nibble is cleared already and
unsetting a bit in upper nibble results in that nibble turned zero.

Cc: stable@vger.kernel.org
Suggested-by: Vlastimil Babka &lt;vbabka@suse.com&gt;
Reviewed-by: Gerald Schaefer &lt;gerald.schaefer@linux.ibm.com&gt;
Signed-off-by: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Signed-off-by: Heiko Carstens &lt;hca@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/pgalloc: use pointers instead of unsigned long values</title>
<updated>2021-12-10T15:14:26Z</updated>
<author>
<name>Heiko Carstens</name>
<email>hca@linux.ibm.com</email>
</author>
<published>2021-12-09T11:01:25Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=da001fce26bec9b5933162b32b3cc65923ad6c17'/>
<id>urn:sha1:da001fce26bec9b5933162b32b3cc65923ad6c17</id>
<content type='text'>
After adding the missing __va()/__pa() calls to the base asce
functions there are even more casts in the code than before. Make the
code more readable by passing and using pointers to page tables,
instead of using unsigned values for the same purpose.

This allows to get rid of nearly all casts within the code.

Suggested-by: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Reviewed-by: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Signed-off-by: Heiko Carstens &lt;hca@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/pgalloc: add virt/phys address handling to base asce functions</title>
<updated>2021-12-10T15:14:26Z</updated>
<author>
<name>Heiko Carstens</name>
<email>hca@linux.ibm.com</email>
</author>
<published>2021-12-07T19:06:21Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=2f882800f6ab57b50b7c23a376a452a808025f37'/>
<id>urn:sha1:2f882800f6ab57b50b7c23a376a452a808025f37</id>
<content type='text'>
The base asce functions create/free page tables open-coded to make
sure that the returned asce and page tables do not make use of any
enhanced DAT features like e.g. large pages. This is required for some
I/O functions that use an asce, like e.g. some service call requests.

Handling of virtual vs physical addresses is missing; therefore add
that now.

Note: this currently doesn't fix a real bug, since virtual addresses
are indentical to physical ones.

Reviewed-by: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Signed-off-by: Heiko Carstens &lt;hca@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/mm: fix phys vs virt confusion in pgtable allocation routines</title>
<updated>2021-02-23T23:31:22Z</updated>
<author>
<name>Alexander Gordeev</name>
<email>agordeev@linux.ibm.com</email>
</author>
<published>2021-02-12T06:43:18Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=2a444fdc24a860ed0ca016045913ebc2fa09a66e'/>
<id>urn:sha1:2a444fdc24a860ed0ca016045913ebc2fa09a66e</id>
<content type='text'>
The physical address of page tables is passed around and
used as virtual address in various locations.

Signed-off-by: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Reviewed-by: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Signed-off-by: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/mm: remove set_fs / rework address space handling</title>
<updated>2020-11-23T11:01:12Z</updated>
<author>
<name>Heiko Carstens</name>
<email>hca@linux.ibm.com</email>
</author>
<published>2020-11-16T07:06:40Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=87d5986345219a7e4f204726d9085ea87f3e22d0'/>
<id>urn:sha1:87d5986345219a7e4f204726d9085ea87f3e22d0</id>
<content type='text'>
Remove set_fs support from s390. With doing this rework address space
handling and simplify it. As a result address spaces are now setup
like this:

CPU running in              | %cr1 ASCE | %cr7 ASCE | %cr13 ASCE
----------------------------|-----------|-----------|-----------
user space                  |  user     |  user     |  kernel
kernel, normal execution    |  kernel   |  user     |  kernel
kernel, kvm guest execution |  gmap     |  user     |  kernel

To achieve this the getcpu vdso syscall is removed in order to avoid
secondary address mode and a separate vdso address space in for user
space. The getcpu vdso syscall will be implemented differently with a
subsequent patch.

The kernel accesses user space always via secondary address space.
This happens in different ways:
- with mvcos in home space mode and directly read/write to secondary
  address space
- with mvcs/mvcp in primary space mode and copy from primary space to
  secondary space or vice versa
- with e.g. cs in secondary space mode and access secondary space

Switching translation modes happens with sacf before and after
instructions which access user space, like before.

Lazy handling of control register reloading is removed in the hope to
make everything simpler, but at the cost of making kernel entry and
exit a bit slower. That is: on kernel entry the primary asce is always
changed to contain the kernel asce, and on kernel exit the primary
asce is changed again so it contains the user asce.

In kernel mode there is only one exception to the primary asce: when
kvm guests are executed the primary asce contains the gmap asce (which
describes the guest address space). The primary asce is reset to
kernel asce whenever kvm guest execution is interrupted, so that this
doesn't has to be taken into account for any user space accesses.

Reviewed-by: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Signed-off-by: Heiko Carstens &lt;hca@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>mmap locking API: convert mmap_sem comments</title>
<updated>2020-06-09T16:39:14Z</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2020-06-09T04:33:54Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c1e8d7c6a7a682e1405e3e242d32fc377fd196ff'/>
<id>urn:sha1:c1e8d7c6a7a682e1405e3e242d32fc377fd196ff</id>
<content type='text'>
Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Cc: Liam Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'cve-2020-11884' from emailed bundle</title>
<updated>2020-04-28T16:13:08Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-04-28T16:13:08Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=3f777e19d171670ab558a6d5e6b1ac7f9b6c574f'/>
<id>urn:sha1:3f777e19d171670ab558a6d5e6b1ac7f9b6c574f</id>
<content type='text'>
Pull s390 fix from Christian Borntraeger:
 "Fix a race between page table upgrade and uaccess on s390.

  This fixes CVE-2020-11884 which allows for a local kernel crash or
  code execution"

* tag 'cve-2020-11884' from emailed bundle:
  s390/mm: fix page table upgrade vs 2ndary address mode accesses
</content>
</entry>
</feed>
