<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/mm/hugetlb_vmemmap.c, branch 0x221E-v0.0.1-v6.19</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=0x221E-v0.0.1-v6.19</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=0x221E-v0.0.1-v6.19'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2025-11-17T01:28:25Z</updated>
<entry>
<title>treewide: include linux/pgalloc.h instead of asm/pgalloc.h</title>
<updated>2025-11-17T01:28:25Z</updated>
<author>
<name>Harry Yoo</name>
<email>harry.yoo@oracle.com</email>
</author>
<published>2025-10-24T11:30:47Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=ad8b2e096181bd23a32d8672de107136d0c478e9'/>
<id>urn:sha1:ad8b2e096181bd23a32d8672de107136d0c478e9</id>
<content type='text'>
For now, including &lt;asm/pgalloc.h&gt; instead of &lt;linux/pgalloc.h&gt; is
technically fine unless the .c file calls p*d_populate_kernel() helper
functions.

But it is a better practice to always include &lt;linux/pgalloc.h&gt;.  Include
&lt;linux/pgalloc.h&gt; instead of &lt;asm/pgalloc.h&gt; outside arch/.

Link: https://lkml.kernel.org/r/20251024113047.119058-3-harry.yoo@oracle.com
Signed-off-by: Harry Yoo &lt;harry.yoo@oracle.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Reviewed-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: fix some typos in mm module</title>
<updated>2025-11-17T01:27:52Z</updated>
<author>
<name>jianyun.gao</name>
<email>jianyungao89@gmail.com</email>
</author>
<published>2025-09-29T00:26:08Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=b6c46600bfb28b4be4e9cff7bad4f2cf357e0fb7'/>
<id>urn:sha1:b6c46600bfb28b4be4e9cff7bad4f2cf357e0fb7</id>
<content type='text'>
Below are some typos in the code comments:

  intevals ==&gt; intervals
  addesses ==&gt; addresses
  unavaliable ==&gt; unavailable
  facor ==&gt; factor
  droping ==&gt; dropping
  exlusive ==&gt; exclusive
  decription ==&gt; description
  confict ==&gt; conflict
  desriptions ==&gt; descriptions
  otherwize ==&gt; otherwise
  vlaue ==&gt; value
  cheching ==&gt; checking
  exisitng ==&gt; existing
  modifed ==&gt; modified
  differenciate ==&gt; differentiate
  refernece ==&gt; reference
  permissons ==&gt; permissions
  indepdenent ==&gt; independent
  spliting ==&gt; splitting

Just fix it.

Link: https://lkml.kernel.org/r/20250929002608.1633825-1-jianyungao89@gmail.com
Signed-off-by: jianyun.gao &lt;jianyungao89@gmail.com&gt;
Reviewed-by: SeongJae Park &lt;sj@kernel.org&gt;
Reviewed-by: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Reviewed-by: Dev Jain &lt;dev.jain@arm.com&gt;
Reviewed-by: Liam R. Howlett &lt;Liam.Howlett@oracle.com&gt;
Acked-by: Chris Li &lt;chrisl@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/pagewalk: split walk_page_range_novma() into kernel/user parts</title>
<updated>2025-07-10T05:42:05Z</updated>
<author>
<name>Lorenzo Stoakes</name>
<email>lorenzo.stoakes@oracle.com</email>
</author>
<published>2025-06-05T13:51:04Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=96d81e4766f9e88b66a0502b5a7f34a4c20ac754'/>
<id>urn:sha1:96d81e4766f9e88b66a0502b5a7f34a4c20ac754</id>
<content type='text'>
walk_page_range_novma() is rather confusing - it supports two modes, one
used often, the other used only for debugging.

The first mode is the common case of traversal of kernel page tables,
which is what nearly all callers use this for.

Secondly it provides an unusual debugging interface that allows for the
traversal of page tables in a userland range of memory even for that
memory which is not described by a VMA.

It is far from certain that such page tables should even exist, but
perhaps this is precisely why it is useful as a debugging mechanism.

As a result, this is utilised by ptdump only.  Historically, things were
reversed - ptdump was the only user, and other parts of the kernel evolved
to use the kernel page table walking here.

Since we have some complicated and confusing locking rules for the novma
case, it makes sense to separate the two usages into their own functions.

Doing this also provide self-documentation as to the intent of the caller
- are they doing something rather unusual or are they simply doing a
standard kernel page table walk?

We therefore establish two separate functions - walk_page_range_debug()
for this single usage, and walk_kernel_page_table_range() for general
kernel page table walking.

The walk_page_range_debug() function is currently used to traverse both
userland and kernel mappings, so we maintain this and in the case of
kernel mappings being traversed, we have walk_page_range_debug() invoke
walk_kernel_page_table_range() internally.

We additionally make walk_page_range_debug() internal to mm.

Link: https://lkml.kernel.org/r/20250605135104.90720-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Acked-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Acked-by: Qi Zheng &lt;zhengqi.arch@bytedance.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Reviewed-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Albert Ou &lt;aou@eecs.berkeley.edu&gt;
Cc: Alexandre Ghiti &lt;alex@ghiti.fr&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Huacai Chen &lt;chenhuacai@kernel.org&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Jonas Bonn &lt;jonas@southpole.se&gt;
Cc: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Stafford Horne &lt;shorne@gmail.com&gt;
Cc: Stefan Kristiansson &lt;stefan.kristiansson@saunalahti.fi&gt;
Cc: WANG Xuerui &lt;kernel@xen0n.name&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, hugetlb: increment the number of pages to be reset on HVO</title>
<updated>2025-04-18T03:10:08Z</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2025-04-15T11:18:59Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=274fe92de2c4e50dbfd1b30070b4f6d8a27b388a'/>
<id>urn:sha1:274fe92de2c4e50dbfd1b30070b4f6d8a27b388a</id>
<content type='text'>
commit 4eeec8c89a0c ("mm: move hugetlb specific things in folio to
page[3]") shifted hugetlb specific stuff, and now mapping overlaps
_hugetlb_cgroup field.

Upon restoring the vmemmap for HVO, only the first two tail pages are
reset, and this causes the check in free_tail_page_prepare() to fail as it
finds an unexpected mapping value in some tails.

Increment the number of pages to be reset to 4 (head + 3 tail pages)

Link: https://lkml.kernel.org/r/20250415111859.376302-1-osalvador@suse.de
Fixes: 4eeec8c89a0c ("mm: move hugetlb specific things in folio to page[3]")
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Suggested-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Muchun Song &lt;muchun.song@linux.dev&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/hugetlb: do pre-HVO for bootmem allocated pages</title>
<updated>2025-03-17T05:06:29Z</updated>
<author>
<name>Frank van der Linden</name>
<email>fvdl@google.com</email>
</author>
<published>2025-02-28T18:29:20Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=b1222550fbf73840e31a103305d03ad53b8f5a59'/>
<id>urn:sha1:b1222550fbf73840e31a103305d03ad53b8f5a59</id>
<content type='text'>
For large systems, the overhead of vmemmap pages for hugetlb is
substantial.  It's about 1.5% of memory, which is about 45G for a 3T
system.  If you want to configure most of that system for hugetlb (e.g. 
to use as backing memory for VMs), there is a chance of running out of
memory on boot, even though you know that the 45G will become available
later.

To avoid this scenario, and since it's a waste to first allocate and then
free that 45G during boot, do pre-HVO for hugetlb bootmem allocated pages
('gigantic' pages).

pre-HVO is done by adding functions that are called from
sparse_init_nid_early and sparse_init_nid_late.  The first is called
before memmap allocation, so it takes care of allocating memmap HVO-style.
The second verifies that all bootmem pages look good, specifically it
checks that they do not intersect with multiple zones.  This can only be
done from sparse_init_nid_late path, when zones have been initialized.

The hugetlb page size must be aligned to the section size, and aligned to
the size of memory described by the number of page structures contained in
one PMD (since pre-HVO is not prepared to split PMDs).  This should be
true for most 'gigantic' pages, it is for 1G pages on x86, where both of
these alignment requirements are 128M.

This will only have an effect if hugetlb_bootmem_alloc was called early in
boot.  If not, it won't do anything, and HVO for bootmem hugetlb pages
works as before.

Link: https://lkml.kernel.org/r/20250228182928.2645936-20-fvdl@google.com
Signed-off-by: Frank van der Linden &lt;fvdl@google.com&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Joao Martins &lt;joao.m.martins@oracle.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Roman Gushchin (Cruise) &lt;roman.gushchin@linux.dev&gt;
Cc: Usama Arif &lt;usamaarif642@gmail.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/hugetlb: add pre-HVO framework</title>
<updated>2025-03-17T05:06:29Z</updated>
<author>
<name>Frank van der Linden</name>
<email>fvdl@google.com</email>
</author>
<published>2025-02-28T18:29:18Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=752fe17af693323dba5622b19c858a46fed219a1'/>
<id>urn:sha1:752fe17af693323dba5622b19c858a46fed219a1</id>
<content type='text'>
Define flags for pre-HVOed bootmem hugetlb pages, and act on them.

The most important flag is the HVO flag, signalling that a bootmem
allocated gigantic page has already been HVO-ed.  If this flag is seen by
the hugetlb bootmem gather code, the page is marked as HVO optimized.  The
HVO code will then not try to optimize it again.  Instead, it will just
map the tail page mirror pages read-only, completing the HVO steps.

No functional change, as nothing sets the flags yet.

Link: https://lkml.kernel.org/r/20250228182928.2645936-18-fvdl@google.com
Signed-off-by: Frank van der Linden &lt;fvdl@google.com&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Joao Martins &lt;joao.m.martins@oracle.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Roman Gushchin (Cruise) &lt;roman.gushchin@linux.dev&gt;
Cc: Usama Arif &lt;usamaarif642@gmail.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/hugetlb: convert cmdline parameters from setup to early</title>
<updated>2025-03-17T05:06:26Z</updated>
<author>
<name>Frank van der Linden</name>
<email>fvdl@google.com</email>
</author>
<published>2025-02-28T18:29:08Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5b47c02967ab770aa7661c8863a21b2fd59e35ff'/>
<id>urn:sha1:5b47c02967ab770aa7661c8863a21b2fd59e35ff</id>
<content type='text'>
Convert the cmdline parameters (hugepagesz, hugepages, default_hugepagesz
and hugetlb_free_vmemmap) to early parameters.

Since parse_early_param might run before MMU setups on some platforms
(powerpc), validation of huge page sizes as specified in command line
parameters would fail.  So instead, for the hstate-related values, just
record the them and parse them on demand, from hugetlb_bootmem_alloc.

The allocation of hugetlb bootmem pages is now done in
hugetlb_bootmem_alloc, which is called explicitly at the start of
mm_core_init().  core_initcall would be too late, as that happens with
memblock already torn down.

This change will allow earlier allocation and initialization of bootmem
hugetlb pages later on.

No functional change intended.

Link: https://lkml.kernel.org/r/20250228182928.2645936-8-fvdl@google.com
Signed-off-by: Frank van der Linden &lt;fvdl@google.com&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Joao Martins &lt;joao.m.martins@oracle.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Roman Gushchin (Cruise) &lt;roman.gushchin@linux.dev&gt;
Cc: Usama Arif &lt;usamaarif642@gmail.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>treewide: const qualify ctl_tables where applicable</title>
<updated>2025-01-28T12:48:37Z</updated>
<author>
<name>Joel Granados</name>
<email>joel.granados@kernel.org</email>
</author>
<published>2025-01-28T12:48:37Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1751f872cc97f992ed5c4c72c55588db1f0021e1'/>
<id>urn:sha1:1751f872cc97f992ed5c4c72c55588db1f0021e1</id>
<content type='text'>
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.

Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.

Created this by running an spatch followed by a sed command:
Spatch:
    virtual patch

    @
    depends on !(file in "net")
    disable optional_qualifier
    @

    identifier table_name != {
      watchdog_hardlockup_sysctl,
      iwcm_ctl_table,
      ucma_ctl_table,
      memory_allocation_profiling_sysctls,
      loadpin_sysctl_table
    };
    @@

    + const
    struct ctl_table table_name [] = { ... };

sed:
    sed --in-place \
      -e "s/struct ctl_table .table = &amp;uts_kern/const struct ctl_table *table = \&amp;uts_kern/" \
      kernel/utsname_sysctl.c

Reviewed-by: Song Liu &lt;song@kernel.org&gt;
Acked-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt; # for kernel/trace/
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt; # SCSI
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt; # xfs
Acked-by: Jani Nikula &lt;jani.nikula@intel.com&gt;
Acked-by: Corey Minyard &lt;cminyard@mvista.com&gt;
Acked-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Bill O'Donnell &lt;bodonnel@redhat.com&gt;
Acked-by: Baoquan He &lt;bhe@redhat.com&gt;
Acked-by: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Acked-by: Anna Schumaker &lt;anna.schumaker@oracle.com&gt;
Signed-off-by: Joel Granados &lt;joel.granados@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm/hugetlb_vmemmap: don't synchronize_rcu() without HVO</title>
<updated>2024-09-02T03:25:45Z</updated>
<author>
<name>Yu Zhao</name>
<email>yuzhao@google.com</email>
</author>
<published>2024-07-19T04:25:03Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c2a967f6ab0ec896648c0497d3dc15d8f136b148'/>
<id>urn:sha1:c2a967f6ab0ec896648c0497d3dc15d8f136b148</id>
<content type='text'>
hugetlb_vmemmap_optimize_folio() and hugetlb_vmemmap_restore_folio() are
wrappers meant to be called regardless of whether HVO is enabled. 
Therefore, they should not call synchronize_rcu().  Otherwise, it
regresses use cases not enabling HVO.

So move synchronize_rcu() to __hugetlb_vmemmap_optimize_folio() and
__hugetlb_vmemmap_restore_folio(), and call it once for each batch of
folios when HVO is enabled.

Link: https://lkml.kernel.org/r/20240719042503.2752316-1-yuzhao@google.com
Fixes: bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
Signed-off-by: Yu Zhao &lt;yuzhao@google.com&gt;
Reported-by: kernel test robot &lt;oliver.sang@intel.com&gt;
Closes: https://lore.kernel.org/oe-lkp/202407091001.1250ad4a-oliver.sang@intel.com
Reported-by: Janosch Frank &lt;frankja@linux.ibm.com&gt;
Tested-by: Marc Hartmayer &lt;mhartmay@linux.ibm.com&gt;
Acked-by: Muchun Song &lt;muchun.song@linux.dev&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: don't account memmap per-node</title>
<updated>2024-08-16T05:16:14Z</updated>
<author>
<name>Pasha Tatashin</name>
<email>pasha.tatashin@soleen.com</email>
</author>
<published>2024-08-08T21:34:36Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9d85731110241fb8ca9445ea4177d816041a8825'/>
<id>urn:sha1:9d85731110241fb8ca9445ea4177d816041a8825</id>
<content type='text'>
Fix invalid access to pgdat during hot-remove operation:
ndctl users reported a GPF when trying to destroy a namespace:
$ ndctl destroy-namespace all -r all -f
 Segmentation fault
 dmesg:
 Oops: general protection fault, probably for
 non-canonical address 0xdffffc0000005650: 0000 [#1] PREEMPT SMP KASAN
 PTI
 KASAN: probably user-memory-access in range
 [0x000000000002b280-0x000000000002b287]
 CPU: 26 UID: 0 PID: 1868 Comm: ndctl Not tainted 6.11.0-rc1 #1
 Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS
 2.20.1 09/13/2023
 RIP: 0010:mod_node_page_state+0x2a/0x110

cxl-test users report a GPF when trying to unload the test module:
$ modrpobe -r cxl-test
 dmesg
 BUG: unable to handle page fault for address: 0000000000004200
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 0 UID: 0 PID: 1076 Comm: modprobe Tainted: G O N 6.11.0-rc1 #197
 Tainted: [O]=OOT_MODULE, [N]=TEST
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/15
 RIP: 0010:mod_node_page_state+0x6/0x90

Currently, when memory is hot-plugged or hot-removed the accounting is
done based on the assumption that memmap is allocated from the same node
as the hot-plugged/hot-removed memory, which is not always the case.

In addition, there are challenges with keeping the node id of the memory
that is being remove to the time when memmap accounting is actually
performed: since this is done after remove_pfn_range_from_zone(), and
also after remove_memory_block_devices(). Meaning that we cannot use
pgdat nor walking though memblocks to get the nid.

Given all of that, account the memmap overhead system wide instead.

For this we are going to be using global atomic counters, but given that
memmap size is rarely modified, and normally is only modified either
during early boot when there is only one CPU, or under a hotplug global
mutex lock, therefore there is no need for per-cpu optimizations.

Also, while we are here rename nr_memmap to nr_memmap_pages, and
nr_memmap_boot to nr_memmap_boot_pages to be self explanatory that the
units are in page count.

[pasha.tatashin@soleen.com: address a few nits from David Hildenbrand]
  Link: https://lkml.kernel.org/r/20240809191020.1142142-4-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20240809191020.1142142-4-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20240808213437.682006-4-pasha.tatashin@soleen.com
Fixes: 15995a352474 ("mm: report per-page metadata information")
Signed-off-by: Pasha Tatashin &lt;pasha.tatashin@soleen.com&gt;
Reported-by: Yi Zhang &lt;yi.zhang@redhat.com&gt;
Closes: https://lore.kernel.org/linux-cxl/CAHj4cs9Ax1=CoJkgBGP_+sNu6-6=6v=_L-ZBZY0bVLD3wUWZQg@mail.gmail.com
Reported-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Closes: https://lore.kernel.org/linux-mm/Zq0tPd2h6alFz8XF@aschofie-mobl2/#t
Tested-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Tested-by: Alison Schofield &lt;alison.schofield@intel.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Tested-by: Yi Zhang &lt;yi.zhang@redhat.com&gt;
Cc: Domenico Cerasuolo &lt;cerasuolodomenico@gmail.com&gt;
Cc: Fan Ni &lt;fan.ni@samsung.com&gt;
Cc: Joel Granados &lt;j.granados@samsung.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Li Zhijian &lt;lizhijian@fujitsu.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Sourav Panda &lt;souravpanda@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Yosry Ahmed &lt;yosryahmed@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
