<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/mm/vmscan.c, branch linux-2.6.18.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-2.6.18.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-2.6.18.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2006-11-04T01:33:50Z</updated>
<entry>
<title>[PATCH] Use min of two prio settings in calculating distress for reclaim</title>
<updated>2006-11-04T01:33:50Z</updated>
<author>
<name>Martin Bligh</name>
<email>mbligh@google.com</email>
</author>
<published>2006-10-28T17:38:25Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1406fd4e987bd05a22827e8f0aa01f3335550e01'/>
<id>urn:sha1:1406fd4e987bd05a22827e8f0aa01f3335550e01</id>
<content type='text'>
If try_to_free_pages / balance_pgdat are called with a gfp_mask specifying
GFP_IO and/or GFP_FS, they will reclaim the requisite number of pages, and then
reset prev_priority to DEF_PRIORITY (or to some other high (ie: unurgent)
value).

However, another reclaimer without those gfp_mask flags set (say, GFP_NOIO)
may still be struggling to reclaim pages.  The concurrent overwrite of
zone-&gt;prev_priority will cause this GFP_NOIO thread to unexpectedly cease
deactivating mapped pages, thus causing reclaim difficulties.

Fix this by keying the distress calculation not off zone-&gt;prev_priority
alone, but also taking the local caller's priority into account, using
min(zone-&gt;prev_priority, sc-&gt;priority)
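
A minimal standalone sketch of the effect (modeled on the distress
calculation in vmscan.c; the surrounding program is illustrative only,
not actual kernel code):

    #include &lt;stdio.h&gt;

    #define DEF_PRIORITY 12

    static int min_int(int a, int b) { return a &lt; b ? a : b; }

    /* before: distress keyed off the shared zone record alone */
    static int distress_old(int zone_prev_priority)
    {
            return 100 &gt;&gt; zone_prev_priority;
    }

    /* after: also honour the local caller's priority */
    static int distress_new(int zone_prev_priority, int sc_priority)
    {
            return 100 &gt;&gt; min_int(zone_prev_priority, sc_priority);
    }

    int main(void)
    {
            /* a GFP_IO reclaimer just reset the zone to DEF_PRIORITY
             * while a GFP_NOIO reclaimer still struggles at priority 0 */
            printf("old distress: %d\n", distress_old(DEF_PRIORITY));
            printf("new distress: %d\n", distress_new(DEF_PRIORITY, 0));
            return 0;
    }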

Signed-off-by: Martin J. Bligh &lt;mbligh@google.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Chris Wright &lt;chrisw@sous-sol.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] vmscan: Fix temp_priority race</title>
<updated>2006-11-04T01:33:50Z</updated>
<author>
<name>Martin Bligh</name>
<email>mbligh@mbligh.org</email>
</author>
<published>2006-10-28T17:38:24Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=252287f4e8e825fbced96f1a8bc7dc1dfead325c'/>
<id>urn:sha1:252287f4e8e825fbced96f1a8bc7dc1dfead325c</id>
<content type='text'>
The temp_priority field in zone is racy, as we can walk through a reclaim
path, and just before we copy it into prev_priority, it can be overwritten
(say with DEF_PRIORITY) by another reclaimer.

The same bug is contained in both try_to_free_pages and balance_pgdat, but
it is fixed slightly differently.  In balance_pgdat, we keep a separate
priority record per zone in a local array.  In try_to_free_pages there is
no need to do this, as the priority level is the same for all zones that we
reclaim from.

The impact of this bug is that the overwritten temp_priority is copied into
prev_priority; setting it artificially high causes reclaimers to compute a
distress that is artificially low.  They then fail to reclaim mapped pages
when they are, in fact, under severe memory pressure (their priority may be
as low as 0).  This causes the OOM killer to fire incorrectly.
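
The shape of the fix, as a hypothetical standalone model (not the actual
try_to_free_pages/balance_pgdat code): record the working priority locally
and publish it to the shared field only once, at the end.

    /* shared zone record, visible to concurrent reclaimers */
    struct zone_model {
            int prev_priority;
    };

    static void reclaim_model(struct zone_model *zone, int def_priority)
    {
            int priority;   /* local: no other reclaimer can clobber it */

            for (priority = def_priority; priority &gt;= 0; priority--) {
                    /* ... scan and reclaim at this priority ... */
            }

            /* one final publish instead of a racy temp_priority field */
            zone-&gt;prev_priority = priority &lt; 0 ? 0 : priority;
    }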

From: Andrew Morton &lt;akpm@osdl.org&gt;

__zone_reclaim() isn't modifying zone-&gt;prev_priority.  But zone-&gt;prev_priority
is used in the decision whether or not to bring mapped pages onto the inactive
list.  Hence there's a risk here that __zone_reclaim() will fail because
zone-&gt;prev_priority is large (ie: low urgency) and lots of mapped pages end up
stuck on the active list.

Fix that up by decreasing (ie making more urgent) zone-&gt;prev_priority as
__zone_reclaim() scans the zone's pages.

This bug perhaps explains why ZONE_RECLAIM_PRIORITY was created.  It should be
possible to remove that now, and to just start out at DEF_PRIORITY?

Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Christoph Lameter &lt;clameter@engr.sgi.com&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Chris Wright &lt;chrisw@sous-sol.org&gt;
[chrisw: minor wiggle to fit -stable]
</content>
</entry>
<entry>
<title>zone_reclaim: dynamic slab reclaim</title>
<updated>2006-10-13T20:23:22Z</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2006-10-02T17:45:24Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=be64642c614ee7b193a75da3731c7ee397c21b4b'/>
<id>urn:sha1:be64642c614ee7b193a75da3731c7ee397c21b4b</id>
<content type='text'>
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0ff38490c836dc379ff7ec45b10a15a662f4e5f6


Currently one can enable slab reclaim by setting an explicit option in
/proc/sys/vm/zone_reclaim_mode.  Slab reclaim is then used as a final
option if freeing unmapped file backed pages does not free enough pages
to allow a local allocation.

However, that means that the slab can grow excessively and that most memory
of a node may be used by slabs.  We have had a case where a machine with
46GB of memory was using 40-42GB for slab.  Zone reclaim was effective in
dealing with pagecache pages.  However, slab reclaim was only done during
global reclaim (which is a bit rare on NUMA systems).

This patch implements slab reclaim during zone reclaim.  Zone reclaim
occurs if there is a danger of an off node allocation.  At that point we

1. Shrink the per node page cache if the number of pagecache
   pages is more than min_unmapped_ratio percent of pages in a zone.

2. Shrink the slab cache if the number of the node's reclaimable slab pages
   (this patch depends on an earlier one that implements that counter)
   is more than min_slab_ratio (a new /proc/sys/vm tunable).

The shrinking of the slab cache is a bit problematic since it is not node
specific.  So we simply calculate what point in the slab we want to reach
(current per node slab use minus the number of pages that need to be
allocated) and then repeatedly run the global reclaim until that is
unsuccessful or we have reached the limit.  I hope we will have zone based
slab reclaim at some point, which will make that easier.
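
A sketch of that loop (hypothetical, simplified names;
global_slab_reclaim_pass() merely stands in for a call into the real
shrinkers):

    /* one global slab reclaim pass; returns pages freed (stub) */
    static unsigned long global_slab_reclaim_pass(void);

    static unsigned long shrink_node_slab(unsigned long slab_pages,
                                          unsigned long pages_needed)
    {
            /* target: current per node slab use minus pages we need */
            unsigned long target = slab_pages &gt; pages_needed ?
                                   slab_pages - pages_needed : 0;

            while (slab_pages &gt; target) {
                    unsigned long freed = global_slab_reclaim_pass();

                    if (!freed)
                            break;  /* the pass was unsuccessful */
                    slab_pages -= freed &lt; slab_pages ? freed : slab_pages;
            }
            return slab_pages;
    }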

The default for min_slab_ratio is 5%.

Also remove the slab option from /proc/sys/vm/zone_reclaim_mode.

[akpm@osdl.org: cleanups]
Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>[PATCH] ZVC/zone_reclaim: Leave 1% of unmapped pagecache pages for file I/O</title>
<updated>2006-07-03T22:26:59Z</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2006-07-03T07:24:13Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9614634fe6a138fd8ae044950700d2af8d203f97'/>
<id>urn:sha1:9614634fe6a138fd8ae044950700d2af8d203f97</id>
<content type='text'>
It turns out that it is advantageous to leave a small portion of unmapped file
backed pages if all of a zone's pages (or almost all pages) are allocated and
so the page allocator has to go off-node.

This allows recently used file I/O buffers to stay on the node and
reduces the times that zone reclaim is invoked if file I/O occurs
when we run out of memory in a zone.

The problem is that zone reclaim runs too frequently when the page cache is
used for file I/O (read/write and therefore unmapped pages!) alone and we have
almost all pages of the zone allocated.  Zone reclaim may remove 32 unmapped
pages.  File I/O will use these pages for the next read/write requests and the
number of unmapped pages increases.  After the zone has filled up again, zone
reclaim will trigger again after only 32 pages.  This cycle is too inefficient
and there are potentially too many zone reclaim cycles.

With the 1% boundary we may still remove all unmapped pages for file I/O in a
zone reclaim pass.  However, it will take a large number of reads and writes
to get back to 1% again, where we trigger zone reclaim again.
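
A sketch of the boundary check (assumed, illustrative names):

    /* skip zone reclaim unless unmapped pagecache exceeds roughly
     * 1% of the zone, so plain file I/O cannot retrigger reclaim
     * every 32 pages */
    static int zone_reclaim_worthwhile(unsigned long nr_file_pages,
                                       unsigned long nr_file_mapped,
                                       unsigned long zone_present_pages)
    {
            unsigned long min_unmapped = zone_present_pages / 100;
            unsigned long unmapped = nr_file_pages - nr_file_mapped;

            return unmapped &gt; min_unmapped;
    }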

Zone reclaim in 2.6.16/17 does not show this behavior because we have a 30
second timeout.

[akpm@osdl.org: rename the /proc file and the variable]
Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] Light weight event counters</title>
<updated>2006-06-30T18:25:36Z</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2006-06-30T08:55:45Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f8891e5e1f93a128c3900f82035e8541357896a7'/>
<id>urn:sha1:f8891e5e1f93a128c3900f82035e8541357896a7</id>
<content type='text'>
The remaining counters in page_state after the zoned VM counter patches
have been applied are all just for show in /proc/vmstat.  They have no
essential function for the VM.

We use a simple increment of per cpu variables.  In order to avoid the most
severe races we disable preempt.  Preempt does not prevent the race between
an increment and an interrupt handler incrementing the same statistics
counter.  However, that race is exceedingly rare: we may only lose an
increment or so, and there is no requirement (at least not in the kernel)
that the vm event counters have to be accurate.

In the non preempt case this results in a simple increment for each
counter.  For many architectures this will be reduced by the compiler to a
single instruction.  This single instruction is atomic for i386 and x86_64,
and therefore even the rare race condition in an interrupt is avoided for
both architectures in most cases.
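
A hypothetical standalone model of the increment path
(preempt_disable()/preempt_enable() and this_cpu() are stubbed here; the
kernel versions obviously do more):

    #define NR_CPUS_MODEL 4
    #define preempt_disable()  ((void)0)    /* stub for illustration */
    #define preempt_enable()   ((void)0)    /* stub for illustration */

    enum vm_event_item { PGFAULT, PGMAJFAULT, NR_VM_EVENT_ITEMS };

    /* one counter block per CPU; no locks, no atomics */
    static unsigned long vm_events[NR_CPUS_MODEL][NR_VM_EVENT_ITEMS];

    static int this_cpu(void) { return 0; }  /* stand-in */

    static inline void count_vm_event(enum vm_event_item item)
    {
            preempt_disable();      /* stay on this CPU */
            vm_events[this_cpu()][item]++;  /* plain increment */
            preempt_enable();
    }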

The patchset also adds an off switch for embedded systems that allows
building linux kernels without these counters.

The implementation of these counters is through inline code that hopefully
results in only a single increment instruction being emitted (i386, x86_64),
or in the increment being hidden through instruction concurrency (EPIC
architectures such as ia64 can get that done).

Benefits:
- VM event counter operations usually reduce to a single inline instruction
  on i386 and x86_64.
- No interrupt disable, only preempt disable for the preempt case.
  Preempt disable can also be avoided by moving the counter into a spinlock.
- Handling is similar to zoned VM counters.
- Simple and easily extendable.
- Can be omitted to reduce memory use on embedded systems.

References:

RFC http://marc.theaimsgroup.com/?l=linux-kernel&amp;m=113512330605497&amp;w=2
RFC http://marc.theaimsgroup.com/?l=linux-kernel&amp;m=114988082814934&amp;w=2
local_t http://marc.theaimsgroup.com/?l=linux-kernel&amp;m=114991748606690&amp;w=2
V2 http://marc.theaimsgroup.com/?t=115014808400007&amp;r=1&amp;w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&amp;m=115024767022346&amp;w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&amp;m=115047968808926&amp;w=2

Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] zoned vm counters: conversion of nr_slab to per zone counter</title>
<updated>2006-06-30T18:25:35Z</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2006-06-30T08:55:38Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9a865ffa34b6117a5e0b67640a084d8c2e198c93'/>
<id>urn:sha1:9a865ffa34b6117a5e0b67640a084d8c2e198c93</id>
<content type='text'>
- Allows reclaim to access counter without looping over processor counts.

- Allows accurate statistics on how many pages are used in a zone by
  the slab. This may become useful to balance slab allocations over
  various zones.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] zoned vm counters: zone_reclaim: remove /proc/sys/vm/zone_reclaim_interval</title>
<updated>2006-06-30T18:25:35Z</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2006-06-30T08:55:37Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=34aa1330f9b3c5783d269851d467326525207422'/>
<id>urn:sha1:34aa1330f9b3c5783d269851d467326525207422</id>
<content type='text'>
The zone_reclaim_interval was necessary because we were not able to determine
how many unmapped pages exist in a zone.  Therefore we had to scan in
intervals to figure out if any pages were unmapped.

With the zoned counters and NR_ANON_PAGES we now know the number of pagecache
pages and the number of mapped pages in a zone.  So we can simply skip the
reclaim if there is an insufficient number of unmapped pages.  We use
SWAP_CLUSTER_MAX as the boundary.
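
A sketch of the check this enables (assumed counter names):

    #define SWAP_CLUSTER_MAX 32UL

    /* with per-zone counters, the number of unmapped pagecache pages
     * is a subtraction; no interval scanning is needed to find it */
    static int enough_unmapped_pages(unsigned long nr_file_pages,
                                     unsigned long nr_file_mapped)
    {
            return nr_file_pages - nr_file_mapped &gt; SWAP_CLUSTER_MAX;
    }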

Drop all support for /proc/sys/vm/zone_reclaim_interval.

Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] zoned vm counters: split NR_ANON_PAGES off from NR_FILE_MAPPED</title>
<updated>2006-06-30T18:25:35Z</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2006-06-30T08:55:36Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f3dbd34460ff54962d3e3244b6bcb7f5295356e6'/>
<id>urn:sha1:f3dbd34460ff54962d3e3244b6bcb7f5295356e6</id>
<content type='text'>
The current NR_FILE_MAPPED is used by zone reclaim and the dirty load
calculation as the number of mapped pagecache pages.  However, that is not
true.  NR_FILE_MAPPED includes the mapped anonymous pages.  This patch
separates those and therefore allows an accurate tracking of the anonymous
pages per zone.

It then becomes possible to determine the number of unmapped pages per zone
and we can avoid scanning for unmapped pages if there are none.
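
A sketch of the accounting split (hypothetical helper, not the actual
rmap code):

    enum zone_stat_item { NR_ANON, NR_FILE_MAPPED_STAT, NR_STAT_ITEMS };

    struct zone_model {
            unsigned long stats[NR_STAT_ITEMS];
    };

    /* when a page is mapped, anonymous and file-backed pages now
     * feed separate per-zone counters instead of a single count */
    static void account_mapped(struct zone_model *zone, int page_is_anon)
    {
            if (page_is_anon)
                    zone-&gt;stats[NR_ANON]++;
            else
                    zone-&gt;stats[NR_FILE_MAPPED_STAT]++;
    }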

Also it may now be possible to determine the mapped/unmapped ratio in
get_dirty_limit.  Isn't the number of anonymous pages irrelevant in that
calculation?

Note that this will change the meaning of the number of mapped pages reported
in /proc/vmstat /proc/meminfo and in the per node statistics.  This may affect
user space tools that monitor these counters!  NR_FILE_MAPPED works like
NR_FILE_DIRTY.  It is only valid for pagecache pages.

Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] zoned vm counters: remove NR_FILE_MAPPED from scan control structure</title>
<updated>2006-06-30T18:25:35Z</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2006-06-30T08:55:36Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=bf02cf4b6cf931d060ad5c6ce9b960af6faefd2d'/>
<id>urn:sha1:bf02cf4b6cf931d060ad5c6ce9b960af6faefd2d</id>
<content type='text'>
We can now access the number of pages in a mapped state in an inexpensive way
in shrink_active_list.  So drop the nr_mapped field from scan_control.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] zoned vm counters: convert nr_mapped to per zone counter</title>
<updated>2006-06-30T18:25:34Z</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2006-06-30T08:55:34Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=65ba55f500a37272985d071c9bbb35256a2f7c14'/>
<id>urn:sha1:65ba55f500a37272985d071c9bbb35256a2f7c14</id>
<content type='text'>
nr_mapped is important because it lets us determine how many pages of a
zone are not mapped, which gives a more efficient means of deciding when we
need to reclaim memory in a zone.

We take the nr_mapped field out of the page state structure and define a new
per zone counter named NR_FILE_MAPPED (the anonymous pages will be split off
from NR_MAPPED in the next patch).

We replace the use of nr_mapped in various kernel locations.  This avoids
looping over all processors in try_to_free_pages(), writeback, and reclaim
(swap + zone reclaim).
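
A sketch of why this is cheaper (simplified, assumed types): the global
value becomes a short sum over zones rather than a walk over every CPU's
page_state:

    #define NR_ZONES_MODEL 4

    struct zone_model {
            unsigned long nr_file_mapped;
    };

    static unsigned long global_file_mapped(struct zone_model *zones)
    {
            unsigned long sum = 0;
            int i;

            /* a handful of zones, not num_possible_cpus() iterations */
            for (i = 0; i &lt; NR_ZONES_MODEL; i++)
                    sum += zones[i].nr_file_mapped;
            return sum;
    }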

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
</feed>
