<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/mm/memory.c, branch linux-2.6.18.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-2.6.18.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-2.6.18.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2007-02-23T23:49:52Z</updated>
<entry>
<title>read_zero_pagealigned() locking fix</title>
<updated>2007-02-23T23:49:52Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hugh@veritas.com</email>
</author>
<published>2006-12-10T10:18:43Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=dbee2bf2f312a9d18fa3f305adc14e2ee58f65df'/>
<id>urn:sha1:dbee2bf2f312a9d18fa3f305adc14e2ee58f65df</id>
<content type='text'>
Ramiro Voicu hits the BUG_ON(!pte_none(*pte)) in zeromap_pte_range: kernel
bugzilla 7645.  Right: read_zero_pagealigned uses down_read of mmap_sem,
but another thread's racing read of /dev/zero, or a normal fault, can
easily set that pte again, in between zap_page_range and zeromap_page_range
getting there.  It's been wrong ever since 2.4.3.

The simple fix is to use down_write instead, but that would serialize reads
of /dev/zero more than at present: perhaps some app would be badly
affected.  So instead let zeromap_page_range return the error instead of
BUG_ON, and read_zero_pagealigned break to the slower clear_user loop in
that case - there's no need to optimize for it.

Use -EEXIST for when a pte is found: BUG_ON in mmap_zero (the other user of
zeromap_page_range), though it really isn't interesting there.  And since
mmap_zero wants -EAGAIN for out-of-memory, the zeromaps better return that
than -ENOMEM.

Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Cc: Ramiro Voicu: &lt;Ramiro.Voicu@cern.ch&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Chris Wright &lt;chrisw@sous-sol.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</content>
</entry>
<entry>
<title>[PATCH] mm: fix a race condition under SMC + COW</title>
<updated>2006-11-04T01:33:44Z</updated>
<author>
<name>Suresh Siddha</name>
<email>suresh.b.siddha@intel.com</email>
</author>
<published>2006-10-12T16:14:05Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f7bc3d8e19acb53487f969c4bf2c54109a972bac'/>
<id>urn:sha1:f7bc3d8e19acb53487f969c4bf2c54109a972bac</id>
<content type='text'>
Failing context is a multi threaded process context and the failing
sequence is as follows.

One thread T0 doing self modifying code on page X on processor P0 and
another thread T1 doing COW (breaking the COW setup as part of just
happened fork() in another thread T2) on the same page X on processor P1.
T0 doing SMC can endup modifying the new page Y (allocated by the T1 doing
COW on P1) but because of different I/D TLB's, P0 ITLB will not see the new
mapping till the flush TLB IPI from P1 is received.  During this interval,
if T0 executes the code created by SMC it can result in an app error (as
ITLB still points to old page X and endup executing the content in page X
rather than using the content in page Y).

Fix this issue by first clearing the PTE and flushing it, before updating
it with new entry.

Hugh sayeth:

  I was a bit sceptical, in the habit of thinking that Self Modifying Code
  must look such issues itself: but I guess there's nothing it can do to avoid
  this one.

  Fair enough, what you're changing it to is pretty much what powerpc and
  s390 were already doing, and is a more robust way of proceeding, consistent
  with how ptes are set everywhere else.

  The ptep_clear_flush is a bit heavy-handed (it's anxious to return the pte
  that was atomically cleared), but we'd have to wander through lots of arches
  to get the right minimal behaviour.  It'd also be nice to eliminate
  ptep_establish completely, now only used to define other macros/inlines: it
  always seemed obfuscation to me, what you've got there now is clearer.
  Let's put those cleanups on a TODO list.

Signed-off-by: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Acked-by: "David S. Miller" &lt;davem@davemloft.net&gt;
Acked-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
Signed-off-by: Chris Wright &lt;chrisw@sous-sol.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] per-task-delay-accounting: sync block I/O and swapin delay collection</title>
<updated>2006-07-15T04:53:56Z</updated>
<author>
<name>Shailabh Nagar</name>
<email>nagar@watson.ibm.com</email>
</author>
<published>2006-07-14T07:24:37Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=0ff922452df86f3e9a2c6f705c4588ec62d096a7'/>
<id>urn:sha1:0ff922452df86f3e9a2c6f705c4588ec62d096a7</id>
<content type='text'>
Unlike earlier iterations of the delay accounting patches, now delays are only
collected for the actual I/O waits rather than try and cover the delays seen
in I/O submission paths.

Account separately for block I/O delays incurred as a result of swapin page
faults whose frequency can be affected by the task/process' rss limit.  Hence
swapin delays can act as feedback for rss limit changes independent of I/O
priority changes.

Signed-off-by: Shailabh Nagar &lt;nagar@watson.ibm.com&gt;
Signed-off-by: Balbir Singh &lt;balbir@in.ibm.com&gt;
Cc: Jes Sorensen &lt;jes@sgi.com&gt;
Cc: Peter Chubb &lt;peterc@gelato.unsw.edu.au&gt;
Cc: Erich Focht &lt;efocht@ess.nec.de&gt;
Cc: Levent Serinol &lt;lserinol@gmail.com&gt;
Cc: Jay Lan &lt;jlan@engr.sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] ia64: race flushing icache in COW path</title>
<updated>2006-07-15T04:53:51Z</updated>
<author>
<name>Anil Keshavamurthy</name>
<email>anil.s.keshavamurthy@intel.com</email>
</author>
<published>2006-07-14T07:23:57Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c38c8db7225465c8d124f38b24d3024decc26bbd'/>
<id>urn:sha1:c38c8db7225465c8d124f38b24d3024decc26bbd</id>
<content type='text'>
There is a race condition that showed up in a threaded JIT environment.
The situation is that a process with a JIT code page forks, so the page is
marked read-only, then some threads are created in the child.  One of the
threads attempts to add a new code block to the JIT page, so a
copy-on-write fault is taken, and the kernel allocates a new page, copies
the data, installs the new pte, and then calls lazy_mmu_prot_update() to
flush caches to make sure that the icache and dcache are in sync.
Unfortunately, the other thread runs right after the new pte is installed,
but before the caches have been flushed.  It tries to execute some old JIT
code that was already in this page, but it sees some garbage in the i-cache
from the previous users of the new physical page.

Fix: we must make the caches consistent before installing the pte.  This is
an ia64 only fix because lazy_mmu_prot_update() is a no-op on all other
architectures.

Signed-off-by: Anil Keshavamurthy &lt;anil.s.keshavamurthy@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Hugh Dickins &lt;hugh@veritas.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] mm/memory.c: EXPORT_UNUSED_SYMBOL</title>
<updated>2006-07-10T20:24:17Z</updated>
<author>
<name>Adrian Bunk</name>
<email>bunk@stusta.de</email>
</author>
<published>2006-07-10T11:44:22Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=26fc52367af3774b123334bca409159ce37d2857'/>
<id>urn:sha1:26fc52367af3774b123334bca409159ce37d2857</id>
<content type='text'>
This patch marks an unused export as EXPORT_UNUSED_SYMBOL.

Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] lockdep: annotate mm</title>
<updated>2006-07-03T22:27:07Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2006-07-03T07:25:08Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f20dc5f7c1adf1c4b68b7672d6f2002cb824e636'/>
<id>urn:sha1:f20dc5f7c1adf1c4b68b7672d6f2002cb824e636</id>
<content type='text'>
Teach special (recursive) locking code to the lock validator.  Has no effect
on non-lockdep kernels.

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] Light weight event counters</title>
<updated>2006-06-30T18:25:36Z</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2006-06-30T08:55:45Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f8891e5e1f93a128c3900f82035e8541357896a7'/>
<id>urn:sha1:f8891e5e1f93a128c3900f82035e8541357896a7</id>
<content type='text'>
The remaining counters in page_state after the zoned VM counter patches
have been applied are all just for show in /proc/vmstat.  They have no
essential function for the VM.

We use a simple increment of per cpu variables.  In order to avoid the most
severe races we disable preempt.  Preempt does not prevent the race between
an increment and an interrupt handler incrementing the same statistics
counter.  However, that race is exceedingly rare, we may only loose one
increment or so and there is no requirement (at least not in kernel) that
the vm event counters have to be accurate.

In the non preempt case this results in a simple increment for each
counter.  For many architectures this will be reduced by the compiler to a
single instruction.  This single instruction is atomic for i386 and x86_64.
 And therefore even the rare race condition in an interrupt is avoided for
both architectures in most cases.

The patchset also adds an off switch for embedded systems that allows a
building of linux kernels without these counters.

The implementation of these counters is through inline code that hopefully
results in only a single instruction increment instruction being emitted
(i386, x86_64) or in the increment being hidden though instruction
concurrency (EPIC architectures such as ia64 can get that done).

Benefits:
- VM event counter operations usually reduce to a single inline instruction
  on i386 and x86_64.
- No interrupt disable, only preempt disable for the preempt case.
  Preempt disable can also be avoided by moving the counter into a spinlock.
- Handling is similar to zoned VM counters.
- Simple and easily extendable.
- Can be omitted to reduce memory use for embedded use.

References:

RFC http://marc.theaimsgroup.com/?l=linux-kernel&amp;m=113512330605497&amp;w=2
RFC http://marc.theaimsgroup.com/?l=linux-kernel&amp;m=114988082814934&amp;w=2
local_t http://marc.theaimsgroup.com/?l=linux-kernel&amp;m=114991748606690&amp;w=2
V2 http://marc.theaimsgroup.com/?t=115014808400007&amp;r=1&amp;w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&amp;m=115024767022346&amp;w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&amp;m=115047968808926&amp;w=2

Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] zoned vm counters: conversion of nr_pagetables to per zone counter</title>
<updated>2006-06-30T18:25:35Z</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2006-06-30T08:55:38Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=df849a1529c106f7460e51479ca78fe07b07dc8c'/>
<id>urn:sha1:df849a1529c106f7460e51479ca78fe07b07dc8c</id>
<content type='text'>
Conversion of nr_page_table_pages to a per zone counter

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter &lt;clameter@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] add page_mkwrite() vm_operations method</title>
<updated>2006-06-23T14:42:51Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2006-06-23T09:03:43Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9637a5efd4fbe36164c5ce7f6a0ee68b2bf22b7f'/>
<id>urn:sha1:9637a5efd4fbe36164c5ce7f6a0ee68b2bf22b7f</id>
<content type='text'>
Add a new VMA operation to notify a filesystem or other driver about the
MMU generating a fault because userspace attempted to write to a page
mapped through a read-only PTE.

This facility permits the filesystem or driver to:

 (*) Implement storage allocation/reservation on attempted write, and so to
     deal with problems such as ENOSPC more gracefully (perhaps by generating
     SIGBUS).

 (*) Delay making the page writable until the contents have been written to a
     backing cache. This is useful for NFS/AFS when using FS-Cache/CacheFS.
     It permits the filesystem to have some guarantee about the state of the
     cache.

 (*) Account and limit number of dirty pages. This is one piece of the puzzle
     needed to make shared writable mapping work safely in FUSE.

Needed by cachefs (Or is it cachefiles?  Or fscache? &lt;head spins&gt;).

At least four other groups have stated an interest in it or a desire to use
the functionality it provides: FUSE, OCFS2, NTFS and JFFS2.  Also, things like
EXT3 really ought to use it to deal with the case of shared-writable mmap
encountering ENOSPC before we permit the page to be dirtied.

From: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;

  get_user_pages(.write=1, .force=1) can generate COW hits on read-only
  shared mappings, this patch traps those as mkpage_write candidates and fails
  to handle them the old way.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Cc: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
Cc: Joel Becker &lt;Joel.Becker@oracle.com&gt;
Cc: Mark Fasheh &lt;mark.fasheh@oracle.com&gt;
Cc: Anton Altaparmakov &lt;aia21@cantab.net&gt;
Cc: David Woodhouse &lt;dwmw2@infradead.org&gt;
Cc: Hugh Dickins &lt;hugh@veritas.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] Swapless page migration: add R/W migration entries</title>
<updated>2006-06-23T14:42:50Z</updated>
<author>
<name>Christoph Lameter</name>
<email>clameter@sgi.com</email>
</author>
<published>2006-06-23T09:03:35Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=0697212a411c1dae03c27845f2de2f3adb32c331'/>
<id>urn:sha1:0697212a411c1dae03c27845f2de2f3adb32c331</id>
<content type='text'>
Implement read/write migration ptes

We take the upper two swapfiles for the two types of migration ptes and define
a series of macros in swapops.h.

The VM is modified to handle the migration entries.  migration entries can
only be encountered when the page they are pointing to is locked.  This limits
the number of places one has to fix.  We also check in copy_pte_range and in
mprotect_pte_range() for migration ptes.

We check for migration ptes in do_swap_cache and call a function that will
then wait on the page lock.  This allows us to effectively stop all accesses
to apge.

Migration entries are created by try_to_unmap if called for migration and
removed by local functions in migrate.c

From: Hugh Dickins &lt;hugh@veritas.com&gt;

  Several times while testing swapless page migration (I've no NUMA, just
  hacking it up to migrate recklessly while running load), I've hit the
  BUG_ON(!PageLocked(p)) in migration_entry_to_page.

  This comes from an orphaned migration entry, unrelated to the current
  correctly locked migration, but hit by remove_anon_migration_ptes as it
  checks an address in each vma of the anon_vma list.

  Such an orphan may be left behind if an earlier migration raced with fork:
  copy_one_pte can duplicate a migration entry from parent to child, after
  remove_anon_migration_ptes has checked the child vma, but before it has
  removed it from the parent vma.  (If the process were later to fault on this
  orphaned entry, it would hit the same BUG from migration_entry_wait.)

  This could be fixed by locking anon_vma in copy_one_pte, but we'd rather
  not.  There's no such problem with file pages, because vma_prio_tree_add
  adds child vma after parent vma, and the page table locking at each end is
  enough to serialize.  Follow that example with anon_vma: add new vmas to the
  tail instead of the head.

  (There's no corresponding problem when inserting migration entries,
  because a missed pte will leave the page count and mapcount high, which is
  allowed for.  And there's no corresponding problem when migrating via swap,
  because a leftover swap entry will be correctly faulted.  But the swapless
  method has no refcounting of its entries.)

From: Ingo Molnar &lt;mingo@elte.hu&gt;

  pte_unmap_unlock() takes the pte pointer as an argument.

From: Hugh Dickins &lt;hugh@veritas.com&gt;

  Several times while testing swapless page migration, gcc has tried to exec
  a pointer instead of a string: smells like COW mappings are not being
  properly write-protected on fork.

  The protection in copy_one_pte looks very convincing, until at last you
  realize that the second arg to make_migration_entry is a boolean "write",
  and SWP_MIGRATION_READ is 30.

  Anyway, it's better done like in change_pte_range, using
  is_write_migration_entry and make_migration_entry_read.

From: Hugh Dickins &lt;hugh@veritas.com&gt;

  Remove unnecessary obfuscation from sys_swapon's range check on swap type,
  which blew up causing memory corruption once swapless migration made
  MAX_SWAPFILES no longer 2 ^ MAX_SWAPFILES_SHIFT.

Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Acked-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Signed-off-by: Christoph Lameter &lt;clameter@engr.sgi.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
From: Hugh Dickins &lt;hugh@veritas.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
</feed>
