summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2012-10-13mempolicy: fix a race in shared_policy_replace()Mel Gorman
commit b22d127a39ddd10d93deee3d96e643657ad53a49 upstream. shared_policy_replace() use of sp_alloc() is unsafe. 1) sp_node cannot be dereferenced if sp->lock is not held and 2) another thread can modify sp_node between spin_unlock for allocating a new sp node and next spin_lock. The bug was introduced before 2.6.12-rc2. Kosaki's original patch for this problem was to allocate an sp node and policy within shared_policy_replace and initialise it when the lock is reacquired. I was not keen on this approach because it partially duplicates sp_alloc(). As the paths were sp->lock is taken are not that performance critical this patch converts sp->lock to sp->mutex so it can sleep when calling sp_alloc(). [kosaki.motohiro@jp.fujitsu.com: Original patch] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Christoph Lameter <cl@linux.com> Cc: Josh Boyer <jwboyer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-13xfrm_user: ensure user supplied esn replay window is validMathias Krause
commit ecd7918745234e423dd87fcc0c077da557909720 upstream. The current code fails to ensure that the netlink message actually contains as many bytes as the header indicates. If a user creates a new state or updates an existing one but does not supply the bytes for the whole ESN replay window, the kernel copies random heap bytes into the replay bitmap, the ones happen to follow the XFRMA_REPLAY_ESN_VAL netlink attribute. This leads to following issues: 1. The replay window has random bits set confusing the replay handling code later on. 2. A malicious user could use this flaw to leak up to ~3.5kB of heap memory when she has access to the XFRM netlink interface (requires CAP_NET_ADMIN). Known users of the ESN replay window are strongSwan and Steffen's iproute2 patch (<http://patchwork.ozlabs.org/patch/85962/>). The latter uses the interface with a bitmap supplied while the former does not. strongSwan is therefore prone to run into issue 1. To fix both issues without breaking existing userland allow using the XFRMA_REPLAY_ESN_VAL netlink attribute with either an empty bitmap or a fully specified one. For the former case we initialize the in-kernel bitmap with zero, for the latter we copy the user supplied bitmap. For state updates the full bitmap must be supplied. To prevent overflows in the bitmap length calculation the maximum size of bmp_len is limited to 128 by this patch -- resulting in a maximum replay window of 4096 packets. This should be sufficient for all real life scenarios (RFC 4303 recommends a default replay window size of 64). Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Martin Willi <martin@revosec.ch> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-07serial: set correct baud_base for EXSYS EX-41092 Dual 16950Flavio Leitner
commit 26e8220adb0aec43b7acafa0f1431760eee28522 upstream. Apparently the same card model has two IDs, so this patch complements the commit 39aced68d664291db3324d0fcf0985ab5626aac2 adding the missing one. Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02kthread_worker: reimplement flush_kthread_work() to allow freeing the work ↵Tejun Heo
item being executed commit 46f3d976213452350f9d10b0c2780c2681f7075b upstream. kthread_worker provides minimalistic workqueue-like interface for users which need a dedicated worker thread (e.g. for realtime priority). It has basic queue, flush_work, flush_worker operations which mostly match the workqueue counterparts; however, due to the way flush_work() is implemented, it has a noticeable difference of not allowing work items to be freed while being executed. While the current users of kthread_worker are okay with the current behavior, the restriction does impede some valid use cases. Also, removing this difference isn't difficult and actually makes the code easier to understand. This patch reimplements flush_kthread_work() such that it uses a flush_work item instead of queue/done sequence numbers. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Colin Cross <ccross@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02time: Move ktime_t overflow checking into timespec_valid_strictJohn Stultz
commit cee58483cf56e0ba355fdd97ff5e8925329aa936 upstream. Andreas Bombe reported that the added ktime_t overflow checking added to timespec_valid in commit 4e8b14526ca7 ("time: Improve sanity checking of timekeeping inputs") was causing problems with X.org because it caused timeouts larger then KTIME_T to be invalid. Previously, these large timeouts would be clamped to KTIME_MAX and would never expire, which is valid. This patch splits the ktime_t overflow checking into a new timespec_valid_strict function, and converts the timekeeping codes internal checking to use this more strict function. Reported-and-tested-by: Andreas Bombe <aeb@debian.org> Cc: Zhouping Liu <zliu@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02time: Improve sanity checking of timekeeping inputsJohn Stultz
commit 4e8b14526ca7fb046a81c94002c1c43b6fdf0e9b upstream. Unexpected behavior could occur if the time is set to a value large enough to overflow a 64bit ktime_t (which is something larger then the year 2262). Also unexpected behavior could occur if large negative offsets are injected via adjtimex. So this patch improves the sanity check timekeeping inputs by improving the timespec_valid() check, and then makes better use of timespec_valid() to make sure we don't set the time to an invalid negative value or one that overflows ktime_t. Note: This does not protect from setting the time close to overflowing ktime_t and then letting natural accumulation cause the overflow. Reported-by: CAI Qian <caiqian@redhat.com> Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Zhouping Liu <zliu@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1344454580-17031-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02sched: Fix race in task_group()Peter Zijlstra
commit 8323f26ce3425460769605a6aece7a174edaa7d1 upstream. Stefan reported a crash on a kernel before a3e5d1091c1 ("sched: Don't call task_group() too many times in set_task_rq()"), he found the reason to be that the multiple task_group() invocations in set_task_rq() returned different values. Looking at all that I found a lack of serialization and plain wrong comments. The below tries to fix it using an extra pointer which is updated under the appropriate scheduler locks. Its not pretty, but I can't really see another way given how all the cgroup stuff works. Reported-and-tested-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1340364965.18025.71.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02NFS: Fix the initialisation of the readdir 'cookieverf' arrayTrond Myklebust
commit c3f52af3e03013db5237e339c817beaae5ec9e3a upstream. When the NFS_COOKIEVERF helper macro was converted into a static inline function in commit 99fadcd764 (nfs: convert NFS_*(inode) helpers to static inline), we broke the initialisation of the readdir cookies, since that depended on doing a memset with an argument of 'sizeof(NFS_COOKIEVERF(inode))' which therefore changed from sizeof(be32 cookieverf[2]) to sizeof(be32 *). At this point, NFS_COOKIEVERF seems to be more of an obfuscation than a helper, so the best thing would be to just get rid of it. Also see: https://bugzilla.kernel.org/show_bug.cgi?id=46881 Reported-by: Andi Kleen <andi@firstfloor.org> Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02perf/x86/ibs: Check syscall attribute flagsRobert Richter
commit bad9ac2d7f878a31cf1ae8c1ee3768077d222bcb upstream. Current implementation simply ignores attribute flags. Thus, there is no notification to userland of unsupported features. Check syscall's attribute flags to let userland know if a feature is supported by the kernel. This is also needed to distinguish between future kernels what might support a feature. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120910093018.GO8285@erda.amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02c/r: prctl: fix build error for no-MMU caseMark Salter
commit be8cfc4af15cf611dfeb66a1fb5df43d5f1e280a upstream. Commit 1ad75b9e1628 ("c/r: prctl: add minimal address test to PR_SET_MM") added some address checking to prctl_set_mm() used by checkpoint-restore. This causes a build error for no-MMU systems: kernel/sys.c: In function 'prctl_set_mm': kernel/sys.c:1868:34: error: 'mmap_min_addr' undeclared (first use in this function) The test for mmap_min_addr doesn't make a lot of sense for no-MMU code as noted in commit 6e1415467614 ("NOMMU: Optimise away the {dac_,}mmap_min_addr tests"). This patch defines mmap_min_addr as 0UL in the no-MMU case so that the compiler will optimize away tests for "addr < mmap_min_addr". Signed-off-by: Mark Salter <msalter@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02mm/ia64: fix a memory block size bugJianguo Wu
commit 05cf96398e1b6502f9e191291b715c7463c9d5dd upstream. I found following definition in include/linux/memory.h, in my IA64 platform, SECTION_SIZE_BITS is equal to 32, and MIN_MEMORY_BLOCK_SIZE will be 0. #define MIN_MEMORY_BLOCK_SIZE (1 << SECTION_SIZE_BITS) Because MIN_MEMORY_BLOCK_SIZE is int type and length of 32bits, so MIN_MEMORY_BLOCK_SIZE(1 << 32) will will equal to 0. Actually when SECTION_SIZE_BITS >= 31, MIN_MEMORY_BLOCK_SIZE will be wrong. This will cause wrong system memory infomation in sysfs. I think it should be: #define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS) And "echo offline > memory0/state" will cause following call trace: kernel BUG at mm/memory_hotplug.c:885! sh[6455]: bugcheck! 0 [1] Pid: 6455, CPU 0, comm: sh psr : 0000101008526030 ifs : 8000000000000fa4 ip : [<a0000001008c40f0>] Not tainted (3.6.0-rc1) ip is at offline_pages+0x210/0xee0 Call Trace: show_stack+0x80/0xa0 show_regs+0x640/0x920 die+0x190/0x2c0 die_if_kernel+0x50/0x80 ia64_bad_break+0x3d0/0x6e0 ia64_native_leave_kernel+0x0/0x270 offline_pages+0x210/0xee0 alloc_pages_current+0x180/0x2a0 Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: "Luck, Tony" <tony.luck@intel.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02mISDN: Fix wrong usage of flush_work_sync while holding locksKarsten Keil
commit 4b921eda53366b319602351ff4d7256fafa4bd1b upstream. It is a bad idea to hold a spinlock and call flush_work_sync. Move the workqueue cleanup outside the spinlock and use cancel_work_sync, on closing the channel this seems to be the more correct function. Remove the never used and constant return value of mISDN_freebchannel. Signed-off-by: Karsten Keil <keil@b1-systems.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02kobject: fix oops with "input0: bad kobj_uevent_env content in show_uevent()"Bjørn Mork
commit 60e233a56609fd963c59e99bd75c663d63fa91b6 upstream. Fengguang Wu <fengguang.wu@intel.com> writes: > After the __devinit* removal series, I can still get kernel panic in > show_uevent(). So there are more sources of bug.. > > Debug patch: > > @@ -343,8 +343,11 @@ static ssize_t show_uevent(struct device > goto out; > > /* copy keys to file */ > - for (i = 0; i < env->envp_idx; i++) > + dev_err(dev, "uevent %d env[%d]: %s/.../%s\n", env->buflen, env->envp_idx, top_kobj->name, dev->kobj.name); > + for (i = 0; i < env->envp_idx; i++) { > + printk(KERN_ERR "uevent %d env[%d]: %s\n", (int)count, i, env->envp[i]); > count += sprintf(&buf[count], "%s\n", env->envp[i]); > + } > > Oops message, the env[] is again not properly initilized: > > [ 44.068623] input input0: uevent 61 env[805306368]: input0/.../input0 > [ 44.069552] uevent 0 env[0]: (null) This is a completely different CONFIG_HOTPLUG problem, only demonstrating another reason why CONFIG_HOTPLUG should go away. I had a hard time trying to disable it anyway ;-) The problem this time is lots of code assuming that a call to add_uevent_var() will guarantee that env->buflen > 0. This is not true if CONFIG_HOTPLUG is unset. So things like this end up overwriting env->envp_idx because the array index is -1: if (add_uevent_var(env, "MODALIAS=")) return -ENOMEM; len = input_print_modalias(&env->buf[env->buflen - 1], sizeof(env->buf) - env->buflen, dev, 0); Don't know what the best action is, given that there seem to be a *lot* of this around the kernel. This patch "fixes" the problem for me, but I don't know if it can be considered an appropriate fix. [ It is the correct fix for now, for 3.7 forcing CONFIG_HOTPLUG to always be on is the longterm fix, but it's too late for 3.6 and older kernels to resolve this that way - gregkh ] Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02mmc: card: Skip secure erase on MoviNAND; causes unrecoverable corruption.Ian Chen
commit 3550ccdb9d8d350e526b809bf3dd92b550a74fe1 upstream. For several MoviNAND eMMC parts, there are known issues with secure erase and secure trim. For these specific MoviNAND devices, we skip these operations. Specifically, there is a bug in the eMMC firmware that causes unrecoverable corruption when the MMC is erased with MMC_CAP_ERASE enabled. References: http://forum.xda-developers.com/showthread.php?t=1644364 https://plus.google.com/111398485184813224730/posts/21pTYfTsCkB#111398485184813224730/posts/21pTYfTsCkB Signed-off-by: Ian Chen <ian.cy.chen@samsung.com> Reviewed-by: Namjae Jeon <linkinjeon@gmail.com> Acked-by: Jaehoon Chung <jh80.chung@samsung.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02perf_event: Switch to internal refcount, fix race with close()Al Viro
commit a6fa941d94b411bbd2b6421ffbde6db3c93e65ab upstream. Don't mess with file refcounts (or keep a reference to file, for that matter) in perf_event. Use explicit refcount of its own instead. Deal with the race between the final reference to event going away and new children getting created for it by use of atomic_long_inc_not_zero() in inherit_event(); just have the latter free what it had allocated and return NULL, that works out just fine (children of siblings of something doomed are created as singletons, same as if the child of leader had been created and immediately killed). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120820135925.GG23464@ZenIV.linux.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill()Miklos Szeredi
commit b161dfa6937ae46d50adce8a7c6b12233e96e7bd upstream. IBM reported a soft lockup after applying the fix for the rename_lock deadlock. Commit c83ce989cb5f ("VFS: Fix the nfs sillyrename regression in kernel 2.6.38") was found to be the culprit. The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the dentry was killed. This flag can be set on non-killed dentries too, which results in infinite retries when trying to traverse the dentry tree. This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is only set in d_kill() and makes try_to_ascend() test only this flag. IBM reported successful test results with this patch. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02bnx2x: fix 57840_MF pci idYuval Mintz
[ Upstream commit 5c879d2094946081af934739850c7260e8b25d3c ] Commit c3def943c7117d42caaed3478731ea7c3c87190e have added support for new pci ids of the 57840 board, while failing to change the obsolete value in 'pci_ids.h'. This patch does so, allowing the probe of such devices. Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02af_packet: don't emit packet on orig fanout groupEric Leblond
[ Upstream commit c0de08d04215031d68fa13af36f347a6cfa252ca ] If a packet is emitted on one socket in one group of fanout sockets, it is transmitted again. It is thus read again on one of the sockets of the fanout group. This result in a loop for software which generate packets when receiving one. This retransmission is not the intended behavior: a fanout group must behave like a single socket. The packet should not be transmitted on a socket if it originates from a socket belonging to the same fanout group. This patch fixes the issue by changing the transmission check to take fanout group info account. Reported-by: Aleksandr Kotov <a1k@mail.ru> Signed-off-by: Eric Leblond <eric@regit.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-02net: Allow driver to limit number of GSO segments per skbBen Hutchings
[ Upstream commit 30b678d844af3305cda5953467005cebb5d7b687 ] A peer (or local user) may cause TCP to use a nominal MSS of as little as 88 (actual MSS of 76 with timestamps). Given that we have a sufficiently prodigious local sender and the peer ACKs quickly enough, it is nevertheless possible to grow the window for such a connection to the point that we will try to send just under 64K at once. This results in a single skb that expands to 861 segments. In some drivers with TSO support, such an skb will require hundreds of DMA descriptors; a substantial fraction of a TX ring or even more than a full ring. The TX queue selected for the skb may stall and trigger the TX watchdog repeatedly (since the problem skb will be retried after the TX reset). This particularly affects sfc, for which the issue is designated as CVE-2012-3412. Therefore: 1. Add the field net_device::gso_max_segs holding the device-specific limit. 2. In netif_skb_features(), if the number of segments is too high then mask out GSO features to force fall back to software GSO. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-25USB: add USB_VENDOR_AND_INTERFACE_INFO() macroGustavo Padovan
commit d81a5d1956731c453b85c141458d4ff5d6cc5366 upstream. A lot of Broadcom Bluetooth devices provides vendor specific interface class and we are getting flooded by patches adding new device support. This change will help us enable support for any other Broadcom with vendor specific device that arrives in the future. Only the product id changes for those devices, so this macro would be perfect for us: { USB_VENDOR_AND_INTERFACE_INFO(0x0a5c, 0xff, 0x01, 0x01) } Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Acked-by: Henrik Rydberg <rydberg@bitmath.se> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-25ext4: make sure the journal sb is written in ext4_clear_journal_err()Theodore Ts'o
commit d796c52ef0b71a988364f6109aeb63d79c5b116b upstream. After we transfer set the EXT4_ERROR_FS bit in the file system superblock, it's not enough to call jbd2_journal_clear_err() to clear the error indication from journal superblock --- we need to call jbd2_journal_update_sb_errno() as well. Otherwise, when the root file system is mounted read-only, the journal is replayed, and the error indicator is transferred to the superblock --- but the s_errno field in the jbd2 superblock is left set (since although we cleared it in memory, we never flushed it out to disk). This can end up confusing e2fsck. We should make e2fsck more robust in this case, but the kernel shouldn't be leaving things in this confused state, either. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-15Input: eeti_ts: pass gpio value instead of IRQArnd Bergmann
commit 4eef6cbfcc03b294d9d334368a851b35b496ce53 upstream. The EETI touchscreen asserts its IRQ line as soon as it has data in its internal buffers. The line is automatically deasserted once all data has been read via I2C. Hence, the driver has to monitor the GPIO line and cannot simply rely on the interrupt handler reception. In the current implementation of the driver, irq_to_gpio() is used to determine the GPIO number from the i2c_client's IRQ value. As irq_to_gpio() is not available on all platforms, this patch changes this and makes the driver ignore the passed in IRQ. Instead, a GPIO is added to the platform_data struct and gpio_to_irq is used to derive the IRQ from that GPIO. If this fails, bail out. The driver is only able to work in environments where the touchscreen GPIO can be mapped to an IRQ. Without this patch, building raumfeld_defconfig results in: drivers/input/touchscreen/eeti_ts.c: In function 'eeti_ts_irq_active': drivers/input/touchscreen/eeti_ts.c:65:2: error: implicit declaration of function 'irq_to_gpio' [-Werror=implicit-function-declaration] Signed-off-by: Daniel Mack <zonque@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Sven Neumann <s.neumann@raumfeld.com> Cc: linux-input@vger.kernel.org Cc: Haojian Zhuang <haojian.zhuang@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-15Yama: higher restrictions should block PTRACE_TRACEMEKees Cook
commit 9d8dad742ad1c74d7e7210ee05d0b44961d5ea16 upstream. The higher ptrace restriction levels should be blocking even PTRACE_TRACEME requests. The comments in the LSM documentation are misleading about when the checks happen (the parent does not go through security_ptrace_access_check() on a PTRACE_TRACEME call). Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-15ARM: pxa: remove irq_to_gpio from ezx-pcap driverArnd Bergmann
commit 59ee93a528b94ef4e81a08db252b0326feff171f upstream. The irq_to_gpio function was removed from the pxa platform in linux-3.2, and this driver has been broken since. There is actually no in-tree user of this driver that adds this platform device, but the driver can and does get enabled on some platforms. Without this patch, building ezx_defconfig results in: drivers/mfd/ezx-pcap.c: In function 'pcap_isr_work': drivers/mfd/ezx-pcap.c:205:2: error: implicit declaration of function 'irq_to_gpio' [-Werror=implicit-function-declaration] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Haojian Zhuang <haojian.zhuang@gmail.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Daniel Ribeiro <drwyrm@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-15random: remove rand_initialize_irq()Theodore Ts'o
commit c5857ccf293968348e5eb4ebedc68074de3dcda6 upstream. With the new interrupt sampling system, we are no longer using the timer_rand_state structure in the irq descriptor, so we can stop initializing it now. [ Merged in fixes from Sedat to find some last missing references to rand_initialize_irq() ] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-15random: add new get_random_bytes_arch() functionTheodore Ts'o
commit c2557a303ab6712bb6e09447df828c557c710ac9 upstream. Create a new function, get_random_bytes_arch() which will use the architecture-specific hardware random number generator if it is present. Change get_random_bytes() to not use the HW RNG, even if it is avaiable. The reason for this is that the hw random number generator is fast (if it is present), but it requires that we trust the hardware manufacturer to have not put in a back door. (For example, an increasing counter encrypted by an AES key known to the NSA.) It's unlikely that Intel (for example) was paid off by the US Government to do this, but it's impossible for them to prove otherwise --- especially since Bull Mountain is documented to use AES as a whitener. Hence, the output of an evil, trojan-horse version of RDRAND is statistically indistinguishable from an RDRAND implemented to the specifications claimed by Intel. Short of using a tunnelling electronic microscope to reverse engineer an Ivy Bridge chip and disassembling and analyzing the CPU microcode, there's no way for us to tell for sure. Since users of get_random_bytes() in the Linux kernel need to be able to support hardware systems where the HW RNG is not present, most time-sensitive users of this interface have already created their own cryptographic RNG interface which uses get_random_bytes() as a seed. So it's much better to use the HW RNG to improve the existing random number generator, by mixing in any entropy returned by the HW RNG into /dev/random's entropy pool, but to always _use_ /dev/random's entropy pool. This way we get almost of the benefits of the HW RNG without any potential liabilities. The only benefits we forgo is the speed/performance enhancements --- and generic kernel code can't depend on depend on get_random_bytes() having the speed of a HW RNG anyway. For those places that really want access to the arch-specific HW RNG, if it is available, we provide get_random_bytes_arch(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-15random: create add_device_randomness() interfaceLinus Torvalds
commit a2080a67abe9e314f9e9c2cc3a4a176e8a8f8793 upstream. Add a new interface, add_device_randomness() for adding data to the random pool that is likely to differ between two devices (or possibly even per boot). This would be things like MAC addresses or serial numbers, or the read-out of the RTC. This does *not* add any actual entropy to the pool, but it initializes the pool to different values for devices that might otherwise be identical and have very little entropy available to them (particularly common in the embedded world). [ Modified by tytso to mix in a timestamp, since there may be some variability caused by the time needed to detect/configure the hardware in question. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-15random: make 'add_interrupt_randomness()' do something saneTheodore Ts'o
commit 775f4b297b780601e61787b766f306ed3e1d23eb upstream. We've been moving away from add_interrupt_randomness() for various reasons: it's too expensive to do on every interrupt, and flooding the CPU with interrupts could theoretically cause bogus floods of entropy from a somewhat externally controllable source. This solves both problems by limiting the actual randomness addition to just once a second or after 64 interrupts, whicever comes first. During that time, the interrupt cycle data is buffered up in a per-cpu pool. Also, we make sure the the nonblocking pool used by urandom is initialized before we start feeding the normal input pool. This assures that /dev/urandom is returning unpredictable data as soon as possible. (Based on an original patch by Linus, but significantly modified by tytso.) Tested-by: Eric Wustrow <ewust@umich.edu> Reported-by: Eric Wustrow <ewust@umich.edu> Reported-by: Nadia Heninger <nadiah@cs.ucsd.edu> Reported-by: Zakir Durumeric <zakir@umich.edu> Reported-by: J. Alex Halderman <jhalderm@umich.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09posix_types.h: Cleanup stale __NFDBITS and related definitionsJosh Boyer
commit 8ded2bbc1845e19c771eb55209aab166ef011243 upstream. Recently, glibc made a change to suppress sign-conversion warnings in FD_SET (glibc commit ceb9e56b3d1). This uncovered an issue with the kernel's definition of __NFDBITS if applications #include <linux/types.h> after including <sys/select.h>. A build failure would be seen when passing the -Werror=sign-compare and -D_FORTIFY_SOURCE=2 flags to gcc. It was suggested that the kernel should either match the glibc definition of __NFDBITS or remove that entirely. The current in-kernel uses of __NFDBITS can be replaced with BITS_PER_LONG, and there are no uses of the related __FDELT and __FDMASK defines. Given that, we'll continue the cleanup that was started with commit 8b3d1cda4f5f ("posix_types: Remove fd_set macros") and drop the remaining unused macros. Additionally, linux/time.h has similar macros defined that expand to nothing so we'll remove those at the same time. Reported-by: Jeff Law <law@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Josh Boyer <jwboyer@redhat.com> [ .. and fix up whitespace as per akpm ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09workqueue: perform cpu down operations from low priority cpu_notifier()Tejun Heo
commit 6575820221f7a4dd6eadecf7bf83cdd154335eda upstream. Currently, all workqueue cpu hotplug operations run off CPU_PRI_WORKQUEUE which is higher than normal notifiers. This is to ensure that workqueue is up and running while bringing up a CPU before other notifiers try to use workqueue on the CPU. Per-cpu workqueues are supposed to remain working and bound to the CPU for normal CPU_DOWN_PREPARE notifiers. This holds mostly true even with workqueue offlining running with higher priority because workqueue CPU_DOWN_PREPARE only creates a bound trustee thread which runs the per-cpu workqueue without concurrency management without explicitly detaching the existing workers. However, if the trustee needs to create new workers, it creates unbound workers which may wander off to other CPUs while CPU_DOWN_PREPARE notifiers are in progress. Furthermore, if the CPU down is cancelled, the per-CPU workqueue may end up with workers which aren't bound to the CPU. While reliably reproducible with a convoluted artificial test-case involving scheduling and flushing CPU burning work items from CPU down notifiers, this isn't very likely to happen in the wild, and, even when it happens, the effects are likely to be hidden by the following successful CPU down. Fix it by using different priorities for up and down notifiers - high priority for up operations and low priority for down operations. Workqueue cpu hotplug operations will soon go through further cleanup. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09tun: fix a crash bug and a memory leakMikulas Patocka
commit b09e786bd1dd66418b69348cb110f3a64764626a upstream. This patch fixes a crash tun_chr_close -> netdev_run_todo -> tun_free_netdev -> sk_release_kernel -> sock_release -> iput(SOCK_INODE(sock)) introduced by commit 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d The problem is that this socket is embedded in struct tun_struct, it has no inode, iput is called on invalid inode, which modifies invalid memory and optionally causes a crash. sock_release also decrements sockets_in_use, this causes a bug that "sockets: used" field in /proc/*/net/sockstat keeps on decreasing when creating and closing tun devices. This patch introduces a flag SOCK_EXTERNALLY_ALLOCATED that instructs sock_release to not free the inode and not decrement sockets_in_use, fixing both memory corruption and sockets_in_use underflow. It should be backported to 3.3 an 3.4 stabke. Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09x86/mce: Fix siginfo_t->si_addr value for non-recoverable memory faultsTony Luck
commit 6751ed65dc6642af64f7b8a440a75563c8aab7ae upstream. In commit dad1743e5993f1 ("x86/mce: Only restart instruction after machine check recovery if it is safe") we fixed mce_notify_process() to force a signal to the current process if it was not restartable (RIPV bit not set in MCG_STATUS). But doing it here means that the process doesn't get told the virtual address of the fault via siginfo_t->si_addr. This would prevent application level recovery from the fault. Make a new MF_MUST_KILL flag bit for memory_failure() et al. to use so that we will provide the right information with the signal. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-09USB: Remove unused LPM variable.Sarah Sharp
commit c5c4bdf02e518a281b229ae0891b346919e2d291 upstream. hub_initiated_lpm_disable_count is not used by any code, so remove it. This commit should be backported to kernels as old as 3.5, that contain the commit 8306095fd2c1100e8244c09bf560f97aca5a311d "USB: Disable USB 3.0 LPM in critical sections." Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-21Remove SYSTEM_SUSPEND_DISK system stateRafael J. Wysocki
The SYSTEM_SUSPEND_DISK system state is never used, so drop it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-21Merge branch 'anton-kgdb' (kgdb dmesg fixups)Linus Torvalds
Merge emailed kgdb dmesg fixups patches from Anton Vorontsov: "The dmesg command appears to be broken after the printk rework. The old logic in the kdb code makes no sense in terms of current printk/logging storage format, and KDB simply hangs forever upon entering 'dmesg' command. The first patch revives the command by switching to kmsg_dumper iterator. As a side-effect, the code is now much more simpler. A few changes were needed in the printk.c: we needed unlocked variant of the kmsg_dumper iterator, but these can surely wait for 3.6. It's probably too late even for the first patch to go to 3.5, but I'll try to convince otherwise. :-) Here we go: - The current code is broken for sure, and has no hope to work at all. It is a regression - The new code works for me, and probably works for everyone else; - If it compiles (and I urge everyone to compile-test it on your setup), it hardly can make things worse." * Merge emailed patches from Anton Vorontsov: (4 commits) kdb: Switch to nolock variants of kmsg_dump functions printk: Implement some unlocked kmsg_dump functions printk: Remove kdb_syslog_data kdb: Revive dmesg command
2012-07-21printk: Implement some unlocked kmsg_dump functionsAnton Vorontsov
If used from KDB, the locked variants are prone to deadlocks (suppose we got to the debugger w/ the logbuf lock held). So, we have to implement a few routines that grab no logbuf lock. Yet we don't need these functions in modules, so we don't export them. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-19Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull last minute Ceph fixes from Sage Weil: "The important one fixes a bug in the socket failure handling behavior that was turned up in some recent failure injection testing. The other two are minor bug fixes." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: endian bug in rbd_req_cb() rbd: Fix ceph_snap_context size calculation libceph: fix messenger retry
2012-07-18Make wait_for_device_probe() also do scsi_complete_async_scans()Linus Torvalds
Commit a7a20d103994 ("sd: limit the scope of the async probe domain") make the SCSI device probing run device discovery in it's own async domain. However, as a result, the partition detection was no longer synchronized by async_synchronize_full() (which, despite the name, only synchronizes the global async space, not all of them). Which in turn meant that "wait_for_device_probe()" would not wait for the SCSI partitions to be parsed. And "wait_for_device_probe()" was what the boot time init code relied on for mounting the root filesystem. Now, most people never noticed this, because not only is it timing-dependent, but modern distributions all use initrd. So the root filesystem isn't actually on a disk at all. And then before they actually mount the final disk filesystem, they will have loaded the scsi-wait-scan module, which not only does the expected wait_for_device_probe(), but also does scsi_complete_async_scans(). [ Side note: scsi_complete_async_scans() had also been partially broken, but that was fixed in commit 43a8d39d0137 ("fix async probe regression"), so that same commit a7a20d103994 had actually broken setups even if you used scsi-wait-scan explicitly ] Solve this problem by just moving the scsi_complete_async_scans() call into wait_for_device_probe(). Everybody who wants to wait for device probing to finish really wants the SCSI probing to complete, so there's no reason not to do this. So now "wait_for_device_probe()" really does what the name implies, and properly waits for device probing to finish. This also removes the now unnecessary extra calls to scsi_complete_async_scans(). Reported-and-tested-by: Artem S. Tashkinov <t.artem@mailcity.com> Cc: Dan Williams <dan.j.williams@gmail.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: James Bottomley <jbottomley@parallels.com> Cc: Borislav Petkov <bp@amd64.org> Cc: linux-scsi <linux-scsi@vger.kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-17libceph: fix messenger retrySage Weil
In ancient times, the messenger could both initiate and accept connections. An artifact if that was data structures to store/process an incoming ceph_msg_connect request and send an outgoing ceph_msg_connect_reply. Sadly, the negotiation code was referencing those structures and ignoring important information (like the peer's connect_seq) from the correct ones. Among other things, this fixes tight reconnect loops where the server sends RETRY_SESSION and we (the client) retries with the same connect_seq as last time. This bug pretty easily triggered by injecting socket failures on the MDS and running some fs workload like workunits/direct_io/test_sync_io. Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-17PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPENDMichael Kerrisk
As discussed in http://thread.gmane.org/gmane.linux.kernel/1249726/focus=1288990, the capability introduced in 4d7e30d98939a0340022ccd49325a3d70f7e0238 to govern EPOLLWAKEUP seems misnamed: this capability is about governing the ability to suspend the system, not using a particular API flag (EPOLLWAKEUP). We should make the name of the capability more general to encourage reuse in related cases. (Whether or not this capability should also be used to govern the use of /sys/power/wake_lock is a question that needs to be separately resolved.) This patch renames the capability to CAP_BLOCK_SUSPEND. In order to ensure that the old capability name doesn't make it out into the wild, could you please apply and push up the tree to ensure that it is incorporated for the 3.5 release. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-07-14Merge branches 'core-urgent-for-linus', 'perf-urgent-for-linus' and ↵Linus Torvalds
'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU, perf, and scheduler fixes from Ingo Molnar. The RCU fix is a revert for an optimization that could cause deadlocks. One of the scheduler commits (164c33c6adee "sched: Fix fork() error path to not crash") is correct but not complete (some architectures like Tile are not covered yet) - the resulting additional fixes are still WIP and Ingo did not want to delay these pending fixes. See this thread on lkml: [PATCH] fork: fix error handling in dup_task() The perf fixes are just trivial oneliners. * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "rcu: Move PREEMPT_RCU preemption to switch_to() invocation" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf kvm: Fix segfault with report and mixed guestmount use perf kvm: Fix regression with guest machine creation perf script: Fix format regression due to libtraceevent merge ring-buffer: Fix accounting of entries when removing pages ring-buffer: Fix crash due to uninitialized new_pages list head * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS/sched: Update scheduler file pattern sched/nohz: Rewrite and fix load-avg computation -- again sched: Fix fork() error path to not crash
2012-07-13Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull the leap second fixes from Thomas Gleixner: "It's a rather large series, but well discussed, refined and reviewed. It got a massive testing by John, Prarit and tip. In theory we could split it into two parts. The first two patches f55a6faa3843: hrtimer: Provide clock_was_set_delayed() 4873fa070ae8: timekeeping: Fix leapsecond triggered load spike issue are merely preventing the stuff loops forever issues, which people have observed. But there is no point in delaying the other 4 commits which achieve full correctness into 3.6 as they are tagged for stable anyway. And I rather prefer to have the full fixes merged in bulk than a "prevent the observable wreckage and deal with the hidden fallout later" approach." * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Update hrtimer base offsets each hrtimer_interrupt timekeeping: Provide hrtimer update function hrtimers: Move lock held region in hrtimer_interrupt() timekeeping: Maintain ktime_t based offsets for hrtimers timekeeping: Fix leapsecond triggered load spike issue hrtimer: Provide clock_was_set_delayed()
2012-07-11Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge random patches from Andrew Morton. * Merge emailed patches from Andrew Morton <akpm@linux-foundation.org>: (32 commits) memblock: free allocated memblock_reserved_regions later mm: sparse: fix usemap allocation above node descriptor section mm: sparse: fix section usemap placement calculation xtensa: fix incorrect memset shmem: cleanup shmem_add_to_page_cache shmem: fix negative rss in memcg memory.stat tmpfs: revert SEEK_DATA and SEEK_HOLE drivers/rtc/rtc-twl.c: fix threaded IRQ to use IRQF_ONESHOT fat: fix non-atomic NFS i_pos read MAINTAINERS: add OMAP CPUfreq driver to OMAP Power Management section sgi-xp: nested calls to spin_lock_irqsave() fs: ramfs: file-nommu: add SetPageUptodate() drivers/rtc/rtc-mxc.c: fix irq enabled interrupts warning mm/memory_hotplug.c: release memory resources if hotadd_new_pgdat() fails h8300/uaccess: add mising __clear_user() h8300/uaccess: remove assignment to __gu_val in unhandled case of get_user() h8300/time: add missing #include <asm/irq_regs.h> h8300/signal: fix typo "statis" h8300/pgtable: add missing #include <asm-generic/pgtable.h> drivers/rtc/rtc-ab8500.c: ensure correct probing of the AB8500 RTC when Device Tree is enabled ...
2012-07-11memblock: free allocated memblock_reserved_regions laterYinghai Lu
memblock_free_reserved_regions() calls memblock_free(), but memblock_free() would double reserved.regions too, so we could free the old range for reserved.regions. Also tj said there is another bug which could be related to this. | I don't think we're saving any noticeable | amount by doing this "free - give it to page allocator - reserve | again" dancing. We should just allocate regions aligned to page | boundaries and free them later when memblock is no longer in use. in that case, when DEBUG_PAGEALLOC, will get panic: memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39 BUG: unable to handle kernel paging request at ffff88102febd948 IP: [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155 PGD 4826063 PUD cf67a067 PMD cf7fa067 PTE 800000102febd160 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC CPU 0 Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-20120614-sasha #447 RIP: 0010:[<ffffffff836a5774>] [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155 See the discussion at https://lkml.org/lkml/2012/6/13/469 So try to allocate with PAGE_SIZE alignment and free it later. Reported-by: Sasha Levin <levinsasha928@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-11mm: sparse: fix usemap allocation above node descriptor sectionYinghai Lu
After commit f5bf18fa22f8 ("bootmem/sparsemem: remove limit constraint in alloc_bootmem_section"), usemap allocations may easily be placed outside the optimal section that holds the node descriptor, even if there is space available in that section. This results in unnecessary hotplug dependencies that need to have the node unplugged before the section holding the usemap. The reason is that the bootmem allocator doesn't guarantee a linear search starting from the passed allocation goal but may start out at a much higher address absent an upper limit. Fix this by trying the allocation with the limit at the section end, then retry without if that fails. This keeps the fix from f5bf18fa22f8 of not panicking if the allocation does not fit in the section, but still makes sure to try to stay within the section at first. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> [3.3.x, 3.4.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-11memory hotplug: fix invalid memory access caused by stale kswapd pointerJiang Liu
kswapd_stop() is called to destroy the kswapd work thread when all memory of a NUMA node has been offlined. But kswapd_stop() only terminates the work thread without resetting NODE_DATA(nid)->kswapd to NULL. The stale pointer will prevent kswapd_run() from creating a new work thread when adding memory to the memory-less NUMA node again. Eventually the stale pointer may cause invalid memory access. An example stack dump as below. It's reproduced with 2.6.32, but latest kernel has the same issue. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81051a94>] exit_creds+0x12/0x78 PGD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/system/memory/memory391/state CPU 11 Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq microcode fuse loop dm_mod tpm_tis rtc_cmos i2c_i801 rtc_core tpm serio_raw pcspkr sg tpm_bios igb i2c_core iTCO_wdt rtc_lib mptctl iTCO_vendor_support button dca bnx2 usbhid hid uhci_hcd ehci_hcd usbcore sd_mod crc_t10dif edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic ata_piix libata thermal processor thermal_sys hwmon mptsas mptscsih mptbase scsi_transport_sas scsi_mod Pid: 7949, comm: sh Not tainted 2.6.32.12-qiuxishi-5-default #92 Tecal RH2285 RIP: 0010:exit_creds+0x12/0x78 RSP: 0018:ffff8806044f1d78 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff880604f22140 RCX: 0000000000019502 RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000 RBP: ffff880604f22150 R08: 0000000000000000 R09: ffffffff81a4dc10 R10: 00000000000032a0 R11: ffff880006202500 R12: 0000000000000000 R13: 0000000000c40000 R14: 0000000000008000 R15: 0000000000000001 FS: 00007fbc03d066f0(0000) GS:ffff8800282e0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 000000060f029000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process sh (pid: 7949, threadinfo ffff8806044f0000, task ffff880603d7c600) Stack: ffff880604f22140 ffffffff8103aac5 ffff880604f22140 ffffffff8104d21e ffff880006202500 0000000000008000 0000000000c38000 ffffffff810bd5b1 0000000000000000 ffff880603d7c600 00000000ffffdd29 0000000000000003 Call Trace: __put_task_struct+0x5d/0x97 kthread_stop+0x50/0x58 offline_pages+0x324/0x3da memory_block_change_state+0x179/0x1db store_mem_state+0x9e/0xbb sysfs_write_file+0xd0/0x107 vfs_write+0xad/0x169 sys_write+0x45/0x6e system_call_fastpath+0x16/0x1b Code: ff 4d 00 0f 94 c0 84 c0 74 08 48 89 ef e8 1f fd ff ff 5b 5d 31 c0 41 5c c3 53 48 8b 87 20 06 00 00 48 89 fb 48 8b bf 18 06 00 00 <8b> 00 48 c7 83 18 06 00 00 00 00 00 00 f0 ff 0f 0f 94 c0 84 c0 RIP exit_creds+0x12/0x78 RSP <ffff8806044f1d78> CR2: 0000000000000000 [akpm@linux-foundation.org: add pglist_data.kswapd locking comments] Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-11timekeeping: Provide hrtimer update functionThomas Gleixner
To finally fix the infamous leap second issue and other race windows caused by functions which change the offsets between the various time bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a function which atomically gets the current monotonic time and updates the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic overhead. The previous patch which provides ktime_t offsets allows us to make this function almost as cheap as ktime_get() which is going to be replaced in hrtimer_interrupt(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: John Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/1341960205-56738-7-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-07-11hrtimer: Provide clock_was_set_delayed()John Stultz
clock_was_set() cannot be called from hard interrupt context because it calls on_each_cpu(). For fixing the widely reported leap seconds issue it is necessary to call it from hard interrupt context, i.e. the timer tick code, which does the timekeeping updates. Provide a new function which denotes it in the hrtimer cpu base structure of the cpu on which it is called and raise the hrtimer softirq. We then execute the clock_was_set() notificiation from softirq context in run_hrtimer_softirq(). The hrtimer softirq is rarely used, so polling the flag there is not a performance issue. [ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get rid of all this ifdeffery ASAP ] Signed-off-by: John Stultz <johnstul@us.ibm.com> Reported-by: Jan Engelhardt <jengelh@inai.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-07-11Merge tag 'usb-3.5-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg Kroah-Hartman: "Here are a few fixes and new device ids for the 3.5-rc6 tree. The PCI changes resolve a long-standing issue with resuming some EHCI controllers. It has been acked by the PCI maintainer, and he asked for it to go through my USB tree instead of his. The xhci patches also resolve a number of reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'usb-3.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: PCI: EHCI: fix crash during suspend on ASUS computers USB: cdc-wdm: fix lockup on error in wdm_read USB: metro-usb: fix tty_flip_buffer_push use USB: option: Add MEDIATEK product ids USB: option: add ZTE MF60 xhci: Fix hang on back-to-back Set TR Deq Ptr commands. usb: Add support for root hub port status CAS
2012-07-10Merge tag 'fixes-for-v3.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "Yes, this is a *LATE* GPIO pull request with fixes for v3.5. Grant moved across the planet and accidentally fell off the grid, so he asked me to take over the GPIO merges for a while 10 days ago. Since then I went over the archives and collected this pile of fixes, and pulled two of them from the TI maintainer Kevin Hilman. Then waited for them to at least hit linux-next once or twice." GPIO fixes for v3.5: - Invalid context restore on bank 0 for OMAP driver in runtime suspend/resume cycle - Check for NULL platform data in sta-2x11 driver - Constrain selection of the V1 MSM GPIO driver to applicable platforms (Kconfig issue) - Make sure the correct output value is set in the wm8994 driver - Export devm_gpio_request_one() so it can be used in modules. Apparently some in-kernel modules can be configured to use this leading to breakage. - Check that the GPIO is valid in the lantiq driver - Fix the flag bits introduced for v3.5, so they don't overlap - Fix a device tree intialization bug for imx21-compatible devices - Carry over the OF node to the TPS65910 GPIO chip struct * tag 'fixes-for-v3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: tps65910: initialize of_node of gpio_chip gpio/mxc: make irqs work for fsl,imx21-gpio devices gpio: fix bits conflict for gpio flags mips: pci-lantiq: Fix check for valid gpio gpio: export devm_gpio_request_one gpiolib: wm8994: Pay attention to the value set when enabling as output gpio/msm_v1: CONFIG_GPIO_MSM_V1 is only available on three SoCs gpio-sta2x11: don't use pdata if null gpio/omap: fix invalid context restore of gpio bank-0 gpio/omap: fix irq loss while in idle with debounce on