git.oblomov.eu Git - linux-2.6/log

ftrace: fix !CONFIG_DYNAMIC_FTRACE ftrace_swapper_pid definition

Impact: build fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>

ftrace: fix !CONFIG_FTRACE [un_]register_ftrace_command() prototypes

Impact: build fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>

Merge branch 'tip/tracing/ftrace' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace

Merge branches 'tracing/hw-branch-tracing' and 'tracing/power-tracer' into tracing/core

ftrace: add pretty print function for traceon and traceoff hooks

This patch adds a pretty print version of traceon and traceoff
output for set_ftrace_filter.

# echo 'sys_open:traceon:4' > set_ftrace_filter
# cat set_ftrace_filter

#### all functions enabled ####
sys_open:traceon:count=4

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

ftrace: add pretty print to selected fuction traces

This patch adds a call back for the tracers that have hooks to
selected functions. This allows the tracer to show better output
in the set_ftrace_filter file.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

ftrace: show selected functions in set_ftrace_filter

This patch adds output to show what functions have tracer hooks
attached to them.

  # echo 'sys_open:traceon:4' > /debug/tracing/set_ftrace_filter
  # cat set_ftrace_filter

#### all functions enabled ####
sys_open:ftrace_traceon:0000000000000004

  # echo 'do_fork:traceoff:' > set_ftrace_filter
  # cat set_ftrace_filter

#### all functions enabled ####
sys_open:ftrace_traceon:0000000000000002
do_fork:ftrace_traceoff:ffffffffffffffff

Note the 4 changed to a 2. This is because The code was executed twice
since the traceoff was added. If a cat is done again:

#### all functions enabled ####
sys_open:ftrace_traceon
do_fork:ftrace_traceoff:ffffffffffffffff

The number disappears. That is because it will not print a NULL.

Callbacks to allow the tracer to pretty print will be implemented soon.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

ftrace: add traceon traceoff commands to enable/disable the buffers

This patch adds the new function selection commands traceon and
traceoff. traceon sets the function to enable the ring buffers
while traceoff disables the ring buffers. You can pass in the
number of times you want the command to be executed when the function
is hit. It will only execute if the state of the buffers are not
already in that state.

Example:

# echo do_fork:traceon:4

Will enable the ring buffers if they are disabled every time it
hits do_fork, up to 4 times.

# echo sys_close:traceoff

This will disable the ring buffers every time (unlimited) when
sys_close is called.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

ring-buffer: add tracing_is_on to test if ring buffer is enabled

This patch adds the tracing_is_on() interface to tell if the ring
buffer is turned on or not.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

ftrace: trace different functions with a different tracer

Impact: new feature

Currently, the function tracer only gives you an ability to hook
a tracer to all functions being traced. The dynamic function trace
allows you to pick and choose which of those functions will be
traced, but all functions being traced will call all tracers that
registered with the function tracer.

This patch adds a new feature that allows a tracer to hook to specific
functions, even when all functions are being traced. It allows for
different functions to call different tracer hooks.

The way this is accomplished is by a special function that will hook
to the function tracer and will set up a hash table knowing which
tracer hook to call with which function. This is the most general
and easiest method to accomplish this. Later, an arch may choose
to supply their own method in changing the mcount call of a function
to call a different tracer. But that will be an exercise for the
future.

To register a function:

struct ftrace_hook_ops {
void (*func)(unsigned long ip,
unsigned long parent_ip,
void **data);
int (*callback)(unsigned long ip, void **data);
void (*free)(void **data);
};

int register_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
  void *data);

glob is a simple glob to search for the functions to hook.
ops is a pointer to the operations (listed below)
data is the default data to be passed to the hook functions when traced

ops:
func is the hook function to call when the functions are traced
callback is a callback function that is called when setting up the hash.
   That is, if the tracer needs to do something special for each
   function, that is being traced, and wants to give each function
   its own data. The address of the entry data is passed to this
   callback, so that the callback may wish to update the entry to
   whatever it would like.
free is a callback for when the entry is freed. In case the tracer
   allocated any data, it is give the chance to free it.

To unregister we have three functions:

  void
  unregister_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
void *data)

This will unregister all hooks that match glob, point to ops, and
have its data matching data. (note, if glob is NULL, blank or '*',
all functions will be tested).

  void
  unregister_ftrace_function_hook_func(char *glob,
struct ftrace_hook_ops *ops)

This will unregister all functions matching glob that has an entry
pointing to ops.

  void unregister_ftrace_function_hook_all(char *glob)

This simply unregisters all funcs.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

ftrace: consolidate mutexes

Impact: clean up

Now that ftrace_lock is a mutex, there is no reason to have three
different mutexes protecting similar data. All the mutex paths
are not in hot paths, so having a mutex to cover more data is
not a problem.

This patch removes the ftrace_sysctl_lock and ftrace_start_lock
and uses the ftrace_lock to protect the locations that were protected
by these locks. By doing so, this change also removes some of
the lock nesting that was taking place.

There are still more mutexes in ftrace.c that can probably be
consolidated, but they can be dealt with later. We need to be careful
about the way the locks are nested, and by consolidating, we can cause
a recursive deadlock.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

ftrace: convert ftrace_lock from a spinlock to mutex

Impact: clean up

The older versions of ftrace required doing the ftrace list
search under atomic context. Now all the calls are in non-atomic
context. There is no reason to keep the ftrace_lock as a spinlock.

This patch converts it to a mutex.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

ftrace: add command interface for function selection

Allow for other tracers to add their own commands for function
selection. This interface gives a trace the ability to name a
command for function selection. Right now it is pretty limited
in what it offers, but this is a building step for more features.

The :mod: command is converted to this interface and also serves
as a template for other implementations.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

ftrace: enable filtering only when a function is filtered on

Impact: fix to prevent empty set_ftrace_filter and no ftrace output

The function filter is used to only trace a given set of functions.
The filter is enabled when a function name is echoed into the
set_ftrace_filter file. But if the name has a typo and the function
is not found, the filter is enabled, but no function is listed.

This makes a confusing situation where set_ftrace_filter is empty
but no functions ever get enabled for tracing.

For example:

# cat /debug/tracing/set_ftrace_filter

  #### all functions enabled ####

# echo bad_name > set_ftrace_filter
# cat /debug/tracing/set_ftrace_filter

# echo function > current_tracer
# cat trace

  # tracer: nop
  #
  #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
  #              | |       |          |         |

This patch changes that to only enable filtering if a function
is set to be filtered on. Now, the filter is not enabled if
a bad name is echoed into set_ftrace_filter.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

ftrace: add module command function filter selection

This patch adds a "command" syntax to the function filtering files:

  /debugfs/tracing/set_ftrace_filter
  /debugfs/tracing/set_ftrace_notrace

Of the format:  <function>:<command>:<parameter>

The command is optional, and dependent on the command, so are
the parameters.

echo do_fork > set_ftrace_filter

Will only trace 'do_fork'.

echo 'sched_*' > set_ftrace_filter

Will only trace functions starting with the letters 'sched_'.

echo '*:mod:ext3' > set_ftrace_filter

Will trace only the ext3 module functions.

echo '*write*:mod:ext3' > set_ftrace_notrace

Will prevent the ext3 functions with the letters 'write' in
the name from being traced.

echo '!*_allocate:mod:ext3' > set_ftrace_filter

Will remove the functions in ext3 that end with the letters
'_allocate' from the ftrace filter.

Although this patch implements the 'command' format, only the
'mod' command is supported. More commands to follow.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

ftrace: break up ftrace_match_records into smaller components

Impact: clean up

ftrace_match_records does a lot of things that other features
can use. This patch breaks up ftrace_match_records and pulls
out ftrace_setup_glob and ftrace_match_record.

ftrace_setup_glob prepares a simple glob expression for use with
ftrace_match_record. ftrace_match_record compares a single record
with a glob type.

Breaking this up will allow for more features to run on individual
records.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

ftrace: rename ftrace_match to ftrace_match_records

Impact: clean up

ftrace_match is too generic of a name. What it really does is
search all records and matches the records with the given string,
and either sets or unsets the functions to be traced depending
on if the parameter 'enable' is set or not.

This allows us to make another function called ftrace_match that
can be used to test a single record.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

ftrace: add do_for_each_ftrace_rec and while_for_each_ftrace_rec

Impact: clean up

To iterate over all the functions that dynamic trace knows about
it requires two for loops. One to iterate over the pages and the
other to iterate over the records within the page.

There are several duplications of these loops in ftrace.c. This
patch creates the macros do_for_each_ftrace_rec and
while_for_each_ftrace_rec to handle this logic, and removes the
duplicate code.

While making this change, I also discovered and fixed a small
bug that one of the iterations should exit the loop after it found the
record it was searching for. This used a break when it should have
used a goto, since there were two loops it needed to break out
from. No real harm was done by this bug since it would only continue
to search the other records, and the code was in a slow path anyway.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

ftrace: state that all functions are enabled in set_ftrace_filter

Impact: clean up, make set_ftrace_filter less confusing

The set_ftrace_filter shows only the functions that will be traced.
But when it is empty, it will trace all functions. This can be a bit
confusing.

This patch makes set_ftrace_filter show:

#### all functions enabled ####

When all functions will be traced, and we do not filter only a select
few.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

Merge branch 'tip/tracing/ftrace' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into tracing/power-tracer

tracing: fix section mismatch in trace_hw_branches.c

The function bts_trace_init() references a variable
bts_hotcpu_notifier which is marked
as __cpuinitdata. Thus causes section mismatch. This patch fixes it.

LD kernel/trace/built-in.o
WARNING: kernel/trace/built-in.o(.text+0xc90c): Section mismatch in
reference from the function bts_trace_init() to the variable
.cpuinit.data:bts_hotcpu_notifier
The function bts_trace_init() references
the variable __cpuinitdata bts_hotcpu_notifier.
This is often because bts_trace_init lacks a __cpuinitdata
annotation or the annotation of bts_hotcpu_notifier is wrong.

WARNING: kernel/trace/built-in.o(.text+0xc92a): Section mismatch in
reference from the function bts_trace_reset() to the variable
.cpuinit.data:bts_hotcpu_notifier
The function bts_trace_reset() references
the variable __cpuinitdata bts_hotcpu_notifier.
This is often because bts_trace_reset lacks a __cpuinitdata
annotation or the annotation of bts_hotcpu_notifier is wrong.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: markus.t.metzger@gmail.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/core

Conflicts:
kernel/trace/trace_mmiotrace.c

doc: mmiotrace.txt, buffer size control change

Impact: prevents confusing the user when buffer size is inadequate

The tracing framework offers a resizeable buffer, which mmiotrace uses
to record events. If the buffer is full, the following events will be
lost. Events should not be lost, so the documentation instructs the user
to increase the buffer size. The buffer size is set via a debugfs file.

Mmiotrace documentation was not updated the same time the debugfs file
was changed. The old file was tracing/trace_entries and first contained
the number of entries the buffer had space for, per cpu. Nowadays this
file is replaced with the file tracing/buffer_size_kb, which tells the
amount of memory reserved for the buffer, per cpu, in kilobytes.

Previously, a flag had to be toggled via the debugfs file
tracing/tracing_enabled when the buffer size was changed. This is no
longer necessary.

The mmiotrace documentation is updated to reflect the current state of
the tracing framework.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

trace: mmiotrace to the tracer menu in Kconfig

Impact: cosmetic change in Kconfig menu layout

This patch was originally suggested by Peter Zijlstra, but seems it
was forgotten.

CONFIG_MMIOTRACE and CONFIG_MMIOTRACE_TEST were selectable
directly under the Kernel hacking / debugging menu in the kernel
configuration system. They were present only for x86 and x86_64.

Other tracers that use the ftrace tracing framework are in their own
sub-menu. This patch moves the mmiotrace configuration options there.
Since the Kconfig file, where the tracer menu is, is not architecture
specific, HAVE_MMIOTRACE_SUPPORT is introduced and provided only by
x86/x86_64. CONFIG_MMIOTRACE now depends on it.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

mmiotrace: count events lost due to not recording

Impact: enhances lost events counting in mmiotrace

The tracing framework, or the ring buffer facility it uses, has a switch
to stop recording data. When recording is off, the trace events will be
lost. The framework does not count these, so mmiotrace has to count them
itself.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Linux 2.6.29-rc5

Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ASoC: Only register AC97 bus if it's not done already
  ALSA: hda - Add snd_hda_multi_out_dig_cleanup()
  ALSA: hda - Add missing terminator in slave dig-out array
  ALSA: hda - Change HP dv7 (103c:30f4) quirk from hp-m4 to hp-dv5 model
  ALSA: hda - Register (new) devices at reconfig
  ALSA: mtpav - Fix initial value for input hwport
  ALSA: hda - add id for Intel IbexPeak integrated HDMI codec
  ALSA: hda - compute checksum in HDMI audio infoframe
  ALSA: hda - enable HDMI audio pin out at module loading time
  ALSA: hda - allow multi-channel HDMI audio playback when ELD is not present
  ASoC: Update SDP3430 machine driver for snd_soc_card
  ALSA: hda - Add quirk for Asus z37e (1043:8284)
  sound: Remove OSSlib stuff from linux/soundcard.h
  ASoC: WM8990: Fix kcontrol's private value use in put callback
  ASoC: TLV320AIC3X: Fix kcontrol's private value use in put callback

User namespaces: Only put the userns when we unhash the uid

uids in namespaces other than init don't get a sysfs entry.

For those in the init namespace, while we're waiting to remove
the sysfs entry for the uid the uid is still hashed, and
alloc_uid() may re-grab that uid without getting a new
reference to the user_ns, which we've already put in free_user
before scheduling remove_user_sysfs_dir().

Reported-and-tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tracing: convert c/p state power tracer to use tracepoints

Convert the c/p state "power" tracer to use tracepoints. Avoids a
function call when the tracer is disabled.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>

Merge branch 'fix/asoc' into for-linus

Merge branch 'fix/hda' into for-linus

Merge branch 'fix/misc' into for-linus

Merge branch 'fix/oss-header-fix' into for-linus

ASoC: Only register AC97 bus if it's not done already

ASoC supports both explicit codec drivers for AC97 devices and a simple
driver which uses the standard ALSA AC97 framework for codec support.
When used with the generic AC97 codec support that will provide the
ad hoc AC97 device for drivers like touchscreens to attach to so the
core shouldn't do so.

Reported-by: Manuel Lauss <mano@roarinelk.homelinux.net>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>

ALSA: hda - Add snd_hda_multi_out_dig_cleanup()

Added the helper function snd_hda_multi_out_dig_cleanup() to clean up
the digital outputs with multi setup. This call is needed in cases
the codec supports multiple digital outputs as slaves. Otherwise the
slave widgets aren't properly cleaned up.

For a single digital output (e.g. in patch_conexant.c), this call isn't
needed.

Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda - Add missing terminator in slave dig-out array

Added the missing terminator for ad1989b_slave_dig_outs[].

Cc: <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge branch 'tip/tracing/core' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace

Merge branch 'tip/tracing/ftrace' of git://git./linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace

Merge branches 'tracing/ftrace', 'tracing/ring-buffer', 'tracing/sysprof', 'tracing/urgent' and 'linus' into tracing/core

ALSA: hda - Change HP dv7 (103c:30f4) quirk from hp-m4 to hp-dv5 model

Change HP dv7 quirk: although reported to work with hp-m4 model
(https://bugzilla.novell.com/show_bug.cgi?id=445321), the original
report doesn't contain info about testing of internal microphone.

Recently I received a report about internal mic not working
(https://qa.mandriva.com/show_bug.cgi?id=44855#c193), this must be
related with the forced line in on pin 0x0e done with hp-m4 model. Thus
change the current quirk from STAC_HP_M4 to STAC_HP_DV5, later reported
to be fixed on a provided kernel with this change
(https://qa.mandriva.com/show_bug.cgi?id=44855#c196).

Signed-off-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge git://git./linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (32 commits)
  wimax: fix oops in wimax_dev_get_by_genl_info() when looking up non-wimax iface
  net: 4 bytes kernel memory disclosure in SO_BSDCOMPAT gsopt try #2
  netxen: fix compile waring "label ‘set_32_bit_mask’ defined but not used" on IA64 platform
  bnx2: Update version to 1.9.2 and copyright.
  bnx2: Fix jumbo frames error handling.
  bnx2: Update 5709 firmware.
  bnx2: Update 5706/5708 firmware.
  3c505: do not set pcb->data.raw beyond its size
  Documentation/connector/cn_test.c: don't use gfp_any()
  net: don't use in_atomic() in gfp_any()
  IRDA: cnt is off by 1
  netxen: remove pcie workaround
  sun3: print when lance_open() fails
  qlge: bugfix: Add missing rx buf clean index on early exit.
  qlge: bugfix: Fix RX scaling values.
  qlge: bugfix: Fix TSO breakage.
  qlge: bugfix: Add missing dev_kfree_skb_any() call.
  qlge: bugfix: Add missing put_page() call.
  qlge: bugfix: Fix fatal error recovery hang.
  qlge: bugfix: Use netif_receive_skb() and vlan_hwaccel_receive_skb().
  ...

wimax: fix oops in wimax_dev_get_by_genl_info() when looking up non-wimax iface

When a non-wimax interface is looked up by the stack, a bad pointer is
returned when the looked-up interface is not found in the list (of
registered WiMAX interfaces). This causes an oops in the caller when
trying to use the pointer.

Fix by properly setting the pointer to NULL if we don't exit from the
list_for_each() with a found entry.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: 4 bytes kernel memory disclosure in SO_BSDCOMPAT gsopt try #2

In function sock_getsockopt() located in net/core/sock.c, optval v.val
is not correctly initialized and directly returned in userland in case
we have SO_BSDCOMPAT option set.

This dummy code should trigger the bug:

int main(void)
{
unsigned char buf[4] = { 0, 0, 0, 0 };
int len;
int sock;
sock = socket(33, 2, 2);
getsockopt(sock, 1, SO_BSDCOMPAT, &buf, &len);
printf("%x%x%x%x\n", buf[0], buf[1], buf[2], buf[3]);
close(sock);
}

Here is a patch that fix this bug by initalizing v.val just after its
declaration.

Signed-off-by: Clément Lecigne <clement.lecigne@netasq.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netxen: fix compile waring "label ‘set_32_bit_mask’ defined but not used" on IA64 platform

When compile the latest kernel on IA64 platform,I got a warning:
drivers/net/netxen/netxen_nic_main.c:203: warning: label ‘set_32_bit_mask’
defined but not used

We do not need label ‘set_32_bit_mask’ on IA64 platform,So move it to #else.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Update version to 1.9.2 and copyright.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Fix jumbo frames error handling.

If errors are reported on a frame descriptor, we need to
account for the buffer pages that may have been used for this
error packet and recycle them. Otherwise, we may get the wrong
pages for the next packet.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Update 5709 firmware.

New firmware fixes a data corruption issue when receiving and
placing jumbo frames into host buffers. In some cases, the
buffer descriptor is not updated correctly and this will lead
to the driver linking the wrong number of pages into the SKB.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Update 5706/5708 firmware.

New firmware fixes a data corruption issue when receiving and
placing jumbo frames into host buffers. In some cases, the
buffer descriptor is not updated correctly and this will lead
to the driver linking the wrong number of pages into the SKB.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c505: do not set pcb->data.raw beyond its size

Ensure that we do not set pcb->data.raw beyond its size, print an error message
and return false if we attempt to. A timout message was printed one too early.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Documentation/connector/cn_test.c: don't use gfp_any()

cn_test_timer_func() is a timer handler and can never use GFP_KERNEL -
there's no point in using gfp_any() here.

Also, use setup_timer().

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: don't use in_atomic() in gfp_any()

The problem is that in_atomic() will return false inside spinlocks if
CONFIG_PREEMPT=n.  This will lead to deadlockable GFP_KERNEL allocations
from spinlocked regions.

Secondly, if CONFIG_PREEMPT=y, this bug solves itself because networking
will instead use GFP_ATOMIC from this callsite.  Hence we won't get the
might_sleep() debugging warnings which would have informed us of the buggy
callsites.

Solve both these problems by switching to in_interrupt().  Now, if someone
runs a gfp_any() allocation from inside spinlock we will get the warning
if CONFIG_PREEMPT=y.

I reviewed all callsites and most of them were too complex for my little
brain and none of them documented their interface requirements.  I have no
idea what this patch will do.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

IRDA: cnt is off by 1

If no prior break occurs, cnt reaches 101 after the loop, so we are still able
to change speed when cnt has become 100.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netxen: remove pcie workaround

Remove workaround for pcie bug in early revisions of NX3031
(rev 41 or earlier). This is taken care of during firmware init.

The workaround required writing pcie config reg of every
pcie function on a card, not all of which are enabled.

Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sun3: print when lance_open() fails

With while (--i > 0) { ... } i reaches 0; print when lance_open() fails

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: bugfix: Add missing rx buf clean index on early exit.

The large receive buffer queue is not properly tracking the current
index in the case where an early exit occurs. This can happen when a
page alloc or dma mapping fails. If this occurs the queue will get
out of sync and invalid indexes can be written to the hardware.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: bugfix: Fix RX scaling values.

Receive packets were only scaling across 2 of the receive queues. The
value was hardcoded to 2 instead of being based on how many rx queues
were running.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: bugfix: Fix TSO breakage.

Moved the buffer mapping to a point after TSO logic has modified the
iph->check field. We were seeing stale data on the PCIe bus.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: bugfix: Add missing dev_kfree_skb_any() call.

We put the skb back if we can't get mapping for it. We don't
want unmapped buffers on our receive buffer queue.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: bugfix: Add missing put_page() call.

We put the page back if we can't get mapping for it. We don't
want unmapped buffers on our receive buffer queue.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: bugfix: Fix fatal error recovery hang.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: bugfix: Use netif_receive_skb() and vlan_hwaccel_receive_skb().

Replace calls to vlan_hwaccel_rx() and netif_rx().
Thanks to Dave Miller for pointing out the the driver was making
the wrong upcall for passing packets into the stack.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

TG3: limit reaches -1

With while (limit--) { ... } limit reaches -1, so 0 means success.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sched: do not account for NMIs

Impact: avoid corruption in system time accounting

Martin Schwidefsky told me that there was an issue with NMIs and
system accounting. The problem is that the accounting code is
not reentrant, and if an NMI goes off after an interrupt it can
corrupt the accounting.

For now, the best we can do is to treat NMIs like SMIs and they
are not accounted for.

This patch changes nmi_enter to not call __irq_enter and to do
the preempt-count and tracing calls directly.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

ring-buffer: rename label out_unlock to out_reset

Impact: clean up

While reviewing the ring buffer code, I thougth I saw a bug with

if (!__raw_spin_trylock(&cpu_buffer->lock))
goto out_unlock;

But I forgot that we use a variable "lock_taken" that is set if
the spinlock is taken, and only unlock it if that variable is set.

To avoid further confusion from other reviewers, this patch
renames the label out_unlock with out_reset, which is the more
appropriate name.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

Merge branch 'for-linus' of git://git./linux/kernel/git/penberg/slab-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
mm: Export symbol ksize()

preempt-count: force hardirq-count to max of 10

To add a bit in the preempt_count to be set when in NMI context, we
found that some archs did not have enough bits to spare. This is
due to the hardirq_count being a mask that can hold NR_IRQS.

Some archs allow for over 16000 IRQs, and that would require a mask
of 14 bits. The sofitrq mask is 8 bits and the preempt disable mask
is also 8 bits. The PREEMP_ACTIVE bit is bit 30, and bit 31 would
make the preempt_count (which is type int) a negative number.
A negative preempt_count is a sign of failure.

Add them up 14+8+8+1+1 you get 32 bits. No room for the NMI bit.

But the hardirq_count is to track the number of nested IRQs, not
the number of total IRQs. This originally took the paranoid approach
of setting the max nesting to NR_IRQS. But when we have archs with
over 1000 IRQs, it is not practical to think they will ever all
nest on a single CPU. Not to mention that this would most definitely
cause a stack overflow.

This patch sets a max of 10 bits to be used for IRQ nesting.
I did a 'git grep HARDIRQ' to examine all users of HARDIRQ_BITS and
HARDIRQ_MASK, and found that making it a max of 10 would not hurt
anyone. I did find that the m68k expected it to be 8 bits, so
I allow for the archs to set the number to be less than 10.

I removed the setting of HARDIRQ_BITS from the archs that set it
to more than 10. This includes ALPHA, ia64 and avr32.

This will always allow room for the NMI bit, and if we need to allow
for NMI nesting, we have 4 bits to play with.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

Fix page writeback thinko, causing Berkeley DB slowdown

A bug was introduced into write_cache_pages cyclic writeout by commit
31a12666d8f0c22235297e1c1575f82061480029 ("mm: write_cache_pages cyclic
fix").  The intention (and comments) is that we should cycle back and
look for more dirty pages at the beginning of the file if there is no
more work to be done.

But the !done condition was dropped from the test.  This means that any
time the page writeout loop breaks (eg.  due to nr_to_write == 0), we
will set index to 0, then goto again.  This will set done_index to
index, then find done is set, so will proceed to the end of the
function.  When updating mapping->writeback_index for cyclic writeout,
we now use done_index == 0, so we're always cycling back to 0.

This seemed to be causing random mmap writes (slapadd and iozone) to
start writing more pages from the LRU and writeout would slowdown, and
caused bugzilla entry

http://bugzilla.kernel.org/show_bug.cgi?id=12604

about Berkeley DB slowing down dramatically.

With this patch, iozone random write performance is increased nearly
5x on my system (iozone -B -r 4k -s 64k -s 512m -s 1200m on ext2).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Reported-and-tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: Export symbol ksize()

Commit 7b2cd92adc5430b0c1adeb120971852b4ea1ab08 ("crypto: api - Fix
zeroing on free") added modular user of ksize(). Export that to fix
crypto.ko compilation.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

Merge git://git.infradead.org/users/cbou/battery-2.6.29

* git://git.infradead.org/users/cbou/battery-2.6.29:
pcf50633_charger: Fix typo

ALSA: hda - Register (new) devices at reconfig

The devices that have been newly added during reconfig must be
registered. Otherwise they won't be visible to user-space.

Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: mtpav - Fix initial value for input hwport

Fix the initial value for input hwport. The old value (-1) may cause
Oops when an realtime MIDI byte is received before the input port is
explicitly given.
Instead, now it's set to the broadcasting as default.

Tested-by: Holger Dehnhardt <dehnhardt@ahdehnhardt.de>
Cc: <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

w1: w1 temp calculation overflow fix

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12646

When the temperature exceeds 32767 milli-degrees the temperature overflows
to -32768 millidegrees. These are bothe well within the -55 - +125 degree
range for the sensor.

Fix overflow in left-shift of a u8.

Signed-off-by: Ian Dall <ian@beware.dropbear.id.au>
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

nbd: fix I/O hang on disconnected nbds

Fix a problem that causes I/O to a disconnected (or partially initialized)
nbd device to hang indefinitely.  To reproduce:

# ioctl NBD_SET_SIZE_BLOCKS /dev/nbd23 514048
# dd if=/dev/nbd23 of=/dev/null bs=4096 count=1

...hangs...

This can also occur when an nbd device loses its nbd-client/server
connection.  Although we clear the queue of any outstanding I/Os after the
client/server connection fails, any additional I/Os that get queued later
will hang.

This bug may also be the problem reported in this bug report:
http://bugzilla.kernel.org/show_bug.cgi?id=12277

Testing would need to be performed to determine if the two issues are the
same.

This problem was introduced by the new request handling thread code ("NBD:
allow nbd to be used locally", 3/2008), which entered into mainline around
2.6.25.

The fix, which is fairly simple, is to restore the check for lo->sock
being NULL in do_nbd_request.  This causes I/O to an uninitialized nbd to
immediately fail with an I/O error, as it did prior to the introduction of
this bug.

Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Reported-by: Jon Nelson <jnelson-kernel-bugzilla@jamponi.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: <stable@kernel.org> [2.6.26.x, 2.6.27.x, 2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: rearrange exit_mmap() to unlock before arch_exit_mmap

Christophe Saout reported [in precursor to:
http://marc.info/?l=linux-kernel&m=123209902707347&w=4]:

> Note that I also some a different issue with CONFIG_UNEVICTABLE_LRU.
> Seems like Xen tears down current->mm early on process termination, so
> that __get_user_pages in exit_mmap causes nasty messages when the
> process had any mlocked pages.  (in fact, it somehow manages to get into
> the swapping code and produces a null pointer dereference trying to get
> a swap token)

Jeremy explained:

Yes.  In the normal case under Xen, an in-use pagetable is "pinned",
meaning that it is RO to the kernel, and all updates must go via hypercall
(or writes are trapped and emulated, which is much the same thing).  An
unpinned pagetable is not currently in use by any process, and can be
directly accessed as normal RW pages.

As an optimisation at process exit time, we unpin the pagetable as early
as possible (switching the process to init_mm), so that all the normal
pagetable teardown can happen with direct memory accesses.

This happens in exit_mmap() -> arch_exit_mmap().  The munlocking happens
a few lines below.  The obvious thing to do would be to move
arch_exit_mmap() to below the munlock code, but I think we'd want to
call it even if mm->mmap is NULL, just to be on the safe side.

Thus, this patch:

exit_mmap() needs to unlock any locked vmas before calling arch_exit_mmap,
as the latter may switch the current mm to init_mm, which would cause the
former to fail.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christophe Saout <christophe@saout.de>
Cc: Keir Fraser <keir.fraser@eu.citrix.com>
Cc: Christophe Saout <christophe@saout.de>
Cc: Alex Williamson <alex.williamson@hp.com>
Cc: <stable@kernel.org> [2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

parport: parport_serial, don't bind netmos ibm 0299

Since netmos 9835 with subids 0x1014(IBM):0x0299 is now bound with
serial/8250_pci, because it has no parallel ports and subdevice id isn't
in the expected form, return -ENODEV from probe function.

This is performed in netmos preinit_hook.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

writeback: fix break condition

Commit dcf6a79dda5cc2a2bec183e50d829030c0972aaa ("write-back: fix
nr_to_write counter") fixed nr_to_write counter, but didn't set the break
condition properly.

If nr_to_write == 0 after being decremented it will loop one more time
before setting done = 1 and breaking the loop.

[akpm@linux-foundation.org: coding-style fixes]
Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

syscall define: fix uml compile bug

With the new system call defines we get this on uml:

arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x308): undefined reference to `sys_sigprocmask'

Reason for this is that uml passes the preprocessor option
-Dsigprocmask=kernel_sigprocmask to gcc when compiling the kernel.
This causes SYSCALL_DEFINE3(sigprocmask, ...) to be expanded to
SYSCALL_DEFINEx(3, kernel_sigprocmask, ...) and finally to a system
call named sys_kernel_sigprocmask. However sys_sigprocmask is missing
because of this.

To avoid macro expansion for the system call name just concatenate the
name at first define instead of carrying it through severel levels.
This was pointed out by Al Viro.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: WANG Cong <wangcong@zeuux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ext2/xip: refuse to change xip flag during remount with busy inodes

For a reason that I was unable to understand in three months of debugging,
mount ext2 -o remount stopped working properly when remounting from
regular operation to xip, or the other way around.  According to a git
bisect search, the problem was introduced with the VM_MIXEDMAP/PTE_SPECIAL
rework in the vm:

commit 70688e4dd1647f0ceb502bbd5964fa344c5eb411
Author: Nick Piggin <npiggin@suse.de>
Date:   Mon Apr 28 02:13:02 2008 -0700

    xip: support non-struct page backed memory

In the failing scenario, the filesystem is mounted read only via root=
kernel parameter on s390x.  During remount (in rc.sysinit), the inodes of
the bash binary and its libraries are busy and cannot be invalidated (the
bash which is running rc.sysinit resides on subject filesystem).
Afterwards, another bash process (running ifup-eth) recurses into a
subshell, runs dup_mm (via fork).  Some of the mappings in this bash
process were created from inodes that could not be invalidated during
remount.

Both parent and child process crash some time later due to inconsistencies
in their address spaces.  The issue seems to be timing sensitive, various
attempts to recreate it have failed.

This patch refuses to change the xip flag during remount in case some
inodes cannot be invalidated.  This patch keeps users from running into
that issue.

[akpm@linux-foundation.org: cleanup]
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cgroups: fix lockdep subclasses overflow

I enabled all cgroup subsystems when compiling kernel, and then:
# mount -t cgroup -o net_cls xxx /mnt
# mkdir /mnt/0

This showed up immediately:
BUG: MAX_LOCKDEP_SUBCLASSES too low!
turning off the locking correctness validator.

It's caused by the cgroup hierarchy lock:
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
if (ss->root == root)
mutex_lock_nested(&ss->hierarchy_mutex, i);
}

Now we have 9 cgroup subsystems, and the above 'i' for net_cls is 8, but
MAX_LOCKDEP_SUBCLASSES is 8.

This patch uses different lockdep keys for different subsystems.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cgroups: add Li Zefan as a maintainer

Add Li Zefan as co-maintainer.

Acked-by: Paul Menage <menage@google.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rtc: t reaches -1, tested 0

With a postfix decrement t will reach -1 rather than 0, so neither the
warning nor the `goto error_out' will occur.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Manuel Lauss <mano@roarinelk.homelinux.net>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel-doc: fix syscall wrapper processing

Fix kernel-doc processing of SYSCALL wrappers.

The SYSCALL wrapper patches played havoc with kernel-doc for
syscalls.  Syscalls that were scanned for DocBook processing
reported warnings like this one, for sys_tgkill:

Warning(kernel/signal.c:2285): No description found for parameter 'tgkill'
Warning(kernel/signal.c:2285): No description found for parameter 'pid_t'
Warning(kernel/signal.c:2285): No description found for parameter 'int'

because the macro parameters all "look like" function parameters,
although they are not:

/**
*  sys_tgkill - send signal to one specific thread
*  @tgid: the thread group ID of the thread
*  @pid: the PID of the thread
*  @sig: signal to be sent
*
*  This syscall also checks the @tgid and returns -ESRCH even if the PID
*  exists but it's not belonging to the target process anymore. This
*  method solves the problem of threads exiting and PIDs getting reused.
*/
SYSCALL_DEFINE3(tgkill, pid_t, tgid, pid_t, pid, int, sig)
{
...

This patch special-cases the handling SYSCALL_DEFINE* function
prototypes by expanding them to
long sys_foobar(type1 arg1, type1 arg2, ...)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel-doc: preferred ending marker and examples

Fix kernel-doc-nano-HOWTO.txt to use */ as the ending marker in kernel-doc
examples and state that */ is the preferred ending marker.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Reported-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: use __GFP_NOWARN in page cgroup allocation

page_cgroup's page allocation at init/memory hotplug uses kmalloc() and
vmalloc(). If kmalloc() failes, vmalloc() is used.

This is because vmalloc() is very limited resource on 32bit systems.
We want to use kmalloc() first.

But in this kind of call, __GFP_NOWARN should be specified.

Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

video/framebuffer: move the probe func into .devinit.text in Blackfin LCD driver

Signed-off-by: Uwe Kleine-Koenig <ukleinek@strlen.de>
Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tpm: correct email address for tpm_infineon-driver

Update my email address.

Signed-off-by: Marcel Selhorst <m.selhorst@sirrix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix mlocked page counter mismatch

When I tested following program, I found that the mlocked counter
is strange.  It cannot free some mlocked pages.

It is because try_to_unmap_file() doesn't check real
page mappings in vmas.

That is because the goal of an address_space for a file is to find all
processes into which the file's specific interval is mapped.  It is
related to the file's interval, not to pages.

Even if the page isn't really mapped by the vma, it returns SWAP_MLOCK
since the vma has VM_LOCKED, then calls try_to_mlock_page.  After this the
mlocked counter is increased again.

COWed anon page in a file-backed vma could be a such case.  This patch
resolves it.

-- my test program --

int main()
{
       mlockall(MCL_CURRENT);
       return 0;
}

-- before --

root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
Unevictable:           0 kB
Mlocked:               0 kB

-- after --

root@barrios-target-linux:~# cat /proc/meminfo | egrep 'Mlo|Unev'
Unevictable:           8 kB
Mlocked:               8 kB

Signed-off-by: MinChan Kim <minchan.kim@gmail.com>
Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Tested-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ext3: revert "ext3: wait on all pending commits in ext3_sync_fs"

This reverts commit c87591b719737b4e91eb1a9fa8fd55a4ff1886d6.

Since journal_start_commit() is now fixed to return 1 when we started a
transaction commit, there's some transaction waiting to be committed or
there's a transaction already committing, we don't need to call
ext3_force_commit() in ext3_sync_fs(). Furthermore ext3_force_commit()
can unnecessarily create sync transaction which is expensive so it's
worthwhile to remove it when we can.

Cc: Eric Sandeen <sandeen@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

jbd: fix return value of journal_start_commit()

journal_start_commit() returns 1 if either a transaction is committing or
the function has queued a transaction commit.  But it returns 0 if we
raced with somebody queueing the transaction commit as well.  This
resulted in ext3_sync_fs() not functioning correctly (description from
Arthur Jones): In the case of a data=ordered umount with pending long
symlinks which are delayed due to a long list of other I/O on the backing
block device, this causes the buffer associated with the long symlinks to
not be moved to the inode dirty list in the second phase of fsync_super.
Then, before they can be dirtied again, kjournald exits, seeing the UMOUNT
flag and the dirty pages are never written to the backing block device,
causing long symlink corruption and exposing new or previously freed block
data to userspace.

This can be reproduced with a script created by Eric Sandeen
<sandeen@redhat.com>:

        #!/bin/bash

        umount /mnt/test2
        mount /dev/sdb4 /mnt/test2
        rm -f /mnt/test2/*
        dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512
        touch /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
        ln -s /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
        /mnt/test2/link
        umount /mnt/test2
        mount /dev/sdb4 /mnt/test2
        ls /mnt/test2/

This patch fixes journal_start_commit() to always return 1 when there's
a transaction committing or queued for commit.

Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Mike Snitzer <snitzer@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix dirty_bytes/dirty_background_bytes sysctls on 64bit arches

We need to pass an unsigned long as the minimum, because it gets casted
to an unsigned long in the sysctl handler. If we pass an int, we'll
access four more bytes on 64bit arches, resulting in a random minimum
value.

[rientjes@google.com: fix type of `old_bytes']
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gx1fb: properly alloc cmap and plug cmap leak

We weren't properly allocating the cmap for depths greater than 8bpp,
which caused pain for things like DirectFB. Also, we never freed the cmap
memory upon module unload..

Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Marco La Porta <marco-laporta@tiscali.it>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gxfb: properly alloc cmap and plug cmap leak

We weren't properly allocating the cmap for depths greater than 8bpp,
which caused pain for things like DirectFB. Also, we never freed the cmap
memory upon module unload..

Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Marco La Porta <marco-laporta@tiscali.it>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

lxfb: properly alloc cmap in all cases and don't leak the memory

We weren't properly allocating the cmap for depths greater than 8bpp,
which caused pain for things like DirectFB. Also, we never freed the cmap
memory upon module unload..

[dilinger@debian.org: dropped unnecessary code and clean up patch]
[dilinger@debian.org: add error checking and handling]
Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rtc: update maintainership of pxa rtc driver

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

migration: migrate_vmas should check "vma"

migrate_vmas() should check "vma" not "vma->vm_next" for for-loop condition.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Do not account for hugetlbfs quota at mmap() time if mapping [SHM|MAP]_NORESERVE

Commit 5a6fe125950676015f5108fb71b2a67441755003 brought hugetlbfs more
in line with the core VM by obeying VM_NORESERVE and not reserving
hugepages for both shared and private mappings when [SHM|MAP]_NORESERVE
are specified.  However, it is still taking filesystem quota
unconditionally.

At fault time, if there are no reserves and attempt is made to allocate
the page and account for filesystem quota.  If either fail, the fault
fails.  The impact is that quota is getting accounted for twice.  This
patch partially reverts 5a6fe125950676015f5108fb71b2a67441755003.  To
help prevent this mistake happening again, it improves the documentation
of hugetlb_reserve_pages()

Reported-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

iwlwifi: fix suspend/resume and its usage of pci saved state

Here we do two things:

First, revert "iwlwifi: save PCI state before suspend, restore after
resume". That misguided patch led to being unable to use iwlwifi
devices after resume.

Next, indicate to PCI driver that the saved PCI state is valid during suspend.

We restore PCI state and enable the device when network interface is created,
similarly PCI state is saved and the device is disabled when network interface
is removed. Thus, when .suspend is called the PCI state is saved and device
is disabled. This is the case even if an interface is never created as PCI
state is saved and device disabled during .probe.

PCI driver assumes PCI state is saved in .suspend. Saving the state at this
time will save state of disabled device and thus cause problems during
resume (resuming a disabled device). We thus indicate directly to PCI
driver that current PCI saved state is valid.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Alex Riesen <fork0@users.sf.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

zd1211rw: treat MAXIM_NEW_RF(0x08) as UW2453_RF(0x09) for TP-Link WN322/422G

Three people (Petr Mensik <pihhan@cipis.net>
["si" should be U+0161 U+00ED], Stephen Ho <stephenhoinhk@gmail.com>
on zd1211-devs and Ismael Ojeda Perez <iojedaperez@gmail.com>
on linux-wireless) reported success in getting TP-Link WN322G/WN422G
working by treating MAXIM_NEW_RF(0x08) as UW2453_RF(0x09) for rf
chip hardware initialization.

Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Tested-by: Petr Mensik <pihhan@cipis.net>
Tested-by: Stephen Ho <stephenhoinhk@gmail.com>
Tested-by: Ismael Ojeda Perez <iojedaperez@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

zd1211rw: adding 0ace:0xa211 as a ZD1211 device

Christoph Biedl <sourceforge.bnwi@manchmal.in-ulm.de> reported success
in the sourceforge zd1211 mailing list on this addition. This product ID
was supported by the vendor driver ZD1211LnxDrv 2.22.0.0 (and possibly
earlier) and it probably should have been added earlier.

Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Tested-by: Christoph Biedl <sourceforge.bnwi@manchmal.in-ulm.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

mac80211: restrict to AP in outgoing interface heuristic

We try to find the correct outgoing interface for injected frames
based on the TA, but since this is a hack for hostapd 11w, restrict
the heuristic to AP mode interfaces. At some point we'll add the
ability to give an interface index in radiotap or so and just
remove this heuristic again.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: stable@kernel.org [2.6.28.x]
Signed-off-by: John W. Linville <linville@tuxdriver.com>