linux-2.6
ip: convert to net_device_ops for ioctl
Stephen Hemminger [Thu, 20 Nov 2008 05:52:05 +0000 (21:52 -0800)] 
ip: convert to net_device_ops for ioctl

Convert to net_device_ops function table pointer for ioctl.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
macvlan: convert to net_device_ops
Stephen Hemminger [Thu, 20 Nov 2008 05:51:06 +0000 (21:51 -0800)] 
macvlan: convert to net_device_ops

Convert to net_device_ops function table.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
veth: convert to net_device_ops
Stephen Hemminger [Thu, 20 Nov 2008 05:50:10 +0000 (21:50 -0800)] 
veth: convert to net_device_ops

Convert to net_device_ops function table.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bridge: convert to net_device_ops
Stephen Hemminger [Thu, 20 Nov 2008 05:49:00 +0000 (21:49 -0800)] 
bridge: convert to net_device_ops

Convert to net_device_ops function table.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ifb: convert to net_device_ops
Stephen Hemminger [Thu, 20 Nov 2008 05:47:07 +0000 (21:47 -0800)] 
ifb: convert to net_device_ops

Convert to new network device ops interface.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netdev: convert loopback to net_device_ops
Stephen Hemminger [Thu, 20 Nov 2008 05:46:18 +0000 (21:46 -0800)] 
netdev: convert loopback to net_device_ops

First device to convert over is the loopback device.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netdev: expose ethernet address primitives
Stephen Hemminger [Thu, 20 Nov 2008 06:42:31 +0000 (22:42 -0800)] 
netdev: expose ethernet address primitives

When Ethernet devices are converted, the function pointer setup done
by eth_setup() needs to be done during initialization.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netdev: introduce dev_get_stats()
Stephen Hemminger [Thu, 20 Nov 2008 05:40:23 +0000 (21:40 -0800)] 
netdev: introduce dev_get_stats()

In order for the network device ops get_stats call to be immutable, the handling
of the default internal network device stats block has to be changed. Add a new
helper function which replaces the old use of internal_get_stats.

Note: the return type is changed to make it clear that the caller should
not modify the returned statistics.
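To make the fallback concrete, here is a minimal userspace sketch of a dev_get_stats()-style helper. The types `toy_netdev`, `toy_ops`, and `toy_stats` and the functions `toy_get_stats()` and `driver_get_stats()` are hypothetical simplifications, not the kernel's definitions: if the driver supplies a get_stats hook it is used, otherwise the device's internal stats block is returned, and the const return type marks the result as read-only for callers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_stats { unsigned long rx_packets, tx_packets; };

struct toy_netdev;

struct toy_ops {
    /* Optional driver hook; NULL means "use the default stats block". */
    struct toy_stats *(*get_stats)(struct toy_netdev *dev);
};

struct toy_netdev {
    const struct toy_ops *ops;   /* shared, immutable operations table */
    struct toy_stats stats;      /* default internal stats block */
};

/* Sketch of the helper: the const return type tells callers that they
 * must not modify the statistics they get back. */
static const struct toy_stats *toy_get_stats(struct toy_netdev *dev)
{
    if (dev->ops && dev->ops->get_stats)
        return dev->ops->get_stats(dev);
    return &dev->stats;
}

/* Hypothetical driver that keeps its own counters. */
static struct toy_stats driver_private_stats = { 10, 20 };

static struct toy_stats *driver_get_stats(struct toy_netdev *dev)
{
    (void)dev;
    return &driver_private_stats;
}

static const struct toy_ops driver_ops = { driver_get_stats };
```

Because the ops table itself never changes, the per-device default has to be handled in the helper rather than by patching a function pointer at runtime.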

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netdev: network device operations infrastructure
Stephen Hemminger [Thu, 20 Nov 2008 05:32:24 +0000 (21:32 -0800)] 
netdev: network device operations infrastructure

This patch changes the network device internal API to move administrative
operations out of the network device structure and into a separate structure.

This patch involves some hackery to maintain compatibility between the
new and old model, so that all 300+ drivers don't have to be changed at once.
For drivers that aren't converted yet, the operation function pointers
still reside in the net_device structure. For old protocols, the new
net_device_ops are copied out to the old net_device pointers.

After the transition is completed, the nag message can be changed to
a WARN_ON, and the compatibility code can be made configurable.

Some function pointers aren't moved:
* destructor can't be in net_device_ops because
  it may need to be referenced after the module is unloaded.
* neighbor setup is manipulated in a couple of places that need special
  consideration
* hard_start_xmit is in the fast path for transmit.
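The split described above can be sketched in userspace C. All names here (`toy_device_ops`, `toy_device`, `toy_compat_setup()`, `loop_ops`) are hypothetical and only loosely modelled on the patch: administrative operations live in one shared const table, the transmit hook stays in the per-device structure as the fast path, and a compatibility step copies table entries into the legacy per-device pointers for callers that have not been converted yet.

```c
#include <assert.h>
#include <stddef.h>

struct toy_device;

/* Shared, read-only table of administrative operations (one per driver). */
struct toy_device_ops {
    int (*open)(struct toy_device *dev);
    int (*stop)(struct toy_device *dev);
};

struct toy_device {
    const struct toy_device_ops *ops;  /* new model: pointer to shared table */
    /* Legacy per-device pointers still read by unconverted callers. */
    int (*open)(struct toy_device *dev);
    int (*stop)(struct toy_device *dev);
    /* Fast-path hook kept directly in the device structure. */
    int (*xmit)(struct toy_device *dev, const void *buf, size_t len);
    int up;
};

/* Compatibility step: mirror the table into the old pointers. */
static void toy_compat_setup(struct toy_device *dev)
{
    if (dev->ops) {
        dev->open = dev->ops->open;
        dev->stop = dev->ops->stop;
    }
}

/* Example converted driver. */
static int loop_open(struct toy_device *dev) { dev->up = 1; return 0; }
static int loop_stop(struct toy_device *dev) { dev->up = 0; return 0; }

static const struct toy_device_ops loop_ops = {
    .open = loop_open,
    .stop = loop_stop,
};
```

A converted driver only fills in `ops`; old code that still calls `dev->open` keeps working because of the copy made in `toy_compat_setup()`.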

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: af_unix should use KERN_INFO instead of KERN_DEBUG
Eric Dumazet [Wed, 19 Nov 2008 23:48:09 +0000 (15:48 -0800)] 
net: af_unix should use KERN_INFO instead of KERN_DEBUG

As spotted by Joe Perches, we should use KERN_INFO in unix_sock_destructor().

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/s390/ - csum_partial - remove unnecessary casts
Joe Perches [Wed, 19 Nov 2008 23:45:15 +0000 (15:45 -0800)] 
drivers/s390/ - csum_partial - remove unnecessary casts

    The first argument to csum_partial is const void *, so
    casts to char/u8 * are not necessary.
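Any object pointer converts implicitly to `const void *` in C, which is why the casts add nothing. As an illustration only, here is a hypothetical `toy_csum()` computing the RFC 1071 Internet checksum; it is not csum_partial() itself (which accumulates a partial sum), just a function with the same style of `const void *` buffer parameter:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: RFC 1071 Internet checksum over a byte buffer.
 * Taking const void * lets callers pass any object pointer uncast. */
static uint16_t toy_csum(const void *buf, size_t len)
{
    const uint8_t *p = buf;   /* implicit conversion, no cast needed */
    uint32_t sum = 0;

    while (len > 1) {         /* add 16-bit big-endian words */
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                  /* pad a trailing odd byte with zero */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)         /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A caller can write `toy_csum(&hdr, sizeof hdr)` for any header struct `hdr` without an intermediate `(u8 *)` cast.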

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
include/net net/ - csum_partial - remove unnecessary casts
Joe Perches [Wed, 19 Nov 2008 23:44:53 +0000 (15:44 -0800)] 
include/net net/ - csum_partial - remove unnecessary casts

The first argument to csum_partial is const void *, so
casts to char/u8 * are not necessary.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: inet_diag_handler structs can be const
Eric Dumazet [Wed, 19 Nov 2008 23:43:27 +0000 (15:43 -0800)] 
net: inet_diag_handler structs can be const

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: make /proc/net/protocols namespace aware
Eric Dumazet [Wed, 19 Nov 2008 23:14:01 +0000 (15:14 -0800)] 
net: make /proc/net/protocols namespace aware

Converting /proc/net/protocols to be namespace aware is quite easy
and permits us to use sock_prot_inuse_get().

This provides separate counters for each protocol. For example,
we can really count TCPv6 sockets and TCPv4 sockets, while previously
we had the same value, and this value was not namespace aware.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: af_packet should update its inuse counter
Eric Dumazet [Wed, 19 Nov 2008 22:25:35 +0000 (14:25 -0800)] 
net: af_packet should update its inuse counter

This patch is a preparation for the namespace conversion of /proc/net/protocols.

In order to have relevant information for PACKET protocols, we should use
sock_prot_inuse_add() to update a (percpu and pernamespace) counter of
inuse sockets.
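A minimal single-threaded sketch of a per-CPU, per-namespace inuse counter, assuming a fixed CPU count and hypothetical names (`toy_net`, `toy_inuse_add()`, `toy_inuse_get()`) standing in for sock_prot_inuse_add() and sock_prot_inuse_get(), whose real data structures differ: writers touch only their own CPU's slot, and a reader sums all slots for one namespace.

```c
#include <assert.h>

#define TOY_NR_CPUS   4   /* assumption: fixed number of CPUs */
#define TOY_NR_PROTOS 3   /* assumption: small protocol index space */

enum { TOY_PROTO_TCP, TOY_PROTO_PACKET, TOY_PROTO_UNIX };

/* One instance per network namespace; each CPU owns a row of counters,
 * so an update touches only the local CPU's row. */
struct toy_net {
    int inuse[TOY_NR_CPUS][TOY_NR_PROTOS];
};

/* Called on socket creation (+1) and destruction (-1) on 'cpu'. */
static void toy_inuse_add(struct toy_net *net, int cpu, int proto, int delta)
{
    net->inuse[cpu][proto] += delta;
}

/* Reader: sum every CPU's slot. A single slot may be negative when a
 * socket is released on a different CPU than the one it was created on. */
static int toy_inuse_get(const struct toy_net *net, int proto)
{
    int cpu, total = 0;

    for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        total += net->inuse[cpu][proto];
    return total;
}
```

Keeping one `toy_net` per namespace is what lets two namespaces report independent counts for the same protocol.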

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
David S. Miller [Wed, 19 Nov 2008 07:38:23 +0000 (23:38 -0800)] 
Merge branch 'master' of /linux/kernel/git/davem/net-2.6

Conflicts:

drivers/isdn/i4l/isdn_net.c
fs/cifs/connect.c

Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
Linus Torvalds [Tue, 18 Nov 2008 16:07:51 +0000 (08:07 -0800)] 
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: hold extra reference to bio in blk_rq_map_user_iov()
  relay: fix cpu offline problem
  Release old elevator on change elevator
  block: fix boot failure with CONFIG_DEBUG_BLOCK_EXT_DEVT=y and nash
  block/md: fix md autodetection
  block: make add_partition() return pointer to hd_struct
  block: fix add_partition() error path

suspend: use WARN not WARN_ON to print the message
Arjan van de Ven [Tue, 18 Nov 2008 14:56:51 +0000 (06:56 -0800)] 
suspend: use WARN not WARN_ON to print the message

By using WARN(), kerneloops.org can collect which component is causing
the delay and make statistics about that. suspend_test_finish() is
currently the number 2 item but unless we can collect who's causing
it we're not going to be able to fix the hot topic ones.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 18 Nov 2008 16:06:35 +0000 (08:06 -0800)] 
Merge branch 'tracing-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  kernel/profile.c: fix section mismatch warning
  function tracing: fix wrong pos computing when read buffer has been fulfilled
  tracing: fix mmiotrace resizing crash
  ring-buffer: no preempt for sched_clock()
  ring-buffer: buffer record on/off switch

Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 18 Nov 2008 16:06:21 +0000 (08:06 -0800)] 
Merge branch 'sched-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  cpuset: fix regression when failed to generate sched domains
  sched, signals: fix the racy usage of ->signal in account_group_xxx/run_posix_cpu_timers
  sched: fix kernel warning on /proc/sched_debug access
  sched: correct sched-rt-group.txt pathname in init/Kconfig

Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 18 Nov 2008 16:06:00 +0000 (08:06 -0800)] 
Merge branch 'core-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  swiotlb: use coherent_dma_mask in alloc_coherent
  MAINTAINERS: remove me as RAID maintainer

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney...
Linus Torvalds [Tue, 18 Nov 2008 16:05:43 +0000 (08:05 -0800)] 
Merge branch 'for-linus' of git://git./linux/kernel/git/cooloney/blackfin-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6:
  Blackfin arch: fix a broken define in dma-mapping
  Blackfin arch: fix bug - Turn on DEBUG_DOUBLEFAULT, booting SMP kernel crash
  Blackfin arch: fix bug - shared lib function in L2 failed be called
  Blackfin arch: fix incorrect limit check for bf54x check_gpio
  Blackfin arch: fix bug - Cpufreq assumes clocks in kHz and not Hz.
  Blackfin arch: dont warn when running a kernel on the oldest supported silicon
  Blackfin arch: fix bug - kernel build with write back policy fails to be booted up
  Blackfin arch: fix bug - dmacopy test case fail on all platform
  Blackfin arch: Fix typo when adding CONFIG_DEBUG_VERBOSE
  Blackfin arch: don't copy bss when copying L1
  Blackfin arch: fix bug - Fail to boot jffs2 kernel for BF561 with SMP patch
  Blackfin arch: handle case of d_path() returning error in decode_address()

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
Linus Torvalds [Tue, 18 Nov 2008 16:05:05 +0000 (08:05 -0800)] 
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Fix resume of GPIO unsol event for STAC/IDT
  ALSA: hda - Add quirks for HP Pavilion DV models
  ALSA: hda - Fix GPIO initialization in patch_stac92hd71bxx()
  ALSA: hda - Check model type instead of SSID in patch_92hd71bxx()
  ALSA: sound/pci/pcxhr/pcxhr.c: introduce missing kfree and pci_disable_device
  ALSA: hda: STAC_VREF_EVENT value change
  ALSA: hda - Missing NULL check in hda_beep.c
  ALSA: hda - Add digital beep playback switch for STAC/IDT codecs

block: hold extra reference to bio in blk_rq_map_user_iov()
Jens Axboe [Tue, 18 Nov 2008 14:07:05 +0000 (15:07 +0100)] 
block: hold extra reference to bio in blk_rq_map_user_iov()

If the size passed in is OK but we end up mapping too many segments,
we call the unmap path directly, as if from IO completion. But IO
completion holds an extra reference to the bio, so this error case
OOPSes when it attempts to free an already freed bio.

Fix it by getting an extra reference to the bio before calling the
unmap failure case.
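The failure is a reference-count imbalance: the unmap path is written for the completion case and drops one more reference than the mapping itself holds, so the error path must take that reference first. A minimal sketch, assuming a hypothetical `toy_bio` with a plain integer count in place of struct bio with bio_get()/bio_put():

```c
#include <assert.h>

/* Hypothetical stand-in for a reference-counted bio. */
struct toy_bio {
    int refcnt;
    int freed;   /* set instead of freeing, so the outcome is observable */
};

static void toy_bio_get(struct toy_bio *bio) { bio->refcnt++; }

static void toy_bio_put(struct toy_bio *bio)
{
    assert(bio->refcnt > 0);   /* a put on a dead bio is the OOPS */
    if (--bio->refcnt == 0)
        bio->freed = 1;
}

/* Unmap path: expects IO completion to hold one extra reference. */
static void toy_unmap(struct toy_bio *bio)
{
    toy_bio_put(bio);   /* the reference completion would hold */
    toy_bio_put(bio);   /* the mapping reference */
}

/* Error path after mapping too many segments, with the fix applied. */
static void toy_map_error_path(struct toy_bio *bio)
{
    toy_bio_get(bio);   /* the fix: take the reference completion would hold */
    toy_unmap(bio);
}
```

Without the `toy_bio_get()` call, the second put in `toy_unmap()` would hit a bio whose count is already zero.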

Reported-by: Petr Vandrovec <vandrove@vc.cvut.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
relay: fix cpu offline problem
Lai Jiangshan [Fri, 14 Nov 2008 09:44:59 +0000 (10:44 +0100)] 
relay: fix cpu offline problem

relay_open() closes the allocated buffers when it fails,
but if a CPU has been offlined, some buffers will not be closed.
This patch fixes that.

It also cleans up relay_reset().

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Release old elevator on change elevator
Zhaolei [Fri, 14 Nov 2008 08:44:33 +0000 (09:44 +0100)] 
Release old elevator on change elevator

We should release the old elevator when changing to a new one.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
block: fix boot failure with CONFIG_DEBUG_BLOCK_EXT_DEVT=y and nash
Zhang, Yanmin [Fri, 14 Nov 2008 07:26:30 +0000 (08:26 +0100)] 
block: fix boot failure with CONFIG_DEBUG_BLOCK_EXT_DEVT=y and nash

We ran into a system boot failure with kernel 2.6.28-rc. We found it on a
couple of machines, including a T61 notebook, a Nehalem machine, and an
HPC NX6325 notebook. All the machines use Fedora Core 8 or Fedora Core 9.
With kernels prior to 2.6.28-rc, the system boots without failure.

I debugged it and located the root cause. Please see
http://bugzilla.kernel.org/show_bug.cgi?id=11899
https://bugzilla.redhat.com/show_bug.cgi?id=471517

In fact, there are 2 bugs.

1) root=/dev/sda1: system boot randomly fails, roughly once in every 5
boots. nash has a bug: some of its functions misuse the return value 0.
Sometimes 0 means timeout and no uevent available; sometimes 0 means
nash got a uevent, but the uevent isn't block-related (for example,
usb). If by coincidence the kernel tells nash that uevents are
available but also sets the timeout, nash might stop collecting the
other uevents in the queue if the current uevent isn't block-related.
I worked out a patch for nash to fix it.
http://bugzilla.kernel.org/attachment.cgi?id=18858

2) root=LABEL=/: the system never boots. initrd init reports that
switchroot fails. Here is an execution branch of nash when booting:
    (1) nash reads /sys/block/sda/dev; assume the major is 8 (on my desktop);
    (2) nash queries /proc/devices with the major number and finds the line
"8 sd";
    (3) nash uses 'sd' to search its own probe table to find the device (DISK)
type for the device and adds it to its own list;
    (4) later on, it probes all devices in its list to get filesystem
labels; scsi always registers "8 sd".

When the major is 259, nash fails to find the device (DISK) type. I enabled
CONFIG_DEBUG_BLOCK_EXT_DEVT=y when compiling the kernel, so 259 is picked
for the device /dev/sda1, which causes nash to fail to find the device (DISK)
type.

To fix issue 2), I created a patch for nash and another patch for the
kernel.

http://bugzilla.kernel.org/attachment.cgi?id=18859
http://bugzilla.kernel.org/attachment.cgi?id=18837

Below is the patch for kernel 2.6.28-rc4. It registers blkext, a new
block device, in /proc/devices.

With 2 patches to nash and 1 patch to the kernel, I booted my machines
dozens of times without failure.
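Steps (2) and (3) above amount to mapping a major number to a driver name through the "Block devices:" section of /proc/devices and then looking that name up in a probe table. A minimal sketch, assuming an in-memory copy of that section and a hypothetical probe table that knows "sd" and "blkext" (nash's actual table and code are not shown here): without the kernel patch, major 259 has no entry at all, so the lookup fails before any probe table is consulted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find the driver name registered for 'major' in text formatted like the
 * block section of /proc/devices ("%3d %s" per line). Returns 0 on success. */
static int toy_major_to_name(const char *devices, int major,
                             char *name, size_t size)
{
    const char *line = devices;

    while (line && *line) {
        int m;
        char buf[32];

        /* Header lines such as "Block devices:" fail the %d conversion. */
        if (sscanf(line, "%d %31s", &m, buf) == 2 && m == major) {
            snprintf(name, size, "%s", buf);
            return 0;
        }
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* Hypothetical probe table: driver names treated as DISK devices. */
static const char *const toy_disk_drivers[] = { "sd", "blkext" };

static int toy_is_disk(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(toy_disk_drivers) / sizeof(toy_disk_drivers[0]); i++)
        if (strcmp(name, toy_disk_drivers[i]) == 0)
            return 1;
    return 0;
}
```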

Signed-off-by: Zhang Yanmin <yanmin.zhang@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
block/md: fix md autodetection
Tejun Heo [Mon, 10 Nov 2008 06:30:47 +0000 (15:30 +0900)] 
block/md: fix md autodetection

Block ext devt conversion missed md_autodetect_dev() call in
rescan_partitions() leaving md autodetect unable to see partitions.
Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
block: make add_partition() return pointer to hd_struct
Tejun Heo [Mon, 10 Nov 2008 06:29:58 +0000 (15:29 +0900)] 
block: make add_partition() return pointer to hd_struct

Make add_partition() return a pointer to the new hd_struct on success
and an ERR_PTR() value on failure.  This change will be used to fix the md
autodetection bug.
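ERR_PTR() encodes a negative errno in the pointer value itself, so one return value carries either the new object or the failure reason. A minimal userspace sketch, assuming re-implemented helpers (`toy_err_ptr()`, `toy_is_err()`, `toy_ptr_err()` mirror ERR_PTR(), IS_ERR(), and PTR_ERR(); the 4095 bound follows the kernel's MAX_ERRNO) and a hypothetical `toy_add_partition()`:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_ERRNO 4095

static inline void *toy_err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long toy_ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }

/* The top TOY_MAX_ERRNO addresses are never valid objects, so a pointer
 * in that range must be an encoded error. */
static inline int toy_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_part { int partno; };

/* Hypothetical: returns the new partition or an encoded negative errno. */
static struct toy_part *toy_add_partition(int partno)
{
    struct toy_part *p;

    if (partno <= 0)
        return toy_err_ptr(-EINVAL);
    p = malloc(sizeof(*p));
    if (!p)
        return toy_err_ptr(-ENOMEM);
    p->partno = partno;
    return p;
}
```

Callers test with `toy_is_err()` before dereferencing, and can still recover the exact errno with `toy_ptr_err()`.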

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
block: fix add_partition() error path
Tejun Heo [Mon, 10 Nov 2008 06:28:59 +0000 (15:28 +0900)] 
block: fix add_partition() error path

The partition stats structure was not freed on the devt allocation
failure path.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Merge branches 'topic/fix/hda' and 'topic/fix/misc' into for-linus
Takashi Iwai [Tue, 18 Nov 2008 12:49:39 +0000 (13:49 +0100)] 
Merge branches 'topic/fix/hda' and 'topic/fix/misc' into for-linus

ALSA: hda - Fix resume of GPIO unsol event for STAC/IDT
Takashi Iwai [Tue, 18 Nov 2008 09:55:36 +0000 (10:55 +0100)] 
ALSA: hda - Fix resume of GPIO unsol event for STAC/IDT

Use cached write for setting the GPIO unsolicited event mask to be
restored properly at resume.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
ALSA: hda - Add quirks for HP Pavilion DV models
Takashi Iwai [Tue, 18 Nov 2008 09:48:41 +0000 (10:48 +0100)] 
ALSA: hda - Add quirks for HP Pavilion DV models

Added the quirk entries for HP Pavilion DV5 and DV7 with model=hp-m4.

Reference: Novell bnc#445321, bnc#445161
https://bugzilla.novell.com/show_bug.cgi?id=445321
https://bugzilla.novell.com/show_bug.cgi?id=445161

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Blackfin arch: fix a broken define in dma-mapping
Mike Frysinger [Tue, 18 Nov 2008 09:48:22 +0000 (17:48 +0800)] 
Blackfin arch: fix a broken define in dma-mapping

dma_mapping_error is an actual function, so fix the broken define with a
real inline stub.

Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Blackfin arch: fix bug - Turn on DEBUG_DOUBLEFAULT, booting SMP kernel crash
Graf Yang [Tue, 18 Nov 2008 09:48:22 +0000 (17:48 +0800)] 
Blackfin arch: fix bug - Turn on DEBUG_DOUBLEFAULT, booting SMP kernel crash

Signed-off-by: Graf Yang <graf.yang@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
ALSA: hda - Fix GPIO initialization in patch_stac92hd71bxx()
Takashi Iwai [Tue, 18 Nov 2008 09:45:15 +0000 (10:45 +0100)] 
ALSA: hda - Fix GPIO initialization in patch_stac92hd71bxx()

Fixed the initialization of the GPIO mask and related fields in
patch_stac92hd71bxx() so that the gpio_mask for the HP_M4 model is set properly.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
kernel/profile.c: fix section mismatch warning
Rakib Mullick [Tue, 18 Nov 2008 04:15:24 +0000 (10:15 +0600)] 
kernel/profile.c: fix section mismatch warning

Impact: fix section mismatch warning in kernel/profile.c

Here, the profile_nop function is called from a non-init function,
create_hash_tables(void), which generates a section mismatch warning.
Previously, create_hash_tables(void) was an init function, so removing
__init from create_hash_tables(void) requires profile_nop to be
non-init as well.

This patch makes profile_nop function inline and fixes the
following warning:

 WARNING: vmlinux.o(.text+0x6ebb6): Section mismatch in reference from
 the function create_hash_tables() to the function
 .init.text:profile_nop()
 The function create_hash_tables() references
 the function __init profile_nop().
 This is often because create_hash_tables lacks a __init
 annotation or the annotation of profile_nop is wrong.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
cpuset: fix regression when failed to generate sched domains
Li Zefan [Tue, 18 Nov 2008 06:02:03 +0000 (14:02 +0800)] 
cpuset: fix regression when failed to generate sched domains

Impact: properly rebuild sched-domains on kmalloc() failure

When cpuset failed to generate sched domains due to a kmalloc()
failure, the scheduler should fall back to the single partition
'fallback_doms' and rebuild sched domains, but currently it only
destroys sched domains without rebuilding them.

The regression was introduced by:

| commit dfb512ec4834116124da61d6c1ee10fd0aa32bd6
| Author: Max Krasnyansky <maxk@qualcomm.com>
| Date:   Fri Aug 29 13:11:41 2008 -0700
|
|    sched: arch_reinit_sched_domains() must destroy domains to force rebuild

After the above commit, partition_sched_domains(0, NULL, NULL) will
only destroy sched domains and partition_sched_domains(1, NULL, NULL)
will create the default sched domain.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
Linus Torvalds [Tue, 18 Nov 2008 04:53:31 +0000 (20:53 -0800)] 
Merge git://git./linux/kernel/git/sfrench/cifs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  prevent cifs_writepages() from skipping unwritten pages
  Fixed parsing of mount options when doing DFS submount
  [CIFS] Fix check for tcon seal setting and fix oops on failed mount from earlier patch
  [CIFS] Fix build break
  cifs: reinstate sharing of tree connections
  [CIFS] minor cleanup to cifs_mount
  cifs: reinstate sharing of SMB sessions sans races
  cifs: disable sharing session and tcon and add new TCP sharing code
  [CIFS] clean up server protocol handling
  [CIFS] remove unused list, add new cifs sock list to prepare for mount/umount fix
  [CIFS] Fix cifs reconnection flags
  [CIFS] Can't rely on iov length and base when kernel_recvmsg returns error

prevent cifs_writepages() from skipping unwritten pages
Dave Kleikamp [Tue, 18 Nov 2008 03:49:05 +0000 (03:49 +0000)] 
prevent cifs_writepages() from skipping unwritten pages

Fixes a data corruption under heavy stress in which pages could be left
dirty after all open instances of an inode have been closed.

In order to write contiguous pages whenever possible, cifs_writepages()
asks pagevec_lookup_tag() for more pages than it may write at one time.
Normally, it then resets index just past the last page written before calling
pagevec_lookup_tag() again.

If cifs_writepages() can't write the first page returned, it wasn't resetting
index, and the next call to pagevec_lookup_tag() resulted in skipping all of
the pages it previously returned, even though cifs_writepages() did nothing
with them.  This can result in data loss when the file descriptor is about
to be closed.

This patch ensures that index gets set back to the next returned page so
that none get skipped.
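A minimal sketch of the index handling, assuming a hypothetical array of dirty flags in place of the page cache, a `toy_lookup_dirty()` stand-in for pagevec_lookup_tag() that advances the index past every page it returns, and a simplified rule that only a contiguous, non-busy run starting at the first returned page is written. After each batch, the index is reset to just past the last page written, or to the page after the first returned one when nothing could be written:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BATCH 4   /* assumption: pages returned per lookup */

/* Stand-in for the tagged lookup: collect up to TOY_BATCH dirty page
 * indices starting at *index and advance *index past the last one. */
static size_t toy_lookup_dirty(const int *dirty, size_t npages,
                               size_t *index, size_t *out)
{
    size_t n = 0, i;

    for (i = *index; i < npages && n < TOY_BATCH; i++)
        if (dirty[i])
            out[n++] = i;
    if (n)
        *index = out[n - 1] + 1;
    return n;
}

/* Write back dirty pages; 'busy' pages cannot be written on this pass. */
static void toy_writepages(int *dirty, const int *busy, size_t npages)
{
    size_t index = 0, pages[TOY_BATCH], n;

    while ((n = toy_lookup_dirty(dirty, npages, &index, pages)) > 0) {
        size_t i;

        for (i = 0; i < n; i++) {
            if (busy[pages[i]] || (i > 0 && pages[i] != pages[i - 1] + 1))
                break;
            dirty[pages[i]] = 0;          /* "write" the page */
        }
        /* The fix: resume right after the last page written, or right
         * after the first returned page if none was written, instead of
         * keeping the index past the whole batch. */
        index = (i > 0 ? pages[i - 1] : pages[0]) + 1;
    }
}
```

If the last statement in the loop is removed, a busy first page makes the sketch skip the rest of its batch, which is the data-loss pattern described above.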

Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: Shirish S Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Fixed parsing of mount options when doing DFS submount
Igor Mammedov [Thu, 23 Oct 2008 09:58:42 +0000 (13:58 +0400)] 
Fixed parsing of mount options when doing DFS submount

Since these hit the same routines, and are relatively small, it is easier to review
them as one patch.

Fixed incorrect handling of the last option in some cases
Fixed prefixpath handling: convert path_consumed into a host-dependent string length (in bytes)
Use a non-default separator if one is provided in the original mount options

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Igor Mammedov <niallain@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Remove -mno-spe flags as they dont belong
Kumar Gala [Sat, 15 Nov 2008 18:02:34 +0000 (12:02 -0600)] 
Remove -mno-spe flags as they dont belong

For some unknown reason, Steven Rostedt added disabling of SPE
instruction generation for e500-based PPC cores in commit
6ec562328fda585be2d7f472cfac99d3b44d362a.

We are removing it because:

1. It generates e500 kernels that don't work
2. it's not the correct set of flags to do this
3. we handle this in the arch/powerpc/Makefile already
4. even after talking to Steven, it's unknown why he did this

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Tested-and-Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'for-linus' of git://git.o-hand.com/linux-mfd
Linus Torvalds [Mon, 17 Nov 2008 18:45:39 +0000 (10:45 -0800)] 
Merge branch 'for-linus' of git://git.o-hand.com/linux-mfd

* 'for-linus' of git://git.o-hand.com/linux-mfd:
  mfd: Correct WM8350 I2C return code usage
  mfd: fix event masking for da9030

[CIFS] Fix check for tcon seal setting and fix oops on failed mount from earlier...
Steve French [Mon, 17 Nov 2008 16:03:00 +0000 (16:03 +0000)] 
[CIFS] Fix check for tcon seal setting and fix oops on failed mount from earlier patch

set tcon->ses earlier

If the initial tree connect fails, we'll end up calling cifs_put_smb_ses
with a NULL pointer. Fix it by setting the tcon->ses earlier.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
Linus Torvalds [Mon, 17 Nov 2008 15:54:47 +0000 (07:54 -0800)] 
Merge git://git./linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  rtc: rtc-sun4v fixes, revised
  sparc: Fix tty compile warnings.
  sparc: struct device - replace bus_id with dev_name(), dev_set_name()

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Mon, 17 Nov 2008 15:53:25 +0000 (07:53 -0800)] 
Merge git://git./linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
  rtnetlink: propagate error from dev_change_flags in do_setlink()
  isdn: remove extra byteswap in isdn_net_ciscohdlck_slarp_send_reply
  Phonet: refuse to send bigger than MTU packets
  e1000e: fix IPMI traffic
  e1000e: fix warn_on reload after phy_id error
  phy: fix phy address bug
  e100: fix dma error in direction for mapping
  igb: use dev_printk instead of printk
  qla3xxx: Cleanup: Fix link print statements.
  igb: Use device_set_wakeup_enable
  e1000: Use device_set_wakeup_enable
  e1000e: Use device_set_wakeup_enable
  via-velocity: enable perfect filtering for multicast packets
  phy: Add support for Marvell 88E1118 PHY
  mlx4_en: Pause parameters per port
  phylib: fix premature freeing of struct mii_bus
  atl1: Do not enumerate options unsupported by chip
  atl1e: fix broken multicast by removing unnecessary crc inversion
  gianfar: Fix DMA unmap invocations
  net/ucc_geth: Fix oops in uec_get_ethtool_stats()
  ...

sched, signals: fix the racy usage of ->signal in account_group_xxx/run_posix_cpu_timers
Oleg Nesterov [Mon, 17 Nov 2008 14:39:47 +0000 (15:39 +0100)] 
sched, signals: fix the racy usage of ->signal in account_group_xxx/run_posix_cpu_timers

Impact: fix potential NULL dereference

Contrary to the ad474caca3e2a0550b7ce0706527ad5ab389a4d4 changelog, other
acct_group_xxx() helpers can be called after exit_notify() by the timer tick.
Thanks to Roland for pointing this out. Somehow I missed this simple fact
when I read the original patch, and I am afraid I confused Frank during
the discussion. Sorry.

Fortunately, these helpers work with current, so we can check ->exit_state
to ensure that ->signal can't go away under us.

Also, add a comment and a compiler barrier to account_group_exec_runtime(),
to make sure we load ->signal only once.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
net: sctp should update its inuse counter
Eric Dumazet [Mon, 17 Nov 2008 10:41:00 +0000 (02:41 -0800)] 
net: sctp should update its inuse counter

This patch is a preparation for the namespace conversion of /proc/net/protocols.

In order to have relevant information for SCTP protocols, we should use
sock_prot_inuse_add() to update a (percpu and pernamespace) counter of
inuse sockets.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: af_unix should update its inuse counter
Eric Dumazet [Mon, 17 Nov 2008 10:38:49 +0000 (02:38 -0800)] 
net: af_unix should update its inuse counter

This patch is a preparation for the namespace conversion of /proc/net/protocols.

In order to have relevant information for UNIX protocol, we should use
sock_prot_inuse_add() to update a (percpu and pernamespace) counter of
inuse sockets.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
swiotlb: use coherent_dma_mask in alloc_coherent
FUJITA Tomonori [Mon, 17 Nov 2008 07:24:34 +0000 (16:24 +0900)] 
swiotlb: use coherent_dma_mask in alloc_coherent

Impact: fix DMA buffer allocation coherency bug in certain configs

This patch fixes swiotlb to use dev->coherent_dma_mask in
swiotlb_alloc_coherent().

coherent_dma_mask is a subset of dma_mask (equal to it most of
the time), enumerating the address range that a given device
is able to DMA to/from in a cache-coherent way.

Currently, however, swiotlb implicitly uses dev->dma_mask in
alloc_coherent() via address_needs_mapping(), whereas alloc_coherent is
supposed to use coherent_dma_mask.

This bug could break drivers that use a smaller coherent_dma_mask than
dma_mask (though the current code works for the majority, which use the
same mask for coherent_dma_mask and dma_mask).
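The difference is which mask the "is this buffer reachable?" test uses. A minimal sketch, assuming a hypothetical `toy_dev` and helpers `toy_addr_ok()` and `toy_coherent_addr_ok()` loosely modelled on the address_needs_mapping() check: a buffer is usable only if its last byte lies within the mask, and the coherent path tests coherent_dma_mask rather than dma_mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device description with the two masks discussed above. */
struct toy_dev {
    uint64_t dma_mask;            /* range for streaming DMA */
    uint64_t coherent_dma_mask;   /* range for coherent allocations */
};

/* True if every byte of [addr, addr + size) is reachable under 'mask'. */
static int toy_addr_ok(uint64_t addr, size_t size, uint64_t mask)
{
    return size > 0 && addr + size - 1 <= mask;
}

/* Check used by a coherent allocator: it must test coherent_dma_mask. */
static int toy_coherent_addr_ok(const struct toy_dev *dev,
                                uint64_t addr, size_t size)
{
    return toy_addr_ok(addr, size, dev->coherent_dma_mask);
}
```

For a device with a 64-bit `dma_mask` but a 32-bit `coherent_dma_mask`, a buffer above 4 GiB passes the streaming check yet must be rejected for coherent use.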

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
net: af_unix can make unix_nr_socks visbile in /proc
Eric Dumazet [Mon, 17 Nov 2008 08:00:30 +0000 (00:00 -0800)] 
net: af_unix can make unix_nr_socks visbile in /proc

Currently, /proc/net/protocols displays socket counts only for the TCP/TCPv6
protocols.

We can provide unix_nr_socks for free here, since this counter is
already maintained in af_unix.

Before patch :

# grep UNIX /proc/net/protocols
UNIX       428     -1      -1   NI       0   yes  kernel

After patch :

# grep UNIX /proc/net/protocols
UNIX       428     98      -1   NI       0   yes  kernel

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnetlink: propagate error from dev_change_flags in do_setlink()
Johannes Berg [Mon, 17 Nov 2008 07:20:31 +0000 (23:20 -0800)] 
rtnetlink: propagate error from dev_change_flags in do_setlink()

Unlike ifconfig, iproute doesn't report an error when setting
an interface up fails:

(example: put a mac80211 wireless interface into repeater mode
with iwconfig but do not set a peer MAC address; it should fail with
-ENOLINK)

without patch:
# ip link set wlan0 up ; echo $?
0
#

with patch:
# ip link set wlan0 up ; echo $?
RTNETLINK answers: Link has been severed
2
#

Propagate the return value from dev_change_flags() to fix this.
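The fix follows the usual pattern of returning a callee's error instead of discarding it. A minimal sketch, assuming hypothetical `toy_change_flags()` and `toy_setlink()` in place of dev_change_flags() and do_setlink(), where bringing the link up without a peer fails with -ENOLINK as in the example above:

```c
#include <assert.h>
#include <errno.h>

#define TOY_IFF_UP 0x1

struct toy_link { int has_peer; unsigned flags; };

/* Hypothetical: refuses to bring the link up without a peer address. */
static int toy_change_flags(struct toy_link *link, unsigned flags)
{
    if ((flags & TOY_IFF_UP) && !link->has_peer)
        return -ENOLINK;
    link->flags = flags;
    return 0;
}

/* Propagate the error to the caller (and so to userspace) instead of
 * ignoring it and reporting success. */
static int toy_setlink(struct toy_link *link, unsigned flags)
{
    int err = toy_change_flags(link, flags);

    if (err < 0)
        return err;
    /* ... further attribute processing would go here ... */
    return 0;
}
```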

Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
netdevice chelsio: Convert directly reference of netdev->priv
Wang Chen [Mon, 17 Nov 2008 07:06:39 +0000 (23:06 -0800)] 
netdevice chelsio: Convert directly reference of netdev->priv

Several netdevs share one adapter here.
We use the ml_priv field of each netdev to point to the first netdev's priv.
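A minimal sketch of this arrangement, assuming hypothetical `toy_adapter` and `toy_netdev` types and helpers `toy_setup_ports()` and `toy_adapter_of()`: every port's `ml_priv` points at the private area of the first device, so all ports reach the same adapter state through one accessor instead of dereferencing another device's `priv` directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared state for a multi-port adapter. */
struct toy_adapter { int nports; };

struct toy_netdev {
    void *priv;      /* private area allocated with this device */
    void *ml_priv;   /* points at the shared adapter for every port */
};

/* Give each port's ml_priv the first device's private area (the adapter). */
static void toy_setup_ports(struct toy_netdev *devs, int n)
{
    int i;

    for (i = 0; i < n; i++)
        devs[i].ml_priv = devs[0].priv;
}

/* Accessor used by driver code instead of reaching into ->priv. */
static struct toy_adapter *toy_adapter_of(const struct toy_netdev *dev)
{
    return dev->ml_priv;
}
```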

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
isdn: remove extra byteswap in isdn_net_ciscohdlck_slarp_send_reply
Harvey Harrison [Mon, 17 Nov 2008 07:03:45 +0000 (23:03 -0800)] 
isdn: remove extra byteswap in isdn_net_ciscohdlck_slarp_send_reply

commit a144ea4b7a13087081ab5402fa9ad0bcfd249e67 [IPV4]: annotate struct in_ifaddr

Missed this extra byteswap as the isdn inlines hide the htonl inside
put_u32 which causes an extra byteswap on little-endian arches.
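A small sketch of how a helper that hides htonl() can swap a value twice. `put_u32()` here is a hypothetical model of the ISDN inline (its real signature is not shown in this log), and calling `ntohl()` first is only one possible way to avoid the second swap.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the ISDN helper: it takes a host-order value
 * and performs the htonl() itself while storing it. */
static unsigned char *put_u32(unsigned char *p, uint32_t host_val)
{
    uint32_t be = htonl(host_val);

    memcpy(p, &be, sizeof(be));
    return p + sizeof(be);
}

/* Buggy: be_addr is already in network order (like the __be32 fields
 * of struct in_ifaddr), so put_u32() swaps it a second time and the
 * bytes come out reversed on little-endian machines. */
static void store_addr_buggy(unsigned char *p, uint32_t be_addr)
{
    put_u32(p, be_addr);
}

/* One possible fix: convert back to host order first, so that the
 * helper's internal htonl() is the only swap. */
static void store_addr_fixed(unsigned char *p, uint32_t be_addr)
{
    put_u32(p, ntohl(be_addr));
}
```

On big-endian machines both variants produce the same bytes, which is why this kind of bug only shows up on little-endian arches.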

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoematch: simpler tcf_em_unregister()
Alexey Dobriyan [Mon, 17 Nov 2008 07:01:49 +0000 (23:01 -0800)] 
ematch: simpler tcf_em_unregister()

Simply delete ops from list and let list debugging do the job.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agonet: Cleanup of af_unix
Eric Dumazet [Mon, 17 Nov 2008 06:58:44 +0000 (22:58 -0800)] 
net: Cleanup of af_unix

This is a pure cleanup of net/unix/af_unix.c to meet current code
style standards.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agodccp: Tidy up setsockopt calls
Gerrit Renker [Mon, 17 Nov 2008 06:56:55 +0000 (22:56 -0800)] 
dccp: Tidy up setsockopt calls

This splits the setsockopt calls into two groups, depending on whether an
integer argument (val) is required and whether routines being called do
their own locking.

Some options (such as setting the CCID) use u8 rather than int, so the
sizeof(int) length check cannot be used for them.

The second switch-case statement now only has those statements which need
locking and which make use of `val'.
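The two-group structure can be sketched in userspace C as follows. The option names, `struct demo_sock` and its `locked` flag (standing in for lock_sock()/release_sock()) are hypothetical; only the shape follows the description: options with a u8 or self-locking handler come first, then the sizeof(int) check, the lock and the second switch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option numbers; the real DCCP_SOCKOPT_* values differ. */
enum { OPT_SET_CCID = 1, OPT_SERVICE = 2, OPT_SEND_CSCOV = 3, OPT_RECV_CSCOV = 4 };

struct demo_sock {
    int locked;              /* stands in for lock_sock()/release_sock() */
    unsigned char ccid;
    int send_cscov, recv_cscov;
};

static int demo_setsockopt(struct demo_sock *sk, int opt,
                           const void *optval, size_t optlen)
{
    int val, err = 0;

    /* Group 1: options with a non-int argument (e.g. a u8 CCID) or
     * whose handlers do their own locking. */
    switch (opt) {
    case OPT_SET_CCID:
        if (optlen < 1)
            return -EINVAL;
        sk->ccid = *(const unsigned char *)optval;
        return 0;
    case OPT_SERVICE:
        return 0;            /* would parse and lock on its own */
    }

    /* Group 2: everything left needs an int `val' and the lock. */
    if (optlen < sizeof(int))
        return -EINVAL;
    memcpy(&val, optval, sizeof(val));

    sk->locked = 1;
    switch (opt) {
    case OPT_SEND_CSCOV:
        sk->send_cscov = val;
        break;
    case OPT_RECV_CSCOV:
        sk->recv_cscov = val;
        break;
    default:
        err = -ENOPROTOOPT;
    }
    sk->locked = 0;
    return err;
}
```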

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Eugene Teo <eugeneteo@kernel.sg>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agodccp: Deprecate Ack Ratio sysctl
Gerrit Renker [Mon, 17 Nov 2008 06:55:08 +0000 (22:55 -0800)] 
dccp: Deprecate Ack Ratio sysctl

This patch deprecates the Ack Ratio sysctl, since
 * Ack Ratio is entirely ignored by CCID-3 and CCID-4,
 * Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1);
 * even if it worked in CCID-2, there is no point for a user to change it:
   - Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2),
   - if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts
     (since the sender waits for Acks which will never arrive in this window),
   - cwnd is not a user-configurable value.

The only reasonable place for Ack Ratio is to print it for debugging. It is
planned to do this later on, as part of e.g. dccp_probe.

With this patch Ack Ratio is now under full control of feature negotiation:
 * Ack Ratio is resolved as a dependency of the selected CCID;
 * if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to
   the default of 2, following RFC 4340, 11.3 - "New connections start with Ack
   Ratio 2 for both endpoints";
 * what happens then is part of another patch set, since it concerns the
   dynamic update of Ack Ratio while the connection is in full flight.
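A sketch of the two rules above: Ack Ratio only matters for CCID-2 and starts at 2, and it should not exceed cwnd. The function names are hypothetical, and the clamp simply follows the wording "constrained by cwnd"; RFC 4341, 6.1.2 gives the precise bound.

```c
/* Hypothetical CCID identifiers for this sketch. */
enum { CCID2 = 2, CCID3 = 3, CCID4 = 4 };

/* Initial Ack Ratio as a dependency of the chosen CCID: 2 for CCID-2
 * (RFC 4340, 11.3), 0 meaning "not used" for CCIDs that ignore it. */
static unsigned int initial_ack_ratio(int ccid)
{
    return ccid == CCID2 ? 2u : 0u;
}

/* Keep Ack Ratio within cwnd, since a larger value would make the
 * sender wait for Acks that cannot arrive in this window. Ack Ratio
 * is never allowed to drop below 1. */
static unsigned int clamp_ack_ratio(unsigned int ratio, unsigned int cwnd)
{
    if (ratio > cwnd)
        ratio = cwnd;
    return ratio == 0 ? 1u : ratio;
}
```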

Thanks to Tomasz Grobelny for discussion leading up to this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agodccp: Feature negotiation for minimum-checksum-coverage
Gerrit Renker [Mon, 17 Nov 2008 06:53:48 +0000 (22:53 -0800)] 
dccp: Feature negotiation for minimum-checksum-coverage

This provides feature negotiation for server minimum checksum coverage
which so far has been missing.

Since sender/receiver coverage values range only from 0...15, their
type has also been reduced in size from u16 to u4.

Feature-negotiation options are now generated for both sender and receiver
coverage, i.e. when the peer has `forgotten' to enable partial coverage
then feature negotiation will automatically enable (negotiate) the partial
coverage value for this connection.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agodccp: Deprecate old setsockopt framework
Gerrit Renker [Mon, 17 Nov 2008 06:51:23 +0000 (22:51 -0800)] 
dccp: Deprecate old setsockopt framework

The previous setsockopt interface, which passed socket options via struct
dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to
ugly code since the old approach did not distinguish between NN and SP values.

This patch removes the old setsockopt interface and replaces it with two new
functions to register NN/SP values for feature negotiation.
These are essentially wrappers around the internal __feat_register functions,
with checking added to avoid

 * wrong usage (type);
 * changing values while the connection is in progress.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agodccp: Mechanism to resolve CCID dependencies
Gerrit Renker [Mon, 17 Nov 2008 06:49:52 +0000 (22:49 -0800)] 
dccp: Mechanism to resolve CCID dependencies

This adds a hook to resolve features whose value depends on the choice of
CCID. It is done at the server since it can only be done after the CCID
values have been negotiated; i.e. the client will add its CCID preference
list on the Change options sent in the Request, which will be reconciled
with the local preference list of the server.
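A minimal sketch of the reconciliation followed by a dependency hook, assuming the RFC 4340 server-priority rule: the chosen value is the first entry in the server's list that also appears in the client's list. `struct demo_feats` and both functions are hypothetical; Ack Ratio serves as the dependent feature because the Ack Ratio commit above describes it being resolved this way.

```c
#include <stddef.h>

/* Server-priority reconciliation: first CCID in the server's list
 * that the client also listed, or -1 if there is no overlap. */
static int reconcile_ccid(const unsigned char *srv, size_t nsrv,
                          const unsigned char *cli, size_t ncli)
{
    for (size_t i = 0; i < nsrv; i++)
        for (size_t j = 0; j < ncli; j++)
            if (srv[i] == cli[j])
                return srv[i];
    return -1;
}

struct demo_feats {
    int ccid;
    unsigned int ack_ratio;  /* only meaningful for CCID-2 */
};

/* Runs at the server, after the CCID has been negotiated: only then
 * can features that depend on the CCID be filled in. */
static int resolve_ccid_dependencies(struct demo_feats *f,
                                     const unsigned char *srv, size_t nsrv,
                                     const unsigned char *cli, size_t ncli)
{
    f->ccid = reconcile_ccid(srv, nsrv, cli, ncli);
    if (f->ccid < 0)
        return -1;           /* no common CCID: negotiation fails */
    f->ack_ratio = (f->ccid == 2) ? 2u : 0u;
    return 0;
}
```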

The concept is documented on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/implementation_notes.html#ccid_dependencies

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agovirtio_net: VIRTIO_NET_F_MSG_RXBUF (imprive rcv buffer allocation)
Mark McLoughlin [Mon, 17 Nov 2008 06:41:34 +0000 (22:41 -0800)] 
virtio_net: VIRTIO_NET_F_MSG_RXBUF (imprive rcv buffer allocation)

If segmentation offload is enabled by the host, we currently allocate
maximum sized packet buffers and pass them to the host. This uses up
20 ring entries, allowing us to supply only 20 packet buffers to the
host with a 256 entry ring. This is a huge overhead when receiving
small packets, and is most keenly felt when receiving MTU sized
packets from off-host.

The VIRTIO_NET_F_MRG_RXBUF feature flag is set by hosts which support
using receive buffers which are smaller than the maximum packet size.
In order to transfer large packets to the guest, the host merges
together multiple receive buffers to form a larger logical buffer.
The number of merged buffers is returned to the guest via a field in
the virtio_net_hdr.

Make use of this support by supplying single page receive buffers to
the host. On receive, we extract the virtio_net_hdr, copy 128 bytes of
the payload to the skb's linear data buffer and adjust the fragment
offset to point to the remaining data. This ensures proper alignment
and allows us to not use any paged data for small packets. If the
payload occupies multiple pages, we simply append those pages as
fragments and free the associated skbs.
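The receive-side layout can be modelled as below. This is a simplified sketch, not the driver's code: it assumes 4096-byte pages, ignores the virtio_net_hdr that precedes the payload, and uses hypothetical names (`layout_rx()`, `struct rx_layout`). It only computes where the bytes of one packet end up.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u
#define DEMO_COPY_LEN  128u  /* bytes copied into the skb's linear area */
#define DEMO_MAX_FRAGS 18

struct rx_layout {
    size_t linear;                   /* bytes copied into linear data */
    size_t nfrags;                   /* page fragments attached */
    size_t frag_off[DEMO_MAX_FRAGS];
    size_t frag_len[DEMO_MAX_FRAGS];
};

/* Lay out a len-byte payload that the host merged across
 * ceil(len / PAGE_SIZE) single-page buffers. Returns -1 if it
 * would need more fragments than this sketch allows. */
static int layout_rx(size_t len, struct rx_layout *out)
{
    size_t copy = len < DEMO_COPY_LEN ? len : DEMO_COPY_LEN;
    size_t first = len < DEMO_PAGE_SIZE ? len : DEMO_PAGE_SIZE;
    size_t left = len - first;

    out->linear = copy;
    out->nfrags = 0;

    /* The rest of the first page stays in place: the fragment offset
     * just skips the copied bytes. Small packets stop here with no
     * paged data at all. */
    if (first > copy) {
        out->frag_off[0] = copy;
        out->frag_len[0] = first - copy;
        out->nfrags = 1;
    }

    /* Each further merged buffer becomes one more page fragment. */
    while (left > 0) {
        size_t chunk = left < DEMO_PAGE_SIZE ? left : DEMO_PAGE_SIZE;

        if (out->nfrags == DEMO_MAX_FRAGS)
            return -1;
        out->frag_off[out->nfrags] = 0;
        out->frag_len[out->nfrags] = chunk;
        out->nfrags++;
        left -= chunk;
    }
    return 0;
}
```

For a 9000-byte packet this gives 128 linear bytes plus three fragments of 3968, 4096 and 808 bytes.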

This scheme allows us to be efficient in our use of ring entries
while still supporting large packets. Benchmarking using netperf from
an external machine to a guest over a 10Gb/s network shows a 100%
improvement from ~1Gb/s to ~2Gb/s. With a local host->guest benchmark
with GSO disabled on the host side, throughput was seen to increase
from 700Mb/s to 1.7Gb/s.

Based on a patch from Herbert Xu.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use netdev_priv)
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agovirtio_net: hook up the set-tso ethtool op
Mark McLoughlin [Mon, 17 Nov 2008 06:40:36 +0000 (22:40 -0800)] 
virtio_net: hook up the set-tso ethtool op

Seems like an oversight that we have set-tx-csum and set-sg hooked
up, but not set-tso.

Also leads to the strange situation that if you e.g. disable tx-csum,
then tso doesn't get disabled.
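The "strange situation" comes from feature dependencies: turning off tx-csum also turns off sg, and turning off sg turns off TSO only through the driver's set_tso hook. The model below is a sketch based on a reading of the ethtool core of that era; every name in it is hypothetical.

```c
#include <stdbool.h>

struct demo_dev;

/* Hypothetical miniature of the ethtool hooks a driver may provide. */
struct demo_ethtool_ops {
    void (*set_tx_csum)(struct demo_dev *, bool);
    void (*set_sg)(struct demo_dev *, bool);
    void (*set_tso)(struct demo_dev *, bool);   /* NULL: not hooked up */
};

struct demo_dev {
    bool tx_csum, sg, tso;
    const struct demo_ethtool_ops *ops;
};

static void drv_set_tx_csum(struct demo_dev *d, bool on) { d->tx_csum = on; }
static void drv_set_sg(struct demo_dev *d, bool on)      { d->sg = on; }
static void drv_set_tso(struct demo_dev *d, bool on)     { d->tso = on; }

/* Core side: disabling a feature disables its dependents, but only
 * through hooks the driver actually provides. */
static void core_set_sg(struct demo_dev *d, bool on)
{
    if (!on && d->ops->set_tso)
        d->ops->set_tso(d, false);
    d->ops->set_sg(d, on);
}

static void core_set_tx_csum(struct demo_dev *d, bool on)
{
    if (!on && d->ops->set_sg)
        core_set_sg(d, false);
    d->ops->set_tx_csum(d, on);
}

static const struct demo_ethtool_ops ops_without_tso = {
    .set_tx_csum = drv_set_tx_csum, .set_sg = drv_set_sg,
};
static const struct demo_ethtool_ops ops_with_tso = {
    .set_tx_csum = drv_set_tx_csum, .set_sg = drv_set_sg,
    .set_tso = drv_set_tso,
};
```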

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agovirtio_net: Recycle some more rx buffer pages
Mark McLoughlin [Mon, 17 Nov 2008 06:39:18 +0000 (22:39 -0800)] 
virtio_net: Recycle some more rx buffer pages

Each time we re-fill the recv queue with buffers, we allocate
one too many skbs and free it again when adding fails. We should
recycle the pages allocated in this case.

A previous version of this patch made trim_pages() trim trailing
unused pages from skbs with some paged data, but this actually
caused a barely measurable slowdown.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use netdev_priv)
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[CIFS] Fix build break
Steve French [Mon, 17 Nov 2008 03:57:13 +0000 (03:57 +0000)] 
[CIFS] Fix build break

Signed-off-by: Steve French <sfrench@us.ibm.com>
16 years agonet: use %pF for /proc/net/ptype
Alexey Dobriyan [Mon, 17 Nov 2008 03:50:35 +0000 (19:50 -0800)] 
net: use %pF for /proc/net/ptype

Technically, the patch changes the format for modules, but I doubt anybody cares.

-86dd          :ipv6:ipv6_rcv+0x0
+86dd          ipv6_rcv+0x0/0x400 [ipv6]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoPhonet: refuse to send bigger than MTU packets
RĂ©mi Denis-Courmont [Mon, 17 Nov 2008 03:48:49 +0000 (19:48 -0800)] 
Phonet: refuse to send bigger than MTU packets

Signed-off-by: RĂ©mi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agonet: make sure struct dst_entry refcount is aligned on 64 bytes
Eric Dumazet [Mon, 17 Nov 2008 03:46:36 +0000 (19:46 -0800)] 
net: make sure struct dst_entry refcount is aligned on 64 bytes

As found in the past (commit f1dd9c379cac7d5a76259e7dffcd5f8edc697d17
[NET]: Fix tbench regression in 2.6.25-rc1), it is really
important that struct dst_entry refcount is aligned on a cache line.

We cannot use __attribute__((aligned)), so manually pad the structure
for 32 and 64 bit arches.

for 32bit : offsetof(struct dst_entry, __refcnt) is 0x80
for 64bit : offsetof(struct dst_entry, __refcnt) is 0xc0

As the cache line size cannot be known at compile time,
we use a generic value of 64 bytes, which satisfies many current arches.
(Using 128 bytes alignment on 64bit arches would waste 64 bytes)

Add a BUILD_BUG_ON to catch future updates to "struct dst_entry" that
break this alignment.

"tbench 8" is 4.4 % faster on a dual quad core (HP BL460c G1), Intel E5450 @3.00GHz
(2350 MB/s instead of 2250 MB/s)
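The padding-plus-build-check idea can be reproduced in plain C11. `struct demo_dst_entry` is a hypothetical cut-down structure, and `_Static_assert` plays the role of BUILD_BUG_ON. An offset that is a multiple of 64 only lands on a cache line boundary if the object itself is 64-byte aligned (for example, allocated from a cache-aligned slab).

```c
#include <stddef.h>

#define DEMO_CACHE_LINE 64   /* generic guess, as explained above */

/* Four word-sized read-mostly fields, then a pad whose length depends
 * on the word size (32 vs 64 bit), so that the frequently written
 * refcnt starts at offset 64 in both cases. */
struct demo_dst_entry {
    struct demo_dst_entry *child;
    void *dev;
    unsigned long expires;
    void *path;
    long pad[DEMO_CACHE_LINE / sizeof(long) - 4];
    int refcnt;
    int use;
};

/* The layout assumes pointers and longs have the same size (true on
 * ILP32 and LP64). A field later added above refcnt breaks the build
 * here instead of silently costing performance. */
_Static_assert(sizeof(void *) == sizeof(long), "word-size assumption");
_Static_assert(offsetof(struct demo_dst_entry, refcnt) % DEMO_CACHE_LINE == 0,
               "refcnt must start a cache line");
```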

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agorcu: documents rculist_nulls
Eric Dumazet [Mon, 17 Nov 2008 03:41:14 +0000 (19:41 -0800)] 
rcu: documents rculist_nulls

Adds Documentation/RCU/rculist_nulls.txt file to describe how 'nulls'
end-of-list can help in some RCU algos.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agonet: Convert TCP & DCCP hash tables to use RCU / hlist_nulls
Eric Dumazet [Mon, 17 Nov 2008 03:40:17 +0000 (19:40 -0800)] 
net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls

RCU was added to UDP lookups, using a fast infrastructure :
- the sockets kmem_cache uses SLAB_DESTROY_BY_RCU and doesn't pay the
  price of call_rcu() at freeing time.
- hlist_nulls lets us use fewer memory barriers.

This patch uses the same infrastructure for TCP/DCCP established
and timewait sockets.

Thanks to SLAB_DESTROY_BY_RCU, there is no slowdown for applications
using short-lived TCP connections. A followup patch converting
rwlocks to spinlocks will speed up this case even more.

__inet_lookup_established() is pretty fast now that we don't have to
dirty a contended cache line (read_lock/read_unlock).

Only the established and timewait hashtables are converted to RCU
(the bind and listen tables still use traditional locking).

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoudp: Use hlist_nulls in UDP RCU code
Eric Dumazet [Mon, 17 Nov 2008 03:39:21 +0000 (19:39 -0800)] 
udp: Use hlist_nulls in UDP RCU code

This is a straightforward patch, using hlist_nulls infrastructure.

RCU conversion of UDP was already done two weeks ago.

Using hlist_nulls lets us avoid some memory barriers, both
at lookup time and at delete time.

The patch is large because it adds new macros to include/net/sock.h.
These macros will be used by TCP & DCCP in the next patch.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agorcu: Introduce hlist_nulls variant of hlist
Eric Dumazet [Mon, 17 Nov 2008 03:37:55 +0000 (19:37 -0800)] 
rcu: Introduce hlist_nulls variant of hlist

hlist uses a NULL value to terminate a chain.

The hlist_nulls variant uses a value with the low-order bit set to 1 as an end-of-list marker.

This allows storing many different end markers, so that some lockless RCU
algorithms (used in the TCP/UDP stack, for example) can save some memory
barriers in fast paths.

Two new files are added :

include/linux/list_nulls.h
  - mimics hlist part of include/linux/list.h, derived to hlist_nulls variant

include/linux/rculist_nulls.h
  - mimics hlist part of include/linux/rculist.h, derived to hlist_nulls variant

   Only four helpers are declared for the moment :

     hlist_nulls_del_init_rcu(), hlist_nulls_del_rcu(),
     hlist_nulls_add_head_rcu() and hlist_nulls_for_each_entry_rcu()
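The end-marker encoding can be illustrated with a few lines of userspace C. These are hypothetical re-creations of the idea, not the kernel header (`get_nulls_value()` mirrors the name used in the lookup example below); they show how a reader can tell which chain a walk finished on.

```c
#include <stddef.h>
#include <stdint.h>

struct nnode { struct nnode *next; };   /* hypothetical minimal node */

/* An end marker is an odd "pointer": (value << 1) | 1. Real nodes are
 * at least 2-byte aligned, so their low bit is always 0. */
static struct nnode *make_nulls(uintptr_t value)
{
    return (struct nnode *)((value << 1) | 1);
}

static int is_a_nulls(const struct nnode *p)
{
    return ((uintptr_t)p & 1) != 0;
}

static uintptr_t get_nulls_value(const struct nnode *p)
{
    return (uintptr_t)p >> 1;
}

/* Walk a chain, count its nodes and return the value of the marker
 * that ended it. A lockless reader compares this with the bucket it
 * started from; a mismatch means an item was moved to another chain
 * mid-walk and the lookup must restart. */
static uintptr_t walk_chain(const struct nnode *first, size_t *count)
{
    const struct nnode *p = first;

    *count = 0;
    while (!is_a_nulls(p)) {
        (*count)++;
        p = p->next;
    }
    return get_nulls_value(p);
}
```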

prefetch() calls were removed, since the end of a list is no longer a NULL value:
prefetching an end marker could trigger useless (and possibly dangerous) memory transactions.

Example of use (extracted from __udp4_lib_lookup()):

        struct sock *sk, *result;
        struct hlist_nulls_node *node;
        unsigned short hnum = ntohs(dport);
        unsigned int hash = udp_hashfn(net, hnum);
        struct udp_hslot *hslot = &udptable->hash[hash];
        int score, badness;

        rcu_read_lock();
begin:
        result = NULL;
        badness = -1;
        sk_nulls_for_each_rcu(sk, node, &hslot->head) {
                score = compute_score(sk, net, saddr, hnum, sport,
                                      daddr, dport, dif);
                if (score > badness) {
                        result = sk;
                        badness = score;
                }
        }
        /*
         * if the nulls value we got at the end of this lookup is
         * not the expected one, we must restart lookup.
         * We probably met an item that was moved to another chain.
         */
        if (get_nulls_value(node) != hash)
                goto begin;

        if (result) {
                if (unlikely(!atomic_inc_not_zero(&result->sk_refcnt)))
                        result = NULL;
                else if (unlikely(compute_score(result, net, saddr, hnum, sport,
                                  daddr, dport, dif) < badness)) {
                        sock_put(result);
                        goto begin;
                }
        }
        rcu_read_unlock();
        return result;
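The atomic_inc_not_zero() step plus the re-check can be modelled with C11 atomics. `inc_not_zero()`, `pin_candidate()` and `struct demo_sock` are hypothetical; the point is that with SLAB_DESTROY_BY_RCU a socket found during the walk may already be dead or reused, so a reference is taken only if the count is non-zero, and the keys are checked again afterwards.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only while the object is still live (count > 0);
 * a plain increment could resurrect a socket that is being freed. */
static bool inc_not_zero(atomic_int *refcnt)
{
    int old = atomic_load(refcnt);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
            return true;
    }
    return false;
}

struct demo_sock { atomic_int refcnt; int key; };

/* Pin a candidate found during a lockless walk, then re-check its key,
 * because the memory may have been recycled for another socket.
 * Returns false, holding no reference, if the caller must treat the
 * result as not found or restart the lookup. */
static bool pin_candidate(struct demo_sock *sk, int key)
{
    if (!inc_not_zero(&sk->refcnt))
        return false;
    if (sk->key != key) {
        atomic_fetch_sub(&sk->refcnt, 1);   /* like sock_put() */
        return false;
    }
    return true;
}
```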

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoTPROXY: implemented IP_RECVORIGDSTADDR socket option
Balazs Scheidler [Mon, 17 Nov 2008 03:32:39 +0000 (19:32 -0800)] 
TPROXY: implemented IP_RECVORIGDSTADDR socket option

In case UDP traffic is redirected to a local UDP socket,
the originally addressed destination address/port
cannot be recovered with the in-kernel tproxy.

This patch adds an IP_RECVORIGDSTADDR sockopt that enables
an IP_ORIGDSTADDR ancillary message in recvmsg(). This
ancillary message contains the original destination address/port
of the packet being received.
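A userspace sketch of using the option on a UDP socket. `enable_origdst()` and `get_origdst()` are illustrative helpers, not part of any library; the fallback value 20 for IP_ORIGDSTADDR comes from the Linux headers, in case the C library does not define it. The original destination arrives as a `struct sockaddr_in` in a cmsg at level IPPROTO_IP (SOL_IP).

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

#ifndef IP_ORIGDSTADDR
#define IP_ORIGDSTADDR 20                 /* value from <linux/in.h> */
#endif
#ifndef IP_RECVORIGDSTADDR
#define IP_RECVORIGDSTADDR IP_ORIGDSTADDR
#endif

/* Ask the kernel to attach the original destination to each datagram. */
static int enable_origdst(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_IP, IP_RECVORIGDSTADDR, &on, sizeof(on));
}

/* After recvmsg(), scan the ancillary data for the original
 * destination. Returns 0 and fills *dst if found, -1 otherwise. */
static int get_origdst(struct msghdr *msg, struct sockaddr_in *dst)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_ORIGDSTADDR) {
            memcpy(dst, CMSG_DATA(c), sizeof(*dst));
            return 0;
        }
    }
    return -1;
}
```

Call `enable_origdst()` once after socket(), then give recvmsg() a `msg_control` buffer of at least `CMSG_SPACE(sizeof(struct sockaddr_in))` bytes before calling `get_origdst()`.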

Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoipv4: Fix ARP behavior with many mac-vlans
Ben Greear [Mon, 17 Nov 2008 03:19:38 +0000 (19:19 -0800)] 
ipv4: Fix ARP behavior with many mac-vlans

Ben Greear wrote:
> I have 500 mac-vlans on a system talking to 500 other
> mac-vlans.  My problem is that the arp-table gets extremely
> huge because every time an arp-request comes in on all mac-vlans,
> a stale arp entry is added for each mac-vlan.  I have filtering
> turned on, but that doesn't help because the neigh_event_ns call
> below will cause a stale neighbor entry to be created regardless
> of whether a reply will be sent or not.
> Maybe the neigh_event code should be below the checks for dont_send,
> and only call neigh_event_ns if we are !dont_send?

The attached patch makes it work much better for me.  The patch
will cause the code to NOT create a stale neighbor entry if we
are not going to respond to the ARP request.  The old code
*would* create a stale entry even if we are not going to respond.
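The effect of the reordering can be shown with a toy counter model. `struct demo_arp_stats` and both functions are hypothetical; `dont_send` stands for the filtering decision described above.

```c
#include <stdbool.h>

struct demo_arp_stats { int neigh_created; int replies; };

/* Old order: the neighbour entry is created before filtering. */
static void arp_process_old(struct demo_arp_stats *s, bool dont_send)
{
    s->neigh_created++;          /* neigh_event_ns() always ran */
    if (!dont_send)
        s->replies++;
}

/* New order: filtered requests create neither a reply nor an entry. */
static void arp_process_new(struct demo_arp_stats *s, bool dont_send)
{
    if (dont_send)
        return;
    s->neigh_created++;
    s->replies++;
}
```

With 500 mac-vlans receiving the same request but only one replying, the old order creates 500 stale entries and the new one creates 1.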

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agocifs: reinstate sharing of tree connections
Jeff Layton [Sat, 15 Nov 2008 16:12:47 +0000 (11:12 -0500)] 
cifs: reinstate sharing of tree connections

Use a similar approach to the SMB session sharing. Add a list of tcons
attached to each SMB session. Move the refcount to non-atomic. Protect
all of the above with the cifs_tcp_ses_lock. Add functions to
properly find and put references to the tcons.
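A simplified userspace sketch of the find/put pattern, assuming a singly linked list per session and a pthread mutex in place of cifs_tcp_ses_lock. All names are hypothetical. Because the refcount only changes with the lock held, a plain int is enough, which is why it no longer needs to be atomic.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct demo_tcon {
    struct demo_tcon *next;
    char share[64];
    int refcount;               /* plain int: only touched under the lock */
};

struct demo_ses {
    struct demo_tcon *tcons;    /* tcons attached to this session */
};

static pthread_mutex_t demo_ses_lock = PTHREAD_MUTEX_INITIALIZER;

/* Find an existing tcon for the share and take a reference, or
 * return NULL so the caller can set up a new one. */
static struct demo_tcon *demo_find_tcon(struct demo_ses *ses, const char *share)
{
    struct demo_tcon *t;

    pthread_mutex_lock(&demo_ses_lock);
    for (t = ses->tcons; t; t = t->next) {
        if (strcmp(t->share, share) == 0) {
            t->refcount++;
            break;
        }
    }
    pthread_mutex_unlock(&demo_ses_lock);
    return t;
}

/* Drop a reference; the last put unlinks and frees the tcon. */
static void demo_put_tcon(struct demo_ses *ses, struct demo_tcon *tcon)
{
    struct demo_tcon **pp;

    pthread_mutex_lock(&demo_ses_lock);
    if (--tcon->refcount > 0) {
        pthread_mutex_unlock(&demo_ses_lock);
        return;
    }
    for (pp = &ses->tcons; *pp; pp = &(*pp)->next) {
        if (*pp == tcon) {
            *pp = tcon->next;
            break;
        }
    }
    pthread_mutex_unlock(&demo_ses_lock);
    free(tcon);
}
```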

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
16 years agoe1000e: enable ECC correction on 82571 silicon
Alexander Duyck [Fri, 14 Nov 2008 06:54:36 +0000 (06:54 +0000)] 
e1000e: enable ECC correction on 82571 silicon

This change enables ECC correction for the packet buffer on all 82571
silicon.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoe1000e: fix IPMI traffic
Jeff Kirsher [Fri, 14 Nov 2008 06:45:23 +0000 (06:45 +0000)] 
e1000e: fix IPMI traffic

Some users reported that they have machines with BMCs enabled that cannot
receive IPMI traffic after e1000e is loaded.
http://marc.info/?l=e1000-devel&m=121909039127414&w=2
http://marc.info/?l=e1000-devel&m=121365543823387&w=2

This fixes the issue for users who load the driver with the new parameter
set to 0, which disables CRC stripping, while leaving the performance
feature on for most users. Based on work done by Hong Zhang.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoe1000e: fix warn_on reload after phy_id error
Jeff Kirsher [Fri, 14 Nov 2008 06:45:07 +0000 (06:45 +0000)] 
e1000e: fix warn_on reload after phy_id error

If the driver fails to initialize the first time because the phy_id check
fails, the kernel triggers a WARN_ON on the second attempt to load the
driver, because the first load did not free the MSI/MSI-X resources after
the phy_id check failed.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agophylib: make mdio-gpio work without OF (v4)
Paulius Zaleckas [Fri, 14 Nov 2008 00:24:34 +0000 (00:24 +0000)] 
phylib: make mdio-gpio work without OF (v4)

Make mdio-gpio work with non-OpenFirmware GPIO implementations.

Additional changes to mdio-gpio:
- use gpio_request() and gpio_free()
- place irq[] array in struct mdio_gpio_info
- add module description, author and license
- add note about compiling this driver as module
- rename the mdc and mdio functions (the old names were ugly)
- change MII to MDIO in bus name
- add __init __exit to module (un)loading functions
- probe fails if no phys added to the bus
- kzalloc bitbang with sizeof(*bitbang)

Changes since v3:
- keep bus naming "%x" to be compatible with existing drivers.

Changes since v2:
- more #ifdefs reduction
- platform driver will be registered on OF platforms also
- unified platform and OF bus_id to phy%i

Changes since v1:
- removed NO_IRQ
- reduced #ifdefs

Laurent, please test this driver under OF.

Signed-off-by: Paulius Zaleckas <paulius.zaleckas@teltonika.lt>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agophylib: rename mdio-ofgpio to mdio-gpio
Paulius Zaleckas [Fri, 14 Nov 2008 00:24:28 +0000 (00:24 +0000)] 
phylib: rename mdio-ofgpio to mdio-gpio

Signed-off-by: Paulius Zaleckas <paulius.zaleckas@teltonika.lt>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agounitialized return value in mm/mlock.c: __mlock_vma_pages_range()
Helge Deller [Sun, 16 Nov 2008 23:30:57 +0000 (00:30 +0100)] 
unitialized return value in mm/mlock.c: __mlock_vma_pages_range()

Fix an uninitialized return value when compiling on parisc (with CONFIG_UNEVICTABLE_LRU=y):
mm/mlock.c: In function `__mlock_vma_pages_range':
mm/mlock.c:165: warning: `ret' might be used uninitialized in this function

Signed-off-by: Helge Deller <deller@gmx.de>
[ It isn't ever really used uninitialized, since no caller should ever
  call this function with an empty range.  But the compiler is correct
  that from a local analysis standpoint that is impossible to see, and
  fixing the warning is appropriate.  ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agostop_machine: fix race with return value (fixes Bug #11989)
Rusty Russell [Sun, 16 Nov 2008 21:52:18 +0000 (08:22 +1030)] 
stop_machine: fix race with return value (fixes Bug #11989)

Bug #11989: Suspend failure on NForce4-based boards due to changes in
stop_machine

We should not access active.fnret outside the lock; in theory the next
stop_machine could overwrite it.
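A minimal pthread sketch of the fix, assuming a single shared slot reused by every call. The names are hypothetical, and the real stop_machine() runs the function with all CPUs stopped, which is not modelled here. The point is that the return value is copied while the lock is still held.

```c
#include <pthread.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static struct { int fnret; } demo_active;   /* shared by every caller */

static int demo_stop_machine(int (*fn)(void *), void *data)
{
    int ret;

    pthread_mutex_lock(&demo_lock);
    demo_active.fnret = fn(data);
    /* Copy the result before unlocking: once the lock is dropped the
     * next caller may overwrite demo_active.fnret. Reading it after
     * the unlock is the race described above. */
    ret = demo_active.fnret;
    pthread_mutex_unlock(&demo_lock);
    return ret;
}

/* Sample callback for the usage check: returns the int it is given. */
static int demo_fn_return(void *data)
{
    return *(int *)data;
}
```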

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoFix broken ownership of /proc/sys/ files
Al Viro [Sun, 16 Nov 2008 22:19:10 +0000 (22:19 +0000)] 
Fix broken ownership of /proc/sys/ files

D'oh...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-and-tested-by: Peter Palfrader <peter@palfrader.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agodm9000: Fix build error.
David S. Miller [Sun, 16 Nov 2008 20:41:35 +0000 (12:41 -0800)] 
dm9000: Fix build error.

Reported by Stephen Rothwell:

drivers/net/dm9000.c:1450: error: expected ')' before ';' token
drivers/net/dm9000.c:1455: error: expected ';' before '}' token

Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agomfd: Correct WM8350 I2C return code usage
Mark Brown [Wed, 12 Nov 2008 16:34:02 +0000 (17:34 +0100)] 
mfd: Correct WM8350 I2C return code usage

The vendor BSP used for the WM8350 development provided an I2C driver
which incorrectly returned zero on successful sends rather than the
number of transmitted bytes, an error which was then propagated into the
WM8350 I2C accessors.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
16 years agomfd: fix event masking for da9030
Mike Rapoport [Sat, 8 Nov 2008 00:28:19 +0000 (01:28 +0100)] 
mfd: fix event masking for da9030

Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Acked-by: Eric Miao <eric.miao@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>
16 years agoacpi: fix oops in acpi_system_wakeup_device_seq_show
Linus Torvalds [Sun, 16 Nov 2008 18:09:34 +0000 (10:09 -0800)] 
acpi: fix oops in acpi_system_wakeup_device_seq_show

Commit 0794469da3f7b2093575cbdfc1108308dd3641ce: ("ACPI: struct device -
replace bus_id with dev_name(), dev_set_name()") introduced a bug by
testing 'dev_name(ldev)' instead of 'ldev->bus' for NULL when printing
out the bus information.

So if ldev->bus was NULL, we'd oops.

Reported-and-tested-by: Bruno Prémont <bonbons@linux-vserver.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agophy: fix phy address bug
Giulio Benetti [Thu, 13 Nov 2008 21:53:13 +0000 (21:53 +0000)] 
phy: fix phy address bug

When no PHY is found, the PHYID read returns 0xffff rather than 0xffffffff,
and in some cases (at91sam9263) 0x0. Maybe this patch could be useful.

Signed-off-by: Giulio Benetti <giulio.benetti@micronovasrl.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoe100: fix dma error in direction for mapping
Jesse Brandeburg [Fri, 14 Nov 2008 13:51:54 +0000 (13:51 +0000)] 
e100: fix dma error in direction for mapping

The e100 driver triggers BUG_ON(buf->direction != dir)
by doing pci_map_single(..., PCI_DMA_BIDIRECTIONAL)
and pci_dma_sync_single_for_device(..., PCI_DMA_TODEVICE).

Changing the DMA direction, especially with dmabounce will result
in unexpected behaviour.
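A toy model of the check that fires: dmabounce-style bookkeeping records the direction each mapping was created with and expects every sync to use the same one. The enum and functions are hypothetical; the takeaway is only that the map and sync calls must agree.

```c
#include <stdbool.h>

enum demo_dma_dir { DEMO_BIDIRECTIONAL, DEMO_TO_DEVICE, DEMO_FROM_DEVICE };

/* Each mapping remembers the direction it was created with. */
struct demo_mapping { enum demo_dma_dir dir; };

static struct demo_mapping demo_map_single(enum demo_dma_dir dir)
{
    struct demo_mapping m = { dir };
    return m;
}

/* Returns false where the real code would hit
 * BUG_ON(buf->direction != dir). */
static bool demo_sync_for_device(const struct demo_mapping *m,
                                 enum demo_dma_dir dir)
{
    return m->dir == dir;
}
```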

Reported-by: Anders Grafstrom <grfstrm@users.sourceforge.net>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoigb: use dev_printk instead of printk
Bjorn Helgaas [Thu, 13 Nov 2008 06:20:10 +0000 (06:20 +0000)] 
igb: use dev_printk instead of printk

Use dev_printk() instead of printk() to give a little more context
and use consistent format.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoqla3xxx: Cleanup: Fix link print statements.
Ron Mercer [Tue, 11 Nov 2008 07:54:54 +0000 (07:54 +0000)] 
qla3xxx: Cleanup: Fix link print statements.

Removed debug print statements and improved conditionals around informational statements.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoigb: Use device_set_wakeup_enable
Rafael J. Wysocki [Fri, 7 Nov 2008 20:30:37 +0000 (20:30 +0000)]
igb: Use device_set_wakeup_enable

Since the dev->power.should_wakeup bit is used by the PCI core to
decide whether the device should wake up the system from sleep
states, set/unset this bit whenever WOL is enabled/disabled using
igb_set_wol().  Accordingly, use device_can_wakeup() for checking
if wake-up is supported by the device.

Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoe1000: Use device_set_wakeup_enable
Rafael J. Wysocki [Fri, 7 Nov 2008 20:30:19 +0000 (20:30 +0000)]
e1000: Use device_set_wakeup_enable

Since the dev->power.should_wakeup bit is used by the PCI core to
decide whether the device should wake up the system from sleep
states, set/unset this bit whenever WOL is enabled/disabled using
e1000_set_wol().  Accordingly, use device_can_wakeup() for checking
if wake-up is supported by the device.

Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoe1000e: Use device_set_wakeup_enable
Rafael J. Wysocki [Wed, 12 Nov 2008 09:52:32 +0000 (09:52 +0000)]
e1000e: Use device_set_wakeup_enable

Since the dev->power.should_wakeup bit is used by the PCI core to
decide whether the device should wake up the system from sleep
states, set/unset this bit whenever WOL is enabled/disabled using
e1000_set_wol().  Accordingly, use device_can_wakeup() for checking
if wake-up is supported by the device.

Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agovia-velocity: enable perfect filtering for multicast packets
Joey Zhuo [Sun, 16 Nov 2008 08:39:35 +0000 (00:39 -0800)] 
via-velocity: enable perfect filtering for multicast packets

Signed-off-by: Joey Zhuo <joeyzhuo@via.com.tw>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agopegasus: minor resource shrinkage
David Brownell [Sun, 16 Nov 2008 08:36:08 +0000 (00:36 -0800)] 
pegasus: minor resource shrinkage

Make pegasus driver not allocate a workqueue until the driver
is bound to some device, which will need that workqueue if
the device is brought up.  This conserves resources when the
driver is linked but there's no pegasus device connected.

Also shrink the runtime footprint a smidgeon by moving some
init-only code into its proper section, and move an obnoxious
(frequent and meaningless) message to be debug-only.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoixgbe: Fix usage of netif_*_all_queues() with netif_carrier_{off|on}()
PJ Waskiewicz [Fri, 7 Nov 2008 12:16:08 +0000 (12:16 +0000)] 
ixgbe: Fix usage of netif_*_all_queues() with netif_carrier_{off|on}()

netif_carrier_off() is sufficient to stop Tx into the driver.  Stopping the Tx
queues is redundant and unnecessary.  By the same token, netif_carrier_on()
will be sufficient to re-enable Tx, so waking the queues is unnecessary.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agofunction tracing: fix wrong pos computing when read buffer has been fulfilled
walimis [Sat, 15 Nov 2008 07:19:06 +0000 (15:19 +0800)] 
function tracing: fix wrong pos computing when read buffer has been fulfilled

Impact: make output of available_filter_functions complete

phenomenon:

The first value of dyn_ftrace_total_info is not equal to
`cat available_filter_functions | wc -l`, but they should be equal.

root cause:

When t_show() prints a function with seq_printf() and that record
overflows the read buffer, the function is not copied to user space
through the read buffer; it is simply dropped, so we never see it.

So every time a function record overflows the read buffer, that
function is dropped.

This also applies to set_ftrace_filter if set_ftrace_filter has
more bytes than the read buffer.

fix:

Check the return value of seq_printf(): if it is less than 0, the
function was not printed. Then decrease the position to force this
function to be printed next time, in the next read buffer.
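The step-back fix can be simulated with a tiny fixed-size buffer. This is a hypothetical model, not the seq_file API: `demo_printf()` fails like seq_printf() when a record does not fit, and `fixed` selects whether the position is stepped back. It assumes every record fits in an empty buffer; otherwise the fixed variant would retry forever.

```c
#include <string.h>

struct demo_buf {
    char data[32];           /* stands in for the seq_file read buffer */
    size_t len;
};

/* Append one record plus '\n', or fail without writing if it won't fit. */
static int demo_printf(struct demo_buf *b, const char *s)
{
    size_t n = strlen(s);

    if (b->len + n + 1 > sizeof(b->data))
        return -1;
    memcpy(b->data + b->len, s, n);
    b->data[b->len + n] = '\n';
    b->len += n + 1;
    return 0;
}

/* One read(): the position advances before the record is printed, as
 * with seq_file. On overflow, the fixed variant steps *pos back so the
 * record is retried in the next buffer instead of being lost. */
static size_t demo_read(const char *const *recs, size_t nrecs,
                        size_t *pos, struct demo_buf *b, int fixed)
{
    size_t emitted = 0;

    b->len = 0;
    while (*pos < nrecs) {
        const char *rec = recs[(*pos)++];

        if (demo_printf(b, rec) < 0) {
            if (fixed)
                (*pos)--;
            break;
        }
        emitted++;
    }
    return emitted;
}
```

With six 9-character records and a 32-byte buffer, three records fit per read: the fixed variant delivers all six over two reads, while the unfixed one delivers five because the fourth record is skipped.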

Another small fix is to show the correct count of allocated pages.

Signed-off-by: walimis <walimisdev@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agoMAINTAINERS: remove me as RAID maintainer
Ingo Molnar [Sun, 16 Nov 2008 07:27:53 +0000 (08:27 +0100)] 
MAINTAINERS: remove me as RAID maintainer

Neil has been the maintainer of the RAID/MD code for a long time,
remove me as a co-maintainer.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
16 years agosched: fix kernel warning on /proc/sched_debug access
Ingo Molnar [Sun, 16 Nov 2008 07:07:15 +0000 (08:07 +0100)] 
sched: fix kernel warning on /proc/sched_debug access

Luis Henriques reported that with CONFIG_PREEMPT=y + CONFIG_PREEMPT_DEBUG=y +
CONFIG_SCHED_DEBUG=y + CONFIG_LATENCYTOP=y enabled, the following warning
triggers when using latencytop:

> [  775.663239] BUG: using smp_processor_id() in preemptible [00000000] code: latencytop/6585
> [  775.663303] caller is native_sched_clock+0x3a/0x80
> [  775.663314] Pid: 6585, comm: latencytop Tainted: G        W 2.6.28-rc4-00355-g9c7c354 #1
> [  775.663322] Call Trace:
> [  775.663343]  [<ffffffff803a94e4>] debug_smp_processor_id+0xe4/0xf0
> [  775.663356]  [<ffffffff80213f7a>] native_sched_clock+0x3a/0x80
> [  775.663368]  [<ffffffff80213e19>] sched_clock+0x9/0x10
> [  775.663381]  [<ffffffff8024550d>] proc_sched_show_task+0x8bd/0x10e0
> [  775.663395]  [<ffffffff8034466e>] sched_show+0x3e/0x80
> [  775.663408]  [<ffffffff8031039b>] seq_read+0xdb/0x350
> [  775.663421]  [<ffffffff80368776>] ? security_file_permission+0x16/0x20
> [  775.663435]  [<ffffffff802f4198>] vfs_read+0xc8/0x170
> [  775.663447]  [<ffffffff802f4335>] sys_read+0x55/0x90
> [  775.663460]  [<ffffffff8020c67a>] system_call_fastpath+0x16/0x1b
> ...

This breakage was caused by me via:

  7cbaef9: sched: optimize sched_clock() a bit

Change the calls to cpu_clock().

Reported-by: Luis Henriques <henrix@sapo.pt>