linux-2.6
16 years agoMerge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Sun, 15 Jul 2007 23:50:46 +0000 (16:50 -0700)] 
Merge branch 'master' of /linux/kernel/git/davem/net-2.6

* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (53 commits)
  [TCP]: Verify the presence of RETRANS bit when leaving FRTO
  [IPV6]: Call inet6addr_chain notifiers on link down
  [NET_SCHED]: Kill CONFIG_NET_CLS_POLICE
  [NET_SCHED]: act_api: qdisc internal reclassify support
  [NET_SCHED]: sch_dsmark: act_api support
  [NET_SCHED]: sch_atm: act_api support
  [NET_SCHED]: sch_atm: Lindent
  [IPV6]: MSG_ERRQUEUE messages do not pass to connected raw sockets
  [IPV4]: Cleanup call to __neigh_lookup()
  [NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization
  [NETFILTER]: nf_conntrack: UDPLITE support
  [NETFILTER]: nf_conntrack: mark protocols __read_mostly
  [NETFILTER]: x_tables: add connlimit match
  [NETFILTER]: Lower *tables printk severity
  [NETFILTER]: nf_conntrack: Don't track locally generated special ICMP error
  [NETFILTER]: nf_conntrack: Introduces nf_ct_get_tuplepr and uses it
  [NETFILTER]: nf_conntrack: make l3proto->prepare() generic and renames it
  [NETFILTER]: nf_conntrack: Increment error count on parsing IPv4 header
  [NET]: Add ethtool support for NETIF_F_IPV6_CSUM devices.
  [AF_IUCV]: Add lock when updating accept_q
  ...

16 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh...
Linus Torvalds [Sun, 15 Jul 2007 23:44:53 +0000 (16:44 -0700)] 
Merge branch 'for-linus' of git://git./linux/kernel/git/ericvh/v9fs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: fix a race condition bug in umount which caused a segfault
  9p: re-enable mount time debug option
  9p: cache meta-data when cache=loose
  net/9p: set error to EREMOTEIO if trans->write returns zero
  net/9p: change net/9p module name to 9pnet
  9p: Reorganization of 9p file system code

16 years agoMerge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6
Linus Torvalds [Sun, 15 Jul 2007 23:43:43 +0000 (16:43 -0700)] 
Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6

* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: (37 commits)
  [XFS] Fix lockdep annotations for xfs_lock_inodes
  [LIB]: export radix_tree_preload()
  [XFS] Fix XFS_IOC_FSBULKSTAT{,_SINGLE} & XFS_IOC_FSINUMBERS in compat mode
  [XFS] Compat ioctl handler for handle operations
  [XFS] Compat ioctl handler for XFS_IOC_FSGEOMETRY_V1.
  [XFS] Clean up function name handling in tracing code
  [XFS] Quota inode has no parent.
  [XFS] Concurrent Multi-File Data Streams
  [XFS] Use uninitialized_var macro to stop warning about rtx
  [XFS] XFS should not be looking at filp reference counts
  [XFS] Use is_power_of_2 instead of open coding checks
  [XFS] Reduce shouting by removing unnecessary macros from dir2 code.
  [XFS] Simplify XFS min/max macros.
  [XFS] Kill off xfs_count_bits
  [XFS] Cancel transactions on xfs_itruncate_start error.
  [XFS] Use do_div() on 64 bit types.
  [XFS] Fix remount,readonly path to flush everything correctly.
  [XFS] Cleanup inode extent size hint extraction
  [XFS] Prevent ENOSPC from aborting transactions that need to succeed
  [XFS] Prevent deadlock when flushing inodes on unmount
  ...

16 years agomake i2c-acorn tristate
Al Viro [Sun, 15 Jul 2007 20:37:16 +0000 (21:37 +0100)] 
make i2c-acorn tristate

It depends on tristate I2C and it's trivial to make modular.  The
current Kconfig allows I2C=m, I2C_ACORN=y, which doesn't work at
all; alternatives are dependency on I2C=y and making I2C_ACORN
itself a tristate.  The latter is the right thing to do...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
16 years agoicside: devm_iounmap() needs linux/io.h
Al Viro [Sun, 15 Jul 2007 20:01:32 +0000 (21:01 +0100)] 
icside: devm_iounmap() needs linux/io.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
16 years agomissing argument in bin_attribute ->read()/->write()
Al Viro [Sun, 15 Jul 2007 20:01:22 +0000 (21:01 +0100)] 
missing argument in bin_attribute ->read()/->write()

Fallout from commit 91a6902958f052358899f58683d44e36228d85c2 ('sysfs:
add parameter "struct bin_attribute *" ...')

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agofallout from constified seq_operations
Al Viro [Sun, 15 Jul 2007 20:01:12 +0000 (21:01 +0100)] 
fallout from constified seq_operations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agofallout from Auke's pci ->revision patch
Al Viro [Sun, 15 Jul 2007 20:01:02 +0000 (21:01 +0100)] 
fallout from Auke's pci ->revision patch

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoax88796: dev_dbg() wants device, not platform device
Al Viro [Sun, 15 Jul 2007 20:00:51 +0000 (21:00 +0100)] 
ax88796: dev_dbg() wants device, not platform device

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agopass -msize-long to sparse on s390
Al Viro [Sun, 15 Jul 2007 20:00:41 +0000 (21:00 +0100)] 
pass -msize-long to sparse on s390

s390 is the only 32bit with unsigned long for size_t (usual for those
is unsigned int).  Tell sparse...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agofrv: missing __clear_user()
Al Viro [Sun, 15 Jul 2007 20:00:31 +0000 (21:00 +0100)] 
frv: missing __clear_user()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agozd1211rw: too early inclusion of asm/unaligned.h
Al Viro [Sun, 15 Jul 2007 20:00:21 +0000 (21:00 +0100)] 
zd1211rw: too early inclusion of asm/unaligned.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agofix return type of skb_checksum_complete()
Al Viro [Sun, 15 Jul 2007 20:00:11 +0000 (21:00 +0100)] 
fix return type of skb_checksum_complete()

It returns __sum16, not unsigned int

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoPDA_POWER depends on having request_irq()
Al Viro [Sun, 15 Jul 2007 20:00:01 +0000 (21:00 +0100)] 
PDA_POWER depends on having request_irq()

... so all proud owners of s390-based PDAs will have to live without that one

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoieee1394: forgotten dereference...
Al Viro [Sun, 15 Jul 2007 19:59:51 +0000 (20:59 +0100)] 
ieee1394: forgotten dereference...

Going through the string and waiting for _pointer_ to become '\0'
is not what the authors meant...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Ben Collins <ben.collins@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agothe wrong variable checked after request_irq()
Al Viro [Sun, 15 Jul 2007 19:59:41 +0000 (20:59 +0100)] 
the wrong variable checked after request_irq()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agowrong order of arguments of ->readdir()
Al Viro [Sun, 15 Jul 2007 19:59:31 +0000 (20:59 +0100)] 
wrong order of arguments of ->readdir()

Shows how many people are testing coda - the bug had been there for 5 years
and results of stepping on it are not subtle.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agominimal fixes for drivers/usb/gadget/m66592-udc.c
Al Viro [Sun, 15 Jul 2007 19:59:22 +0000 (20:59 +0100)] 
minimal fixes for drivers/usb/gadget/m66592-udc.c

still looks racy (and definitely leaks)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years ago[TCP]: Verify the presence of RETRANS bit when leaving FRTO
Ilpo Järvinen [Sun, 15 Jul 2007 07:19:29 +0000 (00:19 -0700)] 
[TCP]: Verify the presence of RETRANS bit when leaving FRTO

For yet unknown reason, something cleared SACKED_RETRANS bit
underneath FRTO.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV6]: Call inet6addr_chain notifiers on link down
Vlad Yasevich [Sun, 15 Jul 2007 07:16:35 +0000 (00:16 -0700)] 
[IPV6]: Call inet6addr_chain notifiers on link down

Currently if the link is brought down via ip link or ifconfig down,
the inet6addr_chain notifiers are not called even though all
the addresses are removed from the interface.  This caused SCTP
to add duplicate addresses to it's list.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET_SCHED]: Kill CONFIG_NET_CLS_POLICE
Patrick McHardy [Sun, 15 Jul 2007 07:03:05 +0000 (00:03 -0700)] 
[NET_SCHED]: Kill CONFIG_NET_CLS_POLICE

The NET_CLS_ACT option is now a full replacement for NET_CLS_POLICE,
remove the old code. The config option will be kept around to select
the equivalent NET_CLS_ACT options for a short time to allow easier
upgrades.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET_SCHED]: act_api: qdisc internal reclassify support
Patrick McHardy [Sun, 15 Jul 2007 07:02:31 +0000 (00:02 -0700)] 
[NET_SCHED]: act_api: qdisc internal reclassify support

The behaviour of NET_CLS_POLICE for TC_POLICE_RECLASSIFY was to return
it to the qdisc, which could handle it internally or ignore it. With
NET_CLS_ACT however, tc_classify starts over at the first classifier
and never returns it to the qdisc. This makes it impossible to support
qdisc-internal reclassification, which in turn makes it impossible to
remove the old NET_CLS_POLICE code without breaking compatibility since
we have two qdiscs (CBQ and ATM) that support this.

This patch adds a tc_classify_compat function that handles
reclassification the old way and changes CBQ and ATM to use it.

This again is of course not fully backwards compatible with the previous
NET_CLS_ACT behaviour. Unfortunately there is no way to fully maintain
compatibility *and* support qdisc internal reclassification with
NET_CLS_ACT, but this seems like the better choice over keeping the two
incompatible options around forever.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET_SCHED]: sch_dsmark: act_api support
Patrick McHardy [Sun, 15 Jul 2007 07:02:10 +0000 (00:02 -0700)] 
[NET_SCHED]: sch_dsmark: act_api support

Handle act_api classification results.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET_SCHED]: sch_atm: act_api support
Patrick McHardy [Sun, 15 Jul 2007 07:01:49 +0000 (00:01 -0700)] 
[NET_SCHED]: sch_atm: act_api support

Handle act_api classification results.

The ATM scheduler behaves slightly different than other schedulers
in that it only handles policer results for successful classifications,
this behaviour is retained for the act_api case.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET_SCHED]: sch_atm: Lindent
Patrick McHardy [Sun, 15 Jul 2007 07:01:25 +0000 (00:01 -0700)] 
[NET_SCHED]: sch_atm: Lindent

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[IPV6]: MSG_ERRQUEUE messages do not pass to connected raw sockets
Dmitry Butskoy [Sun, 15 Jul 2007 06:53:08 +0000 (23:53 -0700)] 
[IPV6]: MSG_ERRQUEUE messages do not pass to connected raw sockets

From: Dmitry Butskoy <dmitry@butskoy.name>

Taken from http://bugzilla.kernel.org/show_bug.cgi?id=8747

Problem Description:

It is related to the possibility to obtain MSG_ERRQUEUE messages from the udp
and raw sockets, both connected and unconnected.

There is a little typo in net/ipv6/icmp.c code, which prevents such messages
to be delivered to the errqueue of the correspond raw socket, when the socket
is CONNECTED.  The typo is due to swap of local/remote addresses.

Consider __raw_v6_lookup() function from net/ipv6/raw.c. When a raw socket is
looked up usual way, it is something like:

sk = __raw_v6_lookup(sk, nexthdr, daddr, saddr, IP6CB(skb)->iif);

where "daddr" is a destination address of the incoming packet (IOW our local
address), "saddr" is a source address of the incoming packet (the remote end).

But when the raw socket is looked up for some icmp error report, in
net/ipv6/icmp.c:icmpv6_notify() , daddr/saddr are obtained from the echoed
fragment of the "bad" packet, i.e.  "daddr" is the original destination
address of that packet, "saddr" is our local address.  Hence, for
icmpv6_notify() must use "saddr, daddr" in its arguments, not "daddr, saddr"
...

Steps to reproduce:

Create some raw socket, connect it to an address, and cause some error
situation: f.e. set ttl=1 where the remote address is more than 1 hop to reach.
Set IPV6_RECVERR .
Then send something and wait for the error (f.e. poll() with POLLERR|POLLIN).
You should receive "time exceeded" icmp message (because of "ttl=1"), but the
socket do not receive it.

If you do not connect your raw socket, you will receive MSG_ERRQUEUE
successfully.  (The reason is that for unconnected socket there are no actual
checks for local/remote addresses).

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6
David S. Miller [Sun, 15 Jul 2007 06:47:04 +0000 (23:47 -0700)] 
Merge /pub/scm/linux/kernel/git/herbert/crypto-2.6

Conflicts:

crypto/Kconfig

16 years ago[IPV4]: Cleanup call to __neigh_lookup()
Jean Delvare [Sun, 15 Jul 2007 03:51:44 +0000 (20:51 -0700)] 
[IPV4]: Cleanup call to __neigh_lookup()

Back in the times of Linux 2.2, negative values for the creat parameter
of __neigh_lookup() had a particular meaning, but no longer, so we
should pass 1 instead.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization
Patrick McHardy [Sun, 15 Jul 2007 03:49:26 +0000 (20:49 -0700)] 
[NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization

As noticed by Ranko Zivojnovic <ranko@spidernet.net>, calling qdisc_run
from the timer handler can result in deadlock:

> CPU#0
>
> qdisc_watchdog() fires and gets dev->queue_lock
> qdisc_run()...qdisc_restart()...
> -> releases dev->queue_lock and enters dev_hard_start_xmit()
>
> CPU#1
>
> tc del qdisc dev ...
> qdisc_graft()...dev_graft_qdisc()...dev_deactivate()...
> -> grabs dev->queue_lock ...
>
> qdisc_reset()...{cbq,hfsc,htb,netem,tbf}_reset()...qdisc_watchdog_cancel()...
> -> hrtimer_cancel() - waiting for the qdisc_watchdog() to exit, while still
>         holding dev->queue_lock
>
> CPU#0
>
> dev_hard_start_xmit() returns ...
> -> wants to get dev->queue_lock(!)
>
> DEADLOCK!

The entire optimization is a bit questionable IMO, it moves potentially
large parts of NET_TX_SOFTIRQ work to TIMER_SOFTIRQ/HRTIMER_SOFTIRQ,
which kind of defeats the separation of them.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Ranko Zivojnovic <ranko@spidernet.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NETFILTER]: nf_conntrack: UDPLITE support
Patrick McHardy [Sun, 15 Jul 2007 03:48:44 +0000 (20:48 -0700)] 
[NETFILTER]: nf_conntrack: UDPLITE support

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NETFILTER]: nf_conntrack: mark protocols __read_mostly
Patrick McHardy [Sun, 15 Jul 2007 03:48:19 +0000 (20:48 -0700)] 
[NETFILTER]: nf_conntrack: mark protocols __read_mostly

Also remove two unnecessary EXPORT_SYMBOLs and move the
nf_conntrack_l3proto_ipv4 declaration to the correct file.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NETFILTER]: x_tables: add connlimit match
Jan Engelhardt [Sun, 15 Jul 2007 03:47:26 +0000 (20:47 -0700)] 
[NETFILTER]: x_tables: add connlimit match

ipt_connlimit has been sitting in POM-NG for a long time.
Here is a new shiny xt_connlimit with:

 * xtables'ified
 * will request the layer3 module
   (previously it hotdropped every packet when it was not loaded)
 * fixed: there was a deadlock in case of an OOM condition
 * support for any layer4 protocol (e.g. UDP/SCTP)
 * using jhash, as suggested by Eric Dumazet
 * ipv6 support

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NETFILTER]: Lower *tables printk severity
Patrick McHardy [Sun, 15 Jul 2007 03:46:15 +0000 (20:46 -0700)] 
[NETFILTER]: Lower *tables printk severity

Lower ip6tables, arptables and ebtables printk severity similar to
Dan Aloni's patch for iptables.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NETFILTER]: nf_conntrack: Don't track locally generated special ICMP error
Yasuyuki Kozakai [Sun, 15 Jul 2007 03:45:41 +0000 (20:45 -0700)] 
[NETFILTER]: nf_conntrack: Don't track locally generated special ICMP error

The conntrack assigned to locally generated ICMP error is usually the one
assigned to the original packet which has caused the error. But if
the original packet is handled as invalid by nf_conntrack, no conntrack
is assigned to the original packet. Then nf_ct_attach() cannot assign
any conntrack to the ICMP error packet. In that case the current
nf_conntrack_icmp assigns appropriate conntrack to it. But the current
code mistakes the direction of the packet. As a result, NAT code mistakes
the address to be mangled.

To fix the bug, this changes nf_conntrack_icmp not to assign conntrack
to such ICMP error. Actually no address is necessary to be mangled
in this case.

Spotted by Jordan Russell.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NETFILTER]: nf_conntrack: Introduces nf_ct_get_tuplepr and uses it
Yasuyuki Kozakai [Sun, 15 Jul 2007 03:45:14 +0000 (20:45 -0700)] 
[NETFILTER]: nf_conntrack: Introduces nf_ct_get_tuplepr and uses it

nf_ct_get_tuple() requires the offset to transport header and that bothers
callers such as icmp[v6] l4proto modules. This introduces new function
to simplify them.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NETFILTER]: nf_conntrack: make l3proto->prepare() generic and renames it
Yasuyuki Kozakai [Sun, 15 Jul 2007 03:44:50 +0000 (20:44 -0700)] 
[NETFILTER]: nf_conntrack: make l3proto->prepare() generic and renames it

The icmp[v6] l4proto modules parse headers in ICMP[v6] error to get tuple.
But they have to find the offset to transport protocol header before that.
Their processings are almost same as prepare() of l3proto modules.
This makes prepare() more generic to simplify icmp[v6] l4proto module
later.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NETFILTER]: nf_conntrack: Increment error count on parsing IPv4 header
Yasuyuki Kozakai [Sun, 15 Jul 2007 03:44:23 +0000 (20:44 -0700)] 
[NETFILTER]: nf_conntrack: Increment error count on parsing IPv4 header

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET]: Add ethtool support for NETIF_F_IPV6_CSUM devices.
Michael Chan [Sun, 15 Jul 2007 02:07:52 +0000 (19:07 -0700)] 
[NET]: Add ethtool support for NETIF_F_IPV6_CSUM devices.

Add ethtool utility function to set or clear IPV6_CSUM feature flag.
Modify tg3.c and bnx2.c to use this function when doing ethtool -K
to change tx checksum.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[AF_IUCV]: Add lock when updating accept_q
Ursula Braun [Sun, 15 Jul 2007 02:04:25 +0000 (19:04 -0700)] 
[AF_IUCV]: Add lock when updating accept_q

The accept_queue of an af_iucv socket will be corrupted, if
adding and deleting of entries in this queue occurs at the
same time (connect request from one client, while accept call
is processed for another client).
Solution: add locking when updating accept_q

Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Acked-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[AF_IUCV]: Avoid deadlock between iucv_path_connect and tasklet.
Ursula Braun [Sun, 15 Jul 2007 02:03:41 +0000 (19:03 -0700)] 
[AF_IUCV]: Avoid deadlock between iucv_path_connect and tasklet.

An iucv deadlock may occur, where one CPU is spinning on the
iucv_table_lock for iucv_tasklet_fn(), while another CPU is holding
the iucv_table_lock for an iucv_path_connect() and is waiting for
the first CPU in an smp_call_function.
Solution: replace spin_lock in iucv_tasklet_fn by spin_trylock and
reschedule tasklet in case of non-granted lock.

Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Acked-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[AF_IUCV]: Improve description of IUCV and AFIUCV configuration options.
Jennifer Hunt [Sun, 15 Jul 2007 02:03:00 +0000 (19:03 -0700)] 
[AF_IUCV]: Improve description of IUCV and AFIUCV configuration options.

Signed-off-by: Jennifer Hunt <jenhunt@us.ibm.com>
Signed-off-by: Ursula Braun >braunu@de.ibm.com>
Acked-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[INET_SOCK]: make net/ipv4/inet_timewait_sock.c:__inet_twsk_kill() static
Adrian Bunk [Sun, 15 Jul 2007 02:00:59 +0000 (19:00 -0700)] 
[INET_SOCK]: make net/ipv4/inet_timewait_sock.c:__inet_twsk_kill() static

This patch makes the needlessly global __inet_twsk_kill() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years agoMerge branch 'upstream-davem' of master.kernel.org:/pub/scm/linux/kernel/git/linville...
David S. Miller [Sun, 15 Jul 2007 01:58:49 +0000 (18:58 -0700)] 
Merge branch 'upstream-davem' of /linux/kernel/git/linville/wireless-2.6

16 years ago[TCP]: tcp probe add back ssthresh field
Stephen Hemminger [Sun, 15 Jul 2007 01:57:19 +0000 (18:57 -0700)] 
[TCP]: tcp probe add back ssthresh field

Sangtae noticed the ssthresh got missed.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[VLAN]: Fix memset length
Patrick McHardy [Sun, 15 Jul 2007 01:56:30 +0000 (18:56 -0700)] 
[VLAN]: Fix memset length

Fix sizeof(ETH_ALEN) Introduced by my rtnl_link patches.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET]: Add macvlan driver
Patrick McHardy [Sun, 15 Jul 2007 01:55:06 +0000 (18:55 -0700)] 
[NET]: Add macvlan driver

Add macvlan driver, which allows to create virtual ethernet devices
based on MAC address.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[VLAN]: Use multicast list synchronization helpers
Patrick McHardy [Sun, 15 Jul 2007 01:53:28 +0000 (18:53 -0700)] 
[VLAN]: Use multicast list synchronization helpers

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[VLAN]: Fix promiscous/allmulti synchronization races
Patrick McHardy [Sun, 15 Jul 2007 01:52:56 +0000 (18:52 -0700)] 
[VLAN]: Fix promiscous/allmulti synchronization races

The set_multicast_list function may be called without holding the rtnl
mutex, resulting in races when changing the underlying device's promiscous
and allmulti state. Use the change_rx_mode hook, which is always invoked
under the rtnl.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET]: dev_mcast: add multicast list synchronization helpers
Patrick McHardy [Sun, 15 Jul 2007 01:52:02 +0000 (18:52 -0700)] 
[NET]: dev_mcast: add multicast list synchronization helpers

The method drivers currently use to synchronize multicast lists is not
very pretty:

- walk the multicast list
- search each entry on a copy of the previous list
- if new add to lower device
- walk the copy of the previous list
- search each entry on the current list
- if removed delete from lower device
- copy entire list

This patch adds a new field to struct dev_addr_list to store the
synchronization state and adds two helper functions for synchronization
and cleanup.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[NET]: Add net_device change_rx_mode callback
Patrick McHardy [Sun, 15 Jul 2007 01:51:31 +0000 (18:51 -0700)] 
[NET]: Add net_device change_rx_mode callback

Currently the set_multicast_list (and set_rx_mode) callbacks are
responsible for configuring the device according to the IFF_PROMISC,
IFF_MULTICAST and IFF_ALLMULTI flags and the mc_list (and uc_list in
case of set_rx_mode).

These callbacks can be invoked from BH context without the rtnl_mutex
by dev_mc_add/dev_mc_delete, which makes reading the device flags and
promiscous/allmulti count racy. For real hardware drivers that just
commit all changes to the hardware this is not a real problem since
the stack guarantees to call them for every change, so at least the
final call will not race and commit the correct configuration to the
hardware.

For software devices that want to synchronize promiscous and multicast
state to an underlying device however this can cause corruption of the
underlying device's flags or promisc/allmulti counts.

When the software device is concurrently put in promiscous or allmulti
mode while set_multicast_list is invoked from bottem half context, the
device might synchronize the change to the underlying device without
holding the rtnl_mutex, which races with concurrent changes to the
underlying device.

Add a dev->change_rx_flags hook that is invoked when any of the flags
that affect rx filtering change (under the rtnl_mutex), which allows
drivers to perform synchronization immediately and only synchronize
the address lists in set_multicast_list/set_rx_mode.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago[RFKILL]: fix net/rfkill/rfkill-input.c bug on 64-bit systems
Ingo Molnar [Sun, 15 Jul 2007 01:50:15 +0000 (18:50 -0700)] 
[RFKILL]: fix net/rfkill/rfkill-input.c bug on 64-bit systems

Subject: [patch] net/input: fix net/rfkill/rfkill-input.c bug on 64-bit systems

this recent commit:

 commit cf4328cd949c2086091c62c5685f1580fe9b55e4
 Author: Ivo van Doorn <IvDoorn@gmail.com>
 Date:   Mon May 7 00:34:20 2007 -0700

     [NET]: rfkill: add support for input key to control wireless radio

added this 64-bit bug:

        ....
unsigned int flags;

  spin_lock_irqsave(&task->lock, flags);
        ....

irq 'flags' must be unsigned long, not unsigned int. The -rt tree has
strict checks about this on 64-bit so this triggered a build failure.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
16 years ago9p: fix a race condition bug in umount which caused a segfault
Eric Van Hensbergen [Fri, 13 Jul 2007 21:47:58 +0000 (16:47 -0500)] 
9p: fix a race condition bug in umount which caused a segfault

umounting partitions after heavy activity would sometimes trigger a
segmentation violation.  This fix appears to remove that problem.
Fix originally provided by Latchesar Ionkov.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
16 years ago9p: re-enable mount time debug option
Eric Van Hensbergen [Fri, 13 Jul 2007 18:05:21 +0000 (13:05 -0500)] 
9p: re-enable mount time debug option

During reorganization, the mount time debug option was removed in favor
of module-load-time parameters.  However, the mount time option is still
a useful for feature during debug and for user-fault isolation when the
module is compiled into the kernel.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
16 years ago9p: cache meta-data when cache=loose
Eric Van Hensbergen [Fri, 13 Jul 2007 18:01:27 +0000 (13:01 -0500)] 
9p: cache meta-data when cache=loose

This patch expands the impact of the loose cache mode to allow for cached
metadata increasing the performance of directory listings and other metadata
read operations.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
16 years agonet/9p: set error to EREMOTEIO if trans->write returns zero
Latchesar Ionkov [Wed, 11 Jul 2007 21:14:46 +0000 (15:14 -0600)] 
net/9p: set error to EREMOTEIO if trans->write returns zero

If trans->write returns 0, p9_write_work goes through the error path, but
sets the error code to zero.

This patch sets the error code to EREMOTEIO if trans->write returns zero
value.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
16 years agonet/9p: change net/9p module name to 9pnet
Latchesar Ionkov [Wed, 11 Jul 2007 21:13:54 +0000 (15:13 -0600)] 
net/9p: change net/9p module name to 9pnet

Change module name of net/9p module from 9p.ko to 9pnet.ko. fs/9p module
already uses 9p.ko name.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
16 years ago9p: Reorganization of 9p file system code
Latchesar Ionkov [Tue, 10 Jul 2007 22:57:28 +0000 (17:57 -0500)] 
9p: Reorganization of 9p file system code

This patchset moves non-filesystem interfaces of v9fs from fs/9p to net/9p.
It moves the transport, packet marshalling and connection layers to net/9p
leaving only the VFS related files in fs/9p.  This work is being done in
preparation for in-kernel 9p servers as well as alternate 9p clients (other
than VFS).

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
16 years ago[XFS] Fix lockdep annotations for xfs_lock_inodes
David Chinner [Fri, 29 Jun 2007 07:26:09 +0000 (17:26 +1000)] 
[XFS] Fix lockdep annotations for xfs_lock_inodes

SGI-PV: 967035
SGI-Modid: xfs-linux-melb:xfs-kern:29026a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[LIB]: export radix_tree_preload()
David Chinner [Sat, 14 Jul 2007 06:05:04 +0000 (16:05 +1000)] 
[LIB]: export radix_tree_preload()

XFS filestreams functionality uses radix trees and the preload
functions. XFS can be built as a module and hence we need
radix_tree_preload() exported. radix_tree_preload_end() is a
static inline, so it doesn't need exporting.

Signed-Off-By: Dave Chinner <dgc@sgi.com>
Signed-Off-By: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Fix XFS_IOC_FSBULKSTAT{,_SINGLE} & XFS_IOC_FSINUMBERS in compat mode
Michal Marek [Wed, 11 Jul 2007 01:10:19 +0000 (11:10 +1000)] 
[XFS] Fix XFS_IOC_FSBULKSTAT{,_SINGLE} & XFS_IOC_FSINUMBERS in compat mode

* 32bit struct xfs_fsop_bulkreq has different size and layout of
members, no matter the alignment. Move the code out of the #else
branch (why was it there in the first place?). Define _32 variants of
the ioctl constants.
* 32bit struct xfs_bstat is different because of time_t and on
i386 because of different padding. Make xfs_bulkstat_one() accept a
custom "output formatter" in the private_data argument which takes care
of the xfs_bulkstat_one_compat() that takes care of the different
layout in the compat case.
* i386 struct xfs_inogrp has different padding.
Add a similar "output formatter" mecanism to xfs_inumbers().

SGI-PV: 967354
SGI-Modid: xfs-linux-melb:xfs-kern:29102a

Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Compat ioctl handler for handle operations
Michal Marek [Wed, 11 Jul 2007 01:10:09 +0000 (11:10 +1000)] 
[XFS] Compat ioctl handler for handle operations

32bit struct xfs_fsop_handlereq has different size and offsets (due to
pointers). TODO: case XFS_IOC_{FSSETDM,ATTRLIST,ATTRMULTI}_BY_HANDLE still
not handled.

SGI-PV: 967354
SGI-Modid: xfs-linux-melb:xfs-kern:29101a

Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Compat ioctl handler for XFS_IOC_FSGEOMETRY_V1.
Michal Marek [Wed, 11 Jul 2007 01:09:57 +0000 (11:09 +1000)] 
[XFS] Compat ioctl handler for XFS_IOC_FSGEOMETRY_V1.

i386 struct xfs_fsop_geom_v1 has no padding after the last member, so the
size is different.

SGI-PV: 967354
SGI-Modid: xfs-linux-melb:xfs-kern:29100a

Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Clean up function name handling in tracing code
Eric Sandeen [Wed, 11 Jul 2007 01:09:47 +0000 (11:09 +1000)] 
[XFS] Clean up function name handling in tracing code

Remove the hardcoded "fnames" for tracing, and just embed them in tracing
macros via __FUNCTION__. Kills a lot of #ifdefs too.

SGI-PV: 967353
SGI-Modid: xfs-linux-melb:xfs-kern:29099a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Quota inode has no parent.
David Chinner [Wed, 11 Jul 2007 01:09:33 +0000 (11:09 +1000)] 
[XFS] Quota inode has no parent.

Avoid using a special "zero inode" as the parent of the quota inode as
this can confuse the filestreams code into thinking the quota inode has a
parent. We do not want the quota inode to follow filestreams allocation
rules, so pass a NULL as the parent inode and detect this condition when
doing stream associations.

SGI-PV: 964469
SGI-Modid: xfs-linux-melb:xfs-kern:29098a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Concurrent Multi-File Data Streams
David Chinner [Wed, 11 Jul 2007 01:09:12 +0000 (11:09 +1000)] 
[XFS] Concurrent Multi-File Data Streams

In media spaces, video is often stored in a frame-per-file format. When
dealing with uncompressed realtime HD video streams in this format, it is
crucial that files do not get fragmented and that multiple files a placed
contiguously on disk.

When multiple streams are being ingested and played out at the same time,
it is critical that the filesystem does not cross the streams and
interleave them together as this creates seek and readahead cache miss
latency and prevents both ingest and playout from meeting frame rate
targets.

This patch set creates a "stream of files" concept into the allocator to
place all the data from a single stream contiguously on disk so that RAID
array readahead can be used effectively. Each additional stream gets
placed in different allocation groups within the filesystem, thereby
ensuring that we don't cross any streams. When an AG fills up, we select a
new AG for the stream that is not in use.

The core of the functionality is the stream tracking - each inode that we
create in a directory needs to be associated with the directories' stream.
Hence every time we create a file, we look up the directories' stream
object and associate the new file with that object.

Once we have a stream object for a file, we use the AG that the stream
object point to for allocations. If we can't allocate in that AG (e.g. it
is full) we move the entire stream to another AG. Other inodes in the same
stream are moved to the new AG on their next allocation (i.e. lazy
update).

Stream objects are kept in a cache and hold a reference on the inode.
Hence the inode cannot be reclaimed while there is an outstanding stream
reference. This means that on unlink we need to remove the stream
association and we also need to flush all the associations on certain
events that want to reclaim all unreferenced inodes (e.g. filesystem
freeze).

SGI-PV: 964469
SGI-Modid: xfs-linux-melb:xfs-kern:29096a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Barry Naujok <bnaujok@sgi.com>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Vlad Apostolov <vapo@sgi.com>
16 years ago[XFS] Use uninitialized_var macro to stop warning about rtx
Andrew Morton [Thu, 28 Jun 2007 06:46:56 +0000 (16:46 +1000)] 
[XFS] Use uninitialized_var macro to stop warning about rtx

Appease gcc in regards to "warning: 'rtx' is used uninitialized in
this function".

SGI-PV: 907752
SGI-Modid: xfs-linux-melb:xfs-kern:29007a

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] XFS should not be looking at filp reference counts
Christoph Hellwig [Thu, 28 Jun 2007 06:46:47 +0000 (16:46 +1000)] 
[XFS] XFS should not be looking at filp reference counts

A check for file_count is always a bad idea. Linux has the ->release
method to deal with cleanups on last close and ->flush is only for the
very rare case where we want to perform an operation on every drop of a
reference to a file struct.

This patch gets rid of vop_close and surrounding code in favour of simply
doing the page flushing from ->release.

SGI-PV: 966562
SGI-Modid: xfs-linux-melb:xfs-kern:28952a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Use is_power_of_2 instead of open coding checks
Vignesh Babu [Thu, 28 Jun 2007 06:46:37 +0000 (16:46 +1000)] 
[XFS] Use is_power_of_2 instead of open coding checks

SGI-PV: 966576
SGI-Modid: xfs-linux-melb:xfs-kern:28950a

Signed-off-by: Vignesh Babu <vignesh.babu@wipro.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Reduce shouting by removing unnecessary macros from dir2 code.
Christoph Hellwig [Thu, 28 Jun 2007 06:43:50 +0000 (16:43 +1000)] 
[XFS] Reduce shouting by removing unnecessary macros from dir2 code.

SGI-PV: 966505
SGI-Modid: xfs-linux-melb:xfs-kern:28947a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Simplify XFS min/max macros.
David Chinner [Thu, 28 Jun 2007 06:43:39 +0000 (16:43 +1000)] 
[XFS] Simplify XFS min/max macros.

SGI-PV: 964547
SGI-Modid: xfs-linux-melb:xfs-kern:28945a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Nathan Scott <nscott@aconex.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Kill off xfs_count_bits
Eric Sandeen [Thu, 28 Jun 2007 06:43:30 +0000 (16:43 +1000)] 
[XFS] Kill off xfs_count_bits

xfs_count_bits is only called once, and is then compared to 0. IOW, what
it really wants to know is, is the bitmap empty. This can be done more
simply, certainly.

SGI-PV: 966503
SGI-Modid: xfs-linux-melb:xfs-kern:28944a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Cancel transactions on xfs_itruncate_start error.
Jesper Juhl [Thu, 28 Jun 2007 06:43:14 +0000 (16:43 +1000)] 
[XFS] Cancel transactions on xfs_itruncate_start error.

SGI-PV: 966502
SGI-Modid: xfs-linux-melb:xfs-kern:28943a

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Use do_div() on 64 bit types.
Christoph Hellwig [Mon, 18 Jun 2007 07:57:45 +0000 (17:57 +1000)] 
[XFS] Use do_div() on 64 bit types.

SGI-PV: 966145
SGI-Modid: xfs-linux-melb:xfs-kern:28889a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Fix remount,readonly path to flush everything correctly.
David Chinner [Mon, 18 Jun 2007 06:50:48 +0000 (16:50 +1000)] 
[XFS] Fix remount,readonly path to flush everything correctly.

The remount readonly path can fail to writeback properly because we still
have active transactions after calling xfs_quiesce_fs(). Further
investigation shows that this path is broken in the same ways that the xfs
freeze path was broken so fix it the same way.

SGI-PV: 964464
SGI-Modid: xfs-linux-melb:xfs-kern:28869a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Cleanup inode extent size hint extraction
David Chinner [Mon, 18 Jun 2007 06:50:37 +0000 (16:50 +1000)] 
[XFS] Cleanup inode extent size hint extraction

SGI-PV: 966004
SGI-Modid: xfs-linux-melb:xfs-kern:28866a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Prevent ENOSPC from aborting transactions that need to succeed
David Chinner [Mon, 18 Jun 2007 06:50:27 +0000 (16:50 +1000)] 
[XFS] Prevent ENOSPC from aborting transactions that need to succeed

During delayed allocation extent conversion or unwritten extent
conversion, we need to reserve some blocks for transactions reservations.
We need to reserve these blocks in case a btree split occurs and we need
to allocate some blocks.

Unfortunately, we've only ever reserved the number of data blocks we are
allocating, so in both the unwritten and delalloc case we can get ENOSPC
to the transaction reservation. This is bad because in both cases we
cannot report the failure to the writing application.

The fix is two-fold:

1 - leverage the reserved block infrastructure XFS already
has to reserve a small pool of blocks by default to allow
specially marked transactions to dip into when we are at
ENOSPC.
Default setting is min(5%, 1024 blocks).

2 - convert critical transaction reservations to be allowed
to dip into this pool. Spots changed are delalloc
conversion, unwritten extent conversion and growing a
filesystem at ENOSPC.
This also allows growing the filesytsem to succeed at ENOSPC.

SGI-PV: 964468
SGI-Modid: xfs-linux-melb:xfs-kern:28865a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Prevent deadlock when flushing inodes on unmount
David Chinner [Mon, 18 Jun 2007 06:50:17 +0000 (16:50 +1000)] 
[XFS] Prevent deadlock when flushing inodes on unmount

When we are unmounting the filesystem, we flush all the inodes to disk.
Unfortunately, if we have an inode cluster that has just been freed and
marked stale sitting in an incore log buffer (i.e. hasn't been flushed to
disk), it will be holding all the flush locks on the inodes in that
cluster.

xfs_iflush_all() which is called during unmount walks all the inodes
trying to reclaim them, and it doing so calls xfs_finish_reclaim() on each
inode. If the inode is dirty, if grabs the flush lock and flushes it.
Unfortunately, find dirty inodes that already have their flush lock held
and so we sleep.

At this point in the unmount process, we are running single-threaded.
There is nothing more that can push on the log to force the transaction
holding the inode flush locks to disk and hence we deadlock.

The fix is to issue a log force before flushing the inodes on unmount so
that all the flush locks will be released before we start flushing the
inodes.

SGI-PV: 964538
SGI-Modid: xfs-linux-melb:xfs-kern:28862a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Log the agf_length change in xfs_growfs_data_private().
Tim Shimmin [Mon, 18 Jun 2007 06:50:08 +0000 (16:50 +1000)] 
[XFS] Log the agf_length change in xfs_growfs_data_private().

SGI-PV: 963528
SGI-Modid: xfs-linux-melb:xfs-kern:28856a

Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
16 years ago[XFS] Map unwritten extents correctly for I/o completion processing
David Chinner [Mon, 18 Jun 2007 06:49:58 +0000 (16:49 +1000)] 
[XFS] Map unwritten extents correctly for I/o completion processing

If we have multiple unwritten extents within a single page, we fail to
tell the I/o completion construction handlers we need a new handle for the
second and subsequent blocks in the page. While we still issue the I/O
correctly, we do not have the correct ranges recorded in the ioend
structures and hence when we go to convert the unwritten extents we screw
it up.

Make sure we start a new ioend every time the mapping changes so that we
convert the correct ranges on I/O completion.

SGI-PV: 964647
SGI-Modid: xfs-linux-melb:xfs-kern:28797a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Apply transaction delta counts atomically to incore counters
David Chinner [Mon, 18 Jun 2007 06:49:44 +0000 (16:49 +1000)] 
[XFS] Apply transaction delta counts atomically to incore counters

With the per-cpu superblock counters, batch updates are no longer atomic
across the entire batch of changes. This is not an issue if each
individual change in the batch is applied atomically. Unfortunately, free
block count changes are not applied atomically, and they are applied in a
manner guaranteed to cause problems.

Essentially, the free block count reservation that the transaction took
initially is returned to the in core counters before a second delta takes
away what is used. because these two operations are not atomic, we can
race with another thread that can use the returned transaction reservation
before the transaction takes the space away again and we can then get
ENOSPC being reported in a spot where we don't have an ENOSPC condition,
nor should we ever see one there.

Fix it up by rolling the two deltas into the one so it can be applied
safely (i.e. atomically) to the incore counters.

SGI-PV: 964465
SGI-Modid: xfs-linux-melb:xfs-kern:28796a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Handle null returned from xfs_vtoi() in xfs_setfilesize().
David Chinner [Tue, 5 Jun 2007 06:24:44 +0000 (16:24 +1000)] 
[XFS] Handle null returned from xfs_vtoi() in xfs_setfilesize().

SGI-PV: 965636
SGI-Modid: xfs-linux-melb:xfs-kern:28777a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Block on unwritten extent conversion during synchronous direct I/O.
David Chinner [Tue, 5 Jun 2007 06:24:36 +0000 (16:24 +1000)] 
[XFS] Block on unwritten extent conversion during synchronous direct I/O.

Currently we do not wait on extent conversion to occur, and hence we can
return to userspace from a synchronous direct I/O write without having
completed all the actions in the write. Hence a read after the write may
see zeroes (unwritten extent) rather than the data that was written.

Block the I/O completion by triggering a synchronous workqueue flush to
ensure that the conversion has occurred before we return to userspace.

SGI-PV: 964092
SGI-Modid: xfs-linux-melb:xfs-kern:28775a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Flush the block device before closing it on unmount.
David Chinner [Tue, 5 Jun 2007 06:24:27 +0000 (16:24 +1000)] 
[XFS] Flush the block device before closing it on unmount.

SGI-PV: 965630
SGI-Modid: xfs-linux-melb:xfs-kern:28774a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] xfs_bmapi fails to update the previous extent pointer
David Chinner [Tue, 5 Jun 2007 06:24:15 +0000 (16:24 +1000)] 
[XFS] xfs_bmapi fails to update the previous extent pointer

When processing multiple extent maps, xfs_bmapi needs to keep track of the
extent behind the one it is currently working on to be able to trim extent
ranges correctly. Failing to update the previous pointer can result in
corrupted extent lists in memory and this will result in panics or assert
failures.

Update the previous pointer correctly when we move to the next extent to
process.

SGI-PV: 965631
SGI-Modid: xfs-linux-melb:xfs-kern:28773a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Fix the transaction flags to make lazy superblock counters work.
David Chinner [Thu, 24 May 2007 05:26:51 +0000 (15:26 +1000)] 
[XFS] Fix the transaction flags to make lazy superblock counters work.

SGI-PV: 964999
SGI-Modid: xfs-linux-melb:xfs-kern:28653a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Lazy Superblock Counters
David Chinner [Thu, 24 May 2007 05:26:31 +0000 (15:26 +1000)] 
[XFS] Lazy Superblock Counters

When we have a couple of hundred transactions on the fly at once, they all
typically modify the on disk superblock in some way.
create/unclink/mkdir/rmdir modify inode counts, allocation/freeing modify
free block counts.

When these counts are modified in a transaction, they must eventually lock
the superblock buffer and apply the mods. The buffer then remains locked
until the transaction is committed into the incore log buffer. The result
of this is that with enough transactions on the fly the incore superblock
buffer becomes a bottleneck.

The result of contention on the incore superblock buffer is that
transaction rates fall - the more pressure that is put on the superblock
buffer, the slower things go.

The key to removing the contention is to not require the superblock fields
in question to be locked. We do that by not marking the superblock dirty
in the transaction. IOWs, we modify the incore superblock but do not
modify the cached superblock buffer. In short, we do not log superblock
modifications to critical fields in the superblock on every transaction.
In fact we only do it just before we write the superblock to disk every
sync period or just before unmount.

This creates an interesting problem - if we don't log or write out the
fields in every transaction, then how do the values get recovered after a
crash? the answer is simple - we keep enough duplicate, logged information
in other structures that we can reconstruct the correct count after log
recovery has been performed.

It is the AGF and AGI structures that contain the duplicate information;
after recovery, we walk every AGI and AGF and sum their individual
counters to get the correct value, and we do a transaction into the log to
correct them. An optimisation of this is that if we have a clean unmount
record, we know the value in the superblock is correct, so we can avoid
the summation walk under normal conditions and so mount/recovery times do
not change under normal operation.

One wrinkle that was discovered during development was that the blocks
used in the freespace btrees are never accounted for in the AGF counters.
This was once a valid optimisation to make; when the filesystem is full,
the free space btrees are empty and consume no space. Hence when it
matters, the "accounting" is correct. But that means the when we do the
AGF summations, we would not have a correct count and xfs_check would
complain. Hence a new counter was added to track the number of blocks used
by the free space btrees. This is an *on-disk format change*.

As a result of this, lazy superblock counters are a mkfs option and at the
moment on linux there is no way to convert an old filesystem. This is
possible - xfs_db can be used to twiddle the right bits and then
xfs_repair will do the format conversion for you. Similarly, you can
convert backwards as well. At some point we'll add functionality to
xfs_admin to do the bit twiddling easily....

SGI-PV: 964999
SGI-Modid: xfs-linux-melb:xfs-kern:28652a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Use generic shrinker interfaces in XFS.
Andrew Morton [Thu, 24 May 2007 05:25:42 +0000 (15:25 +1000)] 
[XFS] Use generic shrinker interfaces in XFS.

SGI-PV: 964986
SGI-Modid: xfs-linux-melb:xfs-kern:28642a

Signed-Off-By: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Make hole punching at EOF atomic.
David Chinner [Thu, 24 May 2007 05:22:19 +0000 (15:22 +1000)] 
[XFS] Make hole punching at EOF atomic.

If hole punching at EOF is done as two steps (i.e. truncate then extend)
the file is in a transient state between the two steps where an
application can see the incorrect file size. Punching a hole to EOF needs
to be treated in teh same way as all other hole punching cases so that the
file size is never seen to change.

SGI-PV: 962012
SGI-Modid: xfs-linux-melb:xfs-kern:28641a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Fix vmalloc leak on mount/unmount.
David Chinner [Thu, 24 May 2007 05:21:57 +0000 (15:21 +1000)] 
[XFS] Fix vmalloc leak on mount/unmount.

When setting the length of the iclogbuf to write out we should just be
changing the desired byte count rather completely reassociating the buffer
memory with the buffer. Reassociating the buffer memory changes the
apparent length of the buffer and hence when we free the buffer, we don't
free all the vmap()d space we originally allocated.

SGI-PV: 964983
SGI-Modid: xfs-linux-melb:xfs-kern:28640a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Fix double free in xfs_buf_get_noaddr error handling path
Christoph Hellwig [Thu, 24 May 2007 05:21:11 +0000 (15:21 +1000)] 
[XFS] Fix double free in xfs_buf_get_noaddr error handling path

SGI-PV: 964983
SGI-Modid: xfs-linux-melb:xfs-kern:28639a

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Fix use-after-free during log unmount.
David Chinner [Mon, 14 May 2007 08:24:16 +0000 (18:24 +1000)] 
[XFS] Fix use-after-free during log unmount.

Don't reference the log buffer after running the callbacks as the callback
can trigger the log buffers to be freed during unmount.

SGI-PV: 964545
SGI-Modid: xfs-linux-melb:xfs-kern:28567a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Sleeping with the ilock waiting for I/O completion is Bad.
David Chinner [Mon, 14 May 2007 08:24:09 +0000 (18:24 +1000)] 
[XFS] Sleeping with the ilock waiting for I/O completion is Bad.

Recent fixes to the filesystem freezing code introduced a vn_iowait call
in the middle of the sync code. Unfortunately, at the point where this
call was added we are holding the ilock. The ilock is needed by I/O
completion for unwritten extent conversion and now updating the file size.
Hence I/o cannot complete if we hold the ilock while waiting for I/O
completion.

Fix up the bug and clean the code up around it.

SGI-PV: 963674
SGI-Modid: xfs-linux-melb:xfs-kern:28566a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Don't grow filesystems past the size they can index.
Nathan Scott [Mon, 14 May 2007 08:24:02 +0000 (18:24 +1000)] 
[XFS] Don't grow filesystems past the size they can index.

When growing a filesystem we don't check to see if the new size overflows
the page cache index range, so we can do silly things like grow a
filesystem page 16TB on a 32bit. Check new filesystem sizes against the
limits the kernel can support.

SGI-PV: 957886
SGI-Modid: xfs-linux-melb:xfs-kern:28563a

Signed-Off-By: Nathan Scott <nscott@aconex.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years ago[XFS] Only use refcounted pages for I/O
Christoph Hellwig [Mon, 14 May 2007 08:23:50 +0000 (18:23 +1000)] 
[XFS] Only use refcounted pages for I/O

Many block drivers (aoe, iscsi) really want refcountable pages in bios,
which is what almost everyone send down. XFS unfortunately has a few
places where it sends down buffers that may come from kmalloc, which
breaks them.

Fix the places that use kmalloc()d buffers.

SGI-PV: 964546
SGI-Modid: xfs-linux-melb:xfs-kern:28562a

Signed-Off-By: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
16 years agoRevert "SELinux: use SECINITSID_NETMSG instead of SECINITSID_UNLABELED for NetLabel"
Linus Torvalds [Fri, 13 Jul 2007 23:53:18 +0000 (16:53 -0700)] 
Revert "SELinux: use SECINITSID_NETMSG instead of SECINITSID_UNLABELED for NetLabel"

This reverts commit 9faf65fb6ee2b4e08325ba2d69e5ccf0c46453d0.

It bit people like Michal Piotrowski:

  "My system is too secure, I can not login :)"

because it changed how CONFIG_NETLABEL worked, and broke older SElinux
policies.

As a result, quoth James Morris:

  "Can you please revert this patch?

   We thought it only affected people running MLS, but it will affect others.

   Sorry for the hassle."

Cc: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Cc: Paul Moore <paul.moore@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoMerge git://git.linux-nfs.org/pub/linux/nfs-2.6
Linus Torvalds [Fri, 13 Jul 2007 23:46:18 +0000 (16:46 -0700)] 
Merge git://git.linux-nfs.org/pub/linux/nfs-2.6

* git://git.linux-nfs.org/pub/linux/nfs-2.6: (122 commits)
  sunrpc: drop BKL around wrap and unwrap
  NFSv4: Make sure unlock is really an unlock when cancelling a lock
  NLM: fix source address of callback to client
  SUNRPC client: add interface for binding to a local address
  SUNRPC server: record the destination address of a request
  SUNRPC: cleanup transport creation argument passing
  NFSv4: Make the NFS state model work with the nosharedcache mount option
  NFS: Error when mounting the same filesystem with different options
  NFS: Add the mount option "nosharecache"
  NFS: Add support for mounting NFSv4 file systems with string options
  NFS: Add final pieces to support in-kernel mount option parsing
  NFS: Introduce generic mount client API
  NFS: Add enums and match tables for mount option parsing
  NFS: Improve debugging output in NFS in-kernel mount client
  NFS: Clean up in-kernel NFS mount
  NFS: Remake nfsroot_mount as a permanent part of NFS client
  SUNRPC: Add a convenient default for the hostname when calling rpc_create()
  SUNRPC: Rename rpcb_getport to be consistent with new rpcb_getport_sync name
  SUNRPC: Rename rpcb_getport_external routine
  SUNRPC: Allow rpcbind requests to be interrupted by a signal.
  ...

16 years agonfsd: fix nfsd_vfs_read() splice actor setup
Jens Axboe [Fri, 13 Jul 2007 20:42:20 +0000 (22:42 +0200)] 
nfsd: fix nfsd_vfs_read() splice actor setup

When nfsd was transitioned to use splice instead of sendfile() for data
transfers, a line setting the page index was lost. Restore it, so that
nfsd is functional when that path is used.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoCFS: Fix missing digit off in wmult table
Thomas Gleixner [Fri, 13 Jul 2007 19:43:55 +0000 (21:43 +0200)] 
CFS: Fix missing digit off in wmult table

Roman Zippel noticed another inconsistency of the wmult table.

wmult[16] has a missing digit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq
Linus Torvalds [Fri, 13 Jul 2007 23:06:30 +0000 (16:06 -0700)] 
Merge /pub/scm/linux/kernel/git/davej/cpufreq

* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Fix typos in powernow-k8 printk's.
  [CPUFREQ] Restore previously used governor on a hot-replugged CPU
  [CPUFREQ] bugfix cpufreq in combination with performance governor
  [CPUFREQ] powernow-k8 compile fix.
  [CPUFREQ] the overdue removal of X86_SPEEDSTEP_CENTRINO_ACPI
  [CPUFREQ] Longhaul - Option to disable ACPI C3 support

Fixed up arch/i386/kernel/cpu/cpufreq/powernow-k8.c due to revert that
got fixed differently in the cpufreq branch.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16 years agoMerge branch 'ioat-md-accel-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop
Linus Torvalds [Fri, 13 Jul 2007 17:52:27 +0000 (10:52 -0700)] 
Merge branch 'ioat-md-accel-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop

* 'ioat-md-accel-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop: (28 commits)
  ioatdma: add the unisys "i/oat" pci vendor/device id
  ARM: Add drivers/dma to arch/arm/Kconfig
  iop3xx: surface the iop3xx DMA and AAU units to the iop-adma driver
  iop13xx: surface the iop13xx adma units to the iop-adma driver
  dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines
  md: remove raid5 compute_block and compute_parity5
  md: handle_stripe5 - request io processing in raid5_run_ops
  md: handle_stripe5 - add request/completion logic for async expand ops
  md: handle_stripe5 - add request/completion logic for async read ops
  md: handle_stripe5 - add request/completion logic for async check ops
  md: handle_stripe5 - add request/completion logic for async compute ops
  md: handle_stripe5 - add request/completion logic for async write ops
  md: common infrastructure for running operations with raid5_run_ops
  md: raid5_run_ops - run stripe operations outside sh->lock
  raid5: replace custom debug PRINTKs with standard pr_debug
  raid5: refactor handle_stripe5 and handle_stripe6 (v3)
  async_tx: add the async_tx api
  xor: make 'xor_blocks' a library routine for use with async_tx
  dmaengine: make clients responsible for managing channels
  dmaengine: refactor dmaengine around dma_async_tx_descriptor
  ...