Jan Engelhardt [Thu, 18 Oct 2007 10:04:34 +0000 (03:04 -0700)]
Remove CONFIG_VT_UNICODE
Since default_utf8 is already a sysfs attribute, having an extra
CONFIG_VT_UNICODE compile-time option is redundant, since sysfs attributes can
be set at boot and run time.
Also let Linux VCs default to UTF-8 (as per the discussion at
http://lkml.org/lkml/2007/9/6/99).
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Cc: Bill Nottingham <notting@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by@vergenet.net":Simon [Thu, 18 Oct 2007 10:04:33 +0000 (03:04 -0700)]
Kexec: Update URL in MAINTAINERS file
I'm not sure that the new URL satifies the requirement of status/info, but
it does at least as good a job as the old URL, and contains current
releases of kexec-tools, rather than somewhat ancient versions.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Karsten Keil [Thu, 18 Oct 2007 10:04:32 +0000 (03:04 -0700)]
i4l: Fix random hard freeze with AVM c4 card
The patch
- Includes the call to capilib_data_b3_req in the spinlock. This routine
in turn calls the offending mq_enqueue routine that triggered the
freeze if not locked. This should also fix other indicators of
incosistent capilib_msgidqueue list, that trigger messages like:
Oct 5 03:05:57 BERL0 kernel: kcapi: msgid 3019 ncci 0x30301 not on queue
that we saw several times a day (usually several in a row).
- Fixes all occurrences of c4_dispatch_tx to be called with active
spinlock, there were some instances where no lock was active. Mostly
these are in very infrequently called routines, so the additional
performance penalty is minimal.
Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Rainer Brestan <rainer.brestan@frequentis.com>
Signed-off-by: Ralf Schlatterbeck <rsc@runtux.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Karsten Keil [Thu, 18 Oct 2007 10:04:31 +0000 (03:04 -0700)]
i4l: fix random freezes with AVM B1 drivers
This fix the same issue which was debbuged for the C4 controller for the B1
versions.
The capilib_ function modify or traverse a linked list without locking.
This patch extends the existing locking to the calls of these function to
prevent access to a list which is in the middle of a modification.
Signed-off-by: Karsten Keil <kkeil@suse.de>
C: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ralf Baechle [Thu, 18 Oct 2007 10:04:30 +0000 (03:04 -0700)]
au1100fb: fix modpost warnings
MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x170be8): Section mismatch: reference to .init.data:au1100fb_fix (between 'au1100fb_drv_probe' and 'read_null')
WARNING: vmlinux.o(.text+0x170dc4): Section mismatch: reference to .init.data:au1100fb_var (between 'au1100fb_drv_probe' and 'read_null')
WARNING: vmlinux.o(.text+0x170dd0): Section mismatch: reference to .init.data:au1100fb_fix (between 'au1100fb_drv_probe' and 'read_null')
WARNING: vmlinux.o(.text+0x170de0): Section mismatch: reference to .init.data:au1100fb_var (between 'au1100fb_drv_probe' and 'read_null')
WARNING: vmlinux.o(.text+0x170e70): Section mismatch: reference to .init.data:au1100fb_var (between 'au1100fb_drv_probe' and 'read_null')
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ralf Baechle [Thu, 18 Oct 2007 10:04:29 +0000 (03:04 -0700)]
netport_con.c: fix build errors and warnings
Fix build broken by
accaa24c492f1aa3b9c37226d868dc59c3007531:
CC drivers/video/console/newport_con.o
drivers/video/console/newport_con.c: In function 'newport_show_logo':
drivers/video/console/newport_con.c:111: error: assignment of read-only location
drivers/video/console/newport_con.c:111: warning: assignment makes integer from pointer without a cast
drivers/video/console/newport_con.c:112: error: assignment of read-only location
drivers/video/console/newport_con.c:112: warning: assignment makes integer from pointer without a cast
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Blanchard [Mon, 15 Oct 2007 21:13:56 +0000 (16:13 -0500)]
hrtimer: hook compat_sys_nanosleep up to high res timer code
Now we have high res timers on ppc64 I thought Id test them. It turns
out compat_sys_nanosleep hasnt been converted to the hrtimer code and so
is limited to HZ resolution.
The follow patch converts compat_sys_nanosleep to use high res timers.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Anton Blanchard [Mon, 15 Oct 2007 21:06:04 +0000 (16:06 -0500)]
hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier
Pull the copy_to_user out of hrtimer_nanosleep and into the callers
(common_nsleep, sys_nanosleep) in preparation for converting
compat_sys_nanosleep to use hrtimers.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jeff Garzik [Thu, 18 Oct 2007 20:21:18 +0000 (16:21 -0400)]
[libata] kill ata_sg_is_last()
Short term, this works around a bug introduced by early sg-chaining
work.
Long term, removing this function eliminates a branch from a hot
path loop in each scatter/gather table build. Also, as this code
demonstrates, we don't need to _track_ the end of the s/g list, as
long as we mark it in some way. And doing so programatically is nice.
So its a useful cleanup, regardless of its short term effects.
Based conceptually on a quick patch by Jens Axboe.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Ken Chen [Thu, 18 Oct 2007 19:32:56 +0000 (21:32 +0200)]
sched: reduce schedstat variable overhead a bit
schedstat is useful in investigating CPU scheduler behavior. Ideally,
I think it is beneficial to have it on all the time. However, the
cost of turning it on in production system is quite high, largely due
to number of events it collects and also due to its large memory
footprint.
Most of the fields probably don't need to be full 64-bit on 64-bit
arch. Rolling over 4 billion events will most like take a long time
and user space tool can be made to accommodate that. I'm proposing
kernel to cut back most of variable width on 64-bit system. (note,
the following patch doesn't affect 32-bit system).
Signed-off-by: Ken Chen <kenchen@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Thu, 18 Oct 2007 19:32:56 +0000 (21:32 +0200)]
sched: add KERN_CONT annotation
printk: add the KERN_CONT annotation (which is empty string but via
which checkpatch.pl can notice that the lacking KERN_ level is fine).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Thu, 18 Oct 2007 19:32:55 +0000 (21:32 +0200)]
sched: cleanup, make struct rq comments more consistent
cleanup, make struct rq comments more consistent.
found via scripts/checkpatch.pl.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Thu, 18 Oct 2007 19:32:55 +0000 (21:32 +0200)]
sched: cleanup, fix spacing
cleanup: fix sysctl_sched_features initialization spacing, and
fix sd_alloc_ctl_cpu_table() prototype spacing.
found via scripts/checkpatch.pl.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andi Kleen [Thu, 18 Oct 2007 19:32:55 +0000 (21:32 +0200)]
sched: fix return value of wait_for_completion_interruptible()
The recent wait_for_completion() cleanups:
commit
8cbbe86dfcfd68ad69916164bdc838d9e09adca8
Author: Andi Kleen <ak@suse.de>
Date: Mon Oct 15 17:00:14 2007 +0200
sched: cleanup: refactor common code of sleep_on / wait_for_completion
Refactor common code of sleep_on / wait_for_completion
broke the return value of wait_for_completion_interruptible().
Previously it returned 0 on success, now -1. Fix that.
Problem found by Geert Uytterhoeven.
[ mingo: fixed whitespace damage ]
Reported-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ralf Baechle [Thu, 18 Oct 2007 16:48:11 +0000 (17:48 +0100)]
[MIPS] time: Move R4000 clockevent device code to separate configurable file
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Thu, 18 Oct 2007 15:00:19 +0000 (16:00 +0100)]
[MIPS] time: Delete dead cycles_per_jiffy, mips_timer_ack and null_timer_ack
cycles_per_jiffy was only ever getting assigned and the function pointer
not being called anymore and mips_timer_ack had gotten similarly stale. I
leave the remaining assignments unfixed as a lighthouse pointing platform
maintainers to what needs a rewrite. These changes make null_timer_ack()
unreferenced, so delete that too.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Thu, 18 Oct 2007 13:00:16 +0000 (14:00 +0100)]
[MIPS] IP32: Retire use of plat_timer_setup.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Thu, 18 Oct 2007 12:51:15 +0000 (13:51 +0100)]
[MIPS] Jazz: Retire use of plat_timer_setup.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Thu, 18 Oct 2007 12:34:12 +0000 (13:34 +0100)]
[MIPS] IP27: Convert to clock_event_device.
This separates the tick timer stuff from the generic MIPS time.c.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Thu, 18 Oct 2007 00:10:12 +0000 (01:10 +0100)]
[MIPS] JMR3927: Convert to clock_event_device.
This separate the tick timer stuff from the generic MIPS time.c.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Thomas Bogendoerfer [Thu, 13 Sep 2007 18:23:48 +0000 (20:23 +0200)]
[MIPS] Always do the ARC64_TWIDDLE_PC thing.
Always jump to the place where the kernel is linked to. This helps where
the bootloaders/proms ignores the start address inside the ELF header.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Pavel Emelyanov [Thu, 18 Oct 2007 12:38:48 +0000 (05:38 -0700)]
[IPV6]: Fix again the fl6_sock_lookup() fixed locking
YOSHIFUJI fairly pointed out, that the users increment should
be done under the ip6_sk_fl_lock not to give IPV6_FL_A_PUT a
chance to put this count to zero and release the flowlabel.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jozsef Kadlecsik [Thu, 18 Oct 2007 12:20:12 +0000 (05:20 -0700)]
[NETFILTER]: nf_conntrack_tcp: fix connection reopening fix
If one side aborts an established connection, the entry still lingers
for 10s in conntrack for the late packets. Allow to open up the
connection again for the party which sent the RST packet.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 12:18:56 +0000 (05:18 -0700)]
[IPV6]: Fix race in ipv6_flowlabel_opt() when inserting two labels
In the IPV6_FL_A_GET case the hash is checked for flowlabels
with the given label. If it is not found, the lock, protecting
the hash, is dropped to be re-get for writing. After this a
newly allocated entry is inserted, but no checks are performed
to catch a classical SMP race, when the conflicting label may
be inserted on another cpu.
Use the (currently unused) return value from fl_intern() to
return the conflicting entry (if found) and re-check, whether
we can reuse it (IPV6_FL_F_EXCL) or return -EEXISTS.
Also add the comment, about why not re-lookup the current
sock for conflicting flowlabel entry.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 12:15:57 +0000 (05:15 -0700)]
[IPV6]: Lost locking in fl6_sock_lookup
This routine scans the ipv6_fl_list whose update is
protected with the socket lock and the ip6_sk_fl_lock.
Since the socket lock is not taken in the lookup, use
the other one.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 12:14:58 +0000 (05:14 -0700)]
[IPV6]: Lost locking when inserting a flowlabel in ipv6_fl_list
The new flowlabels should be inserted into the sock list
under the ip6_sk_fl_lock. This was lost in one place.
This list is naturally protected with the socket lock, but
the fl6_sock_lookup() is called without it, so another
protection is required.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li Zefan [Thu, 18 Oct 2007 12:12:21 +0000 (05:12 -0700)]
[NETFILTER]: xt_sctp: fix mistake to pass a pointer where array is required
Macros like SCTP_CHUNKMAP_XXX(chukmap) require chukmap to be an array,
but match_packet() passes a pointer to these macros. Also remove the
ELEMCOUNT macro and fix a bug in SCTP_CHUNKMAP_COPY.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 18 Oct 2007 12:09:28 +0000 (05:09 -0700)]
[NET]: Fix OOPS due to missing check in dev_parse_header().
[ This is kernel bugzilla 9174 "linux-2.6.23-git11 kernel panic" ]
The device in question is an IPv6-over-IPv4 tunnel, which doesn't have
any header_ops, so the crash happens in dev_parse_header when
dereferencing them.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Thu, 18 Oct 2007 12:07:57 +0000 (05:07 -0700)]
[TCP]: Remove lost_retrans zero seqno special cases
Both high-sack detection and new lowest seq variables have
unnecessary zero special case which are now removed by setting
safe initial seqnos.
This also fixes problem which caused zero received_upto being
passed to tcp_mark_lost_retrans which confused after relations
within the marker loop causing incorrect TCPCB_SACKED_RETRANS
clearing. The problem was noticed because of a performance
report from TAKANO Ryousei <takano@axe-inc.co.jp>.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Ryousei Takano <takano-ryousei@aist.go.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wim Van Sebroeck [Fri, 17 Aug 2007 08:38:02 +0000 (08:38 +0000)]
mv watchdog tree under drivers
move watchdog tree from drivers/char/watchdog to drivers/watchdog.
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Jeff Garzik [Thu, 18 Oct 2007 06:26:43 +0000 (23:26 -0700)]
[NET]: fix carrier-on bug?
While looking at a net driver with the following construct,
if (!netif_carrier_ok(dev))
netif_carrier_on(dev);
it stuck me that the netif_carrier_ok() check was redundant, since
netif_carrier_on() checks bit __LINK_STATE_NOCARRIER anyway. This is
the same reason why netif_queue_stopped() need not be called prior to
netif_wake_queue().
This is true, but there is however an unwanted side effect from assuming
that netif_carrier_on() can be called multiple times: it touches the
watchdog, regardless of pre-existing carrier state.
The fix: move watchdog-up inside the bit-cleared code path.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Thu, 18 Oct 2007 04:37:22 +0000 (21:37 -0700)]
[NET]: Fix uninitialised variable in ip_frag_reasm()
Fix uninitialised variable in ip_frag_reasm(). err should be set to
-ENOMEM if the initial call of skb_clone() fails.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:35:51 +0000 (21:35 -0700)]
[IPSEC]: Rename mode to outer_mode and add inner_mode
This patch adds a new field to xfrm states called inner_mode. The existing
mode object is renamed to outer_mode.
This is the first part of an attempt to fix inter-family transforms. As it
is we always use the outer family when determining which mode to use. As a
result we may end up shoving IPv4 packets into netfilter6 and vice versa.
What we really want is to use the inner family for the first part of outbound
processing and the outer family for the second part. For inbound processing
we'd use the opposite pairing.
I've also added a check to prevent silly combinations such as transport mode
with inter-family transforms.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:35:15 +0000 (21:35 -0700)]
[IPSEC]: Disallow combinations of RO and AH/ESP/IPCOMP
Combining RO and AH/ESP/IPCOMP does not make sense. So this patch adds a
check in the state initialisation function to prevent this.
This allows us to safely remove the mode input function of RO since it
can never be called anymore. Indeed, if somehow it does get called we'll
know about it through an OOPS instead of it slipping past silently.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:34:46 +0000 (21:34 -0700)]
[IPSEC]: Use the top IPv4 route's peer instead of the bottom
For IPv4 we were using the bottom route's peer instead of the top one.
This is wrong because the peer is only used by TCP to keep track of
information about the TCP destination address which certainly does not
live in the bottom route.
This patch fixes that which allows us to get rid of the family check
since the bottom route could be IPv6 while the top one must always
be IPv4.
I've also changed the other fields which are IPv4-specific to get the
info from the top route instead of potentially bogus data from the
bottom route.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:33:12 +0000 (21:33 -0700)]
[IPSEC]: Store afinfo pointer in xfrm_mode
It is convenient to have a pointer from xfrm_state to address-specific
functions such as the output function for a family. Currently the
address-specific policy code calls out to the xfrm state code to get
those pointers when we could get it in an easier way via the state
itself.
This patch adds an xfrm_state_afinfo to xfrm_mode (since they're
address-specific) and changes the policy code to use it. I've also
added an owner field to do reference counting on the module providing
the afinfo even though it isn't strictly necessary today since IPv6
can't be unloaded yet.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:31:50 +0000 (21:31 -0700)]
[IPSEC]: Add missing BEET checks
Currently BEET mode does not reinject the packet back into the stack
like tunnel mode does. Since BEET should behave just like tunnel mode
this is incorrect.
This patch fixes this by introducing a flags field to xfrm_mode that
tells the IPsec code whether it should terminate and reinject the packet
back into the stack.
It then sets the flag for BEET and tunnel mode.
I've also added a number of missing BEET checks elsewhere where we check
whether a given mode is a tunnel or not.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:31:12 +0000 (21:31 -0700)]
[IPSEC]: Move type and mode map into xfrm_state.c
The type and mode maps are only used by SAs, not policies. So it makes
sense to move them from xfrm_policy.c into xfrm_state.c. This also allows
us to mark xfrm_get_type/xfrm_put_type/xfrm_get_mode/xfrm_put_mode as
static.
The only other change I've made in the move is to get rid of the casts
on the request_module call for types. They're unnecessary because C
will promote them to ints anyway.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:30:34 +0000 (21:30 -0700)]
[IPSEC]: Fix length check in xfrm_parse_spi
Currently xfrm_parse_spi requires there to be 16 bytes for AH and ESP.
In contrived cases there may not actually be 16 bytes there since the
respective header sizes are less than that (8 and 12 currently).
This patch changes the test to use the actual header length instead of 16.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:30:07 +0000 (21:30 -0700)]
[IPSEC]: Move ip_summed zapping out of xfrm6_rcv_spi
Not every transform needs to zap ip_summed. For example, a pure tunnel
mode encapsulation does not affect the hardware checksum at all. In fact,
every algorithm (that needs this) other than AH6 already does its own
ip_summed zapping.
This patch moves the zapping into AH6 which is in line with what IPv4 does.
Possible future optimisation: Checksum the data as we copy them in IPComp.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:29:25 +0000 (21:29 -0700)]
[IPSEC]: Get nexthdr from caller in xfrm6_rcv_spi
Currently xfrm6_rcv_spi gets the nexthdr value itself from the packet.
This means that we need to fix up the value in case we have a 4-on-6
tunnel. Moving this logic into the caller simplifies things and allows
us to merge the code with IPv4.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:28:53 +0000 (21:28 -0700)]
[IPSEC]: Move tunnel parsing for IPv4 out of xfrm4_input
This patch moves the tunnel parsing for IPv4 out of xfrm4_input and into
xfrm4_tunnel. This change is in line with what IPv6 does and will allow
us to merge the two input functions.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 18 Oct 2007 04:28:06 +0000 (21:28 -0700)]
[IPSEC]: Fix pure tunnel modes involving IPv6
I noticed that my recent patch broke 6-on-4 pure IPsec tunnels (the ones
that are only used for incompressible IPsec packets). Subsequent reviews
show that I broke 6-on-6 pure tunnels more than three years ago and nobody
ever noticed. I suppose every must be testing 6-on-6 IPComp with large
pings which are very compressible :)
This patch fixes both cases.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 04:25:32 +0000 (21:25 -0700)]
[IPV6]: Cleanup snmp6_alloc_dev()
This functions is never called with NULL or not setup argument,
so the checks inside are redundant.
Also, the return value is always -ENOMEM, so no need in
additional variable for this.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 04:23:43 +0000 (21:23 -0700)]
[IPV6]: Fix return type for snmp6_free_dev()
This call is essentially void.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 04:22:42 +0000 (21:22 -0700)]
[NET]: Fix the race between sk_filter_(de|at)tach and sk_clone()
The proposed fix is to delay the reference counter decrement
until the quiescent state pass. This will give sk_clone() a
chance to get the reference on the cloned filter.
Regular sk_filter_uncharge can happen from the sk_free() only
and there's no need in delaying the put - the socket is dead
anyway and is to be release itself.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 04:22:17 +0000 (21:22 -0700)]
[NET]: Cleanup the error path in sk_attach_filter
The sk_filter_uncharge is called for error handling and
for releasing the former filter, but this will have to
be done in a bit different manner, so cleanup the error
path a bit.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 04:21:51 +0000 (21:21 -0700)]
[NET]: Move the filter releasing into a separate call
This is done merely as a preparation for the fix.
The sk_filter_uncharge() unaccounts the filter memory and calls
the sk_filter_release(), which in turn decrements the refcount
anf frees the filter.
The latter function will be required separately.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 04:21:26 +0000 (21:21 -0700)]
[NET]: Introduce the sk_detach_filter() call
Filter is attached in a separate function, so do the
same for filter detaching.
This also removes one variable sock_setsockopt().
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Rothwell [Thu, 18 Oct 2007 04:17:42 +0000 (21:17 -0700)]
[SPARC/64]: Consolidate of_register_driver
Also of_unregister_driver. These will be shortly also used by the
PowerPC code.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville [Thu, 18 Oct 2007 04:16:16 +0000 (21:16 -0700)]
[MAC80211]: only honor IW_SCAN_THIS_ESSID in STA, IBSS, and AP modes
The previous IW_SCAN_THIS_ESSID patch left a hole allowing scan
requests on interfaces in inappropriate modes.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 18 Oct 2007 04:14:35 +0000 (21:14 -0700)]
Merge branch 'fixes-davem' of /linux/kernel/git/linville/wireless-2.6
Pavel Emelyanov [Thu, 18 Oct 2007 02:48:26 +0000 (19:48 -0700)]
[INET]: Consolidate frag queues freeing
Since we now allocate the queues in inet_fragment.c, we
can safely free it in the same place. The ->destructor
callback thus becomes optional for inet_frags.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 02:47:56 +0000 (19:47 -0700)]
[INET]: Remove no longer needed ->equal callback
Since this callback is used to check for conflicts in
hashtable when inserting a newly created frag queue, we can
do the same by checking for matching the queue with the
argument, used to create one.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 02:47:21 +0000 (19:47 -0700)]
[INET]: Consolidate xxx_find() in fragment management
Here we need another callback ->match to check whether the
entry found in hash matches the key passed. The key used
is the same as the creation argument for inet_frag_create.
Yet again, this ->match is the same for netfilter and ipv6.
Running a frew steps forward - this callback will later
replace the ->equal one.
Since the inet_frag_find() uses the already consolidated
inet_frag_create() remove the xxx_frag_create from protocol
codes.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 02:46:47 +0000 (19:46 -0700)]
[INET]: Consolidate xxx_frag_create()
This one uses the xxx_frag_intern() and xxx_frag_alloc()
routines, which are already consolidated, so remove them
from protocol code (as promised).
The ->constructor callback is used to init the rest of
the frag queue and it is the same for netfilter and ipv6.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 02:45:23 +0000 (19:45 -0700)]
[INET]: Consolidate xxx_frag_alloc()
Just perform the kzalloc() allocation and setup common
fields in the inet_frag_queue(). Then return the result
to the caller to initialize the rest.
The inet_frag_alloc() may return NULL, so check the
return value before doing the container_of(). This looks
ugly, but the xxx_frag_alloc() will be removed soon.
The xxx_expire() timer callbacks are patches,
because the argument is now the inet_frag_queue, not
the protocol specific queue.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 02:44:34 +0000 (19:44 -0700)]
[INET]: Consolidate xxx_frag_intern
This routine checks for the existence of a given entry
in the hash table and inserts the new one if needed.
The ->equal callback is used to compare two frag_queue-s
together, but this one is temporary and will be removed
later. The netfilter code and the ipv6 one use the same
routine to compare frags.
The inet_frag_intern() always returns non-NULL pointer,
so convert the inet_frag_queue into protocol specific
one (with the container_of) without any checks.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 02:43:37 +0000 (19:43 -0700)]
[INET]: Omit double hash calculations in xxx_frag_intern
Since the hash value is already calculated in xxx_find, we can
simply use it later. This is already done in netfilter code,
so make the same in ipv4 and ipv6.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthias Kaehlcke [Thu, 18 Oct 2007 02:40:31 +0000 (19:40 -0700)]
[SPARC] Videopix Frame Grabber: Convert device_lock_sem to mutex
Videopix Frame Grabber: Convert the semaphore device_lock_sem to the
mutex API
Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Thu, 18 Oct 2007 02:39:22 +0000 (19:39 -0700)]
[BR2684]: get rid of broken header code.
Recent header_ops change would break the following dead
code in br2684. Maintaining conditonal code in mainline is wrong.
"Do, or do not. There is no 'try.'"
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Miller [Thu, 18 Oct 2007 02:38:10 +0000 (19:38 -0700)]
[SPARC]: Support for new termios.
[akpm@linux-foundation.org: coding-style tweaks]
Signed-off-by: David Miller <davem@davemloft.net>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Dike [Thu, 18 Oct 2007 02:35:04 +0000 (19:35 -0700)]
[UMP]: header_ops conversion needed for non-ethernet drivers
UML's two non-ethernet drivers need some header_ops conversion.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ryan Reading [Thu, 18 Oct 2007 02:34:11 +0000 (19:34 -0700)]
[IRDA]: IrCOMM discovery indication simplification
From: Ryan Reading <ryanr23@gmail.com>
Every IrCOMM socket is registered with the discovery subsystem, so we don't
need to loop over all of them for every discovery event. We just need to
do it for the registered IrCOMM socket.
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ingo Molnar [Thu, 18 Oct 2007 02:33:06 +0000 (19:33 -0700)]
[DCCP]: fix link error with !CONFIG_SYSCTL
Do not define the sysctl_dccp_sync_ratelimit sysctl variable in the
CONFIG_SYSCTL dependent sysctl.c module - move it to input.c instead.
This fixes the following build bug:
net/built-in.o: In function `dccp_check_seqno':
input.c:(.text+0xbd859): undefined reference to `sysctl_dccp_sync_ratelimit'
distcc[29953] ERROR: compile (null) on localhost failed
make: *** [vmlinux] Error 1
Found via 'make randconfig' build testing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Thu, 18 Oct 2007 02:30:40 +0000 (19:30 -0700)]
[IPV6]: Fix memory leak in cleanup_ipv6_mibs()
The icmpv6msg mib statistics is not freed.
This is almost not critical for current kernel, since ipv6
module is unloadable, but this can happen on load error and
will happen every time we stop the network namespace (when
we have one, of course).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 18 Oct 2007 02:26:28 +0000 (19:26 -0700)]
[BNX2]: Update version to 1.6.8.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 18 Oct 2007 02:26:15 +0000 (19:26 -0700)]
[BNX2]: Fix Serdes WoL bug.
The bug is in the code in bnx2_set_power_state() that assumes copper
devices when setting up WoL. This is no longer true after adding WoL
support for Serdes devices.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 18 Oct 2007 02:25:27 +0000 (19:25 -0700)]
[BNX2]: Update 5709 firmware to 3.7.1.
This firmware update fixes a problem running with IPMI management
firmware.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sonic Zhang [Tue, 16 Oct 2007 08:43:27 +0000 (16:43 +0800)]
Update libata driver for bf548 atapi controller against the 2.6.24 tree.
Changes:
1. Remove irq_ack() and port_disable() methods
2. Acocomodate for the libata-link patches
3. Change Kconfig ATAPI mode option into a module param.
4. Add supported WMDMA mode.
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Alan Cox [Mon, 15 Oct 2007 18:25:29 +0000 (19:25 +0100)]
libata-sff: Correct use of check_status()
ata_check_status() does an SFF compliant check
ata_chk_status() does a generic call to ap->ops->check_status (usually
ata_check_status)
libata-sff uses the wrong one. Hardly suprising given the naming here,
which ought to get fixed to ata_sff_check_status() perhaps ?
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Li Yang [Tue, 16 Oct 2007 12:58:38 +0000 (20:58 +0800)]
drivers/ata: add support to Freescale 3.0Gbps SATA Controller
This patch adds support for Freescale 3.0Gbps SATA Controller supporting
Native Command Queueing(NCQ), device hotplug, and ATAPI. This controller
can be found on MPC8315 and MPC8378.
Signed-off-by: Ashish Kalra <ashish.kalra@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Wed, 17 Oct 2007 06:24:16 +0000 (15:24 +0900)]
pata_acpi: fix build breakage if !CONFIG_PM
There are configurations where CONFIG_ACPI but !CONFIG_PM. In this
case, pata_acpi can be selected but won't build. Fix it.
Reported by Avuton Olrich.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Avuton Olrich <avuton@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Don Fry [Thu, 18 Oct 2007 00:06:19 +0000 (17:06 -0700)]
pcnet32: remove private net_device_stats structure
Remove the statistics from the private structure.
Use the net_device_stats in netn_device structure.
Following Jeff Garzik's massive cleanup Sep 01.
pcnet32 was not "low-hanging fruit".
Tested x86_64.
Signed-off-by: Don Fry <pcnet32@verizon.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Badari Pulavarty [Wed, 17 Oct 2007 23:15:56 +0000 (16:15 -0700)]
vortex_up should initialize "err"
Simple compile warning fix. (against 2.6.23-git12)
Thanks,
Badari
vortex_up() should initialize 'err' for a successful return.
drivers/net/3c59x.c: In function `vortex_up':
drivers/net/3c59x.c:1494: warning: `err' might be used uninitialized in this function
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Don Fry [Wed, 17 Oct 2007 23:10:10 +0000 (16:10 -0700)]
pcnet32: remove compile warnings in non-napi mode
Remove compile warning when in non-napi mode.
Signed-off-by: Don Fry <pcnet32@verizon.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Don Fry [Wed, 17 Oct 2007 22:59:22 +0000 (15:59 -0700)]
pcnet32: fix non-napi packet reception
Recent changes to the driver for the new napi API broke the reception
of packets when in non-napi mode. The initialization of napi.weight
was removed for the non-napi case leaving the value zero.
Tested NAPI and non-NAPI on x86_64.
Signed-off-by: Don Fry <pcnet32@verizon.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Benjamin Herrenschmidt [Wed, 17 Oct 2007 23:14:03 +0000 (09:14 +1000)]
fix EMAC driver for proper napi_synchronize API
The EMAC driver "fix" was merged by mistake before the dust had settled on
the new napi synchronize interface (and before it got merged). The final
version of that function is spelled without underscores.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Stephen Hemminger [Wed, 17 Oct 2007 20:26:42 +0000 (13:26 -0700)]
sky2: shutdown cleanup
Solve issues with dual port devices due to shared NAPI.
* shutting down one device shouldn't kill other one.
* suspend shouldn't hang.
Also fix potential race between restart and shutdown.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Stephen Hemminger [Wed, 17 Oct 2007 20:26:41 +0000 (13:26 -0700)]
napi_synchronize: waiting for NAPI
Some drivers with shared NAPI need a synchronization barrier.
Also suggested by Benjamin Herrenschmidt for EMAC.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Manfred Spraul [Wed, 17 Oct 2007 19:52:33 +0000 (21:52 +0200)]
forcedeth msi bugfix
pci_enable_msi() replaces the INTx irq number in pci_dev->irq with the
new MSI irq number.
The forcedeth driver did not update the copy in netdevice->irq and
parts of the driver used the stale copy.
See bugzilla.kernel.org, bug 9047.
The patch
- updates netdevice->irq
- replaces all accesses to netdevice->irq with pci_dev->irq.
The patch is against 2.6.23.1. IMHO suitable for both 2.6.23 and 2.6.24
Signed-Off-By: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Anton Vorontsov [Wed, 17 Oct 2007 19:57:46 +0000 (23:57 +0400)]
gianfar: fix obviously wrong #ifdef CONFIG_GFAR_NAPI placement
Erroneous #ifdef introduced by
293c8513398657f6263fcdb03c87f2760cf61be4
causing NAPI-less ethernet malfunctioning.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Scott Wood [Wed, 17 Oct 2007 17:42:43 +0000 (12:42 -0500)]
fs_enet: Update for API changes
This driver was recently broken by several changes for which this
driver was not (or was improperly) updated:
1. SET_MODULE_OWNER() was removed.
2. netif_napi_add() was only being called when building with
the old CPM binding.
3. The received/budget test was backwards.
4. to_net_dev() was wrong -- the device struct embedded in
the net_device struct is not the same as the of_platform
device in the private struct.
5. napi_disable/napi_enable was being called even when napi
was not being used.
These changes have been fixed, and napi is now on by default.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Sebastian Siewior [Wed, 17 Oct 2007 08:52:22 +0000 (10:52 +0200)]
gianfar: remove orphan struct.
struct net_device_stats is no longer used in driver's private
struct but in struct net_device.
Cc: Li Yang <leoli@freescale.com>
Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Ingo Molnar [Wed, 17 Oct 2007 10:18:23 +0000 (12:18 +0200)]
forcedeth: fix rx-work condition in nv_rx_process_optimized() too
The merge of my previous fix to forcedeth.c,
bcb5febb248f7cc1e4a39ff61507f6343ba1c594, lost an important hunk.
We need to fix nv_rx_process_optimized() too, as it contains duplicate logic.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
David S. Miller [Tue, 16 Oct 2007 03:45:32 +0000 (20:45 -0700)]
[SPARC64]: Check of_get_property() return in pci_determine_mem_io_space().
If the PCI controller lacks the 'ranges' property nothing
is going to work.
Noticed by Al Viro.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 15 Oct 2007 23:43:19 +0000 (16:43 -0700)]
[SPARC64]: Fix boot failures due to bootmem.
Do not use *alloc_bootmem_low*(), because ARCH_LOW_ADDRESS_LIMIT
is 4GB and this results in boot failures if all of the physical
memory in the machine is above 4GB.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 15 Oct 2007 23:41:44 +0000 (16:41 -0700)]
[SPARC64]: Implement atomic backoff.
When the cpu count is high and contention hits an atomic object, the
processors can synchronize such that some cpus continually get knocked
out and cannot complete the atomic update.
So implement an exponential backoff when SMP.
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Sandeen [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: lighten up resize transaction requirements
When resizing online, setup_new_group_blocks attempts to reserve a
potentially very large transaction, depending on the current filesystem
geometry. For some journal sizes, there may not be enough room for this
transaction, and the online resize will fail.
The patch below resizes & restarts the transaction as necessary while
setting up the new group, and should work with even the smallest journal.
Tested with something like:
[root@newbox ~]# dd if=/dev/zero of=fsfile bs=1024 count=32768
[root@newbox ~]# mkfs.ext3 -b 1024 fsfile 16384
[root@newbox ~]# mount -o loop fsfile mnt/
[root@newbox ~]# resize2fs /dev/loop0
resize2fs 1.40.2 (12-Jul-2007)
Filesystem at /dev/loop0 is mounted on /root/mnt; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 1
Performing an on-line resize of /dev/loop0 to 32768 (1k) blocks.
resize2fs: No space left on device While trying to add group #2
[root@newbox ~]# dmesg | tail -n 1
JBD: resize2fs wants too many credits (258 > 256)
[root@newbox ~]#
With the below change, it works.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Acked-by: Andreas Dilger <adilger@clusterfs.com>
Eric Sandeen [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: fix setup_new_group_blocks locking
setup_new_group_blocks() manipulates the group descriptor block bh
under the block_bitmap bh's lock. It shouldn't matter since nobody
but resize should be touching these blocks, but it's worth fixing up.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Aneesh Kumar K.V [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: sparse fixes
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: Convert ext4_extent_idx.ei_leaf to ext4_extent_idx.ei_leaf_lo
Convert ext4_extent_idx.ei_leaf ext4_extent_idx.ei_leaf_lo
This helps in finding BUGs due to direct partial access of
these split 48 bit values.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: Convert ext4_extent.ee_start to ext4_extent.ee_start_lo
Convert ext4_extent.ee_start to ext4_extent.ee_start_lo
This helps in finding BUGs due to direct partial access of
these split 48 bit values
Also fix direct partial access in ext4 code
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: Convert s_r_blocks_count and s_free_blocks_count
Convert s_r_blocks_count and s_free_blocks_count to
s_r_blocks_count_lo and s_free_blocks_count_lo
This helps in finding BUGs due to direct partial access of
these split 64 bit values
Also fix direct partial access in ext4 code
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Aneesh Kumar K.V [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: Convert s_blocks_count to s_blocks_count_lo
Convert s_blocks_count to s_blocks_count_lo
This helps in finding BUGs due to direct partial access of
these split 64 bit values
Also fix direct partial access in ext4 code
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: Convert bg_inode_bitmap and bg_inode_table
Convert bg_inode_bitmap and bg_inode_table to bg_inode_bitmap_lo
and bg_inode_table_lo. This helps in finding BUGs due to
direct partial access of these split 64 bit values
Also fix one direct partial access
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Aneesh Kumar K.V [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: Convert bg_block_bitmap to bg_block_bitmap_lo
Convert bg_block_bitmap to bg_block_bitmap_lo
This helps in catching some BUGS due to direct
partial access of these split fields.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Jose R. Santos [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: FLEX_BG Kernel support v2.
This feature relaxes check restrictions on where each block groups meta
data is located within the storage media. This allows for the allocation
of bitmaps or inode tables outside the block group boundaries in cases
where bad blocks forces us to look for new blocks which the owning block
group can not satisfy. This will also allow for new meta-data allocation
schemes to improve performance and scalability.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Aneesh Kumar K.V [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
ext4: Fix sparse warnings
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andreas Dilger [Tue, 16 Oct 2007 22:38:25 +0000 (18:38 -0400)]
Ext4: Uninitialized Block Groups
In pass1 of e2fsck, every inode table in the fileystem is scanned and checked,
regardless of whether it is in use. This is this the most time consuming part
of the filesystem check. The unintialized block group feature can greatly
reduce e2fsck time by eliminating checking of uninitialized inodes.
With this feature, there is a a high water mark of used inodes for each block
group. Block and inode bitmaps can be uninitialized on disk via a flag in the
group descriptor to avoid reading or scanning them at e2fsck time. A checksum
of each group descriptor is used to ensure that corruption in the group
descriptor's bit flags does not cause incorrect operation.
The feature is enabled through a mkfs option
mke2fs /dev/ -O uninit_groups
A patch adding support for uninitialized block groups to e2fsprogs tools has
been posted to the linux-ext4 mailing list.
The patches have been stress tested with fsstress and fsx. In performance
tests testing e2fsck time, we have seen that e2fsck time on ext3 grows
linearly with the total number of inodes in the filesytem. In ext4 with the
uninitialized block groups feature, the e2fsck time is constant, based
solely on the number of used inodes rather than the total inode count.
Since typical ext4 filesystems only use 1-10% of their inodes, this feature can
greatly reduce e2fsck time for users. With performance improvement of 2-20
times, depending on how full the filesystem is.
The attached graph shows the major improvements in e2fsck times in filesystems
with a large total inode count, but few inodes in use.
In each group descriptor if we have
EXT4_BG_INODE_UNINIT set in bg_flags:
Inode table is not initialized/used in this group. So we can skip
the consistency check during fsck.
EXT4_BG_BLOCK_UNINIT set in bg_flags:
No block in the group is used. So we can skip the block bitmap
verification for this group.
We also add two new fields to group descriptor as a part of
uninitialized group patch.
__le16 bg_itable_unused; /* Unused inodes count */
__le16 bg_checksum; /* crc16(sb_uuid+group+desc) */
bg_itable_unused:
If we have EXT4_BG_INODE_UNINIT not set in bg_flags
then bg_itable_unused will give the offset within
the inode table till the inodes are used. This can be
used by fsck to skip list of inodes that are marked unused.
bg_checksum:
Now that we depend on bg_flags and bg_itable_unused to determine
the block and inode usage, we need to make sure group descriptor
is not corrupt. We add checksum to group descriptor to
detect corruption. If the descriptor is found to be corrupt, we
mark all the blocks and inodes in the group used.
Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>