Brian King [Thu, 23 Mar 2006 23:30:08 +0000 (17:30 -0600)]
[PATCH] libata: ata_scsi_ioctl cleanup
In preparation for SAS, kill some unnecessary code in ata_scsi_ioctl
to find the ATA port and device given the scsi_device. Neither local
is used in the function.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Brian King [Thu, 23 Mar 2006 23:30:02 +0000 (17:30 -0600)]
[PATCH] libata: ata_scsi_queuecmd cleanup
Encapsulate part of ata_scsi_queuecmd so that it can be
reused by future SAS patches.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jeff Garzik [Fri, 24 Mar 2006 14:56:57 +0000 (09:56 -0500)]
[libata] export ata_dev_pair; trim trailing whitespace
Mostly, trim trailing whitespace.
Also:
* export ata_dev_pair
* move ata_dev_classify export closer to ata_dev_pair export
Alan Cox [Thu, 23 Mar 2006 15:38:34 +0000 (15:38 +0000)]
[PATCH] libata: add ata_dev_pair helper
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Nigel Cunningham [Thu, 23 Mar 2006 13:22:16 +0000 (23:22 +1000)]
[PATCH] Make libata not powerdown drivers on PM_EVENT_FREEZE.
At the moment libata doesn't pass pm_message_t down to ata_device_suspend().
This causes drives to be powered down when we just want a freeze,
causing unnecessary wear and tear. This patch passes pm_message_t
down so that it can be used to determine whether to power down the
drive.
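Passing pm_message_t down lets the suspend path tell a freeze apart from a real suspend. A minimal user-space sketch, assuming local stand-ins for pm_message_t and the PM event codes (every `sketch_` name and value here is hypothetical, not the kernel's definition): the drive is spun down only for a real suspend.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for pm_message_t and its event codes; the values are
 * assumptions chosen for the sketch, not taken from the kernel. */
enum {
	SKETCH_PM_EVENT_ON = 0,
	SKETCH_PM_EVENT_FREEZE = 1,
	SKETCH_PM_EVENT_SUSPEND = 2,
};

typedef struct {
	int event;
} sketch_pm_message_t;

/* Hypothetical policy helper: a freeze only needs the device quiesced,
 * so the drive keeps spinning; only a real suspend spins it down. */
static bool sketch_should_power_down(sketch_pm_message_t mesg)
{
	return mesg.event == SKETCH_PM_EVENT_SUSPEND;
}
```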
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
drivers/scsi/libata-core.c | 5 +++--
drivers/scsi/libata-scsi.c | 4 ++--
drivers/scsi/scsi_sysfs.c | 2 +-
include/linux/libata.h | 4 ++--
include/scsi/scsi_host.h | 2 +-
5 files changed, 9 insertions(+), 8 deletions(-)
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Fri, 24 Mar 2006 06:25:31 +0000 (15:25 +0900)]
[PATCH] libata: make ata_set_mode() responsible for failure handling
Make ata_set_mode() responsible for determining whether to take the port
or device offline on failure. ata_dev_set_xfermode() and
ata_dev_set_mode() now report errors to the caller instead of disabling
the port directly on failure. Also, for consistency, the ata_dev_present()
check is done in ata_set_mode() instead of ata_dev_set_mode().
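The division of labour described above (helpers report, the caller decides) can be sketched as follows. This is a hypothetical user-space model: the structures, the names and the policy of disabling only the failing device are assumptions, not libata's code.

```c
#include <assert.h>

/* Hypothetical device state for the sketch. */
struct sketch_dev {
	int present;
	int fail_xfermode;	/* simulate a failing SET FEATURES */
	int enabled;
};

/* Lower-level helper: reports failure, never disables anything itself. */
static int sketch_dev_set_mode(struct sketch_dev *dev)
{
	return dev->fail_xfermode ? -1 : 0;
}

/* Top-level routine: owns both the presence check and the failure
 * policy. Returns the number of devices it took offline. */
static int sketch_set_mode(struct sketch_dev *devs, int n)
{
	int disabled = 0;

	for (int i = 0; i < n; i++) {
		if (!devs[i].present)
			continue;	/* presence check lives here now */
		if (sketch_dev_set_mode(&devs[i]) != 0) {
			devs[i].enabled = 0;	/* assumed policy */
			disabled++;
		}
	}
	return disabled;
}
```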
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Fri, 24 Mar 2006 06:25:31 +0000 (15:25 +0900)]
[PATCH] libata: use ata_dev_disable() in ata_bus_probe()
We may or may not disable a device after ata_dev_configure() fails.
Kill 'not supported, ignoring' message in ata_dev_configure() and use
ata_dev_disable() in ata_bus_probe().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Fri, 24 Mar 2006 06:25:31 +0000 (15:25 +0900)]
[PATCH] libata: implement ata_dev_disable()
This patch implements ata_dev_disable() which prints a warning message
and takes @dev offline. Currently, this is done by explicitly
incrementing dev->class with case-by-case warning messages. Giving the
user a clear indication of when libata gives up will become more
important as libata does more retries.
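Incrementing dev->class works as a "disable" because each supported class is immediately followed by its unsupported counterpart. A minimal sketch of a helper that centralizes the warning and the increment, assuming that class ordering (all names and values are local to the sketch, not libata's headers):

```c
#include <assert.h>
#include <stdio.h>

/* Assumed class ordering: each supported class is followed by its
 * "unsupported" twin, so class++ takes a present device offline. */
enum sketch_dev_class {
	SKETCH_DEV_UNKNOWN = 0,
	SKETCH_DEV_ATA,
	SKETCH_DEV_ATA_UNSUP,
	SKETCH_DEV_ATAPI,
	SKETCH_DEV_ATAPI_UNSUP,
	SKETCH_DEV_NONE,
};

struct sketch_device {
	unsigned int devno;
	int class;
};

static int sketch_dev_present(const struct sketch_device *dev)
{
	return dev->class == SKETCH_DEV_ATA || dev->class == SKETCH_DEV_ATAPI;
}

/* One helper instead of case-by-case messages: warn once, then move
 * the device to its unsupported class. Calling it twice is harmless. */
static void sketch_dev_disable(struct sketch_device *dev)
{
	if (sketch_dev_present(dev)) {
		fprintf(stderr, "dev %u disabled\n", dev->devno);
		dev->class++;
	}
}
```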
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Fri, 24 Mar 2006 06:25:30 +0000 (15:25 +0900)]
[PATCH] libata: check if port is disabled after internal command
libata core is being changed to disallow port/device disable on lower
layers. However, some LLDDs (sata_mv) directly disable the port on
command failure. This patch makes ata_exec_internal() check whether a
port got disabled after an internal command. If it did, AC_ERR_SYSTEM
is added to err_mask and the port gets re-enabled.
As internal command failure results in device disable for drivers
that don't implement the newer reset/EH callbacks, this change results
in no behavior change for single-device-per-port controllers. For
slave-possible LLDDs that disable the port on command failure, (1) no
such drivers currently exist, and (2) issuing a command to the other
device of a once-disabled port shouldn't result in catastrophe even if
such a driver existed. So this should be enough as a temporary measure.
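The post-command check can be modelled in a few lines. A hypothetical sketch: the port structure, the simulated LLDD and the AC_ERR_SYSTEM bit value are local assumptions, not libata definitions.

```c
#include <assert.h>

#define SKETCH_AC_ERR_SYSTEM (1u << 6)	/* assumed bit value */

struct sketch_port {
	int disabled;
};

/* Simulated LLDD that disables the port on failure and reports
 * nothing itself. */
static unsigned int sketch_issue_cmd(struct sketch_port *ap, int fail)
{
	if (fail)
		ap->disabled = 1;
	return 0;
}

/* After the internal command, detect a port disabled behind the core's
 * back, turn it into an error bit, and re-enable the port. */
static unsigned int sketch_exec_internal(struct sketch_port *ap, int fail)
{
	unsigned int err_mask = sketch_issue_cmd(ap, fail);

	if (ap->disabled) {
		err_mask |= SKETCH_AC_ERR_SYSTEM;
		ap->disabled = 0;
	}
	return err_mask;
}
```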
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Fri, 24 Mar 2006 05:07:50 +0000 (14:07 +0900)]
[PATCH] libata: make per-dev transfer mode limits per-dev
Now that each ata_device has xfer masks, per-dev limits can be made
per-dev instead of per-port. Make per-dev limits per-dev.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Fri, 24 Mar 2006 05:07:50 +0000 (14:07 +0900)]
[PATCH] libata: add per-dev pio/mwdma/udma_mask
Add per-dev pio/mwdma/udma_mask. All transfer mode limits used to be
applied to ap->*_mask, which unnecessarily restricted other devices
sharing the port. This change will also benefit later EH speed-down
and hotplug.
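With masks held per device, a limit found for one device no longer narrows what its neighbour on the same port may use. A hypothetical sketch (structure and field names are illustrative, not libata's):

```c
#include <assert.h>

struct sketch_masks {
	unsigned int pio, mwdma, udma;
};

struct sketch_port {
	struct sketch_masks port;	/* what the controller can do */
	struct sketch_masks dev[2];	/* master and slave copies */
};

/* Each device starts from the controller's capabilities. */
static void sketch_init_dev_masks(struct sketch_port *ap)
{
	for (int i = 0; i < 2; i++)
		ap->dev[i] = ap->port;
}

/* A device-specific limit touches only that device's copy; the port
 * masks and the other device are left alone. */
static void sketch_limit_udma(struct sketch_port *ap, int devno,
			      unsigned int udma_limit)
{
	ap->dev[devno].udma &= udma_limit;
}
```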
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Fri, 24 Mar 2006 05:07:49 +0000 (14:07 +0900)]
[PATCH] libata: implement ata_unpack_xfermask()
Implement ata_unpack_xfermask().
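A packed transfer mask keeps PIO, MWDMA and UDMA modes in disjoint bit ranges, so unpacking is one mask-and-shift per class. A sketch assuming a layout of 5 PIO bits, 3 MWDMA bits and 8 UDMA bits (believed to resemble libata's at the time, but treat it as an assumption); the pack helper exists only to make round trips easy to check.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: bits 0-4 PIO, bits 5-7 MWDMA, bits 8-15 UDMA. */
#define SK_SHIFT_PIO	0
#define SK_BITS_PIO	5
#define SK_SHIFT_MWDMA	(SK_SHIFT_PIO + SK_BITS_PIO)
#define SK_BITS_MWDMA	3
#define SK_SHIFT_UDMA	(SK_SHIFT_MWDMA + SK_BITS_MWDMA)
#define SK_BITS_UDMA	8
#define SK_MASK(shift, bits)	(((1u << (bits)) - 1) << (shift))

/* Split a packed mask; NULL outputs are skipped so a caller can ask
 * for only one class. */
static void sketch_unpack_xfermask(unsigned int xfer_mask,
				   unsigned int *pio, unsigned int *mwdma,
				   unsigned int *udma)
{
	if (pio)
		*pio = (xfer_mask & SK_MASK(SK_SHIFT_PIO, SK_BITS_PIO)) >> SK_SHIFT_PIO;
	if (mwdma)
		*mwdma = (xfer_mask & SK_MASK(SK_SHIFT_MWDMA, SK_BITS_MWDMA)) >> SK_SHIFT_MWDMA;
	if (udma)
		*udma = (xfer_mask & SK_MASK(SK_SHIFT_UDMA, SK_BITS_UDMA)) >> SK_SHIFT_UDMA;
}

/* Inverse operation, used here only for round-trip checks. */
static unsigned int sketch_pack_xfermask(unsigned int pio, unsigned int mwdma,
					 unsigned int udma)
{
	return ((pio << SK_SHIFT_PIO) & SK_MASK(SK_SHIFT_PIO, SK_BITS_PIO)) |
	       ((mwdma << SK_SHIFT_MWDMA) & SK_MASK(SK_SHIFT_MWDMA, SK_BITS_MWDMA)) |
	       ((udma << SK_SHIFT_UDMA) & SK_MASK(SK_SHIFT_UDMA, SK_BITS_UDMA));
}
```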
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jeff Garzik [Thu, 23 Mar 2006 05:32:03 +0000 (00:32 -0500)]
[libata] Move some bmdma-specific code to libata-bmdma.c
No code changes, just moving code between files.
Jeff Garzik [Thu, 23 Mar 2006 05:14:36 +0000 (00:14 -0500)]
[libata sata_uli] kill scr_addr abuse
sata_uli was storing PCI config addresses in a variable intended for
port addresses, a variable soon to become void __iomem *.
Update the driver to store the SCR address, found in PCI config space,
in the driver-private data area.
Jeff Garzik [Thu, 23 Mar 2006 04:59:46 +0000 (23:59 -0500)]
[libata sata_nv] eliminate duplicate codepaths with iomap
Eliminate a bunch of
	if (mmio)
		writel()
	else
		outl()
code with the pci_iomap() and io{read,write}{8,16,32}() interface.
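The idea behind the iomap interface is that the MMIO-versus-port-I/O decision is made once, inside the accessor, rather than at every call site. A user-space mock of that idea; none of the `sketch_` names are kernel APIs (the real accessors are ioread32()/iowrite32() on a cookie obtained from pci_iomap()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cookie: records once whether the BAR is MMIO or
 * port I/O. port_space stands in for the I/O port address space. */
struct sketch_iomem {
	int is_mmio;
	volatile uint32_t *mmio;
	uint32_t *port_space;
	unsigned int port_base;
};

/* The single branch now lives here, not in every driver call site. */
static void sketch_iowrite32(uint32_t v, struct sketch_iomem *io,
			     unsigned int reg)
{
	if (io->is_mmio)
		io->mmio[reg] = v;			/* like writel() */
	else
		io->port_space[io->port_base + reg] = v;	/* like outl() */
}
```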
Jeff Garzik [Thu, 23 Mar 2006 04:50:50 +0000 (23:50 -0500)]
[libata sata_nv] cleanups: convert #defines to enums; remove in-file history
Jeff Garzik [Thu, 23 Mar 2006 04:30:34 +0000 (23:30 -0500)]
[libata sata_sil24] cleanups: use pci_iomap(), kzalloc()
* libata will soon move to iomap, so we should use
pci_iomap() and pci_iounmap().
* Use kzalloc() where appropriate.
Linus Torvalds [Thu, 23 Mar 2006 01:51:31 +0000 (17:51 -0800)]
Merge branch 'upstream-linus' of /linux/kernel/git/jgarzik/netdev-2.6
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
[PATCH] Use of uninitialized variable in drivers/net/depca.c
[PATCH] Use after free in net/tulip/de2104x.c
[PATCH] sis900 adm7001 PHY support
[PATCH] sky2: more ethtool stats
[PATCH] s390: qeth :allow setting of attribute "route6" to "no_router".
[PATCH] s390: qeth driver cleanups
[PATCH] s390: qeth driver statistics fixes
[PATCH] AMD Au1xx0: fix Ethernet TX stats
[PATCH] fix spidernet build issue
Linus Torvalds [Thu, 23 Mar 2006 01:39:38 +0000 (17:39 -0800)]
scsi: link in the debug driver last
If the debug driver is built-in, link it in last, so that any real
drivers will probe first, rather than having the debug driver pick the
first SCSI slots.
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Thu, 23 Mar 2006 01:36:04 +0000 (17:36 -0800)]
Merge /pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[TCP]: Do not use inet->id of global tcp_socket when sending RST.
[NETFILTER]: Fix undefined references to get_h225_addr
[NETFILTER]: futher {ip,ip6,arp}_tables unification
[NETFILTER]: Fix xt_policy address matching
[NETFILTER]: nf_conntrack: support for layer 3 protocol load on demand
[NETFILTER]: x_tables: set the protocol family in x_tables targets/matches
[NETFILTER]: conntrack: cleanup the conntrack ID initialization
[NETFILTER]: nfnetlink_queue: fix nfnetlink message size
[NETFILTER]: ctnetlink: Fix expectaction mask dumping
[NETFILTER]: Fix Kconfig typos
[NETFILTER]: Fix ip6tables breakage from {get,set}sockopt compat layer
Linus Torvalds [Thu, 23 Mar 2006 01:33:12 +0000 (17:33 -0800)]
Merge master.kernel.org:/home/rmk/linux-2.6-serial
* master.kernel.org:/home/rmk/linux-2.6-serial:
[SERIAL] Merge avlab serial board entries in parport_serial
[SERIAL] kernel console should send CRLF not LFCR
Linus Torvalds [Thu, 23 Mar 2006 01:32:09 +0000 (17:32 -0800)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm: (45 commits)
[ARM] 3389/1: typo and grammar fix
[ARM] 3386/1: AT91RM9200 Clock update
[ARM] 3384/1: AT91RM9200: Timer
[ARM] 3382/1: ixp2000: unify defconfigs
[ARM] 3381/1: ixp2000: fix slowport write timing control register fields
[ARM] 3380/1: ixp2000: simplify ixdp2x00_master_npu() check
[ARM] 3379/1: ixp2000: use generic 8250 debug macros
[ARM] 3378/1: ixp2000: fix gpio interrupt handling
[ARM] Quieten spurious IRQ detection
[ARM] Use kcalloc to allocate counter_config array rather than kmalloc
[ARM] Oprofile: dynamically allocate counter_config
[ARM] Oprofile: Convert semaphore to mutex
[ARM] 3376/2: S3C2410 - update defconfig
[ARM] 3375/1: S3C2440 - fix osiris machine build
[ARM] 3374/1: ep93xx: gpio interrupt support
[ARM] 3361/1: S3C24XX - add USB bus clock source
[ARM] 3360/1: S3C2440 - add set rate methods and camera clock
[ARM] 3359/1: S3C24XX - add support for clk_set_rate
[ARM] Convert kmalloc+memset to kzalloc
[ARM] 3373/1: move uengine loader to arch/arm/common
...
Eric Sesterhenn [Wed, 22 Mar 2006 21:49:48 +0000 (22:49 +0100)]
[PATCH] Use of uninitialized variable in drivers/net/depca.c
Hi,
This fixes Coverity bug #888, where the variable
dev is used uninitialized. I assume the programmer
meant to use mdev, which is initialized.
Compile tested only.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Eric Sesterhenn [Wed, 22 Mar 2006 21:30:34 +0000 (22:30 +0100)]
[PATCH] Use after free in net/tulip/de2104x.c
Hi,
This fixes Coverity bug #912, where skb is freed first
and then dereferenced a few lines later via skb->len.
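The general fix for this kind of use-after-free is to copy whatever is still needed out of the buffer before releasing it. A user-space sketch with a hypothetical buffer type standing in for an sk_buff (not the de2104x code):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an sk_buff. */
struct sketch_buf {
	size_t len;
	char data[64];
};

static size_t tx_bytes;	/* statistics counter */

/* Buggy shape: free(b); tx_bytes += b->len;   (reads freed memory)
 * Fixed shape: read the length first, then free. */
static void sketch_complete_tx(struct sketch_buf *b)
{
	size_t len = b->len;	/* read while the buffer is still valid */

	free(b);
	tx_bytes += len;	/* uses the saved copy, not b->len */
}
```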
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Artur Skawina [Tue, 21 Mar 2006 21:04:36 +0000 (22:04 +0100)]
[PATCH] sis900 adm7001 PHY support
This patch is required to get the Ethernet on a SIS964-based motherboard (FSC D1875) working.
(Picking the #1 transceiver instead of the last one when no known ones are found
might be a better default, and would have worked in this case too.)
Signed-off-by: Artur Skawina <art_k@o2.pl>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Stephen Hemminger [Wed, 22 Mar 2006 18:38:45 +0000 (10:38 -0800)]
[PATCH] sky2: more ethtool stats
Expose all the available hardware statistics via ethtool,
and clean up some of the statistics definitions.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Frank Pavlic [Wed, 22 Mar 2006 15:03:44 +0000 (16:03 +0100)]
[PATCH] s390: qeth :allow setting of attribute "route6" to "no_router".
[patch 4/6] s390: qeth :allow setting of attribute "route6" to "no_router".
From: Ursula Braun <braunu@de.ibm.com>
When the route6 attribute is set back to no_router, qeth does not
issue an IP ASSIST command to reset the router value to no_router,
so once primary_router is set, the device stays in that mode.
Issue an IP ASSIST command when no_router is set in route6.
The device will be reset and thus will no longer run as a primary
router.
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
diffstat:
qeth_main.c | 5 -----
1 files changed, 5 deletions(-)
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Frank Pavlic [Wed, 22 Mar 2006 15:03:41 +0000 (16:03 +0100)]
[PATCH] s390: qeth driver cleanups
[patch 3/6] s390: qeth driver cleanups
From: Ursula Braun <braunu@de.ibm.com>
- the code analysis tool BEAM found some unreachable
and unnecessary statements, as well as conditions
which are always true.
- removed some useless MII code, since the OSA card will never
allow such values to be set.
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
diffstat:
qeth_main.c | 49 ++++---------------------------------------------
qeth_proc.c | 18 +++++++++---------
qeth_sys.c | 2 +-
3 files changed, 14 insertions(+), 55 deletions(-)
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Frank Pavlic [Wed, 22 Mar 2006 15:03:39 +0000 (16:03 +0100)]
[PATCH] s390: qeth driver statistics fixes
[patch 2/6] s390: qeth driver statistics fixes
From: Ursula Braun <braunu@de.ibm.com>
- display "unsigned int" values in /proc/qeth_perf with %u instead of %i
- omit qdio header length when increasing card->stats.tx_bytes
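Both fixes are easy to show in miniature: count only the payload, not a device header in front of it, and format unsigned counters with %u. A hypothetical sketch (the structure, names and header length are assumptions, not qeth's):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counters for the sketch. */
struct sketch_stats {
	unsigned int tx_packets;
	unsigned int tx_bytes;
};

/* Count only the payload: the device-specific header (hdr_len) in
 * front of it is not traffic the user sent. */
static void sketch_account_tx(struct sketch_stats *s, unsigned int frame_len,
			      unsigned int hdr_len)
{
	s->tx_packets++;
	s->tx_bytes += frame_len - hdr_len;
}

/* Unsigned counters are printed with %u; %i would misreport values
 * above INT_MAX. */
static int sketch_format_stats(char *buf, size_t size,
			       const struct sketch_stats *s)
{
	return snprintf(buf, size, "tx_packets=%u tx_bytes=%u",
			s->tx_packets, s->tx_bytes);
}
```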
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
diffstat:
qeth_main.c | 3 ++-
qeth_proc.c | 38 +++++++++++++++++++-------------------
2 files changed, 21 insertions(+), 20 deletions(-)
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Sergei Shtylylov [Wed, 22 Mar 2006 06:53:52 +0000 (22:53 -0800)]
[PATCH] AMD Au1xx0: fix Ethernet TX stats
With the Au1xx0 Ethernet driver, TX bytes/packets always remain zero. The
problem seems to be that, by the time a packet has been transmitted, the
length word in the DMA buffer is zero.
The patch updates the TX stats when a buffer is fed to DMA. The initial
2.4 patch was posted to linux-mips@linux-mips.org by Thomas Lange on 21 Jan
2005.
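Counting at submission works because the length is still known at that point. A user-space sketch with a hypothetical descriptor whose length word the simulated hardware clears on completion (names and layout are assumptions, not the au1000 driver's):

```c
#include <assert.h>

/* Hypothetical TX descriptor and statistics. */
struct sketch_tx_desc {
	unsigned int len;
	int owned_by_hw;
};

struct sketch_tx_stats {
	unsigned long packets, bytes;
};

/* Count the packet when it is handed to DMA, while len is valid. */
static void sketch_queue_tx(struct sketch_tx_desc *d,
			    struct sketch_tx_stats *st, unsigned int len)
{
	d->len = len;
	d->owned_by_hw = 1;
	st->packets++;
	st->bytes += len;
}

/* Simulated completion: the length word reads back as zero, so
 * counting here would always add nothing. */
static void sketch_hw_complete(struct sketch_tx_desc *d)
{
	d->len = 0;
	d->owned_by_hw = 0;
}
```

One consequence of this design is that a packet is counted even if the hardware later fails to send it.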
Signed-off-by: Thomas Lange <thomas@corelatus.se>
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jens Osterkamp [Wed, 22 Mar 2006 06:53:47 +0000 (22:53 -0800)]
[PATCH] fix spidernet build issue
<unchangelogged>
Signed-off-by: Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Wed, 22 Mar 2006 12:07:03 +0000 (21:07 +0900)]
[PATCH] ahci: add softreset
Now that libata is smart enough to handle both soft and hard resets,
add softreset method.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Wed, 22 Mar 2006 11:48:18 +0000 (20:48 +0900)]
[PATCH] libata: do not ignore PIO-only devices
As libata can now do PIO, don't ignore PIO-only devices.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Alan Cox [Wed, 22 Mar 2006 15:55:54 +0000 (15:55 +0000)]
[PATCH] libata: Symbol exports
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Alan Cox [Wed, 22 Mar 2006 15:54:04 +0000 (15:54 +0000)]
[PATCH] Update libata DMA blacklist to cover versions, and resync with IDE layer
Not much to say here, except that some drives have both fixed and bad firmware versions.
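Covering versions means a blacklist entry can name a model and, optionally, a firmware revision. A minimal sketch of such a lookup, assuming a NULL revision means every revision is bad; the table entries are placeholders rather than real drives, and the structure is not necessarily libata's:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry: rev == NULL matches every firmware revision. */
struct sketch_blacklist_entry {
	const char *model;
	const char *rev;
};

static const struct sketch_blacklist_entry sketch_dma_blacklist[] = {
	{ "EXAMPLE DRIVE A", NULL },	/* all revisions bad */
	{ "EXAMPLE DRIVE B", "1.00" },	/* only this firmware bad */
};

static int sketch_dma_blacklisted(const char *model, const char *rev)
{
	size_t n = sizeof(sketch_dma_blacklist) / sizeof(sketch_dma_blacklist[0]);

	for (size_t i = 0; i < n; i++) {
		const struct sketch_blacklist_entry *e = &sketch_dma_blacklist[i];

		if (strcmp(e->model, model) != 0)
			continue;
		if (e->rev == NULL || strcmp(e->rev, rev) == 0)
			return 1;
	}
	return 0;
}
```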
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Alan Cox [Wed, 22 Mar 2006 15:52:40 +0000 (15:52 +0000)]
[PATCH] libata: Fix a drive detection problem
The current code follows the spec but uses an overlong delay. That would
be fine if the hardware followed the spec too, but several vendors forget
the D7 pull-down. Fortunately, 0xFF isn't a sane reset state, so we can
use it to skip detection, as is done in drivers/ide (i.e. this solution
has been tested over a long time).
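Without a pull-down on D7 (the BSY bit), an empty bus reads back as all ones, so a BSY-polling loop would otherwise wait out the full timeout. A hypothetical sketch of a wait that treats 0xFF as "no device"; the poll loop, the scripted register reads and all names are assumptions, not libata's code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ATA_BUSY 0x80	/* BSY is bit 7 (D7) of the status register */

typedef uint8_t (*sketch_read_status_fn)(void *ctx);

/* Returns 1 if a device answered (BSY clear), 0 if the bus floats
 * (0xFF) or the poll budget runs out. */
static int sketch_wait_not_busy(sketch_read_status_fn read_status, void *ctx,
				unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		uint8_t status = read_status(ctx);

		if (status == 0xFF)
			return 0;	/* floating bus: skip detection */
		if (!(status & SKETCH_ATA_BUSY))
			return 1;	/* device is ready */
	}
	return 0;
}

/* Test double: replays a list of status values, repeating the last. */
struct sketch_script {
	const uint8_t *vals;
	unsigned int n, pos;
};

static uint8_t sketch_script_read(void *ctx)
{
	struct sketch_script *s = ctx;
	uint8_t v = s->vals[s->pos < s->n ? s->pos : s->n - 1];

	s->pos++;
	return v;
}
```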
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Alan Cox [Wed, 22 Mar 2006 15:47:34 +0000 (15:47 +0000)]
[PATCH] libata: note missing posting in mmio cmd write
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Jeff Garzik [Thu, 23 Mar 2006 00:13:54 +0000 (19:13 -0500)]
Merge branch 'master'
Alexey Kuznetsov [Wed, 22 Mar 2006 22:27:59 +0000 (14:27 -0800)]
[TCP]: Do not use inet->id of global tcp_socket when sending RST.
The problem is in ip_push_pending_frames(), which uses:
	if (!df) {
		__ip_select_ident(iph, &rt->u.dst, 0);
	} else {
		iph->id = htons(inet->id++);
	}
instead of ip_select_ident().
Right now I think the code is nonsense. Most likely, I copied it from
the old ip_build_xmit(), where it really was special: we had to decide
whether to generate a unique ID when generating the first (well, the last)
fragment.
In ip_push_pending_frames() it does not make sense; it should use plain
ip_select_ident() instead.
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 22 Mar 2006 21:57:25 +0000 (13:57 -0800)]
[NETFILTER]: Fix undefined references to get_h225_addr
get_h225_addr is exported, but declared static, which fails when
linking statically.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Mishin [Wed, 22 Mar 2006 21:56:56 +0000 (13:56 -0800)]
[NETFILTER]: futher {ip,ip6,arp}_tables unification
This patch moves {ip,ip6,arp}t_entry_{match,target} definitions to
x_tables.h. This move simplifies code and future compatibility fixes.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 22 Mar 2006 21:56:33 +0000 (13:56 -0800)]
[NETFILTER]: Fix xt_policy address matching
Fix a missing inversion in address matching; it was broken during the
conversion to x_tables.
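An inverted selector (the "!" form of an address option) must flip the raw match result, and that final step is what goes missing in this class of bug. A hypothetical sketch, not xt_policy's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address selector with an invert flag. */
struct sketch_addr_sel {
	uint32_t addr, mask;
	bool invert;
};

static bool sketch_addr_match(const struct sketch_addr_sel *sel, uint32_t a)
{
	bool hit = ((a ^ sel->addr) & sel->mask) == 0;

	/* Returning "hit" alone would silently ignore the invert flag. */
	return hit != sel->invert;
}
```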
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Wed, 22 Mar 2006 21:56:08 +0000 (13:56 -0800)]
[NETFILTER]: nf_conntrack: support for layer 3 protocol load on demand
x_tables matches and targets that require nf_conntrack_ipv[4|6] to work
don't have enough information to load these modules on demand. This
patch introduces the following changes to solve this issue:
o nf_ct_l3proto_try_module_get: tries to load the layer 3 connection
tracker module and increases its refcount.
o nf_ct_l3proto_module_put: drops the refcount of the module.
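The try-get/put pairing above amounts to "load on demand, then pin with a reference count". A user-space model with hypothetical names; sketch_request_module() only stands in for a module loader and always succeeds here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical registry slot for one layer 3 tracker. */
struct sketch_l3proto {
	bool loaded;
	int refcount;
};

/* Stand-in for a module loader; always succeeds in this sketch. */
static void sketch_request_module(struct sketch_l3proto *p)
{
	p->loaded = true;
}

static int sketch_l3proto_try_module_get(struct sketch_l3proto *p)
{
	if (!p->loaded)
		sketch_request_module(p);	/* load on demand */
	if (!p->loaded)
		return -1;			/* a real loader can fail */
	p->refcount++;				/* pin while in use */
	return 0;
}

static void sketch_l3proto_module_put(struct sketch_l3proto *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}
```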
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Wed, 22 Mar 2006 21:55:40 +0000 (13:55 -0800)]
[NETFILTER]: x_tables: set the protocol family in x_tables targets/matches
Set the family field in registered xt_[matches|targets].
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Wed, 22 Mar 2006 21:55:11 +0000 (13:55 -0800)]
[NETFILTER]: conntrack: cleanup the conntrack ID initialization
Currently the first conntrack ID assigned is 2; use 1 instead.
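Whether a counter hands out 1 or 2 first depends on its initial value and on whether it increments before returning. A hypothetical sketch using a pre-increment (the style of atomic_inc_return()); this illustrates the arithmetic and is not necessarily how conntrack assigns IDs.

```c
#include <assert.h>

/* Hypothetical ID counter. With "increment, then return", the first
 * ID is one more than the initial value: start at 1 and the first ID
 * is 2; start at 0 and it is 1. */
static unsigned int sketch_next_id;

static unsigned int sketch_alloc_id(void)
{
	return ++sketch_next_id;
}
```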
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Wed, 22 Mar 2006 21:54:40 +0000 (13:54 -0800)]
[NETFILTER]: nfnetlink_queue: fix nfnetlink message size
Fix the oversized message: use NLMSG_SPACE just once, since it reserves space
for the netlink header, and NFA_SPACE for every attribute.
Thanks to Harald Welte for the feedback
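The sizing rule works out as one netlink header (carrying the family-specific header) plus one aligned attribute header per attribute. A sketch that restates the rules locally, assuming 4-byte alignment, a 16-byte netlink header, a 4-byte attribute header and a 4-byte family header; the `SK_` macros are local re-statements, not the kernel's macros.

```c
#include <assert.h>
#include <stddef.h>

#define SK_ALIGN4(len)		(((len) + 3) & ~(size_t)3)
#define SK_NLMSG_HDRLEN		((size_t)16)
#define SK_NLMSG_SPACE(len)	SK_ALIGN4((len) + SK_NLMSG_HDRLEN)
#define SK_NFA_HDRLEN		((size_t)4)
#define SK_NFA_SPACE(len)	SK_ALIGN4((len) + SK_NFA_HDRLEN)

/* Netlink header space is counted exactly once; each attribute adds
 * its own aligned header plus payload. */
static size_t sketch_msg_size(size_t genhdr_len, const size_t *attr_lens,
			      size_t nattrs)
{
	size_t size = SK_NLMSG_SPACE(genhdr_len);

	for (size_t i = 0; i < nattrs; i++)
		size += SK_NFA_SPACE(attr_lens[i]);
	return size;
}
```

For a 4-byte family header and attributes of 8 and 6 bytes this gives 20 + 12 + 12 = 44 bytes.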
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Wed, 22 Mar 2006 21:54:15 +0000 (13:54 -0800)]
[NETFILTER]: ctnetlink: Fix expectaction mask dumping
The expectation mask has some particularities that require different
handling. The protocol number fields can be set to non-valid protocols,
i.e. l3num is set to 0xFFFF. Since that protocol does not exist, the mask
tuple will not be dumped. Moreover, this results in a kernel panic when
nf_conntrack accesses the array of protocol handlers, which is PF_MAX (0x1F)
long.
This patch introduces the function ctnetlink_exp_dump_mask, which correctly
dumps the expectation mask. This function uses the l3num value from the
expectation tuple, which is a valid layer 3 protocol number. The value of the
l3num mask isn't dumped since it is meaningless from the userspace side.
Thanks to Yasuyuki Kozakai and Patrick McHardy for the feedback.
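The panic comes from using a wildcard mask value as an array index. A hypothetical sketch of a bounds-checked handler lookup; the table size and names are assumptions, not nf_conntrack's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NPROTO 32	/* assumed table length for the sketch */

struct sketch_l3handler {
	const char *name;
};

static const struct sketch_l3handler *sketch_l3protos[SKETCH_NPROTO];

/* A mask value such as 0xFFFF means "match anything", not a protocol
 * number. Callers should take the family from the tuple; the bounds
 * check keeps a stray wildcard from indexing past the table. */
static const struct sketch_l3handler *sketch_find_l3proto(uint16_t l3num)
{
	if (l3num >= SKETCH_NPROTO)
		return NULL;
	return sketch_l3protos[l3num];
}
```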
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Vögtle [Wed, 22 Mar 2006 21:53:48 +0000 (13:53 -0800)]
[NETFILTER]: Fix Kconfig typos
Signed-off-by: Thomas Vögtle <tv@lio96.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 22 Mar 2006 21:53:20 +0000 (13:53 -0800)]
[NETFILTER]: Fix ip6tables breakage from {get,set}sockopt compat layer
do_ipv6_getsockopt returns -EINVAL for unknown options, not
-ENOPROTOOPT as do_ipv6_setsockopt does.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Hovland [Wed, 22 Mar 2006 21:02:11 +0000 (21:02 +0000)]
[ARM] 3389/1: typo and grammar fix
Patch from Erik Hovland
I found a typo and what seems to be a run-on sentence in
arch/arm/common/dmabounce.c.
This patch corrects both.
Signed-off-by: Erik Hovland <erik@hovland.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Andrew Victor [Wed, 22 Mar 2006 20:14:14 +0000 (20:14 +0000)]
[ARM] 3386/1: AT91RM9200 Clock update
Patch from Andrew Victor
This patch includes a few changes to the clock support on the
AT91RM9200.
1. Added definitions for Ethernet, MMC, TWI, USARTs, and SPI peripheral
clocks.
2. Replaced some hard-coded hex values with the text definitions in
at91rm9200_sys.h.
3. If the USB96M bit is set for PLLB, the rate of PLLB itself is not
affected; only the USB Host/Device clocks derived from it are.
Issue reported by Sergei Sharonov.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Andrew Victor [Wed, 22 Mar 2006 20:14:13 +0000 (20:14 +0000)]
[ARM] 3384/1: AT91RM9200: Timer
Patch from Andrew Victor
If the timer interrupt is ever significantly delayed (or after the
system was suspended), the system could spin incrementing the time for
too long.
The fix is to replace the "do {} while" with a "while {}".
Original patch by Savin Zlobec and Peter Menzebach.
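The loop form matters because a do {} while runs its body once before testing the condition. In a catch-up loop over a free-running counter, an unconditional first tick can push `last` past `now`, after which the unsigned difference `now - last` wraps to a huge value and the loop keeps going. A hypothetical sketch of the while {} form; the names and counter width are assumptions, not the AT91 driver's code.

```c
#include <assert.h>

/* Account for every whole period elapsed between *last and now.
 * Testing first means nothing happens if less than one period has
 * passed, so *last never overtakes now. */
static unsigned int sketch_catch_up(unsigned long now, unsigned long *last,
				    unsigned long period)
{
	unsigned int ticks = 0;

	while (now - *last >= period) {
		ticks++;		/* stands in for one timer tick */
		*last += period;
	}
	return ticks;
}
```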
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Lennert Buytenhek [Wed, 22 Mar 2006 20:14:12 +0000 (20:14 +0000)]
[ARM] 3382/1: ixp2000: unify defconfigs
Patch from Lennert Buytenhek
Unify the five existing ixp2000 defconfigs into one defconfig.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Lennert Buytenhek [Wed, 22 Mar 2006 20:14:11 +0000 (20:14 +0000)]
[ARM] 3381/1: ixp2000: fix slowport write timing control register fields
Patch from Lennert Buytenhek
The original version of the chip docs had the PW and SU fields in
the slowport write timing control register accidentally reversed.
This is mentioned in the errata (documentation change #4) and fixed
in newer docs.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Lennert Buytenhek [Wed, 22 Mar 2006 20:14:11 +0000 (20:14 +0000)]
[ARM] 3380/1: ixp2000: simplify ixdp2x00_master_npu() check
Patch from Lennert Buytenhek
On the IXDP2x00s, the NPU that is PCI master is always the egress
(i.e. 'master') NPU. At least on the IXDP2800, both NPUs have flash,
so the ixp2000_has_flash() check in ixdp2x00_master_npu() is useless.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Lennert Buytenhek [Wed, 22 Mar 2006 20:14:09 +0000 (20:14 +0000)]
[ARM] 3379/1: ixp2000: use generic 8250 debug macros
Patch from Lennert Buytenhek
The xscale UART in the ixp2000 is basically just an 8250 UART (with
some extra bits and pieces), so we can use the generic 8250 debug
macros on the ixp2000.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Lennert Buytenhek [Wed, 22 Mar 2006 20:14:09 +0000 (20:14 +0000)]
[ARM] 3378/1: ixp2000: fix gpio interrupt handling
Patch from Lennert Buytenhek
ixp2000 used to initially mark GPIO interrupts as invalid, and not
mark them valid until set_irq_type() was called, but this doesn't
work if you want to use request_irq() with the SA_TRIGGER_* flags.
So, just mark the GPIO interrupts valid from the beginning. We
configure GPIOs as inputs when set_irq_type() is called anyway, so
this shouldn't be a problem.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Linus Torvalds [Wed, 22 Mar 2006 18:59:20 +0000 (10:59 -0800)]
Merge git://git./linux/kernel/git/perex/alsa
* git://git.kernel.org/pub/scm/linux/kernel/git/perex/alsa: (124 commits)
[ALSA] version 1.0.11rc4
[PATCH] Intruduce DMA_28BIT_MASK
[ALSA] hda-codec - Add support for ASUS P4GPL-X
[ALSA] hda-codec - Add support for HP nx9420 laptop
[ALSA] Fix memory leaks in error path of control.c
[ALSA] AMD Au1x00: AC'97 controller is memory mapped
[ALSA] AMD Au1x00: fix DMA init/cleanup
[ALSA] hda-codec - Fix generic auto-configurator
[ALSA] hda-codec - Fix BIOS auto-configuration
[ALSA] Fixes typos in Audiophile-USB.txt
[ALSA] ice1712 - typo fixes for dxr_enable module option
[ALSA] AMD Au1x00: make driver build after cleanup
[ALSA] ice1712 - Fix wrong value types for enum items
[ALSA] fix resource leak in usbmixer
[ALSA] Fix gus_pcm dereference before NULL
[ALSA] Fix seq_clientmgr dereferences before NULL check
[ALSA] hda-codec - Fix for Samsung R65 and ASUS A6J
[ALSA] hda-codec - Add support for VAIO FE550G and SZ110
[ALSA] usb-audio: add Maya44 mixer control names
[ALSA] usb-audio: add Casio PL-40R support
...
Linus Torvalds [Wed, 22 Mar 2006 18:58:05 +0000 (10:58 -0800)]
Merge git://git./linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
fixed path to moved file in include/linux/device.h
Fix spelling in E1000_DISABLE_PACKET_SPLIT Kconfig description
Documentation/dvb/get_dvb_firmware: fix firmware URL
Documentation: Update to BUG-HUNTING
Remove superfluous NOTIFY_COOKIE_LEN define
add "tags" to .gitignore
Fix "frist", "fisrt", typos
fix rwlock usage example
It's UTF-8
Linus Torvalds [Wed, 22 Mar 2006 18:56:57 +0000 (10:56 -0800)]
Merge /pub/scm/linux/kernel/git/davem/sparc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: Add a secondary TSB for hugepage mappings.
[SPARC]: Respect vm_page_prot in io_remap_page_range().
Linus Torvalds [Wed, 22 Mar 2006 18:56:23 +0000 (10:56 -0800)]
Merge /pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[TG3]: Bump driver version and reldate.
[TG3]: Skip phy power down on some devices
[TG3]: Fix SRAM access during tg3_init_one()
[X25]: dte facilities 32 64 ioctl conversion
[X25]: allow ITU-T DTE facilities for x25
[X25]: fix kernel error message 64 bit kernel
[X25]: ioctl conversion 32 bit user to 64 bit kernel
[NET]: socket timestamp 32 bit handler for 64 bit kernel
[NET]: allow 32 bit socket ioctl in 64 bit kernel
[BLUETOOTH]: Return negative error constant
Linus Torvalds [Wed, 22 Mar 2006 18:47:24 +0000 (10:47 -0800)]
Merge /pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (138 commits)
[SCSI] libata: implement minimal transport template for ->eh_timed_out
[SCSI] eliminate rphy allocation in favour of expander/end device allocation
[SCSI] convert mptsas over to end_device/expander allocations
[SCSI] allow displaying and setting of cache type via sysfs
[SCSI] add scsi_mode_select to scsi_lib.c
[SCSI] 3ware 9000 add big endian support
[SCSI] qla2xxx: update MAINTAINERS
[SCSI] scsi: move target_destroy call
[SCSI] fusion - bump version
[SCSI] fusion - expander hotplug suport in mptsas module
[SCSI] fusion - exposing raid components in mptsas
[SCSI] fusion - memory leak, and initializing fields
[SCSI] fusion - exclosure misspelled
[SCSI] fusion - cleanup mptsas event handling functions
[SCSI] fusion - removing target_id/bus_id from the VirtDevice structure
[SCSI] fusion - static fix's
[SCSI] fusion - move some debug firmware event debug msgs to verbose level
[SCSI] fusion - loginfo header update
[SCSI] add scsi_reprobe_device
[SCSI] megaraid_sas: fix extended timeout handling
...
James Morris [Wed, 22 Mar 2006 08:09:22 +0000 (00:09 -0800)]
[PATCH] SELinux: add slab cache for inode security struct
Add a slab cache for the SELinux inode security struct, one of which is
allocated for every inode instantiated by the system.
The memory savings are considerable.
On 64-bit, instead of the size-128 cache, we have a slab object of 96
bytes, saving 32 bytes per object. After booting, I see about 4000 of
these and then about 17,000 after a kernel compile. With this patch, we
save around 530KB of kernel memory in the latter case. On 32-bit, the
savings are about half of this.
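The quoted saving follows directly from the figures above (reading KB as KiB):

```latex
\[
(128 - 96)\ \text{bytes} = 32\ \text{bytes per object}, \qquad
17\,000 \times 32\ \text{bytes} = 544\,000\ \text{bytes} \approx 531\ \text{KiB}
\]
```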
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
James Morris [Wed, 22 Mar 2006 08:09:21 +0000 (00:09 -0800)]
[PATCH] SELinux: cleanup stray variable in selinux_inode_init_security()
Remove an unneeded pointer variable in selinux_inode_init_security().
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
James Morris [Wed, 22 Mar 2006 08:09:20 +0000 (00:09 -0800)]
[PATCH] SELinux: fix hard link count for selinuxfs root directory
A further fix is needed for selinuxfs link count management, to ensure that
the count is correct for the parent directory when a subdirectory is
created. This is only required for the root directory currently, but the
code has been updated for the general case.
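The invariant behind these link-count fixes is that a directory starts with two links (its entry in the parent and its own ".") and gains one for each subdirectory's "..". A hypothetical in-memory model, not selinuxfs code:

```c
#include <assert.h>

/* Hypothetical directory model for the sketch. */
struct sketch_dir {
	unsigned int nlink;
};

static void sketch_init_dir(struct sketch_dir *d)
{
	d->nlink = 2;	/* entry in parent + its own "." */
}

/* Creating a subdirectory must also bump the parent, because the
 * child's ".." is another link to it. */
static void sketch_make_subdir(struct sketch_dir *parent,
			       struct sketch_dir *child)
{
	sketch_init_dir(child);
	parent->nlink++;
}
```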
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
James Morris [Wed, 22 Mar 2006 08:09:19 +0000 (00:09 -0800)]
[PATCH] selinuxfs cleanups: sel_make_avc_files
Fix copy & paste error in sel_make_avc_files(), removing a spurious call to
d_genocide() in the error path. All of this will be cleaned up by
kill_litter_super().
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
James Morris [Wed, 22 Mar 2006 08:09:18 +0000 (00:09 -0800)]
[PATCH] selinuxfs cleanups: sel_make_bools
Remove the call to sel_make_bools() from sel_fill_super(), as policy needs to
be loaded before the boolean files can be created. Policy will never be
loaded during sel_fill_super() as selinuxfs is kernel mounted during init and
the only means to load policy is via selinuxfs.
Also, the call to d_genocide() on the error path of sel_make_bools() is
incorrect and replaced with sel_remove_bools().
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
James Morris [Wed, 22 Mar 2006 08:09:17 +0000 (00:09 -0800)]
[PATCH] selinuxfs cleanups: sel_fill_super exit path
Unify the error path of sel_fill_super() so that all errors pass through the
same point and generate an error message. Also, remove a spurious dput() in
the error path which breaks the refcounting for the filesystem
(kill_litter_super() will correctly clean things up itself on error).
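A single exit path is usually written with one goto label that every failure jumps to. A hypothetical user-space sketch of that shape; the step function is a placeholder, and, mirroring the message above, the error path only reports and leaves teardown to the caller rather than releasing objects itself.

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder setup step: fails when its id equals fail_at. */
static int sketch_step(int id, int fail_at)
{
	return id == fail_at ? -1 : 0;
}

static int sketch_fill(int fail_at)
{
	int ret;

	ret = sketch_step(1, fail_at);
	if (ret)
		goto err;
	ret = sketch_step(2, fail_at);
	if (ret)
		goto err;
	ret = sketch_step(3, fail_at);
	if (ret)
		goto err;
	return 0;
err:
	/* One place to report; no ad hoc releases here. */
	fprintf(stderr, "sketch_fill: failed (%d)\n", ret);
	return ret;
}
```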
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
James Morris [Wed, 22 Mar 2006 08:09:17 +0000 (00:09 -0800)]
[PATCH] selinuxfs cleanups: use sel_make_dir()
Use existing sel_make_dir() helper to create booleans directory rather than
duplicating the logic.
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
James Morris [Wed, 22 Mar 2006 08:09:16 +0000 (00:09 -0800)]
[PATCH] selinuxfs cleanups: fix hard link count
Fix the hard link count for selinuxfs directories, which are currently one
short.
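As a rough illustration of the convention involved: a directory's link count is normally 2 (its name in the parent plus its own "." entry) plus one for each subdirectory's ".." entry. The in-memory model below is hypothetical (struct model_inode and the model_* helpers are not selinuxfs code) and only shows that accounting.

```c
#include <assert.h>

/* Hypothetical in-memory inode; only the link count matters here. */
struct model_inode {
    unsigned int nlink;
};

/* A new directory is referenced by its name in the parent and by its
 * own "." entry, so it starts at 2 rather than 1. */
static void model_init_dir(struct model_inode *dir)
{
    dir->nlink = 2;
}

/* A new subdirectory's ".." entry points back at the parent, so the
 * parent's link count must be bumped as well. */
static void model_mkdir(struct model_inode *parent, struct model_inode *child)
{
    model_init_dir(child);
    parent->nlink++;
}
```

Creating two subdirectories under a fresh directory leaves the parent at 4 and each child at 2.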
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stephen Smalley [Wed, 22 Mar 2006 08:09:15 +0000 (00:09 -0800)]
[PATCH] selinux: simplify sel_read_bool
Simplify sel_read_bool to use the simple_read_from_buffer helper, like the
other selinuxfs functions.
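For readers unfamiliar with the helper, here is a userspace sketch of the semantics simple_read_from_buffer() provides: clamp the copy to what remains after *ppos, advance *ppos, and return 0 at the end of the buffer. memcpy() stands in for copy_to_user() and long stands in for loff_t. model_read_bool() and its "%d %d" (current and pending value) format are assumptions about what a boolean file reports, not a copy of sel_read_bool().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Userspace model of simple_read_from_buffer(): copy at most `count`
 * bytes of `from` (which holds `available` bytes) starting at *ppos,
 * then advance *ppos. memcpy() stands in for copy_to_user(). */
static ssize_t model_read_from_buffer(char *to, size_t count, long *ppos,
                                      const char *from, size_t available)
{
    long pos = *ppos;

    if (pos < 0)
        return -EINVAL;
    if ((size_t)pos >= available)
        return 0;                        /* end of buffer */
    if (count > available - (size_t)pos)
        count = available - (size_t)pos; /* short read at the tail */
    memcpy(to, from + pos, count);
    *ppos = pos + (long)count;
    return (ssize_t)count;
}

/* Hypothetical boolean-file read: format "<current> <pending>" into a
 * scratch buffer and let the helper deal with offsets. */
static ssize_t model_read_bool(int cur, int pending, char *buf,
                               size_t count, long *ppos)
{
    char page[32];
    int length = snprintf(page, sizeof(page), "%d %d", cur, pending);

    return model_read_from_buffer(buf, count, ppos, page, (size_t)length);
}
```

The read handler no longer needs its own offset and length arithmetic; a 2-byte read followed by a larger one returns "1 ", then "0", then 0 (EOF).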
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ingo Molnar [Wed, 22 Mar 2006 08:09:14 +0000 (00:09 -0800)]
[PATCH] sem2mutex: security/
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Stephen Smalley <sds@epoch.ncsc.mil>
Cc: James Morris <jmorris@namei.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stephen Smalley [Wed, 22 Mar 2006 08:09:13 +0000 (00:09 -0800)]
[PATCH] selinux: Disable automatic labeling of new inodes when no policy is loaded
This patch disables the automatic labeling of new inodes on disk
when no policy is loaded.
Discussion is here:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=180296
In short, we're changing the behavior so that when no policy is loaded,
SELinux does not label files at all. Currently it does add an 'unlabeled'
label in this case, which we've found causes problems later.
SELinux always maintains a safe internal label if there is none, so with this
patch, we just stick with that and wait until a policy is loaded before adding
a persistent label on disk.
The effect is simply that if you boot with SELinux enabled but no policy
loaded and create a file in that state, SELinux won't try to set a security
extended attribute on the new inode on the disk. This is the only sane
behavior for SELinux in that state, as it cannot determine the right label to
assign in the absence of a policy. That state usually doesn't occur, but the
rawhide installer seemed to be misbehaving temporarily so it happened to show
up on a test install.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Christoph Lameter [Wed, 22 Mar 2006 08:09:12 +0000 (00:09 -0800)]
[PATCH] page migration reorg
Centralize the page migration functions in anticipation of additional
tinkering. This creates a new file, mm/migrate.c.
1. Extract buffer_migrate_page() from fs/buffer.c
2. Extract central migration code from vmscan.c
3. Extract some components from mempolicy.c
4. Export pageout() and remove_from_swap() from vmscan.c
5. Make it possible to configure NUMA systems without page migration
and non-NUMA systems with page migration.
I had to do some #ifdeffing in mempolicy.c that may need a cleanup.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Wed, 22 Mar 2006 08:09:11 +0000 (00:09 -0800)]
[PATCH] mm: slab cache interleave rotor fix
The alien cache rotor in mm/slab.c assumes that the first online node is
node 0. Eventually for some archs, especially with hotplug, this will no
longer be true.
Fix the interleave rotor to handle the general case of node numbering.
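A userspace sketch of a rotor that makes no assumption about node numbering, assuming a 64-bit mask stands in for node_online_map: advance to the next online node and, when none is left, wrap to the first online node instead of to node 0. The model_* helpers are hypothetical analogues of the kernel's next_node() and first_node().

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_NUMNODES 64   /* assumption: at most 64 nodes */

/* Lowest online node strictly above `node`, or MODEL_MAX_NUMNODES if
 * there is none (mirrors next_node()). */
static int model_next_node(int node, uint64_t online)
{
    for (int n = node + 1; n < MODEL_MAX_NUMNODES; n++)
        if (online & (UINT64_C(1) << n))
            return n;
    return MODEL_MAX_NUMNODES;
}

/* Lowest online node (mirrors first_node()). */
static int model_first_node(uint64_t online)
{
    return model_next_node(-1, online);
}

/* Advance the rotor, wrapping to the first online node rather than to
 * a hard-coded node 0. */
static int model_next_rotor(int node, uint64_t online)
{
    node = model_next_node(node, online);
    if (node >= MODEL_MAX_NUMNODES)
        node = model_first_node(online);
    return node;
}
```

With only nodes 2 and 5 online, the rotor alternates 2, 5, 2, ... and never touches node 0.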
Signed-off-by: Paul Jackson <pj@sgi.com>
Acked-by: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Jackson [Wed, 22 Mar 2006 08:09:10 +0000 (00:09 -0800)]
[PATCH] mm: hugetlb alloc_fresh_huge_page bogus node loop fix
Fix bogus node loop in hugetlb.c alloc_fresh_huge_page(), which was
assuming that nodes are numbered contiguously from 0 to num_online_nodes().
Once the hotplug folks get this far, that will be false.
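To see why contiguous numbering matters, the sketch below assumes the old loop advanced with nid = (nid + 1) % num_online_nodes() (an assumption consistent with the description above) and compares it with a walk that only visits set bits in an online-node mask. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits in the mask (stand-in for num_online_nodes()). */
static int model_num_online(uint64_t online)
{
    int count = 0;

    for (int n = 0; n < 64; n++)
        if (online & (UINT64_C(1) << n))
            count++;
    return count;
}

/* Old-style walk: assumes online nodes are numbered 0..count-1.
 * Returns a mask of the node ids visited in `steps` allocations. */
static uint64_t model_visit_modulo(uint64_t online, int steps)
{
    uint64_t visited = 0;
    int nid = 0;

    for (int i = 0; i < steps; i++) {
        visited |= UINT64_C(1) << nid;
        nid = (nid + 1) % model_num_online(online);
    }
    return visited;
}

/* Next online node after `nid`, wrapping around; assumes at least one
 * bit is set in `online`. */
static int model_next_online(int nid, uint64_t online)
{
    for (int i = 1; i <= 64; i++) {
        int n = (nid + i) % 64;
        if (online & (UINT64_C(1) << n))
            return n;
    }
    return nid;
}

/* Fixed walk: only ever lands on nodes that are actually online. */
static uint64_t model_visit_online(uint64_t online, int steps)
{
    uint64_t visited = 0;
    int nid = model_next_online(63, online);   /* lowest online node */

    for (int i = 0; i < steps; i++) {
        visited |= UINT64_C(1) << nid;
        nid = model_next_online(nid, online);
    }
    return visited;
}
```

With nodes 0 and 2 online, the modulo walk visits node 0 and the offline node 1 and never reaches node 2; the mask walk visits exactly nodes 0 and 2.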
Signed-off-by: Paul Jackson <pj@sgi.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Akinobu Mita [Wed, 22 Mar 2006 08:09:09 +0000 (00:09 -0800)]
[PATCH] fix swap cluster offset
When we've allocated SWAPFILE_CLUSTER pages, ->cluster_next should be the
first index of the swap cluster. But the current code probably sets it to
the wrong offset.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Christoph Lameter [Wed, 22 Mar 2006 08:09:08 +0000 (00:09 -0800)]
[PATCH] drain_node_pages: interrupt latency reduction / optimization
1. Only disable interrupts if there is actually something to free
2. Only dirty the pcp cacheline if we actually freed something.
3. Disable interrupts for each single pcp and not for cleaning
all the pcps in all zones of a node.
drain_node_pages is called every 2 seconds from cache_reap. This
fix should avoid most disabling of interrupts.
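The interrupt-holdoff idea can be sketched in userspace: check each per-cpu list before paying for an interrupt-off section, and open one such section per list rather than one for the whole node. struct model_pcp and the counting stubs model_irq_disable()/model_irq_enable() are hypothetical stand-ins for the pcp structure and the kernel's interrupt disabling.

```c
#include <assert.h>

/* Hypothetical stand-in for a per-cpu page list (pcp). */
struct model_pcp {
    int count;      /* pages currently held on this list */
};

static int irq_off_sections;   /* how often "interrupts" were disabled */

static void model_irq_disable(void) { irq_off_sections++; }
static void model_irq_enable(void)  { }

/* Drain every pcp of a node, but only disable interrupts (and only
 * write to the pcp) when that list actually holds pages. */
static int model_drain_node_pages(struct model_pcp *pcps, int npcps)
{
    int freed = 0;

    for (int i = 0; i < npcps; i++) {
        if (pcps[i].count == 0)
            continue;               /* nothing to free: stay read-only */
        model_irq_disable();        /* per pcp, not for the whole node */
        freed += pcps[i].count;
        pcps[i].count = 0;
        model_irq_enable();
    }
    return freed;
}
```

Draining four lists of which two are empty opens two interrupt-off sections; a second pass over the now-empty lists opens none.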
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Christoph Lameter [Wed, 22 Mar 2006 08:09:07 +0000 (00:09 -0800)]
[PATCH] slab: fix drain_array() so that it works correctly with the shared_array
The list_lock also protects the shared array, and we call drain_array() on
the shared array. Therefore we cannot go as far as I wanted to, but have to
take the lock in a way that also protects the array_cache in
drain_pages.
(Note: maybe we should make the array_cache locking more consistent? I.e.
always take the array cache lock for shared arrays and disable interrupts
for the per cpu arrays?)
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Christoph Lameter [Wed, 22 Mar 2006 08:09:07 +0000 (00:09 -0800)]
[PATCH] slab: remove drain_array_locked
Remove drain_array_locked and use that opportunity to further limit the time
the l3 lock is held.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Christoph Lameter [Wed, 22 Mar 2006 08:09:06 +0000 (00:09 -0800)]
[PATCH] slab: make drain_array more universal by adding more parameters
Add a parameter to drain_array() to control the freeing of all objects, and
then use drain_array() to replace the instances of drain_array_locked.
Doing so will avoid taking locks in those locations if the arrays are
empty.
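A userspace model of a drain_array() with a force parameter, loosely following the behaviour described in this series: empty arrays return before any lock is taken, recently touched arrays are skipped unless forcing, and a non-forced drain frees about a fifth of the array's limit, capped at half the available entries. The struct layout, the one-fifth heuristic and the lock counter are assumptions for illustration, not a copy of mm/slab.c.

```c
#include <assert.h>
#include <string.h>

#define MODEL_LIMIT 16   /* assumption: per-array capacity */

/* Hypothetical stand-in for struct array_cache. */
struct model_array_cache {
    int avail;                    /* objects currently cached */
    int limit;                    /* capacity of the array */
    int touched;                  /* allocated from since last reap? */
    void *entry[MODEL_LIMIT];
};

static int lock_acquisitions;     /* how often "list_lock" was taken */

/* Drain a cached-object array. force != 0 frees every object; else a
 * fraction is freed. Returns the number of objects freed. */
static int model_drain_array(struct model_array_cache *ac, int force)
{
    int tofree;

    if (!ac || !ac->avail)
        return 0;                 /* empty: no locking at all */
    if (ac->touched && !force) {
        ac->touched = 0;          /* recently used: skip this round */
        return 0;
    }
    lock_acquisitions++;          /* spin_lock_irq(&l3->list_lock) */
    tofree = force ? ac->avail : (ac->limit + 4) / 5;
    if (tofree > ac->avail)
        tofree = (ac->avail + 1) / 2;
    ac->avail -= tofree;          /* free_block() would release them */
    memmove(ac->entry, &ac->entry[tofree], sizeof(void *) * ac->avail);
    return tofree;                /* spin_unlock_irq() */
}
```

A non-forced drain of 10 cached objects with a limit of 16 frees 4; a forced drain then frees the remaining 6; a third call on the empty array returns without taking the lock.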
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Christoph Lameter [Wed, 22 Mar 2006 08:09:05 +0000 (00:09 -0800)]
[PATCH] slab: cache_reap(): further reduction in interrupt holdoff
cache_reap takes the l3->list_lock (disabling interrupts) unconditionally
and then does a few checks and maybe does some cleanup. This patch makes
cache_reap() only take the lock if there is work to do and then the lock is
taken and released for each cleaning action.
The checking of when to do the next reaping is done without any locking and
becomes racy. This should not matter, since reaping can also be skipped if
the slab mutex cannot be acquired.
The same is true for the touched processing. If we get this wrong once in
a while, then we will mistakenly clean or not clean the shared cache. This
will impact performance slightly.
Note that the additional drain_array() function introduced here will fall
out in a subsequent patch since array cleaning will now be very similar
from all callers.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rafael J. Wysocki [Wed, 22 Mar 2006 08:09:04 +0000 (00:09 -0800)]
[PATCH] mm: make shrink_all_memory try harder
Make shrink_all_memory() repeat the attempts to free more memory if there
seem to be no pages to free.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chen, Kenneth W [Wed, 22 Mar 2006 08:09:03 +0000 (00:09 -0800)]
[PATCH] optimize follow_hugetlb_page
follow_hugetlb_page() walks a range of user virtual addresses and fills
in a list of struct page pointers in an array passed in by the caller. It
also takes a reference count via get_page(). For a compound page,
get_page() actually traverses back to the head page via the page_private()
macro and then adds a reference count to the head page. Since we are doing
a virt-to-pte lookup, the kernel already has a struct page pointer to the
head page. So instead of traversing into the small unit page struct and then
following a link back to the head page, optimize this by incrementing the
reference count directly on the head page.
The benefit is that we don't take a cache miss on accessing the page struct
for the corresponding user address and, more importantly, we don't pollute
the cache with a "not very useful" round trip of pointer chasing. This gives
a moderate performance gain on an I/O-intensive database transaction
workload.
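The difference in memory traffic can be modelled with a toy struct page in which every page of a compound page points back at its head. The model counts struct page accesses as a rough proxy for cache lines touched; struct model_page and both helpers are hypothetical, and real page structs are not laid out this way.

```c
#include <assert.h>

/* Hypothetical, stripped-down struct page: every page of a compound
 * page records its head (page_private() in the kernel). */
struct model_page {
    int compound;
    struct model_page *head;
    int count;
};

static int page_struct_touches;   /* rough proxy for cache lines touched */

/* Old path: load the subpage's struct, then chase back to the head. */
static void model_get_page(struct model_page *page)
{
    page_struct_touches++;
    if (page->compound) {
        page = page->head;
        page_struct_touches++;
    }
    page->count++;
}

/* New path: the pte lookup already yields the head page, so take the
 * reference there and never load the subpage's struct. */
static void model_get_head_page(struct model_page *head)
{
    page_struct_touches++;
    head->count++;
}
```

Taking three references through tail pages touches six page structs in the model; taking them on the head directly touches three, and the head's count ends up the same either way.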
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chen, Kenneth W [Wed, 22 Mar 2006 08:09:02 +0000 (00:09 -0800)]
[PATCH] convert hugetlbfs_counter to atomic
Implementation of hugetlbfs_counter() is functionally equivalent to
atomic_inc_return(). Use the simpler atomic form.
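In userspace, C11 atomics give the same increment-and-return-new-value operation. The sketch assumes the old helper was a lock-protected pre-increment of a static counter; model_hugetlbfs_counter() is a hypothetical name. Note that atomic_fetch_add() returns the old value, so one is added to match atomic_inc_return().

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace analogue of atomic_inc_return(): add one and return the
 * new value. atomic_fetch_add() returns the old value, hence "+ 1". */
static int model_atomic_inc_return(atomic_int *v)
{
    return atomic_fetch_add(v, 1) + 1;
}

static atomic_int model_counter = 0;

/* Hypothetical counter in the spirit of hugetlbfs_counter(): one atomic
 * operation replaces a lock, an increment and a read. */
static int model_hugetlbfs_counter(void)
{
    return model_atomic_inc_return(&model_counter);
}
```

Successive calls return 1, 2, ... with no lock, and concurrent callers each still get a distinct value.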
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Gibson [Wed, 22 Mar 2006 08:09:01 +0000 (00:09 -0800)]
[PATCH] hugepage: is_aligned_hugepage_range() cleanup
Quite a long time back, prepare_hugepage_range() replaced
is_aligned_hugepage_range() as the callback from mm/mmap.c to arch code to
verify if an address range is suitable for a hugepage mapping.
is_aligned_hugepage_range() stuck around, but only to implement
prepare_hugepage_range() on archs which didn't implement their own.
Most archs (everything except ia64 and powerpc) used the same
implementation of is_aligned_hugepage_range(). On powerpc, which
implements its own prepare_hugepage_range(), the custom version was never
used.
In addition, "is_aligned_hugepage_range()" was a bad name, because it
suggests it returns true iff the given range is a good hugepage range,
whereas in fact it returns 0-or-error (so the sense is reversed).
This patch cleans up by abolishing is_aligned_hugepage_range(). Instead
prepare_hugepage_range() is defined directly. Most archs use the default
version, which simply checks the given region is aligned to the size of a
hugepage. ia64 and powerpc define custom versions. The ia64 one simply
checks that the range is in the correct address space region in addition to
being suitably aligned. The powerpc version (just as previously) checks
for suitable addresses, and if necessary performs low-level MMU frobbing to
set up new areas for use by hugepages.
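The default check described above can be sketched as follows, assuming 2 MiB hugepages (the real HPAGE_SIZE is per-architecture). The point of interest is the sense of the return value: 0 means the range is acceptable and a negative errno means it is not, which is why a name starting with "is_" was misleading. The model_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Assumption for the sketch: 2 MiB hugepages. */
#define MODEL_HPAGE_SHIFT 21
#define MODEL_HPAGE_SIZE  (1UL << MODEL_HPAGE_SHIFT)
#define MODEL_HPAGE_MASK  (~(MODEL_HPAGE_SIZE - 1))

/* Default check: 0 if both start and length are hugepage-aligned,
 * -EINVAL otherwise. Note the 0-or-error sense: 0 means "OK". */
static int model_prepare_hugepage_range(unsigned long addr, unsigned long len)
{
    if (len & ~MODEL_HPAGE_MASK)
        return -EINVAL;
    if (addr & ~MODEL_HPAGE_MASK)
        return -EINVAL;
    return 0;
}
```

Callers therefore test for failure with if (model_prepare_hugepage_range(addr, len)), the opposite of what a predicate name suggests.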
No libhugetlbfs testsuite regressions on ppc64 (POWER5 LPAR).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Gibson [Wed, 22 Mar 2006 08:08:59 +0000 (00:08 -0800)]
[PATCH] hugepage: Move hugetlb_free_pgd_range() prototype to hugetlb.h
The optional hugepage callback, hugetlb_free_pgd_range(), is presently
implemented non-trivially only on ia64 (but I plan to add one for powerpc
shortly). It has its own prototype for the function in asm-ia64/pgtable.h.
However, since the function is called from generic code, it makes sense for
its prototype to be in the generic hugetlb.h header file, as the prototypes
of other arch callbacks already are (prepare_hugepage_range(),
set_huge_pte_at(), etc.). This patch makes it so.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Gibson [Wed, 22 Mar 2006 08:08:58 +0000 (00:08 -0800)]
[PATCH] hugepage: Fix hugepage logic in free_pgtables() harder
Turns out the hugepage logic in free_pgtables() was doubly broken. The
loop coalescing multiple normal page VMAs into one call to free_pgd_range()
had an off by one error, which could mean it would coalesce one hugepage
VMA into the same bundle (checking 'vma' not 'next' in the loop). I
transferred this bug into the new is_vm_hugetlb_page() based version.
Here's the fix.
This one didn't bite on powerpc previously for the same reason the
is_hugepage_only_range() problem didn't: powerpc's hugetlb_free_pgd_range()
is identical to free_pgd_range(). It didn't bite on ia64 because the
hugepage region is distant enough from any other region that the separated
PMD_SIZE distance test would always prevent coalescing the two together.
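A toy model of the coalescing loop reproduces the off-by-one. It assumes a 2 MiB PMD_SIZE and a minimal VMA with only a range and a hugepage flag (struct model_vma and model_free_pgtables() are hypothetical); the buggy variant tests the current VMA where the fixed one tests the next VMA before absorbing it.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PMD_SIZE (2UL << 20)   /* assumption: 2 MiB PMD span */

/* Hypothetical, minimal VMA: an address range plus a hugepage flag. */
struct model_vma {
    unsigned long start, end;
    int huge;
};

/* Count page-table freeing calls, coalescing runs of nearby normal
 * VMAs. `buggy` checks the current VMA instead of the next one, which
 * lets a hugepage VMA be swallowed into a normal run. */
static int model_free_pgtables(const struct model_vma *v, size_t n,
                               int buggy, int *huge_in_normal_run)
{
    int calls = 0;
    size_t i = 0;

    *huge_in_normal_run = 0;
    while (i < n) {
        if (!v[i].huge) {
            while (i + 1 < n &&
                   v[i + 1].start <= v[i].end + MODEL_PMD_SIZE &&
                   !(buggy ? v[i].huge : v[i + 1].huge)) {
                i++;
                if (v[i].huge)
                    *huge_in_normal_run = 1;
            }
        }
        calls++;   /* one free_pgd_range() or hugetlb_free_pgd_range() */
        i++;
    }
    return calls;
}
```

With a 4 KiB normal VMA followed closely by a hugepage VMA, the buggy variant issues one call covering both, while the fixed variant issues two, one per kind.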
No libhugetlbfs testsuite regressions (ppc64, POWER5).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Gibson [Wed, 22 Mar 2006 08:08:57 +0000 (00:08 -0800)]
[PATCH] hugepage: Fix hugepage logic in free_pgtables()
free_pgtables() has special logic to call hugetlb_free_pgd_range() instead
of the normal free_pgd_range() on hugepage VMAs. However, the test it uses
to do so is incorrect: it calls is_hugepage_only_range on a hugepage sized
range at the start of the vma. is_hugepage_only_range() will return true
if the given range has any intersection with a hugepage address region, and
in this case the given region need not be hugepage aligned. So, for
example, this test can return true if called on, say, a 4k VMA immediately
preceding a (nicely aligned) hugepage VMA.
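The false positive can be checked with plain arithmetic. The sketch assumes 16 MiB hugepages and a hypothetical hugepage-only region [0x10000000, 0x20000000); model_is_hugepage_only_range() mimics the "any intersection" test described above.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE  4096UL
#define MODEL_HPAGE_SIZE (16UL << 20)   /* assumption: 16 MiB hugepages */

/* Hypothetical hugepage-only address region [lo, hi). */
static const unsigned long hugepage_lo = 0x10000000UL;
static const unsigned long hugepage_hi = 0x20000000UL;

/* True if [addr, addr + len) overlaps the hugepage region at all,
 * regardless of alignment. */
static int model_is_hugepage_only_range(unsigned long addr, unsigned long len)
{
    return addr < hugepage_hi && addr + len > hugepage_lo;
}
```

Probing a hugepage-sized range from the start of a 4 KiB VMA that ends exactly at the region boundary reports an overlap, while probing the VMA's own extent does not. Asking the VMA itself (is_vm_hugetlb_page()) avoids guessing from addresses at all.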
At present we get away with this because the powerpc version of
hugetlb_free_pgd_range() is just a call to free_pgd_range(). On ia64 (the
only other arch with a non-trivial is_hugepage_only_range()) we get away
with it for a different reason; the hugepage area is not contiguous with
the rest of the user address space, and VMAs are not permitted in between,
so the test can't return a false positive there.
Nonetheless this should be fixed. We do that in the patch below by
replacing the is_hugepage_only_range() test with an explicit test of the
VMA using is_vm_hugetlb_page().
This in turn changes behaviour for platforms where is_hugepage_only_range()
returns false always (everything except powerpc and ia64). We address this
by ensuring that hugetlb_free_pgd_range() is defined to be identical to
free_pgd_range() (instead of a no-op) on everything except ia64. Even so,
it will prevent some otherwise possible coalescing of calls down to
free_pgd_range(). Since this only happens for hugepage VMAs, removing this
small optimization seems unlikely to cause any trouble.
This patch causes no regressions on the libhugetlbfs testsuite - ppc64
POWER5 (8-way), ppc64 G5 (2-way) and i386 Pentium M (UP).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Gibson [Wed, 22 Mar 2006 08:08:56 +0000 (00:08 -0800)]
[PATCH] hugepage: Make {alloc,free}_huge_page() local
Originally, mm/hugetlb.c just handled the hugepage physical allocation path
and its {alloc,free}_huge_page() functions were used from the arch specific
hugepage code. These days those functions are only used within mm/hugetlb.c
itself. Therefore, this patch makes them static and removes their
prototypes from hugetlb.h. This requires a small rearrangement of code in
mm/hugetlb.c to avoid a forward declaration.
This patch causes no regressions on the libhugetlbfs testsuite (ppc64,
POWER5).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Gibson [Wed, 22 Mar 2006 08:08:55 +0000 (00:08 -0800)]
[PATCH] hugepage: Strict page reservation for hugepage inodes
These days, hugepages are demand-allocated at first fault time. There's a
somewhat dubious (and racy) heuristic when making a new mmap() to check if
there are enough available hugepages to fully satisfy that mapping.
A particularly obvious case where the heuristic breaks down is where a
process maps its hugepages not as a single chunk, but as a bunch of
individually mmap()ed (or shmat()ed) blocks without touching and
instantiating the pages in between allocations. In this case the size of
each block is compared against the total number of available hugepages.
It's thus easy for the process to become overcommitted, because each block
mapping will succeed, although the total number of hugepages required by
all blocks exceeds the number available. In particular, this defeats a
program designed to detect a mapping failure and adjust its hugepage usage
downward accordingly.
The patch below addresses this problem, by strictly reserving a number of
physical hugepages for hugepage inodes which have been mapped, but not
instantiated. MAP_SHARED mappings are thus "safe": they will fail on
mmap(), not later with an OOM SIGKILL. MAP_PRIVATE mappings can still
trigger an OOM. (Actually, SHARED mappings can technically still OOM, but
only if the sysadmin explicitly reduces the hugepage pool between mapping
and instantiation.)
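The difference between the old per-mapping heuristic and strict reservation can be sketched with two counters. struct model_pool and the model_* functions are hypothetical, and -ENOMEM is an assumed error value for a refused mapping.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical hugepage pool accounting. */
struct model_pool {
    long free;       /* hugepages not yet instantiated */
    long reserved;   /* promised to mappings, not yet faulted in */
};

/* Old-style heuristic: each mapping is checked against free pages only,
 * ignoring what earlier, untouched mappings will still need. */
static int model_mmap_heuristic(const struct model_pool *p, long pages)
{
    return pages <= p->free ? 0 : -ENOMEM;
}

/* Strict reservation: the unreserved remainder must cover the whole
 * mapping, and the promise is recorded so later mappings see it. */
static int model_mmap_reserve(struct model_pool *p, long pages)
{
    if (pages > p->free - p->reserved)
        return -ENOMEM;     /* fail at mmap() time, not with SIGKILL later */
    p->reserved += pages;
    return 0;
}

/* First fault on a reserved page consumes one reservation. */
static void model_fault_reserved(struct model_pool *p)
{
    p->free--;
    p->reserved--;
}
```

With 10 free hugepages, three 4-page mappings all pass the heuristic even though they need 12 pages in total; with strict reservation the third mapping fails at mmap() time.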
This patch appears to address the problem at hand - it allows DB2 to start
correctly, for instance, which previously suffered the failure described
above.
This patch causes no regressions on the libhugetlbfs testsuite, and makes a
test (designed to catch this problem) pass which previously failed (ppc64,
POWER5).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Gibson [Wed, 22 Mar 2006 08:08:53 +0000 (00:08 -0800)]
[PATCH] hugepage: serialize hugepage allocation and instantiation
Currently, no lock or mutex is held between allocating a hugepage and
inserting it into the pagetables / page cache. When we do go to insert the
page into pagetables or page cache, we recheck and may free the newly
allocated hugepage. However, since the number of hugepages in the system
is strictly limited, and it's usual to want to use all of them, this can
still lead to spurious allocation failures.
For example, suppose two processes are both mapping (MAP_SHARED) the same
hugepage file, large enough to consume the entire available hugepage pool.
If they race instantiating the last page in the mapping, they will both
attempt to allocate the last available hugepage. One will fail, of course,
returning OOM from the fault and thus causing the process to be killed,
despite the fact that the entire mapping can, in fact, be instantiated.
The patch fixes this race by the simple method of adding a (sleeping) mutex
to serialize the hugepage fault path between allocation and insertion into
pagetables and/or page cache. It would be possible to avoid the
serialization by catching the allocation failures, waiting on some
condition, then rechecking to see if someone else has instantiated the page
for us. Given the likely frequency of hugepage instantiations, it seems
very doubtful it's worth the extra complexity.
This patch causes no regression on the libhugetlbfs testsuite, and one
test which can trigger this race now passes where it previously failed.
Actually, the test still sometimes fails, though less often and only as a
shmat() failure, rather than processes getting OOM-killed by the VM. The
dodgy heuristic tests in fs/hugetlbfs/inode.c for whether there's enough
hugepage space aren't protected by the new mutex, and it would be ugly to
protect them, so there's still a race there. Another patch to replace those
tests with something saner, for this reason as well as others, is coming.
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Gibson [Wed, 22 Mar 2006 08:08:51 +0000 (00:08 -0800)]
[PATCH] hugepage: Small fixes to hugepage clear/copy path
Move the loops used in mm/hugetlb.c to clear and copy hugepages to their
own functions for clarity. As we do so, we add some checks of need_resched:
we are, after all, copying megabytes of memory here. We also add
might_sleep() accordingly. We generally dropped locks around the clear and
copy already, but not everyone has PREEMPT enabled, so we should still be
checking explicitly.
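The shape of the clearing loop can be sketched in userspace, assuming 4 KiB base pages and 2 MiB hugepages. model_cond_resched() is a hypothetical counting stub standing in for cond_resched(); the point is that the reschedule check sits inside the per-subpage loop rather than once around the whole clear.

```c
#include <assert.h>
#include <string.h>

#define MODEL_PAGE_SIZE  4096UL
#define MODEL_HPAGE_SIZE (2UL << 20)   /* assumption: 2 MiB hugepages */

static int resched_points;

/* Hypothetical stand-in for cond_resched(): just counts the calls. */
static void model_cond_resched(void)
{
    resched_points++;
}

/* Clear a hugepage one base page at a time, offering to reschedule
 * before each chunk so a non-preemptible kernel never stalls for the
 * whole multi-megabyte clear. */
static void model_clear_huge_page(unsigned char *hpage)
{
    for (unsigned long off = 0; off < MODEL_HPAGE_SIZE; off += MODEL_PAGE_SIZE) {
        model_cond_resched();
        memset(hpage + off, 0, MODEL_PAGE_SIZE);
    }
}
```

Clearing one 2 MiB hugepage offers 512 reschedule points.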
For this to work, we need to remove the clear_huge_page() from
alloc_huge_page(), which is called with the page_table_lock held in the COW
path. We move the clear_huge_page() to just after the alloc_huge_page() in
the hugepage no-page path. In the COW path, the new page is about to be
copied over, so clearing it was just a waste of time anyway. So as a side
effect we also fix the fact that we held the page_table_lock for far too
long in this path by calling alloc_huge_page() under it.
It causes no regressions on the libhugetlbfs testsuite (ppc64, POWER5).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Zhang, Yanmin [Wed, 22 Mar 2006 08:08:50 +0000 (00:08 -0800)]
[PATCH] Enable mprotect on huge pages
2.6.16-rc3 uses hugetlb on-demand paging, but it doesn't support hugetlb
mprotect.
From: David Gibson <david@gibson.dropbear.id.au>
Remove a test from the mprotect() path which checks that the mprotect()ed
range on a hugepage VMA is hugepage aligned (yes, really, the sense of
is_aligned_hugepage_range() is the opposite of what you'd guess :-/).
In fact, we don't need this test. If the given addresses match the
beginning/end of a hugepage VMA they must already be suitably aligned. If
they don't, then mprotect_fixup() will attempt to split the VMA. The very
first test in split_vma() will check for a badly aligned address on a
hugepage VMA and return -EINVAL if necessary.
From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
On i386 and x86-64, pte flag _PAGE_PSE collides with _PAGE_PROTNONE. The
identity of the hugetlb pte is lost when changing page protection via mprotect.
A page fault that occurs later will trigger a bug check in huge_pte_alloc().
The fix is to always make the new pte a hugetlb pte and also to clean up
legacy code where _PAGE_PRESENT is forced on from the pre-faulting days.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Steven Pratt [Wed, 22 Mar 2006 08:08:48 +0000 (00:08 -0800)]
[PATCH] readahead: fix initial window size calculation
The current get_init_ra_size is not optimal across different IO
sizes and max_readahead values. Here is a quick summary of sizes computed
under the current design and under the attached patch. All of these assume
the 1st IO at offset 0, or the 1st detected sequential IO.
32k max, 4k request
  old    new
-----------------
   8k     8k
  16k    16k
  32k    32k
128k max, 4k request
  old    new
-----------------
  32k    16k
  64k    32k
 128k    64k
 128k   128k
128k max, 32k request
  old    new
-----------------
  32k    64k  <-----
  64k   128k
 128k   128k
512k max, 4k request
  old    new
-----------------
   4k    32k  <----
  16k    64k
  64k   128k
 128k   256k
 512k   512k
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Steven Pratt <slpratt@austin.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Wed, 22 Mar 2006 08:08:47 +0000 (00:08 -0800)]
[PATCH] readahead: ->prev_page can overrun the ahead window
If get_next_ra_size() does not grow fast enough, ->prev_page can overrun
the ahead window. This means the caller will read the pages from
->ahead_start + ->ahead_size to ->prev_page synchronously.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Steven Pratt <slpratt@austin.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Wed, 22 Mar 2006 08:08:46 +0000 (00:08 -0800)]
[PATCH] shmem: inline to avoid warning
shmem.c was named and shamed in Jesper's "Building 100 kernels" warnings:
shmem_parse_mpol is only used when CONFIG_TMPFS parses mount options; and
only called from that one site, so mark it inline like its non-NUMA stub.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Christoph Lameter [Wed, 22 Mar 2006 08:08:45 +0000 (00:08 -0800)]
[PATCH] vmscan: remove obsolete checks from shrink_list() and fix unlikely in refill_inactive_zone()
As suggested by Marcelo:
1. The optimization introduced recently for not calling
page_referenced() during zone reclaim makes two additional checks in
shrink_list unnecessary.
2. The if (unlikely(sc->may_swap)) in refill_inactive_zone is optimized
for the zone_reclaim case. However, most people's systems only do swap.
Undo that.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Michael Buesch [Wed, 22 Mar 2006 08:08:44 +0000 (00:08 -0800)]
[PATCH] Uninline sys_mmap common code (reduce binary size)
Remove the inlining of the new vs old mmap system call common code. This
reduces the size of the resulting vmlinux for defconfig as follows:
mb@pc1:~/develop/git/linux-2.6$ size vmlinux.mmap*
   text    data     bss     dec     hex filename
3303749  521524  186564 4011837  3d373d vmlinux.mmapinline
3303557  521524  186564 4011645  3d367d vmlinux.mmapnoinline
The new sys_mmap2() now also saves the overhead of one function call.
(probably it was already optimized to a jmp before, but anyway...)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nick Piggin [Wed, 22 Mar 2006 08:08:43 +0000 (00:08 -0800)]
[PATCH] mm: optimise page_count
Optimise page_count compound page test and make it consistent with similar
functions.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>