linux-2.6
Trond Myklebust [Tue, 3 Jan 2006 08:55:58 +0000 (09:55 +0100)] 
NFSv4: Fix an Oops in nfs_do_expire_all_delegations

 If the loop errors, we need to exit.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:57 +0000 (09:55 +0100)] 
NFSv4: Allow entries in the idmap cache to expire

 If someone changes the uid/gid mapping in userland, then we do eventually
 want those changes to be propagated to the kernel. Currently the kernel
 assumes that it may cache entries forever.

 Add an expiration time + garbage collector for idmap entries.
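
 A minimal user-space sketch of the idea, not the kernel code itself: each
 cached mapping carries an absolute expiry time, lookups refuse stale entries,
 and a collector pass frees them. The names idmap_entry, idmap_store,
 idmap_lookup, idmap_gc and the 600-second lifetime are assumptions made for
 illustration only.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define IDMAP_SLOTS 8
#define IDMAP_TTL   600   /* assumed lifetime in seconds (hypothetical) */

struct idmap_entry {
    char     name[32];   /* user or group name; empty string = unused slot */
    unsigned id;         /* mapped uid/gid */
    time_t   expires;    /* absolute time at which the entry becomes stale */
};

static struct idmap_entry idmap_cache[IDMAP_SLOTS];

/* Insert or refresh a mapping and stamp it with an expiration time. */
static int idmap_store(const char *name, unsigned id, time_t now)
{
    struct idmap_entry *slot = NULL;

    for (size_t i = 0; i < IDMAP_SLOTS; i++) {
        struct idmap_entry *e = &idmap_cache[i];
        if (e->name[0] != '\0' && strcmp(e->name, name) == 0) {
            slot = e;               /* refresh the existing entry */
            break;
        }
        if (e->name[0] == '\0' && slot == NULL)
            slot = e;               /* remember the first free slot */
    }
    if (slot == NULL)
        return -1;                  /* cache full */
    snprintf(slot->name, sizeof(slot->name), "%s", name);
    slot->id = id;
    slot->expires = now + IDMAP_TTL;
    return 0;
}

/* Return 1 and fill *id if a fresh mapping exists, 0 otherwise. */
static int idmap_lookup(const char *name, time_t now, unsigned *id)
{
    for (size_t i = 0; i < IDMAP_SLOTS; i++) {
        struct idmap_entry *e = &idmap_cache[i];
        if (e->name[0] != '\0' && strcmp(e->name, name) == 0) {
            if (now >= e->expires)
                return 0;           /* stale: ask userland again */
            *id = e->id;
            return 1;
        }
    }
    return 0;
}

/* Garbage collector: free every expired entry, return how many were freed. */
static int idmap_gc(time_t now)
{
    int dropped = 0;

    for (size_t i = 0; i < IDMAP_SLOTS; i++) {
        struct idmap_entry *e = &idmap_cache[i];
        if (e->name[0] != '\0' && now >= e->expires) {
            e->name[0] = '\0';
            dropped++;
        }
    }
    return dropped;
}
```

 With this shape, a changed userland mapping is picked up at the latest one
 lifetime after the old entry was cached.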

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:56 +0000 (09:55 +0100)] 
SUNRPC: Clean up xprt_destroy()

 We ought never to be calling xprt_destroy() if there are still active
 rpc_tasks. Optimise away the broken code that attempts to "fix" that case.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:55 +0000 (09:55 +0100)] 
SUNRPC: Ensure client closes the socket when server initiates a close

 If the server decides to close the RPC socket, we currently don't actually
 respond until either another RPC call is scheduled, or until xprt_autoclose()
 gets called by the socket expiry timer (which may be up to 5 minutes
 later).

 This patch ensures that xprt_autoclose() is called much sooner if the
 server closes the socket.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:54 +0000 (09:55 +0100)] 
NFS: get rid of some needless code obfuscation in xdr_encode_sattr().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:53 +0000 (09:55 +0100)] 
NFS: Send valid mode bits to the server

 inode->i_mode contains a lot more than just the mode bits. Make sure that
 we mask away this extra stuff in SETATTR calls to the server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Tue, 3 Jan 2006 08:55:52 +0000 (09:55 +0100)] 
SUNRPC: get rid of cl_chatty

 Clean up: Every ULP that uses the in-kernel RPC client, except the NLM
 client, sets cl_chatty.  There's no reason why NLM shouldn't set it, so
 just get rid of cl_chatty and always be verbose.

 Test-plan:
 Compile with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Tue, 3 Jan 2006 08:55:51 +0000 (09:55 +0100)] 
SUNRPC: transport switch API for setting port number

 At some point, transport endpoint addresses will no longer be IPv4.  To hide
 the structure of the rpc_xprt's address field from ULPs and port mappers,
 add an API for setting the port number during an RPC bind operation.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.  NFSv2/3 and NFSv4 mounting should be carefully checked.
 Probably need to rig a server where certain services aren't running, or
 that returns an error for some typical operation.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Tue, 3 Jan 2006 08:55:50 +0000 (09:55 +0100)] 
SUNRPC: new interface to force an RPC rebind

 We'd like to hide fields in rpc_xprt and rpc_clnt from upper layer protocols.
 Start by creating an API to force RPC rebind, replacing logic that simply
 sets cl_port to zero.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.  NFSv2/3 and NFSv4 mounting should be carefully checked.
 Probably need to rig a server where certain services aren't running, or
 that returns an error for some typical operation.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Tue, 3 Jan 2006 08:55:49 +0000 (09:55 +0100)] 
SUNRPC: switchable buffer allocation

 Add RPC client transport switch support for replacing buffer management
 on a per-transport basis.

 In the current IPv4 socket transport implementation, RPC buffers are
 allocated as needed for each RPC message that is sent.  Some transport
 implementations may choose to use pre-allocated buffers for encoding,
 sending, receiving, and unmarshalling RPC messages, however.  For
 transports capable of direct data placement, the buffers can be carved
 out of a pre-registered area of memory rather than from a slab cache.
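
 The per-transport switch can be pictured roughly as below. This is a
 user-space sketch under stated assumptions: the struct and function names
 (xprt_sketch, xprt_buf_ops, rpc_get_buffer, ...) are hypothetical, malloc()
 stands in for the slab cache, and a fixed array stands in for a
 pre-registered memory region.

```c
#include <stddef.h>
#include <stdlib.h>

struct xprt_sketch;

/* Per-transport buffer management hooks (hypothetical names). */
struct xprt_buf_ops {
    void *(*buf_alloc)(struct xprt_sketch *xprt, size_t size);
    void  (*buf_free)(struct xprt_sketch *xprt, void *buf);
};

struct xprt_sketch {
    const struct xprt_buf_ops *ops;
    unsigned char pool[4096];   /* stand-in for a pre-registered region */
    int pool_busy;
};

/* Socket-style transport: allocate on demand for each message. */
static void *heap_alloc(struct xprt_sketch *xprt, size_t size)
{
    (void)xprt;
    return malloc(size);
}

static void heap_free(struct xprt_sketch *xprt, void *buf)
{
    (void)xprt;
    free(buf);
}

/* Direct-placement-style transport: carve from the pre-allocated region. */
static void *pool_alloc(struct xprt_sketch *xprt, size_t size)
{
    if (xprt->pool_busy || size > sizeof(xprt->pool))
        return NULL;
    xprt->pool_busy = 1;
    return xprt->pool;
}

static void pool_free(struct xprt_sketch *xprt, void *buf)
{
    if (buf == xprt->pool)
        xprt->pool_busy = 0;
}

static const struct xprt_buf_ops heap_ops = { heap_alloc, heap_free };
static const struct xprt_buf_ops pool_ops = { pool_alloc, pool_free };

/* Generic RPC code never allocates directly; it goes through the switch. */
static void *rpc_get_buffer(struct xprt_sketch *xprt, size_t size)
{
    return xprt->ops->buf_alloc(xprt, size);
}

static void rpc_put_buffer(struct xprt_sketch *xprt, void *buf)
{
    xprt->ops->buf_free(xprt, buf);
}
```

 The generic layer stays unchanged when a transport swaps in its own
 allocator, which is the point of routing the calls through an ops table.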

 Test-plan:
 Millions of fsx operations.  Performance characterization with "sio" and
 "iozone".  Use oprofile and other tools to look for significant regression
 in CPU utilization.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Tue, 3 Jan 2006 08:55:48 +0000 (09:55 +0100)] 
NFSv3: try get_root user-supplied security_flavor

 Thanks to Ed Keizer for bug and root cause.  He says: "... we could only mount
 the top-level Solaris share. We could not mount deeper into the tree.
 Investigation showed that Solaris allows UNIX authenticated FSINFO only on the
 top level of the share. This is a problem because we share/export our home
 directories one level higher than we mount them. I.e. we share the partition
 and not the individual home directories. This prevented access to home
 directories."

 We still may need to try auth_sys for the case where the client doesn't have
 appropriate credentials.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Tue, 3 Jan 2006 08:55:46 +0000 (09:55 +0100)] 
NLM: fix parsing of sm notify procedure

 The procedure that decodes the statd sm_notify call seems to be skipping a
 few arguments.  How did this ever work?

 From folks at Polyserve.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Tue, 3 Jan 2006 08:55:46 +0000 (09:55 +0100)] 
NLM: Further cancel fixes

 If the server receives an NLM cancel call and finds no waiting lock to
 cancel, then chances are the lock has already been applied, and the client
 just hadn't yet processed the NLM granted callback before it sent the
 cancel.

 The Open Group text, for example, permits a server to return either success
 (LCK_GRANTED) or failure (LCK_DENIED) in this case.  But returning an error
 seems more helpful; the client may be able to use it to recognize that a
 race has occurred and to recover from the race.

 So, modify the relevant functions to return an error in this case.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Tue, 3 Jan 2006 08:55:45 +0000 (09:55 +0100)] 
NLM: clean up nlmsvc_delete_block

 The fl_next check here is superfluous (and possibly a layering violation).

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Tue, 3 Jan 2006 08:55:44 +0000 (09:55 +0100)] 
NLM: don't unlock on cancel requests

 Currently when lockd gets an NLM_CANCEL request, it also does an unlock for
 the same range.  This is incorrect.

 The Open Group documentation says that "This procedure cancels an
 *outstanding* blocked lock request."  (Emphasis mine.)

 Also, consider a client that holds a lock on the first byte of a file, and
 requests a lock on the entire file.  If the client cancels that request
 (perhaps because the requesting process is signalled), the server shouldn't
 perform an unlock on the entire file, since that will also remove the
 previous lock that the client was already granted.

 Or consider a lock request that actually *downgraded* an exclusive lock to
 a shared lock.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Tue, 3 Jan 2006 08:55:42 +0000 (09:55 +0100)] 
NLM: Clean up nlmsvc_grant_reply locking

 Slightly simpler logic here makes it easier to verify that the up's
 and down's are balanced.  Break out an assignment from a conditional
 while we're at it.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Adrian Bunk [Tue, 3 Jan 2006 08:55:41 +0000 (09:55 +0100)] 
SUNRPC: net/sunrpc/xdr.c: remove xdr_decode_string()

 This patch removes the unused function xdr_decode_string().

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Neil Brown <neilb@suse.de>
Acked-by: Charles Lever <Charles.Lever@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:41 +0000 (09:55 +0100)] 
NFSv4: Allow user to set the port used by the NFSv4 callback channel

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:40 +0000 (09:55 +0100)] 
NFS: Clean up weak cache consistency code

 ...and ensure that nfs_update_inode() respects wcc

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:38 +0000 (09:55 +0100)] 
NFSv4: Ensure DELEGRETURN returns attributes

 Upon return of a write delegation, the server will almost always bump the
 change attribute. Ensure that we pick up that change so that we don't
 invalidate our data cache unnecessarily.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:37 +0000 (09:55 +0100)] 
NFSv4: Ensure change attribute returned by GETATTR callback conforms to spec

 According to RFC3530 we're supposed to cache the change attribute
 at the time the client receives a write delegation.
 If the inode is clean, a CB_GETATTR callback by the server to the
 client is supposed to return the cached change attribute.
 If, OTOH, the inode is dirty, the client should bump the cached
 change attribute by 1.
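
 The rule above can be sketched in a few lines of C. The struct and function
 names (delegated_inode, cb_getattr_change) are hypothetical and only
 illustrate the clean/dirty distinction described in this message.

```c
#include <stdint.h>

struct delegated_inode {
    uint64_t cached_change;  /* change attribute saved when the write
                                delegation was received */
    int      dirty;          /* non-zero if local modifications exist */
};

/* Value to report for the change attribute in a CB_GETATTR reply. */
static uint64_t cb_getattr_change(const struct delegated_inode *inode)
{
    if (inode->dirty)
        return inode->cached_change + 1;  /* signal that data has changed */
    return inode->cached_change;          /* clean: report the cached value */
}
```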

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:36 +0000 (09:55 +0100)] 
SUNRPC: Fix a potential race in rpc_pipefs.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:35 +0000 (09:55 +0100)] 
NFS: Make directIO aware of compound pages...

 ...and avoid calling set_page_dirty on them

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:34 +0000 (09:55 +0100)] 
NFS: Make stat() return updated mtimes after a write()

 The SuS states that a call to write() will cause mtime to be updated on
 the file. In order to satisfy that requirement, we need to flush out
 any cached writes in nfs_getattr().
 Speed things up slightly by not committing the writes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:33 +0000 (09:55 +0100)] 
NFSv4: Ensure that we return the delegation on the target of a rename too.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 30 Nov 2005 23:09:02 +0000 (18:09 -0500)] 
NFS: support large reads and writes on the wire

 Most NFS server implementations allow up to 64KB reads and writes on the
 wire.  The Solaris NFS server allows up to a megabyte, for instance.

 Now the Linux NFS client supports transfer sizes up to 1MB, too.  This will
 help reduce protocol and context switch overhead on read/write intensive NFS
 workloads, and support larger atomic read and write operations on servers
 that support them.
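
 To give a feel for the reduction in protocol overhead, the helper below
 counts how many READ or WRITE calls a transfer needs, under the simplifying
 assumption that each call carries at most one rsize/wsize worth of data. The
 function name rpcs_needed is hypothetical.

```c
#include <stddef.h>

/* Number of RPCs needed to move `bytes` when each call carries at most
 * `xfer_size` bytes. Returns 0 if xfer_size is 0 (invalid input). */
static size_t rpcs_needed(size_t bytes, size_t xfer_size)
{
    if (xfer_size == 0)
        return 0;
    return (bytes + xfer_size - 1) / xfer_size;   /* round up */
}
```

 Under that assumption, a 1MB transfer takes 32 calls at 32KB per call but a
 single call at 1MB per call.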

 Test-plan:
 Connectathon and iozone on mount point with wsize=rsize>32768 over TCP.
 Tests with NFS over UDP to verify the maximum RPC payload size cap.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 30 Nov 2005 23:08:55 +0000 (18:08 -0500)] 
NFS: make "inode number mismatch" message more useful

 To help NFS users and server developers, make the "inode number mismatch"
 message display more useful information.

 Test-plan:
 None.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 30 Nov 2005 23:08:57 +0000 (18:08 -0500)] 
NFS: get rid of useless kernel log message

 nfs_statfs() generates a log message when GETATTR returns an error.  This
 is usually a useless message.  Make it a dprintk.

 Test plan:
 None

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 30 Nov 2005 23:08:59 +0000 (18:08 -0500)] 
NFS: simplify inlined bit ops in nfs_page.h

 Minor cleanup:  inlined bit ops in nfs_page.h can be simpler.

 Test plan:
 Write-intensive workload against a server that requires COMMITs.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 30 Nov 2005 23:08:19 +0000 (18:08 -0500)] 
NFS: Fix error recovery code in fs/nfs/inode.c:__init_nfs()

 Red Hat found a problem in the error recovery logic in __init_nfs.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Wed, 30 Nov 2005 23:08:17 +0000 (18:08 -0500)] 
NFS: use generic_write_checks() to sanity check direct writes

 Replace ad hoc write parameter sanity checking in nfs_file_direct_write()
 with a call to generic_write_checks().  This should make the proper checks
 modulo the O_LARGEFILE flag, and should catch NFSv2-specific limitations by
 virtue of i_sb->s_maxbytes.

 Test plan:
 POSIX compliance testing with both NFSv2 and NFSv3.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:26 +0000 (09:55 +0100)] 
NFSv4: Remove requirement for machine creds for the "setclientid" operation

 Use a cred from the nfs4_client->cl_state_owners list.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:25 +0000 (09:55 +0100)] 
NFSv4: Remove requirement for machine creds for the "renew" operation

 In RFC3530, the RENEW operation is allowed to use either

 the same principal, RPC security flavour and (if RPCSEC_GSS), the same
  mechanism and service that was used for SETCLIENTID_CONFIRM

 OR

 Any principal, RPC security flavour and service combination that
 currently has an OPEN file on the server.

 Choose the latter since that doesn't require us to keep credentials for
 the same principal for the entire duration of the mount.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:24 +0000 (09:55 +0100)] 
NFSv4: Send RENEW requests to the server only when we're holding state

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:23 +0000 (09:55 +0100)] 
NFS: Convert instances of kernel_thread() to kthread()

 Convert private implementations in NFSv4 state recovery and delegation
 code to use kthreads.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:22 +0000 (09:55 +0100)] 
NFSv4: State recovery cleanup

 Use wait_on_bit() when waiting for state recovery to complete.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:21 +0000 (09:55 +0100)] 
NFSv4: OPEN/LOCK/LOCKU/CLOSE will automatically renew the NFSv4 lease

 Cut down on the number of unnecessary RENEW requests on the wire.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:19 +0000 (09:55 +0100)] 
SUNRPC: Ensure that SIGKILL will always terminate a synchronous RPC call.

 ...and make sure that the "intr" flag also enables SIGHUP and SIGTERM to
 interrupt RPC calls too (as per the Solaris implementation).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:18 +0000 (09:55 +0100)] 
NFSv4: Make DELEGRETURN an interruptible operation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:17 +0000 (09:55 +0100)] 
NFSv4: Convert LOCK rpc call into an asynchronous RPC call

 In order to allow users to interrupt/cancel it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:16 +0000 (09:55 +0100)] 
NFSv4: locking XDR cleanup

 Get rid of some unnecessary intermediate structures

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:15 +0000 (09:55 +0100)] 
NFSv4: Make open recovery track O_RDWR, O_RDONLY and O_WRONLY correctly

 When recovering from a delegation recall or a network partition, we need
 to replay open(O_RDWR), open(O_RDONLY) and open(O_WRONLY) separately.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:13 +0000 (09:55 +0100)] 
NFSv4: Make nfs4_state track O_RDWR, O_RDONLY and O_WRONLY separately

 A closer reading of RFC3530 reveals that OPEN_DOWNGRADE must always
 specify access modes that have been the argument of a previous OPEN
 operation.
 IOW: doing OPEN(O_RDWR) and then OPEN_DOWNGRADE(O_WRONLY) is forbidden
 unless the user called OPEN(O_WRONLY)

 In order to fix that, we really need to track the three possible open
 states separately.
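
 One way to picture the separate tracking is a counter per access mode, with
 a downgrade allowed only to a mode that has its own outstanding open. This
 is a user-space sketch; the names open_state, state_open and
 can_downgrade_to are hypothetical, and treating "still held" as the test for
 "was the argument of a previous OPEN" is an assumption of the sketch.

```c
#include <fcntl.h>

/* One counter per access mode instead of a single combined state. */
struct open_state {
    unsigned n_rdonly;   /* outstanding opens with O_RDONLY */
    unsigned n_wronly;   /* outstanding opens with O_WRONLY */
    unsigned n_rdwr;     /* outstanding opens with O_RDWR */
};

/* Record an open in the counter matching its access mode. */
static void state_open(struct open_state *st, int flags)
{
    switch (flags & O_ACCMODE) {
    case O_RDONLY: st->n_rdonly++; break;
    case O_WRONLY: st->n_wronly++; break;
    case O_RDWR:   st->n_rdwr++;   break;
    default:       break;
    }
}

/* A downgrade target is acceptable only if that exact mode was opened. */
static int can_downgrade_to(const struct open_state *st, int flags)
{
    switch (flags & O_ACCMODE) {
    case O_RDONLY: return st->n_rdonly > 0;
    case O_WRONLY: return st->n_wronly > 0;
    case O_RDWR:   return st->n_rdwr > 0;
    default:       return 0;
    }
}
```

 After a lone open with O_RDWR, can_downgrade_to() rejects O_WRONLY; it
 accepts it once an O_WRONLY open has also been recorded.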

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:12 +0000 (09:55 +0100)] 
NFSv4: Make open_confirm() asynchronous too

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:11 +0000 (09:55 +0100)] 
NFSv4: Convert open() into an asynchronous RPC call

 OPEN is a stateful operation, so we must ensure that it always
 completes. In order to allow users to interrupt the operation,
 we need to make the RPC call asynchronous, and then wait on
 completion (or cancel).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:10 +0000 (09:55 +0100)] 
SUNRPC: rpc_execute should not return task->tk_status;

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:09 +0000 (09:55 +0100)] 
SUNRPC: Get rid of some unused exports

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:08 +0000 (09:55 +0100)] 
NFSv4: Allocate OPEN call RPC arguments using kmalloc()

 Cleanup in preparation for making OPEN calls interruptible by the user.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:07 +0000 (09:55 +0100)] 
NFSv4: Make locku use the new RPC "wait on completion" interface.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:06 +0000 (09:55 +0100)] 
NFSv4: stateful NFSv4 RPC call interface

 The NFSv4 model requires us to complete all RPC calls that might
 establish state on the server whether or not the user wants to
 interrupt it. We may also need to schedule new work (including
 new RPC calls) in order to cancel the new state.

 The asynchronous RPC model will allow us to ensure that RPC calls
 always complete, but in order to allow for "synchronous" RPC, we
 want to add the ability to wait for completion.
 The waits are, of course, interruptible.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:05 +0000 (09:55 +0100)] 
SUNRPC: Further cleanups

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:04 +0000 (09:55 +0100)] 
RPC: Clean up RPC task structure

 Shrink the RPC task structure. Instead of storing separate pointers
 for task->tk_exit and task->tk_release, put them in a structure.

 Also pass the user data pointer as a parameter instead of passing it via
 task->tk_calldata. This enables us to nest callbacks.
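
 A rough sketch of why this enables nesting, in user-space C. All names
 ending in _sketch, plus wrapper, wrapper_done and count_done, are
 hypothetical: the callbacks live in one ops structure, and because each
 callback receives its data as an argument, an outer callback can forward to
 an inner one with different data.

```c
#include <stddef.h>

struct rpc_task_sketch;

/* Callbacks grouped in one structure; the task stores a single pointer. */
struct rpc_call_ops_sketch {
    void (*call_done)(struct rpc_task_sketch *task, void *calldata);
    void (*release)(void *calldata);
};

struct rpc_task_sketch {
    int tk_status;
    const struct rpc_call_ops_sketch *tk_ops;
    void *tk_calldata;
};

/* Completion path: the data pointer is handed over as a parameter. */
static void rpc_finish(struct rpc_task_sketch *task)
{
    if (task->tk_ops->call_done)
        task->tk_ops->call_done(task, task->tk_calldata);
    if (task->tk_ops->release)
        task->tk_ops->release(task->tk_calldata);
}

/* Inner callback: counts successfully completed calls. */
static void count_done(struct rpc_task_sketch *task, void *calldata)
{
    int *completed = calldata;

    if (task->tk_status == 0)
        (*completed)++;
}

/* Outer callback: does its own work, then invokes the nested callback
 * with the nested callback's own data. */
struct wrapper {
    const struct rpc_call_ops_sketch *inner_ops;
    void *inner_data;
    int outer_calls;
};

static void wrapper_done(struct rpc_task_sketch *task, void *calldata)
{
    struct wrapper *w = calldata;

    w->outer_calls++;
    w->inner_ops->call_done(task, w->inner_data);
}

static const struct rpc_call_ops_sketch count_ops   = { count_done, NULL };
static const struct rpc_call_ops_sketch wrapper_ops = { wrapper_done, NULL };
```

 If the inner callback had to read its data from the task itself, the
 wrapper could not substitute a different pointer without modifying the
 task.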

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:03 +0000 (09:55 +0100)] 
SUNRPC: Yet more RPC cleanups

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 3 Jan 2006 08:55:02 +0000 (09:55 +0100)] 
NFS: Work correctly with single-page ->writepage() calls

 Ensure that we always initiate flushing of data before we exit
 a single-page ->writepage() call.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andrew Morton [Wed, 16 Nov 2005 23:07:01 +0000 (15:07 -0800)] 
identify multipage ->writepages() calls

 NFS needs to be able to distinguish between single-page ->writepage() calls and
 multipage ->writepages() calls.

 For the single-page writepage calls NFS can kick off the I/O within the
 context of ->writepage().

 For multipage ->writepages calls, nfs_writepage() will leave the I/O pending
 and nfs_writepages() will kick off the I/O when it all has been queued up
 within NFS.
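
 The division of labour described above might be sketched like this in
 user-space C. The names nfs_sketch, sketch_writepage and sketch_writepages,
 and the in_writepages flag, are assumptions made for illustration; the real
 mechanism for telling the two call types apart is not shown in this message.

```c
struct nfs_sketch {
    int queued;         /* pages queued but not yet sent */
    int flushes;        /* how many times I/O was kicked off */
    int in_writepages;  /* set while a multipage call is in progress */
};

/* Start I/O for everything queued so far. */
static void flush_queued(struct nfs_sketch *n)
{
    if (n->queued > 0) {
        n->queued = 0;
        n->flushes++;
    }
}

/* Queue one page; a standalone call starts the I/O immediately. */
static void sketch_writepage(struct nfs_sketch *n)
{
    n->queued++;
    if (!n->in_writepages)
        flush_queued(n);
}

/* Queue many pages, leave them pending, then start I/O once at the end. */
static void sketch_writepages(struct nfs_sketch *n, int npages)
{
    n->in_writepages = 1;
    for (int i = 0; i < npages; i++)
        sketch_writepage(n);
    n->in_writepages = 0;
    flush_queued(n);
}
```

 In this model a five-page sketch_writepages() call results in one flush
 rather than five.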

Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Linus Torvalds [Fri, 6 Jan 2006 17:01:25 +0000 (09:01 -0800)] 
Merge branch 'post-2.6.15' of git://brick.kernel.dk/data/git/linux-2.6-block

Manual fixup for merge with Jens' "Suspend support for libata", commit
ID 9b847548663ef1039dd49f0eb4463d001e596bc3.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Fri, 6 Jan 2006 16:43:16 +0000 (08:43 -0800)] 
x86: remove bogus 'pci=usepirqmask' suggestion when no irq is defined

This was harmless, but for the case of a device that had no irq
pre-defined we would incorrectly suggest that "usepirqmask" might make a
difference.  It never would, and the message was just confusing people.

Reported in the dmesg of Etienne Lorrain.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jens Axboe [Fri, 6 Jan 2006 08:28:07 +0000 (09:28 +0100)] 
[PATCH] Suspend support for libata

This patch adds suspend support to libata, and ata_piix in particular. For
most low level drivers, they should just need to add the 4 hooks to
work. As I can only test ata_piix, I didn't enable it for more
though.

Suspend support is the single most important feature on a notebook, and
most new notebooks have sata drives. It's quite embarrassing that we
_still_ do not support this. Right now, it's perfectly possible to
suspend the drive in mid-transfer.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Fri, 6 Jan 2006 08:21:36 +0000 (00:21 -0800)] 
[PATCH] md: allow sync-speed to be controlled per-device

Also export current (average) speed and status in sysfs.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Fri, 6 Jan 2006 08:21:16 +0000 (00:21 -0800)] 
[PATCH] md: support adding new devices to md arrays via sysfs

Writing major:minor to md/new_dev will bind that device to the array.
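
As an illustration of the input format only, a "major:minor" string with an
optional trailing newline could be parsed as below. This is a user-space
sketch; parse_dev is a hypothetical name and the kernel's actual parsing code
may differ.

```c
#include <ctype.h>
#include <stdlib.h>

/* Parse "major:minor" with an optional trailing newline.
 * Returns 0 on success, -1 on malformed input. */
static int parse_dev(const char *buf, unsigned long *major,
                     unsigned long *minor)
{
    char *end;

    if (!isdigit((unsigned char)buf[0]))
        return -1;
    *major = strtoul(buf, &end, 10);
    if (*end != ':' || !isdigit((unsigned char)end[1]))
        return -1;
    *minor = strtoul(end + 1, &end, 10);
    if (*end == '\n')
        end++;                      /* tolerate "echo 8:16 > new_dev" */
    return *end == '\0' ? 0 : -1;
}
```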

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Fri, 6 Jan 2006 08:21:06 +0000 (00:21 -0800)] 
[PATCH] md: allow available size of component devices to be set via sysfs

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Fri, 6 Jan 2006 08:20:59 +0000 (00:20 -0800)] 
[PATCH] md-export-rdev-data_offset-via-sysfs-fix

drivers/md/md.c: In function `offset_show':
drivers/md/md.c:1670: warning: long long unsigned int format, different type arg (arg 3)

Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Fri, 6 Jan 2006 08:20:56 +0000 (00:20 -0800)] 
[PATCH] md: export rdev->data_offset via sysfs

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Fri, 6 Jan 2006 08:20:55 +0000 (00:20 -0800)] 
[PATCH] md: expose device slot information via sysfs

Thus the role that a device has in an array can be viewed and set.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Fri, 6 Jan 2006 08:20:55 +0000 (00:20 -0800)] 
[PATCH] md: keep better track of dev/array size when assembling md arrays

Move the checks - that dev size is never less than array size - into
bind_rdev_to_array to make sure it always happens properly (there is one place
where currently it doesn't).

Also reject any superblock which claims an array size smaller than the device
in question can hold.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Fri, 6 Jan 2006 08:20:54 +0000 (00:20 -0800)] 
[PATCH] md: allow md/raid_disks to be settable

If array is active, try to reshape, else just set the value.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Fri, 6 Jan 2006 08:20:52 +0000 (00:20 -0800)] 
[PATCH] md: count corrected read errors per drive

Store this total in superblock (as appropriate), and make it available to
userspace via sysfs.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Fri, 6 Jan 2006 08:20:51 +0000 (00:20 -0800)] 
[PATCH] md: allow array level to be set textually via sysfs

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Fri, 6 Jan 2006 08:20:50 +0000 (00:20 -0800)] 
[PATCH] md: expose md metadata format in sysfs

Allow it to be set to a particular version, or 'none'.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Fri, 6 Jan 2006 08:20:49 +0000 (00:20 -0800)] 
[PATCH] md: allow md array component size to be accessed and set via sysfs

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Fri, 6 Jan 2006 08:20:47 +0000 (00:20 -0800)] 
[PATCH] md: allow chunk_size to be settable through sysfs

... only before array is started of course.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Fri, 6 Jan 2006 08:20:46 +0000 (00:20 -0800)] 
[PATCH] md: fix rdev->pending counts in raid1

When we do a user-requested check/repair, we lose count of the outstanding
requests...

Also make sure that when anything is written to md/sync_action, the
RECOVERY_NEEDED flag is set and the thread is woken up so any changes take
effect.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: make sure bitmap updates are visible through filesystem
NeilBrown [Fri, 6 Jan 2006 08:20:45 +0000 (00:20 -0800)] 
[PATCH] md: make sure bitmap updates are visible through filesystem

When we update a page_cache page in the kernel, we need to call
flush_dcache_page or userspace might not see the change.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] drivers/md/md.c: make md_new_event() static
Adrian Bunk [Fri, 6 Jan 2006 08:20:44 +0000 (00:20 -0800)] 
[PATCH] drivers/md/md.c: make md_new_event() static

Make the needlessly global function md_new_event() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: make a couple of names in md.c static
NeilBrown [Fri, 6 Jan 2006 08:20:43 +0000 (00:20 -0800)] 
[PATCH] md: make a couple of names in md.c static

.. because they aren't used outside md.c

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: fix typo in comment
NeilBrown [Fri, 6 Jan 2006 08:20:42 +0000 (00:20 -0800)] 
[PATCH] md: fix typo in comment

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: helper function to match commands written to sysfs files
NeilBrown [Fri, 6 Jan 2006 08:20:41 +0000 (00:20 -0800)] 
[PATCH] md: helper function to match commands written to sysfs files

Commands written to sysfs files may, or may not, be \n terminated.  We want to
accept either case.  For this we use cmd_match.
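
A user-space sketch of how such a helper might behave is shown below. The
function body is an illustration written for this note, not the code from the
patch: it accepts the written command if it equals the expected string,
optionally followed by a single trailing newline.

```c
/*
 * Return 1 if cmd (as written to a sysfs file) matches str.
 * The two must be identical, except that cmd may carry one trailing '\n'.
 */
static int cmd_match(const char *cmd, const char *str)
{
	/* Walk both strings while they agree. */
	while (*cmd && *str && *cmd == *str) {
		cmd++;
		str++;
	}
	/* Tolerate a single newline at the end of the written command. */
	if (*cmd == '\n')
		cmd++;
	/* Both strings must now be exhausted. */
	return *cmd == '\0' && *str == '\0';
}
```

With this sketch, both "check" and "check\n" match "check", while "checks"
or "che" do not.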

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: define and use safe_put_page for md
NeilBrown [Fri, 6 Jan 2006 08:20:40 +0000 (00:20 -0800)] 
[PATCH] md: define and use safe_put_page for md

md sometimes calls put_page on NULL pointers (treating it like kfree).  This is
not safe, so define and use a 'safe_put_page' which checks for NULL.
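
The idea can be sketched in user space as follows. Here 'struct page' and
'put_page' are simplified stand-ins invented for illustration (the kernel
versions are more involved); only the NULL check in safe_put_page reflects
what the description above states.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page, for illustration only. */
struct page {
	int refcount;
};

/* Stand-in for put_page(): drops one reference; must not be passed NULL. */
static void put_page(struct page *p)
{
	p->refcount--;
}

/* NULL-tolerant wrapper, so callers can treat it like kfree(). */
static void safe_put_page(struct page *p)
{
	if (p)
		put_page(p);
}
```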

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: remove inappropriate limits in md/bitmap configuration.
NeilBrown [Fri, 6 Jan 2006 08:20:39 +0000 (00:20 -0800)] 
[PATCH] md: remove inappropriate limits in md/bitmap configuration.

The kernel should not be imposing these policy limits: The time between
bitmap updates should certainly be allowed to be more than 15 seconds, and
if someone wants a bitmap chunk size in excess of 4MB, the kernel isn't the
place to stop them.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: fix possible problem in raid1/raid10 error overwriting
NeilBrown [Fri, 6 Jan 2006 08:20:37 +0000 (00:20 -0800)] 
[PATCH] md: fix possible problem in raid1/raid10 error overwriting

The code to overwrite/reread for addressing read errors in raid1/raid10
currently assumes that the read will not alter the buffer which could be used
to write to the next device.  This is not a safe assumption to make.

So we split the loops into an overwrite loop and a separate re-read loop, so
that the writing is complete before reading is attempted.

Cc: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: remove personality numbering from md
NeilBrown [Fri, 6 Jan 2006 08:20:36 +0000 (00:20 -0800)] 
[PATCH] md: remove personality numbering from md

md supports multiple different RAID levels, each being implemented by a
'personality' (which is often in a separate module).

These personalities have fairly artificial 'numbers'.  The numbers
are used to:
 1- provide an index into an array where the various personalities
    are recorded
 2- identify the module (via an alias) which implements a particular
    personality.

Neither of these uses really justifies the existence of personality numbers.
The array can be replaced by a linked list which is searched (array lookup
only happens very rarely).  Module identification can be done using an alias
based on level rather than 'personality' number.

The current 'raid5' module supports two levels (4 and 5) but only one
personality.  This slight awkwardness (which was handled in the mapping from
level to personality) can be better handled by allowing raid5 to register 2
personalities.
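
A minimal sketch of a list-based registry searched by level is given below.
The struct layout and the names register_personality and find_personality
are hypothetical, chosen for illustration; they are not taken from the patch.

```c
#include <stddef.h>

/* Hypothetical personality descriptor: one entry per supported RAID level. */
struct personality {
	const char *name;
	int level;
	struct personality *next;
};

/* Head of the singly linked list that replaces the numbered array. */
static struct personality *pers_list;

/* Add a personality to the front of the list. */
static void register_personality(struct personality *p)
{
	p->next = pers_list;
	pers_list = p;
}

/* Linear search by level; lookups are rare, so O(n) is acceptable. */
static struct personality *find_personality(int level)
{
	struct personality *p;

	for (p = pers_list; p; p = p->next)
		if (p->level == level)
			return p;
	return NULL;
}
```

Under this scheme a module that implements both level 4 and level 5 simply
registers two entries, and no central table of personality numbers is needed.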

With this change in place, the core md module does not need to have an
exhaustive list of all possible personalities, so other personalities can be
added independently.

This patch also moves the check for chunksize being non-zero into the ->run
routines for the personalities that need it, rather than having it in core-md.
This has a side effect of allowing 'faulty' and 'linear' not to have a
chunk-size set.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: break out of a loop that doesn't need to run to completion
NeilBrown [Fri, 6 Jan 2006 08:20:35 +0000 (00:20 -0800)] 
[PATCH] md: break out of a loop that doesn't need to run to completion

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: convert recently exported symbol to GPL
NeilBrown [Fri, 6 Jan 2006 08:20:34 +0000 (00:20 -0800)] 
[PATCH] md: convert recently exported symbol to GPL

...because that seems to be the preferred practice these days.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: convert various kmap calls to kmap_atomic
NeilBrown [Fri, 6 Jan 2006 08:20:34 +0000 (00:20 -0800)] 
[PATCH] md: convert various kmap calls to kmap_atomic

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: tidy up raid5/6 hash table code
NeilBrown [Fri, 6 Jan 2006 08:20:33 +0000 (00:20 -0800)] 
[PATCH] md: tidy up raid5/6 hash table code

- replace open-coded hash chain with hlist macros

- Fix hash-table size at one page - it is already quite generous, so there
  will never be a need to use multiple pages, so no need for __get_free_pages

No functional change.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: convert md to use kzalloc throughout
NeilBrown [Fri, 6 Jan 2006 08:20:32 +0000 (00:20 -0800)] 
[PATCH] md: convert md to use kzalloc throughout

Replace multiple kmalloc/memset pairs with kzalloc calls.
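
The shape of the change can be illustrated with a user-space analogy, where
malloc plus memset stands in for kmalloc plus memset and calloc stands in for
kzalloc. The struct and function names here are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical configuration structure, for illustration only. */
struct conf {
	int raid_disks;
	void *mirrors;
};

/* Before: allocate, then clear by hand (analogous to kmalloc + memset). */
static struct conf *alloc_conf_old(void)
{
	struct conf *c = malloc(sizeof(*c));

	if (c)
		memset(c, 0, sizeof(*c));
	return c;
}

/* After: a single zeroing allocation (analogous to kzalloc). */
static struct conf *alloc_conf_new(void)
{
	return calloc(1, sizeof(struct conf));
}
```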

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: clean up 'page' related names in md
NeilBrown [Fri, 6 Jan 2006 08:20:31 +0000 (00:20 -0800)] 
[PATCH] md: clean up 'page' related names in md

Substitute:

  page_cache_get -> get_page
  page_cache_release -> put_page
  PAGE_CACHE_SHIFT -> PAGE_SHIFT
  PAGE_CACHE_SIZE -> PAGE_SIZE
  PAGE_CACHE_MASK -> PAGE_MASK
  __free_page -> put_page

because we aren't using the page cache, we are just using pages.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: make /proc/mdstat pollable
NeilBrown [Fri, 6 Jan 2006 08:20:30 +0000 (00:20 -0800)] 
[PATCH] md: make /proc/mdstat pollable

With this patch it is possible to poll /proc/mdstat to detect arrays appearing
or disappearing, to detect failures, recovery starting, recovery completing,
and devices being added and removed.

It is similar to the poll-ability of /proc/mounts, though different in that:

We always report that the file is readable (because face it, it is, even if
only for EOF).

We report POLLPRI when there is a change so that select() can detect
it as an exceptional event.  Not only are these exceptional events, but
that is the mechanism that the current 'mdadm' uses to watch for events
(It also polls after a timeout).
(We also report POLLERR like /proc/mounts).

Finally, we only reset the per-file event counter when the start of the file
is read, rather than when poll() returns an event.  This is more robust as it
means that an fd will continue to report activity to poll/select until the
program clearly responds to that activity.
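
A monitoring program might wait for events roughly as sketched below. The
function name md_wait_event and its return convention are assumptions made
for this example. It re-reads the file from the start before polling, since
that is what resets the per-file event counter according to the description
above, and it asks poll() for POLLPRI (POLLERR is always reported).

```c
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for a change on an already-open /proc/mdstat fd.
 * Returns 1 if an event was reported, 0 on timeout, -1 on error.
 */
static int md_wait_event(int fd, int timeout_ms)
{
	char buf[4096];
	struct pollfd pfd;
	int n;

	/* Read from the start so that earlier events are acknowledged. */
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	while (read(fd, buf, sizeof(buf)) > 0)
		;	/* a real monitor would parse the contents here */

	pfd.fd = fd;
	pfd.events = POLLPRI;	/* changes arrive as exceptional events */
	pfd.revents = 0;

	n = poll(&pfd, 1, timeout_ms);
	if (n < 0)
		return -1;
	return (n > 0 && (pfd.revents & (POLLPRI | POLLERR))) ? 1 : 0;
}
```

A caller would open /proc/mdstat once with open(2), then call this in a loop,
treating a return of 1 as "something changed, re-examine the arrays".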

md_new_event takes an 'mddev' which isn't currently used, but it will be soon.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: raid10 read-error handling - resync and read-only
NeilBrown [Fri, 6 Jan 2006 08:20:29 +0000 (00:20 -0800)] 
[PATCH] md: raid10 read-error handling - resync and read-only

Add in correct read-error handling for resync and read-only situations.

When read-only, we don't over-write, so we need to mark the failed drive in
the r10_bio so we don't re-try it.  During resync, we always read all blocks,
so if there is a read error, we simply over-write it with the good block that
we found (assuming we found one).

Note that the recovery case still isn't handled in an interesting way.  There
is nothing useful to do for the 2-copies case.  If there are 3 or more copies,
then we could try reading from one of the non-missing copies, but this is a
bit complicated and very rarely would be used, so I'm leaving it for now.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: auto-correct correctable read errors in raid10
NeilBrown [Fri, 6 Jan 2006 08:20:28 +0000 (00:20 -0800)] 
[PATCH] md: auto-correct correctable read errors in raid10

Largely just a cross-port from raid1.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: make sure read error on last working drive of raid1 actually returns...
NeilBrown [Fri, 6 Jan 2006 08:20:27 +0000 (00:20 -0800)] 
[PATCH] md: make sure read error on last working drive of raid1 actually returns failure

We are inadvertently setting the R1BIO_Uptodate bit on read errors when we
decide not to try correcting (because there are no other working devices).
This means that the read error is reported to the client as success.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: allow raid1 to check consistency
NeilBrown [Fri, 6 Jan 2006 08:20:26 +0000 (00:20 -0800)] 
[PATCH] md: allow raid1 to check consistency

When performing a user-requested 'check' or 'repair', we read all readable
devices, and compare the contents.  We only write to blocks which had read
errors, or blocks with content that differs from the first good device found.
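
The selection rule can be sketched as a small user-space function. The name
plan_repair and its interface are hypothetical; the sketch only decides which
devices would need a write, given the data read from each one.

```c
#include <stddef.h>
#include <string.h>

/*
 * bufs[i] holds the block as read from device i; read_ok[i] is nonzero if
 * that read succeeded.  Sets need_write[i] for every device that should be
 * rewritten and returns how many there are, or -1 if no device was readable.
 */
static int plan_repair(unsigned char *const bufs[], const int read_ok[],
		       int ndev, size_t len, int need_write[])
{
	int first = -1, count = 0, i;

	/* Find the first device whose read succeeded; it is the reference. */
	for (i = 0; i < ndev; i++) {
		if (read_ok[i]) {
			first = i;
			break;
		}
	}
	if (first < 0)
		return -1;

	/* Rewrite devices that failed to read or that differ from it. */
	for (i = 0; i < ndev; i++) {
		need_write[i] = !read_ok[i] ||
				memcmp(bufs[i], bufs[first], len) != 0;
		count += need_write[i];
	}
	return count;
}
```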

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: support check-without-repair of raid10 arrays
NeilBrown [Fri, 6 Jan 2006 08:20:25 +0000 (00:20 -0800)] 
[PATCH] md: support check-without-repair of raid10 arrays

Also keep count of the number of errors found.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: fix up some rdev rcu locking in raid5/6
NeilBrown [Fri, 6 Jan 2006 08:20:24 +0000 (00:20 -0800)] 
[PATCH] md: fix up some rdev rcu locking in raid5/6

There is this "FIXME" comment with a typo in it!!  It has been annoying me for
days, so I just had to remove it.

conf->disks[i].rdev should only be accessed if
  - we know we hold a reference or
  - the mddev->reconfig_sem is down or
  - we have a rcu_readlock

handle_stripe was referencing rdev in three places without any of these.  For
the first two, get an rcu_readlock.  For the last, the same access
(md_sync_acct call) is made a little later after the rdev has been claimed
under an rcu_readlock, if R5_Syncio is set.  So just use that access...
However R5_Syncio isn't really needed as the 'syncing' variable contains the
same information.  So use that instead.

Issues, comment, and fix are identical in raid5 and raid6.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: handle errors when read-only
NeilBrown [Fri, 6 Jan 2006 08:20:23 +0000 (00:20 -0800)] 
[PATCH] md: handle errors when read-only

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: better handling for read error in raid1 during resync
NeilBrown [Fri, 6 Jan 2006 08:20:22 +0000 (00:20 -0800)] 
[PATCH] md: better handling for read error in raid1 during resync

Handling of read errors during resync is separate from handling of read errors
during normal IO in raid1.  A previous patch added support for read errors
during normal IO.  This one adds support for read errors during resync or
recovery.

The key differences are that we don't need to freeze the array, because the
normal handling of resync means that this part of the array will be idle
except for resync, and the read/overwrite/re-read is needed in a separate
piece of code.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: tidyup some issues with raid1 resync and prepare for catching read errors
NeilBrown [Fri, 6 Jan 2006 08:20:21 +0000 (00:20 -0800)] 
[PATCH] md: tidyup some issues with raid1 resync and prepare for catching read errors

We are dereferencing ->rdev without an rcu lock!

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: attempt to auto-correct read errors in raid1
NeilBrown [Fri, 6 Jan 2006 08:20:19 +0000 (00:20 -0800)] 
[PATCH] md: attempt to auto-correct read errors in raid1

On a read-error we suspend the array, then synchronously read the block from
other devices until we find one where we can read it.  Then we try writing the
good data back everywhere and make sure it works.  If any write or subsequent
read fails, only then do we fail the device out of the array.

To be able to suspend the array, we need to also keep track of how many
requests are queued for handling by raid1d.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: improve handling of read errors with raid6
NeilBrown [Fri, 6 Jan 2006 08:20:18 +0000 (00:20 -0800)] 
[PATCH] md: improve handling of read errors with raid6

This is a simple port of matching functionality across from raid5.  If we get a
read error, we don't kick the drive straight away, but try to over-write with
good data first.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] md: fix raid6 resync check/repair code
NeilBrown [Fri, 6 Jan 2006 08:20:17 +0000 (00:20 -0800)] 
[PATCH] md: fix raid6 resync check/repair code

raid6 currently does not check the P/Q syndromes when doing a resync; it just
calculates the correct value and writes it.  Doing the check can reduce writes
(often to 0) for a resync, and it is needed to properly implement the

  echo check > sync_action

operation.

This patch implements the appropriate checks and tidies up some related code.

It also allows raid6 user-requested resync to bypass the intent bitmap.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>