linux-2.6
SUNRPC: Clean up soft task error handling
Trond Myklebust [Thu, 31 Aug 2006 19:44:52 +0000 (15:44 -0400)]

- Ensure that the task aborts the RPC call only when it has actually timed out.
- Ensure that req->rq_majortimeo is initialised correctly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: Handle ENETUNREACH, EHOSTUNREACH and EHOSTDOWN socket errors
Trond Myklebust [Wed, 30 Aug 2006 18:32:49 +0000 (14:32 -0400)]

If any of the above errors occurs, delay for 3 seconds, then handle it as
if it were a timeout error.
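
A minimal userspace sketch of the classification step described above, assuming positive errno values as userspace sees them (the kernel itself works with negative ones). The names `demo_classify()` and `DEMO_DELAY_THEN_TIMEOUT` are hypothetical and do not mirror the real SUNRPC state machine; the sketch only shows that the three "unreachable" errors map to a 3-second delay followed by timeout handling, while every other error is left to other code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical actions used only by this sketch. */
enum demo_action {
	DEMO_OTHER,              /* not one of the three errors: handled elsewhere */
	DEMO_DELAY_THEN_TIMEOUT, /* wait, then treat like an RPC timeout */
};

/* The 3-second delay named in the commit message. */
#define DEMO_UNREACH_DELAY_SECS 3u

/* True for the "network or host unreachable" family of socket errors. */
static bool demo_is_unreachable(int err)
{
	switch (err) {
	case ENETUNREACH:
	case EHOSTUNREACH:
	case EHOSTDOWN:
		return true;
	default:
		return false;
	}
}

/* Classify a positive errno; *delay_secs receives the delay to apply first. */
static enum demo_action demo_classify(int err, unsigned int *delay_secs)
{
	if (demo_is_unreachable(err)) {
		*delay_secs = DEMO_UNREACH_DELAY_SECS;
		return DEMO_DELAY_THEN_TIMEOUT;
	}
	*delay_secs = 0;
	return DEMO_OTHER;
}
```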

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: rpc_delay() should not clobber the rpc_task->tk_status
Trond Myklebust [Thu, 31 Aug 2006 22:24:08 +0000 (18:24 -0400)]

Doing so prevents stuff like call_encode() from working correctly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix a referral error Oops
andros@citi.umich.edu [Tue, 29 Aug 2006 16:19:41 +0000 (12:19 -0400)]

Fix an oops when the referral server is not responding.
Check the error return from nfs4_set_client() in nfs4_create_referral_server.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: NFS_ROOT should use the new rpc_create API
Chuck Lever [Sun, 27 Aug 2006 21:23:53 +0000 (17:23 -0400)]

Teach NFS_ROOT to use the new rpc_create API instead of the old two-call
API for creating an RPC transport.

Test plan:
Compile the kernel with the NFS client built in, and set CONFIG_NFS_ROOT.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Fix up compiler warnings on 64-bit platforms in client.c
David Howells [Thu, 24 Aug 2006 19:44:16 +0000 (15:44 -0400)]

Fix up warnings from compiling on ppc64.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: Make rpc_mkpipe() take the parent dentry as an argument
Trond Myklebust [Thu, 24 Aug 2006 05:03:17 +0000 (01:03 -0400)]

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv4: Fix a use-after-free issue with the nfs server.
Trond Myklebust [Thu, 24 Aug 2006 05:03:05 +0000 (01:03 -0400)]

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a real API for dealing with blk_congestion_wait()
Trond Myklebust [Wed, 23 Aug 2006 00:06:24 +0000 (20:06 -0400)]

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Use cached page as buffer for NFS symlink requests
Chuck Lever [Wed, 23 Aug 2006 00:06:23 +0000 (20:06 -0400)]

Now that we have a copy of the symlink path in the page cache, we can pass
a struct page down to the XDR routines instead of a string buffer.

Test plan:
Connectathon, all NFS versions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: copy symlinks into page cache before sending NFS SYMLINK request
Chuck Lever [Wed, 23 Aug 2006 00:06:23 +0000 (20:06 -0400)]

Currently the NFS client does not cache symlinks it creates.  They get
cached only when the NFS client reads them back from the server.

Copy the symlink into the page cache before sending it.

Test plan:
Connectathon, all NFS versions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Fix double d_drop in nfs_instantiate() error path
Chuck Lever [Wed, 23 Aug 2006 00:06:22 +0000 (20:06 -0400)]

If the LOOKUP or GETATTR in nfs_instantiate() fails, nfs_instantiate() will do
a d_drop before returning.  But some callers already do a d_drop in the case
of an error return.  Make certain we do only one d_drop in all error paths.

This issue was introduced because over time, the symlink proc API diverged
slightly from the create/mkdir/mknod proc API.  To prevent other coding
mistakes of this type, change the symlink proc API to be more like
create/mkdir/mknod and move the nfs_instantiate call into the symlink proc
routines so it is used in exactly the same way for create, mkdir, mknod,
and symlink.

Test plan:
Connectathon, all versions of NFS.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: remove a no-longer-needed error check in nfs_symlink()
Chuck Lever [Wed, 23 Aug 2006 00:06:22 +0000 (20:06 -0400)]

In the early days of NFS, there was no duplicate reply cache on the server.
Thus retransmitted non-idempotent requests often found that the request had
already completed on the server.  To avoid passing an unanticipated return
code to unsuspecting applications, NFS clients would often shunt error
codes that implied the request had been retried but already completed.

Thanks to NFS over TCP, duplicate reply caches on the server, and network
performance and reliability improvements, it is safe to remove such checks.

Test plan:
None.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: export new RPC client functions with _GPL
Chuck Lever [Wed, 23 Aug 2006 00:06:22 +0000 (20:06 -0400)]

This patch is optional.

It has been suggested that the RPC client internal functions used by upper
layer protocols (such as NFS) be exported via EXPORT_SYMBOL_GPL.  This
patch does that.

Test plan:
Compile kernel with CONFIG_NFS enabled as a module.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: Eliminate xprt_create_proto and rpc_create_client
Chuck Lever [Wed, 23 Aug 2006 00:06:21 +0000 (20:06 -0400)]

The two function call API for creating a new RPC client is now obsolete.
Remove it.

Also, remove an unnecessary check to see whether the caller is capable of
using privileged network services.  The kernel RPC client always uses a
privileged ephemeral port by default; callers are responsible for checking
the authority of users to make use of any RPC service, or for specifying
that a nonprivileged port is acceptable.

Test plan:
Repeated runs of Connectathon locking suite.  Check network trace to ensure
correctness of NLM requests and replies.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: Convert RPC portmapper to use new rpc_create() API
Chuck Lever [Wed, 23 Aug 2006 00:06:21 +0000 (20:06 -0400)]

Replace xprt_create_proto/rpc_create_client calls in pmap_clnt.c with new
rpc_create() API.

Test plan:
Repeated runs of Connectathon locking suite.  Check network trace for
proper PMAP calls and replies.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSD: Convert NFS server callback logic to use new rpc_create API
Chuck Lever [Wed, 23 Aug 2006 00:06:21 +0000 (20:06 -0400)]

Replace xprt_create_proto/rpc_create_client call in NFS server callback
functions to use new rpc_create() API.

Test plan:
NFSv4 delegation functionality tests.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Convert NFS client to use new rpc_create() API
Chuck Lever [Wed, 23 Aug 2006 00:06:20 +0000 (20:06 -0400)]

Convert NFS client mount logic to use rpc_create() instead of the old
xprt_create_proto/rpc_create_client API.

Test plan:
Mount stress tests.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
LOCKD: Convert to use new rpc_create() API
Chuck Lever [Wed, 23 Aug 2006 00:06:20 +0000 (20:06 -0400)]

Replace xprt_create_proto/rpc_create_client with new rpc_create()
interface in the Network Lock Manager.

Note that the semantics of NLM transports are now "hard" instead of "soft"
to provide a better guarantee that lock requests will get to the server.

Test plan:
Repeated runs of Connectathon locking suite.  Check network trace to ensure
NLM requests are working correctly.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: use sockaddr + size when creating remote transport endpoints
Chuck Lever [Wed, 23 Aug 2006 00:06:20 +0000 (20:06 -0400)]

Prepare for more generic transport endpoint handling needed by transports
that might use different forms of addressing, such as IPv6.

Introduce a single function call to replace the two-call
xprt_create_proto/rpc_create_client API.  Define a new rpc_create_args
structure that allows callers to pass in remote endpoint addresses of
varying length.
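
The idea of passing a remote endpoint as a generic `struct sockaddr` plus its length can be sketched in userspace as below. `struct demo_create_args` and `demo_xprt_create()` are hypothetical stand-ins, not the kernel's `struct rpc_create_args` or `rpc_create()`, whose fields and behaviour may differ. The sketch checks the length against the address family and copies the endpoint into a `struct sockaddr_storage`, which is large enough for both IPv4 and IPv6.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical argument block: a generic address plus its length. */
struct demo_create_args {
	const struct sockaddr *address;
	size_t addrsize;
};

/* Hypothetical transport: keeps the peer in family-neutral storage. */
struct demo_xprt {
	struct sockaddr_storage addr;
	size_t addrlen;
};

/* Copy the caller's endpoint into the transport.
 * Returns 0 on success or a negative errno value. */
static int demo_xprt_create(struct demo_xprt *xprt,
			    const struct demo_create_args *args)
{
	size_t want;

	if (args->address == NULL || args->addrsize < sizeof(sa_family_t))
		return -EINVAL;

	switch (args->address->sa_family) {
	case AF_INET:
		want = sizeof(struct sockaddr_in);
		break;
	case AF_INET6:
		want = sizeof(struct sockaddr_in6);
		break;
	default:
		return -EAFNOSUPPORT;
	}
	if (args->addrsize < want || args->addrsize > sizeof(xprt->addr))
		return -EINVAL;

	memset(&xprt->addr, 0, sizeof(xprt->addr));
	memcpy(&xprt->addr, args->address, args->addrsize);
	xprt->addrlen = args->addrsize;
	return 0;
}
```

Because the length travels with the pointer, the same entry point accepts a `sockaddr_in6` without any change to its signature.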

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: Clean-up after previous patches.
Chuck Lever [Wed, 23 Aug 2006 00:06:19 +0000 (20:06 -0400)]

Remove some unused macros related to accessing an RPC peer address

Test plan:
Compile kernel with CONFIG_NFS option enabled.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: Use "sockaddr_storage" for storing RPC client's remote peer address
Chuck Lever [Wed, 23 Aug 2006 00:06:19 +0000 (20:06 -0400)]

IPv6 addresses are big (128 bits, versus 32 bits for IPv4).  Now that RPC
client consumers treat the addr field in rpc_xprt structs as opaque, and
access it only via the API calls, we can safely widen the field in the
rpc_xprt struct to accommodate larger addresses.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: Teach rpc_pipe.c to use new rpc_peeraddr() API
Chuck Lever [Wed, 23 Aug 2006 00:06:19 +0000 (20:06 -0400)]

Hide the details of how the RPC client stores remote peer addresses from
the RPC pipefs implementation.

Test plan:
Connectathon with Kerberos 5 authentication.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: Create API for displaying remote peer address
Chuck Lever [Wed, 23 Aug 2006 00:06:18 +0000 (20:06 -0400)]

Provide an API for formatting the remote peer address for printing without
exposing its internal structure.  The address could be dynamic, so we
support a function call to get the address rather than reading it straight
out of a structure.
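
One way to format an opaque peer address without the caller knowing its family is to dispatch on `ss_family` and hand the raw address to POSIX `inet_ntop()`. This is an illustrative userspace sketch; `demo_peeraddr2str()` is a hypothetical name and is not the kernel function this commit added.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Render the address held in ss as text, whatever its family.
 * Returns buf on success, or NULL for an unsupported family or a
 * buffer that is too small. */
static const char *demo_peeraddr2str(const struct sockaddr_storage *ss,
				     char *buf, size_t buflen)
{
	const void *src;

	switch (ss->ss_family) {
	case AF_INET:
		src = &((const struct sockaddr_in *)ss)->sin_addr;
		break;
	case AF_INET6:
		src = &((const struct sockaddr_in6 *)ss)->sin6_addr;
		break;
	default:
		return NULL;
	}
	return inet_ntop(ss->ss_family, src, buf, (socklen_t)buflen);
}
```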

Test-plan:
Destructive testing (unplugging the network temporarily).  Probably need
to rig a server where certain services aren't running, or that returns an
error for some typical operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: add xprt switch API for printing formatted remote peer addresses
Chuck Lever [Wed, 23 Aug 2006 00:06:18 +0000 (20:06 -0400)]

Add a new method to the transport switch API to provide a way to convert
the opaque contents of xprt->addr to a human-readable string.
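
A transport switch is a table of function pointers that each transport type fills in, so generic code can act on a transport without knowing its internals. A userspace sketch of adding an address-formatting method to such a table, assuming hypothetical names (`struct demo_xprt_ops`, `format_addr`) rather than the real layout of the kernel's transport operations:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct demo_xprt;

/* Hypothetical method table ("switch") filled in by each transport type. */
struct demo_xprt_ops {
	const char *name;
	/* Write a human-readable form of the transport's address into buf. */
	int (*format_addr)(const struct demo_xprt *xprt, char *buf, size_t len);
};

struct demo_xprt {
	const struct demo_xprt_ops *ops;
	unsigned char addr[4];      /* opaque to generic code */
	unsigned short port;        /* host byte order */
};

/* An IPv4-style transport's implementation of the method. */
static int demo_ipv4_format(const struct demo_xprt *xprt, char *buf, size_t len)
{
	return snprintf(buf, len, "%u.%u.%u.%u port %u",
			(unsigned)xprt->addr[0], (unsigned)xprt->addr[1],
			(unsigned)xprt->addr[2], (unsigned)xprt->addr[3],
			(unsigned)xprt->port);
}

static const struct demo_xprt_ops demo_udp_ops = {
	.name = "udp",
	.format_addr = demo_ipv4_format,
};

/* Generic code: never looks inside xprt->addr, only calls the method. */
static int demo_print_peer(const struct demo_xprt *xprt, char *buf, size_t len)
{
	if (xprt->ops == NULL || xprt->ops->format_addr == NULL)
		return snprintf(buf, len, "(unknown)");
	return xprt->ops->format_addr(xprt, buf, len);
}
```

A future IPv6 transport would supply its own `format_addr` without any change to `demo_print_peer()`.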

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: remove extraneous header inclusions
Chuck Lever [Wed, 23 Aug 2006 00:06:18 +0000 (20:06 -0400)]

include/linux/sunrpc/clnt.h already includes include/linux/sunrpc/xprt.h.
We can remove xprt.h from source files that already include clnt.h.
Likewise include/linux/sunrpc/timer.h.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: Teach the RPC portmapper to use the new rpc_peeraddr() API.
Chuck Lever [Wed, 23 Aug 2006 00:06:17 +0000 (20:06 -0400)]

Hide the details of how the RPC client stores remote peer addresses from
the RPC portmapper.

Test plan:
Destructive testing (unplugging the network temporarily).  Connectathon
with UDP and TCP.  NFSv2/3 and NFSv4 mounting should be carefully checked.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
LOCKD: Teach lockd to use the new rpc_peeraddr() API
Chuck Lever [Wed, 23 Aug 2006 00:06:17 +0000 (20:06 -0400)]

Hide the details of how the RPC client stores remote peer addresses from
the Network Lock Manager.

Test plan:
Destructive testing (unplugging the network temporarily).  Connectathon
with UDP and TCP.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: create API for getting remote peer address
Chuck Lever [Wed, 23 Aug 2006 00:06:17 +0000 (20:06 -0400)]

Provide an API for retrieving the remote peer address without allowing
direct access to the rpc_xprt struct.
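
Reading the peer address through an accessor, rather than from the structure directly, lets the storage format change later without touching callers. A sketch of a copy-out accessor, assuming (hypothetically) that it truncates to the caller's buffer size and returns the number of bytes copied; `demo_peeraddr()` and `struct demo_clnt` are illustrative names, not the kernel's definitions.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical client handle; callers are not meant to read addr directly. */
struct demo_clnt {
	struct sockaddr_storage addr;
	size_t addrlen;             /* bytes of addr that are valid */
};

/* Copy the peer address into buf, truncated to bufsize bytes.
 * Returns the number of bytes actually copied. */
static size_t demo_peeraddr(const struct demo_clnt *clnt,
			    struct sockaddr *buf, size_t bufsize)
{
	size_t n = clnt->addrlen < bufsize ? clnt->addrlen : bufsize;

	memcpy(buf, &clnt->addr, n);
	return n;
}
```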

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: Introduce transport switch callout for pluggable rpcbind
Chuck Lever [Wed, 23 Aug 2006 00:06:16 +0000 (20:06 -0400)]

Introduce a clean transport switch API for plugging in different types of
rpcbind mechanisms.  For instance, rpcbind can cleanly replace the
existing portmapper client, or a transport can choose to implement RPC
binding any way it likes.
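
A pluggable bind step can be modelled as a callout in the transport's method table: generic code checks whether the remote port is known and, if not, calls whichever binder the transport plugged in. The sketch below is hypothetical throughout (`rpcbind` is just a field name here, and the "portmapper" binder returns a fixed port instead of sending a GETPORT query).

```c
struct demo_xprt;

/* Hypothetical transport switch with a pluggable bind callout. */
struct demo_xprt_ops {
	/* Resolve the remote port; returns 0 or a negative errno value. */
	int (*rpcbind)(struct demo_xprt *xprt);
};

struct demo_xprt {
	const struct demo_xprt_ops *ops;
	unsigned short port;        /* 0 means "not yet bound" */
	unsigned short fixed_port;  /* used only by the static binder */
};

/* Binder for a transport that already knows its port. */
static int demo_static_bind(struct demo_xprt *xprt)
{
	xprt->port = xprt->fixed_port;
	return 0;
}

/* Stand-in for a portmapper query: a real one would ask the remote
 * rpcbind service; this sketch just answers 2049. */
static int demo_fake_pmap_bind(struct demo_xprt *xprt)
{
	xprt->port = 2049;
	return 0;
}

static const struct demo_xprt_ops demo_static_ops = { .rpcbind = demo_static_bind };
static const struct demo_xprt_ops demo_pmap_ops = { .rpcbind = demo_fake_pmap_bind };

/* Generic call path: bind only if needed, via the transport's own callout. */
static int demo_call_bind(struct demo_xprt *xprt)
{
	if (xprt->port != 0)
		return 0;
	return xprt->ops->rpcbind(xprt);
}
```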

Test plan:
Destructive testing (unplugging the network temporarily).  Connectathon
with UDP and TCP.  NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: Support for RPC child tasks no longer needed
Chuck Lever [Wed, 23 Aug 2006 00:06:16 +0000 (20:06 -0400)]

The previous patches removed the last user of RPC child tasks, so we can
remove support for child tasks from net/sunrpc/sched.c now.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: Clean-up after recent changes to sunrpc/pmap_clnt.c
Chuck Lever [Wed, 23 Aug 2006 00:06:16 +0000 (20:06 -0400)]

Add comments for external functions, use modern function definition style,
and fix up dprintk formatting.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: Make RPC portmapper use per-transport storage
Chuck Lever [Wed, 23 Aug 2006 00:06:15 +0000 (20:06 -0400)]

Move connection and bind state that was maintained in the rpc_clnt
structure to the rpc_xprt structure.  This will allow the creation of
a clean API for plugging in different types of bind mechanisms.

This brings improvements such as the elimination of a single spin lock to
control serialization for all in-kernel RPC binding.  A set of per-xprt
bitops is used to serialize tasks during RPC binding, just like it now
works for making RPC transport connections.
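
Per-transport serialization can be sketched with C11 atomics: each transport owns a state word, and the task that atomically sets the "binding" bit first performs the bind while others back off. This is an analogy to the kernel's bitop style, not its code; `DEMO_XPRT_BINDING` and the helper names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state bit: "a bind is in progress on this transport". */
#define DEMO_XPRT_BINDING 0x1UL

/* Each transport carries its own state word, so there is no global lock. */
struct demo_xprt {
	atomic_ulong state;
};

/* Returns true if the caller set the bit and should perform the bind. */
static bool demo_test_and_set_binding(struct demo_xprt *xprt)
{
	unsigned long old = atomic_fetch_or(&xprt->state, DEMO_XPRT_BINDING);

	return (old & DEMO_XPRT_BINDING) == 0;
}

/* Called by the binding task when it is done, successful or not. */
static void demo_clear_binding(struct demo_xprt *xprt)
{
	atomic_fetch_and(&xprt->state, ~DEMO_XPRT_BINDING);
}
```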

Test-plan:
Destructive testing (unplugging the network temporarily).  Connectathon
with UDP and TCP.  NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
SUNRPC: Create a helper to tell whether a transport is bound
Chuck Lever [Wed, 23 Aug 2006 00:06:15 +0000 (20:06 -0400)]

Hide the contents and format of xprt->addr by eliminating direct uses
of the xprt->addr.sin_port field.  This change is required to support
alternate RPC host address formats (e.g. IPv6).
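
A family-neutral "is it bound?" helper might look like the sketch below: callers ask the helper instead of reading `sin_port` directly, so an IPv6 address works as well. The helper names are hypothetical, and treating port 0 as "unbound" is an assumption of this sketch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Return the port (host byte order) of an IPv4 or IPv6 address, else 0. */
static unsigned short demo_addr_port(const struct sockaddr_storage *ss)
{
	switch (ss->ss_family) {
	case AF_INET:
		return ntohs(((const struct sockaddr_in *)ss)->sin_port);
	case AF_INET6:
		return ntohs(((const struct sockaddr_in6 *)ss)->sin6_port);
	default:
		return 0;
	}
}

/* "Is this transport bound?" without callers touching sin_port. */
static bool demo_xprt_bound(const struct sockaddr_storage *ss)
{
	return demo_addr_port(ss) != 0;
}
```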

Test-plan:
Destructive testing (unplugging the network temporarily).  Repeated runs of
Connectathon locking suite with UDP and TCP.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Fix nfs_alloc_client()
Trond Myklebust [Wed, 23 Aug 2006 00:06:14 +0000 (20:06 -0400)]

The scheme to indicate which services have been started up appears to be
seriously broken.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Ensure NFSv2/v3 mounts respect the NFS_MOUNT_SECFLAVOUR flag
Trond Myklebust [Wed, 23 Aug 2006 00:06:14 +0000 (20:06 -0400)]

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Secure the roots of the NFS subtrees in a shared superblock
David Howells [Sun, 30 Jul 2006 18:58:27 +0000 (14:58 -0400)]

Invoke security_d_instantiate() on root dentries after allocating them with
d_alloc_anon().  Normally d_alloc_root() would do that, but we don't
call that as we don't want to assign a name to the root dentry at this point
(we may discover the real name later).

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Fix error handling
David Howells [Sun, 30 Jul 2006 18:40:56 +0000 (14:40 -0400)]

Fix an error handling problem: nfs_put_client() can be given a NULL pointer if
nfs_free_server() is asked to destroy a partially initialised record.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Add server and volume lists to /proc
David Howells [Wed, 23 Aug 2006 00:06:13 +0000 (20:06 -0400)]

Make two new proc files available:

/proc/fs/nfsfs/servers
/proc/fs/nfsfs/volumes

The first lists the servers with which we are currently dealing (struct
nfs_client), and the second lists the volumes we have on those servers (struct
nfs_server).

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Share NFS superblocks per-protocol per-server per-FSID
David Howells [Wed, 23 Aug 2006 00:06:13 +0000 (20:06 -0400)]

The attached patch makes NFS share superblocks between mounts from the same
server and FSID over the same protocol.

It does this by creating each superblock with a false root and returning the
real root dentry in the vfsmount presented by get_sb(). The root dentry set
starts off as an anonymous dentry if we don't already have the dentry for its
inode, otherwise it simply returns the dentry we already have.

We may thus end up with several trees of dentries in the superblock, and if at
some later point one of the anonymous tree roots is discovered by normal filesystem
activity to be located in another tree within the superblock, the anonymous
root is named and materialises attached to the second tree at the appropriate
point.

Why do it this way? Why not pass an extra argument to the mount() syscall to
indicate the subpath and then pathwalk from the server root to the desired
directory? You can't guarantee this will work for two reasons:

 (1) The root and intervening nodes may not be accessible to the client.

     With NFS2 and NFS3, for instance, mountd is called on the server to get
     the filehandle for the tip of a path. mountd won't give us handles for
     anything we don't have permission to access, and so we can't set up NFS
     inodes for such nodes, and so can't easily set up dentries (we'd have to
     have ghost inodes or something).

     With this patch we don't actually create dentries until we get handles
     from the server that we can use to set up their inodes, and we don't
     actually bind them into the tree until we know for sure where they go.

 (2) Inaccessible symbolic links.

     If we're asked to mount two exports from the server, eg:

        mount warthog:/warthog/aaa/xxx /mmm
        mount warthog:/warthog/bbb/yyy /nnn

     We may not be able to access anything nearer the root than xxx and yyy,
     but we may find out later that /mmm/www/yyy, say, is actually the same
     directory as the one mounted on /nnn. What we might then find out, for
     example, is that /warthog/bbb was actually a symbolic link to
     /warthog/aaa/xxx/www, but we can't actually determine that by talking to
     the server until /warthog is made available by NFS.

     This would leave us with an erroneous dentry tree which we
     can't easily fix. We can end up with a dentry marked as a directory when
     it should actually be a symlink, or we could end up with an apparently
     hardlinked directory.

     With this patch we need not make assumptions about the type of a dentry
     for which we can't retrieve information, nor need we assume we know its
     place in the grand scheme of things until we actually see that place.

This patch reduces the possibility of aliasing in the inode and page caches for
inodes that may be accessed by more than one NFS export. It also reduces the
number of superblocks required for NFS where there are many NFS exports being
used from a server (home directory server + autofs for example).

This in turn makes it simpler to do local caching of network filesystems, as it
can then be guaranteed that there won't be links from multiple inodes in
separate superblocks to the same cache file.

Obviously, cache aliasing between different levels of NFS protocol could still
be a problem, but at least that gives us another key to use when indexing the
cache.

This patch makes the following changes:

 (1) The server record construction/destruction has been abstracted out into
     its own set of functions to make things easier to get right.  These have
     been moved into fs/nfs/client.c.

     All the code in fs/nfs/client.c has to do with the management of
     connections to servers, and doesn't touch superblocks in any way; the
     remaining code in fs/nfs/super.c has to do with VFS superblock management.

 (2) The sequence of events undertaken by NFS mount is now reordered:

     (a) A volume representation (struct nfs_server) is allocated.

     (b) A server representation (struct nfs_client) is acquired.  This may be
       allocated or shared, and is keyed on server address, port and NFS
       version.

     (c) If allocated, the client representation is initialised.  The state
       member variable of nfs_client is used to prevent a race during
       initialisation from two mounts.

     (d) For NFS4 a simple pathwalk is performed, walking from FH to FH to find
       the root filehandle for the mount (fs/nfs/getroot.c).  For NFS2/3 we
       are given the root FH in advance.

     (e) The volume FSID is probed for on the root FH.

     (f) The volume representation is initialised from the FSINFO record
       retrieved on the root FH.

     (g) sget() is called to acquire a superblock.  This may be allocated or
       shared, keyed on client pointer and FSID.

     (h) If allocated, the superblock is initialised.

     (i) If the superblock is shared, then the new nfs_server record is
       discarded.

     (j) The root dentry for this mount is looked up from the root FH.

     (k) The root dentry for this mount is assigned to the vfsmount.

 (3) nfs_readdir_lookup() creates dentries for each of the entries readdir()
     returns; this function now attaches disconnected trees from alternate
     roots that happen to be discovered attached to a directory being read (in
     the same way nfs_lookup() is made to do for lookup ops).

     The new d_materialise_unique() function is now used to do this, thus
     permitting the whole thing to be done under one set of locks, and thus
     avoiding any race between mount and lookup operations on the same
     directory.

 (4) The client management code uses a new debug facility: NFSDBG_CLIENT, which
     is set by echoing 1024 to /proc/sys/sunrpc/nfs_debug.

 (5) Clone mounts are now called xdev mounts.

 (6) Use the dentry passed to the statfs() op as the handle for retrieving fs
     statistics rather than the root dentry of the superblock (which is now a
     dummy).
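
Steps (b) and (g) above amount to two find-or-create lookups with different keys. A single-threaded userspace sketch, assuming hypothetical record types and plain linked lists in place of the kernel's real structures, locking and sget() machinery:

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified records for illustration only. */
struct demo_client {
	unsigned int addr;          /* server IPv4 address, host order */
	unsigned short port;
	int version;                /* NFS protocol version */
	int refcount;
	struct demo_client *next;
};

struct demo_fsid {
	unsigned long long major, minor;
};

struct demo_sb {
	const struct demo_client *client;
	struct demo_fsid fsid;
	int refcount;
	struct demo_sb *next;
};

/* Global lists; a real implementation would protect these with a lock. */
static struct demo_client *demo_clients;
static struct demo_sb *demo_sbs;

/* Step (b): find or create a server record keyed on address, port, version. */
static struct demo_client *demo_get_client(unsigned int addr,
					   unsigned short port, int version)
{
	struct demo_client *clp;

	for (clp = demo_clients; clp != NULL; clp = clp->next) {
		if (clp->addr == addr && clp->port == port &&
		    clp->version == version) {
			clp->refcount++;
			return clp;
		}
	}
	clp = calloc(1, sizeof(*clp));
	if (clp == NULL)
		return NULL;
	clp->addr = addr;
	clp->port = port;
	clp->version = version;
	clp->refcount = 1;
	clp->next = demo_clients;
	demo_clients = clp;
	return clp;
}

/* Step (g): find or create a superblock keyed on client pointer and FSID. */
static struct demo_sb *demo_get_sb(const struct demo_client *clp,
				   struct demo_fsid fsid)
{
	struct demo_sb *sb;

	for (sb = demo_sbs; sb != NULL; sb = sb->next) {
		if (sb->client == clp && sb->fsid.major == fsid.major &&
		    sb->fsid.minor == fsid.minor) {
			sb->refcount++;
			return sb;
		}
	}
	sb = calloc(1, sizeof(*sb));
	if (sb == NULL)
		return NULL;
	sb->client = clp;
	sb->fsid = fsid;
	sb->refcount = 1;
	sb->next = demo_sbs;
	demo_sbs = sb;
	return sb;
}
```

With these keys, two exports of the same filesystem from one server over the same protocol version resolve to a single superblock, while a mount of it over a different NFS version gets its own client record and therefore its own superblock.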

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Start rpciod in server common management
David Howells [Wed, 23 Aug 2006 00:06:12 +0000 (20:06 -0400)]

Start rpciod in the server common (nfs_client struct) management code rather
than in the superblock management code.  This means we only need to "start" it
once per server instead of once per superblock.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Eliminate client_sys in favour of cl_rpcclient
David Howells [Wed, 23 Aug 2006 00:06:12 +0000 (20:06 -0400)]

Eliminate nfs_server::client_sys in favour of nfs_client::cl_rpcclient as we
only really need one per server that we're talking to since it doesn't have any
security on it.

The retransmission management variables are also moved to the common struct as
they're required to set up the cl_rpcclient connection.

The NFS2/3 client and client_acl connections are thenceforth derived by cloning
the cl_rpcclient connection and post-applying the authorisation flavour.

The code for setting up the initial common connection has been moved to
client.c as nfs_create_rpc_client().  All the NFS program definition tables are
also moved there as that's where they're now required rather than super.c.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Move rpc_ops from nfs_server to nfs_client
David Howells [Wed, 23 Aug 2006 00:06:12 +0000 (20:06 -0400)]

Move the rpc_ops from the nfs_server struct to the nfs_client struct as they're
common to all server records of a particular NFS protocol version.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Make better use of inode* dereferencing macros
David Howells [Wed, 23 Aug 2006 00:06:11 +0000 (20:06 -0400)]

Make better use of inode* dereferencing macros to hide dereferencing chains
(including NFS_PROTO and NFS_CLIENT).

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Maintain a common server record for NFS2/3 as well as for NFS4
David Howells [Wed, 23 Aug 2006 00:06:11 +0000 (20:06 -0400)]

Maintain a common server record for NFS2/3 as well as for NFS4 so that common
stuff can be moved there from struct nfs_server.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Add extra const qualifiers
David Howells [Wed, 23 Aug 2006 00:06:11 +0000 (20:06 -0400)]

Add some extra const qualifiers into NFS.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Use the dentry superblock directly in nfs_statfs()
David Howells [Wed, 23 Aug 2006 00:06:10 +0000 (20:06 -0400)]

Use the nominated dentry's superblock directly in the NFS statfs() op to get a
file handle, rather than using s_root (which will become a dummy dentry in a
future patch).

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Generalise the nfs_client structure
David Howells [Wed, 23 Aug 2006 00:06:10 +0000 (20:06 -0400)]

Generalise the nfs_client structure by:

 (1) Moving nfs_client to a more general place (nfs_fs_sb.h).

 (2) Renaming its maintenance routines to be non-NFS4 specific.

 (3) Moving those maintenance routines to a new non-NFS4 specific file
     (client.c) and moving the declarations to internal.h.

 (4) Making nfs_find/get_client() take a full sockaddr_in to include the port
     number (will be required for NFS2/3).

 (5) Making nfs_find/get_client() take the NFS protocol version (again will be
     required to differentiate NFS2, 3 & 4 client records).

Also:

 (6) Make nfs_client construction proceed akin to inodes, marking them as under
     construction and providing a function to indicate completion.

 (7) Make nfs_get_client() wait interruptibly if it finds a client that it can
     share, but that client is currently being constructed.

 (8) Make nfs4_create_client() use (6) and (7) instead of locking cl_sem.
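
Items (6) and (7) describe a construct-then-publish pattern: the first mount creates the record in an "initialising" state, and later mounts that find it wait until its creator marks it ready or failed. A POSIX-threads sketch with hypothetical names; unlike the kernel's interruptible wait, `pthread_cond_wait()` cannot be aborted by a signal, so this illustrates only the state handling.

```c
#include <pthread.h>

/* Hypothetical construction states for a shared client record. */
enum demo_cl_state { DEMO_CL_INITING, DEMO_CL_READY, DEMO_CL_FAILED };

struct demo_client {
	enum demo_cl_state state;
	int error;                  /* meaningful when state == DEMO_CL_FAILED */
	pthread_mutex_t lock;
	pthread_cond_t done;
};

/* The creator starts the record off as "under construction". */
static void demo_client_init(struct demo_client *clp)
{
	clp->state = DEMO_CL_INITING;
	clp->error = 0;
	pthread_mutex_init(&clp->lock, NULL);
	pthread_cond_init(&clp->done, NULL);
}

/* The creator reports completion: error == 0 means success. */
static void demo_mark_client_ready(struct demo_client *clp, int error)
{
	pthread_mutex_lock(&clp->lock);
	clp->error = error;
	clp->state = error ? DEMO_CL_FAILED : DEMO_CL_READY;
	pthread_cond_broadcast(&clp->done);
	pthread_mutex_unlock(&clp->lock);
}

/* A sharer blocks until construction finishes.
 * Returns 0, or the creator's error if construction failed. */
static int demo_wait_client_ready(struct demo_client *clp)
{
	int ret;

	pthread_mutex_lock(&clp->lock);
	while (clp->state == DEMO_CL_INITING)
		pthread_cond_wait(&clp->done, &clp->lock);
	ret = clp->state == DEMO_CL_READY ? 0 : clp->error;
	pthread_mutex_unlock(&clp->lock);
	return ret;
}
```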

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Add a server capabilities NFS RPC op
David Howells [Wed, 23 Aug 2006 00:06:10 +0000 (20:06 -0400)]

Add a set_capabilities NFS RPC op so that the server capabilities can be set.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFS: Add a lookupfh NFS RPC op
David Howells [Wed, 23 Aug 2006 00:06:09 +0000 (20:06 -0400)]

Add a lookup filehandle NFS RPC op so that a file handle can be looked up
without requiring dentries and inodes and other VFS stuff when doing an NFS4
pathwalk during mounting.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Return an error when starting the idmapping pipe
David Howells [Wed, 23 Aug 2006 00:06:09 +0000 (20:06 -0400)] 
NFS: Return an error when starting the idmapping pipe

Return an error when starting the idmapping pipe so that we can detect it
failing.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Rename nfs_server::nfs4_state
David Howells [Wed, 23 Aug 2006 00:06:09 +0000 (20:06 -0400)] 
NFS: Rename nfs_server::nfs4_state

Rename nfs_server::nfs4_state to nfs_client as it will be used to represent the
client state for NFS2 and NFS3 also.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Rename struct nfs4_client to struct nfs_client
David Howells [Wed, 23 Aug 2006 00:06:08 +0000 (20:06 -0400)] 
NFS: Rename struct nfs4_client to struct nfs_client

Rename struct nfs4_client to struct nfs_client so that it can become the basis
for a general client record for NFS2 and NFS3 in addition to NFS4.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Fix NFS4 callback up/down prototypes
David Howells [Wed, 23 Aug 2006 00:06:08 +0000 (20:06 -0400)] 
NFS: Fix NFS4 callback up/down prototypes

Make the nfs_callback_up()/down() prototypes just do nothing if NFS4 is not
enabled.  Also make the down function void type since we can't really do
anything if it fails.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
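Guarding the declarations with the config option and supplying empty `static inline` stubs otherwise is a common way to get the behaviour described above. A sketch, assuming simplified `void`-argument prototypes; the real declarations may differ:

```c
#include <assert.h>

/* "Stub out when the feature is disabled" pattern (prototypes assumed). */
#ifdef CONFIG_NFS_V4
extern int nfs_callback_up(void);
extern void nfs_callback_down(void);   /* void: nothing useful to do on failure */
#else
/* NFS4 disabled: starting the callback service trivially "succeeds"... */
static inline int nfs_callback_up(void) { return 0; }
/* ...and shutting it down does nothing. */
static inline void nfs_callback_down(void) { }
#endif
```

Built without `CONFIG_NFS_V4`, callers can invoke both functions unconditionally, with no `#ifdef` at the call sites, and the empty stubs compile away.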
18 years agoNFS: Disambiguate nfs_stat_to_errno()
David Howells [Wed, 23 Aug 2006 00:06:08 +0000 (20:06 -0400)] 
NFS: Disambiguate nfs_stat_to_errno()

Rename the NFS4 version of nfs_stat_to_errno() so that it doesn't conflict with
the common one used by NFS2 and NFS3.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Fix up split of fs/nfs/inode.c
David Howells [Wed, 23 Aug 2006 00:06:07 +0000 (20:06 -0400)] 
NFS: Fix up split of fs/nfs/inode.c

Fix-ups for the splitting of the superblock stuff out of fs/nfs/inode.c,
including:

 (*) Move the callback tcpport module param into callback.c.

 (*) Move the idmap cache timeout module param into idmap.c.

 (*) Changes to internal.h:

     (*) namespace-nfs4.c was renamed to nfs4namespace.c.

     (*) nfs_stat_to_errno() is in nfs2xdr.c, not nfs4xdr.c.

     (*) nfs4xdr.c is contingent on CONFIG_NFS_V4.

     (*) nfs4_path() is only used if CONFIG_NFS_V4 is set.

Plus also:

 (*) The sec_flavours[] table should really be const.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Add dentry materialisation op
David Howells [Wed, 23 Aug 2006 00:06:07 +0000 (20:06 -0400)] 
NFS: Add dentry materialisation op

The attached patch adds a new directory cache management function that prepares
a disconnected anonymous dentry to be connected into the dentry tree. The
anonymous dentry takes over the name and parentage of another dentry.

The following changes were made in [try #2]:

 (*) d_materialise_dentry() now switches the parentage of the two nodes around
     correctly when one or other of them is self-referential.

The following changes were made in [try #7]:

 (*) d_instantiate_unique() has had the interior part split out as function
     __d_instantiate_unique(). Callers of this latter function must be holding
     the appropriate locks.

 (*) _d_rehash() has been added as a wrapper around __d_rehash() to call it
     with the most obvious hash list (the one from the name). d_rehash() now
     calls _d_rehash().

 (*) d_materialise_dentry() is now __d_materialise_dentry() and is static.

 (*) d_materialise_unique() added to perform the combination of d_find_alias(),
     d_materialise_dentry() and d_add_unique() that the NFS client was doing
     twice, all within a single dcache_lock critical section. This reduces the
     number of times two different spinlocks were being accessed.

The following further changes were made:

 (*) Add the dentries onto their parents' d_subdirs lists.

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Add an ACCESS cache memory shrinker
Trond Myklebust [Tue, 25 Jul 2006 15:28:19 +0000 (11:28 -0400)] 
NFS: Add an ACCESS cache memory shrinker

A pinned inode may in theory end up filling memory with cached ACCESS
calls. This patch ensures that the VM may shrink away the cache in these
particular cases.
The shrinker works by iterating through the list of inodes on the global
nfs_access_lru_list, and removing the least recently used access
cache entry until it is done (or until the entire cache is empty).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
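A rough sketch of the shrinker idea described above. It assumes a single, flattened global LRU list of cache entries; the commit describes walking the inodes on `nfs_access_lru_list` and dropping each one's least recently used entry, and that per-inode level is collapsed here. `struct access_entry`, `lru_touch()` and `lru_shrink()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cached ACCESS result on a global LRU list. */
struct access_entry {
	int inode_id;
	int mask;
	struct access_entry *prev, *next;
};

struct lru {
	struct access_entry *head;  /* least recently used */
	struct access_entry *tail;  /* most recently used */
	int count;
};

/* Append an entry at the most-recently-used end. */
static void lru_add(struct lru *l, struct access_entry *e)
{
	e->next = NULL;
	e->prev = l->tail;
	if (l->tail)
		l->tail->next = e;
	else
		l->head = e;
	l->tail = e;
	l->count++;
}

/* Move a just-used entry to the most-recently-used end. */
static void lru_touch(struct lru *l, struct access_entry *e)
{
	if (l->tail == e)
		return;
	if (e->prev)
		e->prev->next = e->next;
	else
		l->head = e->next;
	e->next->prev = e->prev;   /* e is not the tail, so e->next != NULL */
	l->count--;
	lru_add(l, e);
}

/* Free up to nr_to_scan entries from the LRU end, stopping early if the
 * cache empties; returns how many entries remain. */
static int lru_shrink(struct lru *l, int nr_to_scan)
{
	while (nr_to_scan-- > 0 && l->head) {
		struct access_entry *e = l->head;

		l->head = e->next;
		if (l->head)
			l->head->prev = NULL;
		else
			l->tail = NULL;
		free(e);
		l->count--;
	}
	return l->count;
}
```

Keeping the list ordered by use means the shrinker never has to search: the cheapest entries to lose are always at the head.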
18 years agoNFS: Add a global LRU list for the ACCESS cache
Trond Myklebust [Tue, 25 Jul 2006 15:28:18 +0000 (11:28 -0400)] 
NFS: Add a global LRU list for the ACCESS cache

...in order to allow the addition of a memory shrinker.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoNFS: Add a new ACCESS rpc call cache to the linux nfs client
Trond Myklebust [Tue, 25 Jul 2006 15:28:18 +0000 (11:28 -0400)] 
NFS: Add a new ACCESS rpc call cache to the linux nfs client

The current access cache only allows one entry at a time to be cached for each
inode. Add a per-inode red-black tree in order to allow more than one to
be cached at a time.

This should significantly cut down the time spent in path traversal for shared
directories such as ${PATH}, /usr/share, etc.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
18 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
Linus Torvalds [Sat, 23 Sep 2006 00:51:59 +0000 (17:51 -0700)] 
Merge git://git./linux/kernel/git/sfrench/cifs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] statfs for cifs unix extensions no longer experimental
  [CIFS] New POSIX locking code not setting rc properly to zero on successful
  [CIFS] Support deep tree mounts (e.g. mounts to //server/share/path)

18 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart
Linus Torvalds [Sat, 23 Sep 2006 00:50:50 +0000 (17:50 -0700)] 
Merge /pub/scm/linux/kernel/git/davej/agpgart

* master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart:
  [AGPGART] Rework AGPv3 modesetting fallback.
  [AGPGART] Add suspend callback for i965
  [AGPGART] Fix number of aperture sizes in 830 gart structs.
  [AGPGART] Intel 965 Express support.
  [AGPGART] agp.h: constify struct agp_bridge_data::version
  [AGPGART] const'ify VIA AGP PCI table.
  [AGPGART] CONFIG_PM=n slim: drivers/char/agp/intel-agp.c
  [AGPGART] CONFIG_PM=n slim: drivers/char/agp/efficeon-agp.c
  [AGPGART] Const'ify the agpgart driver version.
  [AGPGART] remove private page protection map

18 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq
Linus Torvalds [Sat, 23 Sep 2006 00:50:22 +0000 (17:50 -0700)] 
Merge /pub/scm/linux/kernel/git/davej/cpufreq

* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] sw_any_bug_dmi_table can be used on resume, so it isn't initdata
  [CPUFREQ] Fix some more CPU hotplug locking.
  [CPUFREQ] Workaround for BIOS bug in software coordination of frequency
  [CPUFREQ] Longhaul - Add voltage scaling to driver
  [CPUFREQ] Fix sparse warning in ondemand
  [CPUFREQ] make drivers/cpufreq/cpufreq_ondemand.c:powersave_bias_target() static
  [CPUFREQ] Longhaul - Add ignore_latency option
  [CPUFREQ] Longhaul - Disable arbiter
  [CPUFREQ][2/2] ondemand: updated add powersave_bias tunable
  [CPUFREQ][1/2] ondemand: updated tune for hardware coordination
  [CPUFREQ] Fix typo.

18 years ago[PATCH] fallout from hcd-core patch
Al Viro [Sat, 23 Sep 2006 00:29:34 +0000 (01:29 +0100)] 
[PATCH] fallout from hcd-core patch

missing le16_to_cpu()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
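The bug class behind a missing le16_to_cpu() is reading a little-endian field (USB descriptor fields, for instance, are little-endian) as if it were in host byte order. That happens to work on little-endian machines and breaks on big-endian ones. A portable userspace stand-in, assuming the field arrives as raw bytes; `get_le16()` is a hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stdint.h>

/* Decode a little-endian 16-bit value byte by byte, so the result is the
 * same on little- and big-endian hosts. In the kernel, the equivalent is
 * applying le16_to_cpu() to a __le16 field. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}
```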
18 years ago[PATCH] fix the survivors of fbcon_vbl_handler() renaming
Al Viro [Sat, 23 Sep 2006 00:27:30 +0000 (01:27 +0100)] 
[PATCH] fix the survivors of fbcon_vbl_handler() renaming

In

|Author: James Simmons <jsimmons@kozmo.(none)>
|Date:   Thu Mar 13 22:37:08 2003 -0800
|
|    [FBCON] Cursor handling clean up. I nuked several static variables.

we have

-static void fbcon_vbl_handler(int irq, void *dummy, struct pt_regs *fp)
+static void fb_vbl_handler(int irq, void *dev_id, struct pt_regs *fp)

and three years later a couple of instances missed back then still remain
there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] sun4: fix sbus_setup_iommu()
Al Viro [Sat, 23 Sep 2006 00:26:02 +0000 (01:26 +0100)] 
[PATCH] sun4: fix sbus_setup_iommu()

iommu_init() and iounit_init() are never called for sun4, but that's not
enough - these calls should be ifdefed out since the functions in question
simply do not exist for CONFIG_SUN4 kernel.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] asm/backlight.h is ppc-only
Al Viro [Sat, 23 Sep 2006 00:25:18 +0000 (01:25 +0100)] 
[PATCH] asm/backlight.h is ppc-only

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] sanitize frv archclean
Al Viro [Sat, 23 Sep 2006 00:22:46 +0000 (01:22 +0100)] 
[PATCH] sanitize frv archclean

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] aoa is pmac-only
Al Viro [Sat, 23 Sep 2006 00:24:25 +0000 (01:24 +0100)] 
[PATCH] aoa is pmac-only

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] memcpy_fromio() missing in istallion
Al Viro [Sat, 23 Sep 2006 00:20:31 +0000 (01:20 +0100)] 
[PATCH] memcpy_fromio() missing in istallion

memcpy() from iomem is a bad thing...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18 years ago[PATCH] fix ancient breakage in ebus_init()
Al Viro [Sat, 23 Sep 2006 00:18:41 +0000 (01:18 +0100)] 
[PATCH] fix ancient breakage in ebus_init()

Back when pci_dev had base_address[], a loop of the form
base = &...->base_address[0];
for (.....) {
...
*base++ = addr;
}
was fine, but when that array got spread into ->resource[...].start,
replacing the initialization with
base = &...->resource[0].start;
was not a sufficient modification.  IOW this code got broken for cases
when there had been more than one resource to fill.  All the way back in
2.3.41-pre3...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
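The breakage above comes down to pointer stride: incrementing a pointer to `unsigned long` moves to the next `unsigned long`, not to the next array element's `start`. The sketch below uses a simplified, hypothetical `struct resource` and helper name. It shows the indexing that does reach every element's `start` field, and deliberately does not execute the broken loop.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct resource (assumption). */
struct resource {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

/* Broken pattern (not run here): with unsigned long *base = &res[0].start,
 * base++ points at res[0].end, so the second and later addresses land in
 * the wrong fields.
 *
 * Correct pattern: index the resource array itself. */
static void fill_starts(struct resource *res, const unsigned long *addrs,
			size_t n)
{
	for (size_t i = 0; i < n; i++)
		res[i].start = addrs[i];
}
```

Consecutive `start` fields are a whole `sizeof(struct resource)` apart, which a one-`unsigned long` step can never cover when the structure has more than one field.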
18 years ago[PATCH] fix missing ifdefs in syscall classes hookup for generic targets
Al Viro [Fri, 22 Sep 2006 23:10:18 +0000 (00:10 +0100)] 
[PATCH] fix missing ifdefs in syscall classes hookup for generic targets

Several targets have no *at() family, and m32r calls its only chown variant
chown32(), leaving __NR_chown undefined.  creat(2) is also absent on some
targets.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
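One way to build such a syscall-class table so that it compiles on every target is to guard each entry with its own `#ifdef`. A sketch with a hypothetical table name and a `~0U` terminator; `<sys/syscall.h>` supplies whichever `__NR_*` numbers the host defines:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>

/* Illustrative syscall "class" table. Each entry is guarded so the table
 * still builds where chown, chown32, the *at() family or creat has no
 * __NR_ number on the target. */
static const unsigned example_class[] = {
#ifdef __NR_chown
	__NR_chown,
#endif
#ifdef __NR_chown32
	__NR_chown32,
#endif
#ifdef __NR_fchownat
	__NR_fchownat,
#endif
#ifdef __NR_creat
	__NR_creat,
#endif
	~0U                     /* terminator */
};

/* Count the real entries before the terminator. */
static size_t class_len(const unsigned *tbl)
{
	size_t n = 0;

	while (tbl[n] != ~0U)
		n++;
	return n;
}
```

On a target with none of these syscalls the table degenerates to just the terminator, and code that walks it still works.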
18 years ago[CPUFREQ] sw_any_bug_dmi_table can be used on resume, so it isn't initdata
Jeremy Fitzhardinge [Wed, 13 Sep 2006 01:55:53 +0000 (18:55 -0700)] 
[CPUFREQ] sw_any_bug_dmi_table can be used on resume, so it isn't initdata

sw_any_bug_dmi_table can be used on resume, so it isn't initdata.

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Dave Jones <davej@redhat.com>
18 years ago[CPUFREQ] Fix some more CPU hotplug locking.
Dave Jones [Fri, 22 Sep 2006 23:15:23 +0000 (19:15 -0400)] 
[CPUFREQ] Fix some more CPU hotplug locking.

Lukewarm IQ detected in hotplug locking
BUG: warning at kernel/cpu.c:38/lock_cpu_hotplug()
[<b0134a42>] lock_cpu_hotplug+0x42/0x65
[<b02f8af1>] cpufreq_update_policy+0x25/0xad
[<b0358756>] kprobe_flush_task+0x18/0x40
[<b0355aab>] schedule+0x63f/0x68b
[<b01377c2>] __link_module+0x0/0x1f
[<b0119e7d>] __cond_resched+0x16/0x34
[<b03560bf>] cond_resched+0x26/0x31
[<b0355b0e>] wait_for_completion+0x17/0xb1
[<f965c547>] cpufreq_stat_cpu_callback+0x13/0x20 [cpufreq_stats]
[<f9670074>] cpufreq_stats_init+0x74/0x8b [cpufreq_stats]
[<b0137872>] sys_init_module+0x91/0x174
[<b0102c81>] sysenter_past_esp+0x56/0x79

As there are other places that call cpufreq_update_policy without
the hotplug lock, it seems better to keep the hotplug locking
at the lower level for the time being until this is revamped.

Signed-off-by: Dave Jones <davej@redhat.com>
18 years agoMerge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband
Linus Torvalds [Fri, 22 Sep 2006 22:47:06 +0000 (15:47 -0700)] 
Merge branch 'for-linus' of /linux/kernel/git/roland/infiniband

* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: (65 commits)
  IB: Fix typo in kerneldoc for ib_set_client_data()
  IPoIB: Add some likely/unlikely annotations in hot path
  IPoIB: Remove unused include of vmalloc.h
  IPoIB: Rejoin all multicast groups after a port event
  IPoIB: Create MCGs with all attributes required by RFC
  IB/sa: fix ib_sa_selector names
  IB/iser: INFINIBAND_ISER depends on INET
  IB/mthca: Simplify calls to mthca_cq_clean()
  RDMA/cma: Document rdma_accept() error handling
  IB/mthca: Recover from catastrophic errors
  RDMA/cma: Document rdma_destroy_id() function
  IB/cm: Do not track remote QPN in timewait state
  IB/sa: Require SA registration
  IPoIB: Refactor completion handling
  IB/iser: Do not use FMR for a single dma entry sg
  IB/iser: fix some debug prints
  IB/iser: make FMR "page size" be 4K and not PAGE_SIZE
  IB/iser: Limit the max size of a scsi command
  IB/iser: fix a check of SG alignment for RDMA
  RDMA/cma: Protect against adding device during destruction
  ...

18 years agoMerge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik...
Linus Torvalds [Fri, 22 Sep 2006 22:37:31 +0000 (15:37 -0700)] 
Merge branch 'upstream-linus' of /linux/kernel/git/jgarzik/netdev-2.6

* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
  [netdrvr] mv643xx_eth: fix obvious typo, which caused build breakage
  [netdrvr] lp486e: fix typo

18 years agoIB: Fix typo in kerneldoc for ib_set_client_data()
Krishna Kumar [Fri, 22 Sep 2006 22:22:58 +0000 (15:22 -0700)] 
IB: Fix typo in kerneldoc for ib_set_client_data()

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoIPoIB: Add some likely/unlikely annotations in hot path
Eli Cohen [Fri, 22 Sep 2006 22:22:58 +0000 (15:22 -0700)] 
IPoIB: Add some likely/unlikely annotations in hot path

Signed-off-by: Eli Cohen <eli@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoIPoIB: Remove unused include of vmalloc.h
Dotan Barak [Thu, 21 Sep 2006 15:26:43 +0000 (18:26 +0300)] 
IPoIB: Remove unused include of vmalloc.h

IPoIB doesn't use anything from <linux/vmalloc.h>, so don't include it.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoIPoIB: Rejoin all multicast groups after a port event
Eli Cohen [Fri, 22 Sep 2006 22:22:56 +0000 (15:22 -0700)] 
IPoIB: Rejoin all multicast groups after a port event

When ipoib_ib_dev_flush() is called because of a port event, the
driver needs to rejoin all multicast groups, since the flush will call
ipoib_mcast_dev_flush() (via ipoib_ib_dev_down()).  Otherwise no
(non-broadcast) multicast groups will be rejoined until the networking
core calls ->set_multicast_list again, and so multicast reception will
be broken for potentially a long time.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoIPoIB: Create MCGs with all attributes required by RFC
Roland Dreier [Fri, 22 Sep 2006 22:22:56 +0000 (15:22 -0700)] 
IPoIB: Create MCGs with all attributes required by RFC

RFC 4391 ("Transmission of IP over InfiniBand (IPoIB)") says:

  If the IB multicast group does not already exist, one must be
  created first with the IPoIB link MTU.  The MGID MUST use the same
  P_Key, Q_Key, SL, MTU, and HopLimit as those used in the
  broadcast-GID.  The rest of attributes SHOULD follow the values used
  in the broadcast-GID as well.

However, the current IPoIB driver is only setting the attributes
required by the InfiniBand spec to create a multicast group, so in
particular the MTU and HopLimit are not being set.  Add these
attributes when creating MCGs, and also set the Rate attribute, since
IPoIB pays attention to that attribute as well.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoIB/sa: fix ib_sa_selector names
Michael S. Tsirkin [Mon, 18 Sep 2006 19:17:08 +0000 (22:17 +0300)] 
IB/sa: fix ib_sa_selector names

Relevant SA queries are actually "greater than" / "less than", not
"greater than or equal" / "less than or equal" as the names imply.
(See IB spec 1.2 Vol 1, 15.2.5.16 PATHRECORD/Table 205 PathRecord)

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
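With the selectors read as strict comparisons, a matching check might look like the sketch below. The enum and function names are hypothetical; the numeric values 0 to 3 are assumed to follow the IB spec's selector encoding (greater than, less than, exactly, largest available).

```c
#include <assert.h>

/* Selector encoding (assumed): 0 = greater than, 1 = less than,
 * 2 = exactly, 3 = largest available. */
enum sa_selector {
	SA_GT   = 0,
	SA_LT   = 1,
	SA_EQ   = 2,
	SA_BEST = 3,
};

/* Does a candidate value satisfy (selector, requested)? GT and LT are
 * strict: GT means candidate > requested, not >=. BEST accepts any
 * candidate; choosing the largest one is left to the caller. */
static int sa_selector_ok(enum sa_selector sel, unsigned candidate,
			  unsigned requested)
{
	switch (sel) {
	case SA_GT:   return candidate > requested;
	case SA_LT:   return candidate < requested;
	case SA_EQ:   return candidate == requested;
	case SA_BEST: return 1;
	}
	return 0;
}
```

The boundary case is where the old "or equal" names mislead: a path whose value equals the requested one does not satisfy a GT or LT query.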
18 years agoIB/iser: INFINIBAND_ISER depends on INET
Roland Dreier [Fri, 22 Sep 2006 22:22:55 +0000 (15:22 -0700)] 
IB/iser: INFINIBAND_ISER depends on INET

iSER won't build without CONFIG_INET enabled, so make Kconfig reflect that.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoIB/mthca: Simplify calls to mthca_cq_clean()
Roland Dreier [Fri, 22 Sep 2006 22:22:55 +0000 (15:22 -0700)] 
IB/mthca: Simplify calls to mthca_cq_clean()

If a QP has separate send and receive CQs, then the send CQ will never
have receive completions from that QP in it.  So when cleaning the
send CQ, there's no need to pass in an SRQ pointer, even if the QP is
attached to an SRQ.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoRDMA/cma: Document rdma_accept() error handling
Or Gerlitz [Fri, 22 Sep 2006 22:22:54 +0000 (15:22 -0700)] 
RDMA/cma: Document rdma_accept() error handling

Document that, on error, rdma_accept() sends a reject and moves the QP to the error state.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoIB/mthca: Recover from catastrophic errors
Jack Morgenstein [Tue, 15 Aug 2006 18:11:18 +0000 (21:11 +0300)] 
IB/mthca: Recover from catastrophic errors

Trigger device remove and then add when a catastrophic error is
detected in hardware.  This, in turn, will cause a device reset, which
we hope will recover from the catastrophic condition.

Since this might interfere with debugging the root cause, add a
module option to suppress this behaviour.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoRDMA/cma: Document rdma_destroy_id() function
Or Gerlitz [Tue, 12 Sep 2006 16:03:33 +0000 (09:03 -0700)] 
RDMA/cma: Document rdma_destroy_id() function

Clarify that rdma_destroy_id cancels outstanding asynchronous operations on the
associated id.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoIB/cm: Do not track remote QPN in timewait state
Michael S. Tsirkin [Mon, 28 Aug 2006 13:32:50 +0000 (16:32 +0300)] 
IB/cm: Do not track remote QPN in timewait state

Do not track remote QPN in TimeWait state, since QP is not connected.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoIB/sa: Require SA registration
Michael S. Tsirkin [Mon, 21 Aug 2006 23:40:12 +0000 (16:40 -0700)] 
IB/sa: Require SA registration

Require users to register with SA module, to prevent the sa_query
module text from going away while an SA query callback is still
running.  Update all in-tree users for the new interface.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoIPoIB: Refactor completion handling
Roland Dreier [Fri, 22 Sep 2006 22:22:52 +0000 (15:22 -0700)] 
IPoIB: Refactor completion handling

Split up ipoib_ib_handle_wc() into ipoib_ib_handle_rx_wc() and
ipoib_ib_handle_tx_wc() to make the code easier to read.  This will
also help implement NAPI in the future.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoIB/iser: Do not use FMR for a single dma entry sg
Erez Zilber [Mon, 11 Sep 2006 09:26:33 +0000 (12:26 +0300)] 
IB/iser: Do not use FMR for a single dma entry sg

Fast Memory Registration (fmr) is used to register an sg for rdma when its
elements are not linearly sequential after dma mapping.

The IB verbs layer provides an "all dma memory MR (memory region)" which
can be used to rdma a buffer that is linearly sequential after dma mapping.

Change the code to use the dma mr instead of doing fmr when dma mapping
produces a single dma entry sg.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoIB/iser: fix some debug prints
Erez Zilber [Mon, 11 Sep 2006 09:24:00 +0000 (12:24 +0300)] 
IB/iser: fix some debug prints

Fix and add some debug prints related to iser's
handling of memory for rdma.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoIB/iser: make FMR "page size" be 4K and not PAGE_SIZE
Erez Zilber [Mon, 11 Sep 2006 09:22:30 +0000 (12:22 +0300)] 
IB/iser: make FMR "page size" be 4K and not PAGE_SIZE

As iser is able to use at most one rdma operation for the
execution of a scsi command, and registration of the sg
associated with a scsi command has its restrictions, the code
checks whether an sg is "aligned for rdma".

Alignment for rdma is measured in "fmr page" units whose
possible resolutions differ between HCAs and can be
smaller than, equal to, or bigger than the system page size.

When the system page size is bigger than 4KB (e.g. the default
with ia64 kernels), there is a bigger chance that an sg will be
aligned for rdma if the fmr page size is 4KB.

Change the code to create FMR whose pages are of size 4KB
and to take that into account when processing the sg.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
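A sketch of an "aligned for rdma" check, to make the page-size argument concrete. The rule used here is an assumption about what the check means, not a copy of the iser code: every element except the last must end on an fmr page boundary, and every element except the first must start on one. `struct sg_ent` and `aligned_prefix_len()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One DMA-mapped scatter/gather element (simplified, hypothetical). */
struct sg_ent {
	uint64_t addr;   /* DMA address */
	uint32_t len;    /* length in bytes */
};

/* Length of the longest prefix of sg[0..n) that one page-list registration
 * with pages of page_sz bytes (a power of two) could cover. n should be the
 * number of entries after DMA mapping, since mapping may merge entries. */
static size_t aligned_prefix_len(const struct sg_ent *sg, size_t n,
				 uint64_t page_sz)
{
	uint64_t mask = page_sz - 1;
	size_t i;

	if (n == 0)
		return 0;
	for (i = 0; i + 1 < n; i++) {
		uint64_t end = sg[i].addr + sg[i].len;

		/* a gap mid-page on either side of the boundary breaks it */
		if ((end & mask) || (sg[i + 1].addr & mask))
			break;
	}
	return i + 1;
}
```

For example, two 4KB elements at 0x1000 and 0x3000 are fully aligned with 4KB fmr pages, but with 16KB pages the first element ends mid-page, so only a one-element prefix qualifies. That is the effect the commit relies on: a smaller fmr page size makes a fully aligned sg more likely.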
18 years agoIB/iser: Limit the max size of a scsi command
Erez Zilber [Mon, 11 Sep 2006 09:20:54 +0000 (12:20 +0300)] 
IB/iser: Limit the max size of a scsi command

Currently, the data length of a command coming down from scsi-ml
is limited only by the size of its sg list (sg_tablesize). The
max data length may be different for different page size values.
By setting max_sectors, we limit the data length to
max_sectors*512 bytes.

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoIB/iser: fix a check of SG alignment for RDMA
Erez Zilber [Mon, 11 Sep 2006 09:19:17 +0000 (12:19 +0300)] 
IB/iser: fix a check of SG alignment for RDMA

dma mapping may include a "compaction" of the sg associated with the scsi command.
Hence, the size of the maximal prefix of the SG which is aligned for rdma must be
compared against the length of the dma mapped sg (mem->dma_nents) and not against
the size of it before it was mapped (mem->size).

Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoRDMA/cma: Protect against adding device during destruction
Sean Hefty [Fri, 1 Sep 2006 22:33:55 +0000 (15:33 -0700)] 
RDMA/cma: Protect against adding device during destruction

Closes a window where address resolution can attach an rdma_cm_id to a
device during destruction of the rdma_cm_id.  This can result in the
rdma_cm_id remaining in the device list after its memory has been
freed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
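The fix amounts to making "attach to a device" and "begin destroying" mutually exclusive, with attach refusing to proceed once destruction has started. A minimal single-lock sketch; `struct cm_id`, its states and all three functions are hypothetical and much simpler than the real rdma_cm code:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, simplified model of an id that may be attached to a device. */
enum id_state { ID_IDLE, ID_ATTACHED, ID_DESTROYING };

struct cm_id {
	pthread_mutex_t lock;
	enum id_state state;
	int on_device_list;
};

static void id_init(struct cm_id *id)
{
	pthread_mutex_init(&id->lock, NULL);
	id->state = ID_IDLE;
	id->on_device_list = 0;
}

/* Address-resolution path: attach only if destruction hasn't started. */
static int id_attach_device(struct cm_id *id)
{
	int ret = -1;

	pthread_mutex_lock(&id->lock);
	if (id->state == ID_IDLE) {
		id->state = ID_ATTACHED;
		id->on_device_list = 1;
		ret = 0;
	}
	pthread_mutex_unlock(&id->lock);
	return ret;
}

/* Destroy path: mark the id and detach it under the same lock. Any later
 * attach attempt sees ID_DESTROYING and backs off, so the id can't be left
 * on a device list once its memory is freed. */
static void id_destroy(struct cm_id *id)
{
	pthread_mutex_lock(&id->lock);
	id->state = ID_DESTROYING;
	id->on_device_list = 0;
	pthread_mutex_unlock(&id->lock);
	/* a real implementation could now free the id */
}
```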
18 years agoRDMA/amso1100: Add driver for Ammasso 1100 RNIC
Tom Tucker [Fri, 22 Sep 2006 22:22:48 +0000 (15:22 -0700)] 
RDMA/amso1100: Add driver for Ammasso 1100 RNIC

Add a driver for the Ammasso 1100 gigabit ethernet RNIC.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoRDMA: iWARP Core Changes.
Tom Tucker [Thu, 3 Aug 2006 21:02:42 +0000 (16:02 -0500)] 
RDMA: iWARP Core Changes.

Modifications to the existing rdma header files, core files, drivers,
and ulp files to support iWARP, including:
 - Hook iWARP CM into the build system and use it in rdma_cm.
 - Convert enum ib_node_type to enum rdma_node_type, which includes
   the possibility of RDMA_NODE_RNIC, and update everything for this.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
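A generalised node type lets transport-independent code such as rdma_cm pick the right connection manager per device. A sketch of that idea: only `enum rdma_node_type` and `RDMA_NODE_RNIC` come from the commit text, while the other enumerator names, the transport enum and `node_transport()` are assumptions for illustration.

```c
#include <assert.h>

/* Node types, now admitting iWARP RNICs alongside InfiniBand nodes. */
enum rdma_node_type {
	RDMA_NODE_IB_CA = 1,
	RDMA_NODE_IB_SWITCH,
	RDMA_NODE_IB_ROUTER,
	RDMA_NODE_RNIC,
};

enum rdma_transport_type {
	RDMA_TRANSPORT_IB,
	RDMA_TRANSPORT_IWARP,
};

/* Map a device's node type to its transport: RNICs would be handled by the
 * iWARP CM, everything else by the IB CM. */
static enum rdma_transport_type node_transport(enum rdma_node_type t)
{
	return t == RDMA_NODE_RNIC ? RDMA_TRANSPORT_IWARP : RDMA_TRANSPORT_IB;
}
```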
18 years agoRDMA: iWARP Connection Manager.
Tom Tucker [Thu, 3 Aug 2006 21:02:40 +0000 (16:02 -0500)] 
RDMA: iWARP Connection Manager.

Add an iWARP Connection Manager (CM), which abstracts connection
management for iWARP devices (RNICs).  It is a logical instance of the
xx_cm where xx is the transport type (ib or iw).  The symbols exported
are used by the transport independent rdma_cm module, and are
available also for transport dependent ULPs.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
18 years agoIB: Whitespace fixes
Roland Dreier [Fri, 22 Sep 2006 22:22:46 +0000 (15:22 -0700)] 
IB: Whitespace fixes

Remove some trailing whitespace that has snuck in despite the best
efforts of whitespace=error-all.  Also fix a few other whitespace
bogosities.

Signed-off-by: Roland Dreier <rolandd@cisco.com>