Making Filesystems Exportable
=============================

Overview
--------

All filesystem operations require a dentry (or two) as a starting
point. Local applications have a reference-counted hold on suitable
dentries via open file descriptors or cwd/root. However, remote
applications that access a filesystem via a remote filesystem protocol
such as NFS may not be able to hold such a reference, and so need a
different way to refer to a particular dentry. As the alternative
form of reference needs to be stable across renames, truncates, and
server reboots (among other things, though these tend to be the most
problematic), there is no simple answer like 'filename'.

The mechanism discussed here allows each filesystem implementation to
specify how to generate an opaque (outside of the filesystem) byte
string for any dentry, and how to find an appropriate dentry for any
given opaque byte string.
This byte string will be called a "filehandle fragment" as it
corresponds to part of an NFS filehandle.

A filesystem which supports the mapping between filehandle fragments
and dentries will be termed "exportable".


Dcache Issues
-------------

The dcache normally contains a proper prefix of any given filesystem
tree. This means that if any filesystem object is in the dcache, then
all of the ancestors of that filesystem object are also in the dcache.
As normal access is by filename, this prefix is created naturally and
maintained easily (by each object maintaining a reference count on
its parent).

However, when objects are included into the dcache by interpreting a
filehandle fragment, there is no automatic creation of a path prefix
for the object. This leads to two related but distinct features of
the dcache that are not needed for normal filesystem access.

  1/ The dcache must sometimes contain objects that are not part of the
     proper prefix, i.e. that are not connected to the root.
  2/ The dcache must be prepared for a newly found (via ->lookup) directory
     to already have a (non-connected) dentry, and must be able to move
     that dentry into place (based on the parent and name in the
     ->lookup). This is particularly needed for directories as
     it is a dcache invariant that directories only have one dentry.

To implement these features, the dcache has:

  a/ A dentry flag DCACHE_DISCONNECTED which is set on
     any dentry that might not be part of the proper prefix.
     This is set when anonymous dentries are created, and cleared when a
     dentry is noticed to be a child of a dentry which is in the proper
     prefix.

  b/ A per-superblock list "s_anon" of dentries which are the roots of
     subtrees that are not in the proper prefix. These dentries, as
     well as the proper prefix, need to be released at unmount time. As
     these dentries will not be hashed, they are linked together on the
     d_hash list_head.

  c/ Helper routines to allocate anonymous dentries, and to help attach
     loose directory dentries at lookup time. They are:
      d_alloc_anon(inode) will return a dentry for the given inode.
        If the inode already has a dentry, one of those is returned.
        If it doesn't, a new anonymous (IS_ROOT and
        DCACHE_DISCONNECTED) dentry is allocated and attached.
        In the case of a directory, care is taken that only one dentry
        can ever be attached.
      d_splice_alias(inode, dentry) will make sure that there is a
        dentry with the same name and parent as the given dentry, and
        which refers to the given inode.
        If the inode is a directory and already has a dentry, then that
        dentry is d_moved over the given dentry.
        If the passed dentry gets attached, care is taken that this is
        mutually exclusive with a d_alloc_anon operation.
        If the passed dentry is used, NULL is returned, else the used
        dentry is returned. This corresponds to the calling pattern of
        ->lookup.

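The behaviour of the two helpers described in c/ can be illustrated
with a small userspace toy model. This is only a sketch and is not
kernel code: the model_* names and both structures are hypothetical,
there is no locking or hashing, each inode tracks a single alias, and
the flag handling simply follows the wording of item a/ above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct model_dentry;

/* Hypothetical inode: this toy tracks at most one alias per inode. */
struct model_inode {
	int is_dir;
	struct model_dentry *alias;
};

/* Hypothetical dentry; parent == self models IS_ROOT. */
struct model_dentry {
	struct model_dentry *parent;
	const char *name;
	struct model_inode *inode;
	int disconnected;	/* stands in for DCACHE_DISCONNECTED */
};

/* Return the existing alias if there is one, else a new anonymous dentry. */
struct model_dentry *model_alloc_anon(struct model_inode *inode)
{
	struct model_dentry *d;

	assert(inode);
	if (inode->alias)
		return inode->alias;
	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;
	d->parent = d;		/* IS_ROOT: it is its own parent */
	d->name = "";
	d->inode = inode;
	d->disconnected = 1;
	inode->alias = d;
	return d;
}

/*
 * "dentry" carries the parent and name from a lookup and has no inode yet.
 * Returns NULL if "dentry" itself was used, else the dentry that was
 * moved into its place.
 */
struct model_dentry *model_splice_alias(struct model_inode *inode,
					struct model_dentry *dentry)
{
	struct model_dentry *old;

	assert(inode && dentry && dentry->parent);
	old = inode->alias;
	if (inode->is_dir && old) {
		/* Move the existing dentry over the looked-up one. */
		old->parent = dentry->parent;
		old->name = dentry->name;
		/* Connected only if the new parent is itself connected. */
		old->disconnected = dentry->parent->disconnected;
		return old;
	}
	dentry->inode = inode;
	if (!inode->alias)
		inode->alias = dentry;
	return NULL;
}
```

In this model a directory first reached through a filehandle gets an
anonymous dentry from model_alloc_anon(); a later lookup by name then
calls model_splice_alias(), which returns that same dentry, now placed
under the looked-up parent, so the directory still has only one dentry.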

Filesystem Issues
-----------------

For a filesystem to be exportable it must:

   1/ provide the filehandle fragment routines described below.
   2/ make sure that d_splice_alias is used rather than d_add
      when ->lookup finds an inode for a given parent and name.
      Typically the ->lookup routine will end with a:

		return d_splice_alias(inode, dentry);
	}


A filesystem implementation declares that instances of the filesystem
are exportable by setting the s_export_op field in the struct
super_block. This field must point to a "struct export_operations"
struct which has the following members:

 encode_fh (optional)
    Takes a dentry and creates a filehandle fragment which can later be used
    to find or create a dentry for the same object. The default
    implementation creates a filehandle fragment that encodes a 32-bit inode
    number and generation number for the inode encoded, and if necessary
    the same information for the parent.

 fh_to_dentry (mandatory)
    Given a filehandle fragment, this should find the implied object and
    create a dentry for it (possibly with d_alloc_anon).

 fh_to_parent (optional but strongly recommended)
    Given a filehandle fragment, this should find the parent of the
    implied object and create a dentry for it (possibly with d_alloc_anon).
    May fail if the filehandle fragment is too small.

 get_parent (optional but strongly recommended)
    When given a dentry for a directory, this should return a dentry for
    the parent. Quite possibly the parent dentry will have been allocated
    by d_alloc_anon. The default get_parent function just returns an error
    so any filehandle lookup that requires finding a parent will fail.
    ->lookup("..") is *not* used as a default as it can leave ".." entries
    in the dcache which are too messy to work with.

 get_name (optional)
    When given a parent dentry and a child dentry, this should find a name
    in the directory identified by the parent dentry, which leads to the
    object identified by the child dentry. If no get_name function is
    supplied, a default implementation is provided which uses vfs_readdir
    to find potential names, and matches inode numbers to find the correct
    match.

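The strategy described for the default get_name (scan the directory
and match inode numbers) can be sketched in userspace as follows. This
is an illustration only: the directory is modelled as an in-memory
array, and model_dirent and model_get_name are hypothetical names, not
kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical directory entry: a name and the inode number it refers to. */
struct model_dirent {
	const char *name;
	unsigned long ino;
};

/*
 * Scan the nr entries of a directory for one whose inode number matches
 * child_ino.  On a match the name is copied into buf and 0 is returned.
 * Returns -1 if nothing matches or if buf is too small for the name.
 */
int model_get_name(const struct model_dirent *dir, size_t nr,
		   unsigned long child_ino, char *buf, size_t buflen)
{
	size_t i;

	assert(dir && buf);
	for (i = 0; i < nr; i++) {
		if (dir[i].ino != child_ino)
			continue;
		if (strlen(dir[i].name) + 1 > buflen)
			return -1;
		strcpy(buf, dir[i].name);
		return 0;
	}
	return -1;
}
```

The scan is linear in the size of the directory, which is one reason a
filesystem that can find the name more cheaply may prefer to supply its
own get_name.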

A filehandle fragment consists of an array of 1 or more 4-byte words,
together with a one-byte "type".
The decoding routines (fh_to_dentry and fh_to_parent) should not depend
on the stated size that is passed to them. This size may be larger than
the original filehandle generated by encode_fh, in which case it will
have been padded with nuls. Rather, the encode_fh routine should choose
a "type" which indicates to the decoding routines how much of the
filehandle is valid, and how it should be interpreted.

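As a sketch of the rule that the "type" (and not the stated size)
determines how much of a fragment is valid, consider the following
userspace model. The type values, struct obj_id and the model_*
functions are assumptions made for this example; a real filesystem
defines its own types and layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type values chosen by the encoder. */
enum { FH_TYPE_INO_GEN = 1, FH_TYPE_INO_GEN_PARENT = 2 };

/* Hypothetical identity that the filesystem wants to encode. */
struct obj_id {
	uint32_t ino, gen;			/* object */
	uint32_t parent_ino, parent_gen;	/* only used with a parent */
};

/*
 * Encode into fh[], which has room for *max_words 4-byte words.
 * Returns the type and stores the number of words used in *max_words,
 * or returns -1 if the buffer is too small.
 */
int model_encode_fh(const struct obj_id *id, int with_parent,
		    uint32_t *fh, int *max_words)
{
	int need = with_parent ? 4 : 2;

	assert(id && fh && max_words);
	if (*max_words < need)
		return -1;
	fh[0] = id->ino;
	fh[1] = id->gen;
	if (with_parent) {
		fh[2] = id->parent_ino;
		fh[3] = id->parent_gen;
	}
	*max_words = need;
	return with_parent ? FH_TYPE_INO_GEN_PARENT : FH_TYPE_INO_GEN;
}

/*
 * Decode.  The type decides how many words are valid; the stated size
 * is only checked as a lower bound, so trailing nul padding is ignored.
 * Returns 0 on success, -1 for an unknown type or a short fragment.
 */
int model_decode_fh(const uint32_t *fh, int stated_words, int type,
		    struct obj_id *out)
{
	int valid;

	assert(fh && out);
	switch (type) {
	case FH_TYPE_INO_GEN:
		valid = 2;
		break;
	case FH_TYPE_INO_GEN_PARENT:
		valid = 4;
		break;
	default:
		return -1;
	}
	if (stated_words < valid)
		return -1;
	out->ino = fh[0];
	out->gen = fh[1];
	out->parent_ino = valid == 4 ? fh[2] : 0;
	out->parent_gen = valid == 4 ? fh[3] : 0;
	return 0;
}
```

A fragment encoded without parent information and later presented with
a stated size of eight words decodes to the same object, because the
decoder reads only the two words that the type says are valid.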