Commit | Line | Data |
---|---|---|
f3e97da3 IM |
1 | Runtime locking correctness validator |
2 | ===================================== | |
3 | ||
4 | started by Ingo Molnar <mingo@redhat.com> | |
5 | additions by Arjan van de Ven <arjan@linux.intel.com> | |
6 | ||
7 | Lock-class | |
8 | ---------- | |
9 | ||
10 | The basic object the validator operates upon is a 'class' of locks. | |
11 | ||
12 | A class of locks is a group of locks that are logically the same with | |
13 | respect to locking rules, even if the locks may have multiple (possibly | |
14 | tens of thousands of) instantiations. For example a lock in the inode | |
15 | struct is one class, while each inode has its own instantiation of that | |
16 | lock class. | |
17 | ||
18 | The validator tracks the 'state' of lock-classes, and it tracks | |
19 | dependencies between different lock-classes. The validator maintains a | |
20 | rolling proof that the state and the dependencies are correct. | |
21 | ||
22 | Unlike an lock instantiation, the lock-class itself never goes away: when | |
23 | a lock-class is used for the first time after bootup it gets registered, | |
24 | and all subsequent uses of that lock-class will be attached to this | |
25 | lock-class. | |
26 | ||
27 | State | |
28 | ----- | |
29 | ||
30 | The validator tracks lock-class usage history into 5 separate state bits: | |
31 | ||
32 | - 'ever held in hardirq context' [ == hardirq-safe ] | |
33 | - 'ever held in softirq context' [ == softirq-safe ] | |
34 | - 'ever held with hardirqs enabled' [ == hardirq-unsafe ] | |
35 | - 'ever held with softirqs and hardirqs enabled' [ == softirq-unsafe ] | |
36 | ||
37 | - 'ever used' [ == !unused ] | |
38 | ||
fd7bcea3 JC |
39 | When locking rules are violated, these 4 state bits are presented in the |
40 | locking error messages, inside curlies. A contrived example: | |
41 | ||
42 | modprobe/2287 is trying to acquire lock: | |
43 | (&sio_locks[i].lock){--..}, at: [<c02867fd>] mutex_lock+0x21/0x24 | |
44 | ||
45 | but task is already holding lock: | |
46 | (&sio_locks[i].lock){--..}, at: [<c02867fd>] mutex_lock+0x21/0x24 | |
47 | ||
48 | ||
49 | The bit position indicates hardirq, softirq, hardirq-read, | |
50 | softirq-read respectively, and the character displayed in each | |
51 | indicates: | |
52 | ||
5fcce743 | 53 | '.' acquired while irqs disabled |
fd7bcea3 | 54 | '+' acquired in irq context |
5fcce743 AK |
55 | '-' acquired with irqs enabled |
56 | '?' read acquired in irq context with irqs enabled. | |
fd7bcea3 JC |
57 | |
58 | Unused mutexes cannot be part of the cause of an error. | |
59 | ||
60 | ||
f3e97da3 IM |
61 | Single-lock state rules: |
62 | ------------------------ | |
63 | ||
64 | A softirq-unsafe lock-class is automatically hardirq-unsafe as well. The | |
65 | following states are exclusive, and only one of them is allowed to be | |
66 | set for any lock-class: | |
67 | ||
68 | <hardirq-safe> and <hardirq-unsafe> | |
69 | <softirq-safe> and <softirq-unsafe> | |
70 | ||
71 | The validator detects and reports lock usage that violate these | |
72 | single-lock state rules. | |
73 | ||
74 | Multi-lock dependency rules: | |
75 | ---------------------------- | |
76 | ||
77 | The same lock-class must not be acquired twice, because this could lead | |
78 | to lock recursion deadlocks. | |
79 | ||
80 | Furthermore, two locks may not be taken in different order: | |
81 | ||
82 | <L1> -> <L2> | |
83 | <L2> -> <L1> | |
84 | ||
85 | because this could lead to lock inversion deadlocks. (The validator | |
86 | finds such dependencies in arbitrary complexity, i.e. there can be any | |
87 | other locking sequence between the acquire-lock operations, the | |
88 | validator will still track all dependencies between locks.) | |
89 | ||
90 | Furthermore, the following usage based lock dependencies are not allowed | |
91 | between any two lock-classes: | |
92 | ||
93 | <hardirq-safe> -> <hardirq-unsafe> | |
94 | <softirq-safe> -> <softirq-unsafe> | |
95 | ||
96 | The first rule comes from the fact the a hardirq-safe lock could be | |
97 | taken by a hardirq context, interrupting a hardirq-unsafe lock - and | |
98 | thus could result in a lock inversion deadlock. Likewise, a softirq-safe | |
99 | lock could be taken by an softirq context, interrupting a softirq-unsafe | |
100 | lock. | |
101 | ||
102 | The above rules are enforced for any locking sequence that occurs in the | |
103 | kernel: when acquiring a new lock, the validator checks whether there is | |
104 | any rule violation between the new lock and any of the held locks. | |
105 | ||
106 | When a lock-class changes its state, the following aspects of the above | |
107 | dependency rules are enforced: | |
108 | ||
109 | - if a new hardirq-safe lock is discovered, we check whether it | |
110 | took any hardirq-unsafe lock in the past. | |
111 | ||
112 | - if a new softirq-safe lock is discovered, we check whether it took | |
113 | any softirq-unsafe lock in the past. | |
114 | ||
115 | - if a new hardirq-unsafe lock is discovered, we check whether any | |
116 | hardirq-safe lock took it in the past. | |
117 | ||
118 | - if a new softirq-unsafe lock is discovered, we check whether any | |
119 | softirq-safe lock took it in the past. | |
120 | ||
121 | (Again, we do these checks too on the basis that an interrupt context | |
122 | could interrupt _any_ of the irq-unsafe or hardirq-unsafe locks, which | |
123 | could lead to a lock inversion deadlock - even if that lock scenario did | |
124 | not trigger in practice yet.) | |
125 | ||
126 | Exception: Nested data dependencies leading to nested locking | |
127 | ------------------------------------------------------------- | |
128 | ||
129 | There are a few cases where the Linux kernel acquires more than one | |
130 | instance of the same lock-class. Such cases typically happen when there | |
131 | is some sort of hierarchy within objects of the same type. In these | |
132 | cases there is an inherent "natural" ordering between the two objects | |
133 | (defined by the properties of the hierarchy), and the kernel grabs the | |
134 | locks in this fixed order on each of the objects. | |
135 | ||
2fe0ae78 | 136 | An example of such an object hierarchy that results in "nested locking" |
f3e97da3 IM |
137 | is that of a "whole disk" block-dev object and a "partition" block-dev |
138 | object; the partition is "part of" the whole device and as long as one | |
139 | always takes the whole disk lock as a higher lock than the partition | |
140 | lock, the lock ordering is fully correct. The validator does not | |
141 | automatically detect this natural ordering, as the locking rule behind | |
142 | the ordering is not static. | |
143 | ||
144 | In order to teach the validator about this correct usage model, new | |
145 | versions of the various locking primitives were added that allow you to | |
146 | specify a "nesting level". An example call, for the block device mutex, | |
147 | looks like this: | |
148 | ||
149 | enum bdev_bd_mutex_lock_class | |
150 | { | |
151 | BD_MUTEX_NORMAL, | |
152 | BD_MUTEX_WHOLE, | |
153 | BD_MUTEX_PARTITION | |
154 | }; | |
155 | ||
156 | mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION); | |
157 | ||
158 | In this case the locking is done on a bdev object that is known to be a | |
159 | partition. | |
160 | ||
a2ffd275 | 161 | The validator treats a lock that is taken in such a nested fashion as a |
f3e97da3 IM |
162 | separate (sub)class for the purposes of validation. |
163 | ||
164 | Note: When changing code to use the _nested() primitives, be careful and | |
2fe0ae78 | 165 | check really thoroughly that the hierarchy is correctly mapped; otherwise |
f3e97da3 IM |
166 | you can get false positives or false negatives. |
167 | ||
168 | Proof of 100% correctness: | |
169 | -------------------------- | |
170 | ||
171 | The validator achieves perfect, mathematical 'closure' (proof of locking | |
172 | correctness) in the sense that for every simple, standalone single-task | |
992caacf | 173 | locking sequence that occurred at least once during the lifetime of the |
f3e97da3 IM |
174 | kernel, the validator proves it with a 100% certainty that no |
175 | combination and timing of these locking sequences can cause any class of | |
176 | lock related deadlock. [*] | |
177 | ||
178 | I.e. complex multi-CPU and multi-task locking scenarios do not have to | |
179 | occur in practice to prove a deadlock: only the simple 'component' | |
180 | locking chains have to occur at least once (anytime, in any | |
181 | task/context) for the validator to be able to prove correctness. (For | |
182 | example, complex deadlocks that would normally need more than 3 CPUs and | |
183 | a very unlikely constellation of tasks, irq-contexts and timings to | |
184 | occur, can be detected on a plain, lightly loaded single-CPU system as | |
185 | well!) | |
186 | ||
187 | This radically decreases the complexity of locking related QA of the | |
188 | kernel: what has to be done during QA is to trigger as many "simple" | |
189 | single-task locking dependencies in the kernel as possible, at least | |
190 | once, to prove locking correctness - instead of having to trigger every | |
191 | possible combination of locking interaction between CPUs, combined with | |
192 | every possible hardirq and softirq nesting scenario (which is impossible | |
193 | to do in practice). | |
194 | ||
195 | [*] assuming that the validator itself is 100% correct, and no other | |
196 | part of the system corrupts the state of the validator in any way. | |
197 | We also assume that all NMI/SMM paths [which could interrupt | |
198 | even hardirq-disabled codepaths] are correct and do not interfere | |
199 | with the validator. We also assume that the 64-bit 'chain hash' | |
200 | value is unique for every lock-chain in the system. Also, lock | |
201 | recursion must not be higher than 20. | |
202 | ||
203 | Performance: | |
204 | ------------ | |
205 | ||
206 | The above rules require _massive_ amounts of runtime checking. If we did | |
207 | that for every lock taken and for every irqs-enable event, it would | |
208 | render the system practically unusably slow. The complexity of checking | |
209 | is O(N^2), so even with just a few hundred lock-classes we'd have to do | |
210 | tens of thousands of checks for every event. | |
211 | ||
212 | This problem is solved by checking any given 'locking scenario' (unique | |
213 | sequence of locks taken after each other) only once. A simple stack of | |
214 | held locks is maintained, and a lightweight 64-bit hash value is | |
215 | calculated, which hash is unique for every lock chain. The hash value, | |
216 | when the chain is validated for the first time, is then put into a hash | |
217 | table, which hash-table can be checked in a lockfree manner. If the | |
218 | locking chain occurs again later on, the hash table tells us that we | |
219 | dont have to validate the chain again. |