Device Power Management

Device power management encompasses two areas: the ability to save
state and transition a device to a low-power state when the system is
entering a low-power state, and the ability to transition a device to
a low-power state while the system is running (and independently of
any other power management activity).

The methods to suspend and resume devices reside in struct bus_type:

	int (*suspend)(struct device *dev, pm_message_t state);
	int (*resume)(struct device *dev);

Each bus driver is responsible for implementing these methods,
translating the call into a bus-specific request and forwarding the
call to the bus-specific drivers. For example, PCI drivers implement
suspend() and resume() methods in struct pci_driver. The PCI core is
simply responsible for translating the pointers to PCI-specific ones
and calling the low-level driver.

This is done to a) ease the transition to the new power management
methods and leverage the existing PM code in various bus drivers;
b) allow buses to implement generic and default PM routines for
devices; and c) make the flow of execution obvious to the reader.
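
The forwarding pattern described above can be sketched in userland C.
This is only an illustration under stated assumptions, not the PCI
core's code: every demo_* type and function is hypothetical and stands
in for struct device, struct bus_type and a bus-specific driver such
as struct pci_driver.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userland stand-ins for the kernel types. Every demo_* name is
 * hypothetical and exists only for this illustration.
 */
typedef struct { int event; } demo_pm_message_t;

struct demo_device {
	void *driver;		/* driver currently bound to the device */
};

/* Bus-specific device that embeds the generic one. */
struct demo_bus_device {
	int saved_event;	/* what the low-level driver recorded */
	struct demo_device dev;
};

/* Bus-specific driver with its own suspend/resume methods. */
struct demo_bus_driver {
	int (*suspend)(struct demo_bus_device *bdev, demo_pm_message_t state);
	int (*resume)(struct demo_bus_device *bdev);
};

/* Translate a generic device pointer into the bus-specific one. */
static struct demo_bus_device *to_demo_bus_device(struct demo_device *dev)
{
	return (struct demo_bus_device *)
		((char *)dev - offsetof(struct demo_bus_device, dev));
}

/* Bus-level suspend: translate the pointers, then call the driver. */
int demo_bus_suspend(struct demo_device *dev, demo_pm_message_t state)
{
	struct demo_bus_driver *drv;

	assert(dev != NULL);
	drv = dev->driver;
	if (drv && drv->suspend)
		return drv->suspend(to_demo_bus_device(dev), state);
	return 0;		/* nothing bound: nothing to do */
}

/* Bus-level resume: same translation, opposite direction. */
int demo_bus_resume(struct demo_device *dev)
{
	struct demo_bus_driver *drv;

	assert(dev != NULL);
	drv = dev->driver;
	if (drv && drv->resume)
		return drv->resume(to_demo_bus_device(dev));
	return 0;
}

/* Example low-level driver methods. */
int demo_drv_suspend(struct demo_bus_device *bdev, demo_pm_message_t state)
{
	bdev->saved_event = state.event;
	return 0;
}

int demo_drv_resume(struct demo_bus_device *bdev)
{
	bdev->saved_event = 0;
	return 0;
}
```

The bus-level functions never touch device state themselves; they only
recover the bus-specific pointer and hand the call to the driver, which
is the division of labour the text describes.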

System Power Management

When the system enters a low-power state, the device tree is walked in
a depth-first fashion to transition each device into a low-power
state. The ordering of the device tree is guaranteed by the order in
which devices get registered: children are never registered before
their ancestors, and devices are placed at the back of the list when
registered. By walking the list in reverse order, we are guaranteed to
suspend devices in the proper order.
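
The ordering argument can be shown with a small userland sketch. It
assumes a fixed-size array in place of the real device list, and the
demo_* names are hypothetical. Devices are appended on registration
and suspended by walking the array backwards, so a child (registered
after its parent) is always suspended first.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_DEVICES 16

/* Hypothetical stand-in for a registered device. */
struct demo_device {
	const char *name;
	int suspended;
	int suspend_seq;	/* position in the suspend order, 0 = first */
};

static struct demo_device *demo_list[DEMO_MAX_DEVICES];
static size_t demo_count;

/* Registration appends to the back of the list. */
int demo_register(struct demo_device *dev)
{
	if (demo_count >= DEMO_MAX_DEVICES)
		return -1;
	demo_list[demo_count++] = dev;
	return 0;
}

/* Suspend walks the list in reverse: last registered, first suspended. */
int demo_suspend_all(void)
{
	int seq = 0;
	size_t i;

	for (i = demo_count; i > 0; i--) {
		struct demo_device *dev = demo_list[i - 1];

		dev->suspended = 1;
		dev->suspend_seq = seq++;
	}
	return seq;		/* number of devices suspended */
}
```

Registering a bus, then a bridge on it, then a device behind the
bridge yields the suspend order device, bridge, bus.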

Devices are suspended once with interrupts enabled. Drivers are
expected to stop I/O transactions, save device state, and place the
device into a low-power state. Drivers may sleep, allocate memory,
etc.

Some devices are broken and will inevitably have problems powering
down or disabling themselves with interrupts enabled. In these
special cases, the driver may return -EAGAIN. This will put the device
on a list to be taken care of later. When interrupts are disabled,
before we enter the low-power state, the drivers of these devices are
called again to put them to sleep.
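
A minimal sketch of this two-pass scheme, in userland C, follows. It
is an assumption-laden model rather than the PM core's code: the
demo_* names are hypothetical, the "deferred list" is reduced to a
flag on each device, and the interrupt state is passed in as a
parameter instead of being a property of the CPU.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a device and its driver state. */
struct demo_device {
	int needs_irqs_off;	/* models a "broken" device */
	int suspended;
	int deferred;		/* set when the first pass returned -EAGAIN */
};

/* Driver suspend method; irqs_enabled models the current IRQ state. */
int demo_dev_suspend(struct demo_device *dev, int irqs_enabled)
{
	if (dev->needs_irqs_off && irqs_enabled)
		return -EAGAIN;	/* ask to be called again later */
	dev->suspended = 1;
	return 0;
}

int demo_suspend_devices(struct demo_device **devs, size_t n)
{
	size_t i;
	int err;

	/* Pass 1: interrupts enabled; remember devices that defer. */
	for (i = 0; i < n; i++) {
		err = demo_dev_suspend(devs[i], 1);
		if (err == -EAGAIN)
			devs[i]->deferred = 1;
		else if (err)
			return err;
	}

	/* Interrupts would be disabled at this point. */

	/* Pass 2: call the deferred devices again, interrupts disabled. */
	for (i = 0; i < n; i++) {
		if (!devs[i]->deferred)
			continue;
		err = demo_dev_suspend(devs[i], 0);
		if (err)
			return err;
	}
	return 0;
}
```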

On resume, the devices that returned -EAGAIN will be called to power
themselves back on with interrupts disabled. Once interrupts have been
re-enabled, the rest of the drivers will be called to resume their
devices. On resume, a driver is responsible for powering back on each
device, restoring state, and re-enabling I/O transactions for that
device.

System devices follow a slightly different API, which can be found in

	include/linux/sysdev.h

System devices will only be suspended with interrupts disabled, and
after all other devices have been suspended. On resume, they will be
resumed before any other devices, and also with interrupts disabled.

Runtime Power Management

Many devices are able to dynamically power down while the system is
still running. This feature is useful for devices that are not being
used, and can offer significant power savings on a running system.

In each device's directory, there is a 'power' directory, which
contains at least a 'state' file. Reading from this file displays what
power state the device is currently in. Writing to this file initiates
a transition to the specified power state, which must be a decimal
number in the range 1-3, inclusive, or 0 for 'On'.

The PM core will call the ->suspend() method in the bus_type object
that the device belongs to if the specified state is not 0, or
->resume() if it is.

Nothing will happen if the specified state is the same state the
device is currently in.

If the device is already in a low-power state, and the specified state
is another, but different, low-power state, the ->resume() method will
first be called to power the device back on, then ->suspend() will be
called again with the new state.
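
The rules for a write to the 'state' file can be summarised in a short
userland sketch. All demo_* names are hypothetical; the two callbacks
stand in for the bus ->suspend() and ->resume() methods and simply
count their invocations, and returning -EINVAL for an out-of-range
value is an assumption of this sketch rather than documented
behaviour.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a device with a runtime power state. */
struct demo_device {
	int power_state;	/* 0 = On, 1-3 = low-power states */
	int suspend_calls;
	int resume_calls;
};

/* Stand-in for the bus ->suspend() method. */
int demo_suspend(struct demo_device *dev, int state)
{
	dev->suspend_calls++;
	dev->power_state = state;
	return 0;
}

/* Stand-in for the bus ->resume() method. */
int demo_resume(struct demo_device *dev)
{
	dev->resume_calls++;
	dev->power_state = 0;
	return 0;
}

/* Models what happens when a state is written to power/state. */
int demo_set_power_state(struct demo_device *dev, int state)
{
	int err;

	if (state < 0 || state > 3)
		return -EINVAL;	/* assumed error code for bad input */

	/* Same state as now: nothing happens. */
	if (state == dev->power_state)
		return 0;

	/* Leaving a low-power state always goes through resume first. */
	if (dev->power_state != 0) {
		err = demo_resume(dev);
		if (err)
			return err;
	}

	/* A non-zero target then requires a (new) suspend. */
	if (state != 0)
		return demo_suspend(dev, state);
	return 0;
}
```

Moving from state 2 to state 3 therefore costs one resume and one
suspend, while writing the current state again costs nothing.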

The driver is responsible for saving the working state of the device
and putting it into the low-power state specified. If this was
successful, it returns 0, and the device's power_state field is
updated.

The driver must take care to know whether or not it is able to
properly resume the device, including all steps of reinitialization
necessary. (This is the hardest part, and the one most protected by

The driver must also take care not to suspend a device that is
currently in use. It is the driver's responsibility to provide its own
exclusion mechanisms.

The runtime power transition happens with interrupts enabled. If a
device cannot support being powered down with interrupts enabled, it
may return -EAGAIN (as it would during a system power management
transition), but it will _not_ be called again, and the transaction
will fail.

There is currently no way to know a priori what states a device or
driver supports. This will change in the future.

pm_message_t has two fields: event ("major") and flags. If a driver
does not know the event code, it aborts the request, returning an
error. Some drivers may need to deal with special cases based on the
actual type of suspend operation being done at the system level. This
is why there are flags.

ON -- no need to do anything except special cases like broken

# NOTIFICATION -- pretty much same as ON?

FREEZE -- stop DMA and interrupts, and be prepared to reinit HW from
scratch. That probably means stop accepting upstream requests, the
actual policy of what to do with them being specific to a given
driver. It's acceptable for a network driver to just drop packets
while a block driver is expected to block the queue so no request is
lost. (Use IDE as an example of how to do that.) FREEZE requires no
power state change, and drivers are expected to be able to
quickly transition back to the operating state.

SUSPEND -- like FREEZE, but also put hardware into a low-power state.
If there is a need to distinguish several levels of sleep, an
additional flag is probably the best way to do that.

Transitions are only from a resumed state to a suspended state, never
between two suspended states. (ON -> FREEZE or ON -> SUSPEND can
happen, FREEZE -> SUSPEND or SUSPEND -> FREEZE cannot.)
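
A driver's suspend method that dispatches on the event code might look
like the following userland sketch. The demo_* names and the enum
values are hypothetical stand-ins for pm_message_t and its event
codes, and -EINVAL is merely one plausible choice for the "returning
error" case. The sketch looks only at the event field and never
inspects flags.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical event codes mirroring ON, FREEZE and SUSPEND. */
enum demo_event {
	DEMO_EVENT_ON,
	DEMO_EVENT_FREEZE,
	DEMO_EVENT_SUSPEND
};

/* Hypothetical stand-in for pm_message_t. */
typedef struct {
	int event;
	int flags;
} demo_pm_message_t;

struct demo_device {
	int io_stopped;		/* DMA and interrupts quiesced */
	int low_power;		/* hardware placed in a low-power state */
};

int demo_drv_suspend(struct demo_device *dev, demo_pm_message_t msg)
{
	switch (msg.event) {
	case DEMO_EVENT_ON:
		return 0;		/* nothing to do */
	case DEMO_EVENT_FREEZE:
		dev->io_stopped = 1;	/* quiesce, no power state change */
		return 0;
	case DEMO_EVENT_SUSPEND:
		dev->io_stopped = 1;	/* like FREEZE ... */
		dev->low_power = 1;	/* ... plus low-power state */
		return 0;
	default:
		return -EINVAL;		/* unknown event: abort the request */
	}
}
```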

[NOTE NOTE NOTE: If you are a driver author, you should not care; you
should only look at event, and ignore flags.]

#Prepare for suspend -- userland is still running but we are going to
#enter suspend state. This gives drivers a chance to load firmware from
#disk and store it in memory, or do other activities that require
#operating userland, ability to kmalloc GFP_KERNEL, etc. All of these
#are forbidden once the suspend dance is started. event = ON, flags =

APM standby -- prepare for APM event. Quiesce devices to make life
easier for APM BIOS. event = FREEZE, flags = APM_STANDBY

APM suspend -- same as APM_STANDBY, but we should probably avoid
spinning down disks. event = FREEZE, flags = APM_SUSPEND

System halt, reboot -- quiesce devices to make life easier for BIOS.
event = FREEZE, flags = SYSTEM_HALT or SYSTEM_REBOOT

System shutdown -- at least disks need to be spun down, or data may be
lost. Quiesce devices, just to make life easier for BIOS. event =
FREEZE, flags = SYSTEM_SHUTDOWN

Kexec -- turn off DMAs and put hardware into some state where the new
kernel can take over. event = FREEZE, flags = KEXEC

Powerdown at end of swsusp -- very similar to SYSTEM_SHUTDOWN, except
wake may need to be enabled on some devices. This actually has at
least 3 subtypes: the system can reboot, enter S4, or enter S5 at the
end of swsusp. event = FREEZE, flags = SWSUSP and one of
SYSTEM_REBOOT, SYSTEM_SHUTDOWN, SYSTEM_S4

Suspend to RAM -- put devices into low power state. event = SUSPEND,
flags = SUSPEND_TO_RAM

Freeze for swsusp snapshot -- stop DMA and interrupts. No need to put
devices into low power mode, but you must be able to reinitialize the
device from scratch in the resume method. This has two flavors: it's
done once on the suspending kernel, once on the resuming kernel.
event = FREEZE, flags = DURING_SUSPEND or DURING_RESUME

Device detach requested from /sys -- deinitialize device; probably the
same as SYSTEM_SHUTDOWN, I do not understand this one too much.
Probably event = FREEZE, flags = DEV_DETACH.

#These are not really events sent:

#System fully on -- device is working normally; this is probably never
#passed to suspend() method... event = ON, flags = 0

#Ready after resume -- userland is now running, again. Time to free any
#memory you ate during prepare to suspend... event = ON, flags =

Driver Detach Power Management

The kernel now supports the ability to place a device in a low-power
state when it is detached from its driver, which happens when its
module is removed.

Each device contains a 'detach_state' file in its sysfs directory
which can be used to control this state. Reading from this file
displays what the current detach state is set to. This is 0 (On) by
default. A user may write a positive integer value to this file in the
range of 1-4 inclusive.

A value of 1-3 will indicate the device should be placed in that
low-power state, which will cause ->suspend() to be called for that
device. A value of 4 indicates that the device should be shut down, so
->shutdown() will be called for that device.
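
The mapping from detach_state values to driver methods can be sketched
as follows. This is a userland model with hypothetical demo_* names,
not the driver core's implementation. It assumes that 0 may also be
written to restore the default, and that out-of-range values are
rejected with -EINVAL.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a device with a detach_state attribute. */
struct demo_device {
	int detach_state;	/* 0 = On (default), 1-3 = low-power, 4 = shutdown */
	int suspended_state;	/* state passed to the suspend stand-in */
	int shut_down;		/* set by the shutdown stand-in */
};

/* Models a write to the detach_state file. */
int demo_store_detach_state(struct demo_device *dev, int value)
{
	if (value < 0 || value > 4)
		return -EINVAL;	/* assumed error code for bad input */
	dev->detach_state = value;
	return 0;
}

/* Models what happens when the device is detached from its driver. */
int demo_detach(struct demo_device *dev)
{
	if (dev->detach_state == 0)
		return 0;		/* leave the device on */

	if (dev->detach_state == 4) {
		dev->shut_down = 1;	/* stands in for ->shutdown() */
		return 0;
	}

	/* 1-3: stands in for ->suspend() with that low-power state. */
	dev->suspended_state = dev->detach_state;
	return 0;
}
```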

The driver is responsible for reinitializing the device when the
module is re-inserted during its ->probe() (or equivalent) method.
The driver core will not call any extra functions when binding the
device to the driver.