Commit | Line | Data |
---|---|---|
48a8c26c | 1 | Git pack format |
9760662f JH |
2 | =============== |
3 | ||
5316c8e9 | 4 | == pack-*.pack files have the following format: |
9760662f | 5 | |
71362bd5 | 6 | - A header appears at the beginning and consists of the following: |
9760662f | 7 | |
1361fa3e SP |
8 | 4-byte signature: |
9 | The signature is: {'P', 'A', 'C', 'K'} | |
10 | ||
11 | 4-byte version number (network byte order): | |
48a8c26c | 12 | Git currently accepts version number 2 or 3 but |
1361fa3e SP |
13 | generates version 2 only. |
14 | ||
9760662f JH |
15 | 4-byte number of objects contained in the pack (network byte order) |
16 | ||
17 | Observation: we cannot have more than 4G versions ;-) and | |
18 | more than 4G objects in a pack. | |
19 | ||
20 | - The header is followed by that number of object entries, each of | |
21 | which looks like this: | |
22 | ||
23 | (undeltified representation) | |
979ea585 | 24 | n-byte type and length (3-bit type, (n-1)*7+4-bit length) |
9760662f JH |
25 | compressed data |
26 | ||
27 | (deltified representation) | |
979ea585 | 28 | n-byte type and length (3-bit type, (n-1)*7+4-bit length) |
06cb843f SS |
29 | 20-byte base object name if OBJ_REF_DELTA or a negative relative |
30 | offset from the delta object's position in the pack if this | |
31 | is an OBJ_OFS_DELTA object | |
9760662f JH |
32 | compressed delta data |
33 | ||
34 | Observation: length of each object is encoded in a variable | |
35 | length format and is not constrained to 32-bit or anything. | |
36 | ||
d5fa1f1a | 37 | - The trailer records a 20-byte SHA-1 checksum of all of the above. |
9760662f | 38 | |
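The header and trailer layout described above can be sketched in Python as follows. This is a minimal illustration, not Git's own code: the function names `parse_pack_header` and `pack_trailer_ok` are hypothetical, and the sample pack built at the end contains a header and trailer only (zero objects).

```python
import hashlib
import struct


def parse_pack_header(data: bytes):
    """Return (version, object_count) read from the first 12 bytes of a pack."""
    if len(data) < 12:
        raise ValueError("a pack header is 12 bytes long")
    # 4-byte signature followed by two 4-byte network byte order integers.
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("bad signature: %r" % signature)
    if version not in (2, 3):
        raise ValueError("unsupported pack version: %d" % version)
    return version, count


def pack_trailer_ok(data: bytes) -> bool:
    """Check that the last 20 bytes are the SHA-1 of everything before them."""
    if len(data) < 12 + 20:
        return False
    return hashlib.sha1(data[:-20]).digest() == data[-20:]


# Hypothetical sample: a version 2 pack that declares zero objects.
sample_header = b"PACK" + struct.pack(">II", 2, 0)
sample_pack = sample_header + hashlib.sha1(sample_header).digest()
print(parse_pack_header(sample_pack))
print(pack_trailer_ok(sample_pack))
```

Both counts are unsigned 32-bit fields, which is where the 4G limits in the observation above come from.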
5316c8e9 | 39 | == Original (version 1) pack-*.idx files have the following format: |
9760662f JH |
40 | |
41 | - The header consists of 256 4-byte network byte order | |
42 | integers. The N-th entry of this table records the number of | |
43 | objects in the corresponding pack, the first byte of whose | |
71362bd5 | 44 | object name is less than or equal to N. This is called the |
9760662f JH |
45 | 'first-level fan-out' table. |
46 | ||
1361fa3e | 47 | - The header is followed by sorted 24-byte entries, one entry |
9760662f JH |
48 | per object in the pack. Each entry is: |
49 | ||
50 | 4-byte network byte order integer, recording where the | |
51 | object is stored in the packfile as the offset from the | |
52 | beginning. | |
53 | ||
54 | 20-byte object name. | |
55 | ||
9760662f JH |
56 | - The file is concluded with a trailer: |
57 | ||
d5fa1f1a | 58 | A copy of the 20-byte SHA-1 checksum at the end of |
9760662f JH |
59 | the corresponding packfile. |
60 | ||
d5fa1f1a | 61 | 20-byte SHA-1 checksum of all of the above. |
9760662f JH |
62 | |
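A lookup in a version 1 index, as described above, might look like the sketch below. It assumes the whole index is held in memory as `bytes`. The names `find_offset_v1` and `build_v1_idx_body` are hypothetical; the builder exists only to produce test data and omits the trailer.

```python
import struct

FANOUT_SIZE = 256 * 4   # 256 four-byte integers
ENTRY_SIZE = 4 + 20     # 4-byte offset followed by 20-byte object name


def find_offset_v1(idx: bytes, name: bytes):
    """Return the pack offset recorded for a 20-byte object name, or None."""
    fanout = struct.unpack(">256I", idx[:FANOUT_SIZE])
    first = name[0]
    # fanout[N] counts objects whose first byte is <= N, so entries whose
    # first byte equals `first` occupy positions [fanout[first-1], fanout[first]).
    lo = fanout[first - 1] if first > 0 else 0
    hi = fanout[first]
    while lo < hi:
        mid = (lo + hi) // 2
        start = FANOUT_SIZE + mid * ENTRY_SIZE
        (offset,) = struct.unpack(">I", idx[start:start + 4])
        entry_name = idx[start + 4:start + ENTRY_SIZE]
        if entry_name == name:
            return offset
        if entry_name < name:
            lo = mid + 1
        else:
            hi = mid
    return None


def build_v1_idx_body(objects: dict) -> bytes:
    """Hypothetical helper: fan-out table plus sorted entries, no trailer."""
    names = sorted(objects)
    fanout = [sum(1 for n in names if n[0] <= i) for i in range(256)]
    body = struct.pack(">256I", *fanout)
    for n in names:
        body += struct.pack(">I", objects[n]) + n
    return body


# Made-up object names and offsets, for illustration only.
objects = {bytes(20): 12, b"\x01" + bytes(19): 300, b"\xff" * 20: 4000}
idx = build_v1_idx_body(objects)
print(find_offset_v1(idx, b"\x01" + bytes(19)))
```

The fan-out table narrows the binary search to the entries that share the name's first byte, so only a small slice of the main index table is examined.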
63 | Pack Idx file: | |
64 | ||
71362bd5 | 65 | -- +--------------------------------+ |
66 | fanout | fanout[0] = 2 (for example) |-. | |
67 | table +--------------------------------+ | | |
9760662f JH |
68 | | fanout[1] | | |
69 | +--------------------------------+ | | |
70 | | fanout[2] | | | |
71 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | | |
71362bd5 | 72 | | fanout[255] = total objects |---. |
73 | -- +--------------------------------+ | | | |
74 | main | offset | | | | |
75 | index | object name 00XXXXXXXXXXXXXXXX | | | | |
76 | table +--------------------------------+ | | | |
77 | | offset | | | | |
78 | | object name 00XXXXXXXXXXXXXXXX | | | | |
79 | +--------------------------------+<+ | | |
80 | .-| offset | | | |
81 | | | object name 01XXXXXXXXXXXXXXXX | | | |
82 | | +--------------------------------+ | | |
83 | | | offset | | | |
84 | | | object name 01XXXXXXXXXXXXXXXX | | | |
85 | | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | | |
86 | | | offset | | | |
87 | | | object name FFXXXXXXXXXXXXXXXX | | | |
88 | --| +--------------------------------+<--+ | |
9760662f JH |
89 | trailer | | packfile checksum | |
90 | | +--------------------------------+ | |
91 | | | idxfile checksum | | |
92 | | +--------------------------------+ | |
a6080a0a | 93 | .-------. |
9760662f JH |
94 | | |
95 | Pack file entry: <+ | |
96 | ||
97 | packed object header: | |
979ea585 PE |
98 | 1-byte size extension bit (MSB) |
99 | type (next 3 bits) | |
a6080a0a | 100 | size0 (lower 4 bits) |
9760662f JH |
101 | n-byte sizeN (as long as MSB is set, 7 bits each) |
102 | size0..sizeN form 4+7+7+..+7 bit integer, size0 | |
979ea585 PE |
103 | is the least significant part, and sizeN is the |
104 | most significant part. | |
9760662f JH |
105 | packed object data: |
106 | If it is not DELTA, then deflated bytes (the size above | |
107 | is the size before compression). | |
9de328fe | 108 | If it is REF_DELTA, then |
d5fa1f1a | 109 | 20-byte base object name SHA-1 (the size above is the |
a6080a0a | 110 | size of the delta data that follows). |
9760662f | 111 | delta data, deflated. |
9de328fe PE |
112 | If it is OFS_DELTA, then |
113 | n-byte offset (see below) interpreted as a negative | |
114 | offset from the type-byte of the header of the | |
115 | ofs-delta entry (the size above is the size of | |
116 | the delta data that follows). | |
117 | delta data, deflated. | |
118 | ||
119 | offset encoding: | |
120 | n bytes with MSB set in all but the last one. | |
121 | The offset is then the number constructed by | |
122 | concatenating the lower 7 bits of each byte, and | |
123 | for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1)) | |
124 | to the result. | |
125 | ||
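The packed object header described above (extension bit, 3-bit type, then size0..sizeN with the least significant part first) could be decoded along these lines. The function name `decode_object_header` is an assumption for illustration. The numeric type codes are not listed in this text, so the example uses a bare type value of 3.

```python
def decode_object_header(data: bytes, pos: int = 0):
    """Decode the type-and-size header of one pack entry.

    Returns (type, size, next_pos), where next_pos is the index of the
    first byte after the header.
    """
    byte = data[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07   # the 3 bits after the MSB
    size = byte & 0x0F              # size0: the lower 4 bits
    shift = 4
    while byte & 0x80:              # MSB set: another size byte follows
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift   # later bytes are more significant
        shift += 7
    return obj_type, size, pos


# 0xBC = 1 011 1100: extension bit set, type 3, size0 = 12.
# 0x12 = 0 0010010: last byte, contributing 18 << 4 = 288.
print(decode_object_header(b"\xbc\x12"))   # (3, 300, 2)
```

Python integers are unbounded, which matches the earlier observation that the length is not constrained to 32 bits.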
71362bd5 | 126 | |
127 | ||
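The OFS_DELTA offset encoding above can be sketched by following the stated rule directly: concatenate the lower 7 bits of each byte, then for n >= 2 add 2^7 + 2^14 + ... + 2^(7*(n-1)). The function name `decode_ofs_delta_offset` is hypothetical.

```python
def decode_ofs_delta_offset(data: bytes, pos: int = 0):
    """Decode an OFS_DELTA offset; returns (offset, next_pos).

    The result is the distance to subtract from the position of the
    delta entry's type byte to reach the base object.
    """
    value = 0
    n = 0
    while True:
        byte = data[pos]
        pos += 1
        n += 1
        value = (value << 7) | (byte & 0x7F)   # concatenate 7-bit groups
        if not byte & 0x80:                    # MSB clear marks the last byte
            break
    # For n >= 2, add 2^7 + 2^14 + ... + 2^(7*(n-1)).
    for k in range(1, n):
        value += 1 << (7 * k)
    return value, pos


print(decode_ofs_delta_offset(b"\x7f"))          # (127, 1)
print(decode_ofs_delta_offset(b"\x80\x00"))      # (128, 2)
print(decode_ofs_delta_offset(b"\x80\x80\x00"))  # (16512, 3)
```

The added constant means the smallest two-byte value is 128, one more than the largest one-byte value, so every offset has exactly one encoding.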
5316c8e9 TA |
128 | == Version 2 pack-*.idx files support packs larger than 4 GiB, and |
129 | have some other reorganizations. They have the format: | |
71362bd5 | 130 | |
131 | - A 4-byte magic number '\377tOc' which is an unreasonable | |
132 | fanout[0] value. | |
133 | ||
134 | - A 4-byte version number (= 2) | |
135 | ||
136 | - A 256-entry fan-out table just like v1. | |
137 | ||
d5fa1f1a | 138 | - A table of sorted 20-byte SHA-1 object names. These are |
71362bd5 | 139 | packed together without offset values to reduce the cache |
140 | footprint of the binary search for a specific object name. | |
141 | ||
142 | - A table of 4-byte CRC32 values of the packed object data. | |
143 | This is new in v2 so compressed data can be copied directly | |
f1cdcc70 | 144 | from pack to pack during repacking without undetected |
71362bd5 | 145 | data corruption. |
146 | ||
147 | - A table of 4-byte offset values (in network byte order). | |
148 | These are usually 31-bit pack file offsets, but large | |
149 | offsets are encoded as an index into the next table with | |
150 | the msbit set. | |
151 | ||
152 | - A table of 8-byte offset entries (empty for pack files smaller | |
153 | than 2 GiB). Pack files are organized with heavily used | |
154 | objects toward the front, so most object references should | |
155 | not need to refer to this table. | |
156 | ||
157 | - The same trailer as a v1 idx file: | |
158 | ||
d5fa1f1a | 159 | A copy of the 20-byte SHA-1 checksum at the end of |
71362bd5 | 160 | the corresponding packfile. |
161 | ||
d5fa1f1a | 162 | 20-byte SHA-1 checksum of all of the above. |
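To illustrate how the version 2 tables fit together, the sketch below walks an in-memory index and resolves each object's offset, following the msbit rule for large offsets. It is a minimal reading of the layout above, not Git's implementation. The name `read_v2_offsets` is hypothetical, the 8-byte entries are assumed to be in network byte order like the other integers, and the CRC32 table and trailer are skipped rather than verified.

```python
import struct


def read_v2_offsets(idx: bytes) -> dict:
    """Map each 20-byte object name in a v2 index to its pack offset."""
    magic, version = struct.unpack(">4sI", idx[:8])
    if magic != b"\xfftOc" or version != 2:     # '\377tOc', version 2
        raise ValueError("not a version 2 pack index")
    fanout = struct.unpack(">256I", idx[8:8 + 1024])
    count = fanout[255]                          # total number of objects

    names_at = 8 + 1024                          # sorted 20-byte names
    crc_at = names_at + 20 * count               # 4-byte CRC32 values
    small_at = crc_at + 4 * count                # 4-byte offsets
    large_at = small_at + 4 * count              # 8-byte offsets

    offsets = {}
    for i in range(count):
        name = idx[names_at + 20 * i:names_at + 20 * (i + 1)]
        (raw,) = struct.unpack(">I", idx[small_at + 4 * i:small_at + 4 * i + 4])
        if raw & 0x80000000:
            # msbit set: the low 31 bits index the 8-byte offset table.
            j = raw & 0x7FFFFFFF
            (offset,) = struct.unpack(">Q", idx[large_at + 8 * j:large_at + 8 * j + 8])
        else:
            offset = raw                         # plain 31-bit offset
        offsets[name] = offset
    return offsets
```

Keeping names, CRCs and offsets in separate tables means position i in each table refers to the same object, so a name found by binary search yields its CRC32 and offset by index alone.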