dulwich.pack module¶
Classes for dealing with packed git objects.
A pack is a compact representation of a bunch of objects, stored using deltas where possible.
A pack has two parts: the pack file, which stores the data, and an index that tells you where the data is.
To find an object, look through the index files until you find a match for the object name, then use the offset obtained from the index to read the object from the corresponding pack file.
-
class
dulwich.pack.
DeltaChainIterator
(file_obj, resolve_ext_ref=None)¶ Bases:
object
Abstract iterator over pack data based on delta chains.
Each object in the pack is guaranteed to be inflated exactly once, regardless of how many objects reference it as a delta base. As a result, memory usage is proportional to the length of the longest delta chain.
Subclasses can override _result to define the result type of the iterator. By default, results are UnpackedObjects with the following members set:
- offset
- obj_type_num
- obj_chunks
- pack_type_num
- delta_base (for delta types)
- comp_chunks (if _include_comp is True)
- decomp_chunks
- decomp_len
- crc32 (if _compute_crc32 is True)
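The "inflated exactly once" guarantee can be illustrated with a simplified sketch. It is not dulwich's implementation: the names walk_delta_chains, entries and apply_fn are hypothetical, and the toy delta function only concatenates bytes. Each base is resolved once and its result is handed to every object that uses it as a delta base.

```python
from collections import defaultdict


def walk_delta_chains(entries, apply_fn):
    """Yield (offset, resolved) pairs, resolving each entry exactly once.

    entries maps offset -> (base_offset or None, payload); this layout is
    an assumption made for the sketch, not a dulwich data structure.
    """
    children = defaultdict(list)
    roots = []
    for offset, (base, _payload) in sorted(entries.items()):
        if base is None:
            roots.append(offset)
        else:
            children[base].append(offset)

    # Depth-first walk: a resolved base is reused for all of its children.
    stack = [(offset, entries[offset][1]) for offset in reversed(roots)]
    while stack:
        offset, resolved = stack.pop()
        yield offset, resolved
        for child in reversed(children[offset]):
            stack.append((child, apply_fn(resolved, entries[child][1])))


calls = []


def concat(base, delta):
    """Toy stand-in for delta application: append the delta to the base."""
    calls.append(delta)
    return base + delta


entries = {
    0: (None, b"a"),   # full object
    10: (0, b"b"),     # delta against offset 0
    20: (10, b"c"),    # delta against offset 10
    30: (0, b"d"),     # second delta against offset 0
}
resolved = dict(walk_delta_chains(entries, concat))
```

In this toy run the base at offset 0 is used by two deltas but is only materialised once.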
-
ext_refs
()¶
-
classmethod
for_pack_data
(pack_data, resolve_ext_ref=None)¶
-
record
(unpacked)¶
-
set_pack_data
(pack_data)¶
-
class
dulwich.pack.
FilePackIndex
(filename, file=None, contents=None, size=None)¶ Bases:
dulwich.pack.PackIndex
Pack index that is based on a file.
To look up an object, it opens the file and indexes the first 256 four-byte groups with the first byte of the SHA. The value in the indexed four-byte group is the end of the group of entries that share that first byte. Subtract one from the first byte and index again to find the start of the group. Entries are sorted by SHA within each group, so compute the start and end offsets and then bisect between them to find whether the SHA is present.
Create a pack index object.
Provide it with the name of the index file to consider, and it will map it whenever required.
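The fan-out lookup described above can be sketched in plain Python. This is an in-memory illustration under stated assumptions, not the on-disk index format: build_fanout and find_sha are hypothetical helpers, and the SHAs are held in a sorted list rather than in a mapped file.

```python
import bisect
import hashlib


def build_fanout(sorted_shas):
    """Return a 256-entry table; entry b is the end of the group of SHAs
    whose first byte is b (i.e. the count of SHAs with first byte <= b)."""
    fanout = [0] * 256
    for sha in sorted_shas:
        fanout[sha[0]] += 1
    for b in range(1, 256):
        fanout[b] += fanout[b - 1]
    return fanout


def find_sha(sorted_shas, fanout, sha):
    """Return the position of sha in sorted_shas, or None if it is absent."""
    first = sha[0]
    # The previous table entry is the start of this group.
    start = fanout[first - 1] if first > 0 else 0
    end = fanout[first]
    i = bisect.bisect_left(sorted_shas, sha, start, end)
    if i < end and sorted_shas[i] == sha:
        return i
    return None


shas = sorted(hashlib.sha1(bytes([n])).digest() for n in range(50))
fanout = build_fanout(shas)
```

The table narrows the search to one group before bisecting, so only entries sharing the first byte are compared.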
-
calculate_checksum
()¶ Calculate the SHA1 checksum over this pack index.
Returns: This is a 20-byte binary digest
-
check
()¶ Check that the stored checksum matches the actual checksum.
-
close
()¶
-
get_pack_checksum
()¶ Return the SHA1 checksum stored for the corresponding packfile.
Returns: 20-byte binary digest
-
get_stored_checksum
()¶ Return the SHA1 checksum stored for this index.
Returns: 20-byte binary digest
-
iterentries
()¶ Iterate over the entries in this pack index.
Returns: iterator over tuples with object name, offset in packfile and crc32 checksum.
-
path
¶
-
-
class
dulwich.pack.
MemoryPackIndex
(entries, pack_checksum=None)¶ Bases:
dulwich.pack.PackIndex
Pack index that is stored entirely in memory.
Create a new MemoryPackIndex.
Parameters: - entries – Sequence of name, idx, crc32 (sorted)
- pack_checksum – Optional pack checksum
-
get_pack_checksum
()¶ Return the SHA1 checksum stored for the corresponding packfile.
Returns: 20-byte binary digest
-
iterentries
()¶ Iterate over the entries in this pack index.
Returns: iterator over tuples with object name, offset in packfile and crc32 checksum.
-
object_sha1
(index)¶ Return the SHA1 corresponding to the index in the pack file.
-
class
dulwich.pack.
Pack
(basename, resolve_ext_ref=None)¶ Bases:
object
A Git pack object.
-
check
()¶ Check the integrity of this pack.
Raises: ChecksumMismatch – if a checksum for the index or data is wrong
-
check_length_and_checksum
()¶ Sanity check the length and checksum of the pack index and data.
-
close
()¶
-
data
¶ The pack data object being used.
-
classmethod
from_lazy_objects
(data_fn, idx_fn)¶ Create a new pack object from callables to load pack data and index objects.
-
classmethod
from_objects
(data, idx)¶ Create a new pack object from pack data and index objects.
-
get_raw
(sha1)¶
-
get_raw_unresolved
(sha1)¶ Get raw unresolved data for a SHA.
Parameters: sha1 – SHA to return data for
Returns: Tuple with pack object type, delta base (if applicable), list of data chunks
-
get_stored_checksum
()¶
-
index
¶ The index being used.
Note: This may be an in-memory index
-
iterobjects
()¶ Iterate over the objects in this pack.
-
keep
(msg=None)¶ Add a .keep file for the pack, preventing git from garbage collecting it.
Parameters: msg – A message written inside the .keep file; can be used later to determine whether or not a .keep file is obsolete.
Returns: The path of the .keep file, as a string.
-
name
()¶ The SHA over the SHAs of the objects in this pack.
-
pack_tuples
()¶ Provide an iterable for use with write_pack_objects.
Returns: Object that can iterate over (object, path) tuples and provides __len__
-
-
class
dulwich.pack.
PackData
(filename, file=None, size=None)¶ Bases:
object
The data contained in a packfile.
Pack files can be accessed both sequentially for exploding a pack, and directly with the help of an index to retrieve a specific object.
The objects within are either complete or a delta against another.
The header is variable length. If the MSB of a byte is set, the subsequent byte is still part of the header. In the first byte, the next three MS bits are the type, which tells you the type of object and whether it is a delta. The four LS bits of the first byte are the lowest bits of the size. In each subsequent byte the LS 7 bits are the next MS bits of the size, i.e. the last byte of the header contains the MS bits of the size.
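The variable-length header can be sketched as a pair of helpers. This assumes the git pack layout in which the first byte carries a continuation bit, three type bits and the four lowest size bits; encode_object_header and decode_object_header are hypothetical names, not dulwich functions, and delta base information is left out.

```python
def encode_object_header(type_num, size):
    """Encode the type and uncompressed size of a non-delta object."""
    byte = (type_num << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # MSB set: another header byte follows
        byte = size & 0x7F
        size >>= 7
    out.append(byte)             # last byte: MSB clear
    return bytes(out)


def decode_object_header(data):
    """Return (type_num, size, header_length) for a header at data[0:]."""
    byte = data[0]
    type_num = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift = 4
    pos = 1
    while byte & 0x80:
        byte = data[pos]
        size |= (byte & 0x7F) << shift
        shift += 7
        pos += 1
    return type_num, size, pos
```

For example, a type 1 object of size 300 encodes to two bytes, because 300 does not fit in the four size bits of the first byte.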
For the complete objects the data is stored as zlib deflated data. The size in the header is the uncompressed object size, so to uncompress you need to just keep feeding data to zlib until you get an object back, or it errors on bad data. This is done here by just giving the complete buffer from the start of the deflated object on. This is bad, but until I get mmap sorted out it will have to do.
Currently no integrity checks are done. No attempt is made to detect the delta case, or a request for an object at the wrong position; either will simply raise a zlib error or KeyError.
Create a PackData object representing the pack in the given filename.
The file must exist and stay readable until the object is disposed of. It must also stay the same size. It will be mapped whenever needed.
Currently there is a restriction on the size of the pack as the python mmap implementation is flawed.
-
calculate_checksum
()¶ Calculate the checksum for this pack.
Returns: 20-byte binary SHA1 digest
-
check
()¶ Check the consistency of this pack.
-
close
()¶
-
create_index
(filename, progress=None, version=2)¶ Create an index file for this data file.
Parameters: - filename – Index filename.
- progress – Progress report function
Returns: Checksum of index file
-
create_index_v1
(filename, progress=None)¶ Create a version 1 index file for this data file.
Parameters: - filename – Index filename.
- progress – Progress report function
Returns: Checksum of index file
-
create_index_v2
(filename, progress=None)¶ Create a version 2 index file for this data file.
Parameters: - filename – Index filename.
- progress – Progress report function
Returns: Checksum of index file
-
filename
¶
-
classmethod
from_file
(file, size)¶
-
classmethod
from_path
(path)¶
-
get_compressed_data_at
(offset)¶ Given an offset in the packfile, return the compressed data that is there.
Using the associated index the location of an object can be looked up, and then the packfile can be asked directly for that object using this function.
-
get_object_at
(offset)¶ Given an offset in to the packfile return the object that is there.
Using the associated index the location of an object can be looked up, and then the packfile can be asked directly for that object using this function.
-
get_ref
(sha)¶ Get the object for a ref SHA, only looking in this pack.
-
get_stored_checksum
()¶ Return the expected checksum stored in this pack.
-
iterentries
(progress=None)¶ Yield entries summarizing the contents of this pack.
Parameters: progress – Progress function, called with current and total object count.
Returns: iterator of tuples with (sha, offset, crc32)
-
iterobjects
(progress=None, compute_crc32=True)¶
-
path
¶
-
resolve_object
(offset, type, obj, get_ref=None)¶ Resolve an object, possibly resolving deltas when necessary.
Returns: Tuple with object type and contents.
-
sorted_entries
(progress=None)¶ Return entries in this pack, sorted by SHA.
Parameters: progress – Progress function, called with current and total object count
Returns: List of tuples with (sha, offset, crc32)
-
-
exception
dulwich.pack.
PackFileDisappeared
(obj)¶ Bases:
exceptions.Exception
-
class
dulwich.pack.
PackIndex
¶ Bases:
object
An index in to a packfile.
Given a sha id of an object a pack index can tell you the location in the packfile of that object if it has it.
-
get_pack_checksum
()¶ Return the SHA1 checksum stored for the corresponding packfile.
Returns: 20-byte binary digest
-
iterentries
()¶ Iterate over the entries in this pack index.
Returns: iterator over tuples with object name, offset in packfile and crc32 checksum.
-
object_index
(sha)¶ Return the index in to the corresponding packfile for the object.
Given the name of an object it will return the offset that object lives at within the corresponding pack file. If the pack file doesn’t have the object then None will be returned.
-
object_sha1
(index)¶ Return the SHA1 corresponding to the index in the pack file.
-
objects_sha1
()¶ Return the hex SHA1 over all the shas of all objects in this pack.
Note: This is used for the filename of the pack.
-
-
class
dulwich.pack.
PackIndex1
(filename, file=None, contents=None, size=None)¶ Bases:
dulwich.pack.FilePackIndex
Version 1 Pack Index file.
-
class
dulwich.pack.
PackIndex2
(filename, file=None, contents=None, size=None)¶ Bases:
dulwich.pack.FilePackIndex
Version 2 Pack Index file.
-
class
dulwich.pack.
PackIndexer
(file_obj, resolve_ext_ref=None)¶ Bases:
dulwich.pack.DeltaChainIterator
Delta chain iterator that yields index entries.
-
class
dulwich.pack.
PackInflater
(file_obj, resolve_ext_ref=None)¶ Bases:
dulwich.pack.DeltaChainIterator
Delta chain iterator that yields ShaFile objects.
-
class
dulwich.pack.
PackStreamCopier
(read_all, read_some, outfile, delta_iter=None)¶ Bases:
dulwich.pack.PackStreamReader
Class to verify a pack stream as it is being read.
The pack is read from a ReceivableProtocol using read() or recv() as appropriate and written out to the given file-like object.
Initialize the copier.
Parameters: - read_all – Read function that blocks until the number of requested bytes are read.
- read_some – Read function that returns at least one byte, but may not return the number of bytes requested.
- outfile – File-like object to write output through.
- delta_iter – Optional DeltaChainIterator to record deltas as we read them.
-
verify
()¶ Verify a pack stream and write it to the output file.
See PackStreamReader.iterobjects for a list of exceptions this may throw.
-
class
dulwich.pack.
PackStreamReader
(read_all, read_some=None, zlib_bufsize=4096)¶ Bases:
object
Class to read a pack stream.
The pack is read from a ReceivableProtocol using read() or recv() as appropriate.
-
offset
¶
-
read
(size)¶ Read, blocking until size bytes are read.
-
read_objects
(compute_crc32=False)¶ Read the objects in this pack file.
Parameters: compute_crc32 – If True, compute the CRC32 of the compressed data. If False, the returned CRC32 will be None.
Returns: Iterator over UnpackedObjects with the following members set:
- offset
- obj_type_num
- obj_chunks (for non-delta types)
- delta_base (for delta types)
- decomp_chunks
- decomp_len
- crc32 (if compute_crc32 is True)
Raises: - ChecksumMismatch – if the checksum of the pack contents does not match the checksum in the pack trailer.
- zlib.error – if an error occurred during zlib decompression.
- IOError – if an error occurred writing to the output file.
-
recv
(size)¶ Read up to size bytes, blocking until one byte is read.
-
-
class
dulwich.pack.
SHA1Reader
(f)¶ Bases:
object
Wrapper for file-like object that remembers the SHA1 of its data.
-
check_sha
()¶
-
close
()¶
-
read
(num=None)¶
-
tell
()¶
-
-
class
dulwich.pack.
SHA1Writer
(f)¶ Bases:
object
Wrapper for file-like object that remembers the SHA1 of its data.
-
close
()¶
-
offset
()¶
-
tell
()¶
-
write
(data)¶
-
write_sha
()¶
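The idea behind such a wrapper can be sketched with the standard library. HashingWriter is a hypothetical class, not dulwich's SHA1Writer, and the assumption that write_sha appends the digest to the file and returns it is made for this sketch only.

```python
import hashlib
import io


class HashingWriter:
    """File-like wrapper that hashes everything written through it."""

    def __init__(self, f):
        self.f = f
        self.sha1 = hashlib.sha1()
        self.length = 0

    def write(self, data):
        # Update the running hash before passing the data through.
        self.sha1.update(data)
        self.f.write(data)
        self.length += len(data)

    def write_sha(self):
        # Append the digest of everything written so far (assumed behaviour).
        digest = self.sha1.digest()
        self.f.write(digest)
        self.length += len(digest)
        return digest


buf = io.BytesIO()
writer = HashingWriter(buf)
writer.write(b"PACK")
trailer = writer.write_sha()
```

A trailer written this way lets a reader verify the file by hashing everything before the last 20 bytes.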
-
-
class
dulwich.pack.
UnpackedObject
(pack_type_num, delta_base, decomp_len, crc32)¶ Bases:
object
Class encapsulating an object unpacked from a pack file.
These objects should only be created from within unpack_object. Most members start out as empty and are filled in at various points by read_zlib_chunks, unpack_object, DeltaChainIterator, etc.
End users of this object should take care that the function they’re getting this object from is guaranteed to set the members they need.
-
comp_chunks
¶
-
crc32
¶
-
decomp_chunks
¶
-
decomp_len
¶
-
delta_base
¶
-
obj_chunks
¶
-
obj_type_num
¶
-
offset
¶
-
pack_type_num
¶
-
sha
()¶ Return the binary SHA of this object.
-
sha_file
()¶ Return a ShaFile from this object.
-
-
dulwich.pack.
apply_delta
(src_buf, delta)¶ Based on the similar function in git’s patch-delta.c.
Parameters: - src_buf – Source buffer
- delta – Delta instructions
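As a rough illustration of what applying a delta involves, the sketch below follows the git delta encoding: two base-128 sizes, then copy and insert instructions. read_varint and apply_delta_sketch are hypothetical names; this is not dulwich's code and it works on bytes rather than chunk lists.

```python
def read_varint(delta, pos):
    """Read a little-endian base-128 number; return (value, new_pos)."""
    value = 0
    shift = 0
    while True:
        byte = delta[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta_sketch(src, delta):
    """Rebuild the target bytes from src and a git-style delta."""
    src_size, pos = read_varint(delta, 0)
    dest_size, pos = read_varint(delta, pos)
    if src_size != len(src):
        raise ValueError("source size mismatch")
    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:
            # Copy instruction: low 4 bits select offset bytes,
            # next 3 bits select size bytes.
            offset = 0
            size = 0
            for i in range(4):
                if cmd & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if cmd & (1 << (4 + i)):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000
            out += src[offset:offset + size]
        elif cmd:
            # Insert instruction: the next cmd bytes are literal data.
            out += delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("invalid delta opcode 0")
    if len(out) != dest_size:
        raise ValueError("target size mismatch")
    return bytes(out)


src = b"hello world"
# Copy "hello ", insert "there ", copy "world".
delta = bytes([11, 17, 0x90, 6, 6]) + b"there " + bytes([0x91, 6, 5])
target = apply_delta_sketch(src, delta)
```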
-
dulwich.pack.
bisect_find_sha
(start, end, sha, unpack_name)¶ Find a SHA in a data blob with sorted SHAs.
Parameters: - start – Start index of range to search
- end – End index of range to search
- sha – Sha to find
- unpack_name – Callback to retrieve SHA by index
Returns: Index of the SHA, or None if it wasn’t found
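A minimal sketch of such a search, assuming end is an inclusive index and unpack_name returns the SHA stored at a given position. bisect_find is a hypothetical stand-in, not the library function.

```python
def bisect_find(start, end, sha, unpack_name):
    """Binary search for sha over positions start..end (inclusive)."""
    while start <= end:
        mid = (start + end) // 2
        name = unpack_name(mid)
        if name < sha:
            start = mid + 1
        elif name > sha:
            end = mid - 1
        else:
            return mid
    return None


names = [b"\x01" * 20, b"\x05" * 20, b"\x09" * 20, b"\xf0" * 20]
position = bisect_find(0, len(names) - 1, b"\x09" * 20, names.__getitem__)
```

Passing a callback instead of a list means the names can be read lazily, for instance from a mapped index file.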
-
dulwich.pack.
chunks_length
(chunks)¶
-
dulwich.pack.
compute_file_sha
(f, start_ofs=0, end_ofs=0, buffer_size=65536)¶ Hash a portion of a file into a new SHA.
Parameters: - f – A file-like object to read from that supports seek().
- start_ofs – The offset in the file to start reading at.
- end_ofs – The offset in the file to end reading at, relative to the end of the file.
- buffer_size – A buffer size for reading.
Returns: A new SHA object updated with data read from the file.
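The range hashing can be sketched with hashlib. hash_file_range is a hypothetical helper, and the sketch assumes end_ofs is zero or negative, so that -20 means "stop 20 bytes before the end of the file".

```python
import hashlib
import io


def hash_file_range(f, start_ofs=0, end_ofs=0, buffer_size=1 << 16):
    """Return a SHA1 object over f[start_ofs : len(f) + end_ofs]."""
    sha = hashlib.sha1()
    f.seek(0, io.SEEK_END)
    length = f.tell()
    todo = length + end_ofs - start_ofs
    f.seek(start_ofs)
    while todo > 0:
        data = f.read(min(todo, buffer_size))
        if not data:
            break
        sha.update(data)
        todo -= len(data)
    return sha


body = b"some pack contents"
stream = io.BytesIO(body + hashlib.sha1(body).digest())
computed = hash_file_range(stream, end_ofs=-20).digest()
```

Skipping the trailing 20 bytes is how a stored SHA1 trailer can be checked against the rest of the file.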
-
dulwich.pack.
create_delta
(base_buf, target_buf)¶ Use python difflib to work out how to transform base_buf to target_buf.
Parameters: - base_buf – Base buffer
- target_buf – Target buffer
-
dulwich.pack.
deltify_pack_objects
(objects, window_size=None)¶ Generate deltas for pack objects.
Parameters: - objects – An iterable of (object, path) tuples to deltify.
- window_size – Window size; None for default
Returns: Iterator over (type_num, object id, delta_base, content) tuples; delta_base is None for full-text entries
-
dulwich.pack.
iter_sha1
(iter)¶ Return the hexdigest of the SHA1 over a set of names.
Parameters: iter – Iterator over string objects
Returns: 40-byte hex sha1 digest
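Hashing a sequence of names amounts to feeding each name to one SHA1 object. names_hexdigest below is a hypothetical equivalent built on hashlib; it returns a str, which may differ from the type the library returns.

```python
import hashlib


def names_hexdigest(names):
    """Return the hex SHA1 over the concatenation of all names."""
    sha = hashlib.sha1()
    for name in names:
        sha.update(name)
    return sha.hexdigest()


digest = names_hexdigest([b"\x01" * 20, b"\x02" * 20])
```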
-
dulwich.pack.
load_pack_index
(path)¶ Load an index file by path.
Parameters: path – Path to the index file
Returns: A PackIndex loaded from the given path
-
dulwich.pack.
load_pack_index_file
(path, f)¶ Load an index file from a file-like object.
Parameters: - path – Path for the index file
- f – File-like object
Returns: A PackIndex loaded from the given file
-
dulwich.pack.
obj_sha
(type, chunks)¶ Compute the SHA for a numeric type and object chunks.
-
dulwich.pack.
pack_object_header
(type_num, delta_base, size)¶ Create a pack object header for the given object info.
Parameters: - type_num – Numeric type of the object.
- delta_base – Delta base offset or ref, or None for whole objects.
- size – Uncompressed object size.
Returns: A header for a packed object.
-
dulwich.pack.
pack_objects_to_data
(objects)¶ Create pack data from objects
Parameters: objects – Pack objects
Returns: Tuples with (type_num, hexdigest, delta base, object chunks)
-
dulwich.pack.
read_pack_header
(read)¶ Read the header of a pack file.
Parameters: read – Read function
Returns: Tuple of (pack version, number of objects). If no data is available to read, returns (None, None).
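A sketch of reading the header, assuming the git pack layout of a 4-byte "PACK" signature followed by two big-endian 32-bit integers. read_header is a hypothetical helper and its error handling is illustrative only.

```python
import io
import struct


def read_header(read):
    """Return (version, num_objects), or (None, None) if nothing is readable."""
    header = read(12)
    if not header:
        return None, None
    if len(header) < 12 or header[:4] != b"PACK":
        raise ValueError("not a pack header: %r" % header)
    version, num_objects = struct.unpack(">LL", header[4:])
    return version, num_objects


raw = b"PACK" + struct.pack(">LL", 2, 3)
parsed = read_header(io.BytesIO(raw).read)
```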
-
dulwich.pack.
read_zlib_chunks
(read_some, unpacked, include_comp=False, buffer_size=4096)¶ Read zlib data from a buffer.
This function requires that the buffer have additional data following the compressed data, which is guaranteed to be the case for git pack files.
Parameters: - read_some – Read function that returns at least one byte, but may return less than the requested size.
- unpacked – An UnpackedObject to write result data to. If its crc32 attr is not None, the CRC32 of the compressed bytes will be computed using this starting CRC32. After this function returns, it will have the following attrs set: comp_chunks (if include_comp is True), decomp_chunks, decomp_len, crc32
- include_comp – If True, include compressed data in the result.
- buffer_size – Size of the read buffer.
Returns: Leftover unused data from the decompression.
Raises: zlib.error – if a decompression error occurred.
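The decompress-until-done loop can be sketched with zlib.decompressobj. inflate_stream is a hypothetical helper; unlike the function documented here, it uses the eof attribute to detect the end of the stream, so it does not strictly need trailing data.

```python
import io
import zlib


def inflate_stream(read_some, buffer_size=4096):
    """Return (decompressed_chunks, leftover_bytes) for one zlib stream."""
    decomp = zlib.decompressobj()
    chunks = []
    while not decomp.eof:
        data = read_some(buffer_size)
        if not data:
            raise zlib.error("EOF before end of zlib stream")
        chunks.append(decomp.decompress(data))
    # Bytes read past the end of the stream belong to the next object.
    return chunks, decomp.unused_data


payload = zlib.compress(b"hello world") + b"TRAILER"
chunks, leftover = inflate_stream(io.BytesIO(payload).read)
```

The leftover bytes must be handed back to whatever reads the next object, since they were already consumed from the underlying stream.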
-
dulwich.pack.
take_msb_bytes
(read, crc32=None)¶ Read bytes marked with most significant bit.
Parameters: read – Read function
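A sketch of the same idea: keep reading single bytes while the most significant bit of the last byte is set. read_msb_bytes is a hypothetical helper that ignores the crc32 argument and raises EOFError on a short read, which is an assumption of this sketch.

```python
import io


def read_msb_bytes(read):
    """Return a list of byte values, ending with the first one whose MSB is clear."""
    ret = []
    while not ret or ret[-1] & 0x80:
        b = read(1)
        if not b:
            raise EOFError("stream ended inside an MSB-terminated sequence")
        ret.append(b[0])
    return ret


header_bytes = read_msb_bytes(io.BytesIO(b"\x9c\x12\xff").read)
```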
-
dulwich.pack.
unpack_object
(read_all, read_some=None, compute_crc32=False, include_comp=False, zlib_bufsize=4096)¶ Unpack a Git object.
Parameters: - read_all – Read function that blocks until the number of requested bytes are read.
- read_some – Read function that returns at least one byte, but may not return the number of bytes requested.
- compute_crc32 – If True, compute the CRC32 of the compressed data. If False, the returned CRC32 will be None.
- include_comp – If True, include compressed data in the result.
- zlib_bufsize – An optional buffer size for zlib operations.
Returns: A tuple of (unpacked, unused), where unused is the unused data leftover from decompression, and unpacked is an UnpackedObject with the following attrs set:
- obj_chunks (for non-delta types)
- pack_type_num
- delta_base (for delta types)
- comp_chunks (if include_comp is True)
- decomp_chunks
- decomp_len
- crc32 (if compute_crc32 is True)
-
dulwich.pack.
write_pack
(filename, objects, deltify=None, delta_window_size=None)¶ Write a new pack data file.
Parameters: - filename – Path to the new pack file (without .pack extension)
- objects – Iterable of (object, path) tuples to write. Should provide __len__
- delta_window_size – Delta window size
- deltify – Whether to deltify pack objects
Returns: Tuple with checksum of pack file and index file
-
dulwich.pack.
write_pack_data
(f, num_records, records, progress=None)¶ Write a new pack data file.
Parameters: - f – File to write to
- num_records – Number of records
- records – Iterator over type_num, object_id, delta_base, raw
- progress – Function to report progress to
Returns: Dict mapping id -> (offset, crc32 checksum), pack checksum
-
dulwich.pack.
write_pack_header
(f, num_objects)¶ Write a pack header for the given number of objects.
-
dulwich.pack.
write_pack_index
(f, entries, pack_checksum)¶ Write a new pack index file.
Parameters: - f – File-like object to write to
- entries – List of tuples with object name (sha), offset_in_pack, and crc32_checksum.
- pack_checksum – Checksum of the pack file.
Returns: The SHA of the index file written
-
dulwich.pack.
write_pack_index_v1
(f, entries, pack_checksum)¶ Write a new version 1 pack index file.
Parameters: - f – A file-like object to write to
- entries – List of tuples with object name (sha), offset_in_pack, and crc32_checksum.
- pack_checksum – Checksum of the pack file.
Returns: The SHA of the written index file
-
dulwich.pack.
write_pack_index_v2
(f, entries, pack_checksum)¶ Write a new version 2 pack index file.
Parameters: - f – File-like object to write to
- entries – List of tuples with object name (sha), offset_in_pack, and crc32_checksum.
- pack_checksum – Checksum of the pack file.
Returns: The SHA of the index file written
-
dulwich.pack.
write_pack_object
(f, type, object, sha=None)¶ Write pack object to a file.
Parameters: - f – File to write to
- type – Numeric type of the object
- object – Object to write
Returns: Tuple with offset at which the object was written, and crc32
-
dulwich.pack.
write_pack_objects
(f, objects, delta_window_size=None, deltify=None)¶ Write a new pack data file.
Parameters: - f – File to write to
- objects – Iterable of (object, path) tuples to write. Should provide __len__
- delta_window_size – Sliding window size for searching for deltas; set to None for default window size.
- deltify – Whether to deltify objects
Returns: Dict mapping id -> (offset, crc32 checksum), pack checksum