dulwich.pack module

Classes for dealing with packed git objects.

A pack is a compact representation of a bunch of objects, stored using deltas where possible.

A pack has two parts: the pack file, which stores the data, and an index, which tells you where the data is.

To find an object, you look through the index files until you find a match for the object name. You then use the offset obtained from the index as a pointer into the corresponding packfile.

class dulwich.pack.DeltaChainIterator(file_obj, resolve_ext_ref=None)

Bases: object

Abstract iterator over pack data based on delta chains.

Each object in the pack is guaranteed to be inflated exactly once, regardless of how many objects reference it as a delta base. As a result, memory usage is proportional to the length of the longest delta chain.

Subclasses can override _result to define the result type of the iterator. By default, results are UnpackedObjects with the following members set:

  • offset
  • obj_type_num
  • obj_chunks
  • pack_type_num
  • delta_base (for delta types)
  • comp_chunks (if _include_comp is True)
  • decomp_chunks
  • decomp_len
  • crc32 (if _compute_crc32 is True)
ext_refs()
classmethod for_pack_data(pack_data, resolve_ext_ref=None)
record(unpacked)
set_pack_data(pack_data)
class dulwich.pack.FilePackIndex(filename, file=None, contents=None, size=None)

Bases: dulwich.pack.PackIndex

Pack index that is based on a file.

To do a lookup, it opens the file and indexes the first 256 four-byte groups with the first byte of the SHA id. The value in the indexed four-byte group is the end of the group of entries that share the same starting byte. Subtract one from the starting byte and index again to find the start of the group. The entries are sorted by SHA id within the group, so compute the start and end offsets and then bisect to find out whether the value is present.

Create a pack index object.

Provide it with the name of the index file to consider, and it will map it whenever required.
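The fan-out lookup described above can be sketched in standalone Python. This is only an illustration: it keeps the fan-out table and the sorted SHAs in plain lists instead of reading them from a mapped file, and build_fanout and find_sha are hypothetical helper names that are not part of dulwich.

```python
import bisect
import hashlib


def build_fanout(sorted_shas):
    """Return 256 cumulative counts: fanout[b] is the number of SHAs whose first byte is <= b."""
    counts = [0] * 256
    for sha in sorted_shas:
        counts[sha[0]] += 1
    fanout = []
    total = 0
    for count in counts:
        total += count
        fanout.append(total)
    return fanout


def find_sha(sorted_shas, fanout, sha):
    """Return the position of sha in sorted_shas, or None if it is absent."""
    first = sha[0]
    # The group for `first` starts where the previous group ends.
    start = 0 if first == 0 else fanout[first - 1]
    end = fanout[first]
    # Bisect only inside the group that shares the first byte.
    i = bisect.bisect_left(sorted_shas, sha, start, end)
    if i < end and sorted_shas[i] == sha:
        return i
    return None


# Example data: 1000 binary SHA1 digests, sorted as they would be in an index.
shas = sorted(hashlib.sha1(str(n).encode()).digest() for n in range(1000))
fanout = build_fanout(shas)
```

The position returned here is an index into the sorted name table; a real index file would then map that position to an offset in the packfile.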

calculate_checksum()

Calculate the SHA1 checksum over this pack index.

Returns:20-byte binary digest
check()

Check that the stored checksum matches the actual checksum.
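As a rough illustration of what such a check involves, the sketch below assumes the layout used by git index files, in which the final 20 bytes hold a SHA1 over everything that precedes them. The helper names are hypothetical, and the functions operate on an in-memory bytes object rather than on a FilePackIndex.

```python
import hashlib


def calculate_checksum_sketch(contents):
    """SHA1 over the file contents, excluding the 20-byte trailer (assumed layout)."""
    return hashlib.sha1(contents[:-20]).digest()


def checksum_matches(contents):
    """Compare the stored trailer with the freshly computed digest."""
    stored = contents[-20:]
    return stored == calculate_checksum_sketch(contents)


# Example data: one file with a valid trailer and one with a bogus trailer.
body = b"example index body"
good = body + hashlib.sha1(body).digest()
bad = body + b"\x00" * 20
```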

close()
get_pack_checksum()

Return the SHA1 checksum stored for the corresponding packfile.

Returns:20-byte binary digest
get_stored_checksum()

Return the SHA1 checksum stored for this index.

Returns:20-byte binary digest
iterentries()

Iterate over the entries in this pack index.

Returns:iterator over tuples with object name, offset in packfile and crc32 checksum.
path
class dulwich.pack.MemoryPackIndex(entries, pack_checksum=None)

Bases: dulwich.pack.PackIndex

Pack index that is stored entirely in memory.

Create a new MemoryPackIndex.

Parameters:
  • entries – Sequence of name, idx, crc32 (sorted)
  • pack_checksum – Optional pack checksum
get_pack_checksum()

Return the SHA1 checksum stored for the corresponding packfile.

Returns:20-byte binary digest
iterentries()

Iterate over the entries in this pack index.

Returns:iterator over tuples with object name, offset in packfile and crc32 checksum.
object_sha1(index)

Return the SHA1 corresponding to the index in the pack file.

class dulwich.pack.Pack(basename, resolve_ext_ref=None)

Bases: object

A Git pack object.

check()

Check the integrity of this pack.

Raises:ChecksumMismatch – if a checksum for the index or data is wrong
check_length_and_checksum()

Sanity check the length and checksum of the pack index and data.

close()
data

The pack data object being used.

classmethod from_lazy_objects(data_fn, idx_fn)

Create a new pack object from callables to load pack data and index objects.

classmethod from_objects(data, idx)

Create a new pack object from pack data and index objects.

get_raw(sha1)
get_raw_unresolved(sha1)

Get raw unresolved data for a SHA.

Parameters:sha1 – SHA to return data for
Returns:Tuple with pack object type, delta base (if applicable), list of data chunks
get_stored_checksum()
index

The index being used.

Note:This may be an in-memory index
iterobjects()

Iterate over the objects in this pack.

keep(msg=None)

Add a .keep file for the pack, preventing git from garbage collecting it.

Parameters:msg – A message written inside the .keep file; can be used later to determine whether or not a .keep file is obsolete.
Returns:The path of the .keep file, as a string.
name()

The SHA over the SHAs of the objects in this pack.

pack_tuples()

Provide an iterable for use with write_pack_objects.

Returns:Object that can iterate over (object, path) tuples and provides __len__
class dulwich.pack.PackData(filename, file=None, size=None)

Bases: object

The data contained in a packfile.

Pack files can be accessed both sequentially for exploding a pack, and directly with the help of an index to retrieve a specific object.

The objects within are either complete or a delta against another.

The header is variable length. If the MSB of a byte is set, the subsequent byte is still part of the header. In the first byte, the next 3 most significant bits are the type, which tells you the type of object and whether it is a delta. The least significant 4 bits of the first byte are the lowest bits of the size. For each subsequent byte, the least significant 7 bits are the next most significant bits of the size, i.e. the last byte of the header contains the most significant bits of the size.
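The header layout can be made concrete with a small decoder. This is a minimal sketch, assuming the git pack format's split of the first byte into one continuation bit, three type bits and four size bits; decode_object_header is a hypothetical name and not a dulwich function.

```python
def decode_object_header(data):
    """Decode a pack object header from bytes.

    Returns (type_num, size, header_length).
    """
    first = data[0]
    type_num = (first >> 4) & 0x07  # three type bits below the MSB
    size = first & 0x0F             # lowest four bits of the size
    shift = 4
    pos = 1
    byte = first
    # While the MSB is set, the next byte still belongs to the header.
    while byte & 0x80:
        byte = data[pos]
        size |= (byte & 0x7F) << shift  # next seven, more significant, bits
        shift += 7
        pos += 1
    return type_num, size, pos
```

For example, the two bytes 0x9C 0x12 decode to type 1 with size 300: the first byte contributes the low bits 0xC, and the second contributes 0x12 shifted left by four.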

For complete objects, the data is stored as zlib-deflated data. The size in the header is the uncompressed object size, so to decompress you need to keep feeding data to zlib until you get an object back or it errors on bad data. This is done here by giving zlib the complete buffer from the start of the deflated object onward. This is bad, but until I get mmap sorted out it will have to do.

Currently no integrity checks are done. No attempt is made to detect the delta case or a request for an object at the wrong position either; these cases will all just raise a zlib.error or KeyError.
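The approach of handing zlib everything from the start of the deflated object can be sketched with the standard library alone. inflate_from is a hypothetical helper, not the method PackData actually uses; it relies on zlib.decompressobj stopping at the end of the stream and exposing the remaining bytes as unused_data.

```python
import zlib


def inflate_from(buffer, offset=0):
    """Inflate one zlib stream starting at offset; return (data, leftover bytes)."""
    decompressor = zlib.decompressobj()
    # Everything past the end of the stream ends up in unused_data.
    data = decompressor.decompress(buffer[offset:])
    return data, decompressor.unused_data


# Example data: a deflated payload followed by unrelated bytes.
example = b"HDR" + zlib.compress(b"hello pack") + b"TRAILER"
```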

Create a PackData object representing the pack in the given filename.

The file must exist and stay readable until the object is disposed of. It must also stay the same size. It will be mapped whenever needed.

Currently there is a restriction on the size of the pack, as the Python mmap implementation is flawed.

calculate_checksum()

Calculate the checksum for this pack.

Returns:20-byte binary SHA1 digest
check()

Check the consistency of this pack.

close()
create_index(filename, progress=None, version=2)

Create an index file for this data file.

Parameters:
  • filename – Index filename.
  • progress – Progress report function
Returns:

Checksum of index file

create_index_v1(filename, progress=None)

Create a version 1 index file for this data file.

Parameters:
  • filename – Index filename.
  • progress – Progress report function
Returns:

Checksum of index file

create_index_v2(filename, progress=None)

Create a version 2 index file for this data file.

Parameters:
  • filename – Index filename.
  • progress – Progress report function
Returns:

Checksum of index file

filename
classmethod from_file(file, size)
classmethod from_path(path)
get_compressed_data_at(offset)

Given an offset into the packfile, return the compressed data that is there.

Using the associated index the location of an object can be looked up, and then the packfile can be asked directly for that object using this function.

get_object_at(offset)

Given an offset into the packfile, return the object that is there.

Using the associated index the location of an object can be looked up, and then the packfile can be asked directly for that object using this function.

get_ref(sha)

Get the object for a ref SHA, only looking in this pack.

get_stored_checksum()

Return the expected checksum stored in this pack.

iterentries(progress=None)

Yield entries summarizing the contents of this pack.

Parameters:progress – Progress function, called with current and total object count.
Returns:iterator of tuples with (sha, offset, crc32)
iterobjects(progress=None, compute_crc32=True)
path
resolve_object(offset, type, obj, get_ref=None)

Resolve an object, possibly resolving deltas when necessary.

Returns:Tuple with object type and contents.
sorted_entries(progress=None)

Return entries in this pack, sorted by SHA.

Parameters:progress – Progress function, called with current and total object count
Returns:List of tuples with (sha, offset, crc32)
class dulwich.pack.PackIndex

Bases: object

An index in to a packfile.

Given the SHA id of an object, a pack index can tell you the location of that object in the packfile, if the pack contains it.

get_pack_checksum()

Return the SHA1 checksum stored for the corresponding packfile.

Returns:20-byte binary digest
iterentries()

Iterate over the entries in this pack index.

Returns:iterator over tuples with object name, offset in packfile and crc32 checksum.
object_index(sha)

Return the index in to the corresponding packfile for the object.

Given the name of an object it will return the offset that object lives at within the corresponding pack file. If the pack file doesn’t have the object then None will be returned.

object_sha1(index)

Return the SHA1 corresponding to the index in the pack file.

objects_sha1()

Return the hex SHA1 over all the shas of all objects in this pack.

Note:This is used for the filename of the pack.
class dulwich.pack.PackIndex1(filename, file=None, contents=None, size=None)

Bases: dulwich.pack.FilePackIndex

Version 1 Pack Index file.

class dulwich.pack.PackIndex2(filename, file=None, contents=None, size=None)

Bases: dulwich.pack.FilePackIndex

Version 2 Pack Index file.

class dulwich.pack.PackIndexer(file_obj, resolve_ext_ref=None)

Bases: dulwich.pack.DeltaChainIterator

Delta chain iterator that yields index entries.

class dulwich.pack.PackInflater(file_obj, resolve_ext_ref=None)

Bases: dulwich.pack.DeltaChainIterator

Delta chain iterator that yields ShaFile objects.

class dulwich.pack.PackStreamCopier(read_all, read_some, outfile, delta_iter=None)

Bases: dulwich.pack.PackStreamReader

Class to verify a pack stream as it is being read.

The pack is read from a ReceivableProtocol using read() or recv() as appropriate and written out to the given file-like object.

Initialize the copier.

Parameters:
  • read_all – Read function that blocks until the number of requested bytes are read.
  • read_some – Read function that returns at least one byte, but may not return the number of bytes requested.
  • outfile – File-like object to write output through.
  • delta_iter – Optional DeltaChainIterator to record deltas as we read them.
verify()

Verify a pack stream and write it to the output file.

See PackStreamReader.iterobjects for a list of exceptions this may throw.

class dulwich.pack.PackStreamReader(read_all, read_some=None, zlib_bufsize=4096)

Bases: object

Class to read a pack stream.

The pack is read from a ReceivableProtocol using read() or recv() as appropriate.

offset
read(size)

Read, blocking until size bytes are read.

read_objects(compute_crc32=False)

Read the objects in this pack file.

Parameters:

compute_crc32 – If True, compute the CRC32 of the compressed data. If False, the returned CRC32 will be None.

Returns:

Iterator over UnpackedObjects with the following members set:

  • offset
  • obj_type_num
  • obj_chunks (for non-delta types)
  • delta_base (for delta types)
  • decomp_chunks
  • decomp_len
  • crc32 (if compute_crc32 is True)

Raises:
  • ChecksumMismatch – if the checksum of the pack contents does not match the checksum in the pack trailer.
  • zlib.error – if an error occurred during zlib decompression.
  • IOError – if an error occurred writing to the output file.
recv(size)

Read up to size bytes, blocking until one byte is read.

class dulwich.pack.SHA1Reader(f)

Bases: object

Wrapper for file-like object that remembers the SHA1 of its data.

check_sha()
close()
read(num=None)
tell()
class dulwich.pack.SHA1Writer(f)

Bases: object

Wrapper for file-like object that remembers the SHA1 of its data.

close()
offset()
tell()
write(data)
write_sha()
class dulwich.pack.UnpackedObject(pack_type_num, delta_base, decomp_len, crc32)

Bases: object

Class encapsulating an object unpacked from a pack file.

These objects should only be created from within unpack_object. Most members start out as empty and are filled in at various points by read_zlib_chunks, unpack_object, DeltaChainIterator, etc.

End users of this object should take care that the function they’re getting this object from is guaranteed to set the members they need.

comp_chunks
crc32
decomp_chunks
decomp_len
delta_base
obj_chunks
obj_type_num
offset
pack_type_num
sha()

Return the binary SHA of this object.

sha_file()

Return a ShaFile from this object.

dulwich.pack.apply_delta(src_buf, delta)

Based on the similar function in git’s patch-delta.c.

Parameters:
  • src_buf – Source buffer
  • delta – Delta instructions
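To show what applying delta instructions involves, here is a minimal sketch of a delta interpreter. It assumes the delta encoding used by git: two size varints, followed by copy commands (MSB set, with optional offset and size bytes selected by the low bits) and insert commands (MSB clear, value is the literal length). The function names are hypothetical and the code is not dulwich's implementation.

```python
def read_varint(delta, pos):
    """Read a little-endian base-128 integer; return (value, new position)."""
    value = 0
    shift = 0
    while True:
        byte = delta[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta_sketch(src, delta):
    """Rebuild the target bytes from src and the delta instructions."""
    src_size, pos = read_varint(delta, 0)
    dest_size, pos = read_varint(delta, pos)
    if src_size != len(src):
        raise ValueError("source size mismatch")
    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:
            # Copy command: bits 0-3 select offset bytes, bits 4-6 size bytes.
            offset = 0
            for i in range(4):
                if cmd & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            size = 0
            for i in range(3):
                if cmd & (1 << (4 + i)):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000
            out += src[offset:offset + size]
        elif cmd:
            # Insert command: the next `cmd` bytes are literal data.
            out += delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("invalid delta opcode 0")
    if len(out) != dest_size:
        raise ValueError("target size mismatch")
    return bytes(out)


# Example: copy the first 6 bytes of the source, then insert b"there".
example_delta = bytes([11, 11, 0x90, 6, 5]) + b"there"
```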
dulwich.pack.bisect_find_sha(start, end, sha, unpack_name)

Find a SHA in a data blob with sorted SHAs.

Parameters:
  • start – Start index of range to search
  • end – End index of range to search
  • sha – Sha to find
  • unpack_name – Callback to retrieve SHA by index
Returns:

Index of the SHA, or None if it wasn’t found
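A binary search driven by a name-lookup callback might look like the sketch below. It assumes that end is an inclusive index, which is an assumption of this example rather than something stated above; bisect_find_sketch is a hypothetical name.

```python
def bisect_find_sketch(start, end, sha, unpack_name):
    """Search indexes start..end (inclusive) for sha using unpack_name(i)."""
    while start <= end:
        mid = (start + end) // 2
        candidate = unpack_name(mid)
        if candidate < sha:
            start = mid + 1
        elif candidate > sha:
            end = mid - 1
        else:
            return mid
    return None


# Example data: a sorted table of 20-byte names.
names = [b"\x01" * 20, b"\x05" * 20, b"\x09" * 20, b"\xf0" * 20]
```

The callback keeps the search independent of how names are stored; here it is simply names.__getitem__, while an on-disk index would unpack the name from the file at that position.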

dulwich.pack.chunks_length(chunks)
dulwich.pack.compute_file_sha(f, start_ofs=0, end_ofs=0, buffer_size=65536)

Hash a portion of a file into a new SHA.

Parameters:
  • f – A file-like object to read from that supports seek().
  • start_ofs – The offset in the file to start reading at.
  • end_ofs – The offset in the file to end reading at, relative to the end of the file.
  • buffer_size – A buffer size for reading.
Returns:

A new SHA object updated with data read from the file.
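Hashing only part of a file is useful when a trailer, such as a stored checksum, must be excluded. The following sketch assumes that end_ofs is zero or negative, counted back from the end of the file; file_sha_sketch is a hypothetical stand-in and not the library function.

```python
import hashlib
import io


def file_sha_sketch(f, start_ofs=0, end_ofs=0, buffer_size=1 << 16):
    """Return a SHA1 object fed with f[start_ofs : len(f) + end_ofs]."""
    sha = hashlib.sha1()
    f.seek(0, io.SEEK_END)
    length = f.tell()
    todo = length + end_ofs - start_ofs  # number of bytes to hash
    f.seek(start_ofs)
    while todo > 0:
        data = f.read(min(todo, buffer_size))
        if not data:
            break
        sha.update(data)
        todo -= len(data)
    return sha


# Example data: skip a 4-byte header and a 4-byte trailer.
example_file = io.BytesIO(b"HEADpayloadTAIL")
```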

dulwich.pack.create_delta(base_buf, target_buf)

Use python difflib to work out how to transform base_buf to target_buf.

Parameters:
  • base_buf – Base buffer
  • target_buf – Target buffer
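The idea of deriving a delta with difflib can be illustrated as follows. This sketch produces a list of readable copy and insert operations instead of the binary delta encoding, and both helper names are hypothetical; it only shows how difflib.SequenceMatcher opcodes map onto the two kinds of delta instruction.

```python
import difflib


def make_ops(base, target):
    """Describe target as copies from base plus literal inserts."""
    ops = []
    matcher = difflib.SequenceMatcher(a=base, b=target)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes shared with the base: record offset and length only.
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Bytes that exist only in the target: store them literally.
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: the base bytes are simply not copied.
    return ops


def run_ops(base, ops):
    """Replay the operations against base to rebuild the target."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


base = b"the quick brown fox"
target = b"the quick red fox jumps"
ops = make_ops(base, target)
```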
dulwich.pack.deltify_pack_objects(objects, window_size=None)

Generate deltas for pack objects.

Parameters:
  • objects – An iterable of (object, path) tuples to deltify.
  • window_size – Window size; None for default
Returns:

Iterator over (type_num, object id, delta_base, content) tuples; delta_base is None for full-text entries

dulwich.pack.iter_sha1(iter)

Return the hexdigest of the SHA1 over a set of names.

Parameters:iter – Iterator over string objects
Returns:40-byte hex sha1 digest
dulwich.pack.load_pack_index(path)

Load an index file by path.

Parameters:path – Path to the index file
Returns:A PackIndex loaded from the given path
dulwich.pack.load_pack_index_file(path, f)

Load an index file from a file-like object.

Parameters:
  • path – Path for the index file
  • f – File-like object
Returns:

A PackIndex loaded from the given file

dulwich.pack.obj_sha(type, chunks)

Compute the SHA for a numeric type and object chunks.
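For reference, a git object id is the SHA1 of a short header (type name, space, decimal length, NUL byte) followed by the object contents. The sketch below assumes the usual numeric type codes (1 commit, 2 tree, 3 blob, 4 tag); the mapping and the function name are part of this example, not dulwich's API.

```python
import hashlib

# Assumed mapping from numeric pack types to git type names.
TYPE_NAMES = {1: b"commit", 2: b"tree", 3: b"blob", 4: b"tag"}


def obj_sha_sketch(type_num, chunks):
    """Return the binary SHA1 of an object given as a list of byte chunks."""
    length = sum(len(chunk) for chunk in chunks)
    header = TYPE_NAMES[type_num] + b" " + str(length).encode("ascii") + b"\0"
    sha = hashlib.sha1(header)
    for chunk in chunks:
        sha.update(chunk)
    return sha.digest()
```

Because the digest is fed chunk by chunk, how the contents are split across chunks does not affect the result.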

dulwich.pack.pack_object_header(type_num, delta_base, size)

Create a pack object header for the given object info.

Parameters:
  • type_num – Numeric type of the object.
  • delta_base – Delta base offset or ref, or None for whole objects.
  • size – Uncompressed object size.
Returns:

A header for a packed object.
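The encoding side of the header can be sketched for whole (non-delta) objects. This example assumes the git pack layout of three type bits and four size bits in the first byte, followed by seven size bits per continuation byte; it ignores delta_base entirely, and encode_object_header is a hypothetical name.

```python
def encode_object_header(type_num, size):
    """Encode type and uncompressed size for a whole (non-delta) object."""
    current = (type_num << 4) | (size & 0x0F)  # type bits plus low 4 size bits
    size >>= 4
    out = bytearray()
    while size:
        out.append(current | 0x80)  # MSB set: more header bytes follow
        current = size & 0x7F       # next 7 bits of the size
        size >>= 7
    out.append(current)             # last byte has the MSB clear
    return bytes(out)
```

A blob (type 3) of 10 bytes fits in the single byte 0x3A, while a commit (type 1) of 300 bytes needs the two bytes 0x9C 0x12.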

dulwich.pack.pack_objects_to_data(objects)

Create pack data from objects

Parameters:objects – Pack objects
Returns:Tuples with (type_num, hexdigest, delta base, object chunks)
dulwich.pack.read_pack_header(read)

Read the header of a pack file.

Parameters:read – Read function
Returns:Tuple of (pack version, number of objects). If no data is available to read, returns (None, None).
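A pack file starts with a 12-byte header: the signature PACK, then the version and the object count as 4-byte big-endian integers. The sketch below reads that layout with the struct module; the function name and the choice of ValueError for malformed input are assumptions of this example.

```python
import io
import struct


def read_pack_header_sketch(read):
    """Return (version, num_objects), or (None, None) if nothing can be read."""
    header = read(12)
    if not header:
        return None, None
    if len(header) < 12 or header[:4] != b"PACK":
        raise ValueError("invalid pack header")
    version, num_objects = struct.unpack(">LL", header[4:12])
    return version, num_objects


# Example data: a version 2 pack header announcing 3 objects.
example_header = b"PACK" + struct.pack(">LL", 2, 3)
```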
dulwich.pack.read_zlib_chunks(read_some, unpacked, include_comp=False, buffer_size=4096)

Read zlib data from a buffer.

This function requires that the buffer have additional data following the compressed data, which is guaranteed to be the case for git pack files.

Parameters:
  • read_some – Read function that returns at least one byte, but may return less than the requested size.
  • unpacked – An UnpackedObject to write result data to. If its crc32 attr is not None, the CRC32 of the compressed bytes will be computed using this starting CRC32. After this function returns, it will have the following attrs set: comp_chunks (if include_comp is True), decomp_chunks, decomp_len, and crc32.
  • include_comp – If True, include compressed data in the result.
  • buffer_size – Size of the read buffer.
Returns:

Leftover unused data from the decompression.

Raises:

zlib.error – if a decompression error occurred.
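Chunked inflation from a read function can be sketched with the standard library. This version returns plain values instead of filling an UnpackedObject, and it detects the end of the stream with the decompressor's eof flag, which may differ from how the library does it; read_zlib_chunks_sketch is a hypothetical name.

```python
import binascii
import io
import zlib


def read_zlib_chunks_sketch(read_some, buffer_size=4096):
    """Inflate one zlib stream; return (decomp_chunks, crc32, leftover)."""
    decompressor = zlib.decompressobj()
    decomp_chunks = []
    crc32 = 0
    while True:
        chunk = read_some(buffer_size)
        if not chunk:
            raise zlib.error("EOF before end of zlib stream")
        decomp = decompressor.decompress(chunk)
        if decomp:
            decomp_chunks.append(decomp)
        if decompressor.eof:
            # Bytes after the end of the stream belong to whatever follows.
            unused = decompressor.unused_data
            comp_len = len(chunk) - len(unused)
            crc32 = binascii.crc32(chunk[:comp_len], crc32)
            return decomp_chunks, crc32, unused
        # The whole chunk was compressed data: include all of it in the CRC.
        crc32 = binascii.crc32(chunk, crc32)


# Example data: one deflated payload followed by the start of other data.
payload = b"x" * 10000
compressed = zlib.compress(payload)
```

The leftover bytes have already been consumed from the reader, so a caller has to hand them to whatever parses the next object.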

dulwich.pack.take_msb_bytes(read, crc32=None)

Read bytes marked with most significant bit.

Parameters:read – Read function
dulwich.pack.unpack_object(read_all, read_some=None, compute_crc32=False, include_comp=False, zlib_bufsize=4096)

Unpack a Git object.

Parameters:
  • read_all – Read function that blocks until the number of requested bytes are read.
  • read_some – Read function that returns at least one byte, but may not return the number of bytes requested.
  • compute_crc32 – If True, compute the CRC32 of the compressed data. If False, the returned CRC32 will be None.
  • include_comp – If True, include compressed data in the result.
  • zlib_bufsize – An optional buffer size for zlib operations.
Returns:

A tuple of (unpacked, unused), where unused is the unused data left over from decompression, and unpacked is an UnpackedObject with the following attrs set:

  • obj_chunks (for non-delta types)
  • pack_type_num
  • delta_base (for delta types)
  • comp_chunks (if include_comp is True)
  • decomp_chunks
  • decomp_len
  • crc32 (if compute_crc32 is True)

dulwich.pack.write_pack(filename, objects, deltify=None, delta_window_size=None)

Write a new pack data file.

Parameters:
  • filename – Path to the new pack file (without .pack extension)
  • objects – Iterable of (object, path) tuples to write. Should provide __len__
  • delta_window_size – Delta window size
  • deltify – Whether to deltify pack objects
Returns:

Tuple with checksum of pack file and index file

dulwich.pack.write_pack_data(f, num_records, records, progress=None)

Write a new pack data file.

Parameters:
  • f – File to write to
  • num_records – Number of records
  • records – Iterator over type_num, object_id, delta_base, raw
  • progress – Function to report progress to
Returns:

Dict mapping id -> (offset, crc32 checksum), pack checksum

dulwich.pack.write_pack_header(f, num_objects)

Write a pack header for the given number of objects.

dulwich.pack.write_pack_index(f, entries, pack_checksum)

Write a new pack index file.

Parameters:
  • f – File-like object to write to
  • entries – List of tuples with object name (sha), offset_in_pack, and crc32_checksum.
  • pack_checksum – Checksum of the pack file.
Returns:

The SHA of the index file written

dulwich.pack.write_pack_index_v1(f, entries, pack_checksum)

Write a new version 1 pack index file.

Parameters:
  • f – A file-like object to write to
  • entries – List of tuples with object name (sha), offset_in_pack, and crc32_checksum.
  • pack_checksum – Checksum of the pack file.
Returns:

The SHA of the written index file
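To make the shape of such a file concrete, the sketch below writes an index in the version 1 layout as used by git: a 256-entry fan-out table of cumulative counts, one 24-byte record per object (4-byte offset followed by the 20-byte name), the pack checksum, and finally a SHA1 over everything written before it. The CRC32 values are ignored in this layout. The function name is hypothetical, and the code is an approximation, not dulwich's implementation.

```python
import hashlib
import io
import struct


def write_index_v1_sketch(f, entries, pack_checksum):
    """Write sorted (name, offset, crc32) entries; return the index SHA1."""
    sha = hashlib.sha1()

    def write(data):
        # Hash everything that goes into the file, except the final digest.
        sha.update(data)
        f.write(data)

    counts = [0] * 256
    for name, _offset, _crc32 in entries:
        counts[name[0]] += 1
    total = 0
    for count in counts:
        total += count
        write(struct.pack(">L", total))  # cumulative fan-out entry
    for name, offset, _crc32 in entries:
        write(struct.pack(">L20s", offset, name))
    write(pack_checksum)
    digest = sha.digest()
    f.write(digest)
    return digest


# Example data: five entries with made-up offsets and a dummy pack checksum.
example_entries = sorted(
    (hashlib.sha1(bytes([n])).digest(), n * 100, 0) for n in range(5)
)
buf = io.BytesIO()
index_sha = write_index_v1_sketch(buf, example_entries, b"\x11" * 20)
```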

dulwich.pack.write_pack_index_v2(f, entries, pack_checksum)

Write a new version 2 pack index file.

Parameters:
  • f – File-like object to write to
  • entries – List of tuples with object name (sha), offset_in_pack, and crc32_checksum.
  • pack_checksum – Checksum of the pack file.
Returns:

The SHA of the index file written

dulwich.pack.write_pack_object(f, type, object, sha=None)

Write pack object to a file.

Parameters:
  • f – File to write to
  • type – Numeric type of the object
  • object – Object to write
Returns:

Tuple with offset at which the object was written, and crc32

dulwich.pack.write_pack_objects(f, objects, delta_window_size=None, deltify=None)

Write a new pack data file.

Parameters:
  • f – File to write to
  • objects – Iterable of (object, path) tuples to write. Should provide __len__
  • delta_window_size – Sliding window size for searching for deltas; set to None for the default window size.
  • deltify – Whether to deltify objects
Returns:

Dict mapping id -> (offset, crc32 checksum), pack checksum