From: Avery Pennarun
Date: Sun, 20 Feb 2011 02:48:06 +0000 (-0800)
Subject: hashsplit: use shorter offset-filenames inside trees.
X-Git-Url: https://git.michaelhowe.org/gitweb/?a=commitdiff_plain;h=d130210ac92511773b54219aaed4253ea63ba3ac;p=packages%2Fb%2Fbup.git

hashsplit: use shorter offset-filenames inside trees.

We previously zero-padded all the filenames (which are hexified versions of
the file offsets) to 16 characters, which corresponds to a maximum file size
that fits into a 64-bit integer.  I realized that there's no reason to use a
fixed padding length; just pad all the entries in a particular tree to the
length of the longest entry (to ensure that sorting alphabetically is still
equivalent to sorting numerically).

This saves a small amount of space in each tree, which is probably
irrelevant given that gzip compression can quite easily compress extra
zeroes.  But it also makes browsing the tree in git look a little prettier.

This is backwards compatible with old versions of vfs.py, since vfs.py has
always just treated the numbers as an ordered set of numbers, and doesn't
care how much zero padding they have.

Signed-off-by: Avery Pennarun
---

diff --git a/lib/bup/hashsplit.py b/lib/bup/hashsplit.py
index 2b2163b..914c2bb 100644
--- a/lib/bup/hashsplit.py
+++ b/lib/bup/hashsplit.py
@@ -119,11 +119,14 @@ def split_to_blobs(makeblob, files, keep_boundaries, progress):
 def _make_shalist(l):
     ofs = 0
+    l = list(l)
+    total = sum(size for mode,sha,size, in l)
+    vlen = len('%x' % total)
     shalist = []
     for (mode, sha, size) in l:
-        shalist.append((mode, '%016x' % ofs, sha))
+        shalist.append((mode, '%0*x' % (vlen,ofs), sha))
         ofs += size
-    total = ofs
+    assert(ofs == total)
     return (shalist, total)
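
The padding rule in this change can be tried in isolation. The sketch below is a minimal standalone illustration, not code from bup: `make_offset_names` is a hypothetical helper that takes only chunk sizes and leaves out the mode and sha fields. Like the patched `_make_shalist`, it takes the padding width from the hex length of the total size, which is at least as wide as any individual offset.

```python
def make_offset_names(sizes):
    """Return hex offset names for consecutive chunks, padded to one width.

    Hypothetical helper for illustration only; it mirrors the padding logic
    of the patched _make_shalist but is not part of bup.
    """
    sizes = list(sizes)
    total = sum(sizes)
    # Every starting offset is <= total, so no name needs more hex digits
    # than the total does.
    vlen = len('%x' % total)
    names = []
    ofs = 0
    for size in sizes:
        # '%0*x' takes the width from the argument tuple: zero-pad to vlen.
        names.append('%0*x' % (vlen, ofs))
        ofs += size
    assert ofs == total
    return names


sizes = [9, 7, 100]              # chunks start at offsets 0, 9 and 16
names = make_offset_names(sizes)
print(names)                     # ['00', '09', '10']

# With a shared width, alphabetical order matches numeric order.
assert sorted(names) == names

# Without any padding the same offsets sort incorrectly as strings.
unpadded = ['%x' % ofs for ofs in (0, 9, 16)]
print(sorted(unpadded))          # ['0', '10', '9']
```

The names here are two characters wide rather than the 16 produced by the old `'%016x'` format, and readers that parse each name as a hexadecimal number see the same offsets either way.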