From: Avery Pennarun <apenwarr@gmail.com>
Date: Sun, 20 Feb 2011 02:48:06 +0000 (-0800)
Subject: hashsplit: use shorter offset-filenames inside trees.
X-Git-Url: https://git.michaelhowe.org/gitweb/?a=commitdiff_plain;h=d130210ac92511773b54219aaed4253ea63ba3ac;p=packages%2Fb%2Fbup.git

hashsplit: use shorter offset-filenames inside trees.

We previously zero-padded all the filenames (which are hexified versions of
the file offsets) to 16 characters, which corresponds to a maximum file size
that fits into a 64-bit integer.  I realized that there's no reason to
use a fixed padding length; just pad all the entries in a particular tree to
the length of the longest entry (to ensure that sorting
alphabetically is still equivalent to sorting numerically).
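
As a rough illustration of why a per-tree padding width is enough, here is a
minimal standalone sketch.  It is not the bup code itself: offset_names() is a
hypothetical helper that takes a plain list of chunk sizes, whereas
_make_shalist() works on (mode, sha, size) tuples.

```python
def offset_names(sizes):
    """Return hex offset names, zero-padded to one common width.

    Hypothetical helper for illustration only; it mirrors the padding
    rule described above, not the exact bup function.
    """
    total = sum(sizes)
    # Every offset is <= total, so the hex length of total is wide enough.
    width = len('%x' % total)
    names = []
    ofs = 0
    for size in sizes:
        names.append('%0*x' % (width, ofs))
        ofs += size
    return names


sizes = [0x9000, 0x7000, 0x1000]       # offsets 0x0, 0x9000, 0x10000
names = offset_names(sizes)            # ['00000', '09000', '10000']
unpadded = ['0', '9000', '10000']      # the same offsets without padding

# Padded names sort alphabetically in offset order; unpadded ones do not.
assert sorted(names) == names
assert sorted(unpadded) != unpadded
```

The width comes from the total size rather than the largest offset, so it can
occasionally be one digit wider than strictly needed, but it is never too
narrow.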

This saves a small amount of space in each tree, which is probably
irrelevant given that git's zlib compression handles runs of extra zeroes
easily.  But it also makes browsing the tree in git look a little prettier.

This is backwards compatible with old versions of vfs.py, since vfs.py has
always just treated the filenames as an ordered set of numbers, and doesn't
care how much zero padding they have.
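
To see why a reader that parses the names as numbers is unaffected, consider
this small sketch.  parse_offset() is a hypothetical stand-in and only assumes
that the reader converts each hex name to an integer; it is not the actual
vfs.py code.

```python
def parse_offset(name):
    """Convert a hex offset filename to an integer (hypothetical helper)."""
    return int(name, 16)


# The same three offsets, named the old way (16 digits) and the new way.
old_names = ['0000000000000000', '0000000000009000', '0000000000010000']
new_names = ['00000', '09000', '10000']

old_offsets = [parse_offset(n) for n in old_names]
new_offsets = [parse_offset(n) for n in new_names]

# Leading zeroes do not change the parsed value, so both styles agree.
assert old_offsets == new_offsets
```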

Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
---

diff --git a/lib/bup/hashsplit.py b/lib/bup/hashsplit.py
index 2b2163b..914c2bb 100644
--- a/lib/bup/hashsplit.py
+++ b/lib/bup/hashsplit.py
@@ -119,11 +119,14 @@ def split_to_blobs(makeblob, files, keep_boundaries, progress):
 
 def _make_shalist(l):
     ofs = 0
+    l = list(l)
+    total = sum(size for mode,sha,size, in l)
+    vlen = len('%x' % total)
     shalist = []
     for (mode, sha, size) in l:
-        shalist.append((mode, '%016x' % ofs, sha))
+        shalist.append((mode, '%0*x' % (vlen,ofs), sha))
         ofs += size
-    total = ofs
+    assert(ofs == total)
     return (shalist, total)