nlhtree

Tree

An NLHTree is a data structure in which each node consists of a pair: a name and either a list of such pairs or a hash. The NLHTree is serialized as an indented list, with each immediate child indented one space more than the parent

Hash

All of the hashes in the tree have the same number of bits, either 160 (for an SHA1 hash) or 256 (for an SHA2 or SHA3 hash).

Node Names

All names in the tree must be valid file names. For the moment, this will be understood to include letters, both upper and lower case, digits, the dash (‘-’), and the underscore (’_‘). Node names may not include either spaces or line breaks (CR=13 and LF=10).

Top

The topmost node in the tree is a pair of the first type and belongs to the NLHTree class. It consists of a name and a list of NLHNodes, where an NLHNode is either an NLHTree or an NLFLeaf. At each level the list of nodes is sorted by name.

Intermediate Nodes

All intermediate nodes in the tree are also instances of the NLHTree class and so consist of a name and a (possibly empty) sorted list.

Leaf Nodes

Leaf nodes in the tree, instances of the NLHLeaf class, consist of a valid name and a hash. Once formed an NLHLeaf is immutable in the sense that its fields (its name and its hash) cannot be changed. If the leaf node is in an NLHTree, the tree has a reference to it.

Example

dataDir
 data1 bea7383743859a81b84cec8fde2ccd1f3e2ff688
 data2 895c210f5203c48c1e3a574a2d5eba043c0ec72d
 subDir1
  data11 cb0ece05cbb91501d3dd78afaf362e63816f6757
  data12 da39a3ee5e6b4b0d3255bfef95601890afd80709
 subDir2
 subDir3
  data31 8cddeb23f9de9da4547a0d5adcecc7e26cb098c0
 subDir4
  subDir41
   subDir411
    data41 31c16def9fc4a4b6415b0b133e156a919cf41cc8
 zData 31c16def9fc4a4b6415b0b133e156a919cf41cc8

In this example, the NLHTree represents the files in the directory dataDir. The directory contains three files (data1, data2, and zData and four subdirectories (subDir1, subDir2 (which is empty), subDir3, and subDir4. The name of each of the leaf nodes (files) is followed by its content hash. In this case these are SHA1 hashes, which are 20 bytes long, and so written as 40 hex digits.

Utility

NLHTrees are useful as concise descriptions of file systems. In particular they are used in building and editing BuildLists.

A BuildList contains a recursive data structure, its NLHTree. Each leaf node has associated with it a content hash which can be used to verify the integrity of the file. The BuildList has an associated RSA public key, title and date and can be signed using the signatory’s private RSA key. The presence of the RSA public key makes it possible for anyone obtaining a copy of the BuildList to verify that the list is a true representation of the BuildList.

Package Status

A good beta. All tests succeed.


github link to project project