nlhtree_py
Data Structure
This project is a Python3 implementation of the NLHTree, a data structure defined here in some detail.
The NLHTree represents a directory structure as an indented list. Directory names appear alone on a line; files in a given directory are listed at an indent of one space below the directory name, with the file name followed by the Secure Hash Algoritm hash of the file contents, SHA1 or SHA256 as appropriate. These are printed as 40 or 64 hexadecimal digits respectively.
What follows is an NLHTree representing the directory dataDir
.
dataDir
data1 34463aa26c4d7214a96e6e42c3a9e8f55727c695
data2 14193743b265973e5824ca5257eef488094e19e9
subDir1
data11 58089ce970b65940dd5bf07703cd81b4306cb8f0
data12 da39a3ee5e6b4b0d3255bfef95601890afd80709
subDir2
subDir3
data31 487607ec22ee1255cc31c35506c64b1819a48090
subDir4
subDir41
subDir411
data31 0b57d3ab229a69ce5f7fad62f9fe654fe96c51bb
This has two data files in dataDir/
, data1
and data2
. Each
of these is followed by its 20-byte/160-bit SHA1 hash in hexadecimal form.
The file names are indented one space more than the directory name.
There are also four subdirectories, subDir1
, subDir2
, subDir3
,
and subDir4
. These are at the same indent as the two data files.
Files with the subdirectories and indented one space more.
Use in BuildLists
The NLHTree is used in the BuildList; it this context it is prefixed with a title, RSA public key, and timestamp and then optionally digitally signed. If signed it can be used to guarantee the integrity of a file system. That is, it can be used to detect any modifications to an existing directory structure. Alternatively, it can be used to reconstruct a file system in systems such as distributed version control systems, with the content key used to guarantee that the files retrieved are identical to those in the original form of the directory structure.
As an example, this is the BuildList of a similar directory structure:
-----BEGIN RSA PUBLIC KEY-----
MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCzxJ0l1e898G/gBB9zBWUoQ7uw
8C2Z6OTMJeXrNcTR2ZW7IIMevzYHeR26w+k54Roiv4Oec1uGGom4I7TSxF1QCmfG
PDaWgvzE4mmwbPCiiYt6cl/y7paG00709ZnbNBjbaaS2Y3gWN+HwiBcENyNrX29i
P3aQwEB1RNVW8r+SIQIDAQAB
-----END RSA PUBLIC KEY-----
sample build list
2016-09-11 23:13:36
# BEGIN CONTENT #
dataDir
data1 32056bdf38bed17ab7f2bfb37421fa5f4caade71
data2 80b3b965bdfde312a4eeacc87c42f2a68ad1c7d8
subDir1
data11 ca83a024cf4bb9c503f89c86b8819e012d64212d
data12 da39a3ee5e6b4b0d3255bfef95601890afd80709
subDir2
subDir3
data31 e5ea2d73b3801b38e2add428fa98219c48c69e93
subDir4
subDir41
subDir411
data31 c4d1f005f36404cf15a00ce00d9a136a35409bc4
# END CONTENT #
SuJvEG5zYe5SAnYEHynZXvjIPdY/Fr792ltnwJyPxyg2QO+GCrSfepXnNeIUMJtG5c4zamqsijFZYuAuuhIHCxM1sLcEM5PVNmU/cJT9BLWI952bAqqcB+qaWRcDdSt/tQKZCvzeujZTCa9MsbygN2Wo+ToaIv6dkB21WufyRSs=
The RSA private key corresponding to the public part of the key at the top of the BuildList is used to generate the digital signature at the bottom. Given these two bits of information, the public key and the digital signature, anyone can verify that the BuildList has not been tampered with.
Utilities
nlh_check_in_data_dir
usage: nlh_check_in_data_dir [-h] [-b LIST_FILE] [-d DATA_DIR] [-j] [-T] [-V]
[-1] [-2] [-3] [-B] [-u U_PATH] [-v]
list any files in the NLHTree not present in the directory,
optional arguments:
-h, --help show this help message and exit
-b LIST_FILE, --list_file LIST_FILE
where to write listing (default = list.nlh)
-d DATA_DIR, --data_dir DATA_DIR
path to data directory
-j, --just_show show options and exit
-T, --testing this is a test run
-V, --show_version print the version number and exit
-1, --using_sha1 using the 160-bit SHA1 hash
-2, --using_sha2 using the 256-bit SHA2 (SHA256) hash
-3, --using_sha3 using the 256-bit SHA3 (Keccak-256) hash
-B, --using_blake2b using blake2b with a 256-bit digest
-u U_PATH, --u_path U_PATH
path to uDir
-v, --verbose be chatty
nlh_check_in_u_dir
usage: nlh_check_in_u_dir [-h] [-b LIST_FILE] [-j] [-T] [-V] [-1] [-2] [-3] [-B]
[-u U_PATH] [-v]
given a project directory, write an NLHTree while backing the project up to U
optional arguments:
-h, --help show this help message and exit
-b LIST_FILE, --list_file LIST_FILE
where to write listing (default = list.nlh)
-j, --just_show show options and exit
-T, --testing this is a test run
-V, --show_version print the version number and exit
-1, --using_sha1 using the 160-bit SHA1 hash
-2, --using_sha2 using the 256-bit SHA2 (SHA256) hash
-3, --using_sha3 using the 256-bit SHA3 (Keccak-256) hash
-B, --using_blake2b using blake2b with a 256-bit digest
-u U_PATH, --u_path U_PATH
path to uDir
-v, --verbose be chatty
nlh_populate_data_dir
usage: nlh_populate_data_dir [-h] [-b LIST_FILE] [-j] [-p PATH] [-T] [-V] [-z]
[-1] [-2] [-3] [-B] [-u U_PATH] [-v]
given an NLHTree and U, recreate the corresponding data directory
optional arguments:
-h, --help show this help message and exit
-b LIST_FILE, --list_file LIST_FILE
where to write listing (default = list.nlh)
-j, --just_show show options and exit
-p PATH, --path PATH path to data directory
-T, --testing this is a test run
-V, --show_version print the version number and exit
-z, --dont_do_it don't actually do anything, just say what you would do
-1, --using_sha1 using the 160-bit SHA1 hash
-2, --using_sha2 using the 256-bit SHA2 (SHA256) hash
-3, --using_sha3 using the 256-bit SHA3 (Keccak-256) hash
-B, --using_blake2b using blake2b with a 256-bit digest
-u U_PATH, --u_path U_PATH
path to uDir
-v, --verbose be chatty
nlh_save_to_u_dir
This is the most frequently used utility. It scans a data directory, builds an NLHTree using any of the four supported hashes, and then either writes the NLHTree out to a file or backs up the input data diretory to a content-keyed store (U) or both.
usage: nlh_save_to_u_dir [-h] [-b LISTFILE] [-d DATADIR] [-j] [-T] [-V] [-z]
[-1] [-2] [-3] [-B] [-u U_PATH] [-v]
Given a project directory, write an NLHTree while backing the project up
to content-keyed store U.
optional arguments:
-h, --help show this help message and exit
-b LISTFILE, --listFile LISTFILE
where to write listing (default = list.nlh)
-d DATADIR, --dataDir DATADIR
path to data directory
-j, --justShow show options and exit
-T, --testing this is a test run
-V, --showVersion print the version number and exit
-z, --dontDoIt don't actually do anything, just say what you would do
-1, --using_sha1 using the 160-bit SHA1 hash
-2, --using_sha2 using the 256-bit SHA2 (SHA256) hash
-3, --using_sha3 using the 256-bit SHA3 (Keccak-256) hash
-B, --using_blake2b using blake2b with a 256-bit digest
-u U_PATH, --u_path U_PATH
path to uDir
-v, --verbose be chatty
Project Status
A good beta. All tests succeed.
Licensing
The material on this github.io website is licensed under a Creative Commons Attribution 4.0 International License.
Project software is licensed under an MIT license. Follow the SOFTWARE LICENSE link below for more information on project software licensing.