U: A Data Store Organized by Content Key

Rather than storing data files in a hierarchical directory structure where both directories and data files are given string names, u stores files named by their content keys. The keys are generated by either SHA1 or SHA3 (Keccak-256). Storage by content key has several advantages. For one, it is trivial to determine whether a file is corrupt: you simply recalculate its hash and compare the result with the key. In a distributed storage system, files are requested by key, so every machine participating in the retrieval can check a file's integrity as it passes through, dropping the file and re-requesting it if its hash doesn’t match the content key.

u is optimized for storing very large numbers of files. The first byte of the content key (its first two hex digits) determines which top-level directory the file goes in; the second byte (the next two hex digits) determines its lower-level directory. So if a file’s content hash is abcdef…1234, it will be stored in ab/cd/ef…1234. There are 256 top-level directories and 256 subdirectories below each of these, giving 256 × 256 = 65,536 lower-level directories, which keeps the number of entries in any single directory small even when the store holds millions of files.


    // Determine the SHA1 or SHA3 content hash of an arbitrary file
    func FileSHA1(path string) (hash string, err error)
    func FileSHA3(path string) (hash string, err error)
    
    // Create a u256x256 directory structure
    func New(path string) *U256x256
    
    // Look up a file in U, a U256x256 directory tree, by its key
    func (u *U256x256) Exists(key string) bool
    func (u *U256x256) FileLen(key string) (length int64, err error)
    func (u *U256x256) GetPathForKey(key string) string
    
    // Copy a data file and add the copy to U under its SHA1 key. If the
    // file's SHA1 hash doesn't match key, the operation fails.
    func (u *U256x256) CopyAndPut1(path, key string) (
        written int64, hash string, err error)
    // Retrieve a file by its SHA1 key.
    func (u *U256x256) GetData1(key string) (
        data []byte, err error)
    // Insert a data file into U under its SHA1 key; unlike CopyAndPut1,
    // the original file is consumed rather than copied.
    func (u *U256x256) Put1(inFile, key string) (
        length int64, hash string, err error)
    // Write a buffer into U, storing it by its SHA1 key.
    func (u *U256x256) PutData1(data []byte, key string) (
        length int64, hash string, err error)
    
    // Similar functions using the SHA3 (Keccak-256) hash function.
    func (u *U256x256) CopyAndPut3(path, key string) (
        written int64, hash string, err error)
    func (u *U256x256) GetData3(key string) (
        data []byte, err error)
    func (u *U256x256) Put3(inFile, key string) (
        length int64, hash string, err error)
    func (u *U256x256) PutData3(data []byte, key string) (
        length int64, hash string, err error)

GitHub link to the project