github.com/microsoft/hdfs-mount
module / package
Version: v0.0.0-20180326201108-b70c23fc8e2a
Repository: https://github.com/microsoft/hdfs-mount.git
Documentation: pkg.go.dev

# README

hdfs-mount


Mounts a remote HDFS as a local Linux filesystem, allowing arbitrary applications and shell scripts to access HDFS as normal files and directories in an efficient and secure way.
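
Because the mount exposes HDFS through the ordinary filesystem API, programs need no HDFS client library at all. A minimal sketch of what this enables, assuming a hypothetical mount point and file path (both illustrative, not defaults of this project):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Read an HDFS file through the FUSE mount exactly like a local file.
	// The mount point and path below are assumptions for illustration.
	data, err := os.ReadFile("/mnt/hdfs/datasets/part-00000")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("read %d bytes from HDFS via the mount\n", len(data))
}
```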

Features (Planned)

  • High performance
    • directly interfaces the Linux kernel for FUSE and HDFS using protocol buffers (no Java VM required)
    • designed and optimized for throughput-intensive workloads (throughput is traded for latency whenever possible)
    • full streaming and automatic read-ahead support
    • concurrent operations
    • In-memory metadata caching (very fast ls!)
  • High stability and robust failure-handling behavior
    • automatic retries and failover, all configurable
    • optional lazy mounting, before HDFS becomes available
  • Support for both reads and writes
    • support for random writes [slow, but functionally correct]
    • support for file truncations
  • Optionally expands ZIP archives, extracting content on demand
    • this provides an effective solution to the "millions of small files on HDFS" problem
  • CoreOS and Docker-friendly
    • optionally packageable as a statically-linked, self-contained executable

Current state

"Alpha", under active development. Basic R/O scenarios, key R/O throughput optimizations, and ZIP support are implemented and outperform existing HDFS/FUSE solutions. If you want to use the component, come back in a few weeks. If you want to help, contact the authors.

Building

Ensure that you cloned the git repository recursively, since it contains submodules. Run 'make' to build and 'make test' to run the unit tests. Please use Go version 1.6beta2 or later; that version contains a bugfix for handling zip64 archives which hdfs-mount needs in order to operate normally.

Other Platforms

It should be relatively easy to get this working on macOS and FreeBSD, since all underlying dependencies are macOS- and FreeBSD-ready. Very few code changes are needed to support those platforms, but this is currently not a priority for the authors. Contact the authors if you want to help.

# Functions

No description provided by the author
No description provided by the author
Returns minimum of two integers.
Returns true if err==nil or err is an expected (benign) error which should be propagated directly to the caller (a hedged sketch of the retry pattern follows this function list).
Creates default retry policy.
Creates an instance of FaultTolerantHdfsAccessor.
Creates new instance of FaultTolerantHdfsReader.
Creates new instance of FaultTolerantHdfsWriter.
Creates new file handle.
Creates new adapter.
Opens the reader (creates backend reader).
Opens the file for writing.
Creates an instance of mountable file system.
Creates an instance of HdfsAccessor.
Creates new instance of HdfsReader.
Creates new instance of HdfsWriter.
Creates trivial retry policy which disallows all retries.
No description provided by the author
Creates new file handle.
Creates root dir node for zip archive.
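
The retry-related constructors above describe a common pattern: run an HDFS operation under a retry policy unless it succeeded or failed with a benign error that should go straight back to the caller. A minimal, hypothetical sketch of that pattern; all names and signatures below are assumptions for illustration, not this package's actual API:

```go
package sketch

import (
	"errors"
	"os"
	"time"
)

// retryPolicy caps the number of attempts and the delay between them
// (an illustrative stand-in for the package's retry policy).
type retryPolicy struct {
	MaxAttempts int
	Delay       time.Duration
}

// isSuccessOrBenignError: nil and "expected" errors (here, a missing file)
// are handed straight back to the caller instead of being retried.
func isSuccessOrBenignError(err error) bool {
	return err == nil || errors.Is(err, os.ErrNotExist)
}

// withRetries runs op, retrying transient failures according to the policy.
func withRetries(policy retryPolicy, op func() error) error {
	var err error
	for attempt := 1; attempt <= policy.MaxAttempts; attempt++ {
		err = op()
		if isSuccessOrBenignError(err) {
			return err
		}
		time.Sleep(policy.Delay)
	}
	return err
}
```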

# Variables

No description provided by the author
Built time overwritten automatically by the build.
No description provided by the author
No description provided by the author
GITCommit overwritten automatically by the build.
Built hostname overwritten automatically by build.
No description provided by the author
No description provided by the author
No description provided by the author

# Structs

Attributes common to the file/directory HDFS nodes.
Encapsulates state and operations for directory node on the HDFS file system.
Adds automatic retry capability to HdfsAccessor with respect to RetryPolicy.
Implements ReadSeekCloser interface with automatic retries (acts as a proxy to HdfsReader).
Implements HdfsWriter interface with automatic retries (acts as a proxy to HdfsWriter).
No description provided by the author
Represents a buffered (or cached) sequential fragment of the file.
Represents a handle to an open file.
Wraps FileHandle, exposing it as a ReadSeekCloser interface. Concurrency: not thread safe: at most one request at a time.
Encapsulates state and routines for reading data from the file handle. FileHandleReader implements a simple two-buffer scheme that efficiently handles unordered reads which aren't far from each other, so the backend stream can be read sequentially without seeking (a hedged sketch of this idea follows this list).
Encapsulates state and routines for writing data from the file handle.
No description provided by the author
FsInfo provides information about HDFS.
Allows opening an HDFS file as a seekable read-only stream. Concurrency: not thread safe: at most one request at a time.
No description provided by the author
No description provided by the author
Encapsulates the policy and logic of handling retries.
No description provided by the author
No description provided by the author
Encapsulates state and operations for a directory inside a zip file on the HDFS file system.
Encapsulates state and operations for a virtual file inside a zip archive on the HDFS file system.
Encapsulates a file handle for a file inside a zip archive.
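
A minimal, hypothetical sketch of the two-buffer reading idea described for FileHandleReader above. The real package reads its backend HDFS stream sequentially; here a generic io.ReaderAt stands in for that stream, so only the caching side of the scheme is shown, and all names are illustrative assumptions:

```go
package sketch

import "io"

// fragment caches one contiguous piece of the file.
type fragment struct {
	offset int64
	data   []byte
}

func (f fragment) covers(off int64, n int) bool {
	return off >= f.offset && off+int64(n) <= f.offset+int64(len(f.data))
}

// twoBufferReader keeps two cached fragments so that slightly unordered reads
// landing close to each other are served from memory instead of forcing
// another backend read (or, in the real package, a seek of the HDFS stream).
type twoBufferReader struct {
	backend   io.ReaderAt
	chunkSize int
	bufs      [2]fragment
	next      int // which fragment to recycle on the next cache miss
}

func (r *twoBufferReader) ReadAt(p []byte, off int64) (int, error) {
	for _, b := range r.bufs {
		if b.covers(off, len(p)) {
			return copy(p, b.data[off-b.offset:]), nil
		}
	}
	// Cache miss: read a larger chunk starting at off into the recycled slot.
	size := r.chunkSize
	if size < len(p) {
		size = len(p)
	}
	chunk := make([]byte, size)
	n, err := r.backend.ReadAt(chunk, off)
	if n == 0 {
		return 0, err
	}
	r.bufs[r.next] = fragment{offset: off, data: chunk[:n]}
	r.next = 1 - r.next
	m := copy(p, chunk[:n])
	if m < len(p) {
		return m, err // typically io.EOF near the end of the file
	}
	return m, nil
}
```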

# Interfaces

Interface to get wall clock time (the indirection makes unit testing easier).
Interface for accessing HDFS. Concurrency: thread safe: handles an unlimited number of concurrent requests.
Allows opening an HDFS file as a seekable/flushable/truncatable write-only stream. Concurrency: not thread safe: at most one request at a time.
RandomAccessReader implements io.ReaderAt and io.Closer, providing efficient concurrent random access to the HDFS file.
Implements a simple Read()/Seek()/Close() interface to read from a file or stream. Concurrency: not thread safe: at most one request at a time (a hedged sketch of these contracts follows this list).
Interface to open a file for reading (create instance of ReadSeekCloser).
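
For illustration, a hedged sketch of what the reader contracts summarized above could look like, inferred only from the one-line descriptions (the package's real definitions may differ):

```go
package sketch

import "io"

// ReadSeekCloser: the simple Read()/Seek()/Close() stream described above.
type ReadSeekCloser interface {
	io.Reader
	io.Seeker
	io.Closer
}

// RandomAccessReader: io.ReaderAt plus io.Closer, per its summary above.
type RandomAccessReader interface {
	io.ReaderAt
	io.Closer
}

// ReadSeekCloserFactory opens a file for reading, producing a ReadSeekCloser.
// The name and method signature here are assumptions for illustration.
type ReadSeekCloserFactory interface {
	OpenRead(path string) (ReadSeekCloser, error)
}
```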