Security has traditionally presented a barrier to performing comprehensive analysis over multiple data sets. Accumulo helps overcome this by allowing data owners to label individual fields within each data record. At query time, users must present credentials, which are checked against the stored access labels. Within each record, users see only the fields they are authorized to see.
This technology represents a breakthrough in scalable data storage, removing the need to physically isolate data of varying security levels by ensuring logical isolation throughout the storage and query process. Security labels make it possible to combine data when necessary, without expensive data movement, and maintain data isolation as needed.
Security labels can be used as part of a comprehensive security architecture to provide a high degree of assurance for organizations with stringent security requirements, such as those in government, health, finance, and enterprise.
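The label check described above can be illustrated with a small sketch. It assumes labels are boolean expressions over authorization tokens joined by `&` and `|` with parentheses for grouping; the function names (`is_visible`, `filter_record`) and the sample record are hypothetical, and this toy evaluator is not Accumulo's implementation. For brevity it evaluates mixed operators left to right.

```python
import re

# One token is either an authorization name or one of & | ( )
TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(label):
    """Split a label expression into tokens."""
    label = label.strip()
    tokens, pos = [], 0
    while pos < len(label):
        match = TOKEN.match(label, pos)
        if not match:
            raise ValueError(f"bad label at position {pos}: {label!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(label, authorizations):
    """Return True if the caller's authorizations satisfy the label."""
    tokens = tokenize(label)
    if not tokens:
        return True  # assumption: an empty label means "visible to all"
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of label")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in authorizations

    def parse_expr():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    visible = parse_expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens in label")
    return visible


def filter_record(record, authorizations):
    """Keep only the fields whose label the caller is authorized to see."""
    return {
        field: value
        for field, (value, label) in record.items()
        if is_visible(label, authorizations)
    }


# Hypothetical record: field -> (value, label)
patient = {
    "name": ("Alice", "public"),
    "diagnosis": ("flu", "doctor|nurse"),
    "ssn": ("123-45-6789", "admin&(billing|audit)"),
}
nurse_view = filter_record(patient, {"public", "nurse"})
```

In this sketch the same stored record yields different views for different callers, which is the logical isolation the text describes: no second copy of the data is needed for the lower-privileged reader.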
Accumulo can perform server-side computation, which can be used to transform some types of MapReduce jobs into continuously updated result sets. This means the results of some MapReduce jobs can be updated incrementally rather than in a batch-oriented manner, and made immediately available for queries.
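The difference between a batch job and an incrementally updated result can be sketched as follows. `CombiningTable` is a hypothetical in-memory stand-in for a table with a server-side aggregation function attached; it folds each new value into the stored one at write time purely for brevity, and a real system may defer that work to scan or compaction time.

```python
import operator


class CombiningTable:
    """Toy table that merges each written value into the stored value."""

    def __init__(self, combine):
        self._combine = combine  # e.g. operator.add for running sums
        self._cells = {}

    def put(self, key, value):
        if key in self._cells:
            self._cells[key] = self._combine(self._cells[key], value)
        else:
            self._cells[key] = value

    def get(self, key, default=None):
        return self._cells.get(key, default)

    def as_dict(self):
        return dict(self._cells)


def batch_word_count(documents):
    """Batch-style count: recomputed from scratch over all documents."""
    counts = {}
    for doc in documents:
        for word in doc.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def ingest(table, document):
    """Incremental-style count: each arriving document updates the totals."""
    for word in document.split():
        table.put(word, 1)


documents = ["to be or not to be", "to scale or not to scale"]
word_counts = CombiningTable(operator.add)
for doc in documents:
    ingest(word_counts, doc)
    # After every document the totals are already queryable.
```

Both approaches arrive at the same totals; the incremental version simply never needs a separate pass over the whole data set.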
Apache Hadoop MapReduce jobs over data stored in Accumulo tables are also supported. Since tables are partitioned and distributed automatically, MapReduce jobs can take advantage of the same parallelism and data locality as when storing records in HDFS.
Accumulo allows users to keep multiple versions of their data. Data versioning can be controlled via a policy mechanism such as "keep the latest version", "keep the last 12 versions", or "keep versions from the last 10 days".
This gives users flexibility in remaining compliant with data retention policies and provides increased safety against accidental overwrites.
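The three example policies can be expressed as two knobs: a maximum number of versions and a maximum age. The sketch below is illustrative only; `VersionedCell` and its parameters are hypothetical names, and timestamps are passed in explicitly (in seconds) so the behavior is deterministic.

```python
DAY = 24 * 60 * 60  # seconds in a day


class VersionedCell:
    """Toy cell that retains versions according to a simple policy."""

    def __init__(self, max_versions=None, max_age_seconds=None):
        self.max_versions = max_versions
        self.max_age_seconds = max_age_seconds
        self._versions = []  # (timestamp, value), newest first

    def put(self, value, timestamp):
        self._versions.append((timestamp, value))
        self._versions.sort(key=lambda version: version[0], reverse=True)

    def values(self, now):
        """Return the values the policy retains at time `now`, newest first."""
        kept = self._versions
        if self.max_age_seconds is not None:
            kept = [v for v in kept if now - v[0] <= self.max_age_seconds]
        if self.max_versions is not None:
            kept = kept[: self.max_versions]
        return [value for _, value in kept]


# "keep the latest version"
latest_only = VersionedCell(max_versions=1)
# "keep the last 12 versions"
last_twelve = VersionedCell(max_versions=12)
# "keep versions from the last 10 days"
ten_days = VersionedCell(max_age_seconds=10 * DAY)

for cell in (latest_only, last_twelve, ten_days):
    cell.put("draft", timestamp=0)
    cell.put("revised", timestamp=5 * DAY)
    cell.put("final", timestamp=11 * DAY)
```

An accidental overwrite is recoverable under the second and third policies because the earlier value is still retained, whereas the first policy keeps only the newest write.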
In addition to versioning policies, Accumulo tables can be snapshotted to preserve the state of a table at a particular point in time. Snapshots are very efficient as Accumulo stores the original data and tracks changes separately, enabling snapshots to be created very quickly and without using a lot of additional storage.
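One way to picture why such snapshots are cheap is a copy-on-write layout: flushed data lives in immutable files, and a snapshot copies only the list of file references. The sketch below illustrates that general idea under those assumptions; the `Table` class is hypothetical and does not represent Accumulo's actual storage format.

```python
from types import MappingProxyType


class Table:
    """Toy table: immutable flushed files plus a mutable write buffer."""

    def __init__(self, files=()):
        self._files = list(files)  # read-only mappings, oldest first
        self._pending = {}         # writes not yet flushed

    def put(self, key, value):
        self._pending[key] = value

    def flush(self):
        """Freeze pending writes into a new immutable file."""
        if self._pending:
            self._files.append(MappingProxyType(dict(self._pending)))
            self._pending = {}

    def get(self, key):
        if key in self._pending:
            return self._pending[key]
        for data_file in reversed(self._files):  # newest file wins
            if key in data_file:
                return data_file[key]
        return None

    def snapshot(self):
        """Return a new table sharing the existing files; no data is copied."""
        self.flush()
        return Table(self._files)


live = Table()
live.put("row1", "original")
frozen = live.snapshot()
live.put("row1", "changed")  # tracked separately from the snapshot
```

Because the snapshot holds references rather than copies, its cost depends on the number of files, not the amount of data in them, and later writes to the live table never alter what the snapshot sees.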
Based on Google's proven Bigtable design, and incorporating technologies such as Apache Hadoop, ZooKeeper, and Thrift, Accumulo is built for scale.
Accumulo is able to store very large tables of records, sorted by record IDs, and provides mechanisms for quickly retrieving one or more records from anywhere in the table.
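Keeping records sorted by ID is what makes both point lookups and range retrievals fast: a binary search finds the start of a range, and the matching records are then contiguous. A minimal in-memory sketch of that idea, using a hypothetical `SortedTable` class and made-up record IDs, might look like this:

```python
import bisect


class SortedTable:
    """Toy table that keeps record IDs in sorted order."""

    def __init__(self):
        self._ids = []      # sorted list of record IDs
        self._values = {}   # record ID -> value

    def put(self, record_id, value):
        if record_id not in self._values:
            bisect.insort(self._ids, record_id)
        self._values[record_id] = value

    def get(self, record_id):
        """Point lookup of a single record."""
        return self._values.get(record_id)

    def scan(self, start, end):
        """Yield (id, value) pairs with start <= id < end, in sorted order."""
        lo = bisect.bisect_left(self._ids, start)
        hi = bisect.bisect_left(self._ids, end)
        for record_id in self._ids[lo:hi]:
            yield record_id, self._values[record_id]


table = SortedTable()
for record_id in ["user003", "apple", "user001", "zebra", "user002"]:
    table.put(record_id, record_id.upper())

users = list(table.scan("user", "user~"))  # every ID with the "user" prefix
```

Choosing record IDs so that related records sort next to each other (a shared prefix, for example) turns "fetch everything about X" into a single contiguous scan.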
Unlike some NoSQL databases that require administrators to dictate data partitioning and replication, Accumulo automatically load-balances data partitions onto new machines and rebalances gracefully when machines are removed from the cluster.
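The balancing behavior can be sketched with a simple greedy policy: move partitions from the most loaded server to the least loaded one until the counts differ by at most one. This is an illustrative policy under that assumption, not Accumulo's actual balancer, and the function and server names are hypothetical.

```python
def rebalance(assignment):
    """Even out partition counts in place; return the number of moves made."""
    moves = 0
    while True:
        busiest = max(assignment, key=lambda s: len(assignment[s]))
        idlest = min(assignment, key=lambda s: len(assignment[s]))
        if len(assignment[busiest]) - len(assignment[idlest]) <= 1:
            return moves
        assignment[idlest].append(assignment[busiest].pop())
        moves += 1


def add_server(assignment, server):
    """A new machine joins empty, then receives its share of partitions."""
    assignment[server] = []
    return rebalance(assignment)


def remove_server(assignment, server):
    """A departing machine's partitions go to the least loaded survivors."""
    orphans = assignment.pop(server)
    if orphans and not assignment:
        raise ValueError("no servers left to host partitions")
    for partition in orphans:
        idlest = min(assignment, key=lambda s: len(assignment[s]))
        assignment[idlest].append(partition)
    return rebalance(assignment)


# Six partitions start on a single server.
cluster = {"server1": [f"partition{i}" for i in range(6)]}
moves_after_second = add_server(cluster, "server2")
moves_after_third = add_server(cluster, "server3")
sizes_with_three = sorted(len(p) for p in cluster.values())
remove_server(cluster, "server1")
sizes_with_two = sorted(len(p) for p in cluster.values())
```

Only the partitions that need to move are reassigned at each step, so adding or removing a machine disturbs a fraction of the data rather than reshuffling everything.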
Data replication is easily configurable and handled seamlessly by the Hadoop Distributed File System, which serves as Accumulo's persistent storage layer.
Accumulo also provides the ability to automatically age off records through a flexible versioning system, making it easier to comply with data storage policies and eliminating costly data removal efforts.
Accumulo is an open source project available under the Apache License. The source code is available at the Apache Incubator site.