Spyglass: Fast, Scalable Metadata Search for Large-Scale Storage Systems
Published as Storage Systems Research Center Technical Report UCSC-SSRC-08-01.
Abstract
As storage systems reach the petabyte scale, it has become increasingly difficult for users and storage administrators to understand and manage their data. File metadata, such as inode and extended attributes, is a valuable source of information that can aid in locating and identifying files, and can also facilitate administrative tasks, such as storage provisioning and recovery from backups. Unfortunately, most storage systems have no way to quickly and easily search file metadata at large scale.
To address these issues, we developed Spyglass, an indexing system that efficiently gathers, indexes, and queries file metadata in large-scale storage systems. Our analysis of file metadata from real-world workloads showed that metadata has spatial locality in the storage namespace and that the distribution of metadata is highly skewed. Based on these findings, we designed Spyglass to use index partitioning and signature files to quickly prune the file search space. We also developed techniques to efficiently handle index versioning, facilitating both fast updates and queries across historical indexes. Experiments on systems with up to 300 million files show that the Spyglass prototype is as much as several thousand times faster than current database solutions while requiring only a fraction of the space.
Publication date:
May 2008
Authors:
Andrew Leung
Minglong Shao
Timothy Bisson
Shankar Pasupathy
Ethan L. Miller
Projects:
Scalable File System Indexing
Ultra-Large Scale Storage
Available media
Full paper text: PDF
Bibtex entry
@techreport{leung-ssrctr0801,
  author = {Andrew Leung and Minglong Shao and Timothy Bisson and Shankar Pasupathy and Ethan L. Miller},
  title = {Spyglass: Fast, Scalable Metadata Search for Large-Scale Storage Systems},
  institution = {University of California, Santa Cruz},
  number = {UCSC-SSRC-08-01},
  month = may,
  year = {2008},
}