Evaluation of a Hybrid Approach for Efficient Provenance Storage
Appeared in ACM Transactions on Storage.
Abstract
Provenance is the metadata that describes the history of objects. Provenance provides new functionality in a variety of areas, including experimental documentation, debugging, search, and security. As a result, a number of groups have built systems to capture provenance. Most of these systems focus on provenance collection, and a few focus on building applications that use provenance, but all of them ignore an important aspect: efficient long-term storage of provenance.
In this article, we first analyze the provenance collected from multiple workloads and characterize its properties with respect to long-term storage. We then propose a hybrid scheme that exploits both the graph structure of provenance data and the duplication inherent in it. Our evaluation indicates that this hybrid scheme, a combination of Web graph compression (adapted for provenance) and dictionary encoding, provides the best trade-off among compression ratio, compression time, and query performance when compared with the other compression schemes evaluated.
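The abstract names the two ingredients of the hybrid scheme but not their exact encoding. As a rough illustration only, the following Python sketch shows the general ideas under our own assumptions: dictionary encoding replaces repeated attribute strings (such as file paths) with small integer ids, and two techniques commonly used in Web graph compression shrink adjacency lists by (a) copying entries from a similar "reference" list and (b) storing sorted node ids as gaps. All function names are hypothetical, and this is not the authors' implementation.

```python
"""Illustrative sketch only: dictionary encoding for repeated provenance
strings, plus Web-graph-style reference copying and gap encoding for
adjacency lists. Not the paper's implementation; names are hypothetical."""
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Map each distinct string to a small integer id; repeats share an id."""
    table: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(table)
            table.append(v)
        codes.append(index[v])
    return table, codes


def dictionary_decode(table: List[str], codes: List[int]) -> List[str]:
    return [table[c] for c in codes]


def gap_encode(ids: List[int]) -> List[int]:
    """Store the smallest id, then differences between consecutive sorted ids."""
    s = sorted(ids)
    if not s:
        return []
    return [s[0]] + [b - a for a, b in zip(s, s[1:])]


def gap_decode(gaps: List[int]) -> List[int]:
    """Rebuild the sorted id list by a running sum over the gaps."""
    out: List[int] = []
    total = 0
    for g in gaps:
        total += g
        out.append(total)
    return out


def reference_encode(cur: List[int], ref: List[int]) -> Tuple[List[int], List[int]]:
    """Encode adjacency set `cur` relative to a similar list `ref`: one copy
    bit per entry of `ref`, plus gap-encoded ids of `cur` not found in `ref`.
    Assumes adjacency lists contain no duplicate ids."""
    cur_set, ref_set = set(cur), set(ref)
    copy_bits = [1 if x in cur_set else 0 for x in ref]
    extras = gap_encode([x for x in cur_set if x not in ref_set])
    return copy_bits, extras


def reference_decode(copy_bits: List[int], extras: List[int], ref: List[int]) -> List[int]:
    """Invert reference_encode, returning the adjacency list in sorted order."""
    copied = [x for x, bit in zip(ref, copy_bits) if bit]
    return sorted(copied + gap_decode(extras))


if __name__ == "__main__":
    # Hypothetical provenance: processes repeatedly reading the same files.
    table, codes = dictionary_encode(["/usr/bin/gcc", "/lib/libc.so", "/usr/bin/gcc"])
    print(table, codes)

    # Two processes whose input sets differ in a single file.
    p1 = [100, 102, 103, 107]
    p2 = [100, 102, 103, 108]
    bits, extras = reference_encode(p2, p1)
    print(bits, extras)
    assert reference_decode(bits, extras, p1) == p2
```

The point of combining the two is that they target different kinds of redundancy: dictionary encoding removes duplicated strings in node attributes, while reference copying and gap encoding exploit the fact that related provenance nodes tend to have similar, clustered neighbor ids. A real system would further pack the copy bits and gaps with variable-length integer codes.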
Publication date:
November 2013
Authors:
Yulai Xie
Kiran-Kumar Muniswamy-Reddy
Dan Feng
Yan Li
Darrell D. E. Long
Projects:
Archival Storage
Bibtex entry
@article{xie-tos13,
  author       = {Yulai Xie and Kiran-Kumar Muniswamy-Reddy and Dan Feng and Yan Li and Darrell D. E. Long},
  title        = {Evaluation of a Hybrid Approach for Efficient Provenance Storage},
  journal      = {ACM Transactions on Storage},
  volume       = {},
  month        = nov,
  year         = {2013},
}