Qualifying exam: Andrew Leung

Andrew Leung will take his qualifying exam to advance to candidacy for the Ph.D. thesis titled:
Organizing, Indexing, and Searching Large-Scale File Systems

As the world moves towards a digital infrastructure, demand for data storage has increased rapidly. This demand has resulted in file systems that store petabytes of data and billions of files and serve thousands of users. Unfortunately, as file systems have scaled up in capacity and performance, file organization and retrieval have not kept pace. As a result, large-scale file systems commonly store far more data than can be easily or effectively managed. Users and administrators of large-scale file systems waste increasing amounts of time and resources organizing, finding, and managing data, so utility and usability decrease as file systems increase in scale.

The complexity of managing growing file systems has led to an increasing focus on file system search. Close to two decades of research has demonstrated that file system search can address many file management problems and improve file retrieval. Unfortunately, current file system search solutions are not designed for large-scale file systems and do not easily scale to them. As a result, users and administrators of large-scale file systems continue to lack an effective search solution.

I propose a thesis that bridges the current gap between large-scale file systems and file system search technology. First, this proposal examines the fundamental requirements for an efficient file system search solution and why large-scale file systems make designing such a solution so difficult. Then, backed by analysis of real-world file system trace data, this proposal advocates a search solution that exploits large-scale file system properties to achieve both scale and performance. This proposal then presents the design and implementation of a scalable, high-performance file system metadata search system, which leverages large-scale file system properties to outperform existing metadata search solutions by one to three orders of magnitude. To complete my thesis, I propose two additional chapters. The first extends some of my basic file metadata search concepts towards keyword search in file systems, providing fast, unstructured search of a file system's contents. The second explores how search can be directly integrated with the file system as a first-class function by redesigning how the file system manages metadata. I conclude by proposing a timeline for completion and tentative milestones.

When:
Wednesday, December 10, 2008 at 3:30 PM

Where:
E2-599

SSRC Contact:
Leung, Andrew

Last modified 24 May 2019