The Environmental Molecular Sciences Laboratory (EMSL) is a U.S. Department of Energy Office of Science user facility funded by the Biological and Environmental Research program. Together with the Pacific Northwest Center for Cryo-EM (PNCC) and Oregon Health & Science University (OHSU), EMSL set up a data management pipeline that enables researchers to build detailed 3D models and determine the atomic structures of biomolecules using cryo-electron microscopy (cryo-EM).
Four cryo-EM microscopes are maintained at OHSU. These microscopes generate terabytes of raw data daily, and accessing, transferring, and analyzing that data requires intensive computation and data management. During the multi-step analysis process, the raw data can double or even triple in size. OHSU uses EMSL’s infrastructure to support the intensive computation and the extremely large data sets that are generated.
The Globus platform and service were selected to address several key data management requirements that surfaced as the pipeline was developed, including the need to give end users access to data over long distances and across institutional boundaries. Workflow automation was also needed to speed up some of the steps in the multi-step pipeline.
Globus gives EMSL, OHSU, and PNCC an easy way to access the data and disseminate it to users through a shared endpoint for each project, with Globus Auth enabling access via existing identities. Globus Transfer efficiently moves the terabytes of raw data generated daily at OHSU to EMSL, where a supercomputer processes the data. EMSL staff wrote a script using the Globus Command Line Interface (CLI) to pull data from OHSU to EMSL, thereby automating part of the workflow. With the Globus management console, the status of the data transfers is easily monitored at EMSL.
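The EMSL script itself is not shown here, so the following is only a minimal sketch of what a CLI-based pull might look like. It wraps the `globus transfer` command in Python. The endpoint UUIDs, directory paths, session name, and function names are hypothetical placeholders, and the choice of `--sync-level checksum` is an assumption rather than a documented setting of the actual pipeline.

```python
import shlex
import subprocess

# Hypothetical placeholders: substitute real collection UUIDs and paths.
OHSU_ENDPOINT = "00000000-0000-0000-0000-000000000001"
EMSL_ENDPOINT = "00000000-0000-0000-0000-000000000002"


def build_transfer_command(src_endpoint, src_path, dst_endpoint, dst_path, label):
    """Return the argument list for a recursive `globus transfer` call."""
    return [
        "globus", "transfer",
        f"{src_endpoint}:{src_path}",
        f"{dst_endpoint}:{dst_path}",
        "--recursive",               # copy the whole session directory
        "--sync-level", "checksum",  # assumed: skip files whose checksums already match
        "--label", label,            # label shown when monitoring the task
    ]


def pull_session(session_name, dry_run=True):
    """Pull one microscope session directory from OHSU to EMSL.

    With dry_run=True the command is only rendered as a string, so the
    sketch can be inspected without a Globus login.
    """
    cmd = build_transfer_command(
        OHSU_ENDPOINT, f"/raw/{session_name}/",      # hypothetical source path
        EMSL_ENDPOINT, f"/incoming/{session_name}/",  # hypothetical destination path
        label=f"cryo-EM pull {session_name}",
    )
    if dry_run:
        return shlex.join(cmd)
    # Requires the Globus CLI to be installed and `globus login` to have been run.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    print(pull_session("session_001"))
```

Run on a schedule (for example from cron), a script along these lines would submit one transfer task per session. Each submitted task can then be followed with `globus task show` or in the management console.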
- "Globus provides us with the ability to transfer multiple data streams simultaneously, and as a result we were able to reduce the time it took for the data to be accessible from days down to hours."
- "PNCC would not have been possible without Globus. Last I looked, we have moved 1.7 PB of data."
- "Globus handles authentication issues extremely well, which is important as we have people across multiple facilities trying to access the data."
- "Moving terabytes of data daily has become a reality in cryo-EM pipelines."