DOE Data Days (D3)
Globus will be speaking at the 2020 Department of Energy (DOE) Data Days virtual event:
Scalable Data Management for National Facilities Using the Modern Research Data Portal
National user facilities such as the Advanced Photon Source (APS), the Advanced Light Source (ALS), and Leadership Class Facilities are generating large volumes of data daily. As data volumes grow, the research enterprise is increasingly challenged by what should be mundane tasks: reliably moving data from instruments and computing resources, easily describing data for downstream discovery, and making the data accessible (often with appropriate access controls) to distributed groups of collaborators. The ad hoc methods currently employed at many facilities place an undue burden on scientists and system administrators alike, and it is clear that some level of automation is required for these tasks.
Globus is an established service from the University of Chicago that is widely used for managing research data in national laboratories, campus computing centers, and HPC facilities. While its interactive web browser interface addresses simple file transfer and sharing scenarios, large-scale automation typically requires integrating the Globus research data management platform into bespoke applications.
One such example, among many, is the Petrel data portal (https://petreldata.net) developed by the Argonne Leadership Computing Facility (ALCF) and Globus, used by researchers to manage data in diverse fields including materials science, cosmology, machine learning, and serial crystallography. The portal facilitates automated ingest of data from APS beamlines and other sources, extraction and addition of metadata for creating search indexes, assignment of persistent identifiers, faceted search for rapid data discovery, and point-and-click downloading of datasets by authorized users. As security and privacy are often critical requirements, the portal employs fine-grained permissions that control both visibility of metadata and access to the datasets themselves. It is based on the Modern Research Data Portal design pattern, jointly developed by the ESnet and Globus teams, and leverages capabilities such as the Science DMZ to enhance performance and streamline the user experience.
We will describe common use cases at user facilities that motivate the need for such data portals, illustrate them with further examples, and demonstrate how DOE investigators can rapidly develop and deploy these capabilities to scale up their research.