Reimagining Research IT
October 04, 2024 | EdgeDiscovery - Summer/Fall 2024
A computer engineer by training, Rachana Ananthakrishnan started her career at Argonne National Lab as a software engineer working on distributed systems, with a focus on security for grid computing. During this time, she also worked on National Science Foundation (NSF) projects, including XSEDE, and on National Institutes of Health (NIH)-funded bioinformatics and cancer research projects, in roles that today would be called research software engineering.
Later, Ananthakrishnan moved to the University of Chicago (UChicago) to take on customer engagement and product management roles with the Globus platform team. “These roles afforded me the opportunity to engage broadly with the research and education (R&E) community and understand the cyberinfrastructure landscape, identifying challenges, gaps, and researcher needs. That experience was instrumental in informing and shaping Globus platform capabilities,” shares Ananthakrishnan, who is currently the Executive Director & Head of Products for Globus at the University of Chicago. She also holds a Joint Staff Appointment in the Data Science and Learning Division at Argonne National Laboratory. “In my current role, I lead the Globus department within the Office of Research at UChicago, where we build and deliver sustainable core research infrastructure solutions for scientists.”
Accelerating Science and Discovery
Developed and operated by UChicago as a not-for-profit service, Globus delivers cyberinfrastructure services that help scientists manage data and compute and automate their work. “Our mission is to accelerate science and reduce time to discovery for researchers worldwide,” explains Ananthakrishnan. “We offer comprehensive cyberinfrastructure capabilities to investigators engaged in data-driven science and scholarship, allowing them to focus on their research while outsourcing supporting activities to a suite of powerful cloud-hosted services.”
Globus is a data and compute management platform used by leading universities, national laboratories, government facilities, and other non-profit research and commercial organizations worldwide. “The Globus team researches, develops, and operates services for reliable file transfer, sharing, remote computation, and automation throughout the research lifecycle, tailored to the needs of a distributed yet collaboration-driven scientific community,” says Ananthakrishnan. “These services are offered as Software- and Platform-as-a-Service (SaaS and PaaS): a researcher can use a browser to manage data and compute across a distributed ecosystem of connected computing resources, or build on these capabilities to create powerful new applications and services. Long-term sustainability underpins our team’s mission, and in pursuit of this goal we offer Globus under a freemium model to the research and education community.”
Promoting Collaboration through Shared Data
Globus was designed from the ground up to support secure management of data and compute across security boundaries, while minimizing the burden on researchers and system administrators. With a hybrid cloud-edge deployment model, the Globus team at UChicago operates the secure and reliable orchestration and management services, while institutions deploy agents to connect resources such as storage and compute to the Globus ecosystem. “This approach offers layered security, with institutions managing access to their resources, while allowing researchers to easily leverage these powerful data and compute resources and automate their research tasks,” explains Ananthakrishnan. “The Globus platform supports access to a diverse set of storage and computing resources, from lab servers and campus clusters to cloud and supercomputing environments.”
Underpinning all Globus services is Globus Auth, a security platform that integrates with federated identities from thousands of institutions. This makes Globus easily accessible to millions of researchers around the globe, while providing mechanisms for fine-grained policy enforcement when accessing both standard and protected data. Dr. Forough Ghahramani, Assistant Vice President for Research, Innovation, and Sponsored Programs, Edge, adds, “Edge views Globus as a critical component of the vision for advancing research infrastructure. By enabling seamless and secure data transfer across institutions, Globus empowers researchers to collaborate more effectively and manage vast datasets with ease. Globus not only supports the technical demands of large-scale projects but also fosters innovation by making advanced data management accessible to a broader spectrum of the research community.”
Enabling Scientific Breakthroughs
All Globus capabilities can be integrated into any application, whether command-line clients, thick clients, or web applications such as portals and science gateways. Globus services are accessible via open APIs, and the team also offers Python and JavaScript Software Development Kits (SDKs) that make it easier to add Globus capabilities to applications. “For example, Globus has been integrated with JupyterHub for authentication so that data management capabilities can be used in Jupyter Notebooks,” shares Ananthakrishnan. “Other common services used in the R&D community, such as Open OnDemand, data portals, and science gateways, have integrations for managing data via Globus.” The platform is also used to build customized data management solutions at facilities. For instance, the Environmental Molecular Sciences Laboratory uses Globus services for multi-institutional data sharing, the Cryo-EM Facility at Case Western Reserve for data distribution, and the Materials Data Facility for publication.
In more than 14 years of operation, Globus has been broadly adopted by researchers in academia, independent research institutes, national labs, government agencies, hospitals and healthcare systems, and even museums. A self-service system, Globus has more than 500,000 registered users and 60,000 connected storage systems across 80 countries. “Globus has been instrumental in enabling many scientific breakthroughs across different disciplines,” shares Ananthakrishnan. “Most recently, Globus was used to build an open science platform for health professionals to break down silos and foster collaboration, especially when time is of the essence, as it was during the pandemic. Dr. Jonathan Ozik and Dr. Valerie Hayot-Sasson at Argonne National Laboratory and the University of Chicago developed an Open Science Platform (OSPREY) for epidemic analysis using Globus services.”
Globus data management capabilities are also used by IceCube, the world’s largest neutrino detector. “The IceCube detector has enabled scientists, for the first time, to trace the origin of a ghostly subatomic particle, a neutrino, that traveled 3.7 billion light-years to Earth,” explains Ananthakrishnan. “IceCube uses Globus to archive its data. Users transfer data with Globus from Madison, Wisconsin, to storage locations at the National Energy Research Scientific Computing Center (NERSC) in California and at Deutsches Elektronen-Synchrotron (DESY) in Zeuthen, near Berlin.”
Other Globus users include the National Solar Observatory, which has built its data management and distribution platform on Globus capabilities, moving observational data from the DKIST telescope in Maui, Hawai’i, to its facility in Colorado and securely distributing processed data to its users. “Other notable projects that leveraged Globus capabilities include the Large Hadron Collider, where the Higgs boson was discovered, and the LIGO project, which proved the existence of gravitational waves,” says Ananthakrishnan. “There are numerous projects in the life sciences that use Globus for data management, including for protected data. For example, the Human BioMolecular Atlas Program (HuBMAP), an NIH-funded project, uses Globus for authentication and data sharing in pursuit of its goal to develop an open, global platform to map healthy cells in the human body.”
Improving Accessibility and Automation
Among Globus’ success stories is the Mesirov Lab at the University of California San Diego, which added Globus services to its GenePattern portal. Before integrating Globus, it was difficult to transfer large datasets to and from the cloud-hosted GenePattern server, and the technical barrier to using genomics analysis tools was high. “Now that Globus is integrated into the GenePattern portal, researchers can simply use a point-and-click interface to log in, access and transfer large datasets, perform their analysis, and share the results,” says Ananthakrishnan.
“Another project I am particularly proud of was our work to preserve 50 years of astronomy data after a hurricane that caused the collapse of the radio telescope in Puerto Rico,” says Ananthakrishnan. “Together with the University of Florida, the Engagement and Performance Operations Center (EPOC), the Arecibo Observatory, and the Cyberinfrastructure Center of Excellence (CICoE) Pilot, we were able to preserve this astronomy data by rapidly moving it to the Texas Advanced Computing Center’s Ranch system before it was lost.”
Globus also has ongoing projects in climate science and is part of the Earth System Grid Federation (ESGF), a Department of Energy (DOE)-funded initiative within an international collaboration, coordinated with the World Climate Research Program, that manages the publication and distribution of climate model data. This platform, used by researchers who contribute to the reports of the Intergovernmental Panel on Climate Change (IPCC), is being redesigned to use Globus capabilities for data management and automation. The goals are to keep up with growing data volumes and a growing user base, and to offer new access interfaces that enable novel uses of data in climate and impact science.
“Beyond data management, instrument facilities use the Globus API to trigger automated data and computation tasks from software on the acquisition machine,” continues Ananthakrishnan. “This has enabled end-to-end automation and supports near-real-time data processing, including cases where feedback is used to drive the next set of experiment runs. The Advanced Photon Source (APS) uses Globus to automate analysis of data as they are acquired. By integrating Globus with the software on the acquisition machine connected to the beamline, they detect when data are acquired and launch a Globus flow for analysis. The flow transfers the data to the Argonne Leadership Computing Facility and uses Globus Compute to execute a range of computing functions on an HPC system. These functions perform quality control, reconstruction, and machine learning model training and inference, to name just a few operations. The resulting reconstructions are stored in a catalog, and researchers can view them as the experiment progresses or search across the acquired data.”
Supported by an expert team with decades of research data management experience, Globus provides new users with a wealth of resources for getting started. “We have an excellent documentation site, which provides users with best practices, key resources, and more,” shares Ananthakrishnan. “Our discussion group is a great place to ask questions and get answers quickly, both from the Globus team and from our community of users.”
“We also host an annual user conference in Chicago in the spring with hands-on sessions, and we regularly conduct workshops and tutorials at institutions, like the one held last year at Princeton University, which was co-hosted by Edge, EPOC (whose co-PIs are at TACC and ESnet), and Globus. In addition, we have a YouTube channel, a monthly Globus Newsletter, and regularly scheduled office hours where you can ask our engineers your toughest questions.”
Keeping Pace with a Changing Landscape
Over the next five to ten years, Ananthakrishnan envisions further growth within the Globus community as the team continues to offer services and support that simplify data and compute management tasks to accelerate research and discovery. “The researchers we serve constantly push the boundaries of innovation, and we strive to build enhanced capabilities that will support their endeavors. With artificial intelligence (AI) and machine learning (ML) research, and its application across numerous domains, at the forefront, offering capabilities that support these efforts will continue to be an important area of focus for us as well. For example, we’re looking at data management in service of training machine learning models, which includes identifying new methods for organizing, sharing, and managing data. ‘AI/ML-ready data’ requires an automated way of gathering, annotating, publishing, and accessing such data.”
With advances in high-resolution imaging instruments such as cryogenic electron microscopes and synchrotron beamlines, instrument facilities will be another key focus for Globus. “These advancements will require automation of data flows to increase throughput and researcher productivity, as well as to ensure the instrument remains highly utilized,” explains Ananthakrishnan. “Self-driving labs that incorporate advances in AI/ML, and the linking of instrument facilities with compute facilities for experiment-time data analysis, are areas we are working on today. Several of these are seeing early adoption at national labs, such as our work at Argonne with the Advanced Photon Source (APS) and the Argonne Leadership Computing Facility (ALCF) to provide ‘experiment-time’ data analysis and complete automation of data pipelines. We envision bringing such capabilities to instrument and core facilities at campuses and independent research institutes.”
“An overarching theme across all these capabilities is compliance,” continues Ananthakrishnan. “Motivated by a wide variety of factors, ranging from broader cyber threats and security posture, to regulations and stricter security requirements from funding agencies, to researchers engaged in novel pursuits enabled by combining protected and open data, there has been an increase in cybersecurity and compliance requirements for services used by researchers. We expect to grow Globus’ investment in this area beyond our current offerings for standard data and for protected data, including Controlled Unclassified Information (CUI) and data subject to HIPAA. Some of this work will be driven by innovations in how institutions deploy common resources and dynamically configure and manage them for different security levels, which will create a need for new security capabilities.”
Breaking Down Barriers to Innovation
For researchers and institutions looking to improve their data management practices, Ananthakrishnan suggests examining their current processes and engaging with others in their community. “We have a vibrant community that is collaborative and open to sharing approaches and lessons learned, so engage and learn how other organizations are addressing and modernizing their infrastructure. Equally important is involving researchers, who are the primary stakeholders: listening to their needs and understanding how their plans are evolving is key to informing your own plans. Crafting a cyberinfrastructure plan that supports FAIR (findable, accessible, interoperable, reusable) data will be transformational for the science enterprise, in addition to meeting funding agency policies.”
By creating a frictionless data management system for researchers, Globus breaks down barriers and improves the overall user experience. With this streamlined workflow, researchers can spend more time on their work and less time figuring out how to move data to distributed resources and to collaborators at other institutions. “Supporting researchers who address a diverse array of challenges, many with significant societal implications, has been immensely fulfilling,” shares Ananthakrishnan. “I have the opportunity to learn from and assist passionate scientists working on many fronts: expanding our knowledge of the universe, developing innovative materials for batteries and body armor, studying viruses and creating vaccines, exploring climate change and its effects, decoding human creativity, and researching cures for rare diseases, among others. Knowing that our work plays a role in these significant endeavors is truly rewarding.”