Moving data day

How University of Michigan scientists can focus on science while safely transferring data using Globus

September 20, 2023  | U-M Medical School Blog

University of Michigan (U-M) data technology experts met with Globus team members to talk about data transfer, a topic of great interest to scientists, since most scientists today are also data scientists. They need not only to collect, curate and analyze data, but also to manage and move it between servers, often in very large quantities. Scientists are also responsible for the security of their data, which requires safe data management tools.

In a world where science is collaborative, it is crucial to move data securely and effectively between research partners in a distributed environment, within or across organizations. But the sheer size of most data sets is one of the many challenges of transferring data between servers. “Moving large amounts of data is plain painful,” said Ken Weiss, IT Project Senior Manager in the U-M Department of Computational Medicine and Bioinformatics (DCMB). “Given the amount of data we are moving these days, it can take days to weeks to months to transfer, and you need to be sure it is done as quickly as possible and with accuracy.”

This is why U-M subscribed to Globus, a research cyberinfrastructure developed and operated as a not-for-profit service by the University of Chicago. Globus offers a platform that transfers data quickly and securely and tracks each transfer. With Globus, scientists can select a data set and a destination, no matter how large the data set is or how far it has to travel. The data is transferred in a highly secure and reliable environment.

For example, DCMB recently welcomed Kin Fai Au, Professor of Computational Medicine and Bioinformatics, from Ohio State University (OSU). “We had to move over 500 terabytes of data from Au’s lab, and with Globus and the infrastructure at and between OSU and U-M, we were able to sustain transfer rates of over 20 TB per day,” said Weiss. “Without this service, it would have taken months to move this data over the wire, or we could have loaded a large box of tapes in the back of Dr. Au’s car!”
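The quoted figures can be sanity-checked with a little arithmetic. The sketch below takes the quoted 500 TB and 20 TB per day as round numbers (the actual volume and rate were both "over" these, so the real move differs somewhat) and assumes decimal units, 1 TB = 10^12 bytes. It shows how long the move takes at that rate and what average network speed 20 TB per day corresponds to.

```python
# Back-of-the-envelope check of the transfer figures quoted above.
# Assumption: decimal units (1 TB = 10**12 bytes) and round numbers
# (500 TB total, 20 TB/day sustained).

TB = 10**12                       # bytes per terabyte (decimal)
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_transfer(total_tb: float, tb_per_day: float) -> float:
    """Days needed to move total_tb at a sustained rate of tb_per_day."""
    return total_tb / tb_per_day


def sustained_gbps(tb_per_day: float) -> float:
    """Convert a TB/day rate into an average line rate in gigabits per second."""
    bits_per_day = tb_per_day * TB * 8
    return bits_per_day / SECONDS_PER_DAY / 10**9


if __name__ == "__main__":
    print(f"{days_to_transfer(500, 20):.0f} days")   # 500 TB at 20 TB/day
    print(f"{sustained_gbps(20):.2f} Gbit/s")        # average rate behind 20 TB/day
```

Running it prints 25 days and about 1.85 Gbit/s: a multi-gigabit link kept busy around the clock for weeks, which is exactly the kind of long-running job that benefits from a managed service.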

How does it work?

The platform is designed with the user in mind, and it takes only a few clicks to start a transfer from the web browser interface. Globus then does the work “in the background.” For example, suppose you have data at Stanford University that you want to move to U-M. You log in to Globus with your U-M credentials and choose where to put the data locally by selecting a Globus collection (an endpoint for accessing your data). You then open your Stanford data through a different Globus collection and enter your Stanford credentials. You select the files and/or folders at Stanford and start the transfer by clicking the “Start” button. That’s it! Globus brokers the transfer on your behalf, so you can log out of Globus, or even shut down your computer, and the transfer continues without involving you or your computer. When the transfer is complete, you receive an email notification.
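Behind the clicks, a transfer boils down to one request: a source collection, a destination collection, and a list of files or folders. The sketch below builds such a request as a plain Python dictionary. Its field names are modeled on the JSON document accepted by the Globus Transfer API's task-submission endpoint as we understand it, so check them against the official documentation before relying on them. The collection IDs, paths and the `build_transfer_request` helper are hypothetical, and a real submission also needs an access token and a submission ID, which the web app (or the official `globus-sdk` package) handles for you. Nothing here contacts Globus.

```python
import json

# Hypothetical placeholder collection IDs; real collections have UUIDs
# shown in the Globus web app.
STANFORD_COLLECTION = "00000000-0000-0000-0000-000000000001"  # hypothetical
UMICH_COLLECTION = "00000000-0000-0000-0000-000000000002"     # hypothetical


def build_transfer_request(source_id, dest_id, items, label):
    """Assemble a transfer request modeled on the Globus Transfer API body.

    items: list of (source_path, destination_path, is_directory) tuples.
    Field names are an assumption based on the public Transfer API docs.
    """
    return {
        "DATA_TYPE": "transfer",
        "source_endpoint": source_id,        # where the data lives (Stanford)
        "destination_endpoint": dest_id,     # where it should go (U-M)
        "label": label,                      # human-readable task name
        "notify_on_succeeded": True,         # email when the task completes
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src,
                "destination_path": dst,
                "recursive": is_dir,         # True copies a whole folder
            }
            for src, dst, is_dir in items
        ],
    }


if __name__ == "__main__":
    request = build_transfer_request(
        STANFORD_COLLECTION,
        UMICH_COLLECTION,
        [("/lab/project/", "/scratch/project/", True)],  # hypothetical paths
        label="Stanford to U-M",
    )
    print(json.dumps(request, indent=2))
```

Because the request names both collections, the Globus service, not your laptop, moves the bytes. That is why you can close the browser as soon as the task has been accepted.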

A Linux command-line version is also available for those who want to use Globus services from a terminal or in shell scripts. Globus keeps upgrading its service and interface and is currently customizing its platform for U-M users, so that our scientists can spend even less time on transfer technology and more on science.
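For scripted use, a transfer can be started from a program as well as from the browser. The sketch below assumes the Globus CLI is installed (it is distributed as the `globus-cli` Python package) and that you have already run `globus login`. It builds the argument list for `globus transfer SOURCE_ID:PATH DEST_ID:PATH` with the `--label` and `--recursive` options. The helper names are hypothetical, and the exact options should be checked with `globus transfer --help` for your installed version.

```python
import shlex
import subprocess


def globus_transfer_argv(source_id, source_path, dest_id, dest_path,
                         label, recursive=True):
    """Build an argv list for the Globus CLI's `globus transfer` command.

    Hypothetical helper; assumes the CLI's SOURCE_ID:PATH DEST_ID:PATH syntax.
    """
    argv = [
        "globus", "transfer",
        f"{source_id}:{source_path}",
        f"{dest_id}:{dest_path}",
        "--label", label,
    ]
    if recursive:
        argv.append("--recursive")   # copy a whole directory tree
    return argv


def run_transfer(argv):
    """Submit the transfer; needs the CLI installed and `globus login` done."""
    return subprocess.run(argv, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    argv = globus_transfer_argv("SRC-COLLECTION-ID", "/lab/project/",
                                "DST-COLLECTION-ID", "/scratch/project/",
                                label="Stanford to U-M")
    print(shlex.join(argv))          # show the command without running it
```

Passing a list rather than one shell string to `subprocess.run` avoids quoting problems when paths or labels contain spaces.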

Another benefit of Globus is its ability to handle interruptions and still complete the transfer. If, for example, a network hardware failure occurs or a disk fills up, Globus retries the transfer every 5 minutes for up to 1 week before timing out. Once the network issue is resolved or more space is freed up on the storage, Globus continues right where it left off, whereas other transfer protocols would make you start over.
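The retry window and the "continue where it left off" behavior can be illustrated with a toy model. Only the 5-minute interval and the 1-week limit come from the text. Everything else is a hypothetical simplification and not how Globus is implemented internally: the `FlakyWriter` destination, the chunk size, and using the destination's current size as the resume checkpoint. The sketch also does not actually sleep between attempts.

```python
import io

RETRY_INTERVAL_MIN = 5            # retry every 5 minutes (from the article)
TIMEOUT_MIN = 7 * 24 * 60         # give up after 1 week (from the article)


def max_retries(interval_min=RETRY_INTERVAL_MIN, timeout_min=TIMEOUT_MIN):
    """Number of retry attempts that fit inside the timeout window."""
    return timeout_min // interval_min


class FlakyWriter:
    """Hypothetical destination that fails once, before exceeding fail_after bytes."""

    def __init__(self, fail_after):
        self.buf = io.BytesIO()
        self.fail_after = fail_after

    def write(self, chunk):
        if self.fail_after is not None and self.size() + len(chunk) > self.fail_after:
            self.fail_after = None    # the outage is "fixed" before the next attempt
            raise OSError("simulated network failure")
        self.buf.write(chunk)

    def size(self):
        return self.buf.tell()        # bytes safely delivered so far


def resumable_copy(src: bytes, dest: FlakyWriter, chunk_size=4):
    """Copy src to dest, resuming from dest's current size after each failure.

    Returns the number of attempts it took.
    """
    attempts = 0
    while True:
        attempts += 1
        if attempts > max_retries():
            raise TimeoutError("gave up after the retry window")
        offset = dest.size()          # checkpoint: skip what already arrived
        try:
            for start in range(offset, len(src), chunk_size):
                dest.write(src[start:start + chunk_size])
            return attempts
        except OSError:
            continue                  # a real service would wait RETRY_INTERVAL_MIN here


if __name__ == "__main__":
    data = b"0123456789abcdef"
    dest = FlakyWriter(fail_after=10)
    print(max_retries(), resumable_copy(data, dest), dest.buf.getvalue() == data)
```

A 5-minute interval over one week allows 2,016 attempts. In the simulated outage, the second attempt re-sends only the bytes after the checkpoint instead of the whole file, which is the difference the paragraph above describes.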

Cost and subscription

There is no cost to individual U-M scientists for Globus, since the university pays the subscription fee. The recipient, however, does need to either create a Globus ID (which is free) or be a user at an institution that is partnered with Globus.

Beyond moving data

Globus is always improving and adding to its offerings; it is more than “just a data-moving” service. Currently, Globus provides the ability to create workflows, “Globus Flows,” to automate repetitive tasks. It also lets you prepare and submit compute jobs through “Globus Compute,” a distributed Function as a Service (FaaS) platform that enables reliable, scalable, high-performance function execution on remote clusters, including the Great Lakes cluster at U-M. Note that Globus does not store data.
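The FaaS idea is that you hand a function and its inputs to an executor and collect the results later, without logging in to the cluster yourself. As we understand the Globus Compute SDK documentation, its `Executor` follows Python's standard `concurrent.futures.Executor` interface. The sketch below therefore uses a local `ThreadPoolExecutor` as a stand-in so it runs anywhere. The `count_gc` function is a hypothetical example workload, and the commented Globus Compute lines are an assumption to verify against the official docs, including how an endpoint on a cluster such as Great Lakes is set up.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def count_gc(sequence: str) -> float:
    """Hypothetical example workload: GC fraction of a DNA sequence."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def run_remote(executor: Executor, func, inputs):
    """Submit one call per input, then collect the results in input order."""
    futures = [executor.submit(func, x) for x in inputs]
    return [f.result() for f in futures]


if __name__ == "__main__":
    # Local stand-in for a remote FaaS endpoint. With Globus Compute the
    # pattern would (per its docs, to be verified) look something like:
    #   from globus_compute_sdk import Executor as ComputeExecutor
    #   with ComputeExecutor(endpoint_id="<your-endpoint-uuid>") as ex: ...
    with ThreadPoolExecutor(max_workers=4) as ex:
        print(run_remote(ex, count_gc, ["GATTACA", "GGCC", "ATAT"]))
```

Writing `run_remote` against the generic `Executor` type keeps the science code unchanged whether the functions run on a laptop thread pool or on a remote cluster.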

“If you can use a cell phone, you can use Globus.”

–Ken Weiss, University of Michigan
