Building Hosted Services for Scientists: 4 Important Lessons Learned

October 31, 2011   |  Ian Foster

In our work with many hundreds of researchers who work in smaller labs, we’ve learned a few things about what is likely to be adopted and what is likely to go nowhere. For these scientists, who represent the majority of researchers working today, cyberinfrastructure can’t be delivered by providing software to be installed in a lab: they too often lack the local infrastructure and expertise.

Instead, we need to follow the lead of the commercial world and deliver required capabilities via software-as-a-service (SaaS, aka "cloud"). Small businesses don't run their own information technology (IT): they outsource email, accounting, customer relationship management, Web hosting, payroll, etc., to third-party SaaS providers. Similarly, we should aspire to identify and then outsource key functions required for effective research in all areas of science.

We have been experimenting with such an approach in Globus Online, which in its first instantiation provides data movement services via SaaS methods. In the year or so since we launched this service, thousands of users have moved hundreds of millions of files, and we’ve been fortunate to glean lots of useful feedback, from smaller-lab scientists in particular. I share four of the most useful (and in some cases surprising) lessons below:

  1. Scientists love Web 2.0 interfaces. Modern IT practice emphasizes the use of simple (Web, REST, command line) interfaces to sophisticated hosted services. Web interfaces are good for occasional users; REST interfaces are good for integration with other tools; and command line interfaces are good for scripting. That's the theory, and we've found that it's also true in practice. Most scientists are accustomed to such intuitive interfaces in their daily lives (outside the lab), so they are delighted when similar interfaces are available for scientific services.
  2. Facilities like SaaS services too. This was a surprise to us, although perhaps it should not have been. Operators of computational and experimental facilities are under pressure from their users for more modern (e.g., Web 2.0) interfaces for various services. They like the fact that a third party will provide these services and in the process improve user experience and reduce support demands on the facility.
  3. SaaS can greatly improve user experience. When users receive software via tarballs or other packaging systems for local installation, they often encounter problems such as installation issues, inconsistencies with local configurations, out-of-date software, slow response to bug reports, and lack of local expertise for optimization and problem determination. We find that the claimed benefits of SaaS in these areas are indeed realized in practice. With SaaS, local software installation and configuration problems disappear. Software can be updated in hours rather than weeks or months. Proactive monitoring can detect and respond to problems, sometimes before users notice them, which really gets their attention!
  4. Hosting services on commercial cloud providers can be more reliable and cheaper than private hosting. We started by hosting our services on a mix of "public cloud" (Amazon Web Services) and private systems. Our private systems are well run, but we've found that the public cloud is far more reliable. Not only are public cloud providers highly motivated to be reliable, but we can also replicate state and key services (and do so in different geographical regions) and dynamically scale resources in response to changing load. And the associated costs are small: running a service like Globus Online doesn't require a lot of computing resources. Thus we've moved entirely to public cloud, with good results.

We’re looking forward to continuing to build in Globus Online a hosted set of services that researchers can rely on to get their work done. To that end, we welcome all the feedback we can get – either posted as a comment here or sent to