Data Transfer Tools: GridFTP using Globus Online
This article describes how to transfer data using Globus Online. It covers:
- An introduction to Globus Online
- What you need
- Selecting endpoints
- Performing a transfer using the Globus web interface
- Performing a transfer using the Globus command-line interface
Globus Online is a web-based service for managing data transfers efficiently. After some one-time setup tasks, you can initiate high-performance data transfers that run non-interactively, restart automatically if there is a problem, and can verify the integrity of the transferred data once complete. Please read the Globus "Getting Started" guide before reading further.
You can use a Globus endpoint on the JASMIN GridFTP server and transfer files to and from it either with the online tool provided by Globus or with the Globus command-line interface (CLI).
This guide will show you how to set up Globus endpoints using your CEDA account, and move data between those endpoints.
What you need
In order to proceed further, you will need to set up the following items:
- At the JASMIN end:
  - A CEDA account to which you have uploaded a public key
  - A login account
  - Access to the high-performance data transfer servers
  - Currently, an additional manual step must be performed by the system team to grant you access to the GridFTP server. We hope to automate this process in the future.
- At Globus Online:
  - A Globus Online account
- At the remote end:
  - Details of the remote GridFTP server you need to connect to
JASMIN GridFTP server
Use the JASMIN GridFTP server endpoint.
You can also locate this endpoint in Globus Online by going to Manage Endpoints, selecting "search all" and entering ceda#jasmin-xfer. When you activate the endpoint, you will be prompted to enter your CEDA web account credentials (the same as those you use for MyCEDA).
Endpoint at the UK Research Data Facility (RDF)
Use the ARCHER RDF endpoint.
Locate this endpoint in Globus Online by going to Manage Endpoints, selecting "search all" and entering archer#rdf. When you activate the endpoint, you will be redirected to a web interface provided by RDF, where you will need to enter your RDF credentials.
Endpoint at another institution
Consult the documentation for the remote GridFTP server you wish to connect to, or contact the helpdesk of the host institution, for details of how to use an existing endpoint there or configure your own.
Endpoint on your local machine
Globus Connect Personal can be used to set up an endpoint on your desktop or laptop. Because the "last mile" of the network path to such a machine is unlikely to be suitable for high-performance transfers, it is not recommended for large-scale use, but it can be useful for testing or for smaller transfers.
TOP TIP: There are instructions for installing Globus Connect Personal at the command line on a Linux machine, as a regular (non-root) user and behind a firewall. These allow most remote users of JASMIN to set up a personal Globus endpoint, at least to test transfers to the JASMIN GridFTP server and to see the benefits of managed transfers with Globus, which usually bring a significant performance gain. For this to work, you will need high-performance data transfer access on JASMIN.
Perform a transfer using the Globus Online web interface
- Go to https://www.globus.org/app/transfer or select "Transfer Files" from the menu
- Log in with your Globus Online account credentials
- You are presented with a graphical file transfer interface
- Enter the name of one endpoint in the "Endpoint" box of the left hand window
- Enter the name of the other endpoint in the "Endpoint" box of the right hand window
- Navigate to the files / directories you wish to transfer and click to select them
- Click the appropriate direction arrow to submit the transfer task
- Click "Activity" to monitor the status of the transfer (optional)
- You should receive an email (to the email address associated with your Globus Online account profile) when the task has completed.
Perform a transfer using the Globus Online command-line interface
This process requires that you have uploaded a public key to your Globus Online account profile. To do this, go to "Account" then "Identities", then "manage SSH and X.509 keys". Upload the PUBLIC part of your SSH key.
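Since only the public half of the key pair should ever leave your machine, a quick local check before uploading can avoid a costly mistake. The following is a minimal Python sketch, assuming an OpenSSH-format key pair; the function name read_public_key and the example path ~/.ssh/id_rsa.pub are illustrative assumptions, not part of Globus Online.

```python
from pathlib import Path

# Key-type prefixes that begin a single-line OpenSSH public key.
PUBLIC_KEY_PREFIXES = ("ssh-rsa ", "ssh-ed25519 ", "ssh-dss ", "ecdsa-sha2-")


def read_public_key(path: str) -> str:
    """Return the text of an OpenSSH public key file, refusing private keys.

    Hypothetical helper: it only inspects the file locally and uploads nothing.
    """
    text = Path(path).expanduser().read_text().strip()
    # Private key files contain a "-----BEGIN ... PRIVATE KEY-----" header.
    if "PRIVATE KEY" in text:
        raise ValueError(f"{path} looks like a PRIVATE key; use the matching .pub file")
    # A public key line starts with its key type, e.g. "ssh-ed25519 AAAA...".
    if not text.startswith(PUBLIC_KEY_PREFIXES):
        raise ValueError(f"{path} does not look like an OpenSSH public key")
    return text
```

For example, read_public_key("~/.ssh/id_rsa.pub") returns the single line to paste into the key upload form, and raises ValueError if it is accidentally pointed at the private key file.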
- Log in to the Globus Online command-line interface
myhost$ ssh username@cli.globusonline.org
Here, username is your Globus Online account username. This can be done from your own laptop or desktop; it does not have to be done from within JASMIN. If your SSH key is loaded in your authentication agent (e.g. ssh-agent or Mac Keychain), you should not be prompted for a passphrase.
- Initiate a transfer. Depending on their configuration, some endpoints activate automatically, while others must be activated before they can be used. You can activate them either with the endpoint-activate command or in a browser via your Globus Online account; activation takes effect immediately.
- Example 1: transfer the test file 1G.dat from one of ESnet's test data transfer nodes to the user's home directory on JASMIN.
globus$ transfer --label 'example 1' -- esnet#anl-diskpt1/data1/1G.dat ceda#jasmin-xfer/~/1G.dat
In this example, a test file 1G.dat is copied from one of ESnet's test data transfer nodes at Argonne National Laboratory (ANL) to the user's home directory (~/) on JASMIN. For larger files, you may need to specify a path within a JASMIN or CEMS group workspace where you have sufficient space available.
To monitor transfer tasks initiated via the command-line interface, open the "Activity" section of your Globus Online account in a browser. The label, in this case "example 1", can be used to locate the transfer task there.
- Example 2: recursively (-r) synchronise the contents of a directory, first applying a file-size check (-s 1) to select which files to transfer. For further options, log in to cli.globusonline.org and see the man page for the transfer command.
myhost$ ssh username@cli.globusonline.org "transfer -s 1 --label 'example 2' -- ceda#jasmin-xfer/~/source/ archer#rdf/~/source/ -r"
In this example, the whole command is issued from the end-user machine, without first logging in interactively to cli.globusonline.org. You could therefore, for example, run it as an automated task on your local machine to synchronise the contents of the two directories periodically.
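Example 2 can be run unattended from your own machine. The following is a minimal Python sketch of that idea, assuming cli.globusonline.org accepts the same command string over ssh as shown in Example 2. It builds the ssh command from the flags used above (-s 1, --label, and -r placed after the paths as in Example 2) and, by default, only prints it; passing dry_run=False executes it, for instance from a scheduled job. The function names, the username and the paths are hypothetical. The helper files_needing_transfer only illustrates the idea behind a file-size check such as -s 1 (select files that are missing at the destination or whose size differs); the real comparison is done by the Globus service, not by this code.

```python
import shlex
import subprocess

CLI_HOST = "cli.globusonline.org"  # host used in the examples above


def files_needing_transfer(source: dict, dest: dict) -> list:
    """Illustrate a size-based sync check: given {name: size} maps for both
    ends, select files missing at the destination or with a different size."""
    return sorted(name for name, size in source.items() if dest.get(name) != size)


def build_sync_command(username: str, src: str, dst: str, label: str,
                       sync_level: int = 1, recursive: bool = True) -> list:
    """Build an argv list that runs 'transfer' remotely over ssh (not executed).

    Assumption: the remote CLI parses the command string as in Example 2.
    """
    remote = ["transfer", "-s", str(sync_level), "--label", label, "--", src, dst]
    if recursive:
        remote.append("-r")  # placed after the paths, as in Example 2
    # ssh passes the command as one string, so quote each word for the remote side.
    return ["ssh", f"{username}@{CLI_HOST}", shlex.join(remote)]


def run_sync(argv: list, dry_run: bool = True) -> int:
    """Print the shell-quoted command (dry run) or execute it and return its exit code."""
    if dry_run:
        print(shlex.join(argv))
        return 0
    return subprocess.run(argv, check=False).returncode
```

For example, run_sync(build_sync_command("username", "ceda#jasmin-xfer/~/source/", "archer#rdf/~/source/", "example 2")) prints a shell-quoted equivalent of the Example 2 command without contacting any server. Keeping the dry run as the default makes it easy to check the command before scheduling it.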
Further options for the endpoint-activate commands are available in the Globus documentation or by viewing the man pages for these commands at