Access to storage
This article provides information about JASMIN storage. It covers:
- Home directory
- JASMIN disk mounts
- Where to write data
- Access to the CEDA archive
- Tape access
Home directory
Every JASMIN user is allocated a HOME directory located at /home/users/<user_id>. This directory is available across most of the interactive and batch computing resources, including the JASMIN login and transfer servers.
Each home directory has a default quota of 10 GB. You may exceed this limit for a very brief period, but if your usage stays above it you will be unable to add any more files or run jobs until you reduce your usage.
You can check your current usage and quota using the pan_quota command, for example:
$ pan_quota
<GB> <soft> <hard> : <files> <soft> <hard> : <path to volume> <pan_identity(name)>
2.58 8.00 19.00 : 31225 unlimited unlimited : /home/users/astephen uid:29775(astephen)
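If you want a script to flag when you are over the soft limit, the usage line can be parsed with awk. The following is a minimal sketch that assumes pan_quota output keeps the column layout shown in the example above; quota_status is a hypothetical helper name, not part of pan_quota.

```shell
# quota_status: read pan_quota-style output on stdin and report, for each
# data line, whether the space used (column 1, GB) exceeds the soft quota
# (column 2, GB). Assumes the column layout shown in the example above.
quota_status() {
    awk '
        # Data lines start with a number and have ":" as the fourth field.
        $1 ~ /^[0-9.]+$/ && $4 == ":" {
            if ($2 !~ /^[0-9.]+$/)      state = "ok (no soft limit)"
            else if ($1 + 0 > $2 + 0)   state = "OVER soft quota"
            else                        state = "ok"
            printf "%s: %s of %s GB used\n", state, $1, $2
        }
    '
}

# On JASMIN this would be used as:  pan_quota | quota_status
```

Header lines and the command prompt are ignored because their first field is not a number.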
Backups of your home directory
There is a daily incremental and weekly full backup of your home directory. Your home directory is the ONLY storage which is automatically backed up.
Additionally, "snapshots" provide a quick, user-accessible method for you to restore files or directories that have been accidentally deleted.
Recovering snapshots within your home directory
Users can access snapshots to recover files or directories that have been accidentally deleted. Running ls .snapshot in your home directory lists <date>.home_users directories containing files as they were on that date. Files can then be copied back from one of these directories to their original location. Note that the snapshot directory does not appear in a normal ls listing, so you must refer to it explicitly by name:
[user@hostname ~]$ ls .snapshot
2016.02.04.23.31.01.home_users  2016.02.08.23.31.02.home_users
2016.02.05.23.31.09.home_users  2016.02.09.23.31.02.home_users
2016.02.06.23.31.02.home_users  2016.02.10.23.31.14.home_users
2016.02.07.23.31.03.home_users
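Copying a file back from a snapshot can be wrapped in a small function. This is a sketch only: restore_from_snapshot is a hypothetical helper, notes/plan.txt is an illustrative file name, and the code assumes that each snapshot directory mirrors the layout of your home directory as it was on that date.

```shell
# restore_from_snapshot SNAPSHOT FILE
# Copy FILE (a path relative to your home directory) out of the named
# snapshot and back to its original location. Assumes each snapshot
# directory mirrors the layout of the home directory it was taken from.
restore_from_snapshot() (
    src="$HOME/.snapshot/$1/$2"
    dest="$HOME/$2"
    if [ ! -e "$src" ]; then
        echo "not in snapshot: $src" >&2
        exit 1
    fi
    if [ -e "$dest" ]; then
        # Never overwrite a newer copy silently.
        echo "refusing to overwrite existing $dest" >&2
        exit 1
    fi
    # Recreate the parent directory if it was deleted too, then copy
    # the file (or directory tree), preserving timestamps and modes.
    mkdir -p "$(dirname "$dest")" && cp -pR "$src" "$dest"
)

# Example: restore_from_snapshot 2016.02.10.23.31.14.home_users notes/plan.txt
```

The function refuses to overwrite a file that still exists, so a newer version is never replaced by an older snapshot copy by accident.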
Home directories should not be used for storing large amounts of data. See below for guidance on where to write your data.
JASMIN disk mounts
There is a common file system layout that underpins most of the JASMIN infrastructure. However, access to different parts of the file system depends on where you are logged in. Table 1 outlines the key disk mounts, where they are accessible from and the type of access (read and/or write).
Table 1. List of common disk mounts and their availability on JASMIN, where:
login = login servers: jasmin-login1.ceda.ac.uk, cems-login1.cems.ac.uk, comm-login1.cems.rl.ac.uk
sci = scientific analysis servers: jasmin-sci[1-3].ceda.ac.uk, cems-sci[1-2].cems.rl.ac.uk
transfer = data transfer servers: jasmin-xfer.ceda.ac.uk, cems-xfer1.cems.rl.ac.uk
LOTUS = LOTUS batch processing cluster (all cluster nodes)
Disks are mounted read/write ("R/W") or read-only ("RO").
| Disk mount | login | sci | transfer | LOTUS |
| /badc, /neodc (archives) | No | RO | RO | RO |
Where to write data
As indicated in Table 1, there are three main disk mounts where data can be written. Please follow these general principles when deciding where to write your data:
- HOME directories (/home/users) are very small (10 GB) and should NOT be used for storing large data volumes.
- Group Workspaces (/group_workspaces/*/<project>) are usually the correct place to write your data. Please refer to the Group Workspace documentation for details. Please note that Group Workspaces are NOT backed up.
- The "scratch" area (/work/scratch) is available as temporary file space for jobs running on LOTUS (see below).
- The /tmp directory is not an appropriate location to write your data (see below).
The /work/scratch directory is a temporary file space that is shared across the entire LOTUS cluster and the scientific analysis servers. This directory uses the Panasas high-speed parallel file system.
The "scratch" space is ideal for processes that generate intermediate data files that are consumed by other parts of the processing before being deleted. There is only 4 TB of "scratch" disk (for all users), so it is not suitable for temporary storage of very large data volumes. Any data that you wish to keep should be written to a Group Workspace.
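Because the scratch area is shared by all users, it can be worth checking how much space is left before writing large intermediate files. The sketch below uses the standard df utility; free_gb is a hypothetical helper name, the 50 GB figure is purely illustrative, and it is an assumption that df reports a meaningful figure for this file system.

```shell
# free_gb DIR: print the free space, in whole GB, on the file system
# holding DIR. Uses POSIX-format df output (-P) in 1 KB blocks (-k),
# where column 4 of the second line is the available space.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Example: warn before a step that needs roughly 50 GB of intermediate
# space (an illustrative figure) when the shared area is too full.
# [ "$(free_gb /work/scratch)" -ge 50 ] || echo "scratch is nearly full" >&2
```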
When using the "scratch" space, please create a sub-directory (e.g. /work/scratch/<user_id>) and write your data there.
In contrast to the "scratch" space, the /tmp directories are all local directories, one per cluster node (or interactive server). These can be used to store small volumes of temporary data for a job, provided the data only needs to be read by the local process.
Cleaning up the scratch space
Please make sure that your jobs delete any files under the /work/scratch directories when they complete (especially if they have not completed normally!).
Data in the /work/scratch directories is temporary and may be removed at any point once your job has finished running. Do not use these directories to store important output for any significant length of time. Any important data should be written to a Group Workspace so that you do not lose it.
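One way to make sure scratch files are removed even when a job fails is a shell trap. The following is a minimal sketch rather than an official JASMIN recipe: run_with_scratch, SCRATCH_ROOT and JOB_SCRATCH are hypothetical names introduced here, and ./process.sh stands in for your own job.

```shell
# Shared scratch area named in the text; override SCRATCH_ROOT to try
# the function somewhere else.
SCRATCH_ROOT="${SCRATCH_ROOT:-/work/scratch}"

# run_with_scratch CMD [ARGS...]
# Run CMD with JOB_SCRATCH set to a fresh per-job directory under
# $SCRATCH_ROOT/<user_id>, then remove that directory whether or not
# CMD succeeded. The exit status of CMD is passed through.
run_with_scratch() (
    user_dir="$SCRATCH_ROOT/$(id -un)"
    mkdir -p "$user_dir" || exit 1
    JOB_SCRATCH=$(mktemp -d "$user_dir/job.XXXXXX") || exit 1
    export JOB_SCRATCH
    # Clean up on any exit from this subshell, including after errors.
    trap 'rm -rf "$JOB_SCRATCH"' EXIT
    # Turn common termination signals into a normal exit so that the
    # EXIT trap above still runs.
    trap 'exit 1' HUP INT TERM
    "$@"
)

# Example: run_with_scratch ./process.sh
# Inside process.sh, write intermediate files under "$JOB_SCRATCH" and
# copy anything worth keeping to a Group Workspace before it returns.
```

A job that is killed outright with SIGKILL cannot run the trap, so it is still worth checking your scratch sub-directory after abnormal terminations.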
Access to the CEDA archive
The CEDA archive is mounted read-only under /badc (British Atmospheric Data Centre) and /neodc (NERC Earth Observation Data Centre). The archive includes a range of data sets that are provided under varying licences. Access to these data sets is managed through standard Unix groups. Information about the data and their access restrictions is available from the CEDA Data Catalogue. As a JASMIN user, it is your responsibility to ensure that you have the correct permissions to access data in the CEDA archive.
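Since access is controlled by Unix group membership, a quick check from the shell can show whether a given archive path is readable from your account and which group owns it. This is a sketch using standard tools only; check_archive_access is a hypothetical helper, and /badc/<dataset> is a placeholder for a real data set path.

```shell
# check_archive_access PATH
# Print the Unix group that owns PATH and whether the current user can
# read it. Returns 0 if PATH is readable, 1 otherwise.
check_archive_access() (
    path=$1
    if [ ! -e "$path" ]; then
        echo "missing: $path"
        exit 1
    fi
    # Field 4 of "ls -ld" output is the owning group.
    group=$(ls -ld "$path" | awk '{ print $4 }')
    if [ -r "$path" ]; then
        echo "readable: $path (group $group)"
    else
        echo "no access: $path (group $group); your groups: $(id -Gn)"
        exit 1
    fi
)

# Example (placeholder path): check_archive_access /badc/<dataset>
```

If the owning group is not among your own groups, the CEDA Data Catalogue entry for the data set is the place to find out how to apply for access.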
Tape access
Group Workspace managers also have access to a tape library (the Elastic Tape service) for making secondary copies of data and for moving data between online and near-line storage.