MOOSE (the MASS client) User Guide


External Users' Version  

(experimental - links currently go back to the version on http://collab.metoffice.gov.uk)

This edition valid for MOOSE Release 6.4.4.

(To find the current version of MOOSE that you're using, see: si: displays information about the MOOSE hardware & software)

Table of Contents


Introduction

The Managed Archive Storage System (MASS) provides storage and restore services for large volumes of Met Office data. This service aims to provide the most cost-effective way of meeting the projected storage volumes and store/restore throughput. The largest demand for high-volume storage and throughput comes from atmosphere and ocean numerical models run on the supercomputer. MASS is largely aimed at meeting this demand.

The storage model appropriate for numerical model data is, however, also applicable to other data and MASS can provide storage/restore services for a range of Met Office data. If you have an archiving requirement then it is worth reading the first major section ( Common Tasks) of this document, and then discussing your archiving need with the Storage Team.

The services provided by MASS are relatively straightforward. They include actions like storing and restoring data, removing data, and some simple facilities to help find data. The most cost effective way of storing large volumes of data involves a significant proportion of tape storage. Storage on tape, however, has an impact on restore performance. On restore there is an overhead in accessing data on tape, and possible contention for tape drives. The impact of this can be reduced through structuring the way data is written to tape. This adds some constraints and complexity to the storage system and the way its services are used.

The Met Office Operational Storage Environment (MOOSE) provides an interface to a storage system, and is responsible for enabling the structuring and management of Met Office data. It also provides some resource management to help ensure fair distribution of throughput and storage among users. All users of MASS will use the interface provided by MOOSE. The underlying storage system is High Performance Storage System ( HPSS). Some information on the Met Office configuration of HPSS and the hardware for MOOSE can be found in the MASS Hardware documentation.

This document describes the interface provided by MOOSE. It is divided into two major sections. The first major section ( Common Tasks) outlines how to perform the most common storage tasks. The second major section (The MOOSE command line interface) gives a more detailed reference guide to both the way data is stored in MASS and the MASS commands.

Document conventions

This document adopts the following conventions:

  1. Terminal input and output is given as fixed width font.
  2. Terminal input that should be substituted if you try the examples is given as <change.me>.
  3. Key ideas in the MOOSE interface are given in bold. Most of these terms are defined for reference in Key Terms and Concepts.
  4. In The MOOSE command line interface optional arguments to commands are given as [optional-argument].
  5. In this version of the document editorial notes are written Note:. These will be removed from the final version.

Common Tasks

This section gives a set of examples to illustrate common uses of MOOSE. It also introduces some of the key ideas used in the interface to help optimise performance and manage resources.

You are welcome to try the examples in this section. If you do, please use a few (<10) small files (<10 Mbytes) to avoid putting unnecessary load on the system. For reasons that should become clear, you should also delete the files when you have finished the examples.

Getting a MOOSE Account (External users)

MOOSE accounts for Collaborating users may be obtained by contacting the Met Office - see http://collab.metoffice.gov.uk/twiki/bin/view/Support/ExternalAccessToMASS for details.

External users can change their password to a randomly generated one. This is done by invoking 'moo passwd -r'. The encrypted version of the MOOSE credentials file will then be automatically replaced with one containing the new credentials.

NOTE: Users external to the Met Office are not permitted to store data.

Retrieving data

To retrieve a file from MOOSE use moo get:

$ moo get -v \    
moose:adhoc/users/<user.name>/examples/example-file \    
copy_of_moose_file 
### get, command-id=9780, estimated-cost=13byte(s), files=1, media=1
### task-id=0, estimated-cost=13byte(s), resource=moose:/adhoc/users/<user.name>/examples/example-file
### task-id=0, transferred=13byte(s)
T   moose:/adhoc/users/<user.name>/examples/example-file -> /var/tmp/copy_of_moose_file
			

Unlike moo put, moo get is synchronous. It will only return a success return code when it has completed and the file is on the local disk. This means any calling scripts can rely on the file being readable when the command completes.
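Because of this, a calling script can test the return code and use the file immediately. A minimal sketch (the file and path names are illustrative):

if moo get moose:adhoc/users/<user.name>/examples/example-file local_copy
then
    # safe: the file is guaranteed to be complete on local disk here
    wc -c local_copy
fi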

Retrievals that you run soon after a file has been stored into MASS are often relatively fast. This is because the MASS storage is made up of a disk cache and the tape storage. When data is first stored to MASS it goes into the disk cache. This cache has been sized so that any restores from MASS will typically come from disk for the first 30 days after the data was stored. If you restore the data within this period then you will get good performance. After about the 30 day period the only copy of data will be on tape and so retrieval performance will be worse. It is worth noting that the data is copied from disk to tape within one day of being stored to MASS, and that MASS always makes two tape copies. This provides protection against disk loss, or even loss of one of the tape copies.

You can also retrieve multiple files at once using wild cards:

$ moo get -v moose:adhoc/users/<user.name>/examples/*.txt .
### get, command-id=48, estimated-cost=10byte(s), files=2, media=1
### task-id=0, estimated-cost=5byte(s), resource=moose:adhoc/users/<user.name>/examples/file1.txt
### task-id=1, estimated-cost=5byte(s), resource=moose:adhoc/users/<user.name>/examples/file2.txt
### task-id=0, transferred=5byte(s)
T   moose:adhoc/users/<user.name>/examples/file1.txt -> /tmp/<user.name>/file1.txt
### task-id=1, transferred=5byte(s)
T   moose:adhoc/users/<user.name>/examples/file2.txt -> /tmp/<user.name>/file2.txt
			

When files are only available from tape, a wild card retrieval could result in multiple tape mounts for a single command, using an unfair proportion of resources. To help prevent this, wild cards may only be used within the lowest directory level on retrieval.
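For example (the paths are illustrative), a wild card in the final, file-name component is accepted, while one higher up the hierarchy is not:

moo get moose:adhoc/users/<user.name>/examples/*.txt .     (accepted)
moo get moose:adhoc/users/*/examples/file1.txt .           (rejected)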

The moo get command helps you to avoid putting in overly-large retrieval requests by reporting the cost of the command in terms of the total volume, the number of individual files and the number of tapes involved.

Deleting Data

NOTE: Users external to the Met Office are not permitted to delete any data.

Better performance for project data

As noted in the introduction, when using tape storage there are several time consuming steps that mean accessing data on tape is much slower than accessing data from disk. First there must be a free tape drive, then the tape must be found in the library, mounted, and wound to the correct position before data can be read or written. The impact of these time consuming steps can be minimised by avoiding over use of drives and maximising the amount of data accessed per tape mount. When data has access patterns that can be anticipated then these can be used to maximise the amount of data accessed per tape mount. For instance if data is likely to be read together it should be stored on the same tape.

No attempt is made to organise how data stored in adhoc/users is laid out on tape. This can give very poor restore performance when restoring files which were archived on different days. There is, however, another area for unstructured data that can give much higher performance. Use of this area is generally reserved for Met Office systems or projects. Each project has a set of tapes dedicated to it (or shared with a small number of other projects). The aim is to ensure that the data from the project is only spread over a small number of tapes, reducing the number of tape mounts needed on restore. This data is stored below moose:/adhoc/projects. The interface to this data is identical to the interface to moose:/adhoc/users.

Reserving a set of tapes for a project is an optimisation strategy that works well in some cases. There are, however, several limitations which mean this strategy is not applicable to all classes of Met Office data. First, it relies on advance planning to set up the directory and allocate it to a set of tapes (users cannot create these directories themselves). Secondly, there is no ability to partition the data further based on access patterns. If the data volumes are low this probably isn't a problem but if the data volumes become too high, then this could result in poor retrieval performance. Thirdly, it can result in wasted space on tapes (or excessive reorganisation) if the data has differing retention periods.

Using MASS for application structured data

For research data, such as that produced from the UM and SCS, the optimisation strategy used in moose:/adhoc/projects is inappropriate. There are a large number of simultaneous runs, some of which are of very unpredictable length, and not all data from the same run has the same access patterns. As an example of the latter, restart dumps have very different restore and delete patterns from those of model diagnostic fields. As this data is from an application, however, it has structure imposed on it by the application and its typical use. For instance in a climate research run monthly mean data is typically stored as one file for each month of the run. Analysis of the run will usually involve reading a series of the monthly mean files. This production and usage is reflected by putting all the monthly mean files in the same directory.

The optimisation strategy used by MOOSE for this structured data is to keep data with the same access pattern in the same directory, and then use a small number of tapes to store the files within the directory. The aim is, typically, to ensure that each tape holds data received within a particular time period (a month for example). Groups of tapes are assigned to directories based on naming conventions. For example storing dumps in directories with a consistently different name to diagnostics means different tapes can be used for dumps and diagnostics from the same application run.

Different applications have different naming conventions, and the data has different access patterns. MOOSE calls data with the same naming conventions and the same broad access patterns a data class. Climate model runs all belong to the same data class: moose:/crum, NWP trials belong to a different data class moose:/devfc. The data class name is used as the top level in the directory hierarchy. In fact the moose:/adhoc directory used in the previous examples is associated with a data class which contains data where there are no naming conventions, and there is only a limited exploitation of access patterns. (For more detail on the data classes defined in MOOSE see the MOOSE Data Classes documentation.)

Each new run of an application, typically from the supercomputer, will create a data set. A data set is a set of data that belongs together, and conforms to the naming conventions of its data class. In fact the moose:/adhoc/users/<user.name> directory is an example of a data set. One important difference with structured data is that users are typically allowed to create their own sets. In most instances, however, this will be done for you by the application.

$ moo mkset -v moose:crum/runid
### mkset, command-id=9782
A   moose:/crum/runid
		

As structured data classes rely on naming conventions you must put files to destinations that fit the naming conventions.

$ moo put adump moose:crum/runid/bad-name/adump
put command-id=56 failed: (SSC_TASK_REJECTION) one or more tasks are rejected.
adump -> moose:crum/runid/ada.seq/adump: (TSSC_INVALID_COLLECTION) invalid path to a data collection.
put: failed (2)
		

Furthermore, because the naming conventions leave fewer opportunities for typos to create unwanted directories, directories in structured data classes are auto-created on the first put.

$ moo put -v adump moose:crum/runid/ada.file/adump
### put, command-id=9784, estimated-cost=6byte(s), files=1
### task-id=0, estimated-cost=6byte(s), resource=adump
### task-id=0, transferred=6byte(s)
T   adump -> moose:/crum/runid/ada.file/adump
$ moo ls -r moose:/crum/runid
moose:/crum/runid/ada.file
moose:/crum/runid/ada.file/adump
		

There is similar behaviour on delete: if a delete leaves a directory empty then that directory will be deleted.

$ moo rm -v moose:crum/runid/ada.file/adump
### rm, command-id=9790
D   moose:/crum/runid/ada.file/adump
$ moo ls -r moose:/crum/runid
moose:/crum/runid/
		

As structured data classes support auto-creation and auto-deletion of directories they do not support the moo mkdir or moo rmdir commands.

To make it simpler to manage the naming conventions and the mapping of directories onto tapes, directories in structured data sets can contain either other directories, or files, but not both. A directory containing only files is known as a data collection.

Server-side record-level retrieval

MOOSE supports record-level access to tar-balls and PP files.

We now also support server-side atomic access to NetCDF files; however, this may only be applied at the file level rather than the collection level. This is achieved through invoking a version of the external utility 'ncks' on the server, and streaming the output to the client. See below for details.

A single MOOSE command can restore a number of records (i.e. PP fields or tarred files) from a single data set only.

The syntax of the record-level retrieval command is as follows, for .pp or .tar Collections:

$ moo select myquery moose:/crum/runid/apm.pp mylocal
			

and for NetCDF files:

$ moo filter myoptions moose:/crum/runid/ana.nc.file/files*.nc mylocal
			

The file 'myquery' (which must, of course, exist) contains the selection-criteria. The MOOSE URI ('moose:/crum/runid/apm.pp' in the example) can specify one or more data collections from a single data set, possibly using wildcards, or can specify the (single) data set itself. The final argument specifies the destination of the restored data.

In the case of moo filter, the file myoptions is a valid ncks options-file, with content relevant to the target-files.

The format of the output from moo select is the same as its target: a record-level retrieval from a PP data collection will return one or more files in portable PP format; retrieval from a .tar data collection will return one or more tar-balls. By default, the output will consist of one file (.pp or .tar) per file in the data collection that has records matching the criteria. You can use the -C option to condense the output into a single file, if required.

Similarly, moo filter (which currently only supports NetCDF) produces output in NetCDF-4 format only. There is no -C option, although for certain client-platforms the output can be appended to an existing file on the client.

Record-level retrieval: Query syntax

(This section applies only to PP files and tar-balls.)

The basic query has a list of attributes that must all be true for a file/field to be passed, held between begin and end. There is now the ability to provide a global block (containing conditions to be applied to all begin/end blocks) and custom blocks (containing conditions which may be explicitly applied to some begin/end blocks). See section Global and custom blocks below. Since the output type is by default the same as the stored type, a retrieval on a tar file collection will retrieve a tar format file and a select on a PP collection will result in a PP format file.

The following logic applies for the query syntax:

Lines within begin/end blocks are logically ANDed
The begin/end blocks within a query are logically ORed
Lists of comma-separated elements are logically ORed
Spaces are optional, and all blank lines are ignored
Comments begin with '#': all text between the '#' and the end of the line is ignored
Line continuation is possible using a single backslash - '\'.
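As a sketch of these rules (the attribute values are illustrative), the following query matches atoms with lbproc=128 and either of two stash codes, together with any atom having lbfc=1:

# comments run to the end of the line
begin
  lbproc=128          # ANDed with the line below
  stash=(3217, 3223)  # list elements are ORed
end
begin                 # a second block, ORed with the first
  lbfc=1
end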
		

Tar-balls

For tar file queries, there are two attributes available

filename     - represents the name of the individual file(s)
tar_filename - represents the name of the tar-file in the collection
			

Strings may include wildcards, with the only wildcard character being '*'. However:

lists of elements within a single query line are limited to ten values if at least one of these contains a wildcard.
the limitation that wildcards must be preceded by a number of ordinary characters has now been removed.
			

In the following examples, assume that the target data collection has tar-balls containing files file1, file2 and file3.

Example1: selection of a file(s) from a collection of tar-balls:

begin
  filename = "file1"
end
			

This will return all the files matching the name file1 from the collection, in one or more tar-balls. You can use either single or double quotes to specify the filename.

Example2: selection of multiple files from a collection of tar-balls:

begin
  filename = ("file1", "file2", "file3")
end
			

Alternatively, you can also use the following syntax:

begin
  filename = "file1"
end
begin
  filename = "file2"
end
begin
  filename = "file3"
end
			

Here the query will return a result if any of the three conditions matches the original collection.

Example3 : selection of file(s) from a specific tarfile:

begin
  tar_filename = "tarfile1.tar"
  filename = "file2"
end
			

The output of this query will return all the files matching the name file2 belonging to the tar-file tarfile1.tar.

Example4: selection of files from specified tar-balls

You can also specify files to select from different tar-balls in a collection within the same query by adding more conditions between new begin/end blocks. For example:

begin
  tar_filename = "tarfile1.tar"
  filename = ("file1", "file2")
end
begin
  tar_filename = "tarfile2.tar"
  filename = "file3"
end
			

Example5: selection of multiple files from a tar-collection using a wild card:

begin
  filename = "file12*"
end
			

PP Files

The following elements are available for selection. The names of most of these elements are defined in the UM FieldsFile documentation F3 as the LOOKUP entry values. Please see that documentation for the description:

T1 (Date)                 start date/time of field in file
yr (Integer)              year part of T1
mon (Integer)             month part of T1
day (Integer)             day part of T1 (lbdat)
hr (Integer)              hour part of T1
min (Integer)             minute part of T1
sec (Integer)             second part of T1
T2 (Date)                 end date/time of field
yrd (Integer)             year part of T2
mond (Integer)            month part of T2
dayd (Integer)            day part of T2 (lbdatd)
hrd (Integer)             hour part of T2
mind (Integer)            minute part of T2
secd (Integer)            second part of T2
lbtim (Integer)
lbtim_ia (Integer)        lbtim / 100
lbtim_ib (Integer)        lbtim / 10 % 10
lbtim_ic (Integer)        lbtim % 10
lbcode (Integer)
lbhem (Integer)
lbrow (Integer)
lbnpt (Integer)
lbpack (Integer)
lbrel (Integer)
lbfc (Integer)
lbcfc (Integer)
lbft (Integer)
file_min_lbft (Integer)   minimum lbft over all headers in the file
file_max_lbft (Integer)   maximum lbft over all headers in the file
lbexp (Integer)
model_number (Integer)    LBEXP from the PP header as an integer
runid (String)            LBEXP after the integer is converted to a string as per UM documentation
source (String)           pseudonym for runid
lbproc (Integer)
lbvc (Integer)
lbrvc (Integer)
lbproj (Integer)
lbtyp (Integer)
lblev (Integer)
lbrsvd_1 (Integer)
lbrsvd_2 (Integer)
lbrsvd_3 (Integer)
lbrsvd_4 (Integer)
lbsrce (Integer)
lbuser_1 (Integer)
data_type (Integer)       pseudonym for lbuser_1
lbuser_3 (Integer)
lbuser_4 (Integer)
stash (Integer)           pseudonym for lbuser_4
item_code (Integer)       pseudonym for lbuser_4
item (Integer)            lbuser_4 % 1000
section (Integer)         lbuser_4 / 1000
lbuser_5 (Integer)
lbuser_6 (Integer)
lbuser_7 (Integer)
submodel (Integer)        pseudonym for lbuser_7
model_code (Integer)      pseudonym for lbuser_7
brsvd_3 (Float)
brsvd_4 (Float)
bdatum (Float)
bacc (Float)
blev (Float)
bulev (Float)
bhulev (Float)
brlev (Float)
bhlev (Float)
bhrlev (Float)
bplat (Float)
bplon (Float)
bgor (Float)
bzy (Float)
bdy (Float)
bzx (Float)
bdx (Float)
bmks (Float)
period (Interval)         meaning period for a mean weather field
forecast_delta (Interval) time period for forecast data
interval_type (Integer)   indicator for the format of intervals: DDDDDHHMMSS or YYYYMMDDHHMMSS
pp_file (String)          the name of the PP file in the collection
pp_filename (String)      pseudonym for pp_file
filename (String)         pseudonym for pp_file
shape_id (Integer)        internal id of the shape of the PP file
file_start_date (Date)    earliest start date of all fields in the file
file_end_date (Date)      latest end date of all fields in the file
sequence_number (Integer) the relative position of an atom in a file
			

NOTE 1: the component attributes yr/yrd etc. still require the full date/time of which they are part to be retrieved, possibly several times over if used unnecessarily. Use them only when it really is, say, a particular day you want irrespective of month; otherwise use the T1/T2 values.

NOTE 2: If a meaning period and a climate mean period are ambiguous in a stream you should use LBTIM to differentiate between the two.

The following logical relationships are supported:

Equals:                    attribute = value
Less Than:                 attribute < value
Less Than or Equal To:     attribute <= value
Greater Than:              attribute > value
Greater Than or Equal To:  attribute >= value
			

NOTE: Inequality matches can pass a large number of atoms through the filter condition, so place other, more limiting filters first to reduce the number of atoms tested against each subsequent condition as quickly as possible.
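For example (the values are illustrative), placing an exact match before an inequality lets the cheaper, more restrictive test discard most atoms first:

begin
  stash=2201   # exact match: discards most atoms quickly
  lbft>12      # inequality: tested only against the survivors
end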

Lists & ranges:

In a list:                 attribute = (value1, value2, value3)
In a range:                attribute = [valueMin..valueMax]
In a list with ranges:     attribute = (value1, [valueMin..valueMax], value3)
			

Note: The range matches can also result in a lot of atoms passing the filter condition, but at least here you have an upper and lower bound in the same statement.
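For instance (the values are illustrative), a level selection of two specific values plus a bounded run can be written as:

begin
  stash=16203
  lblev=(1, [5..10], 19)
end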

Attribute datatypes

Integer

Values are signed integers with a maximum magnitude of about 2 billion (a signed 32-bit integer). All the options for testing are available for integer datatypes.

Float

All tests are done with rounding to the same number of significant digits as the value given, so a filter statement like

a_float=1.00

is interpreted as

a_float>=0.995 AND a_float<1.005

while

a_float=1.0

is interpreted as

a_float>=0.95 AND a_float<1.05

Apart from that, it is tested the same way as an integer, with all options available for testing against a floating point datatype.
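The rounding rule above can be sketched in a few lines of Python (a hypothetical helper, not part of MOOSE; it assumes the tolerance is half a unit in the last quoted decimal place):

```python
def float_match_bounds(text):
    """Return the interval [lo, hi) that an equality test against the
    float written as `text` is interpreted as: half a unit in the last
    quoted decimal place either side of the value."""
    decimals = len(text.split(".")[1]) if "." in text else 0
    tol = 0.5 * 10 ** (-decimals)
    value = float(text)
    return value - tol, value + tol

# float_match_bounds("1.00") is approximately (0.995, 1.005)
# float_match_bounds("1.0")  is approximately (0.95, 1.05)
```

So quoting more decimal places tightens the match.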

String

Strings are any printable ASCII/UTF-8 characters within double quotes.

a_string="My_Mother"

A string value tested against can include wildcards, so you can match many strings without having to create a long list. The only wildcard character is "*", which means the same as it does in shell scripts: any character, any number of times, including none. Wildcard matching has the same cost problem as inequalities and ranges, so take care that a wildcard will not return vast amounts of data. It is a good idea to place pattern matching on strings last, or nearly last, so that quicker tests discard as many potential matches as possible first.
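For example (the run and stream names are hypothetical), narrowing by stash code first and only then pattern-matching the file name:

begin
  stash=2201
  pp_file="acrota.pa*"
end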

You can also now match a range of strings, eg:

a_string=["aardvark".."antelope"]

will match all 8-letter strings that fall alphabetically between "aardvark" and "antelope", inclusive. This will often be used to select date ranges for PP files named in the UM naming convention. E.g.

pp_file=["acrota.paj1bae.pp".."acrota.paj1bdc.pp"]

or for selecting radar images from the FRASIA tar file collections.

Date

Date values are written down as:

{YYYY/MM/DD hh:mm}

If the hours and minutes are 00:00, you can leave them out:

{YYYY/MM/DD}

And you can use the full range of testing operations to select the right date(s) for your data.

High resolution (sub-minute) timestamps are now supported by MOOSE. An LBREL value of 3 is used in PP headers to indicate that seconds have been specified, in which case the second component of the timestamps is stored in place of LBDAY and LBDAYD. In this case, MOOSE will recognise the high resolution timestamps and store them in the metadata accordingly. These timestamps may also now be queried upon. A date in a query file may specify seconds according to the syntax:

{YYYY/MM/DD hh:mm:ss}

If seconds are omitted, as with the earlier syntax, their value will be taken as zero. Note that a timestamp with seconds in a query file will pick up headers with and without seconds alike, provided they match the condition. Similarly, a query file timestamp which does not specify seconds will also pick up headers which do contain seconds, if the condition is satisfied. For example the query:

begin
  T1>={2030/01/01 00:00:00}
  T1<={2030/01/01 12:00}
end
				

would match a T1 timestamp of 2030/01/01 11:00, as well as a high resolution T1 timestamp of 2030/01/01 10:15:30. If this is not the desired behaviour, and only headers which do actually specify seconds are wanted, LBREL should be used to filter the results further:

begin
  T1>={2030/01/01 00:00:00}
  T1<={2030/01/01 12:00:00}
  LBREL=3
end
				

As with string tests, date tests can be very expensive, so other tests should be placed earlier in the query so that cheaper constraints discard as many potential matches as possible first.

Interval

An interval is effectively a relative date, e.g. a meaning period where data over a constant time-interval is averaged. Such a data type value to test against is expressed by:

period={N Unit}

where N is an integer greater than or equal to 0 and the Unit is one of second, minute, hour, day, month, year. See the discussion above on the use of seconds in high-resolution PP headers. Where an interval cannot be expressed as a whole number of one unit, you should combine them. For example, an interval of one day and 12 hours should be expressed as:

{1 day + 12 hour}

rather than 36 hours.

This also applies to years & whole months:

{1 year + 6 month}

rather than 18 months.

HOWEVER, it doesn't always work for days! It depends on whether you're using the 'Gregorian', '365-day' or '360-day' calendar, and to avoid ambiguity you must also supply a criterion on the 'interval_type' attribute, as follows:

360-day calendar

In this case, the above convention works. The 'interval_type' attribute of all such files is 0 (if archived before 31/7/13) or 1 (if archived later), and so you should specify an interval of e.g. 40 days as:

period={1 month + 10 day}
interval_type<=1
					

or 400 days:

forecast_delta={1 year + 1 month + 10 day}
interval_type<=1
					
Gregorian & 365-day calendars

For data archived after 31/7/13 (applies to all data using the 365-day calendar):

Every such file has an 'interval_type' value of 1 or 2. A file has an interval_type of 1 if and only if every interval in the file is a whole number of months. Otherwise, the interval_type is 2.

The style of the specification of interval in the query depends on the interval_type of the target-files.

If the interval_type = 1:

period>{1 year + 6 month}
interval_type=1
					

If the interval_type = 2:

period>{549 day}
interval_type=2
					

You can combine queries if your target-collection has files of both types, e.g.:

begin
  period>{1 year + 6 month}
  interval_type=1
end
begin
  period>{549 day}
  interval_type=2
end
					

For data archived before 31/7/13 (Gregorian only):

In this case, the interval is characterized in the database as {year, month, day, hour, minute, second} whether or not it is a whole number of months, and the 'interval_type' attribute is always 0. Beware, however, that the following 30-day (Gregorian) intervals will be recorded in the database differently:

2001/02/28 to 2001/03/30:   1 month + 2 day
2001/03/30 to 2001/04/29:   30 day
2001/04/29 to 2001/05/29:   1 month
2000/02/29 to 2000/03/30:   1 month + 1 day
					

so to extract all 30-day intervals for data archived before 31/7/13 you'd need a query like this:

period=({30 day}, {1 month}, {1 month + 1 day}, {1 month + 2 day})
interval_type=0
					

If you don't supply a criterion on the interval_type with a query involving meaning-period or forecast-delta, MOOSE will reject the command.

Note that you can, of course, use 'moo mdls' to inspect the interval_type attributes of your files.

Apart from this, since the interval is basically a date value, all the comparison tests are available too, though whether such a test would make sense depends highly on the field element being tested.

Global and custom blocks

In addition to begin/end query blocks, users may provide a global block between tags begin_global and end_global. Any conditions within this global block will be implicitly added to the list of logically ANDed conditions of all existing blocks. This facility has been added to reduce the need for duplication within query files. If, for example, each block is required only to match atoms with one of a given list of stash codes, this list may be given only once within a global block.

Similarly, arbitrarily many (differently named) custom blocks may also be given. Users are free to choose custom block names, with the exception of global and attributes, which are reserved block names (see moo mdls for use of the attributes block). A custom block is a list of conditions contained within begin_<blockname> and end_<blockname> tags. Any ordinary begin/end block may include all the conditions within a custom block by referencing it using the syntax $<blockname>. See example 3 below.

Examples

Ex 1:

begin
  lbproc=128
  stash=2201
end
			

Ex 2:

begin
  lbproc=128
  stash=(3217, 3223)
  blev=1.0000000000
end
			

Ex 3:

begin
  lbproc=128
  $mycustom
  lbfc=1
end
begin
  T1<={2030/01/01 12:00:00}
end
begin_global
  stash=(42,43,44,45,46)
end_global
begin_mycustom
  blev>1.0
  bhulev<2.0
end_mycustom
			

The above example is equivalent to:

begin
  lbproc=128
  blev>1.0
  bhulev<2.0
  lbfc=1
  stash=(42,43,44,45,46)
end
begin
  T1<={2030/01/01 12:00:00}
  stash=(42,43,44,45,46)
end
			

NetCDF Files

In the case of NetCDF files, the query file described above becomes the ncks options file. Please see documentation for ncks (man ncks) for a list of valid options and how to use them. MOOSE is using a version of ncks which has been altered slightly from the standard build, in order to allow us to stream output to the client. For this reason, there are some minor differences from the standard set of allowable options. Firstly, MOOSE's ncks is only capable of generating NetCDF-4 output. The -3 and -6 options are therefore not valid. The -4 option will be added automatically by MOOSE, and so output will be the same regardless of whether the user has manually specified -4. Secondly, since the location of the input file and output file are determined by MOOSE, these should not be specified within the options file. Options may be separated by any amount of whitespace and may be spread across multiple lines, for ease of reading.

Example options file:

-a -d y,0,360 -d deptht,0,30 -v nav_lon,nav_lat,lont_bounds,latt_bounds,areat,e1t,e2t,ang1t,ocndept,deptht_bounds,e3t,votemper
			

(See select: performs filtered retrieval from a data set and filter: Extracts defined variables from data files (NetCDF) for more details on selection or filtering)


Client-side Pre- and Post-Processing

NOTE: References to moo put are not relevant to External users.

The MOOSE client incorporates facilities to allow files to be pre-processed before archival, or post-processed following retrieval, within a single moo put or moo get command. This is used mainly to convert files between the several 'flavours' of the PP format (UMPP, UMOPP, DPP) used on various client platforms. However, it can also be used for other processing, e.g. encryption/decryption, or the removal and re-incorporation of geographical metadata from/into NetCDF files.

The MOOSE data-transfer commands moo put, moo get and moo select all support the option -c or --local-file-format, which is used to invoke client-side processing. The option takes a single argument, which specifies the processing to be carried out.

If the option's argument matches a file-format already 'known' to MOOSE as applicable to the current client-platform, e.g. moo put -c=umpp on the HPC, then the standard format conversion (in this case UMPP to 'canonical' PP) will be carried out prior to the 'put'.

If the option's argument does not match a known format, the client's current PATH will be searched for a file of the same name. If such a file is found, it will be read for instructions on the process(es) to be applied. We will call this file the 'processing specification file', or PSF.

If the option's argument matches neither a known format, nor a local PSF, the user-request will fail.
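The resolution order described above can be sketched as follows. This is illustrative only (not MOOSE source code), and the set of known formats is assumed for the purpose of the example:

```python
import os

# Illustrative sketch only (not MOOSE source code) of the documented
# resolution order for the -c / --local-file-format argument:
#   1. a format already known to MOOSE wins;
#   2. otherwise the client's PATH is searched for a processing
#      specification file (PSF) of the same name;
#   3. otherwise the request fails.
KNOWN_FORMATS = {"umpp", "umopp", "dpp"}   # assumed set, for illustration

def find_on_path(name):
    """Return the first file called 'name' on PATH, or None."""
    for d in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate):
            return candidate
    return None

def resolve_conversion(arg):
    if arg in KNOWN_FORMATS:
        return ("format", arg)             # standard format conversion
    psf = find_on_path(arg)
    if psf:
        return ("psf", psf)                # user-supplied PSF
    raise ValueError("'%s' is neither a known format nor a PSF on PATH" % arg)
```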

By default, MOOSE does not echo STDOUT from the conversion processes. If the -v option is included on the MOOSE command line, or a conversion process returns a non-zero exit code, all STDOUT and STDERR from all the processes involved will be displayed.

Format of the PSF

The PSF is a simple text-file that can be used to specify a single process, or chain of processes to be carried out in sequence. Each process is specified on one line of the file. Each process must be available to the client, and conform to the following template:

PROCESS [OPTIONS] INPUT-FILE OUTPUT-FILE
			

User-specified processes that do not conform to the above template must be wrapped so that they do.

To include a process, simply include its name, with any options but WITHOUT ANY ARGUMENTS, on a single line in the PSF. For example, suppose you wish to pass your input file through 'process1', with options '-d -b=BLUE', and the resulting output through 'process2', which has no options, prior to archiving to MASS. Your PSF would then look like:

process1 -d -b=BLUE
process2
			

If you called your PSF 'mypsf', and the input-file is 'myinput', destined for MOOSE location 'moose:/adhoc/users/auser/mydir', your MOOSE request would then look like:

moo put -c=mypsf myinput moose:/adhoc/users/auser/mydir
			

MOOSE takes care of supplying appropriate arguments to the individual processes in the PSF, and of deletion of any intermediate files created by them. In the example above, 'process1' will operate on 'myinput' to produce an intermediate temporary file, which is then processed by 'process2' to produce the output that is finally stored in MASS.
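The chaining behaviour can be sketched as below. This is illustrative only (not MOOSE source code): each PSF line is run as "PROCESS [OPTIONS] INPUT-FILE OUTPUT-FILE", with intermediate files deleted afterwards:

```python
import os
import subprocess
import tempfile

# Illustrative sketch only (not MOOSE source code): run each line of a
# PSF in sequence, chaining the output of one step into the next and
# deleting the intermediate files afterwards.
def run_psf_chain(psf_lines, input_path, output_path):
    current = input_path
    intermediates = []
    for i, line in enumerate(psf_lines):
        last = (i == len(psf_lines) - 1)
        if last:
            out = output_path
        else:
            fd, out = tempfile.mkstemp()   # intermediate temporary file
            os.close(fd)
            intermediates.append(out)
        # MOOSE supplies the input and output file arguments itself.
        subprocess.run(line.split() + [current, out], check=True)
        current = out
    for f in intermediates:                # clean up intermediates
        os.remove(f)
```

With the PSF from the example above, psf_lines would be `["process1 -d -b=BLUE", "process2"]`.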

It is the user's responsibility to ensure that the specified processes exist and are appropriate for the files concerned; that the sequence of processes makes sense; and that the format of the final output file (for a 'put') is compatible with the destination on MASS.

You can include in the PSF the existing, standard, format-conversion processes if appropriate. So, for example, if the client is running on the HPC and you wish to run a local filter 'myfilter' on a file in UMPP format before archiving the result, which must be converted to canonical PP, your PSF would look like:

myfilter
umpp
			

Temporary workspace

Client-side processing requires that a temporary workspace be available to the client, in addition to the space required to hold the converted file(s). MOOSE checks the following environment-variables in order:

MOOSE_TEMP
TMPDIR
			

If neither of the above is set, the request will fail if running on the supercomputer. For clients running elsewhere, MOOSE will attempt to use location /var/tmp. This is a change from previous releases, where MOOSE_TEMP would have been set to /scratch if the above variables were not set.

Note that MOOSE does a basic check for sufficient available space in the temporary workspace. If this check fails, the command will fail early without performing any transfers. This check does not guarantee that no later failure will occur due to lack of space. This is because conversion is now multi-threaded, and it is possible that two or more threads may be using the temporary workspace simultaneously. In the case of both get and put with conversion, MOOSE uses the original file size to perform all checks for sufficient space, which is likely to differ from the converted size. If you are having problems with conversion, consider explicitly setting MOOSE_TEMP to a suitably large storage-area prior to issuing the MOOSE request.
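The documented lookup order can be sketched as follows. This is illustrative only (not MOOSE source code); `on_supercomputer` stands in for however the client detects its platform:

```python
# Illustrative sketch only of the documented lookup order for the
# temporary workspace: MOOSE_TEMP first, then TMPDIR; on the
# supercomputer a missing setting is an error, elsewhere /var/tmp
# is used as a fallback.
def temp_workspace(env, on_supercomputer):
    for var in ("MOOSE_TEMP", "TMPDIR"):
        if env.get(var):
            return env[var]
    if on_supercomputer:
        raise RuntimeError("set MOOSE_TEMP or TMPDIR before running MOOSE")
    return "/var/tmp"   # fallback for clients running elsewhere
```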

Limitations

Each process specified in the PSF takes exactly one input file and produces exactly one output file. You cannot use this mechanism to, say, split one input file into two and archive both outputs.


Accessing data owned by others

Many of the Met Office data resources stored on MASS are shared among a number of users. For instance the data from the operational suite is used in the production suite and in NWP. Data from climate model runs are analysed by several users. MOOSE provides an access control system to accommodate the need to share data.

In MASS, data ownership is at the data set level. All data within a data set is owned by the same person. If you want to list and read data owned by someone else you need read permissions to be given to you or to a MOOSE user group (or project) that you belong to.

Currently the default for sets not associated with a project allows read permission to all members of group-metostaff, a user group containing all Met Office Staff users of MOOSE.

However, the access control list may have subsequently been changed by the data set owner.

Sets can also be created which are associated with projects, in which case the access permissions granted depend upon the project itself although members of group-metostaff will always be granted read access by default. In this case, the owner of the data set is the owner of the project.

If you want to read a data set owned by someone else you will need to find it and understand its contents. This is another benefit of restrictions on naming conventions within a data class. Uniformity of naming helps you understand the contents of directories and files. By always putting monthly mean files into directories named apm.pp you, and others, will know what to expect when sharing data. This may seem an obvious point, but it is worth keeping in mind. If you are likely to share data then stick to the naming conventions. If the naming conventions are not sufficient then talk to the Storage Team about refining the data class definitions.

See Data Set Access Control for more details on access-control.



Key Terms and Concepts

This section acts as a reference, bringing together the terms introduced in Common Tasks into one place.

Data Class

All data with a similar set of access patterns and naming conventions belong to a data class. The data class defines a set of naming rules using regular expressions. It also organises the data for optimal restore and management by defining which sets of tapes can be used to store data in different directories, using the naming conventions.

The names of the data classes are used as the top level in the hierarchy.

There are two types of data classes: structured and unstructured. These support different functionality, and are designed for different types of access patterns. These are described in the next sub-sections.

The defined data classes are described in the MOOSE Data Classes documentation.

Structured Data Class

Structured data classes are intended for data where many data sets follow the same naming conventions and where some of the data may need record level access.

In structured data classes files reside in directories called data collections (see later for more detail on these). The data collections in a structured data set do not need to be assigned to the same sets of tapes. Different data collections in a structured data set can be stored on different tapes. This gives greater flexibility where data within a data set has different restore or deletion patterns.

There is no support for moo mkdir, moo rmdir, and only limited support for moo mv in structured data classes. Instead directories are created automatically on first use, and deleted automatically when empty.

Unstructured Data Class

There is only one unstructured data class, called adhoc. It is intended for data where the naming conventions, if they exist at all, are local and do not apply to many data sets. There is also no support for record level access to data.

The unstructured data class looks very like a file system and supports moo mkdir, moo rmdir, and moo mv. Directories can contain both files and other directories.

There are two subsets of the unstructured data sets. The first is users. There is no attempt to optimise restore from these data sets. The second is projects. All data within a projects data set can be confined to the same set of tapes, and this set can be reserved for only this project. This can give very good performance.

Data Set

Associated data matching the usage patterns of a data class is put into a data set. The form of the association is not defined and is up to the designer of the data class to give the right level of association. Examples of associations might include a UMUI run, an SCS trial, or output from one model in the operational suite.

All data within a data set has a single owner.

Each data set is a sub-directory of the data class directory. There may be intermediate directories between the data set directory and the data class directory. This gives the option of building up a hierarchy to help with finding data. One place this feature is used is in the operational (WGOS) archive. Intermediate directories are used to group the data into atmosphere model output and wet model output at the top level, then model configuration at the data set level. Note that all data sets in the same data class must have the same number of intermediate directories.

A data set may optionally be associated with a single project (see below).

Data Set Type

A data class can define a number of data set types for data sets. This is a special optimisation feature which is only used for migrated data at present. Data set types are used when the logical layout (naming conventions) of data is captured by a single data class, but there are differences in access patterns (production patterns) which warrant finer-grained control of which tapes are used for different directories in the hierarchy.

Groups

A Group represents a collection of MOOSE users. A user can be a member of more than one group. Currently MOOSE supports group-all, group-metostaff & group-admin.

group-all represents all MOOSE users, group-metostaff all Met Office Staff users and group-admin represents all those with administrator privileges.

A data class has properties which define read and write access. These properties limit which users/groups can create/delete data sets in the class (write) and list data in the class (read). Each property is a list of users and/or groups.

Data class access options:

Read - group-metostaff for all classes

Write - group-metostaff for devfc and crum classes. For all other classes it is restricted to group-admin

Projects

Projects are specialized versions of Groups that represent users who all participate in a particular project. Unlike Groups, Projects can have one or more data sets associated with them. In this case, the owner of the Project associated with the data set (who must be a member of Met Office staff) is deemed to be the owner of the data set.

If the owner of the Project changes, so does the ownership of any data set associated with that Project.

Please note that all project names begin with the prefix project-. Therefore a project called glomodel would be referred to as project-glomodel within client commands, in the same way that Group names always have the prefix group-.

In addition, a Project can have its own subset of rules about the names of data sets that can be created within it, and the ability of members of the Project to create or access the Project's sets.

Users external to the Met Office CDN must belong to at least one Project in order to gain access to any data. Every data set that is created by any user accessing MOOSE from outside the Met Office CDN will be forced to belong to an existing Project, and will therefore be subject to naming and access rules specific to that Project. The data set will also, by definition, be owned by the Met Office member of staff who owns the Project.

If you believe that your work could benefit from the degree of control that Projects provide, please contact the Storage Team to discuss your requirements.

Data Set Access Control

MOOSE provides an access control system to accommodate the need to share data. Access control works at the level of the data set and is achieved by an Access Control List (an acl). An acl can include entries for individual user-ids, the special user called owner, and groups. The special owner user is included to allow permissions to persist on change of ownership. The only groups existing at present are group-all, representing all MOOSE users; group-metostaff, representing all Met Office Staff users; and group-admin, representing those users with administrator privileges.

Currently, when a data set is created for a user, a corresponding acl entry is created that allows read-write permission to the set owner and read permission to all members of group-metostaff by default. E.g. the getacl command will show the following for a data set with default settings:

moose:/crum/abcde - owner:readwrite
moose:/crum/abcde - group-metostaff:read
		

This implies that the owner has read-write access and all Met Office staff have read access.

If the owner has given Carl Rossby read-write access to the set above, the getacl command will show:

moose:/crum/abcde - owner:readwrite
moose:/crum/abcde - carl.rossby:readwrite
moose:/crum/abcde - group-metostaff:read
		

To recap, in order to have read or read-write permission for a set, a user must have an entry in the acl either for their own username, or in their capacity as owner, or for a group they are a member of, such as group-metostaff.

Only the set owner, and users with administrator privileges, can change the access control list for a set. The set owner can always change the access control list, even if they no longer have readwrite or read access to the set.

The owner can remove their own access permission from the acl, in which case they retain read access because they also belong to group-metostaff. They can further remove group-metostaff's read access, at which point they can neither read from nor write to the set. The owner can, however, still change the access control list to allow them to read from or write to the set once again.

If a user holds more than one permission (for example, in cases where the set owner is also in a group with access permission), the most permissive access entry will be applied.
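The most-permissive rule can be sketched as follows. This is illustrative only (not MOOSE source code); note that readwrite outranks readwrite-del here because only readwrite includes delete permission:

```python
# Illustrative sketch only (not MOOSE source code): resolve a user's
# effective permission on a Set by taking the most permissive
# matching ACL entry. readwrite outranks readwrite-del because only
# readwrite includes delete permission.
RANK = {"none": 0, "read": 1, "readwrite-del": 2, "readwrite": 3}

def effective_permission(acl, user, groups, is_owner):
    best = "none"
    for principal, perm in acl:
        matches = (principal == user
                   or principal in groups
                   or (principal == "owner" and is_owner))
        if matches and RANK[perm] > RANK[best]:
            best = perm
    return best

# The default ACL shown above:
acl = [("owner", "readwrite"), ("group-metostaff", "read")]
```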

A Summary of Access-Control Rules

(with thanks to Mick Carter)

Notes:

'group-all' is a group to which all MASS users belong.
'group-metostaff' is a group to which all MASS Met Office users belong and which has read-permission by default on every Set.
'owner' in a Set's access-control list refers to the current owner of the Set.
			

For each Set in MASS you as a user can have one of four permission-states: read, readwrite, readwrite-del or none.

read (short form: r)
  Read-only access. Default for members of group-metostaff.
readwrite (short form: rw)
  Read, write and delete permission. Default for the Set-owner.
readwrite-del (short form: rw-d)
  Read & write but NOT DELETE.
none (short form: n)
  No access.
read: I have read-permission on a Set if any one of the following holds:
  - I am the owner and the owner has read, readwrite or readwrite-del permission. (By default the owner has readwrite access, but this can be removed, and restored, by the owner him/herself.)
  - I belong to a group or project that has read, readwrite or readwrite-del permission. (By default, group-metostaff has read permission. If you want to restrict read-access, you must remove group-metostaff from the Set's ACL. Some projects may also have been granted access, so these need to be removed from the ACL as well.)
  - My user-ID has been given read, readwrite or readwrite-del permission. (By default, group-metostaff has read permission, and all Met Office staff are automatically members of that group. However, read-access for that group can be revoked.)

readwrite: I have read-write and delete permission on a Set if any one of the following holds:
  - I am the owner and the owner has readwrite permission. (By default the owner has readwrite access, but this can be removed, and restored, by the owner him/herself.)
  - I belong to a group or project that has readwrite permission. (Must be granted explicitly by the owner.)
  - My user-ID has been given readwrite permission. (Must be granted explicitly by the owner.)

readwrite-del: I have read-write but NOT delete permission on a Set if any one of the following holds:
  - I am the owner and the owner has readwrite-del permission. (By default the owner has readwrite access, but this can be removed, and restored, by the owner him/herself.)
  - I belong to a group or project that has readwrite-del permission. (Must be granted explicitly by the owner.)
  - My user-ID has been given readwrite-del permission. (Must be granted explicitly by the owner.)
There are special privileges reserved for the following account-types:

administrator
  Only administrators can create user-IDs, manage user-accounts, create & manage groups, and create Sets in Classes opfc, misc and adhoc. Admins can also create projects, and delete or transfer ownership of others' Sets.
owner
  Only the owner of a Set can manipulate the Set's ACL. They can delete, or transfer ownership of, their own Sets. The owner is the creator of a Set (crum or devfc), or has had ownership conferred.
superusers
  A sub-set of Met Office Staff with the additional ability to create projects.
The following table shows how MOOSE commands that are limited by permissions or privileges work.

NOTE: Several of the commands are not available to users outside of the Met Office domain. Please see The MOOSE command line interface for details of permitted commands.

Read and list commands: get, ls (within a Set), mdls, select, test (1)
  Read, readwrite or readwrite-del permission for the Set.
Write commands: put, mkdir (adhoc only)
  Readwrite or readwrite-del permission for the Set.
Delete and move commands: rm, rmdir (adhoc only), mv, dispose, restore
  Readwrite permission for the Set; internal users only.
Creating a Set: mkset
  Any user (2) can create a Set in Classes crum or devfc; only administrators can create Sets in opfc, misc & adhoc.
Deleting a Set: rmset
  Restricted to the Set-owner and administrators.
Changing Set-ownership and access-permission: chown, chacl
  Restricted to the Set-owner and administrators.
Viewing Set-ownership: ownedsets
  Any internal user can list their own sets, or sets belonging to other users and projects (using -u or -p).
Changing Set metadata: chmeta
  Any internal user with read permission can add a tag; only the user who added a tag can remove it; only the Set-owner can change the category.
Comments: comment
  Any user with read permission can add a comment; only the user who added a comment, or an administrator, can remove it; only the owner of the comment can link it to a MOOSE URI; only the comment owner and administrators can unlink a comment from a URI.
Quality records: quality
  Add & modify; link, unlink & remove: restricted to users with write-permission on the Set(s), or members of specified projects. Output & export: users with read permission, or members of specified projects.
Listing down to Set: ls moose:, ls moose:<class>
  Available to all users.
Viewing Set access-permissions: getacl, getprojacl, viewprot
  Available to internal users with read-access to the Set.
Viewing Set info: setinfo
  Users with read-access to the Set.
Viewing Project info: projinfo; listing Projects: prls
  Any user.
Changing Project-ownership: chownproj; changing Project info: chprojectinfo
  Restricted to administrators and the project-owner.
Managing projects: project
  Restricted to administrators and superusers.
Managing users & groups
  Restricted to administrators.
Suspending a command: suspend; resuming a suspended command: resume
  Restricted to the user who issued the original command.

1. Note that moo test [URI] will return false if the user does not have read or readwrite permission, even if the user is the Set-owner and the target-URI exists.

2. Note that this only applies to Sets created by Met Office users. Locations where Sets associated with Projects can be created are governed by the rules stipulated in the Project itself, which are likely to be more restrictive.

Data Collection

A data collection is a directory that contains files in a structured data set. A data collection cannot contain directories, and all files must be of the same file type. Files within a data collection will be accessed or managed together.

All data collections belong within data sets. Intermediate directories may exist between the data set and the data collection. Note that all data collections in the same data class must have the same number of intermediate directories.

Some data collections contain a file type that supports record level access (currently only portable PP, tar & NetCDF).

File Types

The concept of File Type is supported by MOOSE to allow us to provide extra features for certain types of data, such as access to fields within a pp file.

MOOSE currently supports three file types: binary files, portable PP files and tarballs. Any pp files that require record level access should be stored in MASS as portable PP, in an appropriately-named data collection within a structured data class. Any other pp flavours should be converted into this format, either before storage or by using options on the MOOSE command line. Tarballs that require record level access must be in USTAR format.

All data collections of files of the same file type share a naming convention.

.pp
portable PP files: MOOSE supports record-level access on these collections.
.tar
tarballs: MOOSE supports record-level access on these collections.
.file
any other binary (non-pp) files, supports whole file access, and NetCDF atomic access (as of release 4.6.x) for legacy NetCDF data.

There are two special cases for binary files:

.nc.file
NetCDF files: MOOSE supports record-level access on these files, using a local variant of ncks.
.xmit.file
files from the IBM mainframe that have been through the xmit utility. This preserves the MVS dataset characteristics. (Only restore these files to a platform other than the IBM mainframe if you really know what you are doing).

The file types supported will be extended as a future enhancement.


The MOOSE command line interface

Referring to Data in MASS

mooseURI

Data in MASS is referred to through a mooseURI. A mooseURI begins with one of moose:, moo:, :, optionally followed by / or //.

On output MOOSE will always use the full form moose:/.

As there is no concept of a current working directory within MASS you must always specify the full mooseURI.

There is a limit on the length of a mooseURI. This limit is 960 characters following the moose: but including the implied leading /. Filenames are limited to 256 characters. In practice this should not be too restrictive.
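The documented limits can be sketched as a simple validation function. This is illustrative only (not MOOSE source code), showing the three checks described above:

```python
# Illustrative sketch only of the documented mooseURI limits: no '#'
# anywhere, at most 960 characters after 'moose:' (counting the
# implied leading '/'), and filenames of at most 256 characters.
def check_moose_uri(uri):
    if not uri.startswith("moose:"):
        raise ValueError("expected the full 'moose:' form")
    path = uri[len("moose:"):]
    if not path.startswith("/"):
        path = "/" + path                 # the leading '/' is implied
    if "#" in path:
        raise ValueError("mooseURI may not contain '#'")
    if len(path) > 960:
        raise ValueError("mooseURI exceeds 960 characters after 'moose:'")
    filename = path.rsplit("/", 1)[-1]
    if len(filename) > 256:
        raise ValueError("filename exceeds 256 characters")
    return path
```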

No mooseURI can contain the hash-character ("#") anywhere.

In the adhoc data class there are no other constraints on the names of directories or files. It is, however, worth avoiding characters that may be confused with wild-cards, or lead to hard-to-handle file names when restored. Other data classes may impose naming conventions on their own hierarchies.

A file in MOOSE is much like a file on a Unix file system. The file is referred to by the mooseURI and only a single version of a file can reside at a mooseURI - there is no concept of versioning of files in MOOSE. A file or files can be created with the put command, deleted with the rm command or overwritten using the -f (--force) option on the put command.

MOOSE wildcard

In addition to a mooseURI that identifies a single directory or file, MOOSE supports simple wild-cards. These are *?[]. Wild cards work in a similar way to Unix/Linux shells:

*
matches zero or more characters
?
matches one character
[]
matches a character range. For example [a-z] will match any lower case character.

To limit the resources used by individual commands, wild cards can only be used below a data set, and only in the last level of a mooseURI.
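The wildcard behaviour can be sketched with Python's fnmatch module, which implements the same three patterns as shell globbing. This is illustrative only (not MOOSE source code):

```python
import fnmatch

# Illustrative sketch only: MOOSE wildcards (*, ? and []) behave like
# shell globbing, but may appear only in the last level of a mooseURI.
def expand_wildcard(names, uri_pattern):
    head, _, leaf = uri_pattern.rpartition("/")
    if any(c in head for c in "*?["):
        raise ValueError("wildcards are only allowed in the last level")
    return [head + "/" + name for name in fnmatch.filter(names, leaf)]

files = ["apm.pp", "apa.pp", "apy.pp", "log.txt"]
matches = expand_wildcard(files, "moose:/crum/myset/ap[my].pp")
```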

Error Codes and Command Failure

There are a number of reasons why a MOOSE command can fail. When there is a failure the command line will return a non-zero error code and print an error message to STDERR. Successful commands return 0 as the error code.

In general the MOOSE command line uses fairly coarse-grained error codes, and more than one error message may be associated with the same error code. The underlying philosophy is that error codes are processed by scripts and so should only be different if a script can take different action based on the error code. More detail on the cause of the error, that is useful to the user, is available in the error message.

The error codes general to all MOOSE commands are:

2
a user error. Usually (though not always) a user error should be correctable by the user. It might include fixing a typo in mooseURI, correcting your password, or getting the right permissions on a data set.
3
a system error. This is a problem you will not be able to fix and should contact the Storage Team. It may result from a programming bug, or incorrect system configuration. This error code is also returned when there is a system outage, or when the system times-out as a result of excessive load. More information on system outage should be available on the newsgroup (met-office.mass-r.announce).
4
an error in the client system external to MOOSE, e.g. file I/O or format-conversion.
5
indicates that the system is running, but that either general access, or the requested command-type ('put', 'get' etc.), has been temporarily disabled to allow maintenance action.

Some of the MOOSE commands return other error codes in special cases where a script might take alternative action. These are outlined in the error handling sections of the documentation on the individual commands. If you have a usage scenario that you think could be handled by a distinct error code then contact the Storage Team for review.

Any command that acts on multiple files or directories in MOOSE will fail, and do nothing, if the command acting on any one of the files individually would have produced a user error. This differs from Unix behaviour, where a command proceeds until the first failure occurs.
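The all-or-nothing behaviour can be sketched as follows. This is illustrative only (not MOOSE source code): every target is validated before anything is acted on, so a single user error leaves everything untouched:

```python
# Illustrative sketch only of the all-or-nothing rule: validate every
# target first, and act only if all of them pass.
def act_on_all(targets, validate, action):
    bad = [t for t in targets if not validate(t)]
    if bad:
        raise ValueError("user error on %s; nothing was done" % ", ".join(bad))
    for t in targets:
        action(t)
```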

Options Common to many commands

There are some options that are common to many of the MOOSE commands. This section gives an outline of the common features.

-h, --help, --usage
  Will output a help message on the command.
-n, --dry-run, --dryrun
  A command run in dry-run mode will not change the state of your local disk or the MASS system. It will only perform command verification (authorisation, naming convention checks etc) and estimate the resource usage of the command. This can be used for checking whether commands will run, understanding the likely time a command will take, or seeing what wild-cards will expand to.
-q, --quiet The default output from the MOOSE client is more verbose than standard Unix/Linux commands. For instance, all commands that result in a change in storage in either MASS or the local client will print the MOOSE command-id. This command-id is unique to the command and should be used when reporting problems to the Storage Team. --quiet will suppress all output to standard output, but errors will still appear on standard error.
-v, --verbose This option will increase the amount of information output about the progress of an 'action' command. If a command fails, the output will automatically become verbose.
-l, --long This option increases the range of information supplied by 'info' commands e.g. ls, cstat, si.

comment: provides functions to add comments to sets and collections

Usage:

moo comment URI [URI ...] (for options: output, export)
moo comment EXPORT_PATH URI [URI ...] (for option: number)

The comment command provides an authorised user with the ability to add comments to sets and/or collections, and also provides features to change, delete, and link comments to sets and collections. This command requires that a "command type" option be specified to determine which function the command will perform.

(Note that external users cannot add, modify or delete comments.)

Valid options:
-h, --help, --usage
  prints help
-n, --dry-run, --dryrun
  tries running to check for errors but does not change anything
-o, --output prints the comments associated with the supplied set and/or collection URIs.
-e, --export=EXPORT_PATH
  saves the comments of the supplied set and/or collection URIs to local files.
-N, --number=QUANTITY
  saves a limited number of comments for each of the supplied containers to local files.

Outputting and Exporting comments

Comments linked to sets or collections can be printed or exported as comment text files, using either the output or export options.

To print comments, use the output option along with the URI arguments for the sets and/or collections for which comments are required. As with the add option, this option requires the user to have read access for the given sets and/or collections.

To export comments, a directory path specifying where to place the comments must be given as an argument to the option. Command arguments should be the set and/or collection URIs. For each of the URIs, a folder structure resembling the URI is created beneath the export directory. Within these directories will be the comment text files, named according to the comment id number. For example, if a set and a collection immediately beneath that set are passed as arguments, the comments for the collection will reside in a directory adjacent to the set comment files. The same access rights as output apply.

There is also another exporting option, --number, which provides the same functionality as export, but with the ability to restrict the number of comments output for each URI. To use this option a quantity limit must be passed as an argument to the option, with the export directory being passed as the first command argument, followed by the URIs for the sets and/or collections that need to be exported. The most recent comments will be returned.

Examples

Output Example:

$ moo comment -o :/crum/aaaaa/ :/crum/bbbbb/
			

Export Example:

$ moo comment -e ~/test/ :/crum/aaaaa :/crum/bbbbb
			

Number Example:

$ moo comment -N 10 ~/test/ :/crum/aaaaa :/crum/bbbbb
			

dls: List disposed data

Usage:

moo dls URI [URI ...]
		

Data which has been marked for deletion using the dispose command may be listed using moo dls. Note that moo ls -a may also be used; however, there are important differences between the two invocations. Firstly, dls will only show data which has either been disposed of, or has some contents which have been disposed of. Secondly, dls will indicate whether the returned nodes have themselves been marked for deletion, or have been marked for deletion due to a parent set or collection having been disposed of. Finally, using the --view-metadata option with dls will additionally show the name of the user who ran the dispose command, together with the date and time at which the data becomes eligible for permanent deletion.

Valid options:
-h, --help, --usage
  prints help
-d, --directory
  displays information for the selected directory
-R, -r, --recursive
  descends recursively, note recursion only works below the data set
-x, --xml displays information in XML
-v, --view-metadata
  displays the user who ran the dispose command, and the time when permanent deletion is imminent

The MOOSE nodes returned by dls mirror the behaviour of ls. If a directory argument is given, the disposed contents of that directory are returned, unless -d is used. If a wildcard is given, the nodes returned are those disposed nodes matching the wildcard. If the provided URI does not exist, or the wildcard expansion matches no disposed data, a non-zero return code results.

Immediately following the node path in the output is the disposal status. The status is one of: PARENT, NODE or CHILD. PARENT indicates that a parent collection or set of that node has been marked for deletion. NODE indicates that the displayed node itself has been marked for deletion. CHILD indicates that the node has not been disposed, but some of its contents have.

Using the -v option will additionally display the name of the user who ran the original dispose command, and the date and time at which the data becomes eligible for permanent deletion. The data is likely to be deleted permanently within 24 hours of this date, as the deletion process for disposed data runs once daily. If the disposal status is CHILD, this additional metadata will not be displayed (as there may be a range of deletion dates and users for different contents). The username may be used as a contact if the data is still in fact required, and needs to be restored.

Examples

To find disposed data in set myset:

$ moo dls moose:/crum/myset
moose:/crum/myset/ama.pp        NODE
			

To find all sets in crum containing some disposed data (view-metadata option):

$ moo dls -v moose:/crum
moose:/crum/setaa               CHILD
moose:/crum/setaa               NODE      carl.rossby          2013-09-11 15:10:58 BST
			

To find disposed data in setab (xml format, with metadata):

$ moo dls -xv moose:crum/setab
<?xml version="1.0"?>
<nodes>
  <node url="moose:/crum/setab/ama.pp">
    <disposalStatus>PARENT</disposalStatus>
    <disposalUser>carl.rossby</disposalUser>
    <deletionTime>2013-09-11 15:10:58 BST</deletionTime>
  </node>
</nodes>
			

filter: Extracts defined variables from data files (NetCDF)

Usage:

moo filter FILTER-FILE SRC-URI [SRC-URI...] DEST-PATH
		

Extracts defined variables from one or more (NetCDF) data files.

When retrieving from multiple files, moo filter also limits the total volume of data, the number of files transferred and the number of tape mounts (same as moo get). Requests that exceed these limits will fail and will need to be reorganised in order to retrieve smaller amounts of data with each command. These limits are subject to change and can be found using moo si -l. The size of the file is not limited when a single file is requested.

Valid options:
--dry-run, --dryrun, -n
  tries running but does not alter anything
-i, --fill-gaps
  retrieves all target files within a list except those that already exist at destination
-k, --conversion-threads=MAX_CONVERSION_THREADS
  overrides the default setting for maximum concurrent client-side format-conversion threads
--force, -f allows overwriting of the destination
--help, --usage, -h
  prints help
-c FORMAT, --local-file-format=FORMAT
  specifies the local file format for use in converting files
--quiet, -q decreases the verbosity
--verbose, -v increases the verbosity
-z, --compressed-transfer
  uses compression transfer to reduce traffic, at the expense of processing
-a, --append appends the restored data to existing destination files (currently only valid for NetCDF data). See Append option for get for use of this option.
-b, --large-retrieval
  relaxes the check for sufficient client-side disc-space
-T FILE-TYPE, --file-type=FILE-TYPE
  specifies the file type to run the query against, in the case where it cannot be inferred (currently only "nc" can be specified, indicating NetCDF)
-L PATHNAME, --licence-file=PATHNAME
  overrides the default pathname for the file containing usage-licence details

Arguments:

FILTER-FILE: a text-file that holds the options for ncks to be applied to the retrieved NetCDF file.

SRC-URI: one or more (possibly wildcarded) MOOSE data-file paths. Currently only NetCDF data files are supported.

DEST-PATH: a destination-path that must be writeable by the client. The rules are exactly as specified in get: retrieves files from MASS.

Note: when filtering a single file you can specify a new name for the resulting file. This is done by specifying the desired filename as part of the destination argument.
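For illustration, a minimal filter file might contain a single ncks variable-selection option (the variable names here are hypothetical; see the ncks documentation for the full set of valid options):

```
-v air_temperature,relative_humidity
```

With this file as the FILTER-FILE argument, only the named variables would be kept in each retrieved NetCDF file.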

Examples

Example 1: selection of a single NetCDF data file, in the simplest fashion:

$ moo filter ~/filters/ncks_std_opts moose:/devfc/ana.nc.file/file1.nc ~/results
### filter, command-id=65438, estimated-cost=UNKNOWN, files=1, media=1
			

Example 2: retrieve multiple files using a wildcard moose file path:

$ moo filter ~/filters/ncks_std_opts moose:/devfc/ana.nc.file/* ~/results
### filter, command-id=65456, estimated-cost=UNKNOWN, files=1, media=1
			

Example 3: Append retrieved file data to an existing file ('exists.nc'):

$ moo filter -a ~/filters/ncks_apd_opts moose:/crum/aaaaa/ama.nc/file1.nc ~/results/exists.nc
### filter, command-id=65500, estimated-cost=UNKNOWN, files=1, media=1
			

Note that the resulting file sizes are unknown until after the retrieval is complete, because the file is filtered by ncks during the data transfer process. It is therefore possible that the local destination may have insufficient space for the resulting data, in which case the transfer would fail. If the filter file is invalid (for example, if a variable is specified which does not exist in the source file), the error message from ncks will be displayed to the user, and the MOOSE command will fail with a user-error return code. See the section on filtered retrievals above, and the ncks documentation, for valid options.

Large-retrieval option

By default, MOOSE checks that there is enough space at the destination to store all the files specified in the retrieval, and will abort the whole request if not.

If the --large-retrieval or -b option is specified, MOOSE will check that there is enough writable space at the specified destination only for the largest file to be retrieved. This is to allow users to issue large multi-file retrievals as part of processing suites, where files are processed and deleted as soon as they are available on the client. If this option is used, it is up to the user to manage client-side disc-space: MOOSE will not do this. If MOOSE finds that, at any time, it does not have enough space on the client to write a file, it will abandon the rest of the request as a user-error (RC=2).

Output

Each moo filter command will output an informational message including the command-id and a measure of the cost of the command. Currently the cost is a combination of the number of resulting files, and the number of distinct tape-media involved. Note that the total size of the command cannot be established until the command has completed.

See also Data-usage licences.

Error

Error code 17, output file(s) already exist when using -i, --fill-gaps option: When filter is used to retrieve only those files missing at the destination and it is determined that there are none missing, the command will exit with a return code of 17. This allows users who want to maintain a local cache to distinguish between this scenario and other errors.


get: retrieves files from MASS

Usage:

moo get SRC-URI [SRC-URI ...] DEST-PATH
		

Retrieves one or more files (SRC-URI) from MOOSE to a destination (DEST-PATH) in the local file system. SRC-URI must be a valid mooseURI or a mooseURI wildcard. All SRC-URIs must be in the same data set. DEST-PATH can be the name of either an existing directory or a file. If it is a directory, the name of each resulting file will take the base name of the corresponding SRC-URI. If you try to retrieve a file that already exists at the destination, the command will fail unless you use the -f (--force) option to force get to overwrite it. When retrieving multiple files without -f, if you specify the -i (--fill-gaps) option, get will retrieve all the target files in the list except those that already exist at the destination.

If a specified SRC-URI does not correspond to an existing file, the whole command will fail with an RC of 2. If the command specifies multiple SRC-URIs, together with the option -g or --get-if-available, and some of the files do not exist, the command will continue to retrieve those that do and will complete with RC=0, UNLESS none of the specified SRC-URIs exist (in which case RC=2).

When retrieving more than one file DEST-PATH must be a directory.

When retrieving multiple files, moo get limits the total volume of data, the number of files transferred and the number of tape mounts. Requests that exceed these limits will fail and will need to be reorganised in order to retrieve smaller amounts of data with each command. These limits are subject to change and can be viewed using the moo si -l command. Please refer to si: displays information about the MOOSE hardware & software for more information. The size of the file is not limited when a single file is requested.

MOOSE has a mechanism designed to prevent unintentional repeated retrieval of the same file(s), for example in cases where a script is erroneously left looping over the same commands. If a single user repeatedly submits get requests for the same file within a short period, eventually the commands will fail with message TSSC_EXCEEDS_RECENT_RETRIEVAL_LIMIT and a return code of 2. Once the repeated requests have stopped, MOOSE will automatically reallow retrieval of these files after a limited period. If your commands are returning this message and you believe this is in error, or if you have stopped the offending script and again need access to the files, please report the problem in the usual way, and the Storage Team will be able to remove the block. Note that this mechanism does not prevent other users from retrieving the blocked files.

When the back end storage is down moo get will fail.

Valid options:
-n, --dry-run, --dryrun
  checks that the command can run and estimates its cost, but does not copy data out of MASS
-i, --fill-gaps
  retrieves all target files within a list except those that already exist at destination
-k, --conversion-threads=MAX_CONVERSION_THREADS
  overrides the default setting for maximum concurrent client-side format-conversion threads
-f, --force allows overwriting the destination
-I, --fill-gaps-and-overwrite-smaller-files
  allows overwriting the destination if source is larger than destination file size
-g, --get-if-available
  retrieves all available files from target-list, ignoring those that do not exist at source
-h, --help, --usage
  prints help
-c FORMAT, --local-file-format=FORMAT
  specifies the local file format for use with collections of particular file types
-q, --quiet suppresses all output to standard out
-v, --verbose lists files retrieved, and adds information on server communications
-z, --compressed-transfer
  uses compression on transfer to reduce traffic, at the expense of processing
-a, --append appends the restored data to existing destination files (currently only valid for NetCDF data)
-b, --large-retrieval
  relaxes the check for sufficient client-side disc-space
-L PATHNAME, --licence-file=PATHNAME
  overrides the default pathname for the file containing usage-licence details

Examples

Retrieve a single file from MOOSE:

$ moo get moose:crum/juaaa/ada.file/afile .
### get, command-id=44189, estimated-cost=1813byte(s), files=1, media=1

Retrieve a single file with more verbose output:

$ moo get -v moose:crum/juaaa/ada.file/afile .
### get, command-id=44190, estimated-cost=1813byte(s), files=1, media=1
### task-id=0, estimated-cost=1813byte(s), resource=moose:/crum/juaaa/ada.file/afile
### 2008-11-21T15:44:09ZGMT: polled server for ready tasks: #0
### 2008-11-21T15:44:10ZGMT: polled server for ready tasks: #0
### task-id=0, transferred=1813byte(s)
T   moose:/crum/juaaa/ada.file/afile -> /var/tmp/afile
			

Retrieve multiple files from MOOSE using a wild card:

$ moo get -v moose:crum/juaaa/ada.file/* .
### get, command-id=18710, estimated-cost=25839byte(s), files=3, media=1
### task-id=0, estimated-cost=8613byte(s), resource=moose:/crum/juaaa/ada.file/file1
### task-id=1, estimated-cost=8613byte(s), resource=moose:/crum/juaaa/ada.file/file2
### task-id=2, estimated-cost=8613byte(s), resource=moose:/crum/juaaa/ada.file/file3
### task-id=0, transferred=8613byte(s)
T   moose:/crum/juaaa/ada.file/file1 -> /var/tmp/file1
### task-id=1, transferred=8613byte(s)
T   moose:/crum/juaaa/ada.file/file2 -> /var/tmp/file2
### task-id=2, transferred=8613byte(s)
T   moose:/crum/juaaa/ada.file/file3 -> /var/tmp/file3
			

Output

Each moo get command will output an informational message including the command-id and a measure of the cost of the command. Currently the cost is a combination of: the total size of the data to be transferred from the MOOSE system to the local machine, the number of files, and the number of distinct tape-media involved.

More information can be obtained using the -v or --verbose option. This causes output of a set of lines for each get task. A get task is the retrieval of an individual file from MOOSE. There is a task for every file specified on the command line (after the expansion of any wild cards). The first line includes an id for the task and the size of data to be transferred. Then follows a line giving information on remote and local locations of the files for this task. This line has a T at the beginning to indicate Transfer. In addition, information is displayed each time the client polls the server to see if the file is ready to fetch. The client will poll the server more times when the system is busy and your command is in a queue, or if the file is on tape and the storage system is having to fetch and mount the tape in a drive.

Large-retrieval option

By default, MOOSE checks that there is enough space at the destination to store all the files specified in the retrieval, and will abort the whole request if not.

If the --large-retrieval or -b option is specified, MOOSE will check that there is enough writable space at the specified destination only for the largest file to be retrieved. This is to allow users to issue large multi-file retrievals as part of processing suites, where files are processed and deleted as soon as they are available on the client. If this option is used, it is up to the user to manage client-side disc-space: MOOSE will not do this. If MOOSE finds that, at any time, it does not have enough space on the client to write a file, it will abandon the rest of the request as a user-error (RC=2).
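As a simplified sketch (sequential rather than truly concurrent, with process_file standing in for the suite's own hypothetical processing step), such a retrieval might be wrapped as follows:

```shell
#!/bin/sh
# Sketch of a processing-suite step built around moo get --large-retrieval (-b):
# retrieve many files with the relaxed space check, then process and delete
# each file to keep client-side disc usage low (which is the user's
# responsibility when -b is used). "process_file" is a hypothetical stand-in.
fetch_and_process() {
    workdir=$1; shift
    # Retrieve all requested files into the working directory.
    moo get -b "$@" "$workdir" || return $?
    # Process each retrieved file, deleting it as soon as it is done.
    for f in "$workdir"/*; do
        [ -f "$f" ] || continue
        process_file "$f" && rm -f -- "$f"
    done
}
```

A fully concurrent suite would instead process each file as it arrives during the transfer; this sequential version only illustrates the process-then-delete discipline the option assumes.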

Append option for get

On Met Office Linux desktop machines (where the external utility ncks is available), the --append option may be used, provided that the source file is NetCDF format contained in a .nc.file collection. In this case, if the destination file already exists, MOOSE will use the external utility ncks to append the source file to the destination. The destination file must therefore also be in NetCDF format. Any variables in the source file which do not exist in the destination file will be appended, whereas any variables which are already present in the destination file will be overwritten. For this reason, care must be taken when using this option, as data may be overwritten without warning.

If --append is used, and the destination file does not already exist, moo get will behave as if the option had not been used.

Since the internal mechanism for performing the append is similar to that used with the -c option, both -c and -a cannot be used in conjunction. (In this case, -c will be ignored). If filtering of the NetCDF data is required as well as -a, moo filter -a should be used.
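For example, appending a retrieved NetCDF file to an existing local file might look like this (hypothetical set and file names):

```
$ moo get -a moose:/crum/aaaaa/ama.nc/file1.nc ~/results/exists.nc
```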

Error

Error code 17, output file(s) already exist when using -i, --fill-gaps option: When get is used to retrieve only those files missing at the destination and it is determined that there are none missing, the command will exit with a return code of 17. This allows users who want to maintain a local cache to distinguish between this scenario and other errors.
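A cache-maintenance script can branch on this return code. The sketch below is a hypothetical wrapper (not part of MOOSE) that treats return code 17 from moo get -i as a successful no-op:

```shell
#!/bin/sh
# Hypothetical wrapper: refresh a local cache with "moo get -i", treating
# return code 17 ("no files were missing") as success rather than an error.
refresh_cache() {
    moo get -i "$@"
    rc=$?
    case "$rc" in
        0)  echo "retrieved missing files" ;;
        17) echo "cache already complete"; rc=0 ;;
        *)  echo "retrieval failed (rc=$rc)" >&2 ;;
    esac
    return "$rc"
}
```

Called as, for instance, refresh_cache moose:/crum/myset/ama.pp/'*' ~/cache, the wrapper succeeds both when files were fetched and when the cache was already up to date.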

Data-usage licences

Access is provided to users external to the Met Office only following receipt of undertakings that any data retrieved will be used only in accordance with standing Terms and Conditions. For that reason, retrievals to external systems will be accompanied by a text-file that points to the specific licence(s) under which the data have been made available. By default, this licence-file will be written to the same location (directory) as the retrieved data; users may override this by specifying an alternative (directory or complete pathname) using the "-L" option (which must take an argument). If the pathname does not exist, or is not writeable, the file will be placed instead in the user's home-directory. The user will be informed via the CLI if a licence-file has been created.


help: prints help for the moo command line

Usage:

moo help [SUB-COMMAND ...]
		

Prints help for the MOOSE command line interface. If no argument is specified, prints a listing of available sub-commands. If one or more arguments are specified and each is a valid sub-command, prints help for each specified sub-command.

When the back end storage is down moo help will continue to work.

Valid options:
-a, --all prints help information for all sub commands
-h, --help, --usage
  prints help
Alternative names: moo h, moo ?, moo --help, moo --usage

Examples

To print help on moo get:

moo help get
			

kill: sends a kill signal to running commands

Usage:

moo kill CMD-ID [CMD-ID ...]
		

Sends a kill signal to one or more running MOOSE transfer commands. The purpose of moo kill is to prematurely terminate the transfer commands in question. If put commands are killed, it is very likely that data associated with them will not be archived. This is the case even if the user view suggests that the put has completed. The reason for this is that ingestion into storage may happen some time after the user has received a return code for the transfer. It is also possible that some files associated with a killed command will be archived while others will not, depending on the state of the command when the kill signal is processed.

Valid options:
-h, --help, --usage
  prints help and exits
-q, --quiet decreases the verbosity
-v, --verbose increases the verbosity

Examples

To kill a command ID within MOOSE:

$ moo kill 45101

Alternatively, Ctrl-C can also be used to kill a running MOOSE command.


ls: prints a listing of directories/files in MASS

Usage:

moo ls [URI ...]
		

Prints a listing of the files and directories in one or more locations (URI) in the MASS system. The default listing simply shows files or directories within the URI. More information on the directories and files can be found by specifying options.

When the back end storage is down moo ls will fail.

Valid options:
-u, --atime, --access-time
  displays last-access rather than archival time
-d, --directory
  displays information for the selected directory
-h, --help, --usage
  prints help
-l, --long displays extra information in long format (see examples)
-f, --full displays even more information in extra-long format (see examples)
-R, -r, --recursive
  descends recursively, note recursion only works below the data set
-S, --size sorts by size
-t, --time, --otime
  sorts by time
-m, --media details of storage-media are included
-x, --xml displays information in XML
-p, --page displays only the specified pages of results. A range of pages may be requested, separated by a hyphen. A page size may be specified, separated from the page range by a colon.
-v, --view-metadata
  displays all textual metadata associated with data set and data collection nodes, ie categories and tags.
-a, --all additionally displays all data which have been safely deleted using the moo dispose command. The URIs of such data will be prefixed with the string "D* ".
See the moo chmeta command specification below for information on how to apply categories and text tags to sets and collections. If -v is used with ls, each set node in the output will also display its category (possibly UNCATEGORISED), followed by a list of all text tags. Each collection node in the output will display a list of any associated text tags. The recommended way to use -v is in combination with -x, as this produces more easily readable output.

Alternative names: moo list

Examples

To list the data classes available from within MOOSE:

$ moo ls moose:
moose:/adhoc
moose:/crum
moose:/devfc
moose:/opfc
			

To list the directories and files within your unstructured data set:

$ moo ls moose:adhoc/users/jules.charney
moose:/adhoc/users/jules.charney/mode1
moose:/adhoc/users/jules.charney/mode2
			

A long listing gives more information, including the owner, annual storage cost, size in bytes and write-timestamp (whitespace compressed for clarity):

$ moo ls -l moose:crum/auaaa/apm.pp
F carl.rossby    0.14 GBP   141520 2009-03-02 17:14:14 GMT moose:/crum/auaaa/ama.pp/file1.pp
F carl.rossby    0.14 GBP   141520 2009-03-02 17:14:14 GMT moose:/crum/auaaa/ama.pp/file2.pp
			

A full listing gives the information from a long listing plus additionally the archiving level:

$ moo ls -f moose:crum/auaaa/apm.pp
F carl.rossby    0.14 GBP   141520 2009-03-02 17:14:14 GMT Single-Copy moose:/crum/auaaa/ama.pp/file1.pp
F carl.rossby    0.14 GBP   141520 2009-03-02 17:14:14 GMT Single-Copy moose:/crum/auaaa/ama.pp/file2.pp
			

A listing with the media option:

$ moo ls -m :crum/molys/ama.pp
FD A0002700      123   45678    141520 moose:/crum/molys/ama.pp/acrjta.pyn1c10.pp
FD                  1801856 moose:/crum/molys/ama.pp/cb10ra@pmu49jn.pp
FD A0002700      765   32176    141520 moose:/crum/molys/ama.pp/file2000.pp
FD A0002700      432    5234    141520 moose:/crum/molys/ama.pp/file2001.pp
FT A0002700       13    2244    141520 moose:/crum/molys/ama.pp/file2002.pp
			

In the above example, D indicates that a copy of the file is available on disc-cache, whereas T indicates that the file is stored only on tape. The string following is the media-ID of the tape-volume. So the second file above has not yet been copied to tape-storage (since no media-ID is listed), and the last of the five is only available from tape-storage. The remaining 3 files have copies on both tape and disc-cache. If a disc-copy is available it will always be returned by a get request. Files that are only stored on tape (i.e. those that have been 'purged' from disc) will take longer to retrieve. The three numbers following the media-ID are the file's relative position on the tape-volume, its offset in bytes, and its size in bytes.

A long listing sorted on last-access time:

$ moo ls -ltu moose:crum/juaaa/apm.pp
F carl.rossby    0.14 GBP   141520 1970-01-01 00:00:00 GMT moose:/crum/juaaa/apm.pp/file1.pp
F carl.rossby    0.14 GBP   141520 1970-01-01 00:00:00 GMT moose:/crum/juaaa/apm.pp/file2.pp
			

A long listing with the directory option to display the details of a specified collection:

$ moo ls -ld moose:crum/auaaa/apm.pp
C carl.rossby    2.83 GBP   283040 2009-03-02 17:12:11 GMT moose:/crum/auaaa/apm.pp
			

Recursive listing:

$ moo ls -R moose:crum/juaaa
moose:/crum/juaaa/ada.file
moose:/crum/juaaa/ada.file/file1
moose:/crum/juaaa/ada.file/file2
moose:/crum/juaaa/ada.file/file3
			

Listing using xml format output:

$ moo ls -lx moose:crum/setmo/ama.pp
<?xml version="1.0"?>
<nodes>
  <node kind="F" url="moose:/crum/setmo/ama.pp/good.pp">
    <time>
      <access>2009-09-03 14:48:13 GMT</access>
      <create>2009-09-02 14:08:36 GMT</create>
      <modified>2009-09-03 14:09:07 GMT</modified>
      <put>2009-09-03 14:09:07 GMT</put>
      <managed>2009-09-02 17:14:14 GMT</managed>
    </time>
    <owner>my.name</owner>
    <size>141520</size>
    <cost>0.14</cost>
  </node>
  <node kind="F" url="moose:/crum/setmo/ama.pp/better.pp">
    <time>
      <access>1970-01-01 01:00:00 GMT</access>
      <create>2009-03-02 14:09:39 GMT</create>
      <modified>2009-03-03 14:09:48 GMT</modified>
      <put>2009-03-03 14:09:48 GMT</put>
      <managed>2009-03-02 17:14:14 GMT</managed>
    </time>
    <owner>my.name</owner>
    <size>141520</size>
    <cost>0.14</cost>
  </node>
</nodes>
			

Listing of a set node together with text metadata:

$ moo ls -vd moose:crum/setmo
moose:/crum/setmo                     UNCATEGORISED                   "A MOOSE tag"   "Another tag"
			

Output

Without any options the output from moo ls is simply a list of names of files or directories held within the URI. Each name is the full mooseURI that refers to the file/directory.

The output from the long listing includes information on the size, annual cost, timestamps and type within the MOOSE data framework. In addition, at the data set level and below, the owner of the data set is displayed. Please note that the output and options for long & media listings (and xml output) are likely to change. Please do not build utilities which rely on the output format without seeking further advice.

In a long listing the typical output is of the form:

type owner cost size time-stamp name
			

In a full listing the typical output is of the form:

type owner cost size time-stamp archiving-level name
			

The meaning of each of these columns, and the terms used in them, are described below.

The type refers to the type in the MOOSE data framework:

C - data collection
D - directory
F - standard file
S - data set
			

If the --media option is specified, information on storage-media is displayed:

type media-type media-id position offset size name
			

The media-type is only relevant for files and gives an indication of where the file is stored:

- - not relevant because not a file
D - disc, with or without a copy on tape
T - tape only
			

The media-id is the serial-no of the physical storage-volume if there is a copy of the file on tape. So if a file is marked with a D and a media-id, it has both a tape and disc copy. The disc copy will be used if you restore such a file as this is much faster.

The 'position' and 'offset' values are given only if there is a copy of the file on tape, and can be useful in establishing the degree of co-location achieved among related files.

Note that the 'owner' is not listed with the -m option.

The cost is an estimate of the cost of holding that data (a set, collection, or file) for one year. The current cost rate can be found using the si long command ( si -l).

Note that the cost is only listed as part of a long or full listing (ls -l or ls -f).

The size is the size in bytes of the data. For files this is the size of the file. For data sets and data collections it is the size of all the files within the data set or data collection. Other directory types report a size of 0.

The archiving-level is either Duplex if we hold two copies of the file or Single-Copy if we only hold one copy.

The name is the mooseURI of the directory/file.

In the case of a normal long or 'media' listing the time-stamp is the last modification time. For files and plain directories this will be the time they were written to MOOSE. For data sets and data collections, this will be the last time data within them were archived, deleted or moved. If the -u option has been specified then the time-stamp is the time the file was last read. Data that have never been read from MOOSE will have the access time set at 1970-01-01 01:00:00 UTC. For data sets and data collections, the last access time is that of the latest access of a file within them, or the creation time if no access has occurred.

In the case of legacy data that were migrated from the previous MASS system the time displayed by the default long listing is the time the data were originally archived.

The xml option displays the same information in XML format, although with -xl the times of creation, modification and last-access are all included, and with -xm the output includes the owner, collocation parameters and a check-sum.

Note that, if more than one of the -m, -f or -l options are specified, only the last is effective: 'ls -lm' = 'ls -m', and 'ls -mlf' = 'ls -f'.

If the --page option is specified, only given pages of results will be displayed. The format of this option is '-p=<lower-page>-<upper-page>:<page-size>'. Either the upper-page or page-size (or both) may be omitted. '-p=0:<page-size>' causes the output of all pages, but using the specified page-size instead of the default (if smaller).

Examples:

$ moo ls -p=1:500 moose:crum/auaaa/apm.pp
    Displays the first 500 entries in the collection.

$ moo ls -p=2-4:500 moose:crum/auaaa/apm.pp
    Displays pages 2, 3 and 4 within the collection, ie entries 501 to 2000.

$ moo ls -p=7-8 moose:crum/auaaa/apm.pp
    Displays pages 7 and 8, with the default page size of 50000, ie entries 300001 to 400000.

$ moo ls -p=0:2000 moose:crum/auaaa/apm.pp
    Displays all the entries in the collection, in batches of 2000.
			

Due to limited available memory on the MOOSE client, listing results which exceed a specified threshold (currently set at 50000 nodes) are split up and returned to the client in batches for display. This behaviour only occurs if the request involves a single URI. If multiple URIs are specified, the user will receive an error message stating that the node limit has been exceeded, and they may then resubmit the request with one URI at a time. When the results are sent in batches, it is not possible to correctly order them by size or timestamp, and hence ls --time or ls --size will fail if the limit is exceeded. Again, the request may be resubmitted without the sorting option.

Alternatively, users can access large listing results a page (or several pages) at a time, using the syntax given above. Note that if a page number beyond the final page is given, no results will be returned; however, the exit code will be zero. Nodes beneath a given directory are displayed in alphabetical order, with (for recursive listings) nodes immediately beneath a directory displayed immediately following that directory. Hence it is guaranteed that, by submitting a series of commands with '-p=1', '-p=2' and so on, each node entry which would have been returned without the paging option will eventually be displayed exactly once. Note also that it is possible to page recursive listings in this way. This is not recommended for large recursive listings, however, as performance is expected to be poor (recursive listings require a call to storage to determine if a given node is a file or directory, regardless of whether the node is due to be displayed within the requested page). Finally, if the paging option is used with more than one target URI, the requested page(s) will be displayed for each URI.
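The entry range covered by a page range follows directly from the page size: entries (lower-1)*size+1 through upper*size. A small hypothetical helper (not part of MOOSE) reproducing the arithmetic of the paging examples above:

```shell
#!/bin/sh
# Hypothetical helper: compute the entry range covered by '-p=LOWER-UPPER:SIZE'.
# Pages are numbered from 1; the default page size is 50000.
page_entries() {
    lower=$1; upper=$2; size=${3:-50000}
    first=$(( (lower - 1) * size + 1 ))
    last=$(( upper * size ))
    echo "${first}-${last}"
}
```

For instance, page_entries 2 4 500 prints 501-2000, and page_entries 7 8 prints 300001-400000, matching the paging examples above.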


mdls: prints a listing of the data-atoms in a structured data-collection

Usage:

moo mdls [QUERY-FILE] URI [URI ...]
		

Prints a listing of the data-atoms, selected according to the criteria in the QUERY-FILE, from a number of data collections (URIs) in the MASS system. Alternatively, prints all of the distinct values of a given PP attribute.

Valid options:
-h, --help, --usage
  prints help
-b BATCH-SIZE, --batchsize=BATCH-SIZE
  overrides the default output batch-size
-t, --tabulate displays information in tabulated format
-x, --xml displays information in XML
-s SORT-PARAMETER, --sort=SORT-PARAMETER
  sorts output files according to the given parameter
-a ATTRIBUTE, --attribute=ATTRIBUTE
  prints all of the distinct values of the specified attribute found in the target collection
-T FILE-TYPE, --file-type=FILE-TYPE
  specifies the file type to run the query against, in the case where it cannot be inferred
-S, --summary produces a summary of distinct attribute values matching a query

The syntax of the QUERY-FILE is as described in the section Record-level retrieval: Query syntax. The limit on the size of this file is identical to the limit for the select command. This will return a listing of the default set of attributes (see below). Alternatively, a list of up to (currently) 5 attributes may be included in the query-file, and the values of those attributes will be listed instead. Each attribute must be on a separate line, and the list must be enclosed within begin_attributes and end_attributes tags. This list may appear anywhere in the query file, but may not be contained within another query block. As with the QUERY-FILE syntax, blank lines, whitespace and comments (beginning with the '#' character) will be ignored.

Default attributes listed:

PP Collections:

pp_file lbfc stash lbft t1 t2
		

Tar Collections:

tar_filename filename
		

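The query-file layout described above (a begin/end block of criteria plus an optional begin_attributes/end_attributes block of up to 5 attributes) can be generated programmatically. A minimal sketch, assuming an invented `build_query` helper that is not part of MOOSE:

```python
# Sketch: build the text of an mdls/select query-file with an optional
# attributes block (at most 5 attributes, per the limit described above).

def build_query(criteria, attributes=None):
    lines = ["begin"]
    lines += ["    %s" % c for c in criteria]
    lines.append("end")
    if attributes:
        if len(attributes) > 5:
            raise ValueError("at most 5 attributes may be listed")
        lines.append("begin_attributes")
        lines += ["    %s" % a for a in attributes]
        lines.append("end_attributes")
    return "\n".join(lines) + "\n"

text = build_query(["lblev=850"], ["pp_file", "stash", "lbproc", "lbsrce"])
print(text)
```

Writing `text` to a local file then produces a query-file suitable for passing to moo mdls.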
Any queryable tar or PP attribute may be used as a parameter for the --sort option. Whenever -s is used, only the order of the files in the output will change, and not the atoms within those files. If the sort parameter is a file attribute (eg pp_filename, file_start_date), the output files will be from lowest to highest value. If it is an atom attribute (eg T1, lblev), the files will be ordered according to the lowest value appearing in any matching atom within the file. Note that if the query is altered so that different atoms within the same set of files are matched, this could affect the sort order.
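The atom-attribute sort rule above (files ordered by the lowest value among their matching atoms, so a changed query can change the order) can be sketched as follows. The file names, attribute values and `sort_files` helper are invented:

```python
# Sketch of --sort semantics for an atom attribute: each file's sort key
# is the minimum value of that attribute over the file's *matching* atoms.

files = {
    "file1.pp": [{"lblev": 850, "t1": 12}, {"lblev": 850, "t1": 3}],
    "file2.pp": [{"lblev": 850, "t1": 7}],
}

def sort_files(files, attr):
    return sorted(files, key=lambda f: min(a[attr] for a in files[f]))

# file1.pp sorts first: its lowest t1 (3) is below file2.pp's (7).
print(sort_files(files, "t1"))
```

If the query were narrowed so that file1.pp's t1=3 atom no longer matched, its key would become 12 and the order would reverse, which is the behaviour the text warns about.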

The query file may be replaced with use of the -a option. In this case, rather than displaying matching files and atoms, a complete set of all distinct values of the given attribute for some atom within a file in the collection will be displayed. This option is intended to answer questions like: what is the complete list of stash codes present within this collection? Although this information could previously have been obtained by providing a query file which would match all atoms, and including the desired attribute in the list of attributes for display, this option provides a more efficient way of obtaining this information, with more concise output. Only certain PP attributes are available as arguments to this option. Any file or header attribute which does not require MOOSE to perform calculations to obtain the values may be requested. Specifically, the only attributes which are not available for use with this option are: lbrsvd4, lbuser6, meaning_period, forecast_delta, lbft, max_lbft, T1, T2 and components thereof (ie year, yeard etc). When this option is used, attribute values are automatically sorted, and thus the --sort option has no effect.

The --summary option may be used in conjunction with a query file. In this case, a summary of attribute values will be returned, where each distinct attribute value in some header (or file) matching the query will only be displayed once. This is useful for answering questions like: which values of lbproc are associated with stash code 3226? The attributes required for display may be specified in a begin/end_attributes block as described above. Alternatively, the default set of attributes will be displayed. The summary option may be used in conjunction with the xml option for xml-formatted output.

Note that, as with moo select, it is now possible to query multiple PP or tar collections with the same command. To do this, multiple (possibly wildcarded) URIs may be given, provided that they expand to collections or parents of collections within the same set. In particular, a set URI may be supplied. While previously MOOSE was able to deduce the file type being queried from the collection name (either PP or tar), it is now necessary to specify the file type using the -T option if it is not apparent from all supplied paths. MOOSE will search for all collections beneath the specified paths which are of the given file type (after expanding any wildcards), and apply the query to these collections.

Examples

To list the default attributes of data-atoms (i.e. PP fields) in data collection moose:/crum/myset/ama.pp having a value of lblev = 850, first construct a query-file containing:

begin
    lblev=850
end
			

then issue the following MOOSE command:

$ moo mdls myquery moose:/crum/myset/ama.pp
			

To list, e.g., the values of attributes pp_file, stash, lbproc, lbsrce instead of the defaults, the query-file should read:

begin
    lblev=850
end

begin_attributes
    pp_file
    stash
    lbproc
    lbsrce
end_attributes
			

To sort the output according to the lowest T2 value within each file, run:

$ moo mdls --sort=T2 myquery moose:/crum/myset/ama.pp

To obtain all atoms in PP files lying anywhere within myset matching the query, run:

$ moo mdls -T=pp myquery moose:/crum/myset
			

To view all distinct stash codes present within a collection:

$ moo mdls --attribute=stash moose:/crum/myset/ama.pp
			

To view only the distinct attribute values matching myquery:

$ moo mdls --summary moose:/crum/myset/ama.pp
			

Issues and Limitations

moo mdls (and moo select) queries that are particularly vague can result in requests to the catalogue database that would return huge recordsets if no limit were imposed, and could overwhelm the MOOSE server. Accordingly, commands of these types will be rejected if MOOSE determines that the database-query would exceed a blanket limit, currently 1 million rows.

Queries that resolve to a large number of atoms can also overwhelm memory on the client. If this occurs, try using the -b option to reduce the output batch-size from the default (100,000).


prls: prints a listing of all the projects the user is associated with in MOOSE

Usage:

moo prls
		

Prints a listing of projects the user is associated with in the MASS system.

Valid options:
-h, --help, --usage
  prints help
-l, --long includes details of project-name, owner and description of the project.
The default listing simply shows the names of projects. More information on the projects can be obtained by specifying the --long option. The list is returned only if the user is associated (as member or owner) with one or more projects in MOOSE.

Alternative names:
moo projectlist, moo projlist
		

Examples

To list associated projects in MOOSE use:

$ moo prls
monsoon-project
			

A long listing gives more information:

$ moo prls -l
monsoon-project       - carl.rossby       A description of the project
			

projinfo: displays information about projects in MOOSE

Usage:

moo projinfo PROJECT [PROJECT ...]
		

Displays metadata about the specified projects. The PROJECT(s) must refer to existing projects in the MOOSE database.

Valid options:
-h, --help, --usage
  prints help
-x, --xml displays output in xml format
-m, --members additionally display username and email address of project members (and project owner)
-l, --long additionally display sets owned by the project, and project access rules
The information displayed will consist of the name of the project, the project owner, the project type, and a description of the project, in this order. Any user may use this command, for any project(s), even if they are neither the owner nor a member of the project. If the --long option is used, a list of sets associated with the project is displayed, together with access rules for the project, governing which MOOSE URIs are available for reading, writing and data set creation. For each access rule, the rule name, allowed path and access type are displayed.

Examples

To view information for project project-monsoon use:
$ moo projinfo project-monsoon
project-monsoon        carl.rossby       typeA           A description of the project
			

To view the above information in xml format (including project members, associated sets and access rules):

$ moo projinfo -xlm project-monsoon
<?xml version="1.0"?>
<projects>
  <project name="project-monsoon">
    <owner>
      <username>carl.rossby</username>
      <email>carl.rossby@metoffice.gov.uk</email>
    </owner>
    <type>typeA</type>
    <description>A description of the project</description>
    <members>
      <member>
        <username>project.member</username>
        <email>project.member@metoffice.gov.uk</email>
      </member>
    </members>
    <sets>
      <set>moose:/crum/setaa</set>
    </sets>
  </project>
</projects>
			

To view the project members, associated sets, and access rules along with the above information:

$ moo projinfo -ml project-monsoon
project-monsoon        carl.rossby       typeA           A description of the project
  Owner: carl.rossby
         carl.rossby@metoffice.gov.uk
  Members:
         project.member
         project.member@metoffice.gov.uk
  Associated sets:
         moose:/crum/setaa
  Access rules:
         An access rule
         moose:/crum/s
         createreadwrite
			

quality: Manages quality records for MOOSE URIs

Usage:

moo quality [RECORD-ID][RECORD-ID ...] [URI][URI ...]
		

Adds, maintains or retrieves quality records associated with data sets, data collections, data directories or data files.

Valid command-type options:
-o, --output displays quality records and links on the command line
-e LOCAL-DIR, --export=LOCAL-DIR
  saves quality records and URI links to local files
Other options:
-n, --dry-run tries running but does not alter anything
-q, --quiet decreases the verbosity
-v, --verbose increases the verbosity
-p, --project specifies a project associated with the quality record(s)
-c, --children outputs or exports records associated with child URIs

In order to run moo quality, exactly one of the command-type options above must be supplied. Quality records are supplied to MOOSE in the form of xml documents. These documents are parsed by MOOSE and stored in the database. Quality records are based on a subset of the METAFOR data model, with some alterations, and come in two forms: quality issue records and quality assessment records. An example xml record of each type is given below.

The arguments accepted by moo quality depend on which command-type option is used. In all cases, arguments will be either MOOSE URIs or ids of existing quality records. Some command types require a record id followed by URIs. In the case of --output and --export, arguments may be either a list of URIs, or a list of quality record ids (but not both). Whenever URI arguments are given, these are permitted to be wildcarded in the final node. Finally, --output and --export also allow no arguments to be given, provided the --project option is used (see below).

Permitted syntax for each of the command type options is as follows:

(Note that external users cannot add, modify or delete quality-records.)

export:

moo quality -e=LOCAL-DIR URI [URI ...]

moo quality -e=LOCAL-DIR RECORD-ID [RECORD-ID ...]

moo quality -e=LOCAL-DIR -p=PROJECT

output:

moo quality -o URI [URI ...]

moo quality -o RECORD-ID [RECORD-ID ...]

moo quality -o -p=PROJECT

Outputting/exporting records

In order to output or export quality record metadata, the user must have read access to all supplied URIs (in the case where URI arguments are given), have read access for all linked URIs (in the case where record ids are given), or be a member of the specified project(s) (in the case where only project names are given).

In the first case, records associated with the given URIs or their parents will be retrieved. The exact URI(s) possessing the link will also be shown. If --children is also used, any records associated with some child URI of the supplied URIs will also be returned. If the --project option is not used, only global records will be returned. If --project=PROJECT is given, PROJECT should then be a (possibly wildcarded) string referencing existing project names. Project-specific quality records associated with the given URIs will then be returned. If project-specific records for any and all projects are required, the user should use --project="*".

In the second case (where the user requires specific records to be output/exported), quality record id arguments must be given, and the user must have read-access for all linked URIs of the records. The xml content of these records will then be returned, together with all linked URIs.

In the third case (where -p is used but no other arguments given), all quality records associated with the given project(s) will be returned, together with all linked URIs. In this case, the PROJECT name may not be wildcarded.

In all of the above output/export cases, if the returned records are project-specific, the project they are associated with will be returned in the output.

In the case of --export, LOCAL-DIR must be the full path to an existing local directory where the records will be exported to. Within this directory, a file for each quality record being exported will be created. The filenames will be the numerical record id followed by .xml, and the contents will be the xml record content. Additionally, an index file will be created, containing a section for each exported record, displaying the record id, associated project (if any), and linked URIs. The name of this index file will be quality_records_index_[CMD-ID] where CMD-ID is the command id of the moo quality command producing the file. Using the command id in the filename means that index files will not be overwritten by MOOSE. Quality records, however, may be overwritten by subsequent invocations, but will contain the same content unless the record has been modified since the earlier command.
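The export-file naming scheme above can be sketched as follows. The record ids and command id are taken from the examples later in this section; the `export_names` helper is invented:

```python
# Sketch of the --export naming rules: one <record-id>.xml file per
# exported record, plus an index file keyed on the command id so that a
# later export cannot overwrite an earlier index file.

def export_names(record_ids, cmd_id):
    record_files = ["%d.xml" % rid for rid in record_ids]
    index_file = "quality_records_index_%d" % cmd_id
    return record_files, index_file

files, index = export_names([103, 104], 46587607)
print(files)   # ['103.xml', '104.xml']
print(index)   # quality_records_index_46587607
```

Note that, as the text states, the per-record files (unlike the index file) can be overwritten by a subsequent export of the same records.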

XML specification

Below is an example quality issue record:

<?xml version="1.0"?>
<Quality>
    <DataReference>
        <runId>kaatz</runId>
    </DataReference>
    <QualityIssue>
        <issueType>data_content</issueType>
        <issueStatus>reported</issueStatus>
        <issueSeverity>minor</issueSeverity>
        <issueDateTime>2008-09-10T12:34:00</issueDateTime>
        <updateDateTime>2008-09-12T15:42:00</updateDateTime>
        <IdentifiedBy>
            <User>
                <name>Joe Bloggs</name>
                <mooseUsername>joe.bloggs</mooseUsername>
                <username>hadjb</username>
                <email>hadjb@metoffice.gov.uk</email>
                <tel>1234</tel>
            </User>
        </IdentifiedBy>
        <UpdateBy>
            <User>
                <name>Fred Bloggs</name>
                <mooseUsername>fred.bloggs</mooseUsername>
                <username>hadfb</username>
                <email>hadfb@metoffice.gov.uk</email>
                <tel>5678</tel>
            </User>
        </UpdateBy>
        <issueDescription url="http://metoffice.gov.uk/issues/27">The description of the issue</issueDescription>
        <QualityDetail id="detail1">
            <detailDateTime>2009-04-07T11:16:00</detailDateTime>
            <DetailBy>
                <User>
                    <name>Joe Bloggs</name>
                    <mooseUsername>joe.bloggs</mooseUsername>
                    <username>hadjb</username>
                    <email>hadjb@metoffice.gov.uk</email>
                    <tel>1234</tel>
                </User>
            </DetailBy>
            <detailParameter>myParameter</detailParameter>
            <detailValue>myValue</detailValue>
            <detailIssue>The issue this detail refers to</detailIssue>
            <detailDescription url="http://metoffice.gov.uk/issues/27/detail1">Description of the detail element with id 1</detailDescription>
        </QualityDetail>
        <QualityDetail id="detail2">
            <detailDateTime>2009-06-15T09:32:00</detailDateTime>
            <DetailBy>
                <User>
                    <name>Joe Bloggs</name>
                    <mooseUsername>joe.bloggs</mooseUsername>
                    <username>hadjb</username>
                    <email>hadjb@metoffice.gov.uk</email>
                    <tel>1234</tel>
                </User>
            </DetailBy>
            <detailParameter>anotherParameter</detailParameter>
            <detailValue>anotherValue</detailValue>
            <detailIssue>The issue this detail refers to</detailIssue>
            <detailDescription url="http://metoffice.gov.uk/issues/27/detail2">Description of the detail element with id 2</detailDescription>
        </QualityDetail>
        <QualityResolution>
            <resolutionDateTime>2009-03-12T09:28:00</resolutionDateTime>
            <ResolvedBy>
                <User>
                    <name>Joe Bloggs</name>
                    <mooseUsername>joe.bloggs</mooseUsername>
                    <username>hadjb</username>
                    <email>hadjb@metoffice.gov.uk</email>
                    <tel>1234</tel>
                </User>
            </ResolvedBy>
            <resolutionDescription url="http://metoffice.gov.uk/issues/27/resolution">The description of the resolution</resolutionDescription>
        </QualityResolution>
    </QualityIssue>
</Quality>
			

Below is an example quality assessment record:

<?xml version="1.0"?>
<Quality>
    <DataReference>
        <runId>kaatz</runId>
    </DataReference>
    <QualityAssessment>
        <measure>Hadley Stability Check</measure>
        <MeasureIdentification>
            <authority>ResponsibleParty Information for Hadley Centre</authority>
            <code>Check 265</code>
        </MeasureIdentification>
        <measureDescription url="http://metoffice.gov.uk/measures/142">A description of the measure</measureDescription>
        <evaluationMethodType>Automated Standard QC</evaluationMethodType>
        <evaluationMethodDescription url="http://metoffice.gov.uk/methods/142">A description of method or reference</evaluationMethodDescription>
        <evaluationProcedure url="http://metoffice.gov.uk/procedures/142">A description of procedure or reference</evaluationProcedure>
        <assessmentDateTime>2008-09-10T12:34:00</assessmentDateTime>
        <EvaluationBy>
            <User>
                <name>Joe Bloggs</name>
                <mooseUsername>joe.bloggs</mooseUsername>
                <username>hadjb</username>
                <email>hadjb@metoffice.gov.uk</email>
                <tel>1234</tel>
            </User>
        </EvaluationBy>
        <ConformanceResult>
            <resultSpecification url="http://www.metoffice.gov.uk/TechnicalNotes/TCN345A.pdf">Specification of the result</resultSpecification>
            <resultExplanation>Simulation passes stability check with an index value of 104</resultExplanation>
            <resultPass>1</resultPass>
        </ConformanceResult>
    </QualityAssessment>
</Quality>
			

The examples given above populate every available field, for each of the two types of quality record. As stated earlier and as shown in the schema, the majority of elements (and attributes, if present) at any node level may be omitted. Exceptions to this are that either a <QualityIssue> or <QualityAssessment> must be present, and also any qualityUserContainer type (such as <IdentifiedBy>) must contain a <User> (which need not populate all fields). If a particular <User> element is not required, the containing element should be omitted. Finally, <QualityDetail> elements must populate the id attribute. As mentioned above, this id should be unique for each such element.

Also apparent from the schema is that elements beneath each node may appear in any order, with the exception of the root element, which must contain <DataReference> (if required) followed by <QualityIssue> or <QualityAssessment>, and the immediate child nodes beneath <QualityIssue> or <QualityAssessment>.

The <DataReference> element may appear in both types (issue or assessment) of quality record. It currently only permits one child element, namely <runId>. This need not be populated, as it will not be relevant to all types of data requiring quality records. It is likely that other fields will be added here in the future to suit other types of data.

The content of most elements is permitted to be an arbitrary string. The exceptions to this are: tags containing dateTime, whose contents must be timestamps in the standard xml schema format shown in the examples, and the elements <issueType>, <issueStatus> and <issueSeverity>, which are defined by enums shown in the schema. <QualityDetail> elements contain generic <detailParameter> and <detailValue> elements, which may be used to associate any number of parameter/value pairs with a quality issue record.
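A client-side check of the timestamp format is straightforward. This sketch validates only the exact form shown in the examples (YYYY-MM-DDThh:mm:ss); the full xml-schema dateTime type also allows fractional seconds and timezone designators, which are not shown here. The helper name is invented:

```python
# Sketch: validate that a dateTime element's content matches the
# timestamp form used in the example records above.
from datetime import datetime

def is_valid_quality_timestamp(text):
    try:
        datetime.strptime(text, "%Y-%m-%dT%H:%M:%S")
        return True
    except ValueError:
        return False

print(is_valid_quality_timestamp("2008-09-10T12:34:00"))  # True
print(is_valid_quality_timestamp("10/09/2008 12:34"))     # False
```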

Examples

To output global quality records associated with collection /setab/ama.pp or its parents:

$ moo quality -o moose:/crum/setab/ama.pp
quality-record-id=103
    moose:/crum/setab
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Quality>
...[more xml content]
</Quality>
			

To output project-specific quality records for any project, associated with collection setac/ama.pp, its parents, or its children:

$ moo quality -oc -p="*" moose:/crum/setac/ama.pp
quality-record-id=104 project=my_project
    moose:/crum/setac/ama.pp/myfile.pp
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Quality>
...[more xml content]
</Quality>
			

To output quality record with id 200 and its associated links:

$ moo quality -o 200
quality-record-id=200 project=another_project
    moose:/opfc/atm/postpro/prods/2012.pp/badfile.pp
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Quality>
...[more xml content]
</Quality>
			

To output all quality records associated with project my_project, and associated links:

$ moo quality -o -p="my_project"
quality-record-id=210 project=my_project
    moose:/crum/setad/apa.pp
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Quality>
...[more xml content]
</Quality>
			

To export global quality records associated with collection /setab/ama.pp or its parents to local directory export_dir (verbose output):

$ moo quality -e=export_dir -v moose:/crum/setab/ama.pp
### quality, command-id=46587607
quality-record-id=103
			

To export project-specific quality records for any project, associated with collection setac/ama.pp, its parents, or its children:

$ moo quality -e=export_dir -c -p="*" moose:/crum/setac/ama.pp
### quality, command-id=46587608
			

To export quality record with id 200 and its associated links:

$ moo quality -e=export_dir -v 200
### quality, command-id=46587608
quality-record-id=200
			

To export all quality records associated with project my_project, and associated links:

$ moo quality -e=export_dir -p="my_project"
### quality, command-id=46587609
quality-record-id=210
			

resume: resumes a suspended MOOSE command

Usage:

moo resume CMD-ID
		

Allows a previously issued retrieval command to be resumed after contact has been lost with the server. Only one retrieval command may be resumed by a single resume command. The resume command is valid only if issued by the user who submitted the suspended command. The ability to resume commands can be controlled globally by administrators.

If valid, and in a 'resumable' state, the command will be resumed in a way that honours its original options and arguments, including invoking client-side format-conversion.

Valid options:
-n, --dry-run, --dryrun
  checks the command can run, but does not proceed further
-h, --help, --usage
  prints help
-q, --quiet suppresses output to standard out
-v, --verbose displays more information on progress of the resumption.

Examples

Resume the command with ID 98765:
$ moo resume 98765

Restrictions

Please note that retrieval commands can only be resumed a finite number of times. In addition, please note that currently SELECT commands using the -C (condense) option cannot be resumed.

select: performs filtered retrieval from a data set

Usage:

moo select QUERY-FILE SRC-URI [SRC-URI...] DEST-PATH
		

Selects atoms from a single MOOSE data set, or from a number of data files in the case of NetCDF data.

When restoring multiple selected files, moo select also limits the total volume of data, the number of files transferred and the number of tape mounts (same as moo get). Requests that exceed these limits will fail and will need to be reorganised in order to retrieve smaller amounts of data with each command. These limits are subject to change and can be found using moo si -l. The size of the file is not limited when a single file is requested.

IMPORTANT NOTE: The use of moo select to filter NetCDF files is deprecated, and the facility is being replaced by the new moo filter command; see filter: Extracts defined variables from data files (NetCDF) .

Valid options:
--condense, -C condenses retrieved data into single file
--dry-run, --dryrun, -n
  tries running but does not alter anything
-i, --fill-gaps
  retrieves all target files within a list except those that already exist at destination
-k, --conversion-threads=MAX_CONVERSION_THREADS
  overrides the default setting for maximum concurrent client-side format-conversion threads
--force, -f allows overwriting the destination
--help, --usage, -h
  prints help and exits
-I, --fill-gaps-and-overwrite-smaller-files
  allows overwriting the destination if source is larger than destination file size
-c FORMAT, --local-file-format=FORMAT
  specifies the local file format for use in converting files
--quiet, -q decreases the verbosity
--verbose, -v increases the verbosity
-z, --compressed-transfer
  uses compression on transfer to reduce traffic, at the expense of processing
-a, --append appends the restored data to existing destination files (currently only valid for NetCDF data). See Append option for get for use of this option.
-b, --large-retrieval
  relaxes the check for sufficient client-side disc-space
-T FILE-TYPE, --file-type=FILE-TYPE
  specifies the file type to run the query against, in the case where it cannot be inferred
-L PATHNAME, --licence-file=PATHNAME
  overrides the default pathname for the file containing usage-licence details

Arguments:

QUERY-FILE: a text file, readable by the client, containing the selection criteria. See the section Record-level retrieval: Query syntax for details of the syntax. As described there, in the case of NetCDF data this file contains the options for ncks. There is a limit on the size of the query file, currently set to 32768 bytes (as of MOOSE release 6.3).

SRC-URI(s): one or more (possibly wildcarded) MOOSE data collections, or parents of collections, within a single set, currently of .pp or .tar format only, or multiple (possibly wildcarded) NetCDF data files.

DEST-PATH: a destination path that must be writeable by the client. The rules are exactly as specified in get: retrieves files from MASS.

Note that, while previously available for only a single data collection in the case of PP and tar, this command may now be applied to multiple collections within the same data set, or to parents of such collections (including the data set itself). While previously MOOSE was able to deduce the file type being queried from the collection name (either PP or tar), it is now necessary to specify the file type using the -T option if it is not apparent from all supplied paths. MOOSE will search for all collections beneath the specified paths which are of the given file type (after expanding any wildcards), and apply the query to these collections. In the case of condensed selects, if only a single collection is being queried, a file will be created at the destination with the name of the collection as before. If multiple collections are given, however, the name of the destination file will be that of the deepest common parent node of all queried collections (ie the name of the set in the case of crum data).
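The "deepest common parent" naming rule for condensed selects can be sketched as follows. The URIs and the `condensed_name` helper are illustrative only:

```python
# Sketch: a condensed select over one collection names the output after
# that collection; over several collections it uses the deepest common
# parent node of the queried URIs.
import posixpath

def condensed_name(uris):
    paths = [u[len("moose:"):] for u in uris]
    common = posixpath.commonpath(paths) if len(paths) > 1 else paths[0]
    return posixpath.basename(common)

print(condensed_name(["moose:/crum/runid/ama.pp"]))      # ama.pp
print(condensed_name(["moose:/crum/runid/ama.pp",
                      "moose:/crum/runid/amb.pp"]))      # runid
```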

Examples

In the first example, the data collection contains 5 tar-balls, 2 of which contain files with names that match the criteria.

Example 1: selection from a collection of tar-balls:

$ moo select ~/queries/nuquery moose:/misc/frasia/200911.tar ~/output
### select, command-id=65435, estimated-cost=987654byte(s), files=2, media=1
			

'~/output' must be a writeable directory, and on completion will contain 2 new tar-balls corresponding to those in the original collection that provide a match, but containing only the matching files.

If '~/output' does not exist, a file of that name will be created.

If '~/output' exists, and is a file, the command will fail with the error-code TSSC_ALREADY_EXISTS. Use the -f (force-overwrite) option and the file will be overwritten instead.

In the next 3 examples, assume that the target data collection contains 3 files (file1.pp, file2.pp, file3.pp), each of which contains 10 'records' or 'fields' that match the criteria in the query-file (~/queries/myquery).

Example 2: retrieve to individual output-files:

$ moo select ~/queries/myquery moose:/crum/runid/ama.pp ~/results
### select, command-id=65432, estimated-cost=1635536byte(s), files=3, media=1
			

The destination must be an existing, writeable directory. On successful completion, it will contain 3 output files (file1.pp, file2.pp, file3.pp) each of which contains just the 10 records from the corresponding original that match the supplied criteria.

Example 3: retrieve to a combined output-file:

$ moo select -C ~/queries/myquery moose:/crum/runid/ama.pp ~/results
### select, command-id=65433, estimated-cost=1635536byte(s), files=3, media=1
			

If '~/results' exists, it must be a writeable directory, in which case it will contain on completion a single file 'ama.pp', with a name derived from the original data collection.

If '~/results' does not exist, a file of that name will be created.

If '~/results' exists, and is a file, the command will fail with an error-code TSSC_ALREADY_EXISTS.

Example 4: retrieve to a combined output-file (forced-overwrite):

$ moo select -Cf ~/queries/myquery moose:/crum/runid/ama.pp ~/results
### select, command-id=65434, estimated-cost=1635536byte(s), files=3, media=1
			

If '~/results' exists, and is a file, it will be overwritten.

Example 5: retrieve to individual output-files from two collections:

$ moo select ~/queries/myquery moose:/crum/runid/ama.pp moose:/crum/runid/amb.pp ~/results
### select, command-id=65435, estimated-cost=1635536byte(s), files=3, media=1
			

Example 6: retrieve to individual output-files from PP collections matching the wildcard pattern (note file type specified):

$ moo select -T=pp ~/queries/myquery moose:/crum/runid/a* ~/results
### select, command-id=65436, estimated-cost=1635536byte(s), files=3, media=1
			

Example 7: retrieve to individual output-files from PP collections within the specified set (note file type specified):

$ moo select -T=pp ~/queries/myquery moose:/crum/runid ~/results
### select, command-id=65437, estimated-cost=1635536byte(s), files=3, media=1
			

Example 8: retrieve NetCDF filtered data to NetCDF-4 files generated by ncks (deprecated; use moo filter instead):

$ moo select ~/options/myncksoptions moose:/crum/aaaaa/ana.nc.file/file1.nc moose:/crum/aaaaa/ana.nc.file/file2.nc ~/results
### select, command-id=65438, estimated-cost=UNKNOWN, files=2, media=1
			

Note that the resulting file sizes are unknown in this last example, because the file is filtered by ncks during the data transfer process. It is therefore possible that the local destination may have insufficient space for the resulting data, in which case the transfer would fail. If the options file ~/options/myncksoptions is invalid (for example, if a variable is specified which does not exist in the source file), the error message from ncks will be displayed to the user, and the MOOSE command will fail with user error return code. See the section on filtered retrievals above, and ncks documentation, for valid options.

Large-retrieval option

By default, MOOSE checks that there is enough space at the destination to store all the files specified in the retrieval, and will abort the whole request if not.

If the --large-retrieval or -b option is specified, MOOSE will check that there is enough writable space at the specified destination only for the largest file to be retrieved. This is to allow users to issue large multi-file retrievals as part of processing suites, where files are processed and deleted as soon as they are available on the client. If this option is used, it is up to the user to manage client-side disc-space: MOOSE will not do this. If MOOSE finds that, at any time, it does not have enough space on the client to write a file, it will abandon the rest of the request as a user-error (RC=2).
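The process-and-delete pattern that --large-retrieval is intended for can be sketched as a companion process which consumes files as they land at the destination and frees the space immediately. The directory layout, `drain` helper and polling parameters are invented for illustration; real suites would run this alongside the moo select command:

```python
# Sketch: consume and delete retrieved files as they appear, so that a
# large multi-file retrieval never needs space for more than a few files
# at once on the client.
import os
import time

def drain(dest_dir, process_file, poll_seconds=0, max_idle_polls=1):
    """Process and delete files appearing in dest_dir until it stays empty."""
    idle = 0
    while idle <= max_idle_polls:
        names = sorted(os.listdir(dest_dir))
        if not names:
            idle += 1
            time.sleep(poll_seconds)
            continue
        idle = 0
        for name in names:
            path = os.path.join(dest_dir, name)
            process_file(path)
            os.remove(path)   # free client-side disc-space immediately
```

In a real suite the idle limit would be replaced by a check that the retrieval command has finished; remember that with -b, managing client-side disc-space in this way is entirely the user's responsibility.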

Output

Each moo select command will output an informational message including the command-id and a measure of the cost of the command. Currently the cost is a combination of: the total size of the data to be transferred from the MOOSE system to the local machine, the number of resulting files, and the number of distinct tape-media involved.

See also Data-usage licences.

Limits

moo select (and moo mdls) queries that are particularly vague can result in requests to the catalogue database that would return huge recordsets if no limit were imposed, and could overwhelm the MOOSE server. Accordingly, commands of these types will be rejected if MOOSE determines that the database-query would exceed a blanket limit, currently 1 million rows.

Error

Error code 17, output file(s) already exist when using the -i, --fill-gaps option: when filter is used with this option to retrieve only those files missing at the destination, and it is determined that none are missing, the command will exit with a return code of 17. This allows users who maintain a local cache to distinguish this scenario from other errors.
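A script maintaining a local cache can therefore treat return code 17 as "already complete" rather than as a failure. A minimal sketch of that branching follows; the filter arguments are illustrative only, and a stub stands in for the real moo client (here simply returning 17) so the logic is runnable anywhere.

```shell
# Stand-in stub for the real client (NOT part of MOOSE): pretend that
# nothing was missing at the destination, i.e. return code 17.
moo() { return 17; }

# Hypothetical cache refresh: -i/--fill-gaps fetches only missing files.
rc=0
moo filter -i ~/options/myoptions moose:/crum/runid/apa.pp ~/cache || rc=$?

case "$rc" in
  0)  echo "cache updated" ;;
  17) echo "cache already complete" ;;   # nothing missing: not an error here
  *)  echo "retrieval failed (rc=$rc)" >&2; exit "$rc" ;;
esac
```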


setinfo: prints information for one or more defined data-set paths

Usage:

moo setinfo URI [URI...]
		

Print information about one or more data sets within the MASS system. At least one argument must be supplied, and multiple arguments may be given. Each argument must be a valid set URI, and must exist within MASS to produce the expected output.

Valid options:
-h, --help, --usage
  prints help
-x, --xml displays results in xml format
The default listing shows the information for each set found from a given URI. The information displayed for each set is its URI, the username of its owner, the name of its parent project (if any), its total size in bytes, its category, any associated tags and comments, and its protection-level (plus any comment on the protection-level setting).

Alternative names:
moo seti
		

Examples

To list details of multiple sets

moo setinfo moose:/crum/prset moose:/crum/myset
Information for: moose:/crum/prset
------------------------------------
Owner: carl.rossby
Project: project-monsoon
Size: 2097152
Cost: 2.09GBP
Archiving Level: Duplex
Category: IPCCAR
Tags: "Some tag"
Comments: "This set has a comment"
Protection Level: Open

Information for: moose:/crum/myset
------------------------------------
Owner: carl.rossby
Size: 1048576
Cost: 1.04GBP
Archiving Level: Single-Copy
Category: UNCATEGORISED
Tags: "another tag"
Comments:
Protection Level: Managed
			

To list details of multiple sets using the xml option:

moo setinfo -x moose:/crum/prset moose:/crum/myset
<?xml version="1.0"?>
<sets>
  <set url="moose:/crum/prset">
    <owner>carl.rossby</owner>
    <project>project-monsoon</project>
    <size>2097152</size>
    <cost>2.09</cost>
    <archivingLevel>Duplex</archivingLevel>
    <category>IPCCAR</category>
    <tags>
      <tag>Some tag</tag>
    </tags>
    <comments>
    </comments>
    <protectionLevel>Open</protectionLevel>
  </set>
  <set url="moose:/crum/myset">
    <owner>carl.rossby</owner>
    <size>1048576</size>
    <cost>1.04</cost>
    <archivingLevel>Single-Copy</archivingLevel>
    <category>UNCATEGORISED</category>
    <tags>
      <tag>another tag</tag>
    </tags>
    <comments>
      <comment>Unreliable initialization</comment>
    </comments>
    <protectionLevel>Managed</protectionLevel>
  </set>
</sets>
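The plain listing is easy to consume from shell scripts, for example to read a set's size. A minimal sketch, in which a stub stands in for the real moo client (producing a few lines of output like the listing above) so the pattern can be run anywhere:

```shell
# Stand-in stub for the real client (NOT part of MOOSE): emit a few
# lines in the same format as the plain 'moo setinfo' listing.
moo() {
  printf 'Information for: moose:/crum/myset\n'
  printf 'Owner: carl.rossby\n'
  printf 'Size: 1048576\n'
}

# Pull the size in bytes of a set out of the plain listing.
size=$(moo setinfo moose:/crum/myset | awk -F': ' '/^Size:/ {print $2}')
echo "$size"    # 1048576
```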
			

si: displays information about the MOOSE hardware & software

Usage:

moo si
		

Prints some details of the hardware on which MOOSE is running, together with identification of the software releases installed and the availability of back-end storage.

Valid options:
-h, --help, --usage
  prints help
-l, --long displays information on the current user, client & server versions, and storage-availability. It also displays a selection of configuration values, particularly the limits on multi-file archivals & retrievals.
-x, --xml displays information in xml format
Alternative names:
moo sysinfo
moo info

Examples

Basic command:

$ moo si
Controller: expmoosetst01.metoffice.gov.uk
			

Note that there are multiple instances of the MOOSE controller on a number of servers, and the Controller displayed will depend on which server receives the request.

In verbose mode:

$ moo si -l
Moose User: a.user
Client: EXX30GSP0J; Revision: Rel_4.2.2
    Query-file size-limit (byte): 32768
Controller: expmoosetst01.metoffice.gov.uk; Revision: Rel_4.2.2
    PUT commands enabled: true
    GET commands enabled: true
    SELECT commands enabled: true
    MDLS commands enabled: true
    Multiple-put file-number limit: 300
    Multiple-put volume limit (MB): 245760
    Multiple-get file-number limit: 500
    Multiple-get volume limit (MB): 245760
    Multiple-get tape-number limit: 10
    Cost of storing one Terabyte for one year (GBP): 200
Storage: unavailable; Revision: not reported
			

To display system information in xml format:

$ moo si -xl
<?xml version="1.0"?>
<sysinfo>
  <user>admin</user>
  <component>
    <host>expmoosetst01.metoffice.gov.uk</host>
    <role>Client</role>
    <revision>V17003M</revision>
    <attributes>
      <attribute name="Query-file size-limit (byte)" value="32768" />
      <attribute name="Default max. conversion-threads" value="15" />
      <attribute name="Default max. transfer-threads" value="3" />
    </attributes>
  </component>
  <component>
    <host>expmoosetst01.metoffice.gov.uk</host>
    <role>Controller</role>
    <revision>Rel_4.2.2</revision>
    <attributes>
      <attribute name="PUT commands enabled" value="true" />
      <attribute name="GET commands enabled" value="true" />
      <attribute name="SELECT commands enabled" value="true" />
      <attribute name="MDLS commands enabled" value="true" />
      <attribute name="Multiple-put file-number limit" value="1000" />
      <attribute name="Multiple-put volume limit (MB)" value="512000" />
      <attribute name="Multiple-get file-number limit" value="1000" />
      <attribute name="Multiple-get volume limit (MB)" value="512000" />
      <attribute name="Multiple-get tape-number limit" value="20" />
      <attribute name="Cost of storing one Terabyte for one year (GBP)" value="119.0" />
    </attributes>
  </component>
  <component>
    <host>available</host>
    <role>Storage</role>
    <revision>not reported</revision>
    <attributes>
    </attributes>
  </component>
</sysinfo>
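Scripts can read the configured limits from the verbose listing before deciding how to batch a large request. A minimal sketch, in which a stub stands in for the real moo client (emitting two lines in the same format as the -l output above) so it can be run anywhere:

```shell
# Stand-in stub for the real client (NOT part of MOOSE): emit lines in
# the same format as the verbose 'moo si -l' listing.
moo() {
  printf 'Controller: expmoosetst01.metoffice.gov.uk; Revision: Rel_4.2.2\n'
  printf '    Multiple-get tape-number limit: 10\n'
}

# Read one configuration value out of the verbose listing.
limit=$(moo si -l | awk -F': ' '/Multiple-get tape-number limit/ {print $2}')
echo "$limit"    # 10
```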
			

Output

Basic output is simply the address of the MOOSE server (Controller) with which the client is communicating. With the -l option, the output identifies the current user and the node on which the MOOSE client is running, together with the release of the client software; it identifies the controller to which the client is connected, and its software revision; and it shows whether the controller can detect the back-end storage-system. Note that the revision-level of the storage-system is currently not reported.

In the second example, the underlying storage-system is not available.


test: checks existence and status of MOOSE URIs

Usage:

moo test URI [URI ...]
moo test LOCAL-FILE
		

Checks that the target URIs exist and, optionally, their type and whether or not they can be written to by the current user. Target URIs may be supplied on the command line, or may be contained in a file located at LOCAL-FILE. In the latter case, each URI must appear on a new line, and any lines beginning with the comment character '#' will be ignored.

Valid options:
-h, --help, --usage
  prints help
-s, --is-set checks if the target URI is an existing data set
-c, --is-collection
  checks if the target URI is an existing data collection
-d, --is-directory
  checks if the target URI is an existing directory
-f, --is-file checks if the target URI is an existing file
-t DATA-SET-TYPE, --is-type=DATA-SET-TYPE
  checks if the target URI is an existing data set of the specified type
-w, --is-writable
  checks if the target URI exists and can be written to by the current user
-v, --verbose increases the verbosity

Note that the options -s, -c and -f are mutually exclusive, -f and -d are mutually exclusive, and -d returns true whenever -s or -c return true. If more than one of -s, -c, -f or -d is specified in the command, all but the last are ignored, e.g.:

moo test -sdc moose:/crum/myset
		

is equivalent to:

moo test -c moose:/crum/myset
		

Other combinations of options are effectively 'ANDed' together.

Note also that option -t implies option -s: the target must exist, and be a data set, as well as being of the specified set-type.

Other than -v, option -w is the only one that can sensibly be used in conjunction with any of the other options:

moo test -sw moose:/crum/myset
		

asks: "is crum/myset an existing set, and can I write to it?"

Examples

Does the target URI exist in the archive?:

$ moo test moose:crum/myset
true
			

Do the target URIs exist in the archive?:

$ moo test moose:crum/setaa moose:crum/setbb
moose:/crum/setaa - true
moose:/crum/setbb - false
			

Given the following file resourcefile:

# My files:
moose:crum/setaa/ama.pp/ppfile.pp
moose:crum/setbb/ama.pp/ppfile.pp
moose:crum/setcc/ama.pp/*
			

Do the URIs contained in resourcefile exist?:

$ moo test resourcefile
moose:/crum/setaa/ama.pp/ppfile.pp - true
moose:/crum/setbb/ama.pp/ppfile.pp - false
moose:/crum/setcc/ama.pp/ppfile1.pp - true
moose:/crum/setcc/ama.pp/ppfile2.pp - true
			

Is the target URI an existing data set?:

$ moo test -s moose:adhoc/users/myself
false
			

Can I write to the target URI?:

$ moo test -w moose:adhoc/users/notmine
false
			

Is the target an existing file, and can I write to (i.e. overwrite) it?:

$ moo test -fw moose:crum/myset/ama.pp/afile.pp
true
			

Is the target an existing set of type migration?:

$ moo test -t migration moose:crum/migra
true
			

Output

Output to STDOUT for a single, non-wildcarded mooseURI is simply the word true if all of the specified options are satisfied simultaneously, or false if not. In this case, the output may be used unmodified as a boolean value in shell scripts. If the --verbose option is used, the mooseURI will first be printed, followed by a hyphen and the true or false value.

If multiple mooseURIs are supplied (either on the command line or within a local file), output will automatically be set to verbose if it has not been manually specified.

mooseURIs may contain wildcards. In this case, wildcards are expanded and a value is returned for each resulting node. If the wildcarded mooseURI matches no files or directories, the original wildcarded mooseURI will be displayed in the output, together with the false result.
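The single-URI true/false output can be consumed directly as a shell boolean. A minimal sketch of that pattern, in which a stub stands in for the real moo client (simply printing true) so the snippet can be run anywhere:

```shell
# Stand-in stub for the real client (NOT part of MOOSE): a single,
# non-wildcarded URI test prints just 'true' or 'false'.
moo() { echo true; }

# Use the output of 'moo test' directly in a shell condition.
if [ "$(moo test -w moose:/crum/myset)" = "true" ]; then
  echo "set is writable"
else
  echo "set is not writable"
fi
```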

In cases of normal output, regardless of the number of true or false values returned, the RC will be 0. If the request cannot be fulfilled, say because the archival system is unavailable, the RC will be 3, no response will be sent to STDOUT, and STDERR will receive output similar to:

test command-id=557 failed: (SSC_STORAGE_SYSTEM_UNAVAILABLE) storage system unavailable.
test: failed (3)
			

Issue

If the user does not have read or read-write permission on the set, the output will be false.


Generated on: 2016-11-25.

Still need help? Contact Us