FileSet Allocation and other storage procedures
A storage fileset is an allocation of storage space for a chunk of the archive. It may house a particular dataset, a portion of a dataset, or even several datasets. A fileset may be manifested as a directory on a disk volume, a portion of a tape system, or a bucket in an object store. All these storage systems have practical limits on the volume of data that can be stored and the number of files they can contain. Regardless of logical dataset boundaries, it is important to divide the storage space into filesets so that storage operations, such as migration and audit, can be carried out. As storage media evolve, the practical limits of the storage may change, and any advice on filesets will need to evolve with them. Examples of current constraints are:
- Disk partition size
- Practical object store bucket sizes
- Number of objects handled by file metadata servers for parallel disk volumes
- Tape sizes
- Audit job run times
Filesets divide the storage in terms of volume and number of files. Each fileset has an estimate of its final size. If the dataset is going to grow indefinitely, estimate the size for 4 years ahead. This allows volume planning, which avoids unneeded migration actions. The fileset associates a hierarchical identifier for the logical dataset with a storage system. In its simplest form, this is a storage directory pointed to via a symbolic link:
/badc/cira -> /datacentre/archive/storage-234-cira
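The simplest form described above can be sketched in a few lines of Python 3. This is an illustration only, not the code used by the archive tools: the helper name `link_fileset` is hypothetical, and the example runs inside a temporary scratch directory rather than the real archive tree.

```python
import os
import tempfile


def link_fileset(logical_path, storage_dir):
    """Create storage_dir and point logical_path at it with a symbolic link."""
    os.makedirs(storage_dir, exist_ok=True)
    os.makedirs(os.path.dirname(logical_path), exist_ok=True)
    os.symlink(storage_dir, logical_path)
    # Return the link target so the caller can confirm where the data lives
    return os.readlink(logical_path)


# Demonstration in a scratch directory, mirroring the /badc/cira example
root = tempfile.mkdtemp()
logical = os.path.join(root, "badc", "cira")
storage = os.path.join(root, "datacentre", "archive", "storage-234-cira")
target = link_fileset(logical, storage)
print(logical, "->", target)
```

Users of the archive only ever see the logical path; the storage directory behind it can change during a migration without the logical path changing.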
Best practice for creating filesets
- Keeping a dataset together on storage media is preferable, as this leads to better retrieval performance and cleaner migration, audit and backup processes. For this reason it is best to allocate a single fileset per dataset and to break it down further only if it exceeds practical limits.
- The size of a single disk volume is currently the main limit on fileset volumes. To aid disk management, a fileset should not exceed 20% of the size of a partition; currently this gives a 40TB limit. If the archive is reaching capacity, it can be hard to find space on disk for large filesets, so splitting anything larger than 10TB is best practice. If space is tight, even smaller filesets may be required.
- File numbers are limited by the performance of parallel file system metadata servers, which can be overwhelmed if an audit requests all the files in quick succession. Best practice is to split filesets with more than 1M files.
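The thresholds in the list above can be expressed as a small check. This is a minimal sketch, not part of any archive tool: the function name `split_reasons` and the `archive_near_capacity` flag are hypothetical, and 1TB is assumed to mean 10**12 bytes.

```python
TB = 10**12  # ASSUMPTION: decimal terabytes

HARD_VOLUME_LIMIT = 40 * TB    # 20% of a partition, per the guidance above
TIGHT_VOLUME_LIMIT = 10 * TB   # applies when the archive is reaching capacity
MAX_FILES = 1000000            # metadata server / audit limit


def split_reasons(volume_bytes, n_files, archive_near_capacity=False):
    """Return a list of reasons why a proposed fileset should be split."""
    volume_limit = TIGHT_VOLUME_LIMIT if archive_near_capacity else HARD_VOLUME_LIMIT
    reasons = []
    if volume_bytes > volume_limit:
        reasons.append("volume exceeds %dTB" % (volume_limit // TB))
    if n_files > MAX_FILES:
        reasons.append("more than %d files" % MAX_FILES)
    return reasons


print(split_reasons(15 * TB, 200000))
print(split_reasons(15 * TB, 200000, archive_near_capacity=True))
```

An empty list means a single fileset for the dataset is acceptable under these limits.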
Partition administration should be done by the Storage Coordinator.
Partitions are disk storage volumes for the archive. Partitions are added through the admin interface of the cedaarchiveapp tool (http://cedaarchiveapp.ceda.ac.uk/admin/cedaarchiveapp/fileset/). The mount point is the directory under which the disk is mounted. The status field should initially be set to "Blank". The table below explains the meanings of the partition status field.
|Status|Meaning|
|---|---|
|Blank|A new partition with nothing on it. It is not yet available for use as archive storage.|
|Allocating|Moving a partition from Blank to Allocating marks the partition as able to accept new allocations of filesets.|
|Closed|Closed partitions are no longer available for new allocations. Partitions are marked as Closed once they are deemed to be fully allocated. Closed partitions will still be receiving data and, if the volume predictions are right, they will fill to around their capacity. If the predicted volume was overestimated, leaving spare space, Closed partitions may be marked as Allocating again.|
|Migrate|Partitions marked as Migrate have been scheduled for retirement. The data should be reallocated to "Allocating" partitions and then migrated. This state is now used in the migration workflow.|
|Retired|Partitions that have had all their data migrated are marked as Retired.|
In the allocation process it is assumed that the Storage Coordinator will maintain a reasonable number of "Allocating" partitions so that filesets can always be allocated.
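The status values above behave like a small state machine. The sketch below is one reading of the table, not the cedaarchiveapp implementation: the transition map is an assumption (in particular, the table does not say which states may move to Migrate), and the names `PartitionStatus`, `can_transition` and `accepts_new_filesets` are hypothetical.

```python
from enum import Enum


class PartitionStatus(Enum):
    BLANK = "Blank"
    ALLOCATING = "Allocating"
    CLOSED = "Closed"
    MIGRATE = "Migrate"
    RETIRED = "Retired"


S = PartitionStatus

# ASSUMPTION: transitions inferred from the status descriptions
ALLOWED_TRANSITIONS = {
    S.BLANK: {S.ALLOCATING},
    S.ALLOCATING: {S.CLOSED},
    S.CLOSED: {S.ALLOCATING, S.MIGRATE},  # may reopen, or be scheduled for retirement
    S.MIGRATE: {S.RETIRED},
    S.RETIRED: set(),
}


def can_transition(current, new):
    """True if moving a partition from current to new fits the map above."""
    return new in ALLOWED_TRANSITIONS[current]


def accepts_new_filesets(status):
    """Only Allocating partitions take new fileset allocations."""
    return status is S.ALLOCATING


print(can_transition(S.BLANK, S.ALLOCATING))
print(accepts_new_filesets(S.CLOSED))
```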
Adding new partitions
Go to the partition list. To add a partition, click the "add Partition" button.
An alternative to manually adding partitions is to write a Django script to add a large batch of partitions.
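One possible shape for such a batch script is sketched below. It is an illustration under stated assumptions, not a tested procedure: the `mountpoint` field appears in the migration script at the end of this page, but the `status` field name, the stored value "Blank", the helper name `add_partitions` and the example mount points are all assumptions. `get_or_create` is the standard Django manager method, so rerunning the script does not duplicate existing partitions.

```python
def add_partitions(partition_model, mountpoints, status="Blank"):
    """Create a partition record for each mount point that has none yet.

    partition_model is expected to be the cedaarchiveapp Partition model.
    ASSUMPTION: the model has a text field called `status`.
    Returns the mount points that were newly created.
    """
    created = []
    for mountpoint in mountpoints:
        # Look up by mountpoint; `defaults` is only applied when inserting
        _, was_created = partition_model.objects.get_or_create(
            mountpoint=mountpoint, defaults={"status": status}
        )
        if was_created:
            created.append(mountpoint)
    return created


# Hypothetical mount points, for illustration only
new_mountpoints = ["/datacentre/archvolX/example%02d" % n for n in range(1, 4)]

# On the cedaarchiveapp host, after django.setup():
#   from cedaarchiveapp.models import Partition
#   print(add_partitions(Partition, new_mountpoints))
```

Passing the model in as an argument keeps the helper importable without a configured Django environment, which makes it easier to try out before running it against the live database.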
Creating a FileSet from the admin interface
FileSet administration should be done by the Data Scientist (with guidance from the Storage Coordinator).
FileSets are added through the admin interface of the cedainfodb tool http://cedaarchiveapp.ceda.ac.uk/admin/login/?next=/admin/cedaarchiveapp/fileset/. To add a FileSet, click the "add fileset" button. The fileset is defined by the logical path and the size that it is estimated to grow to.
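For a dataset that grows indefinitely, the earlier guidance is to estimate the size 4 years ahead. A minimal sketch of that arithmetic follows, assuming roughly linear growth; the function name `estimate_final_size_gb` and the example figures are hypothetical, and the result is rounded up to whole gigabytes.

```python
import math


def estimate_final_size_gb(current_gb, growth_gb_per_year, years_ahead=4):
    """Estimate the fileset size in GB, assuming linear growth."""
    return int(math.ceil(current_gb + growth_gb_per_year * years_ahead))


# Example figures only: 1200GB held now, growing by about 350GB per year
size_gb = estimate_final_size_gb(1200, 350)
print("%dGB" % size_gb)
```

If growth is expected to accelerate, a larger estimate is the safer choice, since an underestimate is what leads to extra migration work later.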
Creating FileSets via ingest_lib function
    (ingest) [badc@ingest1 ~]$ python
    Python 2.7.15 |Anaconda, Inc.| (default, Oct 23 2018, 18:31:10)
    [GCC 7.3.0] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import ingest_lib
    >>> ingest_lib.make_fileset("/badc/.testing/testfileset2", "5GB")
    >>>
Creating via command line tool
    (ingest) [badc@ingest1 ~]$ make_fileset /badc/.testing/testfileset 4GB
    (ingest) [badc@ingest1 ~]$ ls -l /badc/.testing/testfileset/
    total 0
    (ingest) [badc@ingest1 ~]$ ls -l /badc/.testing/testfileset
    lrwxrwxrwx 1 badc badcint 57 Jan 14 10:45 /badc/.testing/testfileset -> /datacentre/archvol3/pan102/archive/spot-9739-testfileset
    (ingest) [badc@ingest1 ~]$
Scripting fileset and partition operations
The Storage Coordinator may need to perform various ad hoc operations to update the content of the cedaarchiveapp. If the operations apply to large batches or are repeated, the best approach may be to write a simple script. The scripts are run as badc on cedaarchiveapp.ceda.ac.uk. A typical script is shown below.
    import sys
    sys.path.append('/usr/local/cedaarchiveapp_site')

    import django
    django.setup()

    from cedaarchiveapp.models import FileSet, Partition

    # find relevant partitions
    qb208 = Partition.objects.get(mountpoint="/datacentre/archvol5/qb208")
    qb209 = Partition.objects.get(mountpoint="/datacentre/archvol5/qb209")
    fss = FileSet.objects.filter(migrate_to=qb208)

    print(qb208)
    print(qb209)
    print(fss)

    for fs in fss:
        fs.migrate_to = qb209
        fs.save()