Arrivals module
The ingest lib includes an arrivals module that lists files from the arrivals area or elsewhere. The list is used as input for ingest processes, for example by the ingester script.
Arrivals options are:
arrivals_dirs |
A space-separated list of directories to search for files to ingest. |
arrivals_users |
A space-separated list of usernames. Each username is mapped to a directory that is appended to the arrivals_dirs list. The directory for a user has the form /datacentre/arrivals/users/<username>/<streamname>. For example, if arrivals_users = spepler gparton and the stream name is xxx, then the directories searched are /datacentre/arrivals/users/spepler/xxx and /datacentre/arrivals/users/gparton/xxx. |
arrivals_wait |
Skip files that were modified less than this number of seconds ago. This is a simple way to avoid including files that are still being uploaded. Defaults to 500 (~8 mins). |
arrivals_maxfiles |
If the number of files exceeds this number, stop adding to the list. Defaults to 10000. |
arrivals_ignore_regex |
Skip files that match this regex. For example, arrivals_ignore_regex = \.tmp would ignore all files with .tmp extensions. |
arrivals_include_regex |
Only include files that match this regex. For example, arrivals_include_regex = \.nc would only include files with .nc extensions. |
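One way to read the options above is as a chain of filters applied to each candidate file. The following is a minimal sketch of that reading, not the ingest lib implementation. The function names expand_arrival_dirs and list_arrivals are hypothetical, and several behaviours are assumptions: directories are listed non-recursively, missing directories are skipped, the regexes are searched against the file name, and the list is capped at exactly arrivals_maxfiles entries.

```python
import os
import re
import time


def expand_arrival_dirs(arrivals_dirs, arrivals_users, stream_name,
                        users_root="/datacentre/arrivals/users"):
    """Return the directories to search (hypothetical helper).

    Both option values are space-separated strings, as in the config.
    """
    dirs = arrivals_dirs.split()
    for user in arrivals_users.split():
        dirs.append(f"{users_root}/{user}/{stream_name}")
    return dirs


def list_arrivals(dirs, wait=500, maxfiles=10000,
                  ignore_regex=None, include_regex=None, now=None):
    """List files in dirs that pass the arrivals filters (hypothetical helper)."""
    now = time.time() if now is None else now
    found = []
    for directory in dirs:
        if not os.path.isdir(directory):
            continue  # assumption: silently skip missing directories
        for entry in sorted(os.scandir(directory), key=lambda e: e.name):
            if not entry.is_file():
                continue
            # arrivals_wait: skip files modified too recently (may still be uploading)
            if now - entry.stat().st_mtime < wait:
                continue
            # assumption: regexes are searched against the file name only
            if ignore_regex and re.search(ignore_regex, entry.name):
                continue
            if include_regex and not re.search(include_regex, entry.name):
                continue
            found.append(entry.path)
            # arrivals_maxfiles: stop adding once the limit is reached (assumed cap)
            if len(found) >= maxfiles:
                return found
    return found
```

With the example values from the table, expand_arrival_dirs("", "spepler gparton", "xxx") returns the two directories /datacentre/arrivals/users/spepler/xxx and /datacentre/arrivals/users/gparton/xxx.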
Using Arrivals in scripts
There are two pertinent methods in the arrivals module: one makes the arrivals directories if they do not already exist, and the other makes the list of files. If the config contains this option:
[streamX]
arrivals_users = spepler
Then code using this would be of this form:
from ingest_lib import Arrivals

A = Arrivals()  # should pick up options from config
A.make_arrivals_dirs()  # make /datacentre/arrivals/users/spepler/streamX
file_listing = A.arrivals_files()
for file_path in file_listing:
    ...
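How Arrivals picks up its options from the config is not shown here. Purely as an illustration, the sketch below uses Python's standard configparser to read a stream section and apply the documented defaults (500 seconds and 10000 files). The helper name read_arrivals_options and the inline config text are assumptions, not part of ingest_lib.

```python
import configparser

# Inline stand-in for the real config file (assumption for this sketch).
CONFIG_TEXT = """
[streamX]
arrivals_users = spepler
"""


def read_arrivals_options(config, stream_name):
    """Collect the arrivals options for one stream section (hypothetical helper)."""
    section = config[stream_name]
    return {
        # space-separated lists become Python lists
        "arrivals_dirs": section.get("arrivals_dirs", fallback="").split(),
        "arrivals_users": section.get("arrivals_users", fallback="").split(),
        # documented defaults: 500 seconds and 10000 files
        "arrivals_wait": section.getint("arrivals_wait", fallback=500),
        "arrivals_maxfiles": section.getint("arrivals_maxfiles", fallback=10000),
        # regex options stay None when they are not set
        "arrivals_ignore_regex": section.get("arrivals_ignore_regex"),
        "arrivals_include_regex": section.get("arrivals_include_regex"),
    }


config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
options = read_arrivals_options(config, "streamX")
print(options["arrivals_users"])  # prints ['spepler']
```

Keeping the defaults in one place like this makes it easy to see which values apply when a stream section sets only some of the options.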