Installation and Setup¶
If you’re using flux-python-api as part of the Carbon Data Explorer package, you can ignore these installation steps; the Carbon Data Explorer setup script will take care of installation.
Install Git, if need be:
sudo apt-get install git
And download the package to your preferred installation directory:
cd /where/my/repo/lives/
git clone git@github.com:arthur-e/flux-python-api.git
Installing within a Python Virtual Environment (recommended)¶
If you do not already have virtuelenv installed, see full installation instructions.
Briefly, using pip:
[sudo] pip install virtualenv
Then set up your virtualenv, e.g.:
VENV_DIR=/usr/local/pythonenv
virtualenv $VENV_DIR/flux-python-api-env
source $VENV_DIR/flux-python-api-env/bin/activate
python setup.py install
# ensure the virtualenv is accessible to non-root users
sudo chmod -R 775 $VENV_DIR/flux-python-api-env
And activate:
source $VENV_DIR/flux-python-api-env/bin/activate
While your shell is activated, run setup.py (do NOT run as sudo):
cd /where/my/repo/lives/flux-python-api
python setup.py install
Finally, deactivate the virtual environment:
deactivate
All done!
Installing without a Python Virtual Environment¶
If not using a python virtual environment, simply run setup.py
cd /where/my/repo/lives/flux-python-api
sudo python setup.py install
Adding “fluxpy” to the Python Path¶
echo “/where/my/repo/lives/flux-python-api/” > my_new_virtualenv/lib/python2.7/site-packages/fluxpy.pth
Loading data¶
Use the manage.py load utility to load flux data from files that are in Matlab (*.mat) or HDF5 (*.h5 or *.mat) format. A description of the required configuration file and examples are provided in the next sections.
Loading with manage.py load:
Usage:
manage.py load -p <filepath> -m <model> -n <collection_name> [OPTIONAL ARGS]
Required arguments:
-p, --path Directory path of input file in Matlab (*.mat)
or HDF5 (*.h5 or *.mat) format
-n, --collection_name Provide a unique name for the dataset by which
it will be identified in the MongoDB
-m, --model fluxpy/models.py model associated with the
input dataset
Optional arguments:
-c, --config_file Specify location of json config file. By
default, seeks input file w/ .json extension.
-o, --options Use to override specifications in the config file.
Syntax: -o "parameter1=value1;parameter2=value2;parameter3=value3"
e.g.: -o "title=MyData;gridres={'units':'degrees,'x':1.0,'y':1.0}"
The configuration file¶
This utility requires that the input *.h5 or *.mat file be accompanied by a JSON configuration file specifying required metadata parameters. By default, the utility will look for a *.json file with the same name as the data file, but you can specify an alternate location by using the -c option.
Configuration file parameter schema:
"columns": [String], // Array of well-known column identifiers, in order
// e.g. "x", "y", "value", "error"
"gridres": {
"units": String, // Grid cell units
"x": Number, // Grid cell resolution in the x direction
"y": Number // Grid cell resolution in the y direction
},
"header": [String], // Array of human-readable column headers, in order
"parameters": [String], // Array of well-known variable names e.g.
// "values", "value", "errors" or "error"
"span": String, // The length of time, as a Pandas "freq" code,
// that an observation spans
"step": Number, // The length of time, in seconds, between each
// observation to be imported
"timestamp": String, // An ISO 8601 timestamp for the first observation
"title": String, // Human-readable "pretty" name for the data set
"units": Object, // The measurement units, per parameter
"var_name": String // The name of the variable in the hierarchical
// file which stores the data
Contents of an example configuration file are shown here:
{
"columns": ["x","y"],
"gridres": {
"units": "degrees",
"x": 1.0,
"y": 1.0
},
"header": [
"lng",
"lat"
],
"parameters": ["values"],
"steps": [10800],
"timestamp": "2012-12-22T03:00:00",
"title": "Surface Carbon Flux",
"units": {
"x": "degrees",
"y": "degrees",
"values": "μmol/m²"
},
"var_name": "casa_gfed_2004"
}
manage.py load examples¶
Most basic example; assumes a configuration file exists at ./mydata/data_casa_gfed_3hrly.json:
$ python manage.py load -p ./data_casa_gfed.mat -m SpatioTemporalMatrix -n casa_gfed_2004
Specify an alternate config file to use:
$ python manage.py load -p ./data_casa_gfed.mat -m SpatioTemporalMatrix -n casa_gfed_2004 -c ./config/casa_gfed.json
In the following example, the loader will look for a config file at ./data_casa_gfed.json and overwrite the timestamp and var_name parameters in that file with those provided as command line args:
$ python manage.py load -p ./data_casa_gfed.mat -m SpatioTemporalMatrix -n casa_gfed_2004 -o "timestamp=2003-12-22T03:00:00;var_name=casa_gfed_2004"
Removing data¶
Use the manage.py remove utility to remove datasets from the database.
$ manage.py remove -n <collection_name>
Required argument:
-n, --collection_name Collection name to be removed (MongoDB identifier)
manage.py remove example¶
$ python manage.py remove -n casa_gfed_2004
Renaming datasets¶
Use the manage.py rename utility to rename datasets in the database.
Warning
It is important to use this utility for renaming rather than manually renaming datasets by interfacing directly with MongoDB because several metadata tables require corresponding updates.
$ manage.py rename -n <collection_name> -r <new_name>
Required arguments:
-n, --collection_name Name of the dataset name to be modified (MongoDB identifier)
-r, --new_name New name for the collection
manage.py rename example¶
$ python manage.py rename -n casa_gfed_2004 -r casa_2004
Database diagnostics¶
Use the manage.py db utility to get diagnostic information on database contents:
$ manage.py db [OPTIONAL ARGUMENTS]
Requires one of the following flags:
-l, --list_ids Lists collection names in the database.
Optional args with -l flag:
collections : lists collections
metadata: lists the collections w/ metadata entries
coord_index: lists the collections w/ coord_index entries
-n, --collection_name Collection name for which to shows metadata
-a, --audit No argument required. Performs audit of the
database, reporting any collections that are
missing corresponding metadata/coord_index
entries and any stale metadata/coord_index
entries without corresponding collections
Optional argument:
-x, --include_counts Include count of records within each listed
collection. Valid only with a corresponding
"-l collections" flag; ignored otherwise
manage.py db examples¶
List all collections and their number of records:
$ python manage.py db -l collections -x
List all the collections with metadata entries:
$ python manage.py db -l metadata
Show metadata for the collection with id “casa_gfed_2004”:
$ python manage.py db -n casa_gfed_2004
Audit the database:
$ python manage.py db -a