flux-python-api

Contents: Documentation on installing and using flux-python-api.

Installation and Setup

If you’re using flux-python-api as part of the Carbon Data Explorer package, you can ignore these installation steps; the Carbon Data Explorer setup script will take care of installation.

Install Git, if need be:

sudo apt-get install git

And download the package to your preferred installation directory:

cd /where/my/repo/lives/
git clone git@github.com:arthur-e/flux-python-api.git

Installing without a Python Virtual Environment

If you are not using a Python virtual environment, simply run setup.py:

cd /where/my/repo/lives/flux-python-api
sudo python setup.py install

Adding “fluxpy” to the Python Path

echo "/where/my/repo/lives/flux-python-api/" > my_new_virtualenv/lib/python2.7/site-packages/fluxpy.pth
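The path file works because Python's standard site module reads every *.pth file in a site-packages directory at startup and appends each listed directory to sys.path. The following is a minimal sketch of that mechanism, not part of flux-python-api; the two temporary directories are hypothetical stand-ins for the repository and for the virtual environment's site-packages directory.

```python
import os
import site
import sys
import tempfile

# Hypothetical stand-ins for the repository and the site-packages directory
repo_dir = tempfile.mkdtemp()
site_packages = tempfile.mkdtemp()

# Write a one-line path file, much as the echo command does
with open(os.path.join(site_packages, "fluxpy.pth"), "w") as handle:
    handle.write(repo_dir + "\n")

# addsitedir() processes every .pth file in the directory it is given;
# the interpreter does the same for site-packages at startup
site.addsitedir(site_packages)

path_added = repo_dir in sys.path
print(path_added)
```

Note that a listed directory is only added if it exists, so the path in fluxpy.pth should point at the actual clone location.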

Loading data

Use the manage.py load utility to load flux data from files that are in Matlab (*.mat) or HDF5 (*.h5 or *.mat) format. A description of the required configuration file and examples are provided in the next sections.

Loading with manage.py load:

Usage:
    manage.py load -p <filepath> -m <model> -n <collection_name> [OPTIONAL ARGS]

Required arguments:

    -p, --path               Directory path of input file in Matlab (*.mat)
                             or HDF5 (*.h5 or *.mat) format

    -n, --collection_name    Provide a unique name for the dataset by which
                             it will be identified in the MongoDB

    -m, --model              fluxpy/models.py model associated with the
                             input dataset

Optional arguments:

    -c, --config_file        Specify location of json config file. By
                             default, seeks input file w/ .json extension.

    -o, --options            Use to override specifications in the config file.
                             Syntax: -o "parameter1=value1;parameter2=value2;parameter3=value3"
                             e.g.: -o "title=MyData;gridres={'units':'degrees','x':1.0,'y':1.0}"
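To make the -o syntax concrete, here is a minimal sketch of how such a string could be split into override values. It is an illustration only and is not taken from manage.py: the function name parse_options is hypothetical, and the real utility may interpret values differently.

```python
import ast


def parse_options(option_string):
    """Parse 'key1=value1;key2=value2' into a dict (hypothetical helper)."""
    overrides = {}
    for pair in option_string.split(";"):
        if not pair.strip():
            continue
        # Split on the first "=" only, so values may themselves contain "="
        key, _, raw_value = pair.partition("=")
        raw_value = raw_value.strip()
        try:
            # Interpret numbers, lists and dicts written as Python literals
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            # Anything else (e.g. MyData) is kept as a plain string
            value = raw_value
        overrides[key.strip()] = value
    return overrides


overrides = parse_options(
    "title=MyData;gridres={'units':'degrees','x':1.0,'y':1.0}"
)
print(overrides["title"])
print(overrides["gridres"]["x"])
```

One limitation of this sketch: a value that itself contains a semicolon would be split incorrectly.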

The configuration file

This utility requires that the input *.h5 or *.mat file be accompanied by a JSON configuration file specifying required metadata parameters. By default, the utility will look for a *.json file with the same name as the data file, but you can specify an alternate location by using the -c option.
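The lookup rule described above can be sketched in a few lines. This is an assumption about the behavior based only on the description here, not the utility's actual code; default_config_path and resolve_config_path are hypothetical names.

```python
import os


def default_config_path(data_path):
    """Return the *.json path sharing the data file's base name
    (hypothetical helper)."""
    base, _extension = os.path.splitext(data_path)
    return base + ".json"


def resolve_config_path(data_path, config_file=None):
    """Prefer an explicit -c value; otherwise fall back to the default."""
    return config_file if config_file else default_config_path(data_path)


print(resolve_config_path("./data_casa_gfed.mat"))
print(resolve_config_path("./data_casa_gfed.mat", "./config/casa_gfed.json"))
```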

Configuration file parameter schema:

"columns": [String],        // Array of well-known column identifiers, in order
                            // e.g. "x", "y", "value", "error"

"gridres": {
    "units": String,        // Grid cell units
    "x": Number,            // Grid cell resolution in the x direction
    "y": Number             // Grid cell resolution in the y direction
},

"header": [String],         // Array of human-readable column headers, in order

"parameters": [String],     // Array of well-known variable names e.g.
                            // "values", "value", "errors" or "error"

"span": String,             // The length of time, as a Pandas "freq" code,
                            // that an observation spans

"step": Number,             // The length of time, in seconds, between each
                            // observation to be imported

"timestamp": String,        // An ISO 8601 timestamp for the first observation

"title": String,            // Human-readable "pretty" name for the data set

"units": Object,            // The measurement units, per parameter

"var_name": String          // The name of the variable in the hierarchical
                            // file which stores the data

Contents of an example configuration file are shown here:

{
    "columns": ["x","y"],
    "gridres": {
        "units": "degrees",
        "x": 1.0,
        "y": 1.0
    },
    "header": [
        "lng",
        "lat"
    ],
    "parameters": ["values"],
    "steps": [10800],
    "timestamp": "2012-12-22T03:00:00",
    "title": "Surface Carbon Flux",
    "units": {
        "x": "degrees",
        "y": "degrees",
        "values": "&mu;mol/m&sup2;"
    },
    "var_name": "casa_gfed_2004"
}
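As a rough illustration of how the timestamp and step values relate, the sketch below derives the times of the first few observations from the example values. It assumes a single constant step of 10800 seconds (the first entry of "steps"); how the loader itself interprets these fields is not shown here, so treat this as an assumption.

```python
from datetime import datetime, timedelta
import json

# Only the two time-related fields from the example configuration
config = json.loads('{"timestamp": "2012-12-22T03:00:00", "steps": [10800]}')

start = datetime.strptime(config["timestamp"], "%Y-%m-%dT%H:%M:%S")
step = timedelta(seconds=config["steps"][0])

# Times of the first four observations under the constant-step assumption
times = [start + i * step for i in range(4)]
for moment in times:
    print(moment.isoformat())
```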

manage.py load examples

Most basic example; assumes a configuration file exists at ./data_casa_gfed.json:

$ python manage.py load -p ./data_casa_gfed.mat -m SpatioTemporalMatrix -n casa_gfed_2004

Specify an alternate config file to use:

$ python manage.py load -p ./data_casa_gfed.mat -m SpatioTemporalMatrix -n casa_gfed_2004 -c ./config/casa_gfed.json

In the following example, the loader will look for a config file at ./data_casa_gfed.json and override the timestamp and var_name parameters from that file with the values provided as command line args:

$ python manage.py load -p ./data_casa_gfed.mat -m SpatioTemporalMatrix -n casa_gfed_2004 -o "timestamp=2003-12-22T03:00:00;var_name=casa_gfed_2004"
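When many files need loading, the same commands can be assembled from a script. The helper below is a hypothetical convenience wrapper, not part of flux-python-api; it uses only the flags documented above and builds an argument list suitable for the standard subprocess module.

```python
import sys


def build_load_command(path, model, collection_name,
                       config_file=None, options=None):
    """Assemble an argument list for `manage.py load` (hypothetical helper)."""
    command = [sys.executable, "manage.py", "load",
               "-p", path, "-m", model, "-n", collection_name]
    if config_file:
        command += ["-c", config_file]
    if options:
        # Join overrides as "parameter1=value1;parameter2=value2"
        pairs = ["%s=%s" % (key, value)
                 for key, value in sorted(options.items())]
        command += ["-o", ";".join(pairs)]
    return command


command = build_load_command(
    "./data_casa_gfed.mat", "SpatioTemporalMatrix", "casa_gfed_2004",
    options={"timestamp": "2003-12-22T03:00:00",
             "var_name": "casa_gfed_2004"},
)
# To run it from the directory containing manage.py:
#   import subprocess; subprocess.check_call(command)
print(" ".join(command[1:]))
```

Passing the arguments as a list avoids shell quoting problems with the semicolons in the -o value.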

Removing data

Use the manage.py remove utility to remove datasets from the database.

$ manage.py remove -n <collection_name>

Required argument:
    -n, --collection_name    Collection name to be removed (MongoDB identifier)

manage.py remove example

$ python manage.py remove -n casa_gfed_2004

Renaming datasets

Use the manage.py rename utility to rename datasets in the database.

Warning

Always use this utility to rename a dataset rather than renaming the collection directly in MongoDB: several metadata tables must be updated to match, and a manual rename leaves them out of sync.

$ manage.py rename -n <collection_name> -r <new_name>

Required arguments:
    -n, --collection_name    Name of the dataset to be renamed (MongoDB identifier)
    -r, --new_name           New name for the collection

manage.py rename example

$ python manage.py rename -n casa_gfed_2004 -r casa_2004

Database diagnostics

Use the manage.py db utility to get diagnostic information on database contents:

$ manage.py db [OPTIONAL ARGUMENTS]

Requires one of the following flags:

    -l, --list_ids           Lists collection names in the database.

         Optional args with -l flag:
             collections:    lists collections
             metadata:       lists the collections w/ metadata entries
             coord_index:    lists the collections w/ coord_index entries

    -n, --collection_name    Collection name for which to show metadata

    -a, --audit              No argument required. Performs audit of the
                             database, reporting any collections that are
                             missing corresponding metadata/coord_index
                             entries and any stale metadata/coord_index
                             entries without corresponding collections

Optional argument:

    -x, --include_counts     Include count of records within each listed
                             collection. Valid only with a corresponding
                             "-l collections" flag; ignored otherwise

manage.py db examples

List all collections and their number of records:

$ python manage.py db -l collections -x

List all the collections with metadata entries:

$ python manage.py db -l metadata

Show metadata for the collection with id “casa_gfed_2004”:

$ python manage.py db -n casa_gfed_2004

Audit the database:

$ python manage.py db -a
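The audit described above amounts to comparing three sets of names. The sketch below shows that comparison on plain Python sets; it is an illustration under that assumption rather than the utility's implementation, and the function name audit and the sample dataset names other than casa_gfed_2004 are hypothetical.

```python
def audit(collections, metadata_ids, coord_index_ids):
    """Compare three sets of names and report mismatches
    (hypothetical helper)."""
    collections = set(collections)
    report = {}
    for label, ids in (("metadata", set(metadata_ids)),
                       ("coord_index", set(coord_index_ids))):
        report[label] = {
            # Collections with no corresponding entry
            "missing": sorted(collections - ids),
            # Entries whose collection no longer exists
            "stale": sorted(ids - collections),
        }
    return report


report = audit(
    collections=["casa_gfed_2004", "new_dataset"],
    metadata_ids=["casa_gfed_2004", "old_dataset"],
    coord_index_ids=["casa_gfed_2004", "new_dataset"],
)
print(report["metadata"])
print(report["coord_index"])
```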

Module documentation

fluxpy.models

fluxpy.mediators