# ampli

_Repository for Petra's work at [ampli](https://www.ampli.nz) Jan-Feb 2019_

## What isn't in this repository

This repository does not contain the downloaded data (for confidentiality and size reasons), the fitted models (for similar reasons), or the configuration file for downloading from the database (which contains a password). Recreate empty `data/` and `model/` directories. The configuration file is at `py/database.ini` and looks like:

```ini
[postgresql]
host=
database=
user=
password=
```

This file is based on an example from [the PostgreSQL tutorial website](http://www.postgresqltutorial.com/postgresql-python/connect/). Fill in each field with valid credentials.

## SQL

I've included a number of annotated SQL queries in `sql/queries.pgsql` and `sql/weather.pgsql`. The latter file is specifically about the weather dataset; the former is more general. Some of the queries are copied into `py/util.py`, but changing these two files will do nothing on their own.

## Python

`requirements.txt` contains the Python packages required to set up a virtual environment with `virtualenv -p /usr/bin/python3 venv` and `pip install -r requirements.txt`. Notably, these include:

* numpy
* pandas
* pkg-resources
* psycopg2-binary
* pyparsing
* python-dateutil
* pytz
* scipy
* seaborn
* statsmodels

Activate the virtual environment with `source venv/bin/activate`.

The Python scripts are in the `py/` folder. The scripts designed to be called directly are run with `python` followed by the script name; pass `-h` to view the help. Note that most options have a default, which may not be what you want, so always check.

### `util.py`

This script is imported by several of the other scripts, particularly for downloading the data from the database.

### `downkwh.py`

Downloads demand data from the database. Options:

* `-o PATH`: The path of the Python "pickle" file to store the result in.
* `-s DATE`: The start date for the download in `YYYY-MM-DD` format; defaults to 2017-01-01.
* `-e DATE`: The end date in `YYYY-MM-DD` format; defaults to 2018-01-01.
* `-t TABLE`: The table in the database from which to obtain the wanted ICP ids; default is `public.icp_sample`, a table which contains 1000 ICPs with good data for 2017. **Important**: don't assume that SQL injection can't come through this vector, although I have constrained the values that this script will accept from the command line to the following list:
    * `public.best_icp`: all ICPs with at least 360 days of data in 2017
    * `public.best_icp_1618`: all ICPs with at least 720 days of data in the two years from 1 April 2016
    * `public.best_icp_18m`: all ICPs with at least 540 days of data from July 2016 to the end of 2017
    * `public.icp_sample`: a pre-generated 1k sample from `best_icp`
    * `public.icp_sample_5k`: a pre-generated 5k sample from `best_icp`
    * `public.icp_sample_1618`: a pre-generated 1k sample from `best_icp_1618`
    * `public.icp_sample_18m`: a pre-generated 1k sample from `best_icp_18m`
* `-n NUM`: The script downloads the dataset in pieces, optimises each to reduce storage space, and reassembles them. This option sets the number of pieces; it should always be less than the number of days between the start and end dates. Defaults to 12.
* `--no-pivot`: Can probably be ignored; it downloads the dataset in a less efficient, non-"pivoted" form that was used in the original versions of some of these scripts.
* `-v`: Output some extra progress information as it goes; mostly useful for debugging.

Example:

```bash
python downkwh.py -o ../data/test1k.pkl -n 24
```

Downloads data for the default period into `../data/test1k.pkl`, using 24 segments.

### `downweather.py`

Downloads weather (temperature and humidity) data from the database, from one specified station. Options:

* `-o PATH`: The path of the Python "pickle" file to store the result in.
* `-s DATE`: The start date for the download in `YYYY-MM-DD` format; defaults to 2016-04-01.
* `-e DATE`: The end date in `YYYY-MM-DD` format; defaults to 2019-01-01.
* `--station`: The station to download from; default is 2006, which is located near Pukekohe.
* `-v`: Output some extra progress information as it goes; mostly useful for debugging.

Example:

```bash
python downweather.py -o ../data/weathertest.pkl
```

Downloads data for the default period into `../data/weathertest.pkl`.
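For reference, the `py/database.ini` file described above follows the pattern from the PostgreSQL tutorial linked earlier: parse the ini file with Python's standard `configparser` and pass the resulting dict to `psycopg2.connect`. A minimal sketch (the function name `load_db_config` is illustrative, not necessarily what `util.py` uses):

```python
from configparser import ConfigParser


def load_db_config(path="py/database.ini", section="postgresql"):
    """Read database connection parameters from an ini file
    shaped like py/database.ini."""
    parser = ConfigParser()
    parser.read(path)
    if not parser.has_section(section):
        raise KeyError(f"Section [{section}] not found in {path}")
    # Returns e.g. {"host": ..., "database": ..., "user": ..., "password": ...}
    return dict(parser.items(section))


# The dict unpacks straight into psycopg2 (requires a live database):
# import psycopg2
# conn = psycopg2.connect(**load_db_config())
```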