ampli

Repository for Petra’s work at ampli Jan-Feb 2019

What isn’t in this repository

This repository does not contain the downloaded data (for confidentiality and size reasons), the fitted models (for similar reasons), or the configuration file for downloading from the database (it contains a password). Recreate empty data/ and model/ directories. The configuration file lives at py/database.ini and looks like:

[postgresql]
host=<Hname>
database=<dbname>
user=<Username>
password=<Password>

This file is based on an example from the PostgreSQL website. Replace <Hname>, <dbname>, <Username>, and <Password> with valid values.
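The scripts' actual loader isn't shown here; as a sketch, an ini file in this format can be read with Python's standard configparser (section and key names as above, function name hypothetical):

```python
from configparser import ConfigParser

def read_db_config(path="py/database.ini", section="postgresql"):
    """Read connection parameters from an ini file into a dict,
    suitable for passing to e.g. psycopg2.connect(**params)."""
    parser = ConfigParser()
    parser.read(path)
    if not parser.has_section(section):
        raise KeyError(f"Section [{section}] not found in {path}")
    return dict(parser.items(section))
```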

SQL

I’ve included a number of annotated SQL queries in sql/queries.pgsql and sql/weather.pgsql. The latter file is specific to the weather dataset; the former is more general. Some of the queries are duplicated in py/util.py, so editing these two files alone has no effect on the scripts.

Python

requirements.txt lists the Python packages needed for the virtual environment: create it with virtualenv -p /usr/bin/python3 venv, then install with pip install -r requirements.txt. Notably these are:

  • numpy
  • pandas
  • pkg-resources
  • psycopg2-binary
  • pyparsing
  • python-dateutil
  • pytz
  • scipy
  • seaborn
  • statsmodels

Activate the virtual environment with source venv/bin/activate. The Python scripts live in the py/ folder. Scripts designed to be called directly are run with python <scriptname>.py; use python <scriptname>.py -h to view their help. Note that most options have defaults, which may not be what you want, so always check.

util.py

This script is imported by several of the other scripts; in particular, it handles downloading the data from the database.

downkwh.py

Downloads demand data from the database. Options:

  • -o PATH: The path for the python “pickle” file to store the result in.
  • -s DATE: The start date for the download in YYYY-MM-DD format; default of 2017-01-01.
  • -e DATE: The end date in YYYY-MM-DD format; default of 2018-01-01.
  • -t TABLE: The database table from which to obtain the wanted ICP ids; default is public.icp_sample, a table containing 1000 ICPs with good data for 2017. Important: don’t assume that SQL injection can’t come through this vector, although I have constrained the values the script accepts from the command line to the following list:
    • public.best_icp: all ICPs with at least 360 days of data in 2017
    • public.best_icp_1618: all ICPs with at least 720 days of data in the 2 years from 1 April 2016
    • public.best_icp_18m: all ICPs with at least 540 days of data from July 2016 to the end of 2017
    • public.icp_sample: a pre-generated 1k sample from best_icp
    • public.icp_sample_5k: a pre-generated 5k sample from best_icp
    • public.icp_sample_1618: a pre-generated 1k sample from best_icp_1618
    • public.icp_sample_18m: a pre-generated 1k sample from best_icp_18m
  • -n NUM: The script downloads the dataset in pieces, optimises each piece to reduce storage space, and reassembles them. This option sets the number of pieces; it should always be less than the number of days between the start and end dates. Default of 12.
  • --no-pivot: Can probably be ignored; it downloads the dataset in a less efficient, non-“pivoted” form that was used in the original versions of some of these scripts.
  • -v: Output some extra progress information as it goes; mostly useful for debugging.
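The -n chunking amounts to splitting the start–end range into contiguous date intervals, which is why n must not exceed the number of days in the range. A minimal sketch of that idea (not the script's actual code, which lives in downkwh.py/util.py):

```python
from datetime import date, timedelta

def split_date_range(start, end, n):
    """Split [start, end) into n contiguous whole-day sub-ranges.
    n must be between 1 and the number of days in the range,
    otherwise some sub-ranges would be empty."""
    total_days = (end - start).days
    if not 0 < n <= total_days:
        raise ValueError("n must be between 1 and the number of days")
    # distribute the days as evenly as possible across the n pieces
    base, extra = divmod(total_days, n)
    chunks, cur = [], start
    for i in range(n):
        nxt = cur + timedelta(days=base + (1 if i < extra else 0))
        chunks.append((cur, nxt))
        cur = nxt
    return chunks
```

Each piece can then be downloaded and compacted separately before the results are concatenated.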

Example:

python downkwh.py -o ../data/test1k.pkl -n 24

Downloads data for the default period into ../data/test1k.pkl using 24 segments.

downweather.py

Downloads weather (temperature and humidity) data from the database, from one specified station.

  • -o PATH: The path for the python “pickle” file to store the result in.
  • -s DATE: The start date for the download in YYYY-MM-DD format; default of 2016-04-01.
  • -e DATE: The end date in YYYY-MM-DD format; default of 2019-01-01.
  • --station: The station to download from; default is 2006, which is located near Pukekohe.
  • -v: Output some extra progress information as it goes; mostly useful for debugging.

Example:

python downweather.py -o ../data/weathertest.pkl

Downloads data for the default period into ../data/weathertest.pkl.