Continue readme

Petra Lamborn 5 years ago
parent
commit
06e0f94338
3 changed files with 52 additions and 3 deletions
  1. README.md: 50 additions, 1 deletion
  2. py/downkwh.py: 1 addition, 1 deletion
  3. py/downweather.py: 1 addition, 1 deletion

+ 50
- 1
README.md

@@ -24,7 +24,6 @@ I've included a bunch of annotated SQL queries in `sql/queries.pgsql` and `sql/w
24 24
 
25 25
 `requirements.txt` contains the python packages required to set up a virtual environment with `virtualenv -p /usr/bin/python3 venv` and `pip install -r requirements.txt`. Notably these are:
26 26
 
27
-
28 27
 * numpy
29 28
 * pandas
30 29
 * pkg-resources
@@ -35,3 +34,53 @@ I've included a bunch of annotated SQL queries in `sql/queries.pgsql` and `sql/w
35 34
 * scipy
36 35
 * seaborn
37 36
 * statsmodels
37
+
38
+Virtual environments are activated with `source venv/bin/activate`. The Python scripts are in the `py/` folder. Scripts designed to be called directly are run with `python <scriptname.py>`; use `python <scriptname.py> -h` to view help. Note that most options have a default, which may not be what you want, so always check.
39
+
40
+### `util.py`
41
+
42
+This script is imported by several other scripts, particularly for downloading the data from the database.
43
+
44
+### `downkwh.py`
45
+
46
+Downloads demand data from the database. Options:
47
+
48
+* `-o PATH`: The path of the Python "pickle" file in which to store the result. Required.
49
+* `-s DATE`: The start date for the download in `YYYY-MM-DD` format; default of 2017-01-01.
50
+* `-e DATE`: The end date in `YYYY-MM-DD` format; default of 2018-01-01.
51
+* `-t TABLE`: The table in the database from which to obtain the wanted ICP IDs; default is `public.icp_sample`, a table which contains 1000 ICPs with good data for 2017. **Important**: Don't assume that SQL injection can't come through this vector, even though the script only accepts the following values from the command line:
52
+	* `public.best_icp`: all ICPs with at least 360 days of data in 2017
53
+	* `public.best_icp_1618`: all ICPs with at least 720 days of data in the 2 years from 1 April 2016
54
+	* `public.best_icp_18m`: all ICPs with at least 540 days of data from July 2016 to the end of 2017
55
+	* `public.icp_sample`: a pre-generated 1k sample from `best_icp`
56
+	* `public.icp_sample_5k`: a pre-generated 5k sample from `best_icp`
57
+	* `public.icp_sample_1618`: a pre-generated 1k sample from `best_icp_1618`
58
+	* `public.icp_sample_18m`: a pre-generated 1k sample from `best_icp_18m`
59
+* `-n NUM`: The script downloads the dataset in pieces, optimises each piece to reduce storage space, and then reassembles them. This option sets the number of pieces; it should always be less than the number of days between the start and end dates. Default of 12.
60
+* `--no-pivot`: This option can probably be ignored, as it downloads the dataset in a less efficient non-"pivoted" form, which was used in the original versions of some of these scripts.
61
+* `-v`: Output some extra progress information as it goes; mostly useful for debugging.
62
+
63
+Example:
64
+
65
+```bash
66
+python downkwh.py -o ../data/test1k.pkl -n 24
67
+```
68
+
69
+Downloads data for the default period into `../data/test1k.pkl`, using 24 segments.
70
+
71
+### `downweather.py`
72
+
73
+Downloads weather (temperature and humidity) data for a single specified station from the database. Options:
74
+
75
+* `-o PATH`: The path of the Python "pickle" file in which to store the result. Required.
76
+* `-s DATE`: The start date for the download in `YYYY-MM-DD` format; default of 2016-04-01.
77
+* `-e DATE`: The end date in `YYYY-MM-DD` format; default of 2019-01-01.
78
+* `--station STATION`: The station to download from; default is 2006, which is located near Pukekohe.
79
+* `-v`: Output some extra progress information as it goes; mostly useful for debugging.
80
+
81
+Example:
82
+
83
+```bash
84
+python downweather.py -o ../data/weathertest.pkl
85
+```
86
+Downloads data for the default period into `../data/weathertest.pkl`.
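
The `-n NUM` option of `downkwh.py` described in the README text above splits the download into pieces, which is why it has to be smaller than the number of days in the period. The commit does not show how the script computes the pieces, so the following is only a minimal sketch of one way to cut a date range into contiguous segments. The function name `split_date_range` and the even-split rule are assumptions for illustration, not the project's code.

```python
from datetime import date, timedelta


def split_date_range(start, end, pieces):
    """Split the half-open range [start, end) into contiguous sub-ranges.

    Hypothetical helper: the real script may choose its boundaries differently.
    """
    total_days = (end - start).days
    # Mirror the README's rule: fewer pieces than days in the period.
    if not 1 <= pieces < total_days:
        raise ValueError("pieces must be at least 1 and less than the number of days")
    # Integer arithmetic spreads any remainder days evenly across segments.
    bounds = [start + timedelta(days=(total_days * i) // pieces)
              for i in range(pieces + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # Default period and default piece count taken from the README.
    for seg_start, seg_end in split_date_range(date(2017, 1, 1), date(2018, 1, 1), 12):
        print(seg_start, seg_end)
```

Each segment ends where the next begins, so downloading every segment and concatenating the results covers the whole period exactly once.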

+ 1
- 1
py/downkwh.py

@@ -113,7 +113,7 @@ def collateddownload(startd, endd, numdivis, icp_tab, pivot, verbose):
113 113
 
114 114
 def main():
115 115
     parser = ArgumentParser(description='Download kwh data from database')
116
-    parser.add_argument("-o", "--output", dest="output",     help = "output pickle path; default: ../data/2017-5k-wide.pkl", metavar="PATH", default = "../data/2017-5k-wide.pkl")
116
+    parser.add_argument("-o", "--output", dest="output",     help = "output pickle path", metavar="PATH", required = True)
117 117
     parser.add_argument("-s", "--start-date", dest = "startdate", help = "start date for download; format: YYYY-MM-DD; default: 2017-01-01", metavar="DATE", default = "2017-01-01", type = datevalid)
118 118
     parser.add_argument("-e", "--end-date", dest = "enddate", help = "end date for download; format: YYYY-MM-DD; default: 2018-01-01", metavar="DATE", default = "2018-01-01", type = datevalid)
119 119
     parser.add_argument("-t", "--table", dest = "table", help = "table for download (constrained to specific values in source); default: public.icp_sample", metavar="TABLE", default = "public.icp_sample", choices = tables)
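
The hunk above relies on two `argparse` features: `choices` to restrict `-t` to a fixed list of tables, and `type` to validate dates. Neither `tables` nor `datevalid` is defined in this diff, so the sketch below substitutes stand-ins: the table list is copied from the README, and the `datevalid` body is an assumption about what such a validator might look like.

```python
from argparse import ArgumentParser, ArgumentTypeError
from datetime import datetime

# Table names copied from the README's list for the -t option.
TABLES = [
    "public.best_icp",
    "public.best_icp_1618",
    "public.best_icp_18m",
    "public.icp_sample",
    "public.icp_sample_5k",
    "public.icp_sample_1618",
    "public.icp_sample_18m",
]


def datevalid(text):
    """Hypothetical stand-in for the project's date validator."""
    try:
        return datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        raise ArgumentTypeError(f"not a valid YYYY-MM-DD date: {text!r}")


def build_parser():
    parser = ArgumentParser(description="Download kwh data from database")
    # No default: the caller must always say where the pickle goes.
    parser.add_argument("-o", "--output", dest="output", metavar="PATH",
                        required=True, help="output pickle path")
    # argparse also passes string defaults through `type`.
    parser.add_argument("-s", "--start-date", dest="startdate", metavar="DATE",
                        default="2017-01-01", type=datevalid,
                        help="start date for download; format: YYYY-MM-DD")
    # Any value outside TABLES is rejected before it can reach a query.
    parser.add_argument("-t", "--table", dest="table", metavar="TABLE",
                        default="public.icp_sample", choices=TABLES,
                        help="table for download")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-o", "out.pkl", "-t", "public.icp_sample_5k"])
    print(args.output, args.startdate.date(), args.table)
```

With `choices`, an unlisted table name makes `argparse` print an error and exit, which is the command-line constraint the README mentions. It only guards values that arrive through the parser, so the README's caution about SQL injection still applies to any other caller.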

+ 1
- 1
py/downweather.py

@@ -5,7 +5,7 @@ import pandas as p
5 5
 
6 6
 def main():
7 7
     parser = ArgumentParser(description='Download weather data from database')
8
-    parser.add_argument("-o", "--output", dest="output",     help = "output pickle path; default: ../data/2016-18-weather.pkl", metavar="PATH", default = "../data/2016-18-weather.pkl")
8
+    parser.add_argument("-o", "--output", dest="output",     help = "output pickle path", metavar="PATH", required = True)
9 9
     parser.add_argument("-s", "--start-date", dest = "startdate", help = "start date for download; format: YYYY-MM-DD; default: 2016-04-01", metavar="DATE", default = "2016-04-01", type = datevalid)
10 10
     parser.add_argument("-e", "--end-date", dest = "enddate", help = "end date for download; format: YYYY-MM-DD; default: 2019-01-01", metavar="DATE", default = "2019-01-01", type = datevalid)
11 11
     parser.add_argument("--station", dest = "station", help = "weather station to get data from; default: 2006", metavar="STATION", default = "2006")