Readme, minor fixes

Petra Lamborn 2 years ago
parent
commit
912fd2eff0
3 changed files with 61 additions and 8 deletions
  1. README.md (+48, -0)
  2. py/predict.py (+11, -7)
  3. py/pymodels.py (+2, -1)

README.md (+48, -0)

@@ -188,6 +188,54 @@ python agg.py -i ../data/test1kb.pkl -c ../data/test1kbclustertable.pkl -o ../da
 
 Downloads new dataset from the `public.icp_sample_18m` sample and saves it to `../data/test1kb.pkl`. Then assigns clusters to this (excluding the misc/'-1' cluster) from the `../data/test1kagg.pkl` dataset with threshold 0.1, saving into `../data/test1kbclustertable.pkl`. Then aggregates this dataset and saves in `../data/test1kbagg.pkl`.
 
+### `pymodels.py`
+
+This script is a rewrite of the below `weathmod.R` and `combmodels.R` `R` scripts. It fits a harmonic model to an aggregated dataset. **Note**: the model file created is quite large (on the order of 500MB). This could probably be pruned down.
+
+
+* `-i PATH`: Path of the file that contains the (aggregated) dataset to fit models to.
+* `-w PATH`: Path of the weather data involved in the model.
+* `-m MODEL_FILE`: Filename to save the model to, as pickle. Note: this is not the same kind of pickle that `pickletocsv.py` can read.
+* `--weather-harmonics NUM`: Number of harmonics (with base period of 1 year/365.25 days) to fit to the weather data; default is 2. Adding more harmonics leads to a more complicated model which may be more powerful but may also "overfit."
+* `--icp-harmonics NUM NUM NUM`: (3 values) Number of harmonics of base period 1 year, 1 week, 1 day to fit, respectively. Default is 2, 3, and 3. Adding more harmonics leads to a more complicated model which may be more powerful but may also "overfit."
+
+Example:
+
+```bash
+python pymodels.py -i ../data/test1kagg.pkl -w ../data/weathertest.pkl -m ../models/testmod.pkl
+```
+
+Fit all clusters in `../data/test1kagg.pkl` with weather data in `../data/weathertest.pkl` and save to `../models/testmod.pkl`.
+
+
+### `predict.py`
+
+Predict unobserved demand values for given cluster, time period, supplying either maximum/minimum temperatures (as in the shiny app) or with a weather dataset.
+
+* `-m MODEL_FILE`: Filename to retrieve the model from, as pickle.
+* `-w WEATHER_FILE`: Path to weather data. This is optional, but if not specified the temperature parameter should be.
+* `-o OUTPUT_FILE`: File to output to. If `-` or absent prints to `stdout`.
+* `-t TEMP TEMP`: (2 values) If not supplying a weather file, can specify a minimum overnight and maximum daytime temperature value, similar to the shiny app.
+* `-s START_DATE`: The start date for the prediction interval, in `YYYY-MM-DD` format.
+* `-e END_DATE`: The end date for the prediction interval, in `YYYY-MM-DD` format.
+* `-c CLUSTER`: The cluster to be predicted.
+* `--pkl`: Output as a pickled dataframe rather than a csv file.
+
+Examples:
+
+```bash
+python predict.py -m ../models/testmod.pkl -s 2018-01-01 -e 2018-02-01 -w ../data/weathertest.pkl -c 1 | tabview -
+```
+
+For cluster 1, model `../models/testmod.pkl`, weather data `../data/weathertest.pkl`, predict per ICP demand for the month of Jan 2018 and view in `tabview` viewer.
+
+```bash
+python predict.py -m ../models/testmod.pkl -t 5 10 -s 2019-07-01 -e 2019-07-02 -c 1 | tabview -
+```
+
+For cluster 1, model `../models/testmod.pkl`, minimum overnight temperature 5 degrees C, and maximum temperature 10 degrees C, predict per ICP demand for the first of July 2019.
+
+
 ## R
 
 The scripts in `R/` include visualisers for the data, and for the creation of some models.
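
The README text above describes `--weather-harmonics` and `--icp-harmonics` as counts of harmonics over a base period (1 year of 365.25 days, 1 week, 1 day). As a rough illustration of what "N harmonics of a base period" usually means, here is a minimal sketch that builds sine/cosine terms for a list of times. The function `harmonic_features` is hypothetical and is not the `harmonic` helper in `pymodels.py`, whose signature does not appear in this diff.

```python
import math


def harmonic_features(t_days, period_days, n_harmonics):
    """Build sin/cos terms for each time in t_days (hypothetical helper).

    Each row holds 2 * n_harmonics values, ordered as
    sin(1st), cos(1st), sin(2nd), cos(2nd), ...
    """
    rows = []
    for t in t_days:
        row = []
        for k in range(1, n_harmonics + 1):
            # k-th harmonic completes k full cycles per base period
            angle = 2.0 * math.pi * k * t / period_days
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows


# Two yearly harmonics at the start, first quarter and midpoint of a year
features = harmonic_features([0.0, 91.3125, 182.625], 365.25, 2)
```

Each extra harmonic adds two more columns to the regression, which is why the README warns that higher counts can overfit.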

py/predict.py (+11, -7)

@@ -4,6 +4,7 @@ import pandas as p
 import statsmodels.formula.api as smf
 import datetime as dt
 import pickle
+from sys import stdout
 from pymodels import thirtyoffset, predweather, harmonic
 from pprint import pprint
 from util import datevalid
@@ -41,8 +42,7 @@ def main():
                         required=False,
                         type=FileType('rb'))
     parser.add_argument("-o", "--output", dest="output_file",
-                        help="file to save result",
-                        required=True, type=FileType('w'))
+                        help="file to save result (default stdout)")
     parser.add_argument("-t", "--temperature", dest="temp",
                         help = "min and max temperature, if not using "
                         "weather dataset, e.g. 2.0 10.5", 
@@ -50,13 +50,13 @@ def main():
                         type=float, nargs=2)
     parser.add_argument("-s", "--start-date", 
                         dest = "startdate", 
-                        help = "start date for prediction; format: YYYY-MM-DD; default: 2018-01-01", 
+                        help = "start date for prediction; format: YYYY-MM-DD", 
                         metavar="START_DATE", 
                         required = True,
                         type = datevalid)
     parser.add_argument("-e", "--end-date", 
                         dest = "enddate", 
-                        help = "end date for prediction; format: YYYY-MM-DD; default: 2018-02-01", 
+                        help = "end date for prediction; format: YYYY-MM-DD", 
                         metavar="END_DATE", 
                         required = True,
                         type = datevalid)
@@ -65,9 +65,10 @@ def main():
                         help = "cluster to predict for",
                         type = int,
                         required = True)
-    parser.add_argument("--csv",
-                        help="output as csv",
-                        action="store_true")
+    parser.add_argument("--pkl",
+                        help="output as pkl rather than csv",
+                        dest = "csv",
+                        action="store_false")
     args = parser.parse_args()
 
     if args.temp is None and args.weather_file is None:
@@ -79,6 +80,9 @@ def main():
     if args.cluster not in mods["clusters"]:
         parser.error(f"cluster ('{args.cluster}') not in model")
 
+    if args.output_file is None or args.output_file == "-":
+        args.output_file = stdout
+
     wdat = []
     
     if args.weather_file is not None:
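
The `predict.py` hunks above make two changes to the command line: `-o` becomes optional and falls back to `stdout`, and `--csv` is replaced by `--pkl`, which still writes to `args.csv` but with the sense inverted. The following standalone sketch reproduces just those two options to show the resulting defaults. The helper names `build_parser` and `resolve_output` are illustrative and do not exist in the repository.

```python
import argparse
import sys


def build_parser():
    """Parser with only the two options changed in this commit (sketch)."""
    parser = argparse.ArgumentParser()
    # No type=FileType and no required=True: the value is a plain string or None
    parser.add_argument("-o", "--output", dest="output_file",
                        help="file to save result (default stdout)")
    # store_false means args.csv defaults to True and --pkl switches it off
    parser.add_argument("--pkl", dest="csv", action="store_false",
                        help="output as pkl rather than csv")
    return parser


def resolve_output(output_file):
    """Mirror the fallback in the diff: absent or '-' means stdout."""
    if output_file is None or output_file == "-":
        return sys.stdout
    return output_file


default_args = build_parser().parse_args([])
pkl_args = build_parser().parse_args(["--pkl", "-o", "out.pkl"])
```

With no flags, `default_args.csv` is `True` and the output resolves to `sys.stdout`; with `--pkl -o out.pkl`, `pkl_args.csv` is `False` and the output stays the string `"out.pkl"`. Keeping `dest="csv"` means downstream code that checks `args.csv` does not need to change.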

py/pymodels.py (+2, -1)

@@ -4,6 +4,7 @@ import pandas as p
 import statsmodels.formula.api as smf
 import datetime as dt
 import pickle
+from tqdm import tqdm
 
 epoch = dt.datetime(2017, 1, 1)
 
@@ -130,7 +131,7 @@ def fitdemand(df, wmodsum, harmonics=[2, 3, 3]):
                    + min_resid:({w_params}) + min_resid:({d_params})
                    """.replace("\n", "").replace("  ", "")
 
-    for c in clusters:
+    for c in tqdm(clusters):
         dfc = df[df['cluster'] == c].join(hcomb,
                                           how='left').join(wmodsum,
                                                            how='left')