Add -d option in agg.py; add tabview program

Petra Lamborn 5 years ago
parent
commit
99e9aca478
3 changed files with 35 additions and 6 deletions
  1. README.md (+10, -3)
  2. py/agg.py (+6, -3)
  3. py/requirements.txt (+19, -0)

README.md (+10, -3)

@@ -128,6 +128,7 @@ Aggregates data based on clusters. **Note**: columns `CI_low` and `CI_high` do n
 * `-i PATH`: The path for the python "pickle" file retrieve the original data from (i.e. the data downloaded from `downkwh.py`).
 * `-c PATH`: The path for the python "pickle" file retrieve the cluster data from.
 * `-o PATH`: The path for the python "pickle" file to store the result in.
+* `-d`: Drop cluster '-1', which represents the miscellaneous/unclustered pseudo-cluster produced by `clusAssign.py`.
 * `-p`: If the dataframe is not in wide form, pivots it so that it is first. Note: untested and unused, representing past behaviour.
 
 Example:
@@ -147,7 +148,7 @@ Helper function to transform a pickle into a csv file, for easier importing into
 * `-r`: Include row names/index labels in csv. This may be essential for proper exporting of some datasets
 * `-v`: Output extra information, including dimensions of dataset.
 
-Example:
+Examples:
 
 ```bash
 python pickletocsv.py ../data/test1kagg.pkl | less
@@ -155,6 +156,12 @@ python pickletocsv.py ../data/test1kagg.pkl | less
 
 Reads file at `../data/test1kagg.pkl` and views it in the UNIX pager `less`.
 
+```bash
+python pickletocsv.py ../data/test1kagg.pkl - | tabview -
+```
+
+Reads the same file and views it using the `tabview` python module (included in `requirements.txt`). `-` in this case is shorthand for `stdout` and `stdin` respectively, allowing the pipe.
+
 ### `clusAssign.py`
 
 Assigns clusters found from one dataset to the values of another. **Note**: this algorithm can assign some ICPs to cluster -1, which means that it failed to assign to a cluster. **Further note**: this method requires both datasets to be on the same timespan.
@@ -169,7 +176,7 @@ Example:
 ```bash
 python downkwh.py -o ../data/test1kb.pkl -t public.icp_sample_18m
 python clusAssign.py -i ../data/test1kb.pkl -c ../data/test1kbclustertable.pkl -a ../data/test1kagg.pkl -t 0.1
-python agg.py -i ../data/test1kb.pkl -c ../data/test1kbclustertable.pkl -o ../data/test1kbagg.pkl
+python agg.py -i ../data/test1kb.pkl -c ../data/test1kbclustertable.pkl -o ../data/test1kbagg.pkl -d
 ```
 
-Downloads new dataset from the `public.icp_sample_18m` sample and saves it to `../data/test1kb.pkl`. Then assigns clusters to this from the `../data/test1kagg.pkl` dataset with threshold 0.1, saving into `../data/test1kbclustertable.pkl`. Then aggregates this dataset and saves in `../data/test1kbagg.pkl`.
+Downloads new dataset from the `public.icp_sample_18m` sample and saves it to `../data/test1kb.pkl`. Then assigns clusters to this (excluding the misc/'-1' cluster) from the `../data/test1kagg.pkl` dataset with threshold 0.1, saving into `../data/test1kbclustertable.pkl`. Then aggregates this dataset and saves in `../data/test1kbagg.pkl`.

py/agg.py (+6, -3)

@@ -3,10 +3,12 @@ from argparse import ArgumentParser
 import pandas as p
 from tqdm import tqdm
 
-def aggregator(widedf, clusdf):
+def aggregator(widedf, clusdf, drop_misc = False):
     """Aggregate a (wide-form) dataframe by the cluster mappings in a second dataframe
     """
-    clusters = clusdf['cluster'].unique()
+    clusters = list(clusdf['cluster'].unique())
+    if drop_misc and -1 in clusters:
+        clusters.remove(-1)
     clusters.sort()
     dflis = []
     qlow  = lambda x: x.quantile(0.250)
@@ -35,6 +37,7 @@ def main():
     parser.add_argument("-i", "--input",  dest="input",      help = "input pickle path",  metavar="PATH", required = True)
     parser.add_argument("-c", "--clusters", dest="clusfile", help = "cluster pickle path", metavar="PATH", required = True)
     parser.add_argument("-o", "--output", dest="output",     help = "output pickle path", metavar="PATH", required = True)
+    parser.add_argument("-d", "--drop-misc", dest="drop_misc", help = "drop 'misc' (-1) pseudocluster", action = "store_true")
     parser.add_argument("-p", "--pivot", dest = "istall",    help = "input dataframe is in tall format and must be pivoted", action ="store_true")
     args = parser.parse_args()
     wd = p.read_pickle(args.input)
@@ -42,7 +45,7 @@ def main():
     if (args.istall):
         wd = wd.pivot(index = 'read_time', columns = 'icp_id', values = 'kwh_tot')
 
-    agged = aggregator(wd, cd)
+    agged = aggregator(wd, cd, args.drop_misc)
     agged.to_pickle(args.output)
 
 
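
Note on the `py/agg.py` change above: the switch to `list(...)` matters because `.unique()` on a pandas column returns a NumPy array, which has no `.remove()` method. The following is a minimal, standalone sketch of how the new `-d` flag and the filtering step fit together. It is not the code from this commit: `select_clusters` and `build_parser` are hypothetical names introduced for illustration, and a plain Python list stands in for the `clusdf['cluster']` column so that the sketch runs without pandas.

```python
from argparse import ArgumentParser


def select_clusters(labels, drop_misc=False):
    """Return the sorted unique cluster labels, optionally without -1.

    Hypothetical helper: mirrors the filtering step added to `aggregator`,
    but takes a plain iterable instead of a pandas column.
    """
    clusters = sorted(set(labels))
    # -1 is the misc/unclustered pseudo-cluster; only drop it when asked to
    if drop_misc and -1 in clusters:
        clusters.remove(-1)
    return clusters


def build_parser():
    """Build a parser carrying only the flag added in this commit."""
    parser = ArgumentParser()
    # store_true means drop_misc is False unless -d/--drop-misc is passed
    parser.add_argument("-d", "--drop-misc", dest="drop_misc",
                        help="drop 'misc' (-1) pseudocluster",
                        action="store_true")
    return parser


if __name__ == "__main__":
    # Simulate running with -d on the command line
    args = build_parser().parse_args(["-d"])
    print(select_clusters([2, -1, 0, 2, 1], args.drop_misc))
```

Because `action = "store_true"` defaults to `False`, existing invocations of `agg.py` without `-d` should keep cluster -1 in the output, matching the `drop_misc = False` default on `aggregator`.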

py/requirements.txt (+19, -0)

@@ -0,0 +1,19 @@
+brewer2mpl==1.4.1
+cycler==0.10.0
+ggplot==0.11.5
+kiwisolver==1.0.1
+matplotlib==3.0.2
+numpy==1.15.4
+pandas==0.23.4
+patsy==0.5.1
+pkg-resources==0.0.0
+psycopg2-binary==2.7.6.1
+pyparsing==2.3.0
+python-dateutil==2.7.5
+pytz==2018.9
+scipy==1.2.0
+seaborn==0.9.0
+six==1.12.0
+statsmodels==0.9.0
+tabview==1.4.3
+tqdm==4.30.0