Miscellaneous notes # The database Accessed either via the SQL Manager program on the laptop, the `psql` terminal command (via the `psconnect` alias), or with the `psycopg2` (`psycopg2-binary`) python library. Have created an experimental table called `public.coup_tall_april` containing data from April 2017 in a "tall" format, using code from Jason: ```sql CREATE TABLE public.coup_tall_april AS SELECT a.icp_id , a.read_date , c.period , sum(c.read_kwh) as kwh_tot , sum(case when a.content_code = 'UN' then c.read_kwh else 0 end) as kwh_un , sum(case when a.content_code in ('CN','EG') then c.read_kwh else 0 end) as kwh_cn FROM coup_prd.coupdatamaster a, unnest(a.read_array) WITH ORDINALITY c(read_kwh, period) WHERE a.read_date >= to_date('01/04/2017','dd/mm/yyyy') and a.read_date < to_date('01/05/2017','dd/mm/yyyy') and a.content_code ~ ('UN|CN|EG') GROUP BY 1, 2, 3 ORDER BY 1, 2, 3; ``` This data looks like: ```sql SELECT * FROM public.coup_tall_april limit 10; ``` icp_id | read_date | period | kwh_tot | kwh_un | kwh_cn ---------+------------+--------+---------+--------+-------- I000002 | 2017-04-01 | 1 | 0.123 | 0.123 | 0.0 I000002 | 2017-04-01 | 2 | 0.161 | 0.161 | 0.0 I000002 | 2017-04-01 | 3 | 0.118 | 0.118 | 0.0 I000002 | 2017-04-01 | 4 | 0.108 | 0.108 | 0.0 I000002 | 2017-04-01 | 5 | 0.125 | 0.125 | 0.0 I000002 | 2017-04-01 | 6 | 0.144 | 0.144 | 0.0 I000002 | 2017-04-01 | 7 | 0.11 | 0.11 | 0.0 I000002 | 2017-04-01 | 8 | 0.116 | 0.116 | 0.0 I000002 | 2017-04-01 | 9 | 0.197 | 0.197 | 0.0 I000002 | 2017-04-01 | 10 | 0.144 | 0.144 | 0.0 * `icp_id` is the ID of the ICP, which may be a home or business. This is a `varchar(10)`, although it appears to only have 7 characters. The ID is not the _real_ ID, but an anonymised value. * `read_date` is the date, in this case in April 2017. * `kwh_cn` is the demand in kwh that the company has some level of control over, e.g. by systems that turn off and on water heaters remotely. This ought to be relatively stable, although in many cases this will be 0. * `kwh_un` is the uncontrolled demand, i.e. the rest. * `kwh_tot` is the sum of the other kwh measurements for this half-hour ## Statistics This dataset includes 34278 distinct ICPs: `SELECT COUNT(DISTINCT icp_id) FROM public.coup_tall_april;` Not every day has the same number of ICPs recorded for it: (Some are added and removed each day; some are disconnected for various reasons on different days ```sql SELECT read_date, COUNT(DISTINCT icp_id) AS d_icp FROM public.coup_tall_april GROUP BY read_date; ``` Days have similar averages (within the same month), but sometimes values are negative: Negative values possibly from solar power injecting back into the network, but is so rare it can be ignored. ```sql SELECT read_date, min(kwh_tot), Avg(kwh_tot), max(kwh_tot) FROM public.coup_tall_april GROUP BY read_date; ``` Three values in this table are negative: ```sql SELECT * FROM public.coup_tall_april WHERE kwh_tot < 0 OR kwh_un < 0 OR kwh_cn < 0; ``` icp_id | read_date | period | kwh_tot | kwh_un | kwh_cn ---------+------------+--------+---------+---------+-------- I017181 | 2017-04-12 | 19 | -30.530 | -30.585 | 0.055 I019141 | 2017-04-29 | 37 | -31.445 | -31.445 | 0 I019141 | 2017-04-29 | 38 | -31.624 | -31.624 | 0 There are 334 values in this table where the `icp_id` ends in 17: ```sql SELECT COUNT (DISTINCT icp_id) FROM public.coup_tall_april WHERE icp_id LIKE '%17'; SELECT DISTINCT icp_id FROM public.coup_tall_april WHERE icp_id LIKE '%17' ORDER BY icp_id LIMIT 10; ``` This could be useful for sampling later. # Research ## Roelofsen, Time Series Clustering A master's thesis from the Netherlands, which gives a lot of useful information. ([link](https://beta.vu.nl/nl/Images/stageverslag-roelofsen_tcm235-882304.pdf).) Notes in notebook; copy next week. Up to page 54. ## Alonso, Time Series Clustering A presentation, which I still need to read. ([link](http://halweb.uc3m.es/esp/Personal/personas/amalonso/esp/ASDM-C02-clustering.pdf)) ## Twoards Data Science, Playing with time series data in python ([link](https://towardsdatascience.com/playing-with-time-series-data-in-python-959e2485bff8).) ## Minnaar, Time Series Classification and Clustering with Python http://alexminnaar.com/time-series-classification-and-clustering-with-python.html