Miscellaneous notes
Accessed either via the SQL Manager program on the laptop, the psql
terminal command (via the psconnect
alias), or with the psycopg2
(psycopg2-binary
) python library.
Have created an experimental table called public.coup_tall_april
containing data from April 2017 in a “tall” format, using code from Jason:
CREATE TABLE public.coup_tall_april AS
SELECT a.icp_id
, a.read_date
, c.period
, sum(c.read_kwh) as kwh_tot
, sum(case when a.content_code = 'UN' then c.read_kwh else 0 end) as kwh_un
, sum(case when a.content_code in ('CN','EG') then c.read_kwh else 0 end) as kwh_cn
FROM coup_prd.coupdatamaster a,
unnest(a.read_array) WITH ORDINALITY c(read_kwh, period)
WHERE a.read_date >= to_date('01/04/2017','dd/mm/yyyy')
and a.read_date < to_date('01/05/2017','dd/mm/yyyy')
and a.content_code ~ ('UN|CN|EG')
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
This data looks like:
SELECT * FROM public.coup_tall_april limit 10;
icp_id | read_date | period | kwh_tot | kwh_un | kwh_cn ---------+------------+--------+---------+--------+-------- I000002 | 2017-04-01 | 1 | 0.123 | 0.123 | 0.0 I000002 | 2017-04-01 | 2 | 0.161 | 0.161 | 0.0 I000002 | 2017-04-01 | 3 | 0.118 | 0.118 | 0.0 I000002 | 2017-04-01 | 4 | 0.108 | 0.108 | 0.0 I000002 | 2017-04-01 | 5 | 0.125 | 0.125 | 0.0 I000002 | 2017-04-01 | 6 | 0.144 | 0.144 | 0.0 I000002 | 2017-04-01 | 7 | 0.11 | 0.11 | 0.0 I000002 | 2017-04-01 | 8 | 0.116 | 0.116 | 0.0 I000002 | 2017-04-01 | 9 | 0.197 | 0.197 | 0.0 I000002 | 2017-04-01 | 10 | 0.144 | 0.144 | 0.0
icp_id
is the ID of the ICP, which may be a home or business. This is a varchar(10)
, although it appears to only have 7 characters. The ID is not the real ID, but an anonymised value.read_date
is the date, in this case in April 2017.kwh_cn
is the demand in kwh that the company has some level of control over, e.g. by systems that turn off and on water heaters remotely. This ought to be relatively stable, although in many cases this will be 0.kwh_un
is the uncontrolled demand, i.e. the rest.kwh_tot
is the sum of the other kwh measurements for this half-hourThis dataset includes 34278 distinct ICPs: SELECT COUNT(DISTINCT icp_id) FROM public.coup_tall_april;
Not every day has the same number of ICPs recorded for it:
(Some are added and removed each day; some are disconnected for various reasons on different days
SELECT read_date, COUNT(DISTINCT icp_id) AS d_icp
FROM public.coup_tall_april
GROUP BY read_date;
Days have similar averages (within the same month), but sometimes values are negative:
Negative values possibly from solar power injecting back into the network, but is so rare it can be ignored.
SELECT read_date, min(kwh_tot), Avg(kwh_tot), max(kwh_tot)
FROM public.coup_tall_april
GROUP BY read_date;
Three values in this table are negative:
SELECT * FROM public.coup_tall_april WHERE kwh_tot < 0 OR kwh_un < 0 OR kwh_cn < 0;
icp_id | read_date | period | kwh_tot | kwh_un | kwh_cn ---------+------------+--------+---------+---------+-------- I017181 | 2017-04-12 | 19 | -30.530 | -30.585 | 0.055 I019141 | 2017-04-29 | 37 | -31.445 | -31.445 | 0 I019141 | 2017-04-29 | 38 | -31.624 | -31.624 | 0
There are 334 values in this table where the icp_id
ends in 17:
SELECT COUNT (DISTINCT icp_id) FROM public.coup_tall_april WHERE icp_id LIKE '%17';
SELECT DISTINCT icp_id FROM public.coup_tall_april WHERE icp_id LIKE '%17' ORDER BY icp_id LIMIT 10;
This could be useful for sampling later.
A master’s thesis from the Netherlands, which gives a lot of useful information. (link.)
Notes in notebook; copy next week. Up to page 54.
A presentation, which I still need to read. (link)
(link.)
http://alexminnaar.com/time-series-classification-and-clustering-with-python.html