Repository for Petra's work at ampli Jan-Feb 2019


Miscellaneous notes

The database

Accessed either via the SQL Manager program on the laptop, the psql terminal command (via the psconnect alias), or with the psycopg2 (psycopg2-binary) python library.
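As a minimal sketch of the psycopg2 route, the helper below runs a query on an already-open connection and returns the rows. The function name `fetch_rows` is hypothetical, and the connection details in the usage comment are placeholders, not the real settings.

```python
def fetch_rows(conn, query, params=None):
    """Run a query on an open DB-API connection and return all rows.

    `conn` is assumed to be a psycopg2 connection created elsewhere
    with the real host, database name and credentials.
    """
    cur = conn.cursor()
    try:
        # Pass values via `params` so the driver handles the quoting.
        cur.execute(query, params)
        return cur.fetchall()
    finally:
        cur.close()


# Usage (placeholder connection details, not the real ones):
# import psycopg2
# conn = psycopg2.connect(host="...", dbname="...", user="...", password="...")
# rows = fetch_rows(conn, "SELECT * FROM public.coup_tall_april LIMIT %s", (10,))
```

Keeping the connection outside the helper means the same function can be reused for each of the queries in these notes.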

Have created an experimental table called public.coup_tall_april containing data from April 2017 in a “tall” format, using code from Jason:


CREATE TABLE public.coup_tall_april AS
SELECT  a.icp_id
     , a.read_date
     , c.period
     , sum(c.read_kwh) as kwh_tot
     , sum(case when a.content_code = 'UN' then c.read_kwh else 0 end) as kwh_un
     , sum(case when a.content_code in ('CN','EG') then c.read_kwh else 0 end) as kwh_cn
FROM    coup_prd.coupdatamaster a,
	unnest(a.read_array) WITH ORDINALITY c(read_kwh, period)
WHERE   a.read_date >= to_date('01/04/2017','dd/mm/yyyy')
 and   a.read_date <  to_date('01/05/2017','dd/mm/yyyy')
 and   a.content_code  ~ ('UN|CN|EG')
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
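To make the reshaping concrete, here is a rough pure-Python sketch of what the query does to the wide rows: `unnest(...) WITH ORDINALITY` pairs each array element with its 1-based position, and the grouped sums split the readings by content code. The function name `to_tall` and the sample rows are made up for illustration. The sketch matches content codes exactly, whereas the query uses a regular-expression match.

```python
from collections import defaultdict


def to_tall(wide_rows):
    """Mimic the unnest + GROUP BY on rows of
    (icp_id, read_date, content_code, read_array).

    Returns sorted tuples of
    (icp_id, read_date, period, kwh_tot, kwh_un, kwh_cn).
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])  # [tot, un, cn]
    for icp_id, read_date, content_code, read_array in wide_rows:
        if content_code not in ("UN", "CN", "EG"):
            continue  # simplified stand-in for the WHERE filter
        # enumerate(..., start=1) plays the role of WITH ORDINALITY.
        for period, read_kwh in enumerate(read_array, start=1):
            bucket = totals[(icp_id, read_date, period)]
            bucket[0] += read_kwh
            if content_code == "UN":
                bucket[1] += read_kwh
            else:  # 'CN' or 'EG' both count as controlled
                bucket[2] += read_kwh
    return [key + tuple(values) for key, values in sorted(totals.items())]


# Illustrative input: two half-hours for one made-up ICP.
sample = [
    ("I000001", "2017-04-01", "UN", [0.5, 0.25]),
    ("I000001", "2017-04-01", "CN", [0.25, 0.0]),
]
tall = to_tall(sample)
```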

This data looks like:


SELECT * FROM public.coup_tall_april limit 10;

 icp_id  | read_date  | period | kwh_tot | kwh_un | kwh_cn
---------+------------+--------+---------+--------+--------
 I000002 | 2017-04-01 |      1 |   0.123 |  0.123 |    0.0
 I000002 | 2017-04-01 |      2 |   0.161 |  0.161 |    0.0
 I000002 | 2017-04-01 |      3 |   0.118 |  0.118 |    0.0
 I000002 | 2017-04-01 |      4 |   0.108 |  0.108 |    0.0
 I000002 | 2017-04-01 |      5 |   0.125 |  0.125 |    0.0
 I000002 | 2017-04-01 |      6 |   0.144 |  0.144 |    0.0
 I000002 | 2017-04-01 |      7 |    0.11 |   0.11 |    0.0
 I000002 | 2017-04-01 |      8 |   0.116 |  0.116 |    0.0
 I000002 | 2017-04-01 |      9 |   0.197 |  0.197 |    0.0
 I000002 | 2017-04-01 |     10 |   0.144 |  0.144 |    0.0

  • icp_id is the ID of the ICP, which may be a home or business. This is a varchar(10), although the values appear to have only 7 characters. It is an anonymised value, not the real ID.
  • read_date is the date of the reading, in this case in April 2017.
  • period is the half-hour period within the day, numbered from 1 (the position of the reading in read_array).
  • kwh_cn is the demand in kWh that the company has some level of control over (content codes CN and EG), e.g. through systems that remotely switch water heaters on and off. This ought to be relatively stable, although in many cases it will be 0.
  • kwh_un is the uncontrolled demand, i.e. the rest.
  • kwh_tot is the sum of the other kWh measurements for this half-hour.
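If period counts half-hour slots from midnight (an assumption; the notes only say that each row covers a half-hour), a period number can be converted to a clock time as sketched below. The helper name `period_start` is hypothetical, and the sketch only handles ordinary 48-period days.

```python
from datetime import time


def period_start(period):
    """Return the start time of a half-hour period.

    Assumes period 1 begins at 00:00 and a day has 48 periods.
    """
    if not 1 <= period <= 48:
        raise ValueError("expected a period between 1 and 48")
    minutes = (period - 1) * 30
    return time(hour=minutes // 60, minute=minutes % 60)
```

Under that assumption, period 19 would start at 09:00.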

Statistics

This dataset includes 34278 distinct ICPs: SELECT COUNT(DISTINCT icp_id) FROM public.coup_tall_april;

The number of ICPs recorded varies slightly from day to day (it is not yet clear why):


SELECT read_date, COUNT(DISTINCT icp_id) AS d_icp 
FROM public.coup_tall_april 
GROUP BY read_date;

 read_date  | d_icp
------------+-------
 2017-04-01 | 34080
 2017-04-02 | 34070
 2017-04-03 | 34082
 2017-04-04 | 34085
 2017-04-05 | 34083
 2017-04-06 | 34078
 2017-04-07 | 34084
 2017-04-08 | 34085
 2017-04-09 | 34079
 2017-04-10 | 34097
 2017-04-11 | 34102
 2017-04-12 | 34095
 2017-04-13 | 34127
 2017-04-14 | 34127
 2017-04-15 | 34128
 2017-04-16 | 34122
 2017-04-17 | 34119
 2017-04-18 | 34161
 2017-04-19 | 34178
 2017-04-20 | 34181
 2017-04-21 | 34190
 2017-04-22 | 34187
 2017-04-23 | 34178
 2017-04-24 | 34190
 2017-04-25 | 34180
 2017-04-26 | 34199
 2017-04-27 | 34193
 2017-04-28 | 34194
 2017-04-29 | 34179
 2017-04-30 | 34162
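One way to dig into the varying counts would be to list which ICPs are absent on each day. The sketch below does this in plain Python on (icp_id, read_date) pairs, such as the result of a SELECT DISTINCT icp_id, read_date query. The function name `missing_by_day` is illustrative only.

```python
from collections import defaultdict


def missing_by_day(pairs):
    """Map each read_date to the sorted ICP IDs that appear somewhere
    in the data but have no row on that date."""
    seen_on = defaultdict(set)
    for icp_id, read_date in pairs:
        seen_on[read_date].add(icp_id)
    # Every ICP that appears on at least one day.
    all_icps = set().union(*seen_on.values())
    return {day: sorted(all_icps - icps) for day, icps in sorted(seen_on.items())}
```

Since the month has 34278 distinct ICPs but no single day has more than 34199, every day would have a non-empty list.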

Days have similar averages (within the same month), but sometimes values are negative:


SELECT read_date, min(kwh_tot), avg(kwh_tot), max(kwh_tot)
FROM public.coup_tall_april 
GROUP BY read_date;

 read_date  |   min   |     avg      |      max
------------+---------+--------------+---------------
 2017-04-01 |     0.0 | 0.4122544704 |        30.928
 2017-04-02 |     0.0 | 0.4282689126 |        28.153
 2017-04-03 |     0.0 | 0.4313990021 |        28.041
 2017-04-04 |     0.0 | 0.4429309526 |        31.111
 2017-04-05 |     0.0 | 0.4478038208 | 29.1009999999
 2017-04-06 |     0.0 | 0.4327588656 |        28.067
 2017-04-07 |     0.0 | 0.4219823395 |        37.413
 2017-04-08 |     0.0 | 0.4228975471 |        25.908
 2017-04-09 |     0.0 | 0.4349535148 |        30.373
 2017-04-10 |     0.0 | 0.4288151169 |        26.791
 2017-04-11 |     0.0 | 0.4252618345 |        35.234
 2017-04-12 | -30.530 | 0.4419348542 |        29.818
 2017-04-13 |     0.0 | 0.4490852099 |        31.721
 2017-04-14 |     0.0 | 0.4311007474 |        27.167
 2017-04-15 |     0.0 | 0.4113287952 |        30.746
 2017-04-16 |     0.0 | 0.4115571155 |        26.713
 2017-04-17 |     0.0 | 0.4265760042 |        27.751
 2017-04-18 |     0.0 | 0.4311305557 |        34.414
 2017-04-19 |     0.0 | 0.4341526347 |        26.547
 2017-04-20 |     0.0 | 0.4351385443 |        27.124
 2017-04-21 |     0.0 | 0.4382414735 |        30.365
 2017-04-22 |     0.0 | 0.4257634901 |        31.112
 2017-04-23 |     0.0 | 0.4409884495 |        31.099
 2017-04-24 |     0.0 | 0.4344151128 |        27.109
 2017-04-25 |     0.0 | 0.4380537863 |        25.776
 2017-04-26 |     0.0 | 0.4401752859 |        26.907
 2017-04-27 |     0.0 | 0.4428565221 |        29.544
 2017-04-28 |     0.0 | 0.4343788968 |        29.598
 2017-04-29 | -31.624 | 0.4528608447 |        31.874
 2017-04-30 |     0.0 | 0.4640828966 |        31.960

Three rows in this table contain negative values:


SELECT * FROM public.coup_tall_april WHERE kwh_tot < 0 OR kwh_un < 0 OR kwh_cn < 0;

 icp_id  | read_date  | period | kwh_tot | kwh_un  | kwh_cn
---------+------------+--------+---------+---------+--------
 I017181 | 2017-04-12 |     19 | -30.530 | -30.585 |  0.055
 I019141 | 2017-04-29 |     37 | -31.445 | -31.445 |      0
 I019141 | 2017-04-29 |     38 | -31.624 | -31.624 |      0
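If these rows need to be set aside before further analysis, the same check can be applied to rows already fetched into Python. This is only a sketch: the function name `split_negative` is hypothetical, and whether negative readings should be excluded at all is an open question.

```python
def split_negative(rows):
    """Split (icp_id, read_date, period, kwh_tot, kwh_un, kwh_cn) rows
    into (clean, suspect), where suspect rows have any negative kWh value."""
    clean, suspect = [], []
    for row in rows:
        # Fields 3 to 5 are kwh_tot, kwh_un and kwh_cn.
        if any(value < 0 for value in row[3:6]):
            suspect.append(row)
        else:
            clean.append(row)
    return clean, suspect
```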

There are 334 distinct ICPs in this table whose icp_id ends in 17:


SELECT COUNT (DISTINCT icp_id) FROM public.coup_tall_april WHERE icp_id LIKE '%17';

SELECT DISTINCT icp_id FROM public.coup_tall_april WHERE icp_id LIKE '%17' ORDER BY icp_id LIMIT 10;

 icp_id
---------
 I000117
 I000217
 I000417
 I000517
 I000617
 I000817
 I001117
 I001217
 I001317
 I001417