Commit

Merge pull request #9 from iag-geo/202311
202311
minus34 authored Nov 21, 2023
2 parents e69352e + 4e29100 commit 83d42e3
Showing 18 changed files with 170 additions and 110 deletions.
26 changes: 13 additions & 13 deletions README.md
@@ -1,14 +1,14 @@
# Concord

A [CSV file](https://minus34.com/opendata/geoscape-202308/boundary_concordance.csv) and supporting scripts for converting data between Australian boundaries.
A [CSV file](https://minus34.com/opendata/geoscape-202311/boundary_concordance.csv) and supporting scripts for converting data between Australian boundaries.

It solves the problem of trying to merge 2 or more datasets based on different census or administrative boundaries such as statistical areas or postcodes.

It does this by providing a list of **_concordances_** between pairs of boundaries. _e.g. In the image below: 100% of postcode 3126 fits within the Boroondara LGA. However, only ~46% of postcode 3127 fits within that LGA._

In this context, **_concordance_** describes what % of residential addresses in a "from" boundary fit within a "to" boundary.
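As a rough illustration of how a concordance percentage is applied, the sketch below splits values recorded against "from" boundaries across "to" boundaries. The function name `apportion`, the case counts and the "Other LGA" remainder row are hypothetical; only the 100% and ~46% figures come from the example above.

```python
from collections import defaultdict


def apportion(values, concordance):
    """Split values keyed by a "from" boundary ID across "to" boundaries.

    values: dict of from_id -> numeric value
    concordance: iterable of (from_id, to_id, address_percent) tuples
    """
    result = defaultdict(float)
    for from_id, to_id, address_percent in concordance:
        if from_id in values:
            # each "to" boundary receives its share of the "from" value
            result[to_id] += values[from_id] * address_percent / 100.0
    return dict(result)


# Illustrative rows only; "Other LGA" is a made-up remainder for postcode 3127
rows = [
    ("3126", "Boroondara", 100.0),
    ("3127", "Boroondara", 46.0),
    ("3127", "Other LGA", 54.0),
]
cases = {"3126": 200, "3127": 100}  # hypothetical counts per postcode
lga_cases = apportion(cases, rows)
```

Because the split is weighted by residential addresses, it works best for data that is roughly proportional to where people live.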

Download & import the ~50Mb [concordance file](https://minus34.com/opendata/geoscape-202308/boundary_concordance.csv) into your database or reporting tool to [get started](#get-started). A [script](/postgres-scripts/00_import_concordance_file.sql) for importing into Postgres is also provided.
Download & import the ~50Mb [concordance file](https://minus34.com/opendata/geoscape-202311/boundary_concordance.csv) into your database or reporting tool to [get started](#get-started). A [script](/postgres-scripts/00_import_concordance_file.sql) for importing into Postgres is also provided.

### Example Use Cases

@@ -42,7 +42,7 @@ In the [score](/data/boundary_concordance_score.csv) file, the **_error_** measu

The concordance file is generated by the following process:

1. Tag all GNAF addresses with 2016 & 2021 ABS Census boundaries and geoscape 202308 Administrative boundaries
1. Tag all GNAF addresses with 2016 & 2021 ABS Census boundaries and geoscape 202311 Administrative boundaries
2. Remove all addresses in non-residential ABS Census 2021 meshblocks
3. Aggregate all residential addresses by a set of _**from**_ boundary and _**to**_ boundary pairs (e.g. postcode to LGA)
4. Determine the % overlap of residential addresses between both boundary types for all boundary pairs
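Steps 3 and 4 above can be sketched in plain Python, assuming each residential address has already been tagged with a "from" and a "to" boundary ID. The function name and the sample tags are hypothetical; the actual process runs as SQL in Postgres, so this is only an approximation of the idea.

```python
from collections import Counter


def concordance_percentages(address_tags):
    """Return {(from_id, to_id): percent of the from boundary's addresses}.

    address_tags: iterable of (from_id, to_id) pairs, one per residential address
    """
    # step 3: aggregate addresses by from/to boundary pair
    pair_counts = Counter(address_tags)

    # total addresses in each "from" boundary
    from_totals = Counter()
    for (from_id, _to_id), count in pair_counts.items():
        from_totals[from_id] += count

    # step 4: % overlap, rounded to 1 decimal place like the address_percent column
    return {
        (from_id, to_id): round(100.0 * count / from_totals[from_id], 1)
        for (from_id, to_id), count in pair_counts.items()
    }


# Hypothetical tags: 100 addresses in postcode 3127, 10 in postcode 3126
tags = [("3127", "A")] * 46 + [("3127", "B")] * 54 + [("3126", "A")] * 10
percentages = concordance_percentages(tags)
```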
@@ -63,7 +63,7 @@ There are 2 options to get the data:

#### 1. Download and Import

1. Download the [concordance file](https://minus34.com/opendata/geoscape-202308/boundary_concordance.csv)
1. Download the [concordance file](https://minus34.com/opendata/geoscape-202311/boundary_concordance.csv)
2. Import it into your database/reporting tool of choice. If using Postgres:
1. Edit the file path, schema name & table owner in `00_import_concordance_file.sql` in the [postgres-scripts](/postgres-scripts) folder
2. Run the SQL script to import the file
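If you are not using Postgres, the file can also be read with Python's standard `csv` module. The sketch below assumes the CSV header uses the same column names as the Postgres table (`from_source`, `from_bdy`, `from_id`, `to_source`, `to_bdy`, `to_id`, `address_percent` and so on). The function name `load_concordance` and the sample rows, including the IDs, are made up for illustration.

```python
import csv
import io


def load_concordance(csv_file, from_source, from_bdy, to_source, to_bdy):
    """Return (from_id, to_id, address_percent) tuples for one boundary pair."""
    wanted = (from_source, from_bdy, to_source, to_bdy)
    rows = []
    for row in csv.DictReader(csv_file):
        key = (row["from_source"], row["from_bdy"], row["to_source"], row["to_bdy"])
        if key == wanted:
            rows.append((row["from_id"], row["to_id"], float(row["address_percent"])))
    return rows


# Tiny in-memory stand-in for the real file; values and IDs are hypothetical
sample = io.StringIO(
    "from_source,from_bdy,from_id,from_name,to_source,to_bdy,to_id,to_name,"
    "address_count,address_percent\n"
    "geoscape 202311,postcode,3126,3126,abs 2016,lga,L1,Boroondara,5000,100.0\n"
    "geoscape 202311,postcode,3127,3127,abs 2016,lga,L1,Boroondara,4600,46.0\n"
    "abs 2016,sa2,S1,Example SA2,abs 2016,sa3,S3,Example SA3,1200,100.0\n"
)
pairs = load_concordance(sample, "geoscape 202311", "postcode", "abs 2016", "lga")
```

With the downloaded file, pass `open("boundary_concordance.csv", newline="")` instead of the in-memory sample.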
@@ -75,9 +75,9 @@ This requires a knowledge of Python, Postgres & pg_restore.
BTW - if the boundary combination you want isn't in the default concordance file - you need to edit the `settings.py` file before running `create_concordance_file.py`. If this is too hard - raise an [issue](https://github.com/iag-geo/concord/issues) and we may be able to generate it for you; noting you shouldn't convert data to a smaller boundary due to the increase in data errors.

**Running the script only needs to be done for 3 reasons:**
1. The boundary from/to combination you need isn't in the standard [concordances file](https://minus34.com/opendata/geoscape-202308/boundary_concordance.csv)
1. The boundary from/to combination you need isn't in the standard [concordances file](https://minus34.com/opendata/geoscape-202311/boundary_concordance.csv)
2. It's now the future and we've been too lazy to update the concordances file with the latest boundary data from the ABS and/or Geoscape
3. You have a license of [Geoscape Buildings](https://geoscape.com.auhttps://minus34.com/opendata/geoscape-202308/boundary_concordance.csv/buildings/) or [Geoscape Land Parcels](https://geoscape.com.auhttps://minus34.com/opendata/geoscape-202308/boundary_concordance.csv/land-parcels/) and want to use the _planning zone_ data in those products to:
3. You have a license of [Geoscape Buildings](https://geoscape.com.auhttps://minus34.com/opendata/geoscape-202311/boundary_concordance.csv/buildings/) or [Geoscape Land Parcels](https://geoscape.com.auhttps://minus34.com/opendata/geoscape-202311/boundary_concordance.csv/land-parcels/) and want to use the _planning zone_ data in those products to:
1. Use a more accurate list of residential addresses to determine the data apportionment percentages (see **note** below); or
2. Use a different set of addresses to apportion your data; e.g. industrial or commercial addresses

@@ -88,8 +88,8 @@ BTW - if the boundary combination you want isn't in the default concordance file
Running the script requires the following open data, available as Postgres dump files, as well as the optional licensed Geoscape data mentioned above:
1. ABS Census 2016 boundaries ([download](https://minus34.com/opendata/census-2016/census_2016_bdys.dmp))
2. ABS Census 2021 boundaries ([download](https://minus34.com/opendata/census-2021/census_2021_bdys_gda94.dmp))
3. GNAF from gnaf-loader ([download](https://minus34.com/opendata/geoscape-202308/gnaf-202308.dmp))
4. Geoscape Administrative Boundaries from gnaf-loader ([download](https://minus34.com/opendata/geoscape-202308/admin-bdys-202308.dmp))
3. GNAF from gnaf-loader ([download](https://minus34.com/opendata/geoscape-202311/gnaf-202311.dmp))
4. Geoscape Administrative Boundaries from gnaf-loader ([download](https://minus34.com/opendata/geoscape-202311/admin-bdys-202311.dmp))
5. ABS Census 2016 data - used to generate error rates only ([download](https://minus34.com/opendata/census-2016/census_2016_data.dmp))

#### Process
@@ -117,7 +117,7 @@ The behaviour of the Python script can be controlled by specifying various comma
* `--pgpassword` password for accessing the Postgres server. This defaults to the `PGPASSWORD` environment variable if set, otherwise `password`.

##### Optional Arguments
* `--geoscape-version` Geoscape version number in YYYYMM format. Defaults to current year and last release month. e.g. `202308`.
* `--geoscape-version` Geoscape version number in YYYYMM format. Defaults to current year and last release month. e.g. `202311`.
* `--gnaf-schema` input schema name to store final GNAF tables in. Also the **output schema** for the concordance table. Defaults to `gnaf_<geoscape_version>`.
* `--admin-schema` input schema name to store final admin boundary tables in. Defaults to `admin_bdys_<geoscape_version>`.
* `--output-table` name of both output concordance table and file. Defaults to `boundary_concordance`.
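The way the schema defaults depend on the Geoscape version can be sketched with `argparse`. This is a simplified, hypothetical parser and not the script's actual argument handling. In particular, the version default is hard-coded to `202311` here, whereas the script derives it from the current date.

```python
import argparse


def parse_args(argv=None):
    """Parse a subset of the optional arguments and fill in derived defaults."""
    parser = argparse.ArgumentParser(description="Concordance settings (sketch)")
    # assumption: fixed default for illustration only
    parser.add_argument("--geoscape-version", default="202311")
    parser.add_argument("--gnaf-schema")
    parser.add_argument("--admin-schema")
    parser.add_argument("--output-table", default="boundary_concordance")
    args = parser.parse_args(argv)

    # schema names fall back to values built from the Geoscape version
    if args.gnaf_schema is None:
        args.gnaf_schema = f"gnaf_{args.geoscape_version}"
    if args.admin_schema is None:
        args.admin_schema = f"admin_bdys_{args.geoscape_version}"
    return args


args = parse_args(["--geoscape-version", "202311"])
```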
@@ -147,8 +147,8 @@ WITH pc_data AS (
con.to_name AS lga_name,
sum(pc.cases::float * con.address_percent / 100.0)::integer AS cases
FROM testing.nsw_covid_cases_20220503_postcode AS pc
INNER JOIN gnaf_202308.boundary_concordance AS con ON pc.postcode = con.from_id
WHERE con.from_source = 'geoscape 202308'
INNER JOIN gnaf_202311.boundary_concordance AS con ON pc.postcode = con.from_id
WHERE con.from_source = 'geoscape 202311'
AND con.from_bdy = 'postcode'
AND con.to_source = 'abs 2016'
AND con.to_bdy = 'lga'
@@ -167,9 +167,9 @@ FROM testing.nsw_covid_tests_20220503_lga AS lga

## Data Licenses

Incorporates or developed using G-NAF © [Geoscape Australia](https://geoscape.com.au/legalhttps://minus34.com/opendata/geoscape-202308/boundary_concordance.csv-copyright-and-disclaimer/) licensed by the Commonwealth of Australia under the [Open Geo-coded National Address File (G-NAF) End User Licence Agreement](https:/https://minus34.com/opendata/geoscape-202308/boundary_concordance.csv.gov.auhttps://minus34.com/opendata/geoscape-202308/boundary_concordance.csvset/ds-dga-19432f89-dc3a-4ef3-b943-5326ef1dbecc/distribution/dist-dga-09f74802-08b1-4214-a6ea-3591b2753d30/details?q=).
Incorporates or developed using G-NAF © [Geoscape Australia](https://geoscape.com.au/legalhttps://minus34.com/opendata/geoscape-202311/boundary_concordance.csv-copyright-and-disclaimer/) licensed by the Commonwealth of Australia under the [Open Geo-coded National Address File (G-NAF) End User Licence Agreement](https:/https://minus34.com/opendata/geoscape-202311/boundary_concordance.csv.gov.auhttps://minus34.com/opendata/geoscape-202311/boundary_concordance.csvset/ds-dga-19432f89-dc3a-4ef3-b943-5326ef1dbecc/distribution/dist-dga-09f74802-08b1-4214-a6ea-3591b2753d30/details?q=).

Incorporates or developed using Administrative Boundaries © [Geoscape Australia](https://geoscape.com.au/legalhttps://minus34.com/opendata/geoscape-202308/boundary_concordance.csv-copyright-and-disclaimer/) licensed by the Commonwealth of Australia under [Creative Commons Attribution 4.0 International licence (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).
Incorporates or developed using Administrative Boundaries © [Geoscape Australia](https://geoscape.com.au/legalhttps://minus34.com/opendata/geoscape-202311/boundary_concordance.csv-copyright-and-disclaimer/) licensed by the Commonwealth of Australia under [Creative Commons Attribution 4.0 International licence (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).

Based on [Australian Bureau of Statistics](https://www.abs.gov.au/websitedbs/d3310114.nsf/Home/Attributing+ABS+Material) data, licensed by the Commonwealth of Australia under [Creative Commons Attribution 4.0 International licence (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).

22 changes: 18 additions & 4 deletions create_concordance_file.py
@@ -34,12 +34,18 @@ def main():
# get weighted scores as % concordance
score_results(pg_cur)

# # export results to csv
# export results to csv
export_to_csv(pg_cur, f'{settings.gnaf_schema}.{settings.output_table}',
settings.output_table + ".csv", True)
export_to_csv(pg_cur, f'{settings.gnaf_schema}.{settings.output_score_table}',
settings.output_score_table + ".csv", False)

# copy to GDA2020 schema
sql = geoscape.open_sql_file("data-prep/03_copy_to_gda2020_schema.sql")
pg_cur.execute(sql)

logger.info('\t - tables copied to GDA2020 schema')

# cleanup
pg_cur.close()
pg_conn.close()
@@ -180,6 +186,9 @@ def add_asgs_concordances(pg_cur):
start_time = datetime.now()
to_index = settings.asgs_concordance_list.index(to_bdy)

if to_bdy == "gccsa":
to_bdy = "gcc"

if to_index > from_index:
query = f"""insert into {settings.gnaf_schema}.{settings.output_table}
select '{source}' as from_source,
@@ -193,7 +202,7 @@ def add_asgs_concordances(pg_cur):
count(*) as address_count,
100.0 as address_percent
from census_2016_bdys.mb_2016_aust as mb
inner join gnaf_202308.address_principals as gnaf on gnaf.mb_2016_code::text = mb.mb_code16
inner join gnaf_202311.address_principals as gnaf on gnaf.mb_2016_code::text = mb.mb_code16
group by from_id,
from_name,
to_id,
@@ -224,7 +233,7 @@ def add_asgs_concordances(pg_cur):
start_time = datetime.now()
to_index = settings.asgs_concordance_list.index(to_bdy)

# fix for fild name change between censuses
# fix for field name change between censuses
if to_bdy == "gcc":
to_bdy = to_bdy.replace("gcc", "gccsa")

@@ -241,7 +250,7 @@ def add_asgs_concordances(pg_cur):
count(*) as address_count,
100.0 as address_percent
from census_2021_bdys_gda94.mb_2021_aust_gda94 as mb
inner join gnaf_202308.address_principals as gnaf
inner join gnaf_202311.address_principals as gnaf
on gnaf.mb_2021_code::text = mb.mb_code_2021
group by from_id,
from_name,
@@ -272,6 +281,11 @@ def get_field_names(bdy, source, to_from, sql):
if source == "abs 2016":
id_field = f"{table}.{bdy}_code16"
name_field = f"{table}.{bdy}_name16"

if bdy == "gccsa":
id_field = id_field.replace("gccsa_code16", "gcc_code16")
name_field = name_field.replace("gccsa_name16", "gcc_name16")

elif source == "abs 2021":
id_field = f"{table}.{bdy}_code_2021"
name_field = f"{table}.{bdy}_name_2021"
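The field-name handling in this hunk can be read as the standalone sketch below. It is a simplified approximation, not the full `get_field_names` function: the name `census_field_names` and the table alias `f` are hypothetical, and only the two ABS sources visible in the hunk are covered.

```python
def census_field_names(table, bdy, source):
    """Return (id_field, name_field) for an ABS census boundary table."""
    if source == "abs 2016":
        # the 2016 tables name the Greater Capital City fields "gcc", not "gccsa"
        field_bdy = "gcc" if bdy == "gccsa" else bdy
        return f"{table}.{field_bdy}_code16", f"{table}.{field_bdy}_name16"
    if source == "abs 2021":
        return f"{table}.{bdy}_code_2021", f"{table}.{bdy}_name_2021"
    raise ValueError(f"unsupported source in this sketch: {source}")


fields_2016 = census_field_names("f", "gccsa", "abs 2016")
fields_2021 = census_field_names("f", "gccsa", "abs 2021")
```

Mapping the boundary name once before building both fields gives the same result as the two `replace` calls in the hunk.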
38 changes: 22 additions & 16 deletions data/boundary_concordance_score.csv
@@ -1,5 +1,7 @@
from_source,from_bdy,to_source,to_bdy,concordance_percent,error_percent
abs 2016,gcc,abs 2021,gccsa,100,
abs 2016,lga,abs 2016,gccsa,99,0.1
abs 2016,lga,abs 2016,ra,92,1.2
abs 2016,lga,abs 2016,sa3,73,4.8
abs 2016,lga,abs 2016,ste,100,0.2
abs 2016,poa,abs 2016,lga,93,1.4
@@ -15,13 +17,15 @@ abs 2016,sa2,abs 2016,poa,79,4.4
abs 2016,sa2,abs 2016,sa3,100,0.0
abs 2016,sa2,abs 2016,sa4,100,0.0
abs 2016,sa2,abs 2021,sa2,92,
abs 2016,sa2,geoscape 202308,postcode,78,
abs 2016,sa2,geoscape 202311,postcode,78,
abs 2016,sa3,abs 2016,gcc,100,0.0
abs 2016,sa3,abs 2016,lga,83,3.0
abs 2016,sa3,abs 2016,sa4,100,0.0
abs 2016,sa3,abs 2021,sa3,100,
abs 2016,sa4,abs 2016,gcc,100,0.0
abs 2016,sa4,abs 2021,sa4,100,
abs 2021,lga,abs 2021,gccsa,100,
abs 2021,lga,abs 2021,ra,95,
abs 2021,lga,abs 2021,sa3,72,
abs 2021,lga,abs 2021,state,100,
abs 2021,poa,abs 2021,lga,94,
@@ -35,22 +39,24 @@ abs 2021,sa2,abs 2021,lga,98,
abs 2021,sa2,abs 2021,poa,84,
abs 2021,sa2,abs 2021,sa3,100,
abs 2021,sa2,abs 2021,sa4,100,
abs 2021,sa2,geoscape 202308,postcode,84,
abs 2021,sa2,geoscape 202311,postcode,84,
abs 2021,sa3,abs 2021,gccsa,100,
abs 2021,sa3,abs 2021,lga,85,
abs 2021,sa3,abs 2021,sa4,100,
abs 2021,sa4,abs 2021,gccsa,100,
geoscape 202308,lga,abs 2016,lga,100,
geoscape 202308,lga,abs 2021,lga,100,
geoscape 202308,locality,abs 2016,lga,98,
geoscape 202308,locality,abs 2016,sa2,93,
geoscape 202308,locality,abs 2016,sa3,99,
geoscape 202308,locality,abs 2021,lga,98,
geoscape 202308,locality,abs 2021,sa2,90,
geoscape 202308,locality,abs 2021,sa3,99,
geoscape 202308,locality,geoscape 202308,lga,98,
geoscape 202308,postcode,abs 2016,lga,93,
geoscape 202308,postcode,abs 2016,sa3,92,
geoscape 202308,postcode,abs 2021,lga,94,
geoscape 202308,postcode,abs 2021,sa3,93,
geoscape 202308,postcode,geoscape 202308,lga,93,
geoscape 202311,lga,abs 2016,gccsa,99,
geoscape 202311,lga,abs 2016,lga,100,
geoscape 202311,lga,abs 2021,gccsa,100,
geoscape 202311,lga,abs 2021,lga,100,
geoscape 202311,locality,abs 2016,lga,98,
geoscape 202311,locality,abs 2016,sa2,93,
geoscape 202311,locality,abs 2016,sa3,99,
geoscape 202311,locality,abs 2021,lga,98,
geoscape 202311,locality,abs 2021,sa2,90,
geoscape 202311,locality,abs 2021,sa3,99,
geoscape 202311,locality,geoscape 202311,lga,98,
geoscape 202311,postcode,abs 2016,lga,93,
geoscape 202311,postcode,abs 2016,sa3,92,
geoscape 202311,postcode,abs 2021,lga,94,
geoscape 202311,postcode,abs 2021,sa3,93,
geoscape 202311,postcode,geoscape 202311,lga,93,
16 changes: 8 additions & 8 deletions postgres-scripts/00_import_concordance_file.sql
@@ -1,7 +1,7 @@

-- create table
drop table if exists gnaf_202308.boundary_concordance;
create table gnaf_202308.boundary_concordance
drop table if exists gnaf_202311.boundary_concordance;
create table gnaf_202311.boundary_concordance
(
from_source text not null,
from_bdy text not null,
@@ -14,21 +14,21 @@ create table gnaf_202308.boundary_concordance
address_count integer,
address_percent numeric(4, 1)
);
alter table gnaf_202308.boundary_concordance owner to postgres;
alter table gnaf_202311.boundary_concordance owner to postgres;

-- import CSV file -- 586,977 rows affected in 1 s 365 ms
COPY gnaf_202308.boundary_concordance
COPY gnaf_202311.boundary_concordance
FROM '/Users/minus34/Downloads/boundary_concordance.csv'
WITH (HEADER, DELIMITER ',', FORMAT CSV);

analyse gnaf_202308.boundary_concordance;
analyse gnaf_202311.boundary_concordance;

-- add primary key (faster if done after import) -- completed in 8 s 496 ms
alter table gnaf_202308.boundary_concordance add constraint boundary_concordance_pkey
alter table gnaf_202311.boundary_concordance add constraint boundary_concordance_pkey
primary key (from_source, from_bdy, from_id, to_source, to_bdy, to_id);

-- add index on required fields for converting data
create index boundary_concordance_combo_idx on gnaf_202308.boundary_concordance
create index boundary_concordance_combo_idx on gnaf_202311.boundary_concordance
using btree (from_source, from_bdy, to_source, to_bdy);

alter table gnaf_202308.boundary_concordance cluster on boundary_concordance_combo_idx;
alter table gnaf_202311.boundary_concordance cluster on boundary_concordance_combo_idx;
Original file line number Diff line number Diff line change
@@ -39,23 +39,23 @@ select distinct temp_mb.mb_code_2021, bdy.poa_code_2021 as poa_code_2021, bdy.po
inner join census_2021_bdys_gda94.poa_2021_aust_gda94 as bdy on st_intersects(temp_mb.geom, bdy.geom);
analyse temp_poa_mb;

-- drop table if exists temp_ra_mb;
-- create temporary table temp_ra_mb as
-- select distinct temp_mb.mb_code_2021, bdy.ra_code_2021 as ra_code_2021, bdy.ra_name_2021 as ra_name_2021 from temp_mb
-- inner join census_2021_bdys_gda94.ra_2021_aust_gda94 as bdy on st_intersects(temp_mb.geom, bdy.geom);
-- analyse temp_ra_mb;
drop table if exists temp_ra_mb;
create temporary table temp_ra_mb as
select distinct temp_mb.mb_code_2021, bdy.ra_code_2021 as ra_code_2021, bdy.ra_name_2021 as ra_name_2021 from temp_mb
inner join census_2021_bdys_gda94.ra_2021_aust_gda94 as bdy on st_intersects(temp_mb.geom, bdy.geom);
analyse temp_ra_mb;

drop table if exists temp_sed_mb;
create temporary table temp_sed_mb as
select distinct temp_mb.mb_code_2021, bdy.sed_code_2021 as sed_code_2021, bdy.sed_name_2021 as sed_name_2021 from temp_mb
inner join census_2021_bdys_gda94.sed_2021_aust_gda94 as bdy on st_intersects(temp_mb.geom, bdy.geom);
analyse temp_sed_mb;

-- drop table if exists temp_ucl_mb;
-- create temporary table temp_ucl_mb as
-- select distinct temp_mb.mb_code_2021, bdy.ucl_code_2021 as ucl_code_2021, bdy.ucl_name_2021 as ucl_name_2021 from temp_mb
-- inner join census_2021_bdys_gda94.ucl_2021_aust_gda94 as bdy on st_intersects(temp_mb.geom, bdy.geom);
-- analyse temp_ucl_mb;
drop table if exists temp_ucl_mb;
create temporary table temp_ucl_mb as
select distinct temp_mb.mb_code_2021, bdy.ucl_code_2021 as ucl_code_2021, bdy.ucl_name_2021 as ucl_name_2021 from temp_mb
inner join census_2021_bdys_gda94.ucl_2021_aust_gda94 as bdy on st_intersects(temp_mb.geom, bdy.geom);
analyse temp_ucl_mb;

drop table temp_mb;

@@ -81,20 +81,20 @@ with abs as (
lga_name_2021,
poa_code_2021,
poa_name_2021,
-- ra_code_2021,
-- ra_name_2021,
ra_code_2021,
ra_name_2021,
sed_code_2021,
sed_name_2021,
-- ucl_code_2021,
-- ucl_name_2021,
ucl_code_2021,
ucl_name_2021,
mb.state_code_2021,
mb.state_name_2021
from census_2021_bdys_gda94.mb_2021_aust_gda94 as mb
inner join temp_ced_mb as ced on ced.mb_code_2021 = mb.mb_code_2021
inner join temp_lga_mb as lga on lga.mb_code_2021 = mb.mb_code_2021
inner join temp_poa_mb as poa on poa.mb_code_2021 = mb.mb_code_2021
-- inner join temp_ra_mb as ra on ra.mb_code_2021 = mb.mb_code_2021
-- inner join temp_ucl_mb as ucl on ucl.mb_code_2021 = mb.mb_code_2021
inner join temp_ra_mb as ra on ra.mb_code_2021 = mb.mb_code_2021
inner join temp_ucl_mb as ucl on ucl.mb_code_2021 = mb.mb_code_2021
left outer join temp_sed_mb as sed on sed.mb_code_2021 = mb.mb_code_2021
)
select gid,

0 comments on commit 83d42e3
