Commit

Merge pull request #387 from rustprooflabs/docs-data-files
Improve Docs with `--pgosm-date` details and behavior
rustprooflabs authored May 19, 2024
2 parents c946501 + 862d549 commit 6727773
Showing 5 changed files with 113 additions and 42 deletions.
4 changes: 2 additions & 2 deletions docs/book.toml
@@ -13,5 +13,5 @@ git-repository-url = "https://github.com/rustprooflabs/pgosm-flex"
git-repository-icon = "fa-github"
edit-url-template = "https://github.com/rustprooflabs/pgosm-flex/edit/main/docs/{path}"

[preprocessor.variables.variables]
pgosm_flex_version = "0.10.0"
#[preprocessor.variables.variables]
#pgosm_flex_version = "0.10.0"
1 change: 1 addition & 0 deletions docs/src/SUMMARY.md
@@ -13,6 +13,7 @@
- [Layersets](./layersets.md)
- [Indexes](./custom-indexes.md)
- [Configure Postgres](./configure-postgres.md)
- [Data Files](./data-files.md)
- [Query examples](./query.md)
- [Routing](./routing.md)
- [Processing Time](./performance.md)
42 changes: 2 additions & 40 deletions docs/src/common-customization.md
@@ -27,6 +27,8 @@ exactly what `--region` and `--subregion` options to choose.
This can be a bit confusing as larger subregions can contain smaller subregions.
Feel free to [start a discussion](https://github.com/rustprooflabs/pgosm-flex/discussions/new/choose) if you need help figuring this part out!

> See the [Data Files](data-files.md) section for steps to change this behavior.

If you want to load the entire United States subregion, instead of
the District of Columbia subregion, the `docker exec` command is changed to the
following.
@@ -48,46 +50,6 @@ docker exec -it \
--region=north-america
```

## Specific input file

The automatic Geofabrik download can be overridden by providing PgOSM Flex
with the path to a valid `.osm.pbf` file using `--input-file`.
This option overrides the default file handling, archiving, and MD5
checksum validation. With `--input-file` you can use a custom `osm.pbf`
you created, or use it to simply remove the need for an internet connection
from the instance running the processing.

> Note: The `--region` option is always required, the `--subregion` option can be used with `--input-file` to put the information in the `subregion` column of `osm.pgosm_flex`.

### Small area / custom extract

Some of the smallest subregions provided by Geofabrik are quite large compared
to the area of interest for a project.
The `osmium` tool makes it quick and easy to
[extract a bounding box](https://docs.osmcode.org/osmium/latest/osmium-extract.html).
The following example extracts an area roughly around Denver, Colorado.
It takes about 3 seconds to extract the 3.2 MB `denver.osm.pbf` output from
the 239 MB input.

```bash
osmium extract --bbox=-105.0193,39.7663,-104.9687,39.7323 \
-o denver.osm.pbf \
colorado-2023-04-18.osm.pbf
```

The PgOSM Flex procesing time for the smaller Denver region takes less than 20 seconds on a
typical laptop, versus 11 minutes for all of Colorado.

```bash
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=custom \
--subregion=denver \
--input-file=denver.osm.pbf \
--layerset=everything
```

## Customize load to PostGIS

1 change: 1 addition & 0 deletions docs/src/customizations.md
@@ -5,3 +5,4 @@
- [Layersets](./layersets.md)
- [Layers](./layers.md)
- [Configure Postgres](./configure-postgres.md)
- [Data Files](./data-files.md)
107 changes: 107 additions & 0 deletions docs/src/data-files.md
@@ -0,0 +1,107 @@
# Data Files

PgOSM Flex automatically manages downloads of the data and `.md5` files it needs
from the [Geofabrik download server](https://download.geofabrik.de/).
With the default behavior, PgOSM Flex downloads the two necessary files:

* `<region/subregion>-latest.osm.pbf`
* `<region/subregion>-latest.osm.pbf.md5`

The data path on the host machine is defined via the `docker run` command. This
documentation always uses `~/pgosm-data` per the [quick start](quick-start.md).

```bash
docker run --name pgosm -d --rm \
-v ~/pgosm-data:/app/output \
...
```
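
To check a downloaded file by hand, the hash stored in the `.md5` file can be compared against the hash of the `.osm.pbf` file. The following is a minimal sketch, not PgOSM Flex's own validation logic. The helper name `verify_pbf_md5` is hypothetical, and the sketch assumes GNU `md5sum` is available and that the first field of the `.md5` file is the hash.

```bash
#!/bin/sh
# Hypothetical helper, not part of PgOSM Flex.
# Assumes GNU md5sum and a .md5 file whose first whitespace-separated
# field is the expected hash.
verify_pbf_md5() {
    pbf_file=$1
    md5_file=$2
    # Hash recorded in the .md5 file (first field of the first line).
    expected=$(awk '{print $1; exit}' "$md5_file")
    # Hash computed from the PBF file itself.
    actual=$(md5sum "$pbf_file" | awk '{print $1}')
    # Succeed only when a hash was found and both hashes match.
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

Called with the path to a `.osm.pbf` file and the path to its `.md5` file under `~/pgosm-data`, the function returns success only when the two hashes agree. Comparing only the hash field means the check does not depend on the filename recorded inside the `.md5` file.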

> See the [Selecting Region and Sub-region](common-customization.md#selecting-region-and-subregion)
> section for more about the default behavior.


There are two methods to override this default behavior: specify `--pgosm-date`
or use `--input-file`.
If you have manually saved files with `-latest` in the filename in the path used
by PgOSM Flex, they **will be overwritten** unless you use one of the
methods described below.


## Specific date with `--pgosm-date`

Use `--pgosm-date` to load data from a specific date. The date
must be in `yyyy-mm-dd` format.
This mode requires that a valid `.pbf` file and its matching `.md5` file
already exist. The following example shows the `docker exec` command with
`--pgosm-date` defined.

```bash
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=north-america/us \
--subregion=district-of-columbia \
--pgosm-date=2024-05-14
```
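
When the date comes from a script variable, it can help to check its shape before passing it to `--pgosm-date`. This is a small sketch under stated assumptions: `is_pgosm_date` is a hypothetical helper, not part of PgOSM Flex, and it only checks the `yyyy-mm-dd` pattern, not whether the date is a real calendar date.

```bash
#!/bin/sh
# Hypothetical helper, not part of PgOSM Flex.
# Returns success when the argument has the shape yyyy-mm-dd.
# It does not check that the month or day values are valid.
is_pgosm_date() {
    case $1 in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_pgosm_date 2024-05-14` succeeds, while `is_pgosm_date 2024-5-14` fails because the month is not zero-padded.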

The output should confirm that PgOSM Flex finds and uses the file with the
specified date.
Remember, the paths in the output (`/app/output/`) are
container-internal paths, not paths on your host.

```bash
INFO:pgosm-flex:geofabrik:PBF File exists /app/output/district-of-columbia-2024-05-14.osm.pbf
INFO:pgosm-flex:geofabrik:PBF & MD5 files exist. Download not needed
INFO:pgosm-flex:geofabrik:Copying Archived files
INFO:pgosm-flex:pgosm_flex:Running osm2pgsql
```
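
To find a reported file on the host, the container prefix `/app/output/` can be swapped for the host data path. The sketch below assumes the `-v ~/pgosm-data:/app/output` mount used throughout this documentation. `to_host_path` is a hypothetical helper, not part of PgOSM Flex.

```bash
#!/bin/sh
# Hypothetical helper, not part of PgOSM Flex.
# Translates a container path under /app/output/ to the host path,
# assuming the volume mount -v <host_dir>:/app/output.
to_host_path() {
    container_path=$1
    # Host directory defaults to the path used in this documentation.
    host_dir=${2:-$HOME/pgosm-data}
    case $container_path in
        /app/output/*)
            printf '%s/%s\n' "$host_dir" "${container_path#/app/output/}" ;;
        *)
            # Paths outside the mount are returned unchanged.
            printf '%s\n' "$container_path" ;;
    esac
}
```

Passing `/app/output/district-of-columbia-2024-05-14.osm.pbf` from the log above prints the matching path under `~/pgosm-data` when no second argument is given.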

If a date is specified without matching file(s), PgOSM Flex raises an error and exits.

```bash
ERROR:pgosm-flex:geofabrik:Missing PBF file for 2024-05-15. Cannot proceed.
```
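
A quick check on the host before running `docker exec` can catch this case early. The following is a minimal sketch, not the check PgOSM Flex performs internally. `have_dated_files` is a hypothetical helper, and it assumes the dated files follow the naming seen in the log output above (`<name>-<yyyy-mm-dd>.osm.pbf`) with the `.md5` file using the same name plus `.md5`.

```bash
#!/bin/sh
# Hypothetical helper, not part of PgOSM Flex.
# Assumed naming: <name>-<yyyy-mm-dd>.osm.pbf and the same name + ".md5".
have_dated_files() {
    data_dir=$1     # host data path, e.g. ~/pgosm-data
    name=$2         # file name prefix, e.g. district-of-columbia
    pgosm_date=$3   # date in yyyy-mm-dd format
    pbf="${data_dir}/${name}-${pgosm_date}.osm.pbf"
    # Succeed only when both the PBF and its MD5 file are present.
    [ -f "$pbf" ] && [ -f "${pbf}.md5" ]
}
```

For example, `have_dated_files ~/pgosm-data district-of-columbia 2024-05-14` succeeds only when both files exist in the host data path.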


## Specific input file with `--input-file`

The automatic Geofabrik download can be overridden by providing PgOSM Flex
with the path to a valid `.osm.pbf` file using `--input-file`.
This option overrides the default file handling, archiving, and MD5
checksum validation. With `--input-file` you can use a custom `.osm.pbf` file
you created, or remove the need for an internet connection
on the instance running the processing.

> Note: The `--region` option is always required. The `--subregion` option can be used with `--input-file` to populate the `subregion` column of `osm.pgosm_flex`.

### Small area / custom extract

Some of the smallest subregions provided by Geofabrik are quite large compared
to the area of interest for a project.
The `osmium` tool makes it quick and easy to
[extract a bounding box](https://docs.osmcode.org/osmium/latest/osmium-extract.html).
The following example extracts an area roughly around Denver, Colorado.
It takes about 3 seconds to extract the 3.2 MB `denver.osm.pbf` output from
the 239 MB input.

```bash
osmium extract --bbox=-105.0193,39.7663,-104.9687,39.7323 \
-o denver.osm.pbf \
colorado-2023-04-18.osm.pbf
```

Processing the smaller Denver extract with PgOSM Flex takes less than 20 seconds on a
typical laptop, versus 11 minutes for all of Colorado.

```bash
docker exec -it \
pgosm python3 docker/pgosm_flex.py \
--ram=8 \
--region=custom \
--subregion=denver \
--input-file=denver.osm.pbf \
--layerset=everything
```
