import os
from data_pipeline.nrw_pdf_downloader.geojson_parser import parse_geojson
from data_pipeline.nrw_pdf_downloader.nrw_pdf_scraper import run_pdf_downloader
from data_pipeline.match_RPlan_BPlan.matching_plans import merge_rp_bp
from data_pipeline.match_RPlan_BPlan.matching_plans import export_merged_bp_rp
Get started: PDF downloader for land parcels of Bebauungspläne
The first step in running the BP downloader is to obtain a database that contains links to the building plans in PDF format. The input used here is the information provided in the NRW geoportal. Using the download button there, you should be able to select all areas of NRW, choose to download the information for Bebauungspläne, and get it in GeoPackage format (.gpkg extension).
A GeoPackage file can be loaded into any GIS application and exported to GeoJSON. This is the format that the functions below take as input.
parse_geojson: parses a GeoJSON file with download links to the building plans. It iterates over all rows and checks whether the URL matches the pattern of an osp-plan.de link without a list format, meaning that the scan URL does not point directly to a PDF; instead, the PDF is linked somewhere in the HTML of the page. If the URL matches the pattern, the HTML of the page is downloaded and parsed with Beautiful Soup. All links that start with https://www.o-sp.de/download/ are extracted and written to a dataframe. To parse only a sample of the rows, set the sample size with sample_n.
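The link extraction step described above can be illustrated with a minimal sketch. It is not the code of parse_geojson: it swaps Beautiful Soup for the standard library's html.parser so that it runs without extra dependencies, and the names DownloadLinkParser and extract_download_links are hypothetical. Only the download prefix comes from the description above.

```python
from html.parser import HTMLParser

# Prefix taken from the description of parse_geojson.
DOWNLOAD_PREFIX = "https://www.o-sp.de/download/"


class DownloadLinkParser(HTMLParser):
    """Collects href values of <a> tags that start with DOWNLOAD_PREFIX."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes without a value, so guard against it.
            if name == "href" and value and value.startswith(DOWNLOAD_PREFIX):
                self.links.append(value)


def extract_download_links(html):
    """Return all matching download links found in an HTML string, in page order."""
    parser = DownloadLinkParser()
    parser.feed(html)
    return parser.links
```

In the real pipeline the HTML would come from the plan page that the scan URL points to; here the function only takes the HTML text, so it can be tried on any saved page.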
run_pdf_downloader: goes through a GeoDataFrame (GDF) of PDF download links and downloads all the files. Links that return an error are saved in a CSV called error_links in the defined output folder. To process only a sample of the rows, set the sample size with sample_n.
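The download loop with error collection can be sketched as follows. This is an illustration under assumptions, not the implementation of run_pdf_downloader: the function names, the plan_<i>.pdf naming scheme, and the two columns of error_links.csv are all hypothetical. The fetch function is passed in as a parameter so the loop can be tried without network access.

```python
import csv
import os
import urllib.request


def fetch_bytes(url, timeout=30):
    """Download a URL and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(links, output_folder, fetch=fetch_bytes):
    """Save each link into output_folder and return the (url, error) failures."""
    os.makedirs(output_folder, exist_ok=True)
    errors = []
    for i, url in enumerate(links):
        # Hypothetical naming scheme: position in the list plus a .pdf suffix.
        target = os.path.join(output_folder, f"plan_{i}.pdf")
        try:
            data = fetch(url)
        except Exception as exc:  # network errors, HTTP errors, invalid URLs
            errors.append((url, str(exc)))
            continue
        with open(target, "wb") as handle:
            handle.write(data)
    # Record the failed links next to the downloads (column names are assumed).
    error_path = os.path.join(output_folder, "error_links.csv")
    with open(error_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["url", "error"])
        writer.writerows(errors)
    return errors
```

Catching the exception per link means that one broken URL does not stop the remaining downloads, which matches the behaviour described for error_links.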
Necessary file path specifications:
INPUT_BP_FILE_PATH = os.path.join("..", "data", "nrw", "bplan", "raw", "links", "NRW_BP.geojson")
OUTPUT_PDF_FOLDER_PATH = os.path.join("..", "data", "nrw", "bplan", "raw", "pdfs")
OUTPUT_CSV_PATH = os.path.join("..", "data", "nrw", "bplan", "raw", "links", "NRW_BP_parsed_links.csv")
OUTPUT_LAND_PARCELS_PATH = os.path.join("..", "data", "nrw", "bplan", "raw", "links", "land_parcels.geojson")
INPUT_REGIONS_FILE_PATH = os.path.join("..", "data", "nrw", "rplan", "raw", "geo", "regions_map.geojson")
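Before running the pipeline, it can help to confirm that the input files exist and that the output folders are in place. The sketch below assumes that the pipeline functions do not create missing folders themselves, which has not been verified here. prepare_paths is a hypothetical helper and not part of data_pipeline.

```python
import os


def prepare_paths(input_paths, output_folders):
    """Create the output folders and return the input files that are missing."""
    missing = [path for path in input_paths if not os.path.isfile(path)]
    for folder in output_folders:
        # exist_ok=True makes this safe to call repeatedly.
        os.makedirs(folder, exist_ok=True)
    return missing
```

With the paths defined above, it could be called with [INPUT_BP_FILE_PATH, INPUT_REGIONS_FILE_PATH] as input_paths and [OUTPUT_PDF_FOLDER_PATH, os.path.dirname(OUTPUT_CSV_PATH)] as output_folders. An empty return value means both input files were found.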
Now, let’s start the process:
df = parse_geojson(file_path=INPUT_BP_FILE_PATH,
                   sample_n=5,
                   output_path=OUTPUT_CSV_PATH)
100%|██████████| 5/5 [00:00<00:00, 8.54it/s]
Run the downloader and save the pdfs in output folder:
run_pdf_downloader(input_df=df,
                   output_folder=OUTPUT_PDF_FOLDER_PATH,
                   sample_n=3)
100%|██████████| 3/3 [00:01<00:00, 1.82it/s]
Enrich bplan info to create land_parcels.csv
To generate the file land_parcels.csv, we need the columns from the
original NRW_BP file, plus the columns that refer to the regional plan
matching each parcel. For that, we use the function merge_rp_bp from
the module match_RPlan_BPlan.matching_plans. It takes as input the
same INPUT_BP_FILE_PATH we were working with, together with the file
that contains the geodata of the regions (provided by GreenDIA).
land_parcels = merge_rp_bp(path_bp_geo=INPUT_BP_FILE_PATH,
                           path_rp_geo=INPUT_REGIONS_FILE_PATH)
The result is a dataframe that contains all the original columns from the BP dataset and the columns from the regions. The relevant columns in this dataset are:
objectid: unique numeric ID of the building plan.
geometry: contains the spatial information of the polygons.
kommune: name of the municipality.
name: name of the building plan.
datum: date of the building plan.
regional_plan_id: unique numeric ID of the regional plan.
regional_plan_name: nominal name of the regional plan.
land_parcels.head()
| | objectid | geometry | planid | levelplan | name | kommune | gkz | nr | besch | aend | ... | aendnr | begruendurl | umweltberurl | erklaerungurl | shape_Length | shape_Area | regional_plan_id | regional_plan_name | ART | LND |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 84060 | POLYGON ((7.28543 50.82280, 7.28728 50.82179, ... | DE_05382060_Siegburg_BP93/1 | infra-local | Im Klausgarten, Braschosser Straße, Am Kreuztor | Siegburg | 05382060 | 93/1 | None | None | ... | None | None | None | None | 868.647801 | 3.196032e+04 | 5022 | Region Bonn/Rhein-Sieg | Teilabschnitt | 5 |
| 126 | 559438 | POLYGON ((7.39385 50.90281, 7.39416 50.90240, ... | DE_05382036_02_32 | infra-local | 32. Änderung des Bebauungsplanes Nr. 2 „Much-K... | Much | 05382036 | 0 | None | 32. Änderung | ... | 32 | https://www.much.de/zukunft/bauleitplanungen | https://www.much.de/zukunft/bauleitplanungen | None | 473.229327 | 4.467916e+03 | 5022 | Region Bonn/Rhein-Sieg | Teilabschnitt | 5 |
| 2722 | 2257588 | POLYGON ((7.12896 50.77292, 7.12899 50.77292, ... | DE_05314000_00 | local | Flächennutzungsplan der Bundesstadt Bonn | Bonn | 05314000 | 00 | | | ... | None | None | None | None | 69372.039264 | 1.410146e+08 | 5022 | Region Bonn/Rhein-Sieg | Teilabschnitt | 5 |
| 3436 | 2367967 | MULTIPOLYGON (((7.23255 50.91855, 7.23242 50.9... | DE_05378028_9aenderungI_Ur | local | 9. Änderung §34_Urschrift | Rösrath | 05378028 | 9aenderungI_Ur | Breide und Durbusch | Urschrift | ... | None | http://www.roesrath.de/34-9.-aenderung-breide-... | 739.659941 | 7.348491e+03 | 5022 | Region Bonn/Rhein-Sieg | Teilabschnitt | 5 | ||
| 3444 | 2367975 | MULTIPOLYGON (((7.19091 50.88535, 7.19112 50.8... | DE_05378028_1aenderungundUrschriftI_Ur | local | 1. Änderung und Urschrift §34_Urschrift | Rösrath | 05378028 | 1aenderungundUrschriftI_Ur | Urschrift | ... | None | http://www.roesrath.de/34-urfassung-und-1.-aen... | 56630.267941 | 6.082747e+06 | 5022 | Region Bonn/Rhein-Sieg | Teilabschnitt | 5 |
5 rows × 30 columns
The file can be exported with the function export_merged_bp_rp() from the same module (it works like merge_rp_bp, but takes an additional output_path parameter), or with the to_file method from geopandas. Here we run the export function.
export_merged_bp_rp(output_path=OUTPUT_LAND_PARCELS_PATH,
                    path_bp_geo=INPUT_BP_FILE_PATH,
                    path_rp_geo=INPUT_REGIONS_FILE_PATH)
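After the export, a quick check can confirm that the file carries the relevant columns listed earlier. The sketch below is an assumption-laden illustration: it expects the export to be a GeoJSON FeatureCollection in which every feature has the same property keys, and missing_columns is a hypothetical helper. The geometry column is left out of the check because GeoJSON stores it outside the properties.

```python
import json

# Relevant non-geometry columns named in the column list above.
REQUIRED_COLUMNS = [
    "objectid",
    "kommune",
    "name",
    "datum",
    "regional_plan_id",
    "regional_plan_name",
]


def missing_columns(geojson_path, required=REQUIRED_COLUMNS):
    """Return the required property names absent from the first feature."""
    with open(geojson_path, encoding="utf-8") as handle:
        collection = json.load(handle)
    features = collection.get("features", [])
    if not features:
        # Nothing was exported, so every required column counts as missing.
        return list(required)
    present = set(features[0].get("properties") or {})
    return [column for column in required if column not in present]
```

Calling missing_columns(OUTPUT_LAND_PARCELS_PATH) after the export should return an empty list if the regional plan columns were merged in as described.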