import os

from data_pipeline.nrw_pdf_downloader.geojson_parser import parse_geojson
from data_pipeline.nrw_pdf_downloader.nrw_pdf_scraper import run_pdf_downloader
from data_pipeline.match_RPlan_BPlan.matching_plans import merge_rp_bp
from data_pipeline.match_RPlan_BPlan.matching_plans import export_merged_bp_rp

Get started: PDF downloader for land parcels of Bebauungspläne

The BP downloader requires a dataset that contains links to the building plans (Bebauungspläne) in PDF format. The source for this dataset is the NRW geoportal. Using the download button there, you should be able to select all areas of NRW, choose the Bebauungspläne information, and download it in GeoPackage format (.gpkg extension).

A GeoPackage file can be loaded into any GIS application and exported to GeoJSON, which is the format that the pipeline functions take as input.

  • parse_geojson: parses a GeoJSON file with download links to building plans. It iterates over all rows and checks whether the URL matches the pattern of an osp-plan.de link without a list format, meaning that the scan URL does not point directly to a PDF; instead, the PDF link is contained somewhere in the HTML of the page. If the URL matches the pattern, the HTML of the page is downloaded and parsed with Beautiful Soup. All links that start with https://www.o-sp.de/download/ are extracted and written to a dataframe.

    • to parse only a sample of the rows, set a sample size defined by sample_n.

  • run_pdf_downloader: goes through a GDF with PDF download links and downloads all the files. Links that return an error are saved in a CSV file called error_links in the defined output folder.

    • to download only a sample of the rows, set a sample size defined by sample_n.
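To make the link-extraction step of parse_geojson more concrete, here is a minimal sketch of pulling download links out of a page's HTML. It is not the project's implementation: the text says the pipeline uses Beautiful Soup, while this sketch swaps in the standard-library HTMLParser so that it runs without extra dependencies. The name extract_download_links is hypothetical, and only the URL prefix is taken from the description above.

```python
from html.parser import HTMLParser

# Prefix named in the description of parse_geojson above.
DOWNLOAD_PREFIX = "https://www.o-sp.de/download/"


class _LinkCollector(HTMLParser):
    """Collects href values of <a> tags that start with DOWNLOAD_PREFIX."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes without a value, so guard against it.
            if name == "href" and value and value.startswith(DOWNLOAD_PREFIX):
                self.links.append(value)


def extract_download_links(html):
    """Return all matching download links found in an HTML string (hypothetical helper)."""
    collector = _LinkCollector()
    collector.feed(html)
    return collector.links
```

The function works on an HTML string, so fetching the page and deciding which URLs need this treatment stay outside the sketch.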

Necessary file path specifications:

INPUT_BP_FILE_PATH = os.path.join("..", "data", "nrw", "bplan", "raw", "links", "NRW_BP.geojson")
OUTPUT_PDF_FOLDER_PATH = os.path.join("..", "data", "nrw", "bplan", "raw", "pdfs")
OUTPUT_CSV_PATH = os.path.join("..", "data", "nrw", "bplan", "raw", "links", "NRW_BP_parsed_links.csv")
OUTPUT_LAND_PARCELS_PATH = os.path.join("..", "data", "nrw", "bplan", "raw", "links", "land_parcels.geojson")
INPUT_REGIONS_FILE_PATH = os.path.join("..", "data", "nrw", "rplan", "raw", "geo", "regions_map.geojson")
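Whether the pipeline functions create missing output folders themselves is not stated here, so a small pre-flight check may save a failed run. The sketch below is only an illustration; check_paths is a hypothetical helper and not part of data_pipeline.

```python
import os


def check_paths(input_files, output_folders):
    """Create missing output folders and return the input files that do not exist.

    Hypothetical helper, not part of data_pipeline.
    """
    for folder in output_folders:
        # exist_ok=True means no error is raised if the folder is already there.
        os.makedirs(folder, exist_ok=True)
    return [path for path in input_files if not os.path.isfile(path)]
```

With the constants above, a call could look like check_paths([INPUT_BP_FILE_PATH, INPUT_REGIONS_FILE_PATH], [OUTPUT_PDF_FOLDER_PATH, os.path.dirname(OUTPUT_CSV_PATH)]); an empty list as the result means both input files were found.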

Now, let’s start the process:

df = parse_geojson(file_path=INPUT_BP_FILE_PATH,
                   sample_n=5,
                   output_path=OUTPUT_CSV_PATH)
100%|██████████| 5/5 [00:00<00:00,  8.54it/s]

Run the downloader and save the PDFs in the output folder:

run_pdf_downloader(input_df=df,
                   output_folder=OUTPUT_PDF_FOLDER_PATH,
                   sample_n=3)
100%|██████████| 3/3 [00:01<00:00,  1.82it/s]
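The behaviour described for run_pdf_downloader (download every link, record the failing ones in an error_links CSV) can be sketched as follows. This is an illustration under stated assumptions, not the project's code: download_all, the injectable fetch argument, the plan_<i>.pdf file names, and the CSV columns are all hypothetical.

```python
import csv
import os
import urllib.request


def download_all(urls, output_folder, fetch=None, timeout=30):
    """Download every URL and log failures to error_links.csv in output_folder.

    Hypothetical sketch; returns the list of (url, error) pairs that failed.
    """
    if fetch is None:
        def fetch(url):
            # Default fetcher: plain HTTP(S) download via the standard library.
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.read()

    os.makedirs(output_folder, exist_ok=True)
    errors = []
    for i, url in enumerate(urls):
        try:
            content = fetch(url)
        except Exception as exc:
            # Any failure is recorded and the loop continues with the next link.
            errors.append((url, repr(exc)))
            continue
        # File naming is an assumption made for this sketch.
        with open(os.path.join(output_folder, f"plan_{i}.pdf"), "wb") as handle:
            handle.write(content)

    error_file = os.path.join(output_folder, "error_links.csv")
    with open(error_file, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["url", "error"])
        writer.writerows(errors)
    return errors
```

Passing fetch explicitly makes the loop testable without network access; leaving it out falls back to urllib.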

Enrich bplan info to create land_parcels.csv

To generate the file land_parcels.csv, we need the columns from the original NRW_BP dataset plus the columns that describe the regional plan matching each parcel. For that, we use the function merge_rp_bp from the module data_pipeline.match_RPlan_BPlan.matching_plans. It takes as input the same INPUT_BP_FILE_PATH used above, together with the file that contains the geodata of the regions (provided by GreenDIA).

land_parcels = merge_rp_bp(path_bp_geo=INPUT_BP_FILE_PATH,
                           path_rp_geo=INPUT_REGIONS_FILE_PATH)

The result is a dataframe that contains all the original columns from the BP dataset and the columns from the regions. The relevant columns in this dataset are:

  • objectid: unique numeric ID of the building plan.

  • geometry: contains the spatial information of the polygons.

  • kommune: name of the municipality.

  • name: name of the building plan.

  • datum: date of the building plan.

  • regional_plan_id: unique numeric ID of the regional plan.

  • regional_plan_name: nominal name of the regional plan.

land_parcels.head()
      objectid  geometry                                           planid                                  levelplan
0        84060  POLYGON ((7.28543 50.82280, 7.28728 50.82179, ...  DE_05382060_Siegburg_BP93/1             infra-local
126     559438  POLYGON ((7.39385 50.90281, 7.39416 50.90240, ...  DE_05382036_02_32                       infra-local
2722   2257588  POLYGON ((7.12896 50.77292, 7.12899 50.77292, ...  DE_05314000_00                          local
3436   2367967  MULTIPOLYGON (((7.23255 50.91855, 7.23242 50.9...  DE_05378028_9aenderungI_Ur              local
3444   2367975  MULTIPOLYGON (((7.19091 50.88535, 7.19112 50.8...  DE_05378028_1aenderungundUrschriftI_Ur  local

      name                                               kommune   gkz       nr
0     Im Klausgarten, Braschosser Straße, Am Kreuztor    Siegburg  05382060  93/1
126   32. Änderung des Bebauungsplanes Nr. 2 „Much-K...  Much      05382036  0
2722  Flächennutzungsplan der Bundesstadt Bonn           Bonn      05314000  00
3436  9. Änderung §34_Urschrift                          Rösrath   05378028  9aenderungI_Ur
3444  1. Änderung und Urschrift §34_Urschrift            Rösrath   05378028  1aenderungundUrschriftI_Ur

...

      shape_Length    shape_Area  regional_plan_id  regional_plan_name      ART            LND
0       868.647801  3.196032e+04              5022  Region Bonn/Rhein-Sieg  Teilabschnitt  5
126     473.229327  4.467916e+03              5022  Region Bonn/Rhein-Sieg  Teilabschnitt  5
2722  69372.039264  1.410146e+08              5022  Region Bonn/Rhein-Sieg  Teilabschnitt  5
3436    739.659941  7.348491e+03              5022  Region Bonn/Rhein-Sieg  Teilabschnitt  5
3444  56630.267941  6.082747e+06              5022  Region Bonn/Rhein-Sieg  Teilabschnitt  5

5 rows × 30 columns
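How merge_rp_bp matches building plans to regions is not shown in this walkthrough. One common way to attach region attributes to polygons is a spatial join, sketched below with geopandas on toy data. Treat it as an assumption about the general technique rather than a description of the function's internals; attach_regions and the toy geometries are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import box


def attach_regions(plans, regions):
    """Left spatial join: every plan keeps its row and gains the columns
    of each region it intersects (hypothetical helper)."""
    # Both layers must share a CRS for the geometric test to be meaningful.
    if plans.crs is not None and regions.crs is not None:
        regions = regions.to_crs(plans.crs)
    joined = gpd.sjoin(plans, regions, how="left", predicate="intersects")
    # sjoin adds a helper column with the matched index of the right frame.
    return joined.drop(columns="index_right", errors="ignore")


# Toy data: one plan inside the region, one far outside it.
plans = gpd.GeoDataFrame(
    {"objectid": [1, 2]},
    geometry=[box(0, 0, 1, 1), box(5, 5, 6, 6)],
    crs="EPSG:4326",
)
regions = gpd.GeoDataFrame(
    {"regional_plan_id": [5022], "regional_plan_name": ["Toy region"]},
    geometry=[box(-1, -1, 2, 2)],
    crs="EPSG:4326",
)
result = attach_regions(plans, regions)
```

With a left join, plans outside every region keep their row with missing region columns, and a plan that intersects several regions appears once per match, so duplicates may need handling afterwards.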

The result can be exported with the function export_merged_bp_rp from the same module (it runs the same steps as merge_rp_bp but additionally requires an output_path parameter), or by calling to_file from geopandas on the returned data. Here we run the export function.

export_merged_bp_rp(output_path=OUTPUT_LAND_PARCELS_PATH,
                    path_bp_geo=INPUT_BP_FILE_PATH,
                    path_rp_geo=INPUT_REGIONS_FILE_PATH)