
Contents:

  • Getting Started
    • Installation
    • First Steps
  • Notebooks
    • 0. The data exploration
      • Exploring the Land Sealing Application Dataset
        • Getting Started
        • Imports
        • Introducing the Dataset Tables
        • Missing Data Analysis at the Land Parcel Level
        • Flooding Risk Analysis at the Land Parcel Level
        • Extract content from regional plans
    • 1. The execute pipeline
      • Execution Pipeline
        • Step 1: Scrape building plans from NRW geoportal
        • Step 2: Extract text from PDFs
        • Step 3: Enrich extracted building plan texts with information about corresponding land parcels
        • Step 4: Perform an exact keyword search in extracted texts
        • Step 5: Perform a fuzzy keyword search in extracted texts
        • Step 6: Extract content from regional plans
        • Step 7: Perform an exact keyword search in extracted texts of regional plans
        • Conclusion
    • 2. The sub-processes of the pipeline
      • Downloading and processing the data
        • Get started: PDF downloader for land parcels of Bebauungspläne
        • Illustration of usage of pdf_scraper.py
        • How to: create document_texts table
        • Check results
        • Write to csv and json
      • Extracting the keywords
        • Exact keyword search for paragraphs from BauNVO & BauGB
        • Apply function
        • Transform to Boolean
        • Write results to csv
        • Contextual fuzzy keyword search
        • Write results to csv
        • Contextual fuzzy keyword search for flooding-related keywords
        • Write results to csv and json
        • Agentic knowledge extractor
      • Parsing the regional plans
        • Regional plans
    • 3. Beyond the pipeline
      • Create keyword table
        • What keyword are you interested in? Choose your keyword here:
        • All other variables may remain the same, but you can also change them as you wish.
        • Some explanation:
        • Here you can perform the search (no changes needed, just run this cell):
        • Check out the results
        • Save to other keywords
      • Keyword Negation
        • Step 1: Generate content
        • Step 2: Exact keyword search
        • Step 3: Negate keyword search
        • Step 4: Compare the results
        • Step 5: Fuzzy Keyword Search
  • Shiny app
    • Usage
      • Keyword Exploration
      • Structural Level of Use
      • Regional Plans
      • PDF Segmentation
      • Keyword Search
  • API
    • Data Pipeline
      • Download NRW building plan PDFs
        • parse_geojson()
        • run_pdf_downloader()
        • merge_rp_bp()
        • export_merged_bp_rp()
      • Extract Text from PDFs
        • pdf_parser_from_folder()
        • pdf_parser_from_path()
      • Parsing the regional plans
        • RPlanContentExtractor
        • parse_rplan_directory()
        • parse_result_df()
    • Feature Extraction
      • Document Categorization
        • run_bp_keyword_detector()
      • Textual Feature Extraction
        • enrich_extracts_with_metadata()
        • Fuzzy Keyword Search
        • Exact Keyword Search
        • Agent
        • Regional Plan Keyword Search
DSSGx Munich


© Copyright 2023, DSSGx Munich Fellows. Revision d8b6b38c.
