
API

This is the API documentation for the Land Sealing Dataset creation.

  • Data Pipeline
    • Download NRW building plan PDFs
      • parse_geojson()
      • run_pdf_downloader()
      • merge_rp_bp()
      • export_merged_bp_rp()
    • Extract Text from PDFs
      • pdf_parser_from_folder()
      • pdf_parser_from_path()
    • Parsing the regional plans
      • RPlanContentExtractor
        • RPlanContentExtractor.extract_chapter_names()
        • RPlanContentExtractor.find_chapter_name_for_indices()
        • RPlanContentExtractor.parse_into_sections()
        • RPlanContentExtractor.parse_rplan_from_textfile()
        • RPlanContentExtractor.preprocess_rplan_content()
        • RPlanContentExtractor.read_text()
      • parse_rplan_directory()
      • parse_result_df()
  • Feature Extraction
    • Document Categorization
      • run_bp_keyword_detector()
    • Textual Feature Extraction
      • enrich_extracts_with_metadata()
      • Fuzzy Keyword Search
        • find_best_matches()
        • search_df_for_best_matches()
        • search_best_matches_dict()
        • search_df_for_best_matches_keyword_dict()
      • Exact Keyword Search
        • search_text_for_keywords()
        • search_df_for_keywords()
      • Agent
        • extract_knowledge_from_df()
      • Regional Plan Keyword Search
        • rplan_exact_keyword_search()
        • rplan_fuzzy_keyword_search()
        • negate_keyword_search()
        • plot_keyword_search_results()

© Copyright 2023, DSSGx Munich Fellows. Revision d8b6b38c.
