Automated PDF parsing and assembly with a Streamlit UI. Upload client PDFs, auto-extract fields, review/edit, and generate a complete Bradley Abstract with a filled cover page plus merged documents.
Use the provided script:
bash setup_and_run.sh
This creates/activates a virtualenv (prefers .venv) and installs dependencies from requirements.txt.
System OCR dependencies (for scanned PDFs) are listed in packages.txt and should already be installed in this devcontainer. If running elsewhere, install them via your OS package manager.
./.venv/bin/python -m streamlit run streamlit_app.py --server.headless true --server.port 8501
Then open the URL shown in the terminal (typically http://localhost:8501).
output/.

- src/parser.py: PDF text extraction (PyPDF2). Falls back to OCR (pytesseract + pdf2image) when text quality is low.
- src/field_extractor.py: Regex-based field extraction.
- src/cover_page_generator.py: Generates the Bradley Abstract cover using PyMuPDF (fitz) and ReportLab.
  - BradleyAbstractCoverPage.generate_cover_page(data, output_path) is used by the UI.
- src/pdf_assembler.py: Merges cover + (optional) form + original docs into a single PDF via PyPDF2.

templates/bradley_abstract_cover.pdf.

A local MCP server exposing Git tools is included for MCP-capable clients.
- tools/git-mcp-server/
- git_status, git_log, git_diff, git_commit, git_push

- tesseract-ocr and poppler-utils are installed on the system.
- bash setup_and_run.sh.
- templates/bradley_abstract_cover.pdf exists.
- server.enableXsrfProtection in Streamlit config (not recommended unless you know the implications).

This repository’s code is provided as-is by the owner. See repository settings for license details.
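
The environment checks mentioned above (OCR tools installed, cover template present) can be scripted. The following is a minimal sketch, not part of the repository: `check_environment` is a hypothetical helper, and it assumes the system packages expose the usual `tesseract` and `pdftoppm` binaries on PATH (pdf2image relies on `pdftoppm` from poppler-utils).

```python
import shutil
from pathlib import Path


def check_environment(repo_root=".", which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed.

    `which` is injectable so the check can be exercised without the real binaries.
    """
    problems = []

    # Binary name -> system package that is assumed to provide it.
    required = (("tesseract", "tesseract-ocr"), ("pdftoppm", "poppler-utils"))
    for binary, package in required:
        if which(binary) is None:
            problems.append(f"{binary} not found on PATH (install {package})")

    # The cover generator needs the template PDF named in this README.
    template = Path(repo_root) / "templates" / "bradley_abstract_cover.pdf"
    if not template.is_file():
        problems.append(f"missing cover template: {template}")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run from the repository root, it prints one warning per failed check and nothing when everything is in place.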
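
The OCR fallback and regex extraction described for src/parser.py and src/field_extractor.py can be illustrated as follows. This is a sketch under stated assumptions, not the repository's actual code: the quality heuristic, the thresholds, the function names, and the field labels ("Order No", "County") are all hypothetical. The two extractors are passed in as callables so the example runs without PyPDF2 or Tesseract installed.

```python
import re


def text_quality(text):
    """Fraction of non-whitespace characters that are alphanumeric (0.0 for empty text)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isalnum() for c in chars) / len(chars)


def extract_text(pdf_path, primary, ocr, min_chars=50, min_quality=0.6):
    """Use the primary extractor; fall back to OCR when the text looks too short or too noisy.

    `primary` and `ocr` are callables taking a path and returning a string.
    The thresholds are illustrative assumptions, not values from the project.
    """
    text = primary(pdf_path)
    if len(text.strip()) >= min_chars and text_quality(text) >= min_quality:
        return text
    return ocr(pdf_path)


# Hypothetical field labels; the real patterns live in src/field_extractor.py.
FIELD_PATTERNS = {
    "order_number": re.compile(r"Order\s+No\.?\s*:\s*(\S+)", re.IGNORECASE),
    "county": re.compile(r"County\s*:\s*([^\n]+)", re.IGNORECASE),
}


def extract_fields(text):
    """Return a dict of the fields whose pattern matched, with surrounding whitespace stripped."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1).strip()
    return fields
```

Fields that do not match are simply left out of the result, which lets a review step show them as blank for manual entry.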