😫 The Pain Point
You have 50 contract PDFs and need to find all mentions of “penalty clause” or “termination fee”. Opening each one and using Ctrl+F takes hours.
🚀 Agentic Solution
A Batch PDF Searcher that scans all documents and reports exact locations.
Key Features:
- Multi-PDF Search: Scan entire folders at once.
- Context Extraction: Shows surrounding text for each match.
- Export Results: CSV or Excel report with file, page, and snippet.
⚔️ Phase 1: Commander (Quick Fix)
For quick searching.
Prompt:
“I have a folder `contracts` with PDF files. Write a Python script using pdfplumber to:
- Search: Find all occurrences of keywords ‘penalty’ and ‘termination’.
- Output: For each match, print file name, page number, and surrounding context (50 chars).
- Export: Save results to `search_results.csv`.
Support regex patterns with a `--regex` flag. Handle unreadable PDFs (skip with warning).”
Result: Instant location of all relevant clauses.
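For reference, here is a minimal sketch of the kind of script this prompt might produce. It is an illustration, not the definitive output. The function names (`find_matches`, `build_pattern`, `search_folder`, `main`), the `--keywords` option, and the CSV column names are assumptions. pdfplumber is imported inside `search_folder`, so the matching helpers can be tried without it installed.

```python
import argparse
import csv
import re
import sys
from pathlib import Path


def find_matches(text, pattern, width=50):
    """Yield (matched text, context) pairs with `width` chars on each side."""
    for m in pattern.finditer(text):
        start = max(m.start() - width, 0)
        end = min(m.end() + width, len(text))
        # Collapse line breaks so each snippet fits on one CSV row.
        yield m.group(0), " ".join(text[start:end].split())


def build_pattern(keywords, use_regex=False):
    """Combine keywords into one case-insensitive pattern."""
    parts = keywords if use_regex else [re.escape(k) for k in keywords]
    return re.compile("|".join(f"(?:{p})" for p in parts), re.IGNORECASE)


def search_folder(folder, pattern):
    """Return result rows for every PDF in `folder`, skipping unreadable files."""
    import pdfplumber  # third-party: pip install pdfplumber

    rows = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        try:
            with pdfplumber.open(pdf_path) as pdf:
                for page_no, page in enumerate(pdf.pages, start=1):
                    text = page.extract_text() or ""  # may be empty for scans
                    for match, context in find_matches(text, pattern):
                        rows.append({"file": pdf_path.name, "page": page_no,
                                     "match": match, "context": context})
        except Exception as exc:  # e.g. a corrupt or encrypted PDF
            print(f"Warning: skipping {pdf_path.name}: {exc}", file=sys.stderr)
    return rows


def main(argv=None):
    parser = argparse.ArgumentParser(description="Search PDFs for keywords.")
    parser.add_argument("folder", nargs="?", default="contracts")
    parser.add_argument("--keywords", nargs="+",
                        default=["penalty", "termination"])
    parser.add_argument("--regex", action="store_true",
                        help="treat keywords as regular expressions")
    args = parser.parse_args(argv)

    rows = search_folder(args.folder, build_pattern(args.keywords, args.regex))
    for row in rows:
        print(f"{row['file']} (page {row['page']}): {row['context']}")

    with open("search_results.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file", "page", "match", "context"])
        writer.writeheader()
        writer.writerows(rows)
```

Adding a `main()` call under the usual `if __name__ == "__main__":` guard turns this into a command-line script. The search runs page by page, so a phrase split across a page break will not be found.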
🏗️ Phase 2: Architect (Permanent Tool)
For Legal/Compliance Teams.
Engineering Prompt:
**Role:** Python Tool Developer
**Task:** Create a "PDF Search Tool".
**Requirements:**
1. **GUI:**
* Select folder.
* Keyword input (comma-separated).
* Checkboxes: Case sensitive, Regex mode.
* Results table with file, page, context.
* Export button (CSV/Excel).
* Progress bar.
2. **Logic:**
* Use pdfplumber for accurate text extraction.
* Highlight matches in context.
* Detect scanned PDFs that have no text layer and warn that they need OCR.
3. **Deliverables:**
* `pdf_search.py`
* `run.bat`, `run.sh`
* `requirements.txt`
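The logic behind the GUI controls can be sketched separately from the interface itself. Below is a rough, hedged example of how the keyword field, the two checkboxes, match highlighting, and the scanned-PDF warning might be wired up. All function names here (`parse_keywords`, `compile_search`, `highlight`, `needs_ocr`) are hypothetical, and the bracket markers are just one way to highlight a match. The generated tool may do this differently.

```python
import re


def parse_keywords(raw):
    """Split the comma-separated keyword field, dropping blank entries."""
    return [k.strip() for k in raw.split(",") if k.strip()]


def compile_search(raw_keywords, case_sensitive=False, regex_mode=False):
    """Build one pattern from the keyword field and the two checkboxes."""
    keywords = parse_keywords(raw_keywords)
    if not keywords:
        raise ValueError("Enter at least one keyword.")
    # In plain mode, escape each keyword so '.' or '(' match literally.
    parts = keywords if regex_mode else [re.escape(k) for k in keywords]
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile("|".join(f"(?:{p})" for p in parts), flags)


def highlight(context, pattern, left="[", right="]"):
    """Wrap every match in markers so it stands out in the results table."""
    return pattern.sub(lambda m: f"{left}{m.group(0)}{right}", context)


def needs_ocr(page_text):
    """Heuristic: a page with no extractable text is probably a scanned image."""
    return not (page_text or "").strip()


pattern = compile_search("penalty, termination fee")
print(highlight("A Penalty and a termination fee apply", pattern))
```

One limitation of this sketch: because the keyword field is split on commas, a regex that itself contains a comma (such as `\d{1,3}`) would be broken apart in regex mode.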
🧠 Prompt Decoding
- pdfplumber vs PyPDF2: pdfplumber extracts text page by page and also exposes character positions, so it generally copes better with layout-heavy documents such as multi-column contracts.
🛠️ Instructions
- Install: `pip install pdfplumber`
- Copy Prompt → Run.