📄

Extract Images from Docs

Extract all embedded images from Word, PowerPoint, or PDF documents.

Document ⭐ Beginner ⏱️ 3 minutes

😫 The Pain Point

You received a Word document with 50 embedded images, and you need them as separate files for your website. Copying and pasting them one at a time is tedious, and the pasted copies can lose quality.

🚀 Agentic Solution

An Image Extractor that pulls all embedded media from documents.

Key Features:

  • Multiple Formats: Word (DOCX), PowerPoint (PPTX), PDF.
  • Original Quality: Extracts at embedded resolution.
  • Batch Processing: Process a whole folder of documents in one run.

⚔️ Phase 1: Commander (Quick Fix)

For a quick, one-off extraction from a single document.

Prompt:

“I have a Word document report.docx with embedded images. Write a Python script to:

  1. Extract: All images from the document.
  2. Naming: Save as report_img_001.png, report_img_002.jpg, etc.
  3. Output: Save to extracted_images/ folder.

Print the count of extracted images. Handle documents without images gracefully.”

Result: All images extracted at original quality.
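
A script answering this prompt might look like the minimal sketch below. It relies on the fact that a DOCX file is a ZIP archive with embedded media under word/media/, so it uses only the standard library rather than python-docx. The function name extract_docx_images is hypothetical, and the sketch assumes every file under word/media/ is an image. Numbering follows the sorted archive names, not the order in which images appear in the document.

```python
import zipfile
from pathlib import Path

MEDIA_PREFIX = "word/media/"  # folder where a DOCX package keeps embedded media


def extract_docx_images(docx_path, out_dir="extracted_images"):
    """Copy every file under word/media/ to out_dir and return how many were saved."""
    docx_path = Path(docx_path)
    out_dir = Path(out_dir)
    with zipfile.ZipFile(docx_path) as archive:
        media = sorted(
            name for name in archive.namelist()
            if name.startswith(MEDIA_PREFIX) and not name.endswith("/")
        )
        if not media:
            return 0  # no embedded media: leave the output folder uncreated
        out_dir.mkdir(parents=True, exist_ok=True)
        for index, name in enumerate(media, start=1):
            suffix = Path(name).suffix.lower()  # keep the embedded file format
            target = out_dir / f"{docx_path.stem}_img_{index:03d}{suffix}"
            target.write_bytes(archive.read(name))  # raw bytes, no re-encoding
    return len(media)
```

Calling extract_docx_images("report.docx") writes files such as extracted_images/report_img_001.png and returns the number saved. For a document with no embedded media it returns 0 and does not create the folder.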

🏗️ Phase 2: Architect (Permanent Tool)

Engineering Prompt:

**Role:** Python Tool Developer
**Task:** Create a "Document Image Extractor".

**Requirements:**
1.  **GUI:**
    *   Select document or folder.
    *   Format filter (DOCX, PPTX, PDF).
    *   Preview extracted images.
    *   Naming pattern input.

2.  **Logic:**
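    *   DOCX: Use python-docx to read the embedded image parts (stored under word/media/ in the package).
    *   PPTX: Use python-pptx to collect the picture shapes on each slide.
    *   PDF: Extract embedded image streams with a PDF library (for example pypdf or PyMuPDF).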

3.  **Deliverables:**
    *   `extract_images.py`
    *   `run.bat`, `run.sh`
    *   `requirements.txt`
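
The folder, format-filter, and naming-pattern requirements could be sketched as follows, as one possible starting point rather than the finished tool. The names extract_folder and MEDIA_FOLDERS and the pattern placeholders {stem}, {index}, and {ext} are assumptions made for illustration. Only the ZIP-based formats are handled here, using the standard library. PDF files are skipped because they need a dedicated PDF library, and the GUI is left out entirely.

```python
import zipfile
from pathlib import Path

# ZIP-based formats only; PDF needs a separate library and is skipped in this sketch.
MEDIA_FOLDERS = {".docx": "word/media/", ".pptx": "ppt/media/"}


def extract_folder(folder, out_dir, formats=(".docx", ".pptx"),
                   pattern="{stem}_img_{index:03d}{ext}"):
    """Extract media from each matching document in folder; return {file name: count}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    for doc in sorted(Path(folder).iterdir()):
        kind = doc.suffix.lower()
        prefix = MEDIA_FOLDERS.get(kind)
        if not doc.is_file() or prefix is None or kind not in formats:
            continue  # filtered out by the user or unsupported here
        try:
            with zipfile.ZipFile(doc) as archive:
                media = sorted(
                    name for name in archive.namelist()
                    if name.startswith(prefix) and not name.endswith("/")
                )
                for index, name in enumerate(media, start=1):
                    ext = Path(name).suffix.lower()
                    target = out_dir / pattern.format(stem=doc.stem, index=index, ext=ext)
                    target.write_bytes(archive.read(name))  # raw bytes, no re-encoding
        except zipfile.BadZipFile:
            counts[doc.name] = 0  # corrupt file or not a real Office package
            continue
        counts[doc.name] = len(media)
    return counts
```

With the default pattern, two documents that share a stem (say report.docx and report.pptx) would overwrite each other's output, so a real tool would probably add the format or a subfolder to the pattern.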

🧠 Prompt Decoding

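  • DOCX internals: A DOCX file is a ZIP archive containing XML parts and media files. Embedded images sit under word/media/ inside the archive (PPTX uses ppt/media/), which is why they can be copied out byte for byte.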
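
To see this structure for yourself, a short sketch like the one below lists the media entries inside a package without extracting anything. It assumes the conventional folder names word/media/ for DOCX and ppt/media/ for PPTX. The names list_media and MEDIA_FOLDERS are hypothetical.

```python
import zipfile
from pathlib import Path

# Media folder inside each Office Open XML package type.
MEDIA_FOLDERS = {".docx": "word/media/", ".pptx": "ppt/media/"}


def list_media(package_path):
    """Return (entry name, size in bytes) for each embedded media file."""
    package_path = Path(package_path)
    prefix = MEDIA_FOLDERS.get(package_path.suffix.lower())
    if prefix is None:
        raise ValueError(f"Not a DOCX or PPTX file: {package_path.name}")
    with zipfile.ZipFile(package_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if info.filename.startswith(prefix) and not info.is_dir()
        ]
```

Running list_media("report.docx") on a document with images returns pairs of archive path and byte size, which is a quick way to check what a document contains before extracting.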

🛠️ Instructions

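  1. Install: pip install python-docx python-pptx (add a PDF library such as pypdf if you need PDF support).
  2. Copy the prompt into your AI assistant, then run the script it generates.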
