📄

Text Normalizer

Clean and standardize text data: remove extra spaces, fix capitalization, normalize Unicode.

Document ⭐ Beginner ⏱️ 2 minutes

😫 The Pain Point

Your data has inconsistent text: “ JOHN SMITH ” vs “john smith” vs “John Smith”. Before processing, you need everything standardized.
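A quick Python check makes the problem concrete. This is an illustrative sketch, not part of the original workflow: the three spellings count as three distinct values until each one is trimmed, collapsed, and title-cased. `normalize_name` is a hypothetical helper name.

```python
# The three spellings from the example, as they might sit in a spreadsheet column.
variants = [" JOHN SMITH ", "john smith", "John Smith"]
print(len(set(variants)))  # → 3 distinct values


def normalize_name(text: str) -> str:
    """Hypothetical helper: trim, collapse spaces, then apply Title Case."""
    # str.split() with no arguments drops edge whitespace and splits on any run of it.
    return " ".join(text.split()).title()


print({normalize_name(v) for v in variants})  # → {'John Smith'}
```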

🚀 Agentic Solution

A Text Cleaner that applies one consistent set of formatting rules to every text value.

Key Features:

  • Whitespace Cleanup: Remove extra spaces, trim edges.
  • Case Normalization: UPPER, lower, Title Case.
  • Unicode Normalization: Convert to NFC or NFD so accented characters (e.g. Vietnamese) are encoded consistently.
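Each feature maps onto a few lines of standard-library Python. The sketch below assumes plain `str` input; `clean_whitespace`, `apply_case`, and `normalize_unicode` are hypothetical helper names, not part of any library. It converts to NFC before title-casing, because `str.title()` treats a combining mark as a word boundary and could capitalize letters mid-word in decomposed text.

```python
import re
import unicodedata

# Any run of whitespace: spaces, tabs, newlines, non-breaking spaces.
_WHITESPACE_RUN = re.compile(r"\s+")


def clean_whitespace(text: str) -> str:
    """Whitespace Cleanup: collapse internal runs to one space, then trim the edges."""
    return _WHITESPACE_RUN.sub(" ", text).strip()


def apply_case(text: str, mode: str) -> str:
    """Case Normalization: mode is 'upper', 'lower', or 'title'."""
    modes = {"upper": str.upper, "lower": str.lower, "title": str.title}
    if mode not in modes:
        raise ValueError(f"unknown case mode: {mode!r}")
    return modes[mode](text)


def normalize_unicode(text: str, form: str = "NFC") -> str:
    """Unicode Normalization: form is 'NFC' (composed) or 'NFD' (decomposed)."""
    return unicodedata.normalize(form, text)


raw = "  nguyễn   văn\tan  "
# Normalize to NFC before title-casing: in decomposed text, str.title() sees each
# combining mark as a word break and would capitalize the letter after it.
step = normalize_unicode(clean_whitespace(raw), "NFC")
print(apply_case(step, "title"))  # → Nguyễn Văn An
```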

⚔️ Phase 1: Commander (Quick Fix)

For quick normalization.

Prompt:

“I have an Excel file, data.xlsx, with text columns. Write a Python script to:

  1. Trim: Remove leading/trailing whitespace.
  2. Collapse: Multiple spaces to single space.
  3. Unicode: Normalize to NFC form.
  4. Case: Apply Title Case to the ‘Name’ column.
  5. Output: Save as data_normalized.xlsx.

Print a sample of rows before and after.”

Result: Clean, consistent text data.
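A script along the lines this prompt requests might look like the sketch below. It is one possible answer, not the output the prompt is guaranteed to produce; it assumes pandas and openpyxl are installed, that `data.xlsx` has a header row, and that one column is named `Name`.

```python
import os
import re
import unicodedata

import pandas as pd  # .xlsx read/write also needs the openpyxl package

INPUT_FILE = "data.xlsx"
OUTPUT_FILE = "data_normalized.xlsx"
TITLE_CASE_COLUMNS = {"Name"}  # assumption: the sheet has a column called 'Name'

_SPACES = re.compile(r"\s+")


def normalize_text(value, title_case=False):
    """Trim, collapse spaces, convert to NFC, and optionally title-case one cell."""
    if not isinstance(value, str):
        return value  # numbers, dates, and empty (NaN/None) cells pass through unchanged
    text = _SPACES.sub(" ", value).strip()       # steps 1-2: collapse runs, trim edges
    text = unicodedata.normalize("NFC", text)     # step 3
    return text.title() if title_case else text   # step 4


def normalize_frame(df):
    """Return a copy of a DataFrame with every text column normalized."""
    out = df.copy()
    for col in out.columns:
        # Skip columns that contain no strings at all (e.g. purely numeric ones).
        if out[col].map(lambda v: isinstance(v, str)).any():
            use_title = col in TITLE_CASE_COLUMNS
            out[col] = out[col].map(lambda v: normalize_text(v, use_title))
    return out


def main():
    if not os.path.exists(INPUT_FILE):
        print(f"{INPUT_FILE} not found; place it next to this script.")
        return
    df = pd.read_excel(INPUT_FILE)
    cleaned = normalize_frame(df)
    print("Before:")
    print(df.head())
    print("After:")
    print(cleaned.head())
    cleaned.to_excel(OUTPUT_FILE, index=False)  # step 5
    print(f"Saved {OUTPUT_FILE}")


if __name__ == "__main__":
    main()
```

Collapsing with `\s+` also catches tabs, line breaks, and non-breaking spaces, which often sneak into data pasted into Excel.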

🏗️ Phase 2: Architect (Permanent Tool)

Engineering Prompt:

**Role:** Python Tool Developer
**Task:** Create a "Text Normalizer".

**Requirements:**
1.  **GUI:**
    *   Select Excel file.
    *   Column selector (choose which columns to process).
    *   Rule checkboxes: Trim, Collapse spaces, Case (dropdown).
    *   Unicode form dropdown (NFC, NFD).
    *   Preview changes.

2.  **Logic:**
    *   String manipulation with regex.
    *   `unicodedata.normalize()` for NFC/NFD conversion.
    *   Handle None/NaN values.

3.  **Deliverables:**
    *   `text_normalizer.py`
    *   `run.bat`, `run.sh`
    *   `requirements.txt`
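The Logic requirements (regex cleanup, `unicodedata.normalize()`, None/NaN handling, and a preview) can be sketched as a GUI-independent core that a Tkinter or other front end would call. This is an assumption about how to structure the tool, not the code the prompt will return; `NormalizeRules`, `normalize_value`, and `preview_changes` are hypothetical names whose fields mirror the checkboxes and dropdowns listed in the GUI requirements.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Optional

_WHITESPACE = re.compile(r"\s+")


@dataclass
class NormalizeRules:
    """Hypothetical settings object mirroring the GUI checkboxes and dropdowns."""
    trim: bool = True
    collapse_spaces: bool = True
    case: Optional[str] = None  # None, "upper", "lower", or "title"
    unicode_form: str = "NFC"   # "NFC" or "NFD"


def normalize_value(value, rules: NormalizeRules):
    """Apply the enabled rules to one cell; non-strings are returned unchanged."""
    if not isinstance(value, str):
        return value  # covers None, NaN (pandas' empty-cell marker), numbers, dates
    # Compose first so str.title() sees whole letters rather than base + combining mark.
    text = unicodedata.normalize("NFC", value)
    if rules.collapse_spaces:
        text = _WHITESPACE.sub(" ", text)
    if rules.trim:
        text = text.strip()
    if rules.case == "upper":
        text = text.upper()
    elif rules.case == "lower":
        text = text.lower()
    elif rules.case == "title":
        text = text.title()
    return unicodedata.normalize(rules.unicode_form, text)


def preview_changes(values, rules: NormalizeRules, limit: int = 5):
    """Return up to `limit` (before, after) pairs for text cells the rules would change."""
    pairs = []
    for value in values:
        if not isinstance(value, str):
            continue  # skip None/NaN: NaN != NaN would otherwise look like a change
        after = normalize_value(value, rules)
        if after != value:
            pairs.append((value, after))
            if len(pairs) >= limit:
                break
    return pairs


rules = NormalizeRules(case="title")
print(preview_changes(["  JOHN   SMITH ", "Jane Doe", None, float("nan")], rules))
# → [('  JOHN   SMITH ', 'John Smith')]
```

Keeping the rules in a plain dataclass means the GUI only has to build a `NormalizeRules` from its widgets, and the same logic can be tested without opening a window.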

🧠 Prompt Decoding

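  • Unicode Normalization: Vietnamese letters with diacritics (e.g. “ệ”) can be stored either as a single precomposed code point (NFC) or as a base letter followed by combining marks (NFD). Both look identical on screen but compare as different strings until they are normalized to the same form. NFC is the form recommended for the web.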
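The difference is easy to see with Python's standard `unicodedata` module. In this sketch the word “Việt” is written once with the precomposed letter ệ (U+1EC7) and once as `e` followed by combining dot below (U+0323) and combining circumflex (U+0302), which is its NFD form:

```python
import unicodedata

composed = "Vi\u1ec7t"           # precomposed ệ: 4 code points
decomposed = "Vie\u0323\u0302t"  # e + combining dot below + combining circumflex: 6 code points

print(composed == decomposed)          # → False: same glyphs, different code points
print(len(composed), len(decomposed))  # → 4 6

# Normalizing to a common form makes the two spellings compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # → True
print(unicodedata.normalize("NFD", composed) == decomposed)  # → True
```

In practice this suggests normalizing both the data and any lookup keys to the same form before matching or deduplicating.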

🛠️ Instructions

  1. Copy Prompt → Run.
