B.L.A.S.T.

Blueprint. Link. Architect. Stylize. Trigger.

LIVE

Problem

Teams needed high-quality OCR across PDF, PPTX, and images, but existing flows were brittle and inconsistent.

B.L.A.S.T. focuses on deterministic extraction instead of ad-hoc scripts: one pipeline that can ingest multi-page documents, recover from OCR failures, and ship standardized artifacts for downstream workflows.

Architecture

Key Decision

Structured the system around the A.N.T. model (Architect, Navigator, Tool) to keep routing, validation, and extraction logic separated and testable.
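One way to picture that separation is the minimal sketch below. It is not the repository's actual code. The mapping of layers to duties (Architect holds validation rules, Navigator routes, Tool extracts) is an assumption, and every name in it (`SUPPORTED`, `classify`, `navigate`, the `extract_*` stubs) is hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict

# Architect layer (assumed role): declares which inputs are valid.
SUPPORTED: Dict[str, str] = {
    ".pdf": "pdf",
    ".pptx": "pptx",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
}


def classify(path: Path) -> str:
    """Validate the file extension and return its source type."""
    kind = SUPPORTED.get(path.suffix.lower())
    if kind is None:
        raise ValueError(f"unsupported input type: {path.suffix!r}")
    return kind


# Tool layer: one extractor per source type. These are stubs standing in
# for real OCR calls so the sketch stays runnable.
def extract_pdf(path: Path) -> str:
    return f"[pdf text from {path.name}]"


def extract_pptx(path: Path) -> str:
    return f"[pptx text from {path.name}]"


def extract_image(path: Path) -> str:
    return f"[image text from {path.name}]"


TOOLS: Dict[str, Callable[[Path], str]] = {
    "pdf": extract_pdf,
    "pptx": extract_pptx,
    "image": extract_image,
}


# Navigator layer: only routes; it holds no validation or OCR logic.
def navigate(path: Path) -> str:
    return TOOLS[classify(path)](path)
```

Because each layer is a plain function or table, validation and routing can be unit-tested without running any OCR.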

Input document (PDF/PPTX/Image)
  └─→ Navigator (blast_ocr/main.py)
  └─→ Preprocess + route by source type
  └─→ Robust OCR extractor
  └─→ Self-healing retries/backoff
  └─→ Markdown + DOCX output
  └─→ Job metrics in SQLite
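The retry and metrics stages of this flow could look roughly like the sketch below. It assumes exponential backoff and a single `job_metrics` table. The function names, the table schema, and the default attempt count and delay are illustrative and are not taken from the repository.

```python
import sqlite3
import time
from typing import Callable, Tuple


def run_with_retries(
    extract: Callable[[str], str],
    source: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, int]:
    """Call extract(source), retrying with exponential backoff.

    Returns the extracted text and the number of attempts used.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return extract(source), attempt
        except Exception as exc:  # a real pipeline would catch narrower errors
            last_error = exc
            if attempt < max_attempts:
                # Delay doubles each time: base, 2*base, 4*base, ...
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"OCR failed after {max_attempts} attempts") from last_error


def record_job(
    conn: sqlite3.Connection, source: str, attempts: int, ok: bool, seconds: float
) -> None:
    """Append one row of job metrics (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_metrics "
        "(source TEXT, attempts INTEGER, ok INTEGER, seconds REAL)"
    )
    conn.execute(
        "INSERT INTO job_metrics VALUES (?, ?, ?, ?)",
        (source, attempts, int(ok), seconds),
    )
    conn.commit()
```

Passing `sleep` as a parameter keeps the backoff policy testable without real waiting, and recording the attempt count per job makes flaky inputs visible in the metrics.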

Impact / Learnings

The project demonstrates that OCR reliability comes from architecture discipline: strict module boundaries, retry policies, and explicit cleanup matter more than adding more heuristics.
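As an illustration of explicit cleanup, the sketch below gives each job a scratch directory that is removed even when extraction fails. It assumes the pipeline writes intermediate files, such as rendered page images, to disk. The `job_workspace` helper and its prefix are hypothetical.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def job_workspace(prefix: str = "blast_job_") -> Iterator[Path]:
    """Yield a temporary directory and always delete it afterwards."""
    workdir = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield workdir
    finally:
        # Runs on success and on exceptions, so failed jobs leave no debris.
        shutil.rmtree(workdir, ignore_errors=True)
```

A job would wrap its work in `with job_workspace() as workdir:` and write every intermediate file under `workdir`.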

Recent repository work also emphasizes deployment resilience, with multiple cloud-build and dependency compatibility fixes to keep production installs stable.