- type: summary
- created: Tue Apr 07 2026 02:00:00 GMT+0200 (Central European Summer Time)
- updated: Tue Apr 07 2026 02:00:00 GMT+0200 (Central European Summer Time)
- sources: raw/articles/EPIC-060
- tags: epic product-catalog pdf-extraction ai onboarding
Epic 060 Summary
abstract
Epic B2B-060 specifies a product catalog intelligence system that lets mills onboard their products via PDF datasheet upload, Excel import, or manual entry. It adds AI-powered extraction via Gemini Flash, product matching against the existing catalog, and an admin review workflow.
Overview
The core problem: mills will not type 30 fields into a form to list their products. They already have PDF datasheets. B2BPaper already has 8,919 processed documents and 4,246 catalog products in the Extractor pipeline. This epic connects those dots.
Ticket Breakdown (9 Tickets, ~8 Days)
B2B-060: Extractor to Marketplace Sync Bridge (P0, Backend)
- Management command (sync_extractor) that bulk-imports catalog_products and mills from the Extractor's SQLite DB into PostgreSQL
- Idempotent via extractor_fingerprint field on Product model
- Field mapping covers 14 fields including name, category-to-paper_type mapping, GSM, coating, certifications, fiber type, quality grade
- Re-running skips duplicates automatically
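The idempotency idea above can be sketched as follows. This is a minimal illustration, not the epic's implementation: the fields that feed the fingerprint (mill, name, gsm), the `product` table layout, and the use of an in-memory SQLite database as a stand-in for PostgreSQL are all assumptions made for the example.

```python
import hashlib
import sqlite3


def fingerprint(row):
    """Stable hash over the identifying fields (the field choice is an assumption)."""
    key = "|".join(str(row.get(f, "")).strip().lower() for f in ("mill", "name", "gsm"))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def sync(rows, target):
    """Insert rows whose fingerprint is not stored yet; return the number created."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS product ("
        "extractor_fingerprint TEXT PRIMARY KEY, mill TEXT, name TEXT, gsm INTEGER)"
    )
    created = 0
    for row in rows:
        fp = fingerprint(row)
        exists = target.execute(
            "SELECT 1 FROM product WHERE extractor_fingerprint = ?", (fp,)
        ).fetchone()
        if exists:
            continue  # duplicate from an earlier run: skip it
        target.execute(
            "INSERT INTO product VALUES (?, ?, ?, ?)",
            (fp, row["mill"], row["name"], row["gsm"]),
        )
        created += 1
    target.commit()
    return created
```

Running `sync` twice with the same rows creates records only on the first pass, which is the behavior the ticket describes for re-runs.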
B2B-061: Datasheet Upload Model + API (P0, Backend)
- New DatasheetUpload model with lifecycle: pending, processing, extracted, review, accepted, rejected
- REST endpoints for upload (multipart PDF, max 10MB), list (filterable by status/mill/date), detail, and admin review
- Mill users see only their own uploads; admins see all
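A rough sketch of the lifecycle and the visibility rule, in plain Python rather than Django. The six status values come from the ticket; the allowed transitions, the `can_transition` and `visible_uploads` helpers, and the dict shapes for uploads and users are assumptions for illustration only.

```python
from enum import Enum


class UploadStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    EXTRACTED = "extracted"
    REVIEW = "review"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


# Assumed forward-only transitions; the epic lists the states, not the edges.
ALLOWED = {
    UploadStatus.PENDING: {UploadStatus.PROCESSING},
    UploadStatus.PROCESSING: {UploadStatus.EXTRACTED},
    UploadStatus.EXTRACTED: {UploadStatus.REVIEW},
    UploadStatus.REVIEW: {UploadStatus.ACCEPTED, UploadStatus.REJECTED},
    UploadStatus.ACCEPTED: set(),
    UploadStatus.REJECTED: set(),
}


def can_transition(current, target):
    """True if moving from `current` to `target` is allowed in this sketch."""
    return target in ALLOWED[current]


def visible_uploads(uploads, user):
    """Admins see every upload; mill users see only uploads of their own mill."""
    if user["is_admin"]:
        return list(uploads)
    return [u for u in uploads if u["mill_id"] == user["mill_id"]]
```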
B2B-062: AI Extraction Integration (P0, Backend)
- Celery task triggered on upload creation
- Sends PDF to Extractor pipeline at localhost:8925
- Polls for completion (max 120s, every 3s)
- Stores extracted product specs as JSON on the DatasheetUpload record
- 3 retries on transient errors
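The polling and retry behavior could look roughly like the sketch below. It deliberately leaves out Celery and the HTTP call to the Extractor: `fetch_status` is a hypothetical callable that returns `None` while extraction is still running, and treating `ConnectionError` as the transient error is an assumption. "3 retries" is read here as one initial attempt plus up to three more.

```python
import time


def poll_until_done(fetch_status, sleep=time.sleep, interval=3, timeout=120):
    """Call fetch_status() every `interval` seconds until it returns a result.

    Raises TimeoutError once `timeout` seconds of waiting have accumulated.
    """
    waited = 0
    while True:
        result = fetch_status()
        if result is not None:
            return result
        if waited >= timeout:
            raise TimeoutError("extraction did not finish within %ss" % timeout)
        sleep(interval)
        waited += interval


def with_retries(fn, retries=3, transient=(ConnectionError,)):
    """Run fn(); on a transient error, try again up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except transient:
            if attempt == retries:
                raise  # retries exhausted: surface the last error
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in a task it would default to `time.sleep`.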
B2B-063: Product Matching Engine (P0, Backend)
- Scoring algorithm: exact match (same mill + paper_type + GSM within +/-2 + width within +/-10mm, confidence >= 0.95), close match (same paper_type + GSM within +/-5, confidence 0.6-0.94), or new product (confidence < 0.6)
- Runs automatically after extraction completes
- Results stored as matched_products JSON for admin review
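The tier rules above can be expressed as a small classifier. This sketch covers only the rule-based tiers (exact, close, new) using the tolerances from the ticket; it does not compute the numeric confidence score, since the epic summary does not say how that is derived. The `Spec` dataclass and the choice to treat a missing width as "not an exact match" are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Spec:
    mill: str
    paper_type: str
    gsm: float
    width_mm: Optional[float] = None


def classify(extracted, candidate):
    """Return 'exact', 'close', or 'new' for an extracted spec vs. a catalog candidate."""
    same_type = extracted.paper_type == candidate.paper_type
    gsm_diff = abs(extracted.gsm - candidate.gsm)
    widths_known = extracted.width_mm is not None and candidate.width_mm is not None
    if (
        same_type
        and extracted.mill == candidate.mill
        and gsm_diff <= 2
        and widths_known
        and abs(extracted.width_mm - candidate.width_mm) <= 10
    ):
        return "exact"  # confidence band >= 0.95 in the ticket
    if same_type and gsm_diff <= 5:
        return "close"  # confidence band 0.6-0.94 in the ticket
    return "new"  # confidence band < 0.6 in the ticket
```

For example, a 90 GSM coated sheet from the same mill matched against a 91 GSM, 5 mm wider candidate lands in the exact tier, while the same candidate from a different mill would only be a close match.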
B2B-064: Admin Review Dashboard (P0, Frontend)
- Inbox view at /manage/datasheets showing all uploads with status badges
- Detail view with embedded PDF, editable extracted product table, match results with confidence scores
- Per-product actions: Accept (creates/links product), Edit (inline modify then accept), Skip
- Bulk "Accept All" for high-confidence matches (>= 0.90)
- Reject with required notes field
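As an illustration of the bulk action, the selection step behind "Accept All" might reduce to a threshold filter. The shape of each match entry (a dict with a `confidence` key) is an assumption about the matched_products JSON, which the summary does not spell out.

```python
BULK_ACCEPT_THRESHOLD = 0.90  # threshold taken from the ticket


def bulk_acceptable(matches, threshold=BULK_ACCEPT_THRESHOLD):
    """Return the matches that qualify for the bulk 'Accept All' action."""
    return [m for m in matches if m["confidence"] >= threshold]
```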
B2B-065: Mill Datasheet Upload UI (P1, Frontend)
- Drag-and-drop PDF upload zone
- Real-time processing status feedback
- Upload history filtered to own mill
B2B-066: Clone Product (P1, Full-stack)
- "Clone" button on product detail/list duplicates a product with all specs
- Opens pre-filled edit form for the copy
- Critical UX for mills producing same paper in multiple GSM weights
B2B-067: Excel Template Import (P1, Full-stack)
- Downloadable .xlsx template with sample data and dropdown validations
- Upload, preview (green=valid, red=errors), then confirm import flow
- Template includes reference sheets for valid enum values
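The preview step could be approximated as a per-row validation pass that marks each row green or red. The column names, the validation rules, and the `VALID_PAPER_TYPES` set are hypothetical placeholders; the real enum values live in the template's reference sheets, and reading the .xlsx file itself is out of scope here.

```python
VALID_PAPER_TYPES = {"coated", "uncoated"}  # hypothetical enum values


def validate_row(row):
    """Return a list of error messages for one imported row (empty if valid)."""
    errors = []
    if not str(row.get("name", "")).strip():
        errors.append("name is required")
    if row.get("paper_type") not in VALID_PAPER_TYPES:
        errors.append("paper_type is not a valid value")
    try:
        if float(row.get("gsm")) <= 0:
            errors.append("gsm must be positive")
    except (TypeError, ValueError):
        errors.append("gsm must be a number")
    return errors


def preview(rows):
    """Build the preview: green for rows without errors, red otherwise."""
    result = []
    for number, row in enumerate(rows, start=1):
        errors = validate_row(row)
        result.append(
            {"row": number, "status": "red" if errors else "green", "errors": errors}
        )
    return result
```

Only after the user confirms the preview would the valid rows be imported, matching the upload, preview, confirm flow in the ticket.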
B2B-068: Empty State + Onboarding Flow (P2, Frontend)
- Guided onboarding when a mill has zero products
- Three CTAs: Upload Datasheet, Import from Excel, Add Manually
- Admin dashboard shows "X mills with 0 products" action item
Build Order
| Sprint | Tickets | Duration |
|---|---|---|
| Sprint 1 | B2B-060 (sync bridge) + B2B-066 (clone) | 1 day |
| Sprint 2 | B2B-061 (upload model) + B2B-062 (AI extraction) | 2 days |
| Sprint 3 | B2B-063 (matching) + B2B-064 (admin dashboard) + B2B-065 (mill upload UI) | 3 days |
| Sprint 4 | B2B-067 (Excel import) + B2B-068 (onboarding) | 2 days |
Tech Notes
- Extractor pipeline: Flask + SQLite service at localhost:8925, using Gemini 2.0 Flash via OpenRouter
- Stats: 8,919 docs, 16,791 extracted products, 4,246 catalog products, 440 mills
- Category mapping table provided for Extractor categories to Marketplace paper_type values (13 mappings)
Sources
- raw/articles/EPIC-060 -- full epic specification with acceptance criteria and Playwright test expectations
Related
- wiki/concepts/epic-060-product-catalog-intelligence -- concept page on catalog enrichment
- wiki/entities/paper-pdf-extractor -- the Flask extraction service
- wiki/summaries/frontend-spec-summary -- frontend architecture for the admin review UI
- wiki/concepts/spec-based-matching -- the matching algorithm that underpins product matching
- wiki/entities/morichal-ai -- related data migration from legacy system