type: summary
created: Tue Apr 07 2026 02:00:00 GMT+0200 (Central European Summer Time)
updated: Tue Apr 07 2026 02:00:00 GMT+0200 (Central European Summer Time)
sources: raw/articles/EPIC-060
tags: epic product-catalog pdf-extraction ai onboarding

Epic 060 Summary

abstract

Epic B2B-060 specifies a product catalog intelligence system that lets mills onboard their products via PDF datasheet upload, Excel import, or manual entry. It adds AI-powered extraction via Gemini Flash, product matching against the existing catalog, and an admin review workflow.

Overview

The core problem: mills will not type 30 fields into a form to list their products, but they already have PDF datasheets. B2BPaper already has 8,919 processed documents and 4,246 catalog products in the Extractor pipeline. This epic connects the two by feeding Extractor output into the marketplace catalog.

Ticket Breakdown (9 Tickets, ~8 Days)

B2B-060: Extractor to Marketplace Sync Bridge (P0, Backend)

Management command (sync_extractor) that bulk-imports catalog_products and mills from the Extractor's SQLite DB into PostgreSQL
Idempotent via extractor_fingerprint field on Product model
Field mapping covers 14 fields including name, category-to-paper_type mapping, GSM, coating, certifications, fiber type, quality grade
Re-running skips duplicates automatically
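
The idempotency rule above can be sketched as follows. This is a minimal illustration, not the actual sync_extractor command: the fingerprint recipe (a SHA-256 over mill, name, and GSM), the row shape, and the in-memory dict standing in for the Product table are all assumptions, and the sample rows are made up.

```python
import hashlib

# Hypothetical choice of identifying fields; the epic does not say how
# extractor_fingerprint is computed.
FINGERPRINT_FIELDS = ("mill", "name", "gsm")


def fingerprint(row: dict) -> str:
    """Return a stable hash over the identifying fields of an Extractor row."""
    key = "|".join(str(row.get(f, "")).strip().lower() for f in FINGERPRINT_FIELDS)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def sync(rows: list, products: dict) -> tuple:
    """Insert rows whose fingerprint is new; return (created, skipped)."""
    created = skipped = 0
    for row in rows:
        fp = fingerprint(row)
        if fp in products:
            skipped += 1  # already imported on an earlier run
            continue
        products[fp] = {**row, "extractor_fingerprint": fp}
        created += 1
    return created, skipped


# Illustrative rows, not real catalog data.
rows = [
    {"mill": "Example Mill", "name": "Kraftliner", "gsm": 120},
    {"mill": "Example Mill", "name": "Kraftliner", "gsm": 140},
]
store = {}
first = sync(rows, store)   # (2, 0): both rows are new
second = sync(rows, store)  # (0, 2): re-running skips both
```

In a real Django command the dict lookup would presumably become a uniqueness check on the extractor_fingerprint column, but the skip-if-seen logic stays the same.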

B2B-061: Datasheet Upload Model + API (P0, Backend)

New DatasheetUpload model with lifecycle: pending, processing, extracted, review, accepted, rejected
REST endpoints for upload (multipart PDF, max 10MB), list (filterable by status/mill/date), detail, and admin review
Mill users see only their own uploads; admins see all
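
A rough sketch of the lifecycle and the visibility rule, in plain Python rather than Django. The six status names come from the ticket; the transition graph (linear up to review, then an admin decision) and the user/upload dict shapes are assumptions made for illustration.

```python
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    EXTRACTED = "extracted"
    REVIEW = "review"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


# Assumed transition graph: linear up to review, then an admin decision.
ALLOWED = {
    Status.PENDING: {Status.PROCESSING},
    Status.PROCESSING: {Status.EXTRACTED},
    Status.EXTRACTED: {Status.REVIEW},
    Status.REVIEW: {Status.ACCEPTED, Status.REJECTED},
    Status.ACCEPTED: set(),
    Status.REJECTED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


def visible_uploads(uploads: list, user: dict) -> list:
    """Admins see every upload; mill users see only their own mill's uploads."""
    if user.get("is_admin"):
        return list(uploads)
    return [u for u in uploads if u["mill_id"] == user["mill_id"]]
```

Keeping the allowed transitions in one table makes it easy to add a failure path later (for example processing to rejected) if the full spec calls for one.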

B2B-062: AI Extraction Integration (P0, Backend)

Celery task triggered on upload creation
Sends PDF to Extractor pipeline at localhost:8925
Polls for completion (max 120s, every 3s)
Stores extracted product specs as JSON on the DatasheetUpload record
3 retries on transient errors
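
The polling rule (every 3s, give up after 120s) might look roughly like this. It is a sketch under assumptions: check_status is a hypothetical callable standing in for the HTTP call to the Extractor, and the status strings "done" and "failed" are invented, since the summary does not describe the Extractor's API. The clock and sleep functions are injectable so the loop can be exercised without waiting.

```python
import time


def poll_until_done(check_status, timeout_s=120.0, interval_s=3.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call check_status() every interval_s seconds until it reports a
    terminal state, or raise TimeoutError once timeout_s would be exceeded."""
    deadline = clock() + timeout_s
    while True:
        status = check_status()
        if status in ("done", "failed"):  # hypothetical terminal states
            return status
        if clock() + interval_s > deadline:
            raise TimeoutError(f"extraction not finished after {timeout_s:.0f}s")
        sleep(interval_s)


# Usage with a fake clock so the example runs instantly.
now = [0.0]
responses = iter(["queued", "running", "done"])
result = poll_until_done(
    lambda: next(responses),
    sleep=lambda s: now.__setitem__(0, now[0] + s),
    clock=lambda: now[0],
)
```

With the defaults, a job that never finishes is checked 41 times (at 0s, 3s, ..., 120s) before the timeout is raised. The 3 retries on transient errors would sit one level up, around the whole task.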

B2B-063: Product Matching Engine (P0, Backend)

Scoring algorithm: exact match (same mill + paper_type + GSM within +/-2 + width within +/-10mm, confidence >= 0.95), close match (same paper_type + GSM within +/-5, confidence 0.6-0.94), or new product (confidence < 0.6)
Runs automatically after extraction completes
Results stored as matched_products JSON for admin review
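
The tier rules above can be sketched as a pair of small functions. This is an illustration only: the Spec dataclass and its field names are hypothetical, the summary does not say how the confidence score itself is computed, and scores between 0.94 and 0.95 are assumed here to fall into the close-match tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hypothetical minimal product spec used for matching."""
    mill: str
    paper_type: str
    gsm: float
    width_mm: float


def match_tier(extracted: Spec, existing: Spec) -> str:
    """Classify a candidate pair by the attribute rules from the ticket."""
    same_type = extracted.paper_type == existing.paper_type
    gsm_diff = abs(extracted.gsm - existing.gsm)
    width_diff = abs(extracted.width_mm - existing.width_mm)
    if same_type and extracted.mill == existing.mill and gsm_diff <= 2 and width_diff <= 10:
        return "exact"
    if same_type and gsm_diff <= 5:
        return "close"
    return "new"


def tier_for_confidence(score: float) -> str:
    """Map a confidence score to a tier using the thresholds from the ticket."""
    if score >= 0.95:
        return "exact"
    if score >= 0.6:
        return "close"  # assumption: covers the 0.94-0.95 gap as well
    return "new"
```

For example, two products from the same mill with the same paper_type, GSM 120 vs 121, and widths 2500mm vs 2505mm land in the exact tier, while the same pair from different mills drops to close.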

B2B-064: Admin Review Dashboard (P0, Frontend)

Inbox view at /manage/datasheets showing all uploads with status badges
Detail view with embedded PDF, editable extracted product table, match results with confidence scores
Per-product actions: Accept (creates/links product), Edit (inline modify then accept), Skip
Bulk "Accept All" for high-confidence matches (>= 0.90)
Reject with required notes field
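
As a sketch of the bulk action, "Accept All" amounts to partitioning match results at the 0.90 threshold. The shape of each matched_products entry (a dict with a confidence key) is an assumption; the summary only says the results are stored as JSON.

```python
BULK_ACCEPT_THRESHOLD = 0.90  # from the ticket: high-confidence matches (>= 0.90)


def split_for_bulk_accept(matched_products, threshold=BULK_ACCEPT_THRESHOLD):
    """Partition match results into (auto_accept, needs_manual_review)."""
    accept, manual = [], []
    for item in matched_products:
        # Entries at or above the threshold qualify for "Accept All".
        (accept if item["confidence"] >= threshold else manual).append(item)
    return accept, manual
```

Anything below the threshold stays in the inbox for the per-product Accept, Edit, or Skip actions.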

B2B-065: Mill Datasheet Upload UI (P1, Frontend)

Drag-and-drop PDF upload zone
Real-time processing status feedback
Upload history filtered to own mill

B2B-066: Clone Product (P1, Full-stack)

"Clone" button on product detail/list duplicates a product with all specs
Opens pre-filled edit form for the copy
Critical UX for mills producing the same paper in multiple GSM weights

B2B-067: Excel Template Import (P1, Full-stack)

Downloadable .xlsx template with sample data and dropdown validations
Three-step flow: upload, preview (green = valid, red = errors), then confirm import
Template includes reference sheets for valid enum values
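
The preview step could be sketched as a per-row validator that yields the green/red status. Everything specific here is hypothetical: the required fields, the paper_type values, and the row-as-dict shape are placeholders, since the real enum values live in the template's reference sheets and reading the .xlsx file itself is left out.

```python
# Hypothetical enum values; the real ones come from the template's reference sheets.
VALID_PAPER_TYPES = {"kraftliner", "testliner", "fluting"}
REQUIRED = ("name", "paper_type", "gsm")  # assumed required columns


def preview_rows(rows):
    """Return one preview entry per row: status 'valid' (green) or 'errors' (red)."""
    preview = []
    for index, row in enumerate(rows, start=1):
        errors = [f"{f} is required" for f in REQUIRED if row.get(f) in (None, "")]
        paper_type = row.get("paper_type")
        if paper_type and paper_type not in VALID_PAPER_TYPES:
            errors.append(f"unknown paper_type: {paper_type}")
        gsm = row.get("gsm")
        if gsm not in (None, "") and not (isinstance(gsm, (int, float)) and gsm > 0):
            errors.append("gsm must be a positive number")
        preview.append({"row": index, "status": "errors" if errors else "valid", "errors": errors})
    return preview
```

Only rows with status valid would be written on confirm; rows with errors keep their messages so the mill can fix the spreadsheet and upload again.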

B2B-068: Empty State + Onboarding Flow (P2, Frontend)

Guided onboarding when a mill has zero products
Three CTAs: Upload Datasheet, Import from Excel, Add Manually
Admin dashboard shows "X mills with 0 products" action item

Build Order

Sprint	Tickets	Duration
Sprint 1	B2B-060 (sync bridge) + B2B-066 (clone)	1 day
Sprint 2	B2B-061 (upload model) + B2B-062 (AI extraction)	2 days
Sprint 3	B2B-063 (matching) + B2B-064 (admin dashboard) + B2B-065 (mill upload UI)	3 days
Sprint 4	B2B-067 (Excel import) + B2B-068 (onboarding)	2 days

Tech Notes

Extractor pipeline: Flask + SQLite service at localhost:8925, using Gemini 2.0 Flash via OpenRouter
Stats: 8,919 docs, 16,791 extracted products, 4,246 catalog products, 440 mills
Category mapping table provided for Extractor categories to Marketplace paper_type values (13 mappings)

Sources

raw/articles/EPIC-060 -- full epic specification with acceptance criteria and Playwright test expectations

wiki/concepts/epic-060-product-catalog-intelligence -- concept page on catalog enrichment
wiki/entities/paper-pdf-extractor -- the Flask extraction service
wiki/summaries/frontend-spec-summary -- frontend architecture for the admin review UI
wiki/concepts/spec-based-matching -- the matching algorithm that underpins product matching
wiki/entities/morichal-ai -- related data migration from legacy system