type: concept
created: Tue Apr 07 2026 02:00:00 GMT+0200 (Central European Summer Time)
updated: Tue Apr 07 2026 02:00:00 GMT+0200 (Central European Summer Time)
sources: raw/articles/PRD
tags: matching algorithm scoring core-mechanic

Matching Algorithm

abstract
The matching algorithm scores surplus-buyer pairs across five weighted dimensions (paper type 30%, GSM 25%, width 20%, grade 15%, geography 10%), applies a minimum threshold of 50, and classifies results as exact, close, or partial.

Overview

The matching algorithm is the core intelligence of the marketplace. It evaluates every combination of available surplus items against active buyer specifications, producing a composite score from 0 to 100 that determines match quality. The algorithm uses hard disqualification gates (paper type, GSM, width) before computing soft scores (grade, geography), ensuring only physically viable matches are surfaced.

The Five Scoring Functions

1. Paper Type (Weight: 30%)

A binary gate returning 100 (match) or 0 (disqualification). Paper type must match exactly -- there is no "close" paper type. A buyer of kraftliner (wiki/entities/kraftliner) cannot use testliner (wiki/entities/testliner). If the surplus paper type does not equal the BuyerSpec paper type, the entire match is discarded with no further computation. Because the gate is binary, every match that survives it receives the full 30 points from this dimension.

Valid paper types: kraftliner, testliner, fluting, duplex, triplex, sack_kraft, white_top_testliner, coated_board, mg_kraft, greaseproof, tissue.
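The gate can be sketched in a few lines of Python. The function name `score_paper_type` and the choice to raise `ValueError` on an unknown type are assumptions made for illustration; only the list of valid types and the 100/0 outcome come from the text above.

```python
# Valid paper types as listed in this page.
VALID_PAPER_TYPES = frozenset({
    "kraftliner", "testliner", "fluting", "duplex", "triplex", "sack_kraft",
    "white_top_testliner", "coated_board", "mg_kraft", "greaseproof", "tissue",
})


def score_paper_type(surplus_type: str, buyer_type: str) -> int:
    """Binary gate: 100 on an exact match, 0 (disqualification) otherwise."""
    for value in (surplus_type, buyer_type):
        if value not in VALID_PAPER_TYPES:
            # Assumption: unknown types are treated as data errors, not as mismatches.
            raise ValueError(f"unknown paper type: {value!r}")
    return 100 if surplus_type == buyer_type else 0
```

A caller would discard the pair as soon as this returns 0, before any other dimension is scored.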

2. GSM / Grammage (Weight: 25%)

Scored against the buyer's min/max GSM range using tolerance bands:

| Condition | Score |
| --- | --- |
| Within buyer's exact range (center of range) | 100 |
| Within buyer's exact range (edge of range) | 80-100 (linear) |
| Within +/-5% beyond stated range | 60 |
| Within +/-10% beyond stated range | 30 |
| Outside all tolerances | 0 (disqualification) |

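Example: Buyer spec 120-200 GSM (center 160). Surplus at 160 GSM scores 100. Surplus at 125 GSM scores ~83 (linear interpolation from 100 at the center to 80 at the edge: 100 - 20 x 35/40 = 82.5). Surplus at 205 GSM scores 60 (within 5% of the 200 maximum). Surplus at 215 GSM scores 30 (within 10%). Surplus at 250 GSM scores 0.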

If the GSM score is 0, the match is discarded entirely.
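The tolerance bands can be expressed as a small function. This is a sketch under two assumptions not stated in the source: the in-range score falls linearly from 100 at the center to 80 at either edge, and the 5% and 10% tolerances are taken relative to the nearest bound of the buyer's range. The name `score_gsm` is hypothetical.

```python
def score_gsm(gsm: float, gsm_min: float, gsm_max: float) -> float:
    """Score a surplus grammage against a buyer's min/max GSM range."""
    if gsm_min <= gsm <= gsm_max:
        half_range = (gsm_max - gsm_min) / 2
        if half_range == 0:
            return 100.0  # degenerate range: min == max == gsm
        center = (gsm_min + gsm_max) / 2
        # Assumption: linear falloff from 100 (center) to 80 (edge).
        return 100.0 - 20.0 * abs(gsm - center) / half_range
    # Assumption: tolerances are a percentage of the nearest bound.
    if gsm_min * 0.95 <= gsm <= gsm_max * 1.05:
        return 60.0
    if gsm_min * 0.90 <= gsm <= gsm_max * 1.10:
        return 30.0
    return 0.0  # disqualification: the match is discarded
```

Under this strictly linear reading, a 125 GSM surplus against a 120-200 range evaluates to 82.5; the production interpolation may differ.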

3. Width (Weight: 20%)

Scored against the buyer's width range with a hard ceiling at machine max width:

| Condition | Score |
| --- | --- |
| Exceeds machine_max_width_mm | 0 (hard disqualification) |
| Within buyer's width range | 100 |
| Below minimum by <=5% | 70 (buyer might accept narrower) |
| Below minimum by 5-10% | 40 |
| Above maximum by <=5% (within machine limit) | 50 (can be cut) |
| Outside all tolerances | 0 (disqualification) |

The machine max width is a physical constraint -- a roll that does not fit the machine cannot be used regardless of other specs. If width score is 0, the match is discarded.
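A minimal sketch of the width rules follows. The function name and parameter names other than `machine_max_width_mm` are hypothetical, and the percentages are assumed to be relative to the buyer's minimum (for narrow rolls) or maximum (for wide rolls).

```python
def score_width(width_mm: float, min_width_mm: float, max_width_mm: float,
                machine_max_width_mm: float) -> int:
    """Score a surplus roll width against a buyer's range and machine limit."""
    if width_mm > machine_max_width_mm:
        return 0  # hard disqualification: the roll does not fit the machine
    if min_width_mm <= width_mm <= max_width_mm:
        return 100
    if width_mm < min_width_mm:
        shortfall = (min_width_mm - width_mm) / min_width_mm
        if shortfall <= 0.05:
            return 70  # buyer might accept a narrower roll
        if shortfall <= 0.10:
            return 40
        return 0
    # Wider than the buyer's maximum but still within the machine limit.
    excess = (width_mm - max_width_mm) / max_width_mm
    return 50 if excess <= 0.05 else 0  # 50: the roll can be cut down
```

The machine check comes first so that a roll slightly above the buyer's maximum never earns the "can be cut" score when it physically cannot be mounted.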

4. Quality Grade (Weight: 15%)

Scored against the buyer's list of acceptable grades using a hierarchy: A (prime) > B (near-prime) > C (off-grade).

| Condition | Score |
| --- | --- |
| Grade is in buyer's acceptable list | 100 |
| Surplus grade is higher than required | 90 (better quality than needed) |
| One grade below lowest acceptable | 40 (buyer might accept at discount) |
| Two or more grades below | 0 |
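The grade hierarchy lends itself to a rank lookup. In this sketch the numeric ranks and the name `score_grade` are illustrative, and "higher than required" is read as "better than the buyer's lowest acceptable grade"; the source does not say how a grade that falls between two listed grades should be treated. Unlike paper type, GSM, and width, a grade score of 0 is not a disqualification gate.

```python
from typing import Sequence

# Illustrative ranks for the hierarchy A (prime) > B (near-prime) > C (off-grade).
GRADE_RANK = {"A": 3, "B": 2, "C": 1}


def score_grade(surplus_grade: str, acceptable_grades: Sequence[str]) -> int:
    """Score a surplus grade against a non-empty list of acceptable grades."""
    if surplus_grade in acceptable_grades:
        return 100
    rank = GRADE_RANK[surplus_grade]
    lowest_acceptable = min(GRADE_RANK[g] for g in acceptable_grades)
    if rank > lowest_acceptable:
        return 90  # better quality than the buyer needs
    if rank == lowest_acceptable - 1:
        return 40  # one grade below: buyer might accept at a discount
    return 0       # two or more grades below
```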

5. Geography (Weight: 10%)

Scored on proximity between surplus origin and buyer location:

| Condition | Score |
| --- | --- |
| Same country | 100 (ideal for truck shipping, especially intra-EU) |
| Same region (e.g., both EU) | 70 |
| Adjacent regions (e.g., EU to Middle East) | 40 |
| Distant regions | 20 |

Adjacent region pairs: EU-Middle East, EU-Africa, North America-South America, Asia-Middle East, Asia-Oceania, Middle East-Africa.
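Because adjacency is symmetric, the pairs can be stored as unordered sets. The sketch below assumes each party carries a country code and a region name; the function signature and the region strings are hypothetical, while the pair list and scores are taken from the text above.

```python
# Adjacent region pairs from this page, stored unordered so lookup is symmetric.
ADJACENT_REGIONS = frozenset(
    frozenset(pair) for pair in [
        ("EU", "Middle East"),
        ("EU", "Africa"),
        ("North America", "South America"),
        ("Asia", "Middle East"),
        ("Asia", "Oceania"),
        ("Middle East", "Africa"),
    ]
)


def score_geography(origin_country: str, origin_region: str,
                    buyer_country: str, buyer_region: str) -> int:
    """Score proximity between surplus origin and buyer location."""
    if origin_country == buyer_country:
        return 100
    if origin_region == buyer_region:
        return 70
    if frozenset((origin_region, buyer_region)) in ADJACENT_REGIONS:
        return 40
    return 20  # distant regions still score; geography never disqualifies
```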

Composite Score Calculation

The composite score is computed as:

overall = (paper_type_score * 0.30) + (gsm_score * 0.25) + (width_score * 0.20) + (grade_score * 0.15) + (geo_score * 0.10)

Four checks are applied sequentially before computing the full composite: a visibility check, followed by the three hard disqualification gates:

  1. Visibility check (see wiki/concepts/geographic-visibility-system) -- checked first to avoid wasted computation
  2. Paper type must be non-zero
  3. GSM must be non-zero
  4. Width must be non-zero

If any gate fails, the function returns None (no match).
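The gating and weighting can be combined as follows. This is a sketch that takes already-computed dimension scores as input, whereas the description above implies scores are computed lazily, one gate at a time. The names `composite_score`, `WEIGHTS`, and `HARD_GATES`, the dictionary keys, and the rounding to two decimals are assumptions.

```python
from typing import Mapping, Optional

WEIGHTS = {"paper_type": 0.30, "gsm": 0.25, "width": 0.20, "grade": 0.15, "geo": 0.10}
HARD_GATES = ("paper_type", "gsm", "width")


def composite_score(visible: bool, scores: Mapping[str, float]) -> Optional[float]:
    """Return the weighted 0-100 score, or None if any gate fails."""
    if not visible:
        return None  # visibility is checked first to avoid wasted computation
    for gate in HARD_GATES:
        if scores[gate] == 0:
            return None  # hard disqualification
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    # Assumption: round to two decimals to hide floating-point noise.
    return round(total, 2)
```

For example, scores of 100 (paper type), 60 (GSM), 70 (width), 100 (grade), and 70 (geography) give 30 + 15 + 14 + 15 + 7 = 81.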

Match Classification

| Type | Score Range | Description |
| --- | --- | --- |
| Exact | >= 90 | Near-perfect fit for buyer's specs |
| Close | >= 70 and < 90 | Good fit, minor deviations |
| Partial | >= 50 and < 70 | Usable but notable compromises |
| Below threshold | < 50 | Not surfaced to buyer |
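The classification reduces to a cascade of threshold checks. In this sketch the function name `classify_match` and the lowercase labels are hypothetical; the thresholds are the ones in the table. Checking with `>=` from the top down also handles fractional composite scores such as 89.5.

```python
from typing import Optional


def classify_match(overall_score: float) -> Optional[str]:
    """Map a composite score to a match type, or None if below threshold."""
    if overall_score >= 90:
        return "exact"
    if overall_score >= 70:
        return "close"
    if overall_score >= 50:
        return "partial"
    return None  # below 50: not surfaced to the buyer
```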

Match Execution Flow

  1. New surplus item ingested (or buyer spec updated)
  2. Fetch all active BuyerSpecs with matching paper_type (pre-filter eliminates ~80%)
  3. For each BuyerSpec: check visibility, calculate composite score
  4. Scores >= 50 create MatchResult records
  5. Scores >= 80 become candidates for exclusivity
  6. Sort matches by overall_score descending
  7. Queue newsletter generation for all matched buyers
  8. Log match statistics
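Steps 2 through 6 of the flow can be sketched in plain Python. This is not the Django/Celery implementation: specs and items are plain dictionaries, `score_fn` stands in for the visibility check plus composite scoring (returning None when a gate fails), and `MatchResult` here is a hypothetical dataclass rather than the real model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

MATCH_THRESHOLD = 50
EXCLUSIVITY_THRESHOLD = 80


@dataclass
class MatchResult:
    spec_id: int
    overall_score: float
    exclusivity_candidate: bool


def run_matching(item: dict, specs: Iterable[dict],
                 score_fn: Callable[[dict, dict], Optional[float]]) -> List[MatchResult]:
    """Match one surplus item against buyer specs, best score first."""
    results = []
    for spec in specs:
        # Step 2: pre-filter on paper type before any scoring.
        if spec["paper_type"] != item["paper_type"]:
            continue
        # Step 3: score_fn covers visibility and the composite score.
        score = score_fn(item, spec)
        if score is None or score < MATCH_THRESHOLD:
            continue
        # Steps 4-5: record the match and flag exclusivity candidates.
        results.append(MatchResult(spec["id"], score, score >= EXCLUSIVITY_THRESHOLD))
    # Step 6: sort by overall score, descending.
    results.sort(key=lambda m: m.overall_score, reverse=True)
    return results
```

Newsletter queuing and statistics logging (steps 7 and 8) would follow on the returned list.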

Price Check

After scoring, the algorithm checks whether the surplus price (adjusted for the buyer's geographic region) is at or below the buyer's max_price_per_mt. The result is stored as an informational flag (price_within_budget), not a hard disqualifier -- a buyer might pay more for a perfect spec match.
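A sketch of the flag, assuming the regional adjustment is a multiplicative factor; the source does not say how the adjustment is computed, and `regional_factor` and `surplus_price_per_mt` are hypothetical names.

```python
def price_within_budget(surplus_price_per_mt: float, regional_factor: float,
                        max_price_per_mt: float) -> bool:
    """Informational flag only: a False result never discards a match."""
    # Assumption: the regional adjustment is a simple multiplier on the base price.
    adjusted_price = surplus_price_per_mt * regional_factor
    return adjusted_price <= max_price_per_mt
```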

Performance Targets

| Scenario | Scale | Target |
| --- | --- | --- |
| Single item vs all specs | 1 x 500 specs | < 1 second |
| Full batch ingestion (50 items) | 50 x 500 specs | < 10 seconds |
| Full re-match (all surplus vs all specs) | 5000 x 500 specs | < 60 seconds |

Optimization strategies: pre-filter by paper_type (index scan), pre-filter by visibility region, batch scoring in Django queryset operations, cache static region lookups, use Celery for async matching on large batches.
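The targets imply a minimum pair-evaluation throughput, which a quick calculation makes concrete. The figures below count every surplus-spec pair before any pre-filtering, so they are an upper bound on the scoring work; the helper name is hypothetical.

```python
def required_pairs_per_second(items: int, specs: int, budget_seconds: float) -> float:
    """Minimum throughput needed to evaluate items x specs pairs within the budget."""
    return items * specs / budget_seconds


# (items, specs, time budget in seconds) taken from the targets above.
SCENARIOS = {
    "single item": (1, 500, 1.0),
    "batch ingestion": (50, 500, 10.0),
    "full re-match": (5000, 500, 60.0),
}

for name, (items, specs, budget) in SCENARIOS.items():
    rate = required_pairs_per_second(items, specs, budget)
    print(f"{name}: {items * specs} pairs, at least {rate:.0f} pairs/s")
```

The full re-match is the demanding case at 2.5 million pairs, which is why pre-filtering by paper type and visibility region before scoring matters most there.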

Sources

Related