Search
Full-text search with per-field BM25, pluggable scoring signals, and per-field language packs. The index lives in memory and can optionally be mirrored to IndexedDB.
Quick start
import { createSearch, builtin } from '@perhapxin/dddk/toolbox/search';
interface Product { sku: string; name: string; description: string; price: number; }
const search = createSearch<Product>({
  adapter: {
    fromRow: (p) => ({
      id: p.sku,
      fields: { name: p.name, body: p.description },
      meta: { price: p.price },
    }),
  },
  scorers: [
    builtin.bm25Field({ weights: { name: { weight: 3 }, body: { weight: 1 } } }),
  ],
  weights: { bm25_field: 1.0 },
  combiner: 'weighted_sum',
});
await search.init();
await search.addDocs(products);
const hits = await search.query('blue shirt', { topK: 10 });
// hits: Array<{ doc, score, explanation? }>
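How `weights` and `combiner: 'weighted_sum'` relate can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: `SignalScores` and `weightedSum` are hypothetical names, each scorer's output is assumed to be keyed by an id such as `bm25_field`, and a signal with no configured weight is assumed to contribute nothing.

```typescript
// Hypothetical sketch of a weighted-sum combiner; not the library's actual code.
type SignalScores = Record<string, number>;

function weightedSum(scores: SignalScores, weights: Record<string, number>): number {
  let total = 0;
  for (const id of Object.keys(scores)) {
    // Assumption: signals without a configured weight are ignored.
    const weight = weights[id] ?? 0;
    total += weight * scores[id];
  }
  return total;
}

// Two signals for one document, combined with per-signal weights.
const combined = weightedSum(
  { bm25_field: 4, recency: 0.5 },
  { bm25_field: 1.0, recency: 2.0 },
);
console.log(combined); // 4 * 1.0 + 0.5 * 2.0 = 5
```

Under this reading, raising a signal's weight scales its contribution linearly, so `weights` is the place to trade text relevance against business signals.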
API
- `createSearch<TRow>({ adapter, scorers, weights, combiner, topK, language, fieldLanguages, storage, sync, id })`
- `await search.init()`: open storage and load any persisted index
- `await search.addDoc(row)` / `await search.addDocs(rows)` / `await search.updateDoc(row)` / `await search.removeDoc(id)`
- `await search.query(text, { topK, scoreThreshold, customMeta, explain })` → `SearchResult[]`
- `search.explain(text, docId)`: score breakdown for one doc
- `search.onLowConfidence((q, top) => ...)`: escalation hook
- `search.getDoc(id)` / `search.allDocs()` / `search.getFeatures(id)`
Scoring signals
Built-in signals (builtin.*):
| Signal | What it scores |
|---|---|
| `bm25Field({ weights })` | Per-field BM25 with tf-saturation |
| `recencyDecay({ field, halfLifeDays })` | Half-life decay on a timestamp field |
| `newProductBoost({ field, withinDays, multiplier })` | Multiplier for items within a date window |
| `bestsellerBoost({ field, method, maxBoost })` | Log-normalised popularity boost |
| `promoBoost({ field })` | Multiplicative boost from a per-doc numeric field |
| `inventoryFilter({ field, min })` | Gate: hide docs whose stock < min |
| `activeFilter({ field })` | Gate: hide inactive items |
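The formulas behind three of these signals can be sketched as plain functions. These are the textbook forms, written under assumptions: the function names are hypothetical, `k1 = 1.2` and `b = 0.75` are common BM25 defaults rather than confirmed library settings, and the built-in scorers may normalise differently.

```typescript
// Sketches of three scoring formulas; defaults are assumptions, not the library's.

// BM25 term-frequency saturation: grows with tf but flattens out below (k1 + 1).
function bm25Tf(tf: number, fieldLen: number, avgFieldLen: number, k1 = 1.2, b = 0.75): number {
  // Length normalisation: longer-than-average fields are penalised.
  const norm = 1 - b + b * (fieldLen / avgFieldLen);
  return (tf * (k1 + 1)) / (tf + k1 * norm);
}

// Half-life decay: 1 for a brand-new item, 0.5 after halfLifeDays, 0.25 after twice that.
function halfLifeDecay(ageDays: number, halfLifeDays: number): number {
  return Math.pow(0.5, Math.max(0, ageDays) / halfLifeDays);
}

// Log-normalised boost: maps a count in [0, maxCount] to a multiplier in [1, maxBoost].
function logBoost(count: number, maxCount: number, maxBoost: number): number {
  if (maxCount <= 0) return 1;
  const ratio = Math.log(1 + Math.max(0, count)) / Math.log(1 + maxCount);
  return 1 + (maxBoost - 1) * Math.min(1, ratio);
}

console.log(bm25Tf(1, 10, 10), bm25Tf(10, 10, 10), bm25Tf(100, 10, 10));
console.log(halfLifeDecay(30, 30), logBoost(100, 100, 2));
```

The saturation is what keeps a document that repeats a term a hundred times from outranking everything else, and the logarithm plays the same role for popularity counts.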
To add a custom signal, implement SearchScorer:
const editorPick: SearchScorer = {
  id: 'editor_pick',
  compute: (doc) => doc.meta?.editor_pick ? 1.5 : 1,
};
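A parameterised variant of the same pattern is sketched below. The `DocLike` and `ScorerLike` types are local stand-ins that mirror only the shape used in the snippet above (an `id` plus a `compute(doc)` function); the real `SearchScorer` type may have more members or a different signature. `underBudget` is a hypothetical signal that reads the `meta.price` value set in the Quick start adapter.

```typescript
// Local stand-ins mirroring the shape used above; the real SearchScorer type may differ.
interface DocLike {
  id: string;
  meta?: Record<string, unknown>;
}

interface ScorerLike {
  id: string;
  compute: (doc: DocLike) => number;
}

// Hypothetical signal factory: favour products priced at or under a budget.
const underBudget = (limit: number): ScorerLike => ({
  id: 'under_budget',
  compute: (doc) => {
    const price = doc.meta?.price;
    // Return a neutral 1 when the price is missing or over the limit.
    return typeof price === 'number' && price <= limit ? 1.2 : 1;
  },
});

const scorer = underBudget(50);
console.log(scorer.compute({ id: 'sku-1', meta: { price: 20 } }));
console.log(scorer.compute({ id: 'sku-2', meta: { price: 80 } }));
```

Returning a neutral value for documents that lack the field keeps the signal from penalising rows with incomplete metadata.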
Per-field language
For multilingual catalogs, declare the language of each field:
const search = createSearch({
  adapter: { /* ... */ },
  fieldLanguages: {
    title_zh: 'zh',
    title_en: 'en',
    title_ja: 'ja',
    brand: 'universal', // brand names don't have a language
  },
});
Each field is indexed with its language pack (zh: bigram tokenizer, en: Porter stemmer, universal: Unicode segmentation). At query time, the query's language is auto-detected and the query is matched against the fields declared for that language.
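To make the bigram tokenizer and the detection step concrete, here is a rough sketch. Both functions are hypothetical illustrations: the library's zh pack and its language detector may work quite differently, and the detection below is a crude Unicode-block heuristic that only handles characters in the Basic Multilingual Plane.

```typescript
// Sketch only: the library's zh pack and language detection may work differently.

// Overlapping character bigrams, e.g. "蓝色衬衫" -> ["蓝色", "色衬", "衬衫"].
// Assumes BMP characters (one UTF-16 code unit each).
function bigrams(text: string): string[] {
  const s = text.replace(/\s+/g, '');
  if (s.length < 2) return s.length === 1 ? [s] : [];
  const out: string[] = [];
  for (let i = 0; i < s.length - 1; i++) {
    out.push(s.slice(i, i + 2));
  }
  return out;
}

// Crude detection by Unicode block: any kana implies ja, Han without kana implies zh,
// anything else falls back to en.
function guessLanguage(query: string): 'zh' | 'ja' | 'en' {
  let sawHan = false;
  for (let i = 0; i < query.length; i++) {
    const code = query.charCodeAt(i);
    if (code >= 0x3040 && code <= 0x30ff) return 'ja'; // Hiragana and Katakana
    if (code >= 0x4e00 && code <= 0x9fff) sawHan = true; // CJK Unified Ideographs
  }
  return sawHan ? 'zh' : 'en';
}

console.log(bigrams('蓝色衬衫'));
console.log(guessLanguage('blue shirt'), guessLanguage('蓝色衬衫'), guessLanguage('シャツ'));
```

Bigrams avoid the need for a word-segmentation dictionary: a query for 衬衫 matches any field whose bigram list contains that pair.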
Through the webagent
dddk.tools.registerSearch({
  items: products,
  toDoc: (p) => ({ id: p.sku, fields: { title: p.name, body: p.description } }),
});
The LLM sees search_catalog({ query }) and calls it autonomously
when the user asks about products. See notes/toolbox-as-tools.md
in the repo for the design notes.