Analytics

Anonymous event collector. Track signals fire-and-forget; they're batched in memory (optionally mirrored to IndexedDB for offline), flushed on a timer / size threshold, and POSTed as JSON to an endpoint the host controls.

Quick start

import { createAnalytics } from '@perhapxin/dddk/modules/analytics';

const analytics = createAnalytics({
  endpoint: '/api/dddk/events',
  identity: () => ({
    session_id,           // generated + persisted by host
    visitor_id,
    locale: 'zh-TW',
    device: 'desktop',
  }),
  batchSize: 20,
  flushIntervalMs: 5000,
});
await analytics.init();

analytics.track('session.start');
analytics.track('page.view', { path: location.pathname });
analytics.track('voice.result', { status: 'attempt' });

The endpoint receives an array of AnalyticsEvent:

[
  {
    "ts": 1716543210000,
    "event": "page.view",
    "session_id": "...",
    "visitor_id": "...",
    "locale": "zh-TW",
    "device": "desktop",
    "payload": { "path": "/commercial" }
  }
]

API

createAnalytics({ endpoint, identity, batchSize, flushIntervalMs, offlineBuffer, transport })
await analytics.init() — start the flush timer, drain any offline buffer
analytics.track(event, payload?, { identity? }?)
analytics.identify(values) — merge persistent identity fields into every subsequent event
analytics.reset() — clear persistent identity (e.g. on logout)
await analytics.flush() — force-flush the buffer
await analytics.dispose() — stop timer, final flush, close storage

Options

Option	Default	Purpose
`endpoint`	—	POST URL. Receives a JSON array of events.
`identity`	—	`() => Record<string, unknown>` — called per event for dynamic fields.
`batchSize`	`20`	Flush as soon as buffer hits this many events.
`flushIntervalMs`	`5000`	Timer-driven flush cadence.
`offlineBuffer`	—	`'indexeddb'` to mirror events; drained on next `init()`.
`transport`	—	Replace fetch with your own `(events) => Promise<void>`. Takes precedence over `endpoint`.

The `identity` callback

Identity fields are merged into EVERY event at fire time. Two mechanisms:

Source	When
`identity()` option	Called per event. Use for values that can change mid-session (locale, device).
`analytics.identify({ ... })`	Sticky. Use after login when a stable user id appears.
`track(event, payload, { identity })`	Per-call override.

Merge order (later wins): staticIdentity → identity() → per-call override. The host owns id generation and persistence — the toolbox deliberately doesn't fingerprint:

function uuid(): string {
  if (crypto?.randomUUID) return crypto.randomUUID();
  /* fallback ... */
}
const visitor_id = localStorage.getItem('app:visitor_id')
  ?? (localStorage.setItem('app:visitor_id', uuid()), localStorage.getItem('app:visitor_id')!);
const session_id = sessionStorage.getItem('app:session_id')
  ?? (sessionStorage.setItem('app:session_id', uuid()), sessionStorage.getItem('app:session_id')!);

Event shape

type AnalyticsEvent = {
  ts: number;                              // ms epoch, set at track() call
  event: string;                           // dotted name, e.g. 'page.view'
  payload?: Record<string, unknown>;       // call-site payload
} & Record<string, unknown>;               // identity fields merged in

track signature:

analytics.track('voice.result', { status: 'error', via: 'network' });

Convention: keep payload to small primitives — strings (ids, status codes), numbers (durations, counts), booleans. Anything free-form gets sanitized by the host before track (see below).

Connecting dddk intents

dddk emits typed IntentEvents (palette_activated, agent_asked, agent_answered, voice_captured, selection_used, skill_*, agent_feedback, agent_mode_changed, agent_tool_failed, …). Sanitize each one before tracking — content fields (questions, answers, transcripts, selection text) should NOT leave the device:

import { intentToTrack } from '$lib/analytics-sanitize';

dddk.on('intent', (i) => {
  const safe = intentToTrack(i);
  if (safe) analytics.track(safe.event, safe.payload as Record<string, unknown>);
});

`agent_feedback` — the labelled training signal

When the host sets webAgent.onLoopEnd: { kind: 'feedback', text: ... }, the end-of-loop closure asks the visitor to mark the run satisfied or not. The gesture (Space = yes, double-tap = no) becomes an agent_feedback IntentEvent:

{ kind: 'agent_feedback'
  runId?: string         // ties back to the agent_run_started this run came from
  skillId?: string       // set when a skill triggered the run
  satisfied: boolean | null  // true / false from gestures; null from ask_user picker
  summary: string        // closure text or the picked option value
  timestamp: number
}

Why this matters more than clickstream:

Labelled. Every row carries an explicit yes / no — no inferring intent from scroll depth or page dwell.
Joinable. runId and skillId let you slice by which skill or run produced the rating, so a regression after a prompt change shows up as a yes-rate drop in that one skill row instead of a noise floor across everything.
RL-ready. The pair (state at run start, satisfied) is exactly the input shape a reward model needs. Pipe it into an offline trainer; you don't need to manually label anything.

The bundled dashboard renders agent_feedback as the hero satisfaction section (/dashboard): headline yes-rate, per-day stacked bars, per-skill yes-rate table with a low_confidence flag once samples drop under 30. The same rows are in the CSV / JSON export so an external pipeline can consume them as-is.

intentToTrack whitelists per-kind: it keeps itemId, via, status, response, lengths, counts; it drops question, answer, text, selectionText. The pattern matters more than the exact sanitizer — your domain may have different sensitive fields.

A typical event map:

dddk intent	Tracked event	Payload kept
`palette_activated`	`intent.palette_activated`	`item_id`, `size_chars`, `attachments_count`
`agent_asked`	`intent.agent_asked`	(none — question text dropped)
`agent_answered`	`intent.agent_answered`	`via`, `size_chars`, `latency_ms` (ms from ask → answer)
`agent_llm_call`	`intent.agent_llm_call`	`item_id` (= runId), `variant_id` (= role), `via` (= model), `ttft_ms`, `duration_ms`, `output_tokens`
`voice_captured`	`intent.voice_captured`	`size_chars`
`confirm_action`	`intent.confirm_action`	`item_id` (= action name), `status` (`approved` / `rejected`)
`skill_started` / `skill_finished`	`intent.skill_*`	`item_id`
`agent_run_started`	`intent.agent_run_started`	`item_id` (= runId), `size_chars` (= task length)
`agent_run_completed`	`intent.agent_run_completed`	`item_id` (= runId), `size_chars` (= turn count)
`agent_run_stopped`	`intent.agent_run_stopped`	`item_id` (= runId), `status` (`close` / `esc` / `reject` / `palette` / `voice`)
`agent_pause_decision`	`intent.agent_pause_decision`	`item_id` (= runId), `status` (`continue` / `stop`)
`agent_feedback`	`intent.agent_feedback`	`response` (`yes` / `no` / `dismiss`), `size_chars`

Per-run reconstruction via `runId`

Agent-run intents (started / completed / stopped / pause_decision) all carry the same runId in item_id. Group by it on the dashboard to reconstruct one user query end-to-end — every agent_answered, confirm_action, agent_pause_decision that fired between agent_run_started and agent_run_(completed|stopped) belongs to that run.

Exporting a single run as JSON

The orchestrator buffers every intent emitted during the current agent run alongside the session log. After a run ends, call dddk.exportAgentRun() to get a single JSON object:

dddk.on('agent_final', () => {
  const run = dddk.exportAgentRun();
  if (!run) return;
  // run.runId, run.sessionId, run.session.turns, run.intents, run.exportedAt
  void fetch('/api/my-runs', { method: 'POST', body: JSON.stringify(run) });
});

Useful for shipping a complete query record (memory + per-turn tool calls + user decisions) to a host-side DB or dashboard without joining two streams. session is a deep clone so later turns don't mutate the export.

Proactive wires itself in via the analytics option and emits proactive.shown / proactive.response for free.

Lifecycle hooks

Flush before the page unloads — the timer may not fire in time on hard refresh:

window.addEventListener('beforeunload', () => { void analytics.flush(); });

If a flush fails (network error, 5xx, offline), the batch is re-queued at the front of the buffer for the next attempt. With offlineBuffer: 'indexeddb', events also persist to IndexedDB and drain on the next init() — survives tab close.

Privacy

The toolbox itself stores no PII and never enriches identity. What goes out is entirely what the host puts in:

identity() is YOUR code — don't return emails / phone / IP.
payload is YOUR code — sanitize before track. Use the intent-to-track whitelist pattern above.
The endpoint is YOUR server — apply schema validation there too (the dddk-frontend reference whitelists columns at the D1 ingestion endpoint as a second line of defense).

If a regulator asks "what does this collect" the answer is "exactly what the host writes in identity() and payload — read those two functions". Keep them short.

Transport override

For non-HTTP transports (Beacon, Worker postMessage, native bridge):

const analytics = createAnalytics({
  identity: () => ({ session_id, visitor_id }),
  transport: async (events) => {
    navigator.sendBeacon('/api/dddk/events', new Blob(
      [JSON.stringify(events)],
      { type: 'application/json' },
    ));
  },
});

transport takes precedence over endpoint — set one or the other.

Through the webagent

Analytics is NOT exposed as an LLM tool. Telemetry is a side-effect of host code (page navigation, dddk intents, proactive responses) — not something the agent decides to do. Pass the analytics instance to other toolbox modules (createProactive({ analytics })) so they auto-track their own lifecycle.