DOM dump format
The text the agent actually sees each turn. Read this when you want to tune your
appendSystemPromptto reference page structure, or when you're debugging "why didn't the agent click the right element."
What's in the dump
Every turn, the webagent serialises the page into a compact text block and pastes it into the user message right before the LLM call. The block looks like this:
URL: https://yoursite.com/pricing
TITLE: Pricing — YourSaaS
VIEWPORT: 240px above · 900px visible · 1820px below
[1]<a href="/">Home</a>
[2]<section> Pricing
[3]<h2>Plans</h2>
[4]<table>
[5]<tr>
<td>Hobby</td><td>Free</td>
[6]<tr>
<td>Pro</td><td>$29 /mo</td>
↓[7]<section> FAQ
↓[8]<h2>Common questions</h2>
Three header lines, then an indented tree of every actionable element.
The three header lines
| Line | What it means |
|---|---|
URL: … |
The page the agent is currently looking at. Use this in your prompt to gate destination logic — "if the user named a destination matching a sitemap route and URL is not that route, navigate first." |
TITLE: … |
document.title truncated. Useful for quick orientation. |
VIEWPORT: 240px above · 900px visible · 1820px below |
Where the user is scrolled on the page right now. The agent uses this to decide whether to scroll_to an element before talking about it. |
The viewport line is the strongest hint the SDK gives the agent about user attention. Reference it in your prompt when you want narration to track what the user can actually see: "If the subject is below the visible window, call scroll_to first."
The indexed tree
Every actionable element is prefixed with a numeric index in brackets: [3], [12], etc. The agent calls every tool with the index as the selector:
border({ selector: "5" }) // or "[5]" — both parse
click({ selector: "12" })
scroll_to({ selector: "8" })
The host doesn't need to know any CSS selector; the SDK's resolveSelector(target) looks up the numeric index in a per-turn map and returns the live Element. CSS selectors still work as a fallback for elements outside the dump.
Tab indentation reflects parent → child structure for sectioning containers (<section> / <article> / <aside> / <table> / <tr> / <form> / <details> / <dl> / <dialog> / <header> / <footer> / <main> / <nav> / <hgroup>). All other wrappers (<div>, <span>, framework display: contents fragments) flatten through without bumping indent — they're DOM noise the agent doesn't need to know about.
Viewport markers
A leading ↑ or ↓ on a line means that element is out of the current viewport:
| Marker | Meaning |
|---|---|
↑ |
scrolled past — above the visible window |
| (none) | currently visible to the user |
↓ |
below the fold — user needs to scroll down |
When the agent wants to point at an element with a ↓ marker, it should call scroll_to first so the user can see what the next sentence is about. The SDK's bundled system prompt teaches this rhythm; host prompts only need to reinforce it for vertical layouts.
The viewport check uses getBoundingClientRect() at dump time, so it tracks the user's current scroll position turn-by-turn.
What gets included
| Category | Tags | Emission |
|---|---|---|
| Interactive | <a>, <button>, <input>, <select>, <textarea>, <label>, <summary>, plus any element with an interactive ARIA role (button / link / tab / menuitem / option / checkbox / radio / switch / combobox / textbox / slider / spinbutton) or a keyboard-focusable tabindex with an accessible label |
Numbered, no descent |
| Headings | <h1> – <h6> |
Numbered |
| Sections | <section>, <article>, <aside>, <fieldset>, <main>, <nav>, <header>, <footer>, <hgroup>, <table>, <tr>, <form>, <details>, <dl>, <dialog> |
Numbered + bump indent for children |
| Media | <img> |
Numbered; emits alt text (or filename hint if no alt) |
| Preformatted | <pre> |
Numbered; preserves newlines, capped at 600 chars |
| Content lines | <p>, <li>, <td>, <th>, <dt>, <dd>, <blockquote>, <figcaption>, <caption>, <legend>, <output>, <meter>, <progress> |
Flat text, no index |
What gets dropped
<script>,<style>,<noscript>,<template>,<iframe>,<svg>,<canvas>,<source>,<track>,<head>,<meta>,<link>— silently skipped.- Anything hidden by
display: none/visibility: hidden/ zero bounding box. (display: contentsis the exception: the element has no box but children render, so the walker descends without bumping indent.) - Anything matching the host's
domFilterfunction passed toWebAgentConfig. - Anything with
class="dddk-skip"or attributedata-dddk-skip(and its subtree).
Host opt-outs
Two ways to tell the dump "don't include this":
Class / attribute opt-out
Add dddk-skip to elements you never want the agent to see — site chrome, cookie banners, footer fine print:
<footer class="site-footer dddk-skip">…</footer>
<div data-dddk-skip>internal-only widget</div>
Both the matching element AND its entire subtree disappear from the dump.
domFilter function
For dynamic conditions (route-specific filtering, user-permission-aware hiding):
new DotDotDuck({
webAgent: {
domFilter: (el) => !el.matches('nav.global-nav, footer, [data-cookie]'),
},
});
Return false to drop the element + its subtree. Applies after the class-based skip.
Size caps
The dump is capped at domMaxLength characters (default ~12,000). When the cap fires, the dump ends with a [...truncated] marker. Increase it for dense pages (long pricing tables, docs with many headings); decrease it on token-budget-sensitive deployments.
The cap is char-based not token-based, but the rest of the pipeline (maxPromptTokens on WebAgentConfig) catches over-long full prompts at the message-array level.
Debug exposure
Every dump is mirrored to window.__dddkDebug:
window.__dddkDebug.lastDom // string the LLM saw
window.__dddkDebug.lastDomAt // ISO timestamp
window.__dddkDebug.lastDomBytes // size
window.__dddkDebug.lastIndexMap // Map<number, Element> the tools resolved against
window.__dddkDebug.lastLlmMessages // full message array sent to the LLM (system + history + env block)
Pop devtools, type __dddkDebug.lastDom after any agent turn, and you'll see exactly what the model read — invaluable when narration goes sideways and you need to ask "did the agent even see that section?"
See also: prompt-design for how the dump composes with the layered system prompt, actions/catalog for the indexed-selector contract every tool uses, screenshot for attaching a viewport / full-page screenshot alongside the text dump.