webagent — LLM Providers
Design principles
- BYOK (Bring Your Own Key) — the package does not manage API keys.
- Provider abstraction — swapping the provider does not require touching agent logic.
- v1 supports OpenAI + Google AI Studio only (a single Google key calls both Gemini and Gemma); other vendors come later.
- OpenAI's function-calling shape is the internal standard. Google goes through an adapter.
- Per-role model routing: LLMRouter lets different roles use different providers / models to save cost. See router and prompt-design.
LLMProvider interface
interface LLMProvider {
  readonly name: string;
  complete(opts: CompleteOptions): Promise<CompleteResult>;
}

interface CompleteOptions {
  messages: LLMMessage[];
  tools?: ToolDefinition[];
  temperature?: number;
  maxTokens?: number;
  model?: string;
  signal?: AbortSignal;
  /** Reasoning intensity. 'off' suppresses thinking on models that support it. */
  thinking?: 'off' | 'low' | 'medium' | 'high' | 'xhigh';
  /** Force structured JSON output (response_format / responseMimeType). */
  jsonMode?: boolean;
}

interface CompleteResult {
  content: string;
  toolCalls?: ToolCall[];
  usage?: { promptTokens: number; completionTokens: number };
  finishReason: 'stop' | 'tool_calls' | 'length' | 'content_filter';
}

interface LLMMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string | ContentPart[];
  toolCallId?: string; // when role = 'tool'
  toolCalls?: ToolCall[]; // when role = 'assistant'
}

interface ContentPart {
  type: 'text' | 'image';
  text?: string;
  image?: string; // URL or base64
}

interface ToolDefinition {
  name: string;
  description: string;
  parameters: JSONSchema; // standard JSON Schema
}

interface ToolCall {
  id: string;
  name: string;
  arguments: Record<string, any>;
}
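To illustrate how these shapes fit together, here is a minimal sketch of one tool-call round trip expressed as LLMMessage values. The types are trimmed copies of the interfaces above (content is narrowed to string for brevity). The get_weather tool and the toolResultMessage helper are hypothetical and not part of the package.

```typescript
// Trimmed copies of the interfaces above, so the sketch is self-contained.
interface ToolCall {
  id: string;
  name: string;
  arguments: Record<string, any>;
}

interface LLMMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
  toolCallId?: string; // when role = 'tool'
  toolCalls?: ToolCall[]; // when role = 'assistant'
}

// Hypothetical helper: wrap a tool's return value as the 'tool' message
// that answers one specific ToolCall.
function toolResultMessage(call: ToolCall, result: unknown): LLMMessage {
  return { role: 'tool', toolCallId: call.id, content: JSON.stringify(result) };
}

// 1. The assistant asks for a tool (get_weather is a made-up tool name).
const call: ToolCall = {
  id: 'call_1',
  name: 'get_weather',
  arguments: { city: 'Taipei' },
};

const history: LLMMessage[] = [
  { role: 'user', content: 'What is the weather in Taipei?' },
  { role: 'assistant', content: '', toolCalls: [call] },
  // 2. The host runs the tool and appends the result, linked by toolCallId.
  toolResultMessage(call, { tempC: 28 }),
];
```

The next complete() call would receive history as its messages, so the model can read the tool result and continue.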
Security usage guide
@perhapxin/dddk providers are endpoint-neutral — the package itself doesn't pick a side, but how you wire it and where the key lives decides whether you ship safely.
BYOK modes, why .env is not safe, a production backend proxy example (Cloudflare Worker / Next.js / Express), and the four things every server proxy must do all live in security. Read that page first, then come back to pick a provider.
OpenAI Provider
import { OpenAIProvider } from '@perhapxin/dddk';
const llm = new OpenAIProvider({
  apiKey: 'sk-...',
  model: 'gpt-5.5', // default 'gpt-5.4-mini'
  // Optional fields:
  baseURL: 'https://api.openai.com/v1', // override for self-hosted / reverse proxy
  // organization: string,
  // headers: Record<string, string>, // extra request headers
  // extraBody: Record<string, unknown>, // vendor-specific request fields (see below)
});
Works with any OpenAI-compatible endpoint — Azure OpenAI, OpenRouter, Cloudflare AI Gateway, and OpenAI-compatible vendors like DeepSeek or Qwen via baseURL.
extraBody — vendor-specific knobs
The OpenAI chat-completions JSON shape is the de-facto lingua franca, but each vendor adds proprietary fields. extraBody is a flat object spread into every request body after the SDK's built-in fields (so it can also override temperature / max_tokens when needed). Use it for fields no other adapter would understand.
Example — DeepSeek's thinking toggle (DeepSeek v4-pro reasons by default; passing { type: 'disabled' } skips reasoning for lower TTFT):
const llm = new OpenAIProvider({
  apiKey: process.env.DEEPSEEK_KEY!,
  model: 'deepseek-v4-pro',
  baseURL: 'https://api.deepseek.com/v1',
  extraBody: {
    thinking: { type: 'disabled' },
  },
});
Other patterns the same hook covers:
- OpenRouter's transforms / route fields
- Self-hosted vLLM's chat_template_kwargs
- Any vendor's experimental flags during a private beta
The SDK doesn't validate extraBody; whatever you put in goes upstream verbatim. If the vendor rejects the body, the 4xx surfaces in the error thrown by OpenAIProvider.complete.
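The "spread after the built-in fields" behavior can be pictured with a small sketch. buildRequestBody is a hypothetical stand-in, not the provider's actual code; it only shows why a key in extraBody wins over a built-in field of the same name.

```typescript
// Hypothetical request builder: built-in fields first, extraBody spread last.
function buildRequestBody(
  builtIn: Record<string, unknown>,
  extraBody: Record<string, unknown> = {},
): Record<string, unknown> {
  // Later spreads win, so extraBody can override built-in fields.
  return { ...builtIn, ...extraBody };
}

const body = buildRequestBody(
  { model: 'deepseek-v4-pro', temperature: 0.7, max_tokens: 1024 },
  { thinking: { type: 'disabled' }, temperature: 0 },
);
// body.temperature is now 0 (overridden by extraBody);
// body.max_tokens is untouched; body.thinking is passed through as-is.
```

Because the merge is shallow, a nested object in extraBody replaces a built-in object of the same key wholesale rather than being merged into it (in this sketch, at least).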
Special handling for gpt-5.x / o-series reasoning models
The new-generation reasoning models (gpt-5.x, o1, o3, o4) reject:
- max_tokens: use max_completion_tokens instead.
- Custom temperature: must use the default of 1.
The provider's internal isReasoningModel (in src/agent/llm/openai.ts) detects the model prefix and switches automatically:
| Model prefix | token field | temperature |
|---|---|---|
| gpt-5.x, o[1-9], gpt-1[0-9].x | max_completion_tokens | omitted (use default) |
| Anything else | max_tokens | applies opts.temperature ?? 0.7 |
The host doesn't need to worry about this: change the model string and the rest follows.
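A rough sketch of the detection described in the table follows. The regular expression and the samplingFields helper are assumptions written for illustration; the real isReasoningModel in src/agent/llm/openai.ts may match prefixes differently.

```typescript
// Assumed prefix check covering gpt-5.x, o[1-9], and gpt-1[0-9].x.
// The trailing group requires '.', '-', or end of string after the prefix,
// so 'gpt-4o' and 'gpt-50' do not match.
const REASONING_PREFIX = /^(?:gpt-5|gpt-1[0-9]|o[1-9])(?:[.-]|$)/;

function isReasoningModel(model: string): boolean {
  return REASONING_PREFIX.test(model);
}

// Hypothetical helper that picks the token field and temperature per the table.
function samplingFields(
  model: string,
  opts: { maxTokens?: number; temperature?: number },
): Record<string, number> {
  const fields: Record<string, number> = {};
  if (isReasoningModel(model)) {
    if (opts.maxTokens !== undefined) fields.max_completion_tokens = opts.maxTokens;
    // temperature is left out so the server default applies
  } else {
    if (opts.maxTokens !== undefined) fields.max_tokens = opts.maxTokens;
    fields.temperature = opts.temperature ?? 0.7;
  }
  return fields;
}
```

With this sketch, samplingFields('gpt-5.5', { maxTokens: 100, temperature: 0.2 }) yields only max_completion_tokens, while a non-reasoning model id gets max_tokens plus a temperature.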
Recommended models (mid 2026)
| Use | Model |
|---|---|
| Flagship | gpt-5.5 |
| Mid / cheap | gpt-5.4-mini |
| Cheapest / fastest | gpt-5.4-nano |
Google Provider (Gemini + Gemma share this)
import { GoogleProvider } from '@perhapxin/dddk';
const llm = new GoogleProvider({
  apiKey: '...', // Google AI Studio key
  model: 'gemini-3.1-pro-preview', // default
});
One Google AI Studio key calls both the Gemini and Gemma model families (only the model id differs). Internally we translate OpenAI-format tool definitions into Google function declarations, and translate Google's function-call responses back into OpenAI ToolCall shape.
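The two translations can be sketched roughly as below. This assumes the Gemini request shape tools[].functionDeclarations and response parts of the form { functionCall: { name, args } }. The function names and the call_N id scheme are hypothetical, chosen here only because the sketch needs some id to fill ToolCall.id; the package's adapter may do this differently.

```typescript
// Trimmed copies of the document's types; JSON Schema is loosely typed here.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

interface ToolCall {
  id: string;
  name: string;
  arguments: Record<string, any>;
}

// Assumed shape of a function-call part in a Gemini response.
interface GoogleFunctionCallPart {
  functionCall: { name: string; args?: Record<string, any> };
}

// OpenAI-style tool list -> Google `tools` payload.
function toGoogleTools(tools: ToolDefinition[]) {
  return [
    {
      functionDeclarations: tools.map((t) => ({
        name: t.name,
        description: t.description,
        parameters: t.parameters,
      })),
    },
  ];
}

// Google function-call parts -> ToolCall[].
// The id is synthesized from the part index (an assumption of this sketch).
function fromGoogleFunctionCalls(parts: GoogleFunctionCallPart[]): ToolCall[] {
  return parts.map((p, i) => ({
    id: `call_${i}`,
    name: p.functionCall.name,
    arguments: p.functionCall.args ?? {},
  }));
}
```

A real adapter may also need to drop JSON Schema keywords that Google's schema subset does not accept; this sketch passes parameters through unchanged.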
Recommended models (mid 2026)
| Use | Model |
|---|---|
| Flagship | gemini-3.1-pro-preview |
| Mid | gemini-2.5-pro |
| Fast / cheap | gemini-3.1-flash-lite-preview |
| Open-weight (Gemma) | gemma-4-31b-it or gemma-4-26b-a4b-it |
Thinking / Reasoning control
Reasoning models (OpenAI gpt-5.x, o-series; Google gemini-2.5+) let you tune how much they think. Short tasks (inline translate, improve, summarize a sentence) don't need reasoning — save latency and tokens. Long tasks (the main webagent loop) benefit from higher budgets.
await llm.complete({
  messages,
  thinking: 'off', // no reasoning
  // thinking: 'low' | 'medium' | 'high' | 'xhigh'
});
Internal mapping:
| Level | OpenAI (reasoning_effort) | Google (thinkingBudget tokens) |
|---|---|---|
| off | param OMITTED (server default = no reasoning) | 0 |
| low | low | 512 |
| medium | medium | 1024 |
| high | high | 4096 |
| xhigh | xhigh | (max) |
- OpenAI: only sends reasoning_effort when thinking !== 'off' AND the request has no tools. Two reasons:
  - 'minimal' was retired in 2026; passing it returns HTTP 400 Unsupported value. The current valid set is 'none' | 'low' | 'medium' | 'high' | 'xhigh'.
  - On /v1/chat/completions, reasoning_effort is INCOMPATIBLE with tools; the API rejects with "Function tools with reasoning_effort are not supported". Since the WebAgent loop always sends tools, omitting the param (server default) is functionally identical to "off" with zero conflict.
- Google: writes to generationConfig.thinkingConfig.thinkingBudget (Gemini 2.5) or thinkingLevel (Gemini 3.x / Gemma 4). Models that don't support it ignore the flag.
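The mapping rules above can be condensed into a short sketch. Both function names are hypothetical. Since the table gives no number for xhigh on Google (only "max"), the sketch takes the model's maximum budget as a caller-supplied parameter instead of guessing one.

```typescript
type Thinking = 'off' | 'low' | 'medium' | 'high' | 'xhigh';

// OpenAI: returns the reasoning_effort to send, or undefined to omit the param.
// The param is omitted when thinking is 'off' or when the request carries tools.
function openAIReasoningEffort(
  thinking: Thinking,
  hasTools: boolean,
): string | undefined {
  if (thinking === 'off' || hasTools) return undefined;
  return thinking;
}

// Google: thinkingBudget in tokens, following the table above.
// modelMaxBudget is a hypothetical parameter standing in for "(max)".
function googleThinkingBudget(thinking: Thinking, modelMaxBudget: number): number {
  const budgets: Record<Exclude<Thinking, 'xhigh'>, number> = {
    off: 0,
    low: 512,
    medium: 1024,
    high: 4096,
  };
  return thinking === 'xhigh' ? modelMaxBudget : budgets[thinking];
}
```

Note that in this sketch a request with tools never carries reasoning_effort, regardless of the requested level, which mirrors the chat-completions restriction described above.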
JSON mode
await llm.complete({
  messages,
  jsonMode: true,
});
- OpenAI → response_format: { type: 'json_object' }
- Google → generationConfig.responseMimeType: 'application/json'
You must say "output JSON" in the system or user prompt, otherwise OpenAI will reject the request. Pair with thinking: 'off' and temperature: 0 for "short task with stable structured output" — that's how inline-agent and immersive-translate use it.
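JSON mode constrains the output format, but the host still has to parse the string in content. A minimal sketch of the "short task with stable structured output" pattern follows. completeJson is a hypothetical helper, not a package export, and the interfaces are trimmed copies of the ones defined earlier.

```typescript
// Trimmed copies of the interfaces from this page.
interface LLMMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

interface CompleteOptions {
  messages: LLMMessage[];
  temperature?: number;
  thinking?: 'off' | 'low' | 'medium' | 'high' | 'xhigh';
  jsonMode?: boolean;
}

interface CompleteResult {
  content: string;
  finishReason: 'stop' | 'tool_calls' | 'length' | 'content_filter';
}

interface LLMProvider {
  readonly name: string;
  complete(opts: CompleteOptions): Promise<CompleteResult>;
}

// Hypothetical helper: request JSON and parse it, failing loudly on bad output.
async function completeJson<T>(llm: LLMProvider, messages: LLMMessage[]): Promise<T> {
  const result = await llm.complete({
    messages,
    jsonMode: true,
    thinking: 'off',
    temperature: 0,
  });
  // A truncated response is very likely to be invalid JSON; report it clearly.
  if (result.finishReason === 'length') {
    throw new Error('JSON output was truncated (finishReason = length)');
  }
  try {
    return JSON.parse(result.content) as T;
  } catch {
    throw new Error(`Model did not return valid JSON: ${result.content.slice(0, 80)}`);
  }
}
```

The cast to T is unchecked; if the shape matters, validate the parsed value before using it.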
Per-role model routing
You don't have to use the same model for everything. LLMRouter lets each role pick a different provider/model (flagship for the main agent loop, cheap one for inline short tasks). For the full role list, resolution rules, and production wiring example, see LLM router.
Image support
Both vendors handle vision; the API surface is the same:
const messages: LLMMessage[] = [{
  role: 'user',
  content: [
    { type: 'text', text: 'What is this image?' },
    { type: 'image', image: 'data:image/png;base64,...' },
  ],
}];
await llm.complete({ messages });
Format differences are handled inside each provider.
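To give a feel for what "handled inside each provider" could involve, here is a sketch of converting a ContentPart into each vendor's wire format. It assumes OpenAI's chat-completions image_url part and Gemini's inlineData part. The helper names are hypothetical, and the Google path only covers base64 data URLs in this sketch.

```typescript
interface ContentPart {
  type: 'text' | 'image';
  text?: string;
  image?: string; // URL or base64
}

// Split a base64 data URL into mime type and payload; null for anything else.
function parseDataUrl(url: string): { mimeType: string; data: string } | null {
  const m = /^data:([^;,]+);base64,(.*)$/.exec(url);
  return m ? { mimeType: m[1], data: m[2] } : null;
}

// OpenAI chat-completions content part: images travel as a URL (data URLs included).
function toOpenAIPart(p: ContentPart) {
  return p.type === 'text'
    ? { type: 'text', text: p.text ?? '' }
    : { type: 'image_url', image_url: { url: p.image ?? '' } };
}

// Gemini part: inline base64 with an explicit mime type.
function toGooglePart(p: ContentPart) {
  if (p.type === 'text') return { text: p.text ?? '' };
  const parsed = parseDataUrl(p.image ?? '');
  if (!parsed) {
    throw new Error('This sketch only handles base64 data URLs for Google');
  }
  return { inlineData: parsed };
}
```

The asymmetry is the point: one vendor takes the data URL whole, the other wants the mime type and payload separated, which is why the host-facing ContentPart stays vendor-neutral.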
Streaming
v1 does not support streaming (the agent is turn-based — every step needs a complete tool call before execution).
v2 will consider streaming for show_subtitle (letting the subtitle render character-by-character).
Writing your own provider
Implement complete() and you're done; the agent doesn't care how you get the result. To register a custom provider in the process-wide registry (so createProvider('your-id:model') resolves to it), or to wrap / replace one of the built-ins (OpenAI / Google / proxy), go through the adapter registry.
The full LLMAdapter interface, the registerAdapter / seedDefaultAdapters / createProvider API, plus examples for self-hosted vLLM / Bedrock / Vertex live in LLM adapters.
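As a minimal sketch of "implement complete() and you're done", the class below satisfies the LLMProvider interface with a canned reply, which can be handy in tests. CannedProvider is a hypothetical name, the interfaces are trimmed copies of the ones on this page, and registry wiring is left out because it is covered in LLM adapters.

```typescript
// Trimmed copies of the interfaces from this page.
interface LLMMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

interface CompleteOptions {
  messages: LLMMessage[];
  temperature?: number;
  maxTokens?: number;
}

interface CompleteResult {
  content: string;
  usage?: { promptTokens: number; completionTokens: number };
  finishReason: 'stop' | 'tool_calls' | 'length' | 'content_filter';
}

interface LLMProvider {
  readonly name: string;
  complete(opts: CompleteOptions): Promise<CompleteResult>;
}

// Hypothetical provider that ignores the prompt and returns a fixed reply.
class CannedProvider implements LLMProvider {
  readonly name = 'canned';
  private readonly reply: string;

  constructor(reply: string) {
    this.reply = reply;
  }

  async complete(opts: CompleteOptions): Promise<CompleteResult> {
    // A real provider would send opts.messages to its backend here.
    return {
      content: this.reply,
      usage: { promptTokens: opts.messages.length, completionTokens: 1 },
      finishReason: 'stop',
    };
  }
}
```

Anything that accepts an LLMProvider can take new CannedProvider('hello') in place of a network-backed provider; the usage numbers in this sketch are placeholders, not real token counts.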
Not supported in v1 (stated explicitly to save the question)
- ❌ Anthropic Claude (v2)
- ❌ Ollama / local models (v2)
- ❌ Embeddings (not part of webagent's scope)
- ❌ Fine-tune APIs (out of scope)
- ❌ Streaming (under consideration for v2)