LLM provider reference
For OpenClaw, NemoClaw, and other agent stacks: which models to use, where to get API access, and how to think about cost. Last updated April 2026. Pricing changes frequently; verify on provider sites before committing production spend.
1. Model comparison at a glance
USD per 1M tokens unless noted. Self-host rows have no standard per-token public rate. Agentic score is a 1–10 shorthand for tool-use reliability, multi-step reasoning, and ecosystem maturity for autonomous workflows (indicative only).
| Model | Input $ | Output $ | Context | Agentic | Best for |
|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | 1M | 9.8 | Flagship: long-horizon agents, hard coding, enterprise reasoning |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M | 9.3 | Best price/perf for production agents |
| GPT-4o | $2.50 | $10.00 | 128K | 9.0 | General agents, fast tool calls |
| Gemini 1.5 Pro | $1.25 | $5.00 | 2M | 8.5 | Huge-context document/video work |
| DeepSeek V3.2 | $0.28 | $0.42 | 128K | 8.5 | Cost-sensitive reasoning |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | 8.0 | Fast structured agents |
| Grok 4.20 | $2.00 | $6.00 | 2M | 8.0 | Large context, real-time data |
| Grok 4.1 Fast | $0.20 | $0.50 | 2M | 8.0 | Budget production agents |
| Claude 3 Haiku | $0.25 | $1.25 | 200K | 7.5 | High-throughput chat / extraction |
| GPT-4o-mini | $0.15 | $0.60 | 128K | 7.0 | Testing, simple agents, volume |
| Gemini 1.5 Flash | $0.075 | $0.30 | 1M | 7.0 | Cheap chat, summarisation |
| Llama 3.1 405B | self-host | self-host | 128K | 7.0 | Private/local, no per-token cost |
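As a sanity check on the table, a tiny helper (prices hardcoded from the rows above, indicative only) turns per-1M rates into per-request cost:

```python
# Indicative per-1M-token prices (USD) copied from the table above.
# Verify against each provider's pricing page before relying on them.
PRICES = {
    "claude-opus-4.6":   (5.00, 25.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "gpt-4o-mini":       (0.15, 0.60),
    "deepseek-v3.2":     (0.28, 0.42),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 50K-token agent turn with a 2K-token reply on Sonnet 4.6:
print(f"${request_cost('claude-sonnet-4.6', 50_000, 2_000):.4f}")  # → $0.1800
```

At agent scale these numbers compound fast: a thousand such turns is roughly $180/day on Sonnet 4.6 at the listed rates.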
2. Recommendations by use case
| Use case | Pick | Why |
|---|---|---|
| Mission-critical / long-horizon agent | Claude Opus 4.6 | Sustained reasoning across multi-hour sessions; 1M context at standard pricing. |
| Production agent (best value at frontier) | Claude Sonnet 4.6 | Roughly 60% the cost of Opus with 1M context; the sensible default for most agents. |
| Coding agent / dev tooling | Claude Opus 4.6 or GPT-4o | Opus for hard refactors and multi-file changes, GPT-4o when latency matters more than depth. |
| Long document / video analysis | Gemini 1.5 Pro or Sonnet 4.6 | Gemini for 2M context + native video; Sonnet 4.6 for stronger recall on text-heavy 1M prompts. |
| High-volume cheap automation | GPT-4o-mini or Gemini Flash | Sub-cent responses, fine for classify/extract. |
| Budget production agent | Grok 4.1 Fast or DeepSeek V3.2 | Near-frontier capability at fraction of premium pricing. |
| Private / on-prem deployment | Llama 3.1 via NVIDIA NIM | No data leaves the box; GPU required. |
| Real-time / social data | Grok 4.20 | Native X integration, low latency. |
3. Getting an API key
Providers use bearer-token style credentials. AgenticHosting does not collect LLM API keys in the dashboard—create keys in each provider console, then enter them when your stack prompts you on the VPS (OpenClaw / NemoClaw installer, scripts, or config on the server). Follow LLM setup for the step-by-step flow.
- Anthropic (Claude) — API keys; $5 minimum credit; key shown once. Console: console.anthropic.com.
- OpenAI (GPT) — Create API key; verify phone and add billing as required.
- Google (Gemini) — Quick path: Google AI Studio API key. Teams on GCP may use Google Cloud Console with Vertex AI and a service account JSON (see Vertex AI docs).
- xAI (Grok) — console.x.ai → API keys (check current free-credit offers on their site).
- DeepSeek — platform.deepseek.com → API keys.
- NVIDIA NIM — build.nvidia.com → API keys (prefix often nvapi-).
- Meta (Llama API) — llama.developer.meta.com (availability may vary by region), or self-host weights.
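Once keys exist, stacks typically read them from environment variables. A small sketch, assuming the conventional variable names and the key prefixes mentioned above; the prefix checks are heuristics that may drift, so treat a mismatch as a warning rather than a hard failure:

```python
import os

# Conventional env var names and indicative key prefixes (nvapi- is noted
# above; sk-ant-/sk- are current Anthropic/OpenAI conventions).
KEY_FORMATS = {
    "ANTHROPIC_API_KEY": "sk-ant-",
    "OPENAI_API_KEY": "sk-",
    "NVIDIA_API_KEY": "nvapi-",
}

def check_keys() -> list[str]:
    """Return warnings for missing or oddly-formatted provider keys."""
    warnings = []
    for var, prefix in KEY_FORMATS.items():
        value = os.environ.get(var)
        if not value:
            warnings.append(f"{var} is not set")
        elif not value.startswith(prefix):
            warnings.append(f"{var} does not start with '{prefix}' (check for paste errors)")
    return warnings

for w in check_keys():
    print("warning:", w)
```

Running a check like this at install time catches the most common support ticket: a key pasted with a stray space or the wrong provider's key in the wrong field.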
4. Cost optimisation
- Prompt caching — Anthropic, OpenAI, and DeepSeek cache repeated prefixes; large savings on system prompts and tool definitions. Cache your agent system prompt where supported.
- Batch APIs — OpenAI and Google offer discounted batch endpoints for non-realtime work.
- Model routing — Route cheap turns to Haiku / Flash / mini; escalate only on low confidence.
- Context pruning — Summarise old tool output instead of replaying full logs every step.
- Local preprocessing — Small local models can handle extraction before a paid frontier call (GPU helps).
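The model-routing idea above can be a few lines. In this sketch, `call_model` stands in for whatever wraps your provider SDK and returns an answer plus a confidence signal (logprobs or a model self-rating; how you derive it is up to you):

```python
# Confidence-based router: try a cheap model first, escalate to a frontier
# model only when confidence falls below a threshold. Model names are
# illustrative, taken from the comparison table above.
CHEAP, FRONTIER = "claude-haiku-4.5", "claude-sonnet-4.6"

def routed_answer(prompt: str, call_model, threshold: float = 0.8) -> str:
    """call_model(model, prompt) -> (answer, confidence in [0, 1])."""
    answer, confidence = call_model(CHEAP, prompt)
    if confidence >= threshold:
        return answer                    # cheap turn, no escalation
    answer, _ = call_model(FRONTIER, prompt)
    return answer                        # uncertain turn, escalated
```

If most turns clear the threshold, blended cost approaches the cheap model's rate while hard turns still get frontier quality.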
4b. Tuning Claude Opus 4.6 (effort + Fast mode)
Opus 4.6 exposes an effort setting (adaptive thinking) and an optional Fast mode. Thinking tokens are billed at output rates, so effort drives cost on hard prompts. Fast mode buys lower latency at 6× standard rates (e.g. $30/1M input and $150/1M output for Opus), applied to the whole request, including cached reads.
Example API parameters (JSON; indicative, verify parameter names against Anthropic's current docs):
```json
{
  "model": "claude-sonnet-4-6",
  "effort": "high"
}
```

| effort | Behaviour | When to use |
|---|---|---|
| low | Fast, minimal reasoning. | Routing, classification, short tool calls. |
| medium | Balanced; light reasoning when needed. | Standard chat and most agent tool steps. |
| high | Default. Autonomous extended thinking. | Multi-step reasoning, code review, planning. |
| max | Forces deep scrutiny—slowest, most costly. | Critical decisions, hard debugging, audits. |
Fast mode: use when a human is waiting (interactive coding, IDE assistants) or latency would otherwise break the workflow. Avoid for background agents and high-volume tool loops—the multiplier compounds quickly.
| Workload | Suggested config | Why |
|---|---|---|
| Interactive dev assistant | Opus 4.6, effort high, Fast on | User is watching the cursor. |
| Background autonomous agent | Sonnet 4.6, effort high, Fast off | Cheaper base; latency less important. |
| High-volume routing / classification | Sonnet 4.6 or Haiku 4.5, effort low | Avoid wasted reasoning on simple turns. |
| Hard one-shot reasoning | Opus 4.6, effort max, Fast off | Pay for depth, not speed. |
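The 6× Fast-mode multiplier quoted earlier is easy to sanity-check with the Opus rates from this section ($5/$25 standard, $30/$150 Fast, per 1M tokens):

```python
# Same Opus 4.6 request priced at standard vs Fast-mode rates (USD per
# 1M tokens, as quoted in this section; verify before production use).
def opus_cost(input_tokens: int, output_tokens: int, fast: bool = False) -> float:
    in_rate, out_rate = (30.0, 150.0) if fast else (5.0, 25.0)
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

standard = opus_cost(20_000, 4_000)             # 0.20 USD
fast = opus_cost(20_000, 4_000, fast=True)      # 1.20 USD, exactly 6x
```

For an interactive session of a few dozen turns that premium is pocket change; for a background agent making thousands of tool calls a day, it is the difference between a $200 and a $1,200 daily bill.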
5. Practical notes for agentic workloads
- Verify current Anthropic billing and access terms in their console—agent frameworks typically use API keys with per-token billing.
- Watch for optional Claude modifiers (e.g. Fast mode, regional inference multipliers) documented on Anthropic's pricing page.
- Google Cloud: enable billing controls and spending caps before pointing autonomous agents at Vertex.
- Long context (Grok, Gemini) still scales cost with input size—big windows are not free context.
- NVIDIA NIM is a common path for local inference on your own hardware; validate rate limits against your tier.
- Implement retry with backoff for 429s—new accounts often hit rate limits early.
- Check each provider's supported regions and terms; availability varies by country.
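The retry advice above can be implemented as a small wrapper. This assumes your SDK surfaces throttling as an exception; `RateLimitError` here is a stand-in for the provider's own class (e.g. the 429-mapped error in whichever SDK you use):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for your SDK's 429/throttling exception."""

def with_backoff(send, max_retries: int = 5, base: float = 1.0):
    """Call send() with exponential backoff and jitter on rate limits.

    send: zero-argument callable that performs one API request.
    Delays grow as base * 2**attempt (1s, 2s, 4s, ... by default).
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries; surface the error to the caller
            time.sleep(base * (2 ** attempt) + random.uniform(0, base / 2))
```

The jitter term matters when many agent workers share one key: without it, throttled workers retry in lockstep and hit the limit again together.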
6. Official pricing and docs
| Provider | Pricing | Docs |
|---|---|---|
| Anthropic | Pricing | Documentation |
| OpenAI | Pricing | Documentation |
| Google Vertex | Pricing | Documentation |
| xAI | Pricing | Documentation |
| DeepSeek | Pricing | Documentation |
| NVIDIA NIM | Pricing | Documentation |
| Meta Llama | Pricing | Documentation |
Disclaimer: prices and capabilities change frequently. Confirm against each vendor before committing production spend.