Key insight: Most people who run out of credits aren't using OpenCode "too much" — they're using it inefficiently. Picking the right model for the task and reusing past work typically cuts consumption by 50% or more, with no loss in output quality.
Top 5 levers — highest impact first
01 Right tool for the job

Agentic apps cost credits even for questions a free chat tool handles better. Start with Circuit or Claude.ai; save OpenCode for actual code work.

02 Model choice

One Opus 4.7 prompt costs roughly as much as five Sonnet 4.5 prompts. Default to cheap (Haiku or a free model) and escalate only when the output isn't good enough.

03 Prompt caching

In a well-structured session, 93%+ of each request is served from cache at ~one-tenth the cost. Switching models or going idle for more than ~5 minutes busts it.

04 Session habits

One task per session. Stable context up front. Cluster work in time — five scattered 10-minute sessions cost more than one focused 50-minute block.

05 Plan before you Build

Scope a task in Plan mode with a cheap model first, then execute in Build mode. One AI credit draw up front instead of ten back-and-forth draws.


How billing works

As of June 1, 2026, GitHub Copilot uses consumption-based billing measured in GitHub AI Credits (1 credit = $0.01 USD), pooled across the organisation. Cost varies significantly by model — see the table below.

⚠️ June 2026 billing update.

A $1,000 per-user monthly spending cap is now enforced. If you hit it, your Copilot access is halted until the next billing cycle — even if the org pool still has capacity. A promotional period (June 1 – September 1) provides higher included credits while teams baseline their usage — after that, standard per-seat amounts apply.
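The billing arithmetic above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the only figures taken from this guide are the 1 credit = $0.01 rate and the $1,000 per-user cap.

```python
# Figures taken from this guide; everything else here is illustrative.
CREDITS_PER_USD = 100        # 1 AI Credit = $0.01 USD
USER_CAP_USD = 1_000         # per-user monthly spending cap


def credits_to_usd(credits: float) -> float:
    """Convert a number of AI Credits to US dollars."""
    return credits / CREDITS_PER_USD


def cap_in_credits() -> int:
    """Express the per-user monthly cap as a credit count."""
    return USER_CAP_USD * CREDITS_PER_USD


def share_of_cap_used(credits_used: float) -> float:
    """Fraction of the per-user cap consumed so far (1.0 means the cap is hit)."""
    return credits_to_usd(credits_used) / USER_CAP_USD


if __name__ == "__main__":
    print(cap_in_credits())             # 100000 credits
    print(credits_to_usd(2_500))        # 25.0 dollars
    print(share_of_cap_used(25_000))    # 0.25 of the cap
```

In other words, the per-user cap corresponds to 100,000 credits, and someone who has drawn 25,000 credits is a quarter of the way there.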

Use the right tool for the job

Before thinking about which model to use, ask a more fundamental question: does this task actually need an agentic app at all? Agentic tools like OpenCode, VS Code Copilot, and Claude Code are powerful — but they're optimised for writing and editing code in a repo. A lot of daily AI work doesn't need that.

If you're researching a topic, summarising a document, brainstorming an approach, or just asking a question — a free chat tool like Circuit, ChatGPT, or Claude.ai handles all of that with zero impact on your Copilot credits. The same 1,000-token conversation that costs nothing in a chat tool draws down your allowance in an agentic app.

Use a free chat tool when…

  • Researching a topic or technology
  • Summarising a document or meeting notes
  • Brainstorming or exploring an idea
  • Asking a one-off question with no code involved
  • Comparing options or drafting a plan
  • Long-context Q&A (Circuit supports 1M token context)

No Copilot credits consumed.

Use an agentic app when…

  • Writing or editing code in a repo
  • Running multi-file refactoring or test generation
  • Executing shell commands or build steps
  • Working autonomously across a codebase
  • Using tools, MCPs, or file operations

Copilot credits consumed — use them deliberately.

Practical tip: If you're about to open an agentic app to figure out your approach to a task, consider drafting that thinking in a free chat tool first. Arrive at the agentic session knowing what you want — don't use it to think out loud.

Use the right model for the job

The single biggest lever you have is model choice. The same prompt, sent to two different models, can cost anywhere from zero to roughly 5× a standard request, as the table below shows. Knowing which bucket each model falls into is most of the battle.

How to change models in OpenCode: The current model is shown in the bottom status bar. Click it to switch at any point — the new model takes effect on your next message. To control which models appear in that list, go to Settings → Models and toggle on the ones you want.
Recommended starter set — three models to enable: The list can get overwhelming. A simple starting point is to enable just these three and turn the rest off for now — you can always come back and add more later.
  1. Claude Haiku 4.5 (Lightweight, ~0.2×) — your default for most things. Drafting, quick questions, summarisation, simple tasks. Start here every session.
  2. Claude Sonnet 4.5 / 4.6 (Versatile, ~1×) — step up to this when you need sharper reasoning, careful analysis, or real work.
  3. Claude Opus 4.7 / 4.8 (Powerful, ~5×) — for genuinely hard problems only: architectural decisions, complex multi-step work, or when Sonnet hasn't been enough. Use deliberately, not by default.

Note: the AI model landscape changes fast — new models appear and pricing shifts regularly. Treat this as a starting point, not a permanent recommendation.

GitHub publishes per-token pricing for all models. Here's the current picture grouped by category, with approximate relative cost to help with day-to-day decisions:

Category | Approx. relative cost | Models | Best for
Included / Free | 0× (unlimited) | GPT-4.1, GPT-5 mini, Raptor mini (preview) | Drafting, reformatting, lookups, anything routine. Start here.
Lightweight | ~0.1× – 0.3× | Haiku 4.5, GPT-5.4 nano, GPT-5.4 mini, Gemini 3 Flash (preview) | Fast iteration, simple code edits, summarisation, classification.
Versatile | ~0.5× – 1× | Sonnet 4, Sonnet 4.5, Sonnet 4.6, GPT-5.2, GPT-5.2-Codex, GPT-5.3-Codex, Gemini 2.5 Pro, Gemini 3.1 Pro, Gemini 3.5 Flash | The default for most real work. Strong reasoning at a sensible cost. Note: GPT-5.4 handles similar tasks but costs ~1.5× more than GPT-5.2, so it is listed in the heavier tier.
Versatile (heavier) | ~1.5× – 3× | GPT-5.4, GPT-5.5 | Complex tasks requiring stronger reasoning. GPT-5.5 in particular carries a high output token cost, so use it when GPT-5.2/Sonnet isn't cutting it.
Powerful | ~5× | Opus 4.5, Opus 4.6, Opus 4.7, Opus 4.8 | Genuinely hard reasoning, architectural decisions, long-context analysis. All Opus 4.x variants are now similarly priced, so pick the latest available.

Source: GitHub Copilot — models and pricing. Last verified: June 2026. Relative cost figures are approximate — actual cost depends on token count, not just model tier. Always check the official page for current per-token rates.

The Powerful tier rule of thumb: Before reaching for any Opus 4.x model, ask yourself "would a few Sonnet prompts solve this faster?" Opus 4.5, 4.6, 4.7, and 4.8 are now all similarly priced — so if you're going Powerful, use the latest available. Save it for the moments it genuinely earns its keep.
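To put rough numbers on the "start cheap, escalate only if needed" rule, here is a minimal sketch. It uses the approximate multipliers from the starter set above (Haiku ~0.2×, Sonnet ~1×, Opus ~5×) and assumes every prompt uses a similar number of tokens, which will not hold exactly in practice. The function names and the dictionary are hypothetical.

```python
# Approximate relative costs from the starter set in this guide.
RELATIVE_COST = {"haiku": 0.2, "sonnet": 1.0, "opus": 5.0}


def path_cost(models: list[str]) -> float:
    """Total relative cost of sending one prompt to each model in turn."""
    return sum(RELATIVE_COST[m] for m in models)


def prompts_per_one(expensive: str, cheap: str) -> float:
    """How many prompts on the cheap model cost the same as one on the expensive model."""
    return RELATIVE_COST[expensive] / RELATIVE_COST[cheap]


if __name__ == "__main__":
    print(path_cost(["haiku", "sonnet"]))          # try Haiku, then escalate to Sonnet
    print(path_cost(["opus"]))                     # go straight to Opus
    print(prompts_per_one("opus", "sonnet"))       # Sonnet prompts per Opus prompt
```

On these assumptions, trying Haiku and then Sonnet costs about 1.2× in total, against 5× for going straight to Opus. Even a full Haiku → Sonnet → Opus escalation comes to about 6.2×, so the penalty for trying the cheaper models first is small compared with the saving when one of them is good enough.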

How you run a session matters as much as which model you pick

Once you're in an agentic app, the habits you bring to each session have a significant effect on cost — often as much as model choice itself. Here's why: every message you send re-transmits the entire conversation history to the model, not just your latest prompt. A session that started small grows with every turn — and that accumulated context is re-sent in full each time. The way you structure a session determines how fast that context bloats, how many requests you send, and how efficiently the underlying model can reuse work it's already done.
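A small sketch shows why re-sending the full history matters. It assumes, purely for illustration, that every turn adds the same number of tokens to the conversation and it ignores prompt caching. The function name and the 500-token figure are hypothetical.

```python
def total_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens sent over a session when every message
    re-sends the full history. Assumes each turn adds the same amount."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn   # the new turn joins the history
        total += history             # the whole history goes out with this message
    return total


if __name__ == "__main__":
    one_long = total_input_tokens(turns=20, tokens_per_turn=500)
    two_short = 2 * total_input_tokens(turns=10, tokens_per_turn=500)
    print(one_long)    # 105000
    print(two_short)   # 55000
```

Under these assumptions, two 10-turn sessions send roughly half the input tokens of a single 20-turn session covering the same work, because total input grows with the square of the number of turns rather than linearly.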

Session habits that save credits

  • One task per session. Start a session for a specific task, finish it, then start fresh. Accumulated context from unrelated work gets re-sent on every message.
  • Pick one model and stay with it. Switching models mid-session invalidates the prompt cache, so the next request is billed at the full rate. Decide upfront and commit.
  • Put stable context up front. Background information, instructions, and constraints should come at the start of a session — not buried mid-conversation where they shift on every turn.
  • Edit instruction files before you start. If you use a copilot-instructions.md or similar config, update it before opening the session — not while it's running.
  • Stay active within a session. Idle gaps of more than ~5 minutes let the prompt cache expire, so your next message is processed at full cost. Keep working or close the session.
  • Cluster work in time. Five scattered 10-minute sessions perform worse than one focused 50-minute block — each new session starts cold and has to rebuild cache from scratch.
  • Know what you want before you start. Use a free chat tool to think through the approach first. Arrive at the agentic session with a clear task, not a question.
  • Compact your context when a task is done. In OpenCode, run /compact at the end of a task to summarise the session history before starting the next one. This keeps context lean without losing the thread.

Session habits that waste credits

  • Leaving sessions open for days. Every new message re-sends the full conversation history, even the parts from three tasks ago. A session you've been using all day adds thousands of tokens of old context to every prompt, even when it's no longer relevant.
  • Switching models mid-session. The prompt cache does not carry over, so you lose the efficiency gains built up so far and start cold on the new model.
  • Thinking out loud in the agentic app. Exploratory back-and-forth ("what if I did X… actually no, what about Y…") is expensive here. Do that in a free chat tool.
  • Editing config mid-session. Changing instructions or context files while a session is running invalidates the cached prefix on the next request.
  • Returning to a cold idle session. A session left idle for more than ~5 minutes has lost its cache. Closing and starting fresh is usually more efficient.
Prompt caching — the biggest single lever after model choice. VS Code 1.118+ (April 29, 2026) and other Copilot surfaces implement prompt caching: when the prefix of your request matches a recent one, it's served from cache at roughly one-tenth the cost of a full request. In an active, well-structured session, more than 93% of each request is served from cache. The session habits above are exactly what keeps that cache alive.

What busts the cache: switching models mid-session, editing your instruction or rules files while a session is running, or going idle for more than ~5 minutes.

Cache hit rate targets:
Hit rate | What it looks like
15–25% | Constant context-switching, many short unrelated sessions
30–50% | Mixed workflow, some focused blocks
50–70% | Focused, session-based work on a stable codebase (the target)
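To see what a given hit rate is worth, the sketch below blends full-price and cached input. It assumes cached input costs one-tenth of uncached input, as stated above, and it covers input tokens only; output tokens are not modelled. The function name is hypothetical.

```python
CACHED_COST_FACTOR = 0.1   # from this guide: cached input costs roughly one-tenth


def effective_input_cost(hit_rate: float, cached_factor: float = CACHED_COST_FACTOR) -> float:
    """Relative cost of input tokens at a given cache hit rate.

    1.0 means no caching at all; lower is cheaper.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    uncached_share = 1.0 - hit_rate
    return uncached_share + hit_rate * cached_factor


if __name__ == "__main__":
    for rate in (0.20, 0.50, 0.70, 0.93):
        print(f"{rate:.0%} hit rate -> {effective_input_cost(rate):.3f} of full input cost")
```

On these assumptions, a 20% hit rate still pays about 0.82 of the uncached input cost, 50% pays about 0.55, and 93% pays about 0.16. That gap is the saving the session habits above are protecting.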

Plan vs Build — and what it costs

At the bottom of the OpenCode interface you'll see two mode options: Plan and Build. They look similar but behave very differently — and the difference has a direct impact on your token and AI credit usage.

Plan mode

  • Read-only. OpenCode can explore your files, reason about the problem, and produce a plan — but it cannot write, edit, or run anything.
  • Lower token usage per prompt. Because no tool calls are executed, the context stays lighter. Good for scoping, understanding, and decision-making.
  • Low AI credit cost. Using /plan or staying in Plan mode for a task costs a single credit draw up front, regardless of how much the model reads.
  • Best for: thinking through an approach before committing, reviewing large codebases, drafting a structure you'll then hand off.

Build mode

  • Full autonomy. OpenCode can read, write, edit files, run shell commands, call APIs, and loop through multi-step tasks without asking for confirmation at each step.
  • Higher token accumulation. Each tool call (file read, command run, search) adds output back into the context window. Long agentic runs can build up significant context — and re-send it on every subsequent prompt.
  • Multiple AI credit draws. Every message you send in Build mode draws AI credits. A back-and-forth debugging session of ten messages at 1× each costs ten standard draws.
  • Best for: execution once you know what you want. Use Plan to figure out the approach, then switch to Build to carry it out.
The smart pattern: Start in Plan mode with a free or balanced model to scope the task and agree an approach. Then switch to Build mode — still on a balanced model — to execute. Reserve heavy models for the specific step that needs them, not the whole session. This single habit can cut a complex task from 20+ AI credit draws down to 5–8.
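The Plan → Build pattern can be compared with an unplanned session using a rough count of draws. The message counts and multipliers below are hypothetical, chosen only to match the ranges quoted in this guide; a real session's cost also depends on token volume.

```python
def session_draws(messages: list[tuple[str, float]]) -> float:
    """Sum the relative cost of each message in a session.

    Each message is (mode, model_multiplier); the mode is only a label here.
    """
    return sum(multiplier for _mode, multiplier in messages)


# Unplanned: 20 exploratory Build messages on a 1x model (hypothetical counts).
unplanned = [("build", 1.0)] * 20

# Planned: one Plan message on a 0.2x model, then 6 Build messages on a 1x model.
planned = [("plan", 0.2)] + [("build", 1.0)] * 6

if __name__ == "__main__":
    print(session_draws(unplanned))           # 20.0
    print(round(session_draws(planned), 1))   # 6.2
```

With these illustrative numbers the planned session comes to about 6.2 standard draws against 20 for the unplanned one, which sits inside the 5–8 range quoted above.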

Cheap habits vs expensive habits

Once you can see your usage and you're choosing models deliberately, the next layer of savings comes from how you structure each session. Same outcome, very different cost:

Cheap habits

  • Reuse past outputs. Point OpenCode at a previous transcript, context pack, or saved skill instead of asking it to re-derive what you already have.
  • Be specific about output. "Give me a 3-bullet exec summary" produces less text — and costs less — than "tell me about X".
  • Start cheap, escalate only if needed. Begin with a free or balanced model. Only switch to a heavier one if the first answer isn't good enough.
  • End sessions when you switch tasks. A fresh session has zero accumulated context to re-send on every message.
  • Use Plan mode for big work. Low AI credit cost up front, then execute the plan with cheaper models.
  • Trim attached files. Drop the 200-page PDF out of context once you've extracted what you need.
  • Point at specific files, not the whole repo. "Look at src/auth/login.ts" costs far less than "look at the codebase and find where login is handled." The more targeted your instruction, the less the agent has to scan — and the less context it accumulates.

Expensive habits

  • Defaulting to Opus 4.7 for everything. Running routine tasks on the heaviest model is the single fastest way to burn through your allowance.
  • Open-ended prompts. "Tell me everything about X" forces the model to produce — and you to pay for — output you didn't need.
  • Leaving sessions open for days. Every new message re-sends the full transcript, even the parts that are no longer relevant.
  • Re-asking instead of scrolling up. If you already got the answer earlier in the session, scroll. Don't pay to regenerate it.
  • Pasting huge files for tiny questions. If you only need one function from a 5,000-line file, paste the function.
  • "Just one more clarification…" loops. Five small follow-ups cost more than one well-formed prompt would have.
  • Re-adding a file that's already in context. If you attached a file earlier in the session, it's still there — attaching it again doubles the token cost without adding any information. Check what's already in context before adding more.

Where to see your usage

The fastest way to check your own consumption:

github.com/settings/copilot/features

This shows your AI Credits used this month, your remaining balance, and which models you've been using. Bookmark it: it's the single most useful page for managing your own usage. If the dashboard exposes a cache hit %, track it and aim for 50% or above. Anything below 30% is a signal that sessions are too fragmented or context is shifting too much between calls.

Personal soft-cap & shared responsibility. Under the AI Credits model, costs can accumulate quickly mid-month without you noticing. Define a soft-cap for yourself — something like "if I'm tracking above $X by mid-month, I'll push more routine work to a free model for the rest of the month." Check your pace weekly, not just at month-end. The team draws from the same credit pool, so a monthly glance at aggregate usage helps catch individual drift before it becomes a budget problem for everyone.
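A soft-cap check can be as simple as projecting your current pace to month-end. The sketch below assumes spending continues at the same average daily rate, which is a simplification, and the function names and dollar figures are hypothetical. You would read the spend figure off your usage page by hand.

```python
import calendar
from datetime import date


def projected_month_end_usd(spend_so_far_usd: float, today: date) -> float:
    """Linear projection: assume the rest of the month is spent at the same daily rate."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_so_far_usd / today.day * days_in_month


def over_soft_cap(spend_so_far_usd: float, today: date, soft_cap_usd: float) -> bool:
    """True if the current pace would end the month above the personal soft-cap."""
    return projected_month_end_usd(spend_so_far_usd, today) > soft_cap_usd


if __name__ == "__main__":
    check_day = date(2026, 6, 10)
    print(projected_month_end_usd(40.0, check_day))   # 120.0
    print(over_soft_cap(40.0, check_day, 100.0))      # True
```

Here $40 spent by 10 June projects to $120 for the month, which is over a $100 soft-cap, so that would be the moment to push routine work to a free model.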
⚠️ Real-time token counts (inside OpenCode itself)

Plugins like opencode-tokenscope and opencode-quota-sidebar can surface live usage in the status bar. Based on web research these look promising — but they have not been fully tested and have caused session instability in some cases. The GitHub usage page above is the recommended option for now.

A simple weekly check

Every Monday morning, take 60 seconds:

  1. Open your personal Copilot usage page — are you on pace for the month?
  2. Ask yourself: was there a task last week where I used a heavy model when a balanced one would have done? That's the optimisation for this week.
  3. Check whether you were mostly in Build mode when Plan mode would have been enough for scoping — that's often where hidden cost sits.

That's it. Tracking doesn't need to be elaborate to be useful.

Quick-start checklist

Copy this into your workflow notes or bookmark it:

What to do today

  1. Bookmark github.com/settings/copilot/features so you can check your usage in two clicks.
  2. Make a conscious model choice on your next session — pick the cheapest model that can plausibly do the job, and only escalate if it can't.
  3. Try the Plan → Build pattern on your next multi-step task. Scope in Plan mode, execute in Build mode, and notice the difference in how many messages it takes.
  4. Next time you're about to open an agentic app to research something or think through an approach — open a free chat tool instead. Save the agentic session for when you know what you want to build.