01 Right tool: Agentic apps cost credits even for questions a free chat tool handles better. Start with Circuit or Claude.ai; save OpenCode for actual code work.
02 Model choice: One Opus 4.7 prompt costs the same as 15 Sonnet 4.5 prompts. Default to cheap (Haiku or a free model) and escalate only when the output isn't good enough.
03 Prompt caching: In a well-structured session, 93%+ of each request is served from cache at ~one-tenth the cost. Switching models or going idle busts it instantly.
04 Session habits: One task per session. Stable context up front. Cluster work in time: five scattered 10-minute sessions cost more than one focused 50-minute block.
05 Plan before you Build: Scope a task in Plan mode with a cheap model first, then execute in Build mode. One AI credit draw up front instead of ten back-and-forth messages.
How billing works
As of June 1, 2026, GitHub Copilot uses consumption-based billing measured in GitHub AI Credits (1 credit = $0.01 USD), pooled across the organisation. Cost varies significantly by model — see the table below.
A $1,000 per-user monthly spending cap is now enforced. If you hit it, your Copilot access is halted until the next billing cycle — even if the org pool still has capacity. A promotional period (June 1 – September 1) provides higher included credits while teams baseline their usage — after that, standard per-seat amounts apply.
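The billing rules above reduce to simple arithmetic. Here is a minimal sketch that assumes exactly the figures stated in this guide (1 credit = $0.01, a $1,000 per-user monthly cap). The helper names are hypothetical and are not part of any GitHub API. Working in whole credits avoids floating-point rounding when comparing against the cap.

```python
# Assumed figures from this guide: 1 GitHub AI Credit = $0.01, $1,000 per-user monthly cap.
# Hypothetical helpers for back-of-envelope checks; not a GitHub API.
CREDITS_PER_USD = 100
USER_CAP_USD = 1_000


def credits_to_usd(credits: int) -> float:
    """Convert a credit count to US dollars."""
    return credits / CREDITS_PER_USD


def cap_in_credits(cap_usd: int = USER_CAP_USD) -> int:
    """The per-user cap expressed in credits."""
    return cap_usd * CREDITS_PER_USD


def is_halted(user_credits_used: int, cap_usd: int = USER_CAP_USD) -> bool:
    """True once your own spend reaches the cap, even if the org pool has capacity left."""
    return user_credits_used >= cap_in_credits(cap_usd)


print(cap_in_credits())                       # 100000
print(credits_to_usd(2_500))                  # 25.0
print(is_halted(99_999), is_halted(100_000))  # False True
```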
Use the right tool for the job
Before thinking about which model to use, ask a more fundamental question: does this task actually need an agentic app at all? Agentic tools like OpenCode, VS Code Copilot, and Claude Code are powerful — but they're optimised for writing and editing code in a repo. A lot of daily AI work doesn't need that.
If you're researching a topic, summarising a document, brainstorming an approach, or just asking a question — a free chat tool like Circuit, ChatGPT, or Claude.ai handles all of that with zero impact on your Copilot credits. The same 1,000-token conversation that costs nothing in a chat tool draws down your allowance in an agentic app.
Use a free chat tool when…
- Researching a topic or technology
- Summarising a document or meeting notes
- Brainstorming or exploring an idea
- Asking a one-off question with no code involved
- Comparing options or drafting a plan
- Long-context Q&A (Circuit supports 1M token context)
No Copilot credits consumed.
Use an agentic app when…
- Writing or editing code in a repo
- Running multi-file refactoring or test generation
- Executing shell commands or build steps
- Working autonomously across a codebase
- Using tools, MCPs, or file operations
Copilot credits consumed — use them deliberately.
Use the right model for the job
The single biggest lever you have is model choice. The same prompt, sent to two different models, can cost anywhere from zero to 30× a standard request. Knowing which bucket each model falls into is most of the battle.
- Claude Haiku 4.5 (Lightweight, ~0.2×) — your default for most things. Drafting, quick questions, summarisation, simple tasks. Start here every session.
- Claude Sonnet 4.5 / 4.6 (Versatile, ~1×) — step up to this when you need sharper reasoning, careful analysis, or real work.
- Claude Opus 4.7 / 4.8 (Powerful, ~5×) — for genuinely hard problems only: architectural decisions, complex multi-step work, or when Sonnet hasn't been enough. Use deliberately, not by default.
Note: the AI model landscape changes fast — new models appear and pricing shifts regularly. Treat this as a starting point, not a permanent recommendation.
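The "start cheap, escalate only if needed" rule is effectively a loop over models ordered by cost. Here is a minimal sketch using the approximate multipliers from the list above. `run` and `good_enough` are hypothetical callbacks: the first stands in for however you send a prompt in your tool, and the second stands in for your own judgement. The model names are labels, not API identifiers.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical ladder built from the approximate multipliers above; not real IDs or prices.
LADDER: List[Tuple[str, float]] = [
    ("Claude Haiku 4.5", 0.2),
    ("Claude Sonnet 4.5", 1.0),
    ("Claude Opus 4.7", 5.0),
]


def escalate(
    prompt: str,
    run: Callable[[str, str], str],      # hypothetical: (model, prompt) -> answer
    good_enough: Callable[[str], bool],  # your judgement of the answer
    ladder: List[Tuple[str, float]] = LADDER,
) -> Tuple[Optional[str], str, float]:
    """Try models cheapest-first and stop at the first acceptable answer.

    Returns (answer, last model tried, total relative cost spent).
    """
    answer: Optional[str] = None
    model = ""
    spent = 0.0
    for model, multiplier in ladder:
        answer = run(model, prompt)
        spent += multiplier
        if good_enough(answer):
            break  # no need to pay for a heavier model
    return answer, model, spent


# Usage with a fake runner: Haiku's answer is rejected, Sonnet's is accepted.
fake_run = lambda model, prompt: "fixed" if "Sonnet" in model else "vague"
print(escalate("Why is this test flaky?", fake_run, lambda a: a == "fixed"))
# ('fixed', 'Claude Sonnet 4.5', 1.2)
```

Escalating from Haiku to Sonnet costs about 1.2× in this model. Starting on Opus would cost about 5× even when Sonnet would have been enough.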
GitHub publishes per-token pricing for all models. Here's the current picture grouped by category, with approximate relative cost to help with day-to-day decisions:
| Category | Approx. relative cost | Models | Best for |
|---|---|---|---|
| Included / Free | 0× (unlimited) | GPT-4.1, GPT-5 mini, Raptor mini (preview) | Drafting, reformatting, lookups, anything routine. Start here. |
| Lightweight | ~0.1× – 0.3× | Haiku 4.5, GPT-5.4 nano, GPT-5.4 mini, Gemini 3 Flash (preview) | Fast iteration, simple code edits, summarisation, classification. |
| Versatile | ~0.5× – 1× | Sonnet 4, Sonnet 4.5, Sonnet 4.6, GPT-5.2, GPT-5.2-Codex, GPT-5.3-Codex, Gemini 2.5 Pro, Gemini 3.1 Pro, Gemini 3.5 Flash | The default for most real work. Strong reasoning at a sensible cost. Note: GPT-5.4 costs ~1.5× more than GPT-5.2 for similar tasks, which is why it is listed under Versatile (heavier) rather than here. |
| Versatile (heavier) | ~1.5× – 3× | GPT-5.4, GPT-5.5 | Complex tasks requiring stronger reasoning. GPT-5.5 in particular carries a high output token cost — use it when GPT-5.2/Sonnet isn't cutting it. |
| Powerful | ~5× | Opus 4.5, Opus 4.6, Opus 4.7, Opus 4.8 | Genuinely hard reasoning, architectural decisions, long-context analysis. All Opus 4.x variants are now similarly priced — pick the latest available. |
Source: GitHub Copilot — models and pricing. Last verified: June 2026. Relative cost figures are approximate — actual cost depends on token count, not just model tier. Always check the official page for current per-token rates.
How you run a session matters as much as which model you pick
Once you're in an agentic app, the habits you bring to each session have a significant effect on cost — often as much as model choice itself. Here's why: every message you send re-transmits the entire conversation history to the model, not just your latest prompt. A session that started small grows with every turn — and that accumulated context is re-sent in full each time. The way you structure a session determines how fast that context bloats, how many requests you send, and how efficiently the underlying model can reuse work it's already done.
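Because every request re-sends the whole history, the total number of tokens transmitted grows much faster than the conversation itself. Here is a minimal sketch with illustrative numbers (ten turns of ~500 tokens each). It treats each turn's prompt and reply as one block that is re-sent on every later request. That is a simplification: real tools also add system prompts and tool output.

```python
from itertools import accumulate
from typing import List


def tokens_sent_per_request(turn_sizes: List[int]) -> List[int]:
    """Request n carries turns 1..n in full, so its size is a running total."""
    return list(accumulate(turn_sizes))


turns = [500] * 10  # illustrative: ten turns of ~500 tokens each
per_request = tokens_sent_per_request(turns)
print(per_request[-1])    # 5000: size of the final request alone
print(sum(per_request))   # 27500: total transmitted across the session
```

A 5,000-token conversation ends up transmitting 27,500 tokens. This is why starting fresh for each task pays off.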
Session habits that save credits
- One task per session. Start a session for a specific task, finish it, then start fresh. Accumulated context from unrelated work gets re-sent on every message.
- Pick one model and stay with it. Switching models mid-session resets any efficiency the tool has built up. Decide upfront and commit.
- Put stable context up front. Background information, instructions, and constraints should come at the start of a session — not buried mid-conversation where they shift on every turn.
- Edit instruction files before you start. If you use a `copilot-instructions.md` or similar config, update it before opening the session, not while it's running.
- Stay active within a session. Long idle gaps between messages mean the tool has to re-establish context when you return. Keep working or close the session.
- Cluster work in time. Five scattered 10-minute sessions perform worse than one focused 50-minute block — each new session starts cold and has to rebuild cache from scratch.
- Know what you want before you start. Use a free chat tool to think through the approach first. Arrive at the agentic session with a clear task, not a question.
- Compact your context when a task is done. In OpenCode, run `/compact` at the end of a task to summarise the session history before starting the next one. This keeps context lean without losing the thread.
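"Stable context up front" matters because, assuming the provider caches an exact leading prefix of the request (as prompt caching generally works), only the identical opening portion of consecutive requests can be reused. Here is a minimal sketch that models a request as a list of blocks rather than tokens, which is a simplification. `build_request` and the sample strings are hypothetical.

```python
from typing import List


def build_request(rules: str, files: str, question: str, stable_first: bool = True) -> List[str]:
    """Assemble a request: stable blocks first and today's question last, or the reverse."""
    return [rules, files, question] if stable_first else [question, rules, files]


def shared_prefix_blocks(a: List[str], b: List[str]) -> int:
    """Count leading blocks that are identical; only these can be served from a prefix cache."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


rules = "Project rules: use TypeScript strict mode."  # hypothetical content
files = "<contents of src/auth/login.ts>"             # hypothetical content
q1, q2 = "Why does login fail?", "Add a unit test for login."

print(shared_prefix_blocks(build_request(rules, files, q1), build_request(rules, files, q2)))  # 2
print(shared_prefix_blocks(build_request(rules, files, q1, False),
                           build_request(rules, files, q2, False)))                           # 0
# Editing the rules mid-session changes the very first block, so nothing is shared.
print(shared_prefix_blocks(build_request(rules, files, q1),
                           build_request(rules + " Also lint.", files, q2)))                  # 0
```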
Session habits that waste credits
- Leaving sessions open for days. Every new message re-sends the full accumulated context — even the parts from three tasks ago.
- Switching models mid-session. You lose any efficiency gains the current model has built up and start cold on a new one.
- Thinking out loud in the agentic app. Exploratory back-and-forth ("what if I did X… actually no, what about Y…") is expensive here. Do that in a free chat tool.
- Editing config mid-session. Changing instructions or context files while a session is running forces a reset on the next request.
- Returning to a cold idle session. A session left idle for an extended period has lost its warm state. Closing and starting fresh is usually more efficient.
- Letting context accumulate across tasks. Every new message re-sends the full conversation history. A session you've been using all day is sending thousands of tokens of old context on every prompt — even when it's no longer relevant.
What busts the cache: switching models mid-session, editing your instruction or rules files while a session is running, or going idle for more than ~5 minutes.
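The three cache-busting events can be expressed as a single check between consecutive requests. Here is a minimal sketch. `RequestState`, `fingerprint` and `cache_still_warm` are hypothetical names, and the 5-minute threshold is the approximate figure quoted in this guide. Real cache lifetimes vary by provider, and your tool does not expose this logic directly.

```python
import hashlib
from dataclasses import dataclass

CACHE_TTL_SECONDS = 5 * 60  # "~5 minutes" idle threshold from this guide (approximate)


def fingerprint(text: str) -> str:
    """Hash an instructions file so that any edit is detectable."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class RequestState:
    model: str
    instructions_fp: str  # fingerprint of copilot-instructions.md or similar
    sent_at: float        # seconds since some fixed point


def cache_still_warm(prev: RequestState, nxt: RequestState, ttl: float = CACHE_TTL_SECONDS) -> bool:
    """Rough check for the three cache-busting events listed above."""
    if nxt.model != prev.model:
        return False  # a cache built for one model is not reused by another
    if nxt.instructions_fp != prev.instructions_fp:
        return False  # instructions edited mid-session changed the prefix
    return nxt.sent_at - prev.sent_at <= ttl  # idle too long: the entry has expired


rules = fingerprint("Use TypeScript strict mode.")  # hypothetical instructions text
first = RequestState("Claude Sonnet 4.5", rules, sent_at=0)
print(cache_still_warm(first, RequestState("Claude Sonnet 4.5", rules, 120)))  # True
print(cache_still_warm(first, RequestState("Claude Haiku 4.5", rules, 120)))   # False
print(cache_still_warm(first, RequestState("Claude Sonnet 4.5", rules, 900)))  # False
```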
Cache hit rate targets:
| Hit rate | What it looks like |
|---|---|
| 15–25% | Constant context-switching, many short unrelated sessions |
| 30–50% | Mixed workflow, some focused blocks |
| 50–70% | Focused, session-based work on a stable codebase — the target |
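To see why these targets matter, blend the cached and uncached prices. Here is a minimal sketch that assumes cached input tokens cost ~one-tenth of the normal rate, as stated at the top of this guide. It covers input tokens only and ignores any surcharge a provider may apply for writing to the cache.

```python
def effective_input_cost(hit_rate: float, cached_price: float = 0.1) -> float:
    """Relative input cost per token, where 1.0 is the uncached price."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached_price + (1.0 - hit_rate) * 1.0


for rate in (0.20, 0.60, 0.93):
    print(f"{rate:.0%} hit rate -> {effective_input_cost(rate):.2f}x")
# 20% hit rate -> 0.82x
# 60% hit rate -> 0.46x
# 93% hit rate -> 0.16x
```

Under these assumptions, moving from a fragmented workflow (~20%) to the 50–70% target roughly halves input cost.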
Plan vs Build — and what it costs
At the bottom of the OpenCode interface you'll see two mode options: Plan and Build. They look similar but behave very differently — and the difference has a direct impact on your token and AI credit usage.
Plan mode
- Read-only. OpenCode can explore your files, reason about the problem, and produce a plan — but it cannot write, edit, or run anything.
- Lower token usage per prompt. Because no tool calls are executed, the context stays lighter. Good for scoping, understanding, and decision-making.
- Low AI credit cost. Using `/plan` or staying in Plan mode for a task costs a single credit draw up front, regardless of how much the model reads.
- Best for: thinking through an approach before committing, reviewing large codebases, drafting a structure you'll then hand off.
Build mode
- Full autonomy. OpenCode can read, write, edit files, run shell commands, call APIs, and loop through multi-step tasks without asking for confirmation at each step.
- Higher token accumulation. Each tool call (file read, command run, search) adds output back into the context window. Long agentic runs can build up significant context — and re-send it on every subsequent prompt.
- Multiple AI credit draws. Every message you send in Build mode draws AI credits. A back-and-forth debugging session of ten messages at 1× each costs ten standard draws.
- Best for: execution once you know what you want. Use Plan to figure out the approach, then switch to Build to carry it out.
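The draw arithmetic above can be compared directly. Here is a minimal sketch that uses the approximate multipliers from the model table (1.0 = one standard draw). The split of one Haiku planning message followed by three Build messages is a hypothetical assumption. The sketch counts draws only and ignores the context growth each extra message adds, so the real gap is likely wider.

```python
from typing import List, Tuple

# Approximate relative multipliers from the model table; 1.0 = one standard draw.
HAIKU, SONNET = 0.2, 1.0


def session_cost(messages: List[Tuple[float, int]]) -> float:
    """Total relative cost of (multiplier, message_count) pairs."""
    return sum(multiplier * count for multiplier, count in messages)


back_and_forth = session_cost([(SONNET, 10)])              # ten Build-mode debugging messages
plan_then_build = session_cost([(HAIKU, 1), (SONNET, 3)])  # one Haiku plan, three Build messages (assumed)

print(f"{back_and_forth:.1f} vs {plan_then_build:.1f}")    # 10.0 vs 3.2
```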
Cheap habits vs expensive habits
Once you can see your usage and you're choosing models deliberately, the next layer of savings comes from how you structure each session. Same outcome, very different cost:
Cheap habits
- Reuse past outputs. Point OpenCode at a previous transcript, context pack, or saved skill instead of asking it to re-derive what you already have.
- Be specific about output. "Give me a 3-bullet exec summary" produces less text — and costs less — than "tell me about X".
- Start cheap, escalate only if needed. Begin with a free or balanced model. Only switch to a heavier one if the first answer isn't good enough.
- End sessions when you switch tasks. A fresh session has zero accumulated context to re-send on every message.
- Use Plan mode for big work. Low AI credit cost up front, then execute the plan with cheaper models.
- Trim attached files. Drop the 200-page PDF out of context once you've extracted what you need.
- Point at specific files, not the whole repo. "Look at `src/auth/login.ts`" costs far less than "look at the codebase and find where login is handled." The more targeted your instruction, the less the agent has to scan, and the less context it accumulates.
Expensive habits
- Defaulting to Opus 4.7 for everything. Running routine tasks on the heaviest model is the single fastest way to burn through your allowance.
- Open-ended prompts. "Tell me everything about X" forces the model to produce — and you to pay for — output you didn't need.
- Leaving sessions open for days. Every new message re-sends the full transcript, even the parts that are no longer relevant.
- Re-asking instead of scrolling up. If you already got the answer earlier in the session, scroll. Don't pay to regenerate it.
- Pasting huge files for tiny questions. If you only need one function from a 5,000-line file, paste the function.
- "Just one more clarification…" loops. Five small follow-ups cost more than one well-formed prompt would have.
- Re-adding a file that's already in context. If you attached a file earlier in the session, it's still there — attaching it again doubles the token cost without adding any information. Check what's already in context before adding more.
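For Python sources, one way to paste just the function you need is to use the standard library's `ast` module. `ast.get_source_segment` (Python 3.8+) returns the exact source text of a parsed node. Here is a minimal sketch. `extract_function` is a hypothetical helper name, it only handles Python files, and the returned text does not include decorators.

```python
import ast
from typing import Optional


def extract_function(source: str, name: str) -> Optional[str]:
    """Return the source of the first function called `name` (at any depth), or None."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            return ast.get_source_segment(source, node)
    return None


# Hypothetical module standing in for a large file.
sample = '''
import os

def helper():
    return os.getcwd()

def login(user, password):
    return user == "admin" and password == "secret"
'''
print(extract_function(sample, "login"))
```

In practice you would read the large file from disk and paste only the printed function into the session.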
Where to see your usage
The fastest way to check your own consumption:
github.com/settings/copilot/features
This shows your premium requests used this month, your remaining balance, and which models you've been using. Bookmark it — it's the single most useful page for managing your own usage. If the dashboard exposes a cache hit %, track it: aim for 50% or above. Anything below 30% is a signal that sessions are too fragmented or context is shifting too much between calls.
Plugins like opencode-tokenscope and opencode-quota-sidebar can surface live usage in the status bar. Based on web research these look promising — but they have not been fully tested and have caused session instability in some cases. The GitHub usage page above is the recommended option for now.
A simple weekly check
Every Monday morning, take 60 seconds:
- Open your personal Copilot usage page — are you on pace for the month?
- Ask yourself: was there a task last week where I used a heavy model when a balanced one would have done? That's the optimisation for this week.
- Check whether you were mostly in Build mode when Plan mode would have been enough for scoping — that's often where hidden cost sits.
That's it. Tracking doesn't need to be elaborate to be useful.
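"Are you on pace?" can be answered with a straight-line projection from what you've used so far. Here is a minimal sketch that assumes spend is roughly even across the month, which it rarely is exactly. The function names, the credit figures and the personal soft cap are hypothetical. Read `credits_used` from your usage page.

```python
import calendar
from datetime import date


def projected_month_credits(credits_used: int, today: date) -> float:
    """Straight-line projection of month-end credits from spend so far."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return credits_used * days_in_month / today.day


def on_pace(credits_used: int, today: date, soft_cap_credits: int) -> bool:
    """True if the projection stays within your personal soft cap."""
    return projected_month_credits(credits_used, today) <= soft_cap_credits


# Hypothetical figures: 20,000 credits ($200) used by 15 June, soft cap of 50,000 credits ($500).
print(projected_month_credits(20_000, date(2026, 6, 15)))  # 40000.0
print(on_pace(20_000, date(2026, 6, 15), 50_000))          # True
```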
Quick-start checklist
Copy this into your workflow notes or bookmark it:
- Default to a local or free model for routine work — reserve premium calls for genuinely hard problems
- One task → one session → one model
- Stable context first (project rules, file context), variable content last (today's specific question)
- Custom instructions locked — change deliberately, not casually
- Cluster work in time — focused blocks beat scattered sessions
- Weekly usage check — catch drift early, don't wait until month-end
- Personal soft-cap defined — know your mid-month trigger point
What to do today
- Bookmark github.com/settings/copilot/features so you can check your usage in two clicks.
- Make a conscious model choice on your next session. Pick the cheapest model that can plausibly do the job, and only escalate if it can't.
- Try the Plan → Build pattern on your next multi-step task. Scope in Plan mode, execute in Build mode, and notice the difference in how many messages it takes.
- Next time you're about to open an agentic app to research something or think through an approach — open a free chat tool instead. Save the agentic session for when you know what you want to build.