Efficient Usage · GitHub Copilot Best Practices

Key insight: Most people who run out of credits aren't using OpenCode "too much" — they're using it inefficiently. Picking the right model for the task and reusing past work typically cuts consumption by 50% or more, with no loss in output quality.

Context quality = output quality, not just cost. Sending too much irrelevant context doesn't just cost more; it actively makes the model worse. The model considers everything in the conversation window when forming a response, so irrelevant noise biases it toward lower-quality answers, while too little context forces it to fill gaps with guesses. Aim for as little context as possible and as much as necessary: that sweet spot improves both the quality of what you get back and the cost of getting it.

Top 5 levers — highest impact first

01 Right tool for the job

Agentic apps cost credits even for questions a free chat tool handles better. Start with a free chat tool; save OpenCode for actual code work.

02 Model choice

One Opus 4.7 prompt costs roughly the same as five Sonnet 4.5 prompts (~5× versus ~1×). Default to cheap (Haiku or a free model) and escalate only when the output isn't good enough.

03 Prompt caching

In a well-structured session, 93%+ of each request is served from cache at ~one-tenth the cost. Switching models or going idle busts it instantly.

04 Session habits

One task per session. Stable context up front. Cluster work in time — five scattered 10-minute sessions cost more than one focused 50-minute block.

05 Plan before you Build

Scope a task in Plan mode with a cheap model first, then execute in Build mode. One AI credit draw up front instead of ten back-and-forth draws.

How billing works

As of June 1, 2026, GitHub Copilot uses consumption-based billing measured in GitHub AI Credits (1 credit = $0.01 USD), pooled across the organisation. Cost varies significantly by model — see the table below.
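
To make the credit arithmetic concrete, here is a minimal Python sketch of how one request might translate into AI Credits. The 1 credit = $0.01 conversion comes from the text above. The function name request_credits and the per-million-token rates in the usage lines are hypothetical placeholders, not GitHub's published prices, and the sketch assumes cost is linear in token count.

```python
CREDIT_VALUE_USD = 0.01  # from the text: 1 AI Credit = $0.01 USD


def request_credits(input_tokens: int, output_tokens: int,
                    usd_per_million_input: float,
                    usd_per_million_output: float) -> float:
    """Estimate the AI Credits drawn by a single request.

    ASSUMPTION: cost is linear in tokens. The per-token rates are supplied
    by the caller and should be read off the official pricing page.
    """
    usd = ((input_tokens / 1_000_000) * usd_per_million_input
           + (output_tokens / 1_000_000) * usd_per_million_output)
    return usd / CREDIT_VALUE_USD


# Hypothetical rates, for illustration only:
# $3 per million input tokens, $15 per million output tokens.
estimate = request_credits(20_000, 1_000, 3.0, 15.0)
print(f"{estimate:.1f} credits")
```

With those placeholder rates, a request carrying 20,000 input tokens and 1,000 output tokens comes to about 7.5 credits. Most of that is input, which is why the context habits later in this guide matter.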

Use the right tool for the job

Before thinking about which model to use, ask a more fundamental question: does this task actually need an agentic app at all? Agentic tools like OpenCode, VS Code Copilot, and Claude Code are powerful — but they're optimised for writing and editing code in a repo. A lot of daily AI work doesn't need that.

If you're researching a topic, summarising a document, brainstorming an approach, or just asking a question — a free chat tool like ChatGPT, Claude.ai, or similar handles all of that with zero impact on your Copilot credits. The same 1,000-token conversation that costs nothing in a chat tool draws down your allowance in an agentic app.

Use a free chat tool when…

Researching a topic or technology
Summarising a document or meeting notes
Brainstorming or exploring an idea
Asking a one-off question with no code involved
Comparing options or drafting a plan
Long-context Q&A (some free tools support 1M+ token context)

No Copilot credits consumed.

Use an agentic app when…

Writing or editing code in a repo
Running multi-file refactoring or test generation
Executing shell commands or build steps
Working autonomously across a codebase
Using tools, MCPs, or file operations

Copilot credits consumed — use them deliberately.

Practical tip: If you're about to open an agentic app to figure out your approach to a task, consider drafting that thinking in a free chat tool first. Arrive at the agentic session knowing what you want — don't use it to think out loud.

Use the right model for the job

The single biggest lever you have is model choice. The same prompt, sent to two different models, can cost anywhere from nothing to roughly 5× a standard request. Knowing which bucket each model falls into is most of the battle.

How to change models in OpenCode: The current model is shown in the bottom status bar. Click it to switch at any point — the new model takes effect on your next message. To control which models appear in that list, go to Settings → Models and toggle on the ones you want.

Recommended starter set — three models to enable: The list can get overwhelming. A simple starting point is to enable just these three and turn the rest off for now — you can always come back and add more later.

Claude Haiku 4.5 (Lightweight, ~0.2×) — your default for most things. Drafting, quick questions, summarisation, simple tasks. Start here every session.
Claude Sonnet 4.5 / 4.6 (Versatile, ~1×) — step up to this when you need sharper reasoning, careful analysis, or real work.
Claude Opus 4.7 / 4.8 (Powerful, ~5×) — for genuinely hard problems only: architectural decisions, complex multi-step work, or when Sonnet hasn't been enough. Use deliberately, not by default.

Note: the AI model landscape changes fast — new models appear and pricing shifts regularly. Treat this as a starting point, not a permanent recommendation.

GitHub publishes per-token pricing for all models. Here's the current picture grouped by category, with approximate relative cost to help with day-to-day decisions:

Category	Approx. relative cost	Models	Best for
Included / Free	0× (unlimited)	GPT-4.1, GPT-5 mini, Raptor mini (preview)	Drafting, reformatting, lookups, anything routine. Start here.
Lightweight	~0.1× – 0.3×	Haiku 4.5, GPT-5.4 nano, GPT-5.4 mini, Gemini 3 Flash (preview)	Fast iteration, simple code edits, summarisation, classification.
Versatile	~0.5× – 1×	Sonnet 4, Sonnet 4.5, Sonnet 4.6, GPT-5.2, GPT-5.2-Codex, GPT-5.3-Codex, Gemini 2.5 Pro, Gemini 3.1 Pro, Gemini 3.5 Flash	The default for most real work. Strong reasoning at a sensible cost. Note: GPT-5.4 is listed in the heavier tier below because it costs ~1.5× more than GPT-5.2 for similar tasks.
Versatile (heavier)	~1.5× – 3×	GPT-5.4, GPT-5.5	Complex tasks requiring stronger reasoning. GPT-5.5 in particular carries a high output token cost — use it when GPT-5.2/Sonnet isn't cutting it.
Powerful	~5×	Opus 4.5, Opus 4.6, Opus 4.7, Opus 4.8	Genuinely hard reasoning, architectural decisions, long-context analysis. All Opus 4.x variants are now similarly priced — pick the latest available.

Source: GitHub Copilot — models and pricing. Last verified: June 2026. Relative cost figures are approximate — actual cost depends on token count, not just model tier. Always check the official page for current per-token rates.

The Powerful tier rule of thumb: Before reaching for any Opus 4.x model, ask yourself "would a few Sonnet prompts solve this faster?" Opus 4.5, 4.6, 4.7, and 4.8 are now all similarly priced — so if you're going Powerful, use the latest available. Save it for the moments it genuinely earns its keep.
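
The "start cheap, escalate only if needed" rule can be sketched as a small ladder walk. This is an illustration under stated assumptions, not a feature of OpenCode or Copilot: the model names and relative costs are the starter set from the text above, while cheapest_sufficient and the good_enough callback are hypothetical names standing in for your own judgement of the output.

```python
from typing import Callable, Optional

# Starter set and approximate relative costs, taken from the text above.
LADDER = [
    ("Claude Haiku 4.5", 0.2),
    ("Claude Sonnet 4.5", 1.0),
    ("Claude Opus 4.7", 5.0),
]


def cheapest_sufficient(
    good_enough: Callable[[str], bool],
) -> tuple[Optional[str], float]:
    """Try models cheapest-first and stop at the first acceptable one.

    Returns the chosen model (or None if none sufficed) and the total
    relative cost spent, counting the failed attempts along the way.
    """
    spent = 0.0
    for model, relative_cost in LADDER:
        spent += relative_cost  # every attempt is paid for, even a failed one
        if good_enough(model):
            return model, spent
    return None, spent


# Hypothetical judgement: pretend only Sonnet and above are good enough.
model, spent = cheapest_sufficient(lambda name: "Haiku" not in name)
print(model, round(spent, 2))
```

Under these approximate multipliers, failing on Haiku and succeeding on Sonnet costs about 1.2× in total, well under a single ~5× Opus prompt. Climbing the whole ladder costs about 6.2×, so escalation pays off whenever the cheaper tiers succeed reasonably often.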

How you run a session matters as much as which model you pick

Once you're in an agentic app, the habits you bring to each session have a significant effect on cost — often as much as model choice itself. Here's why: every message you send re-transmits the entire conversation history to the model, not just your latest prompt. A session that started small grows with every turn — and that accumulated context is re-sent in full each time. The way you structure a session determines how fast that context bloats, how many requests you send, and how efficiently the underlying model can reuse work it's already done.
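
The re-transmission effect described above is easy to underestimate, so here is a minimal sketch of the arithmetic. It assumes each turn adds a fixed number of tokens to the history and ignores prompt caching. The function name and the 1,000-tokens-per-turn figure are hypothetical.

```python
def total_tokens_sent(turns: int, tokens_per_turn: int,
                      base_context: int = 0) -> int:
    """Total input tokens transmitted over a session in which every
    request re-sends the full history accumulated so far.

    ASSUMPTION: each turn adds the same number of tokens to the history.
    """
    total = 0
    history = base_context
    for _ in range(turns):
        history += tokens_per_turn  # the new turn joins the history
        total += history            # the whole history goes out with this request
    return total


one_long_session = total_tokens_sent(turns=10, tokens_per_turn=1_000)
two_short_sessions = 2 * total_tokens_sent(turns=5, tokens_per_turn=1_000)
print(one_long_session, two_short_sessions)  # 55000 30000
```

Total tokens grow roughly with the square of the number of turns, so the same ten turns split across two fresh sessions transmit far less than one long session does.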

How the context window works against you. The model doesn't treat everything in the conversation window equally. Where content sits matters — and it creates two failure modes worth knowing about.

Position in window	What sits there	How the model treats it
Beginning	Instructions & original prompt	Always prioritised
Middle	Earlier work & past turns	Deprioritised (the danger zone)
End	Your most recent message	Always prioritised

Lost in the middle (session under ~50% full): the model biases toward the start and end of the window. Switch tasks mid-session — say, from a bug fix to a new feature — and it can snap back to the original task, because that's what's at the beginning.

Recency bias (session over ~50–60% full): the model starts prioritising only the most recent context. Earlier instructions — including your original prompt and agent config — get deprioritised. The model starts doing things you don't recognise, driven solely by the last few turns.

The fix: one task per session. Start fresh for each distinct task rather than piling new work into a running session.

Session habits that save credits

One task per session. Start a session for a specific task, finish it, then start fresh. Accumulated context from unrelated work gets re-sent on every message.
Pick one model and stay with it. Switching models mid-session invalidates the prompt cache, so the next request is billed at full price. Decide upfront and commit.
Put stable context up front. Background information, instructions, and constraints should come at the start of a session — not buried mid-conversation where they shift on every turn.
Edit instruction files before you start. If you use a copilot-instructions.md or similar config, update it before opening the session — not while it's running.
Stay active within a session. The prompt cache expires after roughly 5 minutes of inactivity, so after a long idle gap the next request is processed and billed in full. Keep working or close the session.
Cluster work in time. Five scattered 10-minute sessions perform worse than one focused 50-minute block — each new session starts cold and has to rebuild cache from scratch.
Know what you want before you start. Use a free chat tool to think through the approach first. Arrive at the agentic session with a clear task, not a question.
Compact your context when a task is done. In OpenCode, run /compact at the end of a task to summarise the session history before starting the next one. This keeps context lean without losing the thread.

Session habits that waste credits

Leaving sessions open for days. Every new message re-sends the full accumulated context — even the parts from three tasks ago.
Switching models mid-session. The prompt cache does not carry over, so you lose it and start cold on the new model.
Thinking out loud in the agentic app. Exploratory back-and-forth ("what if I did X… actually no, what about Y…") is expensive here. Do that in a free chat tool.
Editing config mid-session. Changing instructions or context files while a session is running forces a reset on the next request.
Returning to a cold idle session. After more than ~5 minutes idle, the prompt cache is gone, so the full accumulated history is billed at full price again. Closing and starting fresh is usually more efficient.
Letting context accumulate across tasks. Every new message re-sends the full conversation history. A session you've been using all day is sending thousands of tokens of old context on every prompt — even when it's no longer relevant.

Prompt caching — the biggest single lever after model choice. VS Code 1.118+ (April 29, 2026) and other Copilot surfaces implement prompt caching: when the prefix of your request matches a recent one, it's served from cache at roughly one-tenth the cost of a full request. In an active, well-structured session, more than 93% of each request is served from cache. The session habits above are exactly what keeps that cache alive.

What busts the cache: switching models mid-session, editing your instruction or rules files while a session is running, or going idle for more than ~5 minutes.
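
A rough sketch of why the cache matters so much, using only the figures quoted above (cached content at about one-tenth the price, 93% served from cache). Both function names are hypothetical, and the model is deliberately simplified: it looks only at the input side of a request and assumes the first request of every session pays full price for the stable prefix.

```python
def effective_cost(cached_fraction: float,
                   cached_price_ratio: float = 0.1) -> float:
    """Cost of a request relative to an uncached one (1.0 = full price)."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    return cached_fraction * cached_price_ratio + (1.0 - cached_fraction)


def prefix_cost(sessions: int, requests_per_session: int,
                prefix_tokens: int, cached_price_ratio: float = 0.1) -> float:
    """Token-equivalents paid for a stable prefix across several sessions.

    ASSUMPTION: each session starts cold (first request at full price)
    and every later request in that session hits the cache.
    """
    if requests_per_session < 1:
        raise ValueError("each session needs at least one request")
    per_session = prefix_tokens * (
        1 + cached_price_ratio * (requests_per_session - 1)
    )
    return sessions * per_session


print(round(effective_cost(0.93), 3))      # a warm, well-structured session
print(round(prefix_cost(5, 2, 10_000)))    # five scattered sessions, 2 requests each
print(round(prefix_cost(1, 10, 10_000)))   # one focused block, same 10 requests
```

At a 93% cached share, a request costs about 0.163 of its uncached price. For a hypothetical 10,000-token prefix, the same ten requests cost about 55,000 token-equivalents when scattered across five cold sessions and about 19,000 in one focused block.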

Cache hit rate targets (averaged across all your sessions, cold starts included):

Hit rate	What it looks like
15–25%	Constant context-switching, many short unrelated sessions
30–50%	Mixed workflow, some focused blocks
50–70%	Focused, session-based work on a stable codebase — the target

Plan vs Build — and what it costs

At the bottom of the OpenCode interface you'll see two mode options: Plan and Build. They look similar but behave very differently — and the difference has a direct impact on your token and AI credit usage.

Plan mode

Read-only. OpenCode can explore your files, reason about the problem, and produce a plan — but it cannot write, edit, or run anything.
Lower token usage per prompt. Because no tool calls are executed, the context stays lighter. Good for scoping, understanding, and decision-making.
Low AI credit cost. Using /plan or staying in Plan mode for a task costs a single credit draw up front, regardless of how much the model reads.
Best for: thinking through an approach before committing, reviewing large codebases, drafting a structure you'll then hand off.

Build mode

Full autonomy. OpenCode can read, write, edit files, run shell commands, call APIs, and loop through multi-step tasks without asking for confirmation at each step.
Higher token accumulation. Each tool call (file read, command run, search) adds output back into the context window. Long agentic runs can build up significant context — and re-send it on every subsequent prompt.
Multiple AI credit draws. Every message you send in Build mode draws AI credits. A back-and-forth debugging session of ten messages at 1× each costs ten standard draws.
Best for: execution once you know what you want. Use Plan to figure out the approach, then switch to Build to carry it out.

The smart pattern: Start in Plan mode with a free or Versatile-tier model to scope the task and agree an approach. Then switch to Build mode, still on a Versatile-tier model, to execute. Reserve Powerful-tier models for the specific step that needs them, not the whole session. This single habit can cut a complex task from 20+ AI credit draws down to 5–8.
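
The draw arithmetic behind that pattern can be sketched as a weighted sum, following the text's own rule that a message at 1× costs one standard draw. The message counts below are hypothetical, chosen only to fall inside the ranges quoted above, and credit_draws is an illustrative name.

```python
def credit_draws(steps: list[tuple[str, float, int]]) -> float:
    """Sum weighted draws: each message costs its model's relative multiplier.

    Each step is (label, relative_multiplier, message_count).
    """
    return sum(multiplier * messages for _label, multiplier, messages in steps)


# Hypothetical message counts, for illustration only.
trial_and_error = [("Build on a 1x model, no plan", 1.0, 20)]
plan_then_build = [
    ("Plan on a free model", 0.0, 1),
    ("Build on a 1x model, from the plan", 1.0, 6),
]

print(credit_draws(trial_and_error))  # 20.0
print(credit_draws(plan_then_build))  # 6.0
```

The saving comes from sending fewer Build messages, not from Plan mode being cheaper per message. If the plan does not reduce the back-and-forth, the numbers will not move.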

The three-phase pattern — and why they should be separate sessions

Plan and Build are two modes, but Research, Plan, and Build are three distinct phases — and ideally they happen in separate sessions, not just separate modes in the same one. Research loads a lot of context (files, documentation, exploratory back-and-forth) most of which is irrelevant to implementation. Carrying that forward into the Build phase bloats context and degrades the quality of what gets built. The better pattern: research in a free chat tool or short standalone session, produce a clear spec, then start a fresh Build session from that spec.

Phase 1: Research. Free chat tool or short standalone session. Ask questions, explore context, understand the problem.
Phase 2: Plan. Plan mode. Scope the task, agree the approach, produce a clear spec. Fresh session, no research baggage.
Phase 3: Build. New Build session, starting from the spec. Lean context, focused execution, no irrelevant history.

Cheap habits vs expensive habits

Once you can see your usage and you're choosing models deliberately, the next layer of savings comes from how you structure each session. Same outcome, very different cost:

Cheap habits

Reuse past outputs. Point OpenCode at a previous transcript, context pack, or saved skill instead of asking it to re-derive what you already have.
Be specific about output. "Give me a 3-bullet exec summary" produces less text — and costs less — than "tell me about X".
Start cheap, escalate only if needed. Begin with a free or Versatile-tier model. Only switch to a Powerful-tier one if the first answer isn't good enough.
End sessions when you switch tasks. A fresh session has zero accumulated context to re-send on every message.
Use Plan mode for big work. Low AI credit cost up front, then execute the plan with cheaper models.
Trim attached files. Drop the 200-page PDF out of context once you've extracted what you need.
Point at specific files, not the whole repo. "Look at src/auth/login.ts" costs far less than "look at the codebase and find where login is handled." The more targeted your instruction, the less the agent has to scan — and the less context it accumulates.
Tell the agent when to stop. End your prompt with a clear finish line — "Once the task is complete and tests pass, stop." Without one, agents tend to keep going: committing, pushing, tidying, linking. You pay for every step.

Expensive habits

Defaulting to Opus 4.7 for everything. Running routine tasks on the heaviest model is the single fastest way to burn through your allowance.
Open-ended prompts. "Tell me everything about X" forces the model to produce — and you to pay for — output you didn't need.
Leaving sessions open for days. Every new message re-sends the full transcript, even the parts that are no longer relevant.
Re-asking instead of scrolling up. If you already got the answer earlier in the session, scroll. Don't pay to regenerate it.
Pasting huge files for tiny questions. If you only need one function from a 5,000-line file, paste the function.
"Just one more clarification…" loops. Five small follow-ups cost more than one well-formed prompt would have.
Re-adding a file that's already in context. If you attached a file earlier in the session, it's still there — attaching it again doubles the token cost without adding any information. Check what's already in context before adding more.

Your agent config file

Most agentic tools load a persistent config file at the start of every session — often called copilot-instructions.md, AGENTS.md, or similar depending on the tool. Whatever it's named, the contents get included in every single prompt you send. That makes it one of the most quietly expensive things on the page — and one of the most impactful to get right.

Write it yourself — don't use AI to generate it. AI-generated config files tend to be verbose and full of generic advice the model already knows. That costs tokens on every turn without adding signal. You know your project. Be precise. A good config file is short, specific, and full of things only you could have written: domain knowledge, guardrails, corrections for recurring errors, conventions the model won't infer on its own.

Treat it as a living document. Add a line when you see the agent repeat a mistake. Remove lines that are no longer relevant. Revisit it every few months — models improve, projects evolve, and stale instructions quietly add noise to every session without you noticing.

What to put in

Project guardrails and non-negotiables — things the agent must always or never do
Corrections for recurring errors you've seen the agent make
Conventions specific to your project that the model won't know
Domain knowledge only you have — context the model can't infer on its own

What to leave out

Generic coding advice the model already knows
Full documentation files or long prose guides — link to them instead
Instructions that are outdated or no longer apply
AI-generated boilerplate — verbose, imprecise, and costly on every turn
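
To get a feel for what an always-loaded config file adds up to, here is a rough estimator. It is a sketch for order-of-magnitude comparisons only: the four-characters-per-token ratio is a common rule-of-thumb assumption for English text, not a real tokenizer, and the function name and both example configs are hypothetical. It also counts raw tokens and ignores any cache discount.

```python
def config_overhead_tokens(config_text: str, turns: int,
                           chars_per_token: float = 4.0) -> int:
    """Rough token overhead of a config file included in every prompt.

    ASSUMPTION: chars_per_token=4.0 is a heuristic, not a tokenizer.
    """
    tokens_per_turn = round(len(config_text) / chars_per_token)
    return tokens_per_turn * turns


# Hypothetical configs: a short hand-written one and a verbose generated one.
lean_config = "Never edit files under migrations/.\nUse pnpm, not npm.\n"
verbose_config = "Always write clean, readable, well-tested code.\n" * 200

for name, text in [("lean", lean_config), ("verbose", verbose_config)]:
    print(name, config_overhead_tokens(text, turns=40))
```

Running it against your own file before and after a trim is a quick way to check whether each line is earning its place.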

Where to see your usage

The fastest way to check your own consumption:

github.com/settings/copilot/features

This shows the AI Credits you've used this month, your remaining balance, and which models you've been using. Bookmark it: it's the single most useful page for managing your own usage. If the dashboard exposes a cache hit %, track it and aim for 50% or above. Anything below 30% is a signal that sessions are too fragmented or context is shifting too much between calls.

⚠️ Real-time token counts (inside OpenCode itself)

Plugins like opencode-tokenscope and opencode-quota-sidebar can surface live usage in the status bar. Based on web research these look promising — but they have not been fully tested and have caused session instability in some cases. The GitHub usage page above is the recommended option for now.

A simple weekly check

Every Monday morning, take 60 seconds:

Open your personal Copilot usage page — are you on pace for the month?
Ask yourself: was there a task last week where I used a Powerful-tier model when a Versatile-tier one would have done? That's the optimisation for this week.
Check whether you were mostly in Build mode when Plan mode would have been enough for scoping — that's often where hidden cost sits.

That's it. Tracking doesn't need to be elaborate to be useful.
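
The "am I on pace?" question in the first step is a simple linear projection. The sketch below assumes you read your credits used so far off the usage page by hand and that your spending continues at the same daily rate. The function names and the allowance figure are hypothetical.

```python
import calendar
from datetime import date


def projected_month_end(used_credits: float, today: date) -> float:
    """Project month-end usage, assuming the same daily rate continues."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return used_credits * days_in_month / today.day


def on_pace(used_credits: float, monthly_allowance: float,
            today: date) -> bool:
    """True if the linear projection stays within the allowance."""
    return projected_month_end(used_credits, today) <= monthly_allowance


# Hypothetical figures: 300 credits used by 10 June, allowance of 1,000.
check_day = date(2026, 6, 10)
print(projected_month_end(300, check_day))  # 900.0
print(on_pace(300, 1_000, check_day))       # True
```

A linear projection is crude, since one heavy week skews it, but it is enough to catch drift early.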

Quick-start checklist

Copy this into your workflow notes or bookmark it:

Default to a local or free model for routine work — reserve premium calls for genuinely hard problems
One task → one session → one model
Stable context first (project rules, file context), variable content last (today's specific question)
Custom instructions locked — change deliberately, not casually
Cluster work in time — focused blocks beat scattered sessions
Weekly usage check — catch drift early, don't wait until month-end

What to do today

Bookmark github.com/settings/copilot/features so you can check your usage in two clicks.
Make a conscious model choice on your next session — pick the cheapest model that can plausibly do the job, and only escalate if it can't.
Try the Plan → Build pattern on your next multi-step task. Scope in Plan mode, execute in Build mode, and notice the difference in how many messages it takes.
Next time you're about to open an agentic app to research something or think through an approach — open a free chat tool instead. Save the agentic session for when you know what you want to build.