Giving Claude Code, Codex, or Gemini CLI direct access to your source code
repository and CI is tempting, and it can enable some really useful agentic
workflows. This is also straightforward: set GH_TOKEN in the environment and
point out to the agent that it can use the gh CLI. Soon you’ll see the agent
figuring out why your flaky CI pipeline failed yet again.
If you are doing this in an enterprise environment, or against a private repository
containing sensitive information, you may take some steps to secure your setup,
like injecting the GH_TOKEN as an environment variable, or even minting a
limited-lifetime scoped token using the GitHub App credentials flow. You might
also run the agent in a local execution sandbox or a dedicated VM. If you've
taken these steps, I recommend you stop reading and instead feel good about
the security of your setup.
If you continue reading, you may realize a rabbit hole awaits; the deeper you go, the more despair you may feel.
GitHub as an exfiltration surface
The setup described above is likely vulnerable to the lethal trifecta. If any untrusted input enters the agent’s context – issue reports, user data, online sources consulted during the task – it can trigger prompt injection, telling the agent to exfiltrate secrets or other confidential information such as your entire codebase.
The simplest exfiltration vector is a request to an attacker-controlled server, and also the easiest to close: configure a firewall that allows access only to GitHub and a few other trusted destinations. However, GitHub provides a rich API surface and integrated CI where the agent can run code outside the local sandbox, making GitHub itself an exfiltration surface that is not straightforward to patch.
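A minimal version of such an egress check might look like the following Python sketch. The host list here is purely illustrative, not Airut's actual policy:

```python
# Sketch of an egress allowlist check (illustrative host list, not a real policy).
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"api.github.com", "github.com", "objects.githubusercontent.com"}

def egress_allowed(url: str) -> bool:
    """Allow a request only if its destination host is explicitly trusted."""
    host = urlsplit(url).hostname or ""
    return host in ALLOWED_HOSTS

egress_allowed("https://api.github.com/repos/acme/app")  # True
egress_allowed("https://attacker.example/leak")          # False
```

A real deployment would enforce this at the network layer (firewall or proxy) rather than in application code, since a prompt-injected agent controls what code it runs.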
I’ve been developing a system called Airut that allows tasking Claude Code via email and Slack. One of the design goals has been to enable a secure email-to-PR workflow to GitHub. To achieve that, Airut has shipped several security features to prevent exfiltration:
- A transparent proxy with path- and method-level access policies, limiting REST API operations to specific repositories.
- Credential masking with surrogate credentials, preventing real tokens from being exfiltrated for use outside the sandbox.
- Foreign credential detection that drops credentials not matching the expected
token, preventing exfiltration to accounts controlled by the attacker.
  Anthropic's own API has this weakness: an attacker can exfiltrate content
  to Anthropic's servers into their own account via the Files API. This is
  difficult to patch, since Claude Code requires access to api.anthropic.com
  to function.
- A sandbox for GitHub Actions, preventing the agent from running unsandboxed code in CI.
The security controls above were not trivial to implement, but together they appeared to secure GitHub’s API surface against exfiltration. Then Claude found an additional way out.
The GraphQL escape
I recently configured Claude to run a nightly pentest against the Airut sandbox using the newly added scheduled tasks. After a number of rounds, Claude found an exfiltration vector:
GraphQL bypasses REST API path scoping. The network allowlist restricts REST API access to /repos/airutorg/airut* and /repos/airutorg/sandbox-action*, but POST /graphql is allowed without restrictions. Since the GitHub App has organization-wide installation, GraphQL can read/write to any repo the App accesses. I demonstrated this by creating issue #67 on airutorg/website via a createIssue mutation – a repo that the REST API correctly blocks.
Claude initially misidentified the root cause. GH_TOKEN was minted using
GitHub App credentials to allow access only to airutorg/airut and
airutorg/sandbox-action repositories. GitHub, however, allows any
authenticated user – including apps – to create issues in any public
repository. airutorg/website is a public repository, and hence Claude could
exfiltrate data to issues there.
Simply blocking the GraphQL endpoint is not a practical option. While GitHub
provides a REST API for most operations, the gh CLI uses GraphQL extensively
under the hood. Blocking GraphQL would break the agent’s ability to perform
routine operations like listing PRs or checking CI status.
Remediation
This exfiltration vector is now patched in Airut 0.24.0 by two layered defenses in the proxy.
- The proxy inspects node IDs in /graphql requests and rejects any request
targeting a repository outside the allowed set for that token. This is enabled
for GitHub App credentials, where the allowed repositories are defined by the
App installation or further narrowed in Airut's server settings.
- A generic GraphQL allowlist mechanism enables fine-grained control over
query, mutation, and subscription operations. This can be used, for example,
to entirely prevent the agent from making createIssue operations, providing
finer-grained access control than GitHub App permissions alone.
The two defenses complement each other: node ID inspection enforces repository scoping on all GraphQL operations, while the operation allowlist can restrict which operations are permitted regardless of repository, for example, preventing issue creation entirely even on allowed repositories.
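As an illustration of the operation-allowlist idea, a deliberately simplified Python sketch might extract the top-level mutation field from a GraphQL document and check it against a policy. A real implementation would need a proper GraphQL parser; the policy and names here are hypothetical:

```python
import re

# Simplified sketch of a GraphQL mutation allowlist. The regex only handles
# simple, well-formed documents; production code should parse the document
# properly instead of pattern-matching it.
ALLOWED_MUTATIONS = {"addComment", "createPullRequest"}  # hypothetical policy

def mutation_allowed(document: str) -> bool:
    """Reject any mutation whose top-level field is not explicitly allowed."""
    m = re.search(r"\bmutation\b[^{]*\{\s*(\w+)", document)
    if m is None:
        return True  # not a mutation; queries would go through a separate allowlist
    return m.group(1) in ALLOWED_MUTATIONS

mutation_allowed("mutation { createIssue(input: {}) { issue { id } } }")       # False
mutation_allowed("mutation { addComment(input: {}) { clientMutationId } }")    # True
```

This is a default-deny policy on operations, complementing the node ID check's default-deny policy on repositories.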
GitHub’s security model
The GraphQL escape is patched, but it is part of a larger pattern. I’ve previously written about how securing agentic AI is hard, especially when agents are granted access to existing systems. So far with GitHub I’ve discovered three design issues that make it hard to give an agent a working GitHub token:
- GitHub Actions presents a potent escape vector: if an agent can push code
that steers existing workflows or adds new ones, it can execute code in an
environment with unrestricted network access plus repository secrets. I've
closed this vector in Airut through sandbox-action, which extends Airut's
sandbox to GitHub Actions.
- GitHub App credentials don't limit access to public repositories, making every public repository a potential exfiltration platform. While App credentials provide fine-grained repository and permission control, they still inherit the human-centric "any logged-in user should be able to access any public repository" paradigm, which becomes a security hole under the worst-case assumption of a prompt-injected agent.
- Personal access tokens come with additional security model limitations, such as allowing repository creation (e.g., an agent can create a new public repository to exfiltrate content into). The token that is easiest to get started with is also fundamentally the least secure.
GitHub itself seems to have concluded that GitHub tokens should not be given
to agents: the GitHub Agentic Workflows security model relies on moving
operations such as PR creation from the agent sandbox to the infrastructure
via the SafeOutputs concept.
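The safe-outputs pattern can be sketched as follows. This is a rough illustration of the concept, not GitHub's actual implementation; all names are hypothetical:

```python
# Sketch of the safe-outputs idea: the sandboxed agent never holds a token.
# It emits a declarative request, and trusted infrastructure outside the
# sandbox validates the request and performs the privileged operation.
import json

def agent_side(title: str, body: str) -> str:
    """Inside the sandbox: describe the desired operation instead of doing it."""
    return json.dumps({"type": "create-pull-request", "title": title, "body": body})

def infra_side(raw: str) -> bool:
    """Outside the sandbox: validate the request before acting on it."""
    req = json.loads(raw)
    if req.get("type") != "create-pull-request":
        return False  # unknown or disallowed operation: reject
    # ...here the infrastructure would call the GitHub API with its own token
    return True
```

The key property is that the blast radius of a prompt injection shrinks to the set of operations the infrastructure is willing to perform on the agent's behalf.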
Beyond GitHub
Airut’s network sandbox started with two generic mechanisms for securing agents: path-based allowlist and credential masking. As GitHub has shown, these are insufficient, and deeper inspection of requests is needed. The GraphQL allowlist adds a third generic mechanism, but for GitHub specifically, Airut now also includes service-specific logic as part of the App credentials flow. Special-casing services increases development and maintenance costs, and that complexity itself carries a security cost: more code in the policy engine increases the likelihood of exploitable bugs.
Given the value of having agents interact with existing systems, I expect most deployments will brush aside many security concerns, relying on a probabilistic approach. Depending on the use case this may be a perfectly fine risk-benefit tradeoff; exfiltration via public GitHub repository issue creation is likely not the biggest security gap to close in the big picture after all.
There is both demand and a surge of products aimed at securing agentic workloads. Most seem to focus on surface-level execution sandboxing. I have yet to see a product that seriously attempts to implement a comprehensive and granular network sandbox, which would allow a high level of secure autonomy. I think this is a real opportunity. Equally possible, of course, is that we end up with “DPI firewall”-equivalent products for agents, applying more heuristics (LLMs and otherwise) to the problem rather than fundamental solutions.