We recently investigated two alerts on the same macOS device that arrived close together. One detection flagged suspicious access to the macOS Keychain. The other focused on Unix credential access. Different detection logic, same host, same user context, same timeframe.
In our Detections from the SOC series, we dive deeper into investigations that stand out because they reveal something beyond the alert itself. This was one of those cases. Two independent detections pointed to the same behaviour from different angles, shifting the focus away from whether the alerts fired correctly and towards a more interesting question: what did the behaviour actually mean?
In this case, the answer was more interesting than the alert itself. The activity looked like credential access. It behaved like credential access. But the actor behind it was not malware, and not a human attacker sitting at a terminal. It was an AI-assisted development environment acting on behalf of a legitimate user. The case highlights a problem SOC teams will face more often: AI agents can perform legitimate work through behaviours that look almost identical to attacker tradecraft.
The macOS Keychain is the system-level credential store where passwords, certificates, private keys, tokens and application secrets accumulate over time, often without developers thinking much about what is in there. Some access to the Keychain is normal. Broad enumeration of the Keychain is different. It is legitimate in some developer workflows, but it is also a technique used by threat actors. That is why Hunt & Hackett maintains detection coverage around this type of behaviour. We would rather investigate suspicious access to credential stores than miss the early signs of credential theft.
When we reviewed the command line, the activity became clear quickly. A shell process had executed:
$ security dump-keychain -a
The output was then filtered for a Bitbucket API token and associated permissions. In practical terms, the command appeared to be looking for one specific credential. But the method still mattered. The filtering happened after the Keychain contents were requested. Even if the end goal was one token, the action still involved broad access to a local credential store. That is why the detections fired. And they were right to do so.
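The point that filtering happens only after the dump has been requested can be illustrated with a minimal sketch of command-line matching. This is not our production detection logic; the function name `is_broad_keychain_dump` is hypothetical, and a real rule would run against endpoint telemetry rather than raw strings. It assumes a simple pipeline and does not unwrap commands passed to a wrapper shell such as `sh -c "..."`.

```python
import shlex


def is_broad_keychain_dump(command_line: str) -> bool:
    """Return True if any stage of a shell pipeline runs `security dump-keychain`.

    Hypothetical helper for illustration only. Later pipeline stages
    (grep, awk, ...) are deliberately ignored: they narrow the output,
    not the access that has already been requested.
    """
    # Naive split on "|": good enough for a sketch, but it would also
    # split on a pipe character that appears inside quotes.
    for stage in command_line.split("|"):
        try:
            tokens = shlex.split(stage)
        except ValueError:
            # Unbalanced quotes: fall back to a plain whitespace split.
            tokens = stage.split()
        if len(tokens) >= 2:
            binary = tokens[0].rsplit("/", 1)[-1]  # accept /usr/bin/security too
            if binary == "security" and tokens[1] == "dump-keychain":
                return True
    return False
```

With this sketch, `security dump-keychain -a | grep -i bitbucket` is flagged, while a targeted lookup such as `security find-generic-password -s bitbucket -w` is not, which mirrors the distinction between broad enumeration and a request for one specific item.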
From an ATT&CK perspective, this behaviour touched several techniques at once. That did not mean compromise, but it explains why multiple detections reacted strongly.
| Area observed | ATT&CK technique | Why it mattered |
| --- | --- | --- |
| Broad access to macOS Keychain | T1555.001 – Credentials from Password Stores: Keychain | The command requested Keychain contents where passwords, tokens, certificates and application secrets may be stored. |
| Locally stored secrets | T1552 – Unsecured Credentials | Developer tokens and local credentials are valuable to attackers and automation tools alike. |
| Private keys and certificates | T1552.004 – Unsecured Credentials: Private Keys | Keychain and local stores may contain certificates or private keys, making broad enumeration sensitive. |
| Data from endpoint | T1005 – Data from Local System | The activity collected information from the local device rather than querying a remote service. |
| Local staging pattern | T1074.001 – Data Staged: Local Data Staging | Output was redirected into a temporary working location, which can resemble staging before further use. |
| Automated collection | T1119 – Automated Collection | The command chain appeared scripted and tool-driven rather than manually typed. |
| Related Unix credential access | Additional Unix credential access detection | A second alert saw the same activity through another detection path, reinforcing that this was not a single weak signal. |
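The last row of the table describes two detections observing the same activity. A minimal sketch of how such alerts could be grouped by host, user and time window is shown here. The `Alert` fields, the rule names in the usage example and the five-minute window are assumptions made for illustration, not a description of our correlation engine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Alert:
    rule: str
    host: str
    user: str
    timestamp: datetime


def correlate(alerts: List[Alert],
              window: timedelta = timedelta(minutes=5)) -> List[List[Alert]]:
    """Group alerts that share host and user and fall close together in time.

    An alert joins the current group when it is within `window` of the
    previous alert in that group; otherwise it starts a new group.
    """
    groups: List[List[Alert]] = []
    ordered = sorted(alerts, key=lambda a: (a.host, a.user, a.timestamp))
    for alert in ordered:
        last = groups[-1][-1] if groups else None
        if (last is not None
                and last.host == alert.host
                and last.user == alert.user
                and alert.timestamp - last.timestamp <= window):
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return groups
```

Two alerts from different rules on the same host and user, one minute apart, end up in a single group and can be triaged as one activity chain rather than two separate cases.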
The key point is that the detection logic was not overreacting. The behaviour genuinely overlapped with attacker tradecraft. What changed the outcome was not the command itself, but the surrounding context: process lineage, lack of follow-on activity and validation that the user was working with an AI-assisted development tool.
If an attacker runs security dump-keychain -a, most defenders know how to interpret that. If a developer runs it manually while troubleshooting, that usually becomes clear through user context and surrounding activity. This case sat in the grey area between those two scenarios.
The command was not executed by a user manually opening Terminal and typing commands. The process chain pointed back to an AI-assisted development environment using a shell execution mechanism. The command looked scripted, output was redirected into a temporary working directory, and the surrounding execution context suggested the activity was performed by tooling rather than direct human input. That distinction matters. The command was the same. The privileges were the same. The process ran in the user session on a normal workstation. From an endpoint perspective, the behaviour looked almost identical to something we would expect to investigate in a real intrusion. Only the intent was different. And intent is exactly what logs are worst at explaining.
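The role of process lineage can be sketched as a walk up the process tree to find the nearest ancestor that explains who launched the shell. The process names `ai-dev-assistant`, `Terminal` and `iTerm2` used here are illustrative placeholders; the actual names depend on the tooling deployed, and real telemetry would supply the parent and name mappings.

```python
from typing import Dict, List, Optional

# Hypothetical process names, chosen for illustration only.
AI_TOOL_NAMES = {"ai-dev-assistant"}
INTERACTIVE_TERMINALS = {"Terminal", "iTerm2"}


def ancestry(pid: int, parents: Dict[int, int], names: Dict[int, str]) -> List[str]:
    """Return process names from `pid` up to the root, nearest first."""
    chain: List[str] = []
    seen = set()
    current: Optional[int] = pid
    # The `seen` set guards against loops in incomplete or reused PID data.
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(names.get(current, "?"))
        current = parents.get(current)
    return chain


def classify_origin(pid: int, parents: Dict[int, int], names: Dict[int, str]) -> str:
    """Label a process by the nearest recognisable ancestor in its lineage."""
    for name in ancestry(pid, parents, names):
        if name in AI_TOOL_NAMES:
            return "ai-tooling"
        if name in INTERACTIVE_TERMINALS:
            return "interactive"
    return "unknown"
```

A label like this does not settle intent. It only records which execution layer sat between the user and the command, which is the context an analyst needs before asking the user what they were doing.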
The investigation itself was straightforward, but important. We reviewed the full process tree, correlated both alerts into a single activity chain, checked surrounding host activity and looked for follow-on behaviour. The key questions were:
None of those checks changed the tone of the case. We found no suspicious persistence, no lateral movement attempts, no unusual outbound traffic, no staged archives and no evidence that the retrieved credentials were used in suspicious follow-on authentication activity. After validating context with the customer, it became clear that the user was working with an AI coding assistant during routine development work. The assistant had attempted to retrieve a Bitbucket token to complete the task it had been given.
The case was closed as a benign true positive.
That wording is deliberate. It was a true positive with benign intent. The behaviour happened exactly as detected. A process accessed a credential store in a way that should generate attention. The context was benign, but the detection was valid. Treating every non-malicious outcome as noise is how detections slowly get watered down until they stop being useful. The goal is not to make detections quiet. The goal is to make them explainable.
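The reasoning behind that label can be written down as a small decision sketch. Only the "benign true positive" outcome comes from this case; the other labels, the `TriageFindings` fields and the order of the checks are assumptions added to make the example complete.

```python
from dataclasses import dataclass


@dataclass
class TriageFindings:
    behaviour_confirmed: bool  # did the detected activity really happen?
    follow_on_activity: bool   # persistence, lateral movement, unusual egress,
                               # staged archives or suspicious credential use
    context_validated: bool    # legitimate use confirmed with the customer


def verdict(findings: TriageFindings) -> str:
    """Map triage findings to a case outcome (labels are illustrative)."""
    if not findings.behaviour_confirmed:
        return "false positive"
    if findings.follow_on_activity:
        return "malicious true positive"
    if findings.context_validated:
        return "benign true positive"
    return "needs further investigation"
```

The detection's validity is decided by the first check alone. Whether the context was benign is a separate question, answered later, which is why a benign outcome does not make the alert noise.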
This case is worth unpacking in detail because it is an early example of a problem that will only grow: legitimate AI tooling can produce behaviour that overlaps with attacker tradecraft. Traditional endpoint monitoring assumes that process activity broadly reflects either user intent or attacker intent. AI agents blur that model. They sit between the user and the system. They infer steps, execute commands, inspect output and decide what to do next.
From telemetry alone, it is difficult to answer some of the most important questions:
That last question is where prompt injection becomes relevant.
In practical terms, prompt injection means that an AI system receives instructions through content it treats as data. A README file in a cloned repository, a Jira ticket description, a code comment, a webpage, an API response or an error message can all become part of the assistant’s working context. If the assistant has enough autonomy, those instructions may influence actions. In this case, the command filtered for a Bitbucket token. But the same mechanism could have searched for other credentials, omitted the filtering entirely, staged a full Keychain dump or sent data elsewhere. From the SOC side, the early telemetry could look very similar. The difference may only become visible later, after the interesting part has already happened. That creates an observability gap. We can see what process ran. We can see which command executed. We can sometimes see what data was touched. But we often cannot see why the agent chose that action or whether the user explicitly intended it.
The answer is not to ban AI tools. The answer is to treat them as a new execution layer that needs the same principles we apply elsewhere: least privilege, isolation, logging, approval for sensitive actions and clear response playbooks.
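As one illustration of approval for sensitive actions combined with logging, the sketch below shows a policy check that an agent's shell execution layer could consult before running a command. Everything in it is hypothetical: the `SENSITIVE_COMMANDS` table, the `approve` callback and the log format are assumptions, and the function only decides whether a command may run; it does not execute anything.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical policy table: command prefixes that need explicit approval.
SENSITIVE_COMMANDS = {
    ("security", "dump-keychain"),
    ("security", "find-generic-password"),
    ("security", "find-internet-password"),
}


def may_run(argv: Sequence[str],
            approve: Callable[[Sequence[str]], bool],
            audit_log: List[Dict]) -> bool:
    """Decide whether an agent-proposed command may run, and record why.

    `approve` stands in for whatever asks the user (a prompt, a ticket,
    a policy service). Every decision is appended to `audit_log`.
    """
    sensitive = tuple(argv[:2]) in SENSITIVE_COMMANDS
    decision = "allowed"
    if sensitive:
        decision = "approved" if approve(argv) else "denied"
    audit_log.append({"argv": list(argv), "sensitive": sensitive, "decision": decision})
    return decision != "denied"
```

The audit log is the part that matters most to a SOC: it records whether the user explicitly approved a sensitive action, which is exactly the information endpoint telemetry alone does not provide.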
For us, this case was a useful reminder that detection engineering is often the easier part. We can write logic to detect suspicious credential access. The harder part is understanding intent when software starts acting on behalf of people. This alert was benign, and that is fine. The detection still did its job. But cases like this will become more common: behaviour that looks like attacker tradecraft, carried out by legitimate tools, under legitimate accounts, for mostly legitimate reasons. SOC teams will need to get comfortable operating in that grey area. The future of detection goes beyond understanding what happened. It also means understanding whether the behaviour was expected, authorised and safe in context.
This case highlights that using AI agents increases the attack surface. From a governance point of view, organisations need to define what an AI agent should be allowed to do, balancing practicality against security. Defenders, in turn, have to account for the use of AI agents when writing detections and triaging alerts. Beyond asking whether something is malicious, they should also establish whether it was expected, authorised and safe, and how they can prove it.