Summary:
Crew / Task agent runs sometimes finish with status `completed`, but the returned result is not the agent's final summary. Instead, it can contain a raw previous tool result such as:
[Previous tool result; call_id=call_...]: {...}
Impact:
- Parent agent may trust a completed Task even though the deliverable is missing or incomplete.
- Tool-result leakage pollutes conversation history, mission outputs, task result storage, and memory archives.
- Coding tasks may leave partial edits without a clear failure signal.
Recent cases observed on 2026-05-18:
1. Developer agent completed but returned a message saying implementation was interrupted midway and listed remaining work.
2. Developer agent completed but the result was only a raw previous tool result from an Edit call:
[Previous tool result; call_id=call_yZqXLVu0CqENETlYMRVOHUac]: {"file_path":"src/mock-runtime.mjs","replacements":1,"changed":true,...}
Historical evidence:
Running `alma memory grep "Previous tool result"` found similar leakage in SketchUp-related threads from 2026-04-29, 2026-04-30, 2026-05-11, and 2026-05-18. Mission output files also contained raw previous tool results, e.g.:
~/.config/alma/missions/*/sprints/*/attempt-*/generator-output.md
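To check how widespread the leakage is in stored mission outputs, a small scan along these lines may help. This is only a sketch: the glob pattern is the path quoted above, while the function name `find_leaked_outputs` and the choice to match on the `[Previous tool result` prefix are assumptions made for illustration.

```python
from pathlib import Path

# Marker text taken from the leaked results quoted in this report.
LEAK_MARKER = "[Previous tool result"

# Path layout quoted in this report, relative to the alma config directory.
OUTPUT_GLOB = "missions/*/sprints/*/attempt-*/generator-output.md"


def find_leaked_outputs(config_root: Path) -> list[Path]:
    """Return generator-output.md files under config_root that contain the marker.

    Hypothetical helper, not part of alma.
    """
    leaked = []
    for path in sorted(config_root.glob(OUTPUT_GLOB)):
        # errors="replace" keeps the scan going if a file has invalid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        if LEAK_MARKER in text:
            leaked.append(path)
    return leaked
```

Calling `find_leaked_outputs(Path.home() / ".config" / "alma")` would list the affected files, assuming the mission outputs live under that directory as in the path above.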
Suspected root cause:
The issue appears to be in crew / harness / agent adapter final-output extraction. If the sub-agent performs a tool call and then does not emit or expose a normal final assistant summary, the wrapper may fall back to the latest transcript item, which can be a tool result. The Task is then marked completed because the process ended cleanly, even though the handoff output is invalid.
Expected behavior:
- Raw tool results should never be used as the final Task result.
- If the final output is missing or malformed, or matches `[Previous tool result`, `call_id=`, raw JSON tool output, etc., mark the Task as invalid, failed, or needing retry, not completed.
- If an agent explicitly says the implementation is incomplete, do not treat the Task as successfully completed or accepted.
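The output checks above could be sketched roughly as follows. This is an illustrative validator, not the harness's actual code: the function names and regular expressions are assumptions, and only the marker strings (`[Previous tool result`, `call_id=`) come from the observed leaks.

```python
import json
import re
from typing import Optional

# Marker strings come from the leaked results in this report;
# the exact regexes are illustrative assumptions.
LEAK_PREFIX = re.compile(r"^\s*\[Previous tool result\b")
CALL_ID = re.compile(r"\bcall_id=call_\w+")


def looks_like_raw_json(text: str) -> bool:
    """True if the whole output parses as a JSON object or array."""
    stripped = text.strip()
    if not stripped.startswith(("{", "[")):
        return False
    try:
        json.loads(stripped)
    except json.JSONDecodeError:
        return False
    return True


def is_valid_final_output(text: Optional[str]) -> bool:
    """Reject missing, empty, or tool-result-shaped final outputs."""
    if text is None or not text.strip():
        return False
    if LEAK_PREFIX.search(text) or CALL_ID.search(text):
        return False
    return not looks_like_raw_json(text)
```

A check like this only covers the leakage patterns. Detecting an agent that explicitly reports incomplete work would need a separate signal, such as a structured status field, rather than text matching.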
Suggested fix:
1. Validate final Task result before marking completed.
2. Prefer the last assistant-authored textual message over the last transcript item.
3. Add an `agent_output_invalid` or `incomplete` status.
4. Retry or ask the sub-agent to summarize actual changes when final output is invalid.
5. For coding agents, require a final summary with changed files and validation results, or an explicit failure/incomplete status.
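As a rough illustration of fixes 1 to 3, the extraction step might look like the sketch below. Everything here is hypothetical: the `TranscriptItem` shape, the role names, and the `final_status` function are assumptions, not the real adapter API. It also takes a stricter reading of fix 2, counting assistant text only if it comes after the last tool result, so an earlier "working on it" message is not mistaken for the final summary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptItem:
    role: str  # assumed roles: "assistant", "tool", "user"
    text: str


def last_assistant_text(transcript: list[TranscriptItem]) -> Optional[str]:
    """Return assistant text emitted after the final tool result, if any."""
    for item in reversed(transcript):
        if item.role == "tool":
            # A tool result is the latest meaningful item: no final summary exists.
            return None
        if item.role == "assistant" and item.text.strip():
            return item.text
    return None


def final_status(
    transcript: list[TranscriptItem], exited_cleanly: bool
) -> tuple[str, Optional[str]]:
    """Map a finished run to (status, result) without falling back to tool output."""
    if not exited_cleanly:
        return "failed", None
    summary = last_assistant_text(transcript)
    # A clean exit alone is not enough: the summary must exist and must not be
    # a leaked tool result stored under the assistant role.
    if summary is None or summary.lstrip().startswith("[Previous tool result"):
        return "agent_output_invalid", None
    return "completed", summary
```

With this shape, a run whose transcript ends on an Edit tool result yields `agent_output_invalid`, which the parent could use to trigger the retry or summarize-again step in fix 4.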
Temporary workaround:
Parent agent will manually inspect files/tests after every crew completion and treat any result containing `[Previous tool result` as invalid.
In Review
Bug Reports
2 days ago

linqi zhang