Pilot docs

How it works

The 8-stage pipeline: from Sentry URL to a deterministic failing pytest in an isolated sandbox.

logomesh is an AI agent with a narrowly scoped job. The agent plans and calls tools. A separate, deterministic Python function writes the actual proof. That split is the whole point: AI is great at planning, but auditors won’t accept evidence written by an AI.
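To make “deterministic” concrete, here is a minimal Python sketch of a synthesizer that turns captured crash values into pytest source. It assumes the crash was captured as a module path, a function name, keyword arguments with plain literal values, and a built-in exception type. `synthesize_test` and its parameters are hypothetical names for illustration, not logomesh’s actual interface.

```python
import textwrap


def synthesize_test(module: str, function: str, kwargs: dict,
                    exc_type: str, exc_message: str) -> str:
    """Build pytest source from captured crash values (hypothetical helper).

    Pure function: no I/O, no randomness, no model calls, so the same
    inputs always produce the same bytes.
    """
    # Sort arguments so the output never depends on capture order.
    call_args = ", ".join(f"{k}={v!r}" for k, v in sorted(kwargs.items()))
    # Assumes exc_type is a built-in exception, so the test needs no extra import.
    return textwrap.dedent(f"""\
        import re

        import pytest

        from {module} import {function}


        def test_{function}_reproduces_crash():
            with pytest.raises({exc_type}, match=re.escape({exc_message!r})):
                {function}({call_args})
        """)
```

Because the output depends only on its inputs, running it twice on the same crash yields byte-identical files, which is what lets an audit record pin the test by hash.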

What the agent does

When Sentry fires, the agent runs a short investigation against your code:

  • Reads the crash — error type, message, stack trace, and the variable values that were in memory at the moment of failure.
  • Locates the code that broke. If the file path in the stack trace doesn’t match your repo layout, it retries using hints from a search step.
  • Checks whether your project needs any dependencies the sandbox is missing, and prepares them in an isolated bundle before the test runs.
  • Calls the deterministic synthesizer to write the failing test.
  • Runs the test in a hardened Docker sandbox (no network, unprivileged user, memory + process caps).
  • Verifies that the sandbox raised the same error your users saw — not a similar error, the same one.
  • If something blocks it, it tells you why (“can’t find your source”, “crash needs database state we don’t have”, etc.) instead of guessing.
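The sandbox constraints listed above map onto standard `docker run` flags. Below is a minimal sketch, assuming the test file is mounted read-only from a host directory and the image already contains Python and pytest. The function name, image name, UID 65534 (conventionally `nobody`), and limit values are illustrative assumptions, not logomesh’s actual configuration; `--cap-drop ALL` and `--security-opt no-new-privileges` are extra hardening the list does not mention.

```python
def sandbox_command(image: str, host_dir: str, test_path: str,
                    memory: str = "512m", pids: int = 128) -> list[str]:
    """Build a `docker run` argv for an isolated pytest run (hypothetical helper)."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network access at all
        "--user", "65534:65534",                # unprivileged UID:GID (commonly `nobody`)
        "--memory", memory,                     # hard memory cap
        "--pids-limit", str(pids),              # cap on processes/threads
        "--cap-drop", "ALL",                    # extra: drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # extra: block setuid escalation
        "--volume", f"{host_dir}:/work:ro",     # mount the repro bundle read-only
        "--workdir", "/work",
        image,
        # "-p no:cacheprovider" stops pytest from trying to write
        # .pytest_cache into the read-only mount.
        "python", "-m", "pytest", "-p", "no:cacheprovider", test_path,
    ]
```

To execute it, pass the list to `subprocess.run(cmd, capture_output=True, text=True, timeout=300)`. Keeping the argument list a pure function makes the exact sandbox policy easy to inspect and test.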

What the agent is NOT allowed to do

This is the part that matters for compliance.

  • Write the test code. The failing test is written by a pure Python function from the captured crash values — never by an AI. The bytes in the test file have no AI in them.
  • Write the audit file. The sealed JSON envelope is built deterministically. It includes a hash of the test bytes and a flag that says llm_in_evidence_path: false.
  • Edit your code. logomesh opens a draft PR. It never pushes, never merges, never “auto-fixes.” Your team owns the change.
  • Fake a green check. If the test raised a different error from the one your users saw, the run is flagged for human review with a structured reason. It is never silently shipped.

What you get back

Every run produces three artifacts: a failing test file, a sealed JSON audit envelope mapped to SOC 2 CC7.3 / CC7.4 and PCI DSS 12.10.5, and a verdict of reproduced, needs_human_review, or error. The audit envelope is what you forward to your reviewer.
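The verdict step can be sketched as a comparison between the exception Sentry reported and the one the sandbox raised. This is a minimal sketch, assuming the sandbox outcome is summarized as an exception type name and message, or as nothing at all when the run never completed (mapped here to `error`, which is itself an assumption). `Outcome`, `classify`, and the reason strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    exc_type: Optional[str]     # None if the test raised nothing
    exc_message: Optional[str]


def classify(expected_type: str, expected_message: str,
             observed: Optional[Outcome]) -> tuple[str, str]:
    """Return (verdict, reason); any mismatch goes to a human, never to green."""
    if observed is None:
        return "error", "sandbox run did not complete"
    if observed.exc_type is None:
        return "needs_human_review", "test passed; crash did not reproduce"
    if observed.exc_type != expected_type:
        return "needs_human_review", f"expected {expected_type}, got {observed.exc_type}"
    if observed.exc_message != expected_message:
        return "needs_human_review", "same exception type, different message"
    return "reproduced", "exception type and message match"
```

Exact message matching is the strict choice. Messages that embed ids or memory addresses would need normalizing first, and this sketch routes every mismatch to review rather than guessing.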