Agent orientation artifact · last updated 2026-05-18

AI Adapt Scan Project Status

Static project tracker optimized for the next agent and for human status checks. Update the projectSections array in this route when a slice moves.

done · in progress · waiting · blocked

Read First

This route is a rendered artifact, not a new source of business logic. The lock docs still win when there is a conflict.

Agent update contract

Keep statuses terse. Use done only when the route or backend path exists and typechecks. Use waiting for product-owner decisions. Use blocked for non-engineering blockers. Use next for buildable backlog.
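The contract above can be made mechanical with a typed status field. The following is a minimal sketch, assuming a hypothetical item shape for the projectSections array; the field names (`id`, `summary`, `evidence`, `next`, `children`) and the `countByStatus` helper are illustrative and not taken from the route's actual code.

```typescript
// Statuses used in this tracker. The item shape below is an assumption.
type Status = "done" | "in progress" | "waiting" | "blocked" | "next";

interface ProjectItem {
  id: string; // slug such as "p0-baseline-pulse-proof"
  title: string;
  summary: string;
  status: Status;
  evidence?: string[]; // short chips, e.g. routes or artifact names
  next?: string; // the "Next:" line
  children?: ProjectItem[];
}

// Count items per status, walking nested children as well.
function countByStatus(items: ProjectItem[]): Record<Status, number> {
  const counts: Record<Status, number> = {
    "done": 0,
    "in progress": 0,
    "waiting": 0,
    "blocked": 0,
    "next": 0,
  };
  const visit = (item: ProjectItem): void => {
    counts[item.status] += 1;
    (item.children ?? []).forEach(visit);
  };
  items.forEach(visit);
  return counts;
}

const sample: ProjectItem[] = [
  {
    id: "p0-baseline-pulse-proof",
    title: "P0 - Prove one baseline -> pulse path end to end",
    summary: "Prove the closed loop from baseline report to pulse report.",
    status: "next",
    children: [
      {
        id: "p0-pulse-data-threshold",
        title: "Get pulse run to anonymity threshold",
        summary: "Needs enough rater responses to analyze.",
        status: "next",
      },
    ],
  },
  {
    id: "operator-review",
    title: "Human review before delivery",
    summary: "Reports stop until an operator approves delivery.",
    status: "done",
  },
];

const counts = countByStatus(sample);
```

A union type like this lets the compiler reject any status string outside the five the tracker uses.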

Source docs folded into this tree

- Docs/Working Docs/14-15-agent-response.md
- Docs/Working Docs/2026-05-15-session-handoff.md
- Docs/Working Docs/2026-05-17-session-handoff.md
- Docs/Working Docs/Findings/2026-05-17-investor-readiness-testing-priorities.md
- Docs/Working Docs/Notifications & Nudging/2026-05-17-notification-center-mvp-handoff.md
- Docs/Working Docs/Notifications & Nudging/2026-05-17-notifications-nudging-system-layer-handoff.md
- Docs/Working Docs/Notifications & Nudging/2026-05-17-response-loop-intelligence-findings-and-plan.md
- Docs/Working Docs/2026-05-17-mcp-platform-simulation-exploration.md
- Docs/Backend/API Documentation/01.md
- Docs/Backend/2026-05-12-agent-pipeline-handoff.md
- Docs/Backend/2026-05-12-layer1-loop-memory-handoff.md
- Docs/Backend/2026-05-12-memory-proof-project-status-handoff.md
- Docs/Backend/2026-05-14-current-state-orientation.md
- Docs/Backend/2026-05-15-observability-implementation-handoff.md
- Docs/Frontend/2026-05-12-frontend-investor-demo-loop-memory-handoff.md
- Docs/Feedback /Round 2/2026-05-14-claude-handoff-full-context.md
- Docs/Feedback /Round 2/2026-05-14-round-2-orientation.md
- Docs/Feedback /Round 2/2026-05-14-strategy-context-phase-1b-handoff.md
- Docs/Feedback /Round 2/Todo-Today.md
- Docs/Feedback /Round 1/Business Logic/2026-05-14-ai-adoption-scan-business-logic.md

Prioritized Todo List

Ordered by current product risk: prove the closed loop first, then add the activation layer, then harden production.

P0 - Prove one baseline -> pulse path end to end (p0-baseline-pulse-proof)
The investor-readiness docs agree that the weakest point is not report generation; it is proving the closed loop from baseline report to commitment, pulse launch, pulse report, and next action.
next
prior_run_context · pulse_delta · scan_action_commitments · pulse_measurement_plans
Next: Use one seeded or real leader path, submit/seed pulse rater responses, run analysis, inspect memory artifacts, deliver through operator review, and verify the leader can understand the pulse delta without engineering narration.
- Get pulse run to anonymity threshold (p0-pulse-data-threshold)
  Leader-side pulse launch exists, but the next proof needs a pulse run with enough rater responses to analyze.
  next
  Next: Use the first-party rater route or dev seeding endpoint to reach the report-unlock rule (at least 3 responders and a response rate above 60%) on the linked pulse scan.
- Run pulse pipeline and operator review (p0-pulse-report-review)
  Memory injection is implemented, but the product still needs a full pulse report quality check with Strategy Context snapshots.
  next
  Next: Start analysis, inspect report_markdown, safety, prior_run_context, and pulse_delta, then use the operator review path to deliver.
- Make the demo path replayable (p0-demo-reset-script)
  Current local state was manually mutated during pulse launch testing. A deterministic demo path should not depend on hand-edited rows.
  next
  Next: Add or document a safe seed/reset command for one baseline/pulse fixture that can be replayed before demos.
P1 - Build Dialogue Agent and validation harness (p1-dialogue-agent)
Docs consistently identify per-gap conversation prompts as the highest-value next leader activation slice after the loop proof.
next
dialogue_prompts artifact missing · no dialogue step in pipeline.ts
Next: Add structured dialogue_prompts, include them in Safety, summarize them in the report/results surface, and validate with acme-custom plus an adversarial unapproved-tool case.
- Artifact and structured output (p1-dialogue-schema)
  Schema should carry targetPracticeId, gapDirection, gapMagnitude, openingQuestion, probes, listenFor, and candidateCommitment.
  next
  Next: Add dialogue_prompts to artifact_type, pipeline ArtifactKind, prompt builder, and parser.
- Safety and report integration (p1-dialogue-safety)
  Dialogue guidance must respect anonymity, approved tools, restricted data, no scripts, and no causal or ROI claims.
  next
  Next: Add dialogue evidence to Safety and a compact Conversations to run section after the gap report sections.
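The dialogue schema fields listed under p1-dialogue-schema could be typed and validated roughly as follows. This is a sketch only: the field names come from the tracker, but every field type is an assumption, the allowed values of `gapDirection` are not specified anywhere in this document, and `parseDialoguePrompt` is a hypothetical helper rather than the pipeline's parser.

```typescript
// Field names come from the tracker; the field types are assumptions.
interface DialoguePrompt {
  targetPracticeId: string;
  gapDirection: string; // allowed values are not specified in the tracker
  gapMagnitude: number;
  openingQuestion: string;
  probes: string[];
  listenFor: string[];
  candidateCommitment: string;
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

// Hypothetical parser: returns a typed prompt, or null when the shape is wrong.
function parseDialoguePrompt(raw: unknown): DialoguePrompt | null {
  if (typeof raw !== "object" || raw === null) return null;
  const r = raw as Record<string, unknown>;
  const targetPracticeId = r["targetPracticeId"];
  const gapDirection = r["gapDirection"];
  const gapMagnitude = r["gapMagnitude"];
  const openingQuestion = r["openingQuestion"];
  const probes = r["probes"];
  const listenFor = r["listenFor"];
  const candidateCommitment = r["candidateCommitment"];
  if (typeof targetPracticeId !== "string") return null;
  if (typeof gapDirection !== "string") return null;
  if (typeof gapMagnitude !== "number" || !Number.isFinite(gapMagnitude)) return null;
  if (typeof openingQuestion !== "string") return null;
  if (!isStringArray(probes) || !isStringArray(listenFor)) return null;
  if (typeof candidateCommitment !== "string") return null;
  return {
    targetPracticeId,
    gapDirection,
    gapMagnitude,
    openingQuestion,
    probes,
    listenFor,
    candidateCommitment,
  };
}
```

Rejecting malformed output at parse time keeps a bad model response from reaching Safety or the report surface as a half-filled prompt.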
P1 - Add communication ledger and loop health (p1-orchestration-visibility)
The notification docs reframe this as response-loop intelligence: Sarah needs to see who is stuck, what was sent, and whether the desired product action happened. The first email ledger and operator backend inbox QA layer now exist.
in progress
no notifications tables · communication_events/recipients/messages added for real Resend paths · persona inbox QA is currently under /operator/backend, not a dedicated /operator/communications route
Next: Continue from the database-backed inbox simulator: add outcome completion updates, loop-health rollups, notification projections, and eventually a dedicated read-only operator communications timeline.
P1 - Finish pulse and reminder orchestration (p1-pulse-orchestration)
Leader-side start-now and operator support launch exist, but due pulses, reminders, and re-invites still depend on manual action.
next
Next: Build an idempotent scheduler/reconcile path for due pulse drafts, rater reminders at 48h and 7d, and pulse re-invites.
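One way to keep the reminder part of that reconcile path idempotent is to derive a stable key per rater and reminder kind, and skip any key already recorded as sent. The sketch below assumes hypothetical `RaterInvite` and `ReminderJob` shapes and an in-memory set of sent keys; only the 48h and 7d offsets come from this tracker.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Offsets from the tracker: 48 hours and 7 days after the invite.
const REMINDER_OFFSETS = [
  { kind: "48h", afterMs: 48 * HOUR_MS },
  { kind: "7d", afterMs: 7 * 24 * HOUR_MS },
];

// Hypothetical shapes for illustration only.
interface RaterInvite {
  scanId: string;
  raterId: string;
  invitedAt: Date;
  respondedAt: Date | null;
}

interface ReminderJob {
  key: string; // idempotency key: scan, rater, and reminder kind
  scanId: string;
  raterId: string;
  kind: string;
}

// Return reminders that are due now and have not been sent before.
function dueReminders(
  invites: RaterInvite[],
  now: Date,
  alreadySent: ReadonlySet<string>,
): ReminderJob[] {
  const jobs: ReminderJob[] = [];
  for (const invite of invites) {
    if (invite.respondedAt !== null) continue; // responders get no reminders
    for (const offset of REMINDER_OFFSETS) {
      const key = `${invite.scanId}:${invite.raterId}:${offset.kind}`;
      const dueAt = invite.invitedAt.getTime() + offset.afterMs;
      if (now.getTime() >= dueAt && !alreadySent.has(key)) {
        jobs.push({ key, scanId: invite.scanId, raterId: invite.raterId, kind: offset.kind });
      }
    }
  }
  return jobs;
}
```

Running the function twice with the first run's keys added to `alreadySent` yields no jobs the second time, which is the property a scheduler retry needs.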
P2 - Groups, campaigns, and Sarah rollups (p2-groups-campaigns-rollups)
Current scans still use free-text cohort/team fields. The lock docs make groups, group_members, scan_campaigns, campaign status, and cohort rollups first-class before scale.
next
Next: Add campaign schema and launch semantics only after the single-leader loop is demo-reliable.
P2 - Enterprise hardening before scale (p2-production-hardening)
Scoped auth, durable jobs, token lifecycle, artifact retention, cost controls, and horizontal event fanout are still production blockers.
next
Next: Prioritize cross-tenant negative tests and token TTL/revocation before exposing broader customer data.
Waiting - Product-owner decisions (waiting-product-decisions)
Some important slices should not be guessed by engineering.
waiting
Next: Resolve the copy for the 36 self-view prompts, the CXTO ROI proxy set, brand/domain/sender identity, group scheduling semantics, the pulse cadence default, and blocked-scan UX.

Product Chain

The Sarah -> leader -> rater loop, with executive layer tracked separately.

Operator surface: Sarah (operator-sarah)
Strategy Context capture, operator scan list, leader invite creation, review gate, and shared Sarah nav are built.
done
/strategy-context · /operator/scans · /operator/scans/new · /operator/scans/[id]/review
- Strategy Context substrate and snapshot timing (strategy-context-substrate)
  Organizations, organization context, scan snapshots, and interpretation-only pipeline grounding exist.
  done
  organization_context_snapshot artifact · snapshot captured when self-view starts
- Human review before delivery (operator-review)
  Reports still stop at review/flagged/blocked states until an operator approves delivery.
  done
- Groups, campaigns, and scheduled sends (campaigns)
  Still deferred. Current leader scans use free-text team/cohort fields, not campaign tables.
  next
  Next: Build groups, group_members, scan_campaigns, campaign send scheduling, and cohort rollups.
Leader surface: Johan / Mikael (leader-surface)
Leader can complete self-view, invite team, watch status, view report, commit actions, schedule/manage a pulse, and start a due pulse.
done
/leader-scans/[id]/self-view · /leader-scans/[id]/team-invites · /leader-scans/[id]/status · /leader-scans/[id]/results
- First-party self-view UX (self-view)
  36-statement flow, immutable submit, auto-advance, keyboard shortcuts, and locked post-submit state.
  done
- Gap-first report and action commitments (leader-results)
  Leader-facing report viewer renders markdown, saves generated commitments, and starts the loop.
  done
- Pulse draft management (pulse-management)
  Leader status page shows committed actions, next pulse, targeted practices, edit drawer, and cancel/update actions.
  done
- Leader-side due pulse launch (leader-pulse-launch)
  Baseline dashboard can start a due pulse draft, lock the plan, launch rater invites, and link to the pulse dashboard.
  done
  POST /leader-scans/:id/pulse-draft/launch · Start pulse now CTA
- Auto-fire pulse on due date (pulse-auto-fire)
  No scheduler yet. A draft pulse remains draft until launched manually.
  next
  Next: Add a daily job that finds pulse scans with status=draft and pulseDueAt <= now, then triggers the existing invite path.
- Operator pulse launch / support control (operator-pulse-support-launch)
  Operator scan detail now shows a support control for scheduled draft pulses. It launches the pulse immediately through the existing pulse-draft launch endpoint and sends planned rater invites without impersonating the leader.
  done
  /operator/scans/[id] · POST /leader-scans/:id/pulse-draft/launch
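The selection rule described under pulse-auto-fire (status=draft and pulseDueAt <= now) is small enough to sketch as a pure function. The `PulseScan` shape below is an assumption for illustration; only the `status` and `pulseDueAt` names and the `draft` value come from this tracker, and the launch call itself is left out because its real signature is not documented here.

```typescript
// Hypothetical row shape; only status and pulseDueAt are named in the tracker.
interface PulseScan {
  id: string;
  status: string;
  pulseDueAt: Date | null;
}

// Select draft pulses whose due date has passed. A scheduler would hand
// these to the existing pulse-draft launch path.
function findDuePulseDrafts(scans: PulseScan[], now: Date): PulseScan[] {
  return scans.filter(
    (scan) =>
      scan.status === "draft" &&
      scan.pulseDueAt !== null &&
      scan.pulseDueAt.getTime() <= now.getTime(),
  );
}
```

Because launching moves a pulse out of draft, a second scheduler pass over the same rows selects nothing, so the filter alone gives the job a natural stopping condition.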
Rater surface: Anna (rater-surface)
Raters use the first-party anonymous in-app questionnaire (no third-party form service).
done
/leader-scans/[id]/rater#rater_token=... · /raters
- First-party rater questionnaire (rater-questionnaire)
  Token-authenticated anonymous rater flow mirrors the self-view interaction pattern and submits directly to backend.
  done
- Read-only rater links overview (rater-overview)
  Operator page lists scan runs newest-first, exposes each rater link, shows completion, and includes all scans for that leader.
  done
- Rater reminders (rater-reminders)
  48-hour and 7-day reminder orchestration is not built.
  next
  Next: Add reminder jobs after scheduler conventions are decided.
Executive surface: CXTO (executive-surface)
Persona exists in the model, but the executive dashboard is not built and ROI proxy rules need product confirmation.
waiting
Next: Confirm leading-indicator ROI proxy set before building /executive.

Backend Core

Validated diagnostic, pipeline, memory, auth, and review foundation.

4P / 12P / 36-statement instrument (instrument)
Rater prompts and deterministic scoring are implemented. Self-view prompt slots exist.
done
36 product-approved self-view prompts (self-prompts)
Engineering supports draft placeholder mode, but production copy still needs product-owner approval.
blocked
Next: When approved, disable CLAUDE_SELFPROMPT_DRAFT_MODE for production flows.
Agent pipeline and Safety pass (pipeline)
Context, Diagnostic, Themes, Experiments, Habits, Report, Safety, and repair pass are wired and tested.
done
Loop memory layer (memory)
prior_run_context, pulse_delta, scan_action_commitments, and pulse_measurement_plans persist loop state in DB.
done
Gap-aware pulse full validation (pulse-validation)
Pulse memory injection exists and leader launch works, but a full pulse run with rater data, report generation, review, and delivery still needs exercising.
next
Next: Run a pulse through threshold, analysis, Safety, operator delivery, and leader results. Confirm prior_run_context and pulse_delta shape the output.
Dialogue Agent (dialogue-agent)
Still the highest-value post-loop product slice: per-gap conversation prompts between baseline and pulse.
next
Next: Build dialogue_prompts artifact, Safety wiring, report summary section, and validation harness.

Agent Runtime Experiments

Future Anthropic Managed Agents, memory, outcomes, multi-agent, and MCP tests kept separate from the current SDK pipeline.

Current Agent SDK baseline (current-agent-sdk-baseline)
The production path remains the existing Claude Agent SDK pipeline with Postgres-backed scan state, artifacts, Safety pass, and loop memory.
done
Next: Treat Managed Agents as a spike track until cost, access, data handling, and quality gates are proven against this baseline.
Managed Agents evaluation track (managed-agents-evaluation-track)
Official docs position Managed Agents as a beta hosted harness for long-running, stateful sessions with agent definitions, environments, sessions, tools, MCP, and persisted events.
next
Next: Create a small spike branch that mirrors one non-critical report or dialogue flow; do not migrate the core pipeline by default.
Read-only memory store spike (managed-memory-store-spike)
Memory stores look useful for agent-facing reference material, but product truth should stay in Postgres and untrusted rater or team notes should not enter read-write memory.
next
Next: Test one read-only org/practice/policy store mounted into a Managed Agent session and compare prompt size, report quality, and traceability against the current DB snapshot path.
Read-only product-state MCP tools (product-state-mcp-tools)
The safest bridge is curated backend tools over product state, not raw SQL or free-form database access.
next
Next: Expose get_scan_status, get_loop_state, get_report_artifact, and list_scan_events as read-only MCP tools/resources before any managed runtime test.
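The idea of curated read-only tools can be illustrated without any MCP library. The sketch below is plain TypeScript over an in-memory store and does not use the MCP SDK; the `ScanRecord` shape, the argument objects, and the return shapes are assumptions, and only two of the four tool names from the tracker are shown.

```typescript
// Hypothetical product-state row; the real schema lives in Postgres.
interface ScanRecord {
  id: string;
  status: string;
  events: string[];
}

// Only read operations are exposed. There is no write or raw-query tool.
interface ReadOnlyTools {
  get_scan_status(args: { scanId: string }): { scanId: string; status: string };
  list_scan_events(args: { scanId: string }): string[];
}

function buildReadOnlyTools(store: ReadonlyMap<string, ScanRecord>): ReadOnlyTools {
  const requireScan = (scanId: string): ScanRecord => {
    const scan = store.get(scanId);
    if (scan === undefined) throw new Error(`unknown scan: ${scanId}`);
    return scan;
  };
  return {
    get_scan_status: ({ scanId }) => ({ scanId, status: requireScan(scanId).status }),
    // Return a copy so callers cannot mutate stored events.
    list_scan_events: ({ scanId }) => requireScan(scanId).events.slice(),
  };
}
```

Keeping the tool surface to a fixed interface means an agent can only ask the questions the backend has chosen to answer, which is the contrast with raw SQL access drawn above.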
Managed Outcomes quality gate spike (outcomes-quality-gate-spike)
Outcomes could validate dialogue prompts, diagnostics, and reports with a rubric-driven grader, but access is gated and it must be compared with the current Safety pass.
waiting
Next: Request access, then test one dialogue_prompts rubric for opening question, probes, evidence grounding, commitment ask, privacy guardrails, and no-ROI claims.
Managed multi-agent spike (managed-multiagent-spike)
Official docs support a coordinator delegating to context-isolated agents with their own model, prompt, tools, MCP, and skills, sharing the same filesystem.
waiting
Next: After access is approved, test this on cross-leader pattern diagnosis or dialogue design, not on the current single-scan report chain.
Dreams research preview (dreaming-research-preview)
Dreams could support weekly consolidation of patterns across sessions and memory stores, but it is research preview and increases the risk of treating contaminated notes as trusted memory.
waiting
Next: Only test after a sanitization/redaction policy exists and the read-only memory-store spike is understood.
Managed Agent webhooks / async completion (managed-webhook-async-flow)
Managed webhooks are interesting for long-running sessions, while the current app already has pipeline_events and live status paths.
next
Next: If a Managed Agent spike runs, map webhook callbacks into existing pipeline_events, operator notifications, and failure states instead of creating a parallel event model.
Agent SDK billing and unit economics (agent-sdk-billing-unit-economics)
Anthropic docs flag that Agent SDK and claude -p subscription-plan usage moves to a separate monthly Agent SDK credit on 2026-06-15.
next
Next: Model per-scan and per-pulse cost with API-key billing assumptions, per-org budgets, and queue concurrency limits before increasing automation.

Frontend Surfaces

Routes now map to the four-persona story, with remaining gaps marked explicitly.

Home persona entry cards (home-personas)
Executive, Operator, Leaders, Raters, and Yomento operator cards exist. Backend personas decide signed-in access.
done
Sarah operator chrome (operator-chrome)
Shared navigation, width normalization, operator pages, and review polish landed.
done
Leader scan list run grouping (leader-flat-list)
/leader-scans now groups baseline and pulse runs under each leader, and /raters shows related baseline/pulse history inside each scan card.
done
frontend/src/lib/leader-dashboard.ts · /leader-scans · /raters
Blocked scan UX (blocked-ux)
Product decision still open for what Johan sees and what Sarah sees when Safety blocks a scan.
waiting
Next: Default remains neutral leader copy and operator-visible status.
Strategy Context mid-flight warning (mid-flight-context-banner)
Snapshot mechanics are enforced, but Sarah has no UI cue that edits affect only future scans.
next
Next: Add a banner when active scans exist.

Demo And Production Readiness

What matters for the Skanska path and what still blocks production.

Clickable Sarah -> leader -> rater chain (skanska-clickable-chain)
Current app has functional Sarah setup, leader self-view/status/results/loop, and first-party rater flow.
done
Investor demo loop rehearsal (investor-demo-loop-rehearsal)
The clickable chain exists, but the docs call out the baseline -> pulse -> delta path as the remaining proof that makes the product feel like more than a one-time report.
next
Next: Rehearse one deterministic Sarah -> Johan -> Anna -> review -> commitments -> pulse -> delta path and capture any breakages before adding broad campaign surfaces.
Skanska deadline tracking (skanska-dates)
Docs cite a clickable prototype due on 2026-05-19 or 2026-05-20 and a client meeting on 2026-05-25.
waiting
Next: Confirm whether this external deadline is still live before prioritizing polish over Dialogue Agent.
Multi-stage email orchestration (email-orchestration)
Leader invite and rater invite paths exist. Reminder cadence, pulse re-invite, campaign milestones, and complete lifecycle orchestration remain open.
next
- Operator notification center (notification-center)
  No durable notifications or notification_deliveries tables exist yet. The MVP should project high-signal scan events into a bell/inbox without rendering raw pipeline_events.
  next
  Next: Add notifications, notification_deliveries, projection for ready/failed/delivered/pulse events, list/count/read routes, and a bell in OperatorShell.
- Operator Communications Timeline (operator-communications-monitor)
  Initial real email visibility now lives in /operator/backend as persona inbox QA backed by communication_events, communication_recipients, and communication_messages. A fuller timeline still needs hierarchy filters, campaign context, and outcome state transitions.
  in progress
  Next: Add campaign/group fields when those models land, compute delivered/opened/clicked/responded transitions, and graduate the read-only view into /operator/communications once the backend lab surface gets crowded.
- Loop health and response outcomes (loop-health)
  The response-loop docs prioritize knowing who is stuck, which action is blocking progress, and whether a nudge produced the intended product action.
  next
  Next: Compute leader loop health states, attach desired outcomes to communication rows, and show stuck loops with owner, blocker, recommended action, and last nudge.
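The notification-center item above asks for a projection of high-signal scan events instead of rendering raw pipeline_events. A minimal sketch of that filter follows, assuming hypothetical event and notification shapes. The four kinds mirror the "ready/failed/delivered/pulse" wording in the tracker, but the exact event names in the real pipeline_events table are not documented here.

```typescript
// Kinds worth surfacing in a bell/inbox. The names are assumptions.
const NOTIFIABLE_KINDS = ["ready", "failed", "delivered", "pulse"];

// Hypothetical shapes for illustration only.
interface ScanEvent {
  scanId: string;
  kind: string;
  at: Date;
}

interface OperatorNotification {
  scanId: string;
  kind: string;
  message: string;
  createdAt: Date;
}

// Project raw events into notifications, dropping low-signal kinds.
function projectNotifications(events: ScanEvent[]): OperatorNotification[] {
  return events
    .filter((event) => NOTIFIABLE_KINDS.indexOf(event.kind) >= 0)
    .map((event) => ({
      scanId: event.scanId,
      kind: event.kind,
      message: `Scan ${event.scanId}: ${event.kind}`,
      createdAt: event.at,
    }));
}
```

Filtering at projection time keeps step-level pipeline noise out of the operator's inbox while leaving the raw event log untouched for debugging.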
No-ROI and anonymity guardrails (claims-policy)
Reports remain leading-indicator only, deterministic scoring is unchanged, and the anonymity gate (at least 3 responders AND response rate above 60%) remains hard.
done
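The anonymity gate stated above (at least 3 responders AND a response rate above 60%) can be written as a single predicate. This is a sketch under one assumption: the response rate is taken as responders divided by invited raters, which the tracker does not spell out. The function name is hypothetical.

```typescript
const MIN_RESPONDERS = 3;

// True only when both conditions of the gate hold.
// Assumption: response rate = responders / invited raters.
function isReportUnlocked(responders: number, invited: number): boolean {
  if (invited <= 0) return false;
  // responders / invited > 0.60, compared as integers to avoid float rounding.
  const rateAboveSixtyPercent = responders * 100 > invited * 60;
  return responders >= MIN_RESPONDERS && rateAboveSixtyPercent;
}
```

Under that assumption, 3 of 5 raters is exactly 60% and does not unlock the report, while 3 of 4 does. A team of 2 never unlocks, even at a 100% response rate.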
Name, domain, and sender domain (brand-domain)
Echo / AI Adapt Scan are working names. The long-term brand and email sender domain still need an owner decision.
waiting

Production Architecture Hardening

Risks from the production-readiness discussion that should be handled before scale.

Scoped auth and tenant isolation (scoped-auth)
The prototype still relies heavily on broad admin-token access. Production needs role-scoped access, auditability, and hard tests against cross-organization data leakage.
next
Next: Replace broad internal access paths with scoped permissions for operator, leader, rater, executive, and service jobs. Add cross-tenant negative tests for every org/scan/rater query.
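A cross-tenant negative test needs a lookup that takes the caller's organization as an explicit argument. The following sketch assumes a hypothetical `Scan` shape with an `orgId` field and an in-memory list in place of the database; it shows the pattern, not the backend's actual query layer.

```typescript
// Hypothetical row shape; orgId stands in for the tenant key.
interface Scan {
  id: string;
  orgId: string;
  leaderName: string;
}

// Every lookup takes the caller's org and refuses rows from any other org.
// Missing and foreign scans both return null, so a caller cannot probe
// whether a scan exists in another tenant.
function getScanForOrg(
  scans: ReadonlyArray<Scan>,
  callerOrgId: string,
  scanId: string,
): Scan | null {
  const scan = scans.find((s) => s.id === scanId);
  if (scan === undefined || scan.orgId !== callerOrgId) return null;
  return scan;
}
```

The negative case is the important one: a valid scan id requested from the wrong organization must look exactly like a scan that does not exist.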
Durable background jobs (durable-background-jobs)
AI pipeline runs, email sends, reminders, pulse launch, retries, and scheduler work should not depend on one request lifecycle or one process staying alive.
next
Next: Introduce a real queue/job runner with idempotency keys, retry ceilings, dead-letter handling, and visible job state.
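The three properties named above (idempotency keys, retry ceilings, dead-letter handling) can be shown with an in-memory stand-in. This sketch is not durable and is not a substitute for a real queue; `createJobRunner`, the `Job` shape, and the synchronous `work` callback are all assumptions made to keep the example short.

```typescript
type JobState = "queued" | "succeeded" | "dead";

interface Job {
  key: string; // idempotency key
  attempts: number;
  state: JobState; // visible job state
  lastError: string | null;
}

// In-memory illustration only. A production runner would persist jobs.
function createJobRunner(maxAttempts: number) {
  const jobs = new Map<string, Job>();

  // Enqueueing the same key twice returns the existing job.
  function enqueue(key: string): Job {
    const existing = jobs.get(key);
    if (existing !== undefined) return existing;
    const job: Job = { key, attempts: 0, state: "queued", lastError: null };
    jobs.set(key, job);
    return job;
  }

  // Run one attempt. After maxAttempts failures the job is dead-lettered.
  function runOnce(key: string, work: () => void): Job {
    const job = enqueue(key);
    if (job.state !== "queued") return job; // finished or dead: do nothing
    job.attempts += 1;
    try {
      work();
      job.state = "succeeded";
    } catch (err) {
      job.lastError = err instanceof Error ? err.message : String(err);
      if (job.attempts >= maxAttempts) job.state = "dead";
    }
    return job;
  }

  return { enqueue, runOnce };
}
```

A key such as a scan id plus step name would make a repeated "start analysis" request resolve to the same job instead of starting a second pipeline run.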
Horizontal live-event scaling (sse-horizontal-scale)
If SSE or pipeline events are process-local, multiple backend instances will make live status unreliable.
next
Next: Back event fanout with Redis/pubsub or another shared event bus before running more than one backend instance.
Leader and rater token lifecycle (token-lifecycle)
Magic links and rater links are operationally useful, but production needs expiry, revocation, resend semantics, and careful internal exposure controls.
next
Next: Define token TTLs, rotation/revoke endpoints, resend behavior, and audit logs for token-bearing links shown in operator tools.
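Expiry and revocation reduce to one check at the point where a token-bearing link is redeemed. The sketch below assumes a hypothetical `LinkToken` record with an issue time, a TTL, and an optional revocation time; the TTL values themselves are a product decision this tracker leaves open.

```typescript
// Hypothetical token record; field names are assumptions.
interface LinkToken {
  id: string;
  issuedAt: Date;
  ttlMs: number;
  revokedAt: Date | null;
}

type TokenCheck = "valid" | "expired" | "revoked";

// Revocation wins over expiry so audit logs show the explicit action.
function checkToken(token: LinkToken, now: Date): TokenCheck {
  if (token.revokedAt !== null && token.revokedAt.getTime() <= now.getTime()) {
    return "revoked";
  }
  if (now.getTime() >= token.issuedAt.getTime() + token.ttlMs) {
    return "expired";
  }
  return "valid";
}
```

Resend semantics could then be expressed as revoking the old token and issuing a new one, so that only the most recent link is redeemable.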
Prompt and artifact retention policy (prompt-artifact-retention)
Full prompts and artifacts are valuable for debugging but can contain sensitive organization context or open-text feedback.
next
Next: Define retention windows, redaction rules, role-gated access, and production-safe debug views for agent_runs.prompt and raw artifacts.
AI cost, rate, and concurrency controls (ai-cost-rate-controls)
Per-step caps exist, but production also needs per-org budgets, queue concurrency limits, rate limiting, retry ceilings, and cost/failure dashboards.
next
Next: Add operational limits around pipeline start, retry behavior, Anthropic spend, and concurrent agent runs.
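A pre-flight check at pipeline start is one place where per-org budgets and concurrency limits could be enforced. This is a sketch with hypothetical shapes and units (US dollars, a monthly window); the tracker does not define how spend is measured or which limits apply.

```typescript
// Hypothetical usage and limit records for one organization.
interface OrgUsage {
  spentUsdThisMonth: number;
  runningAgentRuns: number;
}

interface OrgLimits {
  monthlyBudgetUsd: number;
  maxConcurrentRuns: number;
}

type StartDecision = "ok" | "budget_exceeded" | "concurrency_limit";

// Decide whether a new pipeline run may start for this organization.
function canStartPipeline(
  usage: OrgUsage,
  limits: OrgLimits,
  estimatedCostUsd: number,
): StartDecision {
  if (usage.spentUsdThisMonth + estimatedCostUsd > limits.monthlyBudgetUsd) {
    return "budget_exceeded";
  }
  if (usage.runningAgentRuns >= limits.maxConcurrentRuns) {
    return "concurrency_limit";
  }
  return "ok";
}
```

Returning a named reason instead of a boolean gives dashboards and operator messages something specific to report when a run is refused.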
