Agent orientation artifact · last updated 2026-05-18

AI Adapt Scan Project Status

Static project tracker optimized for the next agent and for human status checks. Update the projectSections array in this route when a slice moves.

done 21 · in progress 2 · waiting 8 · blocked 1 · next 31
Read First
This route is a rendered artifact, not a new source of business logic. The lock docs still win when there is a conflict.

Agent update contract

Keep statuses terse. Use done only when the route or backend path exists and typechecks. Use waiting for product-owner decisions. Use blocked for non-engineering blockers. Use next for buildable backlog.
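
The status vocabulary above can be sketched as a type, together with a tally like the one the header counters show. This is a minimal sketch: the `ProjectItem` and `ProjectSection` field names are assumptions, not the actual shape of the projectSections array, and counting nested items is also an assumption.

```typescript
// Hypothetical shapes; the real projectSections entries may differ.
type Status = "done" | "in progress" | "waiting" | "blocked" | "next";

interface ProjectItem {
  id: string; // slug such as "p0-pulse-data-threshold"
  title: string;
  status: Status;
  summary: string;
  evidence?: string[]; // routes, artifacts, or tables backing the status
  next?: string; // next step for items that are not done
  children?: ProjectItem[];
}

interface ProjectSection {
  title: string;
  items: ProjectItem[];
}

// Walk every item, including nested children, and tally statuses.
function countStatuses(sections: ProjectSection[]): Record<Status, number> {
  const counts: Record<Status, number> = {
    done: 0,
    "in progress": 0,
    waiting: 0,
    blocked: 0,
    next: 0,
  };
  const visit = (item: ProjectItem): void => {
    counts[item.status] += 1;
    (item.children ?? []).forEach(visit);
  };
  for (const section of sections) {
    section.items.forEach(visit);
  }
  return counts;
}

const sampleSections: ProjectSection[] = [
  {
    title: "Prioritized Todo List",
    items: [
      {
        id: "p0-baseline-pulse-proof",
        title: "P0 - Prove one baseline -> pulse path end to end",
        status: "next",
        summary: "Prove the closed loop from baseline report to pulse delta.",
        children: [
          {
            id: "p0-pulse-data-threshold",
            title: "Get pulse run to anonymity threshold",
            status: "next",
            summary: "Reach n >= 5 on the linked pulse scan.",
          },
        ],
      },
    ],
  },
  {
    title: "Backend Core",
    items: [
      { id: "memory", title: "Loop memory layer", status: "done", summary: "Loop state persists in DB." },
    ],
  },
];

console.log(countStatuses(sampleSections));
```

Typing the status as a closed union means a misspelled status in the array fails typechecking rather than silently skewing the counters.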

Source docs folded into this tree

Docs/Working Docs/14-15-agent-response.md
Docs/Working Docs/2026-05-15-session-handoff.md
Docs/Working Docs/2026-05-17-session-handoff.md
Docs/Working Docs/Findings/2026-05-17-investor-readiness-testing-priorities.md
Docs/Working Docs/Notifications & Nudging/2026-05-17-notification-center-mvp-handoff.md
Docs/Working Docs/Notifications & Nudging/2026-05-17-notifications-nudging-system-layer-handoff.md
Docs/Working Docs/Notifications & Nudging/2026-05-17-response-loop-intelligence-findings-and-plan.md
Docs/Working Docs/2026-05-17-mcp-platform-simulation-exploration.md
Docs/Backend/API Documentation/01.md
Docs/Backend/2026-05-12-agent-pipeline-handoff.md
Docs/Backend/2026-05-12-layer1-loop-memory-handoff.md
Docs/Backend/2026-05-12-memory-proof-project-status-handoff.md
Docs/Backend/2026-05-14-current-state-orientation.md
Docs/Backend/2026-05-15-observability-implementation-handoff.md
Docs/Frontend/2026-05-12-frontend-investor-demo-loop-memory-handoff.md
Docs/Feedback /Round 2/2026-05-14-claude-handoff-full-context.md
Docs/Feedback /Round 2/2026-05-14-round-2-orientation.md
Docs/Feedback /Round 2/2026-05-14-strategy-context-phase-1b-handoff.md
Docs/Feedback /Round 2/Todo-Today.md
Docs/Feedback /Round 1/Business Logic/2026-05-14-ai-adoption-scan-business-logic.md
Prioritized Todo List
Ordered by current product risk: prove the closed loop first, then add the activation layer, then harden production.
  • P0 - Prove one baseline -> pulse path end to end · p0-baseline-pulse-proof

    The investor-readiness docs agree that the weakest point is not report generation; it is proving the closed loop from baseline report to commitment, pulse launch, pulse report, and next action.

    next
    prior_run_context · pulse_delta · scan_action_commitments · pulse_measurement_plans

    Next: Use one seeded or real leader path, submit/seed pulse rater responses, run analysis, inspect memory artifacts, deliver through operator review, and verify the leader can understand the pulse delta without engineering narration.

    • Get pulse run to anonymity threshold · p0-pulse-data-threshold

      Leader-side pulse launch exists, but the next proof needs a pulse run with enough rater responses to analyze.

      next

      Next: Use the first-party rater route or dev seeding endpoint to reach n >= 5 on the linked pulse scan.

    • Run pulse pipeline and operator review · p0-pulse-report-review

      Memory injection is implemented, but the product still needs a full pulse report quality check with Strategy Context snapshots.

      next

      Next: Start analysis, inspect report_markdown, safety, prior_run_context, and pulse_delta, then use the operator review path to deliver.

    • Make the demo path replayable · p0-demo-reset-script

      Current local state was manually mutated during pulse launch testing. A deterministic demo path should not depend on hand-edited rows.

      next

      Next: Add or document a safe seed/reset command for one baseline/pulse fixture that can be replayed before demos.

  • P1 - Build Dialogue Agent and validation harness · p1-dialogue-agent

    Docs consistently identify per-gap conversation prompts as the highest-value next leader activation slice after the loop proof.

    next
    dialogue_prompts artifact missing · no dialogue step in pipeline.ts

    Next: Add structured dialogue_prompts, include them in Safety, summarize them in the report/results surface, and validate with acme-custom plus an adversarial unapproved-tool case.

    • Artifact and structured output · p1-dialogue-schema

      Schema should carry targetPracticeId, gapDirection, gapMagnitude, openingQuestion, probes, listenFor, and candidateCommitment.

      next

      Next: Add dialogue_prompts to artifact_type, pipeline ArtifactKind, prompt builder, and parser.

    • Safety and report integration · p1-dialogue-safety

      Dialogue guidance must respect anonymity, approved tools, restricted data, no scripts, and no causal or ROI claims.

      next

      Next: Add dialogue evidence to Safety and a compact Conversations to run section after the gap report sections.

  • P1 - Add communication ledger and loop health · p1-orchestration-visibility

    The notification docs reframe this as response-loop intelligence: Sarah needs to see who is stuck, what was sent, and whether the desired product action happened. The first email ledger and operator backend inbox QA layer now exist.

    in progress
    no notifications tables · communication_events/recipients/messages added for real Resend paths · persona inbox QA is currently under /operator/backend, not a dedicated /operator/communications route

    Next: Continue from the database-backed inbox simulator: add outcome completion updates, loop-health rollups, notification projections, and eventually a dedicated read-only operator communications timeline.

  • P1 - Finish pulse and reminder orchestration · p1-pulse-orchestration

    Leader-side start-now and operator support launch exist, but due pulses, reminders, and re-invites still depend on manual action.

    next

    Next: Build an idempotent scheduler/reconcile path for due pulse drafts, rater reminders at 48h and 7d, and pulse re-invites.

  • P2 - Groups, campaigns, and Sarah rollups · p2-groups-campaigns-rollups

    Current scans still use free-text cohort/team fields. The lock docs make groups, group_members, scan_campaigns, campaign status, and cohort rollups first-class before scale.

    next

    Next: Add campaign schema and launch semantics only after the single-leader loop is demo-reliable.

  • P2 - Enterprise hardening before scale · p2-production-hardening

    Scoped auth, durable jobs, token lifecycle, artifact retention, cost controls, and horizontal event fanout are still production blockers.

    next

    Next: Prioritize cross-tenant negative tests and token TTL/revocation before exposing broader customer data.

  • Waiting - Product-owner decisions · waiting-product-decisions

    Some important slices should not be guessed by engineering.

    waiting

    Next: Resolve 36 self-view prompt copy, CXTO ROI proxy set, brand/domain/sender identity, group scheduling semantics, pulse cadence default, and blocked-scan UX.
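
The reminder part of the pulse and reminder orchestration item (rater reminders at 48h and 7d, run idempotently) might look roughly like the sketch below. The `RaterInvite` shape, the reminder key format, and the rule of sending only the latest due reminder are all assumptions for illustration; nothing here reflects an existing scheduler.

```typescript
const HOUR_MS = 60 * 60 * 1000;
const REMINDER_OFFSETS_HOURS = [48, 168]; // 48 hours and 7 days, ascending

// Hypothetical invite shape; the real rater invite rows may differ.
interface RaterInvite {
  raterId: string;
  invitedAt: Date;
  completed: boolean;
}

interface Reminder {
  key: string; // idempotency key: one send per rater per offset
  raterId: string;
  offsetHours: number;
}

// Returns at most one reminder per open invite: the latest offset that has
// passed, unless its key was already recorded as sent.
function dueReminders(
  invites: RaterInvite[],
  now: Date,
  sentKeys: ReadonlySet<string>,
): Reminder[] {
  const due: Reminder[] = [];
  for (const invite of invites) {
    if (invite.completed) continue;
    const elapsedHours = (now.getTime() - invite.invitedAt.getTime()) / HOUR_MS;
    let latest: number | null = null;
    for (const offset of REMINDER_OFFSETS_HOURS) {
      if (elapsedHours >= offset) latest = offset;
    }
    if (latest === null) continue;
    const key = `${invite.raterId}:reminder:${latest}h`;
    if (!sentKeys.has(key)) {
      due.push({ key, raterId: invite.raterId, offsetHours: latest });
    }
  }
  return due;
}

const invitedAt = new Date("2026-05-18T00:00:00Z");
const exampleInvites: RaterInvite[] = [
  { raterId: "r1", invitedAt, completed: false },
  { raterId: "r2", invitedAt, completed: true },
];
const fiftyHoursLater = new Date(invitedAt.getTime() + 50 * HOUR_MS);
console.log(dueReminders(exampleInvites, fiftyHoursLater, new Set()));
```

Because the function is pure and keyed, a reconcile job can call it as often as it likes; recording each sent key is what makes a re-run a no-op.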

Product Chain
The Sarah -> leader -> rater loop, with executive layer tracked separately.
  • Operator surface: Sarah · operator-sarah

    Strategy Context capture, operator scan list, leader invite creation, review gate, and shared Sarah nav are built.

    done
    /strategy-context · /operator/scans · /operator/scans/new · /operator/scans/[id]/review
    • Strategy Context substrate and snapshot timing · strategy-context-substrate

      Organizations, organization context, scan snapshots, and interpretation-only pipeline grounding exist.

      done
      organization_context_snapshot artifact · snapshot captured when self-view starts
    • Human review before delivery · operator-review

      Reports still stop at review/flagged/blocked states until an operator approves delivery.

      done
    • Groups, campaigns, and scheduled sends · campaigns

      Still deferred. Current leader scans use free-text team/cohort fields, not campaign tables.

      next

      Next: Build groups, group_members, scan_campaigns, campaign send scheduling, and cohort rollups.

  • Leader surface: Johan / Mikael · leader-surface

    Leader can complete self-view, invite team, watch status, view report, commit actions, schedule/manage a pulse, and start a due pulse.

    done
    /leader-scans/[id]/self-view · /leader-scans/[id]/team-invites · /leader-scans/[id]/status · /leader-scans/[id]/results
    • First-party self-view UX · self-view

      36-statement flow, immutable submit, auto-advance, keyboard shortcuts, and locked post-submit state.

      done
    • Gap-first report and action commitments · leader-results

      Leader-facing report viewer renders markdown, saves generated commitments, and starts the loop.

      done
    • Pulse draft management · pulse-management

      Leader status page shows committed actions, next pulse, targeted practices, edit drawer, and cancel/update actions.

      done
    • Leader-side due pulse launch · leader-pulse-launch

      Baseline dashboard can start a due pulse draft, lock the plan, launch rater invites, and link to the pulse dashboard.

      done
      POST /leader-scans/:id/pulse-draft/launch · Start pulse now CTA
    • Auto-fire pulse on due date · pulse-auto-fire

      No scheduler yet. A draft pulse remains draft until launched manually.

      next

      Next: Daily job: find pulse scans with status=draft and pulseDueAt <= now, then trigger existing invite path.

    • Operator pulse launch / support control · operator-pulse-support-launch

      Operator scan detail now shows a support control for scheduled draft pulses. It launches the pulse immediately through the existing pulse-draft launch endpoint and sends planned rater invites without impersonating the leader.

      done
      /operator/scans/[id] · POST /leader-scans/:id/pulse-draft/launch
  • Rater surface: Anna · rater-surface

    Raters now use a first-party anonymous questionnaire instead of the embedded Tally experience.

    done
    /leader-scans/[id]/rater#rater_token=... · /raters
    • First-party rater questionnaire · rater-questionnaire

      Token-authenticated anonymous rater flow mirrors the self-view interaction pattern and submits directly to backend.

      done
    • Read-only rater links overview · rater-overview

      Operator page lists scan runs newest-first, exposes each rater link, shows completion, and includes all scans for that leader.

      done
    • Rater reminders · rater-reminders

      48-hour and 7-day reminder orchestration is not built.

      next

      Next: Add reminder jobs after scheduler conventions are decided.

  • Executive surface: CXTO · executive-surface

    Persona exists in the model, but the executive dashboard is not built and ROI proxy rules need product confirmation.

    waiting

    Next: Confirm leading-indicator ROI proxy set before building /executive.
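
The auto-fire rule described in the pulse-auto-fire item (find pulse scans with status=draft and pulseDueAt <= now, then trigger the existing invite path) could be sketched as below. The `PulseScan` shape and the status values other than draft are assumptions, and `launch` stands in for whatever calls the existing pulse-draft launch endpoint.

```typescript
// Assumed status values; only "draft" is named in the tracker.
type PulseStatus = "draft" | "launched" | "cancelled";

// Hypothetical row shape for a pulse scan.
interface PulseScan {
  id: string;
  status: PulseStatus;
  pulseDueAt: Date | null;
}

// Select drafts whose due date has arrived.
function findDuePulseDrafts(scans: PulseScan[], now: Date): PulseScan[] {
  return scans.filter(
    (scan) =>
      scan.status === "draft" &&
      scan.pulseDueAt !== null &&
      scan.pulseDueAt.getTime() <= now.getTime(),
  );
}

// Launch each due draft once. Flipping the status means a second run on the
// same data finds nothing left to launch.
function runDailyPulseJob(
  scans: PulseScan[],
  now: Date,
  launch: (scanId: string) => void,
): string[] {
  const launched: string[] = [];
  for (const scan of findDuePulseDrafts(scans, now)) {
    launch(scan.id); // stand-in for the existing invite path
    scan.status = "launched";
    launched.push(scan.id);
  }
  return launched;
}

const exampleScans: PulseScan[] = [
  { id: "pulse-a", status: "draft", pulseDueAt: new Date("2026-05-17T09:00:00Z") },
  { id: "pulse-b", status: "draft", pulseDueAt: new Date("2026-06-01T09:00:00Z") },
];
const exampleNow = new Date("2026-05-18T00:00:00Z");
console.log(runDailyPulseJob(exampleScans, exampleNow, (id) => console.log(`launching ${id}`)));
```

In a real job the status change would need to happen in the same transaction as the launch, so that a crash between the two cannot cause a double send.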

Backend Core
Validated diagnostic, pipeline, memory, auth, and review foundation.
  • 4P / 12P / 36-statement instrument · instrument

    Rater prompts and deterministic scoring are implemented. Self-view prompt slots exist.

    done
  • 36 product-approved self-view prompts · self-prompts

    Engineering supports draft placeholder mode, but production copy still needs product-owner approval.

    blocked

    Next: When approved, disable CLAUDE_SELFPROMPT_DRAFT_MODE for production flows.

  • Agent pipeline and Safety pass · pipeline

    Context, Diagnostic, Themes, Experiments, Habits, Report, Safety, and repair pass are wired and tested.

    done
  • Loop memory layer · memory

    prior_run_context, pulse_delta, scan_action_commitments, and pulse_measurement_plans persist loop state in DB.

    done
  • Gap-aware pulse full validation · pulse-validation

    Pulse memory injection exists and leader launch works, but a full pulse run with rater data, report generation, review, and delivery still needs exercising.

    next

    Next: Run a pulse through threshold, analysis, Safety, operator delivery, and leader results. Confirm prior_run_context and pulse_delta shape the output.

  • Dialogue Agent · dialogue-agent

    Still the highest-value post-loop product slice: per-gap conversation prompts between baseline and pulse.

    next

    Next: Build dialogue_prompts artifact, Safety wiring, report summary section, and validation harness.
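
The dialogue_prompts artifact is specified elsewhere in this tracker as carrying targetPracticeId, gapDirection, gapMagnitude, openingQuestion, probes, listenFor, and candidateCommitment. A minimal sketch of that shape with a strict parser follows; the field types, the two `GapDirection` values, and the non-empty checks are assumptions, since the tracker only names the fields.

```typescript
// Assumed enum values; the tracker names the field but not its values.
type GapDirection = "self_higher" | "raters_higher";

interface DialoguePrompt {
  targetPracticeId: string;
  gapDirection: GapDirection;
  gapMagnitude: number;
  openingQuestion: string;
  probes: string[];
  listenFor: string[];
  candidateCommitment: string;
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

function isNonEmptyStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.length > 0 && value.every((v) => isNonEmptyString(v));
}

function isGapDirection(value: unknown): value is GapDirection {
  return value === "self_higher" || value === "raters_higher";
}

// Parse untrusted model output into a DialoguePrompt, throwing on the first
// field that is missing or malformed.
function parseDialoguePrompt(raw: unknown): DialoguePrompt {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("dialogue prompt must be an object");
  }
  const {
    targetPracticeId,
    gapDirection,
    gapMagnitude,
    openingQuestion,
    probes,
    listenFor,
    candidateCommitment,
  } = raw as Record<string, unknown>;

  if (!isNonEmptyString(targetPracticeId)) throw new Error("targetPracticeId is required");
  if (!isGapDirection(gapDirection)) throw new Error("gapDirection is invalid");
  if (typeof gapMagnitude !== "number" || !Number.isFinite(gapMagnitude) || gapMagnitude < 0) {
    throw new Error("gapMagnitude must be a non-negative number");
  }
  if (!isNonEmptyString(openingQuestion)) throw new Error("openingQuestion is required");
  if (!isNonEmptyStringArray(probes)) throw new Error("probes must be a non-empty string array");
  if (!isNonEmptyStringArray(listenFor)) throw new Error("listenFor must be a non-empty string array");
  if (!isNonEmptyString(candidateCommitment)) throw new Error("candidateCommitment is required");

  return {
    targetPracticeId,
    gapDirection,
    gapMagnitude,
    openingQuestion,
    probes,
    listenFor,
    candidateCommitment,
  };
}
```

A strict parser of this kind gives the Safety pass and the validation harness a single typed object to inspect, and rejects partially formed prompts before they reach the report.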

Agent Runtime Experiments
Future Anthropic Managed Agents, memory, outcomes, multi-agent, and MCP tests kept separate from the current SDK pipeline.
  • Current Agent SDK baseline · current-agent-sdk-baseline

    The production path remains the existing Claude Agent SDK pipeline with Postgres-backed scan state, artifacts, Safety pass, and loop memory.

    done

    Next: Treat Managed Agents as a spike track until cost, access, data handling, and quality gates are proven against this baseline.

  • Managed Agents evaluation track · managed-agents-evaluation-track

    Official docs position Managed Agents as a beta hosted harness for long-running, stateful sessions with agent definitions, environments, sessions, tools, MCP, and persisted events.

    next

    Next: Create a small spike branch that mirrors one non-critical report or dialogue flow; do not migrate the core pipeline by default.

  • Read-only memory store spike · managed-memory-store-spike

    Memory stores look useful for agent-facing reference material, but product truth should stay in Postgres and untrusted rater or team notes should not enter read-write memory.

    next

    Next: Test one read-only org/practice/policy store mounted into a Managed Agent session and compare prompt size, report quality, and traceability against the current DB snapshot path.

  • Read-only product-state MCP tools · product-state-mcp-tools

    The safest bridge is curated backend tools over product state, not raw SQL or free-form database access.

    next

    Next: Expose get_scan_status, get_loop_state, get_report_artifact, and list_scan_events as read-only MCP tools/resources before any managed runtime test.

  • Managed Outcomes quality gate spike · outcomes-quality-gate-spike

    Outcomes could validate dialogue prompts, diagnostics, and reports with a rubric-driven grader, but access is gated and it must be compared with the current Safety pass.

    waiting

    Next: Request access, then test one dialogue_prompts rubric for opening question, probes, evidence grounding, commitment ask, privacy guardrails, and no-ROI claims.

  • Managed multi-agent spike · managed-multiagent-spike

    Official docs support a coordinator delegating to context-isolated agents with their own model, prompt, tools, MCP, and skills, sharing the same filesystem.

    waiting

    Next: After access is approved, test this on cross-leader pattern diagnosis or dialogue design, not on the current single-scan report chain.

  • Dreams research preview · dreaming-research-preview

    Dreams could support weekly consolidation of patterns across sessions and memory stores, but it is research preview and increases the risk of treating contaminated notes as trusted memory.

    waiting

    Next: Only test after a sanitization/redaction policy exists and the read-only memory-store spike is understood.

  • Managed Agent webhooks / async completion · managed-webhook-async-flow

    Managed webhooks are interesting for long-running sessions, while the current app already has pipeline_events and live status paths.

    next

    Next: If a Managed Agent spike runs, map webhook callbacks into existing pipeline_events, operator notifications, and failure states instead of creating a parallel event model.

  • Agent SDK billing and unit economics · agent-sdk-billing-unit-economics

    Anthropic docs flag that Agent SDK and claude -p subscription-plan usage move to a separate monthly Agent SDK credit on 2026-06-15.

    next

    Next: Model per-scan and per-pulse cost with API-key billing assumptions, per-org budgets, and queue concurrency limits before increasing automation.
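
The read-only product-state tools named in this section (get_scan_status, get_loop_state, get_report_artifact, list_scan_events) can be illustrated as a closed allow-list of handlers over product state. This sketch deliberately uses no MCP SDK: the `ScanRecord` shape, the in-memory map standing in for Postgres, and the return payloads are all hypothetical, and wiring the handlers into a real MCP server is left out.

```typescript
// Hypothetical projection of product state exposed to an agent runtime.
interface ScanRecord {
  status: string;
  loopState: string;
  reportMarkdown: string | null;
  events: string[];
}

type ToolName =
  | "get_scan_status"
  | "get_loop_state"
  | "get_report_artifact"
  | "list_scan_events";

// In-memory stand-in for the database; values are illustrative only.
const scanStore: ReadonlyMap<string, ScanRecord> = new Map<string, ScanRecord>([
  [
    "scan-1",
    {
      status: "delivered",
      loopState: "pulse_scheduled",
      reportMarkdown: "# Report",
      events: ["analysis_started", "report_delivered"],
    },
  ],
]);

// Every handler only reads; there is no handler that writes or runs SQL.
const readOnlyTools: Record<ToolName, (scan: ScanRecord) => unknown> = {
  get_scan_status: (scan) => ({ status: scan.status }),
  get_loop_state: (scan) => ({ loopState: scan.loopState }),
  get_report_artifact: (scan) => ({ reportMarkdown: scan.reportMarkdown }),
  list_scan_events: (scan) => ({ events: [...scan.events] }),
};

// Dispatch by name, rejecting anything outside the allow-list.
function callReadOnlyTool(name: string, scanId: string): unknown {
  if (!Object.prototype.hasOwnProperty.call(readOnlyTools, name)) {
    throw new Error(`unknown or non-read-only tool: ${name}`);
  }
  const scan = scanStore.get(scanId);
  if (scan === undefined) {
    throw new Error(`scan not found: ${scanId}`);
  }
  return readOnlyTools[name as ToolName](scan);
}

console.log(callReadOnlyTool("get_scan_status", "scan-1"));
```

Keeping the tool set as a fixed record rather than a free-form query surface is what makes it a curated bridge: adding a capability requires adding a named, reviewable handler.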

Frontend Surfaces
Routes now map to the four-persona story, with remaining gaps marked explicitly.
  • Home persona entry cards · home-personas

    Executive, Operator, Leaders, Raters, and Yomento operator cards exist. Backend personas decide signed-in access.

    done
  • Sarah operator chrome · operator-chrome

    Shared navigation, width normalization, operator pages, and review polish landed.

    done
  • Leader scan list run grouping · leader-flat-list

    /leader-scans now groups baseline and pulse runs under each leader, and /raters shows related baseline/pulse history inside each scan card.

    done
    frontend/src/lib/leader-dashboard.ts · /leader-scans · /raters
  • Blocked scan UX · blocked-ux

    Product decision still open for what Johan sees and what Sarah sees when Safety blocks a scan.

    waiting

    Next: Default remains neutral leader copy and operator-visible status.

  • Strategy Context mid-flight warning · mid-flight-context-banner

    Snapshot mechanics are enforced, but Sarah has no UI cue that edits affect only future scans.

    next

    Next: Add a banner when active scans exist.

Demo And Production Readiness
What matters for the Skanska path and what still blocks production.
  • Clickable Sarah -> leader -> rater chain · skanska-clickable-chain

    Current app has functional Sarah setup, leader self-view/status/results/loop, and first-party rater flow.

    done
  • Investor demo loop rehearsal · investor-demo-loop-rehearsal

    The clickable chain exists, but the docs call out the baseline -> pulse -> delta path as the remaining proof that makes the product feel more than a one-time report.

    next

    Next: Rehearse one deterministic Sarah -> Johan -> Anna -> review -> commitments -> pulse -> delta path and capture any breakages before adding broad campaign surfaces.

  • Skanska deadline tracking · skanska-dates

    Docs cite clickable prototype due 2026-05-19 or 2026-05-20, client meeting 2026-05-25.

    waiting

    Next: Confirm whether this external deadline is still live before prioritizing polish over Dialogue Agent.

  • Multi-stage email orchestration · email-orchestration

    Leader invite and rater invite paths exist. Reminder cadence, pulse re-invite, campaign milestones, and complete lifecycle orchestration remain open.

    next
    • Operator notification center · notification-center

      No durable notifications or notification_deliveries tables exist yet. The MVP should project high-signal scan events into a bell/inbox without rendering raw pipeline_events.

      next

      Next: Add notifications, notification_deliveries, projection for ready/failed/delivered/pulse events, list/count/read routes, and a bell in OperatorShell.

    • Operator Communications Timeline · operator-communications-monitor

      Initial real email visibility now lives in /operator/backend as persona inbox QA backed by communication_events, communication_recipients, and communication_messages. A fuller timeline still needs hierarchy filters, campaign context, and outcome state transitions.

      in progress

      Next: Add campaign/group fields when those models land, compute delivered/opened/clicked/responded transitions, and graduate the read-only view into /operator/communications once the backend lab surface gets crowded.

    • Loop health and response outcomes · loop-health

      The response-loop docs prioritize knowing who is stuck, which action is blocking progress, and whether a nudge produced the intended product action.

      next

      Next: Compute leader loop health states, attach desired outcomes to communication rows, and show stuck loops with owner, blocker, recommended action, and last nudge.

  • No-ROI and anonymity guardrails · claims-policy

    Reports remain leading-indicator only, deterministic scoring is unchanged, and anonymity floor n >= 5 remains hard.

    done
  • Name, domain, and sender domain · brand-domain

    Echo / AI Adapt Scan are working names. Long-term brand and email sender domain still need owner decision.

    waiting
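
The hard anonymity floor of n >= 5 from the guardrails item above can be expressed as a small gate. This is an illustrative sketch only: the helper names are hypothetical, and withholding an aggregate by returning null is an assumed convention, not the documented behaviour of the scoring code.

```typescript
const ANONYMITY_FLOOR = 5; // n >= 5, as stated in the guardrails

// True once enough rater responses exist to analyze without exposing anyone.
function meetsAnonymityFloor(responseCount: number): boolean {
  return responseCount >= ANONYMITY_FLOOR;
}

// How many more responses a scan needs before analysis may start.
function responsesStillNeeded(responseCount: number): number {
  return Math.max(0, ANONYMITY_FLOOR - responseCount);
}

// Return the mean score only when the floor is met; otherwise withhold it.
function meanOrWithheld(scores: number[]): number | null {
  if (!meetsAnonymityFloor(scores.length)) {
    return null;
  }
  const total = scores.reduce((sum, score) => sum + score, 0);
  return total / scores.length;
}

console.log(meanOrWithheld([4, 3, 5, 4])); // withheld: only four responses
console.log(responsesStillNeeded(4));
```

Routing every aggregate through one gate keeps the floor from being reimplemented, and possibly weakened, in individual report or dashboard queries.
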
Production Architecture Hardening
Risks from the production-readiness discussion that should be handled before scale.
  • Scoped auth and tenant isolation · scoped-auth

    The prototype still relies heavily on broad admin-token access. Production needs role-scoped access, auditability, and hard tests against cross-organization data leakage.

    next

    Next: Replace broad internal access paths with scoped permissions for operator, leader, rater, executive, and service jobs. Add cross-tenant negative tests for every org/scan/rater query.

  • Durable background jobs · durable-background-jobs

    AI pipeline runs, email sends, reminders, pulse launch, retries, and scheduler work should not depend on one request lifecycle or one process staying alive.

    next

    Next: Introduce a real queue/job runner with idempotency keys, retry ceilings, dead-letter handling, and visible job state.

  • Horizontal live-event scaling · sse-horizontal-scale

    If SSE or pipeline events are process-local, multiple backend instances will make live status unreliable.

    next

    Next: Back event fanout with Redis/pubsub or another shared event bus before running more than one backend instance.

  • Leader and rater token lifecycle · token-lifecycle

    Magic links and rater links are operationally useful, but production needs expiry, revocation, resend semantics, and careful internal exposure controls.

    next

    Next: Define token TTLs, rotation/revoke endpoints, resend behavior, and audit logs for token-bearing links shown in operator tools.

  • Prompt and artifact retention policy · prompt-artifact-retention

    Full prompts and artifacts are valuable for debugging but can contain sensitive organization context or open-text feedback.

    next

    Next: Define retention windows, redaction rules, role-gated access, and production-safe debug views for agent_runs.prompt and raw artifacts.

  • AI cost, rate, and concurrency controls · ai-cost-rate-controls

    Per-step caps exist, but production also needs per-org budgets, queue concurrency limits, rate limiting, retry ceilings, and cost/failure dashboards.

    next

    Next: Add operational limits around pipeline start, retry behavior, Anthropic spend, and concurrent agent runs.
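
The operational limits around pipeline start might begin as an admission check that combines a per-org budget with a concurrency ceiling. This is a sketch under stated assumptions: the `OrgLimits` and `OrgUsage` shapes, the monthly budget window, and the reason codes are hypothetical, and the cost estimate is supplied by the caller.

```typescript
// Hypothetical per-org limits and usage counters.
interface OrgLimits {
  monthlyBudgetUsd: number;
  maxConcurrentRuns: number;
}

interface OrgUsage {
  spentThisMonthUsd: number;
  activeRuns: number;
}

type Admission =
  | { allowed: true }
  | { allowed: false; reason: "concurrency_limit" | "budget_exceeded" };

// Decide whether a new pipeline run may start for this org.
function admitPipelineStart(
  limits: OrgLimits,
  usage: OrgUsage,
  estimatedCostUsd: number,
): Admission {
  if (usage.activeRuns >= limits.maxConcurrentRuns) {
    return { allowed: false, reason: "concurrency_limit" };
  }
  if (usage.spentThisMonthUsd + estimatedCostUsd > limits.monthlyBudgetUsd) {
    return { allowed: false, reason: "budget_exceeded" };
  }
  return { allowed: true };
}

const exampleLimits: OrgLimits = { monthlyBudgetUsd: 200, maxConcurrentRuns: 2 };
console.log(admitPipelineStart(exampleLimits, { spentThisMonthUsd: 150, activeRuns: 1 }, 20));
```

Returning a reason code rather than a bare boolean lets the operator surface and any cost or failure dashboard distinguish a throttled run from one that hit its budget.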