CRP system architecture

End-to-end map of actors, edges, application tier, doctrine chokepoints, data tier, external integrations, scheduler, and observability. Updated 2026-06-16.

In plain terms. This runs the day-to-day operations of our research site mostly on autopilot — finding and booking patients, handling the calls / texts / faxes / emails, keeping the schedule and records straight, chasing the money, and catching mistakes — so the team does far less manual busywork and fewer things slip through the cracks. Everything below is how that's wired together.

One-paragraph mental model. Patients and ops staff hit two edges: a public marketing/intake surface and an IAP-gated dashboard. Every dashboard fetch routes through one chokepoint Cloud Function (~126 read feeds and ~125 read/action capabilities). Every mutating external action — CRIO calendar write, CRIO payment/invoice write-back, outbound email + SMS + fax, comms dispatch, AI-voice dial, bulk N-row mutation, alert escalation — funnels through a single chokepoint helper with default-on safety guards and read-back verification. State lives in two places: BigQuery (analytical + CRM data, the sole data store since 2026-04-23) and Firestore (keyed UI state + audit collections). The scheduler runs ~91 jobs (85 enabled) spanning system syncs, triage, audits, comms, reconciliation, and agent dispatch. Beyond the calendar/comms core, the platform also runs a CRIO browser-automation writer (payments, invoices, mark-paid), an AI-voice telephony fleet (Telnyx-native), and a 4-layer revenue & receivables platform (data-health → revenue validation → margin → collections). Core source, workflows, and hooks are change-controlled (code-owner-gated).

BQ feeds 126 read feeds · 125+ read/action capabilities

Cron jobs 91 85 enabled · Cloud Scheduler · us-east1

Frontend modules 26 + design system

Unit tests 1,559 passing · + 39 invariant rows

Lint rules 56 + 14 smoke canaries

Rule coverage 405 prose-rule → backstop rows

System diagram

People · actors

Patients Sponsors CRO

Coordinators Recruiters Investigators

Leadership Admin

Edges

Public edge joinresearchstudies.com — Lp(a) prescreen + patient intake.
phillyresearch.com — marketing.
Entry capabilities: lead submission, self-booked pre-screen, intake / release forms.

Authenticated edge The internal IAP-gated dashboard — Cloud Run, IAP + Workspace, server view-strip by role.
The internal finance dashboard (owner-only).

Core hub

chokepoint Cloud Function

Cloud Function · Gen 2 · Node 22 · 1 GB · us-east1

126 read feeds 125 read/action capabilities PHI feed allowlist auth gate

Every dashboard fetch routes through here.

Chokepoints & external integrations

Doctrine chokepoints

Query engineall BQ reads · cost-metered

Book & verifyCRIO calendar write → read-back

Mutate & verifygeneric read-back write

Comms queueapproval queue

Alert routeralerts → tasks

Bulk-action guardcanary → apply → fail-fast

State storeFirestore UI state

Email senderoutbound

Every mutating action funnels through one guarded chokepoint; a write is done only when read-back confirms.

External integrations

CRIOREST API + Fivetran CDC

TelnyxSMS · Fax · Voice AI

Google Voicevoicemail

Gmailoutbound mail

MetaLead Ads

ClickUptask board

Uber Healthrides

JotFormintake / release

Tremendousstipends

SignWelle-signatures

QuickBooksaccounting

AnthropicClaude

Langfuseagent tracing

ClinicalTrials.govNCT verification

Data tier

BigQuery Sole analytical / CRM store — a CRIO replica dataset (Fivetran CDC) + a CRM dataset (site-owned tables) + materialized hot-path views.

Firestore Keyed UI state + append-only audit.

Secret Manager Tokens, app passwords, webhook secrets.

GCS Fax media + artifacts.

Automation tier — Cloud Scheduler · ~91 jobs (85 enabled)

SyncCRIO, Meta, QB, calendar, drive, rides

Reminders / Commsmorning, day-of, post-visit, welcome

Triage & AIfax + voicemail classify, screen-fail sweep

Agentsno-show risk, orchestrator, outcomes

Audit / Healthdata-health, dupe audit, synthetic e2e

Alertsresolve sync, cleanup duplicates

Digestscoord brief, retarget, QC

Finance / Reconbank poll, reconcile, MV refresh

Automation feeds the data tier and acts via the core — same chokepoints, same guards.

Observability & guardrails

Cost telemetry · bytes billed per query Append-only audit tables 14 deploy-time smoke canaries CODEOWNERS-locked files

Five flows worth memorizing

flow1 · New patient lead → enrollment

Lane 1 · Lead capture

joinresearchstudies.com Public edge · patient submits a lead

lead submit→

chokepoint Cloud Function Core hub · writes the prescreening record

→

Eligibility match evaluates the lead against every study's eligibility rules → returns matched studies + a pre-screen booking option

Welcome comm queued through the comms approval layer → outbound email (Reply-To recruitment@) + SMS via Telnyx

Lane 2 · Self-booked pre-screen

Patient self-books a pre-screen Edge · picks an open slot

book pre-screen→

Book & verify validate → write CRIO → read-back → audit

→

Booking confirm comms confirmation

→

Reminders scheduled · day-before + day-of

→

Post-visit follow-up comms follow-up

flow2 · CRIO write path (mutating)

Every change funnels through the book-and-verify chokepoint (calendar) or the interaction-write chokepoint (notes). Reads are CDC-replicated into a CRIO replica dataset; a single read engine handles all reads. Any direct CRIO calendar write outside the chokepoint is blocked by a static lint rule in CI.

flow3 · Inbound triage

Fax: Telnyx → a scheduled classify step (Claude) → a fax-classification record → a scheduled route step → auto-dispatch email for the categories on the auto-send allowlist.

Voicemail: Google Voice → Gmail forward → a scheduled triage job → Claude classifies (11 categories × 4 urgency × 5 transfer targets) → merges into the voicemail record → SMS confirmation to caller via Telnyx → urgent items routed through the alert router to the task board.

flow4 · Comms-tab contract

Any auto-comm goes through the comms approval queue. Two near-in-time comms → the later one is canonical; earlier comms must NOT contradict it. An auto-send allowlist defines which message types fire automatically vs. land in a human-review tier for approval.

flow5 · Bulk mutations

Every N-row external mutation goes through the bulk-action guard (canary → apply → fail-fast: abort after 3 of 5 same-error). Direct N-row apply loops without the guard are lint-blocked in CI. The doctrine originated from an early incident where a bulk site-correction ran 25 cancel-without-create deletions for lack of such a guard.

Where the architecture is intentionally not symmetric

CRIO is read-mostly via CDC replication, write-rarely via API. Don't try to push every state change to CRIO — its API surface is narrow (calendar, interactions, traits). Audit lives in our own CRM data store.
Firestore is for keyed state + audit, NEVER for analytical queries. BigQuery is for analytical + CRM data. Firestore docs must stay <1 MiB (a lesson from a 2026-05-13 fax outage).
Email goes through the standard email chokepoint for everything except finance digests, which use a separate direct sender (off the Comms tab — financial-action emails, not coordinator workflow).
Change-controlled files (the dashboard server entrypoint, the core cloud-function source, all CI workflows, git hooks, and the agent instructions file) are code-owner-gated + pre-commit-enforced + re-checked in CI.
Cloud SQL is NOT used by the dashboard. The one Postgres instance is an experimental side project. BigQuery + Firestore cover the dashboard's analytical + state needs.

Doctrine chokepoint reference

Chokepoint	Concern	Default safety guard
Query engine	BigQuery submit	timezone-safe date normalize · 10-min in-app cache (bypassable) · per-query cost metering
Book & verify	CRIO calendar writes	validate → write → read-back → assert-matches → audit row · preflight subject-on-study · atomic create-then-cancel saga
Email sender	Outbound email	From a no-reply sender · Reply-To category map · finance uses a separate direct sender (off the Comms tab)
Comms queue	Auto-comms dispatch	auto-send allowlist fires · others land in a human-review tier · the LATER comm wins (canonical)
Alert router	Audit alerts → triage	Same-day idempotency per alert key · Comms / Ops / Data-Triage task-board lanes · auto-correct for known classes
Bulk-action guard	N-row mutations	Dry-by-default · short-lived canary token · fail-fast after 3 of 5 same-error
State store	UI state (Firestore)	Allowlisted state docs · merge helper + test required for new docs · NEVER raw localStorage
Intake merge guard	Intake → CRIO traits	Never overwrite for a patient with a completed visit · raw audit BEFORE mutation · multiple safety toggles
Phone-display helper	Phone display	Design-system helper · the resources fallback shim must mirror it
Interaction write	CRIO interaction POST	Allowlisted interaction types (no raw numeric type) · revision required for updates
Mutate & verify	Any phantom-confirming external write (task board / stipend provider / e-sign provider / 2nd CRIO surface)	Generalizes book-and-verify: validate → write → READ-BACK → assert storage == intent → audit. Returns success only when read-back confirms; new write sites use it now, ~36 legacy sites migrate in phases

Region & service map

Project: a single Google Cloud project (id and billing redacted).

us-east1 (production):

A Cloud Run operations dashboard — serves the internal IAP-gated dashboard
A Cloud Run read API
A single chokepoint Cloud Function Gen 2 — ~126 read feeds, ~125 read/action capabilities
A Cloud Run finance dashboard — the internal finance dashboard (owner-only)
~91 Cloud Scheduler jobs (85 enabled)

A second region (experimental):

A small experimental agent stack (Cloud Run + a micro Postgres) — side project
A handful of ad-hoc BigQuery tooling services

BigQuery region: US multi-region (job location).

RBAC tab matrix

A single role→tabs map is the source of truth, mirrored into a change-controlled region of the app shell. Admin is a strict superset of leadership — owners see every leadership surface automatically. Section-level role gating inside a shared tab is a lint-blocked anti-pattern; RBAC is tab-level only.

Role	Overview	Studies	Referrals	Actions	Leads	Resources	Lp(a)	Leadership	Agents	Comms	Admin
`admin`	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓
`leadership`	✓	✓	✓	✓	✓	✓	✓	✓	—	✓	—
`ops`	✓	✓	✓	✓	✓	✓	✓	—	—	—	—
`coordinator`	✓	✓	✓	✓	✓	✓	✓	—	—	—	—
`recruiter`	✓	✓	✓	✓	✓	✓	✓	—	—	—	—
`finance`	✓	✓	—	—	—	✓	✓	—	—	—	—
`investigator`	✓	✓	—	—	—	✓	✓	—	—	—	—
`readonly`	✓	—	—	—	—	✓	✓	—	—	—	—

Finance cluster decommissioned 2026-04-24: a separate internal finance dashboard. The finance role above maps to the main dashboard's read-only subset, not the legacy 6 finance tabs (which are change-controlled and no longer served).

Frontend modules

The dashboard front end is built from 26 source modules (concatenated at build time into a single hashed bundle) plus a shared design system, loaded ahead of a small late-binding inline block in the app shell. Rather than one monolith, responsibility is split into focused modules grouped by capability:

Capability area	What these modules cover
Session & platform	IAP session guard (detects an expired session and re-auths cleanly, loaded first), configuration roster (coordinators, recruiters, investigators, sponsors), and the role→tabs RBAC map.
Shared utilities & UI	Date / key / normalize / escape helpers, and the common UI layer (modals, toasts, health chips, tab switching).
Data pipeline	The phased dashboard loader, the live-data processing + dedup layer, and the top-level rendering / KPI layer.
Operational tabs	Schedule (visit detail, confirm + rideshare actions, progress chips), Studies (enrollment KPIs, scheduling-gap alerts), Referrals, and an Actions / smart follow-up surface.
Leads & resources	Leads list + detail, per-study resource cards with budget tiles and a portal editor, and a leadership analytics tab with period filtering and a query heatmap.
Comms & campaigns	The Comms queue + approval flow + agent dispatch surface, the campaign ladder (welcome / nurture / retarget), an eFax triage inbox, and a voicemail triage inbox.
Admin & specialty	The (change-controlled) admin tab, a sponsor/portfolio rollup (on hold), EDC-discrepancy categorization for leadership query intelligence, and AI-voice surfaces (per-call transcript / outcome / cost card + a live pulse strip showing concurrent dials, spend vs. cap, and dry-run / governance-pause state).

A shared design system exports the phone-formatting, link, chip, and button helpers; a fallback shim mirrors every render-path helper so a missing import never breaks a render.

Scheduled automation (~91 jobs · 85 enabled)

All schedules run on Eastern time. Each job's apply/send behavior follows the same safety defaults as the rest of the platform: read-only by default, with mutation and outbound-send gated behind explicit flags. The live count is ~91 jobs (85 enabled · 6 paused). Rather than enumerate them, here is the breakdown by capability domain — what each category of automation does and roughly how many jobs sit in it.

Domain	~Jobs	What this category of automation does
System syncs & replication	~11	Pulls external systems (ad platform, task board, accounting, drive, rides, calendar) into the warehouse, reconciles lead status, and refreshes hot-path materialized views.
Reminders & comms	~8	Visit reminders (multi-day, day-of, post-visit), pre-screen reminders, welcome / nurture campaigns, partial-lead nudges, and new-visit alerts.
AI triage & classification	~4	Classifies and routes inbound faxes, triages voicemails, and sweeps for screen-fail signals — all LLM-narrated with deterministic pre-classifiers.
Agents	~6	A morning orchestrator that dispatches sub-agents: no-show risk scoring, high-risk reminder targeting, a recruitment manager, coordinator digests, and an outcome tracker that closes the learning loop.
Audit & health	~11	Daily data-health scans, duplicate-appointment audits, bank-variance and freshness watchers, a security digest, consent-freshness checks, a synthetic end-to-end canary, and cost digests.
Alerts & auto-resolution	~4	Syncs alert resolution state, de-duplicates alerts, auto-suspends risky accounts, and auto-closes passed checks.
Digests & briefs	~6	Coordinator briefs (generate / send / smoke-verify), a retargeting digest, and an eSource-QC digest.
Finance & reconciliation	~6	Finance materialized-view refresh, a daily finance-manager run, weekly sponsor chase, stipend reconciliation, and bank-statement + remittance polling.
eSource QC & rideshare	~3	An ingest + classify loop feeding the eSource-QC engine, plus monthly rideshare billing.

Kill-switch pattern: any scheduled job can be paused independently without a code deploy; pausing is documented per-job in the incident-response runbook.

Agent system

The agent system (codename origin 2026-04-25) is the central strategic bet: end-to-end agent-orchestrated patient qualification, shipped in phases. Agents share infrastructure (the comms approval queue, the alert router, a warehouse-backed audit trail, agent tracing) and emit one row per decision, so the corpus becomes training data for future learning agents.

Agent	Concern	Cadence	Output
Orchestrator	Morning master agent	Daily, early AM	Dispatches to sub-agents; writes one decision row per action
Recruitment manager	Stale leads · screen-fail re-match · missing records pre-visit	Every 30 min, business hours M–F	3 signal types · least-loaded recruiter routing · 24h idempotency · per-signal quiet windows
No-show risk	Per-visit no-show probability score	Daily, early AM	Risk scores in CRITICAL / HIGH / MEDIUM tiers
High-risk reminder	Targeted SMS to high-risk patients	Daily, early AM	Queued to the comms approval layer with send enabled
Coordinator digest	Per-coordinator morning summary	Daily AM, M–F	Email digest (Reply-To: operations@)
Outcome tracker	Closes the feedback loop on the prior day's agent actions	Nightly	Outcome rows for the ML training corpus

Supporting infrastructure

Comms approval queue — every agent dispatch lands in a human-review tier unless its message type is on the auto-send allowlist (welcome / booking-confirm / reminder / fax-route-dispatch), which workers process immediately.
Alert router — every alert routes to the task board's auto-alert lanes (Comms / Ops / Data-Triage). Same-day idempotency per alert key · 30-min resolution sweep · operator-phone SMS bypasses throttling with a cross-instance rate limit.
Quiet-window gates (recruitment manager) — before dispatch it unions a last-touch map across CRIO patient interactions, appointment + subject-status audits, the comms queue, and the prescreening record. Per-signal thresholds: 7d (stale lead), 14d (screen-fail rematch), 3d (missing records).
Recruitment-feedback loop — records acted / positive / rejected / stale outcomes, with feedback notes referencing the parent action, so routing improves over time.
Agent tracing — every agent invocation is traced for QA-pass rate, hallucination rate, and latency. Agents not measured will degrade silently (an architectural-principles doctrine).

Phase roadmap

Phase 1 · Eligibility + lead capture (live)
Phase 2 · Comms approval chokepoint (live)
Phase 3 · HIE + lab APIs (planned, not live)
Phase 5 · Finance dollar-journey (live)
Phase 7 · Recruitment-manager scaffolding (live)
Phase 9 · Generic coordinator-brief templates (live)

Doctrine: agents support decisions, humans make them. Every write path is gated by a send / apply / dry flag; the recruitment manager never auto-sends — it queues to the comms approval layer for human review, and rule inserts land unapproved by default. The "learning" and "prediction" agents are roadmap — today's "prediction" handlers are deterministic SQL / heuristics, and the per-decision audit rows are the training corpus they'll eventually consume.

Leads pipeline & comms engine

Leads / patient-acquisition pipeline

Public intake + Meta Lead Ads → a prescreening / intake table → eligibility match → recruiter. Hardened across waves 1–3 (2026-06).

Stage	What
Intake	Public landing form + an idempotent ad-platform lead webhook + a conversion ladder back to the ad platform + structured disqualification reasons
Dedup	5-layer defense; a tiered partition keyed on the CRIO patient key first (then phone, email, unique fallback). Lint + write-time fail-loud + a daily drift audit
Eligibility	Evaluates each lead against the approved per-study eligibility rules → study type + a pre-screen booking option; the landing page renders match cards
Assignment	4-recruiter pool · an explicit per-patient assignment wins · a single routing chokepoint (a designated recruiter is fallback, not default)
Call-back queue	One unified, tiered call-back queue · feed columns + notes · per-referral call scripts · stable row order under auto-refresh
Speed-to-lead	First-contact timestamp backfill + KPI · materialized SLA columns · reply-positive parity
Partials rescue	20-min rescue SMS + a signed, single-purpose resume link (public, token-is-auth, prefills a few fields only)
Re-engagement	Cancel-without-reschedule digest + a reply router + an SMS dispatcher (observe-first, send is environment-gated)
Eligibility-match matrix	A large eligibility-match matrix (tens of thousands of rows at the 2026-05-30 backfill and growing daily, append-only) · origin-tagged (eligibility-match partner / provider office / public intake) · a daily recruiter call-list digest
Compliance	CRIO do-not-contact restrictions are rendered and outreach is blocked in the dashboard · existing-appointment outreach suppression (fail-closed)

Comms engine

One queue, three channels (email via Gmail, SMS + fax via Telnyx). Every auto-comm flows through a single enqueue chokepoint; allowlisted types auto-fire, everything else lands in a human-review tier for approval. The later comm wins — earlier comms must not carry contradicting calls-to-action.

Rule engine + role-inbox registry — a single source of truth for reply-to inboxes (recruitment / billing / operations / owner); raw inbox literals were ratcheted down from 163 to 84 across migration waves. Outbound always sends from a no-reply address; finance uses a separate direct sender (off the Comms tab).
Data model — a comms queue (state) · a comms audit log (chokepoint) · a send log (the only place carrying both the sent record and the rule that produced it; SMS is audited here too).
Send-time guards — toll-free phone guard · duplicate-send detector (a standing alarm) · email-volume detector · per-kind frequency guards · stuck-message escalation · structurally-impossible suppression (rolling window) · suppress-empty owner digests + a weekly heartbeat.
Phone-vs-in-person copy — phone-screen studies render "phone call" copy: no address, no maps link, no "bring photo ID".
Claims-coverage registry — every appointment-bearing comm declares its five claims (when / when / where / which / whom); a new kind can't ship without a guard for each (a CI tripwire).

AI voice telephony (Telnyx-native)

Outbound + inbound conversational voice for pre-screening and speed-to-lead, funneled through one chokepoint: the AI-caller layer. The stack is Telnyx Voice AI native — carrier + speech-to-text + text-to-speech + conversation LLM behind a single BAA. (An alternative vendor stack lives only in a separate pilot scaffold — it is NOT what ships in the dashboard.)

Use case	Status	Notes
Enrollment pre-screen (one study)	dark	Versioned prompt · routed per-study · dry-run + governance gated
Lp(a) pre-screen (a phone-screen study)	dark	Prompt not yet provisioned — refuses to run until it is set
Instant-callback (Phase A)	observe	Logs callback intent every few minutes; no live-dial branch yet
Generic first-touch assistant	dark	Enabled only when a dedicated assistant is configured (a separate lever)
Cancel-reengage (Phase C)	designed	The AI middle tier of an SMS → AI → human cascade; the SMS tier is built, voice not yet
ES / EN bilingual	built	Mid-call language switch is a prompt rule; each call records its language

Hard caps (compile-time constants)

Daily call cap 50 · max concurrent dials 5 · max call duration 8 min · quiet hours 8a–9p local
Daily spend cap $20 · per-study daily spend cap $10

Kill switches & gates (all default-safe)

A dry-run flag · a two-level warehouse governance pause (a control flag + an active-rules view, fail-closed) · a per-lead "AI voice disabled" flag · and an unconfigured assistant simply refuses to dial.
Consent-gated: a cryptographically version-locked, attorney-approved disclosure · TCPA / do-not-call / per-state gates · a concurrent-cap counter with a daily reconciler.

Independent audits (send only on findings)

Transcript audit (daily) — pattern matching + an IRB-approved compensation whitelist; flags any dollar amount outside the approved set and any unescalated adverse-event keyword. Critical for suicidality / chest-pain signals.
Prompt-drift audit (daily) — a normalized hash diff of each live assistant's instructions vs. the canonical version-controlled prompt; emails the owner only on drift.
Real-dial deploy canary (gated) — proves the real outbound-call path is alive on each deploy; the default smoke test forces dry-run. Catches the regression class that surfaced on an operator's first live dial.

CRIO writer — browser-automation write-back

CRIO has no payments/finance write API, so this drives the CRIO web UI like a bookkeeper to close the status-flip gap (sponsor pays → invoice stays "unpaid" in CRIO). Two paths in one tree:

Path	Engine / model	Writes?	Status
(a) Cloud Run scraper	Browser automation · Claude (Sonnet class)	OBSERVE-only (writing refuses by design)	dark — datacenter-IP CAPTCHA
(b) Live writer	Anthropic Computer Use · Claude (Haiku class, the model that supports the computer-use tool)	WRITES to CRIO	live — runs as local background agents on a residential-IP Mac

Recipes: payment + reconciliation, reconcile-only, and invoice-create — all LIVE behind the approval queue. Mark-invoice-paid ("Full Payment") is built but test-only — it requires an explicit execute flag, is hard-gated to a single test study, and its enqueuer is deliberately unwired (double-pay risk).

Write-then-verify (mirrors the calendar writer)

In-recipe vision-confirmation + URL navigation guards on every commit step.
An out-of-band warehouse read-back flips submitted → verified only on a deterministic remittance-id + amount match — the same phantom-confirmation doctrine as the calendar writer.

Local background agents (on the Mac · invisible to cloud logs)

A warm-browser daemon that consumes the approved-write queue one row at a time.
A no-browser read-back poller (every few minutes) that flips verification status.
A daily session / navigation smoke watchdog that performs no writes.

Kill switches

A soft kill-file · an auto-tripping circuit breaker · a short canary hold + STOP-via-SMS · schema validation refuses any row not explicitly approved · dry-by-default (mutation requires an explicit apply flag) · and a hard "unload the agents" stop.

Remote ops

The Mac sits on a private identity-based mesh network: SSH over the mesh (never exposed to the public internet) lets us restart the writer, the read-back poller, or the inbound tunnel when they die — no physical trip to the box. Deliberately on a separate path from the tunnel the agents use: the way in must not depend on the thing that breaks.

Book & verify saga sequence

The CRIO calendar write chokepoint. Any direct CRIO calendar write outside this helper is lint-blocked in CI. Doctrine origin: a 2026-05-01 incident (149 patient appointments stored four hours late since the prior autumn).

›

Intent (entry)

Caller (self-booking / bulk-action / agent) passes an intent:
patient, study, visit, site, start, end, and coordinator

validate(intent) — physical & business invariants

Equipment study pinned to its dedicated site
end > start (no zero-duration placeholders)
start is in the future
no same-site overlap (±15 min)
no cross-site appointment (±90 min travel)
fail closed if any check errors

Preflight subject-on-study

Checks live CRIO (not the replicated copy)
Refuses if the replica says enrolled but CRIO disagrees

Atomic create-then-cancel saga (create-first)

Create the new appointment
If create succeeds → cancel the old one
If create fails → don't touch the old (safe retry)
If the post-create cancel fails → audit it; the physical-impossibility audit catches the duplicate

Read-back assertions

GET the appointment back from CRIO
Assert wall-clock, duration, coordinator and status all match intent
Any mismatch → returns ok:false with the failures

✓

Audit + return

Write a replay-able Firestore audit row
Returns a verified-success flag, the appointment id, an audit id, and any assertion failures

Phantom-confirmation doctrine: callers MUST gate on ok === true, never on HTTP 200.

The 8-stack hardening that surrounds it

Real-CRIO smoke test on every deploy — exercises a designated test patient and fails the deploy on any non-success or read-back mismatch. No skip flag by design.
Preflight subject-on-study check — against live CRIO (~200 ms overhead). Catches replication lag.
Atomic create-first saga — earlier the order was cancel-first; a create-fail then left the patient with no appointment.
Eastern-naive datetime contract — a single formatting chokepoint; tests block UTC drift.
Phantom-confirmation gating — every consumer reads the verified-success flag, never "we tried."
Bulk-action canary → apply gate + fail-fast — 3 of 5 same-error aborts the loop.
Lead-status reconciliation alerts on dangerous demotions — phantom bookings can't hide.
CRIO write monitoring — a failure-rate sweep over the audit trail (deferred · tracked as a coverage gap).

Three morning defense-in-depth audits

Run by the daily data-health scan (early AM): CRIO time drift (booking notes vs. stored start), physical impossibility (overlap / cross-site <90 min), and zero-duration upcoming visits. Independent of the chokepoint, so they catch direct-in-CRIO-UI mistakes too.

Newer validation layers (2026-06)

Guard	Where	Catches
Single-equipment capacity guard	Write-time + a cross-patient data-health audit	CRIO validates per-patient + per-coordinator overlap but NOT per shared equipment. A single-machine study (one scanner at one site) could be double-booked across two coordinators. The guard pins those studies and clamps to office hours, fail-closed.
Cross-site duplicate-enrollment guard in review	Write-time + a data-health audit	Same patient enrolled / booked at both sites for the same protocol (a patient complaint, 2026-06-16). A Layer-1 audit flags existing cross-site duplicates; a Layer-2 write-time guard refuses the second-site booking, scoped to the active-status study set. Not yet merged / deployed.
Recruiter-vs-coordinator routing	Write-time + the staff-resolution helper	Phone pre-screens booked on a coordinator calendar instead of a recruiter — refuses the write.

Generalization: the mutate-and-verify chokepoint extracts the validate → write → read-back → assert → audit contract so any phantom-confirming external mutation (task board, stipend provider, e-sign provider, second CRIO surface) reports success only when read-back confirms intent. New write sites use it now; the ~36 legacy task-board / stipend-provider sites migrate in phases.

eSource QC & screen-fail intelligence

eSource QC engine

A 15-module subsystem (engine, chokepoint, LLM narrator, rules, registry, wizard, digest, a tracker ingest/classify loop, plus five proposers: consent, eligibility, IP-canon, and required-docs). The engine runs on a business-day morning cadence, an all-day ingest + classify loop feeds it, and a digest emails ops@ each morning. The LLM is a narrator: schema-validated output cites subject + visit + evidence row, with severity capped by a ground-truth tier (1–4). Auto-pause on a 3σ findings spike (an append-only run log, 21 CFR Part 11). A deterministic pre-classifier runs before the LLM.

Screen-fail intelligence

Classifier precedence: a deterministic pre-classifier (mirrors 18 months of real coordinator cancel-reason vocabulary) → Claude over the visit notes → a locked 10-value reason enum (drift blocked by a CI lint rule). The inbound webhook is primary; a scheduled sweep is the dropped-webhook backstop. About ten capabilities: manual-tag · reclassify · classifier-preview · protocol-level recurrence patterns · context · outcome detection (a rematch learning loop) · cross-study indication patterns · weekly digest. Surfaces sponsor-level signals (inclusion criteria too tight) that per-event triage misses.

Security, compliance & PHI perimeter

IT-security monitor — ~14 monitoring sections · an account-suspension check runs every 15 min (auto-suspends risky Workspace accounts) · an owner-only suppress-empty digest each morning + a weekly heartbeat.
PHI perimeter — a sensitive/PHI feed allowlist contract (403 to anonymous callers) · enforced by a CI lint rule + contract tests · every dashboard fetch is wrapped by the IAP session guard.
Tracing fail-closed — the agent-tracing integration refuses to enable if pointed at a public, out-of-perimeter endpoint (traces can carry PHI; they must stay in-perimeter). Currently dark after a 2026-05-22 leak.
Change-controlled sections — 4-layer enforcement: code-owner review + pre-commit + CI re-check + branch protection. Author-domain gate (@phillyresearch.com only, SSH-signed commits).
Supply-chain / IAM — secret scanning · static security scan · IaC static analysis · SBOM + a nightly security sweep · a weekly IAM-drift audit (project + service accounts + critical secrets + dataset ACL vs. an allowlist).

Provider portal

The fourth production surface: a token-gated provider portal at joinresearchstudies.com/refer for direct-from-clinic ordering. Referring physicians place orders, get a fax-back referral, and see a study list with eligibility criteria. It also renders live PI rheumatology availability — it reads a dedicated "PI schedule" calendar, subtracts booked visits, and honors out-of-office blackouts. PHI-free by construction (only dates + open windows; never patient / coordinator / subject identifiers). A reviewer view renders a no-code page (an operator eyeballs it, edits the calendar, refreshes) — to change what providers see, edit the calendar event times; no code change.

Build & deploy pipeline

Source surfaces: the 26 front-end modules + the design-system sources + a small inline late-bind block (and inline styles) in the app shell, delimited by build markers.

Build: a build script concatenates the modules + design system into a single hashed bundle, transforms the app shell, and extracts a stylesheet.

11 CI/CD workflows (auto-deploy on push to main, ~11 min):

Workflow	Trigger	Purpose
Dashboard deploy	push main (front-end)	Build + deploy the dashboard to the Cloud Run operations service
Cloud-function deploy	push main (core cloud-function source)	Compile + tests + smoke canaries + deploy the chokepoint Cloud Function
Landing-pages deploy	push main (landing pages)	Deploy joinresearchstudies.com
CI	PR	Lint + unit + invariant coverage + render-smoke
Change-control re-check	PR	Re-runs the change-controlled-sections check server-side
Author-domain	PR	Re-verifies @phillyresearch.com on every commit
Performance gate	PR	Lighthouse performance-score gate
IaC scan	PR	Infrastructure-as-code static analysis
Security scan	PR	Static security scan
SBOM	scheduled	SBOM generation + dependency audit
Nightly security sweep	nightly	Secret scan + dependency CVE check

Pre-commit pipeline

Step 0 — author-domain gate (must be @phillyresearch.com)
Step 0b — rule-drift advisory (prose rules ↔ coverage rows)
Step 0c — orphan-stash warning (other-agent work-in-progress stashes)
Step 1 — change-controlled-sections check
Step 2 — invariant-coverage check (coverage rows pointing at non-existent backstops fail)
Step 3 — the custom lint suite (56 rules)
Step 3b — an AST-based no-undef check across the front end + inline block
Step 4 — the unit-test suite (1,559 tests)
Step 5b — render-smoke (executes a key render path against fixtures in a headless DOM)

Change-controlled files (code-owner review + pre-commit + CI re-check): the dashboard server entrypoint · the core cloud-function source · git hooks · the code-owners file · all CI workflows · the change-control checker · the custom lint suite · the agent instructions file · the architecture + ground-truth docs. Inside the app shell, a handful of region tags mark change-controlled admin, finance, and RBAC blocks.

Lint rules & test backstops

56 active rules in the custom lint suite · 1,559 passing unit tests · 39 invariant rows (34 enforced) in the coverage checker · 5 cloud-function contract tests · a headless-DOM render-smoke. Selected high-leverage rules, described by what they catch:

Rule (what it guards)	What it catches
Safe date parsing	Bare date-string parsing that drifts to UTC (off-by-one in Eastern time)
No hardcoded year	Hardcoded calendar years in code or warehouse SQL
No known secret patterns	Ten secret-shaped regexes (cloud, source-host, payment, e-sign, key material, etc.)
PHI feed requires auth	New warehouse feeds touching PHI columns without a sensitive-feed allowlist entry
Schedule rebuild/inject pairing	Any schedule-table rebuild without its paired confirm + rideshare button injection
No inline script	Inline scripts or inline event handlers in standalone HTML (CSP)
Equipment-study site pairing	The single-equipment study co-occurring with the other site's key
Bulk action needs guards	N-row loops + mutating calls without the bulk-action guard
No direct CRIO calendar write	Direct CRIO calendar update/cancel outside the book-and-verify chokepoint
Comms via the queue only	Outbound comms bypassing the comms approval queue
Reason must be enum	Free-string cancel reasons outside the locked 10-value enum
Timezone-explicit datetimes	Bare current-date/current-datetime SQL (defaults to UTC)
No window.open after await	Opening a window after an await (silently popup-blocked)
RBAC: no section gating	Role reads outside the central RBAC module without an explicit allow tag
External links use query, not fragment	Email/SMS links using a URL fragment instead of a query param (IAP drops fragments)
Prose rule needs coverage	A new NEVER/ALWAYS prose rule without a paired coverage row
Pre-screen routing via chokepoint	Direct site→coordinator lookup bypassing the staff-resolution chokepoint
Recruitment list via chokepoint	A recruitment task-board task not routed through the recruiter-routing chokepoint
Pre-screen dedup via chokepoint	A new insert into the prescreening record outside a sanctioned writer
Extracted action needs handler	A new inline action without a paired dedicated handler module
No bank-variance to staff	A money-at-risk / bank-variance alert routed to a shared staff inbox

The table above is a curated subset — 56 rules total. Many enforce the doctrine chokepoints (each NEVER/ALWAYS prose rule is paired with a backstop and a coverage row — 405 rows).

Post-deploy smoke canaries — 14 (run in the cloud-function deploy and able to fail it): book-and-verify · email · AI caller · real AI dial · SMS · fax · task board · stipends · ad-platform · ad-platform conversions · rides · intake forms · voicemail triage · screen-fail (+ fax routing).

BigQuery join model

A small set of primary keys carries most analytical queries — roughly: study, patient, subject (one patient-on-one-study), subject-visit, calendar-appointment, and user. The accounting/revenue tables are the exception: they use string UUIDs and join to the clinical side through the study's external id rather than its integer key (a common gotcha). A handful of additional keys connect the document, web-form, and invoice-line chains.

Join key	Connects	Notes
Study key	study ↔ subject ↔ subject-visit ↔ appointment ↔ patient-interaction ↔ subject-document ↔ study-user ↔ subject fact tables	Integer in the CRIO replica, string in the CRM data; a helper resolves the right one.
Patient key	patient ↔ subject ↔ patient-interaction ↔ subject-payment ↔ prescreening record	A deterministic composite key (name + study + date) is used for dedup when the key is absent.
Subject key	subject ↔ subject-visit ↔ appointment ↔ subject-document ↔ subject-payment	One (patient, study) pair.
Subject-visit key	subject-visit ↔ visit stats ↔ procedure-question facts ↔ appointment audit	Drives eSource-first coordinator/PI attribution (12-month rolling).
Appointment key	appointment ↔ appointment audit ↔ user-appointment	Always filter out deleted "ghost" appointments.
User key	user ↔ study-user ↔ user-appointment ↔ patient-interaction	Filter on study-user role (PI vs. coordinator).
Study external id (UUID)	study ↔ the invoice / payment / revenue-group tables of the revenue schema	Join on the external id, NOT a cast of the integer study key. A common accounting trap.
Invoice-line id	the invoice-line ↔ revenue-group ↔ revenue-data-point chain of the revenue schema	Pass-through actuals key off the invoice-line name, not the line detail field.
Document-type key	subject-document ↔ document-type	Drives medical-records staging (filtered to medical-record document types + valid statuses).
Web-form key	web-form ↔ web-form-submission	The web-form funnel feed.

Universal filter gotchas

CDC soft-deletes: always exclude soft-deleted rows on the CRIO replica.
Cancelled appointments: always exclude the deleted-status sentinel (it means DELETED, not cancelled — otherwise it adds dozens of phantom rows).
Publisher / template studies: dropped at source by a study-fetch helper.
Sponsor coordinators: a normalization helper silently drops a known set of sponsor-side coordinator names.
Timezone: always use Eastern-explicit current-date SQL, never the bare (UTC) form; enforced by a lint rule.

Revenue & receivables platform (4 layers)

The finance system is structured as the finance lead's 4-layer revenue & receivables platform: catch bad data before it costs money, validate that booked revenue is real, attribute cost to study, then collect what's owed. The dollar-journey FMEA sits underneath all four as the reconciliation spine.

Layer	Concern	Status
L1 · Data health / invoice integrity	Cross-system audits before money is computed — invoice integrity, CRIO time-drift, corrupted-phone, lead-dedup drift. (A cross-site duplicate-enrollment audit is in review, not yet shipped.)	live
L2 · Revenue Validation Engine	Is booked revenue real and correctly based? The basis is study-dependent (base-fee is more common; overhead-inclusion varies by contract). Catches a class of false positives via per-visit exact matching.	design merged · analysis done, build pending
L3 · Per-study margin / cost ops	Revenue − allocated DIRECT cost, per study / PI / site (the margin engine, below).	live
L4 · Collections Automation Engine	A dunning ladder over sponsor AR, with a reconciliation gate (genuinely-owed AR across the open invoices) before any chase. A given sponsor's past-due AR can span multiple studies, billed via the sponsor's billing vendor.	built + tested (not yet merged) · SEND HARD-GATED OFF

L4 is intentionally dark: outbound send stays disabled until the finance lead signs off and the sponsor reconciliation closes. Sponsor AR lives in the CRIO revenue schema — the accounting system's A/R (a small number of customers) is NOT sponsor AR; the collections system of record is CRIO + bank reconciliation, never the accounting system.

Margin engine (L3)

Two metrics per study: contribution margin (revenue − allocated DIRECT, overhead excluded) and fully-loaded margin (− a per-study overhead share on a visit-share basis). DIRECT cost lands via a 5-tier allocation cascade:

PI fees by eSource attribution (named-PI vendors)
A configurable vendor-to-study allocation map (or flagged as overhead)
By study
Activity-driven (allocate by active subjects)
Uniform-per-visit fallback

Aggregate totals are summed FROM the per-row fields (rows == totals by construction); an unallocated-residual figure surfaces DIRECT dollars that couldn't be landed. PI-level + site-level profitability rollups. Operating expenses drop non-operating accounts; fully-loaded margin loads corporate overhead. Drives the per-study margin section on the finance dashboard. Doctrine: the GL class is a starting scope, not truth; reconcile totals from rows.

Dollar-journey FMEA spine

An end-to-end reconciliation effort (codename origin 2026-04-27) modeled as a 53-mode FMEA. Phases: CRIO revenue extraction (the revenue-data-point chain) · pass-through actuals keyed off the invoice-line name · sponsor cadence + invoice aging · a bank-statement variance watcher · a daily digest over the dollar-journey view + materialized view.

The CRIO mark-paid writer (see CRIO writer) is the L4↔CRIO bridge — it closes invoices in CRIO once the deposit is reconciled, fixing the status-flip gap that left paid invoices reading "unpaid."

Doctrine: the bank deposit is the ONLY trusted financial event. CRIO has no financial-write API beyond the browser writer; observe, never push speculative state. The real-dollar axis is truth; the CRIO axis is orthogonal hygiene — never sum the two.

Reconciliation jobs: weekly stipend reconciliation · the bank-variance watcher · a bank-statement freshness check · bank-statement + remittance polling · and a weekly sponsor chase.

Portal scraper (sponsor portal observations)

A headless Cloud Run scraper that observes a sponsor portal for protocol updates, query responses, and document expirations. It writes additively to a scraper-runs audit table via an open/close merge chokepoint.

Cost-telemetry chokepoint

Every warehouse read in the scraper's query layer routes through one accumulator that also reads each query's bytes-processed metric. The per-run sum is reset at run start and flushed at run end into the audit row's cost column. This mirrors the cloud-function-side read engine's cost sink so cost telemetry lands end-to-end across both services. Without it, the column is always null and any future cost-regression alert has nothing to work with.

Audit-table contract

Scraper-runs rows carry an open/closed state — run start merges an open row; run end closes it.
A stale-run alert fires through the alert router when a run stays open past its expected window.
A watcher endpoint + a Resources-tab health card are open follow-ups (tracked in the scraper runbook).

Auth: the portal vendor's bookmark-redirect URLs were unmasked in a one-shot step (2026-04-13). A cookie-based session is held in a secret. It is re-run ad hoc when bookmarks change — intentionally NOT on a schedule.

Quick reference index

RBAC tab matrix — 8 roles × 10 tabs
Frontend modules — 26 source modules + design system
Scheduled automation — ~91 jobs (85 enabled), grouped by capability
Agent system — 6 active agents + phase roadmap
AI voice telephony — Telnyx-native · enrollment/Lp(a) pre-screen · caps + consent + audits
CRIO writer — browser-automation write-back · payments/invoices/mark-paid
Book & verify saga — 4-step sequence + 8-stack hardening + capacity/cross-site guards
eSource QC & screen-fail intelligence — 15-module QC engine + classifier + rematch loop
Security, compliance & PHI perimeter — auto-suspend + PHI contract + tracing fail-closed
Provider portal — direct ordering + live rheum availability
Build & deploy pipeline — 11 workflows + pre-commit + change-controlled files
Lint rules & test backstops — 56 rules + 1,559 tests + 14 smoke canaries
BigQuery join model — primary keys + universal filter gotchas
Revenue & receivables platform — 4 layers · margin engine · dollar-journey FMEA
Portal scraper — sponsor-portal observer · audit table · cost-telemetry chokepoint