CRP system architecture
End-to-end map of actors, edges, application tier, doctrine chokepoints, data tier, external integrations, scheduler, and observability. Updated 2026-06-16.
System diagram
phillyresearch.com — marketing.
Entry capabilities: lead submission, self-booked pre-screen, intake / release forms.
The internal finance dashboard (owner-only).
Every dashboard fetch routes through here.
Every mutating action funnels through one guarded chokepoint; a write is done only when read-back confirms.
Automation feeds the data tier and acts via the core — same chokepoints, same guards.
Five flows worth memorizing
flow1 · New patient lead → enrollment
flow2 · CRIO write path (mutating)
Every change funnels through the book-and-verify chokepoint (calendar) or the interaction-write chokepoint (notes). Reads are CDC-replicated into a CRIO replica dataset; a single read engine handles all reads. Any direct CRIO calendar write outside the chokepoint is blocked by a static lint rule in CI.
flow3 · Inbound triage
Fax: Telnyx → a scheduled classify step (Claude) → a fax-classification record → a scheduled route step → auto-dispatch email for the categories on the auto-send allowlist.
Voicemail: Google Voice → Gmail forward → a scheduled triage job → Claude classifies (11 categories × 4 urgency × 5 transfer targets) → merges into the voicemail record → SMS confirmation to caller via Telnyx → urgent items routed through the alert router to the task board.
flow4 · Comms-tab contract
Any auto-comm goes through the comms approval queue. Two near-in-time comms → the later one is canonical; earlier comms must NOT contradict it. An auto-send allowlist defines which message types fire automatically vs. land in a human-review tier for approval.
flow5 · Bulk mutations
Every N-row external mutation goes through the bulk-action guard (canary → apply → fail-fast: abort after 3 of 5 same-error). Direct N-row apply loops without the guard are lint-blocked in CI. The doctrine originated from an early incident where a bulk site-correction ran 25 cancel-without-create deletions for lack of such a guard.
Where the architecture is intentionally not symmetric
- CRIO is read-mostly via CDC replication, write-rarely via API. Don't try to push every state change to CRIO — its API surface is narrow (calendar, interactions, traits). Audit lives in our own CRM data store.
- Firestore is for keyed state + audit, NEVER for analytical queries. BigQuery is for analytical + CRM data. Firestore docs must stay <1 MiB (a lesson from a 2026-05-13 fax outage).
- Email goes through the standard email chokepoint for everything except finance digests, which use a separate direct sender (off the Comms tab — financial-action emails, not coordinator workflow).
- Change-controlled files (the dashboard server entrypoint, the core cloud-function source, all CI workflows, git hooks, and the agent instructions file) are code-owner-gated + pre-commit-enforced + re-checked in CI.
- Cloud SQL is NOT used by the dashboard. The one Postgres instance is an experimental side project. BigQuery + Firestore cover the dashboard's analytical + state needs.
Doctrine chokepoint reference
| Chokepoint | Concern | Default safety guard |
|---|---|---|
| Query engine | BigQuery submit | timezone-safe date normalize · 10-min in-app cache (bypassable) · per-query cost metering |
| Book & verify | CRIO calendar writes | validate → write → read-back → assert-matches → audit row · preflight subject-on-study · atomic create-then-cancel saga |
| Email sender | Outbound email | From a no-reply sender · Reply-To category map · finance uses a separate direct sender (off the Comms tab) |
| Comms queue | Auto-comms dispatch | auto-send allowlist fires · others land in a human-review tier · the LATER comm wins (canonical) |
| Alert router | Audit alerts → triage | Same-day idempotency per alert key · Comms / Ops / Data-Triage task-board lanes · auto-correct for known classes |
| Bulk-action guard | N-row mutations | Dry-by-default · short-lived canary token · fail-fast after 3 of 5 same-error |
| State store | UI state (Firestore) | Allowlisted state docs · merge helper + test required for new docs · NEVER raw localStorage |
| Intake merge guard | Intake → CRIO traits | Never overwrite for a patient with a completed visit · raw audit BEFORE mutation · multiple safety toggles |
| Phone-display helper | Phone display | Design-system helper · the resources fallback shim must mirror it |
| Interaction write | CRIO interaction POST | Allowlisted interaction types (no raw numeric type) · revision required for updates |
| Mutate & verify | Any phantom-confirming external write (task board / stipend provider / e-sign provider / 2nd CRIO surface) | Generalizes book-and-verify: validate → write → READ-BACK → assert storage == intent → audit. Returns success only when read-back confirms; new write sites use it now, ~36 legacy sites migrate in phases |
Region & service map
Project: a single Google Cloud project (id and billing redacted).
us-east1 (production):
- A Cloud Run operations dashboard — serves the internal IAP-gated dashboard
- A Cloud Run read API
- A single chokepoint Cloud Function Gen 2 — ~126 read feeds, ~125 read/action capabilities
- A Cloud Run finance dashboard — the internal finance dashboard (owner-only)
- ~91 Cloud Scheduler jobs (85 enabled)
A second region (experimental):
- A small experimental agent stack (Cloud Run + a micro Postgres) — side project
- A handful of ad-hoc BigQuery tooling services
BigQuery region: US multi-region (job location).
RBAC tab matrix
A single role→tabs map is the source of truth, mirrored into a change-controlled region of the app shell. Admin is a strict superset of leadership — owners see every leadership surface automatically. Section-level role gating inside a shared tab is a lint-blocked anti-pattern; RBAC is tab-level only.
| Role | Overview | Studies | Referrals | Actions | Leads | Resources | Lp(a) | Leadership | Agents | Comms | Admin |
|---|---|---|---|---|---|---|---|---|---|---|---|
admin | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
leadership | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — | ✓ | — |
ops | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — | — | — | — |
coordinator | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — | — | — | — |
recruiter | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — | — | — | — |
finance | ✓ | ✓ | — | — | — | ✓ | ✓ | — | — | — | — |
investigator | ✓ | ✓ | — | — | — | ✓ | ✓ | — | — | — | — |
readonly | ✓ | — | — | — | — | ✓ | ✓ | — | — | — | — |
Finance cluster decommissioned 2026-04-24: a separate internal finance dashboard. The finance role above maps to the main dashboard's read-only subset, not the legacy 6 finance tabs (which are change-controlled and no longer served).
Frontend modules
The dashboard front end is built from 26 source modules (concatenated at build time into a single hashed bundle) plus a shared design system, loaded ahead of a small late-binding inline block in the app shell. Rather than one monolith, responsibility is split into focused modules grouped by capability:
| Capability area | What these modules cover |
|---|---|
| Session & platform | IAP session guard (detects an expired session and re-auths cleanly, loaded first), configuration roster (coordinators, recruiters, investigators, sponsors), and the role→tabs RBAC map. |
| Shared utilities & UI | Date / key / normalize / escape helpers, and the common UI layer (modals, toasts, health chips, tab switching). |
| Data pipeline | The phased dashboard loader, the live-data processing + dedup layer, and the top-level rendering / KPI layer. |
| Operational tabs | Schedule (visit detail, confirm + rideshare actions, progress chips), Studies (enrollment KPIs, scheduling-gap alerts), Referrals, and an Actions / smart follow-up surface. |
| Leads & resources | Leads list + detail, per-study resource cards with budget tiles and a portal editor, and a leadership analytics tab with period filtering and a query heatmap. |
| Comms & campaigns | The Comms queue + approval flow + agent dispatch surface, the campaign ladder (welcome / nurture / retarget), an eFax triage inbox, and a voicemail triage inbox. |
| Admin & specialty | The (change-controlled) admin tab, a sponsor/portfolio rollup (on hold), EDC-discrepancy categorization for leadership query intelligence, and AI-voice surfaces (per-call transcript / outcome / cost card + a live pulse strip showing concurrent dials, spend vs. cap, and dry-run / governance-pause state). |
A shared design system exports the phone-formatting, link, chip, and button helpers; a fallback shim mirrors every render-path helper so a missing import never breaks a render.
Scheduled automation (~91 jobs · 85 enabled)
All schedules run on Eastern time. Each job's apply/send behavior follows the same safety defaults as the rest of the platform: read-only by default, with mutation and outbound-send gated behind explicit flags. The live count is ~91 jobs (85 enabled · 6 paused). Rather than enumerate them, here is the breakdown by capability domain — what each category of automation does and roughly how many jobs sit in it.
| Domain | ~Jobs | What this category of automation does |
|---|---|---|
| System syncs & replication | ~11 | Pulls external systems (ad platform, task board, accounting, drive, rides, calendar) into the warehouse, reconciles lead status, and refreshes hot-path materialized views. |
| Reminders & comms | ~8 | Visit reminders (multi-day, day-of, post-visit), pre-screen reminders, welcome / nurture campaigns, partial-lead nudges, and new-visit alerts. |
| AI triage & classification | ~4 | Classifies and routes inbound faxes, triages voicemails, and sweeps for screen-fail signals — all LLM-narrated with deterministic pre-classifiers. |
| Agents | ~6 | A morning orchestrator that dispatches sub-agents: no-show risk scoring, high-risk reminder targeting, a recruitment manager, coordinator digests, and an outcome tracker that closes the learning loop. |
| Audit & health | ~11 | Daily data-health scans, duplicate-appointment audits, bank-variance and freshness watchers, a security digest, consent-freshness checks, a synthetic end-to-end canary, and cost digests. |
| Alerts & auto-resolution | ~4 | Syncs alert resolution state, de-duplicates alerts, auto-suspends risky accounts, and auto-closes passed checks. |
| Digests & briefs | ~6 | Coordinator briefs (generate / send / smoke-verify), a retargeting digest, and an eSource-QC digest. |
| Finance & reconciliation | ~6 | Finance materialized-view refresh, a daily finance-manager run, weekly sponsor chase, stipend reconciliation, and bank-statement + remittance polling. |
| eSource QC & rideshare | ~3 | An ingest + classify loop feeding the eSource-QC engine, plus monthly rideshare billing. |
Kill-switch pattern: any scheduled job can be paused independently without a code deploy; pausing is documented per-job in the incident-response runbook.
Agent system
The agent system (codename origin 2026-04-25) is the central strategic bet: end-to-end agent-orchestrated patient qualification, shipped in phases. Agents share infrastructure (the comms approval queue, the alert router, a warehouse-backed audit trail, agent tracing) and emit one row per decision, so the corpus becomes training data for future learning agents.
| Agent | Concern | Cadence | Output |
|---|---|---|---|
| Orchestrator | Morning master agent | Daily, early AM | Dispatches to sub-agents; writes one decision row per action |
| Recruitment manager | Stale leads · screen-fail re-match · missing records pre-visit | Every 30 min, business hours M–F | 3 signal types · least-loaded recruiter routing · 24h idempotency · per-signal quiet windows |
| No-show risk | Per-visit no-show probability score | Daily, early AM | Risk scores in CRITICAL / HIGH / MEDIUM tiers |
| High-risk reminder | Targeted SMS to high-risk patients | Daily, early AM | Queued to the comms approval layer with send enabled |
| Coordinator digest | Per-coordinator morning summary | Daily AM, M–F | Email digest (Reply-To: operations@) |
| Outcome tracker | Closes the feedback loop on the prior day's agent actions | Nightly | Outcome rows for the ML training corpus |
Supporting infrastructure
- Comms approval queue — every agent dispatch lands in a human-review tier unless its message type is on the auto-send allowlist (welcome / booking-confirm / reminder / fax-route-dispatch), which workers process immediately.
- Alert router — every alert routes to the task board's auto-alert lanes (Comms / Ops / Data-Triage). Same-day idempotency per alert key · 30-min resolution sweep · operator-phone SMS bypasses throttling with a cross-instance rate limit.
- Quiet-window gates (recruitment manager) — before dispatch it unions a last-touch map across CRIO patient interactions, appointment + subject-status audits, the comms queue, and the prescreening record. Per-signal thresholds: 7d (stale lead), 14d (screen-fail rematch), 3d (missing records).
- Recruitment-feedback loop — records acted / positive / rejected / stale outcomes, with feedback notes referencing the parent action, so routing improves over time.
- Agent tracing — every agent invocation is traced for QA-pass rate, hallucination rate, and latency. Agents not measured will degrade silently (an architectural-principles doctrine).
Phase roadmap
- Phase 1 · Eligibility + lead capture (live)
- Phase 2 · Comms approval chokepoint (live)
- Phase 3 · HIE + lab APIs (planned, not live)
- Phase 5 · Finance dollar-journey (live)
- Phase 7 · Recruitment-manager scaffolding (live)
- Phase 9 · Generic coordinator-brief templates (live)
Doctrine: agents support decisions, humans make them. Every write path is gated by a send / apply / dry flag; the recruitment manager never auto-sends — it queues to the comms approval layer for human review, and rule inserts land unapproved by default. The "learning" and "prediction" agents are roadmap — today's "prediction" handlers are deterministic SQL / heuristics, and the per-decision audit rows are the training corpus they'll eventually consume.
Leads pipeline & comms engine
Leads / patient-acquisition pipeline
Public intake + Meta Lead Ads → a prescreening / intake table → eligibility match → recruiter. Hardened across waves 1–3 (2026-06).
| Stage | What |
|---|---|
| Intake | Public landing form + an idempotent ad-platform lead webhook + a conversion ladder back to the ad platform + structured disqualification reasons |
| Dedup | 5-layer defense; a tiered partition keyed on the CRIO patient key first (then phone, email, unique fallback). Lint + write-time fail-loud + a daily drift audit |
| Eligibility | Evaluates each lead against the approved per-study eligibility rules → study type + a pre-screen booking option; the landing page renders match cards |
| Assignment | 4-recruiter pool · an explicit per-patient assignment wins · a single routing chokepoint (a designated recruiter is fallback, not default) |
| Call-back queue | One unified, tiered call-back queue · feed columns + notes · per-referral call scripts · stable row order under auto-refresh |
| Speed-to-lead | First-contact timestamp backfill + KPI · materialized SLA columns · reply-positive parity |
| Partials rescue | 20-min rescue SMS + a signed, single-purpose resume link (public, token-is-auth, prefills a few fields only) |
| Re-engagement | Cancel-without-reschedule digest + a reply router + an SMS dispatcher (observe-first, send is environment-gated) |
| Eligibility-match matrix | A large eligibility-match matrix (tens of thousands of rows at the 2026-05-30 backfill and growing daily, append-only) · origin-tagged (eligibility-match partner / provider office / public intake) · a daily recruiter call-list digest |
| Compliance | CRIO do-not-contact restrictions are rendered and outreach is blocked in the dashboard · existing-appointment outreach suppression (fail-closed) |
Comms engine
One queue, three channels (email via Gmail, SMS + fax via Telnyx). Every auto-comm flows through a single enqueue chokepoint; allowlisted types auto-fire, everything else lands in a human-review tier for approval. The later comm wins — earlier comms must not carry contradicting calls-to-action.
- Rule engine + role-inbox registry — a single source of truth for reply-to inboxes (recruitment / billing / operations / owner); raw inbox literals were ratcheted down from 163 to 84 across migration waves. Outbound always sends from a no-reply address; finance uses a separate direct sender (off the Comms tab).
- Data model — a comms queue (state) · a comms audit log (chokepoint) · a send log (the only place carrying both the sent record and the rule that produced it; SMS is audited here too).
- Send-time guards — toll-free phone guard · duplicate-send detector (a standing alarm) · email-volume detector · per-kind frequency guards · stuck-message escalation · structurally-impossible suppression (rolling window) · suppress-empty owner digests + a weekly heartbeat.
- Phone-vs-in-person copy — phone-screen studies render "phone call" copy: no address, no maps link, no "bring photo ID".
- Claims-coverage registry — every appointment-bearing comm declares its five claims (when / when / where / which / whom); a new kind can't ship without a guard for each (a CI tripwire).
AI voice telephony (Telnyx-native)
Outbound + inbound conversational voice for pre-screening and speed-to-lead, funneled through one chokepoint: the AI-caller layer. The stack is Telnyx Voice AI native — carrier + speech-to-text + text-to-speech + conversation LLM behind a single BAA. (An alternative vendor stack lives only in a separate pilot scaffold — it is NOT what ships in the dashboard.)
| Use case | Status | Notes |
|---|---|---|
| Enrollment pre-screen (one study) | dark | Versioned prompt · routed per-study · dry-run + governance gated |
| Lp(a) pre-screen (a phone-screen study) | dark | Prompt not yet provisioned — refuses to run until it is set |
| Instant-callback (Phase A) | observe | Logs callback intent every few minutes; no live-dial branch yet |
| Generic first-touch assistant | dark | Enabled only when a dedicated assistant is configured (a separate lever) |
| Cancel-reengage (Phase C) | designed | The AI middle tier of an SMS → AI → human cascade; the SMS tier is built, voice not yet |
| ES / EN bilingual | built | Mid-call language switch is a prompt rule; each call records its language |
Hard caps (compile-time constants)
- Daily call cap 50 · max concurrent dials 5 · max call duration 8 min · quiet hours 8a–9p local
- Daily spend cap $20 · per-study daily spend cap $10
Kill switches & gates (all default-safe)
- A dry-run flag · a two-level warehouse governance pause (a control flag + an active-rules view, fail-closed) · a per-lead "AI voice disabled" flag · and an unconfigured assistant simply refuses to dial.
- Consent-gated: a cryptographically version-locked, attorney-approved disclosure · TCPA / do-not-call / per-state gates · a concurrent-cap counter with a daily reconciler.
Independent audits (send only on findings)
- Transcript audit (daily) — pattern matching + an IRB-approved compensation whitelist; flags any dollar amount outside the approved set and any unescalated adverse-event keyword. Critical for suicidality / chest-pain signals.
- Prompt-drift audit (daily) — a normalized hash diff of each live assistant's instructions vs. the canonical version-controlled prompt; emails the owner only on drift.
- Real-dial deploy canary (gated) — proves the real outbound-call path is alive on each deploy; the default smoke test forces dry-run. Catches the regression class that surfaced on an operator's first live dial.
CRIO writer — browser-automation write-back
CRIO has no payments/finance write API, so this drives the CRIO web UI like a bookkeeper to close the status-flip gap (sponsor pays → invoice stays "unpaid" in CRIO). Two paths in one tree:
| Path | Engine / model | Writes? | Status |
|---|---|---|---|
| (a) Cloud Run scraper | Browser automation · Claude (Sonnet class) | OBSERVE-only (writing refuses by design) | dark — datacenter-IP CAPTCHA |
| (b) Live writer | Anthropic Computer Use · Claude (Haiku class, the model that supports the computer-use tool) | WRITES to CRIO | live — runs as local background agents on a residential-IP Mac |
Recipes: payment + reconciliation, reconcile-only, and invoice-create — all LIVE behind the approval queue. Mark-invoice-paid ("Full Payment") is built but test-only — it requires an explicit execute flag, is hard-gated to a single test study, and its enqueuer is deliberately unwired (double-pay risk).
Write-then-verify (mirrors the calendar writer)
- In-recipe vision-confirmation + URL navigation guards on every commit step.
- An out-of-band warehouse read-back flips submitted → verified only on a deterministic remittance-id + amount match — the same phantom-confirmation doctrine as the calendar writer.
Local background agents (on the Mac · invisible to cloud logs)
- A warm-browser daemon that consumes the approved-write queue one row at a time.
- A no-browser read-back poller (every few minutes) that flips verification status.
- A daily session / navigation smoke watchdog that performs no writes.
Kill switches
A soft kill-file · an auto-tripping circuit breaker · a short canary hold + STOP-via-SMS · schema validation refuses any row not explicitly approved · dry-by-default (mutation requires an explicit apply flag) · and a hard "unload the agents" stop.
Remote ops
The Mac sits on a private identity-based mesh network: SSH over the mesh (never exposed to the public internet) lets us restart the writer, the read-back poller, or the inbound tunnel when they die — no physical trip to the box. Deliberately on a separate path from the tunnel the agents use: the way in must not depend on the thing that breaks.
Book & verify saga sequence
The CRIO calendar write chokepoint. Any direct CRIO calendar write outside this helper is lint-blocked in CI. Doctrine origin: a 2026-05-01 incident (149 patient appointments stored four hours late since the prior autumn).
- Caller (self-booking / bulk-action / agent) passes an intent:
- patient, study, visit, site, start, end, and coordinator
validate(intent) — physical & business invariants
- Equipment study pinned to its dedicated site
- end > start (no zero-duration placeholders)
- start is in the future
- no same-site overlap (±15 min)
- no cross-site appointment (±90 min travel)
- fail closed if any check errors
- Checks live CRIO (not the replicated copy)
- Refuses if the replica says enrolled but CRIO disagrees
- Create the new appointment
- If create succeeds → cancel the old one
- If create fails → don't touch the old (safe retry)
- If the post-create cancel fails → audit it; the physical-impossibility audit catches the duplicate
- GET the appointment back from CRIO
- Assert wall-clock, duration, coordinator and status all match intent
- Any mismatch → returns
ok:falsewith the failures
- Write a replay-able Firestore audit row
- Returns a verified-success flag, the appointment id, an audit id, and any assertion failures
ok === true, never on HTTP 200.
The 8-stack hardening that surrounds it
- Real-CRIO smoke test on every deploy — exercises a designated test patient and fails the deploy on any non-success or read-back mismatch. No skip flag by design.
- Preflight subject-on-study check — against live CRIO (~200 ms overhead). Catches replication lag.
- Atomic create-first saga — earlier the order was cancel-first; a create-fail then left the patient with no appointment.
- Eastern-naive datetime contract — a single formatting chokepoint; tests block UTC drift.
- Phantom-confirmation gating — every consumer reads the verified-success flag, never "we tried."
- Bulk-action canary → apply gate + fail-fast — 3 of 5 same-error aborts the loop.
- Lead-status reconciliation alerts on dangerous demotions — phantom bookings can't hide.
- CRIO write monitoring — a failure-rate sweep over the audit trail (deferred · tracked as a coverage gap).
Three morning defense-in-depth audits
Run by the daily data-health scan (early AM): CRIO time drift (booking notes vs. stored start), physical impossibility (overlap / cross-site <90 min), and zero-duration upcoming visits. Independent of the chokepoint, so they catch direct-in-CRIO-UI mistakes too.
Newer validation layers (2026-06)
| Guard | Where | Catches |
|---|---|---|
| Single-equipment capacity guard | Write-time + a cross-patient data-health audit | CRIO validates per-patient + per-coordinator overlap but NOT per shared equipment. A single-machine study (one scanner at one site) could be double-booked across two coordinators. The guard pins those studies and clamps to office hours, fail-closed. |
| Cross-site duplicate-enrollment guard in review | Write-time + a data-health audit | Same patient enrolled / booked at both sites for the same protocol (a patient complaint, 2026-06-16). A Layer-1 audit flags existing cross-site duplicates; a Layer-2 write-time guard refuses the second-site booking, scoped to the active-status study set. Not yet merged / deployed. |
| Recruiter-vs-coordinator routing | Write-time + the staff-resolution helper | Phone pre-screens booked on a coordinator calendar instead of a recruiter — refuses the write. |
Generalization: the mutate-and-verify chokepoint extracts the validate → write → read-back → assert → audit contract so any phantom-confirming external mutation (task board, stipend provider, e-sign provider, second CRIO surface) reports success only when read-back confirms intent. New write sites use it now; the ~36 legacy task-board / stipend-provider sites migrate in phases.
eSource QC & screen-fail intelligence
eSource QC engine
A 15-module subsystem (engine, chokepoint, LLM narrator, rules, registry, wizard, digest, a tracker ingest/classify loop, plus five proposers: consent, eligibility, IP-canon, and required-docs). The engine runs on a business-day morning cadence, an all-day ingest + classify loop feeds it, and a digest emails ops@ each morning. The LLM is a narrator: schema-validated output cites subject + visit + evidence row, with severity capped by a ground-truth tier (1–4). Auto-pause on a 3σ findings spike (an append-only run log, 21 CFR Part 11). A deterministic pre-classifier runs before the LLM.
Screen-fail intelligence
Classifier precedence: a deterministic pre-classifier (mirrors 18 months of real coordinator cancel-reason vocabulary) → Claude over the visit notes → a locked 10-value reason enum (drift blocked by a CI lint rule). The inbound webhook is primary; a scheduled sweep is the dropped-webhook backstop. About ten capabilities: manual-tag · reclassify · classifier-preview · protocol-level recurrence patterns · context · outcome detection (a rematch learning loop) · cross-study indication patterns · weekly digest. Surfaces sponsor-level signals (inclusion criteria too tight) that per-event triage misses.
Security, compliance & PHI perimeter
- IT-security monitor — ~14 monitoring sections · an account-suspension check runs every 15 min (auto-suspends risky Workspace accounts) · an owner-only suppress-empty digest each morning + a weekly heartbeat.
- PHI perimeter — a sensitive/PHI feed allowlist contract (403 to anonymous callers) · enforced by a CI lint rule + contract tests · every dashboard fetch is wrapped by the IAP session guard.
- Tracing fail-closed — the agent-tracing integration refuses to enable if pointed at a public, out-of-perimeter endpoint (traces can carry PHI; they must stay in-perimeter). Currently dark after a 2026-05-22 leak.
- Change-controlled sections — 4-layer enforcement: code-owner review + pre-commit + CI re-check + branch protection. Author-domain gate (
@phillyresearch.comonly, SSH-signed commits). - Supply-chain / IAM — secret scanning · static security scan · IaC static analysis · SBOM + a nightly security sweep · a weekly IAM-drift audit (project + service accounts + critical secrets + dataset ACL vs. an allowlist).
Provider portal
The fourth production surface: a token-gated provider portal at joinresearchstudies.com/refer for direct-from-clinic ordering. Referring physicians place orders, get a fax-back referral, and see a study list with eligibility criteria. It also renders live PI rheumatology availability — it reads a dedicated "PI schedule" calendar, subtracts booked visits, and honors out-of-office blackouts. PHI-free by construction (only dates + open windows; never patient / coordinator / subject identifiers). A reviewer view renders a no-code page (an operator eyeballs it, edits the calendar, refreshes) — to change what providers see, edit the calendar event times; no code change.
Build & deploy pipeline
Source surfaces: the 26 front-end modules + the design-system sources + a small inline late-bind block (and inline styles) in the app shell, delimited by build markers.
Build: a build script concatenates the modules + design system into a single hashed bundle, transforms the app shell, and extracts a stylesheet.
11 CI/CD workflows (auto-deploy on push to main, ~11 min):
| Workflow | Trigger | Purpose |
|---|---|---|
| Dashboard deploy | push main (front-end) | Build + deploy the dashboard to the Cloud Run operations service |
| Cloud-function deploy | push main (core cloud-function source) | Compile + tests + smoke canaries + deploy the chokepoint Cloud Function |
| Landing-pages deploy | push main (landing pages) | Deploy joinresearchstudies.com |
| CI | PR | Lint + unit + invariant coverage + render-smoke |
| Change-control re-check | PR | Re-runs the change-controlled-sections check server-side |
| Author-domain | PR | Re-verifies @phillyresearch.com on every commit |
| Performance gate | PR | Lighthouse performance-score gate |
| IaC scan | PR | Infrastructure-as-code static analysis |
| Security scan | PR | Static security scan |
| SBOM | scheduled | SBOM generation + dependency audit |
| Nightly security sweep | nightly | Secret scan + dependency CVE check |
Pre-commit pipeline
- Step 0 — author-domain gate (must be
@phillyresearch.com) - Step 0b — rule-drift advisory (prose rules ↔ coverage rows)
- Step 0c — orphan-stash warning (other-agent work-in-progress stashes)
- Step 1 — change-controlled-sections check
- Step 2 — invariant-coverage check (coverage rows pointing at non-existent backstops fail)
- Step 3 — the custom lint suite (56 rules)
- Step 3b — an AST-based no-undef check across the front end + inline block
- Step 4 — the unit-test suite (1,559 tests)
- Step 5b — render-smoke (executes a key render path against fixtures in a headless DOM)
Change-controlled files (code-owner review + pre-commit + CI re-check): the dashboard server entrypoint · the core cloud-function source · git hooks · the code-owners file · all CI workflows · the change-control checker · the custom lint suite · the agent instructions file · the architecture + ground-truth docs. Inside the app shell, a handful of region tags mark change-controlled admin, finance, and RBAC blocks.
Lint rules & test backstops
56 active rules in the custom lint suite · 1,559 passing unit tests · 39 invariant rows (34 enforced) in the coverage checker · 5 cloud-function contract tests · a headless-DOM render-smoke. Selected high-leverage rules, described by what they catch:
| Rule (what it guards) | What it catches |
|---|---|
| Safe date parsing | Bare date-string parsing that drifts to UTC (off-by-one in Eastern time) |
| No hardcoded year | Hardcoded calendar years in code or warehouse SQL |
| No known secret patterns | Ten secret-shaped regexes (cloud, source-host, payment, e-sign, key material, etc.) |
| PHI feed requires auth | New warehouse feeds touching PHI columns without a sensitive-feed allowlist entry |
| Schedule rebuild/inject pairing | Any schedule-table rebuild without its paired confirm + rideshare button injection |
| No inline script | Inline scripts or inline event handlers in standalone HTML (CSP) |
| Equipment-study site pairing | The single-equipment study co-occurring with the other site's key |
| Bulk action needs guards | N-row loops + mutating calls without the bulk-action guard |
| No direct CRIO calendar write | Direct CRIO calendar update/cancel outside the book-and-verify chokepoint |
| Comms via the queue only | Outbound comms bypassing the comms approval queue |
| Reason must be enum | Free-string cancel reasons outside the locked 10-value enum |
| Timezone-explicit datetimes | Bare current-date/current-datetime SQL (defaults to UTC) |
| No window.open after await | Opening a window after an await (silently popup-blocked) |
| RBAC: no section gating | Role reads outside the central RBAC module without an explicit allow tag |
| External links use query, not fragment | Email/SMS links using a URL fragment instead of a query param (IAP drops fragments) |
| Prose rule needs coverage | A new NEVER/ALWAYS prose rule without a paired coverage row |
| Pre-screen routing via chokepoint | Direct site→coordinator lookup bypassing the staff-resolution chokepoint |
| Recruitment list via chokepoint | A recruitment task-board task not routed through the recruiter-routing chokepoint |
| Pre-screen dedup via chokepoint | A new insert into the prescreening record outside a sanctioned writer |
| Extracted action needs handler | A new inline action without a paired dedicated handler module |
| No bank-variance to staff | A money-at-risk / bank-variance alert routed to a shared staff inbox |
The table above is a curated subset — 56 rules total. Many enforce the doctrine chokepoints (each NEVER/ALWAYS prose rule is paired with a backstop and a coverage row — 405 rows).
Post-deploy smoke canaries — 14 (run in the cloud-function deploy and able to fail it): book-and-verify · email · AI caller · real AI dial · SMS · fax · task board · stipends · ad-platform · ad-platform conversions · rides · intake forms · voicemail triage · screen-fail (+ fax routing).
BigQuery join model
A small set of primary keys carries most analytical queries — roughly: study, patient, subject (one patient-on-one-study), subject-visit, calendar-appointment, and user. The accounting/revenue tables are the exception: they use string UUIDs and join to the clinical side through the study's external id rather than its integer key (a common gotcha). A handful of additional keys connect the document, web-form, and invoice-line chains.
| Join key | Connects | Notes |
|---|---|---|
| Study key | study ↔ subject ↔ subject-visit ↔ appointment ↔ patient-interaction ↔ subject-document ↔ study-user ↔ subject fact tables | Integer in the CRIO replica, string in the CRM data; a helper resolves the right one. |
| Patient key | patient ↔ subject ↔ patient-interaction ↔ subject-payment ↔ prescreening record | A deterministic composite key (name + study + date) is used for dedup when the key is absent. |
| Subject key | subject ↔ subject-visit ↔ appointment ↔ subject-document ↔ subject-payment | One (patient, study) pair. |
| Subject-visit key | subject-visit ↔ visit stats ↔ procedure-question facts ↔ appointment audit | Drives eSource-first coordinator/PI attribution (12-month rolling). |
| Appointment key | appointment ↔ appointment audit ↔ user-appointment | Always filter out deleted "ghost" appointments. |
| User key | user ↔ study-user ↔ user-appointment ↔ patient-interaction | Filter on study-user role (PI vs. coordinator). |
| Study external id (UUID) | study ↔ the invoice / payment / revenue-group tables of the revenue schema | Join on the external id, NOT a cast of the integer study key. A common accounting trap. |
| Invoice-line id | the invoice-line ↔ revenue-group ↔ revenue-data-point chain of the revenue schema | Pass-through actuals key off the invoice-line name, not the line detail field. |
| Document-type key | subject-document ↔ document-type | Drives medical-records staging (filtered to medical-record document types + valid statuses). |
| Web-form key | web-form ↔ web-form-submission | The web-form funnel feed. |
Universal filter gotchas
- CDC soft-deletes: always exclude soft-deleted rows on the CRIO replica.
- Cancelled appointments: always exclude the deleted-status sentinel (it means DELETED, not cancelled — otherwise it adds dozens of phantom rows).
- Publisher / template studies: dropped at source by a study-fetch helper.
- Sponsor coordinators: a normalization helper silently drops a known set of sponsor-side coordinator names.
- Timezone: always use Eastern-explicit current-date SQL, never the bare (UTC) form; enforced by a lint rule.
Revenue & receivables platform (4 layers)
The finance system is structured as the finance lead's 4-layer revenue & receivables platform: catch bad data before it costs money, validate that booked revenue is real, attribute cost to study, then collect what's owed. The dollar-journey FMEA sits underneath all four as the reconciliation spine.
| Layer | Concern | Status |
|---|---|---|
| L1 · Data health / invoice integrity | Cross-system audits before money is computed — invoice integrity, CRIO time-drift, corrupted-phone, lead-dedup drift. (A cross-site duplicate-enrollment audit is in review, not yet shipped.) | live |
| L2 · Revenue Validation Engine | Is booked revenue real and correctly based? The basis is study-dependent (base-fee is more common; overhead-inclusion varies by contract). Catches a class of false positives via per-visit exact matching. | design merged · analysis done, build pending |
| L3 · Per-study margin / cost ops | Revenue − allocated DIRECT cost, per study / PI / site (the margin engine, below). | live |
| L4 · Collections Automation Engine | A dunning ladder over sponsor AR, with a reconciliation gate (genuinely-owed AR across the open invoices) before any chase. A given sponsor's past-due AR can span multiple studies, billed via the sponsor's billing vendor. | built + tested (not yet merged) · SEND HARD-GATED OFF |
L4 is intentionally dark: outbound send stays disabled until the finance lead signs off and the sponsor reconciliation closes. Sponsor AR lives in the CRIO revenue schema — the accounting system's A/R (a small number of customers) is NOT sponsor AR; the collections system of record is CRIO + bank reconciliation, never the accounting system.
Margin engine (L3)
Two metrics per study: contribution margin (revenue − allocated DIRECT, overhead excluded) and fully-loaded margin (− a per-study overhead share on a visit-share basis). DIRECT cost lands via a 5-tier allocation cascade:
- PI fees by eSource attribution (named-PI vendors)
- A configurable vendor-to-study allocation map (or flagged as overhead)
- By study
- Activity-driven (allocate by active subjects)
- Uniform-per-visit fallback
Aggregate totals are summed FROM the per-row fields (rows == totals by construction); an unallocated-residual figure surfaces DIRECT dollars that couldn't be landed. PI-level + site-level profitability rollups. Operating expenses drop non-operating accounts; fully-loaded margin loads corporate overhead. Drives the per-study margin section on the finance dashboard. Doctrine: the GL class is a starting scope, not truth; reconcile totals from rows.
Dollar-journey FMEA spine
An end-to-end reconciliation effort (codename origin 2026-04-27) modeled as a 53-mode FMEA. Phases: CRIO revenue extraction (the revenue-data-point chain) · pass-through actuals keyed off the invoice-line name · sponsor cadence + invoice aging · a bank-statement variance watcher · a daily digest over the dollar-journey view + materialized view.
The CRIO mark-paid writer (see CRIO writer) is the L4↔CRIO bridge — it closes invoices in CRIO once the deposit is reconciled, fixing the status-flip gap that left paid invoices reading "unpaid."
Doctrine: the bank deposit is the ONLY trusted financial event. CRIO has no financial-write API beyond the browser writer; observe, never push speculative state. The real-dollar axis is truth; the CRIO axis is orthogonal hygiene — never sum the two.
Reconciliation jobs: weekly stipend reconciliation · the bank-variance watcher · a bank-statement freshness check · bank-statement + remittance polling · and a weekly sponsor chase.
Portal scraper (sponsor portal observations)
A headless Cloud Run scraper that observes a sponsor portal for protocol updates, query responses, and document expirations. It writes additively to a scraper-runs audit table via an open/close merge chokepoint.
Cost-telemetry chokepoint
Every warehouse read in the scraper's query layer routes through one accumulator that also reads each query's bytes-processed metric. The per-run sum is reset at run start and flushed at run end into the audit row's cost column. This mirrors the cloud-function-side read engine's cost sink so cost telemetry lands end-to-end across both services. Without it, the column is always null and any future cost-regression alert has nothing to work with.
Audit-table contract
- Scraper-runs rows carry an open/closed state — run start merges an open row; run end closes it.
- A stale-run alert fires through the alert router when a run stays open past its expected window.
- A watcher endpoint + a Resources-tab health card are open follow-ups (tracked in the scraper runbook).
Auth: the portal vendor's bookmark-redirect URLs were unmasked in a one-shot step (2026-04-13). A cookie-based session is held in a secret. It is re-run ad hoc when bookmarks change — intentionally NOT on a schedule.
Quick reference index
- RBAC tab matrix — 8 roles × 10 tabs
- Frontend modules — 26 source modules + design system
- Scheduled automation — ~91 jobs (85 enabled), grouped by capability
- Agent system — 6 active agents + phase roadmap
- AI voice telephony — Telnyx-native · enrollment/Lp(a) pre-screen · caps + consent + audits
- CRIO writer — browser-automation write-back · payments/invoices/mark-paid
- Book & verify saga — 4-step sequence + 8-stack hardening + capacity/cross-site guards
- eSource QC & screen-fail intelligence — 15-module QC engine + classifier + rematch loop
- Security, compliance & PHI perimeter — auto-suspend + PHI contract + tracing fail-closed
- Provider portal — direct ordering + live rheum availability
- Build & deploy pipeline — 11 workflows + pre-commit + change-controlled files
- Lint rules & test backstops — 56 rules + 1,559 tests + 14 smoke canaries
- BigQuery join model — primary keys + universal filter gotchas
- Revenue & receivables platform — 4 layers · margin engine · dollar-journey FMEA
- Portal scraper — sponsor-portal observer · audit table · cost-telemetry chokepoint