Most mental health apps wait until you're already in crisis before they respond. We think that's a design failure — not just a technical gap. Here's what we built instead.
I. The Problem with Waiting
There is a design pattern common across mental health technology that I want to name directly, because naming it is the first step toward changing it:
Most apps only act when you type the words.
Someone has to write "I want to hurt myself" before the system notices. Someone has to explicitly express despair before a resource surfaces. The keyword gets detected. The response triggers. The crisis line number appears.
This approach has a name in safety engineering: reactive detection. And while it's better than nothing, it's structurally backwards. It treats the crisis as the signal — when, in reality, the crisis is the outcome. The signal was everything that came before it.
Think about what a smoke alarm does. It detects smoke — the byproduct of a fire already in progress. That's a reactive detector. Now think about what a predictive fire system does: it monitors temperature gradients, detects accelerant vapor concentrations, flags unusual heat signatures — before a single flame appears. The intervention happens at the cause, not the consequence.
A crisis keyword detector is a smoke alarm. We wanted to build the temperature sensor.
ArcMirror is a self-reflection tool, not a clinical service. We are not therapists. We are not a crisis line. We say that clearly — in onboarding, in our user agreement, in every session. But that clarity of scope doesn't reduce our responsibility to build the safest possible container for the deeply personal work our users do inside it.
We built our predictive safety architecture before we had a single incident that demanded it. Not because we were required to. Because it was the right foundation.
Here's what we built, how it works, and why we made the decisions we made.
II. The Four Tiers of Protection
ArcMirror's safety system operates on four distinct layers. They're not redundant — each tier does something the others cannot. Together they form a system that can catch what no single approach catches alone.
Tier 1: Real-Time Crisis Pattern Detection

74 crisis patterns scanned across every message — both user input and AI output — on every token, every session. Crossing a severity threshold triggers immediate interventions, including resource surfacing and session modification.
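For the technically inclined, here's a minimal sketch of what a severity-ranked pattern scan can look like. The pattern entries, names, and severity values are illustrative assumptions, not our production list:

```typescript
type Severity = "low" | "medium" | "high";

interface CrisisPattern {
  regex: RegExp;
  severity: Severity;
}

// Illustrative stand-ins; the production list has 74 patterns.
const CRISIS_PATTERNS: CrisisPattern[] = [
  { regex: /\bhurt myself\b/i, severity: "high" },
  { regex: /\bno reason to go on\b/i, severity: "medium" },
];

// Returns the worst severity matched, or null if nothing matches.
function scanMessage(text: string): Severity | null {
  const rank: Record<Severity, number> = { low: 1, medium: 2, high: 3 };
  let worst: Severity | null = null;
  for (const p of CRISIS_PATTERNS) {
    if (p.regex.test(text) && (worst === null || rank[p.severity] > rank[worst])) {
      worst = p.severity;
    }
  }
  return worst;
}
```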
Tier 2: Active Intervention

Voice auto-interruption on high-severity detections — a 5-second forced pause before the session can continue. AI response filtering that prevents the companion from delivering potentially harmful content, even if generation has already started. The system scans AI output, not just user input.
Tier 3: Predictive Behavioral Scoring

A composite score built from 9 behavioral signals tracked across 30 days. No single signal triggers concern — it's the pattern across signals over time. A user who journals late at night while gravitating toward shadow themes and showing declining mood trends gets flagged long before any crisis keyword appears.
Tier 4: Absence Detection and Proactive Outreach

Users who go silent after a concerning session don't just disappear from view. The system flags their absence and queues a gentle proactive check-in. Silence after distress is itself a signal — and most existing approaches have no mechanism to respond to it.
III. The 9 Signals
The predictive tier is where the real architectural work lives. Rather than reacting to what a user says, it tracks the shape of their engagement over time. Nine distinct signals feed a composite risk score. Here's each one in plain language:
Signal 1: Crisis Escalation Pattern

Tracks the trajectory of risk-level language across sessions — not just whether it appears, but whether it's moving from low to medium to high over time. Escalation velocity matters as much as the current state.

Signal 2: Mood Trajectory

Tracks the direction of emotional sentiment across sessions. A single sad session isn't a signal. Three weeks of slow, consistent mood decline — especially without recovery days — is a very different pattern.

Signal 3: Usage Frequency Change

Sudden spikes in session frequency (desperate searching) and sudden drops (withdrawal) are both signals. Neither is inherently alarming alone. Both become meaningful in the context of the other eight signals.

Signal 4: Journal Sentiment

The emotional valence of written journal entries over time. Journaling tends to be more raw and less filtered than conversation. Declining sentiment in journals often precedes vocal or conversational expression of distress.

Signal 5: Late-Night Usage

3 AM usage is not inherently concerning. A shift from normal usage patterns to heavy late-night usage — especially combined with other signals — correlates with sleep disruption and rumination, which are known precursors to escalating distress.

Signal 6: Shadow Gravitation

Each archetype has a shadow expression — the dark, unintegrated side of its psychological pattern. A user who exclusively engages with shadow content across multiple archetypes, avoiding light or integration themes, is showing a pattern that deserves attention.

Signal 7: Session Duration Anomaly

Both very long and very short session durations relative to a user's baseline can be signals. The former may indicate obsessive rumination. The latter may indicate avoidance — opening the app and immediately leaving, unable to engage.

Signal 8: Engagement Decline

A user who was highly engaged — exploring multiple archetypes, responding thoughtfully — and then progressively disengages is showing a behavioral pattern consistent with withdrawal. The content of what they say matters less than the fact that they're saying less.

Signal 9: Post-Distress Absence

This is the signal most existing approaches completely miss. A user has a session flagged for concerning content — and then goes silent. No sessions for 5+ days. The silence after distress is often the most dangerous period. We track it explicitly.
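Signal 9 is also the simplest to express in code. Here's a minimal sketch of the detection half, under assumed field names (the production schema differs):

```typescript
// Illustrative field names; not the production schema.
interface UserActivity {
  lastSessionAt: Date;         // most recent session timestamp
  lastSessionFlagged: boolean; // did the safety layer flag that session?
}

const ABSENCE_THRESHOLD_DAYS = 5;

// True when a flagged session is followed by 5+ days of silence.
function postDistressAbsence(activity: UserActivity, now: Date): boolean {
  const msPerDay = 86_400_000;
  const daysSilent = (now.getTime() - activity.lastSessionAt.getTime()) / msPerDay;
  return activity.lastSessionFlagged && daysSilent >= ABSENCE_THRESHOLD_DAYS;
}
```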
IV. What Makes This Architecture Different
The architectural decision most mental health apps make — for entirely understandable reasons — is to invest heavily in Tier 1 and call it done. Build a comprehensive keyword list. Route to resources when keywords appear. Document the system. Ship it.
That gets you to "responsible." It doesn't get you to "predictive."
Here is what is structurally different about our approach:
We Scan Our Own Output
One of the design decisions I'm most proud of — and that I see overlooked almost everywhere — is that our safety system scans both sides of the conversation. Not just what users say. What our AI says back to them.
If one of our archetype companions generates a response that contains, even accidentally, language that could be harmful to a vulnerable user — the safety layer catches it before delivery. The response gets filtered or replaced with a safe variant.
This matters enormously. An AI companion talking to someone in distress is not a passive tool — it is an active participant in their psychological state. If we're going to build something that sits with people in difficult moments, we have an obligation to ensure that our AI cannot amplify that difficulty, even inadvertently.
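Here's a minimal sketch of that delivery gate, with `scanMessage` standing in for the Tier 1 scanner (declared here so the sketch is self-contained; the production filtering logic is richer than a single severity check):

```typescript
// The Tier 1 scanner sketched in Section II, declared for self-containment.
declare function scanMessage(text: string): "low" | "medium" | "high" | null;

// Run the safety scan on the draft response before it leaves the server.
async function deliverResponse(
  generate: () => Promise<string>,
  safeFallback: string
): Promise<string> {
  const draft = await generate();
  if (scanMessage(draft) === "high") {
    return safeFallback; // flagged content is never delivered
  }
  return draft;
}
```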
30 Days of Context, Not 30 Seconds
The composite scoring system looks back 30 days. Not at the current session. At the shape of how a user has been engaging over the past month.
This is the difference between reading a single page and reading an entire chapter. A single concerning session might be a hard day. A consistent pattern across 30 days of behavioral data is a different kind of signal entirely.
The scoring weights recent data more heavily — a concerning session last week matters more than one three weeks ago — but the full 30-day window allows the system to detect the kind of slow drift in which no single session looks alarming in isolation.
Weighted Composite Scoring
No single signal triggers a risk tier change. The system uses weighted composite scoring where each of the 9 signals contributes to a total score, with weights calibrated based on correlation with known risk factors. Crisis escalation pattern and post-distress disappearance carry higher weights. Session duration anomalies and late-night usage carry lower weights. The composite allows the system to respond to patterns rather than isolated data points.
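In code, the composite is a straightforward weighted sum. This sketch uses the weight table from Section VI and assumes each signal has already been normalized to a 0-to-1 value; the aggregation shape is our illustration, not the exact production code:

```typescript
// Weights taken from the table in Section VI.
const WEIGHTS = {
  crisisEscalation: 0.25,
  moodTrajectory: 0.2,
  postDistressAbsence: 0.18,
  shadowGravitation: 0.12,
  engagementDecline: 0.1,
  journalSentiment: 0.07,
  frequencyChange: 0.04,
  lateNightUsage: 0.02,
  durationAnomaly: 0.02,
} as const;

type SignalName = keyof typeof WEIGHTS;
type Signals = Record<SignalName, number>; // each value normalized to [0, 1]

function compositeScore(signals: Signals): number {
  return (Object.keys(WEIGHTS) as SignalName[]).reduce(
    (sum, name) => sum + signals[name] * WEIGHTS[name],
    0
  );
}
```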
Voice Gets a Hard Interrupt
Voice sessions introduce a challenge that text journaling doesn't face: real-time audio is harder to intercept cleanly. You can't just "not show" a response — the audio might already be playing.
Our solution: on high-severity detection during a voice session, the system triggers a mandatory 5-second pause. The companion stops. A gentle interrupt plays. The user has to actively choose to continue. This is not a soft suggestion. It is a hard interrupt built into the voice pipeline itself.
Why 5 seconds? It's enough time to break a runaway spiral without being disruptive enough to feel punitive. It creates a moment — a breath — where someone who is deteriorating has an opportunity to step back before continuing.
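For those curious what a pipeline-level interrupt looks like, here's a sketch. The `AudioPipeline` interface and its method names are illustrative assumptions, not our production API:

```typescript
// Illustrative interface; the production pipeline API differs.
interface AudioPipeline {
  stopPlayback(): void;                // cut audio immediately
  playInterruptCue(): Promise<void>;   // gentle interrupt sound
  awaitUserContinue(): Promise<void>;  // resolves only on explicit user action
}

const INTERRUPT_MS = 5_000;

async function hardInterrupt(pipeline: AudioPipeline): Promise<void> {
  pipeline.stopPlayback();
  await pipeline.playInterruptCue();
  // Mandatory 5-second pause, enforced at the pipeline layer.
  await new Promise<void>((resolve) => setTimeout(resolve, INTERRUPT_MS));
  await pipeline.awaitUserContinue();
}
```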
V. The Ethics of This System
Building a predictive behavioral analysis system for emotional wellness raises legitimate ethical questions. I want to address them directly.
We Are Not Making Diagnoses
Risk scores in ArcMirror do not diagnose anything. They do not classify a user as suicidal, depressed, or mentally ill. They inform app behavior — how the companion responds, which features remain accessible, how aggressively resources are surfaced. Risk scores are operational inputs, not clinical assessments.
This distinction is not merely semantic. It has real design consequences. A clinician seeing a risk score from ArcMirror is seeing a behavioral trend — "this user's engagement patterns over the last 30 days suggest declining wellbeing" — not a psychiatric label.
Zero-PII Architecture
The risk scoring system operates entirely without storing identifiable message content. Behavioral signals are computed from metadata — timestamps, session lengths, engagement patterns, sentiment scores — not from the actual text of what a user wrote.
Risk scores never contain message content. A score of 0.73 in a given week tells the system the user needs a gentler companion — it does not store the words that generated that score. This was a deliberate architectural decision made at the beginning, not a retrofit.
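The compute-and-discard pattern is simple enough to show directly. In this sketch, `analyzeSentiment` stands in for whatever model computes valence; the point is that the entry text never outlives the function call:

```typescript
// Stand-in for the sentiment model; assumed output range -1..1.
declare function analyzeSentiment(text: string): number;

interface JournalSignal {
  userId: string;
  timestamp: Date;
  sentiment: number; // derived metadata only
}

function recordJournalSignal(userId: string, entryText: string): JournalSignal {
  const sentiment = analyzeSentiment(entryText);
  // entryText is never written to the scoring store; only the
  // derived number survives past this function.
  return { userId, timestamp: new Date(), sentiment };
}
```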
Clinician Access is Consent-Gated
For our B2B integrations — where a therapist might prescribe ArcMirror as a homework tool — clinicians can see risk trend timelines for their patients. The key phrase: "their patients." Clinician access to any user's data requires:
- Explicit user opt-in during setup
- Clear disclosure of what the clinician can see
- User ability to revoke access at any time
- Clinicians see trends only — never message content, ever
We are a reflection tool. The insights belong to the user first. Any extension of those insights to a clinical context happens only through explicit user consent.
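For builders, the access gate reduces to a simple invariant: no unrevoked grant, no data, and only trend points ever cross the boundary. A sketch, with assumed field names:

```typescript
// Field names are assumptions; the invariants are the point.
interface ConsentGrant {
  patientId: string;
  clinicianId: string;
  revokedAt: Date | null; // revocation takes effect immediately
}

interface RiskTrendPoint {
  weekStart: string; // e.g. "2026-02-02"
  score: number;     // composite score only; never message content
}

function getRiskTrend(
  grant: ConsentGrant | undefined,
  trend: RiskTrendPoint[]
): RiskTrendPoint[] {
  if (!grant || grant.revokedAt !== null) {
    throw new Error("No active consent for this clinician/patient pair.");
  }
  return trend;
}
```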
988 Is Always One Tap Away
Regardless of risk tier, regardless of the session, the 988 Suicide and Crisis Lifeline is accessible from within the app at all times. Not in a settings menu. Not buried in a help article. One tap away, from any screen.
We didn't build this because it was required. We built it because if someone using our app is in genuine crisis, they deserve an immediate path to real human support — and it is not our place to be the last step in that path.
We Built This Before We Had To
We have not had a crisis incident on ArcMirror. We have not received a regulatory notice. No one forced our hand.
We built this because we knew, from the first day of design, that we were building something that would sit in intimate proximity to vulnerable people at vulnerable moments. That the Jungian shadow work we facilitate can, for some users, bring genuinely dark material to the surface. That an AI companion — however thoughtfully designed — carries the potential for harm if it engages with that dark material without a safety container around it.
The industry pattern is to build safety systems after an incident proves they were necessary. I think that pattern is backwards. You don't design a car without airbags and then add them after the first serious crash. You design the airbag as part of the original spec.
This system is our airbag.
VI. Technical Architecture for the Builders
For the engineers reading this, here's a more precise look at the scoring implementation:
```
// Composite Risk Score — simplified weight table
compositeScore = (
    crisisEscalation    × 0.25   // highest weight — explicit trajectory
  + moodTrajectory      × 0.20   // sustained decline most predictive
  + postDistressAbsence × 0.18   // silence after distress = critical
  + shadowGravitation   × 0.12
  + engagementDecline   × 0.10
  + journalSentiment    × 0.07
  + frequencyChange     × 0.04
  + lateNightUsage      × 0.02
  + durationAnomaly     × 0.02
)

// Risk Tier Thresholds
NORMAL     = score < 0.25
ELEVATED   = score < 0.50
CONCERNING = score < 0.75
CRITICAL   = score ≥ 0.75

// Scoring window: 30 days, exponential recency decay
// Sessions from past 7 days weighted ~3x vs sessions 21–30 days ago
// Tier escalation: exceeds threshold on 2 consecutive cycles
// Tier de-escalation: below threshold for 3 consecutive cycles
```
A few implementation details worth noting:
- Exponential recency weighting. Events from yesterday contribute roughly 3x as much to the composite as events from 28 days ago. This prevents a rough patch early in the window from unfairly weighing on a user who has genuinely stabilized.
- Tier change latency. Moving from Concerning to Critical requires exceeding the threshold on two consecutive scoring cycles, not just one. This prevents single-session spikes from triggering the most intensive interventions. Downward tier movement (de-escalation) requires three consecutive cycles below threshold. A sketch of this hysteresis follows this list.
- AI output scanning is synchronous. It runs before the response is delivered, not after. This is not a logging system that catches harm retrospectively — it is a filter that prevents delivery.
- Voice interrupt is pipeline-level. The 5-second pause is enforced at the audio pipeline layer, not the application layer. It cannot be bypassed by app-level crashes or slow network conditions.
- Zero message content in signal computation. Every signal is derived from behavioral metadata, not stored message text. Sentiment scores are computed and immediately discarded — the text is never persisted for scoring purposes.
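As promised above, here's a sketch of the tier-transition hysteresis: two consecutive cycles above threshold to escalate, three below to de-escalate. The state shape and naming are assumptions:

```typescript
type Tier = 0 | 1 | 2 | 3; // NORMAL, ELEVATED, CONCERNING, CRITICAL
const TIER_FLOORS = [0, 0.25, 0.5, 0.75];

function tierForScore(score: number): Tier {
  let tier: Tier = 0;
  TIER_FLOORS.forEach((floor, i) => {
    if (score >= floor) tier = i as Tier;
  });
  return tier;
}

interface TierState {
  current: Tier;
  cyclesAbove: number; // consecutive cycles scoring above current tier
  cyclesBelow: number; // consecutive cycles scoring below current tier
}

// One scoring cycle: escalate after 2 consecutive cycles above,
// de-escalate after 3 consecutive cycles below.
function step(state: TierState, score: number): TierState {
  const target = tierForScore(score);
  if (target > state.current) {
    const cyclesAbove = state.cyclesAbove + 1;
    return cyclesAbove >= 2
      ? { current: target, cyclesAbove: 0, cyclesBelow: 0 }
      : { ...state, cyclesAbove, cyclesBelow: 0 };
  }
  if (target < state.current) {
    const cyclesBelow = state.cyclesBelow + 1;
    return cyclesBelow >= 3
      ? { current: target, cyclesAbove: 0, cyclesBelow: 0 }
      : { ...state, cyclesAbove: 0, cyclesBelow };
  }
  return { current: state.current, cyclesAbove: 0, cyclesBelow: 0 };
}
```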
The architecture draws inspiration from predictive risk approaches developed in veteran healthcare programs, which discovered through hard experience that behavioral pattern analysis over time is dramatically more predictive than acute-moment screening alone. We adapted those principles for a consumer self-reflection context.
VII. What We Haven't Built Yet
This is the Build in Public series. That means honesty about what's done and what isn't.
The system described above is live on the web app: Tiers 1 through 3 are fully in production, and Tier 4's detection logic is running. But three capabilities remain in active development:
Proactive Check-In System
Tier 4 — proactive outreach for absent users — is designed but not yet fully deployed. The detection logic is running (Signal 9 is active). The outreach mechanism itself — a gentle, opt-in notification for users who consent to it — is in active development. We expect to ship it in the next release cycle.
This matters more to me personally than almost anything else we're building. The moment after someone discloses something vulnerable and then goes quiet is exactly when a system should reach out — not passively wait for them to return.
Clinician Risk Timeline Visualization
For B2B integrations, we're building a timeline view that lets consenting clinicians see a 30-day risk trend for their patients — visualized as a graph, not just a number. A clinician seeing a risk score move from 0.2 → 0.6 → 0.4 → 0.7 over four weeks has dramatically more information than one seeing only today's score of 0.7. This is currently in design phase; the data layer is built.
Longitudinal Pattern Detection
The current 30-day window is good. We believe 90-day and 12-month pattern detection will be dramatically better — able to catch seasonal patterns, anniversary reactions, and chronic low-level distress that stays below acute thresholds but represents meaningful long-term risk. The data is accumulating. The modeling work is on the roadmap.
VIII. Why We're Publishing This
Some founders would not publish this level of detail about their safety architecture. The concern: competitors could copy it. Adversarial users might learn to game it. It draws attention to the existence of failure modes.
I think that concern is wrong, and here's why:
The mental health technology space has a safety gap. Not because the founders are bad people — they're not. But because safety systems are expensive to build, hard to validate, and easy to delay in favor of features that drive acquisition metrics. The economic incentives push against safety investment.
If publishing our architecture in detail encourages even one other team to build something similar — or inspires them to go further — then any competitive disadvantage we incur from transparency is worth it. The goal is not to be the only company with a predictive safety system. The goal is for predictive safety systems to become the baseline expectation in this space.
We're not asking for credit. We're making an argument for a higher standard, and we're doing it by showing our work.
We built this before we had to. We're publishing it before we had to. We believe that's the only way to earn the trust that this kind of product demands.
If you're building in this space and want to compare notes on safety architecture, reach out: hello@arcmirror.app. The technical challenges are real, and this is a problem that benefits from more minds working on it.
And if you're a user of ArcMirror — thank you for trusting us with work that matters. We take that trust seriously. This post is the evidence that we mean it.
— Mackenwo Dorval, Founder of ArcMirror · March 2026