The Trillion-Dollar “Why”
Building Context Graphs for Accountable AI
A hands-on exploration of Foundation Capital’s thesis, with working code, cloud infrastructure, and hard-won lessons from implementation
“What happened?” is table stakes. The trillion-dollar opportunity is capturing why it happened and making that reasoning searchable, auditable, and learnable.
TL;DR
Introduction: The Convergence That Matters
On December 22, 2025, Foundation Capital published what may be the most important piece of enterprise AI thinking in years: “Context Graphs: AI’s Trillion-Dollar Opportunity”. Authors Jaya Gupta and Ashu Garg argue that the next trillion-dollar platforms won’t be built by adding AI to existing systems of record - they’ll be built by capturing something those systems never stored: the reasoning behind decisions.
The thesis is deceptively simple: our systems capture what happened, but not why it was allowed to happen. That missing “why” is the bottleneck for enterprise AI adoption. It’s why agents hallucinate, ignore internal policy, fail compliance audits, and keep rediscovering the same exceptions that humans resolved years ago.
Shortly after, HubSpot CTO Dharmesh Shah weighed in with a characteristically pragmatic take: the idea is “elegant, intellectually compelling,” but most companies are still struggling with basic data unification. “Asking companies to capture decision traces when they are still bringing their data efforts in order,” Shah wrote, “is sort of like asking someone to install a three-car garage when they don’t own a single car.”
Both perspectives are correct. Foundation Capital is right about where enterprise software is heading. Shah is right about where most companies actually are.
This code tries to bridge that gap. It’s not a production system - it’s a laboratory for exploring what context graphs actually look like in code. Inspired by Shah’s recent LinkedIn posts on how relentlessly he codes these days, I spent a weekend building a working implementation on Aiven’s managed infrastructure stack (Postgres, OpenSearch, Redis, Kafka, ClickHouse) to see how guardrails, decision traces, precedent search, and observability actually stitch together. I learn by doing - over the past year I’ve immersed myself in coding and vibe-coding assignments - and this is my attempt to study Foundation Capital’s approach and to understand the challenges of the bolt-on path open to incumbents.
Consider this your playground for the trillion-dollar opportunity’s terrain.
Why Context Graphs Matter (The Theory)
The Problem No System Solves Today
Every enterprise runs on decisions. A support lead decides whether to escalate a ticket. A sales manager approves a discount outside policy. A procurement officer grants an exception to a vendor requirement. A claims adjuster approves a payment that technically shouldn’t qualify.
These decisions happen constantly. They’re the real work of organizations - the judgment calls that transform data into action.
But where do they live?
Look at any enterprise system - Salesforce, Workday, SAP, ServiceNow - and you’ll find meticulous records of outcomes. The deal closed at a 40% discount. The ticket was escalated. The claim was paid. The system logs what happened.
What it doesn’t capture:
Why the 40% discount was approved despite a 10% policy cap
Which precedent justified the escalation
What context the adjuster saw when they approved the exception
Who approved it, under what authority, at what point in time
That reasoning lives in Slack threads, Zoom calls, email chains, and the ephemeral memory of employees. When the employee leaves, the institutional knowledge walks out the door.
As Foundation Capital puts it:
“When a discount gets approved, the context that justified it isn’t preserved. You can’t replay the state of the world at decision time, which means you can’t audit the decision, learn from it, or use it as precedent.”
My critique: this doesn’t seem quite as revolutionary as the article makes it sound. Well-structured decision logs with supporting context have been standard practice at many enterprises for a while. Yes, people often just link to Slack threads and internal wikis, and yes, sometimes they supply no context at all - but I’d argue the context gets captured more often than not. (A proofreader’s comment.)
Why This Matters for AI Agents
The missing “why” was always a problem, but it was a manageable problem when humans made all the decisions. Humans could ask colleagues, dig through email, or simply exercise judgment based on experience.
AI agents have none of these options.
When you deploy an agent to handle support tickets, it inherits access to the CRM, the helpdesk, the billing system - all the records of past outcomes. But it doesn’t see the reasoning that produced those outcomes. It sees that Customer X got a full refund last month, but not why the refund was approved when policy said otherwise.
So the agent faces the same edge case today and has two choices:
Follow the rules rigidly (and upset the customer who got an exception last time)
Make a judgment call with no institutional memory to guide it
Both options are bad. The first creates inconsistency. The second creates unaccountable decisions that nobody can audit, explain, or learn from.
Foundation Capital identifies this as the wall that governance alone can’t solve:
“Agents run into the same ambiguity humans resolve every day with judgment and organizational memory. The wall isn’t missing data. It’s missing decision traces.”
What Is a Context Graph?
Foundation Capital defines it precisely:
“We call the accumulated structure formed by those traces a context graph: not ‘the model’s chain-of-thought,’ but a living record of decision traces stitched across entities and time so precedent becomes searchable.”
Let’s unpack that definition:
“Decision traces” — Not just what was decided, but the full context at decision time: inputs, policies evaluated, precedents consulted, exceptions granted, approvers involved, and reasoning articulated.
“Stitched across entities” — A single decision often touches multiple systems. An escalation decision might involve customer tier from the CRM, SLA terms from billing, recent outages from PagerDuty, and a Slack thread flagging churn risk. The context graph connects these into a coherent picture.
“Stitched across time” — Decisions create precedent. The context graph makes that precedent queryable so future decisions can ask: “Has this situation happened before? What did we do?”
“Precedent becomes searchable” — This is the key insight. The context graph isn’t a passive audit log - it’s an active resource that agents (and humans) can query during decision-making.
Over time, the context graph becomes the real source of truth for autonomy. It explains not just what happened, but why it was allowed to happen. And it compounds: every exception becomes training data, every override becomes a case study, every decision trace makes future decisions faster and more consistent.
The Structural Advantage of New Systems
Here’s where Foundation Capital’s thesis gets controversial: they argue that startups building “systems of agents” have a structural advantage over incumbents.
Why? Because capturing decision traces requires being in the execution path at commit time, not bolting on governance after the fact.
Consider the difference:
Incumbent SaaS: Salesforce sees the deal close. It records the final state. But the negotiation happened in Zoom, the approval happened in Slack, and the exception logic lives in someone’s head.
Data warehouses: Snowflake and Databricks receive data via ETL after decisions are made. By then, the decision context is gone. They’re in the read path, not the write path.
Agent-native systems: A system built around agentic execution sees the full context at decision time. It can capture the reasoning at the moment it’s articulated, not reconstruct it later from fragments.
“Incumbents can make extraction harder, but they can’t insert themselves into an orchestration layer they were never part of.”
This doesn’t mean incumbents are doomed - Shah is right that they can layer context graphs on top of existing systems. But it does mean the architectural choices made now will determine who owns the decision layer later. It was with this intent that I wrote this code: to explore whether a bolt-on context graph can work alongside everyday systems.
Anatomy of a Context Graph (The Architecture)
Before diving into code, let’s understand what a context graph actually looks like architecturally.
The Five Essential Layers
Let’s examine each layer.
Layer 1: Entity Graph
The entity graph provides a unified view of business entities across systems. This is table stakes for any data integration - but context graphs require a specific capability: identity resolution.
The problem: Sarah Chen appears as sarah.chen@acme.com in the CRM, +1-555-0142 in the phone system, cust_3847 in Shopify, and user_91823 in Zendesk. These are all the same person, but no single system knows that.
Identity resolution connects these fragments into a unified entity. When an agent needs to make a decision about Sarah, it sees all her interactions across systems - not just the slice visible to whichever system she’s currently in.
-- Unified customer entity with identity resolution
CREATE TABLE customers (
id UUID PRIMARY KEY,
email TEXT,
phone TEXT,
name TEXT,
tier TEXT DEFAULT 'standard',
ltv REAL DEFAULT 0,
-- External system IDs (identity resolution)
shopify_id TEXT,
zendesk_id TEXT,
stripe_id TEXT,
-- Flexible attributes
attributes JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Indexes for identity resolution
CREATE INDEX idx_customers_email ON customers(email);
CREATE INDEX idx_customers_phone ON customers(phone);
CREATE INDEX idx_customers_shopify ON customers(shopify_id);
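To make identity resolution concrete, here is a hypothetical helper that builds a parameterized lookup matching a customer on any known identifier. The column names come from the schema above; the function itself is my own sketch, not part of the demo code:

```python
def build_identity_lookup(identifiers: dict) -> tuple[str, list]:
    """Build a parameterized query matching a customer on any known ID.

    identifiers: e.g. {"email": "sarah.chen@acme.com", "shopify_id": "cust_3847"}
    Column names follow the `customers` schema above.
    """
    allowed = {"email", "phone", "shopify_id", "zendesk_id", "stripe_id"}
    clauses, params = [], []
    for column, value in identifiers.items():
        if column in allowed and value:
            clauses.append(f"{column} = %s")  # columns are whitelisted, values parameterized
            params.append(value)
    if not clauses:
        raise ValueError("no usable identifiers supplied")
    sql = "SELECT id, name, tier FROM customers WHERE " + " OR ".join(clauses)
    return sql, params
```

Pass the resulting SQL and params to any DB-API cursor; if the query returns more than one row, you have a merge candidate rather than a resolved identity.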
Layer 2: Decision Traces
This is the core innovation of context graphs. Every decision captures:
What was decided: Action taken, outcome, amount involved
Why it was decided: Reasoning text, confidence score, rules applied, precedents consulted
What the agent saw: Full context snapshot at decision time (customer profile, order history, ticket state)
Who decided: Agent version, model used, human approver if applicable
Versioning: Which prompt version, which policy version was in effect
-- Decision traces: THE MOST IMPORTANT TABLE
CREATE TABLE decisions (
id UUID PRIMARY KEY,
-- What entity was this about?
entity_type TEXT, -- 'refund', 'escalation', 'discount'
entity_id UUID,
customer_id UUID REFERENCES customers(id),
-- What was decided?
action TEXT, -- 'approve_refund', 'deny_refund', 'escalate'
outcome TEXT,
amount REAL,
-- WHY (the gold)
reasoning TEXT,
confidence REAL,
rules_applied JSONB, -- ["shipping_delay_policy", "vip_exception"]
precedents_used JSONB, -- ["dec_abc123", "dec_def456"]
-- Context snapshot at decision time
inputs_snapshot JSONB, -- Full state the agent saw
-- Who decided
decided_by TEXT, -- 'agent:refund-v1' or 'human:jane@acme.com'
approved_by TEXT,
-- Versioning
agent_version TEXT,
prompt_version TEXT,
model_used TEXT,
-- Status
status TEXT DEFAULT 'pending', -- pending, approved, rejected, executed
exception_type TEXT, -- If policy exception
-- Timestamps
created_at TIMESTAMPTZ DEFAULT NOW(),
approved_at TIMESTAMPTZ,
executed_at TIMESTAMPTZ
);
The inputs_snapshot field deserves special attention. This captures exactly what the agent saw at decision time - customer tier, order history, recent tickets, even market conditions if relevant. This enables:
Audit: “What information led to this decision?”
Replay: “Given the same inputs, would the current agent make the same call?”
Debugging: “Why did the agent miss this important context?”
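Because the snapshot captures the full input state, replay can be sketched in a few lines. Assumptions (mine, not the demo’s): decision rows are plain dicts shaped like the decisions table, and agent_fn is whatever callable maps a snapshot to a proposed action:

```python
def replay_decision(decision: dict, agent_fn) -> dict:
    """Re-run the current agent on the exact inputs the original agent saw."""
    snapshot = decision["inputs_snapshot"]
    proposed = agent_fn(snapshot)
    return {
        "decision_id": decision["id"],
        "original_action": decision["action"],
        "replayed_action": proposed,
        "drifted": proposed != decision["action"],  # flags behavior changes across versions
    }
```

Run this over a sample of historical traces after every prompt or model change, and the "drifted" flags become a regression test for agent behavior.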
Layer 3: Precedent Search
Decision traces become useful when you can search them. The pattern:
Agent encounters a new situation
Agent queries: “Have we seen similar cases before?”
Precedent search returns decisions with similar reasoning/context
Agent uses precedent to inform its decision
New decision becomes precedent for future agents
This requires semantic search - matching on meaning, not just keywords. A query like “customer upset about late delivery” should match precedents about “shipping delays” and “frustrated customers” even if those exact words weren’t used.
def search_similar_decisions(
self,
query: str,
n_results: int = 5,
entity_type: Optional[str] = None,
outcome: Optional[str] = None,
min_confidence: Optional[float] = None
) -> List[Dict]:
"""
Semantic search for similar past decisions.
Args:
query: Natural language description of current case
n_results: Number of results to return
entity_type: Filter by type (e.g., 'refund', 'escalation')
outcome: Filter by outcome (e.g., 'approved', 'denied')
min_confidence: Filter by minimum confidence score
Returns:
List of similar decisions with similarity scores
"""
# Build metadata filters
where_filter = None
conditions = []
if entity_type:
conditions.append({"entity_type": entity_type})
if outcome:
conditions.append({"outcome": outcome})
if min_confidence is not None:
conditions.append({"confidence": {"$gte": min_confidence}})
if len(conditions) == 1:
where_filter = conditions[0]
elif len(conditions) > 1:
where_filter = {"$and": conditions}
# Query vector store
results = self.decisions_collection.query(
query_texts=[query],
n_results=n_results,
where=where_filter
)
return self._format_results(results)
Layer 4: Guardrails & Human-in-the-Loop
Not all decisions should be automated. Guardrails define the boundaries:
Hard blocks: Some actions are never allowed (delete customer data, bypass security)
Approval thresholds: Amounts above $X require human sign-off
Confidence gates: Low-confidence decisions route to humans
VIP handling: High-value customers get extra scrutiny
class ActionGuardrails:
"""Validate actions before execution"""
@classmethod
def check_action(
cls,
action_name: str,
action_params: Dict,
context: Dict
) -> GuardrailCheck:
"""
Validate an action before execution.
Returns: GuardrailCheck with result (ALLOW, BLOCK, REQUIRE_APPROVAL)
"""
# Hard blocks
if action_name in BLOCKED_ACTIONS:
return GuardrailCheck(
result=GuardrailResult.BLOCK,
reason=f"Action '{action_name}' is not permitted"
)
# Amount thresholds
if action_name == "process_refund":
amount = action_params.get("amount", 0)
if amount > MAX_REFUND_AMOUNT:
return GuardrailCheck(
result=GuardrailResult.BLOCK,
reason=f"Refund ${amount} exceeds maximum ${MAX_REFUND_AMOUNT}"
)
if amount > AUTO_APPROVE_THRESHOLD:
return GuardrailCheck(
result=GuardrailResult.REQUIRE_APPROVAL,
reason=f"Refund ${amount} exceeds auto-approve threshold",
approval_required_from="manager"
)
# Confidence thresholds
confidence = context.get("confidence", 1.0)
if confidence < CONFIDENCE_THRESHOLD_FOR_HITL:
return GuardrailCheck(
result=GuardrailResult.REQUIRE_APPROVAL,
reason=f"Confidence {confidence:.2f} below threshold"
)
return GuardrailCheck(
result=GuardrailResult.ALLOW,
reason="Action passed all checks"
)
Layer 5: Observability
Every decision generates telemetry:
Audit logs: Immutable record of what happened (Kafka). Note that Kafka here is the event/message queue, not the store itself.
Metrics: Latency, confidence distributions, approval rates (ClickHouse)
Alerts: SLA violations, unusual patterns, compliance triggers
This isn’t just about debugging - it’s about organizational learning. Dashboards surface:
Which types of decisions have the lowest confidence?
Where do humans override agents most frequently?
Which precedents get cited most often?
How has decision quality changed over time?
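As a small illustration of the kind of aggregation these dashboards run, here is a pure-Python sketch of an override-rate metric. In the actual stack this would be a ClickHouse query; the field names here are assumptions:

```python
from collections import defaultdict

def override_rate_by_type(decisions: list[dict]) -> dict[str, float]:
    """Fraction of agent decisions overridden by a human, per entity type.

    Assumes each row carries `entity_type` and an optional boolean
    `overridden` flag (defaulting to False when absent).
    """
    totals, overrides = defaultdict(int), defaultdict(int)
    for d in decisions:
        totals[d["entity_type"]] += 1
        overrides[d["entity_type"]] += int(d.get("overridden", False))
    return {t: overrides[t] / totals[t] for t in totals}
```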
The Implementation (Aiven Stack)
Theory is nice. Code is better. Let’s look at how these layers map to real infrastructure.
Why Aiven?
Aiven provides managed versions of the open-source databases we need. This matters because:
No DIY ops: Spinning up Postgres, OpenSearch, Kafka, Redis, and ClickHouse yourself is work in itself. Aiven does it in minutes.
Production-grade: Connection pooling, backups, monitoring, security - all handled.
Focus on the demo: We’re here to explore context graphs, not debug Kafka broker configurations.
This is not a sponsored post, but my startup is part of the Aiven Startup Program, whose credits let us experiment with these platforms with ease.
Service Mapping
Context Graph Layer
Service: Aiven
Why this choice: Managed, reliable infrastructure for core data services
Entity Graph
Technology: Postgres
Why: Strong SQL, JSONB support, foreign keys, temporal queries
Decision Traces
Technology: Postgres
Why: Same store as entity graph, easy relational links to entities
Approval Queue
Technology: Redis (Valkey)
Why: Fast pub/sub, TTLs, sorted sets for SLA handling
Precedent Search
Technology: OpenSearch
Why: Full-text search + k-NN vector search out of the box
Audit Log
Technology: Kafka
Why: Immutable, ordered, durable event stream
Observability
Technology: ClickHouse
Why: Columnar storage, fast aggregations, analytics at scale
Configuration
Create these services in the Aiven console (or via CLI/Terraform), then populate your .env:
# Postgres - Core decision/trace store
POSTGRES_URL=postgres://avnadmin:password@host:port/defaultdb?sslmode=require
# OpenSearch - Precedent search (enable k-NN)
OPENSEARCH_URL=https://host:port
OPENSEARCH_USER=avnadmin
OPENSEARCH_PASSWORD=xxx
# Redis/Valkey - Approval queues
REDIS_URL=rediss://default:password@host:port
# Kafka - Audit log
KAFKA_BROKERS=host:port
KAFKA_SSL_CA=./kafka-creds/ca.pem
KAFKA_SSL_CERT=./kafka-creds/service.cert
KAFKA_SSL_KEY=./kafka-creds/service.key
# ClickHouse - Observability
CLICKHOUSE_HOST=host
CLICKHOUSE_PORT=port
CLICKHOUSE_DB=default
CLICKHOUSE_USER=avnadmin
CLICKHOUSE_PASSWORD=xxx
CLICKHOUSE_SECURE=true
For Kafka, download TLS certs via:
avn service user-kafka-java-creds --project your-project your-kafka-service -d ./kafka-creds
The Agent Flow (Code Walkthrough)
Let’s trace through exactly what happens when an agent processes a request.
Step 1: Input Guardrails
Before any processing, validate the input:
def process_request(self, customer_id: str, user_message: str, order_id: str = None) -> Dict:
"""Process a customer request with full context graph flow."""
print("1️⃣ Checking input guardrails...")
input_check = InputGuardrails.check_input(user_message)
if input_check.result == GuardrailResult.BLOCK:
return {"success": False, "error": input_check.reason}
if input_check.result == GuardrailResult.MODIFY:
user_message = input_check.modified_value
print(f" ⚠️ Input modified: {input_check.reason}")
Input guardrails catch:
Prompt injection attempts: “Ignore previous instructions...”
Sensitive data: Credit card numbers get redacted
DOS attempts: Extremely long inputs get truncated
Step 2: Load Context from Graph
Fetch the unified context - customer profile, orders, tickets, past decisions, and precedent:
print("2️⃣ Loading context from graph...")
context = self.context_graph.get_full_context(
customer_id=customer_id,
include_precedent=True,
current_situation=user_message
)
print(f" 👤 Customer: {context['customer']['name']} ({context['customer']['tier']})")
print(f" 📦 Orders: {len(context['orders'])}")
print(f" 🎫 Tickets: {len(context['tickets'])}")
print(f" 📜 Past decisions: {len(context['past_decisions'])}")
print(f" 🔍 Precedent found: {len(context['precedent'])}")
Step 3: Precedent Search
The precedent search is crucial. When the agent sees “customer complaining about late delivery,” it searches for similar past cases:
def find_precedent(current_situation: str, entity_type: str = None, n_results: int = 5) -> List[Dict]:
"""Find similar past decisions to use as precedent."""
store = get_vector_store()
return store.search_similar_decisions(
query=current_situation,
n_results=n_results,
entity_type=entity_type,
outcome="approved" # Learn from successful decisions
)
The agent’s prompt includes these precedents:
## Similar Cases (Precedent)
- [approved] Customer order was delayed 4 days. Full refund approved per shipping delay policy. (similarity: 0.94)
- [approved] Gold tier customer had 3-day delay. Applied VIP exception. Full refund plus 10% credit. (similarity: 0.87)
Step 4: Capture Decision Trace
This is the heart of the context graph - capture the full decision before execution:
print("5️⃣ Capturing decision trace...")
context_snapshot = self.context_graph.get_context_snapshot(
customer_id=customer_id,
current_situation=user_message
)
decision_trace = capture_decision(
entity_type=decision_data.get("entity_type", "support_request"),
entity_id=order_id or str(uuid.uuid4()),
customer_id=customer_id,
action=decision_data.get("action", "respond"),
reasoning=decision_data.get("reasoning", response.get("text", "")),
confidence=decision_data.get("confidence", 0.8),
inputs_snapshot=context_snapshot, # FULL STATE AT DECISION TIME
amount=decision_data.get("amount"),
rules_applied=decision_data.get("rules_applied", []),
precedents_used=decision_data.get("precedents_referenced", [])
)
print(f"   📝 Decision ID: {decision_trace.decision_id}")
The Learning Loop
The context graph isn’t static - it learns from every decision.
How Precedent Accumulates
Every decision gets indexed:
def index_decision(
decision_id: str,
entity_type: str,
action: str,
reasoning: str,
confidence: float,
outcome: str = None,
customer_tier: str = None,
amount: float = None
):
"""Index a decision for future precedent search."""
store = get_vector_store()
# Build searchable text
searchable_text = f"{action}: {reasoning}"
# Metadata for filtering
metadata = {
"entity_type": entity_type,
"action": action,
"confidence": confidence,
"outcome": outcome or "pending",
"customer_tier": customer_tier or "standard",
"amount": amount or 0
}
store.add_decision(decision_id, searchable_text, metadata)
The next agent to face a similar situation will find this decision in its precedent search.
The Compounding Effect
Foundation Capital calls this the compounding effect:
“The more workflows you mediate, the more traces you capture. The more traces you capture, the better you get at automating the next edge case.”
Each decision makes the system smarter:
Day 1: Agent has no precedent, relies on rules alone
Day 30: Agent has hundreds of decisions, starts finding relevant precedent
Day 365: Agent has thousands of decisions, can handle most edge cases by precedent
Demo Walkthrough
Let’s run through actual demo output:
Scenario 1: Gold Customer with Late Delivery
============================================================
🎬 SCENARIO 1: Gold Customer - Late Delivery Refund
============================================================
Customer: Sarah (Gold tier, $3500 LTV)
Order: $89.99, delivered 5 days late
Expected: Auto-approve (policy + precedent support)
1️⃣ Checking input guardrails...
✅ Input OK
2️⃣ Loading context from graph...
👤 Customer: Sarah Johnson (gold, LTV: $3500.0)
📦 Orders: 2
📜 Precedents found: 3
└─ [approved] Customer order was delayed by 4 days...
└─ [approved] Gold tier customer had 3-day shipping delay...
3️⃣ Agent reasoning...
🎯 Action: process_refund
💪 Confidence: 0.95
4️⃣ Checking action guardrails...
✅ Action approved
5️⃣ Capturing decision trace...
📝 Decision ID: dec_cdee5e3014e0
6️⃣ Executing action...
✅ Refund processed: ref_ca8334c0
Scenario 2: Large Refund Requires Approval
============================================================
🎬 SCENARIO 2: Large Order Refund
============================================================
Customer: Mike (Standard tier, $150 LTV)
Order: $599.99, no delay
3️⃣ Agent reasoning...
🎯 Action: evaluate_refund
💪 Confidence: 0.65
4️⃣ Checking action guardrails...
⏳ Requires approval: Confidence 0.65 below threshold
6️⃣ ⏳ Queued for human approval
Practical Guidance
For Companies Wanting to Explore Context Graphs
Map decision-rich workflows first. Start with refunds, claims, escalations - anywhere the “why” matters.
Store decisions like traces, not records. You need reasoning, approvals, confidence, and links to upstream entities.
Build precedent search. Index each decision for semantic retrieval.
Stream to observability. Surface confidence distributions, compliance hits, SLA violations.
Treat it as a learning loop. Each decision becomes precedent for the next agent.
For Incumbents
The context graph can layer on top of existing systems:
Add a decision trace table to your existing Postgres
Index decisions to OpenSearch for precedent search
Stream audit events to Kafka for compliance
The context graph augments existing systems rather than replacing them.
The HITL Paradox: Your Friction Is Your Moat
Here’s the counterintuitive truth that most AI companies get backwards:
Every time your agent fails and a human steps in, you’re not losing. You’re winning.
The override is the product. The correction is the data. The friction is the moat.
The Economics of Disagreement
Think about what a human override actually represents:
A £200/hour decision-maker just labelled a training example for you - for free
They didn’t just say “wrong” - they demonstrated “here’s what right looks like”
They provided the context that made the agent’s answer insufficient
They revealed a gap between documented policy and actual practice
Every competitor who optimises for “fewer human touchpoints” is optimising themselves out of the learning loop.
They’re celebrating metrics (automation rate! deflection rate!) while haemorrhaging the signal that would make their system actually intelligent.
The Approval Queue as a Knowledge Mine
That Redis queue holding pending decisions? It’s not a bottleneck. It’s a knowledge extraction pipeline.
Consider what flows through it:
What the agent proposed → What the human did → What you learned
Deny refund (policy says no) → Approved anyway → Policy has unstated exceptions
Approve £500 refund → Reduced to £200 → Threshold intuition exists
Escalate to manager → Resolved directly → Authority model is wrong
Standard response → Completely rewrote → Tone/context was misjudged
The delta between proposal and action is pure institutional knowledge - the stuff that lives in people’s heads and walks out the door when they leave.
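Capturing that delta is mechanical once both sides are structured. A hypothetical helper (field names assumed, not from the demo) that diffs the agent’s proposal against the human’s final decision:

```python
def capture_override_delta(proposed: dict, final: dict) -> dict:
    """Record what the human changed, field by field: the raw learning signal.

    `proposed` and `final` are flat dicts of decision fields (action, amount, ...).
    """
    delta = {
        field: {"agent": proposed.get(field), "human": final.get(field)}
        for field in set(proposed) | set(final)
        if proposed.get(field) != final.get(field)  # keep only the disagreements
    }
    return {"overridden": bool(delta), "delta": delta}
```

Emit one of these records per approval-queue resolution and the "knowledge mine" becomes a queryable dataset.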
Three Strategic Implications
1. The Override Ratio Inversion
What if you wanted a 30% override rate rather than 5%? Not because your agent is bad, but because you’re deliberately pushing it into uncertain territory to harvest corrections. The agent becomes a hypothesis generator, and humans become hypothesis validators. You’re not automating work - you’re mining expertise.
2. The Approval UX as Competitive Weapon
Most approval flows are designed to be tolerated. What if they were designed to be preferred?
Imagine: the human sees the agent’s reasoning, the precedents it considered, the confidence score, the similar cases. They’re not just approving - they’re teaching. Make that experience so good that experts want to use it because it makes them better at their job.
Suddenly your HITL isn’t a fallback - it’s the primary interface for institutional knowledge capture.
3. The Correction Marketplace
What if human corrections had explicit value?
“Your override of decision #4,521 improved agent accuracy by 0.3% across 12,000 subsequent decisions.”
The expert sees their impact. The organisation sees who holds tacit knowledge. The system learns who to route edge cases to. The approval queue becomes a reputation system for institutional expertise.
The Line That Should Keep Competitors Up at Night
“We trained our model on 100,000 human corrections in procurement decisions. Each one cost our customer nothing but made our next decision better. Our competitors trained on documentation. We trained on disagreement.”
The companies that make human oversight seamless won’t just have better compliance. They’ll have better AI. Because they understood that the override isn’t the failure case.
It’s the whole point.
What the Paper Doesn’t Tell You - The Make-or-Break Gaps
Foundation Capital wrote an investment thesis, not an implementation guide. The paper is right about what needs to exist. It’s silent on how it gets built - and that silence is where most companies will fail.
Here are the gaps that will separate winners from casualties:
Gap 1: The Cold Start Problem
The paper describes the compounding effect beautifully: more decisions → more traces → better precedent → smarter agents.
But they skip the brutal first chapter: Day 1, you have zero precedents.
Your agent faces an edge case. It searches for similar decisions. Nothing. It makes a guess. The guess might be wrong. The customer is upset. The human overrides. Now you have one data point.
The paper assumes you can bootstrap this from existing data. You can’t. The “why” was never captured. Your CRM has outcomes, not reasoning. Your Slack has reasoning, but it’s scattered, unstructured, and unsearchable.
The make-or-break question: How do you survive the 6-18 months where your context graph is too sparse to be useful, but you’re asking enterprises to trust agents anyway? Most implementations will die here - not because the architecture is wrong, but because the time-to-value is too long for enterprise procurement cycles.
Gap 2: The Reasoning Capture Problem
The paper assumes agents will produce capturable reasoning. But most agent architectures today emit:
A final answer
Maybe a chain-of-thought (if you prompted for it)
Tool calls
They don’t emit: “I considered policies A, B, and C. Policy B conflicted with precedent X. I weighted the customer’s tier at 40% importance because of factor Y.”
The reasoning that matters isn’t in the model’s output. It’s implicit in the model’s weights.
So what gets stored in your decision trace? A post-hoc rationalisation. The model explaining what it did, not why it did it. These aren’t the same thing.
The make-or-break question: How do you capture actual decision factors rather than plausible-sounding explanations? This is an interpretability problem that the paper waves away.
Gap 3: The Precedent Poisoning Problem
The paper treats all precedents as equally valid. But consider:
Temporal decay: A decision made under last year’s policy shouldn’t guide today’s agent
Context collapse: “Customer got full refund” loses critical context - which customer? What circumstances?
Error propagation: One bad decision, if cited as precedent, corrupts thousands of future decisions
Adversarial gaming: If I know precedent influences decisions, I can manufacture situations to create favourable precedents
The context graph learns from itself. What if it learns the wrong things?
This is the alignment problem in miniature. You’ve built a system that compounds institutional knowledge. It will also compound institutional mistakes.
If you’ve studied nonlinear dynamics, this failure mode should feel familiar. The context graph is a feedback system - decisions influence future decisions through precedent. In dynamical systems terms, you’ve created a recurrence relation where each state depends on previous states. And like any such system, it can exhibit stable attractors, limit cycles, or chaos.
A single corrupted precedent is a perturbation. In a well-damped system, it decays. But in a system with positive feedback- where confident decisions get cited more often, reinforcing their influence- small errors can amplify. You get the organisational equivalent of period-doubling: the system oscillates between interpretations, or worse, settles into a stable but wrong attractor that’s resistant to correction. The challenge isn’t just capturing decisions. It’s designing the feedback topology so the system converges to truth rather than confident nonsense.
The make-or-break question: What’s the correction mechanism when precedent itself is wrong? How do you “unlearn” a bad pattern that’s been cited 500 times?
Gap 4: The Expertise Extraction Ethics
Here’s the one nobody wants to discuss:
When a human expert overrides an agent, you’re extracting their tacit knowledge - often knowledge they spent decades accumulating - and encoding it into a system that might eventually replace them.
The expert is training their replacement. Do they know it?
Most won’t. And when they figure it out, they’ll stop providing corrections. Or worse, they’ll provide misleading corrections to protect their position.
The make-or-break question: How do you align incentives so that experts want to contribute to the context graph? This isn’t a technical problem - it’s a political economy problem that will kill implementations faster than any architectural flaw.
Gap 5: The Liability Vacuum
The paper mentions audit trails. But audit trails aren’t the same as accountability.
When an agent makes a decision based on precedent, and that decision causes harm, who’s liable?
The company that deployed the agent?
The vendor who built the context graph?
The human who created the original precedent?
The model provider whose weights influenced reasoning?
Legal frameworks assume a person made a decision. Context graphs create decisions that are archaeological - layers of precedent, policy, model weights, and human corrections, none of which is individually responsible.
The make-or-break question: How do you assign liability in a system designed to diffuse decision-making across time and actors? Until this is solved, regulated industries (finance, healthcare, government) will move slowly regardless of the technology’s promise.
Gap 6: The Network Effects Paradox
Foundation Capital argues context graphs are winner-take-most because they compound.
But here’s the paradox: if everyone believes this, nobody shares data.
Every company hoards its decision traces. No cross-industry learning. No shared precedent for common problems (refunds, escalations, claims). You get a thousand siloed context graphs, each learning the same lessons independently.
The network effects the paper promises require openness that competitive dynamics forbid.
The make-or-break question: Is there a model - like anonymised precedent sharing, or industry consortiums - that unlocks cross-organisational learning? Or does the trillion-dollar opportunity fragment into a million small opportunities?
The Survivorship Bias Warning
Five years from now, we’ll read case studies about the companies that built successful context graphs. We won’t read about the hundreds that tried and failed.
The failures won’t be because the thesis was wrong. They’ll be because:
They couldn’t survive the cold start period
They captured rationalisations instead of reasoning
They poisoned their precedent store with bad data
Their experts stopped cooperating
They couldn’t answer the liability question for risk-averse buyers
The trillion-dollar opportunity is real. But the path there is littered with pitfalls the investment thesis doesn’t mention.
Conclusion: The Depth-First Bet
Context graphs are a deep idea that requires every layer to cooperate. You can’t bolt them on as an afterthought - they need to be woven into the execution path where decisions happen.
Foundation Capital is betting that this architectural shift will create the next trillion-dollar platforms. Shah is right that most companies aren’t ready. Both can be true.
But here’s what we’ve learned from actually building this:
The HITL layer isn’t a safety net - it’s the learning engine. Companies that treat human oversight as friction to minimise will lose to companies that treat it as signal to maximise. The approval queue is where institutional knowledge gets captured, not where automation goes to die.
The gaps in the thesis aren’t reasons to wait - they’re the actual work. Cold start, reasoning capture, precedent poisoning, expertise extraction, liability - these aren’t theoretical problems. They’re the engineering and organisational challenges that will separate the winners from the case studies nobody writes.
The compounding starts now. Every decision trace you capture today is precedent for tomorrow’s agent. Every human correction is a training example you didn’t have to pay for. The companies that start building context graphs now - even imperfect ones - will have insurmountable advantages in 3-5 years.
This repository gives you a playground to explore that question. Run it. Tweak the domain spec. See how the graph starts capturing the “why” in your own operations. Pay attention to what happens in the approval queue - that’s where the real value accumulates.
The trillion-dollar opportunity is real. The terrain is mapped. The pitfalls are documented. The only question is who starts building - and who keeps waiting for the perfect architecture that never arrives.
References
Gupta, J. & Garg, A. (2025). “Context Graphs: AI’s Trillion-Dollar Opportunity.” Foundation Capital. https://foundationcapital.com/context-graphs-ais-trillion-dollar-opportunity/
Shah, D. (2026). “What Are Context Graphs?” simple.ai. https://simple.ai/p/what-are-context-graphs
Foundation Capital. (2026). “Where AI is Headed in 2026.” https://foundationcapital.com/where-ai-is-headed-in-2026/
This post was edited and corrected using various LLMs; the experiments and thoughts are original.