FabOps Copilot

Agentic AI Stockout-Risk Diagnostician for Semiconductor Fab Service Parts

LangGraph · Gemini 2.5 Flash · AWS Lambda · DynamoDB · MCP · RAG

DS-5730 Context-Augmented Gen AI Apps | Vanderbilt University | Spring 2026

Semiconductor fabs run hundreds of expensive, slow-moving service parts across many sites. When a part is about to stock out, a planner must decide in minutes whether the root cause is demand drift, supply drift, or a stale reorder policy, and each diagnosis implies a different action. FabOps Copilot is a deployed agentic AI system that reads six disconnected data sources (ERP inventory, demand history, supplier lead-time logs, SEC 10-K filings, FRED macro signals, and reorder-policy metadata) and returns a single diagnosed driver with a prescribed action and grounded citations.

Key Achievement

83.3%: Gold-set pass rate (15/18 cases)
19s: Cold start, down from 50s
41/41: Tests passing

The Diagnostic Challenge

Service-part planners must triage stockout risk across fragmented evidence: ERP inventory, historical demand, supplier lead-time logs, SEC risk disclosures, FRED industrial production, and reorder-policy metadata. A wrong diagnosis means rushed air freight or fab downtime. FabOps Copilot collapses that triage into a single agent call.

Can a grounded, tool-using agent reliably diagnose the root cause of a stockout risk and prescribe the correct action, end-to-end, in under 25 seconds?

How It Works (9-Node LangGraph)

A natural-language question like "Why is part 10279876 at risk at the Taiwan fab, and what should I do?" flows through a 9-node state machine:

  1. Pull live inventory and reorder policy from DynamoDB.
  2. Check policy staleness against latest demand history.
  3. Check demand drift against a Croston SBA forecast baked the prior night.
  4. Check supply drift using supplier lead-time trend and a simulated disruption model.
  5. Ground the answer in real Applied Materials SEC 10-K disclosures via cosine similarity over 3072-dim Gemini embeddings.
  6. Call Gemini 2.5 Flash to diagnose the driver as one of policy_drift, supply_risk, demand_shift, or none.
  7. Run a rule-based prescriber keyed on the diagnosis.
  8. Run a verify pass (gated behind an env flag in production for latency).
  9. Finalize and write every intermediate artifact to a DynamoDB audit table.
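
The control flow above can be pictured with a minimal pure-Python sketch. It deliberately does not import LangGraph, and it covers only the diagnose and prescribe steps plus the audit trail. The diagnosis labels are taken from step 6; everything else is an assumption made for illustration: the action strings, the flag names (policy_stale, supply_drift, demand_drift), the priority order among signals, and the rule-based stand-in for the Gemini call.

```python
from typing import Callable, Dict, List

State = Dict[str, object]

# Hypothetical action table. The keys are the four diagnosis labels from the
# text; the action strings are placeholders, not the project's real rules.
ACTIONS = {
    "policy_drift": "recompute the reorder point from recent demand",
    "supply_risk": "expedite open orders or qualify a second supplier",
    "demand_shift": "raise safety stock to match the new demand level",
    "none": "no action; keep monitoring",
}


def diagnose(state: State) -> State:
    # Stand-in for the LLM call: pick the first flagged signal.
    # The flag names and their priority order are assumptions.
    for label, flag in (
        ("policy_drift", "policy_stale"),
        ("supply_risk", "supply_drift"),
        ("demand_shift", "demand_drift"),
    ):
        if state.get(flag):
            return {**state, "diagnosis": label}
    return {**state, "diagnosis": "none"}


def prescribe(state: State) -> State:
    # Rule-based prescriber keyed on the diagnosis label.
    return {**state, "action": ACTIONS[state["diagnosis"]]}


def run_pipeline(state: State, nodes: List[Callable[[State], State]]) -> State:
    # Run nodes in order and snapshot the state after each one,
    # mimicking the "write every intermediate artifact" audit step.
    audit = []
    for node in nodes:
        state = node(state)
        audit.append({"node": node.__name__, "state": dict(state)})
    return {**state, "audit": audit}


result = run_pipeline(
    {"part_id": "10279876", "supply_drift": True},
    [diagnose, prescribe],
)
print(result["diagnosis"], "->", result["action"])
```

In the deployed system the audit snapshots would go to the DynamoDB audit table rather than an in-memory list, and each of the nine nodes would be a graph node rather than a list entry.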

Architecture

Frontend (Amplify, vanilla HTML/JS, dark theme)
   -> API Gateway HTTP API (30s cap, native CORS)
      -> Runtime Lambda (Python 3.9, arm64, 1024 MB, 42 MB zip)
         -> LangGraph 9-node agent
         -> 7 tools over DynamoDB + FRED + EDGAR
         -> Gemini 2.5 Flash (diagnose), Claude Haiku (eval judge)

Nightly Lambda (container, arm64, 3008 MB, 900s)
   -> Croston SBA forecast bake via statsforecast
   -> MLflow tracking with S3-backed DB
   -> EventBridge cron 02:00 UTC

9 DynamoDB tables: audit, forecasts, policies, inventory, suppliers,
                   incidents, macro_cache, edgar_index, sessions
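
For readers unfamiliar with the forecast baked by the nightly Lambda, here is a minimal pure-Python sketch of the Croston method with the Syntetos-Boylan Approximation (SBA) correction. It illustrates the textbook technique, not statsforecast's implementation, which may initialize and smooth differently. The function name croston_sba, the default alpha of 0.1, and the sample demand series are assumptions.

```python
from typing import Optional, Sequence


def croston_sba(demand: Sequence[float], alpha: float = 0.1) -> float:
    """Per-period demand rate for an intermittent series (Croston with SBA)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")

    size: Optional[float] = None      # smoothed nonzero demand size
    interval: Optional[float] = None  # smoothed periods between nonzero demands
    gap = 1                           # periods since the last nonzero demand

    for y in demand:
        if y > 0:
            if size is None:
                # Initialize from the first nonzero observation.
                size, interval = float(y), float(gap)
            else:
                # Update both estimates only when demand occurs.
                size = alpha * y + (1 - alpha) * size
                interval = alpha * gap + (1 - alpha) * interval
            gap = 1
        else:
            gap += 1

    if size is None:
        return 0.0  # no demand ever observed

    # The SBA factor (1 - alpha / 2) corrects the upward bias of plain Croston.
    return (1 - alpha / 2) * size / interval


# Hypothetical slow-moving part: 4 units every third period.
history = [0, 0, 4, 0, 0, 4, 0, 0, 4]
forecast = croston_sba(history)
print(round(forecast, 4))
```

A demand-drift check could then compare recent observed demand against this baseline rate; the threshold for calling it drift is not specified in this document.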

Hardest Problems Solved

  1. Gold set was fiction. The original 30-case gold set had 14 of 17 label-vs-state mismatches. Wrote regenerate_gold_set.py to derive labels from real DynamoDB state, plus inject_gold_drift.py to seed deterministic 6/6/6 drift signals. Lifted pass rate from noise to 83.3%.
  2. 50 MB Lambda ceiling with ML deps. Moved Gemini SDK to a Lambda layer, extracted p90-stockout math to a numpy-free module, and wrote a pure-Python cosine ranker over 3072-dim vectors. Final runtime zip: 42 MB.
  3. 43-second cold start from EDGAR DynamoDB scan. Pre-baked all 1,079 EDGAR embedding chunks into the Lambda zip as a module-level constant, cutting cold start to roughly 19s and warm diagnose latency to 3 to 5s.
  4. Eval harness hit API Gateway 30s timeout. Switched to direct boto3 Lambda invoke with a 180s read timeout, incremental cache writes, and per-case try/except so one failing case does not nuke the whole run.
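
The pure-Python cosine ranker from item 2, combined with the module-level constant from item 3, might look roughly like the sketch below. It is an illustration under stated assumptions, not the project's code: the names EDGAR_CHUNKS, cosine, and top_k are hypothetical, the chunk IDs are invented, and the vectors are 3-dimensional toys standing in for the 3072-dim embeddings.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical pre-baked index: (chunk_id, embedding) pairs held in a
# module-level constant, so no database scan is needed at cold start.
EDGAR_CHUNKS: List[Tuple[str, List[float]]] = [
    ("10k-risk-1", [1.0, 0.0, 0.0]),
    ("10k-risk-2", [0.0, 1.0, 0.0]),
    ("10k-supply-1", [0.7, 0.7, 0.0]),
]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity without numpy; returns 0.0 for a zero vector."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    chunks: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every chunk against the query and return the k best matches."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


hits = top_k([1.0, 0.2, 0.0], EDGAR_CHUNKS, k=2)
for chunk_id, score in hits:
    print(chunk_id, round(score, 3))
```

A linear scan like this stays cheap at the scale described here (1,079 chunks), which is presumably why dropping numpy was a workable trade for zip size.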

Scale and Metrics

9: Agent Nodes
1,079: EDGAR Chunks
18: Gold Cases
41/41: Tests Passing
82: Commits
~$0.04: Per Eval Run
~4,400: Lines of Python
9: DynamoDB Tables

Tech Stack

LangGraph · LangChain · Gemini 2.5 Flash · Claude Haiku · DSPy · MCP · Python · Pydantic v2 · statsforecast · MLflow · AWS Lambda · DynamoDB · API Gateway · Amplify · EventBridge · CloudWatch · boto3 · pytest · moto · GitHub Actions

Solo Build

Solo architecture, implementation, evaluation, and deployment by Roshan Siddartha Sivakumar over an 11-day build window in April 2026. 82 commits, all authored by one person.

Instructor: Prof. Jesse Spencer-Smith