FabOps Copilot
Agentic AI Stockout-Risk Diagnostician for Semiconductor Fab Service Parts
LangGraph · Gemini 2.5 Flash · AWS Lambda · DynamoDB · MCP · RAG
DS-5730 Context-Augmented Gen AI Apps | Vanderbilt University | Spring 2026
Semiconductor fabs run hundreds of expensive, slow-moving service parts across many sites. When a part is about to stock out, a planner must decide in minutes whether the root cause is demand drift, supply drift, or a stale reorder policy, and each diagnosis implies a different action. FabOps Copilot is a deployed agentic AI system that reads six disconnected data sources (ERP inventory, demand history, supplier lead-time logs, SEC 10-K filings, FRED macro signals, and reorder-policy metadata) and returns a single diagnosed driver with a prescribed action and grounded citations.
Key Achievement
- 83.3% gold-set pass rate (15/18 cases)
- 19s cold start, down from 50s
- 41/41 tests passing
The Diagnostic Challenge
Service-part planners must triage stockout risk across fragmented evidence: ERP inventory, historical demand, supplier lead-time logs, SEC risk disclosures, FRED industrial production, and reorder-policy metadata. A wrong diagnosis means rushed air freight or fab downtime. FabOps Copilot collapses that triage into a single agent call.
Can a grounded, tool-using agent reliably diagnose the root cause of a stockout risk and prescribe the correct action, end-to-end, in under 25 seconds?
How It Works (9-Node LangGraph)
A natural-language question like "Why is part 10279876 at risk at the Taiwan fab, and what should I do?" flows through a 9-node state machine:
- Pull live inventory and reorder policy from DynamoDB.
- Check policy staleness against latest demand history.
- Check demand drift against a Croston SBA forecast baked the prior night.
- Check supply drift using supplier lead-time trend and a simulated disruption model.
- Ground the answer in real Applied Materials SEC 10-K disclosures via cosine similarity over 3072-dim Gemini embeddings.
- Call Gemini 2.5 Flash to diagnose the driver as one of policy_drift, supply_risk, demand_shift, or none.
- Run a rule-based prescriber keyed on the diagnosis.
- Run a verify pass (gated behind an env flag in production for latency).
- Finalize and write every intermediate artifact to a DynamoDB audit table.
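The demand-drift step above compares recent demand with a Croston SBA baseline. The sketch below shows one common formulation of that forecast in plain Python. It is illustrative only: the project bakes its forecast with statsforecast, and the function names, the smoothing factor `alpha=0.1`, and the `tolerance` threshold used for the drift check are assumptions, not values taken from the source.

```python
from typing import Sequence


def croston_sba(demand: Sequence[float], alpha: float = 0.1) -> float:
    """Per-period demand forecast for an intermittent series (Croston with SBA correction).

    Hand-rolled illustration; alpha and the initialisation rule are assumptions.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")

    size = None          # smoothed non-zero demand size
    interval = None      # smoothed number of periods between non-zero demands
    periods_since = 0    # periods elapsed since the last non-zero demand

    for y in demand:
        periods_since += 1
        if y > 0:
            if size is None:
                # Initialise from the first non-zero observation.
                size = float(y)
                interval = float(periods_since)
            else:
                size = alpha * y + (1 - alpha) * size
                interval = alpha * periods_since + (1 - alpha) * interval
            periods_since = 0

    if size is None:
        return 0.0  # no demand observed at all

    # SBA multiplies Croston's size/interval ratio by (1 - alpha / 2) to reduce bias.
    return (1 - alpha / 2) * size / interval


def demand_drifted(recent: Sequence[float], forecast: float, tolerance: float = 0.5) -> bool:
    """Hypothetical drift rule: flag when mean recent demand departs from the
    forecast by more than `tolerance` (relative)."""
    mean_recent = sum(recent) / len(recent)
    if forecast == 0:
        return mean_recent > 0
    return abs(mean_recent - forecast) / forecast > tolerance


history = [0, 0, 4, 0, 0, 4]
baseline = croston_sba(history)
print(round(baseline, 4), demand_drifted([3, 0, 4, 5], baseline))
```

With demand of 4 units every third period, the baseline settles near 4/3 per period scaled by the SBA factor, so a recent window averaging 3 units per period is flagged as drift under the assumed threshold.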
Architecture
Frontend (Amplify, vanilla HTML/JS, dark theme)
-> API Gateway HTTP API (30s cap, native CORS)
-> Runtime Lambda (Python 3.9, arm64, 1024 MB, 42 MB zip)
-> LangGraph 9-node agent
-> 7 tools over DynamoDB + FRED + EDGAR
-> Gemini 2.5 Flash (diagnose), Claude Haiku (eval judge)
Nightly Lambda (container, arm64, 3008 MB, 900s)
-> Croston SBA forecast bake via statsforecast
-> MLflow tracking with S3-backed DB
-> EventBridge cron 02:00 UTC
9 DynamoDB tables: audit, forecasts, policies, inventory, suppliers,
incidents, macro_cache, edgar_index, sessions
Hardest Problems Solved
- Gold set was fiction. The original 30-case gold set had 14 of 17 label-vs-state mismatches. Wrote regenerate_gold_set.py to derive labels from real DynamoDB state, plus inject_gold_drift.py to seed deterministic 6/6/6 drift signals. Lifted pass rate from noise to 83.3%.
- 50 MB Lambda ceiling with ML deps. Moved Gemini SDK to a Lambda layer, extracted p90-stockout math to a numpy-free module, and wrote a pure-Python cosine ranker over 3072-dim vectors. Final runtime zip: 42 MB.
- 43-second cold start from EDGAR DynamoDB scan. Pre-baked all 1,079 EDGAR embedding chunks into the Lambda zip as a module-level constant, cutting cold start to roughly 19s and warm diagnose to 3 to 5s.
- Eval harness hit API Gateway 30s timeout. Switched to direct boto3 Lambda invoke with a 180s read timeout, incremental cache writes, and per-case try/except so one failing case does not nuke the whole run.
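A numpy-free cosine ranker of the kind mentioned above can be quite small. The following is a minimal sketch, not the project's actual module: the function names, the `(chunk_id, vector)` tuple layout, and the toy 3-dimensional vectors are assumptions for illustration (the real embeddings are 3072-dimensional).

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors using only the stdlib."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    chunks: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical pre-baked chunks held as a module-level constant.
CHUNKS = [
    ("chunk-a", [1.0, 0.0, 0.0]),
    ("chunk-b", [0.0, 1.0, 0.0]),
    ("chunk-c", [1.0, 1.0, 0.0]),
]

for chunk_id, score in top_k([1.0, 0.1, 0.0], CHUNKS, k=2):
    print(chunk_id, round(score, 3))
```

A linear scan like this is plausible at the scale described (about a thousand chunks), and it avoids shipping numpy in the runtime zip. Larger corpora would likely need a vectorised or indexed search instead.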
Scale and Metrics
- 9 agent nodes
- 1,079 EDGAR chunks
- 18 gold cases
- 41/41 tests passing
- 82 commits
- ~$0.04 per eval run
- ~4,400 lines of Python
- 9 DynamoDB tables
Solo Build
Solo architecture, implementation, evaluation, and deployment by Roshan Siddartha Sivakumar over an 11-day build window in April 2026. 82 commits, all authored by one person.
Instructor: Prof. Jesse Spencer-Smith