DeepAudit: System Architecture for Academic Paper
This document describes the DeepAudit system architecture in a form suitable for submissions to top-tier academic conferences (ICSE, FSE, CCS, S&P, USENIX Security, etc.).
Architecture Diagram
System Overview
DeepAudit is an LLM-driven intelligent code security audit system that employs a hierarchical multi-agent architecture with Retrieval-Augmented Generation (RAG) and sandbox-based vulnerability verification.
Key Contributions
- LLM-Driven Multi-Agent Orchestration: A dynamic agent hierarchy where the LLM serves as the central decision-making brain, autonomously orchestrating specialized agents for reconnaissance, analysis, and verification.
- RAG-Enhanced Vulnerability Detection: Integration of semantic code understanding with vulnerability knowledge bases (CWE/CVE) to reduce false positives and improve detection accuracy.
- Sandbox-Based Exploit Verification: Docker-isolated execution environment for automated PoC generation and vulnerability confirmation.
Architecture Components
Layer 1: User Interface Layer
┌─────────────────────────────────────────────────────────────────┐
│ User Interface Layer │
├─────────────────────────────────────────────────────────────────┤
│ ┌───────────────────┐ ┌───────────────────────────────────┐ │
│ │ Web Frontend │ │ API Gateway │ │
│ │ (React + TS) │◄──►│ REST API / SSE Event Stream │ │
│ └───────────────────┘ └───────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Components:
- Web Frontend: React 18 + TypeScript SPA with real-time log streaming
- API Gateway: FastAPI-based REST endpoints with SSE for real-time events
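As a rough illustration of the SSE side of the gateway, the sketch below formats a single event in the standard `text/event-stream` wire format. The event name `log` and the payload fields are assumptions made for this example; they are not taken from the DeepAudit API.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one Server-Sent Event: an event line, a data line, a blank line."""
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"


# Hypothetical log event as the frontend might receive it
message = format_sse("log", {"msg": "recon started"})
```

A streaming endpoint would yield strings like `message` one at a time, and the frontend would consume them with the browser's `EventSource` API.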
Layer 2: Multi-Agent Orchestration Layer
┌─────────────────────────────────────────────────────────────────┐
│ Multi-Agent Orchestration Layer │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────┐ │
│ │ Orchestrator Agent │ ◄─── LLM Provider │
│ │ (ReAct Loop) │ (GPT-4/Claude) │
│ └──────────┬──────────┘ │
│ │ │
│ ┌────────────────┼────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Recon Agent │ │Analysis Agent│ │Verification │ │
│ │ │ │ │ │ Agent │ │
│ │ • Structure │ │ • SAST │ │ • PoC Gen │ │
│ │ • Tech Stack │ │ • Pattern │ │ • Sandbox │ │
│ │ • Entry Pts │ │ • Dataflow │ │ • Validation │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Key Design Decisions:
| Component | Design Choice | Rationale |
|---|---|---|
| Orchestrator | LLM-driven ReAct loop | Dynamic strategy adaptation based on findings |
| Sub-Agents | Specialized roles | Domain expertise separation for precision |
| Communication | TaskHandoff protocol | Structured context passing between agents |
| Iteration Limits | Configurable (20/30/15) | Prevent infinite loops while ensuring depth |
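To make the last two rows of the table concrete, here is a minimal sketch of a structured handoff record and an iteration-capped loop. The `TaskHandoff` field names and the `run_bounded` helper are hypothetical, chosen only for illustration; the source does not specify the protocol's fields or which agent each of the three limits applies to.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class TaskHandoff:
    """Hypothetical handoff record; field names are illustrative only."""
    from_agent: str
    to_agent: str
    task: str
    context: dict[str, Any] = field(default_factory=dict)
    findings: list[dict[str, Any]] = field(default_factory=list)


def run_bounded(step: Callable[[int], Optional[str]],
                max_iterations: int) -> tuple[Optional[str], int]:
    """Call step(i) until it returns a result or the iteration cap is hit."""
    for i in range(max_iterations):
        result = step(i)
        if result is not None:
            return result, i + 1
    return None, max_iterations


handoff = TaskHandoff(
    from_agent="orchestrator",
    to_agent="analysis",
    task="scan authentication module",
    context={"entry_points": ["/login"]},
)

# A step that finishes on its third call, well under a cap of 20
result, used = run_bounded(lambda i: "done" if i == 2 else None, max_iterations=20)
```

The cap guarantees termination even if the agent never decides to stop, which is the stated rationale for configurable limits.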
Layer 3: RAG Knowledge Enhancement Layer
┌─────────────────────────────────────────────────────────────────┐
│ RAG Knowledge Enhancement Layer │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Code Chunker│ │ Embedding │ │ Vector Database │ │
│ │(Tree-sitter)│───►│ Model │───►│ (ChromaDB) │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────┼───────────┐│
│ │ CWE/CVE Knowledge Base │ ││
│ │ • SQL Injection patterns ▼ ││
│ │ • XSS signatures ┌───────────────────┐ ││
│ │ • Command Injection │ Semantic Retriever│ ││
│ │ • Path Traversal └───────────────────┘ ││
│ │ • SSRF patterns ││
│ │ • ... ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────────┘
RAG Pipeline:
- Code Chunking: Tree-sitter based AST-aware chunking for semantic preservation
- Embedding: Support for OpenAI text-embedding-3-small/large, local models
- Vector Store: ChromaDB for lightweight deployment
- Retrieval: Semantic similarity search with vulnerability pattern matching
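The retrieval step can be sketched as follows. This is a toy stand-in, not the DeepAudit implementation: token counts replace a real embedding model, and an in-memory list replaces ChromaDB. Only the ranking by cosine similarity mirrors the pipeline described above.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    """Return the top_k chunks ranked by similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]


# Hypothetical code chunks, one per function as AST-aware chunking would produce
chunks = [
    "def get_user(db, name): return db.execute('select * from users where name = ' + name)",
    "def render(template, ctx): return template.format(**ctx)",
    "def read_file(path): return open(path).read()",
]
top = retrieve("sql query built by string concatenation: execute select from where",
               chunks, top_k=1)
```

Here the query plays the role of a vulnerability pattern description, and the highest-ranked chunk is the one that would be passed to the Analysis Agent as context.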
Layer 4: Security Tool Integration Layer
┌─────────────────────────────────────────────────────────────────┐
│ Security Tool Integration Layer │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ SAST Tools ││
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐ ││
│ │ │ Semgrep │ │ Bandit │ │Kunlun-M │ │Pattern Match │ ││
│ │ │ (Multi) │ │ (Python) │ │ (PHP/JS) │ │ (Fallback) │ ││
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────────┘ ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
│ ┌────────────────────────┐ ┌────────────────────────────────┐ │
│ │ Secret Detection │ │ Dependency Analysis │ │
│ │ • Gitleaks │ │ • OSV-Scanner │ │
│ │ • TruffleHog │ │ • npm audit / pip-audit │ │
│ └────────────────────────┘ └────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Tool Selection Strategy:
| Category | Primary Tool | Fallback | Coverage |
|---|---|---|---|
| Multi-lang SAST | Semgrep | PatternMatch | 20+ languages |
| Python Security | Bandit | PatternMatch | Python-specific |
| PHP/JS Analysis | Kunlun-M | Semgrep | Semantic analysis |
| Secret Detection | Gitleaks | TruffleHog | Git history scan |
| Dependencies | OSV-Scanner | npm/pip audit | Multi-ecosystem |
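The primary/fallback strategy in the table can be expressed as an ordered lookup. In the sketch below the category keys and tool labels are hypothetical identifiers derived from the table rows, and the availability check is injected by the caller, since the source does not say how DeepAudit detects installed tools.

```python
from typing import Callable, Optional

# Ordered preference per category, primary first (labels follow the table above)
TOOL_CHAIN: dict[str, list[str]] = {
    "multi_lang_sast": ["semgrep", "pattern_match"],
    "python_security": ["bandit", "pattern_match"],
    "php_js_analysis": ["kunlun-m", "semgrep"],
    "secret_detection": ["gitleaks", "trufflehog"],
    "dependencies": ["osv-scanner", "npm-audit", "pip-audit"],
}


def select_tool(category: str, is_available: Callable[[str], bool]) -> Optional[str]:
    """Return the first available tool for a category, or None if none is."""
    for tool in TOOL_CHAIN[category]:
        if is_available(tool):
            return tool
    return None


# Example: a host where only Semgrep and TruffleHog are installed
installed = {"semgrep", "trufflehog"}
chosen = select_tool("php_js_analysis", lambda tool: tool in installed)
```

Keeping the preference order in data rather than in branching code makes it easy to add a category or reorder fallbacks without touching the selection logic.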
Layer 5: Sandbox Verification Layer
┌─────────────────────────────────────────────────────────────────┐
│ Sandbox Verification Layer │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ Docker Sandbox Container ││
│ │ ┌────────────────────────────────────────────────────────┐ ││
│ │ │ Security Constraints │ ││
│ │ │ • Network: Isolated / No external access │ ││
│ │ │ • Resources: Memory 512MB / CPU 1.0 │ ││
│ │ │ • Syscalls: seccomp whitelist policy │ ││
│ │ │ • Timeout: 60 seconds max execution │ ││
│ │ └────────────────────────────────────────────────────────┘ ││
│ │ ││
│ │ ┌──────────────────┐ ┌──────────────────────────────┐ ││
│ │ │ PoC Generator │───►│ Exploit Validator │ ││
│ │ │ (LLM-assisted) │ │ (Execution + Verification) │ ││
│ │ └──────────────────┘ └──────────────────────────────┘ ││
│ │ ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────────┘
Verification Workflow:
- PoC Generation: LLM generates exploitation code based on vulnerability analysis
- Sandbox Setup: Docker container with strict security constraints
- Execution: Run PoC in isolated environment
- Validation: Check execution results against expected vulnerability behavior
- Confidence Scoring: Assign a verification confidence score between 0 and 1
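The sandbox constraints listed above map onto standard `docker run` options. The sketch below builds such a command and runs it with a timeout. The image name, the PoC path, and the seccomp profile filename are placeholders assumed for this example, and the helper names are hypothetical.

```python
import subprocess


def build_sandbox_command(image: str, poc_command: list[str],
                          seccomp_profile: str = "seccomp.json") -> list[str]:
    """Assemble a docker run invocation with the constraints from the diagram."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                             # no network access
        "--memory", "512m",                              # memory limit
        "--cpus", "1.0",                                 # CPU limit
        "--security-opt", f"seccomp={seccomp_profile}",  # syscall allowlist
        image, *poc_command,
    ]


def run_poc(cmd: list[str], timeout_s: int = 60) -> dict:
    """Run the command, reporting a timeout instead of raising."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return {"timed_out": True, "exit_code": None, "stdout": ""}
    return {"timed_out": False, "exit_code": proc.returncode, "stdout": proc.stdout}


# Placeholder image and PoC path; run_poc(cmd) would need Docker installed
cmd = build_sandbox_command("python:3.12-slim", ["python", "/poc/poc.py"])
```

Note that the timeout here only stops the Docker client process. A production version would likely also need to stop the container itself, for example by naming it and calling `docker kill` when the timeout fires.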
Data Flow Diagram
┌─────────────────────────────────────────────────────────────────────────────┐
│ DeepAudit Data Flow │
└─────────────────────────────────────────────────────────────────────────────┘
┌──────────┐ ┌──────────────┐
│ User │ │ Reports │
│ Request │ │ (MD/JSON) │
└────┬─────┘ └──────▲───────┘
│ │
▼ │
┌───────────────┐ ┌─────────────────────────────────────────────┴───────┐
│ API Gateway │───►│ PostgreSQL DB │
└───────┬───────┘ │ • Tasks • Findings • Projects • Reports │
│ └─────────────────────────────────────────────────────┘
▼
┌───────────────────────────────────────────────────────────────────────────┐
│ Orchestrator Agent │
│ │
│ ┌─────────────┐ ┌─────────────────────────────────────────────┐ │
│ │ LLM Service │◄────►│ ReAct Decision Loop │ │
│ │ (GPT/Claude)│ │ Thought → Action → Observation → Thought │ │
│ └─────────────┘ └───────────────────┬─────────────────────────┘ │
│ │ │
│ ┌─────────────┬───────────────┼───────────────┐ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌───────────┐ ┌────────────┐ ┌──────────────┐ │
│ │ Recon │ │ Analysis │ │Verification│ │ Finish │ │
│ │ Agent │ │ Agent │ │ Agent │ │ Action │ │
│ └──────┬──────┘ └─────┬─────┘ └──────┬─────┘ └──────────────┘ │
│ │ │ │ │
└─────────────┼──────────────┼──────────────┼───────────────────────────────┘
│ │ │
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ File Tools │ │ SAST Tools │ │ Sandbox │
│ list/read │ │ Semgrep... │ │ Docker │
└─────┬──────┘ └─────┬──────┘ └──────┬─────┘
│ │ │
│ ┌──────┴──────┐ │
│ ▼ │ │
│ ┌─────────┐ │ │
└─►│ RAG │◄───────┘ │
│ Pipeline│ │
└────┬────┘ │
│ │
▼ ▼
┌────────────┐ ┌────────────┐
│ Vector DB │ │ Verification│
│ ChromaDB │ │ Result │
└────────────┘ └────────────┘
Algorithm: Multi-Agent Audit Orchestration
Algorithm 1: LLM-Driven Multi-Agent Security Audit
Input: Project P, Target vulnerabilities V, Configuration C
Output: Findings F, Verification Results R

Initialize Orchestrator Agent with LLM
Create sub-agents: Recon, Analysis, Verification
findings ← ∅
verified_results ← ∅
history ← ∅
iteration ← 0

// Phase 1: Reconnaissance
recon_result ← ReconAgent.run(P, V)
high_risk_areas ← recon_result.priority_areas
context ← (P, V, C, high_risk_areas)

// Phase 2: Orchestration Loop
while iteration < MAX_ITERATIONS do
  thought, action ← LLM.reason(context, history)

  if action.type = "dispatch_agent" then
    agent ← select_agent(action.params)
    result ← agent.run(action.task, context)
    findings ← findings ∪ result.findings
    update_context(result)
    history ← history ∪ {(thought, action, result)}
  else if action.type = "finish" then
    break
  end if

  iteration ← iteration + 1
end while

// Phase 3: Verification
for each f ∈ findings where f.severity ≥ HIGH do
  poc ← LLM.generate_poc(f)
  result ← Sandbox.execute(poc)
  verified_results ← verified_results ∪ {(f, result)}
end for

return (findings, verified_results)
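The orchestration loop of Algorithm 1 (Phase 2) can be sketched in Python as below. This is a simplified illustration under stated assumptions: the `Action` type, the `orchestrate` function, and the scripted `reason` callback are hypothetical stand-ins for the LLM and agents, not the project's actual classes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str        # "dispatch_agent" or "finish"
    agent: str = ""  # which sub-agent to run
    task: str = ""   # task description passed to the agent


def orchestrate(reason: Callable[[list], Action],
                agents: dict[str, Callable[[str], list[str]]],
                max_iterations: int) -> list[str]:
    """Ask the reasoner for the next action until it finishes or the cap is hit."""
    findings: list[str] = []
    history: list = []
    for _ in range(max_iterations):
        action = reason(history)
        if action.kind == "finish":
            break
        if action.kind == "dispatch_agent":
            result = agents[action.agent](action.task)
            findings.extend(result)
            history.append((action, result))
    return findings


# Scripted stand-in for the LLM: recon, then analysis, then finish
script = iter([
    Action("dispatch_agent", "recon", "map project structure"),
    Action("dispatch_agent", "analysis", "scan login handler"),
    Action("finish"),
])
agents = {
    "recon": lambda task: [],
    "analysis": lambda task: ["SQL injection in login()"],
}
findings = orchestrate(lambda history: next(script), agents, max_iterations=20)
```

Because the loop is bounded by `max_iterations`, a reasoner that never emits `finish` still terminates, at the cost of possibly stopping before all areas are covered.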
Evaluation Metrics
For academic evaluation, we suggest the following metrics:
Detection Effectiveness
| Metric | Formula | Description |
|---|---|---|
| Precision | TP / (TP + FP) | Accuracy of reported vulnerabilities |
| Recall | TP / (TP + FN) | Coverage of actual vulnerabilities |
| F1-Score | 2 × (P × R) / (P + R) | Harmonic mean of precision and recall |
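The formulas in the table translate directly into code. The sketch below computes them from raw counts; the counts in the example are made up purely to show the arithmetic and are not evaluation results.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, f1), using 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative counts only: 8 true positives, 2 false positives, 8 missed
precision, recall, f1 = detection_metrics(tp=8, fp=2, fn=8)
# precision = 8/10 = 0.8, recall = 8/16 = 0.5, f1 = 2*0.4/1.3 ≈ 0.615
```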
Efficiency Metrics
| Metric | Description |
|---|---|
| Time-to-Detection (TTD) | Time from start to first vulnerability found |
| Total Audit Time | End-to-end execution time |
| LLM Token Usage | Total tokens consumed during audit |
| Tool Invocation Count | Number of external tool calls |
Verification Quality
| Metric | Description |
|---|---|
| Verification Rate | Percentage of findings verified via sandbox |
| False Positive Reduction | Percentage reduction in false positives after sandbox verification |
| PoC Success Rate | Fraction of generated PoCs that successfully demonstrate the exploit |
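The table gives these metrics in words only. One plausible way to compute the first two is sketched below; the exact definitions are an assumption of this example, and the numbers are invented to show the arithmetic.

```python
def verification_rate(verified: int, total_findings: int) -> float:
    """Share of findings that went through sandbox verification."""
    return verified / total_findings if total_findings else 0.0


def false_positive_reduction(fp_before: int, fp_after: int) -> float:
    """Relative drop in false positives once unverified findings are filtered."""
    return (fp_before - fp_after) / fp_before if fp_before else 0.0


# Illustrative numbers only: 12 of 40 findings verified, false positives 20 -> 5
rate = verification_rate(verified=12, total_findings=40)          # 12/40 = 0.3
reduction = false_positive_reduction(fp_before=20, fp_after=5)    # 15/20 = 0.75
```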
Comparison with Related Work
| System | Multi-Agent | RAG | Sandbox | LLM-Driven |
|---|---|---|---|---|
| CodeQL | ✗ | ✗ | ✗ | ✗ |
| Semgrep | ✗ | ✗ | ✗ | ✗ |
| Snyk Code | ✗ | ✗ | ✗ | Partial |
| GitHub Copilot | ✗ | ✗ | ✗ | ✓ |
| DeepAudit | ✓ | ✓ | ✓ | ✓ |
LaTeX TikZ Diagram Code
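For LaTeX papers, you can use the following TikZ code. The relative placement syntax (`below left=... of ...`) requires `\usetikzlibrary{positioning}` in the preamble: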
\begin{figure}[t]
\centering
\begin{tikzpicture}[
node distance=1cm,
box/.style={rectangle, draw, rounded corners, minimum width=2.5cm, minimum height=0.8cm, align=center},
agent/.style={box, fill=blue!10},
tool/.style={box, fill=orange!10},
rag/.style={box, fill=green!10},
sandbox/.style={box, fill=red!10},
arrow/.style={->, >=stealth, thick}
]
% Orchestrator
\node[agent] (orch) {Orchestrator Agent};
% Sub-agents
\node[agent, below left=1.5cm and 1cm of orch] (recon) {Recon Agent};
\node[agent, below=1.5cm of orch] (analysis) {Analysis Agent};
\node[agent, below right=1.5cm and 1cm of orch] (verify) {Verification Agent};
% Connections
\draw[arrow] (orch) -- (recon);
\draw[arrow] (orch) -- (analysis);
\draw[arrow] (orch) -- (verify);
% Tools
\node[tool, below=1cm of analysis] (tools) {SAST Tools\\Semgrep, Bandit, Kunlun-M};
% RAG
\node[rag, left=1cm of tools] (rag) {RAG Pipeline\\Vector DB + CWE/CVE};
% Sandbox
\node[sandbox, right=1cm of tools] (sandbox) {Docker Sandbox\\PoC Verification};
% Tool connections
\draw[arrow] (analysis) -- (tools);
\draw[arrow, dashed] (tools) -- (rag);
\draw[arrow] (verify) -- (sandbox);
% LLM
\node[box, fill=purple!10, above=0.5cm of orch] (llm) {LLM Provider\\GPT-4 / Claude};
\draw[arrow, <->] (orch) -- (llm);
\end{tikzpicture}
\caption{DeepAudit System Architecture}
\label{fig:architecture}
\end{figure}
Citation
If you use DeepAudit in your research, please cite:
@software{deepaudit2024,
title = {DeepAudit: LLM-Driven Multi-Agent Code Security Audit System with RAG Enhancement and Sandbox Verification},
author = {Lin Tsinghua},
year = {2024},
url = {https://github.com/lintsinghua/DeepAudit},
version = {3.0.0}
}
