# DeepAudit: System Architecture for Academic Paper

This document describes the DeepAudit system architecture in a form suitable for submissions to top-tier academic venues (ICSE, FSE, CCS, S&P, USENIX Security, etc.).

## Architecture Diagram

[Figure: DeepAudit system architecture]

---

## System Overview

**DeepAudit** is an LLM-driven intelligent code security audit system that employs a **hierarchical multi-agent architecture** with **Retrieval-Augmented Generation (RAG)** and **sandbox-based vulnerability verification**.

### Key Contributions

1. **LLM-Driven Multi-Agent Orchestration**: A dynamic agent hierarchy in which the LLM serves as the central decision-maker, autonomously orchestrating specialized agents for reconnaissance, analysis, and verification.

2. **RAG-Enhanced Vulnerability Detection**: Integration of semantic code understanding with vulnerability knowledge bases (CWE/CVE) to reduce false positives and improve detection accuracy.

3. **Sandbox-Based Exploit Verification**: A Docker-isolated execution environment for automated PoC generation and vulnerability confirmation.

---

## Architecture Components

### Layer 1: User Interface Layer

```
┌─────────────────────────────────────────────────────────────────┐
│                      User Interface Layer                       │
├─────────────────────────────────────────────────────────────────┤
│  ┌───────────────────┐    ┌───────────────────────────────────┐ │
│  │   Web Frontend    │    │         API Gateway               │ │
│  │   (React + TS)    │◄──►│   REST API / SSE Event Stream     │ │
│  └───────────────────┘    └───────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```

**Components:**
- **Web Frontend**: React 18 + TypeScript SPA with real-time log streaming
- **API Gateway**: FastAPI-based REST endpoints with SSE for real-time events
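
The SSE channel can be illustrated with a minimal sketch of the `text/event-stream` wire framing. It uses only the Python standard library and does not reproduce DeepAudit's FastAPI code; the event name `agent_log` and the helpers `format_sse` and `stream_agent_log` are hypothetical names chosen for illustration.

```python
import json
from typing import Iterable, Iterator, Optional


def format_sse(event: str, data: dict, event_id: Optional[int] = None) -> str:
    """Serialize one Server-Sent Events frame."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps produces a single line, so one "data:" field is enough.
    lines.append(f"data: {json.dumps(data, ensure_ascii=False)}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


def stream_agent_log(messages: Iterable[str]) -> Iterator[str]:
    """Yield one SSE frame per log message (hypothetical 'agent_log' event)."""
    for i, message in enumerate(messages):
        yield format_sse("agent_log", {"message": message}, event_id=i)
```

A web framework would typically wrap such a generator in a streaming response served with the `text/event-stream` content type, and the frontend would consume it with the browser `EventSource` API.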

### Layer 2: Multi-Agent Orchestration Layer

```
┌─────────────────────────────────────────────────────────────────┐
│                 Multi-Agent Orchestration Layer                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│                    ┌─────────────────────┐                      │
│                    │  Orchestrator Agent │ ◄─── LLM Provider    │
│                    │    (ReAct Loop)     │      (GPT-4/Claude)  │
│                    └──────────┬──────────┘                      │
│                               │                                 │
│              ┌────────────────┼────────────────┐                │
│              ▼                ▼                ▼                │
│      ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│      │ Recon Agent  │  │Analysis Agent│  │Verification  │       │
│      │              │  │              │  │    Agent     │       │
│      │ • Structure  │  │ • SAST       │  │ • PoC Gen    │       │
│      │ • Tech Stack │  │ • Pattern    │  │ • Sandbox    │       │
│      │ • Entry Pts  │  │ • Dataflow   │  │ • Validation │       │
│      └──────────────┘  └──────────────┘  └──────────────┘       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

**Key Design Decisions:**

| Component | Design Choice | Rationale |
|-----------|---------------|-----------|
| Orchestrator | LLM-driven ReAct loop | Dynamic strategy adaptation based on findings |
| Sub-Agents | Specialized roles | Domain expertise separation for precision |
| Communication | TaskHandoff protocol | Structured context passing between agents |
| Iteration Limits | Configurable (20/30/15) | Prevent infinite loops while ensuring depth |
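
To make the handoff and iteration-limit rows concrete, here is a minimal sketch under stated assumptions. The field names of `TaskHandoff` and the helper `run_bounded` are hypothetical; the document names the protocol and the limits but does not specify their structure or which agent each limit applies to.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class TaskHandoff:
    """Hypothetical shape of a structured handoff message between agents."""
    from_agent: str
    to_agent: str
    task: str
    context: Dict[str, Any] = field(default_factory=dict)
    prior_findings: List[Dict[str, Any]] = field(default_factory=list)

    def to_json(self) -> str:
        # A serializable form lets the handoff be logged or sent over the wire.
        return json.dumps(asdict(self), sort_keys=True)


def run_bounded(step: Callable[[int], bool], max_iterations: int) -> int:
    """Call step(i) until it reports completion or the cap is reached.

    Returns the number of iterations actually executed.
    """
    for i in range(max_iterations):
        if step(i):
            return i + 1
    return max_iterations
```

The cap guarantees termination even when the model never emits a finishing action, which is the trade-off the table describes: a higher limit allows deeper analysis at the cost of more LLM calls.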

### Layer 3: RAG Knowledge Enhancement Layer

```
┌─────────────────────────────────────────────────────────────────┐
│                 RAG Knowledge Enhancement Layer                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────┐  │
│  │ Code Chunker│    │  Embedding  │    │   Vector Database   │  │
│  │(Tree-sitter)│───►│    Model    │───►│     (ChromaDB)      │  │
│  └─────────────┘    └─────────────┘    └─────────────────────┘  │
│                                                    │            │
│  ┌─────────────────────────────────────────────────┼───────────┐│
│  │  CWE/CVE Knowledge Base                         │           ││
│  │  • SQL Injection patterns                       ▼           ││
│  │  • XSS signatures                     ┌───────────────────┐ ││
│  │  • Command Injection                  │ Semantic Retriever│ ││
│  │  • Path Traversal                     └───────────────────┘ ││
│  │  • SSRF patterns                                            ││
│  │  • ...                                                      ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

**RAG Pipeline:**

1. **Code Chunking**: Tree-sitter based AST-aware chunking for semantic preservation
2. **Embedding**: Supports OpenAI `text-embedding-3-small` / `text-embedding-3-large` as well as local models
3. **Vector Store**: ChromaDB for lightweight deployment
4. **Retrieval**: Semantic similarity search with vulnerability pattern matching
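
The pipeline steps above can be sketched in a few lines. This is a toy illustration, not DeepAudit's implementation: Python's built-in `ast` module stands in for Tree-sitter, token counts stand in for a learned embedding model, and an in-memory list stands in for ChromaDB. All function names are hypothetical.

```python
import ast
import math
import re
from collections import Counter
from typing import Dict, List


def chunk_python_source(source: str) -> List[Dict[str, str]]:
    """AST-aware chunking: one chunk per top-level function or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            code = ast.get_source_segment(source, node) or ""
            chunks.append({"name": node.name, "code": code})
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding (assumption): a bag of lowercase identifier tokens."""
    return Counter(re.findall(r"[A-Za-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[Dict[str, str]], top_k: int = 1) -> List[Dict[str, str]]:
    """Rank chunks by similarity to the query and return the best top_k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["code"])), reverse=True)
    return ranked[:top_k]


SAMPLE = '''
def load_user(db, user_id):
    return db.execute("SELECT * FROM users WHERE id = " + user_id)

def render_greeting(name):
    return "Hello, " + name
'''

chunks = chunk_python_source(SAMPLE)
best = retrieve("sql select execute query users", chunks)[0]
```

Chunking along syntactic boundaries keeps each function intact, so a retrieved chunk carries the full context of a potential sink such as the string-concatenated query in `load_user`.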

### Layer 4: Security Tool Integration Layer

```
┌─────────────────────────────────────────────────────────────────┐
│                 Security Tool Integration Layer                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                         SAST Tools                          ││
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────┐ ││
│  │  │ Semgrep  │  │ Bandit   │  │Kunlun-M  │  │Pattern Match │ ││
│  │  │ (Multi)  │  │ (Python) │  │ (PHP/JS) │  │ (Fallback)   │ ││
│  │  └──────────┘  └──────────┘  └──────────┘  └──────────────┘ ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                 │
│  ┌────────────────────────┐  ┌────────────────────────────────┐ │
│  │   Secret Detection     │  │     Dependency Analysis        │ │
│  │   • Gitleaks           │  │     • OSV-Scanner              │ │
│  │   • TruffleHog         │  │     • npm audit / pip-audit    │ │
│  └────────────────────────┘  └────────────────────────────────┘ │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

**Tool Selection Strategy:**

| Category | Primary Tool | Fallback | Coverage |
|----------|-------------|----------|----------|
| Multi-lang SAST | Semgrep | PatternMatch | 20+ languages |
| Python Security | Bandit | PatternMatch | Python-specific |
| PHP/JS Analysis | Kunlun-M | Semgrep | Semantic analysis |
| Secret Detection | Gitleaks | TruffleHog | Git history scan |
| Dependencies | OSV-Scanner | npm/pip audit | Multi-ecosystem |
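
The primary/fallback strategy in the table can be sketched as a lookup that returns the first usable tool per category. This is a sketch under assumptions: the category keys, the `kunlun-m` executable name, and the treatment of PatternMatch as an always-available built-in are hypothetical, and only `pip-audit` is listed for the dependency fallback to keep the example short.

```python
import shutil
from typing import Callable, Dict, Optional, Tuple

# (primary, fallback) per category, mirroring the table above.
TOOL_CHAIN: Dict[str, Tuple[str, str]] = {
    "multi_lang_sast": ("semgrep", "pattern_match"),
    "python_security": ("bandit", "pattern_match"),
    "php_js_analysis": ("kunlun-m", "semgrep"),  # executable name is an assumption
    "secret_detection": ("gitleaks", "trufflehog"),
    "dependencies": ("osv-scanner", "pip-audit"),
}

# Assumption: the pattern matcher ships with the system and needs no binary.
BUILTIN_TOOLS = {"pattern_match"}


def on_path(executable: str) -> bool:
    """Report whether an executable can be found on PATH."""
    return shutil.which(executable) is not None


def select_tool(category: str, is_available: Callable[[str], bool] = on_path) -> Optional[str]:
    """Return the primary tool if usable, else the fallback, else None."""
    for tool in TOOL_CHAIN[category]:
        if tool in BUILTIN_TOOLS or is_available(tool):
            return tool
    return None
```

Injecting `is_available` keeps the selection logic testable without installing any scanner, and returning `None` lets the caller report a coverage gap rather than silently skipping a category.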

### Layer 5: Sandbox Verification Layer

```
┌─────────────────────────────────────────────────────────────────┐
│                   Sandbox Verification Layer                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                  Docker Sandbox Container                   ││
│  │  ┌────────────────────────────────────────────────────────┐ ││
│  │  │                  Security Constraints                  │ ││
│  │  │  • Network: Isolated / No external access              │ ││
│  │  │  • Resources: Memory 512MB / CPU 1.0                   │ ││
│  │  │  • Syscalls: seccomp whitelist policy                  │ ││
│  │  │  • Timeout: 60 seconds max execution                   │ ││
│  │  └────────────────────────────────────────────────────────┘ ││
│  │                                                             ││
│  │  ┌──────────────────┐    ┌──────────────────────────────┐   ││
│  │  │  PoC Generator   │───►│     Exploit Validator        │   ││
│  │  │  (LLM-assisted)  │    │ (Execution + Verification)   │   ││
│  │  └──────────────────┘    └──────────────────────────────┘   ││
│  │                                                             ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

**Verification Workflow:**

1. **PoC Generation**: LLM generates exploitation code based on vulnerability analysis
2. **Sandbox Setup**: Docker container with strict security constraints
3. **Execution**: Run PoC in isolated environment
4. **Validation**: Check execution results against expected vulnerability behavior
5. **Confidence Scoring**: Assign verification confidence (0-1)
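
As a rough illustration of the sandbox setup and execution steps, the sketch below maps the listed constraints onto standard `docker run` flags. The helper names, the image, and the seccomp profile path are hypothetical, and the way DeepAudit actually launches containers may differ.

```python
import subprocess
from typing import List, Optional


def build_sandbox_command(
    image: str,
    poc_argv: List[str],
    seccomp_profile: str = "seccomp-profile.json",  # hypothetical profile path
    memory: str = "512m",
    cpus: str = "1.0",
) -> List[str]:
    """Translate the sandbox constraints into a `docker run` argument list."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                              # no external network access
        "--memory", memory,                               # memory cap
        "--cpus", cpus,                                   # CPU cap
        "--security-opt", f"seccomp={seccomp_profile}",   # syscall filter
        image,
        *poc_argv,
    ]


def run_poc(image: str, poc_argv: List[str], timeout_s: int = 60) -> Optional[subprocess.CompletedProcess]:
    """Run the PoC; return None if it exceeds the time limit.

    Note: the timeout stops the docker CLI process. A production setup
    would also stop the container itself, which this sketch omits.
    """
    cmd = build_sandbox_command(image, poc_argv)
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None


# Example: inspect the command without executing it.
example_cmd = build_sandbox_command("python:3.12-slim", ["python", "/poc/poc.py"])
```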

---

## Data Flow Diagram

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                             DeepAudit Data Flow                             │
└─────────────────────────────────────────────────────────────────────────────┘

   ┌──────────┐                                             ┌──────────────┐
   │   User   │                                             │   Reports    │
   │ Request  │                                             │  (MD/JSON)   │
   └────┬─────┘                                             └──────▲───────┘
        │                                                          │
        ▼                                                          │
┌───────────────┐    ┌─────────────────────────────────────────────┴───────┐
│  API Gateway  │───►│                    PostgreSQL DB                    │
└───────┬───────┘    │  • Tasks  • Findings  • Projects  • Reports         │
        │            └─────────────────────────────────────────────────────┘
        ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                            Orchestrator Agent                             │
│                                                                           │
│  ┌─────────────┐      ┌─────────────────────────────────────────────┐     │
│  │ LLM Service │◄────►│             ReAct Decision Loop             │     │
│  │ (GPT/Claude)│      │  Thought → Action → Observation → Thought   │     │
│  └─────────────┘      └───────────────────┬─────────────────────────┘     │
│                                           │                               │
│             ┌──────────────┬──────────────┼──────────────┐                │
│             ▼              ▼              ▼              ▼                │
│      ┌─────────────┐ ┌───────────┐ ┌────────────┐ ┌──────────────┐        │
│      │    Recon    │ │ Analysis  │ │Verification│ │    Finish    │        │
│      │    Agent    │ │   Agent   │ │   Agent    │ │    Action    │        │
│      └──────┬──────┘ └─────┬─────┘ └──────┬─────┘ └──────────────┘        │
│             │              │              │                               │
└─────────────┼──────────────┼──────────────┼───────────────────────────────┘
              │              │              │
              ▼              ▼              ▼
        ┌────────────┐ ┌────────────┐ ┌────────────┐
        │ File Tools │ │ SAST Tools │ │  Sandbox   │
        │ list/read  │ │ Semgrep... │ │   Docker   │
        └─────┬──────┘ └─────┬──────┘ └─────┬──────┘
              │              │              │
              │       ┌──────┴──────┐       │
              │       ▼             │       │
              │  ┌─────────┐        │       │
              └─►│   RAG   │◄───────┘       │
                 │ Pipeline│                │
                 └────┬────┘                │
                      │                     │
                      ▼                     ▼
                ┌────────────┐        ┌────────────┐
                │ Vector DB  │        │Verification│
                │  ChromaDB  │        │   Result   │
                └────────────┘        └────────────┘
```

---

## Algorithm: Multi-Agent Audit Orchestration

```
Algorithm 1: LLM-Driven Multi-Agent Security Audit

Input:  Project P, Target vulnerabilities V, Configuration C
Output: Findings F, Verification Results R

Initialize Orchestrator Agent with LLM
Create sub-agents: Recon, Analysis, Verification
findings ← ∅
verified_results ← ∅
history ← ∅
iteration ← 0

// Phase 1: Reconnaissance
recon_result ← ReconAgent.run(P, V)
high_risk_areas ← recon_result.priority_areas
context ← (P, V, C, high_risk_areas)

// Phase 2: Orchestration Loop
while iteration < MAX_ITERATIONS do
    thought, action ← LLM.reason(context, history)
    history ← history ∪ {(thought, action)}

    if action.type = "dispatch_agent" then
        agent ← select_agent(action.params)
        result ← agent.run(action.task, context)
        findings ← findings ∪ result.findings
        context ← update_context(context, result)
    else if action.type = "finish" then
        break
    end if

    iteration ← iteration + 1
end while

// Phase 3: Verification
for each f ∈ findings where f.severity ≥ HIGH do
    poc ← LLM.generate_poc(f)
    result ← Sandbox.execute(poc)
    verified_results ← verified_results ∪ {(f, result)}
end for

return (findings, verified_results)
```
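
For readers who prefer executable code, Algorithm 1 can be approximated by the following sketch. Everything here is a stand-in: the scripted `scripted_llm`, the lambda agents, the numeric severity scale with `HIGH = 3`, and the stubbed sandbox are hypothetical and exist only to make the control flow runnable.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

HIGH = 3  # hypothetical numeric severity threshold


@dataclass
class Finding:
    title: str
    severity: int


@dataclass
class AgentResult:
    findings: List[Finding] = field(default_factory=list)


def run_audit(
    project: str,
    llm_reason: Callable[[Dict[str, Any], list], Tuple[str, Dict[str, Any]]],
    agents: Dict[str, Callable[[str, Dict[str, Any]], AgentResult]],
    generate_poc: Callable[[Finding], str],
    sandbox_execute: Callable[[str], Dict[str, Any]],
    max_iterations: int = 20,
):
    findings: List[Finding] = []
    history: list = []

    # Phase 1: reconnaissance seeds the shared context.
    context = {"project": project, "recon": agents["recon"](project, {}), "results": []}

    # Phase 2: bounded ReAct-style loop driven by the LLM's chosen action.
    for _ in range(max_iterations):
        thought, action = llm_reason(context, history)
        history.append((thought, action))
        if action["type"] == "dispatch_agent":
            result = agents[action["agent"]](action["task"], context)
            findings.extend(result.findings)
            context["results"].append(result)
        elif action["type"] == "finish":
            break

    # Phase 3: only high-severity findings are verified in the sandbox.
    verified = [(f, sandbox_execute(generate_poc(f))) for f in findings if f.severity >= HIGH]
    return findings, verified


def scripted_llm(context, history):
    """Stand-in for the LLM: dispatch the analysis agent once, then finish."""
    if not history:
        return "Inspect the login handler", {
            "type": "dispatch_agent", "agent": "analysis", "task": "scan login"}
    return "Enough evidence collected", {"type": "finish"}


demo_agents = {
    "recon": lambda task, ctx: AgentResult(),
    "analysis": lambda task, ctx: AgentResult(
        [Finding("SQL injection", HIGH), Finding("Verbose error message", 1)]),
}

demo_findings, demo_verified = run_audit(
    "demo-project",
    scripted_llm,
    demo_agents,
    generate_poc=lambda f: f"poc for {f.title}",
    sandbox_execute=lambda poc: {"confirmed": True, "poc": poc},
)
```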

---

## Evaluation Metrics

For academic evaluation, we suggest the following metrics:

### Detection Effectiveness

| Metric | Formula | Description |
|--------|---------|-------------|
| Precision | TP / (TP + FP) | Accuracy of reported vulnerabilities |
| Recall | TP / (TP + FN) | Coverage of actual vulnerabilities |
| F1-Score | 2 × (P × R) / (P + R) | Harmonic mean of precision and recall |
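
The three formulas translate directly into code. The sketch below is a small helper (the name `detection_metrics` is ours) that also guards against division by zero when a tool reports nothing or the benchmark contains no vulnerabilities.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Example: 8 true positives, 2 false positives, 2 missed vulnerabilities.
example = detection_metrics(tp=8, fp=2, fn=2)
```

In this example precision and recall are both 8/10, so the F1-score equals 0.8 as well.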

### Efficiency Metrics

| Metric | Description |
|--------|-------------|
| Time-to-Detection (TTD) | Time from start to first vulnerability found |
| Total Audit Time | End-to-end execution time |
| LLM Token Usage | Total tokens consumed during audit |
| Tool Invocation Count | Number of external tool calls |

### Verification Quality

| Metric | Description |
|--------|-------------|
| Verification Rate | Percentage of findings verified via sandbox |
| False Positive Reduction | Percentage reduction in false positives after sandbox verification |
| PoC Success Rate | Successful exploit demonstrations |
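
The table does not give formulas for these metrics, so the sketch below fixes one plausible set of definitions as an assumption: verification rate as attempted verifications over all findings, false-positive reduction relative to the pre-verification count, and PoC success rate over attempted PoCs. Function and parameter names are hypothetical.

```python
from typing import Dict


def _percent(numerator: float, denominator: float) -> float:
    """Percentage helper that returns 0.0 for an empty denominator."""
    return 100.0 * numerator / denominator if denominator else 0.0


def verification_quality(
    total_findings: int,
    attempted: int,
    poc_successes: int,
    fp_before: int,
    fp_after: int,
) -> Dict[str, float]:
    """Assumed definitions of the three verification-quality metrics."""
    return {
        "verification_rate": _percent(attempted, total_findings),
        "fp_reduction": _percent(fp_before - fp_after, fp_before),
        "poc_success_rate": _percent(poc_successes, attempted),
    }


# Example: 20 findings, 10 sent to the sandbox, 6 PoCs succeed,
# false positives drop from 8 to 2 after verification.
quality = verification_quality(20, 10, 6, fp_before=8, fp_after=2)
```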

---

## Comparison with Related Work

| System | Multi-Agent | RAG | Sandbox | LLM-Driven |
|--------|-------------|-----|---------|------------|
| CodeQL | ✗ | ✗ | ✗ | ✗ |
| Semgrep | ✗ | ✗ | ✗ | ✗ |
| Snyk Code | ✗ | ✗ | ✗ | Partial |
| GitHub Copilot | ✗ | ✗ | ✗ | ✓ |
| **DeepAudit** | **✓** | **✓** | **✓** | **✓** |

---

## LaTeX TikZ Diagram Code

For LaTeX papers, you can use the following TikZ code:

```latex
% Preamble requirements: \usepackage{tikz} and \usetikzlibrary{positioning}
% (the "below left=... of ..." placement syntax comes from the positioning library).
\begin{figure}[t]
\centering
\begin{tikzpicture}[
    node distance=1cm,
    box/.style={rectangle, draw, rounded corners, minimum width=2.5cm, minimum height=0.8cm, align=center},
    agent/.style={box, fill=blue!10},
    tool/.style={box, fill=orange!10},
    rag/.style={box, fill=green!10},
    sandbox/.style={box, fill=red!10},
    arrow/.style={->, >=stealth, thick}
]

% Orchestrator
\node[agent] (orch) {Orchestrator Agent};

% Sub-agents
\node[agent, below left=1.5cm and 1cm of orch] (recon) {Recon Agent};
\node[agent, below=1.5cm of orch] (analysis) {Analysis Agent};
\node[agent, below right=1.5cm and 1cm of orch] (verify) {Verification Agent};

% Connections
\draw[arrow] (orch) -- (recon);
\draw[arrow] (orch) -- (analysis);
\draw[arrow] (orch) -- (verify);

% Tools
\node[tool, below=1cm of analysis] (tools) {SAST Tools\\Semgrep, Bandit, Kunlun-M};

% RAG
\node[rag, left=1cm of tools] (rag) {RAG Pipeline\\Vector DB + CWE/CVE};

% Sandbox
\node[sandbox, right=1cm of tools] (sandbox) {Docker Sandbox\\PoC Verification};

% Tool connections
\draw[arrow] (analysis) -- (tools);
\draw[arrow, dashed] (tools) -- (rag);
\draw[arrow] (verify) -- (sandbox);

% LLM
\node[box, fill=purple!10, above=0.5cm of orch] (llm) {LLM Provider\\GPT-4 / Claude};
\draw[arrow, <->] (orch) -- (llm);

\end{tikzpicture}
\caption{DeepAudit System Architecture}
\label{fig:architecture}
\end{figure}
```

---

## Citation

If you use DeepAudit in your research, please cite:

```bibtex
@software{deepaudit2024,
  title = {DeepAudit: LLM-Driven Multi-Agent Code Security Audit System with RAG Enhancement and Sandbox Verification},
  author = {Lin Tsinghua},
  year = {2024},
  url = {https://github.com/lintsinghua/DeepAudit},
  version = {3.0.0}
}
```