# DeepAudit: System Architecture for Academic Paper
This document describes the DeepAudit system architecture at a level of detail suitable for submissions to top-tier software engineering and security conferences (ICSE, FSE, CCS, S&P, USENIX Security, etc.).
## Architecture Diagram
![DeepAudit Architecture](images/deepaudit_architecture.png)
---
## System Overview
**DeepAudit** is an LLM-driven intelligent code security audit system that employs a **hierarchical multi-agent architecture** with **Retrieval-Augmented Generation (RAG)** and **sandbox-based vulnerability verification**.
### Key Contributions
1. **LLM-Driven Multi-Agent Orchestration**: A dynamic agent hierarchy where the LLM serves as the central decision-making brain, autonomously orchestrating specialized agents for reconnaissance, analysis, and verification.
2. **RAG-Enhanced Vulnerability Detection**: Integration of semantic code understanding with vulnerability knowledge bases (CWE/CVE) to reduce false positives and improve detection accuracy.
3. **Sandbox-Based Exploit Verification**: Docker-isolated execution environment for automated PoC generation and vulnerability confirmation.
---
## Architecture Components
### Layer 1: User Interface Layer
```
┌─────────────────────────────────────────────────────────────────┐
│ User Interface Layer │
├─────────────────────────────────────────────────────────────────┤
│ ┌───────────────────┐ ┌───────────────────────────────────┐ │
│ │ Web Frontend │ │ API Gateway │ │
│ │ (React + TS) │◄──►│ REST API / SSE Event Stream │ │
│ └───────────────────┘ └───────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
**Components:**
- **Web Frontend**: React 18 + TypeScript SPA with real-time log streaming
- **API Gateway**: FastAPI-based REST endpoints with SSE for real-time events
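
To illustrate the real-time event channel, the following is a minimal sketch of how audit events could be serialized in the Server-Sent Events wire format. The function names `format_sse` and `stream_events` and the event payloads are hypothetical and are not taken from the DeepAudit codebase.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one event in the text/event-stream wire format."""
    payload = json.dumps(data, separators=(",", ":"))
    # An SSE frame is a block of "field: value" lines ended by a blank line.
    return f"event: {event}\ndata: {payload}\n\n"


def stream_events(events):
    """Yield one SSE frame per (event_name, payload) pair."""
    for name, payload in events:
        yield format_sse(name, payload)


if __name__ == "__main__":
    # Hypothetical events an audit task might emit.
    for frame in stream_events([("log", {"msg": "recon started"}),
                                ("finding", {"severity": "HIGH"})]):
        print(frame, end="")
```

In a FastAPI service, a generator of this shape would typically be wrapped in a streaming response with the `text/event-stream` media type.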
### Layer 2: Multi-Agent Orchestration Layer
```
┌─────────────────────────────────────────────────────────────────┐
│ Multi-Agent Orchestration Layer │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────┐ │
│ │ Orchestrator Agent │ ◄─── LLM Provider │
│ │ (ReAct Loop) │ (GPT-4/Claude) │
│ └──────────┬──────────┘ │
│ │ │
│ ┌────────────────┼────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Recon Agent │ │Analysis Agent│ │Verification │ │
│ │ │ │ │ │ Agent │ │
│ │ • Structure │ │ • SAST │ │ • PoC Gen │ │
│ │ • Tech Stack │ │ • Pattern │ │ • Sandbox │ │
│ │ • Entry Pts │ │ • Dataflow │ │ • Validation │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Key Design Decisions:**
| Component | Design Choice | Rationale |
|-----------|---------------|-----------|
| Orchestrator | LLM-driven ReAct loop | Dynamic strategy adaptation based on findings |
| Sub-Agents | Specialized roles | Domain expertise separation for precision |
| Communication | TaskHandoff protocol | Structured context passing between agents |
| Iteration Limits | Configurable (20/30/15) | Prevent infinite loops while ensuring depth |
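
The structured handoff and the iteration cap can be sketched as follows. This is an illustration under assumptions: the fields of `TaskHandoff` and the helper `run_bounded` are hypothetical, since the document names the protocol but does not specify its schema.

```python
from dataclasses import dataclass, field


@dataclass
class TaskHandoff:
    """Hypothetical message passed from one agent to another."""
    from_agent: str
    to_agent: str
    task: str
    context: dict = field(default_factory=dict)


def run_bounded(step, handoff, max_iterations):
    """Call step(handoff, i) until it returns a result or the cap is hit.

    Returns (result, iterations_used); result is None if the cap was reached.
    """
    for i in range(max_iterations):
        result = step(handoff, i)
        if result is not None:
            return result, i + 1
    return None, max_iterations


if __name__ == "__main__":
    handoff = TaskHandoff("orchestrator", "analysis", "scan auth module",
                          {"entry_points": ["/login"]})
    # A stand-in agent step that finishes on its third iteration.
    print(run_bounded(lambda h, i: "done" if i == 2 else None, handoff, 20))
```

The cap guarantees termination even when the agent never signals completion, at the cost of possibly cutting off a deep analysis early.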
### Layer 3: RAG Knowledge Enhancement Layer
```
┌─────────────────────────────────────────────────────────────────┐
│ RAG Knowledge Enhancement Layer │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Code Chunker│ │ Embedding │ │ Vector Database │ │
│ │(Tree-sitter)│───►│ Model │───►│ (ChromaDB) │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────────────────────┼───────────┐│
│ │ CWE/CVE Knowledge Base │ ││
│ │ • SQL Injection patterns ▼ ││
│ │ • XSS signatures ┌───────────────────┐ ││
│ │ • Command Injection │ Semantic Retriever│ ││
│ │ • Path Traversal └───────────────────┘ ││
│ │ • SSRF patterns ││
│ │ • ... ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────────┘
```
**RAG Pipeline:**
1. **Code Chunking**: Tree-sitter based AST-aware chunking for semantic preservation
2. **Embedding**: Supports OpenAI `text-embedding-3-small` and `text-embedding-3-large`, as well as local models
3. **Vector Store**: ChromaDB for lightweight deployment
4. **Retrieval**: Semantic similarity search with vulnerability pattern matching
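
The pipeline steps above can be sketched end to end with the standard library only. This is a toy stand-in, not the actual implementation: Python's `ast` module replaces Tree-sitter, token counts replace a learned embedding model, and an in-memory list replaces ChromaDB. All function names are hypothetical.

```python
import ast
import math
import re
from collections import Counter


def chunk_functions(source: str) -> list:
    """AST-aware chunking: one chunk per top-level function or class."""
    tree = ast.parse(source)
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return [ast.get_source_segment(source, node)
            for node in tree.body if isinstance(node, kinds)]


def embed(text: str) -> Counter:
    """Toy embedding: bag of lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list, k: int = 1) -> list:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


if __name__ == "__main__":
    src = (
        "def run_query(db, user):\n"
        "    return db.execute('select * from t where u=' + user)\n"
        "\n"
        "def add(a, b):\n"
        "    return a + b\n"
    )
    print(retrieve("sql execute query", chunk_functions(src))[0])
```

Chunking on syntactic boundaries keeps each function intact, so a retrieved chunk carries the full context of the suspicious statement instead of an arbitrary window of lines.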
### Layer 4: Security Tool Integration Layer
```
┌─────────────────────────────────────────────────────────────────┐
│ Security Tool Integration Layer │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ SAST Tools ││
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐ ││
│ │ │ Semgrep │ │ Bandit │ │Kunlun-M │ │Pattern Match │ ││
│ │ │ (Multi) │ │ (Python) │ │ (PHP/JS) │ │ (Fallback) │ ││
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────────┘ ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
│ ┌────────────────────────┐ ┌────────────────────────────────┐ │
│ │ Secret Detection │ │ Dependency Analysis │ │
│ │ • Gitleaks │ │ • OSV-Scanner │ │
│ │ • TruffleHog │ │ • npm audit / pip-audit │ │
│ └────────────────────────┘ └────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Tool Selection Strategy:**
| Category | Primary Tool | Fallback | Coverage |
|----------|-------------|----------|----------|
| Multi-lang SAST | Semgrep | PatternMatch | 20+ languages |
| Python Security | Bandit | PatternMatch | Python-specific |
| PHP/JS Analysis | Kunlun-M | Semgrep | Semantic analysis |
| Secret Detection | Gitleaks | TruffleHog | Git history scan |
| Dependencies | OSV-Scanner | npm/pip audit | Multi-ecosystem |
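
The primary/fallback strategy in the table can be sketched as an ordered lookup. The category keys, the tool labels, and the `select_tool` helper are illustrative assumptions; in practice availability might be checked with something like `shutil.which`, but here it is injected as a callable so the logic stays testable.

```python
# Ordered preference per category, mirroring the table above.
# Keys and labels are hypothetical names chosen for this sketch.
TOOL_CHAIN = {
    "multi_lang_sast": ["semgrep", "pattern_match"],
    "python_security": ["bandit", "pattern_match"],
    "php_js_analysis": ["kunlun-m", "semgrep"],
    "secret_detection": ["gitleaks", "trufflehog"],
    "dependencies": ["osv-scanner", "npm-audit", "pip-audit"],
}


def select_tool(category, is_available):
    """Return the first available tool for a category, or None."""
    for tool in TOOL_CHAIN.get(category, []):
        if is_available(tool):
            return tool
    return None


if __name__ == "__main__":
    installed = {"semgrep", "trufflehog"}
    for category in TOOL_CHAIN:
        print(category, "->", select_tool(category, installed.__contains__))
```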
### Layer 5: Sandbox Verification Layer
```
┌─────────────────────────────────────────────────────────────────┐
│ Sandbox Verification Layer │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ Docker Sandbox Container ││
│ │ ┌────────────────────────────────────────────────────────┐ ││
│ │ │ Security Constraints │ ││
│ │ │ • Network: Isolated / No external access │ ││
│ │ │ • Resources: Memory 512MB / CPU 1.0 │ ││
│ │ │ • Syscalls: seccomp whitelist policy │ ││
│ │ │ • Timeout: 60 seconds max execution │ ││
│ │ └────────────────────────────────────────────────────────┘ ││
│ │ ││
│ │ ┌──────────────────┐ ┌──────────────────────────────┐ ││
│ │ │ PoC Generator │───►│ Exploit Validator │ ││
│ │ │ (LLM-assisted) │ │ (Execution + Verification) │ ││
│ │ └──────────────────┘ └──────────────────────────────┘ ││
│ │ ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Verification Workflow:**
1. **PoC Generation**: LLM generates exploitation code based on vulnerability analysis
2. **Sandbox Setup**: Docker container with strict security constraints
3. **Execution**: Run PoC in isolated environment
4. **Validation**: Check execution results against expected vulnerability behavior
5. **Confidence Scoring**: Assign verification confidence (0-1)
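
As a rough sketch of the sandbox setup and execution steps, the constraints listed above map onto standard `docker run` flags. The image name, file paths, and function names below are placeholders rather than DeepAudit's actual configuration.

```python
import subprocess


def build_sandbox_command(image, poc_path, seccomp_profile):
    """Build a `docker run` invocation applying the constraints above."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                            # no network access
        "--memory", "512m",                             # memory limit
        "--cpus", "1.0",                                # CPU limit
        "--security-opt", f"seccomp={seccomp_profile}",  # syscall allowlist
        "--volume", f"{poc_path}:/poc/poc.py:ro",       # PoC mounted read-only
        image, "python", "/poc/poc.py",
    ]


def run_poc(cmd, timeout_s=60):
    """Run the command and capture its result, giving up after timeout_s."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Only the docker CLI process is killed here; a real system would
        # also need to stop the container itself.
        return {"exit_code": None, "timed_out": True, "stdout": ""}
    return {"exit_code": proc.returncode, "timed_out": False,
            "stdout": proc.stdout}


if __name__ == "__main__":
    # Placeholder image and paths, printed rather than executed.
    print(" ".join(build_sandbox_command(
        "python:3.12-slim", "/tmp/poc.py", "/etc/docker/seccomp.json")))
```

The validator would then compare the captured output and exit code against the behavior expected for the vulnerability class before assigning a confidence score.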
---
## Data Flow Diagram
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ DeepAudit Data Flow │
└─────────────────────────────────────────────────────────────────────────────┘
┌──────────┐ ┌──────────────┐
│ User │ │ Reports │
│ Request │ │ (MD/JSON) │
└────┬─────┘ └──────▲───────┘
│ │
▼ │
┌───────────────┐ ┌─────────────────────────────────────────────┴───────┐
│ API Gateway │───►│ PostgreSQL DB │
└───────┬───────┘ │ • Tasks • Findings • Projects • Reports │
│ └─────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────────┐
│ Orchestrator Agent │
│ │
│ ┌─────────────┐ ┌─────────────────────────────────────────────┐ │
│ │ LLM Service │◄────►│ ReAct Decision Loop │ │
│ │ (GPT/Claude)│ │ Thought → Action → Observation → Thought │ │
│ └─────────────┘ └───────────────────┬─────────────────────────┘ │
│ │ │
│ ┌─────────────┬───────────────┼───────────────┐ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌───────────┐ ┌────────────┐ ┌──────────────┐ │
│ │ Recon │ │ Analysis │ │Verification│ │ Finish │ │
│ │ Agent │ │ Agent │ │ Agent │ │ Action │ │
│ └──────┬──────┘ └─────┬─────┘ └──────┬─────┘ └──────────────┘ │
│ │ │ │ │
└─────────────┼──────────────┼──────────────┼───────────────────────────────┘
│ │ │
▼ ▼ ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│ File Tools │ │ SAST Tools │ │ Sandbox │
│ list/read │ │ Semgrep... │ │ Docker │
└─────┬──────┘ └─────┬──────┘ └──────┬─────┘
│ │ │
│ ┌──────┴──────┐ │
│ ▼ │ │
│ ┌─────────┐ │ │
└─►│ RAG │◄───────┘ │
│ Pipeline│ │
└────┬────┘ │
│ │
▼ ▼
┌────────────┐ ┌────────────┐
│ Vector DB │ │ Verification│
│ ChromaDB │ │ Result │
└────────────┘ └────────────┘
```
---
## Algorithm: Multi-Agent Audit Orchestration
```
Algorithm 1: LLM-Driven Multi-Agent Security Audit
Input: Project P, Target vulnerabilities V, Configuration C
Output: Findings F, Verification Results R
1: Initialize Orchestrator Agent with LLM
2: Create sub-agents: Recon, Analysis, Verification
3: findings ← ∅
4: verified_results ← ∅
5: iteration ← 0
6: // Phase 1: Reconnaissance
7: recon_result ← ReconAgent.run(P, V)
8: high_risk_areas ← recon_result.priority_areas
9:
10: // Phase 2: Orchestration Loop
11: while iteration < MAX_ITERATIONS do
12:     thought, action ← LLM.reason(context, history)
13:
14:     if action.type = "dispatch_agent" then
15:         agent ← select_agent(action.params)
16:         result ← agent.run(action.task, context)
17:         findings ← findings ∪ result.findings
18:         update_context(result)
19:     else if action.type = "finish" then
20:         break
21:     end if
22:
23:     iteration ← iteration + 1
24: end while
25:
26: // Phase 3: Verification
27: for each f ∈ findings where f.severity ≥ HIGH do
28:     poc ← LLM.generate_poc(f)
29:     result ← Sandbox.execute(poc)
30:     verified_results ← verified_results ∪ {(f, result)}
31: end for
32:
33: return (findings, verified_results)
```
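
The control flow of Algorithm 1 can be rendered as a small runnable sketch in which the LLM, the agents, and the sandbox are replaced by plain callables. The `Finding` type, the severity ordering, and the `audit` signature are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Assumed severity ordering; the algorithm only requires that HIGH is comparable.
SEVERITY = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


@dataclass(frozen=True)
class Finding:
    title: str
    severity: str


def audit(reason, agents, generate_poc, execute, max_iterations=20):
    """reason(history) returns an action dict; agents map names to callables
    that take (task, history) and return a list of Finding objects."""
    findings, verified, history = [], [], []

    # Phase 1: reconnaissance seeds the shared context.
    history.append(agents["recon"]("reconnaissance", history))

    # Phase 2: the decision loop dispatches agents until "finish" or the cap.
    for _ in range(max_iterations):
        action = reason(history)
        if action["type"] == "finish":
            break
        if action["type"] == "dispatch_agent":
            result = agents[action["agent"]](action["task"], history)
            findings.extend(f for f in result if f not in findings)  # set union
            history.append(result)

    # Phase 3: only HIGH and above are sent to the sandbox.
    for f in findings:
        if SEVERITY[f.severity] >= SEVERITY["HIGH"]:
            verified.append((f, execute(generate_poc(f))))
    return findings, verified


if __name__ == "__main__":
    script = iter([{"type": "dispatch_agent", "agent": "analysis", "task": "scan"},
                   {"type": "finish"}])
    agents = {
        "recon": lambda task, history: [],
        "analysis": lambda task, history: [Finding("SQL injection", "HIGH"),
                                           Finding("Verbose error", "LOW")],
    }
    print(audit(lambda history: next(script), agents,
                lambda f: f"poc for {f.title}", lambda poc: True))
```

Injecting the reasoning step as a callable keeps the loop deterministic under test while leaving the strategy to whatever model backs `reason`.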
---
## Evaluation Metrics
For academic evaluation, we suggest the following metrics:
### Detection Effectiveness
| Metric | Formula | Description |
|--------|---------|-------------|
| Precision | TP / (TP + FP) | Fraction of reported vulnerabilities that are real |
| Recall | TP / (TP + FN) | Fraction of real vulnerabilities that are reported |
| F1-Score | 2 × (P × R) / (P + R) | Harmonic mean of precision and recall |
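
As a worked example of the formulas above, the helper below computes all three metrics from raw counts. The counts in the usage example are invented for illustration and are not measured results.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Illustrative counts: 8 true positives, 2 false positives, 8 misses.
    print(detection_metrics(tp=8, fp=2, fn=8))
```

With these counts, precision is 8/10 = 0.8, recall is 8/16 = 0.5, and F1 is 2 × 0.4 / 1.3 ≈ 0.615.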
### Efficiency Metrics
| Metric | Description |
|--------|-------------|
| Time-to-Detection (TTD) | Time from start to first vulnerability found |
| Total Audit Time | End-to-end execution time |
| LLM Token Usage | Total tokens consumed during audit |
| Tool Invocation Count | Number of external tool calls |
### Verification Quality
| Metric | Description |
|--------|-------------|
| Verification Rate | Percentage of findings verified via sandbox |
| False Positive Reduction | Percentage reduction in false positives after sandbox verification |
| PoC Success Rate | Fraction of generated PoCs that successfully demonstrate the exploit |
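
The table gives these metrics in words only, so the sketch below commits to one plausible reading of each: verified findings over all findings, relative drop in false positives, and successful PoCs over attempted PoCs. These formulas and the function name are assumptions, and the numbers in the example are invented.

```python
def verification_metrics(total_findings, verified_findings,
                         fp_before, fp_after,
                         poc_attempts, poc_successes):
    """Compute the three verification-quality ratios under assumed definitions."""
    def ratio(numerator, denominator):
        # Guard against empty denominators (e.g. no PoC was attempted).
        return numerator / denominator if denominator else 0.0

    return {
        "verification_rate": ratio(verified_findings, total_findings),
        "fp_reduction": ratio(fp_before - fp_after, fp_before),
        "poc_success_rate": ratio(poc_successes, poc_attempts),
    }


if __name__ == "__main__":
    print(verification_metrics(total_findings=20, verified_findings=10,
                               fp_before=8, fp_after=2,
                               poc_attempts=10, poc_successes=4))
```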
---
## Comparison with Related Work
| System | Multi-Agent | RAG | Sandbox | LLM-Driven |
|--------|-------------|-----|---------|------------|
| CodeQL | ✗ | ✗ | ✗ | ✗ |
| Semgrep | ✗ | ✗ | ✗ | ✗ |
| Snyk Code | ✗ | ✗ | ✗ | Partial |
| GitHub Copilot | ✗ | ✗ | ✗ | ✓ |
| **DeepAudit** | **✓** | **✓** | **✓** | **✓** |
---
## LaTeX TikZ Diagram Code
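For LaTeX papers, you can use the following TikZ code. It requires `\usepackage{tikz}` and `\usetikzlibrary{positioning}` in the preamble, because the nodes are placed relative to each other with the `of` syntax: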
```latex
\begin{figure}[t]
\centering
\begin{tikzpicture}[
node distance=1cm,
box/.style={rectangle, draw, rounded corners, minimum width=2.5cm, minimum height=0.8cm, align=center},
agent/.style={box, fill=blue!10},
tool/.style={box, fill=orange!10},
rag/.style={box, fill=green!10},
sandbox/.style={box, fill=red!10},
arrow/.style={->, >=stealth, thick}
]
% Orchestrator
\node[agent] (orch) {Orchestrator Agent};
% Sub-agents
\node[agent, below left=1.5cm and 1cm of orch] (recon) {Recon Agent};
\node[agent, below=1.5cm of orch] (analysis) {Analysis Agent};
\node[agent, below right=1.5cm and 1cm of orch] (verify) {Verification Agent};
% Connections
\draw[arrow] (orch) -- (recon);
\draw[arrow] (orch) -- (analysis);
\draw[arrow] (orch) -- (verify);
% Tools
\node[tool, below=1cm of analysis] (tools) {SAST Tools\\Semgrep, Bandit, Kunlun-M};
% RAG
\node[rag, left=1cm of tools] (rag) {RAG Pipeline\\Vector DB + CWE/CVE};
% Sandbox
\node[sandbox, right=1cm of tools] (sandbox) {Docker Sandbox\\PoC Verification};
% Tool connections
\draw[arrow] (analysis) -- (tools);
\draw[arrow, dashed] (tools) -- (rag);
\draw[arrow] (verify) -- (sandbox);
% LLM
\node[box, fill=purple!10, above=0.5cm of orch] (llm) {LLM Provider\\GPT-4 / Claude};
\draw[arrow, <->] (orch) -- (llm);
\end{tikzpicture}
\caption{DeepAudit System Architecture}
\label{fig:architecture}
\end{figure}
```
---
## Citation
If you use DeepAudit in your research, please cite:
```bibtex
@software{deepaudit2024,
title = {DeepAudit: LLM-Driven Multi-Agent Code Security Audit System with RAG Enhancement and Sandbox Verification},
author = {Lin Tsinghua},
year = {2024},
url = {https://github.com/lintsinghua/DeepAudit},
version = {3.0.0}
}
```