DeepAudit: System Architecture for Academic Paper

This document describes the DeepAudit system architecture in a form suitable for submissions to top-tier academic venues (ICSE, FSE, CCS, S&P, USENIX Security, etc.).

[Figure: Architecture Diagram]

System Overview

DeepAudit is an LLM-driven intelligent code security audit system that employs a hierarchical multi-agent architecture with Retrieval-Augmented Generation (RAG) and sandbox-based vulnerability verification.

Key Contributions

LLM-Driven Multi-Agent Orchestration: A dynamic agent hierarchy where the LLM serves as the central decision-making brain, autonomously orchestrating specialized agents for reconnaissance, analysis, and verification.
RAG-Enhanced Vulnerability Detection: Integration of semantic code understanding with vulnerability knowledge bases (CWE/CVE) to reduce false positives and improve detection accuracy.
Sandbox-Based Exploit Verification: Docker-isolated execution environment for automated PoC generation and vulnerability confirmation.

Architecture Components

Layer 1: User Interface Layer

┌─────────────────────────────────────────────────────────────────┐
│                      User Interface Layer                        │
├─────────────────────────────────────────────────────────────────┤
│  ┌───────────────────┐    ┌───────────────────────────────────┐ │
│  │   Web Frontend    │    │        API Gateway                │ │
│  │  (React + TS)     │◄──►│  REST API / SSE Event Stream      │ │
│  └───────────────────┘    └───────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Components:

Web Frontend: React 18 + TypeScript SPA with real-time log streaming
API Gateway: FastAPI-based REST endpoints with SSE for real-time events
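
To illustrate the SSE side of the gateway, the sketch below serializes audit events as Server-Sent Events frames using only the standard library. The event name `agent_log` and the payload shape are hypothetical; the source does not specify DeepAudit's event schema.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional


def format_sse(data: Dict[str, Any], event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Serialize one event as a Server-Sent Events frame.

    A frame is a sequence of "field: value" lines terminated by a blank line.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps (default settings) emits no raw newlines, so one data line suffices.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream_events(events: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield SSE frames for a sequence of (hypothetical) audit log events."""
    for i, payload in enumerate(events):
        yield format_sse(payload, event="agent_log", event_id=str(i))
```

In a FastAPI application, a generator like `stream_events` would typically be wrapped in a `StreamingResponse` with media type `text/event-stream`.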

Layer 2: Multi-Agent Orchestration Layer

┌─────────────────────────────────────────────────────────────────┐
│               Multi-Agent Orchestration Layer                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│                    ┌─────────────────────┐                       │
│                    │  Orchestrator Agent │ ◄─── LLM Provider    │
│                    │  (ReAct Loop)       │      (GPT-4/Claude)  │
│                    └──────────┬──────────┘                       │
│                               │                                  │
│              ┌────────────────┼────────────────┐                 │
│              ▼                ▼                ▼                 │
│     ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│     │ Recon Agent  │  │Analysis Agent│  │Verification  │        │
│     │              │  │              │  │    Agent     │        │
│     │ • Structure  │  │ • SAST       │  │ • PoC Gen    │        │
│     │ • Tech Stack │  │ • Pattern    │  │ • Sandbox    │        │
│     │ • Entry Pts  │  │ • Dataflow   │  │ • Validation │        │
│     └──────────────┘  └──────────────┘  └──────────────┘        │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Key Design Decisions:

Component	Design Choice	Rationale
Orchestrator	LLM-driven ReAct loop	Dynamic strategy adaptation based on findings
Sub-Agents	Specialized roles	Domain expertise separation for precision
Communication	TaskHandoff protocol	Structured context passing between agents
Iteration Limits	Configurable (20/30/15)	Prevent infinite loops while ensuring depth
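
The table names a TaskHandoff protocol and configurable iteration limits without giving their structure. The following is a minimal sketch of what such a handoff message and an iteration budget could look like; every field name here (`from_agent`, `to_agent`, `task`, `context`, `findings`) and the `IterationBudget` class are assumptions for illustration, not DeepAudit's actual definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TaskHandoff:
    """Hypothetical structured message passed from one agent to the next."""
    from_agent: str
    to_agent: str
    task: str
    context: Dict[str, Any] = field(default_factory=dict)
    findings: List[Dict[str, Any]] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the handoff as text for the receiving agent's prompt."""
        lines = [
            f"Handoff from {self.from_agent} to {self.to_agent}",
            f"Task: {self.task}",
        ]
        for key, value in sorted(self.context.items()):
            lines.append(f"- {key}: {value}")
        lines.append(f"Prior findings: {len(self.findings)}")
        return "\n".join(lines)


class IterationBudget:
    """Caps the number of reasoning steps an agent may take."""

    def __init__(self, max_iterations: int) -> None:
        if max_iterations <= 0:
            raise ValueError("max_iterations must be positive")
        self.max_iterations = max_iterations
        self.used = 0

    def step(self) -> bool:
        """Consume one iteration; return False once the budget is exhausted."""
        if self.used >= self.max_iterations:
            return False
        self.used += 1
        return True
```

Passing context as an explicit, typed message rather than as free-form text makes it easier to log and replay what each agent actually received.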

Layer 3: RAG Knowledge Enhancement Layer

┌─────────────────────────────────────────────────────────────────┐
│              RAG Knowledge Enhancement Layer                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────┐  │
│  │ Code Chunker│    │  Embedding  │    │   Vector Database   │  │
│  │(Tree-sitter)│───►│   Model     │───►│     (ChromaDB)      │  │
│  └─────────────┘    └─────────────┘    └─────────────────────┘  │
│                                                    │             │
│  ┌─────────────────────────────────────────────────┼───────────┐│
│  │              CWE/CVE Knowledge Base             │           ││
│  │  • SQL Injection patterns                       ▼           ││
│  │  • XSS signatures                     ┌───────────────────┐ ││
│  │  • Command Injection                  │ Semantic Retriever│ ││
│  │  • Path Traversal                     └───────────────────┘ ││
│  │  • SSRF patterns                                            ││
│  │  • ...                                                      ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

RAG Pipeline:

Code Chunking: Tree-sitter-based, AST-aware chunking that preserves semantic units
Embedding: Supports OpenAI text-embedding-3-small/large as well as local models
Vector Store: ChromaDB, chosen for lightweight deployment
Retrieval: Semantic similarity search combined with vulnerability pattern matching
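
The pipeline stages above can be sketched end to end with the standard library alone. Note the substitutions: Python's `ast` module stands in for Tree-sitter, a bag-of-words count vector stands in for the embedding model, and an in-memory list stands in for ChromaDB. The function names and the `SAMPLE` snippet are hypothetical.

```python
import ast
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_functions(source: str) -> List[Tuple[str, str]]:
    """Split Python source into (name, code) chunks, one per function.

    Stand-in for Tree-sitter: chunk boundaries follow AST nodes, not line counts.
    """
    tree = ast.parse(source)
    chunks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            code = ast.get_source_segment(source, node)
            if code is not None:
                chunks.append((node.name, code))
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (NOT a real embedding model)."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[Tuple[str, str]], top_k: int = 1) -> List[str]:
    """Return the names of the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c[1])), reverse=True)
    return [name for name, _ in ranked[:top_k]]


SAMPLE = '''
def get_user(cursor, user_id):
    query = "SELECT * FROM users WHERE id = " + user_id
    cursor.execute(query)
    return cursor.fetchone()

def add(a, b):
    return a + b
'''
```

With this sample, a query such as `"sql query execute select users"` ranks `get_user` above `add`, because only the former shares tokens with the query. A real embedding model would also match on meaning rather than on exact tokens.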

Layer 4: Security Tool Integration Layer

┌─────────────────────────────────────────────────────────────────┐
│              Security Tool Integration Layer                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                    SAST Tools                                ││
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────┐ ││
│  │  │ Semgrep  │  │  Bandit  │  │Kunlun-M  │  │Pattern Match │ ││
│  │  │ (Multi)  │  │ (Python) │  │ (PHP/JS) │  │  (Fallback)  │ ││
│  │  └──────────┘  └──────────┘  └──────────┘  └──────────────┘ ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  ┌────────────────────────┐  ┌────────────────────────────────┐ │
│  │   Secret Detection     │  │    Dependency Analysis         │ │
│  │  • Gitleaks            │  │  • OSV-Scanner                 │ │
│  │  • TruffleHog          │  │  • npm audit / pip-audit       │ │
│  └────────────────────────┘  └────────────────────────────────┘ │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Tool Selection Strategy:

Category	Primary Tool	Fallback	Coverage
Multi-lang SAST	Semgrep	PatternMatch	20+ languages
Python Security	Bandit	PatternMatch	Python-specific
PHP/JS Analysis	Kunlun-M	Semgrep	Semantic analysis
Secret Detection	Gitleaks	TruffleHog	Git history scan
Dependencies	OSV-Scanner	npm/pip audit	Multi-ecosystem
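
The primary/fallback strategy in the table can be expressed as an ordered candidate list per category. This is a sketch under stated assumptions: the category keys are invented for illustration, the executable names are guesses at how each tool is installed (Kunlun-M in particular may not ship as a `kunlun-m` binary), and `pattern_match` is assumed to be an in-process fallback that is always available.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Ordered candidates per category, transcribed from the table above.
# Executable names are assumptions, not confirmed by the source.
TOOL_CHAIN: Dict[str, List[str]] = {
    "multi_lang_sast": ["semgrep", "pattern_match"],
    "python_security": ["bandit", "pattern_match"],
    "php_js_analysis": ["kunlun-m", "semgrep"],
    "secret_detection": ["gitleaks", "trufflehog"],
    "dependencies": ["osv-scanner", "npm", "pip-audit"],
}

# Tools assumed to run in-process, so no executable lookup is needed.
BUILTIN = {"pattern_match"}


def is_installed(tool: str) -> bool:
    """Return True if the tool is built in or found on PATH."""
    return tool in BUILTIN or shutil.which(tool) is not None


def select_tool(category: str,
                available: Callable[[str], bool] = is_installed) -> Optional[str]:
    """Pick the first available tool for a category, or None if none is usable."""
    for tool in TOOL_CHAIN[category]:
        if available(tool):
            return tool
    return None
```

The `available` parameter makes the selection logic testable without installing any scanner, for example `select_tool("secret_detection", available=lambda t: t == "trufflehog")`.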

Layer 5: Sandbox Verification Layer

┌─────────────────────────────────────────────────────────────────┐
│                Sandbox Verification Layer                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                 Docker Sandbox Container                     ││
│  │  ┌────────────────────────────────────────────────────────┐ ││
│  │  │              Security Constraints                       │ ││
│  │  │  • Network: Isolated / No external access              │ ││
│  │  │  • Resources: Memory 512MB / CPU 1.0                   │ ││
│  │  │  • Syscalls: seccomp whitelist policy                  │ ││
│  │  │  • Timeout: 60 seconds max execution                   │ ││
│  │  └────────────────────────────────────────────────────────┘ ││
│  │                                                              ││
│  │  ┌──────────────────┐    ┌──────────────────────────────┐   ││
│  │  │   PoC Generator  │───►│     Exploit Validator        │   ││
│  │  │  (LLM-assisted)  │    │  (Execution + Verification)  │   ││
│  │  └──────────────────┘    └──────────────────────────────┘   ││
│  │                                                              ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Verification Workflow:

PoC Generation: The LLM generates exploit code from the vulnerability analysis
Sandbox Setup: A Docker container is started with the security constraints listed above
Execution: The PoC runs inside the isolated environment
Validation: Execution results are checked against the expected vulnerability behavior
Confidence Scoring: Each finding is assigned a verification confidence in the range [0, 1]
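
As a rough sketch of the sandbox setup and execution steps, the code below builds a `docker run` command line that mirrors the listed constraints (no network, 512 MB memory, 1.0 CPU, a seccomp profile) and runs it with a 60-second timeout. The image name, file paths, the in-container path `/poc/poc.py`, and the assumption that the PoC is a Python script are all placeholders; the source does not say how DeepAudit launches its containers.

```python
import subprocess
from typing import Any, Dict, List


def build_sandbox_command(image: str, poc_path: str, seccomp_profile: str,
                          memory: str = "512m", cpus: float = 1.0) -> List[str]:
    """Build a `docker run` argument list mirroring the constraints above.

    poc_path should be an absolute host path, since it is used as a bind mount.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",                              # no network access
        "--memory", memory,                               # memory limit
        "--cpus", str(cpus),                              # CPU limit
        "--security-opt", f"seccomp={seccomp_profile}",   # syscall allowlist
        "--volume", f"{poc_path}:/poc/poc.py:ro",         # mount the PoC read-only
        image, "python", "/poc/poc.py",
    ]


def run_poc(cmd: List[str], timeout_s: int = 60) -> Dict[str, Any]:
    """Run the command; a timeout is reported rather than raised."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        return {"timed_out": False, "exit_code": proc.returncode, "stdout": proc.stdout}
    except subprocess.TimeoutExpired:
        return {"timed_out": True, "exit_code": None, "stdout": ""}
```

One caveat of this approach: the timeout terminates the Docker client process, which does not necessarily stop the container itself, so a production version would also need to stop or remove the container explicitly.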

Data Flow Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                           DeepAudit Data Flow                                │
└─────────────────────────────────────────────────────────────────────────────┘

   ┌──────────┐                                              ┌──────────────┐
   │   User   │                                              │   Reports    │
   │ Request  │                                              │  (MD/JSON)   │
   └────┬─────┘                                              └──────▲───────┘
        │                                                           │
        ▼                                                           │
┌───────────────┐    ┌─────────────────────────────────────────────┴───────┐
│  API Gateway  │───►│                   PostgreSQL DB                      │
└───────┬───────┘    │  • Tasks  • Findings  • Projects  • Reports         │
        │            └─────────────────────────────────────────────────────┘
        ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                         Orchestrator Agent                                 │
│                                                                            │
│   ┌─────────────┐      ┌─────────────────────────────────────────────┐    │
│   │ LLM Service │◄────►│              ReAct Decision Loop             │    │
│   │ (GPT/Claude)│      │  Thought → Action → Observation → Thought   │    │
│   └─────────────┘      └───────────────────┬─────────────────────────┘    │
│                                            │                               │
│              ┌─────────────┬───────────────┼───────────────┐              │
│              ▼             ▼               ▼               ▼              │
│      ┌─────────────┐ ┌───────────┐ ┌────────────┐ ┌──────────────┐       │
│      │    Recon    │ │  Analysis │ │Verification│ │   Finish     │       │
│      │    Agent    │ │   Agent   │ │   Agent    │ │   Action     │       │
│      └──────┬──────┘ └─────┬─────┘ └──────┬─────┘ └──────────────┘       │
│             │              │              │                               │
└─────────────┼──────────────┼──────────────┼───────────────────────────────┘
              │              │              │
              ▼              ▼              ▼
       ┌────────────┐ ┌────────────┐ ┌────────────┐
       │ File Tools │ │ SAST Tools │ │  Sandbox   │
       │ list/read  │ │ Semgrep... │ │   Docker   │
       └─────┬──────┘ └─────┬──────┘ └──────┬─────┘
             │              │               │
             │       ┌──────┴──────┐        │
             │       ▼             │        │
             │  ┌─────────┐        │        │
             └─►│   RAG   │◄───────┘        │
                │ Pipeline│                 │
                └────┬────┘                 │
                     │                      │
                     ▼                      ▼
              ┌────────────┐        ┌────────────┐
              │  Vector DB │        │ Verification│
              │  ChromaDB  │        │   Result    │
              └────────────┘        └────────────┘

Algorithm: Multi-Agent Audit Orchestration

Algorithm 1: LLM-Driven Multi-Agent Security Audit

Input: Project P, Target vulnerabilities V, Configuration C
Output: Findings F, Verification Results R

Initialize Orchestrator Agent with LLM
Create sub-agents: Recon, Analysis, Verification
findings ← ∅
verified_results ← ∅
history ← [ ]
iteration ← 0

// Phase 1: Reconnaissance
recon_result ← ReconAgent.run(P, V)
high_risk_areas ← recon_result.priority_areas
context ← {P, V, high_risk_areas}

// Phase 2: Orchestration Loop (MAX_ITERATIONS is read from C)
while iteration < MAX_ITERATIONS do
    thought, action ← LLM.reason(context, history)
    append (thought, action) to history

    if action.type = "dispatch_agent" then
        agent ← select_agent(action.params)
        result ← agent.run(action.task, context)
        findings ← findings ∪ result.findings
        context ← update_context(context, result)
    else if action.type = "finish" then
        break
    end if

    iteration ← iteration + 1
end while

// Phase 3: Verification
for each f ∈ findings where f.severity ≥ HIGH do
    poc ← LLM.generate_poc(f)
    result ← Sandbox.execute(poc)
    verified_results ← verified_results ∪ {(f, result)}
end for

return (findings, verified_results)
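
The control flow of Algorithm 1 can be exercised without a real LLM by injecting the reasoning step as a callable. The sketch below is one possible rendering, not DeepAudit's implementation: the `Finding` fields, the action dictionary keys (`type`, `agent`, `task`), and the severity ranking are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Assumed severity ordering; the source only states "severity >= HIGH".
SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


@dataclass
class Finding:
    title: str
    severity: str


Action = Dict[str, Any]
Agent = Callable[[str, Dict[str, Any]], List[Finding]]


def run_audit(recon: Callable[[], List[str]],
              reason: Callable[[Dict[str, Any], List[Action]], Action],
              agents: Dict[str, Agent],
              execute_poc: Callable[[Finding], bool],
              max_iterations: int) -> Tuple[List[Finding], List[Tuple[Finding, bool]]]:
    """Run recon, the bounded orchestration loop, then verification."""
    findings: List[Finding] = []
    history: List[Action] = []

    # Phase 1: reconnaissance seeds the shared context.
    context: Dict[str, Any] = {"priority_areas": recon()}

    # Phase 2: the reasoning step picks an action until "finish" or the cap.
    for _ in range(max_iterations):
        action = reason(context, history)
        history.append(action)
        if action["type"] == "finish":
            break
        if action["type"] == "dispatch_agent":
            new_findings = agents[action["agent"]](action["task"], context)
            findings.extend(new_findings)
            context["findings_so_far"] = len(findings)

    # Phase 3: only HIGH-or-above findings are sent to the sandbox.
    verified = [(f, execute_poc(f)) for f in findings
                if SEVERITY_RANK[f.severity] >= SEVERITY_RANK["HIGH"]]
    return findings, verified
```

Injecting `reason` and `execute_poc` as callables lets the loop be tested with a scripted sequence of actions, which is useful for checking that the iteration cap and the severity filter behave as intended before any LLM or Docker dependency is involved.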

Evaluation Metrics

For academic evaluation, we suggest the following metrics:

Detection Effectiveness

Metric	Formula	Description
Precision	TP / (TP + FP)	Fraction of reported vulnerabilities that are real
Recall	TP / (TP + FN)	Fraction of actual vulnerabilities that are reported
F1-Score	2 × (P × R) / (P + R)	Harmonic mean of precision (P) and recall (R)
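
A short worked example of these formulas, treating reported and ground-truth vulnerabilities as sets of identifiers. The function name and the convention of returning 0.0 when a denominator is zero are choices made for this sketch.

```python
from typing import Dict, Set


def detection_metrics(reported: Set[str], ground_truth: Set[str]) -> Dict[str, float]:
    """Compute precision, recall, and F1 from sets of vulnerability identifiers."""
    tp = len(reported & ground_truth)   # reported and real
    fp = len(reported - ground_truth)   # reported but not real
    fn = len(ground_truth - reported)   # real but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, with four reported findings of which two are real and one real vulnerability missed (TP = 2, FP = 2, FN = 1), precision is 2/4 = 0.5, recall is 2/3, and F1 is 4/7 ≈ 0.571.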

Efficiency Metrics

Metric	Description
Time-to-Detection (TTD)	Time from start to first vulnerability found
Total Audit Time	End-to-end execution time
LLM Token Usage	Total tokens consumed during audit
Tool Invocation Count	Number of external tool calls

Verification Quality

Metric	Description
Verification Rate	Percentage of findings verified via sandbox
False Positive Reduction	Percentage reduction in false positives after verification
PoC Success Rate	Percentage of generated PoCs that successfully demonstrate the exploit

Comparison with Existing Tools

System	Multi-Agent	RAG	Sandbox	LLM-Driven
CodeQL	✗	✗	✗	✗
Semgrep	✗	✗	✗	✗
Snyk Code	✗	✗	✗	Partial
GitHub Copilot	✗	✗	✗	✓
DeepAudit	✓	✓	✓	✓

LaTeX TikZ Diagram Code

For LaTeX papers, you can use the following TikZ code:

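% Requires \usepackage{tikz} and \usetikzlibrary{positioning} in the preamble
% (the "below left=... of ..." placement syntax comes from the positioning library).
\begin{figure}[t]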
\centering
\begin{tikzpicture}[
    node distance=1cm,
    box/.style={rectangle, draw, rounded corners, minimum width=2.5cm, minimum height=0.8cm, align=center},
    agent/.style={box, fill=blue!10},
    tool/.style={box, fill=orange!10},
    rag/.style={box, fill=green!10},
    sandbox/.style={box, fill=red!10},
    arrow/.style={->, >=stealth, thick}
]

% Orchestrator
\node[agent] (orch) {Orchestrator Agent};

% Sub-agents
\node[agent, below left=1.5cm and 1cm of orch] (recon) {Recon Agent};
\node[agent, below=1.5cm of orch] (analysis) {Analysis Agent};
\node[agent, below right=1.5cm and 1cm of orch] (verify) {Verification Agent};

% Connections
\draw[arrow] (orch) -- (recon);
\draw[arrow] (orch) -- (analysis);
\draw[arrow] (orch) -- (verify);

% Tools
\node[tool, below=1cm of analysis] (tools) {SAST Tools\\Semgrep, Bandit, Kunlun-M};

% RAG
\node[rag, left=1cm of tools] (rag) {RAG Pipeline\\Vector DB + CWE/CVE};

% Sandbox
\node[sandbox, right=1cm of tools] (sandbox) {Docker Sandbox\\PoC Verification};

% Tool connections
\draw[arrow] (analysis) -- (tools);
\draw[arrow, dashed] (tools) -- (rag);
\draw[arrow] (verify) -- (sandbox);

% LLM
\node[box, fill=purple!10, above=0.5cm of orch] (llm) {LLM Provider\\GPT-4 / Claude};
\draw[arrow, <->] (orch) -- (llm);

\end{tikzpicture}
\caption{DeepAudit System Architecture}
\label{fig:architecture}
\end{figure}

Citation

If you use DeepAudit in your research, please cite:

@software{deepaudit2024,
  title = {DeepAudit: LLM-Driven Multi-Agent Code Security Audit System with RAG Enhancement and Sandbox Verification},
  author = {Lin Tsinghua},
  year = {2024},
  url = {https://github.com/lintsinghua/DeepAudit},
  version = {3.0.0}
}
