DeepAudit: System Architecture for Academic Paper

This document describes the DeepAudit system architecture at a level of detail suitable for submissions to top-tier academic venues (ICSE, FSE, CCS, S&P, USENIX Security, etc.).

Architecture Diagram

[Figure: DeepAudit Architecture]


System Overview

DeepAudit is an LLM-driven intelligent code security audit system that employs a hierarchical multi-agent architecture with Retrieval-Augmented Generation (RAG) and sandbox-based vulnerability verification.

Key Contributions

  1. LLM-Driven Multi-Agent Orchestration: A dynamic agent hierarchy where the LLM serves as the central decision-making brain, autonomously orchestrating specialized agents for reconnaissance, analysis, and verification.

  2. RAG-Enhanced Vulnerability Detection: Integration of semantic code understanding with vulnerability knowledge bases (CWE/CVE) to reduce false positives and improve detection accuracy.

  3. Sandbox-Based Exploit Verification: Docker-isolated execution environment for automated PoC generation and vulnerability confirmation.


Architecture Components

Layer 1: User Interface Layer

┌─────────────────────────────────────────────────────────────────┐
│                      User Interface Layer                        │
├─────────────────────────────────────────────────────────────────┤
│  ┌───────────────────┐    ┌───────────────────────────────────┐ │
│  │   Web Frontend    │    │        API Gateway                │ │
│  │  (React + TS)     │◄──►│  REST API / SSE Event Stream      │ │
│  └───────────────────┘    └───────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Components:

  • Web Frontend: React 18 + TypeScript SPA with real-time log streaming
  • API Gateway: FastAPI-based REST endpoints with SSE for real-time events
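To make the real-time event stream concrete, here is a minimal sketch of how a single event could be serialized in the `text/event-stream` wire format that SSE uses. The event name `agent_log` and the payload fields are hypothetical and are not taken from the DeepAudit code base; the function is framework-independent and would be called from whatever streaming response the gateway uses.

```python
import json
from typing import Optional


def format_sse(event: str, data: dict, event_id: Optional[int] = None) -> str:
    """Serialize one event in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps without indent never emits raw newlines, so one data: line is enough.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


# Hypothetical log event emitted while an agent is running.
message = format_sse("agent_log", {"agent": "recon", "msg": "listing files"}, event_id=1)
```

On the browser side, an `EventSource` listener registered for the same event name would receive the JSON payload as `event.data`.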

Layer 2: Multi-Agent Orchestration Layer

┌─────────────────────────────────────────────────────────────────┐
│               Multi-Agent Orchestration Layer                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│                    ┌─────────────────────┐                       │
│                    │  Orchestrator Agent │ ◄─── LLM Provider    │
│                    │  (ReAct Loop)       │      (GPT-4/Claude)  │
│                    └──────────┬──────────┘                       │
│                               │                                  │
│              ┌────────────────┼────────────────┐                 │
│              ▼                ▼                ▼                 │
│     ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│     │ Recon Agent  │  │Analysis Agent│  │Verification  │        │
│     │              │  │              │  │    Agent     │        │
│     │ • Structure  │  │ • SAST       │  │ • PoC Gen    │        │
│     │ • Tech Stack │  │ • Pattern    │  │ • Sandbox    │        │
│     │ • Entry Pts  │  │ • Dataflow   │  │ • Validation │        │
│     └──────────────┘  └──────────────┘  └──────────────┘        │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Key Design Decisions:

| Component | Design Choice | Rationale |
|---|---|---|
| Orchestrator | LLM-driven ReAct loop | Dynamic strategy adaptation based on findings |
| Sub-Agents | Specialized roles | Domain expertise separation for precision |
| Communication | TaskHandoff protocol | Structured context passing between agents |
| Iteration Limits | Configurable (20/30/15) | Prevent infinite loops while ensuring depth |
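The following sketch illustrates two of these decisions: a structured handoff message and a bounded agent loop. The field names of `TaskHandoff` and the helper `run_bounded` are assumptions made for illustration; the actual protocol fields are not specified in this document.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class TaskHandoff:
    """Hypothetical structured message passed from one agent to the next."""
    from_agent: str
    to_agent: str
    task: str
    context: dict[str, Any] = field(default_factory=dict)
    findings: list[dict[str, Any]] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the handoff as text that can be placed in the receiving agent's prompt."""
        lines = [f"Task from {self.from_agent} to {self.to_agent}: {self.task}"]
        lines += [f"- {key}: {value}" for key, value in sorted(self.context.items())]
        lines.append(f"Prior findings: {len(self.findings)}")
        return "\n".join(lines)


def run_bounded(step: Callable[[], bool], max_iterations: int) -> int:
    """Call step() until it returns True or the budget is spent; return iterations used."""
    for used in range(1, max_iterations + 1):
        if step():
            return used
    return max_iterations


handoff = TaskHandoff("recon", "analysis", "Audit the login flow", {"tech_stack": "Flask"})
```

Passing a typed object rather than free text keeps the context each sub-agent receives explicit, and the iteration budget guarantees termination even if the LLM never emits a finishing action.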

Layer 3: RAG Knowledge Enhancement Layer

┌─────────────────────────────────────────────────────────────────┐
│              RAG Knowledge Enhancement Layer                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────┐  │
│  │ Code Chunker│    │  Embedding  │    │   Vector Database   │  │
│  │(Tree-sitter)│───►│   Model     │───►│     (ChromaDB)      │  │
│  └─────────────┘    └─────────────┘    └─────────────────────┘  │
│                                                    │             │
│  ┌─────────────────────────────────────────────────┼───────────┐│
│  │              CWE/CVE Knowledge Base             │           ││
│  │  • SQL Injection patterns                       ▼           ││
│  │  • XSS signatures                     ┌───────────────────┐ ││
│  │  • Command Injection                  │ Semantic Retriever│ ││
│  │  • Path Traversal                     └───────────────────┘ ││
│  │  • SSRF patterns                                            ││
│  │  • ...                                                      ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

RAG Pipeline:

  1. Code Chunking: Tree-sitter based AST-aware chunking for semantic preservation
  2. Embedding: OpenAI text-embedding-3-small or text-embedding-3-large, or a locally hosted embedding model
  3. Vector Store: ChromaDB for lightweight deployment
  4. Retrieval: Semantic similarity search with vulnerability pattern matching
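The four stages can be sketched end to end with standard-library stand-ins. This is only an illustration of the pipeline shape, under loud assumptions: Python's `ast` module replaces Tree-sitter (so it handles Python source only), a bag-of-tokens counter replaces the embedding model, and an in-memory list replaces ChromaDB. None of these function names come from DeepAudit.

```python
import ast
import math
import re
from collections import Counter


def chunk_functions(source: str) -> list[str]:
    """AST-aware chunking: one chunk per top-level function or class."""
    tree = ast.parse(source)
    return [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]


def embed(text: str) -> Counter:
    """Toy embedding: counts of lowercase identifier-like tokens."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


SOURCE = '''
def load_user(cursor, name):
    cursor.execute("SELECT * FROM users WHERE name = '" + name + "'")
    return cursor.fetchone()

def add(a, b):
    return a + b
'''

chunks = chunk_functions(SOURCE)
top = retrieve("sql select execute cursor", chunks)
```

Chunking at function boundaries keeps each retrieved unit syntactically complete, which is the property the Tree-sitter chunker is meant to provide across languages.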

Layer 4: Security Tool Integration Layer

┌─────────────────────────────────────────────────────────────────┐
│              Security Tool Integration Layer                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                    SAST Tools                                ││
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────┐ ││
│  │  │ Semgrep  │  │  Bandit  │  │Kunlun-M  │  │Pattern Match │ ││
│  │  │ (Multi)  │  │ (Python) │  │ (PHP/JS) │  │  (Fallback)  │ ││
│  │  └──────────┘  └──────────┘  └──────────┘  └──────────────┘ ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
│  ┌────────────────────────┐  ┌────────────────────────────────┐ │
│  │   Secret Detection     │  │    Dependency Analysis         │ │
│  │  • Gitleaks            │  │  • OSV-Scanner                 │ │
│  │  • TruffleHog          │  │  • npm audit / pip-audit       │ │
│  └────────────────────────┘  └────────────────────────────────┘ │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Tool Selection Strategy:

| Category | Primary Tool | Fallback | Coverage |
|---|---|---|---|
| Multi-lang SAST | Semgrep | PatternMatch | 20+ languages |
| Python Security | Bandit | PatternMatch | Python-specific |
| PHP/JS Analysis | Kunlun-M | Semgrep | Semantic analysis |
| Secret Detection | Gitleaks | TruffleHog | Git history scan |
| Dependencies | OSV-Scanner | npm/pip audit | Multi-ecosystem |
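A primary/fallback strategy like this one can be expressed as an ordered chain per category, taking the first tool that is available. The sketch below is an assumption about how such selection might look, not DeepAudit's actual logic; the category keys and executable names (in particular `kunlun-m`) are hypothetical, and `pattern_match` stands for a built-in matcher that needs no external binary.

```python
import shutil
from typing import Callable, Optional

# Primary tool first, then fallbacks, following the Tool Selection Strategy table.
TOOL_CHAINS = {
    "multi_lang_sast": ["semgrep", "pattern_match"],
    "python_security": ["bandit", "pattern_match"],
    "php_js_analysis": ["kunlun-m", "semgrep"],
    "secret_detection": ["gitleaks", "trufflehog"],
    "dependencies": ["osv-scanner", "npm", "pip-audit"],
}
BUILTIN_TOOLS = {"pattern_match"}


def on_path(name: str) -> bool:
    """True if an executable with this name is found on PATH."""
    return shutil.which(name) is not None


def select_tool(category: str, is_installed: Callable[[str], bool] = on_path) -> Optional[str]:
    """Return the first usable tool for a category, or None if nothing is available."""
    for tool in TOOL_CHAINS[category]:
        if tool in BUILTIN_TOOLS or is_installed(tool):
            return tool
    return None
```

Injecting `is_installed` keeps the selection logic testable without any scanner installed, for example `select_tool("python_security", lambda name: False)` falls through to the built-in matcher.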

Layer 5: Sandbox Verification Layer

┌─────────────────────────────────────────────────────────────────┐
│                Sandbox Verification Layer                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                 Docker Sandbox Container                     ││
│  │  ┌────────────────────────────────────────────────────────┐ ││
│  │  │              Security Constraints                       │ ││
│  │  │  • Network: Isolated / No external access              │ ││
│  │  │  • Resources: Memory 512MB / CPU 1.0                   │ ││
│  │  │  • Syscalls: seccomp whitelist policy                  │ ││
│  │  │  • Timeout: 60 seconds max execution                   │ ││
│  │  └────────────────────────────────────────────────────────┘ ││
│  │                                                              ││
│  │  ┌──────────────────┐    ┌──────────────────────────────┐   ││
│  │  │   PoC Generator  │───►│     Exploit Validator        │   ││
│  │  │  (LLM-assisted)  │    │  (Execution + Verification)  │   ││
│  │  └──────────────────┘    └──────────────────────────────┘   ││
│  │                                                              ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Verification Workflow:

  1. PoC Generation: LLM generates exploitation code based on vulnerability analysis
  2. Sandbox Setup: Docker container with strict security constraints
  3. Execution: Run PoC in isolated environment
  4. Validation: Check execution results against expected vulnerability behavior
  5. Confidence Scoring: Assign a verification confidence score between 0 and 1
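As a rough sketch of steps 2 to 5, the constraints listed for the sandbox map onto standard `docker run` flags. The image name, mount path, seccomp profile path, and the marker-based scoring rule below are all assumptions for illustration; the real validator presumably uses richer evidence than a single output marker.

```python
import subprocess


def build_sandbox_command(image: str, poc_path: str, seccomp_profile: str) -> list[str]:
    """Build a docker run command applying the sandbox constraints."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                             # no network access
        "--memory", "512m",                              # memory limit
        "--cpus", "1.0",                                 # CPU limit
        "--security-opt", f"seccomp={seccomp_profile}",  # syscall allowlist profile
        "--volume", f"{poc_path}:/poc/poc.py:ro",        # poc_path must be absolute
        image, "python", "/poc/poc.py",                  # assumes Python in the image
    ]


def run_poc(command: list[str], timeout_s: int = 60) -> dict:
    """Run the PoC with a wall-clock limit.

    The timeout stops the docker CLI process; a production setup would also
    stop the container itself, for example by naming it and calling docker kill.
    """
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return {"timed_out": True, "exit_code": None, "stdout": ""}
    return {"timed_out": False, "exit_code": proc.returncode, "stdout": proc.stdout}


def confidence(result: dict, expected_marker: str) -> float:
    """Toy score in [0, 1]: 1.0 if the expected marker was printed, otherwise 0.0."""
    if result["timed_out"]:
        return 0.0
    return 1.0 if expected_marker in result["stdout"] else 0.0


# Hypothetical image and paths.
command = build_sandbox_command("deepaudit-sandbox:latest", "/tmp/poc.py", "/etc/deepaudit/seccomp.json")
```

Building the command as an argument list rather than a shell string avoids shell interpolation of LLM-generated paths.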

Data Flow Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                           DeepAudit Data Flow                                │
└─────────────────────────────────────────────────────────────────────────────┘

   ┌──────────┐                                              ┌──────────────┐
   │   User   │                                              │   Reports    │
   │ Request  │                                              │  (MD/JSON)   │
   └────┬─────┘                                              └──────▲───────┘
        │                                                           │
        ▼                                                           │
┌───────────────┐    ┌─────────────────────────────────────────────┴───────┐
│  API Gateway  │───►│                   PostgreSQL DB                      │
└───────┬───────┘    │  • Tasks  • Findings  • Projects  • Reports         │
        │            └─────────────────────────────────────────────────────┘
        ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                         Orchestrator Agent                                 │
│                                                                            │
│   ┌─────────────┐      ┌─────────────────────────────────────────────┐    │
│   │ LLM Service │◄────►│              ReAct Decision Loop             │    │
│   │ (GPT/Claude)│      │  Thought → Action → Observation → Thought   │    │
│   └─────────────┘      └───────────────────┬─────────────────────────┘    │
│                                            │                               │
│              ┌─────────────┬───────────────┼───────────────┐              │
│              ▼             ▼               ▼               ▼              │
│      ┌─────────────┐ ┌───────────┐ ┌────────────┐ ┌──────────────┐       │
│      │    Recon    │ │  Analysis │ │Verification│ │   Finish     │       │
│      │    Agent    │ │   Agent   │ │   Agent    │ │   Action     │       │
│      └──────┬──────┘ └─────┬─────┘ └──────┬─────┘ └──────────────┘       │
│             │              │              │                               │
└─────────────┼──────────────┼──────────────┼───────────────────────────────┘
              │              │              │
              ▼              ▼              ▼
       ┌────────────┐ ┌────────────┐ ┌────────────┐
       │ File Tools │ │ SAST Tools │ │  Sandbox   │
       │ list/read  │ │ Semgrep... │ │   Docker   │
       └─────┬──────┘ └─────┬──────┘ └──────┬─────┘
             │              │               │
             │       ┌──────┴──────┐        │
             │       ▼             │        │
             │  ┌─────────┐        │        │
             └─►│   RAG   │◄───────┘        │
                │ Pipeline│                 │
                └────┬────┘                 │
                     │                      │
                     ▼                      ▼
              ┌────────────┐        ┌────────────┐
              │  Vector DB │        │ Verification│
              │  ChromaDB  │        │   Result    │
              └────────────┘        └────────────┘

Algorithm: Multi-Agent Audit Orchestration

Algorithm 1: LLM-Driven Multi-Agent Security Audit

Input: Project P, Target vulnerabilities V, Configuration C
Output: Findings F, Verification Results R

Initialize Orchestrator Agent with LLM
Create sub-agents: Recon, Analysis, Verification
findings ← ∅
verified_results ← ∅
iteration ← 0

// Phase 1: Reconnaissance
recon_result ← ReconAgent.run(P, V)
high_risk_areas ← recon_result.priority_areas
context ← (P, V, C, high_risk_areas)
history ← ∅

// Phase 2: Orchestration Loop
while iteration < MAX_ITERATIONS do
    thought, action ← LLM.reason(context, history)

    if action = "dispatch_agent" then
        agent ← select_agent(action.params)
        result ← agent.run(action.task, context)
        findings ← findings ∪ result.findings
        update_context(result)
        history ← history ∪ {(thought, action, result)}
    else if action = "finish" then
        break
    end if

    iteration ← iteration + 1
end while

// Phase 3: Verification
for each f ∈ findings where f.severity ≥ HIGH do
    poc ← LLM.generate_poc(f)
    result ← Sandbox.execute(poc)
    verified_results ← verified_results ∪ {(f, result)}
end for

return (findings, verified_results)
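The control flow of Algorithm 1 can be sketched in Python with the LLM, the sub-agents, and the sandbox replaced by plain callables. Everything here is a stand-in under stated assumptions: `decide` plays the role of `LLM.reason`, the severity scale is hypothetical, and the scripted plan only exists so the example runs without a model.

```python
from dataclasses import dataclass

# Hypothetical severity scale used for the "severity >= HIGH" filter.
SEVERITY = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


@dataclass(frozen=True)
class Finding:
    title: str
    severity: str


@dataclass
class Action:
    name: str        # "dispatch_agent" or "finish"
    agent: str = ""
    task: str = ""


def audit(decide, agents, execute_poc, max_iterations=20):
    """Bounded orchestration loop followed by verification of high-severity findings."""
    findings, history = [], []
    for _ in range(max_iterations):
        action = decide(history)                     # stand-in for LLM.reason
        if action.name == "finish":
            break
        if action.name == "dispatch_agent":
            result = agents[action.agent](action.task)
            findings.extend(f for f in result if f not in findings)  # set union
            history.append((action, result))
    verified = {
        f: execute_poc(f)
        for f in findings
        if SEVERITY[f.severity] >= SEVERITY["HIGH"]
    }
    return findings, verified


def scripted_decide(history):
    """Replays a fixed plan instead of querying a model."""
    plan = [Action("dispatch_agent", "analysis", "scan app/"), Action("finish")]
    return plan[min(len(history), len(plan) - 1)]


agents = {
    "analysis": lambda task: [
        Finding("SQL injection in login", "HIGH"),
        Finding("Verbose error page", "LOW"),
    ]
}
findings, verified = audit(scripted_decide, agents, execute_poc=lambda f: True)
```

Only the high-severity finding reaches the sandbox stage, which keeps the expensive verification step proportional to the findings that matter most.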

Evaluation Metrics

For academic evaluation, we suggest the following metrics:

Detection Effectiveness

| Metric | Formula | Description |
|---|---|---|
| Precision | TP / (TP + FP) | Accuracy of reported vulnerabilities |
| Recall | TP / (TP + FN) | Coverage of actual vulnerabilities |
| F1-Score | 2 × (Precision × Recall) / (Precision + Recall) | Harmonic mean of precision and recall |
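For reference, these three formulas translate directly into code. The counts in the usage line are made-up numbers chosen only to show the arithmetic; they are not DeepAudit results.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, f1) from confusion counts, using 0.0 for empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative counts: 8 true positives, 2 false positives, 8 missed vulnerabilities.
precision, recall, f1 = detection_metrics(tp=8, fp=2, fn=8)
```

With these counts precision is 8/10 and recall is 8/16, so F1 is 2 × 0.4 / 1.3, roughly 0.615.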

Efficiency Metrics

| Metric | Description |
|---|---|
| Time-to-Detection (TTD) | Time from start to first vulnerability found |
| Total Audit Time | End-to-end execution time |
| LLM Token Usage | Total tokens consumed during audit |
| Tool Invocation Count | Number of external tool calls |

Verification Quality

| Metric | Description |
|---|---|
| Verification Rate | Percentage of findings verified via sandbox |
| False Positive Reduction | % reduction after verification |
| PoC Success Rate | Successful exploit demonstrations |
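The document gives no formulas for these metrics, so the sketch below fixes one possible reading, stated as assumptions: verification rate is the share of findings for which a sandbox run was attempted, PoC success rate is the share of attempted runs that confirmed the finding, and false positive reduction is the relative drop in false positives after unverified findings are filtered out. The input counts are invented for illustration.

```python
def verification_metrics(total_findings: int, attempted: int, confirmed: int,
                         fp_before: int, fp_after: int) -> dict[str, float]:
    """Compute the three verification-quality ratios under the assumed definitions."""
    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else 0.0

    return {
        "verification_rate": ratio(attempted, total_findings),
        "poc_success_rate": ratio(confirmed, attempted),
        "fp_reduction": ratio(fp_before - fp_after, fp_before),
    }


# Illustrative counts only.
metrics = verification_metrics(total_findings=20, attempted=10, confirmed=6,
                               fp_before=8, fp_after=2)
```

Whichever definitions a paper adopts, stating the denominators explicitly avoids ambiguity between "verified" meaning attempted and "verified" meaning confirmed.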

System Multi-Agent RAG Sandbox LLM-Driven
CodeQL
Semgrep
Snyk Code Partial
GitHub Copilot
DeepAudit

LaTeX TikZ Diagram Code

For LaTeX papers, you can use the following TikZ code:

% Requires \usepackage{tikz} and \usetikzlibrary{positioning} in the preamble.
\begin{figure}[t]
\centering
\begin{tikzpicture}[
    node distance=1cm,
    box/.style={rectangle, draw, rounded corners, minimum width=2.5cm, minimum height=0.8cm, align=center},
    agent/.style={box, fill=blue!10},
    tool/.style={box, fill=orange!10},
    rag/.style={box, fill=green!10},
    sandbox/.style={box, fill=red!10},
    arrow/.style={->, >=stealth, thick}
]

% Orchestrator
\node[agent] (orch) {Orchestrator Agent};

% Sub-agents
\node[agent, below left=1.5cm and 1cm of orch] (recon) {Recon Agent};
\node[agent, below=1.5cm of orch] (analysis) {Analysis Agent};
\node[agent, below right=1.5cm and 1cm of orch] (verify) {Verification Agent};

% Connections
\draw[arrow] (orch) -- (recon);
\draw[arrow] (orch) -- (analysis);
\draw[arrow] (orch) -- (verify);

% Tools
\node[tool, below=1cm of analysis] (tools) {SAST Tools\\Semgrep, Bandit, Kunlun-M};

% RAG
\node[rag, left=1cm of tools] (rag) {RAG Pipeline\\Vector DB + CWE/CVE};

% Sandbox
\node[sandbox, right=1cm of tools] (sandbox) {Docker Sandbox\\PoC Verification};

% Tool connections
\draw[arrow] (analysis) -- (tools);
\draw[arrow, dashed] (tools) -- (rag);
\draw[arrow] (verify) -- (sandbox);

% LLM
\node[box, fill=purple!10, above=0.5cm of orch] (llm) {LLM Provider\\GPT-4 / Claude};
\draw[arrow, <->] (orch) -- (llm);

\end{tikzpicture}
\caption{DeepAudit System Architecture}
\label{fig:architecture}
\end{figure}

Citation

If you use DeepAudit in your research, please cite:

@software{deepaudit2024,
  title = {DeepAudit: LLM-Driven Multi-Agent Code Security Audit System with RAG Enhancement and Sandbox Verification},
  author = {Lin Tsinghua},
  year = {2024},
  url = {https://github.com/lintsinghua/DeepAudit},
  version = {3.0.0}
}