# DeepAudit: System Architecture for Academic Papers

This document provides a system architecture description suitable for top-tier academic conferences (ICSE, FSE, CCS, S&P, USENIX Security, etc.).

## Architecture Diagram

![DeepAudit Architecture](images/deepaudit_architecture.png)

---

## System Overview

**DeepAudit** is an LLM-driven intelligent code security audit system that employs a **hierarchical multi-agent architecture** with **Retrieval-Augmented Generation (RAG)** and **sandbox-based vulnerability verification**.

### Key Contributions

1. **LLM-Driven Multi-Agent Orchestration**: A dynamic agent hierarchy where the LLM serves as the central decision-making brain, autonomously orchestrating specialized agents for reconnaissance, analysis, and verification.

2. **RAG-Enhanced Vulnerability Detection**: Integration of semantic code understanding with vulnerability knowledge bases (CWE/CVE) to reduce false positives and improve detection accuracy.

3. **Sandbox-Based Exploit Verification**: Docker-isolated execution environment for automated PoC generation and vulnerability confirmation.

---

## Architecture Components

### Layer 1: User Interface Layer

```
┌─────────────────────────────────────────────────────────────────┐
│                      User Interface Layer                       │
├─────────────────────────────────────────────────────────────────┤
│  ┌───────────────────┐    ┌───────────────────────────────────┐ │
│  │   Web Frontend    │    │            API Gateway            │ │
│  │   (React + TS)    │◄──►│    REST API / SSE Event Stream    │ │
│  └───────────────────┘    └───────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```

**Components:**
- **Web Frontend**: React 18 + TypeScript SPA with real-time log streaming
- **API Gateway**: FastAPI-based REST endpoints with SSE for real-time events

### Layer 2: Multi-Agent Orchestration Layer

```
┌─────────────────────────────────────────────────────────────────┐
│                 Multi-Agent Orchestration Layer                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│                   ┌─────────────────────┐                       │
│                   │ Orchestrator Agent  │ ◄─── LLM Provider     │
│                   │    (ReAct Loop)     │      (GPT-4/Claude)   │
│                   └──────────┬──────────┘                       │
│                              │                                  │
│             ┌────────────────┼────────────────┐                 │
│             ▼                ▼                ▼                 │
│      ┌──────────────┐ ┌──────────────┐ ┌──────────────┐         │
│      │ Recon Agent  │ │Analysis Agent│ │ Verification │         │
│      │              │ │              │ │    Agent     │         │
│      │ • Structure  │ │ • SAST       │ │ • PoC Gen    │         │
│      │ • Tech Stack │ │ • Pattern    │ │ • Sandbox    │         │
│      │ • Entry Pts  │ │ • Dataflow   │ │ • Validation │         │
│      └──────────────┘ └──────────────┘ └──────────────┘         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

**Key Design Decisions:**

| Component | Design Choice | Rationale |
|-----------|---------------|-----------|
| Orchestrator | LLM-driven ReAct loop | Dynamic strategy adaptation based on findings |
| Sub-Agents | Specialized roles | Domain expertise separation for precision |
| Communication | TaskHandoff protocol | Structured context passing between agents |
| Iteration Limits | Configurable (20/30/15) | Prevent infinite loops while ensuring depth |

### Layer 3: RAG Knowledge Enhancement Layer

```
┌─────────────────────────────────────────────────────────────────┐
│                 RAG Knowledge Enhancement Layer                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────────┐    ┌─────────────┐    ┌─────────────────────┐ │
│   │ Code Chunker│    │  Embedding  │    │   Vector Database   │ │
│   │(Tree-sitter)│───►│    Model    │───►│     (ChromaDB)      │ │
│   └─────────────┘    └─────────────┘    └─────────────────────┘ │
│                                                    │            │
│  ┌─────────────────────────────────────────────────┼───────────┐│
│  │ CWE/CVE Knowledge Base                          │           ││
│  │ • SQL Injection patterns                        ▼           ││
│  │ • XSS signatures                      ┌───────────────────┐ ││
│  │ • Command Injection                   │ Semantic Retriever│ ││
│  │ • Path Traversal                      └───────────────────┘ ││
│  │ • SSRF patterns                                             ││
│  │ • ...                                                       ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

**RAG Pipeline:**
1. **Code Chunking**: Tree-sitter based AST-aware chunking for semantic preservation
2. **Embedding**: Supports OpenAI `text-embedding-3-small` / `text-embedding-3-large` and local models
3. **Vector Store**: ChromaDB for lightweight deployment
4. **Retrieval**: Semantic similarity search with vulnerability pattern matching

### Layer 4: Security Tool Integration Layer

```
┌─────────────────────────────────────────────────────────────────┐
│                 Security Tool Integration Layer                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                         SAST Tools                          ││
│  │ ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────┐  ││
│  │ │ Semgrep  │  │  Bandit  │  │ Kunlun-M │  │Pattern Match │  ││
│  │ │ (Multi)  │  │ (Python) │  │ (PHP/JS) │  │  (Fallback)  │  ││
│  │ └──────────┘  └──────────┘  └──────────┘  └──────────────┘  ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                 │
│  ┌────────────────────────┐   ┌────────────────────────────────┐│
│  │    Secret Detection    │   │      Dependency Analysis       ││
│  │ • Gitleaks             │   │ • OSV-Scanner                  ││
│  │ • TruffleHog           │   │ • npm audit / pip-audit        ││
│  └────────────────────────┘   └────────────────────────────────┘│
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

**Tool Selection Strategy:**

| Category | Primary Tool | Fallback | Coverage |
|----------|-------------|----------|----------|
| Multi-lang SAST | Semgrep | PatternMatch | 20+ languages |
| Python Security | Bandit | PatternMatch | Python-specific |
| PHP/JS Analysis | Kunlun-M | Semgrep | Semantic analysis |
| Secret Detection | Gitleaks | TruffleHog | Git history scan |
| Dependencies | OSV-Scanner | npm/pip audit | Multi-ecosystem |

### Layer 5: Sandbox Verification Layer

```
┌─────────────────────────────────────────────────────────────────┐
│                   Sandbox Verification Layer                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                  Docker Sandbox Container                   ││
│  │ ┌────────────────────────────────────────────────────────┐  ││
│  │ │                  Security Constraints                  │  ││
│  │ │ • Network: Isolated / No external access               │  ││
│  │ │ • Resources: Memory 512MB / CPU 1.0                    │  ││
│  │ │ • Syscalls: seccomp whitelist policy                   │  ││
│  │ │ • Timeout: 60 seconds max execution                    │  ││
│  │ └────────────────────────────────────────────────────────┘  ││
│  │                                                             ││
│  │ ┌──────────────────┐    ┌──────────────────────────────┐    ││
│  │ │  PoC Generator   │───►│      Exploit Validator       │    ││
│  │ │  (LLM-assisted)  │    │  (Execution + Verification)  │    ││
│  │ └──────────────────┘    └──────────────────────────────┘    ││
│  │                                                             ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

**Verification Workflow:**
1. **PoC Generation**: The LLM generates exploitation code based on the vulnerability analysis
2. **Sandbox Setup**: A Docker container is created with strict security constraints
3. **Execution**: The PoC runs in the isolated environment
4. **Validation**: Execution results are checked against the expected vulnerability behavior
5. **Confidence Scoring**: Each finding is assigned a verification confidence score in the range 0-1

---

## Data Flow Diagram

```
┌───────────────────────────────────────────────────────────────────────────┐
│                            DeepAudit Data Flow                            │
└───────────────────────────────────────────────────────────────────────────┘

    ┌──────────┐                                             ┌──────────────┐
    │   User   │                                             │   Reports    │
    │ Request  │                                             │  (MD/JSON)   │
    └────┬─────┘                                             └──────▲───────┘
         │                                                          │
         ▼                                                          │
 ┌───────────────┐    ┌─────────────────────────────────────────────┴───────┐
 │  API Gateway  │───►│                    PostgreSQL DB                    │
 └───────┬───────┘    │       • Tasks • Findings • Projects • Reports       │
         │            └─────────────────────────────────────────────────────┘
         ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                            Orchestrator Agent                             │
│                                                                           │
│ ┌─────────────┐      ┌─────────────────────────────────────────────┐      │
│ │ LLM Service │◄────►│             ReAct Decision Loop             │      │
│ │ (GPT/Claude)│      │  Thought → Action → Observation → Thought   │      │
│ └─────────────┘      └───────────────────┬─────────────────────────┘      │
│                                          │                                │
│          ┌───────────────┬───────────────┼───────────────┐                │
│          ▼               ▼               ▼               ▼                │
│   ┌─────────────┐  ┌───────────┐  ┌────────────┐  ┌──────────────┐        │
│   │    Recon    │  │ Analysis  │  │Verification│  │    Finish    │        │
│   │    Agent    │  │   Agent   │  │   Agent    │  │    Action    │        │
│   └──────┬──────┘  └─────┬─────┘  └──────┬─────┘  └──────────────┘        │
│          │               │               │                                │
└──────────┼───────────────┼───────────────┼────────────────────────────────┘
           │               │               │
           ▼               ▼               ▼
     ┌────────────┐  ┌────────────┐ ┌────────────┐
     │ File Tools │  │ SAST Tools │ │  Sandbox   │
     │ list/read  │  │ Semgrep... │ │   Docker   │
     └─────┬──────┘  └─────┬──────┘ └──────┬─────┘
           │               │               │
           │        ┌──────┴──────┐        │
           │        ▼             │        │
           │   ┌─────────┐        │        │
           └──►│   RAG   │◄───────┘        │
               │ Pipeline│                 │
               └────┬────┘                 │
                    │                      │
                    ▼                      ▼
              ┌────────────┐        ┌──────────────┐
              │ Vector DB  │        │ Verification │
              │  ChromaDB  │        │    Result    │
              └────────────┘        └──────────────┘
```

---

## Algorithm: Multi-Agent Audit Orchestration

```
Algorithm 1: LLM-Driven Multi-Agent Security Audit

Input:  Project P, Target vulnerabilities V, Configuration C
Output: Findings F, Verification Results R

Initialize Orchestrator Agent with LLM
Create sub-agents: Recon, Analysis, Verification
findings ← ∅
verified_results ← ∅

// Phase 1: Reconnaissance
recon_result ← ReconAgent.run(P, V)
high_risk_areas ← recon_result.priority_areas
context ← (P, C, high_risk_areas)    // seed the loop with the recon output
history ← ∅

// Phase 2: Orchestration Loop
iteration ← 0
while iteration < MAX_ITERATIONS do
    thought, action ← LLM.reason(context, history)

    if action.type = "dispatch_agent" then
        agent ← select_agent(action.params)
        result ← agent.run(action.task, context)
        findings ← findings ∪ result.findings
        update_context(result)
    else if action.type = "finish" then
        break
    end if

    iteration ← iteration + 1
end while

// Phase 3: Verification
for each f ∈ findings where f.severity ≥ HIGH do
    poc ← LLM.generate_poc(f)
    result ← Sandbox.execute(poc)
    verified_results ← verified_results ∪ {(f, result)}
end for

return (findings, verified_results)
```

---

## Evaluation Metrics

For academic evaluation, we suggest the following metrics:

### Detection Effectiveness

| Metric | Formula | Description |
|--------|---------|-------------|
| Precision | TP / (TP + FP) | Accuracy of reported vulnerabilities |
| Recall | TP / (TP + FN) | Coverage of actual vulnerabilities |
| F1-Score | 2 × (P × R) / (P + R) | Harmonic mean of precision (P) and recall (R) |

### Efficiency Metrics

| Metric | Description |
|--------|-------------|
| Time-to-Detection (TTD) | Time from start to first vulnerability found |
| Total Audit Time | End-to-end execution time |
| LLM Token Usage | Total tokens consumed during audit |
| Tool Invocation Count | Number of external tool calls |

### Verification Quality

| Metric | Description |
|--------|-------------|
| Verification Rate | Percentage of findings verified via sandbox |
| False Positive Reduction | Percentage reduction in false positives after verification |
| PoC Success Rate | Proportion of PoCs that successfully demonstrate an exploit |

---

## Comparison with Related Work

| System | Multi-Agent | RAG | Sandbox | LLM-Driven |
|--------|-------------|-----|---------|------------|
| CodeQL | ✗ | ✗ | ✗ | ✗ |
| Semgrep | ✗ | ✗ | ✗ | ✗ |
| Snyk Code | ✗ | ✗ | ✗ | Partial |
| GitHub Copilot | ✗ | ✗ | ✗ | ✓ |
| **DeepAudit** | **✓** | **✓** | **✓** | **✓** |

---

## LaTeX TikZ Diagram Code

For LaTeX papers, you can use the following TikZ code:

```latex
% Requires \usetikzlibrary{positioning} in the preamble
% (needed for the "below left=... and ... of" placement syntax).
\begin{figure}[t]
\centering
\begin{tikzpicture}[
    node distance=1cm,
    box/.style={rectangle, draw, rounded corners, minimum width=2.5cm, minimum height=0.8cm, align=center},
    agent/.style={box, fill=blue!10},
    tool/.style={box, fill=orange!10},
    rag/.style={box, fill=green!10},
    sandbox/.style={box, fill=red!10},
    arrow/.style={->, >=stealth, thick}
]

% Orchestrator
\node[agent] (orch) {Orchestrator Agent};

% Sub-agents
\node[agent, below left=1.5cm and 1cm of orch] (recon) {Recon Agent};
\node[agent, below=1.5cm of orch] (analysis) {Analysis Agent};
\node[agent, below right=1.5cm and 1cm of orch] (verify) {Verification Agent};

% Connections
\draw[arrow] (orch) -- (recon);
\draw[arrow] (orch) -- (analysis);
\draw[arrow] (orch) -- (verify);

% Tools
\node[tool, below=1cm of analysis] (tools) {SAST Tools\\Semgrep, Bandit, Kunlun-M};

% RAG
\node[rag, left=1cm of tools] (rag) {RAG Pipeline\\Vector DB + CWE/CVE};

% Sandbox
\node[sandbox, right=1cm of tools] (sandbox) {Docker Sandbox\\PoC Verification};

% Tool connections
\draw[arrow] (analysis) -- (tools);
\draw[arrow, dashed] (tools) -- (rag);
\draw[arrow] (verify) -- (sandbox);

% LLM
\node[box, fill=purple!10, above=0.5cm of orch] (llm) {LLM Provider\\GPT-4 / Claude};
\draw[arrow, <->] (orch) -- (llm);

\end{tikzpicture}
\caption{DeepAudit System Architecture}
\label{fig:architecture}
\end{figure}
```

---

## Citation

If you use DeepAudit in your research, please cite:

```bibtex
@software{deepaudit2024,
  title = {DeepAudit: LLM-Driven Multi-Agent Code Security Audit System with RAG Enhancement and Sandbox Verification},
  author = {Lin Tsinghua},
  year = {2024},
  url = {https://github.com/lintsinghua/DeepAudit},
  version = {3.0.0}
}
```
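
---

## Appendix: Metric Computation Sketch

The detection-effectiveness formulas from the Evaluation Metrics section (Precision, Recall, F1-Score) can be computed from raw confusion counts as in the minimal Python sketch below. The names `detection_metrics` and `DetectionMetrics`, and the convention of reporting 0.0 when a denominator is zero, are illustrative assumptions and are not taken from the DeepAudit codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionMetrics:
    """Container for the three detection-effectiveness metrics."""
    precision: float
    recall: float
    f1: float


def _safe_div(num: float, den: float) -> float:
    # Assumption: report 0.0 instead of raising when the denominator is zero
    # (e.g. no findings were reported at all).
    return num / den if den else 0.0


def detection_metrics(tp: int, fp: int, fn: int) -> DetectionMetrics:
    """Compute Precision, Recall, and F1 from TP/FP/FN counts."""
    if min(tp, fp, fn) < 0:
        raise ValueError("counts must be non-negative")
    precision = _safe_div(tp, tp + fp)  # TP / (TP + FP)
    recall = _safe_div(tp, tp + fn)     # TP / (TP + FN)
    # 2 * (P * R) / (P + R), the harmonic mean of precision and recall
    f1 = _safe_div(2 * precision * recall, precision + recall)
    return DetectionMetrics(precision, recall, f1)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    m = detection_metrics(tp=8, fp=2, fn=4)
    print(f"P={m.precision:.3f} R={m.recall:.3f} F1={m.f1:.3f}")
```

With the hypothetical counts above (8 true positives, 2 false positives, 4 false negatives), precision is 8/10, recall is 8/12, and F1 equals 16/22, which matches the equivalent closed form 2TP / (2TP + FP + FN).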