How zeroclaw Designs Its Tool Abstraction Layer

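In zeroclaw, a Tool is the core mechanism through which the agent acts: running shell commands, reading and writing files, browsing the web, calling HTTP APIs, working with memory, even delegating a task to another sub-agent. More than 30 tools are currently implemented. This post documents how a single clean trait unifies all of them, and covers the design of tool assembly, dispatch, schema normalization, and security injection.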

The `Tool` Trait: Four Required Methods, One for Free

The core of the whole abstraction lives in src/tools/traits.rs, and only four methods must be implemented:

#[async_trait]
pub trait Tool: Send + Sync {
    fn name(&self) -> &str;
    fn description(&self) -> &str;
    fn parameters_schema(&self) -> serde_json::Value;
    async fn execute(&self, args: serde_json::Value) -> anyhow::Result<ToolResult>;

    // Default implementation: assembles the first three methods into a ToolSpec, for free
    fn spec(&self) -> ToolSpec {
        ToolSpec {
            name: self.name().to_string(),
            description: self.description().to_string(),
            parameters: self.parameters_schema(),
        }
    }
}

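A few design decisions are worth noting:

Send + Sync: lets a tool be stored in Arc<dyn Tool> and shared across threads; this is required.
async fn execute: any tool may perform I/O, so everything is async and there is no need to distinguish synchronous from asynchronous tools.
Arguments are serde_json::Value: instead of one strongly typed struct per tool, values are pulled out of JSON at runtime. This removes a lot of boilerplate, at the cost of surfacing errors at runtime rather than at compile time.
spec() is a default method: it assembles the results of name, description, and parameters_schema into a ToolSpec, so a new tool never has to implement it.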
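
To make the "four required, one free" idea concrete, here is a minimal sketch of a tool written against a simplified stand-in for the trait. It is an assumption-laden illustration, not zeroclaw's code: the trait is synchronous, arguments and schemas are plain strings instead of serde_json::Value, and EchoTool is a hypothetical tool invented for this example.

```rust
use std::sync::Arc;

// Simplified stand-in for the article's types: std-only, synchronous,
// with the JSON Schema kept as raw text.
#[derive(Debug, PartialEq)]
pub struct ToolSpec {
    pub name: String,
    pub description: String,
    pub parameters: String,
}

pub trait Tool: Send + Sync {
    fn name(&self) -> &str;
    fn description(&self) -> &str;
    fn parameters_schema(&self) -> String;
    fn execute(&self, args: &str) -> Result<String, String>;

    // Default method: built from the three getters above, so
    // implementers get it without writing any code.
    fn spec(&self) -> ToolSpec {
        ToolSpec {
            name: self.name().to_string(),
            description: self.description().to_string(),
            parameters: self.parameters_schema(),
        }
    }
}

// Hypothetical tool, not part of zeroclaw.
pub struct EchoTool;

impl Tool for EchoTool {
    fn name(&self) -> &str {
        "echo"
    }
    fn description(&self) -> &str {
        "Return the input unchanged"
    }
    fn parameters_schema(&self) -> String {
        r#"{"type":"object","properties":{"text":{"type":"string"}}}"#.to_string()
    }
    fn execute(&self, args: &str) -> Result<String, String> {
        Ok(args.to_string())
    }
}

// Send + Sync is what allows the trait object to sit behind an Arc
// and be handed to another thread.
pub fn shared_echo() -> Arc<dyn Tool> {
    Arc::new(EchoTool)
}
```

EchoTool implements only the four required methods; calling spec() on it still works because of the default method, and the Arc returned by shared_echo() can be cloned into a spawned thread.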

Core Data Types

The key types:

// Return value of every execution
pub struct ToolResult {
    pub success: bool,
    pub output: String,
    pub error: Option<String>,
}

// The spec sheet sent to the LLM describing what this tool can do
pub struct ToolSpec {
    pub name: String,
    pub description: String,
    pub parameters: serde_json::Value,  // JSON Schema object
}

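Note how execute handles errors: an ordinary execution failure (file not found, path not allowed) is returned as ToolResult { success: false, error: Some(...) }; only bugs in the program itself or unrecoverable errors are returned as anyhow::Result::Err. This lets the agent loop pass a failed result back to the LLM so it can decide the next step, instead of blowing up the whole flow.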
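
The two error channels can be sketched as follows. This is only an illustration of the idea under stated assumptions: the Next enum and the handle function are hypothetical names, the error type is a plain String rather than anyhow::Error, and aborting on Err is one possible policy, not necessarily what zeroclaw's agent loop does.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ToolResult {
    pub success: bool,
    pub output: String,
    pub error: Option<String>,
}

// Hypothetical: what the caller does after a tool returns.
#[derive(Debug, PartialEq)]
pub enum Next {
    // Text that goes back into the conversation for the LLM to read.
    FeedBackToLlm(String),
    // The run cannot continue.
    Abort(String),
}

pub fn handle(outcome: Result<ToolResult, String>) -> Next {
    match outcome {
        // Success: the output becomes the next message.
        Ok(result) if result.success => Next::FeedBackToLlm(result.output),
        // Ordinary failure: still a message, so the model can react to it.
        Ok(result) => {
            let reason = result
                .error
                .unwrap_or_else(|| "unknown failure".to_string());
            Next::FeedBackToLlm(format!("Error: {}", reason))
        }
        // Err is reserved for bugs and unrecoverable conditions.
        Err(fatal) => Next::Abort(fatal),
    }
}
```

With this split, a "Path not allowed" failure reaches the model as ordinary text, while only the Err branch ends the run.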

Parameter schemas are hand-written JSON Schema built with serde_json::json!(), with no proc macro or derive:

fn parameters_schema(&self) -> serde_json::Value {
    json!({
        "type": "object",
        "properties": {
            "command": {
                "type": "string",
                "description": "The shell command to execute"
            },
            "approved": {
                "type": "boolean",
                "description": "Set true to explicitly approve medium/high-risk commands",
                "default": false
            }
        },
        "required": ["command"]
    })
}

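Assembling Tools: Factory Functions, No Reflection

Tools are not discovered automatically; they are assembled explicitly by factory functions in src/tools/mod.rs:

// Minimal set, for tests and simple agents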
pub fn default_tools(security: Arc<SecurityPolicy>) -> Vec<Box<dyn Tool>> {
    vec![
        // `runtime` is not a parameter of this function; its construction is elided in this excerpt
        Box::new(ShellTool::new(security.clone(), runtime)),
        Box::new(FileReadTool::new(security.clone())),
        Box::new(FileWriteTool::new(security)),
    ]
}

// Full tool set, enabled conditionally based on configuration
pub fn all_tools_with_runtime(config, security, runtime, memory, ...) -> Vec<Box<dyn Tool>> {
    let mut tools: Vec<Box<dyn Tool>> = vec![
        Box::new(ShellTool::new(...)),
        Box::new(FileReadTool::new(...)),
        Box::new(CronAddTool::new(...)),
        Box::new(MemoryStoreTool::new(...)),
        // ...
    ];

    // Conditional enabling: configuration decides the tool set; not everything is on by default
    if browser_config.enabled {
        tools.push(Box::new(BrowserTool::new_with_backend(...)));
    }
    if http_config.enabled {
        tools.push(Box::new(HttpRequestTool::new(...)));
    }
    if !agents.is_empty() {
        tools.push(Box::new(DelegateTool::new(agents, ...)));
    }
    tools
}

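This is the factory/builder pattern: no reflection, no compile-time auto-collection crate such as inventory, no #[register_tool] macro. The upside is that dependencies are obvious at a glance; the downside is that every new tool has to be registered by hand. For a codebase of this size, that is the right trade-off.

Two Dispatch Modes

A tool call takes one of two paths, depending on whether the LLM provider supports native function calling: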

graph TD
    A[LLM response] --> B{Native tool calling supported?}
    B -->|Yes| C[NativeToolDispatcher]
    B -->|No| D[XmlToolDispatcher]
    C -->|Parse the API's tool_calls field| E[ParsedToolCall]
    D -->|Parse XML tags in the text| E
    E --> F["find_tool(name)"]
    F --> G["tool.execute(args)"]
    G --> H[ToolResult]
    H --> I[Convert to ConversationMessage]
    I --> J[Send back to the LLM for the next round]

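NativeToolDispatcher is used for Anthropic, OpenAI, and Gemini. These providers return structured tool calls in the API response, which can be parsed directly.

XmlToolDispatcher is used for LLMs that do not support native function calling. In that case zeroclaw injects the tool descriptions into the system prompt and asks the LLM to respond in a specific format: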

<tool_call>{"name": "shell", "arguments": {"command": "ls -la"}}</tool_call>

These XML tags are then extracted from the response text with string parsing.
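
As a rough idea of what that string parsing could look like, the sketch below collects the text between each <tool_call> and </tool_call> pair. The function name extract_tool_calls is hypothetical, and this is not zeroclaw's actual parser; a real one would go on to decode each body as JSON and cope with malformed output.

```rust
/// Collect the trimmed text between each <tool_call> ... </tool_call> pair.
/// An opening tag with no matching close ends the scan.
pub fn extract_tool_calls(response: &str) -> Vec<&str> {
    const OPEN: &str = "<tool_call>";
    const CLOSE: &str = "</tool_call>";

    let mut calls = Vec::new();
    let mut rest = response;

    while let Some(start) = rest.find(OPEN) {
        // Skip past the opening tag.
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(CLOSE) {
            Some(end) => {
                calls.push(after_open[..end].trim());
                // Keep scanning after the closing tag.
                rest = &after_open[end + CLOSE.len()..];
            }
            // Unterminated tag: ignore the remainder of the text.
            None => break,
        }
    }
    calls
}
```

Each returned slice borrows from the original response, so nothing is copied until the caller decides to parse the JSON body.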

The agent loop looks up a tool in a very simple way:

fn find_tool<'a>(tools: &'a [Box<dyn Tool>], name: &str) -> Option<&'a dyn Tool> {
    tools.iter().find(|t| t.name() == name).map(|t| t.as_ref())
}

The format sent to the LLM (using OpenAI as the example):

fn tools_to_openai_format(tools: &[Box<dyn Tool>]) -> Vec<serde_json::Value> {
    tools.iter().map(|tool| {
        json!({
            "type": "function",
            "function": {
                "name": tool.name(),
                "description": tool.description(),
                "parameters": tool.parameters_schema()
            }
        })
    }).collect()
}

Cross-Provider JSON Schema Normalization

LLM APIs differ widely in how much of JSON Schema they support, Gemini in particular: it rejects many standard JSON Schema keywords. SchemaCleanr in src/tools/schema.rs exists to handle this:

pub enum CleaningStrategy {
    Gemini,       // strictest
    Anthropic,
    OpenAI,       // most permissive
    Conservative, // conservative strategy, suited to unknown providers
}

pub struct SchemaCleanr;

impl SchemaCleanr {
    pub fn clean_for_gemini(schema: Value) -> Value { ... }
    pub fn clean_for_anthropic(schema: Value) -> Value { ... }
    pub fn clean_for_openai(schema: Value) -> Value { ... }
}

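The main differences between the strategies:

Keyword	Gemini	Anthropic	OpenAI
`minLength`, `pattern`	removed	removed	kept
`$ref`	inlined	inlined	kept
`additionalProperties`	removed	kept	kept
`anyOf`/`oneOf`	flattened to the first type	kept	kept
`nullable`	format converted	kept	kept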

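Gemini even needs a special conversion for anyOf: [{"type": "string"}, {"type": "null"}], a common way of writing nullable. All of these edge cases are hidden inside SchemaCleanr; the tools themselves do not need to know about them.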
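
The "removed" rows of the table boil down to a recursive walk that drops unsupported keys. The sketch below shows that one idea on a tiny hand-rolled Json enum so that it stays std-only; the enum, obj, and strip_keywords are hypothetical names for this illustration, and the real SchemaCleanr works on serde_json::Value and also handles $ref inlining and anyOf flattening, which are not shown.

```rust
use std::collections::BTreeMap;

// Minimal JSON model for this sketch only.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Num(i64),
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// Convenience constructor for objects.
pub fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Object(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Recursively drop every object key listed in `unsupported`.
pub fn strip_keywords(schema: Json, unsupported: &[&str]) -> Json {
    match schema {
        Json::Object(map) => Json::Object(
            map.into_iter()
                .filter(|(key, _)| !unsupported.contains(&key.as_str()))
                .map(|(key, value)| (key, strip_keywords(value, unsupported)))
                .collect(),
        ),
        Json::Array(items) => Json::Array(
            items
                .into_iter()
                .map(|item| strip_keywords(item, unsupported))
                .collect(),
        ),
        // Scalars pass through untouched.
        other => other,
    }
}
```

One limitation of this naive walk: a property that happens to be named pattern or minLength would be removed as well, so a production cleaner has to treat the keys under properties as names rather than keywords.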

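Security Is Injected, Not Global

zeroclaw's security design rests on one core principle: the security policy is injected at construction time rather than held as global static state.

// Every tool that performs I/O receives a SecurityPolicy at construction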
pub struct FileReadTool {
    security: Arc<SecurityPolicy>,
}

pub struct ShellTool {
    security: Arc<SecurityPolicy>,
    runtime: Arc<dyn RuntimeAdapter>,
}

async fn execute(&self, args: serde_json::Value) -> anyhow::Result<ToolResult> {
    let path = args.get("path").and_then(|v| v.as_str())...;
    // The security check lives inside execute, so it cannot be bypassed
    if !self.security.is_path_allowed(path) {
        return Ok(ToolResult { success: false, error: Some("Path not allowed".into()), ... });
    }
    // Continue execution...
}

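SecurityPolicy enforces the workspace sandbox (blocking reads and writes outside the working directory), the autonomy level (ReadOnly, Supervised, Full), rate limiting, and a command allowlist.

In Supervised mode there is also an ApprovalManager, which pauses before a high-risk tool runs and waits for the user to confirm: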

[zeroclaw] Tool: shell
Command: rm -rf /tmp/build
[y]es / [n]o / [a]lways: _

After choosing always, the command is added to the session allowlist, and the same command will not trigger a prompt again.
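
The yes / no / always behaviour can be sketched with a set of remembered commands. SessionApprovals, Answer, and check are hypothetical names for this illustration and do not reflect ApprovalManager's real interface; the prompt is passed in as a closure so the logic can be exercised without a terminal.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Answer {
    Yes,
    No,
    Always,
}

// Hypothetical per-session approval state.
#[derive(Default)]
pub struct SessionApprovals {
    always_allowed: HashSet<String>,
}

impl SessionApprovals {
    /// Returns true if the command may run. `ask` is only invoked when
    /// the command is not already on the session allowlist.
    pub fn check(&mut self, command: &str, ask: impl FnOnce(&str) -> Answer) -> bool {
        if self.always_allowed.contains(command) {
            return true;
        }
        match ask(command) {
            Answer::Yes => true,
            Answer::No => false,
            Answer::Always => {
                // Remember the exact command for the rest of the session.
                self.always_allowed.insert(command.to_string());
                true
            }
        }
    }
}
```

Because the allowlist lives in a value owned by the session rather than in global state, it disappears when the session ends.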

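Interesting Concrete Tools

DelegateTool: A Tool with a Dynamic Schema

DelegateTool is one of the most interesting tools: its parameters_schema() is generated dynamically at runtime instead of being hard-coded: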

pub struct DelegateTool {
    agents: Arc<HashMap<String, DelegateAgentConfig>>,
    depth: u32,  // guards against infinite delegation recursion
}

fn parameters_schema(&self) -> serde_json::Value {
    // Generate the schema dynamically from the currently configured agents
    let agent_names: Vec<&str> = self.agents.keys().map(|s| s.as_str()).collect();
    json!({
        "properties": {
            "agent": {
                "type": "string",
                "description": format!("Which agent to delegate to. Available: {}",
                                       agent_names.join(", "))
            },
            "task": { "type": "string", "description": "Task description for the agent" }
        }
    })
}

async fn execute(&self, args: serde_json::Value) -> anyhow::Result<ToolResult> {
    if self.depth >= MAX_DELEGATION_DEPTH {
        return Ok(ToolResult { success: false, error: Some("Max delegation depth reached".into()), ... });
    }
    let provider = create_provider(&agent_config.provider, ...)?;
    // 120-second timeout, so a sub-agent cannot run for too long
    tokio::time::timeout(Duration::from_secs(120),
        provider.chat_with_system(...)).await
}

The depth field prevents an infinite loop in which agent A delegates to agent B and B delegates back to A.
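
A depth counter of this kind works because each delegation creates its child with depth + 1, so any chain, circular or not, hits the limit after a fixed number of hops. The sketch below illustrates that; the Delegator type is hypothetical and the limit of 3 is an assumed value, since the article does not state what MAX_DELEGATION_DEPTH is.

```rust
// Assumed value for illustration; the real constant is not given in the text.
pub const MAX_DELEGATION_DEPTH: u32 = 3;

// Hypothetical stand-in for an agent that can delegate.
#[derive(Debug)]
pub struct Delegator {
    depth: u32,
}

impl Delegator {
    pub fn root() -> Self {
        Delegator { depth: 0 }
    }

    pub fn depth(&self) -> u32 {
        self.depth
    }

    /// Hand off to a sub-agent. The child carries depth + 1, so an
    /// A -> B -> A -> ... chain is cut off at the limit.
    pub fn delegate(&self) -> Result<Delegator, String> {
        if self.depth >= MAX_DELEGATION_DEPTH {
            return Err("Max delegation depth reached".to_string());
        }
        Ok(Delegator { depth: self.depth + 1 })
    }
}
```

The check does not need to know which agents are involved; it only counts hops, which is why it also covers longer cycles such as A to B to C and back to A.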

ShellTool: A Clean Execution Environment

ShellTool is designed cautiously: it clears the entire environment and keeps only a safe allowlist of variables:

async fn execute(&self, args: serde_json::Value) -> anyhow::Result<ToolResult> {
    // Clear all environment variables so API keys do not leak to the child process
    let mut cmd = Command::new("sh");
    cmd.env_clear();  // wipe everything
    for key in SAFE_ENV_VARS {  // add back only the allowlisted ones
        if let Ok(val) = std::env::var(key) {
            cmd.env(key, val);
        }
    }
    // 60-second timeout; output is truncated to 1MB
    tokio::time::timeout(Duration::from_secs(60), cmd.output()).await
}

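SAFE_ENV_VARS contains variables the system needs, such as PATH, HOME, and LANG, but no sensitive variables such as *_API_KEY or *_SECRET. Even if the main process has them in its environment, the child process cannot see them.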
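
The same clear-then-allowlist pattern can be tried with std::process::Command alone. In this sketch the parent environment is passed in as a map so the behaviour is deterministic; sandboxed_sh is a hypothetical helper, and the three-entry SAFE_ENV_VARS list is an assumption based on the examples named above, not zeroclaw's full list.

```rust
use std::collections::HashMap;
use std::process::Command;

// Assumed subset; the article names these three as examples.
pub const SAFE_ENV_VARS: &[&str] = &["PATH", "HOME", "LANG"];

/// Build an `sh -c <script>` command whose environment contains only
/// the allowlisted variables found in `parent_env`.
pub fn sandboxed_sh(script: &str, parent_env: &HashMap<String, String>) -> Command {
    let mut cmd = Command::new("sh");
    cmd.arg("-c").arg(script);
    // Start from an empty environment instead of inheriting the parent's.
    cmd.env_clear();
    for key in SAFE_ENV_VARS {
        if let Some(value) = parent_env.get(*key) {
            cmd.env(key, value);
        }
    }
    cmd
}
```

Command::get_envs() lists the variables explicitly set on the command, which makes it easy to confirm that something like OPENAI_API_KEY never reaches the child.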

The Complete Tool Call Flow

sequenceDiagram
    participant U as User
    participant AL as Agent Loop
    participant P as Provider
    participant LLM as LLM API
    participant T as Tool
    U->>AL: Send message
    AL->>P: tools_to_provider_format(specs)
    P->>LLM: chat request + tool definitions
    LLM-->>P: Response (with tool calls)
    P-->>AL: ChatResponse { tool_calls: [...] }
    loop Up to 10 times
        AL->>AL: find_tool(name)
        AL->>AL: ApprovalManager (Supervised mode)
        AL->>T: execute(args)
        T->>T: SecurityPolicy check
        T-->>AL: ToolResult
        AL->>P: Convert the result to a ConversationMessage
        P->>LLM: Continue the conversation (with tool results)
        LLM-->>P: New response
        alt No more tool calls
            P-->>AL: Plain-text response
            AL-->>U: Final answer
        end
    end

The whole flow loops at most 10 times (DEFAULT_MAX_TOOL_ITERATIONS). Past that it is forcibly stopped, so the agent cannot call tools indefinitely.
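
The iteration cap amounts to a bounded loop around the model. The sketch below shows only that control flow: Reply and run_loop are hypothetical names, the model is simulated by a closure, and tool execution is reduced to a comment. Only the constant name and its value of 10 come from the text above.

```rust
pub const DEFAULT_MAX_TOOL_ITERATIONS: usize = 10;

// Hypothetical: what the model returned on a given turn.
pub enum Reply {
    ToolCall(String),
    Text(String),
}

/// Ask the model for replies until it answers in plain text, or give up
/// once the iteration budget is spent.
pub fn run_loop(mut next_reply: impl FnMut(usize) -> Reply) -> Result<String, String> {
    for iteration in 0..DEFAULT_MAX_TOOL_ITERATIONS {
        match next_reply(iteration) {
            // No tool call: this is the final answer.
            Reply::Text(answer) => return Ok(answer),
            // A real loop would look up the tool, run it, and append
            // the result to the conversation before the next turn.
            Reply::ToolCall(_name) => {}
        }
    }
    Err(format!(
        "stopped after {} tool iterations",
        DEFAULT_MAX_TOOL_ITERATIONS
    ))
}
```

A model that keeps requesting tools is therefore queried exactly ten times before the loop returns an error.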

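Summary

A few things in zeroclaw's Tool design are worth borrowing:

Minimal interface: four required methods, with spec() as a free default implementation, so the barrier to adding a tool is low.
Factory functions, no reflection: explicit assembly and conditional enabling keep dependencies clear, with no magic.
Both dispatch modes covered: native API tool calling and XML prompt injection are both supported, so it works with a wide range of LLMs.
Schema normalization is a first-class concern: SchemaCleanr handles each API's quirky restrictions in one place, keeping the tools themselves clean.
Security is injected at construction: SecurityPolicy goes in at new(), is not global state, and cannot be bypassed.
An execution failure is not a program crash: ToolResult { success: false } lets the agent learn from a failure and keep trying.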