Plan: Telegram Ops Bot + LLM Router + Agent-on-Server
Status: Proposed, not yet implemented
Author: Planning agent, for Mo
Date: 2026-04-16
0. Executive brief
Extend the existing qgtm_bot/ Telegram integration into a full ops interface that (a) answers operational questions, (b) can safely execute a narrow set of privileged actions on the DO droplet with human-in-the-loop confirmation, and (c) is backed by a resilient LLM router that prefers local Ollama but escalates to Claude / OpenAI when local quality or availability isn't good enough. Surface the same capability in the Next.js dashboard via a new /ops page.
The plan is explicit: the LLM is never allowed to mutate state without a human typing a confirm token. "Intelligent" here means the router picks the best answerer; the actor is always a constrained tool executor gated by a 2-step Telegram confirm.
1. Architecture
flowchart LR
subgraph Clients
TG[Telegram app<br/>@qgtm_trading_bot]
WEB[qgtm_web /ops page]
end
subgraph Bot["qgtm_bot (existing process, extended)"]
LISTEN[command_listener<br/>long-poll]
DISPATCH[ops_commands dispatcher]
AUDIT[audit trail writer]
CONFIRM[pending_confirms registry<br/>Redis TTL]
end
subgraph Ops["qgtm_ops (new package)"]
ROUTER[llm_router]
TOOLS[tool_registry<br/>+ executor]
QUALITY[quality_scorer]
CACHE[response_cache<br/>Redis]
end
subgraph LLMs
OLLAMA[(Ollama<br/>llama3.2:3b<br/>localhost:11434)]
CLAUDE[(Anthropic API)]
OPENAI[(OpenAI API)]
end
subgraph Server["DO droplet"]
DAEMON[qgtm-daemon<br/>systemd unit]
API[qgtm-api<br/>systemd unit]
REDIS[(Redis)]
end
TG -->|getUpdates| LISTEN
WEB -->|POST /api/v1/ops/ask| API
API --> ROUTER
LISTEN --> DISPATCH
DISPATCH --> ROUTER
DISPATCH --> CONFIRM
DISPATCH --> AUDIT
ROUTER -->|primary| OLLAMA
ROUTER -->|fallback 1| CLAUDE
ROUTER -->|fallback 2| OPENAI
ROUTER --> QUALITY
ROUTER --> CACHE
CACHE --- REDIS
CONFIRM --- REDIS
ROUTER -->|tool-use JSON| TOOLS
TOOLS -->|systemctl restart qgtm-daemon| DAEMON
TOOLS -->|POST /api/v1/risk/kill_tier| API
TOOLS -->|read journalctl| DAEMON
AUDIT --> REDIS
2. Module breakdown
2.1 New package — qgtm_ops/
| File | Responsibility |
|---|---|
| `qgtm_ops/__init__.py` | Package marker |
| `qgtm_ops/llm_router.py` | Primary router: route / score / fall through / cache / cost-account |
| `qgtm_ops/providers/ollama.py` | Thin wrapper on existing `qgtm_bot.intelligence._query_ollama` pattern |
| `qgtm_ops/providers/anthropic.py` | Claude API client (Messages API, tool use) |
| `qgtm_ops/providers/openai.py` | OpenAI Chat Completions client (function calling) |
| `qgtm_ops/quality.py` | `score_response()`, `self_critique()`, heuristic checks |
| `qgtm_ops/cache.py` | Redis-backed response cache with SHA256-keyed prompts |
| `qgtm_ops/cost.py` | Token & dollar accounting; writes to Redis hash `qgtm:ops:cost:{yyyymm}` |
| `qgtm_ops/tools/registry.py` | Tool schema definitions (name, args, effect, confirm_required) |
| `qgtm_ops/tools/executor.py` | Safe executor; only invokes whitelisted tools; records audit |
| `qgtm_ops/tools/impl/daemon.py` | `daemon_restart`, `daemon_status`, `fetch_logs` |
| `qgtm_ops/tools/impl/risk.py` | `set_kill_tier`, `flatten_all`, `run_reconciliation` |
| `qgtm_ops/tools/impl/strategy.py` | `toggle_strategy`, `rebalance_now` |
| `qgtm_ops/audit.py` | Re-uses `qgtm_core.audit_log.AuditEntry`; adds `OPS_COMMAND` event type |
| `qgtm_ops/confirm.py` | `request_confirm()` / `verify_confirm()`: Redis TTL-backed 2-step flow |
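The monthly cost hash named for `qgtm_ops/cost.py` could be sketched as follows. This is only an illustration: `cost_key`, `CostLedger`, and the in-memory dictionaries are hypothetical stand-ins for the Redis hash, and only the key pattern `qgtm:ops:cost:{yyyymm}` comes from the table above.

```python
from collections import defaultdict
from datetime import date


def cost_key(day: date) -> str:
    """Build the monthly hash key, e.g. qgtm:ops:cost:202604."""
    return f"qgtm:ops:cost:{day:%Y%m}"


class CostLedger:
    """In-memory stand-in for the Redis hash (hypothetical helper)."""

    def __init__(self) -> None:
        # key -> {provider -> accumulated USD}
        self._hashes: dict[str, dict[str, float]] = defaultdict(
            lambda: defaultdict(float)
        )

    def record(self, day: date, provider: str, cost_usd: float) -> None:
        """Add one call's cost to the provider field of that month's hash."""
        self._hashes[cost_key(day)][provider] += cost_usd

    def month_total(self, day: date) -> float:
        """Sum spend across all providers for the month containing `day`."""
        return sum(self._hashes[cost_key(day)].values())


ledger = CostLedger()
ledger.record(date(2026, 4, 16), "claude", 1.42)
ledger.record(date(2026, 4, 17), "openai", 0.11)
```

Against a real Redis instance the `record` step would most likely map to an `HINCRBYFLOAT` on the same key, which keeps the increment atomic across processes.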
2.2 Extensions to existing modules
| File | Change |
|---|---|
| `qgtm_bot/telegram.py` | Add ops command branch; no mutation inside the router, everything destructive goes through `qgtm_ops.tools.executor`. Chat-ID allowlist moved to `settings.telegram_ops_allowlist`. |
| `qgtm_bot/ops_commands.py` | NEW: handlers for `/status` (alias existing), `/why`, `/debug`, `/restart`, `/flatten`, `/tier`, `/rebalance`, `/toggle`, `/health` |
| `qgtm_core/config.py` | Add: `anthropic_api_key`, `openai_api_key`, `llm_router_mode`, `llm_router_quality_threshold`, `llm_cache_ttl_sec`, `telegram_ops_allowlist`, `telegram_ops_confirm_ttl_sec`, `ops_bot_enabled`, `ops_monthly_budget_usd` |
| `qgtm_core/audit_log.py` | Add `OPS_COMMAND` + `OPS_TOOL_EXECUTION` to `AuditEventType` enum |
| `qgtm_api/ops_routes.py` | NEW: `POST /api/v1/ops/ask`, `POST /api/v1/ops/execute`, `GET /api/v1/ops/audit?limit=N`, `GET /api/v1/ops/cost` |
| `qgtm_api/main.py` | Register `ops_routes.router` |
| `qgtm_web/src/app/ops/page.tsx` | NEW: dashboard integration |
| `qgtm_web/src/app/ops/components/ModelSelector.tsx` | NEW: pin / auto toggle |
| `qgtm_web/src/app/ops/components/AuditTail.tsx` | NEW: live poll of audit |
| `qgtm_web/src/app/ops/components/AskBox.tsx` | NEW: mirrors `/ask` but hits `/ops/ask` |
| `qgtm_web/src/app/ops/components/CostMetrics.tsx` | NEW: per-provider spend |
2.3 Tests
| File | Scope |
|---|---|
| `tests/test_llm_router.py` | Unit: routing logic, fallback cascade, cache hits, quality threshold |
| `tests/test_llm_router_quality.py` | Unit: scoring heuristics with synthetic good/bad responses |
| `tests/test_ops_tools.py` | Unit: each tool's arg validation + confirm requirement |
| `tests/test_ops_confirm.py` | Unit: confirm token generation, TTL expiry, wrong-token rejection |
| `tests/test_ops_bot_mock.py` | Mock-Telegram updates in → expected outputs |
| `tests/test_ops_integration.py` | End-to-end: fake LLM → tool proposal → confirm → executor no-op → audit entry |
| `tests/test_ops_auth.py` | Non-allowlisted chat IDs rejected for every command |
| `tests/test_ops_routes.py` | API integration: auth, ask, execute, audit tail |
| `qgtm_web/src/app/ops/__tests__/page.test.tsx` | React component tests |
3. API contract — qgtm_ops/llm_router.py
from dataclasses import dataclass
from enum import StrEnum
from typing import Any, Literal

# ToolSpec and ToolCall are defined in qgtm_ops/tools/registry.py and
# qgtm_ops/tools/executor.py (see "Supporting").


class Provider(StrEnum):
    OLLAMA = "ollama"
    CLAUDE = "claude"
    OPENAI = "openai"


class RouteMode(StrEnum):
    AUTO = "auto"
    OLLAMA = "ollama"
    CLAUDE = "claude"
    OPENAI = "openai"


@dataclass(frozen=True)
class RouterRequest:
    # Required fields first: a dataclass field without a default
    # cannot follow a field that has one.
    prompt: str
    kind: Literal["debug", "status", "why", "tool_use", "brief", "eod", "free"]
    user_id: str
    correlation_id: str
    pin: RouteMode = RouteMode.AUTO
    tools: list[ToolSpec] | None = None
    context: dict[str, Any] | None = None
    cache: bool = True


@dataclass(frozen=True)
class RouterResponse:
    text: str
    tool_calls: list[ToolCall]
    provider: Provider
    quality_score: float
    latency_ms: int
    tokens_in: int
    tokens_out: int
    cost_usd: float
    cache_hit: bool
    fallbacks_used: list[Provider]


async def route(req: RouterRequest) -> RouterResponse: ...
async def health() -> dict[Provider, bool]: ...
async def costs(month: str | None = None) -> dict[Provider, dict[str, float]]: ...
async def reset_cache() -> int: ...
Supporting:
# qgtm_ops/quality.py
def score_response(
    prompt: str,
    response: str,
    kind: str,
    *,
    ground_truth_keywords: set[str] | None = None,
) -> float: ...

async def self_critique(response: str, critic: Provider = Provider.OLLAMA) -> float: ...


# qgtm_ops/tools/registry.py
@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str
    params_schema: dict[str, Any]
    effect: Literal["read", "mutate", "destructive"]
    confirm_required: bool
    min_role: Literal["owner", "founder"]

def get_all_tools() -> list[ToolSpec]: ...
def get_tool(name: str) -> ToolSpec | None: ...


# qgtm_ops/tools/executor.py
@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict[str, Any]

@dataclass(frozen=True)
class ToolResult:
    ok: bool
    output: str
    duration_ms: int
    audit_sequence: int

async def execute(
    call: ToolCall,
    *,
    actor_id: str,
    confirmed_token: str | None,
    correlation_id: str,
) -> ToolResult: ...
4. Telegram command spec
All commands scoped by chat-ID allowlist (config: telegram_ops_allowlist). Non-allowlisted chat IDs get no reply. Known IDs: Mo 6902420777, Naz 8383804765.
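A minimal sketch of the allowlist check, assuming `telegram_ops_allowlist` holds the comma-separated string described in the config. The helper names `parse_allowlist` and `is_allowed` are hypothetical; the silent-drop behavior for unknown chat IDs is left to the caller.

```python
def parse_allowlist(raw: str) -> frozenset[int]:
    """Parse a comma-separated chat-ID string; blank entries are ignored."""
    return frozenset(int(part) for part in raw.split(",") if part.strip())


def is_allowed(chat_id: int, allowlist: frozenset[int]) -> bool:
    """True only for allowlisted chat IDs; callers send no reply otherwise."""
    return chat_id in allowlist


# Example value shaped like the setting; an empty string yields an empty set,
# which rejects every chat ID (fail closed).
allowlist = parse_allowlist("6902420777, 8383804765")
```

Parsing once at startup into a `frozenset` keeps the per-update check to a single membership test and makes a malformed entry fail loudly at boot rather than at message time.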
Read-only (tier: owner)
| Cmd | Args | Behavior | Router kind |
|---|---|---|---|
| `/status` | — | Existing handler | n/a |
| `/positions` | — | Existing handler | n/a |
| `/pnl` | — | Existing handler | n/a |
| `/health` | — | NEW: daemon heartbeat age, redis ping, alpaca ping, Ollama ping, Claude/OpenAI health | n/a |
| `/why` | `<strategy_id>` | NEW: explain current signals for one strategy | `why` |
| `/debug` | `<freeform>` or `--claude <freeform>` | NEW: freeform router Q&A | `debug` |
| `/audit` | `[limit]` | NEW: last N ops audit entries | n/a |
| `/cost` | — | NEW: LLM spend this month per provider | n/a |
Destructive (tier: founder)
Every destructive command uses a two-step typed confirm:
- User types the command (e.g. `/restart`)
- Bot replies: `Confirm by sending: CONFIRM <token>` (6-char random string, 120s TTL)
- User types `CONFIRM <token>` from the same chat ID
- Bot executes, replies with outcome + audit sequence
| Cmd | Args | Tool invoked | Effect |
|---|---|---|---|
| `/restart` | — | `daemon_restart` | `systemctl restart qgtm-daemon` |
| `/flatten` | — | `flatten_all` | Sets kill tier FLATTEN, cancels all orders, closes all positions |
| `/tier` | `NORMAL\|WARN\|NO_NEW\|FLATTEN` | `set_kill_tier` | POST to risk API |
| `/rebalance` | — | `rebalance_now` | Triggers one-shot rebalance |
| `/toggle` | `<strategy_id>` | `toggle_strategy` | Toggle strategy feature flag |
| `/recon` | — | `run_reconciliation` | Triggers reconciliation |
| `/logs` | `<service> [lines]` | `fetch_logs` | Read-only, gated |
Confirmation flow:
user: /tier FLATTEN
bot: WARNING: set kill tier to FLATTEN will halt new orders and flatten positions.
Type "CONFIRM A3F9K2" within 120s to proceed, or /cancel to abort.
user: CONFIRM A3F9K2
bot: Executed set_kill_tier(FLATTEN). audit_seq=84211. kill tier is now FLATTEN.
5. Tool-use layer — exact list
| Tool | Effect | Confirm | Implementation |
|---|---|---|---|
| `daemon_status` | read | no | Read /var/run/qgtm/daemon_state.json |
| `fetch_logs(service, lines=200)` | read | no | journalctl -u {service} -n {lines} with service allowlist |
| `daemon_restart()` | destructive | YES | sudo systemctl restart qgtm-daemon via sudoers NOPASSWD |
| `set_kill_tier(tier)` | destructive | YES | POST /api/v1/risk/kill_tier |
| `flatten_all()` | destructive | YES | Composite: tier=FLATTEN → cancel_all → close_all |
| `rebalance_now()` | destructive | YES | POST /api/v1/daemon/rebalance |
| `toggle_strategy(strategy_id, enabled)` | destructive | YES | POST /api/v1/strategies/{id}/toggle |
| `run_reconciliation()` | destructive | YES | POST /api/v1/daemon/reconcile |
| `get_pnl()`, `get_positions()`, `get_signals()` | read | no | Existing handlers |
Guard rails:
- Executor validates tool in registry — no arbitrary names
- effect==destructive and not confirmed_token → refuse
- Every tool execution appends OPS_TOOL_EXECUTION audit entry + Telegram alert
- Idempotency via correlation_id; duplicates within 5s rejected
- Rate limit: per-chat-ID, max 6 destructive calls per hour
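The guard rails above could be combined into one pre-execution check, sketched below. The `Guard` class, its return strings, and the two-entry `REGISTRY` are hypothetical; the limits (6 destructive calls per hour per chat ID, 5s duplicate window) are the ones listed. Audit and Telegram alerting are omitted.

```python
import time
from collections import defaultdict, deque

# Hypothetical subset of the tool registry: name -> effect.
REGISTRY = {"daemon_status": "read", "set_kill_tier": "destructive"}
MAX_DESTRUCTIVE_PER_HOUR = 6
DEDUP_WINDOW_SEC = 5


class Guard:
    """Pre-execution checks; state is in memory and never pruned (sketch only)."""

    def __init__(self, clock=time.monotonic) -> None:
        self._clock = clock
        self._recent = defaultdict(deque)  # chat_id -> destructive call times
        self._seen: dict[str, float] = {}  # correlation_id -> last seen time

    def check(self, tool, chat_id, correlation_id, confirmed_token) -> str:
        now = self._clock()
        effect = REGISTRY.get(tool)
        if effect is None:
            return "refuse: unknown tool"
        last = self._seen.get(correlation_id)
        if last is not None and now - last < DEDUP_WINDOW_SEC:
            return "refuse: duplicate"
        self._seen[correlation_id] = now
        if effect == "destructive":
            if not confirmed_token:
                return "refuse: confirm required"
            window = self._recent[chat_id]
            # Drop timestamps older than one hour before counting.
            while window and now - window[0] >= 3600:
                window.popleft()
            if len(window) >= MAX_DESTRUCTIVE_PER_HOUR:
                return "refuse: rate limited"
            window.append(now)
        return "ok"
```

Checking the registry first means an unknown tool name is refused before it can consume rate-limit budget or touch the idempotency table.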
Sudoers file — /etc/sudoers.d/qgtm-ops:
qgtm ALL=(root) NOPASSWD: /bin/systemctl restart qgtm-daemon
qgtm ALL=(root) NOPASSWD: /bin/systemctl restart qgtm-api
qgtm ALL=(root) NOPASSWD: /bin/journalctl -u qgtm-daemon *
qgtm ALL=(root) NOPASSWD: /bin/journalctl -u qgtm-api *
qgtm ALL=(root) NOPASSWD: /bin/journalctl -u qgtm-redis *
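Because the `journalctl` sudoers entries end in a wildcard, the service allowlist in `fetch_logs` carries most of the safety. One way to build the command is sketched here. `build_journalctl_argv`, `MAX_LINES`, and the use of `sudo -n` with `--no-pager` are assumptions, not part of the plan; the three unit names mirror the sudoers entries.

```python
ALLOWED_SERVICES = frozenset({"qgtm-daemon", "qgtm-api", "qgtm-redis"})
MAX_LINES = 1000  # hypothetical upper bound on requested log lines


def build_journalctl_argv(service: str, lines: int = 200) -> list[str]:
    """Return an argv list for journalctl, refusing anything off the allowlist."""
    if service not in ALLOWED_SERVICES:
        raise ValueError(f"service not allowlisted: {service!r}")
    if not 1 <= lines <= MAX_LINES:
        raise ValueError(f"lines must be in [1, {MAX_LINES}], got {lines}")
    # An argv list (no shell) means the arguments are never re-parsed.
    return [
        "sudo", "-n", "/bin/journalctl",
        "-u", service,
        "-n", str(lines),
        "--no-pager",
    ]
```

The resulting list would be passed to `subprocess.run(argv, ...)` without `shell=True`, so a value such as `"qgtm-api; rm -rf /"` is rejected by the allowlist and could not be interpreted by a shell even if it were not.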
6. Secrets
| Secret | Source | Purpose | Rotate by |
|---|---|---|---|
| `ANTHROPIC_API_KEY` | https://console.anthropic.com/settings/keys | Claude fallback | Generate new key → GH Actions Secrets → redeploy |
| `OPENAI_API_KEY` | https://platform.openai.com/api-keys | OpenAI fallback | Same flow |
| `TELEGRAM_OPS_ALLOWLIST` | manual, comma-separated chat IDs | Auth | Edit secret → redeploy |
| `QGTM_LLM_ROUTER_MODE` | `auto\|ollama\|claude\|openai` | Pin mode | Redeploy |
Config additions:
anthropic_api_key: str = ""
openai_api_key: str = ""
llm_router_mode: Literal["auto", "ollama", "claude", "openai"] = "auto"
llm_router_quality_threshold: float = 0.6
llm_cache_ttl_sec: int = 900
telegram_ops_allowlist: str = ""
telegram_ops_confirm_ttl_sec: int = 120
ops_bot_enabled: bool = False
ops_monthly_budget_usd: float = 50.0
7. Router quality-scoring algorithm
Compound score in [0.0, 1.0]; < threshold (default 0.6) triggers escalation.
score(response, kind, prompt, ground_truth):
    s_length  = clip(len(response) in [50, 1200] -> linear map [0,1], else 0)
    s_refusal = 0 if response starts with ("i cannot", "as an ai", "i'm sorry") else 1
    s_halluc  = 1 - count(hallucinated_tickers(response) not in known_universe) / max(count(tickers(response)),1)
    s_format  = 1 if response matches required_sections(kind) else 0.5
    s_gt      = jaccard(tokens(response), ground_truth) if ground_truth else 1.0
    weights   = kind_weights[kind]
    return sum(w_i * s_i) / sum(w_i)
Optional self-critique: second Ollama call grades its own answer 0-10.
| kind | threshold | self-critique |
|---|---|---|
| `status` / `why` | 0.5 | off |
| `debug` | 0.6 | on |
| `tool_use` | 0.7 (JSON parse must succeed) | off |
| `brief` / `eod` | 0.55 | off |
| `free` | 0.6 | on |
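The compound score could look roughly like this in Python. Several pieces are assumptions made for the sketch: the weights, the `$TICKER` cashtag regex, the `KNOWN_UNIVERSE` and `REQUIRED_SECTIONS` tables, and the reading of `s_length` as a linear ramp from 0 at 50 characters to 1 at 1200 characters. Self-critique is not included.

```python
import re

KNOWN_UNIVERSE = {"AAPL", "MSFT", "SPY"}  # hypothetical ticker universe
REFUSAL_PREFIXES = ("i cannot", "as an ai", "i'm sorry")
REQUIRED_SECTIONS = {"brief": ("Summary:", "Risks:")}  # hypothetical
# Hypothetical weights; the plan defines them per kind via kind_weights.
WEIGHTS = {"length": 1.0, "refusal": 2.0, "halluc": 2.0, "format": 1.0, "gt": 1.0}


def score_response(prompt, response, kind, *, ground_truth_keywords=None):
    """Weighted mean of five heuristic sub-scores, each in [0, 1]."""
    n = len(response)
    s_length = (n - 50) / (1200 - 50) if 50 <= n <= 1200 else 0.0

    lowered = response.strip().lower()
    s_refusal = 0.0 if lowered.startswith(REFUSAL_PREFIXES) else 1.0

    # Assumption: tickers appear as cashtags such as $AAPL.
    tickers = re.findall(r"\$([A-Z]{1,5})\b", response)
    unknown = [t for t in tickers if t not in KNOWN_UNIVERSE]
    s_halluc = 1.0 - len(unknown) / max(len(tickers), 1)

    sections = REQUIRED_SECTIONS.get(kind, ())
    s_format = 1.0 if all(s in response for s in sections) else 0.5

    if ground_truth_keywords:
        tokens = set(re.findall(r"[a-z0-9_]+", response.lower()))
        truth = {k.lower() for k in ground_truth_keywords}
        s_gt = len(tokens & truth) / len(tokens | truth)  # Jaccard index
    else:
        s_gt = 1.0

    scores = {"length": s_length, "refusal": s_refusal, "halluc": s_halluc,
              "format": s_format, "gt": s_gt}
    return sum(WEIGHTS[k] * scores[k] for k in scores) / sum(WEIGHTS.values())
```

With these illustrative weights a short refusal such as "I cannot help with that." lands just under the default 0.6 threshold, so it would escalate for the kinds that allow escalation. Real weights would need tuning against the synthetic good/bad fixtures.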
Escalation tree:
1. If req.pin != AUTO, hit that provider only.
2. Try OLLAMA.
   a. Timeout > 150s OR 5xx OR OOM → fall through.
   b. Score < threshold[kind] AND kind in {debug, tool_use, brief} → fall through.
   c. Else return Ollama.
3. Try CLAUDE (if key set AND budget not exceeded).
   a. 429 / 5xx / no key → fall through.
   b. Else return Claude.
4. Try OPENAI (same rules).
5. All fail → return hand-crafted deterministic fallback.
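The escalation tree can be sketched as a small async cascade. This is not the planned `route()` signature: providers are passed in as plain coroutines, `ProviderError` and `FALLBACK_TEXT` are hypothetical, and a missing dictionary entry stands in for "no key set or budget exceeded". A pinned provider's failure simply propagates, since the plan does not say what should happen in that case.

```python
import asyncio

ESCALATE_KINDS = {"debug", "tool_use", "brief"}
THRESHOLDS = {"status": 0.5, "why": 0.5, "debug": 0.6, "tool_use": 0.7,
              "brief": 0.55, "eod": 0.55, "free": 0.6}
FALLBACK_TEXT = "All LLM providers are unavailable."  # hypothetical wording


class ProviderError(Exception):
    """Hypothetical error raised by a provider on 429 / 5xx / OOM."""


async def route(prompt, kind, pin, providers, score, timeout_sec=150):
    """Return (provider_name, text, fallbacks_used)."""
    if pin != "auto":
        # Step 1: pinned provider only, no cascade.
        return pin, await providers[pin](prompt), []
    tried = []
    for name in ("ollama", "claude", "openai"):
        ask = providers.get(name)
        if ask is None:  # not configured or over budget
            continue
        try:
            text = await asyncio.wait_for(ask(prompt), timeout=timeout_sec)
        except (ProviderError, asyncio.TimeoutError):
            tried.append(name)
            continue
        # Step 2b: only Ollama answers are quality-gated, and only for some kinds.
        low_quality = score(text) < THRESHOLDS[kind]
        if name == "ollama" and kind in ESCALATE_KINDS and low_quality:
            tried.append(name)
            continue
        return name, text, tried
    # Step 5: deterministic fallback when every provider failed.
    return "fallback", FALLBACK_TEXT, tried
```

Returning the list of providers that were tried and skipped gives the caller what it needs to populate `fallbacks_used` on the response.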
8. Caching
- Backend: Redis
- Key: `qgtm:ops:cache:{sha256(prompt + kind + pinned_provider)}`
- Value: JSON `{text, provider, quality_score, tokens, cost, ts}`
- TTL: `llm_cache_ttl_sec` (default 900s)
- Bypass: `tool_use`, time-sensitive context (`daemon_state`), `pin != AUTO` with debug kind
- Invalidate on: kill-tier change, `/cancel_cache` by owner
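The key derivation and bypass rules above might be written as follows. The function names are hypothetical, and the unit-separator character between fields is an addition to the plain concatenation in the key spec, included so that different field combinations cannot collide on the same string.

```python
import hashlib


def cache_key(prompt: str, kind: str, pinned_provider: str) -> str:
    """Derive the Redis key from prompt, kind, and pinned provider."""
    # "\x1f" keeps ("ab", "c") and ("a", "bc") from hashing identically.
    material = "\x1f".join((prompt, kind, pinned_provider))
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return f"qgtm:ops:cache:{digest}"


def should_bypass(kind: str, pin: str, has_daemon_state: bool) -> bool:
    """Skip the cache for tool use, live daemon context, or pinned debug."""
    return (
        kind == "tool_use"
        or has_daemon_state
        or (pin != "auto" and kind == "debug")
    )
```

For example, `should_bypass("status", "auto", False)` is false and the request is served from cache when a fresh entry exists, while any request carrying `daemon_state` context always goes to a provider.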
9. Dashboard wireframe — /ops
┌──────────────────────────────────────────────────────────────────────┐
│ QGTM / OPS Mo | logout │
├──────────────────────────────────────────────────────────────────────┤
│ MODEL SELECTOR │
│ [ Auto ] [ Ollama ] [ Claude ] [ OpenAI ] Quality threshold: │
│ current: Auto (Ollama primary) [ 0.60 ▼ ] [Save] │
├────────────────────────────┬─────────────────────────────────────────┤
│ ASK │ AUDIT TAIL (live, polls every 3s) │
│ > _ │ 12:41:03 mo /status │
│ │ 12:41:02 claude debug query (382t) │
│ │ 12:39:10 ollama why mvg_v1 (95t) │
│ │ 12:38:54 mo /tier FLATTEN │
│ │ 12:38:54 executor set_kill_tier OK │
│ [ Send ] │ [ Load more ] │
├────────────────────────────┴─────────────────────────────────────────┤
│ FALLBACK METRICS (this month) │
│ ┌─────────────┬────────────┬───────────┬──────────┬───────────────┐│
│ │ Provider │ Calls │ Success% │ p50 ms │ Cost $ ││
│ ├─────────────┼────────────┼───────────┼──────────┼───────────────┤│
│ │ Ollama │ 1,412 │ 94.1% │ 620 │ $0.00 ││
│ │ Claude │ 38 │ 100% │ 1,850 │ $1.42 ││
│ │ OpenAI │ 4 │ 100% │ 1,210 │ $0.11 ││
│ └─────────────┴────────────┴───────────┴──────────┴───────────────┘│
│ Budget: $1.53 / $50.00 monthly cap │
└──────────────────────────────────────────────────────────────────────┘
10. Testing strategy
- Router: table-driven routing decisions, quality scoring fixtures, cache TTL, budget enforcement
- Tools: arg validation, confirm requirement negative tests, audit entry assertion
- Mock-Telegram: dispatcher fixtures, allowlist enforcement, confirm flow
- Integration: fake LLM emits tool call → executor → audit verified with `audit_log.verify_chain()`
- Frontend: page.test.tsx mocks all 3 API endpoints
Target: ≥ 85% line coverage on qgtm_ops/; every destructive tool has a confirm-required negative test.
11. Rollout
v0 — read-only + Ollama (1 week)
- `qgtm_ops/llm_router.py` with Ollama provider only
- Cache + cost accounting scaffolding
- Commands: `/health`, `/why`, `/debug` (no `--claude` yet), `/audit`, `/cost`
- No destructive commands, no dashboard, no Claude, no OpenAI
- Acceptance: `/debug` returns Ollama answer or deterministic fallback within 180s; non-allowlisted chat IDs get zero response; tests green
v0.5 — Claude fallback + quality router (1 week)
- Claude provider + quality-scored fallback for `debug` and `tool_use`
- `--claude` flag on `/debug`
- Acceptance: low-quality Ollama → Claude escalation; cost accounting real; budget cap refuses above limit
v1 — destructive commands + dashboard + tool-use (2 weeks)
- Tool registry + executor + sudoers (staging first)
- Confirmation flow via Redis
- Dashboard `/ops` page
- Acceptance: `/tier FLATTEN` without confirm = no-op; with valid confirm = audited execution; dashboard Ask mirrors Telegram; rate limit kicks in
v1.5 — OpenAI fallback + full cost tracking (3 days)
- OpenAI as tier-2 fallback
- CostMetrics dashboard panel
- Acceptance: Claude 429 → OpenAI used + logged
12. Non-goals (v0)
- No voice input
- No bot-to-bot / autonomous tool-chain loops
- No destructive tools in v0
- No fine-tuning
- No streaming in Telegram flow
- No RAG / vector DB
- No multi-group coordination
13. Risks
| # | Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|---|
| R1 | Telegram chat-ID spoofing | Low | Catastrophic | Allowlist + typed confirm + rate limit + Telegram-alert on any destructive |
| R2 | Ollama hallucination triggering destructive action | Medium | Catastrophic | LLM can only propose; human types confirm; tool registry is hard allowlist |
| R3 | API key leakage | Medium | High | Keys only in /opt/trading/.env, masked in logs, audit doesn't store prompt body |
| R4 | Cost blowup | Medium | Medium | Hard monthly budget + per-minute rate cap + circuit breaker on 5 consecutive errors |
| R5 | Sudoers misconfiguration | Low | Catastrophic | Golden file in repo + visudo -cf in deploy CI + 0440 perms |
| R6 | Confirm token brute-force | Very low | Catastrophic | 3 attempts/min/chat, one-time, tied to pending ToolCall |
| R7 | Redis outage drops confirms | Low | Medium | Re-issue on failure, systemd restart, docs flag UX |
| R8 | Ops bot becomes daemon dependency | Low | Medium | Separate process, daemon never imports qgtm_ops, flag-gated |
| R9 | Bad ops_bot deploy breaks control channel | Medium | Medium | Additive design — existing handlers keep working via current dispatcher |
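The R4 mitigations (hard monthly budget plus a circuit breaker on 5 consecutive errors) could be combined in one gate in front of the paid providers. This is a sketch under assumptions: `BudgetGuard` and its method names are hypothetical, state is in memory, and the per-minute rate cap and monthly reset are left out.

```python
class BudgetGuard:
    """Gate paid-provider calls on monthly spend and consecutive errors."""

    def __init__(self, monthly_budget_usd: float = 50.0,
                 max_consecutive_errors: int = 5) -> None:
        self._budget = monthly_budget_usd
        self._max_errors = max_consecutive_errors
        self.spent_usd = 0.0
        self.consecutive_errors = 0

    def allow(self) -> bool:
        """True while under budget and the breaker has not tripped."""
        return (
            self.spent_usd < self._budget
            and self.consecutive_errors < self._max_errors
        )

    def record_success(self, cost_usd: float) -> None:
        """Account for a completed call and close the breaker."""
        self.spent_usd += cost_usd
        self.consecutive_errors = 0

    def record_error(self) -> None:
        """Count a failed call toward the breaker."""
        self.consecutive_errors += 1


guard = BudgetGuard()
```

A router would call `guard.allow()` before each Claude or OpenAI request and fall through to the next step of the escalation tree when it returns false.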
14. Engineering cost — v0
- Production LOC: ~760 new + ~90 modified
- Test LOC: ~780
- Test count: 62 new tests
- Calendar: ~1 week of development (5 dev days), plus 3 days review/bake/runbooks
15. Top 3 risks (summary)
- LLM hallucination driving a destructive action: mitigated by the "LLM proposes, human confirms" invariant. Load-bearing file: `test_ops_confirm.py` with ≥ 8 negative tests.
- Sudoers widening root privilege on the droplet: golden file in `infra/systemd/sudoers.d/qgtm-ops`, `visudo -cf` validation in CI, 0440 perms.
- Cost blowup via runaway loop or compromised chat ID: hard monthly budget cap, per-minute rate cap, circuit breaker on consecutive errors, audit alerts on any single call > $0.25.
Critical files for implementation
- `qgtm_bot/telegram.py`: host for ops command dispatch
- `qgtm_bot/intelligence.py`: Ollama client pattern to extract into `qgtm_ops/providers/ollama.py`
- `qgtm_core/config.py`: all new settings
- `qgtm_core/audit_log.py`: append-only Merkle-chained audit log
- `qgtm_web/src/app/ask/page.tsx`: closest existing pattern for new `/ops` dashboard