
Plan: Telegram Ops Bot + LLM Router + Agent-on-Server

Status: Proposed — not yet implemented
Author: Planning agent, for Mo
Date: 2026-04-16

0. Executive brief

Extend the existing qgtm_bot/ Telegram integration into a full ops interface that (a) answers operational questions, (b) can safely execute a narrow set of privileged actions on the DO droplet with human-in-the-loop confirmation, and (c) is backed by a resilient LLM router that prefers local Ollama but escalates to Claude / OpenAI when local quality or availability isn't good enough. Surface the same capability in the Next.js dashboard via a new /ops page.

The plan is explicit: the LLM is never allowed to mutate state without a human typing a confirm token. "Intelligent" here means the router picks the best answerer; the actor is always a constrained tool executor gated by a 2-step Telegram confirm.

1. Architecture

flowchart LR
  subgraph Clients
    TG[Telegram app<br/>@qgtm_trading_bot]
    WEB[qgtm_web /ops page]
  end

  subgraph Bot["qgtm_bot (existing process, extended)"]
    LISTEN[command_listener<br/>long-poll]
    DISPATCH[ops_commands dispatcher]
    AUDIT[audit trail writer]
    CONFIRM[pending_confirms registry<br/>Redis TTL]
  end

  subgraph Ops["qgtm_ops (new package)"]
    ROUTER[llm_router]
    TOOLS[tool_registry<br/>+ executor]
    QUALITY[quality_scorer]
    CACHE[response_cache<br/>Redis]
  end

  subgraph LLMs
    OLLAMA[(Ollama<br/>llama3.2:3b<br/>localhost:11434)]
    CLAUDE[(Anthropic API)]
    OPENAI[(OpenAI API)]
  end

  subgraph Server["DO droplet"]
    DAEMON[qgtm-daemon<br/>systemd unit]
    API[qgtm-api<br/>systemd unit]
    REDIS[(Redis)]
  end

  TG -->|getUpdates| LISTEN
  WEB -->|POST /api/v1/ops/ask| API
  API --> ROUTER
  LISTEN --> DISPATCH
  DISPATCH --> ROUTER
  DISPATCH --> CONFIRM
  DISPATCH --> AUDIT
  ROUTER -->|primary| OLLAMA
  ROUTER -->|fallback 1| CLAUDE
  ROUTER -->|fallback 2| OPENAI
  ROUTER --> QUALITY
  ROUTER --> CACHE
  CACHE --- REDIS
  CONFIRM --- REDIS
  ROUTER -->|tool-use JSON| TOOLS
  TOOLS -->|systemctl restart qgtm-daemon| DAEMON
  TOOLS -->|POST /v1/risk/kill_tier| API
  TOOLS -->|read journalctl| DAEMON
  AUDIT --> REDIS

2. Module breakdown

2.1 New package — qgtm_ops/

File Responsibility
qgtm_ops/__init__.py Package marker
qgtm_ops/llm_router.py Primary router — route / score / fall through / cache / cost-account
qgtm_ops/providers/ollama.py Thin wrapper on existing qgtm_bot.intelligence._query_ollama pattern
qgtm_ops/providers/anthropic.py Claude API client (Messages API, tool use)
qgtm_ops/providers/openai.py OpenAI Chat Completions client (function calling)
qgtm_ops/quality.py score_response(), self_critique(), heuristic checks
qgtm_ops/cache.py Redis-backed response cache with SHA256-keyed prompts
qgtm_ops/cost.py Token & dollar accounting; writes to Redis hash qgtm:ops:cost:{yyyymm}
qgtm_ops/tools/registry.py Tool schema definitions (name, args, effect, confirm_required)
qgtm_ops/tools/executor.py Safe executor; only invokes whitelisted tools; records audit
qgtm_ops/tools/impl/daemon.py daemon_restart, daemon_status, fetch_logs
qgtm_ops/tools/impl/risk.py set_kill_tier, flatten_all, run_reconciliation
qgtm_ops/tools/impl/strategy.py toggle_strategy, rebalance_now
qgtm_ops/audit.py Re-uses qgtm_core.audit_log.AuditEntry; adds OPS_COMMAND event type
qgtm_ops/confirm.py request_confirm() / verify_confirm() — Redis TTL-backed 2-step flow

2.2 Extensions to existing modules

File Change
qgtm_bot/telegram.py Add ops command branch; no mutation inside the router — everything destructive goes through qgtm_ops.tools.executor. Chat-ID allowlist moved to settings.telegram_ops_allowlist.
qgtm_bot/ops_commands.py NEW — handlers for /status (alias existing), /why, /debug, /restart, /flatten, /tier, /rebalance, /toggle, /health
qgtm_core/config.py Add: anthropic_api_key, openai_api_key, llm_router_mode, llm_router_quality_threshold, llm_cache_ttl_sec, telegram_ops_allowlist, telegram_ops_confirm_ttl_sec, ops_bot_enabled, ops_monthly_budget_usd
qgtm_core/audit_log.py Add OPS_COMMAND + OPS_TOOL_EXECUTION to AuditEventType enum
qgtm_api/ops_routes.py NEW — POST /api/v1/ops/ask, POST /api/v1/ops/execute, GET /api/v1/ops/audit?limit=N, GET /api/v1/ops/cost
qgtm_api/main.py Register ops_routes.router
qgtm_web/src/app/ops/page.tsx NEW — dashboard integration
qgtm_web/src/app/ops/components/ModelSelector.tsx NEW — pin / auto toggle
qgtm_web/src/app/ops/components/AuditTail.tsx NEW — live poll of audit
qgtm_web/src/app/ops/components/AskBox.tsx NEW — mirrors /ask but hits /ops/ask
qgtm_web/src/app/ops/components/CostMetrics.tsx NEW — per-provider spend

2.3 Tests

File Scope
tests/test_llm_router.py Unit: routing logic, fallback cascade, cache hits, quality threshold
tests/test_llm_router_quality.py Unit: scoring heuristics with synthetic good/bad responses
tests/test_ops_tools.py Unit: each tool's arg validation + confirm requirement
tests/test_ops_confirm.py Unit: confirm token generation, TTL expiry, wrong-token rejection
tests/test_ops_bot_mock.py Mock-Telegram updates in → expected outputs
tests/test_ops_integration.py End-to-end: fake LLM → tool proposal → confirm → executor no-op → audit entry
tests/test_ops_auth.py Non-allowlisted chat IDs rejected for every command
tests/test_ops_routes.py API integration — auth, ask, execute, audit tail
qgtm_web/src/app/ops/__tests__/page.test.tsx React component tests

3. API contract — qgtm_ops/llm_router.py

from dataclasses import dataclass
from enum import StrEnum
from typing import Any, Literal

class Provider(StrEnum):
    OLLAMA = "ollama"
    CLAUDE = "claude"
    OPENAI = "openai"

class RouteMode(StrEnum):
    AUTO = "auto"
    OLLAMA = "ollama"
    CLAUDE = "claude"
    OPENAI = "openai"

@dataclass(frozen=True)
class RouterRequest:
    prompt: str
    kind: Literal["debug", "status", "why", "tool_use", "brief", "eod", "free"]
    user_id: str
    correlation_id: str
    pin: RouteMode = RouteMode.AUTO
    tools: list[ToolSpec] | None = None
    context: dict[str, Any] | None = None
    cache: bool = True

@dataclass(frozen=True)
class RouterResponse:
    text: str
    tool_calls: list[ToolCall]
    provider: Provider
    quality_score: float
    latency_ms: int
    tokens_in: int
    tokens_out: int
    cost_usd: float
    cache_hit: bool
    fallbacks_used: list[Provider]

async def route(req: RouterRequest) -> RouterResponse: ...
async def health() -> dict[Provider, bool]: ...
async def costs(month: str | None = None) -> dict[Provider, dict[str, float]]: ...
async def reset_cache() -> int: ...

Supporting:

# qgtm_ops/quality.py
def score_response(
    prompt: str,
    response: str,
    kind: str,
    *,
    ground_truth_keywords: set[str] | None = None,
) -> float: ...

async def self_critique(response: str, critic: Provider = Provider.OLLAMA) -> float: ...

# qgtm_ops/tools/registry.py
@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str
    params_schema: dict[str, Any]
    effect: Literal["read", "mutate", "destructive"]
    confirm_required: bool
    min_role: Literal["owner", "founder"]

def get_all_tools() -> list[ToolSpec]: ...
def get_tool(name: str) -> ToolSpec | None: ...

# qgtm_ops/tools/executor.py
@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict[str, Any]

@dataclass(frozen=True)
class ToolResult:
    ok: bool
    output: str
    duration_ms: int
    audit_sequence: int

async def execute(
    call: ToolCall,
    *,
    actor_id: str,
    confirmed_token: str | None,
    correlation_id: str,
) -> ToolResult: ...

4. Telegram command spec

All commands scoped by chat-ID allowlist (config: telegram_ops_allowlist). Non-allowlisted chat IDs get no reply. Known IDs: Mo 6902420777, Naz 8383804765.

Read-only (tier: owner)

Cmd Args Behavior Router kind
/status Existing handler n/a
/positions Existing handler n/a
/pnl Existing handler n/a
/health NEW: daemon heartbeat age, redis ping, alpaca ping, Ollama ping, Claude/OpenAI health n/a
/why <strategy_id> NEW: explain current signals for one strategy why
/debug <freeform> or --claude <freeform> NEW: freeform router Q&A debug
/audit [limit] NEW: last N ops audit entries n/a
/cost NEW: LLM spend this month per provider n/a

Destructive (tier: founder)

Every destructive command uses a two-step typed confirm:

  1. User types the command (e.g. /restart)
  2. Bot replies: Confirm by sending: CONFIRM <token> — 6-char random string, 120s TTL
  3. User types CONFIRM <token> from the same chat ID
  4. Bot executes, replies with outcome + audit sequence
Cmd Args Tool invoked Effect
/restart daemon_restart systemctl restart qgtm-daemon
/flatten flatten_all Sets kill tier FLATTEN, cancels all orders, closes all positions
/tier NORMAL|WARN|NO_NEW|FLATTEN set_kill_tier POST to risk API
/rebalance rebalance_now Triggers one-shot rebalance
/toggle <strategy_id> toggle_strategy Toggle strategy feature flag
/recon run_reconciliation Triggers reconciliation
/logs <service> [lines] fetch_logs Read-only, gated

Confirmation flow:

user: /tier FLATTEN
bot:  WARNING: setting kill tier to FLATTEN will halt new orders and flatten positions.
      Type "CONFIRM A3F9K2" within 120s to proceed, or /cancel to abort.
user: CONFIRM A3F9K2
bot:  Executed set_kill_tier(FLATTEN). audit_seq=84211. kill tier is now FLATTEN.

5. Tool-use layer — exact list

Tool Effect Confirm Implementation
daemon_status read no Read /var/run/qgtm/daemon_state.json
fetch_logs(service, lines=200) read no journalctl -u {service} -n {lines} with service allowlist
daemon_restart() destructive YES sudo systemctl restart qgtm-daemon via sudoers NOPASSWD
set_kill_tier(tier) destructive YES POST /api/v1/risk/kill_tier
flatten_all() destructive YES Composite: tier=FLATTEN → cancel_all → close_all
rebalance_now() destructive YES POST /api/v1/daemon/rebalance
toggle_strategy(strategy_id, enabled) destructive YES POST /api/v1/strategies/{id}/toggle
run_reconciliation() destructive YES POST /api/v1/daemon/reconcile
get_pnl(), get_positions(), get_signals() read no Existing handlers

Guard rails:

  • Executor validates tool in registry — no arbitrary names
  • effect==destructive and not confirmed_token → refuse
  • Every tool execution appends OPS_TOOL_EXECUTION audit entry + Telegram alert
  • Idempotency via correlation_id; duplicates within 5s rejected
  • Rate limit: per-chat-ID, max 6 destructive calls per hour

Sudoers file — /etc/sudoers.d/qgtm-ops:

qgtm ALL=(root) NOPASSWD: /bin/systemctl restart qgtm-daemon
qgtm ALL=(root) NOPASSWD: /bin/systemctl restart qgtm-api
qgtm ALL=(root) NOPASSWD: /bin/journalctl -u qgtm-daemon *
qgtm ALL=(root) NOPASSWD: /bin/journalctl -u qgtm-api *
qgtm ALL=(root) NOPASSWD: /bin/journalctl -u qgtm-redis *

6. Secrets

Secret Source Purpose Rotate by
ANTHROPIC_API_KEY https://console.anthropic.com/settings/keys Claude fallback Generate new key → GH Actions Secrets → redeploy
OPENAI_API_KEY https://platform.openai.com/api-keys OpenAI fallback Same flow
TELEGRAM_OPS_ALLOWLIST manual — comma-separated chat IDs Auth Edit secret → redeploy
QGTM_LLM_ROUTER_MODE auto|ollama|claude|openai Pin mode Redeploy

Config additions:

anthropic_api_key: str = ""
openai_api_key: str = ""
llm_router_mode: Literal["auto", "ollama", "claude", "openai"] = "auto"
llm_router_quality_threshold: float = 0.6
llm_cache_ttl_sec: int = 900
telegram_ops_allowlist: str = ""
telegram_ops_confirm_ttl_sec: int = 120
ops_bot_enabled: bool = False
ops_monthly_budget_usd: float = 50.0
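The comma-separated telegram_ops_allowlist setting needs parsing before the dispatcher can check chat IDs. A small sketch (helper names are illustrative, not part of the plan):

```python
def parse_allowlist(raw: str) -> frozenset[int]:
    """Parse telegram_ops_allowlist ("id1,id2,...") into a set of chat IDs."""
    return frozenset(int(part) for part in raw.split(",") if part.strip())

def is_allowed(chat_id: int, raw: str) -> bool:
    # Non-allowlisted chat IDs get no reply at all (section 4).
    return chat_id in parse_allowlist(raw)
```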

7. Router quality-scoring algorithm

Compound score in [0.0, 1.0]; < threshold (default 0.6) triggers escalation.

score(response, kind, prompt, ground_truth):
  s_length  = clip(len(response) in [50, 1200] -> linear map [0,1], else 0)
  s_refusal = 0 if response starts with ("i cannot", "as an ai", "i'm sorry") else 1
  s_halluc  = 1 - count(hallucinated_tickers(response) not in known_universe) / max(count(tickers(response)),1)
  s_format  = 1 if response matches required_sections(kind) else 0.5
  s_gt      = jaccard(tokens(response), ground_truth) if ground_truth else 1.0
  weights = kind_weights[kind]
  return sum(w_i * s_i) / sum(w_i)

Optional self-critique: second Ollama call grades its own answer 0-10.

kind threshold self-critique
status / why 0.5 off
debug 0.6 on
tool_use 0.7 (JSON parse must succeed) off
brief / eod 0.55 off
free 0.6 on

Escalation tree:

1. If req.pin != AUTO, hit that provider only.
2. Try OLLAMA.
   a. Timeout > 150s OR 5xx OR OOM → fall through.
   b. Score < threshold[kind] AND kind in {debug, tool_use, brief} → fall through.
   c. Else return Ollama.
3. Try CLAUDE (if key set AND budget not exceeded).
   a. 429 / 5xx / no key → fall through.
   b. Else return Claude.
4. Try OPENAI (same rules).
5. All fail → return hand-crafted deterministic fallback.
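The escalation tree condenses to an ordered-provider loop. A minimal sketch, with provider callables, thresholds, and the route_once name standing in for the real clients (budget and key checks are folded into the except branch):

```python
import asyncio

# Only these kinds escalate on low local quality (rule 2b above).
ESCALATE_KINDS = {"debug", "tool_use", "brief"}

async def route_once(kind, pin, providers, thresholds, score):
    """providers: ordered dict of name -> async callable returning text.
    score: callable grading a response in [0, 1]."""
    if pin != "auto":
        return pin, await providers[pin]()       # pinned: that provider only
    for name, call in providers.items():
        try:
            text = await call()
        except Exception:                        # timeout / 5xx / OOM / 429 / no key
            continue
        if (name == "ollama" and kind in ESCALATE_KINDS
                and score(text) < thresholds.get(kind, 0.6)):
            continue                             # local quality too low: escalate
        return name, text
    # All providers failed: deterministic fallback, never an exception.
    return "deterministic", "All providers unavailable; check /health."
```

Ordering the dict OLLAMA → CLAUDE → OPENAI gives the fallback cascade for free, since dicts preserve insertion order.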

8. Caching

  • Backend: Redis
  • Key: qgtm:ops:cache:{sha256(prompt + kind + pinned_provider)}
  • Value: JSON {text, provider, quality_score, tokens, cost, ts}
  • TTL: llm_cache_ttl_sec (default 900s)
  • Bypass: tool_use, time-sensitive context (daemon_state), pin != AUTO with debug kind
  • Invalidate on: kill-tier change, /cancel_cache by owner

9. Dashboard wireframe — /ops

┌──────────────────────────────────────────────────────────────────────┐
│  QGTM / OPS                                  Mo  | logout             │
├──────────────────────────────────────────────────────────────────────┤
│  MODEL SELECTOR                                                       │
│  [ Auto ] [ Ollama ] [ Claude ] [ OpenAI ]       Quality threshold:  │
│  current: Auto (Ollama primary)                  [ 0.60 ▼ ]  [Save]  │
├────────────────────────────┬─────────────────────────────────────────┤
│  ASK                       │  AUDIT TAIL (live, polls every 3s)     │
│  > _                       │  12:41:03  mo       /status            │
│                            │  12:41:02  claude   debug query (382t)  │
│                            │  12:39:10  ollama   why mvg_v1 (95t)    │
│                            │  12:38:54  mo       /tier FLATTEN       │
│                            │  12:38:54  executor set_kill_tier OK    │
│  [ Send ]                  │  [ Load more ]                          │
├────────────────────────────┴─────────────────────────────────────────┤
│  FALLBACK METRICS (this month)                                       │
│  ┌─────────────┬────────────┬───────────┬──────────┬───────────────┐│
│  │ Provider    │ Calls      │ Success%  │ p50 ms   │  Cost $       ││
│  ├─────────────┼────────────┼───────────┼──────────┼───────────────┤│
│  │ Ollama      │ 1,412      │ 94.1%     │ 620      │  $0.00        ││
│  │ Claude      │ 38         │ 100%      │ 1,850    │  $1.42        ││
│  │ OpenAI      │ 4          │ 100%      │ 1,210    │  $0.11        ││
│  └─────────────┴────────────┴───────────┴──────────┴───────────────┘│
│  Budget: $1.53 / $50.00 monthly cap                                  │
└──────────────────────────────────────────────────────────────────────┘

10. Testing strategy

  • Router: table-driven routing decisions, quality scoring fixtures, cache TTL, budget enforcement
  • Tools: arg validation, confirm requirement negative tests, audit entry assertion
  • Mock-Telegram: dispatcher fixtures, allowlist enforcement, confirm flow
  • Integration: fake LLM emits tool call → executor → audit verified with audit_log.verify_chain()
  • Frontend: page.test.tsx mocks all 3 API endpoints

Target: ≥ 85% line coverage on qgtm_ops/; every destructive tool has a confirm-required negative test.

11. Rollout

v0 — read-only + Ollama (1 week)

  • qgtm_ops/llm_router.py with Ollama provider only
  • Cache + cost accounting scaffolding
  • Commands: /health, /why, /debug (no --claude yet), /audit, /cost
  • No destructive commands, no dashboard, no Claude, no OpenAI
  • Acceptance: /debug returns Ollama answer or deterministic fallback within 180s; non-allowlisted chat IDs get zero response; tests green

v0.5 — Claude fallback + quality router (1 week)

  • Claude provider + quality-scored fallback for debug and tool_use
  • --claude flag on /debug
  • Acceptance: low-quality Ollama → Claude escalation; cost accounting real; budget cap refuses above limit

v1 — destructive commands + dashboard + tool-use (2 weeks)

  • Tool registry + executor + sudoers (staging first)
  • Confirmation flow via Redis
  • Dashboard /ops page
  • Acceptance: /tier FLATTEN without confirm = no-op; with valid confirm = audited execution; dashboard Ask mirrors Telegram; rate limit kicks in

v1.5 — OpenAI fallback + full cost tracking (3 days)

  • OpenAI as tier-2 fallback
  • CostMetrics dashboard panel
  • Acceptance: Claude 429 → OpenAI used + logged

12. Non-goals (v0)

  • No voice input
  • No bot-to-bot / autonomous tool-chain loops
  • No destructive tools in v0
  • No fine-tuning
  • No streaming in Telegram flow
  • No RAG / vector DB
  • No multi-group coordination

13. Risks

# Risk Likelihood Impact Mitigation
R1 Telegram chat-ID spoofing Low Catastrophic Allowlist + typed confirm + rate limit + Telegram-alert on any destructive
R2 Ollama hallucination triggering destructive action Medium Catastrophic LLM can only propose; human types confirm; tool registry is hard allowlist
R3 API key leakage Medium High Keys only in /opt/trading/.env, masked in logs, audit doesn't store prompt body
R4 Cost blowup Medium Medium Hard monthly budget + per-minute rate cap + circuit breaker on 5 consecutive errors
R5 Sudoers misconfiguration Low Catastrophic Golden file in repo + visudo -cf in deploy CI + 0440 perms
R6 Confirm token brute-force Very low Catastrophic 3 attempts/min/chat, one-time, tied to pending ToolCall
R7 Redis outage drops confirms Low Medium Re-issue on failure, systemd restart, docs flag UX
R8 Ops bot becomes daemon dependency Low Medium Separate process, daemon never imports qgtm_ops, flag-gated
R9 Bad ops_bot deploy breaks control channel Medium Medium Additive design — existing handlers keep working via current dispatcher

14. Engineering cost — v0

  • Production LOC: ~760 new + ~90 modified
  • Test LOC: ~780
  • Test count: 62 new tests
  • Calendar: ~1 week (5 dev days + 3 days review/bake/runbooks)

15. Top 3 risks (summary)

  1. LLM hallucination driving a destructive action — mitigated by the "LLM proposes, human confirms" invariant. Load-bearing file: test_ops_confirm.py with ≥ 8 negative tests.
  2. Sudoers widening root privilege on the droplet — golden file in infra/systemd/sudoers.d/qgtm-ops, visudo -cf validation in CI, 0440 perms.
  3. Cost blowup via runaway loop or compromised chat ID — hard monthly budget cap, per-minute rate cap, circuit breaker on consecutive errors, audit alerts on any single call > $0.25.

Critical files for implementation

  • qgtm_bot/telegram.py — host for ops command dispatch
  • qgtm_bot/intelligence.py — Ollama client pattern to extract into qgtm_ops/providers/ollama.py
  • qgtm_core/config.py — all new settings
  • qgtm_core/audit_log.py — append-only Merkle-chained audit log
  • qgtm_web/src/app/ask/page.tsx — closest existing pattern for new /ops dashboard