
Plan: Telegram Ops Bot + LLM Router + Agent-on-Server

Status: Proposed — not yet implemented
Author: Planning agent, for Mo
Date: 2026-04-16

0. Executive brief

Extend the existing qgtm_bot/ Telegram integration into a full ops interface that (a) answers operational questions, (b) can safely execute a narrow set of privileged actions on the DO droplet with human-in-the-loop confirmation, and (c) is backed by a resilient LLM router that prefers local Ollama but escalates to Claude / OpenAI when local quality or availability isn't good enough. Surface the same capability in the Next.js dashboard via a new /ops page.

The plan is explicit: the LLM is never allowed to mutate state without a human typing a confirm token. "Intelligent" here means the router picks the best answerer; the actor is always a constrained tool executor gated by a 2-step Telegram confirm.

1. Architecture

flowchart LR
  subgraph Clients
    TG[Telegram app<br/>@qgtm_trading_bot]
    WEB[qgtm_web /ops page]
  end

  subgraph Bot["qgtm_bot (existing process, extended)"]
    LISTEN[command_listener<br/>long-poll]
    DISPATCH[ops_commands dispatcher]
    AUDIT[audit trail writer]
    CONFIRM[pending_confirms registry<br/>Redis TTL]
  end

  subgraph Ops["qgtm_ops (new package)"]
    ROUTER[llm_router]
    TOOLS[tool_registry<br/>+ executor]
    QUALITY[quality_scorer]
    CACHE[response_cache<br/>Redis]
  end

  subgraph LLMs
    OLLAMA[(Ollama<br/>llama3.2:3b<br/>localhost:11434)]
    CLAUDE[(Anthropic API)]
    OPENAI[(OpenAI API)]
  end

  subgraph Server["DO droplet"]
    DAEMON[qgtm-daemon<br/>systemd unit]
    API[qgtm-api<br/>systemd unit]
    REDIS[(Redis)]
  end

  TG -->|getUpdates| LISTEN
  WEB -->|POST /api/v1/ops/ask| API
  API --> ROUTER
  LISTEN --> DISPATCH
  DISPATCH --> ROUTER
  DISPATCH --> CONFIRM
  DISPATCH --> AUDIT
  ROUTER -->|primary| OLLAMA
  ROUTER -->|fallback 1| CLAUDE
  ROUTER -->|fallback 2| OPENAI
  ROUTER --> QUALITY
  ROUTER --> CACHE
  CACHE --- REDIS
  CONFIRM --- REDIS
  ROUTER -->|tool-use JSON| TOOLS
  TOOLS -->|systemctl restart qgtm-daemon| DAEMON
  TOOLS -->|POST /v1/risk/kill_tier| API
  TOOLS -->|read journalctl| DAEMON
  AUDIT --> REDIS

2. Module breakdown

2.1 New package — qgtm_ops/

File Responsibility
qgtm_ops/__init__.py Package marker
qgtm_ops/llm_router.py Primary router — route / score / fall through / cache / cost-account
qgtm_ops/providers/ollama.py Thin wrapper on existing qgtm_bot.intelligence._query_ollama pattern
qgtm_ops/providers/anthropic.py Claude API client (Messages API, tool use)
qgtm_ops/providers/openai.py OpenAI Chat Completions client (function calling)
qgtm_ops/quality.py score_response(), self_critique(), heuristic checks
qgtm_ops/cache.py Redis-backed response cache with SHA256-keyed prompts
qgtm_ops/cost.py Token & dollar accounting; writes to Redis hash qgtm:ops:cost:{yyyymm}
qgtm_ops/tools/registry.py Tool schema definitions (name, args, effect, confirm_required)
qgtm_ops/tools/executor.py Safe executor; only invokes whitelisted tools; records audit
qgtm_ops/tools/impl/daemon.py daemon_restart, daemon_status, fetch_logs
qgtm_ops/tools/impl/risk.py set_kill_tier, flatten_all, run_reconciliation
qgtm_ops/tools/impl/strategy.py toggle_strategy, rebalance_now
qgtm_ops/audit.py Re-uses qgtm_core.audit_log.AuditEntry; adds OPS_COMMAND event type
qgtm_ops/confirm.py request_confirm() / verify_confirm() — Redis TTL-backed 2-step flow

2.2 Extensions to existing modules

File Change
qgtm_bot/telegram.py Add ops command branch; no mutation inside the router — everything destructive goes through qgtm_ops.tools.executor. Chat-ID allowlist moved to settings.telegram_ops_allowlist.
qgtm_bot/ops_commands.py NEW — handlers for /status (alias existing), /why, /debug, /restart, /flatten, /tier, /rebalance, /toggle, /health
qgtm_core/config.py Add: anthropic_api_key, openai_api_key, llm_router_mode, llm_router_quality_threshold, llm_cache_ttl_sec, telegram_ops_allowlist, telegram_ops_confirm_ttl_sec, ops_bot_enabled, ops_monthly_budget_usd
qgtm_core/audit_log.py Add OPS_COMMAND + OPS_TOOL_EXECUTION to AuditEventType enum
qgtm_api/ops_routes.py NEW — POST /api/v1/ops/ask, POST /api/v1/ops/execute, GET /api/v1/ops/audit?limit=N, GET /api/v1/ops/cost
qgtm_api/main.py Register ops_routes.router
qgtm_web/src/app/ops/page.tsx NEW — dashboard integration
qgtm_web/src/app/ops/components/ModelSelector.tsx NEW — pin / auto toggle
qgtm_web/src/app/ops/components/AuditTail.tsx NEW — live poll of audit
qgtm_web/src/app/ops/components/AskBox.tsx NEW — mirrors /ask but hits /ops/ask
qgtm_web/src/app/ops/components/CostMetrics.tsx NEW — per-provider spend

2.3 Tests

File Scope
tests/test_llm_router.py Unit: routing logic, fallback cascade, cache hits, quality threshold
tests/test_llm_router_quality.py Unit: scoring heuristics with synthetic good/bad responses
tests/test_ops_tools.py Unit: each tool's arg validation + confirm requirement
tests/test_ops_confirm.py Unit: confirm token generation, TTL expiry, wrong-token rejection
tests/test_ops_bot_mock.py Mock-Telegram updates in → expected outputs
tests/test_ops_integration.py End-to-end: fake LLM → tool proposal → confirm → executor no-op → audit entry
tests/test_ops_auth.py Non-allowlisted chat IDs rejected for every command
tests/test_ops_routes.py API integration — auth, ask, execute, audit tail
qgtm_web/src/app/ops/__tests__/page.test.tsx React component tests

3. API contract — qgtm_ops/llm_router.py

from dataclasses import dataclass
from enum import StrEnum
from typing import Any, Literal

class Provider(StrEnum):
    OLLAMA = "ollama"
    CLAUDE = "claude"
    OPENAI = "openai"

class RouteMode(StrEnum):
    AUTO = "auto"
    OLLAMA = "ollama"
    CLAUDE = "claude"
    OPENAI = "openai"

@dataclass(frozen=True)
class RouterRequest:
    prompt: str
    kind: Literal["debug", "status", "why", "tool_use", "brief", "eod", "free"]
    user_id: str
    correlation_id: str
    pin: RouteMode = RouteMode.AUTO
    tools: list[ToolSpec] | None = None
    context: dict[str, Any] | None = None
    cache: bool = True

@dataclass(frozen=True)
class RouterResponse:
    text: str
    tool_calls: list[ToolCall]
    provider: Provider
    quality_score: float
    latency_ms: int
    tokens_in: int
    tokens_out: int
    cost_usd: float
    cache_hit: bool
    fallbacks_used: list[Provider]

async def route(req: RouterRequest) -> RouterResponse: ...
async def health() -> dict[Provider, bool]: ...
async def costs(month: str | None = None) -> dict[Provider, dict[str, float]]: ...
async def reset_cache() -> int: ...

Supporting:

# qgtm_ops/quality.py
def score_response(
    prompt: str,
    response: str,
    kind: str,
    *,
    ground_truth_keywords: set[str] | None = None,
) -> float: ...

async def self_critique(response: str, critic: Provider = Provider.OLLAMA) -> float: ...

# qgtm_ops/tools/registry.py
@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str
    params_schema: dict[str, Any]
    effect: Literal["read", "mutate", "destructive"]
    confirm_required: bool
    min_role: Literal["owner", "founder"]

def get_all_tools() -> list[ToolSpec]: ...
def get_tool(name: str) -> ToolSpec | None: ...

# qgtm_ops/tools/executor.py
@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict[str, Any]

@dataclass(frozen=True)
class ToolResult:
    ok: bool
    output: str
    duration_ms: int
    audit_sequence: int

async def execute(
    call: ToolCall,
    *,
    actor_id: str,
    confirmed_token: str | None,
    correlation_id: str,
) -> ToolResult: ...

4. Telegram command spec

All commands scoped by chat-ID allowlist (config: telegram_ops_allowlist). Non-allowlisted chat IDs get no reply. Known IDs: Mo 6902420777, Naz 8383804765.

Read-only (tier: owner)

Cmd Args Behavior Router kind
/status Existing handler n/a
/positions Existing handler n/a
/pnl Existing handler n/a
/health NEW: daemon heartbeat age, redis ping, alpaca ping, Ollama ping, Claude/OpenAI health n/a
/why <strategy_id> NEW: explain current signals for one strategy why
/debug <freeform> or --claude <freeform> NEW: freeform router Q&A debug
/audit [limit] NEW: last N ops audit entries n/a
/cost NEW: LLM spend this month per provider n/a

Destructive (tier: founder)

Every destructive command uses a two-step typed confirm:

  1. User types the command (e.g. /restart)
  2. Bot replies: Confirm by sending: CONFIRM <token> — 6-char random string, 120s TTL
  3. User types CONFIRM <token> from the same chat ID
  4. Bot executes, replies with outcome + audit sequence
Cmd Args Tool invoked Effect
/restart daemon_restart systemctl restart qgtm-daemon
/flatten flatten_all Sets kill tier FLATTEN, cancels all orders, closes all positions
/tier NORMAL|WARN|NO_NEW|FLATTEN set_kill_tier POST to risk API
/rebalance rebalance_now Triggers one-shot rebalance
/toggle <strategy_id> toggle_strategy Toggle strategy feature flag
/recon run_reconciliation Triggers reconciliation
/logs <service> [lines] fetch_logs Read-only, gated

Confirmation flow:

user: /tier FLATTEN
bot:  WARNING: setting kill tier to FLATTEN will halt new orders and flatten positions.
      Type "CONFIRM A3F9K2" within 120s to proceed, or /cancel to abort.
user: CONFIRM A3F9K2
bot:  Executed set_kill_tier(FLATTEN). audit_seq=84211. kill tier is now FLATTEN.

5. Tool-use layer — exact list

Tool Effect Confirm Implementation
daemon_status read no Read /var/run/qgtm/daemon_state.json
fetch_logs(service, lines=200) read no journalctl -u {service} -n {lines} with service allowlist
daemon_restart() destructive YES sudo systemctl restart qgtm-daemon via sudoers NOPASSWD
set_kill_tier(tier) destructive YES POST /api/v1/risk/kill_tier
flatten_all() destructive YES Composite: tier=FLATTEN → cancel_all → close_all
rebalance_now() destructive YES POST /api/v1/daemon/rebalance
toggle_strategy(strategy_id, enabled) destructive YES POST /api/v1/strategies/{id}/toggle
run_reconciliation() destructive YES POST /api/v1/daemon/reconcile
get_pnl(), get_positions(), get_signals() read no Existing handlers

Guard rails:

  • Executor validates tool in registry — no arbitrary names
  • effect==destructive and not confirmed_token → refuse
  • Every tool execution appends OPS_TOOL_EXECUTION audit entry + Telegram alert
  • Idempotency via correlation_id; duplicates within 5s rejected
  • Rate limit: per-chat-ID, max 6 destructive calls per hour

Sudoers file — /etc/sudoers.d/qgtm-ops:

qgtm ALL=(root) NOPASSWD: /bin/systemctl restart qgtm-daemon
qgtm ALL=(root) NOPASSWD: /bin/systemctl restart qgtm-api
qgtm ALL=(root) NOPASSWD: /bin/journalctl -u qgtm-daemon *
qgtm ALL=(root) NOPASSWD: /bin/journalctl -u qgtm-api *
qgtm ALL=(root) NOPASSWD: /bin/journalctl -u qgtm-redis *

6. Secrets

Secret Source Purpose Rotate by
ANTHROPIC_API_KEY https://console.anthropic.com/settings/keys Claude fallback Generate new key → GH Actions Secrets → redeploy
OPENAI_API_KEY https://platform.openai.com/api-keys OpenAI fallback Same flow
TELEGRAM_OPS_ALLOWLIST manual — comma-separated chat IDs Auth Edit secret → redeploy
QGTM_LLM_ROUTER_MODE auto|ollama|claude|openai Pin mode Redeploy

Config additions:

anthropic_api_key: str = ""
openai_api_key: str = ""
llm_router_mode: Literal["auto", "ollama", "claude", "openai"] = "auto"
llm_router_quality_threshold: float = 0.6
llm_cache_ttl_sec: int = 900
telegram_ops_allowlist: str = ""
telegram_ops_confirm_ttl_sec: int = 120
ops_bot_enabled: bool = False
ops_monthly_budget_usd: float = 50.0
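The comma-separated telegram_ops_allowlist setting needs parsing before the dispatcher can check chat IDs. A small sketch (helper names are illustrative, not part of the plan):

```python
def parse_allowlist(raw: str) -> frozenset[int]:
    """Parse telegram_ops_allowlist ("id1,id2,...") into a set of chat IDs."""
    return frozenset(int(part) for part in raw.split(",") if part.strip())

def is_allowed(chat_id: int, raw: str) -> bool:
    # Non-allowlisted chat IDs get no reply at all (section 4).
    return chat_id in parse_allowlist(raw)
```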

7. Router quality-scoring algorithm

Compound score in [0.0, 1.0]; < threshold (default 0.6) triggers escalation.

score(response, kind, prompt, ground_truth):
  s_length  = clip(len(response) in [50, 1200] -> linear map [0,1], else 0)
  s_refusal = 0 if response starts with ("i cannot", "as an ai", "i'm sorry") else 1
  s_halluc  = 1 - count(hallucinated_tickers(response) not in known_universe) / max(count(tickers(response)),1)
  s_format  = 1 if response matches required_sections(kind) else 0.5
  s_gt      = jaccard(tokens(response), ground_truth) if ground_truth else 1.0
  weights = kind_weights[kind]
  return sum(w_i * s_i) / sum(w_i)

Optional self-critique: second Ollama call grades its own answer 0-10.

kind threshold self-critique
status / why 0.5 off
debug 0.6 on
tool_use 0.7 (JSON parse must succeed) off
brief / eod 0.55 off
free 0.6 on

Escalation tree:

1. If req.pin != AUTO, hit that provider only.
2. Try OLLAMA.
   a. Timeout > 150s OR 5xx OR OOM → fall through.
   b. Score < threshold[kind] AND kind in {debug, tool_use, brief} → fall through.
   c. Else return Ollama.
3. Try CLAUDE (if key set AND budget not exceeded).
   a. 429 / 5xx / no key → fall through.
   b. Else return Claude.
4. Try OPENAI (same rules).
5. All fail → return hand-crafted deterministic fallback.
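The escalation tree condenses to an ordered-provider loop. A minimal sketch, with provider callables, thresholds, and the route_once name standing in for the real clients (budget and key checks are folded into the except branch):

```python
import asyncio

# Only these kinds escalate on low local quality (rule 2b above).
ESCALATE_KINDS = {"debug", "tool_use", "brief"}

async def route_once(kind, pin, providers, thresholds, score):
    """providers: ordered dict of name -> async callable returning text.
    score: callable grading a response in [0, 1]."""
    if pin != "auto":
        return pin, await providers[pin]()       # pinned: that provider only
    for name, call in providers.items():
        try:
            text = await call()
        except Exception:                        # timeout / 5xx / OOM / 429 / no key
            continue
        if (name == "ollama" and kind in ESCALATE_KINDS
                and score(text) < thresholds.get(kind, 0.6)):
            continue                             # local quality too low: escalate
        return name, text
    # All providers failed: deterministic fallback, never an exception.
    return "deterministic", "All providers unavailable; check /health."
```

Ordering the dict OLLAMA → CLAUDE → OPENAI gives the fallback cascade for free, since dicts preserve insertion order.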

8. Caching

  • Backend: Redis
  • Key: qgtm:ops:cache:{sha256(prompt + kind + pinned_provider)}
  • Value: JSON {text, provider, quality_score, tokens, cost, ts}
  • TTL: llm_cache_ttl_sec (default 900s)
  • Bypass: tool_use, time-sensitive context (daemon_state), pin != AUTO with debug kind
  • Invalidate on: kill-tier change, /cancel_cache by owner

9. Dashboard wireframe — /ops

┌──────────────────────────────────────────────────────────────────────┐
│  QGTM / OPS                                  Mo  | logout             │
├──────────────────────────────────────────────────────────────────────┤
│  MODEL SELECTOR                                                       │
│  [ Auto ] [ Ollama ] [ Claude ] [ OpenAI ]       Quality threshold:  │
│  current: Auto (Ollama primary)                  [ 0.60 ▼ ]  [Save]  │
├────────────────────────────┬─────────────────────────────────────────┤
│  ASK                       │  AUDIT TAIL (live, polls every 3s)     │
│  > _                       │  12:41:03  mo       /status            │
│                            │  12:41:02  claude   debug query (382t)  │
│                            │  12:39:10  ollama   why mvg_v1 (95t)    │
│                            │  12:38:54  mo       /tier FLATTEN       │
│                            │  12:38:54  executor set_kill_tier OK    │
│  [ Send ]                  │  [ Load more ]                          │
├────────────────────────────┴─────────────────────────────────────────┤
│  FALLBACK METRICS (this month)                                       │
│  ┌─────────────┬────────────┬───────────┬──────────┬───────────────┐│
│  │ Provider    │ Calls      │ Success%  │ p50 ms   │  Cost $       ││
│  ├─────────────┼────────────┼───────────┼──────────┼───────────────┤│
│  │ Ollama      │ 1,412      │ 94.1%     │ 620      │  $0.00        ││
│  │ Claude      │ 38         │ 100%      │ 1,850    │  $1.42        ││
│  │ OpenAI      │ 4          │ 100%      │ 1,210    │  $0.11        ││
│  └─────────────┴────────────┴───────────┴──────────┴───────────────┘│
│  Budget: $1.53 / $50.00 monthly cap                                  │
└──────────────────────────────────────────────────────────────────────┘

10. Testing strategy

  • Router: table-driven routing decisions, quality scoring fixtures, cache TTL, budget enforcement
  • Tools: arg validation, confirm requirement negative tests, audit entry assertion
  • Mock-Telegram: dispatcher fixtures, allowlist enforcement, confirm flow
  • Integration: fake LLM emits tool call → executor → audit verified with audit_log.verify_chain()
  • Frontend: page.test.tsx mocks all 3 API endpoints

Target: ≥ 85% line coverage on qgtm_ops/; every destructive tool has a confirm-required negative test.

11. Rollout

v0 — read-only + Ollama (1 week)

  • qgtm_ops/llm_router.py with Ollama provider only
  • Cache + cost accounting scaffolding
  • Commands: /health, /why, /debug (no --claude yet), /audit, /cost
  • No destructive commands, no dashboard, no Claude, no OpenAI
  • Acceptance: /debug returns Ollama answer or deterministic fallback within 180s; non-allowlisted chat IDs get zero response; tests green

v0.5 — Claude fallback + quality router (1 week)

  • Claude provider + quality-scored fallback for debug and tool_use
  • --claude flag on /debug
  • Acceptance: low-quality Ollama → Claude escalation; cost accounting real; budget cap refuses above limit

v1 — destructive commands + dashboard + tool-use (2 weeks)

  • Tool registry + executor + sudoers (staging first)
  • Confirmation flow via Redis
  • Dashboard /ops page
  • Acceptance: /tier FLATTEN without confirm = no-op; with valid confirm = audited execution; dashboard Ask mirrors Telegram; rate limit kicks in

v1.5 — OpenAI fallback + full cost tracking (3 days)

  • OpenAI as tier-2 fallback
  • CostMetrics dashboard panel
  • Acceptance: Claude 429 → OpenAI used + logged

12. Non-goals (v0)

  • No voice input
  • No bot-to-bot / autonomous tool-chain loops
  • No destructive tools in v0
  • No fine-tuning
  • No streaming in Telegram flow
  • No RAG / vector DB
  • No multi-group coordination

13. Risks

# Risk Likelihood Impact Mitigation
R1 Telegram chat-ID spoofing Low Catastrophic Allowlist + typed confirm + rate limit + Telegram-alert on any destructive
R2 Ollama hallucination triggering destructive action Medium Catastrophic LLM can only propose; human types confirm; tool registry is hard allowlist
R3 API key leakage Medium High Keys only in /opt/trading/.env, masked in logs, audit doesn't store prompt body
R4 Cost blowup Medium Medium Hard monthly budget + per-minute rate cap + circuit breaker on 5 consecutive errors
R5 Sudoers misconfiguration Low Catastrophic Golden file in repo + visudo -cf in deploy CI + 0440 perms
R6 Confirm token brute-force Very low Catastrophic 3 attempts/min/chat, one-time, tied to pending ToolCall
R7 Redis outage drops confirms Low Medium Re-issue on failure, systemd restart, docs flag UX
R8 Ops bot becomes daemon dependency Low Medium Separate process, daemon never imports qgtm_ops, flag-gated
R9 Bad ops_bot deploy breaks control channel Medium Medium Additive design — existing handlers keep working via current dispatcher

14. Engineering cost — v0

  • Production LOC: ~760 new + ~90 modified
  • Test LOC: ~780
  • Test count: 62 new tests
  • Calendar: ~1 week (5 dev days + 3 days review/bake/runbooks)

15. Top 3 risks (summary)

  1. LLM hallucination driving a destructive action — mitigated by the "LLM proposes, human confirms" invariant. Load-bearing file: test_ops_confirm.py with ≥ 8 negative tests.
  2. Sudoers widening root privilege on the droplet — golden file in infra/systemd/sudoers.d/qgtm-ops, visudo -cf validation in CI, 0440 perms.
  3. Cost blowup via runaway loop or compromised chat ID — hard monthly budget cap, per-minute rate cap, circuit breaker on consecutive errors, audit alerts on any single call > $0.25.

Critical files for implementation

  • qgtm_bot/telegram.py — host for ops command dispatch
  • qgtm_bot/intelligence.py — Ollama client pattern to extract into qgtm_ops/providers/ollama.py
  • qgtm_core/config.py — all new settings
  • qgtm_core/audit_log.py — append-only Merkle-chained audit log
  • qgtm_web/src/app/ask/page.tsx — closest existing pattern for new /ops dashboard