Status & Roadmap
Production readiness gates, current evaluation results, and the sequenced roadmap for the Hyperliquid module.
Production readiness
Tier 0: Search Advisory — READY
All 13 gates pass (evaluated 2026-02-22, 24-query benchmark):
| Gate | Threshold | Observed | Status |
|---|---|---|---|
| ANN-inline execution success | >= 99% | 100.0% | Pass |
| ANN-inline SQL alignment | >= 95% | 100.0% | Pass |
| Retrieval top-1 hit rate | >= 85% | 91.1% | Pass |
| Retrieval NDCG@K | >= 90% | 95.3% | Pass |
| ANN recall@256 | >= 45% | 47.7% | Pass |
| Retrieval risk level | <= medium | medium | Pass |
| Router eval@2 | >= 90% | 91.5% | Pass |
| Invalid action rate | <= 3% | 0.0% | Pass |
| Penalty rate | <= 5% | 3.1% | Pass |
| BQ integrity (baseline) | pass | pass | Pass |
| BQ integrity (ANN-inline) | pass | pass | Pass |
| Execution success rate | >= 95% | 100.0% | Pass |
| Data provenance | implemented | yes | Pass |
Capability: Semantic trader search, cohort discovery, explanation payloads.
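The numeric gates in the table above are simple threshold comparisons, and the tier is ready only when every gate passes. The following is a minimal sketch of that check, not the project's evaluator: the `Gate` class and the helper names are hypothetical, and only the observed values and thresholds come from the table.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Gate:
    """One readiness gate (hypothetical structure for illustration)."""
    name: str
    observed: float                   # observed value, in percent
    passes: Callable[[float], bool]   # threshold predicate


def at_least(threshold: float) -> Callable[[float], bool]:
    return lambda value: value >= threshold


def at_most(threshold: float) -> Callable[[float], bool]:
    return lambda value: value <= threshold


# A subset of the numeric gates, with values copied from the table.
GATES = [
    Gate("ANN-inline execution success", 100.0, at_least(99.0)),
    Gate("Retrieval top-1 hit rate", 91.1, at_least(85.0)),
    Gate("ANN recall@256", 47.7, at_least(45.0)),
    Gate("Invalid action rate", 0.0, at_most(3.0)),
    Gate("Penalty rate", 3.1, at_most(5.0)),
]


def failing_gates(gates: List[Gate]) -> List[str]:
    """Return the names of gates whose observed value misses the threshold."""
    return [g.name for g in gates if not g.passes(g.observed)]


def tier_ready(gates: List[Gate]) -> bool:
    """A tier is ready only if no gate fails."""
    return not failing_gates(gates)
```

Returning the failing gate names rather than a bare boolean makes a regression easy to attribute to a specific gate.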
Tier 1: Copytrade Assist — READY
Adds allocation plan proposals and preview-only execution (no live commit). All Tier 0 gates apply, plus 7 control-plane tests (all pass), including:
- Preview accepts valid plan
- Preview rejects disallowed coin
- Preview rejects weight violations
- Commit requires preview OK
- Kill switch blocks commit
- Idempotency replay
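The behaviors these tests cover can be sketched as a small in-memory control plane. Everything here is an assumption made for illustration: the `ControlPlane` class, the `ALLOWED_COINS` set, the `MAX_WEIGHT` cap, and the error messages are hypothetical and do not describe the module's real API or policy values.

```python
ALLOWED_COINS = {"BTC", "ETH"}   # hypothetical allow-list
MAX_WEIGHT = 0.5                 # hypothetical per-coin weight cap


class ControlPlane:
    """Toy preview/commit flow; a commit here only records a simulated result."""

    def __init__(self):
        self.kill_switch = False
        self._preview_ok = {}    # plan_id -> did the preview pass?
        self._committed = {}     # idempotency key -> stored result

    def preview(self, plan_id, weights):
        """Validate a plan and return a list of violations (empty means OK)."""
        errors = []
        for coin, weight in weights.items():
            if coin not in ALLOWED_COINS:
                errors.append(f"disallowed coin: {coin}")
            if not (0 < weight <= MAX_WEIGHT):
                errors.append(f"weight out of range for {coin}: {weight}")
        if sum(weights.values()) > 1.0 + 1e-9:
            errors.append("weights sum to more than 1.0")
        self._preview_ok[plan_id] = not errors
        return errors

    def commit(self, plan_id, idempotency_key):
        """Commit a previewed plan; replaying a key returns the stored result."""
        if self.kill_switch:
            raise PermissionError("kill switch engaged")
        if idempotency_key in self._committed:
            return self._committed[idempotency_key]
        if not self._preview_ok.get(plan_id, False):
            raise ValueError("commit requires a passing preview")
        result = {"plan_id": plan_id, "status": "simulated"}
        self._committed[idempotency_key] = result
        return result
```

In this sketch the kill switch is checked before the idempotency lookup, so even a replayed key is blocked while the switch is engaged. Whether replays should be served during a kill is a design choice the source does not specify.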
Tier 2: Copytrade Execute — PROVISIONAL
Adds live commit with risk policy enforcement. The simulation harness passes, but no live exchange integration has been tested. The /copytrade/execute/commit endpoint should remain restricted until each of the following is complete:
- Authenticated order path dry-runs against Hyperliquid endpoint
- Error/retry semantics under venue failures validated
- Reconciliation against fill + position state tested
- Policy violation handling + rollback tested
What works end-to-end
| Intent Family | Template | Status | Evidence |
|---|---|---|---|
| risk_regime | risk_regime_stress_scoring_daily | Working | Sandbox t002: success (4/5) |
| whale_ranking | whale_ranking_by_position_daily | Working | Sandbox t011: partial (needs chaining) |
| screening | composite_trader_screening_daily | Working | Sandbox t007: success (4/5) |
| anomaly | anomaly_zscore_detection_daily | Working | Sandbox t009: success (4/5) |
| counterfactual | counterfactual_stop_policy_bucket_proxy | Working | Sandbox t006: success (4/5) |
| similarity | similarity_profile_scoring_daily | Partial | Sandbox t003: partial (address validation) |
| copy_lag | copy_lag_pairwise_corr_daily | Broken | Sandbox t004, t012, t015: blocked |
| help | (internal) | Working | Sandbox t018: success |
Known blockers
Single-address copytrade (critical)
The copy_lag_pairwise_corr_daily SQL self-join returns 0 rows when the shortlist contains only one address, so the pipeline cannot automatically discover followers of a single whale.
Workaround: Provide both --leader and --follower addresses explicitly.
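The failure mode is a general property of pairwise self-joins and can be reproduced in isolation. The sketch below uses an in-memory SQLite table as a stand-in; the `shortlist` table, its single column, and the join condition are assumptions, since the actual template SQL is not shown here.

```python
import sqlite3

# Simplified stand-in for a pairwise self-join: every ordered pair of
# distinct addresses in the shortlist becomes a (leader, follower) candidate.
PAIR_SQL = """
SELECT a.address AS leader, b.address AS follower
FROM shortlist AS a
JOIN shortlist AS b ON a.address <> b.address
ORDER BY leader, follower
"""


def candidate_pairs(addresses):
    """Return leader/follower candidate pairs for a shortlist of addresses."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE shortlist (address TEXT PRIMARY KEY)")
        conn.executemany(
            "INSERT INTO shortlist (address) VALUES (?)",
            [(address,) for address in addresses],
        )
        return conn.execute(PAIR_SQL).fetchall()
    finally:
        conn.close()
```

With one address there is no second row to pair against, so `candidate_pairs(["0xwhale"])` returns an empty list. Supplying two addresses yields both orderings of the pair, which is why passing the leader and follower explicitly works around the problem.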
Multi-coin query execution
When mentions_multi_coin=True, the pipeline returns coin=None and template=None instead of iterating over each coin.
Workaround: Run separate single-coin queries and compare manually.
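The manual workaround amounts to a per-coin loop around the single-coin path. This is a rough sketch under the assumption that a single-coin query can be wrapped as a callable; `run_per_coin` and `fake_single_coin_query` are hypothetical names, not part of the module.

```python
from typing import Callable, Dict, Iterable


def run_per_coin(
    coins: Iterable[str],
    run_single: Callable[[str], dict],
) -> Dict[str, dict]:
    """Run one single-coin query per distinct coin, preserving input order."""
    results = {}
    for coin in dict.fromkeys(coins):   # de-duplicates while keeping order
        results[coin] = run_single(coin)
    return results


def fake_single_coin_query(coin: str) -> dict:
    """Placeholder for whatever executes a real single-coin query."""
    return {"coin": coin, "template": "placeholder"}
```

Calling `run_per_coin(["BTC", "ETH"], fake_single_coin_query)` returns one result per coin, keyed by coin, which can then be compared side by side.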
Intent classification (regex_v1)
The classifier is a regex-based cascade. It works for 6 of 8 intent families but misroutes on:
- Negation ("NOT high risk" → should be screening, not risk_regime)
- Multi-intent ("find whales then check for copy-trading")
- Ambiguous risk/screening boundary
Status: Ownership transferred to RL team via REQ-006.
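The negation case shows why a first-match cascade misroutes. The sketch below is not regex_v1: the patterns, their order, and the fallback to `help` are invented for illustration. Only the intent names and the example query shape come from this document.

```python
import re

# Hypothetical cascade: patterns are tried in order and the first match wins.
CASCADE = [
    ("risk_regime", re.compile(r"\b(risk|stress|drawdown)\b", re.IGNORECASE)),
    ("screening", re.compile(r"\b(find|filter|screen)\b", re.IGNORECASE)),
]


def classify(query: str) -> str:
    """Return the first intent whose pattern matches, else fall back to help."""
    for intent, pattern in CASCADE:
        if pattern.search(query):
            return intent
    return "help"
```

Here `classify("find traders who are NOT high risk")` returns `risk_regime`, because the keyword "risk" matches before the screening pattern is tried and nothing in the cascade inspects the preceding "NOT". Keyword matching alone cannot tell a filter on risk apart from a request for a risk assessment.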
Competitive landscape
Surveyed 12 Hyperliquid-related tools (March 2026):
| Capability | Any Competitor? | HLQ |
|---|---|---|
| Natural language trader query | None | Yes |
| Behavioral embeddings / similarity search | None | Yes |
| Semantic trader profiling API | None | Yes |
| Copy-trading detection | Copin, Hyperdash, Hyperbot | Yes |
| Whale tracking | CoinGlass, HyperTracker | Yes |
| Anomaly detection | None | Yes |
| Counterfactual analysis | None | Yes |
| Per-result provenance | None | Yes |
None of the surveyed tools offers natural language query or behavioral embeddings for Hyperliquid. HLQ overlaps with competitors only in copy-trading detection and whale tracking.
Roadmap
Near-term (weeks)
- Freeze the harness contract — stabilize replay-pack schema, action names, result handles, working-set semantics, and terminal outputs.
- Frontier-operated harness baseline — run the Hyperliquid harness with Codex OAuth as the primary planner/operator and establish reference traces.
- Fix critical live-path gaps — single-address copytrade and multi-coin execution remain important if the legacy one-shot path stays exposed.
Medium-term (months)
- Targeted small models — add bounded specialists for action hints, keep/drop/prune, abstain calibration, and verifier/reranker checks.
- Monitoring & alerts — build alerting and change detection on top of the same harness and provenance contracts.
- Remote surfaces on one contract — keep CLI, MCP, and web surfaces aligned around the same replayable harness semantics.
Long-term (quarters)
- Broader learned search policy — only after the harness contract is stable and measurable.
- Richer bridge views — expand the bridge DSL carefully without letting the environment quietly do the reasoning.
- Cross-domain investigation — broaden beyond Hyperliquid only if provenance, replayability, and bounded state remain intact.
Related documents
| Document | Location |
|---|---|
| Agentic Search Proposal | ~/work/AGENTIC_SEARCH_PROPOSAL.md |
| MVP Approach Comparison | ~/work/MVP_APPROACH_COMPARISON.md |
| Behavioral Encoder Notes | ~/work/BEHAVIORAL_ENCODER_BREAKTHROUGH_NOTES.md |
| Policy Delta Roadmap | ~/work/hlq/POLICY_DELTA_ROADMAP.md |
| RL Requests | ~/work/hlq/RL_REQUESTS.md |
| Dev Notebook | ~/work/hlq/NOTEBOOK.md |