Hyperliquid module — vision

What the Hyperliquid module is becoming — replayable search harness, frontier-planner architecture, deterministic substrate.

What the Hyperliquid module is becoming

The Hyperliquid intelligence module (codename HLQ) is evolving from a one-shot trader search pipeline into a replayable Hyperliquid search harness inside Axe.

The target shape is not "let the model do everything." The model plans and operates over a deterministic environment. The harness remains the source of truth for evidence, state transitions, provenance, and replay.

In practice, this means:

  • a pi-based Hyperliquid harness exposes bounded reads, working-set edits, and terminal decisions
  • a frontier model accessed through Codex OAuth acts as the heavyweight planner/operator
  • small domain-specific auxiliary models handle narrow, high-frequency tasks where latency, cost, and consistency matter
  • every run stays inspectable, replayable, and grounded in stable artifacts rather than hidden prompt state

Architecture at a glance

HLQ's target architecture has three layers.

1. Deterministic harness

The harness is the execution substrate. It defines:

  • bounded actions over replayable environment views
  • explicit keep/drop/prune working-set semantics
  • structured terminal outcomes, including finalize and abstain
  • trajectory logs and per-step provenance
  • stable replay packs for evaluation and debugging

This layer is the contract surface. It is where truth lives.
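As a rough sketch of that contract surface (all names here are illustrative, not the real HLQ API), the pieces fit together as bounded actions, explicit keep/drop working-set transitions, structured terminal outcomes, and a trajectory log that records every step:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class TerminalOutcome(Enum):
    FINALIZE = "finalize"   # answer backed by retained evidence
    ABSTAIN = "abstain"     # signal too weak to conclude

@dataclass(frozen=True)
class Action:
    name: str               # must be one of the harness's bounded action names
    args: tuple = ()

@dataclass
class Step:
    action: Action
    observation: str        # deterministic environment view returned for this action
    kept: tuple = ()        # artifact ids added to the working set
    dropped: tuple = ()     # artifact ids removed from the working set

@dataclass
class Trajectory:
    steps: list = field(default_factory=list)
    outcome: Optional[TerminalOutcome] = None

    def apply(self, step: Step, working_set: set) -> set:
        # explicit keep/drop transition; appending the step gives the
        # per-step provenance needed to reconstruct state later
        working_set |= set(step.kept)
        working_set -= set(step.dropped)
        self.steps.append(step)
        return working_set
```

Because every transition goes through `apply`, the working set at any point is a pure function of the trajectory log, which is what makes step-by-step reconstruction possible.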

2. Frontier planner/operator

A heavyweight frontier model, accessed through Codex OAuth, sits on top of the harness.

Its job is to:

  • interpret the user brief
  • decide which harness actions to take next
  • decompose a task into shallow steps
  • package retained evidence into a final answer
  • operate conservatively when signal is weak

The frontier model is used for broad reasoning and flexible planning, not as the authority on facts.
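A minimal sketch of that operating loop, assuming a `harness` object and a `planner` callable (both names hypothetical): the frontier model only proposes the next move, the harness executes it and owns all resulting state, and running out of budget defaults to abstaining rather than guessing.

```python
def run_episode(planner, harness, max_steps=20):
    """Drive a bounded investigation: the planner proposes, the harness disposes.

    `planner(brief, retained)` returns either a harness action name (str)
    or a terminal decision ("finalize" / "abstain"). `harness` is assumed
    to expose `brief`, `execute(action)`, and `retained()` -- illustrative
    names, not a real API.
    """
    for _ in range(max_steps):
        decision = planner(harness.brief, harness.retained())
        if decision in ("finalize", "abstain"):
            # terminal outcomes are structured, never free-form text
            return decision, harness.retained()
        harness.execute(decision)  # bounded read or working-set edit
    # budget exhausted with weak signal: operate conservatively
    return "abstain", harness.retained()
```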

3. Auxiliary domain models

Small specialized models are added where they clearly improve the system.

Early candidates include:

  • action policy selection
  • keep/drop/prune policy
  • abstain calibration
  • verifier or reranker passes on retained evidence

These models are not independent agents. They are cheap, narrow components that support the main planner and improve consistency on repeatable subproblems.
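One way to picture this (a sketch with illustrative names): an auxiliary model slots in as a narrow scoring hook behind a fixed decision interface, so the keep/drop policy stays a pure function regardless of whether the score comes from a trained specialist or a placeholder heuristic.

```python
from typing import Callable

def make_keep_policy(scorer: Callable[[str], float], threshold: float = 0.5):
    """Wrap a narrow scoring model (or heuristic) as a keep/drop decision.

    `scorer` maps a candidate artifact to a relevance score in [0, 1].
    Swapping the scorer does not change the contract surface, so the
    harness and its replay semantics are unaffected.
    """
    def keep(artifact: str) -> bool:
        return scorer(artifact) >= threshold
    return keep

# A trivial stand-in heuristic until a trained specialist earns its place.
def length_heuristic(artifact: str) -> float:
    return min(len(artifact) / 100.0, 1.0)
```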

Why this architecture

This design follows a simple rule: use the frontier model for open-ended reasoning, and use small models for bounded domain decisions.

That gives HLQ four advantages.

Grounded execution

The harness, not the prompt, defines what the model can see and do. Evidence is retrieved through explicit actions and retained through explicit state changes.

Replayability

The same replay pack and the same code path should produce the same environment views and the same auditable trajectory. That makes debugging, evaluation, and regression testing possible.
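In testing terms, the property reads as: replaying the same pack through the same code path yields an identical trajectory. A sketch of that regression check, with `run(pack)` standing in for the real episode runner (hypothetical name):

```python
import json

def run(pack: dict) -> list:
    """Stand-in for the real episode runner: deterministic given the pack.

    Here it just walks the pack's recorded actions; the point is the
    check below, which any real runner should also satisfy.
    """
    return [{"action": a, "view": pack["views"][a]} for a in pack["actions"]]

def assert_replayable(pack: dict):
    # same replay pack + same code path => byte-identical trajectory
    first = json.dumps(run(pack), sort_keys=True)
    second = json.dumps(run(pack), sort_keys=True)
    assert first == second, "non-deterministic trajectory"
```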

Better economics

Heavy frontier calls are reserved for planning and synthesis. Narrow, repeated decisions can move to small models where latency and cost matter more than generality.

Safer iteration

We do not need to solve the full autonomous-search problem up front. We can improve planner quality, abstain behavior, and evidence management without changing the core harness contract.

Why not start with a separate parallel search model

We are not starting with an independent parallel search model as a second planner.

That path adds complexity before the harness contract and operating loop are mature. It creates another source of policy behavior to train, evaluate, and debug before we have enough evidence that the extra planner is necessary.

Instead, HLQ starts with:

  • one deterministic harness
  • one heavyweight planner/operator
  • a small number of targeted auxiliary models for narrow tasks

If a more independent search policy becomes useful later, it can be introduced against a stable harness and measured cleanly. It should not be the starting point.

Source of truth and contract surfaces

The deterministic harness stays central.

It is responsible for:

  • provenance on every retained artifact
  • replay-pack compatibility
  • explicit working-set transitions
  • bounded terminal outputs
  • bridge-mediated access to domain artifacts and DSL surfaces

The bridge DSL and replay-pack formats matter because they let planning improve without making the environment opaque. The planner can change. The truth surface should stay stable.

Phased plan

Phase 0: Freeze the harness contract

Goal: lock the environment and logging surfaces before adding more model complexity

Deliverables:

  • stable replay-pack schema
  • stable action names and terminal outputs
  • explicit keep/drop/prune semantics
  • trajectory logging that reconstructs working state step by step
  • bridge-backed deterministic environment views

Success looks like:

  • reproducible episodes
  • auditable failures
  • no ambiguity about what the planner saw or retained
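As one concrete (purely illustrative) reading of "stable replay-pack schema": a versioned envelope that the loader refuses to accept when the version or required fields drift, so schema changes are deliberate rather than silent.

```python
import json

REPLAY_PACK_VERSION = 1  # illustrative; bump only with a migration path

def load_replay_pack(raw: str) -> dict:
    """Parse and validate a replay pack against the frozen schema."""
    pack = json.loads(raw)
    if pack.get("version") != REPLAY_PACK_VERSION:
        raise ValueError(f"unsupported replay-pack version: {pack.get('version')}")
    for key in ("brief", "views", "expected_outcome"):
        if key not in pack:
            raise ValueError(f"replay pack missing required field: {key}")
    return pack
```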

Phase 1: Frontier-operated harness

Goal: run the harness with Codex OAuth as the main planner/operator

Deliverables:

  • prompt and control loop for bounded harness operation
  • evidence packaging for final outputs
  • conservative abstain behavior
  • baseline evals on replay packs

Success looks like:

  • useful multi-step investigations without changing the harness contract
  • better decomposition and synthesis than the one-shot path
  • clear traces for why a run finalized or abstained

Phase 2: Add targeted small models

Goal: move repeated narrow decisions to cheaper specialist components

Priority areas:

  • action policy hints
  • keep/drop/prune decisions
  • abstain calibration
  • verifier/reranker over candidate retained evidence

Success looks like:

  • lower latency and lower cost on repeated steps
  • improved consistency on bounded subproblems
  • unchanged replayability and auditability
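Abstain calibration, for instance, can start as nothing more than a thresholded decision fitted on replay packs. A sketch (the thresholds below are placeholders, not measured values):

```python
def calibrated_abstain(confidence: float, evidence_count: int,
                       min_confidence: float = 0.7, min_evidence: int = 2) -> bool:
    """Return True if the run should abstain rather than finalize.

    Thresholds are illustrative; in practice they would be fitted against
    labeled replay packs so abstain/finalize rates track observed accuracy.
    """
    return confidence < min_confidence or evidence_count < min_evidence
```

Because the decision is a pure function of logged quantities, tuning it later does not touch the harness contract or break replayability.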

Phase 3: Expand domain coverage carefully

Goal: grow from Hyperliquid-first harnessing into broader cross-domain investigation without losing determinism

Possible extensions:

  • richer bridge DSL views
  • more replay tasks and evaluation packs
  • additional domain-specific specialists
  • monitoring and alerting on top of the same harness contracts

Success looks like:

  • more useful investigations from the same core loop
  • stronger evidence handling, not more hidden agent behavior

Design principles

  • Harness first. The environment contract comes before policy complexity.
  • Provenance by default. Every conclusion should point back to retained evidence.
  • Replay over intuition. If behavior cannot be replayed, it is hard to trust.
  • Small models earn their place. They exist to solve narrow problems better, faster, or cheaper.
  • Abstention is a feature. HLQ should decline to overclaim when retained evidence is weak.

In one sentence

HLQ's vision is a deterministic Hyperliquid search harness, operated by a frontier planner and supported by narrow specialist models, with provenance and replayability kept as the non-negotiable source of truth.