Search Harness
Phase-0 contract for the Hyperliquid search harness — bounded reads, working-set edits, terminal decisions, replayable runs.
Status
This document freezes the Phase 0 contract for the Hyperliquid search harness v1.
It now reflects the preferred operating shape:
- a pi-based deterministic harness is the main runtime contract
- a frontier model accessed through Codex OAuth acts as the heavyweight planner/operator
- small domain-specific auxiliary models may support bounded repeated tasks inside the harness
- the harness, not hidden prompt state, remains the source of truth
Purpose
The v1 harness turns the current one-shot search flow into a bounded search environment where a policy can:
- inspect deterministic evidence in multiple steps
- keep or drop evidence from an explicit working set
- branch into shallow subqueries
- stop with either a bounded decision or an abstention
The environment is external to the prompt. The planner inspects slices of that environment through explicit actions rather than receiving one large preassembled bundle.
In the preferred operating mode, the frontier planner handles the broad reasoning while the harness enforces bounded reads, explicit state transitions, and replayable terminal outputs.
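The division of labor above can be sketched as a bounded action loop. This is a minimal illustration, not the harness implementation: `run_episode`, `RunResult`, and `scripted_policy` are hypothetical names, and treating an exhausted step budget as a harness-forced abstention is an assumption this contract does not state.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

TERMINAL_ACTIONS = frozenset({"finalize", "abstain"})


@dataclass
class RunResult:
    trajectory: list = field(default_factory=list)
    terminal_action: Optional[str] = None
    stop_reason: Optional[str] = None


def run_episode(policy: Callable[[list], tuple], step_budget: int) -> RunResult:
    """Drive a policy until it emits a terminal action or the budget runs out."""
    result = RunResult()
    for step_index in range(step_budget):
        # The policy only sees the explicit trajectory, never hidden state.
        action, arguments = policy(result.trajectory)
        result.trajectory.append(
            {"step": step_index, "action": action, "arguments": arguments}
        )
        if action in TERMINAL_ACTIONS:
            result.terminal_action = action
            result.stop_reason = arguments.get("stop_reason")
            return result
    # Assumption: an exhausted budget is recorded as a forced abstention.
    forced = {"stop_reason": "step_budget_exhausted"}
    result.trajectory.append(
        {"step": step_budget, "action": "abstain", "arguments": forced}
    )
    result.terminal_action = "abstain"
    result.stop_reason = forced["stop_reason"]
    return result


def scripted_policy(trajectory: list) -> tuple:
    """Toy policy with placeholder arguments: read twice, then finalize."""
    scope = {"anchor_market": "BTC", "window_id": "w1"}
    script = [
        ("read_market_state", scope),
        ("read_persistence", scope),
        ("finalize", {
            "decision_class": "finalize_low_signal",
            "retained_artifact_ids": [],
            "stop_reason": "no_follow_through",
            "open_risks": [],
        }),
    ]
    return script[len(trajectory)]
```

A real planner would replace `scripted_policy`; the point of the sketch is that the loop, not the policy, owns the step budget and the terminal transition.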
How This Differs From The Current Pipeline
Current one-shot search:
- accepts one natural-language query
- routes once
- renders one SQL template
- executes once against live data
- returns one ranked result payload plus provenance
Search harness v1:
- accepts one bounded monitoring or research query tied to a replay episode
- runs a stepwise action loop over deterministic replay-pack views
- maintains an explicit active working set separate from the full environment
- records keep/drop/prune decisions in the trajectory
- terminates with a structured decision object rather than a single one-shot result
In short, the current pipeline is a direct query executor. The harness is a replayable search environment operated by a heavyweight planner and later supportable by small specialist policies for bounded subproblems.
Scope
The v1 harness is intentionally narrow:
- Hyperliquid monitoring and research queries only
- deterministic replay-pack inputs only
- shallow branching only
- bounded action set only
- no broad web search
- no arbitrary-depth recursion
- no requirement to touch live BigQuery in the replay path
Core Concepts
Environment state
The environment is the full replayable episode state available to the runtime but not automatically injected into the active prompt.
Required episode-level fields:
- episode_id
- query
- anchor_market
- window_id
- step_budget
- token_budget_class
- environment_views
- ground_truth_reference
Artifact registry
Each retrievable view is represented as an artifact with a stable ID inside the episode.
Required artifact fields:
- artifact_id
- artifact_type
- view_name
- anchor_market
- window_id
- payload
- source_refs
artifact_id stability matters for deterministic replay. For a fixed replay pack and fixed loader version, the same artifact must resolve to the same ID on every run.
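One way to get this stability is to derive the ID from the artifact's identifying fields rather than from load order or a counter. The sketch below is an assumption about how that could be done, not the harness's actual scheme: `make_artifact_id`, the `art_` prefix, the 16-character truncation, and the `loader_version` parameter are all hypothetical.

```python
import hashlib
import json


def make_artifact_id(
    episode_id: str,
    artifact_type: str,
    view_name: str,
    anchor_market: str,
    window_id: str,
    loader_version: str = "v1",  # hypothetical version tag
) -> str:
    """Derive an ID that depends only on the identifying fields."""
    # A JSON array keeps field boundaries unambiguous ("ab","c" vs "a","bc").
    key = json.dumps(
        [loader_version, episode_id, artifact_type, view_name,
         anchor_market, window_id],
        separators=(",", ":"),
        ensure_ascii=True,
    )
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"art_{digest[:16]}"
```

Because the payload is excluded from the key, the ID stays the same for a fixed pack and loader version, and changes when the loader version is bumped.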
Working set
The working set is the active evidence context selected by the policy.
Required working-set fields:
- active_artifact_ids
- active_artifact_summaries
- retrieval_history
- branch_history
- step_budget_remaining
- context_pressure_class
Working-set semantics are explicit:
- reading an artifact does not automatically keep it
- keep_artifact copies an artifact into active working memory
- drop_artifact removes it from active working memory only
- dropped artifacts remain available in the episode registry for later revisit
- prune_working_set is a bulk drop operation and must remain trajectory-visible
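The working-set semantics can be illustrated with a small in-memory sketch. The `WorkingSet` class, its `read` method, and the shape of the trajectory entries are hypothetical; only the action names and the keep/drop/prune behavior come from this contract.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingSet:
    registry: dict  # artifact_id -> artifact; the full episode registry
    active_artifact_ids: list = field(default_factory=list)
    trajectory: list = field(default_factory=list)

    def read(self, artifact_id: str) -> dict:
        """Reading returns the artifact but does not keep it."""
        self.trajectory.append({"action": "read", "artifact_id": artifact_id})
        return self.registry[artifact_id]

    def keep_artifact(self, artifact_id: str) -> None:
        if artifact_id not in self.registry:
            raise KeyError(f"unknown artifact: {artifact_id}")
        if artifact_id not in self.active_artifact_ids:
            self.active_artifact_ids.append(artifact_id)
        self.trajectory.append(
            {"action": "keep_artifact", "artifact_id": artifact_id}
        )

    def drop_artifact(self, artifact_id: str) -> None:
        """Remove from the active set only; the registry is untouched."""
        if artifact_id in self.active_artifact_ids:
            self.active_artifact_ids.remove(artifact_id)
        self.trajectory.append(
            {"action": "drop_artifact", "artifact_id": artifact_id}
        )

    def prune_working_set(self, artifact_ids: list, reason: str) -> None:
        """Bulk drop recorded as one explicit, trajectory-visible step."""
        for artifact_id in artifact_ids:
            if artifact_id in self.active_artifact_ids:
                self.active_artifact_ids.remove(artifact_id)
        self.trajectory.append({
            "action": "prune_working_set",
            "artifact_ids": list(artifact_ids),
            "reason": reason,
        })
```

Keeping the registry and the active list as separate structures is what makes a dropped artifact revisitable later without another read against the replay pack.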
Frozen v1 Action Set
Action names below are the Phase 0 contract and should not be renamed without a deliberate contract update.
Read actions
| Action | Arguments | Returns |
|---|---|---|
| read_market_state | anchor_market, window_id | Raw market snapshot artifact for one market-window pair |
| read_derived_metrics | anchor_market, window_id | Normalized or derived metrics artifact for the same scope |
| read_persistence | anchor_market, window_id | Persistence or follow-through artifact for the same scope |
| compare_markets | anchor_market, peer_markets, window_id | Comparison artifact for ranking or prioritization |
| retrieve_similar_prior_episodes | anchor_market, pattern_id_or_query | Ranked shortlist artifact of prior episodes |
Context-management actions
| Action | Arguments | Contract |
|---|---|---|
| keep_artifact | artifact_id | Adds the artifact to the active working set |
| drop_artifact | artifact_id | Removes the artifact from the active working set |
| prune_working_set | artifact_ids, reason | Removes multiple artifacts in one explicit step |
Control actions
| Action | Arguments | Contract |
|---|---|---|
| branch_subquery | subquery_type, arguments | Materializes a bounded child retrieval turn and returns artifacts into the same episode registry |
| finalize | decision_class, retained_artifact_ids, stop_reason, open_risks | Completes the episode with a bounded decision |
| abstain | retained_artifact_ids, stop_reason, open_risks | Completes the episode with an explicit no-decision outcome |
decision_class is frozen to:
- finalize_signal
- finalize_low_signal
abstain remains a separate terminal action rather than a decision_class value.
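The frozen action set lends itself to a single lookup table that the runtime checks before dispatching any step. The sketch below copies the action and argument names from the tables above; `validate_action` is a hypothetical helper, and requiring an exact argument match (no extras, none missing) is an assumption rather than a stated rule.

```python
ACTION_ARGUMENTS = {
    # Read actions
    "read_market_state": ("anchor_market", "window_id"),
    "read_derived_metrics": ("anchor_market", "window_id"),
    "read_persistence": ("anchor_market", "window_id"),
    "compare_markets": ("anchor_market", "peer_markets", "window_id"),
    "retrieve_similar_prior_episodes": ("anchor_market", "pattern_id_or_query"),
    # Context-management actions
    "keep_artifact": ("artifact_id",),
    "drop_artifact": ("artifact_id",),
    "prune_working_set": ("artifact_ids", "reason"),
    # Control actions
    "branch_subquery": ("subquery_type", "arguments"),
    "finalize": ("decision_class", "retained_artifact_ids",
                 "stop_reason", "open_risks"),
    "abstain": ("retained_artifact_ids", "stop_reason", "open_risks"),
}

DECISION_CLASSES = frozenset({"finalize_signal", "finalize_low_signal"})


def validate_action(name: str, arguments: dict) -> None:
    """Raise ValueError if the action or its arguments break the contract."""
    if name not in ACTION_ARGUMENTS:
        raise ValueError(f"unknown action: {name}")
    expected = set(ACTION_ARGUMENTS[name])
    given = set(arguments)
    if expected != given:  # assumption: exact match is required
        missing = sorted(expected - given)
        extra = sorted(given - expected)
        raise ValueError(f"{name}: missing={missing} unexpected={extra}")
    if name == "finalize" and arguments["decision_class"] not in DECISION_CLASSES:
        raise ValueError(f"invalid decision_class: {arguments['decision_class']}")
```

Note that `abstain` has no `decision_class` argument at all, which mirrors the rule that abstention is a terminal action and not a decision class.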
Branching Contract
v1 allows shallow explicit decomposition only.
branch_subquery must:
- name the branch via subquery_type
- carry structured arguments
- point back to the parent step in the trajectory
- register any produced artifacts in the same episode registry
Examples of subquery_type values:
- derived_metrics
- persistence_check
- peer_comparison
- prior_episode_lookup
Arbitrary recursive model self-calls are out of scope for v1.
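A branch record that satisfies these requirements might look like the sketch below. Several details are assumptions and not part of the contract: `BranchRecord` and `make_branch` are hypothetical names, the four example subquery_type values are treated as a closed set, and "shallow" is interpreted as a maximum depth of 1.

```python
from dataclasses import dataclass

# Assumption: the example values are treated as the complete allowed set.
ALLOWED_SUBQUERY_TYPES = frozenset({
    "derived_metrics",
    "persistence_check",
    "peer_comparison",
    "prior_episode_lookup",
})

# Assumption: "shallow" means a branch may not spawn further branches.
MAX_BRANCH_DEPTH = 1


@dataclass(frozen=True)
class BranchRecord:
    subquery_type: str
    arguments: dict
    parent_step: int  # index of the trajectory step that spawned the branch
    depth: int


def make_branch(
    subquery_type: str,
    arguments: dict,
    parent_step: int,
    parent_depth: int = 0,
) -> BranchRecord:
    """Build a branch record, rejecting unnamed, unstructured, or deep branches."""
    if subquery_type not in ALLOWED_SUBQUERY_TYPES:
        raise ValueError(f"unknown subquery_type: {subquery_type}")
    if not isinstance(arguments, dict):
        raise TypeError("arguments must be a structured mapping")
    depth = parent_depth + 1
    if depth > MAX_BRANCH_DEPTH:
        raise ValueError(f"branch depth {depth} exceeds {MAX_BRANCH_DEPTH}")
    return BranchRecord(subquery_type, arguments, parent_step, depth)
```

Rejecting a branch whose parent is already a branch is one simple way to keep arbitrary recursion out of scope at the type level.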
Terminal Output Contract
Every harness run must end in exactly one terminal record.
Required terminal fields:
- episode_id
- terminal_action
- decision_class
- retained_artifact_ids
- retained_evidence
- open_risks
- stop_reason
Terminal rules:
- terminal_action=finalize requires decision_class in finalize_signal | finalize_low_signal
- terminal_action=abstain requires decision_class=null
- retained_evidence is a materialized view of the final retained artifacts
- stop_reason must be bounded and schema-valid, not a long free-form essay
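The terminal rules translate directly into a record validator. In the sketch below, `validate_terminal` is a hypothetical helper, and the 200-character cap on stop_reason is an invented placeholder for "bounded"; the contract does not specify a limit.

```python
REQUIRED_TERMINAL_FIELDS = (
    "episode_id",
    "terminal_action",
    "decision_class",
    "retained_artifact_ids",
    "retained_evidence",
    "open_risks",
    "stop_reason",
)
DECISION_CLASSES = frozenset({"finalize_signal", "finalize_low_signal"})
MAX_STOP_REASON_CHARS = 200  # assumption: placeholder bound, not from the contract


def validate_terminal(record: dict) -> list:
    """Return a list of rule violations; an empty list means the record is valid."""
    errors = [f"missing field: {name}"
              for name in REQUIRED_TERMINAL_FIELDS if name not in record]
    if errors:
        return errors
    action = record["terminal_action"]
    decision = record["decision_class"]
    if action == "finalize":
        if decision not in DECISION_CLASSES:
            errors.append("finalize requires a frozen decision_class")
    elif action == "abstain":
        if decision is not None:
            errors.append("abstain requires decision_class=null")
    else:
        errors.append(f"unknown terminal_action: {action}")
    stop_reason = record["stop_reason"]
    if not isinstance(stop_reason, str) or len(stop_reason) > MAX_STOP_REASON_CHARS:
        errors.append("stop_reason must be a bounded string")
    return errors
```

Returning all violations at once, instead of raising on the first, makes the validator easier to use when auditing a batch of replayed runs.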
Replay-Pack Contract
The replay path is directory-based and immutable once produced.
Required pack layout:
- manifest.json
- episodes.jsonl
- README.md
manifest.json
Required manifest fields:
- schema_version
- pack_id
- generated_at_utc
- generator_script
- generator_repo_relpath
- generator_git_sha
- source_dataset_refs
- episode_count
episodes.jsonl
Each line is one episode record with this minimum shape:
Notes:
- peer_comparisons and prior_episode_shortlists are optional arrays that may be empty
- field ordering should be stable where easy
- episode ordering should be stable for fixed inputs
- packs must be regenerated into a new timestamped directory instead of edited in place
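A loader for this layout can check the pack against the contract before any episode is replayed. The sketch below is illustrative only: `load_replay_pack` is a hypothetical function, and it assumes each line of episodes.jsonl carries at least the required episode-level fields listed under Environment state, which this section does not confirm.

```python
import json
from pathlib import Path

REQUIRED_MANIFEST_FIELDS = (
    "schema_version", "pack_id", "generated_at_utc", "generator_script",
    "generator_repo_relpath", "generator_git_sha", "source_dataset_refs",
    "episode_count",
)
# Assumption: episode records carry the episode-level fields from
# the Environment state section.
REQUIRED_EPISODE_FIELDS = (
    "episode_id", "query", "anchor_market", "window_id", "step_budget",
    "token_budget_class", "environment_views", "ground_truth_reference",
)


def load_replay_pack(pack_dir) -> tuple:
    """Load manifest.json and episodes.jsonl, preserving on-disk episode order."""
    pack = Path(pack_dir)
    manifest = json.loads((pack / "manifest.json").read_text(encoding="utf-8"))
    missing = [name for name in REQUIRED_MANIFEST_FIELDS if name not in manifest]
    if missing:
        raise ValueError(f"manifest.json missing fields: {missing}")

    episodes = []
    with (pack / "episodes.jsonl").open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate a trailing blank line
            episode = json.loads(line)
            missing = [n for n in REQUIRED_EPISODE_FIELDS if n not in episode]
            if missing:
                raise ValueError(
                    f"episodes.jsonl line {line_number} missing fields: {missing}"
                )
            episodes.append(episode)

    if len(episodes) != manifest["episode_count"]:
        raise ValueError(
            f"manifest declares {manifest['episode_count']} episodes, "
            f"found {len(episodes)}"
        )
    return manifest, episodes
```

The loader only reads; it never writes back into the pack directory, which is consistent with packs being immutable once produced.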
Deterministic Replay Notes
Deterministic replay is part of the contract, not an implementation detail.
For fixed code revision and fixed replay-pack input, the runtime should aim for stable:
- artifact ID resolution
- episode ordering
- environment-view ordering
- step ordering
- terminal output shape
If any part cannot be fully deterministic, the nondeterministic component must be called out in the replay-pack manifest or runtime metadata.
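One way to test this property is to fingerprint each run and compare the fingerprints across replays. The sketch below is a suggestion, not a required mechanism: `canonical_digest` and `replay_matches` are hypothetical helpers, and the run is assumed to be a JSON-serializable structure of steps and terminal output.

```python
import hashlib
import json


def canonical_digest(run: object) -> str:
    """Hash a JSON-serializable run in a canonical text form."""
    # sort_keys normalizes dict key order; list order (for example step
    # order) is preserved, so reordered steps produce a different digest.
    payload = json.dumps(
        run, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def replay_matches(run_a: object, run_b: object) -> bool:
    """True when two runs are identical after canonicalization."""
    return canonical_digest(run_a) == canonical_digest(run_b)
```

Any field that is knowingly nondeterministic, such as a wall-clock timestamp, would need to be stripped before hashing and declared in the manifest or runtime metadata as described above.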