
Trajectory Logging

Optional trajectory-native JSONL logging for search requests, plus the additive v1 search-harness trajectory contract.

HLQ can optionally emit trajectory-native JSONL logs for search requests.

This document now serves two purposes:

  • describe the current shipped one-shot logging substrate
  • freeze the additive v1 search-harness trajectory contract

Current shipped substrate

Current logging remains intentionally narrow:

  • logging is disabled by default
  • current query behavior is unchanged
  • records are append-only side effects for offline analysis and training prep

Enable logging with:

export HLQ_TRAJECTORY_LOG_DIR=/tmp/hlq_trajectory_logs

When this environment variable is unset, HLQ does not write any trajectory logs.

When enabled, current one-shot logging appends to:

  • episodes.jsonl
  • steps.jsonl
  • artifacts.jsonl

under the directory pointed to by HLQ_TRAJECTORY_LOG_DIR.
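The enable/disable behavior can be sketched in Python. `append_trajectory_record` is a hypothetical helper, not the shipped HLQ code, but it follows the documented behavior: nothing is written unless HLQ_TRAJECTORY_LOG_DIR is set, records are append-only, and failures never propagate into product behavior.

```python
import json
import os

def append_trajectory_record(filename: str, record: dict) -> None:
    """Best-effort JSONL append; a no-op when logging is disabled."""
    log_dir = os.environ.get("HLQ_TRAJECTORY_LOG_DIR")
    if not log_dir:
        return  # logging is disabled by default
    try:
        os.makedirs(log_dir, exist_ok=True)
        path = os.path.join(log_dir, filename)
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
    except OSError:
        pass  # best-effort: never break the request path
```

A caller would append one record per event, e.g. `append_trajectory_record("episodes.jsonl", {...})`; the same helper serves steps.jsonl and artifacts.jsonl.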

The current substrate logs:

  • help/capability requests
  • single-query dry-run executions
  • single-query live executions
  • minimal error records for unexpected exceptions

The current substrate does not yet implement:

  • workflow graphs
  • explicit branch policies
  • route scoring
  • public API guarantees around trajectory payloads

Current record shape

Episode

One user-visible request. Includes:

  • episode_id
  • user_query
  • topic_family
  • query_type
  • query_mode
  • time_horizon
  • status
  • source_surface

Step

One coarse product-side action. Includes:

  • step_id
  • episode_id
  • step_type
  • parsed_slots
  • route_family
  • chosen_route
  • retrieval_stage
  • requested_slots
  • supported_slots_estimated
  • unsupported_slots_estimated
  • post_execution_coverage
  • provenance_snapshot

Artifact

Typed evidence or execution objects touched by the step. The current substrate emits artifacts such as:

  • capability_manifest
  • ann_shortlist
  • sql_render
  • sql_result_set
  • coverage_report
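As an illustration, the three record kinds might serialize like this. The field names come from the lists above; every value is invented for the example, and real records may carry additional fields.

```python
import json

# Illustrative one-shot substrate records (values are made up).
episode = {
    "episode_id": "ep-001",
    "user_query": "top movers last week",
    "topic_family": "markets",
    "query_type": "ranking",
    "query_mode": "live",
    "time_horizon": "7d",
    "status": "ok",
    "source_surface": "cli",
}
step = {
    "step_id": "st-001",
    "episode_id": "ep-001",
    "step_type": "single_query_live",
    "parsed_slots": {"metric": "price_change"},
    "route_family": "sql",
    "chosen_route": "sql_ranked",
    "retrieval_stage": "execute",
    "requested_slots": ["metric", "window"],
    "supported_slots_estimated": ["metric", "window"],
    "unsupported_slots_estimated": [],
    "post_execution_coverage": 1.0,
    "provenance_snapshot": {"window_id": "w-2024-18"},
}
artifact = {
    "artifact_id": "ar-001",
    "episode_id": "ep-001",
    "artifact_type": "sql_result_set",
}

# One JSONL line per record, appended to the matching file.
lines = [json.dumps(r) for r in (episode, step, artifact)]
```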

Additive v1 search-harness contract

The shipped one-shot path is still terminal-heavy:

  • one query
  • one route
  • one execution
  • one result payload with provenance

The planned harness path is multi-step:

  • each environment read is explicit
  • working-set edits are explicit
  • branching is explicit
  • the terminal outcome is only the last step in a visible search trace

The provenance contract in provenance.md remains valid. The harness trajectory adds stepwise state transitions on top of that rather than replacing provenance.

Harness log structure

A harness run should emit one episode record plus zero or more step records and exactly one terminal record.
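A reader-side structural check for that shape might look like the following sketch. The record layouts are assumptions from this contract, with finalize and abstain treated as the terminal step types.

```python
TERMINAL_STEP_TYPES = {"finalize", "abstain"}

def check_harness_run(episode_records: list, step_records: list) -> None:
    """Sketch: verify one episode record and exactly one terminal
    record per harness run, with the terminal outcome as the last
    step in the trace."""
    assert len(episode_records) == 1, "exactly one episode record"
    terminals = [s for s in step_records
                 if s["step_type"] in TERMINAL_STEP_TYPES]
    assert len(terminals) == 1, "exactly one terminal record"
    assert step_records[-1]["step_type"] in TERMINAL_STEP_TYPES, \
        "terminal outcome must be the last step"
```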

Required episode-level fields:

  • episode_id
  • query
  • anchor_market
  • window_id
  • policy_id
  • step_budget
  • step_count
  • terminal_action

Required per-step fields:

  • step_id
  • step_index
  • step_type
  • action_name
  • action_args
  • artifact_ids_read
  • working_set_before
  • working_set_after
  • context_pressure_class

Recommended per-step fields:

  • branch_parent_step_id
  • subquery_type
  • selected_artifact_ids
  • dropped_artifact_ids
  • stop_candidate
  • notes

For deterministic replay, step_index is the primary ordering field. Wall-clock timestamps are optional and non-normative.
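A replay reader ordering steps by step_index could be sketched as follows; this is a hypothetical reader, not the shipped one, and it deliberately ignores timestamps.

```python
import json

def replay_order(step_lines: list) -> list:
    """Order harness step records for replay by step_index, the
    primary ordering field; wall-clock timestamps are non-normative
    and ignored."""
    steps = [json.loads(line) for line in step_lines]
    return sorted(steps, key=lambda s: s["step_index"])

# Lines may land in the file out of order; replay still reconstructs
# the canonical sequence.
step_lines = [
    '{"step_id": "st-2", "step_index": 1, "step_type": "keep_artifact"}',
    '{"step_id": "st-1", "step_index": 0, "step_type": "env_read"}',
]
ordered = replay_order(step_lines)
# ordered[0]["step_id"] == "st-1"
```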

Frozen v1 harness step types

The following step types are frozen for the Phase 0 contract:

  • env_read: a deterministic environment read action returned one or more artifacts
  • branch_subquery: the policy requested a shallow child retrieval turn
  • keep_artifact: the policy added an artifact to the working set
  • drop_artifact: the policy removed an artifact from the working set
  • prune_working_set: the policy removed multiple inactive artifacts from the working set
  • decision_update: the policy revised a provisional stop or confidence state before termination
  • finalize: the episode ended with finalize_signal or finalize_low_signal
  • abstain: the episode ended with no bounded decision

action_name should match the corresponding harness action name from search_harness.md.

Step-type specific fields

env_read

Required:

  • action_name
  • action_args
  • artifact_ids_read
  • working_set_before
  • working_set_after

Rules:

  • reading does not imply keeping
  • working_set_after may be identical to working_set_before

branch_subquery

Required:

  • subquery_type
  • branch_parent_step_id

Rules:

  • the child turn writes any produced artifacts into the same episode registry
  • child artifacts must still be explicitly kept if the policy wants them in the working set

keep_artifact

Required:

  • selected_artifact_ids

Rules:

  • for v1, this should usually contain exactly one artifact ID
  • working_set_after must contain the selected artifact IDs

drop_artifact

Required:

  • dropped_artifact_ids

Rules:

  • dropping removes an artifact from active context only
  • the artifact remains in the replay environment for possible later revisit

prune_working_set

Required:

  • dropped_artifact_ids
  • action_args.reason

Rules:

  • use this for multi-artifact cleanup when context pressure is high
  • the reason should be short and schema-bounded

decision_update

Required:

  • stop_candidate

Rules:

  • this is non-terminal
  • use it to record provisional leaning before finalize or abstain

finalize

Required:

  • terminal_action=finalize
  • action_args.decision_class
  • selected_artifact_ids
  • action_args.stop_reason

Rules:

  • decision_class must be finalize_signal or finalize_low_signal
  • selected_artifact_ids must equal the terminal retained set

abstain

Required:

  • terminal_action=abstain
  • selected_artifact_ids
  • action_args.stop_reason

Rules:

  • decision_class must be omitted or null
  • selected_artifact_ids may be empty
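The per-type requirements above can be collected into a single validator sketch. The tables restate the rules in this section; they are not the shipped schema, and a real validator would also enforce the per-type rules (e.g. terminal retained-set equality).

```python
# Fields required on every step (from "Required per-step fields").
COMMON_REQUIRED = {"step_id", "step_index", "step_type", "action_name",
                   "action_args", "artifact_ids_read",
                   "working_set_before", "working_set_after",
                   "context_pressure_class"}

# Extra top-level fields required per step type.
TYPE_REQUIRED = {
    "env_read": set(),
    "branch_subquery": {"subquery_type", "branch_parent_step_id"},
    "keep_artifact": {"selected_artifact_ids"},
    "drop_artifact": {"dropped_artifact_ids"},
    "prune_working_set": {"dropped_artifact_ids"},
    "decision_update": {"stop_candidate"},
    "finalize": {"selected_artifact_ids"},
    "abstain": {"selected_artifact_ids"},
}

def missing_fields(step: dict) -> set:
    """Return the set of required fields a step record is missing."""
    required = COMMON_REQUIRED | TYPE_REQUIRED[step["step_type"]]
    missing = required - step.keys()
    # Step types that also require specific action_args entries.
    args = step.get("action_args", {})
    if step["step_type"] == "prune_working_set" and "reason" not in args:
        missing.add("action_args.reason")
    if step["step_type"] in ("finalize", "abstain") and "stop_reason" not in args:
        missing.add("action_args.stop_reason")
    if step["step_type"] == "finalize" and "decision_class" not in args:
        missing.add("action_args.decision_class")
    return missing
```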

Working-set semantics

The working set is the model-visible evidence context, not the full environment.

Normative rules:

  • every step records working_set_before and working_set_after
  • a kept artifact stays active until it is dropped, pruned, or the episode ends
  • artifacts outside the working set remain addressable by ID if they are still in the episode registry
  • trajectory readers must be able to reconstruct the active context at every step from the log alone

This is the main behavioral difference from the current one-shot pipeline, which has no explicit working-set state.
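The reconstruction requirement can be expressed as a reader-side sketch that replays working_set_before/working_set_after and checks stepwise continuity. This is hypothetical code written against the normative rules above, not the shipped reader.

```python
def reconstruct_working_sets(steps: list):
    """Yield (step_id, active_context) for each step, reconstructed
    from the log alone. Raises if a step's working_set_before does
    not match the previous step's working_set_after."""
    active = None
    for step in sorted(steps, key=lambda s: s["step_index"]):
        before = set(step["working_set_before"])
        if active is not None and before != active:
            raise ValueError(f"discontinuity at step {step['step_id']}")
        active = set(step["working_set_after"])
        yield step["step_id"], active
```

A trajectory that fails this check cannot satisfy the rule that the active context is reconstructible at every step.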

Backward compatibility

Phase 0 does not require retrofitting historical one-shot runs into this schema.

Compatibility expectations for later implementation:

  • existing one-shot search behavior should continue to work without harness trajectory output
  • harness-specific step fields should be additive rather than breaking for existing consumers
  • provenance remains the minimal audit surface for one-shot runs

Operational notes

  • Logging is best-effort and should never break product behavior.
  • Current CLI and MCP outputs remain compatible; trajectory logs are written to disk only.
  • Multi-step workflow structure is intentionally deferred to the harness implementation slices.