Experiment Runner

The Experiment Runner is the execution engine of Hubify Labs. It takes experiment definitions, provisions compute, executes code, and captures every detail for reproducibility.

Running an Experiment

  1. Open the Captain View
  2. Click New Experiment (or press Cmd+E)
  3. Describe the experiment in natural language or fill in the structured form
  4. Select compute requirements (GPU type, estimated duration)
  5. Click Run
The orchestrator will handle agent assignment and pod allocation.

Experiment Dashboard

Each running experiment has a detail view showing:
  • Live Logs: streaming stdout/stderr from the pod
  • Metrics: custom metrics emitted by your script (loss, convergence, sample count)
  • Figures: plots generated during execution, updated in real time
  • Resource Usage: GPU utilization, memory, disk I/O
  • Checkpoints: saved intermediate states you can resume from
  • Cost: running cost in USD

Checkpointing

Experiments automatically checkpoint at configurable intervals:
# In your experiment config
checkpoint:
  interval: 30m    # Save state every 30 minutes
  keep_last: 5     # Keep the 5 most recent checkpoints
  path: /workspace/checkpoints/
If a pod crashes or an experiment is interrupted, you can resume from the last checkpoint:
hubify experiment resume EXP-054 --from-checkpoint latest
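
To illustrate what `interval` and `keep_last` imply for a training script, here is a minimal Python sketch of checkpoint rotation. It is not Hubify's implementation: the file naming scheme (`ckpt_<step>.bin`) and the helper names `save_checkpoint` and `latest_checkpoint` are assumptions made for this example only.

```python
import tempfile
from pathlib import Path
from typing import Optional


def save_checkpoint(directory: Path, step: int, payload: bytes, keep_last: int = 5) -> Path:
    """Write one checkpoint, then delete all but the newest `keep_last` files.

    The ckpt_<step>.bin naming is hypothetical, chosen so that a plain
    lexicographic sort of file names matches the order of the steps.
    """
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"ckpt_{step:08d}.bin"
    # Write to a temporary name first so a crash never leaves a half-written checkpoint.
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(payload)
    tmp.replace(path)
    # Prune everything older than the newest `keep_last` checkpoints.
    for old in sorted(directory.glob("ckpt_*.bin"))[:-keep_last]:
        old.unlink()
    return path


def latest_checkpoint(directory: Path) -> Optional[Path]:
    """Return the newest checkpoint in `directory`, or None if there is none."""
    checkpoints = sorted(directory.glob("ckpt_*.bin"))
    return checkpoints[-1] if checkpoints else None


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    for step in range(8):
        save_checkpoint(workdir, step, b"model state", keep_last=5)
    print(sorted(p.name for p in workdir.glob("ckpt_*.bin")))
    print(latest_checkpoint(workdir))
```

A script that resumes would call `latest_checkpoint` on startup and load that file if one exists, which is roughly what `--from-checkpoint latest` suggests on the platform side.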

QC Gates

Every experiment must pass the following quality control (QC) checks before its results are accepted:
Check            Description                                Threshold
Completeness     All expected output files exist            100%
Convergence      R-hat statistic for MCMC chains            < 1.05
Error Bounds     Statistical uncertainties are reasonable   Domain-specific
Reproducibility  Config + data + code are frozen            All locked
Review           Cross-model verification of results        Pass
If a QC gate fails, the experiment is flagged and the orchestrator decides whether to:
  • Rerun with more samples
  • Adjust parameters and retry
  • Escalate to you for a decision
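
For the Convergence check, the sketch below computes the classic (non-split) Gelman-Rubin R-hat for one scalar parameter and compares it to the 1.05 threshold. The documentation does not say which R-hat variant the QC gate uses, so treat this as an illustration of the statistic rather than the gate's actual code. The function names are assumptions.

```python
import math
import statistics
from typing import Sequence

R_HAT_THRESHOLD = 1.05  # threshold quoted for the Convergence check


def r_hat(chains: Sequence[Sequence[float]]) -> float:
    """Classic Gelman-Rubin potential scale reduction factor.

    `chains` holds m chains of n draws each for a single scalar parameter.
    """
    m = len(chains)
    if m < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    if n < 2 or any(len(c) != n for c in chains):
        raise ValueError("chains must have equal length of at least 2")
    # W: mean of the within-chain sample variances.
    w = statistics.fmean([statistics.variance(c) for c in chains])
    if w == 0:
        raise ValueError("within-chain variance is zero; R-hat is undefined")
    # B / n: sample variance of the chain means.
    b_over_n = statistics.variance([statistics.fmean(c) for c in chains])
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b_over_n
    return math.sqrt(var_hat / w)


def passes_convergence(chains: Sequence[Sequence[float]],
                       threshold: float = R_HAT_THRESHOLD) -> bool:
    """True when R-hat is below the threshold."""
    return r_hat(chains) < threshold
```

Chains that explore the same region give a value near 1, while chains stuck in different regions inflate the between-chain term and push R-hat well above the threshold. In that case, rerunning with more samples is the natural first response.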

Chaining

Experiments can be chained so outputs flow into inputs:
hubify experiment run --chain chain.yaml
# chain.yaml
steps:
  - name: preprocess
    script: preprocess.py
    pod: cpu
  - name: mcmc
    script: run_mcmc.py
    pod: h200
    depends_on: preprocess
  - name: analysis
    script: analyze.py
    pod: cpu
    depends_on: mcmc
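
The `depends_on` fields define a dependency graph, and a valid run order is any topological ordering of that graph. The following Python sketch shows the idea with the standard-library `graphlib` module. The step list is a hand-transcribed copy of `chain.yaml` so that no YAML parser is needed, and `execution_order` is a hypothetical helper, not part of the Hubify CLI.

```python
from graphlib import CycleError, TopologicalSorter

# Hand-transcribed from chain.yaml above.
STEPS = [
    {"name": "preprocess", "script": "preprocess.py", "pod": "cpu"},
    {"name": "mcmc", "script": "run_mcmc.py", "pod": "h200", "depends_on": "preprocess"},
    {"name": "analysis", "script": "analyze.py", "pod": "cpu", "depends_on": "mcmc"},
]


def execution_order(steps):
    """Return step names in an order that respects every depends_on edge."""
    names = {step["name"] for step in steps}
    graph = {}
    for step in steps:
        dep = step.get("depends_on")
        # Accept either a single name (as in chain.yaml) or a list of names.
        if dep is None:
            deps = []
        elif isinstance(dep, str):
            deps = [dep]
        else:
            deps = list(dep)
        unknown = [d for d in deps if d not in names]
        if unknown:
            raise ValueError(f"step {step['name']!r} depends on unknown steps {unknown}")
        graph[step["name"]] = set(deps)
    # static_order() raises CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    try:
        print(execution_order(STEPS))
    except CycleError as exc:
        print("circular dependency:", exc)
```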

Batch Experiments

Run parameter sweeps or multi-configuration experiments:
hubify experiment batch \
  --script train.py \
  --sweep '{"learning_rate": [0.001, 0.01, 0.1], "batch_size": [32, 64]}' \
  --pod h100
This creates 6 experiments (3 learning rates × 2 batch sizes) and runs them in parallel if pods are available.
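
The count of 6 comes from taking the Cartesian product of the swept values. A short Python sketch of that expansion follows; `expand_sweep` is a hypothetical helper used only to show the arithmetic, and it assumes the sweep is a full grid, as the example implies.

```python
import itertools
import json


def expand_sweep(sweep):
    """Turn {"param": [values, ...], ...} into one config dict per combination."""
    keys = list(sweep)
    value_lists = [sweep[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


# Same JSON string as passed to --sweep in the command above.
sweep = json.loads('{"learning_rate": [0.001, 0.01, 0.1], "batch_size": [32, 64]}')
configs = expand_sweep(sweep)

if __name__ == "__main__":
    for config in configs:
        print(config)
```

Because the grid grows multiplicatively, adding a third parameter with four values would turn 6 experiments into 24, which is worth checking against the Cost view before launching.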

Reproducibility Record

Every experiment captures:
  • Git SHA of the codebase
  • Full dependency list (pip freeze)
  • Config files (YAML/JSON, checksummed)
  • Input data SHA-256 hashes
  • Random seeds
  • Pod hardware specs
  • Start/end timestamps
This record is immutable and stays attached to the experiment permanently.
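
As a rough picture of what part of such a record could look like, the Python sketch below hashes input files with SHA-256 and checksums a config. The field names and the `build_record` helper are assumptions for illustration; the actual record format is not documented here, and the Git SHA, `pip freeze` output, pod specs, and timestamps are left out to keep the example self-contained.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large inputs never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_record(input_paths, config, seed):
    """Assemble a partial, hypothetical reproducibility record."""
    # Serialize with sorted keys so the checksum does not depend on key order.
    config_bytes = json.dumps(config, sort_keys=True).encode("utf-8")
    return {
        "input_sha256": {str(p): sha256_file(p) for p in input_paths},
        "config_sha256": hashlib.sha256(config_bytes).hexdigest(),
        "seed": seed,
    }


if __name__ == "__main__":
    data_file = Path(tempfile.mkdtemp()) / "data.bin"
    data_file.write_bytes(b"abc")
    record = build_record([data_file], {"learning_rate": 0.01, "batch_size": 32}, seed=1234)
    print(json.dumps(record, indent=2))
```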