Integrate Vizion computer vision models into your application with low-latency WebSocket sessions.
Vizion provides real-time GPU inference via WebSocket sessions. The flow is:
1. POST /api/v1/connect with your API key and model
2. Receive a ws_url for a dedicated GPU worker
3. Stream JPEG frames over the WebSocket and receive JSON results
4. Send "shutdown" when done — billing stops automatically

All API requests require a Bearer token. Create an API key in your dashboard.
```
Authorization: Bearer vz_live_your_api_key_here
```

The easiest way to get started is with the official Python SDK. It handles session lifecycle, the binary WebSocket protocol, and cleanup automatically. Requires Python 3.10+.
```shell
pip install git+https://github.com/CoreVisionX/vizion-sdk.git
```

With OpenCV + numpy for mask decoding and webcam demos:
```shell
pip install "vizion[cv] @ git+https://github.com/CoreVisionX/vizion-sdk.git"
```

```python
import os

from vizion import VizionClient

client = VizionClient(os.environ["VIZION_API_KEY"])
client.connect()

with open("frame.jpg", "rb") as f:
    jpeg_bytes = f.read()

result = client.segment(jpeg_bytes, prompts=["person", "car"])

for det in result.results:
    print(f"{det.prompt}: {len(det.instances)} found")
    for inst in det.instances:
        print(f"  bbox=({inst.x1},{inst.y1})-({inst.x2},{inst.y2}) conf={inst.confidence:.2f}")

print(f"Latency: {result.decode_segment_ms:.1f}ms")
client.close()
```

Use a context manager to ensure the session always shuts down:
```python
with VizionClient(os.environ["VIZION_API_KEY"]) as client:
    client.connect()
    result = client.segment(jpeg_bytes, prompts=["person"])
# session is automatically closed
```

Use model="depth-anything-3" for monocular metric depth estimation. The depth() method sends a JPEG and returns a depth map with metric values in metres.
```python
import os

from vizion import VizionClient

client = VizionClient(os.environ["VIZION_API_KEY"], model="depth-anything-3")
client.connect()

with open("frame.jpg", "rb") as f:
    jpeg_bytes = f.read()

result = client.depth(jpeg_bytes)
print(f"Depth range: {result.depth_min:.2f} – {result.depth_max:.2f} metres")

# Decode to a (H, W) float32 numpy array in metres
depth = result.decode_depth()

client.close()
```

A live webcam demo with colourmap overlay is included in the SDK at examples/depth_webcam.py.
Each Instance has a decode_mask() method that returns a (H, W) boolean numpy array (requires the [cv] extra):
```python
for det in result.results:
    for inst in det.instances:
        mask = inst.decode_mask()  # numpy bool array (H, W)
```

A full live-segmentation example with mask overlay is included in the SDK:
```shell
pip install "vizion[cv] @ git+https://github.com/CoreVisionX/vizion-sdk.git"
export VIZION_API_KEY="vz_live_..."
python examples/webcam.py
```

```python
# List available models and pricing (no auth required)
models = client.models()
for m in models.models:
    print(f"{m.id}: {m.name} — {m.description}")
print(f"Cost: {models.cost_per_second_cents} cents/s")

# List your recent sessions
sessions = client.sessions()
for s in sessions:
    print(f"{s.id} {s.status} {s.duration_seconds}s")
```

All methods return typed Pydantic models with full autocomplete support:
| Method | Return Type |
|---|---|
| segment() | SegmentationResult |
| depth() | DepthResult |
| models() | ModelsResponse |
| sessions() | list[Session] |
- SegmentationResult — results: list[Detection], plus timing fields (decode_ms, vision_encode_ms, text_encode_ms, decode_segment_ms)
- Detection — prompt: str, instances: list[Instance]
- Instance — x1, y1, x2, y2, confidence, mask_rle, mask_height, mask_width, plus decode_mask()
- DepthResult — depth_png_b64, depth_min, depth_max, height, width, plus timing fields and decode_depth()
- ModelsResponse — models: list[ModelInfo], cost_per_second_cents: float
- Session — id, model, status, started_at, ended_at, duration_seconds, credits_used
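Because these are plain typed objects, downstream code can work with them directly. As an illustration, here is a small helper that tallies instances per prompt; count_instances is not part of the SDK, and the SimpleNamespace objects below are stand-ins with the same field names as Detection and Instance:

```python
from types import SimpleNamespace

def count_instances(result):
    """Tally detected instances per prompt from a SegmentationResult-like object."""
    return {det.prompt: len(det.instances) for det in result.results}

# Stand-ins mimicking result.results from segment():
fake = SimpleNamespace(results=[
    SimpleNamespace(prompt="person", instances=[object(), object()]),
    SimpleNamespace(prompt="car", instances=[object()]),
])
counts = count_instances(fake)  # {"person": 2, "car": 1}
```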
All models can be imported directly:

```python
from vizion import SegmentationResult, Detection, Instance, DepthResult, ModelsResponse, Session
```
| Model ID | Name | Description | Status |
|---|---|---|---|
| sam3 | SAM-3 | Segment Anything Model 3 — state-of-the-art image segmentation | Available |
| depth-anything-3 | Depth Anything 3 | Monocular depth estimation for any image | Available |
| yolo26 | YOLO26 | Real-time object detection and tracking | Coming Soon |
All sessions are billed per second of active connection time at a flat rate of $0.00155/s ($0.09/min · $5.58/hr).
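At that flat rate, session cost is simple arithmetic. A quick sketch (the constant below just restates the published rate):

```python
COST_PER_SECOND_USD = 0.00155  # flat rate, all models

def session_cost_usd(duration_seconds: float) -> float:
    """Cost of a session billed per second of active connection time."""
    return duration_seconds * COST_PER_SECOND_USD

five_minutes = session_cost_usd(300)   # 0.465 -> about $0.47
one_hour = session_cost_usd(3600)      # 5.58, matching the $5.58/hr figure
```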
For direct API integration without the SDK.
POST /api/v1/connect

Authenticates your API key, starts a dedicated GPU worker, and returns a WebSocket URL for real-time inference. The backend handles all RunPod orchestration — your API key is never exposed to the GPU worker.
```shell
curl -X POST https://www.vizion.fast/api/v1/connect \
  -H "Authorization: Bearer vz_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "sam3"}'
```

Response:

```json
{
  "session_id": "a1b2c3d4-...",
  "ws_url": "ws://203.0.113.10:8765"
}
```

Error responses:

```jsonc
// 401 — Invalid or revoked API key
{ "error": "Invalid API key" }

// 400 — Invalid model
{ "error": "Invalid model. Available: sam3, depth-anything-3" }

// 402 — Insufficient balance
{ "error": "Insufficient credits" }

// 504 — Worker failed to start in time
{ "error": "Session startup timed out" }
```

Once connected to the ws_url, send binary frames for inference and receive JSON results. The connection stays open for as long as you need — send as many frames as you want.
Each message is a binary frame with the following format:
```
[4 bytes: header length (uint32 LE)] + [JSON header] + [JPEG bytes]
```

Header JSON:

```jsonc
{
  "prompts": ["person", "car"],  // text prompts to segment
  "score_threshold": 0.5,        // confidence threshold (0-1)
  "mask_threshold": 0.5          // mask threshold (0-1)
}
```

The server responds with a JSON text message containing per-prompt results with bounding boxes, confidence scores, and RLE-encoded segmentation masks.
```json
{
  "results": [
    {
      "prompt": "person",
      "instances": [
        {
          "x1": 10, "y1": 20, "x2": 200, "y2": 400,
          "confidence": 0.95,
          "mask_rle": [100, 50, ...],
          "mask_height": 480,
          "mask_width": 640
        }
      ]
    }
  ],
  "decode_ms": 2.8,
  "vision_encode_ms": 64.0,
  "text_encode_ms": 33.2,
  "decode_segment_ms": 41.6
}
```

The depth model uses a simpler protocol — send raw JPEG bytes with no header framing and receive a JSON response with a base64-encoded uint16 PNG depth map.
```jsonc
// Send: raw JPEG bytes (binary message)
// Receive: JSON
{
  "depth_png_b64": "iVBORw0KGgo...",  // base64 uint16 PNG
  "depth_min": 0.58,                  // metres
  "depth_max": 2.41,                  // metres
  "height": 378,
  "width": 504,
  "decode_ms": 1.8,
  "inference_ms": 214.9,
  "encode_ms": 43.1
}
```

To reconstruct metric depth from the uint16 PNG: decode the base64 PNG, read pixel values as uint16, then map [0, 65535] back to [depth_min, depth_max] metres. The Python SDK's result.decode_depth() handles this automatically.
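Outside the SDK, the linear mapping step looks like this. This is a sketch with numpy; the PNG decode itself (e.g. via Pillow or OpenCV) is assumed to have already produced the uint16 array, so a literal array stands in for it below:

```python
import numpy as np

def depth_from_uint16(u16: np.ndarray, depth_min: float, depth_max: float) -> np.ndarray:
    """Map quantised uint16 pixel values back to metric depth in metres."""
    scaled = u16.astype(np.float32) / 65535.0
    return depth_min + scaled * (depth_max - depth_min)

# Stand-in for a decoded depth PNG: 0 maps to depth_min, 65535 to depth_max.
u16 = np.array([[0, 32768, 65535]], dtype=np.uint16)
depth_m = depth_from_uint16(u16, 0.58, 2.41)
```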
Send the text message "shutdown" to terminate the GPU worker. The session duration is calculated automatically and your account is billed based on time used. If the connection drops without a shutdown signal, the worker will auto-terminate after 60 seconds of inactivity.
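For the segmentation protocol described above, the framing can be handled with the standard library alone. A sketch (no network I/O shown; pack_frame builds the binary message you would send over the socket, and unpack_frame is its inverse, included here only to check the framing locally):

```python
import json
import struct

def pack_frame(jpeg_bytes: bytes, prompts, score_threshold=0.5, mask_threshold=0.5) -> bytes:
    """Build a binary frame: uint32 LE header length + JSON header + JPEG bytes."""
    header = json.dumps({
        "prompts": prompts,
        "score_threshold": score_threshold,
        "mask_threshold": mask_threshold,
    }).encode("utf-8")
    return struct.pack("<I", len(header)) + header + jpeg_bytes

def unpack_frame(frame: bytes):
    """Split a frame back into its JSON header and JPEG payload."""
    (header_len,) = struct.unpack_from("<I", frame, 0)
    header = json.loads(frame[4:4 + header_len])
    return header, frame[4 + header_len:]
```

Sending the resulting bytes as a single binary WebSocket message and reading the next text message gives one inference round trip; the text message "shutdown" ends the session as described above.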
GET /api/v1/sessions

Returns your session history with duration and cost information.

```shell
curl https://www.vizion.fast/api/v1/sessions \
  -H "Authorization: Bearer vz_live_your_api_key"
```

Response:

```json
{
  "sessions": [
    {
      "id": "a1b2c3d4-...",
      "model": "sam3",
      "status": "completed",
      "started_at": "2025-01-15T10:30:00Z",
      "ended_at": "2025-01-15T10:35:00Z",
      "duration_seconds": 300,
      "credits_used": 4650
    }
  ]
}
```

GET /api/v1/models

Returns available models with pricing. No authentication required.
```shell
curl https://www.vizion.fast/api/v1/models
```
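Besides curl, the REST endpoints can be called from Python's standard library. A sketch showing request construction only (no network call is made here; the base URL and function names are illustrative):

```python
import urllib.request

BASE_URL = "https://www.vizion.fast/api/v1"

def models_request() -> urllib.request.Request:
    # The models endpoint needs no authentication.
    return urllib.request.Request(f"{BASE_URL}/models")

def sessions_request(api_key: str) -> urllib.request.Request:
    # Authenticated endpoints take a Bearer token.
    req = urllib.request.Request(f"{BASE_URL}/sessions")
    req.add_header("Authorization", f"Bearer {api_key}")
    return req
```

Passing either request to urllib.request.urlopen would perform the call; json.loads on the response body yields the structures shown above.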