Integrate Vizion computer vision models into your application with low-latency WebSocket sessions.
Vizion provides real-time GPU inference via WebSocket sessions. The flow is:
1. POST /api/v1/connect with your API key and model
2. Receive a ws_url for a dedicated GPU worker
3. Stream JPEG frames over the WebSocket and receive JSON results
4. Send "shutdown" when done — billing stops automatically

All API requests require a Bearer token. Create an API key in your dashboard.
```
Authorization: Bearer vz_live_your_api_key_here
```

The easiest way to get started is with the official Python SDK. It handles session lifecycle, the binary WebSocket protocol, and cleanup automatically. Requires Python 3.10+.
```shell
pip install git+https://github.com/CoreVisionX/vizion-sdk.git
```

With OpenCV + numpy for mask decoding and webcam demos:
```shell
pip install "vizion[cv] @ git+https://github.com/CoreVisionX/vizion-sdk.git"
```

```python
import os

from vizion import VizionClient

client = VizionClient(os.environ["VIZION_API_KEY"])
client.connect()

with open("frame.jpg", "rb") as f:
    jpeg_bytes = f.read()

result = client.segment(jpeg_bytes, prompts=["person", "car"])

for det in result.results:
    print(f"{det.prompt}: {len(det.instances)} found")
    for inst in det.instances:
        print(f"  bbox=({inst.x1},{inst.y1})-({inst.x2},{inst.y2}) conf={inst.confidence:.2f}")

print(f"Latency: {result.decode_segment_ms:.1f}ms")
client.close()
```

Use a context manager to ensure the session always shuts down:
```python
with VizionClient(os.environ["VIZION_API_KEY"]) as client:
    client.connect()
    result = client.segment(jpeg_bytes, prompts=["person"])
# session is automatically closed
```

Use model="depth-anything-3" for monocular metric depth estimation. The depth() method sends a JPEG and returns a depth map with metric values in metres.
```python
import os

from vizion import VizionClient

client = VizionClient(os.environ["VIZION_API_KEY"], model="depth-anything-3")
client.connect()

with open("frame.jpg", "rb") as f:
    jpeg_bytes = f.read()

result = client.depth(jpeg_bytes)
print(f"Depth range: {result.depth_min:.2f} – {result.depth_max:.2f} metres")

# Decode to a (H, W) float32 numpy array in metres
depth = result.decode_depth()

client.close()
```

A live webcam demo with colourmap overlay is included in the SDK at examples/depth_webcam.py.
Each Instance has a decode_mask() method that returns a (H, W) boolean numpy array (requires the [cv] extra):
```python
for det in result.results:
    for inst in det.instances:
        mask = inst.decode_mask()  # numpy bool array (H, W)
```

A full live-segmentation example with mask overlay is included in the SDK:
```shell
pip install "vizion[cv] @ git+https://github.com/CoreVisionX/vizion-sdk.git"
export VIZION_API_KEY="vz_live_..."
python examples/webcam.py
```

```python
# List available models and pricing (no auth required)
models = client.models()
for m in models.models:
    print(f"{m.id}: {m.name} — {m.description}")
print(f"Cost: {models.cost_per_second_cents} cents/s")

# List your recent sessions
sessions = client.sessions()
for s in sessions:
    print(f"{s.id} {s.status} {s.duration_seconds}s")
```

All methods return typed Pydantic models with full autocomplete support:
| Method | Return Type |
|---|---|
| segment() | SegmentationResult |
| depth() | DepthResult |
| models() | ModelsResponse |
| sessions() | list[Session] |
- SegmentationResult — results: list[Detection], plus timing fields (decode_ms, vision_encode_ms, text_encode_ms, decode_segment_ms)
- Detection — prompt: str, instances: list[Instance]
- Instance — x1, y1, x2, y2, confidence, mask_rle, mask_height, mask_width, plus decode_mask()
- DepthResult — depth_png_b64, depth_min, depth_max, height, width, plus timing fields and decode_depth()
- ModelsResponse — models: list[ModelInfo], cost_per_second_cents: float
- Session — id, model, status, started_at, ended_at, duration_seconds, credits_used
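Because these are plain typed objects, downstream code can work with them directly. As an illustration, here is a small helper that tallies instances per prompt; count_instances is not part of the SDK, and the SimpleNamespace objects below are stand-ins with the same field names as Detection and Instance:

```python
from types import SimpleNamespace

def count_instances(result):
    """Tally detected instances per prompt from a SegmentationResult-like object."""
    return {det.prompt: len(det.instances) for det in result.results}

# Stand-ins mimicking result.results from segment():
fake = SimpleNamespace(results=[
    SimpleNamespace(prompt="person", instances=[object(), object()]),
    SimpleNamespace(prompt="car", instances=[object()]),
])
counts = count_instances(fake)  # {"person": 2, "car": 1}
```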
All models can be imported directly:

```python
from vizion import SegmentationResult, Detection, Instance, DepthResult, ModelsResponse, Session
```
| Model ID | Name | Description | Status |
|---|---|---|---|
| sam3 | SAM-3 | Segment Anything Model 3 — state-of-the-art image segmentation | Available |
| depth-anything-3 | Depth Anything 3 | Monocular depth estimation for any image | Available |
| yolo26 | YOLO26 | Real-time object detection and tracking | Coming Soon |
All sessions are billed per second of active connection time at a flat rate of $0.00155/s ($0.09/min · $5.58/hr).
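At that flat rate, session cost is simple arithmetic. A quick sketch (the constant below just restates the published rate):

```python
COST_PER_SECOND_USD = 0.00155  # flat rate, all models

def session_cost_usd(duration_seconds: float) -> float:
    """Cost of a session billed per second of active connection time."""
    return duration_seconds * COST_PER_SECOND_USD

five_minutes = session_cost_usd(300)   # 0.465 -> about $0.47
one_hour = session_cost_usd(3600)      # 5.58, matching the $5.58/hr figure
```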
For direct API integration without the SDK.
POST /api/v1/connect

Authenticates your API key, starts a dedicated GPU worker, and returns a WebSocket URL for real-time inference. The backend handles all RunPod orchestration — your API key is never exposed to the GPU worker.
```shell
curl -X POST https://www.vizion.fast/api/v1/connect \
  -H "Authorization: Bearer vz_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "sam3"}'
```

Response:

```json
{
  "session_id": "a1b2c3d4-...",
  "ws_url": "ws://203.0.113.10:8765"
}
```

Error responses:

```jsonc
// 401 — Invalid or revoked API key
{ "error": "Invalid API key" }

// 400 — Invalid model
{ "error": "Invalid model. Available: sam3, depth-anything-3" }

// 402 — Insufficient balance
{ "error": "Insufficient credits" }

// 504 — Worker failed to start in time
{ "error": "Session startup timed out" }
```

Once connected to the ws_url, send binary frames for inference and receive JSON results. The connection stays open for as long as you need — send as many frames as you want.
Each message is a binary frame with the following format:
```
[4 bytes: header length (uint32 LE)] + [JSON header] + [JPEG bytes]
```

Header JSON:

```jsonc
{
  "prompts": ["person", "car"],  // text prompts to segment
  "score_threshold": 0.5,        // confidence threshold (0-1)
  "mask_threshold": 0.5          // mask threshold (0-1)
}
```

The server responds with a JSON text message containing per-prompt results with bounding boxes, confidence scores, and RLE-encoded segmentation masks.
```json
{
  "results": [
    {
      "prompt": "person",
      "instances": [
        {
          "x1": 10, "y1": 20, "x2": 200, "y2": 400,
          "confidence": 0.95,
          "mask_rle": [100, 50, ...],
          "mask_height": 480,
          "mask_width": 640
        }
      ]
    }
  ],
  "decode_ms": 2.8,
  "vision_encode_ms": 64.0,
  "text_encode_ms": 33.2,
  "decode_segment_ms": 41.6
}
```

The depth model uses a simpler protocol — send raw JPEG bytes with no header framing and receive a JSON response with a base64-encoded uint16 PNG depth map.
```jsonc
// Send: raw JPEG bytes (binary message)
// Receive: JSON
{
  "depth_png_b64": "iVBORw0KGgo...",  // base64 uint16 PNG
  "depth_min": 0.58,                  // metres
  "depth_max": 2.41,                  // metres
  "height": 378,
  "width": 504,
  "decode_ms": 1.8,
  "inference_ms": 214.9,
  "encode_ms": 43.1
}
```

To reconstruct metric depth from the uint16 PNG: decode the base64 PNG, read pixel values as uint16, then map [0, 65535] back to [depth_min, depth_max] metres. The Python SDK's result.decode_depth() handles this automatically.
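Outside the SDK, the linear mapping step looks like this. This is a sketch with numpy; the PNG decode itself (e.g. via Pillow or OpenCV) is assumed to have already produced the uint16 array, so a literal array stands in for it below:

```python
import numpy as np

def depth_from_uint16(u16: np.ndarray, depth_min: float, depth_max: float) -> np.ndarray:
    """Map quantised uint16 pixel values back to metric depth in metres."""
    scaled = u16.astype(np.float32) / 65535.0
    return depth_min + scaled * (depth_max - depth_min)

# Stand-in for a decoded depth PNG: 0 maps to depth_min, 65535 to depth_max.
u16 = np.array([[0, 32768, 65535]], dtype=np.uint16)
depth_m = depth_from_uint16(u16, 0.58, 2.41)
```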
Send the text message "shutdown" to terminate the GPU worker. The session duration is calculated automatically and your account is billed based on time used. If the connection drops without a shutdown signal, the worker will auto-terminate after 60 seconds of inactivity.
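For the segmentation protocol described above, the framing can be handled with the standard library alone. A sketch (no network I/O shown; pack_frame builds the binary message you would send over the socket, and unpack_frame is its inverse, included here only to check the framing locally):

```python
import json
import struct

def pack_frame(jpeg_bytes: bytes, prompts, score_threshold=0.5, mask_threshold=0.5) -> bytes:
    """Build a binary frame: uint32 LE header length + JSON header + JPEG bytes."""
    header = json.dumps({
        "prompts": prompts,
        "score_threshold": score_threshold,
        "mask_threshold": mask_threshold,
    }).encode("utf-8")
    return struct.pack("<I", len(header)) + header + jpeg_bytes

def unpack_frame(frame: bytes):
    """Split a frame back into its JSON header and JPEG payload."""
    (header_len,) = struct.unpack_from("<I", frame, 0)
    header = json.loads(frame[4:4 + header_len])
    return header, frame[4 + header_len:]
```

Sending the resulting bytes as a single binary WebSocket message and reading the next text message gives one inference round trip; the text message "shutdown" ends the session as described above.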
GET /api/v1/sessions

Returns your session history with duration and cost information.

```shell
curl https://www.vizion.fast/api/v1/sessions \
  -H "Authorization: Bearer vz_live_your_api_key"
```

Response:

```json
{
  "sessions": [
    {
      "id": "a1b2c3d4-...",
      "model": "sam3",
      "status": "completed",
      "started_at": "2025-01-15T10:30:00Z",
      "ended_at": "2025-01-15T10:35:00Z",
      "duration_seconds": 300,
      "credits_used": 4650
    }
  ]
}
```

GET /api/v1/models

Returns available models with pricing. No authentication required.
```shell
curl https://www.vizion.fast/api/v1/models
```
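Besides curl, the REST endpoints can be called from Python's standard library. A sketch showing request construction only (no network call is made here; the base URL and function names are illustrative):

```python
import urllib.request

BASE_URL = "https://www.vizion.fast/api/v1"

def models_request() -> urllib.request.Request:
    # The models endpoint needs no authentication.
    return urllib.request.Request(f"{BASE_URL}/models")

def sessions_request(api_key: str) -> urllib.request.Request:
    # Authenticated endpoints take a Bearer token.
    req = urllib.request.Request(f"{BASE_URL}/sessions")
    req.add_header("Authorization", f"Bearer {api_key}")
    return req
```

Passing either request to urllib.request.urlopen would perform the call; json.loads on the response body yields the structures shown above.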