Riven Models

Train, evaluate, and serve models at scale.

The Riven AI Platform gives you a unified control plane for the full model lifecycle — from fine-tuning and evaluation to production inference. Ship any HuggingFace or custom model in minutes, with sub-50ms p99 latency and full observability baked in.

Riven

Serving Engine

< 50ms

p99 Latency

2k+

Tokens / sec

Request access Read the docs

riven models — training run

finetune-v3 · RLRunning

GPU: A100batch: 32lr: 2e-4

Loss2.4

Eval score12%

Improving...

step 0 / 1000

Loading demo…

What's included

Everything you need, nothing you don't.

High-Performance Inference

Production-grade serving that scales with your traffic. Deploy any HuggingFace or custom model in minutes with < 50ms p99.

Fine-Tuning Pipelines

LoRA, QLoRA, and full fine-tuning workflows. Connect your dataset, pick a base model, and let the pipeline handle the rest.

Evaluation Framework

Built-in eval harness with MMLU, HellaSwag, and custom benchmarks. Compare model versions side-by-side with drift detection.

Inference Observability

Token throughput, latency percentiles, and per-request traces. Grafana dashboards auto-provisioned on deploy.

Full capability list

Optimized serving at any traffic scale

Multi-GPU horizontal scaling

LoRA / QLoRA fine-tuning

BM25 + vector hybrid retrieval

RLHF & DPO training loops

Model registry with versioning

A/B and shadow traffic routing

Auto-scaling on GPU utilization

OpenAI-compatible API surface

Prometheus + Grafana observability

Early access

Riven is in beta — pricing opens as we leave beta. Request access and we'll reach out within a few days.

Request access View all products