LTX-2.3
Identity
| Property | Value |
| --- | --- |
| ID | ltx-2.3 |
| Parameters | 22B (dev/distilled checkpoints) |
| HuggingFace | affectively-ai/ltx-2.3 (base: Lightricks/LTX-2.3) |
| Quantization | SafeTensors checkpoints (BF16/FP16) |
| License | LTX-2 Community License Agreement (HF metadata: other) |
Axis 1: Architecture
| Property | Value |
| --- | --- |
| Family | Diffusion Transformer (DiT), joint audio-video generation |
| Checkpoint set | ltx-2.3-22b-dev, ltx-2.3-22b-distilled, LoRA + temporal/spatial upscalers |
| Primary task design | Text/image-conditioned video synthesis, optional audio generation |
| Attention/latent internals | Not fully specified in published card metadata |
| Training/release orientation | Open-weight foundation + distilled inference checkpoint |
Architecture assessment: LTX-2.3 is a specialist diffusion family for generative video. It is not a chat transformer and should be routed through a diffusion-native runtime path for full quality.
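The routing rule in the assessment above can be sketched as a small dispatcher. This is a minimal illustration, not a real Aether API: the lane names, task labels, and request shape are all assumptions.

```python
# Route video/diffusion requests to a diffusion-native lane and keep them off
# the chat-transformer lane. Lane and task names are illustrative assumptions.

DIFFUSION_TASKS = {"text-to-video", "image-to-video", "audio-video"}
CHAT_TASKS = {"chat", "vlm-reasoning", "document-ocr"}

def route_request(task: str) -> str:
    """Return the runtime lane for a task label."""
    if task in DIFFUSION_TASKS:
        return "diffusion-native"   # full-quality denoising path for LTX-2.3
    if task in CHAT_TASKS:
        return "chat-transformer"   # wrong lane for LTX-2.3; handled elsewhere
    raise ValueError(f"unknown task: {task}")

print(route_request("image-to-video"))  # diffusion-native
```

The point is that the lane decision is made once, at the router, rather than inside each model integration.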
Axis 2: Runtime
| Runtime | Viable | Notes |
| --- | --- | --- |
| WASM (browser) | No | 22B checkpoint family is beyond practical browser constraints |
| ONNX/WebGPU | No | No maintained ONNX/WebGPU path in this deployment |
| Native (device) | Conditional | Possible on high-end local GPU setups |
| Edge Worker | No | Worker memory/runtime ceilings are too small |
| Cloud Run (distributed CPU lane) | Yes | Current Aether route for API readiness and compatibility |
| Cloud GPU | Conditional | Best fit for full-quality denoising pipelines |
Primary runtime: the Cloud Run distributed coordinator/layer topology handles routing and compatibility today; dedicated diffusion-native execution remains the quality path.
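The coordinator/layer topology can be sketched as a chain: one coordinator passes activations through the layer services in order. This is a pure-Python stand-in under assumed semantics; the service names and `forward` shape are illustrative, not the actual Cloud Run wire protocol.

```python
# Minimal sketch of a coordinator passing a request through a chain of layer
# services. Each "service" here is just a function that records its hop.

from typing import Callable, List

def make_layer(name: str) -> Callable[[list], list]:
    # Stand-in for one Cloud Run layer service holding a model shard.
    def forward(activations: list) -> list:
        return activations + [name]          # record the hop
    return forward

def coordinator(layers: List[Callable[[list], list]], request: list) -> list:
    # The coordinator owns ordering; each layer only sees its neighbor's output.
    out = request
    for layer in layers:
        out = layer(out)
    return out

layers = [make_layer(f"layer-{i}") for i in range(4)]   # 4 layer services
trace = coordinator(layers, ["request"])
print(trace)  # ['request', 'layer-0', 'layer-1', 'layer-2', 'layer-3']
```

The design choice this illustrates: ordering and retries live in the coordinator, so layer services stay stateless and can scale from zero independently.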
Axis 3: Modality
| Property | Value |
| --- | --- |
| Input | Text and/or image conditioning (plus optional audio workflows upstream) |
| Output | Video (with optional synchronized audio in upstream LTX workflows) |
| Category | Image-to-video / text-to-video |
Axis 4: Task Fitness
| Task | Fitness | Notes |
| --- | --- | --- |
| Prompted short-form video generation | Very good | Core capability of the model family |
| Image-conditioned video generation | Very good | First-class upstream task |
| Audio-synchronized AV generation | Good | Supported in upstream LTX stack; runtime integration maturity varies |
| Document OCR / VLM reasoning | Poor | Wrong model class for extraction/reasoning tasks |
Role in the zoo: Primary modern video-generation specialist. Route video requests here instead of overloading vision-language chat stacks.
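The zoo-routing rule above can be expressed as a fitness lookup: pick the model whose Axis 4 rating is highest for the requested task. The LTX-2.3 entries mirror the table; the contrasting VLM entry (`some-vlm`) is a hypothetical placeholder, not a real model in the zoo.

```python
# Pick a model by task fitness so video requests land on ltx-2.3 and
# extraction/reasoning tasks go to a VLM. "some-vlm" is a hypothetical entry.

FITNESS = {
    "ltx-2.3": {
        "text-to-video": "very good",
        "image-to-video": "very good",
        "av-generation": "good",
        "document-ocr": "poor",
    },
    "some-vlm": {"document-ocr": "good", "text-to-video": "poor"},
}

RANK = {"poor": 0, "good": 1, "very good": 2}

def pick_model(task: str) -> str:
    """Return the model with the highest fitness rating for a task."""
    return max(FITNESS, key=lambda m: RANK.get(FITNESS[m].get(task, "poor"), 0))

print(pick_model("image-to-video"))  # ltx-2.3
print(pick_model("document-ocr"))    # some-vlm
```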
Axis 5: Operational Cost
| Property | Value |
| --- | --- |
| Checkpoint footprint | ~44 GB for the core 22B checkpoints (plus upscalers) |
| Cloud Run topology | 1 coordinator + 4 layer services (current distributed lane) |
| Cloud Run resources | 2 vCPU / 4 GiB per service (current baseline config) |
| Timeout profile | 600s request budget for long diffusion-style operations |
| Idle cost | ~$0/month when min-instances remain 0 |
| Cold start profile | Noticeable on scale-from-zero; acceptable for non-realtime video jobs |
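The timeout and cold-start rows above imply a simple dispatch policy: a job must fit inside the 600s request budget, including a scale-from-zero penalty, or it should be handed to an async job queue instead. The 600s budget comes from the table; the 60s cold-start estimate and the async fallback are assumptions for illustration.

```python
# Decide sync vs. async dispatch against the 600 s Cloud Run request budget.
# COLD_START_S is an assumed worst-case scale-from-zero penalty.

REQUEST_BUDGET_S = 600        # Cloud Run timeout profile (Axis 5)
COLD_START_S = 60             # assumption: worst-case cold-start penalty

def dispatch_mode(estimated_job_s: float, cold: bool = True) -> str:
    """Return 'sync' if the job fits in one request, else 'async-queue'."""
    total = estimated_job_s + (COLD_START_S if cold else 0)
    return "sync" if total <= REQUEST_BUDGET_S else "async-queue"

print(dispatch_mode(300))   # sync
print(dispatch_mode(580))   # async-queue (580 + 60 > 600)
```

Since video jobs here are explicitly non-realtime, defaulting long requests to the async path keeps min-instances at 0 and idle cost near zero.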
Verdict
LTX-2.3 is the right specialist for video synthesis workloads in this model zoo. Keep it on an explicit video/diffusion routing lane, and avoid treating it like a chat model. For best quality, prioritize a dedicated diffusion-native runtime over compatibility inference shims.