# LTX-2.3

## Identity

| Property | Value |
|---|---|
| ID | ltx-2.3 |
| Parameters | 22B (dev/distilled checkpoints) |
| HuggingFace | affectively-ai/ltx-2.3 (base: Lightricks/LTX-2.3) |
| Precision / format | SafeTensors checkpoints (BF16/FP16) |
| License | LTX-2 Community License Agreement (HF metadata: other) |

## Axis 1: Architecture

| Property | Value |
|---|---|
| Family | Diffusion Transformer (DiT), joint audio-video generation |
| Checkpoint set | ltx-2.3-22b-dev, ltx-2.3-22b-distilled, LoRA + temporal/spatial upscalers |
| Primary task design | Text/image-conditioned video synthesis, optional audio generation |
| Attention/latent internals | Not fully specified in published card metadata |
| Training/release orientation | Open-weight foundation + distilled inference checkpoint |

Architecture assessment: LTX-2.3 is a specialist diffusion family for generative video. It is not a chat transformer and should be routed through a diffusion-native runtime path for full quality.
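As a sanity check on the checkpoint size, the 22B parameter count lines up with the ~44 GB footprint listed under Operational Cost when weights are stored at 2 bytes per parameter (BF16/FP16):

```python
# Rough checkpoint-size check: 22B parameters at 2 bytes each (BF16/FP16)
# should land near the ~44 GB core-checkpoint footprint noted below.
params = 22e9          # 22B parameters (dev/distilled checkpoints)
bytes_per_param = 2    # BF16/FP16 storage
footprint_gb = params * bytes_per_param / 1e9
print(round(footprint_gb))  # 44
```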

## Axis 2: Runtime

| Runtime | Viable | Notes |
|---|---|---|
| WASM (browser) | No | 22B checkpoint family is beyond practical browser constraints |
| ONNX/WebGPU | No | No maintained ONNX/WebGPU path in this deployment |
| Native (device) | Conditional | Possible on high-end local GPU setups |
| Edge Worker | No | Worker memory/runtime ceilings are too small |
| Cloud Run (distributed CPU lane) | Yes | Current Aether route for API readiness and compatibility |
| Cloud GPU | Conditional | Best fit for full-quality denoising pipelines |

Primary runtime: the Cloud Run distributed coordinator/layer topology covers routing and API compatibility today; a dedicated diffusion-native execution path remains the quality route.
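The viability table above can be captured as a small runtime guard. This is an illustrative sketch: the runtime names and the distinction between hard and conditional viability are assumptions for this document's routing setup, not LTX APIs.

```python
# Illustrative runtime guard for the Axis 2 table. Runtime names are
# placeholders for this model zoo's lanes, not part of the LTX release.
RUNTIMES = {
    "wasm": {"viable": False},            # beyond browser constraints
    "onnx_webgpu": {"viable": False},     # no maintained path
    "native_gpu": {"viable": "conditional"},  # high-end local GPU only
    "edge_worker": {"viable": False},     # memory/runtime ceilings too small
    "cloud_run_cpu": {"viable": True},    # current compatibility lane
    "cloud_gpu": {"viable": "conditional"},   # full-quality denoising
}

def viable_runtimes(allow_conditional: bool = False) -> list[str]:
    """Return runtimes that can host the 22B checkpoint family."""
    return [
        name for name, spec in RUNTIMES.items()
        if spec["viable"] is True
        or (allow_conditional and spec["viable"] == "conditional")
    ]

print(viable_runtimes())                        # ['cloud_run_cpu']
print(viable_runtimes(allow_conditional=True))  # adds native_gpu, cloud_gpu
```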

## Axis 3: Modality

| Property | Value |
|---|---|
| Input | Text and/or image conditioning (plus optional audio workflows upstream) |
| Output | Video (with optional synchronized audio in upstream LTX workflows) |
| Category | Image-to-video / text-to-video |

## Axis 4: Task Fitness

| Task | Fitness | Notes |
|---|---|---|
| Prompted short-form video generation | Very good | Core capability of the model family |
| Image-conditioned video generation | Very good | First-class upstream task |
| Audio-synchronized AV generation | Good | Supported in upstream LTX stack; runtime integration maturity varies |
| Document OCR / VLM reasoning | Poor | Wrong model class for extraction/reasoning tasks |

Role in the zoo: Primary modern video-generation specialist. Route video requests here instead of overloading vision-language chat stacks.
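The routing rule above can be sketched as a simple dispatcher: generative video tasks go to the LTX lane, extraction/reasoning tasks go elsewhere. The task and lane names here are hypothetical placeholders for this model zoo, not a real API.

```python
# Hedged sketch of the "Role in the zoo" rule: route generative video
# requests to the LTX lane instead of overloading VLM chat stacks.
# Task and lane identifiers are placeholders, not LTX or zoo APIs.
VIDEO_TASKS = {"text-to-video", "image-to-video", "av-generation"}

def route(task: str) -> str:
    if task in VIDEO_TASKS:
        return "ltx-2.3"          # diffusion video specialist lane
    if task in {"ocr", "vlm-reasoning"}:
        return "vlm-chat-stack"   # wrong model class for LTX
    raise ValueError(f"no lane registered for task: {task}")

print(route("image-to-video"))    # ltx-2.3
print(route("ocr"))               # vlm-chat-stack
```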

## Axis 5: Operational Cost

| Property | Value |
|---|---|
| Checkpoint footprint | ~44 GB for the core 22B checkpoints (plus upscalers) |
| Cloud Run topology | 1 coordinator + 4 layer services (current distributed lane) |
| Cloud Run resources | 2 vCPU / 4 GiB per service (current baseline config) |
| Timeout profile | 600s request budget for long diffusion-style operations |
| Idle cost | ~$0/month when min-instances remains 0 |
| Cold start profile | Noticeable on scale-from-zero; acceptable for non-realtime video jobs |
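A back-of-envelope worst-case request cost follows from the table: five services (1 coordinator + 4 layers), each billed for the full 600s budget at the baseline 2 vCPU / 4 GiB shape. The per-second rates below are placeholders for illustration; check current Cloud Run pricing before relying on the numbers.

```python
# Worst-case per-request cost sketch for the current Cloud Run lane.
# VCPU_RATE and MEM_RATE are assumed placeholder rates, not quoted
# Cloud Run pricing; substitute the current published rates.
VCPU_RATE = 0.000024      # assumed $/vCPU-second
MEM_RATE = 0.0000025      # assumed $/GiB-second
SERVICES = 5              # 1 coordinator + 4 layer services
VCPUS, MEM_GIB = 2, 4     # per-service baseline config
BUDGET_S = 600            # request timeout budget

per_service = BUDGET_S * (VCPUS * VCPU_RATE + MEM_GIB * MEM_RATE)
worst_case = per_service * SERVICES
print(f"~${worst_case:.2f} per fully-billed 600s request")
```

With min-instances at 0 this is request-driven spend only, consistent with the ~$0/month idle cost above.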

## Verdict

LTX-2.3 is the right specialist for video synthesis workloads in this model zoo. Keep it on an explicit video/diffusion routing lane, and avoid treating it like a chat model. For best quality, prioritize a dedicated diffusion-native runtime over compatibility inference shims.
