Jonna Matthiesen

JonnaMat

embedl

Recent Activity

updated a collection about 8 hours ago

updated a model 8 days ago

embedl/Qwen3.5-9B-FlashHead

Organizations

updated a collection about 8 hours ago

Cosmos-Reason2

nvidia/Cosmos-Reason2 multi-modal reasoning models optimized by Embedl. • 11 items • Updated about 8 hours ago • 4

updated 4 models 8 days ago

embedl/Qwen3.5-9B-FlashHead

Image-Text-to-Text • 10B • Updated 2 days ago • 443

embedl/Qwen3.5-4B-FlashHead

Image-Text-to-Text • 5B • Updated 2 days ago • 455

embedl/Qwen3.5-0.8B-FlashHead

Image-Text-to-Text • 0.9B • Updated 2 days ago • 428 • 1

embedl/Qwen3.5-2B-FlashHead

Image-Text-to-Text • 2B • Updated 2 days ago • 470

posted an update 8 days ago

⚡ Qwen3.5, up to 1.4× faster. Same quality. Less latency.

We applied FlashHead to the Qwen3.5 family. FlashHead is a novel drop-in replacement for the LM head that measurably lowers latency on edge hardware. Benchmarks and models are linked below.

📊 embedl/Edge-Inference-Benchmarks

🤗 https://huggingface.co/collections/embedl/qwen35

updated 6 collections 8 days ago

NVIDIA Jetson AGX Orin

Models optimized and benchmarked for NVIDIA Jetson AGX Orin. Memory-efficient and latency-optimized variants designed for real-time edge inference. • 8 items • Updated 8 days ago • 3

NVIDIA Jetson AGX Thor

Models validated and performance-optimized for NVIDIA Jetson AGX Thor. Tailored for high-performance edge AI workloads. • 7 items • Updated 8 days ago • 1

FlashHead

Efficient Drop-In Replacement for the Classification Head in Language Model Inference. https://github.com/embedl/flash-head • 24 items • Updated 8 days ago • 2

EdgeN

Quantization strategy where most weights are converted to INT4, activations remain in FP16, and sensitive layers are preserved in FP16. • 4 items • Updated 8 days ago • 1
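The mixed-precision idea in the EdgeN description can be illustrated with a minimal sketch. This is not Embedl's implementation: the function names, the symmetric per-tensor scaling, and the way sensitive layers are selected (an explicit set of names) are all assumptions made for illustration. The source states only that most weights go to INT4 while activations and sensitive layers stay in FP16.

```python
import struct
from typing import Dict, List, Set, Tuple


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT4 quantization (assumed scheme).

    Maps each weight to an integer in [-8, 7] and returns the scale
    needed to map the integers back to floats.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int4(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from INT4 values."""
    return [v * scale for v in q]


def quantize_model(
    layers: Dict[str, List[float]], sensitive: Set[str]
) -> Dict[str, dict]:
    """Quantize every layer to INT4 except those named in `sensitive`.

    `sensitive` is a hypothetical, hand-picked set of layer names; how
    sensitive layers are actually chosen is not described in the source.
    """
    packed = {}
    for name, weights in layers.items():
        if name in sensitive:
            # Sensitive layers are kept in FP16.
            packed[name] = {"dtype": "fp16", "weights": [to_fp16(w) for w in weights]}
        else:
            q, scale = quantize_int4(weights)
            packed[name] = {"dtype": "int4", "weights": q, "scale": scale}
    return packed
```

Calling `quantize_model({"mlp": [...], "lm_head": [...]}, {"lm_head"})` would leave the hypothetical `lm_head` layer in FP16 and store `mlp` as INT4 integers plus one scale. Real schemes typically use per-group or per-channel scales rather than a single per-tensor scale.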

Qwen3.5

Qwen/Qwen3.5 variants optimized by Embedl. • 6 items • Updated 8 days ago • 1

NVIDIA Jetson Orin Nano

Ultra-efficient model variants optimized for Jetson Orin Nano. Designed for constrained edge environments requiring low memory footprint. • 5 items • Updated 8 days ago • 4

updated a dataset 8 days ago

embedl/documentation-images

Viewer • Updated about 7 hours ago • 12 • 2.57k

published 2 models 8 days ago

embedl/Qwen3.5-27B-FlashHead

Image-Text-to-Text • 28B • Updated 2 days ago • 286

embedl/Qwen3.5-9B-FlashHead

Image-Text-to-Text • 10B • Updated 2 days ago • 443