All HF Hub posts

HannesVonEssen posted an update 3 days ago
📣 I made a visualizer for Hugging Face models: https://hfviewer.com

✨ Simply paste a Hugging Face URL to get an interactive visualization of the architecture!

🔗 The recent Qwen3.6-27B model as an example: https://hfviewer.com/Qwen/Qwen3.6-27B

Feel free to try it out and give me feedback on how it can be improved! ❤️
danielhanchen posted an update 1 day ago
We made a guide on how to run open LLMs in Claude Code, Codex and OpenClaw.

Use Gemma 4 and Qwen3.6 GGUFs for local agentic coding on 24GB of RAM.

Run with self-healing tool calls, code execution, and web search via the Unsloth API endpoint and llama.cpp.

Guide: https://unsloth.ai/docs/basics/api
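
As a rough, unofficial sketch of the pattern the guide builds on (the port, model name, and prompt below are placeholders): a local llama.cpp server exposes an OpenAI-compatible endpoint that a coding agent, or any other client, can point at.

```python
# Minimal sketch, not taken from the guide: talk to a local llama.cpp server
# (llama-server) through its OpenAI-compatible /v1 endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")  # key unused locally

response = client.chat.completions.create(
    model="local-gguf",  # a single-model llama-server typically ignores this field
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(response.choices[0].message.content)
```
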
DedeProGames posted an update 1 day ago
GRaPE 2 Pro is now available.

SL-AI/GRaPE-2-Pro

This is the flagship model of the GRaPE 2 family and the largest model I have trained to date, sitting at 27B parameters. It is built on Qwen3.5-27B and trained on a proprietary dataset, with roughly half of post-training focused on code and the rest split between STEAM subjects and structured logical reasoning. It punches seriously above its weight class.

GRaPE 2 Pro supports multimodal input (image + text) and features 6 thinking modes via the tag. This gives you real control over how hard the model thinks, from skipping the reasoning phase entirely with minimal, all the way up to xtra-Hi for deep, extended thought on hard problems. For most agentic use, auto or low is the move to keep things snappy.

It also runs on consumer hardware. You can get it going with as little as 12GB of VRAM on a quantized build.
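
As a rough illustration of a low-VRAM setup (not an official recipe, and the multimodal checkpoint may require a different model class or quantization than shown; check the model card), a 4-bit load via transformers and bitsandbytes is one common route:

```python
# Hypothetical sketch of a 4-bit quantized load; the 12GB figure above likely
# assumes a similar or more aggressive quantization. A vision-language loader
# may be needed instead of AutoModelForCausalLM for image + text input.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("SL-AI/GRaPE-2-Pro")
model = AutoModelForCausalLM.from_pretrained(
    "SL-AI/GRaPE-2-Pro",
    quantization_config=quant_config,
    device_map="auto",
)
```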

If you want to try it out and give feedback, that would be really appreciated. Email us at contact@skinnertopia.com
Crownelius posted an update about 13 hours ago
Day 4-6 [05/05/2026]
Howdy,

Is anybody else willing to put a second mortgage on their house, just to spend 40k USD in compute credits? Just me? k...

I got dreams, man. The datasets I could build with 40k would be insane.
Somebody called me a genius the other day; they'd be shocked to find out that I would put my house on the line for 30 days of RunPod usage.

What would you do with it?
I would turn arXiv into a dataset. Turn each arXiv paper into a QnA.
Or... maybe if I got 40k USD in credits I'd end up like those 16 lost scientists.

Food for thought.
Anyways, I think I'm going to make a post once a week.
In the meantime you can find me building small LLMs on Discord here:
https://discord.gg/4DdwS9D8x9
salma-remyx posted an update about 1 hour ago
VQASynth is the open source implementation of the SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities (2401.12168) paper, putting together the data synthesis pipeline behind remyxai/SpaceQwen2.5-VL-3B-Instruct, remyxai/SpaceThinker-Qwen2.5VL-3B, and several other spatial reasoning models we've shared here on HF.

From early development through production, different categories of evidence become available to guide what to try next. The strongest decisions combine evidence across categories rather than relying on any one.

Stage 1: Development history
Commit history holds the moments where things changed. For VQASynth, that's how scenes get parsed, how captions get generated, how spatial relations get encoded. Even before a model is in production, those milestones are a strong signal for what methods are semantically relevant to where the system is now.

Stage 2: Observational outcomes
Once a model is serving, the same commit history delineates changes against real-world results. That opens up quasi-experiments. You get causal evidence about which changes drove which outcomes, and inference on questions you haven't directly tested.
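
A minimal illustration of that idea (file name, column names, and the date below are hypothetical): log a timestamp and an outcome per request, then compare outcomes on either side of the deploy commit. This is the crudest form of an interrupted time series; a real quasi-experimental design would also control for trends and seasonality rather than taking a raw difference.

```python
# Illustrative sketch only: naive before/after comparison around a deploy time.
import pandas as pd

# Hypothetical log: one row per request with a timestamp and a 0/1 outcome
# (e.g. whether the generated answer was accepted).
df = pd.read_csv("metrics.csv", parse_dates=["timestamp"])
deploy_time = pd.Timestamp("2026-04-01")  # timestamp of the commit under study

before = df.loc[df["timestamp"] < deploy_time, "outcome"]
after = df.loc[df["timestamp"] >= deploy_time, "outcome"]
print(f"before={before.mean():.3f} after={after.mean():.3f} "
      f"diff={after.mean() - before.mean():+.3f}")
```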

Stage 3: Controlled experiments
When teams start running interventions, those outcomes tighten the estimates further. This is the regime most people associate with rigor, but it's expensive and gated by traffic.

Stage 4: Counterfactual perturbations
When A/B testing becomes the operational bottleneck, instrumenting decision points in the production system lets you probe what would have happened under alternative choices. Shadow mode first, live traffic once audits pass.

Experimentation maturity is a journey, and every stage offers something to learn from.
More on these ideas: https://docs.remyx.ai/concepts/maturity-progression
ajibawa-2023 posted an update about 8 hours ago
Stitched-Reasoning-Trajectories-7M

Dataset: ajibawa-2023/Stitched-Reasoning-Trajectories-7M
Stitched-Reasoning-Trajectories-7M is a massive-scale, synthetic multi-hop reasoning dataset. It was built by algorithmically "stitching" together discrete reasoning traces from the original glaiveai/reasoning-v1-20m dataset into continuous, coherent, and logically structured multi-agent trajectories.

By extracting internal sub-questions from <think> blocks and mapping high-information keyword overlaps, this dataset transforms single-turn Q&A pairs into deep, multi-step research plans. To ensure high quality and eliminate "topic drift," every trajectory has been verified using a dense semantic embedding model (BAAI/bge-large-en-v1.5).
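
The exact pipeline and threshold aren't spelled out here, but a minimal sketch of that kind of coherence check with bge-large-en-v1.5 (the hop texts and cutoff below are made up) looks like this:

```python
# Sketch of a semantic-coherence filter between two stitched reasoning hops.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

hop_a = "What is the boiling point of water at sea level?"
hop_b = "How does atmospheric pressure change the boiling point of water?"

emb_a, emb_b = model.encode([hop_a, hop_b], normalize_embeddings=True)
similarity = util.cos_sim(emb_a, emb_b).item()

THRESHOLD = 0.6  # hypothetical cutoff against topic drift
print(f"similarity={similarity:.3f}, keep={similarity >= THRESHOLD}")
```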

The resulting dataset consists of 709 .jsonl files containing over 7.2 million entirely deduplicated, highly coherent reasoning chains.
ArtelTaleb posted an update about 9 hours ago

✈️ World Flight Arcade - Can you land in 60 seconds?

I just dropped a new browser game built entirely with Three.js: World Flight Arcade

The concept is brutally simple:
- πŸ• 60 seconds of flight above a neon wireframe city
- ✈️ One single attempt to land on the runway
- πŸ’€ No second chances. No respawn. Just you, the controls, and the clock.

The camera system is fully dynamic - it stays locked behind the plane within a Β±45Β° pitch/yaw envelope, giving you that cinematic flight feel while keeping full spatial awareness.

Can you nail the landing on your first try?

👉 Play here: ArtelTaleb/world-flight-arcade

Built by Artel3D - handcrafted in Three.js, zero dependencies, runs directly in your browser.

Drop your score in the comments 👇

#gamedev #threejs #browserGame #webgl #artel3d #indiegame

Aurelien-Morgan posted an update 1 day ago
@retrain-pipelines v0.2.0 is out!
I'm at my booth at Station F for GOSIM Paris 2026 today & tomorrow.
Come meet me for a live in-person demo and a chat!
kanaria007 posted an update 1 day ago
✅ Article highlight: *Verifier Packs and Conformance Harness* (art-60-227, v0.1)

TL;DR:
This article argues that “how we verify the spec” should itself be a governed artifact path.

A serious system should not stop at “we ran the tests and passed.” It should be able to say exactly **which verifier pack** was used, under **which harness manifest**, against **which vector bundle**, with **which reason-code linkage**, producing **which normalized run verdicts**, **which replay result**, and **which profile-level conformance report lineage**.

Read:
kanaria007/agi-structural-intelligence-protocols

Why it matters:
• turns conformance from hidden CI behavior into portable, auditable artifacts
• makes verifier choice, harness policy, vector completeness, and replay status explicit
• prevents “green badge” claims that cannot later be reconstructed
• keeps degraded, partial, and historically superseded runs visible instead of laundering them away

What’s inside:
• a clean distinction between *specification*, *verifier pack*, *harness manifest*, *conformance run*, *replay verification*, and *profile conformance report*
• a practical ladder: VH1 / VH2 / VH3
• core portable artifacts like si/verifier-pack/v1, si/harness-manifest/v1, si/test-vector-bundle/v1, si/conformance-run-report/v1, and si/replay-verification-record/v1
• hard gates for explicit pack, explicit harness, vector completeness, replay-backed claims, and report support
• the rule that a profile conformance report must point to supporting runs rather than float free as a status badge

Key idea:
Do not say:

*“the tests passed.”*

Say:

*“this scope was checked by this verifier pack, under this harness manifest, against this declared vector bundle, with this linkage, producing these run verdicts and this replay-backed report lineage.”*
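
To make that linkage rule concrete, here is a purely hypothetical sketch (the article defines its own si/* artifact formats; these dataclasses are not them):

```python
# Hypothetical illustration of the linkage rule: a profile conformance report
# must reference concrete runs, each naming its verifier pack, harness
# manifest, vector bundle, verdict, and replay status.
from dataclasses import dataclass, field

@dataclass
class ConformanceRun:
    verifier_pack: str      # which verifier pack produced the verdict
    harness_manifest: str   # which harness policy governed the run
    vector_bundle: str      # which declared test vectors were exercised
    verdict: str            # normalized run verdict ("pass", "degraded", ...)
    replay_verified: bool   # whether a replay-verification record backs it

@dataclass
class ProfileConformanceReport:
    profile: str
    runs: list[ConformanceRun] = field(default_factory=list)

    def is_supported(self) -> bool:
        # No free-floating status badge: at least one supporting run, and
        # every claimed pass must be replay-backed.
        return bool(self.runs) and all(
            r.replay_verified for r in self.runs if r.verdict == "pass"
        )
```
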
MikeDoes posted an update 2 days ago
AI4Privacy datasets are being used to decide what data should never leave the device.

A new paper on privacy-preserving cloud computing uses the AI4Privacy PII-Masking-65K dataset to train models that classify text as private or public before it's ever sent to the cloud.

This is a subtle but important shift.

Instead of encrypting everything or trusting the cloud by default, the authors ask a simpler question:

Can we detect sensitive text early enough to keep it local?

Using DistilBERT, trained partly on AI4Privacy PII data, the system learns to:

- route private text to local processing
- send non-sensitive text to the cloud
- train collaboratively using federated learning, without sharing raw data

The result:

- 99.9% accuracy in private vs. public text detection
- Near-centralized performance in downstream tasks like SMS spam detection
- Privacy protection enforced by design, not policy

What stands out here is not just the model performance, but the architectural idea:
privacy as a routing decision, backed by large-scale PII annotations.
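
A minimal sketch of that routing decision (the checkpoint name is hypothetical; the paper's classifier is a DistilBERT fine-tuned on AI4Privacy PII data):

```python
# Privacy as a routing decision: classify first, then decide where the text goes.
from transformers import pipeline

# Hypothetical checkpoint labelling text as PRIVATE or PUBLIC.
classifier = pipeline("text-classification", model="your-org/distilbert-private-vs-public")

def route(text: str) -> str:
    label = classifier(text)[0]["label"]
    # Private text never leaves the device; everything else may go to the cloud.
    return "local" if label == "PRIVATE" else "cloud"

print(route("My IBAN is DE89 3704 0044 0532 0130 00, please check the transfer."))
```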

This work reinforces a pattern we keep seeing: scalable privacy systems don’t start with encryption, they start with good PII data.

📄 Full Paper here: https://dl.acm.org/doi/full/10.1145/3773276.3774872

#Ai4Privacy #DataPrivacy #PIIMasking #FederatedLearning #PrivacyEngineering #OpenSourceAI #ResponsibleAI #AcademicResearch #LLMSecurity