Part of the Nocturne collection (balanced-size models with good quality).
How to use DoppelReflEx/MiniusLight-24B-v2.1 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="DoppelReflEx/MiniusLight-24B-v2.1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load the model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("DoppelReflEx/MiniusLight-24B-v2.1")
model = AutoModelForCausalLM.from_pretrained("DoppelReflEx/MiniusLight-24B-v2.1")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

How to use DoppelReflEx/MiniusLight-24B-v2.1 with vLLM:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "DoppelReflEx/MiniusLight-24B-v2.1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "DoppelReflEx/MiniusLight-24B-v2.1",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
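Since the server speaks the OpenAI chat-completions protocol, the same request can be made from plain Python. A minimal standard-library sketch, assuming the vLLM server above is running on localhost:8000 (the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM):

```python
import json
import urllib.request

MODEL = "DoppelReflEx/MiniusLight-24B-v2.1"

def build_chat_request(messages, model=MODEL):
    """Build the JSON body expected by /v1/chat/completions."""
    return {"model": model, "messages": messages}

def chat(messages, base_url="http://localhost:8000/v1", model=MODEL):
    """POST the request and return the assistant's reply text."""
    data = json.dumps(build_chat_request(messages, model)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The assistant's reply lives in the first choice's message content.
    return body["choices"][0]["message"]["content"]

# With the server above running:
# print(chat([{"role": "user", "content": "What is the capital of France?"}]))
```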
How to use DoppelReflEx/MiniusLight-24B-v2.1 with SGLang:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "DoppelReflEx/MiniusLight-24B-v2.1" \
  --host 0.0.0.0 \
  --port 30000

# Or start the server with Docker instead:
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "DoppelReflEx/MiniusLight-24B-v2.1" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "DoppelReflEx/MiniusLight-24B-v2.1",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

How to use DoppelReflEx/MiniusLight-24B-v2.1 with Docker Model Runner:
```shell
docker model run hf.co/DoppelReflEx/MiniusLight-24B-v2.1
```
A merge of TroyDoesAI/BlackSheep-24B (one of the most uncensored 24B models) with the MiniusLight-24B recipe: TheDrummer/Cydonia-24B-v2 and PocketDoc/Dans-PersonalityEngine-V1.2.0-24b.

Another version of v2, but far better than it: vivid writing styles, and it talks back to me; sometimes it is hard to control (though that may just be my character card). The best model of the series, for me. :)

PS: Highest NatInt score for a 24B model on the UGI leaderboard (1 May 2025).
```yaml
models:
  - model: TroyDoesAI/BlackSheep-24B
    parameters:
      density: 0.9
      weight: 1
  - model: TheDrummer/Cydonia-24B-v2
    parameters:
      density: 0.6
      weight: 0.8
  - model: PocketDoc/Dans-PersonalityEngine-V1.2.0-24b
    parameters:
      density: 0.8
      weight: 0.6
merge_method: dare_ties
base_model: TroyDoesAI/BlackSheep-24B
tokenizer_source: base
parameters:
  rescale: true
dtype: bfloat16
```
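For intuition on what `dare_ties` does with the `density` and `weight` values above: DARE randomly drops each model's parameter deltas (relative to the base model), keeping each with probability `density` and rescaling survivors by `1/density`, and TIES then resolves sign conflicts between models before summing the weighted deltas. A toy, purely illustrative sketch of those two steps on plain Python lists; mergekit's actual implementation operates on full tensors and differs in detail:

```python
import random

def dare_prune(delta, density, seed=0):
    """Keep each delta entry with probability `density`, rescaling
    survivors by 1/density so the expected contribution is preserved."""
    rng = random.Random(seed)
    return [d / density if rng.random() < density else 0.0 for d in delta]

def sign_elect_merge(pruned_deltas, weights):
    """TIES-style merge: per parameter, elect the sign of the weighted
    sum, then sum only the weighted contributions agreeing with it."""
    merged = []
    for column in zip(*pruned_deltas):  # one column per parameter
        total = sum(w * d for w, d in zip(weights, column))
        elected = 1.0 if total >= 0 else -1.0
        merged.append(sum(w * d for w, d in zip(weights, column)
                          if d * elected > 0))
    return merged
```

With `rescale: true`, mergekit additionally renormalizes the combined result; the sketch above only covers the pruning and sign-election ideas.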