Model Card for Coxcomb
A creative writing model that uses the superb senseable/WestLake-7B-v2 as a base, finetuned on GPT-4 outputs for a diverse variety of prompts. It in no way competes with GPT-4: its quality of writing is lower, and it is primarily meant to be run in offline, local environments. On creative writing benchmarks, it consistently ranks higher than most other models; it scores 72.37, beating goliath-120b, yi chat, and mistral-large. It is designed for single-shot interactions: you ask it to write a story, and it does. It is NOT designed for chat purposes, roleplay, or follow-up questions.
Model Details
Trained with a 40M-parameter LoRA on N8Programs/CreativeGPT for 3 epochs. The model is slightly overfit, which gave much better benchmark results.
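To give a feel for what a LoRA of roughly 40M parameters means on a Mistral-type 7B model, here is a minimal arithmetic sketch. The layer shapes are the standard Mistral-7B ones (hidden size 4096, MLP size 14336, 1024-dimensional key/value projections, 32 layers). The rank and the set of adapted modules are NOT stated in this card, so `rank=16` and "all seven linear projections" are assumptions chosen only for illustration, and `lora_param_count` is a hypothetical helper.

```python
# Sketch: count the trainable parameters a LoRA adds to a Mistral-7B-shaped model.
# A LoRA on a linear layer of shape (d_in, d_out) adds two matrices,
# A: (d_in x r) and B: (r x d_out), i.e. r * (d_in + d_out) parameters.

HIDDEN = 4096         # Mistral-7B hidden size
INTERMEDIATE = 14336  # Mistral-7B MLP size
KV_DIM = 1024         # 8 key/value heads * 128 head dim
NUM_LAYERS = 32

# (d_in, d_out) for each linear projection in one transformer block.
MISTRAL_7B_LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, modules=None, num_layers=NUM_LAYERS):
    """Return the number of LoRA parameters for the given rank and modules.

    `modules` defaults to every linear projection; this choice is an
    assumption, not the configuration actually used to train Coxcomb.
    """
    if modules is None:
        modules = list(MISTRAL_7B_LINEAR_SHAPES)
    per_layer = sum(
        rank * (d_in + d_out)
        for name, (d_in, d_out) in MISTRAL_7B_LINEAR_SHAPES.items()
        if name in modules
    )
    return per_layer * num_layers


# Assumed configuration: rank 16 on all seven projections.
total = lora_param_count(rank=16)
print(f"{total:,} LoRA parameters")  # 41,943,040
```

Under these assumptions the count comes out close to the 40M quoted above, but other combinations of rank and target modules could land in the same range.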
Model Description
- Developed by: N8Programs
- Model type: Mistral
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: senseable/WestLake-7B-v2
Uses
Not trained on NSFW (sexual or violent) content, but will generate it when asked, since it has not been trained with refusals. If you wish to add refusal behavior, further tuning or filtering will be necessary.
Direct Use
GGUFs are available at Coxcomb-GGUF. Should work with transformers (not officially tested).
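A minimal sketch of single-shot use through transformers, in line with the note that this path is not officially tested. It sends one user turn with no chat history, as the model is meant for single requests. The helper names `build_single_shot_messages` and `generate_story`, and the default of 512 new tokens, are illustrative assumptions; the sketch also assumes the tokenizer ships a chat template.

```python
from typing import Dict, List

MODEL_ID = "N8Programs/Coxcomb"


def build_single_shot_messages(prompt: str) -> List[Dict[str, str]]:
    """Wrap one writing request as a single user turn (no history)."""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("prompt must not be empty")
    return [{"role": "user", "content": prompt}]


def generate_story(prompt: str, max_new_tokens: int = 512) -> str:
    """Generate one story for one prompt. Downloads the weights on first call."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Assumes the tokenizer provides a chat template for this model.
    inputs = tokenizer.apply_chat_template(
        build_single_shot_messages(prompt),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_story("Write a short story about a lighthouse keeper.")` would return the generated text as a string. Each call is independent, so follow-up questions are not carried over.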
Bias, Risks, and Limitations
Tends to generate stories with happy, trite endings. Most LLMs do this. It's very hard to get them not to.
Training Details
Trained on a single M3 Max in roughly 12 hours.