---
license: mit
library_name: pytorch-lightning
pipeline_tag: tabular-classification
tags:
- biology
- genomics
datasets:
- Genentech/human-chromhmm-fullstack-data
base_model:
- Genentech/enformer-model
---
# human-chromhmm-fullstack-model

## Model Description
This model is a multi-class classifier trained to predict chromatin state annotations for genomic DNA sequences. It classifies sequences into 16 chromatin states based on the ChromHMM fullstack annotation. It was trained by fine-tuning the Enformer model using the grelu library.
- Architecture: Fine-tuned Enformer (EnformerPretrainedModel)
- Input: Genomic sequences (hg38)
- Output: Probability distribution over 16 chromatin states
- Parameters: 71.5M total (all trainable)
## Chromatin States
Acet, BivProm, DNase, EnhA, EnhWk, GapArtf, HET, PromF, Quies, ReprPC, TSS, Tx, TxEnh, TxEx, TxWk, znf
## Performance
Metrics are computed per chromatin state and averaged across all 16 states.
### Test Set
| Metric | Mean | Std | Min | Max |
|---|---|---|---|---|
| Accuracy | 0.4373 | 0.2162 | 0.2455 | 0.8528 |
| AUROC | 0.8609 | 0.0767 | 0.7652 | 0.9952 |
| Average Precision | 0.4113 | 0.1974 | 0.1362 | 0.8015 |
### Validation Set
| Metric | Mean | Std | Min | Max |
|---|---|---|---|---|
| Accuracy | 0.4487 | 0.2098 | 0.2164 | 0.8696 |
| AUROC | 0.8654 | 0.0763 | 0.7594 | 0.9950 |
| Average Precision | 0.4155 | 0.1848 | 0.1241 | 0.7812 |
### Per-class Test Metrics
| State | Accuracy | AUROC | AvgPrec |
|---|---|---|---|
| Acet | 0.2939 | 0.7973 | 0.2091 |
| BivProm | 0.5431 | 0.9373 | 0.3575 |
| DNase | 0.8528 | 0.9905 | 0.7527 |
| EnhA | 0.2950 | 0.8145 | 0.3368 |
| EnhWk | 0.2683 | 0.8144 | 0.2947 |
| GapArtf | 0.7988 | 0.9517 | 0.7029 |
| HET | 0.2455 | 0.8236 | 0.4982 |
| PromF | 0.5940 | 0.9557 | 0.6369 |
| Quies | 0.3662 | 0.8512 | 0.3610 |
| ReprPC | 0.2874 | 0.7652 | 0.2522 |
| TSS | 0.8302 | 0.9952 | 0.8015 |
| Tx | 0.2590 | 0.8072 | 0.3197 |
| TxEnh | 0.2694 | 0.8252 | 0.2770 |
| TxEx | 0.5336 | 0.8821 | 0.3563 |
| TxWk | 0.2510 | 0.7781 | 0.2880 |
| znf | 0.3079 | 0.7851 | 0.1362 |
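The summary statistics above can be reproduced from this per-class table. For example, averaging the 16 per-class test accuracies recovers the reported mean of 0.4373 (the reported Std matches the population standard deviation):

```python
import statistics

# Per-class test accuracies, in the order of the per-class table above
acc = [0.2939, 0.5431, 0.8528, 0.2950, 0.2683, 0.7988, 0.2455, 0.5940,
       0.3662, 0.2874, 0.8302, 0.2590, 0.2694, 0.5336, 0.2510, 0.3079]

mean_acc = statistics.mean(acc)    # macro average over the 16 states
std_acc = statistics.pstdev(acc)   # population standard deviation
print(round(mean_acc, 4), round(std_acc, 4))  # 0.4373 0.2162
print(min(acc), max(acc))          # 0.2455 0.8528
```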
## Training Details
| Parameter | Value |
|---|---|
| Task | Multiclass classification |
| Loss | Binary Cross-Entropy (with class weights) |
| Optimizer | Adam |
| Learning rate | 0.0001 |
| Batch size | 512 |
| Max epochs | 10 |
| Devices | 4 |
| n_transformers | 1 |
| crop_len | 0 |
| grelu version | 1.0.4.post1.dev39 |
## Repository Content
- `model.ckpt`: The trained model weights and hyperparameters (PyTorch Lightning checkpoint).
- `2_train.ipynb`: Jupyter notebook containing the training logic, architecture definition, and evaluation loops.
- `output.log`: Training logs.
## How to use
To load this model for inference or fine-tuning, use the grelu interface:
```python
from grelu.lightning import LightningModel
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub
ckpt_path = hf_hub_download(
    repo_id="Genentech/human-chromhmm-fullstack-model",
    filename="model.ckpt",
)

# Load the Lightning checkpoint and switch to inference mode
model = LightningModel.load_from_checkpoint(ckpt_path, weights_only=False)
model.eval()
```
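Model outputs can then be mapped back to state names. The sketch below is illustrative and makes two assumptions not confirmed by this card: that the model emits unnormalized logits over the 16 classes, and that the class order matches the list under Chromatin States (verify the order against the training data before relying on it). The `top_state` helper is not part of grelu:

```python
import torch

# Assumed class order: the 16 states as listed under "Chromatin States"
STATES = ["Acet", "BivProm", "DNase", "EnhA", "EnhWk", "GapArtf", "HET",
          "PromF", "Quies", "ReprPC", "TSS", "Tx", "TxEnh", "TxEx", "TxWk", "znf"]

def top_state(logits: torch.Tensor) -> str:
    """Map a length-16 logit vector to its most likely chromatin state."""
    probs = torch.softmax(logits, dim=-1)  # probability distribution over states
    return STATES[int(probs.argmax())]

# Example with a dummy logit vector (in practice, logits come from the model)
dummy = torch.zeros(16)
dummy[10] = 5.0  # favor index 10 -> "TSS"
print(top_state(dummy))  # TSS
```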