---
license: mit
library_name: pytorch-lightning
pipeline_tag: tabular-classification
tags:
- biology
- genomics
datasets:
- Genentech/human-chromhmm-fullstack-data
base_model:
- Genentech/enformer-model
---
# human-chromhmm-fullstack-model

## Model Description
This model is a multi-class classifier trained to predict chromatin state annotations for genomic DNA sequences. It classifies sequences into 16 chromatin states based on the ChromHMM fullstack annotation. It was trained by fine-tuning the Enformer model using the grelu library.
- Architecture: Fine-tuned Enformer (EnformerPretrainedModel)
- Input: Genomic sequences (hg38)
- Output: Probability distribution over 16 chromatin states
- Parameters: 71.5M total (all trainable)
## Chromatin States
Acet, BivProm, DNase, EnhA, EnhWk, GapArtf, HET, PromF, Quies, ReprPC, TSS, Tx, TxEnh, TxEx, TxWk, znf
## Performance
Metrics are computed per chromatin state and averaged across all 16 states.
### Test Set
| Metric | Mean | Std | Min | Max |
|---|---|---|---|---|
| Accuracy | 0.4373 | 0.2162 | 0.2455 | 0.8528 |
| AUROC | 0.8609 | 0.0767 | 0.7652 | 0.9952 |
| Average Precision | 0.4113 | 0.1974 | 0.1362 | 0.8015 |
### Validation Set
| Metric | Mean | Std | Min | Max |
|---|---|---|---|---|
| Accuracy | 0.4487 | 0.2098 | 0.2164 | 0.8696 |
| AUROC | 0.8654 | 0.0763 | 0.7594 | 0.9950 |
| Average Precision | 0.4155 | 0.1848 | 0.1241 | 0.7812 |
### Per-class Test Metrics
| State | Accuracy | AUROC | AvgPrec |
|---|---|---|---|
| Acet | 0.2939 | 0.7973 | 0.2091 |
| BivProm | 0.5431 | 0.9373 | 0.3575 |
| DNase | 0.8528 | 0.9905 | 0.7527 |
| EnhA | 0.2950 | 0.8145 | 0.3368 |
| EnhWk | 0.2683 | 0.8144 | 0.2947 |
| GapArtf | 0.7988 | 0.9517 | 0.7029 |
| HET | 0.2455 | 0.8236 | 0.4982 |
| PromF | 0.5940 | 0.9557 | 0.6369 |
| Quies | 0.3662 | 0.8512 | 0.3610 |
| ReprPC | 0.2874 | 0.7652 | 0.2522 |
| TSS | 0.8302 | 0.9952 | 0.8015 |
| Tx | 0.2590 | 0.8072 | 0.3197 |
| TxEnh | 0.2694 | 0.8252 | 0.2770 |
| TxEx | 0.5336 | 0.8821 | 0.3563 |
| TxWk | 0.2510 | 0.7781 | 0.2880 |
| znf | 0.3079 | 0.7851 | 0.1362 |
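The summary statistics above can be reproduced from this per-class table. For example, averaging the 16 per-class test accuracies recovers the reported mean of 0.4373 (the reported Std matches the population standard deviation):

```python
import statistics

# Per-class test accuracies, in the order of the per-class table above
acc = [0.2939, 0.5431, 0.8528, 0.2950, 0.2683, 0.7988, 0.2455, 0.5940,
       0.3662, 0.2874, 0.8302, 0.2590, 0.2694, 0.5336, 0.2510, 0.3079]

mean_acc = statistics.mean(acc)    # macro average over the 16 states
std_acc = statistics.pstdev(acc)   # population standard deviation
print(round(mean_acc, 4), round(std_acc, 4))  # 0.4373 0.2162
print(min(acc), max(acc))          # 0.2455 0.8528
```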
## Training Details
| Parameter | Value |
|---|---|
| Task | Multiclass classification |
| Loss | Binary Cross-Entropy (with class weights) |
| Optimizer | Adam |
| Learning rate | 0.0001 |
| Batch size | 512 |
| Max epochs | 10 |
| Devices | 4 |
| n_transformers | 1 |
| crop_len | 0 |
| grelu version | 1.0.4.post1.dev39 |
## Repository Content
- `model.ckpt`: The trained model weights and hyperparameters (PyTorch Lightning checkpoint).
- `2_train.ipynb`: Jupyter notebook containing the training logic, architecture definition, and evaluation loops.
- `output.log`: Training logs.
## How to use
To load this model for inference or fine-tuning, use the grelu interface:
```python
from grelu.lightning import LightningModel
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hugging Face Hub
ckpt_path = hf_hub_download(
    repo_id="Genentech/human-chromhmm-fullstack-model",
    filename="model.ckpt",
)

# Load the Lightning checkpoint and switch to inference mode
model = LightningModel.load_from_checkpoint(ckpt_path, weights_only=False)
model.eval()
```
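Model outputs can then be mapped back to state names. The sketch below is illustrative and makes two assumptions not confirmed by this card: that the model emits unnormalized logits over the 16 classes, and that the class order matches the list under Chromatin States (verify the order against the training data before relying on it). The `top_state` helper is not part of grelu:

```python
import torch

# Assumed class order: the 16 states as listed under "Chromatin States"
STATES = ["Acet", "BivProm", "DNase", "EnhA", "EnhWk", "GapArtf", "HET",
          "PromF", "Quies", "ReprPC", "TSS", "Tx", "TxEnh", "TxEx", "TxWk", "znf"]

def top_state(logits: torch.Tensor) -> str:
    """Map a length-16 logit vector to its most likely chromatin state."""
    probs = torch.softmax(logits, dim=-1)  # probability distribution over states
    return STATES[int(probs.argmax())]

# Example with a dummy logit vector (in practice, logits come from the model)
dummy = torch.zeros(16)
dummy[10] = 5.0  # favor index 10 -> "TSS"
print(top_state(dummy))  # TSS
```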