FinBERT-Multilingual

A multilingual extension of the FinBERT paradigm: domain-adapted transformer for financial sentiment classification across six languages (EN, ZH, JA, DE, FR, ES).

While the original FinBERT demonstrated the effectiveness of domain-specific pre-training for English financial NLP, this model extends that approach to a multilingual setting using XLM-RoBERTa-base as the backbone, enabling cross-lingual financial sentiment analysis without language-specific models.

Model Architecture

  • Base model: xlm-roberta-base (278M parameters)
  • Task: 3-class sequence classification (Negative / Neutral / Positive)
  • Domain adaptation: Task-Adaptive Pre-Training (TAPT) via Masked Language Modeling on 35K+ financial texts
  • Languages: English, Chinese, Japanese, German, French, Spanish

Training Pipeline

Stage 1: Task-Adaptive Pre-Training (TAPT)

Following Gururangan et al. (2020), we perform continued MLM pre-training on the unlabeled financial corpus to adapt the model's representations to the financial domain. This stage exposes the model to domain-specific vocabulary and discourse patterns across all six target languages using approximately 35,000 financial text samples.
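The MLM corruption used during TAPT can be sketched without the full Trainer stack. Below is a minimal sketch of the standard 80/10/10 masking scheme (the same behavior `transformers`' `DataCollatorForLanguageModeling` applies); the token IDs and probabilities passed in are illustrative, not taken from this model's actual configuration:

```python
import torch

def mlm_mask(input_ids, mask_token_id, vocab_size, special_ids,
             mlm_prob=0.15, seed=0):
    """Standard MLM corruption: select `mlm_prob` of non-special tokens;
    of those, 80% -> mask token, 10% -> random token, 10% -> unchanged.
    Labels are -100 (ignored by the loss) everywhere else."""
    g = torch.Generator().manual_seed(seed)
    input_ids = input_ids.clone()
    labels = input_ids.clone()

    prob = torch.full(labels.shape, mlm_prob)
    special = torch.zeros_like(labels, dtype=torch.bool)
    for sid in special_ids:
        special |= labels == sid
    prob.masked_fill_(special, 0.0)          # never mask special tokens

    selected = torch.bernoulli(prob, generator=g).bool()
    labels[~selected] = -100                 # loss only on selected positions

    # 80% of selected positions -> mask token
    masked = torch.bernoulli(torch.full(labels.shape, 0.8), generator=g).bool() & selected
    input_ids[masked] = mask_token_id

    # half of the remainder (10% overall) -> random token; rest unchanged
    random_tok = (torch.bernoulli(torch.full(labels.shape, 0.5), generator=g).bool()
                  & selected & ~masked)
    input_ids[random_tok] = torch.randint(vocab_size, labels.shape, generator=g)[random_tok]
    return input_ids, labels
```

The corrupted `input_ids` feed the encoder while `labels` supply the MLM targets, so the continued pre-training loss is computed only at the corrupted positions.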

Stage 2: Supervised Fine-Tuning

The domain-adapted model is then fine-tuned on the labeled sentiment classification task.

Hyperparameters:

| Parameter            | Value                   |
|----------------------|-------------------------|
| Learning rate        | 2e-5                    |
| LR scheduler         | Cosine annealing        |
| Label smoothing      | 0.1                     |
| Checkpoint selection | SWA (top-3 checkpoints) |
| Base model           | xlm-roberta-base        |
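A minimal sketch of the fine-tuning objective under these hyperparameters, using a stand-in linear head in place of xlm-roberta-base so it runs standalone; the batch contents and step count are illustrative:

```python
import torch

# Stand-in for the classification head; real training updates the full
# xlm-roberta-base encoder (hidden size 768) plus this head.
model = torch.nn.Linear(768, 3)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

num_steps = 1000  # illustrative schedule length
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_steps)
loss_fn = torch.nn.CrossEntropyLoss(label_smoothing=0.1)

features = torch.randn(8, 768)       # stand-in pooled sentence embeddings
targets = torch.randint(0, 3, (8,))  # negative / neutral / positive
for _ in range(3):                   # a few illustrative steps
    optimizer.zero_grad()
    loss = loss_fn(model(features), targets)
    loss.backward()
    optimizer.step()
    scheduler.step()                 # cosine-annealed learning rate
```

The same three settings map directly onto `transformers.TrainingArguments` (`learning_rate`, `lr_scheduler_type="cosine"`, `label_smoothing_factor`) when training via the Trainer API.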

Stochastic Weight Averaging (SWA): Rather than selecting a single best checkpoint, we average the weights of the top-3 performing checkpoints. This produces a flatter loss minimum and more robust generalization, particularly beneficial for multilingual settings where overfitting to dominant languages is a risk.
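The averaging step itself is an element-wise mean over parameter tensors; a minimal sketch, assuming the top-3 checkpoints' state dicts are already loaded:

```python
import torch

def average_checkpoints(state_dicts):
    """SWA-style parameter averaging: element-wise mean of the given
    checkpoints' weights (top-3 in this model's pipeline)."""
    avg = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        avg[key] = stacked.mean(dim=0)
    return avg
```

The averaged dict is then loaded back with `model.load_state_dict(avg)` and saved as the final checkpoint.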

Label smoothing (0.1): Prevents overconfident predictions and improves calibration, which is important for financial applications where prediction confidence informs downstream decisions.
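Concretely, with smoothing 0.1 and three classes, the training target becomes a softened distribution rather than a one-hot vector. The sketch below builds that smoothed target and checks it against PyTorch's built-in `label_smoothing` option (the logits are illustrative):

```python
import torch
import torch.nn.functional as F

eps, num_classes, true_class = 0.1, 3, 2

# Smoothed target: (1 - eps) on the true class plus eps spread uniformly.
smoothed = torch.full((num_classes,), eps / num_classes)
smoothed[true_class] += 1.0 - eps    # -> [0.0333, 0.0333, 0.9333]

# Equivalent to the built-in option on F.cross_entropy:
logits = torch.tensor([[0.2, -0.5, 1.3]])
built_in = F.cross_entropy(logits, torch.tensor([true_class]), label_smoothing=eps)
manual = -(smoothed * logits.log_softmax(dim=-1)).sum()
```

Because the target never reaches 1.0 on any class, the model is penalized for driving probabilities to extremes, which is what improves calibration.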

Evaluation Results

Overall Metrics

| Metric               | Score  |
|----------------------|--------|
| Accuracy             | 0.8103 |
| F1 (weighted)        | 0.8102 |
| Precision (weighted) | 0.8111 |
| Recall (weighted)    | 0.8103 |

Per-Class Performance

| Class    | Precision | Recall | F1-Score |
|----------|-----------|--------|----------|
| Negative | 0.78      | 0.83   | 0.81     |
| Neutral  | 0.83      | 0.79   | 0.81     |
| Positive | 0.80      | 0.82   | 0.81     |

The balanced per-class performance (all F1 scores at 0.81) indicates that the model does not exhibit significant class bias, despite the imbalanced training distribution (Neutral: 45.5%, Positive: 30.8%, Negative: 23.7%).
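For reference, per-class and support-weighted metrics like those above can be computed directly from predictions; a minimal pure-Python sketch (label names and inputs are illustrative):

```python
from collections import Counter

def per_class_prf(y_true, y_pred, labels):
    """Per-class precision/recall/F1 plus the support-weighted F1."""
    out = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[c] = (prec, rec, f1)
    support = Counter(y_true)
    n = len(y_true)
    weighted_f1 = sum(out[c][2] * support[c] / n for c in labels)
    return out, weighted_f1
```

Weighting by class support is why the weighted F1 tracks accuracy closely even under the imbalanced label distribution described above.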

Usage

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="Kenpache/finbert-multilingual")

# English
classifier("The company reported record quarterly earnings, driven by strong demand.")
# [{'label': 'positive', 'score': 0.95}]

# German: "The stock lost significant value after the profit warning."
classifier("Die Aktie verlor nach der Gewinnwarnung deutlich an Wert.")
# [{'label': 'negative', 'score': 0.92}]

# Japanese: "The company's revenue was flat year over year."
classifier("同社の売上高は前年同期比で横ばいとなった。")
# [{'label': 'neutral', 'score': 0.88}]

# Chinese: "The company announced large-scale layoffs, and its share price fell in response."
classifier("该公司宣布大规模裁员计划,股价应声下跌。")
# [{'label': 'negative', 'score': 0.91}]
```

Direct Model Loading

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("Kenpache/finbert-multilingual")
model = AutoModelForSequenceClassification.from_pretrained("Kenpache/finbert-multilingual")

# French: "The group's profits rose by 15% in the first quarter."
text = "Les bénéfices du groupe ont augmenté de 15% au premier trimestre."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    pred = torch.argmax(probs, dim=-1).item()

labels = {0: "negative", 1: "neutral", 2: "positive"}
print(f"Prediction: {labels[pred]} ({probs[0][pred]:.4f})")
```

Training Data

The model was trained on Kenpache/multilingual-financial-sentiment, a curated dataset of ~39K financial news sentences from 80+ sources across six languages.

| Language | Samples | Sources                                          |
|----------|---------|--------------------------------------------------|
| Japanese | 8,287   | Nikkei, Nikkan Kogyo, Reuters JP, Minkabu, etc.  |
| Chinese  | 7,930   | Sina Finance, EastMoney, 10jqka, etc.            |
| Spanish  | 7,125   | Expansión, Cinco Días, Bloomberg Línea, etc.     |
| English  | 6,887   | CNBC, Yahoo Finance, Fortune, Benzinga, etc.     |
| German   | 5,023   | Börse.de, FAZ, NTV Börse, Handelsblatt, etc.     |
| French   | 3,935   | Boursorama, Tradingsat, BFM Business, etc.       |

Comparison with FinBERT

| Feature              | FinBERT                       | FinBERT-Multilingual                 |
|----------------------|-------------------------------|--------------------------------------|
| Base model           | BERT-base                     | XLM-RoBERTa-base                     |
| Languages            | English only                  | 6 languages                          |
| Domain adaptation    | Financial corpus pre-training | TAPT on multilingual financial texts |
| Classes              | 3 (Pos/Neg/Neu)               | 3 (Pos/Neg/Neu)                      |
| Checkpoint selection | Single best                   | SWA (top-3)                          |

Citation

If you use this model in your research, please cite:

```bibtex
@misc{finbert-multilingual-2025,
  title={FinBERT-Multilingual: Cross-Lingual Financial Sentiment Analysis with Domain-Adapted XLM-RoBERTa},
  author={Kenpache},
  year={2025},
  url={https://huggingface.co/Kenpache/finbert-multilingual}
}
```

License

Apache 2.0
