Open to Work

Mohamed Aymane Farhi

ayymen

AI & ML interests

NLP

Recent Activity

updated a collection 4 days ago

upvoted a collection 7 days ago

new activity 9 days ago

deepseek-ai/DeepSeek-R1-Zero: readability and language mixing issues

Organizations

upvoted a collection 7 days ago

Tamazight

https://huggingface.co/Tamazight-NLP • 2 items • Updated 7 days ago • 1

upvoted a paper 17 days ago

Faithful Persona-based Conversational Dataset Generation with Large Language Models

Paper • 2312.10007 • Published Dec 15, 2023 • 10

upvoted a paper 20 days ago

Awal -- Community-Powered Language Technology for Tamazight

Paper • 2510.27407 • Published Oct 31, 2025 • 1

upvoted a paper about 2 months ago

M5 -- A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks

Paper • 2407.03791 • Published Jul 4, 2024 • 2

upvoted 2 papers 3 months ago

Massively Multilingual Adaptation of Large Language Models Using Bilingual Translation Data

Paper • 2506.00469 • Published May 31, 2025 • 4

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 500

upvoted a collection 3 months ago

OLDI and friends

This collection groups the datasets that have been featured as part of WMT’s Open Language Data Initiative shared task. • 4 items • Updated Oct 6, 2025 • 4

upvoted an article 3 months ago

There is no such thing as a tokenizer-free lunch

Article • Sep 25, 2025 • 93

upvoted a paper 4 months ago

MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling

Paper • 2403.10691 • Published Mar 15, 2024 • 1

upvoted an article 5 months ago

Introducing Wikipedia Monthly: Fresh, Clean Wikipedia Dumps for NLP & AI Research

Article • Jul 19, 2025 • 5

upvoted a paper 5 months ago

Synthetic Voice Data for Automatic Speech Recognition in African Languages

Paper • 2507.17578 • Published Jul 23, 2025 • 2

upvoted a collection 6 months ago

T5Gemma

32 items • Updated Jul 10, 2025 • 78

upvoted 2 papers 6 months ago

The NaijaVoices Dataset: Cultivating Large-Scale, High-Quality, Culturally-Rich Speech Data for African Languages

Paper • 2505.20564 • Published May 26, 2025 • 1

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 75

upvoted a collection 8 months ago

MT Quality Estimation

Models for reference-free quality estimation of machine translation • 10 items • Updated Jan 29, 2025 • 4

upvoted a paper 8 months ago

Domain-Specific Translation with Open-Source Large Language Models: Resource-Oriented Analysis

Paper • 2412.05862 • Published Dec 8, 2024 • 1

upvoted 2 articles 9 months ago

Fine-Tune W2V2-Bert for low-resource ASR with 🤗 Transformers

Article • Jan 19, 2024 • 41

Fine-Tune XLSR-Wav2Vec2 for low-resource ASR with 🤗 Transformers

Article • Nov 15, 2021 • 37

upvoted an article 10 months ago

Fine-Tune MMS Adapter Models for low-resource ASR

Article • Jun 19, 2023 • 24

upvoted a collection 10 months ago

Dictionaries

3 items • Updated Mar 3, 2025 • 1