AI Leaderboards
The best AI models by category, updated daily.
Last updated: May 19, 2026
💻 Best for Coding
Source: Aider LLM Leaderboard (aider.chat). Updated 2026-05-18
| # | Model | Score |
|---|---|---|
| 🥇 | GPT-5 (high) | 88.0% |
| 🥈 | GPT-5 (medium) | 86.7% |
| 🥉 | o3-pro (high) | 84.9% |
| 4 | Gemini 2.5 Pro (06-05, 32k think) | 83.1% |
| 5 | GPT-5 (low) | 81.3% |
| 6 | o3 (high) | 81.3% |
| 7 | Grok 4 (high) | 79.6% |
| 8 | Gemini 2.5 Pro (06-05, default think) | 79.1% |
| 9 | o3 (high) + GPT-4.1 | 78.2% |
| 10 | o3 | 76.9% |
🧠 Best for Reasoning
Source: Arena AI (LMSYS). Updated 2026-05-18
| # | Model | Arena rank |
|---|---|---|
| 🥇 | Claude Opus 4.6 (thinking) | #1 Arena |
| 🥈 | Claude Opus 4.7 (thinking) | #2 Arena |
| 🥉 | Claude Opus 4.6 | #3 Arena |
| 4 | Claude Opus 4.7 | #4 Arena |
| 5 | Muse Spark | #5 Arena |
| 6 | Gemini 3.1 Pro Preview | #6 Arena |
| 7 | Gemini 3 Pro | #7 Arena |
| 8 | GPT-5.5 (high) | #8 Arena |
| 9 | GPT-5.4 (high) | #9 Arena |
| 10 | Grok 4.20 Beta 1 | #10 Arena |
🔊 Best TTS
Source: TTS Arena / community benchmarks
| # | Model | Score |
|---|---|---|
| 🥇 | ElevenLabs Turbo v4 | 4.6/5 |
| 🥈 | OpenAI TTS HD | 4.5/5 |
| 🥉 | Google Cloud TTS | 4.3/5 |
| 4 | Fish Speech | 4.2/5 |
| 5 | Kokoro ONNX | 4.1/5 |
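The source does not say how these /5 scores were produced. Assuming they are mean opinion scores (MOS), i.e. the average of listener ratings on a 1 to 5 scale, the minimal Python sketch below shows how such a score and a rough 95% confidence interval could be computed. The function name `mean_opinion_score` and the ratings are hypothetical, and the interval uses a normal approximation rather than any method named by the source.

```python
import math
import statistics


def mean_opinion_score(ratings: list[int]) -> tuple[float, float]:
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumes integer ratings on a 1-5 scale. The interval uses a normal
    approximation (z = 1.96), which is only rough for small samples.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on a 1-5 scale")
    mos = statistics.fmean(ratings)
    std = statistics.stdev(ratings)  # sample standard deviation
    half_width = 1.96 * std / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings for one TTS system (not real benchmark data).
ratings = [5, 4, 5, 4, 4, 5, 3, 5, 4, 5]
mos, hw = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} ± {hw:.2f}")
```

With these hypothetical ratings the MOS is 4.40 and the half-width is about 0.43, far wider than the 0.1 gaps between adjacent rows above; whether such gaps are meaningful depends on how many ratings each system received.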
🇪🇸 Best Spanish TTS
Source: Community benchmarks
| # | Model | Score |
|---|---|---|
| 🥇 | ElevenLabs Turbo v4 | 4.7/5 |
| 🥈 | OpenAI TTS HD | 4.4/5 |
| 🥉 | Kokoro ONNX | 4.2/5 |
| 4 | Google Cloud TTS (es-ES) | 4.0/5 |
| 5 | Piper TTS | 3.8/5 |
🎤 Best Spanish STT
Source: Common Voice / community benchmarks
| # | Model | Score (WER, lower is better) |
|---|---|---|
| 🥇 | Whisper Large V3 Turbo | WER 6.2% |
| 🥈 | Google Cloud STT | WER 6.5% |
| 🥉 | Azure Speech | WER 7.0% |
| 4 | Deepgram Nova-2 | WER 7.3% |
| 5 | AssemblyAI | WER 7.8% |
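Word error rate (WER) counts the word substitutions (S), deletions (D) and insertions (I) needed to turn a model's transcript into the reference, divided by the number of reference words (N): WER = (S + D + I) / N, so lower is better. A minimal Python sketch using word-level Levenshtein distance follows; `word_error_rate` is a hypothetical helper, and it skips the text normalization (lowercasing, punctuation removal) that published benchmarks usually apply before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance.

    No text normalization is applied; words are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("el gato duerme en la casa", "el gato duerme en casa"))
```

In the example the hypothesis drops one word out of six, so WER = 1/6 ≈ 16.7%.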
✍️ Creative Writing
Source: Arena AI (LMSYS)
| # | Model | Arena score |
|---|---|---|
| 🥇 | Claude Opus 4.7 (thinking) | 1503 |
| 🥈 | Claude Opus 4.6 (thinking) | 1502 |
| 🥉 | Muse Spark | 1490 |
| 4 | Gemini 3.1 Pro Preview | 1492 |
| 5 | Gemini 3 Pro | 1486 |
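The scores in this table are Arena ratings. Assuming they follow the usual Elo logistic scale (base 10, 400-point factor), which is how LMSYS has described Chatbot Arena ratings, a rating gap can be turned into an expected preference rate. The Python sketch below is illustrative only; `expected_win_rate` is a hypothetical helper, not part of any Arena API.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under an Elo-style logistic model.

    Assumes base 10 and a 400-point scale factor.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Scores taken from the Creative Writing table above.
pairs = [
    ("Claude Opus 4.7 (thinking)", 1503, "Claude Opus 4.6 (thinking)", 1502),
    ("Claude Opus 4.7 (thinking)", 1503, "Gemini 3 Pro", 1486),
]
for name_a, r_a, name_b, r_b in pairs:
    print(f"{name_a} vs {name_b}: {expected_win_rate(r_a, r_b):.1%}")
```

Under this assumption, the 1-point gap between the top two rows corresponds to an expected preference rate of about 50.1%, and the 17-point gap between first and fifth place to about 52.4%, so adjacent entries are close to a coin flip in head-to-head votes.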
🌍 Multilingual
Source: Arena AI (LMSYS)
| # | Model | Arena score |
|---|---|---|
| 🥇 | Claude Opus 4.7 (thinking) | 1503 |
| 🥈 | Claude Opus 4.6 (thinking) | 1502 |
| 🥉 | Claude Opus 4.7 | 1491 |
| 4 | Gemini 3.1 Pro Preview | 1492 |
| 5 | Grok 4.20 | 1480 |
💻 Best Open-Source Models for Coding
Source: Aider LLM Leaderboard (aider.chat)
| # | Model | Score |
|---|---|---|
| 🥇 | DeepSeek-V3.2-Exp (Reasoner) | 74.2% |
| 🥈 | DeepSeek R1 (0528) | 71.4% |
| 🥉 | DeepSeek-V3.2-Exp (Chat) | 70.2% |
| 4 | Qwen3 235B A22B | 59.6% |
| 5 | Kimi K2 | 59.1% |
| 6 | DeepSeek R1 | 56.9% |
| 7 | Qwen3 32B | 40.0% |
| 8 | Gemma 3 27B | 4.9% |
🧠 Best Open-Source LLMs
Source: Open LLM Leaderboard / MMLU-Pro / GPQA / AA Index
| # | Model | Arena rank (score) |
|---|---|---|
| 🥇 | Gemini 3 Pro | #7 Arena (1486) |
| 🥈 | Muse Spark | #6 Arena (1490) |
| 🥉 | Grok 4.20 Beta 1 | #9 Arena (1480) |
| 4 | Gemini 3 Flash | #16 Arena |
| 5 | Qwen3.5 Max Preview | #25 Arena |
| 6 | DeepSeek V4 Pro | #27 Arena |
| 7 | Kimi K2.6 | #28 Arena |
| 8 | Gemma 4 31B | #39 Arena |
🎨 Best Open-Source Models: Image Generation
Source: Artificial Analysis / community benchmarks
| # | Model | Score |
|---|---|---|
| 🥇 | FLUX.1 [schnell] | ~4.4/5 |
| 🥈 | HunyuanImage-3.0 | ~4.3/5 |
| 🥉 | FLUX.1 [dev] | ~4.2/5 |
| 4 | Stable Diffusion 3.5 Large | ~4.1/5 |
| 5 | HiDream-I1-Full | ~4.0/5 |
| 6 | SANA-Sprint 1.6B | ~3.7/5 |
🎬 Best Open-Source Models: Image-to-Video
Source: Artificial Analysis / community benchmarks
| # | Model | Score |
|---|---|---|
| 🥇 | WAN2.2-14B | ~4.3/5 |
| 🥈 | HunyuanVideo | ~4.1/5 |
| 🥉 | LTX-2.3 | ~3.9/5 |
📄 Best Open-Source Models: OCR
Source: Community benchmarks / Artificial Analysis
| # | Model | Score |
|---|---|---|
| 🥇 | GLM-OCR | ~4.5/5 |
| 🥈 | nemotron-ocr-v2 | ~4.3/5 |
| 🥉 | Falcon-OCR | ~4.1/5 |
| 4 | TrOCR-large | ~3.8/5 |
| 5 | BLIP-large | ~3.6/5 |
🔊 Best Open-Source TTS
Source: TTS Arena / community benchmarks
| # | Model | Score |
|---|---|---|
| 🥇 | Qwen3-TTS | ~4.4/5 |
| 🥈 | Fish Speech S2 | ~4.2/5 |
| 🥉 | CosyVoice 3.0 | ~4.1/5 |
| 4 | Kokoro ONNX | 4.1/5 |
| 5 | Piper TTS | 3.8/5 |
🎤 Best Open-Source ASR
Source: Common Voice / community benchmarks
| # | Model | Score (WER, lower is better) |
|---|---|---|
| 🥇 | Whisper Large V3 Turbo | WER 6.2% |
| 🥈 | FunASR | ~WER 6.5% |
| 🥉 | VibeVoice-ASR | ~WER 7.1% |
🎵 Best Open-Source Models: Music Generation
Source: Community benchmarks
| # | Model | Score |
|---|---|---|
| 🥇 | ACE-Step 1.5 | ~4.2/5 |
| 🥈 | MusicGen-large | ~3.9/5 |
| 🥉 | AudioLDM-2 | ~3.5/5 |
💻 Best Local Models by Hardware
💻 Local: 8GB RAM
Source: Ollama benchmarks
| # | Model | Score (benchmark) |
|---|---|---|
| 🥇 | Qwen3.5 9B | ~72 MMLU |
| 🥈 | Gemma 4 E4B | 69.4 MMLU-Pro |
| 🥉 | Qwen3 8B | 72.1 MMLU |
| 4 | Gemma 4 E2B | 60.0 MMLU-Pro |
| 5 | Phi-4 Mini | 67.2 MMLU |
🖥️ Local: 16GB RAM
Source: Ollama benchmarks
| # | Model | Score (benchmark) |
|---|---|---|
| 🥇 | Gemma 4 31B | 85.2 MMLU-Pro |
| 🥈 | Qwen3.5 35B | ~80 MMLU-Pro |
| 🥉 | Llama 3.3 70B (Q4) | 80.1 MMLU |
| 4 | Qwen3 32B | 79.5 MMLU |
| 5 | Mistral Small 24B | 78.3 MMLU |
🎮 Local: RTX 4060 (8GB VRAM)
Source: Ollama benchmarks
| # | Model | Score (benchmark) |
|---|---|---|
| 🥇 | Gemma 4 26B A4B | 82.6 MMLU-Pro |
| 🥈 | Qwen3.5 9B (Q8) | ~72 MMLU |
| 🥉 | Qwen3 8B (Q8) | 72.1 MMLU |
| 4 | Gemma 4 E4B | 69.4 MMLU-Pro |
| 5 | Phi-4 Mini (Q6) | 66.8 MMLU |
🍎 Local: M1 Mac (16GB)
Source: MLX benchmarks
| # | Model | Score (benchmark) |
|---|---|---|
| 🥇 | Gemma 4 26B A4B | 82.6 MMLU-Pro |
| 🥈 | Qwen3.5 35B (Q4) | ~80 MMLU-Pro |
| 🥉 | Llama 3.3 70B (Q2) | ~75 MMLU |
| 4 | Qwen3 32B (Q4) | ~78 MMLU |
| 5 | Gemma 4 31B (Q3) | ~80 MMLU-Pro |
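The quantization labels in these tables (Q8, Q6, Q4, Q3, Q2) indicate roughly how many bits each weight is stored with, which lets you estimate whether a model fits a given RAM or VRAM budget. The Python sketch below computes the weight footprint only, as parameters × bits ÷ 8; `weight_memory_gb` is a hypothetical helper, and the result ignores the KV cache, activations and runtime overhead. Real quantization formats also store per-block scales, so their effective bits per weight are somewhat higher than the nominal value.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10**9 bytes).

    params * bits / 8 gives bytes; with params in billions the result is
    directly in GB. Ignores KV cache, activations and runtime overhead.
    """
    if params_billion <= 0 or bits_per_weight <= 0:
        raise ValueError("parameters and bits must be positive")
    return params_billion * bits_per_weight / 8


# Nominal parameter counts and bit widths, as named in the tables above.
examples = [
    ("Qwen3 8B (Q8)", 8, 8),
    ("Qwen3 32B (Q4)", 32, 4),
    ("Llama 3.3 70B (Q4)", 70, 4),
    ("Llama 3.3 70B (Q2)", 70, 2),
]
for name, params, bits in examples:
    print(f"{name}: ~{weight_memory_gb(params, bits):.1f} GB of weights")
```

For example, a 70B model at 4 bits needs about 35 GB for its weights alone, and an 8B model at 8 bits about 8 GB. For mixture-of-experts models such as the "A4B" variants, the total parameter count, not the active one, generally determines the footprint when all experts are kept in memory.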