# 🏆 AI Leaderboards

The best AI models by category, updated daily.

Last updated: April 19, 2026

## 💻 Best for Coding

Source: Aider LLM Leaderboard / LMSYS Arena

| # | Model | Provider | Score |
|---|---|---|---|
| 🥇 | Claude 4 Opus | Anthropic | 72.4% |
| 🥈 | GPT-5 | OpenAI | 71.8% |
| 🥉 | Gemini 2.5 Pro | Google | 70.2% |
| 4 | Claude 4 Sonnet | Anthropic | 68.5% |
| 5 | GPT-4.1 | OpenAI | 67.1% |

## 🧠 Best for Reasoning

Source: LMSYS Chatbot Arena

| # | Model | Provider | Elo |
|---|---|---|---|
| 🥇 | Claude 4 Opus | Anthropic | 1407 |
| 🥈 | GPT-5 | OpenAI | 1395 |
| 🥉 | Gemini 2.5 Pro | Google | 1380 |
| 4 | DeepSeek R2 | DeepSeek | 1365 |
| 5 | Claude 4 Sonnet | Anthropic | 1358 |
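Arena scores are Elo-style ratings, so the gap between two models maps to an expected head-to-head win rate. A minimal sketch using the standard Elo expectation formula (LMSYS actually fits Bradley-Terry coefficients, which share the same logistic form):

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation: probability that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Gap between #1 (Claude 4 Opus, 1407) and #5 (Claude 4 Sonnet, 1358):
print(round(expected_win_rate(1407, 1358), 2))  # 0.57
```

A 49-point gap therefore translates to only a modest (~57%) expected win rate, which is why adjacent leaderboard positions are closer in practice than the ranking suggests.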

## 🔊 Best TTS

Source: TTS Arena / Community benchmarks

| # | Model | Provider | Score |
|---|---|---|---|
| 🥇 | ElevenLabs Turbo v4 | ElevenLabs | 4.6/5 |
| 🥈 | OpenAI TTS HD | OpenAI | 4.5/5 |
| 🥉 | Google Cloud TTS | Google | 4.3/5 |
| 4 | Fish Speech | Fish Audio | 4.2/5 |
| 5 | Kokoro ONNX | Open Source | 4.1/5 |

## 🇪🇸 Best TTS for Spanish

Source: Community benchmarks

| # | Model | Provider | Score |
|---|---|---|---|
| 🥇 | ElevenLabs Turbo v4 | ElevenLabs | 4.7/5 |
| 🥈 | OpenAI TTS HD | OpenAI | 4.4/5 |
| 🥉 | Kokoro ONNX | Open Source | 4.2/5 |
| 4 | Google Cloud TTS (es-ES) | Google | 4.0/5 |
| 5 | Piper TTS | Open Source | 3.8/5 |

## 🎤 Best STT for Spanish

Source: Common Voice / Community benchmarks

| # | Model | Provider | WER |
|---|---|---|---|
| 🥇 | Whisper Large V3 Turbo | OpenAI (OS) | 6.2% |
| 🥈 | Google Cloud STT | Google | 6.5% |
| 🥉 | Azure Speech | Microsoft | 7.0% |
| 4 | Deepgram Nova-2 | Deepgram | 7.3% |
| 5 | AssemblyAI | AssemblyAI | 7.8% |
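WER (word error rate) counts substitutions, insertions, and deletions against the reference transcript, divided by the number of reference words; lower is better. A self-contained sketch using word-level edit distance:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# Two substitutions (missing accents) plus one deletion over four words:
print(wer("hola cómo estás hoy", "hola como estas"))  # 0.75
```

Note that naive WER penalizes accent and casing differences, which matters for Spanish; published benchmarks typically normalize text before scoring.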

## ✍️ Creative Writing

Source: LMSYS Arena / Community

| # | Model | Provider | Elo |
|---|---|---|---|
| 🥇 | GPT-5 | OpenAI | 1385 |
| 🥈 | Claude 4 Opus | Anthropic | 1370 |
| 🥉 | Gemini 2.5 Pro | Google | 1355 |
| 4 | Mistral Large 2 | Mistral | 1330 |
| 5 | Llama 4 Maverick | Meta | 1310 |

## 🌍 Multilingual

Source: LMSYS Arena Multilingual

| # | Model | Provider | Elo |
|---|---|---|---|
| 🥇 | GPT-5 | OpenAI | 1410 |
| 🥈 | Claude 4 Opus | Anthropic | 1398 |
| 🥉 | Gemini 2.5 Pro | Google | 1392 |
| 4 | Llama 4 Maverick | Meta | 1340 |
| 5 | Qwen3 235B | Alibaba | 1335 |

## 💻 Best Local Models by Hardware

### 💻 Local: 8GB RAM

Source: Ollama benchmarks

| # | Model | Provider | MMLU |
|---|---|---|---|
| 🥇 | Qwen3 8B | Alibaba | 72.1 |
| 🥈 | Gemma 3 12B | Google | 71.8 |
| 🥉 | Llama 3.1 8B | Meta | 68.4 |
| 4 | Phi-4 Mini | Microsoft | 67.2 |
| 5 | Mistral 7B | Mistral | 63.6 |

### 🖥️ Local: 16GB RAM

Source: Ollama benchmarks

| # | Model | Provider | MMLU |
|---|---|---|---|
| 🥇 | Llama 3.3 70B (Q4) | Meta | 80.1 |
| 🥈 | Qwen3 32B | Alibaba | 79.5 |
| 🥉 | Mistral Small 24B | Mistral | 78.3 |
| 4 | Gemma 3 27B | Google | 76.8 |
| 5 | DeepSeek R1 14B | DeepSeek | 74.2 |

### 🎮 Local: RTX 4060 (8GB VRAM)

Source: Ollama benchmarks

| # | Model | Provider | MMLU |
|---|---|---|---|
| 🥇 | Qwen3 8B (Q8) | Alibaba | 72.1 |
| 🥈 | Gemma 3 12B (Q5) | Google | 70.5 |
| 🥉 | Llama 3.1 8B (Q8) | Meta | 68.4 |
| 4 | Phi-4 Mini (Q6) | Microsoft | 66.8 |
| 5 | Mistral 7B (Q8) | Mistral | 64.1 |
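The quantization tags in these tables (Q4, Q5, Q8) are what make each hardware tier work: weight memory is roughly parameter count × bits per weight ÷ 8, plus runtime overhead. A rough sketch of that arithmetic (the 1.5 GB overhead figure for KV cache and runtime is an assumed ballpark, not a measured value):

```python
def approx_memory_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 1.5) -> float:
    """Rough memory estimate: weights (params * bits / 8) plus fixed overhead.
    overhead_gb (KV cache + runtime) is an assumed ballpark, not measured."""
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb

# Qwen3 8B at Q8 (~8 bits/weight) vs Q4 (~4 bits/weight):
print(round(approx_memory_gb(8, 8), 1))  # 9.5
print(round(approx_memory_gb(8, 4), 1))  # 5.5
```

This is why an 8B model at Q8 sits at the edge of the 8 GB tier (runtimes partially offload layers to make it fit), while dropping to Q4 leaves comfortable headroom at a small quality cost.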

### 🍎 Local: M1 Mac (16GB)

Source: MLX benchmarks

| # | Model | Provider | MMLU |
|---|---|---|---|
| 🥇 | Qwen3 32B (Q4) | Alibaba | ~78 |
| 🥈 | Mistral Small 24B (Q4) | Mistral | ~76 |
| 🥉 | Llama 3.3 70B (Q2) | Meta | ~75 |
| 4 | Gemma 3 27B (Q4) | Google | ~74 |
| 5 | Qwen3 8B (Q8) | Alibaba | 72.1 |