techminis

A naukri.com initiative


Dev


Deepseek v3 0324: Finally, the Sonnet 3.5 at Home

  • Deepseek released v3 0324 quietly, with little marketing, as a 641 GB MIT-licensed base model.
  • v3 0324 is a significant upgrade over the previous v3, with stronger reasoning, better code generation, and a better understanding of user intent, gained through RL training with GRPO.
  • It outperforms Claude 3.5 Sonnet on many tasks and excels in several benchmarks, making it a top-performing non-reasoning model on various real-world coding tasks.
  • The model has an MIT license, is freely available, allows for training opt-out, and is noted for its exceptional price-to-performance ratio.
  • Improvements in reasoning capabilities include better performance on benchmarks like MMLU-Pro, GPQA, AIME, and LiveCodeBench.
  • The model also excels in front-end development, Chinese writing, analysis, and research capabilities.
  • Deepseek v3 0324 ranks high on various private benchmarks, often surpassing Claude Sonnet models, and shows a notable jump on evaluations such as Misguided Attention.
  • The model shows signs of improved reasoning, which the article links to GRPO training.
  • Running the model locally is feasible: tools such as mlx-lm and the llm-mlx plugin for LLM allow quick deployment on a Mac with 512 GB of memory.
  • In real-world use the model is reported to outperform competitors such as Claude Sonnet, and users have built a range of applications and games with Deepseek v3 0324.
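
The local-deployment point above comes down to simple arithmetic: the weights have to fit in memory with room to spare. The sketch below is a rough estimate, not a measurement. The parameter count (about 671 billion), the 80% headroom factor, and the helper names are assumptions for illustration and do not come from the summary. The `run_with_mlx_lm` helper uses mlx-lm's `load` and `generate` functions; the model repository is deliberately left for the caller to supply.

```python
# Rough memory estimate for running a very large model locally.
# ASSUMPTION: ~671 billion parameters; this figure is not stated in the
# summary above, so adjust it if you have a better number.
ASSUMED_PARAMS = 671e9


def weights_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes.

    Ignores quantization metadata (scales, biases), so real files are larger.
    """
    return n_params * bits_per_weight / 8 / 1e9


def fits(n_params: float, bits_per_weight: float, memory_gb: float,
         headroom: float = 0.8) -> bool:
    """True if the weights fit within `headroom` of the available memory.

    ASSUMPTION: the headroom fraction is a hypothetical safety margin for
    the KV cache and the operating system, not a measured value.
    """
    return weights_size_gb(n_params, bits_per_weight) <= memory_gb * headroom


def run_with_mlx_lm(repo: str, prompt: str, max_tokens: int = 256) -> str:
    """Generate text with mlx-lm (Apple silicon only; downloads the weights).

    `repo` is a Hugging Face repository id or local path chosen by the
    caller; no specific repository is assumed here.
    """
    from mlx_lm import load, generate  # lazy import: optional dependency

    model, tokenizer = load(repo)
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)


if __name__ == "__main__":
    for bits in (8, 4):
        size = weights_size_gb(ASSUMED_PARAMS, bits)
        ok = fits(ASSUMED_PARAMS, bits, 512)
        print(f"{bits}-bit weights: ~{size:.0f} GB, fits in 512 GB: {ok}")
```

Under these assumptions, 8-bit weights would not fit in 512 GB while a 4-bit quantization would, which is consistent with local runs relying on quantized conversions of the model.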
