DeepSeek Unveils New AI Reasoning Model with Tsinghua, Teases R2 Launch

Chinese AI startup DeepSeek has previewed its next line of research with DeepSeek-GRM, a family of reward models developed in collaboration with Tsinghua University. The method combines generative reward modeling (GRM) with self-principled critique tuning (SPCT), aiming to improve the accuracy, efficiency, and human alignment of large language model outputs. The paper reports that DeepSeek-GRM outperformed existing public reward models on some benchmarks.

Following the success of R1, a model known for competitive reasoning on a low budget, industry watchers expect DeepSeek to release R2 soon, though the company has not confirmed this. DeepSeek has also released DeepSeek-V3-0324, an update that improves reasoning, Chinese writing, and web development capabilities.

Backed by High-Flyer Quant and quietly gaining support from top Chinese leadership, DeepSeek is making waves globally. The company plans to open-source its GRM models, continuing its research-first, transparency-driven approach. Together with its Mixture of Experts (MoE) architecture, this work could set a new standard for reinforcement learning efficiency.

Abishek D Praphullalumar