DeepSeek Unveils New AI Reasoning Model with Tsinghua, Teases R2 Launch

Chinese AI startup DeepSeek is teasing its next major step forward with the DeepSeek-GRM models, developed in collaboration with Tsinghua University. The new reasoning method combines generative reward modeling (GRM) with Self-Principled Critique Tuning (SPCT), aiming to improve the accuracy, efficiency, and human alignment of large language model outputs. The accompanying paper reports that DeepSeek-GRM outperformed existing public reward models on some benchmarks.
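To make the idea concrete, here is a minimal, illustrative Python sketch of how a generative reward model with self-generated principles might score a response. The `generate` stub, prompt wording, and "Score: X/10" format are assumptions for illustration only, not DeepSeek's actual implementation.

```python
# Illustrative sketch of generative reward modeling (GRM) with
# self-generated principles. The `generate` function is a stand-in for
# any instruction-tuned LLM call; prompts and names are assumptions.
import re

def generate(prompt: str) -> str:
    """Placeholder for an LLM call; replace with a real model/API."""
    # Canned reply so the sketch runs end to end without a model.
    return "Principles: correctness, clarity.\nCritique: accurate and concise.\nScore: 8/10"

def grm_score(query: str, response: str) -> int:
    # Step 1: ask the model to derive evaluation principles for this
    # specific query (the "self-principled" idea).
    principles = generate(f"List evaluation principles for judging answers to: {query}")

    # Step 2: ask the model to critique the response against those
    # principles and end with a numeric score. The reward is produced
    # as text (generative reward modeling), then parsed.
    critique = generate(
        f"Principles:\n{principles}\n\nQuery: {query}\nResponse: {response}\n"
        "Critique the response against the principles and end with 'Score: X/10'."
    )
    match = re.search(r"Score:\s*(\d+)\s*/\s*10", critique)
    return int(match.group(1)) if match else 0

if __name__ == "__main__":
    print(grm_score("What is 2 + 2?", "4"))  # -> 8 with the canned reply
```

In a real pipeline, the parsed scores would then rank candidate responses or supply the reward signal for reinforcement learning.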


Following the success of its R1 model, known for delivering competitive reasoning performance at a comparatively low training cost, industry watchers expect R2 to debut soon, though the company has not confirmed a date. DeepSeek has also released DeepSeek-V3-0324, which improves reasoning, Chinese writing, and web development capabilities.


Backed by quant fund High-Flyer and quietly gaining support from top Chinese leadership, DeepSeek is making waves globally. With plans to open-source its GRM models, the company continues its research-first, transparency-driven approach and, building on its Mixture of Experts (MoE) architecture, could set a new standard for reinforcement learning efficiency.
