Blog

Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search

Since the release of OpenAI’s o1, significant efforts have been made within the research community to enhance open-source LLMs with advanced reasoning capabilities. This includes various approaches, including distillation using a strong teacher model, MCTS, and reward model gui...

February 1, 2025 · 31 min · 6518 words · Satori Team