Blog

Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering

LLMs perform well on coding benchmarks like LiveCodeBench but struggle with real-world software engineering (SWE) tasks (Jimenez et al. 2024). Even large models like Claude reach only around 60% accuracy on SWE-bench, despite using carefully engineered prompting pipelines (Xia...

Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search

Since the release of OpenAI’s o1, the research community has made significant efforts to enhance open-source LLMs with advanced reasoning capabilities. Approaches include distillation using a strong teacher model, MCTS, and reward model gui...