In addition, they exhibit a counter-intuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget. By comparing LRMs with their standard LLM counterparts under equivalent inference compute, we identify three performance regimes: (1) low-complexity