Implement test-time compute scaling for math problems
thought that reasoning into LLMs without modification