Papers Related to Adaptive Sharpening

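| Title | Motivation | Approach |
| --- | --- | --- |
| Assessing Diversity Collapse in Reasoning | Fine-tuning raises pass@1 but lowers pass@k | Mix the current checkpoint with earlier (historical) checkpoints |
| Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs | Keep more candidate tokens when the model is uncertain | Keep every token whose probability is at least a fixed fraction of the top token's probability |
| Maximizing Confidence Alone Improves Reasoning | Sharpening | The reward signal is the model's own (negative) entropy |
| Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning | Leave confident tokens as they are; learn more from uncertain ones | Apply RL only to the uncertain (high-entropy) tokens |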
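
To make the "Approach" column more concrete, here is a minimal pure-Python sketch of the four ideas as I read them from the table. It is not taken from any of the papers' code. All function names and the defaults (`p_base`, `keep_fraction`, `alpha`) are hypothetical. Reading "mix checkpoints" as linear weight interpolation is my assumption, and so is using the 20% cutoff suggested by the "80/20" title.

```python
import math


def min_p_filter(probs, p_base=0.1):
    """Min-p style filter: keep tokens with prob >= p_base * max(probs), then renormalize.

    p_base is a hypothetical default, not a value from the paper.
    """
    threshold = p_base * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def confidence_reward(per_token_probs):
    """Reward = negative mean token entropy, so sharper distributions score higher."""
    ents = [entropy(p) for p in per_token_probs]
    return -sum(ents) / len(ents)


def high_entropy_mask(per_token_probs, keep_fraction=0.2):
    """Mark the highest-entropy positions; only these would receive RL updates.

    keep_fraction=0.2 is an assumption based on the "80/20" in the title.
    Ties at the cutoff are all kept.
    """
    ents = [entropy(p) for p in per_token_probs]
    k = max(1, round(keep_fraction * len(ents)))
    cutoff = sorted(ents, reverse=True)[k - 1]
    return [e >= cutoff for e in ents]


def mix_checkpoints(current, earlier, alpha=0.5):
    """Linear interpolation of two checkpoints stored as {name: list of floats}.

    Assumes "mixing" means weight interpolation; alpha weights the current checkpoint.
    """
    return {
        name: [alpha * c + (1.0 - alpha) * e
               for c, e in zip(current[name], earlier[name])]
        for name in current
    }


# Toy next-token distributions: one confident, one uncertain.
peaked = [0.90, 0.05, 0.03, 0.02]
flat = [0.30, 0.30, 0.20, 0.20]

kept_peaked = sum(1 for p in min_p_filter(peaked) if p > 0.0)
kept_flat = sum(1 for p in min_p_filter(flat) if p > 0.0)
print("tokens kept (peaked, flat):", kept_peaked, kept_flat)
print("RL mask:", high_entropy_mask([peaked, peaked, flat, peaked, peaked]))
```

On the toy distributions, the filter keeps fewer candidates for the peaked distribution than for the flat one. That is the adaptive behavior the motivation column describes: the cutoff scales with the top probability instead of being a fixed top-k or top-p.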

There is also a workshop on self-improvement:

https://iclr.cc/virtual/2025/workshop/23971


https://childofcuriosity.github.io/2025/07/11/自适应锐化相关论文/
Author: childofcuriosity
Posted on: July 11, 2025