Papers Related to Adaptive Sharpening

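Title	Motivation	Approach
Assessing Diversity Collapse in Reasoning	Fine-tuning raises pass@1 but lowers pass@k	Mix the current checkpoint with earlier checkpoints
TURNING UP THE HEAT: MIN-p SAMPLING FOR CREATIVE AND COHERENT LLM OUTPUTS	Keep more candidate tokens when the model is uncertain	Keep tokens whose probability is at least a fixed fraction of the top token's probability
Maximizing Confidence Alone Improves Reasoning	Sharpening	Use the model's own entropy as the reward signal (lower entropy is rewarded)
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning	Keep what the model is already certain about; learn more where it is uncertain	Apply RL only to the uncertain (high-entropy) tokens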
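
Three of the approaches in the table depend on how peaked or flat the next-token distribution is. The snippet below is a minimal sketch of that idea, not the papers' actual implementations. The function names, the `p_base=0.1` threshold, and the `keep_fraction=0.2` cutoff (suggested only by the "80/20" in the title) are assumptions made for illustration. Using negative mean entropy as the reward is likewise one reading of the "entropy as reward" entry.

```python
import math


def min_p_filter(probs, p_base=0.1):
    """Min-p style truncation (sketch): keep tokens whose probability is at
    least p_base times the top token's probability, then renormalize.
    p_base=0.1 is an illustrative value, not one taken from the paper."""
    threshold = p_base * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def confidence_reward(step_distributions):
    """Assumed form of an entropy-based reward: negative mean token entropy,
    so more confident (sharper) generations receive a higher reward."""
    entropies = [entropy(d) for d in step_distributions]
    return -sum(entropies) / len(entropies)


def high_entropy_mask(step_distributions, keep_fraction=0.2):
    """Mark the highest-entropy positions (roughly the top keep_fraction).
    In an RL loss, only positions marked True would contribute gradient.
    Ties at the cutoff are all kept, so slightly more may be selected."""
    entropies = [entropy(d) for d in step_distributions]
    n_keep = max(1, round(keep_fraction * len(entropies)))
    cutoff = sorted(entropies, reverse=True)[n_keep - 1]
    return [h >= cutoff for h in entropies]


# A peaked distribution keeps one token; a flat one keeps all four.
peaked = [0.90, 0.05, 0.03, 0.02]
flat = [0.30, 0.28, 0.22, 0.20]
print(min_p_filter(peaked))
print(min_p_filter(flat))
```

Because the threshold scales with the top probability, the filter is strict when the model is confident and permissive when it is not, which matches the "keep more when uncertain" motivation in the table.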

There is also a workshop on self-improvement.

https://childofcuriosity.github.io/2025/07/11/自适应锐化相关论文/

Author

childofcuriosity

Posted on

July 11, 2025

Licensed under