Learning to Summarize from Human Feedback Explained: Reward Models, RLHF | Deconstructed