Human-Aligned Large Language Models

Table of Contents

  • A Bit of Context
  • Reinforced Finetuning using Reward Model
    • Reward Learning
    • Policy Optimization
    • AI Feedback
  • Preference Optimization without RL
    • Direct Preference Optimization (DPO)
    • Alternative Preference Optimization
    • Self-Play Preference Optimization
  • Finetuning with Rating Feedback

A Bit of Context

In alignment training, the goal is to ensure that the outputs generated by machine learning models align closely with human preferences, values, and intentions. The research traces back to classic machine learning work on preference models for classification and reinforcement learning tasks [1,2]. In 2017, researchers scaled up the approach, employing reinforcement learning algorithms such as PPO to align models with human preferences [3]. As Large Language Models (LLMs) have become more prevalent, alignment training has emerged as a crucial factor in their success, most notably in the case of ChatGPT. The LLM finetuning framework generally consists of two steps:

Supervised finetuning: the LLM is finetuned on data with ground-truth labels for the task.

Alignment finetuning: the LLM is finetuned on human feedback data, often in the form of preferences such as comparison feedback.
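
To make the two steps concrete, here is a minimal PyTorch sketch of the losses typically used in each stage. The function names (`sft_loss`, `preference_loss`) and the toy tensors are illustrative assumptions, not taken from any particular library; step 2 is shown with a generic pairwise preference loss, of which the reward-model and DPO approaches discussed below are specific instances.

```python
# Minimal sketch of the two finetuning stages, assuming a generic causal LM
# that maps token ids to next-token logits. Names are illustrative only.
import torch
import torch.nn.functional as F

def sft_loss(logits, labels):
    """Step 1 (supervised finetuning): next-token cross-entropy
    on data with ground-truth completions."""
    # Shift so that the logits at position t predict token t+1.
    logits = logits[:, :-1, :]
    labels = labels[:, 1:]
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           labels.reshape(-1))

def preference_loss(score_chosen, score_rejected):
    """Step 2 (alignment finetuning): a Bradley-Terry-style pairwise loss
    that pushes the score of the human-preferred response above the
    rejected one; methods differ in how the scores are computed."""
    return -F.logsigmoid(score_chosen - score_rejected).mean()

if __name__ == "__main__":
    # Toy tensors standing in for model outputs.
    batch, seq_len, vocab = 2, 8, 100
    logits = torch.randn(batch, seq_len, vocab)
    labels = torch.randint(0, vocab, (batch, seq_len))
    print("SFT loss:", sft_loss(logits, labels).item())

    # Sequence-level scores for a preferred vs. rejected response pair.
    s_chosen, s_rejected = torch.randn(batch), torch.randn(batch)
    print("Preference loss:", preference_loss(s_chosen, s_rejected).item())
```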

🧠 Why is alignment training necessary instead of simply continuing supervised finetuning?

  • Limited training data leads to overfitting under excessive supervised finetuning.
  • Obtaining more supervised data requires costly labeling of ground-truth answers.
  • Supervised finetuning of an LLM is expensive and does not explicitly optimize for generating the outputs users actually want.

👉 This article focuses on step 2: LLM alignment. A common scheme for alignment training is to use feedback-based labels as training signals to reduce the labeling cost. Another consideration is that the finetuned model should not deviate too much from the base model, ensuring that the final model retains the properties learned from extensive pretraining and supervised tuning data.
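
Concretely, the alignment objective is commonly written as reward maximization with a KL penalty that keeps the finetuned policy close to the reference model. In the notation below (my choice of symbols), $\pi_\theta$ is the model being finetuned, $\pi_{\mathrm{ref}}$ is the base or supervised-finetuned model, $r_\phi$ is a reward derived from human feedback, and $\beta$ controls how strongly deviation is penalized:

$$
\max_{\pi_\theta}\;\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}\big[r_\phi(x, y)\big] \;-\; \beta\,\mathbb{D}_{\mathrm{KL}}\big[\pi_\theta(y \mid x)\,\|\,\pi_{\mathrm{ref}}(y \mid x)\big]
$$

Both the RL-based methods and the RL-free methods covered later (e.g., DPO) can be viewed as optimizing this KL-regularized objective; they differ mainly in whether the reward is modeled explicitly.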

Read the full article