Instruction Tuning

Tags: Blog, LLM Created time: September 15, 2023 11:57 AM

Context

Pretraining LLMs on the “next token prediction” task has proven to show incredible generalisation powers as long as you throw enough data, parameters and compute at it. However, it is possible to get more out of your language model by finetuning it on a smaller dataset. Many have already experimented with finetuning LLMs on downstream tasks. But you can also improve their generalisation and instruction-following abilities by using a dataset that presents tasks as instructions and expects the LLM to predict the output.

What is Instruction Tuning?

A form of fine-tuning that improves a generative AI model’s ability to follow instructions. Instruction tuning involves training a model on a series of instruction prompts, typically covering a wide variety of tasks. The resulting instruction-tuned model then tends to generate useful responses to zero-shot prompts across a variety of tasks. - Google Dev

Why Instruction Tuning?

Given the scaling law, we can expect models to get better with more data and parameters. But it is possible to squeeze out more performance with methods like instruction tuning that enable few-shot learning (also called in-context learning; ICL) and zero-shot learning. This way, users can provide prompts with instructions and expect the model to perform tasks accordingly.
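
For example, the only difference between a zero-shot and a few-shot (in-context) prompt is what goes into the context. The task and wording below are made up purely for illustration:

```python
# Hypothetical prompts for a sentiment task, for illustration only.

# Zero-shot: the instruction alone, no solved examples.
zero_shot_prompt = (
    "Classify the sentiment of the following review as positive or negative.\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)

# Few-shot (in-context learning): the same instruction plus a couple of solved examples.
few_shot_prompt = (
    "Classify the sentiment of the following review as positive or negative.\n"
    "Review: Absolutely loved it, would buy again.\n"
    "Sentiment: positive\n"
    "Review: Arrived broken and support never replied.\n"
    "Sentiment: negative\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)
```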

Think of pretraining as the barebones for building world knowledge, and instruction tuning as lessons in problem solving.

How?

It’s simple: just use a dataset with an input construction as described below. Different models use different approaches, but the idea is the same: provide instructions that contain the details of the task, and then ask the model to predict the output.

[Figures: example input constructions for instruction tuning - ref(FLAN-T5)]
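
A minimal sketch of that construction in Python: each record pairs an instruction (plus an optional input) with a target output, and the loss is computed only on the output tokens. The template and field names here are assumptions for illustration, not taken from any particular paper.

```python
# Sketch of turning (instruction, input, output) records into training examples.
# Template and field names are illustrative, not from a specific paper.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # any causal LM tokenizer

def build_example(record, max_len=512):
    prompt = f"Instruction: {record['instruction']}\n"
    if record.get("input"):
        prompt += f"Input: {record['input']}\n"
    prompt += "Output: "
    target = record["output"] + tokenizer.eos_token

    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]

    input_ids = (prompt_ids + target_ids)[:max_len]
    # Mask the prompt tokens with -100 so the loss is computed only on the output.
    labels = ([-100] * len(prompt_ids) + target_ids)[:max_len]
    return {"input_ids": input_ids, "labels": labels}

example = build_example({
    "instruction": "Give two examples of a liquid.",
    "input": "",
    "output": "Water and oil are both liquids.",
})
```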

The Stanford NLP group released Alpaca: an instruction-tuned model that starts from LLaMA and uses instruction-following examples generated by OpenAI’s text-davinci-003 (a GPT-3-family model) as its dataset.

Examples from data used to train Alpaca

Instruction: Give three tips for staying healthy.
Output: 1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases. 2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week. 3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.

Instruction: Give two examples of a liquid.
Output: Here are two examples of a liquid: 1. Water: Water is a clear, colorless and odorless liquid that is vital for all forms of life. It exists in lakes, rivers, oceans, and even underground. 2. Oil: Oil is a viscous liquid that is derived from petroleum. It is typically used as a fuel or lubricant in various industries, such as transportation and manufacturing.
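
Before fine-tuning, Alpaca wraps each record in a fixed prompt template. The sketch below uses the commonly cited wording from the Alpaca repo (quoted from memory, so treat the exact phrasing as approximate):

```python
# Prompt template in the style of the Alpaca repo (wording approximate).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def to_prompt(instruction: str) -> str:
    return ALPACA_TEMPLATE.format(instruction=instruction)

print(to_prompt("Give three tips for staying healthy."))
# The model is then fine-tuned to generate the paired output after "### Response:".
```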

[Figure: example prompt templates - ref(https://arxiv.org/pdf/2110.08207.pdf)]

Conclusion

It works, but the amount of data needed is still fairly large (~10k-100k examples). Also, deciding on the best format for instructions and outputs is another open-ended experiment.

References

Outerbounds Blog - Beautifully covers Instruction Tuning

Flan-T5 - the FLAN line of work introduces “instruction tuning”

Multi-task finetuning paper

OpenAI InstructGPT paper - RLHF with a supervised finetuning step, which is instruction tuning

Alpaca - Open source instruction tuned model

scaling law