AJCAI 2024 Tutorial: Towards Safe and Controlled Large Language Models

Date: Monday, 25 November 2024




Hung Le and Manh Nguyen, Deakin University





Time and Location

9:00 AM – 12:00 PM (UTC/GMT+8, AWST) on Monday, 25 November 2024
Location: Level 3, RMIT University Building 80, Melbourne, VIC, Australia.

Abstract


Large Language Models (LLMs) have significantly advanced natural language generation, yet they can produce dangerously flawed outputs, such as misinformation in medical advice or content that violates ethical standards. This tutorial explores these pressing issues and presents cutting-edge methods for creating safer, more controlled LLMs. Part I focuses on understanding hallucinations, instances where LLMs generate confidently incorrect or misleading information. It presents a taxonomy of hallucinations and covers techniques for detecting them, including uncertainty assessment methods, score-based detection, and advanced model-based approaches.
Part II studies strategies to mitigate hallucinations and enhance LLM reliability, such as reinforcement learning from human feedback (RLHF) to align models with human values and direct preference optimization (DPO) to fine-tune behaviour. It also examines prompt optimization methods to guide LLMs towards safer, more accurate responses without extensive retraining. Through practical examples and case studies, participants will learn essential tools and techniques to build LLMs that are not only powerful but also safe, reliable, and aligned with ethical standards.
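
To give a concrete flavour of the score-based detection techniques in Part I, here is a minimal, self-contained sketch of two common heuristic uncertainty scores, the length-normalised sequence log-probability and the mean predictive entropy, computed from a model's per-token output distributions. This is an illustrative sketch, not the tutorial's demo code; the function name, array shapes, and toy numbers are assumptions made for the example.

```python
# Illustrative sketch (not the tutorial's demo code): two heuristic
# uncertainty scores for a generated sequence, assuming we already have
# the per-token softmax distributions produced by the model.
import numpy as np

def sequence_uncertainty(token_probs: np.ndarray, chosen_ids: np.ndarray) -> dict:
    """token_probs: (seq_len, vocab_size) softmax outputs; chosen_ids: (seq_len,) generated token ids."""
    chosen_p = token_probs[np.arange(len(chosen_ids)), chosen_ids]
    # Length-normalised log-probability of the generated tokens:
    # low values suggest the model was unsure of what it wrote.
    avg_logprob = float(np.log(chosen_p + 1e-12).mean())
    # Mean predictive entropy over the vocabulary at each decoding step.
    entropy = float(-(token_probs * np.log(token_probs + 1e-12)).sum(axis=-1).mean())
    return {"avg_logprob": avg_logprob, "mean_entropy": entropy}

# Toy example: a 3-token generation over a 4-token vocabulary.
probs = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.40, 0.30, 0.20, 0.10],
                  [0.25, 0.25, 0.25, 0.25]])
ids = np.array([0, 1, 2])
print(sequence_uncertainty(probs, ids))
```

In practice, a low average log-probability or a high mean entropy flags generations that deserve closer scrutiny, which is the starting point for the score-based detectors covered in Part I.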

Target Audience


This tutorial is primarily aimed at students and academics working with large language models (LLMs). It is also open to research engineers and industry professionals who need to apply LLMs safely and in a controlled manner in their work. While a basic familiarity with LLMs is expected, additional knowledge of deep learning and neural networks will be helpful. No special equipment is needed; however, attendees are encouraged to bring laptops for hands-on experimentation with the models.

Expected Outcomes


Technical Understanding: Participants will develop a deeper understanding of the challenges of controlling and safely deploying LLMs, focusing on methods to reduce hallucinations and improve alignment.
Practical Skills: Attendees will gain hands-on experience in implementing various techniques for enhancing LLM reliability and safety, including practical demonstrations in prompt optimization, hallucination mitigation, and alignment training.

Tutorial Outline


Time | Topic | Duration
Part I: Detecting LLM Hallucinations and Uncertainty (70 minutes)
9:00 AM – 9:05 AM | Introduction and Overview | 5 minutes
∘ Introduction to the tutorial's objectives, structure, and its importance in the field
9:05 AM – 9:15 AM | Understanding LLM Hallucinations | 10 minutes
∘ Overview of different types and causes of hallucinations produced by LLMs
9:15 AM – 9:55 AM | Methods for Detecting Hallucinations | 40 minutes
∘ Score-Based Techniques: Heuristic and theoretical uncertainty scores
∘ Model-Based Approaches: LLM evaluators and conformal predictors
9:55 AM – 10:10 AM | Q&A and Demo | 15 minutes
10:10 AM – 10:40 AM | Break | 30 minutes
Part II: Fixing Hallucinations and Ensuring Reliable Outputs (80 minutes)
10:40 AM – 11:00 AM | Inference-Time Hallucination Mitigation | 20 minutes
∘ Decoding adjustment
∘ Representation steering
11:00 AM – 11:20 AM | Alignment Training | 20 minutes
∘ RLHF: Aligning LLM behaviour with human feedback
∘ Direct Preference Optimization: Optimizing outputs based on user preferences (see the sketch after this outline)
11:20 AM – 11:40 AM | Prompt Optimization Strategies | 20 minutes
∘ Heuristic Prompting
∘ Prompt Optimization
11:40 AM – 11:50 AM | Q&A | 10 minutes
11:50 AM – 12:00 PM | Conclusion and Wrap-Up | 10 minutes
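
To make the alignment-training topics above more concrete, the sketch below illustrates the Direct Preference Optimization (DPO) loss for a single preference pair, given sequence log-probabilities under the policy being trained and a frozen reference model. It is a minimal illustration under assumed inputs, not the tutorial's own code, and the function name and toy numbers are hypothetical.

```python
# Illustrative sketch (not the tutorial's code) of the per-example DPO loss.
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair, from sequence log-probs under the policy and a frozen reference model."""
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin: minimising it widens the margin.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy example: the policy already prefers the chosen response slightly more than the reference does.
print(dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
               ref_logp_chosen=-13.0, ref_logp_rejected=-14.5))
```

Driving this loss down increases the implicit reward gap between preferred and rejected responses, which is how DPO aligns the model with human preferences without training a separate reward model.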


About the Presenters


Dr. Hung Le is an Australian Research Council DECRA Fellow and a Research Lecturer at Deakin University, Australia. He is a senior member of the Applied Artificial Intelligence Institute (A2I2) where he currently supervises several PhD students in research areas focused on machine learning (ML) and reinforcement learning (RL). He specializes in deep learning and is dedicated to pioneering new agents equipped with artificial neural memory. His extensive work includes multi-modal, adaptive, and generative memory, efficient policy optimization, and memory-based reinforcement learning agents. With applications spanning health, dialogue agents, robotics, reinforcement learning, machine reasoning, and natural language processing, Dr. Le consistently publishes in premier ML/RL/AI conferences and journals, including ICLR, NeurIPS, ICML, AAAI, IJCAI, TMLR, KDD, NAACL, ECCV, and AAMAS. He earned his Bachelor of Engineering (Honors) from Hanoi University of Science and Technology and completed his PhD in Computer Science at Deakin University in 2015 and 2020, respectively.
Manh Nguyen is a first-year PhD student at A2I2, specializing in LLM Explainability. He earned his Bachelor of Electronics and Communications Engineering from Hanoi University of Science and Technology in 2017.

Tutorial Materials


Related Blogs

- Hallucination Detection
- Human-aligned LLMs

Demo code

- Tutorial code

Slides

- PDF
- SlideShare