AJCAI 2024 Tutorial: Towards Safe and Controlled Large Language Models

Date: Monday, 25 November 2024




Hung Le and Manh Nguyen, Deakin University





Time and Location

9:00 AM – 12:00 PM (UTC/GMT+8, AWST) on Monday, 25 November 2024
Location: Level 3, RMIT University Building 80, Melbourne, VIC, Australia.

Abstract


Large Language Models (LLMs) have significantly advanced natural language generation, yet they can produce dangerously flawed outputs, such as misinformation in medical advice or content that violates ethical standards. This tutorial explores these pressing issues and presents cutting-edge methods for creating safer, more controlled LLMs. Part I focuses on understanding hallucinations, instances where LLMs generate confidently incorrect or misleading information. It presents a taxonomy of hallucinations and covers techniques for detecting them, including uncertainty assessment methods, score-based detection, and advanced model-based approaches.
Part II studies strategies to mitigate hallucinations and enhance LLM reliability, such as reinforcement learning from human feedback (RLHF) to align models with human values and direct preference optimization (DPO) to fine-tune behaviour. It also examines prompt optimization methods to guide LLMs towards safer, more accurate responses without extensive retraining. Through practical examples and case studies, participants will learn essential tools and techniques to build LLMs that are not only powerful but also safe, reliable, and aligned with ethical standards.
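
To give a concrete flavour of the score-based detection techniques in Part I, here is a minimal, self-contained sketch of two common heuristic uncertainty scores, the length-normalised sequence log-probability and the mean predictive entropy, computed from a model's per-token output distributions. This is an illustrative sketch, not the tutorial's demo code; the function name, array shapes, and toy numbers are assumptions made for the example.

```python
# Illustrative sketch (not the tutorial's demo code): two heuristic
# uncertainty scores for a generated sequence, assuming we already have
# the per-token softmax distributions produced by the model.
import numpy as np

def sequence_uncertainty(token_probs: np.ndarray, chosen_ids: np.ndarray) -> dict:
    """token_probs: (seq_len, vocab_size) softmax outputs; chosen_ids: (seq_len,) generated token ids."""
    chosen_p = token_probs[np.arange(len(chosen_ids)), chosen_ids]
    # Length-normalised log-probability of the generated tokens:
    # low values suggest the model was unsure of what it wrote.
    avg_logprob = float(np.log(chosen_p + 1e-12).mean())
    # Mean predictive entropy over the vocabulary at each decoding step.
    entropy = float(-(token_probs * np.log(token_probs + 1e-12)).sum(axis=-1).mean())
    return {"avg_logprob": avg_logprob, "mean_entropy": entropy}

# Toy example: a 3-token generation over a 4-token vocabulary.
probs = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.40, 0.30, 0.20, 0.10],
                  [0.25, 0.25, 0.25, 0.25]])
ids = np.array([0, 1, 2])
print(sequence_uncertainty(probs, ids))
```

In practice, a low average log-probability or a high mean entropy flags generations that deserve closer scrutiny, which is the starting point for the score-based detectors covered in Part I.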

Target Audience


This tutorial is primarily aimed at students and academics working with large language models (LLMs). It is also open to research engineers and industry professionals who need to apply LLMs safely and in a controlled manner in their work. While a basic familiarity with LLMs is expected, additional knowledge of deep learning and neural networks will be helpful. No special equipment is needed; however, attendees are encouraged to bring laptops for hands-on experimentation with the models.

Expected Outcomes


Technical Understanding: Participants will develop a deeper understanding of the challenges of controlling and safely deploying LLMs, focusing on methods to reduce hallucinations and improve alignment.
Practical Skills: Attendees will gain hands-on experience in implementing various techniques for enhancing LLM reliability and safety, including practical demonstrations in prompt optimization, hallucination mitigation, and alignment training.

Tutorial Outline


Time | Topic | Duration
Part I: Detecting LLM Hallucinations and Uncertainty (70 minutes)
9:00 AM – 9:05 AM | Introduction and Overview | 5 minutes
∘ Introduction to the tutorial's objectives, structure, and its importance in the field
9:05 AM – 9:15 AM | Understanding LLM Hallucinations | 10 minutes
∘ Overview of different types and causes of hallucinations produced by LLMs
9:15 AM – 9:55 AM | Methods for Detecting Hallucinations | 40 minutes
∘ Score-Based Techniques: Heuristic and theoretical uncertainty scores
∘ Model-Based Approaches: LLM evaluators and conformal predictors
9:55 AM – 10:10 AM | Q&A and Demo | 15 minutes
10:10 AM – 10:40 AM | Break | 30 minutes
Part II: Fixing Hallucinations and Ensuring Reliable Outputs (80 minutes)
10:40 AM – 11:00 AM | Inference-Time Hallucination Mitigation | 20 minutes
∘ Decoding adjustment
∘ Representation steering
11:00 AM – 11:20 AM | Alignment Training | 20 minutes
∘ RLHF: Aligning LLM behaviour with human feedback
∘ Direct Preference Optimization: Optimizing outputs based on user preferences (see the sketch after this outline)
11:20 AM – 11:40 AM | Prompt Optimization Strategies | 20 minutes
∘ Heuristic Prompting
∘ Prompt Optimization
11:40 AM – 11:50 AM | Q&A | 10 minutes
11:50 AM – 12:00 PM | Conclusion and Wrap-Up | 10 minutes
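
To make the alignment-training topics above more concrete, the sketch below illustrates the Direct Preference Optimization (DPO) loss for a single preference pair, given sequence log-probabilities under the policy being trained and a frozen reference model. It is a minimal illustration under assumed inputs, not the tutorial's own code, and the function name and toy numbers are hypothetical.

```python
# Illustrative sketch (not the tutorial's code) of the per-example DPO loss.
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair, from sequence log-probs under the policy and a frozen reference model."""
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin: minimising it widens the margin.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy example: the policy already prefers the chosen response slightly more than the reference does.
print(dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
               ref_logp_chosen=-13.0, ref_logp_rejected=-14.5))
```

Driving this loss down increases the implicit reward gap between preferred and rejected responses, which is how DPO aligns the model with human preferences without training a separate reward model.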


About the Presenters


Dr. Hung Le is an Australian Research Council DECRA Fellow and a Research Lecturer at Deakin University, Australia. He is a senior member of the Applied Artificial Intelligence Institute (A2I2) where he currently supervises several PhD students in research areas focused on machine learning (ML) and reinforcement learning (RL). He specializes in deep learning and is dedicated to pioneering new agents equipped with artificial neural memory. His extensive work includes multi-modal, adaptive, and generative memory, efficient policy optimization, and memory-based reinforcement learning agents. With applications spanning health, dialogue agents, robotics, reinforcement learning, machine reasoning, and natural language processing, Dr. Le consistently publishes in premier ML/RL/AI conferences and journals, including ICLR, NeurIPS, ICML, AAAI, IJCAI, TMLR, KDD, NAACL, ECCV, and AAMAS. He earned his Bachelor of Engineering (Honors) from Hanoi University of Science and Technology and completed his PhD in Computer Science at Deakin University in 2015 and 2020, respectively.
Manh Nguyen is a first-year PhD student at A2I2, specializing in LLM Explainability. He earned his Bachelor of Electronics and Communications Engineering from Hanoi University of Science and Technology in 2017.

Tutorial Materials


Related Blogs

- Hallucination Detection
- Human-aligned LLMs

Demo code

- Tutorial code

Slides

- PDF
- SlideShare