Towards Safe and Controlled Large Language Models
Date: 9:00am – 12:00pm (UTC/GMT+8, AWST) on Monday, 25 November 2024
Location: Level 3, RMIT University Building 80, Melbourne, VIC, Australia.
Large Language Models (LLMs) have significantly advanced natural language generation, yet they can produce dangerously flawed outputs, such as misinformation in medical advice or content that violates ethical standards. This tutorial explores these pressing issues and presents cutting-edge methods for creating safer, more controlled LLMs. Part I focuses on understanding hallucinations—instances where LLMs generate confidently incorrect or misleading information. It presents a taxonomy of hallucinations, and covers techniques for detecting them, including uncertainty assessment methods, score-based detection, and advanced model-based approaches.
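As a flavour of the score-based detection techniques covered in Part I, the snippet below is a minimal sketch of one simple uncertainty score: the mean entropy of the model's token distributions over a generated answer, with high-entropy answers flagged as possible hallucinations. The model name (`gpt2`) and the 2.5-nat threshold are illustrative assumptions, not values prescribed by the tutorial.

```python
# Minimal sketch of score-based hallucination detection: flag a generation as
# suspect when the model's own token-level uncertainty (mean entropy) is high.
# "gpt2" and the 2.5-nat threshold are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM would work here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def mean_token_entropy(prompt: str, max_new_tokens: int = 30) -> float:
    """Generate a continuation and return the mean entropy (in nats) of its tokens."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            return_dict_in_generate=True,
            output_scores=True,
        )
    entropies = []
    for step_scores in out.scores:  # one score tensor per generated token
        probs = torch.softmax(step_scores[0], dim=-1)
        entropies.append(-(probs * torch.log(probs + 1e-12)).sum().item())
    return sum(entropies) / len(entropies)

score = mean_token_entropy("The capital of Australia is")
flag = "possible hallucination" if score > 2.5 else "likely confident"
print(f"mean entropy: {score:.2f} -> {flag}")
```

In practice, such heuristic scores are usually calibrated on held-out data rather than thresholded by hand, which is where the theoretical scores and conformal predictors in Part I come in.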
Part II studies various strategies to mitigate hallucinations and enhance LLM reliability, such as reinforcement learning with human feedback to align models with human values, and direct preference optimization to fine-tune behaviour. It also examines prompt optimization methods to guide LLMs towards safer, more accurate responses without extensive retraining. Through practical examples and case studies, participants will learn essential tools and techniques to build LLMs that are not only powerful but also safe, reliable, and aligned with ethical standards.
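To make the Direct Preference Optimization idea concrete, here is a minimal sketch of the DPO loss computed from per-sequence log-probabilities under the trained policy and a frozen reference model. The toy inputs and the `beta = 0.1` value are illustrative assumptions, not material from the tutorial.

```python
# Minimal sketch of the Direct Preference Optimization (DPO) loss, assuming the
# per-sequence log-probabilities of the chosen and rejected responses have already
# been computed under the policy and a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """Return the mean DPO loss over a batch of preference pairs."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the policy to prefer the chosen response over the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities for a batch of two preference pairs.
loss = dpo_loss(
    policy_chosen_logps=torch.tensor([-12.0, -9.5]),
    policy_rejected_logps=torch.tensor([-11.0, -10.0]),
    ref_chosen_logps=torch.tensor([-12.5, -9.8]),
    ref_rejected_logps=torch.tensor([-10.5, -9.9]),
)
print(loss.item())
```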
This tutorial is primarily aimed at students and academics working with large language models (LLMs). It is also open to research engineers and industry professionals who need to apply LLMs safely and in a controlled manner in their work. While a basic familiarity with LLMs is expected, additional knowledge of deep learning and neural networks will be helpful. No special equipment is needed; however, attendees are encouraged to bring laptops for hands-on experimentation with the models.
- Technical Understanding: Participants will develop a deeper understanding of the challenges of controlling and safely deploying LLMs, focusing on methods to reduce hallucinations and improve alignment.
- Practical Skills: Attendees will gain hands-on experience implementing techniques for enhancing LLM reliability and safety, including practical demonstrations of prompt optimization, hallucination mitigation, and alignment training.
| Time | Topic | Duration |
|---|---|---|
| **Part I: Detecting LLM Hallucinations and Uncertainty (70 minutes)** | | |
| 9:00 AM – 9:05 AM | Introduction and Overview<br>∘ Introduction to the tutorial's objectives, structure, and its importance in the field | 5 minutes |
| 9:05 AM – 9:15 AM | Understanding LLM Hallucinations<br>∘ Overview of different types and causes of hallucinations produced by LLMs | 10 minutes |
| 9:15 AM – 9:55 AM | Methods for Detecting Hallucinations<br>∘ Score-Based Techniques: Heuristic and theoretical uncertainty scores<br>∘ Model-Based Approaches: LLM evaluators and conformal predictors | 40 minutes |
| 9:55 AM – 10:10 AM | Q&A and Demo | 15 minutes |
| 10:10 AM – 10:40 AM | Break | 30 minutes |
| **Part II: Fixing Hallucinations and Ensuring Reliable Outputs (80 minutes)** | | |
| 10:40 AM – 11:00 AM | Inference-Time Hallucination Mitigation<br>∘ Decoding adjustment (see the sketch after the schedule)<br>∘ Representation steering | 20 minutes |
| 11:00 AM – 11:20 AM | Alignment Training<br>∘ RLHF: Aligning LLM behaviour with human feedback<br>∘ Direct Preference Optimization: Optimizing outputs based on user preferences | 20 minutes |
| 11:20 AM – 11:40 AM | Prompt Optimization Strategies<br>∘ Heuristic Prompting<br>∘ Prompt Optimization | 20 minutes |
| 11:40 AM – 11:50 AM | Q&A | 10 minutes |
| 11:50 AM – 12:00 PM | Conclusion and Wrap-Up | 10 minutes |
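As a taste of the decoding-adjustment item in the schedule above, the sketch below contrasts a loose sampling configuration with a more conservative one using Hugging Face `generate`. The model name and the specific temperature/top-p values are illustrative assumptions, not settings from the tutorial.

```python
# Minimal sketch of inference-time decoding adjustment: a conservative setup
# (lower temperature, tighter nucleus) keeps the model closer to its
# highest-confidence continuations than a loose one. Values are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM would work here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "List three facts about the Great Barrier Reef:"
inputs = tokenizer(prompt, return_tensors="pt")

# Loose decoding: high temperature and wide nucleus encourage diverse but riskier text.
loose = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=1.2, top_p=0.95)
# Conservative decoding: lower temperature and tighter nucleus reduce hallucination-prone sampling.
safe = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.7, top_p=0.8)

print("loose:", tokenizer.decode(loose[0], skip_special_tokens=True))
print("safe: ", tokenizer.decode(safe[0], skip_special_tokens=True))
```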
Dr. Hung Le is an Australian Research Council DECRA Fellow and a Research Lecturer at Deakin University, Australia. He is a senior member of the Applied Artificial Intelligence Institute (A2I2), where he currently supervises several PhD students in research areas focused on machine learning (ML) and reinforcement learning (RL). He specializes in deep learning and is dedicated to pioneering new agents equipped with artificial neural memory. His extensive work includes multi-modal, adaptive, and generative memory, efficient policy optimization, and memory-based reinforcement learning agents. With applications spanning health, dialogue agents, robotics, reinforcement learning, machine reasoning, and natural language processing, Dr. Le consistently publishes in premier ML/RL/AI conferences and journals, including ICLR, NeurIPS, ICML, AAAI, IJCAI, TMLR, KDD, NAACL, ECCV, and AAMAS. He earned his Bachelor of Engineering (Honors) from Hanoi University of Science and Technology and completed his PhD in Computer Science at Deakin University in 2015 and 2020, respectively.
Manh Nguyen is a first-year PhD student at A2I2, specializing in LLM Explainability. He earned his Bachelor of Electronics and Communications Engineering from Hanoi University of Science and Technology in 2017.
Related Blogs
- Hallucination Detection
- Human-aligned LLMs
Demo code
- Tutorial code

Slides
- PDF
- SlideShare