Multi-Agent Collaborative Language Model Alignment

Published:

  • Funding: $10,000 USD in API Credit.
  • Investigator: Hung Le
  • Role: Sole Principal Investigator.
  • Abstract: This project focuses on aligning large language models (LLMs) with human intentions through multi-agent collaboration, both during the training process and in executing downstream tasks such as decision-making and time-series forecasting. Traditional preference learning methods typically rely on a single LLM reference model, which limits adaptability and the diversity of perspectives. We propose a novel approach in which multiple LLMs interact to optimize a primary LLM’s alignment with human preferences using Reinforcement Learning (RL) and Direct Preference Optimization (DPO). The resulting framework is expected to leverage the LLMs’ distinct architectures and pretraining for greater robustness and better alignment with human objectives. Cohere LLM models will play a critical role in this process, acting as reference agents for preference learning tasks, generating preference data, and assessing the output of trained models to ensure alignment with human values and task objectives throughout the collaboration.
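
To illustrate one way the multi-reference idea could plug into DPO, the sketch below replaces the usual single reference model with an aggregate over several reference LLMs. This is only a minimal illustration under assumptions: the aggregation (a simple mean of reference log-probabilities) and the function name `multi_ref_dpo_loss` are hypothetical and not the project's confirmed formulation.

```python
# Minimal sketch of a DPO-style loss with several reference models.
# ASSUMPTION: reference log-probs are aggregated by a simple mean; the actual
# multi-agent scheme in this project may differ.

import torch
import torch.nn.functional as F


def multi_ref_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log pi_theta(y_w | x), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log pi_theta(y_l | x), shape (batch,)
    ref_chosen_logps: torch.Tensor,       # log pi_ref_k(y_w | x), shape (num_refs, batch)
    ref_rejected_logps: torch.Tensor,     # log pi_ref_k(y_l | x), shape (num_refs, batch)
    beta: float = 0.1,
) -> torch.Tensor:
    """DPO loss where the single reference model is replaced by an
    average over multiple reference LLMs (one possible multi-agent variant)."""
    # Aggregate reference log-probabilities across the reference models.
    agg_ref_chosen = ref_chosen_logps.mean(dim=0)
    agg_ref_rejected = ref_rejected_logps.mean(dim=0)

    # Standard DPO implicit-reward margins, measured against the aggregated reference.
    chosen_rewards = beta * (policy_chosen_logps - agg_ref_chosen)
    rejected_rewards = beta * (policy_rejected_logps - agg_ref_rejected)

    # Logistic loss on the margin between preferred and dispreferred responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    num_refs, batch = 3, 4  # e.g. three reference LLMs scoring four preference pairs
    loss = multi_ref_dpo_loss(
        policy_chosen_logps=torch.randn(batch),
        policy_rejected_logps=torch.randn(batch),
        ref_chosen_logps=torch.randn(num_refs, batch),
        ref_rejected_logps=torch.randn(num_refs, batch),
    )
    print(f"multi-reference DPO loss: {loss.item():.4f}")
```

Other aggregation choices (e.g., weighting references by agreement with human labels) would slot into the same place as the mean; the sketch only fixes the interface, not the policy design.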