Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

Basic Information

Abstract

Modern large language models (LLMs) like ChatGPT have shown remarkable performance on general language tasks but still struggle on complex reasoning tasks, which drives the research on cognitive behaviors of LLMs to explore human-like problem-solving strategies. Along this direction, one representative strategy is self-reflection, which asks an LLM to refine the solution with the feedback generated by itself iteratively. However, our study shows that such reflection-style methods suffer from the Degeneration-of-Thought (DoT) problem: once the LLM has established confidence in its solutions, it is unable to generate novel thoughts later through reflection even if its initial stance is incorrect. To address the DoT problem, we propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of “tit for tat” and a judge manages the debate process to obtain a final solution. Clearly, our MAD framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation. Experiment results on two challenging datasets, commonsense machine translation and counter-intuitive arithmetic reasoning, demonstrate the effectiveness of our MAD framework. Extensive analyses suggest that the adaptive break of debate and the modest level of “tit for tat” state are required for MAD to obtain good performance. Moreover, we find that LLMs might not be a fair judge if different LLMs are used for agents. Code is available at https://github.com/Skytliang/Multi-Agents-Debate.

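Key Contributions

  1. Identifies the Degeneration-of-Thought (DoT) problem: existing self-reflection methods have a serious flaw. Once an LLM has established confidence in its solution, later reflection cannot produce novel thoughts even if the initial stance is incorrect, so its reasoning becomes rigid.
  2. Proposes the Multi-Agent Debate (MAD) framework: multiple agents express their arguments in a “tit for tat” manner while a judge manages the debate process, which encourages divergent thinking in LLMs and breaks out of the DoT problem.
  3. Finds a fairness problem with LLM judges: when different LLMs are used as the debating agents, an LLM judge may not judge fairly and can show a preference.
  4. Identifies the key control factors of the debate: an adaptive break of the debate and a modest level of “tit for tat” are required for MAD to perform well.
  5. Open-source code: the code is available on GitHub (Skytliang/Multi-Agents-Debate) for reproduction and extension.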

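Method Overview

The core of the MAD (Multi-Agent Debate) framework is as follows:

  1. Multi-agent debate mechanism: several LLM agents are initialized, and each independently generates an answer to the question. The agents then debate over multiple rounds, each revising or defending its argument in light of the others' views, which produces the “tit for tat” dynamic.
  2. Judge management: a judge agent (or a judge role played by the same LLM) monitors the debate, assesses the quality of each side's arguments, and controls how the debate proceeds and when it ends.
  3. Adaptive termination: the debate does not run for a fixed number of rounds. When to stop is decided dynamically from how far the arguments have converged and agree, which avoids both over-long debate and premature stopping.
  4. Encouraging divergent thinking: unlike self-reflection, MAD stimulates new thoughts through the clash of different external perspectives rather than relying on the LLM's own iterative refinement, and so avoids the DoT problem.
  5. Final solution generation: after the debate ends, the judge synthesizes the arguments from all sides and produces the final solution.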
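
The steps above can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `run_debate`, `agreement_judge`, the agent names `affirmative` and `negative`, and the toy question are all hypothetical, and the agents are deterministic stubs standing in for LLM calls. The judge here stops the debate as soon as the agents' latest arguments agree, which is one simple way to model an adaptive break.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One debate turn: (agent name, argument text).
Turn = Tuple[str, str]
# Hypothetical agent interface: (question, history) -> argument.
Agent = Callable[[str, List[Turn]], str]
# Hypothetical judge interface: (question, history, force) -> answer, or None to continue.
Judge = Callable[[str, List[Turn], bool], Optional[str]]


def run_debate(question: str, agents: Dict[str, Agent], judge: Judge,
               max_rounds: int = 3) -> Tuple[str, int]:
    """Run rounds of debate until the judge returns an answer."""
    history: List[Turn] = []
    for round_idx in range(max_rounds):
        # Each agent speaks in turn and sees everything said so far.
        for name, agent in agents.items():
            history.append((name, agent(question, history)))
        # On the last allowed round the judge is forced to decide.
        force = round_idx == max_rounds - 1
        verdict = judge(question, history, force)
        if verdict is not None:
            return verdict, round_idx + 1
    raise RuntimeError("judge did not return an answer when forced")


def agreement_judge(question: str, history: List[Turn], force: bool) -> Optional[str]:
    """Toy judge: stop early once every agent's latest argument is the same."""
    latest = {name: argument for name, argument in history}
    answers = set(latest.values())
    if len(answers) == 1:
        return answers.pop()
    # Assumption for this sketch: when forced, fall back to the last argument made.
    return history[-1][1] if force else None


def affirmative(question: str, history: List[Turn]) -> str:
    # Stub: starts with a wrong answer, concedes once the other side has spoken.
    challenged = any(name == "negative" for name, _ in history)
    return "8" if challenged else "12"


def negative(question: str, history: List[Turn]) -> str:
    # Stub: always argues for the correct answer.
    return "8"


answer, rounds = run_debate(
    "What is 2 + 2 * 3?",
    {"affirmative": affirmative, "negative": negative},
    agreement_judge,
)
print(answer, rounds)  # prints "8 2"
```

In this toy run the first agent's initial answer is wrong, the second agent's disagreement is visible in the shared history, and the judge ends the debate in round 2 once both sides agree. With real LLM agents the judge would need to assess argument quality rather than exact string equality.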

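Experimental Results

MAD was evaluated on two challenging datasets: commonsense machine translation and counter-intuitive arithmetic reasoning.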

Related Concepts

Analysis Information


Import time: 2026-05-01 23:30 Import method: url