[Paper Reading] Triton LLM

AlphaEvolve: A coding agent for scientific and algorithmic discovery

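AlphaEvolve uses an evolutionary approach: it continually receives feedback from one or more evaluators and iteratively improves algorithms, which may lead to new scientific and practical discoveries.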

Introduction

AlphaEvolve represents the candidates (for example, new mathematical objects or practical heuristics) as algorithms and uses a set of LLMs to generate, critique, and evolve a pool of such algorithms.
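To make the generate, evaluate, and evolve cycle concrete, here is a minimal sketch of an evolutionary loop over a pool of candidates. It is a toy under stated assumptions, not AlphaEvolve's implementation: candidates are plain lists of numbers rather than programs, `mutate` is a hypothetical stand-in for an LLM proposing a modified candidate, and `evaluate` is a hypothetical automated evaluator that returns a score where higher is better.

```python
import random
from typing import Callable, List, Tuple

Candidate = List[float]  # toy stand-in for a program or algorithm


def evaluate(candidate: Candidate) -> float:
    """Hypothetical evaluator: higher is better, best score 0.0 at all ones."""
    return -sum((x - 1.0) ** 2 for x in candidate)


def mutate(candidate: Candidate, rng: random.Random) -> Candidate:
    """Hypothetical stand-in for an LLM proposing a modified candidate."""
    return [x + rng.gauss(0.0, 0.1) for x in candidate]


def evolve(
    pool: List[Candidate],
    evaluate_fn: Callable[[Candidate], float],
    mutate_fn: Callable[[Candidate, random.Random], Candidate],
    generations: int,
    pool_size: int,
    rng: random.Random,
) -> Tuple[float, Candidate]:
    """Keep a scored pool; each generation adds one child and trims the pool."""
    scored = [(evaluate_fn(c), c) for c in pool]
    for _ in range(generations):
        # Tournament selection: pick the best of up to three random members.
        contenders = rng.sample(scored, k=min(3, len(scored)))
        parent = max(contenders, key=lambda pair: pair[0])[1]
        child = mutate_fn(parent, rng)
        # Evaluator feedback decides whether the child survives.
        scored.append((evaluate_fn(child), child))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        del scored[pool_size:]
    return scored[0]


if __name__ == "__main__":
    start = [[0.0, 0.0, 0.0] for _ in range(4)]
    score, best = evolve(start, evaluate, mutate, 200, 8, random.Random(0))
    print(score, best)
```

Because the pool is trimmed only from the low-scoring end, the best score never decreases from one generation to the next.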

AlphaEvolve

June 25, 2025

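[Paper Reading] Kimi-Researcher

This technical report presents Kimi-Researcher, an autonomous agent trained entirely through end-to-end agentic reinforcement learning and designed to solve complex problems through multi-step planning, reasoning, and tool use.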

—— End-to-end agentic RL is promising but challenging

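Traditional agents

Workflow-based: require frequent manual updates as the model or environment changes, and lack scalability and flexibility.
Imitation learning via supervised fine-tuning (SFT): data labeling is difficult, and the agent is tightly coupled to specific tool versions.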

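Kimi-Researcher: given a query, the agent explores a large number of possible strategies and is rewarded for reaching a correct solution. All skills (planning, perception, and tool use) are learned together, with no hand-crafted rules or workflows.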

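Modeling

Given a state observation (such as the system prompt, tool declarations, and the user query), Kimi-Researcher generates a think and an action; the action can be either a tool call or a signal to terminate the trajectory.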
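The think/action formulation can be sketched as a simple rollout loop. This is an illustrative toy, not Kimi-Researcher's code: the `Step` dataclass, `run_episode`, `toy_policy`, and the stub `search` tool are all hypothetical names, and the policy here is a hand-written function standing in for the trained model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    think: str                  # the model's reasoning for this step
    tool: Optional[str] = None  # None means "terminate the trajectory"
    argument: str = ""          # tool input, or the final answer on termination


Policy = Callable[[List[str]], Step]
Tool = Callable[[str], str]


def run_episode(
    policy: Policy, tools: Dict[str, Tool], query: str, max_steps: int = 8
) -> List[Step]:
    """Roll out one trajectory: observe, think, act, until termination."""
    observations = [query]
    trajectory: List[Step] = []
    for _ in range(max_steps):
        step = policy(observations)
        trajectory.append(step)
        if step.tool is None:  # the action terminates the trajectory
            break
        # Otherwise the action is a tool call; its result is the next observation.
        observations.append(tools[step.tool](step.argument))
    return trajectory


def toy_policy(observations: List[str]) -> Step:
    """Hypothetical policy: search once, then answer."""
    if len(observations) == 1:
        return Step(think="I need evidence first.", tool="search", argument=observations[0])
    return Step(think="I have enough evidence.", argument=f"Answer based on: {observations[-1]}")


toy_tools: Dict[str, Tool] = {"search": lambda q: f"(stub search result for '{q}')"}

if __name__ == "__main__":
    for step in run_episode(toy_policy, toy_tools, "Who proposed AlphaEvolve?"):
        print(step)
```

In an RL setting, a whole trajectory like this would be scored by whether its final answer is correct; that scoring step is omitted here.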

Approach

It relies mainly on three tools: a) a parallel, real-time, internal search tool; b) a text-based browser tool for interactive web tasks; c) a coding tool for automated code execution.

June 25, 2025

dify - Agent

Basic implementation

Example: WikiAgent

prompt

```xml
<instruction>
- The AI Agent should be knowledgeable about the TV show "The Office".
- If the question asked is not related to "The Office" or if the AI does not know the answer, it should search for the answer using the Google search tool.
- The output should not contain any XML tags.
</instruction>

<example>
- If asked "Who is the regional manager in 'The Office'?", the AI should provide the correct answer.
- If asked "What year did 'The Office' first premiere?", the AI should provide the correct answer or search for it if unknown.
</example>
```

Agent Workflow

Prompt Chaining

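Break the task into key steps, and use a gate to verify that the output of the preceding step meets the conditions for subsequent processing.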
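A minimal sketch of prompt chaining with gates, assuming each step is a plain function from text to text. In a real workflow each step would be an LLM call; `make_outline`, `write_draft`, and the gate functions below are hypothetical placeholders, and none of this is dify's API.

```python
from typing import Callable, List, Tuple

StepFn = Callable[[str], str]   # one stage of the chain (an LLM call in practice)
GateFn = Callable[[str], bool]  # checks a stage's output before moving on


class GateFailed(Exception):
    """Raised when a stage's output does not satisfy its gate."""


def run_chain(stages: List[Tuple[StepFn, GateFn]], task: str) -> str:
    """Run stages in order; stop as soon as a gate rejects an output."""
    output = task
    for index, (step, gate) in enumerate(stages):
        output = step(output)
        if not gate(output):
            raise GateFailed(f"stage {index} output rejected by gate")
    return output


def make_outline(task: str) -> str:
    """Hypothetical first step: produce a two-point outline."""
    return f"1. Background of {task}\n2. Details of {task}"


def outline_has_two_points(outline: str) -> bool:
    return len(outline.splitlines()) >= 2


def write_draft(outline: str) -> str:
    """Hypothetical second step: turn the outline into one paragraph."""
    points = [line.split(". ", 1)[1] for line in outline.splitlines()]
    return " ".join(f"{point}." for point in points)


def draft_is_nonempty(draft: str) -> bool:
    return bool(draft.strip())


if __name__ == "__main__":
    chain = [(make_outline, outline_has_two_points), (write_draft, draft_is_nonempty)]
    print(run_chain(chain, "prompt chaining"))
```

Raising on a failed gate is one design choice; a workflow could instead retry the step or route to a fallback branch.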
