Below you will find pages that utilize the taxonomy term “LLM”

June 25, 2025

dify - Agent

Basic implementation

Example: WikiAgent

prompt

<instruction>
- The AI Agent should be knowledgeable about the TV show "The Office".
- If the question asked is not related to "The Office" or if the AI does not know the answer, it should search for the answer using the Google search tool.
- The output should not contain any XML tags.

<example>
- If asked "Who is the regional manager in 'The Office'?", the AI should provide the correct answer.
- If asked "What year did 'The Office' first premiere?", the AI should provide the correct answer or search for it if unknown.

Agent Workflow

Prompt Chaining

Break the task into key steps, and use a gate to verify that the output of the previous step meets the conditions for the next step.
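A minimal sketch of a two-step chain with a gate in between. The names `run_chain` and `fake_llm` are hypothetical, and `fake_llm` is only a stand-in for a real model call so that the example runs offline. The gate here is a plain programmatic check; a real workflow might use a stricter validator.

```python
from typing import Callable


def run_chain(topic: str, llm: Callable[[str], str]) -> str:
    """Two-step prompt chain with a gate between the steps."""
    # Step 1: ask for an outline.
    outline = llm(f"Write a bullet-point outline about: {topic}")

    # Gate: check the intermediate output before spending another call on it.
    bullets = [line for line in outline.splitlines() if line.strip().startswith("-")]
    if len(bullets) < 2:
        raise ValueError("gate failed: outline has fewer than 2 bullet points")

    # Step 2: runs only when the gate passes.
    return llm("Expand this outline into a short paragraph:\n" + outline)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call (not a real API)."""
    if prompt.startswith("Write a bullet-point outline"):
        return "- history\n- cast\n- reception"
    return "A short paragraph covering history, cast, and reception."


print(run_chain("The Office", fake_llm))
```

Failing fast at the gate keeps a bad intermediate result from propagating into later steps.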

June 24, 2025

langchain - Hybrid search

First use BM25 to quickly filter candidates by keyword, then use a reranker to rank the candidate documents more precisely.

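import numpy as np

# Assumes these are defined elsewhere: `texts` (list of document strings),
# `bm25` (a BM25 index built over `texts`), `bm25_tokenizer` (the tokenizer
# used to build that index), and `co` (the rerank API client).
def keyword_and_reranking_search(query, top_k=3, num_candidates=10):
    print("Input question:", query)

    ##### BM25 search (lexical search) #####
    bm25_scores = bm25.get_scores(bm25_tokenizer(query))
    # argpartition raises if asked for more candidates than there are documents
    num_candidates = min(num_candidates, len(bm25_scores))
    # Select the num_candidates highest-scoring documents (not yet sorted)
    top_n = np.argpartition(bm25_scores, -num_candidates)[-num_candidates:]
    bm25_hits = [{'corpus_id': idx, 'score': bm25_scores[idx]} for idx in top_n]
    bm25_hits = sorted(bm25_hits, key=lambda x: x['score'], reverse=True)

    print(f"Top-{top_k} lexical search (BM25) hits")
    for hit in bm25_hits[0:top_k]:
        print("\t{:.3f}\t{}".format(hit['score'], texts[hit['corpus_id']].replace("\n", " ")))

    ##### Re-ranking #####
    # Send all BM25 candidates to the reranker and keep its top_k results
    docs = [texts[hit['corpus_id']] for hit in bm25_hits]

    print(f"\nTop-{top_k} hits by rank-API ({len(bm25_hits)} BM25 hits re-ranked)")
    results = co.rerank(query=query, documents=docs, top_n=top_k, return_documents=True)
    for hit in results.results:
        print("\t{:.3f}\t{}".format(hit.relevance_score, hit.document.text.replace("\n", " ")))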

BM25
Scores how well each document's keywords match the query, based on term frequency and inverse document frequency.
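To make the scoring concrete, here is a small pure-Python sketch of Okapi BM25. It is an illustration, not the implementation behind the `bm25` object used above. The function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant `log(1 + ...)` are assumptions; libraries differ in these details.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Return one BM25 score per document for a tokenized query."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in corpus_tokens:
        df.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates via k1; b normalizes by document length.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = [
    "michael scott is the regional manager".split(),
    "dwight sells paper".split(),
    "jim pranks dwight".split(),
]
scores = bm25_scores("regional manager".split(), corpus)
print(scores)
```

Only the first document contains the query terms, so it is the only one with a non-zero score.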

June 20, 2025

langchain - Agent

An Agent can be viewed as a state machine
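One way to picture this, as a sketch only: the agent alternates between a THINK state (decide what to do) and an ACT state (run a tool) until it reaches FINISH. The state names, `run_agent`, `decide`, and the `tools` dictionary are all hypothetical and do not correspond to any langchain API.

```python
from enum import Enum, auto


class State(Enum):
    THINK = auto()
    ACT = auto()
    FINISH = auto()


def run_agent(question, decide, tools, max_steps=5):
    """decide(question, observations) returns ("call", tool_name, arg) or ("answer", text)."""
    state = State.THINK
    observations = []
    pending = None
    answer = None
    steps = 0
    while state is not State.FINISH:
        if state is State.THINK:
            if steps >= max_steps:
                # Guard against an agent that never decides to answer.
                answer = "stopped: step limit reached"
                state = State.FINISH
                continue
            steps += 1
            pending = decide(question, observations)
            if pending[0] == "answer":
                answer = pending[1]
                state = State.FINISH
            else:
                state = State.ACT
        elif state is State.ACT:
            _, tool_name, arg = pending
            observations.append(tools[tool_name](arg))
            state = State.THINK
    return answer


def decide(question, observations):
    # Toy policy: search once, then answer with the last observation.
    if not observations:
        return ("call", "search", question)
    return ("answer", observations[-1])


tools = {"search": lambda q: "Michael Scott"}
print(run_agent("Who is the regional manager?", decide, tools))
```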

agent_react_docstore

June 12, 2025

Prompting Guide

Prompts can be found on langchainhub

1. Agentic Workflows

System Prompt Reminders

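Include three key types of reminders in the prompt:

Persistence

Ensures the model understands that it is entering a multi-message turn, and prevents it from prematurely yielding control back to the user. Example: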

You are an agent - please keep going until the user’s query is completely resolved, before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved.

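Tool calling

Encourages the model to make full use of its tools and reduces the likelihood that it hallucinates or guesses an answer. Example: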

If you are not sure about file content or codebase structure pertaining to the user’s request, use your tools to read files and gather the relevant information: do NOT guess or make up an answer.

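Planning [optional]
Ensures the model explicitly plans and reflects on each tool call in text, rather than completing the task by chaining together a series of bare tool calls. Example: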

May 11, 2025

LLM - 4. Model augmentation - RAG

0-1

May 2, 2025

LLM - 3. Instruction understanding stage (core) - Reinforcement learning

May 1, 2025

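LLM - 3. Instruction understanding stage (core) - Instruction tuning

Instruction tuning, also called supervised fine-tuning, aims to give the model instruction-following ability.

Core questions: How should instruction data be constructed? How can instruction tuning be trained efficiently and at low cost? How can the context length be extended further on top of the language model?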

April 27, 2025

LLM - 2. Pre-training stage

April 20, 2025

LLM - 1. Basic theory

April 16, 2025

BERT

Origin of the name: Bert, a character on the American children's television show Sesame Street

Paper: https://arxiv.org/abs/1810.04805

Transfer learning in NLP

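Before BERT: use a pre-trained model to extract word and sentence features
- e.g. word2vec or a language model (used as the embedding layer)
- The pre-trained model is not updated
- Drawbacks
  - A new network has to be built to capture the information the new task needs
  - word2vec ignores word-order information, and language models only look in one direction
BERT's motivation
- A fine-tuning-based NLP model
  The earlier layers are reused as they are; only the final output layer needs to change
- The pre-trained model has already extracted enough information, so a new task only needs a simple additional output layer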

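BERT architecture

In essence: a Transformer with the decoder removed, leaving only the encoder

BERT's contribution: showing that this approach works very well

Two versions:
Base: #blocks=12, hidden size=768, #heads=12, #parameters=110M
Large: #blocks=24, hidden size=1024, #heads=16, #parameters=340M
Trained on large-scale data (>3B words)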
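As a rough check on the parameter counts listed above, the sketch below tallies the weights of an encoder-only model from its block count and hidden size. Several values are assumptions not stated in these notes: a vocabulary of 30,522 tokens, 512 positions, 2 segment types, a feed-forward size of 4x the hidden size, and a final pooler layer. The function name `bert_param_count` is hypothetical.

```python
def bert_param_count(num_blocks, hidden, vocab_size=30522, max_positions=512, type_vocab=2):
    """Estimate the parameter count of a BERT-style encoder (assumed layout, see lead-in)."""
    ffn = 4 * hidden
    # Token, position and segment embeddings, plus one LayerNorm (scale + bias).
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + 2 * hidden
    # Q, K, V and output projections (weights + biases), plus one LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden
    # Two linear layers of the feed-forward network, plus one LayerNorm.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + 2 * hidden
    # Pooler: one dense layer applied to the first token's representation.
    pooler = hidden * hidden + hidden
    return embeddings + num_blocks * (attention + feed_forward) + pooler


print(bert_param_count(12, 768))    # 109482240, about 110M
print(bert_param_count(24, 1024))   # 335141888, about 340M
```

The number of heads does not appear in the formula: the heads split the hidden dimension among themselves, so changing the head count leaves the total unchanged.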

March 19, 2025

Fine-tuning

Large-model pre-training

1 Pre-training from scratch

2 Further training of an existing open-source model for a specific task

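LoRA

Achieves efficient fine-tuning by replacing the full weight update with the product of two small low-rank matrices

Multiplying loraA by loraB gives a LoRA weight matrix; adding it to the original weight matrix yields the updated weights of the original network.

Far fewer parameters are trained, while fine-tuning quality stays largely unchanged.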
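The update described above can be sketched with NumPy. The shapes, the rank `r = 4`, and the variable names are illustrative assumptions. Initializing `lora_B` to zeros (so the update starts at zero) follows the convention in the LoRA paper, and no scaling factor is applied here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4                   # r is the (assumed) low rank

W = rng.normal(size=(d_out, d_in))           # original weights, kept frozen
lora_A = rng.normal(size=(r, d_in)) * 0.01   # trainable, small random init
lora_B = np.zeros((d_out, r))                # trainable, zero init

delta_W = lora_B @ lora_A                    # same shape as W, rank at most r
W_adapted = W + delta_W                      # weights used after fine-tuning

full_params = W.size                         # 64 * 64 = 4096
lora_params = lora_A.size + lora_B.size      # 4 * 64 + 64 * 4 = 512
print(full_params, lora_params)
```

Only `lora_A` and `lora_B` are trained, so in this toy setup the trainable parameter count drops from 4096 to 512.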

Two important parameters:

February 9, 2025

MLLM

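1 Basics

1. Feature extraction

I. Feature extraction in CV

1. Traditional methods (hand-crafted features)

(1) Low-level visual features: color, texture, edges and shapes…

(2) Mid-level semantic features: SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), LBP (Local Binary Patterns)…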

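2. Deep learning methods (automatically learned features)

(1) Convolutional neural networks (CNN)

Core idea: convolutional layers extract local features, pooling layers reduce dimensionality, and fully connected layers perform classification.
Classic models: LeNet-5, AlexNet, VGGNet, ResNet (residual connections make it possible to train deeper networks)…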
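The convolution and pooling steps can be illustrated with a small NumPy sketch. The helper names `conv2d` and `max_pool2x2`, the 6x6 input, and the hand-picked kernel are assumptions for illustration; a real CNN learns its kernels and uses a deep learning framework.

```python
import numpy as np


def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as used in CNN layers."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value only sees a local kh x kw window.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2x2(x):
    """2x2 max pooling with stride 2; drops an odd trailing row or column."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


image = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)   # responds to horizontal intensity change

features = conv2d(image, edge_kernel)   # shape (4, 4)
pooled = max_pool2x2(features)          # shape (2, 2)
print(features.shape, pooled.shape)
```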

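(2) Vision Transformer (ViT)

Core idea: split the image into small patches and model global relationships with self-attention.
Advantages: needs no local convolutional prior and models long-range dependencies directly; with large-scale pre-training it surpasses traditional CNNs on tasks such as ImageNet.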
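The patch-splitting step can be sketched in NumPy. The function name `patchify`, the 224x224x3 input, and the patch size of 16 are illustrative assumptions; the sketch covers only the split into flattened patches, not the linear projection or the attention layers that follow.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    p = patch_size
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by patch_size")
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (num_patches, p*p*C)
    grid = image.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(-1, p * p * c)


image = np.random.default_rng(0).random((224, 224, 3))
patches = patchify(image, 16)
print(patches.shape)  # (196, 768)
```

Each row of `patches` is then treated like a token in a sequence, which is what lets self-attention relate any patch to any other.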

February 8, 2025

transformer

I. Transformer architecture

Uses an encoder-decoder architecture to process sequence pairs
Unlike seq2seq with attention, the Transformer is based purely on attention
- seq2seq
- transformer
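The attention operation at the core of the Transformer can be sketched as scaled dot-product attention in NumPy. This is a single-head illustration without masking or learned projections; the function name and the toy shapes are assumptions.

```python
import numpy as np


def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) for queries Q, keys K and values V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # query-key similarity
    scores = scores - scores.max(axis=-1, keepdims=True)   # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over the keys
    return weights @ V, weights


rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))   # 2 queries of dimension 4
K = rng.normal(size=(3, 4))   # 3 keys of dimension 4
V = rng.normal(size=(3, 5))   # 3 values of dimension 5

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape, weights.shape)
```

Each output row is a weighted average of the value vectors, with weights determined by how well the query matches each key.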