关于人工智能:摆脱-OpenAI-依赖8-分钟教你用开源生态构建全栈-AI-应用

大模型时代的到来使得 AI 利用开发变得更加轻松、省时，尤其是在 CVP Stack 的范式下，开发者甚至能够用一个周末的工夫做出一个残缺的应用程序。

本文将利用实践于实际，给大家演示如何利用 Milvus、Xinference、Llama 2-70B 开源模型和 LangChain，构筑出一个全功能的问答零碎。Xinference 使得本地部署 Llama 2 模型变得简洁高效，而 Milvus 则提供了高效的向量存储和检索能力。

解脱对 OpenAI 的依赖，借助开源生态系统构建出全流程的 AI 利用，当初开始！

01.我的项目介绍

Milvus

Milvus（https://milvus.io/docs/overview.md）是一个向量数据库，其次要性能是存储、索引和治理大规模的嵌入向量，这些向量由深度神经网络和其余机器学习模型生成。与传统的关系数据库不同，Milvus 专门解决输出向量的查问，并可能索引规模达到万亿级别的向量。

Milvus 的设计从底层开始，特地思考了解决来自非结构化数据的嵌入向量。随着互联网的倒退，非结构化数据如电子邮件、论文、传感器数据、照片等变得越来越广泛。为了让计算机可能了解和解决这些非结构化数据，这些数据会被转换成向量，应用嵌入技术。Milvus 的工作就是存储和索引这些向量，并通过计算向量之间的类似度来剖析它们的相关性。

Xinference

Xinference（https://github.com/xorbitsai/inference）使得本地模型部署变得非常简单。用户能够轻松地一键下载和部署内置的各种前沿开源模型，例如 Llama 2（https://ai.meta.com/llama/）、chatglm2、通义千问等。为了让应用 OpenAI API 的用户可能无缝迁徙，Xinference 提供了与 OpenAI 兼容的 RESTful 接口。与 OpenAI 这样的专有大模型计划相比，Xinference 有以下劣势：

更平安：在私有化部署下，数据齐全不外流，大大降低了数据泄露的危险。

老本更低：与OpenAI的LLM服务相比，私有化的LLM容许用户在定制化的根底上，用更小的模型达到类似的成果。这能够大大减少硬件需要并进步推理效率。

可定制：用户能够基于开源根底模型，应用本人的数据集进行微调，从而创立一个属于本人的模型。

Xinference 还能够在分布式集群中部署，实现高并发推理，并简化了扩容和缩容的过程。Xinference 不仅反对在CPU上进行推理，而且在 GPU 忙碌时，能够将局部计算工作交给 CPU 来实现，从而进步零碎的吞吐率。

LangChain

LangChain（https://github.com/langchain-ai/langchain）为开发基于语言模型的利用提供了一个灵便且易用的框架，使利用可能与数据源交互并高效地适应其环境。它不仅提供了各种性能组件，还为每一层的形象都提供了多种实现形式。在这个示例中，LangChain 胜利地将诸如 Milvus、Xinference Embedding 和 Xinference LLM 等模块连接起来。咱们应用 LangChain 对工作流程进行编排，极大地简化了 AI 利用的开发过程。

以下是问答零碎的工作流程图：

02.具体操作

装置&启动服务

通过 PyPI 装置 LangChain、Milvus 和 Xinference：

pip install langchain milvus "xinference[all]"

这条命令将在本地 19530 端口启动 Milvus 向量检索服务：

$ milvus-server

这条命令将在本地 9997 端口启动 Xinference 模型推理服务：

$ xinference

部署 Llama 2 和 Falcon 模型

在本示例中，咱们将通过 Xinference 的命令行工具的 launch命令在本地部署两个模型服务：

Falcon-40B-Instruct：Falcon-40B-Instruct 是一个具备 400 亿万参数的因果解码器模型，它在 Baize 数据集的混合数据上进行了微调。咱们将应用这个模型来为文档块生成词向量。

Llama 2-Chat-70B：Llama 2系列模型是一组GPT-like (也被称作自回归Transformer 模型)大语言模型，Llama 2-Chat 针对聊天对话场景进行了微调，采纳了监督微调（SFT）和人类反馈强化学习（RLHF）进行了迭代优。咱们将应用这个模型作为 LLM 后端，进行对话。

启动 Falcon-40B-Instruct 模型：

xinference launch --model-name "falcon-instruct" \   --model-format pytorch \   -size-in-billions 40 \    --endpoint "http://127.0.0.1:9997"

启动 Llama 2-Chat-70B 模型：

$ xinference launch --model-name "llama-2-chat" \   --model-format ggmlv3 \  --size-in-billions 70 \  --endpoint "http://127.0.0.1:9997"

上述两个命令都会返回 model_uid，能够利用它在 LangChain 中与它们交互。要理解如何在集群中部署 Xinference，可参考 Xinference 的 README。

用Xinference Embeddings抽取向量

在这个示例中，咱们抉择了这个文件（https://github.com/hwchase17/chat-your-data/blob/master/state_of_the_union.txt）来作为问答零碎的“知识库”，能够将它替换为其余内容，或者减少更多的文档。咱们应用 LangChain 的 TextLoader 和 RecursiveCharacterTextSplitter 来记录以及对文档分块。

from langchain.document_loaders import TextLoaderfrom langchain.text_splitter import RecursiveCharacterTextSplitterloader = TextLoader("../../state_of_the_union.txt") # 替换成任何你想要进行问答的txt文件documents = loader.load()text_splitter = RecursiveCharacterTextSplitter(    chunk_size = 512,    chunk_overlap  = 100,    length_function = len,)docs = text_splitter.split_documents(documents)

连贯到咱们在上一步中创立的 Xinference Embedding 服务端点，即 Falcon-40B-Instruct 模型。之后，咱们能够应用embed_query或 embed_documents办法来提取查问或文档片段的文本向量。

from langchain.embeddings import XinferenceEmbeddingsxinference_embeddings = XinferenceEmbeddings(    server_url="http://127.0.0.1:9997", # 换成设置的url，这里用的是默认端口    model_uid = {model_uid}  # 替换成之前返回的Falcon-Instruct模型model_uid)

用Milvus实现向量搜寻

通过方才的步骤，咱们将一篇长文章进行了分块，并且对每个分块进行了向量化。接下来咱们借助 LangChain 提供的 from_documents办法将向量化后的文档写入了 Milvus：

from langchain.vectorstores import Milvusvector_db = Milvus.from_documents(    docs,    xinference_embeddings,    connection_args={"host": "0.0.0.0", "port": "19530"},)

而后咱们就能够开始对文档进行搜寻了！在这里咱们应用了 LangChain 提供的 similarity_search（https://python.langchain.com/docs/modules/model_io/prompts/ex...）接口，它的原理是寻找和 query 具备最大余弦类似度的 Embedding，因而它召回的内容也都是文本中的原句。

query = "what does the president say about Ketanji Brown Jackson"docs = vector_db.similarity_search(query, k=10)print(docs[0].page_content)

后果如下，能够看出 Top-k 答复中的第一个答复就给出了文本中对于 Ketanji Brown Jackson 的上下文原句。

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.

用 Xinference LLM 构建对话式问答零碎

接下来咱们将演示如何利用 LLM 对文本内容进行演绎和总结，并且借助 LangChain 发明对话式的问答体验。

首先，咱们引入 LangChain 的 Xinference LLM 模块，用咱们之前启动的 Llama2 模型作为 LLM 来提供对话的能力：

from langchain.llms import Xinferencexinference_llm = Xinference(    server_url="http://127.0.0.1:9997", # 换成设置的url，这里用的是默认端口    model_uid = {model_uid} # 替换成上一步返回的Llama 2 chat模型的model_uid)

咱们先试试看，在不利用文档信息的前提下，看看 Llama 2 会给出什么答案：

xinference_llm(prompt="What did the president say about Ketanji Brown Jackson?")

'\nWhat did the president say about Ketanji Brown Jackson?\nPresident Joe Biden called Judge Ketanji Brown Jackson a "historic" and "inspiring" nominee when he introduced her as his pick to replace retiring Supreme Court Justice Stephen Breyer. He highlighted her experience as a public defender and her commitment to justice and equality, saying that she would bring a unique perspective to the court.\n\nBiden also praised Jackson\'s reputation for being a "fair-minded" and "thoughtful" jurist who is known for her ability to build'

为了让 LLM 能记住上下文，咱们用 LangChain 中 ConversationBufferMemory 模块的创立一个领有“记忆”对象，用来保留对话的历史记录。

from langchain.memory import ConversationBufferMemorymemory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

咱们应用 LangChain 中的 ConversationalRetrievalChain 作为外围模块，LangChain 替咱们解决了和 Memory 打交道并且从向量数据库中召回文档的细节，使得咱们在开发利用的过程中无需关注它们的实现：首先，LangChain 会将聊天历史（这里是从提供的 Memory 中检索的）与以后问题合并，造成一个独立的问题；而后，依据独立的问题从检索器中查找相干文档；最初，将检索到的文档和独立的问题传递给问答链，生成答复。

from langchain.chains import ConversationalRetrievalChainchain = ConversationalRetrievalChain.from_llm(    llm=xinference_llm,    retriever=vector_db.as_retriever(),    memory=memory)

创立好这个Chain之后，咱们就能够开始对文档进行发问了：

query = "What did the president say about Ketanji Brown Jackson"result = chain({"question": query})print(result["answer"])

' According to the provided text, President Biden said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court 4 days ago, and that she is one of our nation’s top legal minds who will continue Justice Breyer’s legacy of excellence.'

比照之前不借助文档间接发问的成果，应用了文档搜寻之后的答复看起来更靠谱了，因为 LLM 应用了咱们提前准备好的“知识库”来答复，并且对文档内容进行了肯定的总结。

因为有 Memory 的加持，咱们还可能和 LLM 继续对话：

query = "Did he mention who she succeeded"result = chain({"question": query})print(result["answer"])

'According to the given text, President Biden said that Ketanji Brown Jackson succeeded Justice Breyer on the Supreme Court.'

能够看到 Llama 2 精准地意识到“he”指代上一个 query 中的 the president， “she” 指代上一个 query 中的 Ketanji Brown Jackson。

再换一个问题，这次 Llama 2 找到了文章中对于总统对 COVID-19 的评论，并总结出了相干的答案：

query = "Summarize the President's opinion on COVID-19"result = chain({"question": query})print(result['answer'])

'  According to the text, the president views COVID-19 as a "God-awful disease" and wants to move forward in addressing it in a unified manner, rather than allowing it to continue being a partisan dividing line.'

03.总结

在这篇文章中，咱们以构建本地全功能问答零碎为例，展现如何奇妙地交融 Xinference 和 Milvus 两大弱小的开源工具，并通过 LangChain 实现顺畅的串联，并且利用了开源模型的弱小为例，突破了被繁多大模型 API 解放的枷锁。咱们坚信，通过继续摸索和技术的协同，肯定能够实现更多激动人心的利用与技术冲破，发明更大的价值。

参考链接：

https://python.langchain.com/docs/get_started/introduction.html
https://python.langchain.com/docs/integrations/vectorstores/m...
https://python.langchain.com/docs/use_cases/question_answerin...
https://xorbits.cn/blogs/xorbits-inference
https://zhuanlan.zhihu.com/p/644659157
https://zhuanlan.zhihu.com/p/645878506

「寻找 AIGC 时代的 CVP 实际之星」专题流动行将启动！

Zilliz 将联合国内头部大模型厂商一起甄选利用场景，由单方提供向量数据库与大模型顶级技术专家为用户赋能，一起打磨利用，晋升落地成果，赋能业务自身。

如果你的利用也适宜 CVP 框架，且正为利用落地和实际效果发愁，可间接申请参加流动，取得最业余的帮忙和领导！分割邮箱为 business@zilliz.com。

如果在应用 Milvus 或 Zilliz 产品有任何问题，可增加小助手微信 “zilliz-tech” 退出交换群。
欢送关注微信公众号“Zilliz”，理解最新资讯。

本文由mdnice多平台公布