
Commit 16ae521

✨ feat: add Brave & Google PSE & Kagi as built-in Search Providers (#8172)

* ✨ feat: add Brave Search as built-in Search Provider
* 🐛 fix: fix build error
* ✨ feat: add Kagi as built-in Search Provider
* 🐛 fix: fix build error
* ✨ feat: add Google PSE as built-in Search Provider
* ✨ feat: add Anspire (安思派) as built-in Search Provider
* 🐛 fix: fix build error
* 📝 docs: update `online-search` docs

1 parent ad47d4d

File tree: 11 files changed, +927 −9 lines

docs/self-hosting/advanced/online-search.mdx

Lines changed: 123 additions & 5 deletions
@@ -14,16 +14,134 @@ tags:
# Configuring Online Search Functionality

LobeChat supports configuring **web search functionality** for AI, enabling it to retrieve real-time information from the internet and provide more accurate and up-to-date responses. Web search supports multiple search engine providers, including [SearXNG](https://github.com/searxng/searxng), [Search1API](https://www.search1api.com), [Google](https://programmablesearchengine.google.com), and [Brave](https://brave.com/search/api), among others.

<Callout type="info">
  Web search allows AI to access time-sensitive content, such as the latest news, technology trends, or product information. You can deploy the open-source SearXNG yourself, or integrate mainstream search services such as Search1API, Google, or Brave, combining them freely to suit your use case.
</Callout>

By setting the search service environment variable `SEARCH_PROVIDERS` and the corresponding API keys, LobeChat will query multiple sources and return the results. You can also configure crawler service environment variables such as `CRAWLER_IMPLS` (e.g., `browserless`, `firecrawl`, `tavily`) to extract webpage content, enhancing the combined search-and-read capability.
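For example, a minimal `.env` fragment that enables SearXNG for search plus the built-in and Browserless crawlers might look like this (the hostname and token below are placeholders):

```env
# Search: query a self-hosted SearXNG instance
SEARCH_PROVIDERS="searxng"
SEARXNG_URL=https://searxng.example.com

# Crawling: try the built-in crawler first, fall back to Browserless
CRAWLER_IMPLS="native,browserless"
BROWSERLESS_TOKEN=your-browserless-token
```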
# Core Environment Variables
## `CRAWLER_IMPLS`

Configure available web crawlers for structured extraction of webpage content.

```env
CRAWLER_IMPLS="native,search1api"
```

Supported crawler types are listed below:

| Value | Description | Environment Variable |
| ------------- | ------------------------------------------------------------------------------------------------------------------- | -------------------------- |
| `browserless` | Headless browser crawler based on [Browserless](https://www.browserless.io/), suitable for rendering complex pages. | `BROWSERLESS_TOKEN` |
| `exa` | Crawler capabilities provided by [Exa](https://exa.ai/); an API key is required. | `EXA_API_KEY` |
| `firecrawl` | [Firecrawl](https://firecrawl.dev/) headless browser API, ideal for modern websites. | `FIRECRAWL_API_KEY` |
| `jina` | Crawler service from [Jina AI](https://jina.ai/), supports fast content summarization. | `JINA_READER_API_KEY` |
| `native` | Built-in general-purpose crawler for standard web structures. | |
| `search1api` | Page crawling capabilities from [Search1API](https://www.search1api.com), great for structured content extraction. | `SEARCH1API_CRAWL_API_KEY` |
| `tavily` | Web scraping and summarization API from [Tavily](https://www.tavily.com/). | `TAVILY_API_KEY` |

> 💡 Setting multiple crawlers increases the success rate; the system will try them in priority order.
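The priority-order fallback described above can be sketched as follows. This is an illustrative helper, not LobeChat's actual implementation; the `Crawler` type and function names are hypothetical:

```typescript
// Sketch of priority-based crawler fallback (hypothetical, not LobeChat's actual code).
type Crawler = (url: string) => Promise<string | null>;

async function crawlWithFallback(
  impls: Record<string, Crawler>,
  order: string[], // e.g. parsed from CRAWLER_IMPLS="native,search1api"
  url: string,
): Promise<string | null> {
  for (const name of order) {
    const crawler = impls[name];
    if (!crawler) continue; // skip unknown entries
    try {
      const content = await crawler(url);
      if (content) return content; // first crawler that returns content wins
    } catch {
      // on failure, fall through to the next crawler in priority order
    }
  }
  return null; // every configured crawler failed
}
```

The key design point is that a failing or empty result never aborts the pipeline; it simply hands the URL to the next configured crawler.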
---

## `SEARCH_PROVIDERS`

Configure which search engine providers to use for web search.

```env
SEARCH_PROVIDERS="searxng"
```

Supported search engines include:

| Value | Description | Environment Variable |
| ------------ | ---------------------------------------------------------------------------------------- | ------------------------------------------- |
| `anspire` | Search service provided by [Anspire](https://anspire.ai/). | `ANSPIRE_API_KEY` |
| `bocha` | Search service from [Bocha](https://open.bochaai.com/). | `BOCHA_API_KEY` |
| `brave` | [Brave](https://search.brave.com/help/api), a privacy-friendly search source. | `BRAVE_API_KEY` |
| `exa` | [Exa](https://exa.ai/), a search API designed for AI. | `EXA_API_KEY` |
| `firecrawl` | Search capabilities via [Firecrawl](https://firecrawl.dev/). | `FIRECRAWL_API_KEY` |
| `google` | Uses [Google Programmable Search Engine](https://programmablesearchengine.google.com/). | `GOOGLE_PSE_API_KEY` `GOOGLE_PSE_ENGINE_ID` |
| `jina` | Semantic search provided by [Jina AI](https://jina.ai/). | `JINA_READER_API_KEY` |
| `kagi` | Premium search API by [Kagi](https://kagi.com/); requires a subscription key. | `KAGI_API_KEY` |
| `search1api` | Aggregated search capabilities from [Search1API](https://www.search1api.com). | `SEARCH1API_CRAWL_API_KEY` |
| `searxng` | Use a self-hosted or public [SearXNG](https://searx.space/) instance. | `SEARXNG_URL` |
| `tavily` | [Tavily](https://www.tavily.com/), offers fast web summaries and answers. | `TAVILY_API_KEY` |

> ⚠️ Some search providers require you to apply for an API key and configure it in your `.env` file.
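A quick way to catch a missing key before startup is to check each configured provider against the variables it needs. The helper below is a hypothetical sketch; the required-variable mapping is taken from the table above (trimmed to a few providers for brevity):

```typescript
// Hypothetical startup check: which required env vars are missing for the
// providers listed in SEARCH_PROVIDERS? (Mapping taken from the docs table.)
const REQUIRED_ENV: Record<string, string[]> = {
  brave: ["BRAVE_API_KEY"],
  google: ["GOOGLE_PSE_API_KEY", "GOOGLE_PSE_ENGINE_ID"],
  kagi: ["KAGI_API_KEY"],
  searxng: ["SEARXNG_URL"],
  tavily: ["TAVILY_API_KEY"],
};

function missingEnv(providers: string, env: Record<string, string | undefined>): string[] {
  return providers
    .split(",")
    .map((p) => p.trim())
    .filter(Boolean)
    .flatMap((p) => (REQUIRED_ENV[p] ?? []).filter((key) => !env[key]));
}
```

For instance, with `SEARCH_PROVIDERS="google,searxng"` and only `SEARXNG_URL` set, the check would report both Google PSE variables as missing.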
---

## `BROWSERLESS_URL`

Specifies the API endpoint for [Browserless](https://www.browserless.io/), used for web crawling tasks. Browserless is a browser automation platform based on headless Chrome, ideal for rendering dynamic pages.

```env
BROWSERLESS_URL=https://chrome.browserless.io
```

> 📌 Usually used together with `CRAWLER_IMPLS=browserless`.
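To make the two variables concrete, the sketch below builds a request against Browserless's `/content` HTTP endpoint (which renders a page and returns its HTML). The endpoint shape follows Browserless's public API, but treat the helper itself as illustrative rather than LobeChat's internal code:

```typescript
// Sketch: build a Browserless /content request from BROWSERLESS_URL and
// BROWSERLESS_TOKEN (illustrative; endpoint shape per Browserless's HTTP API).
function browserlessContentRequest(baseUrl: string, token: string, target: string) {
  const url = new URL("/content", baseUrl);
  url.searchParams.set("token", token); // BROWSERLESS_TOKEN
  return {
    url: url.toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url: target }), // the page to render and return as HTML
    },
  };
}
```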
---

## `GOOGLE_PSE_ENGINE_ID`

Configures the Search Engine ID for Google Programmable Search Engine (Google PSE), used to restrict the search scope. Must be used alongside `GOOGLE_PSE_API_KEY`.

```env
GOOGLE_PSE_ENGINE_ID=your-google-cx-id
```

> 🔑 How to get it: visit [programmablesearchengine.google.com](https://programmablesearchengine.google.com/), create a search engine, and obtain the `cx` parameter.
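Under the hood, these two variables map onto Google's Custom Search JSON API, where the API key and the `cx` engine ID are passed as query parameters. The helper below is illustrative (LobeChat handles this internally):

```typescript
// Sketch: the Custom Search JSON API request that GOOGLE_PSE_API_KEY and
// GOOGLE_PSE_ENGINE_ID correspond to (illustrative helper).
function googlePseUrl(apiKey: string, engineId: string, query: string): string {
  const url = new URL("https://www.googleapis.com/customsearch/v1");
  url.searchParams.set("key", apiKey);  // GOOGLE_PSE_API_KEY
  url.searchParams.set("cx", engineId); // GOOGLE_PSE_ENGINE_ID (the "cx" parameter)
  url.searchParams.set("q", query);
  return url.toString();
}
```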
---

## `FIRECRAWL_URL`

Sets the access URL for the [Firecrawl](https://firecrawl.dev/) API, used for web content scraping. Default value:

```env
FIRECRAWL_URL=https://api.firecrawl.dev/v1
```

> ⚙️ Usually does not need to be changed unless you’re using a self-hosted instance or a proxy service.
---

## `TAVILY_SEARCH_DEPTH`

Configures the result depth for [Tavily](https://www.tavily.com/) searches.

```env
TAVILY_SEARCH_DEPTH=basic
```

Supported values:

* `basic`: fast search, returns brief results;
* `advanced`: deep search, returns more context and web page details.
---

## `TAVILY_EXTRACT_DEPTH`

Configures how deeply Tavily extracts content from web pages.

```env
TAVILY_EXTRACT_DEPTH=basic
```

Supported values:

* `basic`: extracts basic info such as the title and a content summary;
* `advanced`: extracts structured data, lists, charts, and more from web pages.
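The two depth settings feed into different Tavily endpoints: `search_depth` goes into search requests and `extract_depth` into extract requests, both defaulting to `basic`. The field names follow Tavily's public API; the helper itself is a hypothetical sketch:

```typescript
// Sketch: how TAVILY_SEARCH_DEPTH and TAVILY_EXTRACT_DEPTH could feed into
// Tavily request payloads (field names per Tavily's API; helper is illustrative).
function tavilyPayloads(env: Record<string, string | undefined>) {
  return {
    search: {
      query: "example query",
      search_depth: env.TAVILY_SEARCH_DEPTH ?? "basic", // "basic" | "advanced"
    },
    extract: {
      urls: ["https://example.com"],
      extract_depth: env.TAVILY_EXTRACT_DEPTH ?? "basic", // "basic" | "advanced"
    },
  };
}
```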
---

## `SEARXNG_URL`

The URL of the SearXNG instance, which is a necessary configuration to enable the online search functionality. For example:

docs/self-hosting/advanced/online-search.zh-CN.mdx

Lines changed: 123 additions & 4 deletions
