Changes from 4 commits
2 changes: 2 additions & 0 deletions docs/docs.json
@@ -215,6 +215,8 @@
"en/tools/search-research/youtubevideosearchtool",
"en/tools/search-research/tavilysearchtool",
"en/tools/search-research/tavilyextractortool",
"en/tools/search-research/valyusearchtool",
"en/tools/search-research/valyuextractortool",
"en/tools/search-research/arxivpapertool",
"en/tools/search-research/serpapi-googlesearchtool",
"en/tools/search-research/serpapi-googleshoppingtool",
2 changes: 2 additions & 0 deletions docs/en/concepts/tools.mdx
@@ -143,6 +143,8 @@ Here is a list of the available tools and their descriptions:
| **PDFSearchTool** | A RAG tool aimed at searching within PDF documents, ideal for processing scanned documents. |
| **PGSearchTool** | A RAG tool optimized for searching within PostgreSQL databases, suitable for database queries. |
| **Vision Tool** | A tool for extracting text from images using vision-capable LLMs. |
| **ValyuSearchTool** | A tool for unified search across web, academic, financial, and proprietary data sources. |
| **ValyuExtractorTool** | A tool for extracting clean, structured content from web pages with AI summarization. |
| **RagTool** | A general-purpose RAG tool capable of handling various data sources and types. |
| **ScrapeElementFromWebsiteTool** | Enables scraping specific elements from websites, useful for targeted data extraction. |
| **ScrapeWebsiteTool** | Facilitates scraping entire websites, ideal for comprehensive data collection. |
174 changes: 174 additions & 0 deletions docs/en/tools/search-research/valyuextractortool.mdx
@@ -0,0 +1,174 @@
---
title: "Valyu Extractor Tool"
description: "Extract clean, structured content from web pages using the Valyu API"
icon: square-poll-horizontal
mode: "wide"
---

The `ValyuExtractorTool` allows CrewAI agents to extract clean, structured content from web pages using the Valyu API. It can process single URLs or lists of URLs (up to 10) and provides options for controlling content length, extraction quality, screenshots, and AI-powered summarization.

## Installation

To use the `ValyuExtractorTool`, you need to install the `valyu` library:

```shell
pip install 'crewai[tools]' valyu
```

You also need to set your Valyu API key as an environment variable:

```bash
export VALYU_API_KEY='your-valyu-api-key'
```

Get an API key at https://platform.valyu.ai/ (sign up, then create a key from the dashboard).

## Example Usage

Here's how to initialize and use the `ValyuExtractorTool` within a CrewAI agent:

```python
import os
from crewai import Agent, Task, Crew
from crewai_tools import ValyuExtractorTool

# Ensure VALYU_API_KEY is set in your environment
# os.environ["VALYU_API_KEY"] = "YOUR_API_KEY"

# Initialize the tool
valyu_extractor = ValyuExtractorTool()

# Create an agent that uses the tool
extractor_agent = Agent(
    role='Web Content Extractor',
    goal='Extract key information from specified web pages',
    backstory='You are an expert at extracting relevant content from websites using the Valyu API.',
    tools=[valyu_extractor],
    verbose=True
)

# Define a task for the agent
extract_task = Task(
    description='Extract the main content from the URL https://example.com.',
    expected_output='A JSON string containing the extracted content from the URL.',
    agent=extractor_agent
)

# Create and run the crew
crew = Crew(
    agents=[extractor_agent],
    tasks=[extract_task],
    verbose=True
)

result = crew.kickoff()
print(result)
```

## Configuration Options

The `ValyuExtractorTool` accepts the following arguments:

- `urls` (Union[List[str], str]): **Required**. A single URL string or a list of URL strings to extract data from. Maximum 10 URLs per request.
- `response_length` (Literal["short", "medium", "large", "max"], optional): Content length per result. `"short"` (25K chars), `"medium"` (50K), `"large"` (100K), or `"max"` (unlimited). Defaults to `"short"`.
- `extract_effort` (Literal["normal", "high", "auto"], optional): Processing quality level. Use `"normal"` for fastest extraction, `"high"` for better quality, or `"auto"` for automatic selection. Defaults to `"normal"`.
- `screenshot` (bool, optional): Whether to request page screenshots as pre-signed URLs. Defaults to `False`.
- `summary` (Union[bool, str], optional): Enable AI-powered summarization. Pass `True` for default summary, or a string with custom instructions. Defaults to `False`.
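As a hedged illustration of these constraints, the argument rules above can be sketched in plain Python. The helper below is hypothetical and purely illustrative; `ValyuExtractorTool` performs its own validation internally:

```python
# Hypothetical helper mirroring the documented argument constraints.
# Illustrative only -- not part of the crewai_tools API.

VALID_RESPONSE_LENGTHS = {"short", "medium", "large", "max"}
VALID_EXTRACT_EFFORTS = {"normal", "high", "auto"}

def normalize_extractor_args(urls, response_length="short", extract_effort="normal"):
    """Coerce a single URL to a list and check the documented limits."""
    if isinstance(urls, str):
        urls = [urls]
    if not 1 <= len(urls) <= 10:
        raise ValueError("urls must contain between 1 and 10 entries")
    if response_length not in VALID_RESPONSE_LENGTHS:
        raise ValueError(f"invalid response_length: {response_length!r}")
    if extract_effort not in VALID_EXTRACT_EFFORTS:
        raise ValueError(f"invalid extract_effort: {extract_effort!r}")
    return {
        "urls": urls,
        "response_length": response_length,
        "extract_effort": extract_effort,
    }
```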

## Advanced Usage

### Multiple URLs with High-Quality Extraction

```python
# Example with multiple URLs and high extraction effort
multi_extract_task = Task(
    description='Extract content from https://example.com and https://anotherexample.org.',
    expected_output='A JSON string containing the extracted content from both URLs.',
    agent=extractor_agent
)

# Configure the tool with custom parameters
custom_extractor = ValyuExtractorTool(
    extract_effort='high',
    response_length='medium',
    screenshot=True
)

agent_with_custom_tool = Agent(
    role="Advanced Content Extractor",
    goal="Extract comprehensive content with screenshots",
    tools=[custom_extractor]
)
```

### AI-Powered Summarization

```python
# Initialize with AI summarization
summarizing_extractor = ValyuExtractorTool(
    summary=True,  # Enable default summarization
    response_length='large'
)

# Or with custom summarization instructions
custom_summary_extractor = ValyuExtractorTool(
    summary="Extract key points and main arguments from the article",
    response_length='medium'
)

summarizer_agent = Agent(
    role="Content Summarizer",
    goal="Extract and summarize web content",
    tools=[custom_summary_extractor]
)
```

### Tool Parameters

You can customize the tool's behavior by setting parameters during initialization:

```python
# Initialize with custom configuration
extractor_tool = ValyuExtractorTool(
    extract_effort='high',     # Better quality extraction
    response_length='medium',  # 50K character limit
    screenshot=True,           # Include page screenshots
    summary=True               # Enable AI summarization
)
```

## Features

- **Single or Multiple URLs**: Extract content from one URL or process up to 10 URLs in a single request
- **Configurable Quality**: Choose between normal (fast) and high (comprehensive) extraction modes
- **Flexible Content Length**: Control response size from 25K to unlimited characters
- **Screenshot Support**: Optionally capture page screenshots as pre-signed URLs
- **AI Summarization**: Get AI-powered summaries with default or custom instructions
- **Clean Markdown Output**: Returns well-formatted markdown content
- **Structured Output**: Returns well-formatted JSON containing the extracted content
- **Error Handling**: Robust handling of network timeouts and extraction errors

## Response Format

The tool returns a JSON string representing the structured data extracted from the provided URL(s).

Common response elements include:
- **title**: The page title
- **url**: The processed URL
- **content**: Main text content in markdown format (or structured JSON when an extraction schema is supplied)
- **description**: Page meta description
- **source**: Source identifier
- **price**: Cost for this extraction
- **length**: Character count of extracted content
- **screenshot_url**: Pre-signed screenshot URL (when `screenshot=True`)
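A short sketch of consuming such a payload follows. The sample below is fabricated to mirror the fields listed above; real responses may contain additional metadata:

```python
import json

# Fabricated sample shaped like the documented response fields.
sample_response = json.dumps([
    {
        "title": "Example Domain",
        "url": "https://example.com",
        "content": "# Example Domain\n\nThis domain is for use in examples.",
        "description": "Illustrative example page",
        "length": 53,
    }
])

# A tool call returns a JSON string, so parse it before use.
results = json.loads(sample_response)
for page in results:
    print(f"{page['title']} ({page['length']} chars): {page['url']}")
```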

## Use Cases

- **Content Analysis**: Extract and analyze content from competitor websites
- **Research**: Gather structured data from multiple sources for analysis
- **Content Migration**: Extract content from existing websites for migration
- **Monitoring**: Regular extraction of content for change detection
- **Data Collection**: Systematic extraction of information from web sources
- **Summarization**: Get AI-powered summaries of lengthy articles

Refer to the [Valyu API documentation](https://docs.valyu.ai/api-reference/endpoint/contents) for detailed information about the response structure and available options.
152 changes: 152 additions & 0 deletions docs/en/tools/search-research/valyusearchtool.mdx
@@ -0,0 +1,152 @@
---
title: "Valyu Search Tool"
description: "Search across the public web, academic, financial, healthcare, biomedical, and proprietary data sources using the Valyu API"
icon: "magnifying-glass"
mode: "wide"
---

The `ValyuSearchTool` provides an interface to the Valyu Search API, enabling CrewAI agents to search web, academic, financial, healthcare, biomedical, and proprietary data sources through a single API. It supports specifying search types, relevance thresholds, date ranges, included or excluded sources, and response lengths.

## Installation

To use the `ValyuSearchTool`, you need to install the `valyu` library:

```shell
pip install 'crewai[tools]' valyu
```

## Environment Variables

Ensure your Valyu API key is set as an environment variable:

```bash
export VALYU_API_KEY='your_valyu_api_key'
```

Get an API key at https://platform.valyu.ai/ (sign up, then create a key from the dashboard).

## Example Usage

Here's how to initialize and use the `ValyuSearchTool` within a CrewAI agent:

```python
import os
from crewai import Agent, Task, Crew
from crewai_tools import ValyuSearchTool

# Ensure the VALYU_API_KEY environment variable is set
# os.environ["VALYU_API_KEY"] = "YOUR_VALYU_API_KEY"

# Initialize the tool
valyu_tool = ValyuSearchTool()

# Create an agent that uses the tool
researcher = Agent(
    role='Research Analyst',
    goal='Find comprehensive information from web and academic sources',
    backstory='An expert research analyst with access to diverse data sources.',
    tools=[valyu_tool],
    verbose=True
)

# Create a task for the agent
research_task = Task(
    description='Search for the latest research on large language models.',
    expected_output='A comprehensive report summarizing key findings from web and academic sources.',
    agent=researcher
)

# Form the crew and kick it off
crew = Crew(
    agents=[researcher],
    tasks=[research_task],
    verbose=True
)

result = crew.kickoff()
print(result)
```

## Configuration Options

The `ValyuSearchTool` accepts the following arguments during initialization or when calling the `run` method:

- `query` (str): **Required**. The search query string.
- `search_type` (Literal["all", "web", "proprietary", "news"], optional): The type of search to perform. Defaults to `"all"`.
- `max_num_results` (int, optional): The maximum number of search results to return (1-20). Defaults to `10`.
- `relevance_threshold` (float, optional): Minimum relevance score a result must meet to be included (0.0-1.0). Higher values return fewer but more relevant results. Defaults to `0.5`.
- `included_sources` (Sequence[str], optional): A list of specific sources or domains to include in the search. Defaults to `None`.
- `excluded_sources` (Sequence[str], optional): A list of specific sources or domains to exclude from the search. Defaults to `None`.
- `start_date` (str, optional): Start date for filtering results (YYYY-MM-DD format). Defaults to `None`.
- `end_date` (str, optional): End date for filtering results (YYYY-MM-DD format). Defaults to `None`.
- `response_length` (Literal["short", "medium", "large", "max"], optional): Content length per result. Defaults to `"short"`.
- `country_code` (str, optional): 2-letter ISO country code to bias results geographically. Defaults to `None`.
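To make the `relevance_threshold` behaviour concrete, here is a hedged client-side analogue. The API applies the cutoff server-side; the sample results and scores below are fabricated for illustration:

```python
# Fabricated results illustrating how a relevance cutoff behaves.
sample_results = [
    {"title": "Survey of LLM scaling", "relevance_score": 0.92},
    {"title": "Loosely related blog post", "relevance_score": 0.41},
    {"title": "Benchmark methodology", "relevance_score": 0.73},
]

def apply_threshold(results, threshold=0.5):
    """Keep only results scoring at or above the threshold."""
    return [r for r in results if r["relevance_score"] >= threshold]

# A higher threshold keeps fewer, more relevant results.
print([r["title"] for r in apply_threshold(sample_results, threshold=0.7)])
```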

## Advanced Usage

You can configure the tool with custom parameters:

```python
# Example: Initialize with specific parameters for academic research
academic_valyu_tool = ValyuSearchTool(
    search_type='proprietary',
    max_num_results=15,
    relevance_threshold=0.7,
    response_length='medium'
)

# The agent will use these defaults
agent_with_custom_tool = Agent(
    role="Academic Researcher",
    goal="Find high-quality academic and research content",
    tools=[academic_valyu_tool]
)
```

```python
# Example: Initialize for news search with date filtering
news_valyu_tool = ValyuSearchTool(
    search_type='news',
    max_num_results=10,
    start_date='2024-01-01',
    end_date='2024-12-31'
)

# The agent will use these defaults
news_agent = Agent(
    role="News Analyst",
    goal="Monitor and analyze recent news articles",
    tools=[news_valyu_tool]
)
```

## Features

- **Unified Search**: Access web, academic, financial, and proprietary data sources through a single API
- **Multiple Search Types**: Choose between "all", "web", "proprietary", or "news" search modes
- **Academic Content**: Access research papers from arXiv, PubMed, medRxiv, bioRxiv, and other scholarly sources
- **Financial Data**: Search SEC filings, market data, and financial reports
- **Relevance Control**: Fine-tune result quality with configurable relevance thresholds
- **Date Filtering**: Limit results to specific time periods
- **Source Control**: Include or exclude specific domains and data sources
- **Geographic Targeting**: Bias results based on country codes

## Data Sources

Valyu provides access to diverse data sources including:

- **Web**: Real-time web search across the internet
- **Academic**: arXiv, PubMed, scholarly papers, and research publications
- **Financial**: Stock prices, SEC filings, market metrics, and financial reports
- **Medical**: Peer-reviewed literature, clinical trials, FDA drug labels
- **Proprietary**: Licensed datasets and specialized content providers

## Response Format

The tool returns search results as a JSON string containing:
- Search results with titles, URLs, and content snippets
- Relevance scores for each result
- Source attribution and metadata
- Full-text content based on the configured response length

Content for each result is truncated according to the `response_length` parameter, balancing comprehensive information against context-window efficiency.
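A minimal sketch of post-processing a search payload follows. The structure is an assumption based on the fields listed above, and the payload is fabricated; real responses may differ:

```python
import json

# Fabricated payload mirroring the fields described above.
raw = json.dumps({
    "results": [
        {
            "title": "Scaling Laws for Neural Language Models",
            "url": "https://arxiv.org/abs/2001.08361",
            "relevance_score": 0.91,
            "content": "We study empirical scaling laws for language model performance...",
        }
    ]
})

# Parse the JSON string and render a compact summary line per result.
data = json.loads(raw)
for result in data["results"]:
    snippet = result["content"][:60]
    print(f"[{result['relevance_score']:.2f}] {result['title']} - {snippet}")
```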
6 changes: 6 additions & 0 deletions lib/crewai-tools/src/crewai_tools/__init__.py
@@ -180,6 +180,10 @@
    TavilyExtractorTool,
)
from crewai_tools.tools.tavily_search_tool.tavily_search_tool import TavilySearchTool
from crewai_tools.tools.valyu_extractor_tool.valyu_extractor_tool import (
    ValyuExtractorTool,
)
from crewai_tools.tools.valyu_search_tool.valyu_search_tool import ValyuSearchTool
from crewai_tools.tools.txt_search_tool.txt_search_tool import TXTSearchTool
from crewai_tools.tools.vision_tool.vision_tool import VisionTool
from crewai_tools.tools.weaviate_tool.vector_search import WeaviateVectorSearchTool
@@ -281,6 +285,8 @@
    "TXTSearchTool",
    "TavilyExtractorTool",
    "TavilySearchTool",
    "ValyuExtractorTool",
    "ValyuSearchTool",
    "VisionTool",
    "WeaviateVectorSearchTool",
    "WebsiteSearchTool",
6 changes: 6 additions & 0 deletions lib/crewai-tools/src/crewai_tools/tools/__init__.py
@@ -166,7 +166,11 @@
from crewai_tools.tools.tavily_extractor_tool.tavily_extractor_tool import (
    TavilyExtractorTool,
)
from crewai_tools.tools.valyu_extractor_tool.valyu_extractor_tool import (
    ValyuExtractorTool,
)
from crewai_tools.tools.tavily_search_tool.tavily_search_tool import TavilySearchTool
from crewai_tools.tools.valyu_search_tool.valyu_search_tool import ValyuSearchTool
from crewai_tools.tools.txt_search_tool.txt_search_tool import TXTSearchTool
from crewai_tools.tools.vision_tool.vision_tool import VisionTool
from crewai_tools.tools.weaviate_tool.vector_search import WeaviateVectorSearchTool
@@ -264,6 +268,8 @@
    "TXTSearchTool",
    "TavilyExtractorTool",
    "TavilySearchTool",
    "ValyuExtractorTool",
    "ValyuSearchTool",
    "VisionTool",
    "WeaviateVectorSearchTool",
    "WebsiteSearchTool",