119 changes: 119 additions & 0 deletions docs/tool/providers/custom_provider.md
@@ -0,0 +1,119 @@
# How to Add a New Provider

This guide explains how to integrate a new storage provider (e.g., Dropbox, OneDrive) into DocBinder-OSS. The process involves creating configuration and client classes, registering the provider, and ensuring compatibility with the system’s models and interfaces.

---

## 1. Create a Service Configuration Class

Each provider must define a configuration class that inherits from [`ServiceConfig`](src/docbinder_oss/services/base_class.py):

```python
# filepath: src/docbinder_oss/services/my_provider/my_provider_service_config.py
from docbinder_oss.services.base_class import ServiceConfig

class MyProviderServiceConfig(ServiceConfig):
    type: str = "my_provider"
    name: str
    # Add any other provider-specific fields here
    api_key: str
```

- `type` must be unique and match the provider’s identifier.
- `name` is a user-defined label for this provider instance.

---

## 2. Implement the Storage Client

Create a client class that inherits from [`BaseStorageClient`](src/docbinder_oss/services/base_class.py) and implements all abstract methods:

```python
# filepath: src/docbinder_oss/services/my_provider/my_provider_client.py
from typing import Optional, List
from docbinder_oss.services.base_class import BaseStorageClient
from docbinder_oss.core.schemas import File, Permission
from .my_provider_service_config import MyProviderServiceConfig

class MyProviderClient(BaseStorageClient):
    def __init__(self, config: MyProviderServiceConfig):
        self.config = config
        # Initialize SDK/client here

    def test_connection(self) -> bool:
        # Implement connection test
        pass

    def list_files(self, folder_id: Optional[str] = None) -> List[File]:
        # Implement file listing
        pass

    def get_file_metadata(self, item_id: str) -> File:
        # Implement metadata retrieval
        pass

    def get_permissions(self, item_id: str) -> List[Permission]:
        # Implement permissions retrieval
        pass
```

- Use the shared models [`File`](src/docbinder_oss/core/schemas.py), [`Permission`](src/docbinder_oss/core/schemas.py), etc., for return types; a mapping sketch follows below.
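
As a rough illustration, `list_files` typically maps each raw API item into a `File`. This sketch assumes a hypothetical `raw` dict returned by your provider's SDK — the key names are illustrative, not a real API:

```python
# Hedged sketch: mapping one hypothetical raw API item to the shared File model.
# Key names on `raw` are illustrative; adapt them to your provider's actual API.
from docbinder_oss.core.schemas import File

def _to_file(raw: dict) -> File:
    return File(
        id=raw["id"],
        name=raw["name"],
        mime_type=raw.get("mimeType", "application/octet-stream"),
        kind=raw.get("kind"),
        is_folder=raw.get("mimeType") == "application/vnd.my-provider.folder",
        web_view_link=raw.get("webViewLink"),
        icon_link=raw.get("iconLink"),
        created_time=raw.get("createdTime"),
        modified_time=raw.get("modifiedTime"),
        owners=None,  # populate with User models if your API returns owners
        last_modifying_user=None,
        size=raw.get("size"),
        parents=raw.get("parents"),
        shared=raw.get("shared"),
        starred=raw.get("starred"),
        trashed=raw.get("trashed"),
    )
```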

---

## 3. Register the Provider

Add an `__init__.py` in your provider’s folder with a `register()` function:

```python
# filepath: src/docbinder_oss/services/my_provider/__init__.py
from .my_provider_client import MyProviderClient
from .my_provider_service_config import MyProviderServiceConfig

def register():
    return {
        "display_name": "my_provider",
        "config_class": MyProviderServiceConfig,
        "client_class": MyProviderClient,
    }
```

---

## 4. Ensure Discovery

The system will automatically discover your provider if it’s in the `src/docbinder_oss/services/` directory and contains a `register()` function in `__init__.py`.
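
The actual discovery logic lives in [`src/docbinder_oss/services/__init__.py`](src/docbinder_oss/services/__init__.py). As a rough illustration only (not the project's implementation), such a discovery loop might look like this, assuming `pkgutil`-based iteration over the services package:

```python
# Hedged sketch of provider discovery — see
# src/docbinder_oss/services/__init__.py for the real implementation.
import importlib
import pkgutil

import docbinder_oss.services as services_pkg

def discover_providers() -> dict:
    registry = {}
    for module_info in pkgutil.iter_modules(services_pkg.__path__):
        module = importlib.import_module(f"{services_pkg.__name__}.{module_info.name}")
        register = getattr(module, "register", None)
        if callable(register):
            entry = register()
            registry[entry["display_name"]] = entry
    return registry
```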

---

## 5. Update the Config File

Add your provider’s configuration to `~/.config/docbinder/config.yaml`:

```yaml
providers:
  - type: my_provider
    name: my_instance
    # Add other required fields
    api_key: <your-api-key>
```

---

## 6. Test Your Provider

- Run the application and ensure your provider appears and works as expected; a quick smoke test is sketched below.
- The config loader will validate your config using your `ServiceConfig` subclass.
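
A minimal smoke test, assuming the hypothetical class names and config fields from the steps above, once the stub methods are implemented:

```python
# Hedged smoke test for the hypothetical provider defined in this guide.
from docbinder_oss.services.my_provider import MyProviderClient, MyProviderServiceConfig

config = MyProviderServiceConfig(name="my_instance", api_key="test-key")
client = MyProviderClient(config)

assert client.test_connection(), "Provider connection failed"
for f in client.list_files():
    print(f.id, f.name, f.mime_type)
```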

---

## Reference

- [src/docbinder_oss/services/base_class.py](src/docbinder_oss/services/base_class.py)
- [src/docbinder_oss/core/schemas.py](src/docbinder_oss/core/schemas.py)
- [src/docbinder_oss/services/google_drive/](src/docbinder_oss/services/google_drive/) (example implementation)
- [src/docbinder_oss/services/__init__.py](src/docbinder_oss/services/__init__.py)

---

**Tip:** Use the Google Drive provider as a template for your implementation. Make sure to follow the abstract method signatures and use the shared models for compatibility.
68 changes: 68 additions & 0 deletions docs/tool/providers/google_drive.md
@@ -0,0 +1,68 @@
# Google Drive Configuration Setup

This guide will help you configure Google Drive as a provider for DocBinder.

## Prerequisites

- A Google account
- Access to [Google Cloud Console](https://console.cloud.google.com/)
- DocBinder installed

## Step 1: Create a Google Cloud Project

1. Go to the [Google Cloud Console](https://console.cloud.google.com/).
2. Click on **Select a project** and then **New Project**.
3. Enter a project name and click **Create**.

## Step 2: Enable Google Drive API

1. In your project dashboard, navigate to **APIs & Services > Library**.
2. Search for **Google Drive API**.
3. Click **Enable**.

## Step 3: Create OAuth 2.0 Credentials

1. Go to **APIs & Services > Credentials**.
2. Click **+ CREATE CREDENTIALS** and select **OAuth client ID**.
3. Configure the consent screen if prompted.
4. Choose **Desktop app** or **Web application** as the application type.
5. Enter a name and click **Create**.
6. Download the `credentials.json` file.

## Step 4: Configure DocBinder

1. Place your downloaded credentials file somewhere accessible (e.g., `~/gcp_credentials.json`).
2. The application will generate a token file (e.g., `~/gcp_token.json`) after the first authentication.

## Step 5: Edit the Config File

Create the config file (e.g., `~/.config/docbinder/config.yaml`) and add a provider entry for Google Drive:
```yaml
providers:
  - type: google_drive
    name: my_gdrive
    gcp_credentials_json: ./gcp_credentials.json
    gcp_token_json: ./gcp_token.json
```

- `type`: Must be `google_drive`.
- `name`: A unique name for this provider instance.
- `gcp_credentials_json`: Absolute or relative path to your Google Cloud credentials file.
- `gcp_token_json`: Absolute or relative path where the token will be stored/generated.
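
For reference, the provider's configuration class plausibly mirrors these keys. A hedged sketch only, inferred from the YAML above — the authoritative definition lives in [`src/docbinder_oss/services/google_drive/`](src/docbinder_oss/services/google_drive/):

```python
# Hedged sketch inferred from the YAML keys above; see
# src/docbinder_oss/services/google_drive/ for the actual class.
from docbinder_oss.services.base_class import ServiceConfig

class GoogleDriveServiceConfig(ServiceConfig):
    type: str = "google_drive"
    name: str
    gcp_credentials_json: str
    gcp_token_json: str
```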

## Step 6: Authenticate and Test

1. Run DocBinder with the Google Drive provider enabled.
2. On first run, follow the authentication prompt to grant access.
3. Verify that DocBinder can access your Google Drive files.

## Troubleshooting

- Ensure your credentials file is in the correct location.
- Check that the Google Drive API is enabled for your project.
- Review the [Google API Console](https://console.developers.google.com/) for error messages.

## References

- [Google Drive API Documentation](https://developers.google.com/drive)
- [DocBinder Documentation](../README.md)
39 changes: 3 additions & 36 deletions src/docbinder_oss/cli/search.py
@@ -10,6 +10,7 @@
 from docbinder_oss.providers import create_provider_instance
 from docbinder_oss.helpers.config import Config
 from docbinder_oss.providers.base_class import BaseProvider
+from docbinder_oss.helpers.writer import MultiFormatWriter
 
 app = typer.Typer()
 
@@ -75,19 +76,8 @@ def search(
         max_size=max_size,
     )
 
-    if not export_format:
-        typer.echo(current_files)
-        return
-
-    elif export_format.lower() == "csv":
-        __write_csv(current_files, "search_results.csv")
-        typer.echo("Results written to search_results.csv")
-    elif export_format.lower() == "json":
-        __write_json(current_files, "search_results.json", flat=True)  # or flat=False for grouped
-        typer.echo("Results written to search_results.json")
-    else:
-        typer.echo(f"Unsupported export format: {export_format}")
-        raise typer.Exit(code=1)
+    MultiFormatWriter.write(current_files, export_format)
+    return
 
 
 def filter_files(
@@ -202,26 +192,3 @@ def __write_csv(files_by_provider, filename):
                 if isinstance(parents, list):
                     file_dict["parents"] = ";".join(str(p) for p in parents)
                 writer.writerow({fn: file_dict.get(fn, "") for fn in fieldnames})
-
-
-def __write_json(files_by_provider, filename, flat=False):
-    with open(filename, "w") as jsonfile:
-        if flat:
-            all_files = []
-            for provider, files in files_by_provider.items():
-                for file in files:
-                    file_dict = (
-                        file.model_dump() if hasattr(file, "model_dump") else file.__dict__.copy()
-                    )
-                    file_dict["provider"] = provider
-                    all_files.append(file_dict)
-            json.dump(all_files, jsonfile, default=str, indent=2)
-        else:
-            grouped = {
-                provider: [
-                    file.model_dump() if hasattr(file, "model_dump") else file.__dict__.copy()
-                    for file in files
-                ]
-                for provider, files in files_by_provider.items()
-            }
-            json.dump(grouped, jsonfile, default=str, indent=2)
21 changes: 8 additions & 13 deletions src/docbinder_oss/core/schemas.py
@@ -41,36 +41,31 @@ class FileCapabilities(BaseModel):
 class File(BaseModel):
     """Represents a file or folder"""
 
-    id: str
-    name: str
-    mime_type: str
-    kind: Optional[str]
+    id: str = Field(repr=True, description="Unique identifier for the file or folder.")
+    name: str = Field(
+        repr=True, description="Name of the file or folder. May not be unique."
+    )
+    mime_type: str = Field(repr=True, description="MIME type of the file or folder.")
+    kind: Optional[str] = Field(repr=True, description="Kind of the item, e.g., 'drive#file'.")
 
     is_folder: bool = Field(False, description="True if the item is a folder, False otherwise.")
 
     web_view_link: Optional[HttpUrl]
     icon_link: Optional[HttpUrl]
 
     created_time: Optional[datetime]
-    modified_time: Optional[datetime]
+    modified_time: Optional[datetime] = Field(repr=True, description="Last modified time of the file or folder.")
 
-    owners: Optional[List[User]]
+    owners: Optional[List[User]] = Field(repr=True, description="List of owners of the file or folder.")
     last_modifying_user: Optional[User]
 
     size: Optional[str] = Field(description="Size in bytes, as a string. Only populated for files.")
     parents: Optional[List[str]] = Field(description="Parent folder IDs, if applicable.")
 
     capabilities: Optional[FileCapabilities] = None
 
     shared: Optional[bool]
     starred: Optional[bool]
     trashed: Optional[bool]
 
     # Add full_path as an optional field for export/CLI assignment
     full_path: Optional[str] = Field(
         default=None, description="Full path of the file/folder, computed at runtime."
     )
 
     def __init__(self, **data: Any):
         # Coerce parents to a list of strings or None
         if "parents" in data:
92 changes: 92 additions & 0 deletions src/docbinder_oss/helpers/writer.py
@@ -0,0 +1,92 @@
import csv
import json
import logging
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, List, Union

from pydantic import BaseModel
from rich import print


logger = logging.getLogger(__name__)


class Writer(ABC):
    """Abstract base writer class."""

    @abstractmethod
    def write(self, data: Any, file_path: Union[None, str, Path]) -> None:
        """Write data to file."""
        pass


class MultiFormatWriter:
    """Factory writer that automatically detects format from file extension."""

    # Class names are stored as strings and resolved lazily, since the
    # concrete writer classes are defined below this factory.
    _writers = {
        '.csv': 'CSVWriter',
        '.json': 'JSONWriter',
    }

    @classmethod
    def write(cls, data: Any, file_path: Union[None, str, Path]) -> None:
        """Write data to file, format determined by extension."""
        if file_path is None:
            # If no file path is provided, write to console
            ConsoleWriter().write(data)
            return
        path = Path(file_path)
        extension = path.suffix.lower()

        if extension not in cls._writers:
            raise ValueError(f"Unsupported format: {extension}")

        writer_class = globals()[cls._writers[extension]]
        writer = writer_class()
        writer.write(data, file_path)


class CSVWriter(Writer):
    def get_fieldnames(self, data: Dict[str, List[BaseModel]]) -> List[str]:
        # Sort for a deterministic column order; model_fields_set is a set.
        fieldnames = sorted(next(iter(data.values()))[0].model_fields_set)
        return ["provider", *fieldnames]

    def write(self, data: Dict[str, List[BaseModel]], file_path: Union[str, Path]) -> None:
        if not data:
            logger.warning("No data to write to CSV.")
            return

        fieldnames = self.get_fieldnames(data)
        with open(file_path, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            for provider, items in data.items():
                for item in items:
                    item_dict = item.model_dump() if isinstance(item, BaseModel) else item
                    item_dict['provider'] = provider
                    # Only emit known columns; model_dump() may contain keys
                    # outside model_fields_set, which DictWriter would reject.
                    writer.writerow({fn: item_dict.get(fn, "") for fn in fieldnames})


class JSONWriter(Writer):
    def write(self, data: Dict[str, List[BaseModel]], file_path: Union[str, Path]) -> None:
        data = {
            provider: [item.model_dump() for item in items]
            for provider, items in data.items()
        }
        with open(file_path, 'w', encoding='utf-8') as f:
            json.dump(data, f, indent=2, ensure_ascii=False, default=str)


class ConsoleWriter(Writer):
    def write(self, data: Dict, file_path: Union[None, str, Path] = None) -> None:
        from rich.table import Table

        table = Table(title="Files and Folders")
        table.add_column("Provider", justify="right", style="cyan", no_wrap=True)
        table.add_column("Id", style="magenta")
        table.add_column("Name", style="magenta")
        table.add_column("Kind", style="magenta")
        for provider, items in data.items():
            for item in items:
                table.add_row(provider, item.id, item.name, item.kind)
        print(table)