@@ -0,0 +1,77 @@
<?php

/**
Contributor

Please remove the copyright; this code sample is meant to be used freely by others.

 * @copyright Copyright (C) Ibexa AS. All rights reserved.
 * @license For full copyright and license information view LICENSE file distributed with this source code.
 */
declare(strict_types=1);

namespace Ibexa\Taxonomy;

use Ibexa\Contracts\Core\Repository\SearchService;
use Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQueryBuilder;
use Ibexa\Contracts\Core\Repository\Values\Content\Query\Criterion\ContentTypeIdentifier;
use Ibexa\Contracts\Core\Repository\Values\Content\Search\SearchHit;
use Ibexa\Contracts\Taxonomy\Search\Query\Value\TaxonomyEmbedding;
use Symfony\Component\Console\Attribute\AsCommand;
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;

#[AsCommand(
    name: 'ibexa:taxonomy:find-by-embedding',
    description: 'Finds content using a taxonomy embedding query.'
)]
final class FindByTaxonomyEmbeddingCommand extends Command
{
    public function __construct(private readonly SearchService $searchService)
    {
        parent::__construct();
    }

    protected function execute(
        InputInterface $input,
        OutputInterface $output
    ): int {
        $io = new SymfonyStyle($input, $output);

        // Example embedding vector.
        // In a real-life scenario, generate it with an embedding provider
Contributor

We should show people how to do this.

Can we just inject embeddingProviderResolver here, and do:

            $embeddingProvider = $this->embeddingProviderResolver->resolve();
            $embeddings = $embeddingProvider->getEmbedding('example_content');

(and private EmbeddingProviderResolverInterface $embeddingProviderResolver) in the constructor
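
A minimal sketch of that suggestion, runnable on its own. The two interfaces are hypothetical stand-ins declared locally so the snippet does not depend on the Ibexa packages; the method names `resolve()` and `getEmbedding()` are taken from the comment above and are not checked against the actual contracts. `FakeEmbeddingProvider`, `FixedResolver`, and `VectorSource` are invented for illustration only.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-ins for the contracts named in the review comment.
interface EmbeddingProviderInterface
{
    /** @return float[] */
    public function getEmbedding(string $text): array;
}

interface EmbeddingProviderResolverInterface
{
    public function resolve(): EmbeddingProviderInterface;
}

// Toy provider: derives a deterministic vector from CRC32 checksums.
// A real provider would call an embedding model instead.
final class FakeEmbeddingProvider implements EmbeddingProviderInterface
{
    public function __construct(private readonly int $dimensions)
    {
    }

    public function getEmbedding(string $text): array
    {
        $vector = [];
        for ($i = 0; $i < $this->dimensions; ++$i) {
            // Dividing by a float literal keeps every element a float.
            $vector[] = (crc32($text . ':' . $i) % 2001 - 1000) / 1000.0;
        }

        return $vector;
    }
}

// Toy resolver: always returns the provider it was constructed with.
final class FixedResolver implements EmbeddingProviderResolverInterface
{
    public function __construct(private readonly EmbeddingProviderInterface $provider)
    {
    }

    public function resolve(): EmbeddingProviderInterface
    {
        return $this->provider;
    }
}

// The part the command would adopt: inject the resolver, then ask the
// resolved provider for the query vector instead of hard-coding it.
final class VectorSource
{
    public function __construct(
        private readonly EmbeddingProviderResolverInterface $embeddingProviderResolver
    ) {
    }

    /** @return float[] */
    public function vectorFor(string $text): array
    {
        $embeddingProvider = $this->embeddingProviderResolver->resolve();

        return $embeddingProvider->getEmbedding($text);
    }
}
```

In the command, the result of `vectorFor('example_content')` would take the place of the literal `$vector` array.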

        // and make sure its dimensions match the configured model.
        $vector = [
            0.0123,
            -0.9876,
            0.4567,
            0.1111,
        ];

        $query = EmbeddingQueryBuilder::create()
            ->withEmbedding(new TaxonomyEmbedding($vector))
            ->setFilter(new ContentTypeIdentifier('article'))
            ->setLimit(10)
            ->setOffset(0)
            ->setPerformCount(true)
            ->build();

        $result = $this->searchService->findContent($query);

        $io->success(sprintf('Found %d items.', $result->totalCount));

        foreach ($result->searchHits as $searchHit) {
            assert($searchHit instanceof SearchHit);

            /** @var \Ibexa\Contracts\Core\Repository\Values\Content\Content $content */
            $content = $searchHit->valueObject;
            $contentInfo = $content->versionInfo->contentInfo;

            $io->writeln(sprintf(
                '%d: %s',
                $contentInfo->id,
                $contentInfo->name
            ));
        }

        return self::SUCCESS;
    }
}
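
The inline comment in `execute()` asks that the vector's dimensions match the configured model. One way to make that explicit is a small guard run before the query is built. This is only a sketch: `assertEmbeddingDimensions()` is a hypothetical helper, and the expected dimension count is assumed to be known to the caller (for example, from the model's documentation).

```php
<?php

declare(strict_types=1);

/**
 * Throws if the vector does not have the expected number of numeric elements.
 *
 * @param array<int, int|float> $vector
 */
function assertEmbeddingDimensions(array $vector, int $expectedDimensions): void
{
    $actual = count($vector);
    if ($actual !== $expectedDimensions) {
        throw new InvalidArgumentException(sprintf(
            'Embedding has %d dimensions, expected %d.',
            $actual,
            $expectedDimensions
        ));
    }

    foreach ($vector as $index => $value) {
        // Embedding values are numbers; reject anything else early.
        if (!is_int($value) && !is_float($value)) {
            throw new InvalidArgumentException(sprintf(
                'Embedding element at index %s is not numeric.',
                (string) $index
            ));
        }
    }
}
```

With the four-element example vector from the command, `assertEmbeddingDimensions($vector, 4)` passes silently, while a mismatched count raises an `InvalidArgumentException`.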
@@ -0,0 +1,38 @@
<?php

/**
 * @copyright Copyright (C) Ibexa AS. All rights reserved.
 * @license For full copyright and license information view LICENSE file distributed with this source code.
 */
declare(strict_types=1);

namespace Ibexa\Taxonomy;

use Ibexa\Contracts\Core\Repository\SearchService;
use Ibexa\Contracts\Core\Repository\Values\Content\Content;
use Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQueryBuilder;
use Ibexa\Contracts\Core\Repository\Values\Content\Search\SearchResult;
use Ibexa\Contracts\Taxonomy\Search\Query\Value\TaxonomyEmbedding;

final class TaxonomyEmbeddingSearchService
{
    public function __construct(private readonly SearchService $searchService)
    {
    }

    /**
     * @param float[] $vector
     *
     * @return SearchResult<Content>
     */
    public function searchByEmbedding(array $vector): SearchResult
    {
        $query = EmbeddingQueryBuilder::create()
Contributor

The only difference is the lack of content type filter?
I'd combine these two into one sample, to make maintenance easier for us in the future.

            ->withEmbedding(new TaxonomyEmbedding($vector))
            ->setLimit(10)
            ->setOffset(0)
            ->build();

        return $this->searchService->findContent($query);
    }
}
2 changes: 1 addition & 1 deletion docs/content_management/content_api/managing_content.md
@@ -122,7 +122,7 @@ $this->trashService->recover($trashItem, $newParent);
```

You can also search through Trash items and sort the results using several public PHP API Search Criteria and Sort Clauses that have been exposed for `TrashService` queries.
-For more information, see [Searching in trash](search_api.md#searching-in-trash).
+For more information, see [Search in trash](search_api.md#search-in-trash).

## Content types

2 changes: 1 addition & 1 deletion docs/release_notes/ez_platform_v3.1.md
@@ -122,7 +122,7 @@ A customizable search controller has been extracted and placed in `ezplatform-se

You can now search through the contents of Trash and sort the search results based on a number of Search Criteria and Sort Clauses that can be used by the `\eZ\Publish\API\Repository\TrashService::findTrashItems` method only.

-For more information, see [Searching in trash](https://doc.ibexa.co/en/latest/api/public_php_api_search/#searching-in-trash).
+For more information, see [Search in trash](https://doc.ibexa.co/en/latest/api/public_php_api_search/#search-in-trash).

### Repository filtering

50 changes: 50 additions & 0 deletions docs/search/embeddings_reference/embeddings_reference.md
@@ -0,0 +1,50 @@
---
Contributor

Please link to the search_api page for an example, or just include the example here as well.

In general, there's a lot of duplicated content between these two pages (this one and search_api.md), and it's hard to say which one is the primary one - search api links here for more information, but it's search_api.md that contains more information 🤔

This one does not include:

  1. Description of EmbeddingQueryBuilder
  2. Examples

Do we need this page?

month_change: true
description: Embedding queries, embedding configuration, providers, and embedding search fields
---

# Embeddings search reference

Embeddings provide vector representations of content or text, enabling semantic similarity search.
Foundational abstractions are provided for embedding-based search, while embedding providers generate vector representations.

## EmbeddingQuery

- [`Ibexa\Contracts\Core\Repository\Values\Content\EmbeddingQuery`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-EmbeddingQuery.html): Represents a semantic similarity search request.
It encapsulates an [Embedding](#embedding) instance and supports filtering, pagination, aggregations, and result counting through the same API as standard content queries.
Embedding queries do not support criteria, Sort Clauses, facet builders, or spellcheck
Contributor

I'd mention the search engines where they are available.

Solr 9, ES, but not Legacy Search?

Contributor

Faceted search is removed in v5 and deprecated in v4.6.

Let's replace it with aggregations.

Contributor

Embedding queries do not support criteria

What does it mean? We say that it supports filtering, and filtering is defined with search criteria?

Contributor

You are right, better would be to write "Criteria are not used to express the embedding similarity part of the query." or "Embedding queries do not use criteria for similarity itself; similarity is defined by the embedded query vector, while additional filtering can still be applied through the query filter."
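
To illustrate the distinction drawn in this reply, here is a self-contained sketch in plain PHP. It is not how Solr or Elasticsearch execute an embedding query; it only separates the two roles: a filter narrows the candidate set (the job of the query filter), and the query vector alone decides the ranking. `cosineSimilarity()`, `findSimilar()`, and the item shape are hypothetical, and cosine similarity is assumed as the similarity measure.

```php
<?php

declare(strict_types=1);

/**
 * Cosine similarity between two vectors of equal length.
 *
 * @param float[] $a
 * @param float[] $b
 */
function cosineSimilarity(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);
    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    // A zero vector has no direction; treat it as dissimilar to everything.
    if ($normA === 0.0 || $normB === 0.0) {
        return 0.0;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * @param array<int, array{name: string, type: string, vector: float[]}> $items
 * @param float[] $queryVector
 *
 * @return string[] Item names, most similar first
 */
function findSimilar(array $items, array $queryVector, string $contentType, int $limit): array
{
    // Filter step: restricts which items are considered at all.
    $candidates = array_filter(
        $items,
        static fn (array $item): bool => $item['type'] === $contentType
    );

    // Similarity step: ordering depends only on the query vector.
    usort(
        $candidates,
        static fn (array $x, array $y): int =>
            cosineSimilarity($y['vector'], $queryVector) <=> cosineSimilarity($x['vector'], $queryVector)
    );

    return array_column(array_slice($candidates, 0, $limit), 'name');
}
```

Changing `$contentType` changes which items can appear in the result; changing `$queryVector` changes only their order.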


## Embedding

- [`Ibexa\Contracts\Core\Repository\Values\Content\Query\Embedding`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-Query-Embedding.html): Represents the vector input used
for similarity search.
It stores embedding values as float arrays, while providers generate those vectors from text input

## Embedding providers

Embedding providers generate vector representations for inputs.
Out of the box, embedding search integration is provided for TaxonomyEmbedding.
If you use a custom embedding value type, implement matching embedding
visitors for your search engine (Solr/Elasticsearch).
Otherwise, query execution may fail with "No visitor available".

### Provider contracts

- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderInterface.html): Generates embeddings
Contributor

Suggested change
- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderInterface.html): Generates embeddings
- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderInterface.html): Generates embeddings for the provided text


- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderRegistryInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderRegistryInterface.html): Lists available embedding providers
Contributor

Suggested change
- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderRegistryInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderRegistryInterface.html): Lists available embedding providers
- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderRegistryInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderRegistryInterface.html): Lists available embedding providers or gets one by its identifier


- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderResolverInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderResolverInterface.html): Resolves the provider for a given embedding configuration
Contributor

Given how? Looking at https://github.com/ibexa/core/blob/d01b203204e4f9bc3bdaced92ade82eff0afbdd3/src/lib/Search/Embedding/EmbeddingProviderResolver.php#L17 , I'd say it's for the default configuration.

Suggested change
- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderResolverInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderResolverInterface.html): Resolves the provider for a given embedding configuration
- [`Ibexa\Contracts\Core\Search\Embedding\EmbeddingProviderResolverInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-Embedding-EmbeddingProviderResolverInterface.html): Resolves the embedding provider based on system configuration

You can also mention the default_embedding_model configuration property here, as this is what's returned.


## Embedding fields

- [`Ibexa\Contracts\Core\Search\FieldType\EmbeddingFieldFactory`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Search-FieldType-EmbeddingFieldFactory.html): Creates dedicated search fields that store embedding vectors

## Validation

- [`Ibexa\Contracts\Core\Repository\Values\Content\QueryValidatorInterface`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Core-Repository-Values-Content-QueryValidatorInterface.html): Validates embedding queries before they reach the search engine

!!! note "Taxonomy embeddings"

    Searching for embeddings can be used to support the [Taxonomy suggestions](taxonomy.md#taxonomy-suggestions) feature.
    The [`Ibexa\Contracts\Taxonomy\Search\Query\Value\TaxonomyEmbedding`](/api/php_api/php_api_reference/classes/Ibexa-Contracts-Taxonomy-Search-Query-Value-TaxonomyEmbedding.html) allows embedding queries to target taxonomy data.