[Feature]: Add ability to pass weights via SDK #1768

@Vasilije1990

Problem Statement

I want to specify a custom weight, such as "relevancy", that is passed to the data point objects created at ingestion and templated into the model.

I should then be able to use that weight on the search side.

Proposed Solution

At ingestion time we currently have a function cognee.add that takes documents.

We want to enable it to take an additional argument:

importance_weight: a float in the range [0, 1] (default = 0.5)
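
A minimal sketch of how the new argument might be validated before it reaches the data points. The helper name `normalize_importance_weight` is hypothetical and not part of cognee; only the argument name, the range and the default come from this proposal.

```python
DEFAULT_IMPORTANCE_WEIGHT = 0.5  # default value from this proposal


def normalize_importance_weight(value=None):
    """Return a validated weight in [0, 1] (hypothetical helper, not cognee API)."""
    if value is None:
        return DEFAULT_IMPORTANCE_WEIGHT
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("importance_weight must be a number")
    weight = float(value)
    # NaN fails this comparison as well, so it is rejected here too.
    if not 0.0 <= weight <= 1.0:
        raise ValueError("importance_weight must be in the range [0, 1]")
    return weight
```

With something like this in place, the proposed call could look like `await cognee.add(documents, importance_weight=0.8)`, where the keyword argument is the addition requested here and does not exist yet.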

Ingestion + Cognification

Importance weights should be stored on both Document and DocumentChunk datapoints.

Chunks should always inherit the weight from their parent document.
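
The inheritance rule could be sketched as follows. The two dataclasses are simplified stand-ins for illustration, not cognee's actual Document and DocumentChunk datapoints, and `chunk_document` with its fixed-size splitting is a hypothetical placeholder for the real chunker.

```python
from dataclasses import dataclass


@dataclass
class Document:  # simplified stand-in, not cognee's Document datapoint
    name: str
    text: str
    importance_weight: float = 0.5


@dataclass
class DocumentChunk:  # simplified stand-in, not cognee's DocumentChunk datapoint
    text: str
    chunk_index: int
    importance_weight: float = 0.5


def chunk_document(document, chunk_size=100):
    """Split the text into fixed-size chunks that copy the parent's weight."""
    return [
        DocumentChunk(
            text=document.text[start:start + chunk_size],
            chunk_index=index,
            importance_weight=document.importance_weight,  # inherited, never recomputed
        )
        for index, start in enumerate(range(0, len(document.text), chunk_size))
    ]
```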

Ideally, there should also be a memify task that propagates the weights to other nodes and edges in the graph.

To be discussed: propagation strategy, decay, etc.
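
As input for that discussion, here is a sketch of one possible strategy: each hop away from a weighted node multiplies the weight by a decay factor, and a node keeps the largest weight that reaches it. The function name, the adjacency-dict graph representation and the decay value are all assumptions made for illustration. This is not an existing memify task.

```python
from collections import deque


def propagate_weights(adjacency, seed_weights, decay=0.5):
    """Spread seed weights outward; every hop multiplies the weight by `decay`.

    adjacency: dict mapping a node id to an iterable of neighbor ids.
    seed_weights: dict mapping node ids (e.g. chunks) to explicit weights.
    Returns weights for the seeds and for every node reachable from them.
    """
    weights = dict(seed_weights)
    queue = deque(seed_weights)
    while queue:
        node = queue.popleft()
        candidate = weights[node] * decay
        for neighbor in adjacency.get(node, ()):
            if neighbor in seed_weights:
                continue  # explicit weights are never overwritten
            if candidate > weights.get(neighbor, 0.0):
                weights[neighbor] = candidate
                queue.append(neighbor)
    return weights
```

Nodes that no weighted node reaches are left out of the result, so how they should be treated (for example, falling back to 0.5) stays an open question.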

Retrievers

All retrievers should be tweaked to take advantage of the weights. In particular:

get_triplets_function should be modified to account for the weights: the triplet score is currently the sum of three scores and should become a weighted sum.

The RAG/chunks retriever should be adjusted so that the weights can influence ranking: the single chunk score should be multiplied by the weight.
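
The two scoring changes could be sketched like this. Both function names are hypothetical, and the sketch assumes that a higher score means a better match and that each of the three triplet scores has its own weight. If the retrievers rank by distance instead, the weighting would need to be inverted.

```python
DEFAULT_WEIGHT = 0.5  # fallback from this proposal


def weighted_triplet_score(scores, weights):
    """Weighted sum of the three component scores of a triplet."""
    if len(scores) != 3 or len(weights) != 3:
        raise ValueError("expected three scores and three weights")
    return sum(score * weight for score, weight in zip(scores, weights))


def weighted_chunk_score(score, weight=DEFAULT_WEIGHT):
    """Single chunk score multiplied by the chunk's importance weight."""
    return score * weight
```

Note that if every weight is 0.5, the ranking is the same as with the current unweighted sum, because all scores are scaled by the same factor.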

Follow-up: two-phase retrievers

Phase 1: prefilter candidates using raw embeddings

Phase 2: rerank the prefiltered triplets using the importance weights

We already have a limit; explore how it can be used and whether it should be modified.
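
A rough sketch of the two-phase idea over plain Python lists. The candidate layout (dicts with "id", "embedding" and an optional "importance_weight"), the function names and the separate `prefilter_limit` parameter are assumptions for illustration. The real retrievers would query the vector store instead of scoring every candidate in memory.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_phase_retrieve(query_embedding, candidates, prefilter_limit=10, limit=3):
    """Return candidate ids: prefilter by raw similarity, then rerank by weight."""
    scored = [
        (cosine_similarity(query_embedding, candidate["embedding"]), candidate)
        for candidate in candidates
    ]
    # Phase 1: keep the closest candidates, ignoring importance weights.
    prefiltered = sorted(scored, key=lambda pair: pair[0], reverse=True)[:prefilter_limit]
    # Phase 2: rerank the survivors by similarity times importance weight.
    reranked = sorted(
        prefiltered,
        key=lambda pair: pair[0] * pair[1].get("importance_weight", 0.5),
        reverse=True,
    )
    return [candidate["id"] for _, candidate in reranked[:limit]]
```

In this sketch the existing limit applies to the final result, while `prefilter_limit` controls how many candidates reach the reranking phase. Whether one parameter should serve both purposes is part of the open question above.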

Potential follow-up: abstract away the aggregation strategy so that it is interchangeable, and explore rank-correlation-based combination of multiple aggregations.

Notes

Backward compatibility: Retrievers should safely attempt to read the weight; if missing, assume 0.5. This ensures compatibility with old graphs.
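
A safe read could be sketched as below. `read_importance_weight` is a hypothetical helper, and accepting both dict payloads and attribute-style objects is an assumption about how retrievers see the stored data. Treating an explicit `None` as missing is also an assumption.

```python
DEFAULT_IMPORTANCE_WEIGHT = 0.5  # fallback for graphs built before this feature


def read_importance_weight(datapoint, default=DEFAULT_IMPORTANCE_WEIGHT):
    """Return the stored weight, or `default` when the field is absent or None."""
    if isinstance(datapoint, dict):
        value = datapoint.get("importance_weight")
    else:
        value = getattr(datapoint, "importance_weight", None)
    return default if value is None else float(value)
```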

Alternatives Considered

No response

Use Case

Giving priority to specific files and letting users decide how important each one is.

Implementation Ideas

No response

Additional Context

No response

Pre-submission Checklist

  • I have searched existing issues to ensure this feature hasn't been requested already
  • I have provided a clear problem statement and proposed solution
  • I have described my specific use case

Labels

enhancement, help wanted
