Description
Problem Statement
I want to specify a custom weight, such as "relevancy", when adding data. That weight should be passed to the data point objects created at ingestion and templated into the model.
I should then be able to use it on the search side.
Proposed Solution
At ingestion time, the function cognee.add currently takes documents.
We want it to accept an additional argument:
importance_weight: a float in the range [0, 1] (default: 0.5)
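A minimal sketch of how the new argument could be validated before it is attached to any data point. The helper name validate_importance_weight is hypothetical and is not part of cognee; only the range and the default come from this proposal.

```python
DEFAULT_IMPORTANCE_WEIGHT = 0.5  # default proposed in this issue


def validate_importance_weight(importance_weight: float = DEFAULT_IMPORTANCE_WEIGHT) -> float:
    """Return the weight as a float, or raise if it is not a number in [0, 1].

    Hypothetical helper for illustration; not an existing cognee function.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(importance_weight, bool) or not isinstance(importance_weight, (int, float)):
        raise TypeError("importance_weight must be a number")
    weight = float(importance_weight)
    # NaN fails this comparison as well, so it is rejected too.
    if not 0.0 <= weight <= 1.0:
        raise ValueError("importance_weight must be in the range [0, 1]")
    return weight
```

With something like this in place, the proposed call would look like cognee.add(documents, importance_weight=0.9), and omitting the argument would fall back to 0.5.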
Ingestion + Cognification
Importance weights should be stored on both Document and DocumentChunk datapoints.
Chunks should always inherit the weight from their parent document.
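The inheritance rule above can be sketched as follows. The Document and DocumentChunk classes here are simplified stand-ins, not cognee's actual data point models, and chunk_document with its fixed-size splitting is an assumption made only to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Simplified stand-in for a document data point."""
    name: str
    text: str
    importance_weight: float = 0.5


@dataclass
class DocumentChunk:
    """Simplified stand-in for a chunk data point."""
    text: str
    chunk_index: int
    importance_weight: float


def chunk_document(document: Document, chunk_size: int = 200) -> list[DocumentChunk]:
    """Split a document into fixed-size chunks that copy the parent's weight."""
    return [
        DocumentChunk(
            text=document.text[start:start + chunk_size],
            chunk_index=index,
            # Chunks never get their own weight; they inherit the parent's.
            importance_weight=document.importance_weight,
        )
        for index, start in enumerate(range(0, len(document.text), chunk_size))
    ]
```

Copying the value onto each chunk, rather than looking it up through the parent at query time, keeps the chunk retriever free of extra graph lookups, at the cost of having to rewrite the chunks if a document's weight changes later.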
Ideally, there should also be a memify task that propagates the weights to other nodes and edges in the graph.
To be discussed: propagation strategy, decay, etc.
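To make the discussion concrete, here is one possible propagation strategy, offered purely as a starting point. It assumes the graph is a plain adjacency mapping, that weights decay toward the neutral value 0.5 with every hop, and that a node keeps whichever propagated value deviates most from neutral. The function name, the decay parameter, and the tie-breaking rule are all hypothetical, and edges are not handled.

```python
from collections import deque


def propagate_weights(adjacency, seed_weights, decay=0.5, neutral=0.5):
    """Spread seed weights through a graph, shrinking toward `neutral` per hop.

    adjacency:    dict mapping a node id to an iterable of neighbor ids
    seed_weights: dict mapping node ids (e.g. documents) to explicit weights
    """
    # Track each node's deviation from neutral; seeds keep their own value.
    deviation = {node: weight - neutral for node, weight in seed_weights.items()}
    queue = deque(seed_weights)
    while queue:
        node = queue.popleft()
        candidate = deviation[node] * decay
        for neighbor in adjacency.get(node, ()):
            if neighbor in seed_weights:
                continue  # never overwrite an explicitly assigned weight
            # Keep the strongest signal that reaches this neighbor.
            if abs(candidate) > abs(deviation.get(neighbor, 0.0)):
                deviation[neighbor] = candidate
                queue.append(neighbor)
    nodes = set(adjacency) | set(deviation)
    return {node: neutral + deviation.get(node, 0.0) for node in nodes}
```

Decaying toward 0.5 rather than toward 0 means that nodes far from any weighted document end up neutral instead of penalized, which matches the default assumed for old graphs.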
Retrievers
All retrievers should be tweaked to take advantage of the weights. In particular:
get_triplets_function should account for the weights: the triplet score is currently the sum of three scores and should become a weighted sum.
The RAG/chunks retriever should let the weights influence ranking: the single score should be multiplied by the weight.
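The two scoring changes above can be sketched as follows. This assumes that scores are similarities where higher is better; if the retrievers rank by distance instead, the weighting would need to be inverted. The function names are hypothetical and do not correspond to existing cognee code.

```python
DEFAULT_IMPORTANCE_WEIGHT = 0.5


def weighted_triplet_score(scores, weights):
    """Weighted sum over the three component scores of a triplet.

    scores:  (source_node_score, edge_score, target_node_score)
    weights: the importance weights of the same three components
    """
    if len(scores) != 3 or len(weights) != 3:
        raise ValueError("a triplet needs exactly three scores and three weights")
    return sum(weight * score for weight, score in zip(weights, scores))


def weighted_chunk_score(score, importance_weight=DEFAULT_IMPORTANCE_WEIGHT):
    """Scale a chunk's single retrieval score by its importance weight."""
    return score * importance_weight


def rank_chunks(scored_chunks):
    """Sort (chunk_id, score, importance_weight) tuples, best first."""
    return sorted(
        scored_chunks,
        key=lambda chunk: weighted_chunk_score(chunk[1], chunk[2]),
        reverse=True,
    )
```

Note that when every weight is 0.5, the ordering is identical to the unweighted ordering, since all scores are scaled by the same constant.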
Follow-up: two-phase retrievers
Raw embeddings for prefiltering retrieval
Importance-weighted reranking of the prefiltered triplets
We already have a limit; explore how it can be used and whether it should be modified.
Potential follow-up: abstract away the aggregation strategy and make it interchangeable; explore rank-correlation-based combining of multiple aggregations.
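A rough sketch of the two-phase idea, again assuming similarity scores where higher is better. Candidates are plain tuples here rather than real triplets, and prefilter_factor is a hypothetical knob showing one way the existing limit could be reused: the first phase keeps limit * prefilter_factor candidates by raw score, and the second phase reranks only those by weighted score.

```python
import heapq


def two_phase_retrieve(candidates, limit=5, prefilter_factor=4):
    """Prefilter by raw score, then rerank the survivors by weighted score.

    candidates: iterable of (item_id, raw_score, importance_weight) tuples
    """
    # Phase 1: cheap prefilter on the raw embedding score only.
    prefiltered = heapq.nlargest(
        limit * prefilter_factor, candidates, key=lambda candidate: candidate[1]
    )
    # Phase 2: importance-weighted reranking of the prefiltered set.
    reranked = sorted(
        prefiltered,
        key=lambda candidate: candidate[1] * candidate[2],
        reverse=True,
    )
    return [candidate[0] for candidate in reranked[:limit]]
```

The size of the prefilter matters: with a factor of 1, an important item with a mediocre raw score can be cut before its weight is ever considered, which is one reason the limit may need to be revisited.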
Notes
Backward compatibility: Retrievers should safely attempt to read the weight; if missing, assume 0.5. This ensures compatibility with old graphs.
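The fallback could look like the sketch below. It assumes a node is either a dict of properties or an object with attributes; read_importance_weight is a hypothetical helper name.

```python
DEFAULT_IMPORTANCE_WEIGHT = 0.5


def read_importance_weight(node) -> float:
    """Read a node's weight, falling back to 0.5 for graphs built before weights existed."""
    if isinstance(node, dict):
        weight = node.get("importance_weight")
    else:
        weight = getattr(node, "importance_weight", None)
    # Treat both a missing field and an explicit None as "no weight stored".
    return DEFAULT_IMPORTANCE_WEIGHT if weight is None else float(weight)
```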
Alternatives Considered
No response
Use Case
Giving priority to specific files, so that users can decide for themselves how important each one is.
Implementation Ideas
No response
Additional Context
No response