Implement determinant-diversity search in the new post-processing pipeline #858
base: main
Changes from all commits
```diff
@@ -397,7 +397,7 @@ mod tests {
         let _: Type<f32> = value.convert::<DataType, _>().unwrap();

         // An invalid match should return an error.
-        let value = Any::new(0usize, "random-rag");
+        let value = Any::new(0usize, "random-determinant-diversity");
         let err = value.convert::<DataType, Type<f32>>().unwrap_err();
         let msg = err.to_string();
         assert!(msg.contains("invalid dispatch"), "{}", msg);
```

Contributor (on the `"random-determinant-diversity"` line):

Nit: Did find and replace get a little aggressive?
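For readers outside the codebase, the hunk above exercises string-keyed dispatch: converting an `Any` whose type tag does not match any registered type must fail with an "invalid dispatch" error. Below is a minimal, self-contained sketch of that pattern; the names `Any`, `convert_f32`, and `DispatchError` are illustrative stand-ins, not the crate's actual API.

```rust
use std::fmt;

/// A type-erased value tagged with a type-name string (illustrative only).
struct Any {
    value: usize,
    type_name: String,
}

#[derive(Debug)]
struct DispatchError(String);

impl fmt::Display for DispatchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid dispatch: unknown type {:?}", self.0)
    }
}

impl Any {
    fn new(value: usize, type_name: &str) -> Self {
        Any { value, type_name: type_name.to_string() }
    }

    /// Dispatch on the stored type-name string; unknown names are an error.
    fn convert_f32(&self) -> Result<f32, DispatchError> {
        match self.type_name.as_str() {
            "float32" => Ok(self.value as f32),
            other => Err(DispatchError(other.to_string())),
        }
    }
}

fn main() {
    let ok = Any::new(7, "float32");
    assert_eq!(ok.convert_f32().unwrap(), 7.0);

    // An invalid match should return an error, as in the test hunk above.
    let bad = Any::new(0, "random-determinant-diversity");
    let msg = bad.convert_f32().unwrap_err().to_string();
    assert!(msg.contains("invalid dispatch"), "{}", msg);
}
```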
New file (`@@ -0,0 +1,52 @@`):

```json
{
  "search_directories": [
    "C:/data/openai"
  ],
  "jobs": [
    {
      "type": "disk-index",
      "content": {
        "source": {
          "disk-index-source": "Load",
          "data_type": "float32",
          "load_path": "C:/data/openai/openai_index_normal"
        },
        "search_phase": {
          "queries": "openai_query.bin",
          "groundtruth": "openai_gt_50.bin",
          "search_list": [100, 200, 400],
          "beam_width": 4,
          "recall_at": 10,
          "num_threads": 8,
          "is_flat_search": false,
          "distance": "squared_l2",
          "vector_filters_file": null
        }
      }
    },
    {
      "type": "disk-index",
      "content": {
        "source": {
          "disk-index-source": "Load",
          "data_type": "float32",
          "load_path": "C:/data/openai/openai_index_normal"
        },
        "search_phase": {
          "queries": "openai_query.bin",
          "groundtruth": "openai_gt_50.bin",
          "search_list": [100, 200, 400],
          "beam_width": 4,
          "recall_at": 10,
          "num_threads": 8,
          "is_flat_search": false,
          "distance": "squared_l2",
          "vector_filters_file": null,
          "is_determinant_diversity_search": true,
          "determinant_diversity_eta": 0.01,
          "determinant_diversity_power": 2.0
        }
      }
    }
  ]
}
```

Contributor (on `"C:/data/openai"`):

Should this follow the pattern of the other examples and use a relative path? I know it's designed to run on OpenAI, but it's an example. Shouldn't it be able to run on test-data?
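For context on what the new `is_determinant_diversity_search` knobs do conceptually: determinant-based diversity scores a candidate set by the determinant of its Gram matrix, which grows when the chosen vectors span more volume, i.e. are less redundant. The sketch below is a hedged illustration of that idea only; the greedy loop, the use of `determinant_diversity_eta` as a diagonal regularizer, and the omission of `determinant_diversity_power` are assumptions, not the PR's implementation.

```rust
// Hedged sketch: greedy determinant-based diversity selection.
// Assumptions (not the PR's code): `eta` regularizes the Gram diagonal;
// the `power` parameter from the config is not modeled here.

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Determinant via Gaussian elimination with partial pivoting.
fn det(mut m: Vec<Vec<f64>>) -> f64 {
    let n = m.len();
    let mut d = 1.0;
    for i in 0..n {
        let p = (i..n)
            .max_by(|&a, &b| m[a][i].abs().partial_cmp(&m[b][i].abs()).unwrap())
            .unwrap();
        if m[p][i].abs() < 1e-12 {
            return 0.0; // singular: selected vectors are linearly dependent
        }
        if p != i {
            m.swap(p, i);
            d = -d; // row swap flips the sign
        }
        d *= m[i][i];
        for r in (i + 1)..n {
            let f = m[r][i] / m[i][i];
            for c in i..n {
                let t = f * m[i][c];
                m[r][c] -= t;
            }
        }
    }
    d
}

/// Gram determinant of the chosen vectors, with `eta` added on the diagonal.
fn gram_det(chosen: &[&[f64]], eta: f64) -> f64 {
    let k = chosen.len();
    let g: Vec<Vec<f64>> = (0..k)
        .map(|i| {
            (0..k)
                .map(|j| dot(chosen[i], chosen[j]) + if i == j { eta } else { 0.0 })
                .collect()
        })
        .collect();
    det(g)
}

/// Greedily pick `k` candidates, each time taking the vector that maximizes
/// the regularized Gram determinant of the selected set.
fn greedy_diverse(candidates: &[Vec<f64>], k: usize, eta: f64) -> Vec<usize> {
    let mut picked: Vec<usize> = Vec::new();
    while picked.len() < k.min(candidates.len()) {
        let best = (0..candidates.len())
            .filter(|i| !picked.contains(i))
            .max_by(|&a, &b| {
                let score = |idx: usize| {
                    let mut sel: Vec<&[f64]> =
                        picked.iter().map(|&p| candidates[p].as_slice()).collect();
                    sel.push(candidates[idx].as_slice());
                    gram_det(&sel, eta)
                };
                score(a).partial_cmp(&score(b)).unwrap()
            })
            .unwrap();
        picked.push(best);
    }
    picked
}

fn main() {
    // Two nearly identical candidates and one orthogonal to them.
    let cands = vec![vec![1.0, 0.0], vec![0.99, 0.01], vec![0.0, 1.0]];
    let picked = greedy_diverse(&cands, 2, 0.01);
    // The orthogonal vector (index 2) is always among the diverse picks.
    assert!(picked.contains(&2));
    println!("{:?}", picked);
}
```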
Contributor:

I'm trying to think of a way we can get rid of `KNNWithPostProcessor`. Generally, decorating structs like this (`KNNWithPostProcessor`) is not a viable long-term strategy. It is not extendable and does not set a good pattern for other implementations to follow. For example, will we then need a `RangeSearchWithPostProcessor`? A `MultiHopSearchWithPostProcessor`?

Maybe something like