Releases: pinecone-io/pinecone-python-client
Release v3.1.0
Listing vector ids by prefix in a namespace (for serverless indexes)
We've added SDK support for a new data plane endpoint that lists vector ids by prefix in a given namespace. Passing an empty string as the prefix lists all ids in the namespace.
The index client now has `list` and `list_paginated` methods. With clever assignment of vector ids, this can help you model hierarchical relationships between vectors, such as when you have embeddings for multiple chunks or fragments of the same document.
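As an illustration of that id-assignment idea, one hypothetical scheme encodes the document in a prefix and the chunk in a suffix, so all chunks of a document can later be retrieved by listing ids with that document's prefix. The `doc#chunk` convention and helper below are illustrative, not part of the SDK:

```python
def chunk_ids(doc_id, num_chunks):
    """Build ids like 'doc1#0', 'doc1#1', ... for each chunk of a document."""
    return [f"{doc_id}#{i}" for i in range(num_chunks)]

all_ids = chunk_ids("doc1", 3) + chunk_ids("doc2", 2)

# Listing with prefix='doc1#' would return exactly the ids that
# str.startswith selects here:
doc1_ids = [i for i in all_ids if i.startswith("doc1#")]
print(doc1_ids)  # ['doc1#0', 'doc1#1', 'doc1#2']
```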
The `list` method returns a generator that handles pagination on your behalf.
```python
from pinecone import Pinecone

pc = Pinecone(api_key='xxx')
index = pc.Index(host='hosturl')
namespace = 'example-namespace'

# To iterate over all result pages using a generator function
for ids in index.list(prefix='pref', limit=3, namespace=namespace):
    print(ids)  # ['pref1', 'pref2', 'pref3']

    # Now you can pass this id array to other methods, such as fetch or delete.
    vectors = index.fetch(ids=ids, namespace=namespace)
```

There is also an option to fetch each page of results yourself with `list_paginated`.
```python
from pinecone import Pinecone

pc = Pinecone(api_key='xxx')
index = pc.Index(host='hosturl')
namespace = 'foo-namespace'

# For manual control over pagination
results = index.list_paginated(
    prefix='pref',
    limit=3,
    namespace=namespace,
    pagination_token='eyJza2lwX3Bhc3QiOiI5IiwicHJlZml4IjpudWxsfQ=='
)
print(results.namespace)  # 'foo-namespace'
print([v.id for v in results.vectors])  # ['pref1', 'pref2', 'pref3']
print(results.pagination.next)  # 'eyJza2lwX3Bhc3QiOiI5IiwicHJlZml4IjpudWxsfQ=='
print(results.usage)  # { 'read_units': 1 }
```

Python 3.11 and 3.12 support
We adjusted our declared Python version support (from `>=3.8,<3.13` to `^3.8`) to make it easier for tools with more expansive statements about the Python versions they support to include the Pinecone SDK as a dependency. Alongside this change, we expanded our test matrix to include more robust testing with Python 3.11 and 3.12. Python 3.13 is still in alpha and is not yet part of our test matrix.
- Adjust supported python versions to ^3.8 by @jhamon in #312
- Update pytest-timeout to support python >= 3.12 by @mjvankampen in #314
Chores
- Sync models from pinecone-protos by @fsxfreak in #315
- Fix minor README docs issues in client reference by @austin-denoble in #316
New Contributors
- @mjvankampen made their first contribution in #314
Full Changelog: v3.0.3...v3.1.0
Release v3.0.3
Fixes
- gRPC: parse_query_response: Skip parsing empty Usage by @daverigby in #301
- Support overriding `additional_headers` with `PINECONE_ADDITIONAL_HEADERS` environment variable by @fsxfreak in #304
- upsert_from_dataframe: Hide all progress bars if `!show_progress` by @daverigby in #310
Chores
- Update github actions dependencies to fix warnings by @jhamon in #308
- Update generated openapi code by @jhamon in #309
Full Changelog: v3.0.2...v3.0.3
Release v3.0.2
Fixes
Create indexes using source_collection option in PodSpec
This release resolves a bug when passing source_collection as part of the PodSpec. This option is used when creating a new index from vector data stored in a collection. The value of this field should be a collection you have created previously from an index and that shows with pc.list_collections(). Currently collections and pod-based indexes are not portable across environments.
```python
from pinecone import Pinecone, PodSpec

pc = Pinecone(api_key='YOUR_API_KEY')
pc.create_index(
    name='my-index',
    dimension=1536,
    metric='cosine',
    spec=PodSpec(
        environment='us-east1-gcp',
        source_collection='collection-2024jan16',
    )
)
```

Pass optional GRPCClientConfig when using PineconeGRPC
This could be considered either a UX bug fix or a micro-feature, depending on your perspective. In 3.0.2 we updated the `pc.Index` helper method used to build instances of the `GRPCIndex` class; it now accepts an optional keyword param `grpc_config`. Before this fix, you had to import and instantiate `GRPCIndex` yourself in order to pass this configuration and customize some settings, which was a bit clunky.
```python
from pinecone.grpc import PineconeGRPC, GRPCClientConfig

pc = PineconeGRPC(api_key='YOUR_API_KEY')
grpc_config = GRPCClientConfig(
    timeout=10,
    secure=True,
    reuse_channel=True
)
index = pc.Index(
    name='my-index',
    host='host',
    grpc_config=grpc_config
)

# Now do data operations
index.upsert(...)
```

Pass optional pool_threads config on the index
Similar to the `grpc_config` option, some users requested the ability to pass `pool_threads` when targeting an index rather than only at client initialization. The optional configuration is now accepted in both places, with the value passed to `.Index()` taking precedence.
Now these are both valid approaches:
```python
from pinecone import Pinecone

pc = Pinecone(api_key='key', pool_threads=5)
index = pc.Index(host='host')
index.upsert(...)
```

```python
from pinecone import Pinecone

pc = Pinecone(api_key='key')
index = pc.Index(host='host', pool_threads=5)
index.upsert(...)
```

Debugging
This is probably only relevant for internal Pinecone employees or support agents, but the index client now accepts configuration to attach additional headers to each data plane request. This can help with tracing requests in logs.
```python
from pinecone import Pinecone

pc = Pinecone(api_key='xxx')
index = pc.Index(
    host='hosturl',
    additional_headers={ 'header-1': 'header-1-value' }
)

# Now do things
index.upsert(...)
```

The equivalent concept for `PineconeGRPC` is to pass `additional_metadata`. gRPC metadata fills a role similar to HTTP request headers and should not be confused with the metadata associated with vectors stored in your Pinecone indexes.
```python
from pinecone.grpc import PineconeGRPC, GRPCClientConfig

pc = PineconeGRPC(api_key='YOUR_API_KEY')
grpc_config = GRPCClientConfig(additional_metadata={'extra-header': 'value123'})
index = pc.Index(
    name='my-index',
    host='host',
    grpc_config=grpc_config
)

# do stuff
index.upsert(...)
```

Changelog
- README.md: Update install steps to escape brackets by @daverigby in #298
- Expose missing configurations for `grpc_config` and `pool_threads` by @jhamon in #296
- Integration tests for collections by @jhamon in #299
- Optional configs to pass `additional_headers` / `additional_metadata` to indexes by @jhamon in #297
New Contributors
- @daverigby made their first contribution in #298
Full Changelog: v3.0.1...v3.0.2
Release v3.0.1
This is a quick follow-up to the v3.0.0 release earlier this week. This release adds improved error messages to help guide people in addressing some of the breaking changes in v3, such as the migration of core functionality from attributes on the `pinecone` module into methods of the `Pinecone` class.
If you're updating from v2.2.x for the first time, you will still want to check out the v3.0.0 Migration Guide for a walkthrough of all the new features and changes. All of that information is still accurate for this release.
Release v3.0.0
- Existing users will want to check out the v3.0.0 Migration Guide for a walkthrough of all the new features and changes.
- New users should start with the README and Reference Docs
Serverless indexes are currently in public preview, so make sure to review the current limitations and test thoroughly before using in production.
Changes overview
- Deploy Pinecone's new serverless indexes. The `create_index` method has been refactored to accept a `PodSpec` or `ServerlessSpec` depending on how you would like to deploy your index. Many old properties such as `pod_type`, `replicas`, etc. have moved into `PodSpec` since they do not apply to serverless indexes.
- Understand cost. The quantity of read units consumed by each serverless `query` and `fetch` call is now returned with the response.
- Flexible API keys. The v3.0.0 Python SDK consumes the new Control Plane API hosted at `https://api.pinecone.io/`. This new API allows much more flexibility in how API keys are used, in contrast to the past, when a rigid 1:1 relationship was enforced between projects and environments.
- State encapsulation with classes. We've refactored away from global state variables set with `pinecone.init` into new `Pinecone` class instances that encapsulate their configuration state. This change enables users to interact with Pinecone using multiple API keys if they wish.
- Streamlined dependencies, smoother installs.
  - Removed many dependencies: `numpy`, `pyyaml`, `loguru`, `requests`, `dnspython`
  - Expanded `urllib3` support back to `1.26.x`
  - Everything gRPC-related has moved into a subpackage, `pinecone.grpc`, so that gRPC code is only imported when needed. For applications using REST, this means quicker startup and fewer dependency clashes with other packages.
- Richer responses. The `list_indexes` and `list_collections` methods now return an array with full descriptions of each resource, not merely an array of names.
- Migration to the Apache 2 open source license. We've moved from a proprietary EULA to a more welcoming Apache 2 license to make it easier than ever for people to incorporate the Pinecone Python SDK into their projects.
- Bug fixes:
  - Removed code that was erroneously parsing some metadata into `DateTime` objects.
  - Refactored `urllib3` usage to stop spamming deprecation warning messages.
  - Suppressed a `tqdm` warning that was appearing during notebook runs.
- Tidying up / breaking changes:
  - `list_indexes` now returns additional data; to continue iterating over an array of names, chain a call to the new helper method `.names()`. See here.
  - `list_collections` has changed very similarly to `list_indexes`. Use `.names()`. See here.
  - `describe_index` takes the same argument as before (the index name) but returns data in a different shape, reflecting the move of some configurations under the `spec` key and the elevation of `host` to the top level. See a table of changed properties here.
  - The order of positional arguments to the `query` method has been updated to reflect that `top_k` is a required parameter. If you previously relied on passing your query vector as the first positional argument, you'll see a strange error from the API about duplicate `top_k` values being passed. We recommend adopting keyword arguments to fix this and be resilient to any future changes, e.g. `index.query(vector=vec, top_k=10)`.
  - `query()` no longer accepts multiple queries via the `queries` keyword argument.
- Debugging tools. See what data is coming and going with a new environment variable, `PINECONE_DEBUG_CURL='true'`.
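A minimal sketch of enabling this from Python before the client is constructed (setting the variable in your shell with `export` works equally well; the SDK reads it when issuing requests):

```python
import os

# Set the debug flag before creating the Pinecone client so that
# subsequent requests are logged in a curl-like format.
os.environ['PINECONE_DEBUG_CURL'] = 'true'
```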
New Contributors
- @zackproser migrated the repository onto poetry in #193
- @austin-denoble made numerous documentation and CI contributions, beginning with #208
- @loisaidasam spotted some typos in #254
Full Changelog: v2.2.4...v3.0.0.dev10
Release v2.2.4
What's Changed
- Bump protobuf dependency to 3.20.x by @jhamon in #185
- CI setup for nightly python builds by @jhamon in #179
- Docs improvements
- by @byronnlandry in #187
- by @byronnlandry in #188
- by @efung in #191
- Fixing annoying urllib3 deprecation error
- Give feedback when `environment` kwarg is misspelled by @tdonia in #198
New Contributors
- @efung made their first contribution in #191
- @izeye made their first contribution in #195
- @tdonia made their first contribution in #197
Full Changelog: v2.2.2...v2.2.4
Release v2.2.2
Changelog
Security Fixes
- `numpy` dependency from unpinned to `>=1.22.0` to address low-severity CVE-2021-34141
- `protobuf` dependency from `3.19.3` to `~=3.19.5` to address a potential denial-of-service vector. This should only affect those consuming the grpc-flavored version of the client via `pinecone-client[grpc]`.
Numpy features deprecated
We plan to remove our dependency on numpy in a future release to simplify the install experience. Deprecation warnings have been added to code paths where numpy is currently in use. Let us know if you have concerns about this.
End of Python 3.7 Support
We have also removed support for Python 3.7 which has reached the official end-of-life. The last version of the pinecone-client to support Python 3.7 is v2.2.1. Our numpy dependency forced our hand in this decision to drop support because numpy 1.22.0 no longer supports Python 3.7.
Release v2.2.0
Change log:
- Support for vector `sparse_values`
- Added function `upsert_from_dataframe()`, which allows upserting a large dataset of vectors by providing a pandas dataframe
- Added option to pass vectors to `upsert()` as a list of dictionaries
- Implemented gRPC retry by directly configuring the low-level `grpcio` behavior, instead of wrapping with an interceptor
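As a sketch of the dictionary form described above: each vector dict carries an id and dense values, with sparse values expressed as parallel `indices`/`values` lists. Treat the exact field names here as an assumption and confirm against the reference docs:

```python
# Hypothetical dict-form vectors for upsert; 'sparse_values' holds
# parallel lists of term indices and their weights.
vectors = [
    {
        'id': 'vec1',
        'values': [0.1, 0.2, 0.3],
        'sparse_values': {'indices': [10, 45], 'values': [0.5, 0.5]},
        'metadata': {'genre': 'drama'},
    },
]

# index.upsert(vectors=vectors)  # would send these to a live index
print(vectors[0]['id'])  # 'vec1'
```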
Release 2.1.0
Change log:
- Fix "Connection Reset by peer" error after long idle periods
- Add typing and explicit names for arguments in all client operations
- Add docstrings to all client operations
- Support batch upsert by passing `batch_size` to the `upsert` method
- Improve gRPC query results parsing performance