@@ -1,5 +1,5 @@
- `<name>` (_required_) - A unique name for this connector.
- `<host-name>:<port-number>` (_required_) - The OpenSearch instance's host, followed by `:`, followed by the instance's port number.
- `<index-name>` (_required_) - The name of the search index on the instance.
- `<username>` - If you're using basic authentication to the instance, the domain's master user's name.
- `<password>` - If you're using basic authentication to the instance, the domain's master user's password.
2 changes: 1 addition & 1 deletion snippets/general-shared-text/opensearch-cli-api.mdx
@@ -10,7 +10,7 @@ import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-d

The following environment variables:

- `OPENSEARCH_HOST` - The host name and port number, defined as `https://<host>:<port>` and represented by `--hosts` (CLI) or `hosts` (Python).
- `OPENSEARCH_INDEX_NAME` - The name of the search index, represented by `--index-name` (CLI) or `index_name` (Python).
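
For example, in a Bash-compatible shell you could set these variables and run a quick check that the host value includes the `https://` scheme and a port number. This is a minimal sketch: the host and index values are placeholders, and `is_valid_opensearch_host` is a hypothetical helper, not part of Unstructured.

```bash
# Placeholder values; replace them with your own instance's host, port, and index name.
export OPENSEARCH_HOST="https://search.example.com:9200"
export OPENSEARCH_INDEX_NAME="my-index"

# Hypothetical helper: succeeds only if the value looks like https://<host>:<port>
# (an optional trailing slash is allowed).
is_valid_opensearch_host() {
  printf '%s\n' "$1" | grep -Eq '^https://[^/:]+:[0-9]+/?$'
}

if ! is_valid_opensearch_host "$OPENSEARCH_HOST"; then
  echo "OPENSEARCH_HOST should look like https://<host>:<port>" >&2
fi
```

The check only looks at the shape of the value; it does not confirm that the instance is reachable.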

If you're using basic authentication to the instance:
2 changes: 1 addition & 1 deletion snippets/general-shared-text/opensearch-platform.mdx
@@ -1,7 +1,7 @@
Fill in the following fields:

- **Name** (_required_): A unique name for this connector.
- **Host** (_required_): The OpenSearch instance's host name and port number, specified as `https://<host>:<port>`.
- **Index Name** (_required_): The name of the search index on the instance.

If the domain's master user is defined by a master user name and password, fill in the following fields:
79 changes: 73 additions & 6 deletions snippets/general-shared-text/opensearch.mdx
@@ -1,7 +1,19 @@
- For the [Unstructured UI](/ui/overview) or the [Unstructured API](/api-reference/overview), local OpenSearch instances are not supported.
- For [Unstructured Ingest](/open-source/ingestion/overview), local and non-local OpenSearch instances are supported.

For example, to set up [OpenSearch in IBM watsonx.data](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-gs_opensearch), do the following:

1. Create an [IBM Cloud account](https://cloud.ibm.com/registration), if you do not already have one.
2. If you do not already have one, create an IBM watsonx.data [Lite plan](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-tutorial_prov_lite_1)
or [Enterprise plan](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-getting-started_1) within your IBM Cloud account.
3. Open your IBM watsonx.data resource, if it is not already open.
4. On the sidebar, click **Infrastructure manager**.
5. Click **Add component**.
6. Under **Engines**, click **OpenSearch**, and then click **Next**.
7. Complete the on-screen instructions to finish creating the OpenSearch service instance.

To set up an [AWS OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/createupdatedomains.html) instance, complete steps similar to the following:

1. Sign in to your AWS account, and then open your AWS Management Console.
2. Open your Amazon OpenSearch Service console.
@@ -46,7 +58,14 @@
allowfullscreen
></iframe>

- The instance's host identifier and port number, as follows:

- For OpenSearch in IBM watsonx.data, do the following:

1. Sign in to your IBM Cloud account, and then open the IBM watsonx.data resource in which the OpenSearch service instance is located.
2. On the sidebar, click **Infrastructure manager**.
3. Click the target OpenSearch service instance.
4. On the **Details** tab, note the value of **HTTPS host**, which should have a format similar to `https://<instance-id>.lakehouse.ibmappdomain.cloud:<port>`.

- For an AWS OpenSearch Service instance, do the following:

@@ -63,20 +82,44 @@
For the destination connector, if you need to create an index and the domain's master user is defined by a master user name and password, you can use, for example, the following `curl` command. Replace the following placeholders:

- Replace `<host>` with the instance's host identifier.
- Replace `<port>` with the instance's port number.
- Replace `<master-username>` with the master user's name, and replace `<master-password>` with the master user's password.
- Replace `<index-name>` with the name of the new search index on the instance.
- Replace `<index-schema>` with the schema for the new search index on the instance. A schema is optional; see the explanation
following this `curl` command for more information.

```bash
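# Create the index without a schema:
curl --request PUT "<host>:<port>/<index-name>" \
--user "<master-username>:<master-password>"

# Or create the index with a schema:
curl --request PUT "<host>:<port>/<index-name>" \
--user "<master-username>:<master-password>" \
--header "Content-Type: application/json" \
--data '<index-schema>'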
```
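
If you script this step, you might wrap both variants in a shell function. The following is a minimal sketch, not an Unstructured tool: `index_url` and `create_index` are hypothetical helper names, the host argument is assumed to already include the scheme and port (for example, `https://<host>:<port>`), and the optional schema is assumed to be stored in a local JSON file.

```bash
# Hypothetical helper: join the host (including port) and the index name,
# dropping one trailing slash from the host so the URL doesn't contain "//".
index_url() {
  printf '%s/%s\n' "${1%/}" "$2"
}

# Hypothetical helper: create the index with basic authentication.
# Arguments: host, index name, master user name, master password, [schema file].
create_index() {
  url="$(index_url "$1" "$2")"
  if [ -n "${5:-}" ]; then
    # Send the index schema from the given JSON file.
    curl --request PUT "$url" \
      --user "$3:$4" \
      --header "Content-Type: application/json" \
      --data "@$5"
  else
    # Create the index without a schema.
    curl --request PUT "$url" \
      --user "$3:$4"
  fi
}
```

For example: `create_index "https://<host>:<port>" "<index-name>" "<master-username>" "<master-password>" schema.json`. Passing the schema as a file (`--data @file`) avoids having to quote a long JSON document on the command line.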

If you're using OpenSearch in IBM watsonx.data, `<master-username>` is typically `ibmlhapikey_<your-IBMid>`, where `<your-IBMid>` is your IBMid, for example `me@example.com`; and
`<master-password>` is the IBM Cloud user API key for your IBM Cloud account.
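
For example, you could derive the user name from your IBMid and read the API key from an environment variable instead of typing it into a script. This is a minimal sketch: `IBM_ID`, `IBM_CLOUD_API_KEY`, and `ibm_master_username` are hypothetical names, and the `ibmlhapikey_` prefix follows the typical convention described in the preceding paragraph.

```bash
# Hypothetical helper: build the typical watsonx.data OpenSearch user name from an IBMid.
ibm_master_username() {
  printf 'ibmlhapikey_%s\n' "$1"
}

# Hypothetical variable names; set IBM_ID and IBM_CLOUD_API_KEY in your environment.
MASTER_USERNAME="$(ibm_master_username "${IBM_ID:-me@example.com}")"
MASTER_PASSWORD="${IBM_CLOUD_API_KEY:-}"

if [ -z "$MASTER_PASSWORD" ]; then
  echo "Set IBM_CLOUD_API_KEY to your IBM Cloud user API key." >&2
fi
```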

To get your IBMid, do the following:

1. Sign in to your IBM Cloud account.
2. In the title bar, click **Manage** and then, under **Security and access**, click **Access (IAM)**.
3. On the sidebar, expand **Manage identities**, and then click **Users**.
4. In the list of users, click your user name.
5. On the **User details** tab, in the **Details** tile, note the value of **IBMid**.

To get your IBM Cloud user API key, do the following:

1. Sign in to your IBM Cloud account.
2. In the title bar, click **Manage** and then, under **Security and access**, click **Access (IAM)**.
3. On the sidebar, under **Manage identities**, click **API keys**. If the sidebar is not visible, click the **Navigation Menu** icon to the far left of the title bar.
4. Click **Create**.
5. Enter a **Name** for the API key.
6. Optionally, enter a **Description** for the API key.
7. For **Leaked action**, leave **Disable the leaked key** selected.
8. For **Session management**, leave **No** selected.
9. Click **Create**.
10. Click **Copy** or **Download** to copy or save the API key to a secure location. You won't be able to access this API key from this screen again. If you lose this API key, you can create a new one (and you should then delete the old one).

If you're using the AWS OpenSearch Service with an existing AWS IAM user as the domain's master user, use the AWS Command Line Interface (CLI) to create the index instead of the preceding `curl` command. To learn how, see [create-index](https://docs.aws.amazon.com/cli/latest/reference/opensearch/create-index.html) in the AWS CLI Command Reference.

For the destination connector, the index does not need to contain a schema beforehand. If Unstructured encounters an index without a schema,
Unstructured will automatically create a compatible schema for you before inserting items into the index. Nonetheless,
@@ -151,7 +194,7 @@
}
```

You can adapt the following index schema example for your own needs. Note that outside of `metadata`, the following fields are
required by Unstructured whenever you create your own index schema:

- `element_id`
@@ -299,6 +342,25 @@
}
```

For OpenSearch in IBM watsonx.data, you might need to adjust the `embeddings` field in the preceding index schema example to something similar to the following,
making sure that `dimension` is set to the number of dimensions that your embedding model generates:

```json
"embeddings": {
  "type": "knn_vector",
  "dimension": 384,
  "space_type": "l2",
  "method": {
    "name": "hnsw",
    "engine": "lucene",
    "parameters": {
      "ef_construction": 128,
      "m": 24
    }
  }
}
```
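
If you use embedding models with different output sizes, you might generate this fragment instead of editing it by hand. The following is a minimal sketch: `embeddings_field` is a hypothetical helper, and `384` is only an example value; pass the number of dimensions that your embedding model actually generates.

```bash
# Hypothetical helper: print the "embeddings" field fragment for a given vector dimension.
embeddings_field() {
  dimension="$1"
  # Reject empty or non-numeric values.
  case "$dimension" in
    ''|*[!0-9]*) echo "dimension must be a positive integer" >&2; return 1 ;;
  esac
  # Reject zero.
  if [ "$dimension" -eq 0 ]; then
    echo "dimension must be a positive integer" >&2
    return 1
  fi
  cat <<EOF
"embeddings": {
  "type": "knn_vector",
  "dimension": ${dimension},
  "space_type": "l2",
  "method": {
    "name": "hnsw",
    "engine": "lucene",
    "parameters": {
      "ef_construction": 128,
      "m": 24
    }
  }
}
EOF
}

# Example: print the fragment for a model that generates 384-dimensional vectors.
embeddings_field 384
```

Only `dimension` varies here; the other settings are copied unchanged from the preceding example.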

See also:

- [Create an index](https://opensearch.org/docs/latest/api-reference/index-apis/create-index/)
@@ -308,6 +370,11 @@
- [Unstructured document elements and metadata](/api-reference/partition/document-elements)

- For non-local OpenSearch instances, or if you're using basic authentication to a local OpenSearch instance, the master user's name and password.

- For OpenSearch in IBM watsonx.data, the master user's name is typically `ibmlhapikey_<your-IBMid>`, and the master user's password is the IBM Cloud user API key for your IBM Cloud account. To get
these values, see the procedures earlier in this article.
- For an AWS OpenSearch Service instance, the master user's name and password were specified when the instance was created.

- For local OpenSearch instances, if you're using certificates for authentication instead of basic authentication:

- The path to the Certificate Authority (CA) bundle, if you use intermediate CAs with your root CA.