42 changes: 28 additions & 14 deletions bigquery/README.md
@@ -1,23 +1,37 @@
# Overview
BigQuery interpreter for Apache Zeppelin
BigQuery interpreter for Apache Zeppelin using the modern [google-cloud-bigquery](https://github.com/googleapis/java-bigquery) library.

# Unit Tests
BigQuery Unit tests are excluded as these tests depend on the BigQuery external service. This is because BigQuery does not have a local mock at this point.
# Authentication
The interpreter supports multiple ways to authenticate with Google Cloud:

If you like to run these tests manually, please follow the following steps:
* [Create a new project](https://support.google.com/cloud/answer/6251787?hl=en)
* [Create a Google Compute Engine instance](https://cloud.google.com/compute/docs/instances/create-start-instance)
* Copy the project ID that you created and add it to the property "projectId" in `resources/constants.json`
* Run the command ./mvnw <options> -Dbigquery.text.exclude='' test -pl bigquery -am
1. **Application Default Credentials (ADC)**:
This is the recommended way. If Zeppelin is running on GCE, GKE, or any environment where `gcloud auth application-default login` has been executed, the interpreter will automatically discover the credentials.

# Connection
The Interpreter opens a connection with the BigQuery Service using the supplied Google project ID and the compute environment variables.
2. **Service Account JSON Key (Manual Fallback)**:
If ADC is not available, the interpreter will prompt you to input your Service Account JSON key through the Zeppelin GUI.
- To get a JSON key:
1. Go to the [GCP Console Service Accounts page](https://console.cloud.google.com/iam-admin/serviceaccounts).
2. Select your project and service account.
3. Click **Keys** -> **Add Key** -> **Create new key**.
4. Select **JSON** and click **Create**.
5. Copy the entire content of the downloaded JSON file and paste it into the Zeppelin input box when prompted. Treat this JSON key as a secret.
- **Security caution:** Do not paste this key into shared notes, notebooks, version control, or any place where it might be stored or visible to others. Prefer using Application Default Credentials (ADC) or Zeppelin's secure credentials mechanisms where possible, and only use this manual JSON key approach as a fallback when more secure options are not available.

# Google BigQuery API Javadoc
[API Javadocs](https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/)
[Source] (http://central.maven.org/maven2/com/google/apis/google-api-services-bigquery/v2-rev265-1.21.0/google-api-services-bigquery-v2-rev265-1.21.0-sources.jar)
# Configuration
| Property | Default | Description |
| --- | --- | --- |
| `zeppelin.bigquery.project_id` | | GCP Project ID |
| `zeppelin.bigquery.wait_time` | 5000 | Query timeout in milliseconds |
| `zeppelin.bigquery.max_no_of_rows` | 100000 | Maximum number of result rows |
| `zeppelin.bigquery.sql_dialect` | | SQL Dialect (standardsql or legacysql) |
| `zeppelin.bigquery.region` | | GCP Region |
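
The sketch below illustrates how the properties in this table might be resolved against their documented defaults. It is not the interpreter's actual code: the class `BigQueryConfigSketch`, its `Settings` holder, and the `resolve` method are hypothetical names, and two behaviours are assumptions made for illustration only (a blank value falls back to the default, and an empty or unrecognised `sql_dialect` is treated as standard SQL).

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical helper (not part of the interpreter) that resolves the documented properties. */
final class BigQueryConfigSketch {

  /** Immutable holder for the resolved settings. */
  static final class Settings {
    final String projectId;
    final long waitTimeMs;
    final long maxRows;
    final boolean useLegacySql;
    final String region;

    Settings(String projectId, long waitTimeMs, long maxRows,
             boolean useLegacySql, String region) {
      this.projectId = projectId;
      this.waitTimeMs = waitTimeMs;
      this.maxRows = maxRows;
      this.useLegacySql = useLegacySql;
      this.region = region;
    }
  }

  /** Reads a numeric property; assumption: a missing or blank value means "use the default". */
  private static long longOrDefault(Properties props, String key, long defaultValue) {
    String raw = props.getProperty(key, "").trim();
    return raw.isEmpty() ? defaultValue : Long.parseLong(raw);
  }

  /** Applies the defaults from the table: wait_time = 5000, max_no_of_rows = 100000. */
  static Settings resolve(Properties props) {
    String projectId = props.getProperty("zeppelin.bigquery.project_id", "").trim();
    long waitTimeMs = longOrDefault(props, "zeppelin.bigquery.wait_time", 5000L);
    long maxRows = longOrDefault(props, "zeppelin.bigquery.max_no_of_rows", 100000L);
    String dialect = props.getProperty("zeppelin.bigquery.sql_dialect", "")
        .trim().toLowerCase(Locale.ROOT);
    // Assumption: anything other than "legacysql" is handled as standard SQL.
    boolean useLegacySql = dialect.equals("legacysql");
    String region = props.getProperty("zeppelin.bigquery.region", "").trim();
    return new Settings(projectId, waitTimeMs, maxRows, useLegacySql, region);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("zeppelin.bigquery.project_id", "my-project"); // placeholder project ID
    props.setProperty("zeppelin.bigquery.sql_dialect", "legacysql");
    Settings s = resolve(props);
    System.out.println(s.projectId + " waitMs=" + s.waitTimeMs
        + " maxRows=" + s.maxRows + " legacySql=" + s.useLegacySql);
  }
}
```

Keeping the defaults in one place like this makes it easy to check that the documentation and the code agree on the values 5000 and 100000.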

We have used the curated veneer version of the Java APIs versus [Idiomatic Java client] (https://github.com/GoogleCloudPlatform/gcloud-java/tree/master/gcloud-java-bigquery) to build the interpreter. This is mainly for usability reasons.
# Unit Tests
BigQuery unit tests are integration tests that require access to a real GCP project.
By default, they are excluded. To run them:
1. Set up ADC locally (`gcloud auth application-default login`).
2. Create `src/test/resources/constants.json` with your project and test queries.
3. Run: `./mvnw test -pl bigquery -am -Dbigquery.test.exclude=""`

# Sample Screenshot

32 changes: 3 additions & 29 deletions bigquery/pom.xml
@@ -32,50 +32,24 @@
<name>Zeppelin: BigQuery interpreter</name>

<properties>
<project.http.version>1.34.0</project.http.version>
<project.oauth.version>1.30.5</project.oauth.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<bigquery.test.exclude>**/BigQueryInterpreterTest.java</bigquery.test.exclude>

<!-- library versions -->
<bigquery.api.version>v2-rev20190917-1.30.3</bigquery.api.version>
<guava.version>24.1.1-jre</guava.version>

<interpreter.name>bigquery</interpreter.name>
</properties>

<dependencies>

<dependency>
<groupId>com.google.apis</groupId>
<artifactId>google-api-services-bigquery</artifactId>
<version>${bigquery.api.version}</version>
</dependency>
<dependency>
<groupId>com.google.oauth-client</groupId>
<artifactId>google-oauth-client</artifactId>
<version>${project.oauth.version}</version>
</dependency>
<dependency>
<groupId>com.google.http-client</groupId>
<artifactId>google-http-client-jackson2</artifactId>
<version>${project.http.version}</version>
</dependency>
<dependency>
<groupId>com.google.oauth-client</groupId>
<artifactId>google-oauth-client-jetty</artifactId>
<version>${project.oauth.version}</version>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-bigquery</artifactId>
<version>2.38.0</version>
</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>${gson.version}</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>${guava.version}</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>