diff --git a/bigquery/README.md b/bigquery/README.md index 024d81167da..ad6ac95168c 100644 --- a/bigquery/README.md +++ b/bigquery/README.md @@ -1,23 +1,37 @@ # Overview -BigQuery interpreter for Apache Zeppelin +BigQuery interpreter for Apache Zeppelin using the modern [google-cloud-bigquery](https://github.com/googleapis/java-bigquery) library. -# Unit Tests -BigQuery Unit tests are excluded as these tests depend on the BigQuery external service. This is because BigQuery does not have a local mock at this point. +# Authentication +The interpreter supports multiple ways to authenticate with Google Cloud: -If you like to run these tests manually, please follow the following steps: -* [Create a new project](https://support.google.com/cloud/answer/6251787?hl=en) -* [Create a Google Compute Engine instance](https://cloud.google.com/compute/docs/instances/create-start-instance) -* Copy the project ID that you created and add it to the property "projectId" in `resources/constants.json` -* Run the command ./mvnw -Dbigquery.text.exclude='' test -pl bigquery -am +1. **Application Default Credentials (ADC)**: + This is the recommended way. If Zeppelin is running on GCE, GKE, or any environment where `gcloud auth application-default login` has been executed, the interpreter will automatically discover the credentials. -# Connection -The Interpreter opens a connection with the BigQuery Service using the supplied Google project ID and the compute environment variables. +2. **Service Account JSON Key (Manual Fallback)**: + If ADC is not available, the interpreter will prompt you to input your Service Account JSON key through the Zeppelin GUI. + - To get a JSON key: + 1. Go to the [GCP Console Service Accounts page](https://console.cloud.google.com/iam-admin/serviceaccounts). + 2. Select your project and service account. + 3. Click **Keys** -> **Add Key** -> **Create new key**. + 4. Select **JSON** and click **Create**. + 5. 
Copy the entire content of the downloaded JSON file and paste it into the Zeppelin input box when prompted. Treat this JSON key as a secret. + - **Security caution:** Do not paste this key into shared notes, notebooks, version control, or any place where it might be stored or visible to others. Prefer using Application Default Credentials (ADC) or Zeppelin's secure credentials mechanisms where possible, and only use this manual JSON key approach as a fallback when more secure options are not available. -# Google BigQuery API Javadoc -[API Javadocs](https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/) -[Source] (http://central.maven.org/maven2/com/google/apis/google-api-services-bigquery/v2-rev265-1.21.0/google-api-services-bigquery-v2-rev265-1.21.0-sources.jar) +# Configuration +| Property | Default | Description | +| --- | --- | --- | +| `zeppelin.bigquery.project_id` | | GCP Project ID | +| `zeppelin.bigquery.wait_time` | 5000 | Query Timeout in ms | +| `zeppelin.bigquery.max_no_of_rows` | 100000 | Max Result size | +| `zeppelin.bigquery.sql_dialect` | | SQL Dialect (standardsql or legacysql) | +| `zeppelin.bigquery.region` | | GCP Region | -We have used the curated veneer version of the Java APIs versus [Idiomatic Java client] (https://github.com/GoogleCloudPlatform/gcloud-java/tree/master/gcloud-java-bigquery) to build the interpreter. This is mainly for usability reasons. +# Unit Tests +BigQuery unit tests are integration tests that require access to a real GCP project. +By default, they are excluded. To run them: +1. Setup ADC locally (`gcloud auth application-default login`). +2. Create `src/test/resources/constants.json` with your project and test queries. +3. 
Run: `./mvnw test -pl bigquery -am -Dbigquery.test.exclude=""` # Sample Screenshot diff --git a/bigquery/pom.xml b/bigquery/pom.xml index 7277630c042..91accc8251f 100644 --- a/bigquery/pom.xml +++ b/bigquery/pom.xml @@ -32,50 +32,24 @@ Zeppelin: BigQuery interpreter - 1.34.0 - 1.30.5 UTF-8 **/BigQueryInterpreterTest.java - - v2-rev20190917-1.30.3 - 24.1.1-jre - bigquery - com.google.apis - google-api-services-bigquery - ${bigquery.api.version} - - - com.google.oauth-client - google-oauth-client - ${project.oauth.version} - - - com.google.http-client - google-http-client-jackson2 - ${project.http.version} - - - com.google.oauth-client - google-oauth-client-jetty - ${project.oauth.version} + com.google.cloud + google-cloud-bigquery + 2.38.0 com.google.code.gson gson ${gson.version} - - com.google.guava - guava - ${guava.version} - org.apache.commons commons-lang3 diff --git a/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java b/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java index a7446e6035d..d7ded5945b3 100644 --- a/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java +++ b/bigquery/src/main/java/org/apache/zeppelin/bigquery/BigQueryInterpreter.java @@ -16,38 +16,31 @@ package org.apache.zeppelin.bigquery; -import com.google.api.client.googleapis.auth.oauth2.GoogleCredential; -import com.google.api.client.http.HttpTransport; -import com.google.api.client.http.javanet.NetHttpTransport; -import com.google.api.client.json.GenericJson; -import com.google.api.client.json.JsonFactory; -import com.google.api.client.json.jackson2.JacksonFactory; -import com.google.api.client.util.Joiner; -import com.google.api.services.bigquery.Bigquery; -import com.google.api.services.bigquery.Bigquery.Jobs.GetQueryResults; -import com.google.api.services.bigquery.BigqueryRequest; -import com.google.api.services.bigquery.BigqueryScopes; -import com.google.api.services.bigquery.model.GetQueryResultsResponse; 
-import com.google.api.services.bigquery.model.Job; -import com.google.api.services.bigquery.model.JobCancelResponse; -import com.google.api.services.bigquery.model.QueryRequest; -import com.google.api.services.bigquery.model.QueryResponse; -import com.google.api.services.bigquery.model.TableCell; -import com.google.api.services.bigquery.model.TableFieldSchema; -import com.google.api.services.bigquery.model.TableRow; -import com.google.common.base.Function; +import com.google.auth.oauth2.GoogleCredentials; +import com.google.auth.oauth2.ServiceAccountCredentials; +import com.google.cloud.bigquery.BigQuery; +import com.google.cloud.bigquery.BigQueryOptions; +import com.google.cloud.bigquery.Field; +import com.google.cloud.bigquery.FieldValue; +import com.google.cloud.bigquery.FieldValueList; +import com.google.cloud.bigquery.Job; +import com.google.cloud.bigquery.JobId; +import com.google.cloud.bigquery.JobInfo; +import com.google.cloud.bigquery.QueryJobConfiguration; +import com.google.cloud.bigquery.TableResult; import org.apache.commons.lang3.StringUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; +import java.io.ByteArrayInputStream; import java.io.IOException; +import java.nio.charset.StandardCharsets; import java.util.ArrayList; -import java.util.Collection; -import java.util.Iterator; +import java.util.Collections; import java.util.List; -import java.util.NoSuchElementException; import java.util.Properties; +import java.util.UUID; import org.apache.zeppelin.interpreter.Interpreter; import org.apache.zeppelin.interpreter.InterpreterContext; @@ -56,36 +49,36 @@ import org.apache.zeppelin.interpreter.thrift.InterpreterCompletion; import org.apache.zeppelin.scheduler.Scheduler; import org.apache.zeppelin.scheduler.SchedulerFactory; +import org.apache.zeppelin.user.AuthenticationInfo; /** - * BigQuery interpreter for Zeppelin. - * + * BigQuery interpreter for Zeppelin using modern google-cloud-bigquery client. + * *
 * <ul>
 *   <li>{@code zeppelin.bigquery.project_id} - Project ID in GCP</li>
 *   <li>{@code zeppelin.bigquery.wait_time} - Query Timeout in ms</li>
 *   <li>{@code zeppelin.bigquery.max_no_of_rows} - Max Result size</li>
 * </ul>
- *
+ *
 *
 * How to use:
 * {@code %bigquery.sql
 * {@code
- * SELECT departure_airport,count(case when departure_delay>0 then 1 else 0 end) as no_of_delays
- * FROM [bigquery-samples:airline_ontime_data.flights]
- * group by departure_airport
- * order by 2 desc
+ * SELECT departure_airport,count(case when departure_delay>0 then 1 else 0 end) as no_of_delays
+ * FROM `bigquery-samples.airline_ontime_data.flights`
+ * group by departure_airport
+ * order by 2 desc
 * limit 10
 * }
- * + * */ public class BigQueryInterpreter extends Interpreter { private static final Logger LOGGER = LoggerFactory.getLogger(BigQueryInterpreter.class); private static final char NEWLINE = '\n'; private static final char TAB = '\t'; - private static Bigquery service = null; - //Mutex created to create the singleton in thread-safe fashion. - private static Object serviceLock = new Object(); + + private BigQuery service = null; static final String PROJECT_ID = "zeppelin.bigquery.project_id"; static final String WAIT_TIME = "zeppelin.bigquery.wait_time"; @@ -93,212 +86,199 @@ public class BigQueryInterpreter extends Interpreter { static final String SQL_DIALECT = "zeppelin.bigquery.sql_dialect"; static final String REGION = "zeppelin.bigquery.region"; - private static String jobId = null; - private static String projectId = null; - - private static final List NO_COMPLETION = new ArrayList<>(); + private volatile JobId currentJobId = null; private Exception exceptionOnConnect; - private static final Function sequenceToStringTransformer = - new Function() { - public String apply(CharSequence seq) { - return seq.toString(); - } - }; + private static final List NO_COMPLETION = new ArrayList<>(); + + private static final List BQ_SCOPES = Collections.singletonList( + "https://www.googleapis.com/auth/bigquery" + ); public BigQueryInterpreter(Properties property) { super(property); } - //Function to return valid BigQuery Service @Override public void open() { - if (service == null) { - synchronized (serviceLock) { - if (service == null) { - try { - service = createAuthorizedClient(); - exceptionOnConnect = null; - LOGGER.info("Opened BigQuery SQL Connection"); - } catch (IOException e) { - LOGGER.error("Cannot open connection", e); - exceptionOnConnect = e; - close(); - } - } + LOGGER.info("Opening BigQuery SQL Connection..."); + // Service initialization is lazy and depends on InterpreterContext in interpret() + // However, if we can init with ADC, we do it here. 
+ try { + if (service == null) { + service = createDefaultClient(); + exceptionOnConnect = null; + LOGGER.info("Opened BigQuery SQL Connection with ADC"); } + } catch (Exception e) { + LOGGER.warn("Cannot open connection with Application Default Credentials. " + + "Will try user credentials on interpret.", e); + exceptionOnConnect = e; } } - //Function that Creates an authorized client to Google Bigquery. - private static Bigquery createAuthorizedClient() throws IOException { - HttpTransport transport = new NetHttpTransport(); - JsonFactory jsonFactory = new JacksonFactory(); - GoogleCredential credential = GoogleCredential.getApplicationDefault(transport, jsonFactory); + private BigQuery createDefaultClient() throws IOException { + GoogleCredentials credentials = GoogleCredentials.getApplicationDefault(); + if (credentials.createScopedRequired()) { + credentials = credentials.createScoped(BQ_SCOPES); + } + + BigQueryOptions.Builder builder = BigQueryOptions.newBuilder() + .setCredentials(credentials); - if (credential.createScopedRequired()) { - Collection bigqueryScopes = BigqueryScopes.all(); - credential = credential.createScoped(bigqueryScopes); + String projId = getProperty(PROJECT_ID); + if (StringUtils.isNotBlank(projId)) { + builder.setProjectId(projId); } - return new Bigquery.Builder(transport, jsonFactory, credential) - .setApplicationName("Zeppelin/1.0 (GPN:Apache Zeppelin;)").build(); + return builder.build().getService(); } - //Function that generates and returns the schema and the rows as string - public static String printRows(final GetQueryResultsResponse response) { - StringBuilder msg = new StringBuilder(); - try { - List schemNames = new ArrayList(); - for (TableFieldSchema schem: response.getSchema().getFields()) { - schemNames.add(schem.getName()); - } - msg.append(Joiner.on(TAB).join(schemNames)); - msg.append(NEWLINE); - for (TableRow row : response.getRows()) { - List fieldValues = new ArrayList(); - for (TableCell field : row.getF()) { - 
fieldValues.add(field.getV().toString()); - } - msg.append(Joiner.on(TAB).join(fieldValues)); - msg.append(NEWLINE); - } - return msg.toString(); - } catch (NullPointerException ex) { - throw new NullPointerException("SQL Execution returned an error!"); + private BigQuery getClientForUser(InterpreterContext context) throws IOException { + AuthenticationInfo authInfo = context.getAuthenticationInfo(); + + // Check if user has provided credentials via Zeppelin Credentials manager + if (authInfo != null && authInfo.getTicket() != null) { + // Typically we'd use something from credential manager, but let's assume JSON might be passed + // String userKey = authInfo.getTicket(); + } + + if (service != null) { + return service; } - } - //Function to poll a job for completion. Future use - public static Job pollJob(final Bigquery.Jobs.Get request, final long interval) - throws IOException, InterruptedException { - Job job = request.execute(); - while (!job.getStatus().getState().equals("DONE")) { - System.out.println("Job is " - + job.getStatus().getState() - + " waiting " + interval + " milliseconds..."); - Thread.sleep(interval); - job = request.execute(); + if (exceptionOnConnect != null) { + throw new IOException("Failed to initialize BigQuery client with ADC", exceptionOnConnect); } - return job; + + return createDefaultClient(); } - //Function to page through the results of an arbitrary bigQuery request - public static Iterator getPages( - final BigqueryRequest requestTemplate) { - class PageIterator implements Iterator { - private BigqueryRequest request; - private boolean hasNext = true; - PageIterator(final BigqueryRequest requestTemplate) { - this.request = requestTemplate; - } - public boolean hasNext() { - return hasNext; + private InterpreterResult executeSql(String sql, InterpreterContext context) { + BigQuery bqClient; + try { + bqClient = getClientForUser(context); + } catch (IOException e) { + // Fallback: Prompt user to input Service Account JSON via 
z.input + LOGGER.error("Authentication failed. Requesting service account JSON via GUI", e); + String saJson = (String) context.getGui().input("GCP Service Account JSON", ""); + if (StringUtils.isBlank(saJson)) { + return new InterpreterResult(Code.ERROR, "%html ⚠️ Authentication Required
" + + "Could not find Application Default Credentials. Please input your " + + "Service Account JSON key in the form below and run again."); } - public T next() { - if (!hasNext) { - throw new NoSuchElementException(); + try { + GoogleCredentials credentials = ServiceAccountCredentials.fromStream( + new ByteArrayInputStream(saJson.getBytes(StandardCharsets.UTF_8))); + if (credentials.createScopedRequired()) { + credentials = credentials.createScoped(BQ_SCOPES); } - try { - T response = request.execute(); - if (response.containsKey("pageToken")) { - request = request.set("pageToken", response.get("pageToken")); - } else { - hasNext = false; - } - return response; - } catch (IOException e) { - return null; + + BigQueryOptions.Builder builder = BigQueryOptions.newBuilder() + .setCredentials(credentials); + + String projId = getProperty(PROJECT_ID); + if (StringUtils.isNotBlank(projId)) { + builder.setProjectId(projId); } - } - public void remove() { - this.next(); + bqClient = builder.build().getService(); + // Do not cache this client in a shared field to avoid leaking user credentials + exceptionOnConnect = null; + } catch (IOException ex) { + return new InterpreterResult(Code.ERROR, "Failed to parse Service Account JSON: " + + ex.getMessage()); } } - return new PageIterator(requestTemplate); - } - - //Function to call bigQuery to run SQL and return results to the Interpreter for output - private InterpreterResult executeSql(String sql) { - int counter = 0; - StringBuilder finalmessage = null; - finalmessage = new StringBuilder("%table "); - String projId = getProperty(PROJECT_ID); - long wTime = Long.parseLong(getProperty(WAIT_TIME)); - long maxRows = Long.parseLong(getProperty(MAX_ROWS)); + long wTime = Long.parseLong(getProperty(WAIT_TIME, "5000")); + long maxRows = Long.parseLong(getProperty(MAX_ROWS, "100000")); String sqlDialect = getProperty(SQL_DIALECT, "").toLowerCase(); String region = getProperty(REGION, null); - Boolean useLegacySql; + + 
QueryJobConfiguration.Builder queryConfigBuilder = QueryJobConfiguration.newBuilder(sql) + .setJobTimeoutMs(wTime); + switch (sqlDialect) { case "standardsql": - useLegacySql = false; + queryConfigBuilder.setUseLegacySql(false); break; case "legacysql": - useLegacySql = true; + queryConfigBuilder.setUseLegacySql(true); break; default: - // Enable query prefix like '#standardSQL' if specified - useLegacySql = null; + // Use default (Usually Standard SQL if not specified) + queryConfigBuilder.setUseLegacySql(null); } - Iterator pages; - try { - pages = run(sql, projId, wTime, maxRows, useLegacySql, region); - } catch (IOException ex) { - LOGGER.error(ex.getMessage()); - return new InterpreterResult(Code.ERROR, ex.getMessage()); + + QueryJobConfiguration queryConfig = queryConfigBuilder.build(); + + String jobIdStr = UUID.randomUUID().toString(); + if (StringUtils.isNotBlank(region)) { + currentJobId = JobId.newBuilder().setLocation(region).setJob(jobIdStr).build(); + } else { + currentJobId = JobId.of(jobIdStr); } + try { - while (pages.hasNext()) { - finalmessage.append(printRows(pages.next())); + LOGGER.info("Executing query: {}", sql); + Job queryJob = bqClient.create( + JobInfo.newBuilder(queryConfig).setJobId(currentJobId).build()); + + // Wait for the query to complete + queryJob = queryJob.waitFor(); + + if (queryJob == null) { + return new InterpreterResult(Code.ERROR, "Job no longer exists"); + } else if (queryJob.getStatus().getError() != null) { + return new InterpreterResult(Code.ERROR, queryJob.getStatus().getError().toString()); } - return new InterpreterResult(Code.SUCCESS, finalmessage.toString()); - } catch (NullPointerException ex) { - return new InterpreterResult(Code.ERROR, ex.getMessage()); - } - } - //Function to run the SQL on bigQuery service - public static Iterator run(final String queryString, - final String projId, final long wTime, final long maxRows, - Boolean useLegacySql, final String region) - throws IOException { - try { - 
LOGGER.info("Use legacy sql: {}", useLegacySql); - QueryResponse query; - query = service - .jobs() - .query( - projId, - new QueryRequest().setTimeoutMs(wTime) - .setUseLegacySql(useLegacySql).setQuery(queryString) - .setMaxResults(maxRows)).execute(); - jobId = query.getJobReference().getJobId(); - projectId = query.getJobReference().getProjectId(); - GetQueryResults getRequest = service.jobs().getQueryResults( - projectId, - jobId); - if (StringUtils.isNotBlank(region)) { - getRequest = getRequest.setLocation(region); + TableResult result = queryJob.getQueryResults(); + + StringBuilder msg = new StringBuilder("%table "); + + // Get Schema + List schemaNames = new ArrayList<>(); + for (Field field : result.getSchema().getFields()) { + schemaNames.add(field.getName()); } - return getPages(getRequest); - } catch (IOException ex) { - throw ex; + msg.append(StringUtils.join(schemaNames, TAB)).append(NEWLINE); + + // Get Data + long count = 0; + for (FieldValueList row : result.iterateAll()) { + if (count >= maxRows) { + break; + } + List fieldValues = new ArrayList<>(); + for (FieldValue field : row) { + fieldValues.add(field.isNull() ? 
"null" : field.getValue().toString()); + } + msg.append(StringUtils.join(fieldValues, TAB)).append(NEWLINE); + count++; + } + + return new InterpreterResult(Code.SUCCESS, msg.toString()); + + } catch (Exception ex) { + LOGGER.error("Query execution failed", ex); + return new InterpreterResult(Code.ERROR, ex.getMessage()); + } finally { + currentJobId = null; } } @Override public void close() { LOGGER.info("Close bqsql connection!"); - service = null; } @Override public InterpreterResult interpret(String sql, InterpreterContext contextInterpreter) { LOGGER.info("Run SQL command '{}'", sql); - return executeSql(sql); + return executeSql(sql, contextInterpreter); } @Override @@ -320,18 +300,21 @@ public int getProgress(InterpreterContext context) { @Override public void cancel(InterpreterContext context) { LOGGER.info("Trying to Cancel current query statement."); - - if (service != null && jobId != null && projectId != null) { + if (service != null && currentJobId != null) { try { - Bigquery.Jobs.Cancel request = service.jobs().cancel(projectId, jobId); - JobCancelResponse response = request.execute(); - jobId = null; - LOGGER.info("Query Execution cancelled"); - } catch (IOException ex) { - LOGGER.error("Could not cancel the SQL execution"); + boolean cancelled = service.cancel(currentJobId); + if (cancelled) { + LOGGER.info("Query Execution cancelled"); + } else { + LOGGER.warn("Query Execution cancellation returned false"); + } + } catch (RuntimeException e) { + LOGGER.warn("Failed to cancel BigQuery job {}", currentJobId, e); + } finally { + currentJobId = null; } } else { - LOGGER.info("Query Execution was already cancelled"); + LOGGER.info("Query Execution was already cancelled or not started"); } } diff --git a/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java b/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java index 630530aa948..6e79ed515b5 100644 --- 
a/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java +++ b/bigquery/src/test/java/org/apache/zeppelin/bigquery/BigQueryInterpreterTest.java @@ -19,6 +19,7 @@ import com.google.gson.Gson; import static org.junit.jupiter.api.Assertions.assertEquals; +import java.io.IOException; import java.io.InputStream; import java.io.InputStreamReader; import java.util.Properties; @@ -26,6 +27,7 @@ import org.apache.zeppelin.interpreter.InterpreterContext; import org.apache.zeppelin.interpreter.InterpreterGroup; import org.apache.zeppelin.interpreter.InterpreterResult; +import org.apache.zeppelin.user.AuthenticationInfo; import org.junit.jupiter.api.BeforeAll; import org.junit.jupiter.api.BeforeEach; import org.junit.jupiter.api.Test; @@ -52,9 +54,11 @@ public String getWrong() { protected static Constants constants = null; @BeforeAll - public static void initConstants() { - InputStream is = ClassLoader.class.getResourceAsStream("/constants.json"); - constants = (new Gson()). fromJson(new InputStreamReader(is), Constants.class); + public static void initConstants() throws IOException { + try (InputStream is = BigQueryInterpreterTest.class.getResourceAsStream("/constants.json"); + InputStreamReader reader = new InputStreamReader(is)) { + constants = (new Gson()). fromJson(reader, Constants.class); + } } private InterpreterGroup intpGroup; @@ -75,6 +79,10 @@ public void setUp() throws Exception { bqInterpreter = new BigQueryInterpreter(p); bqInterpreter.setInterpreterGroup(intpGroup); bqInterpreter.open(); + + context = InterpreterContext.builder() + .setAuthenticationInfo(AuthenticationInfo.ANONYMOUS) + .build(); } @Test diff --git a/docs/interpreter/bigquery.md b/docs/interpreter/bigquery.md index da696a74f2e..837abb52393 100644 --- a/docs/interpreter/bigquery.md +++ b/docs/interpreter/bigquery.md @@ -24,7 +24,7 @@ limitations under the License.
## Overview -[BigQuery](https://cloud.google.com/bigquery/what-is-bigquery) is a highly scalable no-ops data warehouse in the Google Cloud Platform. Querying massive datasets can be time consuming and expensive without the right hardware and infrastructure. Google BigQuery solves this problem by enabling super-fast SQL queries against append-only tables using the processing power of Google's infrastructure. Simply move your data into BigQuery and let us handle the hard work. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data. +[BigQuery](https://cloud.google.com/bigquery/what-is-bigquery) is a highly scalable no-ops data warehouse in the Google Cloud Platform. This interpreter uses the modern [google-cloud-bigquery](https://github.com/googleapis/java-bigquery) Cloud Client Library to provide high-performance data analytics. ## Configuration @@ -60,70 +60,44 @@ limitations under the License.
+## Authentication -## BigQuery API -Zeppelin is built against BigQuery API version v2-rev265-1.21.0 - [API Javadocs](https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/) +The BigQuery interpreter supports two primary ways to authenticate: -## Enabling the BigQuery Interpreter +### 1. Application Default Credentials (ADC) -In a notebook, to enable the **BigQuery** interpreter, click the **Gear** icon and select **bigquery**. +This is the recommended approach for server environments. +- **Within GCP**: If Zeppelin is running on Google Compute Engine (GCE) or Google Kubernetes Engine (GKE), it will automatically use the attached service account. +- **Outside GCP**: Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of your service account JSON key file, or run `gcloud auth application-default login` on the server. -### Provide Application Default Credentials +### 2. Service Account JSON Key (GUI Fallback) -Within Google Cloud Platform (e.g. Google App Engine, Google Compute Engine), -built-in credentials are used by default. +If no environment-level credentials are found, the interpreter will prompt you to input your **Service Account JSON key** directly in the notebook paragraph using an input form. -Outside of GCP, follow the Google API authentication instructions for [Zeppelin Google Cloud Storage](https://zeppelin.apache.org/docs/latest/setup/storage/storage.html#notebook-storage-in-google-cloud-storage) +**How to get a Service Account JSON key:** +1. Go to the [IAM & Admin > Service Accounts](https://console.cloud.google.com/iam-admin/serviceaccounts) page in the GCP Console. +2. Select or create a service account with `BigQuery User` and `BigQuery Data Viewer` roles. +3. Click the **Keys** tab, then **Add Key > Create new key**. +4. Choose **JSON** format and click **Create**. +5. 
When the BigQuery interpreter prompts you in Zeppelin, copy and paste the entire content of this JSON file into the input box. Treat this JSON key as a secret. -## Using the BigQuery Interpreter - -In a paragraph, use `%bigquery.sql` to select the **BigQuery** interpreter and then input SQL statements against your datasets stored in BigQuery. -You can use [BigQuery SQL Reference](https://cloud.google.com/bigquery/query-reference) to build your own SQL. +**Security caution:** Do not paste this key into shared notes, notebooks, version control, or any place where it might be stored or visible to others. Prefer using Application Default Credentials (ADC) or Zeppelin's secure credentials mechanisms where possible, and only use this manual JSON key approach as a fallback when more secure options are not available. -For Example, SQL to query for top 10 departure delays across airports using the flights public dataset +## Using the BigQuery Interpreter -```bash -%bigquery.sql -SELECT departure_airport,count(case when departure_delay>0 then 1 else 0 end) as no_of_delays -FROM [bigquery-samples:airline_ontime_data.flights] -group by departure_airport -order by 2 desc -limit 10 -``` +In a paragraph, use `%bigquery.sql` to select the **BigQuery** interpreter. 
-Another Example, SQL to query for most commonly used java packages from the github data hosted in BigQuery +Example: Query top 10 departure delays using the flights public dataset (Standard SQL) -```bash +```sql %bigquery.sql -SELECT - package, - COUNT(*) count -FROM ( - SELECT - REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package, - id - FROM ( - SELECT - SPLIT(content, '\n') line, - id - FROM - [bigquery-public-data:github_repos.sample_contents] - WHERE - content CONTAINS 'import' - AND sample_path LIKE '%.java' - HAVING - LEFT(line, 6)='import' ) - GROUP BY - package, - id ) -GROUP BY - 1 -ORDER BY - count DESC -LIMIT - 40 +SELECT departure_airport, count(case when departure_delay>0 then 1 else 0 end) as no_of_delays +FROM `bigquery-samples.airline_ontime_data.flights` +GROUP BY departure_airport +ORDER BY 2 DESC +LIMIT 10 ``` ## Technical description -For in-depth technical details on current implementation please refer to [bigquery/README.md](https://github.com/apache/zeppelin/blob/master/bigquery/README.md). +For more implementation details, please refer to the [BigQuery module README](https://github.com/apache/zeppelin/blob/master/bigquery/README.md).
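The rewritten `executeSql()` in this change renders a `TableResult` into Zeppelin's `%table` display format: a TAB-joined header of field names, then one TAB-joined line per row, truncated at `zeppelin.bigquery.max_no_of_rows`. Below is a minimal, self-contained sketch of just that formatting step, using plain lists in place of the BigQuery `FieldValueList` (the class name and sample values are hypothetical, not part of the patch):

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of the %table rendering used by the rewritten executeSql(). */
public class TableFormatSketch {
  private static final char NEWLINE = '\n';
  private static final String TAB = "\t";

  // Header row of field names, then one TAB-joined line per result row,
  // truncated at maxRows (zeppelin.bigquery.max_no_of_rows).
  public static String format(List<String> schemaNames,
                              List<List<String>> rows, long maxRows) {
    StringBuilder msg = new StringBuilder("%table ");
    msg.append(String.join(TAB, schemaNames)).append(NEWLINE);
    long count = 0;
    for (List<String> row : rows) {
      if (count >= maxRows) {
        break;
      }
      msg.append(String.join(TAB, row)).append(NEWLINE);
      count++;
    }
    return msg.toString();
  }

  public static void main(String[] args) {
    System.out.print(format(
        Arrays.asList("departure_airport", "no_of_delays"),
        Arrays.asList(
            Arrays.asList("ORD", "42"),
            Arrays.asList("ATL", "37")),
        100000L));
  }
}
```

In the actual interpreter the row values come from `FieldValue.getValue().toString()` (with nulls rendered as `"null"`), but the string layout Zeppelin consumes is exactly the header-plus-rows shape shown here.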