This document provides instructions for running BERT Large inference using Intel-optimized TensorFlow on the Intel® Data Center GPU Max Series.
- Intel® Data Center GPU Max Series
- Follow instructions to install the latest ITEX version and other prerequisites.
- Intel® oneAPI Base Toolkit: the following components of the Intel® oneAPI Base Toolkit need to be installed:
  - Intel® oneAPI DPC++ Compiler
  - Intel® oneAPI Threading Building Blocks (oneTBB)
  - Intel® oneAPI Math Kernel Library (oneMKL)

  Follow instructions to download and install the latest oneAPI Base Toolkit.

- Set environment variables for the Intel® oneAPI Base Toolkit. The default installation location `{ONEAPI_ROOT}` is `/opt/intel/oneapi` for the root account and `${HOME}/intel/oneapi` for other accounts:

  ```
  source {ONEAPI_ROOT}/compiler/latest/env/vars.sh
  source {ONEAPI_ROOT}/mkl/latest/env/vars.sh
  source {ONEAPI_ROOT}/tbb/latest/env/vars.sh
  source {ONEAPI_ROOT}/mpi/latest/env/vars.sh
  source {ONEAPI_ROOT}/ccl/latest/env/vars.sh
  ```
-
Download and unzip the BERT Large uncased (whole word masking) model from the google bert repo. Then, download the Stanford Question Answering Dataset (SQuAD) file dev-v1.1.json into the wwm_uncased_L-24_H-1024_A-16 directory that was just unzipped.

```
wget https://storage.googleapis.com/bert_models/2019_05_30/wwm_uncased_L-24_H-1024_A-16.zip
unzip wwm_uncased_L-24_H-1024_A-16.zip
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json -P wwm_uncased_L-24_H-1024_A-16
```

Set the `DATASET_DIR` environment variable to point to that directory when running BERT Large inference using the SQuAD data.
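For reference, dev-v1.1.json follows the SQuAD v1.1 layout: a top-level `data` list of articles, each containing `paragraphs` with a `context` string and its `qas` question/answer entries. The sketch below illustrates that layout with a small inline example (the data and the `count_questions` helper are hypothetical, not taken from the actual file):

```python
def count_questions(squad):
    """Count question entries in a SQuAD v1.1-format dict."""
    return sum(
        len(paragraph["qas"])
        for article in squad["data"]
        for paragraph in article["paragraphs"]
    )

# Minimal inline example mirroring the dev-v1.1.json structure:
sample = {
    "version": "1.1",
    "data": [
        {
            "title": "Example",
            "paragraphs": [
                {
                    "context": "BERT was introduced by Google in 2018.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "Who introduced BERT?",
                            # answer_start is the character offset of the
                            # answer span inside the context string
                            "answers": [{"text": "Google", "answer_start": 23}],
                        }
                    ],
                }
            ],
        }
    ],
}

print(count_questions(sample))  # 1
```

The real file has the same shape, just with many articles; `json.load` on dev-v1.1.json yields a dict like `sample`.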
| Script name | Description |
|---|---|
| `benchmark.sh` | This script runs BERT Large fp16 and fp32 inference. |
| `accuracy.sh` | This script runs BERT Large fp16 and fp32 inference in accuracy mode. |
Install the following prerequisites:

- Create and activate a virtual environment:

  ```
  virtualenv -p python <virtualenv_name>
  source <virtualenv_name>/bin/activate
  ```
- Download the frozen graph model file, and set the `FROZEN_GRAPH` environment variable to point to where it was saved:

  ```
  wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_7_0/fp32_bert_squad.pb
  ```
- Download the pretrained model directory and set the `PRETRAINED_DIR` environment variable to point to where it was saved:

  ```
  wget https://storage.googleapis.com/bert_models/2019_05_30/wwm_uncased_L-24_H-1024_A-16.zip
  unzip wwm_uncased_L-24_H-1024_A-16.zip
  ```
- Download the SQuAD files and set the `SQUAD_DIR` environment variable to point to the directory where they were saved:

  ```
  wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
  wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
  wget https://raw.githubusercontent.com/allenai/bi-att-flow/master/squad/evaluate-v1.1.py
  ```
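The downloaded evaluate-v1.1.py scores a predictions file (a JSON object mapping question ids to answer strings) against dev-v1.1.json; a typical invocation is `python evaluate-v1.1.py dev-v1.1.json predictions.json`. As a rough illustration of the exact-match metric it reports, the sketch below approximates its answer normalization (lowercasing, dropping punctuation and the articles a/an/the); the function names here are illustrative, not the script's actual API:

```python
import re
import string

def normalize_answer(s):
    # Lowercase, strip punctuation, drop articles, and collapse whitespace,
    # approximating the normalization performed by evaluate-v1.1.py.
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, ground_truth):
    """True if two answers agree after normalization."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)

print(exact_match("The Google.", "google"))  # True
```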
- Clone the Model Zoo repository:

  ```
  git clone https://github.com/IntelAI/models.git
  ```
Navigate to the Model Zoo repository directory, and set the environment variables:

```
cd models

export OUTPUT_DIR=<path where output log files will be written>
export PRECISION=<set precision: fp16 or fp32>
export FROZEN_GRAPH=<path to the pretrained model file (*.pb)>
export PRETRAINED_DIR=<path to the pretrained directory>
export SQUAD_DIR=<path to the squad directory>

# Set the `Tile` env variable only when running the `benchmark.sh` script:
export Tile=2

# Run the quickstart script:
./quickstart/language_modeling/tensorflow/bert_large/inference/gpu/<script name>.sh
```
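Before launching, it can help to confirm that all of the variables above are set. The following is a small hypothetical helper for that check (it is not part of the Model Zoo scripts; `Tile` is omitted because it is only needed for `benchmark.sh`):

```python
import os

# Environment variables the quickstart scripts expect to be set.
REQUIRED = ["OUTPUT_DIR", "PRECISION", "FROZEN_GRAPH", "PRETRAINED_DIR", "SQUAD_DIR"]

def missing_vars(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]

# Example with a partially populated environment:
print(missing_vars({"OUTPUT_DIR": "/tmp/logs", "PRECISION": "fp16"}))
# ['FROZEN_GRAPH', 'PRETRAINED_DIR', 'SQUAD_DIR']
```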