This document provides instructions for running BERT Large training using Intel-optimized TensorFlow on the Intel® Data Center GPU Max Series.
- Intel® Data Center GPU Max Series
- Intel® Extension for TensorFlow* (ITEX): follow the instructions to install the latest ITEX version and other prerequisites.
- Intel® oneAPI Base Toolkit: install the following components of the Intel® oneAPI Base Toolkit:
  - Intel® oneAPI DPC++ Compiler
  - Intel® oneAPI Threading Building Blocks (oneTBB)
  - Intel® oneAPI Math Kernel Library (oneMKL)

  Follow the instructions to download and install the latest oneAPI Base Toolkit.

  Set the environment variables for the Intel® oneAPI Base Toolkit. The default installation location `{ONEAPI_ROOT}` is `/opt/intel/oneapi` for the root account and `${HOME}/intel/oneapi` for other accounts:

  ```
  source {ONEAPI_ROOT}/compiler/latest/env/vars.sh
  source {ONEAPI_ROOT}/mkl/latest/env/vars.sh
  source {ONEAPI_ROOT}/tbb/latest/env/vars.sh
  source {ONEAPI_ROOT}/mpi/latest/env/vars.sh
  source {ONEAPI_ROOT}/ccl/latest/env/vars.sh
  ```
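The `source` lines above can be sketched as a single loop that skips components that are not installed. This is an illustrative sketch, not part of the quickstart scripts; it assumes the root-account default install location when `ONEAPI_ROOT` is unset.

```shell
# Source each required oneAPI component's environment script, warning
# (rather than failing) when a component is missing.
ONEAPI_ROOT="${ONEAPI_ROOT:-/opt/intel/oneapi}"   # assumed default location
for component in compiler mkl tbb mpi ccl; do
    vars="${ONEAPI_ROOT}/${component}/latest/env/vars.sh"
    if [ -f "${vars}" ]; then
        # shellcheck disable=SC1090
        . "${vars}"
    else
        echo "warning: ${vars} not found; is ${component} installed?" >&2
    fi
done
```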
-
Download and extract the BERT Large uncased (whole word masking) pretrained model checkpoints
from the Google BERT repo.
The path to the extracted directory should be set in the `BERT_LARGE_DIR` environment
variable when running the quickstart scripts. A dummy dataset will be auto-generated and
used by the training scripts.
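Before launching a script, it can help to sanity-check that the checkpoint directory was extracted completely. The file names below are the usual contents of the `wwm_uncased_L-24_H-1024_A-16` archive; this helper is an illustrative sketch, not something the quickstart scripts provide.

```shell
# Check that a directory looks like a fully extracted BERT Large
# checkpoint (assumed file names from the Google BERT archive).
check_bert_large_dir() {
    dir="$1"
    for f in bert_config.json vocab.txt bert_model.ckpt.index; do
        if [ ! -f "${dir}/${f}" ]; then
            echo "missing ${dir}/${f}" >&2
            return 1
        fi
    done
    echo "BERT_LARGE_DIR looks complete: ${dir}"
}
```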
| Script name | Description |
|-------------|-------------|
| `bfloat16_training.sh` | bfloat16 precision script for BERT Large pretraining |
| `bfloat16_training_hvd.sh` | bfloat16 precision script for BERT Large pretraining with Intel® Optimization for Horovod* support |
Install the following prerequisites:
- Create and activate a virtual environment:
  ```
  virtualenv -p python <virtualenv_name>
  source <virtualenv_name>/bin/activate
  ```
- Clone the Model Zoo repository:
  ```
  git clone https://github.com/IntelAI/models.git
  ```
See the datasets section of this document for instructions on
downloading the pretrained model. The path to
this directory will need to be set in the `BERT_LARGE_DIR`
environment variable prior to running a quickstart script.
Navigate to the BERT Large training directory and set the environment variables:

```
cd models

export OUTPUT_DIR=<path where output log files will be written>
export PRECISION=bfloat16
export BERT_LARGE_DIR=<path to the wwm_uncased_L-24_H-1024_A-16 directory>

# Set the following `Tile` env variable only when running the `bfloat16_training.sh` script:
export Tile=2

# Run the `bfloat16_training.sh` script:
./quickstart/language_modeling/tensorflow/bert_large/training/gpu/bfloat16_training.sh

# To run the `bfloat16_training_hvd.sh` script, first install its dependencies:
./quickstart/language_modeling/tensorflow/bert_large/training/gpu/setup.sh
./quickstart/language_modeling/tensorflow/bert_large/training/gpu/bfloat16_training_hvd.sh
```
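The steps above can be wrapped in a small launcher that verifies the required environment variables are set before invoking a quickstart script. Both `require_env` and `launch_training` are hypothetical helpers for illustration, not part of the Model Zoo repo.

```shell
# Fail early with a clear message if a required variable is unset or empty.
require_env() {
    name="$1"
    eval "value=\${$name}"
    if [ -z "${value}" ]; then
        echo "error: ${name} must be set before running the quickstart scripts" >&2
        return 1
    fi
}

# Validate the environment, then launch the named quickstart script.
launch_training() {
    require_env OUTPUT_DIR && require_env PRECISION && require_env BERT_LARGE_DIR || return 1
    script="./quickstart/language_modeling/tensorflow/bert_large/training/gpu/$1"
    echo "would run: ${script}"   # replace this echo with "${script}" to actually launch
}
```

For example, `launch_training bfloat16_training.sh` aborts with a message if `BERT_LARGE_DIR` was never exported, instead of failing partway through training.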