
BERT Large training for Intel® Data Center GPU Max Series

Description

This document has instructions for running BERT Large training using Intel-optimized TensorFlow with Intel® Data Center GPU Max Series.

Software Requirements:

  • Intel® Data Center GPU Max Series

  • Follow the instructions to install the latest Intel® Extension for TensorFlow* (ITEX) version and other prerequisites.

  • Intel® oneAPI Base Toolkit: install the following components of the Intel® oneAPI Base Toolkit:

    • Intel® oneAPI DPC++ Compiler

    • Intel® oneAPI Threading Building Blocks (oneTBB)

    • Intel® oneAPI Math Kernel Library (oneMKL)

    • Follow instructions to download and install the latest oneAPI Base Toolkit.

    • Set environment variables for the Intel® oneAPI Base Toolkit. The default installation location {ONEAPI_ROOT} is /opt/intel/oneapi for the root account and ${HOME}/intel/oneapi for other accounts:

      source {ONEAPI_ROOT}/compiler/latest/env/vars.sh
      source {ONEAPI_ROOT}/mkl/latest/env/vars.sh
      source {ONEAPI_ROOT}/tbb/latest/env/vars.sh
      source {ONEAPI_ROOT}/mpi/latest/env/vars.sh
      source {ONEAPI_ROOT}/ccl/latest/env/vars.sh
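
Once the vars.sh scripts have been sourced, a quick sanity check can confirm the tools are on PATH. This is a sketch, not part of the official setup; `icpx` and `mpirun` are the commands shipped with the DPC++ compiler and Intel® MPI components assumed above:

```shell
# Sketch: verify that sourcing the vars.sh scripts put the oneAPI
# tools on PATH (icpx and mpirun are assumed component commands).
for tool in icpx mpirun; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found on PATH"
    else
        echo "$tool: not found (re-run the vars.sh scripts above)"
    fi
done
```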

Datasets

Pretrained models

Download and extract the BERT Large, Uncased (Whole Word Masking) pretrained model checkpoints from the Google BERT repository. Set the BERT_LARGE_DIR environment variable to the extracted directory before running the quickstart scripts. A dummy dataset is auto-generated and used by the training scripts.
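
As a quick sanity check after extraction, the sketch below confirms the checkpoint directory looks complete. The path is hypothetical; the file names are those shipped in the google-research/bert checkpoint archive:

```shell
# Hypothetical path; replace with wherever the checkpoint was extracted.
export BERT_LARGE_DIR=$HOME/wwm_uncased_L-24_H-1024_A-16

# Files expected inside the extracted checkpoint directory.
for f in bert_config.json vocab.txt bert_model.ckpt.index; do
    if [ -e "$BERT_LARGE_DIR/$f" ]; then
        echo "$f: present"
    else
        echo "$f: missing"
    fi
done
```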

Quick Start Scripts

  • bfloat16_training.sh: bfloat16 precision script for BERT Large pretraining
  • bfloat16_training_hvd.sh: bfloat16 precision script for BERT Large pretraining with Intel® Optimization for Horovod* support

Run the model

Install the following prerequisites:

  • Create and activate a virtual environment:
    virtualenv -p python <virtualenv_name>
    source <virtualenv_name>/bin/activate
  • Clone the Model Zoo repository:
    git clone https://github.com/IntelAI/models.git

See the datasets section of this document for instructions on downloading the pretrained model. A path to this directory will need to be set in the BERT_LARGE_DIR environment variable prior to running a quickstart script.

Run the model on Baremetal

Navigate to the BERT Large training directory, and set environment variables:

cd models

export OUTPUT_DIR=<path where output log files will be written>
export PRECISION=bfloat16
export BERT_LARGE_DIR=<path to the wwm_uncased_L-24_H-1024_A-16 directory>

# Set the following `Tile` env variable only for running `bfloat16_training.sh` script:
export Tile=2
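
# Optional pre-flight check (a sketch, not part of the quickstart
# scripts): warn if a required variable is unset before launching.
for var in OUTPUT_DIR PRECISION BERT_LARGE_DIR; do
    if [ -z "$(eval echo "\$$var")" ]; then
        echo "Warning: $var is not set" >&2
    fi
done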

# Run the `bfloat16_training.sh` script:
./quickstart/language_modeling/tensorflow/bert_large/training/gpu/bfloat16_training.sh

# To run the `bfloat16_training_hvd.sh` script, first install its dependencies:
./quickstart/language_modeling/tensorflow/bert_large/training/gpu/setup.sh
./quickstart/language_modeling/tensorflow/bert_large/training/gpu/bfloat16_training_hvd.sh

License

LICENSE