
BERT Large training for Intel® Data Center GPU Max Series

Description

This document has instructions for running BERT Large training using Intel-optimized TensorFlow with Intel® Data Center GPU Max Series.

Software Requirements:

  • Intel® Data Center GPU Max Series

  • Follow the instructions to install the latest Intel® Extension for TensorFlow* (ITEX) version and other prerequisites.

  • Intel® oneAPI Base Toolkit: install the following components of the Intel® oneAPI Base Toolkit:

    • Intel® oneAPI DPC++ Compiler

    • Intel® oneAPI Threading Building Blocks (oneTBB)

    • Intel® oneAPI Math Kernel Library (oneMKL)

    • Follow instructions to download and install the latest oneAPI Base Toolkit.

    • Set environment variables for the Intel® oneAPI Base Toolkit. The default installation location {ONEAPI_ROOT} is /opt/intel/oneapi for the root account and ${HOME}/intel/oneapi for other accounts:

      source {ONEAPI_ROOT}/compiler/latest/env/vars.sh
      source {ONEAPI_ROOT}/mkl/latest/env/vars.sh
      source {ONEAPI_ROOT}/tbb/latest/env/vars.sh
      source {ONEAPI_ROOT}/mpi/latest/env/vars.sh
      source {ONEAPI_ROOT}/ccl/latest/env/vars.sh
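
Once the vars.sh scripts have been sourced, a quick sanity check can confirm the tools are on PATH. This is a sketch, not part of the official setup; `icpx` and `mpirun` are the commands shipped with the DPC++ compiler and Intel® MPI components assumed above:

```shell
# Sketch: verify that sourcing the vars.sh scripts put the oneAPI
# tools on PATH (icpx and mpirun are assumed component commands).
for tool in icpx mpirun; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found on PATH"
    else
        echo "$tool: not found (re-run the vars.sh scripts above)"
    fi
done
```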

Datasets

Pretrained models

Download and extract the BERT Large, Uncased (Whole Word Masking) pretrained model checkpoints from the Google BERT repository. Set the BERT_LARGE_DIR environment variable to the extracted directory before running the quickstart scripts. A dummy dataset is auto-generated and used by the training scripts.
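
As a quick sanity check after extraction, the sketch below confirms the checkpoint directory looks complete. The path is hypothetical; the file names are those shipped in the google-research/bert checkpoint archive:

```shell
# Hypothetical path; replace with wherever the checkpoint was extracted.
export BERT_LARGE_DIR=$HOME/wwm_uncased_L-24_H-1024_A-16

# Files expected inside the extracted checkpoint directory.
for f in bert_config.json vocab.txt bert_model.ckpt.index; do
    if [ -e "$BERT_LARGE_DIR/$f" ]; then
        echo "$f: present"
    else
        echo "$f: missing"
    fi
done
```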

Quick Start Scripts

  • bfloat16_training.sh: bfloat16 precision script for BERT Large pretraining
  • bfloat16_training_hvd.sh: bfloat16 precision script for BERT Large pretraining with Intel® Optimization for Horovod* support

Run the model

Install the following prerequisites:

  • Create and activate a virtual environment:
    virtualenv -p python <virtualenv_name>
    source <virtualenv_name>/bin/activate
  • Clone the Model Zoo repository:
    git clone https://github.com/IntelAI/models.git

See the datasets section of this document for instructions on downloading the pretrained model. A path to this directory will need to be set in the BERT_LARGE_DIR environment variable prior to running a quickstart script.

Run the model on Baremetal

Navigate to the BERT Large training directory, and set environment variables:

cd models

export OUTPUT_DIR=<path where output log files will be written>
export PRECISION=bfloat16
export BERT_LARGE_DIR=<path to the wwm_uncased_L-24_H-1024_A-16 directory>

# Set the following `Tile` env variable only for running `bfloat16_training.sh` script:
export Tile=2
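
# Optional pre-flight check (a sketch, not part of the quickstart
# scripts): warn if a required variable is unset before launching.
for var in OUTPUT_DIR PRECISION BERT_LARGE_DIR; do
    if [ -z "$(eval echo "\$$var")" ]; then
        echo "Warning: $var is not set" >&2
    fi
done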

# Run the `bfloat16_training.sh` script:
./quickstart/language_modeling/tensorflow/bert_large/training/gpu/bfloat16_training.sh

# To run the `bfloat16_training_hvd.sh` script, first install its dependencies:
./quickstart/language_modeling/tensorflow/bert_large/training/gpu/setup.sh
./quickstart/language_modeling/tensorflow/bert_large/training/gpu/bfloat16_training_hvd.sh

License

LICENSE