This document has instructions for running ResNet50 v1.5 training using Intel® Extension for TensorFlow* with Intel® Data Center GPU Max Series.
- Intel® Data Center GPU Max Series
- Follow the instructions to install the latest Intel® Extension for TensorFlow* (ITEX) version and other prerequisites.
- Intel® oneAPI Base Toolkit: the following components of the Intel® oneAPI Base Toolkit must be installed:
  - Intel® oneAPI DPC++ Compiler
  - Intel® oneAPI Threading Building Blocks (oneTBB)
  - Intel® oneAPI Math Kernel Library (oneMKL)
- Follow the instructions to download and install the latest oneAPI Base Toolkit.
- Set environment variables for Intel® oneAPI Base Toolkit. The default installation location `{ONEAPI_ROOT}` is `/opt/intel/oneapi` for the root account and `${HOME}/intel/oneapi` for other accounts:

      source {ONEAPI_ROOT}/compiler/latest/env/vars.sh
      source {ONEAPI_ROOT}/mkl/latest/env/vars.sh
      source {ONEAPI_ROOT}/tbb/latest/env/vars.sh
      source {ONEAPI_ROOT}/mpi/latest/env/vars.sh
      source {ONEAPI_ROOT}/ccl/latest/env/vars.sh

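
The environment setup above can be sketched as a small shell snippet that picks the default install root by user ID and sources each component's `vars.sh` only when the file exists. The helper names `default_oneapi_root` and `source_oneapi_env` are hypothetical and not part of oneAPI; the component list and paths are taken from the commands above. Treat it as a sketch under those assumptions, not as the official setup procedure.

```shell
# Hypothetical helper: print the default oneAPI install root.
# /opt/intel/oneapi for root (uid 0), ${HOME}/intel/oneapi otherwise.
default_oneapi_root() {
    # $1: numeric user id (optional; defaults to the current user)
    uid="${1:-$(id -u)}"
    if [ "$uid" -eq 0 ]; then
        printf '%s\n' "/opt/intel/oneapi"
    else
        printf '%s\n' "${HOME}/intel/oneapi"
    fi
}

# Hypothetical helper: source the per-component vars.sh files under a root.
# Missing files produce a warning instead of aborting the shell.
source_oneapi_env() {
    root="$1"
    for component in compiler mkl tbb mpi ccl; do
        vars_file="${root}/${component}/latest/env/vars.sh"
        if [ -f "$vars_file" ]; then
            . "$vars_file"
        else
            echo "warning: ${vars_file} not found" >&2
        fi
    done
    return 0
}

# Honor an existing ONEAPI_ROOT, otherwise fall back to the default location.
source_oneapi_env "${ONEAPI_ROOT:-$(default_oneapi_root)}"
```

The snippet must be sourced (or pasted into the current shell) rather than executed as a separate script, because the exported variables need to persist in the shell that later runs the training scripts.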
Download and preprocess the ImageNet dataset using the instructions here. After running the conversion script, you should have a directory containing the ImageNet dataset in the TF records format.
Set the `DATASET_DIR` environment variable to point to the TF records directory when running ResNet50 v1.5.
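
Before starting a long training run, it can help to confirm that `DATASET_DIR` really contains TF records. The sketch below counts shard files in the directory. It assumes the conversion script names its output `train-*` and `validation-*`, which is a common convention but is not stated in this document; adjust the prefixes if your files are named differently. `count_shards` and `check_dataset_dir` are hypothetical helper names.

```shell
# Hypothetical helper: print how many regular files named "<prefix>-*"
# exist in a directory.
# $1: dataset directory, $2: shard prefix (assumed: "train" or "validation")
count_shards() {
    n=0
    for f in "$1"/"$2"-*; do
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    printf '%s\n' "$n"
}

# Hypothetical helper: succeed only if the directory exists and holds at
# least one training shard.
check_dataset_dir() {
    dir="${1:-}"
    if [ ! -d "$dir" ]; then
        echo "not a directory: '$dir'" >&2
        return 1
    fi
    train=$(count_shards "$dir" train)
    val=$(count_shards "$dir" validation)
    echo "found $train train and $val validation shard(s) in $dir"
    [ "$train" -gt 0 ]
}

# Usage (not executed here):
#   check_dataset_dir "$DATASET_DIR" || echo "DATASET_DIR looks wrong" >&2
```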

| Script name | Description |
|---|---|
| `bfloat16_training_full.sh` | Runs full bfloat16 training |
| `bfloat16_training_hvd.sh` | Runs bfloat16 training with Intel® Optimization for Horovod* |

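
When the two scripts are driven from a wrapper, a short lookup keeps the choice explicit. The following is a minimal sketch: `quickstart_script` and the mode names `full` and `hvd` are hypothetical, and the directory is the one used by the run commands in this document, relative to the root of the cloned repository.

```shell
# Directory holding the quickstart scripts, relative to the repository root.
QUICKSTART_DIR="quickstart/image_recognition/tensorflow/resnet50v1_5/training/gpu"

# Hypothetical helper: map a short mode name to the matching script path.
# $1: "full" (full bfloat16 training) or "hvd" (training with Horovod)
quickstart_script() {
    case "${1:-}" in
        full) printf '%s\n' "${QUICKSTART_DIR}/bfloat16_training_full.sh" ;;
        hvd)  printf '%s\n' "${QUICKSTART_DIR}/bfloat16_training_hvd.sh" ;;
        *)    echo "unknown mode '${1:-}' (expected: full or hvd)" >&2
              return 1 ;;
    esac
}

# Usage (not executed here), from the repository root:
#   "./$(quickstart_script hvd)"
```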
Install the following prerequisites:
- Create and activate a virtual environment:

      virtualenv -p python <virtualenv_name>
      source <virtualenv_name>/bin/activate

- Clone the Model Zoo repository:

      git clone https://github.com/IntelAI/models.git

See the datasets section of this document for instructions on
downloading and preprocessing the ImageNet dataset. Set the `DATASET_DIR`
environment variable to the path of the ImageNet TF records files
before running a quickstart script.
Navigate to the cloned `models` directory, set the environment variables, and run a quickstart script:

    cd models
    export OUTPUT_DIR=<path where output log files will be written>
    export PRECISION=bfloat16
    export DATASET_DIR=<path to the preprocessed imagenet dataset directory>
    # Optional envs
    export BATCH_SIZE=<batch size; if unset, the script runs with its default batch size>
    # Run quickstart script:
    ./quickstart/image_recognition/tensorflow/resnet50v1_5/training/gpu/bfloat16_training_hvd.sh
    # Set 'Tile' env variable only for running "bfloat16_training_full.sh" script:
    export Tile=2
    ./quickstart/image_recognition/tensorflow/resnet50v1_5/training/gpu/bfloat16_training_full.sh
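
A missing variable is easier to diagnose before the script starts than from a training log. The sketch below checks the variables exported above. `preflight_check` is a hypothetical helper, and the rule that `BATCH_SIZE`, when set, must consist only of digits is an assumption made for illustration; the quickstart scripts may perform their own validation.

```shell
# Hypothetical pre-flight helper: verify the variables the run commands
# above export before launching a quickstart script.
preflight_check() {
    missing=""
    for name in OUTPUT_DIR PRECISION DATASET_DIR; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing required variables:$missing" >&2
        return 1
    fi
    # BATCH_SIZE is optional; assumption: when set it must be digits only.
    case "${BATCH_SIZE:-}" in
        "") ;;  # unset or empty: the script uses its default batch size
        *[!0-9]*)
            echo "BATCH_SIZE must be a whole number, got '${BATCH_SIZE}'" >&2
            return 1 ;;
    esac
    return 0
}

# Usage (not executed here):
#   preflight_check && ./quickstart/image_recognition/tensorflow/resnet50v1_5/training/gpu/bfloat16_training_hvd.sh
```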