ResNet50 v1.5 Training

Description

This document provides instructions for running ResNet50 v1.5 training using Intel® Extension for TensorFlow* on Intel® Data Center GPU Max Series.

Hardware and Software Requirements:

  • Intel® Data Center GPU Max Series

  • Follow the instructions to install the latest Intel® Extension for TensorFlow* (ITEX) version and other prerequisites.

  • Intel® oneAPI Base Toolkit: install the following components of the Intel® oneAPI Base Toolkit:

    • Intel® oneAPI DPC++ Compiler

    • Intel® oneAPI Threading Building Blocks (oneTBB)

    • Intel® oneAPI Math Kernel Library (oneMKL)

    • Follow the instructions to download and install the latest oneAPI Base Toolkit.

    • Set the environment variables for the Intel® oneAPI Base Toolkit. The default installation location, ${ONEAPI_ROOT}, is /opt/intel/oneapi for the root account and ${HOME}/intel/oneapi for other accounts:

      # Use ${HOME}/intel/oneapi instead for a non-root installation
      export ONEAPI_ROOT=${ONEAPI_ROOT:-/opt/intel/oneapi}
      source "${ONEAPI_ROOT}/compiler/latest/env/vars.sh"
      source "${ONEAPI_ROOT}/mkl/latest/env/vars.sh"
      source "${ONEAPI_ROOT}/tbb/latest/env/vars.sh"
      source "${ONEAPI_ROOT}/mpi/latest/env/vars.sh"
      source "${ONEAPI_ROOT}/ccl/latest/env/vars.sh"
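If one of the components is not installed, sourcing its vars.sh fails and the remaining variables may be left unset. The following is a minimal sketch, assuming bash, that sources each component's environment script only when it exists and reports the missing ones. The function name source_oneapi_env is hypothetical; the component list and the <root>/<component>/latest/env/vars.sh layout are taken from the commands above.

```shell
# Hypothetical helper: source the oneAPI vars.sh scripts listed above,
# skipping (and reporting) any component that is not installed.
# Usage: source_oneapi_env [oneapi_root]
source_oneapi_env() {
  # Fall back to $ONEAPI_ROOT, then to the root-account default location.
  local root="${1:-${ONEAPI_ROOT:-/opt/intel/oneapi}}"
  local missing=0 component vars_script
  # Clear this function's positional parameters so the sourced vars.sh
  # scripts do not see the root path as one of their own arguments.
  set --
  for component in compiler mkl tbb mpi ccl; do
    vars_script="${root}/${component}/latest/env/vars.sh"
    if [ -f "$vars_script" ]; then
      source "$vars_script"
    else
      echo "oneAPI component not found: ${vars_script}" >&2
      missing=$((missing + 1))
    fi
  done
  # The exit status is the number of components that were not found.
  return "$missing"
}
```

Because the scripts are sourced rather than executed, the variables they export remain set in the calling shell, just as with the individual source commands above.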

Datasets

Download and preprocess the ImageNet dataset using the instructions here. After running the conversion script you should have a directory with the ImageNet dataset in the TF records format.

Set DATASET_DIR to the TF records directory before running ResNet50 v1.5.
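A wrong DATASET_DIR usually surfaces only once training starts. The sketch below, assuming bash, checks up front that the directory exists and contains training and validation shards. The function name check_dataset_dir is hypothetical, and the train-* / validation-* shard naming (for example train-00000-of-01024) is an assumption about the conversion script's output; adjust the patterns if your shards are named differently.

```shell
# Hypothetical pre-flight check for the ImageNet TF records directory.
# Usage: check_dataset_dir [dir]   (defaults to $DATASET_DIR)
check_dataset_dir() {
  local dir="${1:-${DATASET_DIR:-}}"
  local split
  if [ -z "$dir" ]; then
    echo "DATASET_DIR is not set" >&2
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "DATASET_DIR is not a directory: $dir" >&2
    return 1
  fi
  for split in train validation; do
    # Expand the glob into this function's positional parameters. If nothing
    # matches, the pattern stays literal (or is dropped under nullglob),
    # so the -e test below fails.
    set -- "${dir}/${split}"-*
    if [ ! -e "${1:-}" ]; then
      echo "no ${split}-* TF record shards found in $dir" >&2
      return 1
    fi
  done
  echo "found train and validation shards in $dir"
}
```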

Quick Start Scripts

Script name                  Description
bfloat16_training_full.sh    Runs full bfloat16 training
bfloat16_training_hvd.sh     Runs bfloat16 training with Intel® Optimization for Horovod*

Run the model

Complete the following prerequisites:

  • Create and activate a virtual environment:
    virtualenv -p python <virtualenv_name>
    source <virtualenv_name>/bin/activate
  • Clone the Model Zoo repository:
    git clone https://github.com/IntelAI/models.git
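The steps above rely on python, virtualenv, and git being available. A minimal sketch, assuming bash, for confirming this before starting; the function name check_prereqs is hypothetical and only checks that each command is on PATH, not its version.

```shell
# Hypothetical helper: report every named command that is not on PATH.
# Usage: check_prereqs cmd [cmd ...]
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    # command -v succeeds only if the shell can resolve the command.
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing prerequisite: $cmd" >&2
      missing=$((missing + 1))
    fi
  done
  # The exit status is the number of missing commands (0 means all found).
  return "$missing"
}
```

For example, check_prereqs python virtualenv git prints one line per missing tool and exits with status 0 only when all three are installed.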

See the Datasets section above for instructions on downloading and preprocessing the ImageNet dataset. Before running a quickstart script, set the DATASET_DIR environment variable to the directory containing the ImageNet TF records files.

Run the model on Bare Metal

Navigate to the cloned models directory and set the environment variables:

cd models
export OUTPUT_DIR=<path where output log files will be written>
export PRECISION=bfloat16
export DATASET_DIR=<path to the preprocessed imagenet dataset directory>

# Optional environment variables
export BATCH_SIZE=<batch size; if unset, the script uses its default batch size>

# Run the Horovod quickstart script:
./quickstart/image_recognition/tensorflow/resnet50v1_5/training/gpu/bfloat16_training_hvd.sh

# Set the 'Tile' environment variable only when running the bfloat16_training_full.sh script:
export Tile=2
./quickstart/image_recognition/tensorflow/resnet50v1_5/training/gpu/bfloat16_training_full.sh
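The two runs differ only in the script path and in whether Tile is set. The sketch below, assuming bash and that it is used from the models directory, captures this and checks the required variables first. The names require_env, select_quickstart, QUICKSTART_DIR, and QUICKSTART_SCRIPT are hypothetical. Clearing Tile for the Horovod script is an assumption based on the note that Tile applies only to bfloat16_training_full.sh; treating OUTPUT_DIR, PRECISION, and DATASET_DIR as required follows the commands above.

```shell
# Hypothetical helpers for choosing and launching a quickstart script.
QUICKSTART_DIR=quickstart/image_recognition/tensorflow/resnet50v1_5/training/gpu

# Fail if any named environment variable is unset or empty.
require_env() {
  local name value missing=0
  for name in "$@"; do
    # Bash indirect expansion: the value of the variable whose name is in $name.
    value="${!name:-}"
    if [ -z "$value" ]; then
      echo "required environment variable not set: $name" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Pick the script for a mode ("full" or "hvd"); set Tile only for "full".
select_quickstart() {
  case "$1" in
    full)
      QUICKSTART_SCRIPT="${QUICKSTART_DIR}/bfloat16_training_full.sh"
      export Tile=2
      ;;
    hvd)
      QUICKSTART_SCRIPT="${QUICKSTART_DIR}/bfloat16_training_hvd.sh"
      unset Tile
      ;;
    *)
      echo "unknown mode: $1 (expected full or hvd)" >&2
      return 1
      ;;
  esac
}
```

For example: select_quickstart full && require_env OUTPUT_DIR PRECISION DATASET_DIR && "./${QUICKSTART_SCRIPT}". BATCH_SIZE is left out of require_env because it is optional.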

License

LICENSE