Ft_v1 QAIC-profiler hotfix#994
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
d887cfd to aa24bb9
quic-akuruvil left a comment
Make sure to run all local unit tests. Also run the existing distributed techniques on smaller samples.
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
…6 flags Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
…ling Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
monkeypatch.setattr(callbacks_module, "init_qaic_profiling", _mock_start)

callback = QAICProfilerCallback(start_step=3, end_step=9, trace_dir="/tmp/hw-trace", device_ids=[2])
Why is the /tmp directory used here? It is best to avoid the root-level /tmp directory.
Now using the project root, and added cleanup code too.
lambda use_profiler, device_type, trace_dir=None: calls.append((use_profiler, device_type, trace_dir)),
)

callback = QAICProfilerCallback(start_step=0, trace_dir="/tmp/hw-trace")
Same as above. The /tmp folder under the root directory fills up fast, so move this to a home folder, even for temporary results. I think these logs are dumped into the qaic-dumps directory (in the current Qeff folder) by default. Can we not keep the same path?
Now using the project root, and added cleanup code too.
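For reference, the "project root plus cleanup" approach from the replies above might look roughly like the sketch below. This is not the code from the PR: `project_trace_dir` is a hypothetical helper, and treating the current working directory as the project root is an assumption.

```python
import shutil
import uuid
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def project_trace_dir(root=None, prefix="hw-trace"):
    """Yield a unique trace directory under `root` and delete it on exit.

    Assumption: when `root` is not given, the current working directory
    stands in for the project root.
    """
    base = Path(root) if root is not None else Path.cwd()
    # A short random suffix keeps parallel test runs from colliding.
    trace_dir = base / f"{prefix}-{uuid.uuid4().hex[:8]}"
    trace_dir.mkdir(parents=True, exist_ok=False)
    try:
        yield trace_dir
    finally:
        # Cleanup runs even if the test body raises.
        shutil.rmtree(trace_dir, ignore_errors=True)
```

A test could then wrap the callback construction in `with project_trace_dir() as d:` and pass `trace_dir=str(d)` to `QAICProfilerCallback`, so nothing is written under the root-level /tmp directory and nothing is left behind afterwards.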
- Use `qaic_op_by_op_verifier_callback` with `training.fp16: false` and `model.torch_dtype: fp32`, for only `1-3` steps.

**References to some commonly used Hugging Face callbacks**:
https://huggingface.co/docs/transformers/en/main_classes/callback
Also update the docs for the model dtype argument.
Added documentation for the torch_dtype, fp16, and bf16 params.
Summarize the changes in the PR description.
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Moves torch_dtype from training config to model config.
Adds explicit training.fp16 and training.bf16 flags with validation.
Hardens QAIC profiler and op-by-op verifier callback handling.
Updates configs, docs, and tests to match the new precision schema.
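
To illustrate the new precision schema, here is a minimal sketch of what such validation could look like. The actual rules in this PR are not shown in the conversation, so everything below is an assumption: `validate_precision` is a hypothetical function, the mutual-exclusion check between `fp16` and `bf16` is a guess at a sensible rule, and the `fp32` default is illustrative only.

```python
def validate_precision(config):
    """Check a nested config dict against the assumed precision schema.

    Assumed schema: `model.torch_dtype` holds the dtype, while
    `training.fp16` and `training.bf16` are boolean flags.
    """
    model = config.get("model", {})
    training = config.get("training", {})

    # torch_dtype moved from the training section to the model section.
    if "torch_dtype" in training:
        raise ValueError("torch_dtype belongs under 'model', not 'training'")

    fp16 = bool(training.get("fp16", False))
    bf16 = bool(training.get("bf16", False))

    # Assumed rule: only one reduced-precision mode may be active at a time.
    if fp16 and bf16:
        raise ValueError("training.fp16 and training.bf16 are mutually exclusive")

    return {
        "torch_dtype": model.get("torch_dtype", "fp32"),  # default is an assumption
        "fp16": fp16,
        "bf16": bf16,
    }


# The op-by-op verifier setup mentioned in the docs, expressed as a dict.
verifier_config = {
    "model": {"torch_dtype": "fp32"},
    "training": {"fp16": False},
}
resolved = validate_precision(verifier_config)
```

Rejecting `training.torch_dtype` outright, rather than silently ignoring it, would surface stale configs written against the old schema.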