This page outlines the workflow for contributing to the ChemNLP project when changes to the Git submodules are required. The project currently has two submodules, gpt-neox and lm-eval2,
both of which are forks of EleutherAI repositories.
Submodules allow us to keep separate Git repositories as subdirectories inside ChemNLP. As these submodules are forks, we can both make any changes we require to them (and pin a specific commit) and periodically integrate changes from the original upstream (EleutherAI) repository.
You can think of both the gpt-neox and lm-eval2 submodules as separate Git repositories with their own remotes, commit history, branches and so on.
In essence, the ChemNLP project only tracks which commit of each submodule we are using (to see this, run git submodule status from the chemnlp directory).
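As a rough illustration of what "tracking a commit" means, the sketch below builds a throwaway parent repository with one submodule and prints its status. All paths and names (sub-upstream, parent, sub) are hypothetical stand-ins rather than the real ChemNLP layout, and the protocol.file.allow override is only there because the sandbox uses local paths instead of URLs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway sandbox; every path and name here is hypothetical.
work="$(mktemp -d)"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# A tiny repository standing in for a submodule's remote.
git init -q "$work/sub-upstream"
git -C "$work/sub-upstream" commit -q --allow-empty -m "initial commit"

# A parent repository that embeds it as a submodule.
git init -q "$work/parent"
git -C "$work/parent" -c protocol.file.allow=always \
    submodule --quiet add "$work/sub-upstream" sub
git -C "$work/parent" commit -q -m "add submodule"

# The parent stores only a commit hash for the path "sub".
pinned_sha="$(git -C "$work/parent" rev-parse HEAD:sub)"
status_line="$(git -C "$work/parent" submodule status)"
echo "pinned commit: $pinned_sha"
echo "status line:  $status_line"
```

The first character of each status line carries meaning: a space means the checked-out commit matches the one the parent records, `+` means it differs, and `-` means the submodule has not been initialized.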
There are many excellent introductions to submodules online and we won't repeat them here. Instead, we'll outline the process for working with them on the ChemNLP project, and we encourage you to read more about them if you are interested. Here are some links you might find useful:
- 7.11 Git Tools - Submodules - section from Pro Git.
- Git submodule docs - the documentation.
The instructions below attempt to guide you through the process of working with submodules. However, if you are still confused please reach out on GitHub or Discord to a project maintainer.
Example of making a change to the gpt-neox submodule for a feature called add-peft-method.
- Fork the ChemNLP repository from your personal GitHub account.
- Clone your fork and the submodules, see: Cloning submodules.
- [Optional, if required for the issue] Install chemnlp in your virtual env using pip install -e (see installation instructions here).
- Make a new branch e.g. feat(sub):add-peft-method in the gpt-neox submodule, not in chemnlp.
- Make changes to the gpt-neox submodule per the issue you are working on.
- Commit changes in the gpt-neox submodule.
- Push the submodule changes to remote and open a PR in gpt-neox.
- Once the changes to the submodule are approved, merge them (or a reviewer will).
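The Git side of the steps above can be sketched in a local sandbox. This is only an approximation under stated assumptions: local directories stand in for GitHub remotes, the file peft_notes.txt is a made-up change, and the branch is named feat/add-peft-method because Git ref names cannot contain a colon. Forking and opening the PR happen on GitHub and are not shown.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Local sandbox standing in for GitHub; every path and name is hypothetical.
work="$(mktemp -d)"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
# Only needed because the sandbox uses local paths rather than URLs.
allow_file=(-c protocol.file.allow=always)

# Stand-ins for the gpt-neox remote and for your chemnlp fork.
git init -q "$work/gpt-neox-remote"
git -C "$work/gpt-neox-remote" commit -q --allow-empty -m "initial commit"
git init -q "$work/chemnlp-fork"
git -C "$work/chemnlp-fork" "${allow_file[@]}" \
    submodule --quiet add "$work/gpt-neox-remote" gpt-neox
git -C "$work/chemnlp-fork" commit -q -m "add gpt-neox submodule"

# Clone the fork together with its submodules.
git "${allow_file[@]}" clone -q --recurse-submodules \
    "$work/chemnlp-fork" "$work/chemnlp"
sub="$work/chemnlp/gpt-neox"

# Create the feature branch inside the submodule, not in chemnlp.
git -C "$sub" checkout -q -b feat/add-peft-method

# Make and commit a (placeholder) change in the submodule.
echo "placeholder change" > "$sub/peft_notes.txt"
git -C "$sub" add peft_notes.txt
git -C "$sub" commit -q -m "feat: add peft method"

# Push the branch to the submodule's remote; the PR would be opened from it.
git -C "$sub" push -q -u origin feat/add-peft-method

local_sha="$(git -C "$sub" rev-parse HEAD)"
pushed_sha="$(git -C "$work/gpt-neox-remote" rev-parse feat/add-peft-method)"
pinned_sha="$(git -C "$work/chemnlp" rev-parse HEAD:gpt-neox)"
echo "pushed to submodule remote: $pushed_sha"
echo "still pinned by chemnlp:    $pinned_sha"
```

The two hashes printed at the end differ: pushing to the submodule's remote leaves the commit recorded by the parent repository untouched.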
The above only updates the gpt-neox submodule on remote - it does not change which commit chemnlp is tracking. To do this:
- On your fork of chemnlp, update to get the latest changes for the gpt-neox submodule only: git submodule update --remote gpt-neox
- This will checkout the latest commit on the main branch of gpt-neox.
  - Note: if you want to track a different commit of gpt-neox other than the latest then navigate to the gpt-neox directory and checkout a specific commit (e.g. your recent merge commit from the gpt-neox pull request above): git checkout <commit-hash>
- In chemnlp make a new branch e.g. feat:update-gpt-neox-submodule
- Commit this change, push to your fork's remote and open a PR from your fork to the ChemNLP repository which will update the commit the chemnlp project tracks.
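A minimal sandbox sketch of updating the tracked commit follows. It assumes local directories in place of GitHub remotes, an empty commit in place of the merged pull request, and a branch named feat/update-gpt-neox-submodule (a slash replaces the colon, which Git does not allow in ref names). Pushing to your fork and opening the PR are omitted.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Local sandbox; every path and name is hypothetical.
work="$(mktemp -d)"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
# Only needed because the sandbox uses local paths rather than URLs.
allow_file=(-c protocol.file.allow=always)

# Stand-ins for the gpt-neox remote and the chemnlp checkout.
git init -q "$work/gpt-neox-remote"
git -C "$work/gpt-neox-remote" commit -q --allow-empty -m "initial commit"
git init -q "$work/chemnlp"
git -C "$work/chemnlp" "${allow_file[@]}" \
    submodule --quiet add "$work/gpt-neox-remote" gpt-neox
git -C "$work/chemnlp" commit -q -m "add gpt-neox submodule"
old_sha="$(git -C "$work/chemnlp" rev-parse HEAD:gpt-neox)"

# Simulate the merged pull request landing on the submodule's remote.
git -C "$work/gpt-neox-remote" commit -q --allow-empty -m "merge add-peft-method"
merged_sha="$(git -C "$work/gpt-neox-remote" rev-parse HEAD)"

# Move the gpt-neox checkout to the latest commit on its remote branch.
git -C "$work/chemnlp" "${allow_file[@]}" \
    submodule --quiet update --remote gpt-neox

# Record the new commit on a branch of the parent repository.
git -C "$work/chemnlp" checkout -q -b feat/update-gpt-neox-submodule
git -C "$work/chemnlp" add gpt-neox
git -C "$work/chemnlp" commit -q -m "chore: update gpt-neox submodule"

tracked_sha="$(git -C "$work/chemnlp" rev-parse HEAD:gpt-neox)"
echo "previously tracked: $old_sha"
echo "now tracked:        $tracked_sha"
```

To pin a commit other than the latest, the `submodule update --remote` step would be replaced by a `git checkout <commit-hash>` inside the gpt-neox directory before running `git add gpt-neox`.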
Things to note:
- The remote of chemnlp should be your fork.
- The remote of gpt-neox should be the OpenBioML fork.
To see the remotes for a Git repository run: git remote -v
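For illustration only, the sketch below configures two empty repositories with placeholder remotes and lists them. The URLs are invented stand-ins on example.com, not the project's real addresses; in a real checkout you would only run the `git remote -v` lines.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"
# Placeholder URLs; substitute the real fork and submodule addresses.
fork_url="https://example.com/your-username/chemnlp.git"
sub_url="https://example.com/OpenBioML/gpt-neox.git"

# Two empty repositories standing in for chemnlp and its gpt-neox submodule.
git init -q "$work/chemnlp"
git -C "$work/chemnlp" remote add origin "$fork_url"
git init -q "$work/chemnlp/gpt-neox"
git -C "$work/chemnlp/gpt-neox" remote add origin "$sub_url"

# List the fetch and push URLs of each repository.
echo "chemnlp remotes:"
git -C "$work/chemnlp" remote -v
echo "gpt-neox remotes:"
git -C "$work/chemnlp/gpt-neox" remote -v

chemnlp_origin="$(git -C "$work/chemnlp" remote get-url origin)"
neox_origin="$(git -C "$work/chemnlp/gpt-neox" remote get-url origin)"
```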
If you need to make changes to the main chemnlp project at the same time as a submodule, the above workflow can be modified to accommodate this. It's advisable to make changes to the submodule first and then, once these are merged, submit a PR to the ChemNLP repository which (i) adds changes to chemnlp and (ii) updates the gpt-neox commit which chemnlp tracks.
Usually, when working with Git, you have a certain branch checked out. However, Git also allows you to check out any arbitrary commit. Working in such a non-branch scenario is called having a "detached HEAD".
With submodules: using the update command (e.g. git submodule update) on a submodule checks out a specific commit - not a branch. This means that the submodule repository will be in a "detached HEAD" state.
🚨 Don't commit on a detached HEAD 🚨
When you work in the submodule directly you should create or checkout a branch before committing your work.
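A small sandbox sketch of this situation is given below, assuming hypothetical repository names (sub-remote, parent-remote, sub) and a made-up branch name my-feature. It checks the state of the submodule right after a recursive clone and again after creating a branch.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Local sandbox; every path and name is hypothetical.
work="$(mktemp -d)"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
# Only needed because the sandbox uses local paths rather than URLs.
allow_file=(-c protocol.file.allow=always)

git init -q "$work/sub-remote"
git -C "$work/sub-remote" commit -q --allow-empty -m "initial commit"
git init -q "$work/parent-remote"
git -C "$work/parent-remote" "${allow_file[@]}" \
    submodule --quiet add "$work/sub-remote" sub
git -C "$work/parent-remote" commit -q -m "add submodule"

# A recursive clone checks out the pinned commit, not a branch.
git "${allow_file[@]}" clone -q --recurse-submodules \
    "$work/parent-remote" "$work/parent"
before="$(git -C "$work/parent/sub" rev-parse --abbrev-ref HEAD)"

# Create a branch before committing any work in the submodule.
git -C "$work/parent/sub" checkout -q -b my-feature
after="$(git -C "$work/parent/sub" rev-parse --abbrev-ref HEAD)"

echo "after clone:           $before"
echo "after creating branch: $after"
```

`git rev-parse --abbrev-ref HEAD` prints the literal string HEAD when no branch is checked out, which makes it a convenient check before committing.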
See also: why did Git detach my HEAD?
Any checkout of a commit that is not the name of one of your branches will get you a detached HEAD. A SHA1 which represents the tip of a branch still gives a detached HEAD. Only a checkout of a local branch name avoids that mode.
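The point about branch tips can be checked in a scratch repository. The sketch below uses a hypothetical path and checks out the hash of the commit the only branch points to, then the branch name itself.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch repository; the path is hypothetical.
work="$(mktemp -d)"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q "$work/repo"
git -C "$work/repo" commit -q --allow-empty -m "initial commit"
branch="$(git -C "$work/repo" rev-parse --abbrev-ref HEAD)"
tip_sha="$(git -C "$work/repo" rev-parse HEAD)"

# Checking out the hash detaches HEAD, even though it is the branch tip.
git -C "$work/repo" checkout -q "$tip_sha"
by_sha="$(git -C "$work/repo" rev-parse --abbrev-ref HEAD)"

# Checking out the local branch name attaches HEAD again.
git -C "$work/repo" checkout -q "$branch"
by_name="$(git -C "$work/repo" rev-parse --abbrev-ref HEAD)"

echo "checked out by hash: $by_sha"
echo "checked out by name: $by_name"
```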