feat: allow kit init to work with remote repositories#1074
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 833da5bd08
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
amisevsk
left a comment
There was a problem hiding this comment.
Sorry for the delay reviewing this PR.
@arnab2001 could you rebase this PR onto main (or merge changes from main to your PR branch)? I think the changes here are related to / depend on #1007 (especially regarding dataset handling), which is now merged)
Once that is done, I'll review the PR more deeply.
Signed-off-by: Arnab Chatterjee <arnabchat2001@gmail.com>
1f054c5 to
1488780
Compare
|
@amisevsk , I have done the rebase as you suggested , looking forward to your review. |
On Windows, os.Stat on URL-like strings (e.g. https://...) returns syscall.ERROR_INVALID_NAME which is not fs.ErrNotExist, causing HuggingFace URLs to be misclassified as local paths. Only treat fs.ErrPermission specifically as "path exists but unreadable", letting all other non-ErrNotExist errors (including Windows invalid name errors for URLs) fall through to HuggingFace URL parsing. Signed-off-by: Arnab Chatterjee <arnabchat2001@gmail.com>
There was a problem hiding this comment.
This approach (entirely separate flows for remote vs local) leads to quite a bit of code duplication for what is potentially a small difference between the two. Can we unify the flows to differ only where necessary? I would prefer simplicity, where we can add complexity if it is required in the future.
For example, we could
- Improve options handling so that options are more consistently interpreted
- Add flag
--remoteto specify we want a remote init explicitly. This avoids the need for (potentially fragile) checks to infer whether the user means local or remote- With the remote flag, we can verify that remote-only options are not set if
--remoteis not specified
- With the remote flag, we can verify that remote-only options are not set if
- Change how opts.outputPath is defaulted:
- If specified, output Kitfile to path provided in both local and remote case
- If unspecified, default to
filepath.Join(opts.path, constants.DefaultKitfileName)for local-for remote. Interpret outputPath-as "output to stdout", enabling it to be used in local cases as well
- If remote flag is specified, do not prompt for input, since we get name/author from the remote repo. For local, prompt as normal
- Add flag
- Unify the complete flow for the two options as described above (drop
completeLocal,completeRemote) - Break init into two steps:
- Get
kfgen.DirectoryListingaccording to local or remote --hf.ListFilesfor remote,kfgen.DirectoryListingFromFSfor local - Init given a
kfgen.DirectoryListing, regardless of where it came from
- Get
This could look something like (omitting changes to complete)
func runCommand(opts *initOptions) func(*cobra.Command, []string) error {
return func(cmd *cobra.Command, args []string) error {
if err := opts.complete(cmd.Context(), args); err != nil {
return output.Fatalf("Invalid arguments: %s", err)
}
var dirContents *kfgen.DirectoryListing
var listErr error
if opts.remote {
dirContents, listErr = hf.ListFiles(cmd.Context(), opts.repo, opts.repoRef, opts.token, opts.repoType)
if listErr != nil {
return output.Fatalf("Error fetching remote repository: %s", listErr)
}
} else {
dirContents, listErr = kfgen.DirectoryListingFromFS(opts.path)
if listErr != nil {
return output.Fatalf("Error processing directory: %s", listErr)
}
}
return runInit(dirContents, opts)
}
}
func runInit(dirContents *kfgen.DirectoryListing, opts *initOptions) error {
modelPackage := buildPackageFromRepo(opts.repo, opts.modelkitName, opts.modelkitDescription, opts.modelkitAuthor)
kitfile, err := kfgen.GenerateKitfile(dirContents, modelPackage)
if err != nil {
return output.Fatalf("Error generating Kitfile: %s", err)
}
bytes, err := kitfile.MarshalToYAML()
if err != nil {
return output.Fatalf("Error formatting Kitfile: %s", err)
}
if opts.outputPath == "-" {
// print to stdout
fmt.Print(string(bytes))
return nil
}
return writeKitfile(bytes, opts)
}
func writeKitfile(bytes []byte, opts *initOptions) error {
if _, err := os.Stat(opts.outputPath); err == nil {
if !opts.overwrite {
return output.Fatalf("Kitfile already exists at %s. Use '--force' to overwrite", opts.outputPath)
}
} else if !errors.Is(err, fs.ErrNotExist) {
return output.Fatalf("Error checking for existing Kitfile: %s", err)
}
if _, err := os.Stat(opts.outputPath); err == nil {
if !opts.overwrite {
return output.Fatalf("Kitfile already exists at %s. Use '--force' to overwrite", opts.outputPath)
}
} else if !errors.Is(err, fs.ErrNotExist) {
return output.Fatalf("Error checking for existing Kitfile: %s", err)
}
if err := os.WriteFile(opts.outputPath, bytes, 0644); err != nil {
return output.Fatalf("Failed to write Kitfile: %s", err)
}
output.Infof("Generated Kitfile:\n\n%s", string(bytes))
output.Infof("Saved to path '%s'", opts.outputPath)
return nil
}Unify local and remote init flows into a single runCommand/runInit pipeline. Replace fragile filesystem-based auto-detection of remote repos with an explicit --remote flag. Remove isLocalPath, detectRemoteRepo, tilde expansion, and separate local/remote complete helpers. Validate --ref and --token require --remote. Default output is stdout for remote, Kitfile-in-directory for local. Signed-off-by: Arnab Chatterjee <arnabchat2001@gmail.com>
|
@amisevsk Resolved the comments |
amisevsk
left a comment
There was a problem hiding this comment.
LGTM. Thanks @arnab2001
Description
This PR updates the
kit initcommand to support generating Kitfiles directly from remote repositories (specifically HuggingFace models and datasets) without needing to clone or import them first. This functionality allows users to inspect and generate a Kitfile for a remote resource, which can then be edited before running an import.Key changes:
kit initto detect if the input path is a remote HuggingFace repository (URL ororg/repoformat).hf.ParseHuggingFaceRepoto correctly identify and parse references for both models and datasets.hf.ListFilesto support listing files for both models and datasets using the correct API endpoints via a new RepositoryType enum.kit importto utilize the improved hf library components.Linked issues
closes #1055
AI-Assisted Code