Skip to content

feat: allow kit init to work with remote repositories#1074

Merged
amisevsk merged 3 commits intokitops-ml:mainfrom
arnab2001:arnab/kit-init-to-work-with-remote-repositories
Mar 11, 2026
Merged

feat: allow kit init to work with remote repositories#1074
amisevsk merged 3 commits intokitops-ml:mainfrom
arnab2001:arnab/kit-init-to-work-with-remote-repositories

Conversation

@arnab2001
Copy link
Copy Markdown
Contributor

Description

This PR updates the kit init command to support generating Kitfiles directly from remote repositories (specifically HuggingFace models and datasets) without needing to clone or import them first. This functionality allows users to inspect and generate a Kitfile for a remote resource, which can then be edited before running an import.

Key changes:

  • Remote Detection: Updated kit init to detect if the input path is a remote HuggingFace repository (URL or org/repo format).
  • Repo Parsing: Implemented hf.ParseHuggingFaceRepo to correctly identify and parse references for both models and datasets.
  • API Integration: Updated hf.ListFiles to support listing files for both models and datasets using the correct API endpoints via a new RepositoryType enum.
  • Refactoring: Updated kit import to utilize the improved hf library components.
  • Testing: Added unit tests for remote detection logic, repository URL parsing, and package metadata extraction.

Linked issues

closes #1055

AI-Assisted Code

  • This PR contains AI-generated code that I have reviewed and tested
  • I take full responsibility for all code in this PR, regardless of how it was created

@arnab2001
Copy link
Copy Markdown
Contributor Author

Requesting review @gorkem @amisevsk

@arnab2001 arnab2001 changed the title feat: allow kit init to work with remote repositories (#1055) feat: allow kit init to work with remote repositories Jan 29, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 833da5bd08

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread pkg/cmd/kitimport/hfimport.go Outdated
Comment thread pkg/cmd/kitinit/cmd.go Outdated
@arnab2001
Copy link
Copy Markdown
Contributor Author

@amisevsk @bmicklea Requesting review , Thanks

Copy link
Copy Markdown
Contributor

@amisevsk amisevsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay reviewing this PR.

@arnab2001 could you rebase this PR onto main (or merge changes from main to your PR branch)? I think the changes here are related to / depend on #1007 (especially regarding dataset handling), which is now merged)

Once that is done, I'll review the PR more deeply.

Signed-off-by: Arnab Chatterjee <arnabchat2001@gmail.com>
@arnab2001 arnab2001 force-pushed the arnab/kit-init-to-work-with-remote-repositories branch from 1f054c5 to 1488780 Compare February 25, 2026 22:04
@arnab2001
Copy link
Copy Markdown
Contributor Author

@amisevsk , I have done the rebase as you suggested , looking forward to your review.

On Windows, os.Stat on URL-like strings (e.g. https://...) returns
syscall.ERROR_INVALID_NAME which is not fs.ErrNotExist, causing HuggingFace
URLs to be misclassified as local paths.

Only treat fs.ErrPermission specifically as "path exists but unreadable",
letting all other non-ErrNotExist errors (including Windows invalid name
errors for URLs) fall through to HuggingFace URL parsing.

Signed-off-by: Arnab Chatterjee <arnabchat2001@gmail.com>
Copy link
Copy Markdown
Contributor

@amisevsk amisevsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach (entirely separate flows for remote vs local) leads to quite a bit of code duplication for what is potentially a small difference between the two. Can we unify the flows to differ only where necessary? I would prefer simplicity, where we can add complexity if it is required in the future.

For example, we could

  1. Improve options handling so that options are more consistently interpreted
    • Add flag --remote to specify we want a remote init explicitly. This avoids the need for (potentially fragile) checks to infer whether the user means local or remote
      • With the remote flag, we can verify that remote-only options are not set if --remote is not specified
    • Change how opts.outputPath is defaulted:
      • If specified, output Kitfile to path provided in both local and remote case
      • If unspecified, default to
        • filepath.Join(opts.path, constants.DefaultKitfileName) for local
        • - for remote. Interpret outputPath - as "output to stdout", enabling it to be used in local cases as well
    • If remote flag is specified, do not prompt for input, since we get name/author from the remote repo. For local, prompt as normal
  2. Unify the complete flow for the two options as described above (drop completeLocal, completeRemote)
  3. Break init into two steps:
    • Get kfgen.DirectoryListing according to local or remote -- hf.ListFiles for remote, kfgen.DirectoryListingFromFS for local
    • Init given a kfgen.DirectoryListing, regardless of where it came from

This could look something like (omitting changes to complete)

func runCommand(opts *initOptions) func(*cobra.Command, []string) error {
	return func(cmd *cobra.Command, args []string) error {
		if err := opts.complete(cmd.Context(), args); err != nil {
			return output.Fatalf("Invalid arguments: %s", err)
		}

		var dirContents *kfgen.DirectoryListing
		var listErr error
		if opts.remote {
			dirContents, listErr = hf.ListFiles(cmd.Context(), opts.repo, opts.repoRef, opts.token, opts.repoType)
			if listErr != nil {
				return output.Fatalf("Error fetching remote repository: %s", listErr)
			}
		} else {
			dirContents, listErr = kfgen.DirectoryListingFromFS(opts.path)
			if listErr != nil {
				return output.Fatalf("Error processing directory: %s", listErr)
			}
		}

		return runInit(dirContents, opts)
	}
}

func runInit(dirContents *kfgen.DirectoryListing, opts *initOptions) error {
	modelPackage := buildPackageFromRepo(opts.repo, opts.modelkitName, opts.modelkitDescription, opts.modelkitAuthor)

	kitfile, err := kfgen.GenerateKitfile(dirContents, modelPackage)
	if err != nil {
		return output.Fatalf("Error generating Kitfile: %s", err)
	}
	bytes, err := kitfile.MarshalToYAML()
	if err != nil {
		return output.Fatalf("Error formatting Kitfile: %s", err)
	}

	if opts.outputPath == "-" {
		// print to stdout
		fmt.Print(string(bytes))
		return nil
	}
	return writeKitfile(bytes, opts)
}

func writeKitfile(bytes []byte, opts *initOptions) error {
	if _, err := os.Stat(opts.outputPath); err == nil {
		if !opts.overwrite {
			return output.Fatalf("Kitfile already exists at %s. Use '--force' to overwrite", opts.outputPath)
		}
	} else if !errors.Is(err, fs.ErrNotExist) {
		return output.Fatalf("Error checking for existing Kitfile: %s", err)
	}
	if _, err := os.Stat(opts.outputPath); err == nil {
		if !opts.overwrite {
			return output.Fatalf("Kitfile already exists at %s. Use '--force' to overwrite", opts.outputPath)
		}
	} else if !errors.Is(err, fs.ErrNotExist) {
		return output.Fatalf("Error checking for existing Kitfile: %s", err)
	}
	if err := os.WriteFile(opts.outputPath, bytes, 0644); err != nil {
		return output.Fatalf("Failed to write Kitfile: %s", err)
	}
	output.Infof("Generated Kitfile:\n\n%s", string(bytes))
	output.Infof("Saved to path '%s'", opts.outputPath)
	return nil
}

Comment thread pkg/cmd/kitinit/cmd.go Outdated
Comment thread pkg/cmd/kitinit/cmd.go Outdated
Comment thread pkg/cmd/kitinit/cmd.go Outdated
Unify local and remote init flows into a single runCommand/runInit
pipeline. Replace fragile filesystem-based auto-detection of remote
repos with an explicit --remote flag. Remove isLocalPath, detectRemoteRepo,
tilde expansion, and separate local/remote complete helpers. Validate
--ref and --token require --remote. Default output is stdout for remote,
Kitfile-in-directory for local.

Signed-off-by: Arnab Chatterjee <arnabchat2001@gmail.com>
@arnab2001
Copy link
Copy Markdown
Contributor Author

@amisevsk Resolved the comments

Copy link
Copy Markdown
Contributor

@amisevsk amisevsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @arnab2001

@amisevsk amisevsk merged commit 11bb2e3 into kitops-ml:main Mar 11, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow kit init to work with remote repositories

2 participants