Support importing/loading data without an extension #373

@polarathene

Description

Docs for import: https://gedenkt.at/jaq/manual/#data-import

If a file to load has been given without extension, such as decode and binary above, then jaq adds an extension (.jq for modules or .json for data).
jq adds an extension unconditionally; that is, even if an extension has been given as part of the file name, jq adds an extension.

If the file has an extension (or rather a . delimiter), no extension is appended, but the file will not be loaded unless that extension is one jaq supports. However, some files have a non-standard extension/filename (like config files that are .examplerc) or no extension at all (OCI image manifests with their SHA256 digest as the filename).
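
To make the two documented rules concrete, here is a minimal sketch in Rust. It models only the behaviour as the manual states it, not jaq's or jq's actual source; the function names are hypothetical, and `Path::extension()` is used as an approximation of "has a . delimiter".

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Rule the jaq manual describes: append `default_ext` only when the
/// path has no extension of its own. (Hypothetical helper, not jaq's code.)
fn jaq_documented_path(path: &Path, default_ext: &str) -> PathBuf {
    if path.extension().is_some() {
        path.to_path_buf()
    } else {
        append_ext(path, default_ext)
    }
}

/// Rule the manual attributes to jq: always append the extension,
/// even when the path already has one. (Hypothetical helper.)
fn jq_documented_path(path: &Path, default_ext: &str) -> PathBuf {
    append_ext(path, default_ext)
}

/// Append `.ext` to the raw path without touching any existing suffix.
fn append_ext(path: &Path, ext: &str) -> PathBuf {
    let mut raw: OsString = path.as_os_str().to_os_string();
    raw.push(".");
    raw.push(ext);
    PathBuf::from(raw)
}
```

Under these assumptions, `decode` becomes `decode.json` under both rules, while `config.examplerc` stays unchanged under the jaq rule and becomes `config.examplerc.json` under the jq rule. Neither rule ever yields a path with no extension, which is the gap this issue is about.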

It would be nice if jaq loaded the file at exactly the given import path when that file exists. The format could presumably be inferred from the context jaq already has from its input (or default to JSON), or jaq could support a way to hint the format when it can't be inferred?

EDIT: Actually I just noticed that foo.bar as the import will load foo.json which was a bit unexpected 🤔 (perhaps a bug, it can be avoided with the placeholder extension logic shown below which is what the config crate does)

yq has a load operator (and some variants for specific formats), which might be worth considering?


Additional context

I implemented / refactored similar logic for the config crate, although the maintainer there wasn't fond of the approach and rejected it in favor of their own.

  1. It first checks for an exact filename match on disk:
  • If a hint was provided, it goes with that.
  • Otherwise, it tries to infer a supported format from the file extension (if one exists).
  2. When the provided filename doesn't exist on disk:
  • If a hint was provided, it tries the extensions for that format (e.g. .yaml/.yml).
  • Otherwise, it tries to find a file on disk by appending the extensions of known formats (internal_formats).

Since a filename without an extension (or without a recognized one) is ambiguous, you either fall back to a format like JSON (or to whichever format was used as input, now that jaq 3.0 supports different formats), or fail when no hint is present.
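
The precedence described here (hint first, then extension, then an optional fallback, else an error) can be sketched as a small pure function. This is only an illustration under assumptions: `DataFormat`, `from_extension`, and `pick_format` are hypothetical names, and the format list is not taken from jaq or the config crate.

```rust
use std::path::Path;

/// Hypothetical set of formats, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataFormat {
    Json,
    Yaml,
    Toml,
}

impl DataFormat {
    /// Map a file extension to a format, if it is a recognized one.
    fn from_extension(ext: &str) -> Option<Self> {
        match ext.to_ascii_lowercase().as_str() {
            "json" => Some(Self::Json),
            "yaml" | "yml" => Some(Self::Yaml),
            "toml" => Some(Self::Toml),
            _ => None,
        }
    }
}

/// Decide the format for `path`: an explicit hint wins, then a recognized
/// extension, then the caller's fallback; with none of these, fail.
fn pick_format(
    path: &Path,
    hint: Option<DataFormat>,
    fallback: Option<DataFormat>,
) -> Result<DataFormat, String> {
    hint.or_else(|| {
        path.extension()
            .and_then(|ext| ext.to_str())
            .and_then(DataFormat::from_extension)
    })
    .or(fallback)
    .ok_or_else(|| {
        format!(
            "cannot determine the format of {}; pass a format hint",
            path.display()
        )
    })
}
```

Passing `Some(DataFormat::Json)` as the fallback gives the "default to JSON" behaviour, while passing `None` gives the stricter "fail without a hint" behaviour, so the choice between the two stays with the caller.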

Reference snippets from the earlier PR link:

// Fallback to checking compatibility with internally supported formats (`FileFormat` enum):
let mut internal_formats = all_extensions().keys().cloned();

// Ideally there is an exact filename match with a format hint, otherwise:
// - Without a hint => Try to identify the format via the file extension
// - Without an exact filename => Check if the filename exists with any known format extensions
if filename.is_file() {
    return match format_hint {
        Some(format) => Ok((filename, Box::new(format))),
        None => {
            let identify_format = |ext: &std::ffi::OsStr| {
                let ext = ext.to_str()?;
                internal_formats.find(|f| f.file_extensions().contains(&ext))
            };
            
            let format = filename
                .extension()
                .and_then(identify_format)
                .ok_or_else(|| self.error_invalid_format())?;
             
            Ok((filename, Box::new(format)))
        }
    }
}

// Preserve any extension-like text within the provided file stem by appending a fake extension
// which will be replaced by `set_extension()` calls (e.g. `example.file.placeholder` => `example.file.json`)
let mut filename = filename;
if filename.extension().is_some() {
    filename.as_mut_os_string().push(".placeholder");
}

match format_hint {
    Some(_) => format_hint.and_then(find_file_with_format(&filename)),
    None => internal_formats.find_map(find_file_with_format(&filename)),
}
.ok_or_else(|| self.error_invalid_path())

Helper method:

fn find_file_with_format<F: FileStoredFormat + Format + 'static>(
    filename: &PathBuf,
) -> impl Fn(F) -> Option<(PathBuf, Box<dyn Format>)> + '_ {
    // `move` so the closure owns its copy of the `&PathBuf` reference
    move |format| {
        let mut filename = filename.clone();

        // Stops at the first extension that matches a file on disk,
        // leaving `filename` set to that path
        let file_exists = format.file_extensions().iter().any(|ext| {
            filename.set_extension(ext);
            filename.is_file()
        });

        file_exists.then_some((filename, Box::new(format)))
    }
}
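
The placeholder step in the snippets above exists because `PathBuf::set_extension` replaces whatever follows the last `.` in the file name. The standalone sketch below (helper names are hypothetical) contrasts the naive call with the placeholder approach. The naive result matches the `foo.bar` => `foo.json` behaviour noted in the EDIT, although whether jaq does this internally is an assumption I have not verified.

```rust
use std::path::{Path, PathBuf};

/// Naive candidate: `set_extension` replaces an existing extension,
/// so extension-like text in the file stem is lost.
fn naive_candidate(path: &Path, ext: &str) -> PathBuf {
    let mut candidate = path.to_path_buf();
    candidate.set_extension(ext);
    candidate
}

/// Placeholder approach: when an extension is already present, push a
/// throwaway one first so `set_extension` replaces only the placeholder.
fn preserving_candidate(path: &Path, ext: &str) -> PathBuf {
    let mut candidate = path.to_path_buf();
    if candidate.extension().is_some() {
        candidate.as_mut_os_string().push(".placeholder");
    }
    candidate.set_extension(ext);
    candidate
}
```

With these helpers, `naive_candidate(Path::new("foo.bar"), "json")` yields `foo.json`, whereas `preserving_candidate` yields `foo.bar.json`; for a name without any extension, both yield `foo.json`.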

The alternative to loading the file through an import would be to pipe the stdout of one jaq command into the stdin of another, but I assume being able to load data from files within a single command would be useful? Another scenario may be reading in base64 or binary data for the base64 methods in jaq, so perhaps a generic file load to pair with the from-format methods would fit.

I've also noticed that the import path value cannot be dynamic (variable binding or from a field/arg input), and that absolute paths don't seem supported either? 😓 (would be useful to have that clarified in the docs)
