-
Notifications
You must be signed in to change notification settings - Fork 63
Replace regex with semver crate #1108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -12,12 +12,37 @@ use crate::registry_repo::RegistryRepo; | |
| use crate::semconv::{SemConvSpecV1WithProvenance, SemConvSpecWithProvenance}; | ||
| use crate::stats::Stats; | ||
| use crate::Error; | ||
| use regex::Regex; | ||
| use std::collections::HashMap; | ||
| use std::path::Path; | ||
| use std::sync::LazyLock; | ||
| use weaver_common::result::WResult; | ||
|
|
||
| /// Extracts a semantic version from a path string. | ||
| /// | ||
| /// This function searches through path segments (separated by '/') looking for | ||
| /// a segment that starts with 'v' followed by a valid semantic version. | ||
| /// | ||
| /// # Arguments | ||
| /// | ||
| /// * `path` - A path string that may contain a version segment (e.g., "/path/v1.26.0/file") | ||
| /// | ||
| /// # Returns | ||
| /// | ||
| /// * `Some(String)` - The version string including the 'v' prefix (e.g., "v1.26.0") | ||
| /// * `None` - If no valid semantic version is found in the path | ||
| fn extract_semconv_version(path: &str) -> Option<String> { | ||
| for segment in path.split('/') { | ||
| if segment.starts_with('v') && segment.chars().nth(1).is_some_and(|c| c.is_ascii_digit()) { | ||
| // Try to parse it with semver (strip the 'v' first) | ||
| if let Some(version_part) = segment.strip_prefix('v') { | ||
| if semver::Version::parse(version_part).is_ok() { | ||
| return Some(segment.to_owned()); | ||
| } | ||
| } | ||
| } | ||
| } | ||
| None | ||
| } | ||
|
|
||
| /// A semantic convention registry is a collection of semantic convention | ||
| /// specifications indexed by group id. | ||
| #[derive(Default, Debug)] | ||
|
|
@@ -125,10 +150,6 @@ impl SemConvRegistry { | |
| registry_repo: &RegistryRepo, | ||
| semconv_specs: Vec<SemConvSpecWithProvenance>, | ||
| ) -> Result<SemConvRegistry, Error> { | ||
| // ToDo We should use: https://docs.rs/semver/latest/semver/ and URL parser that can give us the last element of the path to send to the parser. | ||
| static VERSION_REGEX: LazyLock<Regex> = | ||
| LazyLock::new(|| Regex::new(r".*(v\d+\.\d+\.\d+).*").expect("Invalid regex")); | ||
|
|
||
| // Load all the semantic convention registry. | ||
| let mut registry = SemConvRegistry::new(registry_repo.id().as_ref()); | ||
|
|
||
|
|
@@ -137,16 +158,11 @@ impl SemConvRegistry { | |
| } | ||
|
|
||
| if registry_repo.manifest().is_none() { | ||
| let mut semconv_version = "unversioned".to_owned(); | ||
|
|
||
| // No registry manifest found. | ||
| // Try to infer the manifest from the registry path by detecting the | ||
| // presence of the following pattern in the registry path: v\d+\.\d+\.\d+. | ||
| if let Some(captures) = VERSION_REGEX.captures(registry_repo.registry_path_repr()) { | ||
| if let Some(captured_text) = captures.get(1) { | ||
| semconv_version = captured_text.as_str().to_owned(); | ||
| } | ||
| } | ||
| // Try to infer the manifest from the registry path by detecting a | ||
| // valid semantic version segment (e.g., v1.26.0) in the path. | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've changed the comment here since semver accepts more than just |
||
| let semconv_version = extract_semconv_version(registry_repo.registry_path_repr()) | ||
| .unwrap_or_else(|| "unversioned".to_owned()); | ||
|
|
||
| registry.set_manifest(RegistryManifest { | ||
| name: registry_repo.id().as_ref().to_owned(), | ||
|
|
@@ -447,4 +463,59 @@ mod tests { | |
| .collect::<Vec<_>>(); | ||
| assert_eq!(groups.len(), 3); | ||
| } | ||
|
|
||
| #[test] | ||
| fn test_extract_semconv_version() { | ||
| use super::extract_semconv_version; | ||
|
|
||
| // Test with GitHub release URL pattern | ||
| assert_eq!( | ||
| extract_semconv_version( | ||
| "https://github.com/open-telemetry/semantic-conventions/releases/download/v1.26.0/semantic-conventions.zip" | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The SemConv archive URLs have the following format. We should test with them. https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.38.0.zip
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we have a list of file format suffixes that Weaver allows? Semver won't be able to parse those suffixes, so I believe I'll have to manually remove this suffixes before parsing the version |
||
| ), | ||
| Some("v1.26.0".to_owned()) | ||
| ); | ||
|
|
||
| // Test with local path containing version | ||
| assert_eq!( | ||
| extract_semconv_version("/some/path/v1.2.3/registry"), | ||
| Some("v1.2.3".to_owned()) | ||
| ); | ||
|
|
||
| // Test with version at the end of path | ||
| assert_eq!( | ||
| extract_semconv_version("/path/to/semantic-conventions-v2.0.0"), | ||
| None // "v2.0.0" is not a standalone segment | ||
| ); | ||
|
|
||
| // Test with no version in path | ||
| assert_eq!( | ||
| extract_semconv_version("/some/path/without/version"), | ||
| None | ||
| ); | ||
|
|
||
| // Test with invalid semver (missing patch version) | ||
| assert_eq!(extract_semconv_version("/path/v1.2/registry"), None); | ||
|
|
||
| // Test with pre-release version | ||
| assert_eq!( | ||
| extract_semconv_version("/path/v1.0.0-alpha.1/registry"), | ||
| Some("v1.0.0-alpha.1".to_owned()) | ||
| ); | ||
|
|
||
| // Test with build metadata | ||
| assert_eq!( | ||
| extract_semconv_version("/path/v1.0.0+build.123/registry"), | ||
| Some("v1.0.0+build.123".to_owned()) | ||
| ); | ||
|
|
||
| // Test empty path | ||
| assert_eq!(extract_semconv_version(""), None); | ||
|
|
||
| // Test path with only slashes | ||
| assert_eq!(extract_semconv_version("///"), None); | ||
|
|
||
| // Test version-like but not starting with 'v' | ||
| assert_eq!(extract_semconv_version("/path/1.2.3/registry"), None); | ||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern here (and also one with the previous regex) is that the
vis not required, see: https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/schemas#schema-urlThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm, gotcha. Reading the link you shared here, it also says that version is always the last part of the URL, so if I'm understanding correctly, I don't need to try to parse every segment and check if it's a valid semver... just the last segment is enough, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the documentation on
schema_urlis relevant here, because in this context Weaver does not derive a version from aschema_url, but from a semantic conventions registry (local, Git, or archive). For recent registries, Weaver retrieves the version directly from the manifest file, but for older registries this manifest does not exist, and in that case the version is derived from the archive path, for example. See the code of thefrom_semconv_specs.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we need to be flexible. What I'm saying is that eventually I'd like for SchemaURL to work. Right now it's forcing the
vas required, I think we should allow it as optional.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit lost here if in this PR we want to solve
vbeing optional or not 😅There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it wasn't optional before, you can leave it as is for now, I'll clean that up in a follow up.
For context - I'm working on new resolution specification where we can resolve "resolved schema" directly from a Schema URL - so I'll need to either merge in your PR and fix in my branch or have you fix it so my local fix doesn't get blown away.
I'd rather merge yours in and fix later if you're nervous about the amount of rust.