Add QEMU Disk Image Extractor #1691

Closed
0xXA wants to merge 12 commits into google:main from 0xXA:qemu-plugin

Conversation

@0xXA (Contributor) commented Jan 20, 2026

Closes: #1213

@0xXA (Contributor, Author) commented Jan 20, 2026

Testbed: google/security-testbeds#187

Comment thread extractor/filesystem/embeddedfs/qcow2/qcow2.go Outdated
@alessandro-Doyensec (Collaborator):

Hello @0xXA

Thanks for the contribution!

I left a few comments that may need your attention.

Could you also consider splitting the qcow2.go file into multiple ones? For example, the structs related to IV generation could be moved to a separate file. This isn’t a strict requirement, so I’m open to your suggestions on the best approach.

@0xXA (Contributor, Author) commented Feb 5, 2026

Could you also consider splitting the qcow2.go file into multiple ones? For example, the structs related to IV generation could be moved to a separate file. This isn’t a strict requirement, so I’m open to your suggestions on the best approach.

Thanks for the comments, gentlemen.

I don't think it's a wise idea to split qcow2.go into multiple files, because it'll become harder to manage. In addition, if any future extractor requires IV generation, I'll move this logic to common.go so other extractors can use it. For now it's fine as it is.

@alessandro-Doyensec (Collaborator):

Hi @0xXA

I don't think it's a wise idea to split qcow2.go into multiple files, because it'll become harder to manage. In addition, if any future extractor requires IV generation, I'll move this logic to common.go so other extractors can use it. For now it's fine as it is.

I'm having trouble understanding the code flow right now. Please split this into separate files.

I noticed that some components implement complicated patterns such as encryption and raw binary data manipulation. To ensure correctness, please also test those components separately.

Note: Please, in the future, let me mark conversations as resolved so I can track what has and hasn't been addressed.

@erikvarga (Collaborator):

If there are specific larger categories of helper functions, those would be useful to move to separate files (ideally with separate tests for their major functions).
E.g. the Extractor interface functions (which mostly just call the other helper funcs) could stay in qcow2.go, and IV-related functions could go into an iv.go (and would thus be easier to move into common later). We can also have a common.go for anything else that doesn't fit elsewhere. All of this can use the same package name, so we'd only be moving code.

@erikvarga (Collaborator):

As a separate question, are there any helper libraries available for any of the qcow2 parsing logic being implemented here? Since we already need to import a couple new helper libs like github.com/emmansun/gmsm/sm4 it should be fine to introduce a few more if that helps reduce the SCALIBR code size.

@0xXA (Contributor, Author) commented Feb 5, 2026

Some encryption code related to grain and table parsing can't be tested individually, because we can't fully emulate or recreate the format's behavior on the spot. Doing so would complicate the plugin, since all we need is to read and understand the format.

I'll move the rest of the code to separate files.

As a separate question, are there any helper libraries available for any of the qcow2 parsing logic being implemented here? Since we already need to import a couple new helper libs like github.com/emmansun/gmsm/sm4 it should be fine to introduce a few more if that helps reduce the SCALIBR code size.

Good question. There are no known working qcow2 parsers in Golang. It took me 2 months to create this parser: reverse-engineering the format by hand from disk, slicing files sector by sector, testing those sectors individually by manually encrypting/decrypting them with master-key materials, manually recreating the master-key materials to match the behavior of qemu-img, recreating table and grain parsing, verifying structure definitions against the official documentation, and so on. In short, if a parsing library had been available, it would have taken around 3 minutes to build this parser.

But that's what makes it interesting. Just like a Patek Philippe, it's a masterpiece.

@0xXA (Contributor, Author) commented Feb 5, 2026

If there are specific larger categories of helper functions, those would be useful to move to separate files (ideally with separate tests for their major functions).
E.g. the Extractor interface functions (which mostly just call the other helper funcs) could stay in qcow2.go, and IV-related functions could go into an iv.go (and would thus be easier to move into common later). We can also have a common.go for anything else that doesn't fit elsewhere. All of this can use the same package name, so we'd only be moving code.

I will separate IV-related logic into iv.go, crypto-related logic into crypto.go, compression-related logic into compression.go, and format-related structures into format.go. Everything else will stay in qcow2.go.

Does this align with what you had in mind, Erik?

0xXA added 2 commits February 6, 2026 12:35
Closes: google#1213
Signed-off-by: Yuvraj Saxena <ysaxenax@gmail.com>
Signed-off-by: Yuvraj Saxena <ysaxenax@gmail.com>
@0xXA (Contributor, Author) commented Feb 6, 2026

@erikvarga Done!

Comment thread extractor/filesystem/embeddedfs/qcow2/qcow2.go Outdated
@alessandro-Doyensec (Collaborator):

Hello @0xXA

Thanks for the changes.

I noticed that the initESSIV function and the essivCipher field are never exercised by tests. Could you please add tests for them as well? A simple test like this would suffice:

  • Cipher some data using "aes"
  • Call initESSIV
  • Call config.Decrypt
  • Verify that gotData is equal to the one generated by aes

Alternatively, adding another .qcow2 which uses ivGenESSIV in the full TestExtractValidQCOW2 would suffice, but I assume it would be complicated.
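The bullet steps above could take roughly this standalone shape. The following is a minimal sketch of the common ESSIV construction (IV = encrypt(hash(key), sector number)); the `essivIV` name and the code are illustrative assumptions, not SCALIBR's actual initESSIV/config.Decrypt implementation:

```go
package main

import (
	"crypto/aes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// essivIV sketches the common ESSIV construction: the IV for a sector is
// the sector number (little-endian, zero-padded to the block size)
// encrypted with AES keyed by a hash of the data key.
func essivIV(key []byte, sector uint64) []byte {
	salt := sha256.Sum256(key) // hashed key becomes the IV-cipher key
	block, err := aes.NewCipher(salt[:])
	if err != nil {
		panic(err)
	}
	plain := make([]byte, block.BlockSize())
	binary.LittleEndian.PutUint64(plain, sector)
	iv := make([]byte, block.BlockSize())
	block.Encrypt(iv, plain)
	return iv
}

func main() {
	key := []byte("0123456789abcdef")
	iv0 := essivIV(key, 0)
	iv1 := essivIV(key, 1)
	fmt.Println(len(iv0), len(iv1))         // 16 16
	fmt.Println(string(iv0) == string(iv1)) // false: sectors get distinct IVs
}
```

A unit test built on this shape would generate the IV, decrypt with config.Decrypt, and compare against data ciphered directly with "aes", as the bullets describe.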

@0xXA (Contributor, Author) commented Feb 6, 2026

Hello @0xXA

Thanks for the changes.

I noticed that the initESSIV function and the essivCipher field are never exercised by tests. Could you please add tests for them as well? A simple test like this would suffice:

  • Cipher some data using "aes"
  • Call initESSIV
  • Call config.Decrypt
  • Verify that gotData is equal to the one generated by aes

Alternatively, adding another .qcow2 which uses ivGenESSIV in the full TestExtractValidQCOW2 would suffice, but I assume it would be complicated.

Thanks for reviewing, @alessandro-Doyensec, I really appreciate the effort.

I have added the mentioned test cases.

Signed-off-by: Yuvraj Saxena <ysaxenax@gmail.com>
@alessandro-Doyensec (Collaborator):

Hello @0xXA,

To clarify my previous #1691 (comment): GetDiskPartitions returns nil, nil, err when it encounters an error. This would result in a nil pointer dereference in your current implementation. Ref:

I noticed that this pattern/bug is present in other filesystem/embeddedfs packages like vdi and vmdk.

I suggest updating GetDiskPartitions to close the disk directly before returning if an error occurs, removing the need to close it from the outside on the error path.

⚠️ Note: Please, in the future, let me mark conversations as resolved so I can track what has and hasn't been addressed.
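The suggested close-on-error pattern can be sketched as follows. The types and the `getDiskPartitions` stand-in below are hypothetical, not SCALIBR's actual GetDiskPartitions signature; the point is only that the function closes what it opened before returning an error, so callers never call Close() on a nil handle:

```go
package main

import (
	"errors"
	"fmt"
)

// disk is a hypothetical stand-in for the real disk handle.
type disk struct{ closed bool }

func (d *disk) Close() { d.closed = true }

// lastOpened lets the demo observe that the handle really was closed.
var lastOpened *disk

// getDiskPartitions opens a disk and parses its partitions. If parsing
// fails after the disk was opened, it closes the disk itself and returns
// nil handles, so the caller only has to check err.
func getDiskPartitions(fail bool) (*disk, []string, error) {
	d := &disk{}
	lastOpened = d
	if fail { // simulate a parse error after opening
		d.Close() // close here so the caller doesn't have to
		return nil, nil, errors.New("parse error")
	}
	return d, []string{"p1"}, nil
}

func main() {
	d, parts, err := getDiskPartitions(true)
	if err != nil {
		// No d.Close() here: d is nil, and the disk was already closed
		// inside getDiskPartitions, so there is no nil dereference.
		fmt.Println("error:", err, "disk closed:", lastOpened.closed)
		return
	}
	defer d.Close()
	fmt.Println(parts)
}
```

With this shape, the `disk.Close()` calls scattered through the callers (the pattern flagged in vdi and vmdk as well) can simply be deleted from the error paths.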

@erikvarga (Collaborator):

I will separate IV-related logic into iv.go, crypto-related logic into crypto.go, compression-related logic into compression.go, and format-related structures into format.go. Everything else will stay in qcow2.go.

Does this align with what you had in mind, Erik?

Yes, this should work for me. Regarding testing, as long as we trigger the code paths added to the business logic with either individual tests or qcow2 testdata files in the right format, I think that should be good enough for test coverage.

@0xXA (Contributor, Author) commented Feb 6, 2026

I suggest updating GetDiskPartitions to close the disk directly before returning if an error occurs, removing the need to close it from the outside on the error path.

I think it's best to remove the call to disk.Close() in that case. Also, I won't touch the code in the other extractors unless you count that as extra work, or you can do it yourself.

Yes, this should work for me. Regarding testing, as long as we trigger the code paths added to the business logic with either individual tests or qcow2 testdata files in the right format, I think that should be good enough for test coverage.

Sure thing.

Signed-off-by: Yuvraj Saxena <ysaxenax@gmail.com>
@0xXA (Contributor, Author) commented Feb 6, 2026

@erikvarga can you rerun the actions? I just addressed a comment made by @alessandro-Doyensec.

@erikvarga (Collaborator):

I think it's best to remove the call to disk.Close() in that case

IIUIC this is currently a problem in other extractors as well then? That they call Close() on disk which is nil if GetDiskPartitions returns an error?

If that's the case we can do a separate cleanup pass (e.g. moving the close into GetDiskPartitions) after this PR is merged.

@alessandro-Doyensec (Collaborator) commented Feb 6, 2026

IIUIC this is currently a problem in other extractors as well then? That they call Close() on disk which is nil if GetDiskPartitions returns an error?

👍

Basically every disk.Close call should be moved into the GetDiskPartitions function.

@alessandro-Doyensec alessandro-Doyensec added the lgtm The PR has been reviewed and approved ("Looks Good To Me") by vendors helping with code reviews. label Feb 6, 2026
@erikvarga (Collaborator):

Good question. There are no known working qcow2 parsers in Golang

Looking around I see a couple qcow2 implementations such as

https://github.com/zchee/go-qcow2
https://github.com/dypflying/go-qcow2lib
https://github.com/dpeckett/qcow2

Do these generally not parse the disk file in a format compatible with the fs interface we're using?

Just checking that we're doing our due diligence before adding a large amount of new code to SCALIBR.

@0xXA (Contributor, Author) commented Feb 6, 2026

Just checking that we're doing our due diligence before adding a large amount of new code to SCALIBR.

I highly doubt they're going to work. Besides, none of the above-mentioned libraries support the qcow2 v3 header. Moreover, some of them are experimental and don't support features encountered in everyday qcow2 files.

For example, dpeckett/qcow2's README states that it's experimental and doesn't support encryption or compression:

The library is not yet complete. It can read and write most QCOW2 images, but some features are not supported:

Compression (except for reading DEFLATE)
Encryption
Backing files
External data
You shouldn't use this library in any application that requires data integrity. It has not been tested thoroughly and definitely will result in data loss.

The best way to verify this is by comparing the qcow2 header against what I’ve implemented.

What I meant by that statement is that there are no known working parsers that fully adhere to the current qcow2 specification. Experimental or partially implemented features don’t count.

The whole point of this plugin is to handle regular files as well as edge cases (including encryption). If it doesn't even support basic features like compression (which almost all real-world qcow2 files use), then there's no point in adding this plugin; it would only take up more space.

@0xXA (Contributor, Author) commented Feb 9, 2026

@erikvarga I have addressed all the comments.

Comment thread extractor/filesystem/embeddedfs/qcow2/format.go
@0xXA (Contributor, Author) commented Feb 9, 2026

What's up with this internal server error?

@0xXA (Contributor, Author) commented Feb 10, 2026

I have added the required comment; could you rerun the actions, @erikvarga?

@0xXA (Contributor, Author) commented Feb 11, 2026

Hi @erikvarga

As previously agreed, all Salesforce extractors were to be considered as 2 validations and 2 detections, and the n-tuples plugin + pair plugin rework was to be treated as additional work.

However, the issued reward does not seem to align with the overall amount of work completed.

Could you please discuss this with the panel and consider revising the reward accordingly?

Since this PR is merged, can you please assign me something else to work on?

@erikvarga (Collaborator):

I brought up the payout amount for this submission with the rest of the panel and the consensus was that different secret types for a single platform should be grouped as one PRP and that reward amounts should be capped at the maximum amount for the category. For transparency, this was the only secret scanner contribution that received the maximum payout so far since the PRP has resumed this year - other contributions involving less work have received lower amounts.
We'll clarify the above panel decision in the PRP rules page with an update shortly.

@erikvarga (Collaborator):

Since this PR is merged, can you please assign me something else to work on?

We haven't finished evaluating all the new PRP submissions yet so we'll likely need until tomorrow to decide on new issue assignments.

@0xXA (Contributor, Author) commented Feb 12, 2026

I brought up the payout amount for this submission with the rest of the panel and the consensus was that different secret types for a single platform should be grouped as one PRP and that reward amounts should be capped at the maximum amount for the category. For transparency, this was the only secret scanner contribution that received the maximum payout so far since the PRP has resumed this year - other contributions involving less work have received lower amounts. We'll clarify the above panel decision in the PRP rules page with an update shortly.

Appreciate you looking into it, brother :)

From my understanding, the payout for 1 detection + 1 validation (complex) = n detections + n validations.

Is that true?

Capping the payout at the maximum for different secret types belonging to the same product makes sense, but Salesforce is not the only group of secret detectors that will benefit from the n-tuples plugin. I thought you'd increase the reward beyond the maximum payout if it benefits several other secret detectors, just like you did with the VMDK and embedded FS extractors. Or does the cap only apply to secret detectors?

We haven't finished evaluating all the new PRP submissions yet so we'll likely need until tomorrow to decide on new issue assignments.

Sure thing.

@erikvarga (Collaborator):

From my understanding, the payout for 1 detection + 1 validation (complex) = n detections + n validations. Is that true?

More specifically, detection for the N most relevant secrets of a single platform is accepted as one submission, and the payout for detection+validation of the secrets is capped to the amounts described. See the updated "Secret Detectors" section of https://bughunters.google.com/about/rules/open-source/osv-scalibr-patch-rewards-program-rules#reward-amounts

@erikvarga (Collaborator):

Hi @0xXA ,

Our internal crypto reviewer noticed that there are no unit tests for the usage of the XTS cipher mode and for Serpent.
IIUIC the two qcow2 testdata images do not use these encryptions, right? In that case, would it be possible to add unit tests for XTS and Serpent to crypto_test.go?
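As a rough shape for such tests: XTS mode and the Serpent cipher require external packages in Go (e.g. golang.org/x/crypto/xts and a third-party Serpent implementation), so the sketch below uses stdlib AES-CTR purely as a stand-in to show the round-trip structure a crypto_test.go test could follow. The `roundTrip` name is hypothetical, not from the PR:

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"fmt"
)

// roundTrip encrypts and then decrypts data with AES-CTR and returns the
// recovered plaintext. Real tests would do the same with the XTS mode and
// Serpent cipher used in crypto.go, swapping in those constructions.
func roundTrip(key, iv, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	ct := make([]byte, len(plaintext))
	cipher.NewCTR(block, iv).XORKeyStream(ct, plaintext)
	pt := make([]byte, len(ct))
	cipher.NewCTR(block, iv).XORKeyStream(pt, ct) // CTR is its own inverse
	return pt, nil
}

func main() {
	key := []byte("0123456789abcdef") // 16-byte AES-128 key
	iv := make([]byte, aes.BlockSize)
	msg := []byte("grain cluster data")
	got, err := roundTrip(key, iv, msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(got, msg)) // true
}
```

For XTS/Serpent specifically, the same round-trip check (optionally against published test vectors) verifies both the cipher wiring and the sector-number handling.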

@0xXA (Contributor, Author) commented Feb 17, 2026

would it be possible to add unit tests for XTS and Serpent to crypto_test.go?

Done!

@0xXA (Contributor, Author) commented Feb 18, 2026

@erikvarga Any update on this ?

@erikvarga (Collaborator):

@0xXA I added your changes to our internal import and am now waiting for the crypto reviewer to take another look.

@0xXA (Contributor, Author) commented Feb 18, 2026

@0xXA I added your changes to our internal import and am now waiting for the crypto reviewer to take another look.

Appreciate the update.
Looks like there are some import order issues; let me fix them quickly!

Signed-off-by: Yuvraj Saxena <ysaxenax@gmail.com>
@0xXA (Contributor, Author) commented Feb 20, 2026

Any update on this, @erikvarga ?

@erikvarga (Collaborator):

Still waiting for crypto review, I'll ping the reviewer on Monday if there's no progress.

@0xXA (Contributor, Author) commented Feb 20, 2026

Sure.

What does this mean in the context of Software Inventory Extractor payouts:

Up to XXXX for extraction capabilities that can be matched to a publicly available vulnerability feed (e.g. OSV.dev) to find a significant amount of new vulnerabilities.

Can you give an example of a plugin that satisfies this?
Also, has anyone ever received this reward?

@erikvarga (Collaborator):

For example, the plugin should find new vulns that we wouldn't previously have found when enabled together with osvdev/vulnmatch. If the OSV matcher doesn't work, it can also be fine if the results can be combined with other vuln matcher tools, though that's of course less immediately useful since special setup is needed.

I don't recall us having paid out this reward amount so far since most new package types don't have existing vuln feeds integrated into OSV.dev. One candidate for this might be #1775 depending on the number of vulns associated with MCP that we wouldn't have otherwise found.

@0xXA (Contributor, Author) commented Feb 20, 2026

I'm not sure if I understood it correctly, but let's say I add an extractor for X, where X can't be detected through standard package manager configurations because it wasn't installed with any standard package manager (say dpkg, etc.). If I add an extractor for X to osv-scalibr, what else do I need to add to get the full reward for this category?

copybara-service Bot pushed a commit that referenced this pull request Feb 23, 2026
PiperOrigin-RevId: 873956307
@erikvarga (Collaborator):

Merged in 0afbfa3

@erikvarga closed this Feb 23, 2026

Labels: lgtm — The PR has been reviewed and approved ("Looks Good To Me") by vendors helping with code reviews.

Successfully merging this pull request may close these issues: PRP: Extractor for QEMU disk images

3 participants