Changes the value of config vm_memballoon_stats_period to 60#8520
Changes the value of config vm_memballoon_stats_period to 60#8520RodrigoDLopez wants to merge 2 commits intoapache:4.22from
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #8520 +/- ##
=============================================
+ Coverage 18.28% 30.80% +12.52%
- Complexity 16826 33998 +17172
=============================================
Files 4848 5341 +493
Lines 324301 375027 +50726
Branches 45561 54554 +8993
=============================================
+ Hits 59290 115525 +56235
+ Misses 256437 244211 -12226
- Partials 8574 15291 +6717
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
|
@andrijapanicsb @alexandremattioli any ideas on this? |
|
this value can be set manually, or by ansible/chef/puppet. |
|
many of the global settings are set to a certain value, many of them to 0 - and can be easily changed (like most of the other global settings). I prefer to keep it as it is, and allow operator to decide what to change. |
|
The value of this configuration was originally proposed with BTW: using the value 0 makes the monitoring stop working in KVM. |
we are free to change the default value in a PR, or before official release. Once the configuration is included in a official release, we should be careful to change the default value.@RodrigoDLopez |
worth to mention that I have faced an issue caused by it in the past . #8453 (comment) although @shaerul said it worked well in his testing, I think it would be better to enable/disable memory ballooning and statistics at vm level. |
|
(For what is worth it - I've been told by a RH engineer (in 2019) that memory auto-ballooning is an abandoned project - so be careful with the feature) |
|
I believe that metrics monitoring should work without operators having to worry about configurations and enabling/disabling the feature. Changing the configuration value from '60' to '0' compromised the functionality for the KVM hypervisor; which in turn, can make it seem as it does not work out of the box. An example of this is operators lacking visibility regarding the necessity of adding this configuration in the agent.properties file (see: #8453 (comment)). From my perspective, using the value '60' resolves issues similar to those faced by @shaerul in #8453. What are the main impacts you foresee with this change? Regarding the issue encountered during the migration of a Windows instance with the memory balloon enabled (#8453 (comment)), could you explain the error scenario in detail? This would allow me to attempt to reproduce locally and, if necessary, extend this PR to address the issue related to the migration of Windows VMs with the memory balloon feature enabled. |
In my opinion, the wrong memory usage is not a critical issue for operators.
As I said, I have faced an issue in the past. I am not sure if the issue was caused by windows virtio drivers or qemu/libvirt.
I do not remember the details very clearly. the kvm host is Ubuntu 18 or 20 |
GutoVeronezi
left a comment
There was a problem hiding this comment.
@andrijapanicsb @weizhouapache, what are your concerns with this change? Without it, a nice feature in ACS, that enables users to monitor users' VMs via UI does not work out of the box for KVM deployments, which can cause disappointments and misunderstandings as the ones already noticed in the mailing list and issues in Github.
can please see my comments above ? @GutoVeronezi |
@weizhouapache you said you have faced an issue in the past, but did not specify how to reproduce it or its details. On the other hand, you said that you are not sure if the issue was caused by the virtio drivers; therefore, we do not have a solid reason to not change it. |
@GutoVeronezi Anyone who wants to enable the memory stats collection and happy to take the risky, go for it. ACS has already provided the option to do it. |
@weizhouapache do you have steps to reproduce it? |
@GutoVeronezi |
But then we do not have a solid reason to not change the value, just speculations. If at least we could reproduce the error you saying that exist, we could look for the real cause and solve the problem (or at least find an explanation). |
@GutoVeronezi changing vm_memballoon_stats_period to 60, will add a new setting in the vm definition, which will probably cause some regression which is out of our control. We should disable it by default. |
But again, it is a speculation. We have solid use cases of operators having to contact the community to understand why their Windows VMs are with wrong stats; and every time, the solution is to change the Please, bring solid use cases so we can discuss it further. |
interesting |
|
@weizhouapache @GutoVeronezi , is this still worth discussing? |
|
Is workaround that @weizhouapache mention already implement on newest version ? im facing same issues on CloudStack 4.19.1.1 and fix it with the workaround provide. I have not experienced any VM frozen problems during live migration after implementing the workaround. |
@leonyonz feel free to continue @RodrigoDLopez @GutoVeronezi |
|
Dear @RodrigoDLopez, Thank you for taking the initiative to define a useful default value for the After identifying the configuration tweaks that resolved our previous issues, I documented them using Ansible to ensure parameters are consistently set as needed. For example: I’d also like to suggest a few additional defaults that could be beneficial to consider: Lastly, it might also be worth setting the log level to DEBUG by default, since detailed logs are always required for support and troubleshooting: These are just suggestions, shared to contribute to the collective knowledge of developers and practitioners. Best regards, |
|
@RodrigoDLopez Since this is for the 4.22.1 release, could you retarget the PR to the 4.22 branch? |
There was a problem hiding this comment.
Pull request overview
This pull request changes the default value of the vm_memballoon_stats_period configuration from 0 to 60 seconds to fix memory metrics reporting for Windows instances on KVM hypervisors. The issue caused CloudStack to report 100% memory utilization for Windows VMs even when actual usage was significantly lower.
Changes:
- Updated default value from 0 to 60 seconds in
AgentProperties.java(code and JavaDoc) - Updated the commented example value in
agent.propertiesconfiguration file
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| agent/src/main/java/com/cloud/agent/properties/AgentProperties.java | Changed the default value for VM_MEMBALLOON_STATS_PERIOD property from 0 to 60 and updated the JavaDoc documentation |
| agent/conf/agent.properties | Updated the commented example configuration value from 0 to 60 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Description
Currently, the default value for the
vm_memballoon_stats_periodconfiguration is set to0. As a result, out of the box in ACS, memory metrics for Windows instances (with virtio drivers correctly installed and configured) is compromised when using KVM hypervisor. This pull request aims to propose setting thevm_memballoon_stats_periodconfiguration value to60. This correction addresses situations such as the one reported through the issue: #8453.fixes: #8453
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
Before changes:

After changes:

How Has This Been Tested?
A Windows VM was created, and it was confirmed that the VirtIO drivers were correctly installed. However, upon inspecting the metrics collected by ACS, it was observed that they did not align with the actual consumption of the VM. Subsequently, after changing the configuration value from
0to60, the metrics collected by ACS matched the consumption values reported by the VM.