[spyre] Update the ulimit for the sentient group#37
[spyre] Update the ulimit for the sentient group#37sahithiRavindranath wants to merge 3 commits intolinux-ras:spyrefrom
Conversation
5e6aded to
3a6673c
Compare
There was a problem hiding this comment.
Can you define a function push the above logic there? Add some documentation for that function.
There was a problem hiding this comment.
Please add an example that matches with above patter.
3a6673c to
4401e6a
Compare
There was a problem hiding this comment.
How about is_mem_limit_config_valid?
There was a problem hiding this comment.
This is a of helper function so add some document of top of this function:
"""
Short one-line summary of what the function does.
More detailed explanation if needed. Describe the purpose,
behavior, and any important implementation notes.
Args:
arg1 (type): Description of arg1.
arg2 (type): Description of arg2.
optional_arg (type, optional): Description of optional_arg.
Defaults to None.
Returns:
return_type: Description of what is returned.
"""
0ed8667 to
cb4f677
Compare
There was a problem hiding this comment.
pic card? Did you mean PCI cards?
cb4f677 to
6efcb7e
Compare
There was a problem hiding this comment.
How about is_mem_limit_config valid?
Also function arguments are not aligned.
Don't update the argument list in this function. Just validate the config and return True or False.
Let the caller take the action.
There was a problem hiding this comment.
updated the function
There was a problem hiding this comment.
This helper function is implicit, so remove it.
6efcb7e to
2605725
Compare
If the currently set ulimit value is higher than the configured limit, it does not need to be reduced, as higher values are harmless. The validation parses the configuration and checks the memlimit value. If the currently set memlimit is greater than or equal to the configured value, the check passes; otherwise, it fails. Signed-off-by: Sahithi Ravindranath <Sahithi.Ravindranath@ibm.com>
If the repair function updates the existing memlock configuration, any redundant or outdated configuration entries must be removed. Only the updated configuration should remain, rather than appending a new entry alongside the old ones. The memlimit fix now parses the existing configuration, identifies matching entries, removes old or duplicate configurations, and adds only the updated configuration. Signed-off-by: Sahithi Ravindranath <Sahithi.Ravindranath@ibm.com>
Multi-card Spyre containers can eventually exhaust locked memory, causing the containers to exit unexpectedly. Update the memlock configuration to scale based on the number of Spyre cards present in the system. This sets the memlock limit to the maximum value containers may consume, preventing exits due to locked memory exhaustion. The memlimit value was determined experimentally as: memlimit = (number_of_spyre_cards) * (128GB + 16MB) Signed-off-by: Sahithi Ravindranath <Sahithi.Ravindranath@ibm.com>
2605725 to
3093f02
Compare
| continue | ||
|
|
||
| return True | ||
|
|
There was a problem hiding this comment.
Just a minor update needed in the commit message.
Generally, currently set value is the one configured in the system. But I think in this context configured mean what is expected.
So lets replace:
configured limit with expected limit.
and
configured value with expected limit
| bool: True if current memlimit config is valid, False otherwise. | ||
| """ | ||
| def is_mem_limit_config(self, line, conf): | ||
| """Is mem limit config""" |
There was a problem hiding this comment.
This commit message is nothing but a function name without _. Lets remove if we don't have anything meaningful.
| for conf in vfio_mem_conf[:]: | ||
| if self.is_mem_limit_config(line, conf): | ||
| conf_check.add_attribute(line, True, None, None) | ||
| vfio_mem_conf.remove(conf) |
There was a problem hiding this comment.
Consider /etc/security/limits.d/memlock.conf has:
memlock unlimited
@sentient 1234
I am expecting the ulimit will be set to 1234 for sentient group members.
And in this scenario we are still passing the memlock check. Isn't it?
There was a problem hiding this comment.
My suggestion is:
Call is_mem_limit_config for all configs in the list (one by one) and pass the file path. Let the is_mem_limit_config() open the file and find the given config is valid or not.
|
|
||
| # Example strings matching the pattern: | ||
| # "sentient 1234", "memlock unlimited", "memlock 7890", | ||
| # "sentient memlock unlimited" |
There was a problem hiding this comment.
Why sentient 1234 and sentient memlock unlimited is matching with the pattern?
Isn't our target is sentient group? And group is denoted as @<group_name>
| conf_value = conf_match.group(2) | ||
| if line_str == conf_str: | ||
| if (line_value == "unlimited" | ||
| or (int(line_value) >= int(conf_value))): |
There was a problem hiding this comment.
I know that we used patter to populate line_value and conf_value. But I suggest to keep int() in try catch block.
| append_to_file(user_mem_conf_check.get_file_path(), | ||
| "\n"+config) | ||
| config_match = re.sub(r'\s(\d+|unlimited)$', '', config) | ||
| try: |
| """Returns True if plugin is applicable otherwise False""" | ||
|
|
||
| return Spyre.is_spyre_card_exists() | ||
| return Spyre.get_number_of_spyre_cards()>0 |
There was a problem hiding this comment.
Missing white spaces around > operator.
Please use pylint to find such code style issues.
For multi-card spyre containers eventually runs out of locked memory and caused the conatiners to exit.
Updating the memlock config to the max value containers might consume based on the number of cards in the system