Skip to content

[spyre] Update the ulimit for the sentient group#37

Open
sahithiRavindranath wants to merge 3 commits intolinux-ras:spyrefrom
sahithiRavindranath:spyrenew
Open

[spyre] Update the ulimit for the sentient group#37
sahithiRavindranath wants to merge 3 commits intolinux-ras:spyrefrom
sahithiRavindranath:spyrenew

Conversation

@sahithiRavindranath
Copy link

For multi-card spyre containers eventually runs out of locked memory and caused the conatiners to exit.

Updating the memlock config to the max value containers might consume based on the number of cards in the system

@sahithiRavindranath sahithiRavindranath force-pushed the spyrenew branch 5 times, most recently from 5e6aded to 3a6673c Compare February 11, 2026 09:10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you define a function push the above logic there? Add some documentation for that function.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an example that matches with above patter.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about is_mem_limit_config_valid?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a of helper function so add some document of top of this function:

"""
Short one-line summary of what the function does.

More detailed explanation if needed. Describe the purpose,
behavior, and any important implementation notes.

Args:
arg1 (type): Description of arg1.
arg2 (type): Description of arg2.
optional_arg (type, optional): Description of optional_arg.
Defaults to None.

Returns:
return_type: Description of what is returned.
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@sahithiRavindranath sahithiRavindranath force-pushed the spyrenew branch 3 times, most recently from 0ed8667 to cb4f677 Compare February 16, 2026 05:27
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pic card? Did you mean PCI cards?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes pci cards

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about is_mem_limit_config valid?

Also function arguments are not aligned.

Don't update the argument list in this function. Just validate the config and return True or False.
Let the caller take the action.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helper function is implicit, so remove it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

If the currently set ulimit value is higher than the configured limit,
it does not need to be reduced, as higher values are harmless.

The validation parses the configuration and checks the memlimit value.
If the currently set memlimit is greater than or equal to the configured value,
the check passes; otherwise, it fails.

Signed-off-by: Sahithi Ravindranath <Sahithi.Ravindranath@ibm.com>
If the repair function updates the existing memlock configuration,
any redundant or outdated configuration entries must be removed.
Only the updated configuration should remain, rather than appending
a new entry alongside the old ones.

The memlimit fix now parses the existing configuration, identifies
matching entries, removes old or duplicate configurations, and
adds only the updated configuration.

Signed-off-by: Sahithi Ravindranath <Sahithi.Ravindranath@ibm.com>
Multi-card Spyre containers can eventually exhaust locked memory,
causing the containers to exit unexpectedly.

Update the memlock configuration to scale based on the number of
Spyre cards present in the system. This sets the memlock limit to
the maximum value containers may consume, preventing exits due to
locked memory exhaustion.

The memlimit value was determined experimentally as:
memlimit = (number_of_spyre_cards) * (128GB + 16MB)

Signed-off-by: Sahithi Ravindranath <Sahithi.Ravindranath@ibm.com>
continue

return True

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor update needed in the commit message.

Generally, currently set value is the one configured in the system. But I think in this context configured mean what is expected.

So lets replace:
configured limit with expected limit.
and
configured value with expected limit

bool: True if current memlimit config is valid, False otherwise.
"""
def is_mem_limit_config(self, line, conf):
"""Is mem limit config"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit message is nothing but a function name without _. Lets remove if we don't have anything meaningful.

for conf in vfio_mem_conf[:]:
if self.is_mem_limit_config(line, conf):
conf_check.add_attribute(line, True, None, None)
vfio_mem_conf.remove(conf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider /etc/security/limits.d/memlock.conf has:

memlock unlimited
@sentient 1234

I am expecting the ulimit will be set to 1234 for sentient group members.

And in this scenario we are still passing the memlock check. Isn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion is:

Call is_mem_limit_config for all configs in the list (one by one) and pass the file path. Let the is_mem_limit_config() open the file and find the given config is valid or not.


# Example strings matching the pattern:
# "sentient 1234", "memlock unlimited", "memlock 7890",
# "sentient memlock unlimited"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why sentient 1234 and sentient memlock unlimited is matching with the pattern?

Isn't our target is sentient group? And group is denoted as @<group_name>

conf_value = conf_match.group(2)
if line_str == conf_str:
if (line_value == "unlimited"
or (int(line_value) >= int(conf_value))):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that we used patter to populate line_value and conf_value. But I suggest to keep int() in try catch block.

append_to_file(user_mem_conf_check.get_file_path(),
"\n"+config)
config_match = re.sub(r'\s(\d+|unlimited)$', '', config)
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not touch any line that doesn't start with @sentient. Only remove line which start @sentient else ignore.

"""Returns True if plugin is applicable otherwise False"""

return Spyre.is_spyre_card_exists()
return Spyre.get_number_of_spyre_cards()>0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing white spaces around > operator.

Please use pylint to find such code style issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments