Conversation
|
still under test (don't merge) |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughPer-VRF FIB state was moved into per-interface (iface) structures. New API types were added: 📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip You can enable review details to help with troubleshooting, context usage and more.Enable the |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/infra/cli/route.c`:
- Around line 178-187: The grammar change removed the "route config set vrf ..."
path that smoke tests still call; either restore that grammar entry or update
the tests. To fix in-code, add a second CLI grammar entry that maps the "set vrf
<name> ..." path back to the existing handler (route_config_set) so existing
scripts keep working — mirror the existing "set default ..." declaration but use
the "set vrf" token and vrf name node, and ensure the same with_help argument
structure and route_config_set callback; alternatively, update
smoke/fib4_fullview_manualtest.sh and smoke/fib6_fullview_manualtest.sh to call
the new "interface set vrf main ..." command.
In `@modules/infra/control/vrf.c`:
- Around line 253-260: The cleanup loop at label fib_fail incorrectly uses af as
the upper bound and may call fib_ops[i]->fini on uninitialized entries; change
the logic to iterate only over the FIBs that were successfully initialized by
tracking a count (e.g., initialized_fibs) when you populate fib_ops and then use
for (unsigned i = 0; i < initialized_fibs; i++) { if (fib_ops[i])
fib_ops[i]->fini(iface); } in the fib_fail block (or alternatively use the
actual number-of-fibs variable used elsewhere) so that fib_ops is only finalized
for entries that were set up. Ensure the new counter is updated where fib_ops
entries are created.
In `@modules/ip6/control/route.c`:
- Around line 50-56: The helpers fib6_get_max_routes and fib6_get_num_tbl8
currently dereference vrf_fib_get(vrf, GR_AF_IP6) without a NULL check; update
both to defensively handle a NULL return from vrf_fib_get by checking the
pointer (result of vrf_fib_get(...)) and returning a safe default (e.g., 0) when
NULL, otherwise returning ->max_routes / ->num_tbl8; keep the functions inline
and preserve existing return types and semantics when the FIB exists.
In `@smoke/config_vrf_test.sh`:
- Around line 83-91: The test never verifies that a VRF created after changing
the defaults inherits those defaults: after running "grcli route config set
default rib4-routes 256 rib6-routes 256" and "grcli interface add vrf newvrf"
you must assert that newvrf actually got the default FIB sizes before you later
change it; add an immediate check after creating newvrf that queries "grcli -j
interface show name newvrf" and asserts ".rib4_max_routes == 256" (and similarly
for rib6 if desired) so the creation-time inheritance is validated prior to the
subsequent "grcli interface set vrf newvrf rib4-routes 128" call.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 10ea66c8-dc76-4c93-af08-64a8c3f00dde
📒 Files selected for processing (12)
modules/infra/api/gr_infra.hmodules/infra/cli/route.cmodules/infra/cli/vrf.cmodules/infra/control/gr_vrf.hmodules/infra/control/vrf.cmodules/ip/api/gr_ip4.hmodules/ip/cli/route.cmodules/ip/control/route.cmodules/ip6/api/gr_ip6.hmodules/ip6/cli/route.cmodules/ip6/control/route.csmoke/config_vrf_test.sh
8bc9e1c to
ab74f0a
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
modules/ip/control/route.c (1)
50-56: Potential NULL dereference in helper functions.
vrf_fib_get()can return NULL if the AF is not found. These inline helpers dereference the result without checking. While the current call flow ensures the FIB slot exists (init runs after the slot is populated iniface_vrf_init), this is fragile.Defensive fix
static inline uint32_t fib4_get_max_routes(const struct iface *vrf) { - return vrf_fib_get(vrf, GR_AF_IP4)->max_routes; + const struct gr_iface_info_vrf_fib *fib = vrf_fib_get(vrf, GR_AF_IP4); + return fib ? fib->max_routes : 0; } static inline uint32_t fib4_get_num_tbl8(const struct iface *vrf) { - return vrf_fib_get(vrf, GR_AF_IP4)->num_tbl8; + const struct gr_iface_info_vrf_fib *fib = vrf_fib_get(vrf, GR_AF_IP4); + return fib ? fib->num_tbl8 : 0; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/ip/control/route.c` around lines 50 - 56, The two inline helpers fib4_get_max_routes and fib4_get_num_tbl8 currently dereference vrf_fib_get(...) without NULL checks; update each to first store the result of vrf_fib_get(vrf, GR_AF_IP4) into a local pointer, check if it's NULL, and if so return a safe default (e.g., 0) or handle error path (optional log/assert) otherwise return the requested field; mention vrf_fib_get, fib4_get_max_routes, fib4_get_num_tbl8 and keep iface_vrf_init behavior in mind so the change is defensive only.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@modules/ip/control/route.c`:
- Around line 50-56: The two inline helpers fib4_get_max_routes and
fib4_get_num_tbl8 currently dereference vrf_fib_get(...) without NULL checks;
update each to first store the result of vrf_fib_get(vrf, GR_AF_IP4) into a
local pointer, check if it's NULL, and if so return a safe default (e.g., 0) or
handle error path (optional log/assert) otherwise return the requested field;
mention vrf_fib_get, fib4_get_max_routes, fib4_get_num_tbl8 and keep
iface_vrf_init behavior in mind so the change is defensive only.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b0578b5e-91b2-49f8-99e9-a439cdac03df
📒 Files selected for processing (14)
modules/infra/api/gr_infra.hmodules/infra/cli/route.cmodules/infra/cli/vrf.cmodules/infra/control/gr_vrf.hmodules/infra/control/vrf.cmodules/ip/api/gr_ip4.hmodules/ip/cli/route.cmodules/ip/control/route.cmodules/ip6/api/gr_ip6.hmodules/ip6/cli/route.cmodules/ip6/control/route.csmoke/config_vrf_test.shsmoke/fib4_fullview_manualtest.shsmoke/fib6_fullview_manualtest.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- smoke/config_vrf_test.sh
9cb30c9 to
6258b7d
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/infra/control/vrf.c`:
- Around line 274-275: The current check in vrf.c rejects any set_attrs bits
other than GR_IFACE_SET_NAME and GR_VRF_SET_FIB, causing a description-only
update to be treated as EOPNOTSUPP even though iface_reconfig() already saved
GR_IFACE_SET_DESCR; update the mask to allow GR_IFACE_SET_DESCR by including it
in the allowed bits (i.e., change the condition that tests set_attrs & ~(...) to
also permit GR_IFACE_SET_DESCR), so the function (where set_attrs is validated
in vrf.c) no longer rejects description-only VRF reconfig requests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6128bfcf-64c7-4601-bfec-39cffdadd9bd
📒 Files selected for processing (14)
modules/infra/api/gr_infra.hmodules/infra/cli/route.cmodules/infra/cli/vrf.cmodules/infra/control/gr_vrf.hmodules/infra/control/vrf.cmodules/ip/api/gr_ip4.hmodules/ip/cli/route.cmodules/ip/control/route.cmodules/ip6/api/gr_ip6.hmodules/ip6/cli/route.cmodules/ip6/control/route.csmoke/config_vrf_test.shsmoke/fib4_fullview_manualtest.shsmoke/fib6_fullview_manualtest.sh
🚧 Files skipped from review as they are similar to previous changes (3)
- modules/ip/cli/route.c
- smoke/fib4_fullview_manualtest.sh
- smoke/fib6_fullview_manualtest.sh
6258b7d to
af61b5e
Compare
af61b5e to
dae1ae1
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/ip/control/route.c`:
- Around line 666-672: old_fib returned by vrf_fib_ptr_get can be NULL but is
passed unguarded into rte_fib_get_rib; before creating new_fib and calling
rib4_iter_one, check old_fib for NULL (same pattern as fib4_fini) and skip the
migration/rib4_iter_one call if old_fib is NULL (or handle appropriately),
ensuring fib4_migrate_ctx is only populated and rib4_iter_one invoked when
old_fib != NULL; update the code around vrf_fib_ptr_get, create_fib,
fib4_migrate_ctx, and the rib4_iter_one call to perform this guard.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6ea8fc10-8e16-44f0-a00b-c00f6f725a1b
📒 Files selected for processing (14)
modules/infra/api/gr_infra.hmodules/infra/cli/route.cmodules/infra/cli/vrf.cmodules/infra/control/gr_vrf.hmodules/infra/control/vrf.cmodules/ip/api/gr_ip4.hmodules/ip/cli/route.cmodules/ip/control/route.cmodules/ip6/api/gr_ip6.hmodules/ip6/cli/route.cmodules/ip6/control/route.csmoke/config_vrf_test.shsmoke/fib4_fullview_manualtest.shsmoke/fib6_fullview_manualtest.sh
🚧 Files skipped from review as they are similar to previous changes (4)
- smoke/fib6_fullview_manualtest.sh
- modules/infra/api/gr_infra.h
- modules/ip/api/gr_ip4.h
- modules/infra/cli/vrf.c
| old_fib = vrf_fib_ptr_get(vrf, GR_AF_IP4); | ||
| new_fib = create_fib(vrf); | ||
| if (new_fib == NULL) | ||
| return api_out(errno, 0, NULL); | ||
| return errno_log(errno, "create_fib"); | ||
|
|
||
| struct fib4_migrate_ctx ctx = {.new_fib = new_fib}; | ||
| rib4_iter_one(rte_fib_get_rib(vrf_fibs[req->vrf_id]), req->vrf_id, fib4_migrate_cb, &ctx); | ||
| rib4_iter_one(rte_fib_get_rib(old_fib), vrf->id, fib4_migrate_cb, &ctx); |
There was a problem hiding this comment.
Missing NULL check on old_fib before use.
vrf_fib_ptr_get can return NULL, but old_fib is passed directly to rte_fib_get_rib() at line 672 without a guard. If reconfig is called on a VRF that hasn't been initialized (or was already finalized), this will crash.
The fib4_fini function correctly guards against this at line 754. The same pattern should be applied here.
Proposed fix
old_fib = vrf_fib_ptr_get(vrf, GR_AF_IP4);
+ if (old_fib == NULL)
+ return errno_set(ENOENT);
new_fib = create_fib(vrf);
if (new_fib == NULL)
return errno_log(errno, "create_fib");📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| old_fib = vrf_fib_ptr_get(vrf, GR_AF_IP4); | |
| new_fib = create_fib(vrf); | |
| if (new_fib == NULL) | |
| return api_out(errno, 0, NULL); | |
| return errno_log(errno, "create_fib"); | |
| struct fib4_migrate_ctx ctx = {.new_fib = new_fib}; | |
| rib4_iter_one(rte_fib_get_rib(vrf_fibs[req->vrf_id]), req->vrf_id, fib4_migrate_cb, &ctx); | |
| rib4_iter_one(rte_fib_get_rib(old_fib), vrf->id, fib4_migrate_cb, &ctx); | |
| old_fib = vrf_fib_ptr_get(vrf, GR_AF_IP4); | |
| if (old_fib == NULL) | |
| return errno_set(ENOENT); | |
| new_fib = create_fib(vrf); | |
| if (new_fib == NULL) | |
| return errno_log(errno, "create_fib"); | |
| struct fib4_migrate_ctx ctx = {.new_fib = new_fib}; | |
| rib4_iter_one(rte_fib_get_rib(old_fib), vrf->id, fib4_migrate_cb, &ctx); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/ip/control/route.c` around lines 666 - 672, old_fib returned by
vrf_fib_ptr_get can be NULL but is passed unguarded into rte_fib_get_rib; before
creating new_fib and calling rib4_iter_one, check old_fib for NULL (same pattern
as fib4_fini) and skip the migration/rib4_iter_one call if old_fib is NULL (or
handle appropriately), ensuring fib4_migrate_ctx is only populated and
rib4_iter_one invoked when old_fib != NULL; update the code around
vrf_fib_ptr_get, create_fib, fib4_migrate_ctx, and the rib4_iter_one call to
perform this guard.
dae1ae1 to
75fd061
Compare
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/infra/api/gr_infra.h`:
- Around line 128-141: The single GR_VRF_SET_FIB flag is too coarse causing
gr_iface_info_vrf_fib values {0,0} to be ambiguous; update the API to carry
per-AF presence/validity flags so callers can explicitly clear or set IPv4/IPv6
FIBs: add either per-AF bits (e.g. GR_VRF_SET_FIB_IPV4 and GR_VRF_SET_FIB_IPV6)
or a validity bitfield inside struct gr_iface_info_vrf (or inside
gr_iface_info_vrf_fib) and update the GR_IFACE_SET handling to check those
presence bits when applying or clearing max_routes/num_tbl8 for
gr_iface_info_vrf.ipv4 and .ipv6 so 0 is treated as a real value when the
corresponding presence flag is set and as “no change” when it is not.
In `@modules/infra/control/vrf.c`:
- Around line 298-342: The loop that resizes per-address-family FIBs (using
vrf_fib_api, vrf_fib_conf, fib_ops[af] and calling fib_ops[af]->reconfig(iface))
can partially commit: if a later reconfig() fails you only restore that AF’s
snapshot (old) and return, leaving earlier AFs changed; fix by recording
successful AF indices and their pre-change snapshots (e.g. an array of struct
gr_iface_info_vrf_fib old_configs indexed by af or a bitmap/list of succeeded
afs) before calling reconfig(), and if any reconfig() fails iterate the list of
previously succeeded AFs in reverse to restore their vrf_fib_conf entries and
call their fib_ops[af]->reconfig(iface) to roll them back (preserve/propagate
errno semantics and restore LOG messages accordingly) so the function leaves the
VRF in its original state on error.
In `@modules/ip/control/route.c`:
- Around line 699-708: The gr_fib4_info payload is sending uninitialized tbl8
counters when vrf_fib_ptr_get(vrf, GR_AF_IP4) returns NULL; ensure
info.used_tbl8 and info.num_tbl8 are initialized (e.g., set to 0) before calling
rte_fib_tbl8_get_stats or before sending via gr_fib4_info in route.c (symbols:
info, vrf_fib_ptr_get, rte_fib_tbl8_get_stats, gr_fib4_info,
HAVE_RTE_FIB_TBL8_GET_STATS); specifically, initialize those fields to 0 prior
to the vrf_fib_ptr_get() result check and only overwrite them if a valid struct
rte_fib* is returned.
- Around line 750-759: fib4_fini (and likewise fib6_fini) removes the published
FIB pointer with vrf_fib_ptr_set but immediately calls rte_fib_free, which can
race with in-flight lookups; add the same QSBR synchronization used in
fib4_reconfig: after calling vrf_fib_ptr_set(vrf, GR_AF_IP4, NULL) invoke the
QSBR grace-wait function (the same QSBR wait/advance used in fib4_reconfig) to
ensure all readers have exited before calling rte_fib_free(fib) and performing
route4_vrf_rm, ensuring the wait occurs between vrf_fib_ptr_set and rte_fib_free
(repeat the analogous change in fib6_fini).
- Around line 666-676: The migration currently iterates old_fib with
rib4_iter_one calling fib4_migrate_cb which may call nexthop_decref() while
old_fib is still live for readers; to avoid a use-after-free, change the flow so
migration of routes happens first but any cleanup that can free nexthops (the
failed-route cleanup currently in fib4_migrate_cb that invokes nexthop_decref())
is deferred until after vrf_fib_ptr_set(vrf, GR_AF_IP4, new_fib) and a full
rte_rcu_qsbr_synchronize(gr_datapath_rcu(), RTE_QSBR_THRID_INVALID); in
practice, modify fib4_migrate_cb (or the iterator callback) to record
failed-route nexthops into a temporary list/vec instead of calling
nexthop_decref() immediately, then after the pointer swap and rcu synchronize
iterate that collected list and call nexthop_decref() and finally call
rte_fib_free(old_fib).
In `@modules/ip6/control/route.c`:
- Around line 740-749: The tbl8 counters info.used_tbl8 and info.num_tbl8 can
remain uninitialized when vrf_fib_ptr_get(vrf) returns NULL; modify the block
around vrf_fib_ptr_get/vrf_fib_ptr_get(vrf, GR_AF_IP6) and
rte_fib6_tbl8_get_stats so that when vrf_fib_ptr_get returns NULL you explicitly
set info.used_tbl8 = 0 and info.num_tbl8 = 0 (or initialize them to 0 before the
HAVE_RTE_FIB_TBL8_GET_STATS ifdef) so gr_fib6_info always sends defined values
instead of stack garbage.
- Around line 791-800: After unpublishing the IPv6 FIB with vrf_fib_ptr_set in
fib6_fini, wait for the RCU grace period (use the same RCU synchronization call
used in fib6_reconfig) before calling rte_fib6_free and releasing nexthop
references; apply the same change to fib4_fini so both functions perform an RCU
synchronize after vrf_fib_ptr_set and prior to freeing the FIB.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6706fb2a-4cc7-45a9-b59f-44765b8897d7
📒 Files selected for processing (14)
modules/infra/api/gr_infra.hmodules/infra/cli/route.cmodules/infra/cli/vrf.cmodules/infra/control/gr_vrf.hmodules/infra/control/vrf.cmodules/ip/api/gr_ip4.hmodules/ip/cli/route.cmodules/ip/control/route.cmodules/ip6/api/gr_ip6.hmodules/ip6/cli/route.cmodules/ip6/control/route.csmoke/config_vrf_test.shsmoke/fib4_fullview_manualtest.shsmoke/fib6_fullview_manualtest.sh
🚧 Files skipped from review as they are similar to previous changes (4)
- smoke/fib6_fullview_manualtest.sh
- modules/ip6/api/gr_ip6.h
- modules/infra/control/gr_vrf.h
- modules/infra/cli/route.c
| // VRF reconfiguration attribute flags. | ||
| #define GR_VRF_SET_FIB GR_BIT64(32) | ||
|
|
||
| // Per-AF FIB configuration. | ||
| struct gr_iface_info_vrf_fib { | ||
| uint32_t max_routes; // 0 = default | ||
| uint32_t num_tbl8; // 0 = auto | ||
| }; | ||
|
|
||
| // Info structure for GR_IFACE_TYPE_VRF interfaces. | ||
| struct gr_iface_info_vrf { | ||
| struct gr_iface_info_vrf_fib ipv4; | ||
| struct gr_iface_info_vrf_fib ipv6; | ||
| }; |
There was a problem hiding this comment.
GR_VRF_SET_FIB is too coarse to support resetting an AF back to defaults.
struct gr_iface_info_vrf_fib makes 0 a real value (default / auto), but the request only carries one GR_VRF_SET_FIB bit for the whole struct gr_iface_info_vrf. On GR_IFACE_SET, ipv4 = {0, 0} is indistinguishable from “leave IPv4 unchanged”, so once a VRF has an explicit override there is no way to clear it back to inherited defaults/auto through this public API. You need per-AF presence bits or explicit validity flags.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/infra/api/gr_infra.h` around lines 128 - 141, The single
GR_VRF_SET_FIB flag is too coarse causing gr_iface_info_vrf_fib values {0,0} to
be ambiguous; update the API to carry per-AF presence/validity flags so callers
can explicitly clear or set IPv4/IPv6 FIBs: add either per-AF bits (e.g.
GR_VRF_SET_FIB_IPV4 and GR_VRF_SET_FIB_IPV6) or a validity bitfield inside
struct gr_iface_info_vrf (or inside gr_iface_info_vrf_fib) and update the
GR_IFACE_SET handling to check those presence bits when applying or clearing
max_routes/num_tbl8 for gr_iface_info_vrf.ipv4 and .ipv6 so 0 is treated as a
real value when the corresponding presence flag is set and as “no change” when
it is not.
| if (set_attrs & GR_VRF_SET_FIB) { | ||
| for (unsigned af = 0; af < ARRAY_DIM(fib_ops); af++) { | ||
| const struct gr_iface_info_vrf_fib *api; | ||
| struct gr_iface_info_vrf_fib *fib_conf; | ||
| struct gr_iface_info_vrf_fib old; | ||
|
|
||
| api = vrf_fib_api(info, af); | ||
| if (api == NULL) | ||
| continue; | ||
| if (api->max_routes == 0 && api->num_tbl8 == 0) | ||
| continue; | ||
|
|
||
| if (fib_ops[af] == NULL) | ||
| return errno_set(ENOENT); | ||
|
|
||
| fib_conf = vrf_fib_conf(vrf, af); | ||
| old = *fib_conf; | ||
|
|
||
| if (api->max_routes) { | ||
| fib_conf->max_routes = api->max_routes; | ||
| if (!api->num_tbl8) | ||
| fib_conf->num_tbl8 = 0; | ||
| } | ||
| if (api->num_tbl8) | ||
| fib_conf->num_tbl8 = api->num_tbl8; | ||
|
|
||
| if (fib_conf->max_routes == old.max_routes | ||
| && fib_conf->num_tbl8 == old.num_tbl8) | ||
| continue; | ||
|
|
||
| if (fib_ops[af]->reconfig(iface) < 0) { | ||
| *fib_conf = old; | ||
| return -errno; | ||
| } | ||
|
|
||
| LOG(INFO, | ||
| "resized %s FIB VRF %s(%u) max_routes %u -> %u num_tbl8 %u -> %u", | ||
| gr_af_name(af), | ||
| iface->name, | ||
| iface->id, | ||
| old.max_routes, | ||
| fib_conf->max_routes, | ||
| old.num_tbl8, | ||
| fib_conf->num_tbl8); | ||
| } |
There was a problem hiding this comment.
Rollback earlier AF changes when a later reconfig() fails.
This loop commits IPv4 and IPv6 one by one. If one FIB resize succeeds and a later one fails, the function returns an error after restoring only the failing AF’s config snapshot; the earlier AF stays resized, so a single multi-AF interface set vrf ... request can partially commit and drop routes while reporting failure. Track the AFs that already succeeded and roll them back before returning.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/infra/control/vrf.c` around lines 298 - 342, The loop that resizes
per-address-family FIBs (using vrf_fib_api, vrf_fib_conf, fib_ops[af] and
calling fib_ops[af]->reconfig(iface)) can partially commit: if a later
reconfig() fails you only restore that AF’s snapshot (old) and return, leaving
earlier AFs changed; fix by recording successful AF indices and their pre-change
snapshots (e.g. an array of struct gr_iface_info_vrf_fib old_configs indexed by
af or a bitmap/list of succeeded afs) before calling reconfig(), and if any
reconfig() fails iterate the list of previously succeeded AFs in reverse to
restore their vrf_fib_conf entries and call their fib_ops[af]->reconfig(iface)
to roll them back (preserve/propagate errno semantics and restore LOG messages
accordingly) so the function leaves the VRF in its original state on error.
| static void fib4_fini(struct iface *vrf) { | ||
| struct rte_fib *fib = vrf_fib_ptr_get(vrf, GR_AF_IP4); | ||
|
|
||
| if (iface->type != GR_IFACE_TYPE_VRF) | ||
| return; | ||
|
|
||
| memset(&fib4_conf[iface->id], 0, sizeof(fib4_conf[0])); | ||
| fib = vrf_fibs[iface->id]; | ||
| vrf_fibs[iface->id] = NULL; | ||
| vrf_fib_ptr_set(vrf, GR_AF_IP4, NULL); | ||
| if (fib != NULL) { | ||
| LOG(INFO, "destroying IPv4 FIB for VRF %s(%u)", iface->name, iface->id); | ||
| rib4_iter_one(rte_fib_get_rib(fib), iface->id, route4_vrf_rm, (void *)iface); | ||
| LOG(INFO, "destroying IPv4 FIB for VRF %s(%u)", vrf->name, vrf->id); | ||
| rib4_iter_one(rte_fib_get_rib(fib), vrf->id, route4_vrf_rm, (void *)vrf); | ||
| rte_fib_free(fib); | ||
| } | ||
| memset(route_counts[iface->id], 0, sizeof(route_counts[iface->id])); | ||
| memset(route_counts[vrf->id], 0, sizeof(route_counts[vrf->id])); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's find and inspect the fib4_fini and fib4_reconfig functions
rg -A 20 "^static void fib4_fini" modules/ip/control/route.cRepository: DPDK/grout
Length of output: 815
🏁 Script executed:
# Now let's look at fib4_reconfig to see how it uses QSBR
rg -A 30 "^static.*fib4_reconfig" modules/ip/control/route.cRepository: DPDK/grout
Length of output: 1027
🏁 Script executed:
# Check for QSBR usage patterns in the file
rg "rte_rcu_qsbr_synchronize|gr_datapath_rcu" modules/ip/control/route.c -B 3 -A 3Repository: DPDK/grout
Length of output: 534
🏁 Script executed:
# Check the fib6_fini for comparison
rg -A 20 "^static void fib6_fini" modules/ip/control/route.cRepository: DPDK/grout
Length of output: 36
🏁 Script executed:
# Let's also check route4_vrf_rm to understand what it does
rg -B 5 -A 20 "^static.*route4_vrf_rm" modules/ip/control/route.cRepository: DPDK/grout
Length of output: 605
🏁 Script executed:
# Check if fib6_fini exists
rg -A 20 "fib6_fini" modules/ip/control/route.cRepository: DPDK/grout
Length of output: 36
🏁 Script executed:
# Let's check the broader context of when fib4_fini is called
rg -B 5 -A 5 "fib4_fini" modules/ip/control/route.cRepository: DPDK/grout
Length of output: 569
🏁 Script executed:
# Verify the datapath RCU mechanism is used elsewhere for similar teardown
rg "vrf_fini|interface.*fini" modules/ip/control/ -A 10Repository: DPDK/grout
Length of output: 36
🏁 Script executed:
# Search for fib6_fini more broadly
rg "fib6_fini|fib6_ops" modules/ip/control/route.c -A 15Repository: DPDK/grout
Length of output: 36
🏁 Script executed:
# Check the vrf_fib_ops structure definition to understand the interface
rg -B 5 -A 10 "struct vrf_fib_ops" modules/Repository: DPDK/grout
Length of output: 4213
🏁 Script executed:
# Look for where vrf_fini or fib4_fini is called from
rg -B 3 -A 3 "\.fini\(" modules/ -A 5Repository: DPDK/grout
Length of output: 36
🏁 Script executed:
# Let's find where the fib_ops.fini is actually called
rg "fib_ops.*fini|ops->fini" modules/infra/control/ -B 5 -A 5Repository: DPDK/grout
Length of output: 1807
🏁 Script executed:
# Check for vrf_fini or interface deletion patterns
rg "vrf_fini|iface.*destroy|interface.*delete" modules/infra/control/ -B 3 -A 10Repository: DPDK/grout
Length of output: 9987
🏁 Script executed:
# Look at fib6_fini for comparison
rg -A 20 "^static void fib6_fini" modules/ip6/control/route.cRepository: DPDK/grout
Length of output: 818
Add QSBR synchronization before tearing down the IPv4 FIB.
The FIB pointer is unpublished at line 753, but rte_fib_free() at line 757 can race with in-flight datapath lookups. Unlike fib4_reconfig() which waits for QSBR after publishing the new FIB and before freeing the old one, fib4_fini() has no synchronization barrier. In-flight lookups that acquired a reference to the old FIB or its nexthops before the pointer was NULLed can hit freed memory. The same issue exists in fib6_fini().
Suggested fix
static void fib4_fini(struct iface *vrf) {
struct rte_fib *fib = vrf_fib_ptr_get(vrf, GR_AF_IP4);
vrf_fib_ptr_set(vrf, GR_AF_IP4, NULL);
if (fib != NULL) {
LOG(INFO, "destroying IPv4 FIB for VRF %s(%u)", vrf->name, vrf->id);
+ rte_rcu_qsbr_synchronize(gr_datapath_rcu(), RTE_QSBR_THRID_INVALID);
rib4_iter_one(rte_fib_get_rib(fib), vrf->id, route4_vrf_rm, (void *)vrf);
rte_fib_free(fib);
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/ip/control/route.c` around lines 750 - 759, fib4_fini (and likewise
fib6_fini) removes the published FIB pointer with vrf_fib_ptr_set but
immediately calls rte_fib_free, which can race with in-flight lookups; add the
same QSBR synchronization used in fib4_reconfig: after calling
vrf_fib_ptr_set(vrf, GR_AF_IP4, NULL) invoke the QSBR grace-wait function (the
same QSBR wait/advance used in fib4_reconfig) to ensure all readers have exited
before calling rte_fib_free(fib) and performing route4_vrf_rm, ensuring the wait
occurs between vrf_fib_ptr_set and rte_fib_free (repeat the analogous change in
fib6_fini).
| const struct iface *vrf = get_vrf_iface(v); | ||
| if (vrf == NULL) | ||
| continue; | ||
| info.vrf_id = v; | ||
| info.max_routes = fib6_get_max_routes(v); | ||
| info.max_routes = fib6_get_max_routes(vrf); | ||
| info.used_routes = fib6_total_routes(v); | ||
| #ifdef HAVE_RTE_FIB_TBL8_GET_STATS | ||
| rte_fib6_tbl8_get_stats(vrf_fibs[v], &info.used_tbl8, &info.num_tbl8); | ||
| struct rte_fib6 *fib = vrf_fib_ptr_get(vrf, GR_AF_IP6); | ||
| if (fib != NULL) | ||
| rte_fib6_tbl8_get_stats(fib, &info.used_tbl8, &info.num_tbl8); |
There was a problem hiding this comment.
Initialize tbl8 stats before sending gr_fib6_info.
When HAVE_RTE_FIB_TBL8_GET_STATS is enabled and vrf_fib_ptr_get() returns NULL, info.num_tbl8 and info.used_tbl8 are left uninitialized and still sent at Line 754. That turns a missing FIB into bogus stats and leaks stack garbage through the API.
Possible fix
info.vrf_id = v;
info.max_routes = fib6_get_max_routes(vrf);
info.used_routes = fib6_total_routes(v);
+ info.num_tbl8 = fib6_get_num_tbl8(vrf);
+ info.used_tbl8 = 0;
`#ifdef` HAVE_RTE_FIB_TBL8_GET_STATS
struct rte_fib6 *fib = vrf_fib_ptr_get(vrf, GR_AF_IP6);
if (fib != NULL)
rte_fib6_tbl8_get_stats(fib, &info.used_tbl8, &info.num_tbl8);Also applies to: 751-754
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/ip6/control/route.c` around lines 740 - 749, The tbl8 counters
info.used_tbl8 and info.num_tbl8 can remain uninitialized when
vrf_fib_ptr_get(vrf) returns NULL; modify the block around
vrf_fib_ptr_get/vrf_fib_ptr_get(vrf, GR_AF_IP6) and rte_fib6_tbl8_get_stats so
that when vrf_fib_ptr_get returns NULL you explicitly set info.used_tbl8 = 0 and
info.num_tbl8 = 0 (or initialize them to 0 before the
HAVE_RTE_FIB_TBL8_GET_STATS ifdef) so gr_fib6_info always sends defined values
instead of stack garbage.
| static void fib6_fini(struct iface *vrf) { | ||
| struct rte_fib6 *fib = vrf_fib_ptr_get(vrf, GR_AF_IP6); | ||
|
|
||
| if (iface->type != GR_IFACE_TYPE_VRF) | ||
| return; | ||
|
|
||
| memset(&fib6_conf[iface->id], 0, sizeof(fib6_conf[0])); | ||
| fib = vrf_fibs[iface->id]; | ||
| vrf_fibs[iface->id] = NULL; | ||
| vrf_fib_ptr_set(vrf, GR_AF_IP6, NULL); | ||
| if (fib != NULL) { | ||
| LOG(INFO, "destroying IPv6 FIB for VRF %s(%u)", iface->name, iface->id); | ||
| rib6_iter_one(rte_fib6_get_rib(fib), iface->id, route6_vrf_rm, (void *)iface); | ||
| LOG(INFO, "destroying IPv6 FIB for VRF %s(%u)", vrf->name, vrf->id); | ||
| rib6_iter_one(rte_fib6_get_rib(fib), vrf->id, route6_vrf_rm, (void *)vrf); | ||
| rte_fib6_free(fib); | ||
| } | ||
| memset(route_counts[iface->id], 0, sizeof(route_counts[iface->id])); | ||
| memset(route_counts[vrf->id], 0, sizeof(route_counts[vrf->id])); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's locate and examine the fib6_fini and fib6_reconfig functions
find . -name "route.c" -path "*/ip6/control/*" -type fRepository: DPDK/grout
Length of output: 83
🏁 Script executed:
# Read the route.c file to find these functions and understand the context
wc -l modules/ip6/control/route.cRepository: DPDK/grout
Length of output: 85
🏁 Script executed:
# Get the fib6_fini, fib6_reconfig functions and nearby context
sed -n '780,850p' modules/ip6/control/route.cRepository: DPDK/grout
Length of output: 1702
🏁 Script executed:
# Search for fib6_reconfig to see the full implementation
rg -n "fib6_reconfig|fib6_fini" modules/ip6/control/route.c -A 20Repository: DPDK/grout
Length of output: 2410
🏁 Script executed:
# Also check the IPv4 counterpart to understand the pattern
rg -n "fib_fini|fib_reconfig" modules/ip/control/route.c -A 20 | head -80Repository: DPDK/grout
Length of output: 36
🏁 Script executed:
# Search for rte_rcu_qsbr_synchronize usage in the codebase
rg -n "rte_rcu_qsbr_synchronize" --type c --type hRepository: DPDK/grout
Length of output: 1716
🏁 Script executed:
# Check the IPv4 fib_fini implementation
sed -n '680,750p' modules/ip/control/route.cRepository: DPDK/grout
Length of output: 1752
🏁 Script executed:
# Check where fib_fini is called from - likely in VRF teardown
rg -n "\.fini\s*=" modules/ip*/control/route.c -B 5 -A 5Repository: DPDK/grout
Length of output: 1233
🏁 Script executed:
# Search for callers of fib_fini through vrf_fib_ops
rg -n "vrf_fib_ops|fib_ops|\.fini" modules/infra/control/iface.c -B 2 -A 2 | head -100Repository: DPDK/grout
Length of output: 159
🏁 Script executed:
# Find where vrf_fib_ops functions are invoked
rg -n "vrf_fib_ops|fib_ops" modules/infra/control/*.c -B 3 -A 3Repository: DPDK/grout
Length of output: 4095
🏁 Script executed:
# Look for the actual fib4_fini implementation to compare with fib6_fini
rg -n "^static void fib4_fini" modules/ip/control/route.c -A 10Repository: DPDK/grout
Length of output: 496
🏁 Script executed:
# Search for where .fini callbacks are called
rg -n "->fini\(" modules --type c --type h -B 2 -A 2Repository: DPDK/grout
Length of output: 458
Add RCU grace period before freeing the IPv6 FIB.
After unpublishing the table at line 794, fib6_fini() immediately frees the FIB and releases nexthop references. Without waiting for in-flight lookups to complete, freed memory can still be accessed. This mirrors the RCU synchronization already present in fib6_reconfig() (line 716).
Suggested fix
static void fib6_fini(struct iface *vrf) {
struct rte_fib6 *fib = vrf_fib_ptr_get(vrf, GR_AF_IP6);
vrf_fib_ptr_set(vrf, GR_AF_IP6, NULL);
if (fib != NULL) {
LOG(INFO, "destroying IPv6 FIB for VRF %s(%u)", vrf->name, vrf->id);
+ rte_rcu_qsbr_synchronize(gr_datapath_rcu(), RTE_QSBR_THRID_INVALID);
rib6_iter_one(rte_fib6_get_rib(fib), vrf->id, route6_vrf_rm, (void *)vrf);
rte_fib6_free(fib);
}Note: fib4_fini() has the same issue.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modules/ip6/control/route.c` around lines 791 - 800, After unpublishing the
IPv6 FIB with vrf_fib_ptr_set in fib6_fini, wait for the RCU grace period (use
the same RCU synchronization call used in fib6_reconfig) before calling
rte_fib6_free and releasing nexthop references; apply the same change to
fib4_fini so both functions perform an RCU synchronize after vrf_fib_ptr_set and
prior to freeing the FIB.
| return 0; | ||
|
|
||
| fib_fail: | ||
| for (unsigned i = 0; i < af; i++) |
There was a problem hiding this comment.
do the same as line 235:
for (af = 0; af < ARRAY_DIM(fib_ops); af++) {
if (fib_ops[af] == NULL)
continue;There was a problem hiding this comment.
Looping over all registered AFs (i < ARRAY_DIM(fib_ops)) would call fini() on AFs that were never init()'d -- both the one that failed and any registered AF that comes after it in the array. It works by accident because fib4_fini/fib6_fini check for a NULL fib pointer, but we'd be relying on every future fini() implementation being equally defensive.
i < af makes the contract explicit: only tear down what was successfully set up. The failing init() cleans up after itself.
75fd061 to
7f332a3
Compare
fib6_conf_set used rib6_iter() to migrate routes during FIB resize, which iterates all VRFs to find the matching one. Use rib6_iter_one() directly with the old FIB's RIB, like the IPv4 counterpart already does in fib4_conf_set. Fixes: ac81860 ("ip6: add configurable FIB size per VRF") Signed-off-by: Maxime Leroy <maxime@leroys.fr> Reviewed-by: Robin Jarry <rjarry@redhat.com>
Instead of configuring FIB capacity through a separate per-VRF API request (GR_IP4_FIB_CONF_SET / GR_IP6_FIB_CONF_SET), store FIB parameters (max_routes, num_tbl8) directly in the VRF interface info. This allows setting FIB capacity at VRF creation time and reconfiguring it through the standard interface set API. A new vrf_fib_ops callback mechanism lets IP modules hook into VRF init/reconfig/fini, replacing the previous IFACE_REMOVE event subscription and lazy FIB creation. The per-VRF rte_fib/rte_fib6 pointers are now stored in the private VRF iface struct (fib_ptr[]) instead of separate global arrays, trading one extra indirection on the lookup path for removing two GR_MAX_IFACES-sized allocations. A fixed enum (GR_VRF_FIB_IP4, GR_VRF_FIB_IP6) indexes both the FIB config and opaque pointer arrays. The "route config set default" command now only sets the default max_routes for new VRFs. The tbl8 default is always auto-derived. Per-VRF tbl8 override is still available through "iface add/set vrf". Signed-off-by: Maxime Leroy <maxime@leroys.fr> Reviewed-by: Robin Jarry <rjarry@redhat.com>
When requesting FIB info for a specific VRF, the default entry (with VRF ID 0 and global default values) was unconditionally sent before the per-VRF entry. This caused `route config show vrf <name>` to always display a spurious default line. Guard the default entry emission with a check on vrf_id so it is only included when listing all VRFs. Signed-off-by: Maxime Leroy <maxime@leroys.fr> Reviewed-by: Robin Jarry <rjarry@redhat.com>
The VRF reconfig handler rejects any set_attrs bits it does not recognize. When description support was added, GR_IFACE_SET_DESCR was not included in the VRF allowed mask, causing description-only updates to return EOPNOTSUPP even though iface_reconfig() already saved the new description before calling the type-specific handler. Fixes: fd1097d ("infra: add interface description field") Signed-off-by: Maxime Leroy <maxime@leroys.fr> Reviewed-by: Robin Jarry <rjarry@redhat.com>
7f332a3 to
a26e19e
Compare
Summary by CodeRabbit
New Features
Bug Fixes
Chores