[NE16] Add GAP9_w_NE16 platform: NE16 accelerator Engine on GAP9 #183
runwangdl wants to merge 6 commits into pulp-platform:devel
Conversation
📝 Walkthrough
Adds comprehensive NE16 accelerator support and a new platform variant.
Changes
Sequence Diagram(s): mermaid
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 11
🧹 Nitpick comments (4)
Deeploy/Targets/NE16/TileConstraints/__init__.py (1)
5-5: Replace self-referential wildcard import with explicit exports.
`from . import *` on Line 5 makes the exported symbols unclear. Replace it with explicit re-exports plus `__all__` to clarify the public API.
Proposed fix
```diff
-from . import *
+from .NE16DenseConstraint import NE16DenseConv2DTileConstraint, NE16RQSDenseConv2DTileConstraint
+from .NE16DepthwiseConstraint import NE16DWConv2DTileConstraint, NE16RQSDWConv2DTileConstraint
+from .NE16PointwiseConstraint import NE16PWConv2DTileConstraint, NE16RQSPWConv2DTileConstraint
+
+__all__ = [
+    "NE16DenseConv2DTileConstraint",
+    "NE16RQSDenseConv2DTileConstraint",
+    "NE16DWConv2DTileConstraint",
+    "NE16RQSDWConv2DTileConstraint",
+    "NE16PWConv2DTileConstraint",
+    "NE16RQSPWConv2DTileConstraint",
+]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Deeploy/Targets/NE16/TileConstraints/__init__.py` at line 5, Replace the self-referential wildcard import in __init__.py with explicit re-exports: remove "from . import *" and import the specific public symbols (e.g., classes, functions, constants) from the package submodules using explicit imports (for example: from .submodule import ClassX, func_y), then declare __all__ = ["ClassX", "func_y", ...] listing those exported names; ensure you reference the actual symbol names defined in the package (replace placeholders with the real identifiers) and avoid introducing circular imports when reorganizing the imports.
Deeploy/Targets/NE16/TileConstraints/NE16DenseConstraint.py (1)
161-162: Prefix unused unpacked variables with underscore.
The `BatchOffset`, `HOffset`, `WOffset`, and `BatchSize` variables are unpacked for structural clarity but never used. Prefix them with `_` to indicate intentional disuse.
🔧 Proposed fix
```diff
 for cube in outputCubes:
-    (BatchOffset, HOffset, WOffset, COffset) = cube.offset
-    (BatchSize, HSize, WSize, CSize) = cube.dims
+    (_BatchOffset, _HOffset, _WOffset, COffset) = cube.offset
+    (_BatchSize, HSize, WSize, CSize) = cube.dims
```
Similarly at line 173:
```diff
-inBSize, inHSize, inWSize, inCSize = InCube.dims
+_inBSize, _inHSize, inWSize, inCSize = InCube.dims
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Deeploy/Targets/NE16/TileConstraints/NE16DenseConstraint.py` around lines 161 - 162, The tuple unpackings in NE16DenseConstraint.py currently bind unused names BatchOffset, HOffset, WOffset, COffset and BatchSize (and similarly at the later unpacking around line 173); update those unpackings to prefix unused variables with an underscore (e.g., _BatchOffset, _HOffset, _WOffset, _BatchSize) so the intent of intentional non-use is clear; locate the unpacking expressions that assign to cube.offset and cube.dims and rename only the unused identifiers accordingly while keeping used names (COffset, HSize, WSize, CSize) unchanged.
Deeploy/Targets/NE16/TopologyOptimizationPasses/Passes.py (1)
125-136: Use integer division to avoid floating-point precision issues.
Line 131 uses float division (`/`), which can cause precision loss for large integers. Use integer division (`//`) instead.
🔧 Proposed fix
```diff
 def _findAllMultiplicands(x: int) -> List[int]:
     multiplicands = []
     tmpX = x
     for i in range(2, math.ceil(math.sqrt(x))):  # Ceil cause range doesn't include the last number
         while tmpX % i == 0:
             multiplicands.append(i)
-            tmpX = tmpX / i
+            tmpX = tmpX // i
     if x // math.prod(multiplicands) > 1:
         multiplicands.append(x // math.prod(multiplicands))
     return multiplicands
```
🤖 Prompt for AI Agents
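With the proposed fix applied, the helper can be exercised standalone; a minimal sketch (promoted to module level for illustration, but otherwise mirroring the snippet above):

```python
import math
from typing import List


def find_all_multiplicands(x: int) -> List[int]:
    # Collect the prime factors of x using integer division only,
    # so tmpX stays an int throughout (no float precision loss).
    multiplicands = []
    tmpX = x
    for i in range(2, math.ceil(math.sqrt(x))):
        while tmpX % i == 0:
            multiplicands.append(i)
            tmpX = tmpX // i  # integer division keeps tmpX an int
    # Whatever remains after trial division is itself a factor > 1.
    if x // math.prod(multiplicands) > 1:
        multiplicands.append(x // math.prod(multiplicands))
    return multiplicands


print(find_all_multiplicands(360))  # -> [2, 2, 2, 3, 3, 5]
```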
Verify each finding against the current code and only fix it if needed. In `@Deeploy/Targets/NE16/TopologyOptimizationPasses/Passes.py` around lines 125 - 136, In _findAllMultiplicands, replace the floating-point division on tmpX (tmpX = tmpX / i) with integer division (tmpX = tmpX // i) and ensure tmpX remains an int so subsequent modulus and math.prod operations are integer-safe; update any related expressions that rely on tmpX's type (e.g., the while loop condition tmpX % i == 0 and the final math.prod usage) to work with the integer result.
Deeploy/Targets/NE16/Parsers.py (1)
55-62: Use `not channels_first` instead of `== False`.
Per PEP 8 and Ruff E712, avoid direct comparisons to `False`.
🔧 Proposed fix
```diff
 if not all([
-        channels_first == False,
+        not channels_first,
         len(data_in.shape) == 4,
         # LMACAN: weight shape should be equal to 3 because we have to do the ne16's
         # special weight encoding
         len(weight.shape) == 3,
 ]):
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Deeploy/Targets/NE16/Parsers.py` around lines 55 - 62, In the conditional inside Parsers.py (the if checking channels_first, data_in.shape, weight.shape) replace the direct boolean comparison channels_first == False with the idiomatic not channels_first (i.e., use not channels_first in the all([...]) list) so the condition follows PEP8/Ruff E712; adjust the list element in the same if that currently reads channels_first == False to read not channels_first.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Deeploy/Targets/NE16/__init__.py`:
- Line 5: The __init__.py currently uses explicit imports but omitted the Tiler
module, causing Engine.py's internal imports to fail; update
Deeploy.Targets.NE16.__init__.py to explicitly import or expose the Tiler module
(the Tiler.py symbol) alongside the other exported modules so it is available
when the package is imported, ensuring Engine.py's references to Tiler resolve
correctly (alternatively, use a package import loop such as using
pkgutil.iter_modules to dynamically import all submodules if you prefer
automatic inclusion).
In `@Deeploy/Targets/NE16/Deployer.py`:
- Around line 28-30: The constructor currently uses a mutable default
inputOffsets = {} which causes shared state; change the signature for the
relevant classes (SignPropDeployer.__init__ and GAP9Deployer.__init__) to use
inputOffsets=None and inside each __init__ set inputOffsets = {} if inputOffsets
is None before passing it to super().__init__ (or before any mutations), so each
instance gets its own dict and avoids state leakage when SignPropDeployer
populates the keys.
In `@Deeploy/Targets/NE16/Engine.py`:
- Around line 62-81: The conv-classification is too loose: tighten isPWConv to
reject grouped 1x1 convolutions by adding a check that node.attrs['group'] == 1,
and tighten isDWConv to only accept true depthwise convolutions by inspecting
the conv weight tensor (node.inputs[1].values) and requiring the per-group
in_channels == 1 and that the number of groups equals the number of output
channels (e.g., weights.shape[1] == 1 and weights.shape[0] ==
node.attrs['group'] or group == out_channels); update isPWConv, isDWConv (and
thereby canExecute) accordingly so only bona fide pointwise (group==1) and true
depthwise convs are classified for NE16 coloring.
In `@Deeploy/Targets/NE16/OptimizationPasses/MemoryLevelAnnotationPasses.py`:
- Around line 27-44: The loop that sums and assigns weight memory uses
weightMemoryOccupation and then re-adds the same buffer when multiple NE16 nodes
reference it and uses a strict "<" so exact-fit buffers are skipped; change the
logic so before calling _ne16WeightBufferSize(buffer) or adding to
weightMemoryOccupation you check whether buffer._memoryLevel is already equal to
self._weightMemoryLevel.name (skip counting/annotating if already placed) and
use a non-strict comparison (<=) against self._weightMemoryLevel.size when
deciding to assign buffer._memoryLevel; update references in this block
(ne16Nodes iteration, ctxt.lookup(node.inputs[1].name), buffer._memoryLevel,
_ne16WeightBufferSize, and self._weightMemoryLevel.size) accordingly so buffers
aren’t double-counted and exact-fit placements are allowed.
In `@Deeploy/Targets/NE16/Parsers.py`:
- Around line 118-121: The parsers map only 4 inputs into
self.operatorRepresentation without bounds checking and always set shift to
'NULL', causing IndexError and mismatched scale_shift_addr; update parseNode()
in the affected parsers (the current loop that assigns inputs via inputs =
['data_in','weight','mul','add'] and self.operatorRepresentation[inputs[idx]] =
ctxt.lookup(inputNode.name).name) to perform explicit validation like
len(node.inputs) == 4 (following RQMatMulParser) OR handle a fifth input by
mapping node.inputs[4] into e.g. self.operatorRepresentation['shift'] (or read
shift from node.attrs['shift'] like PULPOpen) and stop hardcoding 'NULL' so the
template variable ${shift} used by ConvTemplate.py:310 gets a proper value;
apply the same change to NE16RQSPWConv2DParser and NE16RQSDenseConv2DParser.
In `@Deeploy/Targets/NE16/Templates/ConvTemplate.py`:
- Around line 141-147: The conf0 computation treats bias presence
incorrectly—change the bias check to treat the "NULL" sentinel as absent: when
calling getNormQuantConf0 in the block guarded by operatorRepresentation["mul"]
!= "NULL", replace the argument "add" in operatorRepresentation with an explicit
boolean like ('add' in operatorRepresentation and operatorRepresentation["add"]
!= "NULL") so bit 25 isn't set when add == "NULL"; keep the existing mul !=
"NULL" guard as-is and pass the adjusted boolean into getNormQuantConf0
(symbols: operatorRepresentation, getNormQuantConf0, conf0, "mul", "add").
- Around line 323-327: The compound literal for ne16_cfg_t incorrectly assigns
weights_stride via an equals expression instead of a designated initializer;
change the block that sets task.data.cfg.weights_stride to use a designated
field (e.g., add a .weights_stride = { .d0 = weightStrideD0, .d1 =
weightStrideD1, .d2 = weightStrideD2 } entry) inside the ne16_cfg_t initializer
so it matches the surrounding fields like .input_stride, .output_stride and
.subtile and produces valid C code.
In `@Deeploy/Targets/NE16/TileConstraints/NE16DenseConstraint.py`:
- Around line 41-44: The three unused weight-dimension variable declarations in
NE16DenseConstraint.py (weightInChannelMajorVar, weightBitsVar,
weightBandwidthVar) attempt to access non-existent dims and cause a runtime
KeyError; remove the lines that create weightInChannelMajorVar, weightBitsVar
and weightBandwidthVar and keep only weightOutChannelVar =
tilerModel.getTensorDimVar(tensorName = weightBufferName, dimIdx = 0) which is
the only weight dim used in this function (adjust any references to those
removed symbols if present elsewhere in the function).
In `@Deeploy/Targets/NE16/TileConstraints/NE16PointwiseConstraint.py`:
- Around line 43-46: The current constraints in NE16PointwiseConstraint.py bind
outputHeightVar==inputHeightVar and outputWidthVar==inputWidthVar which breaks
strided 1x1 convs allowed by NE16Engine.canExecute when enableStrides is true;
replace those identity constraints (the tilerModel.addConstraint calls for
outputHeightVar/outputWidthVar) with the proper stride-aware relation using the
stride variables (e.g., strideH/strideW or the stride parameter used by
NE16Engine) so outputDim = (effective_input_dim - 1) // stride + 1 for height
and width respectively, preserving the existing batch equality (outputBatchVar
== inputBatchVar).
In `@Deeploy/Targets/NE16/TopologyOptimizationPasses/Passes.py`:
- Around line 81-101: The subtraction of weight_offset from values in Passes.py
can overflow if values is int8; convert values to a wider integer dtype (e.g.,
np.int16 or np.int32) before computing weight_offset and performing values =
values - weight_offset, then ensure the resulting values are clipped to the
0–255 range and finally cast to np.uint8 just before calling _weightEncode
(refer to weightTensor.values, weight_offset, values, and _weightEncode) so the
transpose and depthwise logic remain unchanged.
In `@DeeployTest/deeployRunner_tiled_gap9_w_ne16.py`:
- Around line 15-16: The two parsed flags --ne16-wmem and --enable-3x3 are never
consumed; update main() to translate args.ne16_wmem and args.enable_3x3 into
CMake defines (e.g., add "-DENABLE_NE16_WMEM=ON" when args.ne16_wmem is true and
"-DENABLE_3X3_CONV=ON" when args.enable_3x3 is true) and append them to the same
cmake argument list used for building (the variable that assembles cmake args /
options or the call that runs cmake/configure in main()); if the project expects
OFF values when flags are absent, explicitly add =OFF otherwise. Ensure you
reference the existing parser flags (--ne16-wmem / --enable-3x3) and modify the
code path that constructs cmake arguments in main() so these options are passed
to CMake.
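Several of the items above hinge on small arithmetic identities. For instance, the stride-aware output relation suggested for NE16PointwiseConstraint can be sanity-checked with a tiny helper (a sketch; `conv_out_dim` and its parameters are illustrative names, not Deeploy API):

```python
def conv_out_dim(in_dim: int, kernel: int, stride: int, pad: int = 0) -> int:
    # Standard convolution output-size formula. For a 1x1 kernel with no
    # padding this reduces to (in_dim - 1) // stride + 1, the relation
    # the review suggests replacing the identity constraint with.
    return (in_dim + 2 * pad - kernel) // stride + 1


# A strided 1x1 conv: the identity constraint out == in would be wrong.
print(conv_out_dim(32, kernel=1, stride=2))  # -> 16
print(conv_out_dim(33, kernel=1, stride=2))  # -> 17
print(conv_out_dim(32, kernel=1, stride=1))  # -> 32 (identity only here)
```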
---
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3a121db6-6486-4738-9ded-bbe6729e8727
📒 Files selected for processing (28)
- CMakeLists.txt
- Deeploy/Targets/NE16/Bindings.py
- Deeploy/Targets/NE16/Deployer.py
- Deeploy/Targets/NE16/Engine.py
- Deeploy/Targets/NE16/OptimizationPasses/MemoryLevelAnnotationPasses.py
- Deeploy/Targets/NE16/OptimizationPasses/__init__.py
- Deeploy/Targets/NE16/Parsers.py
- Deeploy/Targets/NE16/Platform.py
- Deeploy/Targets/NE16/Templates/AllocateTemplate.py
- Deeploy/Targets/NE16/Templates/ConvTemplate.py
- Deeploy/Targets/NE16/Templates/__init__.py
- Deeploy/Targets/NE16/TileConstraints/NE16DenseConstraint.py
- Deeploy/Targets/NE16/TileConstraints/NE16DepthwiseConstraint.py
- Deeploy/Targets/NE16/TileConstraints/NE16PointwiseConstraint.py
- Deeploy/Targets/NE16/TileConstraints/RequantHelpers.py
- Deeploy/Targets/NE16/TileConstraints/__init__.py
- Deeploy/Targets/NE16/Tiler.py
- Deeploy/Targets/NE16/TopologyOptimizationPasses/Passes.py
- Deeploy/Targets/NE16/TopologyOptimizationPasses/__init__.py
- Deeploy/Targets/NE16/__init__.py
- Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py
- DeeployTest/CMakeLists.txt
- DeeployTest/deeployRunner_tiled_gap9_w_ne16.py
- DeeployTest/testMVP.py
- DeeployTest/testUtils/core/execution.py
- DeeployTest/testUtils/platformMapping.py
- DeeployTest/test_gap9_ne16_tiled_config.py
- TargetLibraries/GAP9/CMakeLists.txt
```python
#
# SPDX-License-Identifier: Apache-2.0


from . import *
```
🧩 Analysis chain
🏁 Scripts executed:
```shell
# Verify self-referential wildcard imports in NE16 package initializers.
rg -n '^\s*from \. import \*$' Deeploy/Targets/NE16
find Deeploy/Targets/NE16 -maxdepth 1 -type d | sort
ls -la Deeploy/Targets/NE16/ | head -30
cat -n Deeploy/Targets/NE16/__init__.py
rg "from.*NE16.*import.*Tiler|from.*NE16.*Tiler" --type py
rg "\.Tiler" Deeploy/Targets/NE16 --type py
find Deeploy/Targets -maxdepth 2 -name "__init__.py" -type f | head -10
for dir in Deeploy/Targets/*/; do echo "=== $(basename $dir) ==="; cat "$dir/__init__.py" 2>/dev/null | head -20; done
rg "__all__" Deeploy/Targets/NE16 --type py -A 5
head -20 Deeploy/Targets/NE16/Tiler.py
rg "import.*Tiler" Deeploy --type py | grep -v "TilerExtension\|TilerModel" | head -20
rg "from.*NE16" Deeploy --type py | grep -v "^Binary" | head -30
```
The proposed fix is incomplete: Tiler.py exists in the NE16 directory but is missing from the proposed imports
While the original concern is valid—from . import * does not expose submodules/packages as intended—the proposed fix lists only 9 modules and omits Tiler.py, which exists as a module in the directory and is imported internally by Engine.py.
Corrected fix
```diff
-from . import *
+from . import Bindings, Deployer, Engine, OptimizationPasses, Parsers, Platform, Tiler, Templates, TileConstraints, TopologyOptimizationPasses
+
+__all__ = [
+    "Bindings",
+    "Deployer",
+    "Engine",
+    "OptimizationPasses",
+    "Parsers",
+    "Platform",
+    "Tiler",
+    "Templates",
+    "TileConstraints",
+    "TopologyOptimizationPasses",
+]
```
🧰 Tools
🪛 Ruff (0.15.9)
[error] 5-5: `from . import *` used; unable to detect undefined names (F403)
```python
                 inputOffsets = {}):
        super().__init__(graph, deploymentPlatform, inputTypes, loweringOptimizer, scheduler, name,
                         default_channels_first, deeployStateDir, inputOffsets)
```
🧩 Analysis chain
🏁 Scripts executed:
```shell
# Verify whether inputOffsets is mutated downstream (which would make shared default state dangerous).
rg -nP --type=py -C3 'def __init__\(|inputOffsets\s*=' Deeploy/Targets/NE16/Deployer.py Deeploy/Targets/GAP9/Deployer.py Deeploy/DeeployTypes.py
rg -nP --type=py -C2 'inputOffsets\[[^]]+\]\s*=|inputOffsets\.update\(|setdefault\(' Deeploy/Targets/NE16/Deployer.py Deeploy/Targets/GAP9/Deployer.py Deeploy/DeeployTypes.py
# Find the parent class of GAP9Deployer
rg -n "class GAP9Deployer" Deeploy/Targets/GAP9/Deployer.py -A 5
# Get the full NE16Deployer.__init__ context and check parent
head -n 35 Deeploy/Targets/NE16/Deployer.py
# Get the full GAP9Deployer.__init__ context to see parent class definition
head -n 70 Deeploy/Targets/GAP9/Deployer.py
# Search for any mutation of inputOffsets using broader patterns
rg -n "inputOffsets" Deeploy/Targets/NE16/Deployer.py Deeploy/Targets/GAP9/Deployer.py Deeploy/DeeployTypes.py -A 2 -B 2
# Find PULPDeployer class and its __init__
fd "Deployer.py" Deeploy/Targets/PULPOpen/
# Search for PULPDeployer definition
rg -n "class PULPDeployer" Deeploy/ -A 10
# Search for how inputOffsets is used in the parent class hierarchy
rg -n "self.inputOffsets" Deeploy/ -B 2 -A 2
# Read SignPropDeployer to see the full __init__ and how inputOffsets is mutated
head -n 55 Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py
# Check if inputOffsets is mutated elsewhere besides the initial population
rg -n "inputOffsets\[" Deeploy/ -B 1 -A 1
```
Use None as default for inputOffsets parameter.
The {} default on line 28 creates a shared mutable object that gets mutated in SignPropDeployer.__init__() (which populates it with keys). Multiple instances created without providing inputOffsets will share the same dict, causing unintended state leakage.
Proposed fix
```diff
-                 inputOffsets = {}):
+                 inputOffsets = None):
+        if inputOffsets is None:
+            inputOffsets = {}
         super().__init__(graph, deploymentPlatform, inputTypes, loweringOptimizer, scheduler, name,
                          default_channels_first, deeployStateDir, inputOffsets)
```
Note: GAP9Deployer has the same issue on line 54 and should be fixed identically.
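The shared-default pitfall behind this fix is easy to reproduce in isolation; a minimal sketch with illustrative class names (not the actual Deeploy deployers):

```python
class BadDeployer:
    def __init__(self, inputOffsets = {}):  # one dict shared by every call
        self.inputOffsets = inputOffsets


class GoodDeployer:
    def __init__(self, inputOffsets = None):
        if inputOffsets is None:
            inputOffsets = {}  # fresh dict per instance
        self.inputOffsets = inputOffsets


a, b = BadDeployer(), BadDeployer()
a.inputOffsets["input_0"] = 128
print(b.inputOffsets)  # -> {'input_0': 128}: state leaked between instances

c, d = GoodDeployer(), GoodDeployer()
c.inputOffsets["input_0"] = 128
print(d.inputOffsets)  # -> {}
```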
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 28-28: Do not use mutable data structures for argument defaults. Replace with None; initialize within function (B006)
```python
    def isPWConv(self, node) -> bool:
        return node.op in ["Conv", "RequantizedConv"] and \
            isinstance(node.inputs[1], gs.Constant) and \
            node.attrs['kernel_shape'] == [1, 1] and \
            node.attrs['dilations'] == [1, 1] and \
            (node.attrs['strides'] == [1, 1] or self.enableStrides)

    def isDWConv(self, node) -> bool:
        return node.op in ["Conv", "RequantizedConv"] and \
            isinstance(node.inputs[1], gs.Constant) and \
            node.attrs['kernel_shape'] == [3, 3] and \
            node.attrs['dilations'] == [1, 1] and \
            node.attrs['group'] != 1 and \
            (node.attrs['strides'] == [1, 1] or self.enableStrides)

    def canExecute(self, node: gs.Node) -> bool:
        if self.enable3x3:
            return self.isPWConv(node) or self.isDWConv(node) or self.isDenseConv(node)
        else:
            return self.isPWConv(node)
```
Tighten grouped-convolution classification before coloring.
isPWConv() currently accepts grouped 1x1 convs, and isDWConv() accepts any 3x3 conv with group != 1. The downstream DW path assumes true depthwise semantics, so generic grouped convolutions can be colored for NE16 and then fail later in parsing/tiling.
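A sketch of tightened predicates along these lines, operating on plain attribute dicts for illustration (the weight layout [out_ch, in_ch_per_group, kH, kW] is an assumption here and should be checked against NE16's actual weight encoding):

```python
def is_pointwise(attrs) -> bool:
    # A bona fide 1x1 pointwise conv: grouped variants are rejected.
    return (attrs['kernel_shape'] == [1, 1]
            and attrs['dilations'] == [1, 1]
            and attrs['group'] == 1)


def is_depthwise(attrs, weight_shape) -> bool:
    # A bona fide depthwise conv: one input channel per group, and one
    # group per output channel.
    out_ch, in_ch_per_group = weight_shape[0], weight_shape[1]
    return (attrs['kernel_shape'] == [3, 3]
            and attrs['dilations'] == [1, 1]
            and in_ch_per_group == 1
            and attrs['group'] == out_ch)


# A grouped (but not depthwise) 3x3 conv: 16 output channels, 4 groups.
grouped = {'kernel_shape': [3, 3], 'dilations': [1, 1], 'group': 4}
print(is_depthwise(grouped, weight_shape=(16, 4, 3, 3)))  # -> False
```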
```python
        weightMemoryOccupation = 0

        # Current weight memory occupation
        for buffer in {**ctxt.globalObjects, **ctxt.localObjects}.values():
            if hasattr(buffer, "_memoryLevel") and buffer._memoryLevel == self._weightMemoryLevel.name:
                weightMemoryOccupation += _ne16WeightBufferSize(buffer)

        ne16Nodes = [node for node in graph.nodes if node.attrs["engine"] == self.ne16EngineName]
        for node in ne16Nodes:
            if node.op in ["Conv", "RequantizedConv"]:

                if not (ctxt.is_local(node.inputs[1].name) or ctxt.is_global(node.inputs[1].name)):
                    continue

                buffer = ctxt.lookup(node.inputs[1].name)
                if weightMemoryOccupation + _ne16WeightBufferSize(buffer) < self._weightMemoryLevel.size:
                    buffer._memoryLevel = self._weightMemoryLevel.name
                    weightMemoryOccupation += _ne16WeightBufferSize(buffer)
```
Avoid double-counting reused weights and allow exact-fit placement.
If two NE16 nodes share the same weight buffer, this loop charges the same buffer twice once it has already been annotated in this pass. The strict < check also skips buffers that exactly fill the remaining SRAM. Both cases leave feasible placements unassigned.
Suggested fix
```diff
         ne16Nodes = [node for node in graph.nodes if node.attrs["engine"] == self.ne16EngineName]
         for node in ne16Nodes:
             if node.op in ["Conv", "RequantizedConv"]:
                 if not (ctxt.is_local(node.inputs[1].name) or ctxt.is_global(node.inputs[1].name)):
                     continue
                 buffer = ctxt.lookup(node.inputs[1].name)
-                if weightMemoryOccupation + _ne16WeightBufferSize(buffer) < self._weightMemoryLevel.size:
-                    buffer._memoryLevel = self._weightMemoryLevel.name
-                    weightMemoryOccupation += _ne16WeightBufferSize(buffer)
+                if getattr(buffer, "_memoryLevel", None) == self._weightMemoryLevel.name:
+                    continue
+
+                bufferSize = _ne16WeightBufferSize(buffer)
+                if weightMemoryOccupation + bufferSize <= self._weightMemoryLevel.size:
+                    buffer._memoryLevel = self._weightMemoryLevel.name
+                    weightMemoryOccupation += bufferSize
         return ctxt, graph
```
return ctxt, graph🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Deeploy/Targets/NE16/OptimizationPasses/MemoryLevelAnnotationPasses.py`
around lines 27 - 44, The loop that sums and assigns weight memory uses
weightMemoryOccupation and then re-adds the same buffer when multiple NE16 nodes
reference it and uses a strict "<" so exact-fit buffers are skipped; change the
logic so before calling _ne16WeightBufferSize(buffer) or adding to
weightMemoryOccupation you check whether buffer._memoryLevel is already equal to
self._weightMemoryLevel.name (skip counting/annotating if already placed) and
use a non-strict comparison (<=) against self._weightMemoryLevel.size when
deciding to assign buffer._memoryLevel; update references in this block
(ne16Nodes iteration, ctxt.lookup(node.inputs[1].name), buffer._memoryLevel,
_ne16WeightBufferSize, and self._weightMemoryLevel.size) accordingly so buffers
aren’t double-counted and exact-fit placements are allowed.
```python
        weightOutChannelVar = tilerModel.getTensorDimVar(tensorName = weightBufferName, dimIdx = 0)
        weightInChannelMajorVar = tilerModel.getTensorDimVar(tensorName = weightBufferName, dimIdx = 1)
        weightBitsVar = tilerModel.getTensorDimVar(tensorName = weightBufferName, dimIdx = 2)
        weightBandwidthVar = tilerModel.getTensorDimVar(tensorName = weightBufferName, dimIdx = 3)
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify weight tensor shape after encoding and check if dimIdx 3 causes issues
echo "=== Check TilerModel.getTensorDimVar implementation ==="
ast-grep --pattern 'def getTensorDimVar($$$):
$$$
'
echo -e "\n=== Check if weight shape assertions exist in tiler ==="
rg -n "getTensorDimVar.*weight" --type py -C 2

Repository: pulp-platform/Deeploy
Length of output: 17734
🏁 Script executed:
# Check if the variables declared at lines 41-44 are used later in the function
sed -n '1,100p' Deeploy/Targets/NE16/TileConstraints/NE16DenseConstraint.py | cat -n

Repository: pulp-platform/Deeploy
Length of output: 6259
🏁 Script executed:
# Check if similar variables are used in the comparable Neureka file
sed -n '40,80p' Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py | cat -n

Repository: pulp-platform/Deeploy
Length of output: 2692
🏁 Script executed:
# Find where NE16AdjustWeightMemoryLayoutPass is defined and check weight shape
find . -name "*NE16AdjustWeightMemoryLayoutPass*" -o -name "*Parsers.py" | head -5

Repository: pulp-platform/Deeploy
Length of output: 246
🏁 Script executed:
# Search for all uses of the suspect variables in NE16DenseConstraint.py
rg -n "weightInChannelMajorVar|weightBitsVar|weightBandwidthVar" Deeploy/Targets/NE16/TileConstraints/NE16DenseConstraint.py

Repository: pulp-platform/Deeploy
Length of output: 370
🏁 Script executed:
# Check NE16 Parsers.py for weight shape information
cat -n Deeploy/Targets/NE16/Parsers.py | head -100

Repository: pulp-platform/Deeploy
Length of output: 4765
🏁 Script executed:
# Compare with Neureka - check if those variables are used there
rg -n "weightInChannelMajorVar|weightBitsVar|weightBandwidthVar" Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py

Repository: pulp-platform/Deeploy
Length of output: 370
🏁 Script executed:
# Check if getTensorDimVar creates variables on-demand or retrieves them
cat -n Deeploy/TilingExtension/TilerModel.py | sed -n '125,145p'

Repository: pulp-platform/Deeploy
Length of output: 1096
🏁 Script executed:
# Check addTensorDimToModel to see what dimensions it adds to the model
rg -n "def addTensorDimToModel" Deeploy/TilingExtension/TilerModel.py -A 15

Repository: pulp-platform/Deeploy
Length of output: 762
🏁 Script executed:
# Verify the runtime error scenario: check if weight is 3D and dimIdx=3 would fail
python3 << 'EOF'
# Simulate the getTensorDimVar call
weight_shape = (64, 32*64, 128) # Example 3D weight: (cout, cinMajor, bits*spatial)
num_dims = len(weight_shape)
print(f"Weight shape: {weight_shape}")
print(f"Number of dimensions: {num_dims}")
print(f"Valid dimIdx range: 0 to {num_dims - 1}")
# Try to access dimIdx=3
for dimIdx in [0, 1, 2, 3]:
varName = f"weight_dim_{dimIdx}"
if dimIdx < num_dims:
print(f"✓ dimIdx={dimIdx}: {varName} would exist (size: {weight_shape[dimIdx]})")
else:
print(f"✗ dimIdx={dimIdx}: {varName} would NOT exist (KeyError)")
EOF

Repository: pulp-platform/Deeploy
Length of output: 344
🏁 Script executed:
# Double-check the complete addGeometricalConstraint function to confirm these variables aren't used
wc -l Deeploy/Targets/NE16/TileConstraints/NE16DenseConstraint.py
sed -n '24,90p' Deeploy/Targets/NE16/TileConstraints/NE16DenseConstraint.py | grep -E "weightInChannelMajorVar|weightBitsVar|weightBandwidthVar"

Repository: pulp-platform/Deeploy
Length of output: 425
Remove unused weight dimension variables that cause runtime KeyError.
Lines 42-44 declare variables that are never used and access a non-existent dimension. Per Parsers.py line 60, NE16 weight tensors are 3D (cout, cinMajor, bits*spatial) after encoding, with valid dimensions 0–2. Attempting to access dimIdx=3 via getTensorDimVar will fail at runtime with a KeyError.
Remove these three lines:
weightInChannelMajorVar = tilerModel.getTensorDimVar(tensorName = weightBufferName, dimIdx = 1)
weightBitsVar = tilerModel.getTensorDimVar(tensorName = weightBufferName, dimIdx = 2)
weightBandwidthVar = tilerModel.getTensorDimVar(tensorName = weightBufferName, dimIdx = 3)

Keep only line 41: `weightOutChannelVar = tilerModel.getTensorDimVar(tensorName = weightBufferName, dimIdx = 0)`, which is actually used in the function.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Deeploy/Targets/NE16/TileConstraints/NE16DenseConstraint.py` around lines 41
- 44, The three unused weight-dimension variable declarations in
NE16DenseConstraint.py (weightInChannelMajorVar, weightBitsVar,
weightBandwidthVar) attempt to access non-existent dims and cause a runtime
KeyError; remove the lines that create weightInChannelMajorVar, weightBitsVar
and weightBandwidthVar and keep only weightOutChannelVar =
tilerModel.getTensorDimVar(tensorName = weightBufferName, dimIdx = 0) which is
the only weight dim used in this function (adjust any references to those
removed symbols if present elsewhere in the function).
# Map output dims to inputs dims
tilerModel.addConstraint(outputBatchVar == inputBatchVar)
tilerModel.addConstraint(outputHeightVar == inputHeightVar)
tilerModel.addConstraint(outputWidthVar == inputWidthVar)
Model pointwise output geometry with stride, not identity.
This forces outputHeightVar == inputHeightVar and outputWidthVar == inputWidthVar, but NE16Engine.canExecute() explicitly allows strided 1x1 convs when enableStrides is set. Those tiles need the usual (effective_input - 1) // stride + 1 relation, otherwise the model is wrong as soon as stride > 1.
Suggested fix
- # Map output dims to inputs dims
+ # Map output dims to input dims
tilerModel.addConstraint(outputBatchVar == inputBatchVar)
- tilerModel.addConstraint(outputHeightVar == inputHeightVar)
- tilerModel.addConstraint(outputWidthVar == inputWidthVar)
+ strides = parseDict["strides"]
+ padding = parseDict["pads"]
+ inputBuffer = ctxt.lookup(inputBufferName)
+ effectiveHeight = inputHeightVar + ((padding[0] + padding[2]) * (inputHeightVar == inputBuffer.shape[1]))
+ effectiveWidth = inputWidthVar + ((padding[1] + padding[3]) * (inputWidthVar == inputBuffer.shape[2]))
+ tilerModel.addConstraint(outputHeightVar == (effectiveHeight - 1) // strides[0] + 1)
+ tilerModel.addConstraint(outputWidthVar == (effectiveWidth - 1) // strides[1] + 1)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Deeploy/Targets/NE16/TileConstraints/NE16PointwiseConstraint.py` around lines
43 - 46, The current constraints in NE16PointwiseConstraint.py bind
outputHeightVar==inputHeightVar and outputWidthVar==inputWidthVar which breaks
strided 1x1 convs allowed by NE16Engine.canExecute when enableStrides is true;
replace those identity constraints (the tilerModel.addConstraint calls for
outputHeightVar/outputWidthVar) with the proper stride-aware relation using the
stride variables (e.g., strideH/strideW or the stride parameter used by
NE16Engine) so outputDim = (effective_input_dim - 1) // stride + 1 for height
and width respectively, preserving the existing batch equality (outputBatchVar
== inputBatchVar).
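The stride-aware relation this fix relies on can be checked against a plain Python reference. The helper name and the 1x1-kernel default below are illustrative, not Deeploy code:

```python
def conv_out_dim(in_dim, pad_lo, pad_hi, stride, kernel = 1):
    """out = (effective_in - kernel) // stride + 1; for a 1x1 kernel this is
    (effective_in - 1) // stride + 1, matching the relation quoted above."""
    effective = in_dim + pad_lo + pad_hi
    return (effective - kernel) // stride + 1

print(conv_out_dim(16, 0, 0, 1))   # 16: identity only holds when stride == 1
print(conv_out_dim(16, 0, 0, 2))   # 8
print(conv_out_dim(15, 0, 0, 2))   # 8: odd extents still round up
```

So an identity constraint `outputWidthVar == inputWidthVar` is correct only for the stride-1 case; any tile with stride > 1 needs the floor-division form.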
values = weightTensor.values

# Extract weight offset and translate weights by the offset
weight_offset = values.min()
values = values - weight_offset
node.attrs["weight_offset"] = weight_offset

if "channels_first" in node.attrs:
    channels_first = node.attrs["channels_first"]
else:
    channels_first = default_channels_first

# Weight encode expects channels first
if not channels_first:
    values = values.transpose(0, 3, 1, 2)

bits = 8  # Support only 8 bit weights for now
if node.attrs['group'] == 1:
    weightTensor.values = _weightEncode(values.astype(np.uint8), bits, depthwise = False)
else:
    weightTensor.values = _weightEncode(values.astype(np.uint8), bits, depthwise = True)
Potential integer overflow when subtracting weight offset.
If values has dtype int8 (range -128 to 127), the subtraction at line 85 can overflow before the astype(np.uint8) at line 99. For example, with weight_offset = -128 and a value of 127:
127 - (-128) = 255 # overflows int8, wraps to -1
Cast to a wider type before subtraction to avoid wraparound:
🐛 Proposed fix
# Adjust N-EUREKA's weights
values = weightTensor.values
# Extract weight offset and translate weights by the offset
weight_offset = values.min()
- values = values - weight_offset
+ values = values.astype(np.int32) - int(weight_offset)
node.attrs["weight_offset"] = weight_offset
  if "channels_first" in node.attrs:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Deeploy/Targets/NE16/TopologyOptimizationPasses/Passes.py` around lines 81 -
101, The subtraction of weight_offset from values in Passes.py can overflow if
values is int8; convert values to a wider integer dtype (e.g., np.int16 or
np.int32) before computing weight_offset and performing values = values -
weight_offset, then ensure the resulting values are clipped to the 0–255 range
and finally cast to np.uint8 just before calling _weightEncode (refer to
weightTensor.values, weight_offset, values, and _weightEncode) so the transpose
and depthwise logic remain unchanged.
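The widening fix proposed above can be reproduced in miniature with numpy; the three weight values here are illustrative:

```python
import numpy as np

weights = np.array([127, 0, -128], dtype = np.int8)
offset = weights.min()                     # -128, an int8 scalar

wrapped = weights - offset                 # stays int8: 127 - (-128) wraps around
assert wrapped.dtype == np.int8

# Widen first so the intermediate can represent the full 0..255 range,
# then cast to uint8 only at the very end.
shifted = weights.astype(np.int32) - int(offset)
assert 0 <= shifted.min() and shifted.max() <= 255
print(shifted.astype(np.uint8).tolist())   # [255, 128, 0]
```

The widened intermediate makes the 0–255 range explicit instead of depending on two's-complement wraparound surviving the later `astype(np.uint8)`.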
parser.add_argument('--ne16-wmem', action = 'store_true', help = 'Enable NE16 weight memory\n')
parser.add_argument('--enable-3x3', action = 'store_true', help = 'Enable 3x3 convolutions\n')
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify end-to-end wiring for NE16-specific runner flags.
rg -nP --type=py -C3 'ne16[_-]wmem|enable[_-]3x3|hasattr\(args,\s*.\s*ne16|hasattr\(args,\s*.\s*enable' DeeployTest/testUtils/deeployRunner.py DeeployTest/deeployRunner_tiled_gap9_w_ne16.py DeeployTest/testUtils/core/execution.py
rg -nP --type=cmake -C3 'NE16|3x3|WMEM|NUM_CORES' CMakeLists.txt DeeployTest/CMakeLists.txt TargetLibraries/GAP9/CMakeLists.txt

Repository: pulp-platform/Deeploy
Length of output: 6815
🏁 Script executed:
# Search for main() function and how it handles arguments
rg -n 'def main' DeeployTest/testUtils/deeployRunner.py -A 20
# Look for args usage with underscore conversion (--ne16-wmem → args.ne16_wmem)
rg -n 'args\.ne16|args\.enable' DeeployTest/testUtils/deeployRunner.py DeeployTest/deeployRunner_tiled_gap9_w_ne16.py
# Check the entire flow in the runner file
cat -n DeeployTest/deeployRunner_tiled_gap9_w_ne16.py

Repository: pulp-platform/Deeploy
Length of output: 2136
🏁 Script executed:
# Get more of the main() function to see how it uses parser_setup_callback
sed -n '311,450p' DeeployTest/testUtils/deeployRunner.py | cat -n
# Also search for any reference to ne16_wmem or enable_3x3 after variable names are converted
rg -n 'ne16_wmem|enable_3x3' DeeployTest/testUtils/deeployRunner.py DeeployTest/testUtils/core/execution.py

Repository: pulp-platform/Deeploy
Length of output: 5539
Add handling for --ne16-wmem and --enable-3x3 flags in main().
The flags are defined on Lines 15-16, but unlike --cores (which is properly handled at deeployRunner.py:90-91), there is no downstream consumption of --ne16-wmem or --enable-3x3. These arguments are parsed but never converted to CMake options or used anywhere in execution. Either wire them into CMake configuration (e.g., -DENABLE_NE16_WMEM, -DENABLE_3X3_CONV), or remove them if not yet implemented.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@DeeployTest/deeployRunner_tiled_gap9_w_ne16.py` around lines 15 - 16, The two
parsed flags --ne16-wmem and --enable-3x3 are never consumed; update main() to
translate args.ne16_wmem and args.enable_3x3 into CMake defines (e.g., add
"-DENABLE_NE16_WMEM=ON" when args.ne16_wmem is true and "-DENABLE_3X3_CONV=ON"
when args.enable_3x3 is true) and append them to the same cmake argument list
used for building (the variable that assembles cmake args / options or the call
that runs cmake/configure in main()); if the project expects OFF values when
flags are absent, explicitly add =OFF otherwise. Ensure you reference the
existing parser flags (--ne16-wmem / --enable-3x3) and modify the code path that
constructs cmake arguments in main() so these options are passed to CMake.
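A minimal sketch of the wiring this prompt suggests: argparse stores `--ne16-wmem` as `args.ne16_wmem`, which is then mapped to a CMake define. The define names `ENABLE_NE16_WMEM` and `ENABLE_3X3_CONV` follow the review's suggestion and are assumptions, not options that exist in the build today:

```python
import argparse

def cmake_args_from_flags(argv):
    parser = argparse.ArgumentParser()
    # argparse converts '--ne16-wmem' to args.ne16_wmem (dashes become underscores)
    parser.add_argument('--ne16-wmem', action = 'store_true')
    parser.add_argument('--enable-3x3', action = 'store_true')
    args = parser.parse_args(argv)
    return [
        f"-DENABLE_NE16_WMEM={'ON' if args.ne16_wmem else 'OFF'}",
        f"-DENABLE_3X3_CONV={'ON' if args.enable_3x3 else 'OFF'}",
    ]

print(cmake_args_from_flags(['--ne16-wmem']))
# ['-DENABLE_NE16_WMEM=ON', '-DENABLE_3X3_CONV=OFF']
```

Emitting explicit `=OFF` values keeps the CMake cache deterministic when a flag is omitted.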
1153077 to 4edb011 (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (1)
Deeploy/Targets/NE16/Templates/ConvTemplate.py (1)
352-356: ⚠️ Potential issue | 🔴 Critical

Fix invalid designated initializer syntax for `weights_stride`.

The line `task.data.cfg.weights_stride = (ne16_stride_t) {...}` is inside the `.cfg = (ne16_cfg_t) { ... }` designated initializer block. Within designated initializers, you must use `.fieldname = value` syntax. The current syntax is invalid C and will cause compilation errors.

🐛 Proposed fix

- task.data.cfg.weights_stride = (ne16_stride_t) {
+ .weights_stride = (ne16_stride_t) {
      .d0 = ${weightStrideD0},
      .d1 = ${weightStrideD1},
      .d2 = ${weightStrideD2}
  },

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Deeploy/Targets/NE16/Templates/ConvTemplate.py` around lines 352 - 356, Inside the designated initializer for task.data.cfg (the .cfg = (ne16_cfg_t) { ... } block) the assignment to weights_stride uses an invalid nested C cast-style initializer; replace it with a proper designated-field initializer by setting .weights_stride = { .d0 = ${weightStrideD0}, .d1 = ${weightStrideD1}, .d2 = ${weightStrideD2} } (or include an explicit .weights_stride = (ne16_stride_t){ .d0 = ..., .d1 = ..., .d2 = ... } if you prefer the cast) so the field name .weights_stride is used with a brace-enclosed initializer compatible with ne16_stride_t within the ne16_cfg_t designated initializer.
🧹 Nitpick comments (8)
Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1)
7-11: Minor style nit: Consider grouping all imports before module-level assignments.

The assignment `float32_tPtr = PointerClass(float32_t)` on line 10 appears between import statements. Per PEP 8 conventions, imports should typically be grouped together at the top of the file before other statements.

♻️ Suggested reordering

 from Deeploy.AbstractDataTypes import PointerClass
 from Deeploy.CommonExtensions.DataTypes import float32_t
-
-float32_tPtr = PointerClass(float32_t)
 from Deeploy.DeeployTypes import NetworkContext, NodeTemplate, OperatorRepresentation
+
+float32_tPtr = PointerClass(float32_t)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py` around lines 7 - 11, Move the module-level assignment float32_tPtr = PointerClass(float32_t) so all import statements are grouped first: keep the imports for PointerClass, float32_t, and DeeployTypes together at the top, then place the float32_tPtr assignment immediately after those imports; ensure you reference PointerClass and float32_t exactly as they are imported and leave NetworkContext, NodeTemplate, OperatorRepresentation imports unchanged.

DeeployTest/test_platforms.py (1)
1004-1006: Unused `config_name` variable (consistent with existing tests).

The `config_name` variable is unpacked but never used. This matches the pattern in all other tiled test functions in this file (e.g., lines 317, 346, 808, etc.). For consistency, you could prefix it with an underscore to signal intent:

- test_name, l1, config_name = test_params
+ test_name, l1, _config_name = test_params

This is a low-priority style nit since the existing codebase follows the same pattern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@DeeployTest/test_platforms.py` around lines 1004 - 1006, In test_gap9_w_ne16_tiled_kernels_l2_singlebuffer, the unpacked variable config_name is unused; rename it to _config_name (or prefix with an underscore) in the function signature to indicate it's intentionally unused and match the existing tiled test patterns (refer to function test_gap9_w_ne16_tiled_kernels_l2_singlebuffer).

.github/workflows/_runner-gap9-w-ne16-tiled.yml (1)
36-36: Consider logging pip install failures instead of silently ignoring them.

The `|| true` suppresses all pip install failures silently. If the install genuinely fails, subsequent test runs will fail with confusing errors. Consider logging the failure while still allowing the workflow to proceed:

- pip install -e . || true
+ pip install -e . || echo "WARNING: pip install failed, tests may not work correctly"

Alternatively, if the install is expected to always succeed in CI, remove `|| true` to fail fast on broken dependencies.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/_runner-gap9-w-ne16-tiled.yml at line 36, The workflow step currently runs "pip install -e . || true", which silently swallows installation failures; change this to either remove the "|| true" so the job fails fast on pip install errors, or replace it with logic that captures the pip exit status and logs the failure before continuing (e.g., run pip install -e . and on failure emit a clear error log message referencing the pip install step); locate the exact step containing "pip install -e . || true" in the workflow and update it accordingly so failures are not silently ignored.

Deeploy/Targets/NE16/TileConstraints/NE16DenseConstraint.py (1)
161-162: Prefix unused unpacked variables with underscore.

Per static analysis hints, several unpacked variables are never used. Prefix them with underscore to indicate intentional non-use.

♻️ Suggested fix

- (BatchOffset, HOffset, WOffset, COffset) = cube.offset
- (BatchSize, HSize, WSize, CSize) = cube.dims
+ (_BatchOffset, _HOffset, _WOffset, COffset) = cube.offset
+ (_BatchSize, HSize, WSize, CSize) = cube.dims

- inBSize, inHSize, inWSize, inCSize = InCube.dims
+ _inBSize, _inHSize, inWSize, inCSize = InCube.dims

Also applies to: 173-173

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Deeploy/Targets/NE16/TileConstraints/NE16DenseConstraint.py` around lines 161 - 162, The tuple unpackings in NE16DenseConstraint where you extract cube.offset into (BatchOffset, HOffset, WOffset, COffset) and cube.dims into (BatchSize, HSize, WSize, CSize) (and the similar unpacking around line 173) include variables that are never used; update these unpackings to prefix the unused names with an underscore (e.g., _BatchOffset, _HOffset, etc.) so static analysis recognizes intentional non-use while keeping any used variables unchanged and preserving the tuple structure in the methods within the NE16DenseConstraint class.

Deeploy/Targets/NE16/TileConstraints/NE16PointwiseConstraint.py (1)
79-80: Remove unusedpaddingvariable.The
paddingvariable is read fromparseDictbut never used inaddPolicyConstraint.♻️ Suggested fix
strides = parseDict["strides"] - padding = parseDict["pads"]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Deeploy/Targets/NE16/TileConstraints/NE16PointwiseConstraint.py` around lines 79 - 80, In addPolicyConstraint (class NE16PointwiseConstraint) the local variable padding is assigned from parseDict["pads"] but never used; remove the unused assignment (the line `padding = parseDict["pads"]`) so there is no dead variable left, or if padding is intended to be used, instead reference it where needed—locate the parseDict usage in addPolicyConstraint and either delete the padding assignment or integrate padding into the constraint logic.Deeploy/Targets/NE16/TileConstraints/NE16DepthwiseConstraint.py (2)
29-31: Remove unuseddilationvariable.The
dilationvariable is extracted but never used in the function. Since NE16 doesn't support dilation (enforced by parser'sdilations == [1, 1]check), this variable is unnecessary.♻️ Suggested fix
strides = parseDict["strides"] padding = parseDict["pads"] - dilation = parseDict["dilations"]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Deeploy/Targets/NE16/TileConstraints/NE16DepthwiseConstraint.py` around lines 29 - 31, Remove the unused local variable `dilation` that is assigned from `parseDict["dilations"]`; since the parser enforces `dilations == [1, 1]` and NE16 does not support dilation, delete the line `dilation = parseDict["dilations"]` (and any dead code referencing `dilation`) from the function so only `strides = parseDict["strides"]` and `padding = parseDict["pads"]` remain, ensuring no other code expects `dilation`.
160-162: Use already-looked-upoutputBufferinstead of redundant lookup.
outputBufferwas already retrieved at line 153. Line 162 performs an unnecessary second lookup.♻️ Suggested fix
InCube, padding_tuple = Conv2DTileConstraint.computeInputCube((weightH, weightW), pads, strides, weightC, cube, - ctxt.lookup(varOut).shape) + outputBuffer.shape)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Deeploy/Targets/NE16/TileConstraints/NE16DepthwiseConstraint.py` around lines 160 - 162, Replace the redundant ctxt.lookup(varOut).shape call with the already-fetched outputBuffer.shape in the NE16DepthwiseConstraint code: when calling Conv2DTileConstraint.computeInputCube (which returns InCube and padding_tuple), pass outputBuffer.shape instead of performing a second ctxt.lookup(varOut) to avoid the duplicate lookup and use the previously retrieved outputBuffer variable.Deeploy/Targets/NE16/Parsers.py (1)
56-56: Use idiomatic boolean check instead of equality comparison toFalse.Per Ruff E712, avoid explicit equality comparisons to
False.♻️ Suggested fix
if not all([ - channels_first == False, + not channels_first, len(data_in.shape) == 4,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Deeploy/Targets/NE16/Parsers.py` at line 56, Replace the explicit equality check "channels_first == False" with the idiomatic boolean negation "not channels_first" wherever it appears (e.g., in the call/config constructing the parser using the channels_first variable) to comply with Ruff E712 and avoid direct comparison to False.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0e7e2d4c-444f-4b78-b2b5-22aef5e9a86f
📒 Files selected for processing (32)
- .github/workflows/_runner-gap9-w-ne16-tiled.yml
- .github/workflows/ci-platform-gap9-w-ne16-tiled.yml
- CMakeLists.txt
- Deeploy/Targets/NE16/Bindings.py
- Deeploy/Targets/NE16/Deployer.py
- Deeploy/Targets/NE16/Engine.py
- Deeploy/Targets/NE16/OptimizationPasses/MemoryLevelAnnotationPasses.py
- Deeploy/Targets/NE16/OptimizationPasses/__init__.py
- Deeploy/Targets/NE16/Parsers.py
- Deeploy/Targets/NE16/Platform.py
- Deeploy/Targets/NE16/Templates/AllocateTemplate.py
- Deeploy/Targets/NE16/Templates/ConvTemplate.py
- Deeploy/Targets/NE16/Templates/__init__.py
- Deeploy/Targets/NE16/TileConstraints/NE16DenseConstraint.py
- Deeploy/Targets/NE16/TileConstraints/NE16DepthwiseConstraint.py
- Deeploy/Targets/NE16/TileConstraints/NE16PointwiseConstraint.py
- Deeploy/Targets/NE16/TileConstraints/RequantHelpers.py
- Deeploy/Targets/NE16/TileConstraints/__init__.py
- Deeploy/Targets/NE16/Tiler.py
- Deeploy/Targets/NE16/TopologyOptimizationPasses/Passes.py
- Deeploy/Targets/NE16/TopologyOptimizationPasses/__init__.py
- Deeploy/Targets/NE16/__init__.py
- Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py
- DeeployTest/CMakeLists.txt
- DeeployTest/conftest.py
- DeeployTest/deeployRunner_tiled_gap9_w_ne16.py
- DeeployTest/testMVP.py
- DeeployTest/testUtils/core/execution.py
- DeeployTest/testUtils/platformMapping.py
- DeeployTest/test_gap9_ne16_tiled_config.py
- DeeployTest/test_platforms.py
- TargetLibraries/GAP9/CMakeLists.txt
✅ Files skipped from review due to trivial changes (5)
- DeeployTest/testMVP.py
- DeeployTest/testUtils/core/execution.py
- DeeployTest/CMakeLists.txt
- Deeploy/Targets/NE16/Tiler.py
- DeeployTest/test_gap9_ne16_tiled_config.py
🚧 Files skipped from review as they are similar to previous changes (6)
- TargetLibraries/GAP9/CMakeLists.txt
- CMakeLists.txt
- DeeployTest/testUtils/platformMapping.py
- DeeployTest/deeployRunner_tiled_gap9_w_ne16.py
- Deeploy/Targets/NE16/TileConstraints/RequantHelpers.py
- Deeploy/Targets/NE16/Platform.py
if hasattr(buffer, "_memoryLevel") and buffer._memoryLevel == self._weightMemoryLevel.name:
    weightMemoryOccupation += _ne16WeightBufferSize(buffer)

ne16Nodes = [node for node in graph.nodes if node.attrs["engine"] == self.ne16EngineName]
Potential KeyError if node lacks engine attribute.
The list comprehension accesses node.attrs["engine"] without checking if the key exists. Nodes not assigned to an engine may lack this attribute.
🛡️ Proposed defensive fix
- ne16Nodes = [node for node in graph.nodes if node.attrs["engine"] == self.ne16EngineName]
+ ne16Nodes = [node for node in graph.nodes if node.attrs.get("engine") == self.ne16EngineName]

🤖 Prompt for AI Agents
+ ne16Nodes = [node for node in graph.nodes if node.attrs.get("engine") == self.ne16EngineName]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Deeploy/Targets/NE16/OptimizationPasses/MemoryLevelAnnotationPasses.py` at
line 34, The list comprehension that builds ne16Nodes uses node.attrs["engine"]
directly and can raise KeyError for nodes without an "engine"; change it to
safely check for the key (e.g., use node.attrs.get("engine") ==
self.ne16EngineName or "engine" in node.attrs and node.attrs["engine"] ==
self.ne16EngineName) so only nodes with an engine attribute are compared against
self.ne16EngineName; update the expression that defines ne16Nodes accordingly in
MemoryLevelAnnotationPasses.py.
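The defensive `.get()` lookup can be checked in isolation with a stand-in node class (FakeNode and the engine names below are illustrative, not Deeploy API):

```python
# Stand-in for gs.Node: only the attrs dict matters for this lookup.
class FakeNode:
    def __init__(self, attrs):
        self.attrs = attrs

nodes = [FakeNode({"engine": "NE16"}), FakeNode({}), FakeNode({"engine": "Cluster"})]

# attrs["engine"] would raise KeyError on the engine-less node;
# attrs.get("engine") returns None there, so the comparison simply fails.
ne16_nodes = [n for n in nodes if n.attrs.get("engine") == "NE16"]
```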
Actionable comments posted: 4
♻️ Duplicate comments (5)
Deeploy/Targets/NE16/Deployer.py (1)
19-30: ⚠️ Potential issue | 🟠 Major
Avoid shared inputOffsets state across deployer instances.
inputOffsets = {} still creates one dict shared by every NE16Deployer built without an explicit argument, so per-instance state can leak between deployments.
Suggested fix

- inputOffsets = {}):
+ inputOffsets = None):
+     if inputOffsets is None:
+         inputOffsets = {}
      super().__init__(graph, deploymentPlatform, inputTypes, loweringOptimizer, scheduler, name, default_channels_first, deeployStateDir, inputOffsets)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Deeploy/Targets/NE16/Deployer.py` around lines 19 - 30, The constructor __init__ currently uses a mutable default inputOffsets = {} which shares state across instances; change the parameter default to inputOffsets: Optional[Dict] = None and inside __init__ (before calling super().__init__) set inputOffsets = {} if it's None, then pass that local dict into super().__init__; update the function signature and ensure any references to inputOffsets in NE16Deployer.__init__ use the per-instance dict created when None is supplied.
Deeploy/Targets/NE16/TopologyOptimizationPasses/Passes.py (1)
90-106: ⚠️ Potential issue | 🟠 Major
Widen before subtracting weight_offset.
If weightTensor.values is int8, values - weight_offset happens in the narrow dtype and can wrap before the later cast to uint8, corrupting the encoded weights.
Suggested fix

- weight_offset = values.min()
- values = values - weight_offset
+ weight_offset = int(values.min())
+ values = values.astype(np.int32) - weight_offset
  node.attrs["weight_offset"] = weight_offset

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Deeploy/Targets/NE16/TopologyOptimizationPasses/Passes.py` around lines 90 - 106, The subtraction values = values - weight_offset can underflow when weightTensor.values is int8; before computing the offset and subtracting, widen values to a safe integer type (e.g., int16/int32) so the subtraction is performed without wraparound, then clip/normalize and finally cast to np.uint8 before calling _weightEncode; update the operations around weight_offset, values, and the subsequent transpose/astype steps (referencing weight_offset, values, weightTensor.values, and _weightEncode) to perform widening, subtraction, any needed clamping, and only then cast to uint8 for encoding.
Deeploy/Targets/NE16/Engine.py (1)
62-75: ⚠️ Potential issue | 🟠 Major
Grouped convolutions are still being classified as NE16 PW/DW.
isPWConv() still accepts grouped 1x1 convs, and isDWConv() still accepts any grouped 3x3 conv. That will color generic grouped convolutions for NE16 even though the downstream paths only handle regular pointwise (group == 1) or true depthwise weights.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Deeploy/Targets/NE16/Engine.py` around lines 62 - 75, isPWConv currently allows grouped 1x1 convs and isDWConv accepts any grouped 3x3 conv; change the predicates to only mark true pointwise (group == 1) and true depthwise (group == number of input channels) respectively: update isPWConv to include node.attrs['group'] == 1 in its condition, and update isDWConv to require node.attrs['group'] equal the convolution's input channel count (i.e., only treat it as depthwise when group == input_channels) while keeping the existing kernel_shape, dilation and stride checks.
Deeploy/Targets/NE16/TileConstraints/NE16PointwiseConstraint.py (1)
43-46: ⚠️ Potential issue | 🟠 Major
Model pointwise output geometry from stride, not identity.
These constraints only describe a stride-1 pointwise conv. If NE16Engine.enableStrides is ever enabled, H_out/W_out become wrong immediately because the tiler still forces outputHeightVar == inputHeightVar and outputWidthVar == inputWidthVar.
Suggested fix

  # Map output dims to input dims
  tilerModel.addConstraint(outputBatchVar == inputBatchVar)
- tilerModel.addConstraint(outputHeightVar == inputHeightVar)
- tilerModel.addConstraint(outputWidthVar == inputWidthVar)
+ strides = parseDict["strides"]
+ padding = parseDict["pads"]
+ inputBuffer = ctxt.lookup(inputBufferName)
+ effectiveHeight = inputHeightVar + ((padding[0] + padding[2]) * (inputHeightVar == inputBuffer.shape[1]))
+ effectiveWidth = inputWidthVar + ((padding[1] + padding[3]) * (inputWidthVar == inputBuffer.shape[2]))
+ tilerModel.addConstraint(outputHeightVar == (effectiveHeight - 1) // strides[0] + 1)
+ tilerModel.addConstraint(outputWidthVar == (effectiveWidth - 1) // strides[1] + 1)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Deeploy/Targets/NE16/TileConstraints/NE16PointwiseConstraint.py` around lines 43 - 46, The current constraints in NE16PointwiseConstraint map outputHeightVar and outputWidthVar directly to inputHeightVar/inputWidthVar which only works for stride=1; update the constraints to compute output dims from the pointwise stride when NE16Engine.enableStrides is used (use the convolution output formula for 1x1: H_out = ceil(inputHeight / strideH), W_out = ceil(inputWidth / strideW)). Replace the equalities on outputHeightVar and outputWidthVar with expressions that use the pointwise stride variables (e.g., strideH/strideW or the engine's stride config) and implement the integer ceil division ((input + stride - 1) / stride) so tilerModel.addConstraint uses outputHeightVar == (inputHeightVar + strideHVar - 1) / strideHVar and similarly for width; keep the batch mapping (outputBatchVar == inputBatchVar) as-is.
Deeploy/Targets/NE16/Templates/ConvTemplate.py (1)
352-356: ⚠️ Potential issue | 🔴 Critical
Use a designated field initializer for weights_stride.
task.data.cfg.weights_stride = ... is not valid inside a C compound literal. This needs to be .weights_stride = ..., like the neighboring fields; otherwise the generated kernel code will not compile.
Proposed fix

- task.data.cfg.weights_stride = (ne16_stride_t) {
+ .weights_stride = (ne16_stride_t) {
      .d0 = ${weightStrideD0},
      .d1 = ${weightStrideD1},
      .d2 = ${weightStrideD2}
  },

Is `task.data.cfg.weights_stride = ...` valid inside a C designated initializer, or must it be written as `.weights_stride = ...`?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Deeploy/Targets/NE16/Templates/ConvTemplate.py` around lines 352 - 356, Replace the invalid in-line assignment inside the C compound literal with a designated initializer: change the occurrence of "task.data.cfg.weights_stride = (ne16_stride_t) { ... }" so that within the compound literal for task.data.cfg you use ".weights_stride = (ne16_stride_t) { .d0 = ..., .d1 = ..., .d2 = ... }" (i.e., reference the struct member as .weights_stride) so the generated kernel compiles; update the initializer near the other designated fields in the same compound literal where task.data.cfg is constructed.
🧹 Nitpick comments (1)
Deeploy/Targets/NE16/TileConstraints/__init__.py (1)
5-5: Consider using explicit submodule imports for clarity.
The from . import * statement in a package __init__.py doesn't automatically import submodules; it only imports names from __all__ (if defined) in the current package's namespace. For a package that simply exposes its submodules, explicit imports are clearer:

from . import NE16DenseConstraint, NE16DepthwiseConstraint, NE16PointwiseConstraint, RequantHelpers

This same pattern appears in several other new NE16 __init__.py files. If the current behavior is intentional (relying on consumers to import submodules directly), this can remain as-is.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Deeploy/Targets/NE16/TileConstraints/__init__.py` at line 5, Replace the wildcard import in the package __init__ (from . import *) with explicit submodule imports to make exported symbols clear: import the submodules NE16DenseConstraint, NE16DepthwiseConstraint, NE16PointwiseConstraint and RequantHelpers (e.g., from . import NE16DenseConstraint, NE16DepthwiseConstraint, NE16PointwiseConstraint, RequantHelpers) so consumers get those modules via the package namespace; alternatively, if you intended to expose names rather than modules, define an explicit __all__ listing the public symbols instead of using a star import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Deeploy/Targets/NE16/Deployer.py`:
- Around line 32-40: The constructor is mutating the shared lowering optimizer
in place (self.loweringOptimizer.passes) which causes cross-deployer pollution;
instead clone the optimizer or at least its passes list before modifying it
(e.g., deep-copy the optimizer or assign a new passes list based on
list(self.loweringOptimizer.passes)) and then perform the PULPNCHWtoNHWCPass ->
NCHWtoNHWCPass replacement and append ConvEngineDiscolorationPass() and
NE16OptimizationPass(self.default_channels_first, "NE16") to the cloned
optimizer so the module-scoped NE16Optimizer from Platform.py is not mutated
across instances.
In `@Deeploy/Targets/NE16/Parsers.py`:
- Around line 37-40: The NE16 parser's parseNodeCtxt currently defaults
channels_first=True but the implementation only accepts False; change the
default parameter in parseNodeCtxt to channels_first: bool = False and update
the three RQS override method signatures to also default channels_first to False
so callers use NHWC by default and the method signatures match the only
supported layout (refer to parseNodeCtxt and the three RQS override methods to
locate and update their signatures).
In `@Deeploy/Targets/NE16/TileConstraints/NE16DepthwiseConstraint.py`:
- Around line 52-58: The current constraint only pins weightShape[0] via
weightOutChannelVar.Max(), but the DW packed weight tensor is emitted/loaded as
a full encoded block (so all dims must be pinned); update the tilerModel
constraints for the packed DW weight (located where weightBuffer =
ctxt.lookup(weightBufferName) and tilerModel.addConstraint(...) is called) to
set every shape dimension of that buffer to its Max() instead of only
weightOutChannelVar—e.g., iterate over the buffer's shape variables (weightShape
or equivalent per-dimension vars obtained from weightBuffer) and call
tilerModel.addConstraint(dimVar == dimVar.Max()) for each dimension so the tiler
cannot shrink any axis while codegen still DMA-loads the full packed block.
In `@Deeploy/Targets/NE16/TopologyOptimizationPasses/Passes.py`:
- Around line 139-150: The _findAllMultiplicands function currently uses float
division and an exclusive upper bound which breaks integer factorization; change
the loop to iterate i from 2 through int(math.sqrt(x)) inclusive (or use
math.isqrt(x)+1) and use integer division (tmpX //= i) when extracting factors
so tmpX stays an int; ensure the final remaining factor appended for x //
math.prod(multiplicands) is computed with integer arithmetic and that
multiplicands contains ints so callers like _bestReshapeOption receive correct
balanced factor pairs.
---
Duplicate comments:
In `@Deeploy/Targets/NE16/Deployer.py`:
- Around line 19-30: The constructor __init__ currently uses a mutable default
inputOffsets = {} which shares state across instances; change the parameter
default to inputOffsets: Optional[Dict] = None and inside __init__ (before
calling super().__init__) set inputOffsets = {} if it's None, then pass that
local dict into super().__init__; update the function signature and ensure any
references to inputOffsets in NE16Deployer.__init__ use the per-instance dict
created when None is supplied.
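The None-default pattern described in the prompt above generalizes to any constructor or function; a minimal runnable sketch (the helper name is hypothetical):

```python
def make_offsets(input_offsets=None):
    # A fresh dict is created per call. A mutable default ({}) would be
    # created once at definition time and shared by every caller that
    # omits the argument, so mutations would leak between calls.
    if input_offsets is None:
        input_offsets = {}
    return input_offsets

first = make_offsets()
first["input_0"] = 128
second = make_offsets()  # unaffected by the mutation above
```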
In `@Deeploy/Targets/NE16/Engine.py`:
- Around line 62-75: isPWConv currently allows grouped 1x1 convs and isDWConv
accepts any grouped 3x3 conv; change the predicates to only mark true pointwise
(group == 1) and true depthwise (group == number of input channels)
respectively: update isPWConv to include node.attrs['group'] == 1 in its
condition, and update isDWConv to require node.attrs['group'] equal the
convolution's input channel count (i.e., only treat it as depthwise when group
== input_channels) while keeping the existing kernel_shape, dilation and stride
checks.
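The tightened predicates can be sketched as plain functions over an attrs dict (the function names and dict-based nodes are stand-ins; the real checks operate on gs.Node attributes):

```python
def is_pw_conv(attrs):
    # True pointwise: 1x1 kernel, unit stride/dilation, ungrouped.
    return (attrs["kernel_shape"] == [1, 1] and attrs["strides"] == [1, 1]
            and attrs["dilations"] == [1, 1] and attrs["group"] == 1)

def is_dw_conv(attrs, in_channels):
    # True depthwise: 3x3 kernel with exactly one group per input channel.
    return (attrs["kernel_shape"] == [3, 3] and attrs["dilations"] == [1, 1]
            and attrs["group"] == in_channels)

grouped_pw = {"kernel_shape": [1, 1], "strides": [1, 1], "dilations": [1, 1], "group": 4}
true_pw = {"kernel_shape": [1, 1], "strides": [1, 1], "dilations": [1, 1], "group": 1}
grouped_3x3 = {"kernel_shape": [3, 3], "dilations": [1, 1], "group": 4}
true_dw = {"kernel_shape": [3, 3], "dilations": [1, 1], "group": 16}
```

With these predicates, a grouped 1x1 conv or a partially grouped 3x3 conv falls through to the generic cluster path instead of being colored for NE16.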
In `@Deeploy/Targets/NE16/Templates/ConvTemplate.py`:
- Around line 352-356: Replace the invalid in-line assignment inside the C
compound literal with a designated initializer: change the occurrence of
"task.data.cfg.weights_stride = (ne16_stride_t) { ... }" so that within the
compound literal for task.data.cfg you use ".weights_stride = (ne16_stride_t) {
.d0 = ..., .d1 = ..., .d2 = ... }" (i.e., reference the struct member as
.weights_stride) so the generated kernel compiles; update the initializer near
the other designated fields in the same compound literal where task.data.cfg is
constructed.
In `@Deeploy/Targets/NE16/TileConstraints/NE16PointwiseConstraint.py`:
- Around line 43-46: The current constraints in NE16PointwiseConstraint map
outputHeightVar and outputWidthVar directly to inputHeightVar/inputWidthVar
which only works for stride=1; update the constraints to compute output dims
from the pointwise stride when NE16Engine.enableStrides is used (use the
convolution output formula for 1x1: H_out = ceil(inputHeight / strideH), W_out =
ceil(inputWidth / strideW)). Replace the equalities on outputHeightVar and
outputWidthVar with expressions that use the pointwise stride variables (e.g.,
strideH/strideW or the engine's stride config) and implement the integer ceil
division ((input + stride - 1) / stride) so tilerModel.addConstraint uses
outputHeightVar == (inputHeightVar + strideHVar - 1) / strideHVar and similarly
for width; keep the batch mapping (outputBatchVar == inputBatchVar) as-is.
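The ceil-division geometry above is an instance of the general convolution output formula; for a 1x1 kernel with no padding it reduces exactly to (dim - 1) // stride + 1, i.e. ceil(dim / stride). A small sketch (the helper name is illustrative):

```python
def conv_out_dim(in_dim, kernel, stride, pad_lo=0, pad_hi=0, dilation=1):
    # Standard convolution output size. With kernel == 1 and no padding
    # this is ceil(in_dim / stride) == (in_dim - 1) // stride + 1.
    eff_kernel = dilation * (kernel - 1) + 1
    return (in_dim + pad_lo + pad_hi - eff_kernel) // stride + 1
```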
In `@Deeploy/Targets/NE16/TopologyOptimizationPasses/Passes.py`:
- Around line 90-106: The subtraction values = values - weight_offset can
underflow when weightTensor.values is int8; before computing the offset and
subtracting, widen values to a safe integer type (e.g., int16/int32) so the
subtraction is performed without wraparound, then clip/normalize and finally
cast to np.uint8 before calling _weightEncode; update the operations around
weight_offset, values, and the subsequent transpose/astype steps (referencing
weight_offset, values, weightTensor.values, and _weightEncode) to perform
widening, subtraction, any needed clamping, and only then cast to uint8 for
encoding.
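The wraparound described above is easy to reproduce in pure Python by emulating int8 arithmetic (the real code operates on numpy arrays; wrap_int8 below is only an illustration of narrow-dtype behavior):

```python
INT8_MIN = -128

def wrap_int8(x):
    # Emulate two's-complement int8 wraparound, as narrow-dtype
    # integer arithmetic would produce.
    return (x - INT8_MIN) % 256 + INT8_MIN

values = [-128, -1, 0, 127]      # int8 weights
offset = min(values)             # -128

# Subtracting in int8: 0 - (-128) wraps to -128, 127 - (-128) wraps to -1.
wrapped = [wrap_int8(v - offset) for v in values]

# Widening first keeps every intermediate exact; results all fit in uint8.
widened = [v - offset for v in values]
```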
---
Nitpick comments:
In `@Deeploy/Targets/NE16/TileConstraints/__init__.py`:
- Line 5: Replace the wildcard import in the package __init__ (from . import *)
with explicit submodule imports to make exported symbols clear: import the
submodules NE16DenseConstraint, NE16DepthwiseConstraint, NE16PointwiseConstraint
and RequantHelpers (e.g., from . import NE16DenseConstraint,
NE16DepthwiseConstraint, NE16PointwiseConstraint, RequantHelpers) so consumers
get those modules via the package namespace; alternatively, if you intended to
expose names rather than modules, define an explicit __all__ listing the public
symbols instead of using a star import.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: eaa00d35-32b5-49a6-8eeb-a015c49962b3
📒 Files selected for processing (35)
- .github/workflows/_runner-gap9-w-ne16-tiled.yml
- .github/workflows/ci-platform-gap9-w-ne16-tiled.yml
- CMakeLists.txt
- Deeploy/Targets/NE16/Bindings.py
- Deeploy/Targets/NE16/Deployer.py
- Deeploy/Targets/NE16/Engine.py
- Deeploy/Targets/NE16/OptimizationPasses/MemoryLevelAnnotationPasses.py
- Deeploy/Targets/NE16/OptimizationPasses/__init__.py
- Deeploy/Targets/NE16/Parsers.py
- Deeploy/Targets/NE16/Platform.py
- Deeploy/Targets/NE16/Templates/AllocateTemplate.py
- Deeploy/Targets/NE16/Templates/ConvTemplate.py
- Deeploy/Targets/NE16/Templates/__init__.py
- Deeploy/Targets/NE16/TileConstraints/NE16DenseConstraint.py
- Deeploy/Targets/NE16/TileConstraints/NE16DepthwiseConstraint.py
- Deeploy/Targets/NE16/TileConstraints/NE16PointwiseConstraint.py
- Deeploy/Targets/NE16/TileConstraints/RequantHelpers.py
- Deeploy/Targets/NE16/TileConstraints/__init__.py
- Deeploy/Targets/NE16/Tiler.py
- Deeploy/Targets/NE16/TopologyOptimizationPasses/Passes.py
- Deeploy/Targets/NE16/TopologyOptimizationPasses/__init__.py
- Deeploy/Targets/NE16/__init__.py
- Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py
- DeeployTest/CMakeLists.txt
- DeeployTest/Tests/Kernels/Integer/Conv/Dense_2D_RQ/inputs.npz
- DeeployTest/Tests/Kernels/Integer/Conv/Dense_2D_RQ/network.onnx
- DeeployTest/Tests/Kernels/Integer/Conv/Dense_2D_RQ/outputs.npz
- DeeployTest/conftest.py
- DeeployTest/deeployRunner_tiled_gap9_w_ne16.py
- DeeployTest/testMVP.py
- DeeployTest/testUtils/core/execution.py
- DeeployTest/testUtils/platformMapping.py
- DeeployTest/test_gap9_ne16_tiled_config.py
- DeeployTest/test_platforms.py
- TargetLibraries/GAP9/CMakeLists.txt
✅ Files skipped from review due to trivial changes (4)
- DeeployTest/conftest.py
- Deeploy/Targets/NE16/Tiler.py
- DeeployTest/testUtils/core/execution.py
- Deeploy/Targets/NE16/TileConstraints/RequantHelpers.py
🚧 Files skipped from review as they are similar to previous changes (9)
- DeeployTest/testUtils/platformMapping.py
- TargetLibraries/GAP9/CMakeLists.txt
- Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py
- .github/workflows/_runner-gap9-w-ne16-tiled.yml
- .github/workflows/ci-platform-gap9-w-ne16-tiled.yml
- Deeploy/Targets/NE16/OptimizationPasses/MemoryLevelAnnotationPasses.py
- DeeployTest/deeployRunner_tiled_gap9_w_ne16.py
- Deeploy/Targets/NE16/Bindings.py
- DeeployTest/test_gap9_ne16_tiled_config.py
def parseNodeCtxt(self,
                  ctxt: NetworkContext,
                  node: gs.Node,
                  channels_first: bool = True) -> Tuple[NetworkContext, bool]:
Make the NE16 parser default to NHWC.
parseNodeCtxt() defaults channels_first to True, but Line 56 rejects anything except False. Any caller that uses the default path will fail to parse NE16 convolutions. Flip this default here and mirror it in the three RQS overrides so the method signature matches the only supported layout.
Proposed fix
- def parseNodeCtxt(self,
- ctxt: NetworkContext,
- node: gs.Node,
- channels_first: bool = True) -> Tuple[NetworkContext, bool]:
+ def parseNodeCtxt(self,
+ ctxt: NetworkContext,
+ node: gs.Node,
+ channels_first: bool = False) -> Tuple[NetworkContext, bool]:
@@
- channels_first == False,
+ not channels_first,

- def parseNodeCtxt(self,
- ctxt: NetworkContext,
- node: gs.Node,
- channels_first: bool = True) -> Tuple[NetworkContext, bool]:
+ def parseNodeCtxt(self,
+ ctxt: NetworkContext,
+ node: gs.Node,
+ channels_first: bool = False) -> Tuple[NetworkContext, bool]:

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Deeploy/Targets/NE16/Parsers.py` around lines 37 - 40, The NE16 parser's
parseNodeCtxt currently defaults channels_first=True but the implementation only
accepts False; change the default parameter in parseNodeCtxt to channels_first:
bool = False and update the three RQS override method signatures to also default
channels_first to False so callers use NHWC by default and the method signatures
match the only supported layout (refer to parseNodeCtxt and the three RQS
override methods to locate and update their signatures).
weightBuffer = ctxt.lookup(weightBufferName)
# NE16 DW weight is packed as a single (1, 1, packed_bytes) block
# containing all output channels (up to NE16_SUBTILE_INPUT_CHANNEL=16).
# Keep the outermost dim fixed at its full (=1) value regardless of
# the output channel tiling.
tilerModel.addConstraint(weightOutChannelVar == weightOutChannelVar.Max())
Pin the whole packed DW weight tensor, not just dim 0.
Line 57 fixes only weightShape[0], but Lines 227-228 always emit a load for the full encoded weight block. That lets the tiler under-account weight memory by shrinking the packed-bytes axis, then codegen still DMA-loads the entire tensor. Constrain every dimension of the encoded DW weight buffer to Max() for this path.
Proposed fix
weightBuffer = ctxt.lookup(weightBufferName)
# NE16 DW weight is packed as a single (1, 1, packed_bytes) block
# containing all output channels (up to NE16_SUBTILE_INPUT_CHANNEL=16).
- # Keep the outermost dim fixed at its full (=1) value regardless of
- # the output channel tiling.
- tilerModel.addConstraint(weightOutChannelVar == weightOutChannelVar.Max())
+ # Keep the full encoded block resident regardless of output-channel tiling.
+ for dimIdx in range(len(weightBuffer.shape)):
+ weightDimVar = tilerModel.getTensorDimVar(tensorName = weightBufferName, dimIdx = dimIdx)
+ tilerModel.addConstraint(weightDimVar == weightDimVar.Max())Also applies to: 225-228
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Deeploy/Targets/NE16/TileConstraints/NE16DepthwiseConstraint.py` around lines
52 - 58, The current constraint only pins weightShape[0] via
weightOutChannelVar.Max(), but the DW packed weight tensor is emitted/loaded as
a full encoded block (so all dims must be pinned); update the tilerModel
constraints for the packed DW weight (located where weightBuffer =
ctxt.lookup(weightBufferName) and tilerModel.addConstraint(...) is called) to
set every shape dimension of that buffer to its Max() instead of only
weightOutChannelVar—e.g., iterate over the buffer's shape variables (weightShape
or equivalent per-dimension vars obtained from weightBuffer) and call
tilerModel.addConstraint(dimVar == dimVar.Max()) for each dimension so the tiler
cannot shrink any axis while codegen still DMA-loads the full packed block.
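The pin-every-dimension loop from the proposed fix has this shape against a toy solver model (DimVar and ToyTilerModel are stand-ins invented here; the real tiler API differs):

```python
class DimVar:
    # Stand-in for a tiler dimension variable with an upper bound.
    def __init__(self, maximum):
        self._max = maximum

    def Max(self):
        return self._max

class ToyTilerModel:
    def __init__(self):
        self.constraints = []

    def addConstraint(self, constraint):
        # Record (var, pinned value); a real model would post var == value.
        self.constraints.append(constraint)

def pin_full_buffer(tiler, dim_vars):
    # Pin every dimension of the packed DW weight block to its full
    # extent so the tiler cannot shrink the DMA-loaded tensor.
    for var in dim_vars:
        tiler.addConstraint((var, var.Max()))

tiler = ToyTilerModel()
dims = [DimVar(1), DimVar(1), DimVar(144)]  # (1, 1, packed_bytes)
pin_full_buffer(tiler, dims)
```

Pinning only the outermost dimension would leave the packed-bytes axis free to shrink, which is exactly the under-accounting the review flags.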
def _findAllMultiplicands(x: int) -> List[int]:
    multiplicands = []
    tmpX = x
    for i in range(2, math.ceil(math.sqrt(x))):  # Ceil cause range doesn't include the last number
        while tmpX % i == 0:
            multiplicands.append(i)
            tmpX = tmpX / i

    if x // math.prod(multiplicands) > 1:
        multiplicands.append(x // math.prod(multiplicands))

    return multiplicands
Use integer factorization here.
tmpX = tmpX / i turns the accumulator into a float, and the current loop bound skips exact square roots. For dimensions like 4 or 49 this misses balanced factor pairs, so _bestReshapeOption() falls back to (dim, 1) and defeats the pointwise-reshape heuristic.
Suggested fix
def _findAllMultiplicands(x: int) -> List[int]:
multiplicands = []
tmpX = x
- for i in range(2, math.ceil(math.sqrt(x))): # Ceil cause range doesn't include the last number
+ for i in range(2, math.isqrt(x) + 1):
while tmpX % i == 0:
multiplicands.append(i)
- tmpX = tmpX / i
+ tmpX //= i
- if x // math.prod(multiplicands) > 1:
- multiplicands.append(x // math.prod(multiplicands))
+ if tmpX > 1:
+ multiplicands.append(tmpX)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Deeploy/Targets/NE16/TopologyOptimizationPasses/Passes.py` around lines 139 -
150, The _findAllMultiplicands function currently uses float division and an
exclusive upper bound which breaks integer factorization; change the loop to
iterate i from 2 through int(math.sqrt(x)) inclusive (or use math.isqrt(x)+1)
and use integer division (tmpX //= i) when extracting factors so tmpX stays an
int; ensure the final remaining factor appended for x //
math.prod(multiplicands) is computed with integer arithmetic and that
multiplicands contains ints so callers like _bestReshapeOption receive correct
balanced factor pairs.
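The corrected factorization is small enough to check end to end; this sketch mirrors the suggested fix (isqrt-inclusive bound, floor division) with a renamed standalone function:

```python
import math

def find_all_multiplicands(x):
    # Prime factorization. math.isqrt(x) + 1 makes the bound inclusive so
    # perfect squares (4, 49, ...) are caught, and floor division keeps
    # the accumulator an int instead of drifting into floats.
    multiplicands = []
    tmp = x
    for i in range(2, math.isqrt(x) + 1):
        while tmp % i == 0:
            multiplicands.append(i)
            tmp //= i
    if tmp > 1:
        multiplicands.append(tmp)
    return multiplicands
```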
Mirrors the Siracusa_w_neureka pattern. NE16Platform extends GAP9Platform with engines=[NE16Engine, GAP9ClusterEngine]; NE16Deployer extends GAP9Deployer (reuses ClDma transformers via GAP9Bindings).

New Target: Deeploy/Targets/NE16/ (Platform, Engine, Bindings, Parsers, Tiler, Deployer, Templates, TileConstraints, TopologyOptimizationPasses). The _weightEncode function is ported from pulp-nnx/test/Ne16Weight.py (single CIN_SUBTILE=16 mode, no 1x1 vs 3x3 split). ConvTemplate subtile constants set per ne16_task_defs.h (output 3x3, weight stride bytes PW=16, DW/Dense=144).

New test infrastructure:
- DeeployTest/deeployRunner_tiled_gap9_w_ne16.py
- DeeployTest/test_gap9_ne16_tiled_config.py (PW/DW/Dense RQ Conv)

DeeployTest wiring:
- testUtils/platformMapping.py: register GAP9_w_NE16 in the platforms list, mapPlatform, setupMemoryPlatform, mapDeployer.
- testMVP.py: include GAP9_w_NE16 in the EngineColoringDeployerWrapper branch (without it NE16AdjustWeightMemoryLayoutPass never fires and parsing backtracks to exhaustion).
- testUtils/core/execution.py: build the GAP9 SDK 'image' target for GAP9_w_NE16 too (so chip.soc.mram.bin is produced before gvsoc run).
- CMakeLists.txt, DeeployTest/CMakeLists.txt: accept GAP9_w_NE16 alongside GAP9 in the platform branches.
- TargetLibraries/GAP9/CMakeLists.txt: for the GAP9_w_NE16 platform, add_subdirectory on pulp-nnx with USE_NE16=ON and link it into deeploygap9.

Fix: Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py referenced an undefined symbol float32_tPtr from Deeploy.AbstractDataTypes; define it locally via PointerClass(float32_t) to unblock the import chain reached by NE16Platform.

Verified on gvsoc gap9.evk:
- PW 1x1 RQ (Regular_RQ): 0/1152 errors, 901917 cycles
- DW 3x3 RQ (DW_2D_RQ): 0/1280 errors, 27339 cycles (--enable-3x3)
- Dense 3x3 (Regular_2D_RQ): 0/6372 errors, 244595 cycles (--enable-3x3)
- Add NE16 linear layer kernels, including a topology pass, NE16 templates, parsers, tile constraints, and bindings
- The topology pass recognizes NE16-compatible GEMM layers, adjusts the weight layout for the NE16, and converts the requant shift/scale to the NE16 format
- The template detects whether the input is signed; if so, it adds a +128 offset to the input during C runtime and compensates via the bias
- Add GAP9 SDK-based Dequant/Quant templates using CNN_Copy.c kernels, replacing the generic templates
- Add a generic DequantQuantMergePass that folds adjacent Dequant→Quant pairs into identity or RequantShift
- Add a GAP9-specific TopologyOptimizer (GAP9Optimizer) to replace PULPOptimizer

Bug fixes:
- Add output signedness check in QuantChecker
- Fix L3 DMA template (add proper casts) and remove the blocking L3 DMA hack
- Isolate dory memory functions from other libraries in CMakeLists so they compile with -Og while compute kernels compile with -O3
- Disable PULPAddRequantMergePass due to incorrect pattern matching when Add has multiple consumers

Co-authored-by: runwangdl <samanthawangdl@gmail.com>
b8087fc to
b3f40e5
Compare
- TargetLibraries/GAP9/CMakeLists.txt: rename CNN_Libraries_NE16 → CNN_Libraries_HWPE (the actual gap9-sdk path); skip the SDK CNN_BasicKernels_NE16.c source for the GAP9_w_NE16 platform (it uses the pulp-nnx ne16 stack, so the SDK NE16 kernels are not needed).
- Deeploy/Targets/NE16/Platform.py: instantiate the GAP9ClusterEngine with a trimmed includeList (no CNN_BasicKernels_NE16.h / ne16_utils.h / CNN_Copy.h) so the generated Network.c does not pull in the SDK NE16 header alongside pulp-nnx ne16_task_defs.h — the NE16_REG_* macros are defined in both and trigger -Werror redefinitions.
ghcr.io/pulp-platform/deeploy-gap9:* is hosted in pulp-platform's private GitHub Container Registry. Only upstream's self-hosted runners have credentials to pull it; on fork CI runs (ubuntu-latest) the docker pull fails with 'Error response from daemon: denied' and the whole job is reported as failure. Guard the select-env entry of all three gap9 workflows (ci-platform-gap9.yml, -tiled.yml, -w-ne16-tiled.yml) so they SKIP cleanly on forks instead of FAILING. Upstream behaviour is unchanged.
QuantChecker.checkOutputType (added by the NE16-Linear PR) requires opSigned == outputTypeSigned. Existing Generic and PULPOpen bindings only registered the signed-int8 output variant, so any Quant pattern with signed=0 (e.g. 4-bit unsigned quantization in Models/Transformer_DeepQuant) had no candidate and parsing exhausted backtracking. Add uint8 output to BasicQuantBindings and uint8 input to BasicDequantBindings in both Targets/Generic/Bindings.py and Targets/PULPOpen/Bindings.py. Verified: Models/Transformer_DeepQuant network gen now succeeds for both Generic and Siracusa platforms.
The Snitch FP32 GEMM/TransB-5000 build OOMs the GitHub-hosted runner
('std::bad_alloc' from the C compiler driver) when 4 pytest-xdist
workers compile in parallel. Two workers leave enough headroom on
the standard 7-GB runner.
(Pre-existing flake; surfaced as a hard fail in CI runs that happen
to land both heavy FP32 GEMM compilations on adjacent workers.)
Adds the NE16 neural engine as an accelerator Engine on top of the existing GAP9 platform, registered as a new composite platform GAP9_w_NE16 that mirrors the Siracusa_w_neureka pattern.

Added
- Deeploy/Targets/NE16/ — full Target: Platform.py (NE16Platform extends GAP9Platform, engines=[NE16Engine, GAP9ClusterEngine]), Engine.py, Bindings.py, Parsers.py, Tiler.py, Deployer.py (extends GAP9Deployer to reuse ClDma transformers), Templates/{Allocate,Conv}Template.py, TileConstraints/{NE16Pointwise,NE16Depthwise,NE16Dense}Constraint.py, TopologyOptimizationPasses/Passes.py (incl. _weightEncode ported from pulp-nnx/test/Ne16Weight.py).
- DeeployTest/deeployRunner_tiled_gap9_w_ne16.py and DeeployTest/test_gap9_ne16_tiled_config.py — new runner + minimal test config covering PW 1x1, DW 3x3, Dense 3x3 RQ Conv.
- TargetLibraries/GAP9/CMakeLists.txt — for the GAP9_w_NE16 platform, add_subdirectory on pulp-nnx with USE_NE16=ON and link it into deeploygap9.

Changed
- DeeployTest/testUtils/platformMapping.py — register GAP9_w_NE16 in the platforms list, mapPlatform, setupMemoryPlatform, and mapDeployer.
- DeeployTest/testMVP.py — include GAP9_w_NE16 in the EngineColoringDeployerWrapper branch (without it, NE16AdjustWeightMemoryLayoutPass never fires and parsing backtracks).
- DeeployTest/testUtils/core/execution.py — build the GAP9 SDK image target for GAP9_w_NE16 too (so chip.soc.mram.bin is produced before gvsoc run).
- CMakeLists.txt, DeeployTest/CMakeLists.txt — accept GAP9_w_NE16 alongside GAP9 in the platform branches.
- Deeploy/Targets/NE16/Templates/ConvTemplate.py — swap Neureka-inherited subtile constants for NE16 spec per ne16_task_defs.h: CIN_SUBTILE 32/28 → 16 (single mode), output 6 → 3, per-(cout, cinMajor) weight bytes PW 16, DW/Dense 144.

Fixed
- Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py — work around a pre-existing ImportError: cannot import name 'float32_tPtr' from 'Deeploy.AbstractDataTypes' by defining it locally via PointerClass(float32_t).

Test plan
Ran on gvsoc gap9.evk (inside ghcr.io/pulp-platform/deeploy-gap9:devel), full pipeline gen -> parse -> lower -> codegen -> CMake -> build -> gapy image -> gvsoc flash run:
- Kernels/Integer/Conv/PW_2D_RQ/Regular_RQ
- Kernels/Integer/Conv/DW_2D_RQ (--enable-3x3)
- Kernels/Integer/Conv/Regular_2D_RQ (--enable-3x3)

Follow-up (out of scope for this PR): the PW Unsigned_RQ variant currently produces non-zero errors — likely unsigned weight-offset or out_type conf0 handling; tracked separately.

PR Merge Checklist
- Rebased on the latest devel commit and pointing to devel.
- The CHANGELOG.md file has been updated.