Conversation
andrewdalpino
left a comment
Hey, this is great @Boorinio! Before I dive into the review, can I ask where you obtained the formula/calculation of the gradient for the 1D Conv? Is there a reference?
It's great that you've also included pooling layers. I'm going to change the title of the PR to reflect the additional classes.
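For reference, the standard 1D conv backprop identities in question (the input gradient is the full convolution of the upstream gradient with the kernel; the weight gradient is the valid cross-correlation of the input with the upstream gradient) can be checked numerically. A small NumPy sketch, illustrative only and not the PR's PHP code:

```python
import numpy as np

# Forward: 1D "valid" cross-correlation y[t] = sum_k x[t+k] * w[k]
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([0.5, -1.0, 2.0])
y = np.array([x[t:t + 3] @ w for t in range(len(x) - 3 + 1)])

g = np.array([1.0, -2.0, 3.0])  # upstream gradient dL/dy

# dL/dx: full convolution of g with w (np.convolve flips its 2nd argument,
# which is exactly the flip the backward pass requires).
dx = np.convolve(g, w)

# dL/dw: valid cross-correlation of x with g.
dw = np.array([x[k:k + len(g)] @ g for k in range(3)])

# Brute-force accumulation straight from the chain rule, for comparison.
dx_ref = np.zeros_like(x)
dw_ref = np.zeros_like(w)
for t in range(len(y)):
    for k in range(3):
        dx_ref[t + k] += g[t] * w[k]
        dw_ref[k] += g[t] * x[t + k]

assert np.allclose(dx, dx_ref) and np.allclose(dw, dw_ref)
```

The brute-force loop is just the chain rule applied position by position, so agreement with the closed-form expressions confirms the flip-and-convolve identity.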
Pull request overview
Adds new 1D sequence-processing layers to the NeuralNet stack (convolution + pooling) along with PHPUnit coverage to exercise initialization and basic forward/back/infer flows.
Changes:
- Introduce `Conv1D` parametric hidden layer with stride/padding/bias/L2 support.
- Introduce `AvgPool1D` and `MaxPool1D` hidden layers for 1D downsampling.
- Add initial PHPUnit tests for the new layers.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/NeuralNet/Layers/Conv1D.php | Implements 1D convolution forward/infer/backprop and parameter management. |
| src/NeuralNet/Layers/AvgPool1D.php | Implements 1D average pooling forward/infer/backprop. |
| src/NeuralNet/Layers/MaxPool1D.php | Implements 1D max pooling forward/infer/backprop with argmax tracking. |
| tests/NeuralNet/Layers/Conv1DTest.php | Adds shape/exception-path coverage for Conv1D. |
| tests/NeuralNet/Layers/AvgPool1DTest.php | Adds shape/exception-path coverage for AvgPool1D. |
| tests/NeuralNet/Layers/MaxPool1DTest.php | Adds shape/exception-path coverage for MaxPool1D. |
On:

```php
$batchSize = (int) ($input->n() / $this->inputLength);
```

`$batchSize` is derived via integer division of `$input->n()` by `inputLength` without validating divisibility. If `n` is not an exact multiple of `inputLength`, the layer will silently truncate the batch and then read undefined offsets. Consider throwing a `RuntimeException` when `($input->n() % $this->inputLength) !== 0`.

Suggested change:

```php
$totalColumns = $input->n();

if ($this->inputLength === 0 || $totalColumns % $this->inputLength !== 0) {
    throw new RuntimeException(
        'Input length mismatch: the number of columns (' . $totalColumns
            . ') must be an exact multiple of input length (' . $this->inputLength . ').'
    );
}

$batchSize = (int) ($totalColumns / $this->inputLength);
```
On:

```php
$batchSize = (int) ($input->n() / $this->inputLength);
```

`$batchSize` is computed using integer division of `$input->n()` by `inputLength` without checking the remainder. If the input length does not evenly divide `n`, pooling will read undefined offsets. Consider validating `($input->n() % $this->inputLength) === 0` and throwing a `RuntimeException` when it is not.

Suggested change:

```php
$sequenceLength = $input->n();

if ($sequenceLength % $this->inputLength !== 0) {
    throw new RuntimeException('Input length mismatch:'
        . " expected a multiple of {$this->inputLength}, got {$sequenceLength}.");
}

$batchSize = intdiv($sequenceLength, $this->inputLength);
```
On:

```php
$flippedKernel = [];

foreach ($kernelArray as $f => $row) {
    $flippedKernel[$f] = array_reverse($row);
```

`computeInputGradient()` flips each filter row using `array_reverse($row)`. Since the kernel row is laid out as concatenated per-channel segments, reversing the entire row also reverses channel order, producing incorrect input gradients when `inputChannels > 1`. Flip only along the temporal axis within each channel segment (keeping channel blocks in place) when building the flipped kernel.

Suggested change:

```php
// Flip only along the temporal axis within each channel segment,
// keeping channel blocks in place.
$flippedRow = $row;

for ($c = 0; $c < $this->inputChannels; ++$c) {
    $base = $c * $this->kernelSize;

    for ($k = 0; $k < $this->kernelSize; ++$k) {
        $flippedRow[$base + $k] = $row[$base + ($this->kernelSize - 1 - $k)];
    }
}

$flippedKernel[$f] = $flippedRow;
```
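To see concretely why the full reverse is wrong, here is a tiny sketch (Python for brevity) of a row laid out as concatenated per-channel segments with `kernelSize = 3` and two channels:

```python
kernel_size, channels = 3, 2

# Row layout: [ch0_t0, ch0_t1, ch0_t2, ch1_t0, ch1_t1, ch1_t2]
row = [1, 2, 3, 10, 20, 30]

# A naive full reverse also swaps the channel blocks:
assert row[::-1] == [30, 20, 10, 3, 2, 1]

# Per-segment flip reverses time but keeps each channel block in place:
flipped = []
for c in range(channels):
    seg = row[c * kernel_size:(c + 1) * kernel_size]
    flipped += seg[::-1]

assert flipped == [3, 2, 1, 30, 20, 10]
```

With a single channel the two flips coincide, which is why the bug only shows up when `inputChannels > 1`.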
On:

```php
$biasArray = $biases->param()->asArray();
$outputArray = $output->asArray();

$biased = [];

foreach ($outputArray as $f => $row) {
    $bias = $biasArray[$f];
    $biasedRow = [];

    foreach ($row as $value) {
        $biasedRow[] = $value + $bias;
    }

    $biased[] = $biasedRow;
}

return Matrix::quick($biased);
```

`addBias()` manually loops over all output values and also takes an unused `$batchSize` parameter. Since other layers (e.g. `Dense`) rely on Tensor broadcasting for `Matrix::add(Vector)`, consider replacing this with a direct add of the bias vector and dropping the unused parameter to simplify and improve performance.

Suggested change:

```php
// Use Tensor broadcasting: add the bias vector across the filter dimension.
return $output->add($biases->param());
```
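For intuition, the per-filter bias add via broadcasting, sketched in NumPy (the exact Tensor call in PHP depends on whether the bias is stored as a row or column vector, which I haven't verified here):

```python
import numpy as np

# output: (filters, positions) matrix; one bias per filter row.
output = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
biases = np.array([10.0, 20.0])

# Manual double loop, i.e. what addBias() does today:
manual = np.array([[v + biases[f] for v in row]
                   for f, row in enumerate(output)])

# Broadcasting: reshape the biases to a column so each one is added
# across its entire filter row in a single vectorized operation.
broadcast = output + biases[:, None]

assert np.array_equal(manual, broadcast)
```

Both produce the same matrix; the broadcast form pushes the loop down into the library, which is the point of the suggestion.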
On:

```php
$batchSize = (int) ($input->n() / $this->inputLength);
```

`$batchSize` is computed using integer division of `$input->n()` by `inputLength` without checking the remainder. If the input length does not evenly divide `n`, pooling will read undefined offsets. Consider validating `($input->n() % $this->inputLength) === 0` and throwing a `RuntimeException` when it is not.

Suggested change:

```php
$inputN = $input->n();

if ($inputN % $this->inputLength !== 0) {
    throw new RuntimeException(
        'Input length mismatch: expected the number of timesteps '
            . "({$inputN}) to be evenly divisible by input length ({$this->inputLength})."
    );
}

$batchSize = (int) ($inputN / $this->inputLength);
```
On:

```php
$this->assertInstanceOf(Matrix::class, $infer);
$this->assertEquals(2, $infer->m());
$this->assertEquals(24, $infer->n());
```

The test only validates output/gradient shapes, not numerical correctness of the convolution (forward/infer) or backprop math. Other layer tests in this repo typically assert expected values; consider adding at least one small deterministic case (e.g., constant initializer / hand-computable kernel) with `assertEqualsWithDelta` for forward, gradient, and infer.

Suggested addition after `$this->assertEquals(24, $infer->n());`:

```php
// -----------------------------------------------------------------
// Deterministic numeric test for forward, back, and infer
// using a simple hand-computable configuration.
// -----------------------------------------------------------------
// Configuration:
//   - filters = 1
//   - kernelSize = 3
//   - inputLength = 3
//   - inputChannels = 1
//   - stride = 1
//   - padding = 0
//   - weights initialized to 1.0
//   - bias initialized to 0.0
//
// Input: [[1, 2, 3]] (1 channel, length 3, batch size 1)
// Expected forward/infer output: 1*1 + 1*2 + 1*3 = 6
// Expected gradient w.r.t. input for upstream gradient [[1]]: [[1, 1, 1]]
$deterministicLayer = new Conv1D(
    1, // filters
    3, // kernelSize
    3, // inputLength
    1, // inputChannels
    1, // stride
    0, // padding
    0.0,
    true,
    new Constant(1.0), // weight initializer
    new Constant(0.0) // bias initializer
);

$deterministicLayer->initialize(3);

// Single-batch input of length 3 on one channel.
$deterministicInput = Matrix::quick([
    [1.0, 2.0, 3.0],
]);

$forwardDet = $deterministicLayer->forward($deterministicInput);

$this->assertInstanceOf(Matrix::class, $forwardDet);

$forwardArray = $forwardDet->asArray();

$this->assertEquals(1, $forwardDet->m());
$this->assertEquals(1, $forwardDet->n());
$this->assertEqualsWithDelta(6.0, $forwardArray[0][0], 1e-8);

// Upstream gradient is 1 for the single output position.
$prevGradDet = Matrix::quick([
    [1.0],
]);

$gradientDet = $deterministicLayer->back($prevGradDet, $this->optimizer)->compute();

$this->assertInstanceOf(Matrix::class, $gradientDet);

$gradientArray = $gradientDet->asArray();

$this->assertEquals(1, $gradientDet->m());
$this->assertEquals(3, $gradientDet->n());

// Each input position should receive gradient 1.0.
$this->assertEqualsWithDelta(1.0, $gradientArray[0][0], 1e-8);
$this->assertEqualsWithDelta(1.0, $gradientArray[0][1], 1e-8);
$this->assertEqualsWithDelta(1.0, $gradientArray[0][2], 1e-8);

// Inference should produce the same numeric output as forward.
$inferDet = $deterministicLayer->infer($deterministicInput);

$this->assertInstanceOf(Matrix::class, $inferDet);

$inferArray = $inferDet->asArray();

$this->assertEquals(1, $inferDet->m());
$this->assertEquals(1, $inferDet->n());
$this->assertEqualsWithDelta(6.0, $inferArray[0][0], 1e-8);
```
On:

```php
public function initializeForwardBackInfer() : void
{
    $fanOut = $this->layer->initialize($this->inputChannels);

    // Width should be number of input channels
    $this->assertEquals($this->inputChannels, $this->layer->width());

    // Output length: floor((10 - 3) / 1) + 1 = 8
    $this->assertEquals(8, $this->layer->outputLength());

    // Fan out should be inputChannels * outputLength = 2 * 8 = 16
    $this->assertEquals(16, $fanOut);

    // Forward pass
    $forward = $this->layer->forward($this->input);

    $this->assertInstanceOf(Matrix::class, $forward);

    // Output shape: (inputChannels, outputLength * batchSize) = (2, 8 * 3) = (2, 24)
    $this->assertEquals($this->inputChannels, $forward->m());
    $this->assertEquals(8 * $this->batchSize, $forward->n());

    // Backward pass
    $outputLength = 8;
    $gradData = [];

    for ($c = 0; $c < $this->inputChannels; ++$c) {
        $row = [];

        for ($b = 0; $b < $this->batchSize; ++$b) {
            for ($t = 0; $t < $outputLength; ++$t) {
                $row[] = 0.01 * ($t + $c);
            }
        }

        $gradData[] = $row;
    }

    $this->prevGrad = new Deferred(function () use ($gradData) {
        return Matrix::quick($gradData);
    });

    $gradient = $this->layer->back($this->prevGrad, $this->optimizer)->compute();

    $this->assertInstanceOf(Matrix::class, $gradient);

    // Gradient shape should match input shape
    $this->assertEquals($this->inputChannels, $gradient->m());
    $this->assertEquals($this->inputLength * $this->batchSize, $gradient->n());

    // Inference pass
    $infer = $this->layer->infer($this->input);

    $this->assertInstanceOf(Matrix::class, $infer);
    $this->assertEquals($this->inputChannels, $infer->m());
    $this->assertEquals(8 * $this->batchSize, $infer->n());
}
```

The test currently asserts only output/gradient shapes. To better protect against math regressions, consider adding assertions for actual pooled values and backpropagated gradients on a small deterministic input (similar to the existing Dense/Dropout layer tests).
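As a starting point, the hand-computable numbers for such a deterministic case are easy to derive. For example, average pooling with pool size 2 and stride 2 (a NumPy sketch of the expected math, not repository code):

```python
import numpy as np

# AvgPool1D, pool size 2, stride 2, on a length-4 sequence.
x = np.array([1.0, 3.0, 2.0, 8.0])
pool = 2

# Forward: mean of each non-overlapping window.
y = x.reshape(-1, pool).mean(axis=1)

# Backward: each input position in a window receives grad / pool_size.
g = np.array([1.0, 10.0])        # upstream dL/dy
dx = np.repeat(g / pool, pool)   # spread each window's gradient evenly

assert np.allclose(y, [2.0, 5.0])
assert np.allclose(dx, [0.5, 0.5, 5.0, 5.0])
```

These are exactly the values a PHPUnit test could assert with `assertEqualsWithDelta` against the layer's `forward()` and `back()` outputs.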
On the same `initializeForwardBackInfer()` test method, repeated verbatim in the other pooling layer's test file:

The test currently asserts only output/gradient shapes. To better protect against math regressions, consider adding assertions for actual pooled values and backpropagated gradients on a small deterministic input (similar to the existing Dense/Dropout layer tests).
On:

```php
public function negativeStride() : void
{
    $this->expectException(\Rubix\ML\Exceptions\InvalidArgumentException::class);

    new Conv1D(
        2, // filters
        3, // kernelSize
        10, // inputLength
        1, // inputChannels
        0, // stride (invalid)
        0 // padding
    );
```

The test name `negativeStride()` is misleading because it passes 0 for stride (the failure case here is a non-positive stride, not a negative one). Consider renaming the test (e.g. to `nonPositiveStride` or `zeroStride`) to match what it validates.
Added numerical checks for the layers, since Copilot's review makes sense. Reference: https://www.michaelpiseno.com/pages/conv-backprop.html, and a light context read: https://cs231n.stanford.edu/slides/2025/lecture_4.pdf :P
andrewdalpino
left a comment
Hi @Boorinio, I took a quick look and noticed that most, if not all, of the computation is being done with PHP arrays instead of Tensors. Let's use Tensor operations.
On:

```php
$scale = 1.0 / $this->poolSize;

foreach ($inputArray as $channel) {
```

Let's leverage the existing functionality in the underlying tensor library to compute the 1D convolution:
https://github.com/RubixML/Tensor/blob/master/src/Vector.php#L530
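The decomposition this suggestion relies on: a multi-channel 1D convolution is a per-channel sum of single-channel convolutions, so each channel can be delegated to the library's vector convolve primitive. A NumPy sketch of the identity (`np.convolve` stands in for Tensor's `Vector::convolve`; whether that method uses the same kernel-flipping convention is an assumption I haven't verified):

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0, 4.0],   # channel 0 of the input sequence
              [5.0, 6.0, 7.0, 8.0]])  # channel 1
w = np.array([[1.0, -1.0],            # filter weights for channel 0
              [2.0, 0.5]])            # filter weights for channel 1

# Per-channel: np.convolve flips its second argument, so pass the kernel
# reversed to get cross-correlation, then sum the channel contributions.
y = sum(np.convolve(x[c], w[c][::-1], mode='valid') for c in range(2))

# Brute-force multi-channel cross-correlation for comparison.
ref = np.array([sum(x[c, t + k] * w[c, k]
                    for c in range(2) for k in range(2))
                for t in range(3)])

assert np.allclose(y, ref)
```

If the identity holds for the library's primitive too, the inner temporal loops in `Conv1D` collapse into one library call per (filter, channel) pair.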
Are we sure we want to do that? Wouldn't it create a bit of overhead to construct a Tensor inside the loop?
On:

```php
$output = [];
$maxIndices = [];

foreach ($inputArray as $c => $channel) {
```

Again, let's use the underlying tensor library for the max() operation if we can.
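For max pooling specifically, the argmax bookkeeping that `back()` needs looks like this as a NumPy sketch (illustrative only; the PHP version would track the same winning indices whether the max comes from a loop or a library call):

```python
import numpy as np

# MaxPool1D, pool size 2, stride 2: forward records which index won each
# window so the backward pass can route the gradient to that position only.
x = np.array([1.0, 3.0, 2.0, 8.0])
pool = 2

windows = x.reshape(-1, pool)
arg = windows.argmax(axis=1)   # winning offset within each window
y = windows.max(axis=1)        # pooled output

g = np.array([1.0, 10.0])      # upstream dL/dy
dx = np.zeros_like(x)
dx[np.arange(len(arg)) * pool + arg] = g  # only argmax positions get grad

assert np.allclose(y, [3.0, 8.0])
assert np.allclose(dx, [0.0, 1.0, 0.0, 10.0])
```

All non-winning positions receive zero gradient, which is why the argmax indices must be saved during forward rather than recomputed from the pooled values.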
Hey @Boorinio, I've been trying to think of some use-cases for these layers in the context of Rubix ML. Did you have any particular use-cases in mind? Reminder that we're limited to tabular datasets in ML 2.0.
Hey, sorry for the late reply. Yeah, I currently use it in two places: one is to train a model to detect skin lesions (during training we randomly augment the images and also add certain features like hair), and the second model tracks growth. Both models saw increased accuracy over this. On another note, would you accept PRs to change that? Maybe change the input to a shape class so that in the future we can proceed with multidimensional inputs?
Adds Conv1D, Max/Avg pooling layers :D