Skip to content

Return a 1-D vector from torch.diagonal converter#2682

Open
john-rocky wants to merge 2 commits into
apple:mainfrom
john-rocky:fix-diagonal-vector-output
Open

Return a 1-D vector from torch.diagonal converter#2682
john-rocky wants to merge 2 commits into
apple:mainfrom
john-rocky:fix-diagonal-vector-output

Conversation

@john-rocky
Copy link
Copy Markdown
Contributor

Summary

  • torch.diagonal(input, offset, dim1, dim2) returns the requested diagonal as a 1-D tensor (for 2-D input), but the converter used mb.band_part, which only zeros the off-diagonal entries and returns the same-shape matrix. As a result torch.diagonal(x) for a 5x5 matrix produced a 5x5 result instead of a length-5 vector.
  • Extract the diagonal by flattening the input and gathering elements at strides of m + 1, mirroring NumPy's row-major diagonal indexing.
  • Support offset and the (dim1, dim2) == (1, 0) transpose case in addition to the default (0, 1). Higher-rank input still raises NotImplementedError, matching the pre-existing scope.

Test plan

  • pytest test_torch_ops.py::TestDiagonal — 120 passed, 120 skipped (dim1==dim2 / empty-diagonal cases).
  • End-to-end mlpackage.predict matched the PyTorch reference on shapes {(5,5), (3,4), (4,3)} × offsets {-2,-1,0,1,2} × (dim1, dim2) ∈ {(0,1), (1,0)}.

Fixes #2565.

`torch.diagonal(input, offset, dim1, dim2)` returns the requested
diagonal as a 1-D tensor (for 2-D input), but the converter used
`mb.band_part`, which only zeros the off-diagonal entries and returns
a same-shape matrix. As a result, `torch.diagonal(x)` for a 5x5 matrix
produced a 5x5 result instead of a length-5 vector.

Extract the diagonal by flattening the input and gathering the
elements at strides of `m + 1`, mirroring NumPy's row-major diagonal
indexing. Support `offset` and the `(dim1, dim2) == (1, 0)` transpose
case in addition to the default. Higher-rank input still raises
`NotImplementedError`, matching the pre-existing scope.

Verified end-to-end against PyTorch reference for shapes
{(5,5), (3,4), (4,3)}, offsets {-2,-1,0,1,2}, and dim swaps.

Fixes apple#2565.
@john-rocky john-rocky force-pushed the fix-diagonal-vector-output branch from e08d13d to 1ae5717 Compare May 6, 2026 00:28
raise NotImplementedError("diagonal requires a statically-shaped input")

if dim1 < 0:
dim1 += x.rank
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the logic here is correct. It also looks like you don't have any unit test instances for when dim1 and dim2 are less than zero. Please add some.

@TobyRoseman
Copy link
Copy Markdown
Collaborator

@john-rocky
Copy link
Copy Markdown
Contributor Author

Thanks for the review — addressed both points in 93010490.

Negative-dim tests added: the parametrize for dim1/dim2 is now [-2, -1, 0, 1] (was [0, 1]), with the "same axis" skip rewritten as d1 % 2 == d2 % 2 so (-1, 1), (0, -2) etc. are filtered out. Effective-offset calc in the test also resolves negatives before checking the (1, 0) swap. End-to-end on a 3×4 input I sanity-checked off/dim combos including (off=-1, dim1=-1, dim2=-2) and (off=1, dim1=1, dim2=0) — MIL graph matches PyTorch (reshape → gather, correct output shape and values).

On the logic concern (line 8381): the section is doing three things and I think the worry might be about the offset flip — happy to push a comment-only commit if anything below still reads ambiguously.

  1. Negative-dim resolution (dim1 += x.rank): since we already rejected anything but x.rank == 2 two lines up, this is just -1 → 1, -2 → 0. After this both dims are in {0, 1}.

  2. (dim1, dim2) == (1, 0) ⇒ flip offset: torch.diagonal(A, k, dim1=1, dim2=0) is the same as torch.diagonal(A.T, k, dim1=0, dim2=1). For an (n, m) matrix A and offset k ≥ 0, the diagonal of A.T at +k reads A.T[i][i+k] = A[i+k][i], which equals the diagonal of A at offset -k. Same identity for k < 0. So we can keep n, m = x.shape (original, not transposed) and just negate offset.

  3. Index formula on the flattened tensor: for row-major shape (n, m), element A[r][c] is at flat index r*m + c. The diagonal at offset k starts at (0, k) for k ≥ 0 (flat index k) and at (-k, 0) for k < 0 (flat index -k*m), then steps m + 1 each time. That's the start + i * (m + 1) you see.

I cross-checked this against torch.diagonal for all 280 combinations of shape ∈ {(5,5), (3,4), (4,3), (2,3), (3,2), (1,5), (5,1)}, offset ∈ {-3..3}, (dim1, dim2) over {0, 1, -1, -2}-valid pairs — zero mismatches (excluding the empty-diagonal cases, where the PR raises ValueError instead of returning an empty tensor; happy to switch that to a mb.reshape(x=..., shape=[0]) if matching PyTorch's empty-result semantics matters).

Re-running CI: https://gitlab.com/coremltools1/coremltools/-/pipelines/2534725082 covered the previous diff — pushed commit will need a fresh run.

@TobyRoseman
Copy link
Copy Markdown
Collaborator

Thanks for the additional tests.

Updated CI: https://gitlab.com/coremltools1/coremltools/-/pipelines/2538806799

If CI passes, this change should be good to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

coremltools gives incorrect output for torch.ops.aten.diagonal.default

2 participants