Conversation
|
There was a problem hiding this comment.
No issues found across 5 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Architecture diagram
sequenceDiagram
participant Client as External API Client
participant API as Fastify Server (v4)
participant Schema as Zod / OpenAPI Schemas
participant Handler as ElementInfo Route Handler
participant Action as Page Action Handler Wrapper
Note over Client,Action: NEW: Request Flow for /page/elementInfo
Client->>API: POST /v4/page/elementInfo
activate API
Note right of Client: Headers: Session-ID<br/>Body: Selector (XPath/CSS/Text)
API->>Schema: NEW: Validate Request Body
Schema-->>API: PageElementInfoRequest valid
API->>Handler: Dispatch to elementInfoRoute
activate Handler
Handler->>Action: NEW: createPageActionHandler()
activate Action
Action->>Handler: execute() (Internal)
Note over Handler: STUB: Currently returns<br/>placeholder values
Handler->>Schema: NEW: Parse Result (PageElementInfoResultSchema)
Schema-->>Handler: Validated Info (count, visibility, centroid, etc.)
Handler-->>Action: Return Result
deactivate Action
Action-->>API: Wrap in PageElementInfoResponse
deactivate Handler
API-->>Client: 200 OK (JSON Result)
deactivate API
Note over Client,API: Result includes: count, isVisible, isChecked, <br/>inputValue, textContent, innerHTML, innerText, centroid
|
I would also add:
|
eda8d77 to
03b7e08
Compare
| isChecked: z.boolean().default(false), | ||
| inputValue: z.string().default(""), |
There was a problem hiding this comment.
these are redundant with attrs below, I dont think we need to send both.
If you do want to break it out then I think it should just be value and value=1 or value=checked for checkboxes that are checked
There was a problem hiding this comment.
isCheckedneeds to look at thecheckedproperty on theHTMLInputElement, so it wouldn't show inattrsinputValuealso reads from thevalueproperty, so it also wouldn't show inattrs
| export const PageElementInfoResultSchema = z | ||
| .object({ | ||
| count: z.number().int().nonnegative().default(0), | ||
| isVisible: z.boolean().default(false), |
There was a problem hiding this comment.
| isVisible: z.boolean().default(false), | |
| isInViewport: z.boolean().default(false), // e.g. scrolled out of view | |
| isHidden: z.boolean().default(false), // e.g. display: hidden | none | |
| isOccluded: z.boolean().default(false), // e.g. elem is behind a modal, not clickable until modal is dismissed |
isVisible is misleadingly simple, we should split this up so consumers can actually determine what is needed in order to click an element (closing some occluding element, scrolling, un-hiding, etc. all require different steps)
There was a problem hiding this comment.
down to have multiple visibility related fields. should we have them all nested in an outer visibility object though?
| backendNodeId: z.number().int().optional(), | ||
| ariaRole: z.string().optional(), | ||
| ariaAttributes: z.record(z.string(), z.string()).optional(), | ||
| attrs: z.record(z.string(), z.string()).optional(), |
There was a problem hiding this comment.
| attrs: z.record(z.string(), z.string()).optional(), | |
| attributes: z.record(z.string(), z.string()), | |
| style: z.record(z.string(), z.unknown()), |
We should add computed CSS styles so that the agent can check for things like display: none, display: hidden, opacity: 0, color: green, etc.).
Also attributes should never be none, we should just return {} to avoid more unions. Try to match the naming and shapes that the DOM Element API uses to minimize confusion for agents and users: https://developer.mozilla.org/en-US/docs/Web/API/Element
style should be the same object shape as the DOM API: CSSStyleDeclaration
| .default({ x: 0, y: 0 }), | ||
| selector: ResultSelectorSchema.optional(), | ||
| elementType: z.string().optional(), | ||
| backendNodeId: z.number().int().optional(), |
There was a problem hiding this comment.
| backendNodeId: z.number().int().optional(), | |
| backendNodeId: z.bigint().positive(), |
I think this is always available, no? Also it should never be negative.
| isVisible: z.boolean().default(false), | ||
| isChecked: z.boolean().default(false), | ||
| inputValue: z.string().default(""), | ||
| textContent: z.string().default(""), |
There was a problem hiding this comment.
| textContent: z.string().default(""), |
This is redundant with selector.text and innerText I think
There was a problem hiding this comment.
innerText & textContent are different https://stackoverflow.com/questions/35213147/difference-between-textcontent-vs-innertext
| centroid: z | ||
| .object({ | ||
| x: z.number(), | ||
| y: z.number(), | ||
| }) | ||
| .strict() | ||
| .default({ x: 0, y: 0 }), |
There was a problem hiding this comment.
| centroid: z | |
| .object({ | |
| x: z.number(), | |
| y: z.number(), | |
| }) | |
| .strict() | |
| .default({ x: 0, y: 0 }), | |
| domrects: z.array(z | |
| .object({ | |
| x: z.number(), | |
| y: z.number(), | |
| top: z.number().optional(), | |
| bottom: z.number().optional(), | |
| right: z.number().optional(), | |
| left: z.number().optional(), | |
| }) | |
| .strict()) |
This should be an array of DOMRects matching the shape returned by elem.getClientRects(). I has to be an array because many inline elements like <span>, <p> etc. often wrap across multiple lines and have multiple bounding boxes. Centroid-based clicking is useless for most inline elements subject to wrapping unless you provide the full list of bounding rects.
There was a problem hiding this comment.
BU tried a bunch of really intricate heuristics to try and solve this problem because it was a big issue with links that wrap across lines. e.g. try 2px in from the upper left corner of the first bounding box, then try 4px in from the bottom right corner of the last bounding box, etc. based on various conditions
a naiive centroid click would land in the middle of the space between the two bounding boxex, which is actually outside both of them, so the click does nothing.
There was a problem hiding this comment.
yeah, centroid is a bit too leaky. can make it an array with multiple rects. do you think we need top, right, bottom, & left though? or should we just make it x, y, width, height ?
| .strict() | ||
| .default({ x: 0, y: 0 }), | ||
| selector: ResultSelectorSchema.optional(), | ||
| elementType: z.string().optional(), |
There was a problem hiding this comment.
| elementType: z.string().optional(), | |
| elementType: z.string(), |
|
|
||
| export const PageElementInfoResultSchema = z | ||
| .object({ | ||
| count: z.number().int().nonnegative().default(0), |
There was a problem hiding this comment.
If we dont return the whole list of matching elements here then we need to a separate endpoint to get the whole list, or at least a param like idx that we can pass to "get the second matching element".
There was a problem hiding this comment.
i think this route should only return details for a single element:
- if there are multiple matches, this route will just take the first match
- eg, if the user wants info on the third match, they must provide a selector that is specific enough to reach the third match
There was a problem hiding this comment.
thats really difficult to do if you dont know the nesting level at which they are peers, e.g.
/div/div/div to get the second match thee is no way to know which is needed:
/div/div/div[2]/div/div[2]/div/div[2]/div/div
There was a problem hiding this comment.
thats why the ElementSelectorSchema provides an idx param
| elementType: z.string().optional(), | ||
| backendNodeId: z.number().int().optional(), | ||
| ariaRole: z.string().optional(), | ||
| ariaAttributes: z.record(z.string(), z.string()).optional(), |
There was a problem hiding this comment.
| ariaAttributes: z.record(z.string(), z.string()).optional(), | |
| aria: z.record(z.string(), z.unknown()), |
Just return {} if empty, not undefined. Also all aria-related stuff should be under a single key imo, there are lots of attrs we might want to add in the future and it's easier if they're nested under one type instead of flat at the top level:
aria-autocompletearia-checkedaria-disabledaria-errormessagearia-expandedaria-haspopuparia-hiddenaria-invalidaria-labelaria-levelaria-modalaria-multilinearia-multiselectablearia-orientationaria-placeholderaria-pressedaria-readonlyaria-requiredaria-selectedaria-sortaria-valuemaxaria-valueminaria-valuenowaria-valuetext- and lots more...
https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/Reference/Attributes
| selector: ResultSelectorSchema.optional(), | ||
| elementType: z.string().optional(), | ||
| backendNodeId: z.number().int().optional(), | ||
| ariaRole: z.string().optional(), |
There was a problem hiding this comment.
| ariaRole: z.string().optional(), |
I think this should be under a single aria: {...} shape, see below.
pirate
left a comment
There was a problem hiding this comment.
Lots of small changes but in general I recommend just copying the native shape that DOM APIs provide. Lets not introduce novel concepts and names for things that have had stable shapes for 20yr+
why
/elementInforoute which encapsulates multiplelocatorfunctionswhat changed
/elementInforoute, which accepts aSelector, and returns the following info:countisVisibleisCheckedinputValuetextContentinnerHTMLinnerTextcentroidscreenshot:
test plan
/elementInforoute accepts a requestSummary by cubic
Adds the v4
page.elementInfoendpoint (POST /v4/page/elementInfo) to return common element details in one call. The handler is a stub that returns default values to unblock client work.page.elementInforoute that accepts aSelectorand returns: count, isVisible, isChecked, inputValue, textContent, innerHTML, innerText, centroid, plus optional selector, elementType, backendNodeId, ariaRole, ariaAttributes, attrs.Written for commit 03b7e08. Summary will update on new commits. Review in cubic