Conversation

Hi! This looks pretty cool!
Codecov Report ❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##           master    #4579      +/-   ##
==========================================
- Coverage   81.62%   78.74%    -2.89%
==========================================
  Files         358      334       -24
  Lines       85652    80602     -5050
==========================================
- Hits        69915    63470     -6445
- Misses     15737    17132     +1395
```
Is there anything that's currently stopping the PR from being merged?
This branch has a conflict with http.py. Would you be able to resolve it?
Pull request overview
Adds a new WebSocket dissector as a Scapy contrib layer and updates HTTP parsing to better handle WebSocket upgrade traffic (notably responses that omit the reason phrase), with accompanying pcaps and unit tests to exercise the new functionality.
Changes:
- Add `scapy.contrib.websocket`, a WebSocket layer with TCP stream reassembly support and optional permessage-deflate handling.
- Update HTTP request/response first-line dissection to tolerate missing components (e.g., omitted reason phrase).
- Add new test pcaps and a contrib `.uts` test suite for WebSocket parsing/building.
Reviewed changes
Copilot reviewed 3 out of 5 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| scapy/contrib/websocket.py | New WebSocket layer implementation (framing, masking, TCP reassembly, compression hooks). |
| scapy/layers/http.py | Adjust HTTP start-line parsing to allow missing reason phrase (WebSocket upgrade responses). |
| test/contrib/websocket.uts | New unit tests validating WebSocket frame build/dissect and TCPSession behavior using pcaps. |
| test/pcaps/websocket_compressed_session.pcap | Fixture pcap for TCPSession + permessage-deflate coverage. |
| test/pcaps/websocket_invalid_handshakes.pcap | Fixture pcap for invalid/partial upgrade sequences coverage. |
```python
else:
    return http_data

if not http_data.Upgrade.lower() == b"websocket":
```

In the SERVER_OPEN handshake branch, `http_data.Upgrade.lower()` is called without checking that `Upgrade` is present. For malformed/partial responses this will raise `AttributeError` and break TCP stream parsing. Mirror the request-side guard (`if not http_data.Upgrade or ...`):

```diff
-if not http_data.Upgrade.lower() == b"websocket":
+if not http_data.Upgrade or http_data.Upgrade.lower() != b"websocket":
```
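A minimal, self-contained sketch of why the guard matters, using a stand-in `Resp` object (illustrative, not Scapy's class):

```python
class Resp:
    """Stand-in for a dissected HTTP response; Upgrade may be absent."""
    Upgrade = None  # malformed/partial response: header missing

http_data = Resp()

# Unguarded check: calling .lower() on None raises AttributeError.
try:
    bad = http_data.Upgrade.lower() == b"websocket"
except AttributeError:
    bad = None

# Guarded check: short-circuits before .lower() is ever called.
is_websocket = bool(http_data.Upgrade) and http_data.Upgrade.lower() == b"websocket"

print(bad, is_websocket)  # None False
```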
```python
self.setfieldval('Http_Version', HTTPVersion)
self.setfieldval('Status_Code', Status)
self.setfieldval('Reason_Phrase', Reason)
version_status_reason = re.split(br"\s+", first_line, maxsplit=2) + [None]
```

For responses without a reason phrase, `Reason_Phrase` is set to `None`. Downstream code (e.g. summaries/pretty-printing) expects bytes-like values for HTTP header fields; using `None` can lead to odd representations (`"None"`) and makes round-tripping harder. Consider setting an explicit empty bytes value (or leaving the field unset) when the reason phrase is omitted:

```diff
-version_status_reason = re.split(br"\s+", first_line, maxsplit=2) + [None]
+version_status_reason = re.split(br"\s+", first_line, maxsplit=2) + [b""]
```
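A quick sketch of the split behaviour, assuming the same `re.split` approach as the patch (the function name is illustrative):

```python
import re

def parse_status_line(first_line: bytes):
    # Split into at most three parts; pad with b"" so a missing
    # reason phrase still yields a bytes value instead of an IndexError.
    parts = re.split(br"\s+", first_line, maxsplit=2) + [b""]
    version, status, reason = parts[0], parts[1], parts[2]
    return version, status, reason

print(parse_status_line(b"HTTP/1.1 101 Switching Protocols"))
print(parse_status_line(b"HTTP/1.1 101"))  # reason phrase omitted
```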
```python
if("permessage-deflate" in pkt.extensions):
    try:
        payloadData = pkt.decoder[0](payloadData + b"\x00\x00\xff\xff")
    except Exception:
        logging.debug("Failed to decompress payload", payloadData)
```

Decompression is attempted whenever `"permessage-deflate" in pkt.extensions`, regardless of whether the frame is actually compressed (RFC 6455 permessage-deflate uses RSV1 to indicate compression). This can silently corrupt payloads if the decoder happens to accept uncompressed data. Consider conditioning decompression on the RSV1 bit (e.g. `pkt.rsv`), and also handle `decoder` being unset (currently `pkt.decoder[0]` will fail):

```python
extensions = getattr(pkt, "extensions", None)
decoder = getattr(pkt, "decoder", None)
# RFC 6455 permessage-deflate uses RSV1 to indicate compression.
if (extensions
        and "permessage-deflate" in extensions
        and getattr(pkt, "rsv", 0) & 0x4
        and decoder
        and callable(decoder[0])):
    try:
        payloadData = decoder[0](payloadData + b"\x00\x00\xff\xff")
    except zlib.error:
        logging.debug("Failed to decompress payload: %r", payloadData)
```
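A self-contained sketch of the trailer trick behind the `b"\x00\x00\xff\xff"` suffix, assuming raw-deflate streams as used by permessage-deflate (function names are illustrative):

```python
import zlib

# permessage-deflate strips the trailing 4-byte empty-block marker
# (\x00\x00\xff\xff) from each message on the wire; the receiver
# appends it back before inflating. Raw deflate uses negative wbits.
def compress_message(data, comp):
    out = comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)
    return out[:-4]  # drop the 0x00 0x00 0xff 0xff trailer

def decompress_message(data, decomp):
    return decomp.decompress(data + b"\x00\x00\xff\xff")

comp = zlib.compressobj(wbits=-zlib.MAX_WBITS)
decomp = zlib.decompressobj(wbits=-zlib.MAX_WBITS)

wire = compress_message(b"hello websocket", comp)
print(decompress_message(wire, decomp))  # b'hello websocket'
```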
```python
# TODO: handle or log wrong accept key
pass

if b"Sec-WebSocket-Extensions" in http_data.Unknown_Headers:
```

`if b"Sec-WebSocket-Extensions" in http_data.Unknown_Headers:` assumes `Unknown_Headers` is a dict. When the response is malformed and `Unknown_Headers` is `None` (which is possible per the HTTP dissector), this will raise `TypeError`. Guard with `if http_data.Unknown_Headers and ...` before doing membership checks:

```diff
-if b"Sec-WebSocket-Extensions" in http_data.Unknown_Headers:
+if http_data.Unknown_Headers and b"Sec-WebSocket-Extensions" in http_data.Unknown_Headers:
```
```python
if "original" not in metadata:
    return

if "permessage-deflate" in session["extensions"]:
    is_server = True if metadata["original"][TCP].sport == session["server-port"] else False
    ws = WebSocket(bytes(data), extensions=session["extensions"], decoder = session["server-decoder"] if is_server else session["client-decoder"])
```

After the handshake, the code returns early when `"original"` is not in `metadata`. In TCPSession, subsequent parsing of leftover bytes (padding) calls `tcp_reassemble` with a cleared metadata dict, so this guard can prevent dissecting multiple WebSocket frames that arrive in the same TCP segment. Consider storing the needed direction info in `session` (or `metadata`) during the first call and allowing parsing even when `metadata["original"]` is missing:

```python
if "permessage-deflate" in session["extensions"]:
    # Determine direction (server vs client). When metadata["original"] is
    # not available (e.g., leftover bytes in the same TCP segment),
    # fall back to the last known direction stored in the session.
    if "original" in metadata:
        is_server = metadata["original"][TCP].sport == session["server-port"]
        session["last-direction-is-server"] = is_server
    else:
        is_server = session.get("last-direction-is-server", False)
    ws = WebSocket(
        bytes(data),
        extensions=session["extensions"],
        decoder=session["server-decoder"] if is_server else session["client-decoder"],
    )
```
```python
    self.size = 7
    return super().addfield(pkt, s, val)
elif val <= 0xFFFF:
    self.size = 7 + 16
    s, _, masked = s
    return s + struct.pack("!BH", 126 | masked, val)
elif val <= 0xFFFFFFFFFFFFFFFF:
    self.size = 7 + 64
    s, _, masked = s
    return s + struct.pack("!BQ", 127 | masked, val)
else:
    raise Scapy_Exception("%s: Payload length too large" %
                          self.__class__.__name__)
```

`PayloadLenField.addfield()` manually builds the second byte for extended payload lengths by unpacking the in-progress bitfield tuple (`s, _, masked = s`) and OR-ing `masked` into 126/127. This drops the partial bitfield state and also sets the mask bit in the wrong position (it should be 0x80, not 0x01), so masked frames with payload length > 125 will be serialized incorrectly. Consider relying on Scapy's bitfield packing for the mask/payloadLen byte and adding separate conditional fields for the 16/64-bit extended length:

```python
    # Simple 7-bit length; let BitFieldLenField handle packing.
    self.size = 7
    return super().addfield(pkt, s, val)
elif val <= 0xFFFF:
    # Extended 16-bit length. First pack the MASK + 7-bit length
    # indicator (126) using the standard bitfield machinery, then
    # append the 16-bit length to the raw bytes.
    self.size = 7
    s = super().addfield(pkt, s, 126)
    raw, bitstate = s
    raw += struct.pack("!H", val)
    return raw, bitstate
elif val <= 0xFFFFFFFFFFFFFFFF:
    # Extended 64-bit length. Same pattern as above, with indicator 127.
    self.size = 7
    s = super().addfield(pkt, s, 127)
    raw, bitstate = s
    raw += struct.pack("!Q", val)
    return raw, bitstate
else:
    raise Scapy_Exception("%s: Payload length too large" %
                          self.__class__.__name__)
```
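For reference, a standalone sketch of RFC 6455 payload-length encoding outside Scapy's field machinery (the MASK bit is the high bit, 0x80, of the second frame byte; indicators 126/127 select a 16/64-bit extended length). The function name is illustrative:

```python
import struct

def encode_len_byte(payload_len: int, masked: bool) -> bytes:
    """Build the second frame byte plus any extended length bytes."""
    mask_bit = 0x80 if masked else 0x00  # MASK is the high bit, not 0x01
    if payload_len <= 125:
        return struct.pack("!B", mask_bit | payload_len)
    elif payload_len <= 0xFFFF:
        return struct.pack("!BH", mask_bit | 126, payload_len)
    elif payload_len <= 0xFFFFFFFFFFFFFFFF:
        return struct.pack("!BQ", mask_bit | 127, payload_len)
    raise ValueError("Payload length too large")

print(encode_len_byte(5, masked=True).hex())       # 85
print(encode_len_byte(300, masked=True).hex())     # fe012c
print(encode_len_byte(70000, masked=False).hex())  # 7f0000000000011170
```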
```python
if pkt.mask:
    key = struct.pack("I", pkt.maskingKey)[::-1]
    data_int = int.from_bytes(payloadData, 'big')
    mask_repeated = key * (len(payloadData) // 4) + key[: len(payloadData) % 4]
    mask_int = int.from_bytes(mask_repeated, 'big')
    payloadData = (data_int ^ mask_int).to_bytes(len(payloadData), 'big')
```

`PayloadField` uses `struct.pack("I", pkt.maskingKey)[::-1]` to derive the masking key bytes. Using native-endian packing plus manual reversing is platform-dependent and hard to reason about. Prefer deterministic network-order packing (e.g. `!I`) or directly constructing the 4 key bytes to match the on-wire order used by `XNBytesField`.
```python
try:
    payloadData = pkt.decoder[0](payloadData + b"\x00\x00\xff\xff")
except Exception:
    logging.debug("Failed to decompress payload", payloadData)
```

The exception handler calls `logging.debug("Failed to decompress payload", payloadData)`. Passing `payloadData` as a second argument without a `%` placeholder will cause a `TypeError` in Python's logging (it tries to apply `%` formatting). Use a format placeholder (e.g. `%r`) or log with `exc_info=True` so decompression failures don't crash dissection:

```diff
-logging.debug("Failed to decompress payload", payloadData)
+logging.debug("Failed to decompress payload: %r", payloadData)
```
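A small sketch of the lazy `%`-style logging call, captured with a list-collecting handler (handler and logger names are illustrative):

```python
import logging

records = []

class ListHandler(logging.Handler):
    """Collect formatted messages so the demo can inspect them."""
    def emit(self, record):
        records.append(record.getMessage())

logger = logging.getLogger("ws-deflate-demo")  # illustrative logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(ListHandler())

payload = b"\xde\xad"
# The %r placeholder is filled in lazily, only when the record is emitted.
logger.debug("Failed to decompress payload: %r", payload)

print(records[0])
```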
```python
    PayloadField("wsPayload", lengthFrom="payloadLen")
]

def __init__(self, pkt=None, extensions=[], decoder=None, *args, **fields):
```

`WebSocket.__init__` uses a mutable default argument (`extensions=[]`). This can leak state between instances if the list is mutated. Use `None` as the default and create a new list/dict per instance:

```python
def __init__(self, pkt=None, extensions=None, decoder=None, *args, **fields):
    if extensions is None:
        extensions = []
```
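A minimal demonstration of the mutable-default pitfall, using standalone functions rather than the class itself:

```python
def add_ext_buggy(name, extensions=[]):  # one shared list for every call
    extensions.append(name)
    return extensions

def add_ext_fixed(name, extensions=None):  # fresh list per call
    if extensions is None:
        extensions = []
    extensions.append(name)
    return extensions

print(add_ext_buggy("permessage-deflate"))  # ['permessage-deflate']
print(add_ext_buggy("foo"))    # ['permessage-deflate', 'foo'] -- leaked state
print(add_ext_fixed("permessage-deflate"))  # ['permessage-deflate']
print(add_ext_fixed("foo"))    # ['foo'] -- independent
```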
```python
super().__init__(_pkt=pkt, *args, **fields)

def extract_padding(self, s):
    return '', s
```

`extract_padding()` returns `''` (a `str`) rather than `b''` (bytes). `Packet.extract_padding` is expected to return bytes, and returning a string can break dissection and TCP reassembly logic. Return `b""` for the payload portion:

```diff
-    return '', s
+    return b'', s
```
Adds support for WebSockets.
The HTTP dissector has been updated to support omitting the reason phrase, as this is commonly done when upgrading to a WebSocket connection.
Fixes #4578