[FAQ] PyFlink job keeps restarting with JSON deserialization error #244
Course
data-engineering-zoomcamp
Question
Why does the PyFlink streaming job fail with a JSON deserialization error when consuming records from the Kafka/Redpanda topic?
Answer
The Green Taxi dataset contains NaN values in some numeric fields (for example, passenger_count). Python's json module serializes float('nan') as the bare token NaN, so those values end up in the payloads the producer sends to the Kafka topic. NaN is not valid JSON, so Flink's JSON deserializer rejects the record, the Kafka source fails, and the job keeps restarting.
Fix: clean the dataset before producing events, replacing each NaN with null (None in Python) or a valid default number before serialization.
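A minimal sketch of that cleanup step, assuming the producer builds plain dicts (the field names here are illustrative, taken from the Green Taxi schema):

```python
import json
import math

def clean_record(record: dict) -> dict:
    # Replace NaN floats with None so json.dumps emits a valid JSON null
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }

# Hypothetical row with a missing passenger_count
row = {"lpep_pickup_datetime": "2019-01-01 00:18:05", "passenger_count": float("nan")}

bad = json.dumps(row)                   # contains the bare token NaN -- not valid JSON
good = json.dumps(clean_record(row))    # contains null -- parses cleanly downstream
```

If you load the data with pandas first, `df.where(df.notna(), None)` (or filling with a default via `df.fillna(...)`) achieves the same thing before rows are serialized.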
Checklist
- I have searched existing FAQs and this question is not already answered
- The answer provides accurate, helpful information
- I have included any relevant code examples or links