Using Spark Structured Streaming to upsert Kafka messages into a database

I wrote a detailed technical blog post demonstrating how to integrate Spark Structured Streaming with Apache Kafka and Snowflake.

An overview of the content is:

  • querying the Twitter API for real-time tweets
  • setting up a Kafka server
  • producing messages with Kafka
  • consuming and parsing Kafka messages with Spark Structured Streaming
  • explanation of the streaming model of Spark Structured Streaming
  • upserting latest data to Snowflake
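
The upsert step above can be sketched in plain Python. This is a minimal illustration of the idea, not the post's actual code: for each key, keep only the most recent record, which is what a Snowflake MERGE driven by Spark's streaming output achieves. The record fields `id`, `ts`, and `text` are illustrative assumptions, not the post's real schema.

```python
def upsert_latest(table, batch):
    """Merge a micro-batch of records into `table`, keeping the newest
    record per id (compared by the `ts` field)."""
    for record in batch:
        current = table.get(record["id"])
        if current is None or record["ts"] > current["ts"]:
            table[record["id"]] = record
    return table

# Example: two micro-batches, where the second updates tweet id 1.
table = {}
upsert_latest(table, [{"id": 1, "ts": 100, "text": "v1"},
                      {"id": 2, "ts": 101, "text": "hello"}])
upsert_latest(table, [{"id": 1, "ts": 200, "text": "v2 (edited)"}])
```

In the real pipeline the same logic runs as a MERGE statement against Snowflake, but the per-key "latest wins" semantics are the same.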

You can find the full blog post here.

A small preview:

[Screenshot: Screen Shot 2018-02-11 at 17.23.07.png]


About dorianbg

A Data Engineer based in London, United Kingdom
This entry was posted in Big Data, Data Engineering, Python. Bookmark the permalink.
