Category Archives: Big Data

Apache Spark Presentation

I’ve published online the presentation on Apache Spark that I gave as an introductory lecture to graduate students at Maastricht University. If interested, please take a look at the presentation here.

Posted in Big Data, Data Engineering, Data Systems

My articles for Sonra Intelligence

Apache Airflow: Using Apache Airflow to build reusable ETL on AWS Redshift. Apache Kafka + Spark Streaming + Redshift: Streaming Tweets to Snowflake Data Warehouse with Spark Structured Streaming and Kafka; Advanced Spark Structured Streaming – Aggregations, Joins, Checkpointing. Snowflake … Continue reading

Posted in Big Data, Data Engineering, Data Warehousing, Data Systems

Using Spark Structured Streaming to upsert Kafka messages into a database

I wrote a detailed technical blog post demonstrating how to integrate Spark Structured Streaming with Apache Kafka and Snowflake. The post covers: querying the Twitter API for real-time tweets, setting up a Kafka server, producing messages … Continue reading

Posted in Big Data, Data Engineering, Python

Advanced Spark Structured Streaming – Aggregations, Joins, Checkpointing

I wrote a blog post demonstrating advanced Spark Structured Streaming topics. The post covers: setting up a Kafka server, producing messages with Kafka, consuming tweets with Spark Structured Streaming, watermarking messages, parsing JSON data, performing aggregation queries … Continue reading

Posted in Big Data, Data Engineering, Python

Clustering keys Snowflake

I’ve written a blog post covering in depth the new Clustering Keys feature of Snowflake. In the post I explain how clustering keys work internally in Snowflake’s shared-disk architecture and compare them to Redshift. I also … Continue reading

Posted in Big Data, Data Engineering

Writing UDAFs on Snowflake

I’ve written a blog post explaining in depth how to create User Defined Aggregate Functions (UDAFs) using JavaScript on Snowflake. You can find the blog post here. A small preview:

Posted in Big Data, Data Engineering

Apache Airflow for data pipelines and ETL management

I have been looking for good workflow management software and found Apache Airflow to be superior to the other solutions I tried. I’ve taken some time to write a fairly detailed blog post on using Airflow to develop ETL pipelines. Airflow is … Continue reading

Posted in Big Data, Data Engineering, Python