Category Archives: Data Systems

Apache Spark Presentation

I’ve published online the presentation on Apache Spark that I prepared for an introductory lecture to graduate students at Maastricht University. If you are interested, please take a look at the presentation here.

Posted in Big Data, Data Engineering, Data Systems

My articles for Sonra Intelligence

Apache Airflow
Using Apache Airflow to build reusable ETL on AWS Redshift
Apache Kafka + Spark Streaming + Redshift
Streaming Tweets to Snowflake Data Warehouse with Spark Structured Streaming and Kafka
Advanced Spark Structured Streaming – Aggregations, Joins, Checkpointing
Snowflake … Continue reading

Posted in Big Data, Data Engineering, Data Systems, Data Warehousing

Loading Data into Snowflake Data Warehouse and performance of joins

I wrote a detailed article showing how to load 6GB of data into Snowflake using the PUT and COPY INTO commands. I then evaluated the performance of joins and how caching and instance size affect it. You find the full … Continue reading

Posted in Data Engineering, Data Systems, Data Warehousing

Caching in Snowflake Data Warehouse

I wrote a technical article covering how Snowflake uses caching at several layers: data cached by virtual warehouses and cached result sets. In the article I also explain how this works and what the benefits of caching are. You can read … Continue reading

Posted in Data Engineering, Data Systems, Data Warehousing

My favorite features of Snowflake Data Warehouse

I wrote a blog post describing my 10 favorite features of Snowflake. You can find the full blog post here. A small preview:

Posted in Data Systems, Data Warehousing