Snowflake is becoming the go-to platform for analytics, and many companies now want to take advantage of this fantastic cloud data warehouse platform, which has resolved many of the pain points the IT community has faced over the last 15 years.
In this blog, we will not discuss the advantages of Snowflake vs. Hadoop; rather, we will show how to migrate from an on-premise system to a cloud-based DWH.
PS: when we say Hadoop, we mainly mean the analytics part of it, since 80% of Hadoop projects are designed to host a data analytics platform.
There are different approaches you may consider before you start this journey:
- LIFT AND SHIFT
- LIFT, “IMPROVE” AND SHIFT
We will go through each of them in detail.
How to migrate Hadoop streaming projects (Spark Streaming & Kafka) to Snowflake:
Companies need a non-disruptive migration that simplifies and eases real-time ingestion, and Snowflake offers easy integration with Apache Kafka.
Customers using the Kafka publish/subscribe platform can push data directly into Snowflake tables using the Kafka connector with minimal effort.
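As a rough sketch, the connector is configured like any other Kafka Connect sink. The property keys below are from the Snowflake Kafka connector; the account URL, credentials, topic, and table names are placeholders you would replace with your own:

```properties
# Sketch of a Snowflake sink connector configuration (standalone mode).
# All values in angle brackets are placeholders, not real credentials.
name=snowflake-sink
connector.class=com.snowflake.kafka.connector.SnowflakeSinkConnector
tasks.max=1

# Topic(s) to ingest; one Snowflake table per topic is the simple path.
topics=clicks
snowflake.topic2table.map=clicks:CLICKS_RAW

# Connection details for your Snowflake account.
snowflake.url.name=<account>.snowflakecomputing.com
snowflake.user.name=<kafka_user>
snowflake.private.key=<private_key>
snowflake.database.name=<database>
snowflake.schema.name=<schema>

# How often buffered records are flushed into Snowflake.
buffer.count.records=10000
buffer.flush.time=60
```

Exact buffering defaults and authentication options vary by connector version, so check the connector documentation for your release.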
A simple path is one table per topic, with each message producing one row. The table contains two semi-structured columns (of type VARIANT in Snowflake): RECORD_METADATA and RECORD_CONTENT. The content can be either JSON or Avro.
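To make the row shape concrete, here is a small Python sketch of how one Kafka message maps to the two VARIANT columns. The function and field names inside RECORD_METADATA are illustrative, loosely mirroring what the connector records, not the connector's actual code:

```python
import json

def to_snowflake_row(topic, partition, offset, key, value, timestamp):
    """Illustrative mapping of one Kafka message to one table row
    with the two VARIANT columns the connector populates."""
    return {
        "RECORD_METADATA": {
            # Provenance of the message within Kafka.
            "topic": topic,
            "partition": partition,
            "offset": offset,
            "key": key,
            "CreateTime": timestamp,
        },
        # The message payload itself, here assumed to be JSON.
        "RECORD_CONTENT": json.loads(value),
    }

row = to_snowflake_row(
    "clicks", 0, 42, "user-7",
    '{"page": "/home", "duration_ms": 130}', 1700000000000,
)
```

One row per message, two columns per row: everything else about the payload stays schema-on-read inside RECORD_CONTENT.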
Tables can be created automatically by the Kafka connector; if the user pre-defines a table instead, any columns other than those two must be nullable.
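A pre-created target table might look like the following. The table name and the extra column are hypothetical; only the two VARIANT columns are mandated by the connector:

```sql
-- Pre-defined target table for the Kafka connector.
-- Any column beyond the two VARIANT columns must be nullable.
CREATE TABLE CLICKS_RAW (
    RECORD_METADATA VARIANT,
    RECORD_CONTENT  VARIANT,
    LOAD_NOTE       STRING      -- extra, nullable column (illustrative)
);

-- Once loaded, JSON fields can be queried straight out of the VARIANT:
SELECT RECORD_CONTENT:page::STRING AS page,
       RECORD_METADATA:offset::NUMBER AS kafka_offset
FROM CLICKS_RAW;
```

The colon path syntax (`RECORD_CONTENT:page`) plus a `::` cast is the standard Snowflake way to pull typed values out of semi-structured data, so downstream consumers do not need to know the payload schema up front.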
The diagram below illustrates the flow post-migration.