We’re excited to share a new connector in Popsink: Source Connector for Snowflake. This new connector enables Change Data Capture (CDC) directly from Snowflake as a data source. For data teams, this means automated, continuous and consistent data replication from Snowflake.
Dec 25, 2024
We’re thrilled to announce Popsink’s latest connector: the Popsink Source Connector for Snowflake. This connector gives you Change Data Capture (CDC) capabilities directly on Snowflake tables, allowing for real-time, accurate, and reliable data synchronization from Snowflake to all your tools.
The connector is ideal for syncing data across systems and enriching your existing tools with the latest data. For example, it can keep your CRM up to date with insights pulled directly from Snowflake, enabling more accurate customer interactions. It also simplifies syncing analytical data with ERP systems, keeping decision-making tools aligned with real-time data. Likewise, operational systems such as order management or customer service platforms always have access to the most up-to-date data. Because changes are captured and delivered directly from Snowflake, you no longer depend on batch processing, which smooths workflows and reduces delays in critical decision-making. The result is data integration that is lightweight, efficient, and ready for real-time business needs.
Popsink uses CDC to continuously capture changes in Snowflake and replicate them incrementally to your consumer systems. From a data operations perspective, this brings several advantages over traditional copy-and-paste mechanisms:
1. Guaranteed Data Consistency: each change is captured and tracked individually, which prevents data loss at the source and allows data consistency to be guaranteed in consumer systems.
2. Fully Automated Process: because CDC is continuous, changes are propagated as they happen, so there is no need to orchestrate or trigger updates. When data changes in Snowflake, it is automatically updated everywhere.
3. Resource Efficiency: by capturing only changes, Popsink’s Snowflake connector replicates data incrementally, and because it runs continuously, it spreads resource usage over time. This minimizes resource consumption and translates into cost efficiency.
4. Low Latency: continuous replication also keeps latency low, so consumer systems stay closely in sync with the source and data discrepancies between them are kept to a minimum.
Popsink’s Source Connector for Snowflake leverages Snowflake’s internal mechanisms to consistently track, parse, and replicate data changes in Snowflake tables. Snowflake tables can be configured to track change operations, which can then be consumed via Streams. Streams capture deltas over time, allowing data teams to extract only the changes made since the last checkpoint. Thanks to Snowflake’s zero-copy cloning and efficient metadata management, CDC has minimal impact on performance. This change data can then be ingested into downstream systems without complex transformations, ensuring schema consistency and reliability.
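To make the checkpoint idea concrete, here is a toy Python model of how a stream relates to a change-tracked table. It is neither Snowflake's implementation nor Popsink's: `ChangeLog` and `ToyStream` are hypothetical names. Unlike a real stream, which returns the net change per row and advances its offset when a DML statement that consumes it commits, this model returns every recorded change and advances only when `commit()` is called explicitly.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ChangeLog:
    """Toy stand-in for a change-tracked table: an append-only list of row changes."""
    changes: list[dict[str, Any]] = field(default_factory=list)

    def record(self, action: str, row_id: str, **values: Any) -> None:
        self.changes.append({"action": action, "row_id": row_id, **values})


@dataclass
class ToyStream:
    """Toy stand-in for a stream: it stores only an offset (checkpoint) into the log."""
    log: ChangeLog
    offset: int = 0

    def pending(self) -> list[dict[str, Any]]:
        # Reading has no side effects: the same changes come back until committed.
        return self.log.changes[self.offset:]

    def commit(self, consumed: int) -> None:
        # Advance the checkpoint only once the consumer has applied the batch.
        self.offset += consumed


if __name__ == "__main__":
    log = ChangeLog()
    stream = ToyStream(log)
    log.record("INSERT", "r1", name="Ada")
    log.record("INSERT", "r2", name="Grace")
    batch = stream.pending()          # both inserts
    stream.commit(len(batch))         # checkpoint after applying them downstream
    log.record("DELETE", "r1")
    print(stream.pending())           # [{'action': 'DELETE', 'row_id': 'r1'}]
```

The stream itself holds no copy of the data, only a position in the table's change history, which is why reading "only the changes since the last checkpoint" is cheap.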
Popsink’s connector takes CDC to the next level by simplifying and automating the entire process. Pipelines can be set up in minutes with minimal configuration, and schema changes are handled automatically, reducing maintenance overhead. By integrating directly with Snowflake Streams, Popsink automates the capture and processing of data changes, ensuring consistent, low-latency updates to your target systems. This lightweight approach makes it an efficient solution for modern data needs.
To enable change tracking in Snowflake, simply execute the following command on the tables you wish to replicate:
```sql
ALTER TABLE <name> SET CHANGE_TRACKING = TRUE;
```
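If you want to inspect change data by hand, the command above is typically paired with a stream on the same table. The minimal Python sketch below only builds and runs that SQL; it is not Popsink's setup procedure. It assumes a DB-API style connection such as the one returned by `snowflake.connector.connect` from the `snowflake-connector-python` package. The function names and the stream name are hypothetical, and the identifier check is a simplified guard that only accepts unquoted names.

```python
import re

# Simplified check for unquoted identifiers, optionally qualified as db.schema.name.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}$")


def cdc_setup_statements(table: str, stream: str) -> list[str]:
    """Return SQL that enables change tracking on `table` and creates a stream on it."""
    for name in (table, stream):
        if not _IDENT.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    return [
        f"ALTER TABLE {table} SET CHANGE_TRACKING = TRUE",
        f"CREATE STREAM IF NOT EXISTS {stream} ON TABLE {table}",
    ]


def run_setup(conn, table: str, stream: str) -> None:
    """Execute the setup statements on an open DB-API connection (hypothetical helper)."""
    cur = conn.cursor()
    try:
        for stmt in cdc_setup_statements(table, stream):
            cur.execute(stmt)
    finally:
        cur.close()
```

For example, `run_setup(conn, "SALES.PUBLIC.ORDERS", "ORDERS_STREAM")` would enable change tracking on a hypothetical ORDERS table and create ORDERS_STREAM on it; `SELECT * FROM ORDERS_STREAM` then returns the pending changes together with the METADATA$ACTION, METADATA$ISUPDATE and METADATA$ROW_ID columns.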
To summarise, the connector relies on three main mechanisms:
1. Change Tracking: Snowflake tables are configured to track changes at a granular level, capturing INSERT, UPDATE, and DELETE operations.
2. Streams: Popsink uses Snowflake Streams to track changes made to a table over time in a resource-efficient way. Streams record the delta between two points in time, making it easy to extract consistent changes without missing any data point.
3. Zero Copy: By leveraging Snowflake’s zero-copy cloning and efficient metadata management, CDC operations don’t duplicate data inside your Snowflake data warehouse. Instead, change tracking adds hidden columns to the table, which Popsink uses to capture changes incrementally.
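In a stream, each changed row carries the metadata columns METADATA$ACTION (INSERT or DELETE), METADATA$ISUPDATE and METADATA$ROW_ID, and an UPDATE shows up as a DELETE/INSERT pair with METADATA$ISUPDATE set to TRUE. The Python sketch below applies such rows to an in-memory target to illustrate the general pattern; it is not Popsink's internal logic. `apply_stream_rows` is a hypothetical helper, rows are assumed to arrive as dicts keyed by column name, and the target is keyed by METADATA$ROW_ID for brevity, where a real sink would usually key on the table's primary key.

```python
from typing import Any, Iterable

Row = dict[str, Any]


def apply_stream_rows(target: dict[str, Row], rows: Iterable[Row]) -> dict[str, Row]:
    """Apply one batch of stream rows to `target`, keyed by METADATA$ROW_ID."""
    for row in rows:
        row_id = row["METADATA$ROW_ID"]
        action = row["METADATA$ACTION"]
        is_update = row["METADATA$ISUPDATE"]
        # Keep only business columns; metadata columns are not written to the target.
        values = {k: v for k, v in row.items() if not k.startswith("METADATA$")}
        if action == "INSERT":
            # Covers both new rows and the "after" image of an update: upsert.
            target[row_id] = values
        elif action == "DELETE" and not is_update:
            # A genuine delete; the "before" image of an update is simply skipped.
            target.pop(row_id, None)
    return target


if __name__ == "__main__":
    target: dict[str, Row] = {}
    apply_stream_rows(target, [
        {"METADATA$ACTION": "INSERT", "METADATA$ISUPDATE": False, "METADATA$ROW_ID": "a1", "NAME": "Ada"},
        {"METADATA$ACTION": "INSERT", "METADATA$ISUPDATE": False, "METADATA$ROW_ID": "g1", "NAME": "Grace"},
    ])
    apply_stream_rows(target, [
        # UPDATE of a1 arrives as a DELETE/INSERT pair flagged as an update.
        {"METADATA$ACTION": "DELETE", "METADATA$ISUPDATE": True, "METADATA$ROW_ID": "a1", "NAME": "Ada"},
        {"METADATA$ACTION": "INSERT", "METADATA$ISUPDATE": True, "METADATA$ROW_ID": "a1", "NAME": "Ada L."},
        # Plain DELETE of g1.
        {"METADATA$ACTION": "DELETE", "METADATA$ISUPDATE": False, "METADATA$ROW_ID": "g1", "NAME": "Grace"},
    ])
    print(target)  # {'a1': {'NAME': 'Ada L.'}}
```

Skipping the DELETE half of an update and upserting every INSERT makes the result independent of row order within a batch, which matters because a query on a stream, like any query without ORDER BY, does not guarantee an ordering.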
In short, the connector offers:
1. Cost-efficiency: avoid the overhead of full table refreshes by capturing only what has changed.
2. Automation: focus on analytics and insights while Popsink takes care of data movement and consistency.
3. Flexibility: adapt seamlessly to changing schemas or growing data needs without additional engineering effort.
Whether you’re syncing Snowflake data with operational tools, managing a data lake, or optimizing pipelines for cost and reliability, this connector helps you achieve your goals with minimal effort. Check it out on our website or start a free trial to see how it works. Reach out with questions or feedback; we’d love to hear how you’re using CDC to enhance your data processes!