Change Data Capture (CDC) is a method for making data movement more efficient by capturing changes to data instead of taking periodic snapshots. The benefits: greater speed, less processing, and fewer system resources.
Extract-Load-Transform (ELT) is a data integration pattern for transferring raw data from source systems to a target data storage system, typically a data warehouse or data lake. The process involves extracting a dataset from its source, loading it directly into the target storage, and then transforming the data for querying and analysis purposes.
ELT processes periodically read a source system (database, CRM, ERP...) and copy the datasets, or portions of datasets, it contains: this is the Extract. They then write the extracted datasets to a destination system (generally a data warehouse or a data lake) as raw data: this is the Load. Since the data is loaded raw, it often needs further processing, for instance to identify which records in the new datasets are in fact updates to existing records, or to apply logic that identifies records that might have been deleted. This is the Transform.
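To make the Transform step concrete, here is a minimal sketch of how new, updated and deleted records might be identified by comparing two successive extracts. It assumes each snapshot fits in memory and is keyed by a primary key; the function name `diff_snapshots` and the sample rows are hypothetical and only serve the illustration.

```python
from typing import Any, Dict, Tuple

Row = Dict[str, Any]
Snapshot = Dict[int, Row]  # assumption: rows are keyed by an integer primary key


def diff_snapshots(
    previous: Snapshot, current: Snapshot
) -> Tuple[Snapshot, Snapshot, Snapshot]:
    """Compare two full extracts and return (inserted, updated, deleted) rows."""
    # Keys present only in the new extract are new records.
    inserted = {key: row for key, row in current.items() if key not in previous}
    # Keys present in both extracts but with different content are updates.
    updated = {
        key: row
        for key, row in current.items()
        if key in previous and previous[key] != row
    }
    # Keys that disappeared from the new extract were deleted at the source.
    deleted = {key: row for key, row in previous.items() if key not in current}
    return inserted, updated, deleted


# Hypothetical sample data: two successive extracts of the same table.
yesterday = {1: {"name": "Ada"}, 2: {"name": "Bob"}, 3: {"name": "Cy"}}
today = {1: {"name": "Ada"}, 2: {"name": "Robert"}, 4: {"name": "Dee"}}

inserted, updated, deleted = diff_snapshots(yesterday, today)
```

Note that every row of both extracts has to be read and compared, even though only three records actually changed. This is the processing cost that snapshot-based pipelines pay on each run.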
CDC is a pattern used to replicate the changes made to data in a source system, rather than the data itself. CDC identifies changes such as new records, updates to existing records, and deleted records. These changes are automatically pushed to a target system (be it a data store or an application), ensuring it is consistently up to date.
CDC leverages the internals of source systems to capture changes as they happen. This could mean consuming database logs, service events or webhooks. The captured changes are then transferred to the target system to be applied. By transporting only the changes and applying them directly in the destination, CDC makes it possible to achieve data consistency across multiple systems in real time while moving less data.
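As a rough illustration of applying captured changes in the destination, the sketch below replays a short stream of change events against an in-memory target. The event shape (`op`, `key`, `row`) and the function name `apply_change` are assumptions made for this example; they do not describe Popsink's format or the log format of any particular database.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def apply_change(target: Dict[int, Row], event: Dict[str, Any]) -> None:
    """Apply a single change event to the target, in place."""
    op = event["op"]
    key = event["key"]
    if op in ("insert", "update"):
        # Upsert: the event carries the full new state of the row.
        target[key] = event["row"]
    elif op == "delete":
        # Ignore deletes for keys the target never received.
        target.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op!r}")


# Hypothetical target state and change stream.
target_table = {1: {"name": "Ada"}, 2: {"name": "Bob"}, 3: {"name": "Cy"}}
change_events = [
    {"op": "insert", "key": 4, "row": {"name": "Dee"}},
    {"op": "update", "key": 2, "row": {"name": "Robert"}},
    {"op": "delete", "key": 3},
]

# Events are applied in the order they were captured.
for change in change_events:
    apply_change(target_table, change)
```

Only the three changed records travel to the target here; the unchanged row is never read or transferred, and no comparison between full datasets is needed.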
Try for yourself! Beyond the 14-day trial, Popsink is free up to 1 million rows per month.