WORK-IN-PROGRESS

Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline

This is an example end-to-end project that demonstrates the Debezium-Delta Lake pipeline. See the Medium post for more details: https://medium.com/@yinondn/streaming-data-changes-to-a-data-lake-with-debezium-and-delta-lake-pipeline-dc3

High Level Strategy Overview
The project consists of two parts:

- voter-processing: a notebook with PySpark code that transforms Debezium messages into INSERT, UPDATE and DELETE operations on a Delta table (a minimal sketch of this transform follows the list)
- fake_it: a simulator of a voters book application's database with live input, so the example can run end to end
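To make the transform concrete, here is a minimal sketch, assuming Debezium's standard JSON envelope (payload.op / payload.before / payload.after), a Delta table at /delta/voters with columns id and name, and id as the key. None of these paths or column names come from the repo itself.

```python
# A minimal sketch (not the repo's actual code) of turning Debezium change
# events into INSERT/UPDATE/DELETE operations on a Delta table via MERGE.
from delta.tables import DeltaTable
from pyspark.sql import functions as F

def apply_changes(microbatch_df, batch_id):
    spark = microbatch_df.sparkSession
    target = DeltaTable.forPath(spark, "/delta/voters")  # assumed table path

    # Debezium op codes: c = create, u = update, d = delete, r = snapshot read.
    changes = (microbatch_df
        .select("payload.op", "payload.before", "payload.after")
        # For deletes "after" is null, so take the key from "before".
        .withColumn("id", F.coalesce("after.id", "before.id")))

    (target.alias("t")
        .merge(changes.alias("s"), "t.id = s.id")
        .whenMatchedDelete(condition="s.op = 'd'")                # DELETE
        .whenMatchedUpdate(set={"name": "s.after.name"})          # UPDATE
        .whenNotMatchedInsert(condition="s.op != 'd'",
                              values={"id": "s.id",
                                      "name": "s.after.name"})    # INSERT
        .execute())
```

In the streaming pipeline this function would be wired into a query via foreachBatch, as shown at the end of the Kafka sketch further down.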
Run the environment:

```sh
cd compose
docker-compose up -d
```
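The compose/ directory is not reproduced here; a typical Debezium development stack wired this way looks roughly like the following. This is a hypothetical outline based on Debezium's tutorial images, with image tags, service names and ports as assumptions, not the repo's actual file.

```yaml
# Hypothetical outline of compose/docker-compose.yml (assumptions throughout).
version: "3"
services:
  zookeeper:
    image: debezium/zookeeper:1.9
  kafka:
    image: debezium/kafka:1.9
    depends_on: [zookeeper]
    environment:
      ZOOKEEPER_CONNECT: zookeeper:2181
  postgres:
    image: debezium/example-postgres:1.9
    environment:
      POSTGRES_PASSWORD: postgres
  connect:
    image: debezium/connect:1.9
    depends_on: [kafka, postgres]
    ports:
      - "8083:8083"   # Kafka Connect REST API, used by the curl call below
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: connect_configs
      OFFSET_STORAGE_TOPIC: connect_offsets
      STATUS_STORAGE_TOPIC: connect_statuses
```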
Config Debezium connector:

```sh
curl -i -X POST -H "Accept: application/json" -H "Content-Type: application/json" http://localhost:8083/connectors/ -d @debezium/config.json
```
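The repo's debezium/config.json is not shown here; a minimal registration for a Debezium Postgres connector typically looks like the following, assuming the source database is Postgres as in Debezium's tutorial images. The connector name, credentials, database name and table list are all placeholders.

```json
{
  "name": "voters-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname": "voters",
    "database.server.name": "voters_db",
    "table.include.list": "public.voters"
  }
}
```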
Import the notebook file voter-processing/voter-processing.html into a Databricks Community Edition account and follow the instructions inside the notebook.
TODO - to complete the end-to-end example flow:
- Change voter-processing from a notebook to a standalone PySpark application
- Add the PySpark application to the Docker Compose setup
- Change the configuration so that Kafka writes to the local file system instead of S3
- Change the Spark application so that it reads Kafka's output instead of generating its own mock data (see the sketch after this list)
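For that last item, reading Debezium's Kafka output from Spark Structured Streaming would look roughly like this. The broker address, topic name and record schema are assumptions (Debezium names topics <server>.<schema>.<table>), and the query requires the spark-sql-kafka package on the classpath.

```python
# A rough sketch (assumptions, not the repo's code) of reading Debezium's
# Kafka output with Structured Streaming instead of generating mock data.
from pyspark.sql import SparkSession, functions as F, types as T

spark = SparkSession.builder.appName("voter-processing").getOrCreate()

raw = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")   # assumed broker address
    .option("subscribe", "voters_db.public.voters")    # assumed Debezium topic
    .option("startingOffsets", "earliest")
    .load())

# Kafka delivers raw bytes; Debezium's JSON envelope must be parsed explicitly.
# Only the fields used by the MERGE sketch above are declared here.
row = T.StructType([T.StructField("id", T.LongType()),
                    T.StructField("name", T.StringType())])
envelope = T.StructType([T.StructField("payload", T.StructType([
    T.StructField("op", T.StringType()),
    T.StructField("before", row),
    T.StructField("after", row)]))])

events = (raw
    .select(F.from_json(F.col("value").cast("string"), envelope).alias("e"))
    .select("e.payload"))

# Feed each micro-batch through the MERGE logic sketched earlier.
query = events.writeStream.foreachBatch(apply_changes).start()
```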
What’s Next?
Make this a configurable, generic tool that can be set up on top of any supported database.