Below you will find pages that utilize the taxonomy term “data engineering”
Liquibase: Web Framework Independent Database Migration Tool
If you’re familiar with web frameworks such as Rails and Django, you probably know that they come with an ORM and a database migration tool. This is a great feature to have because it enables evolutionary database design in your application. The problem with built-in migration tools such as these is that they lock you into your web framework. Moreover, independent migration tools such as Liquibase or Flyway may have features that are not available off the shelf in your framework.
Getting Started With Kafka
Apache Kafka is an open-source framework that allows you to develop real-time applications. In this article, I will jot down some points that may save you some time and frustration if you’re just learning about Apache Kafka. First of all, to set up a development Kafka environment, it will save you a lot of hassle if you use the Confluent distribution of Kafka rather than the native Apache version. Download the Confluent Platform from https://docs.
Comparing SQL, Pandas and Spark
Most of us are familiar with writing database queries in SQL. But there are other ways to query your data, whether it lives in a database or directly in a file. Two of them are Pandas, a Python package, and Apache Spark. Both are very popular these days in the data science field. If your data fits in memory on a single computer, I’d suggest using Pandas.
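To make the comparison concrete, here is a minimal sketch that runs the same filter-and-aggregate query once in SQL and once in Pandas. The orders table, its column names and its values are hypothetical sample data invented for illustration, and an in-memory SQLite database stands in for a real database server.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data: one row per order.
orders = pd.DataFrame({
    "customer": ["ann", "bob", "ann", "cat", "bob"],
    "amount": [10.0, 25.0, 5.0, 40.0, 15.0],
})

# SQL route: load the rows into an in-memory SQLite table and query it.
conn = sqlite3.connect(":memory:")
orders.to_sql("orders", conn, index=False)
sql_result = pd.read_sql_query(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders WHERE amount >= 10 "
    "GROUP BY customer ORDER BY customer",
    conn,
)
conn.close()

# Pandas route: the same filter, group-by and sum on the DataFrame itself.
pandas_result = (
    orders[orders["amount"] >= 10]          # WHERE amount >= 10
    .groupby("customer", as_index=False)["amount"]  # GROUP BY customer
    .sum()                                   # SUM(amount)
    .rename(columns={"amount": "total"})     # AS total
    .sort_values("customer")                 # ORDER BY customer
    .reset_index(drop=True)
)

print(sql_result)
print(pandas_result)
```

Both routes should produce the same per-customer totals. The Pandas version keeps everything in memory on one machine, which is why it suits data that fits on a single computer.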
Deep Dive Into HDFS Kafka Connect
In a previous article, I wrote about Kafka Connect. Today, I’m going to get into the details of a Kafka Connect connector called Kafka HDFS Connect, which usually comes pre-installed in the Confluent distribution of Kafka. If it is not, it can easily be installed from the Confluent Hub by running the following command from the command line:
confluent-hub install confluentinc/kafka-connect-hdfs:latest
You can check all the connectors that are installed by: