Kafka MirrorMaker 2 is an advanced data replication service for Apache Kafka.

In this guide, we cover how to set up and monitor data replication between two Kafka clusters with MirrorMaker 2 on Aiven, using both the Aiven Console UI and the Aiven command-line client.

Concepts

A cluster alias is the name under which a Kafka cluster is known to MirrorMaker.

A replication flow is the flow of data between two Kafka clusters (called the source and the target), executed by MirrorMaker. One MirrorMaker service can execute multiple replication flows.

Remote topics are topics replicated by MirrorMaker from a source cluster to a target cluster. There is exactly one source topic for each remote topic. A remote topic's name refers to the source cluster by a prefix: {source_cluster_alias}.{source_topic_name}.
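This naming rule can be sketched in a few lines of Python (remote_topic_name is an illustrative helper, not part of MirrorMaker or any Aiven SDK):

```python
def remote_topic_name(source_cluster_alias: str, source_topic_name: str) -> str:
    """Build the name MirrorMaker 2 gives a remote topic on the target cluster."""
    return f"{source_cluster_alias}.{source_topic_name}"

# A topic customer.orders replicated from a cluster aliased "primary"
# appears on the target cluster as:
print(remote_topic_name("primary", "customer.orders"))  # primary.customer.orders
```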

MirrorMaker preserves partitioning and order of records between source and remote topics.

Initial setup

Let's say we have an Aiven project shop. In this project, there are two Kafka clusters: kafka-primary and kafka-backup.

kafka-primary contains the following topics:

  • customer.orders;
  • customer.info;
  • customer.support; and
  • warehouse.operations.

Our goal is to replicate the data from the topics customer.orders, customer.info, and warehouse.operations, but not customer.support, to kafka-backup.

The project also contains an InfluxDB service, influxdb, and a Grafana service, grafana, which we'll use later for monitoring.

Setting up replication with Aiven Console UI

First, we need to let MirrorMaker know about the two Kafka clusters it's going to operate on. This is done via service integrations.

To set up the integration for kafka-primary, open its service page in the Console and, under Service Integrations, click Manage integrations. This brings up a list of the available integrations for your service.

Manage integrations button

Select Kafka MirrorMaker from the provided list and click Use integration.

Kafka MirrorMaker integration selection

In the dialog, select New service and click Continue. Enter the service name (for this tutorial, let's name it mirrormaker), select the cloud and plan, and click Continue.

For the integration, we can specify a cluster alias. This is optional: if the alias is not specified, it defaults to <project_name>.<service_name>. Let's assume we want kafka-primary to be called primary and kafka-backup to be called backup.
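The fallback rule can be expressed as a small Python sketch (cluster_alias is a hypothetical helper for illustration, not part of any Aiven SDK):

```python
def cluster_alias(project_name, service_name, alias=None):
    """Return the explicit alias, or the <project_name>.<service_name> default."""
    return alias if alias else f"{project_name}.{service_name}"

print(cluster_alias("shop", "kafka-primary"))             # shop.kafka-primary (default)
print(cluster_alias("shop", "kafka-primary", "primary"))  # primary (explicit alias)
```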

Configuring Cluster alias

Enter primary as the Cluster alias and click Create and enable.

Repeat this sequence for kafka-backup, but instead of creating a new MirrorMaker service, select the existing one, and use backup as the alias.

Now that both Kafka clusters are known to MirrorMaker, let's set up a replication flow between them. Open the mirrormaker service page in the Console, go to the Replication flows tab, and click the Create replication flow button.

Source Cluster and Target Cluster are the aliases of the clusters MirrorMaker will be replicating data from and to.

Topics is a list of regular expressions (in Java format) that MirrorMaker uses to select topics from the source Kafka cluster for replication. Topics Blacklist is similar but serves the opposite purpose: it excludes topics from replication.

With Topics set to customer\..* and warehouse\.operations, and Topics Blacklist set to customer\.support, we replicate customer.orders, customer.info, and warehouse.operations but not customer.support.

When everything is set, click Save.

Now, if the topics in kafka-primary are not empty, their remote counterparts will be created in kafka-backup and data will be replicated.

Apart from the topics we're considering, some additional service topics will be created in both kafka-primary and kafka-backup.

Setting up replication with Aiven command line client

Starting MirrorMaker service

To start a MirrorMaker service, use the following command:

avn service create --project shop -t kafka_mirrormaker \
--cloud google-europe-west1 --plan startup-4 mirrormaker

Here, we're using google-europe-west1 region and startup-4 plan for MirrorMaker.

Configuring service integrations

Now, we need to let MirrorMaker know about the two Kafka clusters it's going to operate on. This is done via service integrations. The service integration type we need is kafka_mirrormaker, and its direction is always from a Kafka cluster to a MirrorMaker service.

In the integration, we can specify the cluster alias. This is optional: if the alias is not specified, it defaults to <project_name>.<service_name>. Let's assume we want kafka-primary to be called primary and kafka-backup to be called backup.

avn service integration-create --project shop -t kafka_mirrormaker \
-s kafka-primary -d mirrormaker -c cluster_alias=primary

avn service integration-create --project shop -t kafka_mirrormaker \
-s kafka-backup -d mirrormaker -c cluster_alias=backup

Configuring the replication flow

Now that both Kafka clusters are known to MirrorMaker, let's set up a replication flow between them.

The replication flow configuration looks like the following:

{
  "source_cluster": "primary",
  "target_cluster": "backup",
  "enabled": true,
  "topics": [
    "customer\\..*",
    "warehouse\\.operations"
  ],
  "topics.blacklist": [
    "customer\\.support"
  ]
}

topics is a list of regular expressions (in Java format) that MirrorMaker uses to select topics from the source Kafka cluster for replication. topics.blacklist is similar but serves the opposite purpose: it excludes topics from replication.

topics and topics.blacklist specified above help us to replicate customer.orders, customer.info, and warehouse.operations but not customer.support.
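We can sanity-check this topic selection with Python's re module. MirrorMaker itself uses Java regular expressions, but these simple patterns behave identically in both dialects, so re.fullmatch is a reasonable stand-in for illustration:

```python
import re

topics = ["customer\\..*", "warehouse\\.operations"]
blacklist = ["customer\\.support"]

cluster_topics = ["customer.orders", "customer.info",
                  "customer.support", "warehouse.operations"]

def selected(topic):
    # A topic is replicated if it fully matches at least one "topics"
    # pattern and no "topics.blacklist" pattern.
    return (any(re.fullmatch(p, topic) for p in topics)
            and not any(re.fullmatch(p, topic) for p in blacklist))

print([t for t in cluster_topics if selected(t)])
# ['customer.orders', 'customer.info', 'warehouse.operations']
```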

Let's add this replication flow to our MirrorMaker service:

avn mirrormaker replication-flow create --project shop mirrormaker \
--source-cluster primary --target-cluster backup \
'{
  "enabled": true,
  "topics": [
    "customer\\..*",
    "warehouse\\.operations"
  ],
  "topics.blacklist": [
    "customer\\.support"
  ]
}'

Now, if the topics in kafka-primary are not empty, their remote counterparts will be created in kafka-backup and data will be replicated.

Apart from the topics we're considering, some additional service topics will be created in both kafka-primary and kafka-backup.

Setting up monitoring

MirrorMaker collects metrics about the replication flows it runs, and these can be exported via a metrics integration. Let's set up a metrics integration with InfluxDB:

avn service integration-create --project shop \
-t metrics -s mirrormaker -d influxdb

Shortly after that, the first metrics will appear in InfluxDB in the kafka_mirrormaker_summary measurement.

Now we can visualize and monitor them in Grafana.

Learn more about MirrorMaker 2 on our product page, and get the full story in our announcement post.
