Kafka MirrorMaker 2 is an advanced data replication service for Apache Kafka.
In this guide, we cover setting up and monitoring data replication between two Kafka clusters using MirrorMaker 2 on Aiven, with both the Aiven Console UI and the command line client.
Concepts
Cluster alias is the name under which a Kafka cluster is known to MirrorMaker.
Replication flow is the flow of data between two Kafka clusters (called the source and target) executed by MirrorMaker. One MirrorMaker service can execute multiple replication flows.
Remote topics are topics replicated by MirrorMaker from a source cluster to a target cluster. There is only one source topic for each remote topic. Remote topics refer to their source cluster by a prefix in the name: {source_cluster_alias}.{source_topic_name}.
MirrorMaker preserves partitioning and order of records between source and remote topics.
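The naming convention can be sketched as follows (an illustrative Python snippet, not part of MirrorMaker itself):

```python
# Sketch of MirrorMaker 2's remote-topic naming convention:
# the remote topic name is the source cluster alias, a dot,
# then the source topic name.
def remote_topic_name(source_cluster_alias: str, source_topic: str) -> str:
    return f"{source_cluster_alias}.{source_topic}"

print(remote_topic_name("primary", "customer.orders"))
# primary.customer.orders
```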
Initial setup
Let's say we have an Aiven project shop. In this project, there are two Kafka clusters: kafka-primary and kafka-backup.
kafka-primary contains the following topics: customer.orders, customer.info, customer.support, and warehouse.operations.
Our goal is to replicate data from the topics customer.orders, customer.info, and warehouse.operations, but not customer.support, to kafka-backup.
There are also an InfluxDB service influxdb and a Grafana service grafana.
Setting up replication with Aiven Console UI
First, we need to let MirrorMaker know about the two Kafka clusters it is going to operate on. This is done via service integrations.
To set up the integration for kafka-primary, open its service page in the Console and, under Service Integrations, click Manage integrations. This brings up a list of available integrations for your service.
Select Kafka MirrorMaker from the provided list and click Use integration.
In the dialog, select New service and click Continue. Enter the service name (for this tutorial, let's name it mirrormaker), select the cloud and plan, and click Continue.
For the integration, we can specify a cluster alias. This is optional; if no alias is specified, it defaults to <project_name>.<service_name>. Let's assume we want kafka-primary to be called primary and kafka-backup, backup.
Enter primary as the Cluster alias and click Create and enable.
Repeat this sequence for kafka-backup, but instead of creating a new MirrorMaker service, select the existing one.
Now that both Kafka clusters are known to MirrorMaker, let's set up a replication flow between them. Open the mirrormaker service page in the Console, go to the Replication flows tab, and click the Create replication flow button.
Source Cluster and Target Cluster are the aliases of the clusters MirrorMaker will be replicating data from and to.
Topics is a list of regular expressions (in Java format). Using it, MirrorMaker will select topics from the source Kafka cluster for replication. Topics Blacklist is similar but serves the opposite purpose, to prohibit replication of some topics.
With Topics set to the patterns customer\..* and warehouse\.operations, and Topics Blacklist set to customer\.support, we replicate customer.orders, customer.info, and warehouse.operations but not customer.support.
When everything is set, click Save.
Now, if the topics in kafka-primary are not empty, their remote counterparts will be created in kafka-backup and data will be replicated.
Apart from the topics we're considering, some additional service topics will be created in both kafka-primary and kafka-backup.
Setting up replication with Aiven command line client
Starting MirrorMaker service
To start a MirrorMaker service, use the following command:
avn service create --project shop -t kafka_mirrormaker \
--cloud google-europe-west1 --plan startup-4 mirrormaker
Here, we're using the google-europe-west1 region and the startup-4 plan for MirrorMaker.
Configuring service integrations
Now, we need to let MirrorMaker know about the two Kafka clusters it is going to operate on. This is done via service integrations. The service integration type we need is kafka_mirrormaker, and its direction is always from a Kafka cluster to a MirrorMaker service.
In the integration, we can specify a cluster alias. This is optional; if no alias is specified, it defaults to <project_name>.<service_name>. Let's assume we want kafka-primary to be called primary and kafka-backup, backup.
avn service integration-create --project shop -t kafka_mirrormaker \
-s kafka-primary -d mirrormaker -c cluster_alias=primary
avn service integration-create --project shop -t kafka_mirrormaker \
-s kafka-backup -d mirrormaker -c cluster_alias=backup
Configuring the replication flow
Now, when both Kafka clusters are known to MirrorMaker, let’s set up a replication flow between them.
The replication flow configuration looks like the following:
{
"source_cluster": "primary",
"target_cluster": "backup",
"enabled": true,
"topics": [
"customer\\..*",
"warehouse\\.operations"
],
"topics.blacklist": [
"customer\\.support"
]
}
topics is a list of regular expressions (in Java format). Using it, MirrorMaker will select topics from the source Kafka cluster for replication. topics.blacklist is similar but serves the opposite purpose, to prohibit replication of some topics.
topics and topics.blacklist specified above help us replicate customer.orders, customer.info, and warehouse.operations but not customer.support.
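To illustrate how the two lists combine, here is a Python sketch (an assumption about the selection semantics, not MirrorMaker's actual implementation): a topic is replicated when it fully matches at least one topics pattern and no topics.blacklist pattern.

```python
import re

# The patterns from the replication flow config above
# (Java regex syntax is close enough to Python's for these).
topics_patterns = [r"customer\..*", r"warehouse\.operations"]
blacklist_patterns = [r"customer\.support"]

def is_replicated(topic: str) -> bool:
    # Selected by the "topics" list...
    allowed = any(re.fullmatch(p, topic) for p in topics_patterns)
    # ...and not excluded by "topics.blacklist".
    blocked = any(re.fullmatch(p, topic) for p in blacklist_patterns)
    return allowed and not blocked

source_topics = ["customer.orders", "customer.info",
                 "customer.support", "warehouse.operations"]
print([t for t in source_topics if is_replicated(t)])
# ['customer.orders', 'customer.info', 'warehouse.operations']
```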
Let's add this replication flow to our MirrorMaker service:
avn mirrormaker replication-flow update --project shop mirrormaker \
--source-cluster primary --target-cluster backup \
'{
"enabled": true,
"topics": [
"customer\\..*",
"warehouse\\.operations"
],
"topics.blacklist": [
"customer\\.support"
]
}'
Now, if the topics in kafka-primary are not empty, their remote counterparts will be created in kafka-backup and data will be replicated.
Apart from the topics we're considering, some additional service topics will be created in both kafka-primary and kafka-backup.
Setting up monitoring
The metrics about replication flows run by MirrorMaker are collected and can be exported via a metric integration. Let's set up a metric integration with InfluxDB:
avn service integration-create --project shop \
-t metrics -s mirrormaker -d influxdb
Shortly after that, the first metrics will appear in InfluxDB in the kafka_mirrormaker_summary measurement.
Now we can visualize and monitor them in Grafana.
Learn more about MirrorMaker 2 on our product page, and get the full story in our announcement post.
Integration with an external Kafka cluster
An external Kafka cluster is typically a cluster not managed by Aiven. However, it may also be an Aiven Kafka service located in a different project from the MirrorMaker service.
To allow MirrorMaker to access the external Kafka cluster, an integration endpoint must first be created. At the moment, you need to use the command line client for this:
avn service integration-endpoint-create --project shop \
--endpoint-name my-external-kafka-endpoint \
--endpoint-type external_kafka \
-c bootstrap_servers="my-external-kafka.net:19435" \
-c security_protocol="SSL" \
-c ssl_ca_cert="$(cat ca.pem)" \
-c ssl_client_cert="$(cat service.cert)" \
-c ssl_client_key="$(cat service.key)"
security_protocol can currently be SSL or PLAINTEXT. For PLAINTEXT, nothing additional needs to be provided. For SSL, it's possible to provide the CA certificate, the client certificate, and the client key.
Then, we need to get the ID of the endpoint:
avn service integration-endpoint-list --project shop
Copy the value from the ENDPOINT_ID column.
Finally, create an integration between the endpoint and the MirrorMaker service. This is done very similarly to the integrations between a Kafka service and a MirrorMaker service:
avn service integration-create --project shop -t kafka_mirrormaker \
-S "$ENDPOINT_ID" -d mirrormaker \
-c cluster_alias=external-kafka
Note the only difference: instead of the -s parameter with the source service name, -S with the integration endpoint ID is used.
The rest (e.g. setting up replication flows) is the same as for the service-to-service integration.