Kafka is a distributed streaming platform that is implemented as a set of append-only logs: each topic is divided into partitions, and each partition is one such log. When new data is written to Kafka, it is appended to the end of one of these logs.
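This append-only model can be sketched as follows. The class below is purely illustrative (not Kafka's actual implementation): an in-memory partition where each appended record receives the next sequential offset.

```python
# Illustrative sketch of a Kafka-style partition: an append-only log
# where each record is assigned the next sequential offset.
class PartitionLog:
    def __init__(self):
        self.records = []  # append-only list of (key, value) pairs

    def append(self, key, value):
        offset = len(self.records)  # next offset equals the current length
        self.records.append((key, value))
        return offset

    def read(self, offset):
        # Reads are by offset; existing records are never modified in place
        return self.records[offset]

log = PartitionLog()
log.append("user-1", "login")   # offset 0
log.append("user-2", "login")   # offset 1
```

Consumers track their position in each partition as an offset into this log, which is why reads are cheap sequential scans.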

Data retention period

To avoid running out of disk space, Kafka by default drops the oldest messages from the beginning of each log once their retention period expires. Aiven Kafka allows you to configure the retention period on a per-topic basis.
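Conceptually, time-based retention trims expired records from the head of each log. The sketch below is a simplification (real Kafka deletes whole log segments, not individual messages); the retention window and timestamps are illustrative.

```python
import time

RETENTION_SECONDS = 3600  # e.g. a one-hour retention period (illustrative)

def apply_retention(log, now, retention=RETENTION_SECONDS):
    """Drop expired records from the head of the log.

    `log` is a list of (timestamp, message) tuples in append order,
    so the oldest records are always at the front.
    """
    cutoff = now - retention
    i = 0
    while i < len(log) and log[i][0] < cutoff:
        i += 1  # advance past records older than the cutoff
    return log[i:]

now = time.time()
log = [(now - 7200, "expired"), (now - 10, "fresh")]
print(apply_retention(log, now))  # only the fresh record remains
```

Because the log is append-ordered, expiry only ever removes a prefix; records in the middle are never deleted by time-based retention.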

Aiven does not limit the maximum retention period in any way, and setting the retention to -1 disables time-based expiration altogether ("unlimited retention"). To change this for a single topic, open the topic you want to keep indefinitely and set its Retention (hours) value to -1.

Retention can also be set at the broker level by setting kafka.log_retention_hours to -1 in the service's advanced configuration.

NOTE: Using high retention periods without monitoring the available storage space can lead to running out of disk space, and these situations are not covered by our SLA.

Do note that as the data is not backed up, the primary mechanism for keeping your Kafka data durable is the number of brokers your topic is replicated to. Setting a topic's retention period to unlimited only ensures that Kafka itself does not truncate older data.

Log compaction

Instead of dropping the oldest data from the beginning of a topic, a topic can be configured to use log compaction, which retains the latest value for each record key even after the message's retention period has expired. Log compaction therefore requires every message to have an explicit key. It can be enabled for new topics in the Aiven console's Kafka topic management section.
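The effect of compaction can be sketched as keeping only the last value seen for each key. This is a simplification: real compaction runs in the background over log segments and always preserves an uncompacted log head, but the end result for consumers is the same.

```python
def compact(log):
    """Return the compacted view of a log: the latest value per key.

    `log` is a list of (key, value) pairs in append order; every
    record must carry an explicit key for compaction to work.
    Keys appear in the order they were first written.
    """
    latest = {}
    for key, value in log:
        latest[key] = value  # later writes overwrite earlier ones
    return list(latest.items())

log = [("user-1", "v1"), ("user-2", "v1"), ("user-1", "v2")]
print(compact(log))  # [('user-1', 'v2'), ('user-2', 'v1')]
```

Note that a key's older values disappear after compaction, so compacted topics are suited to changelog-style data where only the current state per key matters.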
