Aiven InfluxDB services may run out of memory when so many metric points are stored in the database that the total number of series grows beyond what the server's memory can handle. When this happens, Aiven automation reacts to the resulting service unavailability by automatically building a replacement server.
The most common cause of high InfluxDB memory usage is a large number of unique metric series in the database, because the current InfluxDB TSM storage engine requires some memory for each series. The memory allocated by the InfluxDB process keeps growing until the server completely runs out of available memory.
The options in such cases are either reducing the number of individual series or upgrading the service plan to a larger one that has more memory. A very high number of unique series is most often caused by using many unique tag combinations when writing data into the DB.
NOTE: Each unique tag combination is effectively a series of its own in the TSM storage engine.
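To make the note above concrete, here is a minimal Python sketch that counts how many distinct series a batch of line-protocol points would create. The helper names `series_key` and `count_series` and the sample points are hypothetical, and the parser is deliberately simplified: it assumes no escaped spaces or commas in measurement names, tag keys, or tag values.

```python
def series_key(line: str) -> str:
    """Return the measurement plus its sorted tag set for one point.

    Simplification: assumes no escaped spaces or commas in the line.
    """
    head = line.split(" ", 1)[0]           # "measurement,tag=value,..."
    measurement, *tags = head.split(",")
    # Tag order does not matter for series identity, so sort the tags.
    return ",".join([measurement] + sorted(tags))


def count_series(lines) -> int:
    """Count the distinct measurement + tag set combinations."""
    return len({series_key(line) for line in lines if line.strip()})


# Hypothetical sample points in InfluxDB line protocol.
points = [
    "cpu,host=web-1,region=eu usage=0.5 1",
    "cpu,host=web-1,region=eu usage=0.7 2",   # same series, new point
    "cpu,region=eu,host=web-1 usage=0.6 3",   # same series, tags reordered
    "cpu,host=web-2,region=eu usage=0.4 1",   # new host value: new series
    "cpu,host=web-1,region=eu,request_id=abc123 usage=0.9 4",  # new tag: new series
]

print(count_series(points))
```

Tags whose values are unbounded, such as the hypothetical `request_id` above, add a new series for nearly every point written. Storing such values as fields instead of tags is one common way to keep the series count down.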
Here are a couple of links describing series cardinality and related issues:
InfluxDB 1.4 and later provide a few query language commands for investigating series cardinality. You can find more information about these commands in the InfluxDB documentation.
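As a rough illustration, the sketch below builds request URLs for a few cardinality statements against the InfluxDB 1.x HTTP `/query` endpoint. The host name, database name, and the `query_url` helper are hypothetical, authentication is omitted, and the statement names should be checked against the InfluxDB documentation for your version before relying on them.

```python
from urllib.parse import urlencode

# InfluxQL cardinality statements; verify availability for your version.
CARDINALITY_QUERIES = [
    "SHOW SERIES CARDINALITY",
    "SHOW MEASUREMENT CARDINALITY",
    "SHOW TAG KEY CARDINALITY",
]


def query_url(base_url: str, database: str, statement: str) -> str:
    """Build a URL for the InfluxDB 1.x /query endpoint (no credentials)."""
    params = urlencode({"db": database, "q": statement})
    return f"{base_url.rstrip('/')}/query?{params}"


# Hypothetical service address and database name.
for statement in CARDINALITY_QUERIES:
    print(query_url("https://influx.example.com:8086", "metrics", statement))
```

The resulting URLs can be requested with any HTTP client using your service credentials, or the same statements can be typed directly into the `influx` command-line shell.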
Aiven support can also provide you with a more detailed cardinality report as given by the influx_inspect report command. Please contact support via the Aiven console online chat tool or via firstname.lastname@example.org to request a report.
InfluxDB 1.4 also introduced a new database format called TSI that is designed to handle very high series cardinality, but the technology is not yet production-ready in InfluxDB 1.4.