Occasionally, our team receives questions from customers about the performance of a service provided by Aiven compared with an equivalent-sized instance with the same service installed.
There can be several reasons why performance differs from what is expected, even when the two instances appear to have the same node specification. This article details some of those reasons.
Post-migration steps have not been performed
For many data storage applications, especially relational databases, some post-migration steps after restoring a database are required. A few examples are below:
Node pairing & replication
Many of the plans offered by Aiven include a pair of nodes. For example, Aiven for MySQL Business plans are provided as a high-availability pair, so there is replication overhead between the two nodes running Aiven for MySQL. This overhead is not present on a single node without replication enabled. For MySQL, the overhead can be particularly problematic if some tables do not have primary keys.
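One way to check for the primary-key issue mentioned above is to query MySQL's information_schema for base tables that have no PRIMARY KEY constraint. The following is a minimal sketch, not an Aiven-provided tool: it assumes `cursor` is any Python DB-API 2.0 cursor already connected to the MySQL service, and the function name is hypothetical.

```python
# Sketch: list MySQL base tables that have no primary key.
# Assumption: `cursor` is any DB-API 2.0 cursor connected to the MySQL service.

TABLES_WITHOUT_PK_SQL = """
SELECT t.table_schema, t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL
  AND t.table_schema NOT IN
      ('mysql', 'information_schema', 'performance_schema', 'sys')
ORDER BY t.table_schema, t.table_name
"""


def tables_without_primary_key(cursor):
    """Return (schema, table) pairs for base tables lacking a primary key."""
    cursor.execute(TABLES_WITHOUT_PK_SQL)
    # Rows are read by position, so the column-name casing does not matter.
    return [(row[0], row[1]) for row in cursor.fetchall()]
```

Any table this returns is a candidate for adding a primary key before comparing performance with a single-node setup.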
Aiven's management services continually monitor the health of both the VM and the service running on it, such as MySQL, to ensure that it remains healthy and available. Detecting potential failures early is important, so that self-healing and alerting can react before minor blips become major outages. This monitoring requires high-frequency polling, which carries some CPU overhead: about 5% of a single CPU core. The same management layer also allows us to provide functionality such as log output, metrics, service monitoring and alerting.
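To put the 5% figure in context, the arithmetic below expresses it as a share of a node's total CPU capacity. This is only an illustrative sketch: the function name is hypothetical, and it assumes the overhead stays at roughly 5% of one core regardless of how many cores the node has.

```python
def monitoring_share_of_node(cpu_cores, overhead_of_one_core=0.05):
    """Fraction of the node's total CPU capacity used by monitoring.

    Assumes the overhead is a fixed fraction of ONE core, so its share
    of the whole node shrinks as the core count grows.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return overhead_of_one_core / cpu_cores


if __name__ == "__main__":
    for cores in (1, 2, 4):
        share = monitoring_share_of_node(cores)
        print(f"{cores} core(s): {share:.2%} of total CPU")
```

Under that assumption, the relative cost is most noticeable on the smallest plans and becomes minor on nodes with more cores.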
Full disk encryption
Aiven uses full volume encryption provided by LUKS. This adds a slight overhead to the system when reading and writing data. Often the service being compared against does not use disk encryption, so it has lower overhead and therefore higher throughput. Read more about this in our developer documentation.
Certain configuration changes can cause performance issues if those changes affect how the service operates. A few examples:
with Aiven for Kafka, if your service is already under heavy load and min.insync.replicas is increased, then this may overload the service, degrading performance. Such issues can be resolved by upgrading the service plan or by changing the configuration.
with Aiven for PostgreSQL, if using PgBouncer, avoid creating a lot of new connections and try to reuse existing ones. Similarly, with or without PgBouncer, avoid keeping long-lived idle connections open if they are not going to be used for a while.
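The connection-reuse advice for PostgreSQL can be illustrated with a small client-side pool. This is a minimal sketch using only the Python standard library; `SimplePool` and its `connect` factory argument are hypothetical names, and a real application would more likely use the pooling support that ships with its database driver.

```python
import queue
from contextlib import contextmanager


class SimplePool:
    """Keep a fixed number of open connections and hand them out for reuse."""

    def __init__(self, connect, size):
        # `connect` is a hypothetical zero-argument factory that opens
        # one database connection and returns it.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Block until a pooled connection is free instead of opening a new one.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection to the pool rather than closing it.
            self._idle.put(conn)

    def close(self):
        # Close every idle connection, for example before a long quiet period.
        while not self._idle.empty():
            self._idle.get_nowait().close()
```

With this shape, each unit of work runs inside `with pool.connection() as conn:`, so the number of connections opened stays at `size` however many queries are run. Calling `close()` when the application expects to be idle avoids leaving unused connections open.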