Aiven does not place additional restrictions on the number of indexes or shards you can use in your managed Elasticsearch service. The default per-index shard count limit (1024) still applies.
Designing index usage
When to create a new index per customer/project/entity?
You have a very limited number of entities (tens, not hundreds or thousands), and
It is very important you can easily and efficiently delete all the data related to a single entity.
For example, storing logs or other events in per-date indexes (logs_2018-07-21 etc.) adds value, assuming old indexes are cleaned up. If you have low-volume logging and want to keep indexes for a very long time (years), consider per-week or per-month indexes instead.
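As a rough illustration of the naming schemes above, the sketch below builds per-day, per-week and per-month index names and picks out daily indexes that have passed a retention period. The helper names, the week and month suffix formats, and the retention logic are assumptions made for this example; only the logs_2018-07-21 daily pattern comes from the text above.

```python
from datetime import date, datetime, timedelta


def index_name(prefix: str, day: date, granularity: str = "day") -> str:
    """Build an index name for the given date (hypothetical naming helper)."""
    if granularity == "day":
        return f"{prefix}_{day.isoformat()}"            # logs_2018-07-21
    if granularity == "week":
        iso_year, iso_week, _ = day.isocalendar()
        return f"{prefix}_{iso_year}-w{iso_week:02d}"   # assumed week format
    if granularity == "month":
        return f"{prefix}_{day.year}-{day.month:02d}"   # assumed month format
    raise ValueError(f"unknown granularity: {granularity}")


def expired_daily_indexes(prefix, today, keep_days, existing):
    """Return the daily indexes in `existing` older than `keep_days` days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in existing:
        if not name.startswith(prefix + "_"):
            continue
        suffix = name[len(prefix) + 1:]
        try:
            day = datetime.strptime(suffix, "%Y-%m-%d").date()
        except ValueError:
            continue  # not a per-day index, leave it alone
        if day < cutoff:
            expired.append(name)
    return expired
```

Deleting a whole expired index is a single cheap operation, which is the main benefit of per-date indexes over deleting individual documents from one large index.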
When not to create a new index per customer/project/entity?
You have potentially a very large number of entities (thousands), or you have hundreds of entities and need multiple different indexes for each and every one, or
You expect strong growth in the number of entities, or
You have no other reason than separating different entities from each other.
Instead of creating something like items_project_a, consider using a single items index with a field for the project identifier, and query the data with Elasticsearch filtering. This is a far more efficient use of your Elasticsearch service.
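A minimal sketch of what such a filtered query could look like, written as a Python function that builds the request body for a search against the shared items index. The field names project_id and description are hypothetical, and the term filter assumes project_id is mapped as a keyword (exact-match) field.

```python
def project_search_body(project_id: str, text: str) -> dict:
    """Build a search body restricted to one project in a shared index."""
    return {
        "query": {
            "bool": {
                # Full-text part of the query (hypothetical field name).
                "must": [{"match": {"description": text}}],
                # Exact-match restriction to a single project; filter
                # clauses do not affect scoring and can be cached.
                "filter": [{"term": {"project_id": project_id}}],
            }
        }
    }


# Example document stored in the shared index (hypothetical fields).
example_doc = {"project_id": "project_a", "description": "blue widget"}
body = project_search_body("project_a", "widget")
```

The body would be sent to the search endpoint of the items index with whichever client you already use.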
How many shards is a good idea?
Shards are not free. Elasticsearch has to store state information for each shard, and continuously check shards.
The right number of shards depends heavily on the amount of data you have. Somewhere between a few gigabytes and a few tens of gigabytes per shard is a good rule of thumb.
If you know you will have a very small amount of data but many indexes, start with 1 shard, and split the index if necessary.
If you estimate you will have tens of gigabytes of data, start with 5 shards per index in order to avoid splitting the index for a long time.
If you estimate you will have hundreds of gigabytes of data, start with something like (amount of data in gigabytes) / 10 for number_of_shards. For a 250GB index, that would be 25 shards.
If you estimate you will have terabytes of data, increase the shard size a bit. For example, for a 1TB index, 50 shards (about 20GB per shard) could be a reasonable starting point.
These suggestions are only indicative - optimal values depend heavily on your usage pattern and forecasted growth of data in Elasticsearch.
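The rules of thumb above can be sketched as a small function. The exact boundaries between "small", "tens of gigabytes", "hundreds of gigabytes" and "terabytes" (10, 100 and 1000 GB here) are assumptions made for this example, as is the 20GB-per-shard figure for the largest tier, which is derived from the 1TB / 50 shards example. Treat the result as a starting point only.

```python
import math


def suggest_number_of_shards(estimated_gb: float) -> int:
    """Suggest an initial number_of_shards from an estimated index size in GB."""
    if estimated_gb < 0:
        raise ValueError("estimated_gb must be non-negative")
    if estimated_gb < 10:
        return 1                              # very small index: one shard
    if estimated_gb < 100:
        return 5                              # tens of gigabytes
    if estimated_gb < 1000:
        return math.ceil(estimated_gb / 10)   # hundreds of GB: ~10GB per shard
    return math.ceil(estimated_gb / 20)       # terabytes: ~20GB per shard (assumed)
```

For example, an estimated size of 250 gives 25 and an estimated size of 1000 gives 50, matching the figures in the list above.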
You can change the number of shards without losing your data, but the process requires a brief downtime while the index is rewritten.
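One way to rewrite an index with a different shard count is to create a new index with the desired number_of_shards and copy the data into it with the Elasticsearch reindex API. The text above does not say which mechanism is used, so the sketch below is only one possible approach; the index names are hypothetical, and mappings and other settings would need to be set on the new index separately.

```python
def reshard_requests(source_index: str, dest_index: str, number_of_shards: int):
    """Return (method, path, body) tuples for a reindex-based shard change."""
    create_dest = (
        "PUT",
        f"/{dest_index}",
        {"settings": {"index": {"number_of_shards": number_of_shards}}},
    )
    copy_data = (
        "POST",
        "/_reindex",
        {"source": {"index": source_index}, "dest": {"index": dest_index}},
    )
    return [create_dest, copy_data]


# Hypothetical index names, for illustration only.
requests_to_send = reshard_requests("items", "items_v2", 25)
```

Writes to the old index should be paused while the copy runs, which is where the brief downtime comes from.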
Having a large number of indexes or shards affects the performance you get from Elasticsearch. Some rough numbers from a three-node Aiven Elasticsearch business-8 cluster:
1 000 shards: no visible effect in Elasticsearch performance.
10 000 shards: already quite a lot. Creating new shards takes progressively longer, and variance in performance grows.
15 000 shards: creating new shards takes significantly longer, often tens of seconds.
20 000 shards: inserting new data intermittently takes significantly longer (20x longer than the mean). Variance in search performance also grows significantly.
Aiven Elasticsearch takes a snapshot once every hour. With 10 000 shards, the cluster is continuously taking new backups and deleting old backups from backup storage. This naturally affects service performance, as part of the capacity is continuously in use for managing backups.
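To get a feel for where a planned layout would land relative to the figures above, the following sketch adds up shards across indexes and reports the highest reference point reached. It assumes that both primary and replica shards count towards the total, which the figures above do not state explicitly, and the reference points come from a single three-node business-8 cluster, so they will not transfer directly to other plans.

```python
# Reference points taken from the rough numbers above.
OBSERVATIONS = [
    (1000, "no visible effect"),
    (10000, "shard creation slows down, performance variance grows"),
    (15000, "shard creation often takes tens of seconds"),
    (20000, "some inserts take 20x longer than the mean"),
]


def total_shards(indexes: dict) -> int:
    """Sum shards over {index_name: (primaries, replicas_per_primary)}."""
    return sum(p * (1 + r) for p, r in indexes.values())


def highest_observation_reached(shard_count: int):
    """Return the largest reference point <= shard_count, or None if below all."""
    reached = None
    for threshold, effect in OBSERVATIONS:
        if shard_count >= threshold:
            reached = (threshold, effect)
    return reached


# Example: 200 daily indexes with 5 primaries and 1 replica each.
daily = {f"logs_{i}": (5, 1) for i in range(200)}
count = total_shards(daily)
```

Here count is 2000, which has passed the 1 000 shard reference point but is still well below 10 000.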