If the disk usage of a cluster is growing, yet there is the amount of data in the actual database is not growing significantly, one potential culprit are inactive or lagging replication slots.

You can check the replication slot status yourself using this command in psql:

SELECT slot_name,restart_lsn FROM pg_replication_slots;

The output will look something like this:

   slot_name   │ restart_lsn 
───────────────┼─────────────
 pghoard_local │ 6E/16000000
      debezium | 5B/8B0  
(2 rows)

The critical thing to look at is the restart_lsn. If the value is much less than what pghoard_local has (e.g. the first two hex values differ, in this case for example they differ by 0x6E-0x5B = 19), there are plenty of write-ahead-logs (WALs) logs that are waiting for debezium connector to catch up. 

As we have noted that debezium connector is broken, if we are still using debezium, we could of course e.g. restart it and hope it sets up the replication slot again. If we are no longer even using debezium, we can remove it by running:

SELECT pg_drop_replication_slot('debezium');

and after the next PostgreSQL checkpoint the disk used by the WAL logs just for debezium connector's sake should be gone. Note that the checkpoint will occur only when:

  • an hour has elapsed ( we use checkpoint_timeout of 3600 seconds), or
  • there has been 5% of disk writes ( we use max_wal_size which is 5% of instance storage)

For further information about WALs and checkpointing, refer to https://www.postgresql.org/docs/current/wal-configuration.html.

Did this answer your question?