Disk I/O is overloaded on a PostgreSQL server

It’s a couple weeks that Zabbix shows occasionally the message “Disk I/O is overloaded on db” on a PostgreSQL database server.  It showed the message for from a range of a few minutes to a few hours and then it vanished.

The other day, I checked all that VM’s graphs on Zabbix and then logged in to the VM to check its health status manually. Everything seemed OK. I checked PostgreSQL logs and it was OK too at the first sight.

As this problem didn’t have any impact on performance or business revenue and I had more important things to do, I didn’t spend a lot of time on that. Today, I made enough time to investigate it more.

As, this VM is dedicated to a Postgresql server and it does not server anything else and all items of operating system seemed OK, I went directly to PostgreSQL service! First of all, I realized, logging slow queries are not enable on this PostgreSQL server. So, I un-commented line containing log_min_duration_statement and gave it a value of 30,000. Note that that number is in milliseconds, so, PostgreSQL will log any query with a run time of more than 30 seconds.

Thanks to postgres, it didn’t need any restart and a reload was enough to sense recently enabled configuration. So I executed systemctl reload postgresql and tailed postgresql logs with tail -f /var/log/postgresql/postgresql-9.6-main.log command. I spent a few minutes there, but there was nothing to indicate what is wrong with it. I poured a cup of coffee, did a flow task and returned again to see if postgres logged anything related to the problem. I saw this:

2018-01-17 08:26:11.337 +0330 [19994] USER@DB_NAME ERROR:  integer out of range
2018-01-17 08:26:11.337 +0330 [19994] USER@DB_NAME STATEMENT:  INSERT INTO user_comments (delivery_id,services,timestamp) VALUES (2376584071,'{255}',NOW())

 

I figured out the problem. Once a customer tries to leave a comment, we will face the problem. We don’t need change type of delivery_id this time since we do not use it anymore in that way. I asked Abraham to completely remove the code related to user comments.

Hint: This database is a massive database with the size of ~660GB. A few months ago we have a serious problem on this DB server. We couldn’t insert any row on main table of this DB and after investigation, we found out that id column of that table was defined as integer and we met its limitations of about 2.1 billion. This DB’s tables are quite old and its original architect didn’t predict such a day!

 

We changed type of id column from integers to big integers with a lot of hassle and a few hours of unwanted down time.

P.S.

Name Storage Size Description Range
smallint 2 bytes small-range integer -32768 to +32767
integer 4 bytes typical choice for integer -2147483648 to +2147483647
bigint 8 bytes large-range integer -9223372036854775808 to +9223372036854775807
decimal variable user-specified precision, exact up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
numeric variable user-specified precision, exact up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
real 4 bytes variable-precision, inexact 6 decimal digits precision
double precision 8 bytes variable-precision, inexact 15 decimal digits precision
smallserial 2 bytes small autoincrementing integer 1 to 32767
serial 4 bytes autoincrementing integer 1 to 2147483647
bigserial 8 bytes large autoincrementing integer 1 to 9223372036854775807

Above table shows range of numeric types in PostgreSQL 9.6 borrowed from here.

Leave a Reply

Your email address will not be published. Required fields are marked *