0018138: /proc/diskstats provides wrong numbers for md devices

Hello.

Starting on 4 February, I've been receiving a lot of iostat alerts (iostat reads /proc/diskstats) reporting 100% utilization of the /dev/md[0-9] devices on my systems. We run CentOS 7 with the latest kernel, so 4 February very likely corresponds to the update to "3.10.0-1160.15.2.el7.x86_64 #1 SMP Wed Feb 3 15:06:38 UTC 2021". Downgrading to 11.1 fixes the problem; I can't see it on that version.

Here is sample output from a real server:

$ iostat -xy 1 1
Linux 3.10.0-1160.21.1.el7.x86_64  04/02/2021  _x86_64_  (32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.92    0.00    1.44    0.03    0.00   95.61

Device:    rrqm/s   wrqm/s      r/s      w/s    rkB/s    wkB/s avgrq-sz   avgqu-sz    await  r_await  w_await    svctm    %util
sda          0.00    33.00     0.00    48.00     0.00   568.00    23.67       0.01     0.12     0.00     0.12     0.12     0.60
sdb          0.00     0.00     0.00    51.00     0.00   448.00    17.57       0.00     0.08     0.00     0.08     0.08     0.40
md2          0.00     0.00     0.00   225.00     0.00  1016.00     9.03 1970189.56     0.00     0.00     0.00     4.45   100.10
md1          0.00     0.00     0.00     0.00     0.00     0.00     0.00     304.30     0.00     0.00     0.00     0.00   100.10
md0          0.00     0.00     0.00     0.00     0.00     0.00     0.00     367.00     0.00     0.00     0.00     0.00   100.00
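
For reference, iostat derives %util and avgqu-sz from the deltas of two counters in /proc/diskstats: field 10 (io_ticks, milliseconds spent doing I/O) and field 11 (weighted milliseconds spent doing I/O), each divided by the elapsed interval. The Python sketch below repeats that arithmetic for md devices; the helper names are hypothetical and it is not the sysstat source. Since io_ticks should not advance faster than wall-clock time, a %util of 100% on md0 and md1 at 0 r/s and 0 w/s means io_ticks grew by the whole interval even though no I/O completed.

```python
# Sketch: iostat-style %util and avgqu-sz for md devices from two
# /proc/diskstats samples (field layout per the kernel's iostats documentation).
# Helper names are hypothetical; this is not the sysstat implementation.
import os
import time

DISKSTATS = "/proc/diskstats"


def parse_diskstats(text):
    """Return {device name: [counter, ...]} for every well-formed line."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # major, minor, name, then at least 11 counters
        if len(parts) < 14:
            continue
        stats[parts[2]] = [int(v) for v in parts[3:]]
    return stats


def util_and_queue(before, after, interval_ms):
    """Return (%util, avgqu-sz) for one device over interval_ms milliseconds."""
    io_ticks = after[9] - before[9]         # field 10: ms spent doing I/O
    time_in_queue = after[10] - before[10]  # field 11: weighted ms doing I/O
    return 100.0 * io_ticks / interval_ms, time_in_queue / interval_ms


def read_stats():
    """Read /proc/diskstats together with a monotonic timestamp."""
    with open(DISKSTATS) as f:
        return time.monotonic(), parse_diskstats(f.read())


def main(interval_s=1.0):
    if not os.path.exists(DISKSTATS):
        print(DISKSTATS + " not found (not Linux?)")
        return
    t0, first = read_stats()
    time.sleep(interval_s)
    t1, second = read_stats()
    interval_ms = (t1 - t0) * 1000  # measured, not nominal, interval
    for dev in sorted(second):
        if dev.startswith("md") and dev in first:
            util, queue = util_and_queue(first[dev], second[dev], interval_ms)
            print(f"{dev}: %util={util:.2f} avgqu-sz={queue:.2f}")


if __name__ == "__main__":
    main()
```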

As you can see, avgqu-sz is abnormally high and %util is 100% or even higher, which just doesn't make sense. In the changelog of the 15.2 kernel I found the following entries, added in 3.10.0-1160.13.1.el7 (between 11.1 and 15.2):

Mon Dec 14 13:00:00 2020 Augusto Caringi <acaringi@redhat.com> [3.10.0-1160.13.1.el7]
- [s390] zcrypt: Fix ZCRYPT_PERDEV_REQCNT ioctl (Philipp Rudo) [1896826]
- [block] block/diskstats: more accurate approximation of io_ticks for slow disks (Ming Lei) [1859364]
- [block] block: delete part_round_stats and switch to less precise counting (Ming Lei) [1859364]
- [md] dm: simplify start of block stats accounting for bio-based (Ming Lei) [1859364]
- [block] block/rsxx: use generic io stats accounting functions to simplify io stat accounting (Ming Lei) [1859364]
- [block] drbd: use generic io stats accounting functions to simplify io stat accounting (Ming Lei) [1859364]
- [md] md: use generic io stats accounting functions to simplify io stat accounting (Ming Lei) [1859364]

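The changelog lines are only commit titles. As a rough illustration of what the two io_ticks changes do, here is a simplified Python model based on the upstream commits with the same titles; it is an assumption, not the RHEL 7 backport, and ignores per-CPU counters, partitions and the cmpxchg used by the real code. The `io_ticks()` function and its event format are hypothetical. "Less precise counting" adds one jiffy at each request start or end whenever jiffies has changed; the "slow disks" change instead adds the jiffies elapsed since the last update when a request ends. The model also shows that the second rule relies on start and end accounting being balanced: an end with no matching start charges the whole gap since the last update.

```python
# Simplified model (an assumption, not kernel code) of two io_ticks
# heuristics from the upstream commits whose titles appear in the changelog.
# Times are in jiffies; events must be in time order.


def io_ticks(events, precise_end):
    """Accumulate io_ticks for a list of (jiffy, "start" | "end") events.

    precise_end=False: add 1 at a start or end whenever jiffies changed
    ("switch to less precise counting").
    precise_end=True: at an end, add the jiffies elapsed since the last
    update instead ("more accurate approximation of io_ticks for slow disks").
    """
    ticks = 0
    stamp = 0  # jiffy of the last update
    for now, kind in events:
        if now == stamp:
            continue  # at most one update per jiffy
        if kind == "end" and precise_end:
            ticks += now - stamp
        else:
            ticks += 1
        stamp = now
    return ticks


if __name__ == "__main__":
    slow_request = [(5, "start"), (15, "end")]  # one request busy for 10 jiffies
    print(io_ticks(slow_request, precise_end=False))  # 2
    print(io_ticks(slow_request, precise_end=True))   # 11
    # An "end" with no matching "start" charges the whole idle gap:
    unmatched = [(5, "start"), (6, "end"), (500, "end")]
    print(io_ticks(unmatched, precise_end=True))      # 496
```
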
These changes may be related to this bug. I also tried a clean CentOS 7 installation, and the bug is very easy to reproduce there.
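
When reproducing on a clean install, one possible diagnostic is to watch field 9 of /proc/diskstats (I/Os currently in progress) for the md devices. This is only a hypothesis: a leaked in-flight counter would be consistent with md0 and md1 reporting 100% %util and a steady avgqu-sz at 0 r/s and 0 w/s. The sketch flags md devices whose in-flight count is non-zero in every sample while no reads (field 1) or writes (field 5) complete; `suspicious_devices()` and the sampling parameters are hypothetical choices. A device with a genuinely stuck, long-running request would also be flagged, so a hit is a lead rather than proof.

```python
# Sketch (hypothesis check, not a confirmed root cause): flag md devices
# whose in-flight counter (field 9 of /proc/diskstats) stays above zero
# while no reads or writes complete between samples.
import os
import time

DISKSTATS = "/proc/diskstats"


def parse_diskstats(text):
    """Return {device name: [counter, ...]} for every well-formed line."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 14:  # major, minor, name, >= 11 counters
            stats[parts[2]] = [int(v) for v in parts[3:]]
    return stats


def suspicious_devices(samples, prefix="md"):
    """Return names whose in-flight count is > 0 in every sample even though
    no reads (field 1) or writes (field 5) completed between first and last."""
    first, last = samples[0], samples[-1]
    result = []
    for dev in sorted(last):
        if not dev.startswith(prefix) or dev not in first:
            continue
        idle = last[dev][0] == first[dev][0] and last[dev][4] == first[dev][4]
        always_busy = all(s.get(dev, [0] * 11)[8] > 0 for s in samples)
        if idle and always_busy:
            result.append(dev)
    return result


def main(count=3, interval_s=1.0):
    if not os.path.exists(DISKSTATS):
        print(DISKSTATS + " not found (not Linux?)")
        return
    samples = []
    for i in range(count):
        with open(DISKSTATS) as f:
            samples.append(parse_diskstats(f.read()))
        if i < count - 1:
            time.sleep(interval_s)
    for dev in suspicious_devices(samples):
        print(f"{dev}: in_flight={samples[-1][dev][8]} with no completed I/O")


if __name__ == "__main__":
    main()
```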

* This article was originally published here
