Context: we are using qfsadmin -s $ip -p $port ping to collect metrics from our QFS cluster.
One metric we use is the ncorrupt counter. When it's not 0, we get an alert to check the disk of the particular chunkserver.
E.g. s=REDACTED, p=REDACTED, rack=REDACTED, used=28464644933285, free=23535571826431, total=53541442322432, util=56.04, nblocks=437542, lastheard=0, ncorrupt=65, nchunksToMove=0, numDrives=6, numWritableDrives=6, overloaded=0, numReplications=0, numReadReplications=0, good=1, nevacuate=0, bytesevacuate=0, nlost=0, nwrites=40, load=0, md5sum=a95d6ff5740cb73bd29d8330233c40ff, replay=0, connected=1, stopped=0, chunks=437552, tiers=10:1:19:1482:2.37e+12:3.94e+12:39.76;15:5:23:436070:2.12e+13:4.96e+13:57.34, lostChunkDirs=
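For reference, a minimal sketch of how a line like the one above can be turned into a metric and an alert check. It assumes each chunkserver is reported on a single line of comma-and-space separated key=value pairs, as in the sample. The function names parse_ping_line and needs_disk_check are hypothetical and are not part of QFS.

```python
def parse_ping_line(line: str) -> dict:
    """Split one chunkserver line of `qfsadmin ... ping` output into a dict.

    Assumes the "key=value, key=value, ..." layout shown in the sample;
    values such as tiers= contain ':' and ';' but no ", ".
    """
    fields = {}
    for item in line.strip().split(", "):
        key, sep, value = item.partition("=")
        if sep:  # skip fragments without '=' (assumed not to occur)
            fields[key] = value  # empty values (e.g. lostChunkDirs=) stay ""
    return fields


def needs_disk_check(fields: dict) -> bool:
    """Current alert rule: any non-zero ncorrupt triggers a disk check."""
    return int(fields.get("ncorrupt", "0")) != 0
```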
Our problem is that the ncorrupt counter doesn't reset to 0 after the disk issue is fixed. It only goes back to 0 when we restart the corresponding chunkserver. Without a restart, ncorrupt keeps reporting the same value.
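If the counter turns out to be cumulative by design, one possible workaround on the monitoring side is to alert on increases of ncorrupt instead of on its absolute value. The sketch below assumes that ncorrupt only grows while a chunkserver is running and that a drop in value means the chunkserver restarted and its counter started again from 0. The class name CorruptCounterTracker is hypothetical.

```python
class CorruptCounterTracker:
    """Alert on new corrupt chunks rather than on the raw ncorrupt value.

    Assumption: ncorrupt is cumulative per chunkserver process and is
    reset to 0 only when the chunkserver restarts.
    """

    def __init__(self) -> None:
        self._last = {}  # chunkserver id -> last observed ncorrupt

    def update(self, server: str, ncorrupt: int) -> int:
        """Record a sample and return how many corrupt chunks are new."""
        previous = self._last.get(server, 0)  # unseen server: baseline 0
        self._last[server] = ncorrupt
        if ncorrupt < previous:
            # Counter went down: assume a restart, so everything counted
            # now happened since the counter was reset to 0.
            return ncorrupt
        return ncorrupt - previous
```

A non-zero return value would then trigger the disk-check alert. One limitation is that if a chunkserver restarts and accumulates more corruptions than its previous value before the next poll, only the growth above the old value is reported.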
Is this a feature or a bug?
If this is intended, we'll need to handle it on our end, but I thought it was worth asking.