Avamar Server Won't Back Up
Closed     Case # 10055     Affiliated Job:  New Trier Township District 2031
Opened:  Tuesday, September 13, 2011     Closed:  Tuesday, September 13, 2011
Total Hit Count:  5404     Last Hit:  Wednesday, October 18, 2017 2:32:30 PM
Unique Hit Count:  2267     Last Unique Hit:  Wednesday, October 18, 2017 2:32:30 PM
Case Type(s):  Server, Vendor Support
Case Note(s):  All cases are posted for review purposes only. Any implementations should be performed at your own risk.

Problem:
Our Avamar grid appeared to stall over a weekend, although nothing on our network had changed. Scheduled jobs had been running for 30 hours straight; the 12:00 noon blackout window should have timed them out but did not. I cancelled these jobs and tested manual runs (both image-level and agent-level), which simply remained in a "waiting" status. I then allowed a scheduled job to begin, which it never did. On further review I noticed that "Server" - "Active Sessions" listed 19 sessions, even though nothing appeared to be running on the "Activity" screen. Later I also received a notice that a recent checkpoint had not run. I contacted EMC support.

Resolution:
It turned out there were stalled sessions in a "CLOSE_WAIT" state. The technician ran the following command in a PuTTY session to diagnose:

netstat -alp | grep CLOSE
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
tcp 1 0 avamar-u.nths.net:28001 avamar-vm-01.domain.local:50441 CLOSE_WAIT 19222/java
tcp 138 0 avamar-u.nths.net:28001 avamar-vm-02.domain.local:46636 CLOSE_WAIT 19222/java ...

... The output above shows the stalled "CLOSE_WAIT" connections; there were 38 of them in total.
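
To get a count instead of reading the list by eye, the check above could be sketched roughly as follows. This is a minimal sketch, not the command EMC support ran: the function name count_close_wait is hypothetical, and it assumes the Linux net-tools column layout shown in the session, where the sixth field is the TCP state. The sample text is the two lines quoted above.

```shell
# Count sockets in CLOSE_WAIT from netstat-style text on stdin.
# Assumption: Linux net-tools layout, where field 6 holds the TCP state.
count_close_wait() {
  awk '$1 ~ /^tcp/ && $6 == "CLOSE_WAIT" { n++ } END { print n + 0 }'
}

# Sample lines copied from the session above
sample='tcp 1 0 avamar-u.nths.net:28001 avamar-vm-01.domain.local:50441 CLOSE_WAIT 19222/java
tcp 138 0 avamar-u.nths.net:28001 avamar-vm-02.domain.local:46636 CLOSE_WAIT 19222/java'

printf '%s\n' "$sample" | count_close_wait
```

On a live server the same function could be fed from netstat directly, for example netstat -an | count_close_wait. Matching on the state column is slightly stricter than grep CLOSE, which would also match any other text containing "CLOSE".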

The technician then ran a manual checkpoint and, once it completed, restarted the MCS service and started the backup scheduler and maintenance activities again:

dpnctl stop mcs
dpnctl start mcs
dpnctl start sched
dpnctl start maint
dpnctl status

dpnctl: INFO: gsan status: ready
dpnctl: INFO: MCS status: up.
dpnctl: INFO: EMS status: up.
dpnctl: INFO: Backup scheduler status: up.
dpnctl: INFO: dtlt status: up.
dpnctl: INFO: axionfs status: up.
dpnctl: INFO: Maintenance windows scheduler status: enabled.
dpnctl: INFO: Maintenance cron jobs status: enabled.
dpnctl: INFO: Unattended startup status: disabled.
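
After a restart like this, the status output can be checked mechanically instead of by eye. The following is only a sketch under stated assumptions: the function name check_dpnctl_status is hypothetical, and the "healthy" strings are simply the lines that appeared in this case, which may differ on other Avamar versions.

```shell
# Check dpnctl-status-style text on stdin for the healthy lines seen in this case.
# Prints every expected line that is missing and returns 1 if any are missing.
# Assumption: the expected strings are copied from the output above.
check_dpnctl_status() {
  status_text=$(cat)
  missing=0
  for expected in \
    'gsan status: ready' \
    'MCS status: up.' \
    'Backup scheduler status: up.' \
    'Maintenance windows scheduler status: enabled.'
  do
    case "$status_text" in
      *"$expected"*) ;;                                # line present
      *) echo "MISSING: $expected"; missing=1 ;;       # line absent
    esac
  done
  return "$missing"
}

# Sample input: a subset of the status output shown above
healthy='dpnctl: INFO: gsan status: ready
dpnctl: INFO: MCS status: up.
dpnctl: INFO: Backup scheduler status: up.
dpnctl: INFO: Maintenance windows scheduler status: enabled.'

if printf '%s\n' "$healthy" | check_dpnctl_status; then
  echo "All expected components reported healthy"
fi
```

On the server itself the input would come from the real command, for example dpnctl status 2>&1 | check_dpnctl_status. The redirect is included because it is not confirmed here which stream dpnctl writes its INFO lines to.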


This resolved our issue; the next evening's backups proceeded without a hitch.


