last weekend was not a very good weekend for me. it sucked big time! i was on call last week. there were no pages at all during the weekdays, but on friday my boss told me that there would be a power outage in one of our DCs this weekend. they wanted to do some work on the power supply and would fail the power over to generators. my thinking was that i shouldn't be worried, since the power would still be there. but i was wrong!
on saturday i got paged as early as 7am. one by one, our servers were rebooting. i logged in to the office VPN and looked at our monitoring tool. shit! all of our servers had rebooted and a few of them were still down, including 4 out of our 6 cluster servers! i knew something was not right.

i connected to the console to check what was wrong with the servers. some of them had crashed and needed fsck, some of them kept on rebooting with root_mount_not_found, and there were some other weird errors that i had never encountered before.

my boss called and told me to go to the DC. he was coming too, of course. i was there till 10pm running fsck on the servers. we managed to recover all except 1 server, and continued on Sunday till 12pm. there goes my weekend...

but out of it i learned a lot of things, especially recovering root filesystems on solaris disk suite, veritas volume manager and veritas cluster server, as well as the pressure behind it when the big boss keeps asking when the systems will be back online.

what went wrong was that the generator failed!!!
fsck /dev/generator
p/s: in total i received over 100 pages
3 comments:
the weekend that you missed the webcam with your family
Haha.. really lousy.. i was working the morning shift that time... when i went home at 3pm/2am, Hou, Roxanne and Philip were still working. luckily the next day i went to, hehe, training.. saved :p
ohhhh when solaris runs veritas cluster and there's an unexpected shutdown... even run> ha start won't work... because the SAN or NAS isn't mounted properly.. ohhhh i've been hit by that before.. can't bear itttttt