big admin big daddy: solaris


Thursday, February 28, 2008

housekeeping core files

since i enabled coreadm on all my solaris servers, i keep getting alerts about full file systems, because the core files fill them up.

# coreadm
global core file pattern: /var/cores/%f.%n.%p.core
init core file pattern: core
global core dumps: enabled
per-process core dumps: enabled
global setid core dumps: enabled
per-process setid core dumps: enabled
global core dump logging: enabled
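
for reference, the tokens in the global pattern expand like this (per the coreadm man page): %f is the executable file name, %n is the system node name (uname -n) and %p is the process id. a tiny sketch of what the resulting file name looks like. core_name is a made-up helper and the sample values are only for illustration:

```shell
#!/bin/sh
# core_name FILE NODE PID
# mimics how the pattern /var/cores/%f.%n.%p.core gets expanded.
# hypothetical helper for illustration only, not part of coreadm.
core_name() {
    printf '/var/cores/%s.%s.%s.core\n' "$1" "$2" "$3"
}

# sample values, made up
core_name httpd web01 1234
# prints /var/cores/httpd.web01.1234.core
```

so every core file name ends in "core", which is what the cleanup relies on.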

so i came up with this one-liner. put it in the crontab. problem solved ;)

# cd /var/cores; find . -type f -name \*core ! -name \*.gz -mtime +1 -exec gzip {} \;

find . - search the current directory (/var/cores after the cd)
-type f - match only regular files, not directories or links
-name \*core - match only file names ending in core
! -name \*.gz - skip files that are already gzipped
-mtime +1 - match only files last modified more than a day ago
-exec gzip {} \; - gzip each file found
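
if you would rather keep it as a small script than a raw crontab line, here is a minimal sketch. the function name compress_old_cores, the optional directory argument and the script path in the crontab comment are my assumptions, not something from the setup above. one difference from the one-liner: the find only runs if the cd succeeds, so it should never gzip files in some other directory by accident.

```shell
#!/bin/sh
# compress_old_cores [DIR]
# gzip core files in DIR (default /var/cores) that were last modified
# more than a day ago. hypothetical wrapper around the one-liner.
compress_old_cores() {
    dir=${1:-/var/cores}
    # the subshell keeps the caller's working directory untouched;
    # && makes sure find only runs when the cd worked
    (
        cd "$dir" &&
        find . -type f -name '*core' ! -name '*.gz' -mtime +1 -exec gzip {} \;
    )
}

# example crontab entry (the path is an assumption), run daily at 02:00:
# 0 2 * * * /usr/local/bin/compress_cores.sh
```

calling compress_old_cores with no argument does the same job as the one-liner.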

see here for how to enable core dumps: http://docs.sun.com/app/docs/doc/820-0434/6nc63qo4r?a=view

Thursday, November 22, 2007

fsck /dev/oncall

last weekend was not a very good weekend for me. sucks big time! i was oncall last week. there was no page at all during the weekdays. but on friday my boss told me that there would be a power outage in one of our DCs that weekend. they wanted to do some work on the power supply & would fail the power over to generators. my thinking was, i should not be worried since the power would still be there. but i was wrong!

on saturday i got paged as early as 7am. one by one our servers were rebooting. i logged in to the office vpn and looked at our monitoring tool. shit! all of our servers had rebooted & a few of them were still down, including 4 of our 6 cluster servers! i knew something was not right.

i connected to the consoles to check what was wrong with the servers. some of them had crashed & needed fsck, some of them kept rebooting with root_mount_not_found and some other weird errors that i had never encountered before.

my boss called and told me to go to the DC. he was coming too, of course. i was there till 10pm running fsck on the servers. we managed to recover all except 1 server and continued on sunday till 12pm.
there goes my weekend...

but out of it i learned a lot of things, especially recovering root filesystems on solaris disk suite, veritas volume manager & veritas cluster server, as well as the pressure behind it when the big boss keeps asking when the systems will be back online.

what went wrong was the generator failed!!! fsck /dev/generator

p/s: in total i received over 100 pages

Thursday, May 25, 2006

equivalent ethtool for solaris

if you are looking for an ethtool equivalent for solaris, the answer is ndd
ndd - get and set driver configuration parameters

to get your network interface card (eg: hme0) info:

# ndd -get /dev/hme link_status
# ndd -get /dev/hme link_speed
# ndd -get /dev/hme link_mode

link_status
0 for Link Down
1 for Link up

link_speed
0 for 10 Mbps
1 for 100 Mbps

link_mode
0 for Half-Duplex mode
1 for Full-Duplex mode
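
since ndd only prints the raw 0 or 1, a small lookup helper can make the output readable. this is just a sketch based on the value tables above. decode_ndd is a made-up name, and the loop in the comment assumes an hme card like in the example.

```shell
#!/bin/sh
# decode_ndd PARAM VALUE
# translate a raw ndd value into words, using the tables above.
# hypothetical helper, not part of ndd itself.
decode_ndd() {
    case "$1:$2" in
        link_status:0) echo "link down" ;;
        link_status:1) echo "link up" ;;
        link_speed:0)  echo "10 Mbps" ;;
        link_speed:1)  echo "100 Mbps" ;;
        link_mode:0)   echo "half-duplex" ;;
        link_mode:1)   echo "full-duplex" ;;
        *)             echo "unknown ($1=$2)" ;;
    esac
}

# on a solaris box with an hme card (not run here):
# for p in link_status link_speed link_mode; do
#     echo "$p: $(decode_ndd "$p" "$(ndd -get /dev/hme "$p")")"
# done
```

anything outside the tables falls through to "unknown", so a gigabit driver that reports other values will not be mislabeled.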

and of course for more info:

# man ndd

change hostname - solaris vs linux

in order to change your machine's hostname, you will need to edit some files.

for solaris:
there are 6 files you need to edit:

/etc/hosts
/etc/nodename
/etc/hostname.network_interface (e.g hostname.hme0)
/etc/net/*/hosts (3 hosts files)
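
to double check that none of those files was missed, something like the sketch below can list the ones that still mention the old name. list_hostname_files and its optional root prefix are my own invention (the prefix is only there so it can be tried on a copy of /etc), and the match is a plain grep pattern, so a short name may also match inside longer words.

```shell
#!/bin/sh
# list_hostname_files OLDNAME [ROOT]
# print the solaris hostname files (from the list above) that still
# contain OLDNAME. ROOT is an optional prefix, empty means the live /etc.
# hypothetical helper for illustration.
list_hostname_files() {
    old=$1
    root=${2:-}
    for f in "$root"/etc/hosts "$root"/etc/nodename \
             "$root"/etc/hostname.* "$root"/etc/net/*/hosts; do
        # skip patterns that matched nothing and anything not a regular file
        [ -f "$f" ] || continue
        grep -l "$old" "$f"
    done
}

# usage (oldbox is a made-up hostname):
# list_hostname_files oldbox
```

an empty result after the edit means all of the listed files have been updated.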

but for linux (red hat style distros):
only 1 file

/etc/sysconfig/network
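
on red hat style systems that file carries a HOSTNAME= line, so the edit can be scripted. a minimal sketch, assuming the line already exists in the file. set_sysconfig_hostname is a made-up helper name and the hostnames in the usage comment are placeholders.

```shell
#!/bin/sh
# set_sysconfig_hostname FILE NEWNAME
# rewrite the HOSTNAME= line in FILE and leave every other line alone.
# hypothetical helper; does nothing useful if FILE has no HOSTNAME= line.
set_sysconfig_hostname() {
    file=$1
    new=$2
    tmp="$file.tmp.$$"
    # write to a temp file first, then copy back so the original file
    # keeps its owner and permissions
    sed "s/^HOSTNAME=.*/HOSTNAME=$new/" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    rm -f "$tmp"
}

# usage (run as root, names are placeholders):
# set_sysconfig_hostname /etc/sysconfig/network newbox.example.com
```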

for a temporary hostname (i'm not sure why you would need it temporary), you can use the command:

hostname new_hostname

but if you want it permanent, you need to reboot the system after changing all the necessary files.