Sharing

Thursday, August 29, 2013

DNS lookup for Mail Exchange

It turns out the host command can do more than reverse-lookup the hostname of an IP address.

https://kb.mediatemple.net/questions/791/DNS+Explained

$ host -t MX mydomain.com
mydomain.com mail is handled by 10 inboundmx.mydomain.com.
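
If you want to use this result in a script, the output line is easy to pick apart. Below is a minimal sketch that parses text in the "mail is handled by" format shown above; the function name parse_mx is my own, and the sample string is just the output from this post.

```python
import re

# Matches one line of `host -t MX` output, e.g.
# "mydomain.com mail is handled by 10 inboundmx.mydomain.com."
MX_LINE = re.compile(r"^(\S+) mail is handled by (\d+) (\S+)$")


def parse_mx(output):
    """Return (preference, mail_server) pairs sorted by preference."""
    records = []
    for line in output.splitlines():
        match = MX_LINE.match(line.strip())
        if match:
            # Drop the trailing dot of the fully qualified name.
            records.append((int(match.group(2)), match.group(3).rstrip(".")))
    return sorted(records)


sample = "mydomain.com mail is handled by 10 inboundmx.mydomain.com.\n"
print(parse_mx(sample))
```

The lowest preference number comes first, which is the server that mail delivery tries first.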


Tuesday, August 27, 2013

NMON for Linux

I recently tried out another new tool called NMON.

http://nmon.sourceforge.net/pmwiki.php

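Just type nmon to enter an interactive interface that shows most of the system's status in real time; it is a very handy tool for system administrators. In addition, it periodically records data to /var/log/nmon. If a problem occurs and the system hangs so badly that you have to reboot on the spot, you can still use this data afterwards to see what was actually happening at the time.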

http://www.ibm.com/developerworks/aix/library/au-nmon_analyser/
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power%20Systems/page/nmon_analyser

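nmon analyser is a tool built on Excel. Just feed the .nmon files into it and it does the analysis for you, so you can quickly find out what the CPU was doing at the time, which process used the most memory, and so on. Very convenient.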
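
If Excel is not at hand, a quick look at the data is possible with a few lines of Python. The sketch below assumes the comma-separated layout I have seen in .nmon files, where the first CPU_ALL row names the columns and later CPU_ALL rows carry a snapshot id such as T0001; the sample lines are made up for illustration and the function name is my own.

```python
import csv
import io


def cpu_busy_by_snapshot(nmon_text):
    """Map snapshot id (e.g. 'T0001') to User% + Sys% from CPU_ALL rows."""
    header = None
    busy = {}
    for row in csv.reader(io.StringIO(nmon_text)):
        if not row or row[0] != "CPU_ALL":
            continue
        if header is None:
            header = row  # assumption: first CPU_ALL row names the columns
            continue
        values = dict(zip(header[2:], row[2:]))
        busy[row[1]] = float(values["User%"]) + float(values["Sys%"])
    return busy


# Hypothetical sample lines, not taken from a real capture.
sample = (
    "CPU_ALL,CPU Total myhost,User%,Sys%,Wait%,Idle%\n"
    "ZZZZ,T0001,10:00:00,27-AUG-2013\n"
    "CPU_ALL,T0001,12.5,3.5,1.0,83.0\n"
    "CPU_ALL,T0002,40.0,10.0,5.0,45.0\n"
)
print(cpu_busy_by_snapshot(sample))
```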


Sunday, August 25, 2013

How to know if running in virtual machine


There is a handy little tool for this:

$ sudo apt-get install virt-what


# Inside VirtualBox
$ sudo virt-what
virtualbox

# Inside VMware
$ sudo virt-what
vmware

# On a physical machine there is no output at all
$ sudo virt-what
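
The "no output means physical machine" rule is easy to get wrong in a script, so here is a small sketch of how the output could be interpreted. The function name describe_virt_what is hypothetical; it only looks at text you pass in and does not run the tool itself.

```python
def describe_virt_what(output):
    """Interpret the text printed by `virt-what` (one fact per line)."""
    facts = [line.strip() for line in output.splitlines() if line.strip()]
    if not facts:
        # virt-what prints nothing when it detects no virtualization.
        return "physical machine (no output)"
    return "virtual machine: " + ", ".join(facts)


print(describe_virt_what("virtualbox\n"))
print(describe_virt_what(""))
```

To feed it real data you could capture the command with subprocess.run(["virt-what"], capture_output=True, text=True) while running as root.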

Monday, August 19, 2013

Duplicate definition found in icinga/nagios

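I have recently been using icinga to monitor our systems. I want every service definition to be reused as much as possible, and I want a newly added node to need as few settings as possible while still getting the full set of monitoring checks. So I rely heavily on Object Inheritance.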
http://docs.icinga.org/latest/en/objectinheritance.html

There is also an article that explains how to combine Object Inheritance with hostgroups:
http://www.standalone-sysadmin.com/blog/2009/07/nagios-config/

Inheriting a service from a hostgroup

The most common case is to attach a service to a hostgroup. As soon as a new host joins the hostgroup, the monitoring service is applied to that host automatically.

define hostgroup {
        hostgroup_name  linux-servers
        alias           linux-servers
}

define service {
        use                             generic-service
        hostgroup_name                  linux-servers
        service_description             PING
        check_command                   check_ping!200.0,20%!500.0,60%
}

define host {
        use             generic-server
        host_name       test-server1
        hostgroups      linux-servers
        address         192.168.100.100
}


Overriding a hostgroup service for a single host

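Sometimes, though, a newly added host should not reuse the parameters of the original service; for example, the first argument of check_ping should be 100.0 instead.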

define host {
        use             generic-server
        host_name       test-server2
        hostgroups      linux-servers
        address         192.168.100.100
}

define service {
        use                        generic-service
        host_name                  test-server2
        service_description        PING
        check_command              check_ping!100.0,20%!500.0,60%
}

In this case, although icinga prints a "Warning: Duplicate definition found for service" message, the single-host definition overrides the hostgroup definition, so it is not a problem.


Overriding a hostgroup service for another hostgroup (FAIL)

Recently, however, I ran into a strange case where the override did not work. My configuration was:

define host {
        use             generic-server
        host_name       web-server1
        hostgroups      linux-servers, web-servers
        address         192.168.100.100
}

define host {
        use             generic-server
        host_name       web-server2
        hostgroups      linux-servers, web-servers
        address         192.168.100.101
}

define service {
        use                        generic-service
        hostgroup_name             web-servers
        service_description        PING
        check_command              check_ping!100.0,20%!500.0,60%
}


I wanted the first ping argument to be 200.0 for ordinary servers but 100.0 for hosts in web-servers.
The result was odd: only one of the two web servers picked up the new argument.

$ grep PING -A 2 -B 1 /var/cache/icinga/objects.cache 
        host_name       web-server1
        service_description     PING
        check_command   check_ping!100.0,20%!500.0,60%
--
        host_name       web-server2
        service_description     PING
        check_command   check_ping!200.0,20%!500.0,60%
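
Grepping works for two hosts, but with more hosts it helps to pull the values into a table. The following is a rough sketch that assumes objects.cache consists of "define service {" blocks with one whitespace-separated key and value per line, as the grep output above suggests; the function name and the sample text are mine.

```python
def commands_by_host(cache_text, description="PING"):
    """Map host_name -> check_command for one service_description."""
    result = {}
    block = None
    for raw in cache_text.splitlines():
        line = raw.strip()
        if line.split()[:2] == ["define", "service"]:
            block = {}                      # start of a service block
        elif line == "}" and block is not None:
            if block.get("service_description") == description:
                result[block.get("host_name")] = block.get("check_command")
            block = None                    # end of the block
        elif block is not None and line:
            parts = line.split(None, 1)     # key, then the rest of the line
            if len(parts) == 2:
                block[parts[0]] = parts[1]
    return result


sample = """define service {
        host_name       web-server1
        service_description     PING
        check_command   check_ping!100.0,20%!500.0,60%
        }
define service {
        host_name       web-server2
        service_description     PING
        check_command   check_ping!200.0,20%!500.0,60%
        }
"""
for host, command in sorted(commands_by_host(sample).items()):
    print(host, command)
```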


Icinga source code Analysis

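So I became curious about how icinga actually handles duplicate definitions. I had assumed that configuration in a subdirectory would always override configuration in its parent directory, but that does not seem to be the case. After digging for quite a while, I ended up reading the source code.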

parse file order


In icinga-core/xdata/xodtemplate.c, config files are loaded in DFS (depth-first) order:
/* process all files in a specific config directory (abridged excerpt) */
int xodtemplate_process_config_dir(char *dirname, int options) {
 /* ... opening the directory is omitted here ... */

 /* process all files in the directory... */
 while ((dirfile = readdir(dirp)) != NULL) {
  /* ... building the path in file and calling stat() is omitted here ... */

  switch (stat_buf.st_mode & S_IFMT) {

  case S_IFREG:
   /* process the config file */
   result = xodtemplate_process_config_file(file, options);
   break;

  case S_IFDIR:
   /* recurse into subdirectories... */
   result = xodtemplate_process_config_dir(file, options);
   break;

  default:
   /* everything else we ignore */
   break;
  }
 }
 /* ... cleanup and return are omitted here ... */
}

Interestingly, readdir does not return entries in sorted order, so the same file layout can produce different results on different machines.
http://www.wretch.cc/blog/awaysu/24060729
http://stackoverflow.com/questions/8977441/does-readdir-guarantee-an-order

root@ops-buildmonitor1:/etc/icinga/conf.d# ls -fl
total 72
-rw-r--r-- 1 root root    3515 Aug 16 18:05 common-commands.cfg
-rw-r--r-- 1 root root    1630 Jun 14 15:15 common-timeperiods.cfg
-rw-r--r-- 1 root root    1514 Aug 14 09:11 common-hostgroups.cfg
-rw-r----- 1 root nagios  3075 Jun 14 15:15 common-contacts.cfg
-rw-r----- 1 root nagios 10596 Aug 19 06:47 common-services.cfg
drwxr-xr-x 5 root root    4096 Aug 19 08:18 .
drwxr-xr-x 2 root root    4096 Aug 19 08:20 hosts
drwxr-xr-x 3 root root    4096 Aug 19 04:46 safesync
-rw-r--r-- 1 root root    5043 Aug 16 05:23 services.cfg
drwxr-xr-x 8 root root    4096 Aug 19 04:46 ..
-rw-r--r-- 1 root root    2242 Jun 14 15:15 generic.cfg
-rw-r--r-- 1 root root     221 Jun 14 15:15 smokeping-services.cfg
drwxr-xr-x 2 root root    4096 Jul 31 02:57 eventhandlers
-rw-r--r-- 1 root root    6122 Aug 13 13:06 common-hosts.cfg
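
To illustrate the difference between raw directory order and a stable order, here is a small Python sketch. It is not what icinga does; it only shows that sorting the names yourself gives the same order on every machine. The function name and the temporary files are made up for the demo.

```python
import os
import tempfile


def config_files_in_stable_order(dirname):
    """List *.cfg files sorted by name instead of raw directory order."""
    names = [entry.name for entry in os.scandir(dirname)
             if entry.is_file() and entry.name.endswith(".cfg")]
    # os.scandir, like readdir, makes no promise about ordering.
    return sorted(names)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("services.cfg", "common-hosts.cfg", "generic.cfg"):
        open(os.path.join(tmp, name), "w").close()
    print(config_files_in_stable_order(tmp))
```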


Service Object Generation


From xdata/xodtemplate.c. Below are my notes; they are a bit messy, so I hope they make sense. If not, just skip to the conclusion.
xodtemplate_read_config_data
       xodtemplate_process_config_dir
       xodtemplate_process_config_file
             xodtemplate_add_object_property
                 case XODTEMPLATE_SERVICE:
                        #register service into xodtemplate_service_list
                        xod_begin_def(service);

        xodtemplate_duplicate_services
        1) expand hostgroup and host
            temp_memberlist = xodtemplate_expand_hostgroups_and_hosts(temp_service->hostgroup_name, temp_service->host_name, temp_service->_config_file, temp_service->_start_line);
        2) add into xodtemplate_service_list
             a) first member, use old memory space
                 /* if this is the first duplication, use the existing entry */
             b) other member, use new memory space and add into xodtemplate_service_list at tail
                 result = xodtemplate_duplicate_service(temp_service, this_memberlist->name1);

                 ex: service A (group A) -> service B (group B)
                       => service A (A1) -> service B(B1) -> service A (A2) -> service A (A3) -> service B(B2)

        3) create xobject_skiplist
              a) move single host service into xobject_skiplist and check duplication
              b) move hostgroup service into xobject_skiplist and check duplication
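
Step 2 of the notes is the part that surprised me most, so here is a toy Python model of it. It is only a sketch of my reading of the notes, not a port of the C code: each definition keeps its original slot for its first member, and every further member is appended at the tail of the list. The function name is hypothetical and each definition is assumed to have at least one member.

```python
def expand_services(definitions):
    """definitions: list of (service_name, [member, ...]) in parse order.

    Returns the flat list of (service_name, member) after expansion.
    """
    # One slot per "define service", in the order the files were parsed.
    flat = [[name, None] for name, _ in definitions]
    for index, (name, members) in enumerate(definitions):
        flat[index][1] = members[0]          # first member reuses the slot
        for member in members[1:]:
            flat.append([name, member])      # the others go to the tail
    return [tuple(item) for item in flat]


order = expand_services([("A", ["A1", "A2", "A3"]), ("B", ["B1", "B2"])])
print(" -> ".join(f"service {name} ({member})" for name, member in order))
```

The result is the interleaved order from the example in the notes, which is why two hostgroup-level definitions of the same service do not end up in a predictable "later file wins" order.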

Conclusion

  • Service for single host vs single host
    • In the same file, the later one wins.
    • In different files, it depends on the file loading order, which is unsafe to rely on: it may change when the files are modified.
  • Service for single host vs hostgroup
    • The single-host configuration wins; file loading order makes no difference.
  • Service for hostgroup vs hostgroup
    • Neither wins; the configuration becomes a mess no matter which file is loaded first.
  • Do not put two hosts in the same "define". Define them separately; see the Tricky Note for why.

Tricky Note


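If you expect a single definition to override the hostgroup setting for two hosts at once, you will be disappointed; personally I think it is a bug.
In the code, the second host is treated as if it came from a hostgroup, so it turns into a hostgroup vs hostgroup fight and everything gets messed up.
So the workaround is to define them separately: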
define hostgroup {
        hostgroup_name  linux-servers
        alias           linux-servers
}

define service {
        use                             generic-service
        hostgroup_name                  linux-servers
        service_description             PING
        check_command                   check_ping!200.0,20%!500.0,60%
}

define host {
        use             generic-server
        host_name       test-server1
        hostgroups      linux-servers
        address         192.168.100.100
}

define host {
        use             generic-server
        host_name       test-server2
        hostgroups      linux-servers
        address         192.168.100.101
}

#### will make a mess ##################################################
#define service {
#        use                        generic-service
#        host_name                  test-server1,test-server2
#        service_description        PING
#        check_command              check_ping!100.0,20%!500.0,60%
#}
#

#### define them separately, inconvenient but works #####################
define service {
        use                        generic-service
        host_name                  test-server1
        service_description        PING
        check_command              check_ping!100.0,20%!500.0,60%
}

define service {
        use                        generic-service
        host_name                  test-server2
        service_description        PING
        check_command              check_ping!100.0,20%!500.0,60%
}




Custom object variable

A more complicated approach is to use custom object variables to differentiate hosts:

http://docs.icinga.org/latest/en/customobjectvars.html
http://docs.icinga.org/latest/en/objectinheritance.html#objectinheritance-customobjectvariables

define hostgroup {
        hostgroup_name  linux-servers
        alias           linux-servers
}

define service {
        use                             generic-service
        hostgroup_name                  linux-servers
        service_description             PING
        check_command                   check_ping!$_HOSTPINGPARA$,20%!500.0,60%
}

define host {
        name            generic-linux-server
        hostgroups      linux-servers
        register        0
        _pingpara       200
}

define host {
        use             generic-linux-server
        host_name       test-server1
        hostgroups      linux-servers
        address         192.168.100.100
        _pingpara       100
}

define host {
        use             generic-server
        host_name       test-server2
        hostgroups      linux-servers
        address         192.168.100.101
        _pingpara       100
}
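
To make the mechanism concrete: a host variable named _pingpara is referenced in a command as $_HOSTPINGPARA$, and a value set on the host itself takes precedence over one inherited from a template. The Python sketch below models only that lookup. The dictionaries, the function names, and the extra host test-server3 are hypothetical and exist just for the demo.

```python
import re


def resolve_host_vars(host, templates):
    """Merge custom variables along the `use` chain; the host's own win."""
    chain = []
    current = host
    while current is not None:
        chain.append(current)
        current = templates.get(current.get("use"))
    merged = {}
    for obj in reversed(chain):              # templates first, host last
        merged.update({k: v for k, v in obj.items() if k.startswith("_")})
    return merged


def expand_macros(command, custom_vars):
    """Replace $_HOST<NAME>$ with the value of custom variable _<name>."""
    def lookup(match):
        # Unknown macros are left untouched.
        return custom_vars.get("_" + match.group(1).lower(), match.group(0))
    return re.sub(r"\$_HOST([A-Z0-9_]+)\$", lookup, command)


templates = {"generic-linux-server": {"_pingpara": "200"}}
command = "check_ping!$_HOSTPINGPARA$,20%!500.0,60%"
overriding = {"use": "generic-linux-server", "host_name": "test-server1",
              "_pingpara": "100"}
inheriting = {"use": "generic-linux-server", "host_name": "test-server3"}

print(expand_macros(command, resolve_host_vars(overriding, templates)))
print(expand_macros(command, resolve_host_vars(inheriting, templates)))
```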


Thursday, August 15, 2013

Wednesday, August 14, 2013

LSISAS1068E SAS Controller

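Today an HDD failed on one of the machines I manage, so the hardware RAID started rebuilding automatically. Out of curiosity I went in to check the RAID card model, and it turned out to be something I did not recognize; megacli could not operate it either. After some searching, I found it is a rather old LSI SAS chip, and the attached drives are SATA.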


$ lspci | grep LSI
01:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)


http://www.lsi.com/products/storagecomponents/Pages/LSISAS1068E.aspx
* 8-port SAS/SATA Controller
* Supports 1.5 and 3Gb/s SAS and SATA data transfer rates per port, full duplex
* Integrated RAID support (Only RAID0 and RAID1)


http://hwraid.le-vert.net/wiki/LSIFusionMPT
http://docs.oracle.com/cd/E19121-01/sf.x4100/819-1157-23/F_BIOS_RAID.html#0_77640
The first tool, mpt-status, gives a quick view of the current RAID state. Here the state is "DEGRADED"; normally it would be "OPTIMAL".

$ mpt-status 
ioc0 vol_id 0 type IME, 8 phy, 1862 GB, state DEGRADED, flags ENABLED RESYNC_IN_PROGRESS
ioc0 phy 9 scsi_id 9 ATA      ST9500530NS      DA03, 465 GB, state ONLINE, flags NONE
ioc0 phy 8 scsi_id 1 ATA      ST9500530NS      DA03, 465 GB, state ONLINE, flags NONE
ioc0 phy 7 scsi_id 2 ATA      ST9500530NS      DA03, 465 GB, state ONLINE, flags NONE
ioc0 phy 0 scsi_id 8 ATA      ST9500530NS      DA03, 465 GB, state ONLINE, flags OUT_OF_SYNC
ioc0 phy 5 scsi_id 3 ATA      ST9500530NS      DA03, 465 GB, state ONLINE, flags NONE
ioc0 phy 4 scsi_id 5 ATA      ST9500530NS      DA03, 465 GB, state ONLINE, flags NONE
ioc0 phy 3 scsi_id 4 ATA      ST9500530NS      DA03, 465 GB, state ONLINE, flags NONE
ioc0 phy 2 scsi_id 6 ATA      ST9500530NS      DA03, 465 GB, state ONLINE, flags NONE

For more information you can use lsiutil, which works in interactive mode.

$ lsiutil

LSI Logic MPT Configuration Utility, Version 1.62, January 14, 2009

1 MPT Port found

     Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
 1.  /proc/mpt/ioc0    LSI Logic SAS1068E B3     105      011a0000     0

Select a device:  [1-1 or 0 to quit] 1

 1.  Identify firmware, BIOS, and/or FCode
 2.  Download firmware (update the FLASH)
 4.  Download/erase BIOS and/or FCode (update the FLASH)
 8.  Scan for devices
10.  Change IOC settings (interrupt coalescing)
13.  Change SAS IO Unit settings
16.  Display attached devices
20.  Diagnostics
21.  RAID actions
22.  Reset bus
23.  Reset target
42.  Display operating system names for devices
45.  Concatenate SAS firmware and NVDATA files
59.  Dump PCI config space
60.  Show non-default settings
61.  Restore default settings
66.  Show SAS discovery errors
69.  Show board manufacturing information
97.  Reset SAS link, HARD RESET
98.  Reset SAS link
99.  Reset port
 e   Enable expert mode in menus
 p   Enable paged mode
 w   Enable logging

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 21

 1.  Show volumes
 2.  Show physical disks
 3.  Get volume state
 4.  Wait for volume resync to complete
23.  Replace physical disk
26.  Disable drive firmware update mode
27.  Enable drive firmware update mode
30.  Create volume
31.  Delete volume
32.  Change volume settings
33.  Change volume name
50.  Create hot spare
51.  Delete hot spare
99.  Reset port
 e   Enable expert mode in menus
 p   Enable paged mode
 w   Enable logging

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 3

Volume 0 is Bus 0 Target 0, Type IME (Integrated Mirroring Extended)

Volume 0 State:  degraded, enabled, resync in progress
Resync Progress:  total blocks 976562176, blocks remaining 55607808, 5%


Finally, you can see that a resync is in progress, with about 5% left to go.
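
The percentage is simply blocks remaining divided by total blocks. A quick check with the two numbers from the lsiutil output above (the function name is mine):

```python
def resync_remaining_percent(total_blocks, blocks_remaining):
    """Share of the volume that still has to be resynced, in percent."""
    return 100.0 * blocks_remaining / total_blocks


remaining = resync_remaining_percent(976562176, 55607808)
print(f"{remaining:.1f}% remaining, {100 - remaining:.1f}% done")
```

lsiutil appears to show this value without the fractional part, which is why it reads 5%.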

For the next generation of products, see this link:
http://hwraid.le-vert.net/wiki/LSIFusionMPTSAS2
Here you can see which vendors use these SAS chips:
https://wiki.debian.org/LinuxRaidForAdmins

Tuesday, August 13, 2013

Netstat notes


Check the LISTEN port of TCP

$ sudo netstat -plnt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:3306          0.0.0.0:*               LISTEN      1199/mysqld     
tcp        0      0 0.0.0.0:139             0.0.0.0:*               LISTEN      953/smbd        
tcp        0      0 0.0.0.0:110             0.0.0.0:*               LISTEN      1191/dovecot    
tcp        0      0 0.0.0.0:143             0.0.0.0:*               LISTEN      1191/dovecot    
tcp        0      0 127.0.0.1:80            0.0.0.0:*               LISTEN      18249/nginx     
tcp        0      0 10.1.192.180:22         0.0.0.0:*               LISTEN      32423/sshd      
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      2140/master     
tcp        0      0 10.1.192.180:443        0.0.0.0:*               LISTEN      18249/nginx     
tcp        0      0 0.0.0.0:445             0.0.0.0:*               LISTEN      953/smbd        
tcp        0      0 0.0.0.0:993             0.0.0.0:*               LISTEN      1191/dovecot    
tcp        0      0 10.1.192.180:5666       0.0.0.0:*               LISTEN      22129/nrpe      
tcp        0      0 0.0.0.0:995             0.0.0.0:*               LISTEN      1191/dovecot    
tcp6       0      0 :::139                  :::*                    LISTEN      953/smbd        
tcp6       0      0 :::110                  :::*                    LISTEN      1191/dovecot    
tcp6       0      0 :::143                  :::*                    LISTEN      1191/dovecot    
tcp6       0      0 ::1:25                  :::*                    LISTEN      2140/master     
tcp6       0      0 :::445                  :::*                    LISTEN      953/smbd        
tcp6       0      0 :::993                  :::*                    LISTEN      1191/dovecot    
tcp6       0      0 :::995                  :::*                    LISTEN      1191/dovecot    

List all current connections
$ netstat -n -a  | grep 8080
tcp        0      0 10.42.92.34:8080        0.0.0.0:*               LISTEN     
tcp        0      0 127.0.0.1:58080         0.0.0.0:*               LISTEN     
tcp        0      0 10.42.92.34:8080        10.42.92.8:51923        ESTABLISHED
tcp        0      0 10.42.92.34:8080        10.42.92.8:1896         ESTABLISHED
tcp        0      0 10.42.92.34:8080        10.42.92.8:63793        ESTABLISHED
tcp        0      0 10.42.92.34:8080        10.42.92.8:30276        ESTABLISHED
tcp        0      0 10.42.92.34:8080        10.42.92.8:47118        TIME_WAIT  
tcp        0      0 10.42.92.34:8080        10.42.92.8:29695        TIME_WAIT  
tcp        0      0 10.42.92.34:8080        10.42.92.8:4093         TIME_WAIT  
tcp        0      0 10.42.92.34:8080        10.42.92.8:37087        TIME_WAIT  
tcp        0      0 10.42.92.34:8080        10.42.92.201:50008      TIME_WAIT  
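
Note that a plain grep for 8080 also matches port 58080, as the second line of the output shows. If you want exact matching and a count per state, something like the following sketch could help. It assumes the six-column layout of netstat -n -a shown above; the function name and the shortened sample are mine.

```python
from collections import Counter


def states_for_local_port(netstat_output, port):
    """Count TCP states for connections whose local port equals `port`."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        # Local Address is the 4th column; the port follows the last colon.
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port == str(port):
            counts[fields[5]] += 1
    return counts


sample = """\
tcp        0      0 10.42.92.34:8080        0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:58080         0.0.0.0:*               LISTEN
tcp        0      0 10.42.92.34:8080        10.42.92.8:51923        ESTABLISHED
tcp        0      0 10.42.92.34:8080        10.42.92.8:47118        TIME_WAIT
"""
print(dict(states_for_local_port(sample, 8080)))
```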

Look up a host name
$ host <ip>
$ nslookup <ip>
$ dig -x <ip>
$ arp -a | grep <ip>
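
A reverse lookup such as dig -x queries a PTR record under a special name built from the IP address. As a small illustration, Python's standard ipaddress module can show that name; the helper name ptr_name is my own, and the address is just the example one used earlier on this page.

```python
import ipaddress


def ptr_name(ip):
    """Return the DNS name that a reverse (PTR) lookup for `ip` queries."""
    return ipaddress.ip_address(ip).reverse_pointer


print(ptr_name("192.168.100.100"))
```

For an actual lookup from Python, socket.gethostbyaddr(ip) asks the system resolver, much like the commands above.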