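Recently I have been using Icinga to monitor our systems. I want every service to be as reusable as possible, and I want newly added nodes to need as few settings as possible while still getting the full set of monitoring checks automatically. So I rely heavily on object inheritance: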
http://docs.icinga.org/latest/en/objectinheritance.html
This article also explains how to combine object inheritance with hostgroups:
http://www.standalone-sysadmin.com/blog/2009/07/nagios-config/
Inheriting services from a hostgroup
The most common pattern is to attach a service to a hostgroup: as soon as a new host joins the hostgroup, the monitoring service is applied to that host automatically.
define hostgroup {
hostgroup_name linux-servers
alias linux-servers
}
define service {
use generic-service
hostgroup_name linux-servers
service_description PING
check_command check_ping!200.0,20%!500.0,60%
}
define host {
use generic-server
host_name test-server1
hostgroups linux-servers
address 192.168.100.100
}
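To make the inheritance concrete, here is a minimal C sketch of what attaching one service to a hostgroup amounts to: the single definition is copied onto every member host. The structs and functions (struct hostgroup, expand_hostgroup_service, find_service) are hypothetical and far simpler than Icinga's internal objects; this models the behavior, it is not Icinga's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified objects; not Icinga's internal structs. */
#define MAX_MEMBERS 8

struct hostgroup {
    const char *name;
    const char *members[MAX_MEMBERS];
    int n_members;
};

struct service {
    const char *host_name;
    const char *description;
    const char *check_command;
};

/* Copy one hostgroup service definition onto every member host.
 * Writes at most `cap` entries into `out` and returns the number written. */
static int expand_hostgroup_service(const struct hostgroup *hg,
                                    const char *description,
                                    const char *check_command,
                                    struct service *out, int cap)
{
    assert(hg != NULL && out != NULL);
    int n = 0;
    for (int i = 0; i < hg->n_members && n < cap; i++) {
        out[n].host_name = hg->members[i];
        out[n].description = description;
        out[n].check_command = check_command;
        n++;
    }
    return n;
}

/* Return the expanded service for `host`, or NULL if it is not a member. */
static const struct service *find_service(const struct service *svcs, int n,
                                          const char *host)
{
    for (int i = 0; i < n; i++)
        if (strcmp(svcs[i].host_name, host) == 0)
            return &svcs[i];
    return NULL;
}
```

In this model, adding a host to linux-servers only grows the member list; the PING definition itself never changes, which is why a new node needs almost no configuration of its own.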
Overriding a hostgroup service for a single host
Sometimes, however, a newly added host should not reuse the hostgroup service's parameters, for example when the first check_ping argument should be 100.0 instead:
define host {
use generic-server
host_name test-server2
hostgroups linux-servers
address 192.168.100.100
}
define service {
use generic-service
host_name test-server2
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
In this case Icinga prints "Warning: Duplicate definition found for service", but the single-host definition overrides the hostgroup definition, so it works as intended.
Overriding a hostgroup service for another hostgroup (FAIL)
Recently, though, I ran into a strange case where the override did not work. My configuration was:
define host {
use generic-server
host_name web-server1
hostgroups linux-servers, web-servers
address 192.168.100.100
}
define host {
use generic-server
host_name web-server2
hostgroups linux-servers, web-servers
address 192.168.100.101
}
define service {
use generic-service
hostgroup_name web-servers
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
I want ordinary servers to use 200.0 as the first ping argument, but servers in web-servers to use 100.0.
The result was odd: only one of the two web servers picked up the new parameter:
$ grep PING -A 2 -B 1 /var/cache/icinga/objects.cache
host_name web-server1
service_description PING
check_command check_ping!100.0,20%!500.0,60%
--
host_name web-server2
service_description PING
check_command check_ping!200.0,20%!500.0,60%
Icinga source code analysis
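So I got curious about how Icinga actually handles duplicate definitions. I had assumed that a definition in a subdirectory always overrides one in its parent directory, but that turned out not to be the case. After a long time digging, I went to the source code.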
Config file parse order
In icinga-core/xdata/xodtemplate.c, config files are loaded in depth-first (DFS) order:
/* process all files in a specific config directory */
int xodtemplate_process_config_dir(char *dirname, int options) {
	/* ... (abridged) ... */

	/* process all files in the directory... */
	while ((dirfile = readdir(dirp)) != NULL) {
		/* ... (abridged: build the path in `file` and stat() it) ... */
		switch (stat_buf.st_mode & S_IFMT) {
		case S_IFREG:
			/* process the config file */
			result = xodtemplate_process_config_file(file, options);
			break;
		case S_IFDIR:
			/* recurse into subdirectories... */
			result = xodtemplate_process_config_dir(file, options);
			break;
		default:
			/* everything else we ignore */
			break;
		}
	}

	/* ... (abridged) ... */
}
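Interestingly, readdir() does not return entries in any guaranteed order, so the same file layout can be loaded in a different order on a different machine: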
http://www.wretch.cc/blog/awaysu/24060729
http://stackoverflow.com/questions/8977441/does-readdir-guarantee-an-order
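For example, ls -f lists entries in the raw order returned by readdir, without sorting; note that . and .. appear in the middle:
root@ops-buildmonitor1:/etc/icinga/conf.d# ls -fl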
total 72
-rw-r--r-- 1 root root 3515 Aug 16 18:05 common-commands.cfg
-rw-r--r-- 1 root root 1630 Jun 14 15:15 common-timeperiods.cfg
-rw-r--r-- 1 root root 1514 Aug 14 09:11 common-hostgroups.cfg
-rw-r----- 1 root nagios 3075 Jun 14 15:15 common-contacts.cfg
-rw-r----- 1 root nagios 10596 Aug 19 06:47 common-services.cfg
drwxr-xr-x 5 root root 4096 Aug 19 08:18 .
drwxr-xr-x 2 root root 4096 Aug 19 08:20 hosts
drwxr-xr-x 3 root root 4096 Aug 19 04:46 safesync
-rw-r--r-- 1 root root 5043 Aug 16 05:23 services.cfg
drwxr-xr-x 8 root root 4096 Aug 19 04:46 ..
-rw-r--r-- 1 root root 2242 Jun 14 15:15 generic.cfg
-rw-r--r-- 1 root root 221 Jun 14 15:15 smokeping-services.cfg
drwxr-xr-x 2 root root 4096 Jul 31 02:57 eventhandlers
-rw-r--r-- 1 root root 6122 Aug 13 13:06 common-hosts.cfg
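Icinga's loop above uses plain readdir(), so the visiting order is whatever the filesystem returns. If you want a reproducible order, for example to compare two machines, here is a minimal sketch of the same depth-first walk using POSIX scandir() with alphasort(). This is not Icinga's code; the callback type cfg_visitor, the helper record_name, and the rules of skipping dot-files and visiting only names ending in .cfg are assumptions of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Called once per config file, in the order the walk visits them. */
typedef void (*cfg_visitor)(const char *path, void *ctx);

/* Assumption of this sketch: only names ending in ".cfg" are config files. */
static int has_cfg_suffix(const char *name)
{
    size_t len = strlen(name);
    return len > 4 && strcmp(name + len - 4, ".cfg") == 0;
}

/* Depth-first walk like the excerpt above, but scandir() + alphasort()
 * sort each directory's entries, so the order no longer depends on readdir().
 * Returns 0 on success, -1 if some directory could not be read. */
static int walk_cfg_dir_sorted(const char *dirname, cfg_visitor visit, void *ctx)
{
    struct dirent **entries;
    int n = scandir(dirname, &entries, NULL, alphasort);
    if (n < 0)
        return -1;

    int rc = 0;
    for (int i = 0; i < n; i++) {
        const char *name = entries[i]->d_name;
        char path[4096];
        struct stat st;

        /* skip ".", ".." and other dot-files */
        if (name[0] != '.') {
            snprintf(path, sizeof path, "%s/%s", dirname, name);
            if (stat(path, &st) == 0) {
                if (S_ISREG(st.st_mode) && has_cfg_suffix(name))
                    visit(path, ctx);
                else if (S_ISDIR(st.st_mode) &&
                         walk_cfg_dir_sorted(path, visit, ctx) != 0)
                    rc = -1;
            }
        }
        free(entries[i]);
    }
    free(entries);
    return rc;
}

/* Example visitor: append the file's base name and a space to a buffer. */
struct name_buf {
    char data[1024];
};

static void record_name(const char *path, void *ctx)
{
    struct name_buf *buf = ctx;
    const char *base = strrchr(path, '/');
    base = (base != NULL) ? base + 1 : path;
    strncat(buf->data, base, sizeof buf->data - strlen(buf->data) - 1);
    strncat(buf->data, " ", sizeof buf->data - strlen(buf->data) - 1);
}
```

Sorting only makes the file order reproducible; as the conclusion below notes, a hostgroup vs. hostgroup conflict stays messy no matter which file is loaded first.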
Service Object Generation
From xdata/xodtemplate.c. Below are my notes; they are a bit messy. If they are hard to follow, just skip to the conclusion.
xodtemplate_read_config_data
xodtemplate_process_config_dir
xodtemplate_process_config_file
xodtemplate_add_object_property
case XODTEMPLATE_SERVICE:
# register the service in xodtemplate_service_list
xod_begin_def(service);
xodtemplate_duplicate_services
1) Expand the hostgroups and hosts into a member list
      temp_memberlist = xodtemplate_expand_hostgroups_and_hosts(temp_service->hostgroup_name, temp_service->host_name, temp_service->_config_file, temp_service->_start_line);
2) Add each member to xodtemplate_service_list
   a) First member: reuse the existing entry (same memory)
      /* if this is the first duplication, use the existing entry */
   b) Other members: allocate a new entry and append it to the tail of xodtemplate_service_list
      result = xodtemplate_duplicate_service(temp_service, this_memberlist->name1);
   ex: service A (hostgroup A, members A1, A2, A3) -> service B (hostgroup B, members B1, B2)
   => service A (A1) -> service B (B1) -> service A (A2) -> service A (A3) -> service B (B2)
3) Build xobject_skiplist
   a) First insert the services defined for single hosts into xobject_skiplist, checking for duplicates
   b) Then insert the services that came from hostgroups into xobject_skiplist, checking for duplicates
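The ordering rule in step 2 can be modeled with a small linked list. The sketch below (hypothetical struct svc, duplicate_services, list_order; not Icinga's data structures) only reproduces what the notes describe: the first member reuses the existing entry, and every other member is appended at the tail.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified service entry: the definition it came from,
 * the host it ended up on, and its not-yet-expanded member list. */
struct svc {
    char def[8];
    char host[16];
    const char *const *members;   /* NULL-terminated; NULL once expanded */
    struct svc *next;
};

static struct svc *new_svc(const char *def, const char *host,
                           const char *const *members)
{
    struct svc *s = calloc(1, sizeof *s);
    assert(s != NULL);
    strncpy(s->def, def, sizeof s->def - 1);
    if (host != NULL)
        strncpy(s->host, host, sizeof s->host - 1);
    s->members = members;
    return s;
}

/* Step 2: the first member reuses the existing entry,
 * every other member gets a new entry appended at the tail of the list. */
static void duplicate_services(struct svc *head)
{
    struct svc *tail = head;
    while (tail != NULL && tail->next != NULL)
        tail = tail->next;

    for (struct svc *s = head; s != NULL; s = s->next) {
        const char *const *m = s->members;
        if (m == NULL || m[0] == NULL)
            continue;                                 /* nothing to expand */
        strncpy(s->host, m[0], sizeof s->host - 1);   /* a) reuse entry */
        for (int i = 1; m[i] != NULL; i++) {          /* b) append copies */
            tail->next = new_svc(s->def, m[i], NULL);
            tail = tail->next;
        }
        s->members = NULL;
    }
}

/* Write "def(host) " for each entry, in list order, into out. */
static void list_order(const struct svc *s, char *out, size_t cap)
{
    size_t len = 0;
    out[0] = '\0';
    for (; s != NULL && len < cap; s = s->next) {
        int w = snprintf(out + len, cap - len, "%s(%s) ", s->def, s->host);
        if (w < 0)
            break;
        len += (size_t)w;
    }
}

static void free_list(struct svc *s)
{
    while (s != NULL) {
        struct svc *next = s->next;
        free(s);
        s = next;
    }
}
```

Starting from A (A1, A2, A3) followed by B (B1, B2), list_order yields "A(A1) B(B1) A(A2) A(A3) B(B2) ", the same order as the example in the notes.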
Conclusion
- Service for a single host vs. the same single host
  - In the same file, the later definition wins.
  - In different files, it depends on the file loading order, which is very unsafe: the order may change when a file is modified.
- Service for a single host vs. a hostgroup
  - The single-host definition wins; the file loading order makes no difference.
- Service for a hostgroup vs. a hostgroup
  - Neither reliably wins; the configuration becomes a mess no matter which file is loaded first.
- Do not put two hosts in the same "define"; define them separately. See the tricky note for why.
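The single-host vs. hostgroup rule follows from step 3 of the notes, assuming (as the duplicate warning suggests) that inserting an already registered host/service key is rejected, so whichever entry is inserted first is kept. Below is a minimal C sketch of that two-pass registration; the flat table stands in for xobject_skiplist, and all names (struct svc_entry, from_hostgroup, register_services, lookup_command) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified expanded service: one host per entry. */
struct svc_entry {
    const char *host;
    const char *desc;
    const char *command;
    bool from_hostgroup;   /* produced by hostgroup expansion? */
};

#define MAX_REGISTERED 32

/* Stand-in for xobject_skiplist: inserting an existing (host, description)
 * key is rejected, so the first entry inserted is the one that is kept. */
struct registry {
    const struct svc_entry *items[MAX_REGISTERED];
    int n;
};

static bool registry_insert(struct registry *r, const struct svc_entry *e)
{
    for (int i = 0; i < r->n; i++) {
        if (strcmp(r->items[i]->host, e->host) == 0 &&
            strcmp(r->items[i]->desc, e->desc) == 0) {
            /* modeled on the Icinga warning; not its exact format */
            fprintf(stderr, "Warning: Duplicate definition found for service '%s' on host '%s'\n",
                    e->desc, e->host);
            return false;
        }
    }
    if (r->n == MAX_REGISTERED)
        return false;
    r->items[r->n++] = e;
    return true;
}

/* Step 3: single-host entries first, hostgroup entries second. */
static void register_services(struct registry *r,
                              const struct svc_entry *list, int n)
{
    for (int pass = 0; pass < 2; pass++) {
        bool want_hostgroup = (pass == 1);
        for (int i = 0; i < n; i++)
            if (list[i].from_hostgroup == want_hostgroup)
                (void)registry_insert(r, &list[i]);
    }
}

static const char *lookup_command(const struct registry *r,
                                  const char *host, const char *desc)
{
    for (int i = 0; i < r->n; i++)
        if (strcmp(r->items[i]->host, host) == 0 &&
            strcmp(r->items[i]->desc, desc) == 0)
            return r->items[i]->command;
    return NULL;
}
```

Because single-host entries are always inserted in the first pass, they win no matter where they sit in the list. When both entries are marked from_hostgroup, the second pass keeps whichever comes first in the list, so in this model the outcome depends on the expansion order from step 2 rather than on which hostgroup was meant to be more specific.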
Tricky Note
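If you expect a single definition to override the hostgroup settings for two hosts at once, you will probably be disappointed; personally I think this is a bug.
In the code, the second host is treated as if it came from a hostgroup, so it turns into a hostgroup vs. hostgroup conflict and everything gets messy.
The workaround is to define them separately: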
define hostgroup {
hostgroup_name linux-servers
alias linux-servers
}
define service {
use generic-service
hostgroup_name linux-servers
service_description PING
check_command check_ping!200.0,20%!500.0,60%
}
define host {
use generic-server
host_name test-server1
hostgroups linux-servers
address 192.168.100.100
}
define host {
use generic-server
host_name test-server2
hostgroups linux-servers
address 192.168.100.101
}
#### will make a mess ##################################################
#define service {
# use generic-service
# host_name test-server1,test-server2
# service_description PING
# check_command check_ping!100.0,20%!500.0,60%
#}
#
#### define them separately: inconvenient, but works ##################
define service {
use generic-service
host_name test-server1
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service {
use generic-service
host_name test-server2
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
Custom object variables
A more elaborate approach is to use custom object variables to give individual hosts different values:
http://docs.icinga.org/latest/en/customobjectvars.html
http://docs.icinga.org/latest/en/objectinheritance.html#objectinheritance-customobjectvariables
define hostgroup {
hostgroup_name linux-servers
alias linux-servers
}
define service {
use generic-service
hostgroup_name linux-servers
service_description PING
check_command check_ping!$_HOSTPINGPARA$,20%!500.0,60%
}
define host {
use generic-server
name generic-linux-server
hostgroups linux-servers
register 0
_pingpara 200
}
define host {
use generic-linux-server
host_name test-server1
hostgroups linux-servers
address 192.168.100.100
_pingpara 100
}
define host {
use generic-server
host_name test-server2
hostgroups linux-servers
address 192.168.100.101
_pingpara 100
}
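Roughly, $_HOSTPINGPARA$ is replaced by the host's _pingpara custom variable, looked up on the host first and then on the template it uses; custom variable names are case-insensitive. The C sketch below models only that lookup and substitution. The struct host layout and the function names are hypothetical, and the sketch handles only $_HOST...$ macros: any other text, including other macros, is copied verbatim.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_VARS 4

/* Hypothetical, simplified host object: custom variables plus the template
 * it inherits from via "use" (NULL if none). Not Icinga's data structures. */
struct host {
    const char *name;
    const struct host *use;
    const char *var_names[MAX_VARS];   /* e.g. "_pingpara" */
    const char *var_values[MAX_VARS];  /* e.g. "100" */
    int n_vars;
};

/* Custom variable names are compared case-insensitively. */
static bool same_ignore_case(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++)
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return false;
    return *a == *b;
}

/* Look the variable up on the host first, then along its template chain. */
static const char *custom_var(const struct host *h, const char *name)
{
    for (; h != NULL; h = h->use)
        for (int i = 0; i < h->n_vars; i++)
            if (same_ignore_case(h->var_names[i], name))
                return h->var_values[i];
    return NULL;
}

/* Copy cmd into out, replacing each $_HOST<VAR>$ with the host's _<VAR>
 * custom variable (an empty string if it is not defined). */
static void expand_host_macros(const struct host *h, const char *cmd,
                               char *out, size_t cap)
{
    size_t len = 0;
    out[0] = '\0';
    while (*cmd != '\0' && len + 1 < cap) {
        const char *end = NULL;
        if (strncmp(cmd, "$_HOST", 6) == 0)
            end = strchr(cmd + 6, '$');
        if (end != NULL) {
            char var[64];
            snprintf(var, sizeof var, "_%.*s", (int)(end - (cmd + 6)), cmd + 6);
            const char *val = custom_var(h, var);
            int w = snprintf(out + len, cap - len, "%s", val ? val : "");
            len += (w > 0) ? (size_t)w : 0;
            if (len >= cap)
                len = cap - 1;   /* output was truncated */
            cmd = end + 1;
        } else {
            out[len++] = *cmd++;
            out[len] = '\0';
        }
    }
}
```

With the template default of 200, a host that inherits from generic-linux-server without setting _pingpara gets check_ping!200,20%!500.0,60%, while test-server1 gets check_ping!100,20%!500.0,60%. In this sketch a misspelled variable on a host without the template simply expands to an empty string, which would produce a broken check command.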