Sharing


Tuesday, October 2, 2012

OpenStack Folsom - Installation of Cinder with Ceph

OpenStack Folsom Release

OpenStack Folsom was officially released a few days ago.
Release Software Site
http://www.openstack.org/software/folsom/
Release Note
http://wiki.openstack.org/ReleaseNotes/Folsom
Architecture
http://ken.pepple.info/openstack/2012/09/25/openstack-folsom-architecture/

This release adds two new projects: Quantum and Cinder. Quantum integrates Open vSwitch into OpenStack, shoring up what used to be a weak spot in network virtualization. Cinder splits the old nova-volume out into a standalone module; I suspect the reason is that many storage vendors (e.g. Nexenta, NetApp) want to integrate with OpenStack, so breaking this API out into its own project makes those integrations go more smoothly.

PPA of OpenStack testing
root@ubuntu12:~$ apt-get install -y python-software-properties
root@ubuntu12:~$ add-apt-repository ppa:openstack-ubuntu-testing/folsom-trunk-testing
root@ubuntu12:~$ add-apt-repository ppa:openstack-ubuntu-testing/folsom-deps-staging
root@ubuntu12:~$ apt-get update && apt-get -y dist-upgrade


PPA of Ubuntu Cloud
root@ubuntu12:~$ add-apt-repository ppa:ubuntu-cloud-archive/folsom-staging
You are about to add the following PPA to your system:

 More info: https://launchpad.net/~ubuntu-cloud-archive/+archive/folsom-staging
Press [ENTER] to continue or ctrl-c to cancel adding it

gpg: keyring `/tmp/tmpdzKHU_/secring.gpg' created
gpg: keyring `/tmp/tmpdzKHU_/pubring.gpg' created
gpg: requesting key 9F68104E from hkp server keyserver.ubuntu.com
gpg: /tmp/tmpdzKHU_/trustdb.gpg: trustdb created
gpg: key 9F68104E: public key "Launchpad PPA for Ubuntu Cloud Archive Team" imported
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)
OK

The OpenStack PPA is updated faster than the Ubuntu Cloud one, so the steps below use the OpenStack PPA.

Installation

This time I started with Cinder, aiming to integrate it with Ceph. I originally wanted to install Cinder by itself, but after studying the Cinder source code I found it does not currently support a noauth mode, so Keystone must be installed as well. Overall, the important components are: Cinder + Keystone + MySQL + Ceph.

First, install the basic packages: MySQL and RabbitMQ.

MySQL Installation

root@ubuntu12:~$ apt-get -y install mysql-server python-mysqldb
root@ubuntu12:~$ sed -i 's/127.0.0.1/0.0.0.0/g' /etc/mysql/my.cnf
root@ubuntu12:~$ service mysql restart
root@ubuntu12:~$ mysql -u root -ppassword
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 36
Server version: 5.5.24-0ubuntu0.12.04.1 (Ubuntu)

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE DATABASE cinder;
Query OK, 1 row affected (0.01 sec)

mysql> GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'%' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON cinder.* TO 'cinder'@'localhost' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE DATABASE keystone;
Query OK, 1 row affected (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON keystone.* TO 'keystone'@'%' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON keystone.* TO 'keystone'@'localhost' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)


RabbitMQ Installation

root@ubuntu12:~$ sudo apt-get -y install rabbitmq-server
root@ubuntu12:~$ rabbitmqctl change_password guest password
Changing password for user "guest" ...
...done.

Keystone Installation

root@ubuntu12:~$ apt-get -y install keystone python-keystone python-keystoneclient
root@ubuntu12:~$ dpkg -l | grep keystone
ii  keystone                                        2012.2+git201209252030~precise-0ubuntu1     OpenStack identity service - Daemons
ii  python-keystone                                 2012.2+git201209252030~precise-0ubuntu1     OpenStack identity service - Python library
ii  python-keystoneclient                           1:0.1.3.19+git201210011900~precise-0ubuntu1 Client libary for Openstack Keystone API

# Edit keystone.conf and set admin_token to a random value
root@ubuntu12:~$ cat /etc/keystone/keystone.conf
[DEFAULT]
# A "shared secret" between keystone and other openstack services
admin_token = password

# The IP address of the network interface to listen on
bind_host = 0.0.0.0

# The port number which the public service listens on
public_port = 5000

# The port number which the public admin listens on
admin_port = 35357

# The port number which the OpenStack Compute service listens on
compute_port = 8774

# === Logging Options ===
# Print debugging output
verbose = True

# Print more verbose output
# (includes plaintext request logging, potentially including passwords)
debug = True

...

[sql]
# The SQLAlchemy connection string used to connect to the database
# connection = sqlite:////var/lib/keystone/keystone.db
connection = mysql://keystone:password@localhost:3306/keystone

# the timeout before idle sql connections are reaped
idle_timeout = 200
...

root@ubuntu12:~$ service keystone restart
root@ubuntu12:~$ keystone-manage db_sync
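
The comment above says to pick a random admin_token; here is a small sketch (not from the original post) that generates one with openssl and patches it in, with the config path passed as a parameter:

```shell
# Sketch: set a fresh random admin_token in a keystone.conf-style file.
# $1 = path to the config file; prints the new token on stdout.
set_admin_token() {
    tok=$(openssl rand -hex 16)
    sed -i "s/^admin_token = .*/admin_token = $tok/" "$1"
    echo "$tok"
}

# usage:
# set_admin_token /etc/keystone/keystone.conf
```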

# Create a file defining some environment variables we'll need later, and source it from .bashrc so you don't have to set them again next login
root@ubuntu12:~# cat novarc
export OS_TENANT_NAME=admin
export OS_USERNAME=admin
export OS_PASSWORD=password
export OS_AUTH_URL="http://localhost:5000/v2.0/"
export SERVICE_ENDPOINT="http://localhost:35357/v2.0"
export SERVICE_TOKEN=password
root@ubuntu12:~$ source novarc
root@ubuntu12:~$ echo "source novarc">>.bashrc

Next, download two ready-made scripts from the net. Their default token is also "password"; if you want to use a different token, remember to edit the scripts.
root@ubuntu12:~$ wget https://raw.github.com/EmilienM/openstack-folsom-guide/master/scripts/keystone-data.sh
root@ubuntu12:~$ wget https://raw.github.com/EmilienM/openstack-folsom-guide/master/scripts/keystone-endpoints.sh
root@ubuntu12:~$ chmod a+x *.sh
# Remember to change MASTER in keystone-endpoints.sh to your own IP
root@ubuntu12:~$ less keystone-endpoints.sh
....
# other definitions
MASTER="172.17.123.13"
....
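
Rather than editing by hand, the two tweaks can be scripted. This sketch assumes the scripts keep the defaults shown above ("password" as the token, a MASTER= line for the IP); verify with less first:

```shell
# Sketch: patch a downloaded setup script in place.
# $1 = script file, $2 = new token, $3 = new IP
patch_script() {
    sed -i "s/password/$2/g" "$1"
    sed -i "s/^MASTER=.*/MASTER=\"$3\"/" "$1"
}

# usage (placeholder values):
# patch_script keystone-endpoints.sh mysecret 172.17.123.13
```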

root@ubuntu12:~$ ./keystone-data.sh
root@ubuntu12:~$ ./keystone-endpoints.sh
+-------------+----------------------------------+
|   Property  |              Value               |
+-------------+----------------------------------+
| description |    OpenStack Compute Service     |
|      id     | 597fff05550043efb530ab05fa85d818 |
|     name    |               nova               |
|     type    |             compute              |
+-------------+----------------------------------+
+-------------+----------------------------------+
|   Property  |              Value               |
+-------------+----------------------------------+
| description |     OpenStack Volume Service     |
|      id     | 27e3539fea104d159fcc7ec9766ac8b3 |
|     name    |              cinder              |
|     type    |              volume              |
+-------------+----------------------------------+
+-------------+----------------------------------+
|   Property  |              Value               |
+-------------+----------------------------------+
| description |     OpenStack Image Service      |
|      id     | b7047156ef81464b8c6754fc7994ecea |
|     name    |              glance              |
|     type    |              image               |
+-------------+----------------------------------+
+-------------+----------------------------------+
|   Property  |              Value               |
+-------------+----------------------------------+
| description |    OpenStack Storage Service     |
|      id     | 86e08436f65944be8ab0e23657a9d3e2 |
|     name    |              swift               |
|     type    |           object-store           |
+-------------+----------------------------------+
+-------------+----------------------------------+
|   Property  |              Value               |
+-------------+----------------------------------+
| description |        OpenStack Identity        |
|      id     | 4ec3ab138f2a4bd2800b5a4a5d407ef1 |
|     name    |             keystone             |
|     type    |             identity             |
+-------------+----------------------------------+
+-------------+----------------------------------+
|   Property  |              Value               |
+-------------+----------------------------------+
| description |      OpenStack EC2 service       |
|      id     | 15bb5a5a0c3d446e8b3bfa293df3b10e |
|     name    |               ec2                |
|     type    |               ec2                |
+-------------+----------------------------------+
+-------------+----------------------------------+
|   Property  |              Value               |
+-------------+----------------------------------+
| description |   OpenStack Networking service   |
|      id     | cacfce49cb2141a1ac48b3e31cef5c01 |
|     name    |             quantum              |
|     type    |             network              |
+-------------+----------------------------------+
+-------------+--------------------------------------------+
|   Property  |                   Value                    |
+-------------+--------------------------------------------+
|   adminurl  | http://172.17.123.13:8774/v2/$(tenant_id)s |
|      id     |      090f849da5a94dcb817aa340b39eb83c      |
| internalurl | http://172.17.123.13:8774/v2/$(tenant_id)s |
|  publicurl  | http://172.17.123.13:8774/v2/$(tenant_id)s |
|    region   |                 RegionOne                  |
|  service_id |      597fff05550043efb530ab05fa85d818      |
+-------------+--------------------------------------------+
+-------------+--------------------------------------------+
|   Property  |                   Value                    |
+-------------+--------------------------------------------+
|   adminurl  | http://172.17.123.13:8776/v1/$(tenant_id)s |
|      id     |      6363131b18974040a3dd1276ddc2c72e      |
| internalurl | http://172.17.123.13:8776/v1/$(tenant_id)s |
|  publicurl  | http://172.17.123.13:8776/v1/$(tenant_id)s |
|    region   |                 RegionOne                  |
|  service_id |      27e3539fea104d159fcc7ec9766ac8b3      |
+-------------+--------------------------------------------+
+-------------+----------------------------------+
|   Property  |              Value               |
+-------------+----------------------------------+
|   adminurl  |   http://172.17.123.13:9292/v2   |
|      id     | 1db94d180acb4d64adab45c3116812cb |
| internalurl |   http://172.17.123.13:9292/v2   |
|  publicurl  |   http://172.17.123.13:9292/v2   |
|    region   |            RegionOne             |
|  service_id | b7047156ef81464b8c6754fc7994ecea |
+-------------+----------------------------------+
+-------------+-------------------------------------------------+
|   Property  |                      Value                      |
+-------------+-------------------------------------------------+
|   adminurl  |           http://172.17.123.13:8080/v1          |
|      id     |         b9006b5f4fc64e4d854c171ea157b7b4        |
| internalurl | http://172.17.123.13:8080/v1/AUTH_$(tenant_id)s |
|  publicurl  | http://172.17.123.13:8080/v1/AUTH_$(tenant_id)s |
|    region   |                    RegionOne                    |
|  service_id |         86e08436f65944be8ab0e23657a9d3e2        |
+-------------+-------------------------------------------------+
+-------------+----------------------------------+
|   Property  |              Value               |
+-------------+----------------------------------+
|   adminurl  | http://172.17.123.13:35357/v2.0  |
|      id     | faf6ca350f4842799b3801d0b7571a59 |
| internalurl |  http://172.17.123.13:5000/v2.0  |
|  publicurl  |  http://172.17.123.13:5000/v2.0  |
|    region   |            RegionOne             |
|  service_id | 4ec3ab138f2a4bd2800b5a4a5d407ef1 |
+-------------+----------------------------------+
+-------------+------------------------------------------+
|   Property  |                  Value                   |
+-------------+------------------------------------------+
|   adminurl  | http://172.17.123.13:8773/services/Admin |
|      id     |     da745295e7ef419682d45516456047c5     |
| internalurl | http://172.17.123.13:8773/services/Cloud |
|  publicurl  | http://172.17.123.13:8773/services/Cloud |
|    region   |                RegionOne                 |
|  service_id |     15bb5a5a0c3d446e8b3bfa293df3b10e     |
+-------------+------------------------------------------+
+-------------+----------------------------------+
|   Property  |              Value               |
+-------------+----------------------------------+
|   adminurl  |    http://172.17.123.13:9696/    |
|      id     | 8b570092f8614cba98fe227de6e65e27 |
| internalurl |    http://172.17.123.13:9696/    |
|  publicurl  |    http://172.17.123.13:9696/    |
|    region   |            RegionOne             |
|  service_id | cacfce49cb2141a1ac48b3e31cef5c01 |
+-------------+----------------------------------+

Once everything is installed, verify it:

root@ubuntu12:~$ keystone endpoint-list
+----------------------------------+-----------+-------------------------------------------------+-------------------------------------------------+--------------------------------------------+
|                id                |   region  |                    publicurl                    |                   internalurl                   |                  adminurl                  |
+----------------------------------+-----------+-------------------------------------------------+-------------------------------------------------+--------------------------------------------+
| 090f849da5a94dcb817aa340b39eb83c | RegionOne |    http://172.17.123.13:8774/v2/$(tenant_id)s   |    http://172.17.123.13:8774/v2/$(tenant_id)s   | http://172.17.123.13:8774/v2/$(tenant_id)s |
| 1db94d180acb4d64adab45c3116812cb | RegionOne |           http://172.17.123.13:9292/v2          |           http://172.17.123.13:9292/v2          |        http://172.17.123.13:9292/v2        |
| 6363131b18974040a3dd1276ddc2c72e | RegionOne |    http://172.17.123.13:8776/v1/$(tenant_id)s   |    http://172.17.123.13:8776/v1/$(tenant_id)s   | http://172.17.123.13:8776/v1/$(tenant_id)s |
| 8b570092f8614cba98fe227de6e65e27 | RegionOne |            http://172.17.123.13:9696/           |            http://172.17.123.13:9696/           |         http://172.17.123.13:9696/         |
| b9006b5f4fc64e4d854c171ea157b7b4 | RegionOne | http://172.17.123.13:8080/v1/AUTH_$(tenant_id)s | http://172.17.123.13:8080/v1/AUTH_$(tenant_id)s |        http://172.17.123.13:8080/v1        |
| da745295e7ef419682d45516456047c5 | RegionOne |     http://172.17.123.13:8773/services/Cloud    |     http://172.17.123.13:8773/services/Cloud    |  http://172.17.123.13:8773/services/Admin  |
| faf6ca350f4842799b3801d0b7571a59 | RegionOne |          http://172.17.123.13:5000/v2.0         |          http://172.17.123.13:5000/v2.0         |      http://172.17.123.13:35357/v2.0       |
+----------------------------------+-----------+-------------------------------------------------+-------------------------------------------------+--------------------------------------------+
root@ubuntu12:~$ sudo apt-get install -y curl openssl
root@ubuntu12:~$ curl -d '{"auth": {"tenantName": "admin", "passwordCredentials":{"username": "admin", "password": "password"}}}' -H "Content-type: application/json" http://172.17.123.13:35357/v2.0/tokens | python -mjson.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2946    0  2844  100   102  18698    670 --:--:-- --:--:-- --:--:-- 18834
{
    "access": {
        "metadata": {
            "is_admin": 0,
            "roles": [
                "5660593decab401bbc8f13aa8b19dc23",
                "833b1617cb404466ba8546c8194f8ad6",
                "7ec43588e3a14b69ae2278334bee1423"
            ]
        },
        "serviceCatalog": [
            {
                "endpoints": [
                    {
                        "adminURL": "http://172.17.123.13:8774/v2/4b310104ee3345fd988fe16dd2f1f79d",
                        "id": "090f849da5a94dcb817aa340b39eb83c",
                        "internalURL": "http://172.17.123.13:8774/v2/4b310104ee3345fd988fe16dd2f1f79d",
                        "publicURL": "http://172.17.123.13:8774/v2/4b310104ee3345fd988fe16dd2f1f79d",
                        "region": "RegionOne"
                    }
                ],
                "endpoints_links": [],
                "name": "nova",
                "type": "compute"
            },
            {
                "endpoints": [
                    {
                        "adminURL": "http://172.17.123.13:9696/",
                        "id": "8b570092f8614cba98fe227de6e65e27",
                        "internalURL": "http://172.17.123.13:9696/",
                        "publicURL": "http://172.17.123.13:9696/",
                        "region": "RegionOne"
                    }
                ],
                "endpoints_links": [],
                "name": "quantum",
                "type": "network"
            },
            {
                "endpoints": [
                    {
                        "adminURL": "http://172.17.123.13:9292/v2",
                        "id": "1db94d180acb4d64adab45c3116812cb",
                        "internalURL": "http://172.17.123.13:9292/v2",
                        "publicURL": "http://172.17.123.13:9292/v2",
                        "region": "RegionOne"
                    }
                ],
                "endpoints_links": [],
                "name": "glance",
                "type": "image"
            },
            {
                "endpoints": [
                    {
                        "adminURL": "http://172.17.123.13:8776/v1/4b310104ee3345fd988fe16dd2f1f79d",
                        "id": "6363131b18974040a3dd1276ddc2c72e",
                        "internalURL": "http://172.17.123.13:8776/v1/4b310104ee3345fd988fe16dd2f1f79d",
                        "publicURL": "http://172.17.123.13:8776/v1/4b310104ee3345fd988fe16dd2f1f79d",
                        "region": "RegionOne"
                    }
                ],
                "endpoints_links": [],
                "name": "cinder",
                "type": "volume"
            },
            {
                "endpoints": [
                    {
                        "adminURL": "http://172.17.123.13:8773/services/Admin",
                        "id": "da745295e7ef419682d45516456047c5",
                        "internalURL": "http://172.17.123.13:8773/services/Cloud",
                        "publicURL": "http://172.17.123.13:8773/services/Cloud",
                        "region": "RegionOne"
                    }
                ],
                "endpoints_links": [],
                "name": "ec2",
                "type": "ec2"
            },
            {
                "endpoints": [
                    {
                        "adminURL": "http://172.17.123.13:8080/v1",
                        "id": "b9006b5f4fc64e4d854c171ea157b7b4",
                        "internalURL": "http://172.17.123.13:8080/v1/AUTH_4b310104ee3345fd988fe16dd2f1f79d",
                        "publicURL": "http://172.17.123.13:8080/v1/AUTH_4b310104ee3345fd988fe16dd2f1f79d",
                        "region": "RegionOne"
                    }
                ],
                "endpoints_links": [],
                "name": "swift",
                "type": "object-store"
            },
            {
                "endpoints": [
                    {
                        "adminURL": "http://172.17.123.13:35357/v2.0",
                        "id": "faf6ca350f4842799b3801d0b7571a59",
                        "internalURL": "http://172.17.123.13:5000/v2.0",
                        "publicURL": "http://172.17.123.13:5000/v2.0",
                        "region": "RegionOne"
                    }
                ],
                "endpoints_links": [],
                "name": "keystone",
                "type": "identity"
            }
        ],
        "token": {
            "expires": "2012-10-03T06:55:20Z",
            "id": "5f2797bdf8fd4380bfe919a05b01772e",
            "tenant": {
                "description": null,
                "enabled": true,
                "id": "4b310104ee3345fd988fe16dd2f1f79d",
                "name": "admin"
            }
        },
        "user": {
            "id": "c7f17dfd798242fc9065afd2ea251a6d",
            "name": "admin",
            "roles": [
                {
                    "name": "admin"
                },
                {
                    "name": "KeystoneAdmin"
                },
                {
                    "name": "KeystoneServiceAdmin"
                }
            ],
            "roles_links": [],
            "username": "admin"
        }
    }
}
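
For scripting, the interesting part of this response is access.token.id. A small helper sketch (python3 here; on 12.04 the binary would just be python) to extract it from a response piped on stdin:

```shell
# Sketch: extract access.token.id from a Keystone v2.0 token response on stdin.
extract_token() {
    python3 -c 'import sys, json; print(json.load(sys.stdin)["access"]["token"]["id"])'
}

# usage:
# curl -s -d "$AUTH_JSON" -H "Content-type: application/json" \
#     http://172.17.123.13:35357/v2.0/tokens | extract_token
```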


Ceph Installation

root@ubuntu12:~$ wget -q -O - https://raw.github.com/ceph/ceph/master/keys/release.asc | apt-key add -
OK
# Manually add a ceph.list under /etc/apt/sources.list.d
root@ubuntu12:/etc/apt/sources.list.d$ cat ceph.list
deb http://ceph.newdream.net/debian/ precise main
deb-src http://ceph.newdream.net/debian/ precise main
root@ubuntu12:~$ apt-get update
root@ubuntu12:~$ apt-get install -y ceph python-ceph
root@ubuntu12:~$ dpkg -l | grep ceph
ii  ceph                                            0.48.2argonaut-1precise                     distributed storage and file system
ii  ceph-common                                     0.48.2argonaut-1precise                     common utilities to mount and interact with a ceph storage cluster
ii  ceph-fs-common                                  0.48.2argonaut-1precise                     common utilities to mount and interact with a ceph file system
ii  ceph-fuse                                       0.48.2argonaut-1precise                     FUSE-based client for the Ceph distributed file system
ii  ceph-mds                                        0.48.2argonaut-1precise                     metadata server for the ceph distributed file system
ii  libcephfs1                                      0.48.2argonaut-1precise                     Ceph distributed file system client library
ii  python-ceph                                     0.48.2argonaut-1precise                     Python libraries for the Ceph distributed filesystem

# Once installed, copy your Ceph cluster's config file into /etc/ceph and it should work out of the box
# For how to set up the Ceph cluster itself, see the official Ceph site
root@ubuntu12:~$ ceph -s
   health HEALTH_OK
   monmap e1: 3 mons at {wistor-003=172.17.123.92:6789/0,wistor-006=172.17.123.94:6789/0,wistor-007=172.17.123.95:6789/0}, election epoch 10, quorum 0,1,2 wistor-003,wistor-006,wistor-007
   osdmap e24: 23 osds: 23 up, 23 in
    pgmap v2242: 4416 pgs: 4416 active+clean; 8362 MB data, 156 GB used, 19850 GB / 21077 GB avail
   mdsmap e1: 0/0/1 up
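
When automating the setup, it can help to gate later steps on cluster health. A minimal sketch (not from the original post) that succeeds only when the status command reports HEALTH_OK; the command is a parameter so it is easy to stub:

```shell
# Sketch: return success only when the given status command reports HEALTH_OK.
ceph_healthy() {
    $1 | grep -q HEALTH_OK
}

# usage:
# ceph_healthy "ceph -s" && echo "cluster ready"
```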


If you just want to integrate with an ordinary iSCSI server, see the official docs; roughly, you hand a block device (e.g. /dev/sdb) over to LVM, and remember to create a volume group named cinder-volumes:
http://docs.openstack.org/trunk/openstack-compute/install/apt/content/osfolubuntu-cinder.html

Cinder Installation

root@ubuntu12:~$ apt-get install -y cinder-api cinder-scheduler cinder-volume iscsitarget open-iscsi iscsitarget-dkms python-cinderclient tgt
root@ubuntu12:~$ dpkg -l | grep cinder
ii  cinder-api                                      2012.2+git201209252100~precise-0ubuntu1            Cinder storage service - api server
ii  cinder-common                                   2012.2+git201209252100~precise-0ubuntu1            Cinder starage service - common files
ii  cinder-scheduler                                2012.2+git201209252100~precise-0ubuntu1            Cinder storage service - api server
ii  cinder-volume                                   2012.2+git201209252100~precise-0ubuntu1            Cinder storage service - api server
ii  python-cinder                                   2012.2+git201209252100~precise-0ubuntu1            Cinder python libraries
ii  python-cinderclient                             1:0.2.26+git201209201100~precise-0ubuntu1          python bindings to the OpenStack Volume API

Edit /etc/cinder/api-paste.ini:
[filter:authtoken]
paste.filter_factory = keystone.middleware.auth_token:filter_factory
service_protocol = http
service_host = 127.0.0.1
service_port = 5000
auth_host = 127.0.0.1
auth_port = 35357
auth_protocol = http
# Change these three lines
admin_tenant_name = service
admin_user = cinder
admin_password = password

Edit /etc/cinder/cinder.conf to use MySQL and the RBD driver, and update the RabbitMQ password:
[DEFAULT]
rootwrap_config = /etc/cinder/rootwrap.conf
api_paste_config = /etc/cinder/api-paste.ini
iscsi_helper = tgtadm
volume_name_template = volume-%s
volume_group = cinder-volumes
verbose = True
auth_strategy = keystone
state_path = /var/lib/cinder
rabbit_password = password
sql_connection = mysql://cinder:password@localhost:3306/cinder
volume_driver = cinder.volume.driver.RBDDriver
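
If your pool or cephx user differ from the defaults, Folsom's RBD driver also reads a few optional flags; the values below are illustrative (the driver defaults to the rbd pool):

```ini
volume_driver = cinder.volume.driver.RBDDriver
# optional RBD flags (illustrative values)
rbd_pool = rbd
rbd_user = cinder
rbd_secret_uuid = 00000000-0000-0000-0000-000000000000
```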


root@ubuntu12:~$ cinder-manage db sync
root@ubuntu12:~$ service cinder-api restart
root@ubuntu12:~$ service cinder-scheduler restart
root@ubuntu12:~$ service cinder-volume restart
root@ubuntu12:~$ cinder create --display_name test 1
+---------------------+--------------------------------------+
|       Property      |                Value                 |
+---------------------+--------------------------------------+
|     attachments     |                  []                  |
|  availability_zone  |                 nova                 |
|      created_at     |      2012-10-02T07:14:34.815546      |
| display_description |                 None                 |
|     display_name    |                 test                 |
|          id         | e7e83b13-761e-40e3-8b4c-415126404e40 |
|       metadata      |                  {}                  |
|         size        |                  1                   |
|     snapshot_id     |                 None                 |
|        status       |               creating               |
|     volume_type     |                 None                 |
+---------------------+--------------------------------------+
root@ubuntu12:~$ cinder list
+--------------------------------------+-----------+--------------+------+-------------+-------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+-------------+
| e7e83b13-761e-40e3-8b4c-415126404e40 | available |     test     |  1   |     None    |             |
+--------------------------------------+-----------+--------------+------+-------------+-------------+
root@ubuntu12:~$ rbd list
volume-e7e83b13-761e-40e3-8b4c-415126404e40
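
As the first table shows, cinder create returns while the volume is still in the creating state. Here is a sketch of a poll loop that waits for a volume to reach a given status; the listing command is a parameter, so it can be stubbed:

```shell
# Sketch: poll a listing command until the named volume shows the wanted status.
# $1 = listing command, $2 = volume name, $3 = wanted status, $4 = max tries
wait_for_status() {
    i=0
    while [ "$i" -lt "$4" ]; do
        if $1 | grep "$2" | grep -q "$3"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# usage:
# wait_for_status "cinder list" test available 30
```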


Reference

Folsom Announcement
http://lists.openstack.org/pipermail/openstack-announce/2012-September/000035.html
Folsom: How it was made
http://blog.bitergia.com/2012/09/27/how-the-new-release-of-openstack-was-built/

Cinder Installation Document
http://docs.openstack.org/trunk/openstack-compute/install/apt/content/osfolubuntu-cinder.html
https://github.com/EmilienM/openstack-folsom-guide/blob/master/doc/out/pdf/openstack-folsom-guide.pdf
Cinder Developer document
http://docs.openstack.org/developer/cinder/
Cinder Source Code
https://github.com/openstack/cinder.git
https://github.com/openstack/python-cinderclient

The Top 3 New Swift Features in OpenStack Folsom
http://swiftstack.com/blog/2012/09/27/top-three-swift-features-in-openstack-folsom/

Thursday, February 2, 2012

Ceph with ext4

Ceph uses btrfs by default, so the first time you initialize the cluster you can have mkcephfs prepare the storage via a flag:

mkcephfs -a -c /etc/ceph/ceph.conf --mkbtrfs

Switching to ext4 is not as painless: on every OSD you must prepare an ext4 partition yourself and mount it at the correct path.
According to the Ceph site:
"The ext4 partition must be mounted with -o user_xattr or else mkcephfs will fail. Also using noatime,nodiratime boosts performance at no cost. When using ext4, you should disable the ext4 journal."

root@wistor-dev-7:~$ mke2fs -t ext4 /dev/mapper/ubuntu64--33--7-lvol0
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
13107200 inodes, 52428800 blocks
2621440 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
1600 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 28 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

root@wistor-dev-7:~$ tune2fs -o journal_data_writeback /dev/mapper/ubuntu64--33--7-lvol0
tune2fs 1.41.14 (22-Dec-2010)

root@wistor-dev-7:~$ tune2fs -O ^has_journal /dev/mapper/ubuntu64--33--7-lvol0
tune2fs 1.41.14 (22-Dec-2010)

root@wistor-dev-7:~$ e2fsck -f /dev/mapper/ubuntu64--33--7-lvol0
e2fsck 1.41.14 (22-Dec-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/ubuntu64--33--7-lvol0: 11/13107200 files (0.0% non-contiguous), 837781/52428800 blocks


To have this partition mounted automatically at every boot, edit /etc/fstab:

root@wistor-dev-7:~$ cat /etc/fstab
proc            /proc           proc    nodev,noexec,nosuid 0       0
/dev/mapper/ubuntu64--33--7-root /               ext4    errors=remount-ro 0       1

# /boot was on /dev/sda1 during installation
UUID=bf0a72da-7ca7-4960-9a7e-f90298b95609 /boot           ext2    defaults        0       2

/dev/mapper/ubuntu64--33--7-swap_1 none            swap    sw              0       0

# Add this line
/dev/mapper/ubuntu64--33--7-lvol0 /srv/osd.2  ext4 errors=remount-ro,data=writeback,noatime,nodiratime,user_xattr  0  1
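
With several OSD hosts, the same fstab line recurs with only the device and osd id changing; a tiny helper (a sketch, not from the original post) keeps the mount options consistent:

```shell
# Sketch: print the fstab entry for an OSD data partition.
# $1 = device, $2 = osd id
osd_fstab_line() {
    printf '%s /srv/osd.%s ext4 errors=remount-ro,data=writeback,noatime,nodiratime,user_xattr 0 1\n' "$1" "$2"
}

# usage:
# osd_fstab_line /dev/mapper/ubuntu64--33--7-lvol0 2 >> /etc/fstab
```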


root@wistor-dev-7:~$ mount -a
root@wistor-dev-7:~$ mount
/dev/mapper/ubuntu64--33--7-lvol0 on /srv/osd.2 type ext4 (rw,noatime,nodiratime,errors=remount-ro,data=writeback,user_xattr)

Then edit /etc/ceph/ceph.conf:

[osd]
        ; This is where the OSD data volume will be mounted.
        osd data = /srv/osd.$id

[osd.0]
        host = wistor-dev-5

#        Remove the line that used to specify btrfs devs
#        btrfs devs = /dev/mapper/ubuntu1104--64--5-lvol0

[osd.1]
        host = wistor-dev-6

#        Remove the line that used to specify btrfs devs
#        btrfs devs = /dev/mapper/wistor--dev--6-lvol0

[osd.2]
        host = wistor-dev-7

#        Remove the line that used to specify btrfs devs
#        btrfs devs = /dev/mapper/ubuntu64--33--7-lvol0



Thursday, October 20, 2011

Install Ceph 0.37


To go along with the Ubuntu Oneiric (11.10) release, Ceph has released 0.37. Installation is more convenient and the libraries are more complete. Start from the new documentation:

http://ceph.newdream.net/docs/latest/ops/install/mkcephfs/#installing-the-packages

wget -q -O- https://raw.github.com/NewDreamNetwork/ceph/master/keys/release.asc | sudo apt-key add -

sudo tee /etc/apt/sources.list.d/ceph.list << EOF
deb http://ceph.newdream.net/debian/ natty main
deb-src http://ceph.newdream.net/debian/ natty main
EOF

sudo apt-get update
sudo apt-get install ceph

The following packages have unmet dependencies:
 ceph : Depends: libcrypto++8 but it is not installable
        Depends: ceph-common but it is not going to be installed
        Recommends: ceph-fuse but it is not going to be installed
        Recommends: libcephfs1 but it is not going to be installed
        Recommends: librados2 but it is not going to be installed
        Recommends: librbd1 but it is not going to be installed
        Recommends: btrfs-tools but it is not going to be installed
Next, set up /etc/ceph/ceph.conf (follow the configuration style on the website, because old config files no longer work): 1 mon + 1 mds + 2 osd
[global]
        auth supported = cephx
        keyring = /etc/ceph/$name.keyring
        log file = /var/log/ceph/$name.log
        log_to_syslog = true        ; uncomment this line to log to syslog
        pid file = /var/run/ceph/$name.pid

[mon]
        mon data = /srv/mon.$id

[mon.a]
        host = ubuntu1104-64-5
        mon addr = 172.16.33.5:6789

[mds]

[mds.a]
        host = ubuntu1104-64-5

[osd]
        osd data = /srv/osd.$id
        osd journal = /srv/osd.$id.journal
        osd journal size = 1000 ; journal size, in megabytes

[osd.0]
        host = ubuntu1104-64-5
        btrfs devs = /dev/mapper/ubuntu1104--64--5-lvol0


[osd.1]
        host = ubuntu1104-64-6
        btrfs devs = /dev/mapper/ubuntu1104--64--6-lvol0


Import the master's public key to the other nodes:
root@ubuntu1104-64-5:~$ ssh-copy-id -i /root/.ssh/id_dsa.pub root@172.16.33.6
root@ubuntu1104-64-5:~$ ssh-copy-id -i /root/.ssh/id_dsa.pub root@172.16.33.7
root@ubuntu1104-64-5:~$ mkdir /var/log/ceph # create the log directory first
# create the file system

root@ubuntu1104-64-5:/etc/ceph$ mkcephfs -a -c /etc/ceph/ceph.conf --mkbtrfs

# start the services

root@ubuntu1104-64-5:/etc/ceph$ service ceph -a start
=== mon.a ===
Starting Ceph mon.a on ubuntu1104-64-5...
starting mon.a rank 0 at 172.16.33.5:6789/0 mon_data /srv/mon.a fsid cbb32d58-ceb8-7379-e10e-fc5ad51
865e3
=== mds.a ===
Starting Ceph mds.a on ubuntu1104-64-5...
starting mds.a at 0.0.0.0:6800/1124
=== osd.0 ===
Mounting Btrfs on ubuntu1104-64-5:/srv/osd.0
Scanning for Btrfs filesystems
Starting Ceph osd.0 on ubuntu1104-64-5...
starting osd.0 at 0.0.0.0:6801/1198 osd_data /srv/osd.0 /srv/osd.0.journal
=== osd.1 ===
Mounting Btrfs on ubuntu1104-64-6:/srv/osd.1
Scanning for Btrfs filesystems
Starting Ceph osd.1 on ubuntu1104-64-6...
starting osd.1 at 0.0.0.0:6800/19846 osd_data /srv/osd.1 /srv/osd.1.journal

Note that authtool has been renamed to ceph-authtool.
root@ubuntu1104-64-5:/etc/ceph$ ceph -s
2011-10-20 14:39:54.756735    pg v64: 396 pgs: 396 active+clean; 24 KB data, 4672 KB used, 395 GB / 400 GB avail
2011-10-20 14:39:54.757492   mds e4: 1/1/1 up {0=a=up:active}
2011-10-20 14:39:54.757512   osd e4: 2 osds: 2 up, 2 in
2011-10-20 14:39:54.757541   log 2011-10-20 14:39:55.081612 osd.1 172.16.33.6:6800/19846 102 : [INF] 1.5e scrub ok
2011-10-20 14:39:54.757582   mon e1: 1 mons at {a=172.16.33.5:6789/0}

root@ubuntu1104-64-5:/etc/ceph$ ceph auth list
2011-10-20 14:40:10.586279 mon <- [auth,list]
2011-10-20 14:40:10.586960 mon.0 -> 'installed auth entries:
mon.
        key: AQDfwJ9OkCqOJRAAB5cXyb6EzUrMbCOL1xGVUw==
mds.a
        key: AQDfwJ9OyMIiIBAAh07TCA6SAkNKixVYoyJGvA==
        caps: [mds] allow
        caps: [mon] allow rwx
        caps: [osd] allow *
osd.0
        key: AQDcwJ9OYP7LKhAAhfO7c11l+U5KAGAP+8kVqw==
        caps: [mon] allow rwx
        caps: [osd] allow *
osd.1
        key: AQDnwJ9OOKCqBBAA/oPp6Z1yg+WjuTAutZHT7g==
        caps: [mon] allow rwx
        caps: [osd] allow *
client.admin
        key: AQDfwJ9O8AVZJBAAWe9LfeOYUIP7GauVU1Mi5A==
        caps: [mds] allow
        caps: [mon] allow *
        caps: [osd] allow *
' (0)

Wednesday, September 28, 2011

Test Ceph RBD + iSCSI performance

# on the local filesystem (note bs=4096, i.e. 4 KB blocks)
root@ubuntu1104-64-5:/mnt$ dd if=/dev/zero of=testfile bs=4096 count=10000 conv=fdatasync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 4.41257 s, 9.3 MB/s

# locally, on an rbd image
root@ubuntu1104-64-5:/mnt$ dd if=/dev/zero of=testfile bs=4096 count=10000 conv=fdatasync
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 3.79785 s, 10.8 MB/s

# from the client, against rbd (note the larger bs=4096k, i.e. 4 MB blocks)

root@ubuntu1104-64-6:~# dd if=/dev/zero of=testfile bs=4096k count=100 conv=fdatasync
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 4.2075 s, 99.7 MB/s
root@ubuntu1104-64-6:~# dd if=/dev/zero of=testfile bs=4096k count=1000 conv=fdatasync
1000+0 records in
1000+0 records out
4194304000 bytes (4.2 GB) copied, 40.8692 s, 103 MB/s




# local again, with the same 4 MB block size for comparison
root@ubuntu1104-64-5:~# dd if=/dev/zero of=testfile bs=4096k count=100 conv=fdatasync
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 4.59282 s, 91.3 MB/s
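Comparing these runs is easier with a small helper that pulls the throughput field out of dd's summary line (a sketch; dd prints this line on stderr, so pipe it with `2>&1`, and the exact wording varies between coreutils versions):

```shell
# extract the last comma-separated field from a GNU dd summary line,
# e.g. "419430400 bytes (419 MB) copied, 4.2075 s, 99.7 MB/s" -> "99.7 MB/s"
dd_rate() {
    awk -F', ' '{ print $NF }'
}

echo '419430400 bytes (419 MB) copied, 4.2075 s, 99.7 MB/s' | dd_rate
# -> 99.7 MB/s
```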






Tuesday, September 20, 2011

Export Ceph RBD with iSCSI

I originally wanted to use LIO, but iSCSI target support did not seem to be merged into Linux kernel 2.6.38 yet, so I switched to iET (iSCSI Enterprise Target) to try out Ceph's RBD (RADOS Block Device).

The first step is to get familiar with iSCSI using a plain partition.

Setting up the target

root@ubuntu1104-64-5:/dev$ apt-get install iscsitarget
root@ubuntu1104-64-5:/dev$ apt-get install open-iscsi

After installation a warning appears: iscsitarget not enabled in "/etc/default/iscsitarget", not starting... (warning). To fix this, edit /etc/default/iscsitarget and change "ISCSITARGET_ENABLE=false" to "ISCSITARGET_ENABLE=true"; only then will the iSCSI target service start.
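The flag flip can be scripted. A minimal sketch, shown here against a temporary copy so it runs anywhere; on a real host you would point sed at /etc/default/iscsitarget itself:

```shell
# scratch copy standing in for /etc/default/iscsitarget
conf=$(mktemp)
echo 'ISCSITARGET_ENABLE=false' > "$conf"

# flip the flag so the target service is allowed to start
sed -i 's/^ISCSITARGET_ENABLE=false$/ISCSITARGET_ENABLE=true/' "$conf"

cat "$conf"   # -> ISCSITARGET_ENABLE=true
rm -f "$conf"
```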

Next, configure iET's /etc/iet/ietd.conf:

iSNSServer 172.16.33.5
iSNSAccessControl No
Target iqn.2011-09.com.example:storage.lun1
# ubuntu1104--64--5-lvol1 is a partition carved out earlier; we export it with fileio
# not sure yet what ScsiId and ScsiSN are for; ignore them for now
Lun 0 Path=/dev/mapper/ubuntu1104--64--5-lvol1,Type=fileio,ScsiId=xyz,ScsiSN=xyz
 

root@ubuntu1104-64-5:/dev$ /etc/init.d/iscsitarget restart
 * Removing iSCSI enterprise target devices:
   ...done.
 * Stopping iSCSI enterprise target service:
   ...done.
 * Removing iSCSI enterprise target modules:
   ...done.
 * Starting iSCSI enterprise target service
   ...done.
   ...done.


Then configure the allowed connections in /etc/iet/initiators.allow. The last line, ALL ALL, means all initiators may reach all targets by default; for initial testing we just leave it that way.
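If you later want to lock this down, an initiators.allow entry restricting a target to one subnet might look like the following (a sketch; check the iET documentation for the exact syntax your version accepts):

```
# allow only the 172.16.33.0/24 subnet to reach this target
iqn.2011-09.com.example:storage.lun1 172.16.33.0/24
```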

Setting up the initiator

First set node.startup to automatic in /etc/iscsi/iscsid.conf.

# restart the initiator service
root@ubuntu1104-64-6:/etc/iscsi$ /etc/init.d/open-iscsi restart
* Disconnecting iSCSI targets
...done.
* Stopping iSCSI initiator service
...done.
* Starting iSCSI initiator service iscsid
...done.
* Setting up iSCSI targets
...done.

# then discover the target we just exported
root@ubuntu1104-64-6:/etc/iscsi$ iscsiadm -m discovery -t st -p 172.16.33.5
172.16.33.5:3260,1 iqn.2011-09.com.example:storage.lun1

root@ubuntu1104-64-6:/etc/iscsi# iscsiadm -m node
172.16.33.5:3260,1 iqn.2011-09.com.example:storage.lun1

# if connected, its directory should appear here
root@ubuntu1104-64-6:/etc/iscsi$ ll /etc/iscsi/nodes/
total 12
drw------- 3 root root 4096 2011-09-20 16:06 ./
drwxr-xr-x 5 root root 4096 2011-09-20 15:20 ../
drw------- 3 root root 4096 2011-09-20 16:06 iqn.2011-09.com.example:storage.lun1/

# log in to this node; it normally auto-logs-in above, so this step may report "already exists"
root@ubuntu1104-64-6:/etc/iscsi$ iscsiadm -m node -T iqn.2011-09.com.example:storage.lun1 -p 172.16.33.5 -l
Logging in to [iface: default, target: iqn.2011-09.com.example:storage.lun1, portal: 172.16.33.5,3260]
Login to [iface: default, target: iqn.2011-09.com.example:storage.lun1, portal: 172.16.33.5,3260]: successful

# check with fdisk: a new device sdb shows up, with a single partition sdb1
root@ubuntu1104-64-6:/etc/iscsi$ fdisk -l
Disk /dev/sdb: 53.7 GB, 53687091200 bytes
64 heads, 32 sectors/track, 51200 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xe4a1139a

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       51200    52428784   83  Linux


# format the partition
root@ubuntu1104-64-6:/etc/iscsi$ mkfs.ext4 /dev/sdb1
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
3276800 inodes, 13107196 blocks
655359 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
400 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 35 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

# mount it
root@ubuntu1104-64-6:/data$ mount /dev/sdb1 /data/scsi

# check all mounts
root@ubuntu1104-64-6:/data$ mount
/dev/mapper/ubuntu1104--64--6-root on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
none on /sys type sysfs (rw,noexec,nosuid,nodev)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
none on /dev type devtmpfs (rw,mode=0755)
none on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
none on /dev/shm type tmpfs (rw,nosuid,nodev)
none on /var/run type tmpfs (rw,nosuid,mode=0755)
none on /var/lock type tmpfs (rw,noexec,nosuid,nodev)
/dev/sda1 on /boot type ext2 (rw)
/dev/mapper/ubuntu1104--64--6-lvol0 on /data/osd.2 type btrfs (rw,noatime)
/dev/mapper/ubuntu1104--64--6-lvol1 on /data/osd.3 type btrfs (rw,noatime)
/dev/sdb1 on /data/scsi type ext4 (rw)

# check partition usage
root@ubuntu1104-64-6:/data$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/ubuntu1104--64--6-root
47328184   2595704  42328336   6% /
none                  12358244       224  12358020   1% /dev
none                  12366300         0  12366300   0% /dev/shm
none                  12366300        48  12366252   1% /var/run
none                  12366300         0  12366300   0% /var/lock
/dev/sda1               233191     45272    175478  21% /boot
/dev/mapper/ubuntu1104--64--6-lvol0
52428800   1033620  49277220   3% /data/osd.2
/dev/mapper/ubuntu1104--64--6-lvol1
52428800   1034584  49276440   3% /data/osd.3
/dev/sdb1             51606124    184136  48800552   1% /data/scsi

To tear down the iSCSI session:

iscsiadm -m node -T iqn.2011-09.com.example:storage.lun1 -p 172.16.33.5 -u

Create Ceph RBD

Reference: http://ceph.newdream.net/wiki/RBD

# create an rbd image named foo, 5 GB in size
root@ubuntu1104-64-5:/dev/rbd/rbd$ rbd create foo --size 5120

# list the images currently in the rbd pool
root@ubuntu1104-64-5:/dev/rbd/rbd$ rbd list
foo

# inspect foo with rbd
root@ubuntu1104-64-5:~$ rbd info foo
rbd image 'foo':
        size 5120 MB in 1280 objects
        order 22 (4096 KB objects)
        block_name_prefix: rb.0.3
        parent:  (pool -1)

# look at the rbd pool via rados; note the new foo.rbd object
root@ubuntu1104-64-5:~$ rados ls -p rbd
foo.rbd
rb.0.1.000000000000
rb.0.1.000000000001
rbd_directory
rbd_info

root@ubuntu1104-64-5:~$ modprobe rbd

# register the image with the kernel so the device shows up
root@ubuntu1104-64-5:/dev$ echo "172.16.33.5 name=admin,secret=AQDeRGdOMNL3MhAAuzvelwICjpYhLIk7IMcX2g== rbd foo" > /sys/bus/rbd/add
root@ubuntu1104-64-5:/dev$ mknod /dev/rbd0 b 254 0
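The string written to /sys/bus/rbd/add has the shape `<mon addr> name=<user>,secret=<key> <pool> <image>`. A tiny helper (hypothetical, just to document the format; the secret here is a placeholder) can build it:

```shell
# build the line expected by /sys/bus/rbd/add:
#   <monitor address> name=<auth user>,secret=<key> <pool> <image>
rbd_add_spec() {
    printf '%s name=%s,secret=%s %s %s\n' "$1" "$2" "$3" "$4" "$5"
}

rbd_add_spec 172.16.33.5 admin SECRETKEY rbd foo
# -> 172.16.33.5 name=admin,secret=SECRETKEY rbd foo
# on a real host you would redirect this into /sys/bus/rbd/add (as root)
```

Later Ceph releases ship an `rbd map` command that wraps this sysfs interface for you.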


# verify the rbd device appeared
root@ubuntu1104-64-5:~$ ls /sys/bus/rbd/devices
0
root@ubuntu1104-64-5:~$ ls /dev/rbd/rbd
foo:0

# format rbd0
root@ubuntu1104-64-5:/dev$ mkfs -t ext3 /dev/rbd0
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
327680 inodes, 1310720 blocks
65536 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=1342177280
40 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information:
done

# mount it
root@ubuntu1104-64-5:/dev$ mount -t ext3 /dev/rbd0 /mnt

root@ubuntu1104-64-5:/dev$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu1104--64--5-root
                       46G  6.6G   37G  16% /
none                   12G  228K   12G   1% /dev
none                   12G     0   12G   0% /dev/shm
none                   12G   64K   12G   1% /var/run
none                   12G     0   12G   0% /var/lock
/dev/sda1             228M   45M  172M  21% /boot
/dev/mapper/ubuntu1104--64--5-lvol2
                       50G  1.1G   47G   3% /data/osd.0
/dev/mapper/ubuntu1104--64--5-lvol0
                       50G  1.1G   47G   3% /data/osd.1
/dev/rbd0             5.0G  139M  4.6G   3% /mnt

# testing done, umount it again
root@ubuntu1104-64-5:/dev$ umount /mnt

Export RBD via iSCSI

See http://ceph.newdream.net/wiki/ISCSI
First, change the settings in /etc/iet/ietd.conf:

Target iqn.2011-09.net.newdream.ceph:rados.iscsi.001
# remember to use blockio here, not fileio
        Lun 0 Path=/dev/rbd0,Type=blockio

Then restart the target and the initiator, same steps as above.

root@ubuntu1104-64-7:~$ fdisk -l

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000c4797

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          32      248832   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2              32       60802   488134657    5  Extended
/dev/sda5              32       60802   488134656   8e  Linux LVM

Disk /dev/sdb: 5368 MB, 5368709120 bytes
166 heads, 62 sectors/track, 1018 cylinders
Units = cylinders of 10292 * 512 = 5269504 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

# since blockio exports the whole device, there is no partition table on it
Disk /dev/sdb doesn't contain a valid partition table

# mount it directly
root@ubuntu1104-64-7:~$ mount /dev/sdb /mnt

# check: a new 5 GB sdb appears
root@ubuntu1104-64-7:~$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu1104--64--7-root
                      435G   11G  402G   3% /
none                   12G  216K   12G   1% /dev
none                   12G     0   12G   0% /dev/shm
none                   12G   60K   12G   1% /var/run
none                   12G     0   12G   0% /var/lock
/dev/sda1             228M   45M  172M  21% /boot
/dev/sdb              5.0G  139M  4.6G   3% /mnt

That is basically it. Done!

Wednesday, September 7, 2011

Setup Ceph Cluster



Create three logical volumes:
pjack@ubuntu1104-64-5:/etc/ceph$ sudo lvcreate -L 50G ubuntu1104-64-5
  Logical volume "lvol0" created
pjack@ubuntu1104-64-5:/etc/ceph$ sudo lvcreate -L 50G ubuntu1104-64-5
  Logical volume "lvol1" created
pjack@ubuntu1104-64-5:/etc/ceph$ sudo lvcreate -L 50G ubuntu1104-64-5
  Logical volume "lvol2" created



Format them as ext3, ext4, and btrfs respectively.

The first is ext3:
pjack@ubuntu1104-64-5:/etc/ceph$ sudo mkfs -t ext3 /dev/ubuntu1104-64-5/lvol0
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
3276800 inodes, 13107200 blocks
655360 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
400 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 23 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

The second is ext4:
pjack@ubuntu1104-64-5:/etc/ceph$ sudo mkfs -t ext4 /dev/ubuntu1104-64-5/lvol1
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
3276800 inodes, 13107200 blocks
655360 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
400 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 31 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.


The third is btrfs, which is also the officially recommended setup:
pjack@ubuntu1104-64-5:/etc/ceph$ sudo mkfs -t btrfs /dev/ubuntu1104-64-5/lvol2

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/ubuntu1104-64-5/lvol2
        nodesize 4096 leafsize 4096 sectorsize 4096 size 50.00GB
Btrfs Btrfs v0.19


# mount all three
pjack@ubuntu1104-64-5:/mnt$ sudo mount /dev/mapper/ubuntu1104--64--5-lvol0 /mnt/lvol0
pjack@ubuntu1104-64-5:/mnt$ sudo mount /dev/mapper/ubuntu1104--64--5-lvol1 /mnt/lvol1
pjack@ubuntu1104-64-5:/mnt$ sudo mount /dev/mapper/ubuntu1104--64--5-lvol2 /mnt/lvol2

# check the result
pjack@ubuntu1104-64-5:/mnt$ df
Filesystem                               1K-blocks      Used Available Use% Mounted on
/dev/mapper/ubuntu1104--64--5-lvol0       51606140    184268  48800432   1% /mnt/lvol0
/dev/mapper/ubuntu1104--64--5-lvol1       51606140    184136  48800564   1% /mnt/lvol1
/dev/mapper/ubuntu1104--64--5-lvol2       52428800        56  50302976   1% /mnt/lvol2

# check each filesystem type
pjack@ubuntu1104-64-5:/lib/modules/2.6.38-8-server/kernel/fs$ mount -l
/dev/mapper/ubuntu1104--64--5-lvol0 on /mnt/lvol0 type ext3 (rw)
/dev/mapper/ubuntu1104--64--5-lvol1 on /mnt/lvol1 type ext4 (rw)
/dev/mapper/ubuntu1104--64--5-lvol2 on /mnt/lvol2 type btrfs (rw)

However, ext4 comes with some requirements:

  1. user_xattr
  2. noatime
  3. nodiratime
  4. disable the ext journal

The ext4 partition must be mounted with -o user_xattr or else mkcephfs will fail. Also using noatime,nodiratime boosts performance at no cost. When using ext4, you should disable the ext4 journal, because Ceph does its own journalling. This will boost performance.       

Data Mode
=========
There are 3 different data modes:

* writeback mode
In data=writeback mode, ext4 does not journal data at all. This mode provides a similar level of journaling as that of XFS, JFS, and ReiserFS in its default mode - metadata journaling. A crash+recovery can cause incorrect data to appear in files which were written shortly before the crash. This mode will typically provide the best ext4 performance.

* ordered mode
In data=ordered mode, ext4 only officially journals metadata, but it logically groups metadata information related to data changes with the data blocks into a single unit called a transaction. When it's time to write the new metadata out to disk, the associated data blocks are written first. In general, this mode performs slightly slower than writeback but significantly faster than journal mode.

* journal mode
data=journal mode provides full data and metadata journaling. All new data is written to the journal first, and then to its final location.
In the event of a crash, the journal can be replayed, bringing both data and
metadata into a consistent state. This mode is the slowest except when data
needs to be read from and written to disk at the same time, where it outperforms all other modes. Currently ext4 does not have delayed allocation support if this data journalling mode is selected.
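Putting the requirements together, the matching /etc/fstab entry for the ext4 OSD volume might look like this (a sketch using the device and mount point from this setup):

```
# ext4 OSD volume with the options Ceph wants
/dev/mapper/ubuntu1104--64--5-lvol1  /mnt/lvol1  ext4  noatime,nodiratime,user_xattr,data=writeback  0  2
```

Disabling the ext4 journal itself is done offline, with the filesystem unmounted: `tune2fs -O ^has_journal <device>` followed by `e2fsck -f <device>`.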

After remounting with those options, check again:
pjack@ubuntu1104-64-5:/lib/modules/2.6.38-8-server/kernel/fs$ mount -l
/dev/mapper/ubuntu1104--64--5-lvol0 on /mnt/lvol0 type ext3 (rw)
/dev/mapper/ubuntu1104--64--5-lvol2 on /mnt/lvol2 type btrfs (rw)
/dev/mapper/ubuntu1104--64--5-lvol1 on /mnt/lvol1 type ext4 (rw,noatime,nodiratime,user_xattr,data=writeback)


For convenience later on, generate an ssh key on every server and import it to the others:
pjack@ubuntu1104-64-5:/etc/ceph$ sudo ssh-keygen -d
Generating public/private dsa key pair.
Enter file in which to save the key (/root/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
82:8a:85:37:a2:17:f2:41:4f:e8:96:d0:a6:1b:c9:6c root@ubuntu1104-64-5
The key's randomart image is:
+--[ DSA 1024]----+
|                 |
| . .             |
|. = .            |
|oO + .           |
|BEX o . S        |
|o@ =   .         |
|+ +              |
| .               |
|                 |
+-----------------+

root@ubuntu1104-64-5:~$ ssh-copy-id -i /root/.ssh/id_dsa.pub root@172.16.33.6


Next, copy sample.ceph.conf & sample.fetch_config into /etc/ceph and adjust the settings; most of them can be left as-is.

[global]
        ; enable secure authentication
        auth supported = cephx

        ; allow ourselves to open a lot of files
        max open files = 131072

        ; set log file
        log file = /var/log/ceph/$name.log
        ; log_to_syslog = true        ; uncomment this line to log to syslog

        ; set up pid files
        pid file = /var/run/ceph/$name.pid

        ; If you want to run a IPv6 cluster, set this to true. Dual-stack isn't possible
        ;ms bind ipv6 = true


        keyring = /etc/ceph/keyring.admin

[mon]
        mon data = /data/$name
[mon.alpha]
        host = ubuntu1104-64-5
        mon addr = 172.16.33.5:6789
[mds]
        ; where the mds keeps its secret encryption keys
        keyring = /data/keyring.$name

        ; mds logging to debug issues.
        ;debug ms = 1
        ;debug mds = 20

[mds.alpha]
        host = ubuntu1104-64-5
[osd]
        ; This is where the btrfs volume will be mounted.
        osd data = /data/$name
        keyring = /etc/ceph/keyring.$name

        osd journal = /data/$name/journal
        osd journal size = 1000 ; journal size, in megabytes

[osd.0]
        host = ubuntu1104-64-5
        btrfs devs = /dev/mapper/ubuntu1104--64--5-lvol2

[osd.1]
        host = ubuntu1104-64-5
        btrfs devs = /dev/mapper/ubuntu1104--64--5-lvol0

[osd.2]
        host = ubuntu1104-64-6
        btrfs devs = /dev/mapper/ubuntu1104--64--6-lvol0

[osd.3]
        host = ubuntu1104-64-6
        btrfs devs = /dev/mapper/ubuntu1104--64--6-lvol1


#!/bin/sh
conf="$1"
scp -i /root/.ssh/id_dsa root@172.16.33.5:/etc/ceph/ceph.conf $conf

Since ceph.conf uses hostnames rather than IP addresses, /etc/hosts must be set up as well, or the scripts will fail later:

127.0.0.1       localhost
172.16.33.5     ubuntu1104-64-5
172.16.33.6     ubuntu1104-64-6
172.16.33.7     ubuntu1104-64-7



I also found what looks like a bug in Ceph's init script, so I changed the order of one line; otherwise it seems to clobber the settings in ceph.conf:
-------- snip ------------
[ -z "$conf" ] && [ -n "$dir" ] && conf="$dir/conf"

# add this line
[ -z "$conf" ] && [ -z "$dir" ] && conf=$default_conf



After this long series of preparations, we can finally build the Ceph filesystem:

root@ubuntu1104-64-5:/tmp$ /sbin/mkcephfs -a --mkbtrfs
here 0 /etc/ceph/ceph.conf
[/etc/ceph/fetch_config /tmp/fetched.ceph.conf.13131]
ceph.conf                                                         100% 4455     4.4KB/s   00:00
temp dir is /tmp/mkcephfs.pt2DlXEHkB
here 0 /tmp/fetched.ceph.conf.13131
preparing monmap in /tmp/mkcephfs.pt2DlXEHkB/monmap
/usr/bin/monmaptool --create --clobber --add alpha 172.16.33.5:6789 --print /tmp/mkcephfs.pt2DlXEHkB
/monmap
/usr/bin/monmaptool: monmap file /tmp/mkcephfs.pt2DlXEHkB/monmap
/usr/bin/monmaptool: generated fsid 9cc6b2d5-1eba-50b2-bd43-7b3807ce301b
epoch 1
fsid 9cc6b2d5-1eba-50b2-bd43-7b3807ce301b
last_changed 2011-09-07 18:18:00.219236
created 2011-09-07 18:18:00.219236
0: 172.16.33.5:6789/0 mon.alpha
/usr/bin/monmaptool: writing epoch 1 to /tmp/mkcephfs.pt2DlXEHkB/monmap (1 monitors)

=== osd.0 ===
here 0 /tmp/mkcephfs.pt2DlXEHkB/conf
umount: /data/osd.0: not mounted
umount: /dev/mapper/ubuntu1104--64--5-lvol2: not mounted

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/mapper/ubuntu1104--64--5-lvol2
        nodesize 4096 leafsize 4096 sectorsize 4096 size 50.00GB
Btrfs Btrfs v0.19
Scanning for Btrfs filesystems
here 0 /tmp/mkcephfs.pt2DlXEHkB/conf
2011-09-07 18:18:00.995785 7fb8f3dfc760 created object store /data/osd.0 journal /data/osd.0/journal
 for osd0 fsid 9cc6b2d5-1eba-50b2-bd43-7b3807ce301b
creating private key for osd.0 keyring /etc/ceph/keyring.osd.0
creating /etc/ceph/keyring.osd.0

=== osd.1 ===
here 0 /tmp/mkcephfs.pt2DlXEHkB/conf
umount: /data/osd.1: not mounted
umount: /dev/mapper/ubuntu1104--64--5-lvol0: not mounted

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/mapper/ubuntu1104--64--5-lvol0
        nodesize 4096 leafsize 4096 sectorsize 4096 size 50.00GB
Btrfs Btrfs v0.19
Scanning for Btrfs filesystems
here 0 /tmp/mkcephfs.pt2DlXEHkB/conf
2011-09-07 18:18:01.689284 7fa826bb2760 created object store /data/osd.1 journal /data/osd.1/journal
 for osd1 fsid 9cc6b2d5-1eba-50b2-bd43-7b3807ce301b
creating private key for osd.1 keyring /etc/ceph/keyring.osd.1
creating /etc/ceph/keyring.osd.1

=== osd.2 ===
pushing conf and monmap to ubuntu1104-64-6:/tmp/mkfs.ceph.13131
umount: /data/osd.2: not mounted
umount: /dev/mapper/ubuntu1104--64--6-lvol0: not mounted

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/mapper/ubuntu1104--64--6-lvol0
        nodesize 4096 leafsize 4096 sectorsize 4096 size 50.00GB
Btrfs Btrfs v0.19
Scanning for Btrfs filesystems
2011-09-07 18:18:03.823692 7fdc3c04c760 created object store /data/osd.2 journal /data/osd.2/journal for osd2 fsid 9cc6b2d5-1eba-50b2-bd43-7b3807ce301b
creating private key for osd.2 keyring /etc/ceph/keyring.osd.2
creating /etc/ceph/keyring.osd.2
collecting osd.2 key


=== osd.3 ===
pushing conf and monmap to ubuntu1104-64-6:/tmp/mkfs.ceph.13131
umount: /data/osd.3: not mounted
umount: /dev/mapper/ubuntu1104--64--6-lvol1: not mounted

WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/mapper/ubuntu1104--64--6-lvol1
        nodesize 4096 leafsize 4096 sectorsize 4096 size 50.00GB
Btrfs Btrfs v0.19
Scanning for Btrfs filesystems
 ** WARNING: Ceph is still under development.  Any feedback can be directed  **
 **          at ceph-devel@vger.kernel.org or http://ceph.newdream.net/.     **
2011-09-07 18:18:06.293806 7f91b9b63760 created object store /data/osd.3 journal /data/osd.3/journal for osd3 fsid 9cc6b2d5-1eba-50b2-bd43-7b3807ce301b
creating private key for osd.3 keyring /etc/ceph/keyring.osd.3
creating /etc/ceph/keyring.osd.3
collecting osd.3 key

=== mds.alpha ===
here 0 /tmp/mkcephfs.pt2DlXEHkB/conf
creating private key for mds.alpha keyring /data/keyring.mds.alpha
creating /data/keyring.mds.alpha
here 0 /tmp/mkcephfs.pt2DlXEHkB/conf
Building generic osdmap
 highest numbered osd in /tmp/mkcephfs.pt2DlXEHkB/conf is osd.3
 num osd = 4
/usr/bin/osdmaptool: osdmap file '/tmp/mkcephfs.pt2DlXEHkB/osdmap'
/usr/bin/osdmaptool: writing epoch 1 to /tmp/mkcephfs.pt2DlXEHkB/osdmap
Generating admin key at /tmp/mkcephfs.pt2DlXEHkB/keyring.admin
creating /tmp/mkcephfs.pt2DlXEHkB/keyring.admin
Building initial monitor keyring
added entity mds.alpha auth auth(auid = 18446744073709551615 key=AQDeRGdOQBGPLhAAA3owSiBl0H4ozL4dy0H7Rg== with 0 caps)
added entity osd.0 auth auth(auid = 18446744073709551615 key=AQDZRGdOKLL3ABAAlU4NM3xNTe+m/dUXEvKCRw== with 0 caps)
added entity osd.1 auth auth(auid = 18446744073709551615 key=AQDZRGdOAJ1MKhAAY69HXzl8QLxZ3/MCHP2Cnw== with 0 caps)
added entity osd.2 auth auth(auid = 18446744073709551615 key=AQDbRGdOeEtQMhAAHhbE8EuTxqpobHIUR0SCdg== with 0 caps)
added entity osd.3 auth auth(auid = 18446744073709551615 key=AQDeRGdOYImzEhAAHQkcZtR4E8npHlgpAT8NpQ== with 0 caps)
=== mon.alpha ===
here 0 /tmp/mkcephfs.pt2DlXEHkB/conf
/usr/bin/cmon: created monfs at /data/mon.alpha for mon.alpha
placing client.admin keyring in /etc/ceph/keyring.admin

Check the key placement:
# check server 1 (ubuntu1104-64-5)
root@ubuntu1104-64-5:/etc/ceph$ ll
drwxr-xr-x  2 root root 4096 2011-09-07 18:16 ./
drwxr-xr-x 86 root root 4096 2011-09-07 18:18 ../
-rw-r--r--  1 root root 4455 2011-09-07 17:31 ceph.conf
-rwxr-xr-x  1 root root  392 2011-09-07 11:32 fetch_config*
-rw-------  1 root root   92 2011-09-07 18:18 keyring.admin
-rw-------  1 root root   85 2011-09-07 18:18 keyring.osd.0
-rw-------  1 root root   85 2011-09-07 18:18 keyring.osd.1

root@ubuntu1104-64-5:/etc/ceph$ cauthtool -l keyring.admin
[client.admin]
        key = AQDeRGdOMNL3MhAAuzvelwICjpYhLIk7IMcX2g==
        auid = 18446744073709551615

# check server 2 (ubuntu1104-64-6)
root@ubuntu1104-64-6:/etc/ceph$ ll
drwxr-xr-x  2 root root 4096 2011-09-07 18:16 ./
drwxr-xr-x 86 root root 4096 2011-09-07 18:18 ../
-rwxr-xr-x  1 root root  392 2011-09-07 11:56 fetch*
-rw-------  1 root root   85 2011-09-07 18:18 keyring.osd.2
-rw-------  1 root root   85 2011-09-07 18:18 keyring.osd.3


The keys on each node look to be in place, so let's start the services!

root@ubuntu1104-64-5:/tmp$ service ceph -a start
[/etc/ceph/fetch_config /tmp/fetched.ceph.conf.16083]
ceph.conf                                                         100% 4455     4.4KB/s   00:00
=== mon.alpha ===
Starting Ceph mon.alpha on ubuntu1104-64-5...
starting mon.alpha rank 0 at 172.16.33.5:6789/0 mon_data /data/mon.alpha fsid 9cc6b2d5-1eba-50b2-bd43-7b3807ce301b
=== mds.alpha ===
Starting Ceph mds.alpha on ubuntu1104-64-5...
starting mds.alpha at 0.0.0.0:6800/16268
=== osd.0 ===
Mounting Btrfs on ubuntu1104-64-5:/data/osd.0
Scanning for Btrfs filesystems
Starting Ceph osd.0 on ubuntu1104-64-5...
starting osd0 at 0.0.0.0:6801/16371 osd_data /data/osd.0 /data/osd.0/journal
=== osd.1 ===
Mounting Btrfs on ubuntu1104-64-5:/data/osd.1
Scanning for Btrfs filesystems
Starting Ceph osd.1 on ubuntu1104-64-5...
starting osd1 at 0.0.0.0:6804/16464 osd_data /data/osd.1 /data/osd.1/journal
=== osd.2 ===
Mounting Btrfs on ubuntu1104-64-6:/data/osd.2
Scanning for Btrfs filesystems
Starting Ceph osd.2 on ubuntu1104-64-6...
starting osd2 at 0.0.0.0:6800/14475 osd_data /data/osd.2 /data/osd.2/journal
=== osd.3 ===
Mounting Btrfs on ubuntu1104-64-6:/data/osd.3
Scanning for Btrfs filesystems
Starting Ceph osd.3 on ubuntu1104-64-6...
starting osd3 at 0.0.0.0:6803/14676 osd_data /data/osd.3 /data/osd.3/journal


Check the overall status and the authentication list:
root@ubuntu1104-64-5:/etc/ceph$ ceph -s
2011-09-07 18:29:24.305413    pg v160: 792 pgs: 792 active+clean; 24 KB data, 112 MB used, 191 GB / 200 GB avail
2011-09-07 18:29:24.307445   mds e4: 1/1/1 up {0=alpha=up:active}
# 4 osds, all up and all in the storage pool
2011-09-07 18:29:24.307483   osd e7: 4 osds: 4 up, 4 in   
2011-09-07 18:29:24.307539   log 2011-09-07 18:29:20.760469 osd3 172.16.33.6:6803/14676 130 : [INF] 1.8c scrub ok
2011-09-07 18:29:24.307617   mon e1: 1 mons at {alpha=172.16.33.5:6789/0}

root@ubuntu1104-64-5:/etc/ceph$ ceph auth list
2011-09-07 18:29:41.564151 mon <- [auth,list]
2011-09-07 18:29:41.564718 mon0 -> 'installed auth entries:
mon.
        key: AQDeRGdOiEk2NBAAVHVGzaeOFcgSmbZZ2xPu+w==
mds.alpha
        key: AQDeRGdOQBGPLhAAA3owSiBl0H4ozL4dy0H7Rg==
        caps: [mds] allow
        caps: [mon] allow rwx
        caps: [osd] allow *
osd.0
        key: AQDZRGdOKLL3ABAAlU4NM3xNTe+m/dUXEvKCRw==
        caps: [mon] allow rwx
        caps: [osd] allow *
osd.1
        key: AQDZRGdOAJ1MKhAAY69HXzl8QLxZ3/MCHP2Cnw==
        caps: [mon] allow rwx
        caps: [osd] allow *
osd.2
        key: AQDbRGdOeEtQMhAAHhbE8EuTxqpobHIUR0SCdg==
        caps: [mon] allow rwx
        caps: [osd] allow *
osd.3
        key: AQDeRGdOYImzEhAAHQkcZtR4E8npHlgpAT8NpQ==
        caps: [mon] allow rwx
        caps: [osd] allow *
client.admin
        key: AQDeRGdOMNL3MhAAuzvelwICjpYhLIk7IMcX2g==
        caps: [mds] allow
        caps: [mon] allow *
        caps: [osd] allow *
' (0)


Mount Ceph, first using the kernel client... but for some reason it kept failing with "No such device"; I could not resolve it, so I gave up on that route.
root@ubuntu1104-64-5:/etc/ceph$ mount -t ceph 172.16.33.5:6789:/ /mnt/ceph -v -o name=admin,secret=AQDeRGdOMNL3MhAAuzvelwICjpYhLIk7IMcX2g==
parsing options: rw,name=admin,secret=AQDeRGdOMNL3MhAAuzvelwICjpYhLIk7IMcX2g==
error adding secret to kernel, key name client.admin: No such device.

Switching to cfuse works without problems. Strange; maybe mount.ceph needs updating?
root@ubuntu1104-64-5:/etc/ceph$ cfuse -m 172.16.33.5:6789 /mnt/ceph
 ** WARNING: Ceph is still under development.  Any feedback can be directed  **
 **          at ceph-devel@vger.kernel.org or http://ceph.newdream.net/.     **
cfuse[3506]: starting ceph client
cfuse[3506]: starting fuse
root@ubuntu1104-64-5:/etc/ceph$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/ubuntu1104--64--5-root
                      47328184   5929696  38994344  14% /
none                  12358244       220  12358024   1% /dev
none                  12366300         0  12366300   0% /dev/shm
none                  12366300        60  12366240   1% /var/run
none                  12366300         0  12366300   0% /var/lock
/dev/sda1               233191     45262    175488  21% /boot
/dev/mapper/ubuntu1104--64--5-lvol2
                      52428800     30244  50275500   1% /data/osd.0
/dev/mapper/ubuntu1104--64--5-lvol0
                      52428800     31272  50274416   1% /data/osd.1
cfuse                209715200   8616960 201098240   5% /mnt/ceph

How to add a single OSD

See http://ceph.newdream.net/wiki/OSD_cluster_expansion/contraction
Note, however, that there is a prerequisite it does not mention: /etc/ceph/keyring.admin must first be copied to the new machine, otherwise these commands will fail.
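
A minimal sketch of that pre-flight check. It runs against a scratch directory (the `CEPH_DIR` path and the inlined key are illustrative stand-ins for /etc/ceph and the real admin key) so it is safe to execute anywhere:

```shell
# The new node needs the admin keyring before `ceph auth add` /
# `ceph osd setmaxosd` can talk to the monitor. CEPH_DIR stands in for
# /etc/ceph here so this sketch does not touch the real config.
CEPH_DIR=/tmp/etc-ceph-demo
mkdir -p "$CEPH_DIR"

# In practice this file comes from the existing node, e.g.:
#   scp root@ubuntu1104-64-5:/etc/ceph/keyring.admin /etc/ceph/
printf 'client.admin\n\tkey: AQDeRGdOMNL3MhAAuzvelwICjpYhLIk7IMcX2g==\n' \
    > "$CEPH_DIR/keyring.admin"

if [ -s "$CEPH_DIR/keyring.admin" ]; then
    echo "keyring ready"
else
    echo "copy keyring.admin from the monitor host first" >&2
fi
```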

root@ubuntu1104-64-6:/etc/ceph$ mkfs.btrfs /dev/mapper/ubuntu1104--64--6-lvol1
WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using
fs created label (null) on /dev/mapper/ubuntu1104--64--6-lvol1
        nodesize 4096 leafsize 4096 sectorsize 4096 size 50.00GB
Btrfs Btrfs v0.19

root@ubuntu1104-64-6:/etc/ceph$ mount /dev/mapper/ubuntu1104--64--6-lvol1 /data/osd.3

root@ubuntu1104-64-6:/etc/ceph$ cosd -i 3 --mkfs --monmap /tmp/monmap --mkkey
 ** WARNING: Ceph is still under development.  Any feedback can be directed  **
 **          at ceph-devel@vger.kernel.org or http://ceph.newdream.net/.     **
2011-09-07 15:17:17.756554 7ff1ca3a4760 created object store /data/osd.3 journal /data/osd.3/journal for osd3 fsid d9dbbbfc-12ec-7d89-49cd-c91d6c598715
2011-09-07 15:17:17.756987 7ff1ca3a4760 created new key in keyring /etc/ceph/keyring.osd.3


root@ubuntu1104-64-6:/etc/ceph$ ceph auth add osd.3 osd 'allow *' mon 'allow rwx' -i /etc/ceph/keyring.osd.3
2011-09-07 15:17:55.574961 7f7d4bcc5740 read 85 bytes from /etc/ceph/keyring.osd.3
2011-09-07 15:17:55.578720 mon <- [auth,add,osd.3,osd,allow *,mon,allow rwx]
2011-09-07 15:17:55.790027 mon0 -> 'added key for osd.3' (0)


root@ubuntu1104-64-6:/etc/ceph$ ceph osd setmaxosd 4
2011-09-07 15:19:50.847283 mon <- [osd,setmaxosd,4]
2011-09-07 15:19:51.210703 mon0 -> 'set new max_osd = 4' (0)

root@ubuntu1104-64-6:/etc/ceph$ service ceph start osd.3
[/etc/ceph/fetch_config /tmp/fetched.ceph.conf.9423]
ceph.conf                                                         100% 4454     4.4KB/s   00:00
=== osd.3 ===
Mounting Btrfs on ubuntu1104-64-6:/data/osd.3
Scanning for Btrfs filesystems
Starting Ceph osd.3 on ubuntu1104-64-6...
starting osd3 at 0.0.0.0:6803/9532 osd_data /data/osd.3 /data/osd.3/journal

root@ubuntu1104-64-6:/etc/ceph$ ceph -s
2011-09-07 15:21:52.259399    pg v112: 594 pgs: 594 active+clean; 24 KB data, 65856 KB used, 191 GB / 200 GB avail
2011-09-07 15:21:52.260887   mds e4: 1/1/1 up {0=alpha=up:active}
2011-09-07 15:21:52.260924   osd e7: 4 osds: 4 up, 4 in
2011-09-07 15:21:52.260979   log 2011-09-07 15:21:48.274685 osd2 172.16.33.6:6800/9040 127 : [INF] 1.1p2 scrub ok
2011-09-07 15:21:52.261057   mon e1: 1 mons at {alpha=172.16.33.5:6789/0}
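
The six steps above (format the device, mount it, mkfs the OSD, register its key, raise max_osd, start the daemon) can be recapped in one script. This is a dry-run sketch: `run()` only echoes each command so the ordering can be reviewed safely, and the device path and OSD id are the ones used in this walkthrough.

```shell
# Dry-run recap of the OSD-expansion sequence (osd.3 on ubuntu1104-64-6).
# run() just prints each command; swap echo for "$@" to actually execute.
OSD_ID=3
DEV=/dev/mapper/ubuntu1104--64--6-lvol1

run() { echo "+ $*"; }

run mkfs.btrfs "$DEV"
run mount "$DEV" "/data/osd.$OSD_ID"
run cosd -i "$OSD_ID" --mkfs --monmap /tmp/monmap --mkkey
run ceph auth add "osd.$OSD_ID" osd 'allow *' mon 'allow rwx' \
    -i "/etc/ceph/keyring.osd.$OSD_ID"
run ceph osd setmaxosd 4
run service ceph start "osd.$OSD_ID"
```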


Ceph 0.34 build from source

If you want to install the latest version of Ceph rather than the 0.24 release that ships officially with Ubuntu 11.04, refer to the "Building from source" instructions at
http://ceph.newdream.net/wiki/Debian


# Step 1: install relative package to build source code
root@ubuntu1104-64-5:~/src$ apt-get install debhelper autotools-dev autoconf automake g++ gcc cdbs libfuse-dev libboost-dev libedit-dev libssl-dev libtool libexpat1-dev libfcgi-dev libatomic-ops-dev libgoogle-perftools-dev pkg-config libgtkmm-2.4-dev libcrypto++-dev python-dev

# Step 2: get the source code
root@ubuntu1104-64-5:~/src$ git clone git://ceph.newdream.net/git/ceph.git

# Step 3: get the stable version
root@ubuntu1104-64-5:~/src$ cd ceph
root@ubuntu1104-64-5:~/src/ceph$ git checkout -b stable origin/stable

# Step 4: Build the .deb installation package
root@ubuntu1104-64-5:~/src/ceph$ dpkg-buildpackage -j16

The preparatory steps take a while; once the build finishes, go up one directory and you will see that all the .deb packages have been generated:

root@ubuntu1104-64-5:~/src$ ls
ceph                                    libceph-dev_0.34-1_amd64.deb
ceph_0.34-1_amd64.changes               librados2_0.34-1_amd64.deb
ceph_0.34-1_amd64.deb                   librados2-dbg_0.34-1_amd64.deb
ceph_0.34-1.dsc                         librados-dev_0.34-1_amd64.deb
ceph_0.34-1.tar.gz                      librbd1_0.34-1_amd64.deb
ceph-client-tools_0.34-1_amd64.deb      librbd1-dbg_0.34-1_amd64.deb
ceph-client-tools-dbg_0.34-1_amd64.deb  librbd-dev_0.34-1_amd64.deb
ceph-dbg_0.34-1_amd64.deb               librgw1_0.34-1_amd64.deb
ceph-fuse_0.34-1_amd64.deb              librgw1-dbg_0.34-1_amd64.deb
ceph-fuse-dbg_0.34-1_amd64.deb          librgw-dev_0.34-1_amd64.deb
gceph_0.34-1_amd64.deb                  obsync_0.34-1_amd64.deb
gceph-dbg_0.34-1_amd64.deb              python-ceph_0.34-1_amd64.deb
libceph1_0.34-1_amd64.deb               radosgw_0.34-1_amd64.deb
libceph1-dbg_0.34-1_amd64.deb           radosgw-dbg_0.34-1_amd64.deb

So install all of the .debs. During installation I found a few dependency packages were still missing, so install those first:

root@ubuntu1104-64-5:~/src$ apt-get install libxslt1.1 python-boto python-pyxattr python-lxml
root@ubuntu1104-64-5:~/src$ dpkg -i *.deb
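
If you would rather skip the `-dbg` packages (they only carry debug symbols), a sketch of filtering the build output first — the file below is a trimmed copy of the `ls` listing above, used so the filter can be shown standalone:

```shell
# Trimmed copy of the generated-package list from the `ls` output.
cat > /tmp/debs.txt <<'EOF'
ceph_0.34-1_amd64.deb
ceph-dbg_0.34-1_amd64.deb
librados2_0.34-1_amd64.deb
librados2-dbg_0.34-1_amd64.deb
librbd1_0.34-1_amd64.deb
librbd1-dbg_0.34-1_amd64.deb
EOF

# Drop the -dbg debug-symbol variants; `--` keeps grep from reading the
# pattern's leading dash as an option.
grep -v -- '-dbg_' /tmp/debs.txt
# then, in the real build directory: dpkg -i $(ls *.deb | grep -v -- '-dbg_')
```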

# After installation, check the version
root@ubuntu1104-64-5:~/src$ ceph --version
ceph version 0.34-4-g7a8ab74 (commit:7a8ab747addf493cb4b82351aeb3c2e07ba46a95)


The whole process was fairly smooth, with no major problems.


Added 2011/12/20:
Later releases can simply be installed with sudo apt-get install ceph python-ceph. But if you still want to modify the code yourself, the original build-from-source procedure works; I tried it again with 0.39.
Another page worth a look:
http://ceph.newdream.net/wiki/Checking_out