Containerized Ceph and Deis

Chris Armstrong / @carmstrong_afk


  • Deis resident gingeneer / team lead
  • Former SaaS builder
  • Former ops engineer

What is Deis?


  • Application Platform (PaaS)
  • Private Heroku
  • Go, Python & Shell
  • 100% open source

Philosophy


  • Focus on 12 Factor Apps
  • Release early, release often
  • Keep a stable developer workflow
  • Integrate with open source ecosystem

  • ~4000 stars, ~600 forks
  • 200+ Deis deployments daily
  • 5 full-time devs, 100+ contributors

Why PaaS?

Developer Self-Service

  • Create applications
  • Deploy code or Docker images
  • Configure runtime environment
  • Manage releases and rollbacks
  • Run admin commands
  • View aggregated logs
  • Scale via the process model
  • Collaborate with a team
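
In practice that self-service workflow is a handful of CLI commands; roughly (app name, config values, and versions are illustrative):

$ deis create myapp
$ git push deis master
$ deis config:set DATABASE_URL=postgres://...
$ deis scale web=3
$ deis run ./manage.py migrate
$ deis logs
$ deis rollback v2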

Division of Responsibility

  • Developers own the containers
  • Operations own the platform

How does it work?

We've come a long way (to HA)...

Deis pre-1.0

  • No failover!
  • First: services moved hosts (failover, but without their data)
  • Then: services stayed down with their host (data intact, but downtime)

Requirements

  • Blob store for Docker registry
  • Blob store for Postgres WAL logs
  • Shared filesystem for application logs
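
These map onto Ceph's interfaces: the RADOS gateway provides an S3-compatible blob store for the registry and WAL shipping, and CephFS provides the shared filesystem for logs. A rough sketch (gateway endpoint, port, bucket, and mount path are illustrative, not Deis defaults):

# blob storage: point any S3 client at the RADOS gateway
$ s3cmd --access_key=$AWS_ACCESS_KEY_ID --secret_key=$AWS_SECRET_ACCESS_KEY \
    --host=deis-store-gateway:8888 --host-bucket=deis-store-gateway:8888 \
    --no-ssl mb s3://registry

# shared filesystem: mount CephFS where the log shipper expects it
$ sudo mount -t ceph 172.17.8.100:6789:/ /var/lib/deis/store \
    -o name=admin,secretfile=/etc/ceph/admin.secret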

deis-store

Why Ceph?

  • Choice of how to consume data (object, block, or filesystem)
  • Built for scale
  • Vibrant community
  • Awesome release names

deis-store

  • monitor
  • daemon
  • gateway
  • metadata
  • volume

All in containers!


# Ubuntu base with the upstream Ceph "firefly" packages
FROM ubuntu:14.04
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -yq curl
# trust the Ceph release key and add the firefly repo for trusty
RUN curl -sSL 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key add -
RUN echo "deb http://ceph.com/debian-firefly trusty main" > /etc/apt/sources.list.d/ceph.list
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get install -yq ceph
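
Building it is a single command (image tag hypothetical; the real Deis store images layer their entrypoints on top of a base like this):

$ docker build -t deis/store-base .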
					

Easy peasy


deis-store-daemon.service   ab45ad38.../172.17.8.101    loaded  active  running
deis-store-daemon.service   b1dfc947.../172.17.8.100    loaded  active  running
deis-store-daemon.service   e5ef099b.../172.17.8.102    loaded  active  running
deis-store-gateway.service  ab45ad38.../172.17.8.101    loaded  active  running
deis-store-metadata.service ab45ad38.../172.17.8.101    loaded  active  running
deis-store-metadata.service b1dfc947.../172.17.8.100    loaded  active  running
deis-store-metadata.service e5ef099b.../172.17.8.102    loaded  active  running
deis-store-monitor.service  ab45ad38.../172.17.8.101    loaded  active  running
deis-store-monitor.service  b1dfc947.../172.17.8.100    loaded  active  running
deis-store-monitor.service  e5ef099b.../172.17.8.102    loaded  active  running
deis-store-volume.service   ab45ad38.../172.17.8.101    loaded  active  running
deis-store-volume.service   b1dfc947.../172.17.8.100    loaded  active  running
deis-store-volume.service   e5ef099b.../172.17.8.102    loaded  active  running
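
That listing is fleet's view of the store units; on any host in the cluster it comes from something like:

$ fleetctl list-units | grep deis-store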
					

...or is it?

  • Cluster bootstrapping
  • Real hostnames

Cluster bootstrapping

  • First monitor has some work to do
  • All Ceph clients need ceph.conf

First monitor to boot


# publish cluster-wide defaults to etcd (only if not already set)
etcd_set_default size ${NUM_STORES}
etcd_set_default minSize 1
etcd_set_default pgNum ${PG_NUM}
etcd_set_default delayStart 15

# generate the admin/monitor keyrings and the initial monitor map
ceph-authtool /etc/ceph/ceph.client.admin.keyring --create-keyring --gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
ceph-authtool /etc/ceph/ceph.mon.keyring --create-keyring --gen-key -n mon. --cap mon 'allow *'
fsid=$(uuidgen)
monmaptool --create --add ${HOSTNAME} ${HOST} --fsid ${fsid} /etc/ceph/monmap

# share the fsid and keyrings through etcd so later monitors can join
etcdctl --no-sync -C $ETCD set ${ETCD_PATH}/fsid ${fsid} >/dev/null
etcdctl --no-sync -C $ETCD set ${ETCD_PATH}/monKeyring < /etc/ceph/ceph.mon.keyring >/dev/null
etcdctl --no-sync -C $ETCD set ${ETCD_PATH}/adminKeyring < /etc/ceph/ceph.client.admin.keyring >/dev/null
echo "store-monitor: setup complete."
					

/etc/ceph/ceph.conf


[global]
fsid = {{ .deis_store_fsid }}
mon initial members = {{ .deis_store_monSetupLock }}
mon host = {{ range $index, $mon := .deis_store_hosts }}{{ if $index }}, {{ end }}{{ $mon.Value }}{{ end }}
mon addr = {{ range $index, $mon := .deis_store_hosts }}{{ if $index }}, {{ end }}{{ Base $mon.Key }}:6789{{ end }}
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd pool default size = {{ .deis_store_size }}
osd pool default min_size = {{ .deis_store_minSize }}
osd pool default pg_num = {{ .deis_store_pgNum }}
osd pool default pgp_num = {{ .deis_store_pgNum }}
osd recovery delay start = {{ .deis_store_delayStart }}
log file = /dev/stdout
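
For the three-node cluster shown in the fleetctl listing, confd renders the template into something like this (fsid, hostnames, and pg counts illustrative):

[global]
fsid = 0f44c2b2-9d27-44f0-a737-8f4e02a856d1
mon initial members = deis-1
mon host = deis-1, deis-2, deis-3
mon addr = 172.17.8.100:6789, 172.17.8.101:6789, 172.17.8.102:6789
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd pool default size = 3
osd pool default min_size = 1
osd pool default pg_num = 128
osd pool default pgp_num = 128
osd recovery delay start = 15
log file = /dev/stdout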
					

Hostnames are important


$ docker exec -it deis-store-daemon bash
root@c4b655efa6d3:/#
					

$ docker run \
--name deis-store-daemon \
--volumes-from=deis-store-daemon-data \
--rm \
--net host \
deis/store-daemon:v1.4.1
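
With --net host (and no explicit --hostname), the container picks up the host's hostname instead of a generated container ID, which is what Ceph expects (output illustrative):

$ docker exec deis-store-daemon hostname
deis-1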
					

deis-store-monitor


# each monitor publishes its host's IP -> hostname mapping to etcd
etcdctl set /deis/store/hosts/$COREOS_PRIVATE_IPV4 `hostname`
					

/etc/hosts


{{ range $key := .deis_store_hosts }}{{ Base $key.Key }}      {{ $key.Value }}
{{ end }}
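
Rendered against the same three hosts, that template is just the usual IP-to-hostname mapping (hostnames illustrative):

172.17.8.100      deis-1
172.17.8.101      deis-2
172.17.8.102      deis-3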
					

Whew! We're done!

...or are we?

More container fun

  • Docker's btrfs data volumes
  • Dynamically resizing clusters

btrfs + Docker quirks


Without --privileged, the OSD can't take btrfs snapshots and falls back to write-ahead journaling:

mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
				

$ docker run --privileged deis/store-daemon
				

Run with --privileged, the OSD can use btrfs snapshots and switches to parallel journaling:

mount: enabling PARALLEL journal mode: fs, checkpoint is enabled
				

Scaling events are common

  • Too many stale hosts will result in lost quorum
  • Do we really need n monitors/daemons?
  • How do we know when a host is *really* dead?
  • How do we know when the cluster is healthy again?
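
One answer to the last question is to ask Ceph itself and poll until the cluster settles before touching the next node. A minimal sketch, assuming the deis-store-admin container used below is running on the host:

# wait for the cluster to report HEALTH_OK before continuing
until docker exec deis-store-admin ceph health | grep -q HEALTH_OK; do
  echo "cluster still recovering..."
  sleep 10
done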

Node removal is involved


$ etcdctl get /deis/store/osds/172.17.8.100
2
$ nse deis-store-admin
# ceph osd out 2
marked out osd.2.
$ docker stop deis-store-daemon
$ nse deis-store-admin
# ceph osd crush remove osd.2
removed item id 2 name 'osd.2' from crush map
# ceph auth del osd.2
updated
# ceph osd rm 2
removed osd.2
$ etcdctl rm /deis/store/osds/172.17.8.100
$ etcdctl rm /deis/store/hosts/172.17.8.100
$ docker stop deis-store-monitor
$ nse deis-store-admin
# ceph mon remove deis-1
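
Those steps are scriptable; a rough sketch of the OSD half, assuming the daemon and monitor on the dead host are already stopped and using docker exec against deis-store-admin instead of an interactive nse session (host IP is a parameter):

#!/usr/bin/env bash
# remove the OSD that lived on a dead host, mirroring the manual steps above
set -e
HOST_IP="$1"    # e.g. 172.17.8.100
OSD_ID=$(etcdctl get /deis/store/osds/${HOST_IP})

docker exec deis-store-admin ceph osd out ${OSD_ID}
docker exec deis-store-admin ceph osd crush remove osd.${OSD_ID}
docker exec deis-store-admin ceph auth del osd.${OSD_ID}
docker exec deis-store-admin ceph osd rm ${OSD_ID}

etcdctl rm /deis/store/osds/${HOST_IP}
etcdctl rm /deis/store/hosts/${HOST_IP}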
					

We still have lots of work to do...

What's next?

  • ceph-docker (thanks Seán!)
  • Automatic monitor/daemon removal
  • Performance optimization

You can help!

Thanks!

  • twitter.com/carmstrong_afk
  • github.com/carmstrong
  • chris@opdemand.com