11 years ago · 07c4eda46a
--- a/docs/sources/articles/index.rst
+++ b/docs/sources/articles/index.rst
@@ -12,3 +12,4 @@ Articles
 
				 
			
 
    security
    baseimages
+   runmetrics
			
--- a/docs/sources/articles/runmetrics.rst
+++ b/docs/sources/articles/runmetrics.rst
@@ -0,0 +1,469 @@
 
+:title: Runtime Metrics
+:description: Measure the behavior of running containers
+:keywords: docker, metrics, CPU, memory, disk, IO, run, runtime
+
+.. _run_metrics:
+
+Runtime Metrics
+===============
+
+Linux Containers rely on `control groups
+<https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt>`_ which
+not only track groups of processes, but also expose metrics about CPU,
+memory, and block I/O usage. You can access those metrics and obtain
+network usage metrics as well. This is relevant for "pure" LXC
+containers, as well as for Docker containers.
+
+Control Groups
+--------------
+
+Control groups are exposed through a pseudo-filesystem. In recent
+distros, you should find this filesystem under
+``/sys/fs/cgroup``. Under that directory, you will see multiple
+sub-directories, called ``devices``, ``freezer``, ``blkio``, etc.; each
+sub-directory actually corresponds to a different cgroup hierarchy.
+
+On older systems, the control groups might be mounted on ``/cgroup``,
+without distinct hierarchies. In that case, instead of seeing the
+sub-directories, you will see a bunch of files in that directory, and
+possibly some directories corresponding to existing containers.
+
+To figure out where your control groups are mounted, you can run:
+
+::
+
+  grep cgroup /proc/mounts
+
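If you only need the mountpoints, you can filter on the filesystem type field instead of using the ``grep`` above. This is a minimal sketch. It assumes the usual six-field ``/proc/mounts`` layout, and the ``cgroup_mounts`` name is hypothetical:

```shell
#!/usr/bin/env bash
# List cgroup mountpoints and their mount options (the options include
# the controller names) from /proc/mounts-formatted input on stdin.
# /proc/mounts fields: device mountpoint fstype options dump pass
cgroup_mounts() {
  awk '$3 == "cgroup" { print $2, $4 }'
}

# Usage on a live system:
#   cgroup_mounts < /proc/mounts
```

Matching on the third field keeps out lines the plain ``grep`` would also print, such as a ``tmpfs`` mounted on ``/sys/fs/cgroup`` itself.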
			
 
+.. _run_findpid:
+
+Enumerating Cgroups
+-------------------
+
+You can look into ``/proc/cgroups`` to see the different control group
+subsystems known to the system, the hierarchy they belong to, and how
+many groups they contain.
+
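To read ``/proc/cgroups`` more easily, you can label its columns. The ``enabled`` column also shows whether a subsystem such as ``memory`` was disabled at boot. This sketch assumes the four-column layout (``subsys_name``, ``hierarchy``, ``num_cgroups``, ``enabled``), and the ``cgroup_subsystems`` helper is hypothetical:

```shell
#!/usr/bin/env bash
# Summarize /proc/cgroups-formatted input read on stdin.
# Columns: subsys_name hierarchy num_cgroups enabled
# (the first line is a "#" header and is skipped).
cgroup_subsystems() {
  awk '!/^#/ {
    state = ($4 == 1) ? "enabled" : "disabled"
    printf "%s hierarchy=%s groups=%s %s\n", $1, $2, $3, state
  }'
}

# Usage on a live system:
#   cgroup_subsystems < /proc/cgroups
```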
			
 
+You can also look at ``/proc/<pid>/cgroup`` to see which control
+groups a process belongs to. The control group will be shown as a path
+relative to the root of the hierarchy mountpoint; e.g. ``/`` means
+“this process has not been assigned to a particular group”, while
+``/lxc/pumpkin`` means that the process is likely to be a member of a
+container named ``pumpkin``.
+
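Each line of ``/proc/<pid>/cgroup`` has three colon-separated fields, so you can split it to pick out the group for a single controller. This sketch assumes the ``hierarchy-ID:controller-list:path`` line layout, and ``cgroup_path`` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the cgroup path of one controller from /proc/<pid>/cgroup-formatted
# input on stdin. Each line is "hierarchy-ID:controller-list:path", where
# the controller list may be comma-separated (e.g. "cpu,cpuacct").
cgroup_path() {
  awk -F: -v want="$1" '{
    n = split($2, ctrls, ",")
    for (i = 1; i <= n; i++) {
      if (ctrls[i] == want) {
        path = $0
        sub(/^[^:]*:[^:]*:/, "", path)   # keep everything after the 2nd colon
        print path
        exit
      }
    }
  }'
}

# Usage on a live system:
#   cgroup_path memory < /proc/self/cgroup
```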
			
 
+Finding the Cgroup for a Given Container
+----------------------------------------
+
+For each container, one cgroup will be created in each hierarchy. On
+older systems with older versions of the LXC userland tools, the name
+of the cgroup will be the name of the container. With more recent
+versions of the LXC tools, the cgroup will be ``lxc/<container_name>``.
+
+For Docker containers using cgroups, the container name will be the
+full ID or long ID of the container. If a container shows up as
+``ae836c95b4c3`` in ``docker ps``, its long ID might be something like
+``ae836c95b4c3c9e9179e0e91015512da89fdec91612f63cebae57df9a5444c79``. You
+can look it up with ``docker inspect`` or ``docker ps -notrunc``.
+
+Putting everything together: to look at the memory metrics for a
+Docker container, check ``/sys/fs/cgroup/memory/lxc/<longid>/``.
+
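``docker ps`` shows short IDs, but the cgroup directory is named after the long ID, so a glob can connect the two. This sketch assumes the ``lxc/<longid>`` layout described above. ``container_cgroup_dir`` and its argument order are hypothetical. If a short ID is ambiguous, the function simply returns the first match:

```shell
#!/usr/bin/env bash
# Resolve a container's cgroup directory from its short ID by globbing
# for the long ID, following the lxc/<longid> layout described above.
# Arguments: cgroup root (normally /sys/fs/cgroup), subsystem, short ID.
container_cgroup_dir() {
  local root=$1 subsys=$2 short=$3
  local match
  for match in "$root/$subsys/lxc/$short"*; do
    # An unmatched glob stays literal, so check that it is a directory.
    if [ -d "$match" ]; then
      echo "$match"
      return 0
    fi
  done
  echo "no cgroup found for $short under $root/$subsys" >&2
  return 1
}

# Usage on a live system:
#   container_cgroup_dir /sys/fs/cgroup memory ae836c95b4c3
```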
			
 
+Metrics from Cgroups: Memory, CPU, Block IO
+-------------------------------------------
+
+For each subsystem (memory, CPU, and block I/O), you will find one or
+more pseudo-files containing statistics.
+
+Memory Metrics: ``memory.stat``
+...............................
+
+Memory metrics are found in the "memory" cgroup. Note that the memory
+control group adds a little overhead, because it does very
+fine-grained accounting of the memory usage on your system. Therefore,
+many distros choose not to enable it by default. Generally, to enable
+it, all you have to do is add some kernel command-line parameters:
+``cgroup_enable=memory swapaccount=1``.
+
+The metrics are in the pseudo-file ``memory.stat``. Here is what it
+will look like:
+
+::
+
+  cache 11492564992
+  rss 1930993664
+  mapped_file 306728960
+  pgpgin 406632648
+  pgpgout 403355412
+  swap 0
+  pgfault 728281223
+  pgmajfault 1724
+  inactive_anon 46608384
+  active_anon 1884520448
+  inactive_file 7003344896
+  active_file 4489052160
+  unevictable 32768
+  hierarchical_memory_limit 9223372036854775807
+  hierarchical_memsw_limit 9223372036854775807
+  total_cache 11492564992
+  total_rss 1930993664
+  total_mapped_file 306728960
+  total_pgpgin 406632648
+  total_pgpgout 403355412
+  total_swap 0
+  total_pgfault 728281223
+  total_pgmajfault 1724
+  total_inactive_anon 46608384
+  total_active_anon 1884520448
+  total_inactive_file 7003344896
+  total_active_file 4489052160
+  total_unevictable 32768
+
+The first half (without the ``total_`` prefix) contains statistics
+relevant to the processes within the cgroup, excluding
+sub-cgroups. The second half (with the ``total_`` prefix) includes
+sub-cgroups as well.
+
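To pull a single value out of ``memory.stat`` (for instance, to feed a monitoring system), you only need to match on the first column. This is a minimal sketch, and the ``memstat_get`` helper is hypothetical:

```shell
#!/usr/bin/env bash
# Print the value of one field from memory.stat-formatted input on stdin.
# Ask for "total_<field>" to include sub-cgroups, as explained above.
# Exits non-zero if the field is not present.
memstat_get() {
  awk -v key="$1" '$1 == key { print $2; found = 1 } END { exit !found }'
}
```

For example, ``memstat_get total_rss < /sys/fs/cgroup/memory/lxc/<longid>/memory.stat`` would report RSS including sub-cgroups. The value is printed as the original string, so large numbers such as the limits above are not rounded.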
			
 
+Some metrics are "gauges", i.e. values that can increase or decrease
+(e.g. swap, the amount of swap space used by the members of the
+cgroup). Others are "counters", i.e. values that can only go up,
+because they represent occurrences of a specific event (e.g. pgfault,
+which indicates the number of page faults that have happened since the
+creation of the cgroup; this number can never decrease).
+
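Because counters only go up, they are usually reported as rates: you sample twice and divide the difference by the interval. This sketch works under that assumption. The hypothetical ``counter_rate`` helper refuses to compute a rate if the counter went down, which would suggest the cgroup was recreated between samples. Gauges such as swap are reported as they are:

```shell
#!/usr/bin/env bash
# Turn two samples of a counter into a per-second rate.
# Arguments: earlier value, later value, seconds between the samples.
counter_rate() {
  awk -v a="$1" -v b="$2" -v dt="$3" 'BEGIN {
    if (dt <= 0 || b < a) exit 1   # bad interval, or the counter was reset
    printf "%.2f\n", (b - a) / dt
  }'
}

# Usage sketch ($CG is a hypothetical cgroup directory):
#   a=$(awk '$1 == "pgfault" { print $2 }' "$CG/memory.stat"); sleep 10
#   b=$(awk '$1 == "pgfault" { print $2 }' "$CG/memory.stat")
#   counter_rate "$a" "$b" 10
```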
			
 
+cache
+  the amount of memory used by the processes of this control group
+  that can be associated precisely with a block on a block
+  device. When you read and write files from and to disk, this amount
+  will increase. This will be the case if you use "conventional" I/O
+  (``open``, ``read``, ``write`` syscalls) as well as mapped files
+  (with ``mmap``). It also accounts for the memory used by ``tmpfs``
+  mounts, though the reasons are unclear.
+
+rss
+  the amount of memory that *doesn't* correspond to anything on
+  disk: stacks, heaps, and anonymous memory maps.
+
+mapped_file
+  indicates the amount of memory mapped by the processes in the
+  control group. It doesn't give you information about *how much*
+  memory is used; it rather tells you *how* it is used.
+
+pgpgin and pgpgout
+  correspond to *charging events*. Each time a page is "charged"
+  (i.e. added to the accounting) to a cgroup, pgpgin increases. When a
+  page is "uncharged" (i.e. no longer "billed" to a cgroup), pgpgout
+  increases.
+
+pgfault and pgmajfault
+  indicate the number of times that a process of the cgroup triggered
+  a "page fault" and a "major fault", respectively. A page fault
+  happens when a process accesses a part of its virtual memory space
+  which is nonexistent or protected. This can happen if the
+  process is buggy and tries to access an invalid address (it will
+  then be sent a ``SIGSEGV`` signal, typically killing it with the
+  famous ``Segmentation fault`` message). It can also happen when
+  the process reads from a memory zone which has been swapped out, or
+  which corresponds to a mapped file: in that case, the kernel will
+  load the page from disk, and let the CPU complete the memory
+  access. Finally, it can happen when the process writes to a
+  copy-on-write memory zone: in that case, the kernel will preempt the
+  process, duplicate the memory page, and resume the write operation
+  on the process's own copy of the page. "Major" faults happen when the
+  kernel actually has to read the data from disk. When it just has to
+  duplicate an existing page, or allocate an empty page, it's a
+  regular (or "minor") fault.
+
+swap
+  the amount of swap currently used by the processes in this cgroup.
+
+active_anon and inactive_anon
+  the amount of *anonymous* memory that has been identified as
+  respectively *active* and *inactive* by the kernel. "Anonymous"
+  memory is the memory that is *not* linked to disk pages. In other
+  words, that's the equivalent of the rss counter described above. In
+  fact, the very definition of the rss counter is **active_anon** +
+  **inactive_anon** - **tmpfs** (where tmpfs is the amount of memory
+  used up by ``tmpfs`` filesystems mounted by this control
+  group). Now, what's the difference between "active" and "inactive"?
+  Pages are initially "active"; and at regular intervals, the kernel
+  sweeps over the memory, and tags some pages as "inactive". Whenever
+  they are accessed again, they are immediately retagged
+  "active". When the kernel is almost out of memory, and time comes to
+  swap out to disk, the kernel will swap "inactive" pages.
+
+active_file and inactive_file
+  cache memory, with *active* and *inactive* similar to the *anon*
+  memory above. The exact formula is cache = **active_file** +
+  **inactive_file** + **tmpfs**. The exact rules used by the kernel to
+  move memory pages between active and inactive sets are different
+  from the ones used for anonymous memory, but the general principle
+  is the same. Note that when the kernel needs to reclaim memory, it
+  is cheaper to reclaim a clean (i.e. unmodified) page from this pool,
+  since it can be reclaimed immediately (while anonymous pages and
+  dirty/modified pages have to be written to disk first).
+
+unevictable
+  the amount of memory that cannot be reclaimed; generally, it will
+  account for memory that has been "locked" with ``mlock``. It is
+  often used by crypto frameworks to make sure that secret keys and
+  other sensitive material never get swapped out to disk.
+
+hierarchical_memory_limit and hierarchical_memsw_limit
+  These are not really metrics, but a reminder of the limits applied
+  to this cgroup. The first one indicates the maximum amount of
+  physical memory that can be used by the processes of this control
+  group; the second one indicates the maximum amount of RAM+swap.
+
+Accounting for memory in the page cache is very complex. If two
+processes in different control groups both read the same file
+(ultimately relying on the same blocks on disk), the corresponding
+memory charge will be split between the control groups. It's nice, but
+it also means that when a cgroup is terminated, it could increase the
+memory usage of another cgroup, because they are not splitting the
+cost anymore for those memory pages.
+
			
 
+CPU metrics: ``cpuacct.stat``
+.............................
+
+Now that we've covered memory metrics, everything else will look very
+simple in comparison. CPU metrics will be found in the ``cpuacct``
+controller.
+
+For each container, you will find a pseudo-file ``cpuacct.stat``,
+containing the CPU usage accumulated by the processes of the
+container, broken down between ``user`` and ``system`` time. If you're
+not familiar with the distinction, ``user`` is the time during which
+the processes were in direct control of the CPU (i.e. executing
+process code), and ``system`` is the time during which the CPU was
+executing system calls on behalf of those processes.
+
+Those times are expressed in ticks of 1/100th of a second. Actually,
+they are expressed in "user jiffies". There are ``USER_HZ``
+*"jiffies"* per second, and on x86 systems, ``USER_HZ`` is 100. This
+used to map exactly to the number of scheduler "ticks" per second; but
+with the advent of higher frequency scheduling, as well as `tickless
+kernels <http://lwn.net/Articles/549580/>`_, the number of kernel
+ticks wasn't relevant anymore. It stuck around anyway, mainly for
+legacy and compatibility reasons.
+
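Converting user jiffies to seconds is a division by ``USER_HZ``. This sketch takes the tick rate as an argument rather than hard-coding 100. On Linux, ``getconf CLK_TCK`` reports that value. ``cpuacct_seconds`` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Convert cpuacct.stat-formatted input on stdin ("user N" / "system N",
# counted in USER_HZ ticks) into seconds. The tick rate is passed as $1.
cpuacct_seconds() {
  awk -v hz="$1" '$1 == "user" || $1 == "system" {
    printf "%s %.2f\n", $1, $2 / hz
  }'
}

# Usage on a live system (path follows the lxc/<longid> layout above):
#   cpuacct_seconds "$(getconf CLK_TCK)" < /sys/fs/cgroup/cpuacct/lxc/<longid>/cpuacct.stat
```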
			
 
+Block I/O metrics
+.................
+
+Block I/O is accounted in the ``blkio`` controller. Different metrics
+are scattered across different files. While you can find in-depth
+details in the `blkio-controller
+<https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt>`_
+file in the kernel documentation, here is a short list of the most
+relevant ones:
+
+blkio.sectors
+  contains the number of 512-byte sectors read and written by the
+  processes that are members of the cgroup, device by device. Reads and
+  writes are merged in a single counter.
+
+blkio.io_service_bytes
+  indicates the number of bytes read and written by the cgroup. It has
+  4 counters per device, because for each device, it differentiates
+  between synchronous vs. asynchronous I/O, and reads vs. writes.
+
+blkio.io_serviced
+  the number of I/O operations performed, regardless of their size. It
+  also has 4 counters per device.
+
+blkio.io_queued
+  indicates the number of I/O operations currently queued for this
+  cgroup. In other words, if the cgroup isn't doing any I/O, this will
+  be zero. Note that the opposite is not true: if there is no I/O
+  queued, it does not mean that the cgroup is idle (I/O-wise). It
+  could be doing purely synchronous reads on an otherwise quiescent
+  device, which is therefore able to handle them immediately, without
+  queuing. Also, while it is helpful to figure out which cgroup is
+  putting stress on the I/O subsystem, keep in mind that it is a
+  relative quantity. Even if a process group does not perform more
+  I/O, its queue size can increase just because the device load
+  increases because of other cgroups using the same device.
+
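As an illustration, you can sum the per-device lines of ``blkio.io_service_bytes`` into overall read and write totals. This sketch assumes one ``MAJ:MIN Operation value`` entry per line, followed by a final two-field ``Total`` line. The same layout should apply to ``blkio.io_serviced``. ``blkio_rw_totals`` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sum the per-device "Read" and "Write" lines of blkio.io_service_bytes
# (or blkio.io_serviced) input on stdin. The trailing grand-total line
# has only two fields and is ignored.
blkio_rw_totals() {
  awk 'NF == 3 && $2 == "Read"  { r += $3 }
       NF == 3 && $2 == "Write" { w += $3 }
       END { printf "read=%.0f write=%.0f\n", r, w }'
}

# Usage on a live system:
#   blkio_rw_totals < /sys/fs/cgroup/blkio/lxc/<longid>/blkio.io_service_bytes
```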
			
 
+Network Metrics
+---------------
+
+Network metrics are not exposed directly by control groups. There is a
+good explanation for that: network interfaces exist within the context
+of *network namespaces*. The kernel could probably accumulate metrics
+about packets and bytes sent and received by a group of processes, but
+those metrics wouldn't be very useful. You want per-interface metrics
+(because traffic happening on the local ``lo`` interface doesn't
+really count). But since processes in a single cgroup can belong to
+multiple network namespaces, those metrics would be harder to
+interpret: multiple network namespaces mean multiple ``lo``
+interfaces, potentially multiple ``eth0`` interfaces, etc.; which is
+why there is no easy way to gather network metrics with control
+groups.
+
+Instead we can gather network metrics from other sources:
+
+IPtables
+........
+
+IPtables (or rather, the netfilter framework for which iptables is
+just an interface) can do some serious accounting.
+
+For instance, you can set up a rule to account for the outbound HTTP
+traffic on a web server:
+
+::
+
+  iptables -I OUTPUT -p tcp --sport 80
+
+There is no ``-j`` or ``-g`` flag, so the rule will just count matched
+packets and go to the following rule.
+
+Later, you can check the values of the counters, with:
+
+::
+
+  iptables -nxvL OUTPUT
+
+Technically, ``-n`` is not required, but it will prevent iptables from
+doing reverse DNS lookups, which are probably useless in this
+scenario.
+
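With ``-x``, the counters are printed as exact numbers, and the first two columns of each rule line are always the packet and byte counts. This holds even when the target column is empty, as in the rule above. The following sketch sums those counts for the rules that match a pattern. ``iptables_counters`` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sum the packet and byte counters of every rule line in
# `iptables -nxvL <chain>` output (read on stdin) that matches the
# extended regular expression given as $1. Header lines are skipped
# because their first column is not a number.
iptables_counters() {
  awk -v pat="$1" '$1 ~ /^[0-9]+$/ && $0 ~ pat { p += $1; b += $2 }
       END { printf "packets=%.0f bytes=%.0f\n", p, b }'
}

# Usage on a live system:
#   iptables -nxvL OUTPUT | iptables_counters 'spt:80'
```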
			
 
+Counters include packets and bytes. If you want to set up metrics for
+container traffic like this, you could execute a ``for`` loop to add
+two ``iptables`` rules per container IP address (one in each
+direction), in the ``FORWARD`` chain. This will only meter traffic
+going through the NAT layer; you will also have to add traffic going
+through the userland proxy.
+
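The ``for`` loop described above might look like the following sketch. It prints the two counting-only rules for each address instead of running them, so you can review them first. The ``forward_accounting_rules`` name and the example addresses are hypothetical:

```shell
#!/usr/bin/env bash
# Print two accounting rules (no -j target, so packets are only counted)
# per container IP address in the FORWARD chain: one for traffic from
# the container, one for traffic to it. Nothing is executed here; pipe
# the output to `sh` as root to apply the rules.
forward_accounting_rules() {
  local ip
  for ip in "$@"; do
    echo "iptables -I FORWARD -s $ip"
    echo "iptables -I FORWARD -d $ip"
  done
}

# Usage (the addresses are hypothetical examples):
#   forward_accounting_rules 172.17.0.2 172.17.0.3 | sudo sh
```

As noted above, these rules do not see traffic that goes through the userland proxy.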
			
 
+Then, you will need to check those counters on a regular basis. If you
+happen to use ``collectd``, there is a nice plugin to automate
+iptables counters collection.
+
+Interface-level counters
+........................
+
+Since each container has a virtual Ethernet interface, you might want
+to check the TX and RX counters of this interface directly. You will
+notice that each container is associated with a virtual Ethernet
+interface in your host, with a name like ``vethKk8Zqi``. Figuring out
+which interface corresponds to which container is, unfortunately,
+difficult.
+
+But for now, the best way is to check the metrics *from within the
+containers*. To accomplish this, you can run an executable from the
+host environment within the network namespace of a container using
+**ip-netns magic**.
+
+The ``ip netns exec`` command will let you execute any program
+(present in the host system) within any network namespace visible to
+the current process. This means that your host will be able to enter
+the network namespace of your containers, but your containers won't be
+able to access the host, nor their sibling containers. Containers will
+be able to “see” and affect their sub-containers, though.
+
			
 
+The exact format of the command is::
+
+  ip netns exec <nsname> <command...>
+
+For example::
+
+  ip netns exec mycontainer netstat -i
+
+``ip netns`` finds the "mycontainer" container by using namespace
+pseudo-files. Each process belongs to one network namespace, one PID
+namespace, one ``mnt`` namespace, etc., and those namespaces are
+materialized under ``/proc/<pid>/ns/``. For example, the network
+namespace of PID 42 is materialized by the pseudo-file
+``/proc/42/ns/net``.
+
+When you run ``ip netns exec mycontainer ...``, it expects
+``/var/run/netns/mycontainer`` to be one of those
+pseudo-files. (Symlinks are accepted.)
+
+In other words, to execute a command within the network namespace of a
+container, we need to:
+
+* find out the PID of any process within the container that we want to
+  investigate;
+* create a symlink from ``/var/run/netns/<somename>`` to
+  ``/proc/<thepid>/ns/net``;
+* execute ``ip netns exec <somename> ....``
+
+Please review :ref:`run_findpid` to learn how to find the cgroup of a
+process running in the container of which you want to measure network
+usage. From there, you can examine the pseudo-file named ``tasks``,
+which contains the PIDs that are in the control group (i.e. in the
+container). Pick any one of them.
+
+Putting everything together, if the "short ID" of a container is held
+in the environment variable ``$CID``, then you can do this::
+
+  # The glob expands the short ID to the long ID (lxc/<longid> layout)
+  TASKS=/sys/fs/cgroup/devices/lxc/$CID*/tasks
+  # $TASKS is left unquoted on purpose, so that the glob expands here
+  PID=$(head -n 1 $TASKS)
+  mkdir -p /var/run/netns
+  ln -sf /proc/$PID/ns/net /var/run/netns/$CID
+  ip netns exec $CID netstat -i
+
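As an alternative to ``netstat -i``, ``/proc/net/dev`` gives per-interface counters. It reflects the network namespace of the process reading it, so ``ip netns exec <nsname> cat /proc/net/dev`` shows the container's interfaces. The following sketch extracts the byte counters from that output. It assumes the usual layout, where 8 receive fields are followed by the transmit fields. ``netdev_bytes`` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Print RX and TX byte counters for one interface from
# /proc/net/dev-formatted input on stdin. After the "iface:" prefix,
# field 1 is RX bytes and field 9 is TX bytes.
# Exits non-zero if the interface is not listed.
netdev_bytes() {
  awk -v ifc="$1" '{
    line = $0
    sub(/^[ \t]+/, "", line)             # drop leading padding
    if (index(line, ifc ":") != 1) next  # keep only the requested interface
    line = substr(line, length(ifc) + 2) # drop "iface:" (may lack a space)
    split(line, f, " ")
    printf "rx_bytes=%s tx_bytes=%s\n", f[1], f[9]
    found = 1
  } END { exit !found }'
}

# Usage on a live system, reusing $CID from the example above:
#   ip netns exec $CID cat /proc/net/dev | netdev_bytes eth0
```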
			
 
+Tips for high-performance metric collection
+-------------------------------------------
+
+Note that running a new process each time you want to update metrics
+is (relatively) expensive. If you want to collect metrics at high
+resolutions, and/or over a large number of containers (think 1000
+containers on a single host), you do not want to fork a new process
+each time.
+
+Here is how to collect metrics from a single process. You will have to
+write your metric collector in C (or any language that lets you do
+low-level system calls). You need to use a special system call,
+``setns()``, which lets the current process enter any arbitrary
+namespace. It requires an open file descriptor to the namespace
+pseudo-file (remember: that’s the pseudo-file in
+``/proc/<pid>/ns/net``).
+
+However, there is a catch: you must not keep this file descriptor
+open. If you do, when the last process of the control group exits, the
+namespace will not be destroyed, and its network resources (like the
+virtual interface of the container) will stay around forever (or
+until you close that file descriptor).
+
+The right approach would be to keep track of the first PID of each
+container, and re-open the namespace pseudo-file each time.
+
			
 
+Collecting metrics when a container exits
+-----------------------------------------
+
+Sometimes, you do not care about real-time metric collection, but when
+a container exits, you want to know how much CPU, memory, etc. it has
+used.
+
+Docker makes this difficult because it relies on ``lxc-start``, which
+carefully cleans up after itself, but it is still possible. It is
+usually easier to collect metrics at regular intervals (e.g. every
+minute, with the collectd LXC plugin) and rely on that instead.
+
+But, if you'd still like to gather the stats when a container stops,
+here is how:
+
+For each container, start a collection process, and move it to the
+control groups that you want to monitor by writing its PID to the
+tasks file of the cgroup. The collection process should periodically
+re-read the tasks file to check if it's the last process of the
+control group. (If you also want to collect network statistics as
+explained in the previous section, you should also move the process to
+the appropriate network namespace.)
+
+When the container exits, ``lxc-start`` will try to delete the control
+groups. It will fail, since the control group is still in use; but
+that’s fine. Your process should now detect that it is the only one
+remaining in the group. Now is the right time to collect all the
+metrics you need!
+
+Finally, your process should move itself back to the root control
+group, and remove the container control group. To remove a control
+group, just ``rmdir`` its directory. It's counter-intuitive to
+``rmdir`` a directory as it still contains files; but remember that
+this is a pseudo-filesystem, so usual rules don't apply. After the
+cleanup is done, the collection process can exit safely.
+
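The procedure above can be sketched in bash for a single hierarchy. The paths, and the ``is_last_task`` and ``collect_on_exit`` names, are assumptions. A real collector would join and leave every hierarchy it monitors. The check reads the ``tasks`` file with shell builtins only, because any external command it started would itself be placed in the cgroup and show up in ``tasks``:

```shell
#!/usr/bin/env bash
# Return success if the tasks file $1 lists exactly one PID, equal to $2.
# Only shell builtins are used: an external command (cat, wc, ...) would
# itself run inside the cgroup and show up in the tasks file.
is_last_task() {
  local line count=0 only=""
  while IFS= read -r line; do
    count=$((count + 1))
    only=$line
  done < "$1"
  [ "$count" -eq 1 ] && [ "$only" = "$2" ]
}

# Sketch of the collection process for a single hierarchy (run as root,
# from the main shell so that $$ is this process). $1 is the container's
# cgroup directory, e.g. /sys/fs/cgroup/memory/lxc/<longid>; $2 is the
# hierarchy root, e.g. /sys/fs/cgroup/memory. Both are assumptions.
collect_on_exit() {
  local cg=$1 root=$2
  echo $$ > "$cg/tasks"        # join the container's cgroup
  until is_last_task "$cg/tasks" $$; do
    sleep 1                     # sleep briefly runs inside the cgroup too
  done
  cat "$cg/memory.stat"         # collect whatever metrics you need
  echo $$ > "$root/tasks"      # move back to the root cgroup
  rmdir "$cg"                   # remove the now-empty cgroup
}
```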
			
--- a/docs/sources/reference/builder.rst
+++ b/docs/sources/reference/builder.rst
@@ -1,12 +1,12 @@
 
-:title: Build Images (Dockerfile Reference)
+:title: Dockerfile Reference
 :description: Dockerfiles use a simple DSL which allows you to automate the steps you would normally manually take to create an image.
 :keywords: builder, docker, Dockerfile, automation, image creation
 
 .. _dockerbuilder:
 
-===================================
-Build Images (Dockerfile Reference)
-===================================
+====================
+Dockerfile Reference
+====================
 
 **Docker can act as a builder** and read instructions from a text
 ``Dockerfile`` to automate the steps you would otherwise take manually
			
--- a/docs/sources/reference/commandline/cli.rst
+++ b/docs/sources/reference/commandline/cli.rst
@@ -18,6 +18,45 @@ To list available commands, either run ``docker`` with no parameters or execute
 
				 
			
 
     ...
 
+.. _cli_options:
+
+Types of Options
+----------------
+
+Boolean
+~~~~~~~
+
+Boolean options look like ``-d=false``. The value you see is the
+default value, which gets set if you do **not** use the boolean
+flag. If you do call ``run -d``, that sets the opposite boolean value,
+in this case ``true``, so ``docker run -d`` **will** run in
+"detached" mode, in the background. Other boolean options are similar:
+specifying them will set the value to the opposite of the default
+value.
+
+Multi
+~~~~~
+
+Options like ``-a=[]`` indicate they can be specified multiple times::
+
+  docker run -a stdin -a stdout -a stderr -i -t ubuntu /bin/bash
+
+Sometimes this can use a more complex value string, as for ``-v``::
+
+  docker run -v /host:/container example/mysql
+
+Strings and Integers
+~~~~~~~~~~~~~~~~~~~~
+
+Options like ``-name=""`` expect a string, and they can only be
+specified once. Options like ``-c=0`` expect an integer, and they can
+only be specified once.
+
+----
+
+Commands
+--------
+
 .. _cli_daemon:
 
 ``daemon``
			
--- a/docs/sources/reference/index.rst
+++ b/docs/sources/reference/index.rst
@@ -14,4 +14,5 @@ Contents:
 
				 
			
 
    commandline/index
    builder
+   run
    api/index
			
--- a/docs/sources/reference/run.rst
+++ b/docs/sources/reference/run.rst
@@ -0,0 +1,353 @@
 
+:title: Docker Run Reference
+:description: Configure containers at runtime
+:keywords: docker, run, configure, runtime
+
+.. _run_docker:
+
+====================
+Docker Run Reference
+====================
+
+**Docker runs processes in isolated containers**. When an operator
+executes ``docker run``, she starts a process with its own file
+system, its own networking, and its own isolated process tree. The
+:ref:`image_def` which starts the process may define defaults related
+to the binary to run, the networking to expose, and more, but ``docker
+run`` gives final control to the operator who starts the container
+from the image. That's the main reason :ref:`cli_run` has more options
+than any other ``docker`` command.
+
+Every one of the :ref:`example_list` shows running containers, and so
+here we try to give more in-depth guidance.
+
+.. contents:: Table of Contents
+
+.. _run_running:
+
+General Form
+============
+
+As you've seen in the :ref:`example_list`, the basic ``run`` command
+takes this form::
+
+  docker run [OPTIONS] IMAGE[:TAG] [COMMAND] [ARG...]
+
+To learn how to interpret the types of ``[OPTIONS]``, see
+:ref:`cli_options`.
+
+The list of ``[OPTIONS]`` breaks down into two groups:
+
+* options that define the runtime behavior or environment, and
+* options that override image defaults.
+
+Since image defaults usually get set in :ref:`Dockerfiles
+<dockerbuilder>` (though they could also be set at :ref:`cli_commit`
+time), we will group the runtime options here by their related
+Dockerfile commands so that it is easier to see how to override image
+defaults and set new behavior.
+
+We'll start, though, with the options that are unique to ``docker
+run``, the options which define the runtime behavior or the container
+environment.
+
+.. note:: The runtime operator always has final control over the
+   behavior of a Docker container.
+
			
 
				+Detached or Foreground
			
 
				+======================
			
 
				+
			
 
				+When starting a Docker container, you must first decide if you want to
			
 
				+run the container in the background in a "detached" mode or in the
			
 
				+default foreground mode::
			
 
				+
			
 
				+   -d=false: Detached mode: Run container in the background, print new container id
			
 
				+
			
 
				+Detached (-d)
			
 
				+.............
			
 
				+
			
 
				+In detached mode (``-d=true`` or just ``-d``), all IO should be done
			
 
				+through network connections or shared volumes because the container is
			
 
				+no longer listening to the commandline where you executed ``docker
			
 
				+run``. You can reattach to a detached container with ``docker``
			
 
				+:ref:`cli_attach`. If you choose to run a container in the detached
			
 
				+mode, then you cannot use the ``-rm`` option.
			
 
				+
			
 
				+Foreground
			
 
				+..........
			
 
				+
			
 
				+In foreground mode (the default when ``-d`` is not specified),
			
 
				+``docker run`` can start the process in the container and attach the
			
 
				+console to the process's standard input, output, and standard
			
 
				+error. It can even pretend to be a TTY (this is what most commandline
			
 
				+executables expect) and pass along signals. All of that is
			
 
				+configurable::
			
 
				+
			
 
				+   -a=[]          : Attach to stdin, stdout and/or stderr
			
 
				+   -t=false       : Allocate a pseudo-tty
			
 
				+   -sig-proxy=true: Proxify all received signal to the process (even in non-tty mode)
			
 
				+   -i=false       : Keep stdin open even if not attached
			
 
				+
			
 
				+If you do not specify ``-a`` then Docker will `attach everything
			
 
				+(stdin, stdout, stderr)
			
 
				+<https://github.com/dotcloud/docker/blob/master/commands.go#L1797>`_. You
			
 
				+can specify which of the three standard streams (stdin, stdout,
			
 
				+stderr) you'd like to connect instead, as in::
			
 
				+
			
 
				+   docker run -a stdin -a stdout -i -t ubuntu /bin/bash
			
 
				+
			
 
				+For interactive processes (like a shell) you will typically want a tty
			
 
				+as well as persistent standard input, so you'll use ``-i -t`` together in
			
 
				+most interactive cases.
			
 
				+
			
 
				+Clean Up (-rm)
			
 
				+--------------
			
 
				+
			
 
				+By default a container's file system persists even after the container
			
 
				+exits. This makes debugging a lot easier (since you can inspect the
			
 
				+final state) and you retain all your data by default. But if you are
			
 
				+running short-term **foreground** processes, these container file
			
 
				+systems can really pile up. If instead you'd like Docker to
			
 
				+**automatically clean up the container and remove the file system when
			
 
				+the container exits**, you can add the ``-rm`` flag::
			
 
				+
			
 
				+   -rm=false: Automatically remove the container when it exits (incompatible with -d)
			
 
				+
			
 
				+Name (-name)
			
 
				+============
			
 
				+
			
 
				+The operator can identify a container in three ways:
			
 
				+
			
 
				+* UUID long identifier ("f78375b1c487e03c9438c729345e54db9d20cfa2ac1fc3494b6eb60872e74778")
			
 
				+* UUID short identifier ("f78375b1c487")
			
 
				+* name ("evil_ptolemy")
			
 
				+
			
 
				+The UUID identifiers come from the Docker daemon, and if you do not
			
 
				+assign a name to the container with ``-name`` then the daemon will
			
 
				+also generate a random string name. The name can become a handy
			
 
				+way to add meaning to a container since you can use this name when
			
 
				+defining :ref:`links <working_with_links_names>` (or any other place
			
 
				+you need to identify a container). This works for both background and
			
 
				+foreground Docker containers.
			
 
				+
			
 
				+PID Equivalent
			
 
				+==============
			
 
				+
			
 
				+And finally, to help with automation, you can have Docker write the
			
 
				+container id out to a file of your choosing. This is similar to how
			
 
				+some programs might write out their process ID to a file (you've seen
			
 
				+them as .pid files)::
			
 
				+
			
 
				+   -cidfile="": Write the container ID to the file
			
 
				+
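For example, a deployment script might read the ID back later to stop or inspect the container. The sketch below is a hypothetical helper: the function name ``read_cid`` and the path ``/tmp/web.cid`` are illustrative, and it assumes the file contains only the 64-character hexadecimal ID, like the long identifier shown earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a container ID written by `docker run -cidfile=...`
# and print it, failing unless the file holds a 64-character hex ID.
read_cid() {
  local file="$1" cid
  # Tolerate a missing trailing newline: read fails but still fills $cid.
  IFS= read -r cid < "$file" || [ -n "$cid" ] || return 1
  [[ "$cid" =~ ^[0-9a-f]{64}$ ]] || return 1
  printf '%s\n' "$cid"
}

# Example usage (not executed here):
#   docker run -d -cidfile=/tmp/web.cid ubuntu sleep 600
#   docker stop "$(read_cid /tmp/web.cid)"
```

Validating the content before using it keeps a stale or truncated file from being passed to ``docker`` as if it were an ID.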
			
 
				+Overriding Dockerfile Image Defaults
			
 
				+====================================
			
 
				+
			
 
				+When a developer builds an image from a :ref:`Dockerfile
			
 
				+<dockerbuilder>` or when she commits it, the developer can set a
			
 
				+number of default parameters that take effect when the image starts up
			
 
				+as a container.
			
 
				+
			
 
				+Four of the Dockerfile commands cannot be overridden at runtime:
			
 
				+``FROM``, ``MAINTAINER``, ``RUN``, and ``ADD``. Everything else has a
			
 
				+corresponding override in ``docker run``. We'll go through what the
			
 
				+developer might have set in each Dockerfile instruction and how the
			
 
				+operator can override that setting.
			
 
				+
			
 
				+
			
 
				+CMD
			
 
				+...
			
 
				+
			
 
				+Remember the optional ``COMMAND`` in the Docker commandline::
			
 
				+
			
 
				+  docker run [OPTIONS] IMAGE[:TAG] [COMMAND] [ARG...]
			
 
				+
			
 
				+This command is optional because the person who created the ``IMAGE``
			
 
				+may have already provided a default ``COMMAND`` using the Dockerfile
			
 
				+``CMD``. As the operator (the person running a container from the
			
 
				+image), you can override that ``CMD`` just by specifying a new
			
 
				+``COMMAND``.
			
 
				+
			
 
				+If the image also specifies an ``ENTRYPOINT`` then the ``CMD`` or
			
 
				+``COMMAND`` gets appended as arguments to the ``ENTRYPOINT``.
			
 
				+
			
 
				+
			
 
				+ENTRYPOINT
			
 
				+..........
			
 
				+
			
 
				+::
			
 
				+
			
 
				+   -entrypoint="": Overwrite the default entrypoint set by the image
			
 
				+
			
 
				+The ENTRYPOINT of an image is similar to a COMMAND because it
			
 
				+specifies what executable to run when the container starts, but it is
			
 
				+(purposely) more difficult to override. The ENTRYPOINT gives a
			
 
				+container its default nature or behavior, so that when you set an
			
 
				+ENTRYPOINT you can run the container *as if it were that binary*,
			
 
				+complete with default options, and you can pass in more options via
			
 
				+the COMMAND. But sometimes an operator may want to run something else
			
 
				+inside the container, so you can override the default ENTRYPOINT at
			
 
				+runtime by using a string to specify the new ENTRYPOINT. Here is an
			
 
				+example of how to run a shell in a container that has been set up to
			
 
				+automatically run something else (like ``/usr/bin/redis-server``)::
			
 
				+
			
 
				+  docker run -i -t -entrypoint /bin/bash example/redis
			
 
				+
			
 
				+Here are two examples of how to pass more parameters to that ENTRYPOINT::
			
 
				+
			
 
				+  docker run -i -t -entrypoint /bin/bash example/redis -c 'ls -l'
			
 
				+  docker run -i -t -entrypoint /usr/bin/redis-cli example/redis --help
			
 
				+
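Because the words after the image name are handed to the new ENTRYPOINT unchanged, a ``/bin/bash`` entrypoint follows the usual ``bash -c`` rule: only the first word after ``-c`` is run as the script, and any later words become ``$0``, ``$1``, and so on. The local sketch below needs no Docker; the wrapper function ``as_bash_entrypoint`` is hypothetical and simply runs ``/bin/bash`` with the same words.

```shell
#!/usr/bin/env bash
# Local sketch, no Docker needed. It assumes the words after the image name
# reach the ENTRYPOINT unchanged, so `-entrypoint /bin/bash IMAGE -c ...`
# behaves like running /bin/bash with those same words.
as_bash_entrypoint() {
  /bin/bash "$@"
}

# Only the first word after -c is the script; the next word lands in $0.
as_bash_entrypoint -c 'printf "%s\n" "$0"' -l      # prints: -l

# So `-c ls -l` would run plain `ls` with $0 set to "-l". Quote the whole
# command to keep its options together:
#   as_bash_entrypoint -c 'ls -l'
```

In practice this means a command with its own options should be passed after ``-c`` as a single quoted word.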
			
 
				+
			
 
				+EXPOSE (``run`` Networking Options)
			
 
				+...................................
			
 
				+
			
 
				+The *Dockerfile* doesn't give much control over networking, only
			
 
				+providing the EXPOSE instruction to give a hint to the operator about
			
 
				+what incoming ports might provide services. At runtime, however,
			
 
				+Docker provides a number of ``run`` options related to networking::
			
 
				+
			
 
				+   -n=true   : Enable networking for this container
			
 
				+   -dns=[]   : Set custom dns servers for the container
			
 
				+   -expose=[]: Expose a port from the container 
			
 
				+               without publishing it to your host
			
 
				+   -P=false  : Publish all exposed ports to the host interfaces
			
 
				+   -p=[]     : Publish a container's port to the host (format: 
			
 
				+               ip:hostPort:containerPort | ip::containerPort | 
			
 
				+               hostPort:containerPort) 
			
 
				+               (use 'docker port' to see the actual mapping)
			
 
				+   -link=""  : Add link to another container (name:alias)
			
 
				+
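To make the three ``-p`` formats concrete, here is a hypothetical shell helper that splits a value into its IP, host port, and container port. It only handles the formats listed in the help text above, prints ``-`` for an omitted field, and is a reading aid rather than Docker's own parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a -p value following only these formats:
#   ip:hostPort:containerPort | ip::containerPort | hostPort:containerPort
# Prints "ip hostPort containerPort", using "-" for an omitted field.
# Docker may accept other spellings; this sketch rejects them.
parse_publish() {
  local spec="$1" a b c
  IFS=: read -r a b c <<< "$spec"
  if [[ -n "$c" ]]; then
    # ip:hostPort:containerPort, or ip::containerPort (empty hostPort)
    printf '%s %s %s\n' "$a" "${b:--}" "$c"
  elif [[ -n "$b" ]]; then
    # hostPort:containerPort
    printf '%s %s %s\n' "-" "$a" "$b"
  else
    echo "unsupported -p format: $spec" >&2
    return 1
  fi
}
```

For instance, ``parse_publish 42800:80`` prints ``- 42800 80``: a service listening on port 80 inside the container, published on port 42800 of the host.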
			
 
				+By default, all containers have networking enabled and they can make
			
 
				+any outgoing connections. The operator can completely disable
			
 
				+networking with ``-n=false``, which disables all incoming and outgoing
			
 
				+networking. In cases like this, you would perform IO through files or
			
 
				+stdin/stdout only.
			
 
				+
			
 
				+Your container will use the same DNS servers as the host by default,
			
 
				+but you can override this with ``-dns``.
			
 
				+
			
 
				+As mentioned previously, ``EXPOSE`` (and ``-expose``) make a port
			
 
				+available **in** a container for incoming connections. The port number
			
 
				+on the inside of the container (where the service listens) does not
			
 
				+need to be the same number as the port exposed on the outside of the
			
 
				+container (where clients connect), so inside the container you might
			
 
				+have an HTTP service listening on port 80 (and so you ``EXPOSE 80`` in
			
 
				+the Dockerfile), but outside the container the port might be 42800.
			
 
				+
			
 
				+To help a new client container reach the server container's internal
			
 
				+port (exposed with ``-expose`` by the operator or with ``EXPOSE`` by the
			
 
				+developer), the operator has three choices: start the server container
			
 
				+with ``-P`` or ``-p``, or start the client container with ``-link``.
			
 
				+
			
 
				+If the operator uses ``-P`` or ``-p`` then Docker will make the
			
 
				+exposed port accessible on the host and the ports will be available to
			
 
				+any client that can reach the host. To find the map between the host
			
 
				+ports and the exposed ports, use ``docker port``.
			
 
				+
			
 
				+If the operator uses ``-link`` when starting the new client container,
			
 
				+then the client container can access the exposed port via a private
			
 
				+networking interface. Docker will set some environment variables in
			
 
				+the client container to help indicate which interface and port to use.
			
 
				+
			
 
				+ENV (Environment Variables)
			
 
				+...........................
			
 
				+
			
 
				+The operator can **set any environment variable** in the container by
			
 
				+using one or more ``-e``, even overriding those already defined by the
			
 
				+developer with a Dockerfile ``ENV``::
			
 
				+
			
 
				+   $ docker run -e "deep=purple" -rm ubuntu /bin/bash -c export
			
 
				+   declare -x HOME="/"
			
 
				+   declare -x HOSTNAME="85bc26a0e200"
			
 
				+   declare -x OLDPWD
			
 
				+   declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
			
 
				+   declare -x PWD="/"
			
 
				+   declare -x SHLVL="1"
			
 
				+   declare -x container="lxc"
			
 
				+   declare -x deep="purple"
			
 
				+
			
 
				+Similarly, the operator can set the **hostname** with ``-h``.
			
 
				+
			
 
				+``-link name:alias`` also sets environment variables, using the
			
 
				+*alias* string to define environment variables within the container
			
 
				+that give the IP and PORT information for connecting to the service
			
 
				+container. Let's imagine we have a container running Redis::
			
 
				+
			
 
				+   # Start the service container, named redis-name
			
 
				+   $ docker run -d -name redis-name dockerfiles/redis
			
 
				+   4241164edf6f5aca5b0e9e4c9eccd899b0b8080c64c0cd26efe02166c73208f3
			
 
				+
			
 
				+   # The redis-name container exposed port 6379
			
 
				+   $ docker ps  
			
 
				+   CONTAINER ID        IMAGE                      COMMAND                CREATED             STATUS              PORTS               NAMES
			
 
				+   4241164edf6f        dockerfiles/redis:latest   /redis-stable/src/re   5 seconds ago       Up 4 seconds        6379/tcp            redis-name  
			
 
				+
			
 
				+   # Note that there are no public ports exposed since we didn't use -p or -P
			
 
				+   $ docker port 4241164edf6f 6379
			
 
				+   2014/01/25 00:55:38 Error: No public port '6379' published for 4241164edf6f
			
 
				+
			
 
				+
			
 
				+Yet we can still get information about the Redis container's exposed ports with ``-link``. Choose an alias that forms a valid environment variable name, because Docker builds the variable names below from it.
			
 
				+
			
 
				+::
			
 
				+
			
 
				+   $ docker run -rm -link redis-name:redis_alias -entrypoint /bin/bash dockerfiles/redis -c export
			
 
				+   declare -x HOME="/"
			
 
				+   declare -x HOSTNAME="acda7f7b1cdc"
			
 
				+   declare -x OLDPWD
			
 
				+   declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
			
 
				+   declare -x PWD="/"
			
 
				+   declare -x REDIS_ALIAS_NAME="/distracted_wright/redis"
			
 
				+   declare -x REDIS_ALIAS_PORT="tcp://172.17.0.32:6379"
			
 
				+   declare -x REDIS_ALIAS_PORT_6379_TCP="tcp://172.17.0.32:6379"
			
 
				+   declare -x REDIS_ALIAS_PORT_6379_TCP_ADDR="172.17.0.32"
			
 
				+   declare -x REDIS_ALIAS_PORT_6379_TCP_PORT="6379"
			
 
				+   declare -x REDIS_ALIAS_PORT_6379_TCP_PROTO="tcp"
			
 
				+   declare -x SHLVL="1"
			
 
				+   declare -x container="lxc"
			
 
				+
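The variable names above are the alias upper-cased (``redis_alias`` becomes the ``REDIS_ALIAS_`` prefix). A script could check an alias before passing it to ``-link`` with a helper like the hypothetical one below. It assumes upper-casing is the only transformation applied, so it conservatively rejects anything that is not already a valid shell variable name, such as an alias containing a hyphen.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for `docker run -link name:alias ...`.
# Assumption: the variable prefix is just the alias in upper case, as in
# REDIS_ALIAS_PORT_6379_TCP_ADDR above, so the alias must be a valid name.
link_env_prefix() {
  local alias="$1"
  if [[ ! "$alias" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
    echo "alias '$alias' would not form a valid environment variable" >&2
    return 1
  fi
  # ${var^^} upper-cases the whole value (bash 4 or later).
  printf '%s\n' "${alias^^}"
}
```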
			
 
				+And we can use that information to connect from another container as a client::
			
 
				+
			
 
				+   $ docker run -i -t -rm -link redis-name:redis_alias -entrypoint /bin/bash dockerfiles/redis -c '/redis-stable/src/redis-cli -h $REDIS_ALIAS_PORT_6379_TCP_ADDR -p $REDIS_ALIAS_PORT_6379_TCP_PORT'
			
 
				+   172.17.0.32:6379>
			
 
				+
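Inside the client container, the ``<ALIAS>_PORT`` variable carries the same address in ``tcp://ADDR:PORT`` form. As a sketch, a start-up script could split it with plain parameter expansion instead of reading the per-port variables. The function name ``link_addr_port`` is hypothetical; the variable name ``REDIS_ALIAS_PORT`` comes from the ``redis_alias`` example above.

```shell
#!/usr/bin/env bash
# Hypothetical client-side helper for a container started with
# `-link redis-name:redis_alias`. Reads REDIS_ALIAS_PORT (tcp://ADDR:PORT)
# and prints "ADDR PORT", failing if the variable is missing or malformed.
link_addr_port() {
  local url="${REDIS_ALIAS_PORT:-}"
  if [[ "$url" != tcp://*:* ]]; then
    echo "REDIS_ALIAS_PORT is missing or not of the form tcp://ADDR:PORT" >&2
    return 1
  fi
  local hostport="${url#tcp://}"              # e.g. 172.17.0.32:6379
  printf '%s %s\n' "${hostport%:*}" "${hostport##*:}"
}

# Possible use inside the client container (not executed here):
#   read -r host port < <(link_addr_port)
#   /redis-stable/src/redis-cli -h "$host" -p "$port"
```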
			
 
				+VOLUME (Shared Filesystems)
			
 
				+...........................
			
 
				+
			
 
				+::
			
 
				+
			
 
				+   -v=[]: Create a bind mount with: [host-dir]:[container-dir]:[rw|ro]. 
			
 
				+          If "host-dir" is missing, then docker creates a new volume.
			
 
				+   -volumes-from="": Mount all volumes from the given container(s)
			
 
				+
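The ``-v`` value format can be illustrated with a small hypothetical parser. It assumes that a value with a single path names the directory inside the container (which Docker then backs with a new volume, shown here as ``-`` for the host directory), that the mode defaults to ``rw``, and that paths contain no colons. It is a sketch for reading the format, not Docker's implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a -v value into "host-dir container-dir mode".
# Assumptions: a lone path is the container dir (new volume, host-dir "-"),
# the mode defaults to rw, and paths contain no ':' characters.
parse_volume() {
  local spec="$1" a b c
  IFS=: read -r a b c <<< "$spec"
  [[ -n "$a" ]] || { echo "empty -v value" >&2; return 1; }
  if [[ -z "$b" ]]; then
    printf '%s %s %s\n' "-" "$a" "rw"
    return 0
  fi
  case "${c:-rw}" in
    rw|ro) printf '%s %s %s\n' "$a" "$b" "${c:-rw}" ;;
    *) echo "unsupported mode in -v value: $spec" >&2; return 1 ;;
  esac
}
```

For example, ``parse_volume /src:/data:ro`` prints ``/src /data ro``, describing a read-only bind mount of the host's ``/src`` at ``/data`` in the container.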
			
 
				+The volumes commands are complex enough to have their own
			
 
				+documentation in section :ref:`volume_def`. A developer can define one
			
 
				+or more VOLUMEs associated with an image, but only the operator can
			
 
				+give access from one container to another (or from a container to a
			
 
				+volume mounted on the host).
			
 
				+
			
 
				+USER
			
 
				+....
			
 
				+
			
 
				+::
			
 
				+
			
 
				+   -u="": Username or UID
			
 
				+
			
 
				+WORKDIR
			
 
				+.......
			
 
				+
			
 
				+::
			
 
				+
			
 
				+   -w="": Working directory inside the container
			
 
				+
			
 
				+Performance
			
 
				+===========
			
 
				+
			
 
				+The operator can also adjust the performance parameters of the container::
			
 
				+
			
 
				+   -c=0 : CPU shares (relative weight)
			
 
				+   -m="": Memory limit (format: <number><optional unit>, where unit = b, k, m or g)
			
 
				+
			
 
				+   -lxc-conf=[]: Add custom lxc options -lxc-conf="lxc.cgroup.cpuset.cpus = 0,1"
			
 
				+   -privileged=false: Give extended privileges to this container
			
 
				+
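The arithmetic behind ``-c`` and ``-m`` can be sketched with two hypothetical helpers. CPU shares are relative weights, so a busy container's approximate slice under full contention is its weight divided by the sum of the weights of all busy containers; when the others are idle it can use more. The sketch assumes 1024 as the weight of a container started without ``-c`` (the usual kernel default for ``cpu.shares``), and the memory helper assumes 1024-based units and also accepts upper-case units, which may be more lenient than Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the -c and -m values above.

# Approximate CPU percentage for the FIRST weight when every listed
# container is busy. Use 1024 for containers started without -c (assumed
# kernel default). Integer division truncates the result.
cpu_share_percent() {
  local total=0 w
  for w in "$@"; do total=$(( total + w )); done
  (( total > 0 )) || { echo "need at least one positive weight" >&2; return 1; }
  echo $(( 100 * $1 / total ))
}

# Convert a -m value such as 512m into bytes, assuming 1024-based units.
mem_to_bytes() {
  local spec="${1,,}"                       # lower-case the unit
  if [[ ! "$spec" =~ ^([0-9]+)([bkmg]?)$ ]]; then
    echo "unsupported -m value: $1" >&2
    return 1
  fi
  local n=$(( 10#${BASH_REMATCH[1]} ))      # force base 10 (no octal)
  case "${BASH_REMATCH[2]}" in
    k) n=$(( n * 1024 )) ;;
    m) n=$(( n * 1024 * 1024 )) ;;
    g) n=$(( n * 1024 * 1024 * 1024 )) ;;
  esac
  echo "$n"
}

cpu_share_percent 1024 512   # prints: 66
mem_to_bytes 512m            # prints: 536870912
```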