Josh England jjengla@sandia.gov
Sandia National Laboratories
Last update: oneSIS-2.0rc10 (June, 2005)
This document is continually being updated.
Refer to http://onesis.org/docs.php for the latest documentation.
The root image (or any subset of it) can be deployed to disk, if
desired, so that any cluster node can boot from the local disk.
Using oneSIS, it is possible to have nodes that use almost any
combination of NFS, RAM-based, and local-disk storage for files and
directories in the root image.
This document describes some of the techniques oneSIS employs and details all of the configuration directives and helper applications that make up the system. It is written to serve as reference material for the configuration directives and utilities, not as a detailed usage guide. Specific configuration examples can be found in the oneSIS HOWTO at http://onesis.org/oneSIS-HOWTO.php.
Linux and DHCP enable diskless machines. oneSIS enables desired elements of the root filesystem to `become' writable and allows functional groups of nodes or individual nodes to use different variations of the root filesystem. For a primer on configuring machines to run diskless, read the NFSroot HOWTO at http://onesis.org/NFSroot-HOWTO.php.
The root filesystem of every node is bit-for-bit identical
whether it resides on a local disk or is mounted via NFS. Functional
groups of nodes or individual nodes can be configured to behave
differently by using `linkbacks' to configure themselves at
boot-time with symbolic links
going to and from a small RAM disk.
For large clusters, diskful mirrors of the image can each NFS export their root filesystem to a reasonable number of `diskless' nodes so that services such as DHCP, TFTP, and NFS of the root image can be distributed efficiently to any scale necessary using commodity hardware.
Managing system services and configuration files to be
class-specific or node-specific is the other central role of oneSIS.
In addition, oneSIS provides some power wrapper scripts to consolidate
different remote power management and console utilities into a single command.
These aspects of the cluster's behavior are centrally configured from a
single configuration file within each master image.
Typical configurations are usually minimal, but even complex setups are still easy. The behavior of the entire cluster is easily viewable and modifiable with the control being as fine-grained as necessary.
More information on installing and configuring oneSIS can be found
in the oneSIS HOWTO at http://onesis.org/oneSIS-HOWTO.php.
Any installed Linux distribution can be copied and used as the master image for your cluster. Several distributions are currently well-supported; look in the `distro-patches' directory in the source tree to see which ones. See section 4.6 for information on how to port to a new distribution.
The following features should be enabled in the kernel for all components of
oneSIS to operate effectively:
- tmpfs (Virtual memory file system support)
- loopback device support
- devfs (/dev file system support) Note: only needed with 2.4 kernels
oneSIS uses a tmpfs ramdisk for many operations. Support
for loopback devices must be enabled to be able to create an initial
ramdisk (initrd). Also, oneSIS requires either udev
(with 2.6 kernels) or devfs (with 2.4 or 2.6 kernels) to
handle the /dev directory. Using udev is the preferred
method with kernel version 2.6.
A static /dev can be made to work (i.e., LINKDIR /dev -d)
with limited functionality, but it is not recommended. Depending on
what applications you are running, you may not even need
udev or devfs to handle /dev, but that is also not recommended.
Note: It may be necessary to download and install udev or the
devfsd program depending on your linux distribution.
oneSIS makes use of the following Linux utilities, most
of which are present in most Linux distributions:
- cpio
- sfdisk
- mke2fs/tune2fs/e2label
- gzip/gunzip
- rsync
- grub
- lilo
Creating the master image from an installed linux machine is a
simple process, although it can take some time to copy such a large
volume of data. The copy-rootfs script (section
6.3) automates the details of copying a root
filesystem from the local machine, or from a
remote machine with an ssh daemon running.
After it has been prepared, this master image will serve as the root filesystem for all cluster nodes. Copies are used on diskful nodes, and the image itself is NFS mounted read-only on diskless nodes.
Note: If it wasn't installed on the remote machine, oneSIS will need to
be installed in the image. In the oneSIS source directory, run
# cd oneSIS-2.0rc10
# prefix=/var/lib/oneSIS/image make install
Also note: If the local machine and the image use different versions of
Perl, it may be necessary to chroot into the image before installing oneSIS.
This will ensure that the oneSIS perl module is installed under the right
directory in /usr/lib/perl.
# cp -a oneSIS-2.0rc10 /var/lib/oneSIS/image/usr/local/src
# chroot /var/lib/oneSIS/image
# cd /usr/local/src/oneSIS-2.0rc10
# make install
# exit
As described in the `Implementation' section, mk-sysimage alters any files listed in a LINK* directive so that the image can serve as the root filesystem for many nodes with potentially many different functional roles. It converts the distribution for use as a read-only root filesystem. As a convenience, it also removes (after backing up) any configuration files that try to mount local disk devices or configure network interfaces, along with any other configuration files that would create problems for client nodes.
When booting, your machine should come up, boot a kernel, and mount a root filesystem. The kernel can come from the network via a mechanism such as PXE or EtherBoot, from a local disk, or even from onboard flash with LinuxBIOS. Methods described here are traditional diskless NFSroot, and booting diskless/diskful/mixed systems with initrd.
Once the initrd is created, nodes can boot into diskless/diskful/mixed environments by specifying the initrd on the kernel command line.
The root filesystem itself can be mounted via NFS or from a
local disk. When a portion of the root filesystem has been deployed
on a local disk, the initrd can be configured to automatically mount
those partitions before
pivoting into the root filesystem.
An entire cluster (or any individual node or group within it)
can be configured any way you want. The default initrd template can
also be extended to support almost any conceivable
creative boot method.
Note: The standard mkinitrd utility can still be used to bootstrap diskful nodes in many scenarios.
The initrd templates are compressed ext2 loopback filesystems that can be decompressed with gzip and mounted with mount -o loop. As long as none of the existing logic in the initrd template is changed, mk-initrd-oneSIS can use any derived template to create initrds with added functionality. The initrd's linuxrc script has a place designated for additional logic, where any other creative bootstrapping logic can be added without losing existing functionality.
A typical cluster may have administration nodes, front-end nodes, login nodes, and nodes dedicated to providing filesystem I/O to the rest of the cluster. For these nodes, it is desirable to have a root filesystem image that is identical to that of the main `compute' nodes. Although `admin' or `IO' nodes, for example, are not required to use the same filesystem image as `compute' nodes, consistency helps ease administration and reduce the overall complexity of the system.
The differing roles of these nodes usually require deviations from the master image that configure unique behavior for each class of nodes. The types of deviations are grouped into three main categories: deviation of system services, deviation of files and directories, and deviation in usage of local hard disks (if they exist).
For any file requiring variations between nodes, oneSIS replaces
the file with a symbolic link pointing to its corresponding path in
/ram. The original file is moved to a file with the same
name, but with a `.default' extension.
At boot time, each node determines its role and uses the version of the file that corresponds to that node's class (see figure 1). Any node not configured to use an alternate file or directory uses the original `.default' file. This technique can be used to achieve different behavior for any node or class of nodes.
Subclasses inherit all the behaviors of a parent class, can override those behaviors if desired, and can define any additional behaviors considered necessary. There is no explicit limit on the depth of sub-classing.
The oneSIS RAM disk contains all necessary files and directories
configured to help the node function as normal without a writable
root filesystem. It has all class-specific and node-specific
deviations. If a `login' node needs a different /etc/fstab,
rc.preinit creates a /ram/etc/fstab symlink that
points to /etc/fstab.login in the master image, if it
exists.
If there is no class-specific version of /etc/fstab, the /ram/etc/fstab symbolic link points back to the /etc/fstab.default file. This `linkback' process provides flexibility. Any file in the master image can be changed or deleted on a per-class and per-node basis, allowing for fine-grained control of any file in the master image. All deviations in the filesystem are handled similarly.
Commenting out the detrimental lines from the rc scripts
usually eliminates the errors originating from the distribution's
rc scripts. oneSIS does this automatically for several
distributions by applying a `distribution patch'
against the filesystem.
Currently, oneSIS includes patches for several distributions.
Minor changes often are made to a distribution's rc scripts
between sub-versions of a distribution release. This requires
development of a patch to `port' oneSIS to each version
of a distribution.
Creating a patch for a newer version of an already supported
distribution is simple. An older patch for the same distribution can
be used as a model for the new patch. The primary goal of the patch
is to ensure that the root filesystem is not mounted read-write at
boot time. The patch also comments out some actions in the rc
scripts that try to write to the root filesystem.
This results in a cleaner bootup.
At bootup, many errors complaining about a 'Read-only file
system' may appear on the console. Tracking down the source of these
errors and commenting out the offending lines of code is not
difficult. However, leaving the script intact and creating
configuration directives so that the data is written to the RAM disk
instead may be preferable in some cases.
In general, developing a patch for a new version of a distribution can be accomplished with only a few iterations of booting a machine.
/etc/sysimage.conf centrally manages the behavior of
every node booting into that image. The directives are few and
simple with clear functions. The directives are used to define role
abstractions for all the nodes and express the desired
behavior of each role.
With the exception of the NODECLASS* directives, any directive in the configuration can be overridden by directives further down in the configuration. Additionally, any directive (except for NODECLASS* directives) can be limited to apply to only one or more classes of nodes or to individual nodes.
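The override and scoping rules described above can be modelled in a few lines of Python. This is only a sketch of the stated behavior, not oneSIS code: the `Directive` tuple and the `applies` and `effective_value` helpers are hypothetical names, and two points are assumptions rather than documented facts, namely that a class given with -c also covers its subclasses, and that a directive carrying both -c and -n applies when either one matches.

```python
from typing import NamedTuple, Optional, Sequence


class Directive(NamedTuple):
    """Hypothetical in-memory form of one sysimage.conf directive."""
    name: str
    value: str
    classes: Sequence[str] = ()   # from -c; empty means no class restriction
    nodes: Sequence[str] = ()     # from -n; empty means no node restriction


def applies(d: Directive, node: str, node_class: str) -> bool:
    """Return True if the directive is in scope for this node."""
    if not d.classes and not d.nodes:
        return True               # unrestricted directive
    if node in d.nodes:
        return True
    # Assumption: a class also matches its subclasses (compute -> compute.gige).
    return any(node_class == c or node_class.startswith(c + ".")
               for c in d.classes)


def effective_value(directives, name, node, node_class) -> Optional[str]:
    """Later matching directives override earlier ones."""
    result = None
    for d in directives:
        if d.name == name and applies(d, node, node_class):
            result = d.value
    return result


# Illustrative configuration, in file order.
conf = [
    Directive("RAMSIZE", "4m"),
    Directive("RAMSIZE", "16m", classes=["compute"]),
    Directive("RAMSIZE", "64m", nodes=["cn7"]),
]
print(effective_value(conf, "RAMSIZE", "admin1", "admin"))
print(effective_value(conf, "RAMSIZE", "cn1", "compute.gige"))
print(effective_value(conf, "RAMSIZE", "cn7", "compute.gige"))
```

Because the last matching directive wins, general settings belong near the top of the configuration and class-specific or node-specific exceptions further down.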
Enabling and disabling system services in the default runlevel
is also distro-specific, and some distros may require some minor
boot-time initialization tasks.
A directive is included in /etc/sysimage.conf to specify
the distribution of the master image to enable oneSIS to handle
these tasks for different distributions. If the specified
distribution is not currently supported, it may be necessary to
create a distribution patch as mentioned in section
4.6.
oneSIS will operate even on an unsupported distribution, but some of the distro-specific features described above will not be available; the rest of the system will still work as expected. Remember that any service named in a SERVICE directive in /etc/sysimage.conf must always be enabled in the default runlevel.
Node classes can be defined by perl-style regular expressions or
by using a syntax to describe range expressions. A combination of
multiple NODECLASS* directives can be used to describe a
single class. For nodes with more than one matching
NODECLASS* directive, but different class names, later
directives will override earlier
ones.
Once a class is defined, the class name can be used in other directives to define behavior specific to that class.
NODECLASS_REGEXP | cn… | compute |
NODECLASS_RANGE | cn[1-32] | compute.gige |
NODECLASS_RANGE | cn[33-64] | compute.myri |
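The way later NODECLASS* directives override earlier ones can be illustrated with a short Python sketch. It is not the oneSIS implementation: `matches_range` and `node_class` are hypothetical helpers, only the simple prefix[a-b] range form is handled, and the regular expression `^cn\d+$` is an assumed stand-in chosen for the example.

```python
import re

# Simplified range syntax: a prefix followed by one [low-high] group.
RANGE_RE = re.compile(r"^(?P<prefix>.*)\[(?P<lo>\d+)-(?P<hi>\d+)\]$")


def matches_range(hostname: str, expr: str) -> bool:
    """Check a hostname against a range expression such as cn[1-32]."""
    m = RANGE_RE.match(expr)
    if m is None:
        raise ValueError(f"unsupported range expression: {expr!r}")
    host = re.fullmatch(re.escape(m["prefix"]) + r"(\d+)", hostname)
    if host is None:
        return False
    return int(m["lo"]) <= int(host.group(1)) <= int(m["hi"])


def node_class(hostname, directives):
    """Return the class of a host; the last matching directive wins."""
    result = None
    for kind, pattern, name in directives:
        if kind == "NODECLASS_REGEXP":
            hit = re.search(pattern, hostname) is not None
        else:
            hit = matches_range(hostname, pattern)
        if hit:
            result = name
    return result


directives = [
    ("NODECLASS_REGEXP", r"^cn\d+$", "compute"),   # assumed regexp
    ("NODECLASS_RANGE", "cn[1-32]", "compute.gige"),
    ("NODECLASS_RANGE", "cn[33-64]", "compute.myri"),
]
print(node_class("cn5", directives))
print(node_class("cn40", directives))
print(node_class("cn100", directives))
```

With this ordering, a host outside both ranges but matching the regular expression stays in the general `compute` class, while hosts inside a range land in the more specific subclass.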
The default maximum size of the oneSIS RAM disk is 1MB. A larger
RAM disk can be created if desired. If any RAM* or
LINK* directives are using the -d flag to duplicate
files in /ram, a larger RAM
disk may be necessary.
The RAM disk is first created by rc.preinit at boot time,
but the update-node script can be used to re-size the RAM disk
according to the current configuration after a node is booted.
RAMSIZE max_size [k|m|g] [-c class[,class]...] [-n node[,node]...]
Directs oneSIS to create a RAM disk that can grow to at most max_size units.
Units can be specified in kilobytes (k), megabytes (m), or gigabytes (g).
Supplying any -c options limits the directive to apply only to
the given classes.
Supplying any -n options limits the directive to apply only to
the given nodes.
The RAMFILE directive creates an empty
file in the RAM disk, and the RAMDIR directive creates an empty directory.
The LINKFILE directive also creates an empty file in the
RAM disk, and the LINKDIR directive creates an empty
directory. In addition the LINK* directives direct the
mk-sysimage script to
alter the corresponding files in the root filesystem to become
symlinks to /ram.
Both sets of directives have their role. The mk-sysimage
script creates the necessary links in the master image.
rc.preinit creates the corresponding files and directories in
/ram at boot time, and update-node does the same on a node that
is already booted.
File permissions and ownership are mirrored from the root image, but can be set explicitly for RAM* directives.
Any RAM* or LINK* directive can also specify the -d flag to duplicate the specified files or directories into /ram, for cases where certain files or directories need to be copied there. The maximum size of the RAM disk may need to be adjusted accordingly.
RAMFILE file [-d] [-c class[,class]...] [-n node[,node]...]
[-m mode] [-u user] [-g group]
Creates a file in the RAM disk.
For both RAMDIR and RAMFILE directives:
The -m option sets the permissions of the specified file or directory.
The -u option sets the owner of the specified file or directory.
The -g option sets the group of the specified file or directory.
LINKDIR dir [-d] [-p] [-c class[,class]...] [-n node[,node]...]
Creates a directory in the RAM disk, and causes the corresponding directory in
the root filesystem to point to the directory in /ram.
The -p option protects the contents of the /ram directory from update-node -clean.
LINKFILE file [-d] [-c class[,class]...] [-n node[,node]...]
Creates a file in the RAM disk, and causes the corresponding file in
the root filesystem to point to the file in /ram.
For RAMDIR, RAMFILE, LINKDIR, and LINKFILE directives:
The -d option causes the given file/directory to be duplicated in
/ram.
Supplying any -c options limits the directive to apply only to
the given classes.
Supplying any -n options limits the directive to apply only to
the given nodes.
Any wildcard syntax consisting of *, ?, [ ], or { } characters can be
used to specify multiple files/directories in accordance with the POSIX.2
glob() function.
To solve these kinds of problems, carefully watch the console of
a booting node for errors related to a 'Read-only file system'. When
these kinds of errors occur, determine which file or directory the
failed write was aimed at and include LINKDIR or LINKFILE directives
in the configuration as appropriate.
As an example, several distributions like to write .pid files into /var/run to keep track of the process IDs of running daemons. At boot time, when these daemons try to start, there will be complaints about a 'Read-only file system' when /var/run is not writable. One solution for this problem is to add a LINKDIR directive for /var/run to the sysimage.conf file of the master image.
Giving a daemon the ability to write to a single file, such as /var/lib/random-seed, can be handled similarly. Rather than link all of /var/lib into the RAM disk, link just the needed file with a LINKFILE directive for /var/lib/random-seed.
Closely watch the bootup and add any needed directives to the configuration to handle the idiosyncrasies of a read-only root filesystem. On a side note, it also helps to disable most of the unnecessary daemons enabled by default in many distributions.
oneSIS gets around this problem by automatically converting
/etc/mtab to be a symbolic link to /proc/mounts. The
only negative effect is that userspace tools such as losetup
and mount -o loop do not append entries to
/proc/mounts, so mounted loopback filesystems do not appear
to the system to be mounted. The result is that loopback filesystems
can be mounted and unmounted, but the umount command does not
automatically free the
loopback device.
Usually a total of eight loopback filesystems can be mounted and
unmounted before mount complains: `mount: could not find any
free loop device'. The solution then is to manually detach
each loop device with
losetup -d /dev/loop0; losetup -d /dev/loop1; ....
Alternatively, when /etc is deployed on a local disk, and
auto-mounted read-write from an initrd, oneSIS automatically creates
a real /etc/mtab to eliminate the problem with loopback
mounts. If the local /etc is ever mounted read-only again,
/etc/mtab is again
converted to be a symlink to /proc/mounts.
Note: This swapping to and from a real /etc/mtab only occurs on locally deployed /etc directories. The /etc/mtab of the master image is always a symlink to /proc/mounts.
SERVICE directives specify which services should run on which classes of nodes. The mk-sysimage script verifies that services are configured to start in the default runlevel. When rc.preinit runs at boot-time, a linkback is created to the start script if the service is enabled. Otherwise the linkback is not created and the service does not start because the start script effectively does not exist for that node.
A linkback can have several potential targets. The
`CLASS' target causes the linkback to point back to the
original filename appended with an extension that is the name of the
node class as determined by the NODECLASS directives.
Similarly, the `NODE' target causes the final target to point
back to the original filename appended with
an extension that is the hostname of the node.
A linkback target can also be any arbitrary pathname. This
target path will be interpolated to replace any instance of
`$CLASS' and
`$NODE' with the class name and hostname of the node, respectively.
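The interpolation of an arbitrary linkback target amounts to plain string substitution. The Python sketch below shows the idea; `interpolate_target` is a hypothetical helper, and the path used in the example is made up for illustration.

```python
def interpolate_target(target: str, node_class: str, hostname: str) -> str:
    """Replace every $CLASS and $NODE in a linkback target path."""
    return target.replace("$CLASS", node_class).replace("$NODE", hostname)


# Hypothetical target path on a shared filesystem.
print(interpolate_target("/shared/conf/$CLASS/fstab.$NODE", "compute", "cn12"))
```

A target that contains neither placeholder is returned unchanged, which corresponds to a fixed pathname shared by all nodes the directive applies to.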
Note: Some files cannot have linkbacks created for them for various reasons. The most notable of these are /etc/inittab, /boot/grub/menu.lst, /etc/mtab, and /etc/sysimage.conf.
The linkback can be forced even if the target doesn't exist.
Using the * syntax before a linkback target will cause the
linkback to the target to be created even if the target doesn't
exist at boot
time.
Since oneSIS creates these links before any other system initialization is done, remote filesystems specified in /etc/fstab are not mounted yet. The * syntax can be especially useful to point to locations on a network filesystem that hasn't been mounted yet.
For example, if a node in the `compute.gige' class has a `CLASS' linkback defined for /etc/fstab, oneSIS will attempt to linkback to /etc/fstab.compute.gige. If that target doesn't exist, oneSIS will attempt to linkback to /etc/fstab.compute, then finally /etc/fstab.default.
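The fallback order in this example can be written out as a small Python sketch. The helper names `class_candidates` and `resolve_linkback` are hypothetical, and the `exists` parameter is an illustrative hook so the logic can be tried without a real filesystem.

```python
import os


def class_candidates(path: str, node_class: str) -> list:
    """List CLASS linkback targets from most to least specific."""
    parts = node_class.split(".")
    # compute.gige -> ["compute.gige", "compute"]
    names = [".".join(parts[:i]) for i in range(len(parts), 0, -1)]
    return [f"{path}.{name}" for name in names] + [f"{path}.default"]


def resolve_linkback(path: str, node_class: str, exists=os.path.exists):
    """Return the first candidate that exists, or None if none do."""
    for candidate in class_candidates(path, node_class):
        if exists(candidate):
            return candidate
    return None


print(class_candidates("/etc/fstab", "compute.gige"))

# Simulate an image that only has the parent-class and default versions.
present = {"/etc/fstab.compute", "/etc/fstab.default"}
print(resolve_linkback("/etc/fstab", "compute.gige", exists=present.__contains__))
```

Walking the class name from right to left is what lets a subclass pick up its parent's file when it has no version of its own.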
If a LINKBACK is defined for
/etc/sysconfig/network-scripts/ifcfg-eth0, the glob above will match
all of the linkback targets for that file, including
ifcfg-eth0.default and all node-specific versions of the
file. The system script then tries to bring up the interface several
times with
potentially different configurations. This creates many problems, which could result in losing all static configurations.
For such cases it is desirable to `hide' (with the LINKBACK
-h flag) all linkback targets so that the system scripts
still function correctly.
When a linkback is hidden, all linkback targets will have a '.'
prepended to the name, so ifcfg-eth0.default, when hidden,
will become .ifcfg-eth0.default. Remember that all variants
of the file also will need to be hidden. If you want a NODE-specific
version of ifcfg-eth0 for admin1, the file
.ifcfg-eth0.admin1 needs to
be created to hold the configuration.
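The naming rule for hidden targets can be checked with a few lines of Python when preparing CLASS or NODE variants by hand. `hidden_target` is a hypothetical helper that only computes the name; it does not touch the filesystem.

```python
import posixpath


def hidden_target(path: str) -> str:
    """Prepend '.' to the final component of a linkback target path."""
    directory, name = posixpath.split(path)
    return posixpath.join(directory, "." + name)


print(hidden_target("/etc/sysconfig/network-scripts/ifcfg-eth0.default"))
print(hidden_target("ifcfg-eth0.admin1"))
```

Only the file name changes; the directory part of the path is left as it is.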
Note: Hidden LINKBACK directives only apply
to CLASS specific and NODE specific linkback targets.
The mk-sysimage script transitions the .default file between hidden and un-hidden according to the configuration, but it does not alter any other CLASS or NODE specific linkback targets. The administrator must ensure that all of these targets are hidden or un-hidden according to the configuration.
When detecting disks, oneSIS assigns a number to each disk it
finds as determined by the kernel order seen in
/proc/partitions. This allows the configuration to specify
that it wants to use the first disk, for example, rather than
requiring a specific device name. oneSIS can detect any disk device
that shows up as a normal block device if the appropriate drivers are loaded.
Support for specifying disks explicitly by their device name (or
disklabel) will be added in a
future release.
Note: For rc.preinit or mk-diskful to operate on a disk, the driver must already be loaded. This means the disk driver needs to either be compiled directly into the kernel, or the module needs to be loaded from an initrd.
The DISKMOUNT and DISKSWAP directives cause disk
partitions of swap space or ext2 filesystems to be created at
every boot. These directives are best used for creating
filesystems for temporary storage, such as /tmp.
Note: Any local disk partitions not deployed as part of the root filesystem that need to retain data across a reboot should be handled normally (i.e., with /etc/fstab).
DISKSWAP | 1 | 3000 | |
DISKMOUNT | 1 | 100% | /tmp |
If a BOOTLOADER directive is defined for a node, and the /boot
directory is deployed on a local disk, the specified bootloader will be
installed.
As a convenience, when deploying local partitions, oneSIS creates some links
on the local disk to abstract out the actual device names of the boot device
and the root device.
/dev/oneSIS-boot is created, which points to the actual boot device.
/dev/oneSIS-root is created, which points to the actual root device.
These links can be used when configuring lilo or grub so that the same configuration can be used for all diskful nodes of a heterogeneous cluster without needing to specify the actual device name of the boot or root device.
DEPLOYSWAP disk size[%] [-c class[,class]...] [-n node[,node]...]
Directs the mk-diskful script to create and enable a swap
partition of a specified size on the specified disk.
For both DEPLOYMOUNT and DEPLOYSWAP directives:
The disk parameter is a number specifying disk order as seen by the
kernel.
The size parameter can be either a percentage of the disk to
use, or the exact size in megabytes. If size is larger than
the remaining capacity of disk, the remainder of the disk is used.
Supplying any -c options limits the directive to apply only to
the given classes.
Supplying any -n options limits the directive to apply only to
the given nodes.
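The size rule can be made concrete with a small Python sketch. `partition_size_mb` is a hypothetical helper, and it assumes that a percentage refers to the total capacity of the disk; the text above does not say whether the percentage is taken of the whole disk or of the space still free.

```python
def partition_size_mb(size: str, disk_mb: int, remaining_mb: int) -> int:
    """Turn a size parameter (megabytes or N%) into megabytes to allocate."""
    if size.endswith("%"):
        # Assumption: the percentage is relative to the whole disk.
        wanted = disk_mb * int(size[:-1]) // 100
    else:
        wanted = int(size)
    # A request larger than the free space gets the remainder of the disk.
    return min(wanted, remaining_mb)


disk_mb = 80000                                        # illustrative disk size
swap = partition_size_mb("3000", disk_mb, disk_mb)
rest = partition_size_mb("100%", disk_mb, disk_mb - swap)
print(swap, rest)
```

Under this reading, a fixed-size partition followed by a 100% partition consumes exactly the whole disk, because the second request is capped at what the first one left over.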
By default on most Linux systems, device names are not guaranteed to be consistent across a reboot. One must be aware that a failed disk could cause the device ordering to come up inconsistently and potentially cause damage to subsequent local disks. One solution is to use udev and configure it to have strong associations between disk devices and their device names.
This is crucial for diskful nodes mounting other filesystems
from any storage that appears to the system as a local device. Any
attempt to synchronize the mountpoint would synchronize it with the
(probably empty) directory in the master image,
possibly deleting some data.
Adding an EXCLUDESYNC directive will exclude any file or directory from being synchronized with the master image.
For a bootloader to be installed, a /boot directory must
be deployed on a local disk. The bootloader is installed onto the
disk containing the /boot directory. A working configuration
for the chosen bootloader is necessary (i.e., in lilo.conf or grub.conf).
It is not necessary to install a bootloader even if the entire
root filesystem is on a local disk. Any node capable of network
booting can still retrieve its kernel over the network using
resources such as DHCP and PXELINUX.
Alternatively, NFSroot nodes can create a single /boot
partition on a local disk, install a bootloader, and load the kernel
off the local disk, but still mount the root filesystem via
NFS. Loading the kernel from a local disk can help reduce network
contention at boot-time when many machines power on all at once.
Many options exist to boot any node or functional group of nodes (locally or from the network) into a root filesystem that is either local, NFS mounted, or a combination of the two. The best scenario depends on the function of the node and the situation.
The hostname can be set by kernel-level autoconfiguration or in
an initrd. This requires a network resource such as DHCP to supply
the hostname, but there is an alternative: if a node reaches
rc.preinit without having a hostname set, the MAC_ADDR
directives are consulted. This is often necessary
for bringing up stand-alone nodes (e.g., the main DHCP server).
Note: The MAC_ADDR directives are only used when no hostname is set. They do not override a previously set hostname.
For this directive to work, any referenced network interface's drivers must have already been loaded. This means the drivers need to be directly compiled into the kernel or loaded from an initrd.
Every power utility has a different way of representing the set
of nodes on which to operate. Similarly, many different methods can
be used to access the serial console
of all cluster nodes.
oneSIS provides a generalized wrapper interface that can easily tie into any
existing power or console management solution.
It serves as a common interface for power and console management across all
machines, eliminating the need for an admin to remember the particular
command and syntax used for power management and console access on each
particular group of machines.
The POWERCMD directive is a way to quickly describe how a
particular power management utility works so that the pwr
(section 6.7) command can then interface to it.
Similarly, the CONSOLECMD directive can quickly describe how to
access the remote console of any node, and thereafter the
consl (section 6.8) command can be used to
access a node's serial
console.
Every command must reference the spec_id of a
SPECFORMAT directive (see section 5.12) by
including `SPEC:spec_id' in the appropriate place in the
command sequence. The `SPEC:spec_id' text gets replaced with
a hostname, or a range, etc., as defined in the SPECFORMAT
corresponding
to spec_id.
A simple hostname format may work in most cases, but may not operate as fast as one of the range formats. When using the hostname or ipaddr specformats, command is interpolated to replace any instance of `$NODE' or `$IP' with the hostname or IP address of the node being operated on, respectively.
Just as with POWERCMD, every command must reference the
spec_id of a SPECFORMAT directive (see section
5.12) by including `SPEC:spec_id' in the
appropriate place in the command sequence. The `SPEC:spec_id'
text gets replaced with a hostname or a range, etc., as
defined in the SPECFORMAT corresponding to spec_id.
A simple hostname format will work in most cases for remote console
operations.
When using the hostname or ipaddr spec formats, command is interpolated to replace any instance of `$NODE' or `$IP' with the hostname or IP address of the node being operated on, respectively.
For oneSIS to make use of a particular power or console utility,
it needs to know the format that the utility uses to represent a
single host or a group of hosts. Some power utilities operate on a
single hostname. Others can operate in parallel on a range of
hostnames. Others don't operate on hostnames at all,
instead referencing IP addresses or particular ports on a power controller.
The oneSIS interface for power and console management can be used as
long as any mapping exists
between the hostname (or IP address) of a node and the resulting parameter that
gets passed to the power management utility for that host. The parameter itself
could be a
hostname, a port, or any other parameter required by the given utility.
The hostname->parameter mapping can be defined directly in the configuration with a SPECFORMAT directive, or can be determined via more cumbersome methods involving combinations of shell commands in the POWERCMD or CONSOLECMD directives.
Every POWERCMD and CONSOLECMD directive must
reference exactly one spec_id from a SPECFORMAT description.
The resulting formats are sent through a POWERCMD command and executed
by the pwr script.
A SPECFORMAT describes the format a particular command
uses to specify which nodes to operate on. Several formats
currently exist which can be used to describe a set of nodes.
hostname | The resulting command uses the hostname of each specified node and runs one command for each hostname. |
ipaddr | The resulting command uses the IP address of each specified node and runs one command for each IP address. |
basic_range | One or more node ranges are constructed. Each range is of the form: prefix[a-b] |
ext_range | One or more node ranges are constructed. Each range is of the form: prefix[a-b,x-y,...] |
Note: | Adding a '+' to the end of any format causes that format to be used multiple times in a single command. |
Consider the powerman utility as an example,
which represents a range of nodes as host[a-b,x-y].
The ext_range format translates a set of hostnames
into one or more ranges in the form acceptable to powerman.
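A range in the prefix[a-b,x-y,...] form can be built from a list of hostnames roughly as follows. This Python sketch is not taken from oneSIS: `ext_ranges` is a hypothetical helper, and writing a lone node as a single number inside the brackets is an assumption about the desired output. Zero-padded host numbers are not preserved.

```python
import re
from itertools import groupby


def ext_ranges(hostnames):
    """Collapse hostnames such as cn1, cn2, cn7 into prefix[a-b,x-y] specs."""
    by_prefix = {}
    for host in hostnames:
        m = re.fullmatch(r"(.*?)(\d+)", host)
        if m is None:
            raise ValueError(f"hostname has no numeric suffix: {host!r}")
        by_prefix.setdefault(m.group(1), set()).add(int(m.group(2)))

    specs = []
    for prefix, numbers in by_prefix.items():
        spans = []
        # Consecutive numbers share the same (value - position) difference.
        for _, run in groupby(enumerate(sorted(numbers)),
                              key=lambda pair: pair[1] - pair[0]):
            run = [number for _, number in run]
            if len(run) > 1:
                spans.append(f"{run[0]}-{run[-1]}")
            else:
                spans.append(str(run[0]))   # assumed form for a single node
        specs.append(f"{prefix}[{','.join(spans)}]")
    return specs


print(ext_ranges(["cn1", "cn2", "cn3", "cn7", "cn8", "cn12"]))
```

One spec is produced per hostname prefix, which matches the "one or more ranges" wording: nodes with different prefixes cannot share a bracket expression.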
Formats useful for other utilities can be derived from an
existing format and a `SPEC:///' translation if needed. The
number of
formats will never become excessive because the number of ways to represent a set of hosts is limited.
The hostname and ipaddr formats are both
used for commands that operate on a single hostname or IP address.
Each hostname or IP address can optionally have a transformation
applied to it before applying the given format, by using the
`NODE:///' or `IP:///'
parameters.
The `SPEC:///', `NODE:///', and `IP:///'
translations can be any perl-style pattern replacement expression.
Refer to the perl documentation (man perlop and man
perlre) for details on using
the s/// operator.
In addition to normal perl syntax, the right-hand-side
(replacement) portion of the translation expression can contain
minimal inline perl code blocks within {} brackets. These code
blocks can be used to replace patterns in the hostname or IP
addresses with values computed from evaluating the inline code
expression. This is useful for doing inline math on an IP address,
when necessary. The {...} code blocks must be kept very simple
as they cannot yet contain any spaces.
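For readers more familiar with Python than Perl, the effect of such translations can be approximated with `re.sub`. This is only an analogy and does not show the oneSIS directive syntax: the `-bmc` suffix and the offset of 100 are invented for the example, and the callable replacement stands in for an inline code block.

```python
import re


def translate_node(hostname: str) -> str:
    """Rough equivalent of a Perl s/^cn(\\d+)$/cn$1-bmc/ translation."""
    return re.sub(r"^cn(\d+)$", r"cn\1-bmc", hostname)


def translate_ip(ip: str) -> str:
    """Add 100 to the last octet, as an inline code block might do."""
    return re.sub(r"(\d+)$", lambda m: str(int(m.group(1)) + 100), ip)


print(translate_node("cn12"))
print(translate_ip("10.0.0.12"))
```

The second function shows why computed replacements are useful: the new address depends on arithmetic over the matched text, which a fixed replacement string cannot express. No check is made that the resulting octet stays below 256.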
Similar to the NODE and IP translations, a final translation can
be done on the formatted spec before it gets used in a pwr
command. Including
a `SPEC:///' parameter will define the desired translation.
Note: The -dryrun option to the pwr and consl commands is useful when developing SPECFORMAT, POWERCMD, and CONSOLECMD directives for an environment. It shows the commands that the current configuration would generate. A working configuration can be arrived at fairly quickly by iterating on the configuration and checking each change with the -dryrun option.
Most utility programs have usage information that can be seen by running the command with no arguments.
basedir should be the root of the client's Linux image.
Options:
-d, | -dryrun | Preview changes |
-r, | -revert | Revert all files and services back to normal |
-c, | -config=FILE | Specify alternate configuration file |
-p, | -patchfile=FILE | Specify alternate distribution patch |
-sp, | -skippatch | Skip distribution patch |
-q, | -quiet | Suppress output |
The mk-sysimage script reads the oneSIS
configuration file, /etc/sysimage.conf, and alters components
of the filesystem for oneSIS to operate correctly. It creates some
directories, applies the patch file for the specific distribution
(see section
4.6), and performs other helpful tasks.
Several directives in /etc/sysimage.conf require
altering a file in the image to point to its corresponding location
in /ram.
mk-sysimage creates any new symbolic links to /ram
and the corresponding `.default' files or directories.
mk-sysimage automatically restores files in the master
image to their original state when they are removed from the
configuration. It can also revert the entire filesystem back to its
original state with
the -revert option.
To ensure that configuration changes are reflected in the system
image, it is recommended that the mk-sysimage script be run
after changing
any LINK* directives in the configuration.
For an image located in /var/lib/oneSIS/image, mk-sysimage would be run with:
Note: mk-sysimage will attempt to patch the target
distribution every time it is run. A warning will be displayed
unless a patch exists for the distribution or the -skippatch
option is supplied. This is meant to encourage anyone hacking the
rc scripts of a new distribution to
develop a patch for it and feed that back to the oneSIS community.
Warning: If you manually alter your distribution's rc scripts, mk-sysimage will fail to apply the distribution patch and display long error messages. If you plan to do this, run mk-sysimage with the -skippatch option so that it does not try to re-patch the distribution.
Options:
-r, | -run | This argument must be given to run the script |
-c, | -clean | Removes files/directories in /ram not in the config |
-d, | -dryrun | Shows updates without making them |
-cf, | -config=FILE | Specify alternate configuration file |
-q, | -quiet | Suppress output |
The update-node script performs a function very similar
to that of the boot-time script, rc.preinit. It updates all
the files and directories configured in /etc/sysimage.conf
that reside in the oneSIS RAM disk mounted on
/ram. It will also resize the RAM disk if necessary.
If any new RAM* or LINK* directives are added to the configuration, running update-node on all nodes will ensure that their RAM disk is consistent with the new configuration.
To remove files and directories in /ram that are no longer specified in the config, the -clean option must be given to update-node. However, a RAMDIR or LINKDIR may hold useful data that should survive such a cleanup. To protect a directory from being emptied by an `update-node -clean' operation, a -p flag can be added to its configuration directive:
IMAGE_DIR is the destination directory for the root image.
Options:
-l, | -local | Copy root filesystem from the local machine |
-r, | -remote=MACHINE | Copy root filesystem from a remote machine |
-e, | -exclude=DIR | Exclude contents of DIR from being copied |
-d, | -dryrun | Show local/remote directories that would be copied |
-v, | -verbose | Verbose output (copies much slower) |
The copy-rootfs script copies an installed Linux
distribution into a new location to serve as a new master image for
a cluster of nodes. The script recognizes which partitions reside on
a local disk, and copies each one over in the correct order without
recursively copying itself (for a local
copy).
Since copy-rootfs attempts to copy any partitions mounted
from a local disk, it may copy more than you want or need to be a
part of the master image. To prevent this, run copy-rootfs
with the -dryrun option to see a list of what the script
intends to do. Any directories that shouldn't be copied over can be
excluded with the
-exclude option.
When copying the root filesystem from a remote machine, it is
easiest if ssh keys are set up such that no password is required to
ssh to the machine. If ssh keys are not set up, the script will
prompt for a password several times (once for each remote partition
being copied, and once to determine remote
partitions).
A typical scenario to create a master image may look as follows:
This would copy the root filesystem of the local machine into another directory but exclude the contents of the /home directory.
-s, | -size=STRING | Specify size of initrd (default: 4096) |
-d, | -scsi | Include scsi_hostadapter modules |
-p, | -preload=STRING | Add the specified module (loads before SCSI modules) |
-w, | -with=STRING | Add the specified module (loads after SCSI modules) |
-v, | -variant=STRING | Specify the class or node variant to use |
-t, | -template=FILE | Use the specified initrd template (default: /usr/share/oneSIS/initrd-dhclient.gz) |
-b, | -basedir=DIR | Look for files relative to DIR (default: /) |
-f, | -force | Force overwrite of an existing initrd |
-nn, | -nonfs | Omit inclusion of NFS modules |
-td, | -tempdir=DIR | Use alternate temporary directory instead of /tmp |
-q, | -quiet | Suppress output |
--- Initrd Behavior Flags ---
-am, | -automount | Auto-mount labeled partitions and swapon swap partitions from the initrd |
-rw, | -readwrite=STRING | Auto-mount specified labeled partitions read-write. The string 'ALL' will mount all partitions read-write |
-nd, | -nodhcp | Don't run a DHCP client from the initrd |
An alternate method for booting oneSIS systems is to bootstrap using an
initial ramdisk (initrd).
By using mk-initrd-oneSIS, an initrd can be built that
is customized for an entire cluster or for any subset of nodes.
Kernel modules needed for NFS and those specified by any
eth0 aliases in /etc/modules.conf are included
automatically in the initrd and loaded at boot time. Likewise, any
scsi_hostadapter alias in /etc/modules.conf will
cause the
corresponding driver to be loaded when the -scsi option is given.
Any other modules can be included with command-line arguments.
All modules must exist in
/lib/modules/kernel-version relative to the basedir.
For example, to create an initrd for a node running a 2.6.12 kernel
with an e1000 network card and IDE disk support built into
the kernel, assuming kernel modules are installed in
/lib/modules/2.6.12, you would type:
Local disk partitions that have been created with
DEPLOYMOUNT or DEPLOYSWAP directives and the
mk-diskful script (see section 6.5) can be
mounted automatically (or swapped-on) from the initrd.
To automount locally deployed partitions on the system described above:
Note: mk-initrd-oneSIS does not currently look at /etc/fstab to determine which local partitions to mount.
| -run | This argument must be given to run the script |
-i, | -image=DIR | Specify the NFS location of the master image |
-e, | -exclude=DIR | Exclude DIR from being copied |
-r, | -reboot | Reboot the node when finished |
-d, | -dryrun | Show directories that would be copied to each partition |
-v, | -verbose | Verbose output (copies much slower) |
-q, | -quiet | Suppress output |
Although booting oneSIS nodes with NFS root filesystems is preferred, oneSIS
fully supports booting from a local disk, mounting the root filesystem from
a local disk, or mounting only specific directories of the root filesystem
from a local disk.
The mk-diskful script can be used to deploy portions of the root filesystem onto partitions on a local disk. The script can be run on a node after it is booted into a normal NFS root with no mounted partitions on the target disk. Alternatively, it can be run by calling it as init directly from the kernel command line as follows:
Any portion of a node's root filesystem can be configured to reside on a local disk, so any combination of NFS and local directories in the root filesystem is possible. Nodes having /boot on a local disk can be configured to boot a kernel and initrd from the disk or may simply continue to boot off the network.
At least one local directory to synchronize must be given.
The image parameter is required and must specify the host and remote
path of the NFS-exported master image.
Options:
-i, | -image=HOST:DIR | Specify the location of the master image |
-l, | -lilo | Run lilo |
-a, | -all | Synchronize all local partitions |
-e, | -exclude=PATTERN | Exclude files matching PATTERN |
-d, | -dryrun | Preview changes |
-q, | -quiet | Suppress output |
When portions of the root filesystem exist on local disk partitions, those partitions must be resynchronized with the master image whenever the image changes. If a change is made to /etc/passwd, for instance, all nodes having a local /etc partition could be synchronized with:
Note: Running `sync-node /' will synchronize only
the partition that / resides on. If /etc is on another
partition, `sync-node /etc' would need to be run to
synchronize it. To synchronize
all local partitions, use `sync-node -a'.
Warning: Running `sync-node -a' will attempt to synchronize all locally mounted partitions. However, if a /data directory, for example, is mounting a SAN storage device that appears to the system as a SCSI disk, `sync-node -a' will detect that it is local and attempt to synchronize /data with the (probably-empty) /data directory in the master image, resulting in possible data loss. It is advisable to use EXCLUDESYNC directives as appropriate, and use `sync-node -a' with caution around nodes with a SAN.
FUNCTION can be one of: on, off, cycle, status, ledon, ledoff, or ledstatus
Note: Unambiguous short forms of the functions are also accepted.
NODESPEC can be:
[-h] HOSTNAME | |
RANGESPEC | (any text with one or more RANGEs in brackets) |
a RANGE is a comma-separated list of numbers and number ranges, |
e.g.: cn[1-10,15,20-32] or su[1,4]cn[1-32] or my[1-32]nodes | |
-re REGEXP | (perl-style regular expression matching hostnames) |
-c CLASS | (oneSIS class name) |
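To make the bracketed range notation concrete, the following Python sketch expands a spec such as su[1,4]cn[1-2] into individual hostnames. It is an illustration under stated assumptions, not the parser oneSIS uses: the function names `expand_range` and `expand_nodespec` are hypothetical, and the sketch assumes that multiple bracket groups combine as a cross product.

```python
import itertools
import re


def expand_range(range_text):
    """Expand '1-3,7' into [1, 2, 3, 7]."""
    numbers = []
    for part in range_text.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            numbers.extend(range(int(low), int(high) + 1))
        else:
            numbers.append(int(part))
    return numbers


def expand_nodespec(spec):
    """Expand 'su[1,4]cn[1-2]' into every hostname it names."""
    # Splitting on a capture group alternates literal text and bracket contents.
    pieces = re.split(r"\[([^\]]*)\]", spec)
    choices = []
    for index, piece in enumerate(pieces):
        if index % 2 == 0:
            choices.append([piece])  # literal text between brackets
        else:
            choices.append([str(n) for n in expand_range(piece)])
    return ["".join(combo) for combo in itertools.product(*choices)]


print(expand_nodespec("su[1,4]cn[1-2]"))
print(len(expand_nodespec("cn[1-10,15,20-32]")))
```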
Options:
-h, | -host=HOSTNAME | Operate on hostname |
-r, | -range=RANGESPEC | Specify a range of nodes to operate on |
-re, | -regexp=REGEXP | Specify a regular expression of nodes to operate on |
-c, | -class=CLASS | Specify a class of nodes to operate on |
-p, | -parallelism=NUM | Specify the maximum number of parallel commands to run (default: no limit) |
-d, | -dryrun | Show command(s) that would be executed |
-q, | -quiet | Suppress output |
The pwr script is a convenient wrapper script supplied
to provide a unified interface for handling power management for
cluster nodes. It enables the same command to be used on every
cluster
regardless of the underlying mechanisms for handling node power.
Note: At least one valid SPECFORMAT directive and a
POWERCMD for each FUNCTION must be supplied for the
pwr
command to be able to perform that function.
The pwr script builds commands that it runs (in parallel)
to power on, power off, cycle, or query the power status of a given
set of nodes. It can also turn on, turn off, or query the status of
a chassis LED
(or similar mechanism) if that functionality is available.
To power on nodes named cn1 through cn100:
POWERCMD directives for several power management utilities may be included with each oneSIS distribution. For example, to have pwr use the powerman utility (http://www.llnl.gov/linux/powerman), you could add the following line to /etc/sysimage.conf:
This includes a list of directives used to interface to powerman for power management. The contents of that file look like this:
SPECFORMAT powerman_spec ext_range+
It is still necessary to configure powerman or any other power utility.
NODESPEC can be:
[-h] HOSTNAME | |
RANGESPEC | (any text with one or more RANGEs in brackets) |
a RANGE is a comma-separated list of numbers and number ranges, |
e.g.: cn[1-10,15,20-32] or su[1,4]cn[1-32] or my[1-32]nodes | |
-re REGEXP | (perl-style regular expression matching hostnames) |
-c CLASS | (oneSIS class name) |
Options:
-h, | -host=HOSTNAME | Operate on hostname |
-r, | -range=RANGESPEC | Specify a range of nodes to operate on |
-re, | -regexp=REGEXP | Specify a regular expression of nodes to operate on |
-c, | -class=CLASS | Specify a class of nodes to operate on |
-p, | -parallelism=NUM | Specify the maximum number of parallel commands to run (default: no limit) |
-d, | -dryrun | Show command(s) that would be executed |
-q, | -quiet | Suppress output |
Like pwr, the consl command is a convenient wrapper
supplied to provide a unified interface for accessing the serial
console of cluster nodes. Typically, only one serial console is
accessed at a time, but if the underlying application supports it
(for instance conman -b),
multiple consoles can be accessed at the same time.
Note: consl requires at least one valid SPECFORMAT and
CONSOLECMD directive in /etc/sysimage.conf to operate.
For example, if the cluster is set up such that the serial console of a node named node1 is accessible by telnetting to node1-term, the following configuration could be used in /etc/sysimage.conf:
SPECFORMAT spec1 hostname NODE:/$/-term/
To clear the screen and print a helpful message before opening each console:
To open each console in a separate window:
Then to connect to the console of node1, run:
Or, to connect to several consoles:
NODESPEC can be:
[-h] HOSTNAME | |
RANGESPEC | (any text with one or more RANGEs in brackets) |
a RANGE is a comma-separated list of numbers and number ranges, |
e.g.: cn[1-10,15,20-32] or su[1,4]cn[1-32] or my[1-32]nodes |
Options:
-l, | -list | List available PXE configuration files |
-s, | -show | Show which nodes are using which config file |
-r, | -revert | Revert specified nodes back to the 'default' config |
-d, | -dryrun | Show command(s) that would be executed |
-q, | -quiet | Suppress output |
This script is supplied as a convenience for operators
using the PXELINUX package for network booting. PXELINUX allows
individual nodes to use different configuration files by looking for
a file with the hex equivalent
of the node's IP address.
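The hex file name that PXELINUX derives from an IPv4 address can be computed with a few lines of Python. This is a small illustrative sketch, not part of oneSIS; the function name `pxelinux_config_name` is hypothetical. PXELINUX uses the address written as eight uppercase hexadecimal digits.

```python
import ipaddress


def pxelinux_config_name(ip):
    """Return the 8-digit uppercase hex name PXELINUX derives from an IPv4 address."""
    return f"{int(ipaddress.IPv4Address(ip)):08X}"


print(pxelinux_config_name("192.0.2.91"))
```

For 192.0.2.91 the octets are C0, 00, 02, and 5B in hexadecimal, giving the file name C000025B.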
pxe-config provides a helpful interface
to specify individual configuration files for given nodes, list which
configuration files are available, and show which nodes are using which
configuration.
PXE configuration files are normally kept in a directory like /tftpboot/pxelinux.cfg. If you create a PXE configuration file called /tftpboot/pxelinux.cfg/x86_64/2.6.12 containing your desired PXE configuration, you could direct nodes node1 through node100 to use that config with:
This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.71)
The translation was initiated by root on 2005-06-19