
Hanlon does Windows!

One of the most often requested features whenever we’ve talked with Hanlon users (going all the way back to when Nick Weaver and I released Hanlon under its original name, Razor) has been Windows support. We’ve struggled with how to add support for Windows provisioning to both Razor and Hanlon for a couple of years now, and we’ve even had a few false starts at providing support for the feature, but somehow the implementations we tried never really made it into a production Razor/Hanlon server.

The issue hasn’t been a technical one; instead, it’s been an issue of how to fit the provisioning of Windows and the management of Windows images into the provisioning workflow used by Hanlon. Windows is, well, how shall we put this, just a bit different from the other operating systems and hypervisors that we’ve supported to date, and we have struggled all along to find a way to integrate the workflow used for the unattended Windows install process with the workflow Hanlon uses to automate the install of the operating systems and/or hypervisors that we already support.

That being said, today we are formally announcing that Hanlon now provides fully-automated provisioning of Windows instances in your datacenter using a workflow that should be familiar to both Hanlon users and Windows administrators. There are still a few features that remain to be implemented (mainly around when notification is sent back to Hanlon that a node is “OS Complete”, and around support for “broker handoffs” to Windows nodes), but we felt that it was better to get these features out in public so that we could get feedback (and pull requests?) from the community as soon as possible.

In this blog posting, I’d like to walk you through the new Windows support we’ve added to Hanlon. Along the way, I’ll talk through some of the features we had to add to Hanlon in order to support Windows provisioning and highlight some of the differences in workflow for those Windows administrators who have not used Hanlon before and those Hanlon users not familiar with how unattended Windows installs work. For those of you who don’t have the time or patience to read the rest of this blog posting, you can find a screencast of yours truly using these new features posted here. As always, feedback and comments are more than welcome, and if anyone would like to help us improve these features, please do let us know.

Bare-metal Provisioning of Windows

As I mentioned in my introductory remarks, the process of bare-metal provisioning Windows via Hanlon is slightly different from the process that Hanlon follows when provisioning a Linux-based OS or a VMware or Xen Hypervisor. This is due to differences between how the Windows ISO is structured when compared to the Linux or Hypervisor ISOs that Hanlon has supported to date, and differences between how an unattended Windows install works and how a Linux or Hypervisor automated install works. As a result, an unattended Windows install requires that a few external components be set up and configured before Hanlon can successfully iPXE-boot a Windows-based OS install.

Hanlon’s new static ‘slice’

The first big difference for unattended Windows installs is that the iPXE-boot process for a Windows install relies on the ability to download a number of components from a web server that is available in the iPXE-boot network. Rather than requiring that users set up (and configure) an external web server, we have decided to add a new static area to Hanlon itself. To make use of this new capability, simply add a hanlon_static_path parameter to your Hanlon server configuration that points to the directory you wish to use for serving up static content through Hanlon. Any content placed under the directory will then be available via a GET operation against the /static RESTful endpoint.
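
As a minimal sketch of putting this together (the directory path here is an example, and the server address, port, and base URI are taken from the DHCP configuration examples shown later in this post), you might do something like the following:

# point the server at a directory to use for serving static content
# (parameter appended to the server configuration described later in this post)
$ echo 'hanlon_static_path: /opt/hanlon/static' >> web/config/hanlon_server.conf

# after restarting the Hanlon server, anything placed under that
# directory should be retrievable via the /static endpoint
$ curl -O http://192.168.1.2:8026/hanlon/api/v1/static/wimboot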

With that new static area configured in your Hanlon server, the next step is to setup the appropriate structure under that area to support iPXE-booting of a server into WinPE via Hanlon. The tree structure that you are setting up should look something like this:

$ tree
.
├── boot
│   ├── bcd
│   └── boot.sdi
├── sources
│   └── boot.wim
└── wimboot

2 directories, 4 files
$

The files that are placed into this directory tree (the bcd, boot.sdi, boot.wim, and wimboot files) come from a variety of sources. The boot.wim file under the sources directory is the WinPE image you wish to use to boot your hardware. This WinPE image will have to be built separately, and will likely have to be customized to suit your hardware (for those interested, Joe Callen has put together a blog posting of his own that describes this process; you can find his posting here). Unfortunately, licensing restrictions don’t allow for redistribution of ‘pre-built’ WinPE images but, as you can see in Joe’s post, we’ve tried to make the process of building this image as simple as possible (even for non-Windows developers).

The boot/bcd and boot/boot.sdi files can probably be obtained from any Windows ISO, although the easiest location to grab them from is probably the ISO you are going to install Windows from. These files can be obtained by mounting a Windows ISO and copying them over, or they can be copied out of the directory created when you add a Windows ISO to Hanlon (more on this later). When copying over these files, keep in mind that while Windows is not case-sensitive when it comes to filenames, the server you are running your Hanlon server on likely is. As such, make sure that the filenames you create in the static area match the case of the corresponding file in the tree structure shown, above (the only real issue is probably the boot/bcd file, which will appear as boot/BCD on a Windows ISO).

Lastly, the wimboot file can be obtained from any recent build of the wimboot project (the latest version is typically available here, but older versions can be found through the project’s GitHub repository, which can be found here). Once these files are in place, your Hanlon server’s static area is ready to be used.
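
To tie these steps together, here is a hedged sketch of assembling the complete static area (all paths are illustrative, and the static directory matches the example used above; note the lower-casing of the BCD file discussed above):

# mount the Windows ISO and copy over the boot files
$ sudo mount -o loop /tmp/win2012r2.iso /mnt/winiso
$ mkdir -p /opt/hanlon/static/boot /opt/hanlon/static/sources
# BCD is lower-cased on copy to match the tree structure shown earlier
$ cp /mnt/winiso/boot/BCD /opt/hanlon/static/boot/bcd
$ cp /mnt/winiso/boot/boot.sdi /opt/hanlon/static/boot/boot.sdi
$ sudo umount /mnt/winiso

# drop in your custom-built WinPE image and a downloaded wimboot binary
$ cp /path/to/custom/boot.wim /opt/hanlon/static/sources/boot.wim
$ cp /path/to/downloaded/wimboot /opt/hanlon/static/wimboot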

Changes to the DHCP server configuration

Since the DHCP client used by WinPE does not support the passing of DHCP options in the same way as the DHCP client used in most Linux/Hypervisor distributions, some minor changes to your DHCP server configuration are probably necessary. Specifically, the section of your DHCP server configuration that looks like this:

# specify a few server-defined DHCP options
option hanlon_server code 224 = ip-address;
option hanlon_port code 225 = unsigned integer 16;
option hanlon_base_uri code 226 = text;

will have to be modified to support both Linux and Windows PE clients as follows:

# specify a few server-defined DHCP options
option hanlon_server code 224 = ip-address;
option hanlon_port code 225 = unsigned integer 16;
option hanlon_base_uri code 226 = text;

# options used for Windows provisioning
option space hanlon;
option hanlon.server code 224 = ip-address;
option hanlon.port code 225 = unsigned integer 16;
option hanlon.base_uri code 226 = text;

Note that in the case of Windows PE clients, we rely on a hanlon space to pass through the parameters that the Windows PE client will need to successfully connect back to the Hanlon server and retrieve the active_model parameters that it needs to continue with the appropriate Windows install (based on the model that it was bound to).

Without these additional runtime parameters, we would have to customize our Windows PE image so that it knew how to contact Hanlon in order to retrieve the active_model instance that it has been bound to (which contains information needed by the WinPE instance to perform an unattended Windows install). With these additional parameters, it is actually quite simple to put together a generic Windows PE image that can connect back to the Hanlon server (via a simple PowerShell script) to obtain this information.

To finish off the task of reconfiguring your DHCP server, you’ll also have to make use of the new space that was defined, above. To accomplish this, simply track down the lines that look like this in your current DHCP server configuration file:

  option hanlon_server 192.168.1.2;
  option hanlon_port 8026;
  option hanlon_base_uri "/hanlon/api/v1";

and modify that section of your DHCP server configuration file so that it looks like this instead:

  class "MSFT" {
    match if substring (option vendor-class-identifier, 0, 4) = "MSFT";
    option hanlon.server 192.168.1.2;
    option hanlon.port 8026;
    option hanlon.base_uri "/hanlon/api/v1";
    vendor-option-space hanlon;
  }
  class "OTHER" {
    match if substring (option vendor-class-identifier, 0, 4) != "MSFT";
    option hanlon_server 192.168.1.2;
    option hanlon_port 8026;
    option hanlon_base_uri "/hanlon/api/v1";
  }

With those changes in place, your DHCP server should now be ready to support chain booting of your machines into a Hanlon-based Windows install.

Building a WinPE image that supports your hardware

In order to make the process of provisioning Windows as painless as possible, Joe Callen (@jcpowermac) created a simple PowerShell script that can be used (on a Windows machine) to build a WinPE image that is suitable for use with your hardware. Drivers for specific networking and storage devices are typically needed to successfully iPXE-boot a node and install an OS instance on bare-metal hardware, and in the case of Windows the drivers for these networking and storage devices must be included as part of the WinPE image that is used to drive the installation process.

Do keep in mind, however, that our goal here is to make the resulting WinPE image as generic as possible so that it can be easily reused with all of your hardware. Rather than embedding a lot of model-specific logic into the WinPE image, we’ve worked quite hard to define a few custom PowerShell scripts that can be used to download appropriate versions of the key files/scripts directly from Hanlon (with appropriate values filled in based on the active_model instance that was bound to the current node). I’ll avoid going into the specifics here; Joe has provided a much more complete discussion of this process in his recent blog post, which is available here.

Loading a Windows ISO into Hanlon

As was mentioned earlier, the structure of a Windows ISO is quite different from the structure of the typical Linux/Hypervisor ISO. The biggest difference is that while the typical Linux/Hypervisor ISO contains a single image, a Windows ISO actually contains a number of different Windows images packaged up into a single install.wim file. This difference in structure shows up in the way that an automated install of Windows actually works. As part of the “autounattend” file you present to the Windows installer, you have to specify not only the location of the install.wim file to use for the install but also the WIM Index of the image in that file that you wish to use for the install.

So what does all of this mean for our image slice under Hanlon? To put it quite simply, we’ve had to make a significant change to how Hanlon handles images in order to support loading of Windows ISOs by the image slice. Specifically, the process of adding a single Windows ISO to Hanlon has the effect of creating multiple (linked) Hanlon images, a concept that was not needed for any of the Linux/Hypervisor ISOs that were already supported by Hanlon.

In order to minimize the space that the ‘unpacked’ ISO takes on disk, we maintained the requirement that each ISO would create a single directory on disk (containing the contents of that ISO). What had to change to support Windows was to add the concept of multiple, linked image objects in Hanlon (each of which is linked to a single base image that actually maps to the underlying directory created during the image add process). This base image then becomes a ‘hidden’ image (one that cannot be seen or used directly), and it is this base image that is actually used by any of the linked images in order to access resources that are associated with those images (like the install.wim file).

To facilitate this new concept, we modified the Hanlon image object type to take advantage of a ‘hidden’ field that was already defined for all Hanlon objects. When a Windows ISO is imported to Hanlon, the following sequence of operations is performed:

  • The ISO is unpacked into a locally accessible directory and a Windows ‘base image’ is created. This image is ‘hidden’ and, as such, will not appear in the list of Hanlon images returned by a GET against the /image endpoint by default
  • A standard Linux utility (the wiminfo utility) is then used to parse the install.wim file associated with that base image (a minimal example of this is shown after this list). It should be noted that Hanlon will look for this install.wim file under the sources subdirectory for any given base image (e.g. under the ${IMAGE_PATH}/windows/6hQHOAhWyuiLpUmuZlfQAa/sources directory for a base image with the uuid of 6hQHOAhWyuiLpUmuZlfQAa)
  • The information returned from the wiminfo command is then used to create a set of image objects that are linked to the underlying base image that was created when the ISO was unpacked. These linked images are only valid image objects so long as the underlying base image remains intact. Each of these linked images corresponds to one of the images found when the underlying install.wim file from the base image was parsed using the wiminfo command.
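
For the curious, the wiminfo step looks roughly like the following for the example base image above (output abridged and approximate; the exact fields and image names depend on your wimlib version and your ISO):

$ wiminfo ${IMAGE_PATH}/windows/6hQHOAhWyuiLpUmuZlfQAa/sources/install.wim
...
Index:                  1
Name:                   Windows Server 2012 R2 SERVERSTANDARDCORE
Description:            Windows Server 2012 R2 Standard (Server Core Installation)
...
Index:                  2
Name:                   Windows Server 2012 R2 SERVERSTANDARD
Description:            Windows Server 2012 R2 Standard (Server with a GUI)
...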

The result of adding a single Windows ISO to Hanlon as an image (using the hanlon image add command or its RESTful equivalent) will be something like the following:

$ hanlon image add -t win -p /tmp/win2012r2.iso
Attempting to add, please wait...
Images:
         UUID                Type                               Name/Filename                          Status
2GLbzghe1VABDmYVtlLQe0  Windows Install  Windows Server 2012 R2 Standard (Server Core Installation)    Valid
2GLcLzQbh6Uxid9Nei84cy  Windows Install  Windows Server 2012 R2 Standard (Server with a GUI)           Valid
2GLccHsh97YE5BA9H0ghqi  Windows Install  Windows Server 2012 R2 Datacenter (Server Core Installation)  Valid
2GLcp9biBf0uMz80EwGBmK  Windows Install  Windows Server 2012 R2 Datacenter (Server with a GUI)         Valid
$  hanlon image 2GLbzghe1VABDmYVtlLQe0
Image:
 UUID =>  2GLbzghe1VABDmYVtlLQe0
 Type =>  Windows Install
 Name/Filename =>  Windows Server 2012 R2 Standard (Server Core Installation)
 Status =>  Valid
 OS Name =>  Windows Server 2012 R2 Standard (Server Core Installation)
 WIM Index =>  1
 Base Image =>  eeQtUH3Id2xs96i0Cft7w
$

As you can see from this example, the ISO was unpacked into a base image (with a UUID of eeQtUH3Id2xs96i0Cft7w) and four linked images (with the UUIDs shown in the output of the hanlon image add command shown above). While the linked images are independent of each other, all of them depend on the underlying base image for their ‘contents’. As such, removing the filesystem associated with the underlying base image will render all four (in the above example) of these linked images invalid.

Methods used to ‘unpack’ Windows ISOs

It should be noted here that the process for unpacking Windows ISOs may also differ from that used to unpack Linux or Hypervisor ISOs. In the Linux/Hypervisor case, fuseiso can be used to ‘mount’ the ISO if it is available. If the fuseiso command cannot be found, then a regular mount command is attempted as a fallback. If both of those commands fail, then an error is thrown indicating that the user cannot add the image to Hanlon. If either of those commands succeeds, then the contents of the ISO are copied over from the mount-point to a directory on the Hanlon filesystem that is under the ${IMAGE_PATH} directory.

Unfortunately, in the case of Windows, the fuseiso command cannot be used to mount a Windows ISO since the UDF (Universal Disk Format) file system used with Windows ISOs is not supported by the fuseiso command (even though it does support the ISO 9660 and Joliet formats). While the mount command could still be used to mount and copy over the contents of a Windows ISO, we didn’t want to add a requirement to Hanlon that the Hanlon server be run as a user with sudo rights to execute the mount/unmount commands.

To get around this limitation of the fuseiso command, we have added support to Hanlon for use of the 7z command for unpacking of a Windows ISO into a subdirectory of the ${IMAGE_PATH} directory. 7z was chosen as an alternative because, even though it cannot properly read the Joliet filesystem used with some Hypervisor ISOs (specifically ESX 5.x ISOs), it does support the UDF filesystem used with Windows ISOs. The biggest difference between using 7z and the previously supported methods is that the ISO is never mounted on the Hanlon filesystem; instead, the contents of the ISO are directly extracted to the target directory.
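
For reference, the extraction that Hanlon performs with 7z is roughly equivalent to the following command (a sketch; the target directory name shown here is the base image UUID from the earlier example):

# 'x' extracts with full paths; -o (no space) sets the output directory
$ 7z x /tmp/win2012r2.iso -o${IMAGE_PATH}/windows/6hQHOAhWyuiLpUmuZlfQAa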

A note on removing Windows images

As you can see, this single hanlon image add command created a set of 4 linked images. The underlying base image is not visible in this view, nor is it visible in any of the default views provided by Hanlon. To see the underlying base image details, we actually need to request the base image details specifically (using a command like hanlon image eeQtUH3Id2xs96i0Cft7w in the example shown above) or we need to make use of the new --hidden flag that we’ve added to the hanlon image command to facilitate the display of all of the images currently available, including the hidden ones:

$ hanlon image --hidden
Images:
         UUID                Type                               Name/Filename                          Status
2GLbzghe1VABDmYVtlLQe0  Windows Install  Windows Server 2012 R2 Standard (Server Core Installation)    Valid
2GLcLzQbh6Uxid9Nei84cy  Windows Install  Windows Server 2012 R2 Standard (Server with a GUI)           Valid
2GLccHsh97YE5BA9H0ghqi  Windows Install  Windows Server 2012 R2 Datacenter (Server Core Installation)  Valid
eeQtUH3Id2xs96i0Cft7w   Windows Install  Windows (Base Image)                                          Valid
2GLcp9biBf0uMz80EwGBmK  Windows Install  Windows Server 2012 R2 Datacenter (Server with a GUI)         Valid
$

So, given that we now have images in Hanlon that are linked together, how do we handle removal of these images? Previously, Hanlon just removed the underlying directory containing the unpacked version of the ISO that was used to create the image, then removed the corresponding image object from Hanlon. There is also a check to ensure that the image you are removing is not a part of a model that is currently defined in Hanlon. If it is part of a model, then removal of the underlying image is prohibited (to prevent removal of an image from breaking any existing models defined in Hanlon that might be using that image).

Hopefully the new rules for removing Windows images are fairly apparent (the rules for removal of Microkernel, Linux or Hypervisor images remain unchanged), but in case they are not, here’s a short summary (a concrete example follows the list):

  • if an image is a base image, removal of that image will remove all of the images that are linked to that image, the base image itself, and the directory containing the contents of the ISO that was created when that base image was added to Hanlon
  • if an image is a linked image, then removal of that image will only result in removal of the linked image object itself; after that linked image is removed, if there are no other images remaining that are linked to the base image of that linked image, then the underlying base image (and the directory containing the contents of the ISO that was created when that base image was added to Hanlon) will also be removed
  • requests to remove a linked image will be blocked if the image in question is used in a model currently defined in Hanlon
  • requests to remove a base image will be blocked if any of the images that link to that base image are used in a model currently defined in Hanlon
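
As a concrete example (and assuming the remove subcommand follows the same pattern as the add command shown earlier, which may differ in your version of Hanlon), removing one of the linked images from our example would look something like this:

# removes only this linked image object; the base image (and its on-disk
# directory) is only cleaned up if this is the last remaining linked image
$ hanlon image remove 2GLbzghe1VABDmYVtlLQe0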

Creating a Windows model

Once a Windows ISO has been loaded, it is relatively simple to use one of the images created by that process to create a Windows model. The command to add a new Windows model to Hanlon will look something like the following:

$ hanlon model add -t windows_2012_r2 -l windows_2012_r2_dc -i 2GLcc
--- Building Model (windows_2012_r2):

Please enter Windows License Key (example: AAAAA-BBBBB-CCCCC-DDDDD-EEEEE)
(QUIT to cancel)
 > XXXXX-XXXXX-XXXXX-XXXXX-XXXXX
Please enter node hostname prefix (will append node number) (example: node)
default: node
(QUIT to cancel)
 >
Please enter local domain name (will be used in /etc/hosts file) (example: example.com)
default: localdomain
(QUIT to cancel)
 >
Please enter admin password (> 8 characters) (example: P@ssword!)
default: test1234
(QUIT to cancel)
 >
Please enter User Name (not blank) (example: My Full Name)
default: Windows User
(QUIT to cancel)
 >
Please enter Organization (not blank) (example: My Organization Name)
default: Windows Organization
(QUIT to cancel)
 >
Model Created:
 Label =>  windows_2012_r2_dc
 Template =>  windows_deploy
 Description =>  Windows 2012 R2
 UUID =>  3xFEKusakDJKBrYXeQPtYG
 Image UUID =>  2GLccHsh97YE5BA9H0ghqi

$

As you can see, using one of these linked Windows images is exactly the same as using a Linux or Hypervisor image; the only differences from creating a Linux model are the template name and the additional Windows License Key, User Name, and Organization fields that must be entered for a Windows model.

Creating a Windows policy

Creating a Windows policy is even simpler. The arguments for a Windows policy are exactly the same as those for a Linux or Hypervisor deployment policy (except, of course, for the use of the windows_deploy template in the hanlon policy add ... command):

$ hanlon policy add -p windows_deploy -t 'ebig_disk,memsize_2GiB' -l windows_2012_r2_dc -m 3xFEK -e true
Policy Created:
 UUID =>  31GY0H7Dohh7hkjhGhDXXs
 Line Number =>  10
 Label =>  windows_2012_r2_dc
 Enabled =>  true
 Template =>  windows_deploy
 Description =>  Policy for deploying a Windows operating system.
 Tags =>  [ebig_disk, memsize_2GiB]
 Model Label =>  windows_2012_r2_dc
 Broker Target =>  none
 Currently Bound =>  0
 Maximum Bound =>  0
 Bound Counter =>  0

$

As you can see, the result is a Windows deployment policy that can be used to bind the underlying Windows model to a node that matches this policy (based on the tags assigned to that node).

Booting your Windows machine

Once you’ve loaded a Windows ISO into Hanlon as a set of linked Windows images and you’ve created a model and policy based on one of those linked images, the rest is simple and works exactly like it does for the Linux/Hypervisor installs you have already done. Simply configure a node that will be matched to your Windows model by your Windows policy so that it will network boot on the Hanlon server’s network, then power it on. The node will then chain boot (from a PXE-boot via TFTP to an iPXE-boot via Hanlon) and Hanlon will send back an iPXE-boot script that will trigger a WinPE-based unattended Windows install.

The workflow may be a bit different — requiring an extra reboot versus what you’re used to with the Linux/Hypervisor installs you’ve already done with Hanlon to date — but the process internally remains the same. The WinPE image reaches back to Hanlon via a RESTful request and obtains a PowerShell script that it uses to automate the process of setting up an unattended Windows install. In that script, it downloads the appropriate install.wim file from Hanlon along with an autounattend.xml file that will be used to control the unattended Windows install process. The autounattend.xml file that it downloads will be customized based on the specific Hanlon model that the node in question was bound to, and will contain all of the details that are needed to complete the unattended Windows install (the location of the install.wim file, the WIM Index of the image in that install.wim file that should be installed, the Windows license key, the hostname, the Administrator password, etc.). Finally, that script will also download a set of drivers from Hanlon and inject them into the downloaded install.wim. Currently, these drivers are assumed to be packaged in a single drivers.zip file that is available at the root of the static area that we added to Hanlon in our most recent release, but down the line we may end up extending how Hanlon downloads and injects these drivers in order to make this part of the process a bit easier to configure and extend.
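
As an example, packaging up those drivers might look something like the following (the static directory path is the example used earlier in this post):

# bundle your network/storage drivers into a single archive at the
# root of the Hanlon static area
$ cd /path/to/extracted/drivers && zip -r /opt/hanlon/static/drivers.zip .

# confirm that WinPE will be able to fetch it via the /static endpoint
$ curl -sI http://192.168.1.2:8026/hanlon/api/v1/static/drivers.zip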

With these tasks complete, the script then starts the standard setup process that will manage the unattended Windows install. When the unattended install is complete, you’ll have a fully-functional Windows instance that has been configured to match the parameters from the underlying Hanlon model that it was bound to.

In conclusion…

As always, we hope you all find this new set of features in Hanlon useful in your day-to-day work. We welcome feedback on these new features from anyone in the Hanlon or Windows communities, and look forward to your contributions as well. There are features we haven’t implemented yet (we still haven’t sorted out the process for handing off these Windows nodes to a Hanlon broker, for example), but we thought that even in this early stage of the game the features we’ve added to Hanlon are significant enough that we should release them into the wild, so to speak.

I’d also like to take this opportunity to thank Joe Callen (@jcpowermac) specifically. Without his unending support on the Windows side of this process (and patience with an old Linux/Unix dweeb like me), the seamless interaction between Hanlon and WinPE shown in these changes wouldn’t be nearly so seamless. Joe has worked long and hard on this set of changes, and deserves a great deal of the credit for where we are today.

Announcing the release of Hanlon v2.0

We recently released a major update to Hanlon that is focused on making Hanlon more usable in a production environment, and I’d like to go through some of the changes that we’ve made in that release here, which include:

  • Support for the use of the recently added ‘Hardware ID’ as an identifier for the node in the node and active_model slices
  • Changes that allow for deployment of the tools that make up the Hanlon CLI (the client) separately from the Hanlon server (which provides the RESTful API used by the CLI)
  • Support for new models and new model types
  • A simplified interface for creation of new models, with support for additional (model-specific) parameters in the Hanlon CLI using an answers file (to allow for automation of what was, up until now, an interactive process)
  • Additional power-status and power-control functionality in the node slice for nodes with an attached Baseboard Management Controller (or BMC)

Overall, our focus in putting together this new release has been on adding features to Hanlon that will make it easier than ever to use Hanlon as part of the suite of tools that you already use to manage the infrastructure in your datacenter. Hopefully this release starts to realize that goal. To help you get started with the new version, here’s a brief outline of what was added in each of these categories.

Hardware ID support in the node and active_model slices

The first big change in the latest version of Hanlon is that the ‘Hardware ID’ of a node can now be used to identify the node in the GET and POST operations supported by the node slice’s RESTful API (and the CLI equivalents to these commands). To accomplish this, a new --hw_id command line flag (or the corresponding -i short form of this flag) has been added to the node slice’s CLI. An example of using this flag might look something like this:

$ hanlon node --hw_id 564DC8E3-22AC-0D46-6001-50B003AECE0B -f attrib

which will return the attributes registered with Hanlon by the Hanlon Microkernel for the specified node (the node with an SMBIOS UUID value of 564DC8E3-22AC-0D46-6001-50B003AECE0B). The corresponding RESTful command would look something like this:

GET /hanlon/v1/node?uuid=564DC8E3-22AC-0D46-6001-50B003AECE0B

As you can see, the Hardware ID value is included as a query parameter in the corresponding GET command against the /node endpoint in the RESTful API.
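
Using curl, with the server address and API port taken from the example configuration files shown later in this post, that request would look something like this:

$ curl 'http://192.168.78.2:8036/hanlon/v1/node?uuid=564DC8E3-22AC-0D46-6001-50B003AECE0B'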

It should be noted that this capability is supported by any of the node subcommands previously supported by the Hanlon CLI, which includes the display of detailed information about, or the field values for, a specific node. The only difference is that you can now use this field (the Hardware ID, which will typically be mapped into the SMBIOS UUID of a node, a unique string that is assigned to the node by the manufacturer) to identify the node you are interested in. Previously, you could only use the UUID assigned to that node by Hanlon during the node registration process when executing these same commands (a value that can change over time). Giving users the ability to obtain this same information using the Hardware ID should make it much easier to gather the node-related information that you need to manage your environment using (external) automation systems.

It should also be noted that this same capability has also been added to the active_model slice, providing users with the ability to search for the active_model associated with a given node based on that node’s ‘Hardware ID’. An example of this sort of command from the CLI would be something like the following:

$ hanlon active_model --hw_id 564DC8E3-22AC-0D46-6001-50B003AECE0B

and the corresponding RESTful operation would look like this:

GET /hanlon/v1/active_model?uuid=564DC8E3-22AC-0D46-6001-50B003AECE0B

As was the case with the changes made to the node slice, adding the ability to search for an active_model instance based on the Hardware ID associated with a given node should make it much simpler for external systems to use the Hanlon API to determine which active_model instance (if any) is bound to a given node. This should make automated handling of nodes throughout their lifecycle much simpler.

Separation of the Hanlon client and server

Another big change in this version of Hanlon is that the client and server are now completely decoupled from each other. Previously, because of server-side code that was directly executed by the client (rather than relying on a RESTful request) and because of server-side configuration information that was used within the CLI, it was not possible to run an instance of the Hanlon CLI that was truly remote from the perspective of the machine being used to run the Hanlon server. With the changes in this release, such remote execution is now possible. While most of the changes that were necessary to accomplish this were behind the scenes (and, as such, shouldn’t be apparent to the end user), there were some changes made to how the Hanlon configuration is managed that are significant and that users will have to concern themselves with. We will discuss those changes here.

First (and foremost), changes were made to truly separate the Hanlon configuration into two separate files: the client and server configurations. These files are the cli/config/hanlon_client.conf and web/config/hanlon_server.conf files, respectively (all file paths in this posting are relative to the location where Hanlon was installed on your system). Examples of these two files are shown here; first, the client configuration:

$ cat cli/config/hanlon_client.conf
#
# This file is the main configuration for ProjectHanlon
#
# -- this was system generated --
#
#
--- !ruby/object:ProjectHanlon::Config::Client
noun: config
admin_port: 8025
api_port: 8036
api_version: v1
base_path: /hanlon
hanlon_log_level: Logger::ERROR
hanlon_server: 192.168.78.2
http_timeout: 90
$ 

As you can see, this client configuration has been reduced down to just the minimal set of parameters that are necessary for the CLI to do its job (all dependencies on the underlying server configuration parameters have been removed unless they were absolutely necessary). The server configuration file is much more complete. An example of that configuration (which is much closer to the original Razor configuration file in form) is shown here:

$ cat web/config/hanlon_server.conf
#
# This file is the main configuration for ProjectHanlon
#
# -- this was system generated --
#
#
--- !ruby/object:ProjectHanlon::Config::Server
noun: config
admin_port: 8025
api_port: 8036
api_version: v1
base_path: /hanlon
daemon_min_cycle_time: 30
force_mk_uuid: ''
hanlon_log_level: Logger::ERROR
hanlon_server: 192.168.78.2
image_path: /mnt/hgfs/Hanlon/image
ipmi_password: junk2
ipmi_username: test2
ipmi_utility: freeipmi
mk_checkin_interval: 30
mk_checkin_skew: 5
mk_gem_mirror: http://localhost:2158/gem-mirror
mk_gemlist_uri: /gems/gem.list
mk_kmod_install_list_uri: /kmod-install-list
mk_log_level: Logger::ERROR
mk_tce_install_list_uri: /tce-install-list
mk_tce_mirror: http://localhost:2157/tinycorelinux
node_expire_timeout: 180
persist_host: 127.0.0.1
persist_mode: :mongo
persist_password: ''
persist_port: 27017
persist_timeout: 10
persist_username: ''
register_timeout: 120
rz_mk_boot_debug_level: Logger::ERROR
rz_mk_boot_kernel_args: ''
sui_allow_access: 'true'
sui_mount_path: /docs
$ 

Note that it is this configuration file that contains all of the sensitive information that end users shouldn’t be concerned with (where the server is persisting its data and the username and password used by the persistence layer, for example). By separating out these parameters into a separate configuration file we can ensure that the sensitive information contained in this server configuration file can be properly protected, while still providing the ability to connect to the Hanlon server from a remote location (using either the CLI or the RESTful API).

As an aside, we also added a few new configuration parameters to these two files that didn’t appear in previous releases of Hanlon (or Razor). Specifically, we added an http_timeout parameter to the client configuration file (that controls how long the CLI will wait for a response from the RESTful API before timing out, something that is quite useful to have control over when uploading large ISOs through the image slice). This value defaults to 60 seconds (the default for HTTP requests in Ruby). We also added two new server-side configuration parameters:

  • a new ipmi_utility parameter, which controls which IPMI utility should be used to query for and control the power state of a node, more on that later in this posting, and
  • a new persist_dbname parameter to the server configuration, which controls the name of the database that should be used for persistence by the Hanlon server (a useful parameter to be able to set when running spec tests, for example).

Reasonable default values are set for these two new server-side configuration parameters (an empty string and ‘project_hanlon’ respectively), preserving the existing behavior provided by previous versions of Hanlon (and Razor).

With these changes in place it is now possible to deploy the CLI for Hanlon (the cli/hanlon script, its configuration, and all of its dependencies) remotely from the machine on which the Hanlon server is being run (either as a Rackup application under a server like Puma or directly as a WAR file under a Java servlet container like JBoss or Tomcat). Provided the server allows for remote access to the RESTful endpoint used by the CLI, and provided the CLI is configured properly, it should now be possible to use all of the functionality in the CLI in such a remote deployment scenario.
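
As a hedged sketch of such a remote deployment (hostnames and paths here are illustrative):

# copy the CLI tree (script, configuration, and dependencies) to a
# remote workstation
$ scp -r hanlon-host:/opt/hanlon/cli ~/hanlon-cli

# point the client configuration at the remote Hanlon server
$ sed -i 's/^hanlon_server:.*/hanlon_server: 192.168.78.2/' ~/hanlon-cli/config/hanlon_client.conf

# any CLI command should now run against the remote RESTful API
$ ~/hanlon-cli/hanlon image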

New models and new model types

In this new version of Hanlon, we have actually added some interesting new ‘no-op’ models to the framework and have also extended some existing models to provide support for new features during the OS (or Hypervisor) deployment process. As such, we felt it would be helpful to users (new and old) if we summarized some of these changes.

Two new ‘no-op’ model types (and corresponding policies)

From the beginning, the only way to add a node to Hanlon (or Razor, for that matter) was to let Hanlon discover that node using the Hanlon (or Razor) Microkernel. If Hanlon knew nothing about a given node, then when that node powered up and network booted it would be booted into the Microkernel, and the Microkernel would then check in and register the node with Hanlon. Hanlon could then make a policy-based decision as to what model (if any) should be bound to a given node based on its hardware profile.

Unfortunately, that made it rather difficult to build an inventory of nodes using Hanlon (and the Hanlon Microkernel) in two scenarios that are quite common in large datacenters:

  • if you didn’t already know what sort of operating system or hypervisor you wanted to provision to a node during the node discovery process, or
  • if the node had already been provisioned with an existing operating system (or hypervisor) and you didn’t want to overwrite that OS/Hypervisor instance with something new

In either of those two scenarios, since an active_model instance was never bound to such nodes (because an operating system or hypervisor was not deployed onto them by Hanlon), any information gathered about those nodes by the Microkernel would simply disappear from Hanlon shortly after they were powered off (and the Microkernel that they were booted into stopped checking in with Hanlon). In an attempt to resolve this issue, there have been suggestions over the past couple of years that perhaps we should come up with a way of ‘manually’ adding nodes to Hanlon (or Razor) to cover these sorts of scenarios, but we felt that this didn’t fit well into the philosophy behind Hanlon (we try to keep everything automated and policy-driven so that it can scale as easily as possible to thousands, or tens of thousands, of nodes). How then could we support adding these sorts of nodes to Hanlon?

The answer, as it turns out, was quite simple. We just added two new ‘no-op’ models (and two corresponding policy types) to the list of models supported by Hanlon. Those two new models (the discover_only and boot_local models) are best described as follows:

  • discover_only — when a model of this type is bound to a node, the node will boot into the Microkernel (every time) whenever the node is (re)booted; this has the effect of allowing for updates to the node inventory in Hanlon (by powering on nodes bound to this type of model) since the Microkernel these nodes are booted into will check in with Hanlon, register any new facts that it might find during the boot/checkin process with Hanlon, and then power off again.
  • boot_local — when a model of this type is bound to a node, the node in question will boot locally (every time) in response to any (re)boot; no changes are made to the underlying node, but it is added to the inventory of nodes maintained by Hanlon (and that information will be preserved until the boot_local model that was bound to the node is removed).

Keep in mind that before either of these model types are bound to the node, the node would have booted up into the Microkernel, and the Microkernel would have checked in and registered with Hanlon. So binding a node with either of these models will ensure that the node in question has been added to the inventory of nodes maintained by Hanlon. The only difference is how those nodes behave in subsequent (re)boots (as is outlined, above).
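
As an example (the model label and tag here are hypothetical, and we are assuming that no image is required for these ‘no-op’ templates and that the policy templates share the model template names), setting up a pure inventory workflow might look something like the following:

# create the no-op model
$ hanlon model add -t discover_only -l inventory_only

# bind any node carrying the 'inventory' tag to that model
$ hanlon policy add -p discover_only -t 'inventory' -l inventory_only -m <model-uuid> -e true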

Support for a new set of ‘optional parameters’ that can be defined for models of a given type

Previous releases of Hanlon supported the concept of a ‘required metadata hash’ for use in gathering (and storing) any metadata specific to a given model type during the model creation process. For models like the ‘redhat_6’ model, this metadata is actually quite simple (consisting of the root password, node name prefix, and domain name to use during deployment of the OS instance to a node), while for other models (like the ‘vmware_esxi_5’ model) this metadata could actually get quite involved. Not only that, but there were several requests by members of the community over the years to provide a mechanism for specifying additional metadata parameters for some model instances of a given type but not for others. As an example, in the case of a ‘redhat_6’ model a user might want to specify a partitioning plan (complete with partitions, volume groups, and logical volumes) that they would like to have created, or an additional package group that they would like to have installed during the node provisioning process, but other users might not want to specify any of these ‘optional metadata parameters’ (preferring a simpler deployment). These sorts of optional metadata parameters were not supported in previous versions of Hanlon (or Razor), since users would be asked for a value for any field that was added to the set of required metadata parameters (and that was the only mechanism provided for specifying model-specific parameters that should be used during the OS/Hypervisor provisioning process).

This version of Hanlon changes all of that. It is now possible to define a set of ‘optional metadata parameters’ for which a user can provide values when constructing a new model instance (how they do so isn’t specified here; more on that later). If values for these ‘optional parameters’ are not provided, then they are simply left unset (unlike the ‘required parameters’, which will always be gathered from the user when a model instance is being created and which are always assigned a value, even if it is a default value). If, on the other hand, values are provided for these parameters when creating a given model instance, then the values assigned for those parameters can be used to add additional features to the OS (or Hypervisor) instances deployed to any nodes bound to that model instance.

Currently, we are only supporting the use of these parameters in the ‘vmware_esxi_5’ model (it’s how we’re allowing for installation of additional VIBs or for the creation of a VSAN using a set of ESX 5.x nodes), but we have no doubt that this new feature will quickly be added to additional models. The set of optional parameters supported by models of a given type is still constrained based on what is defined in the code for those models, but this does provide a nice compromise between flexibility (in terms of what sorts of features can be enabled during the OS provisioning process for a given model type) and constraint (giving us the ability to keep the number of model templates to a minimum and keep the process for creating new model instances as simple as possible).

I’ll leave it to Joe Callen (@jcpowermac), who did the vast majority of this work, to explain how he is using these new features to enable the creation of VSANs and the installation of additional VIBs while deploying ESXi using the ‘vmware_esxi_5’ model under this new version of Hanlon. He has a nice blog posting of his own that explains the details, here. This is a new and exciting area of development in the Hanlon codebase, and one that we feel will lead to additional model development (as other models are added to or extended by the community).

Support for use of an ‘Answers File’ when creating new models

The second part of adding the ‘optional parameters’ to models that we described above involves how to actually provide values for the ‘optional parameters’ that a user wants to specify when creating a new model. Rather than try to walk the user through some sort of interactive dialog to collect values for these optional parameters (something that we found to be confusing, at best), we decided that we would combine this requirement with a previous enhancement request from another Hanlon user and collect these values using an external ‘Answers File’, which could be used during the model creation/update process to provide values for these optional parameters and for the required metadata hash parameters that also need to be supplied by the user during the model creation process.

To accomplish this, a new command-line flag was added to the Hanlon model slice’s CLI (the --option flag, which can be shortened to -o for convenience) that takes a single argument (the name of the YAML file containing the answers that the user wants to provide). Any optional parameters not included in that answers file are assumed to be blank (not specified), but if the user leaves out any of the required metadata hash parameters for a given model from that answers file an interactive session will be started on the command-line to collect those unspecified required metadata parameters (they are required, after all). The end result is a system that provides users with a great deal of flexibility when it comes to creating an answers file; they could create a file that is very generic and ‘fill in’ the instance-specific required metadata parameters interactively or, if they are trying to drive the model creation process through an external tool of some sort, they could provide an answers file that is very specific to a given model (one that specifies all of the parameters needed to create a given model instance) in order to avoid the need to provide answers interactively. Of course, the old style of providing answers interactively via the CLI is still supported, but only required metadata parameters can be specified this way (not optional ones).

As an example, here’s what the new ‘hanlon model add’ command might look like:

$ hanlon model add -t vmware_esxi_5 -l test-esxi -i 6ZK -o esx-model.yaml

where the esx-model.yaml file looks like this:

$ cat esx-model.yaml
root_password: "test1234"
ip_range_network: "10.53.252"
esx_license: "AAAAA-BBBBB-CCCCC-DDDDD-EEEEE"
ip_range_subnet: "255.255.255.0"
ip_range_start: "50"
ip_range_end: "60"
hostname_prefix: "esxi-node"
nameserver: "10.53.252.123"
ntpserver: "10.53.252.246"
vcenter_name: "foovc"
vcenter_datacenter_path: "dc1"
vcenter_cluster_path: "cluster"
packages:
- {url: "http://foo.org/foo1.vib", force: false }
- {url: "http://foo.org/foo2.vib", force: true }
$ 

Note that the YAML file shown above contains a mix of required metadata hash parameters (like the ‘ip_range_network’ and ‘hostname_prefix’ parameters) and optional metadata hash parameters (like the ‘vcenter_name’ and ‘vcenter_datacenter_path’ parameters). Since values are provided for all of the required metadata hash parameters in this answer file, the user would not be asked for any additional information when using it to create a new ‘vmware_esxi_5’ model instance.

Added BMC support

This new release of Hanlon also provides users with the ability to query the power state of a node or to control the power state of a node using the Hanlon node slice (either via the CLI or through the RESTful API). From the CLI, this new functionality is provided via the new --bmc command-line flag (which can be abbreviated using the shorter -b form if you wish to do so). To obtain the power-state of a node, simply include that flag as part of a hanlon node command, for example:

$ hanlon node -i 564DC8E3-22AC-0D46-6001-50B003AECE0B -b -u test -p junk

or

hanlon node 52GX2NDEBiTY47IqTbsjMu --bmc -u test -p junk

which corresponds to the following RESTful operations

GET /hanlon/v1/node/power?ipmi_username=test&ipmi_password=junk&hw_id=564DC8E3-22AC-0D46-6001-50B003AECE0B

or

GET /hanlon/v1/node/52GX2NDEBiTY47IqTbsjMu/power?ipmi_username=test&ipmi_password=junk

Notice that you can include an IPMI username and/or password directly through the command-line interface. Alternatively, you can specify the values to use for these parameters directly in the Hanlon server configuration file (by assigning values to the ipmi_username and ipmi_password fields in this file). If you provide values for these two fields in the server configuration file and also specify values for them when invoking this functionality via the CLI (or in the body/query parameters of the corresponding RESTful API call), then the values on the CLI override those provided in the server configuration file (giving you the ability to use different usernames and passwords with each BMC in your network if you are so inclined, or to use the same username and password with every BMC if you are not).

So we’ve described how you can get the current power state of a given node using the node slice. There is also a corresponding set of commands that can be used to (re)set the power-state of a node. To power a node on, for example, you would run a command that looks like one of these two commands (if you were using the Hanlon CLI):

hanlon node -i 564DC8E3-22AC-0D46-6001-50B003AECE0B --bmc on -u test -p junk

or

hanlon node update 52GX2NDEBiTY47IqTbsjMu --bmc on -u test -p junk

which would correspond to the following pair of RESTful operations:

POST /hanlon/v1/node/power

or

POST /hanlon/v1/node/52GX2NDEBiTY47IqTbsjMu/power

Since these are POST commands, the new power-state, IPMI username, IPMI password, and Hardware ID (if necessary) are all specified as fields in the JSON string that makes up the body of the request. For the first request (where you want to change the power-state of a node with a given Hardware ID) that body would look like this:

{"power_command":"on", "ipmi_username":"test", "ipmi_password":"junk", "hw_id":"564DC8E3-22AC-0D46-6001-50B003AECE0B"}

while for the second (where the node is identified by UUID, not Hardware ID) the body would look like this:

{"power_command":"on", "ipmi_username":"test", "ipmi_password":"junk"}

It should be noted here that for this functionality in the node slice, an update command from the CLI corresponds to a RESTful POST operation, not a PUT operation. This differs from the update command for the other slices in the Hanlon CLI (which map to a PUT, not a POST), but it was felt that this was the right mapping to make. The reason behind this choice is that a PUT operation is assumed to be idempotent in a RESTful interface (which is true for the update commands supported by the other slices in Hanlon), but for the node slice, the update command (which updates the power state of a given node) is not an operation that we can assume to be idempotent.

It should also be noted that for this functionality to work, not only does the node in question have to have a Baseboard Management Controller, you must also have discovered that node using a relatively new (v2.0.0 or later) version of the Hanlon Microkernel (older versions will not report the facts necessary to map a given node to its BMC), and you'll have to have one of the two recognized IPMI utilities installed locally on the Hanlon server node (ipmitool or freeipmi). Without one of these two utilities available on the Hanlon node, an error will be thrown if you try to execute one of these commands (to determine the power-status of a node or change the power-state of a node).

Finally, there are a limited number of power states supported when updating the power-state of a node via the node slice. The complete list is as follows: ‘on’, ‘off’, ‘reset’, ‘cycle’ or ‘softShutdown’. Attempting to use an unrecognized state will result in an error being thrown by the node slice’s RESTful API. Attempting to transition a node into an incompatible state (attempting a ‘softShutdown’ on a node that is already powered off, for example) will likely also result in an error being thrown.

In closing

After several weeks of intense development to add the functionality needed to support a NextGen datacenter lab environment we are managing with Hanlon, we feel that the changes we’ve made are ready for prime time. I’d like to thank several individuals who made it all possible, specifically three of my CSC colleagues who have put in long hours on this project over the last month or two:

  • Joe Callen (@jcpowermac)
  • Russell Callen (@mtnbikenc) and
  • Sankar Vema (@sankarvema)

without their tireless work, we would not have nearly as polished a product as you see here. In addition, I’d like to thank the following community members for their help in patching a few of the holes that we found in the previous release, specifically:

  • Cody Bunch (@bunchc), from Rackspace
  • JJ Asghar (@jjasghar), from Chef and
  • Seth Thomas (@cheeseplus), also from Chef

their contributions, while smaller, are no less significant. Thanks again to everyone who made this release possible, and we look forward to building out this community of developers further moving forward.