Announcing the release of Hanlon v2.0

We recently released a major update to Hanlon focused on making it more usable in a production environment, and I’d like to walk through some of the changes we’ve made in that release, which include:

  • Support for the use of the recently added ‘Hardware ID’ as an identifier for the node in the node and active_model slices
  • Changes that allow for deployment of the tools that make up the Hanlon CLI (the client) separately from the Hanlon server (which provides the RESTful API used by the CLI)
  • Support for new models and new model types
  • A simplified interface for creation of new models, with support for additional (model-specific) parameters in the Hanlon CLI using an answers file (to allow for automation of what was, up until now, an interactive process)
  • Additional power-status and power-control functionality in the node slice for nodes with an attached Baseboard Management Controller (or BMC)

Overall, our focus in putting together this new release has been on adding features that make it easier than ever to use Hanlon as part of the suite of tools you already use to manage the infrastructure in your datacenter. Hopefully this release starts to realize that goal. To help you get started with the new version, here’s a brief outline of what was added in each of these categories.

Hardware ID support in the node and active_model slices

The first big change in the latest version of Hanlon is that the ‘Hardware ID’ of a node can now be used to identify the node in the GET and POST operations supported by the node slice’s RESTful API (and the CLI equivalents to these commands). To accomplish this, a new --hw_id command line flag (or the corresponding -i short form of this flag) has been added to the node slice’s CLI. An example of using this flag might look something like this:

$ hanlon node --hw_id 564DC8E3-22AC-0D46-6001-50B003AECE0B -f attrib

which will return the attributes registered with Hanlon by the Hanlon Microkernel for the specified node (the node with an SMBIOS UUID value of 564DC8E3-22AC-0D46-6001-50B003AECE0B). The corresponding RESTful command would look something like this:

GET /hanlon/v1/node?uuid=564DC8E3-22AC-0D46-6001-50B003AECE0B

As you can see, the Hardware ID value is included as a query parameter in the corresponding GET command against the /node endpoint in the RESTful API.
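If you’d rather exercise the RESTful API directly, a minimal curl sketch of that same call might look like the following (the server address and API port here are taken from the example configuration files shown later in this post; adjust them to match your own deployment):

$ curl "http://192.168.78.2:8036/hanlon/v1/node?uuid=564DC8E3-22AC-0D46-6001-50B003AECE0B"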

It should be noted that this capability is supported by all of the node subcommands previously supported by the Hanlon CLI, including display of detailed information about (or the field values for) a specific node. The only difference is that you can now use the Hardware ID (which will typically map to the SMBIOS UUID of a node, a unique string assigned to the node by the manufacturer) to identify the node you are interested in. Previously, you could only use the UUID assigned to that node by Hanlon during the node registration process (a value that can change over time) when executing these same commands. Giving users the ability to obtain this same information using the Hardware ID should make it much easier to gather the node-related information you need to manage your environment using (external) automation systems.

It should also be noted that this same capability has been added to the active_model slice, providing users with the ability to search for the active_model associated with a given node based on that node’s ‘Hardware ID’. An example of this sort of command from the CLI would be something like the following:

$ hanlon active_model --hw_id 564DC8E3-22AC-0D46-6001-50B003AECE0B

and the corresponding RESTful operation would look like this:

GET /hanlon/v1/active_model?uuid=564DC8E3-22AC-0D46-6001-50B003AECE0B

As was the case with the changes made to the node slice, adding the ability to search for an active_model instance based on the Hardware ID associated with a given node should make it much simpler for external systems to use the Hanlon API to determine which active_model instance (if any) is bound to a given node. This, in turn, should simplify automated handling of nodes throughout their lifecycle.

Separation of the Hanlon client and server

Another big change in this version of Hanlon is that the client and server are now completely decoupled from each other. Previously, because the client directly executed some server-side code (rather than relying on a RESTful request) and made use of server-side configuration information, it was not possible to run the Hanlon CLI on a machine truly remote from the one running the Hanlon server. With the changes in this release, such remote execution is now possible. While most of the changes necessary to accomplish this were behind the scenes (and, as such, shouldn’t be apparent to the end user), there were some significant changes to how the Hanlon configuration is managed that users will have to concern themselves with. We will discuss those changes here.

First (and foremost), changes were made to truly separate the Hanlon configuration into two separate files: the client and server configurations. These files are the cli/config/hanlon_client.conf and web/config/hanlon_server.conf files, respectively (all file paths in this posting are relative to the location where Hanlon was installed on your system). Examples of these two files are shown here; first, the client configuration:

$ cat cli/config/hanlon_client.conf
#
# This file is the main configuration for ProjectHanlon
#
# -- this was system generated --
#
#
--- !ruby/object:ProjectHanlon::Config::Client
noun: config
admin_port: 8025
api_port: 8036
api_version: v1
base_path: /hanlon
hanlon_log_level: Logger::ERROR
hanlon_server: 192.168.78.2
http_timeout: 90
$ 

As you can see, this client configuration has been reduced to just the minimal set of parameters necessary for the CLI to do its job (all dependencies on the underlying server configuration parameters have been removed unless they were absolutely necessary). The server configuration file is much more complete. An example of that configuration (which is much closer in form to the original Razor configuration file) is shown here:

$ cat web/config/hanlon_server.conf
#
# This file is the main configuration for ProjectHanlon
#
# -- this was system generated --
#
#
--- !ruby/object:ProjectHanlon::Config::Server
noun: config
admin_port: 8025
api_port: 8036
api_version: v1
base_path: /hanlon
daemon_min_cycle_time: 30
force_mk_uuid: ''
hanlon_log_level: Logger::ERROR
hanlon_server: 192.168.78.2
image_path: /mnt/hgfs/Hanlon/image
ipmi_password: junk2
ipmi_username: test2
ipmi_utility: freeipmi
mk_checkin_interval: 30
mk_checkin_skew: 5
mk_gem_mirror: http://localhost:2158/gem-mirror
mk_gemlist_uri: /gems/gem.list
mk_kmod_install_list_uri: /kmod-install-list
mk_log_level: Logger::ERROR
mk_tce_install_list_uri: /tce-install-list
mk_tce_mirror: http://localhost:2157/tinycorelinux
node_expire_timeout: 180
persist_host: 127.0.0.1
persist_mode: :mongo
persist_password: ''
persist_port: 27017
persist_timeout: 10
persist_username: ''
register_timeout: 120
rz_mk_boot_debug_level: Logger::ERROR
rz_mk_boot_kernel_args: ''
sui_allow_access: 'true'
sui_mount_path: /docs
$ 

Note that it is this configuration file that contains all of the sensitive information that end users shouldn’t need to concern themselves with (where the server is persisting its data and the username and password used by the persistence layer, for example). By separating these parameters out into their own configuration file, we can ensure that the sensitive information in the server configuration is properly protected, while still providing the ability to connect to the Hanlon server from a remote location (using either the CLI or the RESTful API).

As an aside, we also added a few new configuration parameters to these two files that didn’t appear in previous releases of Hanlon (or Razor). Specifically, we added an http_timeout parameter to the client configuration file, which controls how long the CLI will wait for a response from the RESTful API before timing out (something that is quite useful to have control over when uploading large ISOs through the image slice). This value defaults to 60 seconds (the default for HTTP requests in Ruby). We also added two new server-side configuration parameters:

  • a new ipmi_utility parameter, which controls which IPMI utility should be used to query for and control the power state of a node (more on that later in this posting), and
  • a new persist_dbname parameter, which controls the name of the database that should be used for persistence by the Hanlon server (a useful parameter to be able to set when running spec tests, for example).

Reasonable default values are set for these two new server-side configuration parameters (an empty string and ‘project_hanlon’ respectively), preserving the existing behavior provided by previous versions of Hanlon (and Razor).
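For reference, explicitly setting these new parameters might look something like the following sketch (the values shown here are illustrative; the http_timeout line belongs in the client configuration file, while the other two belong in the server configuration file):

# in cli/config/hanlon_client.conf -- a longer timeout for large ISO uploads
http_timeout: 300
# in web/config/hanlon_server.conf -- IPMI utility and persistence database name
ipmi_utility: freeipmi
persist_dbname: project_hanlon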

With these changes in place it is now possible to deploy the Hanlon CLI (the cli/hanlon script, its configuration, and all of its dependencies) on a machine remote from the one running the Hanlon server (whether the server is running as a Rackup application under a web server like Puma or directly as a WAR file under a Java servlet container like JBoss or Tomcat). Provided the server allows remote access to the RESTful endpoints used by the CLI, and provided the CLI is configured properly, it should now be possible to use all of the CLI’s functionality in such a remote deployment scenario.
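As a rough sketch of what such a remote deployment might look like (the installation path and hostname used here are purely illustrative), you might copy the CLI directory to another machine and point its configuration at the server:

$ scp -r hanlon-host:/opt/hanlon/cli ./hanlon-cli
$ cd hanlon-cli
$ vi config/hanlon_client.conf     # set hanlon_server to the server's address (192.168.78.2 in the examples above)
$ ./hanlon node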

New models and new model types

In this new version of Hanlon, we have added some interesting new ‘no-op’ models to the framework and have also extended some existing models to provide support for new features during the OS (or hypervisor) deployment process. As such, we felt it would be helpful to users (new and old) if we summarized some of these changes.

Two new ‘no-op’ model types (and corresponding policies)

From the beginning, the only way to add a node to Hanlon (or Razor, for that matter) was to let Hanlon discover that node using the Hanlon (or Razor) Microkernel. If Hanlon knew nothing about a given node, then when that node powered up and network booted, it would boot into the Microkernel, and the Microkernel would then check in and register the node with Hanlon. Hanlon could then make a policy-based decision as to which model (if any) should be bound to that node based on its hardware profile.

Unfortunately, that made it rather difficult to build an inventory of nodes using Hanlon (and the Hanlon Microkernel) in two scenarios that are quite common in large datacenters:

  • if you didn’t already know what sort of operating system or hypervisor you wanted to provision to a node during the node discovery process, or
  • if the node had already been provisioned with an existing operating system (or hypervisor) and you didn’t want to overwrite that OS/Hypervisor instance with something new

In either of those two scenarios, since an active_model instance was never bound to such nodes (because an operating system or hypervisor was not deployed onto them by Hanlon), any information gathered about those nodes by the Microkernel would simply disappear from Hanlon shortly after they were powered off (and the Microkernel that they were booted into stopped checking in with Hanlon). In an attempt to resolve this issue, there have been suggestions over the past couple of years that perhaps we should come up with a way of ‘manually’ adding nodes to Hanlon (or Razor) to cover these sorts of scenarios, but we felt that this didn’t fit well into the philosophy behind Hanlon (we try to keep everything automated and policy-driven so that it can scale as easily as possible to thousands, or tens of thousands, of nodes). How then could we support adding these sorts of nodes to Hanlon?

The answer, as it turns out, was quite simple. We just added two new ‘no-op’ models (and two corresponding policy types) to the list of models supported by Hanlon. Those two new models (the discover_only and boot_local models) are best described as follows:

  • discover_only — when a model of this type is bound to a node, the node will boot into the Microkernel (every time) whenever the node is (re)booted; this has the effect of allowing for updates to the node inventory in Hanlon (by powering on nodes bound to this type of model), since the Microkernel these nodes are booted into will check in with Hanlon, register any new facts that it might find during the boot/checkin process, and then power off again.
  • boot_local — when a model of this type is bound to a node, the node in question will boot locally (every time) in response to any (re)boot; no changes are made to the underlying node, but it is added to the inventory of nodes maintained by Hanlon (and that information will be preserved until the boot_local model that was bound to the node is removed).

Keep in mind that before either of these model types is bound to a node, the node will have booted into the Microkernel, and the Microkernel will have checked in and registered with Hanlon. Binding either of these models to a node thus ensures that the node in question has been added to the inventory of nodes maintained by Hanlon. The only difference is how those nodes behave on subsequent (re)boots (as outlined above).
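As a quick sketch, creating one of these no-op models from the CLI should follow the same pattern as the ‘hanlon model add’ command shown later in this post; the label used here is purely illustrative, and (since nothing is actually deployed) no image or answers file should be needed:

$ hanlon model add -t discover_only -l discover-nodes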

Support for a new set of ‘optional parameters’ that can be defined for models of a given type

Previous releases of Hanlon supported the concept of a ‘required metadata hash’ for use in gathering (and storing) any metadata specific to a given model type during the model creation process. For models like the ‘redhat_6’ model, this metadata is quite simple (consisting of the root password, node name prefix, and domain name to use during deployment of the OS instance to a node), while for other models (like the ‘vmware_esxi_5’ model) this metadata could get quite involved. Moreover, over the years there were several requests from members of the community for a mechanism to specify additional metadata parameters for some model instances of a given type but not for others. For example, with a ‘redhat_6’ model one user might want to specify a partitioning plan (complete with partitions, volume groups, and logical volumes) to be created, or an additional package group to be installed, during the node provisioning process, while other users might not want to specify any of these ‘optional metadata parameters’ (preferring a simpler deployment). Such optional metadata parameters were not supported in previous versions of Hanlon (or Razor), since users were asked for a value for every field added to the set of required metadata parameters (and that was the only mechanism provided for specifying model-specific parameters to be used during the OS/hypervisor provisioning process).

This version of Hanlon changes all of that. It is now possible to define a set of ‘optional metadata parameters’ for which a user can provide values when constructing a new model instance (how they do so is covered later in this post). If values for these ‘optional parameters’ are not provided, they are simply left unassigned (unlike the ‘required parameters’, which are always gathered from the user when a model instance is being created and are always assigned a value, even if it is a default value). If, on the other hand, values are provided for these parameters when creating a given model instance, those values can be used to add additional features to the OS (or hypervisor) instances deployed to any nodes bound to that model instance.

Currently, we only support the use of these parameters in the ‘vmware_esxi_5’ model (it’s how we allow for installation of additional VIBs, or for the creation of a VSAN using a set of ESXi 5.x nodes), but we have no doubt that this new feature will quickly be added to other models. The set of optional parameters supported by models of a given type is still constrained by what is defined in the code for those models, but this provides a nice compromise between flexibility (in terms of what features can be enabled during the OS provisioning process for a given model type) and constraint (giving us the ability to keep the number of model templates to a minimum and keep the process for creating new model instances as simple as possible).

I’ll leave it to Joe Callen (@jcpowermac), who did the vast majority of this work, to explain how he is using these new features to enable the creation of VSANs and the installation of additional VIBs while deploying ESXi using the ‘vmware_esxi_5’ model under this new version of Hanlon. He has a nice blog posting of his own that explains the details, here. This is a new and exciting area of development in the Hanlon codebase, and one that we feel will lead to additional model development (as other models are added to or extended by the community).

Support for use of an ‘Answers File’ when creating new models

The second part of adding the ‘optional parameters’ described above involves how to actually provide values for the optional parameters a user wants to specify when creating a new model. Rather than try to walk the user through some sort of interactive dialog to collect values for these optional parameters (something we found to be confusing, at best), we decided to combine this requirement with a previous enhancement request from another Hanlon user and collect these values using an external ‘Answers File’. That file can be used during the model creation/update process to provide values both for these optional parameters and for the required metadata hash parameters that must also be supplied by the user during the model creation process.

To accomplish this, a new command-line flag was added to the Hanlon model slice’s CLI (the --option flag, which can be shortened to -o for convenience) that takes a single argument: the name of the YAML file containing the answers the user wants to provide. Any optional parameters not included in that answers file are assumed to be blank (not specified), but if the user leaves any of the required metadata hash parameters for a given model out of that answers file, an interactive session will be started on the command line to collect those unspecified required parameters (they are required, after all). The end result is a system that gives users a great deal of flexibility when creating an answers file. They could create a very generic file and ‘fill in’ the instance-specific required metadata parameters interactively; or, if they are trying to drive the model creation process through an external tool of some sort, they could provide an answers file that is very specific to a given model (one that specifies all of the parameters needed to create a given model instance) in order to avoid the need to provide answers interactively. Of course, the old style of providing answers interactively via the CLI is still supported, but only required metadata parameters can be specified this way (not optional ones).

As an example, here’s what the new ‘hanlon model add’ command might look like:

$ hanlon model add -t vmware_esxi_5 -l test-esxi -i 6ZK -o esx-model.yaml

where the esx-model.yaml file looks like this:

$ cat esx-model.yaml
root_password: "test1234"
ip_range_network: "10.53.252"
esx_license: "AAAAA-BBBBB-CCCCC-DDDDD-EEEEE"
ip_range_subnet: "255.255.255.0"
ip_range_start: "50"
ip_range_end: "60"
hostname_prefix: "esxi-node"
nameserver: "10.53.252.123"
ntpserver: "10.53.252.246"
vcenter_name: "foovc"
vcenter_datacenter_path: "dc1"
vcenter_cluster_path: "cluster"
packages:
- {url: "http://foo.org/foo1.vib", force: false }
- {url: "http://foo.org/foo2.vib", force: true }
$ 

Note that the YAML file shown above contains a mix of required metadata hash parameters (like the ‘ip_range_network’ and ‘hostname_prefix’ parameters) and optional metadata hash parameters (like the ‘vcenter_name’ and ‘vcenter_datacenter_path’ parameters). Since values are provided for all of the required metadata hash parameters in this answers file, the user would not be asked for any additional information when using it to create a new ‘vmware_esxi_5’ model instance.

Added BMC support

This new release of Hanlon also provides users with the ability to query the power state of a node or to control the power state of a node using the Hanlon node slice (either via the CLI or through the RESTful API). From the CLI, this new functionality is provided via the new ‘--bmc’ command-line flag (which can be abbreviated using the shorter ‘-b’ form if you wish to do so). To obtain the power state of a node, simply include that flag as part of a ‘hanlon node’ (get) command, for example:

$ hanlon node -i 564DC8E3-22AC-0D46-6001-50B003AECE0B -b -u test -p junk

or

$ hanlon node 52GX2NDEBiTY47IqTbsjMu --bmc -u test -p junk

which correspond to the following RESTful operations:

GET /hanlon/v1/node/power?ipmi_username=test&ipmi_password=junk&hw_id=564DC8E3-22AC-0D46-6001-50B003AECE0B

or

GET /hanlon/v1/node/52GX2NDEBiTY47IqTbsjMu/power?ipmi_username=test&ipmi_password=junk

Notice that you can include an IPMI username and/or password directly on the command line. Alternatively, you can specify the values to use for these parameters in the Hanlon server configuration file (by assigning values to the ipmi_username and ipmi_password fields in that file). If you provide values for these two fields in the server configuration file and also specify values for them when invoking this functionality via the CLI (or in the body/query parameters of the corresponding RESTful API call), the values from the CLI override those provided in the server configuration file (giving you the ability to use a different username and password for each BMC in your network if you are so inclined, or the same username and password for every BMC if you are not).
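A minimal curl sketch of the first of the two GET operations above, assuming the server address and API port from the example configuration shown earlier, might look like this:

$ curl "http://192.168.78.2:8036/hanlon/v1/node/power?ipmi_username=test&ipmi_password=junk&hw_id=564DC8E3-22AC-0D46-6001-50B003AECE0B"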

So far we’ve described how to get the current power state of a given node using the node slice. There is also a corresponding set of commands that can be used to (re)set the power state of a node. To power a node on, for example, you would run a command that looks like one of these two (if you were using the Hanlon CLI):

$ hanlon node -i 564DC8E3-22AC-0D46-6001-50B003AECE0B --bmc on -u test -p junk

or

$ hanlon node update 52GX2NDEBiTY47IqTbsjMu --bmc on -u test -p junk

which would correspond to the following pair of RESTful operations:

POST /hanlon/v1/node/power

or

POST /hanlon/v1/node/52GX2NDEBiTY47IqTbsjMu/power

Since these are POST commands, the new power-state, IPMI username, IPMI password, and Hardware ID (if necessary) are all specified as fields in the JSON string that makes up the body of the request. For the first request (where you want to change the power-state of a node with a given Hardware ID) that body would look like this:

{"power_command":"on", "ipmi_username":"test", "ipmi_password":"junk", "hw_id":"564DC8E3-22AC-0D46-6001-50B003AECE0B"}

while for the second (where the node is identified by UUID, not Hardware ID) the body would look like this:

{"power_command":"on", "ipmi_username":"test", "ipmi_password":"junk"}

It should be noted here that for this functionality in the node slice, an update command from the CLI corresponds to a RESTful POST operation, not a PUT operation. This differs from the update command for the other slices in the Hanlon CLI (which map to a PUT, not a POST), but it was felt that this was the right mapping to make. The reason behind this choice is that a PUT operation is assumed to be idempotent in a RESTful interface (which is true for the update commands supported by the other slices in Hanlon), but for the node slice, the update command (which updates the power state of a given node) is not an operation that we can assume to be idempotent.

It should also be noted that for this functionality to work, not only does the node in question need a Baseboard Management Controller, you must also have discovered that node using a relatively new (v2.0.0 or later) version of the Hanlon Microkernel (older versions will not report the facts necessary to map a given node to its BMC), and you’ll have to have one of the two recognized IPMI utilities (ipmitool or freeipmi) installed locally on the Hanlon server node. Without one of these two utilities available on the Hanlon server node, an error will be thrown if you try to execute one of these commands (to determine the power status of a node or change its power state).

Finally, there are a limited number of power states supported when updating the power state of a node via the node slice. The complete list is as follows: ‘on’, ‘off’, ‘reset’, ‘cycle’, or ‘softShutdown’. Attempting to use an unrecognized state will result in an error being thrown by the node slice’s RESTful API. Attempting to transition a node into an incompatible state (attempting a ‘softShutdown’ on a node that is already powered off, for example) will likely also result in an error.
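For example, a soft shutdown of a node identified by its Hanlon UUID would presumably follow the same pattern as the power-on command shown above:

$ hanlon node update 52GX2NDEBiTY47IqTbsjMu --bmc softShutdown -u test -p junk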

In closing

After several weeks of intense development to add the functionality needed to support a NextGen datacenter lab environment we are managing with Hanlon, we feel that the changes we’ve made are ready for prime time. I’d like to thank several individuals who made it all possible, specifically three of my CSC colleagues who have put in long hours on this project over the last month or two:

  • Joe Callen (@jcpowermac)
  • Russell Callen (@mtnbikenc) and
  • Sankar Vema (@sankarvema)

Without their tireless work, we would not have nearly as polished a product as you see here. In addition, I’d like to thank the following community members for their help in patching a few of the holes that we found in the previous release, specifically:

  • Cody Bunch (@bunchc), from Rackspace
  • JJ Asghar (@jjasghar), from Chef and
  • Seth Thomas (@cheeseplus), also from Chef

Their contributions, while smaller, are no less significant. Thanks again to everyone who made this release possible, and we look forward to building out this community of developers further.
