A closer look at Runecast

Last week, I had the pleasure of catching up with a new startup called Runecast. These guys are doing something that is very close to my heart. As systems become more and more complex, and with fewer people taking on more responsibility, highlighting potential issues and providing descriptive guidance on resolving them is now critical. This is something that resonates in the world of HCI (hyper-converged infrastructure), where the vSphere administrator may also be the storage administrator, and perhaps the network administrator too. This is where Runecast come in. Using a myriad of resources, such as VMware's Knowledgebase system, the Security Hardening Guide, various Best Practices, and other assorted information, Runecast can monitor your vSphere infrastructure and bring to your attention the need for remediation. This could be because something in the logs matched an issue reported in a KB, or new hosts have been added to a cluster which have not been security hardened, or because VMware has released a new patch or update that is relevant to your environment.

Setup

The Runecast product comes as a virtual appliance (OVA). Simply deploy it in your infrastructure and connect it to your vCenter Servers. The appliance needs 2 vCPUs and 6GB RAM. The latest appliance, version 1.5 (which, incidentally, was released today), has 2 x 40GB VMDKs. This is primarily to facilitate a new feature in v1.5 which allows the appliance to gather logs and provide reporting on multiple vCenter Servers. Once logged in, the appliance can be set up to monitor ESXi hosts as well as VMs. For ESXi hosts, the syslog is redirected to the appliance, and the appropriate firewall rules are configured. Virtual Machine log output can also be redirected to the appliance. This is done by adding an entry to the VM's .vmx file, and the VM then needs to be power cycled or migrated for the update to take effect. The appliance only needs internet access to download new updates; however, updates can be provided in other ways for sites that do not have access to the outside world. This diagram provides a basic overview of the architecture.
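For reference, redirecting a VM's log output to the host syslog (which is in turn forwarded to the appliance) is done with advanced settings along these lines in the .vmx file. This is an illustrative sketch based on VMware's documented vmx.log advanced settings; Runecast may set different keys, and the ID value here is a placeholder:

```
vmx.log.destination = "syslog-and-disk"
vmx.log.syslogID = "my-vm-name"
```

The first setting sends the VM's vmx log lines to the host's syslog in addition to vmware.log; the second tags the entries so the VM can be identified in the combined log stream. As noted above, the VM must be power cycled or migrated before the change takes effect.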

We were told that Runecast are currently shipping new updates every 2 weeks on average, but if there is a critical update from VMware, they will push it to their users more quickly.

Demo

I must say that the demo was very intuitive. After about 30 minutes, I felt like I would be able to drive this very easily myself. Stanimir Markov demonstrated the product to us, and during the demo we saw examples of issues related to security hardening being highlighted, missing best practices, as well as alerts being generated because the log analysis caught something that was highlighted in a VMware Knowledgebase article. I then deployed it myself in my own lab, and had it running in a matter of minutes. Probably the easiest way to get a feel for it is via some of the screenshots that I took. Here is a nice one which highlights whether a bunch of different best practice checks have passed or failed, and also how many objects in the inventory are impacted by each check.

This is another nice one – KBs discovered. This highlights whether or not the criteria in a certain Knowledgebase article are applicable in this environment, and again how many objects are affected. Not only that, but each of those objects can then be queried to get even further detail, such as the ESXi host below. You'll also notice that each of the alerts/warnings comes with a severity level, which the Runecast team determine based on the impact the "known issue" can have on your environment. As you can see below, this one could cause a PSOD (Purple Screen of Death), so it is categorized as "Critical" from an availability perspective.

The other nice feature is that you can read the KB articles via the appliance interface. There is no need to connect to the VMware KB site to review the content. Very useful again for those sites that do not have internet access.

I’ll just add one more screenshot that I thought was interesting. This was from the security hardening view. I know that this is a very important category for a lot of customers, but it requires a lot of due diligence to make sure it is implemented correctly. This is especially true in HCI, where you might be regularly scaling out the HCI system by adding new hosts to the cluster. Manually making sure all the security hardening is in place can be tedious. With Runecast, you can verify that the security hardening changes have indeed been implemented:

While I haven’t been able to touch on all aspects of the Runecast interface in this post, I was very impressed by its simplicity and ease of use. Compared to a lot of other interfaces, it seemed very intuitive, with no steep learning curve. Other items that impressed me were the ability to get an inventory view and see how many alerts are associated with each host, VM, datastore, network, etc. I also liked the filtering mechanism, where some alerts can be ignored temporarily or permanently, perhaps during a planned maintenance period.

One limitation is around remote alerting. Right now this is only available via email, but the Runecast team are working on additional notification mechanisms, such as SNMP traps and webhooks for applications such as Slack. This is feedback that they have heard from many of their customers.

About Runecast

Runecast are currently up at 10 full-time employees, with another 4 part-time employees. The majority of the development work is carried out in the Czech Republic, and they have a presence in many other countries. Runecast offer a free 30-day trial of their product, and I also believe that VMware vExperts have access to an NFR license. Licensing is based on an annual subscription, which I understood to be $250 per CPU per year. Runecast Analyzer can be downloaded here.

I must say that I was impressed by this product. Like I said at the beginning of this post, as HCI becomes more prevalent, the onus will be on fewer people to manage more of the infrastructure. Those people are invariably the vSphere admins, and tooling will be critical in reducing troubleshooting time. The next step will not just be proactive highlighting of potential issues, but prescriptive guidance and remediation in the event of a failure.

Runecast are participating in a number of VMware User Group meetings globally this year, and they will also be at VMworld. Go check them out if you see them.

The post A closer look at Runecast appeared first on CormacHogan.com.

2-node vSAN topologies review

There has been a lot of discussion in the past around supported topologies for 2-node vSAN, specifically around where we can host the witness. Now my good pal Duncan has already highlighted some of this in his blog post here, but the questions continue to come up about where I can, and where I cannot, place the witness for a 2-node vSAN deployment. I also want to highlight that many of these configuration considerations are covered by our official documentation. For example, there is the very comprehensive VMware Virtual SAN 6.2 for Remote Office and Branch Office Deployment Reference Architecture, which talks about hosting the witness back in a primary data center, as well as another Reference Architecture document which covers Running VMware vSAN Witness Appliance in VMware vCloud Air. So considering all of the above, let’s look at some topologies that are supported with 2-node vSAN deployments, and which ones are not:

Witness running in the main DC

In this first example, we fully support having the witness (W) run remotely on another vSphere site, such as back in your primary datacenter. This is covered in detail in the VMware Virtual SAN 6.2 for Remote Office and Branch Office Deployment Reference Architecture mentioned earlier.

Witness running in vCloud Air

In this next example, we fully support having the witness (W1) run remotely in vCloud Air. This is covered in detail in the Running VMware vSAN Witness Appliance in VMware vCloud Air Reference Architecture mentioned earlier.

Witness running on another standard vSAN deployment

Now this one is interesting. A common question is whether or not one can run the witness (W) on a vSAN deployment back on the main DC. The answer is yes, this is fully supported. The crux of the matter, as stated by the vSAN Lead Engineer Christian Dickmann, is that “We support any vSphere to run the witness that has independent failure properties”. So in other words, any failure on the 2-node vSAN at the remote site will not impact the availability of the standard vSAN environment at the main DC.

Witness running on another 2-node vSAN deployment, and vice-versa

This final configuration is the one which Duncan has described in detail on his post, so I won’t go into it too much. Suffice to say that this configuration breaks the guidance around “We support any vSphere to run the witness that has independent failure properties.” In this case there is an inter-dependency between the 2-node vSAN deployments at each of the remote sites, as each site hosts the witness of the other 2-node deployment (W1 is the witness for the 2-node vSAN deployment at remote site 1, and W2 is the witness for the 2-node vSAN deployment at remote site 2). Thus if one site has a failure, it impacts the availability of the other site. [Update] As of March 16th, 2017, VMware has changed its stance on this configuration. We will now support it through our RPQ process. There are several constraints with this deployment, and customers need to fully understand and agree to those for us to approve the RPQ. So this configuration is now not recommended, but supported via RPQ.

Hope this helps clarify the support around the different 2-node topologies, especially for witness placement.

Licensing

There is one final topic that I wish to bring up with 2-node + witness deployments, and that is around licensing. Note that even though the witness is an appliance, it is an ESXi host running in a VM. And although we supply a license with the appliance, it will still consume a license in vCenter when it comes to management. For example, say you deploy a 2-node vSAN. The 2-node vSAN will need 2 ESXi hosts at the remote site, but there may be a 3rd physical server that could be used for hosting vCenter as well as the witness appliance. If you are using a vSphere Essentials license, you will not be able to add the witness appliance, as vSphere Essentials can only manage 3 hosts. There is some discussion about this internally at VMware at the moment, but as of right now, this is a restriction that you may encounter with vSphere Essentials.

The post 2-node vSAN topologies review appeared first on CormacHogan.com.

vSphere 6.5 p01 – Important patch for users of Automated UNMAP

VMware has just announced the release of vSphere 6.5 p01 (Patch ESXi-6.5.0-20170304001-standard). While there are a number of different issues addressed in the patch, there is one in particular that I wanted to bring to your attention. Automated UNMAP is a feature that we introduced in vSphere 6.5. This patch contains a fix for some odd behaviour seen with the new Automated UNMAP feature. The issue has only been observed with certain guest operating systems, certain filesystems, and certain block size formats. KB article 2148987 for the patch describes it as follows:

Tools in guest operating system might send unmap requests that are not aligned to the VMFS unmap granularity. Such requests are not passed to the storage array for space reclamation. In result, you might not be able to free space on the storage array.

It would seem that when a Windows NTFS filesystem is formatted with 4KB blocks, Automated UNMAP is not working. However, if the NTFS is formatted with a larger block size, say 32KB or 64KB, then Automated UNMAP works just fine. After investigating this internally, the issue seems to be related to the alignment of the UNMAP requests that the Guest OS is sending down. These have start offsets which are not aligned on the required 1MB boundary, which is a requirement for Automated UNMAP to work. For VMFS to process the UNMAP, the requests have to arrive 1MB aligned, and in 1MB multiples. Even though the NTFS partition in the Guest OS is aligned correctly, the UNMAP requests are not aligned, so we cannot do anything with them.

Our engineering team also made the observation that when some of the filesystem internal files grow to a certain size, the starting clusters which are available for allocation are not aligned on 1MB boundaries. When subsequent file truncate/trim requests come in, the corresponding UNMAP requests are not aligned properly.

While investigations continue into why NTFS is behaving this way, we have provided an interim solution in vSphere 6.5 p01. Now when a Guest OS sends an UNMAP request, and the starting block or ending block offset is unaligned to the configured UNMAP granularity, VMFS will now UNMAP as many of the 1MB blocks in the request as possible, and zero out the misaligned ones (which should only be the misaligned beginning of the UNMAP request, or the misaligned end of the UNMAP request, or both).
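To make the interim fix concrete, here is a small sketch of the splitting logic described above. This is illustrative Python with made-up names, not VMFS code; the only details taken from the text are the 1MB granularity and the zero-instead-of-unmap handling of the misaligned head and tail:

```python
# Sketch: split a guest UNMAP request (byte offsets) into the aligned middle
# that can be unmapped and the misaligned head/tail ranges to zero instead.
MB = 1024 * 1024

def split_unmap(start, length, granularity=MB):
    """Return ((aligned_start, aligned_length) or None, [(zero_start, zero_length), ...])."""
    end = start + length
    aligned_start = -(-start // granularity) * granularity  # round start up
    aligned_end = (end // granularity) * granularity        # round end down
    if aligned_end <= aligned_start:
        # Request spans no whole aligned block: the entire range is zeroed.
        return None, [(start, length)]
    zero = []
    if start < aligned_start:
        zero.append((start, aligned_start - start))    # misaligned head
    if aligned_end < end:
        zero.append((aligned_end, end - aligned_end))  # misaligned tail
    return (aligned_start, aligned_end - aligned_start), zero

# A request starting 4KB past a 1MB boundary, 3MB long: the middle 2MB can be
# unmapped, while the 4KB head and 4KB tail are zeroed.
unmap, zeroed = split_unmap(4096, 3 * MB)
```

With an aligned request the zero list is empty and the whole range is unmapped, which matches the behaviour described for correctly aligned guests.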

If testing this for yourself, you can use something like the “optimize drive” utility on Windows to send SCSI UNMAP commands to reclaim storage, e.g.

defrag.exe /O [/G] E:

Note that /G is not supported on some Windows versions. On Linux, tools like fstrim or sg_unmap can be used, e.g.

# sg_unmap -l 4097 -n 40960 -v /dev/sdb
 unmap cdb: 42 00 00 00 00 00 00 00 18 00

The post vSphere 6.5 p01 – Important patch for users of Automated UNMAP appeared first on CormacHogan.com.

Cool Tool – vCheck Daily Report for NSX

vCheck is a PowerShell HTML framework script. It is designed to run as a scheduled task before you get into the office, presenting key information via an email directly to your inbox in a nice, easily readable format.

The script picks up key known issues and potential issues, scripted as plugins for various technologies written as PowerShell scripts, and reports them all in one place, so all you do in the morning is check your email.


One of the key things about this report is that if there is no issue in a particular area, that section is omitted from the email. For example, if there are no datastores with less than 5% free space (configurable), the disk space section in the virtual infrastructure version of this script will not show in the email. This ensures that you have only the information you need in front of you when you get into the office.
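As an illustration of that "only report problems" behaviour, the section-filtering logic could be sketched like this. This is a Python sketch with made-up names and data, not vCheck's actual code (the real plugins are PowerShell); the 5% free-space threshold is the configurable example from the text:

```python
# Sketch: build a report containing only the sections that have issues, so
# healthy areas (e.g. datastores with plenty of free space) are omitted.
def build_report(datastores, free_pct_threshold=5):
    """Return a dict of section name -> offending items; empty if all is well."""
    sections = {}
    low = [ds for ds in datastores
           if 100.0 * ds["free_gb"] / ds["capacity_gb"] < free_pct_threshold]
    if low:  # the section only appears when something crosses the threshold
        sections["Datastores below free-space threshold"] = low
    return sections

report = build_report([
    {"name": "ds01", "capacity_gb": 1000, "free_gb": 20},   # 2% free: flagged
    {"name": "ds02", "capacity_gb": 1000, "free_gb": 300},  # 30% free: omitted
])
```

An empty report here corresponds to an email with that section missing entirely, which is the behaviour described above.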

This script is not to be confused with an audit script, although the reporting framework can also be used for auditing scripts too. I don't want to remind you each and every day that you have 5 hosts, what their names are, and how many CPUs they have, as you don't want to read that kind of information unless you need it. This script will only tell you about problem areas in your infrastructure.

Intel Optane support for vSAN, first HCI solution to deliver it

I am in Australia this week for the Sydney and Melbourne VMUG UserCons. I had a bunch of meetings yesterday, and this morning the news dropped that Intel Optane support was released for vSAN. The performance claims look great: 2.5x more IOPS and 2.5x lower latency. (I don’t know the test specifics yet.) On top of that, Optane typically has a higher endurance rating, meaning that the device can incur a lot more writes, which makes it an ideal device for the vSAN caching layer.

While talking to customers the past couple of days, though, it was clear to me that performance is one thing, but flexibility of configuration is much more important. With vSAN you have the ability to select any server from the vSphere HCL and pick the components you want, as long as they are on the vSAN HCL. Or you can simply pick a ready node and swap components as needed; as long as the controller remains the same for a ready node, you can do that. Either way, you have choice, and now with Optane being certified you can use the latest in flash technology with vSAN!

Oh for those paying attention, the Intel P4800X Optane device isn’t listed on the HCL yet. The database is being updated as we speak, and the device should be included soon!

"Intel Optane support for vSAN, first HCI solution to deliver it" originally appeared on Yellow-Bricks.com. Follow me on twitter - @DuncanYB.


Flash ROMs with a Raspberry Pi

I previously wrote a series of articles about my experience flashing a ThinkPad X60 laptop with Libreboot. After that, the Libreboot project expanded its hardware support to include the ThinkPad X200 series, so I decided to upgrade. The main challenge with switching over to the X200 was that unlike the X60, you can't perform the initial Libreboot flash with software.

Multi-Factor Authentication for the Hybrid Configuration Wizard and Remote PowerShell

You can now use an Administrator account that is enabled for Multi-Factor Authentication to sign in to Exchange Online PowerShell and the Office 365 Hybrid Configuration Wizard (HCW).

In case you are not aware, Azure Multi-Factor Authentication is a method of verifying who you are that requires more than just a username and password. With MFA for Office 365, users are required to acknowledge a phone call, text message, or app notification on their smartphones after correctly entering their passwords. They can sign in only after this second authentication factor has been satisfied. You can read more about the Office 365 Multi-Factor Authentication option here.

Many Exchange Online customers wanted the extra level of security that is offered with Multi-Factor Authentication, which allows you to force the administrator account to use Multi-Factor Authentication. However, because of a limitation in Remote PowerShell, Exchange Online administrators could not connect with a Multi-Factor enabled account. In addition, as the Office 365 Hybrid Wizard also requires Remote PowerShell connections to Exchange Online, prior to now, the account you used to run the HCW could not be enabled for Multi-Factor Authentication.

The Exchange Online PowerShell Module

There is a new module that can be downloaded to allow you to connect with an account that is enabled for Multi-Factor Authentication. You can download the module from the Exchange Online Administration Center (the steps are outlined in this article).


Note: We do not plan to discontinue traditional methods of connecting to Remote PowerShell; if you are not using Multi-Factor Authentication you can continue to connect using the methods you already have in place.

The Hybrid Wizard Update

The Hybrid Wizard has also been updated to allow for Multi-Factor Authentication enabled administrators to authenticate.

Note: There is an issue with this new authentication method in the 21 Vianet Greater China tenants. Customers with tenants in that region cannot use the MFA module or Hybrid integration mentioned in this article, and should instead use the Hybrid Wizard located here: http://aka.ms/HCWCN

In order to keep the sign-in experience consistent for all customers, whether they have MFA enabled or are using traditional credentials, we have updated the credentials page in the wizard.

On the Credential page of the wizard, you will see that the “next” button is not available. You are required to pick your credentials for on-premises (which by default will be the currently signed-in credentials) and “sign in” to Office 365.


Once you select “sign in” you will be prompted for credentials in a familiar looking screen.


If you have Multi-Factor Authentication enabled for the administrator, you would then be prompted for the second factor of authentication.


Once verified, you would see the credential card for both the on-premises and Exchange Online administrators. You will also notice that the “next” button is now activated.


Conclusion

Your feedback about not being able to use an MFA-enabled account for Exchange Online administration was loud and clear! Please keep providing us feedback so we can continue to identify and address your needs.

The Exchange Team