Monday, 8 February 2021

How to delete aged third-party snapshots on Nutanix AHV

You will receive the following error message in Prism Element:

System has x aged third-party backup snapshot(s) and they may unnecessarily consume storage space in the cluster.

The problem occurs when snapshots are not removed successfully by the backup software.
You have to remove them with a script from one of the CVMs:

change directory:
cd /home/nutanix/bin

download the latest script:

list snapshots:
python --list_all admin

delete snapshots (in my case, those older than 7 days):
python --delete_snapshot_older_than 7 admin

If you get an error when running the script, check that:
-> the admin password does not use special characters
-> the current directory is /home/nutanix/bin

Monday, 7 December 2020

Set up IPMI so that incorrect login attempts do not lock it

To show the configuration (from any CVM):

hostssh "./ipmitool lan print 1"

Apply the setting so that IPMI does not get locked:

hostssh "./ipmitool lan set 1 bad_pass_thresh 0 0 0 0"

Thursday, 26 November 2020

Uninstall NVIDIA driver package from AHV

You can remove the NVIDIA driver from AHV with the following command:

yum remove NVIDIA-vGPU-ahv-2019-440.121.x86_64

Replace NVIDIA-vGPU-ahv-2019-440.121.x86_64 with the driver package currently installed on your host.

Monday, 30 March 2020

vCenter Repoint Domain for VCSA with embedded PSC > 6.7U1

You can repoint an existing vCenter with an embedded PSC to another vCenter so that both use the same vsphere.local domain. Run this command on the vCenter you want to repoint (not on the "target" vCenter).

cmsso-util domain-repoint -m execute --src-emb-admin Administrator --replication-partner-fqdn vCenter1.domain.intern --replication-partner-admin Administrator --dest-domain-name vsphere.local

Enter Source embedded vCenter Server Admin Password :
Enter Replication partner Platform Services Controller Admin Password :

The domain-repoint operation will export License, Tags, Authorization data
before repoint and import after repoint.

WARNING: Global Permissions for the source vCenter Server system will be lost. The
         administrator for the target domain must add global permissions manually.
         Source domain users and groups will be lost after the Repoint operation.
         User 'Administrator@vsphere.local' will be assigned administrator role on the
         source vCenter Server system.

         The default resolution mode for Tags and Authorization conflicts is Copy,
         unless overridden in the conflict files generated during pre-check.

         Solutions and plugins registered with vCenter Server must be re-registered.

         Before running the Repoint operation, you should backup all nodes
         including external databases. You can use file based backups to restore in
         case of failure. By using the Repoint tool you agree to take the responsibility
         for creating backups, otherwise you should cancel this operation.

         Starting with vSphere 6.7, VMware announced a simplified vCenter Single Sign-On
         domain architecture by enabling vCenter Enhanced Linked Mode support for
         vCenter Server Appliance installations with an embedded Platform Services
         Controller. You can use the vCenter Server converge utility to change the
         deployment topology from an external Platform Services Controller to an
         embedded Platform Services Controller with support for vCenter Enhanced Linked
         Mode. As of this release, the external Platform Services Controller
         architecture is deprecated and will not be available in future releases. For
         more information, see

         The following license keys are being copied to the target Single Sign-On
         domain. VMware recommends using each license key in only a single domain. See
         "vCenter Server Domain Repoint License Considerations" in the vCenter Server
         Installation and Setup documentation.

Repoint Node Information:
         Source embedded vCenter Server:vCenter2.domain.intern

         Replication partner Platform Services Controller: vCenter1.domain.intern
         Thumbprint: 58:B5:23:A4:F6:4H:BA:7C:07:00:8F:7A:7F:7A:A5:A5:3D:EB:51:C2

All Repoint configuration settings are correct; proceed? [Y|y|N|n]: y

Starting License export                                                         ... Done
Starting Authz Data export                                                      ... Done
Starting Tagging Data export                                                    ... Done
Export Service Data                                                             ... Done
Uninstalling Platform Controller Services                                       ... Done
Stopping all services                                                           ... Done
Updating registry settings                                                      ... Done
Re-installing Platform Controller Services                                      ... Done
Registering Infra services                                                      ... Done
Updating Service configurations                                                 ... Done
Starting License import                                                         ... Done
Starting Authz Data import                                                      ... Done
Starting Tagging Data import                                                    ... Done
Applying target domain CEIP participation preference                            ... Done
Starting all services                                                           ... Done
Repoint successful.

root@vCenter2 [ ~ ]#

Thursday, 5 December 2019

Enable a disabled user account in IPMI using ipmitool on the hypervisor

If users are disabled in IPMI, you can re-enable them with ipmitool on the hypervisor.

List all users:

[root@esxi]# /ipmitool user list

In most cases, user "2" is the default ADMIN user. You can re-enable it with the following command:

[root@esxi]# /ipmitool user enable 2

Tuesday, 12 November 2019

Maintenance Mode for CVM

ncli host list (to get UUID from the host)

ncli host edit id=<insert uuid here> enable-maintenance-mode='false' | 'true'

Use 'true' to enter maintenance mode and 'false' to leave it.

For maintenance mode on AHV and ESXi, see here:

Placing the CVM in maintenance does not migrate VMs from the host. The hypervisor must be placed in maintenance mode to migrate VMs. See the relevant section for your hypervisor below to accomplish this.

First, get the CVM host ID:

nutanix@cvm$ ncli host ls

The host ID consists of the characters to the right of the double colons. In the example below it is "11":

Id                        : xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx::11
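
The extraction described above can be scripted. The following is only a minimal sketch, assuming the Id line looks exactly like the example; the function name host_id_from_line is hypothetical and not part of ncli:

```shell
# Hypothetical helper (not an ncli feature): print the host ID, i.e. the
# text after the last "::" in an "Id : <uuid>::<host_id>" line.
host_id_from_line() {
    case "$1" in
        *::*) printf '%s\n' "${1##*::}" ;;   # strip everything up to the last "::"
        *)    return 1 ;;                    # no double colon, nothing to extract
    esac
}
```

Called with the example line above, the function prints the part after the double colons. The result can then be used as <host_id> in the commands below.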

To place the CVM in maintenance mode:

nutanix@cvm$ ncli host edit id=<host_id> enable-maintenance-mode=true

To exit CVM maintenance mode execute the command below from a CVM that is not in maintenance mode:

nutanix@cvm$ ncli host edit id=<host_id> enable-maintenance-mode=false


To place ESXi host in maintenance mode:

root@esxi# esxcli system maintenanceMode set --enable true

To end ESXi host maintenance mode:

root@esxi# esxcli system maintenanceMode set --enable false

To verify the maintenance mode status of an ESXi host:

root@esxi# esxcli system maintenanceMode get


To get the AHV hypervisor address:

nutanix@cvm$ acli host.list

To enter AHV host maintenance mode:

nutanix@cvm$ acli host.enter_maintenance_mode <hypervisor_address> wait=true

To exit AHV host maintenance mode:

nutanix@cvm$ acli host.exit_maintenance_mode <hypervisor_address>


Error – Foundation service running on one of the nodes

Under normal operation, the Foundation service is stopped on all cluster nodes. Only after you destroy a cluster does the Foundation service start and keep running until you create a new cluster or add the nodes to an existing cluster.

As far as I know, the only other component within AOS that uses it is LCM, which leverages the Foundation service for certain hardware update tasks such as a BIOS update. This is also the most common reason why a Foundation process is started or still running in a "normal" cluster: a previous LCM action that failed.

To check if and where Foundation is running, SSH into one of the CVMs and run the following command:

allssh 'genesis status | grep foundation'

In my case it was running on the CVM with the .24 IP address. Process IDs inside the brackets indicate that the process is up and running.

To stop the Foundation process, SSH to the affected CVM and run:

genesis stop foundation

The output shows directly that the service is now stopped.
Now run the LCM / Foundation upgrade again and the pre-checks will succeed.
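
The "process IDs in the brackets" check can also be done by a small script. This is a rough sketch only: it assumes that a status line looks like "foundation: [1234, 1235]" when the service is running and "foundation: []" when it is stopped, and the function name foundation_is_running is hypothetical:

```shell
# Hypothetical helper: succeed if a genesis status line for foundation
# contains at least one digit between the square brackets.
# The line format is an assumption, not taken from Nutanix documentation.
foundation_is_running() {
    case "$1" in
        *foundation:*\[*[0-9]*\]*) return 0 ;;   # PIDs listed: running
        *) return 1 ;;                           # empty brackets or other line
    esac
}
```

The function only inspects a single line of text that is passed to it; it does not call genesis itself.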

Node stuck in Phoenix (does not boot back to the hypervisor)

If a node is stuck in Phoenix, you can get it back to the hypervisor, but first check that no tasks are running from Phoenix. On a CVM, check this:

ecli task.list include_completed=false

If there is nothing running, you can do this in Phoenix:

python /phoenix/

The host will then boot back to the hypervisor. Afterwards you have to take the CVM out of maintenance mode:

cluster status (to see, if everything in the Cluster is ok)

ncli host list (to get UUID from the host)

ncli host edit id=<insert uuid here> enable-maintenance-mode='false'

Check the metadata ring:

nodetool -h localhost ring

If the host runs AHV, you also have to exit maintenance mode on AHV:

acli host.list
acli host.exit_maintenance_mode <hostname>

Wednesday, 15 May 2019

Performance on NFS datastores decreases within a couple of days after host reboot

known issue ->

Set the maximum number of connections per NFS datastore to 32 using:

esxcfg-advcfg -s 32 /SunRPC/MaxConnPerIP

or via the GUI in the advanced settings of the ESXi host

Tuesday, 16 April 2019

Change VM VLAN ID in acli (before 5.10.X)

In pre-5.10.x versions, you cannot change the VLAN ID of an existing VM NIC in Prism. Instead, you can use the following command from any of the CVMs:

1. Shut down the VM
2. Connect to any CVM
3. Update the NIC: acli vm.nic_update VMNAME 50:6b:8d:1e:40:50 network=VLAN_150
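
Because the NIC is identified by its MAC address, a typo there is easy to make. The following is a minimal sketch of a guard that checks the MAC format and then prints (does not run) the command from step 3; the function name nic_update_cmd and its argument order are hypothetical:

```shell
# Hypothetical helper: validate the MAC address, then print the acli
# command from step 3. It only prints the command; it never executes acli.
nic_update_cmd() {
    vm="$1"; mac="$2"; net="$3"
    # six colon-separated pairs of hex digits, case-insensitive
    if ! printf '%s\n' "$mac" | grep -Eiq '^([0-9a-f]{2}:){5}[0-9a-f]{2}$'; then
        echo "invalid MAC address: $mac" >&2
        return 1
    fi
    printf 'acli vm.nic_update %s %s network=%s\n' "$vm" "$mac" "$net"
}
```

For example, nic_update_cmd VMNAME 50:6b:8d:1e:40:50 VLAN_150 prints the command shown in step 3, while a malformed MAC address produces an error message and a non-zero exit status.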

Prism Central registration failed

We had problems registering PE with PC.

If you have configured a proxy server, verify that the PC IP address is whitelisted on the PE side and the PE IP address on the PC side. Otherwise Prism tries to connect through the proxy.

Tuesday, 9 April 2019

Configuring VMware vSwitches by vSphere CLI (using hostssh for nutanix cluster)

You can run the following commands on a Nutanix CVM. If you are not using Nutanix, remove the "hostssh" prefix in front of each command.

1. Set active uplink ports in vSwitch0

hostssh esxcli network vswitch standard policy failover set -a vmnic2,vmnic3 -v vSwitch0

2. Remove unused ports in vSwitch0

hostssh esxcli network vswitch standard uplink remove --uplink-name=vmnic0 --vswitch-name=vSwitch0

3. Create VM portgroup in vSwitch0

hostssh esxcli network vswitch standard portgroup add -p <VM NETWORK> -v vSwitch0

4. Set vlan ID for the newly created portgroup

hostssh esxcli network vswitch standard portgroup set -p <VM NETWORK> --vlan-id 55
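
To review the four steps for a given environment before running them, the commands can be generated first. This is only a sketch: the function name vswitch_cmds and its parameters are hypothetical, it always targets vSwitch0 as in the steps above, and it assumes a port group name without spaces:

```shell
# Hypothetical helper: print (not run) the four hostssh commands from the
# steps above. Arguments: active uplinks, unused uplink, port group, VLAN ID.
vswitch_cmds() {
    uplinks="$1"; unused="$2"; portgroup="$3"; vlan="$4"
    echo "hostssh esxcli network vswitch standard policy failover set -a $uplinks -v vSwitch0"
    echo "hostssh esxcli network vswitch standard uplink remove --uplink-name=$unused --vswitch-name=vSwitch0"
    echo "hostssh esxcli network vswitch standard portgroup add -p $portgroup -v vSwitch0"
    echo "hostssh esxcli network vswitch standard portgroup set -p $portgroup --vlan-id $vlan"
}
```

A call such as vswitch_cmds vmnic2,vmnic3 vmnic0 VM-Net 55 prints one command per step, which can be checked and then pasted into the CVM shell.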

Friday, 1 February 2019

Clear a stuck task in Nutanix

If a stuck task in Nutanix cannot be cancelled, you can clear it with ecli (prior to 5.5.5 you have to use the acli commands instead).

connect to a cvm:

ecli task.list include_completed=false
ergon_update_task --task_uuid='UUID' --task_status=succeeded
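
Since the second command forces a task to "succeeded", it is worth checking the UUID copied from the task list first. The following is a minimal sketch, assuming task UUIDs have the usual 8-4-4-4-12 hexadecimal form; the function name ergon_cmd is hypothetical, and it only prints the command from above without running it:

```shell
# Hypothetical helper: check that the argument looks like a UUID, then
# print the ergon_update_task command for it. Nothing is executed.
ergon_cmd() {
    if ! printf '%s\n' "$1" | grep -Eiq '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'; then
        echo "not a valid task UUID: $1" >&2
        return 1
    fi
    printf "ergon_update_task --task_uuid='%s' --task_status=succeeded\n" "$1"
}
```

Passing the literal placeholder UUID, for example, is rejected with an error message instead of producing a command.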

Tuesday, 22 January 2019

Check latest FATAL Errors in a Nutanix Cluster

allssh "ls -rtl data/logs/*FATAL | tail -n 5"

Deploying the Nutanix Witness VM on ESXi

Deploy the OVA.
You can change the IP address in the following config file:
$ sudo vi /etc/sysconfig/network-scripts/ifcfg-eth0
Change the parameters, then reboot:
$ sudo reboot
After the reboot, enter the following command on the witness VM to create the witness:
$ cluster -s <witness_vm_ip> --cluster_function_list=witness_vm create
Register the witness in the Nutanix menu and set the witness on the protection domains.

Break Replication timeout at Nutanix Metro Installation

Error (only metro deployments): Break replication timeout of metro protection domain is below the recommended minimum of 15 seconds or metro availability is disabled.
Set the timeout to 15 seconds.
Log in to a CVM on cluster 1:
ncli pd list
protection-domain set-auto-replication-break name="Metro01" wait-period=15
You have to set the timeout on the second cluster for its "active" storage containers as well.

Deployment Xtract VM on AHV Cluster

  1. Download the latest Xtract VM from the Nutanix support portal
  2. Start the Xtract CLI (included in the *.zip): C:\install .\cli-windows-amd64-1.1.3.exe -c <Nutanix_Cluster_IP_address>
  3. Deployment: 
    deploy-vm vm-container <container_name> vm-network <network_name> ip-address <static_IP_address> netmask <netmask> gateway <gateway_IP_address> dns1 <DNS_IP_address> dns2 <DNS_IP_address>

Performance Diagnostics at a Nutanix cluster

Enter the following command on one of the CVMs. It automatically deploys several diagnostic VMs to check the cluster performance.
diagnostics/ --display_latency_stats --run_iperf run
Clean up the installation:
diagnostics/ cleanup

Maintenance Mode on Nutanix AHV

On one of the CVMs:

acli host.list

Enter maintenance mode with the following command:

acli host.enter_maintenance_mode <hostname>

Shut down the CVM.

cvm_shutdown -P now

Shut down the AHV node:

root@ahv# shutdown -h now 

Exit Maintenance Mode:

acli host.exit_maintenance_mode <hostname>

Network tagging on Nutanix AHV

1) Tagged VLAN ID on the AHV hypervisor (on each node):
nutanix@CVM$ ssh root@<AHV_host_IP> "ovs-vsctl set port br0 tag=123"
nutanix@CVM$ ssh root@<AHV_host_IP> "ovs-vsctl list port br0"
2) Tagged VLAN ID on the CVM (on each CVM):
– Log in to the CVM in question, then, for example for VLAN ID 123:
nutanix@CVM$ change_cvm_vlan 123
3) Change the uplinks to use only the 10G ports:
– Log in to a CVM
– List the current state:
#> allssh manage_ovs show_uplinks
– Change to 10G:
#> allssh 'manage_ovs --bridge_name br0 --interfaces 10g --bond_name br0-up update_uplinks'