$ sudo cookbook -c /home/jbond/cookbook.yaml sre.hardware.dell sretest1001.eqiad.wmnet
START - Cookbook sre.hardware.dell for hosts sretest1001.eqiad.wmnet
Management Password:
sretest1001.eqiad.wmnet (IDRAC): update
sretest1001.eqiad.wmnet: Already have: /srv/firmware/poweredge-r440/iDRAC-with-Lifecycle-Controller_Firmware_WPNPP_WN64_5.10.30.00_A00.EXE
sretest1001.eqiad.wmnet (IDRAC): latest_version: 5.10.30.00, current_version: 5.10.10.00
==> sretest1001.eqiad.wmnet IDRAC: About to upload /srv/firmware/poweredge-r440/iDRAC-with-Lifecycle-Controller_Firmware_WPNPP_WN64_5.10.30.00_A00.EXE, please confirm
Type "go" to proceed or "abort" to interrupt the execution
> go
==> sretest1001.eqiad.wmnet IDRAC: About to install Available-25227-5.10.30.00__iDRAC.Embedded.1-1, please confirm
Type "go" to proceed or "abort" to interrupt the execution
> go
sretest1001.eqiad.wmnet (IDRAC): has job ID - /redfish/v1/TaskService/Tasks/JID_559941188108
[IDRAC.2.5.RED003] Downloading package.
[1/30, retrying in 30.00s] Polling task: JID_559941188108 not completed yet: status=OK, state=Pending, completed=None%
[IDRAC.2.5.RED003] Downloading package.
[2/30, retrying in 30.00s] Polling task: JID_559941188108 not completed yet: status=OK, state=Pending, completed=None%
[IDRAC.2.5.RED001] Job completed successfully.
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
Testing Redfish API connection to sretest1001.mgmt.eqiad.wmnet
sretest1001.eqiad.wmnet (IDRAC): now at version: 5.10.30.00
sretest1001.eqiad.wmnet (BIOS): update
Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='sretest1001.mgmt.eqiad.wmnet', port=443): Read timed out. (read timeout=10)")': /redfish/v1/Systems/System.Embedded.1?$select=BiosVersion
sretest1001.eqiad.wmnet: Already have: /srv/firmware/poweredge-r440/BIOS_38PH6_WN64_2.14.2.EXE
sretest1001.eqiad.wmnet (BIOS): latest_version: 2.14.2, current_version: 1.3.7
==> sretest1001.eqiad.wmnet BIOS: About to upload /srv/firmware/poweredge-r440/BIOS_38PH6_WN64_2.14.2.EXE, please confirm
Type "go" to proceed or "abort" to interrupt the execution
> go
==> sretest1001.eqiad.wmnet BIOS: About to install Available-159-2.14.2__BIOS.Setup.1-1, please confirm
Type "go" to proceed or "abort" to interrupt the execution
> go
sretest1001.eqiad.wmnet (BIOS): has job ID - /redfish/v1/TaskService/Tasks/JID_559946045007
START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
Scheduling downtime on Icinga server alert1001.wikimedia.org for hosts: sretest1001
Created silence ID 07f35e87-65bc-4c24-bf77-249ea26a028c
Rebooting 1 hosts in batches of 1 with 0.0s of sleep in between: sretest1001.eqiad.wmnet
----- OUTPUT of 'reboot-host' -----
================
PASS |██████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00, 1.12hosts/s]
FAIL | | 0% (0/1) [00:00= 100.0% threshold) for command: 'reboot-host'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
[1/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[2/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[3/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[4/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[5/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[6/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[7/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[8/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[9/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[10/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[11/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[12/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[13/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[14/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[15/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[16/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[17/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[18/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[19/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
[20/240, retrying in 10.00s] Attempt to run 'spicerack.remote.RemoteHosts.wait_reboot_since' raised: Unable to get uptime for sretest1001.eqiad.wmnet Caused by: Cumin execution failed (exit_code=2)
Found reboot since 2022-06-23 09:30:08.489764 for hosts sretest1001.eqiad.wmnet
[1/60, retrying in 30.00s] Attempt to run 'spicerack.puppet.PuppetHosts.wait_since' raised: Successful Puppet run too old (2022-06-23 09:29:40 <= 2022-06-23 09:30:08.489764) on: sretest1001.eqiad.wmnet
Successful Puppet run found
[1/15, retrying in 3.00s] Attempt to run 'spicerack.icinga.IcingaHosts.wait_for_optimal..check' raised: Not all services are recovered: sretest1001:Check for large files in client bucket,DPKG,MD RAID,configured eth,puppet last run
[2/15, retrying in 6.00s] Attempt to run 'spicerack.icinga.IcingaHosts.wait_for_optimal..check' raised: Not all services are recovered: sretest1001:Check for large files in client bucket,DPKG,MD RAID,configured eth,puppet last run
[3/15, retrying in 9.00s] Attempt to run 'spicerack.icinga.IcingaHosts.wait_for_optimal..check' raised: Not all services are recovered: sretest1001:Check for large files in client bucket,DPKG,MD RAID,configured eth,puppet last run
[4/15, retrying in 12.00s] Attempt to run 'spicerack.icinga.IcingaHosts.wait_for_optimal..check' raised: Not all services are recovered: sretest1001:Check for large files in client bucket,DPKG,MD RAID,configured eth,puppet last run
[5/15, retrying in 15.00s] Attempt to run 'spicerack.icinga.IcingaHosts.wait_for_optimal..check' raised: Not all services are recovered: sretest1001:Check for large files in client bucket,DPKG,MD RAID,configured eth,puppet last run
[6/15, retrying in 18.00s] Attempt to run 'spicerack.icinga.IcingaHosts.wait_for_optimal..check' raised: Not all services are recovered: sretest1001:Check for large files in client bucket,DPKG,MD RAID,configured eth,puppet last run
[7/15, retrying in 21.00s] Attempt to run 'spicerack.icinga.IcingaHosts.wait_for_optimal..check' raised: Not all services are recovered: sretest1001:Check for large files in client bucket,DPKG,MD RAID,configured eth,puppet last run
Deleted silence ID 07f35e87-65bc-4c24-bf77-249ea26a028c
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
[IDRAC.2.5.PR19] The specified job has completed successfully.
sretest1001.eqiad.wmnet (BIOS): now at version: 2.14.2
END (PASS) - Cookbook sre.hardware.dell (exit_code=0) for hosts sretest1001.eqiad.wmnet