Buster + 10.4 epic: https://phabricator.wikimedia.org/T250666
* Log the reimage: `!log reimaging HOST to buster T250666`
* Send changes against the puppet repo:
** Disable notifications for the host (e.g. https://gerrit.wikimedia.org/r/c/operations/puppet/+/592876)
** Allow the host to PXE install, but pause at the partitioning step (e.g. https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/592884)
** Both changes are reverted after the reimage (see the last steps of this list).
* Run the puppet agent on apt1001, apt2001, and icinga.
* Set the host to install as buster (e.g. https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/592887)
* Depool the host (potentially from multiple sections).
* `systemctl stop mariadb && umount /srv`
** For ''multi-instance'', stop each `mariadb@sX` instead.
* Take a copy of the `/srv` entry from `/etc/fstab`.
* Connect to the mgmt interface.
* Attach to the serial console.
** On Dells (`/admin1->` prompt), use `console com2`. The escape sequence is `^\`.
* From a cumin host, inside screen: `sudo -E wmf-auto-reimage --no-verify -p TICKET FQDN`
* When the install reaches the partitioning step, select "manual", format the 40G partition as ext4, and set its mountpoint to `/`.
** The partitioner should wipe only `/` and `swap`. If it is about to wipe anything else, you've messed up.
* Tendril doesn't handle this in-place upgrade well: after the upgrade the host needs a disable + drop + add + enable, otherwise its Act. (last contact) field is not updated. See [[known issues|https://wikitech.wikimedia.org/wiki/MariaDB#Stretch_+_10.1_-%3E_Buster_+_10.4_known_issues]].
** Check out the [[tendril repo|https://gerrit.wikimedia.org/r/#/admin/projects/operations/software/tendril]] on a cumin host. (Clone over http rather than ssh, as your ssh key isn't available there.)
** For ''multi-instance'', remember to run these for all ports.
** Remove the host from tendril:
```
./tendril-host-drop.sh HOST PORT | sudo -i mysql -h db1115.eqiad.wmnet tendril
```
** After the upgrade, re-add the host to tendril:
```
./tendril-host-add.sh HOST PORT ~/.my.cnf.tendril tendril | sudo -i mysql -h db1115.eqiad.wmnet tendril
./tendril-host-enable.sh HOST PORT | sudo -i mysql -h db1115.eqiad.wmnet tendril
```
* Wait for the host to finish reimaging.
* Check that wmf-mariadb104 is installed.
* Re-add the `/srv` entry to `/etc/fstab`.
* Mount `/srv`.
* Check that the contents of `/srv` are owned by the `mysql` user; if not, fix the ownership.
* Disable replication while `mysql_upgrade` runs: `systemctl set-environment MYSQLD_OPTS="--skip-slave-start"`
** This does not need to be reverted.
* Start mariadb: `systemctl start mariadb`
** For ''multi-instance'', start each `mariadb@sX` instead.
* Check the service logs: `journalctl -xe -u mariadb`. The only errors should be about internal tables, which `mysql_upgrade` will fix.
* Upgrade and restart replication. For ''multi-instance'', pass the socket to each command (`-S /run/mysqld/mysqld.sX.sock`) and repeat for every section:
** Run `mysql_upgrade`
** Start the slave: `mysql -e "start slave"`
** Check the slave status: `mysql -e "show slave status\G"`
* Restart the Prometheus mysqld exporter ([[https://phabricator.wikimedia.org/T247290#5956794]]).
** For ''multi-instance'', use the per-instance service `prometheus-mysqld-exporter@sX.service`.
* Re-add the host to tendril (see above).
* Once it is back in tendril, revert the partman change.
* Wait for icinga to be fully green, then revert the notifications change.
* Wait until replication lag is fully gone, then slowly repool the server. (If it is in codfw, it can go straight to a full repool.)
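Copying the `/srv` entry out of `/etc/fstab` before the reimage, and adding it back afterwards, can be done with a small helper. This is a minimal sketch, assuming a POSIX shell with `awk`; the function name `srv_fstab_entry` and the `SAVED_LINE` placeholder are hypothetical and not part of any existing tooling. Since the partitioner wipes `/`, the saved entry has to be kept somewhere off the host (for example in your terminal scrollback or on the cumin host).

```shell
#!/bin/sh
# Hypothetical helper: print the /etc/fstab entry whose mount point is /srv.
# In fstab, field 1 is the device and field 2 is the mount point; lines whose
# first field starts with '#' are comments and are skipped.
srv_fstab_entry() {
    fstab="${1:-/etc/fstab}"
    awk '$1 !~ /^#/ && $2 == "/srv"' "$fstab"
}

# Before the reimage (keep the output somewhere off the host):
#   srv_fstab_entry /etc/fstab
# After the reimage, append the saved line (SAVED_LINE is a placeholder)
# and mount it:
#   printf '%s\n' 'SAVED_LINE' | sudo tee -a /etc/fstab
#   sudo mount /srv
```

Matching on the mount-point field, rather than grepping for `/srv`, avoids also picking up commented-out lines or mounts below `/srv`.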
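On ''multi-instance'' hosts the tendril pipelines from the steps above have to be repeated for every port. A minimal sketch that prints them for a list of ports, assuming a POSIX shell; the function `print_tendril_cmds` and its `drop`/`readd` actions are hypothetical. It only prints the same commands shown above, so they can be reviewed before being run from the tendril checkout.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the tendril pipelines for each port.
TENDRIL_DB="db1115.eqiad.wmnet"

print_tendril_cmds() {
    # Usage: print_tendril_cmds drop|readd HOST PORT [PORT...]
    if [ "$#" -lt 3 ]; then
        printf '%s\n' "usage: print_tendril_cmds drop|readd HOST PORT [PORT...]" >&2
        return 1
    fi
    action="$1"
    host="$2"
    shift 2
    for port in "$@"; do
        case "$action" in
            drop)
                # Remove this instance from tendril.
                printf '%s\n' "./tendril-host-drop.sh ${host} ${port} | sudo -i mysql -h ${TENDRIL_DB} tendril"
                ;;
            readd)
                # Re-add this instance, then re-enable it.
                printf '%s\n' "./tendril-host-add.sh ${host} ${port} ~/.my.cnf.tendril tendril | sudo -i mysql -h ${TENDRIL_DB} tendril"
                printf '%s\n' "./tendril-host-enable.sh ${host} ${port} | sudo -i mysql -h ${TENDRIL_DB} tendril"
                ;;
            *)
                printf 'unknown action: %s\n' "$action" >&2
                return 1
                ;;
        esac
    done
}
```

For example, `print_tendril_cmds readd HOST 3311 3313` prints the add and enable pipelines for both ports. The host name and ports are placeholders; use the host's actual instance ports.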
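For ''multi-instance'' hosts, the start/upgrade/replication steps above are repeated once per section. A minimal sketch that prints them in order for each section, assuming a POSIX shell, that `/srv` is already mounted and `MYSQLD_OPTS` already set, and the `mariadb@sX` unit names and `/run/mysqld/mysqld.sX.sock` sockets described above; the function name `print_upgrade_steps` is hypothetical. It prints rather than executes, so the `journalctl` output and slave status can be checked between commands.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the post-reimage steps for each
# section of a multi-instance host, in the order used in the steps above.
print_upgrade_steps() {
    for section in "$@"; do
        sock="/run/mysqld/mysqld.${section}.sock"
        printf '%s\n' "systemctl start mariadb@${section}"
        # Expect only errors about internal tables here, fixed by mysql_upgrade.
        printf '%s\n' "journalctl -xe -u mariadb@${section}"
        printf '%s\n' "mysql_upgrade -S ${sock}"
        printf '%s\n' "mysql -S ${sock} -e \"start slave\""
        printf '%s\n' "mysql -S ${sock} -e \"show slave status\\G\""
        printf '%s\n' "systemctl restart prometheus-mysqld-exporter@${section}.service"
    done
}
```

For example, `print_upgrade_steps s1 s3` prints the six commands for `s1` followed by the six for `s3` (the section names are placeholders).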