**Run all the following as the analytics user.**

0) Deploy the new version of refinery as per https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Deploy/Refinery-source

1) Open spark2-sql and run:

```
ALTER TABLE `wmf`.`editors_daily`
CHANGE COLUMN user_fingerprint_or_id user_fingerprint_or_name STRING
COMMENT 'If an anonymous user, this is a hash of the IP + UA, otherwise it is their global username across wiki dbs';
```

Now run:

```
show partitions wmf.editors_daily;
exit;
```

Example output:

```
spark-sql (default)> show partitions wmf.editors_daily;
partition
month=2022-06
month=2022-07
```

2) For each month partition shown above, run the following HQL, which will INSERT OVERWRITE the partition with valid data (see the loop sketch at the end of this page for running all months in one go). Each run should take 2-3 minutes:

```
spark2-sql --master yarn --executor-memory 8G --executor-cores 4 --driver-memory 2G --conf spark.dynamicAllocation.maxExecutors=64 \
  -f editors_daily_monthly.hql \
  -d refinery_hive_jar=hdfs://analytics-hadoop/wmf/cache/artifacts/airflow/refinery-hive-0.1.27-shaded.jar \
  -d source_table=wmf_raw.mediawiki_private_cu_changes \
  -d user_history_table=wmf.mediawiki_user_history \
  -d destination_table=wmf.editors_daily \
  -d month=2022-06 \
  -d coalesce_partitions=1
```

3) Now let's create the new monthly table:

```
spark2-sql -f create_editors_by_country_monthly_table.hql \
  --database wmf
```

Finally, run the following for the same set of partitions found in step 1 to populate the new wmf.unique_editors_by_country_monthly table (again, a loop sketch follows below). Each run should take 2-3 minutes:

```
spark2-sql --master yarn --executor-memory 8G --executor-cores 4 --driver-memory 2G --conf spark.dynamicAllocation.maxExecutors=64 \
  -f unique_editors_by_country_monthly.hql \
  -d source_table=wmf.editors_daily \
  -d destination_table=wmf.unique_editors_by_country_monthly \
  -d month=2022-06 \
  -d coalesce_partitions=1
```
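Since step 2 has to be run once per month partition, a minimal shell sketch like the one below can loop the step 2 command over all of them. This loop is not part of the original runbook; the month list is just the example output from step 1 (2022-06 and 2022-07), so substitute the partitions you actually found.

```
# Hypothetical convenience loop: backfill wmf.editors_daily for every
# month partition found in step 1. Replace the month list with your
# actual partitions.
for month in 2022-06 2022-07; do
  spark2-sql --master yarn --executor-memory 8G --executor-cores 4 \
    --driver-memory 2G --conf spark.dynamicAllocation.maxExecutors=64 \
    -f editors_daily_monthly.hql \
    -d refinery_hive_jar=hdfs://analytics-hadoop/wmf/cache/artifacts/airflow/refinery-hive-0.1.27-shaded.jar \
    -d source_table=wmf_raw.mediawiki_private_cu_changes \
    -d user_history_table=wmf.mediawiki_user_history \
    -d destination_table=wmf.editors_daily \
    -d month="${month}" \
    -d coalesce_partitions=1
done
```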
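The same pattern applies to the final populate step; again, the month list is an assumption based on the step 1 example output and should be replaced with the real partitions.

```
# Hypothetical convenience loop: populate
# wmf.unique_editors_by_country_monthly for the same partitions.
for month in 2022-06 2022-07; do
  spark2-sql --master yarn --executor-memory 8G --executor-cores 4 \
    --driver-memory 2G --conf spark.dynamicAllocation.maxExecutors=64 \
    -f unique_editors_by_country_monthly.hql \
    -d source_table=wmf.editors_daily \
    -d destination_table=wmf.unique_editors_by_country_monthly \
    -d month="${month}" \
    -d coalesce_partitions=1
done
```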