Let's get some useful thresholds for our models. Generally, these thresholds are going to look a lot worse than they really are -- mostly because the labels we used for training are messy and incomplete. We're targeting at least 70% precision, but we're likely to get that when we ask for 50% precision -- and in some cases, we'll still get it when we target even lower precision.

So! We're going to use ORES' "threshold optimization" querying system. We'll need to make a call for each topic in order to get an appropriate threshold:

* Culture.Biography.Biography* [[maximum recall @ precision >= 0.5](https://ores.wikimedia.org/v3/scores/enwiki/?models=articletopic&model_info=statistics.thresholds.%22Culture.Biography.Biography*%22.%22maximum%20recall%20@%20precision%20%3E=%200.5%22)]

```
{
  "!f1": 0.925,
  "!precision": 0.996,
  "!recall": 0.863,
  "accuracy": 0.877,
  "f1": 0.662,
  "filter_rate": 0.759,
  "fpr": 0.137,
  "match_rate": 0.241,
  "precision": 0.5,
  "recall": 0.977,
  "threshold": 0.086
}
```

* Culture.Biography.Women [[maximum recall @ precision >= 0.5](https://ores.wikimedia.org/v3/scores/enwiki/?models=articletopic&model_info=statistics.thresholds.%22Culture.Biography.Women%22.%22maximum%20recall%20@%20precision%20%3E=%200.5%22)]

```
{
  "!f1": 0.993,
  "!precision": 0.995,
  "!recall": 0.99,
  "accuracy": 0.985,
  "f1": 0.572,
  "filter_rate": 0.981,
  "fpr": 0.01,
  "match_rate": 0.019,
  "precision": 0.501,
  "recall": 0.668,
  "threshold": 0.667
}
```

* Culture.Media.Entertainment [[maximum recall @ precision >= 0.5](https://ores.wikimedia.org/v3/scores/enwiki/?models=articletopic&model_info=statistics.thresholds.%22Culture.Media.Entertainment%22.%22maximum%20recall%20@%20precision%20%3E=%200.5%22)]

```
{
  "!f1": 0.998,
  "!precision": 0.998,
  "!recall": 0.998,
  "accuracy": 0.996,
  "f1": 0.47,
  "filter_rate": 0.997,
  "fpr": 0.002,
  "match_rate": 0.003,
  "precision": 0.503,
  "recall": 0.442,
  "threshold": 0.646
}
```

* STEM.Mathematics [[maximum recall @ precision >= 0.3](https://ores.wikimedia.org/v3/scores/enwiki/?models=articletopic&model_info=statistics.thresholds.%22STEM.Mathematics%22.%22maximum%20recall%20@%20precision%20%3E=%200.3%22)]

```
{
  "!f1": 1.0,
  "!precision": 1.0,
  "!recall": 0.999,
  "accuracy": 0.999,
  "f1": 0.401,
  "filter_rate": 0.999,
  "fpr": 0.001,
  "match_rate": 0.001,
  "precision": 0.309,
  "recall": 0.571,
  "threshold": 0.903
}
```

Here, we can see some diversity. Culture.Biography.Biography* is easy to model and very common in the labeled data, so we can get very high precision and very high recall with a strict threshold. STEM.Mathematics is on the other end of the spectrum: there are very few math-related articles at all, so I've relaxed the minimum precision to 0.3 in order to get a threshold.
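
Rather than clicking through the links above one at a time, these queries could be scripted. Below is a minimal sketch that builds the same "maximum recall @ precision >= X" requests against the ORES endpoint shown in the URLs above; the exact nesting of the JSON response used to dig out the threshold block is an assumption and may need adjusting against a real response.

```python
"""Fetch articletopic thresholds from the ORES threshold-optimization API.

Sketch only: the query URLs mirror the links above, but the response
structure (enwiki -> models -> articletopic -> statistics -> thresholds)
is assumed, not confirmed here.
"""
import requests

ORES_URL = "https://ores.wikimedia.org/v3/scores/enwiki/"

# Topic label -> minimum precision we're willing to accept.
TARGETS = {
    "Culture.Biography.Biography*": 0.5,
    "Culture.Biography.Women": 0.5,
    "Culture.Media.Entertainment": 0.5,
    "STEM.Mathematics": 0.3,  # relaxed: very few math articles in the labeled data
}


def get_threshold_stats(topic, min_precision):
    """Return the optimization stats dict (threshold, precision, recall, ...) for one topic."""
    query = 'maximum recall @ precision >= {0}'.format(min_precision)
    model_info = 'statistics.thresholds."{0}"."{1}"'.format(topic, query)
    response = requests.get(
        ORES_URL,
        params={"models": "articletopic", "model_info": model_info},
    )
    response.raise_for_status()
    doc = response.json()
    # Assumed response path; adjust if ORES nests the thresholds differently.
    thresholds = (doc["enwiki"]["models"]["articletopic"]
                     ["statistics"]["thresholds"][topic])
    return thresholds[0]


if __name__ == "__main__":
    for topic, min_precision in TARGETS.items():
        stats = get_threshold_stats(topic, min_precision)
        print("{0}: threshold={1}, precision={2}, recall={3}".format(
            topic, stats["threshold"], stats["precision"], stats["recall"]))
```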