12:25 paravoid: what the hell?
12:26 paravoid: we hit 687M/s on bits today, while our normal is at 350M/s
12:30 paravoid: so, something got deployed today and is causing havoc
12:30 manybubbles: paravoid: havoc on the inside or causing clients to request more stuff?
12:30 ori: varnishstat
12:30 paravoid: see the links above
12:31 paravoid: varnishtop?
12:31 ori: that
12:31 ori: got output handy?
12:31 paravoid: it's easy enough, but note that we have two separate effects
12:32 paravoid: we have https://graphite.wikimedia.org/render/?title=HTTP%20Requests/sec%20%28excludes%20bits.wikimedia.org:%20css/js%29%20-8hours&from=-8hours&width=1024&height=500&until=now&areaMode=none&hideLegend=false&lineWidth=1&lineMode=connected&target=color%28cactiStyle%28alias%28scale%28reqstats.requests,%220.01666%22%29,%20%22requests/sec%22%29%29,%22blue%22%29
12:32 paravoid: which *excludes* bits
12:32 twkozlowski: interesting spike at 18:45
12:32 paravoid: and it's > 10% increase in HTTP req/s
12:33 paravoid: then we have bits
12:33 paravoid: 22331.75 RxURL /geoiplookup
12:33 paravoid: 9740.95 RxURL /favicon/wikipedia.ico
12:33 paravoid: 9488.79 RxURL /static-1.23wmf10/extensions/UniversalLanguageSelector/resources/images/cog-sprite.svg?2014-01-09T16:51:40Z
12:33 paravoid: 8995.63 RxURL /static-1.23wmf10/skins/common/images/poweredby_mediawiki_88x31.png
12:33 paravoid: 8927.57 RxURL /static-1.23wmf10/skins/vector/images/search-ltr.png?303-4
12:34 paravoid: that's top-5
12:35 paravoid: bits is gradually getting better compared to the ~19:00 spike
12:36 paravoid: however the traffic levels are still elevated, and they have also been much more elevated than in the past two weeks
12:36 paravoid: cf. http://ganglia.wikimedia.org/latest/graph.php?r=month&z=xlarge&c=Bits+caches+esams&m=cpu_report&s=by+name&mc=2&g=network_report
12:40 paravoid: %35-40 of bits traffic is ULS, but I'm not sure if that's new
12:40 paravoid: 35-40% even :)
12:40 manybubbles: we accept either notation
12:41 Reedy: %3%5%-%4%0%
12:41 ori: Nikerabbit: ?
12:42 Nemo_bis: flying
12:44 paravoid: Rates:15.8Gbps/42.4Gbps - ifspeed: 40Gbps
12:47 Nemo_bis: that bits spike looks very similar to the one we had at the last upgrade
12:47 ori: paravoid: is the 35-40% cog-sprite.svg, or is there anything else?
12:47 paravoid: no, everything that matches UniversalLanguageSelector
12:48 ori: paravoid: can you provide a few other URLs?
12:48 ori: paravoid: Nemo_bis is right that pushing out a new release tends to cause a spike, because many bits URLs encode the MediaWiki release, so a lot of cached resources become invalid all at once
12:48 paravoid: autonym, some other fonts
12:48 paravoid: wait a sec for the urls
12:48 paravoid: yes, I'm aware of this
12:49 paravoid: so the deploy spike is getting better by the minute
12:49 Coren: !log Deleted all but last 50K lines from /var/log/ganglia/ganglia_parser.log (never rotated, now at 11G) from neon
12:49 Nemo_bis: actually last time was much worse, bits had hit 100 load
12:49 paravoid: however the monthly shows an abnormal elevation of traffic the past 2-3 weeks
12:49 paravoid: that + the deploy spike is filling up all kinds of pipes
12:49 ori: there is a live bug with ULS that causes it to load too many fonts
12:50 ori: niklas merged a workaround but i'm not sure it made it to wmf10, let me check
12:50 paravoid: Total bytes: 1616669434
12:50 paravoid: Universal Language Selector: 585067766
12:50 ori: I want to be careful not to attribute it to ULS just yet though
12:50 paravoid: 100943578 /static-current/extensions/UniversalLanguageSelector/data/fontrepo/fonts/Autonym/Autonym.woff?version=20131205
12:50 paravoid: 64545146 /static-current/extensions/UniversalLanguageSelector/data/fontrepo/fonts/SiyamRupali/SiyamRupali.woff?version=1.070
12:50 paravoid: 36328294 /static-current/extensions/UniversalLanguageSelector/data/fontrepo/fonts/Jomolhari/Jomolhari.woff?version=0.003
12:50 paravoid: etc.
12:50 paravoid: yes, I'm not blaming it on ULS yet either
12:51 paravoid: as I said, I'm not sure what the number was e.g. 2 weeks ago
12:52 ori: a weekly cron job that runs varnishtop for a minute and emails the output to ops@ would be very useful
12:52 paravoid: runs varnishtop where? :)
12:52 ori: oh, right. i guess you need to collate the output of multiple bits. but even a single bits cache would be pretty good.
12:53 ori: because you're really checking for misbehaving mediawiki code, which is not dc-aware.
12:53 paravoid: (text, bits, mobile, upload) x (front, back) x (eqiad, esams, ulsfo)
12:53 ori: front bits
12:53 ori: any DC
12:53 paravoid: well yes, bits doesn't have backend
12:53 paravoid: DC matters
12:54 paravoid: DCs tend to attract specific languages
12:54 ori: yeah, not saying it's perfect, just saying that even a single host would be very useful
12:54 ori: i'm chatty because i'm waiting for the ULS repo to clone, 40mb or so
12:55 paravoid: what's the commit?
12:56 ori: paravoid: https://gerrit.wikimedia.org/r/#/c/107028/ , did not make it to wmf10
12:56 ori: i am backporting
13:00 logmsgbot: !log ori synchronized php-1.23wmf10/extensions/UniversalLanguageSelector/resources/js/ext.uls.webfonts.js 'Update UniversalLanguageSelector to master for I2da436caa: Wait till rendering thread completion before applying webfonts (Bug: 59958)'
13:02 ori: let's see if that negates the trend
13:02 paravoid: it doesn't look like a very likely match
13:02 ori: between the issue and ULS, or between the issue and this specific patch?
13:03 paravoid: the latter for sure
13:03 paravoid: I'm not sure about the former
13:08 paravoid: hrm
13:08 paravoid: we're back to November levels
13:08 paravoid: which isn't too bad
13:09 paravoid: it's different than two weeks ago, but otoh two weeks ago was new year's
13:10 paravoid: https://graphite.wikimedia.org/render/?title=HTTP%20Requests/hour%20%28excludes%20bits.wikimedia.org:%20css/js%29%20-6week&from=-6%20weeks&width=1024&height=500&until=now&areaMode=none&hideLegend=false&lineWidth=1&lineMode=connected&target=color%28cactiStyle%28alias%28hitcount%28scale%28reqstats.requests,%220.01666%22%29,%20%221hour%22%29,%20%22requests/hour%22%29%29,%22blue%22%29
13:10 paravoid: there was definitely a drop
13:10 Nemo_bis: all these URLs give me "no data" graphs btw
13:11 paravoid: did you open the whole URL?
13:11 paravoid: maybe it's getting trimmed by the server
13:11 paravoid: it should end in %22blue%22%29
13:11 Nemo_bis: yes it's not trimmed, not that long
13:12 Nemo_bis: hmpf it's firefox following another religion of percent-encoding it seems
13:12 paravoid: I use firefox too
13:14 ori: paravoid: what is generating that data again?
13:14 ori: mediawiki itself -> profiler, right?
13:14 paravoid: what data?
13:14 ori: in the graph you linked to above
13:14 paravoid: http req/hr?
13:14 paravoid: no that's not mediawiki
13:14 ori: yes
13:15 ori: oh, that's reqstats
13:15 ori: that's the perl script, yes?
13:15 paravoid: yes
13:15 paravoid: consumes from udp2log, writes to carbon
13:15 ori: collector on professor was saturated and dropping some % of data. i don't remember if carbon was, too
13:20 ori: paravoid: ULS backport seems to have helped, no?
13:23 paravoid: indeed
13:23 paravoid: good catch!
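
Two ideas from the log lend themselves to small sketches. First, ori's 12:52 suggestion of a weekly job that samples one frontend bits cache and mails a varnishtop-style digest to ops@. The sketch below is a minimal approximation under assumptions: it tallies RxURL entries from varnishlog for a minute (roughly what varnishtop aggregates interactively on Varnish 3), and the list address, SMTP relay and top-N cutoff are illustrative placeholders, not existing configuration.

    #!/usr/bin/env python
    """Sketch of the weekly varnishtop digest proposed at 12:52.
    Assumed/hypothetical: MAIL_TO, SMTP_RELAY, TOP_N; Varnish 3 tag names."""

    import collections
    import smtplib
    import socket
    import subprocess
    import time
    from email.mime.text import MIMEText

    SAMPLE_SECONDS = 60
    TOP_N = 25
    MAIL_TO = "ops-list@example.org"   # assumed list address
    SMTP_RELAY = "localhost"           # assumed local relay

    def sample_urls(seconds):
        """Count RxURL entries seen by varnishlog over `seconds` seconds."""
        counts = collections.Counter()
        proc = subprocess.Popen(
            ["varnishlog", "-c", "-i", "RxURL"],
            stdout=subprocess.PIPE, universal_newlines=True,
        )
        deadline = time.time() + seconds
        try:
            for line in proc.stdout:
                # lines look like: "   13 RxURL        c /geoiplookup"
                parts = line.split(None, 3)
                if len(parts) == 4 and parts[1] == "RxURL":
                    counts[parts[3].strip()] += 1
                if time.time() > deadline:
                    break
        finally:
            proc.terminate()
        return counts

    def main():
        counts = sample_urls(SAMPLE_SECONDS)
        body = "\n".join(
            "%10d %s" % (n, url) for url, n in counts.most_common(TOP_N)
        )
        msg = MIMEText(body)
        msg["Subject"] = "weekly varnishtop digest from %s" % socket.gethostname()
        msg["From"] = "varnishtop-digest@%s" % socket.gethostname()
        msg["To"] = MAIL_TO
        smtp = smtplib.SMTP(SMTP_RELAY)
        smtp.sendmail(msg["From"], [MAIL_TO], msg.as_string())
        smtp.quit()

    if __name__ == "__main__":
        main()

Dropped into a weekly cron entry on a single frontend bits host, this gives roughly the digest ori describes; per-DC collation, which paravoid points out matters because DCs attract specific languages, is deliberately left out.

Second, the reqstats pipeline discussed at 13:15: the production collector is a Perl script that consumes sampled request lines from udp2log and writes counts to carbon. The sketch below is an assumed stand-in, not that script; it reads lines on stdin (as a udp2log pipe consumer would) and emits one data point per minute over carbon's plaintext protocol, which is consistent with the graphite targets above scaling reqstats.requests by 0.01666 (about 1/60) to get requests per second.

    #!/usr/bin/env python
    """Minimal analogue of the reqstats udp2log -> carbon collector.
    Assumed/hypothetical: CARBON_HOST; metric name taken from the graphs above."""

    import socket
    import sys
    import time

    CARBON_HOST = "localhost"   # assumed carbon relay
    CARBON_PORT = 2003          # carbon plaintext protocol port
    METRIC = "reqstats.requests"
    INTERVAL = 60               # one data point per minute

    def main():
        sock = socket.create_connection((CARBON_HOST, CARBON_PORT))
        count = 0
        next_flush = time.time() + INTERVAL
        for _line in sys.stdin:  # udp2log feeds sampled request lines on stdin
            count += 1
            now = time.time()
            if now >= next_flush:
                # plaintext protocol: "<metric> <value> <timestamp>\n"
                sock.sendall(("%s %d %d\n" % (METRIC, count, int(now))).encode())
                count = 0
                next_flush = now + INTERVAL

    if __name__ == "__main__":
        main()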