21:01:32 #startmeeting ArchCom IRC meeting 21:01:32 Meeting started Wed Oct 26 21:01:32 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot. 21:01:32 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 21:01:32 The meeting name has been set to 'archcom_irc_meeting' 21:01:52 #topic T138783 - SVG in HTML 21:01:52 T138783: SVG Upload should (optionally) allow the xhtml namespace - https://phabricator.wikimedia.org/T138783 21:02:06 #topic T138783 - HTML in SVG 21:02:09 :-) 21:02:14 hi folks! 21:03:06 so....today's topic is about whether we should allow HTML in our SVG 21:03:52 * robla goes to find more links to describe this better.... 21:04:45 21:05:44 matmarex suggested "We have a HTML validation library (the Sanitizer class) and it could probably be hooked up to validating HTML" 21:05:52 does that seem like a good idea? 21:06:26 MatmaRex is not in this channel and is marked /away 21:06:35 but maybe we have bawolff? 21:06:42 Yep 21:07:16 i think it's important to differentiate between a) should mediawiki support this at all and b) should this be allowed/used on wikimedia sites 21:07:45 also, what html is typically encountered in that context, and how well is that supported by different renderers? 21:08:13 as i wrote on the bug, seems in theory workable but im concerned the sanitizer wants to rewrite during sanitization which means youd have to be very pedantic in writing your svgs 21:08:41 unless we change the upload pipeline to allow files to be modified during upload 21:08:47 as for (a), if there was a patch in gerrit to allow HTML behind a feature flag, it would presumably be accepted, we don't have a massively high bar for that sort of thing, right? 21:09:54 TimStarling: I think the bar is "is this secure?" 21:09:58 yeah, you would have trouble using sanitizer directly 21:10:25 TimStarling: or more nicely, some dort of pluggable validation thingy, yea. 21:10:29 I'm not really a big fan of Sanitizer in terms of the way the code is currently structured 21:10:33 I'd guess the security depends on the sanitizer? 21:10:38 then this could be an extension rather than a feature flag 21:11:12 SMalyshev: also on the consumer of the html. currently, it's us rendering these, not the browser. 21:11:32 you mean SVGs? 21:11:45 what originally started this conversation was a discussion about a draw.io plugin where the feature flag wanted was "turn off sanitization" 21:11:51 yes. well, the html inside the svgs 21:11:56 ah yes if we have SVG renderer we also need to see what's going on there 21:12:26 according to bawolff, our SVG renderer does not support HTML, you'll just get an empty box 21:12:26 They were using svg foreign objecty things 21:12:56 so there may be fallback for renders not supporting it 21:12:56 which is not surprising, it's a very complicated spec to pull in 21:13:56 * DanielK_WMDE_ wants interactive/scripted svg 21:14:03 an idea I had last hour: maybe we could have an option in the image link which enables client-side SVG rendering 21:14:24 whether or not rsvg properly handles the fallback i have no idea as i havent tested that 21:14:43 you know there are performance reasons to do SVG in the server at the moment 21:14:47 TimStarling: sounds ok for 3rd parties. for wikimedia, we'd need a decent fallback renderer 21:14:55 DanielK_WMDE: well thats a whole other bag of worms :) 21:15:08 if sanitizer follows https://meta.wikimedia.org/wiki/Help:HTML_in_wikitext is seems ok 21:15:44 #info question discussed: should we use the Sanitizer class as HTML validation library for SVG? 21:15:49 Sanitizer is what the parser itself uses so it should 21:15:59 SMalyshev: i think that page is basically documenting what the sanitizer does 21:16:57 Sanitizer::removeHTMLtags() is currently only used by the parser, so it probably does some things which are wikitext-specific 21:17:23 and like bawolff previously said, it works by rewriting dubious HTML to be less dubious 21:17:32 The translated entity names for example are wikitext specific 21:17:41 so you either have to allow it to alter uploads or have a strict rejection mode 21:18:15 (side note: please feel free to add "#info" tag notes to get captured in the summary of this discussion) 21:19:01 Allowing mediawiki to alter uploads would make a bunch of things unrelated to this much easier 21:19:03 for example, unquoted attribute values are rewritten to be double-quoted, which avoids relying on client parsing details 21:19:46 there's nothing really stopping us from doing it, right? 21:19:47 #info question discussed: should MediaWiki be able to alter SVG uploads on save? 21:20:29 I think some code has to be rearchitected, but not a major amount 21:20:53 AaronSchulz: does it sound difficult to you? 21:21:03 The upload code is something a lot of people find scary and dont like touching 21:21:31 but i dont thimk it would be very hard 21:22:20 bawolff: yeah, probably 21:23:00 UploadBase::performUpload() just uploads $this->mTempPath 21:23:19 it's not even trying to move the file from the stash, it just reuploads 21:23:39 so if you modify that file, you're all done, right? 21:24:17 I imagine you would want to also give the user a warning 21:24:31 sure 21:24:51 #info bawolff says code might need rearchitecting. AaronSchulz doesn't think it would be too difficult 21:26:46 #info TimStarling would like there to be an image link option which does client-side SVG rendering 21:27:41 If we allow altering the svg, i think it would be cool to replace our entire validation code with DomPurify 21:28:09 #info 14:27:41 If we allow altering the svg, i think it would be cool to replace our entire validation code with DomPurify 21:28:22 thoughts on that? 21:28:36 * robla goes to find the DomPurify link 21:28:38 slightly complicated by that project being in js, but thats probably not an issue for wmf at least 21:28:57 #link https://github.com/cure53/DOMPurify 21:29:14 and i think we would be better using someone elses sanitizer than our own 21:31:09 Since more eyes on their sanitizer than ours 21:31:11 bawolff: you're suggesting that the Wikimedia instances would rely on server-side Javascript (i.e. Node.js). do you think it makes sense to require Node.js for secure SVG? 21:31:31 (for third parties? 21:31:37 hi, sorry i'm late to SVG party. DanielK_WMDE_ I second your desire for interactive SVGs :) Graphs at this point can produce SVG output, but graphs does not (yet) support an ability to create URL links 21:31:54 i wouldnt say im happy about it... 21:32:04 that library does not look like an improvement 21:32:38 (hi All) 21:33:04 so if graphoid was allowed to produce SVG output, it would already be a quality improvement, but not necessarily a functional improvement right away 21:33:39 but at least it gives a path forward, whereas without SVG, we have no easy way of producing clickable graphs 21:33:59 legoktm pointed out T86874 on wikitech-l 21:33:59 T86874: Make SVG sanitization into a library - https://phabricator.wikimedia.org/T86874 21:34:35 TimStarling: can you be more specific? I havent looked at that library in detail im more coming from a i dont like our current approach 21:35:15 #info question discussed: is DOMPurify an improvement over the status quo 21:35:53 for a start, it's only 860 physical lines of code, so if you wanted it you could just port it to PHP 21:36:51 it's apparently just an XSS filter, not a cross-site request filter 21:38:12 Yes, there would have to be a layer on top for parts outside their security model but inside ours 21:38:23 I don't know how well it has been reviewed 21:39:18 IIRC csteipp viewed DOMPurify favorably, but we didn't manage to finish the review before he left 21:39:25 it's a lot more permissive than Sanitizer 21:40:48 agreed on the need to layer our specific restrictions on top 21:41:23 var IS_ALLOWED_URI = /^(?:(?:(?:f|ht)tps?|mailto|tel):|[^a-z]|[a-z+.\-]+(?:[^a-z+.\-:]|$))/i; 21:41:36 parsoid's sanitizer works at the token / sax level, so it shouldn't be all that hard to wrap it into a SAX visitor 21:41:45 I've got to go 21:42:41 TimStarling: d'oh! ok....thanks for the discussion! 21:42:58 I definitely think that we should leverage other people's work in this space as much as possible 21:43:36 #info 14:42:58 I definitely think that we should leverage other people's work in this space as much as possible 21:45:06 gwicke: do you think we should stop working on our PHP version of SVG validation? 21:45:09 robla, do you keep a list of potential usecases? Graphoid SVG output would be the first in line 21:45:36 (i meant i will be the first person to get into that line) 21:46:14 yurik: I don't. This topic isn't *yet* an RFC, but if you want to turn this into an RFC, that'd be a great place to put your case on the first line 21:46:14 Isnt graphoid svgs entirely machine generated (ie not attacker controlled)? Does it need general sanitation? 21:47:00 robla: I think we should look at features / maturity of different options, and then evaluate a) which requires the least initial investment, and b) which will need the bigger long term maintenance effort on our end 21:47:03 bawolff, that was my understanding too, but for some reason csteip (i think) wanted to sanitize everything regardless of the source 21:47:18 "just in case" (tm) 21:47:42 gwicke: I can't commit to writing an RFC for this, but that all seems reasonable. would you be up for writing it? 21:47:46 Gwicke: do you know other options? 21:48:05 I haven't checked the PHP code in a while, but my recollection from the last time I talked about this with csteipp was that it wasn't very complete for things like html-in-SVG 21:48:24 DomPurify was the only one i could find that was even remotely good 21:48:36 the problem with that logic is that Vega is already enabled on the client, so if there is a security vulnerability, people are already exposed to it (i hope we caught all of them). So just adding SVG output does not seem to increase the attack surface 21:48:41 bawolff: that's the only one we seriously considered back then 21:49:23 but who knows, the web library world is changing every couple of months these days.. 21:49:56 Im intrigued by Tim's comment about porting to php was interesting. That honestly never even occured to me 21:50:23 but im not sure about the library support in php for DOM 21:50:42 which dompurify seems to use extensively 21:51:04 https://github.com/OWASP/java-html-sanitizer looks potentially interesting 21:51:25 perhaps its there. Im honestly not very familar with that area of the php ecosystem 21:51:38 * robla creates an RFC page https://www.mediawiki.org/wiki/Requests_for_comment/SVG_Upload_should_(optionally)_allow_the_xhtml_namespace 21:51:41 I don't think it supports SVG docs, however 21:53:06 ok...we're coming to the end of the hour. This has been a really informative discussion 21:53:32 ...and we have a place we can collaborate on prose ^^^ 21:53:57 https://github.com/darylldoyle/svg-sanitizer 21:54:14 Did we come to a conclusion about the original issue? 21:54:42 * subbu only read the last few lines of this rfc since he got here late 21:55:01 bawolff: I think the original issue was about should we allow XHTML at all, and I haven't heard any objections to it 21:55:26 "just" need to solve the sanitization challenge.. 21:55:27 the conversation seems moved to "which validation library should we use?" 21:55:43 Subbu: thats exactly what im looking for 21:55:49 that question seems it will be solved after vi vs emacs ;-) 21:56:24 seriously though, we have a bunch of validation options we should probably consider in the RFC writeup 21:56:27 the second issue is that we'll need to upgrade our rasterization tools to support HTML 21:56:51 we do have at least one option for that, but it's not rsvg 21:56:52 gwicke: yeah, I don't think we'll have time to touch on that topic 21:57:11 let's take the discussion back to phab. I'll take the action to summarize there 21:57:12 Thanks for the discussion, All! 21:57:33 any last thoughts before I hit #endmeeting? 21:57:56 (I'll end this arbitrarily very close to the top of the hour) 21:58:31 we should continue this conversation over on #wikimedia-multimedia maybe :-) 21:58:44 Gwicke: i dont know if thats actually neccesary for the original use case 21:58:47 gwicke: further thoughts about the sanitization issue? 21:58:51 * robla regrets not more explicitly pinging them before this 21:59:02 60 second warning 21:59:14 obviously it would make more sense 21:59:52 alrighty....thank you everyone! No meeting planned next week, and the following week has an option or two 22:00:01 Thank you! 22:00:07 #endmeeting