htmlpurifier
Standards compliant HTML filter written in PHP
HTML Purifier - Filter your HTML the standards-compliant way! An HTML filter that guards against XSS and ensures standards-compliant output.
Is there any way to make HTML Purifier preserve the implicit spaces that would typically be seen in rendered HTML?
For example, you would typically expect a space between Foo and Bar in the following cases:
Foo<br/>Bar
Example 1
<div>Foo</div><div>Bar</div>
Example 2
Source: (StackOverflow)
I have an application that needs to display foreign HTML data (e.g. HTML-encoded email texts, though not only those) safely, i.e. with XSS attempts and other nasty stuff removed, while still displaying the HTML as it should look. The solutions considered so far aren't ideal:
- Clean HTML with something like HTMLPurifier. Works fine, but once email size goes over 100K it becomes very slow - tens of seconds per email. I suspect any secure enough parser would be as slow in PHP - some emails are really bad HTML, I've seen some that generate 150K HTML for one page of text.
- Display HTML in an iframe - here the problem is that iframe needs then to be in another origin to be safe from XSS AFAIK, and this would require different domain for the same app. Setting up application with two domains is much more work and may be very hard in some setups (such as hosting that gives only one domain name).
Any other solutions that can achieve this result?
Source: (StackOverflow)
As per the HTML Purifier smoketest, 'malformed' URIs are occasionally discarded to leave behind an attribute-less anchor tag, e.g.
<a rel='nofollow' href="javascript:document.location='http://www.google.com/'">XSS</a>
becomes <a>XSS</a>
...as well as occasionally being stripped down to the protocol, e.g.
<a rel='nofollow' href="http://1113982867/">XSS</a>
becomes <a rel='nofollow' href="http:/">XSS</a>
While that's unproblematic, per se, it's a bit ugly. Instead of trying to strip these out with regular expressions, I was hoping to use HTML Purifier's own library capabilities / injectors / plug-ins / whathaveyou.
Point of reference: Handling attributes
Conditionally removing an attribute in HTMLPurifier is easy. Here the library offers the class HTMLPurifier_AttrTransform with the method confiscateAttr().
While I don't personally use the functionality of confiscateAttr(), I do use an HTMLPurifier_AttrTransform as per this thread to add target="_blank" to all anchors.
// more configuration stuff up here
$htmlDef = $htmlPurifierConfiguration->getHTMLDefinition(true);
$anchor = $htmlDef->addBlankElement('a');
$anchor->attr_transform_post[] = new HTMLPurifier_AttrTransform_Target();
// purify down here
HTMLPurifier_AttrTransform_Target is a very simple class, of course.
class HTMLPurifier_AttrTransform_Target extends HTMLPurifier_AttrTransform
{
    public function transform($attr, $config, $context) {
        // I could call $this->confiscateAttr() here to throw away an
        // undesired attribute
        $attr['target'] = '_blank';
        return $attr;
    }
}
That part works like a charm, naturally.
Handling elements
Perhaps I'm not squinting hard enough at HTMLPurifier_TagTransform, or am looking in the wrong place(s), or generally amn't understanding it, but I can't seem to figure out a way to conditionally remove elements.
Say, something to the effect of:
// more configuration stuff up here
$htmlDef = $htmlPurifierConfiguration->getHTMLDefinition(true);
$anchor = $htmlDef->addElementHandler('a');
$anchor->elem_transform_post[] = new HTMLPurifier_ElementTransform_Cull();
// add target as per 'point of reference' here
// purify down here
With the Cull class extending something that has a confiscateElement() ability, or comparable, wherein I could check for a missing href attribute or an href attribute with the content http:/.
HTMLPurifier_Filter
I understand I could create a filter, but the examples (Youtube.php and ExtractStyleBlocks.php) suggest I'd be using regular expressions in that, which I'd really rather avoid, if it is at all possible. I'm hoping for an onboard or quasi-onboard solution that makes use of HTML Purifier's excellent parsing capabilities.
Returning null in a child class of HTMLPurifier_AttrTransform unfortunately doesn't cut it.
Anyone have any smart ideas, or am I stuck with regexes? :)
Source: (StackOverflow)
I use mod_rewrite with CodeIgniter to have URLs such as:
/controller/param1/param2
Most of the times param1 and param2 will be IDs from the database (in other words numbers) and nothing else.
The question is, what can and what should I do to protect it against potential hacker attacks? Should I use html purifier for that, or is there a better way, or is there even a need to do something?
I am really new to security and protection. I just heard about HTML Purifier, and I would rather not use it everywhere like beginners probably tend to.
Should I just use preg_match()?
If preg_match() is the solution, which expression accepts only a numeric value (an ID value)?
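For the numeric-ID case asked about here, a minimal sketch of a preg_match() check might look like the following. The helper name parseId() and the nine-digit limit are assumptions made for illustration (the limit keeps the value inside a 32-bit integer); neither comes from CodeIgniter or HTML Purifier.

```php
<?php
// Hypothetical helper: returns the ID as an int, or null when the URL
// segment is not a plain positive integer.
function parseId(string $segment): ?int
{
    // Digits only, no sign, no leading zero, at most nine digits.
    // The D modifier stops "$" from matching before a trailing newline.
    if (preg_match('/^[1-9][0-9]{0,8}$/D', $segment) !== 1) {
        return null;
    }
    return (int) $segment;
}

// Usage: reject the request early when a segment is not a valid ID.
$id = parseId('42');
if ($id === null) {
    // e.g. respond with a 404 here
}
```

A check like this only validates the shape of the value; database queries built from it would still normally go through prepared statements or the framework's query builder.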
Source: (StackOverflow)
My idea is to somehow minify HTML code on the server side, so the client receives fewer bytes.
What do I mean with "minify"?
Not zipping. More like, for example, jQuery creators do with .min.js versions. In other words, I need to remove unnecessary white-spaces and new-lines, but I can't remove so much that presentation of HTML changes (for example remove white-space between actual words in paragraph).
Are there any tools that can do it? I know there is HtmlPurifier. Is it able to do it? Any other options?
P.S. Please don't offer regex'ies. I know that only Chuck Norris can parse HTML with them. =]
Source: (StackOverflow)
I want to allow a limited white list of HTML tags that users can use in my forum. So I have configured the HTML Purifier like so:
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', 'p,a[href|rel|target|title],img[src],span[style],strong,em,ul,ol,li');
$purifier = new HTMLPurifier($config);
What I am wondering is, does the default configuration of the HTML Purifier still apply, with the exception of a reduced number of accepted HTML tags or do I need to re-set every possible configuration parameter manually?
Additionally, should I tweak the default configuration in any way to stay safe? I am new to the whole XSS protection thing, new to HTML Purifier and didn't find that the manual gave a lot of 'basic' tips and hints.
Source: (StackOverflow)
This is kind of a special combination of tags that I want to allow in HTMLPurifier, but can't seem to get the combination to work.
I can get script tags to work, but then embed tags get removed (I enable the script tags with HTML.Trusted = true). When I get embed tags back in, script tags are stripped out (I remove HTML.Trusted). The following is my config:
$config->set('HTML.Trusted', true);
$config->set('HTML.SafeEmbed', true);
$config->set('HTML.SafeObject', true);
$config->set('Output.FlashCompat', true);
I even tried adding in the following which made things worse:
$config->set('HTML.Allowed', 'object[width|height|data],param[name|value],embed[src|type|allowscriptaccess|allowfullscreen|width|height],script[src|type]');
Also, I can't seem to get iframes to work no matter what. I tried adding:
$config->set('HTML.DefinitionID', 'enduser-customize.html iframe');
$config->set('HTML.DefinitionRev', 1);
$config->set('Cache.DefinitionImpl', null); // remove this later!
$def = $config->getHTMLDefinition(true);
$iframe = $def->addElement(
    'iframe', // name
    'Block', // content set
    'Empty', // allowed children
    'Common', // attribute collection
    array( // attributes
        'src*' => 'URI#embedded',
        'width' => 'Pixels#1000',
        'height' => 'Pixels#1000',
        'frameborder=' => 'Number',
        'name' => 'ID',
    )
);
$iframe->excludes = array('iframe' => true);
Any help on getting the entire combo to work, or even script tags with object/param and embed would be GREATLY appreciated!!!
Oh yeah, this is obviously not for all users, just "special" users.
Thanks!
PS - please don't link me to http://htmlpurifier.org/docs/enduser-customize.html
UPDATE
I found a solution for adding iframes at the bottom of the thread here: http://htmlpurifier.org/phorum/read.php?3,4646
The current configuration is now:
$config->set('HTML.Trusted', true);
$config->set('HTML.SafeEmbed', true);
$config->set('HTML.SafeObject', true);
$config->set('Output.FlashCompat', true);
$config->set('Filter.Custom', array( new HTMLPurifier_Filter_MyIframe() ));
UPDATE TO THE UPDATE
If you're having trouble with my comment in the HTMLPurifier forum, it may be because I mean for the method to look like this:
public function preFilter($html, $config, $context) {
return preg_replace("/iframe/", "img class=\"MyIframe\" ", preg_replace("/<\/iframe>/", "", $html));
}
Source: (StackOverflow)
I'm using HTMLPurifier to sanitize an HTML string (it's about security).
Some attributes (like width or height) are removed when HTMLPurifier is called. I don't consider this a security issue.
How can I add these attributes without redefining the whitelist?
I searched on Stack Overflow and in the HTMLPurifier documentation, but the only solution seems to be:
$config->set('HTML.Allowed', 'p,b,a[href],i');
But this is not a solution, because I don't want to redefine the whitelist (I trust the default HTMLPurifier configuration, I just want to add an exception).
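One possible direction, sketched here under assumptions, is to extend the default definition with addAttribute(), the same call used in other snippets on this page, instead of setting HTML.Allowed. The element name 'div', the definition ID, and the choice of the 'Length' attribute type are illustrative guesses, since the question does not say which elements lose their width and height.

```php
<?php
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'default-plus-dimensions'); // hypothetical ID
$config->set('HTML.DefinitionRev', 1);
$config->set('Cache.DefinitionImpl', null); // no caching while experimenting

// Fetch the raw definition and add to it; the default whitelist is untouched.
$def = $config->getHTMLDefinition(true);
// 'div' is only an example element; 'Length' is meant to accept pixel or
// percentage values.
$def->addAttribute('div', 'width', 'Length');
$def->addAttribute('div', 'height', 'Length');

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<div width="200" height="100">box</div>');
```

Because HTML.Allowed is never set, everything the default configuration already permits should stay permitted; only the listed attributes are added.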
Source: (StackOverflow)
Is it possible to have htmlpurifier use the html5 doctype?
The documentation here states that you can change the doctype and encoding with the following:
<?php
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core', 'Encoding', 'ISO-8859-1'); // replace with your encoding
$config->set('HTML', 'Doctype', 'HTML 4.01 Transitional'); // replace with your doctype
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
?>
but the install instructions here state that the supported doctypes are:
Other supported doctypes include:

* HTML 4.01 Strict
* HTML 4.01 Transitional
* XHTML 1.0 Strict
* XHTML 1.0 Transitional
* XHTML 1.1
Is it possible to do the following to allow html5 doctype?
<?php
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core', 'Encoding', 'UTF-8'); // replace with your encoding
$config->set('HTML', 'Doctype', 'html5'); // replace with your doctype
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
?>
Or is there another way?
Source: (StackOverflow)
I'm using HTMLPurifier and even though I have:
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
it removes all 'target' attributes from the links.
Any idea why is it doing it?
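A commonly cited cause is that HTML Purifier only keeps target values listed in its Attr.AllowedFrameTargets directive, which is empty by default, so the doctype alone is not enough. A minimal sketch, assuming HTML Purifier 4.x and the library path used elsewhere on this page:

```php
<?php
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
// Whitelist the frame targets that may survive purification.
$config->set('Attr.AllowedFrameTargets', array('_blank'));

$purifier = new HTMLPurifier($config);
// Sample input, made up for this sketch.
echo $purifier->purify('<a href="http://example.com/" target="_blank">link</a>');
```

Listing only the targets actually needed (here just _blank) keeps the whitelist narrow.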
Source: (StackOverflow)
How can I use HTMLPurifier to filter XSS but also allow Vimeo and YouTube iframe videos?
require_once 'htmlpurifier/library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Trusted', true);
$config->set('Filter.YouTube', true);
$config->set('HTML.DefinitionID', '1');
$config->set('HTML.SafeObject', true);
$config->set('Output.FlashCompat', true);
$config->set('HTML.FlashAllowFullScreen', true);
$purifier = new HTMLPurifier($config);
$temp = $purifier->purify($temp);
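For iframe embeds specifically, a hedged sketch using the HTML.SafeIframe and URI.SafeIframeRegexp directives (available, as far as I know, from HTML Purifier 4.4.0) could look like this. The regular expression and the sample input are assumptions and would need adjusting to the embed URLs actually in use.

```php
<?php
require_once 'htmlpurifier/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Allow <iframe>, but only when its src matches the regexp below.
$config->set('HTML.SafeIframe', true);
$config->set(
    'URI.SafeIframeRegexp',
    '%^(https?:)?//(www\.youtube(?:-nocookie)?\.com/embed/|player\.vimeo\.com/video/)%'
);

$purifier = new HTMLPurifier($config);

// Sample input, made up for this sketch.
$temp = '<iframe src="https://player.vimeo.com/video/12345" width="640" height="360"></iframe>'
      . '<script>alert(1)</script>';
$temp = $purifier->purify($temp);
echo $temp;
```

This variant leaves HTML.Trusted off, so script tags are still filtered while matching iframes are kept.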
Source: (StackOverflow)
I'm researching PHP security best practices and specifically the HTML Purifier library.
I like the idea of using a third-party library to help strengthen the security of my sites, but I'm confused about a few things...
First, a general question... What does HTML Purifier do that practicing secure PHP programming can't?
If I'm using HTML Purifier, does that mean I get to skip common security measures like using PHP functions to filter input and escape output?
One of the response comments for this question seems to suggest that HTML Purifier is only needed for elements that allow HTML tags, such as WYSIWYG editors. Is this correct?
Has anyone noticed a performance lag from using HTML Purifier? This article makes it seem like performance impact is worth considering.
Are there any up-to-date tutorials on integrating HTML Purifier with a non-framework PHP application? Everything I've found is either old or framework-specific.
Just to confirm that I've done my homework before asking this...
This question is essentially the same as mine, but the lone response seems to just list another best practice that the asker forgot to mention
This 'bountiful' question is a terrific resource about HTML Purifier and HTML5, but assumes foundational knowledge
This comparison page on HTML Purifier's site is more of a comparison to other filters
Source: (StackOverflow)
Is there a simple approach to add an HTML5 ruleset for HTMLPurifier?
HP can be configured to recognize new tags with:
// setup configurable HP instance
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'html5 draft');
$config->set('HTML.DefinitionRev', 1);
$config->set('Cache.DefinitionImpl', null); // no caching
$def = $config->getHTMLDefinition(true);
// add a new tag
$form = $def->addElement(
    'article', // name
    'Block', // content set
    'Flow', // allowed children
    'Common', // attribute collection
    array( // attributes
    )
);
// add a new attribute
$def->addAttribute('a', 'contextmenu', "ID");
However, this is clearly a bit of work, since there are a lot of new HTML5 tags and attributes that have to be registered. New global attributes should also be combinable with existing HTML 4 tags (it's difficult to judge from the docs how to augment core rules). So, is there a more useful config format/array structure to feed new and updated tag+attribute+context configuration (inline/block/empty/flow/..) into HTMLPurifier?
# mostly confused about how to extend existing tags:
$def->addAttribute('input', 'type', "...|...|...");
# or how to allow data-* attributes (if I actually wanted that):
$def->addAttribute("data-*", ...
And of course not all new HTML5 tags are fit for unrestricted allowance. HTMLPurifier is all about content filtering; defining value constraints is where it's at. <canvas>, for example, might not be that big of a deal when it appears in user content, because it's useless at best without JavaScript (which HP already filters out). But other tags and attributes might be undesirable, so a flexible configuration structure is imperative for enabling/disabling tags and their associated attributes.
(Guess I should update some research...). But there's still no practical compendium/specification (no, XML DTDs aren't) that suits a HP configuration.
(Uh, and HTML5 is no longer a draft.)
Source: (StackOverflow)
My HTML Purifier settings currently allow only these tags:
$configuration->set('HTML.Allowed', 'p,ul,ol,li');
I want to allow indentation of lists and my editor uses this html
<ul style="margin-left: 40px;">
How should I change my HTMLPurifier allowed tags? I thought of adding style, but I think it would be better to specify exactly which style is allowed, which in this case would be margin-left. What is the right way to change HTML.Allowed for this case?
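One way this might be expressed, as a sketch rather than a confirmed answer, is to allow the style attribute on the list elements in HTML.Allowed and then restrict which CSS properties survive with the CSS.AllowedProperties directive. The library path and sample input are assumptions.

```php
<?php
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';

$configuration = HTMLPurifier_Config::createDefault();
// Permit the style attribute, but only on the list elements.
$configuration->set('HTML.Allowed', 'p,ul[style],ol[style],li');
// Of all CSS properties, keep margin-left only.
$configuration->set('CSS.AllowedProperties', array('margin-left'));

$purifier = new HTMLPurifier($configuration);
// Sample input, made up for this sketch.
echo $purifier->purify('<ul style="margin-left: 40px; color: red;"><li>item</li></ul>');
```

With this split, HTML.Allowed decides where style may appear and CSS.AllowedProperties decides what it may contain.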
Source: (StackOverflow)
I am using HTML Purifier to protect my application from XSS attacks. Currently I am purifying content from WYSIWYG editors because that is the only place where users are allowed to use XHTML markup.
My question is, should I use HTML Purifier also on username and password in a login authentication system (or on input fields of sign up page such as email, name, address etc)? Is there a chance of XSS attack there?
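For plain-text fields such as a username or email address, the usual alternative to purifying is to escape the value when it is printed into HTML. A minimal sketch, where the helper name escapeHtml() and the sample value are made up for illustration:

```php
<?php
// Hypothetical helper: escapes a plain-text value for output inside HTML.
function escapeHtml(string $value): string
{
    // ENT_QUOTES also escapes single quotes; ENT_SUBSTITUTE replaces
    // invalid byte sequences instead of returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Usage: the markup in the value is shown as text, not interpreted.
$username = '<script>alert(1)</script>';
echo '<p>Hello, ' . escapeHtml($username) . '</p>';
```

Passwords are normally hashed and never echoed back to the page, so neither purifying nor escaping would usually apply to them.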
Source: (StackOverflow)