XSS protection

Why “protection” rather than “prevention”?

Simply because promising prevention wouldn't be realistic. Effective prevention of XSS (short for Cross-site Scripting) is very hard to achieve. That's why you'll find security advisories describing XSS vulnerabilities even for the most popular software products that provide a web-based user interface. This article intends to show you why that's the case.

The essential step to approach XSS “protection”

As with everything, it's comprehension. “Cross-site” refers to the fact that the web server sends data to the client; “Scripting” indicates that such data may be executed by the client (but not by the server itself). But for a web server to send executable code to the client, that code first has to get to the server: either by being programmed as part of the web application the server provides or … by being sent to the server by a client!

This understanding brings two basic questions to mind

  • What kind of data does a (web) client execute that a (web) server does not? It must be the currently most widespread programming language: JavaScript (JS). The potential harm of running untrustworthy JS code is enormous, as such code can read and change anything on the page and act on behalf of the logged-in user.
  • If data has to be stored on the (web) server before it can unleash its potential harm on clients, which data deserves to be trusted and which does not?

Untrustworthy data

Which data can be deemed trustworthy? First, only data that is fully under your own control. Typically this is your program code (provided you've conducted adequate quality checks). Nothing else!

Second, there might be code of other parties used by your own code. You may check the quality of such code as well and, if it proves to be secure, trust it (on a per-release basis).

Third, there is (arbitrary) data generated by the clients (users) that you have no control over. As a consequence, such data is always(!) untrustworthy and logically in scope of XSS protection.

Usages of untrustworthy data

With the data in scope identified, there are currently three ways to use such data in web UIs rendered by a browser, each of them potentially prone to XSS

  • Display the data as text, in text inputs (incl. text areas) or select boxes
  • Use the data as part of generated JS code
  • Use the data as part of generated Cascading Style Sheets (CSS)

To assess the attack surface of your application, ask yourself three questions. Will you need untrustworthy data to be part of generated CSS? Probably not. Will you need untrustworthy data to be part of generated JS? Probably yes. Will you need to display untrustworthy data within your web UI? Definitely!
So we have made our priorities clear. Why is that important? Because each of these cases requires a different approach to XSS protection.

Consequently, this article will focus on the top priority, which is displaying untrustworthy data as HTML.

Display untrustworthy data as HTML text

Depending on the main use case of your application, there are two basic strategies to approach XSS protection

  • If your application only has to process basic text (no potentially malicious special characters such as ', ", <, >, no line feeds, tabs etc.), then the easiest and most effective approach is to delete all unsupported characters from untrustworthy data before persisting it (XSS input sanitization). The result could even be called XSS “prevention”, as there is nothing left that could be exploited
  • If your application has to process untrustworthy data exactly the way it has been entered by users (blogs, descriptions including code snippets, emojis etc.), then things get far more complicated
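
The first strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name strip_unsupported and the allow-list of characters are hypothetical, and a real application would define its own set of supported characters.

```python
import re

# Hypothetical allow-list: ASCII letters, digits, spaces and some punctuation.
# Everything else (quotes, <, >, line feeds, tabs, ...) counts as unsupported.
_UNSUPPORTED = re.compile(r"[^A-Za-z0-9 .,:;!?()_-]")


def strip_unsupported(text: str) -> str:
    """Delete every character that is not on the allow-list."""
    return _UNSUPPORTED.sub("", text)


# Run this before persisting the value.
print(strip_unsupported('"><script>alert("XSS!")</script>'))
# prints: scriptalert(XSS!)script
```

The sketch names what is permitted rather than what is forbidden, so characters nobody thought of in advance are deleted as well.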

This article focuses on the second (and thus more interesting) case. By the way: the software powering this web site is such an application.

No XSS input sanitization

If you must display untrustworthy data exactly the way it has been entered by users, the logical consequence is that you don't want to touch it when persisting it. This implies that your persistence layer may contain potentially (XSS) malicious data. This is not to be confused with SQL injection, protection against which is always mandatory (but that's another story, out of scope of this article). So the persisted data remains perfectly untrustworthy.
Consequently, whenever that data is read back, it needs to be properly XSS sanitized before being output to a web UI.

XSS output sanitization

“Web” UI means that data is output as (X)HTML, so some aspects of HTML are worth knowing with regard to XSS. Not all HTML elements follow the same syntax pattern when it comes to the position of the data to be displayed

  • Text inputs define the output value in a value attribute
  • Text areas and select options enclose the output value between their start and end tags

The good news might be that a browser treats whatever sits inside a quoted value attribute as plain text rather than as markup, which amounts to an implicit mitigation. But attackers are smart enough to break it. As every string requires delimiters, an attacker can enter such a delimiter to intentionally end the string early and follow it with malicious (JS) code.

Break the implicit XSS mitigation on output values

Consider a classic HTML example

<input type="text" value="some output data">

That code displays a text input field holding the value some output data. Typically it is part of a web form that sends the data entered to the server on submit. According to the approach described above, the web server will not XSS sanitize that value before persisting it. If the application allows users to edit their data, the form is loaded and the inputs are pre-populated with the existing values retrieved from the persistence layer. And we rely on the browser to treat them as harmless text…

Now imagine an attacker entering the following value into the text field

"><script>alert("XSS!")</script>

and sending it to the server, which stores it unchanged. When the edit form is opened, this value is loaded, which causes the browser to parse the following code

<input type="text" value=""><script>alert("XSS!")</script>">

Wow! The leading double quote terminates the value of the value attribute, the > that follows closes the input element, and the rest is a proper script element whose JS runs like a charm, popping up the message XSS!. XSS in action…

How to XSS output sanitize

Fortunately, HTML offers means to overcome such “misinterpretations”. Every character with a special meaning in HTML can also be written as an unambiguous literal referred to as an HTML entity. By replacing each such character with its corresponding HTML entity, things get very clear to the browser

<input type="text" value="&quot;&gt;&lt;script&gt;alert(&quot;XSS!&quot;)&lt;/script&gt;">

Nothing left to execute. The same applies to HTML elements that enclose the output value between their tags. Here the attacker first closes the element by entering </textarea><script>alert("XSS!")</script>, so that

<textarea rows="3" cols="25"></textarea><script>alert("XSS!")</script></textarea>

will cause the script to be executed. Once sanitized, the code changes to

<textarea rows="3" cols="25">&lt;/textarea&gt;&lt;script&gt;alert(&quot;XSS!&quot;)&lt;/script&gt;</textarea>
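
To make the difference tangible, here is a minimal Python sketch that renders a text input once naively and once with output sanitization, and a text area with output sanitization. It assumes Python's standard html.escape as the sanitizer; the render_* function names are hypothetical and not taken from any particular application.

```python
from html import escape


def render_input_naive(value: str) -> str:
    # Vulnerable: the stored value is pasted into the markup unchanged.
    return f'<input type="text" value="{value}">'


def render_input(value: str) -> str:
    # escape(..., quote=True) replaces &, <, > and both quote characters
    # with their HTML entities.
    return f'<input type="text" value="{escape(value, quote=True)}">'


def render_textarea(value: str) -> str:
    # Element content is sanitized the same way as attribute values.
    return f'<textarea rows="3" cols="25">{escape(value, quote=True)}</textarea>'


payload = '"><script>alert("XSS!")</script>'
print(render_input_naive(payload))  # the value breaks out of the attribute
print(render_input(payload))        # the value stays inside the attribute
print(render_textarea('</textarea><script>alert("XSS!")</script>'))
```

Note that sanitization happens at output time only; the persisted value itself remains untouched.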



It is advisable to encapsulate sanitizers into dedicated classes or functions so that they serve as a single point of reference, thus avoiding duplicated instances (Don't Repeat Yourself). For the same reason, it is also advisable to encapsulate the code generating HTML elements into a dedicated class, so that the generator methods inherently make use of the sanitizer methods or functions. In the best case, there will be no other option than using the dedicated generators in order to produce output, because if just a single output is not considered for sanitization, it will be vulnerable.
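
A minimal sketch of such an encapsulation, again in Python, might look like the following. The class name HtmlGenerator and its methods are hypothetical, and html.escape is assumed as the sanitizer; the point is only that callers never assemble markup themselves, so they cannot forget to sanitize.

```python
from html import escape
from typing import Iterable, Mapping


class HtmlGenerator:
    """Hypothetical generator class: the only place where markup is built."""

    @staticmethod
    def sanitize(value: object) -> str:
        # Single reference implementation of the output sanitizer.
        return escape(str(value), quote=True)

    @classmethod
    def _attributes(cls, attributes: Mapping[str, object]) -> str:
        # Attribute names are written by the developer (trusted);
        # attribute values may be untrustworthy and are always sanitized.
        return "".join(
            f' {name}="{cls.sanitize(value)}"' for name, value in attributes.items()
        )

    @classmethod
    def text_input(cls, name: str, value: str) -> str:
        attributes = cls._attributes({"type": "text", "name": name, "value": value})
        return f"<input{attributes}>"

    @classmethod
    def textarea(cls, name: str, value: str, rows: int = 3, cols: int = 25) -> str:
        attributes = cls._attributes({"name": name, "rows": rows, "cols": cols})
        return f"<textarea{attributes}>{cls.sanitize(value)}</textarea>"

    @classmethod
    def select(cls, name: str, options: Iterable[str]) -> str:
        attributes = cls._attributes({"name": name})
        items = "".join(f"<option>{cls.sanitize(option)}</option>" for option in options)
        return f"<select{attributes}>{items}</select>"


print(HtmlGenerator.text_input("search", '"><script>alert("XSS!")</script>'))
```

Because every method routes its values through sanitize, a change to the sanitizer takes effect everywhere at once.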


Having implemented the ideas described above, it's not too challenging to verify the basic effectiveness of your XSS protection (ideally by using an offensive security testing tool to automate the checks). The following examples are taken from the Housekeeping application in order to make them easily reproducible.
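
Where no such tool is at hand, the basic check can also be scripted. The following Python sketch is not part of the Housekeeping application: the probe strings, the TagCollector class and the render_search_field stand-in are all hypothetical. It feeds each probe through the rendering code and uses Python's standard html.parser, which is only an approximation of a real browser's parser, to confirm that the output still consists of a single input element whose value equals the probe.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical probes modelled on the attacks discussed in this article.
PROBES = [
    '"><script>alert("XSS!")</script>',
    '</textarea><script>alert("XSS!")</script>',
    "' onmouseover='alert(1)",
]


class TagCollector(HTMLParser):
    """Records every start tag the parser recognizes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def survives(render, probe: str) -> bool:
    """True if the output is one input element whose value equals the probe."""
    collector = TagCollector()
    collector.feed(render(probe))
    collector.close()
    return (
        len(collector.tags) == 1
        and collector.tags[0][0] == "input"
        and collector.tags[0][1].get("value") == probe
    )


def render_search_field(value: str) -> str:
    # Stand-in for the code under test; replace it with your own generator.
    return f'<input type="text" name="search" value="{escape(value, quote=True)}">'


for probe in PROBES:
    assert survives(render_search_field, probe), probe
```

A check like this only covers the probes it is given, so it complements a dedicated testing tool rather than replacing it.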


Elements with a “value” attribute

In all of the following examples no JS popup should appear.

Text input field. As it is a search field, the search should just run normally and re-display the search value entered.

[Figure: XSS value terminator attack. Classic XSS attack on the value terminator string]

And it should find malicious code if a user (attacker) has entered any.

[Figure: XSS script tag attack. Classic XSS attack on the script tag]

Elements enclosing output between tags

Text area. If loading an existing comment for edit, nothing unusual should happen

[Figure: JavaScript in text area. Text area containing JavaScript]

Select option using untrustworthy data

[Figure: Variable select entries might be vulnerable to XSS attacks. XSS attack on a variable select entry]

Simple text output

[Figure: Display JavaScript code as text. JavaScript code in a comment]

[Figure: A comment containing JavaScript]

[Figure: A note containing JavaScript (as well as SQL injection strings and multi-byte characters)]

[Figure: Finding any characters, including potentially executable ones. Search hits on multi-byte as well as script tag characters]

This also demonstrates that even search hits on malicious code can be (securely) highlighted using CSS. This advanced technique is described in the dedicated how-to.
