Friday, September 30, 2011

XSLT, entities, Java, Xalan...

Java
The Apache XML Project / Xalan-J
As already mentioned, I have been using XSLT and Xalan-J for quite a long while. It was recently that I was dealing with an HTML-related transform that used a lot of special characters.

Previously I blindly used to assume that named character references are a part of HTML, not defined for generic XML, so not available in XSLT files too. No big problem if numeric character references are available. But that time I had a good incentive to give it a better thought . After all, the very name of XML states extensibility. So what prevents us from getting named character references being defined for XSLT files too?

In fact, nothing prevents. An absolutely legal way to get them defined is to extend the document type declaration for XSLT, like this:
<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE transform [
  <!ENTITY % w3centities-f PUBLIC "-//W3C//ENTITIES Combined Set//EN//XML"
      "http://www.w3.org/2003/entities/2007/w3centities-f.ent">
  %w3centities-f;
]>
<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
...

With this small modification the miracle happened, and named character references stopped complaining . Everything was fine, but, as one of the Murphy's laws states, "If everything seems to be going well, you have obviously overlooked something" .

This time the pitfall was in the reference to the entities definition,
"http://www.w3.org/2003/entities/2007/w3centities-f.ent". If we tell the XSLT engine to use some external resource, it naturally has to go for it . In case we do not take necessary care, the only place to go is the Internet. Everything is going to work, but maybe slower than we might expect, and crashing if Internet connection is not available.

The standard approach to deal with this is using a catalog-based entity/URL resolver. This time unexpectedly it did not help. Nothing was wrong with the resolver and the catalog, and still the XSLT engine persistently went fetching the entities from the Internet.

The cause of the issue was found in the Xalan-J sources. Perhaps nobody before considered seriously using external entities in an XSLT file, thus no traces of using an entity resolver for an XSLT file in the code. It may be worth mentioning that those of us who might be not familiar with Xalan-J as such and just plainly use JAXP, are likely to tread on the same issue, as Xalan-J is a part of the reference JAXP implementation that is endorsed into standard JDK/JRE implementations like Oracle J2SE and OpenJDK.

The Fix


The last Xalan-J release 2.7.1 was taken as the basis for amendments. For those who need just things working, a patched Xalan-J binary build is available as xalan-2.7.1-usn-20110928.jar. Those interested in what is under the hood are welcome to have a look at the modifications.

The fix was submitted to the Xalan-J project as an attachment to issue XALANJ-2544.

Usage – HOW-TO


In case you feel like using some named character references in your XSLT file:
  1. Add appropriate entities to your XSLT transform/stylesheet, either one by one, like this:
    <?xml version="1.0" standalone="yes" ?>
    <!DOCTYPE transform [
      <!ENTITY nbsp  "&#x000A0;">
      <!ENTITY ndash "&#x02013;">
    ]>
    <xsl:transform version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    ...
    
    or all at once:
    ...
    <!DOCTYPE transform [
      <!ENTITY % w3centities-f PUBLIC "-//W3C//ENTITIES Combined Set//EN//XML"
          "http://www.w3.org/2003/entities/2007/w3centities-f.ent">
      %w3centities-f;
    ]>
    ...
    
  2. If you could limit yourself to a definite set of entities, or your XSLT engine is not Xalan-based, have fun
  3. If not, go ahead and download the patched Xalan-J binary as xalan-2.7.1-usn-20110928.jar.
  4. Arrange your JVM to use the patched .jar, keeping in mind that Xalan-J is an endorsed technology. One of possible approaches is to use -Djava.endorsed.dirs=path_to_the_folder_containing_the_new_jar JVM launcher option. See more on the Oracle Java Endorsed Standards Override Mechanism page.
  5. Make sure you use some XML Entity Resolver. You may wish to have a look at Apache XML Commons Resolver as a good candidate.
  6. Get yourself a catalog for your resolver, if not yet. If starting from scratch, that may be a plain text file in OASIS TR9401 format:
    PUBLIC "-//W3C//ENTITIES Combined Set//EN//XML"
           "file:///path_to_your_local_copy/w3centities-f.ent"
    SYSTEM "http://www.w3.org/2003/entities/2007/w3centities-f.ent"
           "file:///path_to_your_local_copy/w3centities-f.ent"
    
  7. Get your catalog known to the resolver, say, via the xml.catalog.files system property supplied to the JVM launcher: -Dxml.catalog.files=your_catalog_file_name_with_path.
  8. Enjoy .

Sunday, September 4, 2011

Java, Xalan, Unicode command line arguments...

Java
I have always believed on bare word that Java is Unicode-friendly by design. And I have been using Xalan-J happily for quite a long while already.

It was just recently that I ran into Unicode-related problem with Java, and that was occasionally related to Xalan-J. Strictly speaking, Xalan-J was not to blame. But that was Xalan-J Command-Line Utility who refused to do its job when handling file names with characters missing from my local system code page.

It did not take a long search to discover that the problem is known and platform-specific, at least specific for Windows. The bug is known at least since the early days when the yet-to-come Java 6 was called Mustang ("Support for Unicode in Mustang ?" , 2005). It was discussed later on JavaRanch - Unicode: cmd parameters (main args); exec parameters; filenames and on OpenJDK 'core-libs-dev' mailing list - RFE 4519026: (process) Process should support Unicode on Win NT, request for review and Unicode support in Java JRE on Windows.

The origin of the bug looks like dating back to the days when it was necessary to have Java running on both Unicode and non-Unicode (e.g. Windows 9x) platforms, so using non-Unicode system calls in Java launcher code was somehow justified. A lot of water has flown under the bridge since then, and Windows 9x is no more supported by new JVM versions... And still my quick experiment revealed that the new Java 7 comes with the same bug in its place...

The right approach would be to get the bug fixed in the Java launcher. But unfortunately it is already long since I wrote something in C/C++... So the quick-and-dirty solution happened to be a Java wrapper.

Wrapper Implementation


The first idea was to get a wrapper just for Xalan-J. But then I remembered of the other command-line utilities written in Java and came to an idea of a general-purpose command line wrapper.

It was necessary to provide possibility of handling Unicode parameters at command shell level (e.g. file names, etc). And then supply all the command line arguments to Java, bypassing the command line . The solution was found as passing the command line arguments via a temporary file, one argument per line, to be written by command shell in some flavor of Unicode, to be further read in by the Java wrapper. And there was no rich variety of convenient Unicode flavors under Windows, as the cmd tool only allows easy output in UTF-16LE using the /u switch.

The resulting wrapper is available for download and free use under Apache 2.0 license, as both source and a compiled .jar.

Other command-line tools coming to my mind besides Xalan-J, that might benefit from using with this wrapper, include: Batik, FOP, CSSToXSLFO, ...

Usage – HOW-TO


  1. Download the .jar file and save it to a location of your choice.
  2. Prepare / modify the script you use to launch your command-line tool. Say, for Xalan-J your script might contain something like:
    @cmd /u /c echo -IN >%TMP%\xalan-args.tmp
    @cmd /u /c echo %~f1 >>%TMP%\xalan-args.tmp
    @cmd /u /c echo -XSL >>%TMP%\xalan-args.tmp
    @cmd /u /c echo %~f2 >>%TMP%\xalan-args.tmp
    @cmd /u /c echo -OUT >>%TMP%\xalan-args.tmp
    @cmd /u /c echo %~f3 >>%TMP%\xalan-args.tmp
    @if not %4. == .  cmd /u /c echo %~4 >>%TMP%\xalan-args.tmp
    @if not %5. == .  cmd /u /c echo %~5 >>%TMP%\xalan-args.tmp
    ...
    java [...] usn.unicode.UnicodeLauncher org.apache.xalan.xslt.Process %TMP%\xalan-args.tmp
    
    Ensure that usn-unicode-20110903.jar is on your class path. Note that all right angle redirection characters are doubled, except the first line... Refer to the "Using batch parameters" Microsoft page for syntax like %~f1. Note also %~4 instead of %4 to remove surrounding quotes if any.
  3. Have fun

Friday, August 19, 2011

Tomcat Redirect Valve

Apache Tomcat

⇓ Probably the most important part – "All we need is already here"

added on 2011-11-27

⇓ Another important news – ... now implemented as Tomcat 8 component

added on 2013-03-17

⇓ And the final news – ... now available as standard Tomcat feature

added on 2014-11-16



Background


Every mature web resource eventually realizes that it needs URL redirection.

Leaving aside manual and client-side redirection techniques, and concentrating on server-side redirection, one can see that in the Apache HTTP Server world mod_rewrite is the canonical and powerful tool to do the job.

In the Java world there is also a recognized leader for this task. That is UrlRewriteFilter, also hosted at Google Code. BTW, the front page of that project contains a good list of reasons for one to be interested in URL rewriting (redirection).

There is nothing wrong about UrlRewriteFilter. It is feature rich, and mature, and well supported, and actively developed. Nothing wrong except it is a Filter .

On one hand, being a Filter is not bad. In any case Filter is a standard concept defined by the Servlet specification, so portability across different web application containers may be considered as guaranteed.

On the other hand, having URL rewriting as part of a web application may be not always sufficient. It is not uncommon for a servlet container to host more than one web application, and thus separation of responsibilities may become a concern. Web application belongs to the developer's responsibility domain. Meanwhile the container belongs to the administrator's responsibility domain. And an administrator may have his own ideas on the requirements for redirection. So an administrator may need a separate facility for setting up server-side redirects securely, without risk of being blamed for ruining the developer's masterpiece... And speaking seriously, an administrator may occasionally wish to establish server-wide redirection policies that would be common for all the applications being hosted.

Unfortunately there is no common standard for Java web application containers that we could base upon for this task. Every container seems to have its own specifics. Narrowing our needs to Tomcat as one of the widely used containers we may come to an idea that a Valve may be the right tool. BTW, there is an noteworthy consideration of Valves vs Filters in Nicolas Fränkel's blog.

The idea of a Redirect Valve for Tomcat has been in the air for quite a while already. Curious to note, it was even on the Tomcat 4 TODO list. And deep googling for a "RedirectValve" reveals several in-house implementations of such a Valve being mentioned across various exception logs here and there. Meanwhile it looks like nobody cared to make such a tool publicly available so far...

Implementation


The first draft implementation of a Redirect Valve was suggested in 2002 by Jens Andersen. Unfortunately the implementation was incomplete and thus was not adopted for inclusion into the Tomcat project.

Bringing the existing draft to working state was not a hard job. So the final result:
  • allows configuration of a Valve instance in terms of Java regular expressions and substitutions using capturing groups;
  • for every incoming HTTP request:
    • makes an effort to reconstruct the original request URL from sparse pieces available at the valve level;
    • saves the URL as a request attribute to be used by subsequent RedirectValve instances that may follow in the pipeline;
    • matches the URL against a pre-configured regular expression;
    • in case the match succeeds, sets the response status code as SC_MOVED_PERMANENTLY, sets the response "Location" header as prescribed by a pre-configured expression using capturing groups and gets the request out of the processing pipeline;
Well, nothing fancy at all, but looks like doing the job for standard cases...

This RedirectValve implementation was submitted to the Tomcat project again. The result was the same as for the submission of 2002, the reasons now being that in the times of UrlRewriteFilter there is no need for other tools, and those willing to have a redirection Valve are free to wrap UrlRewriteFilter into a Valve by themselves...

Admitting this may be true, I still dare to offer this Valve to the public. Please feel free to use and modify. The license is Apache 2.0. Both the source and a compiled .jar are available. Still, as the code is not likely to become a part of the Tomcat project, the package name prefix is changed from org.apache to usn.apache .

Usage – HOW-TO


  1. Download the .jar file .
  2. Copy it to the lib folder of your Tomcat installation or to any other appropriate location of your choice. You may also wish to have a look at Tomcat classloader overview at MuleSoft.
  3. Register RedirectValve as a Valve with your Tomcat by adding an appropriate section to the contents of relevant <Host> element in your conf/server.xml file. Say, a task of redirecting/mapping a Tomcat web application to a virtual directory at your front-end server on the same machine might be configured like this:
    <Valve
        className="usn.apache.catalina.valves.RedirectValve"
        from="^http://([^:]+):8080/my-web-app/(.*)$"
        to="http://{1}/my-virtual-dir-at-the-front-end-server/{2}"
        />
    
    Feel free to add as many RedirectValve elements in a series as you wish.
  4. Restart Tomcat.
  5. Enjoy .



"All we need is already here"

added on 2011-08-24

Being curious about when Google gets my blog indexed and listed , I have occasionally found somewhere in the third dozen of search result pages that a "Redirect Valve" for Tomcat already exists as RewriteValve, a component of JBoss Web Server. Looks like the code does not have any dependencies on JBoss internals, and the license is LGPLv2.1 or later. The functionality looks really rich and closely resembles mod_rewrite.

I wish I knew it before starting off with my simplistic implementation... Well, one more great tool to enjoy!



... now implemented as Tomcat 8 component

added on 2013-03-17

It recently appeared that a new rewrite valve implementation is being developed as a component of Tomcat 8. The docs say that the capabilities and configuration facilities resemble Apache HTTP Server mode_rewrite closely. This is good news for all of us IMHO...



... now available as standard Tomcat feature

added on 2014-11-16

The latest and probably the final news on this topic is that as of June 2014 Tomcat 8 has reached release status, with rewrite valve as a standard component. Nothing more left to be desired...