Code obfuscation in Java

You carefully reviewed your Java source code for things you don’t want to happen?
Well, look again, and this time take a close look at your comments…

What do you think the following example will produce?

public class OMG {
  public static void main(String[] args) throws Exception {
     /*
       \u006a\u0075\u006e\u006b\u0079\u002a\u002f
       \u0053\u0079\u0073\u0074\u0065\u006d\u002e
       \u006f\u0075\u0074\u002e\u002f\u002f\u0078
       \u0070\u0072\u0069\u006e\u0074\u006c\u006e
       \u0028\u0022\u0048\u006f\u0077\u003f\u0022
       \u0029\u003b\u002f\u002a\u0020\u0062\u0079
       \u0040\u006d\u0069\u0068\u0069\u0034\u0032
     */
  }
}

It’s a main method with nothing but a comment in it… seems strange but reasonably harmless, doesn’t it?
Well, it isn’t that harmless…

Compile and run it, and the output is the following:

How?

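As part of compilation, before the source is even split into tokens, the Java compiler translates ASCII Unicode escapes (a backslash, one or more u characters, then four hex digits) into the real Unicode characters. This happens everywhere in the file, not only in comments, and before comments are recognized, so the translated characters may also close a comment block. The upper code block thus gets converted into this: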

public class OMG {
  public static void main(String[] args) throws Exception {
     /*
       junky*/
       System.
       out.//x
       println
       ("How?"
       );/* by
       @mihi42
     */
  }
}
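
If you want to check what the compiler sees without decoding the hex by hand, a few lines of Java will do. The following is only a rough sketch of the translation step, not what javac actually runs: the class name UnicodeEscapeDecoder and its decode method are made up for this post, malformed escapes are left untouched instead of being reported as errors, and the handling of backslash runs around decoded escapes is simplified.

```java
public class UnicodeEscapeDecoder {

    /**
     * Replaces backslash-u escapes with the characters they denote, roughly
     * like the compiler does before tokenizing. Hypothetical helper for
     * illustration only; it is not part of any javac API.
     */
    public static String decode(String source) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int backslashRun = 0; // contiguous backslashes directly before position i
        while (i < source.length()) {
            char c = source.charAt(i);
            // Only a backslash preceded by an even number of backslashes may start an escape.
            if (c == '\\' && backslashRun % 2 == 0) {
                int j = i + 1;
                // One or more 'u' characters are allowed after the backslash.
                while (j < source.length() && source.charAt(j) == 'u') {
                    j++;
                }
                if (j > i + 1 && j + 4 <= source.length() && isHex(source, j)) {
                    out.append((char) Integer.parseInt(source.substring(j, j + 4), 16));
                    i = j + 4;
                    backslashRun = 0; // simplification: a decoded character ends the run
                    continue;
                }
            }
            backslashRun = (c == '\\') ? backslashRun + 1 : 0;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    /** True if the four characters starting at index 'start' are ASCII hex digits. */
    private static boolean isHex(String s, int start) {
        for (int k = start; k < start + 4; k++) {
            char ch = s.charAt(k);
            boolean hex = (ch >= '0' && ch <= '9')
                    || (ch >= 'a' && ch <= 'f')
                    || (ch >= 'A' && ch <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two lines taken from the comment in the example above. The doubled
        // backslashes keep the compiler from translating them in this file.
        String[] commentLines = {
            "\\u006a\\u0075\\u006e\\u006b\\u0079\\u002a\\u002f",
            "\\u0028\\u0022\\u0048\\u006f\\u0077\\u003f\\u0022"
        };
        for (String line : commentLines) {
            System.out.println(decode(line));
        }
    }
}
```

Feeding it the rest of the comment lines gives you the converted listing above. The same method could also be turned into a crude review check that flags any source file whose decoded text differs from its raw text.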

According to Joshua J. Drake’s Black Hat article, someone called Michael Schierl reported this first. But as Joshua does not provide a reference, and I could not find anything besides his article via Google, the credit has to stand unlinked for now.

Edit: At least I found his Twitter account… 🙂

2 thoughts on “Code obfuscation in Java”

  1. I am not sure if I “reported that first”. Several Java Puzzlers (http://www.javapuzzlers.com/) are about weird effects of Unicode escape decoding.

    But it was (probably) me who created some exposure to this little “gem” by my tweet a few years ago: https://twitter.com/mihi42/status/72734748329521152

    By the way, Java also treats Unicode direction marks (LTR override, RTL override) as normal whitespace; writing part of a line right-to-left is also quite a good obfuscation technique (since most editors/IDEs will obey them nowadays), until you notice it and strip out all the override characters.
