Tuesday, 15 April 2014

How many Base64 encoders are in the JDK/JRE

Base64 variants

Base64 encoding is an algorithm that converts binary data into ASCII text. The resulting string consists of the characters A-Z, a-z, 0-9 and two extras, '+' (plus) and '/' (slash), plus the padding character '=' (equals). The conversion is not always done the same way; there are a few variants of it.

- Simple (basic) encoding creates a single longlonglonglonglonglonglonglonglonglonglonglooooong= Base64-encoded line.

- Fixed line length encoding, sometimes also called MIME Base64 encoding, chunked encoding or simply line folding. Instead of a single long line, it produces multiple lines, usually 76 characters long. It is quite important, because it is mandatory in some use cases (binary attachments), while harmful in others (BASIC authentication header).

- URL (safe) encoding produces a string that can be used as a parameter value in a URL. Because '+' and '/' have special meaning in URLs, they are replaced by '-' and '_', while the '=' padding character is usually removed.
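
The three variants above can be sketched with the Java 8 java.util.Base64 class. This is only an illustration: the class name Base64VariantsSketch, its method names and the sample credentials are made up for this example, and the header line is just a plain string, not a real HTTP client call.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64VariantsSketch {

    // Basic variant: a single unbroken line using the standard alphabet ('+' and '/').
    static String basic(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // MIME variant: same alphabet, folded into lines of at most 76 characters joined by CR LF.
    static String mime(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    // URL-safe variant: '-' replaces '+' and '_' replaces '/'; padding is kept unless disabled.
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    // A BASIC authentication header needs the unfolded basic variant (sample credentials only).
    static String basicAuthHeader(String user, String password) {
        byte[] credentials = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Authorization: Basic " + basic(credentials);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xfb, (byte) 0xff, (byte) 0xfe};
        System.out.println(basic(data));   // +//+
        System.out.println(urlSafe(data)); // -__-
        // 100 bytes encode to 136 characters, so the MIME variant has to fold the line
        System.out.println(mime(new byte[100]).contains("\r\n")); // true
        System.out.println(basicAuthHeader("Aladdin", "open sesame"));
    }
}
```

The three sample bytes were picked because they map only to the alphabet positions 62 and 63, which is exactly where the basic and URL-safe alphabets differ.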

Now back to the initial question...

How many Base64 encoders/decoders are present in the Oracle (Sun) JDK?

I found six of them:

sun.misc.BASE64Encoder Since Java 1.0? Well, we all know that we should not touch anything from the sun.* or com.sun.* packages, so we don't.

javax.xml.bind.DatatypeConverter Since Java 1.6 - This one actually works, but it supports only basic encoding; no MIME or URL-safe encoding.

java.util.Base64 Since Java 1.8 - Finally a generally usable Base64 encoder/decoder, supporting basic, MIME and URL-safe encoding.
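
As a rough sketch of what "generally usable" means here, each java.util.Base64 encoder has a matching decoder, and the pairs are not freely interchangeable. The class Base64RoundTripSketch and its helper methods below are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64RoundTripSketch {

    // Encodes and decodes with a matching encoder/decoder pair and reports whether the bytes survive.
    static boolean roundTrips(Base64.Encoder encoder, Base64.Decoder decoder, byte[] data) {
        return Arrays.equals(data, decoder.decode(encoder.encode(data)));
    }

    // Reports whether the basic decoder refuses the given text.
    static boolean basicDecoderRejects(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // 200 bytes covering many byte values, long enough to force MIME line folding
        byte[] data = new byte[200];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }

        System.out.println(roundTrips(Base64.getEncoder(), Base64.getDecoder(), data));         // true
        System.out.println(roundTrips(Base64.getMimeEncoder(), Base64.getMimeDecoder(), data)); // true
        System.out.println(roundTrips(Base64.getUrlEncoder(), Base64.getUrlDecoder(), data));   // true

        // The basic decoder does not tolerate the CR LF inserted by the MIME encoder
        String folded = Base64.getMimeEncoder().encodeToString(data);
        System.out.println(basicDecoderRejects(folded)); // true
    }
}
```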

And finally, some curiosities illustrating how even Sun/Oracle JDK/JRE contributors were missing a Base64 encoder, so they created their own.

java.util.prefs.Base64 Since Java 1.4, but it has default (package-private) visibility and is therefore not usable

com.sun.net.httpserver.Base64 Since Java 1.6, but it has default (package-private) visibility and is therefore not usable

com.sun.org.apache.xml.internal.security.utils.Base64 - A similar story to sun.misc.BASE64Encoder; it also internally uses XMLUtils.ignoreLineBreaks() to perform line folding...

Let's see some encoding results

Both commons-codec 1.6+ and Java 8's java.util.Base64 can produce and consume any of the Base64 variants mentioned above, but beware of their quite different encoding results. I think a lot of headaches come from that.

In the following test, commons-codec 1.9 and Java 8u5 are used.

Mime (chunked) encoding
import org.apache.commons.codec.binary.Base64;

String string = "This string encoded will be longer that 76 characters and cause MIME base64 line folding";
 
byte[] encodeBase64Chunked = Base64.encodeBase64Chunked(string.getBytes());
System.out.println("commons-codec Base64.encodeBase64Chunked\n" + new String(encodeBase64Chunked));

String encodeMimeToString = java.util.Base64.getMimeEncoder().encodeToString(string.getBytes());
System.out.println("java.util.Base64.getMimeEncoder().encodeToString\n" + encodeMimeToString);
prints
commons-codec Base64.encodeBase64Chunked
VGhpcyBzdHJpbmcgZW5jb2RlZCB3aWxsIGJlIGxvbmdlciB0aGF0IDc2IGNoYXJhY3RlcnMgYW5k
IGNhdXNlIE1JTUUgYmFzZTY0IGxpbmUgZm9sZGluZw==

java.util.Base64.getMimeEncoder().encodeToString
VGhpcyBzdHJpbmcgZW5jb2RlZCB3aWxsIGJlIGxvbmdlciB0aGF0IDc2IGNoYXJhY3RlcnMgYW5k
IGNhdXNlIE1JTUUgYmFzZTY0IGxpbmUgZm9sZGluZw==

Java 8's MIME encoder ends its output with the '==' padding and does not add a final line separator (CR LF) after it, whereas commons-codec terminates the last line with CR LF as well!
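
The missing final separator can be checked with java.util.Base64 alone. The sketch below is only one way to do it: the class MimeTrailingSeparatorSketch and the helper mimeEncodeTerminated are hypothetical, and appending CR LF by hand is an assumption about what a consumer expecting commons-codec style output might need.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class MimeTrailingSeparatorSketch {

    static final String CRLF = "\r\n";

    // java.util.Base64 puts CR LF between lines only, never after the last one.
    static String mimeEncode(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    // Hypothetical helper: also terminate the last line with CR LF (empty input stays empty).
    static String mimeEncodeTerminated(byte[] data) {
        String encoded = mimeEncode(data);
        return encoded.isEmpty() ? encoded : encoded + CRLF;
    }

    public static void main(String[] args) {
        String text = "This string encoded will be longer that 76 characters and cause MIME base64 line folding";
        byte[] data = text.getBytes(StandardCharsets.UTF_8);

        String encoded = mimeEncode(data);
        System.out.println(encoded.indexOf(CRLF));  // 76: the first line is folded after 76 characters
        System.out.println(encoded.endsWith(CRLF)); // false: no separator after the padding

        System.out.println(mimeEncodeTerminated(data).endsWith("==" + CRLF)); // true
    }
}
```

Both forms decode to the same bytes with Base64.getMimeDecoder(), since that decoder skips line separators.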

URL (safe) encoding
String string = "ůůůůů";
// Encode explicitly as UTF-8 so the result does not depend on the platform default charset
byte[] bytes = string.getBytes(java.nio.charset.StandardCharsets.UTF_8);

String encodeUrlToString = java.util.Base64.getUrlEncoder().encodeToString(bytes);
System.out.println("java.util.Base64.getUrlEncoder().encodeToString\n" + encodeUrlToString);

String encodeBase64URLSafeString = Base64.encodeBase64URLSafeString(bytes);
System.out.println("commons-codec Base64.encodeBase64URLSafeString\n" + encodeBase64URLSafeString);
prints
java.util.Base64.getUrlEncoder().encodeToString
xa_Fr8Wvxa_Frw==
commons-codec Base64.encodeBase64URLSafeString
xa_Fr8Wvxa_Frw
Java 8's URL encoder leaves the '=' padding at the end of the result, so it cannot be used as a URL parameter value without further escaping!

UPDATE: This was reported a while ago, and it turned out that any Encoder can be switched to non-padding mode using the withoutPadding() method.

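String string = "ůůůůů";
// java.util.Base64 is fully qualified because plain Base64 refers to the commons-codec import above
byte[] bytes = string.getBytes(java.nio.charset.StandardCharsets.UTF_8);
String encodeUrlToString = java.util.Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
System.out.println("java.util.Base64.getUrlEncoder().withoutPadding().encodeToString\n" + encodeUrlToString);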
prints
java.util.Base64.getUrlEncoder().withoutPadding().encodeToString
xa_Fr8Wvxa_Frw
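
A small self-contained sketch of the same idea, assuming the text is encoded as UTF-8 (ten bytes for the five characters, so two padding characters in the padded form). The class UrlSafeNoPaddingSketch and its method names are made up for illustration; the string is written with Unicode escapes so the example does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class UrlSafeNoPaddingSketch {

    // URL-safe alphabet with the default '=' padding.
    static String encodePadded(String text) {
        return Base64.getUrlEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // URL-safe alphabet with padding switched off.
    static String encodeUnpadded(String text) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // The decoder does not require padding, so unpadded text decodes back to the original.
    static String decode(String encoded) {
        return new String(Base64.getUrlDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Five times U+016F, the character used in the example above
        String text = "\u016f\u016f\u016f\u016f\u016f";

        System.out.println(encodePadded(text));   // xa_Fr8Wvxa_Frw==
        System.out.println(encodeUnpadded(text)); // xa_Fr8Wvxa_Frw
        System.out.println(decode(encodeUnpadded(text)).equals(text)); // true
    }
}
```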

Note: In the quite old commons-codec 1.4, chunking was inconsistently turned on by default for the encode() method, resulting in nasty surprises. See Jira ticket.

Happy Base64 encoding
