# Jaded - Java Deserialization for Rust

Java has a much maligned (for good reason) serialization system
built into the standard library. The output is a binary stream
mapping the full hierarchy of objects and the relations between them.

The stream also includes definitions of classes and their hierarchies
(super classes etc). The full specification is defined
[here](https://docs.oracle.com/en/java/javase/17/docs/specs/serialization/protocol.html).
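
As a small illustration of the format, every stream opens with a two byte magic number (`0xACED`) followed by a two byte version (`5`), both defined in the specification linked above. The sketch below uses only the standard library and checks that header. `looks_like_java_stream` is a hypothetical helper written for this example, not part of this crate.

```rust
use std::io::{self, Read};

/// Magic number that opens every Java serialization stream (STREAM_MAGIC).
const STREAM_MAGIC: u16 = 0xACED;
/// Protocol version written straight after the magic number (STREAM_VERSION).
const STREAM_VERSION: u16 = 5;

/// Read the four header bytes and report whether they look like the start
/// of a Java serialization stream. Hypothetical helper, not part of jaded.
fn looks_like_java_stream<R: Read>(mut source: R) -> io::Result<bool> {
    let mut header = [0u8; 4];
    source.read_exact(&mut header)?;
    // Multi-byte values in the stream are written big-endian.
    let magic = u16::from_be_bytes([header[0], header[1]]);
    let version = u16::from_be_bytes([header[2], header[3]]);
    Ok(magic == STREAM_MAGIC && version == STREAM_VERSION)
}

fn main() -> io::Result<()> {
    // Magic, version, then the marker for a new object (0x73).
    let bytes: &[u8] = &[0xAC, 0xED, 0x00, 0x05, 0x73];
    println!("java stream: {}", looks_like_java_stream(bytes)?);
    Ok(())
}
```

A check like this can be a cheap way to reject files that were never written by an `ObjectOutputStream` before handing them to a full parser.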

In any new application there are probably better ways to serialize data
with fewer security risks but there are cases where a legacy application
is writing stuff out and we want to read it in again. If we want to read
it in a separate application it'd be good if we weren't bound to Java.

## Example

### In Java
```java
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
public class Demo implements Serializable {
    private static final long serialVersionUID = 1L;
    private String message;
    private int i;
    public Demo(String message, int count) {
        this.message = message;
        this.i = count;
    }
    public static void main(String[] args) throws Exception {
        Demo d = new Demo("helloWorld", 42);
        try (FileOutputStream fos = new FileOutputStream("demo.obj", false);
                ObjectOutputStream oos = new ObjectOutputStream(fos);) {
            oos.writeObject(d);
        }
    }
}
```

### From Rust
```rust
use std::fs::File;
use jaded::{Parser, Result};

fn main() -> Result<()> {
    let sample = File::open("demo.obj").expect("File missing");
    let mut parser = Parser::new(sample)?;
    println!("Read Object: {:#?}", parser.read()?);
    Ok(())
}
```

### Output from Rust
```
Read Object: Object(
    Object(
        ObjectData {
            class: "Demo",
            fields: {
                "i": Primitive(
                    Int(
                        42,
                    ),
                ),
                "message": JavaString(
                    "helloWorld",
                ),
            },
            annotations: [],
        },
    ),
)
```
## Conversion to Rust types
For most use cases, the raw object representation is not very ergonomic
to work with. For ease of use, types can implement `FromJava`, and can then be
read directly from the stream.

In the majority of cases this implementation can be automatically derived by
enabling the `derive` feature.

```rust
use jaded::FromJava;

#[derive(Debug, FromJava)]
struct Demo {
    message: String,
    i: i32,
}
```
`Demo` objects can then be read directly by the parser:
```rust
use std::fs::File;
use jaded::{Parser, Result};

fn main() -> Result<()> {
    let sample = File::open("demo.obj").expect("File missing");
    let mut parser = Parser::new(sample)?;
    let demo: Demo = parser.read_as()?;
    println!("Read Object: {:#?}", demo);
    Ok(())
}
```
Output from Rust
```
Read Object: Demo {
    message: "helloWorld",
    i: 42,
}
```
### Objects with custom writeObject methods
Classes, including many in the standard library, often customise the way they
are written using a `writeObject` method that complements the built-in
serialization of fields. This data is written as an embedded stream of bytes
and/or objects. These cannot be associated with fields without the original Java
source, so they are included in the `annotations` field of the `ObjectData`
struct (empty in the example above).

As this stream often contains important data from the class, a mechanism is
provided to read useful data from it using an interface similar to the
ObjectInputStream that would be used in the Java class itself.

An example of custom serialization in Java is `ArrayList`. The source for its
`writeObject` method can be seen
[here](https://github.com/openjdk/jdk17u/blob/master/src/java.base/share/classes/java/util/ArrayList.java)
but the gist is that it writes the number of elements it contains, then writes
each element in turn.
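
The "count, then elements" layout can be illustrated without the crate. The following is a minimal sketch using only the standard library. `Item` and `read_counted` are hypothetical names that model a custom stream as a sequence of byte blocks and objects, and they are not this crate's internal representation. The one protocol detail it relies on is that Java writes an `int` as four big-endian bytes.

```rust
/// Hypothetical model of one entry in a custom `writeObject` stream:
/// either a block of raw bytes or a complete object (a string here).
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Block(Vec<u8>),
    Object(String),
}

/// Read a count from the leading block, then that many objects.
fn read_counted(items: &[Item]) -> Result<Vec<String>, String> {
    let mut iter = items.iter();
    let count = match iter.next() {
        // Java's DataOutput writes ints as four big-endian bytes.
        Some(Item::Block(bytes)) if bytes.len() >= 4 => {
            i32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
        }
        other => return Err(format!("expected a size block, found {:?}", other)),
    };
    (0..count)
        .map(|_| match iter.next() {
            Some(Item::Object(s)) => Ok(s.clone()),
            other => Err(format!("expected an object, found {:?}", other)),
        })
        .collect()
}

fn main() {
    let stream = vec![
        Item::Block(vec![0, 0, 0, 2]),
        Item::Object("one".into()),
        Item::Object("two".into()),
    ];
    println!("{:?}", read_counted(&stream));
}
```

Collecting into a `Result` stops at the first entry that is missing or of the wrong kind, so a truncated stream is reported as an error rather than a short list.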

Because the embedded custom stream could contain anything we have to manually
implement the methods to read from it but these can then be used by the
derived implementation of `FromJava`:

In Java
```java
import java.util.List;
import java.util.ArrayList;
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
public class Demo {
    public static void main(String[] args) throws Exception {
        List<String> keys = new ArrayList<>();
        keys.add("one");
        keys.add("two");
        keys.add("three");
        try (FileOutputStream fos = new FileOutputStream("demo.obj", false);
                ObjectOutputStream oos = new ObjectOutputStream(fos);) {
            oos.writeObject(keys);
        }
    }
}
```
In Rust
```rust
use std::fs::File;
use jaded::{Parser, Result, FromJava, AnnotationIter, ConversionResult};

#[derive(Debug, FromJava)]
struct ArrayList<T> {
    // Size is written as a 'normal' field
    size: i32,
    // values are written to the custom stream so need attributes
    #[jaded(extract(read_values))]
    values: Vec<T>,
}

// extraction method must be callable as
//     function(&mut AnnotationIter) -> ConversionResult<S> where S: Into<T>
// Where T is the type of the field being assigned to.
fn read_values<T>(annotations: &mut AnnotationIter) -> ConversionResult<Vec<T>>
where
    T: FromJava
{
    // Read the element count, then read that many objects from the stream
    (0..annotations.read_i32()?)
        .map(|_| annotations.read_object_as())
        .collect()
}


fn main() -> Result<()> {
    let sample = File::open("demo.obj").expect("File missing");
    let mut parser = Parser::new(sample)?;
    let array: ArrayList<String> = parser.read_as()?;
    println!("{:#?}", array);
    Ok(())
}
```
This gives the array list as expected:
```
ArrayList {
    size: 3,
    values: [
        "one",
        "two",
        "three",
    ],
}
```

`FromJava` is implemented for `Option<T>` and `Box<T>` so that recursive structs
can be deserialized and null fields in the serialized class can be handled.
Note that if a field is null the conversion will fail unless that field is
given as `Option<T>`. The example above would have failed if there was a null
string in the serialized list. Changing `values` to be `Vec<Option<T>>`
would allow it to still be read.
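
The reasoning behind this can be shown with a self-contained model. Everything in the sketch below (`JavaValue`, `FromValue`, `convert_all`) is hypothetical and written only for illustration. It is not this crate's API, but it follows the same idea: a conversion for `T` rejects null, while the conversion for `Option<T>` maps null to `None` and delegates everything else.

```rust
/// Hypothetical, cut-down model of a deserialized Java value.
#[derive(Debug)]
enum JavaValue {
    Null,
    Str(String),
}

/// Hypothetical stand-in for a conversion trait such as `FromJava`.
trait FromValue: Sized {
    fn from_value(value: &JavaValue) -> Result<Self, String>;
}

impl FromValue for String {
    fn from_value(value: &JavaValue) -> Result<Self, String> {
        match value {
            JavaValue::Str(s) => Ok(s.clone()),
            // A plain `String` has no way to represent Java's null.
            JavaValue::Null => Err("unexpected null".to_string()),
        }
    }
}

impl<T: FromValue> FromValue for Option<T> {
    fn from_value(value: &JavaValue) -> Result<Self, String> {
        match value {
            JavaValue::Null => Ok(None),
            other => T::from_value(other).map(Some),
        }
    }
}

/// Convert every value, failing on the first one that cannot be converted.
fn convert_all<T: FromValue>(values: &[JavaValue]) -> Result<Vec<T>, String> {
    values.iter().map(T::from_value).collect()
}

fn main() {
    let list = vec![JavaValue::Str("one".into()), JavaValue::Null];
    println!("{:?}", convert_all::<String>(&list));
    println!("{:?}", convert_all::<Option<String>>(&list));
}
```

In this model the first conversion fails on the null entry, while the second succeeds and keeps the null as `None`.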

### Renaming fields

By convention, Java field names use `camelCase` whereas Rust field names use
`snake_case`. By default, the derive macro looks for a Java field with the same
name as the Rust field. So that Rust structs do not need to use camelCase,
a field can be given an attribute naming the Java field to read from.

```rust
#[derive(FromJava)]
struct Demo {
    #[jaded(field = "fooBar")]
    foo_bar: String,
}
```

If all fields are to be renamed, the struct can be given a 'rename' attribute.
This will convert all field names to camelCase before reading them from Java.
Individual fields can still be overridden if required.

```rust
#[derive(FromJava)]
#[jaded(rename)]
struct Demo {
    foo_bar: String,
}
```
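To make the mapping concrete, here is a minimal sketch of a snake_case to camelCase conversion using only the standard library. `to_camel_case` is a hypothetical helper written for this example. The exact rules the derive macro applies (for instance to leading underscores or digits) are not covered here and may differ.

```rust
/// Convert a snake_case name to camelCase. Hypothetical helper showing the
/// kind of mapping a `rename` attribute implies; not the crate's own code.
fn to_camel_case(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    let mut upper_next = false;
    for ch in name.chars() {
        if ch == '_' {
            // Drop the underscore and capitalise whatever follows it.
            upper_next = true;
        } else if upper_next {
            out.extend(ch.to_uppercase());
            upper_next = false;
        } else {
            out.push(ch);
        }
    }
    out
}

fn main() {
    for field in ["foo_bar", "message", "user_id"] {
        println!("{} -> {}", field, to_camel_case(field));
    }
}
```

A mechanical conversion like this turns `user_id` into `userId`, so a Java field spelled `userID` would still need an explicit per-field name.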

### Polymorphism
In Java, a field can be declared as an interface and the concrete implementation
cannot be known until runtime. `Jaded` can go some way towards deserializing
these fields, using the built-in derive macro with an enum.

Each variant of the enum can be assigned a concrete implementation and the
fully qualified class name (FQCN) of the object being read will determine which
variant is returned.

For instance, to read a field declared as a list, you might define an enum as
follows

```rust
#[derive(FromJava)]
enum List<T> {
    #[jaded(class = "java.util.ArrayList")] //  without generics
    ArrayList(
        #[jaded(extract(read_values))] // See above for read method
        Vec<T>
    ),
    #[jaded(class = "java.util.Collections$EmptyList")]
    Empty,
    #[jaded(class = "java.util.Arrays$ArrayList")]
    // result of using Arrays.asList in Java
    Array {
        a: Vec<T>, // Array is written to field called a
    },
}
```
Combined with the `from` field attribute and a `From<List>` implementation, this
enables fields declared as `List` in Java to be read to `Vec<T>` in Rust.

While this helps to support polymorphism, it still requires all potential
implementations to be known up front. For many use cases this should be
adequate.
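
The `From<List>` implementation mentioned above might look like the following sketch. The enum is repeated here without its derive attributes so that the example compiles on its own with only the standard library. The syntax of the `from` field attribute is not shown.

```rust
/// The enum from the example above, without the derive attributes so that
/// the sketch compiles on its own.
#[derive(Debug)]
enum List<T> {
    ArrayList(Vec<T>),
    Empty,
    Array { a: Vec<T> },
}

/// Collapse whichever Java implementation was read into a plain `Vec<T>`.
impl<T> From<List<T>> for Vec<T> {
    fn from(list: List<T>) -> Self {
        match list {
            List::ArrayList(values) => values,
            List::Empty => Vec::new(),
            List::Array { a } => a,
        }
    }
}

fn main() {
    let from_java = List::Array { a: vec!["one", "two"] };
    let values: Vec<&str> = from_java.into();
    println!("{:?}", values);
}
```

Because every variant ends up as the same `Vec<T>`, code using the struct does not need to care which Java list implementation produced the data.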

## Features
### derive
Allow FromJava to be derived automatically

### serde
Add serde serialize/deserialize support for the intermediate types
(`PrimitiveType`, `ObjectData`, `Value` and `Content`). This does not support
using serde annotations for deserialisation to user types - `FromJava` is still
required for that - only that raw data can be written to other formats without
knowing the data type beforehand.

## Limitations
### Java Polymorphism
In Java, a field can be declared as an interface and the concrete implementation
can be anything. This means that in Rust we can't reliably convert read objects
to structs unless we know that a stream is going to be using a specific
implementation.
While deserializing to an enum would cover most of the common cases, there
is nothing stopping some client code creating a `CustomList` with a
completely different serialized representation and using that in the class that
is being read in Rust.

### Ambiguous serialization
Unfortunately, there are also limits to what we can do without the original code
that created the serial byte stream. The protocol linked above lists four types
of object. One of these, classes that implement `java.lang.Externalizable` and
use `PROTOCOL_VERSION_1` (which has not been the default since Java 1.2), cannot
be read by anything other than the class that wrote them, as their data is
nothing more than a stream of bytes.

Of the remaining three types we can only reliably deserialize two.

 * 'Normal' classes that implement `java.lang.Serializable` without having
   a writeObject method

   These can be read as shown above

 * Classes that implement `Externalizable` and use the newer `PROTOCOL_VERSION_2`

   These can be read, although their data is held fully by the annotations
   fields of the `ObjectData` struct and the `get_field` method only returns `None`.

 * Serializable classes that implement `writeObject`

   These objects are more difficult. The spec above suggests that they have their
   fields written as 'normal' classes and then have optional annotations
   written afterwards. In practice this is not the case and the fields are
   only written if the class calls defaultWriteObject as the **first** call
   in their writeObject method. This is mentioned as a requirement in the spec
   so we can assume that this is correct for classes in the standard library
   but it is something to be aware of if user classes are being deserialized.

A consequence of this is that once we have found a class that we can't read,
it is difficult to get back on track as it requires picking out the marker
signifying the start of the next object from the sea of custom data.

## Future plans

 * Add implementations of `FromJava` for common Java and Rust types so that
   for instance `ArrayList` and `HashMap` can be read to the equivalent `Vec`
   and `HashMap` types in Rust.
 * Possible tie in with Serde. I've not yet looked into how the serde data
   model works but this seems like it would be a useful way of accessing
   Java data.
 * Reduce the amount of cloning of data required. As the Java stream contains
   back references to previous objects in the stream, the actual data in the
   objects read can't be passed to the caller. Currently, calling `read` on a parser
   instance builds the next object by cloning any data referenced by the next
   object and returning that. This can lead to the same data being cloned many
   times. It would be better to keep an internal pool of read objects and return
   objects built of references to that pool. This would mean that read could
   not be called while a reference to a returned object was still held but if
   required, the clone could be made on the client side.
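
To illustrate the pooling idea in the last point, here is a minimal sketch using only the standard library. It uses reference counting (`Rc`) rather than borrowed references, which is a different technique from the one described above: it avoids deep clones without preventing further reads while a result is held. `Pool`, `register` and `resolve` are hypothetical names. The only protocol detail assumed is that handles are assigned sequentially starting at `0x7E0000`.

```rust
use std::rc::Rc;

/// First handle assigned by the Java serialization protocol (baseWireHandle).
const BASE_WIRE_HANDLE: u32 = 0x7E_0000;

/// Hypothetical pool of objects already read from the stream.
struct Pool<T> {
    objects: Vec<Rc<T>>,
}

impl<T> Pool<T> {
    fn new() -> Self {
        Pool { objects: Vec::new() }
    }

    /// Store a newly read object and return the handle the stream will use
    /// for later back references to it.
    fn register(&mut self, object: T) -> u32 {
        self.objects.push(Rc::new(object));
        BASE_WIRE_HANDLE + (self.objects.len() as u32 - 1)
    }

    /// Resolve a back reference. Cloning the `Rc` bumps a counter rather
    /// than copying the object itself.
    fn resolve(&self, handle: u32) -> Option<Rc<T>> {
        let index = handle.checked_sub(BASE_WIRE_HANDLE)? as usize;
        self.objects.get(index).cloned()
    }
}

fn main() {
    let mut pool = Pool::new();
    let handle = pool.register(String::from("helloWorld"));
    let first = pool.resolve(handle).expect("handle was just registered");
    let second = pool.resolve(handle).expect("handle was just registered");
    // Both results point at the same allocation.
    println!("shared: {}", Rc::ptr_eq(&first, &second));
}
```

The trade-off in this variant is that shared objects are immutable once pooled and carry a small reference counting overhead.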

## State of development

Very much a work in progress at the moment. I am writing this for another
application I am working on so I imagine there will be many changes in
the functionality and API at least in the short term as the requirements
become apparent. As things settle down I hope things will become more stable.

## Contributions

As this project is still very much in a pre-alpha state, I imagine things being
quite unstable for a while. That said, if you notice anything obviously broken
or have a feature that you think would be useful that I've missed entirely,
do open issues. I'd avoid opening PRs until it's been discussed in an issue as
the current repo state may lag behind development.