[Solved] parsing large JSON with java/GSON, can’t read the JSON structure


A JSON document, by definition, contains a single top-level value (and yes, a single bare value would be a valid JSON document). However, there is JSON streaming, which uses various techniques to concatenate multiple JSON elements into a single stream, assuming the stream consumer can parse it (read more). Gson supports a so-called lenient mode for JsonReader that lifts the "one top-level value only" restriction (and relaxes a few more things irrelevant to this question): setLenient. With lenient mode on, you can read JSON elements one by one, and it turns out this mode can be used to parse/read line-delimited JSON and concatenated JSON, since their elements are delimited by zero or more whitespace characters, which Gson ignores (the more exotic record separator-delimited JSON and length-prefixed JSON are therefore unsupported). The reason it does not work for you is that your initial code assumes the stream contains a single JSON array, which it obviously does not: it is a stream of elements that does not conform to the JSON array syntax.
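For illustration, here is a minimal, self-contained sketch of reading concatenated top-level values in lenient mode (the class name and sample input are made up for the demo; only setLenient, peek and the token-level read methods come from the Gson API):

    import com.google.gson.stream.JsonReader;
    import com.google.gson.stream.JsonToken;

    import java.io.IOException;
    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;

    public final class LenientDemo {

        // Reads every top-level {"name": ...} object from a concatenated JSON stream
        static List<String> readNames(final String concatenated) throws IOException {
            final List<String> names = new ArrayList<>();
            try ( JsonReader jsonReader = new JsonReader(new StringReader(concatenated)) ) {
                jsonReader.setLenient(true); // allow multiple top-level values
                while ( jsonReader.peek() != JsonToken.END_DOCUMENT ) {
                    jsonReader.beginObject();
                    jsonReader.nextName(); // "name"
                    names.add(jsonReader.nextString());
                    jsonReader.endObject();
                }
            }
            return names;
        }

        public static void main(final String... args) throws IOException {
            // Two whitespace-delimited top-level values: not a single valid JSON document
            System.out.println(readNames("{\"name\":\"first\"} {\"name\":\"second\"}"));
            // prints [first, second]
        }
    }

Without setLenient(true), the peek() after the first object would throw a MalformedJsonException complaining about multiple top-level values.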

A simple, generic JSON stream support might look like this (it uses the Stream API because it is richer than Iterator, but it is only meant to show the idea; you can easily adapt it to iterators, callbacks, observable streams, whatever you like):

import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonToken;
import lombok.experimental.UtilityClass;

import javax.annotation.WillNotClose;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

@UtilityClass
public final class JsonStreamSupport {

    public static <T> Stream<T> parse(@WillNotClose final JsonReader jsonReader, final Function<? super JsonReader, ? extends T> readElement) {
        final boolean isLenient = jsonReader.isLenient();
        jsonReader.setLenient(true);
        final Spliterator<T> spliterator = new Spliterators.AbstractSpliterator<T>(Long.MAX_VALUE, Spliterator.ORDERED) {
            @Override
            public boolean tryAdvance(final Consumer<? super T> action) {
                try {
                    final JsonToken token = jsonReader.peek();
                    if ( token == JsonToken.END_DOCUMENT ) {
                        return false;
                    }
                    // TODO: read more elements in batch
                    final T element = readElement.apply(jsonReader);
                    action.accept(element);
                    return true;
                } catch ( final IOException ex ) {
                    throw new UncheckedIOException(ex);
                }
            }
        };
        return StreamSupport.stream(spliterator, false)
                .onClose(() -> jsonReader.setLenient(isLenient));
    }

}

And then:

JsonStreamSupport.<Artist>parse(jsonReader, jr -> gson.fromJson(jr, Artist.class))
        .forEach(System.out::println);

Output (assuming Artist has Lombok-generated toString()):

Artist(id=d0ab06e1-751a-414b-a976-da72670391b1, name=Arcing Wires, sortName=Arcing Wires)
Artist(id=6f0c2c16-dd7e-4268-a484-bc7b2ac78108, name=Another, sortName=Another)
Artist(id=e062b6cd-5506-47b0-afdb-72f4279ec38c, name=Agent S, sortName=Agent S)
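Since the Stream-based approach above is, as mentioned, easy to adapt to iterators, here is one possible sketch of such an adaptation (the class and method names here are mine, not part of the original code):

    import com.google.gson.stream.JsonReader;
    import com.google.gson.stream.JsonToken;

    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.util.Iterator;
    import java.util.NoSuchElementException;
    import java.util.function.Function;

    public final class JsonIteratorSupport {

        private JsonIteratorSupport() {
        }

        // Same idea as the Stream-based JsonStreamSupport.parse, exposed as a plain Iterator
        public static <T> Iterator<T> parse(final JsonReader jsonReader, final Function<? super JsonReader, ? extends T> readElement) {
            jsonReader.setLenient(true);
            return new Iterator<T>() {
                @Override
                public boolean hasNext() {
                    try {
                        return jsonReader.peek() != JsonToken.END_DOCUMENT;
                    } catch ( final IOException ex ) {
                        throw new UncheckedIOException(ex);
                    }
                }

                @Override
                public T next() {
                    if ( !hasNext() ) {
                        throw new NoSuchElementException();
                    }
                    return readElement.apply(jsonReader);
                }
            };
        }
    }

The trade-off is the usual one: an Iterator is pull-based and trivially composable with legacy code, but you lose onClose, so restoring the reader's original leniency becomes the caller's responsibility.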

As for how many bytes such an approach, JSON streaming, actually saves, such that the service you're trying to consume chose it: I don't know.
