wrick's blog

By wrick, history, 8 years ago, In English

In my previous post, I discussed a fast Scanner for Scala.

To benchmark it, I wrote the fastest possible Scanner I could in Java:

package better.files;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

/**
 * Hand built using a char buffer
 */
public class ArrayBufferScanner extends AbstractScanner {
  private char[] buffer = new char[1 << 4];
  private int pos = 0;

  private BufferedReader reader;

  public ArrayBufferScanner(BufferedReader reader) {
    super(reader);
    this.reader = reader;
  }

  @Override
  public boolean hasNext() {
    return pos != -1;
  }

  private void loadBuffer() {
    pos = 0;
    while (true) {
      int i;
      try {
        i = reader.read();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      if (i == -1) {
        pos = -1;
        break;
      }
      char c = (char) i;
      if (c != ' ' && c != '\n' && c != '\t' && c != '\r' && c != '\f') {
        if (pos == buffer.length) {
          buffer = Arrays.copyOf(buffer, 2 * pos);
        }
        buffer[pos++] = c;
      } else if (pos != 0) {
        break;
      }
    }
  }

  @Override
  public String next() {
    loadBuffer();
    return String.copyValueOf(buffer, 0, pos);
  }

  @Override
  public String nextLine() {
    try {
      return reader.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

   @Override
  public int nextInt() {
    loadBuffer();
    final int radix = 10;
    int result = 0;
    boolean didRun = false;
    for (int i = buffer[0] == '-' || buffer[0] == '+' ? 1 : 0; i < pos; i++, didRun = true) {
      int digit = buffer[i] - '0';
      checkValidNumber(0 <= digit && digit <= 9);
      result = result * radix + digit;
    }
    checkValidNumber(didRun);
    return buffer[0] == '-' ? -result : result;
  }
  
  private void checkValidNumber(boolean condition) {
    if(!condition) throw new NumberFormatException(String.copyValueOf(buffer, 0, pos));
  }
}

Is this the best I can do?

It is barely faster than Java's StreamTokenizer (NOT StringTokenizer) inspite of being much simpler than it: http://docs.oracle.com/javase/8/docs/api/java/io/StreamTokenizer.html

Java source: https://github.com/pathikrit/better-files/blob/master/benchmarks/src/main/java/better/files/ArrayBufferScanner.java

Other Scanners: https://github.com/pathikrit/better-files/blob/master/benchmarks/src/main/scala/better/files/Scanners.scala

Benchmark results: https://github.com/pathikrit/better-files/tree/master/benchmarks

  • Vote: I like it
  • +6
  • Vote: I do not like it

| Write comment?
»
8 years ago, # |
  Vote: I like it 0 Vote: I do not like it

You can do much better. Just take a look at what happens when you call nextInt() the first time. First the BufferedReader copies input to a buffer, then you copy that data to a new buffer, then you allocate a String object (not ideal if you only want a number) which essentially is just yet another buffer, then you actually parse the input.

If you want to write your own efficient I/O class in Java, then byte[] and System.in.read will be your friends. If you think that's to much of a hassle, well then at least BufferedInputStream is preferable to BufferedReader (at least when it comes to reading integers).

  • »
    »
    8 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Ah, good point — I have not thought about optimizing the nextInt yet... Yes, this is purely an intellectual exercise at this point since it should be fast enough for CodeForces. Btw, what do you think of http://docs.oracle.com/javase/8/docs/api/java/io/DataInputStream.html ? I will include it in my benchmarks next.

  • »
    »
    8 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    I just posted a version (see updated post) with updated nextInt() — much faster now. Anything else?

    • »
      »
      »
      8 years ago, # ^ |
      Rev. 2   Vote: I like it +5 Vote: I do not like it

      Well there's really no point in using a buffer when you use BufferedReader, since it has its own buffer, so you unnecessarily end up doing the same work twice.

      As a reference I include below (more or less) the two classes I use for reading integers (most problems have integers as input) when I wanna make the highscore list on various online judges.

      So feel free to use them as inspiration when you design a more solid class with exception handling etc.

»
8 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Not related to the post,but worth sharing.

But I'd be careful when using Custom classes to read when the input is large ( especially on SPOJ and UVA). I remember one or two times where encapsulating a BufferedReader and StringTokenizer gave me TLE, yet them alone in the main method gave AC.

Unless you are sure that your scanner is really significant for speed, don't use it when you need every second for your program.