[Solved] Can Java delimit input itself, without explicit delimiters?


Assuming you’re using Scanner, yes, it can. Scanner operates on the notion that a regexp serves as the delimiter: each match of the regexp delimits, and whatever the regexp matches is tossed out (because nobody ‘cares’ about reading the spaces or the commas or whatever). The scanner then gives you the stuff in between the delimiters.
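As a small illustration of the "delimiter is tossed out" idea, here is a minimal sketch with a plain consuming delimiter (a comma followed by optional whitespace). The class name CommaDelimiterDemo and the helper tokens are made up for this example; only Scanner and useDelimiter come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CommaDelimiterDemo {
    // Hypothetical helper: collects everything the scanner hands back.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner s = new Scanner(input)) {
            // A comma plus any whitespace after it is the delimiter;
            // whatever this matches is consumed and never returned.
            s.useDelimiter(",\\s*");
            while (s.hasNext()) {
                result.add(s.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("5, 20,7")); // prints [5, 20, 7]
    }
}
```

The commas and the space never show up in the output, which is exactly why a consuming delimiter cannot be used to split on the operators themselves.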

Thus, for the scanner to give you the stream ‘5’, ‘+’, and ‘3’, you want a delimiter that matches at the boundary between ‘5’ and ‘+’ and at the boundary between ‘+’ and ‘3’, whilst matching 0 characters; otherwise those characters would be thrown out.

You can do that, using regexp lookahead/lookbehind. You want a digit to the left and an operator to the right, or vice versa:

// needs: import java.util.Scanner;
String test = "53 + 2*35- 8";
Scanner s = new Scanner(test);
s.useDelimiter("\\s+|(?:(?<=\\d)(?=[-+/*]))|(?:(?=\\d)(?<=[-+/*]))");
while (s.hasNext()) {
  System.out.println("NEXT: '" + s.next() + "'");
}

To break that convoluted regex open:

  • A|B|C means: A or B or C. That’s the ‘outermost’ part of this regexp: we’re looking for one of 3 distinct things to split on.
  • \\s+ means: 1 or more whitespace characters. Thus, input "5 20" would be split into 5 and 20. The whitespace is consumed (i.e. tossed out and not part of your tokens).
  • OR, a positive lookbehind followed by a positive lookahead. (?<=X) means: match if, looking backwards, you would see X; X is \\d here, a digit. (?=X) means: check that X is here, but don’t consume it (or it would be thrown out; remember, the regex describes the delimiter, and the delimiter is thrown out). Here we look ahead for one of the operator symbols.
  • OR, that, but flipped about (first an operator, then a digit).
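To see what each alternative contributes on its own, here is a rough sketch that feeds them one at a time to String.split, which also treats the regex as a delimiter. The class name AlternativesDemo, the constant names and the split helper are invented for the illustration; the three patterns are the ones from the delimiter above.

```java
import java.util.Arrays;
import java.util.List;

public class AlternativesDemo {
    // The three alternatives of the delimiter, taken apart.
    static final String WHITESPACE = "\\s+";
    static final String DIGIT_THEN_OP = "(?<=\\d)(?=[-+/*])";
    static final String OP_THEN_DIGIT = "(?=\\d)(?<=[-+/*])";
    static final String COMBINED =
        WHITESPACE + "|" + DIGIT_THEN_OP + "|" + OP_THEN_DIGIT;

    // Hypothetical helper so the results are easy to compare and print.
    static List<String> split(String input, String regex) {
        return Arrays.asList(input.split(regex));
    }

    public static void main(String[] args) {
        // Consuming delimiter: the space is gone from the result.
        System.out.println(split("5 20", WHITESPACE));    // prints [5, 20]
        // Zero-width: splits between digit and operator, keeps both.
        System.out.println(split("5+3", DIGIT_THEN_OP));  // prints [5, +3]
        // The flipped version splits between operator and digit.
        System.out.println(split("5+3", OP_THEN_DIGIT));  // prints [5+, 3]
        // All three together give the token stream we were after.
        System.out.println(split("5+3", COMBINED));       // prints [5, +, 3]
    }
}
```

The two lookaround alternatives match zero characters, so nothing is lost at those split points; only the whitespace alternative actually eats input.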

NB: If you want to avoid the complexity of a regexp, you could just loop through each character, but you’d be building a little state machine and would have to take care of consecutive digits not separated by spaces: you need to combine those (10 + 20 is not 1, 0, +, 2, 0; it’s 10 + 20).
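For comparison, the character loop could look roughly like this. It is only a sketch under a few assumptions: the class name CharLoopTokenizer is made up, only ASCII digits and the four operators + - * / are accepted, and anything else is rejected with an exception.

```java
import java.util.ArrayList;
import java.util.List;

public class CharLoopTokenizer {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        // The "state" of the little state machine: digits of the
        // number currently being read, empty when not inside a number.
        StringBuilder number = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (c >= '0' && c <= '9') {
                number.append(c); // consecutive digits are combined
                continue;
            }
            // Any non-digit ends the number in progress, if there is one.
            if (number.length() > 0) {
                tokens.add(number.toString());
                number.setLength(0);
            }
            if (Character.isWhitespace(c)) {
                continue; // whitespace only separates tokens
            }
            if ("+-*/".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        // Flush a number that runs to the end of the input.
        if (number.length() > 0) {
            tokens.add(number.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("10 + 20")); // prints [10, +, 20]
    }
}
```

It is more lines than the regex, but every step is explicit, and unexpected input is reported instead of silently ending up inside a token.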

NB2: If you also want to support ( and ), you can edit the regex appropriately (they are, essentially, ‘operators’ and go in the list of operators). However, at some point you’re essentially describing a grammar for a formal language and should start looking into a parser generator. But that’s all vastly more complicated than any of this.
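One possible way to do the parentheses edit is sketched below. Note that it is a variation, not the delimiter from the answer: merely adding ( and ) to both character classes would leave something like ")*" glued together, because neither side of that boundary is a digit. The sketch therefore splits before and after every operator or parenthesis instead. The class name ParenDelimiterDemo is made up, and things like unary minus are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ParenDelimiterDemo {
    // Assumed variation: whitespace is consumed, and there is a
    // zero-width split after and before each operator or parenthesis.
    static final String DELIMITER = "\\s+|(?<=[-+/*()])|(?=[-+/*()])";

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        try (Scanner s = new Scanner(input)) {
            s.useDelimiter(DELIMITER);
            while (s.hasNext()) {
                tokens.add(s.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("(5+3)*2")); // prints [(, 5, +, 3, ), *, 2]
    }
}
```

This only produces a flat token list. Checking that the parentheses balance, or evaluating the expression, is where the grammar and parser territory begins.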
