[Solved] Shunting-yard algorithm with function support [closed]

Question

There are several problems with your translation of the code from that answer, including one fairly obvious error in the answer itself, which is that the second if (if token is '[') should be else if. But note that the condition in the first if is if token is operand, but your translation is, roughly, if token is operator. Those two issues are certainly enough to prevent your code from working as desired.

Beyond that, it’s important to read the notes in the cited answer, which point out that the pseudocode is based on the input having already been tokenised; the code is not intended to be applied directly to the input. In particular, numbers and variable names should be single elements in the token stream. Also, see note 3 which explains where [ and ] come from; they are not in the input stream, so they must be created by the tokeniser.

It’s generally helpful to try to understand how an algorithm works when you’re trying to write an implementation, particularly when what you are porting is pseudocode. (The most important thing about pseudocode is that it is not code, so it will never have been tested. It’s quite common for even very experienced coders to not notice little errors which slipped into pseudocode. If there were a way of running the pseudocode, such errors would be detected immediately, but since there isn’t, the errors will appear when a concrete implementation is tested. In short, caveat receptor.)

There’s a reasonable explanation of the shunting-yard algorithm on Wikipedia. In particular, the pseudocode on that Wikipedia page handles function calls provided the names of the functions are known.

For yet another discussion of shunting yard, including a note on how to recognise function calls and unary operators (prefix and postfix) while tokenising the input, see this answer on SO, which also includes pseudocode.

An interesting fact about the three Shunting Yard presentations is that they each have their own style of translating function calls to postfix. The concrete problem you need to solve is that a postfix function call is ambiguous unless you somehow know how many arguments are used. The parentheses in the infix call make the argument count explicit, but a parenthesis-free representation like postfix needs a different mechanism.

Using the comma as a binary operator which builds a parameter list, as in the code you are trying to port, is an interesting strategy. However, the pseudocode as presented cannot handle functions with no parameters. (Also, the fact that the [ token must also contain the function name is not very well explained.)

The Wikipedia code, on the other hand, relies on recognising the identifiers used as functions. And it requires each such function to have a well-known number of parameters. That’s ok in a calculator, where functions are usually restricted to things like sin and digamma, but it’s problematic in a general purpose programming language.

My solution inserts a CALL operator into the postfix output; as a note indicates, a practical implementation should also insert the number of arguments supplied as an additional value just before CALL. The implementation details are not well-explained, though. (Sorry!)

Accepted Answer