[Solved] How to retrieve tokens from a string that are keywords?


Lexing is not specific to C (in the sense that you’ll use similar techniques in other programming languages). You could do it with hand-written code (using finite-automaton coding techniques), or with a lexer generator like flex. You might even use regexps, e.g. the regex.h functions on POSIX systems.
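To make the hand-written approach concrete, here is a minimal sketch of a lexer that scans a string and marks which words are keywords. The token kinds, the struct layout, the keyword list and the function names are all assumptions made up for this illustration, not anything from your homework.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch. */
enum token_kind { TOK_END, TOK_KEYWORD, TOK_IDENT, TOK_NUMBER, TOK_PUNCT };

struct token {
    enum token_kind kind;
    const char *start;  /* points into the input string, not NUL-terminated */
    size_t len;         /* number of bytes in the token */
};

/* Assumed keyword set; replace it with the keywords of your own language. */
static const char *const keywords[] = { "if", "else", "while", "return" };

/* Return 1 if the len bytes at s spell one of the keywords, else 0. */
int is_keyword(const char *s, size_t len)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i]) == len && memcmp(keywords[i], s, len) == 0)
            return 1;
    return 0;
}

/* Scan one token starting at *pp and advance *pp past it. */
struct token next_token(const char **pp)
{
    const char *p = *pp;
    struct token t;

    while (isspace((unsigned char)*p))
        p++;
    t.start = p;
    if (*p == '\0') {
        t.kind = TOK_END;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        /* A word: consume it fully, then decide keyword vs. identifier. */
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        t.kind = is_keyword(t.start, (size_t)(p - t.start)) ? TOK_KEYWORD
                                                            : TOK_IDENT;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        t.kind = TOK_NUMBER;
    } else {
        /* Anything else is treated as a one-character punctuation token. */
        p++;
        t.kind = TOK_PUNCT;
    }
    t.len = (size_t)(p - t.start);
    *pp = p;
    return t;
}
```

You would call next_token in a loop on a const char * cursor until it returns TOK_END. Because a whole word is consumed before the keyword test, a word such as iffy comes out as an identifier rather than as the keyword if followed by fy.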

Parsing is also a well-known domain with standard techniques (at least for context-free languages, if you want some efficiency). You could use recursive descent parsing, or you could generate a parser using bison (which has examples very close to your homework) or ANTLR. Read more about LL parsing and LR parsing. BTW, parsing techniques can also be used for lexing.
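As an illustration of recursive descent, here is a minimal sketch that parses and evaluates integer arithmetic, with one C function per grammar rule. The grammar, the struct parser type and all function names are assumptions chosen for the example; your homework's grammar will differ.

```c
#include <ctype.h>
#include <stdlib.h>

/* Grammar assumed for this sketch:
 *   expr   := term   (('+' | '-') term)*
 *   term   := factor (('*' | '/') factor)*
 *   factor := NUMBER | '(' expr ')'
 */
struct parser {
    const char *p;  /* current position in the input */
    int error;      /* set to 1 on a syntax error or division by zero */
};

long parse_expr(struct parser *ps);

static void skip_spaces(struct parser *ps)
{
    while (isspace((unsigned char)*ps->p))
        ps->p++;
}

long parse_factor(struct parser *ps)
{
    skip_spaces(ps);
    if (*ps->p == '(') {
        ps->p++;
        long v = parse_expr(ps);
        skip_spaces(ps);
        if (*ps->p == ')')
            ps->p++;
        else
            ps->error = 1;  /* missing closing parenthesis */
        return v;
    }
    if (isdigit((unsigned char)*ps->p)) {
        char *end;
        long v = strtol(ps->p, &end, 10);
        ps->p = end;
        return v;
    }
    ps->error = 1;  /* neither a number nor a parenthesized expression */
    return 0;
}

long parse_term(struct parser *ps)
{
    long v = parse_factor(ps);
    for (;;) {
        skip_spaces(ps);
        char op = *ps->p;
        if (op != '*' && op != '/')
            return v;
        ps->p++;
        long rhs = parse_factor(ps);
        if (op == '*')
            v *= rhs;
        else if (rhs != 0)
            v /= rhs;
        else
            ps->error = 1;  /* division by zero */
    }
}

long parse_expr(struct parser *ps)
{
    long v = parse_term(ps);
    for (;;) {
        skip_spaces(ps);
        char op = *ps->p;
        if (op != '+' && op != '-')
            return v;
        ps->p++;
        long rhs = parse_term(ps);
        v = (op == '+') ? v + rhs : v - rhs;
    }
}

/* Parse all of text; store the value in *result and return 0,
 * or return -1 on error or trailing garbage. */
int evaluate(const char *text, long *result)
{
    struct parser ps = { text, 0 };
    long v = parse_expr(&ps);
    skip_spaces(&ps);
    if (ps.error || *ps.p != '\0')
        return -1;
    *result = v;
    return 0;
}
```

Calling evaluate("1 + 2 * 3", &r) stores 7 in r, since the term rule binds tighter than the expr rule. This sketch reads characters directly; a real parser would more likely consume tokens produced by a separate lexer, and it does not guard against integer overflow.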

BTW, there is plenty of free software illustrating these techniques: interpreters of scripting languages (Guile, Lua, Python, etc.), JSON, YAML and XML parsers, several compilers (e.g. tinycc), and so on. You’ll learn a lot by studying their source code.

It could sometimes be easier for you to have a lookahead of one or two characters, e.g. by first reading the entire line (with getline(3) or else fgets(3), and perhaps even readline, which gives you a line editor). If you cannot read a whole line, consider using fgetc(3), with ungetc(3) when needed. The classifying functions from <ctype.h>, like isalpha, might be helpful.
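The fgetc and ungetc combination gives you one character of lookahead on a stream: you read until you see a character that does not belong to the current token, then push it back. A minimal sketch, where read_word and its truncation behaviour are assumptions of this example:

```c
#include <ctype.h>
#include <stdio.h>

/* Skip leading whitespace, then read a run of letters, digits and '_'
 * from in into buf (at most size - 1 bytes, always NUL-terminated).
 * Returns the number of bytes stored. Bytes of an over-long word are
 * consumed but dropped. The first character after the word is pushed
 * back so that the next read sees it. */
size_t read_word(FILE *in, char *buf, size_t size)
{
    size_t n = 0;
    int c;

    if (size == 0)
        return 0;
    do {
        c = fgetc(in);
    } while (c != EOF && isspace(c));

    while (c != EOF && (isalnum(c) || c == '_')) {
        if (n + 1 < size)
            buf[n++] = (char)c;
        c = fgetc(in);
    }
    if (c != EOF)
        ungetc(c, in);  /* one character of lookahead, given back */
    buf[n] = '\0';
    return n;
}
```

fgetc returns either EOF or the byte as an unsigned char converted to int, so passing its result directly to isspace and isalnum is safe. The C standard only guarantees one character of pushback per stream, so this pattern does not extend to two characters of lookahead; for that, reading the whole line first is simpler.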

If you care about UTF-8 (and in principle you should), things become slightly more complex, since some Unicode characters (like € or é) are represented in UTF-8 by several bytes. A library like libunistring should be very helpful.
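To see why bytes and characters stop being the same thing, here is a minimal hand-rolled sketch that steps through a UTF-8 string one code point at a time. It does not use libunistring; utf8_seq_len and utf8_count are names invented for this example, and the validation is deliberately partial (it checks lead and continuation bytes but does not reject every overlong or surrogate encoding), which is one reason to prefer a real library.

```c
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence that starts with lead,
 * or 0 if lead cannot start a sequence. */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;  /* plain ASCII */
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;
    return 0;      /* continuation byte or invalid lead byte */
}

/* Count the code points in a NUL-terminated string.
 * Returns (size_t)-1 if the input is malformed or truncated. */
size_t utf8_count(const char *s)
{
    size_t count = 0;

    while (*s != '\0') {
        size_t len = utf8_seq_len((unsigned char)*s);
        if (len == 0)
            return (size_t)-1;
        /* Every following byte must look like 10xxxxxx; a NUL fails
         * this test, so we never read past the end of the string. */
        for (size_t i = 1; i < len; i++)
            if (((unsigned char)s[i] & 0xC0) != 0x80)
                return (size_t)-1;
        s += len;
        count++;
    }
    return count;
}
```

For instance, € is the three bytes E2 82 AC, so utf8_count("\xE2\x82\xAC") returns 1 while strlen on the same string returns 3. A lexer that wants to accept such characters in identifiers has to advance by utf8_seq_len bytes rather than by one, and the <ctype.h> functions cannot classify them.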
