

28 Nov 2012

Today, let's do an overview of parsing.

Well, I am supposed to do some parsing. I think a good idea before I start is to figure out what that even means. Sounds fair. Let's go!

So basically, a parser takes in some data, checks that its syntax is correct, and then builds some kind of structure out of that data. Let's do an overview of that process...

The "parser" takes in some kind of source string and applies a set of rules to split the string up into meaningful bits. "Meaningful" is relative: if you are writing the parser, it means whatever fits your needs. Although parsers are mostly associated with compilers and interpreters, you can write them yourself for all sorts of purposes in software, and even for talking to hardware. Comparing the string against the pre-defined rules is called the lexical analysis phase. The program or function that performs the analysis is sometimes referred to as a scanner, lexical analyzer, or lexer. The source string is often split up by a series of regular expressions that form the "rules".

The process of actually breaking the string into bits is called tokenization, and it produces those meaningful bits we were talking about. They are known as tokens. It does not stop here.
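Here's a minimal sketch of what a regex-based lexer might look like in Python. The token names and the tiny arithmetic "language" are made up for illustration, not from any particular tool:

```python
import re

# Each "rule" pairs a token name with a regular expression.
# (These names and patterns are invented for this example.)
TOKEN_RULES = [
    ("NUMBER", r"\d+"),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),   # whitespace: matched, then thrown away
]

MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_RULES))

def tokenize(source):
    """Split a source string into (token_name, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("2 + 3 * 4"))
```

So the lexer turns `"2 + 3 * 4"` into a flat list of tokens like `("NUMBER", "2")` and `("PLUS", "+")` — it knows nothing yet about whether those tokens are arranged in a sensible order.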

The next and last phase of parsing is syntactic analysis. This checks that the tokens form some type of expression that is allowed. If it is a valid expression, an appropriate action is taken based on the expression. These actions will differ depending on the type of application you are dealing with. If you write it yourself, you can make it anything your heart desires!
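To make that concrete, here's a sketch of syntactic analysis over the `(token_name, text)` pairs from the lexer above. It's a tiny recursive-descent parser for an invented arithmetic grammar, and the "appropriate action" it takes for a valid expression is simply to evaluate it:

```python
def parse(tokens):
    """Check that a list of (token_name, text) pairs forms a valid
    arithmetic expression, and evaluate it. Invented grammar:
        expr   -> term (PLUS term)*
        term   -> factor (TIMES factor)*
        factor -> NUMBER | LPAREN expr RPAREN
    """
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected}, got {peek()}")
        text = tokens[pos][1]
        pos += 1
        return text

    def expr():
        value = term()
        while peek() == "PLUS":
            eat("PLUS")
            value += term()
        return value

    def term():
        value = factor()
        while peek() == "TIMES":
            eat("TIMES")
            value *= factor()
        return value

    def factor():
        if peek() == "LPAREN":
            eat("LPAREN")
            value = expr()
            eat("RPAREN")
            return value
        return int(eat("NUMBER"))

    result = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after expression")
    return result
```

Notice that putting `term` inside `expr` (and `factor` inside `term`) is what gives `*` higher precedence than `+`; an invalid token sequence like `2 + +` falls out of the grammar and raises a `SyntaxError` instead.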

Now, enjoy this image from the Wikipedia article on parsing.

[parser illustration]