Writing a fast parser
Published on September 11, 2016, last updated September 27, 2017
If performance of your Megaparsec parser is worse that you hoped, there may be ways to improve it. This short guide will instruct you what to attempt, but you should always check if you’re getting better results by profiling and benchmarking your parsers (that’s the only way to understand if you are doing the right thing when tuning performance).
If your parser uses a monad stack instead of plain
Parsecmonad (which is a monad transformer over
Identity), make sure you use at least version 0.5 of
transformerslibrary, and at least version 7.0 of
megaparsec. Both libraries have critical performance improvements in those versions, so you can just get better performance for free.
Parsecmonad will be always faster then
ParsecT-based monadic stacks. Avoid using
WriterT, and other monad transformers unless absolutely necessary. The more you add to the stack, the slower it will be.
Do not keep your parsers polymorphic
unless you really have a reason to do so, just don’t. It’s best to “fix” the types of parsers specifying concrete types, such as
type Parser = Parsec Void Textfor every top-level definition. This way GHC will be able to optimize a lot better.
Keep all your parsing code in single file. If that’s not possible, use
INLINEABLEpragmas make GHC dump functions definitions to interface files and this allows for cross-module specialization (I’ve written a tutorial about this, available here).
Use the fast primitives such as
takePwhenever you can. These are fast for
A parser can be written in many ways. Think about your grammar and how parsing happens, when you get some experience with this process, it will be much easier for you to see how to make your parser faster. Sometimes however, making a parser faster will also make your code less readable. If performance of your parser is not a bottleneck in the system you are building, consider preferring readability over performance.