Published on November 7, 2019
Another year has passed and it is time again for a new major version of Megaparsec. What is different this time though, is that this is the least disruptive major release ever. In fact, I think most users will not need to do anything at all to upgrade.
There are reasons for this:
There are not so many issues opened and bugs reported. I think it has to do with the fact that Megaparsec “just works” these days and mostly in a satisfying manner.
The library is widely used now. At the time of this writing there are 146 packages on Hackage that depend on Megaparsec directly. New exciting libraries that choose to build on top of Megaparsec appear. Projects such as Idris and Dhall use Megaparsec to solve their parsing issues.
All this tells us that the library got older and more mature, so perhaps let’s not break something that is working. That said, there is always room for improvement.
Quality assurance with Nix
Before I started working on version 8, I decided to go for better quality assurance using Nix. Understanding how many projects now depend on Megaparsec and remembering my successful experiment with using Nix to find bugs in Ormolu, I decided to try to use Nix to check for breakage, performance changes, and bugs by using dependent packages.
I documented the result in the
HACKING.md file which is now in
Apart from the development shell, the Nix expression provides the following groups of targets:
base, which is a set of closely related packages such as
hspec-megaparsectogether with their tests. By running
nix-build -A base --no-out-linkdeveloper can make sure that all of this still builds and still passes the tests.
deps, which is a set of selected dependencies that I wanted to check for build breakage and test suite failures.
benches, which is a collection of benchmarks. This includes Megaparsec’s micro-benchmarks as well as a few packages that show how the library performs on realistic tasks.
Each of these sets can be accessed to “zoom in” on a particular package or
benchmark. For example, I can run
nix-build -A benches.parsers-bench to
check benchmarks in
parsers-bench. In short, I found that most packages
will still work with the new changes and those that won’t are easy to fix.
In fact, I had to patch the failing packages to continue using the system,
so the patches for upgrading are available.
I found no logic or performance regressions.
Control over parse error locations
Let’s talk about the new features. The primitives
fancyFailure have been replaced by
parseError :: MonadParsec e s m => ParseError s e -> m a -- now 'failure' and 'fancyFailure' are ordinary functions: failure :: MonadParsec e s m => Maybe (ErrorItem (Token s)) -- ^ Unexpected item (if any) -> Set (ErrorItem (Token s)) -- ^ Expected items -> m a failure us ps = do o <- getOffset parseError (TrivialError o us ps) fancyFailure :: MonadParsec e s m => Set (ErrorFancy e) -- ^ Fancy error components -> m a fancyFailure xs = do o <- getOffset parseError (FancyError o xs)
This is not about reducing the number of primitives though (which is also
nice). The main feature that
parseError enables is that we can report
parse errors with arbitrary offsets, not necessarily with current offset
from the parser’s state. This is important when you want to make a parse
error to point to a particular part of input even if you already moved past
that point. In the past this had to be accomplished by first obtaining the
correct offset via
getOffset and then by setting the offset with
setOffset just before reporting the parse error. Not only this is ugly, it
is also error-prone: you could forget to restore the correct offset. Here is
a real example from
o' <- getOffset setOffset o (void . hidden . string) "" -- ↑ if this fails, we want this to be reported at the offset 'o' setOffset (o' + 2)
I will not explain the full context here, but it suffices to say that this
code had a bug for some time because I forgot to account for offset
increment after parsing
+ 2 part). Now the same thing can be
region (setErrorOffset o) $ (void . hidden . string) "" -- N.B. region :: MonadParsec e s m => (ParseError s e -> ParseError s e) -- ^ How to process 'ParseError's -> m a -- ^ The “region” that the processing applies to -> m a
region used to do the same
setOffset hacks and as a side
effect it could change current offset if the function that updates parse
errors changed it. Now
region can use
parseError and do away with its
region f m = do r <- observing m case r of Left err -> parseError (f err) Right x -> return x
Better story for multi-error parsers
From the very beginning of the project we were moving slowly in the
direction of supporting multi-error parsing. In version 7 we even started to
ParseErrorBundle instead of old familiar
was in place for multi-error revolution, except that there was no official
way to report more than one parse error!
One prerequisite for having a multi-error parser is that it should be
possible to skip over a problematic part of input and resume parsing from a
position that is known to be good. This part is accomplished by using the
withRecovery primitive (available beginning from Megaparsec 4.4.0):
-- | @'withRecovery' r p@ allows continue parsing even if parser @p@ -- fails. In this case @r@ is called with the actual 'ParseError' as its -- argument. Typical usage is to return a value signifying failure to -- parse this particular object and to consume some part of the input up -- to the point where the next object starts. -- -- Note that if @r@ fails, original error message is reported as if -- without 'withRecovery'. In no way recovering parser @r@ can influence -- error messages. withRecovery :: (ParseError s e -> m a) -- ^ How to recover from failure -> m a -- ^ Original parser -> m a -- ^ Parser that can recover from failures
Before Megaparsec 8 users had to pick the type
a to be a sum type
including the possibilities for success and failure. For example, it could
Either (ParseError s e) Result. The parse errors had to be collected
and later manually added to the
ParseErrorBundle before displaying.
Needless to say, all of this was an example of advanced usage that was not
Megaparsec 8 adds support for delayed parse errors:
-- | Register a 'ParseError' for later reporting. This action does not end -- parsing and has no effect except for adding the given 'ParseError' to the -- collection of “delayed” 'ParseError's which will be taken into -- consideration at the end of parsing. Only if this collection is empty -- parser will succeed. This is the main way to report several parse errors -- at once. registerParseError :: MonadParsec e s m => ParseError s e -> m () -- | Like 'failure', but for delayed 'ParseError's. registerFailure :: MonadParsec e s m => Maybe (ErrorItem (Token s)) -- ^ Unexpected item (if any) -> Set (ErrorItem (Token s)) -- ^ Expected items -> m () -- | Like 'fancyFailure', but for delayed 'ParseError's. registerFancyFailure :: MonadParsec e s m => Set (ErrorFancy e) -- ^ Fancy error components -> m ()
These errors can be registered in the error-processing callback of
withRecovery making the resulting type
Maybe Result. This takes care of
including the delayed errors in the final
ParseErrorBundle as well as
making the parser fail in the end if the collection of delayed errors in not
With all this, I hope that the practice of writing multi-error parsers will become more common among the users.
- As always, for the full list of changes see the changelog.
- I updated all texts including the official tutorial to be compatible with version 8. I even extended it to include sections teaching how to use the new features.
Satellite packages such as
hspec-megaparsechave been updated and now work with version 8.