Create an Account
username: password:
 
  MemeStreams Logo

MemeStreams Discussion

search


This page contains all of the posts and discussion on MemeStreams referencing the following web page: Regular Expression Matching Can Be Simple And Fast. You can find discussions on MemeStreams as you surf the web, even if you aren't a MemeStreams member, using the Threads Bookmarklet.

Regular Expression Matching Can Be Simple And Fast
by Acidus at 5:50 pm EDT, Apr 27, 2009

Introduction

This is a tale of two approaches to regular expression matching. One of them is in widespread use in the standard interpreters for many languages, including Perl. The other is used only in a few places, notably most implementations of awk and grep. The two approaches have wildly different performance characteristics.

Notice that Perl requires over sixty seconds to match a 29-character string. The other approach, labeled Thompson NFA for reasons that will be explained later, requires twenty microseconds to match the string. That's not a typo.

It may be hard to believe the graphs: perhaps you've used Perl, and it never seemed like regular expression matching was particularly slow. Most of the time, in fact, regular expression matching in Perl is fast enough. As the graph shows, though, it is possible to write so-called “pathological” regular expressions that Perl matches very very slowly. In contrast, there are no regular expressions that are pathological for the Thompson NFA implementation. Seeing the two graphs side by side prompts the question, “why doesn't Perl use the Thompson NFA approach?” It can, it should, and that's what the rest of this article is about.

Today, regular expressions have also become a shining example of how ignoring good theory leads to bad programs. The regular expression implementations used by today's popular tools are significantly slower than the ones used in many of those thirty-year-old Unix tools.

Fascinating read.


 
RE: Regular Expression Matching Can Be Simple And Fast
by Solomon at 11:31 am EDT, Apr 28, 2009

good


 
 
Powered By Industrial Memetics