[relaxng-user] Latest proposal for smart regexes in RELAX NG

David Tolpin dvd at davidashen.net
Fri May 7 09:17:22 ICT 2004


> Your example covered only a subset of the RFC822 address production. 

By the way, my example covers the full syntax of addr-spec in RFC2822.
Jeffrey Friedl's regex does have bugs in it, but only because his purpose
was manifestly to impress an innocent reader by presenting a specimen
of programming style for an obfuscated programming contest.

Let me quote my example again. 

Using XML Schema Datatype Library:

start=element addr-spec {
  xsd:token {
    pattern=
      "(\(([^\(\)\\]|\\.)*\) )?"
    ~ """([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+)*|"([^"\\]|\\.)*")"""
    ~ "@" 
    ~ "([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+)*|\[([^\[\]\\]|\\.)*\])"
    ~ "( \(([^\(\)\\]|\\.)*\))?"
    
  }
}
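A quick way to exercise this pattern outside a validator is to rebuild it in Python. This is a minimal sketch, not RNV's or any validator's implementation. It relies on a few points. The `~` operator in the compact syntax concatenates literals, so a plain string join reproduces the pattern. XSD patterns are implicitly anchored at both ends, hence `re.fullmatch`. For the constructs used here (character classes with `\-` and `\^`, grouping, `?`, `*`, `+`, alternation), Python's `re` should behave the same as XSD regexes. It is also an assumption that the pattern is checked against the whitespace-collapsed value that `xsd:token` produces. The names `XSD_PIECES`, `ADDR_SPEC` and `is_addr_spec` are illustrative only.

```python
import re

# The five literals joined by "~" in the compact-syntax example, copied verbatim.
# "~" is literal concatenation, so "".join() rebuilds the full pattern.
XSD_PIECES = [
    r'(\(([^\(\)\\]|\\.)*\) )?',
    r"""([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+)*|"([^"\\]|\\.)*")""",
    r'@',
    r"([a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+(\.[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+)*|\[([^\[\]\\]|\\.)*\])",
    r'( \(([^\(\)\\]|\\.)*\))?',
]
ADDR_SPEC = re.compile("".join(XSD_PIECES))


def is_addr_spec(value: str) -> bool:
    """Approximate the xsd:token + pattern check (hypothetical helper)."""
    # xsd:token collapses whitespace; assumed here to happen before matching.
    collapsed = " ".join(value.split())
    # XSD patterns must match the whole value, so use fullmatch, not search.
    return ADDR_SPEC.fullmatch(collapsed) is not None


for sample in ['john.doe@example.com', '"J. Doe"@[192.168.0.1]',
               '(home) john@example.com', 'john..doe@example.com']:
    print(sample, is_addr_spec(sample))
```

The first three samples are accepted. `john..doe@example.com` is rejected because an empty atom between the dots is not allowed.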

With a small extension (implemented in RNV using an embedded Scheme interpreter):

datatypes dsl = "http://davidashen.net/relaxng/scheme-datatypes"

start=element addr-spec {
  dsl:token {
    s-pattern="""
      comment = "\(([^\(\)\\]|\\.)*\)"
      atom = "[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
      atoms = atom "(\." atom ")*"
      person = "\"([^\"\\]|\\.)*\""
      location = "\[([^\[\]\\]|\\.)*\]"
      local-part = "(" atoms "|" person ")"
      domain = "(" atoms "|" location ")"
      start = "(" comment " )?" local-part "@" domain "( " comment ")?"
    """
  }
}

The only difference is that the same regexp is broken into named
parts, which are then concatenated to form start.
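The named-parts idea can be mimicked with ordinary string concatenation. Below is a minimal Python sketch. It is not how the Scheme-based dsl library works internally, which is not described here. Each definition becomes a Python variable, with `local-part` spelled `local_part` because hyphens are not valid in identifiers. Juxtaposition in the grammar becomes `+`. `local_part` is built from the dot-separated `atoms`, as in the single-pattern version, so the composed string should come out identical to that pattern.

```python
import re

# Each named definition is just a regex fragment held in a string.
comment = r'\(([^\(\)\\]|\\.)*\)'
atom = r"[a-zA-Z0-9!#$%&'*+\-/=?\^_`{|}~]+"
atoms = atom + r'(\.' + atom + r')*'          # dot-separated atoms
person = r'"([^"\\]|\\.)*"'                   # quoted string
location = r'\[([^\[\]\\]|\\.)*\]'            # domain literal in brackets
local_part = '(' + atoms + '|' + person + ')'
domain = '(' + atoms + '|' + location + ')'
# Juxtaposition in the grammar is plain concatenation of the fragments.
start = '(' + comment + ' )?' + local_part + '@' + domain + '( ' + comment + ')?'

ADDR_SPEC = re.compile(start)

print(start)
print(ADDR_SPEC.fullmatch('john.doe@example.com') is not None)
```

Printing `start` shows the monolithic regexp again, which is the point. The readable version and the one-line version describe the same language.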

Rewrite it in XML syntax. Let us see which syntax is more readable.

David

