Class RegExpSupport
Constants, some built from others by static methods, to expedite common tasks that use regular expressions.
Inheritance
Inherited Members
Namespace: WizardWrx
Assembly: WizardWrx.Common.dll
Syntax
public static class RegExpSupport
Remarks
Reference: RegExLib.com Regular Expression Cheat Sheet (.NET), at the cross reference cited below.
Fields
CARRIAGE_RETURN
Represents a Carriage Return (CR in Windows text) in a Regular Expression.
Declaration
public const string CARRIAGE_RETURN = "\\r"
Field Value
Type | Description |
---|---|
System.String |
ESCAPED_QUOTE
Escaped quote, used to embed quotation marks in regular expressions.
Declaration
public const string ESCAPED_QUOTE = "\\\""
Field Value
Type | Description |
---|---|
System.String |
FRIEDL_GRAY_WHOLE_HTML_TAG_MATCH
Use this to get the whole XML body in one long string. Repeated uses should allow you to perform stepwise refinements, until you get to the innermost tag.
Declaration
public const string FRIEDL_GRAY_WHOLE_HTML_TAG_MATCH = "<(/?\\w+)((\\s+\\w+(\\s*=\\s*(?:\".*?\"|'.*?'|[^'\">\\s]+))?)+\\s*|\\s*)/?>.+?</\\1>"
Field Value
Type | Description |
---|---|
System.String |
FRIEDL_HTML_TAG_MATCH
Jeffrey Friedl's regular expression for matching any arbitrary HTML tag.
Jeffrey Friedl is the author of Mastering Regular Expressions, published by O'Reilly, which is regarded as the "Bible" of Regular Expressions.
Declaration
public const string FRIEDL_HTML_TAG_MATCH = "</?\\w+((\\s+\\w+(\\s*=\\s*(?:\".*?\"|'.*?'|[^'\">\\s]+))?)+\\s*|\\s*)/?>"
Field Value
Type | Description |
---|---|
System.String |
MATCH_ALTERNATION
Like the binary Logical OR operator in a logical expression, this character says "match either the expression on its left OR the expression on its right."
Regular expressions may contain many alternations, forming a group of alternatives. The .NET engine tries the alternatives from left to right, so their order can matter when they overlap.
Declaration
public const char MATCH_ALTERNATION = '|'
Field Value
Type | Description |
---|---|
System.Char |
MATCH_ESCAPE
Preceding another meta-character, one of these tells the Engine to treat the meta-character as a literal.
Preceding certain other characters, one of these signals a special, non-printing character. For example, preceding a lower case a, it signifies an Alarm (Bell). More commonly, before a lower case t it signifies a Tab, before a lower case n a Newline, and before a lower case r a Carriage Return.
N. B. A Newline in the .NET RegExp Engine and in the Perl RegExp Engine are two different things.
Declaration
public const char MATCH_ESCAPE = '\\'
Field Value
Type | Description |
---|---|
System.Char |
MATCH_GROUP_BEGIN
Define the start of a group. This is the same as a subexpression in Perl.
Declaration
public const char MATCH_GROUP_BEGIN = '('
Field Value
Type | Description |
---|---|
System.Char |
MATCH_GROUP_END
Define the end of a group. This is the same as a subexpression in Perl.
Declaration
public const char MATCH_GROUP_END = ')'
Field Value
Type | Description |
---|---|
System.Char |
MATCH_MULTIPLE_PREVIOUS_CHAR
Match zero or more of the previous character or expression.
Declaration
public const char MATCH_MULTIPLE_PREVIOUS_CHAR = '*'
Field Value
Type | Description |
---|---|
System.Char |
MATCH_ONE_OR_MORE_PREVIOUS_CHAR
Match one or more of the previous character or expression.
Declaration
public const char MATCH_ONE_OR_MORE_PREVIOUS_CHAR = '+'
Field Value
Type | Description |
---|---|
System.Char |
MATCH_SHORTEST
Append to a greedy match to make it match the fewest possible characters.
Declaration
public const char MATCH_SHORTEST = '?'
Field Value
Type | Description |
---|---|
System.Char |
MATCH_STRING_END
Match the end of the entire string (or the position just before a trailing Newline) by default. With the Multiline modifier (RegexOptions.Multiline), match the end of each line instead.
Declaration
public const char MATCH_STRING_END = '$'
Field Value
Type | Description |
---|---|
System.Char |
MATCH_STRING_START
Match the start of the entire string by default. With the Multiline modifier (RegexOptions.Multiline), match the start of each line instead.
Declaration
public const char MATCH_STRING_START = '^'
Field Value
Type | Description |
---|---|
System.Char |
MATCH_WILDCARD_CHAR
Match any single character except a Newline, unless the String modifier (RegexOptions.Singleline in .NET) is in effect, which adds the Newline to the list of matched characters.
Follow this character with MATCH_MULTIPLE_PREVIOUS_CHAR to extend the match to a run of any characters.
Append MATCH_SHORTEST after this character and MATCH_MULTIPLE_PREVIOUS_CHAR (that is, .*?) to limit the match to the fewest possible characters.
Declaration
public const char MATCH_WILDCARD_CHAR = '.'
Field Value
Type | Description |
---|---|
System.Char |
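The following is a minimal sketch of how the three single-character constants combine into greedy and least-greedy wildcard patterns. The class WildcardDemo and its methods are hypothetical and not part of WizardWrx; the constants are redeclared locally, with the documented values, so that the sketch compiles without WizardWrx.Common.dll.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the WizardWrx library.
public static class WildcardDemo
{
    // Local copies of the documented constant values.
    const char MATCH_WILDCARD_CHAR = '.';
    const char MATCH_MULTIPLE_PREVIOUS_CHAR = '*';
    const char MATCH_SHORTEST = '?';

    // ".*" matches as many characters as possible.
    public static string Greedy() =>
        $"{MATCH_WILDCARD_CHAR}{MATCH_MULTIPLE_PREVIOUS_CHAR}";

    // ".*?" matches as few characters as possible.
    public static string Lazy() => Greedy() + MATCH_SHORTEST;

    // Return the first double-quoted run in input, using the supplied
    // wildcard pattern between the two quotation marks.
    public static string FirstQuoted(string input, string anyChars) =>
        Regex.Match(input, "\"" + anyChars + "\"").Value;
}
```

Given the input say "one" and "two", FirstQuoted with Greedy() returns "one" and "two" (everything from the first quotation mark to the last), while FirstQuoted with Lazy() returns only "one".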
MODIFIED_FRIEDL_HTML_TAG_MATCH
This is a derivation of Jeffrey Friedl's regular expression, adapted to capture the tag name in the first submatch.
Declaration
public const string MODIFIED_FRIEDL_HTML_TAG_MATCH = "<(/?\\w+)((\\s+\\w+(\\s*=\\s*(?:\".*?\"|'.*?'|[^'\">\\s]+))?)+\\s*|\\s*)/?>"
Field Value
Type | Description |
---|---|
System.String |
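As an illustration of the first submatch, the sketch below lists the tag names that this expression captures. TagNameDemo and TagNames are hypothetical names; the pattern is copied from the declaration above and redeclared locally so the sketch stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the WizardWrx library.
public static class TagNameDemo
{
    // Local copies of the documented constant values.
    const string MODIFIED_FRIEDL_HTML_TAG_MATCH =
        "<(/?\\w+)((\\s+\\w+(\\s*=\\s*(?:\".*?\"|'.*?'|[^'\">\\s]+))?)+\\s*|\\s*)/?>";
    const int REGEXP_FIRST_SUBMATCH = 1;

    // Collect the first submatch (the tag name) of every tag in html.
    public static List<string> TagNames(string html)
    {
        var names = new List<string>();
        foreach (Match m in Regex.Matches(html, MODIFIED_FRIEDL_HTML_TAG_MATCH))
        {
            names.Add(m.Groups[REGEXP_FIRST_SUBMATCH].Value);
        }
        return names;
    }
}
```

Because the first group is (/?\w+), the captured name of a closing tag includes its leading slash: for the input <p class="x">Hi <br/> there</p>, TagNames returns p, br, and /p.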
NEWLINE
Represents a Newline (CR/LF in Windows text) in a Regular Expression.
Declaration
public const string NEWLINE = "\\r\\n"
Field Value
Type | Description |
---|---|
System.String |
Remarks
See "How to avoid VBScript regular expression gotchas," at http://www.xaprb.com/blog/2005/11/04/vbscript-regular-expression-gotchas/, especially the responses.
PAGE_TAG_PREFIX
Match the beginning of the Page tag in an ASP.NET page.
Declaration
public const string PAGE_TAG_PREFIX = "<%@ Page"
Field Value
Type | Description |
---|---|
System.String |
PAGE_TAG_SUFFIX
Match the end of the Page tag in an ASP.NET page.
Declaration
public const string PAGE_TAG_SUFFIX = "%>"
Field Value
Type | Description |
---|---|
System.String |
REGEXP_FIRST_MATCH
Not surprisingly, the .NET regular expression engine returns a collection of matches. Like all collections, individual members are numbered from zero.
Declaration
public const int REGEXP_FIRST_MATCH = 0
Field Value
Type | Description |
---|---|
System.Int32 |
REGEXP_FIRST_SUBMATCH
In the .NET version of the regular expression matching engine, the subexpressions are numbered from 1, just as they are in Perl.
Declaration
public const int REGEXP_FIRST_SUBMATCH = 1
Field Value
Type | Description |
---|---|
System.Int32 |
REGEXP_WHOLE_MATCH
In the .NET version of the regular expression matching engine, the first group, whose index is zero, matches the whole expression.
Declaration
public const int REGEXP_WHOLE_MATCH = 0
Field Value
Type | Description |
---|---|
System.Int32 |
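The three index constants can be read together as in the sketch below, which takes the first match from a collection and then reads its whole-match group and its first submatch. GroupIndexDemo and FirstMatchParts are hypothetical names, and the constants are redeclared locally with their documented values.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the WizardWrx library.
public static class GroupIndexDemo
{
    // Local copies of the documented constant values.
    const int REGEXP_FIRST_MATCH = 0;
    const int REGEXP_WHOLE_MATCH = 0;
    const int REGEXP_FIRST_SUBMATCH = 1;

    // Return the whole text and the first submatch of the first match.
    // This sketch assumes at least one match; indexing an empty
    // MatchCollection throws ArgumentOutOfRangeException.
    public static (string Whole, string Sub) FirstMatchParts(string input, string pattern)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        Match first = matches[REGEXP_FIRST_MATCH];
        return (first.Groups[REGEXP_WHOLE_MATCH].Value,
                first.Groups[REGEXP_FIRST_SUBMATCH].Value);
    }
}
```

For the input key=value; other=thing and the pattern (\w+)=\w+, the whole match is key=value and the first submatch is key.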
SGML_CLOSING_TAG_ANY
Match any closing HTML or XML tag.
Except in special cases, you should employ the IgnoreCase modifier.
Declaration
public const string SGML_CLOSING_TAG_ANY = "</(.*?)>"
Field Value
Type | Description |
---|---|
System.String |
SGML_CLOSING_TAG_ARBITRARY
Match an arbitrary closing HTML or XML tag.
Except in special cases, you should employ the IgnoreCase modifier.
You must interpolate the tag name into this string by calling the static string.Format method, passing this string as the format and the tag as the sole substitution value.
You may also pass a tag name to method MatchArbitraryHtmlClosingTag, which returns a pattern. For example, to find all Anchor tags, pass "A" to MatchArbitraryHtmlClosingTag.
Declaration
public const string SGML_CLOSING_TAG_ARBITRARY = "</({0})>"
Field Value
Type | Description |
---|---|
System.String |
SGML_COMPLETE_BODY
Match the whole body of any HTML document. Except in special cases, you must employ the String and IgnoreCase modifiers to get this expression to work.
Declaration
public const string SGML_COMPLETE_BODY = "<body .*?>(.*?)</body>"
Field Value
Type | Description |
---|---|
System.String |
SGML_COMPLETE_HEAD
Match the entire Head section of any HTML document. Except in special cases, you must employ the String and IgnoreCase modifiers to get this expression to work.
Declaration
public const string SGML_COMPLETE_HEAD = "<head.*?>(.*?)</head>"
Field Value
Type | Description |
---|---|
System.String |
SGML_COMPLETE_HTML_DOC
Match the entirety of any HTML document. Use this expression to discard preceding HTTP headers. Except in special cases, you must employ the String and IgnoreCase modifiers to get this expression to work.
Declaration
public const string SGML_COMPLETE_HTML_DOC = "<html>(.*?)</html>"
Field Value
Type | Description |
---|---|
System.String |
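A minimal sketch of the header-discarding use follows, assuming that the String modifier corresponds to RegexOptions.Singleline in .NET. HtmlDocDemo and ExtractDocument are hypothetical names, and the pattern is redeclared locally from the declaration above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the WizardWrx library.
public static class HtmlDocDemo
{
    // Local copy of the documented constant value.
    const string SGML_COMPLETE_HTML_DOC = "<html>(.*?)</html>";

    // Return the document from <html> through </html>, dropping anything
    // before it (such as HTTP headers), or an empty string if no match.
    public static string ExtractDocument(string rawResponse)
    {
        // Singleline lets '.' cross line breaks (assumed to be the
        // "String modifier"); IgnoreCase accepts <HTML> as well as <html>.
        Match m = Regex.Match(
            rawResponse,
            SGML_COMPLETE_HTML_DOC,
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
        return m.Success ? m.Value : string.Empty;
    }
}
```

Note that the pattern requires a bare <html> opening tag; an opening tag that carries attributes, such as <html lang="en">, would not match this expression as declared.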
SGML_COMPLETE_TAG_ARBITRARY
Match an arbitrary HTML or XML tag that appears on a single line (or multiple lines, if the String modifier is employed).
Except in special cases, you should employ the IgnoreCase modifier.
You must interpolate the tag name into this string by calling the static string.Format method, passing this string as the format and the tag as the sole substitution value.
You may also pass a tag name to static method MatchArbitraryHtmlTag, which returns a pattern. For example, to find all Anchor tags, pass "A" to MatchArbitraryHtmlTag.
Declaration
public const string SGML_COMPLETE_TAG_ARBITRARY = "<({0})(.*?)>(.*?)</{0}>"
Field Value
Type | Description |
---|---|
System.String |
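The string.Format interpolation described above might look like the following sketch. ArbitraryTagDemo, BuildPattern, and FirstInnerText are hypothetical names, the template is redeclared locally, and RegexOptions.Singleline is assumed to be the .NET equivalent of the String modifier.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the WizardWrx library.
public static class ArbitraryTagDemo
{
    // Local copy of the documented constant value.
    const string SGML_COMPLETE_TAG_ARBITRARY = "<({0})(.*?)>(.*?)</{0}>";

    // Substitute the tag name for both {0} placeholders in the template.
    public static string BuildPattern(string tagName) =>
        string.Format(SGML_COMPLETE_TAG_ARBITRARY, tagName);

    // Return the text between the first matching opening and closing tags
    // (the third group of the template), or an empty string if none.
    public static string FirstInnerText(string html, string tagName)
    {
        Match m = Regex.Match(
            html,
            BuildPattern(tagName),
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[3].Value : string.Empty;
    }
}
```

BuildPattern("A") yields <(A)(.*?)>(.*?)</A>. With IgnoreCase and Singleline in effect, that pattern matches a lower case anchor element whose text spans two lines.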
SGML_OPENING_TAG_ANY
Match any opening HTML or XML tag.
Except in special cases, you should employ the IgnoreCase modifier.
Declaration
public const string SGML_OPENING_TAG_ANY = "<(.*?)(.*?)>"
Field Value
Type | Description |
---|---|
System.String |
SGML_OPENING_TAG_ARBITRARY
Match an arbitrary opening HTML or XML tag.
Except in special cases, you should employ the IgnoreCase modifier.
You must interpolate the tag name into this string by calling the static string.Format method, passing this string as the format and the tag as the sole substitution value.
You may also pass a tag name to method MatchArbitraryHtmlOpeningTag, which returns a pattern. For example, to find all Anchor tags, pass "A" to MatchArbitraryHtmlOpeningTag.
Declaration
public const string SGML_OPENING_TAG_ARBITRARY = "<({0})(.*?)>"
Field Value
Type | Description |
---|---|
System.String |
TITLE_ATTRIBUTE_LABEL
Title attribute of the ASP.NET Page tag looks like this.
Declaration
public const string TITLE_ATTRIBUTE_LABEL = "Title="
Field Value
Type | Description |
---|---|
System.String |
Methods
ExtractTextBetweenMatches(MatchCollection, Int32, String)
Given the System.Text.RegularExpressions.Match at index pintMatchIndex in System.Text.RegularExpressions.MatchCollection prxpMatchCollection, return the substring that follows the matching text in string pstrInputString up to the beginning of the next match, or the rest of the string in the case of the last match.
Declaration
public static string ExtractTextBetweenMatches(MatchCollection prxpMatchCollection, int pintMatchIndex, string pstrInputString)
Parameters
Type | Name | Description |
---|---|---|
System.Text.RegularExpressions.MatchCollection | prxpMatchCollection | Pass in a reference to the System.Text.RegularExpressions.MatchCollection attached to a System.Text.RegularExpressions.Regex that has one or more matches. |
System.Int32 | pintMatchIndex | Pass in an integer that represents the index of the Match in |
System.String | pstrInputString | Pass in a reference to the string that was fed into the |
Returns
Type | Description |
---|---|
System.String | There are two possible return values: |
Remarks
This method was perfected independently in my RegExpLab project.
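The behavior described in the summary can be sketched as follows. This is not the library's implementation; BetweenMatchesDemo and TextAfterMatch are hypothetical names, and the sketch assumes that the collection was produced from the same input string and that the index is in range.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the WizardWrx library.
public static class BetweenMatchesDemo
{
    // Return the text that follows the match at 'index', up to the start
    // of the next match, or to the end of 'input' for the last match.
    public static string TextAfterMatch(MatchCollection matches, int index, string input)
    {
        Match current = matches[index];
        int start = current.Index + current.Length;

        // The next match, if any, marks where the extracted text ends.
        int end = index + 1 < matches.Count
            ? matches[index + 1].Index
            : input.Length;

        return input.Substring(start, end - start);
    }
}
```

For the input a,b;c and the pattern [,;], index 0 yields b (the text between the comma and the semicolon) and index 1 yields c (the rest of the string after the last match).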
MatchAnyCharacterGreedy()
Return a string that matches the maximum number of any character.
Declaration
public static string MatchAnyCharacterGreedy()
Returns
Type | Description |
---|---|
System.String |
MatchAnyCharacterLeastGreedy()
Return a string that matches the minimum number of any character.
Declaration
public static string MatchAnyCharacterLeastGreedy()
Returns
Type | Description |
---|---|
System.String |
MatchArbitraryHtmlClosingTag(String)
Interpolate a tag name into the SGML_CLOSING_TAG_ARBITRARY match expression template.
Declaration
public static string MatchArbitraryHtmlClosingTag(string pstrTagName)
Parameters
Type | Name | Description |
---|---|---|
System.String | pstrTagName | String containing the name of the tag to match. |
Returns
Type | Description |
---|---|
System.String | A Regular Expression match expression that will match the closing tag named in argument pstrTagName. |
MatchArbitraryHtmlOpeningTag(String)
Interpolate a tag name into the SGML_OPENING_TAG_ARBITRARY match expression template.
Declaration
public static string MatchArbitraryHtmlOpeningTag(string pstrTagName)
Parameters
Type | Name | Description |
---|---|---|
System.String | pstrTagName | String containing the name of the tag to match. |
Returns
Type | Description |
---|---|
System.String | A Regular Expression match expression that will match the opening tag named in argument pstrTagName. |
MatchArbitraryHtmlTag(String)
Interpolate a tag name into the SGML_COMPLETE_TAG_ARBITRARY match expression template.
Declaration
public static string MatchArbitraryHtmlTag(string pstrTagName)
Parameters
Type | Name | Description |
---|---|---|
System.String | pstrTagName | String containing the name of the tag to match. |
Returns
Type | Description |
---|---|
System.String | A Regular Expression match expression that will match the tag named in argument pstrTagName. |
MatchAspNetPageTag()
Return a string that matches the Page tag in an ASP.NET document.
Declaration
public static string MatchAspNetPageTag()
Returns
Type | Description |
---|---|
System.String |
MatchFileName(String, String)
Match file names against a true regular expression, as opposed to the anemic masks supported by DOS and Windows. Though occasionally referred to as regular expressions, file specifications that use DOS wild cards are a far cry from true regular expressions.
Declaration
public static bool MatchFileName(string pstrPathString, string pstrRegExpToMatch)
Parameters
Type | Name | Description |
---|---|---|
System.String | pstrPathString | Specify the path string to match against the PCRE in pstrRegExpToMatch. |
System.String | pstrRegExpToMatch | Specify the Perl Compatible Regular Expression against which to evaluate pstrPathString. |
Returns
Type | Description |
---|---|
System.Boolean | The function returns TRUE if neither string is null or empty AND the PCRE in pstrRegExpToMatch matches pstrPathString. |
Remarks
This method could have been coded inline. However, since I have at least one other project in the works that requires it, I segregated it in this routine in this small, easily navigable class.
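A routine with the documented return behavior could be sketched as below. This is an assumption-laden illustration rather than the library's code: FileNameMatchDemo and FileNameMatches are hypothetical names, and the sketch applies no RegexOptions, which may differ from what MatchFileName actually does.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the WizardWrx library.
public static class FileNameMatchDemo
{
    // Return true only if both strings are non-empty and the regular
    // expression matches somewhere in the path string.
    public static bool FileNameMatches(string path, string pattern)
    {
        if (string.IsNullOrEmpty(path) || string.IsNullOrEmpty(pattern))
        {
            return false;
        }
        return Regex.IsMatch(path, pattern);
    }
}
```

For example, the expression ^report_\d{4}\.csv$ accepts report_2024.csv but rejects report_24.csv, a distinction that a DOS wild card mask such as report_*.csv cannot express.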
MatchHTMLPageTitleAttribute()
Expression to match the Title attribute of an ASP.NET page.
Declaration
public static string MatchHTMLPageTitleAttribute()
Returns
Type | Description |
---|---|
System.String |