Boost C++ Libraries Home Libraries People FAQ More

PrevUpHomeNext

Localization and Regex Traits

Overview

Matching a regular expression against a string often requires locale-dependent information. For example, how are case-insensitive comparisons performed? The locale-sensitive behavior is captured in a traits class. xpressive provides three traits class templates: cpp_regex_traits<>, c_regex_traits<> and null_regex_traits<>. The first wraps a std::locale, the second wraps the global C locale, and the third is a stub traits type for use when searching non-character data. All traits templates conform to the Regex Traits Concept.

Setting the Default Regex Trait

By default, xpressive uses cpp_regex_traits<> for all patterns. This causes all regex objects to use the global std::locale. If you compile with BOOST_XPRESSIVE_USE_C_TRAITS defined, then xpressive will use c_regex_traits<> by default.

Using Custom Traits with Dynamic Regexes

To create a dynamic regex that uses a custom traits object, you must use regex_compiler<>. The basic steps are shown in the following example:

// Declare a regex_compiler that uses the global C locale
regex_compiler<char const *, c_regex_traits<char> > crxcomp;
cregex crx = crxcomp.compile( "\\w+" );

// Declare a regex_compiler that uses a custom std::locale
std::locale loc = /* ... create a locale here ... */;
regex_compiler<char const *, cpp_regex_traits<char> > cpprxcomp(loc);
cregex cpprx = cpprxcomp.compile( "\\w+" );

The regex_compiler objects act as regex factories. Once they have been imbued with a locale, every regex object they create will use that locale.

Using Custom Traits with Static Regexes

If you want a particular static regex to use a different set of traits, you can use the special imbue() pattern modifier. For instance:

// Define a regex that uses the global C locale
c_regex_traits<char> ctraits;
sregex crx = imbue(ctraits)( +_w );

// Define a regex that uses a customized std::locale
std::locale loc = /* ... create a locale here ... */;
cpp_regex_traits<char> cpptraits(loc);
sregex cpprx1 = imbue(cpptraits)( +_w );

// A shorthand for above
sregex cpprx2 = imbue(loc)( +_w );

The imbue() pattern modifier must wrap the entire pattern. It is an error to imbue only part of a static regex. For example:

// ERROR! Cannot imbue() only part of a regex
sregex error = _w >> imbue(loc)( _w );

Searching Non-Character Data With null_regex_traits

With xpressive static regexes, you are not limitted to searching for patterns in character sequences. You can search for patterns in raw bytes, integers, or anything that conforms to the Char Concept. The null_regex_traits<> makes it simple. It is a stub implementation of the Regex Traits Concept. It recognizes no character classes and does no case-sensitive mappings.

For example, with null_regex_traits<>, you can write a static regex to find a pattern in a sequence of integers as follows:

// some integral data to search
int const data[] = {0, 1, 2, 3, 4, 5, 6};

// create a null_regex_traits<> object for searching integers ...
null_regex_traits<int> nul;

// imbue a regex object with the null_regex_traits ...
basic_regex<int const *> rex = imbue(nul)(1 >> +((set= 2,3) | 4) >> 5);
match_results<int const *> what;

// search for the pattern in the array of integers ...
regex_search(data, data + 7, what, rex);

assert(what[0].matched);
assert(*what[0].first == 1);
assert(*what[0].second == 6);

PrevUpHomeNext