Konstantin Varlamov e9adcc488f
[libc++][regex] Correctly adjust match prefix for zero-length matches. (#94550)
For regex patterns that produce zero-length matches, there is one
(imaginary) match in-between every character in the sequence being
searched (as well as before the first character and after the last
character). It's easiest to demonstrate using replacement:
`std::regex_replace("abc"s, "!", "")` should produce `!a!b!c!`, where
each exclamation mark makes a zero-length match visible.

Currently our implementation doesn't correctly set the prefix of each
zero-length match, "swallowing" the characters separating the imaginary
matches -- e.g. when going through zero-length matches within `abc`, the
corresponding prefixes should be `{'', 'a', 'b', 'c'}`, but before this
patch they will all be empty (`{'', '', '', ''}`). This happens in the
implementation of `regex_iterator::operator++`. Note that the Standard
spells out quite explicitly that the prefix might need to be adjusted
when dealing with zero-length matches in
[`re.regiter.incr`](http://eel.is/c++draft/re.regiter.incr):
> In all cases in which the call to `regex_search` returns `true`,
`match.prefix().first` shall be equal to the previous value of
`match[0].second`... It is unspecified how the implementation makes
these adjustments.

[Reproduction example](https://godbolt.org/z/8ve6G3dav)
```cpp
#include <iostream>
#include <regex>
#include <string>

int main() {
  std::string str = "abc";
  std::regex empty_matching_pattern("");

  { // The underlying problem is that `regex_iterator::operator++` doesn't update
    // the prefix correctly.
    std::sregex_iterator i(str.begin(), str.end(), empty_matching_pattern), e;
    std::cout << "\"";
    for (; i != e; ++i) {
      const std::ssub_match& prefix = i->prefix();
      std::cout << prefix.str();
    }
    std::cout << "\"\n";
    // Before the patch: ""
    // After the patch: "abc"
  }

  { // `regex_replace` makes the problem very visible.
    std::string replaced = std::regex_replace(str, empty_matching_pattern, "!");
    std::cout << "\"" << replaced << "\"\n";
    // Before the patch: "!!!!"
    // After the patch: "!a!b!c!"
  }
}
```

Fixes #64451

rdar://119912002
2024-06-07 11:15:02 -04:00
..