
LibRegex: Explicitly check if a character falls into a table-based range

Previously, for a regex such as /[a-sy-z]/i, we would incorrectly think
the character "u" fell into the range "a-s" because neither of the
conditions "u > s && U > s" or "u < a && U < a" would be true, resulting
in the lookup falling back to assuming the character is in the range.

Instead, first explicitly check if the character falls into the range,
rather than checking if it falls outside the range. If the explicit
checks fail, then we know the character is outside the range.
Timothy Flynn, 2 years ago
Parent
Current commit
48cb15283a
2 files changed, 11 insertions and 5 deletions
  1. Tests/LibRegex/Regex.cpp (4 insertions, 1 deletion)
  2. Userland/Libraries/LibRegex/RegexByteCode.cpp (7 insertions, 4 deletions)

+ 4 - 1
Tests/LibRegex/Regex.cpp

@@ -690,7 +690,10 @@ TEST_CASE(ECMA262_match)
         { "a|$"sv, "x"sv, true, (ECMAScriptFlags)regex::AllFlags::Global }, // #11940, Global (not the 'g' flag) regexps should attempt to match the zero-length end of the string too.
         { "foo\nbar"sv, "foo\nbar"sv, true }, // #12126, ECMA262 regexp should match literal newlines without the 's' flag.
         { "foo[^]bar"sv, "foo\nbar"sv, true }, // #12126, ECMA262 regexp should match newline with [^].
-        { "^[_A-Z]+$"sv, "_aA"sv, true, ECMAScriptFlags::Insensitive } // Insensitive lookup table: characters in a range do not necessarily lie in the same range after being converted to lowercase.
+        { "^[_A-Z]+$"sv, "_aA"sv, true, ECMAScriptFlags::Insensitive }, // Insensitive lookup table: characters in a range do not necessarily lie in the same range after being converted to lowercase.
+        { "^[a-sy-z]$"sv, "b"sv, true, ECMAScriptFlags::Insensitive },
+        { "^[a-sy-z]$"sv, "y"sv, true, ECMAScriptFlags::Insensitive },
+        { "^[a-sy-z]$"sv, "u"sv, false, ECMAScriptFlags::Insensitive },
     };
     // clang-format on
 

+ 7 - 4
Userland/Libraries/LibRegex/RegexByteCode.cpp

@@ -557,11 +557,14 @@ ALWAYS_INLINE ExecutionResult OpCode_Compare::execute(MatchInput const& input, M
                     upper_case_needle = to_ascii_uppercase(needle);
                     lower_case_needle = to_ascii_lowercase(needle);
                 }
-                if (lower_case_needle > range.to && upper_case_needle > range.to)
+
+                if (lower_case_needle >= range.from && lower_case_needle <= range.to)
+                    return 0;
+                if (upper_case_needle >= range.from && upper_case_needle <= range.to)
+                    return 0;
+                if (lower_case_needle > range.to || upper_case_needle > range.to)
                     return 1;
-                if (lower_case_needle < range.from && upper_case_needle < range.from)
-                    return -1;
-                return 0;
+                return -1;
             });
 
             if (matching_range) {
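
The difference between the two comparators in the hunk above can be reproduced in isolation. The following is a minimal sketch, not the LibRegex code itself: `CharRange`, `compare_before`, `compare_after`, and `run_examples` are hypothetical names introduced for illustration, and the real comparator is a lambda inside `OpCode_Compare::execute` that is handed to a search over the range table. The return convention (0 for "in range", 1 for "above the range", -1 for "below the range") is taken from the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one entry of the range lookup table.
struct CharRange {
    uint32_t from;
    uint32_t to;
};

// Comparator as it read before the commit: it only reports "outside" when
// BOTH case variants lie on the same side of the range, and otherwise
// falls through to 0 ("in range").
int compare_before(CharRange range, uint32_t lower_case_needle, uint32_t upper_case_needle)
{
    if (lower_case_needle > range.to && upper_case_needle > range.to)
        return 1;
    if (lower_case_needle < range.from && upper_case_needle < range.from)
        return -1;
    return 0;
}

// Comparator as it reads after the commit: membership is tested explicitly
// first, and only then is the side of the range decided.
int compare_after(CharRange range, uint32_t lower_case_needle, uint32_t upper_case_needle)
{
    if (lower_case_needle >= range.from && lower_case_needle <= range.to)
        return 0;
    if (upper_case_needle >= range.from && upper_case_needle <= range.to)
        return 0;
    if (lower_case_needle > range.to || upper_case_needle > range.to)
        return 1;
    return -1;
}

void run_examples()
{
    CharRange a_to_s { 'a', 's' };
    CharRange y_to_z { 'y', 'z' };

    // 'u' (117) is above 's' (115) but 'U' (85) is below 'a' (97), so the
    // old comparator hits neither "outside" branch and reports a match.
    assert(compare_before(a_to_s, 'u', 'U') == 0);
    // The new comparator finds neither variant inside the range and
    // reports that the needle lies above it.
    assert(compare_after(a_to_s, 'u', 'U') == 1);
    // Neither variant is inside y-z or above it, so the needle sorts below.
    assert(compare_after(y_to_z, 'u', 'U') == -1);

    // Characters that really are in the range match under both versions.
    assert(compare_before(a_to_s, 'b', 'B') == 0);
    assert(compare_after(a_to_s, 'b', 'B') == 0);
    assert(compare_after(y_to_z, 'y', 'Y') == 0);
}
```

Calling `run_examples()` exercises the same three inputs as the new test cases ("b", "y", and "u" against `[a-sy-z]` with the insensitive flag), at the level of a single range comparison rather than a full regex match.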