ladybird/Userland/Libraries/LibWeb/DOM/CharacterData.idl
Shannon Booth d8759d9656 LibWeb: Use UTF-16 code unit offsets and lengths in CharacterData
We were previously assuming that the input offsets and lengths were all
raw byte offsets into a UTF-8 string. While our String representation
may internally be UTF-8, the external world sees it as UTF-16: offsets
are passed in as code unit offsets, and the returned length is counted
in code units.
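
To illustrate the mismatch, here is a minimal standalone C++ sketch that counts how many UTF-16 code units a UTF-8 string occupies. The helper name `utf16_length_of_utf8` is hypothetical and is not taken from LibWeb or AK; it assumes well-formed UTF-8 and performs no validation.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not LibWeb/AK API): number of UTF-16 code units
// needed to represent `utf8`. Assumes the input is well-formed UTF-8.
std::size_t utf16_length_of_utf8(std::string_view utf8)
{
    std::size_t code_units = 0;
    for (unsigned char byte : utf8) {
        // Continuation bytes (0b10xxxxxx) belong to a code point that was
        // already counted when its lead byte was seen.
        if ((byte & 0xC0) == 0x80)
            continue;
        // A lead byte >= 0xF0 starts a 4-byte sequence, i.e. a code point
        // above U+FFFF, which takes a surrogate pair (2 code units).
        code_units += (byte >= 0xF0) ? 2 : 1;
    }
    return code_units;
}
```

For U+1F600, encoded in UTF-8 as the four bytes F0 9F 98 80, the byte length is 4 but the UTF-16 length is 2, so any offset after such a character differs between the two views.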

Previously, the test included in this commit would crash Ladybird (and
otherwise return wrong values).

The implementation here is very inefficient. I am sure there is a much
smarter way to write it that would not need a conversion from UTF-8 to
a UTF-16 string (and then back again).
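
The round-trip approach described above can be sketched in standalone C++ roughly as follows. This is not the code from the commit: `utf8_to_utf16`, `utf16_to_utf8`, and `substring_data` are hypothetical names, the input is assumed to be well-formed UTF-8, `std::nullopt` stands in for the IndexSizeError the DOM spec requires when the offset is past the end, and replacing a split surrogate with U+FFFD is a choice made only for this sketch.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Decode UTF-8 into UTF-16 code units. Assumes well-formed input.
std::u16string utf8_to_utf16(std::string_view in)
{
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        // Keep only the payload bits of the lead byte.
        char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp >= 0x10000) {
            // Code points above U+FFFF become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

// Encode UTF-16 back to UTF-8. Lone surrogates become U+FFFD (sketch-only choice).
std::string utf16_to_utf8(std::u16string_view in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        bool is_high = cp >= 0xD800 && cp <= 0xDBFF;
        bool is_low = cp >= 0xDC00 && cp <= 0xDFFF;
        if (is_high && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i; // consumed the low surrogate as well
        } else if (is_high || is_low) {
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// substringData-like helper: `offset` and `count` are in UTF-16 code units.
// Returns std::nullopt where the DOM spec would throw IndexSizeError.
std::optional<std::string> substring_data(std::string_view utf8_data, std::size_t offset, std::size_t count)
{
    std::u16string utf16 = utf8_to_utf16(utf8_data);
    if (offset > utf16.size())
        return std::nullopt;
    // substr() clamps `count` to the remaining length, matching the spec.
    return utf16_to_utf8(std::u16string_view(utf16).substr(offset, count));
}
```

Every call pays for two full conversions of the data, which is the inefficiency mentioned above; a smarter version would presumably walk the UTF-8 bytes once and translate code unit offsets into byte offsets directly.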

Fixes: #20971
2023-12-23 20:41:41 +01:00

#import <DOM/ChildNode.idl>
#import <DOM/Element.idl>
#import <DOM/Node.idl>
// https://dom.spec.whatwg.org/#characterdata
[Exposed=Window]
interface CharacterData : Node {
    [LegacyNullToEmptyString] attribute DOMString data;
    [ImplementedAs=length_in_utf16_code_units] readonly attribute unsigned long length;
    DOMString substringData(unsigned long offset, unsigned long count);
    undefined appendData(DOMString data);
    undefined insertData(unsigned long offset, DOMString data);
    undefined deleteData(unsigned long offset, unsigned long count);
    undefined replaceData(unsigned long offset, unsigned long count, DOMString data);

    // https://dom.spec.whatwg.org/#interface-nondocumenttypechildnode
    readonly attribute Element? previousElementSibling;
    readonly attribute Element? nextElementSibling;
};
CharacterData includes ChildNode;