`i` is used as the index for 'old lines' in diff generation, not 'new
lines'. Using the wrong index would mean that for certain diffs the
prefixed context information would have wrong content, and could even
result in a crash.
Fix this, and add a test for an input which was previously crashing.
Multiple patches may be concatenated in the same patch file, such as git
commits which are changing multiple files at the same time. To handle
this, parse each patch in order in the patch file, and apply each patch
sequentially.
To determine whether we are at the end of a patch (and not just parsing
another hunk) the parser will look for a leading '@@ ' after every hunk.
If that is found, there is another hunk. Otherwise, we must be at the
end of this patch.
Implement the patch '-p' / '--strip' option, which strips the given
number of leading components from filenames parsed in the patch header.
If not given this option defaults to the basename of that path.
Given a set of lines from the file we are patching, and a patch itself,
this function will try and locate where in the file to apply that patch,
and write the result of patching that file (if successful) to the output
stream.
This is a somewhat naive implementation, but it is enough to parse a
simple unified patch header.
After parsing the patch header, the parser will be at the beginning of
the first hunks range, ready for that hunk to be parsed.
Parsing of the unified hunks now verifies that the expected number of
lines given by the unified location at the beginning of that hunk are
actually in that hunk. Furthermore, we no longer crash when given a
bogus unified range.
As a further benefit, this begins the scaffolding for a patch parser
which should assist us in parsing full patches - even when we are not
aware of the format that a patch has been written in.
There is a little bit more complexity involved here than the other
formats. In particular, this is due to the need to determine whether
an addition line or removal line is just that, or a 'change'.
The existing hunk data structure does not contain any way to easily
store information about context surrounding the additions and removals
in a hunk. While this does work fine for normal diffs (where there is
never any surrounding context) this data structure is quite limiting for
other use cases.
Without support for surrounding context it is not possible to:
* Add support for unified or context format to the diff utility to
output surrounding context.
* Be able to implement a patch utility that uses the surrounding
context to reliably locate where to apply a patch when a hunk range
does not apply perfectly.
This patch changes Diff::Hunk such that its data structure more closely
resembles a unified diff. Each line in a hunk is now either a change,
removal, addition or context.
Allowing hunks to have context inside of them exposes that HackStudio
heavily relies on there being no context in the hunks that it uses for
its' git gutter implementation. The fix here is simple - ask git to
produce us a diff that has no context in it!
Currently the only error that can happen is an OOM. However, in the
future there may be other errors that this function may throw, such as
detecting an invalid patch.
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
Even though the toolchain implicitly links against -lc, it does not know
where it should get LibC from except for the sysroot. In the case of
Clang this causes it to pick up the LibC stub instead, which might be
slightly outdated and feature missing symbols.
This is currently not an issue that manifests because we pass through
the dependency on LibC and other libraries by accident, which causes
CMake to link against the LibC target (instead of just the library),
and thus points the linker at the build output directory.
Since we are looking to fix that in the upcoming commits, let's make
sure that everything will still be able to find the proper LibC first.
Previously we would fail to generate any hunks if the old or the new
file was empty. We now do, with the original/target line index set to 0,
as specified by POSIX.
If the location started at 0, and / or the length was 0, it would
originally turn out to be a location of { -1, -1 } when LibDiff was
finished parsing, which was incorrect.
To fix this, we only subtract 1 if `start` or `length` isn't 0.
Previously the algorithm was being performed from the start of the
string to the end, which was a little more convenient when writing
the code, but made it more annoying to be able to properly talk
about the "start" of where the changes were happening, since we
can only re-construct the changes in reverse order of the initial
traversal.
Basically, doing the initial pass in reverse lets us reconstruct
the hunks in the correct order to begin with, and not have to worry
about reversing the hunks / lines within the hunks
For now this is just a standard implementation of the longest
common subsequence algorithm over the lines, except that it doesn't
do any coalescing of the lines. This isn't really ideal since
we get a single Hunk per changed line, and is definitely something
to improve in the future.
We had two functions for doing mostly the same thing. Combine both
of them into String::find() and use that everywhere.
Also add some tests to cover basic behavior.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)
Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.
We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.