`i` is used as the index for 'old lines' in diff generation, not 'new
lines'. Using the wrong index would mean that for certain diffs the
prefixed context information would have wrong content, and could even
result in a crash.
Fix this, and add a test for an input which was previously crashing.
The existing hunk data structure does not contain any way to easily
store information about context surrounding the additions and removals
in a hunk. While this does work fine for normal diffs (where there is
never any surrounding context) this data structure is quite limiting for
other use cases.
Without support for surrounding context it is not possible to:
* Add support for unified or context format to the diff utility to
output surrounding context.
* Be able to implement a patch utility that uses the surrounding
context to reliably locate where to apply a patch when a hunk range
does not apply perfectly.
This patch changes Diff::Hunk such that its data structure more closely
resembles a unified diff. Each line in a hunk is now either a change,
removal, addition or context.
Allowing hunks to have context inside of them exposes that HackStudio
heavily relies on there being no context in the hunks that it uses for
its' git gutter implementation. The fix here is simple - ask git to
produce us a diff that has no context in it!
Previously we would fail to generate any hunks if the old or the new
file was empty. We now do, with the original/target line index set to 0,
as specified by POSIX.
Previously the algorithm was being performed from the start of the
string to the end, which was a little more convenient when writing
the code, but made it more annoying to be able to properly talk
about the "start" of where the changes were happening, since we
can only re-construct the changes in reverse order of the initial
traversal.
Basically, doing the initial pass in reverse lets us reconstruct
the hunks in the correct order to begin with, and not have to worry
about reversing the hunks / lines within the hunks
For now this is just a standard implementation of the longest
common subsequence algorithm over the lines, except that it doesn't
do any coalescing of the lines. This isn't really ideal since
we get a single Hunk per changed line, and is definitely something
to improve in the future.