Rather than overwhelming users with verbiage, I will hide most of my explanation unless it's asked for. My message is still not particularly brief, but it's no longer insanely long.
groggy: The ifdef_stack = [None] assignment made wmllint crash on nested
#if blocks. The following block of WML should suffice to make it crash. The
inner #endif deletes the data about the #ifs encountered so far.
...
#ifver WESNOTH_VERSION >= 1.11.0
#ifhave ~add-ons/UMC_Music_Book_1/_main.cfg
[binary_path]
path=data/add-ons/UMC_Music_Book_1
[/binary_path]
#endif
#else
#ifhave ~add-ons/UMC_Music/_main.cfg
[binary_path]
path=data/add-ons/UMC_Music
[/binary_path]
#endif
#endif
...
Rather than automatically putting the soundpath in the first frame, we collect a list of begin= keys and their locations. When it's time to convert, we check to see if any of those begin times match with the sound time, and if so, put sound in that [frame]. Only if there is no match do we default to the first frame, and we log a message noting the lack of a match.
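A minimal sketch of the matching step, assuming the begin= values and the sound time have already been parsed to integers. The function name pick_sound_frame and the (value, line index) pairs are hypothetical, not the names used in wmllint.

```python
def pick_sound_frame(begins, sound_time):
    """Return (line_index, matched) for the [frame] that should get the sound.

    begins is a list of (begin_value, line_index) pairs in file order.
    """
    if not begins:
        raise ValueError("no begin= keys were collected")
    for begin, index in begins:
        if begin == sound_time:
            return index, True
    # No begin= matched the sound time: default to the first frame.
    # The caller is expected to log the lack of a match.
    return begins[0][1], False


frames = [(-200, 10), (-100, 14), (0, 18)]
matched = pick_sound_frame(frames, -100)
fallback = pick_sound_frame(frames, 50)
```

Returning the matched flag alongside the index lets the caller decide whether to log the message.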
Some [sound] tags include a sound_miss key as well. To preserve this data, I created two new variables to capture the sound_miss= and time= values, and use them to fill out the SOUND:HIT_AND_MISS macro. Because the macro is inserted before [frame] instead of after, 'if soundpath' needs to be moved up before the insertion so that the macro ends up under [attack_anim], rather than in the middle of [attack].
It's probably obvious what I tried to do. Unfortunately, only the first string works.
I also added another "and not" condition, to keep the same file from getting multiple entries in is_main.
After my last change, I noticed a puzzling failure by wmllint to convert a weapon special. This special was among some attributes that followed the [frame] sequence. It seems that Python does not wait for the earlier code block to complete before running the new one, and those lines aren't passed through the new block because they've been deleted and stashed in 'postframe'. When they're spewed back out, the new block has already passed those lines by.
I was relieved to find that this was not an issue introduced by my change, but an existing one. When I ran the original wmllint on the file, I found that the special= line got deleted, without being replaced by the [special] tags and macro. The latter is supposed to appear when wmllint hits the [/attack] tag, but never triggers because [/attack] has been changed to [/attack_anim].
Moving this code block up, so that abilities and specials are transformed before the [frame] lift (and 'postframe' stash), appeared to fix the problem. Hopefully, it won't cause a new one to show up.
This code block was actually producing some horrendous output, because key values were not reset to defaults at the closing [/attack] tag, even though many units have more than one attack. Also, the conversion was done when the first [frame] tag was encountered, although most authors put the [sound] block after [frame]s. So, what would typically happen is this:
* The first attack would be converted, usually without a soundpath. If there were any attributes after the [frame] sequence, the result would be non-functional, as the comment introducing this wmllint block warned (and wmllint would crash with an assertion error if "name=" happened to be one of them).
* Subsequent attacks would be converted, inheriting the sound and [attack_filter] from the soundpath and attackname of the *first* attack.
To fix these issues, I did the following:
* In order to do the conversion at a later stage, after the soundpath would normally have been picked up, the variable 'converting' was changed from a 0/1 value to a line index position.
* This makes it possible to move post-[frame] lines, for which purpose the new variables in_frame and postframe are created. When encountered, these lines are deleted and appended to postframe.
* When we get to [/attack], we still look to see if we are converting. If so, we go ahead with the replacement of lines[i], before the index position gets changed. Then we carry out the conversion that was originally carried out at the first [frame], using lines[converting] to do it at the same place.
* The lines in postframe are fed back in reverse order before the new closing [/attack] tag.
* Values are cleared to defaults, ready for the next [attack].
* It is no longer true that the frame sequence has to go last in [attack], so that part of the comment can be deleted.
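The stash-and-reinsert part of these steps can be sketched as follows. This is a simplified illustration, not the wmllint code: it reuses the names converting, in_frame and postframe, but the function lift_postframe is hypothetical, it handles a single [attack] block, and it leaves out the insertion of the new closing tag and the animation tags.

```python
def lift_postframe(lines):
    """Move attributes that follow the [frame] sequence to just before it."""
    out = list(lines)
    converting = None   # index of the first [frame], or None if none seen yet
    in_frame = False
    postframe = []
    i = 0
    while i < len(out):
        stripped = out[i].strip()
        if stripped == "[frame]":
            in_frame = True
            if converting is None:
                converting = i
        elif stripped == "[/frame]":
            in_frame = False
        elif stripped == "[/attack]":
            break
        elif converting is not None and not in_frame:
            # A post-[frame] line: delete it here and stash it.
            postframe.append(out.pop(i))
            continue
        i += 1
    if converting is not None:
        # Inserting at a fixed index in reverse order restores the
        # original order of the stashed lines.
        for line in reversed(postframe):
            out.insert(converting, line)
    return out


attack = ["[attack]", "name=sword", "[frame]", "begin=-100", "[/frame]",
          "damage=5", "special=backstab", "[/attack]"]
lifted = lift_postframe(attack)
```

Storing a line index in converting instead of a 0/1 flag is what makes the later insertion at the original [frame] position possible.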
First, the newline is added to "description = " rather than "new_line = ". But description was only changed if it didn't begin with a quotemark, meaning that those that *did* start with a quote weren't getting a newline.
Second, new_line was supposed to inherit indentation through "leader(syntactic)", but the line had already been stripped before "syntactic", in "fields = ".
I had noticed that the line replacing the get_hit_sound with DEFENSE_ANIM didn't have a newline, but assumed that it was part of 'comment'. Nope!
There are also two lines where a misplaced quotation mark led to an extraneous space being added to the end of a line.
I noticed that there were some additional weapon specials (marksman) and abilities (nightstalk, steadfast) that also had macros. I also saw examples of ability= keys that had comma-separated multiple values.
All of these variables are again defined as False when [unit] is in the line, but in_variation was missing from this earlier list. This caused wmllint to crash with an UnboundLocalError on a page of (UtBS) Kaleh-style macros that had no [unit] tag.
This came to my attention because of a Dark Elves scenario with a 'description=' key that was left blank for the value, crashing wmllint with an index error. More broadly, however, the operations in this section are pointless when there is no value.
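A minimal sketch of the guard, assuming the key and value are separated on the first "=". The helper name description_value is hypothetical.

```python
def description_value(line):
    """Return the value of a description= line, or None if it is blank."""
    fields = line.split("=", 1)
    if len(fields) < 2:
        return None
    value = fields[1].strip()
    if not value:
        # Indexing value[0] here would raise IndexError, and there is
        # nothing to transform anyway, so bail out early.
        return None
    return value


blank = description_value("description=")
filled = description_value('description= "A brave elf"')
```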
I realized that there was no need to glob Windows arguments if there were no arguments. This meant moving my previous code block above the "if not arguments" statement, which actually creates an argument. And it meant moving Elvish Hunter's code, since the double quote issue will stop my block from working. Once this decision was made, it made sense to put both code blocks under the same "if" conditions, and to check if there were actually any wildcards during EH's block, before running the arguments through glob.
The Windows cmd shell does not expand wildcards by default, unlike UNIX shells. This imports glob.glob and runs arguments through it on Windows.
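A minimal sketch of the idea, assuming the platform is identified by a sys.platform value starting with "win". The function name expand_arguments and its platform parameter are hypothetical and exist only so the behaviour can be tried on any system.

```python
import glob
import sys


def expand_arguments(arguments, platform=sys.platform):
    """Expand wildcards ourselves where the shell has not done it."""
    if not platform.startswith("win"):
        # UNIX shells have already expanded the wildcards.
        return list(arguments)
    expanded = []
    for arg in arguments:
        # glob.glob returns an empty list when nothing matches.
        expanded.extend(glob.glob(arg))
    return expanded
```

Note that glob.glob also drops a plain path that does not exist, so a caller may want to check for wildcards before globbing.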
This will be frontported once I'm done on this branch and ready to check out master again.
We consider a file a main file if it contains either [campaign], [binary_path], or [textdomain]. (Almost all mainfiles have at least one of those.)
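The test itself is small. A sketch, assuming the file's text is already in memory (MAIN_TAGS and looks_like_main_file are hypothetical names):

```python
MAIN_TAGS = ("[campaign]", "[binary_path]", "[textdomain]")


def looks_like_main_file(text):
    """True if the text contains any tag that marks a main file."""
    return any(tag in text for tag in MAIN_TAGS)


main_like = looks_like_main_file("[textdomain]\nname=wesnoth-foo\n[/textdomain]\n")
unit_like = looks_like_main_file("[unit]\nid=Elf\n[/unit]\n")
```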
Eventually, we check that the campaign directory exists and that the _main.cfg doesn't already exist. If true, we rename.
We look for a [frame] inside a [defend] tag. If there is no image_defensive, we also use any image as our defend_image. We warn if there is already a DEFENSE_ANIM. If there is a get_hit_sound, we go through with the transformation, but alert the porter.
While getting rid of the deprecated get_hit_sound, we can get rid of the even more hoary image_defensive.
We do this with new variables that are initially set to None. While in_unit, we look for the 'image_defensive=' key. If encountered, we record the line's index position (image_defensive) and its value (defend_image).
When we hit [/unit], we look back to see if there is already a DEFENSE_ANIM. If so, and its reaction image matches the value recorded in defend_image, we figure there is no need to preserve the old key, and enter it into a list of image_defensive attributes to be deleted (image_done). If we are creating a DEFENSE_ANIM, we use defend_image for the reaction image instead of doubling up the base image. Once this is done, again there is no point in keeping image_defensive around, and it is entered into image_done for deletion.
If neither of these cases is met, we offer warnings that an outdated key is in use.
When all of the file's lines have been iterated through, we can then remove those image_defensive lines that have been marked as unnecessary.
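A sketch of the deferred deletion, assuming image_done holds line index positions. The helper name remove_marked is hypothetical.

```python
def remove_marked(lines, image_done):
    """Return lines without the entries whose indices are in image_done."""
    doomed = set(image_done)
    return [line for i, line in enumerate(lines) if i not in doomed]


unit = ["[unit]", "image=elf.png", "image_defensive=elf-defend.png", "[/unit]"]
cleaned = remove_marked(unit, [2])
```

Deleting only after the loop has finished keeps the recorded index positions valid. Removing a line mid-iteration would shift every later index.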
Building on our earlier fix pointing this message's line number to the get_hit_sound, it would be even more useful to have the line number of the DEFENSE_ANIM as well. To get this, we change the has_defense_anim variable's value from False to None, and True to i. It is possible that a unit might have two macros for male and female variants, in which case we will assume that the get_hit_sound is most likely associated with the first.
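A sketch of the None/index convention, assuming a simple substring test for the macro. The function first_defense_anim is hypothetical.

```python
def first_defense_anim(lines):
    """Return the index of the first DEFENSE_ANIM line, or None."""
    has_defense_anim = None
    for i, line in enumerate(lines):
        # Only record the first macro. A second one is assumed to
        # belong to another gender variant.
        if "{DEFENSE_ANIM" in line and has_defense_anim is None:
            has_defense_anim = i
    return has_defense_anim


found = first_defense_anim(["{DEFENSE_ANIM a b c}", "hp=30", "{DEFENSE_ANIM d e f}"])
```

With this convention, later checks need "is not None" rather than a plain truth test, because a macro on the very first line has index 0, which is falsy.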
Of course, during 1.5, this macro was renamed to {*NAMED_*LOYAL_UNIT}, but we will stick to upgrading to "1.4", and worry about changing LOYAL_UNIT to NAMED_LOYAL_UNIT in the current wmllint.
The basic "if '{UNIT '" condition is more efficient than subjecting every line to a complex regex. However, it would be theoretically possible for a matching line to fail the substitution. Thus, I used subn() instead of sub(), and only report an upgrade to stdout if there is at least one substitution. In the hypothetical case that no substitution is carried out, I alert the user so they can look into it.
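The pattern of a cheap substring test followed by subn() can be sketched like this. The regex TOY_UNIT is a deliberately simplified stand-in, not the regex from this commit, and the replacement only renames the macro. The function name and status strings are hypothetical.

```python
import re

# Toy pattern: the macro name followed by one argument.
TOY_UNIT = re.compile(r"\{UNIT\s+(\([^)]*\)|\S+)")


def upgrade_unit_macro(line):
    """Return (new_line, status); status is None, 'upgraded' or 'unmatched'."""
    if "{UNIT " not in line:
        return line, None          # cheap test: most lines stop here
    new_line, count = TOY_UNIT.subn(r"{LOYAL_UNIT \1", line)
    if count:
        return new_line, "upgraded"
    # The substring matched but the regex did not: tell the user.
    return line, "unmatched"


ok = upgrade_unit_macro("{UNIT (Elvish Archer) Legolas}")
odd = upgrade_unit_macro("{UNIT ")
```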
The regular expression looks intimidating, so here's an explanation:
field 1: unit type - The authors of this era seem to have been pretty good about enclosing fields in parentheses, but this part of the regex accounts for all three possibilities: a) parentheses used to enclose; b) quotes used to enclose; c) no enclosure, thus ending with the next space.
field 2: id - Basically a clone of the first field.
field 3: name - Clone of the first two fields, except for allowing a translatability underscore in the last two cases (though early UMC seems to all follow the practice of including the translatability underscore with the rest of the name inside parentheses).
field 4: side - We can expect this to be a number. Old add-ons should all be using single digits, but I allow more than one digit to match anyway. In theory, there could be cases which would break the regex (e.g., enclosing the number in parentheses, a macro substitution), but since I don't know of any, I'll just call it a day.
field 5: x coordinate - Usually this will be a number, but it could also be a variable or a macro substitution.
field 6: y coordinate - Clone of field 5, except with y/Y substituted for x/X.
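To make the three-way enclosure idea concrete, here is a simplified sketch. It is not the regex from this commit: FIELD and TOY_UNIT_CALL are hypothetical, the underscore handling for the name field is left out, and macro substitutions in the coordinates are not covered.

```python
import re

# One field: a) enclosed in parentheses, b) enclosed in quotes,
# or c) bare, ending at the next whitespace (or the closing brace).
FIELD = r'(\([^)]*\)|"[^"]*"|[^\s}]+)'
SIDE = r"(\d+)"   # one or more digits

TOY_UNIT_CALL = re.compile(
    r"\{UNIT\s+" + r"\s+".join([FIELD, FIELD, FIELD, SIDE, FIELD, FIELD]) + r"\s*\}"
)

m = TOY_UNIT_CALL.search('{UNIT (Elvish Archer) "Legolas" Legolas 1 $x 12}')
fields = m.groups() if m else None
```

Building the pattern from one reusable field fragment keeps the six parts readable, since fields 2 and 3 are near-clones of field 1.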
(A side note: after testing this commit, I noticed that the introduction to hack_syntax called for "set[ting] modcount to nonzero" when modifying lines, there are "modcount += 1" lines throughout the animation transformations, and hack_syntax ends with "return (lines, modcount)". Looking into it, however: a) my code was working fine without it, with no change detected after I tested inserting "modcount += 1"; b) I never figured out just what use wmllint was making of the modcount, despite all the care to increment it upwards; c) the last suite in hack_syntax doesn't use modcount, either, and it turns out to have been written by ESR himself in February 2008, after all the other code.)
The first message has a couple of problems. Technically, get_hit_sound is not a tag, and there is a stray quote mark at the end. Also, i+1 points to the line number of the [/unit] tag, which is not particularly helpful information. This can be changed to point to the line of the get_hit_sound attribute.
For the second message, the %d get_hit_sound is an index position, so +1 for the line number.
In Linux, many 1.2 unit files would crash wmllint, with tracebacks pointing to the "assert male/female_end != -1" line. Male/female_end's value is set to -1, and when it does not meet the condition for converting to i (line index position), the assert statement fails. The "assert male_end" error crashes files with gender=male, or no gender= key (thus defaulting to male). The "assert female_end" error is the female counterpart, and also covers units with both genders.
I found that after commenting out these assert statements, wmllint no longer barfed on those files. Studying the problem for this commit, however, I saw that "endswith()" included a newline. Could it simply be choking on DOS carriage returns? Doing a dryrun in Windows, which defaults to universal newlines support, I did not get the crashes. Change to binary mode, and the crashes return. Insert rstrip() and delete the newlines, and the crashes stop!
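The effect is easy to reproduce in isolation. In this sketch the tag "[/unit]" and the helper ends_with_tag are only illustrative; the wmllint code checks its own strings.

```python
def ends_with_tag(line, tag):
    """Compare against the tag after dropping any trailing line ending."""
    return line.rstrip().endswith(tag)


dos_line = "[/unit]\r\n"
naive = dos_line.endswith("[/unit]\n")        # the "\r" gets in the way
robust = ends_with_tag(dos_line, "[/unit]")   # works for "\n" and "\r\n"
```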
These portraits were moved prior to 1.1.9. That was before ESR joined Wesnoth development in April 2007, which may explain why wmllint didn't cover this change. Nevertheless, even many 1.2 campaigns still have the old "portraits/core" filepaths.
These old paths also keep post-1.4 wmllint from updating portrait paths to their current location, after they were moved again in 1.5.9.
I noted last time that my "fix" could not cover all possibilities. After further thought, I decided that the best thing to do in the hard cases is to sys.exit, and give users a clear explanation in stderr so they can re-enter their paths correctly. I may have gotten carried away, but given that many users nowadays are unfamiliar with the command line (even more so on Windows), I wanted to give them plenty of hand-holding.
I looked up this issue, and it turns out to be a Windows shell problem after all, not Python, which surprised me.
After more testing, I realized that I did not take into account the possibility that the wildcard pattern would not match anything. In that case, the following 'if not arguments' clause would run wmllint on the entire current directory - which could very well be something that you do not want!
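One way to guard against this, sketched under assumptions: the function targets_from_patterns is hypothetical, the default of the current directory is represented as ["."], and exiting with a message is only one possible reaction to an empty match.

```python
import glob


def targets_from_patterns(patterns):
    """Expand patterns, refusing to fall back to '.' when nothing matched."""
    matches = []
    for pattern in patterns:
        matches.extend(glob.glob(pattern))
    if patterns and not matches:
        # Patterns were given but matched nothing: stop rather than
        # silently running on the whole current directory.
        raise SystemExit("no files matched: %s" % " ".join(patterns))
    # Only a genuinely empty command line gets the default target.
    return matches if matches else ["."]
```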
Although the original purpose of the in_textdomain and in_binary_path code, an aborted effort to update their paths to "data/add-ons/", has been superseded by code that updates those paths on all lines, it can still be put to use.
Our first step is to move that section below the code that updated UMC paths, so our regexes won't have to deal with "@campaigns" and "data/campaigns" strings. Then we delete the 'if 0:' line that was neutralizing this section, as well as the obsolete path-changing code. The rest is de-indented one level.
Then we look for the use of "~" for userdata, which does not work for textdomains and binary paths.
Our regex object, 'tilde', is constituted thusly: (1) We make sure that the line starts with the "path" key. Any line we're interested in ought to start with this, and this will also keep this code from going wild on the campaign includes, if an author forgot a closing tag (no reset to False). (x) There shouldn't be any whitespace around the = sign, but we'll be kind. (2) On the value side, there shouldn't be anything before the tilde except perhaps a quote. Rather than underestimate the ingenuity of authors in coming up with weird code, however, I allow anything except a comment to match for a few characters. But if we haven't hit the '~' after five characters, I figure something's wrong, and bail. (3) Then we come to the tilde. Normally, it would be adjoining "add-ons/", but some authors interpolate a slash, or 'data/' (here represented as an optional string).
If we match, we rebuild the line, except 'data/add-ons' is substituted for group(3), and we log to stdout.
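A regex along the lines described might look like the sketch below. This is a reconstruction from the description, not the committed pattern: the group layout, the allowance for leading whitespace, and the helper fix_tilde are assumptions, and the line is expected without its trailing newline.

```python
import re

tilde = re.compile(
    r"^(\s*path\s*=\s*)"          # (1) the path key, lenient about spaces
    r"([^~#]{0,5})"               # (2) at most five non-comment characters
    r"(~/?(?:data/)?add-ons)"     # (3) the tilde, optional slash or data/
    r"(.*)$"                      # (4) the rest of the value
)


def fix_tilde(line):
    """Replace a '~' userdata prefix with 'data/add-ons' on a path= line."""
    m = tilde.match(line)
    if not m:
        return line
    return m.group(1) + m.group(2) + "data/add-ons" + m.group(4)


plain = fix_tilde("path=~add-ons/UMC_Music")
quoted = fix_tilde('path = "~/data/add-ons/Foo"')
```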
The Windows cmd shell does not expand wildcards by default, unlike UNIX shells. This imports glob.glob and runs arguments through it on Windows.
Frontported (in modified form) from my 1.4 work!
While testing my next commit, I discovered that EH's fix works when there is only one argument, or if the offender is the last argument, but doesn't work with multiple entries. His fix is meant to work on each argument, but the (unintentionally) escaped quote no longer serves to end the argument, causing following arguments to be considered part of the same argument.
Using split() allows us to break apart these misconjoined arguments. With rstrip(), we prevent an empty string that Windows will also complain it cannot find. However, if there are three or more arguments, there will still be lumped-together arguments unless all arguments up to the second from last also end with a backslash and quote. It is impossible to cover every possible case.
The re.sub handles the probably rare case where a backslash before a quote comes within the argument rather than at the end. However, it will only work if there is only one argument.
All this is unnecessary if the OS is not Windows (also, I haven't had the opportunity to test this on a non-Windows system to see if it has any side-effects there). So I've put it under a sys.platform condition.
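The split()/rstrip() part can be sketched as below. The function repair_arguments is hypothetical, the example argument is a guess at what a lumped-together value looks like, and the re.sub step is left out.

```python
import sys


def repair_arguments(arguments, platform=sys.platform):
    """Break apart arguments that a stray escaped quote has glued together."""
    if not platform.startswith("win"):
        return list(arguments)
    repaired = []
    for arg in arguments:
        # rstrip() drops a trailing stray quote, so that split() does
        # not produce an empty string at the end.
        for piece in arg.rstrip('"').split('"'):
            piece = piece.strip()
            if piece:
                repaired.append(piece)
    return repaired


glued = ['C:\\addons\\One" C:\\addons\\Two"']
fixed = repair_arguments(glued, platform="win32")
```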
This section assumed that "#ifdef" and "#endif" would come at the very start of a line. When an author would indent the #ifdef but not the #endif, ifdef_stack.pop() would kill the starting value of None, leaving an empty list. wmllint would then crash:
File "wmllint", line 1138, in global_sanity_check
recruit[ifdef_stack[-1]] = (i+1, map(lambda x: x.strip(), value.split(",")))
IndexError: list index out of range
Stripping the line not only stops the crashes, but allows wmllint to pick up #ifdefs that it wasn't before.
I then looked more closely at the pop(). #endif shouldn't just drop the last value in the stack, but reset the whole stack back to its starting value of [None]. I realized that pop() was leading to wmllint occasionally assigning recruitment that wasn't inside an #ifdef to values from earlier #ifdef stacks, e.g.:
>> starting value: [None]
#ifdef EASY >> [None, 'EASY']
..
#else >> [None, 'EASY', '!EASY']
..
#endif >> pop(): [None, 'EASY']
..
recruit= >> ifdef_stack[-1]: EASY
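The trace above can be reproduced with a small sketch. The function trace_recruits and its reset_on_endif flag are hypothetical, and the sketch only covers un-nested #ifdef blocks.

```python
def trace_recruits(lines, reset_on_endif=False):
    """Return the ifdef_stack[-1] value seen at each recruit= line."""
    ifdef_stack = [None]
    seen = []
    for line in lines:
        stripped = line.strip()   # tolerate indented directives
        if stripped.startswith("#ifdef"):
            parts = stripped.split()
            ifdef_stack.append(parts[1] if len(parts) > 1 else "")
        elif stripped.startswith("#else"):
            if ifdef_stack[-1] is not None:
                ifdef_stack.append("!" + ifdef_stack[-1])
        elif stripped.startswith("#endif"):
            if reset_on_endif:
                ifdef_stack = [None]
            else:
                ifdef_stack.pop()
        elif stripped.startswith("recruit="):
            seen.append(ifdef_stack[-1])
    return seen


wml = ["#ifdef EASY", "recruit=a", "#else", "recruit=b", "#endif", "recruit=c"]
with_pop = trace_recruits(wml)
with_reset = trace_recruits(wml, reset_on_endif=True)
```

With pop(), the recruit= line after #endif is still attributed to EASY. With the reset, it is attributed to None.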