The 'sed' script in tags_in_post() used a GNU-specific feature, `\+`.
This became unnecessary anyway after previous edits, so remove it.
Also replace whitespace-comma-whitespace with a newline directly instead
of doing an intermediate replacement.
edit(): The -n functionality (to rename files according to new title) was
broken. After renaming, files were accessed by the old name and not found,
or empty files were recreated under the old name, or both. Fixes:
- Move 'touch' commands for restoring time stamps to more opportune places.
- When renaming, save old file name to exclude it from $relevant_posts.
global_variables(): suppress GNU 'which' error message on setting markdown_bin.
- Globally, now do word splitting (IFS) only on newline (which also makes
"$*" expand with newline separator instead of space).
- Disable globbing (pathname expansion), to be re-enabled locally using
'set +f' where needed (typically in a subshell).
These changes help eliminate unexpected snags and security vulnerabilities
in case someone forgets to quote a variable somewhere. They should also make
the code "just work" with spaces and other special characters in file names
and tags (as long as they're not newline characters, but that can't happen
with regular use of the script as the newline is the separator). This means
that, as of this change, editing or completely emptying the convert_filename
filter should no longer pose any problems as far as bb.sh is concerned.
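
A minimal sketch of what the two global settings buy us, not taken from
bb.sh itself: count_args and the sample file name are made up for the
illustration.

```bash
# Global settings as described above: split only on newline, no globbing.
IFS='
'
set -f

# Hypothetical file name containing a space and a glob character.
file='my post *.html'

# count_args is a helper for this sketch only; it prints how many
# arguments it received.
count_args() { echo "$#"; }

# Deliberately unquoted. With default settings this would be split on the
# space and the '*' would be subject to pathname expansion.
n_unquoted=$(count_args $file)
echo "$n_unquoted"

# "$*" now joins the positional parameters with a newline, not a space.
set -- one two
joined="$*"
printf '%s\n' "$joined"
```

The unquoted $file arrives as a single argument, and "$*" prints 'one' and
'two' on separate lines.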
The changes to adapt the code to the above are mainly:
- Now that we do word splitting on newline only, we can go back to iterating
through files in a "for" loop instead of using "read" with a here-document,
which is more readable. However, globbing then has to be enabled locally
inside the command substitution, like:
for file in $(set +f; printf '%s\n' *.html)
or
for file in $(set +f; ls -t -- *.html) # sort by date, newest first
Given IFS=$'\n' and globbing disabled globally, this technique is robust
for all special characters in file names except for newlines.
- invoke_editor() function replaces direct $EDITOR calls, because we need to
locally word-split $EDITOR on spaces in case it contains arguments.
- parse_file(): rewrite tag parsing to handle possible spaces in tags
- tags_in_post(): output line-separated instead of space-separated tags;
further adjust sed script to handle possible spaces in tags
- rebuild_tags(): this function was refactored to use an array internally.
Instead of two combined strings, it now takes HTML files and tags as
separate arguments, separated by a single "--tag" argument. This allows
for spaces and other special characters in both file names and tags. (See
also commit a674ec5, which started this but didn't finish it).
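
The local-globbing "for" loop from the first item above can be tried in
isolation. This is only a sketch under the stated global settings; the
scratch directory and the file names are invented for the test.

```bash
# Work in a scratch directory so the sketch does not touch real posts.
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1
touch 'first post.html' 'second*post.html' notes.txt

# Global settings assumed by the technique.
IFS='
'
set -f

found=0
# 'set +f' only affects the command-substitution subshell. The pattern is
# expanded there and printed one name per line; the result is then split
# on newlines only, and not globbed again.
for file in $(set +f; printf '%s\n' *.html); do
    # $file holds one complete name, spaces and '*' included.
    if [ -f "$file" ]; then
        found=$((found + 1))
    fi
done
echo "$found"

cd / && rm -r "$tmpdir"
```

Both .html files are found as whole names and notes.txt is skipped.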
- A much shorter test_markdown() function that compares output directly
rather than using temp files.
- Revert to using 'which' rather than 'command -v' for finding the markdown
binary, because 'command -v' will find the markdown() shell function.
- Use builtin 'command -v' rather than external 'which'.
- The clean_filename() function just removed the initial ./ from a file
name; do this with a parameter substitution instead. This gets rid of the
need to fork subshells for command substitutions, so is much faster.
- Where convenient, replace 'echo ... | sed ...' with fast parameter
substitutions.
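
For illustration, the two ways of stripping a leading "./" side by side.
The file name is hypothetical and the 'sed' expression is only an assumed
equivalent of what the old code did.

```bash
file='./my post.html'

# Old style: a command substitution that forks a subshell and runs sed.
with_sed=$(echo "$file" | sed 's|^\./||')

# New style: strip a leading "./" with a parameter substitution, no fork.
with_param=${file#./}

printf '%s\n' "$with_sed" "$with_param"
```

Both variables end up holding the same name without the "./" prefix.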
* Iterating through 'ls' output using 'for' is very brittle: it relies on word splitting, and globbing can also mangle the results. It's best to use globs directly, but if 'ls' cannot be avoided, e.g. to sort by date, we can at least use 'IFS= read -r i' to read from a here-document filled with the 'ls' output, which leaves everything in file names intact except newlines.
* Other minor cleanups.
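
A sketch of the 'read' plus here-document technique described above, run
in a scratch directory with made-up file names. It assumes 'ls' prints
names unmodified when its output is not a terminal.

```bash
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1
touch 'a post.html' '  leading spaces.html' 'back\slash.html'

count=0
# 'IFS=' keeps leading and trailing whitespace in each line, and '-r'
# keeps backslashes literal, so each name is read exactly as printed.
while IFS= read -r i; do
    if [ -f "$i" ]; then
        count=$((count + 1))
    fi
done <<EOF
$(ls -t -- *.html)
EOF
echo "$count"

cd / && rm -r "$tmpdir"
```

Because the loop reads from a here-document rather than a pipe, it runs
in the current shell and $count is still set after the loop.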
- Fix lots of problems with convoluted and broken quoting techniques.
- Group code blocks for redirection into a file rather than doing a separate additive redirect for each command.
- Replace strings using bash parameter substitution rather than piping 'echo' through 'sed', resulting in a faster script.
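
A small before/after sketch of the grouped redirection. The HTML lines and
the title are placeholders, not code from bb.sh.

```bash
old=$(mktemp) || exit 1
new=$(mktemp) || exit 1
title='Hello world'    # hypothetical post title

# Before: the first command truncates the file, and every later command
# reopens it in append mode.
echo "<h1>$title</h1>" > "$old"
echo "<p>First paragraph</p>" >> "$old"
echo "<p>Second paragraph</p>" >> "$old"

# After: group the commands and redirect the whole group once.
{
    echo "<h1>$title</h1>"
    echo "<p>First paragraph</p>"
    echo "<p>Second paragraph</p>"
} > "$new"

# Both variants should produce identical files.
if cmp -s "$old" "$new"; then same=yes; else same=no; fi
echo "$same"
rm -f "$old" "$new"
```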
Another minor code cleanup. Within [[ ... ]] and (( ... )) (but not [ ... ]) there is a different shell parsing context in which field splitting (a.k.a. word splitting) and pathname expansion (a.k.a. filename globbing) don't apply, so consistently use '[[' (and '((' for arithmetic) instead of '[' and remove unnecessary quotes.
Since '[[ x == y ]]' does 'case'-like glob pattern matching on 'y', the quotes to the right of '==' need to be kept for variables or glob characters, except if a glob pattern is wanted.
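
A short bash sketch of the quoting rules inside '[[' and '((' as described
above. The variable names and values are illustrative only.

```bash
var='two words *'    # contains spaces and a glob character
pattern='f*'

# Left of '==': no word splitting or globbing, so $var needs no quotes.
if [[ $var == 'two words *' ]]; then left_ok=yes; else left_ok=no; fi

# Right of '==', unquoted: the value is used as a glob pattern.
if [[ foo == $pattern ]]; then glob_match=yes; else glob_match=no; fi

# Right of '==', quoted: the value is compared literally.
if [[ foo == "$pattern" ]]; then literal_match=yes; else literal_match=no; fi

# Arithmetic comparison with '((' instead of '[ ... -gt ... ]'.
count=3
if (( count > 2 )); then arith=yes; else arith=no; fi

printf '%s\n' "$left_ok" "$glob_match" "$literal_match" "$arith"
```

Only the quoted-pattern comparison fails, because 'foo' is not the literal
string 'f*'.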
Substitutions (variables, command substitutions, etc.) directly assigned to a scalar variable don't need to be quoted, as field splitting and globbing don't apply in that context. Removing superfluous quotes makes the code look a bit cleaner.
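
For example, with made-up values, plain assignments preserve spaces and
glob characters without any quotes on the right-hand side:

```bash
src='a  b   *'                    # runs of spaces and a glob character

# Plain scalar assignments: no field splitting, no globbing.
copy=$src
from_cmd=$(printf '%s' 'x  y')

# Quotes are still needed when the values are used as command arguments.
printf '[%s]\n' "$copy" "$from_cmd"
```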