This change will require you to edit your database or reinstall the Orcinus Site Search from scratch after deleting all associated database tables.
Accounts for IPv4 and IPv6.
Don't allow JSON requests to trigger a new crawl or end a stuck one, since the requests may come too fast to handle.
Only allow triggering a new crawl if more than 'sp_timeout_crawl' seconds have passed since we canceled the previous one.
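A minimal sketch of that gate in PHP, assuming the cancel time is stored as a Unix timestamp in a settings array; the 'sp_crawl_cancelled' key and the startCrawl() helper are illustrative, not the actual Orcinus code:

// Illustrative settings lookup; 'sp_crawl_cancelled' is a hypothetical key
$timeout   = (int)$settings['sp_timeout_crawl'];
$cancelled = (int)($settings['sp_crawl_cancelled'] ?? 0);

if (time() - $cancelled > $timeout) {
  startCrawl();  // hypothetical helper that kicks off the crawler
} else {
  exit('A crawl was cancelled less than ' . $timeout . ' seconds ago.');
}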
In the future, we might give each successfully initiated crawl a unique ID and then send a failure email only once per crawl if it fails. A very busy search engine is probably indistinguishable from a rapid-fire series of JSON requests.
Determine the correct height of the bar from the data-value and the given height of the tbody rather than including an explicit data-height on the bar.
Better algorithm to determine where to draw horizontal lines on the graph.
Only display the top 10 geolocated search locations with the rest falling under "Other", unless there are only 11 locations in the list.
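Roughly, the grouping could look like the PHP sketch below, assuming $locations is already sorted by count in descending order; the array shape is illustrative:

// Collapse everything past the top 10 into "Other", but if there are exactly
// 11 locations just show all 11 (an "Other" row would hide nothing useful)
if (count($locations) > 11) {
  $top   = array_slice($locations, 0, 10);
  $rest  = array_slice($locations, 10);
  $top[] = array(
    'location' => 'Other',
    'count'    => array_sum(array_column($rest, 'count')),
  );
  $locations = $top;
}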
Save the process id of the crawler in the sp_crawling DB value instead of just a flag; we can compare it against the current process to further prevent race conditions, which still seem to happen occasionally.
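The idea, in sketch form; the settings table, column names and PDO handle below are assumptions for illustration, not the real Orcinus schema:

// When a crawl starts, record this process's PID instead of a simple boolean flag
$stmt = $pdo->prepare('UPDATE settings SET value = :pid WHERE name = :name');
$stmt->execute(array('pid' => getmypid(), 'name' => 'sp_crawling'));

// Re-read the value; if another process won the race and wrote its own PID,
// this instance backs off instead of running a second crawl in parallel
$stmt = $pdo->prepare('SELECT value FROM settings WHERE name = :name');
$stmt->execute(array('name' => 'sp_crawling'));
if ((int)$stmt->fetchColumn() !== getmypid()) {
  exit;
}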
Change element where some classes are applied in admin.php to work with updated Bootstrap.
Add the fi, ff, fl, ffi, ffl series of ligatures to $_RDATA['s_latin'] as they are common in PDFs.
The "remove text from titles" feature was coded into the admin UI from the previous version, but was never actually implemented in the crawler. Wow. It works now.
Add a nice Orcinus header image with a link to https://greywyvern.com/orcinus/. Eventually this might link to online documentation or something?
Move the show-page-titles checkbox from being created by JavaScript to actually being in the HTML, removing unnecessary JS complexity. Add a Popper tooltip.
Add a checkbox to enable and disable showing page titles along with the URLs in the Page Index. The status of this checkbox is saved during the admin session. Defaults to 'off'.
PDF Last Modified actually attempts to use "SourceModified" first, then "CreationDate" and lastly "Last Modified". Adjust the tooltip to better describe this.
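In sketch form, the fallback order is just a loop over the candidate keys; $details stands in for whatever metadata array PdfParser returns, with the key spellings as described above:

// Prefer SourceModified, then CreationDate, then Last Modified
$modified = '';
foreach (array('SourceModified', 'CreationDate', 'Last Modified') as $key) {
  if (!empty($details[$key])) {
    $modified = $details[$key];
    break;
  }
}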
Run mb_convert_encoding in ALL cases to remove potentially invalid UTF-8 characters.
Add the "replacement" UTF-8 character to the whitespace array to ensure it's removed.
Update PdfParser to the latest snapshot from the repo.
Add code to allow PdfParser to decode XMP Metadata from PDF files, preferring it over other decoded data.
Provide an 'online' value to the Search Result Mustache template. This allows you, for example, to put things in your Search Result template that show up when your site is displayed live (PHP) but are not output when your site is displayed using the offline JavaScript, and vice versa.
e.g.
{{#online}}
This will only display in your template if it's online.
{{/online}}
{{^online}}
This will only display in your template if it's offline.
{{/online}}
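On the server side this just means passing one extra boolean to the template data; a minimal sketch assuming a Mustache engine such as bobthecow's Mustache_Engine, with illustrative variable names (the offline JavaScript renderer would supply online = false, or omit it):

$data = array(
  'results' => $results,  // whatever the Search Result template already receives
  'online'  => true,      // live PHP output; the offline JS build sets this falsy
);
$mustache = new Mustache_Engine();
echo $mustache->render($template, $data);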
Don't assume that other data from a PDF is the same as the content. Bypasses some still-unfixed PdfParser encoding issues.
Also exit the crawler script if we are in debug mode and there is a crawl already running.
By using the location of the search.js script file, we can determine the root URL of an offline installation as long as the online script has been installed at https://example.com/orcinus/js/search.js
If the search UI is using typeahead and the user selects a suggested option to go straight to a page, the query is never logged; as far as the logs are concerned, the search never happened. Add a fetch request to log the search query just before sending the user on their way to the page.
Make 's_show_orphans' a runtime variable and normalize the SQL queries it's used in.
Also change the generic '$select' variable to the more semantic '$crawldata'.