Commit graph

150 commits

Author SHA1 Message Date
Brian Huisman
2b80228f2a Update PHP version check for 8.1.x 2024-05-27 11:52:40 -04:00
Brian Huisman
432e15699d Abbreviate display of IPv6 addresses 2024-05-27 11:18:41 -04:00
Brian Huisman
35cf52b65b Limit number of GEOIP lookups.
Don't geolocate every IP, just unique ones. We'll probably still need a cached set of previously geolocated IPs to speed this up further.
2024-05-27 11:00:36 -04:00
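The idea in this commit (look up each distinct IP once, and keep a cache of earlier results) can be sketched as follows. This is a minimal JavaScript illustration, not the project's PHP code; `geolocateUnique`, `lookup`, and `cache` are hypothetical names.

```javascript
// Sketch only: `lookup` stands in for whatever GeoIP service is called;
// it is an assumption, not an API from the Orcinus source.
function geolocateUnique(ips, lookup, cache = new Map()) {
  // A Set visits each distinct address once, so duplicates cost nothing.
  for (const ip of new Set(ips)) {
    // Skip addresses that were already resolved on an earlier call.
    if (!cache.has(ip)) cache.set(ip, lookup(ip));
  }
  // Return one result per input row, in the original order.
  return ips.map((ip) => cache.get(ip));
}
```

Passing the same `cache` map to later calls is one way to get the "cached set of previously geolocated IPs" the message mentions.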
Brian Huisman
528a2dbf91 Store IP as text instead of INT (database change)
This change will require you to edit your database or reinstall the Orcinus Site Search from scratch after deleting all associated database tables.

Accounts for IPv4 and IPv6.
2024-05-17 15:12:09 -04:00
Brian Huisman
f34f5097fc Ignore Cloudflare timeout
Don't even notify the user of a Cloudflare timeout, just keep running the interval.
2024-05-17 10:22:18 -04:00
Brian Huisman
f7fbaadf9b Kludge for 524 Cloudflare timeout response
There is no need to cancel the crawler for an HTTP 524 response.
2024-05-16 13:33:37 -04:00
Brian Huisman
fb7e295490 Update PdfParser to 2.10.0 2024-05-16 12:36:43 -04:00
Brian Huisman
4f679114c3 Misc updates
Some small formatting updates.

Assert that some strings have a length before accessing them via string offset.
2024-05-16 12:30:53 -04:00
Brian Huisman
38a75ce70c Don't display zeros on the graph 2024-04-22 12:47:30 -04:00
Brian Huisman
5d990e44b0 Update search.php
Don't allow JSON requests to trigger a new crawl or end a stuck one, since the requests may come too fast to handle.

Only allow triggering a new crawl if more than 'sp_timeout_crawl' seconds have passed since we canceled the previous one.

In the future, we might give each successfully initiated crawl a unique ID and then only allow sending a failure email once if it has failed. A very busy search engine is probably indistinguishable from a rapid-fire series of JSON requests.
2023-12-06 10:14:18 -05:00
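The two rules in this commit reduce to a small guard. The sketch below is an illustration in JavaScript under stated assumptions, not the code in search.php: the function name and parameters are hypothetical, and only `sp_timeout_crawl` comes from the message.

```javascript
// Sketch only: all times are in seconds; parameter names are assumptions.
function mayStartCrawl(nowSec, lastCancelSec, timeoutCrawlSec, isJsonRequest) {
  // JSON requests may arrive too fast to handle, so they never trigger a crawl.
  if (isJsonRequest) return false;
  // Require strictly more than the timeout (sp_timeout_crawl) since the last cancel.
  return nowSec - lastCancelSec > timeoutCrawlSec;
}
```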
Brian Huisman
a125060d7f Fix $capture typo 2023-11-06 11:29:18 -05:00
Brian Huisman
0c1f359ef0 Graph tweaks
Determine the correct height of the bar from the data-value and the given height of the tbody rather than including an explicit data-height on the bar.

Better algorithm to determine where to draw horizontal lines on the graph.

Only display the top 10 geolocated search locations with the rest falling under "Other", unless there are only 11 locations in the list.
2023-11-06 11:27:15 -05:00
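The "top 10 plus Other" rule can be sketched like this. It is a hedged JavaScript illustration, not the project's code: `topLocations` is a hypothetical name, and it assumes the 11-location exception exists because an "Other" row holding a single location would add nothing.

```javascript
// Sketch only: `counts` maps a location name to its number of searches.
function topLocations(counts, limit = 10) {
  // Sort locations by search count, largest first.
  const sorted = Object.entries(counts).sort((a, b) => b[1] - a[1]);
  // Assumed reading: with limit + 1 entries or fewer, show every location.
  if (sorted.length <= limit + 1) return sorted;
  const top = sorted.slice(0, limit);
  // Everything past the limit is summed into a single "Other" row.
  const other = sorted.slice(limit).reduce((sum, [, n]) => sum + n, 0);
  return [...top, ["Other", other]];
}
```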
Brian Huisman
9e5dccf8b7 Add Statistics page
Added a Statistics page. Probably still needs some work.
2023-11-03 14:26:43 -04:00
Brian Huisman
438c520f7c Prevent division by zero in edge case 2023-10-18 14:23:00 -04:00
Brian Huisman
4bbe1d967b Misc fixes
Save the process id of the crawler in the sp_crawling DB value instead of just a flag; we can use it to compare and further prevent race conditions which still seem to happen occasionally.
2023-10-17 10:36:34 -04:00
Brian Huisman
5cb7c372fb Couple misc fixes
Change element where some classes are applied in admin.php to work with updated Bootstrap.

Add the fi, ff, fl, ffi, ffl series of ligatures to $_RDATA['s_latin'] as they are common in PDFs.
2023-09-29 12:57:23 -04:00
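For reference, the five ligatures named here are the Unicode code points U+FB00 through U+FB04. The sketch below shows the replacement in JavaScript; how `$_RDATA['s_latin']` is actually structured in the PHP source is not shown in this log, so the `LIGATURES` map and `expandLigatures` are assumptions for illustration.

```javascript
// Sketch only: map each Latin ligature code point to its ASCII expansion.
const LIGATURES = {
  "\uFB00": "ff",
  "\uFB01": "fi",
  "\uFB02": "fl",
  "\uFB03": "ffi",
  "\uFB04": "ffl",
};

function expandLigatures(text) {
  // The character class covers exactly the five code points in the map.
  return text.replace(/[\uFB00-\uFB04]/g, (ch) => LIGATURES[ch]);
}
```

Without this step, a PDF that typesets "office" with the ffi ligature would not match a search for the plain ASCII word.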
Brian Huisman
dee454cb8c Update Bootstrap and jQuery
Bootstrap => v5.3.2
jQuery => v3.7.1
2023-09-28 11:37:46 -04:00
Brian Huisman
1860d1f8ce Totally forgot to actually implement this feature
The "remove text from titles" feature was coded into the admin UI from the previous version, but was never actually implemented in the crawler. Wow. It works now.
2023-09-27 15:33:06 -04:00
Brian Huisman
6c961d44a3 Add Query Log row display limit 2023-09-25 11:53:20 -04:00
Brian Huisman
f8bed73c26 Responsive pagination bar on Page Index 2023-09-15 10:32:59 -04:00
Brian Huisman
873a18fbc9 Add text fragments flag and functionality 2023-09-14 13:18:33 -04:00
Brian Huisman
da52e0f7bf Add header image and link
Add a nice Orcinus header image with a link to https://greywyvern.com/orcinus/. Eventually this might link to online documentation or something?

Move the show-page-titles checkbox from being created by JavaScript to actually being in the HTML. Unnecessary JS complexity removed. Add a popper tooltip.
2023-09-14 10:33:59 -04:00
Brian Huisman
4c78a5245f Use REPLACE INTO for resiliency 2023-09-12 10:44:14 -04:00
Brian Huisman
d4e0e409fe Show Page Titles checkbox on Page Index
Add a checkbox to enable and disable showing page titles along with the URLs in the Page Index. The status of this checkbox is saved during the admin session. Defaults to 'off'.
2023-09-11 13:32:45 -04:00
Brian Huisman
dd88459d04 Update admin.php
PDF Last Modified actually attempts to use "SourceModified" first, then "CreationDate" and lastly "Last Modified". Adjust the tooltip to better describe this.
2023-09-11 12:18:33 -04:00
Brian Huisman
511207e0b2 Add PDF Last Modified multiplier 2023-09-08 15:11:27 -04:00
Brian Huisman
302c8db00e Group statements
Group several statements into single statements. You might not like it, but this makes me happy. :)
2023-07-21 14:17:15 -04:00
Brian Huisman
382511077a Misc updates
Prettify some SQL code.
Add some error-response code for fatal failed SQL statements.
2023-07-21 13:04:51 -04:00
Brian Huisman
edf2dc338c admin and pdfparser updates
Add some tooltip text to some elements in admin.php
Merge recent PdfParser updates into the library
2023-07-11 10:24:55 -04:00
Brian Huisman
229129a9e4 Update crawler.php
Get and set sp_crawling in real-time to minimize race conditions.
2023-07-06 15:09:31 -04:00
Brian Huisman
181addfd3d Update PDFDocEncoding.php
Add Yours Truly as the author of this file. I'm humble!
2023-07-04 14:06:28 -04:00
Brian Huisman
a5ff604f58 Update crawler.php
Update crawler.php to also try using XMP metadata from updated PDFParser
2023-07-04 13:46:12 -04:00
Brian Huisman
06cc7fe325 PdfParser update
This update adds XMP metadata and PDFDocEncoding support for regular metadata.
2023-07-04 13:08:42 -04:00
Brian Huisman
30630c6c60 Start enforcing PHP and SQL version limitations. 2023-06-26 15:00:42 -04:00
Brian Huisman
5a39280858 PdfParser PHP-CS-Fixer updates 2023-06-23 12:23:20 -04:00
Brian Huisman
3307baac4d Update crawler.php
Run mb_convert_encoding in ALL cases to remove potentially invalid UTF-8 characters.
Add the "replacement" UTF-8 character to the whitespace array to ensure it's removed.
2023-06-22 15:35:40 -04:00
Brian Huisman
b12e7991e0 Update PdfParser to latest snapshot
Update PdfParser to the latest snapshot from the repo.
Add code to allow PdfParser to decode XMP Metadata from PDF files, preferring it over other decoded data.
2023-06-22 15:33:53 -04:00
Brian Huisman
47562e0a71 Add 'online' value for Mustache template
Provide an 'online' value to the Search Result Mustache template. This will, for example, allow you to put things in your Search Result template that will show up when your site is displayed live (PHP), but will not be output when your site is displayed using the offline JavaScript, and vice versa.

eg.
{{#online}}
  This will only display in your template if it's online.
{{/online}}
{{^online}}
  This will only display in your template if it's offline.
{{/online}}
2023-06-22 09:57:33 -04:00
Brian Huisman
042339d3ef Update crawler.php
Don't assume that other data from a PDF is the same as the content. Bypasses some still-unfixed PDFParser encoding issues.
Also exit the crawler script if we are in debug mode and there is a crawl already running.
2023-06-21 17:23:08 -04:00
Brian Huisman
eda57224d9 Remove need for 'jw_depth' value
By using the location of the search.js script file, we can determine the root URL of an offline installation as long as the online script has been installed at https://example.com/orcinus/js/search.js
2023-06-21 15:07:57 -04:00
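A rough sketch of the approach: take the URL the script was loaded from and strip the known install path to recover the site root. This is an illustration only; `rootFromScriptUrl` and `installPath` are hypothetical names, and the real search.js may do this differently.

```javascript
// Sketch only: in a browser the script URL would typically come from
// document.currentScript.src; here it is passed in as a plain string.
function rootFromScriptUrl(scriptUrl, installPath = "orcinus/js/search.js") {
  const url = new URL(scriptUrl);
  // Drop any query string or fragment so only the path matters.
  url.search = "";
  url.hash = "";
  // The script must sit at the expected install path, otherwise give up.
  if (!url.pathname.endsWith("/" + installPath)) return null;
  // Whatever precedes the install path is the root of the site.
  url.pathname = url.pathname.slice(0, -installPath.length);
  return url.href;
}
```

Because the root is derived from the script's own location, no depth setting has to be written into each page.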
Brian Huisman
b17a68c175 Update template.offline.js
Quote jw_depth string.
2023-06-21 12:09:28 -04:00
Brian Huisman
675b25b1e4 Update admin.php
Forgot a [0] dangit.
2023-06-19 12:31:47 -04:00
Brian Huisman
930d4fa793 Update admin.php
Test user-supplied regular expression matches for validity before saving.
2023-06-19 12:12:58 -04:00
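The general pattern is to try compiling the pattern and reject it if compilation fails. The sketch below shows that pattern in JavaScript; admin.php is PHP and uses a different regular expression engine, so this is an analogy rather than the project's code, and `isValidPattern` is a hypothetical name.

```javascript
// Sketch only: returns true when the pattern compiles, false otherwise.
function isValidPattern(source) {
  try {
    // The RegExp constructor throws a SyntaxError for malformed patterns.
    new RegExp(source);
    return true;
  } catch (err) {
    return false;
  }
}
```

Rejecting a bad pattern at save time keeps it from failing later, in the middle of a crawl.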
Brian Huisman
0a83546411 Update crawler.php
Make sure regexp lines in require and ignore URL fields are actually treated as regexps.
2023-06-19 11:51:57 -04:00
Brian Huisman
e9c0654295 Update search.js
Tweak the comment
2023-06-16 14:47:51 -04:00
Brian Huisman
54bbbb6a65 Log clicked search suggestions
If the search UI is using typeahead and the user selects a suggested option to go right to a page, then a search is never logged as a search query; it's like the search never happened. Add a fetch request to log the search query just before sending the user on their way to the page.
2023-06-16 14:38:24 -04:00
Brian Huisman
8ed10eac36 Tweaks
Fix text1/text2 specification in page index SQL query.
Add eszett also to single 's' for replacement.
2023-06-16 13:33:15 -04:00
Brian Huisman
e76fdf730c s_show_orphans cleanup
Make 's_show_orphans' a runtime variable and normalize the SQL queries it's used in.
Also change generic '$select' variable to more semantic '$crawldata'.
2023-06-15 10:19:05 -04:00
Brian Huisman
e440babc38 Directly reference jw_depth
Don't depend on the id="os_results" element existing in the user template, just use os_odata.jw_depth directly.
2023-06-15 09:48:28 -04:00
Brian Huisman
a489fb1b8e sp_smart => sp_punct
Change sp_smart to sp_punct also in the offline javascript template.
2023-06-14 15:39:30 -04:00