This change will require you to edit your database or reinstall the Orcinus Site Search from scratch after deleting all associated database tables.
Accounts for IPv4 and IPv6.
Don't allow JSON requests to trigger a new crawl or end a stuck one, since the requests may come too fast to handle.
Only allow triggering a new crawl if more than 'sp_timeout_crawl' seconds have passed since we canceled the previous one.
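A minimal sketch of that gate in PHP, assuming the cancel time is stored as a Unix timestamp in a settings array; the 'sp_crawl_cancelled' key and the startCrawl() helper are illustrative, not the actual Orcinus code:

// Illustrative settings lookup; 'sp_crawl_cancelled' is a hypothetical key
$timeout   = (int)$settings['sp_timeout_crawl'];
$cancelled = (int)($settings['sp_crawl_cancelled'] ?? 0);

if (time() - $cancelled > $timeout) {
  startCrawl();  // hypothetical helper that kicks off the crawler
} else {
  exit('A crawl was cancelled less than ' . $timeout . ' seconds ago.');
}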
In the future, we might give each successfully initiated crawl a unique ID and then send a failure email only once per crawl if it fails. A very busy search engine is probably indistinguishable from a rapid-fire series of JSON requests.
Determine the correct height of the bar from the data-value and the given height of the tbody rather than including an explicit data-height on the bar.
Better algorithm to determine where to draw horizontal lines on the graph.
Only display the top 10 geolocated search locations with the rest falling under "Other", unless there are only 11 locations in the list.
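Roughly, the grouping could look like the PHP sketch below, assuming $locations is already sorted by count in descending order; the array shape is illustrative:

// Collapse everything past the top 10 into "Other", but if there are exactly
// 11 locations just show all 11 (an "Other" row would hide nothing useful)
if (count($locations) > 11) {
  $top   = array_slice($locations, 0, 10);
  $rest  = array_slice($locations, 10);
  $top[] = array(
    'location' => 'Other',
    'count'    => array_sum(array_column($rest, 'count')),
  );
  $locations = $top;
}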
Save the process id of the crawler in the sp_crawling DB value instead of just a flag; we can compare it against the current process to further prevent race conditions, which still seem to happen occasionally.
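The idea, in sketch form; the settings table, column names and PDO handle below are assumptions for illustration, not the real Orcinus schema:

// When a crawl starts, record this process's PID instead of a simple boolean flag
$stmt = $pdo->prepare('UPDATE settings SET value = :pid WHERE name = :name');
$stmt->execute(array('pid' => getmypid(), 'name' => 'sp_crawling'));

// Re-read the value; if another process won the race and wrote its own PID,
// this instance backs off instead of running a second crawl in parallel
$stmt = $pdo->prepare('SELECT value FROM settings WHERE name = :name');
$stmt->execute(array('name' => 'sp_crawling'));
if ((int)$stmt->fetchColumn() !== getmypid()) {
  exit;
}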
Change element where some classes are applied in admin.php to work with updated Bootstrap.
Add the fi, ff, fl, ffi, ffl series of ligatures to $_RDATA['s_latin'] as they are common in PDFs.
The "remove text from titles" feature was coded into the admin UI from the previous version, but was never actually implemented in the crawler. Wow. It works now.
Add a nice Orcinus header image with a link to https://greywyvern.com/orcinus/. Eventually this might link to online documentation or something?
Move the show-page-titles checkbox from being created by JavaScript to actually being in the HTML, removing unnecessary JS complexity. Add a Popper tooltip.
Add a checkbox to enable and disable showing page titles along with the URLs in the Page Index. The status of this checkbox is saved during the admin session. Defaults to 'off'.
PDF Last Modified actually attempts to use "SourceModified" first, then "CreationDate" and lastly "Last Modified". Adjust the tooltip to better describe this.
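In sketch form, the fallback order is just a loop over the candidate keys; $details stands in for whatever metadata array PdfParser returns, with the key spellings as described above:

// Prefer SourceModified, then CreationDate, then Last Modified
$modified = '';
foreach (array('SourceModified', 'CreationDate', 'Last Modified') as $key) {
  if (!empty($details[$key])) {
    $modified = $details[$key];
    break;
  }
}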
Run mb_convert_encoding in ALL cases to remove potentially invalid UTF-8 characters.
Add the "replacement" UTF-8 character to the whitespace array to ensure it's removed.
Update PdfParser to the latest snapshot from the repo.
Add code to allow PdfParser to decode XMP Metadata from PDF files, preferring it over other decoded data.
Provide an 'online' value to the Search Result Mustache template. This allows you, for example, to put things in your Search Result template that show up when your site is displayed live (PHP) but are not output when your site is displayed using the offline JavaScript, and vice versa.
e.g.
{{#online}}
This will only display in your template if it's online.
{{/online}}
{{^online}}
This will only display in your template if it's offline.
{{/online}}
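On the server side this just means passing one extra boolean to the template data; a minimal sketch assuming a Mustache engine such as bobthecow's Mustache_Engine, with illustrative variable names (the offline JavaScript renderer would supply online = false, or omit it):

$data = array(
  'results' => $results,  // whatever the Search Result template already receives
  'online'  => true,      // live PHP output; the offline JS build sets this falsy
);
$mustache = new Mustache_Engine();
echo $mustache->render($template, $data);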
Don't assume that other data from a PDF is the same as the content. Bypasses some still-unfixed PdfParser encoding issues.
Also exit the crawler script if we are in debug mode and there is a crawl already running.
By using the location of the search.js script file, we can determine the root URL of an offline installation as long as the online script has been installed at https://example.com/orcinus/js/search.js
If the search UI is using typeahead and the user selects a suggested option to go straight to a page, the query is never logged; as far as the logs are concerned, the search never happened. Add a fetch request to log the search query just before sending the user on their way to the page.
Make 's_show_orphans' a runtime variable and normalize the SQL queries it's used in.
Also change the generic '$select' variable to the more semantic '$crawldata'.