Generate Atom feeds (+ delete Parsedown)

Miraty 2023-05-30 22:36:40 +02:00
parent 6d4778ffba
commit 6670711220
8 changed files with 235 additions and 3022 deletions


@ -1,6 +1,8 @@
# mkht.php
mkht.php is a PHP script for building Gemini, Markdown and HTML/CSS sites from source documents in Gemini, Markdown Extra, HTML, PHP, CSS and Less. mkht.php is a PHP script for building Gemini, Markdown and HTML/CSS sites from source documents in Gemini, Markdown Extra, HTML, PHP, CSS and Less.
# Usage ## Usage
Place your pages tree in `/src/*/*.(gmi|md)`. Place your pages tree in `/src/*/*.(gmi|md)`.
@ -16,11 +18,15 @@ Optional files:
`destination` is optional and can be: `destination` is optional and can be:
* `onion` if you want links ending with .onion when available * `onion` if you want links ending with .onion when available
# Input ## Input
Pages in `/src`can use Gemini (if using `gmi` extension), Markdown, HTML and PHP. Pages in `/src` can use Gemini (if using `gmi` extension), Markdown, HTML and PHP.
# Output Files starting with a dot or not ending in `.gmi`, `.md` or `.html` are ignored.
Files containing `draft` in their name are ignored for Atom feeds.
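
The two rules above can be sketched as a pair of predicates. This is only an illustration: the function names `is_source_page` and `is_draft` are hypothetical, and treating `draft` as a whole dot-separated part of the file name (as in `post.draft.md`) is an assumption based on how the script in this commit appears to test it.

```php
<?php
// Hypothetical helpers illustrating the README rules; not part of mkht.php.

/** Returns true when a file name found under /src would be compiled as a page. */
function is_source_page(string $filename): bool
{
    // Files starting with a dot are ignored.
    if (str_starts_with($filename, '.'))
        return false;
    // Only .gmi, .md and .html files are treated as pages.
    return in_array(pathinfo($filename, PATHINFO_EXTENSION), ['gmi', 'md', 'html'], true);
}

/** Returns true when a page would be left out of the Atom feed. */
function is_draft(string $filename): bool
{
    // Assumption: "draft" must be a complete dot-separated part of the name.
    return in_array('draft', explode('.', $filename), true);
}
```

With these definitions, `index.gmi` and `post.draft.md` are both compiled, but only `index.gmi` would appear in the feed.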
## Output
* `/*/*.gmi` (if using `.gmi` extension in /src) * `/*/*.gmi` (if using `.gmi` extension in /src)
* `/*/*.md` * `/*/*.md`
@ -28,24 +34,30 @@ Pages in `/src`can use Gemini (if using `gmi` extension), Markdown, HTML and PHP
* `/*/*.gz` * `/*/*.gz`
Note that format translation is only done in the following order: Note that format translation is only done in the following order:
Gemini > Markdown > HTML, which means that the last of these formats you will use will be the first that will be readable by hypertext browsers. (PHP is executed before.) Gemini > Markdown > HTML, which means that the last of these formats you will use will be the first that will be readable by hypertext browsers. (PHP is always executed first.)
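
As a rough illustration of that ordering, the sketch below lists which formats a source file would be translated into, depending on its extension. The function name `output_formats` is hypothetical, and the exact set of files written for each input is an assumption drawn from the Output list above.

```php
<?php
// Hypothetical sketch of the one-way translation chain: Gemini > Markdown > HTML.

/** Returns the formats produced from a source page with the given extension. */
function output_formats(string $source_extension): array
{
    $chain = ['gmi', 'md', 'html'];
    // Find where the source format sits in the chain.
    $start = array_search($source_extension, $chain, true);
    if ($start === false)
        return []; // Not a supported page format.
    // Translation only goes forward, never back to an earlier format.
    return array_slice($chain, $start);
}
```

For example, a `.gmi` source yields Gemini, Markdown and HTML outputs, while a `.md` source yields only Markdown and HTML.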
# External dependencies ## Data persistence
IDs are assigned to titles according to their content; therefore, modifying a title breaks links to page sections.
### For Atom feeds
* Make sure modification timestamps of source files are preserved. For example, `cp --preserve=timestamps` must be used instead of just `cp` when backing up or migrating.
* Renaming or moving a page creates a new page and deletes the old one.
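
When sources are copied from PHP rather than with `cp --preserve=timestamps`, the modification time has to be restored explicitly, because `copy()` gives the new file a fresh timestamp. A minimal sketch, assuming a hypothetical helper named `copy_preserving_mtime`:

```php
<?php
// Hypothetical helper; not part of mkht.php.

/** Copies a file and gives the copy the same modification time as the source. */
function copy_preserving_mtime(string $source, string $destination): bool
{
    $mtime = filemtime($source);
    if ($mtime === false || !copy($source, $destination))
        return false;
    // touch() with an explicit timestamp sets the copy's modification time.
    return touch($destination, $mtime);
}
```

Since feed entries take their `<updated>` value from the source file's modification time, a backup restored this way should keep the same dates in the feed.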
## External dependencies
* PHP * PHP
* gzip * gzip
* find
* pandoc * pandoc
# Internal libraries used ## Internal libraries used
| Name | Description | Repository | | Name | Description | Repository |
| --------------- | ---------------------------- | ----------------------------------------- | | --------------- | ---------------------------- | ----------------------------------------- |
| less.php | Less compiler in PHP | https://github.com/wikimedia/less.php | | less.php | Less compiler in PHP | https://github.com/wikimedia/less.php |
| parsedown | Markdown compiler in PHP | https://github.com/erusev/parsedown |
| parsedown-extra | Extension for Markdown Extra | https://github.com/erusev/parsedown-extra |
# License ## License
[AGPLv3+](LICENSE) [AGPLv3+](LICENSE)

369
mkht.php

@ -1,40 +1,48 @@
#!/usr/bin/php #!/usr/bin/php
<?php <?php
if (php_sapi_name() !== 'cli') if (php_sapi_name() !== 'cli')
exit('Must be run from CLI'); exit('Must be run from CLI.' . PHP_EOL);
// Initialization
const LF = "\n"; const LF = "\n";
define('ROOT', dirname($_SERVER['SCRIPT_FILENAME'])); define('ROOT', dirname($_SERVER['SCRIPT_FILENAME']));
$use_pandoc = true; if (!extension_loaded('tidy'))
echo 'PHP tidy extension unavailable. Feature disabled.' . PHP_EOL;
if (isset($argv[1])) foreach (['pandoc', 'gzip'] as $command) {
define('SITE', $argv[1]); exec('command -v ' . $command, result_code: $code);
else if ($code !== 0)
define('SITE', getcwd()); exit($command . ' command not available.' . PHP_EOL);
}
if (isset($argv[2])) foreach ($argv as $arg) {
define('DESTINATION', $argv[2]); if ($arg === '-f')
else $opt['force'] = true;
define('DESTINATION', 'dns'); else
$args[] = $arg;
}
$opt['force'] ??= false;
define('SITE', $args[1] ?? getcwd());
define('DESTINATION', $args[2] ?? 'dns');
if (file_exists(SITE . '/config.ini')) if (file_exists(SITE . '/config.ini'))
$config = parse_ini_file(SITE . '/config.ini'); $config = parse_ini_file(SITE . '/config.ini');
if (!isset($config['css'])) $config['title'] ??= '';
$config['css'] = 1; $config['css'] ??= 1;
$config['header'] ??= false;
$config['author'] ??= '';
$config['base-url'] ??= [];
$config['center-index'] ??= false;
$config['default-lang'] ??= NULL;
if (!isset($config['header'])) if (!isset($config['id'])) {
$config['header'] = false; $config['id'] = bin2hex(random_bytes(32));
file_put_contents(SITE . '/config.ini', 'id = "' . $config['id'] . '"' . LF, FILE_APPEND);
if (!isset($config['centerIndex'])) }
$config['centerIndex'] = false;
if (!isset($config['defaultLang']))
$config['defaultLang'] = '';
// Less > CSS // Less > CSS
if ($config['css'] == true) { if ($config['css'] == true) {
@ -69,176 +77,223 @@ function clearnetOrOnion($clearnet_url, $onion_url) {
return (DESTINATION === 'onion') ? $onion_url : $clearnet_url; return (DESTINATION === 'onion') ? $onion_url : $clearnet_url;
} }
exec('find ' . SITE . "/src -name '*.gmi' -o -name '*.md'", $pages); $files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator(SITE . '/src', RecursiveDirectoryIterator::SKIP_DOTS));
foreach ($pages as $page) { foreach($files as $file) {
$info = new SplFileInfo($file->getPathName());
if ($info->getType() !== 'file' OR !in_array($info->getExtension(), ['gmi', 'md', 'html'], true) OR str_starts_with($info->getFilename(), '.'))
continue;
$files_dates[$info->getPathname()] = $info->getMTime();
}
$pathParts = pathinfo(str_replace('/src', '', $page)); asort($files_dates);
// Create parent directory if needed ob_start();
if (!file_exists($pathParts['dirname'])) ?>
mkdir($pathParts['dirname'], 0755, true); <?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title><?= $config['title'] ?></title>
<id>urn:publicid:<?= $config['id'] ?></id>
<?php
foreach ($config['base-url'] as $url)
echo ' <link rel="self" type="application/atom+xml" href="' . $url . '/feed.atom"></link>' . LF;
?>
<updated><?= date('c', $files_dates[array_key_last($files_dates)]) ?></updated>
<author>
<name><?= $config['author'] ?></name>
</author>
<?php
$feed = ob_get_clean();
// Execute PHP code foreach ($files_dates as $src_page => $last_mod) {
ob_start(); $dest_page = str_replace('/src/', '/', $src_page);
eval('?>' . file_get_contents($page));
file_put_contents($pathParts['dirname'] . '/' . $pathParts['basename'], ob_get_contents());
ob_end_clean();
// Convert Gemtext to Markdown $page_content = file_get_contents($src_page);
if ($pathParts['extension'] === 'gmi') {
$gmilines = explode(LF, file_get_contents($pathParts['dirname'] . '/' . $pathParts['basename']));
foreach ($gmilines as $key => $line) { preg_match('/^# ?(?<title>.*)$/Dm', $page_content, $matches);
if (substr($line, 0, 2) === '=>') { $title = $matches['title'] ?? NULL;
preg_match('/=> +(.[^ ]+)/', $line, $lnUrl);
preg_match('/=> +.[^ ]+ +(.+)/', $line, $lnTitle);
$urlPathParts = pathinfo(parse_url($lnUrl[1], PHP_URL_PATH)); $path_parts = pathinfo($dest_page);
// .gmi > .md for local links $base_filepath = $path_parts['dirname'] . '/' . $path_parts['filename'];
if (!str_contains($lnUrl[1], ':') AND $urlPathParts['extension'] === 'gmi') // If it's a local link
$lnUrl[1] = $urlPathParts['dirname'] . '/' . $urlPathParts['filename'] . '.md';
if (isset($lnTitle[1])) { if (!file_exists($dest_page) OR (filemtime($src_page) > filemtime($dest_page)) OR $opt['force']) {
$gmilines[$key] = '[' . $lnTitle[1] . '](' . $lnUrl[1] . ')'; echo 'Compiling ' . $src_page . ' ' . date("Y-m-d H:i:s", $last_mod) . LF;
} else {
$gmilines[$key] = '[' . $lnUrl[1] . '](' . $lnUrl[1] . ')'; // Create parent directory if needed
if (!file_exists($path_parts['dirname']))
mkdir($path_parts['dirname'], 0755, true);
// Execute PHP code
ob_start();
eval('?>' . $page_content);
file_put_contents($base_filepath . '.gmi', ob_get_contents());
ob_end_clean();
// Convert Gemtext to Markdown
if ($path_parts['extension'] === 'gmi') {
$gmilines = explode(LF, file_get_contents($base_filepath . '.gmi'));
foreach ($gmilines as $key => $line) {
if (str_starts_with($line, '=>')) {
preg_match('/=> +(.[^ ]+)/', $line, $lnUrl);
preg_match('/=> +.[^ ]+ +(.+)/', $line, $lnTitle);
$urlPathParts = pathinfo(parse_url($lnUrl[1], PHP_URL_PATH));
// .gmi > .md for local links
if (!str_contains($lnUrl[1], ':') AND $urlPathParts['extension'] === 'gmi') // If it's a local link
$lnUrl[1] = $urlPathParts['dirname'] . '/' . $urlPathParts['filename'] . '.md';
$gmilines[$key] = '[' . ($lnTitle[1] ?? $lnUrl[1]) . '](' . $lnUrl[1] . ')';
} }
} }
$code = '';
foreach ($gmilines as $line)
$code .= LF . $line;
file_put_contents($base_filepath . '.md', $code);
} }
$code = '';
foreach ($gmilines as $line) {
$code = $code . LF . $line;
}
file_put_contents($pathParts['dirname'] . '/' . $pathParts['filename'] . '.md', $code);
}
// Compile Markdown to HTML // Compile Markdown to HTML
$markdown = file_get_contents($pathParts['dirname'] . '/' . $pathParts['filename'] . '.md'); $markdown = file_get_contents($base_filepath . '.md');
if (preg_match("/# (.*)\\n/", $markdown, $matches)) // If a main heading is found $process = proc_open('pandoc --fail-if-warnings -f markdown_phpextra-citations-native_divs-native_spans+abbreviations+hard_line_breaks+lists_without_preceding_blankline -t html --wrap none', [
$title = $matches[1]; // Then it will be the HTML page <title>
else
$title = NULL;
if ($use_pandoc) {
$process = proc_open('pandoc --fail-if-warnings -f markdown -t html', [
0 => ['pipe', 'r'], 0 => ['pipe', 'r'],
1 => ['pipe', 'w'], 1 => ['pipe', 'w'],
], $pipes); ], $pipes);
if (is_resource($process) !== true) if (is_resource($process) !== true)
exit('Can\'t spawn pandoc.'); exit('Can\'t spawn pandoc.' . PHP_EOL);
fwrite($pipes[0], $markdown); fwrite($pipes[0], $markdown);
fclose($pipes[0]); fclose($pipes[0]);
$pageContent = fread($pipes[1], 1000); $pageContent = stream_get_contents($pipes[1]);
fclose($pipes[1]); fclose($pipes[1]);
if (proc_close($process) !== 0) if (proc_close($process) !== 0)
exit('pandoc failed.'); exit('pandoc failed.' . PHP_EOL);
} else {
require_once ROOT . '/parsedown/Parsedown.php';
require_once ROOT . '/parsedown-extra/ParsedownExtra.php';
$Parsedown = new ParsedownExtra;
$Parsedown = $Parsedown->setUrlsLinked(false);
$Parsedown = $Parsedown->setMarkupEscaped(false);
$Parsedown = $Parsedown->setBreaksEnabled(true);
$pageContent = $Parsedown->text($markdown);
}
// .md > .html for local links // .md > .html for local links
$pageContent = preg_replace('#<a href="(?!.*:)(.*)\.md">#', '<a href="$1.html">', $pageContent); $pageContent = preg_replace('#<a href="(?!.*:)(.*)\.md">#', '<a href="$1.html">', $pageContent);
// Add header and footer to HTML $relativePathToRoot = '';
$urlPath = str_replace(SITE, '', $pathParts['dirname']); for ($i = substr_count(str_replace(SITE, '', $path_parts['dirname']), '/') ; $i > 0 ; $i--)
$relativePathToRoot = ''; $relativePathToRoot .= '../';
for ($i = substr_count($urlPath, '/') ; $i > 0 ; $i--)
$relativePathToRoot .= '../';
ob_start(); ob_start();
?> ?>
<!DOCTYPE html> <!DOCTYPE html>
<html lang="<?php <html lang="<?php
preg_match('#\.([a-zA-Z-]{2,5})\.#', $pathParts['basename'], $lang); preg_match('#\.([a-zA-Z-]{2,5})\.#', $path_parts['basename'], $file_lang);
if (isset($lang[1])) { if (isset($file_lang[1])) {
echo $lang[1]; $lang = $file_lang[1];
} else { } else {
preg_match('#/([a-z]{2})(/|$)#', $pathParts['dirname'], $lang); preg_match('#/([a-z]{2})(/|$)#', $path_parts['dirname'], $dir_lang);
if (isset($lang[1])) $lang = $dir_lang[1] ?? $config['default-lang'];
echo $lang[1]; }
echo $lang ?? '';
?>">
<head>
<meta charset="utf-8">
<?php
if (isset($title) AND isset($config['title']))
echo '<title>' . $title . ' · ' . $config['title'] . '</title>';
else if (isset($title))
echo '<title>' . $title . '</title>';
else if (isset($config['title']))
echo '<title>' . $config['title'] . '</title>';
?>
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="referrer" content="no-referrer">
<meta name="author" content="<?= $config['author'] ?>">
<?php
if ($config['announce-feed'])
echo '<link rel="alternate" type="application/atom+xml" href="' . $relativePathToRoot . 'feed.atom">' . LF;
if ($config['announce-css'])
echo '<link rel="stylesheet" media="screen" href="' . $relativePathToRoot . 'css/' . CSS_FILENAME . '">' . LF;
if (file_exists(SITE . '/head.inc.html'))
echo file_get_contents(SITE . '/head.inc.html');
?>
</head>
<body>
<?php
if ($config['header']) {
?>
<header>
<a href="./<?= $relativePathToRoot ?>">
<?php
if (file_exists(SITE . '/img/logo.webp'))
echo '<img src="img/logo.webp" ' . getimagesize(SITE . '/img/logo.webp')[3] . ' alt="' . $config['title'] . '" />';
else
echo $config['title'];
?>
</a>
</header>
<?php
}
if ($config['center-index'] AND $path_parts['filename'] === 'index')
echo '<div class="centered">' . $pageContent . '</div>';
else else
echo $config['defaultLang']; echo '<main>' . $pageContent . '</main>';
} if (file_exists(SITE . '/end.inc.html'))
require SITE . '/end.inc.html';
echo '</body></html>';
?>"> $pageContent = ob_get_clean();
<head>
<meta charset="utf-8">
<?php
if (isset($title) AND !is_null($title) AND isset($config['siteTitle']))
echo '<title>' . $title . ' · ' . $config['siteTitle'] . '</title>';
else if (isset($title) AND !is_null($title))
echo '<title>' . $title . '</title>';
else if (isset($config['siteTitle']))
echo '<title>' . $config['siteTitle'] . '</title>';
?>
<meta name="viewport" content="width=device-width, initial-scale=1">
<?php
if ($config['css'] == true)
echo '<link rel="stylesheet" media="screen" href="' . $relativePathToRoot . 'css/' . CSS_FILENAME . '">' . LF;
if (file_exists(SITE . '/head.inc.html')) if (extension_loaded('tidy')) {
echo file_get_contents(SITE . '/head.inc.html'); $pageContent = tidy_repair_string($pageContent, [
?>
</head>
<body>
<?php
if ($config['header']) {
?>
<header>
<a href="./<?= $relativePathToRoot ?>">
<?php
if (file_exists(SITE . '/img/logo.webp'))
echo '<img src="img/logo.webp" ' . getimagesize(SITE . '/img/logo.webp')[3] . ' alt="' . $config['siteTitle'] . '" />';
else
echo $config['siteTitle'];
?>
</a>
</header>
<?php
}
if ($config['centerIndex'] AND $pathParts['filename'] === 'index')
echo '<div class="centered">' . $pageContent . '</div>';
else
echo '<main>' . $pageContent . '</main>';
if (file_exists(SITE . '/end.inc.html'))
require SITE . '/end.inc.html';
echo '</body></html>';
$pageContent = ob_get_clean();
if (extension_loaded('tidy')) {
$tidy = new tidy;
$tidy->parseString($pageContent, [
'indent' => true, 'indent' => true,
'keep-tabs' => true, 'indent-spaces' => 4,
'wrap' => 0 'output-xhtml' => true,
] 'wrap' => 0,
); ]);
$tidy->cleanRepair(); $pageContent = str_replace(' ', ' ', $pageContent);
$pageContent = tidy_get_output($tidy); }
} else {
echo 'tidy extension unavailable' . PHP_EOL; file_put_contents($base_filepath . '.html', $pageContent);
// Gzip compression
exec('gzip --keep --fast --force ' . $base_filepath . '.html');
} }
file_put_contents($pathParts['dirname'] . '/' . $pathParts['filename'] . '.html', $pageContent); $relative_addr = substr_replace($base_filepath . '.html', '', strpos($base_filepath, SITE), strlen(SITE));
// Gzip compression // As of RFC 3151: A URN Namespace for Public Identifiers
exec('gzip --keep --fast --force ' . $pathParts['dirname'] . '/' . $pathParts['filename'] . '.html'); $public_id = 'urn:publicid:' . $config['id'] . str_replace('/', '%2F', $relative_addr);
}
ob_start(); preg_match('#\<body\>(?<content>.*)\</body\>#s', file_get_contents($base_filepath . '.html'), $match);
$atom_entry_content = $match['content'];
// Make relative links absolute
$atom_entry_content = preg_replace_callback('# href=\"(?<relative_url>[^:"]+)\"#', function ($matches) {
global $config;
global $path_parts;
return ' href="' . $config['base-url'][0] . substr($path_parts['dirname'], strlen(SITE)) . '/' . $matches['relative_url'] . '"';
}, $atom_entry_content);
if (!in_array('draft', explode('.', $path_parts['basename']), true)) {
ob_start();
?>
<entry>
<title><?= $title ?></title>
<id><?= $public_id ?></id>
<updated><?= date('c', $last_mod) ?></updated>
<?php
foreach ($config['base-url'] as $base_url)
echo ' <link rel="alternate" type="text/html" href="' . $base_url . $relative_addr . '"></link>' . LF;
?>
<content type="html"><?= htmlspecialchars($atom_entry_content) ?></content>
</entry>
<?php
$feed .= ob_get_clean();
}
} }
file_put_contents(SITE . '/feed.atom', $feed . '</feed>' . LF);
if ($config['css'] == true) if ($config['css'] == true)
exec('gzip --keep --fast --force ' . SITE . '/css/' . CSS_FILENAME); exec('gzip --keep --fast --force ' . SITE . '/css/' . CSS_FILENAME);
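
The entry identifiers built in the diff above follow the `urn:publicid:` scheme of RFC 3151, in which `/` is transcribed as `%2F`. The sketch below isolates that step; the function name `atom_entry_id` and the sample values are hypothetical, while the expression itself mirrors the one used for `$public_id`.

```php
<?php
// Hypothetical standalone version of the $public_id expression in mkht.php.

/** Builds a stable Atom entry ID from the site ID and the page's path relative to the site root. */
function atom_entry_id(string $site_id, string $relative_address): string
{
    // RFC 3151 transcribes "/" as "%2F" inside a urn:publicid identifier.
    return 'urn:publicid:' . $site_id . str_replace('/', '%2F', $relative_address);
}

echo atom_entry_id('abc123', '/blog/post.html') . PHP_EOL;
// prints "urn:publicid:abc123%2Fblog%2Fpost.html"
```

Because the path is part of the identifier, renaming or moving a page changes its ID, which is why feed readers see it as a new entry.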


@ -1,20 +0,0 @@
The MIT License (MIT)
Copyright (c) 2013 Emanuil Rusev, erusev.com
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


@ -1,686 +0,0 @@
<?php
#
#
# Parsedown Extra
# https://github.com/erusev/parsedown-extra
#
# (c) Emanuil Rusev
# http://erusev.com
#
# For the full license information, view the LICENSE file that was distributed
# with this source code.
#
#
class ParsedownExtra extends Parsedown
{
# ~
const version = '0.8.0';
# ~
function __construct()
{
if (version_compare(parent::version, '1.7.1') < 0)
{
throw new Exception('ParsedownExtra requires a later version of Parsedown');
}
$this->BlockTypes[':'] []= 'DefinitionList';
$this->BlockTypes['*'] []= 'Abbreviation';
# identify footnote definitions before reference definitions
array_unshift($this->BlockTypes['['], 'Footnote');
# identify footnote markers before links
array_unshift($this->InlineTypes['['], 'FootnoteMarker');
}
#
# ~
function text($text)
{
$Elements = $this->textElements($text);
# convert to markup
$markup = $this->elements($Elements);
# trim line breaks
$markup = trim($markup, "\n");
# merge consecutive dl elements
$markup = preg_replace('/<\/dl>\s+<dl>\s+/', '', $markup);
# add footnotes
if (isset($this->DefinitionData['Footnote']))
{
$Element = $this->buildFootnoteElement();
$markup .= "\n" . $this->element($Element);
}
return $markup;
}
#
# Blocks
#
#
# Abbreviation
protected function blockAbbreviation($Line)
{
if (preg_match('/^\*\[(.+?)\]:[ ]*(.+?)[ ]*$/', $Line['text'], $matches))
{
$this->DefinitionData['Abbreviation'][$matches[1]] = $matches[2];
$Block = array(
'hidden' => true,
);
return $Block;
}
}
#
# Footnote
protected function blockFootnote($Line)
{
if (preg_match('/^\[\^(.+?)\]:[ ]?(.*)$/', $Line['text'], $matches))
{
$Block = array(
'label' => $matches[1],
'text' => $matches[2],
'hidden' => true,
);
return $Block;
}
}
protected function blockFootnoteContinue($Line, $Block)
{
if ($Line['text'][0] === '[' and preg_match('/^\[\^(.+?)\]:/', $Line['text']))
{
return;
}
if (isset($Block['interrupted']))
{
if ($Line['indent'] >= 4)
{
$Block['text'] .= "\n\n" . $Line['text'];
return $Block;
}
}
else
{
$Block['text'] .= "\n" . $Line['text'];
return $Block;
}
}
protected function blockFootnoteComplete($Block)
{
$this->DefinitionData['Footnote'][$Block['label']] = array(
'text' => $Block['text'],
'count' => null,
'number' => null,
);
return $Block;
}
#
# Definition List
protected function blockDefinitionList($Line, $Block)
{
if ( ! isset($Block) or $Block['type'] !== 'Paragraph')
{
return;
}
$Element = array(
'name' => 'dl',
'elements' => array(),
);
$terms = explode("\n", $Block['element']['handler']['argument']);
foreach ($terms as $term)
{
$Element['elements'] []= array(
'name' => 'dt',
'handler' => array(
'function' => 'lineElements',
'argument' => $term,
'destination' => 'elements'
),
);
}
$Block['element'] = $Element;
$Block = $this->addDdElement($Line, $Block);
return $Block;
}
protected function blockDefinitionListContinue($Line, array $Block)
{
if ($Line['text'][0] === ':')
{
$Block = $this->addDdElement($Line, $Block);
return $Block;
}
else
{
if (isset($Block['interrupted']) and $Line['indent'] === 0)
{
return;
}
if (isset($Block['interrupted']))
{
$Block['dd']['handler']['function'] = 'textElements';
$Block['dd']['handler']['argument'] .= "\n\n";
$Block['dd']['handler']['destination'] = 'elements';
unset($Block['interrupted']);
}
$text = substr($Line['body'], min($Line['indent'], 4));
$Block['dd']['handler']['argument'] .= "\n" . $text;
return $Block;
}
}
#
# Header
protected function blockHeader($Line)
{
$Block = parent::blockHeader($Line);
if ($Block !== null && preg_match('/[ #]*{('.$this->regexAttribute.'+)}[ ]*$/', $Block['element']['handler']['argument'], $matches, PREG_OFFSET_CAPTURE))
{
$attributeString = $matches[1][0];
$Block['element']['attributes'] = $this->parseAttributeData($attributeString);
$Block['element']['handler']['argument'] = substr($Block['element']['handler']['argument'], 0, $matches[0][1]);
}
return $Block;
}
#
# Markup
protected function blockMarkup($Line)
{
if ($this->markupEscaped or $this->safeMode)
{
return;
}
if (preg_match('/^<(\w[\w-]*)(?:[ ]*'.$this->regexHtmlAttribute.')*[ ]*(\/)?>/', $Line['text'], $matches))
{
$element = strtolower($matches[1]);
if (in_array($element, $this->textLevelElements))
{
return;
}
$Block = array(
'name' => $matches[1],
'depth' => 0,
'element' => array(
'rawHtml' => $Line['text'],
'autobreak' => true,
),
);
$length = strlen($matches[0]);
$remainder = substr($Line['text'], $length);
if (trim($remainder) === '')
{
if (isset($matches[2]) or in_array($matches[1], $this->voidElements))
{
$Block['closed'] = true;
$Block['void'] = true;
}
}
else
{
if (isset($matches[2]) or in_array($matches[1], $this->voidElements))
{
return;
}
if (preg_match('/<\/'.$matches[1].'>[ ]*$/i', $remainder))
{
$Block['closed'] = true;
}
}
return $Block;
}
}
protected function blockMarkupContinue($Line, array $Block)
{
if (isset($Block['closed']))
{
return;
}
if (preg_match('/^<'.$Block['name'].'(?:[ ]*'.$this->regexHtmlAttribute.')*[ ]*>/i', $Line['text'])) # open
{
$Block['depth'] ++;
}
if (preg_match('/(.*?)<\/'.$Block['name'].'>[ ]*$/i', $Line['text'], $matches)) # close
{
if ($Block['depth'] > 0)
{
$Block['depth'] --;
}
else
{
$Block['closed'] = true;
}
}
if (isset($Block['interrupted']))
{
$Block['element']['rawHtml'] .= "\n";
unset($Block['interrupted']);
}
$Block['element']['rawHtml'] .= "\n".$Line['body'];
return $Block;
}
protected function blockMarkupComplete($Block)
{
if ( ! isset($Block['void']))
{
$Block['element']['rawHtml'] = $this->processTag($Block['element']['rawHtml']);
}
return $Block;
}
#
# Setext
protected function blockSetextHeader($Line, array $Block = null)
{
$Block = parent::blockSetextHeader($Line, $Block);
if ($Block !== null && preg_match('/[ ]*{('.$this->regexAttribute.'+)}[ ]*$/', $Block['element']['handler']['argument'], $matches, PREG_OFFSET_CAPTURE))
{
$attributeString = $matches[1][0];
$Block['element']['attributes'] = $this->parseAttributeData($attributeString);
$Block['element']['handler']['argument'] = substr($Block['element']['handler']['argument'], 0, $matches[0][1]);
}
return $Block;
}
#
# Inline Elements
#
#
# Footnote Marker
protected function inlineFootnoteMarker($Excerpt)
{
if (preg_match('/^\[\^(.+?)\]/', $Excerpt['text'], $matches))
{
$name = $matches[1];
if ( ! isset($this->DefinitionData['Footnote'][$name]))
{
return;
}
$this->DefinitionData['Footnote'][$name]['count'] ++;
if ( ! isset($this->DefinitionData['Footnote'][$name]['number']))
{
$this->DefinitionData['Footnote'][$name]['number'] = ++ $this->footnoteCount; # » &
}
$Element = array(
'name' => 'sup',
'attributes' => array('id' => 'fnref'.$this->DefinitionData['Footnote'][$name]['count'].':'.$name),
'element' => array(
'name' => 'a',
'attributes' => array('href' => '#fn:'.$name, 'class' => 'footnote-ref'),
'text' => $this->DefinitionData['Footnote'][$name]['number'],
),
);
return array(
'extent' => strlen($matches[0]),
'element' => $Element,
);
}
}
private $footnoteCount = 0;
#
# Link
protected function inlineLink($Excerpt)
{
$Link = parent::inlineLink($Excerpt);
$remainder = $Link !== null ? substr($Excerpt['text'], $Link['extent']) : '';
if (preg_match('/^[ ]*{('.$this->regexAttribute.'+)}/', $remainder, $matches))
{
$Link['element']['attributes'] += $this->parseAttributeData($matches[1]);
$Link['extent'] += strlen($matches[0]);
}
return $Link;
}
#
# ~
#
private $currentAbreviation;
private $currentMeaning;
protected function insertAbreviation(array $Element)
{
if (isset($Element['text']))
{
$Element['elements'] = self::pregReplaceElements(
'/\b'.preg_quote($this->currentAbreviation, '/').'\b/',
array(
array(
'name' => 'abbr',
'attributes' => array(
'title' => $this->currentMeaning,
),
'text' => $this->currentAbreviation,
)
),
$Element['text']
);
unset($Element['text']);
}
return $Element;
}
protected function inlineText($text)
{
$Inline = parent::inlineText($text);
if (isset($this->DefinitionData['Abbreviation']))
{
foreach ($this->DefinitionData['Abbreviation'] as $abbreviation => $meaning)
{
$this->currentAbreviation = $abbreviation;
$this->currentMeaning = $meaning;
$Inline['element'] = $this->elementApplyRecursiveDepthFirst(
array($this, 'insertAbreviation'),
$Inline['element']
);
}
}
return $Inline;
}
#
# Util Methods
#
protected function addDdElement(array $Line, array $Block)
{
$text = substr($Line['text'], 1);
$text = trim($text);
unset($Block['dd']);
$Block['dd'] = array(
'name' => 'dd',
'handler' => array(
'function' => 'lineElements',
'argument' => $text,
'destination' => 'elements'
),
);
if (isset($Block['interrupted']))
{
$Block['dd']['handler']['function'] = 'textElements';
unset($Block['interrupted']);
}
$Block['element']['elements'] []= & $Block['dd'];
return $Block;
}
protected function buildFootnoteElement()
{
$Element = array(
'name' => 'div',
'attributes' => array('class' => 'footnotes'),
'elements' => array(
array('name' => 'hr'),
array(
'name' => 'ol',
'elements' => array(),
),
),
);
uasort($this->DefinitionData['Footnote'], 'self::sortFootnotes');
foreach ($this->DefinitionData['Footnote'] as $definitionId => $DefinitionData)
{
if ( ! isset($DefinitionData['number']))
{
continue;
}
$text = $DefinitionData['text'];
$textElements = parent::textElements($text);
$numbers = range(1, $DefinitionData['count']);
$backLinkElements = array();
foreach ($numbers as $number)
{
$backLinkElements[] = array('text' => ' ');
$backLinkElements[] = array(
'name' => 'a',
'attributes' => array(
'href' => "#fnref$number:$definitionId",
'rev' => 'footnote',
'class' => 'footnote-backref',
),
'rawHtml' => '&#8617;',
'allowRawHtmlInSafeMode' => true,
'autobreak' => false,
);
}
unset($backLinkElements[0]);
$n = count($textElements) -1;
if ($textElements[$n]['name'] === 'p')
{
$backLinkElements = array_merge(
array(
array(
'rawHtml' => '&#160;',
'allowRawHtmlInSafeMode' => true,
),
),
$backLinkElements
);
unset($textElements[$n]['name']);
$textElements[$n] = array(
'name' => 'p',
'elements' => array_merge(
array($textElements[$n]),
$backLinkElements
),
);
}
else
{
$textElements[] = array(
'name' => 'p',
'elements' => $backLinkElements
);
}
$Element['elements'][1]['elements'] []= array(
'name' => 'li',
'attributes' => array('id' => 'fn:'.$definitionId),
'elements' => array_merge(
$textElements
),
);
}
return $Element;
}
# ~
protected function parseAttributeData($attributeString)
{
$Data = array();
$attributes = preg_split('/[ ]+/', $attributeString, - 1, PREG_SPLIT_NO_EMPTY);
foreach ($attributes as $attribute)
{
if ($attribute[0] === '#')
{
$Data['id'] = substr($attribute, 1);
}
else # "."
{
$classes []= substr($attribute, 1);
}
}
if (isset($classes))
{
$Data['class'] = implode(' ', $classes);
}
return $Data;
}
# ~
protected function processTag($elementMarkup) # recursive
{
# http://stackoverflow.com/q/1148928/200145
libxml_use_internal_errors(true);
$DOMDocument = new DOMDocument;
# http://stackoverflow.com/q/11309194/200145
$elementMarkup = mb_convert_encoding($elementMarkup, 'HTML-ENTITIES', 'UTF-8');
# http://stackoverflow.com/q/4879946/200145
$DOMDocument->loadHTML($elementMarkup);
$DOMDocument->removeChild($DOMDocument->doctype);
$DOMDocument->replaceChild($DOMDocument->firstChild->firstChild->firstChild, $DOMDocument->firstChild);
$elementText = '';
if ($DOMDocument->documentElement->getAttribute('markdown') === '1')
{
foreach ($DOMDocument->documentElement->childNodes as $Node)
{
$elementText .= $DOMDocument->saveHTML($Node);
}
$DOMDocument->documentElement->removeAttribute('markdown');
$elementText = "\n".$this->text($elementText)."\n";
}
else
{
foreach ($DOMDocument->documentElement->childNodes as $Node)
{
$nodeMarkup = $DOMDocument->saveHTML($Node);
if ($Node instanceof DOMElement and ! in_array($Node->nodeName, $this->textLevelElements))
{
$elementText .= $this->processTag($nodeMarkup);
}
else
{
$elementText .= $nodeMarkup;
}
}
}
# because we don't want for markup to get encoded
$DOMDocument->documentElement->nodeValue = 'placeholder\x1A';
$markup = $DOMDocument->saveHTML($DOMDocument->documentElement);
$markup = str_replace('placeholder\x1A', $elementText, $markup);
return $markup;
}
# ~
protected function sortFootnotes($A, $B) # callback
{
return $A['number'] - $B['number'];
}
#
# Fields
#
protected $regexAttribute = '(?:[#.][-\w]+[ ]*)';
}


@ -1,31 +0,0 @@
> You might also like [Caret](http://caret.io?ref=parsedown) - our Markdown editor for the Desktop.
## Parsedown Extra
[![Build Status](https://img.shields.io/travis/erusev/parsedown-extra/master.svg?style=flat-square)](https://travis-ci.org/erusev/parsedown-extra)
An extension of [Parsedown](http://parsedown.org) that adds support for [Markdown Extra](https://michelf.ca/projects/php-markdown/extra/).
[See Demo](http://parsedown.org/extra/)
### Installation
Include both `Parsedown.php` and `ParsedownExtra.php` or install [the composer package](https://packagist.org/packages/erusev/parsedown-extra).
### Example
``` php
$Extra = new ParsedownExtra();
echo $Extra->text('# Header {.sth}'); # prints: <h1 class="sth">Header</h1>
```
### Questions
**Who uses Parsedown Extra?**
[October CMS](http://octobercms.com/), [Bolt CMS](http://bolt.cm/), [Kirby CMS](http://getkirby.com/), [Grav CMS](http://getgrav.org/), [Statamic CMS](http://www.statamic.com/) and [more](https://www.versioneye.com/php/erusev:parsedown-extra/references).
**How can I help?**
Use it, star it, share it and in case you feel generous, [donate some money](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=528P3NZQMP8N2).


@ -1,20 +0,0 @@
The MIT License (MIT)
Copyright (c) 2013-2018 Emanuil Rusev, erusev.com
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

File diff suppressed because it is too large

View file

@ -1,103 +0,0 @@
<!-- ![Parsedown](https://i.imgur.com/yE8afYV.png) -->
<p align="center"><img alt="Parsedown" src="https://i.imgur.com/fKVY6Kz.png" width="240" /></p>
<h1>Parsedown</h1>
[![Build Status](https://travis-ci.org/erusev/parsedown.svg)](https://travis-ci.org/erusev/parsedown)
[![Total Downloads](https://poser.pugx.org/erusev/parsedown/d/total.svg)](https://packagist.org/packages/erusev/parsedown)
[![Version](https://poser.pugx.org/erusev/parsedown/v/stable.svg)](https://packagist.org/packages/erusev/parsedown)
[![License](https://poser.pugx.org/erusev/parsedown/license.svg)](https://packagist.org/packages/erusev/parsedown)
Better Markdown Parser in PHP - <a href="http://parsedown.org/demo">Demo</a>.
## Features
* One File
* No Dependencies
* [Super Fast](http://parsedown.org/speed)
* Extensible
* [GitHub flavored](https://github.github.com/gfm)
* [Tested](http://parsedown.org/tests/) in PHP 5.3 to 7.3
* [Markdown Extra extension](https://github.com/erusev/parsedown-extra)
## Installation
Install the [composer package]:
    composer require erusev/parsedown
Or download the [latest release] and include `Parsedown.php`
[composer package]: https://packagist.org/packages/erusev/parsedown "The Parsedown package on packagist.org"
[latest release]: https://github.com/erusev/parsedown/releases/latest "The latest release of Parsedown"
## Example
```php
$Parsedown = new Parsedown();
echo $Parsedown->text('Hello _Parsedown_!'); # prints: <p>Hello <em>Parsedown</em>!</p>
```
You can also parse inline markdown only:
```php
echo $Parsedown->line('Hello _Parsedown_!'); # prints: Hello <em>Parsedown</em>!
```
More examples in [the wiki](https://github.com/erusev/parsedown/wiki/) and in [this video tutorial](http://youtu.be/wYZBY8DEikI).
## Security
Parsedown is capable of escaping user-input within the HTML that it generates. Additionally Parsedown will apply sanitisation to additional scripting vectors (such as scripting link destinations) that are introduced by the markdown syntax itself.
To tell Parsedown that it is processing untrusted user-input, use the following:
```php
$Parsedown->setSafeMode(true);
```
If, instead, you wish to allow HTML within untrusted user-input but still want output to be free from XSS, it is recommended that you make use of an HTML sanitiser that allows HTML tags to be whitelisted, like [HTML Purifier](http://htmlpurifier.org/).
In both cases you should strongly consider employing defence-in-depth measures, like [deploying a Content-Security-Policy](https://scotthelme.co.uk/content-security-policy-an-introduction/) (a browser security feature) so that your page is likely to be safe even if an attacker finds a vulnerability in one of the first lines of defence above.
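As a sketch of the defence-in-depth measure mentioned above, a Content-Security-Policy can be sent from PHP with the built-in `header()` function. The helper name `buildCspHeader` and the directives chosen here are illustrative assumptions only; a real site needs a policy matched to the resources it actually loads.

```php
<?php
// Hypothetical helper: builds a Content-Security-Policy header line
// from a map of directive name => list of sources.
function buildCspHeader(array $directives): string
{
    $parts = [];
    foreach ($directives as $name => $sources) {
        $parts[] = $name . ' ' . implode(' ', $sources);
    }
    return 'Content-Security-Policy: ' . implode('; ', $parts);
}

// Example policy (an assumption, not a recommendation for every site).
$header = buildCspHeader([
    'default-src' => ["'self'"],
    'script-src'  => ["'self'"],  // no inline scripts, no third-party scripts
    'object-src'  => ["'none'"],
]);

// Must be called before any output is sent; it has no visible effect from the CLI.
header($header);
```

With such a policy in place, an injected inline `<script>` would be blocked by supporting browsers even if it slipped past the Markdown layer.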
#### Security of Parsedown Extensions
Safe mode does not necessarily yield safe results when using extensions to Parsedown. Extensions should be evaluated on their own to determine their specific safety against XSS.
## Escaping HTML
> **WARNING:** This method isn't safe from XSS!
If you wish to escape HTML **in trusted input**, you can use the following:
```php
$Parsedown->setMarkupEscaped(true);
```
Beware that this still allows users to insert unsafe scripting vectors, such as links like `[xss](javascript:alert%281%29)`.
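To illustrate why escaping markup alone is not enough, the sketch below checks a link destination against a scheme allowlist. The function `isSafeLinkDestination` and its default allowlist are hypothetical and not part of Parsedown; this is a simplified check under those assumptions, not a complete sanitiser.

```php
<?php
// Hypothetical helper: returns true when a link destination is relative
// or uses one of the allowed schemes.
function isSafeLinkDestination(string $url, array $allowedSchemes = ['http', 'https', 'mailto']): bool
{
    // Drop spaces and control characters, which could otherwise hide a scheme.
    $url = preg_replace('/[\x00-\x20]+/', '', $url);

    // A scheme is a letter followed by letters, digits, "+", "." or "-", then ":".
    if (preg_match('/^([a-z][a-z0-9+.\-]*):/i', $url, $matches) !== 1) {
        return true; // no scheme: relative URL or fragment
    }

    return in_array(strtolower($matches[1]), $allowedSchemes, true);
}

var_dump(isSafeLinkDestination('javascript:alert%281%29')); // bool(false)
var_dump(isSafeLinkDestination('https://example.org/'));    // bool(true)
var_dump(isSafeLinkDestination('/about#team'));             // bool(true)
```

An allowlist is used rather than a blocklist so that unknown schemes are rejected by default.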
## Questions
**How does Parsedown work?**
It tries to read Markdown like a human. First, it looks at the lines. It's interested in how the lines start. This helps it recognise blocks. It knows, for example, that if a line starts with a `-` then perhaps it belongs to a list. Once it recognises the blocks, it continues to the content. As it reads, it watches out for special characters. This helps it recognise inline elements (or inlines).
We call this approach "line based". We believe that Parsedown is the first Markdown parser to use it. Since the release of Parsedown, other developers have used the same approach to develop other Markdown parsers in PHP and in other languages.
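The "line based" idea can be illustrated with a toy classifier that looks only at how each line starts. This is a simplified sketch and not Parsedown's actual code; the function name `classifyLines`, the block names and the rules are assumptions made for the example.

```php
<?php
// Toy classifier: decides a block type from the first characters of each line.
// The rules are deliberately minimal and do not follow any Markdown spec.
function classifyLines(string $markdown): array
{
    $blocks = [];

    foreach (explode("\n", $markdown) as $line) {
        if (trim($line) === '') {
            continue; // blank lines only separate blocks
        }

        if (preg_match('/^#{1,6}[ ]/', $line) === 1) {
            $type = 'header';
        } elseif (preg_match('/^[-*+][ ]/', $line) === 1) {
            $type = 'list item';
        } elseif ($line[0] === '>') {
            $type = 'quote';
        } else {
            $type = 'paragraph';
        }

        $blocks[] = [$type, $line];
    }

    return $blocks;
}

foreach (classifyLines("# Title\n\n- one\n- two\n> note\nplain text") as [$type, $line]) {
    echo str_pad($type, 10), ' | ', $line, "\n";
}
```

A real parser would go on to merge consecutive lines into blocks and then scan each block's content for inline elements.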
**Is it compliant with CommonMark?**
It passes most of the CommonMark tests. Most of the tests that don't pass deal with cases that are quite uncommon. Still, as CommonMark matures, compliance should improve.
**Who uses it?**
[Laravel Framework](https://laravel.com/), [Bolt CMS](http://bolt.cm/), [Grav CMS](http://getgrav.org/), [Herbie CMS](http://www.getherbie.org/), [Kirby CMS](http://getkirby.com/), [October CMS](http://octobercms.com/), [Pico CMS](http://picocms.org), [Statamic CMS](http://www.statamic.com/), [phpDocumentor](http://www.phpdoc.org/), [RaspberryPi.org](http://www.raspberrypi.org/), [Symfony Demo](https://github.com/symfony/demo) and [more](https://packagist.org/packages/erusev/parsedown/dependents).
**How can I help?**
Use it, star it, share it and if you feel generous, [donate](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=528P3NZQMP8N2).
**What else should I know?**
I also make [Nota](https://nota.md/) — a writing app designed for Markdown files :)