82 changes: 82 additions & 0 deletions man/sanitize-string.1.ronn
@@ -0,0 +1,82 @@
sanitize-string(1) -- Strip markup and control characters from a string
========================================================================

<!--
# Copyright (C) 2025 ENCRYPTED SUPPORT LLC <adrelanos@whonix.org>
# See the file COPYING for copying conditions.
-->

## SYNOPSIS

`sanitize-string [--help] max_length [string]`

## DESCRIPTION

`sanitize-string` combines the functionality of `strip-markup`(1) and
`stdisplay`(1) to fully sanitize an untrusted string by removing both
HTML markup tags and dangerous terminal control characters (such as ANSI
escape sequences). The result can be safely displayed in a terminal or
used in non-HTML text contexts.

If a string is provided as the second positional argument, it is used
as the input. Otherwise, the string is read from standard input.

The `max_length` argument sets the maximum number of characters to
output; a longer result is truncated to that many characters. Set it to
`nolimit` to allow arbitrarily long strings.
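
As an illustration of the argument handling described above, here is a minimal Python sketch. It is not the tool's actual code: the helper names `parse_max_length` and `resolve_input` are hypothetical, and the handling of negative or non-numeric limits is an assumption.

```python
import io
import sys


def parse_max_length(value):
    """Return None for 'nolimit', otherwise a non-negative integer.

    Rejecting negative numbers is an assumption of this sketch.
    """
    if value == "nolimit":
        return None
    limit = int(value)  # raises ValueError for non-numeric input
    if limit < 0:
        raise ValueError("max_length must not be negative")
    return limit


def resolve_input(args, stdin):
    """Pick the input string from the positional arguments or from stdin.

    `args` holds the positional arguments after the program name:
    max_length, optionally followed by the string.
    """
    if not 1 <= len(args) <= 2:
        raise SystemExit(1)  # usage error, mirroring exit status 1
    limit = parse_max_length(args[0])
    text = args[1] if len(args) == 2 else stdin.read()
    return limit, text


if __name__ == "__main__":
    limit, text = resolve_input(sys.argv[1:], sys.stdin)
    print(text if limit is None else text[:limit])
```

Passing the input stream as a parameter keeps the helper easy to exercise with an in-memory `io.StringIO` object instead of the real standard input.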

### Sanitization order

Sanitization is performed in three steps:

1. Strip ANSI escape sequences and control characters (via `stdisplay`).
2. Strip HTML markup tags (via `strip-markup`).
3. Strip ANSI escape sequences and control characters again, in case
the markup stripping step decoded HTML entities into escape
characters.

This ordering ensures that neither markup nor escape sequences can be
used to smuggle the other past the sanitizer.
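
The three steps above can be sketched in Python as follows. This is an illustrative approximation only: `strip_control` and `strip_tags` are hypothetical stand-ins for `stdisplay` and `strip-markup`, the regular expressions cover only CSI-style escape sequences and C0 control characters, and keeping tab and newline is an assumption.

```python
import re
from html.parser import HTMLParser

# Assumption: only CSI sequences (ESC [ ... final byte) are handled here.
_ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
# C0 control characters other than tab and newline, plus DEL.
_CONTROL = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def strip_control(text):
    """Hypothetical stand-in for stdisplay."""
    text = _ANSI_CSI.sub("", text)
    return _CONTROL.sub("", text)


class _TextOnly(HTMLParser):
    """Collect text content and ignore every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(text):
    """Hypothetical stand-in for strip-markup (single pass)."""
    parser = _TextOnly()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


def sanitize(text, max_length=None):
    text = strip_control(text)  # step 1: escape sequences, control chars
    text = strip_tags(text)     # step 2: markup; may decode entities
    text = strip_control(text)  # step 3: catch anything step 2 decoded
    return text if max_length is None else text[:max_length]


if __name__ == "__main__":
    print(sanitize("\x1b[31m<b>Hello</b>\x1b[0m"))
```

The final pass matters because a character reference such as `&#13;` contains no control character when it enters step 1, but may become one once step 2 decodes it.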

## RETURN VALUES

* `0` Successfully sanitized and printed the result.
* `1` Usage error (missing or invalid arguments).

## EXAMPLES

Sanitize a string with no length limit:

<code>
sanitize-string nolimit '&lt;b&gt;Hello&lt;/b&gt;'
</code>

Output: `Hello`

Sanitize and truncate to 10 characters:

<code>
sanitize-string 10 'This is a long untrusted string.'
</code>

Output: `This is a `

Sanitize from standard input:

<code>
echo '&lt;script&gt;alert(1)&lt;/script&gt;' | sanitize-string nolimit
</code>

Use `--` to end option parsing, so that a string starting with `-` is
treated as input rather than as an option:

<code>
sanitize-string -- nolimit '--help'
</code>

## SEE ALSO

strip-markup(1), stdisplay(1)

## AUTHOR

This man page has been written by Patrick Schleizer (adrelanos@whonix.org).
76 changes: 76 additions & 0 deletions man/strip-markup.1.ronn
@@ -0,0 +1,76 @@
strip-markup(1) -- Strip HTML markup tags from a string
========================================================

<!--
# Copyright (C) 2025 ENCRYPTED SUPPORT LLC <adrelanos@whonix.org>
# See the file COPYING for copying conditions.
-->

## SYNOPSIS

`strip-markup [--help] [string]`

## DESCRIPTION

`strip-markup` strips HTML markup tags from an untrusted string,
returning only the text content. It is intended to ensure that the
resulting string, taken on its own, is not interpreted as HTML markup.

If a string is provided as an argument, it is used as the input.
Otherwise, the string is read from standard input.

HTML character references (such as `&amp;`, `&lt;`, `&#60;`) are
decoded to their corresponding characters.
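
For readers unfamiliar with character references, the following Python snippet shows the kind of decoding meant here. It uses the standard library function `html.unescape`; whether `strip-markup` itself relies on this function is an assumption, and `decode_refs` is a hypothetical helper name.

```python
import html


def decode_refs(text):
    """Decode named and numeric HTML character references."""
    return html.unescape(text)


if __name__ == "__main__":
    for ref in ("&amp;", "&lt;", "&#60;"):
        print(ref, "->", decode_refs(ref))
```

Both `&lt;` and `&#60;` decode to `<`, which is why decoded output can itself look like markup and needs further checking.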

### Double-strip protection

`strip-markup` performs two consecutive strip passes over the input. If
the second pass further transforms the text, this indicates that the
first pass revealed new markup that was hidden inside nested tags (for
example, `<<b>b>Bold<</b>/b>`). In this case, the tool treats the
input as malicious and replaces all `<`, `>`, and `&` characters in the
first-pass result with underscores (`_`), so that the neutered text is
visible to the user as a warning.
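
The double-strip check can be sketched in Python as follows. This is a simplified model under stated assumptions, not the tool's actual code: `strip_once` and `strip_markup` are hypothetical names, and the single pass is built on the standard library's `html.parser.HTMLParser`.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect text content and ignore every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_once(text):
    """One strip pass: drop tags, decode character references."""
    parser = _TextOnly()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


def strip_markup(text):
    first = strip_once(text)
    second = strip_once(first)
    if second != first:
        # The first pass revealed new markup: treat the input as
        # malicious and neutralize the first-pass result.
        return first.translate(str.maketrans("<>&", "___"))
    return first


if __name__ == "__main__":
    print(strip_markup("<p>Hello <b>world</b>.</p>"))
    print(strip_markup("<<b>b>Bold<</b>/b>"))
```

For benign input the two passes agree and the text is returned unchanged; for the nested example, the first pass leaves tag-like text behind, so the sketch returns it with `<` and `>` replaced by underscores.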

### Scope

`strip-markup` ensures that its output does not contain HTML tags. It
does **not** escape the output for safe embedding in HTML attributes or
other HTML contexts. If the output will be inserted into HTML, the
caller is responsible for applying appropriate context-specific
escaping.
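
As a hedged example of such caller-side escaping, the Python snippet below uses the standard library function `html.escape` before placing text into an attribute value. The helper name `as_title_attribute` and the `span` element are hypothetical and chosen only for illustration; other HTML contexts (URLs, scripts, styles) need their own escaping rules.

```python
import html


def as_title_attribute(text):
    """Embed already-stripped text in a double-quoted HTML attribute."""
    # quote=True also escapes double and single quotes, which is
    # required inside attribute values.
    return '<span title="{}"></span>'.format(html.escape(text, quote=True))


if __name__ == "__main__":
    print(as_title_attribute('5 > 3 & "quoted"'))
```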

## RETURN VALUES

* `0` Successfully stripped markup and printed the result.
* `1` Usage error.

## EXAMPLES

Strip tags from a string argument:

<code>
strip-markup '&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;.&lt;/p&gt;'
</code>

Output: `Hello world.`

Strip tags from standard input:

<code>
echo '&lt;p&gt;Hello&lt;/p&gt;' | strip-markup
</code>

Use `--` to pass strings that start with `-`:

<code>
strip-markup -- '--help'
</code>

## SEE ALSO

sanitize-string(1), stdisplay(1)

## AUTHOR

This man page has been written by Patrick Schleizer (adrelanos@whonix.org).