Strip HTML Tags


 

Menu: Block > Strip HTML Tags

 

Default Shortcut Key: none

 

Macro function: StripHTMLTags()

 

The Strip HTML Tags command removes HTML tags from selected text.  HTML tags are markup sequences that appear between the '<' and '>' characters.  Boxer does not require that the tag names found within these brackets be legitimate HTML tags.  It simply removes any text found between such delimiters, along with the delimiters themselves.  In this way, Boxer will be able to process new tags properly as the HTML standard evolves.

 

Caution: If the text being processed contains unbalanced angle brackets (specifically, an unmated open angle bracket), then all text following that open angle bracket will be treated as an HTML tag and will be removed.
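
The behavior described above, including the effect of an unmated open angle bracket, can be illustrated with a short Python sketch.  This is not Boxer's actual implementation; the function name strip_tags is hypothetical, and the treatment of a stray '>' that appears outside any tag (it is kept here) is an assumption.

```python
def strip_tags(text: str) -> str:
    """Remove everything from each '<' through the next '>'.

    Tag names are never inspected, so unknown or invented tags are
    removed just like standard ones.
    """
    kept = []
    in_tag = False
    for ch in text:
        if in_tag:
            # Discard characters until the closing bracket is seen.
            if ch == ">":
                in_tag = False
        elif ch == "<":
            in_tag = True
        else:
            # Assumption: a '>' met outside a tag is ordinary text.
            kept.append(ch)
    return "".join(kept)


print(strip_tags("<p>Hello <b>world</b></p>"))
print(strip_tags("<made-up attr=1>text</made-up>"))
# An unmated '<' swallows the rest of the input:
print(repr(strip_tags("if a < b then stop")))
```

In the last call the '<' is never closed, so everything after it is discarded, which is the situation the caution warns about.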

 

In addition to stripping HTML tags, the following HTML sequences will be converted to their character equivalents:

 

&nbsp;     <space>

&amp;      &     

&quot;     "     

&lt;       <     

&gt;       >     

&ndash;    -

&mdash;    --

&lsquo;    '

&rsquo;    '

&ldquo;    "

&rdquo;    "

&hellip;   ...
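
As a rough illustration of the table above, the following Python sketch applies the same twelve conversions.  The names ENTITY_MAP and convert_entities are hypothetical.  Replacing all sequences in a single pass, so that the '&' produced from &amp; is not converted a second time, is an assumption rather than documented Boxer behavior.

```python
import re

# The sequences and replacements listed in the table above.
ENTITY_MAP = {
    "&nbsp;": " ",
    "&amp;": "&",
    "&quot;": '"',
    "&lt;": "<",
    "&gt;": ">",
    "&ndash;": "-",
    "&mdash;": "--",
    "&lsquo;": "'",
    "&rsquo;": "'",
    "&ldquo;": '"',
    "&rdquo;": '"',
    "&hellip;": "...",
}

# One alternation matching any listed sequence.
_ENTITY_RE = re.compile("|".join(re.escape(name) for name in ENTITY_MAP))


def convert_entities(text: str) -> str:
    """Replace each listed sequence; leave every other sequence untouched."""
    return _ENTITY_RE.sub(lambda match: ENTITY_MAP[match.group(0)], text)


print(convert_entities("Tom&nbsp;&amp;&nbsp;Jerry"))
print(convert_entities("&ldquo;Wait&hellip;&rdquo;"))
# Sequences not in the table, such as accented characters, pass through:
print(convert_entities("caf&eacute;"))
```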

 

The conversion of other such sequences is complicated by the fact that accented characters do not map to unique character codes in the ANSI and OEM character sets.  These translations are therefore not performed.