The source zip file contains the needed classes and files:
- RTF2HTMLConverter.h/cpp, which contains the main converter class, CRTF_HTMLConverter
- RTF2HTMLTree.h/cpp, which contains Alexander Kovachev's template tree class
- Util.h/cpp, some very simple helper routines
- RTF2HTML.h/cpp, a console-based converter demo app
The converter class itself does no reading from or writing to files or RichEdit controls; this has to be done outside the class. (For example, to learn how to stream the complete RTF content from or into a RichEdit control, just look here in the same section.) The class is derived only from CObject and works with CString >>/<< streaming functions. When streaming in, the data is converted.
Note: Only the RTF->HTML direction is supported at the moment. Also, only a very small subset of RTF is supported at this time:
- Bold, Italic, Underline
- Font Size, Color, and Face
- Paragraph alignment
- Special characters, such as encoded German Umlauts
I hope the class is easy to extend (for new tags, mostly ::R2H_InterpretTag has to be modified), and any suggestions or extensions are very welcome; I'll post them here. But please don't send me "My RTF file isn't correctly exported" comments; as mentioned, it is only a demo and only a few tags are currently supported. I made my RTF file using the WordPad editor shipped with Windows; MS Word builds a more complex RTF structure. For complete RTF documentation, see MSDN ("RTF Specification").
An RTF file stores text data in a structured way, together with formatting tags (somewhat similar to HTML). Consider the following example:
--
TEST BIG SMALL AGAIN
BROWN
BLUE
AND
ANOTHER fONT.
This is a left-aligned paragraph
right-aligned one
centered one
--
It is represented in RTF as follows:
{\rtf1\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fswiss\fcharset0 Arial;}
{\f1\fmodern\fprq1\fcharset0 Courier New;}
{\f2\fswiss\fprq2\fcharset0 Arial;}
{\f3\fnil\fcharset2 Symbol;}}
{\colortbl ;\red128\green0\blue0;\red0\green0\blue255;}
\viewkind4\uc1\pard\f0\fs24 TEST \fs40 BIG \fs24 SMALL AGAIN\b
\cf1 BROWN \cf2 BLUE \cf0\b0 AND \f1 ANOTHER fONT.\par
\par
\par
\f2 This is a left-aligned paragraph\par
\pard\qr right-aligned one\par
\pard\qc centered one\par
}
ConvertRTF2HTML is the main conversion procedure. It performs the following steps:
- R2H_BuildTree
As you can see, RTF has a nested structure in which each section is enclosed in braces {}. So, our first step is to build a tree structure:
+RTF1
  +COLORTBL
  +FONTTBL
    +F0
    +F1
    +F2
    +F3
Here, I've noted just each section's first attribute (the section name). Each section then contains more code: both plain text (RTF1 is the main section with the main text) and attributes.
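The tree-building step can be sketched in standard C++ as follows. This is only a minimal illustration of the idea, not the code from the zip file: the names RtfNode, ParseGroup, and GroupName are made up for this example, and it uses std::string and std::unique_ptr instead of CString and the template tree class.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type for this sketch (the real converter uses its own tree class).
struct RtfNode {
    std::string text;                               // code and plain text directly inside this group
    std::vector<std::unique_ptr<RtfNode>> children; // nested {...} groups, in document order
};

// Parses one group. `pos` must point at the opening '{'. On return, `pos` points
// just past the matching '}' (or at the end of the input if the group is unclosed).
std::unique_ptr<RtfNode> ParseGroup(const std::string& rtf, std::size_t& pos) {
    assert(pos < rtf.size() && rtf[pos] == '{');
    auto node = std::make_unique<RtfNode>();
    ++pos;  // skip '{'
    while (pos < rtf.size()) {
        const char c = rtf[pos];
        if (c == '\\' && pos + 1 < rtf.size()) {
            // Copy the backslash and the next character together, so that
            // escaped braces (\{ and \}) are not mistaken for group delimiters.
            node->text += c;
            node->text += rtf[pos + 1];
            pos += 2;
        } else if (c == '{') {
            node->children.push_back(ParseGroup(rtf, pos));
        } else if (c == '}') {
            ++pos;
            return node;
        } else {
            node->text += c;
            ++pos;
        }
    }
    return node;
}

// Returns the first control word of a group without the backslash, e.g. "rtf1" or "fonttbl".
std::string GroupName(const RtfNode& node) {
    const std::size_t start = node.text.find('\\');
    if (start == std::string::npos) return "";
    std::size_t end = start + 1;
    while (end < node.text.size() &&
           std::isalnum(static_cast<unsigned char>(node.text[end]))) {
        ++end;
    }
    return node.text.substr(start + 1, end - start - 1);
}
```

Calling ParseGroup on the complete RTF string with pos set to 0 yields the root section; its children are the helper tables, and its own text member holds the main text with all tags still in place.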
- R2H_SetMetaData
Sub-items such as colortbl and fonttbl are helper tables, and the RTF tags in the main text refer to them, so these global attributes have to be scanned and stored.
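As an illustration of scanning such a helper table, here is a small sketch that reads the colortbl section into a vector, so that a later \cf1 or \cf2 can be resolved by index. The names RtfColor, ParseColorTable, and ToHtmlColor are assumptions for this example and are not taken from the converter class.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical color entry; an empty table entry (the "default" color) stays black here.
struct RtfColor {
    int red = 0;
    int green = 0;
    int blue = 0;
};

// Parses the content of a color table section, for example
//   \colortbl ;\red128\green0\blue0;\red0\green0\blue255;
// Every ';' closes one entry, so entry 0 of this example is the empty default entry.
std::vector<RtfColor> ParseColorTable(const std::string& body) {
    std::vector<RtfColor> colors;
    RtfColor current;
    std::size_t i = 0;
    while (i < body.size()) {
        if (body[i] == ';') {
            colors.push_back(current);
            current = RtfColor{};
            ++i;
        } else if (body[i] == '\\') {
            // Read a control word (letters) and its optional numeric parameter.
            std::size_t j = i + 1;
            std::string word;
            while (j < body.size() && std::isalpha(static_cast<unsigned char>(body[j]))) {
                word += body[j++];
            }
            int value = 0;
            while (j < body.size() && std::isdigit(static_cast<unsigned char>(body[j]))) {
                value = value * 10 + (body[j++] - '0');
            }
            if (word == "red")   current.red = value;
            if (word == "green") current.green = value;
            if (word == "blue")  current.blue = value;
            i = j;  // other control words, such as \colortbl itself, are skipped
        } else {
            ++i;    // spaces and line breaks between entries
        }
    }
    return colors;
}

// Formats a color as an HTML value such as "#800000".
std::string ToHtmlColor(const RtfColor& c) {
    char buffer[16];
    std::snprintf(buffer, sizeof buffer, "#%02X%02X%02X",
                  c.red & 0xFF, c.green & 0xFF, c.blue & 0xFF);
    return buffer;
}
```

With the color table from the example file, index 1 gives the brown used by \cf1 and index 2 the blue used by \cf2. The font table could be stored the same way, keyed by the number after \f.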
- R2H_CreateHTMLElements
Loop through the RTF1 main text and add HTML elements. HTML elements can be either:
  - Plain text, which is added as it is
  - RTF tags starting with a \. These have to be converted to the corresponding HTML tags with R2H_InterpretTag. Sometimes, look-ups in global tables (e.g. the color or font table) are needed, or previously inserted elements must be scanned or modified.
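The loop over the main text might look roughly like the sketch below. It is a simplified stand-in written for this article, not the actual R2H_InterpretTag: the function names are hypothetical, only a handful of tags are mapped, and nothing is looked up in the font or color table. It assumes that nested sections have already been split off by the tree step, and that \'hh escapes hold code page 1252 bytes of 0xA0 or above (such as the German umlauts), where the byte value equals the Unicode code point.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical tag interpreter: maps a few control words to HTML.
// Unknown control words yield an empty string, that is, they are dropped.
std::string InterpretTagSketch(const std::string& word, bool hasParam, int param) {
    const bool off = hasParam && param == 0;  // \b0, \i0, \ul0 switch a style off
    if (word == "b")      return off ? "</b>" : "<b>";
    if (word == "i")      return off ? "</i>" : "<i>";
    if (word == "ul")     return off ? "</u>" : "<u>";
    if (word == "ulnone") return "</u>";
    if (word == "par")    return "<br>\n";
    return "";
}

// Appends one plain-text character, escaping the characters HTML reserves.
void AppendText(std::string& html, char c) {
    switch (c) {
        case '<': html += "&lt;";  break;
        case '>': html += "&gt;";  break;
        case '&': html += "&amp;"; break;
        default:  html += c;
    }
}

bool IsDigitAt(const std::string& s, std::size_t i) {
    return i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]));
}

bool IsHexAt(const std::string& s, std::size_t i) {
    return i < s.size() && std::isxdigit(static_cast<unsigned char>(s[i]));
}

std::string ConvertMainTextSketch(const std::string& rtf) {
    std::string html;
    std::size_t i = 0;
    while (i < rtf.size()) {
        const char c = rtf[i];
        if (c == '\r' || c == '\n') { ++i; continue; }       // raw line breaks are not text in RTF
        if (c != '\\') { AppendText(html, c); ++i; continue; }

        std::size_t j = i + 1;
        std::string word;
        while (j < rtf.size() && std::isalpha(static_cast<unsigned char>(rtf[j]))) {
            word += rtf[j++];
        }
        if (word.empty()) {
            // Control symbol: \'hh is a hex-encoded character; \{ \} and \\ are literals.
            if (j < rtf.size() && rtf[j] == '\'' && IsHexAt(rtf, j + 1) && IsHexAt(rtf, j + 2)) {
                html += "&#" + std::to_string(std::stoi(rtf.substr(j + 1, 2), nullptr, 16)) + ";";
                i = j + 3;
            } else {
                if (j < rtf.size() && (rtf[j] == '{' || rtf[j] == '}' || rtf[j] == '\\')) {
                    AppendText(html, rtf[j]);
                }
                i = j + 1;
            }
            continue;
        }
        // Optional numeric parameter, possibly negative.
        bool negative = false;
        if (j < rtf.size() && rtf[j] == '-' && IsDigitAt(rtf, j + 1)) { negative = true; ++j; }
        bool hasParam = false;
        int param = 0;
        while (IsDigitAt(rtf, j)) { param = param * 10 + (rtf[j++] - '0'); hasParam = true; }
        if (negative) param = -param;
        if (j < rtf.size() && rtf[j] == ' ') ++j;  // one space after a tag is a delimiter, not text
        html += InterpretTagSketch(word, hasParam, param);
        i = j;
    }
    return html;
}
```

For instance, the input `TEST \b BOLD\b0  AND \i it\i0\par` comes out as `TEST <b>BOLD</b> AND <i>it</i><br>` followed by a line break. Tags such as \fs, \cf, and \f would need the extra state and table look-ups described above, which this sketch leaves out.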
- R2H_GetHTMLHeader: Write the HTML header to the target HTML
- R2H_GetHTMLElements: Dump the added HTML elements to the target HTML
- R2H_GetHTMLFooter: Write the HTML footer to the target HTML
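The last three steps simply concatenate their results. A minimal sketch, assuming the HTML elements were collected as strings in a vector (the function names and the exact header markup are invented for this example and may differ from what the class writes):

```cpp
#include <string>
#include <vector>

// Hypothetical header: opens the document and the body.
std::string HtmlHeaderSketch(const std::string& title) {
    return "<html>\n<head>\n<title>" + title + "</title>\n</head>\n<body>\n";
}

// Dumps the collected elements (plain text and tags) in the order they were added.
std::string HtmlElementsSketch(const std::vector<std::string>& elements) {
    std::string out;
    for (const std::string& element : elements) {
        out += element;
    }
    return out;
}

// Hypothetical footer: closes the body and the document.
std::string HtmlFooterSketch() {
    return "\n</body>\n</html>\n";
}

// Header, then elements, then footer, as in the last three conversion steps.
std::string BuildDocumentSketch(const std::string& title,
                                const std::vector<std::string>& elements) {
    return HtmlHeaderSketch(title) + HtmlElementsSketch(elements) + HtmlFooterSketch();
}
```

Keeping the elements in a list until this point is what makes it possible for earlier steps to scan or modify previously inserted elements before anything is written out.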
Ready!
Downloads
Download source and demo project - 15 Kb