VB Script – HTML to Plain

It’s far away from my daily business, but sometimes I need to write a few lines of VB script. (Defninitly not for Websites!!!!)

I have an applicaton that gets mails from a POP-server and parses the Text. The parser is to complex an specific to publish, but a simple HTML2Plain-Script to save the actual content of an HTML-Mail to database could be useful for someone else.

It’s far away from my daily business, but sometimes I need to write a few lines of VB script. (Defninitly not for Websites!!!!)

I have an applicaton that gets mails from a POP-server and parses the Text. The parser is to complex an specific to publish, but a simple HTML2Plain-Script to save the actual content of an HTML-Mail to database could be useful for someone else.

It can be enhanced, but it works fine:

Function HTML2Plain(text)
Dim startpos
Dim old_text
text = Right(text,Len(text) - InStr(text,"<body>")+1)
text = Replace(text,"<br/>",vbCrLf)
text = Replace(text,"<br>",vbCrLf)
text = Replace(text,"</p>
<p>",vbCrLf)
text = Replace(text,"</tr>
<p>",vbCrLf)
text = Replace(text,"</td>
<p>",vbTab)
text = Replace(text,"</table>
<p>",vbCrLf&vbCrLf)
text = Replace(text,"</h1>
<p>",vbCrLf&"=============================================="&vbCrLf)
text = Replace(text,"</h2>
<p>",vbCrLf&"----------------------------------------------"&vbCrLf)
text = Replace(text,"</h3>
<p>",vbCrLf&vbCrLf)
text = Replace(text,"</h4>
<p>",vbCrLf)
text = Replace(text,"&auml;","ä") 'german Umlaut characters
text = Replace(text,"&Auml;","Ä")
text = Replace(text,"&ouml;","ö")
text = Replace(text,"&Ouml;","Ö")
text = Replace(text,"&uuml;","ü")
text = Replace(text,"&Uuml;","Ü")
text = Replace(text,"&szlig;","ß")
text = Replace(text,"&nbsp;"," ")

Set objRegExp = New RegExp
objRegExp.IgnoreCase = True
objRegExp.Multiline = True
objRegExp.Pattern = "<[^>]*>" 'remove all other tags

While(old_text <> text)
old_text = text
text = objRegExp.Replace(text, "")
Wend

old_text=""
While(old_text <> text)
old_text = text
text = Replace(text,"  "," ")
Wend

HTML2Plain = replaceCharacterCodes(text)
End Function

The last line calls a function to replace encoded special characters with their real (human readable) character:

Function replaceCharacterCodes(text)
 Dim startpos
 Dim endpos
 Dim code

 startpos = InStr(text,"&#")
 While startpos>0
 startpos = startpos + 2
 endpos = InStr(startpos,text,";")
 code =  Mid(text,startpos,endpos-startpos)
 text = Left(text,startpos-3) & ChrW(code) & Right(text,Len(text)-endpos)
 startpos = InStr(text,"&#")

 Wend
 replaceCharacterCodes = text
End Function

 

It can be enhanced, but it works fine:

Function HTML2Plain(text)
Dim startpos
Dim old_text
text = Right(text,Len(text) - InStr(text,"")+1)
text = Replace(text,"
",vbCrLf)
text = Replace(text,"
",vbCrLf)
text = Replace(text,"",vbCrLf)
text = Replace(text,"

“,vbCrLf) text = Replace(text,”

“,vbTab) text = Replace(text,”

“,vbCrLf&vbCrLf) text = Replace(text,”

“,vbCrLf&”==============================================”&vbCrLf) text = Replace(text,”

“,vbCrLf&”———————————————-“&vbCrLf) text = Replace(text,”

“,vbCrLf&vbCrLf) text = Replace(text,”

“,vbCrLf) text = Replace(text,”ä”,”ä”) ‘german Umlaut characters text = Replace(text,”Ä”,”Ä”) text = Replace(text,”ö”,”ö”) text = Replace(text,”Ö”,”Ö”) text = Replace(text,”ü”,”ü”) text = Replace(text,”Ü”,”Ü”) text = Replace(text,”ß”,”ß”) text = Replace(text,” “,” “) Set objRegExp = New RegExp objRegExp.IgnoreCase = True objRegExp.Multiline = True objRegExp.Pattern = “<[^>]*>” ‘remove all other tags While(old_text <> text) old_text = text text = objRegExp.Replace(text, “”) Wend old_text=”” While(old_text <> text) old_text = text text = Replace(text,” “,” “) Wend HTML2Plain = replaceCharacterCodes(text) End Function

The last line calls a function to replace encoded special characters with their real (human readable) character:

Function replaceCharacterCodes(text)
 Dim startpos
 Dim endpos
 Dim code

 startpos = InStr(text,"&#")
 While startpos>0
 startpos = startpos + 2
 endpos = InStr(startpos,text,";")
 code =  Mid(text,startpos,endpos-startpos)
 text = Left(text,startpos-3) & ChrW(code) & Right(text,Len(text)-endpos)
 startpos = InStr(text,"&#")

 Wend
 replaceCharacterCodes = text
End Function