UTF8Encoding



Represents a UTF-8 encoding of Unicode characters and provides methods to convert character arrays and strings to and from UTF-8 encoded byte arrays.


Implements:

Encoding 
ICloneable 
IObject 

Public:

Properties:

Name | Description
BodyName (get) | Gets the encoding name to be used with mail agent body tags.
CodePage (get) | Gets the code page identifier for this encoding.
DecoderFallback (get) | Gets the DecoderFallback object for the current Encoding object.
DecoderFallback (set) | Sets the DecoderFallback object for the current Encoding object.
EncoderFallback (get) | Gets the EncoderFallback object for the current Encoding object.
EncoderFallback (set) | Sets the EncoderFallback object for the current Encoding object.
EncodingName (get) | Gets the human-readable description of the current encoding.
HeaderName (get) | Gets the encoding name to be used with mail agent header tags.
IsBrowserDisplay (get) | Gets a value indicating whether browser clients can use this encoding to display content.
IsBrowserSave (get) | Gets a value indicating whether browser clients can use this encoding to save content.
IsMailNewsDisplay (get) | Gets a value indicating whether mail and news clients can use this encoding to display content.
IsMailNewsSave (get) | Gets a value indicating whether mail and news clients can use this encoding to save content.
IsReadOnly (get) | Gets a value indicating whether the current encoding is read-only.
IsSingleByte (get) | Gets a value indicating whether the current encoding uses single-byte code points.
WebName (get) | Gets the name registered with the Internet Assigned Numbers Authority (IANA) for the current encoding.
WindowsCodePage (get) | Gets the Windows operating system code page that corresponds to the current encoding.

Methods:

Name | Description
Clone | Creates a clone of the current Encoding instance.
Equals | Determines whether the specified value is equal to the current UTF8Encoding object.
GetByteCount | Calculates the number of bytes produced by encoding the characters in the specified String or Integer().
GetBytes | Encodes all the characters in the specified character array or string into a sequence of bytes.
GetBytesEx | Encodes a set of characters into an array of bytes, returning the number of bytes produced.
GetCharCount | Calculates the number of characters produced by decoding a sequence of bytes from the specified byte array.
GetChars | Decodes a sequence of bytes from the specified byte array into a set of characters.
GetCharsEx | Decodes a sequence of bytes from the specified byte array into the specified character array.
GetDecoder | Obtains a decoder that converts a UTF-8 encoded sequence of bytes into a sequence of Unicode characters.
GetEncoder | Obtains an encoder that converts a sequence of Unicode characters into a UTF-8 encoded sequence of bytes.
GetHashCode | Returns the hash code for the current instance.
GetMaxByteCount | Calculates the maximum number of bytes produced by encoding the specified number of characters.
GetMaxCharCount | Calculates the maximum number of characters produced by decoding the specified number of bytes.
GetPreamble | Returns a Unicode byte order mark encoded in UTF-8 format, if the constructor for this instance requests a byte order mark.
GetString | Decodes a range of bytes from a byte array into a string.
ToString | Returns a string representation of the current object.

Remarks

Encoding is the process of transforming a set of Unicode characters into a sequence of bytes. Decoding is the process of transforming a sequence of encoded bytes into a set of Unicode characters.

UTF-8 encoding represents each code point as a sequence of one to four bytes.
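To make the one-to-four-byte rule concrete, the Python sketch below encodes a single code point by hand using the standard UTF-8 bit layout and checks each result against Python's built-in codec. The function name `encode_code_point` is hypothetical and is not part of VBCorLib.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (hypothetical helper)."""
    # Surrogates and values above U+10FFFF cannot be encoded.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# "A", Pi, the euro sign, and an emoji outside the Basic Multilingual Plane.
for cp in (0x41, 0x03A0, 0x20AC, 0x1F600):
    encoded = encode_code_point(cp)
    assert encoded == chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {encoded.hex(' ').upper()}")
# U+0041 -> 41
# U+03A0 -> CE A0
# U+20AC -> E2 82 AC
# U+1F600 -> F0 9F 98 80
```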

The GetByteCount method determines how many bytes result from encoding a set of Unicode characters, and the GetBytes method performs the actual encoding.

Likewise, the GetCharCount method determines how many characters result from decoding a sequence of bytes, and the GetChars and GetString methods perform the actual decoding.
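The difference between character counts and byte counts can be illustrated with Python's built-in UTF-8 codec. In this sketch `len()` of the converted result stands in for GetByteCount and GetCharCount; it is an analogy, not VBCorLib code. Note that Python's `len()` counts code points, while a VB string's length counts UTF-16 code units; for the characters used here the two agree.

```python
text = "Pi (\u03a0) and Sigma (\u03a3)"

encoded = text.encode("utf-8")      # comparable to GetBytes
byte_count = len(encoded)           # the count GetByteCount would report
decoded = encoded.decode("utf-8")   # comparable to GetString
char_count = len(decoded)           # the count GetCharCount would report

print(char_count, byte_count)  # 20 22 (Pi and Sigma take 2 bytes each)
print(decoded == text)         # True: the round trip loses no data
```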

UTF8Encoding corresponds to the Windows code page 65001.

Optionally, the UTF8Encoding object provides a preamble, which is an array of bytes that can be prefixed to the sequence of bytes resulting from the encoding process. If the preamble contains a byte order mark (BOM), it helps the decoder identify the transformation format (UTF) of the data. Because UTF-8 is byte-oriented, byte order does not matter for it, so in UTF-8 the BOM serves only to identify the data as UTF-8. The GetPreamble method returns the preamble, which contains the BOM only if the constructor for this instance requested one. For more information on byte order and the byte order mark, see The Unicode Standard at the Unicode home page.
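In UTF-8 the BOM is the three bytes EF BB BF. The Python sketch below uses the standard library's codecs rather than VBCorLib to contrast a codec that writes and strips this preamble (`utf-8-sig`) with one that does not (`utf-8`). This is only roughly analogous to constructing a UTF8Encoding with and without a byte order mark.

```python
import codecs

print(codecs.BOM_UTF8.hex(" ").upper())  # EF BB BF

# "utf-8-sig" writes the BOM as a preamble and strips it when decoding.
with_bom = "hi".encode("utf-8-sig")
print(with_bom)                          # b'\xef\xbb\xbfhi'
print(with_bom.decode("utf-8-sig"))      # hi

# Plain "utf-8" neither writes nor strips it; the BOM decodes to U+FEFF.
print("hi".encode("utf-8"))              # b'hi'
print(repr(with_bom.decode("utf-8")))    # '\ufeffhi'
```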

Note
To enable error detection and to make the class instance more secure, the application should use the UTF8Encoding constructor that takes a ThrowOnInvalidBytes parameter and set that parameter to True. With error detection, a method that detects an invalid sequence of characters or bytes throws an ArgumentException. Without error detection, no exception is thrown, and the invalid sequence is generally ignored.
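The difference between the two modes can be sketched with Python's standard UTF-8 codec. This is an analogy, not VBCorLib code: Python's default `errors="strict"` corresponds to error detection being on, while the `ignore` and `replace` handlers show two ways a non-throwing decoder can deal with invalid bytes. Which of these VBCorLib applies without error detection is not specified here.

```python
invalid = b"abc\xffdef"  # 0xFF never appears in well-formed UTF-8

# Strict decoding raises on the invalid byte (compare ThrowOnInvalidBytes = True).
try:
    invalid.decode("utf-8")  # errors="strict" is the default
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True

# Strict encoding rejects an invalid character sequence, here a lone surrogate.
try:
    "\ud800".encode("utf-8")
    encode_raised = False
except UnicodeEncodeError:
    encode_raised = True

# Lenient decoding: no exception; the invalid byte is dropped or replaced.
ignored = invalid.decode("utf-8", errors="ignore")    # 'abcdef'
replaced = invalid.decode("utf-8", errors="replace")  # 'abc' + U+FFFD + 'def'

print(strict_raised, encode_raised, ignored)  # True True abcdef
```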

Examples

The following example demonstrates how to use a UTF8Encoding object to encode a string of Unicode characters and store the result in a byte array. Notice that when EncodedBytes is decoded back to a string, no data is lost.

Public Sub Main()
    Dim UTF8            As New UTF8Encoding
    Dim UnicodeString   As String
    Dim EncodedBytes()  As Byte
    Dim DecodedString   As String
    Dim b               As Variant
    
    Set Console.OutputEncoding = Encoding.UTF8
    
    ' A Unicode string with two characters outside an 8-bit code range.
    UnicodeString = t("This Unicode string contains two characters with codes outside an 8-bit code range, Pi (\u03a0) and Sigma (\u03a3).")
    Console.WriteLine "Original string:"
    Console.WriteLine UnicodeString
    
    ' Encode the string.
    EncodedBytes = UTF8.GetBytes(UnicodeString)
    Console.WriteLine
    Console.WriteLine "Encoded bytes:"
    
    For Each b In EncodedBytes
        Console.WriteValue "[{0}]", b
    Next
    Console.WriteLine
    
    ' Decode bytes back to string.
    ' Notice Pi and Sigma characters are still present.
    DecodedString = UTF8.GetString(EncodedBytes)
    Console.WriteLine
    Console.WriteLine "Decoded string:"
    Console.WriteLine DecodedString
    Console.ReadKey
End Sub

See Also

Project CorLib Overview

Class UTF8Encoding Overview

Encoding

ASCIIEncoding

UTF7Encoding

UTF32Encoding

UnicodeEncoding