Encoding


Provides an interface for encoding Unicode characters into bytes and decoding bytes back into Unicode characters. It also reports whether the encoded characters can be used in particular contexts, such as browsers or mail and news clients, without loss of integrity.


Public:

Properties:

Name                     Description
BodyName (get)           Returns the encoding name to use with mail agent body tags.
CodePage (get)           Returns the code page identifier for this encoding.
DecoderFallback (get)    Gets the DecoderFallback object for the current Encoding object.
DecoderFallback (set)    Sets the DecoderFallback object for the current Encoding object.
EncoderFallback (get)    Gets the EncoderFallback object for the current Encoding object.
EncoderFallback (set)    Sets the EncoderFallback object for the current Encoding object.
EncodingName (get)       When implemented in a derived class, gets the human-readable description of the current encoding.
HeaderName (get)         Returns the encoding name to use with mail agent header tags.
IsBrowserDisplay (get)   Indicates whether browsers can use this encoding to display text.
IsBrowserSave (get)      Indicates whether browsers can use this encoding to save data.
IsMailNewsDisplay (get)  Indicates whether mail and news clients can use this encoding to display content.
IsMailNewsSave (get)     Indicates whether mail and news clients can use this encoding to save data.
IsReadOnly (get)         When implemented in a derived class, gets a value indicating whether the current encoding is read-only.
IsSingleByte (get)       Indicates whether the current encoding uses single-byte code points.
WebName (get)            Returns the encoding name registered with the Internet Assigned Numbers Authority (IANA).
WindowsCodePage (get)    Returns the Windows operating system code page for this encoding.

Methods:

Name             Description
Clone            Creates a shallow copy of the current Encoding object.
Equals           Returns a boolean indicating whether the value and this object instance are the same instance.
GetByteCount     Returns the number of bytes that would be produced by encoding the set of characters.
GetBytes         Encodes a set of characters into an array of bytes.
GetBytesEx       Encodes a set of characters into an array of bytes, returning the number of bytes produced.
GetCharCount     When implemented in a derived class, calculates the number of characters produced by decoding a sequence of bytes from the specified byte array.
GetChars         When implemented in a derived class, decodes all the bytes in the specified byte array into a set of characters.
GetCharsEx       When implemented in a derived class, decodes a sequence of bytes from the specified byte array into the specified character array.
GetDecoder       When implemented in a derived class, obtains a decoder that converts an encoded sequence of bytes into a sequence of characters.
GetEncoder       When implemented in a derived class, obtains an encoder that converts a sequence of Unicode characters into an encoded sequence of bytes.
GetHashCode      Returns a pseudo-unique number identifying this instance.
GetMaxByteCount  When implemented in a derived class, calculates the maximum number of bytes produced by encoding the specified number of characters.
GetMaxCharCount  Returns the maximum number of characters that can be decoded from the specified number of bytes.
GetPreamble      When implemented in a derived class, returns a sequence of bytes that specifies the encoding used.
GetString        When implemented in a derived class, decodes all the bytes in the specified byte array into a string.
ToString         Returns a string representation of this object instance.

Remarks

Encoding is the process of transforming a set of Unicode characters into a sequence of bytes. In contrast, decoding is the process of transforming a sequence of encoded bytes into a set of Unicode characters.

Note that Encoding is intended to operate on Unicode characters instead of arbitrary binary data, such as byte arrays. If your application must encode arbitrary binary data into text, it should use a binary-to-text scheme such as Base64, which is implemented by methods such as Convert.ToBase64CharArray.

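VBCorLib provides the following implementations of the Encoding class to support current Unicode encodings and other encodings:

ASCIIEncoding encodes Unicode characters as single 7-bit ASCII characters.

UTF7Encoding encodes Unicode characters using the UTF-7 encoding.

UTF8Encoding encodes Unicode characters using the UTF-8 encoding.

UnicodeEncoding encodes Unicode characters using the UTF-16 encoding.

UTF32Encoding encodes Unicode characters using the UTF-32 encoding.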

The Encoding class is primarily intended to convert between different encodings and Unicode. Often one of the derived Unicode classes is the correct choice for your application.

Use the GetEncoding method to obtain other encodings, and the GetEncodings method to get a list of all available encodings.
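
As a minimal sketch of such a lookup, the following assumes that GetEncoding is exposed on the same static Encoding object used in the example on this page, and that it accepts either a code page identifier or a registered name. The code page and name chosen here are only illustrative.

```vb
Public Sub Main()
    Dim ByCodePage As Encoding
    Dim ByName     As Encoding

    ' Assumption: GetEncoding accepts a code page identifier...
    Set ByCodePage = Encoding.GetEncoding(1252)

    ' ...or a name registered for the encoding.
    Set ByName = Encoding.GetEncoding("utf-8")

    ' WebName and CodePage are listed in the Properties table above.
    Console.WriteLine ByCodePage.WebName & " (code page " & ByCodePage.CodePage & ")"
    Console.WriteLine ByName.WebName & " (code page " & ByName.CodePage & ")"
End Sub
```

Which code pages are available may depend on the operating system, so a lookup for an unsupported code page should be expected to fail.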

If the data to be converted is available only in sequential blocks (such as data read from a stream) or if the amount of data is so large that it needs to be divided into smaller blocks, your application should use the Decoder or the Encoder provided by the GetDecoder method or the GetEncoder method, respectively, of a derived class.

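The UTF-16 and the UTF-32 encoders can use the big endian byte order (most significant byte first) or the little endian byte order (least significant byte first). For example, the Latin Capital Letter A (U+0041) is serialized as follows (in hexadecimal):

UTF-16 big endian byte order: 00 41

UTF-16 little endian byte order: 41 00

UTF-32 big endian byte order: 00 00 00 41

UTF-32 little endian byte order: 41 00 00 00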

The GetPreamble method retrieves an array of bytes that includes the byte order mark (BOM). If this byte array is prefixed to an encoded stream, it helps the decoder to identify the encoding format used.
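
The byte order and preamble described above can be observed with a short sketch like the following. It assumes that the static Encoding object exposes a BigEndianUnicode property alongside Unicode. ToHex is a hypothetical helper written for this illustration and is not part of VBCorLib.

```vb
Public Sub Main()
    Dim Bytes() As Byte

    ' U+0041 in UTF-16 little endian is 41 00.
    Bytes = Encoding.Unicode.GetBytes("A")
    Console.WriteLine "UTF-16 little endian: " & ToHex(Bytes)

    ' U+0041 in UTF-16 big endian is 00 41.
    ' Assumption: BigEndianUnicode is available on the static Encoding object.
    Bytes = Encoding.BigEndianUnicode.GetBytes("A")
    Console.WriteLine "UTF-16 big endian:    " & ToHex(Bytes)

    ' The preamble is the byte order mark, if the encoding emits one.
    Bytes = Encoding.Unicode.GetPreamble
    Console.WriteLine "UTF-16 preamble:      " & ToHex(Bytes)

    Bytes = Encoding.UTF8.GetPreamble
    Console.WriteLine "UTF-8 preamble:       " & ToHex(Bytes)
End Sub

' Hypothetical helper: formats a byte array as space-separated,
' two-digit hexadecimal values. Returns an empty string for an empty array.
Private Function ToHex(ByRef Bytes() As Byte) As String
    Dim i      As Long
    Dim Result As String

    For i = LBound(Bytes) To UBound(Bytes)
        Result = Result & Right$("0" & Hex$(Bytes(i)), 2) & " "
    Next

    ToHex = RTrim$(Result)
End Function
```

Whether a preamble is returned depends on how the encoding instance was created, so the preamble lines may print an empty value for an encoding configured without a byte order mark.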

For more information on byte order and the byte order mark, see The Unicode Standard at the Unicode home page.

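Note that the encoding classes allow errors to:

Silently change to a "?" character.

Use a "best fit" character.

Change to application-specific behavior through the EncoderFallback and DecoderFallback classes.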

Applications should throw exceptions on all data stream errors, either by using a "throwonerror" flag when applicable or by using the EncoderExceptionFallback and DecoderExceptionFallback classes. Best-fit fallback is generally not recommended because it can cause data loss or confusion and is slower than simple character replacement. For ANSI encodings, best-fit behavior is the default.

Examples

The following example converts a string from one encoding to another.

Public Sub Main()
    Dim UnicodeString   As String
    Dim AsciiEncoding   As Encoding
    Dim UnicodeEncoding As Encoding
    Dim AsciiBytes()    As Byte
    Dim UnicodeBytes()  As Byte
    Dim AsciiChars()    As Integer
    Dim AsciiString     As String
    
    Set Console.OutputEncoding = Encoding.UTF8
    UnicodeString = t("This string contains the unicode character Pi (\u03a0)")
    
    ' Create two different encodings.
    Set AsciiEncoding = Encoding.ASCII
    Set UnicodeEncoding = Encoding.Unicode
    
    ' Convert the string into a byte array.
    UnicodeBytes = UnicodeEncoding.GetBytes(UnicodeString)
    
    ' Perform the conversion from one encoding to the other.
    AsciiBytes = Encoding.Convert(UnicodeEncoding, AsciiEncoding, UnicodeBytes)
    
    ' Convert the new Byte() into a Char() and then into a string.
    AsciiChars = AsciiEncoding.GetChars(AsciiBytes)
    AsciiString = NewString(AsciiChars)
    
    ' Display the strings created before and after the conversion.
    Console.WriteLine "Original string: " & UnicodeString
    Console.WriteLine "Ascii converted string: " & AsciiString
    Console.ReadKey
End Sub

' This example code produces the following output.
'
'    Original string: This string contains the unicode character Pi (Π)
'    Ascii converted string: This string contains the unicode character Pi (?)

See Also

Project CorLib Overview

Class Encoding Overview

EncodingStatic

ASCIIEncoding

UTF7Encoding

UTF8Encoding

UTF32Encoding

UnicodeEncoding