EncodingStatic: UTF8 (get)

UTF8

Gets an encoding for the UTF-8 format.



 Public Property Get UTF8 ( ) As UTF8Encoding

Return Values

UTF8Encoding - An encoding for the UTF-8 format.

Remarks

This property returns a UTF8Encoding object that encodes Unicode characters into a sequence of one to four bytes per character, and that decodes a UTF-8-encoded byte array to Unicode characters.
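As a rough illustration of the one-to-four-byte range, the following sketch encodes a few single characters and prints how many bytes each one produces. ShowUtf8ByteCounts and PrintByteCount are hypothetical helper names and are not part of CorLib; only Encoding.UTF8 and GetBytes come from the library. The byte counts in the comments follow from the UTF-8 format itself.

```vb
Public Sub ShowUtf8ByteCounts()
    ' Hypothetical demo routine; not part of CorLib.
    PrintByteCount "U+007A", "z"                    ' ASCII range: 1 byte
    PrintByteCount "U+00C4", ChrW$(&HC4)            ' U+0080 to U+07FF: 2 bytes
    PrintByteCount "U+20AC", ChrW$(&H20AC)          ' U+0800 to U+FFFF: 3 bytes
    
    ' A high/low surrogate pair forms one character above U+FFFF: 4 bytes
    PrintByteCount "U+D8FF U+DCFF", ChrW$(&HD8FF) & ChrW$(&HDCFF)
End Sub

' Hypothetical helper: encodes Value with Encoding.UTF8 and prints the byte count.
Private Sub PrintByteCount(ByVal Label As String, ByVal Value As String)
    Dim Bytes() As Byte
    
    Bytes = Encoding.UTF8.GetBytes(Value)
    Debug.Print Label & ": " & (UBound(Bytes) - LBound(Bytes) + 1) & " byte(s)"
End Sub
```

The count is taken from the array bounds rather than from a dedicated method so that the sketch relies only on calls that appear elsewhere on this page.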

The UTF8Encoding object returned by this property may not behave the way your application needs. It uses replacement fallback: each string that it cannot encode and each byte that it cannot decode is replaced with a question mark ("?") character, so invalid data is not reported. If you need invalid data to be reported instead, call the NewUTF8Encoding(Boolean, Boolean) constructor to create a UTF8Encoding object that raises an EncoderFallbackException or a DecoderFallbackException, as the following example illustrates.

Public Sub Main()
    Dim Enc     As UTF8Encoding
    Dim Value   As String
    Dim Value2  As String
    Dim Bytes() As Byte
    Dim Byt     As Variant
    
    Set Enc = NewUTF8Encoding(True, True)
    Value = t("\u00C4 \uD802\u0033 \u00AE")
    
    On Error GoTo Catch
    Bytes = Enc.GetBytes(Value)
    
    For Each Byt In Bytes
        Debug.Print Object.ToString(Byt, "X2");
    Next
    Debug.Print
    
    Value2 = Enc.GetString(Bytes)
    Debug.Print Value2
    Exit Sub
    
Catch:
    Dim Ex As EncoderFallbackException
    
    Catch Ex, Err
    Debug.Print CorString.Format("Unable to encode {0} at index {1}", IIf(Ex.CharUnknownHigh <> 0, _
                    CorString.Format("U+{0:X4} U+{1:X4}", Ex.CharUnknownHigh, Ex.CharUnknownLow), _
                    CorString.Format("U+{0:X4}", Ex.CharUnknown)), Ex.Index)
End Sub

' The example displays the following output:
'        Unable to encode U+D802 at index 2
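For contrast, the sketch below passes the same input to the object returned by Encoding.UTF8. Based on the remarks above, it is expected to complete without raising an error and to substitute a replacement for the unpaired surrogate U+D802. The exact replacement bytes are not shown here because they depend on the library's fallback; run the routine to inspect them. ShowDefaultFallback is a hypothetical name and is not part of CorLib.

```vb
Public Sub ShowDefaultFallback()
    Dim Bytes() As Byte
    Dim Byt     As Variant
    
    ' Same input as the example above: U+D802 is an unpaired high surrogate.
    Bytes = Encoding.UTF8.GetBytes(t("\u00C4 \uD802\u0033 \u00AE"))
    
    ' Print the encoded bytes in hexadecimal so the replacement can be inspected.
    For Each Byt In Bytes
        Debug.Print Object.ToString(Byt, "X2"); " ";
    Next
    Debug.Print
    
    ' Decode the bytes again; the invalid character does not survive the round trip.
    Debug.Print Encoding.UTF8.GetString(Bytes)
End Sub
```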

Read Only.

Examples

The following example determines the exact and the maximum number of bytes required to encode a character array with several encodings, including UTF-8, then encodes the characters and displays the resulting bytes.

Public Sub Main()
    Dim Chars() As Integer
    Dim U7      As Encoding
    Dim U8      As Encoding
    Dim U16LE   As Encoding
    Dim U16BE   As Encoding
    Dim U32     As Encoding
    
    ' The characters to encode:
    '    Latin Small Letter Z (U+007A)
    '    Latin Small Letter A (U+0061)
    '    Combining Breve (U+0306)
    '    Latin Small Letter AE With Acute (U+01FD)
    '    Greek Small Letter Beta (U+03B2)
    '    a high-surrogate value (U+D8FF)
    '    a low-surrogate value (U+DCFF)
    Chars = NewChars("z", "a", ChrW$(&H306), ChrW$(&H1FD), ChrW$(&H3B2), ChrW$(&HD8FF), ChrW$(&HDCFF))
    
    Set U7 = Encoding.UTF7
    Set U8 = Encoding.UTF8
    Set U16LE = Encoding.Unicode
    Set U16BE = Encoding.BigEndianUnicode
    Set U32 = Encoding.UTF32
        
    PrintCountsAndBytes Chars, U7
    PrintCountsAndBytes Chars, U8
    PrintCountsAndBytes Chars, U16LE
    PrintCountsAndBytes Chars, U16BE
    PrintCountsAndBytes Chars, U32
End Sub

Private Sub PrintCountsAndBytes(ByRef Chars() As Integer, ByVal Enc As Encoding)
    Dim IBC     As Long
    Dim IMBC    As Long
    Dim Bytes() As Byte
    
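    Debug.Print CorString.Format("{0,-30} :", Enc.ToString);
    
    ' Exact number of bytes needed to encode these characters.
    IBC = Enc.GetByteCount(Chars)
    Debug.Print CorString.Format(" {0,-3}", IBC);
        
    ' Worst-case number of bytes for this many characters.
    IMBC = Enc.GetMaxByteCount(CorArray.Length(Chars))
    Debug.Print CorString.Format(" {0, -3} :", IMBC);
    
    ' Encode the characters and display the bytes in hexadecimal.
    Bytes = Enc.GetBytes(Chars)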
    
    PrintHexBytes Bytes
End Sub

Private Sub PrintHexBytes(ByRef Bytes() As Byte)
    Dim i As Long
    
    If CorArray.IsNullOrEmpty(Bytes) Then
        Debug.Print "<none>"
    Else
        For i = 0 To UBound(Bytes)
            Debug.Print CorString.Format("{0:X2} ", Bytes(i));
        Next
        
        Debug.Print
    End If
End Sub

' This code produces the following output.
'
'    CorLib.UTF7Encoding            : 18  23  :7A 61 2B 41 77 59 42 2F 51 4F 79 32 50 2F 63 2F 77 2D
'    CorLib.UTF8Encoding            : 12  24  :7A 61 CC 86 C7 BD CE B2 F1 8F B3 BF
'    CorLib.UnicodeEncoding         : 14  16  :7A 00 61 00 06 03 FD 01 B2 03 FF D8 FF DC
'    CorLib.UnicodeEncoding         : 14  16  :00 7A 00 61 03 06 01 FD 03 B2 D8 FF DC FF
'    CorLib.UTF32Encoding           : 24  32  :7A 00 00 00 61 00 00 00 06 03 00 00 FD 01 00 00 B2 03 00 00 FF FC 04 00

See Also

Project CorLib Overview

Class EncodingStatic Overview