Character literal
Syntax
Syntax | No. | Availability |
---|---|---|
' c-char ' | (1) | |
u8' c-char ' | (2) | (since C++17) |
u' c-char ' | (3) | (since C++11) |
U' c-char ' | (4) | (since C++11) |
L' c-char ' | (5) | |
' c-char-sequence ' | (6) | |
L' c-char-sequence ' | (7) | (until C++23) |
c-char | - | either a basic-c-char, an escape sequence, or a universal character name |
basic-c-char | - | a character from the basic source character set (until C++23) or the translation character set (since C++23), except the single quote ', backslash \, or new-line character |
c-char-sequence | - | two or more c-chars |
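The following is a minimal sketch, not part of the grammar above, of how the prefix selects the type of a single-character literal. It assumes a C++17 or later compiler. The helper `code_unit_size` is a hypothetical name introduced only for illustration, and the u8 case is guarded because a u8 character literal has type char in C++17 and char8_t from C++20.

```cpp
#include <cstddef>
#include <type_traits>

// Forms (1), (3), (4), (5): the prefix determines the literal's type.
static_assert(std::is_same_v<decltype('a'), char>);
static_assert(std::is_same_v<decltype(u'a'), char16_t>);
static_assert(std::is_same_v<decltype(U'a'), char32_t>);
static_assert(std::is_same_v<decltype(L'a'), wchar_t>);

// Form (2): char8_t where the compiler provides it (C++20), char in C++17.
#if defined(__cpp_char8_t)
static_assert(std::is_same_v<decltype(u8'a'), char8_t>);
#else
static_assert(std::is_same_v<decltype(u8'a'), char>);
#endif

// Hypothetical helper: size in bytes of the type of a character literal.
template<typename CharT>
constexpr std::size_t code_unit_size(CharT) { return sizeof(CharT); }
```

For instance, `code_unit_size('a')` is always 1, while the result for `L'a'` depends on the platform's wchar_t.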
Explanation
Non-encodable characters
7) If any c-char in the c-char-sequence cannot be encoded as a single code unit in the wide literal encoding, the program is ill-formed. (until C++23)
Numeric escape sequences
Numeric (octal and hexadecimal) escape sequences can be used for specifying the value of the character.
If the character literal contains only one numeric escape sequence, and the value specified by the escape sequence is representable by the unsigned version of its type, the character literal has the same value as the specified value (possibly after conversion to the character type). A UTF-N character literal can have any value representable by its type. If the value does not correspond to a valid Unicode code point, or if its corresponding code point is not representable as a single code unit in UTF-N, it can still be specified by a numeric escape sequence with that value. E.g. u8'\xff' is well-formed and equal to char8_t(0xFF). (since C++23)
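As a rough illustration of the paragraph above, the sketch below specifies code-unit values through numeric escapes that could not be written as ordinary characters. The overloaded helper `code_unit_value` is a hypothetical name, and the char8_t part is guarded because it assumes a compiler with C++20 char8_t support.

```cpp
#include <cstdint>

// Hypothetical helpers: widen a UTF-N code unit to a plain integer.
constexpr std::uint32_t code_unit_value(char16_t c) { return c; }
constexpr std::uint32_t code_unit_value(char32_t c) { return c; }

// 0xD800 is a surrogate, not a valid Unicode code point on its own,
// yet a numeric escape can still specify it as a code-unit value.
static_assert(code_unit_value(u'\xd800') == 0xD800);
static_assert(code_unit_value(U'\xd800') == 0xD800);

#if defined(__cpp_char8_t)
constexpr std::uint32_t code_unit_value(char8_t c) { return c; }

// 0xFF never appears in valid UTF-8, but it fits in one char8_t.
static_assert(u8'\xff' == char8_t(0xFF));
static_assert(code_unit_value(u8'\xff') == 0xFF);
#endif
```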
If the value specified by a numeric escape sequence used in an ordinary or wide character literal is not representable by char or wchar_t respectively, the value of the character literal is implementation-defined. (until C++23)
If a numeric escape sequence is used in an ordinary or wide character literal with one c-char, and the specified value is representable by the unsigned version of the underlying type of char or wchar_t respectively, the value of the literal is that value, treated as a value of that unsigned integer type and converted to the type of the literal. Otherwise, the program is ill-formed. (since C++23)
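The conversion described above can be sketched for ordinary character literals as follows. The helpers `literal_value` and `byte_value` are hypothetical names, and the example assumes an 8-bit char. Before C++23 the value of '\xff' is formally implementation-defined when char is signed, so the check goes through unsigned char, which recovers the specified value either way.

```cpp
// Hypothetical helper: the value of the literal as an int.
constexpr int literal_value(char c) { return c; }

// Hypothetical helper: the specified value, recovered through unsigned char.
constexpr unsigned byte_value(char c) { return static_cast<unsigned char>(c); }

// Octal and hexadecimal escapes naming the same value yield the same literal.
static_assert(literal_value('\x2a') == 42);
static_assert(literal_value('\052') == 42);
static_assert('\x2a' == '\052');

// 0xFF fits in unsigned char; after conversion to char it may read as
// a negative number if char is signed, but the byte is still 0xFF.
static_assert(byte_value('\xff') == 0xFF);
```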
If the value specified by a numeric escape sequence used in a UTF-N character literal is not representable by the corresponding
(since C++11)
Notes
Multicharacter literals were inherited by C from the B programming language. Although not specified by the C or C++ standard, most compilers (MSVC is a notable exception) implement multicharacter literals as specified in B: the value of each char in the literal initializes a successive byte of the resulting integer, in big-endian, zero-padded, right-adjusted order, e.g. the value of '\1' is 0x00000001 and the value of '\1\2\3\4' is 0x01020304.
In C, character constants such as 'a' or '\n' have type int, rather than char.
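The B-style packing described in the note on multicharacter literals can be sketched without relying on any compiler's actual behavior. The function `pack_multichar` is a hypothetical helper that computes the value such a scheme would produce for up to four bytes; it does not inspect what a given compiler really does with a literal like 'ab'.

```cpp
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: each byte shifts the previous ones left by 8 bits,
// so the first byte ends up most significant (big-endian, right-adjusted,
// with unused high bytes left as zero).
constexpr std::uint32_t pack_multichar(std::initializer_list<unsigned char> bytes)
{
    std::uint32_t value = 0;
    for (unsigned char b : bytes)
        value = (value << 8) | b;
    return value;
}

// The two values quoted in the note.
static_assert(pack_multichar({1}) == 0x00000001);
static_assert(pack_multichar({1, 2, 3, 4}) == 0x01020304);
```

Under an ASCII-compatible encoding, `pack_multichar({'a', 'b'})` gives 0x6162, which matches the bytes 62 61 00 00 printed for 'ab' on a little-endian machine.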
Example
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string_view>

// Prints the label s followed by the bytes of c in memory order, as hex.
template<typename CharT>
void dump(std::string_view s, const CharT c)
{
    const std::uint8_t* data{reinterpret_cast<const std::uint8_t*>(&c)};

    std::cout << s << " \t" << std::hex
              << std::uppercase << std::setfill('0');

    for (auto i{0U}; i != sizeof(CharT); ++i)
        std::cout << std::setw(2) << static_cast<unsigned>(data[i]) << ' ';

    std::cout << '\n';
}

void print(std::string_view str = "") { std::cout << str << '\n'; }

int main()
{
    print("Ordinary character literals:");
    char c1 = 'a'; dump("'a'", c1);
    char c2 = '\x2a'; dump("'*'", c2);

    print("\n" "Ordinary multi-character literals:");
    int mc1 = 'ab'; dump("'ab'", mc1);   // implementation-defined
    int mc2 = 'abc'; dump("'abc'", mc2); // implementation-defined

    print("\n" "UTF-8 character literals:");
    char8_t C1 = u8'a'; dump("u8'a'", C1);
//  char8_t C2 = u8'¢'; dump("u8'¢'", C2);   // error: ¢ maps to two UTF-8 code units
//  char8_t C3 = u8'猫'; dump("u8'猫'", C3); // error: 猫 maps to three UTF-8 code units
//  char8_t C4 = u8'🍌'; dump("u8'🍌'", C4); // error: 🍌 maps to four UTF-8 code units

    print("\n" "UTF-16 character literals:");
    char16_t uc1 = u'a'; dump("u'a'", uc1);
    char16_t uc2 = u'¢'; dump("u'¢'", uc2);
    char16_t uc3 = u'猫'; dump("u'猫'", uc3);
//  char16_t uc4 = u'🍌'; dump("u'🍌'", uc4); // error: 🍌 maps to two UTF-16 code units

    print("\n" "UTF-32 character literals:");
    char32_t Uc1 = U'a'; dump("U'a'", Uc1);
    char32_t Uc2 = U'¢'; dump("U'¢'", Uc2);
    char32_t Uc3 = U'猫'; dump("U'猫'", Uc3);
    char32_t Uc4 = U'🍌'; dump("U'🍌'", Uc4);

    print("\n" "Wide character literals:");
    wchar_t wc1 = L'a'; dump("L'a'", wc1);
    wchar_t wc2 = L'¢'; dump("L'¢'", wc2);
    wchar_t wc3 = L'猫'; dump("L'猫'", wc3);
    wchar_t wc4 = L'🍌'; dump("L'🍌'", wc4); // unsupported on Windows since C++23
}
Possible output:
Ordinary character literals:
'a'     61
'*'     2A

Ordinary multi-character literals:
'ab'    62 61 00 00
'abc'   63 62 61 00

UTF-8 character literals:
u8'a'   61

UTF-16 character literals:
u'a'    61 00
u'¢'    A2 00
u'猫'   2B 73

UTF-32 character literals:
U'a'    61 00 00 00
U'¢'    A2 00 00 00
U'猫'   2B 73 00 00
U'🍌'   4C F3 01 00

Wide character literals:
L'a'    61 00 00 00
L'¢'    A2 00 00 00
L'猫'   2B 73 00 00
L'🍌'   4C F3 01 00
Defect reports
The following behavior-changing defect reports were applied retroactively to previously published C++ standards.
DR | Applied to | Behavior as published | Correct behavior |
---|---|---|---|
CWG 912 | C++98 | non-encodable ordinary character literal was unspecified | specified as conditionally-supported |
CWG 1024 | C++98 | multicharacter literal was required to be supported | made conditionally-supported |
CWG 1656 | C++98 | the meaning of numeric escape sequence in a character literal was unclear | specified |
P1854R4 | C++98 | non-encodable character literals were conditionally-supported | the program is ill-formed |
References
- C++23 standard (ISO/IEC 14882:2023):
  - 5.13.3 Character literals [lex.ccon]
- C++20 standard (ISO/IEC 14882:2020):
  - 5.13.3 Character literals [lex.ccon]
- C++17 standard (ISO/IEC 14882:2017):
  - 5.13.3 Character literals [lex.ccon]
- C++14 standard (ISO/IEC 14882:2014):
  - 2.14.3 Character literals [lex.ccon]
- C++11 standard (ISO/IEC 14882:2011):
  - 2.14.3 Character literals [lex.ccon]
- C++03 standard (ISO/IEC 14882:2003):
  - 2.13.2 Character literals [lex.ccon]
- C++98 standard (ISO/IEC 14882:1998):
  - 2.13.2 Character literals [lex.ccon]
See also
user-defined literals (C++11) | literals with user-defined suffix |