c32rtomb

Defined in header <uchar.h>
size_t c32rtomb( char* restrict s, char32_t c32, mbstate_t* restrict ps );
(since C11)

Converts a single code point from its variable-length 32-bit wide character representation (typically UTF-32) to its narrow multibyte character representation.

If s is not a null pointer, the function determines the number of bytes necessary to store the multibyte character representation of c32 (including any shift sequences, and taking into account the current multibyte conversion state *ps), and stores the multibyte character representation in the character array whose first element is pointed to by s, updating *ps as necessary. At most MB_CUR_MAX bytes can be written by this function.

If s is a null pointer, the call is equivalent to c32rtomb(buf, U'\0', ps) for some internal buffer buf.

If c32 is the null wide character U'\0', a null byte is stored, preceded by any shift sequence necessary to restore the initial shift state, and the conversion state parameter *ps is updated to represent the initial shift state.

If the macro __STDC_UTF_32__ is defined, the 32-bit encoding used by this function is UTF-32; otherwise, it is implementation-defined. (until C23) The macro is always defined and the encoding is always UTF-32. (since C23) In any case, the multibyte character encoding used by this function is specified by the currently active C locale.

Parameters

s - pointer to narrow character array where the multibyte character will be stored
c32 - the 32-bit wide character to convert
ps - pointer to the conversion state object used when interpreting the multibyte string

Return value

On success, returns the number of bytes (including any shift sequences) written to the character array whose first element is pointed to by s. This value may be 0, e.g. when processing the leading char32_t units in a multi-char32_t-unit sequence (does not occur in UTF-32).

On failure (if c32 is not a valid 32-bit wide character), returns (size_t)-1, stores EILSEQ in errno, and leaves *ps in an unspecified state.

Example

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <uchar.h>
 
mbstate_t state;
 
int main(void)
{
    setlocale(LC_ALL, "en_US.utf8");
    const char32_t in[] = U"zß水🍌"; // or U"z\u00df\u6c34\U0001F34C"
    size_t in_sz = sizeof in / sizeof *in;
 
    printf("Processing %zu UTF-32 code units: [ ", in_sz);
    for (size_t n = 0; n < in_sz; ++n)
        printf("%#x ", (unsigned)in[n]);
    puts("]");
 
    char out[MB_CUR_MAX * in_sz];
    char* p = out;
    for (size_t n = 0; n < in_sz; ++n)
    {
        size_t rc = c32rtomb(p, in[n], &state);
        if (rc == (size_t)-1) // in[n] cannot be converted in the current locale
            break;
        p += rc; // advance past the bytes just written
    }
 
    size_t out_sz = p - out;
    printf("into %zu UTF-8 code units: [ ", out_sz);
    for (size_t x = 0; x < out_sz; ++x)
        printf("%#x ", +(unsigned char)out[x]);
    puts("]");
}

Output:

Processing 5 UTF-32 code units: [ 0x7a 0xdf 0x6c34 0x1f34c 0 ]
into 11 UTF-8 code units: [ 0x7a 0xc3 0x9f 0xe6 0xb0 0xb4 0xf0 0x9f 0x8d 0x8c 0 ]

References

  • C23 standard (ISO/IEC 9899:2023):
  • 7.30.1.6 The c32rtomb function (p: 411)
  • C11 standard (ISO/IEC 9899:2011):
  • 7.28.1.4 The c32rtomb function (p: 401)

See also

(C11)
generates the next 32-bit wide character from a narrow multibyte string
(function)