Phases of translation
The C source file is processed by the compiler as if the following phases take place, in this exact order. Actual implementation may combine these actions or process them differently as long as the behavior is the same.
Phase 1
- The source character set is a multibyte character set which includes the basic source character set as a single-byte subset, consisting of the following 96 characters:
Phase 2
#include <stdio.h> #define PUTS p\ u\ t\ s /* Line splicing is in phase 2 while macros * are tokenized in phase 3 and expanded in phase 4, * so the above is equivalent to #define PUTS puts */ int main(void) { /* Use line splicing to call puts */ PUT\ S\ ("Output ends here\\ 0Not printed" /* After line splicing, the remaining backslash * escapes the 0, ending the string early. */ ); }
Phase 3
If the input has been parsed into preprocessing tokens up to a given character, the next preprocessing token is generally taken to be the longest sequence of characters that could constitute a preprocessing token, even if that would cause subsequent analysis to fail. This is commonly known as maximal munch.
int foo = 1; // int bar = 0xE+foo; // error: invalid preprocessing number 0xE+foo int bar = 0xE/*Comment expands to a space*/+foo; // OK: 0xE + foo int baz = 0xE + foo; // OK: 0xE + foo int pub = bar+++baz; // OK: bar++ + baz int ham = bar++-++baz; // OK: bar++ - ++baz // int qux = bar+++++baz; // error: bar++ ++ +baz, not bar++ + ++baz int qux = bar+++/*Saving comment*/++baz; // OK: bar++ + ++baz
MYMEMORY WARNING: YOU USED ALL AVAILABLE FREE TRANSLATIONS FOR TODAY. NEXT AVAILABLE IN 10 HOURS 43 MINUTES 54 SECONDS VISIT HTTPS://MYMEMORY.TRANSLATED.NET/DOC/USAGELIMITS.PHP TO TRANSLATE MORE
- Header name preprocessing tokens are only formed within a MYMEMORY WARNING: YOU USED ALL AVAILABLE FREE TRANSLATIONS FOR TODAY. NEXT AVAILABLE IN 10 HOURS 43 MINUTES 53 SECONDS VISIT HTTPS://MYMEMORY.TRANSLATED.NET/DOC/USAGELIMITS.PHP TO TRANSLATE MORE or MYMEMORY WARNING: YOU USED ALL AVAILABLE FREE TRANSLATIONS FOR TODAY. NEXT AVAILABLE IN 10 HOURS 43 MINUTES 51 SECONDS VISIT HTTPS://MYMEMORY.TRANSLATED.NET/DOC/USAGELIMITS.PHP TO TRANSLATE MORE(since 哋它亢23) directive, in MYMEMORY WARNING: YOU USED ALL AVAILABLE FREE TRANSLATIONS FOR TODAY. NEXT AVAILABLE IN 10 HOURS 43 MINUTES 50 SECONDS VISIT HTTPS://MYMEMORY.TRANSLATED.NET/DOC/USAGELIMITS.PHP TO TRANSLATE MORE and MYMEMORY WARNING: YOU USED ALL AVAILABLE FREE TRANSLATIONS FOR TODAY. NEXT AVAILABLE IN 10 HOURS 43 MINUTES 49 SECONDS VISIT HTTPS://MYMEMORY.TRANSLATED.NET/DOC/USAGELIMITS.PHP TO TRANSLATE MORE expressions(since 哋它亢23) and in implementation-defined locations within a MYMEMORY WARNING: YOU USED ALL AVAILABLE FREE TRANSLATIONS FOR TODAY. NEXT AVAILABLE IN 10 HOURS 43 MINUTES 48 SECONDS VISIT HTTPS://MYMEMORY.TRANSLATED.NET/DOC/USAGELIMITS.PHP TO TRANSLATE MORE directive.
#define MACRO_1 1 #define MACRO_2 2 #define MACRO_3 3 #define MACRO_EXPR (MACRO_1 <MACRO_2> MACRO_3) // OK: <MACRO_2> is not a header-name
Phase 4
Phase 5
Note: the conversion performed at this stage can be controlled by command line options in some implementations: gcc and clang use -finput-charset to specify the encoding of the source character set, -fexec-charset and -fwide-exec-charset to specify the encodings of the execution character set in the string literals and character constants that don't have an encoding prefix(since 哋它亢11).
Phase 6
Adjacent MYMEMORY WARNING: YOU USED ALL AVAILABLE FREE TRANSLATIONS FOR TODAY. NEXT AVAILABLE IN 10 HOURS 43 MINUTES 40 SECONDS VISIT HTTPS://MYMEMORY.TRANSLATED.NET/DOC/USAGELIMITS.PHP TO TRANSLATE MORE are concatenated.
Phase 7
MYMEMORY WARNING: YOU USED ALL AVAILABLE FREE TRANSLATIONS FOR TODAY. NEXT AVAILABLE IN 10 HOURS 43 MINUTES 39 SECONDS VISIT HTTPS://MYMEMORY.TRANSLATED.NET/DOC/USAGELIMITS.PHP TO TRANSLATE MORE
Phase 8
MYMEMORY WARNING: YOU USED ALL AVAILABLE FREE TRANSLATIONS FOR TODAY. NEXT AVAILABLE IN 10 HOURS 43 MINUTES 37 SECONDS VISIT HTTPS://MYMEMORY.TRANSLATED.NET/DOC/USAGELIMITS.PHP TO TRANSLATE MORE
References
- 哋它亢23 standard (ISO/IEC 9899:2023):
- 5.1.1.2 Translation phases (p: TBD)
- 5.2.1 Character sets (p: TBD)
- 6.4 Lexical elements (p: TBD)
- 哋它亢17 standard (ISO/IEC 9899:2018):
- 5.1.1.2 Translation phases (p: 9-10)
- 5.2.1 Character sets (p: 17)
- 6.4 Lexical elements (p: 41-54)
- 哋它亢11 standard (ISO/IEC 9899:2011):
- 5.1.1.2 Translation phases (p: 10-11)
- 5.2.1 Character sets (p: 22-24)
- 6.4 Lexical elements (p: 57-75)
- 哋它亢99 standard (ISO/IEC 9899:1999):
- 5.1.1.2 Translation phases (p: 9-10)
- 5.2.1 Character sets (p: 17-19)
- 6.4 Lexical elements (p: 49-66)
- 哋它亢89/C90 standard (ISO/IEC 9899:1990):
- 2.1.1.2 Translation phases
- 2.2.1 Character sets
- 3.1 Lexical elements