Discussion about this post

snarlferb

In this language, are you asking how to maintain separate namespaces in the symbol table, so that foo(bar) and int(bar) can be distinct? Or do you want a parser-lookahead type of thing for it? Or do you intend the function call syntax to be something different too, the way the casting syntax is (assuming the language has traditional things like function calls)? P.S. What's the name of the language, in case we refer to it? I'm working on a custom shell called 'blush'.
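To make the first option concrete, here is a minimal sketch in C of a symbol table keyed on (namespace, name). Everything in it is an assumption for illustration: the `NS_TYPE`/`NS_FUNC` tags, the fixed-size table, and the `classify_paren_expr` helper are hypothetical and not taken from the language in the post.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical namespace tags; all names in this sketch are illustrative. */
enum ns { NS_TYPE, NS_FUNC };

struct symbol {
    enum ns ns;
    const char *name;
};

#define MAX_SYMS 64
static struct symbol table[MAX_SYMS];
static size_t nsyms;

/* Register a name in one namespace. Returns 0 on success, -1 if full. */
int sym_add(enum ns ns, const char *name)
{
    if (nsyms == MAX_SYMS)
        return -1;
    table[nsyms].ns = ns;
    table[nsyms].name = name;
    nsyms++;
    return 0;
}

/* Lookup is keyed on (namespace, name), so one spelling may exist in both. */
const struct symbol *sym_find(enum ns ns, const char *name)
{
    for (size_t i = 0; i < nsyms; i++)
        if (table[i].ns == ns && strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Decide how to read `name(...)`: a cast if `name` is a known type,
   otherwise a function call. */
enum ns classify_paren_expr(const char *name)
{
    return sym_find(NS_TYPE, name) != NULL ? NS_TYPE : NS_FUNC;
}
```

With `int` registered under `NS_TYPE` and `foo` under `NS_FUNC`, `classify_paren_expr("int")` yields `NS_TYPE` and `classify_paren_expr("foo")` yields `NS_FUNC`. The catch is that the parser now depends on the symbol table, which is exactly what a lookahead-only or distinct-syntax design would avoid.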

Magne

What if you could create a compiler that would transpile C into a strict, unambiguous subset of C, which could then be read more easily by humans and also targeted more easily by tooling?

This subset should be able to be compiled by pre-existing C compilers, of course.

I was going to suggest the name C-- or Sub-C, but both have already been taken.

The following are some excerpts from some perplexity.ai-assisted research I did just now:

TLDR: Maybe SubC could work for this purpose: https://github.com/jezze/subc

C-- doesn't work for this purpose, because:

C-- is a low-level, portable, assembly-like language intended to be emitted by compilers for higher-level (functional) languages like Haskell, not to be read by humans. Even though C-- resembles a syntactic subset of C, it embodies different operational semantics and extra runtime features (guaranteed tail-call optimization, accurate GC, efficient exception handling, etc.), and it removes some features (variadic functions, pointer syntax, and some aspects of C's type system). So it requires an intermediate transpilation step (using a trampoline approach or other transformations) to generate standard C code that existing C compilers can handle correctly. The syntax subset may therefore be too minimalist, and its semantics overlap too little with C's: it is not a simple subset but a distinct intermediate language.
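For anyone unfamiliar with the trampoline approach mentioned above, here is a minimal sketch in standard C. It is not how any particular C-- backend does it; the `struct step`, `sum_step`, and `sum_to` names are hypothetical. The idea is that a function returns the next function to run instead of calling it, and a plain loop drives the chain, so stack depth stays constant even without guaranteed tail calls.

```c
#include <stddef.h>

/* State carried between steps; all names here are hypothetical. */
struct state {
    unsigned long n;    /* remaining count */
    unsigned long acc;  /* running sum */
};

/* A step returns the next step to run, or a NULL fn when finished.
   Wrapping the pointer in a struct lets the type refer to itself. */
struct step {
    struct step (*fn)(struct state *);
};

static struct step sum_step(struct state *s)
{
    struct step next = { NULL };
    if (s->n == 0)
        return next;        /* done: no further step */
    s->acc += s->n;
    s->n -= 1;
    next.fn = sum_step;     /* the "tail call", expressed as a return value */
    return next;
}

/* The trampoline: a loop that keeps bouncing until a step returns NULL. */
unsigned long sum_to(unsigned long n)
{
    struct state s = { n, 0 };
    struct step cur = { sum_step };
    while (cur.fn != NULL)
        cur = cur.fn(&s);
    return s.acc;
}
```

Calling `sum_to(10)` gives 55. The cost is an indirect call per step plus state that has to live outside the C call stack, which is part of why generated code in this style is not pleasant for humans to read.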

Full C includes **complex and sometimes underspecified semantics** for features like pointer provenance, arithmetic overflow behavior, aliasing, and evaluation order, which are challenging to preserve in a minimalist subset like C--. Consequently, the semantic gaps mean that the C-- subset cannot directly and cleanly represent all idiomatic or system-level C programs without **substantial transformations and loss of fidelity**.
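Evaluation order is probably the easiest of these to show. The sketch below is only an illustration of what a stricter form could look like, not output from any existing tool; the `mark`, `sum_unspecified`, and `sum_sequenced` names and the trace buffer are hypothetical. In full C the order in which the two operands of `+` are evaluated is unspecified, while splitting them into separate declarations pins the order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical trace buffer recording the order in which calls happen. */
static char trace[8];
static size_t trace_len;

void reset_trace(void)
{
    memset(trace, 0, sizeof trace);
    trace_len = 0;
}

const char *get_trace(void)
{
    return trace;
}

/* Records `tag` in the trace, then returns `value` unchanged. */
static int mark(char tag, int value)
{
    if (trace_len < sizeof trace - 1)   /* keep the final byte as '\0' */
        trace[trace_len++] = tag;
    return value;
}

/* Full C: whether mark('a', ...) or mark('b', ...) runs first is
   unspecified, so the trace may read "ab" or "ba". */
int sum_unspecified(void)
{
    return mark('a', 1) + mark('b', 2);
}

/* Stricter form: each initializer is a full expression, so the calls
   are sequenced a-then-b on every conforming compiler. */
int sum_sequenced(void)
{
    int a = mark('a', 1);
    int b = mark('b', 2);
    return a + b;
}
```

Both functions return 3, but only `sum_sequenced` guarantees the trace `ab`. A transpiler targeting a strict subset would have to pick one order when rewriting the first form into the second, which is where the fidelity question comes in.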

This underlines why the C-- subset is used primarily for specific compiler infrastructures with adapted semantics rather than as a straightforward restricted form of C aimed at broad compatibility.

Clight is a large subset of C used in formal verification contexts. But the subset may be too large.

SubC seems to be what I was thinking of: https://github.com/jezze/subc

Though it is described as intended for teaching compiler programming, not for production use. See the "DIFFERENCES BETWEEN SUBC (THIS VERSION) AND FULL C89" section in the README at that URL.

It is more practical to transpile C to SubC than to convert C to extremely minimalist subsets like C--, because SubC retains more of C's expressive features.
