The Archive Base

Editorial Team
Asked: May 11, 2026

I’d like to build a C pre-processor / compiler that allows functions to be


I’d like to build a C pre-processor/compiler that allows functions to be collected from local and online sources, for example:

#fetch MP3FileBuilder http://scripts.com/MP3Builder.gz
#fetch IpodDeviceReader http://apple.com/modules/MP3Builder.gz

void mymodule_main() {
  MP3FileBuilder(&some_data);
}

That’s the easy part.

The hard part is that I need a reliable way to “sandbox” the imported code, preventing direct or unrestricted access to disk or system resources (including memory allocation and the stack). I want a way to safely run small snippets of untrusted C code (modules) without the overhead of putting them in a separate process, VM or interpreter (a separate thread would be acceptable, though).

REQUIREMENTS

  • I need to put quotas on the module’s access to data and resources, including CPU time.
  • I will block direct access to the standard libraries.
  • I want to stop malicious code that creates endless recursion.
  • I want to limit static and dynamic allocation to specific limits.
  • I want to catch all exceptions the module may raise (like divide by 0).
  • Modules may only interact with other modules via core interfaces.
  • Modules may only interact with the system (I/O, etc.) via core interfaces.
  • Modules must allow bit ops, maths, arrays, enums, loops and branching.
  • Modules cannot use ASM.
  • I want to limit pointer and array access to memory reserved for the module (via a custom safe_malloc()).
  • Modules must support ANSI C or a subset (see below).
  • The system must be lightweight and cross-platform (including embedded systems).
  • The system must be GPL or LGPL compatible.
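The allocation quota and the “memory reserved for the module” requirements can both be met by giving each module one fixed block and handing out memory only from it. The sketch below assumes a simple bump allocator; the struct core_arena, arena_init and arena_contains are hypothetical, and the safe_malloc signature shown is an assumption (the question only names the function). It also assumes the backing block is suitably aligned and that pointers compare as plain addresses (true on flat-memory targets, but implementation-defined in C).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-module arena: every allocation the module makes is
 * carved out of one fixed block, so the quota holds by construction. */
typedef struct {
    unsigned char *base; /* start of the memory reserved for the module */
    size_t size;         /* quota in bytes */
    size_t used;         /* bytes handed out so far */
} core_arena;

void arena_init(core_arena *a, unsigned char *mem, size_t size) {
    assert(a != NULL && mem != NULL);
    a->base = mem;
    a->size = size;
    a->used = 0;
}

/* Assumed signature for the question's safe_malloc(): a bump allocator
 * that returns NULL instead of exceeding the quota, so the core can ask
 * the user "continue or abort?". */
void *safe_malloc(core_arena *a, size_t n) {
    const size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align; /* round offset up */
    if (start > a->size || n > a->size - start)
        return NULL;                                       /* quota exceeded */
    a->used = start + n;
    return a->base + start;
}

/* Hypothetical range check: is [p, p + n) inside the module's arena? */
int arena_contains(const core_arena *a, const void *p, size_t n) {
    uintptr_t lo = (uintptr_t)a->base;
    uintptr_t q = (uintptr_t)p;
    if (q < lo)
        return 0;
    return q - lo <= a->size && n <= a->size - (q - lo);
}
```

Because the arena never grows, an over-quota request fails cleanly instead of touching host memory, and discarding a misbehaving module is just a matter of discarding its block.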

I’m happy to settle for a subset of C. I don’t need things like templates or classes. I’m primarily interested in the things high-level languages don’t do well, like fast maths, bit operations, and the searching and processing of binary data.

It is not the intention that existing C code can be reused without modification to create a module. The intention is that modules would be required to conform to a set of rules and limitations designed to restrict the module to basic logic and transformation operations (for example, video transcoding or compression).

The theoretical input to such a compiler/pre-processor would be a single ANSI C file (or safe subset) with a module_main function, NO includes or pre-processor directives, and no ASM. It would allow loops, branching, function calls, pointer maths (restricted to a range allocated to the module), bit-shifting, bitfields, casts, enums, arrays, ints, floats, strings and maths. Anything else is optional.
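To make that input format concrete, here is a minimal sketch of a module that stays inside the described subset: no includes, no directives, no ASM, only loops, bit operations, arrays and an enum. The module_main signature (a buffer owned by the core plus its length) and the helper count_bits are hypothetical; the question does not define how the core passes data in.

```c
/* Hypothetical module in the restricted subset: no #include, no
 * pre-processor directives, no ASM. The core owns the input buffer. */

enum { BITS_PER_BYTE = 8 };

/* Count the set bits in one byte using shifts and masks. */
static unsigned count_bits(unsigned char b) {
    unsigned n = 0;
    int i;
    for (i = 0; i < BITS_PER_BYTE; i++) {
        n += (b >> i) & 1u;
    }
    return n;
}

/* Assumed entry point the core calls: population count of the buffer. */
unsigned long module_main(const unsigned char *in, unsigned long len) {
    unsigned long total = 0;
    unsigned long i;
    for (i = 0; i < len; i++) {
        total += count_bits(in[i]);
    }
    return total;
}
```

Everything the module touches arrives through its parameters, which is what lets a pre-processor rewrite each in[i] into a checked access later.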

EXAMPLE IMPLEMENTATION

Here’s a pseudo-code snippet to explain this better. Here a module exceeds its memory allocation quota and also creates an infinite loop.

buffer* transcodeToAVI_main(buffer *in_buffer) {
    int out[1000000000]; // allocation exceeding quota
    while (1) {}         // infinite loop
    return out;
}

Here’s a transformed version where our pre-processor has added watchpoints to check memory usage, recursion and runaway loops, and has wrapped the whole thing in an exception handler.

buffer* transcodeToAVI_main(buffer *in_buffer) {
    buffer *out = NULL;
    try {
        core_funcStart(__FILE__, __func__); // tell core we're executing this function
        out = core_newArray(1000000000, __FILE__, __func__); // memory allocation from quota
        while (1) {
            if (core_checkLoop(__FILE__, __func__, __LINE__)) break; // break out when the loop limit is hit
        }
        core_moduleEnd(__FILE__, __func__);
    } catch {
        core_exceptionHandler(__FILE__, __func__);
    }
    return out;
}

I realise that performing these checks impacts module performance, but I suspect it will still outperform high-level or VM languages for the tasks it is intended to solve. I’m not trying to stop modules doing dangerous things outright; I’m just trying to force those dangerous things to happen in a controlled way (like via user feedback), i.e.: “Module X has exceeded its memory allocation, continue or abort?”.

UPDATE

The best I’ve got so far is to use a custom compiler (like a hacked TCC) with bounds checking and some custom function and looping code to catch recursion. I’d still like to hear thoughts on what else I need to check for, or what solutions are out there. I imagine that removing ASM and checking pointers before use solves a lot of the concerns expressed in previous answers below. I added a bounty to pry some more feedback out of the SO community.

For the bounty I’m looking for:

  • Details of potential exploits against the theoretical system defined above
  • Possible optimisations over checking pointers on each access
  • Experimental open-source implementations of the concepts (like Google Native Client)
  • Solutions that support a wide range of OS and devices (no OS/hardware based solutions)
  • Solutions that support the most C operations, or even C++ (if that’s possible)

Extra credit for a method that can work with GCC (i.e., a pre-processor or small GCC patch).

I’ll also give consideration to anyone who can conclusively prove that what I’m attempting cannot be done at all. You will need to be pretty convincing, though, because none of the objections so far have really nailed the technical aspects of why they think it’s impossible. In defence of those who said no, this question was originally posed as a way to safely run C++. I have now scaled back the requirement to a limited subset of C.

My understanding of C could be classed as “intermediate”; my understanding of PC hardware is maybe a step below “advanced”. Try to pitch your answers at that level if you can. Since I’m no C expert, I’ll be going largely on the votes given to an answer as well as how closely the answer comes to my requirements. You can assist by providing sufficient evidence for your claims (respondents) and by voting (everyone else). I’ll assign an answer once the bounty countdown reaches 6 hours.

Finally, I believe solving this problem would be a major step towards maintaining C’s relevance in an increasingly networked and paranoid world. As other languages close the gap performance-wise and computing power grows, it will be harder and harder to justify the added risk of C development (as it is now with ASM). I believe your answers will have much greater relevance than scoring a few SO points, so please contribute what you can, even if the bounty has expired.


1 Answer

    Editorial Team
    Added an answer on May 11, 2026 at 9:30 pm

    Since the C standard is much too broad to be allowed, you would need to go the other way around: specify the minimum subset of C you need, and try to implement that. Even ANSI C is already too complicated and allows unwanted behaviour.

    The most problematic aspect of C is pointers: the C language requires pointer arithmetic, and it is not checked. For example:

    char a[100];
    printf("%p %p\n", (void *)&a[10], (void *)&10[a]);
    

    will print the same address twice, since a[10], 10[a], *(10 + a) and *(a + 10) all denote the same element.
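A self-contained version of that example, wrapped in a function so it can be run, confirms that all four spellings yield the same address and that nothing compares the index against the array length. The function name same_element is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* a[i] is defined as *(a + i), and addition commutes, so a[10], 10[a],
 * *(10 + a) and *(a + 10) all name the same element. Nothing here
 * checks the index 10 against the array length 100. */
int same_element(void) {
    char a[100];
    char *p1 = &a[10];
    char *p2 = &10[a];
    char *p3 = &*(10 + a);
    char *p4 = &*(a + 10);
    printf("%p %p\n", (void *)p1, (void *)p2); /* same address twice */
    return p1 == p2 && p2 == p3 && p3 == p4;
}
```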

    Not all of these pointer accesses can be checked at compile time. Doing so would be as hard as asking the compiler to find ‘all bugs in a program’, which would require solving the halting problem.

    Since you want this function to run in the same process (potentially in a different thread), you share memory between your application and the ‘safe’ module; that is the whole point of using a thread: sharing data for faster access. However, this also means that both threads can read and write the same memory.

    And since you cannot prove at compile time where pointers end up, you have to check at runtime. This means that code like ‘a[10]’ has to be translated to something like ‘get_byte(a + 10)’, at which point I wouldn’t call it C anymore.
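A minimal sketch of what such a translation target might look like, assuming module pointers are represented as offsets into one region the core owns: get_byte is the answer’s name, but its signature here, the module_mem struct and the set_byte counterpart are hypothetical. Under this assumption, `a[10]` would become `get_byte(&m, a_off + 10)` and `a[10] = v` would become `set_byte(&m, a_off + 10, v)`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of the memory reserved for one module; module
 * "pointers" are assumed to be offsets into this region. */
typedef struct {
    unsigned char *base;
    size_t size;
} module_mem;

/* Checked load: what `a[i]` might be rewritten to. Returns -1 on an
 * out-of-range access instead of touching memory outside the module. */
int get_byte(const module_mem *m, size_t off) {
    assert(m != NULL);
    if (off >= m->size)
        return -1; /* a real core would raise a module fault here */
    return m->base[off];
}

/* Checked store: what `a[i] = v` might be rewritten to. */
int set_byte(module_mem *m, size_t off, unsigned char v) {
    assert(m != NULL);
    if (off >= m->size)
        return -1;
    m->base[off] = v;
    return 0;
}
```

Every access now costs a compare and a branch, which is exactly the overhead the answer is pointing at.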

    Google Native Client

    So if that’s true, how does Google do it? Well, in contrast to the requirements here (cross-platform, including embedded systems), Google concentrates on x86, which in addition to paging with page protection also has segment registers. This allows it to create a sandbox where another thread does not share the same memory in the same way: segmentation limits the sandbox to changing only its own memory range. Furthermore:

    • a list of safe x86 assembly constructs is defined
    • gcc is changed to emit only those safe constructs
    • the list is constructed so that compliance is verifiable
    • after a module is loaded, this verification is performed
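The “whitelist plus load-time verification” idea can be illustrated on a toy scale. Native Client validates real x86 machine code; the sketch below instead uses a made-up instruction set (the opcode enum, insn struct and verify_module are all hypothetical) and only checks that every opcode is on the whitelist, every jump stays inside the module, and execution cannot run off the end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction set, for illustration only. */
typedef enum { OP_ADD, OP_XOR, OP_SHL, OP_LOAD, OP_STORE, OP_JMP, OP_HALT, OP_COUNT } opcode;

typedef struct {
    int op;   /* raw opcode as read from the loaded module */
    long arg; /* jump target for OP_JMP, otherwise an operand */
} insn;

/* Returns 1 if every instruction is whitelisted, every jump targets an
 * instruction inside the module, and the code ends in OP_HALT. */
int verify_module(const insn *code, size_t n) {
    size_t i;
    assert(code != NULL || n == 0);
    if (n == 0 || code[n - 1].op != OP_HALT)
        return 0; /* must not fall off the end */
    for (i = 0; i < n; i++) {
        if (code[i].op < 0 || code[i].op >= OP_COUNT)
            return 0; /* not on the whitelist */
        if (code[i].op == OP_JMP &&
            (code[i].arg < 0 || (size_t)code[i].arg >= n))
            return 0; /* jump escapes the module */
    }
    return 1;
}
```

The check runs once per module after loading, so its cost is not paid on every instruction executed; it does not, however, prove termination, which is why runtime loop budgets are still needed.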

    So this is platform-specific and not a ‘simple’ solution, although it is a working one. Read more in their research paper.
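Segmentation is an x86 hardware feature. A portable software analogue, named plainly here as a different technique from what the answer describes, is address masking as used in software fault isolation (SFI): instead of checking each module address, force it into the module’s region with a bitwise AND. The sketch assumes a power-of-two region; SANDBOX_BITS, confine, sandbox_load and sandbox_store are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sandbox size: must be a power of two so that masking an
 * offset with SANDBOX_MASK always lands inside the region. */
#define SANDBOX_BITS 12u
#define SANDBOX_SIZE ((size_t)1 << SANDBOX_BITS)
#define SANDBOX_MASK (SANDBOX_SIZE - 1)

static_assert((SANDBOX_SIZE & SANDBOX_MASK) == 0, "size must be a power of two");

static unsigned char sandbox[SANDBOX_SIZE];

/* No branch and no error path: any offset the module computes, however
 * wild, is forced into [0, SANDBOX_SIZE). A bad access can corrupt only
 * the module's own memory, never the host's. */
static unsigned char *confine(size_t off) {
    return &sandbox[off & SANDBOX_MASK];
}

unsigned char sandbox_load(size_t off) { return *confine(off); }
void sandbox_store(size_t off, unsigned char v) { *confine(off) = v; }
```

The trade-off is that out-of-range accesses wrap silently instead of being reported, so this contains faults rather than detecting them.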

    Conclusion

    So whichever route you take, you need to start with something new that is verifiable, and only then can you start adapting an existing compiler or writing a new one. However, trying to mimic ANSI C requires one to think about the pointer problem. Google modelled their sandbox not on ANSI C but on a subset of x86, which allowed them to use existing compilers to a great extent, with the disadvantage of being tied to x86.

© 2021 The Archive Base. All Rights Reserved