Python Embedded C Compiler

This project is in the planning stage.  Because of the richness of the Python language and the tools (modules, libraries) available for it, it appears feasible to develop a set of tools for embedded programming in Python.

The goal of this project is to create a complete C compiler toolchain in Python.  Like most modern compilers it will have a frontend and a backend.  The toolchain will be easily retargetable to different embedded processors and microcontrollers.  The most important features of this project will be the intermediate format (produced after the C code is parsed; TBD, XML?) and the ISA description (XML?).  These two descriptions are what will make targeting a new embedded processor simple.  Numerous tools already exist in Python that make this project possible.  In many ways this project is more about the overall design and pulling the different Python modules together than about reimplementing the individual parts.

  1. The number one goal is KISS: keep the design and implementation as clean as possible, sacrificing performance and optimized output if needed.
  2. Easily retargetable.  Retargeting for a new ISA should take less than a day, with basic support for the most common C code available quickly.
  3. Educational tool and platform for experimentation.  Encourage experimentation with new ISAs by making it easy to create toolchains for them.
  4. Ease of use.  The toolchain can be used to quickly evaluate different processors, with a move to a commercial, highly optimized compiler later if needed.  It gives a free option to start with, but also a platform for commercial developers to develop optimization plug-ins, target plug-ins, etc.
  5. 100% Python.  All modules and code used are to be Python (other than plug-ins that may be created to increase performance; all plug-in APIs are to be clearly defined and usable from Python).  This provides a simple platform (one language) for contribution, and Python takes care of the portability.  As far as performance goes, I think there are enough people working to improve Python performance (ucpy) that the performance will come with time.  Python-wrapped C++ libraries (wxPython, etc.) are OK, but should be kept to a minimum.
Current items to be completed
  1. Compiler designer to review the following and make any suggestions on the 30000ft design.
  2. Decide on Intermediate format (AST, C--, ???).
  3. ISA XML description design.  A complete description that fits the C programming paradigm.  If the ISA XML completely describes the ISA and how it maps to C programming, everything can be automated.
Proposed Python Coding Standard for the Project.

The project should use generic Python as much as possible.  Regrettably (for some), any non-Python modules or projects that make sense to incorporate should be rewritten in Python rather than wrapped.  The reason for the rewrite is to keep to the number one goal: a set of tools that encourages participation and experimentation.  The Python language provides this.

Python Modules Used.

Front End Compiler

Python C Preprocessor

At one time there existed a Python preprocessor, PYM, "A Macro Preprocessor".  It used the C preprocessor language but made it generic for any language (HTML, etc.).  If no existing preprocessor implemented in Python is available, the plan is to create a standalone C preprocessor that can be used on C files and/or assembly files.
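
To give a feel for how small a standalone preprocessor core could be, here is a minimal sketch in plain Python.  It is not PYM and not a complete C preprocessor: it only handles object-like #define, #undef and #ifdef/#ifndef/#else/#endif, and it silently drops every other directive (including #include).  The function names preprocess and expand are made up for this illustration.

```python
import re

_IDENT = re.compile(r"[A-Za-z_]\w*")


def expand(line, macros, depth=0):
    """Replace identifiers that name a macro; rescan a bounded number of times."""
    expanded = _IDENT.sub(lambda m: macros.get(m.group(0), m.group(0)), line)
    if expanded != line and depth < 16:
        return expand(expanded, macros, depth + 1)
    return expanded


def preprocess(source, defines=None):
    """Expand object-like macros and evaluate #ifdef/#ifndef regions."""
    macros = dict(defines or {})
    output = []
    active = [True]  # stack: is the current conditional region emitted?

    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            parts = stripped[1:].split(None, 2)
            directive = parts[0] if parts else ""
            if directive in ("ifdef", "ifndef"):
                defined = parts[1] in macros
                wanted = defined if directive == "ifdef" else not defined
                active.append(active[-1] and wanted)
            elif directive == "else":
                current = active.pop()
                active.append(active[-1] and not current)
            elif directive == "endif":
                active.pop()
            elif active[-1] and directive == "define":
                macros[parts[1]] = parts[2] if len(parts) > 2 else ""
            elif active[-1] and directive == "undef":
                macros.pop(parts[1], None)
            continue  # directives (known or not) never reach the output
        if active[-1]:
            output.append(expand(line, macros))
    return "\n".join(output)


SOURCE = """#define SIZE 8
#ifdef DEBUG
int debug = 1;
#else
int buf[SIZE];
#endif"""

release_build = preprocess(SOURCE)
debug_build = preprocess(SOURCE, defines={"DEBUG": "1"})
```

Because the sketch works line by line on plain text, the same function could in principle be run over assembly files as well as C files.  It does not protect string literals or comments from macro expansion, which a real implementation would need to do.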

Python C Parser

The goal is to use an (E)BNF description of C (C99).  The pyparsing Python module will be used to parse the code.  A tool will read in the BNF description and generate the pyparsing code that parses the input C code.  The following is an example of C language BNF and the produced pyparsing code from the BNF description.

Intermediate Format (AST Described in XML)

The intermediate format is a verbose optimized representation of the original C code.  Everything up to this point, parsing the C code and generating the intermediate representation, is the frontend of the compiler.  This representation can then be run through the optimizer, which generates another AST representation.  The project will also try to leverage any existing intermediate descriptions, while maintaining the goal of the complete compiler chain being in Python!

Like many other portions of this project, this one will try to leverage previous work.  The only difference is that this portion of the design will be borrowed but rewritten in Python.  Again, the reason for writing the complete compiler in Python is to make the tool easily portable and to have a common programming language for a clean implementation.

The intermediate format has to be flexible enough to represent all the C properties and map easily to assembly and hardware.  The hardware mapping is a secondary goal but one worth pursuing: Verilog and/or MyHDL can be produced from the intermediate format.  The intermediate format should also represent objects, mainly the OOP provided by ObjC.
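
Since the format is still undecided, the following is only a sketch of what "an AST described in XML" could look like, using the standard xml.etree.ElementTree module.  The element and attribute names (return, binop, var, const, op, name, type, value) and the (kind, attributes, children) tuple layout are assumptions made for this example, not a proposed schema.

```python
import xml.etree.ElementTree as ET


def ast_to_xml(node):
    """Convert a (kind, attributes, children) tuple into an XML element."""
    kind, attrs, children = node
    elem = ET.Element(kind, attrs)
    for child in children:
        elem.append(ast_to_xml(child))
    return elem


# Hand-built tree for the C statement:  return a + 1;
tree = ("return", {}, [
    ("binop", {"op": "+"}, [
        ("var", {"name": "a"}, []),
        ("const", {"type": "int", "value": "1"}, []),
    ]),
])

xml_text = ET.tostring(ast_to_xml(tree), encoding="unicode")

# Any later stage (optimizer, backend) can read the same text back in.
root = ET.fromstring(xml_text)
operator = root.find("binop").get("op")
```

Keeping the intermediate form as text between stages would make each stage a separate, replaceable program, which fits the experimentation goal, at the cost of parsing time.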

Atul's Mini-C Compiler

This is a C compiler written in Python for a compiler course.  It used a different Python parsing module for the frontend, but it also created an abstract syntax tree (AST) as its intermediate format.

- Abstract Syntax Tree, compiler design using AST
1. Convert the program into an AST
2. Perform type-checking and semantic analysis on the tree
3. Rearrange the tree to perform optimizations
4. Convert the tree into the target code.
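
The steps above can be sketched on a single expression.  This is an illustration only and is not taken from Atul's compiler: the node classes (Num, Var, BinOp), the single semantic check, and the stack-machine mnemonics (PUSH, LOAD, ADD, MUL) are all hypothetical, and step 1 is replaced by a hand-built tree.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: object
    right: object


def check(node, symbols):
    """Step 2: a minimal semantic check, every variable must be declared."""
    if isinstance(node, Var):
        if node.name not in symbols:
            raise NameError(f"undeclared variable: {node.name}")
    elif isinstance(node, BinOp):
        check(node.left, symbols)
        check(node.right, symbols)


OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def fold(node):
    """Step 3: rearrange the tree by folding constant subexpressions."""
    if isinstance(node, BinOp):
        left, right = fold(node.left), fold(node.right)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(OPS[node.op](left.value, right.value))
        return BinOp(node.op, left, right)
    return node


MNEMONIC = {"+": "ADD", "*": "MUL"}


def emit(node):
    """Step 4: post-order walk emitting code for a made-up stack machine."""
    if isinstance(node, Num):
        return [f"PUSH {node.value}"]
    if isinstance(node, Var):
        return [f"LOAD {node.name}"]
    return emit(node.left) + emit(node.right) + [MNEMONIC[node.op]]


# Step 1 (parsing) is skipped: this is the tree for  x + 2 * 3
tree = BinOp("+", Var("x"), BinOp("*", Num(2), Num(3)))
check(tree, symbols={"x"})
code = emit(fold(tree))
```

Each step is a separate function over the same tree, so an optimization experiment only has to add another tree-to-tree function between check and emit.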

- Abstract Semantic Graph
- Tree Compiler Compiler - A discussion of how an aspect-oriented programming approach using treecc can be used to build a compiler.


C-- is another format; it has a virtual machine, etc.  This may be a good source for design ideas, but I am not sure it is 100% applicable.  They want compiler frontends to produce C-- code??
This area is definitely not my expertise.  I will gladly take any suggestions on any of the frontend or backend work.

Python Optimizer, Pre and Post Intermediate format

A compiler wouldn't be a compiler without optimizing the code.  But as already stated, this is a secondary goal.  The aim is to provide the hooks and environment for code optimization experimentation.

Back End Compiler

This is where most of my work will be.  Hopefully some interest will develop in the frontend so that a complete compiler toolchain can be built.

XML Instruction Set Architecture Description

An extensible, portable definition / description of a processor design.  This will lead to the automatic creation of instruction set simulators, assemblers, etc.
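
The XML schema has not been designed yet, so the snippet below is only a guess at its general shape, loaded with the standard xml.etree.ElementTree module.  Every element and attribute name (isa, register, instruction, mnemonic, opcode, operands, word_bits) and the toy "demo8" processor are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical description of a toy 8-bit processor.
ISA_XML = """
<isa name="demo8" word_bits="8">
  <register name="r0" code="0"/>
  <register name="r1" code="1"/>
  <instruction mnemonic="mov" opcode="0x10" operands="reg,reg"/>
  <instruction mnemonic="ldi" opcode="0x20" operands="reg,imm"/>
  <instruction mnemonic="nop" opcode="0x00" operands=""/>
</isa>
"""


def load_isa(xml_text):
    """Turn the XML description into plain dictionaries for the other tools."""
    root = ET.fromstring(xml_text)
    registers = {r.get("name"): int(r.get("code")) for r in root.iter("register")}
    instructions = {}
    for ins in root.iter("instruction"):
        operands = [o for o in ins.get("operands").split(",") if o]
        instructions[ins.get("mnemonic")] = {
            "opcode": int(ins.get("opcode"), 16),
            "operands": operands,
        }
    return {
        "name": root.get("name"),
        "word_bits": int(root.get("word_bits")),
        "registers": registers,
        "instructions": instructions,
    }


isa = load_isa(ISA_XML)
```

An assembler, a simulator and a code generator could all be driven from the dictionary that load_isa returns, which is the kind of automation the description is meant to enable.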

Python XIF to ASM, Optimizer

Generate annotated assembly code.

Python Assembler 

Python Regular Expression Assembler.  The assembler will automatically be generated based on the ISA XML Description. (xisad)
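
As a rough sketch of the idea, the code below builds one regular expression per instruction from an ISA table and uses those expressions to assemble source lines.  The ISA dictionary stands in for a parsed XML description; its layout, the toy instructions, the ";" comment syntax and the one-byte-per-field encoding are all assumptions made for this example.

```python
import re

# Stand-in for a parsed ISA description (hypothetical toy processor).
ISA = {
    "registers": {"r0": 0, "r1": 1, "r2": 2, "r3": 3},
    "instructions": {
        "nop": {"opcode": 0x00, "operands": []},
        "mov": {"opcode": 0x10, "operands": ["reg", "reg"]},
        "ldi": {"opcode": 0x20, "operands": ["reg", "imm"]},
    },
}


def build_patterns(isa):
    """Generate one compiled regex per mnemonic from the ISA table."""
    reg = "(" + "|".join(map(re.escape, isa["registers"])) + ")"
    imm = r"(0x[0-9a-fA-F]+|\d+)"
    kinds = {"reg": reg, "imm": imm}
    patterns = {}
    for mnemonic, spec in isa["instructions"].items():
        body = r"\s*,\s*".join(kinds[k] for k in spec["operands"])
        sep = r"\s+" if body else ""
        patterns[mnemonic] = re.compile(
            r"^\s*" + re.escape(mnemonic) + sep + body + r"\s*(?:;.*)?$"
        )
    return patterns


def parse_int(text):
    """Accept decimal or 0x-prefixed hexadecimal immediates."""
    return int(text, 16) if text.lower().startswith("0x") else int(text)


def assemble_line(line, isa, patterns):
    """Return the encoded bytes for one source line (one byte per field)."""
    for mnemonic, pattern in patterns.items():
        match = pattern.match(line)
        if not match:
            continue
        spec = isa["instructions"][mnemonic]
        out = [spec["opcode"]]
        for kind, text in zip(spec["operands"], match.groups()):
            if kind == "reg":
                out.append(isa["registers"][text])
            else:
                out.append(parse_int(text) & 0xFF)
        return out
    raise SyntaxError(f"cannot assemble: {line!r}")


patterns = build_patterns(ISA)
program = ["ldi r1, 0x2A ; load constant", "mov r0, r1", "nop"]
machine_code = [b for line in program for b in assemble_line(line, ISA, patterns)]
```

Labels, directives and real bit-field packing are left out; the point is only that nothing in assemble_line is specific to one processor.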

Python Blend and Pyastra

Along with writing C and assembly code, other tools are wanted that assist in writing highly optimized code.  These two tools will assist in writing assembly code.  I also imagine that they will be completed before the C frontend and can then be used to generate code.

Python Blend

This is a tool based on the Circuit Cellar Java Blend tool.  It provides very minimal C flow control statements blended with assembly statements.  This helps in developing all the startup and control code easily.

Py2Asm (PyAstra)

This is a branch of the PyAstra project.  It is intended to take a small subset of Python code and create assembly code.

Python Linker and Python Binary Formatter

Pulls everything together.

Creates hex, srecord, elf, etc files.
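
Of the listed formats, Intel HEX is simple enough to sketch in a few lines.  Each record is a colon followed by the byte count, a 16-bit address, a record type, the data and a checksum (the two's complement of the sum of all preceding bytes in the record).  The function names ihex_record and to_intel_hex are made up for this example, and the sketch only emits data (type 00) and end-of-file (type 01) records, so it is limited to 16-bit addresses.

```python
def ihex_record(address, record_type, data=b""):
    """Format one Intel HEX record, including its checksum byte."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF, record_type])
    body += bytes(data)
    checksum = (-sum(body)) & 0xFF  # two's complement of the byte sum
    return ":" + (body + bytes([checksum])).hex().upper()


def to_intel_hex(image, base=0, row=16):
    """Split a binary image into data records and append the EOF record."""
    lines = [
        ihex_record(base + offset, 0x00, image[offset:offset + row])
        for offset in range(0, len(image), row)
    ]
    lines.append(ihex_record(0, 0x01))
    return "\n".join(lines)


hex_text = to_intel_hex(bytes([0x20, 0x01, 0x2A]))
```

S-record output would follow the same pattern with a different record layout and checksum rule; ELF is a structured binary format and needs considerably more work.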

Why Python is a Good Language to Develop a Compiler

Why Python is a good language for a toolchain.
Is it?

Why Python is not a good language for a toolchain
Is it not?

Resources and Related Projects

Currently this project will focus on the C compiler toolchain (preprocessor, C compiler, assembler, assembly ext and linker), but if there is enough interest and support it will be extended to a larger toolchain with a simulator, debugger and profilers.

Commercial products with similar or the same goals, which automatically generate ISSs (instruction set simulators), assemblers, linkers and debuggers:
CoWare, LISA

These commercial tools are extremely expensive.  Hopefully this project can generate some cheap competition.