Is the Control Flow Graph (CFG) a specific input to a function in CPython?

  1. Is the CFG an abstract concept of a stage of completion of a code object?

  2. If so, which parts of the data structure hold the relevant information, and where is the code that completes the transition from CFG to final bytecode?

To try to answer this question, I've spent many hours researching, modifying and rebuilding the CPython source. I've also used the pycfg package from PyPI to extract the CFG for various targets (as shown in a GeeksforGeeks article titled "Draw Control Flow Graph using pycfg | Python"), and searched forums and Google for other related information.

Based on an amateur study of the code, and the discussion in a RealPython article titled "Your Guide to the CPython Source", it seems like there is no specific CFG object that is created and then used as an input to some type of CFG-to-bytecode function. (I think I may have even read that somewhere in the Internet vastlands.) Is this assumption correct? If so, can the CFG be extracted from the code object even after the final bytecode is produced?
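To make the last question concrete, here is a minimal sketch of what recovering a CFG from a finished code object might look like, using only the standard dis module. It is not how CPython itself builds or stores a CFG; build_cfg, NO_FALLTHROUGH and sample are hypothetical names made up for illustration, the list of non-fall-through opcodes is an assumption that varies between Python versions, and exception-handler edges are ignored entirely.

```python
import dis

# Assumption: opcode names that never fall through to the next instruction.
# The exact set differs between CPython versions; unknown names are harmless.
NO_FALLTHROUGH = {
    "RETURN_VALUE", "RETURN_CONST", "RAISE_VARARGS", "RERAISE",
    "JUMP_FORWARD", "JUMP_ABSOLUTE", "JUMP_BACKWARD",
    "JUMP_BACKWARD_NO_INTERRUPT",
}
# Every opcode that carries a jump target, relative or absolute.
JUMP_OPCODES = set(dis.hasjrel) | set(dis.hasjabs)


def build_cfg(code):
    """Return {leader_offset: (instructions, successor_offsets)}."""
    instrs = list(dis.get_instructions(code))

    # A "leader" starts a basic block: the first instruction, every jump
    # target, and every instruction that follows a jump or a terminator.
    leaders = {instrs[0].offset}
    for i, ins in enumerate(instrs):
        is_jump = ins.opcode in JUMP_OPCODES
        if is_jump:
            leaders.add(ins.argval)  # for jump opcodes, argval is the target offset
        if (is_jump or ins.opname in NO_FALLTHROUGH) and i + 1 < len(instrs):
            leaders.add(instrs[i + 1].offset)

    # Split the linear instruction stream into blocks at each leader.
    blocks = {}
    current = None
    for ins in instrs:
        if ins.offset in leaders:
            current = ins.offset
            blocks[current] = ([], [])
        blocks[current][0].append(ins)

    # Add edges: the jump target (if any) and the fall-through block (if any).
    ordered = sorted(blocks)
    for idx, leader in enumerate(ordered):
        body, succs = blocks[leader]
        last = body[-1]
        if last.opcode in JUMP_OPCODES:
            succs.append(last.argval)
        if last.opname not in NO_FALLTHROUGH and idx + 1 < len(ordered):
            succs.append(ordered[idx + 1])
    return blocks


def sample(x):
    if x > 0:
        return 1
    return 2


cfg = build_cfg(sample.__code__)
for leader in sorted(cfg):
    body, succs = cfg[leader]
    print(leader, [ins.opname for ins in body], "->", succs)
```

Each printed line is one basic block: its starting offset, its opcode names, and the offsets of the blocks it can transfer control to. Because it works from the final bytecode, whatever structure the compiler held internally before assembly is not recovered, only an equivalent graph.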

If there really is no such object, it seems fair to say that the CFG is just an idea: a rough intermediate state between a code object that carries branch information and a partially translated bytecode in which some parts (especially the branch instructions and their dependencies) have not yet been fully resolved.

  1. Are there at least well-defined data structures and methods that represent this transition state?

  2. Please provide any details you might have on the specific structures and methods.

In addition to looking at various versions of compile.c and its change logs, as well as the articles mentioned above, I have also searched for a better understanding in the CPython docs (especially the DevGuide).

My current, very high-level guess is that this takes place in macros of the following form:

#define VISIT(C, TYPE, V) {\
    if (!compiler_visit_ ## TYPE((C), (V))) \
        return 0; \
}

In case anyone is curious why learning this is important to me, it is part of the high-level vetting of an idea: creating a VM for a custom processor architecture that needs a higher level of information (or abstraction) than is available in the current bytecode set. It will ultimately require something like a translator from a CFG to a (possibly custom) bytecode language, and it is currently difficult to visualize how such an animal might be implemented.

Thanks!


