Simple auto-complete vs ML-based auto-complete
This post is a response to the interaction below on Twitter.
Context
I have used auto-complete support in IDEs like Eclipse, IntelliJ, and Visual Studio for languages such as Java, Kotlin, Python, and F#. I consider these simple instances of auto-complete, based on basic program analysis (i.e., a mix of type/class analysis, lots of intra-procedural analysis, and some inter-procedural analysis) and heuristics (i.e., ranking candidates based on contextual type information and past user-local usage data).
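To make the heuristic side concrete, here is a toy sketch of ranking candidates by contextual type and recent local usage. This is an illustrative simplification under my own assumptions, not any IDE's actual algorithm; all names in it are hypothetical.

```python
# Toy sketch (not any IDE's actual ranking): score completion candidates
# by whether their type matches the expected type at the cursor, then by
# how recently the user typed them.
def rank_candidates(candidates, expected_type, recent_uses):
    """candidates: list of (name, type) pairs visible in scope.
    recent_uses: names the user typed recently, most recent first."""
    def score(candidate):
        name, ctype = candidate
        type_match = 1 if ctype == expected_type else 0
        # Higher recency score for names used more recently.
        recency = (len(recent_uses) - recent_uses.index(name)
                   if name in recent_uses else 0)
        return (type_match, recency)
    return sorted(candidates, key=score, reverse=True)

in_scope = [("count", "int"), ("label", "str"), ("total", "int")]
# "total" ranks first: it matches the expected type and was used recently.
print(rank_candidates(in_scope, "int", ["total"]))
```

Real IDEs use far richer signals, but the shape of the heuristic (type fit first, then usage history) is the same idea.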
Recently, I have had the opportunity to use ML-based auto-completion support that is, I believe, based on Large Language Models (LLMs) adapted to code. This is akin to the auto-complete support offered by GitHub Copilot.
My issues
When I am thinking about the next token to write, simple auto-complete suggests a list of candidate identifiers tailored to the context, e.g., variable/field names, method names.
While picking identifiers is easy, selection becomes harder for involved expressions like method-call snippets, as I may have to consult other sources of information (e.g., API docs) to pick the ideal candidate method-call template. Even in such cases, iterating through the candidate list usually suffices because the IDE displays method signatures and the associated API documentation.
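As a concrete illustration of the method-call case, here is a hypothetical sketch of how an IDE might expand an accepted signature into a call template with placeholder "holes" (the `${n:param}` placeholder syntax is an assumption borrowed from common editor snippet formats; the `substring` overloads are from `java.lang.String`):

```python
# Hypothetical sketch: turn a method signature, as the IDE knows it,
# into a call template with numbered placeholder "holes".
def call_template(name, params):
    holes = ", ".join(f"${{{i}:{p}}}" for i, p in enumerate(params, 1))
    return f"{name}({holes})"

# A candidate list as the IDE might display it: signature plus doc summary.
candidates = [
    ("substring", ["beginIndex"],
     "Returns a substring starting at beginIndex."),
    ("substring", ["beginIndex", "endIndex"],
     "Returns the substring in [beginIndex, endIndex)."),
]
for name, params, doc in candidates:
    print(call_template(name, params), "--", doc)
```

Because the signature and doc summary travel with each candidate, cycling through the list is usually enough to disambiguate overloads without leaving the editor.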
In both of the above instances, the simple auto-complete support complements my code-writing process by helping me identify the next token to type.
In contrast, when I am thinking about the next token to write, ML-based auto-complete suggests candidate expressions/snippets (complete with identifiers filling any holes in snippets) that may fit the context.
If I were thinking about writing the next expression, such support would be ideal. However, as is, the presented list of expressions interrupts my writing process, forcing me to switch from selecting the next token to selecting an entire expression. In short, ML-based auto-complete requires me to switch between different levels of granularity/abstraction, and this disrupts my flow.
Furthermore, selecting an expression often involves reasoning about the appropriateness/correctness of sub-expressions used in candidate expressions, e.g., arguments in method calls. Switching between selecting/writing and reasoning is more disruptive, similar to trying to simultaneously write and edit a manuscript.
From writing code to writing specifications
I can get more mileage out of ML-based auto-complete support if I can retrain myself to think about the next expression to write. This change is doable. Come to think of it, I do it already :) Even so, there is another issue.
After thinking about an expression, I break the expression down into tokens because of the code's textual and sequential nature. Now, if I were able to express my thoughts about the next expression holistically (i.e., all parts of an expression at once), then I believe ML-based auto-complete could elevate writing code to writing specifications (albeit micro specifications) :D
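A speculative sketch of what such a "micro specification" might look like: describe the intended expression holistically as structured data, then render it to code. Every field name and the rendering scheme here are hypothetical inventions for illustration.

```python
# Speculative sketch: a holistic description of one intended expression
# (all parts at once), rendered to code. Field names are hypothetical.
micro_spec = {
    "operation": "filter",
    "source": "orders",
    "bind": "o",
    "predicate": "o.total > 100",
}

def render(spec):
    # Render a filter-style micro spec as a list comprehension.
    return (f"[{spec['bind']} for {spec['bind']} "
            f"in {spec['source']} if {spec['predicate']}]")

print(render(micro_spec))
# [o for o in orders if o.total > 100]
```

The point is not this particular encoding but the level of the conversation: I state the whole intent, and the tool owns the tokenization.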