One of the most recent advancements in natural language processing (NLP) is the emergence of large language models (LLMs) trained on vast text datasets. Several LLMs are available, such as Google's BERT and OpenAI's GPT-2 and GPT-3; GPT-3, for example, was trained on roughly 570 gigabytes of filtered text. With these models, it is possible to generate everything from simple essays to actual financial models.
Here are five AI-based code generators, built on large language models, that can generate high-quality code:
1. OpenAI Codex
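Codex is a GPT language model from OpenAI, fine-tuned on publicly available code from GitHub, that translates natural language into code. OpenAI made the model available through a private beta to developers and platform companies building tools and integrations.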
2. Tabnine
Although Tabnine is not an end-to-end code generator, it puts the auto-completion feature of the integrated development environment (IDE) on steroids. Developed in Rust by Jacob Jackson while he was a student at the University of Waterloo, Tabnine has evolved into a fully fledged, AI-based code completion tool.
Tabnine supports over 20 languages and 15 editors, including popular IDEs such as VS Code, IntelliJ, and Android Studio, and even Vim. It costs $432 per year for a team of three developers.
3. CodeT5
CodeT5, a model from Salesforce Research, can potentially bring three capabilities to software programming:
- Text-to-code generation: generate code from a natural language description
- Code autocompletion: complete an entire function given the name of the target function
- Code summarization: generate a natural language summary of a function
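To make the three tasks listed above concrete, here is a small hand-written Python sketch (not actual CodeT5 output). The docstring plays the role of the natural language description in text-to-code generation, the name `count_vowels` is the kind of target function name an autocompletion model would expand into a full body, and the closing comment shows the sort of one-line summary that code summarization aims to produce. The function itself is a hypothetical illustration.

```python
def count_vowels(text: str) -> int:
    """Return the number of vowels (a, e, i, o, u) in text, ignoring case."""
    # Text-to-code / autocompletion: this body is the part a model would be
    # expected to generate from the docstring or the function name alone.
    return sum(1 for ch in text.lower() if ch in "aeiou")


# Code summarization: a natural language summary of count_vowels might read
# "Counts the vowels in a string, case-insensitively."
print(count_vowels("Hello World"))  # → 3
```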
4. PolyCoder
PolyCoder is an open source alternative to OpenAI's Codex. Developed by researchers at Carnegie Mellon University, the model is based on OpenAI's GPT-2 architecture and was trained on a 249 GB codebase spanning 12 programming languages. According to its authors, PolyCoder writes C with greater accuracy than any other model they evaluated, including Codex.
While most code generators are not open source, PolyCoder is one of the first open source code generation models.
5. Cogram
Cogram, a Berlin-based startup backed by Y Combinator, is a code generation tool aimed at data scientists and Python programmers who work with SQL queries and Jupyter Notebooks. Data scientists can write queries in plain English, which the tool translates into complex SQL queries with joins and groupings. It supports SQLite, PostgreSQL, MySQL, and Amazon Redshift.
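To illustrate the kind of English-to-SQL translation described above, the sketch below pairs a plain-English request with a hand-written SQL query that uses one join and one grouping, then runs it against an in-memory SQLite database through Python's built-in `sqlite3` module. The schema, sample rows, and query are hypothetical and were written for this example; they are not output produced by Cogram.

```python
import sqlite3

# Hypothetical schema and sample rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount REAL
);
INSERT INTO customers VALUES (1, 'Ana', 'DE'), (2, 'Ben', 'US'), (3, 'Chen', 'DE');
INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 30.0), (3, 2, 200.0), (4, 3, 80.0);
""")

# English request: "Show the number of orders and the total order value
# per country, largest total first."
# Hand-written SQL of the kind a text-to-SQL tool targets: a join plus a GROUP BY.
query = """
SELECT c.country, COUNT(o.id) AS num_orders, SUM(o.amount) AS total
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
GROUP BY c.country
ORDER BY total DESC;
"""
rows = conn.execute(query).fetchall()
print(rows)  # → [('DE', 3, 230.0), ('US', 1, 200.0)]
```

The join attaches each order to its customer's country, and the grouping collapses those rows into one summary row per country, which is the pattern the request describes.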
Python and Julia developers can integrate Cogram with Jupyter Notebooks to auto-generate code. The tool can generate contextual code for a specific task based on code comments. Data scientists can even generate visualizations with mainstream Python libraries such as Matplotlib, Plotly, or Seaborn.