Many aspects of software development are tedious and repetitive enough to become boring, yet still too complex for simple templating. When GitHub Copilot appeared last year, I suddenly saw an opportunity to rid myself of the “less fun” parts of development. As luck would have it, our MLE Lead David Louis Hollembaek had the same thought, so we decided to evaluate Copilot with some examples common to our work and share the experience with you.
Intro from David: A few years ago, I started experimenting with web scraping for a personal project. Great libraries for gathering web content exist for Python, and bs4 provides convenient tools for working with HTML content. However, it took me several hours of trial and error to get through the basics of extracting values from meta headers.
Fast-forward to a few months ago, when I was granted access to the Copilot preview: I immediately thought my struggle with meta headers would be a great first test for this new functionality. My v.1 notebook was a mess, so I decided it would be best to pretend that I was starting from zero. Import bs4, request a page, and wow: Copilot starts suggesting blocks to retrieve the URLs that I had in an existing list. bs4 for reading metadata? Almost as quickly as I typed the name of the value I was looking for, Copilot suggested the whole line that I had previously spent so many hours crafting through trial and error. This was starting to feel a little crazy.
Scraping web content is a common activity, though, and bs4 is a very popular library with many published projects using it. “Of course it is good at this”, I thought, but what about something more obscure, such as unofficial Google Search libraries with very few stars on PyPI? I added the import, and as I started a new line, Copilot suggested the code needed to search for new pages and start working with the result list.
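For readers who have never had to fight with meta headers, here is a rough sketch of the task. It is not the notebook from the story: to keep it runnable without extra packages, it uses Python's standard-library html.parser instead of bs4, and the sample HTML, the class MetaTagParser and the helper extract_meta are invented for this illustration. With bs4, the equivalent lookup is usually a one-liner along the lines of soup.find("meta", attrs={"name": "description"})["content"].

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta> tags into a dict keyed by their name/property attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Standard tags use name="...", Open Graph tags use property="..."
        key = attributes.get("name") or attributes.get("property")
        content = attributes.get("content")
        if key and content is not None:
            self.meta[key] = content


def extract_meta(html):
    """Return all name/property -> content pairs found in the given HTML."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


# Hypothetical page content; in practice this would come from an HTTP request.
SAMPLE_HTML = """
<html><head>
  <title>Example shop</title>
  <meta name="description" content="A small example page">
  <meta property="og:title" content="Example shop">
</head><body></body></html>
"""

meta = extract_meta(SAMPLE_HTML)
print(meta["description"])
print(meta["og:title"])
```

The point of the anecdote is that Copilot proposed the corresponding line almost as soon as the name of the wanted value was typed.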
Video 1: Web scraping for cigar shops nearby
In less than an hour, I had a working copy of my original application with the same functionality that had previously cost me many hours of precious free time. None of the tasks were complex, but I was still impressed, and the experience had me wondering how far this could go. Could Copilot take on specialised work like the machine learning tasks we see in our daily work projects? These are high-value activities performed by specialists. Could ML engineers and data scientists benefit from Copilot in a similar way? After a discussion in a team meeting, I found that Felix was asking the same questions, so we set out to see if Copilot could help engineers in our team with machine learning tasks of any kind.
Based on our experience, we will describe three examples that showcase Copilot’s capabilities and shortcomings in the following section.
EDA (Exploratory Data Analysis) and model prototyping
GitHub Copilot loves encapsulated statements. Imagine you want to analyse a dataset and rapidly develop a basic model for a use case to get some kind of baseline. EDA and model prototyping consist of many single, encapsulated statements, as the following video shows.
Video 2: Doing exploratory data analysis and model development with GitHub Copilot
What I did here was only a quick investigation. Nevertheless, I was impressed by how well Copilot translates comments into code and predicts my intentions. The script took me three minutes; without Copilot, I would have needed that long just to look up how to create a heatmap visualisation (shame on me for relying more and more on dashboard setups instead of Python viz libraries).
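To give a feel for what such single, self-contained statements look like, here is a minimal sketch in which each comment plays the role of a prompt and the code below it is the kind of completion one would hope for. It is not the script from the video: it sticks to the standard library instead of pandas and a plotting library, the rows are invented, the “model” is only a majority-class baseline, and names such as RAW_CSV and pearson are our own.

```python
import csv
import io
import statistics

# Invented rows in the spirit of a diabetes dataset (not real data).
RAW_CSV = """glucose,bmi,age,outcome
150,34.0,48,1
90,27.0,30,0
180,24.0,35,1
95,28.5,22,0
140,41.0,36,1
110,26.0,29,0
80,30.5,25,0
"""

# load the csv into a list of dicts with float values
rows = [
    {name: float(value) for name, value in row.items()}
    for row in csv.DictReader(io.StringIO(RAW_CSV))
]

# collect the values of each column
columns = {name: [row[name] for row in rows] for name in rows[0]}

# print mean and standard deviation per column (pandas: df.describe())
for name, values in columns.items():
    print(f"{name}: mean={statistics.mean(values):.1f}, std={statistics.stdev(values):.1f}")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# correlation of every feature with the outcome (pandas: df.corr()["outcome"])
correlations = {
    name: pearson(values, columns["outcome"])
    for name, values in columns.items()
    if name != "outcome"
}
for name, value in correlations.items():
    print(f"corr({name}, outcome) = {value:.2f}")

# baseline model: predict the majority class for every row
majority_class = statistics.mode(columns["outcome"])
baseline_accuracy = columns["outcome"].count(majority_class) / len(rows)
print(f"baseline accuracy: {baseline_accuracy:.2f}")
```

Each step is short, has an obvious input and output, and can be described in one comment, which is the situation in which Copilot's suggestions worked best for us.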
Another thing that caught my eye was the recommendation for data columns. The underlying model (OpenAI Codex, a descendant of GPT-3) obviously focused strongly on “columns” and the dataset name “pima-indians-diabetes”, and correctly retrieved the column names it remembered. That is fine as long as it is only column names, but what if Copilot “accidentally” recreates training data? The legal situation regarding copyright is still unresolved and must be considered when working with GitHub Copilot.
In theory, meta programming is the generation of code by code, based on data. Among other things, it helps to avoid large amounts of boilerplate code and, thanks to its dynamic nature, provides high flexibility.
In this example (Figure 1), I have a base class “Fruit”, and objects which inherit from the base class and may change in every simulation run.
Figure 1: CLASS_NAMES contains the objects that need to be created while Fruit is the base class (which was dynamic in the real use case).
GitHub Copilot recommends a straightforward class definition for every item in the list CLASS_NAMES. While this is easy to read (as can be seen in Figure 2), it would have created lots of unnecessary code.
Figure 2: GitHub Copilot’s recommendation for class creation
In order to avoid this, the classes can be created dynamically with the code snippet displayed in Figure 3. This generates a class inheriting from Fruit for every string in CLASS_NAMES.
Figure 3: Dynamic class generation
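For readers without access to the figures, the following is a minimal sketch of what dynamic class generation can look like, using Python's built-in type(name, bases, namespace). It is an assumption about the general shape of such a snippet, not a copy of Figure 3: the concrete names in CLASS_NAMES, the describe method and the FRUIT_CLASSES dictionary are made up for this example.

```python
class Fruit:
    """Base class; in the use case described above it could itself vary."""

    def describe(self):
        return f"{type(self).__name__} is a fruit"


# Hypothetical names; the real list would come from configuration or data.
CLASS_NAMES = ["Apple", "Banana", "Cherry"]

# The static alternative would be one hand-written definition per name:
#   class Apple(Fruit): pass
#   class Banana(Fruit): pass
#   ...

# type(name, bases, namespace) builds a new class at runtime.
# Each generated class inherits from Fruit and adds nothing of its own.
FRUIT_CLASSES = {name: type(name, (Fruit,), {}) for name in CLASS_NAMES}

apple = FRUIT_CLASSES["Apple"]()
print(apple.describe())
print(issubclass(FRUIT_CLASSES["Banana"], Fruit))
```

Keeping the generated classes in a dictionary means that a change to CLASS_NAMES between simulation runs needs no further code changes.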
Copilot behaves the same way in other cases, which is only natural given how it works. Since Copilot always recommends the next line(s), it will not magically separate logic into another class or function. The developer needs to know how to structure the code for a clean project. If I separate the logic beforehand, Copilot picks this up and uses the separated function instead. Decorators are a special case and seem to confuse Copilot in some situations. This might be because the execution order of a decorator and the function it decorates does not follow the sequence of lines.
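The sketch below shows why decorated code does not read strictly top to bottom: the decorator body runs when the function is defined, while the wrapper runs later, on every call. The decorator logged and the events list are hypothetical names chosen for this illustration, and whether this ordering is really what trips up Copilot remains our guess.

```python
import functools

events = []


def logged(func):
    # Runs once, at the moment the decorated function is defined.
    events.append(f"decorating {func.__name__}")

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Runs on every call, around the original function body.
        events.append(f"calling {func.__name__}")
        result = func(*args, **kwargs)
        events.append(f"finished {func.__name__}")
        return result

    return wrapper


@logged
def add(a, b):
    events.append("inside add")
    return a + b


events.append("module body continues")
total = add(2, 3)

for event in events:
    print(event)
```

The recorded order starts with the decoration step, long before the function body ever executes, so the textual order of the lines says little about when each piece of code runs.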
Implementing all CRUD endpoints in a REST interface often requires a lot of boilerplate code. Copilot saves a lot of time by analysing the endpoints that have already been written and deriving new ones from a short description.
Video 3: Demo REST API for an image labelling service
Attentive viewers will have noticed that at 0:52, Copilot suddenly uses the Path class from the pathlib package, while in the first endpoint it used a simple string. I imported Path while writing the second endpoint (since Copilot wrongly assumed that the file path was of type Path), and Copilot quickly picked this up.
Finally, there is another important aspect which shows that Copilot is not bound to best practices, but is only as proficient as the average GitHub user. Rather than giving each endpoint a different name, it would have been cleaner to use a single path such as “/image” for every endpoint and let the HTTP method distinguish the operations.
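To make the last point concrete, here is a framework-free sketch of CRUD handlers that all share the path “/image” and are told apart only by the HTTP method. It is not the API from the video: the route decorator, the dispatch function and the in-memory IMAGES store are hypothetical stand-ins for what a real web framework and a database would provide.

```python
ROUTES = {}
IMAGES = {}  # in-memory stand-in for real storage


def route(method, path):
    """Register a handler for an (HTTP method, path) pair."""
    def decorator(func):
        ROUTES[(method, path)] = func
        return func
    return decorator


@route("POST", "/image")
def create_image(image_id, label=None):
    IMAGES[image_id] = {"label": label}
    return 201, IMAGES[image_id]


@route("GET", "/image")
def read_image(image_id):
    if image_id not in IMAGES:
        return 404, None
    return 200, IMAGES[image_id]


@route("PUT", "/image")
def update_image(image_id, label):
    if image_id not in IMAGES:
        return 404, None
    IMAGES[image_id]["label"] = label
    return 200, IMAGES[image_id]


@route("DELETE", "/image")
def delete_image(image_id):
    if IMAGES.pop(image_id, None) is None:
        return 404, None
    return 204, None


def dispatch(method, path, **params):
    """Look up the handler by method and path; return (status, body)."""
    handler = ROUTES.get((method, path))
    if handler is None:
        # Known path with an unsupported method is 405, unknown path is 404.
        known_path = any(registered == path for _, registered in ROUTES)
        return (405 if known_path else 404), None
    return handler(**params)


print(dispatch("POST", "/image", image_id="cat.png", label="cat"))
print(dispatch("GET", "/image", image_id="cat.png"))
print(dispatch("DELETE", "/image", image_id="cat.png"))
print(dispatch("GET", "/image", image_id="cat.png"))
```

With this layout, adding an operation means registering another method on the same path instead of inventing a new endpoint name.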
GitHub Copilot has been in closed beta for several months now and will be continuously improved. It is by no means perfect and produces non-working code from time to time, but if you are aware of this and check the generated code, you should be able to increase your development efficiency.
While the legal aspect is still questionable, since Copilot was also partially trained on code that was not meant to be used by third parties, we are confident that this will also be clarified in the future. 
More recently, DeepMind has also entered the competition with its newest model, “AlphaCode”. Code completion products are therefore very likely to improve a lot in the near future.
While writing this article, we had the chance to try Copilot on some other applications which we thought might be noteworthy for the MLE crowd:
Arduino IoT Sketches
As previously mentioned, Copilot suggested simple structures quickly and accurately, but we learned to trust it a little too much, and several larger suggestions which looked good at first glance did not compile. This led to mind-numbing C++ debugging. As stated in our key learnings at the beginning of our article, Copilot is not a replacement for knowing the language you are coding in or understanding what the structure of your programme should be.
We observed Copilot make several correct suggestions for common TFX workflows in recent lab setups. Most guesses were correct with the biggest time-savers being suggestions for inputs to complex processes. What did I call that schema 20 minutes ago while setting up a tf Model Analysis example? Copilot remembered, and seemed to know exactly what the other inputs should be for functions with limited context like:
csv_to_tfrecord(exampleSchema, csv_file, tfrecord_file)