Codebeez

We are Codebeez. We specialise in everything Python and help our clients build custom scalable, robust and maintainable solutions.

I have recently returned from the EuroPython 2023 conference in Prague. I disembarked from my replacement flight (the original flight got cancelled) just in time for the first keynote. From that moment on it was 3 days of immersion into Python (although a few Rust talks managed to sneak in), the Python ecosystem, and fields in which Python is a major player.

I ended up going by myself, which offered ample opportunity to mingle with fellow attendees. By my estimate, EuroPython attendees are disproportionately consultants, freelancers, and otherwise independent developers. Their other uniting quality was of course that they were all very enthusiastic about Python and up to date with the latest developments, which made them very interesting to chat with.

In this article, I will first go over all of the keynote talks, and then the parallel talks that I attended.

Keynotes

The first keynote explained how to host meetups, which was useful because it's what I've been intending to do for Codebeez as well. Tips I specifically remember for hosting meetups are:

- You don't have to have speakers to host a meetup.
- Having regularity in scheduling is key, so that people don't have to ask if/when the next one is.
- Delegate by resignation: Trust that someone else will fill the void left by your absence.

That afternoon's keynote was on Large Language Models (LLMs, currently trending due to the ChatGPT hype) versus traditional NLP models. The speaker herself was an NLP advocate and core developer of spaCy. Here I learned that traditional NLP models and LLMs are entirely dissimilar (indeed, I did not know that), and that NLP models will still be crucial for encoding specialized knowledge. Most notably, the speaker recommended using an LLM as a prototype, which can then be replaced with a specifically trained NLP model later on.

The Thursday morning keynote was especially interesting: it discussed the history and future of microprocessors. My most notable takeaway was that the "nanometer processes", such as the 10nm announced by Intel and the 4nm announced by TSMC, are mostly just marketing. There are no features of the transistor that are actually 4nm in size. Instead, since the introduction of FinFETs these terms have been a stand-in for transistor density per unit of surface area. The transistors themselves are now stacked in three dimensions, which increases this density. This stacking leads to heat dissipation issues, however, and therefore the nanometer metric does not correlate well with the performance gain you might otherwise expect. The speaker also pleaded for all software engineers to start writing more parallel programs, but this plea has gone unheard for 15 years and counting.

The Thursday afternoon keynote speaker was not a programmer at all, but rather an author who has written a book on the developer community. What I mostly remember is that he was an incredible public speaker, expertly wielding the power of voice, tone and rhythm to captivate the audience with his words.

On Friday morning we were posed the following question: Is AI a product or a person? Overwhelmingly, the room voted product. To me this seems like the only obvious answer. Apparently I am wrong, and different audiences poll very differently on this question. According to the speaker, whether AI is a product or not matters for the regulation that is currently being drafted to ensure the responsible use of AI and to protect against misinformation.

Parallel Talks

By my own estimate, the talks I ended up attending can be roughly sorted into three categories:

General Python: Features of the language and how to leverage them
Tricks and tools to boost Python performance
AI: Progress made in machine learning and supporting frameworks

I will be going over some points I personally found interesting in each of these categories.

General Python

There were a few talks about the Python language, its features and use of the standard libraries. Below are the points that stayed with me:

There was an interesting discussion on subclassing versus composition. What struck me was the recommendation to not prematurely deduplicate (i.e. to not create a base class containing code common to two classes when you could just write the members of the class twice). Because I do this kind of deduplication almost by instinct, it was good to take a step back and reconsider whether it truly is always worth it. Now that I think about it, it may often be unnecessary effort.

I learned that other programming languages such as Go do not use subclassing at all. For composition, however, Go uses something called struct embedding, which essentially inlines the members of a different 'class' into your declared class. This seemed like a good method for solving a familiar problem.
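Python has no struct embedding, but a rough approximation of the idea can be sketched with composition plus attribute delegation. The class names `Engine` and `Car` below are made up for illustration, and the use of `__getattr__` is my own assumption about how one might mimic the behaviour. Go resolves embedded members at compile time, whereas this sketch forwards lookups at runtime.

```python
class Engine:
    """A small component we want to reuse without subclassing it."""

    def __init__(self, horsepower):
        self.horsepower = horsepower

    def start(self):
        return f"engine with {self.horsepower} hp started"


class Car:
    """Has an Engine (composition) rather than being an Engine (subclassing)."""

    def __init__(self, name, engine):
        self.name = name
        self._engine = engine

    def __getattr__(self, attr):
        # __getattr__ is only called when normal lookup fails,
        # so attributes defined on Car itself always take precedence.
        if attr == "_engine":
            # Avoid infinite recursion if _engine has not been set yet.
            raise AttributeError(attr)
        return getattr(self._engine, attr)


car = Car("beetle", Engine(40))
started = car.start()       # forwarded to Engine.start
hp = car.horsepower         # forwarded to Engine.horsepower
is_engine = isinstance(car, Engine)  # False: no inheritance involved
```

The members of `Engine` appear to be part of `Car`, yet `Car` is not a subtype of `Engine`. The price is that static type checkers and autocompletion cannot see the forwarded members.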

I also learned that functions can carry a __signature__ dunder attribute, which can be used to update the help and autocompletion of a function at runtime. This can be combined with Python descriptors in order to dynamically provide function documentation in a way that plays well with Python's dynamic nature. This is unfortunately not useful for static type hinting/checking. As a big fan of static type checking I am always looking for ways to have more of it, so I was somewhat disappointed that these tricks did not offer such a solution.
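As a minimal sketch of the __signature__ trick (not the speaker's code), the example below builds a function that accepts arbitrary keyword arguments but advertises explicit parameters to introspection tools. The helper name `make_greeter` and the parameter names are hypothetical. It relies only on the standard library `inspect` module, whose `inspect.signature` returns the object's `__signature__` attribute when one is set.

```python
import inspect


def make_greeter(param_names):
    """Build a function whose advertised signature is decided at runtime."""

    def greet(**kwargs):
        return ", ".join(f"{name}={kwargs[name]!r}" for name in param_names)

    # Tools such as help() and many autocompleters go through
    # inspect.signature(), which honours this attribute.
    greet.__signature__ = inspect.Signature(
        [
            inspect.Parameter(name, inspect.Parameter.KEYWORD_ONLY)
            for name in param_names
        ]
    )
    return greet


greet = make_greeter(["first_name", "city"])
shown_signature = str(inspect.signature(greet))
result = greet(first_name="Ada", city="Prague")
```

Note that the advertised signature is purely informational: Python does not enforce it when the function is called, and a static type checker still only sees `**kwargs`.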

Python Performance

Python did not become popular for its great performance, but there are a lot of ways to approach C levels of performance without losing Python's flexibility. Some of the ones that were discussed are below.

The maintainers of HPy are working hard on porting NumPy to HPy, their replacement for the Python C API. Users of PyPy and GraalPy will benefit from this new API, which can be used to write C extensions for Python libraries. A major performance boost comes from the fact that reference counting can be omitted in favor of a proper garbage collector, which proves to be much faster owing to the sophistication of modern GCs. The migration of NumPy would be a major achievement on the road to mass adoption of HPy. Unfortunately, libraries using NumPy, such as pandas, will not benefit from this as long as their own C API usage is not also migrated.

Cython 3 was recently released, 5 years after the first commit towards it was pushed. Cython is a way to gain C-level performance without straying too far from Python syntax. The speaker, who is also the maintainer of Cython, promised performance boosts of a factor of 250 compared to naively written Python. However, this requires significant investment in learning all of Cython's features (especially ufunc and NumPy interoperability). The performance boost is also far less shocking when benchmarking against peak NumPy performance. Still, I am very tempted to use it in the future, even if I am not sure that it is all that much more performant than code using NumPy and Numba JIT compilation.

Running on ARM processors can be highly beneficial for performance, but only for programs that make use of multiprocessing. AArch64 is now able to efficiently run Python 3.11 and promises performance benefits, although benchmarks using pyperformance and codespeed are not yet available to prove this claim. Furthermore, Windows on ARM will officially support Python 3.11 and above as well, but popular packages such as TensorFlow and PyTorch do not yet have native support on this platform.

Finally, you can write Rust modules and import them in Python, causing them to be compiled on the fly. This can offer both performance and memory safety for critical parts of one's Python code. This is interesting, but it requires learning Rust and is less mature than Cython or pybind. Memory leaks also feel less likely to me when one's code is mostly Python, with only a small number of performance-heavy sections written in a compiled language.

Machine Learning and (Generative) AI

It feels superfluous to discuss this topic in great detail. Blogs about generative AI are legion, and there are no great insights I can offer to add to that body of knowledge. So I will keep my thoughts brief.

Many AI presentations ended up just being exhibitions of the various machine learning models and their capabilities, rather than explaining the technology or using Python code to drive such models. Of course, the ease of use of generative AI and its immediately palpable, often mesmerizing results make such presentations interesting in their own right. But one presenter rightly observed that the people actually developing these models are a tiny fraction compared to the people training them, who are in turn a tiny fraction of those merely using AI models, all within a giant sea of people talking about AI.

This is unfortunate, because I was hoping to learn something more technical. Being a mere software developer, my knowledge of data science is noticeably lagging, and I predict this will be a disadvantage in the future. Conversely, during a presentation discussing the new capabilities of PyTorch 2.0 I was utterly lost. It seems I will have to resort to some intense self-study.

Lastly, talks at EuroPython hosted a lively discussion about open source in the age of AI. Being generally open-source advocates, presenters seem convinced that open source AI models will eventually outperform the proprietary ones, even those of industry behemoths like Meta and Microsoft. Personally I believe the future is fundamentally unknowable, especially when it comes to novel technologies. But I hope, of course, as most reasonable people would, that generative AI will be a collaborative effort that belongs to everyone.

Takeaways

I left EuroPython with renewed motivation to enhance my own skills. There are a lot of exciting ways in which Python can be used that I have only just scratched the surface of. I am hoping to take these lessons and find ways to apply them at the frontier of my own projects. Furthermore, generative AI remains an area in which I am lagging: I'm only vaguely familiar with the terms and technologies used, despite the fact that the number of potential applications of this emerging field is growing rapidly.

Furthermore, I got a chance to talk and listen to people equally enthusiastic about Python and the technologies it makes accessible. It was interesting to meet and talk with them during breaks, to find out what challenges and opportunities others have experienced in the field. The threshold for walking up to people here is very low, and there are social events where doing so is encouraged, making it very easy to meet a lot of people.

There are many other conferences on Python being hosted in many countries around Europe and the world. I'm hoping to attend another one in the near future, this time perhaps as a presenter.