We are Codebeez. We specialise in everything Python and help our clients build custom scalable, robust and maintainable solutions.

Python is a dynamically typed language. The types of objects are of secondary importance compared to their behavior. To divine the behavior of objects, duck-typing is recommended.

In Python, the principle of duck-typing is encapsulated in the saying, "If it walks like a duck and quacks like a duck, then it must be a duck." This is often combined with the "EAFP" (Easier to Ask for Forgiveness than Permission) approach rather than "LBYL" (Look Before You Leap): Instead of checking whether an operation is likely to succeed beforehand (as with LBYL), Python developers prefer to perform the operation and manage the error if it occurs (EAFP).

These principles are illustrated with the code snippet below:

try:
    potential_duck.quack()  # duck-typing
except AttributeError:
    print("sorry, I guess")  # ask for forgiveness
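
To make the contrast concrete, here is a minimal sketch of both styles side by side. The `Duck` and `Rock` classes and the two helper functions are hypothetical names invented for this illustration.

```python
from typing import Any


class Duck:
    def quack(self) -> str:
        return "Quack!"


class Rock:
    pass


def quack_lbyl(candidate: Any) -> str:
    # LBYL: check for the capability before using it.
    if hasattr(candidate, "quack"):
        return candidate.quack()
    return "sorry, I guess"


def quack_eafp(candidate: Any) -> str:
    # EAFP: just try it, and handle the failure if it happens.
    try:
        return candidate.quack()
    except AttributeError:
        return "sorry, I guess"


print(quack_lbyl(Duck()), quack_eafp(Duck()))  # Quack! Quack!
print(quack_lbyl(Rock()), quack_eafp(Rock()))  # sorry, I guess sorry, I guess
```

Both functions behave identically here; the difference is only in where the check happens.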

However, over the course of Python 3 there has been a push to provide type annotations and static type checking. The result can be seen in the typing module, which contains the tools to annotate your classes and functions with type hints.

Adding static typing, where appropriate, can be invaluable to the maintainability of your code base and prevent a lot of frustration and crashes. But when working with existing Python code, which often uses principles common to dynamic typing, going back and adding type hints can be equally frustrating.

Luckily, typing tools have been created to help us with this. One of them, and the subject of this article, is the Protocol class.

The Protocol class

The Protocol class provides static type checking for duck-typed class usage. Consider the following protocol and an implementation thereof:

import time
from typing import Protocol


class DuckLike(Protocol):

    def walk(self, to: str) -> None: ...
    def quack(self) -> str: ...


class Chicken:

    def walk(self, to):
        time.sleep(5)
        self.location = to

    def quack(self): return "tok"


def add_to_zoo(duck: DuckLike):
    duck.walk("zoo")
    duck.quack()

add_to_zoo(Chicken())  # Type checker allows this

The usage of Protocol is strikingly similar to, for example, an interface in Java, with one important distinction: other classes do not have to inherit explicitly from the Protocol class for the type checker to accept them. A Chicken is not a duck, but it behaves like one for the purposes of add_to_zoo.

We still get all the benefits of classical type checking, as if Chicken had inherited from DuckLike. Either of the following changes to Chicken makes the type checker reject the call to add_to_zoo:

# Invalid: method `quack` is missing
class Chicken:
    def walk(self, to):
        time.sleep(5)
        self.location = to


# Invalid: `walk` has the wrong signature
class Chicken:
    def walk(self): time.sleep(2)
    def quack(self): return "tok"

The main benefit of Protocol in Python comes when dealing with external libraries, whose types we cannot modify directly. When the time comes to create polymorphic functions, which may accept external library classes or our own classes, there is no common ancestor to annotate the parameter with. Specifying a Protocol sidesteps this issue. It allows us to define only the behavior we expect our annotated parameter to have.

The code below is valid, and allows us to provide type checking on the export function without strongly coupling it to the pandas DataFrame. Thus, if we ever want to switch frameworks, whether to another external library such as polars, or even a custom-built framework, we can reuse this function, as long as there exists a to_csv method which takes a file as a parameter.

from pathlib import Path
from typing import Protocol

import pandas as pd

class Writable(Protocol):
    def to_csv(self, path_or_buf) -> None: ...


def export(data: Writable, output_path: Path):
    data.to_csv(output_path)

df = pd.DataFrame({'id': [1,2,3], 'date': ['2023-05-13', '2022-01-31', '2028-07-06']})
export(df, Path('~/out.csv'))
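
As a sketch of the "custom-built framework" case, the example below passes a home-made class to the same export function. `RecordTable` is a hypothetical class invented for this illustration, and the temporary directory is only there to keep the example self-contained.

```python
import csv
import tempfile
from pathlib import Path
from typing import Protocol


class Writable(Protocol):
    def to_csv(self, path_or_buf) -> None: ...


class RecordTable:
    """Hypothetical in-house table: a list of dicts sharing the same keys."""

    def __init__(self, rows: list[dict[str, object]]) -> None:
        self.rows = rows

    def to_csv(self, path_or_buf) -> None:
        # Use the keys of the first row as the CSV header.
        fieldnames = list(self.rows[0]) if self.rows else []
        with open(path_or_buf, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(self.rows)


def export(data: Writable, output_path: Path) -> None:
    data.to_csv(output_path)


table = RecordTable([{"id": 1, "date": "2023-05-13"}, {"id": 2, "date": "2022-01-31"}])
output_file = Path(tempfile.mkdtemp()) / "out.csv"
export(table, output_file)  # accepted: RecordTable has a matching to_csv method
print(output_file.read_text())
```

RecordTable never mentions Writable, yet it satisfies the protocol because its to_csv method has a compatible signature.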

Abstract collections: Type hinting according to behavior

Python already provides some pre-built protocols that we can use to annotate our code.

If the typing module is still unfamiliar to you (and given its size you can't be blamed if it is), the obvious thing to do is to annotate every variable with exactly the object type we expect to call it with. Consider for example the following code:

def my_function(my_list):
    for item in my_list:
        do_something(item)
        do_something_else(item)


the_list = [1, 2, 3, 4, 5]

my_function(the_list)

What type annotation should my_list have? list[int] seems like the obvious choice (n.b.: subscripting as list[int] only works in Python 3.9 or higher. For older versions, use typing.List[int]). But how is my_list actually used in my_function? It is only iterated over. Other features of a list, such as size checks or accessing items by index, aren't needed.

Now what if in the future we want to use my_function with a tuple, a set, or even a numpy array? Nothing in the code forbids it: those classes all quack like ducks. The type annotation of my_list can reflect this. This is done with Iterable, an abstract base class from the collections.abc module.

from collections.abc import Iterable
import numpy as np

def my_function(my_list: Iterable[int]):
    for item in my_list:
        do_something(item)
        do_something_else(item)


my_function([1,2,3])  # valid
my_function((1, 2, 3))  # valid
my_function(np.array([1, 2, 3]))  # also valid

For those coming from statically typed languages, or codebases chock-full of inheritance and interfaces, this should be nothing new. But to those struggling to get a handle on their dynamically typed codebase, these protocols are invaluable for keeping code loosely coupled.
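
The same reasoning applies when a function needs more than iteration. The sketch below contrasts Iterable with collections.abc.Sequence; the `total` and `middle` functions are hypothetical examples, chosen only to show how the annotation follows the operations actually used.

```python
from collections.abc import Iterable, Sequence


def total(values: Iterable[int]) -> int:
    # Only iteration is required, so sets and generators qualify too.
    result = 0
    for value in values:
        result += value
    return result


def middle(values: Sequence[int]) -> int:
    # len() and indexing are required, so Sequence is the better fit.
    return values[len(values) // 2]


print(total([1, 2, 3]))  # 6
print(total({1, 2, 3}))  # 6
print(total(n * n for n in range(4)))  # 14
print(middle((10, 20, 30)))  # 20
```

A type checker would reject middle({1, 2, 3}), since a set supports neither indexing nor a defined order.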

Protocols at runtime

The abstract collection Protocols have an additional benefit for those who prefer LBYL (Look Before You Leap) over EAFP for type checking. Namely, they exhibit some unique behavior when used in an isinstance check in Python:

from collections.abc import Iterable

class MyIterable:

    def __iter__(self):
        yield 1
        yield 2
        yield 3


print(isinstance(MyIterable(), Iterable))  # prints True

For strict adherents to OOP this is heresy, but from a duck-typer's point of view this makes perfect sense: The MyIterable class has the properties which are required for it to be Iterable, and therefore it must be an Iterable.

Other objects in the collections.abc module follow the same logic. Anything that can be used with len(anything) must be Sized, and anything that can be called (that is, has the __call__ dunder method implemented) must be a Callable.
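
A small sketch of these two checks follows. `Playlist` and `Greeter` are hypothetical classes made up for this illustration; neither inherits from anything in collections.abc.

```python
from collections.abc import Callable, Sized


class Playlist:
    def __init__(self, tracks):
        self.tracks = list(tracks)

    def __len__(self):
        return len(self.tracks)


class Greeter:
    def __call__(self, name):
        return f"Hello, {name}!"


print(isinstance(Playlist(["a", "b"]), Sized))  # True: implements __len__
print(isinstance(Greeter(), Callable))  # True: implements __call__
print(isinstance(Greeter(), Sized))  # False: no __len__
print(isinstance(len, Callable))  # True: builtin functions are callable
```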

This principle extends to many other dunder methods which have nothing to do with collections. That is why types such as typing.SupportsAbs and typing.SupportsInt also exist: they check for the existence of their associated dunder methods (__abs__ and __int__ in this case, which ensure our ducks' compliance with the abs() and int() builtins).
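
As an illustration, the sketch below uses SupportsAbs both as an annotation and in an isinstance check. The `Vector` class and the `magnitude` function are hypothetical names introduced for this example.

```python
from typing import SupportsAbs, SupportsInt


class Vector:
    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def __abs__(self) -> float:
        # Euclidean length of the vector.
        return (self.x ** 2 + self.y ** 2) ** 0.5


def magnitude(value: SupportsAbs[float]) -> float:
    # Works for anything whose __abs__ returns a float.
    return abs(value)


print(magnitude(Vector(3, 4)))  # 5.0
print(magnitude(-2.5))  # 2.5
print(isinstance(Vector(3, 4), SupportsAbs))  # True: defines __abs__
print(isinstance(Vector(3, 4), SupportsInt))  # False: no __int__
```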

If we want to use our custom Protocol with isinstance checks, we first need to decorate it with @typing.runtime_checkable. By default, Protocols do not support isinstance checks at runtime, a property they share with many members of the typing module, which is meant for static checking first and foremost. The decorator enables this runtime functionality through __instancecheck__, the dunder method that the isinstance builtin consults under the hood.

from typing import Protocol, runtime_checkable


@runtime_checkable
class DuckLike(Protocol):

    def walk(self, to: str) -> None: ...
    def quack(self) -> str: ...

isinstance(Chicken(), DuckLike)  # returns True. Without the decorator, it would raise a TypeError
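
One caveat worth knowing: a runtime check on a runtime_checkable Protocol only verifies that the required members exist, not that their signatures match. The sketch below shows this, along with the TypeError for an undecorated Protocol. `PlainDuckLike` and `LazyChicken` are hypothetical names used only for this illustration.

```python
from typing import Protocol, runtime_checkable


class PlainDuckLike(Protocol):
    def quack(self) -> str: ...


@runtime_checkable
class DuckLike(Protocol):
    def walk(self, to: str) -> None: ...
    def quack(self) -> str: ...


class LazyChicken:
    # `walk` takes no destination, so a static checker would reject this class.
    def walk(self):
        return None

    def quack(self):
        return "tok"


# True: only the presence of `walk` and `quack` is checked at runtime.
print(isinstance(LazyChicken(), DuckLike))

try:
    isinstance(LazyChicken(), PlainDuckLike)
except TypeError as error:
    # Raised because PlainDuckLike is not decorated with @runtime_checkable.
    print(f"TypeError: {error}")
```

In other words, the runtime check is a weaker guarantee than the static one, so it complements a type checker rather than replacing it.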

Conclusion

The Protocol class is an excellent example of how static type checking practices can translate into Pythonic approaches to writing code.

Python will always remain a dynamically typed language (this is reasserted by the type hinting PEP, PEP 484: "Python will remain a dynamically typed language, and the authors have no desire to ever make type hints mandatory, even by convention").

The language profits a lot from the flexibility offered by dynamic typing. However, static type checking can be invaluable for larger codebases, as it can catch a great number of issues before they happen. Using the Protocol class, the benefits of static type checking can be brought to existing, dynamically written Python code.