
+11 votes

I'm working on a project focused on data integration from several sources and formats (xlsx, csv, JSON, etc.). After implementing the first integration and its validations using spec, I started to study how to improve my code by using protocols and defrecords.

However, the use cases for protocols and custom types are not clear to me. When should I use protocols and defrecords? Are there any recommendations?

As I'm learning, programming to abstractions improves code reuse. But how do I spot the moments when an abstraction is called for?

I always find tutorials about how to implement them, or some very shallow comparisons to Java interfaces, but there is not much material about the larger patterns in an application that might be better solved with protocols and records.

3 Answers

+12 votes
by

Protocols (and multimethods) are Clojure tools that provide polymorphic, open, behavioral abstractions. That is, callers can invoke the same operation on different inputs, and some implementation will be chosen and invoked on their behalf. Because these systems are open, they can be extended after the fact, without altering the caller.

Generally, any time you sense a behavioral abstraction, particularly one that would be useful to extend later (although that's not required), then you should be looking at protocols and multimethods. These tools differ in several respects - protocols do fast, type-based dispatch on the first argument and support groups of operations. Multimethods do value-based dispatch on all the arguments for a single operation (this subsumes type-based dispatch on first argument). In the case where both are possible, protocols are typically faster and better.
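To make the two dispatch styles concrete, here is a minimal sketch. All names (`Source`, `CsvSource`, `JsonSource`, `parse-cell`) are hypothetical and chosen only to echo the data-integration setting of the question; this is an illustration under those assumptions, not a recommended design.

```clojure
;; Protocol: a group of operations, dispatched on the type of the first argument.
(defprotocol Source
  (source-name [this] "Human-readable label for the source.")
  (row-count [this] "Number of rows the source holds."))

(defrecord CsvSource [path rows])
(defrecord JsonSource [url items])

;; Open for extension: implementations are attached after the types exist,
;; without touching any caller of source-name or row-count.
(extend-protocol Source
  CsvSource
  (source-name [this] (str "csv:" (:path this)))
  (row-count [this] (count (:rows this)))
  JsonSource
  (source-name [this] (str "json:" (:url this)))
  (row-count [this] (count (:items this))))

;; Multimethod: a single operation, dispatched on any value computed
;; from the arguments (here, a key inside the first argument).
(defmulti parse-cell
  (fn [column-spec _raw] (:kind column-spec)))

(defmethod parse-cell :int [_ raw] (Long/parseLong raw))
(defmethod parse-cell :text [_ raw] raw)

(source-name (->CsvSource "a.csv" [[1 2] [3 4]]))               ;; => "csv:a.csv"
(row-count (->JsonSource "http://example.invalid/x" [{:a 1}]))  ;; => 1
(parse-cell {:kind :int} "42")                                  ;; => 42
```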

One piece of protocol advice - protocol functions are better as SPI to hook in implementations than in API as functions consumers call directly. It is often helpful to wrap protocol methods with a normal function that can supply additional logic if needed around the call into the protocol. Our experience has been that this works great architecturally and as the code evolves over time. One downside is that it hurts the performance advantage of protocols over multimethods, so consider this carefully.
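A sketch of that wrapping pattern, assuming a hypothetical `RowReader` protocol. The leading dash on `-read-rows` is only a common naming habit for "implementors provide this", not something the language enforces.

```clojure
;; SPI: what an implementation must supply.
(defprotocol RowReader
  (-read-rows [this] "Return a seq of row maps."))

;; API: the ordinary function consumers call. It is the one place to add
;; logic around the protocol call (here, dropping empty rows).
(defn read-rows [reader]
  (vec (remove empty? (-read-rows reader))))

;; A hypothetical implementation used only for the example.
(defrecord InMemoryReader [rows]
  RowReader
  (-read-rows [_] rows))

(read-rows (->InMemoryReader [{:id 1} {} {:id 2}]))
;; => [{:id 1} {:id 2}]
```

If the cleanup rule changes later, only `read-rows` changes; the implementations of `-read-rows` stay as they are.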

Records are most frequently comparable to maps with known fields for information uses. (deftype is more typically used to create your own custom constructs and is usually a different, lower-level use case.) When comparing to maps, there are many subtle tradeoffs. In general, consumers typically interact with maps and records in the same way (where records implement the map interfaces).

They tend to differ in construction (records have pre-built factory methods and maps do not), and in having a "type" (the generated concrete record class), which makes them amenable to being hooked by a protocol. Additionally, the ability to inline protocol implementations inside a record makes for a sweet spot in performance for the particular case of informational maps tied to polymorphic behavior (by leveraging the highly optimized paths in the JVM for this). Records vs maps is not a simple choice - it's important to compare all the dimensions first.
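A small sketch of those differences, using a hypothetical `Person` record and `Describable` protocol:

```clojure
(defprotocol Describable
  (describe [this]))

;; The protocol is implemented inline, inside the record definition.
(defrecord Person [name email]
  Describable
  (describe [_] (str name " <" email ">")))

;; Construction: records come with generated factory functions.
(def p1 (->Person "Ada" "ada@example.invalid"))
(def p2 (map->Person {:name "Ada" :email "ada@example.invalid"}))

;; Consumption: the same as a map.
(:name p1)                                           ;; => "Ada"
(instance? Person (assoc p1 :age 36))                ;; => true, extra keys are allowed

;; The type takes part in equality, so a record never equals a plain map.
(= p1 p2)                                            ;; => true
(= p1 {:name "Ada" :email "ada@example.invalid"})    ;; => false

(describe p1)                                        ;; => "Ada <ada@example.invalid>"
```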

by
Given how SPI refers to so many acronyms, it might be a good idea in this context to clarify what you mean by SPI here.
by
Yes, thanks. I mean “service provider interface”, basically things that your component requires from another.
by
Thanks for the detailed explanation, very helpful.
+7 votes
by

Overall, you wouldn't use protocols, records, and custom types very often. Using normal maps with regular functions will do for most application domain specific logic. Still, I'll try and explain when you'd want to reach for them.

TL;DR

Use deftype when you need to implement custom data containers which need mutability under the hood, or some form of data encapsulation. This is mostly for very primitive constructs like data-structures or reference types.

If you have a function which is called with different things at runtime, and what it does must be specific to the things it was called with, then you want to use protocols or multi-methods. If you can manage with choosing what to do based on the type of the first argument, use a protocol; if not, use multi-methods.

If you have a function that should do different things based on the type of a Map, use defrecord. That is, you need to tag a Map as representing something, like a Person, a Car, or a User, you have functions which will be called with Maps of more than one type, and you want them to choose what to do based on that type.

If you have a function which needs to do something based on the number of arguments it was called with, use multi-arity.

Finally, if you find yourself writing many functions that start with the same name but end with a kind of discriminator related to the type of what you call them with (say add-user, add-role, add-item), this might be a good indicator that you can model this instead using protocols or multi-methods and a single add function.
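As a rough sketch of that last point, the three functions could collapse into one multi-method. The `:kind` key, the shape of `db`, and the field names are assumptions made up for the example.

```clojure
;; One add function, dispatching on a :kind tag in the entity.
(defmulti add
  (fn [_db entity] (:kind entity)))

(defmethod add :user [db {:keys [name]}]
  (update db :users (fnil conj []) name))

(defmethod add :role [db {:keys [role]}]
  (update db :roles (fnil conj #{}) role))

(defmethod add :item [db {:keys [sku]}]
  (update db :items (fnil conj []) sku))

(-> {}
    (add {:kind :user :name "ada"})
    (add {:kind :role :role :admin}))
;; => {:users ["ada"], :roles #{:admin}}
```

Supporting a new kind later means adding one `defmethod`, with no change to the callers of `add`.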

deftype

I'll start with deftype. You would reach for a deftype the least often. A deftype allows you to implement a new abstract data type (ADT). It is useful for creating new types of data containers. Basically, anything which needs to encapsulate data (mostly mutable data), and provide access to the data only through a safe interface which enforces all invariants of the underlying data in whatever way makes sense for that type.

For example, if you wanted to add a new data-structure, say you needed to implement a doubly-linked list, deftype would be what you'd use.

Now, you almost never will need it, since most of the useful data containers have already been implemented for you either in core, in interop, or as a library. For example, Java provides a doubly linked list already: https://docs.oracle.com/javase/8/docs/api/java/util/LinkedList.html

To give a better intuition, deftype allows data encapsulation. In general, you don't need data encapsulation in Clojure, because the default data-structures and bindings are immutable, so there is no danger in exposing the data to outsiders as read-only. That said, for implementing the data containers themselves, like the immutable data-structures or various reference types like Atom, you're going to need mutation in order to provide a space and time efficient implementation. In such cases, you shouldn't let outsiders touch the data freely, because they could easily break the required invariants. So instead, you want to provide an abstract interface; for a doubly-linked list, for example, you might have add-first, add-last, remove, next, prev, etc. So again, deftype is useful for adding any form of data container to the language, which you'll almost never need to do yourself.
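A full doubly-linked list is too long to show here, so the sketch below uses a much smaller stand-in: a hypothetical `BoundedCounter` whose mutable field can only be changed through its protocol methods, which keep the invariant that the count never exceeds the limit.

```clojure
(defprotocol Counter
  (increment! [this] "Add one unless the limit has been reached. Returns this.")
  (current [this] "Current count."))

;; n is a mutable field. Mutable deftype fields are private, so outside
;; code can only reach it through the Counter methods below.
(deftype BoundedCounter [^:volatile-mutable n limit]
  Counter
  (increment! [this]
    (when (< n limit)
      (set! n (inc n)))
    this)
  (current [_] n))

(def c (->BoundedCounter 0 2))
(dotimes [_ 5] (increment! c))
(current c)
;; => 2
```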

defrecord

You would reach for defrecord when you need to create a custom type, in order to add semantic meaning to some Map. A record is really just a Map, but with the type for it replaced by the record type name. So instead of having type Map, it will have the type as the record name.

Say you wanted a Map of type Person, you'd use defrecord for that.

This is only really useful when you need a piece of code to do something different at runtime based on the type. That is, there is a piece of code which will receive Maps of different types, and you want it to do something different for each type AS WELL AS get better performance.

I say "as well" because there are actually several ways to provide this dynamic functionality. One is to use native types, and is provided by records and protocols. The other is to use manually modeled types together with defmulti. The latter is more powerful, because you can represent the type as anything you want, and the type can even be inferred from the structure or the value of the data itself. On the other hand, it will be slower.

defrecord can only allow functions to do different things based on the type of their first argument, and the structure and values of the record cannot dictate its type. This is often called nominal typing. Your record is of a certain type because you named it that way explicitly. On the other hand, defmulti allows for duck typing, in that a Map can be of a certain type because of its inherent structure and/or the values it contains.
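A sketch of the contrast, with a hypothetical `Car` record and a hypothetical `infer-kind` dispatch function that guesses a kind from the keys present:

```clojure
;; Nominal: a value is a Car only because it was constructed as one.
(defrecord Car [vin])

(instance? Car (->Car "VIN-1"))   ;; => true
(instance? Car {:vin "VIN-1"})    ;; => false, same shape but no type

;; Duck typing: the kind is inferred from the structure of the map.
(defn infer-kind [m]
  (cond
    (contains? m :vin)   :car
    (contains? m :email) :person
    :else                :unknown))

(defmulti summarize infer-kind)

(defmethod summarize :car [m] (str "Car " (:vin m)))
(defmethod summarize :person [m] (str "Person " (:email m)))
(defmethod summarize :unknown [_] "Unknown entity")

(summarize {:vin "VIN-1"})                ;; => "Car VIN-1"
(summarize {:email "ada@example.invalid"}) ;; => "Person ada@example.invalid"
(summarize {:foo 1})                      ;; => "Unknown entity"
```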

defprotocol

You reach for defprotocol when you want functions which do different things based on the native type of their first argument. An example is a function which will be called with different records, and should do something different for each record type.

defmulti

As I explained briefly when talking about records, if you want a function to do something different based not just on the type of the first argument, but also on the type of the other arguments, or on the structure of the data in any or all arguments, or on the values of any or all arguments, you would use defmulti instead of defprotocol.
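For instance, a multi-method can dispatch on the types of two arguments at once, which a protocol cannot do. The `combine` function below is a made-up example.

```clojure
;; The dispatch value is a vector of both argument classes.
(defmulti combine
  (fn [a b] [(class a) (class b)]))

(defmethod combine [String String] [a b] (str a b))
(defmethod combine [Number Number] [a b] (+ a b))
(defmethod combine [String Number] [a b] (str a b))

(combine "ab" "cd")  ;; => "abcd"
(combine 1 2)        ;; => 3, Long matches Number through isa?
(combine "n=" 3)     ;; => "n=3"
```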

Multi-arity

Also, if you just want a function to do something different based on the number of arguments it was called with, you can just use multi-arity functions.
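A tiny sketch, using a hypothetical `split-line` helper whose one-argument arity fills in a default separator and delegates to the two-argument arity:

```clojure
(require '[clojure.string :as str])

(defn split-line
  ;; One argument: assume a comma separator.
  ([line] (split-line line ","))
  ;; Two arguments: the separator is quoted so it is matched literally.
  ([line sep]
   (str/split line (re-pattern (java.util.regex.Pattern/quote sep)))))

(split-line "a,b,c")    ;; => ["a" "b" "c"]
(split-line "a;b" ";")  ;; => ["a" "b"]
```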

by
Amazing. Very valuable. I find very few posts that are able to pass along **tacit knowledge** (as described in the introduction of *Elements of Clojure* by Zachary Tellman). Not being able to work with experienced programmers makes learning such things a lot harder.

Thank you for your contribution
+1 vote
by

If you want to take a deep dive into polymorphism in Clojure, I recommend Paul Stadig's book, Clojure Polymorphism: https://leanpub.com/clojurepolymorphism

by
Just bought the book today. Very good read indeed, thanks for the suggestion.