Collaborative Text Editing with Logoot

Recently, I became intrigued with how collaborative text editing works. I was working on a school assignment in a group, and we were using Google Docs to work. It almost seems like magic, how one person can type something, and a few seconds later have it appear on others’ screens without any conflicts. I also recently discovered that the Keynote and Notes apps have collaborative features built in. So I decided to take a deep dive into collaborative text editing.

The problem with collaboration

To edit a purely text-based document, you actually only do 2 basic operations: insert and delete. Any other operations are built on these 2 operations. A single user editing a document might look something like this.

Single user editing a document
Single user editing a document

To collaborate in a document, the naive approach would be to send over each operation a user performs to all other users editing the document. This works when the connection is fast, perhaps when the users are connected in a LAN.

Two users editing over a LAN
Two users editing over a LAN

However, when network latency becomes significant, it is incredibly easy to achieve race conditions like this one.

Two users editing with network latency
Two users editing with network latency

When these race conditions occur, the documents on both clients become out of sync, and because of the nature of these operations, it is rare that they will become in sync again. Due to this, we need a better way of structuring these operations, or even structuring the document itself, so that reliable and consistent collaborative editing becomes a reality.

Operational Transformation

Enter Operational Transformation, or OT for short. OT is an algorithm that will keep documents consistent across all clients, no matter the order of the operations that get sent from other clients. Wave (the collaborative engine behind Google Docs), uses OT. I didn’t dive into OT too deeply, as I found other, simpler, and more intuitive solutions that I will explain later on. But for now, here is a quick rundown on how basic OT works.

Quick rundown

Let’s begin by understanding the very basics of OT. As mentioned earlier, a document is simply some content, which can be operated on by either inserting or deleting. The way OT solves the issue of concurrent editing is by transforming each operation using the previous operation, before it is applied to the document. Using the earlier example, after client A performs the insert on their document, client B’s delete operation, which is the next in line to be performed, gets transformed.

After the transformation, we can see that the operation now deletes the correct character from the document, not the wrong one in the naive example. Also, over on client B’s document, client A’s insert operation gets transformed after client B’s delete operation is performed, and the resulting document is the same on both ends.

Downsides to OT

Unfortunately, most of the popular OT algorithms are not always commutative. This means that when applying OT to two separate documents, it is not always guaranteed that they will end up being the same. Although this is rare, it can happen, and this will result in measures put into place to ensure consistency every once in a while. Thus, a different way of doing collaborative editing has surfaced in the recent years, known as CRDTs.

CRDTs

There are actually 2 flavours of CRDTs. They are (bear with me) Commutative Replicated Data Type and Convergent Replicated Data Type. The following will be about the former. CRDTs take a different approach from OT, in that they rely on structuring the document in a certain way, and not really on structuring the operations. This means that most of the time, using the CRDT approach, the document will be stored in a different format then just a simple string. This includes using things like binary trees and special position identifiers.

Downsides to CRDTs

When using CRDTs instead of OT, we are trading time complexity for space complexity. This means that while OT will take more time to process than CRDTs, CRDTs will usually end up using significantly more memory than OT.

Logoot

Logoot is a great (and simple) example of a CRDT. It is one of the easier CRDTs to understand, and extensions/variations have also been built upon it to enable extra functionality, like group undo/redo. It is the main CRDT that I will be focusing on (duh, look at the title), and also implementing it, which I will talk about in the next story.

Basic principles

Going back to the original example with the race condition, you can see that the document got out of sync because the operations were using a position within the string. So when an insertion happens, everything after the inserted character is shifted to the right, which means their position is incremented.

Inserting 'x' into string
Inserting 'x' into string

When a second operation is applied onto both content, they are inserted at the same position, as specified in the operation. However, they occur at different characters as a result of the first operation. For example, in the next image, the original intention is to insert a y after the c. However, due to the operation inserting in a fixed position, the y is inserted after the x instead.

Inserting 'y' into different strings
Inserting 'y' into different strings

Logoot tries to solve this problem by making the positions move with the text instead. That is, it assigns unique identifiers to each character/group of characters in the document, and performs the insertions and deletions based on the position of these identifiers.

In Logoot, the document is split up into groups of text. A group of text is called an atom. Each atom is given a unique identifier, or a uid, which has to be generated in a special way that will be covered later. Each uid is comparable to other uids. This means that you can do things like ask if uid1 < uid2 and vice versa. You can also check if they are equal. The document is then stored as an array of (uid, atom) pairs, sorted by the uids within the pairs in ascending order. There will also be two special, blank lines that exist in every document. One at the very start which will have the MIN_uid, and at the very bottom, which will have the MAX_uid.

Phew, that was a lot to take in. Here comes the most important part though. For any two different uids, you should always be able to find a uid in between them. This is the property that makes Logoot work. So, the uids in Logoot need to be represented in a way that adheres to the property.

Representing uids

Let’s start simple. This whole time, we represented the identifiers of each character with its position in the initial string — an integer. However, using integers as uids does not adhere to the property we mentioned. Let’s say we have the uids 4 and 5. It is impossible to find a integer in between 4 and 5, and thus integers cannot be used as uids.

At this point, you might be thinking, what about floats? Now, theoretically, using real numbers as uids would work, as there are an infinite number of real numbers between any two different real numbers. However, in the context of a computer, this isn’t possible. Floating point numbers are simply approximations of real numbers, which means that it’s not always possible to find a floating point number between any two floating point numbers. Simply put, you cannot represent an infinite number of floating point numbers in just 32 or 64-bits.

So how are these uids represented? The answer is lists. Using lists, we can easily adhere to the property. A uid is represented by a list of integers. Let’s say we have the uids [4] and [5] (Take note that these are lists of a single integer element, and not just the integers). In order to find a uid in between [4] and [5], since there is no integer in between [4] and [5], we can simply make a longer list, that starts with a 4 (e.g. [4, 8]). Now, we have the uids [4], [4, 8] and [5]. And this can repeat forever, since lists are infinitely extensible (Of course, the maximum size of a list is limited by the available RAM, but this is big enough that we don’t have to worry.

Therefore, in Logoot, uids are represented by lists of integers. Take note that there are some additional components that make up a uid, but for the sake of understanding Logoot they aren’t necessary.

Now that we have covered uids in Logoot, we can move on the constructing a simple Logoot document.

A sample document

In Logoot, a document is made up of an array of pairs. Each pair containing a uid, and an atom. An atom is simply a string, containing some of the content in the document. The pairs in the document are always sorted by their uids. To get the content of the entire document, simply concatenate all the atoms in order. As with all data structures (remember Logoot is a CRDT), some operations can be performed on it.

An insert operation can be performed on a Logoot document. To perform an insert, you simply need to have the pair to insert. To generate the uid for the pair, you simply have to take two existing uids and generate a random uid in between them.

A delete operation can also be performed. To perform a deletion, you only need to know the uid to delete.

Start and end lines

In the insert operation, you can see that to generate a new uid, you need 2 existing uids. However, in a blank document, how would the two existing uids be found? This is where the start and end lines come into play. The start and end lines are two lines that always exist in a Logoot document. They contain an empty atom and their uids are 0 and MAX respectively, where MAX is the maximum size of an integer.

Conclusion

Hopefully this article has given you a good understanding of Logoot, and even just collaborative editing in general. I implemented Logoot in the Go programming language. Check it out here. Be sure to check out my other stories and thanks for reading!